Location Details

The workshop will be held on Columbia's Morningside Campus in 603 Schermerhorn Hall. If traveling by subway, take the 1 line to 116th Street. The map below shows how to walk to Schermerhorn from the subway stop. From College Walk (the pedestrian boulevard corresponding to W. 116th Street between Broadway and Amsterdam Avenue), go up the steps on the right (east) side of Low Plaza, the large open space in front of the domed Low Library Building. Continue on the walkway that runs along the east side of the Low Library Building. Make a right as you reach the cafe terrace of Uris Hall. The red-brick, tall-windowed Schermerhorn Hall will then be on the left. Inside Schermerhorn, take the elevator or stairs to the sixth floor. Room 603 is right next to the Geology Library.

The building should be unlocked on Saturday by 9am (the workshop start time). Arriving a little before 9am is recommended, especially for early presenters. We will try to prop the door open, but if you have any problems getting in, call or text Ryan (+1-617-800-4237).

http://library.columbia.edu/content/libraryweb/locations/geology/directions/_jcr_content/layout_par_main_1/image_v2.img.png/1476487366066.png

Food

Light breakfast, lunch (sandwiches), and a steady supply of coffee will be provided.

Dinner will be at 8pm at Dinosaur Bar-B-Que, a short walk from Columbia.

Workshop Program

Here is the full list of presenters and talk titles:

Saturday, Nov. 5

Time | Presenter | Presentation
9:00-9:15 | Ryan Abernathey, Columbia University / LDEO | Introduction
9:15-9:30 | Spencer Hill, UCLA & Caltech | "infinite-diff" and "animal-spharm": xarray-based finite differencing and spherical harmonics
9:30-9:45 | Spencer Clark, Princeton University | The other "aospy": automated climate data analysis and management
9:45-10:00 | Daniel Rothenberg, MIT | A Pythonic Approach to Simplifying Climate Data Analysis Pipelines
10:00-10:15 | Phillip Wolfram, Los Alamos National Lab | Climate analysis at exascale
10:15-10:30 | Jeremy McGibbon, University of Washington | Lessons from Model Code
10:30-11:00 | coffee break
11:00-11:15 | Julien Le Sommer, CNRS | What data-structures and utilities are needed for leveraging dask and xarray for the analysis ocean model output?
11:15-11:30 | Guillaume SERAZIN, IRD/LEGOS, Toulouse, France | Filtering ocean dataset using dask and xarray
11:30-11:45 | Joy Monteiro, Stockholm University | Data management à la GOAT
11:45-12:00 | Joe Hamman, University of Washington & NCAR | xarray applications in hydroclimatology
12:00-12:15 | Brian Rose, University at Albany | Climlab: a Python toolkit for interactive, process-oriented climate modeling
12:15-12:30 | Kevin Paul, NCAR | A Brief Survey of Python Efforts & Interests at the National Center for Atmospheric Research
12:30-1:30 | lunch
1:30-1:45 | Matthew Rocklin, Continuum Analytics | Dask
1:45-2:00 | Stephan Hoyer, Google Research | Developer level APIs for xarray
2:00-3:00 | group discussion
3:00-4:00 | test drive someone else's package
4:00-5:00 | group discussion

Preparing your Code

In order to make the most out of our workshop, participants are encouraged to polish their code as much as possible beforehand. Through our discussions on Saturday, we will try to select one or more projects as the target of a coding sprint on Sunday. To enable others to effectively contribute to your project, try to follow best practices for open source software development. Below is a list of recommended practices for participating projects:

  • Hosting on GitHub to facilitate issue tracking, pull requests, etc.
  • Comprehensive test suite. Matt Rocklin's blog post on testing clearly explains the importance of testing to a scientific software project. Without tests, it is effectively impossible for others (or even you) to make significant contributions to your code.
  • Continuous integration of testing via an online service such as Travis CI or CircleCI. Continuous integration facilitates collaboration by making it clear when a pull request breaks existing functionality.
  • Code coverage assessment. Coverage tells you which parts of your code are exercised by your test suite. Coverage can be integrated with GitHub via services such as Coveralls and Codecov.
  • Comprehensive documentation. Many modern Python packages use Sphinx for their documentation, which integrates with http://readthedocs.org/ via GitHub hooks.
  • Clear specification of dependencies. It's safe to assume that everyone will be using Anaconda for package management, so an environment.yml file is sufficient to specify your project's dependencies.
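
To make the testing recommendation concrete, here is a minimal sketch of what a small test module might look like. The function area_weighted_mean and the test names are hypothetical and written in plain Python purely for illustration; they are not taken from any participating package, and a real project would more likely operate on xarray objects.

```python
import math


def area_weighted_mean(values, lats_deg):
    """Mean of `values` weighted by cos(latitude), a common proxy for grid-cell area.

    Hypothetical helper used only to illustrate how a test might be written.
    """
    if len(values) != len(lats_deg):
        raise ValueError("values and lats_deg must have the same length")
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def test_constant_field_is_unchanged():
    # Averaging a constant field must return that constant, whatever the weights.
    lats = [-60.0, -30.0, 0.0, 30.0, 60.0]
    assert math.isclose(area_weighted_mean([2.5] * 5, lats), 2.5)


def test_weights_follow_cos_latitude():
    # cos(0) = 1 and cos(60 deg) = 0.5, so the expected mean is
    # (1 * 0 + 0.5 * 3) / (1 + 0.5) = 1.
    assert math.isclose(area_weighted_mean([0.0, 3.0], [0.0, 60.0]), 1.0)


def test_mismatched_lengths_raise():
    # Inputs of different lengths should be rejected rather than silently truncated.
    try:
        area_weighted_mean([1.0], [0.0, 10.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched inputs")
```

If saved in a file whose name starts with test_ (for example test_weighted_mean.py, an assumed name), functions like these would be picked up automatically by a test runner such as pytest, and the same command can then be run by a continuous integration service on every pull request.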

In addition to these general best practices, there are some specific issues related to analysis of GCM data. Most of us are aiming for somewhat universal tools (i.e. tools compatible with many different models), but in practice, we usually have a specific model we focus on and use for testing our code. Therefore, participants are highly encouraged to make some sample output from their favorite model (e.g. CAM, NEMO, MITgcm) available online in a publicly accessible location. These links will be shared with the group. If you don't have the ability to post sample data online, you can upload it to the LDEO anonymous incoming FTP server in the "aospy" directory.

Guidelines for Presentations

As discussed in our hackpad brainstorming session, a major goal of our workshop is to learn what everyone else is working on, so that we can collaborate efficiently on building the software tools we need and avoid duplicating effort. This is the rationale for having individual presentations.

Presentations should be 15 minutes long with 5 minutes for questions. Your presentation may cover whatever aspect of your work you feel is most relevant to the goals of the workshop. For example, you might present an overview of the xarray-based Python package you have developed for analyzing GCM output. Or you could talk about a particular computational challenge you want to solve to reach your scientific aims.

Here are some suggestions to make the most out of our individual presentations.

  • Keep in mind that we have an audience with many different backgrounds at this workshop, from atmospheric scientists, to oceanographers, to hydrologists, to physicists-turned-software-developers. Within these categories, experience levels also vary widely. Keep your presentation simple, clearly introducing unfamiliar concepts, acronyms, jargon, etc.
  • If you present on an analysis package, clearly state what problem your software aims to solve. If you present on a specific model or scientific problem, clearly state what obstacles you face in accomplishing your scientific goals.
  • Upload your presentation to the web somewhere (e.g. SlideShare) so others can easily review it.

You may prepare your presentation in PowerPoint, Keynote, or PDF format, or as an online slideshow. We will try to load all the presentations onto one laptop to save time, but since the workshop is small, we will be flexible about this.

Workshop Logistics

The workshop will be held Saturday and Sunday, November 5-6, at Schermerhorn Hall on the campus of Columbia University.

Travel

Details for travel to Columbia can be found on the University Travel Portal.

UPDATE: if you are receiving travel support, please book your own travel. You will be reimbursed at the workshop.

Tentative Schedule

Saturday, Nov. 5  
9am - 12pm Individual Presentations
12pm - 1pm Lunch
1pm - 3pm Individual Presentations
3pm - 5pm Group Discussion
7pm Group Dinner
Sunday, Nov. 6  
9am - 12pm Code Sprint
12pm - 1pm Lunch
1pm - 4pm Code Sprint
4pm - 5pm Closing Discussion

Workshop Motivation

The purpose of this workshop is to bring together "in real life" a group of scientists and software developers who have, until now, mostly been interacting online. Our group started on the xarray mailing list and then moved to a hackpad.

The motivation of this workshop, and the group in general, is to build a toolkit for the analysis of ocean and atmosphere general circulation model (GCM) output based on xarray and dask.

Quoting from the summary hackpad,

Problem: Each group builds on their own (in some cases proprietary) infrastructure: NCAR has PyNIO; PCMDI has UV-CDAT; UKMO has Iris. Ideal: single "standard" toolkit for analysis, i.e. create for climate-related sciences what astropy is for astronomy-related sciences.

None of the existing platforms utilize xarray and its amazing netCDF-like named dimensions and coordinates. Given their age, it's safe to assume also that none of the aforementioned packages are built from the core (no pun intended) to support out-of-core computations. xarray provides the opportunity to do so through dask. Moreover, through dask-distributed, this has the potential of extending straightforwardly from single nodes to across nodes, thereby harnessing the full power of large computational clusters that many climate researchers have access to.

It is our determination that xarray and out-of-core are the future of climate-related computation using Python, and conversely that Python is the future of climate-related computation. This puts us in a unique position to introduce a set of tools that becomes the community standard and greatly enhances the speed and reliability of such computations and therefore of the science itself.

Workshop Registration

The Columbia AOSPY workshop will be held Saturday-Sunday, Nov. 5-6 at Columbia University.

Please register for the workshop via the Google form below. You will have an opportunity to request travel support in Section 2 of the form.

Registration is limited to invited participants. For any questions, please email Ryan Abernathey.