Xarray Tips and Tricks

Build a multi-file dataset from an OpenDAP server

One thing we love about xarray is the open_mfdataset function, which combines many netCDF files into a single xarray Dataset.

But what if the files are stored on a remote server and accessed over OpenDAP? An example can be found in NOAA's NCEP Reanalysis catalog.
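As a rough sketch of the idea: a remote server cannot expand a wildcard such as *.nc, so one approach is to build the list of URLs explicitly and hand that list to open_mfdataset. The base URL and the one-file-per-year naming scheme below are hypothetical placeholders, not the actual NCEP Reanalysis layout, and the helper names are made up for illustration.

```python
# Hypothetical server and naming scheme: replace with the real catalog's values.
BASE_URL = "https://example.org/opendap/reanalysis"
PATTERN = "{base}/air.{year}.nc"


def build_urls(first_year, last_year, base=BASE_URL, pattern=PATTERN):
    """Return one OpenDAP URL per year, inclusive of both endpoints."""
    return [pattern.format(base=base, year=year)
            for year in range(first_year, last_year + 1)]


def open_remote_dataset(urls):
    """Open the remote files lazily as a single Dataset.

    xarray is imported here so the URL helper works without it installed.
    """
    import xarray as xr

    # Wildcards cannot be expanded on a remote server, so pass an explicit list.
    return xr.open_mfdataset(urls, combine="by_coords")


urls = build_urls(2010, 2012)
print(urls)
# ds = open_remote_dataset(urls)  # needs network access and a real server
```

The final call is left commented out because it only works against a real OpenDAP endpoint.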

more ...

Binder for Reproducible Research

This final lesson is concerned with the topic of reproducibility. Nearly everyone agrees that reproducibility is an important principle for science: if results are not reproducible, they are not valid.

But how do we achieve reproducibility in practice? In computational / data science, a particular analysis, calculation, or notebook may depend on hundreds of different software packages, each with many different versions. Reproducibility of our results depends on having the correct version.
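One small, partial step toward this is simply recording which versions were installed when an analysis ran. The sketch below uses only the standard library (Python 3.8 or later); the package list is illustrative, and this is not a substitute for a full environment specification of the kind Binder relies on.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package list is illustrative; use whatever your analysis imports.
report = installed_versions(["numpy", "pandas", "xarray"])
print("python", sys.version.split()[0])
for name, version in report.items():
    print(name, version or "not installed")
```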

more ...

Maps with Cartopy

Maps in Scientific Python

Making maps is a fundamental part of geoscience research. Maps differ from regular figures in the following principal ways:

  • Maps require a projection of geographic coordinates on the 3D Earth to the 2D space of your figure.
  • Maps often include extra decorations besides just our data (e.g. continents, country borders, etc.)
  • more ...

Dask for Parallel Computing and Big Data

Dask for Parallel Computing in Python

In past lectures, we learned how to use numpy, pandas, and xarray to analyze various types of geoscience data. In this lecture, we address an increasingly common problem: what happens if the data we wish to analyze is "big data"?

Aside: What is "Big Data"?

There is a lot of hype around the buzzword "big data" today. Some people may associate "big data" with specific software platforms (e.g. "Hadoop", "Spark"), while, for others, "big data" means specific machine learning techniques. But I think Wikipedia's definition more ...




Xarray Fundamentals

Xarray for multidimensional gridded data

In last week's lecture, we saw how Pandas provided a way to keep track of additional "metadata" surrounding tabular datasets, including "indexes" for each row and labels for each column. These features, together with Pandas' many useful routines for all kinds of data munging and analysis, have made Pandas one of the most popular python packages in the world.

more ...

Groupby in Pandas

Pandas: Groupby

groupby is an amazingly powerful function in pandas. But it is also complicated to use and understand. The point of this lesson is to make you feel confident in using groupby and its cousins, resample and rolling.
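To give a feel for how the three relate, here is a minimal sketch on made-up data (two hypothetical stations, six daily values). It is only an illustration of the calls, not the lesson's own example.

```python
import pandas as pd

# Six days of made-up temperatures from two hypothetical stations.
df = pd.DataFrame(
    {
        "station": ["A", "B", "A", "B", "A", "B"],
        "temp": [10.0, 20.0, 12.0, 22.0, 14.0, 24.0],
    },
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

# groupby: split rows by a column's values, then aggregate each group.
mean_by_station = df.groupby("station")["temp"].mean()

# resample: groupby for time, here non-overlapping 2-day bins.
two_day_mean = df["temp"].resample("2D").mean()

# rolling: a sliding window, here each day together with the previous one.
rolling_mean = df["temp"].rolling(window=2).mean()

print(mean_by_station)
print(two_day_mean)
print(rolling_mean)
```

Note that the rolling result has a missing value in its first row, because the first day has no previous day to fill the window.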

These notes are loosely based on the Pandas GroupBy Documentation.

Imports:


Pandas for Tabular Data

Pandas

Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools. Pandas is particularly suited to the analysis of tabular data, i.e. data that can go into a table. In other words, if you can imagine the data in an Excel spreadsheet, then Pandas is the tool for the job.
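As a small illustration of the spreadsheet analogy, the sketch below builds a table from made-up values (the site names and measurements are invented) and then looks up a cell and filters rows by label rather than by position.

```python
import pandas as pd

# Made-up rows, as they might appear in a spreadsheet.
table = pd.DataFrame(
    {
        "site": ["alpha", "beta", "gamma"],
        "depth_m": [5, 50, 500],
        "temp_c": [18.5, 12.0, 4.2],
    }
).set_index("site")

print(table)
print(table.loc["beta", "temp_c"])   # look up one cell by row and column label
print(table[table["depth_m"] > 10])  # filter rows, as with a spreadsheet filter
```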

more ...

More Matplotlib

Matplotlib is the dominant plotting / visualization package in python. It is important to learn to use it well. In the last lecture, we saw some basic examples in the context of learning numpy. This week, we dive much deeper. The goal is to understand how matplotlib represents figures internally.

more ...

Numpy and Matplotlib

These are two of the most fundamental parts of the scientific python "ecosystem". Most everything else is built on top of them.


Functions, Classes, and Modules

Python Functions, Classes, and Modules

For longer and more complex tasks, it is important to organize your code into reusable elements. For example, if you find yourself cutting and pasting the same or similar lines of code over and over, you probably need to define a function to encapsulate that code and make it reusable. An important principle in programming is DRY more ...
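DRY stands for "don't repeat yourself". As a minimal sketch of the idea, the temperature conversion below is written once as a function (the function name and sample readings are invented for illustration) instead of being pasted wherever it is needed.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Without the function, the same formula would be pasted three times.
readings_f = [32.0, 50.0, 212.0]
readings_c = [fahrenheit_to_celsius(t) for t in readings_f]
print(readings_c)
```

If the formula ever needs to change, it now changes in exactly one place.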




Introduction to JupyterLab

JupyterLab will be our primary method for interacting with the computer. JupyterLab contains a complete environment for interactive scientific computing which runs in your web browser. Jupyter is an open source python project that was started by scientists like yourselves who wanted a more effective way to interact with their …

more ...

Introduction to Python

Core Python Language

Mostly copied from the official python tutorial

Invoking Python

There are three main ways to use python.

  1. By running a python file, e.g. python myscript.py
  2. Through an interactive console (python interpreter or ipython shell)
  3. In an interactive IPython notebook

We will be using the IPython notebook.
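For the first option, a script file might look like the sketch below. The file name myscript.py comes from the list above; its contents are a made-up example.

```python
# myscript.py
import sys


def main(argv):
    """Build a greeting from an optional command-line argument."""
    name = argv[1] if len(argv) > 1 else "world"
    return f"Hello, {name}!"


if __name__ == "__main__":
    # True only when the file is run directly, e.g. `python myscript.py`,
    # not when it is imported from a console or notebook.
    print(main(sys.argv))
```

The same main function can also be imported and called from an interactive console or a notebook, which covers the other two ways of using python.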

Python Versions

There are two versions of the python language out there: python 2 and python 3. Python 2 is more common in the wild but is deprecated. The community is moving to python 3. As new python learners, you should learn python 3. But it is important to be aware that python 2 exists. It is possible that a package you want to use is only supported in python 2. In general, it is pretty easy to switch between them.
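A few of the best-known differences can be seen in a short sketch. The code below is python 3; the comments note how python 2 behaved. It is a small sample, not a complete list.

```python
# Python 3 behavior; the comments note what Python 2 did differently.

# print is a function in Python 3 (Python 2 also allowed `print "hello"`).
print("hello")

# `/` is true division in Python 3; in Python 2, 7 / 2 gave 3 for integers.
quotient = 7 / 2

# `//` is floor division in both versions.
floor_quotient = 7 // 2

# Strings are Unicode text in Python 3, distinct from raw bytes.
text = "café"
encoded = text.encode("utf-8")

print(quotient, floor_quotient, len(text), len(encoded))
```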

more ...