Dask for Parallel Computing and Big Data
Dask for Parallel Computing in Python
In past lectures, we learned how to use numpy, pandas, and xarray to analyze various types of geoscience data. In this lecture, we address an increasingly common problem: what happens if the data we wish to analyze is "big data"?
Aside: What is "Big Data"?
There is a lot of hype around the buzzword "big data" today. Some people may associate "big data" with specific software platforms (e.g. "Hadoop", "Spark"), while, for others, "big data" means specific machine learning techniques. But I think Wikipedia's definition more ...
Parallel Programming with MPI For Python
MPI For Python
more ...
Organization and Packaging of Python Projects
Organization and Packaging of Python Projects
A complex research project often relies on many different programs and software packages to accomplish the research goals. An important part of scientific computing is deciding how to organize and structure the code you use for research. A well-structured project can make you a more efficient and effective researcher. It is also a key component of scientific reproducibility more ...
Map making in Python with Basemap
Basemap Tutorial
This brief tutorial will look at the Basemap toolkit extension for matplotlib. Basemap allows you to create map plots in python. It extends matplotlib's functionality by adding geographical projections and some datasets for plotting coast lines and political boundaries, among other things.
We only have time to cover a few examples here, which I have modified from a few places:
more ...
Assignment 8 - Xarray
Assignment 8: Xarray
Due Thursday, Oct. 26
In this assignment, we will use Xarray to analyze top-of-atmosphere radiation data from NASA's CERES project.
Public domain, by NASA, from Wikimedia Commons
I have pre-downloaded and subsetted a portion of this dataset for use in our class. You can download it here: http://ldeo.columbia.edu/~rpa/CERES_EBAF-TOA_Edition4.0_200003-201701.condensed.nc. The size of the data file is 702.53 MB. It will take a minute or two to download.
more ...
Intermediate Python III: Xarray for Multidimensional Data
Xarray for multidimensional gridded data
In last week's lecture, we saw how Pandas provided a way to keep track of additional "metadata" surrounding tabular datasets, including "indexes" for each row and labels for each column. These features, together with Pandas' many useful routines for all kinds of data munging and analysis, have made Pandas one of the most popular python packages in the world.
more ...
Assignment 7 - Pandas
Assignment 7 - Pandas
Due Oct 19.
In this assignment we will use pandas to examine earthquake data.
Start by importing pandas, numpy and matplotlib.
I saved you some time by pre-downloading some data in .csv format from the USGS Earthquakes Database. It is located at:
http://www.ldeo.columbia.edu/~rpa/usgs_earthquakes_2014.csv
You don't even need to download it. You can open it directly with Pandas.
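As a minimal sketch of what "opening it directly" might look like: `pandas.read_csv` accepts a URL as well as a local path or an open file-like object. The helper name `load_csv` and the two-row sample table below are hypothetical, chosen only so the example runs without a network connection; they are not the actual columns of the earthquake file.

```python
import io

import pandas as pd

# URL quoted in the assignment text above.
USGS_URL = "http://www.ldeo.columbia.edu/~rpa/usgs_earthquakes_2014.csv"


def load_csv(source):
    """Read a CSV from a URL, a local path, or an open file-like object."""
    return pd.read_csv(source)


# Offline stand-in so the sketch runs without a network connection.
# The column names and values here are made up for illustration.
sample = io.StringIO("id,value\na,1.5\nb,2.5\n")
df = load_csv(sample)
print(df.head())

# With a network connection, the same call would take the URL instead:
# df = load_csv(USGS_URL)
```

The same function call works for all three kinds of source, which is why no separate download step is required.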
more ...
Intermediate Python II: Pandas for Tabular Data
Pandas
Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools. Pandas is particularly suited to the analysis of tabular data, i.e. data that can go into a table. In other words, if you can imagine the data in an Excel spreadsheet, then Pandas is the tool for the job.
more ...
Assignment 6 - Numpy and Matplotlib
Assignment 6 - Numpy and Matplotlib
Due Thursday October 12
Your assignment should be handed in as an ipython notebook checked into your github repository in a new folder named assignment_6. To download this assignment, your best option is to clone the original github repository for the course website:
git clone https://github.com/rabernat/research_computing.git
and then navigate to the assignment
more ...
Intermediate Python I: NumPy arrays and matplotlib
Numpy and Matplotlib
These are two of the most fundamental parts of the scientific python "ecosystem". Most everything else is built on top of them.
Introduction to Python
Core Python Language
Mostly copied from the official python tutorial
Invoking Python
There are three main ways to use python.
- By running a python file, e.g.
python myscript.py
- Through an interactive console (python interpreter or IPython shell)
- In an interactive IPython notebook
We will be using the IPython notebook.
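For the first option, a file such as `myscript.py` might look like the sketch below. The contents are hypothetical (the lecture does not show the file); the point is only that the same code can be run from the command line, imported in a console, or pasted into a notebook cell.

```python
# myscript.py (hypothetical contents, for illustration only)
import sys


def main(argv):
    """Build a greeting from an optional command-line argument."""
    name = argv[1] if len(argv) > 1 else "world"
    return "Hello, {}!".format(name)


# This guard is true only when the file is executed directly,
# e.g. with `python myscript.py`, and not when it is imported.
if __name__ == "__main__":
    print(main(sys.argv))
```

In an interactive console or notebook, one would instead import the module and call `main(["myscript.py", "some name"])` directly.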
Python Versions
There are two versions of the python language out there: python 2 and python 3. Python 2 is more common in the wild but is deprecated. The community is moving to python 3. As new python learners, you should learn python 3. But it is important to be aware that python 2 exists. It is possible that a package you want to use is only supported in python 2. In general, it is pretty easy to switch between them.
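A small sketch of how one might check which version is running, along with one well-known behavioral difference between the two (integer division). The variable names are illustrative only.

```python
import sys

# sys.version_info holds the interpreter version; index 0 is the major number.
major = sys.version_info[0]
print("Running Python {}".format(major))

# In Python 3, "/" is always true division. In Python 2, "/" between two
# integers performs floor division unless `from __future__ import division`
# is used. "//" is floor division in both versions.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)
```

Differences like this one are the kind of thing to watch for when moving code between the two versions.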
more ...