Dask for Parallel Computing and Big Data
Dask for Parallel Computing in Python
In past lectures, we learned how to use numpy, pandas, and xarray to analyze various types of geoscience data. In this lecture, we address an increasingly common problem: what happens if the data we wish to analyze is "big data"?
Aside: What is "Big Data"?
There is a lot of hype around the buzzword "big data" today. Some people may associate "big data" with specific software platforms (e.g. "Hadoop", "Spark"), while for others "big data" means specific machine learning techniques. But I think Wikipedia's definition more ...
Assignment 8 - Xarray
Assignment 8: Xarray
Due Thursday, Oct. 26
In this assignment, we will use Xarray to analyze top-of-atmosphere radiation data from NASA's CERES project.
Public domain, by NASA, from Wikimedia Commons
I have pre-downloaded and subsetted a portion of this dataset for use in our class. You can download it here: http://ldeo.columbia.edu/~rpa/CERES_EBAF-TOA_Edition4.0_200003-201701.condensed.nc. The size of the data file is 702.53 MB. It will take a minute or two to download.
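Because the file is large, it can help to download it only once and reuse the local copy. The following is a minimal sketch of that idea, not part of the assignment itself; the helper names `local_name` and `fetch` are hypothetical and introduced here only for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL given in the assignment text
URL = ("http://ldeo.columbia.edu/~rpa/"
       "CERES_EBAF-TOA_Edition4.0_200003-201701.condensed.nc")


def local_name(url):
    """Return the file name at the end of a URL path (hypothetical helper)."""
    return Path(urlparse(url).path).name


def fetch(url, dest_dir="."):
    """Download url into dest_dir unless a file of that name already exists.

    Hypothetical helper: it only checks that the file exists, not that a
    previous download completed successfully.
    """
    target = Path(dest_dir) / local_name(url)
    if not target.exists():
        urlretrieve(url, str(target))
    return target
```

Assuming xarray is installed, the returned path can then be passed to `xarray.open_dataset`, for example `ds = xr.open_dataset(fetch(URL))` after `import xarray as xr`.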