Assignment 6: Pandas Groupby with Hurricane Data

Import pandas (as pd, the alias used by the loading code below) and matplotlib

In [ ]:
 

Use the following code to download and unzip a CSV file of the NOAA IBTrACS hurricane dataset.

In [ ]:
! wget ftp://eclipse.ncdc.noaa.gov/pub/ibtracs/v03r10/all/csv/Allstorms.ibtracs_all.v03r10.csv.gz
! gunzip Allstorms.ibtracs_all.v03r10.csv.gz

Examine the first few lines of the file.

Then use the following code to load the file as a pandas dataframe. Think about the options being used and why.

In [ ]:
df = pd.read_csv('Allstorms.ibtracs_all.v03r10.csv',
                 parse_dates=['ISO_time'], usecols=range(12),
                 skiprows=[0, 2], na_values=[-999, 'NOT NAMED'])
df.head()
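
To see what each option does in isolation, it can help to try the same call on a few lines of made-up text. The sketch below assumes a layout the options seem to imply (a title line above the header and a units line right below it); the storm names, values, and the Extra column are invented for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Synthetic text: title line, header line, units line, then data.
# Every value here is made up; only the layout matters.
raw = """Made-up title line
Serial_Num,Name,ISO_time,Wind(WMO),Extra
N/A,N/A,YYYY-MM-DD HH:MM:SS,kt,N/A
S001,ALPHA,2005-08-23 18:00:00,30.0,1
S001,ALPHA,2005-08-24 00:00:00,-999.0,2
S002,NOT NAMED,2005-09-01 06:00:00,45.0,3
"""

toy = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0, 2],                # skip the title line and the units line
    usecols=range(4),               # keep only the first four columns
    na_values=[-999, "NOT NAMED"],  # treat these sentinels as missing
    parse_dates=["ISO_time"],       # parse the timestamp column as datetimes
)
print(toy)
print(toy.dtypes)
```

Removing one option at a time and rerunning shows what it was protecting against, for example the units line turning numeric columns into strings.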

1) Get the unique values of the Basin, Sub_basin, and Nature columns

In [ ]:
 
In [ ]:
 
In [ ]:
 

2) Fix these columns by eliminating the whitespace at the beginning of each
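
As a hint, the string accessor on a column offers strip methods. A minimal sketch on a made-up dataframe (the values are invented, and the real columns may contain other codes):

```python
import pandas as pd

# Toy data with a leading space in every value, as in the task description.
toy = pd.DataFrame({"Basin": [" NA", " SI", " NA"],
                    "Nature": [" TS", " ET", " TS"]})

for col in ["Basin", "Nature"]:
    # .str.strip() removes whitespace on both sides;
    # .str.lstrip() would remove only the leading part.
    toy[col] = toy[col].str.strip()

print(toy["Basin"].unique())
```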

In [ ]:
 

3) Filter the dataframe to eliminate rows with no position information

In [ ]:
 

4) Rename the Wind(WMO) and Pres(WMO) columns to eliminate the parentheses

This makes them accessible to TAB completion.
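
One possible approach is DataFrame.rename with a mapping from old to new names. The sketch below uses a made-up two-row dataframe; the new name Pres is an assumption chosen for illustration (Wind matches the name used in the later tasks).

```python
import pandas as pd

# Toy dataframe with the two parenthesised column names; values are made up.
toy = pd.DataFrame({"Wind(WMO)": [30.0, 45.0], "Pres(WMO)": [1005.0, 990.0]})

# rename returns a new dataframe; columns not in the mapping are left alone.
toy = toy.rename(columns={"Wind(WMO)": "Wind", "Pres(WMO)": "Pres"})

# Attribute access (and TAB completion) now works.
print(toy.Wind.max())
```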

In [ ]:
 

5) Get the 10 largest rows in the dataset by Wind

In [ ]:
 

You will notice some names are repeated.

6) Group the data on Serial_Num and get the 10 largest hurricanes by Wind
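
The difference between ranking rows and ranking storms can be seen on a small made-up example. This is only a sketch with invented serial numbers, names, and wind speeds:

```python
import pandas as pd

toy = pd.DataFrame({
    "Serial_Num": ["S1", "S1", "S2", "S2", "S3"],
    "Name": ["ALPHA", "ALPHA", "BETA", "BETA", "GAMMA"],
    "Wind": [50.0, 120.0, 80.0, 95.0, 60.0],
})

# Row-level ranking: the same storm can show up more than once.
top_rows = toy.nlargest(3, "Wind")
print(top_rows)

# Storm-level ranking: reduce each storm to its peak wind first,
# then take the largest of those peaks.
peak_wind = toy.groupby("Serial_Num")["Wind"].max()
top_storms = peak_wind.nlargest(2)
print(top_storms)
```

Aggregating several columns at once (for example the first Name together with the maximum Wind) keeps the storm name available for later plots.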

In [ ]:
 

7) Make a bar chart of the wind speed of the 20 strongest-wind hurricanes

Use the name on the x-axis

In [ ]:
 

8) Plot the count of all datapoints by Basin

as a bar chart

In [ ]:
 

9) Plot the count of unique hurricanes by Basin

as a bar chart. (You will need to call groupby twice.)
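
The two-step idea can be sketched on made-up data: the first groupby reduces each storm to one row, and the second counts those rows per basin. The sketch assumes each storm is assigned to the basin of its first record, which is a simplification if a storm crosses basins.

```python
import pandas as pd

toy = pd.DataFrame({
    "Basin": ["NA", "NA", "NA", "EP", "EP"],
    "Serial_Num": ["S1", "S1", "S2", "S3", "S3"],
})

# Count of all datapoints per basin, for comparison.
points_per_basin = toy.groupby("Basin").size()

# First groupby: one entry per storm, holding the basin of its first record.
basin_of_storm = toy.groupby("Serial_Num")["Basin"].first()

# Second groupby: count how many storms fall in each basin.
storms_per_basin = basin_of_storm.groupby(basin_of_storm).size()

print(points_per_basin)
print(storms_per_basin)
```

Calling .plot(kind='bar') on either result draws the bar chart. On this toy data, toy.groupby("Basin")["Serial_Num"].nunique() gives the same storm counts and can serve as a cross-check.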

In [ ]:
 

10) Make a hexbin of the location of datapoints in Latitude and Longitude

In [ ]:
 

11) Find Hurricane Katrina (from 2005) and plot its track as a scatter plot

Use wind speed to color the points.
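
Coloring points by a third variable is done with the c argument of scatter. The sketch below uses a handful of made-up positions and wind speeds (not Katrina's actual track) and assumes the columns are named Longitude, Latitude, and Wind.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up track points, for illustration only.
track = pd.DataFrame({
    "Longitude": [-75.0, -78.0, -82.0, -86.0, -89.0],
    "Latitude": [23.0, 24.5, 25.0, 26.5, 29.0],
    "Wind": [30.0, 60.0, 90.0, 140.0, 110.0],
})

fig, ax = plt.subplots()
# c= maps each point's wind speed onto the colormap.
points = ax.scatter(track["Longitude"], track["Latitude"],
                    c=track["Wind"], cmap="viridis")
fig.colorbar(points, ax=ax, label="Wind")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```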

In [ ]:
 

12) Make time the index on your dataframe

In [ ]:
 

13) Plot the count of all datapoints per year as a timeseries

You should use resample
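
As a hint, resample needs a datetime index, and size counts the rows that fall in each bin. A minimal sketch on made-up timestamps:

```python
import pandas as pd

toy = pd.DataFrame({
    "ISO_time": pd.to_datetime(["2004-08-01", "2004-09-15", "2005-08-23",
                                "2005-08-24", "2005-10-01"]),
    "Wind": [40.0, 55.0, 30.0, 65.0, 80.0],
}).set_index("ISO_time")  # time becomes the index

# "YS" bins by calendar year, labelled with the first day of the year.
per_year = toy.resample("YS").size()
print(per_year)
```

Calling per_year.plot() then draws the counts as a timeseries.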

In [ ]:
 

14) Plot all tracks from the North Atlantic in 2005

You will probably have to iterate through a GroupBy object
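
Iterating over a GroupBy object yields one (key, sub-dataframe) pair per group. The sketch below collects the points of each made-up storm; in the actual task, the body of the loop could instead draw each group on a shared matplotlib axis.

```python
import pandas as pd

# Invented positions for two hypothetical storms.
toy = pd.DataFrame({
    "Serial_Num": ["S1", "S1", "S2", "S2", "S2"],
    "Longitude": [-60.0, -62.0, -70.0, -72.5, -75.0],
    "Latitude": [15.0, 16.0, 20.0, 22.0, 25.0],
})

tracks = {}
for serial, storm in toy.groupby("Serial_Num"):
    # storm is a dataframe holding only the rows of this one storm.
    tracks[serial] = list(zip(storm["Longitude"], storm["Latitude"]))

for serial, points in tracks.items():
    print(serial, points)
```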

In [ ]:
 

15) Create a filtered dataframe that contains only data since 1970 from the North Atlantic ("NA") Basin

Use this for the rest of the assignment

In [ ]:
 

16) Plot the number of datapoints per day

Make sure your figure is big enough to actually see the plot

In [ ]:
 

17) Calculate the climatology of datapoint counts as a function of dayofyear

Plot the mean and standard deviation on a single figure
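
A climatology here means a statistic computed over all years for each calendar day. One way to sketch it, on made-up daily counts covering two calendar days in three years, is to group by the dayofyear attribute of the datetime index:

```python
import pandas as pd

# Made-up daily counts, in chronological order.
daily = pd.Series(
    [2, 0, 4, 1, 6, 2],
    index=pd.to_datetime(["2001-09-01", "2001-09-02",
                          "2002-09-01", "2002-09-02",
                          "2003-09-01", "2003-09-02"]),
)

# Group by day of year (244 and 245 in these non-leap years)
# and compute both statistics in one pass.
clim = daily.groupby(daily.index.dayofyear).agg(["mean", "std"])
print(clim)
```

Because clim has one column per statistic, clim.plot() puts the mean and the standard deviation on a single figure.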

In [ ]:
 

18) Use transform to calculate the anomaly of daily counts from the climatology

Resample the anomaly timeseries at annual resolution and plot
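
Unlike an aggregation, transform returns a result with the same index as its input, so the climatological mean lines up with every daily value and can be subtracted directly. A minimal sketch on made-up daily counts:

```python
import pandas as pd

# Made-up daily counts, in chronological order.
daily = pd.Series(
    [2, 0, 4, 1, 6, 2],
    index=pd.to_datetime(["2001-09-01", "2001-09-02",
                          "2002-09-01", "2002-09-02",
                          "2003-09-01", "2003-09-02"]),
)

# transform("mean") broadcasts each day-of-year mean back onto the
# original dates, so the subtraction is element by element.
clim_mean = daily.groupby(daily.index.dayofyear).transform("mean")
anomaly = daily - clim_mean

# Average the anomaly within each calendar year.
annual = anomaly.resample("YS").mean()
print(annual)
```

Years with a positive annual value had more datapoints than the climatology for the same calendar days, and years with a negative value had fewer.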

In [ ]:
 

Which years stand out as having anomalous hurricane activity?

In [ ]: