Assignment 6: Pandas Groupby with Hurricane Data
Import pandas and matplotlib
Use the following code to download and unzip a CSV file of the NOAA IBTrACS hurricane dataset.
! wget ftp://eclipse.ncdc.noaa.gov/pub/ibtracs/v03r10/all/csv/Allstorms.ibtracs_all.v03r10.csv.gz
! gunzip Allstorms.ibtracs_all.v03r10.csv.gz
Examine the first few lines of the file.
Then use the following code to load as a pandas dataframe. Think about the options being used and why.
df = pd.read_csv('Allstorms.ibtracs_all.v03r10.csv',
                 parse_dates=['ISO_time'], usecols=range(12),
                 skiprows=[0, 2], na_values=[-999, 'NOT NAMED'])
df.head()
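To see what each option does, it can help to try the same call on a tiny in-memory file. The sketch below uses a made-up five-line CSV whose layout (a title line, then column names, then a units line) is an assumption chosen to mimic the real file; the `Extra` column and all values are invented for illustration.

```python
import io
import pandas as pd

# Toy stand-in for the real file (layout is an assumption for illustration):
# line 0 = title, line 1 = column names, line 2 = units, then data rows.
raw = """Toy title line
Serial_Num,Name,ISO_time,Wind,Extra
N/A,N/A,YYYY-MM-DD HH:MM:SS,kt,N/A
S1,NOT NAMED,2005-08-23 18:00:00,-999,x
S2,KATRINA,2005-08-28 18:00:00,150,y
"""

toy = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0, 2],                # drop the title line and the units line
    usecols=range(4),               # keep only the first four columns
    parse_dates=['ISO_time'],       # convert the time strings to datetimes
    na_values=[-999, 'NOT NAMED'],  # treat these placeholders as missing
)
print(toy)
print(toy.dtypes)
```

The placeholder values become NaN, so later aggregations such as a maximum or a count ignore them instead of treating -999 as a real wind speed.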
1) Get the unique values of the Basin, Sub_basin, and Nature columns
2) Fix these columns by eliminating the whitespace at the beginning of each
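One possible approach, sketched on a small invented dataframe rather than the real data, is to inspect the values with `unique()` and then clean them with the `.str.strip()` string accessor. The values below are made up for illustration.

```python
import pandas as pd

# Invented values with the same kind of leading whitespace
toy = pd.DataFrame({
    'Basin': [' NA', ' EP', ' NA'],
    'Nature': [' TS', ' ET', ' TS'],
})

print(toy['Basin'].unique())  # the leading spaces are visible here

# Strip surrounding whitespace from each text column
for col in ['Basin', 'Nature']:
    toy[col] = toy[col].str.strip()

basins = sorted(toy['Basin'].unique())
print(basins)
```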
3) Filter the dataframe to eliminate rows with no position information
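Filtering on missing position values can be sketched with `dropna` and its `subset` argument. The dataframe here is invented; it assumes the position columns are named Latitude and Longitude.

```python
import numpy as np
import pandas as pd

# Invented records: only storm 'A' has both coordinates
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Latitude': [25.0, np.nan, 30.0],
    'Longitude': [-80.0, -75.0, np.nan],
})

# Keep only the records where both position columns are present
located = toy.dropna(subset=['Latitude', 'Longitude'])
print(located)
```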
4) Rename the Wind(WMO) and Pres(WMO) columns to eliminate the parentheses
This makes them accessible to TAB completion.
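A minimal sketch of the renaming, using `DataFrame.rename` with a mapping from old to new names. The short names Wind and Pres match how later steps refer to the wind column; the values are invented.

```python
import pandas as pd

toy = pd.DataFrame({'Wind(WMO)': [100.0], 'Pres(WMO)': [950.0]})

# Map each old column name to a name that is a valid Python identifier
toy = toy.rename(columns={'Wind(WMO)': 'Wind', 'Pres(WMO)': 'Pres'})

# Attribute access (and therefore TAB completion) now works
print(toy.Wind)
```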
5) Get the 10 largest rows in the dataset by Wind
You will notice some names are repeated.
6) Group the data on Serial_Num and get the 10 largest hurricanes by Wind
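The difference between the largest rows and the largest storms can be illustrated on a toy table. This is only a sketch with invented storms: `nlargest` on the raw rows can return the same storm several times, while grouping on Serial_Num first reduces each storm to a single peak value.

```python
import pandas as pd

# Invented observations: storm S1 appears twice near the top
toy = pd.DataFrame({
    'Serial_Num': ['S1', 'S1', 'S2', 'S2', 'S3'],
    'Name': ['ALPHA', 'ALPHA', 'BETA', 'BETA', 'GAMMA'],
    'Wind': [130.0, 120.0, 60.0, 95.0, 140.0],
})

# Largest individual rows: names can repeat
top_rows = toy.nlargest(3, 'Wind')

# One row per storm: its name and its peak wind
peak = toy.groupby('Serial_Num').agg({'Name': 'first', 'Wind': 'max'})
top_storms = peak.nlargest(3, 'Wind')
print(top_storms)
```

A table shaped like `top_storms` could also feed a bar chart, for example with `top_storms.plot(kind='bar', x='Name', y='Wind')`.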
7) Make a bar chart of the wind speed of the 20 strongest-wind hurricanes
Use the name on the x-axis
8) Plot the count of all datapoints by Basin
as a bar chart
9) Plot the count of unique hurricanes by Basin
as a bar chart. (You will need to call groupby twice.)
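One way to read the "groupby twice" hint is sketched below on invented data: the first groupby reduces each storm to a single basin, and the second counts storms per basin. Taking the first recorded basin for each storm is an assumption made here for simplicity; a storm that crosses basins could be handled differently.

```python
import pandas as pd

# Invented observations: S1 has three records, S2 one, S3 two
toy = pd.DataFrame({
    'Serial_Num': ['S1', 'S1', 'S1', 'S2', 'S3', 'S3'],
    'Basin': ['NA', 'NA', 'NA', 'NA', 'EP', 'EP'],
})

# Count of all datapoints per basin
points_per_basin = toy.groupby('Basin').size()

# First groupby: one basin per storm (assumption: use the first record)
storm_basin = toy.groupby('Serial_Num')['Basin'].first()

# Second groupby: count storms per basin
storms_per_basin = storm_basin.groupby(storm_basin).size()
print(storms_per_basin)
```

Either result can be drawn with `.plot(kind='bar')`.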
10) Make a hexbin of the location of datapoints in Latitude and Longitude
11) Find Hurricane Katrina (from 2005) and plot its track as a scatter plot
Use wind speed to color the points.
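A rough sketch of the selection and the colored scatter plot follows. The coordinates and wind speeds are invented, not real track data, and the example assumes that names are stored in upper case and that the year is taken from an ISO_time column. The earlier KATRINA record shows why the name alone is not enough to pick out one storm.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented records, including a same-named storm from another year
toy = pd.DataFrame({
    'Name': ['KATRINA', 'KATRINA', 'KATRINA', 'KATRINA', 'RITA'],
    'ISO_time': pd.to_datetime(['1999-10-29', '2005-08-26', '2005-08-28',
                                '2005-08-29', '2005-09-21']),
    'Longitude': [-82.0, -81.0, -88.0, -89.6, -86.0],
    'Latitude': [12.0, 25.5, 26.0, 29.5, 24.5],
    'Wind': [35.0, 70.0, 150.0, 110.0, 155.0],
})

# Select on both name and year
is_katrina = (toy['Name'] == 'KATRINA') & (toy['ISO_time'].dt.year == 2005)
katrina = toy[is_katrina]

# Scatter plot of the track, colored by wind speed
fig, ax = plt.subplots()
points = ax.scatter(katrina['Longitude'], katrina['Latitude'],
                    c=katrina['Wind'], cmap='viridis')
fig.colorbar(points, ax=ax, label='Wind')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
```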
12) Make time the index on your dataframe
13) Plot the count of all datapoints per year as a timeseries
You should use resample
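As an illustration of the resample step, the sketch below sets a datetime index on an invented dataframe and counts records per calendar year. The frequency alias `'YS'` (year start) is one choice among several annual aliases, and the available aliases differ between pandas versions.

```python
import pandas as pd

# Invented timestamps: two records in 2004, three in 2005
times = pd.to_datetime(['2004-09-01', '2004-09-02', '2005-08-28',
                        '2005-08-29', '2005-10-20'])
toy = pd.DataFrame({'ISO_time': times,
                    'Wind': [50.0, 60.0, 150.0, 110.0, 160.0]})

# Make time the index, then count records in each calendar year
toy = toy.set_index('ISO_time')
per_year = toy.resample('YS').size()
print(per_year)
```

Calling `per_year.plot()` would then draw the counts as a timeseries.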
14) Plot all tracks from the North Atlantic in 2005
You will probably have to iterate through a GroupBy object
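Iterating over a GroupBy object yields `(key, sub-dataframe)` pairs, which makes it straightforward to draw one line per storm. The following is a sketch on invented coordinates, not the actual 2005 tracks.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented positions for two storms
toy = pd.DataFrame({
    'Serial_Num': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'Longitude': [-60.0, -65.0, -70.0, -40.0, -45.0],
    'Latitude': [15.0, 18.0, 22.0, 12.0, 14.0],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Each iteration gives the group key and that storm's rows
for serial, track in toy.groupby('Serial_Num'):
    ax.plot(track['Longitude'], track['Latitude'], label=serial)

ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.legend()
```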
15) Create a filtered dataframe that contains only data since 1970 from the North Atlantic ("NA") Basin
Use this for the rest of the assignment
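A filter of this kind can be sketched by combining two boolean conditions with `&`. The example assumes time is already the index and that "since 1970" includes 1970 itself; the records are invented.

```python
import pandas as pd

# Invented records with a datetime index
toy = pd.DataFrame(
    {'Basin': ['NA', 'EP', 'NA', 'NA'],
     'Wind': [80.0, 90.0, 100.0, 150.0]},
    index=pd.to_datetime(['1965-09-01', '1980-08-01',
                          '1985-09-10', '2005-08-28']),
)
toy.index.name = 'ISO_time'

# Both conditions must hold: North Atlantic basin and year >= 1970
recent_na = toy[(toy['Basin'] == 'NA') & (toy.index.year >= 1970)]
print(recent_na)
```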
16) Plot the number of datapoints per day
Make sure your figure is big enough to actually see the plot
17) Calculate the climatology of datapoint counts as a function of dayofyear
Plot the mean and standard deviation on a single figure
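A climatology in this sense is the average over all years for each day of the year. The sketch below assumes a series of daily counts already exists (here an invented six-value series named `daily`) and groups it by the `dayofyear` attribute of its datetime index.

```python
import pandas as pd

# Invented daily counts for the first two days of three years
days = pd.to_datetime(['2001-01-01', '2001-01-02',
                       '2002-01-01', '2002-01-02',
                       '2003-01-01', '2003-01-02'])
daily = pd.Series([2, 0, 4, 0, 6, 3], index=days, dtype=float)

# Group by day of year and compute the mean and standard deviation
clim = daily.groupby(daily.index.dayofyear).agg(['mean', 'std'])
print(clim)
```

Both columns can be drawn on one figure with `clim.plot()`.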
18) Use transform to calculate the anomaly of daily counts from the climatology
Resample the anomaly timeseries at annual resolution and plot
Which years stand out as having anomalous hurricane activity?
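The anomaly calculation can be sketched on the same kind of invented daily series. `transform` returns a result with the same index as the input, so subtracting each group's mean gives one anomaly per day. The `'YS'` alias for the annual resample and the use of the mean as the annual statistic are assumptions made for this example.

```python
import pandas as pd

# Invented daily counts for the first two days of three years
days = pd.to_datetime(['2001-01-01', '2001-01-02',
                       '2002-01-01', '2002-01-02',
                       '2003-01-01', '2003-01-02'])
daily = pd.Series([2, 0, 4, 0, 6, 3], index=days, dtype=float)

# Subtract the day-of-year mean from every value in that group
anomaly = daily.groupby(daily.index.dayofyear).transform(lambda x: x - x.mean())

# Average the daily anomalies within each calendar year
annual = anomaly.resample('YS').mean()
print(annual)
```

In this toy series the last year has a positive annual anomaly, which is the kind of signal the question about anomalous years asks you to look for in the real data.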