Assignment 7 - Pandas
Due Oct 19.
In this assignment we will use pandas to examine earthquake data.
Start by importing pandas, numpy and matplotlib.
I saved you some time by pre-downloading some data in .csv format from the USGS Earthquakes Database. It is located at:
You don't even need to download it. You can open it directly with Pandas.
1) Use Pandas' read_csv function directly on this URL to open it as a DataFrame
(Don't use any special options). Display the first few rows and the DataFrame info.
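A minimal sketch of this step. The assignment's URL does not appear in this copy, so the example reads a tiny hypothetical CSV from an io.StringIO buffer instead; pd.read_csv accepts the URL string in the same position. The column names (time, latitude, longitude, depth, mag, place, id, updated) are assumed to follow the usual USGS CSV layout, and every value is made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for the USGS CSV (made-up rows); pass the real URL instead.
CSV_TEXT = """time,latitude,longitude,depth,mag,place,id,updated
2017-09-02T10:15:30.000Z,34.05,-118.25,10.0,4.4,"10km NE of Sample City, California",eq001,2017-09-03T08:00:00.000Z
2017-09-05T22:40:12.500Z,15.00,-93.90,45.0,6.8,"20km S of Sample Town, Mexico",eq002,2017-09-06T09:30:00.000Z
2017-09-07T03:05:45.250Z,61.10,-150.00,35.0,2.6,"30km W of Sample Village, Alaska",eq003,2017-09-07T12:00:00.000Z
"""

# No special options: the quoted "place" field is handled, but dates stay as text.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.head())  # first few rows
df.info()         # column dtypes: time and updated are not datetimes yet
```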
You should have seen that the dates were not automatically parsed into datetime types.
2) Re-read the data in such a way that all date columns are identified as dates and the earthquake id is used as the index
Verify that this worked using the
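A sketch of the re-read using read_csv's parse_dates and index_col parameters, again on a small hypothetical CSV rather than the assignment URL. Which columns hold dates is an assumption here (time and updated, as in the usual USGS layout); check the real header before copying the list.

```python
import io

import pandas as pd

# Hypothetical stand-in for the USGS CSV (made-up rows).
CSV_TEXT = """time,latitude,longitude,depth,mag,place,id,updated
2017-09-02T10:15:30.000Z,34.05,-118.25,10.0,4.4,"10km NE of Sample City, California",eq001,2017-09-03T08:00:00.000Z
2017-09-05T22:40:12.500Z,15.00,-93.90,45.0,6.8,"20km S of Sample Town, Mexico",eq002,2017-09-06T09:30:00.000Z
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    parse_dates=["time", "updated"],  # assumed date columns, converted to datetimes
    index_col="id",                   # earthquake id becomes the row index
)

df.info()  # time and updated should now report a datetime dtype
```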
Use describe to get the basic statistics of all the columns
Note the highest and lowest magnitudes of earthquakes in the database.
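A short sketch of reading the extremes off describe(), shown on a hypothetical four-row DataFrame rather than the real catalogue. idxmin and idxmax are standard pandas Series methods that return the index label (here, the earthquake id) of the smallest and largest value.

```python
import pandas as pd

# Hypothetical toy catalogue indexed by earthquake id (made-up values).
df = pd.DataFrame(
    {"mag": [8.1, 7.1, 2.6, 4.4], "depth": [47.4, 48.0, 35.0, 10.0]},
    index=pd.Index(["eq001", "eq002", "eq003", "eq004"], name="id"),
)

stats = df.describe()  # count, mean, std, min, quartiles and max per numeric column
print(stats)

# The extreme magnitudes are the "min" and "max" rows of the mag column...
lowest, highest = stats.loc["min", "mag"], stats.loc["max", "mag"]
# ...and idxmin/idxmax identify which earthquakes they belong to.
print(lowest, df["mag"].idxmin())
print(highest, df["mag"].idxmax())
```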
It looks like US states are being treated differently from foreign countries. We would like to fix that.
How can we tell if a name is a US state name? There is a Python package for that: us (https://pypi.python.org/pypi/us).
This is a good time to try installing a new package using pip. Pip is the original Python package manager; it predates conda. Conda is more oriented towards data science, while pip is more general purpose. Many more packages are available through pip (from the Python Package Index, PyPI) than through conda. You can read a comparison of these two utilities if you want to know more.
8) Install the us package using pip, either directly from the notebook or from the command line
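The shell command is pip install us. From a notebook cell, the %pip install us magic runs the same command in the notebook's own environment.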
11) Write a function to check whether a string is a US state name.
This function should not be case sensitive. It should also strip leading and trailing whitespace from the test string.
13) Reindex this boolean series to match the DataFrame's index
Fill the null values with
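A sketch of the reindexing step on hypothetical toy data. The fill value is cut off in the text above; this example assumes False, so rows that were never tested count as "not a state". Passing fill_value=False to Series.reindex gives the same values as .reindex(df.index).fillna(False) but never creates NaN, so the result stays boolean. The small set of state names is illustrative only.

```python
import pandas as pd

# Hypothetical toy data: eq003 has no country value.
df = pd.DataFrame(
    {"country": ["California", "Mexico", None, " alaska"]},
    index=pd.Index(["eq001", "eq002", "eq003", "eq004"], name="id"),
)

# Illustrative stand-in for the previous step: the state test is applied only to
# non-null entries, so eq003 is missing from this boolean Series.
names = df["country"].dropna()
is_state = names.str.strip().str.lower().isin({"california", "alaska"})

# Align with the full index; missing labels get the (assumed) fill value False.
is_state_full = is_state.reindex(df.index, fill_value=False)
print(is_state_full)
```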
14) Now re-assign the country column in the DataFrame to USA if the row is a state.
Also add the state name as a new column.
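A sketch of both assignments using a boolean mask, on hypothetical toy data. It assumes the country column was derived from place in earlier steps. The state name is saved before country is overwritten; Series.where keeps values where the mask is True and leaves NaN elsewhere.

```python
import pandas as pd

# Hypothetical toy data; in the notebook the mask is the reindexed boolean Series.
df = pd.DataFrame(
    {"country": ["California", "Mexico", " Alaska", "Japan"]},
    index=pd.Index(["eq001", "eq002", "eq003", "eq004"], name="id"),
)
is_state = pd.Series([True, False, True, False], index=df.index)

# 1) Copy the (stripped) state name into a new column; non-state rows get NaN.
df["state"] = df["country"].str.strip().where(is_state)

# 2) Relabel every state row's country as USA.
df.loc[is_state, "country"] = "USA"
print(df)
```

Doing the two steps in the opposite order would copy "USA" into the state column, which is why the state name is captured first.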
17) Analyze the distribution of earthquake magnitudes in the filtered dataset (df_filt)
Make a histogram of earthquake count versus magnitude, using a logarithmic scale for the count axis. What sort of relationship do you see?
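```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
df_filt.hist('mag', bins=20, ax=ax)  # histogram of the 'mag' column in 20 bins
ax.set_yscale('log')  # logarithmic scale on the count (y) axis
```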
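One common way to quantify what the log-scale histogram shows is the Gutenberg-Richter relation, log10 N = a - b M. In its usual form N counts earthquakes with magnitude at least M, and per-bin counts fall off with the same slope, so the histogram on a log axis looks roughly like a straight line of slope -b. The sketch below fits that line with a plain least-squares numpy.polyfit on log10 of the bin counts; this is a simple swapped-in estimator, not necessarily what the course expects. It runs on synthetic magnitudes drawn from the exponential distribution the relation implies (b = 1 and a minimum magnitude of 2.5, both chosen arbitrarily), not on the assignment's data; in the notebook you would pass df_filt['mag']. The function name fit_b_value and the min_count cut-off are our own.

```python
import numpy as np


def fit_b_value(mags, bin_width=0.1, min_count=10):
    """Least-squares fit of log10(count per bin) = a - b * M; returns (a, b)."""
    mags = np.asarray(mags, dtype=float)
    edges = np.arange(mags.min(), mags.max() + bin_width, bin_width)
    counts, edges = np.histogram(mags, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Sparse bins are very noisy and log10(0) is undefined, so keep only
    # bins with at least min_count events.
    keep = counts >= min_count
    slope, intercept = np.polyfit(centers[keep], np.log10(counts[keep]), 1)
    return intercept, -slope


# Synthetic Gutenberg-Richter magnitudes: exponential above m_min with rate b * ln(10).
rng = np.random.default_rng(0)
b_true, m_min = 1.0, 2.5
mags = m_min + rng.exponential(scale=1 / (b_true * np.log(10)), size=100_000)

a_est, b_est = fit_b_value(mags)
print(f"estimated b = {b_est:.2f}")
```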
18) Visualize the locations of earthquakes by making a scatterplot of their latitude and longitude.
Use the filtered data. Color it by magnitude. Make it pretty.
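A sketch of one way to draw this map-like scatterplot with matplotlib's Axes.scatter, using the c argument to color points by magnitude and a colorbar as the legend. The four-row df_filt below is a hypothetical stand-in with made-up coordinates; the styling choices (viridis colormap, point size, transparency, full-globe axis limits) are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_filt (made-up coordinates and magnitudes).
df_filt = pd.DataFrame({
    "longitude": [-150.0, -118.2, -93.9, 142.4],
    "latitude": [61.1, 34.0, 15.0, 38.3],
    "mag": [2.6, 4.4, 8.1, 6.0],
})

fig, ax = plt.subplots(figsize=(10, 5))
# Longitude on x and latitude on y gives a map-like layout; c maps magnitude to color.
points = ax.scatter(
    df_filt["longitude"], df_filt["latitude"],
    c=df_filt["mag"], cmap="viridis", s=10, alpha=0.7,
)
fig.colorbar(points, ax=ax, label="magnitude")
ax.set_xlim(-180, 180)
ax.set_ylim(-90, 90)
ax.set_xlabel("longitude (degrees)")
ax.set_ylabel("latitude (degrees)")
ax.set_title("Earthquake locations colored by magnitude")
```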