Skip to main content

Plotting Geospatial Data with Python

Last month I wrote a blog post diving into the nitty gritty details about how to download a satellite image as a GeoTIFF file using Google’s Earth Engine API in Python.  I also shared my code in this GitHub repo so that you all can use it freely.  Over the past few weeks, I imagine that you’ve downloaded satellite imagery to your heart’s content, and now you may be wondering, “How can I plot my favorite geospatial dataset on top of this awesome satellite image using Python?”  This blog post will help get you started toward answering that question.

Let’s use the Chicago map that we downloaded previously along with some of the fantastic data available at the Chicago Data Portal.  I encourage you to peruse the different datasets there; many have geospatial information (e.g., latitude and longitude columns, POINT objects, POLYGON geometries, etc.).  For this post, I’m going to work with two data files:

The three primary Python packages that I’ll use to plot these data are matplotlib, rasterio and GeoPandas.  I will assume that you already have a decent handle on how to use matplotlib.  Rasterio is a great package for plotting and manipulating GeoTIFF files.  (I used rasterio in last month’s blog post as well.)  GeoPandas is the “go to” package in Python for working with geospatial data sets, and, as the name suggests, provides much of the functionality that pandas offers as well.  If you’ve never used GeoPandas, you may want to read this RCS blog post to get started.

As usual, I recommend working with Anaconda’s Python distribution and installing these packages using conda.  I found that to install GeoPandas, it is best to create a new conda environment and install everything through the conda-forge channel.  (Otherwise, you may end up with many conflicts that prohibit you from installing the packages you want.)  If you’ve followed the steps in my previous blog post to set up that environment, you should be all set to install GeoPandas (and pandas) in your environment (after activating it), using the command,

Otherwise, I recommend looking at the start of this Jupyter notebook from my previous blog post, under “Getting started”, and then running the install command within your new “geo_forge” environment.

For this demo, I created this Jupyter notebook in the same GitHub repository that I linked above, which goes through my steps to create a figure exploring this Chicago pothole data.  I’ll walk through the main steps in the code below, but first, here is the final figure in all its glory:

Chicago potholes plots

Figure 1: Number of potholes reported to 311 in Chicago between the years of 2011 and 2018.  (left) Choropleth plotted on top of a satellite image of Chicago showing the number of potholes reported over all dates for the given zip code.  (right) Histogram showing the number of potholes (over all zip codes) within one-month intervals.

Reading in the Files

I use pandas to read in the CSV file of reported potholes, GeoPandas to read in the Shapefile with the zip code boundaries, and rasterio to read in the GeoTIFF file:

Within the notebook, you can see that both the pandas and GeoPandas DataFrames can be displayed in a similar table format.  Let me draw your attention specifically to the “geometry” column in the Shapefile data (gdf), which contains POLYGON objects.  Each POLYGON object defines the shape of the given zip code on the map, using longitude and latitude coordinates (i.e., the boxes on top of the satellite image in the figure above).  These POLYGON geometries will allow us to plot the zip code boundaries on top of the satellite image.

Manipulating the Data

My goal is to plot each zip code region on top of the satellite image, colored by the total number of potholes reported within the respective zip code over all dates.   A figure like this is called a “choropleth”, which is a standard format for plotting geometric map data (that you’ve probably seen in many places online).

To create this choropleth, I first sum up all the pothole reports (from the CSV) file for each of the zip codes in the Shapefile.  Fortunately, both the CSV file and the Shapefile contain the zip codes, so I can match using those values instead of checking for overlapping geometries (which would be more computationally intensive):

In case you run into data that cannot be matched so easily, here is how you could get started.  Notice that the CSV file contains columns for latitude and longitude.  If you wanted to use those data to plot points on top of a map, or match to other geometries, you could use GeoPandas to create POINT objects with a command like:

GeoPandas (and the Shapely underbelly) allows you to check if a POINT lies within a POLYGON with an if statement like:

where point would be a POINT object, e.g., from the “geometry” column in the df pandas DataFrame (containing the CSV dataset, and with POINTs added as above), and poly would be a POLYGON object from the “geometry” column of the gdf GeoPandas DataFrame (containing the Shapefile dataset).

Since I am also interested in working with dates, I convert the “CREATION DATE” column in df into Python datetime objects.  This way I can more easily sort and plot the data later on.

Plotting the Data

I created two views of the data in the figure above.  On the left I created a choropleth plotted on top of the satellite image, and on the right, I created a standard histogram (plotted horizontally).  Let’s start with the left-hand plot.  First, I create the figure and plot the satellite image, contained within the “chicago” object, using rasterio:

Next, I overlay the choropleth;  GeoPandas makes this very simple, using code like:

I define a colormap to use (“Reds”).  Next, I define a normalization that maps the “Total Potholes” column in gdf to be between zero and one.  This allows me to grab a color from the colormap based on the number of potholes reported for a given zip code.  For additional emphasis, I also define a separate transparency (alpha) level for each zip code region, tied to the same column as the colormap.  Finally, I plot the choropleth with the last line of code above.

In the figure above, I also include labels for each of the zip codes and a colorbar.  (The zip code labels turned out to be more of a challenge than it’s probably work going through in detail here.) I encourage you to look at the code in the notebook to see how I accomplished this.

Finally, on the right side of the figure I include a histogram of the number of potholes reported over all zip codes each month.  I make use of numpy’s datetime64 utilities to define bins at one month intervals.  Below is a simplified example script (and see the notebook for the full code).

Notice in the code above how I create the bins using np.arange, but defining the interval using np.timedelta64.  This is the key step to creating the bins in date.  I also use this method in the notebook to create the axis labels at 3-month intervals and the dotted lines at 1-year intervals.

Finally, though I didn’t use it in the example above, if you wanted to plot points on top of a map, here is a code snippet to get you started:

I first define a GeoPandas DataFrame from df, specifying the columns that I want to use for the geometry.  (Note, this command is very similar to the example I provided above to add POINT geometries to df.)  Then I  plot the first 100 points on the figure.  You can also change the color and markersize for each point if you so desire.

You are now equipped to manipulate and plot geospatial data using GeoPandas.  I encourage you to go to the Chicago Data Portal and download some of your own data to plot on top of the Chicago GeoTIFF image and to also investigate for data sets and satellite images of your other favorite cities!