This post is part of a series of posts on online learning resources for data science and programming.
By Antonio Nanni, Data Science Research Consultant
GeoPandas is a powerful Python package for reading, writing, analyzing and visualizing geospatial data. It is currently the most popular tool for handling this kind of data in Python. It is based on the very popular package pandas, which integrates naturally with many well-known data-analysis packages. This means that GeoPandas makes it easy to use advanced and efficient techniques on geospatial data, while boosting the reproducibility and scalability of your analysis. It is therefore worth knowing GeoPandas if you are implementing a data analysis pipeline that involves geospatial data at any stage.
As with other guides in this series, we’re focusing on resources that can be accessed for free by members of the Northwestern community, and we’re focusing on resources other than full-length online courses.
Getting Started
10 Minutes to Pandas
You can’t use GeoPandas without using pandas (I will confess that GeoPandas is the reason why I approached pandas in the first place). GeoPandas is essentially an extension of the very popular pandas package for data handling. So, your very first step should be to brush up on your basic pandas. This tutorial covers a lot of ground and is updated regularly, since it is part of the official documentation. If you are completely new to pandas, check out my previous blog post for further advice.
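To see what "extension of pandas" means in practice, here is a minimal sketch. The data is a made-up toy example (three hypothetical points with invented population figures), not taken from any of the resources above.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical toy data: three labelled points (names and values are made up).
cities = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "population": [500, 1500, 2500]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
)

# A GeoDataFrame is a subclass of the pandas DataFrame ...
is_dataframe = isinstance(cities, pd.DataFrame)

# ... so ordinary pandas idioms such as boolean filtering work unchanged.
large = cities[cities["population"] > 1000]
print(is_dataframe, list(large["name"]))
```

The filtered result is still a GeoDataFrame, so the geometry column comes along for free.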
Introduction to GeoPandas video
Henrikki Tenkanen and Vuokko Heikinheimo
The Digital Geography Lab at the University of Helsinki regularly teaches a course named Automating GIS Processes. If you are looking for an introduction to geospatial analysis in Python, I cannot recommend this course enough. They have kindly made the course material freely available to everyone. The material is very clear, the code and data are openly available, the actual lectures are on YouTube, and there are even exercises (no solutions, though) at the end of the lectures. In its entirety, the course covers more than just GeoPandas. However, GeoPandas is the main focus of the first part of the course, and it is the basis for the second part. In particular, this lesson (Lesson 2) provides a very gentle and clear introduction to the most important concepts of GeoPandas: loading files, coordinate reference systems, and the properties of the GeoSeries and GeoDataFrame objects.
Mapping Tools
In the analysis of geospatial data, producing maps is often an important intermediate step or even the final output of the analysis. Maps are also very helpful for debugging or checking your results. For this reason, GeoPandas comes equipped with integrated functions to produce choropleth maps. Under the hood, these functions are built on the Python package Matplotlib. Therefore, it is relatively easy to customize the image further using the great flexibility of this popular plotting library. This official user guide provides a concise but effective introduction to the mapping tools native to GeoPandas. It does a good job of addressing the most common needs in a relatively small space and is an excellent starting point.
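As a rough illustration of the built-in choropleth support, here is a minimal sketch. The three squares and their "density" values are hypothetical, and the non-interactive Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical toy data: three adjacent squares with a made-up "density" value.
squares = gpd.GeoDataFrame(
    {"density": [10, 25, 40]},
    geometry=[
        Polygon([(i, 0), (i + 1, 0), (i + 1, 1), (i, 1)]) for i in range(3)
    ],
)

# column= selects the variable that drives the colours;
# legend=True adds a colour bar next to the map.
ax = squares.plot(column="density", cmap="viridis", legend=True, edgecolor="black")

# plot() returns a Matplotlib Axes, so the usual Matplotlib calls apply.
ax.set_title("Toy choropleth")
ax.set_axis_off()
```

From here, `ax.figure.savefig("map.png")` would write the figure to disk, and any other Matplotlib customization can be applied to `ax` in the same way.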
Getting Better
The Shapely User Manual
Sean Gillies
Shapely is the computational geometry package that GeoPandas uses under the hood to deal with geometry problems such as “What is the area of this polygon?” or “How distant are these two points?” Knowing Shapely will greatly extend your ability to merge, modify and understand maps. If you need to do some advanced operation on your map, Shapely is what you want to use. Unfortunately, to the best of my knowledge, there is no gentle tutorial for this package. However, in my own experience, the User Manual is so clear and well written that you do not really need a tutorial. Read the Introduction and the Geometric Objects part. After that, you can essentially use this manual as a reference for your own needs.
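The two questions above translate almost directly into Shapely calls. The following is a minimal sketch with hypothetical shapes in plain planar coordinates (no CRS involved), chosen so the numbers are easy to check by hand.

```python
from shapely.geometry import Point, Polygon

# Hypothetical shapes: a 2x2 square and a second 2x2 square shifted right by 1.
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
shifted = Polygon([(1, 0), (3, 0), (3, 2), (1, 2)])

# "What is the area of this polygon?"
area = square.area  # 2 * 2 = 4

# "How distant are these two points?" (a 3-4-5 right triangle)
distance = Point(0, 0).distance(Point(3, 4))

# Merging and modifying shapes: union and intersection of the two squares.
merged_area = square.union(shifted).area          # 3 * 2 = 6
overlap_area = square.intersection(shifted).area  # 1 * 2 = 2

# Spatial predicates answer yes/no questions about geometries.
inside = square.contains(Point(1, 1))

print(area, distance, merged_area, overlap_area, inside)
```

Every geometry stored in a GeoDataFrame is one of these Shapely objects, so the same methods are available row by row.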
Interactive Folium Map
Henrikki Tenkanen and Vuokko Heikinheimo
I mentioned already that the Automating GIS Processes course covers more ground than just GeoPandas. Indeed, the course also introduces Folium (and its interaction with GeoPandas) during Lesson 5. Folium is a mapping package for building interactive, mobile-friendly, cool-looking maps. It allows you to produce interactive visualizations of your geospatial data that you can embed in your own webpage. Unlike general-purpose packages for interactive plots (such as Bokeh), Folium is specifically engineered for mapping, with a very natural and powerful syntax geared towards creating multi-layer maps. If you plan to eventually publish your map on a website, I strongly recommend you take a look at this lecture and start your exploration of Folium.
Stuck?
If you have a question about GeoPandas, don’t know which resource to start with, or need to learn something not covered above, remember you can always request a free consultation with our data science consultants. We’re more than happy to answer questions and point you in the right direction.