Last Updated: January 2024
Pandas is one of the most popular Python packages for handling and analyzing data. It mostly deals with tabular data (such as CSV, or comma-separated values, files) and implements a wide range of functions to help users select, manipulate, and visualize data. Pandas’ powerful syntax can often reduce long for-loops or plotting commands to just one line of code. One of the best features of pandas is that it works well with other popular packages for data analysis, such as Matplotlib (for visualization) and scikit-learn (for machine learning). In the galaxy of Python data analysis packages, pandas is right at the center, and many other packages revolve around it. Therefore, if you are planning to build a Python pipeline for data analysis, it is extremely helpful to know how to use pandas and what it can do for you.
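To make the "one line instead of a loop" point concrete, here is a minimal sketch. The table, its column names (`city`, `temp_c`), and its values are invented for illustration; the example computes a per-group average first with an explicit loop and then with a single pandas expression.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "temp_c": [4.0, 6.0, 15.0, 17.0],
})

# Loop version: accumulate a running total and count per city.
totals, counts = {}, {}
for city, temp in zip(df["city"], df["temp_c"]):
    totals[city] = totals.get(city, 0.0) + temp
    counts[city] = counts.get(city, 0) + 1
loop_means = {city: totals[city] / counts[city] for city in totals}

# pandas version: the same per-city mean in one line.
pandas_means = df.groupby("city")["temp_c"].mean()

print(loop_means)
print(pandas_means)
```

Both versions should give the same averages; the pandas version returns a Series indexed by city rather than a plain dictionary.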
Getting Started
Python Pandas Tutorial
George McIntire, Brendan Martin, Lauren Washington
Very approachable introduction to Pandas, including more detail on what the package can do, how it fits within the data science toolkit, when you should use it, how to install it, and the most important features.
10 Minutes to Pandas
Official pandas documentation
This tutorial covers a lot of ground and does a great job of showcasing the power of Pandas. Although it is short, it will likely take you longer than 10 minutes. It is best for those who already have some background knowledge of Pandas (or NumPy) and are not starting from scratch. If you are completely new to pandas, try the previous or following tutorials first.
Pandas Tutorial
W3Schools
Very approachable and comprehensive tutorial on Pandas. Ranges from basic features – such as Pandas’ data structures and how to read different file formats – to cleaning data and plotting. W3Schools is generally a great place to learn about coding.
Intro to Pandas
Alfredo Deza, Noah Gift
If you prefer video tutorials, this one provides a one-hour introduction to Pandas, including how to load and export data, manipulate datasets, apply functions and transform columns, query for specific data, and perform common operations.
Getting Better
Indexing and selecting data
Official pandas documentation
Selection is where the powerful syntax of pandas really shines. Chances are you can select the rows and columns you need with just one line of code. For those of you with R experience, the Pandas selection syntax is close to how rows and columns are selected in R data frames. However, the syntax also has its own subtleties. The documentation is really good at showing how to index data frames in Pandas. Take the time to read through it once and then use it as a reference whenever you have doubts. The official documentation is also the best source for up-to-date information as new versions of pandas are released.
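As a rough illustration of what one-line selection looks like, the sketch below uses a small made-up table (the `species` and `weight_kg` columns and the row labels are assumptions for the example, not taken from the documentation). It shows the most common selection styles: by column name, by row label with `.loc`, by position with `.iloc`, and by a boolean condition.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame(
    {"species": ["cat", "dog", "cat"], "weight_kg": [4.2, 12.5, 3.8]},
    index=["a", "b", "c"],
)

weights = df["weight_kg"]            # one column, returned as a Series
row_b = df.loc["b"]                  # one row, selected by its label
first_two = df.iloc[:2]              # first two rows, selected by position
heavy = df[df["weight_kg"] > 4.0]    # rows where a condition holds

# Rows and columns together: weights of the cats only.
cat_weights = df.loc[df["species"] == "cat", "weight_kg"]

print(heavy)
print(cat_weights)
```

One of the subtleties worth noticing is that `.loc` works with labels while `.iloc` works with integer positions, so the two can return different rows when the index is not a simple 0, 1, 2, ... range.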
Working with Text Data
Official pandas documentation
If you work with a lot of text or natural language data, you will certainly want to take a look at how to handle it with Pandas. Pandas provides a dedicated data type for strings, along with string methods that operate on a whole column at once.
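A short sketch of working with text in pandas, assuming a made-up column of names (the values are invented for illustration). It requests the `"string"` data type explicitly and uses the `.str` accessor to clean and search every entry without writing a loop.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration; None marks a missing entry.
names = pd.Series(["  Ada Lovelace", "grace hopper ", None], dtype="string")

# Strip surrounding whitespace and normalize capitalization for every entry.
cleaned = names.str.strip().str.title()

# Test every entry for a substring; the missing entry is treated as no match here.
has_hopper = cleaned.str.contains("Hopper").fillna(False)

print(cleaned)
print(has_hopper)
```

Missing values stay missing through the `.str` methods, which is usually what you want when cleaning real text columns.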
Quickstart tutorial to NumPy
Official NumPy documentation
NumPy, the core package for handling numerical data in Python, is the backbone of Pandas and many other data analysis packages. For this reason, it is very easy to use NumPy with data loaded with Pandas. Knowing some NumPy will certainly enhance your understanding of Pandas and, most likely, other packages that you are currently using. It will allow you to easily (and efficiently!) implement mathematical functions above and beyond what Pandas already offers. This tutorial is a very good entry point to NumPy. It provides a good overview of the way NumPy handles data and how you can manipulate NumPy data in a vectorized form – without using loops.
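The sketch below illustrates how NumPy and pandas can be combined, using a made-up single-column table (the column name `x` and its values are assumptions for the example). It applies a NumPy function directly to a pandas column and then standardizes the values with vectorized array arithmetic, with no explicit loop.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({"x": [1.0, 10.0, 100.0]})

# NumPy functions accept a pandas column directly and return a pandas Series.
df["log_x"] = np.log10(df["x"])

# Convert the column to a NumPy array and standardize it in vectorized form.
arr = df["x"].to_numpy()
scaled = (arr - arr.mean()) / arr.std()

print(df)
print(scaled)
```

Note that `arr.std()` uses NumPy's default (population) standard deviation, whereas the pandas `Series.std()` method defaults to the sample standard deviation, so the two can give slightly different numbers on the same data.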
Python for Data Analysis (2nd edition)
Wes McKinney
This book wraps it all up. It is really a great resource for data analysis in Python in general. It is especially good in the chapters about Pandas (mostly chapters 5 – 8, 10, and 12) since it was written by the original developer of Pandas himself. The chapters are mostly stand-alone, so you don’t have to go through the entire book to learn about particular features of Pandas or aspects of data analysis. The exposition is detailed and extensive (a lot of ground covered!), but still very accessible. This is highly recommended as a reference book.
Pandas Exercises
Guilherme Samora
The best way to learn Pandas is by using it yourself! This GitHub repo provides a lot of good exercises to practice and learn Pandas. Have fun!