Recently I went looking online for a satellite image of Chicago to use in an RCS workshop. I figured that a simple Google search would easily return a place to download a ready-made image of the city with latitude and longitude embedded that I could use in my favorite geographic information system (GIS) application. There are plenty of satellites in the sky mapping the Earth surface; many of these make their images available for free, and of course we are all familiar with mapping utilities like Google maps. So, since these images exist online, I should be able to easily download a free tile image, perhaps a GeoTIFF image, centered on a major city like Chicago, right? Wrong.
This is not the first time I’ve run into this issue. In the past I’ve found various ad hoc workarounds. This time I decided to write a generalizable Python script to download satellite images, and I want to share that code with you here. You can find the full code on my GitHub repo, and I’ll break it down section by section below.
Google’s Earth Engine
Google has developed a fantastic (and free) tool to access imagery from a variety of Earth-facing satellites, called Earth Engine. There is a Python API (and a JavaScript API) to access the service, which I will use here. In order to use the API you need to sign up.
The online documentation for Earth Engine is a bit limited, especially for Python. But I’ve found that the JavaScript tutorials can also be helpful since the syntax is very similar to Python. You can find a list of available datasets here. Within each dataset’s description there is a code snippet (typically in JavaScript) that shows how to access the dataset. Some tutorials can be found here, and here is a helpful introduction to the API in Python.
My goal with this project is to download a full color (RGB) satellite image at decent resolution and save it to my computer in GeoTIFF format so that I can use it with various other mapping applications.
After trying a few different data sources for optical satellite images, it appears that ESA’s Sentinal S2 Surface Reflectance dataset is the best option for my typical needs (at least at the time of writing). It provides RGB images at 10m resolution.
Getting Started
I am admittedly not an expert with GIS data. (I am an astronomer by training, and we generally look away from Earth!) So, as we often do, I started by scouring the internet for code snippets and advice on how to accomplish this goal. I ended up piecing together methods from various tutorials with a LOT of trial and error to make this work.
The first thing you need to do is to install the Google Earth Engine API and rasterio. I use Anaconda’s Python distribution, and I created a new environment to work in (which I strongly recommend). Here are the install commands that I used:
The Main Python Script
I created a Python script titled EarthEngineToGeoTIFF.py, which contains a single function called getSentinalS2SRImage
. This function downloads an RGB image from the Sentinal S2 Surface Reflectance satellite at the given central (lon, lat) coordinates and with a desired size (sze, in degrees), and saves the image to a GeoTIFF file with a name specified by the filename input. (I also provide a few additional input options, which are explained within the comments of the code.) At the end of this post, I also walk through a simple script in a Jupyter notebook to initiate the authentication (with Google), run this Python script and download the image.
Note that I have chosen to use the Sentinal satellite images, but there are MANY other satellites and datasets that you could choose from. In principle, you should be able to make minimal modifications to the code I created in order to download data from your favorite satellite.
Creating the Earth Engine Image object
Through Earth Engine, you access images from a given satellite by first defining an ImageCollection
. This ImageCollection
can have many filter options to narrow down to the exact images that you desire. I filter by location (by passing a geometry) and by date (which can be specified on the input to my function). Then I sort the collection by the percentage of cloud cover and take the first one (i.e., the image with the least amount of cloud cover). Finally, I add the longitude and latitude pixel coordinates to the collection.
Generating the individual GeoTIFF files for each band
I want an RGB image, so I need three bands from the satellite. Looking at the documentation for this satellite, shows that I want bands B4, B3 and B2. For each of these bands, I have Earth Engine export an image to Google Drive, and then I download the image locally to my computer. (Note that the files will be saved to your Google Drive storage; you may want to delete these later, to save space.) Each download is a .zip (compressed) file, and the code unpacks it into the desired .tif file.
Combining into a single RGB GeoTIFF image
I use rasterio to create the combined RGB GeoTIFF image. First, I open each of the three images that I downloaded in the previous step. The pixel values from the satellite images are intensities and can range from close to zero up to thousands (or perhaps higher; note that I clip these using the vmin and vmax inputs in my code). However, normal color images are expected to have pixel values between 0 and 255. So, I will need to rescale the pixel values for each image. I clip the values to be between the 2nd and 98th percentiles of the intensities for the combined pixel array (including all three bands), and then rescale to be between 0 and 255. The clipping replaces any possible bad pixels from the images so that they don’t affect the overall image quality. Then, I create a new image and write the red (B4), green (B3) and blue (B2) bands to the file. Finally, I remove the intermediate files from my computer.
Limitations
Though this script appears to work well during my testing, there are certainly limitations to the functionality. There is almost definitely a limit to the size of the image that you can download. In my tests, I didn’t exceed 1 degree. There will also likely be regions on Earth and/or dates where there simply aren’t images available from this satellite. If you try to use the script and see errors, I recommend decreasing the image size you are requesting (maybe try <0.1 degrees) and then trying a different date range.
Testing in a Jupyter Notebook
I also created a simple Jupyter notebook to test the code by downloading an image of Chicago, and I’ll go through the steps here.
After importing the necessary libraries, you need to authenticate with Earth Engine. Running the following code in the notebook generates a url that you need to follow in order to get an access token to copy and paste into your notebook. This access token allows you to use Earth Engine (assuming that you have already registered, see above) while the notebook kernel is active.
Next, I define the location and image size that I want (for Chicago) and call my script.
Finally, I use rasterio to open and plot the image.
The final GeoTIFF image is available on my GitHub repo here. (Note that you will need to open this in a GIS program to view it; otherwise it may look like a big black square. Here is a PNG version that will open in standard image software.) Now you can use this image with standard mapping packages, for instance to investigate location-based datasets from the Chicago Data Portal. Enjoy!