Dead Pixels and Georeferencing
Mar 25, 2017
When working with scanned aerial imagery, I often come across a portion of the image that isn’t map-friendly, typically the border around the scanned image. This post steps through how to remove this unwanted area by isolating pixels based on their value. I assume you are familiar with QGIS, GDAL, and the command line.
Download and view imagery
The University of Toronto Map & Data Library holds a great collection of aerial imagery from all over Canada. We will work with some imagery covering Toronto from 1947.
Pick any four connected scenes to download. I chose these four:
Open the .jpg file for each scene in QGIS. Depending on which files you opened, and in which order, you will see something similar to this:
The problem is that the border has been scanned in along with the image. If we try to merge these images, the border will be included in the merge and will cover part of the neighbouring images, similar to how the images overlap when rendered in QGIS:
Almost always, the border is a different colour from the content of the image. A useful way to approach removing the border is to think of the image as a collection of individual pixels, each of which holds a value. Different pixel values render as different colours. We can get a sense of the image’s values by viewing its metadata and raster statistics.
In the Metadata tab of the layer properties in QGIS, we can find the Properties section. Let’s break down two pieces of information there.
No Data
Any pixel in the image whose value matches the no data value will be transparent. We can set the no data value on an image to make unwanted pixels transparent. Currently the no data value is 0.
Data Type
The data type tells us the range of values we can expect in our data. In the GIS industry you may see this referred to as radiometric resolution. These images are 8-bit unsigned integer, which means the values range from 0 to 255.
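For readers who prefer the command line, the same properties can be checked outside QGIS. This is a sketch under assumptions: `unsigned_range` and `band_summary` are hypothetical helper names, and `band_summary` assumes GDAL’s `gdalinfo` is on your PATH. The first helper shows where 0 to 255 comes from: an n-bit unsigned integer has 2^n possible values, from 0 to 2^n - 1.

```shell
#!/bin/bash
# Range of values an unsigned integer band can hold for a given bit depth
# (the "radiometric resolution"): 0 to 2^bits - 1.
unsigned_range() {
    local bits=$1
    echo "0 $(( (1 << bits) - 1 ))"
}

# Print the band type, NoData value and min/max statistics that QGIS shows
# in its Metadata tab. Requires GDAL; pass in one of your downloaded files.
band_summary() {
    gdalinfo -stats "$1" | grep -E 'Type=|NoData Value|Minimum='
}
```

For example, `unsigned_range 8` prints `0 255`, and `band_summary scene.jpg` (substitute your own file name) should show a `Type=Byte` line for these images. A `NoData Value=` line appears only when one is set, and `gdalinfo -stats` may save the computed statistics to a `.aux.xml` file next to the image.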
Identifying unwanted pixels
Select an image in the table of contents and activate the Identify Features tool. Click around the image to get a sense of its pixel values, and click on various parts of the border to find the range of values we want to remove.
The border consists of pixels with values of 240 and higher. You may notice that values of 240 and above also occur within the image itself, so a few bright pixels inside the image will be removed as well. There are also outlier values within the border, such as scale labels or slightly darker areas from an uneven scan. Unfortunately these artifacts will remain in the final image, but they are much less noticeable than the full border because the map background can simply be set to white. They can also add a rustic look to the imagery, sometimes to the benefit of the map.
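The Identify Features tool has a command-line counterpart in GDAL’s `gdallocationinfo`, which reports the value of a pixel at a given column and row. A minimal sketch, assuming GDAL is installed: `is_border` and `classify_pixel` are hypothetical helper names, and the 240 threshold is the one chosen above.

```shell
#!/bin/bash
# Border threshold picked by clicking around the border with Identify Features.
threshold=240

# Succeeds (exit status 0) when a pixel value falls in the border range.
is_border() {
    (( $1 >= threshold ))
}

# Query one pixel of band 1 with GDAL and label it as border or image.
# Arguments: image file, pixel column, pixel row (from the top-left corner).
classify_pixel() {
    local value
    value=$(gdallocationinfo -valonly -b 1 "$1" "$2" "$3")
    if is_border "$value"; then
        echo "$value border"
    else
        echo "$value image"
    fi
}
```

For example, `classify_pixel scene.jpg 10 10` samples a pixel near the top-left corner, which on a scan with a border is likely to fall in the border area. Sampling a handful of points this way is a quick check that the threshold separates border from image before processing everything.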
Raster Calculator
With the raster calculator we can change the values of pixels that meet a condition. From investigating the image I chose 240 as the threshold. We will set all values of 240 and above to 0:
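The expression `A*(A<240)`, which the batch script below also uses, works because the comparison evaluates to 1 or 0. In the QGIS raster calculator the equivalent would be written with a layer reference, for example `("scene@1" < 240) * "scene@1"`, where `scene` stands in for your own layer name. The small Bash function here (a hypothetical helper, `apply_threshold`) applies the same arithmetic to individual values so you can see which ones survive.

```shell
#!/bin/bash
# Mimic the expression A*(A<240) for a single 8-bit pixel value.
# (a < 240) evaluates to 1 for image pixels and 0 for border pixels,
# so the product keeps image values and turns border values into 0.
apply_threshold() {
    local a=$1
    echo $(( a * (a < 240) ))
}

for v in 0 120 239 240 255; do
    echo "$v -> $(apply_threshold "$v")"
done
```

Running it shows values up to 239 unchanged while 240 and 255 become 0. Pixels that were already 0 also stay 0, so they become transparent too once 0 is the no data value.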
After we run the raster calculator:
And set the no data value to 0 in the layer properties:
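The no data step can also be done from the command line. This sketch assumes GDAL’s `gdal_edit.py` and `gdal_translate` are available; `set_nodata_zero` and `nodata_copy_name` are hypothetical helper names. GeoTIFF files (the QGIS raster calculator writes GeoTIFF by default) are updated in place, while other formats are copied to a new GeoTIFF.

```shell
#!/bin/bash
# Name of the GeoTIFF copy written for formats that are not edited in place.
nodata_copy_name() {
    echo "${1%.*}_nodata.tif"
}

# Set 0 as the NoData value on a raster (requires GDAL).
set_nodata_zero() {
    local src=$1
    case "$src" in
        *.tif|*.tiff)
            # GeoTIFF: update the file in place.
            gdal_edit.py -a_nodata 0 "$src"
            ;;
        *)
            # Other formats (such as the original .jpg scans): write a GeoTIFF copy.
            gdal_translate -of GTiff -a_nodata 0 "$src" "$(nodata_copy_name "$src")"
            ;;
    esac
}
```

For example, `set_nodata_zero scene_calc.tif` (a placeholder for the raster calculator’s output) marks 0 as no data without rewriting the pixels.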
And finally an improvement on our original image:
This process can be repeated for the other images, which can then be merged into one image. At this point, though, you will want to automate as much as possible: depending on the number of images, processing each one by hand with the raster calculator can take an unreasonable amount of time.
Batch processing with GDAL
Rather than processing these images manually, we can use GDAL and the command line to batch process all of them.
Here is an example of a bash script that uses GDAL to apply the above process to all .jpg files within a directory.
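#!/bin/bash
# Results are written to the home directory with a "nodata_" prefix.
# $HOME is used because a quoted "~" is not expanded by the shell.
output_dir="$HOME"
for f in "$HOME"/*.jpg; do
    name=$(basename "$f" .jpg)
    # Keep values below 240, set the border (240 and above) to 0, and flag 0 as NoData.
    # Output is written as GeoTIFF (.tif), a format gdal_calc.py can create with a NoData value.
    gdal_calc.py -A "$f" --A_band=1 --outfile="$output_dir/nodata_${name}.tif" --calc="A*(A<240)" --NoDataValue=0
done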
The variable ‘f’ is shorthand for ‘file’. We loop through every .jpg file and run gdal_calc.py on it. -A is the flag for the input file name, and --A_band selects which band of that file to use (here we explicitly use band 1). The --calc flag holds our raster calculator expression, which keeps values below 240 and sets everything else to 0, and --NoDataValue sets the no data value to 0.
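Before running the loop over a large collection, it can help to preview the commands it will execute. The sketch below is a hypothetical `preview_batch` helper that prints each `gdal_calc.py` call without running it, taking the source and output directories as arguments instead of hard-coding the home directory.

```shell
#!/bin/bash
# Print (but do not run) the gdal_calc.py command for every .jpg in a directory.
# Arguments: source directory, output directory.
preview_batch() {
    local src_dir=$1 out_dir=$2 f name
    for f in "$src_dir"/*.jpg; do
        [ -e "$f" ] || continue   # skip the literal pattern when no .jpg files exist
        name=$(basename "$f" .jpg)
        echo gdal_calc.py -A "$f" --A_band=1 --outfile="$out_dir/nodata_$name.tif" \
            --calc="A*(A<240)" --NoDataValue=0
    done
}
```

For example, `preview_batch ~/scenes ~/processed` lists one command per .jpg file. Once the output names look right, removing the `echo` makes the function run the commands instead.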
This is the same process we applied with the raster calculator, with GDAL substituted for our UI-based approach in QGIS. The benefit of processing the files this way is that we can process the entire repository of imagery with no additional manual work. For an example of how a stitched image covering a large area might look after this process, see the 1954 Toronto map.
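The final step of merging the processed scenes into one stitched image can also stay on the command line with GDAL’s `gdal_merge.py`. A sketch under assumptions: the original scenes are georeferenced (as their placement in QGIS suggests), gdal_calc.py carries that georeferencing into its output, and `merge_scenes` is a hypothetical helper name. The `-n 0` option tells gdal_merge.py to ignore input pixels with value 0, so the removed border of one scene does not paint over its neighbour.

```shell
#!/bin/bash
# Mosaic processed scenes into a single GeoTIFF (requires GDAL).
# Arguments: output file, then the processed scenes to merge.
merge_scenes() {
    local out=$1
    shift
    # -n 0: skip input pixels equal to 0 (the removed border)
    # -a_nodata 0: record 0 as the NoData value of the mosaic
    gdal_merge.py -o "$out" -of GTiff -n 0 -a_nodata 0 "$@"
}
```

For example, running `merge_scenes toronto_1947.tif nodata_*` from the directory holding the processed scenes merges all of them. Where scenes overlap, gdal_merge.py copies later files over earlier ones, so the order of the inputs matters.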