First Concepts in GeoInfo
NOTE: The Yves’ Cookbook on GitHub provides many useful code examples for geomatics. This YouTube channel also offers plenty of resources.
Installing Your Libraries
The commands to install libraries are always the same. You can either use PIP:
pip install your_lib
or CONDA:
conda install your_lib
To get started with geomatics development, there are a few essential libraries:
- OGR/GDAL: A long-standing C++ library used in geomatics. It allows manipulation of both vector and raster data. Installation can be tricky. It’s often easier to use
rasterio
andfiona
, which are Python interfaces to GDAL/OGR. - rasterio: A simpler interface to GDAL for manipulating raster data.
- fiona: The interface to OGR for vector data. Its consistent conventions with
rasterio
make it very user-friendly. - pandas/geopandas: For handling formatted data (e.g., CSVs). These libraries allow fast and simple data manipulation and
geopandas
can also read vector data formats, potentially replacingfiona
. - matplotlib: Essential for producing charts and plots, a complete replacement for Excel charting.
Your development environment should include these libraries. Depending on your project, additional more specific ones may be needed.
Reading Spatial Data
Vector Data
Common formats Python supports include Shapefile, GeoJSON, GPKG, CSV, etc.
Using fiona
, you can read spatial data like a Shapefile:
import fiona
# Open Shapefile
with fiona.open('file.shp', 'r') as src:
for feature in src:
print(feature)
Or a GeoJSON file:
import fiona
# Open GeoJSON file
with fiona.open('path/to/your/file.geojson', 'r') as src:
for feature in src:
print(feature)
An alternative to fiona
is geopandas
:
import geopandas as gpd
# Read the Shapefile
gdf = gpd.read_file('path/to/your/file.shp')
# Display first 5 rows
print(gdf.head())
Raster Data
To read raster files in Python, use the rasterio
library:
import rasterio
import matplotlib.pyplot as plt
# Open raster file
with rasterio.open('my_file.tif') as src:
raster = src.read(1)
# Display the raster
plt.imshow(raster, cmap='gray')
plt.show()
Raster files may contain multiple bands, each representing a different variable. src.read()
can read multiple bands at once. The loaded images are NumPy arrays, enabling full use of NumPy operations.
Writing Spatial Data
Vector Data
Suppose you have a list of features called features
that you want to write to a Shapefile named my_file.shp
:
import fiona
schema = {
'geometry': 'Point',
'properties': {
'name': 'str',
'value': 'int'
}
}
with fiona.open('my_file.shp', 'w', 'ESRI Shapefile', schema) as dst:
for feature in features:
dst.write(feature)
The schema
defines the geometry type and attributes. Here, it’s a Point geometry with two attributes: a string (name
) and an integer (value
).
Raster Data
Suppose you have a NumPy matrix called matrix
to write to a GeoTIFF file named my_file.tif
:
import rasterio
height, width = matrix.shape
metadata = {
'count': 1,
'crs': 'EPSG:4326',
'dtype': 'float32',
'driver': 'GTiff',
'height': height,
'width': width,
'transform': rasterio.transform.from_bounds(0, 0, width, height, width, height),
'nodata': -9999
}
with rasterio.open('my_file.tif', 'w', **metadata) as dst:
dst.write(matrix, 1)
The metadata
dictionary defines the raster’s properties such as number of bands, spatial reference, data type, and geotransform. from_bounds()
sets the spatial extent and resolution.
Database Interactions
To interact with a spatial database in Python, use pandas
or geopandas
for importing, manipulating, and exporting tabular data. You’ll also need a connector library like psycopg2
(PostgreSQL) or pyodbc
(SQL Server).
To load spatial data from a PostgreSQL table my_spatial_data
into a DataFrame
:
import pandas as pd
import psycopg2
conn = psycopg2.connect("dbname=my_database user=my_user password=my_password")
df = pd.read_sql_query("SELECT * FROM my_spatial_data", conn)
conn.close()
For spatial databases, use geopandas
:
import geopandas as gpd
import psycopg2
conn = psycopg2.connect("dbname=my_database user=my_user password=my_password")
gdf = gpd.read_postgis("SELECT * FROM my_spatial_data", conn, geom_col="geom")
conn.close()
This returns a GeoDataFrame
that includes both attributes and geometries, allowing for full geospatial analysis and export.
Useful Videos
Here is a series of videos covering the concepts in this page:
- Various ways to read vector and raster data with Python 3
- Reading and processing images with Numpy
- Basic operations with Numpy
- NDVI calculation using Numpy, GDAL, and Matplotlib
- Read CSV data and create a matrix with Numpy, then display with Matplotlib
- Using Kaggle to analyze data (Pandas, Matplotlib, and folium)
- Getting started with Pandas and Anaconda (Spyder)
- Analyzing multiple CSV files with Pandas (Bixi data example)
- Consolidating CSV files with Pandas
- Using Geopandas
- How to write a GeoTIFF or Shapefile with Rasterio and Fiona in a few lines