Cloud-based raster processors out there

Hi all,

Just trying to get my head around some of the new big raster processors out there, in addition of course to Google Earth Engine. Bear with me while I sort through these. Thanks to raster sleuth Stefania Di Tomasso for helping.

1. Geotrellis (https://geotrellis.io/)

GeoTrellis is a Scala-based raster processing engine and one of the first geospatial libraries built on Spark. It is designed to process big datasets: users can interact with geospatial data and see results in real time in an interactive web application (for regional or statewide datasets), while for larger raster datasets (e.g. the US NED) GeoTrellis performs fast batch processing, using Akka clustering to distribute data across the cluster. GeoTrellis was designed to solve three core problems, with a focus on raster processing:

  • Creating scalable, high performance geoprocessing web services;
  • Creating distributed geoprocessing services that can act on large data sets; and
  • Parallelizing geoprocessing operations to take full advantage of multi-core architecture.

Features:

  • GeoTrellis is designed to help a developer create simple, standard REST services that return the results of geoprocessing models.
  • GeoTrellis will automatically parallelize and optimize your geoprocessing models where possible.
  • In the spirit of Scala's object-functional style, it is easy both to create new operations and to compose them with existing ones (see the sketch below).
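
GeoTrellis itself is Scala, so rather than guess at its API, here is a tiny Python stand-in for the composition idea in that last bullet: small raster operations written as plain functions over a tile (a NumPy array here) and chained into a new operation. The function names and numbers are made up for illustration; the GeoPySpark bindings in the next item are the real way to drive GeoTrellis from Python.

    import numpy as np

    # Two small raster "operations" over a tile (a 2-D array stands in for a tile).
    def rescale(tile, factor):
        return tile * factor

    def threshold(tile, cutoff):
        return (tile > cutoff).astype(np.uint8)

    # Composing them yields a new operation, in the spirit of the last bullet above.
    def high_ground_mask(tile):
        return threshold(rescale(tile, 0.1), cutoff=5.0)

    dem = np.array([[12.0, 48.0], [55.0, 80.0]])   # toy elevations in metres
    print(high_ground_mask(dem))                   # cells above 50 m -> 1, others -> 0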

2. GeoPySpark - in short, GeoTrellis for the Python community

GeoPySpark provides Python bindings for working with geospatial data in PySpark (the Python API for Spark). Spark is an open-source processing engine originally developed at UC Berkeley in 2009. GeoPySpark makes GeoTrellis (https://geotrellis.io/) accessible to the Python community: Scala has a steep learning curve, so the GeoTrellis team created this Python library.
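
Here is a minimal sketch of what getting started might look like, assuming the geopyspark package roughly as documented for its 0.x releases; the exact call names (geopyspark_conf, geotiff.get, tile_to_layout) and the file path are assumptions that may not match your version, so check the docs.

    import geopyspark as gps
    from pyspark import SparkContext

    # Configure Spark for a local run; on a cluster the master URL would differ.
    conf = gps.geopyspark_conf(master="local[*]", appName="geopyspark-sketch")
    sc = SparkContext(conf=conf)

    # Read a GeoTIFF (path is a placeholder) into a distributed raster layer...
    raster_layer = gps.geotiff.get(layer_type=gps.LayerType.SPATIAL,
                                   uri="file:///tmp/elevation.tif")

    # ...and cut it into tiles on a web-mercator layout for distributed processing.
    tiled = raster_layer.tile_to_layout(layout=gps.GlobalLayout(), target_crs=3857)

    # Simple map algebra: convert metres to feet across the whole layer.
    feet = tiled * 3.28084

The appeal is that the tiled layer reads like a single raster in your code while Spark spreads the tiles across the cluster.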

3. RasterFoundry (https://www.rasterfoundry.com/)

They say: "We help you find, combine and analyze earth imagery at any scale, and share it on the web." And "Whether you’re working with data already accessible through our platform or uploading your own, we do the heavy lifting to make processing your imagery go quickly no matter the scale."

Key RasterFoundry workflow: 

  1. Browse public data
  2. Stitch together imagery
  3. Ingest your own data
  4. Build an analysis pipeline
  5. Edit and iterate quickly
  6. Integrate with their API
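
I haven't worked through their API docs, but step 6 presumably boils down to token-authenticated REST calls. A hypothetical sketch with Python's requests library; the endpoint path, query parameter, and auth header are illustrative assumptions, not the documented Raster Foundry API.

    import requests

    # Hypothetical example: list imagery scenes from a REST API like Raster Foundry's.
    # The base URL, endpoint, parameter, and token are assumptions for illustration.
    API = "https://app.rasterfoundry.com/api"
    TOKEN = "YOUR_API_TOKEN"

    resp = requests.get(
        f"{API}/scenes/",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"maxCloudCover": 10},   # hypothetical filter parameter
    )
    resp.raise_for_status()
    for scene in resp.json().get("results", []):
        print(scene.get("id"), scene.get("name"))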

4. GeoNotebook

From the Kitware blog: Kitware has partnered with the NASA Earth Exchange (NEX) to design GeoNotebook, a Jupyter Notebook extension created to solve these problems (i.e. working with big raster data stacks from imagery). Their shared vision: a flexible, reproducible analysis process that makes data easy to explore with statistical and analytics services, allowing users to focus more on the science by improving their ability to interactively assess data quality at scale, at any stage of processing.

Extending Jupyter Notebook and JupyterHub, this Python analysis environment provides the means to easily perform reproducible geospatial analysis tasks that can be saved at any state and easily shared. As geospatial datasets come in, they are ingested into the system and converted into tiles for visualization, creating a dynamic map that can be managed from the web UI and that communicates back to the server for operations like data subsetting and visualization.
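
From memory of Kitware's demos, using it inside a notebook looks roughly like the sketch below; the injected map object M and the RasterData wrapper are my recollection of the API and may have changed, so treat the names as assumptions.

    # Inside a GeoNotebook-enabled Jupyter kernel, a map object `M` is injected
    # into the namespace (an assumption based on Kitware's demos).
    from geonotebook.wrappers import RasterData

    # Center the interactive map (lon, lat, zoom) that sits next to the notebook.
    M.set_center(-120.32, 38.33, 8)

    # Ingest a raster (path is a placeholder); it is tiled server-side and drawn
    # as a map layer that stays linked to the running Python session.
    layer = RasterData("/data/ned_subset.tif")
    M.add_layer(layer)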

Blog post: https://blog.kitware.com/geonotebook-data-driven-quality-assurance-for-geospatial-data/ 

False precision in the English LiDAR release

Great commentary from Martin Isenburg of LAStools fame on releasing data with false precision. This deals with the new open data release by the Environment Agency in England. So far, LiDAR-derived DTM and DSM rasters have been released for 72% of English territory at horizontal resolutions of 50 cm, 1 m, and 2 m. They can be downloaded here. The rasters are distributed as zipped archives of tiles in textual ASC format (*.asc).

Martin gives us a cautionary tale on how not to release national data. It is not the ASC format that he has problems with, but the vertical precision. He says:

"The vertical resolution ranges from femtometers to attometers. This means that the ASCII numbers that specify the elevation for each grid cell are written down with 15 to 17 digits after the decimal point."

Example heights look something like 79.9499969482421875 or 80.23999786376953125. These data should be resolved to about the centimetre, not the attometer (whatever that is). Crazy, man!
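
Those suspiciously long decimals are almost certainly what you get when a centimetre-precision height stored as a 32-bit float is printed with its full, binary-exact decimal expansion. A quick sketch:

    import numpy as np

    # 79.95 m stored as a 32-bit float is really the nearest representable value...
    h = np.float32(79.95)

    # ...and writing that binary value out in full decimal gives the "insane" digits:
    print(f"{float(h):.19f}")   # 79.9499969482421875000

    # Two decimal places (centimetres) carry all the information actually present:
    print(f"{float(h):.2f}")    # 79.95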

Read the full post: http://rapidlasso.com/2015/09/02/england-releases-national-lidar-dem-with-insane-vertical-resolution/

Big Data for sustainability: an uneven track record with great potential

An interesting position piece on the appropriate uses of big data for climate resilience. The author, Amy Luers, points out three opportunities and three risks.

She sums up:

"The big data revolution is upon us. How this will contribute to the resilience of human and natural systems remains to be seen. Ultimately, it will depend on what trade-offs we are willing to make. For example, are we willing to compromise some individual privacy for increased community resilience, or the ecological systems on which they depend?—If so, how much, and under what circumstances?"

Read more from this interesting article here.