Day 2 Wrap Up from the NEON Data Institute 2017

First of all, Pearl Street Mall is just as lovely as I remember, but OMG it is so crowded, with so many new stores and chains. Still, good food, good views, hot weather, lovely walk.

Welcome to Day 2!
Our morning session focused on reproducibility and workflows with the great Naupaka Zimmerman. Remember the characteristics of reproducibility: organization, automation, documentation, and dissemination. We focused on organization, and spent an enjoyable hour sorting through an example messy directory of miscellaneous data files and code. The directory looked a bit like many of my directories. Lesson learned. We then moved to working with new data and Git to reinforce yesterday's lessons. Git was super confusing to me 2 weeks ago, but now I think I love it. We also went back and forth between Jupyter notebooks and standalone Python scripts, abstracted variables, and lo and behold, I got my script to run. All the git stuff is from
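To make "abstracted variables" concrete, here is a minimal sketch (not the actual workshop code) of a standalone Python script where values that might be hard-coded in a notebook cell, such as the input path and the column name, become command-line arguments. The CSV layout and the default `value` column are hypothetical.

```python
"""Hypothetical standalone script: notebook constants become CLI arguments."""
import argparse
import csv
import statistics


def summarize(path, column):
    """Return the mean of one numeric column in a CSV file."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return statistics.mean(values)


def main(argv=None):
    # Each value that used to be hard-coded is now a named, documented argument.
    parser = argparse.ArgumentParser(description="Mean of one CSV column.")
    parser.add_argument("path", help="input CSV file")
    parser.add_argument("--column", default="value", help="column to average")
    args = parser.parse_args(argv)
    result = summarize(args.path, args.column)
    print(f"mean of {args.column}: {result:.2f}")
    return result


if __name__ == "__main__":
    main()
```

Run it as `python summarize.py plots.csv --column height`; the same `summarize` function can still be imported and called from a notebook.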

The afternoon focused on Lidar (yay!). Before coding, we talked about discrete and waveform data and their collection, and about the OpenTopography project with Benjamin Gross. The OpenTopography talk was really interesting. They are not just a data distributor anymore; they also provide an HPC framework (mostly TauDEM for now) on their servers at SDSC. They are going to roll out user-initiated HPC functionality soon, so stay tuned for their new "pluggable assets" program. This is well worth checking into. We also spent some time live coding in Python with Bridget Hass, working with a canopy height model (CHM) from the SERC site in California, and had a nerve-wracking code challenge to wrap up the day.
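For anyone new to lidar products: a CHM is the difference between the top-of-surface elevation model (DSM) and the bare-earth terrain model (DTM). NEON distributes the CHM as a ready-made product, so this sketch only illustrates that relationship plus a simple threshold classification with NumPy. The -9999 nodata value, clipping negative heights to zero, and the height breaks are my assumptions, not NEON's processing rules.

```python
import numpy as np


def canopy_height_model(dsm, dtm, nodata=-9999.0):
    """CHM = DSM - DTM; nodata cells become NaN, small negative heights clip to 0."""
    dsm = np.asarray(dsm, dtype=float)
    dtm = np.asarray(dtm, dtype=float)
    invalid = (dsm == nodata) | (dtm == nodata)
    chm = np.clip(dsm - dtm, 0.0, None)  # assumption: negatives are noise
    return np.where(invalid, np.nan, chm)


def classify_heights(chm, breaks=(2.0, 10.0, 20.0)):
    """Assign classes 1..len(breaks)+1 by height bin; NaN cells get class 0."""
    classes = np.digitize(chm, breaks) + 1  # hypothetical breaks in meters
    return np.where(np.isnan(chm), 0, classes)
```

With these breaks, class 1 is under 2 m, class 2 is 2 to 10 m, class 3 is 10 to 20 m, and class 4 is 20 m and taller.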

Fun additional take-home messages/resources:

Thanks to everyone today! Megan Jones (our fearless leader), Naupaka Zimmerman (Reproducibility), Tristan Goulden (Discrete Lidar), Keith Krause (Waveform Lidar), Benjamin Gross (OpenTopography), Bridget Hass (coding lidar products).

Day 1 Wrap Up
Day 2 Wrap Up 
Day 3 Wrap Up
Day 4 Wrap Up

Our home for the week

Cloud-based raster processors out there

Hi all,

Just trying to get my head around some of the new big raster processors out there, in addition, of course, to Google Earth Engine. Bear with me while I sort through these. Thanks to raster sleuth Stefania Di Tomasso for helping.

1. GeoTrellis

GeoTrellis is a Scala-based raster processing engine, and it is one of the first geospatial libraries on Spark. GeoTrellis is able to process big datasets. Users can interact with geospatial data and see results in real time in an interactive web application (for regional or statewide datasets). For larger raster datasets (e.g., the US NED), GeoTrellis performs fast batch processing using Akka clustering to distribute data across the cluster. GeoTrellis was designed to solve three core problems, with a focus on raster processing:

  • Creating scalable, high performance geoprocessing web services;
  • Creating distributed geoprocessing services that can act on large data sets; and
  • Parallelizing geoprocessing operations to take full advantage of multi-core architecture.


  • GeoTrellis is designed to help a developer create simple, standard REST services that return the results of geoprocessing models.
  • GeoTrellis will automatically parallelize and optimize your geoprocessing models where possible.
  • In the spirit of the object-functional style of Scala, it is easy to both create new operations and compose new operations with existing operations.
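To get a feel for what "automatically parallelize" and "compose new operations" mean, here is a single-machine Python analogy. It is not the GeoTrellis API (GeoTrellis is Scala and distributes tiles across a cluster); it splits a raster into tiles, runs a composed pipeline of cell-by-cell operations on each tile in a thread pool, and mosaics the results. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

import numpy as np


def compose(*ops):
    """Chain raster operations left to right into one operation."""
    return lambda tile: reduce(lambda acc, op: op(acc), ops, tile)


def split_tiles(raster, size):
    """Yield (row, col, tile) blocks of at most size x size cells."""
    rows, cols = raster.shape
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            yield r, c, raster[r:r + size, c:c + size]


def process_tiled(raster, op, size=256, workers=4):
    """Apply a cell-by-cell operation tile by tile in parallel, then mosaic."""
    out = np.empty(raster.shape, dtype=float)
    tiles = list(split_tiles(raster, size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with tiles.
        results = pool.map(lambda t: op(t[2]), tiles)
        for (r, c, tile), result in zip(tiles, results):
            out[r:r + tile.shape[0], c:c + tile.shape[1]] = result
    return out


# Hypothetical pipeline: centimeters to meters, then zero out values below 2 m.
def to_meters(tile):
    return tile * 0.01


def mask_below_2m(tile):
    return np.where(tile < 2.0, 0.0, tile)


pipeline = compose(to_meters, mask_below_2m)
```

Tiling like this only works for local (cell-by-cell) operations; focal operations such as slope or smoothing need overlapping tile buffers, which the real engines take care of.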

2. GeoPySpark - in short, GeoTrellis for the Python community

GeoPySpark provides Python bindings for working with geospatial data on PySpark (PySpark is the Python API for Spark). Spark is an open-source processing engine originally developed at UC Berkeley in 2009. GeoPySpark makes GeoTrellis accessible to the Python community: Scala can be a difficult language, so they created this Python library.

3. RasterFoundry

They say: "We help you find, combine and analyze earth imagery at any scale, and share it on the web." And "Whether you’re working with data already accessible through our platform or uploading your own, we do the heavy lifting to make processing your imagery go quickly no matter the scale."

Key RasterFoundry workflow: 

  1. Browse public data
  2. Stitch together imagery
  3. Ingest your own data
  4. Build an analysis pipeline
  5. Edit and iterate quickly
  6. Integrate with their API

4. GeoNotebooks

From the Kitware blog: Kitware has partnered with The NASA Earth Exchange (NEX) to design GeoNotebook, a Jupyter Notebook extension created to solve these problems (i.e. big raster data stacks from imagery). Their shared vision: a flexible, reproducible analysis process that makes data easy to explore with statistical and analytics services, allowing users to focus more on the science by improving their ability to interactively assess data quality at scale at any stage of the processing.

Extending Jupyter Notebooks and Jupyter Hub, this python analysis environment provides the means to easily perform reproducible geospatial analysis tasks that can be saved at any state and easily shared. As the geospatial datasets come in, they are ingested into the system and converted into tiles for visualization, creating a dynamic map that can be managed from the web UI and can communicate back to a server to perform operations like data subsetting and visualization. 
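The "converted into tiles for visualization" step usually means cutting data into a pyramid of web map tiles. Assuming the common Web Mercator XYZ ("slippy map") scheme, which I have not confirmed is exactly what GeoNotebook uses internally, the standard tile-index arithmetic looks like this:

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator world


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a point."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or lat = -MAX_LAT stays on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bounds(x, y, zoom):
    """Return (west, south, east, north) in degrees for tile (x, y, zoom)."""
    n = 2 ** zoom
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return west, south, east, north
```

Each zoom level doubles the tiles along each axis, which is why a tile server can render only the handful of tiles a map view actually needs.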

Blog post: 

ESRI @ GIF Open GeoDev Hacker Lab

We had a great day today exploring ESRI open tools in the GIF. ESRI is interested in incorporating more open tools into the GIS workflow. According to, this means working with:

  1. Open Standards: OGC, etc. 
  2. Open Data formats: supporting open data standards, geojson, etc. 
  3. Open Systems: open APIs, etc. 
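Since GeoJSON came up as an example open data format, here is a minimal sketch of writing a GeoJSON FeatureCollection using only the Python standard library. GeoJSON (RFC 7946) lists coordinates as longitude, latitude. The point and its approximate coordinates are just an illustration.

```python
import json


def point_feature(lon, lat, **properties):
    """Build a GeoJSON Point Feature; RFC 7946 order is [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Illustrative point; coordinates are approximate.
fc = feature_collection([point_feature(-122.259, 37.872, name="UC Berkeley")])
text = json.dumps(fc, indent=2)
```

Output like this can be loaded directly by web mapping libraries; plain Leaflet, for example, has `L.geoJSON`.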

We had a full class of 30 participants and two great ESRI instructors (leaders? evangelists?), John Garvois and Allan Laframboise. We worked through a range of great online mapping examples (data, design, analysis, and 3D) in the morning, and focused on using the ESRI Leaflet API in the afternoon. Here are some of the key resources out there.

Great Stuff! Thanks Allan and John

Crowdsourced view of global agriculture: mapping farm size around the world

From Live Science. Two new maps released Jan. 16 considerably improve estimates of the amount of land farmed in the world — one map reveals the world's agricultural lands to a resolution of 1 kilometer, and the other provides the first look at the sizes of the fields being used for agriculture.

The researchers built the cropland database by combining information from several sources, such as satellite images, regional maps, video and geotagged photos, which were shared with them by groups around the world. Combining all that information would be an almost-impossible task for a handful of scientists to take on, so the team turned the project into a crowdsourced, online game. Volunteers logged into "Cropland Capture" on a computer or a phone and determined whether an image contained cropland or not. Participants were entered into weekly prize drawings.
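Crowdsourcing like this relies on combining many volunteer judgments per image. The post does not say how Cropland Capture aggregates answers, so here is just a simple majority-vote sketch with hypothetical thresholds (at least 3 votes and 70% agreement):

```python
from collections import Counter, defaultdict


def aggregate_votes(votes, min_votes=3, min_agreement=0.7):
    """Combine volunteer labels into one consensus label per image.

    votes: iterable of (image_id, label) pairs, e.g. ("img42", "cropland").
    Returns {image_id: label or None}; None means too few votes or no
    sufficiently strong majority.
    """
    by_image = defaultdict(list)
    for image_id, label in votes:
        by_image[image_id].append(label)

    consensus = {}
    for image_id, labels in by_image.items():
        label, count = Counter(labels).most_common(1)[0]
        enough_votes = len(labels) >= min_votes
        strong_majority = count / len(labels) >= min_agreement
        consensus[image_id] = label if enough_votes and strong_majority else None
    return consensus
```

Images that come back as `None` are natural candidates to show to more volunteers.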

Using Social Media to Discover Public Values, Interests, and Perceptions about Cattle Grazing on Park Lands

“Moment of Truth—and she was face to faces with this small herd…” Photo and comment by Flickr™ user Doug Greenberg.

In a recent open access journal article published in Environmental Management, colleague Sheila Barry explored the use of personal photography in social media to gain insight into public perceptions of livestock grazing in public spaces. In this innovative paper, Sheila examined views, interests, and concerns about cows and grazing on the photo-sharing website Flickr™. The data were developed from photos and associated comments posted on Flickr™ from February 2002 to October 2009 from San Francisco Bay Area parks, derived from searching photo titles, tags, and comments for location terms, such as park names, and subject terms, such as cow(s) and grazing. She found perceptions about cattle grazing that seldom show up at public meetings or in surveys. Results suggest that social media analysis can help develop a more nuanced understanding of public viewpoints, useful in making decisions and creating outreach and education programs for public grazing lands. This study demonstrates that such media can be useful in understanding public concerns about natural resource management. Very cool stuff!
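The data-selection step described above (searching titles, tags, and comments for location terms and subject terms) can be sketched as a simple keyword filter. This is my illustration, not the study's code; the photo record structure, the example park name, and the rule that a photo must match at least one term from each list are assumptions.

```python
import re


def matches_terms(text, terms):
    """True if any term appears in text as a whole word (case-insensitive)."""
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def select_photos(photos, location_terms, subject_terms):
    """Keep photos whose title, tags, or comments mention a location AND a subject."""
    selected = []
    for photo in photos:
        text = " ".join([
            photo.get("title", ""),
            " ".join(photo.get("tags", [])),
            " ".join(photo.get("comments", [])),
        ])
        if matches_terms(text, location_terms) and matches_terms(text, subject_terms):
            selected.append(photo)
    return selected


# Hypothetical records and search terms.
photos = [
    {"title": "Cows at Sunol", "tags": ["grazing"], "comments": []},
    {"title": "Sunset at Sunol", "tags": ["hills"], "comments": []},
    {"title": "Cow", "tags": [], "comments": ["so cute"]},
]
hits = select_photos(photos, ["Sunol"], ["cow", "cows", "grazing"])
```

Only the first record matches both a location term and a subject term; the selected comments would then be read and coded by hand.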

Open Access Link:

Help to Validate Global Land Cover with GeoWiki and Cropland Capture

Courtesy of the International Institute for Applied Systems Analysis

This creative project from GeoWiki seeks to get crowdsourced feedback on crop types from participants around the world. They say:

By 2050 we will need to feed more than 2 billion additional people on the Earth. By playing Cropland Capture, you will help us to improve basic information about where cropland is located on the Earth's surface. Using this information, we will be better equipped at tackling problems of future food security and the effects of climate change on future food supply. Get involved and contribute to a good cause! Help us to identify cropland area!

Oh yeah, and there are prizes!

Each week (starting Nov. 15th) the top three players with the highest score at the end of each week will be added to our weekly winners list. After 25 weeks, three people will be drawn randomly from this list to become our overall winners. Prizes will include an Amazon Kindle, a brand new smartphone and a tablet.

NASA shares satellite and climate data on Amazon’s cloud


NASA has announced a partnership with Amazon Web Services that the agency hopes will spark wider collaboration on climate research. In an effort that is in some ways parallel to Google Earth Engine, NASA has uploaded terabytes of data to Amazon's public cloud and made it available to anyone.

Three data sets are already up at Amazon. The first is climate change forecast data for the continental United States from NASA Earth Exchange (NEX) climate simulations, scaled down to make them usable outside of a supercomputing environment. The other two are satellite data sets: one from the US Geological Survey's Landsat, and the other a collection of Moderate Resolution Imaging Spectroradiometer (MODIS) data from NASA's Terra and Aqua Earth remote sensing satellites.

More Here

Troubling report of OSM vandalism

From Sarah. This is a troubling story from ReadWriteWeb reporting that someone at a range of Google IP addresses in India has been editing OpenStreetMap, the collaboratively made map of the world, in some very unhelpful ways, like moving and deleting information and reversing the direction of one-way streets on the map.

Update: Google sent the following statement to ReadWriteWeb on Tuesday morning. "The two people who made these changes were contractors acting on their own behalf while on the Google network. They are no longer working on Google projects."

A Google spokesperson told BoingBoing on Friday that the company was "mortified" by the discovery - but now it appears the same Google contractor may be behind mayhem rippling throughout one of the world's biggest maps. Google says it's investigating these latest allegations.

ESRI's ChangeMatters and New Landsat Image Services

Yesterday at the annual ASPRS conference in Milwaukee, WI (yes, there were sausages shaped like the state), Jack Dangermond announced the release of ChangeMatters and new Landsat Image Services from ESRI.

ChangeMatters. Working with partners, ESRI developed this web application - ChangeMatters - which allows users throughout the globe to quickly view the GLS Landsat imagery both multi-spectrally (in different Landsat band combinations) and multi-temporally (across epochs), and to conduct simple change detection analysis.

Image Services. The new Landsat Image Services include examples of vegetation, false color, and land-water band combinations in seamless, color-matched Landsat mosaics. Downloads will be available soon. Pretty nice. Website.

Example from ChangeMatters: Las Vegas from 1975 to 2000. Green indicates an increase and red a decrease in vegetation.
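The post does not say how ChangeMatters computes change, but a common simple approach to vegetation change detection is to difference NDVI between two dates using the red and near-infrared bands. This sketch assumes co-registered reflectance arrays and a hypothetical threshold of 0.1; +1 plays the role of "green" (increase) and -1 of "red" (decrease) in the Las Vegas example above. Which Landsat band numbers are red and NIR depends on the sensor (MSS in 1975 vs. TM in 2000).

```python
import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); cells where NIR + Red == 0 become NaN."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total == 0, np.nan, (nir - red) / total)


def vegetation_change(red_t1, nir_t1, red_t2, nir_t2, threshold=0.1):
    """+1 where NDVI rose by more than threshold, -1 where it fell, else 0."""
    delta = ndvi(red_t2, nir_t2) - ndvi(red_t1, nir_t1)
    change = np.zeros(delta.shape, dtype=int)
    change[delta > threshold] = 1    # vegetation increase ("green")
    change[delta < -threshold] = -1  # vegetation decrease ("red")
    return change  # NaN deltas compare False, so they stay 0
```

Real change detection between sensors also needs radiometric normalization, so treat the threshold as something to tune, not a fixed rule.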