NASA Data and the Distributed Active Archive Centers

I’ve been away from the blog for a while, but thought I’d catch up a bit. I am in beautiful Madison, Wisconsin (Lake Mendota! 90 degrees! Rain! Fried cheese curds!) for the NASA LP DAAC User Working Group meeting. This is a cool deal where imagery and product users meet with NASA team leaders to review products and tools. Since this UWG process is new to me, I am highlighting some of the key fun things I learned.

What is a DAAC?
A DAAC is a Distributed Active Archive Center, run by NASA's Earth Observing System Data and Information System (EOSDIS). These are discipline-specific facilities located throughout the United States. These institutions are custodians of EOS mission data and ensure that data will be easily accessible to users. Each of the 12 EOSDIS DAACs processes, archives, documents, and distributes data from NASA's past and current Earth-observing satellites and field measurement programs. For example, if you want to know about snow and ice data, visit the National Snow and Ice Data Center (NSIDC) DAAC. Want to know about social and population data? Visit the Socioeconomic Data and Applications Center (SEDAC). These centers of excellence are our taxpayer money at work, collecting, storing, and sharing Earth systems data that are critical to science, sustainability, the economy, and well-being.

What is the LP DAAC?
The Land Processes Distributed Active Archive Center (LP DAAC) is one of several discipline-specific data centers within the NASA Earth Observing System Data and Information System (EOSDIS). The LP DAAC is located at the USGS Earth Resources Observation and Science (EROS) Center in Sioux Falls, South Dakota. LP DAAC promotes interdisciplinary study and understanding of terrestrial phenomena by providing data for mapping, modeling, and monitoring land-surface patterns and processes. To meet this mission, the LP DAAC ingests, processes, distributes, documents, and archives data from land-related sensors and provides the science support, user assistance, and outreach required to foster the understanding and use of these data within the land remote sensing community.

Why am I here?
Each NASA DAAC has established a User Working Group (UWG). There are 18 people on the LP DAAC committee, including 12 members from the land remote sensing community at large, like me! There is some cool stuff going on. Such as...

New Sensors
Two upcoming launches are super interesting and important to what we are working on. First, GEDI (Global Ecosystem Dynamics Investigation) will produce the first high-resolution laser ranging observations of the 3D structure of the Earth. Second, ECOSTRESS (the ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station) will measure the temperature of plants: stressed plants get warmer than plants with sufficient water. ECOSTRESS will use a multispectral thermal infrared radiometer to measure surface temperature. The radiometer will acquire the most detailed surface temperature images ever taken from space and will be able to measure the temperature of an individual farmer's field. Both of these sensors will be deployed on the International Space Station, so data will come in swaths, not continuous global coverage. Also, we got an update from USGS on the USGS/NASA plan for the development and deployment of Landsat 10: Landsat 9 comes in 2020, Landsat 10 around 2027.

Other Data Projects
We heard from other data providers, and of course we heard from NEON! Remember, I posted a series of blogs about the excellent NEON open remote sensing workshop I attended last year. NEON also hosts a ton of important ecological data, and has been thinking through the issues associated with cloud hosting. Tristan Goulden was here to give an overview.

Tools Cafe
NASA staff gave us a series of demos on their WebGIS services, AppEEARS, and their data website. Their WebGIS site uses ArcGIS Enterprise, and serves web image services, web coverage services, and web mapping services from the LP DAAC collection. This might provide some key help for us in IGIS and our REC ArcGIS Online toolkits. AppEEARS is their way of providing bundles of LP DAAC data to scientists: it is a data extraction and exploration tool. We also saw the LP DAAC data website redesign (coming soon), which was necessitated in part by the requirement for a permanent DOI for each data product.

User Engagement
LP DAAC is going full-force in user engagement: they do workshops, collect user testimonials, write great short pieces on “data in action”, work with the press, and generally get the story out about how NASA LP DAAC data is used to do good work. This is a pretty great legacy and they are committed to keep developing it. Lindsey Harriman highlighted their excellent work here.

Grand Challenges for remote sensing
Some thoughts about our Grand Challenges:

  1. Scaling: from drones to satellites. It occurs to me that an integration between the ground-to-airborne data that NEON provides and the satellite data that NASA provides had better happen soon.
  2. Data Fusion/Data Assimilation/Data Synthesis, whatever you want to call it: discovery through datasets meeting for the first time.
  3. Training: new users and consumers of geospatial data and remote sensing will need to be trained.
  4. Remote Sensible: making remote sensing data work for society.

A primer on cloud computing
We spent some time on cloud computing. It has been said that cloud computing is just putting your stuff on “someone else’s computer”, but it is also making your stuff “someone else’s problem”, because the cloud handles all the painful aspects of serving data: power requirements, buying servers, speccing floor space for your servers, etc. Plus, there are many advantages of cloud computing, including:

  • Elasticity. Elastic in computing and storage: you can scale up, scale down, or scale sideways. Elastic in terms of money: you pay for only what you use.
  • Speed. Commercial cloud CPUs are faster than ours, and you can use as many as you want: near-real-time processing, massive processing, compute-intensive analysis, deep learning.
  • Size. You can customize this; you can be fast and expensive or slow and cheap. You use as much as you need: short-term storage of large interim results or long-term storage of data that you might use one day.

Image courtesy of Chris Lynnes

We can use the cloud as infrastructure, for sharing data and results, and as software (e.g. ArcGIS Online, Google Earth Engine). Above is a cool graphic showing one vision of the cloud as a scaled and optimized workflow: from pre-processing, to an analytics-optimized data store, to analysis, to visualization. Why this is a better vision: some massive processing engines, such as Spark, require that data be organized in a particular way (e.g. Google Bigtable, Parquet, or a data cube). This means we can really crank on processing, especially with giant raster stacks. And at each step in the workflow, end-users (be they machines or people) can interact with the data; those are the green boxes in the figure above. Super fun discussion, leading to the importance of training, and how to do this best. Tristan also mentioned CyVerse, a new NSF project, which they are testing out for their workshops.

Image attribution: Corey Coyle

Super fun couple of days. Plus: Wisconsin is green. And warm. And Lake Mendota is lovely. We were hosted at the University of Wisconsin by Mutlu Ozdogan. The campus is gorgeous! On the banks of Lake Mendota (image attribution: Corey Coyle), the 933-acre (378 ha) main campus is verdant and hilly, with tons of gorgeous 19th-century stone buildings as well as modern ones. Founded when Wisconsin achieved statehood in 1848, UW–Madison is the flagship campus of the UW System. It was the first public university established in Wisconsin and remains the oldest and largest public university in the state. It became a land-grant institution in 1866. UW hosts nearly 45K undergrad and graduate students. It is big! It has a med school, a business school, and a law school on campus. We were hosted in the UW red-brick Romanesque-style Science Building (opened in 1887). Not only is it the host building for the geography department, it also has the distinction of being the first building in the country to be constructed of all masonry and metal materials (wood was used only in window and door frames and for some floors), and may be the only one still extant. How about that! Bye Wisconsin!

Day 2 Wrap Up from the NEON Data Institute 2017

First of all, Pearl Street Mall is just as lovely as I remember, but OMG it is so crowded, with so many new stores and chains. Still, good food, good views, hot weather, lovely walk.

Welcome to Day 2! http://neondataskills.org/data-institute-17/day2/
Our morning session focused on reproducibility and workflows with the great Naupaka Zimmerman. Remember the characteristics of reproducibility - organization, automation, documentation, and dissemination. We focused on organization, and spent an enjoyable hour sorting through an example messy directory of miscellaneous data files and code. The directory looked a bit like many of my directories. Lesson learned. We then moved to working with new data and git to reinforce yesterday's lessons. Git was super confusing to me 2 weeks ago, but now I think I love it. We also went back and forth between Jupyter notebooks and standalone Python scripts, and abstracted variables, and lo and behold I got my script to run. All the git material is from http://swcarpentry.github.io/git-novice/
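The organization lesson boils down to a rule you can automate. Here is a minimal sketch (my own, not the workshop's actual code) that files a messy directory's loose files into data/, code/, and docs/ subfolders by extension; the suffix-to-folder mapping is a made-up example layout:

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file suffix to the subfolder it belongs in.
DESTINATIONS = {
    ".csv": "data",
    ".tif": "data",
    ".py": "code",
    ".ipynb": "code",
    ".md": "docs",
}

def organize(directory):
    """Sort loose files in `directory` into data/, code/, and docs/ subfolders."""
    root = Path(directory)
    # Snapshot the listing first, since we move files while iterating.
    for item in list(root.iterdir()):
        if item.is_file() and item.suffix in DESTINATIONS:
            dest = root / DESTINATIONS[item.suffix]
            dest.mkdir(exist_ok=True)
            shutil.move(str(item), str(dest / item.name))
```

Files with unknown extensions are deliberately left in place for a human to triage, which is usually safer than guessing.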

The afternoon focused on lidar (yay!), and prior to coding we talked about discrete-return and waveform data and collection, and the OpenTopography (http://www.opentopography.org/) project with Benjamin Gross. The OpenTopography talk was really interesting. They are not just a data distributor anymore; they also provide an HPC framework (mostly TauDEM for now) on their servers at SDSC (http://www.sdsc.edu/). They are going to roll out a user-initiated HPC functionality soon, so stay tuned for their new "pluggable assets" program. This is well worth checking into. We also spent some time live coding in Python with Bridget Hass, working with a CHM from the NEON SERC site, and had a nerve-wracking code challenge to wrap up the day.
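For a flavor of what the CHM live coding looks like, here is a small sketch (mine, not the actual tutorial code) that summarizes a tiny synthetic canopy height model with NumPy. A real session would read the CHM from a GeoTIFF, but the per-pixel logic is the same idea:

```python
import numpy as np

def chm_summary(chm, canopy_threshold=2.0):
    """Summarize a canopy height model (CHM) raster.

    chm: 2D array of canopy heights in meters, with nodata as np.nan.
    canopy_threshold: minimum height (m) to count a pixel as canopy.
    """
    valid = chm[~np.isnan(chm)]          # drop nodata pixels
    canopy = valid[valid >= canopy_threshold]
    return {
        "max_height_m": float(valid.max()),
        "mean_canopy_height_m": float(canopy.mean()),
        "canopy_cover_fraction": canopy.size / valid.size,
    }

# A tiny synthetic 3x3 CHM: open ground (0), shrubs, trees, and one nodata pixel.
demo = np.array([[0.0, 0.5, 12.0],
                 [np.nan, 3.0, 18.5],
                 [0.0, 7.2, 22.1]])
stats = chm_summary(demo)
```

The 2 m threshold is a common but arbitrary cutoff for separating canopy from ground and shrub returns; tune it per site.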

Fun additional take-home messages/resources:

Thanks to everyone today! Megan Jones (our fearless leader), Naupaka Zimmerman (Reproducibility), Tristan Goulden (Discrete Lidar), Keith Krause (Waveform Lidar), Benjamin Gross (OpenTopography), Bridget Hass (coding lidar products).

Day 1 Wrap Up
Day 2 Wrap Up 
Day 3 Wrap Up
Day 4 Wrap Up

Our home for the week

Cloud-based raster processors out there

Hi all,

Just trying to get my head around some of the new big raster processors out there, in addition of course to Google Earth Engine. Bear with me while I sort through these. Thanks to raster sleuth Stefania Di Tomasso for helping.

1. Geotrellis (https://geotrellis.io/)

GeoTrellis is a Scala-based raster processing engine, and one of the first geospatial libraries on Spark. GeoTrellis is able to process big datasets: users can interact with geospatial data and see results in real time in an interactive web application (for regional or statewide datasets), and for larger raster datasets (e.g., the US NED), GeoTrellis performs fast batch processing, using Akka clustering to distribute data across the cluster. GeoTrellis was designed to solve three core problems, with a focus on raster processing:

  • Creating scalable, high performance geoprocessing web services;
  • Creating distributed geoprocessing services that can act on large data sets; and
  • Parallelizing geoprocessing operations to take full advantage of multi-core architecture.

Features:

  • GeoTrellis is designed to help a developer create simple, standard REST services that return the results of geoprocessing models.
  • GeoTrellis will automatically parallelize and optimize your geoprocessing models where possible.
  • In the spirit of the object-functional style of Scala, it is easy to both create new operations and compose new operations with existing operations.

2. GeoPySpark - in short, GeoTrellis for the Python community

GeoPySpark provides Python bindings for working with geospatial data on PySpark (PySpark is the Python API for Spark). Spark is an open-source processing engine originally developed at UC Berkeley in 2009. GeoPySpark makes GeoTrellis (https://geotrellis.io/) accessible to the Python community: Scala is a difficult language, so they have created this Python library.
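To see why engines like GeoTrellis and GeoPySpark tile their rasters, here is a local stand-in (plain NumPy plus a Python thread pool, not the GeoPySpark API) that splits two bands into tiles and maps an NDVI computation over them in parallel. Spark runs the same map-over-tiles pattern, but distributed across a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def split_tiles(red, nir, tile_size):
    """Yield matching (red, nir) sub-tiles of at most tile_size x tile_size pixels."""
    rows, cols = red.shape
    for r in range(0, rows, tile_size):
        for c in range(0, cols, tile_size):
            yield (red[r:r + tile_size, c:c + tile_size],
                   nir[r:r + tile_size, c:c + tile_size])

def ndvi(tile):
    """Compute NDVI = (NIR - red) / (NIR + red) for one tile."""
    red, nir = tile
    return (nir - red) / (nir + red)

# Toy 4x4 bands; a real engine would hold one tile per executor.
red = np.full((4, 4), 0.2)
nir = np.full((4, 4), 0.6)

# Map the per-tile operation in parallel over the 2x2 tiles.
with ThreadPoolExecutor() as pool:
    ndvi_tiles = list(pool.map(ndvi, split_tiles(red, nir, 2)))
```

Because NDVI is purely per-pixel, tiles need no overlap; neighborhood operations (focal statistics, flow routing) are where the distributed engines earn their keep by managing tile edges for you.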

3. RasterFoundry (https://www.rasterfoundry.com/)

They say: "We help you find, combine and analyze earth imagery at any scale, and share it on the web." And "Whether you’re working with data already accessible through our platform or uploading your own, we do the heavy lifting to make processing your imagery go quickly no matter the scale."

Key RasterFoundry workflow: 

  1. Browse public data
  2. Stitch together imagery
  3. Ingest your own data
  4. Build an analysis pipeline
  5. Edit and iterate quickly
  6. Integrate with their API

4. GeoNotebooks

From the Kitware blog: Kitware has partnered with The NASA Earth Exchange (NEX) to design GeoNotebook, a Jupyter Notebook extension created to solve these problems (i.e. big raster data stacks from imagery). Their shared vision: a flexible, reproducible analysis process that makes data easy to explore with statistical and analytics services, allowing users to focus more on the science by improving their ability to interactively assess data quality at scale at any stage of the processing.

Extending Jupyter Notebooks and Jupyter Hub, this python analysis environment provides the means to easily perform reproducible geospatial analysis tasks that can be saved at any state and easily shared. As the geospatial datasets come in, they are ingested into the system and converted into tiles for visualization, creating a dynamic map that can be managed from the web UI and can communicate back to a server to perform operations like data subsetting and visualization. 

Blog post: https://blog.kitware.com/geonotebook-data-driven-quality-assurance-for-geospatial-data/ 

ESRI @ GIF Open GeoDev Hacker Lab

We had a great day today exploring ESRI open tools in the GIF. ESRI is interested in incorporating more open tools into the GIS workflow. According to www.esri.com/software/open, this means working with:

  1. Open Standards: OGC, etc. 
  2. Open Data formats: supporting open data standards, geojson, etc. 
  3. Open Systems: open APIs, etc. 

We had a full class of 30 participants, and two great ESRI instructors (leaders? evangelists?) John Garvois and Allan Laframboise, and we worked through a range of great online mapping (data, design, analysis, and 3D) examples in the morning, and focused on using ESRI Leaflet API in the afternoon. Here are some of the key resources out there.

Great Stuff! Thanks Allan and John

Crowdsourced view of global agriculture: mapping farm size around the world

From Live Science. Two new maps released Jan. 16 considerably improve estimates of the amount of land farmed in the world — one map reveals the world's agricultural lands to a resolution of 1 kilometer, and the other provides the first look at the sizes of the fields being used for agriculture.

The researchers built the cropland database by combining information from several sources, such as satellite images, regional maps, video and geotagged photos, which were shared with them by groups around the world. Combining all that information would be an almost-impossible task for a handful of scientists to take on, so the team turned the project into a crowdsourced, online game. Volunteers logged into "Cropland Capture" on a computer or a phone and determined whether an image contained cropland or not. Participants were entered into weekly prize drawings.
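The aggregation step in a game like Cropland Capture can be as simple as majority vote over volunteer answers. The project's actual aggregation rules aren't described here, so this is just an illustrative sketch with hypothetical votes for one image:

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label and the fraction of volunteers who agreed."""
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

# Hypothetical volunteer answers for a single image tile.
votes = ["cropland", "cropland", "not cropland", "cropland"]
label, agreement = majority_label(votes)
```

The agreement fraction is useful on its own: low-agreement images can be routed back into the game for more votes or flagged for expert review.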

Using Social Media to Discover Public Values, Interests, and Perceptions about Cattle Grazing on Park Lands

“Moment of Truth—and she was face to faces with this small herd…” Photo and comment by Flickr™ user Doug Greenberg.

In a recent open access journal article published in Environmental Management, colleague Sheila Barry explored the use of personal photography in social media to gain insight into public perceptions of livestock grazing in public spaces. In this innovative paper, Sheila examined views, interests, and concerns about cows and grazing on the photo-sharing website Flickr™. The data were developed from photos and associated comments posted on Flickr™ from February 2002 to October 2009 from San Francisco Bay Area parks, derived by searching photo titles, tags, and comments for location terms, such as park names, and subject terms, such as cow(s) and grazing. She found perceptions about cattle grazing that seldom show up at a public meeting or in surveys. Results suggest that social media analysis can help develop a more nuanced understanding of public viewpoints, useful in making decisions and creating outreach and education programs for public grazing lands. This study demonstrates that using such media can be useful in gaining an understanding of public concerns about natural resource management. Very cool stuff!
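The search strategy described above (match photo titles, tags, and comments against location terms and subject terms) can be sketched in a few lines. The photo records and search terms below are hypothetical, not the paper's actual dataset or code:

```python
def match_photos(photos, location_terms, subject_terms):
    """Keep photos whose text mentions at least one location AND one subject term.

    Each photo is a dict with "title", "tags" (list), and "comments" strings;
    terms are assumed lowercase.
    """
    matches = []
    for photo in photos:
        text = " ".join([photo["title"], photo["comments"]] + photo["tags"]).lower()
        if (any(t in text for t in location_terms)
                and any(t in text for t in subject_terms)):
            matches.append(photo)
    return matches

# Hypothetical Flickr-style records.
photos = [
    {"title": "Cows at Briones", "tags": ["cattle", "park"], "comments": "grazing herd"},
    {"title": "Sunset", "tags": ["sky"], "comments": "no animals here"},
]
hits = match_photos(photos, ["briones"], ["cow", "grazing"])
```

Simple substring matching like this over-matches ("cowboy" contains "cow"), which is why studies like Sheila's still hand-review the retrieved photos.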

Open Access Link: http://link.springer.com/article/10.1007/s00267-013-0216-4/fulltext.html?wt_mc=alerts:TOCjournals

Help to Validate Global Land Cover with GeoWiki and Cropland Capture

Courtesy of the International Institute for Applied Systems Analysis

This creative project from GeoWiki seeks to get crowdsourced feedback on crop types from participants around the world. They say:

By 2050 we will need to feed more than 2 billion additional people on the Earth. By playing Cropland Capture, you will help us to improve basic information about where cropland is located on the Earth's surface. Using this information, we will be better equipped at tackling problems of future food security and the effects of climate change on future food supply. Get involved and contribute to a good cause! Help us to identify cropland area!

Oh yeah, and there are prizes!

Each week (starting Nov. 15th) the top three players with the highest score at the end of each week will be added to our weekly winners list. After 25 weeks, three people will be drawn randomly from this list to become our overall winners. Prizes will include an Amazon Kindle, a brand new smartphone and a tablet.

NASA shares satellite and climate data on Amazon’s cloud

 

NASA has announced a partnership with Amazon Web Services that the agency hopes will spark wider collaboration on climate research. In an effort that is in some ways parallel to Google's Earth Engine, NASA has uploaded terabytes of data to Amazon's public cloud and made it available to anyone.

Three data sets are already up at Amazon. The first is climate change forecast data for the continental United States from NASA Earth Exchange (NEX) climate simulations, scaled down to make them usable outside of a supercomputing environment. The other two are satellite data sets—one from the US Geological Survey's Landsat, and the other a collection of Moderate Resolution Imaging Spectroradiometer (MODIS) data from NASA's Terra and Aqua Earth remote sensing satellites.

More Here

Troubling report of OSM vandalism

From Sarah. This is a troubling story from ReadWriteWeb reporting that someone at a range of Google IP addresses in India has been editing the collaboratively made map of the world in some very unhelpful ways, like moving and deleting information and reversing the direction of one-way streets on the map.

Update: Google sent the following statement to ReadWriteWeb on Tuesday morning. "The two people who made these changes were contractors acting on their own behalf while on the Google network. They are no longer working on Google projects."

A Google spokesperson told BoingBoing on Friday that the company was "mortified" by the discovery - but now it appears the same Google contractor may be behind mayhem rippling throughout one of the world's biggest maps. Google says it's investigating these latest allegations.

ESRI's ChangeMatters and New Landsat Image Services

Yesterday at the annual ASPRS conference in Milwaukee, WI (yes there were sausages shaped like the state), Jack Dangermond announced the release of ChangeMatters, and new Landsat Image Services from ESRI.

ChangeMatters. Working with partners, ESRI developed this web application - ChangeMatters - which allows users throughout the globe to quickly view the GLS Landsat imagery both multi-spectrally (in different Landsat band combinations) and multi-temporally (across epochs), and to conduct simple change detection analysis.

New Landsat Image Services, with examples of vegetation, false color, and land-water band combinations in seamless, color-matched Landsat mosaics. Downloads will be available soon. Pretty nice. Website.

Example from ChangeMatters: Las Vegas from 1975 - 2000. Green is increase and red decrease in veg
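ChangeMatters' exact algorithm isn't described here, but "simple change detection analysis" between two epochs is often just thresholded differencing of a vegetation index. A minimal sketch of that idea, with toy NDVI grids standing in for the two Landsat epochs:

```python
import numpy as np

def veg_change(ndvi_t1, ndvi_t2, threshold=0.1):
    """Classify per-pixel vegetation change between two NDVI epochs.

    Returns +1 where NDVI increased by more than `threshold` (greening),
    -1 where it decreased by more than `threshold` (browning), else 0.
    """
    diff = ndvi_t2 - ndvi_t1
    change = np.zeros(diff.shape, dtype=int)
    change[diff > threshold] = 1
    change[diff < -threshold] = -1
    return change

# Toy epochs: one greening pixel, one browning pixel, two stable pixels.
ndvi_1975 = np.array([[0.2, 0.6], [0.4, 0.3]])
ndvi_2000 = np.array([[0.5, 0.3], [0.45, 0.3]])
change_map = veg_change(ndvi_1975, ndvi_2000)
```

The threshold absorbs sensor noise and atmospheric differences between dates; a map like the Las Vegas example above would then render +1 green and -1 red.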