Conference wrap up: DataEdge 2013

The 2nd DataEdge Conference, organized by UC Berkeley’s I School, has wrapped, and it was a doozy. The GIF was a sponsor, and Kevin Koy from the Geospatial Innovation Facility gave a workshop, Understanding the Natural World Through Spatial Data. Here are some of my highlights from what was a solid and fascinating 1.5 days. (All presentations are now available online.)

Michael Manoochehri, from Google, gave the workshop Data Just Right: A Practical Introduction to Data Science Skills. This was a terrific and useful interactive talk discussing/asking: who/what is a data scientist? One early definition he offered was a person with 3 groups of skills: statistics, coding or an engineering approach to solving a problem, and communication. He further refined this definition with a list of practical skills for the modern data scientist:

  • Short-term skills: Have a working knowledge of R; be proficient in Python and JavaScript, for analysis and web interaction; understand SQL; know your way around a Unix shell; be familiar with distributed data platforms like Hadoop; understand the Data Pipeline: collection, processing, analysis, visualization, communication.
  • Long-term skills: Statistics: understand what k-means clustering, multiple regression, and Bayesian inference are; and Visualization: both the technical and communication aspects of good viz.
  • Finally: Dive into a real data set; and focus on real use cases.

Many other great points were brought up in the discussion: the data storage conundrum in science was one. We are required to make our public data available: where will we store datasets, how will we share them, and how will we pay for access to public scientific data in the future?

Kate Crawford, Principal Researcher, Microsoft Research New England, gave the keynote address entitled The Raw and the Cooked: The Mythologies of Big Data. She wove together an extremely thoughtful and informative talk about some of our misconceptions about Big Data: the “myths” of her title. She framed the talk by introducing Claude Lévi-Strauss’s influential anthropological work “The Raw and the Cooked” - a study of Amerindian mythology that presents myths as a type of speech through which a language and culture could be discovered and learned. You know you are in for a provocative talk at a Big Data conference when the keynote leads with CLS. She then presented a series of 6 myths about Big Data, illustrated simply with a few slides each. Here is a quick summary of the myths:

  1. Big Data is new: the term was first used in 1997, but the “pre-history” of Big Data originates much earlier, in 1950s climate science for example, or even earlier. What we have is new tools driving new foci.
  2. Big Data is objective: she used the example of post-Sandy tweets, and made the point that while widespread, these data are a subset of a subset. Muki Haklay makes the same point with his cautionary “you are mining the outliers” comment (see previous post). She also pointed out that 2013 marks the point in the history of the internet when 51% of web traffic is non-human. Who are you listening to?
  3. Big Data won't discriminate: does BD avoid group-level prejudice? As we all know, people not only have different access to the internet, but because your user experience is framed by your previous use of and interaction with the web, the rich and the poor see different internets.
  4. Big Data makes cities smart: there are numerous terrific examples of smart cities (even many in the recent news) but resource allocation is not even. When smart phones are used, for example, to map potholes needing repair, repairs are concentrated in areas where cell phone use is higher: the device becomes a proxy for the need.
  5. Big Data is anonymous: Big Data has a Big Privacy problem. We all know this, especially in the health fields. I learned the new term “Health Surrogate Data”, which is information about your health that results from your interaction with the Internet. Great stuff for Google Flu Trends, for example, but still worrying. The standard law for protection in the public health field, HIPAA, is similar to “bringing a knife to a gunfight”, as she quoted Nicholas Terry.
  6. You can opt out: there are currently no clear ways to opt out. She asks: how much would you pay for privacy? And if the technological means to do so were created and made widespread, we would likely see the development of privacy as a luxury good, further differentiating internet experience based on income.

The panel discussion Digital Afterlife: What Happens to Your Data When You Die?, moderated by Jess Hemerly from Google and including Jed Brubaker from UC Irvine and Stephen Wu, a technology and intellectual property attorney, was eye-opening and engaging. Each speaker gave a presentation from their expertise: Stephen Wu gave us a primer on digital identity estate planning and Jed Brubaker shared his research on the spaces left in social media when someone dies. Both talks were utterly fascinating, thought-provoking and unique.

And finally, Jeffrey Heer from Stanford University gave a stunning and fun talk entitled Visualization and Interactive Data Analysis. He showcased his viz work and introduced many of us to Data Wrangler, which is awesome.

Great conference!

Interested in the Intergovernmental Panel on Climate Change? New Workshop at Berkeley

Interested in the Intergovernmental Panel on Climate Change?  Come hear Berkeley-based authors of the IPCC's upcoming Fifth Assessment Report discuss their contributions and take your questions!

Date/Time: Wed, Nov 28th 3:30-4:30pm (followed by coffee and cookies until 5pm)
Location: LeConte Bldg, Lecture Room 3.

Panelists Include:

  • Max Auffhammer, Department of Agricultural and Resource Economics
  • Daithi Stone, Lawrence Berkeley National Laboratory
  • Michael Wehner, Lawrence Berkeley National Laboratory
  • Bill Collins, Lawrence Berkeley National Laboratory
  • Jim McMahon, Lawrence Berkeley National Laboratory
  • Kirk Smith, School of Public Health

Please see the attached flyer for more details!

 

GIS Day 2012! November 14th, Mulford Hall

Please join us for GIS Day 2012, November 14, 5:00 pm to 8:15 pm.
UC Berkeley, Mulford Hall
http://gif.berkeley.edu/gisday.html

A list of speakers and topics is available on the event site.

GIS Day is free, but we encourage you to register, so that we know how many people to expect. We still have room for posters; if you’d like to display a poster (project, map, imagery), just sign up online.

This year's event is co-hosted by the Bay Area Automated Mapping Association (BAAMA) and Geospatial Innovation Facility (GIF), with support from the Northern California Region of the American Society for Photogrammetry and Remote Sensing (ASPRS).

Welcome to Columbus, OH! GIScience in the heartland

Welcome to Ohio! When I arrived, Columbus was cloudy and warm, with the city abuzz over a visit from President Obama.

GIScience 2012 was an amazing conference: small (~300 people) and focused, with a terrific program: 2 keynotes each morning, sessions through the day and a panel session of 6 speakers in the evenings. I went to sessions on spatial uncertainty, the geoweb (where Renee Sieber gave a terrific talk on the challenges of participation in webGIS (I learned a ton!)), and big data among others, and Thomas Blaschke and I organized a workshop on obia.

The keynotes were especially satisfying: big picture, often provocative talks from gifted speakers. Helen Couclelis talked about her vision of GIScience as a meta science: an "information oriented, context sensitive, spatially referenced, method of representing the real world". I loved the discussion of intentionality and context in her talk, and overall it gave me so much to think about. Noel Cressie showed his group's work modeling uncertainty in a North American regional climate change model: summer is going to be hotter in the North American south, and winter is going to be warmer in the Canadian north, no matter how you slice it. Jack Dongarra gave a riveting talk on the future of supercomputing: he walked us through the building of a supercomputer from an individual core, and made clear the power, software and hardware requirements of these machines. Doug Richardson presented his high level perspective on GIS and health; he and the AAG have been working hard to make geoinformatics more evident in public health research through workshops, grants and tireless lobbying.

Also a great treat was my visit with Desheng Liu, former lab member, who is now Associate Professor of Geography and Statistics at Ohio State University. We spent some time walking around the lovely campus and catching up. I also got to visit, very briefly, the Thurber House, home of one of my favs, James Thurber, who went to OSU and lived in Columbus. Great stuff!

As for our workshop, here are the key items the participants were interested in (in order of popularity): terminology, the future of geobia, integration with GIS, semantics, accuracy, change, standards, learning from the past.

ASPRS 2012 Wrap-up

ASPRS 2012, held in Sacramento, California, had about 1,100 participants. I am back to being bullish about our organization, as I now recognize that ASPRS is the only place in geospatial sciences where members of government, industry, and academia can meet, discuss, and network in a meaningful way. I saw a number of great talks, met with some energetic and informative industry reps, and got to catch up with old friends. Some highlights: Wednesday's Keynote speaker was David Thau from Google Earth Engine, whose talk "Terapixels for Everyone" was designed to showcase the ways in which the public's awareness of imagery, and their ability to interact with geospatial data, are increasing. He calls this phenomenon (and GEE plays a big role here) "geo-literacy for all", and discussed new technologies for data/imagery acquisition, processing, and dissemination to a broad public(s) that can include policy makers, land managers, and scientists. USGS's Ken Hudnut was Thursday's Keynote, and he had a sobering message about California earthquakes, and the need for (and use of) geospatial intelligence in disaster preparedness.

Berkeley was well represented: Kevin and Brian from the GIF gave a great workshop on open source web, Kevin presented new developments in Cal-Adapt, Lisa and Iryna presented chapters from their respective dissertations, both relating to wetlands, and our SNAMP lidar session with Sam, Marek, and Feng (with Wenkai and Jacob from UC Merced) was just great!

So, what is in the future for remote sensing/geospatial analysis as told at ASPRS 2012? Here are some highlights:

  • Cloud computing, massive datasets, data/imagery fusion are everywhere, but principles in basic photogrammetry should still come into play;
  • We saw neat examples of scientific visualization, including smooth rendering across scales, fast transformations, and immersive web;
  • Evolving, scalable algorithms for regional or global classification and/or change detection; for real-time results rendering with interactive (on-the-fly) algorithm parameter adjustment; and often involving open source, machine learning;
  • Geospatial data and analysis are heavily, but inconsistently, deployed throughout the US for disaster response;
  • Landsat 8 goes up in January (party anyone?) and USGS/NASA are looking for other novel partnerships to extend the Landsat lifespan beyond that;
  • Lidar is still big: with new deployable and cheaper sensors like FLASH lidar on the one hand, and increasing point density on the other;
  • Obia, obia, obia! We organized a nice series of obia talks, and saw some great presentations on accuracy, lidar+optical fusion, object movements; but thorny issues about segmentation accuracy and object ontology remain; 
  • Public interaction with imagery and data is critical. The Public can be a broader scientific community, or an informed and engaged community who can presumably use these types of data to support public policy engagement, disaster preparedness and response.

AAG 2012 Wrap-up

AAG was a moderately large conference (just under 9,000) this year, held in midtown NY. It was a brief trip for me, but I did go to some great talks across RS, GIScience, cartography, and VGI. I also went to a very productive OpenGeoSuite workshop hosted by OpenGeo. Some brief highlights from the conference: Muki Haklay discussed participation inequities in VGI: when you mine geoweb data, you are mining outliers, not society; there are biases in gender, education, age and enthusiasm. Agent-based modeling is still hot, and still improving. I saw some great talks on ABM for understanding land use change. Peter Deadman showed how new markets in a hot crop (like Acai) can transform a region quite quickly. Landsat 8 will likely be launched in early 2013, but further missions are less certain. My talk was in a historical ecology session, and Qinghua Guo and I highlighted some of the new modeled results of historic oak diversity in California using VTM data and Maxent.

Saturday evening I had the great pleasure of being locked in after hours at the NY Public Library for a session on historic maps. David Rumsey, with Humphrey Southall (University of Portsmouth) and Petr Pridal (Moravian Library) led a presentation introducing a new website: oldmapsonline.org. The website's goal is to provide a clearer way to find old maps, and provide them with a stable digital reference.