Data science for the 21st century: building a new team of researchers

Berkeley has received one of eight new awards from the National Science Foundation's recently launched NSF Research Traineeship (NRT) program. These programs develop innovative approaches to graduate training. Approaches used across the funded projects include industry internships, international experiences, citizen science engagement, interdisciplinary team projects, and training in communication with the media, policy makers, and the general public.

Our program at UC Berkeley is called Data Science for the 21st Century: DS421. Three Grand Challenges motivate our program:

  1. Data: data acquisition, assimilation, and analysis, and the resulting challenges and opportunities for the research community and society at large. The data revolution is a potentially disruptive advance that challenges the norms and traditions of scientific research. Data science is an opportunity, entailing a revolution in training and a reorientation of research priorities. Open science (open access to datasets, literature, scripted workflows, and the like) is a fundamental transformation that integrates scientific publication with the underlying data, analysis, and reasoning, using metadata and machine-readable research products to facilitate a semantic web of knowledge. These practices will make our research reproducible and transparent, documenting the evidentiary basis for scientific conclusions and their implications for policy.
  2. System dynamics: coupled human-natural systems and their responses to rapid environmental change. Social-ecological systems display a complex array of ecological and social processes interconnected across broad spatial, temporal, and socio-political scales. Our current approach to understanding ecological and economic systems is dominated by partial equilibrium models that are poorly suited to the dynamics of rapidly changing systems. Important research avenues include: characterizing the dynamics and feedbacks among and within systems to better plan for cross-scale and nonlinear uncertainties; identifying the proximity of tipping points or other critical transitions; understanding how the spatial structure of interactions affects system dynamics; and detecting and attributing responses to environmental and climatic drivers. Real-time data analytics combined with long-term monitoring and forecasting are critical tools to address these challenges.
  3. Action: evidence-based proposals in public policy, natural resource management, and environmental design to mitigate the impacts of rapid environmental change, and enhance societal resilience and sustainability. Effective decision-making depends on networks of diverse stakeholders, with rapid feedback between individuals and groups to evaluate the impact, efficiency, equity, and efficacy of policy and management actions. This third component is at the core of a practical data science ethic critical for translating science to societal benefit, and makes use of our partnerships with academic, private, governmental, and non-governmental organizations.

Cutting across these challenges, all students, and especially those engaged in interdisciplinary research, need excellent communication skills and the ability to adjust content and style to reach their audiences.

Welcome to the new cohort!