Unlocking America's Scientific Potential through the Power of Data
Hello, I’m Ylli Bajraktari, CEO of the Special Competitive Studies Project. In this week’s edition of our newsletter, SCSP’s Karina Barao explores how the U.S. can unlock the full potential of its scientific research capabilities through better data sharing practices. For more details, read SCSP’s newly released white paper, Data's Role in Unlocking Scientific Potential.
🎤 Exciting Announcements
We are excited to share that Dr. Liz Reynolds, Peter Haas, Dr. Steve Chien, and Dr. Rob Atkinson will speak at our upcoming #AIRoboticsSummit24 on October 23rd in Washington, DC. Sign up for our waitlist here!
Director of National Intelligence Avril Haines will join Ylli Bajraktari for a fireside chat at The Ash Carter Exchange in Boston, MA on November 1st. Join the waitlist here!
SCSP x AGI House Hackathon: SCSP is partnering with the Bay Area AI hacker house, AGI House, to host an AI Agents for Gov Hackathon at SCSP’s office in DC! Come create the future of AI agents to solve important real-world challenges. Request attendance here.
Unlocking America's Scientific Potential through the Power of Data
In December 2022, SCSP released a National Data Action Plan. That strategy lays out how the United States can leverage its vast data assets as a competitive advantage to drive economic growth and societal benefits. The plan underscores that the United States possesses a wealth of data resources, yet the challenge lies in using these assets effectively. It maintains that the United States, as a regulator, holder of valuable data, and convener of public-private partnerships (PPPs), must take a coordinated and strategic approach to its data advantage.
Since that report, the data ecosystem has evolved rapidly. Data centers are being built out at lightning speed, and new multimodal AI models are ingesting more data than ever before. There's even discussion about the potential for internet data to run out. At the same time, states continue to enact new data privacy laws, while cloud adoption accelerates, reshaping how data is stored and accessed. These shifts underscore the need for sustained momentum to stay ahead in a rapidly changing data landscape.
Our nation's capacity for groundbreaking research is immense, built on a foundation of unparalleled data assets from the public and private sectors. Despite this wealth of data, a critical barrier remains: our scientific research processes are siloed. Researchers work within their domains, testing hypotheses and sharing only a fraction of their experimental data in publications. This approach limits the breadth of insights that could be gained from a more open, collaborative environment for sharing data.
To accelerate scientific discoveries and the latest tech advancements, the United States must foster “community scientific discovery” that brings together often fragmented scientific research processes. One necessary component of this community approach is sharing and jointly analyzing diverse datasets that are often siloed across entities. Increased access to relevant data will improve scientific hypothesis generation, accelerate the hypothesis-to-experimentation loop, and enable faster deployment of scientific achievements. Robust data sharing mechanisms enable researchers to leverage a global network of insights, paving the way for unprecedented scientific progress.
What can the U.S. Government do?
In our newly released paper, Data's Role in Unlocking Scientific Potential, we outline two actionable steps the U.S. government can take immediately to address the data sharing challenges hindering scientific research.
1. Create Comprehensive Data Inventories Across Scientific Domains
We recommend that the Secretary of Commerce, acting through the Department of Commerce's Chief Data Officer and the Director of the National Institute of Standards and Technology (NIST), and working with the Federal Chief Data Officer Council (CDO Council), create a government-led inventory where organizations – universities, industries, and research institutes – can catalog their datasets with key details like purpose, description, and accreditation. Similar to platforms like data.gov, this centralized repository would make high-quality data more visible and accessible, promoting scientific collaboration. To boost participation, the government could offer incentives, such as grants or citation credits for researchers whose data is used. Contributing organizations would also be responsible for regularly updating their entries, ensuring the data stays relevant and searchable.
2. Create Scientific Data Sharing Public-Private Partnerships
A critical recommendation of the National Data Action Plan was for the United States to facilitate the creation of data sharing public-private partnerships for specific sectors. The U.S. Government should coordinate data sharing partnerships with its departments and agencies, industry, academia, and civil society. Data collected by one entity can be tremendously valuable to others. But incentivizing data sharing is challenging, as privacy, security, legal (e.g., liability), and intellectual property (IP) concerns can limit willingness to share. Narrowly scoped PPPs, however, can help overcome these barriers, allowing for greater data sharing and mutually beneficial data use.
Here's what it would look like:
To break the silos of scientific research, we recommend the above government entities establish dedicated repositories for scientific and experimental data across various disciplines. This initiative would be executed through public-private partnerships that would focus on priority scientific domains like materials science, biology, chemistry, and computer science, with each PPP assigned specific scientific challenges to tackle. Proper agreements and controls surrounding privacy, data security, and democratizing access for small and medium-sized enterprises (SMEs) and researchers will be integral.
The United States, acting as a broker of the PPP, can ensure that all data shared adheres to standards guaranteeing quality and interoperability. By aligning with data-sharing standards derived from relevant industry, academic, and international frameworks, these PPPs can enhance functional data sharing, integration, and collaboration among researchers. Existing PPP models demonstrate the potential for such partnerships to aggregate private and public sector data in a controlled and trusted manner. Successful data-sharing PPPs typically focus on addressing a discrete, urgent problem or opportunity, providing a clear rationale for participant engagement. They operate as independent entities, with guardrails on data access and use, ensuring privacy and security to maintain trust.
Addressing data-sharing challenges within PPPs requires infrastructure that facilitates collaboration while ensuring standardization, privacy, security, and IP protection. Strengthening this infrastructure is crucial to building trust among stakeholders and enabling a secure, efficient data-sharing ecosystem. We've determined six components necessary to build a successful data sharing PPP. They include:
Develop Clear Data Standards for Interoperability: PPP stakeholders should establish standards that ensure the quality and accessibility of shared data. These standards should promote rapid and open data exchange, supporting the creation of a more collaborative research environment.
Increase Access to U.S. Government and National Laboratories’ Data: Leveraging the valuable datasets held by the U.S. government and national laboratories will enrich the research infrastructure, providing scientists with the resources needed for groundbreaking discoveries.
Incentivize the Use of Privacy-Enhancing Technologies: Integrating privacy-enhancing technologies (PETs) into data sharing partnerships will protect sensitive information while promoting trust among stakeholders.
Explore IP Rights for Data: Current IP laws do not adequately protect valuable datasets, which can discourage sharing within the scientific community. The United States should explore the utility of tailored IP or IP-type protections for data through a comprehensive study to safeguard these assets while fostering a culture of collaboration.
Utilize Synthetic Data as a Vital Research Tool: Synthetic data can serve as a proxy when real-world data is incomplete or difficult to obtain. Generating simulated experimental data can mitigate privacy concerns and provide additional insights, particularly in fields where data scarcity is a challenge. Leveraging synthetic data alongside real-world datasets can be a cutting-edge asset in scientific research.
PPPs around data can accelerate and unlock new discoveries in science, further expanding our innovation potential – a critical factor in maintaining U.S. competitiveness on the global stage. Integrating U.S. Government-led data PPPs into the scientific community will not only advance research in fields like biology, chemistry, and materials science, but also fuel the innovation required to tackle the nation's most pressing scientific challenges in the years ahead. SCSP is committed to addressing these future challenges and is exploring ways to implement these recommendations.