On September 11, 2019, I attended a one-day workshop in Washington, DC, titled “Implementing FAIR Data for People and Machines: Impacts and Implications.” The attendees were data and information professionals, IT staff, and disciplinary scientists working towards data sharing that is “FAIR” – findable, accessible, interoperable, and reusable. The amount of information shared at this workshop was vast and varied, but a few themes recurred throughout the talks and panels.
Machines are the Future
There is more data in the world today than any human can possibly collect and process. The goal of making data FAIR is to let machines – computers with algorithms and the capability to “learn” – do this work and help further scientific research. This means that the data researchers are collecting today must be machine-readable. How do we achieve this? By standardizing metadata across disciplines, using centralized repositories for data, and making data available for longer periods of time.
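To make "standardized metadata" a little more concrete, here is a minimal sketch of what a machine-readable description of a data set might look like. It borrows field names from the schema.org Dataset vocabulary, which is one common choice and not something the workshop prescribed. The list of required fields, the helper functions, and every value in the example record are assumptions made up for illustration.

```python
import json

# Assumed minimal set of fields; real repositories define their own requirements.
REQUIRED_FIELDS = ("@type", "name", "description", "identifier", "license")


def build_dataset_record(name, description, identifier, license_url, keywords=()):
    """Return a schema.org-style Dataset description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
        "keywords": list(keywords),
    }


def missing_fields(record):
    """List the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = build_dataset_record(
    name="Example stream temperature readings",  # hypothetical data set
    description="Hourly water temperature from a made-up monitoring site.",
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["hydrology", "temperature"],
)

# Serialize to JSON so that software, not just people, can read the description.
print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```

A record like this can be checked automatically before deposit, which is the kind of task that becomes possible once metadata follows a shared structure.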
Infrastructure is Worth the Investment
To make data FAIR, it needs to be stored in some type of infrastructure, whether that is a local server or a cloud storage environment. Institutions need to anticipate that storage will take up a significant share of the budget, and those writing grant proposals need to include the cost of data storage when requesting funds. Data stewardship plans should specify where and for how long data will be stored, and that should drive budget conversations. Building relationships with vendors can significantly reduce costs, and cloud providers may offer reduced rates to institutions and researchers that use cloud environments for research purposes.
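As a rough illustration of how a stewardship plan can drive a budget conversation, the sketch below multiplies data volume, a monthly price, and a retention period. The function name and all of the numbers are hypothetical placeholders, not real vendor rates, and the model ignores costs such as data transfer, backups, and staff time.

```python
def estimate_storage_cost(size_tb, price_per_tb_month, retention_years, annual_growth=0.0):
    """Estimate total storage cost over a retention period.

    The stored volume grows by `annual_growth` (a fraction, e.g. 0.1 for 10%)
    at the end of each year. All inputs are assumptions supplied by the caller.
    """
    total = 0.0
    current_size = size_tb
    for _ in range(retention_years):
        total += current_size * price_per_tb_month * 12  # one year of storage
        current_size *= 1 + annual_growth
    return total


# Placeholder inputs: 5 TB kept for 10 years at an assumed $20 per TB per month.
cost = estimate_storage_cost(size_tb=5, price_per_tb_month=20.0, retention_years=10)
print(f"Estimated cost: ${cost:,.2f}")
```

Even a simple estimate like this gives grant writers a number to start from when requesting funds.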
Is FAIR Achievable for All Types of Data?
Can all research data be digitized? Many researchers work with physical specimens that aren’t digital in nature, which makes it difficult to make the data FAIR. The sheer volume of objects that exist only in physical form is staggering, and while there are groups working to digitize them, the task is expensive and daunting. This will continue to be a stretch goal for certain scientific disciplines.
Data Management Practices
I had several side discussions with attendees about how to manage data and data sets. Common questions included how to secure data sets, how to provide access for research groups, how to support researchers, and how to share data outside of institutions. While Northwestern University tackles many of these questions with basic services for the research community, many individuals at smaller institutions felt that they didn’t have enough support for their data management plans. In addition, data management plans often cannot be edited during the grant period, so researchers felt “stuck” with their original plans, even if those plans weren’t the best option for their data. I encouraged researchers to think more about security, to reach out to local IT staff at their institution, and to consider consulting with the library or collaborating with other researchers to share scarce resources.
I walked away from this workshop feeling excited about the challenges ahead in making data FAIR, and looking forward to more opportunities to work with the scientific research community in managing and sharing their data with the world.
For more information on the FAIR initiative, visit the GO FAIR website.