07 Jun Data or it didn’t happen: Launching an institutional data repository at the U
By Rebekah Cummings
Reproducibility. Transparency. Sharing. These words represent the gold standard in academic research, yet none of them are as ubiquitous as one might imagine or hope. At the University of Utah, a team of librarians and technologists from the Marriott, Eccles Health Sciences, and Faust Law libraries – as well as staff from the Center for High Performance Computing – are hoping to bring research at the U one step closer to this standard by creating an institutional data repository, The Hive, that is free and accessible to all researchers on campus.
The Hive has been years in the making, dating back to the 2011 National Science Foundation mandate that all researchers who receive NSF funding must have a plan for managing and sharing their research data. Since NSF’s policies on data management and sharing were implemented, over 30 federal funding agencies and several private foundations have adopted similar standards. Several journals, such as PLOS and Nature, also require data sharing as a condition of publication. While many researchers support this move towards transparency and openness, the unfunded mandates to share data have also caused a non-trivial amount of consternation among the research community. How and where should data be shared? Which data? And, perhaps most importantly, who pays?
For the past six years, librarians at the U have worked closely with researchers to help them create competitive data management plans for funding agencies and locate appropriate data repositories through data repository registries like re3data.org. Most librarians and researchers believe that the best home for research data is an appropriate subject-based repository, such as ICPSR for social science data or GenBank, NIH’s genetic sequence database. For many researchers, however, there is neither an obvious home for their data nor funding to support long-term data curation costs. In fact, in a 2016 campus-wide survey of researchers at the U, only 20% were familiar with a subject-based repository in their field.
In response to this ongoing challenge, the University of Utah Libraries have created The Hive, an institutional data repository where researchers can deposit data at the end of their project, work with librarians to create documentation that helps others understand their data, and receive a Digital Object Identifier (DOI) to link their data to their publications. In addition to fulfilling grant requirements, publishing data enhances the usefulness of research findings, allows others to verify or build on existing research, and even provides training datasets for students. Each dataset in The Hive will have a suggested data citation so researchers get credit for their data and can treat their data as a legitimate output of research. A growing body of research even shows that articles published alongside open data have higher citation counts than those that are not (Dorch, 2012; Piwowar, Day, & Fridsma, 2007; Piwowar & Vision, 2013).
This summer the University Libraries are launching a pilot of The Hive by soliciting a select number of datasets to test the system and standardize the ingestion process. Pilot datasets must be smaller than 500 GB, complete, publicly available, and authored by a member of the University of Utah community. Confidential and sensitive data will not be accepted into The Hive. If you are interested in learning more, please contact Rebekah Cummings or Daureen Nesdill!