Background: Researchers require specific tools, software, and infrastructure to meet the technical, legal, and ethical requirements for the analysis, storage, and sharing of their data, particularly data from human subjects research. While using tools designed to meet specific needs allows for specialized support, it also increases the possibility of information silos; researchers are unlikely to know about data stored elsewhere in the institution in solutions built for those particular use cases. To address this issue, the library at an academic medical center uses a data catalog to make data visible across the various repositories, services, and platforms available at the institution. This supports an information environment with resources that can be geared towards specific use cases, while preventing information silos.
Description: To leverage the library's data catalog to address data discoverability and access issues at the institution, the library (1) established collaborations with academic and operational departments across the institution, (2) developed in-depth knowledge of the tools used in those departments, and (3) developed workflows to connect data analysis and storage tools with the data catalog. Due to the specialized nature of many of these data analysis and storage tools, bespoke workflows were developed to integrate each of them with the library’s data catalog.
Collaborations with Information Technology (IT), the Shared Scientific Cores, the Office of Science and Research, and a clinical data management support service were established through outreach by data librarians and library leadership. For each collaboration, librarians attended relevant meetings to determine which data analysis and storage tools were suitable candidates for integration with the library’s data catalog.
Once a tool was determined to be a suitable candidate for integration, librarians created workflows that supported the tool’s technical specifications and interfaced with the pre-existing workflows for describing data in the library’s data catalog. Each workflow focused on easing researcher tasks like sharing and finding data while bringing together disparate pieces of infrastructure, technology, expertise, and staff.
Conclusion: To determine the success of each integration, data on the number of datasets described, the number of datasets shared, and the number of successful research outputs related to a shared dataset will be tracked. Additionally, the library will compare the data catalog's rate of growth under this systems-level approach with its rate of growth when datasets were added in a piecemeal fashion. The library will also track secondary benefits of the approach, including the creation of established collaborations across the institution.
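One way the growth-rate comparison could be computed is sketched below. This is a minimal illustration, not the library's actual method: the function name `datasets_per_month`, the choice of a per-month average, and all counts and dates are hypothetical placeholders rather than results from the catalog.

```python
from datetime import date


def datasets_per_month(count_start, count_end, start, end):
    """Average number of datasets added per month between two dates."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if months <= 0:
        raise ValueError("end must be at least one month after start")
    return (count_end - count_start) / months


# Hypothetical figures for illustration only; not data from the catalog.
piecemeal = datasets_per_month(40, 64, date(2020, 1, 1), date(2021, 1, 1))
systems = datasets_per_month(64, 124, date(2021, 1, 1), date(2022, 1, 1))

# A ratio above 1 would indicate faster growth under the systems-level approach.
ratio = systems / piecemeal
print(f"piecemeal: {piecemeal:.1f}/month, systems-level: {systems:.1f}/month, ratio: {ratio:.2f}")
```

A per-month average is only one possible measure; the same comparison could equally be made on datasets shared or on research outputs linked to a shared dataset.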