Reproducible Analysis of Scientific Assets

Project Description

The DERIVA platform is an asset management system designed specifically to support the scientific lifecycle, from data acquisition through experiments to publication. DERIVA comprises several components: object storage, management of structured relational data, and a web interface for exploring data. DERIVA is used extensively in biomedical research and underlies the data management systems of the FaceBase, GUDMAP, and (Re)Building a Kidney consortia.

In this project the student will explore integrating DERIVA with Whole Tale to enable reproducible analysis of scientific assets (objects) and the metadata describing them. The student will first develop a Tale that analyzes data stored in DERIVA using deriva-py (DERIVA's Python SDK). They may choose any of the many DERIVA catalogs for this analysis. They will then develop code for the Whole Tale platform to import data from DERIVA. DERIVA relies on Big Data Bags (BDBags) to package and export multi-asset datasets (assets together with their metadata). The student will also develop a JavaScript bookmarklet that lets users select datasets in DERIVA for analysis. Depending on the results, the student may revise the Whole Tale Tale specification.
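A BDBag is a BagIt bag that can reference remote files through a fetch.txt file. Each line of fetch.txt gives a URL, a length in bytes (or "-" if unknown), and a payload path under data/. The following is a minimal Python sketch of one step a Whole Tale importer might perform: parsing fetch.txt contents into entries that could then be resolved and downloaded. It assumes only the BagIt fetch.txt format (RFC 8493) and does not use or represent any existing Whole Tale or DERIVA API. The names FetchEntry and parse_fetch_txt are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FetchEntry:
    """One remote file referenced by a bag's fetch.txt (hypothetical type)."""
    url: str
    length: Optional[int]  # None when the size is given as "-"
    path: str              # payload path relative to the bag root


def parse_fetch_txt(text: str) -> List[FetchEntry]:
    """Parse fetch.txt contents; each line is 'URL LENGTH FILEPATH'.

    Sketch only: percent-encoded characters in FILEPATH are not decoded.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split at most twice so a FILEPATH containing spaces stays intact.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'URL LENGTH FILEPATH'")
        url, length, path = parts
        # Fetched files must be payload files, i.e. live under data/.
        if not path.startswith("data/"):
            raise ValueError(f"line {lineno}: path must be under data/")
        size = None if length == "-" else int(length)
        entries.append(FetchEntry(url, size, path))
    return entries


if __name__ == "__main__":
    # Hypothetical fetch.txt contents; example.org URLs are placeholders.
    sample = (
        "https://example.org/assets/a.csv 1024 data/a.csv\n"
        "https://example.org/assets/b.txt - data/assets/b file.txt\n"
    )
    for entry in parse_fetch_txt(sample):
        print(entry.path, entry.length, entry.url)
```

Splitting each line at most twice keeps payload paths that contain spaces intact. A real importer would also need to verify downloaded files against the bag's checksum manifests, which this sketch leaves out.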

Necessary Prerequisites:

Desirable Skills / Qualifications:

Expected Outcomes:

Primary Mentors: Victoria Stodden, Kyle Chard, Carl Kesselman