The Whole Tale is inviting Internship applications for 2017!
Scholarly publications are often disconnected from the underlying data and code that were used to produce the findings. There is no shortage of tools and cyberinfrastructure (CI) addressing specific aspects of this challenge, yet scientists find it difficult to use these different pieces and building blocks in a seamless way that spans the “whole story”, i.e., from conducting the computational science to the publication of a “living” or executable paper. These new types of publications include not only the science narrative, but also (references to) all the relevant data, code, and provenance information needed to reproduce and experience the computational and research processes described by the paper.
Whole Tale: Merging Science and Cyberinfrastructure Pathways is an NSF-funded project that will enable researchers to examine, transform and then seamlessly re-publish research data that was used in an article. As a result, these "living articles" enable new discovery by allowing researchers to construct representations and syntheses of data. Whole Tale is led by the University of Illinois, Urbana-Champaign, in collaboration with partners at the University of Chicago, the University of Texas at Austin, the University of California, Santa Barbara, and the University of Notre Dame. We are pleased to announce the availability of summer research internships for undergraduates, graduate students and recent postgraduates!
Some allowance will be made for students who are unavailable during these dates due to their school calendar.
Interns undertake a 9-week program of work centered around one of the projects listed below. Each intern will be paired with one or more mentors. Interns need not necessarily be at the same location or institution as any of their mentors. Interns and mentors are expected to have a virtual or face-to-face meeting at the beginning of the summer and to maintain frequent communication throughout the program. Interns are also required to keep an online notebook.
The program is open to undergraduate students, graduate students, and postgraduates who have received their degree within the past five years. There are no restrictions on academic background or field of study (but see details below). Interns must be at least 18 years of age by the program start date, must be currently enrolled or employed at a U.S. university or other research institution, and must currently reside in, and be eligible to work in, the United States. Interns are expected to be available approximately 40 hours/week during the internship period, with significant availability during normal business hours.
Interns will receive a stipend of $5,000 for participation, paid in two installments (one at the midterm and one at the conclusion of the program). In addition, if travel to a project meeting is necessary, required travel expenses will be borne by the Whole Tale project. Participation in the program after the midterm is contingent on satisfactory performance. The University of Illinois, Urbana-Champaign will administer funds. Interns will need to supply their own computing equipment and internet connection. For students who are not US citizens or permanent residents, complete visa information will be required, and it may be necessary for the funds to be paid through the student’s university or research institution. In such cases, the student will need to provide the necessary contact information for their organization.
Required application materials include: 1) a resume that includes educational history, current position, any publications or honors, and full contact information (including phone number, e-mail address, and mailing address); 2) a cover letter identifying the project you are interested in, the contributions you expect to make to the project, relevant background, value of the internship program to your career objectives and your approach to meeting the project deliverables; and optionally: 3) a letter of reference.
Applications must be completed and submitted no later than April 14th (CLOSED). Links to the application forms are provided below. Applicants are encouraged to provide a letter of reference.
https://goo.gl/forms/vps6VK6EndKVek8w2.
Applications will be judged by the following criteria:
Whole Tale is predicated on openness and universal access. Software is developed under one of several open source licenses, and copyrightable content produced during the course of the project will be made available under a Creative Commons (CC-BY 3.0) license. Where appropriate, projects may result in published articles and conference presentations, on which the intern is expected to make a substantive contribution and receive credit for that contribution.
Summer Internships are supported by National Science Foundation Award 1541450.
If you have questions about, or problems with, the application process or the internship program in general, please e-mail wholetale-interns-2017@googlegroups.com.
Primary Mentor: Kyle Chard (University of Chicago)
Additional Mentor: Ben Blaiszik/Logan Ward (University of Chicago / Argonne National Laboratory)
Necessary Prerequisites:
Desirable Skills / Qualifications:
Expected Outcomes:
Project Description:
There is a vast amount of materials science data (computational or experimental) available in repositories such as the Materials Data Facility (MDF), Materials Project, AFLOWLib, Citrination, and NOMAD, among others. While much of this data has been analyzed in isolation, little has been used collectively to develop new models and drive new discovery. In this project, the student will develop methods for accessing data contained within the MDF and other repositories directly from within Jupyter notebooks. Such a tool will enable better reproducibility and easier re-use for the increasingly common machine learning models created from materials data.
After developing a model to access data, the student will work to recreate several machine learning methods from the literature to explore these datasets, and then branch out to applying these tools to new data. For example, the student will implement models that determine whether metallic alloys can be formed as glasses (see figure below). Having implemented these models and reproduced published results, the student will apply the underlying model to a wider range of data contained in the MDF. At this point, there are several possible routes for continuing the project, including benchmarking several machine learning methods using the same datasets or exploring how best to mix different data sources to train a single model. As a result of this work, the student will publish the resulting notebooks, models, and derived data in the MDF such that others can discover, reproduce, and build upon their work.
Ability of different alloy compositions to form a metallic glass as (a) measured experimentally and (b) predicted with a machine learning model. Without being provided any of the experimental data shown in (a), the model correctly identifies the locations of the two glass-forming regions in the Al-Ni-Zr alloy system. Ref: Ward et al. npj Comput. Mater. (2016), 16028.
Primary Mentor: Timothy McPhillips (UIUC: University of Illinois, Urbana-Champaign)
Additional Mentor: Kyle Bocinsky (Washington State University), Bertram Ludaescher (UIUC)
Necessary Prerequisites:
Desirable Skills / Qualifications:
Expected Outcomes:
Project Description:
The goal of this project is to create an integrated representation of all dimensions of the provenance of the results of a particular scientific study. The study will employ PaleoCAR [BK14] to reconstruct environmental conditions of a particular paleoenvironment.
The elements of the resulting comprehensive provenance model of this PaleoCAR study will include: (1) graphical, queryable representations of each of the computational workflows enacted as part of the research and corresponding to the prospective provenance of all data products generated during the study; (2) the retrospective provenance of each such intermediate and final data product, complete with records of the specific program executions involved, the values of program arguments applied, and, where possible, the values of key variables within the programs themselves as exposed by YesWorkflow [YW]; (3) the provenance of all data used by the study but obtained from sources outside the study, including public data repositories; (4) the intellectual and scholarly lineage of the scientific, computational, and statistical methods employed in the study; (5) the provenance and dependencies of all software programs, libraries, and components used in the study; and (6) the network of connections between the preceding five categories of provenance, e.g. the chains of citations to scientific literature reporting the invention, evaluation, and application of the methods and software components used in the study.
We expect that this project will yield insights into what new tools researchers need, and what new capabilities must be added to the tools and environments they already use, if we are to make it easy for them to present their studies in the context of the rich, integrated provenance this project will illustrate.
References:
Primary Mentor: Peter Darch (University of Illinois at Urbana-Champaign)
Additional Mentor: Victoria Stodden (University of Illinois at Urbana-Champaign)
Necessary Prerequisites:
This project is suitable for students in, or graduates of, library and information science degree programs.
Desirable Skills / Qualifications:
Expected Outcomes:
Project Description:
The task of building infrastructure that supports and promotes reproducible science involves addressing multiple challenges. One major challenge involves understanding the existing practices and requirements of the domain researchers for whom the infrastructure will be built. Developing this understanding is critical for: identifying where infrastructure development is best targeted; specifying what features should be incorporated into this infrastructure; and ensuring that researchers are able to integrate this infrastructure easily into their work practices.
This project will involve studying a team of researchers from one of the Whole Tale Science Pathways domains (astronomy, archaeology, materials science, biology/genomics, social science, disaster resilience). The particular domain will be selected to closely match your own interests and background.
You will work with the research team to learn about their work and information practices. These practices include: how they produce, manage, and use data and software; whether and how they make their data and software accessible to others; and their associated record-keeping practices. Your work will involve interviewing team members about their practices, and observing them at work.
After identifying the team’s existing information practices, you will explore what infrastructure and practices could make this team’s work more reproducible. In addition, you may also identify particular barriers and challenges to introducing new infrastructure and practices into the work practices of the team.
Based on your findings, you will formulate a series of use cases. These use cases will represent your findings to software engineers who are building infrastructure to promote reproducible research.