Supporting Researchers
Last updated on 2024-01-16
Overview
Questions
- How does a data interview compare to a reference interview?
- What are common researcher questions about the DMP process?
Objectives
- Identify the difference between a data interview and a reference interview
- Construct questions for a data interview
- Answer common researcher questions about the DMP process
Introduction
So far, we have covered what a data management plan looks like, what each subsection contains, and where to locate resources that connect to the researcher’s specific need. In this section, we pivot from learning about DMPs into how to apply this knowledge when serving patrons. We will provide insights into common questions and concerns researchers have about the DMP process, and describe strategies on how to effectively conduct a data interview.
Data Interview
According to the NNLM’s data glossary, “a data interview in the library context refers to an interaction between a librarian and a researcher with a structured or semi-structured set of questions designed to elicit information about the researcher’s data practices and/or needs.” This process is essentially a specialized subcategory of the reference interview, and is a good first step in helping a researcher prepare a DMP.
Just as the reference interview begins by establishing a background purpose (“what is this information being used for”), you might want to begin broadly by asking researchers about their project and its purpose. This can help you formulate follow-up questions that draw out the researcher’s needs.
Challenge
Let’s say that the researcher responds with “I am running a project about the impact of pets on the emotional well-being of children” to the question “what is the purpose of your research”.
What has this short response told you about the researcher’s project and needs?
Children are the subject of the research, so it is a human subjects study. This means that the researcher will probably need additional accommodations if they want to share their dataset. Children are also a protected population, which needs to be taken into account when deciding whether the data should be shared. Based on the information provided, it is unclear what format the data will be in (Observational notes? Quantitative questionnaire responses? Qualitative interviews? Other?), so this will need to be explored further during the conversation.
Even short responses can give you an idea of who/what is the subject of the research, how sensitive this data may be, and the potential formats of the data. Even with this short response, you have already found out this is a human subjects study, and that this researcher will need additional accommodations if they want to share their dataset. Like in a reference interview, it is useful to paraphrase the project back to the researcher and ask clarifying questions to make sure you have a good grasp of the research purpose.
After establishing the purpose of the project, it is helpful to ask about where the researcher is applying for grant funding and their timeline for submitting materials. Researchers who are not applying for grant funding can still benefit from writing a data management plan, and these questions can help them consider their project needs and what workflows need to be put into place.
Next, we move on to follow-up questions that relate to the DMP sections. Like in a reference interview, these questions move from open-ended to closed, specific questions to clarify needs not raised by the researcher. Use the purpose of the project to inform your questions, and help the researcher think through their workflow and needs. Sometimes the researcher will answer “I am not sure”. This is an opportunity to explore what they think they will do, and to provide some options as to how they may proceed. Remind the researcher that a DMP can (and should) be updated as necessary to better align with their procedures as the project evolves.
Follow-up questions on data description and size
Let’s start with the first DMP section “Data Description and Format”.
Challenge
What information should this section contain?
This section of a DMP provides a brief description of what data will be collected as part of the research project and their formats. Information about typical file size (MB / GB per file) and the estimated total number of files can be helpful.
To summarize, we are looking for WHAT data will be generated and HOW MUCH data will be generated, both in terms of size and quantity.
What sort of questions can we ask to get at this information?
- What is your target sample size?
- How are you collecting data?
- Are you using structured questionnaires or interviews?
- Are you interacting with subjects directly or indirectly?
- Talking with the subject
- Talking to their guardians or a third party
- Recording observation
- Are you taking video, audio recordings, or images?
- What devices are you using?
- For videos/audio recordings, how long are the recordings? In what format?
- Are you collecting data any other way?
- Scans
- Measurements
- Is the collected data in a physical format (such as on paper) or in a digital format (through a computer or other electronic device)?
- How often are you collecting information for each subject throughout the study?
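The answers to the questions above (sample size, collection frequency, recording format and length) can be combined into a rough storage estimate for the DMP. The following is a minimal sketch only: the function name, its parameters, and all of the numbers are hypothetical and chosen for illustration, and it assumes every file is roughly the same size.

```python
def estimate_storage(n_subjects, sessions_per_subject, files_per_session, mb_per_file):
    """Return (total_files, total_size_gb) for a planned study.

    Assumes every file is about the same size; 1 GB is treated as 1024 MB.
    """
    total_files = n_subjects * sessions_per_subject * files_per_session
    total_size_gb = total_files * mb_per_file / 1024
    return total_files, total_size_gb


# Hypothetical study: 50 children, 4 interviews each,
# one audio file per interview, about 256 MB per file.
files, size_gb = estimate_storage(
    n_subjects=50,
    sessions_per_subject=4,
    files_per_session=1,
    mb_per_file=256,
)
print(f"{files} files, about {size_gb:.1f} GB")
```

An estimate like this is only a starting point. The researcher may also need to account for backup copies, transcripts, and derived files.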
Follow-up questions on metadata and data standards
Challenge
What information should this section contain?
This section provides information about what standards will be used, giving context to the data generated for easier interpretation and reuse.
To summarize, we are looking for HOW data is documented, and how that documentation is standardized to make it easier to understand and reuse.
What sort of questions can we ask to get at this information?
- How are you documenting your variables?
- Are you using abbreviations that need defining?
- Does your data have units that need clarification?
- Are you using derived variables (variables obtained by combining or coding other variables)?
- Does your discipline have any requirements for how you should be describing your dataset?
- Are you using a set of words standard to your field (controlled vocabulary)?
- What minimum information would colleagues need to know to
- Recreate your research study?
- Recreate your analyses?
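One common way to record the answers to the questions above is a simple data dictionary (codebook) saved alongside the dataset. The sketch below is illustrative only: the variable names, definitions, and column headings are hypothetical and do not come from any particular metadata standard.

```python
import csv
import io

# Hypothetical variables, for illustration only.
codebook = [
    {"variable": "child_age", "definition": "Age of child at enrollment",
     "units": "years", "derived_from": ""},
    {"variable": "pet_type", "definition": "Type of pet in the household",
     "units": "", "derived_from": ""},
    {"variable": "wb_total", "definition": "Sum of well-being questionnaire items",
     "units": "points", "derived_from": "wb_item1; wb_item2; wb_item3"},
]

FIELDS = ["variable", "definition", "units", "derived_from"]


def codebook_to_csv(entries):
    """Serialize the codebook as CSV text, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def missing_definitions(entries):
    """List variables that still lack a definition."""
    return [entry["variable"] for entry in entries if not entry["definition"].strip()]


csv_text = codebook_to_csv(codebook)
print(csv_text)
print("Variables missing a definition:", missing_definitions(codebook))
```

If the researcher’s discipline has its own metadata standard or controlled vocabulary, that should take precedence over an ad hoc layout like this one.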
Follow-up questions on preservation and access
Challenge
What information should this section contain?
This section provides information about when data will be backed up, preserved, and published, as well as data security.
To summarize, we are looking for HOW data is secured as well as preserved for future access.
What sort of questions can we ask to get at this information?
- Where are you storing the paper copies of the questionnaires?
- Where are you storing the audio recordings of your interviews?
- Are you planning on transcribing your interviews?
- What software are you using to
- code your data (Excel, Google sheets, SPSS etc)?
- analyze your data?
- Have you considered file naming conventions or file structures to help you find your files more easily?
- If using proprietary software, are you planning on saving your files in an open format for sharing and long-term preservation?
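If a researcher has not yet considered a file naming convention, a concrete example can help. The sketch below shows one possible convention (project, subject ID, data type, collection date, version). The convention, the function name, and the example values are all hypothetical; a research group should adapt the elements and their order to its own workflow.

```python
import re
from datetime import date


def build_filename(project, subject_id, data_type, collected_on, version, extension):
    """Build a consistent, sortable file name from descriptive elements."""

    def clean(text):
        # Lowercase and replace spaces or special characters with hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [
        clean(project),
        f"s{subject_id:03d}",              # zero-padded so files sort in order
        clean(data_type),
        collected_on.strftime("%Y%m%d"),   # YYYYMMDD dates sort chronologically
        f"v{version:02d}",
    ]
    return "_".join(parts) + "." + extension.lower().lstrip(".")


# Hypothetical example values.
name = build_filename("Pets Study", 7, "interview audio", date(2024, 1, 16), 1, "wav")
print(name)  # pets-study_s007_interview-audio_20240116_v01.wav
```

Avoiding spaces and special characters, and padding numbers with zeros, helps keep file names readable across operating systems and software.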
Callout
Proprietary software is owned by an organization and requires a license or fee to access. Typically, this software generates file formats specific to it (such as Excel .xlsx), and these files might be difficult to open or manipulate using other software. Converting these data files into an open format, one that many pieces of software can read (for example, from .xlsx to .csv or .tsv), makes data more FAIR. For a list of open file formats, please see the resources on the references page.
Follow-up questions on access and reuse
Challenge
What information should this section contain?
This section provides information about where the data will be made publicly available, and includes a justification why the repository chosen will help with dissemination, preservation, and reuse.
To summarize, we are looking for WHERE data is stored long-term, and why it is the best choice for discovery, reuse and preservation.
What sort of questions can we ask to get at this information?
- Are you planning on sharing your data in the future?
- Do you have any obligations from your funder to share your data?
- Where are you planning on publishing your articles? Does the publisher have any data sharing requirements?
- If you are planning on sharing your data in the future, is data sharing explicitly addressed in the consent form?
- If you are planning on sharing data in the future, do you have a sense of where you want to deposit your research data when the time comes?
- Discuss repository options
- Do you need to de-identify or aggregate your data before you can share your data?
- Discuss embargoes, controlled vs open access
Follow-up questions on oversight
Challenge
What information should this section contain?
This section provides information about who is responsible for data oversight, which includes deciding how often or when actions such as backup, converting files to open formats, depositing the data into a repository, long-term preservation, and data destruction will occur.
To summarize, we are looking for WHO takes responsibility for the data during the project, in the short and long term, and ON WHAT timeline.
What sort of questions can we ask to get at this information?
- Who is coding your data? How are you maintaining accuracy?
- Data checks? Double entry? Controlled entry?
- Who is responsible for backing up your data? How often?
- Who is responsible for preserving your data long term?
- Who is responsible for depositing your data?
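To make the idea of double entry concrete when discussing accuracy with a researcher: the same records are keyed in twice, independently, and the two versions are compared so that disagreements can be checked against the original source. The following is a minimal sketch under assumed conditions. The function name, data layout (subject ID mapped to variable/value pairs), and values are hypothetical.

```python
def find_entry_mismatches(first_entry, second_entry):
    """Compare two independently entered copies of the same records.

    Each argument maps a subject ID to a dict of variable -> value.
    Returns a list of (subject_id, variable, first_value, second_value)
    for every value that differs or is missing from one copy.
    """
    mismatches = []
    for subject_id in sorted(set(first_entry) | set(second_entry)):
        row_a = first_entry.get(subject_id, {})
        row_b = second_entry.get(subject_id, {})
        for variable in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(variable)
            value_b = row_b.get(variable)
            if value_a != value_b:
                mismatches.append((subject_id, variable, value_a, value_b))
    return mismatches


# Hypothetical data: the second person transposed two digits for s001.
entry_a = {"s001": {"child_age": 8, "wb_total": 21},
           "s002": {"child_age": 10, "wb_total": 17}}
entry_b = {"s001": {"child_age": 8, "wb_total": 12},
           "s002": {"child_age": 10, "wb_total": 17}}

mismatches = find_entry_mismatches(entry_a, entry_b)
for subject_id, variable, value_a, value_b in mismatches:
    print(f"{subject_id} {variable}: first entry {value_a}, second entry {value_b}")
```

A check like this only flags disagreements. Someone still needs to be named in the DMP as responsible for resolving them.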
Callout
Researchers may ask if they can list you as the librarian for helping them plan the data management activities specified in the DMP. Unless they are compensating you for your time and writing your name into the grant to manage the data on the project, remind them that this section is for listing who is carrying out these activities. Typically, the PI (principal investigator) is responsible for this activity; however, lab managers or other staff may also be listed.
Follow-up questions on budget
Consider costs associated with data management.
- Where are you planning on storing your data during the active research phase?
- Do you need additional or specific types of platforms that the university does not provide? Do these have costs?
- Do you need to pay someone to manage your research data?
- Do you need to pay for data de-identification or curation?
- Do you need to pay for your dataset deposit?
Tips for talking with researchers
- Researchers may not have been formally trained in data management and may not think about their project through this lens
- Researchers often speak a different language - they may assign a different meaning to metadata or data standards
- Researchers may not be accustomed to submitting data to a repository
- There are many reasons a researcher may be hesitant to share their data. These can include a lack of sharing culture within their discipline, fear of their research getting “scooped” (having their research idea or results published by someone else), or the additional labor associated with preparing their dataset after the active research phase.
Mock Data Interview
Conduct a data interview with a classmate. The “researcher” will read the scenario below, but the “librarian” will not. The researcher can feel free to fill in any details needed to answer the questions from the librarian – these scenarios have been left intentionally domain agnostic. Then switch.
Scenario 1: You are a researcher writing a grant proposal to be submitted to the NIH. You have heard that a data management and sharing plan is required for NIH grant applications, but you don’t know any details.
Scenario 2: You are a researcher working to publish an article in a journal. You have just found out you need to make your data open by depositing it in a repository to satisfy journal requirements. You aren’t sure which repository to choose.
Key Points
- A data interview is related to, but not the same as the reference interview.
- Use the follow-up questions in this section in your data interview to elicit the information needed to improve researcher DMPs.
- Researchers may be new to data sharing and data management.