knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(DT.options = list(paginate = FALSE, info = FALSE, filter = FALSE))
The purpose of this vignette is to demonstrate how to use the functions and data in the R2i package to create the basicStudyDesign.txt that must be submitted along with assay and other metadata to the public ImmPort database at NIH. When creating your own submission documents, you can create a similar vignette so that the whole process is reproducible.
First, we will load the R2i package, which has the functions and templates we need.
library(R2i)
Next, we need to create the 11 dataframes that will be put together to form the larger text file.
The first dataframe, called study, is unique in that it has two columns, one with names and the other with values. In this way it functions more like a named list and is treated differently from most other dataframes, which have a typical format with column headers and any number of rows.
To get a pre-made dataframe for study, we use the getTemplateDF() function. Its only argument is the name of the template.
study <- getTemplateDF("study")
DT::datatable(study, options = list(scrollX = TRUE))
Now that we have the study dataframe, the easiest way to edit it by hand is to use the edit() function and save the output by clicking "quit" in the editor. Note that edit() returns the edited copy rather than changing the dataframe in place, so the result must be assigned back to study. This function is found in the utils package, but depends on the X11 library. If you are on a newer Mac, you may need to install XQuartz from the project's website, as it is no longer bundled. If for some reason edit() does not save the output correctly on your machine, you will have to input the information using an R-based approach.
study <- edit(study)
An R-based approach example:
study[1, ] <- list(
  NA,
  "myBriefTitle",
  "myOfficialTitle",
  "This is a study",
  "This is a longer description of the study",
  "We hypothesize ... ",
  "The objectives were ... ",
  "Endpoints are ... ",
  "NIH",
  50,
  10,
  30,
  "years",
  "01/01/2018",
  "01/01/2018"
)
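The row above is filled with list() rather than c() because c() coerces mixed values to a single type, so numeric entries would end up stored as text. The sketch below illustrates the difference on a small made-up dataframe; toy, name, and n are hypothetical and are not part of any ImmPort template.

```r
# Hypothetical two-column dataframe with zero rows (not an R2i template)
toy <- data.frame(name = character(0), n = numeric(0), stringsAsFactors = FALSE)

with_list <- toy
with_list[1, ] <- list("myBriefTitle", 50)  # each value keeps its own type

with_c <- toy
with_c[1, ] <- c("myBriefTitle", 50)        # c() turns 50 into a string first

class(with_list$n)
class(with_c$n)
```

Comparing the two class() results shows whether the numeric column survived the assignment.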
We can check the study template to see if it passes the basic checks for class, dimension, required columns, data types, and controlled terms by using checkTemplate(). This function takes in the data frame we have created as well as the template name. Any problems will throw an error, so for demonstration purposes we wrap the function in a tryCatch() call here.
# NOTE: messages are printed out
results <- tryCatch(
  checkTemplate(df = study),
  error = function(e) {
    return(e)
  }
)

# to see error statement
results
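For readers less familiar with tryCatch(), here is a minimal base R sketch of the same pattern. The function check_no_missing() and the demo dataframe are hypothetical stand-ins and do not reflect what checkTemplate() does internally.

```r
# Hypothetical stand-in for a template check (not R2i::checkTemplate())
check_no_missing <- function(df) {
  if (any(is.na(df))) {
    stop("data frame contains NA values")
  }
  invisible(TRUE)
}

demo <- data.frame(id = NA, title = "myBriefTitle", stringsAsFactors = FALSE)

# The error handler returns the condition object instead of halting the script
results <- tryCatch(
  check_no_missing(demo),
  error = function(e) e
)

inherits(results, "error")  # was an error caught?
conditionMessage(results)   # the text of the error
```

Because the handler returns the condition, the vignette can keep running and the error text can still be inspected afterwards.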
It looks like we have an NA value for study$`User Defined ID`. Let's correct this and re-run our checks.
study$`User Defined ID` <- "sdyID"

# NOTE: messages are printed out
results <- tryCatch(
  checkTemplate(df = study),
  error = function(e) {
    return(e)
  }
)

# to see error statement
results
It seems we are using a non-controlled term in the Age Unit column, which requires a controlled value. Many templates use controlled or preferred terms to help maintain standardized terms across studies, so it is important for us to correct this before writing out the template.
R2i::checkTemplate() will throw an error for columns with non-matching controlled values. However, to see issues in preferred-value columns, you must change the quiet argument from its default to FALSE so that messages are printed.
In the case of study, we can see which columns have such terms by using the getIPLookups() function. This function's only argument is the ImmPort template name.
getIPLookups(ImmPortTemplateName = "study")
It appears that study has one column with controlled terms: Age Unit. To see what values are allowed for Age Unit, we can use the getIPLookupValues() function. This function takes in the ImmPort template name and the column name as arguments, then returns a vector of allowed values.
getIPLookupValues(ImmPortTemplateName = "study", templateColname = "Age Unit")
Since it looks like the original entry for Age Unit ('years') is not in the vector, we must correct it with a capitalized version so it passes the checkTemplate() function later on. We can then see if our data frame passes our checks again.
study$`Age Unit` <- "Years"

# NOTE: messages are printed out
results <- tryCatch(
  checkTemplate(df = study),
  error = function(e) {
    return(e)
  }
)

# to see error statement
results
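The controlled-term mismatch above comes down to a case-sensitive comparison. As a rough base R illustration, the sketch below checks an entry against a list of allowed terms and suggests the controlled spelling. The allowed vector is made up for this example (only "Years" is taken from the text above); the real values come from getIPLookupValues().

```r
# Hypothetical list of allowed terms; not the actual ImmPort lookup
allowed <- c("Days", "Months", "Years")
entered <- "years"

# Exact, case-sensitive membership test
is_valid <- entered %in% allowed

# Case-insensitive lookup to suggest the controlled spelling
suggestion <- allowed[match(tolower(entered), tolower(allowed))]

is_valid
suggestion
```

If no allowed term matches even when case is ignored, match() returns NA and so does suggestion, which signals that the entry needs a manual decision.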
Similar to study, the other 10 dataframes that make up the basicStudyDesign.txt can be accessed with the getTemplateDF() function. Depending on the amount of information that needs to be entered, it may be easier to use edit() or base R. For the purpose of this vignette, an R-based approach is used for reproducibility and demonstration.
The next one is a simple one called study_categorization that defines the type of study performed.
study_categorization <- getTemplateDF("study_categorization")
study_categorization[1, ] <- c("Immune Response")
DT::datatable(study_categorization, options = list(scrollX = TRUE))
The next one is a simple one called study_2_condition_or_disease that defines the disease or condition studied.
study_2_condition_or_disease <- getTemplateDF("study_2_condition_or_disease")
study_2_condition_or_disease[1, ] <- c("Typhoid")
DT::datatable(study_2_condition_or_disease, options = list(scrollX = TRUE))
The next one we focus on is called arm_or_cohort. In this demonstration case, we already have a tab-separated file with the information needed and just want to bind this new information to the correct headers. Therefore, we use the arm_or_cohort dataframe only for the colnames() call. An important note: the checkTemplate() function needs the "templateName" attribute on the data frame in order to run the necessary checks. This is easily done by using attr(arm_or_cohort, "templateName") <- "arm_or_cohort".
arm_or_cohort <- getTemplateDF("arm_or_cohort")
file_path <- system.file("extdata/arm_or_cohort_demo.tsv", package = "R2i")
aocImport <- read.table(file_path, sep = "\t", stringsAsFactors = FALSE)
colnames(aocImport) <- colnames(arm_or_cohort)
aocImport <- aocImport[aocImport$`User Defined ID` != "", ]
DT::datatable(aocImport, options = list(scrollX = TRUE))

# to be consistent we rename aocImport for use in the 'write' functions later
arm_or_cohort <- aocImport

# Set "templateName" attribute to pass `checkTemplate()` fn
attr(arm_or_cohort, "templateName") <- "arm_or_cohort"
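The import steps above can be tried without the package using the self-contained sketch below. The two template headers and the inline tab-separated text are hypothetical stand-ins for the real arm_or_cohort template and demo file.

```r
# Hypothetical template headers (the real ones come from getTemplateDF())
template <- data.frame(
  `User Defined ID` = character(0),
  Name = character(0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Inline tab-separated text stands in for the demo file; the last row has an empty ID
tsv_text <- "arm1\tVaccine\narm2\tPlacebo\n\tunused"
imported <- read.table(text = tsv_text, sep = "\t", stringsAsFactors = FALSE)

# Borrow the headers from the template, then drop rows without an ID
colnames(imported) <- colnames(template)
imported <- imported[imported$`User Defined ID` != "", ]

# Tag the data frame with the template it represents, as the final step
attr(imported, "templateName") <- "arm_or_cohort"
attr(imported, "templateName")
```

Setting the attribute last mirrors the order used in the vignette, so the tag is applied to the dataframe in its final shape.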
Making study_personnel:
study_personnel <- getTemplateDF("study_personnel")
study_personnel[1, ] <- c(
  "Personnel1",
  "Dr.",
  "Khanna",
  "Elizabeth",
  "",
  "Major University",
  123,
  "ekhanna@major.edu",
  "PI",
  "Principal Investigator",
  "Major University"
)
DT::datatable(study_personnel, options = list(scrollX = TRUE))
Making planned_visit:
planned_visit <- getTemplateDF("planned_visit")
planned_visit[1, ] <- list(1, "Screening", 1, -10, -2, "", "")
planned_visit[2, ] <- list(2, "Immunazation", 2, 0, 0, "", "")
planned_visit[3, ] <- list(3, "Chellenge", 3, 100, 110, "", "")
DT::datatable(planned_visit)
Making inclusion_exclusion:
inclusion_exclusion <- getTemplateDF("inclusion_exclusion")
inclusion_exclusion[1, ] <- c(
  "InclExcl1",
  "older than 35 years old",
  "Exclusion"
)
DT::datatable(inclusion_exclusion)
study_2_protocol is different from other templates. It is a small dataframe with only one row and two columns, with the first column being a name and the second being the value.
study_2_protocol <- getTemplateDF("study_2_protocol")
study_2_protocol[1, ] <- "protocol 3445"
DT::datatable(study_2_protocol)
Making study_file:
study_file <- getTemplateDF("study_file")
study_file[1, ] <- list(
  "Appendix.txt",
  "Study Appendix",
  "Study Data"
)
DT::datatable(study_file)
Making study_link:
study_link <- getTemplateDF("study_link")
study_link[1, ] <- c(
  "main website",
  "https://drkhannalab.major.edu/NewStudy1"
)
DT::datatable(study_link, options = list(autoWidth = TRUE))
study_pubmed is going to be left blank as a demonstration since some studies may need to be imported prior to being published. Publication information can be entered later using an update template.
study_pubmed <- getTemplateDF("study_pubmed")
Before transforming our data frames and writing the tsv output, we can also do some quality assurance checks of the text in our data frames using the text cleaning functions in the R2i package. We will use planned_visit$Name as an example to first check for spelling errors using the checkSpelling() function, which relies on the hunspell package.
checkSpelling(input = planned_visit$Name)
If working interactively, you can use interactiveReplace() to go through each error found with checkSpelling() and input a replacement at the prompt. In this case, we will simply use the findReplace() function to fix Immunazation and Chellenge.
tmp <- findReplace(input = planned_visit$Name, find = "Immunazation", replace = "Immunization")
tmp <- findReplace(input = tmp, find = "Chellenge", replace = "Challenge")
planned_visit$Name <- tmp
DT::datatable(planned_visit)
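When there are many corrections to apply, it can help to keep them in one place. The following is a rough base R analogue of the find-and-replace step, assuming a named vector of fixes; it uses gsub() and is not the R2i implementation of findReplace(). The names visit_names, fixes, and cleaned are introduced only for this illustration.

```r
# Stand-in for planned_visit$Name
visit_names <- c("Screening", "Immunazation", "Chellenge")

# Named vector: names are the misspellings, values are the corrections
fixes <- c(Immunazation = "Immunization", Chellenge = "Challenge")

cleaned <- visit_names
for (wrong in names(fixes)) {
  # fixed = TRUE treats the pattern as literal text, not a regular expression
  cleaned <- gsub(wrong, fixes[[wrong]], cleaned, fixed = TRUE)
}
cleaned
```

Keeping the corrections in a named vector also leaves a readable record of every change made to the submitted text.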
To create the tsv that will be included in the ImmPort submission, we use the transform_basicStudyDesign() function, which takes a named list of the 11 dataframes as its first argument, as well as outputDir and validate. The outputDir argument is the file path of the output directory where the tsv should be saved. validate is a boolean, with TRUE as the default, that uses the validator scripts from ImmPort's web application to ensure that the tsv meets the criteria necessary for import. To see more information about the transform_basicStudyDesign() function, you can always enter ?transform_basicStudyDesign in the console.
blocks <- list(
  "study" = study,
  "study_categorization" = study_categorization,
  "study_2_condition_or_disease" = study_2_condition_or_disease,
  "arm_or_cohort" = arm_or_cohort,
  "study_personnel" = study_personnel,
  "planned_visit" = planned_visit,
  "inclusion_exclusion" = inclusion_exclusion,
  "study_2_protocol" = study_2_protocol,
  "study_file" = study_file,
  "study_link" = study_link,
  "study_pubmed" = study_pubmed
)

temp <- tempdir()

transform_basicStudyDesign(
  blocks = blocks,
  outputDir = temp,
  validate = TRUE
)
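Before handing a list like blocks to the transform function, it may be worth confirming that every expected name is present and that each element is a dataframe. The sketch below shows one way to do that in base R. It uses a shortened, hypothetical list (blocks_demo) and is an optional pre-flight check, not something R2i requires.

```r
# Hypothetical, shortened list; the real list holds all 11 blocks
blocks_demo <- list(
  study = data.frame(x = 1),
  study_link = data.frame(y = "a", stringsAsFactors = FALSE)
)
expected <- c("study", "study_link")

# Names that should be in the list but are not
missing_blocks <- setdiff(expected, names(blocks_demo))

# TRUE only if every element is a data frame
all_data_frames <- all(vapply(blocks_demo, is.data.frame, logical(1)))

if (length(missing_blocks) > 0 || !all_data_frames) {
  stop("blocks is incomplete or contains elements that are not data frames")
}
```

For the real submission, expected would list all 11 block names used above.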