View source: R/separateSubunits.R
separateSubunits | R Documentation |
Separate the names of antibodies against multi-subunit proteins (e.g. CD235ab, CD66ace) into one subunit per row.
Two subunit patterns are considered. In the first, subunits are lower case letters and the gene name has no separator, e.g. CD66ace is composed of subunits CD66a, CD66c and CD66e. In the second, subunits are written with uppercase letters and are separated from the gene name with a "-", e.g. HLA-A/C/E is composed of subunits HLA-A, HLA-C and HLA-E. Both patterns require at least 2 capital letters or numbers followed by at least 2 possible subunits. There may be a separator between the groups and/or between the subunit letters. At present, the between-group separators are "-", "." and space, and the between-subunit separators are "/" and ".".
Subunits should be converted from Greek symbols before applying this function.
At present, user-supplied regex patterns are not supported.
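The two patterns described above can be approximated with base R regular expressions. The following is only a sketch written from that description, not the package's implementation: the helper `split_subunits()`, the pattern strings, the choice to keep the group separator in the output, and the cap of five subunit letters are all assumptions made for illustration. The cap is an arbitrary placeholder for whatever length limit the real function applies.

```r
# Hypothetical patterns reconstructed from the prose description;
# the regular expressions used by separateSubunits() may differ.
# Group 1: gene stem (>= 2 capitals/digits) plus any group separator.
# Group 2: 2 to 5 subunit letters, optionally separated by "/" or ".".
lower_pat <- "^([A-Z0-9]{2,}[-. ]?)([a-z]([/.]?[a-z]){1,4})$"
# Assumption: uppercase subunits need a group separator to be unambiguous.
upper_pat <- "^([A-Z0-9]{2,}[-. ])([A-Z]([/.]?[A-Z]){1,4})$"

split_subunits <- function(x) {
  for (pat in c(lower_pat, upper_pat)) {
    if (grepl(pat, x)) {
      stem  <- sub(pat, "\\1", x)
      units <- sub(pat, "\\2", x)
      # Drop the between-subunit separators, then split into single letters
      unit_letters <- strsplit(gsub("[/.]", "", units), "")[[1]]
      return(paste0(stem, unit_letters))
    }
  }
  x  # no pattern matched: return the name unchanged
}

split_subunits("CD66ace")
split_subunits("HLA-A/C/E")
split_subunits("TCR alpha/beta")
```

Under these assumptions, "CD66ace" and "HLA-A/C/E" are each split into three names, while "TCR alpha/beta" is returned unchanged because "alpha/beta" exceeds the placeholder length cap.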
separateSubunits(df, ab = "Antigen", new_col = "subunit")
df |
A data.frame or tibble |
ab |
(character(1), default "Antigen") Name of the column containing antibody names |
new_col |
(character(1), default "subunit") Name of the new column containing guesses for single subunit names |
df, with a new column (named by new_col, "subunit" by default) containing potential individual subunits. Original rows of df are replicated for each subunit, i.e. the returned data.frame is in long format.
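To illustrate what "long format" means here, the sketch below replicates each row of a small data.frame once per subunit using base R. The `subunits` list is hand-written for the example and stands in for whatever the function detects internally; it is not produced by separateSubunits().

```r
df <- data.frame(ID = c("A", "B"),
                 Antigen = c("CD235ab", "HLA-ABC"),
                 stringsAsFactors = FALSE)

# Hypothetical per-row subunit guesses, written by hand for illustration
subunits <- list(c("CD235a", "CD235b"),
                 c("HLA-A", "HLA-B", "HLA-C"))

# Repeat each original row once per subunit, then attach the subunit names
n_units <- lengths(subunits)
long <- df[rep(seq_len(nrow(df)), times = n_units), , drop = FALSE]
long$subunit <- unlist(subunits)
rownames(long) <- NULL
long
```

The result has five rows: two for the original row A and three for row B, with the ID and Antigen values repeated alongside each subunit.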
Helen Lindsay
df <- data.frame(ID = LETTERS[1:5],
Antigen = c("CD235a/b", "CD235ab",
"HLA-ABC", "HLA-DR", "TCR alpha/beta"))
#Note that in this example, the TCR is not split as "alpha/beta" is too long
#to match the splitting pattern. Also note that HLA-DR is split - this
#function doesn't check whether the results are real protein subunits.
separateSubunits(df)