subset_proteins: Subset proteins

View source: R/TwoPart_MultiMS.R

subset_proteinsR Documentation

Subset proteins

Description

Subset proteins into ones common to all datasets passed into the function and unique to each dataset. Note: for 3+ datasets no intermediate combinations of proteins are returned, only proteins common to all datasets, the rest are returned as unique to each dataset.

Usage

subset_proteins(mm_list, prot.info, prot_col_name)

Arguments

mm_list

list of matrices for each experiment, length = number of datasets to compare internal dataset dimentions: numpeptides x numsamples for each dataset

prot.info

list of protein and peptide mapping for each matrix in mm_list, in same order as mm_list

prot_col_name

column name in prot.info that contains protein identifiers that link all datasets together. Not that Protein IDs will differ across different organizms and cannot be used as the linking identifier. Function match_linker_ids() produces numeric identifyers that link all datasets together

Value

data frame with the following columns

sub_mm_list

list of dataframes of intensities for each of the datasets passed in with proteins present in all datasets

sub_prot.info

list of dataframes of metadata for each of the datasets passed in with proteins present in all datasets. Same order as sub_mm_list

sub_unique_mm_list

list of dataframes of intensities not found in all datasets

sub_unique_prot.info

ist of dataframes of metadata not found in all datasets

common_list

list of protein IDs commnon to all datasets

Examples

# Load mouse dataset
data(mm_peptides)
head(mm_peptides)
# different from parameter names as R uses
# outer name spaces if variable is undefined
intsCols = 8:13
metaCols = 1:7 # reusing this variable
m_logInts = make_intencities(mm_peptides, intsCols)  # will reuse the name
m_prot.info = make_meta(mm_peptides, metaCols)
m_logInts = convert_log2(m_logInts)
grps = as.factor(c('CG','CG','CG', 'mCG','mCG','mCG'))
set.seed(173)
mm_m_ints_eig1 = eig_norm1(m=m_logInts,treatment=grps,prot.info=m_prot.info)
mm_m_ints_eig1$h.c # check the number of bias trends detected
mm_m_ints_norm = eig_norm2(rv=mm_m_ints_eig1)
mm_prot.info = mm_m_ints_norm$normalized[,1:7]
mm_norm_m =  mm_m_ints_norm$normalized[,8:13]
set.seed(131)
imp_mm = MBimpute(mm_norm_m, grps,
                  prot.info=mm_prot.info, pr_ppos=2, my.pi=0.05,
                  compute_pi=FALSE)

# Load human dataset
data(hs_peptides)
head(hs_peptides)
intsCols = 8:13
metaCols = 1:7 # reusing this variable
m_logInts = make_intencities(hs_peptides, intsCols)  # will reuse the name
m_prot.info = make_meta(hs_peptides, metaCols)
m_logInts = convert_log2(m_logInts)
grps = as.factor(c('CG','CG','CG', 'mCG','mCG','mCG'))
hs_m_ints_eig1 = eig_norm1(m=m_logInts,treatment=grps,prot.info=m_prot.info)
hs_m_ints_eig1$h.c # check the number of bias trends detected
hs_m_ints_norm = eig_norm2(rv=hs_m_ints_eig1)
hs_prot.info = hs_m_ints_norm$normalized[,1:7]
hs_norm_m =  hs_m_ints_norm$normalized[,8:13]
set.seed(131)
imp_hs = MBimpute(hs_norm_m, grps,
                  prot.info=hs_prot.info, pr_ppos=2,
                  my.pi=0.05,
                  compute_pi=FALSE)

# Multi-Matrix Model-based differential expression analysis
# Set up needed variables
mms = list()
treats = list()
protinfos = list()
mms[[1]] = imp_mm$y_imputed
mms[[2]] = imp_hs$y_imputed
treats[[1]] = grps
treats[[2]] = grps
protinfos[[1]] = imp_mm$imp_prot.info
protinfos[[2]] = imp_hs$imp_prot.info

subset_data = subset_proteins(mm_list=mms, prot.info=protinfos, 'MatchedID')
mms_mm_dd = subset_data$sub_unique_mm_list[[1]]
protinfos_mm_dd = subset_data$sub_unique_prot.info[[1]]
# DIfferential expression analysis for mouse specific protiens
DE_mCG_CG_mm_dd = peptideLevel_DE(mms_mm_dd, grps,
                                  prot.info=protinfos_mm_dd, pr_ppos=2)


YuliyaLab/ProteoMM documentation built on April 19, 2022, 8:12 a.m.