prot_level_multi_part: Multi-Matrix Differential Expression Analysis

View source: R/TwoPart_MultiMS.R

prot_level_multi_part    R Documentation

Multi-Matrix Differential Expression Analysis

Description

Multi-Matrix Differential Expression Analysis computes a model-based statistic for each dataset; the final statistic is the sum of the individual statistics. Significance is determined by a permutation test, which computes the same statistics and sums them after permuting the values across treatment groups, as outlined in Karpievitch et al. 2018. It is important to set the random number generator seed with set.seed() so that results are reproducible.
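The idea of summing per-dataset statistics and judging the sum by permutation can be sketched in base R. This is only an illustration: a Welch t-statistic stands in for the package's model-based statistic, the toy values are made up, the helper names stat_one and stat_sum are hypothetical, and applying one label permutation to all datasets is an assumption rather than a description of the package internals.

```r
# Illustrative sketch only: a Welch t-statistic stands in for the
# package's model-based statistic, and all data below are made up.
set.seed(42)  # permutation results depend on the RNG state

# Toy data: one protein measured in two datasets, 6 samples each
grps <- factor(c("CG", "CG", "CG", "mCG", "mCG", "mCG"))
y_list <- list(
  c(10.1, 10.3, 9.9, 11.2, 11.0, 11.4),
  c(8.0, 8.2, 7.9, 8.9, 9.1, 9.0)
)

# Per-dataset statistic (hypothetical stand-in)
stat_one <- function(y, g) abs(unname(t.test(y ~ g)$statistic))

# Final statistic = sum of the per-dataset statistics
stat_sum <- function(ys, g) sum(vapply(ys, stat_one, numeric(1), g = g))

observed <- stat_sum(y_list, grps)

# Recompute the summed statistic after shuffling the group labels
nperm <- 200
perm_stats <- replicate(nperm, stat_sum(y_list, sample(grps)))

# Permutation p-value with the usual add-one correction
p_val <- (sum(perm_stats >= observed) + 1) / (nperm + 1)
print(p_val)
```

Running the sketch twice with the same seed gives the same p-value; without set.seed() the value changes from run to run, which is why the seed matters here.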

Usage

prot_level_multi_part(
  mm_list,
  treat,
  prot.info,
  prot_col_name,
  nperm = 500,
  dataset_suffix
)

Arguments

mm_list

list of matrices, one per experiment; length = number of datasets to compare. Internal dataset dimensions: numpeptides x numsamples for each dataset

treat

list of data frames with treatment information used to compute the statistic, in the same order as mm_list

prot.info

list of protein and peptide mappings, one for each matrix in mm_list, in the same order as mm_list

prot_col_name

column name in prot.info that contains the protein identifiers that link all datasets together. Note that protein IDs differ across organisms and cannot be used as the linking identifier. The function match_linker_ids() produces numeric identifiers that link all datasets together

nperm

number of permutations, default = 500. This will take a while; test code with fewer permutations

dataset_suffix

vector of character strings that correspond to the datasets being analysed. Same length as mm_list. Names will be appended to the column names that will be generated for each analysed dataset. For example, if analysing mouse and human data this vector may be: c('Mouse', 'Human')
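The argument descriptions above imply that the list inputs are parallel and that each matrix lines up with its own metadata. A minimal sketch of such a consistency check follows. The helper check_inputs is hypothetical and not part of the package, the toy data and the PepID column are made up, and treatment labels are given as factors, as in the Examples section.

```r
# Toy inputs, made up for illustration: 2 datasets, 6 samples each
set.seed(1)
grps <- factor(c("CG", "CG", "CG", "mCG", "mCG", "mCG"))
mm_list <- list(
  matrix(rnorm(4 * 6), nrow = 4),  # 4 peptides x 6 samples
  matrix(rnorm(5 * 6), nrow = 5)   # 5 peptides x 6 samples
)
treat <- list(grps, grps)
prot.info <- list(
  data.frame(PepID = paste0("p", 1:4), ProtID = c(1, 1, 2, 2)),
  data.frame(PepID = paste0("q", 1:5), ProtID = c(1, 1, 1, 2, 2))
)
dataset_suffix <- c("MM", "HS")

# Hypothetical helper (not part of the package): stops with an error
# if the parallel lists are out of step, otherwise returns TRUE
check_inputs <- function(mm_list, treat, prot.info, dataset_suffix) {
  stopifnot(
    length(treat) == length(mm_list),
    length(prot.info) == length(mm_list),
    length(dataset_suffix) == length(mm_list)
  )
  for (i in seq_along(mm_list)) {
    stopifnot(
      nrow(prot.info[[i]]) == nrow(mm_list[[i]]),  # one metadata row per peptide
      length(treat[[i]]) == ncol(mm_list[[i]])     # one group label per sample
    )
  }
  TRUE
}

inputs_ok <- check_inputs(mm_list, treat, prot.info, dataset_suffix)
```

A check like this is cheap compared with a run of several hundred permutations, so it may be worth doing before calling the function.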

Value

data frame with the following columns

protIDused

Column containing the protein IDs used to link proteins across datasets

FC

Average fold change across all datasets

P_val

Permutation-based p-value for the differences between the groups

BH_P_val

Multiple testing adjusted p-values

statistic

Statistic computed as the sum of the statistics produced for each dataset

Protein Information

all columns passed into the function for the 1st dataset in the list

FCs

Fold changes for individual datasets; these values should average to the FC above. As many columns as there are datasets being analyzed.

PV

p-values for individual datasets. As many columns as there are datasets being analyzed.

BHPV

Multiple testing adjusted p-values for individual datasets. As many columns as there are datasets being analyzed.

NUMPEP

Number of peptides present in each protein for each dataset. As many columns as there are datasets being analyzed.
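Two relationships among the returned columns can be illustrated with made-up numbers: the combined FC as the average of the per-dataset fold changes, and a multiple-testing adjustment of the p-values. The sketch assumes the "BH" prefix refers to the Benjamini-Hochberg procedure and uses base R's p.adjust(); whether the package calls p.adjust() internally is not stated here. The variable names are illustrative and are not the actual output column names.

```r
# Made-up values for three proteins (illustration only)
fc_mm <- c(1.2, -0.4, 0.9)   # per-dataset fold changes, dataset 1
fc_hs <- c(0.8, -0.6, 1.1)   # per-dataset fold changes, dataset 2

# Combined fold change: average across datasets, one value per protein
fc_avg <- rowMeans(cbind(fc_mm, fc_hs))

# Benjamini-Hochberg adjustment of made-up p-values
p_val <- c(0.01, 0.20, 0.03)
bh_p_val <- p.adjust(p_val, method = "BH")

print(data.frame(FC = fc_avg, P_val = p_val, BH_P_val = bh_p_val))
```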

Examples

# Load mouse dataset
data(mm_peptides)
head(mm_peptides)
intsCols = 8:13 # named differently from the function parameters because R
                # falls back to enclosing environments for undefined variables
metaCols = 1:7 # reusing this variable
m_logInts = make_intencities(mm_peptides, intsCols)  # will reuse the name
m_prot.info = make_meta(mm_peptides, metaCols)
m_logInts = convert_log2(m_logInts)
grps = as.factor(c('CG','CG','CG', 'mCG','mCG','mCG'))
set.seed(135)
mm_m_ints_eig1 = eig_norm1(m=m_logInts,treatment=grps,
                           prot.info=m_prot.info)
mm_m_ints_eig1$h.c # check the number of bias trends detected
mm_m_ints_norm = eig_norm2(rv=mm_m_ints_eig1)
mm_prot.info = mm_m_ints_norm$normalized[,1:7]
mm_norm_m =  mm_m_ints_norm$normalized[,8:13]
set.seed(125) # Needed for reproducibility of results
imp_mm = MBimpute(mm_norm_m, grps, prot.info=mm_prot.info,
                  pr_ppos=2, my.pi=0.05, compute_pi=FALSE)

# Load human dataset
data(hs_peptides)
head(hs_peptides)
intsCols = 8:13 # named differently from the function parameters because R
                # falls back to enclosing environments for undefined variables
metaCols = 1:7 # reusing this variable
m_logInts = make_intencities(hs_peptides, intsCols)  # will reuse the name
m_prot.info = make_meta(hs_peptides, metaCols)
m_logInts = convert_log2(m_logInts)
grps = as.factor(c('CG','CG','CG', 'mCG','mCG','mCG'))
set.seed(1237) # needed for reproducibility
hs_m_ints_eig1 = eig_norm1(m=m_logInts,treatment=grps,prot.info=m_prot.info)
hs_m_ints_eig1$h.c # check the number of bias trends detected
hs_m_ints_norm = eig_norm2(rv=hs_m_ints_eig1)
hs_prot.info = hs_m_ints_norm$normalized[,1:7]
hs_norm_m =  hs_m_ints_norm$normalized[,8:13]

set.seed(125) # or any value, ex: 12345
imp_hs = MBimpute(hs_norm_m, grps, prot.info=hs_prot.info,
                  pr_ppos=2, my.pi=0.05,
                  compute_pi=FALSE)

# Multi-Matrix Model-based differential expression analysis
# Set up needed variables
mms = list()
treats = list()
protinfos = list()
mms[[1]] = imp_mm$y_imputed
mms[[2]] = imp_hs$y_imputed
treats[[1]] = grps
treats[[2]] = grps
protinfos[[1]] = imp_mm$imp_prot.info
protinfos[[2]] = imp_hs$imp_prot.info
nperm = 50

# ATTENTION: SET RANDOM NUMBER GENERATOR SEED FOR REPRODUCIBILITY !!
set.seed(131) # needed for reproducibility

comb_MBDE = prot_level_multi_part(mm_list=mms, treat=treats,
                                  prot.info=protinfos,
                                  prot_col_name='ProtID', nperm=nperm,
                                  dataset_suffix=c('MM', 'HS'))

# Analysis for proteins only present in mouse,
# there are no proteins suitable for
# Model-Based analysis in human dataset
subset_data = subset_proteins(mm_list=mms, prot.info=protinfos, 'MatchedID')
mm_dd_only = subset_data$sub_unique_mm_list[[1]]
hs_dd_only = subset_data$sub_unique_mm_list[[2]]
protinfos_mm_dd = subset_data$sub_unique_prot.info[[1]]
DE_mCG_CG_mm_dd = peptideLevel_DE(mm_dd_only, grps,
                                  prot.info=protinfos_mm_dd, pr_ppos=2)


YuliyaLab/ProteoMM documentation built on April 19, 2022, 8:12 a.m.