allPfamAnalysis: Global analysis of a repository of mutations

View source: R/allPfamAnalysis.R


Description

Given a repository of mutations, allPfamAnalysis runs the analysis on every Pfam domain and every single sequence that carries at least one mutation.

Usage

allPfamAnalysis(repos
, allLowMACAObjects=NULL
, mutation_type=c("missense", "all", "truncating" , "silent") 
, NoSilent=TRUE
, mail=NULL
, perlCommand="perl"
, verbose=FALSE
, conservation=0.1
, use_hmm=FALSE
, datum=FALSE
, clustal_cmd="clustalo"
, BPPARAM=bpparam("SerialParam"))

Arguments

repos

either a data.frame or a filename containing the data to analyze
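The two accepted forms of repos carry the same content: a tab-delimited file read back into R gives the equivalent data.frame. The sketch below illustrates this round trip with base R only. The rows are made-up toy values, and the five column names are assumed to follow the LowMACA input format (see LowMACA_AML); they are not taken from a real dataset.

```r
# Made-up toy rows in an assumed LowMACA-style column layout
toy_repos <- data.frame(
  Gene_Symbol = c("DNMT3A", "DNMT3A", "NPM1"),
  Amino_Acid_Position = c(882L, 882L, 288L),
  Amino_Acid_Change = c("R882H", "R882C", "W288fs"),
  Sample = c("S1", "S2", "S3"),
  Tumor_Type = c("laml", "laml", "laml"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited text file (the "filename" form of repos)
repos_file <- tempfile(fileext = ".txt")
write.table(toy_repos, repos_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Reading the file back yields the "data.frame" form of repos
repos_df <- read.delim(repos_file, stringsAsFactors = FALSE)
str(repos_df)
```

Either toy_repos or repos_file could then be passed as repos.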

allLowMACAObjects

filename of an RData file in which to save allPfamsLM, the object holding all the LowMACA objects produced by the function. It can be useful for plotting a specific Pfam after the analysis, but it can be very large. Default NULL
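A minimal sketch of reloading such a file in a later session, assuming it contains a single object named allPfamsLM as described above. Here allPfamsLM is a placeholder list keyed by Pfam IDs so the save/load round trip runs without LowMACA. The real object's internal structure is not shown and should not be inferred from this stand-in.

```r
# Placeholder standing in for the real allPfamsLM object (hypothetical content)
allPfamsLM <- list(PF00046 = "placeholder", PF00096 = "placeholder")
rdata_file <- tempfile(fileext = ".RData")
save(allPfamsLM, file = rdata_file)
rm(allPfamsLM)

# Load into a dedicated environment so nothing in the workspace is overwritten
env <- new.env()
loaded <- load(rdata_file, envir = env)
print(loaded)
lm_objects <- env$allPfamsLM
names(lm_objects)
```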

mutation_type

type of mutation to be considered for the analysis. Defaults to "missense".

NoSilent

logical indicating whether silent mutations should be removed. Default TRUE
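To make the roles of mutation_type and NoSilent concrete, here is a hypothetical sketch that classifies protein changes such as "R882H", "Q61*", "W288fs" or "G12G" and filters a data.frame accordingly. Both the classification rules and the choice to apply NoSilent only when mutation_type is "all" are assumptions for illustration, not LowMACA's actual implementation. match.arg reproduces the documented default of "missense".

```r
# Hypothetical classifier; NOT LowMACA's internal rule set
classify_change <- function(change) {
  change <- sub("^p\\.", "", change)           # drop an optional HGVS "p." prefix
  ref <- substr(change, 1, 1)                  # reference residue
  alt <- sub("^[A-Z*][0-9]+", "", change)      # text after the position
  ifelse(grepl("fs|\\*$|X$", change), "truncating",
         ifelse(alt == ref, "silent",
                ifelse(grepl("^[A-Z]$", alt), "missense", "other")))
}

# Hypothetical filter; assumes NoSilent only matters when mutation_type is "all"
filter_mutations <- function(df,
                             mutation_type = c("missense", "all", "truncating", "silent"),
                             NoSilent = TRUE) {
  mutation_type <- match.arg(mutation_type)    # first value ("missense") is the default
  type <- classify_change(df$Amino_Acid_Change)
  if (mutation_type == "all") {
    keep <- if (NoSilent) type != "silent" else rep(TRUE, nrow(df))
  } else {
    keep <- type == mutation_type
  }
  df[keep, , drop = FALSE]
}

toy_muts <- data.frame(Amino_Acid_Change = c("R882H", "Q61*", "W288fs", "G12G"),
                       stringsAsFactors = FALSE)
filter_mutations(toy_muts)                     # missense only
filter_mutations(toy_muts, "all")              # everything except the silent change
```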

mail

if not NULL, it must be a valid email address, used to submit jobs to the EBI Clustal Omega web service. Default NULL, which uses a local Clustal Omega installation

perlCommand

a character string containing the path to the Perl executable. If missing, "perl" is used. Only used if mail is set

verbose

logical. Whether to print verbose output. Default FALSE

conservation

a number between 0 and 1: the minimum level of conservation required to test a mutation. Default 0.1
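As an illustration of a conservation threshold, the sketch below scores each alignment column by the fraction of sequences sharing its most frequent residue (gaps never count as a match) and keeps the columns at or above the threshold. This scoring rule is a simple stand-in chosen for clarity; LowMACA's own conservation measure may be defined differently. The three-sequence alignment is made up.

```r
# Simple stand-in conservation score per column; NOT necessarily LowMACA's measure.
# aln: character matrix, one row per aligned sequence, "-" marks a gap.
column_conservation <- function(aln) {
  apply(aln, 2, function(col) {
    residues <- col[col != "-"]
    if (length(residues) == 0) return(0)
    max(table(residues)) / length(col)   # share of sequences with the top residue
  })
}

# Made-up toy alignment of three sequences
aln <- rbind(
  strsplit("MKV-L", "")[[1]],
  strsplit("MKIAL", "")[[1]],
  strsplit("MRV-L", "")[[1]]
)
cons <- column_conservation(aln)
threshold <- 0.5                         # the documented default is 0.1
testable <- which(cons >= threshold)     # columns eligible for testing
cons
testable
```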

use_hmm

logical. When analysing Pfam sequences, whether to align them with the Hidden Markov Model (HMM) of the specific Pfam. Default FALSE.

datum

logical. When analysing Pfam sequences, whether to use all the genes that belong to the Pfam to generate the alignment. This creates a unique mapping between individual residues and the consensus sequence, regardless of which sequences are selected for the analysis. Default FALSE.

clustal_cmd

path to the Clustal Omega executable. By default, "clustalo" is looked up in the PATH

BPPARAM

An object of class BiocParallelParam-class specifying parameters related to the parallel execution of some of the tasks and calculations within this function. See the function register in the BiocParallel package.
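A short sketch of building BPPARAM values with BiocParallel. It only exercises BiocParallel itself; the commented call to allPfamAnalysis is illustrative and needs LowMACA plus a working Clustal Omega setup. MulticoreParam is fork-based, so on Windows SnowParam is the usual alternative.

```r
library(BiocParallel)

# Sequential execution, as in the default bpparam("SerialParam")
serial <- SerialParam()
squares <- bplapply(1:4, function(i) i^2, BPPARAM = serial)

# Fork-based workers (use SnowParam instead on Windows)
multi <- MulticoreParam(workers = 2)

# Optionally register a backend so that bpparam() returns it by default
register(serial)

# Illustrative call, not run here:
# significant_muts <- allPfamAnalysis(repos = myData, BPPARAM = multi)
```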

Details

This function takes a data.frame or a tab-delimited text file in LowMACA format (see LowMACA_AML) and performs a full analysis of the dataset. It splits the mutations by Pfam domain and runs a separate LowMACA analysis, up to the lfm step, for every Pfam hit by at least one mutation. Every position found significant by lfm is then tested at gene level: a binomial test checks whether the ratio between the number of mutations in the significant position and the total number of mutations of the gene is higher than expected by chance. The significant mutations from all the lfm runs are aggregated into a single data.frame.
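The gene-level test can be sketched with stats::binom.test. The counts below are hypothetical, and the null probability (one over the length of the region, i.e. mutations spread uniformly over its positions) is an assumption made for illustration; the exact null used by allPfamAnalysis is not stated on this page.

```r
# Hypothetical counts for one gene
n_total       <- 20   # all mutations of the gene
n_at_position <- 6    # mutations at the position flagged by lfm
region_length <- 100  # assumed number of positions mutations could fall on

# One-sided test: more mutations at this position than a uniform spread predicts?
res <- binom.test(n_at_position, n_total, p = 1 / region_length,
                  alternative = "greater")
res$p.value
```

With alternative = "greater", the p-value is the upper binomial tail P(X >= 6), which can be cross-checked with pbinom(5, 20, 0.01, lower.tail = FALSE).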

Value

A list of two data.frames named 'AlignedSequence' and 'SingleSequence'

The first data.frame contains the results of the alignment-based analysis. Every gene is aggregated by its corresponding Pfam domain.

Gene_Symbol

gene symbols of the analyzed genes

Multiple_Aln_pos

positions in the consensus relative to the analyzed sequence

Pfam_ID

name of the analyzed Pfam

binomialPvalue

p-value of the gene-level binomial test; see Details

Amino_Acid_Position

amino acid positions in the original protein

Amino_Acid_Change

amino acid changes in HGVS format

Sample

barcode of the sample in which the mutation was found

Tumor_Type

tumor type of the sample

Envelope_Start

start of the Pfam domain in the protein

Envelope_End

end of the Pfam domain in the protein

metric

q-value of the position in the multiple alignment of Pfam domains

Entrez

Entrez Gene IDs of the mutated genes

Entry

UniProt entry of the protein

UNIPROT

alternative UniProt names of the protein

Chromosome

cytobands of the genes

Protein.name

extended protein names

The second data.frame contains the results of LowMACA on every gene-domain pair analyzed on its own, that is, without aligning it to any other member of the same Pfam ID.

Gene_Symbol

gene symbols of the analyzed genes

Amino_Acid_Position

amino acid positions in the original protein

Amino_Acid_Change

amino acid changes in HGVS format

Sample

barcode of the sample in which the mutation was found

Tumor_Type

tumor type of the sample

Envelope_Start

start of the Pfam domain in the protein

Envelope_End

end of the Pfam domain in the protein

Multiple_Aln_pos

positions in the consensus relative to the analyzed sequence. See the Warnings section

Entrez

Entrez Gene IDs of the mutated genes

Entry

UniProt entry of the protein

UNIPROT

alternative UniProt names of the protein

Chromosome

cytobands of the genes

Protein.name

extended protein names

Author(s)

Stefano de Pretis, Giorgio Melloni

See Also

lfm, LowMACA_AML

Examples

#Load Homeobox example
data(lmObj)
#Extract the data inside the object as a toy example
myData <- lmMutations(lmObj)$data
#Run allPfamAnalysis on all mutations (mail is NULL, so a local Clustal Omega installation is required)
significant_muts <- allPfamAnalysis(repos=myData)
#Show the result of alignment based analysis
head(significant_muts$AlignedSequence)
#Show all the genes that harbor significant mutations in the alignment based analysis
unique(significant_muts$AlignedSequence$Gene_Symbol)
#Show the result of the Single Gene based analysis
head(significant_muts$SingleSequence)
#Show all the genes that harbor significant mutations in the Single Gene based analysis
unique(significant_muts$SingleSequence$Gene_Symbol)

ste-depo/LowMACA documentation built on Oct. 15, 2022, 11:53 p.m.