View source: R/allPfamAnalysis.R
allPfamAnalysis | R Documentation |
Given a repository of mutations, allPfamAnalysis
launches the analysis of all the Pfam domains and single sequences that
are hit by at least one mutation.
allPfamAnalysis(repos, allLowMACAObjects=NULL,
    mutation_type=c("missense", "all", "truncating", "silent"),
    NoSilent=TRUE, mail=NULL, perlCommand="perl", verbose=FALSE,
    conservation=0.1, use_hmm=FALSE, datum=FALSE,
    clustal_cmd="clustalo", BPPARAM=bpparam("SerialParam"))
repos |
either a data.frame or a filename containing the data to analyze |
allLowMACAObjects |
filename of an RData file in which to save all the LowMACA objects. Default is NULL |
mutation_type |
type of mutation to be considered for the analysis. Defaults to "missense". |
NoSilent |
logical indicating whether silent mutations should be removed. Default is TRUE |
mail |
if not NULL, it must be a valid email address, which is used to access the EBI clustalo web service. Default is to use a local clustalo installation |
perlCommand |
a character string containing the path to the Perl executable. If missing, "perl" is used as default. Only used if mail is set |
verbose |
logical indicating whether to print verbose output. Default is FALSE |
conservation |
a number between 0 and 1 giving the minimum level of conservation required to test a mutation. Default is 0.1 |
use_hmm |
When analysing Pfam sequences, it is possible to use the Hidden Markov Model (HMM) of the specific Pfam to align the sequences. Default is FALSE. |
datum |
When analysing Pfam sequences, use all the genes that belong to the Pfam to generate the alignment. This creates a unique mapping between individual residues and consensus sequence, disregarding the set of sequences that are selected for the analysis. Default is FALSE. |
clustal_cmd |
path to the Clustal Omega executable. Default is to look for "clustalo" in the PATH |
BPPARAM |
An object of class |
This function takes a data.frame or a tab-delimited text file in LowMACA format (see LowMACA_AML)
and performs a full analysis of the dataset. It divides the mutations by Pfam and runs one LowMACA
analysis, up to the lfm function, for each Pfam hit by mutations.
Every position found significant by lfm is then tested at gene level. A binomial test is performed to see if the ratio between the number of mutations
in the significant position and the total number of mutations is higher than expected by chance at gene level.
The significant mutations from all the lfm runs are aggregated into a single data.frame.
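The gene-level test described above can be sketched with base R's binom.test. This is only an illustration, not the package's internal code: the counts are invented, and the null probability (a mutation falling on any one residue of the domain with equal chance) is an assumption made for this sketch.

```r
# Hypothetical counts for one gene at one significant position
muts_at_position <- 8    # mutations of the gene falling on the position
muts_in_gene     <- 20   # all mutations of the gene considered in the test
domain_length    <- 57   # assumed number of residues a mutation could hit

# Assumed null hypothesis: mutations are spread uniformly over the residues,
# so each one hits the position with probability 1 / domain_length
gene_test <- binom.test(x = muts_at_position,
                        n = muts_in_gene,
                        p = 1 / domain_length,
                        alternative = "greater")

# One-sided p-value: is the position hit more often than expected by chance?
gene_pvalue <- gene_test$p.value
print(gene_pvalue)
```

A small p-value here means the gene contributes more mutations to the position than the uniform null would predict; the actual null model used by allPfamAnalysis may differ.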
A list of two dataframes named 'AlignedSequence' and 'SingleSequence'
The first dataframe is the result of the alignment based analysis. Every gene is aggregated by its corresponding Pfam domain.
Gene_Symbol |
gene symbols of the analyzed genes |
Multiple_Aln_pos |
positions in the consensus relative to the analyzed sequence |
Pfam_ID |
Pfam name analyzed |
binomialPvalue |
p-value of the single-gene binomial test; see Details |
Amino_Acid_Position |
amino acid positions relative to the original protein |
Amino_Acid_Change |
amino acid changes in HGVS format |
Sample |
Sample barcode where the mutation was found |
Tumor_Type |
Tumor type of the Sample |
Envelope_Start |
start of the Pfam domain in the protein |
Envelope_End |
end of the Pfam domain in the protein |
metric |
q-value of the position in the multiple alignment of Pfam domains |
Entrez |
Entrez IDs of the mutated genes |
Entry |
UniProt entry of the protein |
UNIPROT |
other protein names from UniProt |
Chromosome |
cytobands of the genes |
Protein.name |
extended protein names |
The second dataframe represents the result of LowMACA on every gene-domain pair that is analyzed on its own, without being aligned with any other member of the same Pfam ID.
Gene_Symbol |
gene symbols of the analyzed genes |
Amino_Acid_Position |
amino acid positions relative to the original protein |
Amino_Acid_Change |
amino acid changes in HGVS format |
Sample |
Sample barcode where the mutation was found |
Tumor_Type |
Tumor type of the Sample |
Envelope_Start |
start of the Pfam domain in the protein |
Envelope_End |
end of the Pfam domain in the protein |
Multiple_Aln_pos |
positions in the consensus relative to the analyzed sequence. See warnings section |
Entrez |
Entrez IDs of the mutated genes |
Entry |
UniProt entry of the protein |
UNIPROT |
other protein names from UniProt |
Chromosome |
cytobands of the genes |
Protein.name |
extended protein names |
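As a rough illustration of how the returned list might be explored, the sketch below builds a toy stand-in with a few of the columns documented above and filters it. The gene names, Pfam IDs, values and the 0.05 cutoff are all invented placeholders; a real result comes from allPfamAnalysis itself.

```r
# Toy stand-in for the value of allPfamAnalysis(); every value is invented
significant_muts <- list(
    AlignedSequence = data.frame(
        Gene_Symbol      = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
        Pfam_ID          = c("PF_X", "PF_X", "PF_X", "PF_Y"),
        Multiple_Aln_pos = c(12, 12, 12, 40),
        metric           = c(0.001, 0.001, 0.001, 0.03),
        binomialPvalue   = c(0.0004, 0.0004, 0.2, 0.01),
        stringsAsFactors = FALSE
    ),
    SingleSequence = data.frame(
        Gene_Symbol         = "GENE_D",
        Amino_Acid_Position = 101,
        stringsAsFactors    = FALSE
    )
)

aligned <- significant_muts$AlignedSequence

# Keep the mutations that also pass the gene-level binomial test
# (0.05 is an arbitrary cutoff chosen for this sketch)
gene_level <- aligned[aligned$binomialPvalue < 0.05, ]

# Number of retained mutations per gene
muts_per_gene <- table(gene_level$Gene_Symbol)

# Genes reported by either the alignment-based or the single-sequence analysis
all_genes <- union(aligned$Gene_Symbol,
                   significant_muts$SingleSequence$Gene_Symbol)
```

Filtering on binomialPvalue separates genes that drive a significant alignment position from genes that merely share the domain; only the 'AlignedSequence' dataframe carries that column.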
Stefano de Pretis , Giorgio Melloni
lfm, LowMACA_AML
#Attach the package that provides allPfamAnalysis and lmObj
library(LowMACA)
#Load Homeobox example
data(lmObj)
#Extract the data inside the object as a toy example
myData <- lmMutations(lmObj)$data
#Run allPfamAnalysis on every mutation
significant_muts <- allPfamAnalysis(repos=myData)
#Show the result of alignment based analysis
head(significant_muts$AlignedSequence)
#Show all the genes that harbor significant mutations
unique(significant_muts$AlignedSequence$Gene_Symbol)
#Show the result of the Single Gene based analysis
head(significant_muts$SingleSequence)
#Show all the genes that harbor significant mutations
unique(significant_muts$SingleSequence$Gene_Symbol)