fetchSequence: Fetch protein/peptide sequences and create a...

View source: R/fetchSequence.R

fetchSequenceR Documentation

Fetch protein/peptide sequences and create a dagPeptides-class object.

Description

This function fetches protein/peptide sequences from a Biomart database or from a Proteome-class object based on protein/peptide IDs and create a dagPeptides-class object following restriction as specified by parameters: anchorAA or anchorPos, upstreamOffset and downstreamOffset.

Usage

fetchSequence(
  IDs,
  type = "entrezgene",
  anchorAA = NULL,
  anchorPos,
  mart,
  proteome,
  upstreamOffset,
  downstreamOffset
)

Arguments

IDs

A character vector containing protein/peptide IDs used to fetch sequences from a Biomart database or a Proteome-class object.

type

A character vector of length 1. The available options are "entrezgene" and "uniprotswissprot" if parameter mart is missing; otherwise it can be any type of IDs available in Biomart databases.

anchorAA

A character vector of length 1 or the same length as that of anchorPos, each element of which is a single letter symbol of amino acids, for example, "K" for lysine.

anchorPos

A character or numeric vector. Each element of which is (1) a single-letter symbol of amino acid followed by the position of the anchoring amino acid in the target peptide/protein sequence, for example, "K123" for lysine at position 123 or the position of the anchoring amino acid in the target peptide/protein sequence, for example, "123" for an amino acid at position 123; or (2) a vector of subsequences containing the anchoring AAs.

mart

A Biomart database name you want to connect to. Either of parameters mart or proteome should be provided.

proteome

An object of Proteome-class. Either of parameters mart or Proteome-class should be provided.

upstreamOffset

An integer, the upstream offset relative to the anchoring position.

downstreamOffset

An integer, the downstream offset relative to the anchoring position.

Value

An object of class dagPeptides-class

Examples

## Case 1: You have both positions of the anchoring AAs and the identifiers 
## of their enclosing peptide/protein sequences for fetching sequences using 
## the fetchSequence function via the Biomart.

if (interactive())
{
    try({
    mart <- useMart("ensembl")
    fly_mart <-
       useDataset(mart = mart, dataset = "dmelanogaster_gene_ensembl")
    dat <- read.csv(system.file("extdata", "dagLogoTestData.csv",
                           package = "dagLogo"))
    seq <- fetchSequence(
       IDs = as.character(dat$entrez_geneid),
       anchorPos = as.character(dat$NCBI_site),
       mart = fly_mart,
       upstreamOffset = 7,
       downstreamOffset = 7)
   head(seq@peptides)
   })
}


## Case 2: You don't have the exactly postion information, but You have the 
## interesting peptide subsequences and the identifiers of their enclosing 
## peptide/protein sequences for fetching sequences using the fetchSequence
## function via the Biomart. In the following examples, the anchoring AAs 
## are marked by asterisks. 
if (interactive())
{
    try({
        mart <- useMart("ensembl")
        fly_mart <-
            useDataset(mart = mart, dataset = "dmelanogaster_gene_ensembl")
        dat <- read.csv(system.file("extdata", "dagLogoTestData.csv",
                                    package = "dagLogo"))
        seq <- fetchSequence(
            IDs = as.character(dat$entrez_geneid),
            anchorAA = "*",
            anchorPos = as.character(dat$peptide),
            mart = fly_mart,
            upstreamOffset = 7,
            downstreamOffset = 7
        )
        head(seq@peptides)
    })
}

## In following example, the anchoring AAs are lower-case "s" for amino acid 
## serine.
if(interactive())
{
   try({
       dat <- read.csv(system.file("extdata", "peptides4dagLogo.csv",
                                   package = "dagLogo"))
        mart <- useMart("ensembl")
        human_mart <-
            useDataset(mart = mart, dataset = "hsapiens_gene_ensembl")
        seq <- fetchSequence(IDs = toupper(as.character(dat$symbol)),
                             type = "hgnc_symbol",
                             anchorAA = "s",
                             anchorPos = as.character(dat$peptides),
                             mart = human_mart,
                             upstreamOffset = 7,
                             downstreamOffset = 7)
        head(seq@peptides)
    })
}



jianhong/dagLogo documentation built on Jan. 3, 2025, 10:31 p.m.