createExonByTranscriptCdf.AffymetrixCdfFile: Creates an exon-by-transcript CDF

createExonByTranscriptCdf.AffymetrixCdfFile    R Documentation

Creates an exon-by-transcript CDF

Description

Creates an exon-by-transcript CDF based on the probesets defined in an "exon-only" CDF and transcript-exon mapping of a NetAffx probeset annotation data file.

Usage

## S3 method for class 'AffymetrixCdfFile'
createExonByTranscriptCdf(cdf, csv, tags=c("*"), path=getPath(cdf),
  type=c("all", "core", "extended", "full", "main", "control", "cds"), subsetBy=NULL,
  within=NULL, ..., overwrite=FALSE, verbose=FALSE)

Arguments

cdf

An AffymetrixCdfFile specifying an "exon-only" CDF, which defines the exon-specific probesets that will go into the new CDF. For more details, see below.

csv

An AffymetrixNetAffxCsvFile specifying the Affymetrix NetAffx CSV probeset annotation file that contains the transcript-exon mapping.

tags

Additional tags added to the filename of the created CDF, i.e. <chiptype>,<tags>.cdf.

path

The output path where the custom CDF will be written.

type

A character string specifying the type of CDF to be written.

subsetBy

An optional character string specifying the name of a column in the annotation file to subset against. The column will be parsed as the data type of argument within.

within

A vector of values accepted for the subsetBy column.

...

Additional arguments passed to readDataFrame() of AffymetrixNetAffxCsvFile, e.g. nrow.

overwrite

If TRUE, an existing CDF is overwritten.

verbose

...

Value

Returns an AffymetrixCdfFile.

Requirements for the "exon-only" CDF

The template CDF - argument cdf - should be an "exon-only" CDF: each unit has one group/probeset, which is the exon. An example of this is the "unsupported" HuEx-1_0-st-v2.cdf from Affymetrix, which has 1,432,154 units. Such "exon-only" CDFs do not contain information about clustering exons/probesets into gene transcripts. The CDF may also contain a number of non-exon probesets corresponding to control probes, which can contain very large numbers of probes per probeset. Such units are dropped/ignored by this method.

Ordering of transcripts and exons within transcripts

The transcripts (=units) will be ordered as they appear in the NetAffx annotation file. Within each transcript (=unit), the exons (=groups) are ordered lexicographically by their names.
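
The ordering rule above can be pictured with a small base-R sketch. It is only an illustration, not the method's implementation: the annotation table and its probesetId column are hypothetical toy data, and a locale-independent (C-locale) string sort is assumed for "lexicographically". Note that a lexicographic sort orders names as strings, so "10004" comes before "999".

```r
# Hypothetical toy annotation: one row per exon probeset (made-up values)
annot <- data.frame(
  probesetId          = c("2315101", "999", "2315125", "10004", "2315102"),
  transcriptClusterId = c("2315100", "3000000", "2315100", "3000000", "2315100"),
  stringsAsFactors = FALSE
)

# Units (transcripts) keep their order of first appearance in the annotation
unitNames <- unique(annot$transcriptClusterId)

# Groups (exons) within each unit are sorted as strings;
# method = "radix" sorts character vectors in the C locale
groups <- lapply(unitNames, function(id) {
  sort(annot$probesetId[annot$transcriptClusterId == id], method = "radix")
})
names(groups) <- unitNames

print(groups)
```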

Naming of transcripts and exons

In the created CDF, each unit corresponds to one transcript cluster, and each group within a unit corresponds to one exon within that transcript cluster. Thus, the unit names correspond to the transcript cluster names and the group names correspond to the exon names.

The exon names are defined by unit names of the exon-only CDF, whereas the transcript names are defined by the transcriptClusterId column in the NetAffx annotation data file. These transcript and exon names are often "nonsense" integers. In order to map these to more genome-friendly names, use the various annotations available in the NetAffx annotation data file.
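
As a rough sketch of such a mapping, unit names can be looked up against the annotation table with match(). The toy data frame below stands in for the annotation data, and the geneSymbol column is a hypothetical name chosen for illustration; check the real annotation file for the columns it actually provides.

```r
# Hypothetical toy annotation; 'geneSymbol' is an assumed column name
annot <- data.frame(
  transcriptClusterId = c("2315100", "2315100", "3000000"),
  geneSymbol          = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Unit names as they might appear in the created CDF (made-up values)
unitNames <- c("3000000", "2315100")

# One row per transcript cluster, then look up each unit name
lookup <- unique(annot[, c("transcriptClusterId", "geneSymbol")])
symbols <- lookup$geneSymbol[match(unitNames, lookup$transcriptClusterId)]
names(symbols) <- unitNames

print(symbols)
```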

Author(s)

Henrik Bengtsson, adapted from createTranscriptCDF() written by Ken Simpson, Elizabeth Purdom and Mark Robinson.

Examples

## Not run: 
# The exon-only CDF
cdf <- AffymetrixCdfFile$byChipType("HuEx-1_0-st-v2")

# The NetAffx probeset annotation data file
csv <- AffymetrixNetAffxCsvFile("HuEx-1_0-st-v2.na24.hg18.probeset.csv", path=getPath(cdf))

# Create a CDF containing all core probesets:
cdfT <- createExonByTranscriptCdf(cdf, csv=csv, tags=c("*,HB20110911"))
print(cdfT)

# Create CDF containing the core probesets with 3 or 4 probes:
cdfT2 <- createExonByTranscriptCdf(cdf, csv=csv,
            tags=c("*,bySize=3-4,HB20110911"),
            subsetBy="probeCount", within=c("3", "4"))
print(cdfT2)

## End(Not run)
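
The subsetBy/within filtering used for cdfT2 above can be pictured with plain base R. This is a minimal sketch of the documented behavior (the column is parsed as the data type of within, then matched against it), not the package's code; subsetRows() and the toy probeCount values are hypothetical.

```r
# Hypothetical helper: coerce the column to the type of 'within',
# then return the indices of rows whose value is accepted
subsetRows <- function(column, within) {
  parsed <- column
  storage.mode(parsed) <- storage.mode(within)
  which(parsed %in% within)
}

# Toy 'probeCount' column as it might be read from a CSV (character)
probeCount <- c("4", "3", "12", "4", "1")

# Character matching, as in the cdfT2 example
rowsChr <- subsetRows(probeCount, within = c("3", "4"))

# Integer matching selects the same rows here
rowsInt <- subsetRows(probeCount, within = c(3L, 4L))

print(rowsChr)
print(rowsInt)
```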

aroma.affymetrix documentation built on May 29, 2024, 4:32 a.m.