Create read alignments against a reference genome and optional auxiliary targets if they do not yet exist. If necessary, also build target indices for the aligner.
sampleFile |
the name of a text file listing input sequence files and sample names (see ‘Details’). |
genome |
the reference genome for primary alignments, one of:
|
auxiliaryFile |
the name of a text file listing sequences to be used as additional targets for alignment of reads not mapping to the reference genome (see ‘Details’). |
aligner |
selects the aligner program to be used for aligning the
reads. Currently, only “Rbowtie” and “Rhisat2” are supported,
which are R wrapper packages for ‘bowtie’ / ‘SpliceMap’ and
‘hisat2’, respectively (see |
maxHits |
sets the maximal number of allowed mapping positions
per read (default: 1). If a read produces more than |
paired |
defines the type of paired-end library and can be set to
one of |
splicedAlignment |
if
|
snpFile |
the name of a text file listing single nucleotide polymorphisms to be used for allele-specific alignment and quantification (see ‘Details’). |
bisulfite |
for bisulfite-converted samples (Bis-seq), the type of bisulfite library (“dir” for directional libraries, “undir” for undirectional libraries). |
alignmentParameter |
an optional string containing command line
parameters to be used for the aligner, to overrule the default
alignment parameters used by |
projectName |
an optional name for the alignment project. |
alignmentsDir |
the directory to be used for storing alignments
(bam files). If set to |
lib.loc |
can be used to change the default library path of
R. The library path is used by |
cacheDir |
specifies the location to store (potentially huge)
temporary files. If set to |
clObj |
a cluster object, created by the package parallel, to enable parallel processing and speed up the alignment process. |
checkOnly |
if |
geneAnnotation |
Only used if |
Before generating new alignments, qAlign looks for previously
generated alignments as well as for an aligner index. If no aligner
index exists, it will be automatically created and stored in the same
directory as the provided fasta file, or as an R package in the case
of a BSgenome reference. The name of this R package will be the same
as the BSgenome package name, with an additional suffix from the
aligner (e.g. BSgenome.Hsapiens.UCSC.hg19.Rbowtie). The
generated bam files contain both aligned and unaligned reads. For
paired-end samples, by default no alignments will be reported for
read pairs where only one of the reads could be aligned.
sampleFile is a tab-delimited text file listing all the input
sequences to be included in a given analysis. The file has either two
(single-end) or three columns (paired-end). The first row contains the
column names, and additional rows contain the relative or absolute path
and name of the input sequence file(s), as well as the corresponding
sample name. Three input file formats are supported (fastq, fasta and
bam). All input files in one sampleFile need to be in the same
format, and are recognized by their extension (.fq, .fastq, .fa,
.fasta, .fna, .bam), in raw or compressed form (e.g. .fastq.gz). If
bam files are provided, then no alignments are generated by
qAlign, and the alignments contained in the bam files will be
used instead.
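The extension rule above can be illustrated with a small base-R sketch. The helper `guess_format` is hypothetical and not part of QuasR; it only mirrors the extensions listed in the text and assumes `.gz` and `.bz2` as compression suffixes (the two that appear on this page).

```r
# Hypothetical helper (not part of QuasR): guess the input format from a file name.
guess_format <- function(path) {
  # Drop one optional compression suffix (assumption: only .gz and .bz2 are handled)
  base <- sub("\\.(gz|bz2)$", "", basename(path), ignore.case = TRUE)
  # Keep the text after the last dot as the extension
  ext <- tolower(sub("^.*\\.", "", base))
  # Map the extensions listed in the text to the three supported formats
  fmt <- c(fq = "fastq", fastq = "fastq",
           fa = "fasta", fasta = "fasta", fna = "fasta",
           bam = "bam")
  unname(fmt[ext])  # NA for anything not in the table
}

files <- c("chip_1_1.fq.bz2", "chip_2_1.fq.bz2")
formats <- guess_format(files)

# All files listed in one sampleFile must share a single, recognized format
stopifnot(!anyNA(formats), length(unique(formats)) == 1)
```

A check like this can be run on the file names before calling qAlign, so that mixed or unrecognized formats are caught early.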
The column names in sampleFile have to match the ones in the
examples below, for a single-read experiment:
FileName | SampleName |
chip_1_1.fq.bz2 | Sample1 |
chip_2_1.fq.bz2 | Sample2 |
and for a paired-end experiment:
FileName1 | FileName2 | SampleName |
rna_1_1.fq.bz2 | rna_1_2.fq.bz2 | Sample1 |
rna_2_1.fq.bz2 | rna_2_2.fq.bz2 | Sample2 |
The “SampleName” column provides the human-readable name for each
sample, which will be used as the sample label. Multiple sequence files
may be associated with the same sample name, which instructs QuasR to
combine those files.
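As a minimal sketch, a single-read sampleFile with the layout described above could be written from R as follows. The output path is hypothetical, the file names are the ones from the example table, and quoting is switched off on the assumption that plain unquoted values are expected.

```r
# Illustrative sample table, using the file names from the example above
samples <- data.frame(
  FileName   = c("chip_1_1.fq.bz2", "chip_2_1.fq.bz2"),
  SampleName = c("Sample1", "Sample2")
)

# Hypothetical output location
sample_file <- file.path(tempdir(), "samples_single.txt")

# Tab-delimited, with a header row, no quotes and no row names
write.table(samples, file = sample_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Read the file back to confirm the layout
check <- read.delim(sample_file, stringsAsFactors = FALSE)
```

For a paired-end experiment the same pattern applies with the three columns FileName1, FileName2 and SampleName.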
auxiliaryFile is a tab-delimited text file listing one or
several additional target sequence files in fasta format. Reads that
do not map against the reference genome will be aligned against each
of these target sequence files. The first row contains the column
names, which have to match the ones in the example below:
FileName | AuxName |
NC_001422.1.fa | phiX174 |
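The auxiliaryFile shown above is small enough to write by hand. The sketch below creates it with `writeLines` and then checks the header; the output path is hypothetical and the check is only an illustration, not something QuasR requires you to run.

```r
# Hypothetical output location
aux_file <- file.path(tempdir(), "auxiliaries.txt")

# Header row plus one auxiliary target, separated by tabs
writeLines(c("FileName\tAuxName",
             "NC_001422.1.fa\tphiX174"), aux_file)

# Confirm that the column names match the ones given in the text
aux <- read.delim(aux_file, stringsAsFactors = FALSE)
stopifnot(identical(colnames(aux), c("FileName", "AuxName")))
```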
snpFile is a tab-delimited text file without a header that
contains four columns with chromosome name, position, reference allele
and alternative allele, as in the example below:
chr1 | 8596 | G | A |
chr1 | 18443 | G | A |
chr1 | 18981 | C | T |
chr1 | 19341 | G | A |
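Because the snpFile has no header, `col.names = FALSE` is needed when writing it from a data frame. The following sketch reproduces the four rows shown above; the column names in the data frame and the output path are hypothetical and are not written to the file.

```r
# The four example SNPs; column names are for readability only
snps <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr1"),
  pos = c(8596L, 18443L, 18981L, 19341L),
  ref = c("G", "G", "C", "G"),
  alt = c("A", "A", "T", "A")
)

# Hypothetical output location
snp_file <- file.path(tempdir(), "snps.txt")

# Tab-delimited, no header, no quotes, no row names
write.table(snps, file = snp_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# The first line is already a data row, not a header
first_line <- readLines(snp_file, n = 1)
```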
The reference and alternative alleles will be injected into the
reference genome, resulting in two separate genomes. All reads will be
aligned separately to both of these genomes, and the alignments will
be combined, only retaining the best alignment for each read. In the
final alignment, each read will be marked with a tag that classifies
it into reference (R), alternative (A) or unknown (U), if the read
maps equally well to both genomes.
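The R/A/U classification can be pictured with a toy example. The sketch below assumes each read has one numeric alignment score per genome, with higher meaning better; both the scores and the helper `classify_allele` are hypothetical, and the criterion QuasR uses internally to compare alignments may differ.

```r
# Hypothetical alignment scores of the same reads against both genomes
score_ref <- c(read1 = 40, read2 = 38, read3 = 40)
score_alt <- c(read1 = 38, read2 = 40, read3 = 40)

# Hypothetical helper: R if the reference genome wins, A if the
# alternative genome wins, U if both alignments are equally good
classify_allele <- function(ref, alt) {
  ifelse(ref > alt, "R", ifelse(alt > ref, "A", "U"))
}

tags <- classify_allele(score_ref, score_alt)
```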
If bisulfite is set to “dir” or “undir”, reads
will be C-to-T converted and aligned to a similarly converted genome.
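To make the C-to-T conversion concrete, here is a one-line illustration in base R. It only shows what an in-silico C-to-T conversion does to a sequence string; the read is made up, and this is not how QuasR performs the conversion internally.

```r
# Made-up read sequence
read <- "ACGTCCGA"

# Replace every C by T; all other bases are left unchanged
converted <- chartr("C", "T", read)
```

After conversion, methylated and unmethylated cytosines look the same to the aligner, so methylation state does not affect where a read maps.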
If alignmentParameter is NULL (recommended),
qAlign will select default parameters that are suitable for the
experiment type. Please note that for bisulfite or allele-specific
experiments, each read is aligned multiple times, and the resulting
alignments need to be combined. This requires special settings for the
alignment parameters that are not recommended to be changed. For
‘simple’ experiments (neither bisulfite, allele-specific, nor
spliced), alignments are generated using the parameters
-m maxHits --best --strata. This aligns reads with up to
“maxHits” best hits in the genome and selects one of them randomly.
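As a sketch of how the default string for ‘simple’ experiments relates to maxHits, the hypothetical helper below composes the parameters quoted above. It is not a QuasR function and only reproduces the text; the exact string assembled by qAlign may contain further options.

```r
# Hypothetical helper (not part of QuasR): compose the parameter string
# quoted in the text for a given maxHits value
default_simple_params <- function(maxHits = 1L) {
  stopifnot(is.numeric(maxHits), length(maxHits) == 1, maxHits >= 1)
  sprintf("-m %d --best --strata", as.integer(maxHits))
}

params <- default_simple_params(1)
```

A string of this form could also be passed explicitly as alignmentParameter, although leaving it NULL is the recommended choice.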
A qProject object.
Anita Lerch, Dimos Gaidatzis, Charlotte Soneson and Michael Stadler
qProject, makeCluster from package parallel, Rbowtie-package, Rhisat2-package