map_reanno: Aligning sequences against references using bowtie
In Danis102/seqpac: Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts

map_reanno

R Documentation

Aligning sequences against references using bowtie

Description

Aligns sequences in a PAC list object against specified references generating summarized report files in the destination folder.

Usage

map_reanno(
  PAC,
  type = "internal",
  output_path,
  ref_paths,
  mismatches = 3,
  threads = 1,
  parse_external = "-a -f",
  parse_internal = "a=TRUE, f=TRUE",
  import = "genome",
  keep_temp = FALSE,
  override = FALSE
)

Arguments

`PAC`	PAC-list object containing an Anno data.frame with sequences as row names.
`type`	Character indicating if mapping should be performed by calling the internal bowtie function of the Rbowtie package (type="internal"), or if bowtie should be called externally through a system call for a locally installed bowtie (type="external").
`output_path`	Character indicating the path to the destination folder for the files generated by the function.
`ref_paths`	List of file paths (character) indicating the full paths to each fasta reference file. It is important to carefully name each reference path. The names will appear as they are named in the final annotation table. Thus, if ref_paths=list(tRNA="<path_tRNA_ref>", miRNA="<path_piRNA_ref>") mapping to these fasta references will appear as "tRNA" and "miRNA", respectively. Note: All reference fasta files must have bowtie indexes using `Rbowtie::bowtie_build`.
`mismatches`	Integer indicating the maximum number of mismatches allowed in the alignments. The function currently supports no more than 3 mismatches (botwie max).
`threads`	Integer indicating the number of parallel processes to be used.
`parse_external`	One character string specifying the additional options parsed to externally installed bowtie when type="external". The string follows similar rules as '...' parsing. More information on the formatting use: `system("bowtie --manual", intern=TRUE, ignore.stderr=FALSE)`. If this command fails you probably do not have bowtie correctly installed. As default parse_external= "-a -f".
`parse_internal`	One character string specifying the additional options parsed to the bowtie function in the Rbowtie package when type="internal". The string follows similar rules as '...' parsing. See `Rbowtie::bowtie` for details on the format. As default: parse_internal= "a=TRUE, f=TRUE".
`import`	Character or a list. If `import="genome"` mapping is done against a reference genome and genomic coordinates are acquired. If import="biotype", mapping is done against a specialized fasta reference (e.g. Ensembl_ncrna, pirBase etc), where genomic coordinates is not required because classification will be performed on a match-or-no-match basis. A list of exactly 3 objects, named "coord", "report" and "reduce" can also be provided. This list will be parsed to the `import_reanno` function. When import="genome", the list `import=list(coord=TRUE, report="full", reduce=NULL)` is automatically parsed, while when import="biotype" the list parsed is `import=list(coord=FALSE, report="full", reduce=NULL)`. Performance increases by setting coord=FALSE. See `import_reanno` for more information on how to set `report` and `reduce` for increased performance when extremely large and repetitive references are used, such as pirBase and repeatMasker.
`keep_temp`	Logical whether or not bowtie output files temporarily stored in the output path should be deleted. Note, this option is only used for troubleshooting. The bowtie output files are named as the reference files and are overwritten in each mismatch cycle. Thus, for safe saving of mismatch 0 bowtie output make sure that `mismatches=0`. If not, the mismatch 1 cycle will overwrite the bowtie files.
`override`	Logical whether or not the function should prompt you for a question if there are files in output_path. As default, override=FALSE will prevent deleting large files by accident, but requires an interactive R session. Setting override=TRUE may solve non-interactive problems.

Details

Given a PAC object this function will extract the read sequences and align them against single or multiple fasta reference file(s). Summarized output will be saved as .Rdata files in the destination folder. In its default setting the function will only report hit or no hit in up to 3 mismatches, but this can easily be changed between 0-3 mismatches, to include alignment coordinates and reference name extractions. Information from the .Rdata files (Full_reanno_mis0/1/2/3.Rdata) can be extracted and added to a PAC object using make_reanno and add_reanno functions. For increased compatibility across platforms, the function provides both R internal bowtie parsing (thorugh the Rbowtie package), as well as external parsing to a locally installed version of bowtie.

Value

Will primarily generate .Rdata files in the destination folder (output_path) containing summarized information about the reference alignments. One file is generated for every mismatch specified in mismatches. The make_reanno function can then be used to extract and generate annotation tables for a PAC list object. Large temporary bowtie input and output files will also be generated in the destination folder, but are removed unless temp_remove=FALSE.

Examples


######################################################### 
##### Simple example for reference mapping 
##### Please see manual for ?simply_reanno an ?add_reanno for more advanced 
##### mapping
closeAllConnections()
######################################################### 
##### Create an reanno object

### First, if you haven't already generated Bowtie indexes for the included
# fasta references you need to do so. If you are having problem see the small
# RNA guide (vignette) for more info.
                                                             
 ## tRNA:
 trna_file <- system.file("extdata/trna", "tRNA.fa", 
                          package = "seqpac", mustWork = TRUE)
 trna_dir <- dirname(trna_file)
 
 if(!sum(stringr::str_count(list.files(trna_dir), ".ebwt")) ==6){     
     Rbowtie::bowtie_build(trna_file, 
                           outdir=trna_dir, 
                           prefix="tRNA", force=TRUE)
                           }
 ## rRNA:
 rrna_file <- system.file("extdata/rrna", "rRNA.fa", 
                          package = "seqpac", mustWork = TRUE)
 rrna_dir <- dirname(rrna_file)
 
 if(!sum(stringr::str_count(list.files(rrna_dir), ".ebwt")) ==6){
     Rbowtie::bowtie_build(rrna_file, 
                           outdir=rrna_dir, 
                           prefix="rRNA", force=TRUE)
                           }
                           
                           
##  Then load a PAC-object and remove previous mapping from anno:
 load(system.file("extdata", "drosophila_sRNA_pac_filt_anno.Rdata", 
                   package = "seqpac", mustWork = TRUE))
 anno(pac) <- anno(pac)[,1, drop = FALSE]
 
 ref_paths <- list(trna= trna_file, rrna= rrna_file)

                                    
##  You may add an output path of your choice, but here we use a temp folder:
 output <- paste0(tempdir(),"/seqpac/test")


##  Then map the PAC-object against the fasta references. 
# Warning: if you use your own data, you may want to use override=FALSE, to avoid
# deleting previous mapping by mistake.

map_reanno(pac, ref_paths=ref_paths, output_path=output,
               type="internal", mismatches=0,  import="biotype", 
               threads=2, keep_temp=FALSE, override=TRUE)
 
##  Then import and generate a reanno-object of the temporary bowtie-files
reanno_biotype <- make_reanno(output, PAC=pac, mis_fasta_check = TRUE)
 
                                    
## Now make some search terms against reference names to create shorter names
# Theses can be used to create factors in downstream analysis
# One search hit (regular expressions) gives one new short name 

bio_search <- list(
              rrna=c("5S", "5.8S", "12S", "16S", "18S", "28S", "pre_S45"),
              trna =c("_tRNA", "mt:tRNA"))
 
 
## You can merge directly with your PAC-object by adding your original 
# PAC-object, that you used with map_reanno, to merge_pac option.

pac <- add_reanno(reanno_biotype, bio_search=bio_search, 
                       type="biotype", bio_perfect=FALSE, 
                       mismatches = 0, merge_pac=pac)

table(anno(pac)$mis0_bio) 

############################################################################ 
## Similarly, you can use Bowtie indexed genome fasta references
## But don't forget to use import="genome" for coordinate import
#    
# ref_paths <- list(genome="<path_to_bowtie_indexed_fasta>")
# output_genome <- "<your_path_to_output_folder>"

## We can actually map against several genomes simultaneously:
# (to illustrate the principle we run the mycoplasma genome twice)

 mycoplasma_file <- system.file("extdata/mycoplasma_genome",
                                "mycoplasma.fa",
                                package = "seqpac", mustWork = TRUE)
 mycoplasma_dir<- gsub("mycoplasma.fa", "", mycoplasma_file)
 
## Don't forget to create bowtie indexes
 if(!sum(stringr::str_count(list.files(mycoplasma_dir), ".ebwt")) ==6){
     Rbowtie::bowtie_build(mycoplasma_file, 
                           outdir=mycoplasma_dir, 
                           prefix="mycoplasma", force=TRUE)
                           }
                           
 ref_paths <- list(genome1=mycoplasma_file, genome2=mycoplasma_file)
 map_reanno(PAC=pac, ref_paths=ref_paths, output_path=output, 
            type="internal", mismatches=0, import="genome", 
            threads=2, keep_temp=TRUE, override=TRUE)
 reanno_genome <- make_reanno(output, PAC=pac, mis_fasta_check = TRUE)
 
 table(overview(reanno_genome)$Any) # Low Mycoplasma contamination
 
############################################################################

Danis102/seqpac documentation built on Aug. 26, 2023, 10:15 a.m.

Danis102/seqpac index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Danis102/seqpac
Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts

map_reanno: Aligning sequences against references using bowtie
In Danis102/seqpac: Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts

Aligning sequences against references using bowtie

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to map_reanno in Danis102/seqpac...

R Package Documentation

Browse R Packages

We want your feedback!

Danis102/seqpac Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts

map_reanno: Aligning sequences against references using bowtie In Danis102/seqpac: Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts

Aligning sequences against references using bowtie

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to map_reanno in Danis102/seqpac...

R Package Documentation

Browse R Packages

We want your feedback!

Danis102/seqpac
Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts

map_reanno: Aligning sequences against references using bowtie
In Danis102/seqpac: Seqpac: A Framework for smallRNA analysis in R using Sequence-Based Counts