merge_lanes: Merge fastq lane files

merge_lanes R Documentation

Merge fastq lane files

Description

merge_lanes identifies, reads, and merges fastq lane files.

Usage

merge_lanes(in_path, out_path, threads = 1, nlanes = NULL)

Arguments

in_path

Character string with the path to the lane files that should be merged. The lane file names should follow the convention 'sample1_lane1.fastq.gz, sample1_lane2.fastq.gz, sample2_lane1.fastq.gz, etc.', where the first part of the name indicates the sample and the second part indicates the lane. The function tries to identify sample names automatically, assuming that the second part begins with "_lane" or "_L00". Only gzip-compressed fastq files (fastq.gz) are accepted.
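The naming convention above can be illustrated with a minimal sketch. It is not the code used inside merge_lanes; it only assumes that the sample name is everything before the first "_lane" or "_L00" tag, and the file names are hypothetical.

```r
# Hypothetical file names following the documented convention.
fls <- c("sample1_lane1.fastq.gz", "sample1_lane2.fastq.gz",
         "sample2_L001.fastq.gz", "sample2_L002.fastq.gz")

# Assumption: the sample name is the part before "_lane" or "_L00".
# Strip everything from the lane tag onward.
samples <- sub("(_lane|_L00).*$", "", fls)

# Group the lane files by the sample name recovered from each file.
lanes_by_sample <- split(fls, samples)
lanes_by_sample
```

If the sample part itself contains "_lane" or "_L00", a pattern like this one would split the name too early, which is one reason to keep sample names simple.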

out_path

Character string with the path to the destination folder for the merged fastq.gz files.

threads

Integer indicating the number of parallel processes that should be used.

nlanes

Integer indicating the number of lanes that should be merged. This acts as a safety measure to ensure correct lane merging if file names are long or complicated. Default is NULL.
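As a rough illustration of the kind of safety check that a known lane count makes possible, the sketch below counts lane files per sample and stops if any sample deviates. The helper check_lane_counts is hypothetical and not part of seqpac; how merge_lanes uses nlanes internally is not shown here.

```r
# Hypothetical helper (not part of seqpac): verify that every sample
# has exactly 'nlanes' lane files before merging.
check_lane_counts <- function(fls, nlanes) {
  # Assumption: sample name is the part before "_lane" or "_L00".
  samples <- sub("(_lane|_L00).*$", "", basename(fls))
  counts <- table(samples)
  bad <- names(counts)[as.vector(counts) != nlanes]
  if (length(bad) > 0) {
    stop("Unexpected number of lane files for: ",
         paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Two samples with three lanes each pass the check.
ok_fls <- paste0(rep(c("sample1_", "sample2_"), each = 3),
                 rep(c("lane1", "lane2", "lane3"), times = 2),
                 ".fastq.gz")
check_lane_counts(ok_fls, nlanes = 3)
```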

Details

Given an input path where flow cell lane files in fastq format are stored (typically generated by, for example, Illumina sequencers), this function tries to identify unique samples among the lane files and merges all lane files for each sample. The function uses md5 checks to verify that each lane file is unique (no accidental copies) and compares the result against the expectation that all samples have the same number of lanes (coming from the same experiment). If the function does not work, try defining the number of lanes to be merged with nlanes.
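The idea behind an md5 uniqueness check can be sketched with base R tools. This is an illustration under stated assumptions, not the implementation inside merge_lanes: it uses small plain-text demo files with hypothetical names and flags any file whose checksum matches an earlier file.

```r
# Write three small temporary files; two have identical content.
tmp <- file.path(tempdir(), "md5_demo")
dir.create(tmp, showWarnings = FALSE)
paths <- file.path(tmp, c("s1_lane1.txt", "s1_lane2.txt", "s1_lane3.txt"))
writeLines("ACGT", paths[1])
writeLines("TTGA", paths[2])
writeLines("ACGT", paths[3])  # accidental copy of the first file

# tools::md5sum() returns one checksum per file.
sums <- tools::md5sum(paths)

# duplicated() flags files whose checksum has already been seen.
dup <- duplicated(sums)
basename(paths)[dup]
```

Here the third file is reported, because its content is byte-for-byte identical to the first.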

Value

Merged fastq files in destination folder.
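To make the result concrete, the following minimal sketch shows what merging amounts to for one sample: the records of each lane file are written, in order, to a single gzipped file. It uses only base R connections, tiny made-up reads, and a hypothetical output name (sample1.fastq.gz). It does not reproduce the naming, parallelisation, or checks of merge_lanes, and it reads whole files into memory, which is not suitable for real data.

```r
# Hypothetical demo data: two tiny gzipped lane files for one sample.
tmp <- file.path(tempdir(), "merge_demo")
dir.create(tmp, showWarnings = FALSE)
lane1 <- file.path(tmp, "sample1_lane1.fastq.gz")
lane2 <- file.path(tmp, "sample1_lane2.fastq.gz")

con <- gzfile(lane1, "w")
writeLines(c("@r1", "ACGT", "+", "IIII"), con)
close(con)
con <- gzfile(lane2, "w")
writeLines(c("@r2", "TTGA", "+", "IIII"), con)
close(con)

# Read each lane in turn and write its records to one gzipped output.
merged <- file.path(tmp, "sample1.fastq.gz")
out <- gzfile(merged, "w")
for (f in c(lane1, lane2)) {
  writeLines(readLines(f), out)
}
close(out)

# The merged file holds the records of both lanes, in lane order.
merged_lines <- readLines(merged)
length(merged_lines)
```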

See Also

https://github.com/Danis102 for updates on the current package.

Other PAC generation: PAC_check(), create_PAC(), make_PAC(), make_counts(), make_cutadapt(), make_pheno(), make_trim()

Examples


## The simple principle: 
# in_path <- "/some/path/to/lane/files/fastq.gz"
# out_path <- "/some/path/to/merged/files/"
# merge_lanes(in_path, out_path, threads=12)


## Real example
# First generate some correct file names (see above).
sys_path <- system.file("extdata", package = "seqpac", mustWork = TRUE)
fq <- list.files(path = sys_path, pattern = "fastq", all.files = FALSE,
                full.names = TRUE)
 
# Create input and output folders
input <- paste0(tempdir(), "/lanes/")
output <- paste0(tempdir(), "/merged/")
dir.create(input, showWarnings=FALSE)
dir.create(output, showWarnings=FALSE)
# Copy the files and rename them to follow the naming convention
file.copy(from = fq, to = input)
old_fls <- list.files(input, full.names=TRUE)
new_sample <- c(rep("sample1_", times=3), rep("sample2_", times=3))
new_lane <- rep(c("lane1","lane2","lane3"), times=2)
new_fls <- paste0(input,new_sample, new_lane, ".fastq.gz")
file.rename(from = old_fls, to = new_fls)

# Then merge the fastq files
merge_lanes(input, output, threads=2)

# You will find the files in:
input
output


##-----------------------------------------##
## Warning: Clean up temp folder           ##
# (Sometimes needed for automated examples) 

closeAllConnections()
fls_temp  <- tempdir()
fls_temp  <- list.files(fls_temp, recursive=TRUE, full.names = TRUE)
suppressWarnings(file.remove(fls_temp))


Danis102/seqpac documentation built on Aug. 26, 2023, 10:15 a.m.