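cdhit: R interface to CDHIT/CDHITest
In CellaRepertorium: Data structures, clustering and testing for single cell immune receptor repertoires (scRNAseq RepSeq/AIRR-seq)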

Description

CDHIT is a greedy algorithm that clusters amino acid or DNA sequences based on a minimum identity. By default, this package configures it to perform ungapped, global alignments with no clipping at the start or end. The identity is the number of identical characters in the alignment divided by the full length of the shorter sequence. Set s < 1 to lower the minimum coverage of the shorter sequence, which allows clipping at the start or end. Setting G = 0 changes the identity to the number of identical characters in the alignment divided by the length of the alignment. In this case, you must also set the alignment coverage controls aL, AL, aS and AS.
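
The identity and coverage definitions above can be made concrete with a small calculation. The sketch below is a minimal base R illustration, not code from CellaRepertorium or CDHIT: the helper `alignment_stats()` is hypothetical and assumes the ungapped aligned segments have already been found, so it only does the arithmetic. It reports the G = 1 identity (matches over the full shorter length), the G = 0 identity (matches over the alignment length), and the fraction of each sequence covered by the alignment, which is what `s` on this page and `aS`/`aL` in the CDHIT user guide constrain.

```r
# Hypothetical helper (not part of CellaRepertorium): given two ungapped,
# equal-length aligned segments and the full lengths of the two sequences,
# compute the identity and coverage quantities described above.
alignment_stats <- function(aln_a, aln_b, len_a, len_b) {
  a <- strsplit(aln_a, "")[[1]]
  b <- strsplit(aln_b, "")[[1]]
  # An ungapped alignment pairs characters one-to-one
  stopifnot(length(a) == length(b), length(a) <= min(len_a, len_b))
  matches <- sum(a == b)
  aln_len <- length(a)
  shorter <- min(len_a, len_b)
  longer <- max(len_a, len_b)
  c(
    identity_G1 = matches / shorter,       # G = 1: divide by full shorter length
    identity_G0 = matches / aln_len,       # G = 0: divide by alignment length
    coverage_shorter = aln_len / shorter,  # fraction of shorter sequence aligned
    coverage_longer = aln_len / longer     # fraction of longer sequence aligned
  )
}

# An 8-residue aligned stretch between a 12-residue and a 10-residue sequence
st <- alignment_stats("CASSLGQG", "CASSLGAG", len_a = 12, len_b = 10)
print(round(st, 3))
```

Here 7 of the 8 aligned characters match, so the G = 1 identity is 7/10 = 0.7 while the G = 0 identity is 7/8 = 0.875. The same alignment can pass a 0.8 threshold under G = 0 and fail it under G = 1, which is why G = 0 needs the coverage controls to keep short local alignments from counting.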

Usage

cdhit(
  seqs,
  identity = NULL,
  kmerSize = NULL,
  min_length = 6,
  s = 1,
  only_index = FALSE,
  showProgress = interactive(),
  ...
)

Arguments

`seqs`	`AAseq` or `DNAseq`
`identity`	minimum proportion identity, for example 0.9 for 90% identity
`kmerSize`	word size. If NULL, it is chosen automatically based on `identity`. You may need to lower it below 5 for `AAseq` with identity below 0.7.
`min_length`	minimum length of sequences to be clustered. An error is raised if a shorter sequence is passed.
`s`	minimum fraction of the shorter sequence covered by the alignment.
`only_index`	if TRUE, return only the integer cluster indices; otherwise return a tibble.
`showProgress`	show a progress bar
`...`	other arguments passed to CDHIT; see https://github.com/weizhongli/cdhit/wiki/3.-User's-Guide#CDHIT for details. These override any default values.
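
The `kmerSize` note mirrors the word-size ranges that the CDHIT user guide recommends for protein clustering: word size 5 for identity 0.7 to 1.0, 4 for 0.6 to 0.7, 3 for 0.5 to 0.6, and 2 for 0.4 to 0.5. The helper below is a hypothetical sketch of that lookup for illustration only; it is an assumption, not a documented fact, that `cdhit()` chooses word sizes this way when `kmerSize = NULL`.

```r
# Hypothetical helper: protein word size suggested by the CDHIT user guide
# for a given identity threshold. Not taken from CellaRepertorium.
suggest_aa_word_size <- function(identity) {
  stopifnot(is.numeric(identity), length(identity) == 1, identity <= 1)
  if (identity >= 0.7) {
    5L
  } else if (identity >= 0.6) {
    4L
  } else if (identity >= 0.5) {
    3L
  } else if (identity >= 0.4) {
    2L
  } else {
    stop("no word size is suggested for identity below 0.4")
  }
}

sizes <- sapply(c(1, 0.65, 0.55, 0.45), suggest_aa_word_size)
print(sizes)
# The result could then be passed explicitly, e.g.
# cdhit(aaseq, identity = 0.65, kmerSize = suggest_aa_word_size(0.65), only_index = TRUE)
```

Passing `kmerSize` explicitly like this only makes the choice visible; a smaller word size trades speed for sensitivity at lower identity thresholds.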

Details

CDHIT is by Fu, Niu, Zhu, Wu and Li (2012). The R interface was originally written by Thomas Lin Pedersen and was transcribed here because it is not exported from the package FindMyFriends, which is orphaned.

Value

An integer vector with one element per sequence in `seqs`, giving the cluster ID of each sequence (when `only_index = TRUE`), or otherwise a tibble.
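
The integer IDs come from CDHIT's greedy, incremental clustering: sequences are processed from longest to shortest, each one joins the first existing cluster whose representative it matches at or above the identity threshold (CDHIT's default fast mode), and otherwise it founds a new cluster with itself as representative. The base R sketch below illustrates only that loop. `ungapped_identity()` and `greedy_cluster()` are hypothetical names, the identity is a simplified ungapped G = 1 score, and the short-word (k-mer) filter and real alignment that CDHIT uses are omitted, so its IDs should not be expected to match `cdhit()` output.

```r
# Hypothetical simplified identity: best ungapped placement of the shorter
# sequence inside the longer one, matches divided by the shorter length (G = 1 style).
ungapped_identity <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  if (length(x) > length(y)) { tmp <- x; x <- y; y <- tmp }  # x is the shorter
  best <- 0
  for (off in 0:(length(y) - length(x))) {
    best <- max(best, sum(x == y[off + seq_along(x)]))
  }
  best / length(x)
}

# Hypothetical greedy incremental clustering in the spirit of CDHIT
greedy_cluster <- function(seqs, identity) {
  ord <- order(nchar(seqs), decreasing = TRUE)  # longest sequences first
  reps <- integer(0)                            # indices of representatives
  cluster <- integer(length(seqs))
  for (i in ord) {
    assigned <- FALSE
    for (k in seq_along(reps)) {
      # first representative that meets the threshold wins
      if (ungapped_identity(seqs[i], seqs[reps[k]]) >= identity) {
        cluster[i] <- k
        assigned <- TRUE
        break
      }
    }
    if (!assigned) {  # no match: this sequence starts a new cluster
      reps <- c(reps, i)
      cluster[i] <- length(reps)
    }
  }
  cluster
}

seqs <- c("CASSLGQGYEQYF", "CASSLGQGYEQY", "CASRPGTEAFF", "CASSLGQGFEQYF")
ids <- greedy_cluster(seqs, identity = 0.9)
print(ids)
```

The first, second and fourth sequences differ by at most one residue over the shorter length and share cluster 1; the third matches no representative and becomes cluster 2.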

Examples

library(CellaRepertorium)
fasta_path = system.file('extdata', 'demo.fasta', package = 'CellaRepertorium')
aaseq = Biostrings::readAAStringSet(fasta_path)
# 100% identity, global alignment
cdhit(aaseq, identity = 1, only_index = TRUE)[1:10]
# 100% identity, local alignment (G = 0) that must cover all of both sequences
cdhit(aaseq, identity = 1, G = 0, aL = 1, aS = 1, only_index = TRUE)[1:10]
# 100% identity, local alignment covering at least 90% of both sequences
cdhit(aaseq, identity = 1, G = 0, aL = .9, aS = .9, only_index = TRUE)[1:10]
# a tibble
tbl = cdhit(aaseq, identity = 1, G = 0, aL = .9, aS = .9, only_index = FALSE)

Example output

 [1] 100 101 162 102   6 245 103  49 163 164
 [1] 100 101 162 102   6 245 103  49 163 164
 [1] 100 101 162 102   6 245 103  49 163 164

CellaRepertorium documentation built on Nov. 8, 2020, 7:48 p.m.
