match_and_calculate_positions: Match peptide sequence with provided sequence and calculate...

View source: R/sequence_matching_functions.R

match_and_calculate_positions    R Documentation

Match peptide sequence with provided sequence and calculate positions

Description

This function matches the peptide sequences in the 'peptide_data' data frame against the corresponding sequences provided in the 'whole_seq' data frame. For each match, it calculates the start and end positions of the peptide within the matched sequence and returns a data frame describing those positions.
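To make the idea of start and end positions concrete, here is a minimal base R sketch of locating one peptide inside one sequence. The helper name find_peptide_position is hypothetical and is not part of PepMapViz. The sketch assumes 1-based positions and reports only the first occurrence; the package's own implementation may differ.

```r
# Hypothetical helper (not part of PepMapViz): locate one peptide in one sequence.
find_peptide_position <- function(peptide, sequence) {
  # fixed = TRUE treats the peptide as a literal string, not a regular expression.
  start <- as.integer(regexpr(peptide, sequence, fixed = TRUE))
  if (start == -1L) {
    # regexpr() returns -1 when the peptide does not occur in the sequence.
    return(c(start = NA_integer_, end = NA_integer_))
  }
  # The end position is the start plus the peptide length, minus one.
  c(start = start, end = start + nchar(peptide) - 1L)
}

# "AILNK" begins at the 4th character of "XYZAILNKPQR" and ends at the 8th.
find_peptide_position("AILNK", "XYZAILNKPQR")
```

A peptide that does not occur in the sequence yields NA for both positions in this sketch.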

Usage

match_and_calculate_positions(
  peptide_data,
  column,
  whole_seq,
  match_columns,
  sequence_length = NULL,
  column_keep = NULL
)

Arguments

peptide_data

A data frame containing peptide sequence information to match.

column

The name of the column in peptide_data containing the peptide sequences to be matched.

whole_seq

A data frame containing the antibody sequences together with their domain and region information. The sequences must be stored in a column named 'Region_Sequence'; rename the column if it has a different name.

match_columns

A character vector of column names to match on when pairing peptide sequences with the provided sequences.

sequence_length

(Optional) The range of peptide sequence lengths to keep in the result, e.g. c(1, 5) keeps peptides with a sequence length from 1 to 5.

column_keep

(Optional) The names of the columns in peptide_data to keep in the result data frame.
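As a rough illustration of what the two optional arguments describe, the following base R sketch filters a small data frame by peptide length and keeps only selected columns. It assumes that both ends of the length range are inclusive and that the sequence column is always retained; neither assumption is confirmed by this page, and the data are made up for the example.

```r
# Illustrative data only; column names mirror the example on this page.
peptide_data <- data.frame(
  Sequence = c("AILNK", "BXLMRQQ", "JJ"),
  Region_2 = c("Arm_1", "Arm_2", "Arm_1"),
  Area = c(100, 2, 4),
  stringsAsFactors = FALSE
)
sequence_length <- c(3, 5)    # assumed inclusive: keep lengths 3, 4 and 5
column_keep <- c("Region_2")  # extra columns to carry into the result

# Keep rows whose peptide length falls inside the requested range.
len <- nchar(peptide_data$Sequence)
in_range <- len >= sequence_length[1] & len <= sequence_length[2]

# Retain the sequence column plus the columns named in column_keep.
filtered <- peptide_data[in_range, c("Sequence", column_keep), drop = FALSE]
filtered
```

Only "AILNK" (length 5) survives here; "BXLMRQQ" (length 7) and "JJ" (length 2) fall outside the range.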

Value

A data frame combining columns from 'peptide_data' and 'whole_seq' with the start and end positions of each matched peptide and related information.
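The general shape of such a result can be approximated in base R by joining the two data frames on the match columns and then locating each peptide in its paired sequence. This is only a sketch under stated assumptions: the output column names start and end are hypothetical, the data are made up, and the actual columns returned by match_and_calculate_positions may be named and ordered differently.

```r
# Illustrative inputs; not the package's own test data.
peptides <- data.frame(
  Sequence = c("AILNK", "DDEEF", "ZZZZZ"),
  Condition_1 = c("Drug1", "Drug2", "Drug2"),
  Region_1 = c("VH", "VL", "VH"),
  stringsAsFactors = FALSE
)
references <- data.frame(
  Region_Sequence = c("XYZAILNKPQR", "NOPDDEEFQRS", "GHIJJNXXKLM"),
  Condition_1 = c("Drug1", "Drug2", "Drug2"),
  Region_1 = c("VH", "VL", "VH"),
  stringsAsFactors = FALSE
)
match_columns <- c("Condition_1", "Region_1")

# Pair each peptide with the reference sequences that share its match-column values.
paired <- merge(peptides, references, by = match_columns)

# regexpr() is not vectorised over the pattern, so loop over the pairs with mapply().
paired$start <- unname(mapply(
  function(p, s) as.integer(regexpr(p, s, fixed = TRUE)),
  paired$Sequence, paired$Region_Sequence
))
paired$start[paired$start == -1L] <- NA_integer_  # -1 means "not found"
paired$end <- paired$start + nchar(paired$Sequence) - 1L
paired
```

In this sketch a peptide that is paired with a sequence but does not occur in it (here "ZZZZZ") is kept with NA positions; whether the package keeps or drops such rows is not stated on this page.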

Examples

peptide_data <- data.frame(
  Sequence = c("AILNK", "BXLMR", "JJNXX", "DDEEF"),
  Condition_1 = c("Drug1", "Drug1", "Drug2", "Drug2"),
  Condition_2 = c("Donor1", "Donor2", "Donor1", "Donor2"),
  Region_1 = c("VH", "VL", "VH", "VL"),
  Region_2 = c("Arm_1", "Arm_2", "Arm_1", "Arm_2"),
  Area = c(100, 2, 4, NA)
)
whole_seq <- data.frame(
  Region_Sequence = c(
    "XYZAILNKPQR",
    "ABCBXLMRDEF",
    "GHIJJNXXKLM",
    "NOPDDEEFQRS",
    "AILXKPQR",
    "BNJLMRDEF",
    "ILNXXKLM",
    "DDEEXQRS",
    "XYZAAA",
    "XYZCCC",
    "XYZBBB",
    "XYZDDD",
    "XYZAAB",
    "XYZCCD",
    "XYZBBB",
    "XYZDDD"
  ),
  Condition_1 = c(
    "Drug1",
    "Drug1",
    "Drug2",
    "Drug2",
    "Drug1",
    "Drug1",
    "Drug2",
    "Drug2",
    "Drug1",
    "Drug1",
    "Drug2",
    "Drug2",
    "Drug1",
    "Drug1",
    "Drug2",
    "Drug2"
  ),
  Condition_2 = c(
    "Donor1",
    "Donor1",
    "Donor1",
    "Donor1",
    "Donor1",
    "Donor1",
    "Donor1",
    "Donor1",
    "Donor2",
    "Donor2",
    "Donor2",
    "Donor2",
    "Donor2",
    "Donor2",
    "Donor2",
    "Donor2"
  ),
  Region_1 = c(
    "VH",
    "VL",
    "VH",
    "VL",
    "VH",
    "VL",
    "VH",
    "VL",
    "VH",
    "VL",
    "VH",
    "VL",
    "VH",
    "VL",
    "VH",
    "VL"
  ),
  Region_2 = c(
    "Arm_1",
    "Arm_1",
    "Arm_1",
    "Arm_1",
    "Arm_2",
    "Arm_2",
    "Arm_2",
    "Arm_2",
    "Arm_1",
    "Arm_1",
    "Arm_1",
    "Arm_1",
    "Arm_2",
    "Arm_2",
    "Arm_2",
    "Arm_2"
  )
)
match_columns <- c("Condition_1", "Condition_2", "Region_1")
column_keep <- c("Region_2")
sequence_length <- c(1, 5)
column <- "Sequence"
matching_result <- match_and_calculate_positions(peptide_data,
                                                 column,
                                                 whole_seq,
                                                 match_columns,
                                                 sequence_length,
                                                 column_keep)


PepMapViz documentation built on April 3, 2025, 6:29 p.m.