---
title: "Synteny analysis testing"
author: "Lakhansing Pardeshi"
date: "`r Sys.Date()`"
format:
html:
embed-resources: true
df-print: paged
knitr:
opts_chunk:
fig.height: 7
---
This notebook demonstrate some examples of comparing prophage homology group
signatures to calculate syntenic Jaccard Index.
## Initial setup
```{r}
#| label: setup
#| warning: false
suppressPackageStartupMessages(library(tidyverse))
rm(list = ls())
source("https://raw.githubusercontent.com/lakhanp1/omics_utils/main/RScripts/utils.R")
source("scripts/utils/compare_hg_sets.R")
################################################################################
set.seed(124)
useCase <- 1
```
## Parameters while detecting syntenic overlap
- Score for a match of homology group during DP: $match\_score = 5$
- Score for a mismatch of homology group during DP: $gap\_penalty = -2$
- Maximum number of consecutive mismatches allowed in a valid LCS: $maximum\_gap\_length = 2$
- Minimum LCS required for a valid LCS: $minimum\_chain\_length = 5$
## Testing various use cases for synteny analysis
### Use case `r useCase`
Detecting a longest common subsequence (LCS) between two sequence while allowing
for gaps.
```{r}
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "r", "s", "t")
seq2 <- c("p", "q", "a", "b", "c", "d", "t", "f", "g", "h", "i", "l", "m", "n", "x", "y", "z", "i", "j", "k")
lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```
### Use case `r useCase`
```{r}
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "x", "y", "z", "i", "j", "k")
lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```
### Use case `r useCase`
LCS from the 5' end of sequence. A longer LCS can be found if 3 gaps are allowed.
```{r}
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "x", "y", "z", "i", "j", "k", "l", "m")
lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```
### Use case `r useCase`
LCS from the 3' end of the sequence. A longer LCS can be found if 3 gaps are allowed.
```{r}
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("a", "b", "c", "d", "e", "x", "y", "z", "f", "g", "h", "i", "j", "k", "l", "m")
lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```
### Use case `r useCase`
Change `maxGapLen` parameter to increse the syntenic chain length by allowing
longer gaps than the default of 2.
```{r}
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- rev(c("a", "b", "c", "d", "x", "y", "z", "e", "f", "g", "h", "i", "j", "k", "l", "m"))
lcs <- syntenic_hg_overlap(ref = seq1, qur = seq2, maxGapLen = 3)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```
### Use case `r useCase`
Change `minChainLen` parameter to allow smaller syntenic matches.
```{r}
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("m", "n")
lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2, minChainLen = 2)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```
### Use case `r useCase`
```{r}
seq1 <- c("m", "n")
seq2 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2, minChainLen = 2)
```
```{r echo=FALSE}
cat(
"seq1: ", paste(seq1, collapse = " "), "\n",
"seq2: ", paste(seq2, collapse = " "), "\n", sep = ""
)
print_lcs(lcs)
useCase <- useCase + 1
```