Synteny analysis testing

Author

Lakhansing Pardeshi

Published

July 20, 2025

This notebook demonstrates, through several examples, how prophage homology group signatures are compared to calculate a syntenic Jaccard index.
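For orientation, the plain Jaccard index of two homology group sets is the size of their intersection divided by the size of their union. The sketch below shows that, plus one possible syntenic variant in which the shared part is the length of the syntenic chain. The function names, the `hg1` to `hg7` labels, and the syntenic formula itself are assumptions made for illustration; they are not taken from `compare_hg_sets.R`.

```r
# Plain Jaccard index on two sets of homology groups.
jaccard_index <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Assumed syntenic variant (hypothetical): the shared part is the length of
# the syntenic chain (lcs_length) instead of the plain set intersection.
syntenic_jaccard <- function(a, b, lcs_length) {
  n_a <- length(unique(a))
  n_b <- length(unique(b))
  lcs_length / (n_a + n_b - lcs_length)
}

# Hypothetical homology group signatures.
hg_a <- c("hg1", "hg2", "hg3", "hg4", "hg5", "hg6")
hg_b <- c("hg1", "hg2", "hg3", "hg5", "hg4", "hg7")

jaccard_index(hg_a, hg_b)                     # 5 shared groups out of 7 in total
syntenic_jaccard(hg_a, hg_b, lcs_length = 3)  # a chain of 3 counted as shared
```

Under this assumed definition, the syntenic value can never exceed the plain Jaccard index, because a collinear chain cannot contain more groups than the set intersection.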

Initial setup

Code
suppressPackageStartupMessages(library(tidyverse))

rm(list = ls())

source("https://raw.githubusercontent.com/lakhanp1/omics_utils/main/RScripts/utils.R")
source("scripts/utils/compare_hg_sets.R")
################################################################################
set.seed(124)

useCase <- 1

Parameters while detecting syntenic overlap

  • Score for a match of a homology group during dynamic programming (DP): match_score = 5
  • Penalty for a mismatch (gap) of a homology group during DP: gap_penalty = -2
  • Maximum number of consecutive mismatches allowed in a valid LCS: maximum_gap_length = 2
  • Minimum length required for a valid LCS: minimum_chain_length = 5
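The scores printed in the use cases of this notebook are consistent with a simple additive scheme: each matched homology group adds match_score and each gap column in the alignment adds gap_penalty. A minimal sketch, assuming that scheme; `chain_score()` and `is_valid_chain()` are hypothetical helpers and are not part of `compare_hg_sets.R`.

```r
# Hypothetical helpers for illustration only; not part of compare_hg_sets.R.
chain_score <- function(n_match, n_gap, match_score = 5, gap_penalty = -2) {
  n_match * match_score + n_gap * gap_penalty
}

is_valid_chain <- function(n_match, minimum_chain_length = 5) {
  n_match >= minimum_chain_length
}

chain_score(n_match = 11, n_gap = 4)  # 47: 11 matches and 4 gap columns
chain_score(n_match = 8, n_gap = 0)   # 40: gap-free chain of 8
is_valid_chain(2)                     # FALSE: shorter than the default minimum of 5
is_valid_chain(2, minimum_chain_length = 2)  # TRUE once the minimum is lowered
```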

Testing various use cases for synteny analysis

Use case 1

Detecting the longest common subsequence (LCS) between two sequences while allowing for gaps.

Code
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "r", "s", "t")
seq2 <- c("p", "q", "a", "b", "c", "d", "t", "f", "g", "h", "i", "l", "m", "n", "x", "y", "z", "i", "j", "k")

lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
seq1: a b c d e f g h i j k l m n o r s t
seq2: p q a b c d t f g h i l m n x y z i j k
LCS: a b c d f g h i l m n
Length: 11
Score: 47
Alignment:
a b c d - e f g h i j k l m n
a b c d t - f g h i - - l m n
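The printed alignment can be checked by hand against the reported length and score. The sketch below parses the two alignment rows with base R, counts matched and gap columns, and finds the longest run of consecutive gap columns. It assumes additive scoring with the default parameters and counts gaps per alignment column, which may differ from how the function itself counts consecutive mismatches.

```r
# Alignment rows copied from the printed output of use case 1.
row1 <- strsplit("a b c d - e f g h i j k l m n", " ", fixed = TRUE)[[1]]
row2 <- strsplit("a b c d t - f g h i - - l m n", " ", fixed = TRUE)[[1]]

# A column is a gap column if either row holds "-".
is_gap <- row1 == "-" | row2 == "-"
n_match <- sum(!is_gap & row1 == row2)
n_gap <- sum(is_gap)

# Assumed additive scoring: match_score = 5, gap_penalty = -2.
score <- n_match * 5 + n_gap * -2

# Longest run of consecutive gap columns (0 if there are none).
gap_runs <- rle(is_gap)
max_gap_run <- max(c(0, gap_runs$lengths[gap_runs$values]))

c(matches = n_match, gaps = n_gap, score = score, max_gap_run = max_gap_run)
```

This gives 11 matches, 4 gap columns, and a score of 47, in agreement with the output above, and no run of gap columns is longer than 2.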

Use case 2

Code
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "x", "y", "z", "i", "j", "k")

lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
seq1: a b c d e f g h i j k l m n o p q r s t
seq2: a b c d e f g h i x y z i j k
LCS: a b c d e f g h
Length: 8
Score: 40
Alignment:
a b c d e f g h
a b c d e f g h

Use case 3

LCS from the 5’ end of the sequence. A longer LCS can be found if 3 consecutive gaps are allowed (the default maximum is 2).

Code
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "x", "y", "z", "i", "j", "k", "l", "m")

lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
seq1: a b c d e f g h i j k l m n o p q r s t
seq2: a b c d e f g h i x y z i j k l m
LCS: a b c d e f g h
Length: 8
Score: 40
Alignment:
a b c d e f g h
a b c d e f g h

Use case 4

LCS from the 3’ end of the sequence. A longer LCS can be found if 3 consecutive gaps are allowed (the default maximum is 2).

Code
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("a", "b", "c", "d", "e",  "x", "y", "z", "f", "g", "h", "i", "j", "k", "l", "m")

lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2)
seq1: a b c d e f g h i j k l m n o p q r s t
seq2: a b c d e x y z f g h i j k l m
LCS: f g h i j k l m
Length: 8
Score: 40
Alignment:
f g h i j k l m
f g h i j k l m
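The interruption that splits the chain in this use case can be located with base R alone. The sketch below maps each element of seq2 to its position in seq1 with `match()` and summarises runs of shared and unshared elements with `rle()`. It is only an illustration of why the chain breaks and does not reproduce the logic of `longest_local_subsequence()`.

```r
seq1 <- letters[1:20]  # "a" to "t", the same seq1 as in use case 4
seq2 <- c("a", "b", "c", "d", "e", "x", "y", "z",
          "f", "g", "h", "i", "j", "k", "l", "m")

# Position of each seq2 element in seq1; NA marks elements absent from seq1.
pos <- match(seq2, seq1)

# Consecutive runs of shared (FALSE) and unshared (TRUE) elements.
runs <- rle(is.na(pos))
data.frame(shared = !runs$values, length = runs$lengths)
```

The runs are 5 shared, 3 unshared, and 8 shared elements. The unshared run of 3 exceeds the default maximum gap length of 2, which is consistent with the two blocks not being joined and the longer one, f to m, being reported.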

Use case 5

Change the maxGapLen parameter to increase the syntenic chain length by allowing longer gaps than the default of 2.

Code
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- rev(c("a", "b", "c", "d", "x", "y", "z", "e", "f", "g", "h", "i", "j", "k", "l", "m"))

lcs <- syntenic_hg_overlap(ref = seq1, qur = seq2, maxGapLen = 3)
seq1: a b c d e f g h i j k l m n o p q r s t
seq2: m l k j i h g f e z y x d c b a
LCS: a b c d e f g h i j k l m
Length: 13
Score: 59
Alignment:
a b c d - - - e f g h i j k l m
a b c d x y z e f g h i j k l m
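Note that seq2 is supplied in reverse order here, yet the reported chain is in seq1 order. This suggests that `syntenic_hg_overlap()` also evaluates the reverse orientation of the query, although that is an inference from the printed output only. A minimal sketch of how the relative orientation of two signatures could be detected; `orientation()` is a hypothetical helper and is not part of `compare_hg_sets.R`.

```r
seq1 <- letters[1:20]  # "a" to "t", the same seq1 as in use case 5
seq2 <- rev(c("a", "b", "c", "d", "x", "y", "z",
              "e", "f", "g", "h", "i", "j", "k", "l", "m"))

# Hypothetical helper: direction in which the shared elements of `qur`
# run along `ref`.
orientation <- function(ref, qur) {
  pos <- match(qur, ref)
  pos <- pos[!is.na(pos)]  # drop elements absent from ref
  if (length(pos) < 2) {
    return(NA_character_)
  }
  steps <- diff(pos)
  if (all(steps > 0)) {
    "forward"
  } else if (all(steps < 0)) {
    "reverse"
  } else {
    "mixed"
  }
}

orientation(seq1, seq2)       # "reverse"
orientation(seq1, rev(seq2))  # "forward"
```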

Use case 6

Change the minChainLen parameter to allow shorter syntenic matches than the default of 5.

Code
seq1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
seq2 <- c("m", "n")

lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2, minChainLen = 2)
seq1: a b c d e f g h i j k l m n o p q r s t
seq2: m n
LCS: m n
Length: 2
Score: 10
Alignment:
m n
m n

Use case 7

Code
seq1 <- c("m", "n")
seq2 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")

lcs <- longest_local_subsequence(seq1 = seq1, seq2 = seq2, minChainLen = 2)
seq1: m n
seq2: a b c d e f g h i j k l m n o p q r s t
LCS: m n
Length: 2
Score: 10
Alignment:
m n
m n