mSINGS is a software tool for detecting MSI (microsatellite instability). Its main advantage appears to be that it works on tumor-only samples.

The mSINGS project is hosted on Bitbucket rather than GitHub, and it is still being updated. However, I found the author's installation guide a bit too technical, and there is no all-in-one installer, so following the documentation can be confusing.

The core of the project is written in Python, and its main job is to analyze mpileup files generated by samtools. The essential dependencies are therefore Python 3.6 and samtools (git is presumably only needed to clone the project). Because the author pins the Python version, he recommends a Python virtual environment to avoid conflicts later. I am not familiar with virtualenv, so I will use Miniconda instead, which is easier to get started with. The overall flow is: create an environment and install the dependencies -> install the module -> test.

  1. Creating an Environment/Installing Necessary Dependencies
conda create -p /path/to/soft/mSINGS/conda python=3.6 git samtools
  2. Installing the Module
conda activate /path/to/soft/mSINGS/conda           # Enter the environment to install the module itself
git clone https://bitbucket.org/uwlabmed/msings.git # Clone the project as recommended by the author
cd msings # Enter the directory
python setup.py install # Install the software itself
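
Before moving on to testing, it can help to confirm that the commands the pipeline relies on are visible inside the activated environment. The following is a minimal sketch, not part of the mSINGS documentation; `have_tool` is a helper name made up for this example, and the tool names `samtools` and `msi` are taken from run_msings.sh.

```shell
# have_tool is a hypothetical helper: succeeds if the command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each tool instead of aborting, so one run shows everything that is missing.
for tool in samtools msi; do
    if have_tool "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

If either tool is reported as missing, the conda environment is probably not activated or the install step did not finish.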
  3. Testing

The author describes how to build a baseline (which presumably determines, from existing data, which MSI loci should be scanned). Since I am only testing, I will use the files shipped with the project in the doc/ directory. Note also that the author clearly states that the input BAM file must be aligned to a reference genome whose contig names carry no chr prefix (the one provided by GATK), so I took some FASTQ data and aligned it myself. If your BAM file already meets this requirement, you can skip that step.
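
To check whether an existing BAM already meets the no-chr requirement, one option is to inspect the contig names in its header. This is only a sketch under my own assumptions: `has_chr_contigs` is a made-up helper that reads SAM header text on standard input, and the intended input is the output of `samtools view -H`.

```shell
# has_chr_contigs is a hypothetical helper.
# Reads SAM header text on stdin; exit status 0 if any @SQ contig name starts with "chr".
has_chr_contigs() {
    awk -F'\t' '
        $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:chr/) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Intended use (needs samtools and a real BAM, so it is left commented out here):
# samtools view -H test.bam | has_chr_contigs && echo "contigs carry a chr prefix, realign first"
```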

Also note that the author's script run_msings.sh activates the virtual environment. Since I did not set one up as he describes, the line source msings-env/bin/activate in the script needs to be commented out, so that it reads # source msings-env/bin/activate.
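
Editing the script by hand is enough, but the same change can be sketched with sed. `comment_out_activate` is a helper name invented for this example; it prints the modified script to standard output and leaves the original file untouched.

```shell
# comment_out_activate is a hypothetical helper.
# Prefix any uncommented "source .../bin/activate" line with "# ".
# Lines that are already commented out are left as they are.
comment_out_activate() {
    sed 's|^\([[:space:]]*\)\(source .*bin/activate.*\)$|\1# \2|' "$1"
}

# Intended use (the output file name is only an example):
# comment_out_activate scripts/run_msings.sh > run_msings.noenv.sh
```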

set -e

# No need to sort here: the mSINGS pipeline sorts the BAM itself
/path/to/apps/bwa-0.7.12/bwa mem \
-R '@RG\tID:group1\tSM:TUMOR\tPL:illumina\tLB:lib1\tPU:unit1' \
-M -t 16 \
/path/to/Human_Genome_GRCh37_FASTA/human_g1k_v37.fasta \
/path/to/soft/mSINGS/run_test/test.fq.R1.gz \
/path/to/soft/mSINGS/run_test/test.fq.R2.gz \
| /usr/local/bin/samtools view \
-Sb - \
> test.bam

echo "test.bam" > bam_list # run_msings.sh expects a file listing BAM paths rather than a BAM path, so write one
sh /path/to/mSINGS/msings/scripts/run_msings.sh \
bam_list \
/path/to/mSINGS/msings/doc/mSINGS_TCGA.bed \
/path/to/mSINGS/msings/doc/mSINGS_TCGA.baseline \
/path/to/Human_Genome_GRCh37_FASTA/human_g1k_v37.fasta
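
The comment in run_msings.sh says the list should contain absolute paths, while the example above writes a relative one (which worked for me when run from the same directory). A minimal sketch for building the list with absolute paths follows; `abs_path` and `write_bam_list` are names invented for this example.

```shell
# abs_path is a hypothetical helper: print the absolute path of an existing file.
# It only uses cd/pwd, so it does not depend on realpath being installed.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# write_bam_list is a hypothetical helper: first argument is the list file,
# the remaining arguments are BAM files; one absolute path is written per line.
write_bam_list() {
    out=$1
    shift
    : > "$out"
    for bam in "$@"; do
        abs_path "$bam" >> "$out"
    done
}

# Intended use:
# write_bam_list bam_list test.bam
```

Because run_msings.sh loops over the list with an unquoted command substitution, paths containing spaces will be split into several words, so it is safer to keep BAM files in directories without spaces.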

The final result is written to Combined_MSI.txt, which should contain the results for all BAM files combined. I only tested with one BAM file, so I am not sure what the merged output looks like with several samples.

I tested this sample against 1,166 target MSI loci, of which 27 were called unstable. That is about 2.3% instability (27/1166 ≈ 0.0232), well below the 0.2 threshold set in run_msings.sh, so the status is MSS (microsatellite stable), reported as NEG.

Position        test
unstable_loci 27
passing_loci 1166
msing_score 0.0232
msi status NEG
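
The score can be recomputed from the two locus counts as a sanity check. The sketch below assumes a single-sample report with whitespace-separated columns, as in the output above. `msi_fraction` and `msi_call` are names made up for this example, and the rule "below the threshold means NEG" is my simplification, not necessarily how mSINGS decides internally.

```shell
# msi_fraction is a hypothetical helper.
# Reads a single-sample report on stdin and prints unstable_loci / passing_loci.
msi_fraction() {
    awk '
        $1 == "unstable_loci" { u = $NF }
        $1 == "passing_loci"  { p = $NF }
        END { if (p > 0) printf "%.4f\n", u / p; else exit 1 }
    '
}

# msi_call is a hypothetical helper: NEG if the fraction is below the threshold, else POS.
# Assumption: a single cutoff, matching msi_min_threshold = msi_max_threshold = 0.2 in the script.
msi_call() {
    awk -v f="$1" -v t="$2" 'BEGIN { print ((f + 0 < t + 0) ? "NEG" : "POS") }'
}

# Intended use:
# msi_call "$(msi_fraction < Combined_MSI.txt)" 0.2
```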

Finally, let’s take a look at the author’s script run_msings.sh:

#!/bin/bash
set -e
# source msings-env/bin/activate

# BAM_LIST is a file of absolute paths to each bam file
BAM_LIST=$1;
BEDFILE=$2;
MSI_BASELINE=$3;
REF_GENOME=$4;

# Check for required variables:
if [ -z "$BAM_LIST" ]; then echo "BAM_LIST is unset" && exit ; else echo "BAM_LIST is set to '$BAM_LIST'"; fi
if [ -z "$BEDFILE" ]; then echo "BEDFILE is unset" && exit ; else echo "BEDFILE is set to '$BEDFILE'"; fi
if [ -z "$MSI_BASELINE" ]; then echo "MSI_BASELINE is unset" && exit ; else echo "MSI_BASELINE is set to '$MSI_BASELINE'"; fi
if [ -z "$REF_GENOME" ]; then echo "REF_GENOME is unset" && exit ; else echo "REF_GENOME is set to '$REF_GENOME'"; fi

# "multiplier" is the number of standard deviations from the baseline required to call instability
multiplier=2.0
# "msi_min_threshold" is the maximum fraction of unstable sites allowed to call a specimen MSI negative
msi_min_threshold=0.2
# "msi_max_threshold" is the minimum fraction of unstable sites allowed to call a specimen MSI positive
msi_max_threshold=0.2

for BAM in `sed '/^$/d' $BAM_LIST`; do
SAVEPATH=$(dirname $BAM)
BAMNAME=$(basename $BAM)
PFX=${BAMNAME%.*}

mkdir -p $SAVEPATH/$PFX

echo "Starting Analysis of $PFX" >> $SAVEPATH/$PFX/msi_run_log.txt;
date +"%D %H:%M" >> $SAVEPATH/$PFX/msi_run_log.txt;

echo "sorting bam of $PFX" >> $SAVEPATH/$PFX/msi_run_log.txt;
date +"%D %H:%M" >> $SAVEPATH/$PFX/msi_run_log.txt;
samtools sort -o $SAVEPATH/$PFX/$PFX.sorted.bam $BAM && samtools index $SAVEPATH/$PFX/$PFX.sorted.bam

echo "Making mpileups" >> $SAVEPATH/$PFX/msi_run_log.txt;
date +"%D %H:%M" >> $SAVEPATH/$PFX/msi_run_log.txt;
samtools mpileup -f $REF_GENOME -d 100000 -A -E -l $BEDFILE $SAVEPATH/$PFX/$PFX.sorted.bam | awk '{if($4 >= 6) print $0}' > $SAVEPATH/$PFX/$PFX.mpileup

echo "MSI Analyzer start" >> $SAVEPATH/$PFX/msi_run_log.txt;
date +"%D %H:%M" >> $SAVEPATH/$PFX/msi_run_log.txt;

msi analyzer $SAVEPATH/$PFX/$PFX.mpileup $BEDFILE -o $SAVEPATH/$PFX/$PFX.msi.txt

echo "MSI calls start" >> $SAVEPATH/$PFX/msi_run_log.txt;
date +"%D %H:%M" >> $SAVEPATH/$PFX/msi_run_log.txt;

msi count_msi_samples $MSI_BASELINE $SAVEPATH/$PFX -m $multiplier -t $msi_min_threshold $msi_max_threshold -o $SAVEPATH/$PFX/$PFX.MSI_Analysis.txt

echo "Completed Analysis of $PFX" >> $SAVEPATH/$PFX/msi_run_log.txt;
date +"%D %H:%M" >> $SAVEPATH/$PFX/msi_run_log.txt;

done

echo "Creating summary analysis file for all samples" >> $SAVEPATH/msi_run_log.txt;
msi count_msi_samples $MSI_BASELINE $SAVEPATH -m $multiplier -t $msi_min_threshold $msi_max_threshold -o $SAVEPATH/Combined_MSI.txt

It looks a bit long, but the logic is quite simple, and the author has added some basic comments. For each input BAM file, the script sorts and indexes it and generates an mpileup file restricted to the BED regions, then uses the msi program (which was installed by python setup.py install) to analyze each sample and aggregate the results.
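
Two details of the script are easy to try in isolation: how the per-sample output paths are derived from the BAM path, and the depth filter applied to the mpileup. The sketch below mirrors those two steps; `msings_outputs` and `depth_filter` are names invented for this example, while the file suffixes and the cutoff of 6 are copied from run_msings.sh.

```shell
# msings_outputs is a hypothetical helper.
# Print the per-sample files run_msings.sh writes for one BAM path.
msings_outputs() {
    savepath=$(dirname "$1")
    bamname=$(basename "$1")
    pfx=${bamname%.*}    # strip the last extension, e.g. test.bam -> test
    for suffix in sorted.bam mpileup msi.txt MSI_Analysis.txt; do
        printf '%s/%s/%s.%s\n' "$savepath" "$pfx" "$pfx" "$suffix"
    done
}

# depth_filter is a hypothetical helper.
# Same filter as the script: keep mpileup rows whose 4th column (read depth) is at least 6.
depth_filter() {
    awk '$4 >= 6'
}

# Intended use:
# msings_outputs /data/run/test.bam
# samtools mpileup ... | depth_filter > test.mpileup
```

Since the output directory is derived from each BAM's own location, the results end up next to the input files rather than in the current working directory.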