In bioinformatics, the more cutting-edge your research direction, the more problems you’ll face on the informatics side. Even when papers are published with excellent results, and the original authors share their code or even provide ready-to-use software tools, it doesn’t mean we can easily use these resources for reproduction or further research. Chaotic environment setup is just one aspect - more often than not, since the authors aren’t professional software engineers, we should be grateful if the tool works at all. We can’t expect this software to be bug-free, nor can we expect it to perform well (unless performance was a development goal). Even tools from well-established labs aren’t immune to these issues, such as… Azimuth.

When using pixi to manage bioinformatics analysis environments, we often find that some Bioconductor R packages report missing dependencies after installation. The exact cause is currently unclear, and after a year of using pixi the issue still hasn’t been fixed (as of October 2025). This article shows how to use pixi’s tasks feature to work around such problems.

In our current single-cell analysis pipeline based on scanpy, there’s a step that requires saving AnnData objects in loom format. However, unlike saving to h5ad format, when we write an AnnData object to a loom file without any special handling and then read it back, we find that the index information of obs and var (typically cell barcodes and gene names) is lost, and these indices become ordinary numeric identifiers.
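One generic way to guard against this kind of loss is to copy each index into an ordinary column before writing and to restore it after reading. The snippet below is only a minimal sketch of that idea using plain pandas DataFrames as stand-ins for `adata.obs` and `adata.var`; the helper names `stash_index` and `restore_index` and the column name `barcode` are hypothetical, the lossy round trip is simulated with `reset_index`, and this is not necessarily the approach the post itself takes.

```python
import pandas as pd


def stash_index(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with its index values duplicated into `column`."""
    out = df.copy()
    # Assign by position (NumPy array), so no index alignment takes place.
    out[column] = out.index.astype(str).to_numpy()
    return out


def restore_index(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df indexed by `column`; the column itself is dropped."""
    out = df.set_index(column)
    out.index.name = None
    return out


# Stand-in for adata.obs: two cells indexed by (made-up) barcodes.
obs = pd.DataFrame({"n_genes": [1200, 950]}, index=["AAACCTG-1", "AAACGGG-1"])

stashed = stash_index(obs, "barcode")

# Simulate a round trip that discards the index (assumption: this mimics
# what happens when the index comes back as plain numeric identifiers).
lossy = stashed.reset_index(drop=True)

restored = restore_index(lossy, "barcode")
print(restored)
```

In a real pipeline the same two helpers would be applied to both `obs` and `var` around the write and read steps, assuming ordinary columns survive the round trip.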

Milo is a differential abundance analysis method for single-cell RNA sequencing data that can detect compositional changes in cell neighborhoods across different conditions.

In cancer research, comparing genomic features between tumor samples and organoid models is crucial for validating model reliability. Circos plots provide an intuitive way to visualize detected mutations across the genome, so they are commonly used to summarize detection results for representative samples. While reading the circlize documentation, I came across an example of plotting paired samples, which seemed well suited to showing primary tumors alongside their matched organoids, and I adapted it for that purpose. The code is primarily based on section 9.5 of the official documentation, “Concatenating two genomes”.

Looking at the creation time of this draft, it was actually June 2024… Now it’s June 2025, and I suddenly understand why so many content creators become “pigeons” (procrastinators). Starting projects is fun, but finishing them is painful… This is one of my few pure bioinformatics posts…

The problem originated last year when I needed to run monocle3 for pseudotime analysis, but encountered an annoying issue at the final stage. In monocle3, the starting point of the pseudotime trajectory has to be specified manually by the analyst. During execution, the R code automatically opens a browser; the user selects the starting point on a temporary webpage, then closes the page so the analysis can continue.

However, Jupyter’s IRkernel doesn’t support this feature, which meant I couldn’t complete the analysis directly in a Jupyter notebook. This issue was first reported in 2019, but even by 2024 when I needed to do the analysis - five years later - there was still no solution…

Bioinformatics is an interdisciplinary field, and the toolkit or technology stack it relies on is also quite “interdisciplinary”. The level of fragmentation is, in my opinion, no less than that of Linux distributions… This brings us to a common challenge: deploying bioinformatics analysis environments.

mSINGS is a software tool used to detect MSI (Microsatellite Instability). Its advantage seems to be that it can be used for tumor-only samples.

Damn, I finally successfully used this feature… I’m really stupid…

MultiQC is a tool for NGS data quality control. Unlike many other tools, it does not compute metrics itself; instead, it reads the result files produced by various common quality control tools and aggregates them into a single report.