In bioinformatics, the more cutting-edge your research direction, the more problems you’ll run into on the informatics side. Even when a paper reports excellent results and the authors share their code, or even ship ready-to-use software, that doesn’t mean we can easily reproduce the work or build on it. Messy environment setup is only part of the problem. More often than not, the authors aren’t professional software engineers, so we should be grateful if the tool works at all. We can’t expect this software to be bug-free, nor can we expect decent performance (unless performance was a development goal). Even tools from well-established labs aren’t immune to these issues, such as… Azimuth.
Introduction to Azimuth
Azimuth is a single-cell annotation tool developed by the Satija Lab. It is designed to simplify the Label Transfer workflow in Seurat, so that cells awaiting classification can be annotated against a reference quickly.
Problem Description
In version 0.5.0, the latest release at the time of writing, running AzimuthReference in a Jupyter Notebook triggers an error: Error in ValidateAzimuthReference(object = object): Reference must contain an AzimuthData object in the tools slot.
A quick Google search reveals that this issue was first reported in April 2024 (issue #219). The user who reported the problem already provided a solution, but the issue remains open and unresolved (last week another user reported encountering the same problem).
Solution Approach
As mentioned earlier, issue reporter zacharyrs has already identified the problem. In commit b1b6895, the code author uses sys.calls() to determine the name of the currently called function and makes subsequent processing decisions based on this name.
tool.name <- as.character(x = sys.calls())  # <-- sys.calls() here returns a list; if not calling AzimuthReference directly, it deletes some information from the object
However, in practice, we often need to wrap AzimuthReference inside our own functions. In that case, the first element of the list returned by sys.calls() is the call to the outermost function, not the call to AzimuthReference. The name check therefore fails, the data Azimuth needs is deleted from the object, and the later validation step raises the error.
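To make the stack behavior concrete, here is a minimal base-R sketch. The names report_stack and wrapper are hypothetical stand-ins for AzimuthReference and for whatever wraps it (your own function, or a notebook kernel); nothing here is Azimuth's actual code.

```r
# Report what the call stack looks like from inside a function.
report_stack <- function() {
  calls <- sys.calls()  # list of all active calls, outermost first
  list(
    depth = length(calls),
    first = deparse(calls[[1]])[1],              # outermost call
    last  = deparse(calls[[length(calls)]])[1]   # call to this function
  )
}

# Hypothetical wrapper standing in for user code or a notebook kernel.
wrapper <- function() report_stack()

direct  <- report_stack()  # called without a wrapper
wrapped <- wrapper()       # called through one extra layer

print(direct$last)    # "report_stack()"
print(wrapped$last)   # "report_stack()"
print(wrapped$first)  # the outermost call, whatever it happens to be
print(wrapped$depth - direct$depth)  # 1: the wrapper added one frame
```

The last entry is always the function itself, while the first entry depends entirely on who is calling. At the top level of a plain R session, wrapped$first is wrapper(); inside a notebook it would be one of the kernel's own calls.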
In my case, I learned for the first time that all code in a notebook is executed from inside IRkernel functions (thanks to AI for helping me troubleshoot). So even when I call AzimuthReference directly in a notebook cell, it is never the outermost call, and I get the same error as if I had wrapped it in a function.
The solution is simple: change the check so that, after sys.calls() returns the list, it takes the last element instead of the first, and then extracts the function name from it with a regular expression:
call_list <- sys.calls()
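As a rough illustration of that idea (a sketch under my own assumptions, not the actual patch from the issue), the helper below takes a call list, picks the last entry, and strips the arguments and any package prefix with regular expressions. The names function_name_from_calls and make_reference are hypothetical.

```r
# Hypothetical helper: extract the bare function name from the LAST call
# in a call list such as the one returned by sys.calls().
function_name_from_calls <- function(call_list) {
  last_call <- call_list[[length(call_list)]]
  call_text <- paste(deparse(last_call), collapse = "")
  # Drop everything from the first "(" onward, leaving the callee.
  name <- sub(pattern = "\\(.*$", replacement = "", x = call_text)
  # Drop a leading "pkg::" or "pkg:::" qualifier, if present.
  sub(pattern = "^.*:::?", replacement = "", x = name)
}

# Stand-in for AzimuthReference: capture the stack first, then inspect it.
make_reference <- function(...) {
  call_list <- sys.calls()  # captured inside this function's own frame
  function_name_from_calls(call_list)
}

wrapper <- function() make_reference()

direct_name  <- make_reference()  # "make_reference"
wrapped_name <- wrapper()         # "make_reference", despite the wrapper

# The same helper applied to a hand-built stack with a qualified call.
qualified_name <- function_name_from_calls(
  list(quote(outer()), quote(Azimuth::AzimuthReference(object = x)))
)
print(qualified_name)  # "AzimuthReference"
```

This sketch assumes the function is called by name. A call made through do.call() with the function object itself would deparse to the whole function body, and the regular expressions above would not recover a useful name.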
In our actual code, we can load Azimuth and then override the original AzimuthReference function with our modified version, allowing subsequent code to run properly.
library(Azimuth)
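For readers unfamiliar with why redefining a function after library() works, here is a small base-R sketch of the masking mechanism. It uses stats::median as a stand-in, since the patched AzimuthReference body is not reproduced here; the "patched" return value is purely illustrative.

```r
# Stand-in for the patched function (hypothetical body). It shares its name
# with stats::median, which is attached in every default R session.
median <- function(x, ...) {
  "patched"
}

# Unqualified calls now resolve to the definition above, because the
# user's environment is searched before attached packages.
masked_result <- median(c(1, 2, 3))

# The package's own version is untouched and still reachable by qualifying it.
original_result <- stats::median(c(1, 2, 3))

print(masked_result)    # "patched"
print(original_result)  # 2
```

Masking like this only affects calls made by name from your own code. Code inside the package's namespace still sees the original, so if Azimuth's internal callers also needed the fix, utils::assignInNamespace() would be the usual tool to try instead.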
Azimuth Seems Abandoned
Although it comes from the Satija Lab, the project feels abandoned. There are nearly a hundred open issues on GitHub, and 5 of the PRs submitted since last year have been neither merged nor rejected. The lab has recently started a new Python-based, deep-learning project for universal cell type Label Transfer, which suggests that as their research moves forward, older projects are no longer being maintained.
Moreover, for Label Transfer, one can always follow the tutorials step by step, so Azimuth isn’t strictly necessary. Perhaps this project will be archived soon…