Until now I had rarely used many of the handy functions available when writing scripts, so I decided to learn a bit about `itertools` and take some notes.
This time I was dealing with large files, which are usually processed line by line, so I thought about using `itertools` to build generators. That way I might be able to split a generator into smaller parts and hand those parts to parallel operations.
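As a rough sketch of the "split the generator into smaller parts" idea, the snippet below cuts a lazy stream of lines into fixed-size batches. It uses `islice`, which is not one of the three functions covered in these notes, and the helper name `batched_lines` and the sample lines are made up for illustration. Each batch could then be passed to a worker, but that part is left out here.

```python
from itertools import islice


def batched_lines(lines, batch_size):
    """Yield lists of up to batch_size items from any iterable of lines."""
    it = iter(lines)
    while True:
        # islice pulls at most batch_size items without reading the rest.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for an open file handle; a real file object iterates the same way.
lines = (f"read_{i}\n" for i in range(5))
batches = list(batched_lines(lines, 2))
```

Only one batch is held in memory at a time while iterating, which is the point when the input file is large.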
The three `itertools` functions I used this time are:
- groupby
- tee
- chain
`groupby` clusters consecutive items of an iterable that share the same key. For example, when grouping by UMI, you need to compare reads that are close in position and group them together for further UMI analysis.

    # Here's a function that takes a single object from iter_obj as input and returns the value to be used for comparison
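Here is a minimal sketch of what such a key function and the `groupby` call might look like. The read layout `(chromosome, position, umi)`, the sample data, and the name `position_key` are assumptions for illustration, and grouping by exact position is a simplification of "close in position". Since `groupby` only merges adjacent items, the input has to be sorted by the same key first.

```python
from itertools import groupby

# Hypothetical reads as (chromosome, position, umi), already sorted by position.
reads = [
    ("chr1", 100, "AACG"),
    ("chr1", 100, "AACT"),
    ("chr1", 250, "GGTA"),
    ("chr2", 100, "AACG"),
]


def position_key(read):
    """Return the value groupby compares: (chromosome, position)."""
    chrom, pos, _umi = read
    return (chrom, pos)


# Each group is a lazy iterator, so collect the UMIs before moving on.
grouped = {
    key: [umi for _chrom, _pos, umi in group]
    for key, group in groupby(reads, key=position_key)
}
```

Each group shares the underlying iterator with `groupby`, so it should be consumed before advancing to the next key.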
`tee` is simple. A generator can only be consumed once, so if the iterable is a generator you can use `tee` to split it into several independent iterators for multiple uses.

    iterable_fork_1, iterable_fork_2 = tee(iterable, 2)  # The second argument is optional and defaults to 2.
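A small sketch of `tee` on a generator, with a made-up `numbers` generator standing in for real data. After calling `tee`, the original generator should not be used directly any more, because advancing it would make the forks miss items.

```python
from itertools import tee


def numbers():
    """A one-shot generator used as sample input."""
    yield from (3, 1, 4, 1, 5)


first, second = tee(numbers())

# Each fork can be consumed independently of the other.
total = sum(first)
largest = max(second)
```

One caveat: when one fork is exhausted before the other starts, `tee` has to buffer every item in between, so for a large file this pattern can use as much memory as building a list.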
`chain` concatenates multiple iterables into one. I used its `from_iterable` variant to flatten nested results:

    flat_iter = chain.from_iterable(ori_iter)
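To make the difference between the two forms concrete, here is a short sketch with made-up sample data. `chain` takes the iterables as separate arguments, while `chain.from_iterable` takes a single iterable of iterables and flattens exactly one level.

```python
from itertools import chain

# Nested results of mixed types: a list, an empty tuple, a tuple, an iterator.
nested = [["a", "b"], (), ("c",), iter(["d", "e"])]

# Flatten one level lazily, then materialize for inspection.
flat = list(chain.from_iterable(nested))

# The plain form takes the iterables as positional arguments instead.
joined = list(chain([1, 2], [3]))
```

Because `from_iterable` reads the outer iterable lazily, it also works when the nested results themselves come from a generator.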