Sorry, your browser cannot access this site
This page requires browser support (enable) JavaScript
Learn more >

A long time ago, a Python expert from my previous company taught us about decorators. However, due to my愚钝ness, I never fully understood this concept and didn’t use it in actual coding. Therefore, I’m recording this example that comes up at work.

Firstly, the role of a decorator in Python is to attach new functionality to an existing function without modifying its code. This requires defining a function that takes another function as an argument and returns a function.

This week, I encountered a problem where I wrote a function to calculate some metrics for UMI sequences in a Bam file. However, the Bam files have two types of UMIs:

  • Double-end with N bp UMIs
  • Single-end with N bp UMIs

The way to read UMIs from these files is different: single-end requires reading the full length of the UMI, while double-end only reads half the length. The original single-end code looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def single_count(file, tag, mode):
umi_count = {
'R1': [],
}
infos = load_single(file)
for info in infos:
if info['strand'] == 'R1':
umi_count['R1'].append(info['umi'])
umi_count['R1'] = Counter(umi_count['R1'])
total_umi = 0
read_with_umi = 0
max_grp_size = 0
max_umi = ''
size_stat = Counter([1, 5, 10, 100, 1000])
for umi in umi_count['R1']:
if umi_count['R1'][umi] > max_grp_size:
max_grp_size = umi_count['R1'][umi]
max_umi = umi
for size in size_stat: # 注意初始化的时候, 全都是1, 不是0, 所以最后要处理(-1)
if umi_count['R1'][umi] <= size:
size_stat[size] += 1
total_umi += 1
read_with_umi += umi_count['R1'][umi]
size_stat = OrderedDict(list([("%%Group_Size_below_%s(%s)" % (size, tag), str((size_stat[size] - 1) / total_umi )) \
for size in size_stat])) # python所以要先list化
return total_umi, read_with_umi, max_grp_size, max_umi, size_stat

The only difference between the double-end and single-end code is that infos = load_single(file) needs to be replaced with infos = load_duplex(file). To achieve this, there is a straightforward way: add a parameter to the function and decide whether to call load_duplex or load_single based on the parameter. However, this would require modifying the function code as well as the code that calls this function. If there are many such places, it’s easy to introduce new bugs. Therefore, I tried using decorators to solve this problem:

1
2
3
4
5
6
7
def load_mod(func):
load_single = load_duplex
return func

@load_mod
def duplex_count(file, tag, mode):
return single_count(file, tag, mode)

Here, I defined two functions: duplex_count aims to repeat all the steps in single_count except for infos = load_single(file), and only change the function called to load_duplex. Therefore, you can see that this function’s content is essentially returning the result of single_count.

However, this alone doesn’t complete the replacement of the two load functions. That’s where load_mod comes into play: it replaces the content in the local namespace with load_duplex. By decorating duplex_count with load_mod, it’s equivalent to replacing the function call before executing it, so that the function executes differently.

Of course, I use it more like the ‘closure’ usage in common decorator tutorials rather than ‘adding new functionality to a function’.

Other applications will be discussed later.
```

Comments

Please leave your comments here