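It is worth going through code and functions that might be useful, so that when a similar need comes up you remember they exist and can look them up. If you only look something up when you happen to run into it, you tend not to pay much attention or retain much. You end up merely recognizing the code and functions without ever making them your own.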

Python's collections module provides specialized container datatypes that augment the general-purpose built-in containers: dict, list, set, and tuple.

Datatype       Description
namedtuple()   factory function for creating tuple subclasses with named fields
deque          list-like container with fast appends and pops on either end
ChainMap       dict-like class for creating a single view of multiple mappings
Counter        dict subclass for counting hashable objects
OrderedDict    dict subclass that remembers the order entries were added
defaultdict    dict subclass that calls a factory function to supply missing values
UserDict       wrapper around dictionary objects for easier dict subclassing
UserList       wrapper around list objects for easier list subclassing
UserString     wrapper around string objects for easier string subclassing

reference: collections — Container datatypes — Python 3.11.4 documentation

Improving Code Readability: namedtuple()

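namedtuple() creates tuple subclasses with named fields, so the values in the tuple can be accessed directly with dot notation (for example, obj.attr), which improves code readability. When a tuple holds several items and is constructed far from where it is used, replacing it with a namedtuple is usually the better choice.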

collections.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None):

  • typename is the name of the namedtuple class to create; it must be a string that is a valid identifier.

  • field_names is the list of field names used to access the elements of the tuple.

    It can be any of the following:

    • an iterable of strings, such as ["field1", "field2", "field3"]
    • a string with whitespace-separated field names, such as "field1 field2 field3"
    • a string with comma-separated field names, such as "field1, field2, field3"
    • a generator expression yielding field names, such as (field for field in "xy")
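
The forms listed above are interchangeable. A minimal sketch, using an illustrative Point type with fields x and y (names chosen here only for the example), that checks they all produce the same fields:

```python
from collections import namedtuple

# Four equivalent ways to pass field_names (Point, x and y are illustrative names).
from_list = namedtuple("Point", ["x", "y"])
from_spaces = namedtuple("Point", "x y")
from_commas = namedtuple("Point", "x, y")
from_generator = namedtuple("Point", (field for field in "xy"))

# Every variant ends up with the same tuple of field names.
all_fields = [cls._fields for cls in (from_list, from_spaces, from_commas, from_generator)]
print(all_fields)
```

The _fields attribute is part of the namedtuple API and is a convenient way to inspect which names a generated class accepts.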
In [3]: from collections import *

In [4]: def custom_divmod(x, y):
   ...:     DivMod = namedtuple('DivMod', 'quotient remainder')
   ...:     return DivMod(*divmod(x, y))
   ...:

In [5]: result = custom_divmod(18, 4)
   ...: result, result.quotient, result.remainder
Out[5]: (DivMod(quotient=4, remainder=2), 4, 2)

2D point

In [7]: Point = namedtuple("Point", ['x', 'y']) # Use a list of strings as field names

In [8]: point = Point(2,4)

In [9]: point.x
Out[9]: 2

In [11]: Point = namedtuple('point', 'x y') # Use a string with whitespace-separated field names

In [12]: point = Point(2,4)

In [14]: point.y
Out[14]: 4

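Tuples are immutable, and so are namedtuples. To change a value, use _replace(), which does not modify the instance in place but returns a new instance with the given fields replaced.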

In [15]: Person = namedtuple("Person", "name job", defaults = ['python developer'])

In [16]: person = Person("Jane"); person
Out[17]: Person(name='Jane', job='python developer')

In [18]: # create dictionary from a named tuple
In [20]: person = Person("Jane")

In [21]: person._asdict()
Out[21]: {'name': 'Jane', 'job': 'python developer'}

In [22]: # replace the value of job
In [23]: person = person._replace(job="web developer")

In [24]: person
Out[24]: Person(name='Jane', job='web developer')
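
Because _replace() returns a new object, the original instance is left untouched. The sketch below reuses the Person type from the session above to illustrate this; the "data engineer" value is made up, and direct attribute assignment is expected to fail with AttributeError.

```python
from collections import namedtuple

Person = namedtuple("Person", "name job", defaults=["python developer"])

original = Person("Jane")
updated = original._replace(job="web developer")  # builds a new Person

# The original instance still holds the default job.
print(original.job, "->", updated.job)

# Fields cannot be assigned in place.
try:
    original.job = "data engineer"
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)
```

This is why the session above rebinds the name with person = person._replace(...): without the assignment, the result of _replace() would simply be discarded.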

Counter

collections.Counter([iterable-or-mapping])

A Counter is a dict subclass for counting hashable objects.

The argument must be an iterable or a mapping, for example a dict, list, tuple, or string.

In [47]: Counter([1,3,3,5,1])
Out[47]: Counter({1: 2, 3: 2, 5: 1})

In [50]: Counter("hello")
Out[50]: Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})

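Counter is essentially a packaged version of counting with a for loop and a dict. The result is a dict subclass, so the usual dict methods work on it.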

In [37]: word = "mississippi"
In [38]: a = Counter(word)
In [39]: a.keys(), a.values(), a.items()
Out[39]:
(dict_keys(['m', 'i', 's', 'p']),
dict_values([1, 4, 4, 2]),
dict_items([('m', 1), ('i', 4), ('s', 4), ('p', 2)]))

In [40]: counter = {}
In [41]: for letter in word:
    ...:     if letter not in counter:
    ...:         counter[letter] = 0
    ...:     counter[letter] += 1
    ...:
In [45]: counter
Out[45]: {'m': 1, 'i': 4, 's': 4, 'p': 2}
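
Since a Counter holds the same key-to-count pairs as the hand-built dictionary above, converting it to a plain dict gives an identical result, and it adds counting helpers on top. A short sketch using most_common() and Counter addition (the "aab" and "abc" inputs are made up for illustration):

```python
from collections import Counter

word = "mississippi"

# Manual counting with a plain dict, equivalent to the loop above.
manual = {}
for letter in word:
    manual[letter] = manual.get(letter, 0) + 1

counts = Counter(word)
same_result = dict(counts) == manual  # identical keys and counts

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = counts.most_common(2)

# Adding two Counters sums the counts per key.
combined = Counter("aab") + Counter("abc")
print(same_result, top_two, combined)
```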

Handling Missing Keys: defaultdict

Accessing a missing key in a Python dict with square brackets raises a KeyError. The dict method get() can be used instead; for a missing key it returns None.

In [62]: a = {"name": "Jone", "job": "developer"}
In [63]: a.get("height")
In [64]: a["height"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[64], line 1
----> 1 a["height"]
KeyError: 'height'

In [65]: a.get("height") is None
Out[65]: True
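
get() also accepts a second argument that is returned in place of None, and setdefault() additionally stores that value in the dict. A small sketch, reusing the dict from the session above (the variable name profile and the fallback values are made up for illustration):

```python
profile = {"name": "Jone", "job": "developer"}

# get() with a fallback: returns the fallback without modifying the dict.
height = profile.get("height", 170)
still_missing = "height" not in profile

# setdefault(): returns the fallback and also inserts it under the key.
city = profile.setdefault("city", "unknown")
now_present = profile["city"]
print(height, still_missing, city, now_present)
```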

class collections.defaultdict(default_factory=None, /[, ...])

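Compared with dict, defaultdict adds two behaviors: (1) accessing a missing key with [] does not raise an error; (2) the missing key is given a default value produced by the predefined default_factory.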

In [81]: color = defaultdict(list)

In [82]: b = [("red", 2), ("blue", 4), ('yellow', 8), ('red',5)]

In [83]: for name, num in b:
    ...:     color[name].append(num)

In [84]: color
Out[84]: defaultdict(list, {'red': [2, 5], 'blue': [4], 'yellow': [8]})
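
The factory can be any zero-argument callable. A minimal sketch with int as the factory, which turns the defaultdict into a counter; it also shows that a lookup with [] inserts the missing key, while get() does not (the key "z" is just an example of a key that is absent):

```python
from collections import defaultdict

letter_counts = defaultdict(int)  # int() returns 0 for missing keys
for letter in "mississippi":
    letter_counts[letter] += 1

# get() bypasses default_factory, so nothing is inserted here.
via_get = letter_counts.get("z")
after_get = "z" in letter_counts

# Indexing calls default_factory and stores the result under the key.
via_index = letter_counts["z"]
after_index = "z" in letter_counts
print(dict(letter_counts), via_get, after_get, via_index, after_index)
```

The side effect of indexing is worth keeping in mind: merely reading a missing key grows the dictionary.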

Keeping Your Dictionaries Ordered: OrderedDict

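OrderedDict

  1. Remembers the insertion order of the key-value pairs, which is useful when order matters. (Since Python 3.7 the built-in dict also preserves insertion order.)
  2. Provides the .move_to_end() and .popitem() methods for manipulating the order of the pairs.
  3. When two OrderedDicts are compared, both the items and their order are checked; they are equal only when both match.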
In [87]: letters = OrderedDict(b=3, c = 4, e=2, a=5)

In [88]: letters
Out[88]: OrderedDict([('b', 3), ('c', 4), ('e', 2), ('a', 5)])

In [89]: letters.move_to_end('b')

In [90]: letters
Out[90]: OrderedDict([('c', 4), ('e', 2), ('a', 5), ('b', 3)])

In [91]: letters.move_to_end('b', last=False)

In [92]: letters
Out[92]: OrderedDict([('b', 3), ('c', 4), ('e', 2), ('a', 5)])
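
The order-sensitive comparison and popitem() from the list above can be sketched as follows. Note that comparing an OrderedDict with a regular dict ignores order. The variable names are chosen for this example only.

```python
from collections import OrderedDict

first = OrderedDict(a=1, b=2)
second = OrderedDict(b=2, a=1)

# Two OrderedDicts are equal only if items and order both match.
ordered_equal = first == second
# Against a plain dict, only the items are compared.
plain_equal = first == dict(b=2, a=1)

# popitem() removes from the end by default, or from the front with last=False.
letters = OrderedDict(b=3, c=4, e=2, a=5)
last_pair = letters.popitem()
first_pair = letters.popitem(last=False)
print(ordered_equal, plain_equal, last_pair, first_pair, list(letters))
```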

Chaining Dictionaries Together: ChainMap

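ChainMap groups multiple mappings (dictionaries) into a single view, which is often faster than creating a new dictionary and calling update() on it repeatedly. It is an updateable view: the underlying dictionaries are held by reference, so modifications made through the ChainMap affect the corresponding dictionary.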

class collections.ChainMap(*maps)

Use case 1: several contexts or variable scopes that have an access priority:

In [93]: cmd_proxy = {}
In [94]: local_proxy = {"proxy": "proxy.local.com"}
In [95]: global_proxy = {"proxy": "proxy.global.com"}
In [100]: config = ChainMap(cmd_proxy, local_proxy, global_proxy)
In [101]: config['proxy']
Out[101]: 'proxy.local.com'

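ChainMap has a few extras of its own, such as the .maps attribute, which holds the list of all underlying dictionaries. The usual dict operations work as well, but operations that modify the mapping, such as item assignment, .pop(), and .clear(), only act on the first dictionary.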

In [105]: config.maps
Out[105]: [{}, {'proxy': 'proxy.local.com'}, {'proxy': 'proxy.global.com'}]

In [106]: config
Out[106]: ChainMap({}, {'proxy': 'proxy.local.com'}, {'proxy': 'proxy.global.com'})

In [108]: config['proxy'] = "proxy.cmd.com"

In [109]: config.maps
Out[109]:
[{'proxy': 'proxy.cmd.com'},
{'proxy': 'proxy.local.com'},
{'proxy': 'proxy.global.com'}]

In [111]: config.pop('proxy')
Out[111]: 'proxy.cmd.com'

In [112]: config
Out[112]: ChainMap({}, {'proxy': 'proxy.local.com'}, {'proxy': 'proxy.global.com'})
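
Because ChainMap keeps references to the underlying dictionaries rather than copies, changes made directly to those dictionaries show up in the view. A sketch based on the proxy example above; new_child() adds a fresh first mapping, and the proxy.office.com and proxy.session.com values are made up for illustration:

```python
from collections import ChainMap

local_proxy = {"proxy": "proxy.local.com"}
global_proxy = {"proxy": "proxy.global.com"}
config = ChainMap({}, local_proxy, global_proxy)

# Updating an underlying dict is visible through the ChainMap.
local_proxy["proxy"] = "proxy.office.com"
seen_through_view = config["proxy"]

# new_child() returns a new ChainMap with an extra dict in front;
# writes to it do not touch the original mappings.
session = config.new_child()
session["proxy"] = "proxy.session.com"
print(seen_through_view, session["proxy"], config["proxy"], global_proxy)
```

This pattern suits temporary overrides: dropping the child ChainMap restores the previous lookup behavior without any cleanup.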

deque

deque (double-ended queue) provides a queue that supports fast appends and pops at both ends.

Its main methods include append, appendleft, pop, popleft, extend, and extendleft.

In [113]: d = deque()

In [114]: d.append(1)

In [115]: d.append(2)

In [116]: d.append(3)

In [117]: d
Out[117]: deque([1, 2, 3])

In [118]: d[0]
Out[118]: 1

In [119]: d.pop()
Out[119]: 3

In [120]: d.popleft()
Out[120]: 1

In [121]: d.extend([4,5,6])

In [122]: d
Out[122]: deque([2, 4, 5, 6])

In [123]: d.extendleft([7,8,9])

In [124]: d
Out[124]: deque([9, 8, 7, 2, 4, 5, 6])
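
deque also accepts a maxlen argument, which turns it into a bounded buffer that discards items from the opposite end once it is full, and it offers rotate() for shifting items. A short sketch with made-up values:

```python
from collections import deque

# Keep only the three most recent items.
recent = deque(maxlen=3)
for value in range(1, 6):
    recent.append(value)  # once full, the oldest item on the left is dropped

# rotate(n) moves n items from the right end to the left end.
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)
print(list(recent), list(ring))
```

A bounded deque is a common way to keep a rolling window, such as the last few log lines or measurements, without trimming a list by hand.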

References

  1. Python's collections: A Buffet of Specialized Data Types – Real Python
  2. collections — Container datatypes — Python 3.11.4 documentation