流畅的Python笔记

流畅的Python

[第一部分序幕](#第一部分序幕)
- [第 1 章 Python 数据模型](#第 1 章 Python 数据模型)
[第二部分数据结构](#第二部分数据结构)
- [第 2 章序列构成的数组](#第 2 章序列构成的数组)
- [第 3 章字典和集合](#第 3 章字典和集合)
- [第 4 章文本和字节序列](#第 4 章文本和字节序列)
[第三部分把函数视作对象](#第三部分把函数视作对象)
- [第 5 章一等函数](#第 5 章一等函数)
- [第 7 章函数装饰器和闭包](#第 7 章函数装饰器和闭包)
[第四部分面向对象惯用法](#第四部分面向对象惯用法)
- [第 8 章对象引用、可变性和垃圾回收](#第 8 章对象引用、可变性和垃圾回收)
- - [== 和 is](#== 和 is)
  - 元组的相对不可变性
  - 默认浅拷贝
  - 函数的参数作为引用时
- [第 9 章符合Python风格的对象](#第 9 章符合Python风格的对象)
- - classmethod与staticmethod
参考资料

第一部分序幕

第 1 章 Python 数据模型

python 复制代码

import collections
from random import choice

# 构建简单的类，类名：Card，属性：rank和suit
Card = collections.namedtuple('Card', ['rank', 'suit'])

# 背面有色情图片的扑克牌（来自百度翻译）
class FrenchDeck:
   ranks = [str(n) for n in range(2, 11)] + list('JQKA')
   suits = 'spades diamonds clubs hearts'.split()
   
   def __init__(self):
      self._cards = [Card(rank, suit) for suit in self.suits
                     for rank in self.ranks]
   
   def __len__(self):
      return len(self._cards)
   
   def __getitem__(self, position):
      return self._cards[position]
   
   def sort_strategy(self, card):
      suit_rank_map = dict(spades=3, hearts=2, diamonds=1, clubs=0)
      rank_idx = FrenchDeck.ranks.index(card.rank)
      return rank_idx * len(suit_rank_map) + suit_rank_map[card.suit]


if __name__ == '__main__':
   deck = FrenchDeck()
   print(len(deck))  # len调用的是__len__
   print(deck[-1])   # deck[-1]调用的是__getitem__
   print(choice(deck))  # 随机
   print(deck[:3])   # 取前三个deck[0],deck[1],deck[2]
   print(deck[12::13])   # 从第12开始，再每隔13取一个
   
   # 实现了__getitem__，FrenchDeck就可以迭代了，迭代背后调用了__iter__
   for card in deck:
      print(card)

   print("反向迭代")
   for card in reversed(deck):
      print(card)

   print("按照指定排序策略排序")
   for card in sorted(deck, key=deck.sort_strategy):
      print(card)

python 复制代码

from math import hypot


class Vector:
   def __init__(self, x=0, y=0):
      self.x = x
      self.y = y

   def __repr__(self):
      return 'Vector(%r, %r)' % (self.x, self.y)

   def __abs__(self):
      return hypot(self.x, self.y)

   def __bool__(self):
      return bool(self.x or self.y)

   def __add__(self, other):
      x = self.x + other.x
      y = self.y + other.y
      return Vector(x, y)

   def __mul__(self, scalar):
      return Vector(self.x * scalar, self.y * scalar)

if __name__ == '__main__':
   v1 = Vector(1, 2)
   v2 = Vector(3, 4)
   print(v1)   # 调用__repr__或__str__；若无，打印类似<__main__.Vector object at 0x1025e4070>
   print(str(v1)) # 调用__repr__或__str__；若无，打印类似<__main__.Vector object at 0x1025e4070>
   print(v1 + v2) # 调用__add__
   print(v1 * 2)  # 调用__mul__
   print(abs(v2)) # 调用__abs__
   print(bool(v1))   # 默认自己定义类的实例是True，若有__bool__、__len__，就逐次调用即判即返回

特殊方法
https://docs.python.org/3/reference/datamodel.html
为什么len不是普通方法？

如果x是内置类型实例，len(x) 的速度非常快（CPython直接从C结构体里读取对象的长度）
推荐书籍

《Python 技术手册》

《Python 参考手册》

《Python Cookbook》

第二部分数据结构

第 2 章序列构成的数组

Python 标准库用 C 实现了丰富的序列类型，列举如下。

容器序列

list、tuple 和 collections.deque，这些序列能存放不同类型的数据。
扁平序列

str、bytes、bytearray、memoryview 和 array.array，这类序列只能容纳一种类型。
可变序列

list、bytearray、array.array、collections.deque 和 memoryview。
不可变序列

tuple、str 和 bytes。

列表推导

python 复制代码

# 把一个字符串变成 Unicode 码位的列表

symbols = '$¢£¥€¤'

# 方式一
codes = []
for symbol in symbols:
	codes.append(ord(symbol))

# 方式二（列表推导）
codes = [ord(symbol) for symbol in symbols]

python 复制代码

# Python 2.x 和 Python 3.x 列表推导式里的变量作用域对比

Python 2.7.18 (default, Oct  2 2021, 04:20:38)
[GCC Apple LLVM 13.0.0 (clang-1300.0.29.1) [+internal-os, ptrauth-isa=deploymen on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 1
>>> abc = [x for x in 'abc']
>>> x	# python 2.x，会被列表推导里的同名变量的值取代
'c'
>>> abc
['a', 'b', 'c']


Python 3.8.9 (default, Oct 26 2021, 07:25:53)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 1
>>> abc = [x for x in 'abc']	# x 在列表推导的局部作用域
>>> x	# python 3.x，不会被列表推导里的同名变量的值取代
1
>>> abc
['a', 'b', 'c']

python 复制代码

# 列表推导式与map/filter对比

>>> symbols = '$¢£¥€¤'
>>> beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
>>> beyond_ascii
[162, 163, 165, 8364, 164]
>>>
>>> beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
>>> beyond_ascii
[162, 163, 165, 8364, 164]

python 复制代码

# 列表推导计算笛卡儿积

>>> colors = ['black', 'white']
>>> sizes = ['S', 'M', 'L']
>>> tshirts = [(color, size) for color in colors for size in sizes]
>>> tshirts
[('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'), ('white', 'M'), ('white', 'L')]

生成器表达式

python 复制代码

# 用生成器表达式初始化元组和数组

>>> symbols = '$¢£¥€¤'
>>> tuple(ord(symbol) for symbol in symbols)
(36, 162, 163, 165, 8364, 164)

>>> import array
>>> array.array('I', (ord(symbol) for symbol in symbols))
array('I', [36, 162, 163, 165, 8364, 164])

python 复制代码

# 使用生成器表达式计算笛卡儿积

>>> colors = ['black', 'white']
>>> sizes = ['S', 'M', 'L']
>>> for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
...     print(tshirt)
...
black S
black M
black L
white S
white M
white L

元组

python 复制代码

>>> coordinates = (1, 2)
>>> city, year = ('Shanghai', 2001)
>>> city_year_list = [('Shanghai', 2001), ('Beijing', 1999)]
>>> for city_year in city_year_list:
...     print('%s-%d' % city_year)	# 元组拆包
...
Shanghai-2001
Beijing-1999
>>>
>>> for city, _ in city_year_list:
...     print(city)
...
Shanghai
Beijing

python 复制代码

>>> import os
>>> ret = os.path.split('/home/luciano/.ssh/idrsa.pub')
>>> ret
('/home/luciano/.ssh', 'idrsa.pub')

python 复制代码

# 元组拆包

>>> coordinates = (1, 2)
>>> coordinates[0]
1
>>> coordinates[1]
2
>>> x, y = coordinates	# 元祖拆包
>>> x
1
>>> y
2
>>> x, y = y, x	# 不使用中间变量交换两个变量的值
>>> x
2
>>> y
1

python 复制代码

# 元祖拆包

>>> divmod(20, 8)
(2, 4)
>>> t = (20, 8)
>>> divmod(*t)
(2, 4)
>>> quotient, remainder = divmod(*t)
>>> quotient, remainder
(2, 4)

python 复制代码

# 用*来处理剩下的元素

>>> a, b, *rest = range(5)
>>> a, b, rest
(0, 1, [2, 3, 4])
>>> a, b, *rest = range(3)
>>> a, b, rest
(0, 1, [2])
>>> a, b, *rest = range(2)
>>> a, b, rest
(0, 1, [])
>>> a, b, *rest = range(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected at least 2, got 1)


>>> a, *body, c, d = range(5)
>>> a, body, c, d
(0, [1, 2], 3, 4)
>>> *head, b, c, d = range(5)
>>> head, b, c, d
([0, 1], 2, 3, 4)

python 复制代码

# 嵌套元组拆包

>>> arr = [('Tom', 10, (90, 80)), ('Lucy', 11, (89, 100))]
>>> for name, age, (math_grade, chinese_grade) in arr:
...     print(f"{name}, {age}, {math_grade}, {chinese_grade}")
...
Tom, 10, 90, 80
Lucy, 11, 89, 100

python 复制代码

# 具名元组

>>> from collections import namedtuple
>>> City = namedtuple('City', 'name country population coordinates')
>>> tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
>>> tokyo
City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
>>> tokyo.population
36.933
>>> tokyo.coordinates
(35.689722, 139.691667)
>>> tokyo[1]
'JP'
>>>
>>> City._fields
('name', 'country', 'population', 'coordinates')
>>> LatLong = namedtuple('LatLong', 'lat long')
>>> delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
>>> delhi = City._make(delhi_data)
>>> delhi._asdict()
{'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': LatLong(lat=28.613889, long=77.208889)}
>>> for key, value in delhi._asdict().items():
...     print(key + ':', value)
...
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)

元组官方文档

切片

python 复制代码

>>> s = 'bicycle'
>>> s[::3]
'bye'
>>> s[::-1]
'elcycib'
>>> s[::-2]
'eccb'
>>>
>>>
>>> l = list(range(6))
>>> l[:2]
[0, 1]
>>> l[2:]
[2, 3, 4, 5]
>>> l[slice(2, 5)]
[2, 3, 4]
>>>
>>> l
[0, 1, 2, 3, 4, 5]
>>> del l[2:4]
>>> l
[0, 1, 4, 5]
>>> l[1:3] = [2, 3]
>>> l
[0, 2, 3, 5]
>>> l[1:3] = 100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only assign an iterable
>>> l[1:3] = [100]
>>> l
[0, 100, 5]

对序列使用+和*

python 复制代码

>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> 'abc' * 3
'abcabcabc'
>>>
>>> board = [['_'] * 3 for i in range(3)]
>>> board
[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
>>> board[1][2] = 'X'
>>> board
[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]
>>>
>>> weird_board = [['_'] * 3] * 3	# weird_board[0]、weird_board[1] 和 weird_board[2] 是同一个引用
>>> weird_board
[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
>>> weird_board[1][2] = 'X'
>>> weird_board
[['_', '_', 'X'], ['_', '_', 'X'], ['_', '_', 'X']]

序列的增量赋值

python 复制代码

>>> l = [1, 2, 3]
>>> id(l)
4307416704
>>> l *= 2
>>> l
[1, 2, 3, 1, 2, 3]
>>> id(l)
4307416704

>>> t = (1, 2, 3)
>>> id(t)
4307435648
>>> t *= 2	# 对于不可变对象，此操作生成了新对象，地址已经发生变化
>>> t
(1, 2, 3, 1, 2, 3)
>>> id(t)
4307394816

>>> t = (1, 2, [30, 40])
>>> t[2] += [50, 60]	# 此操作导致tuple里的可变对象发生变化，并抛异常
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> t
(1, 2, [30, 40, 50, 60])

list.sort方法和内置函数sorted

python 复制代码

# list.sort原地修改，而sorted不影响原始的。两者都有reverse和key参数

>>> fruits = ['grape', 'raspberry', 'apple', 'banana']
>>> sorted(fruits)
['apple', 'banana', 'grape', 'raspberry']
>>> fruits
['grape', 'raspberry', 'apple', 'banana']
>>> sorted(fruits, reverse=True)
['raspberry', 'grape', 'banana', 'apple']
>>> sorted(fruits, reverse=True, key=len)
['raspberry', 'banana', 'grape', 'apple']
>>> fruits
['grape', 'raspberry', 'apple', 'banana']
>>> fruits.sort()
>>> fruits
['apple', 'banana', 'grape', 'raspberry']

bisect

python 复制代码

# 二分查找等于某值的位置，插入到该值的左侧或右侧

>>> import bisect
>>> lis = [1, 2, 2, 2, 3, 4, 5]
>>> bisect.bisect(lis, 2)
4
>>> bisect.bisect_right(lis, 2)	# bisect和bisect_right一样
4
>>> bisect.bisect_left(lis, 2)
1
>>> bisect.bisect(lis, 3)
5
>>> bisect.bisect_left(lis, 3)
4
>>> bisect.bisect_right(lis, 3)
5

>>> bisect.bisect(lis, 0)
0
>>> bisect.bisect(lis, 9)
7

>>> lis
[1, 2, 2, 2, 3, 4, 5]
>>> bisect.insort(lis, 1)	# insort和insort_right一样，insort_left则是插入左侧
>>> lis
[1, 1, 2, 2, 2, 3, 4, 5]

数组

虽然列表既灵活又简单，但面对各类需求时，我们可能会有更好的选择。比如，要存放 1000 万个浮点数的话，数组（array）的效率要高得多，因为数组在背后存的并不是 float 对象，而是数字的机器翻译，也就是字节表述。

python 复制代码

>>> from array import array
>>> from random import random
>>> floats = array('d', (random() for i in range(10**7)))
>>> floats[-1]
0.5888943390247224
>>> fp = open('floats.bin', 'wb')
>>> floats.tofile(fp)
>>> fp.close()
>>> floats2 = array('d')
>>> fp = open('floats.bin', 'rb')
>>> floats2.fromfile(fp, 10**7)
>>> fp.close()
>>> floats2[-1]
0.5888943390247224
>>> floats2 == floats
True

数组官方文档

memoryview

memoryview 是一个内置类，它能让用户在不复制内容的情况下操作同一个数组的不同切片。

python 复制代码

>>> numbers = array.array('h', [-2, -1, 0, 1, 2])	# signed short，2 bytes
>>> memv = memoryview(numbers)
>>> len(memv)
5
>>> memv[0]
-2
>>> memv_oct = memv.cast('B')	# unsigned char，1 byte
>>> memv_oct.tolist()
[254, 255, 255, 255, 0, 0, 1, 0, 2, 0] # 255 244=-2, 255 255=-1, 0 0=0, 0 1=1, 0 2 = 2 
>>> memv_oct[5] = 4	# 4 0=1024
>>> numbers
array('h', [-2, -1, 1024, 1, 2])

deque

python 复制代码

# collections.deque双向队列是一个线程安全类

>>> from collections import deque
>>> dq = deque(range(10), maxlen=10)
>>> dq
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
>>> dq.rotate(3)
>>> dq
deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
>>> dq.rotate(-4)
>>> dq
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
>>> dq.appendleft(-1)
>>> dq
deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
>>> dq.extend([11, 22, 33])
>>> dq
deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)
>>> dq.extendleft([10, 20, 30, 40])
>>> dq
deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)

deque官方文档

第 3 章字典和集合

python 复制代码

# 字典创建
>>> a = dict(one=1, two=2, three=3)
>>> b = {'one': 1, 'two': 2, 'three': 3}
>>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
>>> d = dict([('two', 2), ('one', 1), ('three', 3)])
>>> e = dict({'three': 3, 'one': 1, 'two': 2})
>>> a == b == c == d == e
True

# 字典推导
>>> my_list = [("a", 1), ("b", 2), ("c", 3)]
>>> my_dict = {x: y for x, y in my_list}
>>> my_dict
{'a': 1, 'b': 2, 'c': 3}
>>> my_dict.get('d', -1)
-1
>>> my_dict.setdefault('e', 6)
6
>>> my_dict
{'a': 1, 'b': 2, 'c': 3, 'e': 6}

# defaultdict
>>> import collections
>>> my_dict = collections.defaultdict(list)
>>> my_dict
defaultdict(<class 'list'>, {})
>>> my_dict[0].append(1)	# 键不存在时，值默认是一个list
>>> my_dict
defaultdict(<class 'list'>, {0: [1]})

# __missing__
class StrKeyDict(dict):
   def __missing__(self, key):
      if isinstance(key, str):
         raise KeyError(key)
      return self[str(key)]
   
   def get(self, key, default=None):
      try:
         return self[key]  # __getitem__查不到，会调用__missing__
      except KeyError:
         return default
      
   def __contains__(self, key):
      return key in self.keys() or str(key) in self.keys()


if __name__ == '__main__':
   my_dict = StrKeyDict()
   # get(1)
   # self[1]
   # __getitem__查不到key=1
   # __missing__查self['1']
   # __getitem__查不到key='1'
   # __missing__raise KeyError
   # get捕获KeyError，返回None
   print(my_dict.get(1))   # None

# OrderedDict，在添加键的时候会保持顺序
>>> order_dict = collections.OrderedDict()
>>> order_dict[2] = '2'
>>> order_dict[3] = '3'
>>> order_dict[1] = '1'
>>> order_dict
OrderedDict([(2, '2'), (3, '3'), (1, '1')])

# Counter
>>> ct = collections.Counter('abracadabra')
>>> ct
Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
>>> ct.update('aaaaazzz')
>>> ct
Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
>>> ct.most_common(2)
[('a', 10), ('z', 3)]

# UserDict，因其继承了Mapping.get就不用再实现了，此方法跟StrKeyDict.get实现一样
import collections


class StrKeyDict(collections.UserDict):
   def __missing__(self, key):
      if isinstance(key, str):
         raise KeyError(key)
      return self[str(key)]
      
   def __contains__(self, key):
      return str(key) in self.data
   
   def __setitem__(self, key, item):
      self.data[str(key)] = item

if __name__ == '__main__':
   my_dict = StrKeyDict()
   # get(1)
   # self[1]
   # __getitem__查不到key=1
   # __missing__查self['1']
   # __getitem__查不到key='1'
   # __missing__raise KeyError
   # get捕获KeyError，返回None
   print(my_dict.get(1))   # None

# 不可变映射类型MappingProxyType，下面d_proxy是只读的，但可随d动态变化
>>> from types import MappingProxyType
>>> d = {1:'A'}
>>> d_proxy = MappingProxyType(d)
>>> d_proxy
mappingproxy({1: 'A'})
>>> d_proxy[1]
'A'
>>> d_proxy[2] = 'B'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment
>>> d[2] = 'B'
>>> d_proxy
mappingproxy({1: 'A', 2: 'B'})
>>> d_proxy[2]
'B'

python 复制代码

# 集合
>>> l = ['a', 'a', 'a', 'b']
>>> set(l)
{'a', 'b'}

# 集合交并差
>>> a = set([1, 2, 3])
>>> b = set([2, 3, 4])
>>> a & b
{2, 3}
>>> a | b
{1, 2, 3, 4}
>>> a - b
{1}
>>> a ^ b
{1, 4}

# 集合推导
>>> {i for i in range(6)}
{0, 1, 2, 3, 4, 5}

第 4 章文本和字节序列

python 复制代码

# 字符序列与字节序列
>>> s = 'café'	# 'café' 字符串有4个Unicode字符，在2015年，"字符"的最佳定义是Unicode字符。
>>> type(s)
<class 'str'>
>>> len(s)
4
>>> b = s.encode('utf8')	# 字节序列
>>> b
b'caf\xc3\xa9'
>>> type(b)
<class 'bytes'>
>>> len(b)
5
>>> b.decode('utf8')
'café'

# bytes、bytearray
>>> cafe = bytes('café', encoding='utf_8')
>>> cafe
b'caf\xc3\xa9'	# 前3个字节b'caf'在可打印的ASCII范围内（0~255），后两个字节则不然。
>>> cafe[0]
99
>>> cafe[:1]
b'c'
>>> cafe_arr = bytearray(cafe)
>>> cafe_arr
bytearray(b'caf\xc3\xa9')
>>> cafe_arr[0]
99
>>> cafe_arr[:1]
bytearray(b'c')


# UnicodeEncodeError
# 多数非 UTF 编解码器只能处理 Unicode 字符的一小部分子集。把文本转换成字节序列时，如果目标编码中没有定义某个字符，那就会抛出UnicodeEncodeError 异常，除非把 errors 参数传给编码方法或函数，对错误进行特殊处理。
>>> city = 'São Paulo'
>>> city.encode('utf_8')
b'S\xc3\xa3o Paulo'
>>> city.encode('utf_16')
b'\xff\xfeS\x00\xe3\x00o\x00 \x00P\x00a\x00u\x00l\x00o\x00'
>>> city.encode('iso8859_1')
b'S\xe3o Paulo'
>>> city.encode('cp437')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xe3' in position 1: character maps to <undefined>
>>> city.encode('cp437', errors='ignore') # 跳过无法编码的字符
b'So Paulo'
>>> city.encode('cp437', errors='replace') # 把无法编码的字符替换成 '?'
b'S?o Paulo'
>>> city.encode('cp437', errors='xmlcharrefreplace') # 把无法编码的字符替换成 XML实体
b'S&#227;o Paulo'

# UnicodeDecodeError
# 不是每一个字节都包含有效的 ASCII 字符，也不是每一个字符序列都是有效的 UTF-8 或 UTF-16。因此，把二进制序列转换成文本时，如果假设是这两个编码中的一个，遇到无法转换的字节序列时会抛此异常。
# 另一方面，很多陈旧的8位编码------如 'cp1252'、'iso8859_1'和'koi8_r'------能解码任何字节序列流而不抛出错误，例如随机噪声。因此，如果程序使用错误的8位编码，解码过程悄无声息，而得到的是无用输出。
>>> octets = b'Montr\xe9al'
>>> octets.decode('cp1252')
'Montréal'
>>> octets.decode('koi8_r')
'MontrИal'
>>> octets.decode('utf_8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 5: invalid continuation byte
>>> octets.decode('utf_8', errors='replace')
'Montr�al'

# Python3默认使用UTF-8编码源码，新版GNU/Linux或Mac OS X默认编码是UTF-8，Windows系统不一样
>>> open('cafe.txt', 'w', encoding='utf_8').write('café')
4
>>> open('cafe.txt').read()
'café'

# 文本
.encoding
'UTF-8'
>>> fp2.read()
'café'
>>> fp3 = open('cafe.txt', 'rb')
>>> fp3
<_io.BufferedReader name='cafe.txt'>
>>> fp3.read()
b'caf\xc3\xa9'

python 复制代码

# Unicode
import unicodedata
import re


if __name__ == '__main__':
   re_digit = re.compile(r'\d')
   sample = '1\xbc\xb2\u0969\u136b\u216b\u2466\u2480\u3285'
   for char in sample:
      print('U+%04x' % ord(char),
         char.center(6),
         're_dig' if re_digit.match(char) else '-',
         'isdig' if char.isdigit() else '-',
         'isnum' if char.isnumeric() else '-',
         format(unicodedata.numeric(char), '5.2f'),
         unicodedata.name(char),
         sep='\t')

python 复制代码

# 正则表达式中的字符串和字节序列
import re

re_numbers_str = re.compile(r'\d+')     # <1>
re_words_str = re.compile(r'\w+')
re_numbers_bytes = re.compile(rb'\d+')  # <2>
re_words_bytes = re.compile(rb'\w+')

text_str = ("Ramanujan saw \u0be7\u0bed\u0be8\u0bef"  # <3>
            " as 1729 = 1³ + 12³ = 9³ + 10³.")        # <4>

text_bytes = text_str.encode('utf_8')  # <5>

print('Text', repr(text_str), sep='\n  ')
print('Numbers')
print('  str  :', re_numbers_str.findall(text_str))      # <6>
print('  bytes:', re_numbers_bytes.findall(text_bytes))  # <7>
print('Words')
print('  str  :', re_words_str.findall(text_str))        # <8>
print('  bytes:', re_words_bytes.findall(text_bytes))    # <9>

第三部分把函数视作对象

第 5 章一等函数

在 Python 中，函数是一等对象（整数、字符串和字典都是一等对象------没什么特别的）。编程语言理论家把"一等对象"定义为满足下述条件的程序实体：

在运行时创建
能赋值给变量或数据结构中的元素
能作为参数传给函数
能作为函数的返回结果

python 复制代码

# 把函数视作对象
>>> def factorial(n):
...     '''returns n!'''
...     return 1 if n < 2 else n * factorial(n-1)
...
>>> factorial(42)
1405006117752879898543142606244511569936384000000000
>>> factorial.__doc__
'returns n!'
>>> type(factorial)
<class 'function'>
>>>
>>> fact = factorial
>>> fact
<function factorial at 0x10412d3a0>
>>> fact(5)
120
>>> map(factorial, range(11))
<map object at 0x1047b4c10>
>>> list(map(fact, range(11)))
[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

# 高阶函数，接受函数为参数，或者把函数作为结果返回的函数是高阶函数（higherorder function），如map、sorted
>>> fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']
>>> sorted(fruits, key=len)
['fig', 'apple', 'cherry', 'banana', 'raspberry', 'strawberry']
>>>
>>> def reverse(word):
...     return word[::-1]
...
>>> sorted(fruits, key=reverse)
['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']
>>> sorted(fruits, key=lambda word: word[::-1])	# lambda表达式/匿名函数
['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']

>>> list(map(fact, range(6)))
[1, 1, 2, 6, 24, 120]
>>> [fact(n) for n in range(6)]
[1, 1, 2, 6, 24, 120]
>>> list(map(factorial, filter(lambda n: n % 2, range(6))))
[1, 6, 120]
>>> [factorial(n) for n in range(6) if n % 2]
[1, 6, 120]

>>> from functools import reduce
>>> from operator import add
>>> reduce(add, range(100))
4950
>>> sum(range(100))
4950

python 复制代码

# 常规对象没有而函数有的属性
>>> class C: pass
...
>>> obj = C()
>>> def func(): pass
...
>>> sorted(set(dir(func)) - set(dir(obj)))
['__annotations__', '__call__', '__closure__', '__code__', '__defaults__', '__get__', '__globals__', '__kwdefaults__', '__name__', '__qualname__']

python 复制代码

# * 和 **

def tag(name, *content, cls=None, **attrs):
   """生成一个或多个HTML标签"""
   if cls is not None:
      attrs['class'] = cls
   if attrs:
      attr_str = ''.join(' %s="%s"' % (attr, value) for attr, value in sorted(attrs.items()))
   else:
      attr_str = ''
   if content:
      return '\n'.join('<%s%s>%s</%s>' % (name, attr_str, c, name) for c in content)
   else:
      return '<%s%s />' % (name, attr_str)

if __name__ == '__main__':
   print(tag('br'))
   print(tag('p', 'hello'))
   print(tag('p', 'hello', 'world'))
   print(tag('p', 'hello', id=33))
   print(tag('p', 'hello', 'world', cls='sidebar'))
   # content='testing' 被 **attrs 接收
   print(tag(content='testing', name="img"))
   
   my_tag = {'name': 'img', 'title': 'Sunset Boulevard', 'src': 'sunset.jpg', 'cls': 'framed'}
   # 在 my_tag 前面加上 **，字典中的所有元素作为单个参数传入，同名键会绑定到对应的具名参数上，余下的则被 **attrs 捕获
   print(tag(**my_tag))

python 复制代码

# 验证一下传入的 **my_tag 都绑定到了哪些参数上
>>> def tag(name, *content, cls=None, **attrs):
...    """生成一个或多个HTML标签"""
...    if cls is not None:
...       attrs['class'] = cls
...    if attrs:
...       attr_str = ''.join(' %s="%s"' % (attr, value) for attr, value in sorted(attrs.items()))
...    else:
...       attr_str = ''
...    if content:
...       return '\n'.join('<%s%s>%s</%s>' % (name, attr_str, c, name) for c in content)
...    else:
...       return '<%s%s />' % (name, attr_str)
...
>>> import inspect
>>> sig = inspect.signature(tag)
>>> my_tag = {'name': 'img', 'title': 'Sunset Boulevard', 'src': 'sunset.jpg', 'cls': 'framed'}
>>> bound_args = sig.bind(**my_tag)
>>> bound_args
<BoundArguments (name='img', cls='framed', attrs={'title': 'Sunset Boulevard', 'src': 'sunset.jpg'})>
>>> for name, value in bound_args.arguments.items():
...     print(name, '=', value)
...
name = img
cls = framed
attrs = {'title': 'Sunset Boulevard', 'src': 'sunset.jpg'}

python 复制代码

# 函数注解，Python 对注解所做的唯一的事情是，把它们存储在函数的__annotations__ 属性里。仅此而已，Python 不做检查、不做强制、不做验证，什么操作都不做。换句话说，注解对 Python 解释器没有任何意义。注解只是元数据，可以供 IDE、框架和装饰器等工具使用。写作本书时，标准库中还没有什么会用到这些元数据，唯有inspect.signature() 函数知道怎么提取注解
>>> def f(x:int, y:int) -> int:
...     return max(x, y)
...
>>> f.__annotations__
{'x': <class 'int'>, 'y': <class 'int'>, 'return': <class 'int'>}

python 复制代码

# 函数式编程
>>> from functools import reduce
>>> from operator import mul
>>> def fact(n):
...     return reduce(mul, range(1, n+1))
...
>>> fact(5)
120

python 复制代码

# 冻结参数
>>> from operator import mul
>>> from functools import partial
>>> triple = partial(mul, 3)
>>> triple(7)
21
>>> list(map(triple, range(1, 10)))
[3, 6, 9, 12, 15, 18, 21, 24, 27]

第 7 章函数装饰器和闭包

初探装饰器

python 复制代码

# 装饰器，装饰器的一大特性是，能把被装饰的函数替换成其他函数。第二个特性是，装饰器在加载模块时立即执行。
>>> def deco(func):
...     def inner():
...             print("running inner()")
...     return inner
...
>>> @deco
... def target():
...     print("running target()")
...
>>> target()
running inner()
>>> target
<function deco.<locals>.inner at 0x10bd2bd80>


# hello.py
registry = []  # <1>

def register(func):  # <2>
    print('running register(%s)' % func)  # <3>
    registry.append(func)  # <4>
    return func  # <5>

@register  # <6>
def f1():
    print('running f1()')

@register
def f2():
    print('running f2()')

def f3():  # <7>
    print('running f3()')

def main():  # <8>
    print('running main()')
    print('registry ->', registry)
    f1()
    f2()
    f3()

if __name__=='__main__':
    main()  # <9>

>>> import hello	# 导入模块时装饰器立即执行
running register(<function f1 at 0x10acf3ce0>)
running register(<function f2 at 0x10acf3d80>)

# python hello.py 
running register(<function f1 at 0x10806df80>)
running register(<function f2 at 0x1080bce00>)
running main()
registry -> [<function f1 at 0x10806df80>, <function f2 at 0x1080bce00>]
running f1()
running f2()
running f3()

变量作用域规则

python 复制代码

# 变量作用域规则，Python 编译函数的定义体时，它判断 b 是局部变量，因为在函数中给它赋值了。但调用 f(3) 时，尝试获取局部变量 b 的值时，发现 b 没有绑定值。可查看翻译成的汇编。。。
>>> b = 6
>>> def f(a):
...     print(a)
...     print(b)
...     b = 9
...
>>> f(3)
3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f
UnboundLocalError: cannot access local variable 'b' where it is not associated with a value

# 可以使用 global 让解释器把 b 当成全局变量
>>> b = 6
>>> def f(a):
...     print(a)
...     global b
...     print(b)
...     b = 9
...
>>> f(3)
3
6
>>> b
9

闭包

其实，闭包指延伸了作用域的函数，其中包含函数定义体中引用、但是不在定义体中定义的非全局变量。只有涉及嵌套函数时才有闭包问题。

python 复制代码

# 计算平均值------类实现方式
class Averager():

    def __init__(self):
        self.series = []

    def __call__(self, new_value):
        self.series.append(new_value)
        total = sum(self.series)
        return total/len(self.series)
    
if __name__ == '__main__':
    avg = Averager()
    print(avg(10))
    print(avg(11))
    print(avg(12))

python 复制代码

# 计算平均值------闭包实现方式
def make_averager():
    series = []

    def averager(new_value):
        series.append(new_value)	# 没有给series赋值，series不会被视为局部变量，而是自由变量，利用列表是可变的特点
        total = sum(series)
        return total/len(series)

    return averager
    
if __name__ == '__main__':
    avg = make_averager()
    print(avg(10))
    print(avg(11))
    print(avg(12))

综上，闭包是一种函数，它会保留定义函数时存在的自由变量的绑定，这样调用函数时，虽然定义作用域不可用了，但是仍能使用那些绑定。注意，只有嵌套在其他函数中的函数才可能需要处理不在全局作用域中的外部变量。

nonlocal

python 复制代码

# 求平均数------效率改进版
def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        # nonlocal count, total
        count += 1  # 等价为count = count + 1，对变量赋值，count被视为局部变量。数字是不可变数据类型，count = count + 1，其实会隐式创建局部变量 count。这样，count 就不是自由变量了，因此不会保存在闭包中。为了解决这个问题，Python 3 引入了 nonlocal 声明。它的作用是把变量标记为自由变量，即使在函数中为变量赋予新值了，也会变成自由变量。如果为 nonlocal 声明的变量赋予新值，闭包中保存的绑定会更新。
        total += new_value  # 同count，total也被视为局部变量
        return total/count

    return averager
    
if __name__ == '__main__':
    avg = make_averager()
    print(avg(10))  # 报错，UnboundLocalError: cannot access local variable 'count' where it is not associated with a value
    print(avg(11))
    print(avg(12))

一个简单的装饰器，输出函数的运行时间

python 复制代码

import time

import functools 

def clock(func):
    def clocked(*args):
        t0 = time.time()
        result = func(*args)
        elapsed = time.time() - t0
        name = func.__name__
        arg_str = ', '.join(repr(arg) for arg in args)
        print('[%0.8fs] %s(%s) -> %r' % (elapsed, name, arg_str, result))
        return result
    return clocked

@clock
def snooze(seconds):
    time.sleep(seconds)

@clock
@functools.lru_cache()  # 标准库中的装饰器，它把耗时函数的结果保存起来，避免传入相同的参数时重复计算
def factorial(n):
    return 1 if n < 2 else n*factorial(n-1)

if __name__ == '__main__':
    print('*' * 40, 'Calling snooze(.123)')
    snooze(.123)
    print('*' * 40, 'Calling factorial(6)')
    print('6! =', factorial(6))

第四部分面向对象惯用法

第 8 章对象引用、可变性和垃圾回收

== 和 is

python 复制代码

# == 和 is
== 运算符比较两个对象的值（对象中保存的数据，底层是调用__eq__），而 is 比较对象的标识（id）。

元组的相对不可变性

python 复制代码

# 元组不可变，但其引用元素是可变的
>>> t1 = (1, 2, [30, 40])
>>> t2 = (1, 2, [30, 40])
>>> t1 == t2
True
>>> t1 is t2
False
>>> id(t1), id(t2)
(4370239168, 4370215296)
>>> id(t1[-1])
4370235520
>>> t1[-1].append(99)
>>> id(t1[-1])
4370235520
>>> t1
(1, 2, [30, 40, 99])

默认浅拷贝

构造方法或 [:] 做的是浅复制（即复制了最外层容器，副本中的元素是源容器中元素的引用）。如果所有元素都是不可变的，那么这样没有问题，还能节省内存。但是，如果有可变的元素，可能就会导致意想不到的问题。

python 复制代码

>>> l1 = [3, [66, 55, 44], (7, 8, 9)]
>>> l2 = list(l1)       # 同l2 = l1[:]
>>> l1.append(100)
>>> l1[1].remove(55)
>>> print('l1:', l1)
l1: [3, [66, 44], (7, 8, 9), 100]
>>> print('l2:', l2)
l2: [3, [66, 44], (7, 8, 9)]
>>> l2[1] += [33, 22]
>>> l2[2] += (10, 11)
>>> print('l1:', l1)
l1: [3, [66, 44, 33, 22], (7, 8, 9), 100]
>>> print('l2:', l2)
l2: [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]

copy 模块提供的 deepcopy 和 copy 函数能为任意对象做深复制和浅复制。

注意，一般来说，深复制不是件简单的事。如果对象有循环引用，那么这个朴素的算法会进入无限循环。deepcopy 函数会记住已经复制的对象，因此能优雅地处理循环引用。

python 复制代码

>>> a = [10, 20]
>>> b = [a, 30]
>>> a.append(b)
>>> a
[10, 20, [[...], 30]]
>>> from copy import deepcopy
>>> c = deepcopy(a)
>>> c
[10, 20, [[...], 30]]

函数的参数作为引用时

python 复制代码

>>> def f(a, b):
...     a += b
...     return a
...
>>> x, y = 1, 2
>>> f(x, y)
3
>>> x, y
(1, 2)
>>> a, b = [1, 2], [3, 4]
>>> f(a, b)
[1, 2, 3, 4]
>>> a, b
([1, 2, 3, 4], [3, 4])
>>> t = (10, 20)
>>> u = (30, 40)
>>> f(t, u)
(10, 20, 30, 40)
>>> t, u
((10, 20), (30, 40))

python 复制代码

# 不要使用可变类型作为参数的默认值，下面的bus2和bus3的passengers是同一个默认值的引用
class HauntedBus:
    """A bus model haunted by ghost passengers"""

    def __init__(self, passengers=[]):  # <1>
        self.passengers = passengers  # <2>

    def pick(self, name):
        self.passengers.append(name)  # <3>

    def drop(self, name):
        self.passengers.remove(name)

>>> bus1 = HauntedBus(['Alice', 'Bill'])
>>> bus1.passengers
['Alice', 'Bill']
>>> bus1.pick('Charlie')
>>> bus1.drop('Alice')
>>> bus1.passengers
['Bill', 'Charlie']
>>> bus2 = HauntedBus()
>>> bus2.pick('Carrie')
>>> bus2.passengers
['Carrie']
>>> bus3 = HauntedBus()
>>> bus3.passengers
['Carrie']
>>> bus3.pick('Dave')
>>> bus2.passengers
['Carrie', 'Dave']
>>> bus2.passengers is bus3.passengers
True
>>> bus1.passengers
['Bill', 'Charlie']


>>> basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']
>>> bus = TwilightBus(basketball_team)
>>> bus.drop('Tina')
>>> bus.drop('Pat')
>>> basketball_team
['Sue', 'Maya', 'Diana']

python 复制代码

# 防御可变参数
class TwilightBus:
    """A bus model that makes passengers vanish"""

    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []  # <1>
        else:
            self.passengers = passengers  #<2>

    def pick(self, name):
        self.passengers.append(name)

    def drop(self, name):
        self.passengers.remove(name)  # <3>

第 9 章符合Python风格的对象

python 复制代码

"""
A 2-dimensional vector class

# BEGIN VECTOR2D_V0_DEMO

    >>> v1 = Vector2d(3, 4)
    >>> print(v1.x, v1.y)  # <1>
    3.0 4.0
    >>> x, y = v1  # <2>
    >>> x, y
    (3.0, 4.0)
    >>> v1  # <3>
    Vector2d(3.0, 4.0)
    >>> v1_clone = eval(repr(v1))v1_clone = eval(repr(v1))  # <4>
    >>> v1 == v1_clone  # <5>
    True
    >>> print(v1)  # <6>
    (3.0, 4.0)
    >>> octets = bytes(v1)  # <7>
    >>> octets
    b'd\\x00\\x00\\x00\\x00\\x00\\x00\\x08@\\x00\\x00\\x00\\x00\\x00\\x00\\x10@'
    >>> abs(v1)  # <8>
    5.0
    >>> bool(v1), bool(Vector2d(0, 0))  # <9>
    (True, False)
    
    >>> v1_clone = Vector2d.frombytes(bytes(v1))
    >>> v1_clone
    Vector2d(3.0, 4.0)
    >>> v1 == v1_clone
    True

# END VECTOR2D_V0_DEMO
"""

# BEGIN VECTOR2D_V0
from array import array
import math


class Vector2d:
    typecode = 'd'  # <1>

    def __init__(self, x, y):
        self.x = float(x)    # <2>
        self.y = float(y)

    def __iter__(self):
        return (i for i in (self.x, self.y))  # <3>

    def __repr__(self):
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)  # <4>

    def __str__(self):
        return str(tuple(self))  # <5>

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +  # <6>
                bytes(array(self.typecode, self)))  # <7>

    def __eq__(self, other):
        return tuple(self) == tuple(other)  # <8>

    def __abs__(self):
        return math.hypot(self.x, self.y)  # <9>

    def __bool__(self):
        return bool(abs(self))  # <10>
    
    @classmethod  # 类方法使用 classmethod 装饰器修饰，类方法的第一个参数名为 cls（但是 Python 不介意具体怎么命名）
    def frombytes(cls, octets):  # cls 传入类本身
        typecode = chr(octets[0])  # 从第一个字节中读取 typecode
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(*memv)
# END VECTOR2D_V0

classmethod与staticmethod

classmethod 装饰器，定义操作类，而不是操作实例的方法。因此类方法的第一个参数是类本身，而不是实例。classmethod 最常见的用途是定义备选构造方法。按照约定，类方法的第一个参数名为 cls（但是 Python 不介意具体怎么命名）。

staticmethod 装饰器，定义类的普通方法。但是第一个参数不是特殊的值。

python 复制代码

>>> class Demo:
...     @classmethod
...     def classmeth(*args):
...             return args
...     @staticmethod
...     def staticmeth(*args):
...             return args
...
>>> Demo.classmeth()
(<class '__main__.Demo'>,)
>>> Demo.classmeth('param')
(<class '__main__.Demo'>, 'param')
>>> Demo.staticmeth()
()
>>> Demo.staticmeth('param')
('param',)
>>> demo = Demo()
>>> demo.staticmeth()
()
>>> demo.staticmeth('x')
('x',)
>>> demo.classmeth('x')
(<class '__main__.Demo'>, 'x')

python 复制代码

# 格式化显示
>>> brl = 1/2.43 # BRL到USD的货币兑换比价
>>> brl
0.4115226337448559
>>> format(brl, '0.4f')
'0.4115'
>>> '1 BRL = {rate:0.2f} USD'.format(rate=brl)
'1 BRL = 0.41 USD'
>>>
>>>
>>> format(42, 'b')
'101010'
>>> format(2/3, '.1%')
'66.7%'
>>>
>>> from datetime import datetime
>>> now = datetime.now()
>>> format(now, '%H:%M:%S')
'22:11:34'
>>> "It's now {:%I:%M %p}".format(now)
"It's now 10:11 PM"

参考资料

1\] 《流畅的Python》作者：\[巴西\] Luciano Ramalho 译者：安道 吴珂 \[2\] [《流畅的Python》代码库](https://github.com/fluentpython/example-code/tree/master) \[3\] [Python官方文档](https://docs.python.org/3/reference/index.html) \[4\] [菜鸟教程Python](https://www.runoob.com/python/python-tutorial.html)

流畅的Python笔记

流畅的Python

第一部分 序幕

第 1 章 Python 数据模型

第二部分 数据结构

第 2 章 序列构成的数组

列表推导

生成器表达式

元组

切片

对序列使用+和*

序列的增量赋值

list.sort方法和内置函数sorted

bisect

数组

memoryview

deque

第 3 章 字典和集合

第 4 章 文本和字节序列

第三部分 把函数视作对象

第 5 章 一等函数

第 7 章 函数装饰器和闭包

初探装饰器

变量作用域规则

闭包

nonlocal

一个简单的装饰器，输出函数的运行时间

第四部分 面向对象惯用法

第 8 章 对象引用、可变性和垃圾回收

== 和 is

元组的相对不可变性

默认浅拷贝

函数的参数作为引用时

第 9 章 符合Python风格的对象

classmethod与staticmethod

参考资料

第一部分序幕

第二部分数据结构

第 2 章序列构成的数组

第 3 章字典和集合

第 4 章文本和字节序列

第三部分把函数视作对象

第 5 章一等函数

第 7 章函数装饰器和闭包

第四部分面向对象惯用法

第 8 章对象引用、可变性和垃圾回收

第 9 章符合Python风格的对象