Comprehensive Python Cheatsheet P12
#Match Statement
Executes the first block with matching pattern. Added in Python 3.10.
python
match <object/expression>:
    case <pattern> [if <condition>]:
        <code>
    ...
Patterns
python
<value_pattern> = 1/'abc'/True/None/math.pi # Matches the literal or a dotted name.
<class_pattern> = <type>() # Matches any object of that type.
<wildcard_patt> = _ # Matches any object.
<capture_patt> = <name> # Matches any object and binds it to name.
<or_pattern> = <pattern> | <pattern> [| ...] # Matches any of the patterns.
<as_pattern> = <pattern> as <name> # Binds the match to the name.
<sequence_patt> = [<pattern>, ...] # Matches sequence with matching items.
<mapping_patt> = {<value_pattern>: <pattern>, ...} # Matches dictionary with matching items.
<class_pattern> = <type>(<attr_name>=<patt>, ...) # Matches object with matching attributes.
- Sequence pattern can also be written as a tuple.
- Use '*<name>' and '**<name>' in sequence/mapping patterns to bind remaining items.
- Sequence pattern must match all items, while mapping pattern does not.
- Patterns can be surrounded with parentheses to override precedence ('|' > 'as' > ',').
- Built-in types allow a single positional pattern that is matched against the entire object.
- All names that are bound in the matching case, as well as variables initialized in its block, are visible after the match statement.
Example
python
>>> from pathlib import Path
>>> match Path('/home/gto/python-cheatsheet/README.md'):
...     case Path(
...         parts=['/', 'home', user, *_],
...         stem=stem,
...         suffix=('.md' | '.txt') as suffix
...     ) if stem.lower() == 'readme':
...         print(f'{stem}{suffix} is a readme file that belongs to user {user}.')
README.md is a readme file that belongs to user gto.
#Logging
python
import logging
logging.basicConfig(filename=<path>, level='DEBUG') # Configures the root logger (see Setup).
logging.debug/info/warning/error/critical(<str>) # Logs to the root logger.
<Logger> = logging.getLogger(__name__) # Logger named after the module.
<Logger>.<level>(<str>) # Logs to the logger.
<Logger>.exception(<str>) # Error() that appends caught exception.
Setup
python
logging.basicConfig(
    filename=None,                                   # Logs to console (stderr) by default.
    format='%(levelname)s:%(name)s:%(message)s',     # Add '%(asctime)s' for local datetime.
    level=logging.WARNING,                           # Drops messages with lower priority.
    handlers=[logging.StreamHandler(sys.stderr)]     # Uses FileHandler if filename is set.
)
<Formatter> = logging.Formatter('<format>') # Creates a Formatter.
<Handler> = logging.FileHandler(<path>, mode='a') # Creates a Handler. Also `encoding=None`.
<Handler>.setFormatter(<Formatter>) # Adds Formatter to the Handler.
<Handler>.setLevel(<int/str>) # Processes all messages by default.
<Logger>.addHandler(<Handler>) # Adds Handler to the Logger.
<Logger>.setLevel(<int/str>) # What is sent to its/ancestors' handlers.
<Logger>.propagate = <bool> # Cuts off ancestors' handlers if false.
- Parent logger can be specified by naming the child logger '<parent>.<name>'.
- If logger doesn't have a set level, it inherits it from the first ancestor that does.
- Formatter also accepts: pathname, filename, funcName, lineno, thread and process.
- A 'handlers.RotatingFileHandler' creates and deletes log files based on 'maxBytes' and 'backupCount' arguments.
Creates a logger that writes all messages to file and sends them to the root's handler that prints warnings or higher:
python
>>> logger = logging.getLogger('my_module')
>>> handler = logging.FileHandler('test.log', encoding='utf-8')
>>> handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s:%(name)s:%(message)s'))
>>> logger.addHandler(handler)
>>> logger.setLevel('DEBUG')
>>> logging.basicConfig()
>>> logging.root.handlers[0].setLevel('WARNING')
>>> logger.critical('Running out of disk space.')
CRITICAL:my_module:Running out of disk space.
>>> print(open('test.log').read())
2023-02-07 23:21:01,430 CRITICAL:my_module:Running out of disk space.
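The parent/child naming and level inheritance described in the notes above can be checked directly. This is a small sketch; the logger names 'app' and 'app.db' are made up for the example.

```python
import logging

parent = logging.getLogger('app')          # Hypothetical logger names.
child = logging.getLogger('app.db')        # Becomes a child of 'app' because of the dotted name.

parent.setLevel(logging.ERROR)
inherited = child.getEffectiveLevel()      # Child has no level of its own, so it uses the parent's.

child.setLevel('DEBUG')
overridden = child.getEffectiveLevel()     # Its own level now takes precedence.

print(child.parent is parent, inherited, overridden)   # True 40 10
```

The numbers are the integer values of `logging.ERROR` (40) and `logging.DEBUG` (10).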
#Introspection
python
<list> = dir() # Names of local variables, functions, classes, etc.
<dict> = vars() # Dict of local variables, etc. Also locals().
<dict> = globals() # Dict of global vars, etc. (incl. '__builtins__').
<list> = dir(<object>) # Names of object's attributes (including methods).
<dict> = vars(<object>) # Dict of writable attributes. Also <obj>.__dict__.
<bool> = hasattr(<object>, '<attr_name>') # Checks if getattr() raises an AttributeError.
value = getattr(<object>, '<attr_name>') # Default value can be passed as the third argument.
setattr(<object>, '<attr_name>', value) # Only works on objects with '__dict__' attribute.
delattr(<object>, '<attr_name>') # Same. Also `del <object>.<attr_name>`.
<Sig> = inspect.signature(<function>) # Function's Signature object.
<dict> = <Sig>.parameters # Dict of Parameter objects.
<memb> = <Param>.kind # Member of ParameterKind enum.
<obj> = <Param>.default # Default value or Parameter.empty.
<type> = <Param>.annotation # Type or Parameter.empty.
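As an illustration of the calls listed above, the sketch below inspects a function signature and sets an attribute dynamically. The `connect()` function and the `Config` class are hypothetical and exist only for this example.

```python
import inspect

def connect(host, port=5432, *, timeout: float = 3.0):   # Hypothetical function to inspect.
    ...

sig = inspect.signature(connect)
for name, param in sig.parameters.items():
    has_default = param.default is not inspect.Parameter.empty
    print(name, param.kind.name, param.default if has_default else '<no default>')

class Config:                                  # Plain class, so instances have a '__dict__'.
    pass

cfg = Config()
setattr(cfg, 'debug', True)                    # Same as `cfg.debug = True`.
print(vars(cfg))                               # {'debug': True}
print(getattr(cfg, 'missing', 'n/a'))          # Third argument is returned when attribute is absent.
```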
#Coroutines
- Coroutines have a lot in common with threads, but unlike threads, they only give up control when they call another coroutine and they don't use as much memory.
- Coroutine definition starts with 'async' and its call with 'await'.
- 'asyncio.run(<coroutine>)' is the main entry point for asynchronous programs.
python
import asyncio as aio
<coro> = <async_func>(<args>) # Creates a coroutine.
<obj> = await <coroutine> # Starts the coroutine and returns result.
<task> = aio.create_task(<coroutine>) # Schedules coroutine for execution.
<obj> = await <task> # Returns result. Also <task>.cancel().
<coro> = aio.gather(<coro/task>, ...) # Schedules coroutines. Returns results when awaited.
<coro> = aio.wait(<tasks>, ...) # `aio.ALL/FIRST_COMPLETED`. Returns (done, pending).
<iter> = aio.as_completed(<coros/tasks>) # Iter of coros. All return next result when awaited.
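A minimal sketch of how `create_task()`, `gather()` and `await` fit together. The `fetch()` coroutine is a hypothetical stand-in for real I/O; it only sleeps and returns a value.

```python
import asyncio

async def fetch(name, delay):                  # Hypothetical coroutine standing in for real I/O.
    await asyncio.sleep(delay)                 # Gives control back to the event loop.
    return name.upper()

async def main():
    task = asyncio.create_task(fetch('a', 0.02))                  # Scheduled to run in the background.
    others = await asyncio.gather(fetch('b', 0.01), fetch('c', 0))
    first = await task                                            # Waits for the task's result.
    return [first, *others]

results = asyncio.run(main())
print(results)                                 # ['A', 'B', 'C']
```

`gather()` returns results in the order the awaitables were passed, not in the order they finished.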
Runs a terminal game where you control an asterisk that must avoid numbers:
python
import asyncio, collections, curses, curses.textpad, enum, random, time

P = collections.namedtuple('P', 'x y')     # Position
D = enum.Enum('D', 'n e s w')              # Direction
W, H = 15, 7                               # Width, Height

def main(screen):
    curses.curs_set(0)                     # Makes cursor invisible.
    screen.nodelay(True)                   # Makes getch() non-blocking.
    asyncio.run(main_coroutine(screen))    # Starts running asyncio code.

async def main_coroutine(screen):
    moves = asyncio.Queue()
    state = {'*': P(0, 0), **{id_: P(W//2, H//2) for id_ in range(10)}}
    ai = [random_controller(id_, moves) for id_ in range(10)]
    mvc = [human_controller(screen, moves), model(moves, state), view(state, screen)]
    tasks = [asyncio.create_task(cor) for cor in ai + mvc]
    await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

async def random_controller(id_, moves):
    while True:
        d = random.choice(list(D))
        moves.put_nowait((id_, d))
        await asyncio.sleep(random.triangular(0.01, 0.65))

async def human_controller(screen, moves):
    while True:
        key_mappings = {258: D.s, 259: D.n, 260: D.w, 261: D.e}
        if d := key_mappings.get(screen.getch()):
            moves.put_nowait(('*', d))
        await asyncio.sleep(0.005)

async def model(moves, state):
    while state['*'] not in (state[id_] for id_ in range(10)):
        id_, d = await moves.get()
        deltas = {D.n: P(0, -1), D.e: P(1, 0), D.s: P(0, 1), D.w: P(-1, 0)}
        state[id_] = P((state[id_].x + deltas[d].x) % W, (state[id_].y + deltas[d].y) % H)

async def view(state, screen):
    offset = P(curses.COLS//2 - W//2, curses.LINES//2 - H//2)
    while True:
        screen.erase()
        curses.textpad.rectangle(screen, offset.y-1, offset.x-1, offset.y+H, offset.x+W)
        for id_, p in state.items():
            screen.addstr(offset.y + (p.y - state['*'].y + H//2) % H,
                          offset.x + (p.x - state['*'].x + W//2) % W, str(id_))
        screen.refresh()
        await asyncio.sleep(0.005)

if __name__ == '__main__':
    curses.wrapper(main)
Libraries
#Progress Bar
python
# $ pip3 install tqdm
>>> import tqdm, time
>>> for el in tqdm.tqdm([1, 2, 3], desc='Processing'):
...     time.sleep(1)
Processing: 100%|████████████████████| 3/3 [00:03<00:00, 1.00s/it]
#Plot
python
# $ pip3 install matplotlib
import matplotlib.pyplot as plt

plt.plot/bar/scatter(x_data, y_data [, label=<str>]) # Or: plt.plot(y_data)
plt.legend() # Adds a legend.
plt.savefig(<path>) # Saves the figure.
plt.show() # Displays the figure.
plt.clf() # Clears the figure.
#Table
Prints a CSV spreadsheet to the console:
python
# $ pip3 install tabulate
import csv, tabulate
with open('test.csv', encoding='utf-8', newline='') as file:
    rows = list(csv.reader(file))
print(tabulate.tabulate(rows, headers='firstrow'))
#Curses
Runs a basic file explorer in the console:
python
# $ pip3 install windows-curses
import curses, os
from curses import A_REVERSE, KEY_DOWN, KEY_UP, KEY_LEFT, KEY_RIGHT, KEY_ENTER

def main(screen):
    ch, first, selected, paths = 0, 0, 0, os.listdir()
    while ch != ord('q'):
        height, width = screen.getmaxyx()
        screen.erase()
        for y, filename in enumerate(paths[first : first+height]):
            color = A_REVERSE if filename == paths[selected] else 0
            screen.addnstr(y, 0, filename, width-1, color)
        ch = screen.getch()
        selected += (ch == KEY_DOWN) - (ch == KEY_UP)
        selected = max(0, min(len(paths)-1, selected))
        first += (selected >= first + height) - (selected < first)
        if ch in [KEY_LEFT, KEY_RIGHT, KEY_ENTER, ord('\n'), ord('\r')]:
            new_dir = '..' if ch == KEY_LEFT else paths[selected]
            if os.path.isdir(new_dir):
                os.chdir(new_dir)
                first, selected, paths = 0, 0, os.listdir()

if __name__ == '__main__':
    curses.wrapper(main)
#PySimpleGUI
A weight converter GUI application:
python
# $ pip3 install PySimpleGUI
import PySimpleGUI as sg

text_box = sg.Input(default_text='100', enable_events=True, key='-VALUE-')
dropdown = sg.InputCombo(['g', 'kg', 't'], 'kg', readonly=True, enable_events=True, k='-UNIT-')
label = sg.Text('100 kg is 220.462 lbs.', key='-OUTPUT-')
button = sg.Button('Close')
window = sg.Window('Weight Converter', [[text_box, dropdown], [label], [button]])

while True:
    event, values = window.read()
    if event in [sg.WIN_CLOSED, 'Close']:
        break
    try:
        value = float(values['-VALUE-'])
    except ValueError:
        continue
    unit = values['-UNIT-']
    factors = {'g': 0.001, 'kg': 1, 't': 1000}
    lbs = value * factors[unit] / 0.45359237
    window['-OUTPUT-'].update(value=f'{value} {unit} is {lbs:g} lbs.')
window.close()
#Scraping
Scrapes Python's URL and logo from its Wikipedia page:
python
# $ pip3 install requests beautifulsoup4
import requests, bs4, os

response = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)')
document = bs4.BeautifulSoup(response.text, 'html.parser')
table = document.find('table', class_='infobox vevent')
python_url = table.find('th', text='Website').next_sibling.a['href']
logo_url = table.find('img')['src']
logo = requests.get(f'https:{logo_url}').content
filename = os.path.basename(logo_url)
with open(filename, 'wb') as file:
    file.write(logo)
print(f'{python_url}, file://{os.path.abspath(filename)}')
Selenium
Library for scraping websites with dynamic content.
python
# $ pip3 install selenium
from selenium import webdriver

<Drv> = webdriver.Chrome/Firefox/Safari/Edge() # Opens the browser. Also <Drv>.quit().
<Drv>.get('<url>') # Also <Drv>.implicitly_wait(seconds).
<El> = <Drv/El>.find_element('css selector', '<css>') # '<tag>#<id>.<class>[<attr>="<val>"]'.
<list> = <Drv/El>.find_elements('xpath', '<xpath>') # '//<tag>[@<attr>="<val>"]'.
<str> = <El>.get_attribute/get_property(<str>) # Also <El>.text/tag_name.
<El>.click/clear() # Also <El>.send_keys(<str>).
XPath, also available in the browser's console via '$x(<xpath>)' and through the lxml library:
xml
<xpath> = //<element>[/ or // <element>] # Child: /, Descendant: //, Parent: /..
<xpath> = //<element>/following::<element> # Following elements (not only siblings). Also preceding/parent/...
<element> = <tag><conditions><index> # `<tag> = */a/...`, `<index> = [1/2/...]`.
<condition> = [<sub_cond> [and/or <sub_cond>]] # For negation use `not(<sub_cond>)`.
<sub_cond> = @<attr>="<val>" # `.="<val>"` matches complete text.
<sub_cond> = contains(@<attr>, "<val>") # Is <val> a substring of attr's value?
<sub_cond> = [//]<element> # Has matching child? Descendant if //.
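A few of the expressions above can be tried without installing anything, using the standard library's `xml.etree.ElementTree`. This is only a sketch: ElementTree supports a limited subset of XPath (no `contains()` and no axes such as `following::`), and paths must be relative, hence the leading '.'. The HTML snippet is made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up document used only for illustration.
root = ET.fromstring('''
<html><body>
  <div id="menu"><a href="/home">Home</a><a href="/docs">Docs</a></div>
  <div id="main"><p>Hello</p><a href="/about">About</a></div>
</body></html>''')

all_links = [a.get('href') for a in root.findall('.//a')]              # Descendants: //
menu_links = [a.text for a in root.findall('.//div[@id="menu"]/a')]    # Attribute condition, then child: /
second = root.find('.//div[@id="menu"]/a[2]').text                     # Index starts at 1.
with_p = [d.get('id') for d in root.findall('.//div[p]')]              # Has matching child.
exact = root.find('.//a[.="About"]').get('href')                       # Matches complete text.

print(all_links, menu_links, second, with_p, exact)
```

For the full syntax, including `contains()` and axes, a library such as lxml or the browser console is needed.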
#Web
Flask is a micro web framework/server. If you just want to open an HTML file in a web browser, use 'webbrowser.open(<path>)' instead.
python
# $ pip3 install flask
import flask
app = flask.Flask(__name__)
app.run(host=None, port=None, debug=None)
- Starts the app at 'http://localhost:5000'. Use 'host="0.0.0.0"' to run externally.
- Install a WSGI server like Waitress and an HTTP server such as Nginx for better security.
- Debug mode restarts the app whenever the script changes and displays errors in the browser.
Static Request
python
@app.route('/img/<path:filename>')
def serve_file(filename):
    return flask.send_from_directory('dirname/', filename)
Dynamic Request
python
@app.route('/<sport>')
def serve_html(sport):
    return flask.render_template_string('<h1>{{title}}</h1>', title=sport)
- Use 'render_template(filename, <kwargs>)' to render a file located in the templates dir.
- To return an error code use 'abort(<int>)' and to redirect use 'redirect(<url>)'.
- 'request.args[<str>]' returns parameter from the query string (URL part after '?').
- Use 'session[key] = value' to store session data like username, etc.
REST Request
python
@app.post('/<sport>/odds')
def serve_json(sport):
    team = flask.request.form['team']
    return {'team': team, 'odds': [2.09, 3.74, 3.68]}
Starts the app in its own thread and queries its REST API:
python
# $ pip3 install requests
>>> import threading, requests
>>> threading.Thread(target=app.run, daemon=True).start()
>>> url = 'http://localhost:5000/football/odds'
>>> request_data = {'team': 'arsenal f.c.'}
>>> response = requests.post(url, data=request_data)
>>> response.json()
{'team': 'arsenal f.c.', 'odds': [2.09, 3.74, 3.68]}
#Profiling
python
from time import perf_counter
start_time = perf_counter()
...
duration_in_seconds = perf_counter() - start_time
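The same start/stop pattern can be wrapped in a context manager so that the timed code sits in a 'with' block. This is a sketch; `timed()` is a hypothetical helper, not part of the standard library.

```python
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(results, label):                 # Hypothetical helper that stores the duration in a dict.
    start_time = perf_counter()
    try:
        yield
    finally:                               # Runs even if the timed block raises.
        results[label] = perf_counter() - start_time

durations = {}
with timed(durations, 'build_list'):
    data = list(range(100_000))
print(f"{durations['build_list']:.6f} s")
```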
Timing a Snippet
python
>>> from timeit import timeit
>>> timeit('list(range(10000))', number=1000, globals=globals(), setup='pass')
0.19373
Profiling by Line
shell
$ pip3 install line_profiler
$ echo '@profile
def main():
    a = list(range(10000))
    b = set(range(10000))
main()' > test.py
$ kernprof -lv test.py
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def main():
     3         1        253.4    253.4     32.2      a = list(range(10000))
     4         1        534.1    534.1     67.8      b = set(range(10000))
Call and Flame Graphs
shell
$ apt/brew install graphviz && pip3 install gprof2dot snakeviz # Or download installer.
$ tail -n +2 test.py > tmp.py && mv tmp.py test.py # Removes first line. Redirecting straight into test.py would empty it.
$ python3 -m cProfile -o test.prof test.py # Runs built-in profiler.
$ gprof2dot --format=pstats test.prof | dot -T png -o test.png # Generates call graph.
$ xdg-open/open test.png # Displays call graph.
$ snakeviz test.prof # Displays flame graph.
Sampling and Memory Profilers
text
┏━━━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━┓
┃ pip3 install │ Type │ Target │ How to run │ Live ┃
┠──────────────┼──────────┼────────────┼───────────────────────────────┼──────┨
┃ pyinstrument │ Sampling │ CPU │ pyinstrument test.py │ × ┃
┃ py-spy │ Sampling │ CPU │ py-spy top -- python3 test.py │ ✓ ┃
┃ scalene │ Sampling │ CPU+Memory │ scalene test.py │ × ┃
┃ memray │ Tracing │ Memory │ memray run --live test.py │ ✓ ┃
┗━━━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━┛
#NumPy
Array manipulation mini-language. It can run up to one hundred times faster than the equivalent Python code. An even faster alternative that runs on a GPU is called CuPy.
python
# $ pip3 install numpy
import numpy as np
<array> = np.array(<list/list_of_lists/...>) # Returns a 1d/2d/... NumPy array.
<array> = np.zeros/ones/empty(<shape>) # Also np.full(<shape>, <el>).
<array> = np.arange(from_inc, to_exc, ±step) # Also np.linspace(start, stop, len).
<array> = np.random.randint(from_inc, to_exc, <shape>) # Also np.random.random(<shape>).
<view> = <array>.reshape(<shape>) # Also `<array>.shape = <shape>`.
<array> = <array>.flatten() # Also `<view> = <array>.ravel()`.
<view> = <array>.transpose() # Or: <array>.T
<array> = np.copy/abs/sqrt/log/int64(<array>) # Returns new array of the same shape.
<array> = <array>.sum/max/mean/argmax/all(axis) # Passed dimension gets aggregated.
<array> = np.apply_along_axis(<func>, axis, <array>) # Func can return a scalar or array.
<array> = np.concatenate(<list_of_arrays>, axis=0) # Links arrays along first axis (rows).
<array> = np.row_stack/column_stack(<list_of_arrays>) # Treats 1d arrays as rows or columns.
<array> = np.tile/repeat(<array>, <int/list> [, axis]) # Tiles array or repeats its elements.
- Shape is a tuple of dimension sizes. A 100x50 RGB image has shape (50, 100, 3).
- Axis is an index of a dimension. Leftmost dimension has index 0. Summing the RGB image along axis 2 will return a greyscale image with shape (50, 100).
Indexing
python
<el> = <2d_array>[row_index, column_index] # <3d_a>[table_i, row_i, column_i]
<1d_view> = <2d_array>[row_index] # <3d_a>[table_i, row_i]
<1d_view> = <2d_array>[:, column_index] # <3d_a>[table_i, :, column_i]
<2d_view> = <2d_array>[rows_slice, columns_slice] # <3d_a>[table_i, rows_s, columns_s]
<2d_array> = <2d_array>[row_indexes] # <3d_a>[table_i/is, row_is]
<2d_array> = <2d_array>[:, column_indexes] # <3d_a>[table_i/is, :, column_is]
<1d_array> = <2d_array>[row_indexes, column_indexes] # <3d_a>[table_i/is, row_is, column_is]
<1d_array> = <2d_array>[row_indexes, column_index] # <3d_a>[table_i/is, row_is, column_i]
<2d_bools> = <2d_array> > <el/1d/2d_array> # 1d_array must have size of a row.
<1d/2d_a> = <2d_array>[<2d/1d_bools>] # 1d_bools must have size of a column.
- Indexes should not be tuples because Python converts 'obj[i, j]' to 'obj[(i, j)]'!
- ':' returns a slice of all dimension's indexes. Omitted dimensions default to ':'.
- Any value that is broadcastable to the indexed shape can be assigned to the selection.
Broadcasting
Set of rules by which NumPy functions operate on arrays of different sizes and/or dimensions.
python
left = [[0.1], [0.6], [0.8]] # Shape: (3, 1)
right = [ 0.1 , 0.6 , 0.8 ] # Shape: (3,)
1. If array shapes differ in length, left-pad the shorter shape with ones:
python
left = [[0.1], [0.6], [0.8]] # Shape: (3, 1)
right = [[0.1 , 0.6 , 0.8]] # Shape: (1, 3) <- !
2. If any dimensions differ in size, expand the ones that have size 1 by duplicating their elements:
python
left = [[0.1, 0.1, 0.1], # Shape: (3, 3) <- !
        [0.6, 0.6, 0.6],
        [0.8, 0.8, 0.8]]

right = [[0.1, 0.6, 0.8], # Shape: (3, 3) <- !
         [0.1, 0.6, 0.8],
         [0.1, 0.6, 0.8]]
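The two rules can be expressed as a small function that works on shapes alone. This is a pure-Python sketch written for illustration (it does not use NumPy, and `broadcast_shape()` is a hypothetical name); it only computes the resulting shape and raises an error when the shapes are incompatible.

```python
from itertools import zip_longest

def broadcast_shape(left, right):
    """Returns the shape that two arrays with the passed shapes would broadcast to."""
    result = []
    # Rule 1: compare dimensions from the right, treating missing ones as size 1.
    for a, b in zip_longest(reversed(left), reversed(right), fillvalue=1):
        if a != b and 1 not in (a, b):
            raise ValueError(f'shapes {left} and {right} are not broadcastable')
        # Rule 2: a dimension of size 1 is expanded to the size of the other one.
        result.append(b if a == 1 else a)
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (3,)))           # (3, 3)
print(broadcast_shape((50, 100, 3), (3,)))     # (50, 100, 3)
```

Shapes such as (2, 3) and (3, 2) fail the check, because neither of the mismatched dimensions has size 1.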
Example
For each point returns index of its nearest point ([0.1, 0.6, 0.8] => [1, 2, 1]):
python
>>> points = np.array([0.1, 0.6, 0.8])
[ 0.1, 0.6, 0.8]
>>> wrapped_points = points.reshape(3, 1)
[[ 0.1],
[ 0.6],
[ 0.8]]
>>> distances = wrapped_points - points
[[ 0. , -0.5, -0.7],
[ 0.5, 0. , -0.2],
[ 0.7, 0.2, 0. ]]
>>> distances = np.abs(distances)
[[ 0. , 0.5, 0.7],
[ 0.5, 0. , 0.2],
[ 0.7, 0.2, 0. ]]
>>> distances[range(3), range(3)] = np.inf
[[ inf, 0.5, 0.7],
[ 0.5, inf, 0.2],
[ 0.7, 0.2, inf]]
>>> distances.argmin(1)
[1, 2, 1]
#Image
python
# $ pip3 install pillow
from PIL import Image
<Image> = Image.new('<mode>', (width, height)) # Also `color=<int/tuple/str>`.
<Image> = Image.open(<path>) # Identifies format based on file contents.
<Image> = <Image>.convert('<mode>') # Converts image to the new mode.
<Image>.save(<path>) # Selects format based on the path extension.
<Image>.show() # Opens image in the default preview app.
<int/tuple> = <Image>.getpixel((x, y)) # Returns pixel's value (its color).
<Image>.putpixel((x, y), <int/tuple>) # Updates pixel's value.
<ImagingCore> = <Image>.getdata() # Returns a flattened view of pixel values.
<Image>.putdata(<list/ImagingCore>) # Updates pixels with a copy of the sequence.
<Image>.paste(<Image>, (x, y)) # Draws passed image at specified location.
<Image> = <Image>.filter(<Filter>) # `<Filter> = ImageFilter.<name>([<args>])`
<Image> = <Enhance>.enhance(<float>) # `<Enhance> = ImageEnhance.<name>(<Image>)`
<array> = np.array(<Image>) # Creates a 2d/3d NumPy array from the image.
<Image> = Image.fromarray(np.uint8(<array>)) # Use `<array>.clip(0, 255)` to clip values.
Modes
- 'L' - 8-bit pixels, greyscale.
- 'RGB' - 3x8-bit pixels, true color.
- 'RGBA' - 4x8-bit pixels, true color with transparency mask.
- 'HSV' - 3x8-bit pixels, Hue, Saturation, Value color space.
Examples
Creates a PNG image of a rainbow gradient:
python
WIDTH, HEIGHT = 100, 100
n_pixels = WIDTH * HEIGHT
hues = (255 * i/n_pixels for i in range(n_pixels))
img = Image.new('HSV', (WIDTH, HEIGHT))
img.putdata([(int(h), 255, 255) for h in hues])
img.convert('RGB').save('test.png')
Adds noise to the PNG image and displays it:
python
from random import randint
add_noise = lambda value: max(0, min(255, value + randint(-20, 20)))
img = Image.open('test.png').convert('HSV')
img.putdata([(add_noise(h), s, v) for h, s, v in img.getdata()])
img.show()
Image Draw
python
from PIL import ImageDraw
<ImageDraw> = ImageDraw.Draw(<Image>) # Object for adding 2D graphics to the image.
<ImageDraw>.point((x, y)) # Draws a point. Truncates floats into ints.
<ImageDraw>.line((x1, y1, x2, y2 [, ...])) # To get anti-aliasing use Image's resize().
<ImageDraw>.arc((x1, y1, x2, y2), deg1, deg2) # Always draws in clockwise direction.
<ImageDraw>.rectangle((x1, y1, x2, y2)) # To rotate use Image's rotate() and paste().
<ImageDraw>.polygon((x1, y1, x2, y2, ...)) # Last point gets connected to the first.
<ImageDraw>.ellipse((x1, y1, x2, y2)) # To rotate use Image's rotate() and paste().
<ImageDraw>.text((x, y), <str>, font=<Font>) # `<Font> = ImageFont.truetype(<path>, size)`
- Use 'fill=<color>' to set the primary color.
- Use 'width=<int>' to set the width of lines or contours.
- Use 'outline=<color>' to set the color of the contours.
- Color can be an int, tuple, '#rrggbb[aa]' string or a color name.
#Animation
Creates a GIF of a bouncing ball:
python
# $ pip3 install imageio
from PIL import Image, ImageDraw
import imageio

WIDTH, HEIGHT, R = 126, 126, 10
frames = []
for velocity in range(1, 16):
    y = sum(range(velocity))
    frame = Image.new('L', (WIDTH, HEIGHT))
    draw = ImageDraw.Draw(frame)
    draw.ellipse((WIDTH/2-R, y, WIDTH/2+R, y+R*2), fill='white')
    frames.append(frame)
frames += reversed(frames[1:-1])
imageio.mimsave('test.gif', frames, duration=0.03)
#Audio
python
import wave
<Wave> = wave.open('<path>', 'rb') # Opens the WAV file.
<int> = <Wave>.getframerate() # Returns number of frames per second.
<int> = <Wave>.getnchannels() # Returns number of samples per frame.
<int> = <Wave>.getsampwidth() # Returns number of bytes per sample.
<tuple> = <Wave>.getparams() # Returns namedtuple of all parameters.
<bytes> = <Wave>.readframes(nframes) # Returns next n frames. All if -1.
<Wave> = wave.open('<path>', 'wb') # Creates/truncates a file for writing.
<Wave>.setframerate(<int>) # Pass 44100 for CD, 48000 for video.
<Wave>.setnchannels(<int>) # Pass 1 for mono, 2 for stereo.
<Wave>.setsampwidth(<int>) # Pass 2 for CD, 3 for hi-res sound.
<Wave>.setparams(<tuple>) # Sets all parameters.
<Wave>.writeframes(<bytes>) # Appends frames to the file.
- Bytes object contains a sequence of frames, each consisting of one or more samples.
- In a stereo signal, the first sample of a frame belongs to the left channel.
- Each sample consists of one or more bytes that, when converted to an integer, indicate the displacement of a speaker membrane at a given moment.
- If sample width is one byte, then the integer should be encoded unsigned.
- For all other sizes, the integer should be encoded signed with little-endian byte order.
Sample Values
text
┏━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━┯━━━━━━━━━━━┓
┃ sampwidth │ min │ zero │ max ┃
┠───────────┼───────────┼──────┼───────────┨
┃ 1 │ 0 │ 128 │ 255 ┃
┃ 2 │ -32768 │ 0 │ 32767 ┃
┃ 3 │ -8388608 │ 0 │ 8388607 ┃
┗━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━┷━━━━━━━━━━━┛
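The encoding rules and the limits in the table can be checked with `int.to_bytes()` and `int.from_bytes()`. This is a minimal sketch; `encode_sample()` and `decode_sample()` are hypothetical helper names.

```python
def encode_sample(value, sampwidth):
    """Encodes an integer sample: unsigned if one byte wide, else signed little-endian."""
    return value.to_bytes(sampwidth, 'little', signed=(sampwidth != 1))

def decode_sample(data):
    """Reverses encode_sample(), inferring the sample width from the number of bytes."""
    return int.from_bytes(data, 'little', signed=(len(data) != 1))

silence_8bit = encode_sample(128, 1)           # b'\x80'
max_16bit = encode_sample(32767, 2)            # b'\xff\x7f'
min_24bit = encode_sample(-8388608, 3)         # b'\x00\x00\x80'
print(silence_8bit, max_16bit, min_24bit)
print(decode_sample(min_24bit))                # -8388608
```

Passing a value outside the range for the given width, such as 256 with a width of 1, raises an OverflowError.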
Read Float Samples from WAV File
python
def read_wav_file(filename):
    def get_int(bytes_obj):
        an_int = int.from_bytes(bytes_obj, 'little', signed=(sampwidth != 1))
        return an_int - 128 * (sampwidth == 1)
    with wave.open(filename, 'rb') as file:
        sampwidth = file.getsampwidth()
        frames = file.readframes(-1)
    bytes_samples = (frames[i : i+sampwidth] for i in range(0, len(frames), sampwidth))
    return [get_int(b) / pow(2, sampwidth * 8 - 1) for b in bytes_samples]
Write Float Samples to WAV File
python
def write_to_wav_file(filename, float_samples, nchannels=1, sampwidth=2, framerate=44100):
    def get_bytes(a_float):
        a_float = max(-1, min(1 - 2e-16, a_float))
        a_float += sampwidth == 1
        a_float *= pow(2, sampwidth * 8 - 1)
        return int(a_float).to_bytes(sampwidth, 'little', signed=(sampwidth != 1))
    with wave.open(filename, 'wb') as file:
        file.setnchannels(nchannels)
        file.setsampwidth(sampwidth)
        file.setframerate(framerate)
        file.writeframes(b''.join(get_bytes(f) for f in float_samples))
Examples
Saves a 440 Hz sine wave to a mono WAV file:
python
from math import pi, sin
samples_f = (sin(i * 2 * pi * 440 / 44100) for i in range(100_000))
write_to_wav_file('test.wav', samples_f)
Adds noise to the mono WAV file:
python
from random import random
add_noise = lambda value: value + (random() - 0.5) * 0.03
samples_f = (add_noise(f) for f in read_wav_file('test.wav'))
write_to_wav_file('test.wav', samples_f)
Plays the WAV file:
python
# $ pip3 install simpleaudio
from simpleaudio import play_buffer
with wave.open('test.wav', 'rb') as file:
    p = file.getparams()
    frames = file.readframes(-1)
    play_buffer(frames, p.nchannels, p.sampwidth, p.framerate).wait_done()
Text to Speech
python
# $ pip3 install pyttsx3
import pyttsx3
engine = pyttsx3.init()
engine.say('Sally sells seashells by the seashore.')
engine.runAndWait()
#Synthesizer
Plays Popcorn by Gershon Kingsley:
python
# $ pip3 install simpleaudio
import array, itertools as it, math, simpleaudio

F = 44100
P1 = '71♩,69♪,,71♩,66♪,,62♩,66♪,,59♩,,,71♩,69♪,,71♩,66♪,,62♩,66♪,,59♩,,,'
P2 = '71♩,73♪,,74♩,73♪,,74♪,,71♪,,73♩,71♪,,73♪,,69♪,,71♩,69♪,,71♪,,67♪,,71♩,,,'
get_pause = lambda seconds: it.repeat(0, int(seconds * F))
sin_f = lambda i, hz: math.sin(i * 2 * math.pi * hz / F)
get_wave = lambda hz, seconds: (sin_f(i, hz) for i in range(int(seconds * F)))
get_hz = lambda note: 8.176 * 2 ** (int(note[:2]) / 12)
get_sec = lambda note: 1/4 if '♩' in note else 1/8
get_samples = lambda note: get_wave(get_hz(note), get_sec(note)) if note else get_pause(1/8)
samples_f = it.chain.from_iterable(get_samples(n) for n in (P1+P2).split(','))
samples_i = array.array('h', (int(f * 30000) for f in samples_f))
simpleaudio.play_buffer(samples_i, 1, 2, F).wait_done()
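The note strings in the synthesizer combine a two-digit pitch number with a duration symbol, and an empty item stands for a pause. The sketch below reuses the same formulas to show how one item is decoded; `parse_note()` is a hypothetical helper, and reading the pitch numbers as MIDI note numbers is an assumption based on note 69 mapping to roughly 440 Hz.

```python
def parse_note(note):
    """Returns (frequency in Hz or None for a pause, duration in seconds)."""
    if not note:
        return None, 1/8                           # Empty item is a pause of 1/8 s.
    hz = 8.176 * 2 ** (int(note[:2]) / 12)         # Each step of 12 doubles the frequency.
    return hz, (1/4 if '♩' in note else 1/8)

melody = '69♩,81♪,,'.split(',')                    # Made-up fragment: ['69♩', '81♪', '', '']
parsed = [parse_note(n) for n in melody]
total_seconds = sum(seconds for _, seconds in parsed)
print([round(hz) if hz else None for hz, _ in parsed], total_seconds)
```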
#Pygame
python
# $ pip3 install pygame
import pygame as pg

pg.init()
screen = pg.display.set_mode((500, 500))
rect = pg.Rect(240, 240, 20, 20)
while not pg.event.get(pg.QUIT):
    deltas = {pg.K_UP: (0, -20), pg.K_RIGHT: (20, 0), pg.K_DOWN: (0, 20), pg.K_LEFT: (-20, 0)}
    for event in pg.event.get(pg.KEYDOWN):
        dx, dy = deltas.get(event.key, (0, 0))
        rect = rect.move((dx, dy))
    screen.fill((0, 0, 0))
    pg.draw.rect(screen, (255, 255, 255), rect)
    pg.display.flip()
Rectangle
Object for storing rectangular coordinates.
python
<Rect> = pg.Rect(x, y, width, height) # Floats get truncated into ints.
<int> = <Rect>.x/y/centerx/centery/... # Top, right, bottom, left. Allows assignments.
<tup.> = <Rect>.topleft/center/... # Topright, bottomright, bottomleft. Same.
<Rect> = <Rect>.move((delta_x, delta_y)) # Use move_ip() to move in-place.
<bool> = <Rect>.collidepoint((x, y)) # Checks if rectangle contains the point.
<bool> = <Rect>.colliderect(<Rect>) # Checks if the two rectangles overlap.
<int> = <Rect>.collidelist(<list_of_Rect>) # Returns index of first colliding Rect or -1.
<list> = <Rect>.collidelistall(<list_of_Rect>) # Returns indexes of all colliding rectangles.
Surface
Object for representing images.
python
<Surf> = pg.display.set_mode((width, height)) # Opens new window and returns its surface.
<Surf> = pg.Surface((width, height)) # New RGB surface. RGBA if `flags=pg.SRCALPHA`.
<Surf> = pg.image.load(<path/file>) # Loads the image. Format depends on source.
<Surf> = pg.surfarray.make_surface(<np_array>) # Also `<np_arr> = surfarray.pixels3d(<Surf>)`.
<Surf> = <Surf>.subsurface(<Rect>) # Creates a new surface from the cutout.
<Surf>.fill(color) # Tuple, Color('#rrggbb[aa]') or Color(<name>).
<Surf>.set_at((x, y), color) # Updates pixel. Also <Surf>.get_at((x, y)).
<Surf>.blit(<Surf>, (x, y)) # Draws passed surface at specified location.
from pygame.transform import scale, ...
<Surf> = scale(<Surf>, (width, height)) # Returns scaled surface.
<Surf> = rotate(<Surf>, anticlock_degrees) # Returns rotated and scaled surface.
<Surf> = flip(<Surf>, x_bool, y_bool) # Returns flipped surface.
from pygame.draw import line, ...
line(<Surf>, color, (x1, y1), (x2, y2), width) # Draws a line to the surface.
arc(<Surf>, color, <Rect>, from_rad, to_rad) # Also ellipse(<Surf>, color, <Rect>, width=0).
rect(<Surf>, color, <Rect>, width=0) # Also polygon(<Surf>, color, points, width=0).
Font
python
<Font> = pg.font.Font(<path/file>, size) # Loads TTF file. Pass None for default font.
<Surf> = <Font>.render(text, antialias, color) # Background color can be specified at the end.
Sound
python
<Sound> = pg.mixer.Sound(<path/file/bytes>) # WAV file or bytes/array of signed shorts.
<Sound>.play/stop() # Also set_volume(<float>), fadeout(msec).
Basic Mario Brothers Example
python
import collections, dataclasses, enum, io, itertools as it, pygame as pg, urllib.request
from random import randint

P = collections.namedtuple('P', 'x y')          # Position
D = enum.Enum('D', 'n e s w')                   # Direction
W, H, MAX_S = 50, 50, P(5, 10)                  # Width, Height, Max speed

def main():
    def get_screen():
        pg.init()
        return pg.display.set_mode((W*16, H*16))
    def get_images():
        url = 'https://gto76.github.io/python-cheatsheet/web/mario_bros.png'
        img = pg.image.load(io.BytesIO(urllib.request.urlopen(url).read()))
        return [img.subsurface(get_rect(x, 0)) for x in range(img.get_width() // 16)]
    def get_mario():
        Mario = dataclasses.make_dataclass('Mario', 'rect spd facing_left frame_cycle'.split())
        return Mario(get_rect(1, 1), P(0, 0), False, it.cycle(range(3)))
    def get_tiles():
        border = [(x, y) for x in range(W) for y in range(H) if x in [0, W-1] or y in [0, H-1]]
        platforms = [(randint(1, W-2), randint(2, H-2)) for _ in range(W*H // 10)]
        return [get_rect(x, y) for x, y in border + platforms]
    def get_rect(x, y):
        return pg.Rect(x*16, y*16, 16, 16)
    run(get_screen(), get_images(), get_mario(), get_tiles())

def run(screen, images, mario, tiles):
    clock = pg.time.Clock()
    pressed = set()
    while not pg.event.get(pg.QUIT) and clock.tick(28):
        keys = {pg.K_UP: D.n, pg.K_RIGHT: D.e, pg.K_DOWN: D.s, pg.K_LEFT: D.w}
        pressed |= {keys.get(e.key) for e in pg.event.get(pg.KEYDOWN)}
        pressed -= {keys.get(e.key) for e in pg.event.get(pg.KEYUP)}
        update_speed(mario, tiles, pressed)
        update_position(mario, tiles)
        draw(screen, images, mario, tiles, pressed)

def update_speed(mario, tiles, pressed):
    x, y = mario.spd
    x += 2 * ((D.e in pressed) - (D.w in pressed))
    x += (x < 0) - (x > 0)
    y += 1 if D.s not in get_boundaries(mario.rect, tiles) else (D.n in pressed) * -10
    mario.spd = P(x=max(-MAX_S.x, min(MAX_S.x, x)), y=max(-MAX_S.y, min(MAX_S.y, y)))

def update_position(mario, tiles):
    x, y = mario.rect.topleft
    n_steps = max(abs(s) for s in mario.spd)
    for _ in range(n_steps):
        mario.spd = stop_on_collision(mario.spd, get_boundaries(mario.rect, tiles))
        mario.rect.topleft = x, y = x + (mario.spd.x / n_steps), y + (mario.spd.y / n_steps)

def get_boundaries(rect, tiles):
    deltas = {D.n: P(0, -1), D.e: P(1, 0), D.s: P(0, 1), D.w: P(-1, 0)}
    return {d for d, delta in deltas.items() if rect.move(delta).collidelist(tiles) != -1}

def stop_on_collision(spd, bounds):
    return P(x=0 if (D.w in bounds and spd.x < 0) or (D.e in bounds and spd.x > 0) else spd.x,
             y=0 if (D.n in bounds and spd.y < 0) or (D.s in bounds and spd.y > 0) else spd.y)

def draw(screen, images, mario, tiles, pressed):
    def get_marios_image_index():
        if D.s not in get_boundaries(mario.rect, tiles):
            return 4
        return next(mario.frame_cycle) if {D.w, D.e} & pressed else 6
    screen.fill((85, 168, 255))
    mario.facing_left = (D.w in pressed) if {D.w, D.e} & pressed else mario.facing_left
    screen.blit(images[get_marios_image_index() + mario.facing_left * 9], mario.rect)
    for t in tiles:
        screen.blit(images[18 if t.x in [0, (W-1)*16] or t.y in [0, (H-1)*16] else 19], t)
    pg.display.flip()

if __name__ == '__main__':
    main()
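The collision handling of the example can be tried without opening a window. The sketch below copies `P`, `D` and `stop_on_collision()` from the game and feeds them hand-picked speeds and boundary sets; these inputs are illustrative assumptions, not values recorded from a running game.

```python
import collections, enum

P = collections.namedtuple('P', 'x y')          # Position or speed.
D = enum.Enum('D', 'n e s w')                   # Direction.

def stop_on_collision(spd, bounds):
    """Zeroes each speed component that points into a blocked direction."""
    return P(x=0 if (D.w in bounds and spd.x < 0) or (D.e in bounds and spd.x > 0) else spd.x,
             y=0 if (D.n in bounds and spd.y < 0) or (D.s in bounds and spd.y > 0) else spd.y)

falling_right = P(x=3, y=5)                                   # Hypothetical speed.
on_floor = stop_on_collision(falling_right, {D.s})            # Floor below: only y is cleared.
in_corner = stop_on_collision(falling_right, {D.s, D.e})      # Floor and right wall: both cleared.
moving_away = stop_on_collision(P(x=-3, y=-5), {D.s, D.e})    # Moving away from both: unchanged.
print(on_floor, in_corner, moving_away)
```

A component is cleared only when its sign points at a boundary, which is why Mario can still walk away from a wall that he is touching.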
[#](#pandas "#pandas")Pandas
python
# $ pip3 install pandas matplotlib
import pandas as pd, matplotlib.pyplot as plt
Series
Ordered dictionary with a name.
python
>>> pd.Series([1, 2], index=['x', 'y'], name='a')
x    1
y    2
Name: a, dtype: int64

<Sr> = pd.Series(<list>) # Assigns RangeIndex starting at 0.
<Sr> = pd.Series(<dict>) # Takes dictionary's keys for index.
<Sr> = pd.Series(<dict/Series>, index=<list>) # Only keeps items with keys specified in index.
<el> = <Sr>.loc[key] # Or: <Sr>.iloc[index]
<Sr> = <Sr>.loc[keys] # Or: <Sr>.iloc[indexes]
<Sr> = <Sr>.loc[from_key : to_key_inclusive] # Or: <Sr>.iloc[from_i : to_i_exclusive]
<el> = <Sr>[key/index] # Or: <Sr>.key
<Sr> = <Sr>[keys/indexes] # Or: <Sr>[<keys_slice/slice>]
<Sr> = <Sr>[bools] # Or: <Sr>.loc/iloc[bools]
<Sr> = <Sr> > <el/Sr> # Returns a Series of bools.
<Sr> = <Sr> + <el/Sr> # Items with non-matching keys get value NaN.
<Sr> = pd.concat(<coll_of_Sr>) # Concats multiple series into one long Series.
<Sr> = <Sr>.combine_first(<Sr>) # Adds items that are not yet present.
<Sr>.update(<Sr>) # Updates items that are already present.
<Sr>.plot.line/area/bar/pie/hist() # Generates a Matplotlib plot.
plt.show() # Displays the plot. Also plt.savefig(<path>).
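A short sketch of the selection and alignment rules listed above, using made-up values. It assumes pandas is installed; the variable names are illustrative.

```python
import pandas as pd

sr = pd.Series([10, 20, 30], index=['x', 'y', 'z'], name='a')

by_key = sr.loc['x':'y']            # Label slice includes the end key: x and y.
by_pos = sr.iloc[0:1]               # Position slice excludes the end index: only x.
big = sr[sr > 15]                   # Boolean mask keeps y and z.

other = pd.Series([1, 2], index=['y', 'w'])
total = sr + other                  # Keys are aligned; x, z and w have no partner and become NaN.
filled = sr.combine_first(other)    # Keeps the values of sr and adds the missing key w.
print(total)
print(filled)
```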
Series --- Aggregate, Transform, Map:
python
<el> = <Sr>.sum/max/mean/idxmax/all() # Or: <Sr>.agg(lambda <Sr>: <el>)
<Sr> = <Sr>.rank/diff/cumsum/ffill/interpolate() # Or: <Sr>.agg/transform(lambda <Sr>: <Sr>)
<Sr> = <Sr>.fillna(<el>) # Or: <Sr>.agg/transform/map(lambda <el>: <el>)
>>> sr = pd.Series([2, 3], index=['x', 'y'])
x 2
y 3
┏━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┓
┃               │    'sum'    │   ['sum']   │ {'s': 'sum'}  ┃
┠───────────────┼─────────────┼─────────────┼───────────────┨
┃ sr.apply(...) │      5      │    sum  5   │     s  5      ┃
┃ sr.agg(...)   │             │             │               ┃
┗━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┛

┏━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┓
┃               │    'rank'   │   ['rank']  │ {'r': 'rank'} ┃
┠───────────────┼─────────────┼─────────────┼───────────────┨
┃ sr.apply(...) │             │      rank   │               ┃
┃ sr.agg(...)   │    x  1     │   x     1   │    r  x  1    ┃
┃               │    y  2     │   y     2   │       y  2    ┃
┗━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┛
- Keys/indexes/bools can't be tuples because 'obj[x, y]' is converted to 'obj[(x, y)]'!
- Methods ffill(), interpolate(), fillna() and dropna() accept 'inplace=True'.
- Last result has a hierarchical index. Use '<Sr>[key_1, key_2]' to get its values.
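The three argument forms from the 'sum' table can be compared side by side. This is a small sketch that reuses the `sr` defined above; the remaining variable names are illustrative.

```python
import pandas as pd

sr = pd.Series([2, 3], index=['x', 'y'])

as_scalar = sr.agg('sum')               # Single function name: returns a scalar.
as_series = sr.agg(['sum'])             # List of names: returns a Series keyed by function name.
as_named = sr.agg({'s': 'sum'})         # Dict: returns a Series keyed by the dict's keys.
ranks = sr.rank()                       # Transformation: same keys as sr, new values.
shifted = sr.map(lambda el: el + 1)     # Element-wise mapping.
print(as_scalar, as_series['sum'], as_named['s'])
```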
DataFrame
Table with labeled rows and columns.
python
>>> pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
   x  y
a  1  2
b  3  4

<DF> = pd.DataFrame(<list_of_rows>) # Rows can be either lists, dicts or series.
<DF> = pd.DataFrame(<dict_of_columns>) # Columns can be either lists, dicts or series.
<el> = <DF>.loc[row_key, column_key] # Or: <DF>.iloc[row_index, column_index]
<Sr/DF> = <DF>.loc[row_key/s] # Or: <DF>.iloc[row_index/es]
<Sr/DF> = <DF>.loc[:, column_key/s] # Or: <DF>.iloc[:, column_index/es]
<DF> = <DF>.loc[row_bools, column_bools] # Or: <DF>.iloc[row_bools, column_bools]
<Sr/DF> = <DF>[column_key/s] # Or: <DF>.column_key
<DF> = <DF>[row_bools] # Keeps rows as specified by bools.
<DF> = <DF>[<DF_of_bools>] # Assigns NaN to items that are False in bools.
<DF> = <DF> > <el/Sr/DF> # Returns DF of bools. Sr is treated as a row.
<DF> = <DF> + <el/Sr/DF> # Items with non-matching keys get value NaN.
<DF> = <DF>.set_index(column_key) # Replaces row keys with values from the column.
<DF> = <DF>.reset_index(drop=False) # Drops or moves row keys to column named index.
<DF> = <DF>.sort_index(ascending=True) # Sorts rows by row keys. Use `axis=1` for cols.
<DF> = <DF>.sort_values(column_key/s) # Sorts rows by passed column/s. Also `axis=1`.
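A brief sketch of the indexing forms above, applied to the same two-by-two table. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

el = df.loc['b', 'x']                           # Single element selected by keys.
row = df.iloc[0]                                # First row as a Series with keys x and y.
col = df['y']                                   # Column as a Series with keys a and b.
rows = df[df['x'] > 1]                          # Boolean Series selects rows: only b remains.
masked = df[df > 2]                             # Boolean DF keeps the shape; other items become NaN.
by_y = df.set_index('y')                        # Column y becomes the row keys.
desc = df.sort_values('x', ascending=False)     # Rows ordered b, a.
print(masked)
```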
DataFrame --- Merge, Join, Concat:
python
>>> l = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
x y
a 1 2
b 3 4
>>> r = pd.DataFrame([[4, 5], [6, 7]], index=['b', 'c'], columns=['y', 'z'])
y z
b 4 5
c 6 7
┏━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                        │    'outer'    │  'inner'   │   'left'   │       Description        ┃
┠────────────────────────┼───────────────┼────────────┼────────────┼──────────────────────────┨
┃ l.merge(r, on='y',     │    x   y   z  │ x   y   z  │ x   y   z  │ Merges on column if 'on' ┃
┃            how=...)    │ 0  1   2   .  │ 3   4   5  │ 1   2   .  │ or 'left/right_on' are   ┃
┃                        │ 1  3   4   5  │            │ 3   4   5  │ set, else on shared cols.┃
┃                        │ 2  .   6   7  │            │            │ Uses 'inner' by default. ┃
┠────────────────────────┼───────────────┼────────────┼────────────┼──────────────────────────┨
┃ l.join(r, lsuffix='l', │    x yl yr  z │            │ x yl yr  z │ Merges on row keys.      ┃
┃           rsuffix='r', │ a  1  2  .  . │ x yl yr  z │ 1  2  .  . │ Uses 'left' by default.  ┃
┃           how=...)     │ b  3  4  4  5 │ 3  4  4  5 │ 3  4  4  5 │ If r is a Series, it is  ┃
┃                        │ c  .  .  6  7 │            │            │ treated as a column.     ┃
┠────────────────────────┼───────────────┼────────────┼────────────┼──────────────────────────┨
┃ pd.concat([l, r],      │    x   y   z  │     y      │            │ Adds rows at the bottom. ┃
┃           axis=0,      │ a  1   2   .  │     2      │            │ Uses 'outer' by default. ┃
┃           join=...)    │ b  3   4   .  │     4      │            │ A Series is treated as a ┃
┃                        │ b  .   4   5  │     4      │            │ column. To add a row use ┃
┃                        │ c  .   6   7  │     6      │            │ pd.concat([l, DF([sr])]).┃
┠────────────────────────┼───────────────┼────────────┼────────────┼──────────────────────────┨
┃ pd.concat([l, r],      │    x  y  y  z │            │            │ Adds columns at the      ┃
┃           axis=1,      │ a  1  2  .  . │ x  y  y  z │            │ right end. Uses 'outer'  ┃
┃           join=...)    │ b  3  4  4  5 │ 3  4  4  5 │            │ by default. A Series is  ┃
┃                        │ c  .  .  6  7 │            │            │ treated as a column.     ┃
┠────────────────────────┼───────────────┼────────────┼────────────┼──────────────────────────┨
┃ l.combine_first(r)     │    x   y   z  │            │            │ Adds missing rows and    ┃
┃                        │ a  1   2   .  │            │            │ columns. Also updates    ┃
┃                        │ b  3   4   5  │            │            │ items that contain NaN.  ┃
┃                        │ c  .   6   7  │            │            │ Argument r must be a DF. ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━┛
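The rows of the table can be reproduced with the same `l` and `r` frames. This is a sketch for experimenting, with illustrative variable names; only one 'how'/'join' option is shown per call.

```python
import pandas as pd

l = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
r = pd.DataFrame([[4, 5], [6, 7]], index=['b', 'c'], columns=['y', 'z'])

merged = l.merge(r, on='y', how='outer')                 # Matches on column y; row keys are discarded.
joined = l.join(r, lsuffix='l', rsuffix='r')             # Matches on row keys; 'left' by default.
stacked = pd.concat([l, r], axis=0)                      # Rows of r go below rows of l; key b repeats.
side_by_side = pd.concat([l, r], axis=1, join='inner')   # Only the shared row key b survives.
patched = l.combine_first(r)                             # l wins wherever both frames have a value.
print(merged, joined, stacked, side_by_side, patched, sep='\n\n')
```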
DataFrame --- Aggregate, Transform, Map:
python
<Sr> = <DF>.sum/max/mean/idxmax/all() # Or: <DF>.apply/agg(lambda <Sr>: <el>)
<DF> = <DF>.rank/diff/cumsum/ffill/interpolate() # Or: <DF>.apply/agg/transform(lambda <Sr>: <Sr>)
<DF> = <DF>.fillna(<el>) # Or: <DF>.applymap(lambda <el>: <el>)
- All operations operate on columns by default. Pass 'axis=1' to process the rows instead.
python
>>> df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
x y
a 1 2
b 3 4
┏━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┓
┃                   │    'sum'    │   ['sum']   │ {'x': 'sum'}  ┃
┠───────────────────┼─────────────┼─────────────┼───────────────┨
┃ df.apply(...)     │     x  4    │       x  y  │     x  4      ┃
┃ df.agg(...)       │     y  6    │  sum  4  6  │               ┃
┗━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┛

┏━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┓
┃                   │    'rank'   │   ['rank']  │ {'x': 'rank'} ┃
┠───────────────────┼─────────────┼─────────────┼───────────────┨
┃ df.apply(...)     │             │      x    y │               ┃
┃ df.agg(...)       │      x  y   │   rank rank │        x      ┃
┃ df.transform(...) │   a  1  1   │ a    1    1 │     a  1      ┃
┃                   │   b  2  2   │ b    2    2 │     b  2      ┃
┗━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┛
- Use '<DF>[col_key_1, col_key_2][row_key]' to get the fifth result's values.
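A compact sketch of column-wise versus row-wise aggregation on the same `df`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

col_sums = df.sum()                 # One value per column.
row_sums = df.sum(axis=1)           # One value per row.
sum_table = df.agg(['sum'])         # DF with a single row labeled 'sum'.
only_x = df.agg({'x': 'sum'})       # Series that contains only column x.
ranks = df.rank()                   # Same shape as df; ranks are computed within each column.
print(sum_table)
```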
DataFrame --- Plot, Encode, Decode:
python
<DF>.plot.line/area/bar/hist/scatter/box() # Also: `x=column_key, y=column_key/s`.
plt.show() # Displays the plot. Also plt.savefig(<path>).
<DF> = pd.read_json/html('<str/path/url>') # Run `$ pip3 install beautifulsoup4 lxml`.
<DF> = pd.read_csv('<path/url>') # `header/index_col/dtype/parse_dates=<obj>`.
<DF> = pd.read_pickle/excel('<path/url>') # Use `sheet_name=None` to get all Excel sheets.
<DF> = pd.read_sql('<table/query>', <conn.>) # SQLite3/SQLAlchemy connection (see #SQLite).
<dict> = <DF>.to_dict(['d/l/s/...']) # Returns columns as dicts, lists or series.
<str> = <DF>.to_json/html/csv([<path>]) # Also to_markdown/latex([<path>]).
<DF>.to_pickle/excel(<path>) # Run `$ pip3 install "pandas[excel]" odfpy`.
<DF>.to_sql('<table_name>', <connection>) # Also `if_exists='fail/replace/append'`.
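A small round-trip sketch for the encode/decode functions. It stays in memory by using `io.StringIO` instead of a file path, which is a choice made for this example.

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

csv_text = df.to_csv()                                       # No path is passed, so a string is returned.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)   # First column becomes the row keys.

as_dicts = df.to_dict()                                      # {column: {row_key: value}}
as_lists = df.to_dict('list')                                # {column: [values]}
print(csv_text)
```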
GroupBy
Object that groups together rows of a dataframe based on the value of the passed column.
python
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))
>>> df.groupby('z').get_group(6)
   x  y  z
b  4  5  6
c  7  8  6

<GB> = <DF>.groupby(column_key/s) # Splits DF into groups based on passed column.
<DF> = <GB>.apply(<func>) # Maps each group. Func can return DF, Sr or el.
<GB> = <GB>[column_key] # Single column GB. All operations return a Sr.
<Sr> = <GB>.size() # A Sr of group sizes. Same keys as get_group().
GroupBy --- Aggregate, Transform, Map:
python
<DF> = <GB>.sum/max/mean/idxmax/all() # Or: <GB>.agg(lambda <Sr>: <el>)
<DF> = <GB>.rank/diff/cumsum/ffill() # Or: <GB>.transform(lambda <Sr>: <Sr>)
<DF> = <GB>.fillna(<el>) # Or: <GB>.transform(lambda <Sr>: <Sr>)
>>> gb = df.groupby('z'); gb.apply(print)
x y z
a 1 2 3
x y z
b 4 5 6
c 7 8 6
┏━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┓
┃                   │    'sum'    │    'rank'   │   ['rank']  │ {'x': 'rank'} ┃
┠───────────────────┼─────────────┼─────────────┼─────────────┼───────────────┨
┃ gb.agg(...)       │      x   y  │             │      x    y │               ┃
┃                   │  z          │      x  y   │   rank rank │        x      ┃
┃                   │  3   1   2  │   a  1  1   │ a    1    1 │     a  1      ┃
┃                   │  6  11  13  │   b  1  1   │ b    1    1 │     b  1      ┃
┃                   │             │   c  2  2   │ c    2    2 │     c  2      ┃
┠───────────────────┼─────────────┼─────────────┼─────────────┼───────────────┨
┃ gb.transform(...) │      x   y  │      x  y   │             │               ┃
┃                   │  a   1   2  │   a  1  1   │             │               ┃
┃                   │  b  11  13  │   b  1  1   │             │               ┃
┃                   │  c  11  13  │   c  2  2   │             │               ┃
┗━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┛
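The difference between aggregating and transforming a GroupBy can be checked with the same three-row `df`. A minimal sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))
gb = df.groupby('z')

sizes = gb.size()                   # Rows per group: one row with z=3, two rows with z=6.
totals = gb.sum()                   # One row per group, keyed by the values of z.
spread = gb.transform('sum')        # Group totals repeated for every original row.
x_max = gb['x'].max()               # Single-column GroupBy returns a Series.
print(totals)
print(spread)
```

Aggregation returns one row per group, while transformation keeps the original row keys.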
Rolling
Object for rolling window calculations.
python
<RSr/RDF/RGB> = <Sr/DF/GB>.rolling(win_size) # Also: `min_periods=None, center=False`.
<RSr/RDF/RGB> = <RDF/RGB>[column_key/s] # Or: <RDF/RGB>.column_key
<Sr/DF> = <R>.mean/sum/max() # Or: <R>.apply/agg(<agg_func/str>)
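A short sketch of how the window size, `min_periods` and `center` affect the result, using a made-up four-item series:

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4])

means = sr.rolling(2).mean()                    # First window is incomplete, so it yields NaN.
partial = sr.rolling(2, min_periods=1).sum()    # Incomplete windows are allowed.
centered = sr.rolling(3, center=True).max()     # Window is centered on each item.
print(means.tolist(), partial.tolist(), centered.tolist())
```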
[#](#plotly "#plotly")Plotly
python
# $ pip3 install pandas plotly kaleido
import pandas as pd, plotly.express as ex
<Figure> = ex.line(<DF>, x=<col_name>, y=<col_name>) # Or: ex.line(x=<list>, y=<list>)
<Figure>.update_layout(margin=dict(t=0, r=0, b=0, l=0), ...) # `paper_bgcolor='rgb(0, 0, 0)'`.
<Figure>.write_html/json/image('<path>') # Also <Figure>.show().
Displays a line chart of total coronavirus deaths per million grouped by continent:
[Figure: line chart of Total Deaths per Million (y-axis, 0 to 2500) against Date (x-axis, Apr 2020 to Oct 2021), one line per Continent: South America, North America, Europe, Asia, Africa, Oceania.]
python
covid = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv',
                    usecols=['iso_code', 'date', 'total_deaths', 'population'])
continents = pd.read_csv('https://gist.githubusercontent.com/stevewithington/20a69c0b6d2ff'
                         '846ea5d35e5fc47f26c/raw/country-and-continent-codes-list-csv.csv',
                         usecols=['Three_Letter_Country_Code', 'Continent_Name'])
df = pd.merge(covid, continents, left_on='iso_code', right_on='Three_Letter_Country_Code')
df = df.groupby(['Continent_Name', 'date']).sum().reset_index()
df['Total Deaths per Million'] = df.total_deaths * 1e6 / df.population
df = df[df.date > '2020-03-14']
df = df.rename({'date': 'Date', 'Continent_Name': 'Continent'}, axis='columns')
ex.line(df, x='Date', y='Total Deaths per Million', color='Continent').show()
Displays a multi-axis line chart of total coronavirus cases and changes in prices of Bitcoin, Dow Jones and gold:
[Figure: multi-axis line chart against date (Apr 2020 to Oct 2021); left y-axis: Total Cases (0 to 250M), right y-axis: % (0 to 600); traces: Total Cases, Bitcoin, Dow Jones, Gold.]
python
import pandas as pd, plotly.graph_objects as go

def main():
    covid, bitcoin, gold, dow = scrape_data()
    display_data(wrangle_data(covid, bitcoin, gold, dow))

def scrape_data():
    def get_covid_cases():
        url = 'https://covid.ourworldindata.org/data/owid-covid-data.csv'
        df = pd.read_csv(url, usecols=['location', 'date', 'total_cases'])
        return df[df.location == 'World'].set_index('date').total_cases
    def get_ticker(symbol):
        url = (f'https://query1.finance.yahoo.com/v7/finance/download/{symbol}?'
               'period1=1579651200&period2=9999999999&interval=1d&events=history')
        df = pd.read_csv(url, usecols=['Date', 'Close'])
        return df.set_index('Date').Close
    out = get_covid_cases(), get_ticker('BTC-USD'), get_ticker('GC=F'), get_ticker('^DJI')
    return map(pd.Series.rename, out, ['Total Cases', 'Bitcoin', 'Gold', 'Dow Jones'])

def wrangle_data(covid, bitcoin, gold, dow):
    df = pd.concat([bitcoin, gold, dow], axis=1)  # Creates table by joining columns on dates.
    df = df.sort_index().interpolate()            # Sorts table by date and interpolates NaN-s.
    df = df.loc['2020-02-23':]                    # Discards rows before '2020-02-23'.
    df = (df / df.iloc[0]) * 100                  # Calculates percentages relative to day 1.
    df = df.join(covid)                           # Adds column with covid cases.
    return df.sort_values(df.index[-1], axis=1)   # Sorts columns by last day's value.

def display_data(df):
    figure = go.Figure()
    for col_name in reversed(df.columns):
        yaxis = 'y1' if col_name == 'Total Cases' else 'y2'
        trace = go.Scatter(x=df.index, y=df[col_name], name=col_name, yaxis=yaxis)
        figure.add_trace(trace)
    figure.update_layout(
        yaxis1=dict(title='Total Cases', rangemode='tozero'),
        yaxis2=dict(title='%', rangemode='tozero', overlaying='y', side='right'),
        legend=dict(x=1.08),
        width=944,
        height=423
    )
    figure.show()

if __name__ == '__main__':
    main()
[#](#appendix "#appendix")Appendix
Cython
Library that compiles Python code into C.
python
# $ pip3 install cython
import pyximport; pyximport.install()
import <cython_script>
<cython_script>.main()
Definitions:
- All 'cdef' definitions are optional, but they contribute to the speed-up.
- Script needs to be saved with a 'pyx' extension.
python
cdef <ctype> <var_name> = <el>
cdef <ctype>[n_elements] <var_name> = [<el>, <el>, ...]
cdef <ctype/void> <func_name>(<ctype> <arg_name>): ...
cdef class <class_name>:
    cdef public <ctype> <attr_name>
    def __init__(self, <ctype> <arg_name>):
        self.<attr_name> = <arg_name>
cdef enum <enum_name>: <member_name>, <member_name>, ...
Virtual Environments
System for installing libraries directly into project's directory.
shell
$ python3 -m venv <name> # Creates virtual environment in current directory.
$ source <name>/bin/activate # Activates venv. On Windows run `<name>\Scripts\activate`.
$ pip3 install <library> # Installs the library into active environment.
$ python3 <path> # Runs the script in active environment. Also `./<path>`.
$ deactivate # Deactivates the active virtual environment.
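The first shell command has a programmatic counterpart in the standard library's `venv` module. The sketch below creates an environment in a temporary directory and checks for the files that mark it as a venv. The name `demo_env` is hypothetical, and `with_pip=False` is chosen only to keep the example fast.

```python
import pathlib, tempfile, venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = pathlib.Path(tmp) / 'demo_env'    # Hypothetical environment name.
    venv.create(env_dir, with_pip=False)        # Like `python3 -m venv demo_env --without-pip`.
    has_config = (env_dir / 'pyvenv.cfg').is_file()                         # Marker file of a venv.
    has_scripts = any((env_dir / d).is_dir() for d in ('bin', 'Scripts'))   # 'Scripts' on Windows.
    print(has_config, has_scripts)
```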
Basic Script Template
python
#!/usr/bin/env python3
#
# Usage: .py
#

from sys import argv, exit
from collections import defaultdict, namedtuple
from dataclasses import make_dataclass
from enum import Enum
import functools as ft, itertools as it, operator as op, re


def main():
    pass


###
##  UTIL
#

def read_file(filename):
    with open(filename, encoding='utf-8') as file:
        return file.readlines()


if __name__ == '__main__':
    main()
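One possible way to fill in the template is sketched below as a word-counting script. The script name `word_count.py` and the helper `count_words()` are hypothetical additions for illustration; only `read_file()` comes from the template.

```python
#!/usr/bin/env python3
#
# Usage: word_count.py <path>   (hypothetical script name)
#

from sys import argv
from collections import defaultdict
import re


def main():
    if len(argv) != 2:
        print('Usage: word_count.py <path>')
        return
    for word, count in count_words(read_file(argv[1])).items():
        print(word, count)

def count_words(lines):
    """Returns a dict that maps each lowercase word to its number of occurrences."""
    counts = defaultdict(int)
    for line in lines:
        for word in re.findall(r'\w+', line.lower()):
            counts[word] += 1
    return dict(counts)


###
##  UTIL
#

def read_file(filename):
    with open(filename, encoding='utf-8') as file:
        return file.readlines()


if __name__ == '__main__':
    main()
```

Keeping the logic in `count_words()` rather than in `main()` lets it be imported and tested without touching the command line.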
March 17, 2024, Jure Šorn. Chinese translation by Yulk (yulike2017@outlook.com), March 20, 2024.