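Getting Started Island (入门岛) 2: Implementing wordcount in Python and Debugging in the Cloud

InternLM (书生) large model study notes

Tasks:

1. Implement a wordcount function that counts how many times each word appears in an English string. It returns a dictionary whose keys are the words and whose values are the corresponding counts.

2. Connect VS Code to InternStudio and debug.

TIPS: Remember to strip the punctuation first, then convert every word to lowercase. There is no need to handle a wide range of punctuation marks; covering the ones that appear in the sample input is enough.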

Task 1

Python basics

1. Case conversion functions

text = text.lower()  # convert to lowercase
text = text.upper()  # convert to uppercase

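2. Removing punctuation

Either the re module or the string module can handle this.

import re
# Option 1: use the re module to delete punctuation with a regular expression
text = re.sub(r'[^\w\s]', '', text)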

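Here re.sub() replaces every match of a pattern, and r'[^\w\s]' is a regular expression that matches any single character that is neither a word character nor whitespace.

\w matches any word character: letters, digits, and the underscore. For ASCII-only text this is equivalent to [a-zA-Z0-9_]; with Python 3 str patterns it also matches Unicode letters and digits.

\s matches any whitespace character (space, tab, newline, and so on).

^ as the first character inside square brackets negates the character class.

re.sub() replaces each matched character with the empty string, which deletes it and so strips the punctuation. Because the underscore counts as a word character, this pattern leaves underscores in place.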
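
To make the behaviour of the pattern concrete, here is a minimal sketch that applies it to one sentence from the sample review. The variable names (sample, cleaned, kept_underscore) are chosen only for this illustration.

```python
import re

sample = "It's soft and super cute, and its face has a friendly look."

# Delete every character that is neither a word character nor whitespace.
cleaned = re.sub(r"[^\w\s]", "", sample)
print(cleaned)  # Its soft and super cute and its face has a friendly look

# The underscore is a word character, so the pattern does not remove it.
kept_underscore = re.sub(r"[^\w\s]", "", "snake_case, please!")
print(kept_underscore)  # snake_case please
```

Note that the apostrophe is removed as well, so "It's" turns into "Its".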

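# Option 2: use the string module to strip punctuation characters
import string
translator = str.maketrans('', '', string.punctuation)
# use translate() to remove the punctuation
text = text.translate(translator)

Specifically, string.punctuation is a string containing the ASCII punctuation characters. str.maketrans('', '', string.punctuation) builds a translation table that maps each of them to None, and str.translate() applies that table to delete them. Non-ASCII punctuation, such as full-width marks or curly quotes, is not covered.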
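
The following sketch shows what the translation table does and does not remove. The sample strings are made up for illustration.

```python
import string

# string.punctuation holds the 32 ASCII punctuation characters.
print(string.punctuation)

# Third argument of maketrans: characters to delete.
translator = str.maketrans("", "", string.punctuation)

ascii_text = "Hello, world! It's a_test."
stripped = ascii_text.translate(translator)
print(stripped)  # Hello world Its atest

# Non-ASCII punctuation (a curly apostrophe and a full-width period here)
# is not in string.punctuation, so it passes through unchanged.
curly = "It\u2019s here\u3002"
print(curly.translate(translator) == curly)  # True
```

Unlike the regular expression r'[^\w\s]', this approach also deletes the underscore, because "_" is part of string.punctuation.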

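3. Splitting the string into a list

split() divides a string on whitespace:

# split the string into a list of words
words = text.split()

split() is a string method that breaks a string into a list according to a separator. Called with no arguments, it treats any run of consecutive whitespace (spaces, tabs, newlines, and so on) as a single separator and ignores leading and trailing whitespace.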
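
The difference between split() and split(" ") is easy to miss, so here is a small sketch on a deliberately messy string (the string itself is invented for the example).

```python
messy = "  soft and\tsuper\n cute  "

# No argument: runs of whitespace collapse, leading/trailing whitespace is ignored.
by_whitespace = messy.split()
print(by_whitespace)  # ['soft', 'and', 'super', 'cute']

# Explicit single space: every space is a separator, so empty strings appear
# and tabs/newlines stay inside the pieces.
by_space = messy.split(" ")
print(by_space)
```

For word counting, the no-argument form is the one to use, since the sample text spans several lines.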

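4. Tallying the list into a dictionary

Iterate over the list: if the word is already a key in the dictionary, increment its count; otherwise add it with a count of 1.

# create an empty dictionary to hold the word counts
word_count_dict = {}
for word in words:
    if word in word_count_dict:
        word_count_dict[word] += 1
    else:
        word_count_dict[word] = 1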
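
The same tally can be written more compactly. The sketch below shows two common alternatives from the standard library, dict.get() and collections.Counter; they are not part of the original solution, just equivalent ways to express the loop above. The word list is made up for the example.

```python
from collections import Counter

words = ["it", "and", "it", "soft", "and", "it"]

# Variant 1: dict.get() returns 0 for a word that has not been seen yet.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Variant 2: Counter performs the whole tally in one call.
counter = Counter(words)

print(counts)                    # {'it': 3, 'and': 2, 'soft': 1}
print(dict(counter) == counts)   # True
```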

The complete implementation:

import re  # only needed for the commented-out regex alternative below
import string
text = """
Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and
super cute, and its face has a friendly look. It's
a bit small for what I paid though. I think there
might be other options that are bigger for the
same price. It arrived a day earlier than expected,
so I got to play with it myself before I gave it
to her.
"""
def wordcount(text):
    # convert to lowercase
    text = text.lower()
    # print(text)
    # remove punctuation, keeping only word characters and whitespace
    # text = re.sub(r'[^\w\s]', '', text)
    translator = str.maketrans('', '', string.punctuation)
    # use translate() to remove the punctuation
    text = text.translate(translator)
    # split the string into a list of words
    words = text.split()
    # create an empty dictionary to hold the word counts
    word_count_dict = {}
    for word in words:
        if word in word_count_dict:
            word_count_dict[word] += 1
        else:
            word_count_dict[word] = 1
    # print(word_count_dict)
    return word_count_dict

print(wordcount(text))

Output:
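
For an input short enough to verify by hand, the sketch below restates wordcount in compact form (same steps: lowercase, strip ASCII punctuation, split, tally) and runs it on a single sentence from the review. The compact body is an illustrative rewrite, not the version above.

```python
import string

def wordcount(text):
    """Count words after lowercasing and stripping ASCII punctuation."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

result = wordcount("It's soft and super cute, and its face has a friendly look.")
print(result)
# {'its': 2, 'soft': 1, 'and': 2, 'super': 1, 'cute': 1,
#  'face': 1, 'has': 1, 'a': 1, 'friendly': 1, 'look': 1}
```

One consequence of simply deleting punctuation is visible here: "It's" and "its" are counted as the same word, "its".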

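Task 2

1. Connect to the server and open the debugger.

2. Click Run and Debug on the left (or choose the option shown in the figure below from the top-right corner).

The interface shown in the figure below appears. The top-left panel shows the variables at the current breakpoint, including local and global variables.

The buttons on the toolbar in the middle are, in order: Continue, Step Over, Step Into, Step Out, Restart, and Stop.

Right-click a variable in the left panel to add it to the watch list.

VS Code can also attach remotely to a debug server that was started from the command line.

When choosing the debugger, select Python Debugger. When choosing the debug configuration, select Remote Attach. VS Code then asks for the address of the debug server. Because we are debugging locally, keep the defaults and press Enter at each prompt, which sets the server address to localhost:5678.

Once the configuration has been selected, it appears among the debug options the next time you start debugging.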

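Starting a debug session from the command line

Often what we need to debug is not a single simple Python file: the script may take many arguments, those arguments may point to intricate file relationships rather than simple values, or we may need to debug an entire project. In these cases, starting the debug session from the command line is the better choice.

If debugpy is not installed yet, install it first with pip install debugpy.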

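python -m debugpy --listen 5678 --wait-for-client ./python_code/temp.py

Replace ./python_code/temp.py with the Python file you want to debug. Any arguments the script takes can follow it, exactly as when launching python directly from the command line. Remember to set breakpoints in the file you want to debug and save it first.

The --wait-for-client flag makes the debug server wait until a client has connected before it starts running the script. Here that means waiting until we start debugging from the Run and Debug view.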
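
debugpy can also be started from inside a script, which has the same effect as the --listen and --wait-for-client flags. The sketch below uses debugpy.listen() and debugpy.wait_for_client(); the helper name maybe_wait_for_debugger and the WAIT_FOR_DEBUGGER environment variable are hypothetical and exist only for this example.

```python
import os

def maybe_wait_for_debugger(port=5678):
    """Start a debugpy server and block until a client attaches, if enabled.

    WAIT_FOR_DEBUGGER is a made-up opt-in switch for this sketch.
    """
    if os.environ.get("WAIT_FOR_DEBUGGER") != "1":
        return False
    import debugpy  # lazy import: normal runs do not need debugpy installed
    debugpy.listen(("localhost", port))   # same role as --listen 5678
    print(f"Waiting for debugger on localhost:{port} ...")
    debugpy.wait_for_client()             # same role as --wait-for-client
    return True

if __name__ == "__main__":
    maybe_wait_for_debugger()
    print("script body runs here")
```

With this in place, running the script with WAIT_FOR_DEBUGGER=1 set would pause until VS Code attaches through the Remote Attach configuration, while a plain run is unaffected.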

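Start the debug server in the terminal first, then go to the Run and Debug view in VS Code and click the green arrow to start debugging.

The result looks like this: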

Simplifying the command with an alias

One inconvenience is that python -m debugpy --listen 5678 --wait-for-client is long and tedious to type every time. We can define an alias for this frequently used command.

On Linux, add the following line to ~/.bashrc:

alias pyd='python -m debugpy --wait-for-client --listen 5678'

Then run:

source ~/.bashrc

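After that, the pyd command (you can choose any name you like) can be used in place of python to start a debug session from the command line, and the earlier debug command becomes:

pyd ./python_code/temp.py

Running it looks like this: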