Case conversion
python
import pandas as pd
data = {
'text': ['Hello World', 'Python is Great', 'Data Science']
}
df = pd.DataFrame(data)
# capitalize(): upper-case the first character, lower-case the rest
c = df["text"].str.capitalize()
# 0 Hello world
# 1 Python is great
# 2 Data science
# Name: text, dtype: object
# upper(): convert every character to upper case
c = df["text"].str.upper()
# 0 HELLO WORLD
# 1 PYTHON IS GREAT
# 2 DATA SCIENCE
# title(): upper-case the first letter of every word
c = df["text"].str.title()
# 0 Hello World
# 1 Python Is Great
# 2 Data Science
# lower(): convert every character to lower case
c = df["text"].str.lower()
# 0 hello world
# 1 python is great
# 2 data science
# swapcase(): swap upper case and lower case
c = df["text"].str.swapcase()
# 0 hELLO wORLD
# 1 pYTHON IS gREAT
# 2 dATA sCIENCE
# casefold(): more aggressive lower-casing, intended for caseless comparison
c = df["text"].str.casefold()
# 0 hello world
# 1 python is great
# 2 data science
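For plain ASCII text, lower() and casefold() give the same result, as the outputs above show. The difference only appears with certain non-ASCII characters. A minimal sketch, using hypothetical sample data (the German words are not from the original example):

```python
import pandas as pd

# Hypothetical sample data chosen to expose the difference
s = pd.Series(["Straße", "STRASSE"])

lowered = s.str.lower()     # "ß" is left unchanged
folded = s.str.casefold()   # "ß" is folded to "ss"

print(lowered.tolist())  # ['straße', 'strasse']
print(folded.tolist())   # ['strasse', 'strasse']
```

If the goal is to compare strings regardless of case, casefold() is usually the safer choice.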
String concatenation and splitting
python
c = df["text"].str.cat(sep=";")
# Hello World;Python is Great;Data Science
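With no other arguments, str.cat joins all values of the column into a single string, separated here by semicolons.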
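str.cat can also concatenate two columns element by element when another Series is passed as the first argument. The sketch below assumes a hypothetical frame with "first" and "last" columns; na_rep controls how missing values are rendered.

```python
import pandas as pd

# Hypothetical data: one missing first name
names = pd.DataFrame({
    "first": ["Ada", "Alan", None],
    "last": ["Lovelace", "Turing", "Hopper"],
})

# Element-wise concatenation; without na_rep the third row would be missing
full = names["first"].str.cat(names["last"], sep=" ", na_rep="?")
print(full.tolist())  # ['Ada Lovelace', 'Alan Turing', '? Hopper']
```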
python
sp = df["text"].str.split()
# 0 [Hello, World]
# 1 [Python, is, Great]
# 2 [Data, Science]
With no arguments, str.split splits each string on whitespace and returns a list of words per row.
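When the pieces are needed as separate columns rather than lists, expand=True may be more convenient, and n limits the number of splits. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"text": ["Hello World", "Python is Great", "Data Science"]})

# Split on the first space only and spread the pieces into columns 0 and 1
parts = df["text"].str.split(" ", n=1, expand=True)
print(parts[0].tolist())  # ['Hello', 'Python', 'Data']
print(parts[1].tolist())  # ['World', 'is Great', 'Science']

# Pick one element from each list with the .str[] indexer
first_word = df["text"].str.split().str[0]
print(first_word.tolist())  # ['Hello', 'Python', 'Data']
```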
Contains, starts with, and ends with
python
c = df["text"].str.contains('is')
# 0 False
# 1 True
# 2 False
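str.contains supports regular expressions: the pattern is treated as a regex by default, and regex=False switches to a literal substring match.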
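A short sketch of the most commonly used str.contains options. The extra None row and the version strings are hypothetical additions for illustration:

```python
import pandas as pd

s = pd.Series(["Hello World", "Python is Great", "Data Science", None])

# Regex alternation, case-insensitive; missing values are reported as False
mask = s.str.contains(r"python|data", case=False, na=False)
print(mask.tolist())  # [False, True, True, False]

# regex=False treats "." as a literal dot instead of "any character"
versions = pd.Series(["1.5", "15"])
literal = versions.str.contains(".", regex=False)
as_regex = versions.str.contains(".")
print(literal.tolist())   # [True, False]
print(as_regex.tolist())  # [True, True]
```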
python
c = df["text"].str.endswith("e")
# 0 False
# 1 False
# 2 True
c = df["text"].str.startswith("D")
# 0 False
# 1 False
# 2 True
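These methods return boolean Series, so a typical use is filtering rows. Note that startswith and endswith take literal strings; for a regex anchored at the start of the string, str.match is one option. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"text": ["Hello World", "Python is Great", "Data Science"]})

# Keep only the rows whose text ends with "e"
ends_with_e = df[df["text"].str.endswith("e")]
print(ends_with_e["text"].tolist())  # ['Data Science']

# str.match applies a regex at the beginning of each string
starts_d_or_p = df[df["text"].str.match(r"[DP]")]
print(starts_d_or_p["text"].tolist())  # ['Python is Great', 'Data Science']
```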
Regex extraction
python
import pandas as pd
data = {
'text': ['Hello World', 'Python is Great', 'Data Science']
}
df = pd.DataFrame(data)
# Each capture group becomes one column; the raw string avoids invalid-escape warnings
c = df["text"].str.extract(r"(\w+) (\w+)")
print(c)
#         0        1
# 0   Hello    World
# 1  Python       is
# 2    Data  Science
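Named groups give the resulting columns readable names, and rows that do not match the pattern come back as missing values. A minimal sketch; the single-word row "Pandas" and the group names are hypothetical additions to show the non-matching case:

```python
import pandas as pd

s = pd.Series(["Hello World", "Python is Great", "Pandas"])

# Named groups become column names; "Pandas" has no second word, so it does not match
words = s.str.extract(r"(?P<first>\w+) (?P<second>\w+)")

print(words.columns.tolist())          # ['first', 'second']
print(words["first"].isna().tolist())  # [False, False, True]
print(words.loc[1].tolist())           # ['Python', 'is']
```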
References
https://pandas.pydata.org/docs/reference/api/pandas.Series.str.cat.html