Python因其简洁易用、丰富的库支持以及强大的社区,被广泛应用于机器学习与人工智能(AI)领域。本教程通过实用的代码示例和讲解,带你从零开始掌握Python在机器学习与人工智能中的基本用法。
1. 机器学习与AI的Python生态系统
Python拥有多种支持机器学习和AI的库,以下是几个核心库:
- NumPy:处理高效数组和矩阵运算。
- Pandas:提供数据操作与分析工具。
- Matplotlib/Seaborn:用于数据可视化。
- Scikit-learn:机器学习的核心库,包含分类、回归、聚类等算法。
- TensorFlow/PyTorch:深度学习框架,用于构建和训练神经网络。
安装:
bash
pip install numpy pandas matplotlib scikit-learn tensorflow
2. 数据预处理
加载数据
python
import pandas as pd
# 示例数据
data = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [5, 4, 3, 2, 1],
'Target': [1, 0, 1, 0, 1]
})
print(data)
输出:
bash
Feature1 Feature2 Target
0 1 5 1
1 2 4 0
2 3 3 1
3 4 2 0
4 5 1 1
特征缩放
归一化或标准化数据有助于提升模型性能。
python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
data = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [5, 4, 3, 2, 1],
'Target': [1, 0, 1, 0, 1]
})
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(data[['Feature1', 'Feature2']])
print(scaled_features)
输出:
bash
[[0. 1. ]
[0.25 0.75]
[0.5 0.5 ]
[0.75 0.25]
[1. 0. ]]
3. 数据可视化
利用Matplotlib
和Seaborn
绘制数据分布图。
bash
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [5, 4, 3, 2, 1],
'Target': [1, 0, 1, 0, 1]
})
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(data[['Feature1', 'Feature2']])
print(scaled_features)
# 散点图
sns.scatterplot(x='Feature1', y='Feature2', hue='Target', data=data)
plt.title('Feature Scatter Plot')
plt.show()
4. 构建第一个机器学习模型
使用Scikit-learn
实现分类模型。
拆分数据
python
import pandas as pd
from sklearn.model_selection import train_test_split
data = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [5, 4, 3, 2, 1],
'Target': [1, 0, 1, 0, 1]
})
X = data[['Feature1', 'Feature2']]
y = data['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('X_train:')
print(X_train)
print('X_test:')
print(X_test)
print('y_train:')
print(y_train)
print('y_test:')
print(y_test)
bash
X_train:
Feature1 Feature2
4 5 1
2 3 3
0 1 5
3 4 2
X_test:
Feature1 Feature2
1 2 4
y_train:
4 1
2 1
0 1
3 0
Name: Target, dtype: int64
y_test:
1 0
Name: Target, dtype: int64
训练模型
python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
data = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [5, 4, 3, 2, 1],
'Target': [1, 0, 1, 0, 1]
})
X = data[['Feature1', 'Feature2']]
y = data['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 随机森林分类器
model = RandomForestClassifier()
model.fit(X_train, y_train)
# 预测
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
python
Accuracy: 0.0
5. 深度学习与神经网络
构建一个简单的神经网络进行分类任务。
安装TensorFlow
bash
conda install tensorflow
如果安装遇到Could not solve for environment spec错误,请先执行以下命令
bash
conda create -n tf_env python=3.8
conda activate tf_env
构建模型
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# 构建神经网络
model = Sequential([
Dense(8, input_dim=2, activation='relu'),
Dense(4, activation='relu'),
Dense(1, activation='sigmoid')
])
编译与训练
python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=1)
评估模型
python
loss, accuracy = model.evaluate(X_test, y_test)
print("Loss:", loss)
print("Accuracy:", accuracy)
完整代码
python
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
data = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [5, 4, 3, 2, 1],
'Target': [1, 0, 1, 0, 1]
})
X = data[['Feature1', 'Feature2']]
y = data['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 构建神经网络
model = Sequential([
Dense(8, input_dim=2, activation='relu'),
Dense(4, activation='relu'),
Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=1)
loss, accuracy = model.evaluate(X_test, y_test)
print("Loss:", loss)
print("Accuracy:", accuracy)
输出:
bash
Epoch 1/50
4/4 [==============================] - 1s 1ms/step - loss: 0.6867 - accuracy: 0.5000
Epoch 2/50
4/4 [==============================] - 0s 997us/step - loss: 0.6493 - accuracy: 0.5000
Epoch 3/50
4/4 [==============================] - 0s 997us/step - loss: 0.6183 - accuracy: 0.5000
Epoch 4/50
4/4 [==============================] - 0s 665us/step - loss: 0.5920 - accuracy: 0.5000
Epoch 5/50
4/4 [==============================] - 0s 1ms/step - loss: 0.5702 - accuracy: 0.5000
Epoch 6/50
4/4 [==============================] - 0s 997us/step - loss: 0.5612 - accuracy: 0.7500
Epoch 7/50
4/4 [==============================] - 0s 998us/step - loss: 0.5405 - accuracy: 0.7500
Epoch 8/50
4/4 [==============================] - 0s 665us/step - loss: 0.5223 - accuracy: 0.7500
Epoch 9/50
4/4 [==============================] - 0s 1ms/step - loss: 0.5047 - accuracy: 0.7500
Epoch 10/50
4/4 [==============================] - 0s 665us/step - loss: 0.4971 - accuracy: 0.7500
Epoch 11/50
4/4 [==============================] - 0s 997us/step - loss: 0.4846 - accuracy: 0.7500
Epoch 12/50
4/4 [==============================] - 0s 997us/step - loss: 0.4762 - accuracy: 0.7500
Epoch 13/50
4/4 [==============================] - 0s 665us/step - loss: 0.4753 - accuracy: 0.7500
Epoch 14/50
4/4 [==============================] - 0s 997us/step - loss: 0.4623 - accuracy: 1.0000
Epoch 15/50
4/4 [==============================] - 0s 998us/step - loss: 0.4563 - accuracy: 1.0000
Epoch 16/50
4/4 [==============================] - 0s 998us/step - loss: 0.4530 - accuracy: 1.0000
Epoch 17/50
4/4 [==============================] - 0s 997us/step - loss: 0.4469 - accuracy: 1.0000
Epoch 18/50
4/4 [==============================] - 0s 997us/step - loss: 0.4446 - accuracy: 0.7500
Epoch 19/50
4/4 [==============================] - 0s 665us/step - loss: 0.4385 - accuracy: 0.7500
Epoch 20/50
4/4 [==============================] - 0s 998us/step - loss: 0.4355 - accuracy: 0.7500
Epoch 21/50
4/4 [==============================] - 0s 997us/step - loss: 0.4349 - accuracy: 0.7500
Epoch 22/50
4/4 [==============================] - 0s 665us/step - loss: 0.4290 - accuracy: 0.7500
Epoch 23/50
4/4 [==============================] - 0s 997us/step - loss: 0.4270 - accuracy: 0.7500
Epoch 24/50
4/4 [==============================] - 0s 997us/step - loss: 0.4250 - accuracy: 0.7500
Epoch 25/50
4/4 [==============================] - 0s 665us/step - loss: 0.4218 - accuracy: 0.7500
Epoch 26/50
4/4 [==============================] - 0s 997us/step - loss: 0.4192 - accuracy: 0.7500
Epoch 27/50
4/4 [==============================] - 0s 997us/step - loss: 0.4184 - accuracy: 0.7500
Epoch 28/50
4/4 [==============================] - 0s 665us/step - loss: 0.4152 - accuracy: 0.7500
Epoch 29/50
4/4 [==============================] - 0s 997us/step - loss: 0.4129 - accuracy: 0.7500
Epoch 30/50
4/4 [==============================] - 0s 997us/step - loss: 0.4111 - accuracy: 0.7500
Epoch 31/50
4/4 [==============================] - 0s 997us/step - loss: 0.4095 - accuracy: 0.7500
Epoch 32/50
4/4 [==============================] - 0s 997us/step - loss: 0.4070 - accuracy: 0.7500
Epoch 33/50
4/4 [==============================] - 0s 997us/step - loss: 0.4053 - accuracy: 0.7500
Epoch 34/50
4/4 [==============================] - 0s 997us/step - loss: 0.4033 - accuracy: 0.7500
Epoch 35/50
4/4 [==============================] - 0s 998us/step - loss: 0.4028 - accuracy: 0.7500
Epoch 36/50
4/4 [==============================] - 0s 997us/step - loss: 0.3998 - accuracy: 0.7500
Epoch 37/50
4/4 [==============================] - 0s 1ms/step - loss: 0.3978 - accuracy: 0.7500
Epoch 38/50
4/4 [==============================] - 0s 997us/step - loss: 0.3966 - accuracy: 0.7500
Epoch 39/50
4/4 [==============================] - 0s 665us/step - loss: 0.3946 - accuracy: 0.7500
Epoch 40/50
4/4 [==============================] - 0s 997us/step - loss: 0.3926 - accuracy: 0.7500
Epoch 41/50
4/4 [==============================] - 0s 997us/step - loss: 0.3918 - accuracy: 0.7500
Epoch 42/50
4/4 [==============================] - 0s 997us/step - loss: 0.3898 - accuracy: 0.7500
Epoch 43/50
4/4 [==============================] - 0s 997us/step - loss: 0.3877 - accuracy: 0.7500
Epoch 44/50
4/4 [==============================] - 0s 997us/step - loss: 0.3861 - accuracy: 0.7500
Epoch 45/50
4/4 [==============================] - 0s 665us/step - loss: 0.3842 - accuracy: 0.7500
Epoch 46/50
4/4 [==============================] - 0s 665us/step - loss: 0.3830 - accuracy: 0.7500
Epoch 47/50
4/4 [==============================] - 0s 997us/step - loss: 0.3815 - accuracy: 0.7500
Epoch 48/50
4/4 [==============================] - 0s 665us/step - loss: 0.3790 - accuracy: 0.7500
Epoch 49/50
4/4 [==============================] - 0s 665us/step - loss: 0.3778 - accuracy: 0.7500
Epoch 50/50
4/4 [==============================] - 0s 997us/step - loss: 0.3768 - accuracy: 0.7500
1/1 [==============================] - 0s 277ms/step - loss: 2.8638 - accuracy: 0.0000e+00
Loss: 2.863826274871826
Accuracy: 0.0
6. 数据聚类
实现一个K-Means聚类模型:
python
from sklearn.cluster import KMeans
# 数据
data_points = [[1, 2], [2, 3], [3, 4], [8, 7], [9, 8], [10, 9]]
# K-Means
kmeans = KMeans(n_clusters=2)
kmeans.fit(data_points)
# 输出聚类中心
print("Cluster Centers:", kmeans.cluster_centers_)
输出:
bash
Cluster Centers: [[9. 8.]
[2. 3.]]
7. 自然语言处理 (NLP)
使用NLTK
处理文本数据:
python
pip install nltk
文本分词
python
import nltk
nltk.download('punkt_tab')
nltk.download('punkt')
from nltk.tokenize import word_tokenize
text = "Machine learning is amazing!"
tokens = word_tokenize(text)
print(tokens)
输出:
bash
['Machine', 'learning', 'is', 'amazing', '!']
词袋模型
python
from sklearn.feature_extraction.text import CountVectorizer
texts = ["I love Python", "Python is great for AI"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
print(X.toarray())
输出:
bash
[[0 0 0 0 1 1]
[1 1 1 1 0 1]]
8. 实用案例:房价预测
python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# 加载数据集
data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target
# 数据拆分
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 模型训练
model = LinearRegression()
model.fit(X_train, y_train)
# 预测
y_pred = model.predict(X_test)
print("Model Coefficients:", model.coef_)
# 评估
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
输出:
bash
Model Coefficients: [ 4.48674910e-01 9.72425752e-03 -1.23323343e-01 7.83144907e-01
-2.02962058e-06 -3.52631849e-03 -4.19792487e-01 -4.33708065e-01]
Mean Squared Error: 0.5558915986952442
总结
本教程涵盖了Python在机器学习和人工智能领域的基础应用,从数据预处理、可视化到模型构建和评估,再到深度学习的基本实现。通过这些示例,你可以逐步掌握如何使用Python进行机器学习和AI项目开发。