如何利用plotly和geopandas根据美国邮政编码(Zip-Code)绘制美国地图

对于我自己来说,该需求源自于分析Movielens-1m数据集的用户数据:

python 复制代码
UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117

我希望根据Zip-code计算出用户所在的州,然后在地图上显示每个州的用户数量。

那么应该这样写代码:

python 复制代码
import pandas as pd
import geopandas as gpd
import plotly.express as px
from uszipcode import SearchEngine

# 创建 SearchEngine 实例
search = SearchEngine()

# 读取用户数据集
data = pd.read_csv('./users.dat', sep='::', engine='python',
                   names=['UserID', 'Gender', 'Age', 'Occupation', 'Zip-code'])
data = data.dropna(subset=['Zip-code'])

def get_state_name(zipcode):
    result = search.by_zipcode(zipcode)
    if result is None:
        return None
    else:
        state_abbr = result.state
        return state_abbr
data['STATE_ABBR'] = data['Zip-code'].apply(get_state_name)

# 计算每个Zip-code的用户数量
zip_counts = data['STATE_ABBR'].value_counts()
zip_counts_df = zip_counts.reset_index() # 将Series转换为DataFrame
zip_counts_df.columns = ['STATE_ABBR', 'COUNT'] # 重新命名列

# 读取美国地图的shapefile
usa_map = gpd.read_file('./shapefile/USA_States.shp')

# 将Zip-code数据与地图数据进行合并
# 简称合并使用STATE_ABBR,全称合并使用STATE_NAME
zip_geo = pd.merge(usa_map, zip_counts_df, on='STATE_ABBR')

# 绘制地图
fig = px.choropleth(zip_geo,
                    locations='STATE_ABBR',
                    locationmode='USA-states',
                    color='COUNT',
                    scope='usa',
                    hover_data=['COUNT'],
                    color_continuous_scale='Reds',
                    range_color=(0, zip_geo['COUNT'].max()),
                    labels={'STATE_ABBR': 'User Count'})
fig.update_layout(title_text='Movielens User Distribution by State')
fig.show()

在上面的代码中,USA_States.shp可以在efrainmaps(https://www.efrainmaps.es/english-version/free-downloads/united-states/)下载。

效果如下,鼠标悬停到某个州,可以显示出州名称和对应的用户数量:

如果不希望显示州简称,可以创建州的简称与全称的映射,然后将Zip-code映射到州的全称,再显示地图:

python 复制代码
# 创建州的简称与全称的映射
# 该映射字典涵盖了50个州、哥伦比亚特区、5个美国领土以及3个军邮邮编简称。
state_name_dict = {
    "AL": "Alabama",
    "AK": "Alaska",
    "AZ": "Arizona",
    "AR": "Arkansas",
    "CA": "California",
    "CO": "Colorado",
    "CT": "Connecticut",
    "DE": "Delaware",
    "FL": "Florida",
    "GA": "Georgia",
    "HI": "Hawaii",
    "ID": "Idaho",
    "IL": "Illinois",
    "IN": "Indiana",
    "IA": "Iowa",
    "KS": "Kansas",
    "KY": "Kentucky",
    "LA": "Louisiana",
    "ME": "Maine",
    "MD": "Maryland",
    "MA": "Massachusetts",
    "MI": "Michigan",
    "MN": "Minnesota",
    "MS": "Mississippi",
    "MO": "Missouri",
    "MT": "Montana",
    "NE": "Nebraska",
    "NV": "Nevada",
    "NH": "New Hampshire",
    "NJ": "New Jersey",
    "NM": "New Mexico",
    "NY": "New York",
    "NC": "North Carolina",
    "ND": "North Dakota",
    "OH": "Ohio",
    "OK": "Oklahoma",
    "OR": "Oregon",
    "PA": "Pennsylvania",
    "RI": "Rhode Island",
    "SC": "South Carolina",
    "SD": "South Dakota",
    "TN": "Tennessee",
    "TX": "Texas",
    "UT": "Utah",
    "VT": "Vermont",
    "VA": "Virginia",
    "WA": "Washington",
    "WV": "West Virginia",
    "WI": "Wisconsin",
    "WY": "Wyoming",
    "DC": "District of Columbia",
    "AS": "American Samoa",
    "GU": "Guam",
    "MP": "Northern Mariana Islands",
    "PR": "Puerto Rico",
    "UM": "United States Minor Outlying Islands",
    "VI": "Virgin Islands",
    "AA": "Armed Forces Americas",
    "AE": "Armed Forces Europe",
    "AP": "Armed Forces Pacific"
}

def get_state_name(zipcode):
    result = search.by_zipcode(zipcode)
    if result is None:
        return None
    else:
        state_abbr = result.state
        state_name = state_name_dict.get(state_abbr, None)
        return state_name
data['STATE_NAME'] = data['Zip-code'].apply(get_state_name)
# 后续代码同上,注意要将STATE_ABBR替换为STATE_NAME
相关推荐
神奇夜光杯2 分钟前
Python酷库之旅-第三方库Pandas(202)
开发语言·人工智能·python·excel·pandas·标准库及第三方库·学习与成长
千天夜13 分钟前
使用UDP协议传输视频流!(分片、缓存)
python·网络协议·udp·视频流
测试界的酸菜鱼17 分钟前
Python 大数据展示屏实例
大数据·开发语言·python
羊小猪~~21 分钟前
神经网络基础--什么是正向传播??什么是方向传播??
人工智能·pytorch·python·深度学习·神经网络·算法·机器学习
放飞自我的Coder1 小时前
【python ROUGE BLEU jiaba.cut NLP常用的指标计算】
python·自然语言处理·bleu·rouge·jieba分词
正义的彬彬侠1 小时前
【scikit-learn 1.2版本后】sklearn.datasets中load_boston报错 使用 fetch_openml 函数来加载波士顿房价
python·机器学习·sklearn
张小生1802 小时前
PyCharm中 argparse 库 的使用方法
python·pycharm
秃头佛爷2 小时前
Python使用PDF相关组件案例详解
python
Dxy12393102162 小时前
python下载pdf
数据库·python·pdf
叶知安2 小时前
如何用pycharm连接sagemath?
ide·python·pycharm