Public Opinion Monitoring and Visualized Statistics with R

Doing opinion monitoring in R, all the way through to visualization, is overall manageable for me. The work typically involves collecting text data (social media posts, news comments), running sentiment analysis on it, and finally presenting the results as charts. The steps look simple; they are anything but.

Below is a complete worked example of opinion monitoring and visualized statistics in R. The pipeline covers text sentiment analysis and time-trend visualization:

```r
# Load the required packages
library(tidyverse)      # data wrangling and visualization (includes ggplot2)
library(tidytext)       # text analysis
library(lubridate)      # date handling
library(wordcloud2)     # word clouds
library(plotly)         # interactive charts

# 1. Generate simulated opinion data (replace with real data in production)
set.seed(123)
n <- 500  # sample size

# Simulated posts; the Chinese strings are kept as-is because the sentiment
# dictionary below matches against them
sentiment_data <- tibble(
  id = 1:n,
  content = replicate(n, paste(sample(c("产品很好服务优秀",
                                         "体验很差客服态度恶劣",
                                         "性价比高会回购",
                                         "物流慢包装破损",
                                         "功能强大界面美观",
                                         "系统卡顿更新后更糟",
                                         "售后响应及时",
                                         "虚假宣传失望透顶"),
                                       sample(1:3, 1), replace = TRUE), collapse = ",")),
  date = sample(seq.Date(as.Date("2024-01-01"), as.Date("2024-06-30"), by = "day"), n, replace = TRUE),
  source = sample(c("微博", "微信公众号", "知乎", "小红书", "抖音"), n, replace = TRUE),
  author = paste0("用户", sample(1000:9999, n))
)

# 2. Sentiment analysis
# Custom sentiment dictionary; entries must be unique, otherwise the
# left_join below duplicates matched rows
sentiment_dict <- tibble(
  word = c("很好", "优秀", "强大", "美观", "及时", "高", "满意", "回购",
           "差", "慢", "破损", "卡顿", "失望", "虚假", "糟糕", "恶劣"),
  sentiment = c(rep("正面", 8), rep("负面", 8))
)

# Sentiment scoring function
analyze_sentiment <- function(data) {
  data %>%
    unnest_tokens(word, content) %>%  # for real Chinese text, segment with jiebaR first
    left_join(sentiment_dict, by = "word") %>%
    filter(!is.na(sentiment)) %>%
    count(id, sentiment) %>%
    pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
    mutate(sentiment_score = 正面 - 负面)
}

# Run the analysis and join the scores back onto the raw posts
sentiment_results <- sentiment_data %>%
  analyze_sentiment() %>%
  right_join(sentiment_data, by = "id") %>%
  replace_na(list(正面 = 0, 负面 = 0, sentiment_score = 0)) %>%
  mutate(sentiment_label = case_when(
    sentiment_score > 0 ~ "正面",
    sentiment_score < 0 ~ "负面",
    TRUE ~ "中性"
  ))

# 3. Summary statistics
overall_stats <- sentiment_results %>%
  summarise(
    total_posts = n(),
    positive_rate = mean(sentiment_label == "正面") * 100,
    negative_rate = mean(sentiment_label == "负面") * 100,
    avg_sentiment = mean(sentiment_score)
  )

# 4. Visualization
# 4.1 Sentiment distribution pie chart
sentiment_pie <- sentiment_results %>%
  count(sentiment_label) %>%
  plot_ly(labels = ~sentiment_label, values = ~n, type = 'pie') %>%
  layout(title = "Overall sentiment distribution")

# 4.2 Sentiment trend (weekly)
sentiment_trend <- sentiment_results %>%
  mutate(week = floor_date(date, "week")) %>%
  group_by(week) %>%
  summarise(avg_score = mean(sentiment_score)) %>%
  ggplot(aes(x = week, y = avg_score)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "darkorange", size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey50") +
  labs(title = "Weekly sentiment trend", x = "Week", y = "Mean sentiment score") +
  theme_minimal()

# 4.3 Sentiment by platform
platform_sentiment <- sentiment_results %>%
  group_by(source) %>%
  summarise(positive_rate = mean(sentiment_label == "正面") * 100) %>%
  ggplot(aes(x = reorder(source, positive_rate), y = positive_rate, fill = source)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(title = "Share of positive posts by platform", x = "Platform", y = "Positive share (%)") +
  theme_minimal()

# 4.4 Keyword word cloud
word_freq <- sentiment_results %>%
  unnest_tokens(word, content) %>%
  anti_join(stop_words, by = "word") %>%  # tidytext's stop_words is English; use a Chinese list for real data
  count(word) %>%
  filter(n > 5) %>%
  arrange(desc(n))

wordcloud <- wordcloud2(word_freq, size = 1, shape = 'circle')

# 5. Print headline numbers
print(paste("Total posts:", overall_stats$total_posts))
print(paste("Positive rate:", round(overall_stats$positive_rate, 1), "%"))
print(paste("Negative rate:", round(overall_stats$negative_rate, 1), "%"))

# Render the charts
sentiment_pie
ggplotly(sentiment_trend)
platform_sentiment
wordcloud

# 6. Save key outputs (optional; saveWidget() comes from the htmlwidgets package)
# ggsave("sentiment_trend.png", plot = sentiment_trend, width = 10, height = 6)
# htmlwidgets::saveWidget(wordcloud, "wordcloud.html")
```

Notes for real-world use:

1. Data acquisition: in a real deployment, replace the simulated-data section with actual collection:

  • Platform APIs (Weibo, Twitter, Reddit, etc.)
  • Web scraping (the rvest package)
  • Database connections (RMySQL / RSQLite)
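As a sketch of the scraping route, here is a minimal rvest example. It parses an inline HTML fragment so it runs as-is; for a live site you would swap in `read_html()` with the real URL and that site's actual CSS selectors (the `.comment-text` / `.comment-date` class names below are hypothetical):

```r
library(rvest)
library(tibble)

# Inline HTML stand-in for a scraped page; the class names are placeholders
page <- minimal_html('
  <div class="comment"><span class="comment-text">物流很快</span>
    <span class="comment-date">2024-05-01</span></div>
  <div class="comment"><span class="comment-text">客服态度差</span>
    <span class="comment-date">2024-05-02</span></div>')

raw_posts <- tibble(
  content = page %>% html_elements(".comment-text") %>% html_text2(),
  date    = page %>% html_elements(".comment-date") %>% html_text2() %>% as.Date()
)
```

The resulting `raw_posts` tibble can feed straight into the `sentiment_data` shape used above (add `id`, `source`, and `author` columns as needed).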

2. Stronger sentiment analysis

  • Use a more complete lexicon (e.g., the BosonNLP sentiment dictionary)
  • Train a machine-learning model (e.g., with the text2vec package)
  • Handle negation ("不" + "好" reads as negative)
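The negation point can be illustrated with a toy rule: flip a token's polarity when the token right before it is a negator. This is a minimal base-R sketch, not a production approach (real negation scopes are longer and messier):

```r
# Toy negation handler: a scored token contributes its opposite polarity
# when the previous token is a negator ("不", "没", "无").
# Assumes the text is already segmented into tokens.
score_with_negation <- function(tokens, dict_scores) {
  negators <- c("不", "没", "无")
  scores <- ifelse(tokens %in% names(dict_scores), dict_scores[tokens], 0)
  prev_is_negator <- c(FALSE, head(tokens, -1) %in% negators)
  sum(ifelse(prev_is_negator, -scores, scores))
}

dict_scores <- c("好" = 1, "差" = -1)
score_with_negation(c("产品", "好"), dict_scores)        # "好" scores +1
score_with_negation(c("产品", "不", "好"), dict_scores)  # "不" flips "好" to -1
```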

3. Extensions

```r
# Topic clustering example
library(topicmodels)

dtm <- sentiment_results %>%
  unnest_tokens(word, content) %>%
  count(id, word) %>%
  cast_dtm(id, word, n)

lda_model <- LDA(dtm, k = 4, control = list(seed = 1234))
topics <- tidy(lda_model, matrix = "beta")  # the tidy() method for LDA objects comes from tidytext
```
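Once `topics` exists, the usual next step is pulling the highest-probability terms per topic with `slice_max()`. A self-contained sketch using a mocked-up `topics` tibble (the terms and beta values are invented for illustration):

```r
library(dplyr)
library(tibble)

# Mock of the tibble that tidy(lda_model, matrix = "beta") returns
topics <- tribble(
  ~topic, ~term,   ~beta,
  1,      "物流",  0.40,
  1,      "包装",  0.35,
  1,      "客服",  0.25,
  2,      "功能",  0.55,
  2,      "界面",  0.45
)

top_terms <- topics %>%
  group_by(topic) %>%
  slice_max(beta, n = 2) %>%  # two most probable terms per topic
  ungroup()
```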

4. Alerting

```r
# Surface negative posts, most recent first
negative_alerts <- sentiment_results %>%
  filter(sentiment_label == "负面") %>%
  arrange(desc(date)) %>%
  select(date, source, content, sentiment_score)
```
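On top of that list, a simple trigger can compare each day's negative share against a threshold; a sketch with made-up daily counts (the 30 % cutoff is an arbitrary assumption, tune it per channel):

```r
library(dplyr)
library(tibble)

# Mocked daily counts; in practice aggregate sentiment_results by date
daily <- tribble(
  ~day,          ~n_posts, ~n_negative,
  "2024-06-28",  40,        8,
  "2024-06-29",  55,       22,
  "2024-06-30",  35,        6
)

alerts <- daily %>%
  mutate(neg_rate = n_negative / n_posts) %>%
  filter(neg_rate > 0.30)  # fire an alert when negatives exceed 30% of posts
```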

What the visualizations provide:

  • Interactive sentiment trend chart (hover to read values)
  • Dynamic word cloud (supports click interaction)
  • Platform comparison bar chart (compare channels at a glance)
  • Sentiment distribution pie chart (overall mood overview)

Before running the code, make sure all required packages are installed:

```r
install.packages(c("tidyverse", "tidytext", "lubridate", "wordcloud2", "plotly"))
```

Finally, a few deployment tips: schedule the data collection to run automatically (e.g., with the cronR package), build a Shiny app for dynamic reporting, and add email alerts (e.g., with the sendmailR package). With those pieces in place the pipeline can run largely unattended.
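The scheduling piece might look like the following with cronR, assuming the pipeline above has been saved as a standalone script (the path is hypothetical); `cron_rscript()` builds the shell command and `cron_add()` registers it in the user's crontab:

```r
library(cronR)

# Register a daily 07:00 run of the monitoring script; the path is a placeholder
cmd <- cron_rscript("/home/analyst/sentiment_monitor.R")
cron_add(command = cmd, frequency = "daily", at = "07:00", id = "sentiment_monitor")
```

Because this edits the live crontab, run it once interactively rather than from the monitored script itself.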