Big Data Development: A Job Recommendation System Based on Hadoop + Spring Boot


Project Introduction

With the continued development and spread of network technology, users face a growing challenge when looking for an information management system that fits their needs. This article therefore presents a platform-based job recommendation system. On the technical side, the system is built with Java, Vue, Tomcat, Hadoop, and a MySQL database, and uses the Spring Boot framework to connect the front end and back end. Users must register an account before they can log in and use the system's features. The article also reviews the research status and significance of such recommendation platforms: as big data and artificial intelligence technologies mature, data analysis systems are becoming an increasingly important part of web applications, and the system proposed here aims to give users more efficient and accurate intelligent information services. In short, this article presents a practically applicable job recommendation system that improves substantially on traditional management approaches, demonstrates the features and functionality such a system should provide, and offers a useful reference for further research and application.

Technology Stack

Development language: Java

Framework: Spring Boot

JDK version: JDK 1.8

Server: Tomcat 7

Database: MySQL

Database tool: Navicat 11

IDE: Eclipse / MyEclipse / IDEA

Build tool: Maven
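A minimal `application.properties` sketch of how this stack might be wired together. Property names follow standard Spring Boot conventions; the schema name, credentials, and port below are assumptions, not values taken from the project:

```properties
# Assumed MySQL datasource; adjust the schema name and credentials to your setup
spring.datasource.url=jdbc:mysql://localhost:3306/job_recommend?useUnicode=true&characterEncoding=utf8
spring.datasource.username=root
spring.datasource.password=123456
spring.datasource.driver-class-name=com.mysql.jdbc.Driver

# Embedded Tomcat HTTP port
server.port=8080
```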

Functionality

This article describes a job recommendation system built on the Hadoop platform. The system follows the B/S (browser/server) architecture, with a MySQL database for data storage and the Spring Boot framework for the back end and front-end integration; users interact with the site through a browser. This design gives the system good extensibility and security. The overall architecture is shown in Figure 4-1.

Figure 4-1 System architecture diagram

Core Code

# Data-crawling file (Scrapy spider)

import scrapy
import pymysql
import pymssql
from ..items import LvyoujingdianItem
import re
import random
import platform
from urllib.parse import urlparse
import emoji

# Tourist-attraction spider (Ctrip listing pages)
class LvyoujingdianSpider(scrapy.Spider):
    name = 'lvyoujingdianSpider'
    spiderUrl = 'https://you.ctrip.com/sight/lanzhou231/s0-p{}.html'
    start_urls = spiderUrl.split(";")
    protocol = ''
    hostname = ''

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def start_requests(self):

        plat = platform.system().lower()
        if plat == 'linux' or plat == 'windows':
            connect = self.db_connect()
            cursor = connect.cursor()
            if self.table_exists(cursor, '5295r_lvyoujingdian') == 1:
                cursor.close()
                connect.close()
                self.temp_data()
                return

        pageNum = 1 + 1  # crawl pages 1 .. pageNum-1 (range upper bound is exclusive)
        for url in self.start_urls:
            if '{}' in url:
                for page in range(1, pageNum):
                    next_link = url.format(page)
                    yield scrapy.Request(
                        url=next_link,
                        callback=self.parse
                    )
            else:
                yield scrapy.Request(
                    url=url,
                    callback=self.parse
                )

    # Parse a listing page
    def parse(self, response):
        
        _url = urlparse(self.spiderUrl)
        self.protocol = _url.scheme
        self.hostname = _url.netloc
        plat = platform.system().lower()
        if plat == 'linux' or plat == 'windows':
            connect = self.db_connect()
            cursor = connect.cursor()
            if self.table_exists(cursor, '5295r_lvyoujingdian') == 1:
                cursor.close()
                connect.close()
                self.temp_data()
                return

        rows = response.css('div.list_wide_mod2 div.list_mod2')

        for item in rows:

            fields = LvyoujingdianItem()

            # Extract each field with its CSS selector and strip residual HTML
            selectors = {
                "laiyuan": 'dt a::attr(href)',
                "fengmian": 'div.leftimg a img::attr(src)',
                "biaoti": 'div.rdetailbox dl dt a::text',
                "redu": 'b.hot_score_number::text',
                "dizhi": 'dd.ellipsis::text',
                "pingfen": 'a.score strong::text',
                "pinglun": 'a.recomment::text',
                "dianping": 'p[class="bottomcomment ellipsis open_popupbox_a"]',
            }
            for key, css_rule in selectors.items():
                fields[key] = self.remove_html(item.css(css_rule).extract_first())

            # Normalise the detail-page URL to an absolute URL
            detailUrlRule = item.css('dt a::attr(href)').extract_first()
            if self.protocol in detailUrlRule:
                pass
            elif detailUrlRule.startswith('//'):
                detailUrlRule = self.protocol + ':' + detailUrlRule
            else:
                detailUrlRule = self.protocol + '://' + self.hostname + detailUrlRule
            fields["laiyuan"] = detailUrlRule

            yield scrapy.Request(url=detailUrlRule, meta={'fields': fields},  callback=self.detail_parse, dont_filter=True)


    # Parse a detail page
    def detail_parse(self, response):
        fields = response.meta['fields']

        try:
            # '官方电话' is the "official phone" block on the detail page
            fields["gfdh"] = re.findall(
                r'''<div class="baseInfoItem"><p class="baseInfoTitle">官方电话</p><p class="baseInfoText">(.*?)</p></div>''',
                response.text, re.S)[0].strip()
        except Exception:
            pass

        try:
            # Full description block; demojize turns emoji into plain ':name:' text
            fields["detail"] = emoji.demojize(
                response.css('div[class="detailModule normalModule"]').extract_first())
        except Exception:
            pass




        return fields

    # Strip residual HTML tags from an extracted value
    def remove_html(self, html):
        if html is None:
            return ''
        pattern = re.compile(r'<[^>]+>', re.S)
        return pattern.sub('', html).strip()

    # Open a database connection using values from the Scrapy settings
    def db_connect(self):
        db_type = self.settings.get('TYPE', 'mysql')
        host = self.settings.get('HOST', 'localhost')
        port = int(self.settings.get('PORT', 3306))
        user = self.settings.get('USER', 'root')
        password = self.settings.get('PASSWORD', '123456')

        try:
            database = self.databaseName
        except AttributeError:
            database = self.settings.get('DATABASE', '')

        if db_type == 'mysql':
            connect = pymysql.connect(host=host, port=port, db=database, user=user, passwd=password, charset='utf8')
        else:
            connect = pymssql.connect(host=host, user=user, password=password, database=database)

        return connect

    # Check whether a table exists in the current database
    def table_exists(self, cursor, table_name):
        cursor.execute("show tables;")
        tables = [cursor.fetchall()]
        table_list = re.findall(r"('.*?')", str(tables))
        table_list = [re.sub("'", '', each) for each in table_list]

        if table_name in table_list:
            return 1
        else:
            return 0

    # Copy crawled rows from the staging table into the main table
    def temp_data(self):

        connect = self.db_connect()
        cursor = connect.cursor()
        sql = '''
            insert into `lvyoujingdian`(
                id
                ,laiyuan
                ,fengmian
                ,biaoti
                ,redu
                ,dizhi
                ,pingfen
                ,pinglun
                ,dianping
                ,gfdh
                ,detail
            )
            select
                id
                ,laiyuan
                ,fengmian
                ,biaoti
                ,redu
                ,dizhi
                ,pingfen
                ,pinglun
                ,dianping
                ,gfdh
                ,detail
            from `5295r_lvyoujingdian`
            where(not exists (select
                id
                ,laiyuan
                ,fengmian
                ,biaoti
                ,redu
                ,dizhi
                ,pingfen
                ,pinglun
                ,dianping
                ,gfdh
                ,detail
            from `lvyoujingdian` where
                `lvyoujingdian`.id=`5295r_lvyoujingdian`.id
            ))
            limit {0}
        '''.format(random.randint(10,15))

        cursor.execute(sql)
        connect.commit()

        connect.close()
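As a quick illustration of what the spider's `remove_html` helper does, here is the same regex as a standalone, runnable sketch (the sample input string is made up for demonstration):

```python
import re

def remove_html(html):
    """Strip residual HTML tags from an extracted value, as in the spider."""
    if html is None:
        return ''
    pattern = re.compile(r'<[^>]+>', re.S)
    return pattern.sub('', html).strip()

print(remove_html('<dd class="ellipsis"> 兰州市城关区 </dd>'))  # → 兰州市城关区
print(remove_html(None))  # → '' (missing selector matches become empty strings)
```

Returning `''` for `None` matters because `extract_first()` yields `None` whenever a CSS selector matches nothing on the page.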

Database Reference

--
-- Table structure for table `chat`
--

DROP TABLE IF EXISTS `chat`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `chat` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `userid` bigint(20) NOT NULL COMMENT '用户id',
  `adminid` bigint(20) DEFAULT NULL COMMENT '管理员id',
  `ask` longtext COMMENT '提问',
  `reply` longtext COMMENT '回复',
  `isreply` int(11) DEFAULT NULL COMMENT '是否回复',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=69 DEFAULT CHARSET=utf8 COMMENT='在线咨询';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `chat`
--

LOCK TABLES `chat` WRITE;
/*!40000 ALTER TABLE `chat` DISABLE KEYS */;
INSERT INTO `chat` VALUES (61,'2024-03-30 04:52:45',1,1,'提问1','回复1',1),(62,'2024-03-30 04:52:45',2,2,'提问2','回复2',2),(63,'2024-03-30 04:52:45',3,3,'提问3','回复3',3),(64,'2024-03-30 04:52:45',4,4,'提问4','回复4',4),(65,'2024-03-30 04:52:45',5,5,'提问5','回复5',5),(66,'2024-03-30 04:52:45',6,6,'提问6','回复6',6),(67,'2024-03-30 04:52:45',7,7,'提问7','回复7',7),(68,'2024-03-30 04:52:45',8,8,'提问8','回复8',8);
/*!40000 ALTER TABLE `chat` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `config`
--

DROP TABLE IF EXISTS `config`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `config` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `name` varchar(100) NOT NULL COMMENT '配置参数名称',
  `value` varchar(100) DEFAULT NULL COMMENT '配置参数值',
  `url` varchar(500) DEFAULT NULL COMMENT 'url',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COMMENT='配置文件';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `config`
--

LOCK TABLES `config` WRITE;
/*!40000 ALTER TABLE `config` DISABLE KEYS */;
INSERT INTO `config` VALUES (1,'picture1','upload/picture1.jpg',NULL),(2,'picture2','upload/picture2.jpg',NULL),(3,'picture3','upload/picture3.jpg',NULL);
/*!40000 ALTER TABLE `config` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `discussqiyezhaopin`
--

DROP TABLE IF EXISTS `discussqiyezhaopin`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `discussqiyezhaopin` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `refid` bigint(20) NOT NULL COMMENT '关联表id',
  `userid` bigint(20) NOT NULL COMMENT '用户id',
  `avatarurl` longtext COMMENT '头像',
  `nickname` varchar(200) DEFAULT NULL COMMENT '用户名',
  `content` longtext NOT NULL COMMENT '评论内容',
  `reply` longtext COMMENT '回复内容',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='企业招聘评论表';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `discussqiyezhaopin`
--

LOCK TABLES `discussqiyezhaopin` WRITE;
/*!40000 ALTER TABLE `discussqiyezhaopin` DISABLE KEYS */;
/*!40000 ALTER TABLE `discussqiyezhaopin` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `forum`
--

DROP TABLE IF EXISTS `forum`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `forum` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `title` varchar(200) DEFAULT NULL COMMENT '帖子标题',
  `content` longtext NOT NULL COMMENT '帖子内容',
  `parentid` bigint(20) DEFAULT NULL COMMENT '父节点id',
  `userid` bigint(20) NOT NULL COMMENT '用户id',
  `username` varchar(200) DEFAULT NULL COMMENT '用户名',
  `avatarurl` longtext COMMENT '头像',
  `isdone` varchar(200) DEFAULT NULL COMMENT '状态',
  `istop` int(11) DEFAULT '0' COMMENT '是否置顶',
  `toptime` datetime DEFAULT NULL COMMENT '置顶时间',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=79 DEFAULT CHARSET=utf8 COMMENT='论坛交流';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `forum`
--

LOCK TABLES `forum` WRITE;
/*!40000 ALTER TABLE `forum` DISABLE KEYS */;
INSERT INTO `forum` VALUES (71,'2024-03-30 04:52:45','帖子标题1','帖子内容1',0,1,'用户名1','upload/forum_avatarurl1.jpg,upload/forum_avatarurl2.jpg,upload/forum_avatarurl3.jpg','开放',0,'2024-03-30 12:52:45'),(72,'2024-03-30 04:52:45','帖子标题2','帖子内容2',0,2,'用户名2','upload/forum_avatarurl2.jpg,upload/forum_avatarurl3.jpg,upload/forum_avatarurl4.jpg','开放',0,'2024-03-30 12:52:45'),(73,'2024-03-30 04:52:45','帖子标题3','帖子内容3',0,3,'用户名3','upload/forum_avatarurl3.jpg,upload/forum_avatarurl4.jpg,upload/forum_avatarurl5.jpg','开放',0,'2024-03-30 12:52:45'),(74,'2024-03-30 04:52:45','帖子标题4','帖子内容4',0,4,'用户名4','upload/forum_avatarurl4.jpg,upload/forum_avatarurl5.jpg,upload/forum_avatarurl6.jpg','开放',0,'2024-03-30 12:52:45'),(75,'2024-03-30 04:52:45','帖子标题5','帖子内容5',0,5,'用户名5','upload/forum_avatarurl5.jpg,upload/forum_avatarurl6.jpg,upload/forum_avatarurl7.jpg','开放',0,'2024-03-30 12:52:45'),(76,'2024-03-30 04:52:45','帖子标题6','帖子内容6',0,6,'用户名6','upload/forum_avatarurl6.jpg,upload/forum_avatarurl7.jpg,upload/forum_avatarurl8.jpg','开放',0,'2024-03-30 12:52:45'),(77,'2024-03-30 04:52:45','帖子标题7','帖子内容7',0,7,'用户名7','upload/forum_avatarurl7.jpg,upload/forum_avatarurl8.jpg,upload/forum_avatarurl9.jpg','开放',0,'2024-03-30 12:52:45'),(78,'2024-03-30 04:52:45','帖子标题8','帖子内容8',0,8,'用户名8','upload/forum_avatarurl8.jpg,upload/forum_avatarurl9.jpg,upload/forum_avatarurl10.jpg','开放',0,'2024-03-30 12:52:45');
/*!40000 ALTER TABLE `forum` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `gangweifenlei`
--

DROP TABLE IF EXISTS `gangweifenlei`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `gangweifenlei` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `gangweifenlei` varchar(200) DEFAULT NULL COMMENT '岗位分类',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8 COMMENT='岗位分类';
/*!40101 SET character_set_client = @saved_cs_client */;
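The spider's `temp_data` step copies rows from the staging table into the main table with an `INSERT ... SELECT ... WHERE NOT EXISTS`, so ids that were already imported are skipped. A minimal, runnable sketch of that deduplicating-insert pattern, using SQLite and two hypothetical tables in place of the project's MySQL schema:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE staging (id INTEGER PRIMARY KEY, biaoti TEXT)')
cur.execute('CREATE TABLE main (id INTEGER PRIMARY KEY, biaoti TEXT)')
cur.executemany('INSERT INTO staging VALUES (?, ?)', [(1, 'a'), (2, 'b'), (3, 'c')])
cur.execute('INSERT INTO main VALUES (?, ?)', (1, 'a'))  # id 1 was imported earlier

# Copy only rows whose id is not already present in the main table,
# mirroring the NOT EXISTS clause in temp_data()
cur.execute('''
    INSERT INTO main (id, biaoti)
    SELECT id, biaoti FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM main m WHERE m.id = s.id)
''')
conn.commit()
print(cur.execute('SELECT id FROM main ORDER BY id').fetchall())  # → [(1,), (2,), (3,)]
```

Row 1 is skipped rather than duplicated, while rows 2 and 3 are copied over.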

System Screenshots
Table of Contents

Chapter 1 Introduction

1.1 Background and significance of the topic

1.2 Research status at home and abroad

1.3 Main research content

1.4 Structure of the article

Chapter 2 Related Theory and Technology

2.1 The Spring Boot framework

2.2 Introduction to the Java language

2.3 Introduction to web-crawler technology

2.4 Introduction to Vue

2.5 Introduction to Hadoop

Chapter 3 System Requirements Analysis

3.1 Feasibility analysis

3.1.1 Technical feasibility

3.1.2 Economic feasibility

3.1.3 Operational feasibility

3.2 Non-functional requirements analysis

3.3 System use-case diagram

3.4 System flowchart

Chapter 4 System Design

4.1 Overall system architecture

4.2 Model design of the job crawler

4.3 Database design

4.3.1 Database E-R diagram design

4.3.2 Database table design

Chapter 5 System Implementation

5.1 Registration and login module

5.2 User back-end function modules

5.3 Administrator back-end modules

5.4 Job recommendation dashboard

Chapter 6 System Testing

6.1 Testing methods

6.2 Test cases

Conclusion

References

Acknowledgements
