Big Data Development: A Job Recommendation System Built on Hadoop + Spring Boot


Project Introduction

With the continued development and spread of network technology, users face growing challenges in finding an information management system that fits their needs. This article therefore presents a job recommendation system. On the technical side, the system is built with Java, Vue, Tomcat, Hadoop, and a MySQL database, and uses the Spring Boot framework to connect the front end and back end. Users must register an account before they can log in and use the system's functions. The article also reviews the research status and significance of job recommendation systems. As big data and artificial intelligence technology advance, data analysis systems are becoming an increasingly important part of web applications, and the system proposed here aims to provide users with more efficient and accurate intelligent information services. In short, this article introduces a job recommendation system of practical value that substantially improves on traditional management approaches; through its implementation and application, it demonstrates the characteristics and functions an efficient, accurate job recommendation system should have, and offers a useful reference for further research and application.

Technology Stack

Development language: Java

Framework: Spring Boot

JDK version: JDK 1.8

Server: Tomcat 7

Database: MySQL

Database tool: Navicat 11

IDE: Eclipse / MyEclipse / IDEA

Build tool: Maven

Function Overview

This article presents a job recommendation system based on the Hadoop platform. The system follows the B/S (browser/server) architecture, using a MySQL database for storage and the Spring Boot framework to implement data access and front-end presentation; users interact with the site through a browser. This design gives the system good extensibility and security and provides users with better service. The overall architecture is shown in Figure 4-1.

Figure 4-1 System architecture diagram
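To make the B/S round trip described above concrete, here is a minimal, self-contained Python stand-in (not the project's actual Spring Boot backend): a tiny HTTP server returns job data as JSON and a client fetches it over HTTP, just as the browser would call the real server. The `/jobs` endpoint name and the payload are illustrative assumptions.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical job payload standing in for rows served by the real backend.
JOBS = [{"id": 1, "title": "Data Engineer", "score": 0.92}]

class JobHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the job list as JSON, as a REST backend would.
        body = json.dumps(JOBS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), JobHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "browser" side: one HTTP GET, one JSON decode.
url = f"http://127.0.0.1:{server.server_port}/jobs"
with urllib.request.urlopen(url) as resp:
    jobs = json.loads(resp.read().decode("utf-8"))

server.shutdown()
print(jobs[0]["title"])  # -> Data Engineer
```

In the real system the server side is a Spring Boot controller and the client is the Vue front end, but the request/response shape is the same.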

Core Code

python
# Data-scraping spider file

import scrapy
import pymysql
import pymssql
from ..items import LvyoujingdianItem
import re
import random
import platform
from urllib.parse import urlparse
import emoji

# Tourist-attraction spider (sample crawl target)
class LvyoujingdianSpider(scrapy.Spider):
    name = 'lvyoujingdianSpider'
    spiderUrl = 'https://you.ctrip.com/sight/lanzhou231/s0-p{}.html'
    start_urls = spiderUrl.split(";")
    protocol = ''
    hostname = ''

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def start_requests(self):

        plat = platform.system().lower()
        if plat == 'linux' or plat == 'windows':
            connect = self.db_connect()
            cursor = connect.cursor()
            if self.table_exists(cursor, '5295r_lvyoujingdian') == 1:
                cursor.close()
                connect.close()
                self.temp_data()
                return

        pageNum = 1 + 1  # crawl pages 1 .. pageNum-1 (only page 1 by default)
        for url in self.start_urls:
            if '{}' in url:
                for page in range(1, pageNum):
                    next_link = url.format(page)
                    yield scrapy.Request(
                        url=next_link,
                        callback=self.parse
                    )
            else:
                yield scrapy.Request(
                    url=url,
                    callback=self.parse
                )

    # Parse the listing page
    def parse(self, response):
        
        _url = urlparse(self.spiderUrl)
        self.protocol = _url.scheme
        self.hostname = _url.netloc
        plat = platform.system().lower()
        if plat == 'linux' or plat == 'windows':
            connect = self.db_connect()
            cursor = connect.cursor()
            if self.table_exists(cursor, '5295r_lvyoujingdian') == 1:
                cursor.close()
                connect.close()
                self.temp_data()
                return

        rows = response.css('div.list_wide_mod2 div.list_mod2')

        for item in rows:

            fields = LvyoujingdianItem()

            # The generator's regex-vs-CSS switch always takes the CSS branch
            # for these selectors, so extraction reduces to plain CSS queries.
            fields["laiyuan"] = self.remove_html(item.css('dt a::attr(href)').extract_first())
            fields["fengmian"] = self.remove_html(item.css('div.leftimg a img::attr(src)').extract_first())
            fields["biaoti"] = self.remove_html(item.css('div.rdetailbox dl dt a::text').extract_first())
            fields["redu"] = self.remove_html(item.css('b.hot_score_number::text').extract_first())
            fields["dizhi"] = self.remove_html(item.css('dd.ellipsis::text').extract_first())
            fields["pingfen"] = self.remove_html(item.css('a.score strong::text').extract_first())
            fields["pinglun"] = self.remove_html(item.css('a.recomment::text').extract_first())
            fields["dianping"] = self.remove_html(item.css('p[class="bottomcomment ellipsis open_popupbox_a"]').extract_first())

            # Normalize the detail link to an absolute URL and record it as the source link.
            detailUrlRule = item.css('dt a::attr(href)').extract_first()
            if self.protocol in detailUrlRule:
                pass
            elif detailUrlRule.startswith('//'):
                detailUrlRule = self.protocol + ':' + detailUrlRule
            else:
                detailUrlRule = self.protocol + '://' + self.hostname + detailUrlRule
            fields["laiyuan"] = detailUrlRule

            yield scrapy.Request(url=detailUrlRule, meta={'fields': fields}, callback=self.detail_parse, dont_filter=True)


    # Parse the detail page
    def detail_parse(self, response):
        fields = response.meta['fields']

        try:
            # The official-phone pattern carries a capture group, so it is
            # pulled from the raw page text with a regex.
            fields["gfdh"] = re.findall(
                r'''<div class="baseInfoItem"><p class="baseInfoTitle">官方电话</p><p class="baseInfoText">(.*?)</p></div>''',
                response.text, re.S)[0].strip()
        except Exception:
            pass

        try:
            # The detail body is taken from the CSS selection and passed
            # through emoji.demojize, which rewrites emoji as text aliases
            # so they fit MySQL's 3-byte utf8 charset.
            fields["detail"] = emoji.demojize(
                response.css('div[class="detailModule normalModule"]').extract_first())
        except Exception:
            pass

        return fields

    # Strip residual HTML tags
    def remove_html(self, html):
        if html is None:
            return ''
        pattern = re.compile(r'<[^>]+>', re.S)
        return pattern.sub('', html).strip()

    # Open a database connection based on the Scrapy settings
    def db_connect(self):
        db_type = self.settings.get('TYPE', 'mysql')
        host = self.settings.get('HOST', 'localhost')
        port = int(self.settings.get('PORT', 3306))
        user = self.settings.get('USER', 'root')
        password = self.settings.get('PASSWORD', '123456')

        try:
            database = self.databaseName
        except AttributeError:
            database = self.settings.get('DATABASE', '')

        if db_type == 'mysql':
            connect = pymysql.connect(host=host, port=port, db=database, user=user, passwd=password, charset='utf8')
        else:
            connect = pymssql.connect(host=host, user=user, password=password, database=database)

        return connect

    # Check whether a table exists in the current database
    def table_exists(self, cursor, table_name):
        cursor.execute("show tables;")
        tables = [cursor.fetchall()]
        table_list = re.findall(r'(\'.*?\')', str(tables))
        table_list = [re.sub("'", '', each) for each in table_list]

        return 1 if table_name in table_list else 0

    # Copy newly scraped rows from the staging table into the main table
    def temp_data(self):

        connect = self.db_connect()
        cursor = connect.cursor()
        sql = '''
            insert into `lvyoujingdian`(
                id
                ,laiyuan
                ,fengmian
                ,biaoti
                ,redu
                ,dizhi
                ,pingfen
                ,pinglun
                ,dianping
                ,gfdh
                ,detail
            )
            select
                id
                ,laiyuan
                ,fengmian
                ,biaoti
                ,redu
                ,dizhi
                ,pingfen
                ,pinglun
                ,dianping
                ,gfdh
                ,detail
            from `5295r_lvyoujingdian`
            where(not exists (select
                id
                ,laiyuan
                ,fengmian
                ,biaoti
                ,redu
                ,dizhi
                ,pingfen
                ,pinglun
                ,dianping
                ,gfdh
                ,detail
            from `lvyoujingdian` where
                `lvyoujingdian`.id=`5295r_lvyoujingdian`.id
            ))
            limit {0}
        '''.format(random.randint(10,15))

        cursor.execute(sql)
        connect.commit()

        connect.close()
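Two pieces of the spider above can be checked outside Scrapy: the `spiderUrl.format(page)` pagination used in `start_requests`, and the tag-stripping regex used in `remove_html`. The sketch below reproduces both in plain Python.

```python
import re

spiderUrl = 'https://you.ctrip.com/sight/lanzhou231/s0-p{}.html'

# Pagination: the spider substitutes each page number into the URL template.
pageNum = 1 + 1  # crawl pages 1 .. pageNum-1
page_urls = [spiderUrl.format(page) for page in range(1, pageNum)]
print(page_urls)
# -> ['https://you.ctrip.com/sight/lanzhou231/s0-p1.html']

# Tag stripping: the same regex remove_html applies to extracted fragments.
def remove_html(html):
    if html is None:
        return ''
    return re.compile(r'<[^>]+>', re.S).sub('', html).strip()

print(remove_html('<a href="/sight/1.html"><b>兰州</b>水车博览园</a>'))
# -> 兰州水车博览园
```

Note that `<[^>]+>` only deletes tags; it does not decode HTML entities, which the spider likewise leaves untouched.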

Database Reference

sql
--
-- Table structure for table `chat`
--

DROP TABLE IF EXISTS `chat`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `chat` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `userid` bigint(20) NOT NULL COMMENT '用户id',
  `adminid` bigint(20) DEFAULT NULL COMMENT '管理员id',
  `ask` longtext COMMENT '提问',
  `reply` longtext COMMENT '回复',
  `isreply` int(11) DEFAULT NULL COMMENT '是否回复',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=69 DEFAULT CHARSET=utf8 COMMENT='在线咨询';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `chat`
--

LOCK TABLES `chat` WRITE;
/*!40000 ALTER TABLE `chat` DISABLE KEYS */;
INSERT INTO `chat` VALUES (61,'2024-03-30 04:52:45',1,1,'提问1','回复1',1),(62,'2024-03-30 04:52:45',2,2,'提问2','回复2',2),(63,'2024-03-30 04:52:45',3,3,'提问3','回复3',3),(64,'2024-03-30 04:52:45',4,4,'提问4','回复4',4),(65,'2024-03-30 04:52:45',5,5,'提问5','回复5',5),(66,'2024-03-30 04:52:45',6,6,'提问6','回复6',6),(67,'2024-03-30 04:52:45',7,7,'提问7','回复7',7),(68,'2024-03-30 04:52:45',8,8,'提问8','回复8',8);
/*!40000 ALTER TABLE `chat` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `config`
--

DROP TABLE IF EXISTS `config`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `config` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `name` varchar(100) NOT NULL COMMENT '配置参数名称',
  `value` varchar(100) DEFAULT NULL COMMENT '配置参数值',
  `url` varchar(500) DEFAULT NULL COMMENT 'url',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COMMENT='配置文件';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `config`
--

LOCK TABLES `config` WRITE;
/*!40000 ALTER TABLE `config` DISABLE KEYS */;
INSERT INTO `config` VALUES (1,'picture1','upload/picture1.jpg',NULL),(2,'picture2','upload/picture2.jpg',NULL),(3,'picture3','upload/picture3.jpg',NULL);
/*!40000 ALTER TABLE `config` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `discussqiyezhaopin`
--

DROP TABLE IF EXISTS `discussqiyezhaopin`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `discussqiyezhaopin` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `refid` bigint(20) NOT NULL COMMENT '关联表id',
  `userid` bigint(20) NOT NULL COMMENT '用户id',
  `avatarurl` longtext COMMENT '头像',
  `nickname` varchar(200) DEFAULT NULL COMMENT '用户名',
  `content` longtext NOT NULL COMMENT '评论内容',
  `reply` longtext COMMENT '回复内容',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='企业招聘评论表';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `discussqiyezhaopin`
--

LOCK TABLES `discussqiyezhaopin` WRITE;
/*!40000 ALTER TABLE `discussqiyezhaopin` DISABLE KEYS */;
/*!40000 ALTER TABLE `discussqiyezhaopin` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `forum`
--

DROP TABLE IF EXISTS `forum`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `forum` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `title` varchar(200) DEFAULT NULL COMMENT '帖子标题',
  `content` longtext NOT NULL COMMENT '帖子内容',
  `parentid` bigint(20) DEFAULT NULL COMMENT '父节点id',
  `userid` bigint(20) NOT NULL COMMENT '用户id',
  `username` varchar(200) DEFAULT NULL COMMENT '用户名',
  `avatarurl` longtext COMMENT '头像',
  `isdone` varchar(200) DEFAULT NULL COMMENT '状态',
  `istop` int(11) DEFAULT '0' COMMENT '是否置顶',
  `toptime` datetime DEFAULT NULL COMMENT '置顶时间',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=79 DEFAULT CHARSET=utf8 COMMENT='论坛交流';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `forum`
--

LOCK TABLES `forum` WRITE;
/*!40000 ALTER TABLE `forum` DISABLE KEYS */;
INSERT INTO `forum` VALUES (71,'2024-03-30 04:52:45','帖子标题1','帖子内容1',0,1,'用户名1','upload/forum_avatarurl1.jpg,upload/forum_avatarurl2.jpg,upload/forum_avatarurl3.jpg','开放',0,'2024-03-30 12:52:45'),(72,'2024-03-30 04:52:45','帖子标题2','帖子内容2',0,2,'用户名2','upload/forum_avatarurl2.jpg,upload/forum_avatarurl3.jpg,upload/forum_avatarurl4.jpg','开放',0,'2024-03-30 12:52:45'),(73,'2024-03-30 04:52:45','帖子标题3','帖子内容3',0,3,'用户名3','upload/forum_avatarurl3.jpg,upload/forum_avatarurl4.jpg,upload/forum_avatarurl5.jpg','开放',0,'2024-03-30 12:52:45'),(74,'2024-03-30 04:52:45','帖子标题4','帖子内容4',0,4,'用户名4','upload/forum_avatarurl4.jpg,upload/forum_avatarurl5.jpg,upload/forum_avatarurl6.jpg','开放',0,'2024-03-30 12:52:45'),(75,'2024-03-30 04:52:45','帖子标题5','帖子内容5',0,5,'用户名5','upload/forum_avatarurl5.jpg,upload/forum_avatarurl6.jpg,upload/forum_avatarurl7.jpg','开放',0,'2024-03-30 12:52:45'),(76,'2024-03-30 04:52:45','帖子标题6','帖子内容6',0,6,'用户名6','upload/forum_avatarurl6.jpg,upload/forum_avatarurl7.jpg,upload/forum_avatarurl8.jpg','开放',0,'2024-03-30 12:52:45'),(77,'2024-03-30 04:52:45','帖子标题7','帖子内容7',0,7,'用户名7','upload/forum_avatarurl7.jpg,upload/forum_avatarurl8.jpg,upload/forum_avatarurl9.jpg','开放',0,'2024-03-30 12:52:45'),(78,'2024-03-30 04:52:45','帖子标题8','帖子内容8',0,8,'用户名8','upload/forum_avatarurl8.jpg,upload/forum_avatarurl9.jpg,upload/forum_avatarurl10.jpg','开放',0,'2024-03-30 12:52:45');
/*!40000 ALTER TABLE `forum` ENABLE KEYS */;
UNLOCK TABLES;

--
-- Table structure for table `gangweifenlei`
--

DROP TABLE IF EXISTS `gangweifenlei`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `gangweifenlei` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `addtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `gangweifenlei` varchar(200) DEFAULT NULL COMMENT '岗位分类',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8 COMMENT='岗位分类';
/*!40101 SET character_set_client = @saved_cs_client */;
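Application code that writes to these tables should use parameterized SQL rather than string concatenation, so user-supplied text (questions, replies) cannot break the statement. The helper below is an illustrative sketch, not code from the project: the column names come from the `chat` table above, and the live connection/execute step is shown only in a comment.

```python
# Build a parameterized INSERT for the `chat` table defined above.
def build_chat_insert(userid, adminid, ask, reply, isreply):
    sql = ("insert into `chat` (`userid`, `adminid`, `ask`, `reply`, `isreply`) "
           "values (%s, %s, %s, %s, %s)")
    params = (userid, adminid, ask, reply, isreply)
    return sql, params

sql, params = build_chat_insert(1, 1, '提问1', '回复1', 1)
print(sql)
# With a live pymysql connection this would run as:
#   cursor.execute(sql, params); connect.commit()
```

Passing `params` separately lets the driver escape values itself, which also keeps text like emoji-bearing replies safe once the charset allows them.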

System Screenshots
Table of Contents

Chapter 1 Introduction

1.1 Background and significance of the topic

1.2 Research status at home and abroad

1.3 Main research content

1.4 Structure of the article

Chapter 2 Related Theory and Technology

2.1 The Spring Boot framework

2.2 Overview of the Java language

2.3 Overview of web-crawler technology

2.4 Overview of Vue

2.5 Overview of Hadoop

Chapter 3 System Requirements Analysis

3.1 Feasibility analysis

3.1.1 Technical feasibility

3.1.2 Economic feasibility

3.1.3 Operational feasibility

3.2 Non-functional requirements analysis

3.3 System use-case diagram

3.4 System flowchart

Chapter 4 System Design

4.1 Overall system architecture

4.2 Design of the job-crawler model

4.3 Database design

4.3.1 Database E-R diagram design

4.3.2 Database table design

Chapter 5 System Implementation

5.1 Registration and login module

5.2 User back-end function modules

5.3 Administrator back-end modules

5.4 Job-recommendation dashboard

Chapter 6 System Testing

6.1 Test methods

6.2 Test cases

Conclusion

References

Acknowledgements
