Java Supermarket Checkout System (Part 10: Web Crawler)

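Introduction

This part implements the crawler feature. The requirement is to scrape at least 100 records from a web page. Douban Music is used as the example here, specifically the tag page 豆瓣音乐标签: 民谣 (douban.com).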

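Implementation

Apart from the new crawler feature, the rest of the code follows the same principles as the earlier posts in this series. To keep things separate, we create a new database named music. As before, the vo package holds the data class, that is, the constructors and accessors that Java tooling can generate automatically. The dao package holds the database functions, mainly the four basic operations: insert, delete, update, and query. The util package holds the connection helper that connects Java to the database. The ui package holds the main function, which calls into the other classes. The service package holds the crawler functions that scrape data from the target page.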

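The MusicService class defines several lists that hold the different fields of the music records being scraped:

  • musicName: stores the album name.
  • musicURLaddress: stores the album URL.
  • musicScore: stores the album's rating (score).
  • musicPeople: stores the number of people who rated the album.
  • musicSinger: stores the singer or artist name.
  • musicTime: stores the album's release date.
  • musicType: stores the album type.
  • musicMedium: stores the album's medium (for example, CD or vinyl).
  • musicSect: stores the genre (optional).
  • musicBarcode: stores the barcode (optional).

These lists collect the scraped data, which is then inserted into the database. Entries at the same index across the lists belong to the same album.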

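getData() method

getData() is the main method that starts the scraping process:

  • User Agent: this string mimics a browser request so that the request appears to come from a real browser, which helps avoid being blocked by the site.

  • Loop Over Pages: the method loops over 5 pages (100 items, assuming 20 items per page). On each iteration it builds the URL of the current page and calls getMusicInfo() to scrape that page.

  • Sleep 1 second: Thread.sleep(1000) adds a delay so that the site is not flooded with requests, a common way to avoid tripping anti-scraping measures.

  • Insert into the database: after all pages have been scraped, it calls insertMusicInfoToDB() to store the collected data in the database.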
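The paging step can be sketched on its own, without any network access. The class name PageUrlSketch and the helper buildPageUrls are hypothetical names used only for this illustration; the base URL, the start offset, and the type=T parameter are taken from the getData() code in this post.

```java
import java.util.ArrayList;
import java.util.List;

class PageUrlSketch {
    // Tag listing URL, as used in getData() in this post.
    static final String BASE = "https://music.douban.com/tag/民谣";
    // Items per listing page; the post assumes 20.
    static final int PAGE_SIZE = 20;

    // Builds one URL per page; "start" is the zero-based offset of the first item.
    static List<String> buildPageUrls(int pages) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            urls.add(BASE + "?start=" + (i * PAGE_SIZE) + "&type=T");
        }
        return urls;
    }

    public static void main(String[] args) {
        // 5 pages at 20 items each covers the required 100 records.
        for (String url : buildPageUrls(5)) {
            System.out.println(url);
        }
    }
}
```

With 5 pages the offsets are 0, 20, 40, 60, and 80, so the last request starts at item 80 and ends at item 99.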

Corresponding HTML:

Clicking a link opens the detail page for each item:

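getMusicInfo() method

This method does the actual scraping of a given page:

  • Document retrieval: the method uses Jsoup to connect to the URL and retrieve the HTML document.

  • Select elements: it then selects all elements with the class item (selector .item), each of which represents one music record.

  • Extract data: for each record it extracts the name, URL, score, and number of raters, plus the singer, release date, type, medium, and other details, and appends them to the corresponding lists.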
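The string handling in the extraction step can be tried in isolation. The sketch below is only an illustration: InfoLineSketch, splitInfo, and extractPeople are hypothetical names, the sample strings are made up, and the digit-only regex is a substitute for the chained replace calls used in the post's code.

```java
import java.util.Arrays;

class InfoLineSketch {
    // Splits "singer / date / type / medium / genre / barcode" into exactly six
    // fields; missing trailing fields become empty strings.
    static String[] splitInfo(String infoLine) {
        String[] parts = infoLine.trim().split(" / ");
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = i < parts.length ? parts[i].trim() : "";
        }
        return fields;
    }

    // Keeps only the ASCII digits of a rating label such as "(12345人评价)".
    static String extractPeople(String ratingText) {
        return ratingText.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not real scraped data.
        System.out.println(Arrays.toString(splitInfo("Some Singer / 2020-01-01 / Album / CD")));
        System.out.println(extractPeople("(12345人评价)"));
    }
}
```

Always producing six fields keeps the parallel lists the same length, which the later insert loop depends on when it reads every list at the same index.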

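insertMusicInfoToDB() method

This method inserts the collected data into the database:

  • Looping Over Data: the method iterates over all collected data (from the lists) and creates an Information object for each music record.

  • Parse data: it tries to parse the score and the number of raters from strings into the appropriate types (float and int). If parsing fails, it falls back to default values (0.0f for score and 0 for people).

  • Inserting into Database: it then calls InformationDAO.insert(info) to insert the record. The result of each insert is stored in a Map that maps the music name to a boolean indicating whether the insert succeeded.

  • Log the result: after each insert, it prints whether the insert succeeded.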
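The parse-with-default behaviour described above can be sketched as two small helpers. ParseDefaultsSketch, parseScore, and parsePeople are hypothetical names for this illustration; the sketch assumes the input strings are never null, which holds for the text values Jsoup returns in this post's code.

```java
class ParseDefaultsSketch {
    // Parses a score such as "8.9"; returns 0.0f when the text is empty or not a number.
    static float parseScore(String text) {
        try {
            return Float.parseFloat(text.trim());
        } catch (NumberFormatException e) {
            return 0.0f;
        }
    }

    // Parses a rater count such as "12345"; returns 0 when the text is not an integer.
    static int parsePeople(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseScore("8.9"));
        System.out.println(parseScore(""));
        System.out.println(parsePeople("12345"));
        System.out.println(parsePeople("n/a"));
    }
}
```

One consequence of this choice is that a stored score of 0.0 or a count of 0 can mean either a real zero or a value that failed to parse.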

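Summary

  • Web scraping: the getData() and getMusicInfo() methods scrape data from the target web page.
  • Data collection: the data is collected into several lists.
  • Database insertion: the insertMusicInfoToDB() method inserts the collected data into the database, parsing each value before it is stored.

Results

Full code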

ui---Driver

package ui;

import service.MusicService;

import java.io.IOException;


public class Driver {
    public static void main(String[] args) throws IOException, InterruptedException {
        MusicService.getData();
    }
}

vo---Information

package vo;

public class Information {
    private int id;
    private String musicName;
    private String singer;
    private String time;
    private String type;
    private String medium;
    private String sect;
    private String barCode;
    private float score;
    private int people;
    private String urlAddress;

    public Information() {
    }

    public Information(int id, String musicName, String singer, String time, String type, String medium, String sect, String barCode, float score, int people, String urlAddress) {
        this.id = id;
        this.musicName = musicName;
        this.singer = singer;
        this.time = time;
        this.type = type;
        this.medium = medium;
        this.sect = sect;
        this.barCode = barCode;
        this.score = score;
        this.people = people;
        this.urlAddress = urlAddress;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getMusicName() {
        return musicName;
    }

    public void setMusicName(String musicName) {
        this.musicName = musicName;
    }

    public String getSinger() {
        return singer;
    }

    public void setSinger(String singer) {
        this.singer = singer;
    }

    public String getTime() {
        return time;
    }

    public void setTime(String time) {
        this.time = time;
    }

    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }

    public String getMedium() {
        return medium;
    }

    public void setMedium(String medium) {
        this.medium = medium;
    }

    public String getSect() {
        return sect;
    }

    public void setSect(String sect) {
        this.sect = sect;
    }

    public String getBarCode() {
        return barCode;
    }

    public void setBarCode(String barCode) {
        this.barCode = barCode;
    }

    public float getScore() {
        return score;
    }

    public void setScore(float score) {
        this.score = score;
    }

    public int getPeople() {
        return people;
    }

    public void setPeople(int people) {
        this.people = people;
    }

    public String getUrlAddress() {
        return urlAddress;
    }

    public void setUrlAddress(String urlAddress) {
        this.urlAddress = urlAddress;
    }

    @Override
    public String toString() {
        return "Information{" +
                "id=" + id +
                ", musicName='" + musicName + '\'' +
                ", singer='" + singer + '\'' +
                ", time='" + time + '\'' +
                ", type='" + type + '\'' +
                ", medium='" + medium + '\'' +
                ", sect='" + sect + '\'' +
                ", barCode='" + barCode + '\'' +
                ", score=" + score +
                ", people=" + people +
                ", urlAddress='" + urlAddress + '\'' +
                '}';
    }



    public static class Info {
        private String singer;
        private String time;
        private String type;
        private double medium;
    }
}

dao---InformationDAO

package dao;

import util.DBConnection;
import vo.Information;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class InformationDAO {

    // Query by music name
    public static Information queryByName(String musicName) {
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;
        Information information = null;
        try {
            con = DBConnection.getConnection();
            String sql = "SELECT * FROM music_information WHERE musicName = ?";
            pst = con.prepareStatement(sql);
            pst.setString(1, musicName);
            rs = pst.executeQuery();
            if (rs.next()) {
                information = new Information();
                information.setId(rs.getInt("id"));
                information.setMusicName(rs.getString("musicName"));
                information.setSinger(rs.getString("singer"));
                information.setTime(rs.getString("time"));
                information.setType(rs.getString("type"));
                information.setMedium(rs.getString("medium"));
                information.setSect(rs.getString("sect"));
                information.setBarCode(rs.getString("barcode"));
                information.setScore(rs.getFloat("score"));
                information.setPeople(rs.getInt("people"));
                information.setUrlAddress(rs.getString("URLaddress"));
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return information;
    }

    public static List<Information> queryBySinger(String singer) {
        List<Information> infoList = new ArrayList<>();
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;

        try {
            con = DBConnection.getConnection();
            String sql = "SELECT * FROM music_information WHERE singer = ?";
            pst = con.prepareStatement(sql);
            pst.setString(1, singer);
            rs = pst.executeQuery();

            while (rs.next()) {
                Information info = new Information();
                info.setId(rs.getInt("id"));
                info.setMusicName(rs.getString("musicName"));
                info.setSinger(rs.getString("singer"));
                info.setTime(rs.getString("time"));
                info.setType(rs.getString("type"));
                info.setMedium(rs.getString("medium"));
                info.setSect(rs.getString("sect"));
                info.setBarCode(rs.getString("barcode"));
                info.setScore(rs.getFloat("score"));
                info.setPeople(rs.getInt("people"));
                info.setUrlAddress(rs.getString("URLaddress"));
                infoList.add(info);
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            DBConnection.close(con, pst);
        }
        return infoList;
    }

    public static int getTotalPeople() {
        String query = "SELECT SUM(people) AS totalPeople FROM music_information";
        try (Connection conn = DBConnection.getConnection();
             PreparedStatement pst = conn.prepareStatement(query);
             ResultSet rs = pst.executeQuery()) {
            if (rs.next()) {
                return rs.getInt("totalPeople");
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return 0;
    }

    public static float getAverageScore(String singer) {
        String query = "SELECT AVG(score) AS averageScore FROM music_information WHERE singer = ? AND sect = '民谣'";
        float averageScore = -1; // Default value: no matching data found
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;

        try {
            con = DBConnection.getConnection();
            pst = con.prepareStatement(query);
            pst.setString(1, singer);
            rs = pst.executeQuery();

            if (rs.next()) {
                float avg = rs.getFloat("averageScore");
                // AVG yields NULL when no rows match; keep the -1 default in that case
                if (!rs.wasNull()) {
                    averageScore = avg;
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            // Close resources; rs may still be null if an earlier step failed
            try {
                if (rs != null) {
                    rs.close();
                }
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
            DBConnection.close(con, pst);
        }
        return averageScore;
    }

    // query: search by any combination of conditions
    public static ArrayList<Information> query(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        ResultSet rs = null;
        ArrayList<Information> informationArrayList = new ArrayList<>();
        try {
            con = DBConnection.getConnection();
            StringBuilder sql = new StringBuilder("SELECT * FROM music_information WHERE 1 = 1");
            if (information.getId() != 0) {
                sql.append(" AND id = ?");
            }
            if (information.getMusicName() != null) {
                sql.append(" AND musicName = ?");
            }
            if (information.getSinger() != null) {
                sql.append(" AND singer = ?");
            }
            if (information.getTime() != null) {
                sql.append(" AND time = ?");
            }
            if (information.getType() != null) {
                sql.append(" AND type = ?");
            }
            if (information.getMedium() != null) {
                sql.append(" AND medium = ?");
            }
            if (information.getSect() != null) {
                sql.append(" AND sect = ?");
            }
            if (information.getBarCode() != null) {
                sql.append(" AND barCode = ?");
            }
            if (information.getScore() != 0) {
                sql.append(" AND score = ?");
            }
            if (information.getPeople() != 0) {
                sql.append(" AND people = ?");
            }
            if (information.getUrlAddress() != null) {
                sql.append(" AND URLaddress = ?");
            }
            pst = con.prepareStatement(sql.toString());
            int paramIndex = 1;
            if (information.getId() != 0) {
                pst.setInt(paramIndex++, information.getId());
            }
            if (information.getMusicName() != null) {
                pst.setString(paramIndex++, information.getMusicName());
            }
            if (information.getSinger() != null) {
                pst.setString(paramIndex++, information.getSinger());
            }
            if (information.getTime() != null) {
                pst.setString(paramIndex++, information.getTime());
            }
            if (information.getType() != null) {
                pst.setString(paramIndex++, information.getType());
            }
            if (information.getMedium() != null) {
                pst.setString(paramIndex++, information.getMedium());
            }
            if (information.getSect() != null) {
                pst.setString(paramIndex++, information.getSect());
            }
            if (information.getBarCode() != null) {
                pst.setString(paramIndex++, information.getBarCode());
            }
            if (information.getScore() != 0) {
                pst.setFloat(paramIndex++, information.getScore());
            }
            if (information.getPeople() != 0) {
                pst.setInt(paramIndex++, information.getPeople());
            }
            if (information.getUrlAddress() != null) {
                pst.setString(paramIndex++, information.getUrlAddress());
            }
            rs = pst.executeQuery();
            while (rs.next()) {
                Information i = new Information();
                i.setId(rs.getInt("id"));
                i.setMusicName(rs.getString("musicName"));
                i.setSinger(rs.getString("singer"));
                i.setTime(rs.getString("time"));
                i.setType(rs.getString("type"));
                i.setMedium(rs.getString("medium"));
                i.setSect(rs.getString("sect"));
                i.setBarCode(rs.getString("barcode"));
                i.setScore(rs.getFloat("score"));
                i.setPeople(rs.getInt("people"));
                i.setUrlAddress(rs.getString("URLaddress"));
                informationArrayList.add(i);
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return informationArrayList;
    }

    //insert
    public static boolean insert(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        boolean success = false;
        try {
            con = DBConnection.getConnection();
            String sql = "INSERT INTO music_information (id,musicName,singer,time,type,medium,sect,barcode,score,people,URLaddress)" +
                    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?,?,?)";
            pst = con.prepareStatement(sql);
            pst.setInt(1,information.getId());
            pst.setString(2, information.getMusicName());
            pst.setString(3, information.getSinger());
            pst.setString(4, information.getTime());
            pst.setString(5, information.getType());
            pst.setString(6, information.getMedium());
            pst.setString(7, information.getSect());
            pst.setString(8, information.getBarCode());
            pst.setFloat(9, information.getScore());
            pst.setInt(10, information.getPeople());
            pst.setString(11, information.getUrlAddress());
            int rowsAffected = pst.executeUpdate();
            if (rowsAffected > 0) {
                success = true;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return success;
    }

    // update: update a music record
    public static boolean update(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        boolean success = false;
        try {
            con = DBConnection.getConnection();
            String sql = "UPDATE music_information SET singer=?, time=?, type=?, medium=?, sect=?, barcode=?, score=?, people=?, URLaddress=? WHERE musicName=?";
            pst = con.prepareStatement(sql);
            pst.setString(1, information.getSinger());
            pst.setString(2, information.getTime());
            pst.setString(3, information.getType());
            pst.setString(4, information.getMedium());
            pst.setString(5, information.getSect());
            pst.setString(6, information.getBarCode());
            pst.setFloat(7, information.getScore());
            pst.setInt(8, information.getPeople());
            pst.setString(9, information.getUrlAddress());
            pst.setString(10, information.getMusicName()); // musicName is the last parameter
            System.out.println("Executing SQL: " + pst.toString()); // Log for debugging
            int rowsAffected = pst.executeUpdate();
            if (rowsAffected > 0) {
                success = true;
            } else {
                System.out.println("Update failed: no matching record was updated"); // Log the miss
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return success;
    }


    // delete: delete a music record
    public static boolean delete(Information information) {
        Connection con = null;
        PreparedStatement pst = null;
        boolean success = false;
        try {
            con = DBConnection.getConnection();
            String sql = "DELETE FROM music_information WHERE musicName = ?";
            pst = con.prepareStatement(sql);
            pst.setString(1, information.getMusicName());
            int rowsAffected = pst.executeUpdate();
            if (rowsAffected > 0) {
                success = true;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        } finally {
            DBConnection.close(con, pst);
        }
        return success;
    }

}

util---DBConnection

package util;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DBConnection {
	private static String driverName;
	private static String url;
	private static String user;
	private static String password;

	// Load the driver; this only needs to run once
	static{
		driverName = "com.mysql.cj.jdbc.Driver";
		try {
			Class.forName(driverName);
		} catch (ClassNotFoundException e) {
			throw new RuntimeException(e);
		}
	}

	// Get a connection
	public static Connection getConnection(){
		url = "jdbc:mysql://localhost:3306/music?useUnicode=true&characterEncoding=utf-8";
		user = "root";
		password = "123456";
		Connection con = null;
		try {
			con = DriverManager.getConnection(url,user,password);
		} catch (SQLException e) {
			throw new RuntimeException(e);
		}
		return con;
	}

	// Release resources: close the statement first, then the connection
	public static void close(Connection con, PreparedStatement pst){
		if(pst!=null) {
			try {
				pst.close();
			} catch (SQLException e) {
				throw new RuntimeException(e);
			}
		}
		if(con!=null) {
			try {
				con.close();
			} catch (SQLException e) {
				throw new RuntimeException(e);
			}
		}
	}
}

service---MusicService

package service;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import dao.InformationDAO;
import vo.Information;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MusicService {

    private static List<String> musicName = new ArrayList<>();
    private static List<String> musicURLaddress = new ArrayList<>();
    private static List<String> musicScore = new ArrayList<>();
    private static List<String> musicPeople = new ArrayList<>();
    private static List<String> musicSinger = new ArrayList<>();
    private static List<String> musicTime = new ArrayList<>();
    private static List<String> musicType = new ArrayList<>();
    private static List<String> musicMedium = new ArrayList<>();
    private static List<String> musicSect = new ArrayList<>();
    private static List<String> musicBarcode = new ArrayList<>();

    public static void getData() throws IOException, InterruptedException {
        String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36";
        for (int i = 0; i < 5; i++) {  // Scrape 5 pages, 20 records per page (100 in total)
            String pageUrl = "https://music.douban.com/tag/民谣?start=" + (i * 20) + "&type=T";
            System.out.println("Scraping page " + (i + 1) + ", URL: " + pageUrl);
            getMusicInfo(pageUrl, userAgent);
            Thread.sleep(1000);  // Wait 1 second to avoid tripping anti-scraping measures
        }
        // Insert into the database
        insertMusicInfoToDB();
    }

    public static void getMusicInfo(String url, String userAgent) throws IOException {
        Document document = Jsoup.connect(url).userAgent(userAgent).get();
        // Get the <tr> rows with class "item", one per album
        Elements musicElements = document.select(".item");

        for (Element music : musicElements) {
            // Album name
            String name = music.select(".pl2 a").text().replace("\n", "").replace("                ", " ").trim();
            musicName.add(name);
            // Album link
            String URLaddress = music.select(".pl2 a").attr("href");
            musicURLaddress.add(URLaddress);
            // Album score
            String score;
            try {
                score = music.select(".rating_nums").text();
            } catch (Exception e) {
                score = "";
            }
            musicScore.add(score);
            // Number of raters
            String people = music.select(".pl").get(1).text().replace(" ", "").replace("人评价", "").replace("(", "").replace(")", "");
            musicPeople.add(people);

            String[] musicInfos = music.select(".pl").get(0).text().trim().split(" / ");
            if (musicInfos.length >= 4) {
                musicSinger.add(musicInfos[0]);
                musicTime.add(musicInfos[1]);
                musicType.add(musicInfos[2]);
                musicMedium.add(musicInfos[3]);
                musicSect.add(musicInfos.length > 4 ? musicInfos[4] : "");
                musicBarcode.add(musicInfos.length > 5 ? musicInfos[5] : "");
            } else {
                // Handle incomplete information
                musicSinger.add(musicInfos[0]);
                musicTime.add(musicInfos.length > 1 ? musicInfos[1] : "");
                musicType.add(musicInfos.length > 2 ? musicInfos[2] : "");
                musicMedium.add(musicInfos.length > 3 ? musicInfos[3] : "");
                musicSect.add("");
                musicBarcode.add("");
            }
        }
    }

    public static Map<String, Object> insertMusicInfoToDB() {
        Map<String, Object> resultMap = new HashMap<>();
        for (int i = 0; i < musicName.size(); i++) {
            Information info = new Information();
            info.setMusicName(musicName.get(i));
            info.setSinger(musicSinger.get(i));
            info.setTime(musicTime.get(i));
            info.setType(musicType.get(i));
            info.setMedium(musicMedium.get(i));
            info.setSect(musicSect.get(i));
            info.setBarCode(musicBarcode.get(i));
            try {
                info.setScore(Float.parseFloat(musicScore.get(i)));
            } catch (NumberFormatException e) {
                info.setScore(0.0f);
            }
            try {
                info.setPeople(Integer.parseInt(musicPeople.get(i)));
            } catch (NumberFormatException e) {
                info.setPeople(0);
            }
            info.setUrlAddress(musicURLaddress.get(i));
            boolean success = InformationDAO.insert(info);
            resultMap.put(musicName.get(i), success); // Record the result in the Map
            if (success) {
                System.out.println("Inserted: " + info.getMusicName());
            } else {
                System.out.println("Insert failed: " + info.getMusicName());
            }
        }
        return resultMap;
    }
}

mysql

create database music;
use music;

CREATE TABLE `music_information` (  
    `id` INT ,  
    `musicName` VARCHAR(255) PRIMARY KEY,  
    `singer` VARCHAR(255),  
    `time` varchar(50),    # release date
    `type` VARCHAR(255),  # album type
    `medium` VARCHAR(100),
    `sect` varchar(50),  # genre
    `barcode` VARCHAR(50),  
    `score` DECIMAL(3, 1),  
    `people` INT,  
    `URLaddress` VARCHAR(500)  
);

INSERT INTO `music_information` (`id`,`musicName`,`singer`,`time`,`type`,`medium`,`sect`,`barcode`,`score`,`people`,`URLaddress`)VALUES  
('1','Song Title 1', 'Artist Name 1', '2023-01-01', 'Album Type 1','md1', '民谣', '123456789012', 4.5, 1000, 'https://example.com/song1'), 
('3','st1', 'Artist Name 1', '2023-01-01', 'Album Type 1','md1', 'Pop', '123456789012', 4.5, 1000, 'https://example.com/song1'),
('4','st2', 'Artist Name 1', '2023-01-01', 'Album Type 1','md1', '民谣', '123456789012', 4.2, 1000, 'https://example.com/song1'),
('2','Song Title 2', 'Artist Name 2', '2022-05-15', 'Album Type 2','md2', 'Rock', '234567890123', 4.2, 500, 'https://example.com/song2');

select * from music_information;
# Run only when the table needs to be reset:
# drop table music_information;