JAVA爬虫系列

准备工作

导入依赖

XML 复制代码
 <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.2</version>
        </dependency>

yml

XML 复制代码
logging:
  level:
    root: info
    com.lrm: debug

1.入门程序(获取到静态页面)

java 复制代码
package com.itheima.reggie.utils;


import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

/**
 * @Author lpc
 **/
public class CrawlerFirst {
    public static void main(String[] args) throws Exception {

        //1.打开浏览器,创建Httpclient对象
        CloseableHttpClient httpClient = HttpClients.createDefault();

       //2.输入网址,发起get请求创建HttpGet对象
        HttpGet httpGet = new HttpGet("http://www.itcast.cn");

        //3.按回车,发起请求,返回响应,使用Httpclient对象发起请求
        CloseableHttpResponse response = httpClient.execute(httpGet);
        //4.解析响应,获取数据
        //判斯状态码是否是200
        if (response.getStatusLine().getStatusCode()==200){
            HttpEntity httpEntity = response.getEntity();
            //获取前端静态页面
            String content = EntityUtils.toString(httpEntity,"utf8");
            System.out.println(content);
        }


    }
}

2.HttpClient---Get

java 复制代码
package com.itheima.reggie.utils;


import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

/**
 * @Author lpc
 * @Date 2024 03 12 00 23
 **/
public class CrawlerFirst {
    public static void main(String[] args){

        //1.打开浏览器,创建Httpclient对象
        CloseableHttpClient httpClient = HttpClients.createDefault();

       //2.输入网址,发起get请求创建HttpGet对象
        HttpGet httpGet = new HttpGet("http://www.itcast.cn");

        //3.按回车,发起请求,返回响应,使用Httpclient对象发起请求
        CloseableHttpResponse response = null;

        try {
            response = httpClient.execute(httpGet);
            //4.解析响应,获取数据
            //判斯状态码是否是200
            if (response.getStatusLine().getStatusCode()==200){
                HttpEntity httpEntity = response.getEntity();
                //获取前端静态页面
                String content = EntityUtils.toString(httpEntity,"utf8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }finally {
            try {
                //关闭response
                response.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            try {
                //关闭浏览器
                httpClient.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }


    }
}
相关推荐
{Hello World}15 小时前
Java抽象类与接口深度解析
java·开发语言
AI视觉网奇15 小时前
ue5 自定义 actor ac++ actor 用法实战
java·c++·ue5
光明顶上的5G15 小时前
本地缓存面试重点
java·缓存·面试
haluhalu.16 小时前
深入理解Linux线程机制:线程概念,内存管理
java·linux·运维
jiaguangqingpanda16 小时前
Day22-20260118
java·开发语言
雪碧聊技术16 小时前
1、LangChain4j 名字的寓意
java·大模型·langchain4j
Ulyanov16 小时前
战场地形生成与多源数据集成
开发语言·python·算法·tkinter·pyside·pyvista·gui开发
风生u16 小时前
bpmn 的理解和元素
java·开发语言·工作流·bpmn
C+-C资深大佬16 小时前
C++数据类型
开发语言·c++·算法
ID_1800790547316 小时前
日本乐天商品详情API接口的请求构造与参数说明
开发语言·python·pandas