[X-yin E-commerce] Reverse Engineering a Protobuf Chat Protocol

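Table of Contents

  • [1. Preface](#1. Preface)
  • [2. API Analysis](#2. API Analysis)
  • [3. JS Analysis](#3. JS Analysis)
  • [4. Implementing the .proto File](#4. Implementing the .proto File)
  • [5. Source Code Implementation](#5. Source Code Implementation)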

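[🏠 Author homepage]: 吴秋霖
[💼 About the author]: Specializes in web crawling and reverse engineering of JS encryption. Top-quality creator in the Python field, CSDN Blog Expert, Alibaba Cloud Blog Expert, and Huawei Cloud Expert. Has long been committed to research and development in Python and web crawling.
[🌟 Author's recommendations]: Readers interested in web crawling and JS reverse engineering can follow the columns 《爬虫JS逆向实战》 and 《深耕爬虫领域》.
The author will keep publishing articles on the techniques used, learned, and encountered along the way, including but not limited to: CAPTCHA handling, crawler app and JS reverse engineering, RPA automation, distributed crawlers, and Python topics.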

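Author's statement: This article is for learning, discussion, and reference only. Any commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences arising from such use. In case of infringement, please contact the author for removal.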

1. Preface

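This article analyzes how to parse the real-time messages of a chat conversation in an e-commerce merchant backend into plaintext. In many IM and real-time messaging systems, the request body is not JSON but a Protobuf binary stream. The request headers usually contain application/x-protobuf, the actual business fields live in the binary payload rather than in the URL, and the front-end code often contains logic such as encode, decode, and RequestBody.create(...).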

So the work in this article consists of the following steps:

Build the Protobuf request object -> serialize it into a binary payload -> deserialize the response content -> finally convert the message structure to JSON
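
To make the "serialize into a binary payload" step more concrete, here is a minimal hand-rolled sketch of the Protobuf wire format using only the Python standard library. It is neither the platform's code nor the protobuf library: the helper names are hypothetical, and the field numbers (cmd = 1, body = 8, conversation_messages = 301, conversation_id = 1, limit = 6) are assumed from the .proto file recovered in section 4.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0: varint-encoded integer."""
    return encode_key(field_number, 0) + encode_varint(value)


def encode_bytes_field(field_number: int, data: bytes) -> bytes:
    """Wire type 2: length-delimited (strings, bytes, nested messages)."""
    return encode_key(field_number, 2) + encode_varint(len(data)) + data


# Innermost message: conversation_id (field 1) and limit (field 6).
inner = encode_bytes_field(1, b"7564320634436666674") + encode_int_field(6, 20)
# Payload wrapper: the request sits in field 301.
body = encode_bytes_field(301, inner)
# Envelope: cmd (field 1) = 301, body (field 8) = nested payload.
payload = encode_int_field(1, 301) + encode_bytes_field(8, body)

print(payload.hex())
```

The generated *_pb2 classes do the same work in SerializeToString(); the sketch only shows why a value such as 301 appears as the two bytes ad 02 in a capture.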

2. API Analysis

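Open 精选联盟, then open the message center. A debugger statement gets in the way at first; simply skip it. The interface to focus on is get_by_conversation, which fetches the messages of a chat conversation. In testing, a fixed value for the pigeon_sign parameter sent with the request also works. The msToken and X-Bogus parameters are handled the same way as before (the algorithm matches the one in the previously shared 巨量 article).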



The call stack shows some familiar file names (webmssdk, bdms). All Protobuf-related operations live in main.dbddfff3.js, and the request body is serialized as well.

3. JS Analysis

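Inside this JS file there is a complete Protobuf encoding and decoding implementation for IM conversations and messages. The front end encodes JS objects into binary and sends them to the back end; the binary response is then restored to plaintext objects via decode and rendered.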

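Reader methods such as uint32, int64, and skipType read (or skip) individual fields so that the binary data can be translated into plaintext objects we can read.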
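
As a rough illustration of what such reader methods do, the sketch below re-implements a tiny reader in Python. The class and method names only mirror the JS names and are otherwise hypothetical, the sample bytes are made up, and field numbers 1 (cmd) and 3 (sdk_version) are assumed from the .proto file in section 4.

```python
class Reader:
    """Minimal Protobuf wire-format reader (a sketch, not the site's JS reader)."""

    def __init__(self, data: bytes):
        self.buf = data
        self.pos = 0

    def _varint(self) -> int:
        result, shift = 0, 0
        while True:
            if self.pos >= len(self.buf):
                raise EOFError("truncated varint")
            byte = self.buf[self.pos]
            self.pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:  # high bit clear: last byte of the varint
                return result
            shift += 7
            if shift >= 70:
                raise ValueError("varint longer than 10 bytes")

    def uint32(self) -> int:
        return self._varint() & 0xFFFFFFFF

    def int64(self) -> int:
        value = self._varint() & 0xFFFFFFFFFFFFFFFF
        # Reinterpret the 64-bit pattern as a signed two's-complement value.
        return value - (1 << 64) if value >= (1 << 63) else value

    def string(self) -> str:
        length = self.uint32()
        end = self.pos + length
        if end > len(self.buf):
            raise EOFError("truncated length-delimited field")
        text = self.buf[self.pos:end].decode("utf-8")
        self.pos = end
        return text

    def skip_type(self, wire_type: int) -> None:
        """Skip a field we do not know, based on its wire type."""
        if wire_type == 0:
            self._varint()
        elif wire_type == 1:
            self.pos += 8
        elif wire_type == 2:
            length = self.uint32()
            self.pos += length
        elif wire_type == 5:
            self.pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")


def decode_envelope(data: bytes) -> dict:
    """Decode only cmd (field 1) and sdk_version (field 3); skip the rest."""
    reader = Reader(data)
    result = {}
    while reader.pos < len(data):
        key = reader.uint32()
        field_number, wire_type = key >> 3, key & 7
        if field_number == 1 and wire_type == 0:
            result["cmd"] = reader.uint32()
        elif field_number == 3 and wire_type == 2:
            result["sdk_version"] = reader.string()
        else:
            reader.skip_type(wire_type)
    return result


# cmd = 301, sdk_version = "abc", plus an unknown varint field 99 that is skipped.
sample = bytes.fromhex("08ad02" "1a03616263" "980601")
print(decode_envelope(sample))
```

Skipping by wire type is what lets a decoder tolerate fields that are missing from a recovered schema, which matters when the .proto is reconstructed by hand.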

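This part builds a Protobuf request that meets the server's requirements: cmd selects the operation (fetching conversation messages), conversation_id specifies which conversation to fetch, limit is the number of messages to fetch, and token and inbox_type are authentication-related fields. The implementation is shown below: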

python
def build_conversation_messages_request():
    request_envelope = message_pb2.RequestEnvelope(
        cmd=301,
        sequence_id=25275,
        sdk_version="0.3.1-dev.2",
        token="",
        refer=3,
        inbox_type=2,
        build_number="8cf4734:dev-0.3.1",
        device_platform="web",
    )

    conversation_request = request_envelope.body.conversation_messages
    conversation_request.conversation_id = "7564320634436666674"  # conversation ID
    conversation_request.conversation_type = 2
    conversation_request.conversation_short_id = 7564320634436666674  # conversation ID
    conversation_request.direction = 1
    conversation_request.anchor_index = 0
    conversation_request.limit = 20
    return request_envelope
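
In the recovered schema (section 4), the payload field numbers appear to equal the cmd value: cmd=301 goes with conversation_messages = 301, and send_message uses 100. Assuming that pattern holds, a small lookup keeps the two in sync. The mapping and the helper name below are hypothetical, and pairing cmd 100 with send_message is an inference from the field number rather than something observed in a request.

```python
# Assumed pattern from the recovered .proto: body field number == cmd value.
CMD_TO_BODY_FIELD = {
    100: "send_message",
    301: "conversation_messages",
}


def body_field_for_cmd(cmd: int) -> str:
    """Return the name of the payload field expected for a given cmd."""
    try:
        return CMD_TO_BODY_FIELD[cmd]
    except KeyError:
        raise ValueError(f"no body field known for cmd={cmd}") from None


print(body_field_for_cmd(301))
```

With the generated classes, this could be used as getattr(request_envelope.body, body_field_for_cmd(request_envelope.cmd)) instead of hard-coding the attribute name.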

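Further down is the Protobuf decoding site: A is an instance of the custom binary reader, s.im_proto.MessagesInConversationResponseBody is the message type the data is decoded into, and the returned t is the plaintext data.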
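
Before a full schema is available, it can help to walk the nested response by field number alone. The sketch below is a schema-less extractor using only the standard library; the function names are hypothetical, the sample bytes are fabricated for illustration, and the path 6 -> 301 -> 1 (ResponseEnvelope.body, conversation_messages, messages) is assumed from the .proto file in section 4.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, new_pos) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def extract_path(buf: bytes, path):
    """Follow a chain of length-delimited field numbers; return all matches."""
    current = [buf]
    for wanted in path:
        current = [
            value
            for chunk in current
            for number, wire_type, value in iter_fields(chunk)
            if number == wanted and wire_type == 2
        ]
    return current


# Fabricated envelope: cmd=301, body(6) -> conversation_messages(301) -> two messages(1).
envelope = bytes.fromhex("08ad02" "320d" "ea120a" "0a030a0161" "0a030a0162")
raw_messages = extract_path(envelope, [6, 301, 1])
print(len(raw_messages), [m.hex() for m in raw_messages])
```

Each returned chunk is the raw bytes of one nested message, which can then be fed to a generated class or inspected further with iter_fields.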


4. Implementing the .proto File

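With AI assistance, the analysis above can now be reproduced quite quickly. The next step is to turn the JS above into a .proto file (beginners can think of it as a data-structure template agreed on by the front end and the back end). The implementation is shown below: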

protobuf
syntax = "proto3";
package message;

message RequestEnvelope {
  int32 cmd = 1;
  int64 sequence_id = 2;
  string sdk_version = 3;
  string token = 4;
  int32 refer = 5;
  int32 inbox_type = 6;
  string build_number = 7;
  RequestPayload body = 8;
  string device_id = 9;
  string channel = 10;
  string device_platform = 11;
  string device_type = 12;
  string os_version = 13;
  string version_code = 14;
  map<string, string> headers = 15;
  int32 config_id = 16;
  TokenInfo token_info = 17;
  int32 auth_type = 18;
}

message TokenInfo {
  int32 mark_id = 1;
  int32 type = 2;
  int32 app_id = 3;
  int64 user_id = 4;
  int64 timestamp = 5;
}

message RequestPayload {
  SendMessageRequest send_message = 100;
  ConversationMessagesRequest conversation_messages = 301;
}

message SendMessageRequest {
  string conversation_id = 1;
  int32 conversation_type = 2;
  int64 conversation_short_id = 3;
  string content = 4;
  map<string, string> ext = 5;
  int32 message_type = 6;
  string ticket = 7;
  string client_message_id = 8;
  repeated int64 mentioned_users = 9;
  QuotedMessage quoted_message = 11;
}

message QuotedMessage {
  int64 referenced_message_id = 1;
  string hint = 2;
  int64 root_message_id = 3;
  int64 root_message_conv_index = 4;
}

message ConversationMessagesRequest {
  string conversation_id = 1;
  int32 conversation_type = 2;
  int64 conversation_short_id = 3;
  int32 direction = 4;
  int64 anchor_index = 5;
  int32 limit = 6;
}

message ResponseEnvelope {
  int32 cmd = 1;
  int64 sequence_id = 2;
  int32 status_code = 3;
  string error_desc = 4;
  int32 inbox_type = 5;
  ResponsePayload body = 6;
  string log_id = 7;
  map<string, string> headers = 8;
  int64 start_time_stamp = 9;
  int64 request_arrived_time = 10;
  int64 server_execution_end_time = 11;
}

message ResponsePayload {
  SendMessageResponse send_message = 100;
  ConversationMessagesResponse conversation_messages = 301;
}

message SendMessageResponse {
  int64 server_message_id = 1;
  string extra_info = 2;
  int32 status = 3;
  string client_message_id = 4;
  int64 check_code = 5;
  string check_message = 6;
}

message ConversationMessagesResponse {
  repeated ConversationMessage messages = 1;
  int64 next_cursor = 2;
  bool has_more = 3;
}

message ConversationMessage {
  string conversation_id = 1;
  int32 conversation_type = 2;
  int64 server_message_id = 3;
  int64 index_in_conversation = 4;
  int64 conversation_short_id = 5;
  int32 message_type = 6;
  int64 sender = 7;
  string content = 8;
  map<string, string> ext = 9;
  int64 create_time = 10;
  int64 version = 11;
  int32 status = 12;
  int64 order_in_conversation = 13;
  string sec_sender = 14;
  map<string, MessagePropertyList> property_list = 15;
  MessageReference reference = 18;
}

message MessageProperty {
  int64 uid = 1;
  string sec_uid = 2;
  int64 create_time = 3;
  string idempotent_id = 4;
  string value = 5;
}

message MessagePropertyList {
  repeated MessageProperty entries = 1;
}

message MessageReference {
  int64 referenced_message_id = 1;
  string hint = 2;
  int64 ref_message_type = 3;
  int32 referenced_message_status = 4;
  int64 root_message_id = 5;
  int64 root_message_conv_index = 6;
}

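Next, use protoc to compile the .proto file into code for a target language (Python here). It reads the structures defined above and generates classes that can be used directly in Python, ultimately producing *_pb2.py, as shown below: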

bash
protoc --python_out=. ./*.proto
python
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# NO CHECKED-IN PROTOBUF GENCODE
# source: msg.proto
# Protobuf Python Version: 5.29.3
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import runtime_version as _runtime_version
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    5,
    29,
    3,
    '',
    'msg.proto'
)
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()

DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\tmsg.proto\x12\x07message\"\xe9\x03\n\x0fRequestEnvelope\x12\x0b\n\x03\x63md\x18\x01 \x01(\x05\x12\x13\n\x0bsequence_id\x18\x02 \x01(\x03\x12\x13\n\x0bsdk_version\x18\x03 \x01(\t\x12\r\n\x05token\x18\x04 \x01(\t\x12\r\n\x05refer\x18\x05 \x01(\x05\x12\x12\n\ninbox_type\x18\x06 \x01(\x05\x12\x14\n\x0c\x62uild_number\x18\x07 \x01(\t\x12%\n\x04\x62ody\x18\x08 \x01(\x0b\x32\x17.message.RequestPayload\x12\x11\n\tdevice_id\x18\t \x01(\t\x12\x0f\n\x07\x63hannel\x18\n \x01(\t\x12\x17\n\x0f\x64\x65vice_platform\x18\x0b \x01(\t\x12\x13\n\x0b\x64\x65vice_type\x18\x0c \x01(\t\x12\x12\n\nos_version\x18\r \x01(\t\x12\x14\n\x0cversion_code\x18\x0e \x01(\t\x12\x36\n\x07headers\x18\x0f \x03(\x0b\x32%.message.RequestEnvelope.HeadersEntry\x12\x11\n\tconfig_id\x18\x10 \x01(\x05\x12&\n\ntoken_info\x18\x11 \x01(\x0b\x32\x12.message.TokenInfo\x12\x11\n\tauth_type\x18\x12 \x01(\x05\x1a.\n\x0cHeadersEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\"^\n\tTokenInfo\x12\x0f\n\x07mark_id\x18\x01 \x01(\x05\x12\x0c\n\x04type\x18\x02 \x01(\x05\x12\x0e\n\x06\x61pp_id\x18\x03 \x01(\x05\x12\x0f\n\x07user_id\x18\x04 \x01(\x03\x12\x11\n\ttimestamp\x18\x05 \x01(\x03\"\x89\x01\n\x0eRequestPayload\x12\x31\n\x0csend_message\x18\x64 \x01(\x0b\x32\x1b.message.SendMessageRequest\x12\x44\n\x15\x63onversation_messages\x18\xad\x02 \x01(\x0b\x32$.message.ConversationMessagesRequest\"\xe1\x02\n\x12SendMessageRequest\x12\x17\n\x0f\x63onversation_id\x18\x01 \x01(\t\x12\x19\n\x11\x63onversation_type\x18\x02 \x01(\x05\x12\x1d\n\x15\x63onversation_short_id\x18\x03 \x01(\x03\x12\x0f\n\x07\x63ontent\x18\x04 \x01(\t\x12\x31\n\x03\x65xt\x18\x05 \x03(\x0b\x32$.message.SendMessageRequest.ExtEntry\x12\x14\n\x0cmessage_type\x18\x06 \x01(\x05\x12\x0e\n\x06ticket\x18\x07 \x01(\t\x12\x19\n\x11\x63lient_message_id\x18\x08 \x01(\t\x12\x17\n\x0fmentioned_users\x18\t \x03(\x03\x12.\n\x0equoted_message\x18\x0b 
\x01(\x0b\x32\x16.message.QuotedMessage\x1a*\n\x08\x45xtEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\"v\n\rQuotedMessage\x12\x1d\n\x15referenced_message_id\x18\x01 \x01(\x03\x12\x0c\n\x04hint\x18\x02 \x01(\t\x12\x17\n\x0froot_message_id\x18\x03 \x01(\x03\x12\x1f\n\x17root_message_conv_index\x18\x04 \x01(\x03\"\xa8\x01\n\x1b\x43onversationMessagesRequest\x12\x17\n\x0f\x63onversation_id\x18\x01 \x01(\t\x12\x19\n\x11\x63onversation_type\x18\x02 \x01(\x05\x12\x1d\n\x15\x63onversation_short_id\x18\x03 \x01(\x03\x12\x11\n\tdirection\x18\x04 \x01(\x05\x12\x14\n\x0c\x61nchor_index\x18\x05 \x01(\x03\x12\r\n\x05limit\x18\x06 \x01(\x05\"\xed\x02\n\x10ResponseEnvelope\x12\x0b\n\x03\x63md\x18\x01 \x01(\x05\x12\x13\n\x0bsequence_id\x18\x02 \x01(\x03\x12\x13\n\x0bstatus_code\x18\x03 \x01(\x05\x12\x12\n\nerror_desc\x18\x04 \x01(\t\x12\x12\n\ninbox_type\x18\x05 \x01(\x05\x12&\n\x04\x62ody\x18\x06 \x01(\x0b\x32\x18.message.ResponsePayload\x12\x0e\n\x06log_id\x18\x07 \x01(\t\x12\x37\n\x07headers\x18\x08 \x03(\x0b\x32&.message.ResponseEnvelope.HeadersEntry\x12\x18\n\x10start_time_stamp\x18\t \x01(\x03\x12\x1c\n\x14request_arrived_time\x18\n \x01(\x03\x12!\n\x19server_execution_end_time\x18\x0b \x01(\x03\x1a.\n\x0cHeadersEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\"\x8c\x01\n\x0fResponsePayload\x12\x32\n\x0csend_message\x18\x64 \x01(\x0b\x32\x1c.message.SendMessageResponse\x12\x45\n\x15\x63onversation_messages\x18\xad\x02 \x01(\x0b\x32%.message.ConversationMessagesResponse\"\x9a\x01\n\x13SendMessageResponse\x12\x19\n\x11server_message_id\x18\x01 \x01(\x03\x12\x12\n\nextra_info\x18\x02 \x01(\t\x12\x0e\n\x06status\x18\x03 \x01(\x05\x12\x19\n\x11\x63lient_message_id\x18\x04 \x01(\t\x12\x12\n\ncheck_code\x18\x05 \x01(\x03\x12\x15\n\rcheck_message\x18\x06 \x01(\t\"u\n\x1c\x43onversationMessagesResponse\x12.\n\x08messages\x18\x01 \x03(\x0b\x32\x1c.message.ConversationMessage\x12\x13\n\x0bnext_cursor\x18\x02 
\x01(\x03\x12\x10\n\x08has_more\x18\x03 \x01(\x08\"\xea\x04\n\x13\x43onversationMessage\x12\x17\n\x0f\x63onversation_id\x18\x01 \x01(\t\x12\x19\n\x11\x63onversation_type\x18\x02 \x01(\x05\x12\x19\n\x11server_message_id\x18\x03 \x01(\x03\x12\x1d\n\x15index_in_conversation\x18\x04 \x01(\x03\x12\x1d\n\x15\x63onversation_short_id\x18\x05 \x01(\x03\x12\x14\n\x0cmessage_type\x18\x06 \x01(\x05\x12\x0e\n\x06sender\x18\x07 \x01(\x03\x12\x0f\n\x07\x63ontent\x18\x08 \x01(\t\x12\x32\n\x03\x65xt\x18\t \x03(\x0b\x32%.message.ConversationMessage.ExtEntry\x12\x13\n\x0b\x63reate_time\x18\n \x01(\x03\x12\x0f\n\x07version\x18\x0b \x01(\x03\x12\x0e\n\x06status\x18\x0c \x01(\x05\x12\x1d\n\x15order_in_conversation\x18\r \x01(\x03\x12\x12\n\nsec_sender\x18\x0e \x01(\t\x12\x45\n\rproperty_list\x18\x0f \x03(\x0b\x32..message.ConversationMessage.PropertyListEntry\x12,\n\treference\x18\x12 \x01(\x0b\x32\x19.message.MessageReference\x1a*\n\x08\x45xtEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\x1aQ\n\x11PropertyListEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12+\n\x05value\x18\x02 \x01(\x0b\x32\x1c.message.MessagePropertyList:\x02\x38\x01\"j\n\x0fMessageProperty\x12\x0b\n\x03uid\x18\x01 \x01(\x03\x12\x0f\n\x07sec_uid\x18\x02 \x01(\t\x12\x13\n\x0b\x63reate_time\x18\x03 \x01(\x03\x12\x15\n\ridempotent_id\x18\x04 \x01(\t\x12\r\n\x05value\x18\x05 \x01(\t\"@\n\x13MessagePropertyList\x12)\n\x07\x65ntries\x18\x01 \x03(\x0b\x32\x18.message.MessageProperty\"\xb6\x01\n\x10MessageReference\x12\x1d\n\x15referenced_message_id\x18\x01 \x01(\x03\x12\x0c\n\x04hint\x18\x02 \x01(\t\x12\x18\n\x10ref_message_type\x18\x03 \x01(\x03\x12!\n\x19referenced_message_status\x18\x04 \x01(\x05\x12\x17\n\x0froot_message_id\x18\x05 \x01(\x03\x12\x1f\n\x17root_message_conv_index\x18\x06 \x01(\x03\x62\x06proto3')

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'msg_pb2', _globals)
if not _descriptor._USE_C_DESCRIPTORS:
  DESCRIPTOR._loaded_options = None
  _globals['_REQUESTENVELOPE_HEADERSENTRY']._loaded_options = None
  _globals['_REQUESTENVELOPE_HEADERSENTRY']._serialized_options = b'8\001'
  _globals['_SENDMESSAGEREQUEST_EXTENTRY']._loaded_options = None
  _globals['_SENDMESSAGEREQUEST_EXTENTRY']._serialized_options = b'8\001'
  _globals['_RESPONSEENVELOPE_HEADERSENTRY']._loaded_options = None
  _globals['_RESPONSEENVELOPE_HEADERSENTRY']._serialized_options = b'8\001'
  _globals['_CONVERSATIONMESSAGE_EXTENTRY']._loaded_options = None
  _globals['_CONVERSATIONMESSAGE_EXTENTRY']._serialized_options = b'8\001'
  _globals['_CONVERSATIONMESSAGE_PROPERTYLISTENTRY']._loaded_options = None
  _globals['_CONVERSATIONMESSAGE_PROPERTYLISTENTRY']._serialized_options = b'8\001'
  _globals['_REQUESTENVELOPE']._serialized_start=23
  _globals['_REQUESTENVELOPE']._serialized_end=512
  _globals['_REQUESTENVELOPE_HEADERSENTRY']._serialized_start=466
  _globals['_REQUESTENVELOPE_HEADERSENTRY']._serialized_end=512
  _globals['_TOKENINFO']._serialized_start=514
  _globals['_TOKENINFO']._serialized_end=608
  _globals['_REQUESTPAYLOAD']._serialized_start=611
  _globals['_REQUESTPAYLOAD']._serialized_end=748
  _globals['_SENDMESSAGEREQUEST']._serialized_start=751
  _globals['_SENDMESSAGEREQUEST']._serialized_end=1104
  _globals['_SENDMESSAGEREQUEST_EXTENTRY']._serialized_start=1062
  _globals['_SENDMESSAGEREQUEST_EXTENTRY']._serialized_end=1104
  _globals['_QUOTEDMESSAGE']._serialized_start=1106
  _globals['_QUOTEDMESSAGE']._serialized_end=1224
  _globals['_CONVERSATIONMESSAGESREQUEST']._serialized_start=1227
  _globals['_CONVERSATIONMESSAGESREQUEST']._serialized_end=1395
  _globals['_RESPONSEENVELOPE']._serialized_start=1398
  _globals['_RESPONSEENVELOPE']._serialized_end=1763
  _globals['_RESPONSEENVELOPE_HEADERSENTRY']._serialized_start=466
  _globals['_RESPONSEENVELOPE_HEADERSENTRY']._serialized_end=512
  _globals['_RESPONSEPAYLOAD']._serialized_start=1766
  _globals['_RESPONSEPAYLOAD']._serialized_end=1906
  _globals['_SENDMESSAGERESPONSE']._serialized_start=1909
  _globals['_SENDMESSAGERESPONSE']._serialized_end=2063
  _globals['_CONVERSATIONMESSAGESRESPONSE']._serialized_start=2065
  _globals['_CONVERSATIONMESSAGESRESPONSE']._serialized_end=2182
  _globals['_CONVERSATIONMESSAGE']._serialized_start=2185
  _globals['_CONVERSATIONMESSAGE']._serialized_end=2803
  _globals['_CONVERSATIONMESSAGE_EXTENTRY']._serialized_start=1062
  _globals['_CONVERSATIONMESSAGE_EXTENTRY']._serialized_end=1104
  _globals['_CONVERSATIONMESSAGE_PROPERTYLISTENTRY']._serialized_start=2722
  _globals['_CONVERSATIONMESSAGE_PROPERTYLISTENTRY']._serialized_end=2803
  _globals['_MESSAGEPROPERTY']._serialized_start=2805
  _globals['_MESSAGEPROPERTY']._serialized_end=2911
  _globals['_MESSAGEPROPERTYLIST']._serialized_start=2913
  _globals['_MESSAGEPROPERTYLIST']._serialized_end=2977
  _globals['_MESSAGEREFERENCE']._serialized_start=2980
  _globals['_MESSAGEREFERENCE']._serialized_end=3162
# @@protoc_insertion_point(module_scope)

5. Source Code Implementation

Finally, use Python to parse the binary data and obtain the plaintext. The complete code is as follows:

python
import json
from datetime import datetime
from urllib.parse import unquote

import requests
from loguru import logger

import msg_pb2 as message_pb2  # protoc generates msg_pb2.py from msg.proto

REQUEST_HEADERS = {}  # obtain these yourself
REQUEST_COOKIES = {}  # obtain these yourself; passed to requests.post() in main()

REQUEST_URL = "https://imapi.jinritemai.com/v1/message/get_by_conversation"
REQUEST_QUERY = {
    "x-use-ppe": "1",
    "x-tt-env": "prod",
    "PIGEON_BIZ_TYPE": "5",
    "pigeon_source": "web",
    "pigeon_sign": "",
    "msToken": "",
    "X-Bogus": "",
}

def build_conversation_messages_request():
    request_envelope = message_pb2.RequestEnvelope(
        cmd=301,
        sequence_id=25275,
        sdk_version="0.3.1-dev.2",
        token="",
        refer=3,
        inbox_type=2,
        build_number="8cf4734:dev-0.3.1",
        device_platform="web",
    )

    conversation_request = request_envelope.body.conversation_messages
    conversation_request.conversation_id = "7564320634436666674"  # conversation ID
    conversation_request.conversation_type = 2
    conversation_request.conversation_short_id = 7564320634436666674  # conversation ID
    conversation_request.direction = 1
    conversation_request.anchor_index = 0
    conversation_request.limit = 20
    return request_envelope


def parse_json_if_possible(raw_value):
    if not raw_value:
        return None

    decoded_value = unquote(raw_value)
    candidates = [decoded_value] if decoded_value != raw_value else []
    candidates.append(raw_value)

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue

    return None


def format_timestamp(timestamp_ms):
    if not timestamp_ms:
        return None

    try:
        return datetime.fromtimestamp(timestamp_ms / 1000).strftime("%Y-%m-%d %H:%M:%S")
    except Exception:
        return None


def to_plain_ext_map(ext_map):
    if hasattr(ext_map, "items"):
        return dict(ext_map.items())
    if isinstance(ext_map, dict):
        return dict(ext_map)
    return {item.key: item.value for item in ext_map}


def normalize_message_properties(property_map):
    if not property_map:
        return {}

    normalized_property_map = {}
    property_items = property_map.items() if hasattr(property_map, "items") else property_map
    for property_key, property_list in property_items:
        normalized_property_map[property_key] = [
            {
                "uid": int(property_item.uid),
                "sec_uid": property_item.sec_uid,
                "create_time": int(property_item.create_time),
                "idempotent_id": property_item.idempotent_id,
                "value": property_item.value,
            }
            for property_item in property_list.entries
        ]
    return normalized_property_map


def normalize_message_reference(message_reference):
    if message_reference is None:
        return None

    if not getattr(message_reference, "referenced_message_id", 0):
        return None

    return {
        "referenced_message_id": int(message_reference.referenced_message_id),
        "hint": message_reference.hint,
        "ref_message_type": int(message_reference.ref_message_type),
        "referenced_message_status": message_reference.referenced_message_status,
        "root_message_id": int(message_reference.root_message_id),
        "root_message_conv_index": int(message_reference.root_message_conv_index),
    }


def normalize_message(conversation_message):
    ext_map = to_plain_ext_map(conversation_message.ext)
    parsed_content = parse_json_if_possible(conversation_message.content)
    parsed_business_ext = parse_json_if_possible(ext_map.get("biz_ext"))
    parsed_track_info = parse_json_if_possible(ext_map.get("track_info"))
    decoded_content = unquote(conversation_message.content)

    return {
        "server_message_id": int(conversation_message.server_message_id),
        "conversation_id": conversation_message.conversation_id,
        "conversation_short_id": int(conversation_message.conversation_short_id),
        "index_in_conversation": int(conversation_message.index_in_conversation),
        "order_in_conversation": int(conversation_message.order_in_conversation),
        "message_type": conversation_message.message_type,
        "message_kind": ext_map.get("type"),
        "sender_uid": int(conversation_message.sender),
        "sender_id": ext_map.get("sender_id"),
        "receiver_id": ext_map.get("receiver_id"),
        "client_message_id": ext_map.get("s:client_message_id"),
        "source_type": ext_map.get("source_type"),
        "from_source": ext_map.get("p:from_source"),
        "is_stranger": ext_map.get("s:is_stranger"),
        "card_type": ext_map.get("card_type"),
        "create_time_ms": int(conversation_message.create_time),
        "create_time": format_timestamp(int(conversation_message.create_time)),
        "content": conversation_message.content,
        "decoded_content": decoded_content,
        "content_obj": parsed_content,
        "biz_ext": parsed_business_ext,
        "track_info": parsed_track_info,
        "property_list": normalize_message_properties(conversation_message.property_list),
        "reference_info": normalize_message_reference(conversation_message.reference),
    }


def main():
    request_envelope = build_conversation_messages_request()
    payload = request_envelope.SerializeToString()
    request_headers = {**REQUEST_HEADERS, "content-length": str(len(payload))}

    http_response = requests.post(
        REQUEST_URL,
        headers=request_headers,
        params=REQUEST_QUERY,
        data=payload,
        cookies=REQUEST_COOKIES,
        timeout=30,
    )

    print(
        json.dumps(
            {
                "http_status": http_response.status_code,
                "content_type": http_response.headers.get("content-type", ""),
                "content_length": len(http_response.content),
            },
            ensure_ascii=False,
            indent=2,
        )
    )

    response_envelope = message_pb2.ResponseEnvelope()
    try:
        response_envelope.ParseFromString(http_response.content)
    except Exception as exc:
        print(f"protobuf parse failed: {exc}")
        print(http_response.text[:1000])
        raise


    conversation_messages = response_envelope.body.conversation_messages
    normalized_messages = [normalize_message(message) for message in conversation_messages.messages]

    logger.success(
        json.dumps(
            normalized_messages,
            ensure_ascii=False,
            indent=2,
        )
    )


if __name__ == "__main__":
    main()

Take a conversation ID from the page and run the script: the message content in the parsed JSON matches what is shown in the web chat window.
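
One caveat when the resulting JSON is consumed by JavaScript: int64 IDs such as conversation_short_id exceed the range in which a JS number is exact (2^53 - 1). The sketch below is an optional post-processing step, not part of the original script; the function name is hypothetical, and it simply converts oversized integers to strings before dumping.

```python
import json

JS_MAX_SAFE_INTEGER = 2**53 - 1


def stringify_unsafe_ints(value):
    """Recursively convert ints outside the JS safe range to strings."""
    if isinstance(value, bool):  # bool is a subclass of int; leave it alone
        return value
    if isinstance(value, int):
        return str(value) if abs(value) > JS_MAX_SAFE_INTEGER else value
    if isinstance(value, dict):
        return {key: stringify_unsafe_ints(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_unsafe_ints(item) for item in value]
    return value


message = {"conversation_short_id": 7564320634436666674, "limit": 20}

# A double cannot represent this ID exactly, so a float round trip changes it.
print(int(float(message["conversation_short_id"])) == message["conversation_short_id"])
print(json.dumps(stringify_unsafe_ints(message), ensure_ascii=False))
```

Python's own json module keeps the integer exact, so this only matters if the output is handed to a consumer that parses numbers as doubles.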

