1. MySQL sync fails with "OS errno 24 - Too many open files"
2023-11-20 12:30:04.371 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-07], Description:[读取数据库数据失败. 请检查您的配置的 column/table/where/querySql或者向 DBA 寻求帮助.]. - 执行的SQL为: select a.archive_code,a.archive_name,FROM_UNIXTIME(a.archive_file_time/1000,'%Y-%m-%d'),c.contract_code,c.contract_name,FROM_UNIXTIME(c.start_time/1000,'%Y-%m-%d'),FROM_UNIXTIME(c.finish_time/1000,'%Y-%m-%d'),b.brand_name,s.subject_name,co.contract_id,co.customer_code,o.opposites_name,f.field_value,cast(c.contract_type as char),cc.category_name,FROM_UNIXTIME(c.create_time/1000,'%Y-%m-%d'),cc.category_code,x.field_value from company_contract_archive a left join company_contract c on a.contract_id = c.contract_id left join company_brand b on c.brand_id = b.brand_id left join sign_subject s on c.sign_subject = s.subject_id left join company_contract_opposites co on co.contract_id = c.contract_id left join opposites o on co.opposites_id = o.opposites_id left join contract_basics_field_value f on f.contract_id = c.contract_id and f.field_name = '店铺编号' left join htquan_devops.contract_category cc on c.contract_type = cc.category_id left join contract_basics_field_value x on x.contract_id = c.contract_id and x.field_name = '销售地区' 具体错误信息为:java.sql.SQLException: Can't create/write to file '/tmp/MYJLaOfQ' (OS errno 24 - Too many open files)
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.rdbms.util.RdbmsException.asQueryException(RdbmsException.java:81) ~[na:na]
at com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader$Task.startRead(CommonRdbmsReader.java:237) ~[na:na]
at com.alibaba.datax.plugin.reader.mysqlreader.MysqlReader$Task.startRead(MysqlReader.java:81) ~[na:na]
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:57) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
Cause: the querySql's result set makes MySQL open a large number of temporary files.
Solution: the source MySQL server is hitting its open-file limit; raise the open_files_limit setting on the database server, or optimize the querySql so it needs fewer temporary files.
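For reference, a minimal way to check and raise the limit (the value 65535 is an illustrative choice, and the effective limit may also be capped by the operating-system limit of the mysqld process, e.g. ulimit -n or systemd LimitNOFILE):
    -- on the MySQL server, check the current effective limit
    SHOW VARIABLES LIKE 'open_files_limit';
    # my.cnf, [mysqld] section; takes effect after restarting mysqld
    open_files_limit = 65535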
2. Code:[HiveReader-12], Description:[文件类型目前不支持] (file type not currently supported).
- 文件[hdfs://vm-lvmh-cdp-cdh02:8020/data/hive/warehouse/pcd_ods.db/ods_std_dmall_oms_sdb_ome_payments_delta/ds=20230202/.hive-staging_hive_2023-02-03_01-15-57_375_6186868347167071378-29/-ext-10001/tmpstats-1]的类型与用户配置的fileType类型不一致,请确认您配置的目录下面所有文件的类型均为[parquet]
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
at com.alibaba.datax.plugin.reader.hivereader.DFSUtil.addSourceFileByType(DFSUtil.java:320)
at com.alibaba.datax.plugin.reader.hivereader.DFSUtil.getHDFSAllFilesNORegex(DFSU
Solution: the offending path is a .hive-staging temporary directory that Hive left inside the partition directory, so it is not a parquet file; add the following settings on the cluster to keep Hive staging data out of the table path:
hive.insert.into.multilevel.dirs=true
hive.exec.stagingdir=/tmp/hive/staging/.hive-staging
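Note that these settings should only keep new .hive-staging directories out of the table path; a staging directory that is already sitting in the partition (like the one in the error above) still has to be removed before the read can succeed. A cleanup sketch, assuming no Hive job is still writing to that partition:
    hdfs dfs -rm -r '/data/hive/warehouse/pcd_ods.db/ods_std_dmall_oms_sdb_ome_payments_delta/ds=20230202/.hive-staging_*'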
3. DataX writing to Hive (HDFS): the DataX task terminated abnormally without deleting its temporary directory, so the next run of the task fails when it reads files from that leftover temporary directory.
Solution:
1. Increase the basic container memory limit so the DataX task is not killed by the system.
2. Before rerunning the task, delete the leftover temporary directory first (a sketch follows).
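A sketch of step 2, assuming the exact name of the leftover temporary directory is taken from the failed job's log (the paths below are placeholders):
    hdfs dfs -ls /path/to/target/table/partition                      # identify the leftover temporary directory
    hdfs dfs -rm -r -f /path/to/target/table/partition/<temp_dir_from_log>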
4. DataX write to FTP fails: writing to the root directory is not supported yet
The code has been fixed locally but has not been merged into the open-source codebase.
Add the following logic in SftpHelperImpl:
String parentDir;
int lastIndex = StringUtils.lastIndexOf(filePath, IOUtils.DIR_SEPARATOR);
if (lastIndex <= 0) {
    // the file sits directly under the root (or has no separator): use the leading character as the parent
    parentDir = filePath.substring(0, 1);
} else {
    parentDir = filePath.substring(0, lastIndex);
}
5. DataX write to DorisDB fails
Caused by: java.io.IOException: Failed to flush data to StarRocks.{"Status":"Fail","Comment":"","BeginTxnTimeMs":0,"Message":"[INTERNAL_ERROR]too many filtered rows\n0. /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/common/stack_trace.cpp:302: StackTrace::tryCapture() @ 0x000000000ba70197 in /data/doris/be/lib/doris_be\n1. /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000ba6e72d in /data/doris/be/lib/doris_be\n2. /usr/local/software/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::Status doris::Status::Error(int, std::basic_string_view >) @ 0x000000000af07e2b in /data/doris/be/lib/doris_be\n3. /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/common/status.h:348: std::_Function_handler)::$_0>::_M_invoke(std::_Any_data const&, doris::RuntimeState*&&, doris::Status*&&) @ 0x000000000b961a09 in /data/doris/be/lib/doris_be\n4. /usr/local/software/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360: doris::FragmentMgr::_exec_actual(std::shared_ptr, std::function const&) @ 0x000000000b86b36c in
Cause: the error occurs when the data contains a '%' character; this has been fixed in the official code (most likely the form-urlencoded Content-Type header caused '%' to be interpreted as a URL escape, so the fix below stops sending that header):
// httpPut.setHeader("Content-Type", "application/x-www-form-urlencoded");
httpPut.setHeader("two_phase_commit", "false");
6. DataX task hangs: stuck while acquiring the Oracle connection
Symptom: in the sync task, speed stays at 0 and the record count stays at 0.
The log shows no further output from the thread after the pre-SQL finishes, so the key is to check the thread dump and see where that thread is stuck.
The thread dump shows the thread stuck inside the Oracle SQL-execution method (a way to capture the dump is sketched below).
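For reference, a way to capture and inspect the dump (the grep patterns are illustrative; ReaderRunner is the DataX reader thread seen in the stack traces above):
    ps -ef | grep -i datax                      # find the DataX JVM pid
    jstack -l <pid> > datax-threads.dump
    grep -nE -A 20 'ReaderRunner|oracle' datax-threads.dump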
Solution: the exact cause of the hang still needs to be tracked down in the Oracle database itself and in the source code; for now only workarounds are given:
1. Reduce the Oracle timeout (the default is 48h); see the JDBC-level sketch after this list.
2. Configure the task with a timeout and retry.
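As an illustration of option 1, the hang can be bounded at the JDBC level with the Oracle thin driver's connect/read timeout properties; this is a minimal standalone sketch with hypothetical host and credentials, and in DataX the equivalent values would have to be wired into the reader's connection settings:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class OracleTimeoutSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "demo_user");                    // hypothetical credentials
            props.setProperty("password", "demo_pass");
            props.setProperty("oracle.net.CONNECT_TIMEOUT", "10000");  // connect phase, milliseconds
            props.setProperty("oracle.jdbc.ReadTimeout", "600000");    // socket reads while a query runs, milliseconds
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//db-host:1521/ORCL", props);   // hypothetical JDBC URL
                 Statement stmt = conn.createStatement()) {
                stmt.setQueryTimeout(600);                             // per-statement cap, seconds
                stmt.execute("select 1 from dual");
            }
        }
    }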
7. DataX import-to-Hive task reports java.io.EOFException: Premature EOF: no length prefix available
Detailed error log:
2023-01-06 14:45:50.212 [0-0-0-writer] INFO HiveWriter$Task - write to file : [hdfs://dev-datakun-master-1:8020/user/simba/jhjdb/ods_eas_public_t_im_inventorybalance_df/ds=20230105__14e0bc9a_1424_4711_9415_f2790d582151/ods_eas_public_t_im_inventorybalance_df__b60e4145_7cf0_4407_9117_6e26f484c680]WARNING: DFSOutputStream ResponseProcessor exception for block BP-72300943-10.1.3.22-1655292685273:blk_1078063544_4322848
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
Jan 06, 2023 2:46:27 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer setupPipelineForAppendOrRecovery
WARNING: Error Recovery for block BP-72300943-10.1.3.22-1655292685273:blk_1078063544_4322848 in pipeline 10.1.3.173:50010, 10.1.3.55:50010, 10.1.3.5:50010: bad datanode 10.1.3.173:50010
2023-01-06 14:46:30.189 [job-0] INFO StandAloneJobContainerCommunicator - Total 339200 records, 157246868 bytes | Speed 4.81MB/s, 10985 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 2.264s | All Task WaitReaderTime 21.189s | Percentage 0.00%
Jan 06, 2023 2:46:36 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor run
WARNING: DFSOutputStream ResponseProcessor exception for block BP-72300943-10.1.3.22-1655292685273:blk_1078063544_4322849
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
Jan 06, 2023 2:46:36 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer setupPipelineForAppendOrRecovery
Cause analysis: "java.io.EOFException: Premature EOF: no length prefix available" indicates the HDFS server side terminated abnormally, which makes the client fail; since DataX writes to HDFS when loading into Hive, the problem should lie in that write path.
Check the HDFS server-side logs (the log location can be obtained through Datakun).
On the DataNode servers, turn on log output and rerun the DataX task; the logs show the exception is triggered by a 9000 ms timeout, so that timeout should be increased.
Solution: increase the timeout that the DataNode log shows expiring at 9000 ms; candidate settings are sketched below.
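As an assumption to be verified against the actual property named in the DataNode log, the HDFS timeouts most commonly raised for this error are the client/DataNode socket timeouts (values in milliseconds, set in hdfs-site.xml and applied after restarting the affected services):
    dfs.client.socket-timeout=120000
    dfs.datanode.socket.write.timeout=120000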
8. DataX data sync reports "UnstructureStorageReaderUtil - 您尝试读取的列越界,源文件该行有 [9] 列,您尝试读取第 [11]列,数据详情[2023/6/7,690007.0F......" (column index out of bounds: the source row has [9] columns but column [11] was requested)
A DataX task syncing from SFTP into a Simba table fails with:
UnstructureStorageReaderUtil - 您尝试读取的列越界,源文件该行有 [9] 列,您尝试读取第 [11]列,数据详情[2023/6/7,690007.0F......
In addition, the Chinese content shown in the data detail is garbled.
(1) Column index out of bounds
Reproduction: the task is configured as "contains metadata", but the metadata field list cannot be fetched from the SFTP file, so the field list is configured manually; the frontend then numbers the field indexes starting from 10 in the request, which causes the error.
(2) Garbled characters
The file on the server is not in the default UTF-8 charset, so the Chinese content is displayed as garbled text.
Solution
(1) Column index out of bounds
Operational workaround: when the metadata file cannot be obtained, the file does not exist, or the source file has no header, select "contains metadata" and configure the fields manually; when the metadata file can be obtained and contains metadata, the fields must be obtained automatically.
Frontend bug fix: make sure the index values are generated correctly in the scenario described above (metadata selected but the field list cannot be fetched from the SFTP file and is configured manually).
Temporary fix: switch the task to script mode and correct the index values by hand, as in the sketch after this list.
(2) Garbled characters
Make a backup copy of the task, switch the original task to script mode, and set reader.parameter.encoding to GBK.
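A minimal sketch of the relevant reader fragment in script (JSON) mode covering both fixes, assuming the source uses DataX's ftpreader; every field not shown stays as generated by the platform:
    "reader": {
      "name": "ftpreader",
      "parameter": {
        "encoding": "GBK",
        "column": [
          { "index": 0, "type": "string" },
          { "index": 1, "type": "string" }
        ]
      }
    }
Here "encoding": "GBK" addresses the garbled Chinese, and the hand-written "index" values (0-based and never beyond the real column count of the row) replace the wrong indexes produced by the frontend.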