Credit Card Fraud Detection Dataset

Anonymized credit card transactions labeled as fraudulent or genuine

About Dataset

Context

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

Content

The dataset contains credit card transactions made by European cardholders in September 2013.

This dataset covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. It is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable: it takes the value 1 in case of fraud and 0 otherwise.
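As a rough illustration of the description above, the following Python sketch rebuilds the column layout (Time, V1 to V28, Amount, Class), recomputes the fraud rate from the quoted counts, and shows one possible example-dependent cost function. The cost rule and the `admin_cost` value are assumptions made for illustration; they are not part of the dataset or its documentation.

```python
# Counts quoted in the dataset description.
N_TRANSACTIONS = 284_807
N_FRAUDS = 492

# Column layout implied by the description: Time, V1..V28, Amount, Class.
COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def fraud_rate(n_frauds: int, n_total: int) -> float:
    """Fraction of transactions labeled as fraud (Class == 1)."""
    return n_frauds / n_total


def example_cost(y_true: int, y_pred: int, amount: float,
                 admin_cost: float = 5.0) -> float:
    """Hypothetical example-dependent cost of one prediction.

    Assumption (not from the dataset): a missed fraud costs the transaction
    amount, any raised alert costs a fixed investigation fee `admin_cost`,
    and a correctly ignored genuine transaction costs nothing.
    """
    if y_true == 1 and y_pred == 0:
        return amount        # missed fraud: the amount is lost
    if y_pred == 1:
        return admin_cost    # alert raised: someone has to investigate
    return 0.0               # genuine transaction, no alert


if __name__ == "__main__":
    print(f"{len(COLUMNS)} columns, fraud rate = "
          f"{100 * fraud_rate(N_FRAUDS, N_TRANSACTIONS):.4f}%")
    print("cost of missing a 120.00 fraud:", example_cost(1, 0, 120.0))
```

Because the cost of a missed fraud depends on 'Amount', two misclassified transactions can carry very different costs, which is the idea behind example-dependent cost-sensitive learning.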

Given the class imbalance ratio, we recommend measuring performance using the Area Under the Precision-Recall Curve (AUPRC). Accuracy computed from the confusion matrix is not meaningful for unbalanced classification.
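The point about accuracy can be made concrete with a small, dependency-free Python sketch. It uses only the class counts quoted above and a made-up toy ranking; average precision is used here as one common step-wise estimate of AUPRC, and the toy scores are purely illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_precision(y_true, scores):
    """Mean of precision@k taken at the ranks k where a true fraud appears.

    Assumes scores are distinct; tied scores keep their input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` items
    return total / n_pos


# Labels with the class counts from the description: 492 frauds in 284,807 rows.
labels = [1] * 492 + [0] * (284_807 - 492)

# A useless model that never flags fraud still scores a very high accuracy.
always_genuine = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_genuine)

# Toy ranking with made-up scores: the two frauds sit at ranks 1 and 3.
toy_labels = [1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2

if __name__ == "__main__":
    print(f"accuracy of 'always genuine': {baseline_accuracy:.4f}")
    print(f"average precision on the toy ranking: {toy_ap:.3f}")
```

The baseline catches no fraud at all yet reaches an accuracy above 99.8%, whereas a precision-recall summary depends directly on how the frauds are ranked. If scikit-learn is available, `sklearn.metrics.average_precision_score` provides a similar summary and is likely a better choice than this hand-written version.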

Update (03/05/2021)

A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.

Acknowledgements

The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.

More details on current and past projects on related topics are available at https://www.researchgate.net/project/Fraud-detection-5 and on the page of the DefeatFraud project.

Please cite the following works:

Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.

Andrea Dal Pozzolo, Olivier Caelen, Yann-Aël Le Borgne, Serge Waterschoot and Gianluca Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928, 2014, Pergamon.

Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi and Gianluca Bontempi. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, 2018, IEEE.

Andrea Dal Pozzolo. Adaptive Machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).

Fabrizio Carcillo, Andrea Dal Pozzolo, Yann-Aël Le Borgne, Olivier Caelen, Yannis Mazzer and Gianluca Bontempi. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018, Elsevier.

Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen and Gianluca Bontempi. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285-300, 2018, Springer International Publishing.

Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé and Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp. 78-88, 2019.

Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé and Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.

Yann-Aël Le Borgne and Gianluca Bontempi. Reproducible machine Learning for Credit Card Fraud Detection - Practical Handbook.

Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé and Gianluca Bontempi. Incremental learning strategies for credit cards fraud detection. International Journal of Data Science and Analytics.
