Credit Card Fraud Detection Dataset

Credit Card Fraud Detection

Anonymized credit card transactions labeled as fraudulent or genuine

About Dataset

Context

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

Content

The dataset contains transactions made by credit cards in September 2013 by European cardholders.

This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable: it takes the value 1 in case of fraud and 0 otherwise.
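The layout and figures described above can be made concrete with a minimal Python sketch. The column names and counts come from the description; the `total_cost` function and its `admin_cost` parameter are hypothetical, shown only as one possible reading of "example-dependent cost-sensitive learning", not as the authors' method.

```python
# Figures quoted in the dataset description.
N_TRANSACTIONS = 284_807
N_FRAUDS = 492

# Column layout as described: Time, V1..V28, Amount, Class.
COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def fraud_rate(n_frauds, n_total):
    """Return the share of fraudulent transactions as a fraction."""
    return n_frauds / n_total


def total_cost(amounts, y_true, y_pred, admin_cost=1.0):
    """Hypothetical example-dependent cost.

    Assumption (not from the dataset description): every flagged
    transaction costs a fixed admin_cost to investigate, and every
    missed fraud costs its own transaction amount.
    """
    cost = 0.0
    for amount, truth, pred in zip(amounts, y_true, y_pred):
        if pred == 1:
            cost += admin_cost   # investigation cost, fraud or not
        elif truth == 1:
            cost += amount       # missed fraud: the amount is lost
    return cost


if __name__ == "__main__":
    print(len(COLUMNS), "columns")
    print(f"fraud rate: {fraud_rate(N_FRAUDS, N_TRANSACTIONS):.4%}")
```

Under this assumed cost, missing a large fraudulent transaction is penalized more heavily than missing a small one, which is what distinguishes example-dependent costs from a single fixed penalty per error.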

Given the class imbalance ratio, we recommend evaluating models with the Area Under the Precision-Recall Curve (AUPRC). Accuracy computed from the confusion matrix is not meaningful for unbalanced classification.
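To illustrate why plain accuracy is misleading here, the following Python sketch compares it with average precision, a common step-wise estimate of AUPRC. The function names are illustrative and the implementation assumes no tied scores; in practice a library routine such as scikit-learn's `average_precision_score` would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks transactions by descending score and averages the precision
    measured at the rank of each true fraud. Assumes scores are distinct.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Same class balance as the dataset: 492 frauds in 284,807 rows.
    labels = [1] * 492 + [0] * (284_807 - 492)
    always_genuine = [0] * len(labels)
    # A model that never flags fraud still looks excellent on accuracy.
    print(f"accuracy of 'never fraud': {accuracy(labels, always_genuine):.4f}")

    # Toy ranking: two frauds, one of them ranked below a genuine transaction.
    print(f"AP on toy example: {average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):.4f}")
```

A classifier that labels every transaction as genuine catches no fraud at all yet is correct on more than 99.8% of rows, whereas average precision depends only on how well the frauds are ranked above the genuine transactions.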

Update (03/05/2021)

A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.

Acknowledgements

The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.

More details on current and past projects on related topics are available at https://www.researchgate.net/project/Fraud-detection-5 and on the page of the DefeatFraud project.

Please cite the following works:

Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015

Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon

Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE

Dal Pozzolo, Andrea. Adaptive Machine learning for credit card fraud detection, ULB MLG PhD thesis (supervised by G. Bontempi)

Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing

Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019

Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection, Information Sciences, 2019

Yann-Aël Le Borgne, Gianluca Bontempi. Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook

Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé, Gianluca Bontempi. Incremental learning strategies for credit cards fraud detection, International Journal of Data Science and Analytics

