MySQL数据生成工具mysql_random_data_load

在看MySQL文章的时候偶然发现生成数据的工具,此处直接将软件作者的文档贴了过来,说明了使用方式及下载地址

Random data generator for MySQL

Many times in my job I need to generate random data for a specific table in order to reproduce an issue.

After writing many random generators for every table, I decided to write a random data generator, able to get the table structure and generate random data for it.

Plase take into consideration that this is the first version and it doesn't support all field types yet!

NOTICE

This is an early stage project.

Supported fields:

Field type Generated values
tinyint 0 ~ 0xFF
smallint 0 ~ 0XFFFF
mediumint 0 ~ 0xFFFFFF
int - integer 0 ~ 0xFFFFFFFF
bigint 0 ~ 0xFFFFFFFFFFFFFFFF
float 0 ~ 1e8
decimal(m,n) 0 ~ 10^(m-n)
double 0 ~ 1000
char(n) up to n random chars
varchar(n) up to n random chars
date NOW() - 1 year ~ NOW()
datetime NOW() - 1 year ~ NOW()
timestamp NOW() - 1 year ~ NOW()
time 00:00:00 ~ 23:59:59
year Current year - 1 ~ current year
tinyblob up to 100 chars random paragraph
tinytext up to 100 chars random paragraph
blob up to 100 chars random paragraph
text up to 100 chars random paragraph
mediumblob up to 100 chars random paragraph
mediumtext up to 100 chars random paragraph
longblob up to 100 chars random paragraph
longtext up to 100 chars random paragraph
varbinary up to 100 chars random paragraph
enum A random item from the valid items list
set A random item from the valid items list

How strings are generated

  • If field size < 10 the program generates a random "first name"
  • If the field size > 10 and < 30 the program generates a random "full name"
  • If the field size > 30 the program generates a "lorem ipsum" paragraph having up to 100 chars.

The program can detect if a field accepts NULLs and if it does, it will generate NULLs ramdomly (~ 10 % of the values).

Usage

mysql_random_data_load <database> <table> <number of rows> [options...]

Options

Option Description
--bulk-size Number of rows per INSERT statement (Default: 1000)
--debug Show some debug information
--fk-samples-factor Percentage used to get random samples for foreign keys fields. Default 0.3
--host Host name/ip
--max-fk-samples Maximum number of samples for fields having foreign keys constarints. Default: 100
--max-retries Maximum number of rows to retry in case of errors. See duplicated keys. Deafult: 100
--no-progressbar Skip showing the progress bar. Default: false
--password Password
--port Port number
--Print Print queries to the standard output instead of inserting them into the db
--user Username
--version Show version and exit

Foreign keys support

If a field has Foreign Keys constraints, random-data-load will get up to --max-fk-samples random samples from the referenced tables in order to insert valid values for the field.

The number of samples to get follows this rules:
1. Get the aproximate number of rows in the referenced table using the rows field in:

复制代码
EXPLAIN SELECT COUNT(*) FROM <referenced schema>.<referenced table>

1.1 If the number of rows is less than max-fk-samples, all rows are retrieved from the referenced table using this query:

复制代码
SELECT <referenced field> FROM <referenced schema>.<referenced table>

1.2 If the number of rows is greater than max-fk-samples, samples are retrieved from the referenced table using this query:

复制代码
SELECT <referenced field> FROM <referenced schema>.<referenced table> WHERE RAND() <= <fk-samples-factor> LIMIT <max-fk-samples>

Example

复制代码
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE `test`.`t3` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tcol01` tinyint(4) DEFAULT NULL,
  `tcol02` smallint(6) DEFAULT NULL,
  `tcol03` mediumint(9) DEFAULT NULL,
  `tcol04` int(11) DEFAULT NULL,
  `tcol05` bigint(20) DEFAULT NULL,
  `tcol06` float DEFAULT NULL,
  `tcol07` double DEFAULT NULL,
  `tcol08` decimal(10,2) DEFAULT NULL,
  `tcol09` date DEFAULT NULL,
  `tcol10` datetime DEFAULT NULL,
  `tcol11` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `tcol12` time DEFAULT NULL,
  `tcol13` year(4) DEFAULT NULL,
  `tcol14` varchar(100) DEFAULT NULL,
  `tcol15` char(2) DEFAULT NULL,
  `tcol16` blob,
  `tcol17` text,
  `tcol18` mediumtext,
  `tcol19` mediumblob,
  `tcol20` longblob,
  `tcol21` longtext,
  `tcol22` mediumtext,
  `tcol23` varchar(3) DEFAULT NULL,
  `tcol24` varbinary(10) DEFAULT NULL,
  `tcol25` enum('a','b','c') DEFAULT NULL,
  `tcol26` set('red','green','blue') DEFAULT NULL,
  `tcol27` float(5,3) DEFAULT NULL,
  `tcol28` double(4,2) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

To generate 100K random rows, just run:

复制代码
mysql_random_data_load test t3 100000 --user=root --password=root

mysql> select * from t3 limit 1\G
*************************** 1. row ***************************
    id: 1
tcol01: 10
tcol02: 173
tcol03: 1700
tcol04: 13498
tcol05: 33239373
tcol06: 44846.4
tcol07: 5300.23
tcol08: 11360967.75
tcol09: 2017-09-04
tcol10: 2016-11-02 23:11:25
tcol11: 2017-03-03 08:11:40
tcol12: 03:19:39
tcol13: 2017
tcol14: repellat maxime nostrum provident maiores ut quo voluptas.
tcol15: Th
tcol16: Walter
tcol17: quo repellat accusamus quidem odi
tcol18: esse laboriosam nobis libero aut dolores e
tcol19: Carlos Willia
tcol20: et nostrum iusto ipsa sunt recusa
tcol21: a accusantium laboriosam voluptas facilis.
tcol22: laudantium quo unde molestiae consequatur magnam.
tcol23: Pet
tcol24: Richard
tcol25: c
tcol26: green
tcol27: 47.430
tcol28: 6.12
1 row in set (0.00 sec)

效果良好

How to download the precompiled binaries

There are binaries available for each version for Linux and Darwin. You can find compiled binaries for each version in the releases tab:

https://github.com/Percona-Lab/mysql_random_data_load/releases

相关推荐
特立独行的猫a8 小时前
PostgreSQL客户端工具介绍:从性能测试到跨平台管理
数据库·docker·postgresql·客户端·pgadmin4
微爱帮监所写信寄信8 小时前
微爱帮监狱写信寄信小程序:MySQL核心日志与备份恢复安全架构
数据库·mysql·小程序·邮局·监狱寄信·挂号信·邮政
isNotNullX9 小时前
数据迁移怎么做?有什么好用的数据库迁移工具推荐吗?
数据库·数字化·数据迁移·企业管理
云老大TG:@yunlaoda3609 小时前
华为云国际站代理商DAS的跨境合规适配的应用场景有哪些?
网络·数据库·华为云
38242782710 小时前
python3网络爬虫开发实战 第二版:绑定回调
开发语言·数据库·python
BIBI204910 小时前
Windows 上配置 Nacos Server 3.x.x 使用 MySQL 5.7
java·windows·spring boot·后端·mysql·nacos·配置
wniuniu_10 小时前
ceph的参数
java·数据库·ceph
一只专注api接口开发的技术猿10 小时前
智能决策数据源:利用 1688 商品详情 API 构建实时比价与供应链分析系统
大数据·前端·数据库
山峰哥10 小时前
SQL查询优化秘籍:从Explain分析到性能飞跃
开发语言·数据库·sql·oracle·性能优化·系统优化
刘一说10 小时前
MySQL 版本演进全景图:从 5.6 到 8.4 的技术变革与行业实践
数据库·mysql