Fill database tables with a large amount of test data - mysql

I need to load a table with a large amount of test data. This is to be used for testing performance and scaling.
How can I easily create 100,000 rows of random/junk data for my database table?

You could also use a stored procedure. Consider the following table as an example:
CREATE TABLE your_table (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, val int);
Then you could add a stored procedure like this:
DELIMITER $$
CREATE PROCEDURE prepare_data()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 100000 DO
INSERT INTO your_table (val) VALUES (i);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
When you call it, you'll have 100k records:
CALL prepare_data();

For multiple row cloning (data duplication) you could use
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i < 100000 DO
INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
SELECT `user_id`, `page_id`, `name`, `description`, `created`
FROM `table`
WHERE id = 1;
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;

Here is a solution with pure math and SQL:
create table t1(x int primary key auto_increment);
insert into t1 () values (),(),();
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 1265 rows affected (0.01 sec)
Records: 1265 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 2530 rows affected (0.02 sec)
Records: 2530 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 5060 rows affected (0.03 sec)
Records: 5060 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 10120 rows affected (0.05 sec)
Records: 10120 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 20240 rows affected (0.12 sec)
Records: 20240 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 40480 rows affected (0.17 sec)
Records: 40480 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 80960 rows affected (0.31 sec)
Records: 80960 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 161920 rows affected (0.57 sec)
Records: 161920 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 323840 rows affected (1.13 sec)
Records: 323840 Duplicates: 0 Warnings: 0
mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 647680 rows affected (2.33 sec)
Records: 647680 Duplicates: 0 Warnings: 0

If you want more control over the data, try something like this (in PHP):
<?php
$conn = mysql_connect(...);
$num = 100000;
$sql = 'INSERT INTO `table` (`col1`, `col2`, ...) VALUES ';
for ($i = 0; $i < $num; $i++) {
mysql_query($sql . generate_test_values($i));
}
?>
where function generate_test_values would return a string formatted like "('val1', 'val2', ...)". If this takes a long time, you can batch them so you're not making so many db calls, e.g.:
for ($i = 0; $i < $num; $i += 10) {
$values = array();
for ($j = 0; $j < 10; $j++) {
$values[] = generate_test_values($i + $j);
}
mysql_query($sql . join(", ", $values));
}
would only run 10000 queries, each adding 10 rows.
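The batching idea above is not specific to PHP. Here is a minimal Python sketch that only builds the SQL strings and does not connect to a database. The table name `test_table`, the columns `col1` and `col2`, and the body of `generate_test_values` are hypothetical placeholders, not part of the original answer.

```python
def generate_test_values(i):
    """Hypothetical row generator: returns one "(...)" value tuple as SQL text."""
    return "({}, 'name_{}')".format(i, i)


def batched_inserts(num_rows, batch_size):
    """Yield multi-row INSERT statements, each carrying up to batch_size rows."""
    # Table and column names are assumptions for illustration only.
    prefix = "INSERT INTO `test_table` (`col1`, `col2`) VALUES "
    for start in range(0, num_rows, batch_size):
        end = min(start + batch_size, num_rows)
        values = [generate_test_values(i) for i in range(start, end)]
        yield prefix + ", ".join(values)


statements = list(batched_inserts(100000, 10))
print(len(statements))      # number of queries that would be sent
print(statements[0][:90])   # start of the first multi-row INSERT
```

Larger batches mean fewer round trips to the server. With real data, values should be escaped or bound as parameters by the database driver rather than formatted into the string.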

Try filldb.
You can either post your schema or use an existing one, generate dummy data on the site, then export it and import it into your database.

I really like the mysql_random_data_load utility from Percona, you can find more details about it here.
mysql_random_data_load is a utility that connects to the MySQL database and fills the specified table with random data. If foreign keys are present in the table, they will also be correctly filled.
This utility has a cool feature: the speed of data generation can be limited.
For example, to generate 30,000 records in the sakila.film_actor table at a rate of 500 records per second, use the following command:
mysql_random_data_load sakila film_actor 30000 --host=127.0.0.1 --port=3306 --user=my_user --password=my_password --qps=500 --bulk-size=1
I have successfully used this tool to simulate a workload in a test environment by running this utility on multiple threads at different speeds for different tables.

create table mydata as select * from information_schema.columns;
insert into mydata select * from mydata;
-- repeating the insert 11 times will give you at least 6 mln rows in the table.
I am sorry if this is out of place, but I wanted to explain this code, since the answer above is quite useful once you understand what it does.
The first line creates a table called mydata with CREATE TABLE ... AS SELECT. It takes its column layout from information_schema.columns, part of the information_schema database that stores metadata about your MySQL server, and it also copies that table's rows in, so you get both the columns and some initial data automatically. Very handy.
The second line is an INSERT that targets the new mydata table and inserts a copy of every row already in mydata, so each run doubles the row count. The last line is just a comment suggesting you repeat the insert if you want more data.
In my testing, one execution of this script generated 6,956 rows of data. If you need a quick way to generate some records, this isn't a bad method. For more advanced testing, however, you might want to ALTER the table to add an auto-incrementing primary key so that you have a unique index; a table without a primary key is a sad table, and results tend to be unpredictable when there can be duplicate entries.
All that being said, I wanted to offer some insight into this code because I found it useful, and I think others might as well once it is explained. Most people aren't fans of executing code when they have no idea what it will do, even from a trusted source. I'm not offering this as "the answer" but as supporting information for the answer above.

This is a more performant modification of @michalzuber's answer. The only difference is removing the WHERE id = 1, so that the inserts accumulate on each run.
Starting from a single row, the number of records after n iterations is 2^n, because every iteration doubles the table.
So for 10 iterations, 2^10 = 1024 records.
For 20 iterations, 2^20 = 1048576 records, and so on.
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= 10 DO
INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
SELECT `user_id`, `page_id`, `name`, `description`, `created`
FROM `table`;
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;
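Because each pass of INSERT ... SELECT from the same table copies every row already present, the row count doubles per iteration. A small Python sketch of that arithmetic, assuming the table starts with a known number of rows (the function names are made up for this illustration):

```python
def rows_after(initial_rows, iterations):
    """Row count after repeatedly inserting a copy of the whole table into itself."""
    rows = initial_rows
    for _ in range(iterations):
        rows += rows  # each pass inserts a copy of every existing row
    return rows


def doublings_needed(initial_rows, target_rows):
    """Smallest number of passes so the table holds at least target_rows rows."""
    passes = 0
    rows = initial_rows
    while rows < target_rows:
        rows *= 2
        passes += 1
    return passes


print(rows_after(1, 10))             # 1024
print(rows_after(1, 20))             # 1048576
print(doublings_needed(3, 100000))   # 16
```

This makes it easy to pick the loop bound: starting from 3 rows, 16 passes are enough to exceed 100,000 rows.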

Related

MySQL: Forbid empty string on insert when wrong enum value specified

Is there a way to forbid the empty string when a wrong enum value is specified, without setting SQL to "strict" or "traditional" mode?
CREATE TABLE test (
foo enum('aaa','bbb') NOT NULL
);
INSERT INTO test VALUES('asd');
You could write a CHECK constraint (requires MySQL 8.0):
mysql> set sql_mode='';
mysql> alter table test add check (foo != '');
mysql> INSERT INTO test VALUES('asd');
ERROR 3819 (HY000): Check constraint 'test_chk_1' is violated.
Or you could do something similar in a trigger.
But I recommend you just enable strict mode.
You can write your INSERT statement with a SELECT that will allow you to utilize a WHERE clause:
INSERT INTO test (foo) SELECT 'aaa' WHERE 'aaa' IN ('aaa', 'bbb');
Query OK, 1 row affected (0.05 sec)
Records: 1 Duplicates: 0 Warnings: 0
INSERT INTO test (foo) SELECT 'ccc' WHERE 'ccc' IN ('aaa', 'bbb');
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0

Can't create table through a view with function inside mysql

I created two tables
CREATE TABLE `prova` (
`id` int NOT NULL AUTO_INCREMENT ,
`text` varchar(255) NOT NULL ,
PRIMARY KEY (`id`)
)
;
CREATE TABLE `prova2` (
`id2` int NOT NULL AUTO_INCREMENT ,
`text2` varchar(255) NOT NULL ,
PRIMARY KEY (`id2`)
)
;
insert into prova (text) values ('ffffff');
A function does a select on table one and inserts a row in table two only if the value of variable @test is set to 0:
CREATE FUNCTION `get_prova`()
RETURNS int(11)
BEGIN
declare id_prova int ;
declare test int ;
set @test = 1;
set @id_prova = (select id from prova limit 1);
if (@test = 0) THEN
insert into prova2 (text2) values ('dddd');
end if;
return @id_prova;
END;
then, I create a view that calls this function:
create view temp_prova as
select id,
text,
get_prova() as prova
from prova
I want to create table 3 that contains the result of view:
CREATE TABLE zzz_prova SELECT * FROM temp_prova;
but when I try to create table zzz_prova I get this error:
[SQL]CREATE TABLE zzz_prova SELECT * FROM temp_prova; [Err] 1746 -
Can't update table 'prova2' while 'zzz_prova' is being created.
Why does this error show up?
Thank you
What version of MySQL are you running?
Changes in MySQL 5.6.2 (2011-04-11)
Incompatible Change; Replication: It is no longer possible to issue a
CREATE TABLE ... SELECT statement which changes any tables other than
the table being created. Any such statement is not executed and
instead fails with an error.
One consequence of this change is that FOR UPDATE may no longer be
used at all with the SELECT portion of a CREATE TABLE ... SELECT.
This means that, prior to upgrading from a previous release, you
should rewrite any CREATE TABLE ... SELECT statements that cause
changes in other tables so that the statements no longer do so.
This change also has implications for statement-based replication
between a MySQL 5.6 (or later slave) and a master running a previous
version of MySQL. In such a case, if a CREATE TABLE ... SELECT
statement on the master that causes changes in other tables succeeds
on the master, the statement nonetheless fails on the slave, causing
replication to stop. To keep this from happening, you should either
use row-based replication, or rewrite the offending statement before
running it on the master. (Bug #11749792, Bug #11745361, Bug #39804,
Bug #55876)
References: See also Bug #47899.
UPDATE
MySQL 5.5:
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.5.47 |
+-----------+
1 row in set (0.00 sec)
mysql> DROP FUNCTION IF EXISTS `f`;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE IF EXISTS `t1`;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE IF EXISTS `t2`;
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER |
mysql> CREATE FUNCTION `f`()
-> RETURNS INT
-> BEGIN
-> INSERT INTO `t2` VALUES (1);
-> RETURN 1;
-> END|
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER ;
mysql> CREATE TABLE `t2`(`c1` INT);
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `t1` SELECT `f`() `c1`;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT `c1` FROM `t1`;
+------+
| c1 |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
mysql> SELECT `c1` FROM `t2`;
+------+
| c1 |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
MySQL 5.6:
mysql> SELECT VERSION();
+-----------------+
| VERSION() |
+-----------------+
| 5.6.25 |
+-----------------+
1 row in set (0.00 sec)
mysql> DROP FUNCTION IF EXISTS `f`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DROP TABLE IF EXISTS `t1`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DROP TABLE IF EXISTS `t2`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DELIMITER |
mysql> CREATE FUNCTION `f`()
-> RETURNS INT
-> BEGIN
-> INSERT INTO `t2` VALUES (1);
-> RETURN 1;
-> END|
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER ;
mysql> CREATE TABLE `t2`(`c1` INT);
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `t1` SELECT `f`() `c1`;
ERROR 1746 (HY000): Can't update table 't2' while 't1' is being created.

Randomize timestamp column in large MySQL table

I have a test database table with ~100m rows which were generated by cloning original 3k rows multiple times. Let's say this table describes some events which have timestamps. Due to cloning now we have ~10m events per day which is far from real cases. So I'd like to randomize the date column and scatter records for several days.
Here is the procedure I've come up with:
DROP PROCEDURE IF EXISTS `randomizedates`;
DELIMITER //
CREATE PROCEDURE `randomizedates`(IN `daterange` INT)
BEGIN
DECLARE id INT UNSIGNED;
DECLARE buf TIMESTAMP;
DECLARE done INT DEFAULT FALSE;
DECLARE cur1 CURSOR FOR SELECT event_id FROM events;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur1;
the_loop: LOOP
FETCH cur1 INTO id;
IF done THEN
LEAVE the_loop;
END IF;
SET buf = (SELECT NOW() - INTERVAL FLOOR(RAND() * daterange) DAY);
UPDATE events SET starttime = buf WHERE event_id = id;
END LOOP the_loop;
CLOSE cur1;
END //
DELIMITER ;
On a 3k-row table it executes in ~6 seconds, so assuming linear complexity it will take ~50 hours to run on the 100m-row table. Is there a way to speed it up? Or maybe my procedure is incorrect altogether?
Just do:
set @datarange = 7;
update `events`
set starttime = NOW() - INTERVAL FLOOR(RAND() * @datarange) DAY;
Databases are not good at fetching and processing single rows in a loop, the way we are used to doing in procedural languages (iterators, for-each loops, arrays, etc.). They are best at, and optimized for, processing SQL, which is essentially a declarative language: you declare what you want to get without specifying how to do it, in contrast to procedural languages, which are used to specify the steps the program must perform.
Remember: row by row = slow by slow.
Look at this simple example, which simulates your table and compares your procedure to a single UPDATE:
drop table `events`;
create table `events` as
select * from information_schema.tables
where 1=0;
alter table `events` add column event_id int primary key auto_increment first;
alter table `events` change column create_time starttime timestamp;
insert into `events`
select null, t.*
from information_schema.tables t
cross join (
select 1 from information_schema.tables
limit 100
) xx;
mysql> select count(*) from `events`;
+----------+
| count(*) |
+----------+
| 17200 |
+----------+
We created a table with 17 thousand rows. Now we call the procedure:
mysql> call `randomizedates`(7);
Query OK, 0 rows affected (34.26 sec)
and the update command:
mysql> set @datarange = 7;
Query OK, 0 rows affected (0.00 sec)
mysql> update `events`
-> set starttime = NOW() - INTERVAL FLOOR(RAND() * @datarange) DAY;
Query OK, 17200 rows affected (0.23 sec)
Rows matched: 17200 Changed: 17200 Warnings: 0
As you can see, 34 seconds / 0.23 seconds ≈ 148, so the single UPDATE is roughly 148 times faster. That is a huge difference!
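To see what the per-row expression from the question's procedure, NOW() - INTERVAL FLOOR(RAND() * daterange) DAY, actually produces, here is a rough Python analogue. It is only a sketch: the fixed timestamp stands in for NOW(), and `random.random()` stands in for RAND(), both of which return a value v with 0 <= v < 1.

```python
import math
import random
from datetime import datetime, timedelta


def random_start(now, day_range):
    """Mimic NOW() - INTERVAL FLOOR(RAND() * day_range) DAY for one row."""
    offset_days = math.floor(random.random() * day_range)  # 0 .. day_range - 1
    return now - timedelta(days=offset_days)


# Arbitrary fixed "NOW()" chosen for this illustration.
now = datetime(2024, 1, 8, 12, 0, 0)
offsets = {(now - random_start(now, 7)).days for _ in range(10000)}
print(sorted(offsets))  # day offsets that occurred, all within 0..6

# If FLOOR is applied before the multiplication, the offset is always 0,
# because FLOOR of a value in [0, 1) is 0.
always_zero = {math.floor(random.random()) * 7 for _ in range(10000)}
print(always_zero)
```

The placement of the multiplication matters: it has to happen inside FLOOR, otherwise every row keeps the current timestamp and nothing is scattered.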

Mysql IN stopped working on 5.1

I am trying to do something in Mysql Server 5.1 on Windows.
I am positive this type of query worked in an older version of Mysql as I supplied it to a client previously without a problem.
Basically, a field in one of my tables contains several ids; such as 1,2,3,4,5
The field is of type varchar
I am trying to see if a value exists in the field by using an IN statement, like below. But it returns nothing.
What am I doing wrong? Is there a better way? Thanks.
mysql> create database testing;
Query OK, 1 row affected (0.00 sec)
mysql> use testing;
Database changed
mysql> create table table1(field1 char(20));
Query OK, 0 rows affected (0.01 sec)
mysql> create table table2(field2 char(20));
Query OK, 0 rows affected (0.00 sec)
mysql> insert into table1 values('1');
Query OK, 1 row affected (0.00 sec)
mysql> insert into table2 values('1,2,3');
Query OK, 1 row affected (0.00 sec)
mysql> select * from table1 where field1 in (select field2 from table2);
Empty set (0.00 sec)
From your query select * from table1 where field1 in (select field2 from table2);, what I'm imagining is like this:
Dissecting the sub-query select field2 from table2, you will have:
field2
'1,2,3'
Then the main query will be (substitution):
select * from table1 where field1 in ('1,2,3');
Obviously it will return no rows since the only value that table1.field1 has is '1'. And '1' <> '1,2,3'.
Well, I bet you are looking for this: FIND_IN_SET
Sample query:
SELECT FIND_IN_SET('1', '1,2,3'); will return 1.
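The difference between IN and FIND_IN_SET can be illustrated outside the database. The sketch below is a rough Python analogue for non-NULL string arguments only; `find_in_set` is a hypothetical helper written for this illustration, not a MySQL client API, and it ignores collation rules.

```python
def find_in_set(needle, str_list):
    """Rough analogue of MySQL FIND_IN_SET for non-NULL arguments.

    Returns the 1-based position of needle in the comma-separated str_list,
    or 0 when it is absent or the list is empty.
    """
    if str_list == "":
        return 0
    items = str_list.split(",")
    return items.index(needle) + 1 if needle in items else 0


print(find_in_set("1", "1,2,3"))   # 1
print(find_in_set("3", "1,2,3"))   # 3
print(find_in_set("4", "1,2,3"))   # 0

# IN compares against whole values: the subquery returns the single string '1,2,3'.
print("1" in ["1,2,3"])            # False
```

The last line mirrors the failing query: the subquery yields one value, the whole string '1,2,3', so an equality-based membership test against '1' finds nothing.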
insert into table2 values('1,2,3');
most likely needs to be
insert into table2 values('1');
insert into table2 values('2');
insert into table2 values('3');
Then, your sub select select field2 from table2 returns ('1', '2', '3'), and the IN operator can be used to check if the result of the corresponding field from the main select is contained in this set.
According to the comments, the same schema seems to have worked before. I am not aware that the IN operator can be used like in the question, and splitting of column values into a row set seems to be non-trivial.
Using the FIND_IN_SET() function as proposed by @KaeL, the following query should work:
SELECT b.*
FROM table2 a
INNER JOIN table1 b ON FIND_IN_SET(b.field1, a.field2) > 0;
See also
Single MySQL field with comma separated values
Sample query on SQLFiddle
In any case, you should consider normalizing your schema - using string lists as values of single fields can usually be much better handled by a relational database when the separate values are stored in separate rows in a separate table.
http://sqlfiddle.com/#!2/797cc/3 shows a possible solution.

Why doesn't this IF NOT EXISTS statement work?

I'm trying to write a query which will INSERT a random value between 0 and 9999 INTO a table, provided that value does not already exist there.
However, nothing I wrote works. It seems that a WHERE clause doesn't work with INSERT, and my SQL server fails to execute an IF NOT EXISTS query. Is it incorrect, I wonder?
What should I do? Is there a solution to my problem?
(I'm using MySQL)
SET @rand = ROUND(RAND() * 9999);
IF NOT EXISTS (SELECT `num` FROM `nums` WHERE `num` = @rand)
INSERT INTO `nums` (`num`) VALUES (@rand);
You can do it like here: MySQL: Insert record if not exists in table
INSERT INTO `nums` (`num`)
SELECT *
FROM
(SELECT @rand) AS q
WHERE NOT EXISTS
(SELECT `num`
FROM `nums`
WHERE `num` = @rand);
Try it using a stored procedure like this:
CREATE PROCEDURE my_sp()
BEGIN
SET @rand = ROUND(RAND() * 9999);
IF NOT EXISTS (SELECT `num` FROM `nums` WHERE `num` = @rand) THEN
INSERT INTO `nums` (`num`) VALUES (@rand);
END IF;
END
Statements like IF belong inside a block of code such as a stored procedure. You won't be able to execute them directly at the mysql prompt.
If you just want to insert a random value that wasn't there before, you can also do it like this:
mysql> create table nums(num int, unique key(num));
Query OK, 0 rows affected (0.05 sec)
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> select * from nums;
+------+
| num |
+------+
| 5268 |
| 9075 |
| 9114 |
| 9768 |
+------+
4 rows in set (0.00 sec)
mysql>
With insert ignore, it won't insert a row if it already exists.
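The same "unique key plus ignore duplicates" pattern can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module, so it is SQLite and not MySQL: SQLite spells the statement INSERT OR IGNORE, and the table here is an in-memory stand-in for `nums`. Treat it as an analogy for experimenting with the behavior, not as MySQL code.

```python
import random
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (num INTEGER UNIQUE)")


def insert_random(connection):
    """Try to insert one random value; return True if a row was actually added."""
    value = round(random.random() * 9999)
    cur = connection.execute(
        "INSERT OR IGNORE INTO nums (num) VALUES (?)", (value,)
    )
    return cur.rowcount == 1  # 0 when the value already existed


attempts = 200
inserted = sum(insert_random(conn) for _ in range(attempts))
total = conn.execute("SELECT COUNT(*) FROM nums").fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT num) FROM nums").fetchone()[0]
print(inserted, total, distinct)
```

Note that an ignored duplicate means no row is added for that attempt, so if you need exactly N new rows you have to retry until N inserts succeed.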