I have a system that executes ALTER queries, and I want to test it. I'm looking for a way to simulate a long-running ALTER query so I can test things like panics, resource usage, and concurrency while it is running.
Is there an existing way to simulate a long-running ALTER query?
I'm using gh-ost for alter execution.
Here's what I do when I want to test a long-running ALTER TABLE:
Create a table.
Fill it with a few million rows of random data, until it's large enough that ALTER TABLE takes a few minutes. How many rows are required depends on the speed of your computer.
Run ALTER TABLE on it.
I have not found a better solution, and I've been using MySQL since 2001.
Here's a trick for filling lots of rows without needing a client app or script:
mysql> create table mytable (id int auto_increment primary key, t text);
Query OK, 0 rows affected (0.05 sec)
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from dual;
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 1 row affected (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 8 rows affected (0.03 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 16 rows affected (0.03 sec)
Records: 16 Duplicates: 0 Warnings: 0
Now I have 32 rows (16+8+4+2+1+1). I can run the same query again as many times as I want, doubling the size of the table each time. It doesn't take long before I have a table several gigabytes in size.
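With the table filled, any ALTER that forces a table rebuild will run long enough to exercise panics, resource usage, and concurrency. A minimal sketch of that last step (the column name is hypothetical; any table-copying ALTER will do):

-- Hypothetical column; forces a full table rebuild on older MySQL.
ALTER TABLE mytable ADD COLUMN extra_payload TEXT;

-- Meanwhile, in a second session, confirm the ALTER is still running:
SHOW PROCESSLIST;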
I have never had a problem like this before, and I can't find what's wrong.
https://i.stack.imgur.com/RJBCK.png
CREATE TABLE setting_cube(i INT, subject VARCHAR(30));
Query OK, 0 rows affected (0.03 sec)
UPDATE setting_cube SET i='1', subject='#uno';
Query OK, 0 rows affected (0.01 sec)
Rows matched: 0 Changed: 0 Warnings: 0
and if I try:
SELECT * FROM setting_cube;
Empty set (0.00 sec)
even though I have updated the table...
What am I doing wrong? Thank you!
The table is empty. You should insert some rows before updating them in the second step:
INSERT INTO setting_cube VALUES (1, 'first row');
CREATE TABLE setting_cube(i INT, subject VARCHAR(30));
-- First you need to insert some data
INSERT INTO setting_cube VALUES (1, 'Test');
-- Now you can update the table
UPDATE setting_cube SET i = 1, subject = '#uno';
SELECT * FROM setting_cube;
I have a table using the InnoDB engine, quite a simple one with 25,000 rows. When I do a simple ALTER, it runs for over 10 minutes:
mysql> ALTER TABLE `quote_followups_istvan`
ADD `customer_ip2` VARCHAR(20) NOT NULL DEFAULT '' AFTER `comment`;
Query OK, 0 rows affected (10 min 52.82 sec)
Records: 0 Duplicates: 0 Warnings: 0
But when I change its engine to MyISAM, I get this:
mysql> alter table quote_followups_istvan engine="MyISAM";
Query OK, 25053 rows affected (0.56 sec)
Records: 25053 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE `quote_followups_istvan`
ADD `customer_ip3` VARCHAR(20) NOT NULL DEFAULT '' AFTER `comment`;
Query OK, 25053 rows affected (0.37 sec)
Records: 25053 Duplicates: 0 Warnings: 0
So, 10 minutes vs. 0.37 seconds...
What am I missing here?
Let me answer my own question. Reading on, articles like this one:
optimize mySql for faster alter table add column
and many more actually say that this is a known "issue" with InnoDB tables, and suggest some alternative approaches.
So I can only conclude that this is normal behavior.
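For what it's worth, one of those alternative approaches, if you are on MySQL 5.6 or later, is online DDL. It doesn't make the rebuild itself faster, but it lets DML proceed while the ALTER runs. A sketch (the column name is hypothetical; MySQL raises an error instead of silently copying the table if INPLACE isn't possible):

-- Requires MySQL 5.6+; errors out rather than falling back to a copy.
ALTER TABLE `quote_followups_istvan`
  ADD `customer_ip4` VARCHAR(20) NOT NULL DEFAULT '' AFTER `comment`,
  ALGORITHM=INPLACE, LOCK=NONE;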
I have a very large MySQL database (1 billion rows) like this:
database : products("name","caracteristics")
Both columns are VARCHAR(50).
Actually, it has no key set, but "name" will be unique, so I think I will alter the table to make "name" the PRIMARY KEY. (I should have done that before... now I guess I need to run a duplicate-removal query before adding the primary key.)
My problem is that performing even a simple query on the table literally takes ages:
SELECT caracteristics FROM products WHERE name = 'blabla' LIMIT 1; -- takes ages
I was thinking of partitioning the existing table.
So here are the questions:
Is it a good idea to fix my performance issues?
How can I achieve that?
Is my idea of using ALTER TABLE to make the 'name' column the PRIMARY KEY also a good one?
Also, about the duplicate-removal query: I found this around here; am I doing it properly? (I don't want to mess up my table...)
DELETE a
FROM products a
LEFT JOIN (
    SELECT MAX(name) maxname, caracteristics
    FROM products
    GROUP BY caracteristics
) b
    ON a.name = b.maxname
    AND a.caracteristics = b.caracteristics
WHERE b.maxname IS NULL;
You can also directly set a PRIMARY KEY with the IGNORE option, like this:
ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
This will delete all rows with duplicate name values. (Note that ALTER IGNORE works in MariaDB and older MySQL, but was removed in MySQL 5.7.)
Sample:
MariaDB [l]> CREATE TABLE `products` (
-> `name` varchar(50) NOT NULL DEFAULT '',
-> `caracteristics` varchar(50) DEFAULT NULL
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]> INSERT INTO `products` (`name`, `caracteristics`)
-> VALUES
-> ('val1', 'asdfasdfasdf'),
-> ('val2', 'asdasDasd'),
-> ('val3', 'aesfawfa'),
-> ('val1', '99999999');
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf |
| val2 | asdasDasd |
| val3 | aesfawfa |
| val1 | 99999999 |
+------+----------------+
4 rows in set (0.00 sec)
MariaDB [l]> ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 1 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf |
| val2 | asdasDasd |
| val3 | aesfawfa |
+------+----------------+
3 rows in set (0.00 sec)
MariaDB [l]>
Test: ADD PRIMARY KEY vs. INSERT IGNORE
Here is a test comparing ADD PRIMARY KEY with INSERT IGNORE INTO; you can see that ADD PRIMARY KEY (roughly 90 vs. 120 seconds) is a little bit faster in this sample.
MariaDB [l]> CREATE TABLE `bigtable10m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable10m`
-> select lpad(seq,8,'0') from seq_1_to_10000000;
Query OK, 10000000 rows affected (24.24 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> SELECT * FROM `bigtable10m` LIMIT 10;
+----------+
| id |
+----------+
| 00000001 |
| 00000002 |
| 00000003 |
| 00000004 |
| 00000005 |
| 00000006 |
| 00000007 |
| 00000008 |
| 00000009 |
| 00000010 |
+----------+
10 rows in set (0.00 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (28.49 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (29.01 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (32.98 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> ALTER IGNORE TABLE `bigtable30m` ADD PRIMARY KEY(id);
Query OK, 30000000 rows affected (1 min 32.34 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 0
MariaDB [l]>
MariaDB [l]> DROP TABLE `bigtable30m`;
Query OK, 0 rows affected (0.52 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (37.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (41.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (30.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> CREATE TABLE bigtable_unique (
-> `id` varchar(32) NOT NULL DEFAULT '',
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT IGNORE bigtable_unique SELECT * FROM `bigtable30m`;
Query OK, 10000000 rows affected, 65535 warnings (1 min 57.99 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 20000000
MariaDB [l]>
I think partitioning is not the way to go for this particular problem. How would you partition? On what criteria?
I think your main concern is architectural and should be fixed before anything else: records that are supposed to be unique are not unique.
Because of the volume, I think any solution will take a while to execute. But my bet is that this one is the fastest:
CREATE TABLE products_unique (
name VARCHAR(50) NOT NULL,
characteristics VARCHAR(50),
PRIMARY KEY (name)
);
INSERT IGNORE INTO products_unique SELECT * FROM products;
RENAME TABLE products TO products_backup;
RENAME TABLE products_unique TO products;
Duplicates will be discarded arbitrarily, but I think that is what you are looking for anyway.
If it takes too long, try running it overnight... I just hope the transaction buffer doesn't blow up on you, in which case we'd have to write a stored procedure to split the inserts into batches, as sketched below.
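A rough sketch of that batching idea, in case it's needed (the procedure name and batch size are made up; this assumes an index on products.name to make the ordered scans practical, and it skips rows with an empty name):

delimiter //
create procedure copy_in_batches()
begin
  declare last_name varchar(50) default '';
  declare prev_name varchar(50) default '';
  batch_loop: loop
    -- Copy the next slice of names; IGNORE drops duplicates as before.
    -- Under autocommit each statement is its own small transaction.
    insert ignore into products_unique
      select name, characteristics
      from products
      where name > last_name
      order by name
      limit 100000;
    -- The largest name copied so far marks where the next slice starts.
    set prev_name = last_name;
    select coalesce(max(name), '') into last_name from products_unique;
    if last_name <= prev_name then
      leave batch_loop;  -- no new names were copied, so we are done
    end if;
  end loop;
end //
delimiter ;

call copy_in_batches();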
Yes, it is a good idea to fix performance issues. That is always the correct answer when the performance issues are serious enough that you're wondering about fixes.
You can achieve that by altering the table and making name a primary key, as you have already realized.
Your query should not be necessary. Instead, create a scratch table and insert the values you deem necessary into it. (Make it a regular table rather than a CREATE TEMPORARY TABLE one; MySQL cannot refer to a TEMPORARY table twice in the same query, and the insert below does exactly that.) Let's suppose the name of that table is mytemptable. Then:
insert into mytemptable(name, characteristics)
select name, characteristics
from products
where not exists (select 1
from mytemptable t
where products.name = t.name);
Then remove your records from products using
delete from products;
then alter products, making sure it has name as a primary key (a sketch of this step follows below), and then
insert into products(name, characteristics)
select name, characteristics
from mytemptable;
and finally drop your temporary table.
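A sketch of the two steps not shown as code above:

-- The table is empty at this point, so adding the key is fast:
ALTER TABLE products ADD PRIMARY KEY (name);
-- ...and, once the rows are back in products:
DROP TABLE mytemptable;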
As for your query:
It groups by caracteristics rather than by name, so a row only survives if its name is the MAX(name) within its caracteristics group. That means duplicate names with different caracteristics values all survive (nothing gets deduplicated), while rows with distinct names that merely share a caracteristics value get deleted down to one. So yes, your query will mess up your data.
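If you do want a DELETE-based dedup before adding the key, here is a sketch that groups by name instead. It keeps one arbitrary caracteristics value per name; rows that are exact duplicates of the kept row, and NULL caracteristics values, would need extra care:

DELETE a
FROM products a
LEFT JOIN (
    SELECT name, MAX(caracteristics) AS maxchar
    FROM products
    GROUP BY name
) b ON a.name = b.name AND a.caracteristics = b.maxchar
WHERE b.name IS NULL;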
In the following MySQL code, the first two blocks drop and create a temporary table _temp (with different column labels) and select * from it without a problem. Then I create a stored procedure that does the same thing (i.e., select * from _temp), and it works the first time, but not the second, failing with
ERROR 1054 (42S22): Unknown column 'test._temp.f' in 'field list'
It seems like select * from _temp on its own correctly handles the change in table columns, but the previous columns names are remembered across stored procedure calls. Am I doing something wrong, or is there a workaround?
MySQL Code
drop temporary table if exists _temp;
create temporary table _temp select 'first' as f;
select * from _temp;
drop temporary table if exists _temp;
create temporary table _temp select 'second' as s;
select * from _temp;
drop procedure if exists selectTemp;
create procedure selectTemp()
select * from _temp;
drop temporary table if exists _temp;
create temporary table _temp select 'first' as f;
call selectTemp();
drop temporary table if exists _temp;
create temporary table _temp select 'second' as s;
call selectTemp();
Transcript
$ mysql --version
mysql Ver 14.14 Distrib 5.5.38, for debian-linux-gnu (x86_64) using readline 6.2
mysql> source temp.sql
Query OK, 0 rows affected (0.01 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
+-------+
| f |
+-------+
| first |
+-------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
+--------+
| s |
+--------+
| second |
+--------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
+-------+
| f |
+-------+
| first |
+-------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
ERROR 1054 (42S22): Unknown column 'test._temp.f' in 'field list'
After paring this down to a minimal working example, distilling the essential elements, and searching for a bug report, this became much easier. It turns out that this was reported all the way back in 2005 as:
Bug #12257 SELECT * inside PROCEDURE gives "Unknown column" on second loop if tbl changed
Some of the bugs marked as a duplicate of that are actually more along the lines of the example:
Bug #15766 select * from table inside stored procedure uses old field names
Bug #49333 Unknown column 'test.TEMPTABLE.column1' in 'field list'
Bug #62406 new cursor, on table with same name but different structure as used before fails
The bug is closed but apparently still not fixed, though the 5.6 changelog mentions the behavior. From the comments in the bug report:
Noted in 5.6.6 changelog.
"Unknown column" errors or bad data could result from changing the set
of columns in a table used within a stored program between executions
of the program or while the table was used within a program loop.
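As for a workaround: one approach commonly suggested for this class of bug (a sketch, not verified against this exact report) is to avoid the cached SELECT * by using dynamic SQL inside the procedure, since the statement is re-prepared on every call:

drop procedure if exists selectTemp;
delimiter //
create procedure selectTemp()
begin
  -- Re-preparing on each call avoids reusing stale column metadata.
  set @q = 'select * from _temp';
  prepare stmt from @q;
  execute stmt;
  deallocate prepare stmt;
end //
delimiter ;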
I'm trying to write a query which will INSERT a random value between 0 to 9999 INTO a table, whereas this random value is yet to exist there.
However, nothing I wrote works. It seems like a WHERE clause doesn't work with INSERT, and my SQL server fails to execute an IF NOT EXISTS query. Is it incorrect, I wonder?
What should I do? Is there a solution to my problem?
(I'm using MySQL)
SET @rand = ROUND(RAND() * 9999);
IF NOT EXISTS (SELECT `num` FROM `nums` WHERE `num` = @rand)
INSERT INTO `nums` (`num`) VALUES (@rand);
You can do it like here: MySQL: Insert record if not exists in table
INSERT INTO `nums` (`num`)
SELECT *
FROM
  (SELECT @rand) AS q
WHERE NOT EXISTS
  (SELECT `num`
   FROM `nums`
   WHERE `num` = @rand);
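Note this assumes @rand was already set, as in the question. A self-contained variant (sketch) that draws the value in the same statement; FLOOR(RAND() * 10000) is used because it is uniform over 0..9999, whereas ROUND slightly under-weights 0 and 9999:

INSERT INTO `nums` (`num`)
SELECT q.num
FROM (SELECT FLOOR(RAND() * 10000) AS num) AS q
WHERE NOT EXISTS
  (SELECT 1 FROM `nums` WHERE `nums`.`num` = q.num);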
Try it using a stored procedure like this (note the delimiter change so the mysql client doesn't split the body at the first semicolon):
delimiter //
CREATE PROCEDURE my_sp()
BEGIN
  SET @rand = ROUND(RAND() * 9999);
  IF NOT EXISTS (SELECT `num` FROM `nums` WHERE `num` = @rand) THEN
    INSERT INTO `nums` (`num`) VALUES (@rand);
  END IF;
END //
delimiter ;
Statements like IF belong inside a block of code such as a stored procedure; you won't be able to execute them directly at the mysql prompt.
If you just want to insert a random value that wasn't there before, you can also do it like this:
mysql> create table nums(num int, unique key(num));
Query OK, 0 rows affected (0.05 sec)
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> select * from nums;
+------+
| num |
+------+
| 5268 |
| 9075 |
| 9114 |
| 9768 |
+------+
4 rows in set (0.00 sec)
mysql>
With insert ignore, it won't insert a row if it already exists.