MySQL Partitioning (InnoDB) - Large table

I have a very large MySQL database (1 billion rows) like this:
database: products("name", "caracteristics")
Both columns are VARCHAR(50).
Currently it has no key set, but "name" will be unique, so I think I will alter the table and make "name" the PRIMARY KEY. (I should have done that before... now I guess I need to run a remove-duplicates query before adding the primary key.)
My problem is that a simple query on the table literally takes ages:
SELECT caracteristics FROM products WHERE name = 'blabla' LIMIT 1; -- takes ages
I was thinking of partitioning the existing table.
So here are the questions:
Is partitioning a good way to fix my performance issues?
How can I achieve that?
Is my idea of using ALTER TABLE to make the 'name' column the PRIMARY KEY a good one as well?
Also, about the duplicate-removal query: I found this around here, am I doing it properly? (I don't want to mess up my table...)
delete a
from products a
left join (
    select max(name) maxname, caracteristics
    from products
    group by caracteristics
) b on a.name = b.maxname
   and a.caracteristics = b.caracteristics
where b.maxname IS NULL;

You can also directly set a PRIMARY KEY with the IGNORE option, like this:
ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
This will silently drop every duplicate of name, keeping the first row it encounters. (Note: ALTER IGNORE TABLE works in MariaDB and in MySQL up to 5.6, but the IGNORE clause was removed in MySQL 5.7, so check your server version first.)
Sample:
MariaDB [l]> CREATE TABLE `products` (
-> `name` varchar(50) NOT NULL DEFAULT '',
-> `caracteristics` varchar(50) DEFAULT NULL
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]> INSERT INTO `products` (`name`, `caracteristics`)
-> VALUES
-> ('val1', 'asdfasdfasdf'),
-> ('val2', 'asdasDasd'),
-> ('val3', 'aesfawfa'),
-> ('val1', '99999999');
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf   |
| val2 | asdasDasd      |
| val3 | aesfawfa       |
| val1 | 99999999       |
+------+----------------+
4 rows in set (0.00 sec)
MariaDB [l]> ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 1 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf   |
| val2 | asdasDasd      |
| val3 | aesfawfa       |
+------+----------------+
3 rows in set (0.00 sec)
MariaDB [l]>
Test: ADD PRIMARY KEY vs. INSERT IGNORE
Here is a comparison between ALTER IGNORE ... ADD PRIMARY KEY and INSERT IGNORE INTO; you can see that ADD PRIMARY KEY (about 90 sec vs. 120 sec) is a little bit faster in this sample:
MariaDB [l]> CREATE TABLE `bigtable10m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable10m`
-> select lpad(seq,8,'0') from seq_1_to_10000000;
Query OK, 10000000 rows affected (24.24 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> SELECT * FROM `bigtable10m` LIMIT 10;
+----------+
| id       |
+----------+
| 00000001 |
| 00000002 |
| 00000003 |
| 00000004 |
| 00000005 |
| 00000006 |
| 00000007 |
| 00000008 |
| 00000009 |
| 00000010 |
+----------+
10 rows in set (0.00 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (28.49 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (29.01 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (32.98 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> ALTER IGNORE TABLE `bigtable30m` ADD PRIMARY KEY(id);
Query OK, 30000000 rows affected (1 min 32.34 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 0
MariaDB [l]>
MariaDB [l]> DROP TABLE `bigtable30m`;
Query OK, 0 rows affected (0.52 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (37.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (41.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (30.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> CREATE TABLE bigtable_unique (
-> `id` varchar(32) NOT NULL DEFAULT '',
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT IGNORE bigtable_unique SELECT * FROM `bigtable30m`;
Query OK, 10000000 rows affected, 65535 warnings (1 min 57.99 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 20000000
MariaDB [l]>

I think partitioning is not the way to go for this particular problem. How would you partition? On what criteria?
I think your main concern is architectural and should be fixed before anything else: records that are supposed to be unique are not unique.
Given the volume, any solution will take a while to execute, but my bet is that this one is the fastest:
CREATE TABLE products_unique (
    name VARCHAR(50) NOT NULL,
    characteristics VARCHAR(50),
    PRIMARY KEY (name)
);

INSERT IGNORE INTO products_unique SELECT * FROM products;

RENAME TABLE products TO products_backup;
RENAME TABLE products_unique TO products;
Duplicates will be discarded arbitrarily, but I think that is what you are looking for anyway.
If it takes too long, try running it overnight... I just hope the transaction buffer does not blow up on you, in which case we would have to use a stored procedure to split the inserts into batches, as sketched below.
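If it comes to that, the batching could look roughly like this (an untested sketch; the procedure name and the bucket scheme are only illustrative, it assumes names start with printable ASCII characters, and each pass rescans products, so it trades total runtime for smaller transactions):
DELIMITER //
CREATE PROCEDURE copy_products_in_batches()
BEGIN
    DECLARE c INT DEFAULT 32;                       -- printable ASCII range 32..126
    WHILE c <= 126 DO
        -- copy one bucket of names per iteration
        INSERT IGNORE INTO products_unique
        SELECT name, caracteristics
        FROM products
        WHERE LEFT(name, 1) = CHAR(c USING ascii);
        COMMIT;                                     -- each bucket is committed on its own
        SET c = c + 1;
    END WHILE;
END//
DELIMITER ;

CALL copy_products_in_batches();
-- Rows whose name starts outside that character range would still need one extra pass.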

Yes, it is a good idea to fix your performance issues; that is always the correct answer when the issues are serious enough to make you wonder about fixes.
You can achieve that by altering the table and making name a primary key, as you have already realized.
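Once the duplicates are removed, the change itself is a single statement (adding the primary key rebuilds the table, so expect it to run for a long while on a billion rows):
ALTER TABLE products ADD PRIMARY KEY (name);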
Your DELETE query should not be necessary. Instead, create a temporary table and insert into it the rows you want to keep. Let's suppose the name of that table is mytemptable. Then:
insert into mytemptable (name, caracteristics)
select name, caracteristics
from products
where not exists (select 1
                  from mytemptable t
                  where products.name = t.name);
Then remove the records from products using
delete from products;
then alter products so that name is its primary key, and then
insert into products (name, caracteristics)
select name, caracteristics
from mytemptable;
and finally drop your temporary table.
As for your query:
It groups by caracteristics, so for every distinct caracteristics value it keeps only the row whose name is the maximum and deletes all the others. That removes duplicate caracteristics values, not duplicate names: distinct products that merely share a caracteristics value get thrown away, while a name can still appear several times if it happens to be the maximum in more than one group. So yes, that query will mess up your data.
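For comparison, the same delete-join pattern keyed on name would look like the sketch below (untested; it keeps, per name, the row with the greatest caracteristics). Even then it has pitfalls, which is why the temporary-table / INSERT IGNORE approaches are safer:
delete a
from products a
left join (
    select name, max(caracteristics) as maxcar
    from products
    group by name
) b on a.name = b.name
   and a.caracteristics = b.maxcar
where b.name is null;
-- caveats: exact duplicate rows (same name AND same caracteristics) all survive,
-- and a name whose caracteristics values are all NULL loses every one of its rows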

Related

How to simulate long running alter query on MySQL

I have a system which executes ALTER queries and I want to test it. I'm looking for a way to simulate a long-running ALTER query so that I can test "panic", "resource usage", "concurrency", ... while it is running.
Is there any existing way to simulate a long-running ALTER query?
I'm using gh-ost for the ALTER execution.
Here's what I do when I want to test a long-running ALTER TABLE:
Create a table.
Fill it with a few million rows of random data, until it's large enough that ALTER TABLE takes a few minutes. How many rows are required depends on the speed of your computer.
Run ALTER TABLE on it.
I have not found a better solution, and I've been using MySQL since 2001.
Here's a trick for filling lots of rows without needing a client app or script:
mysql> create table mytable (id int auto_increment primary key, t text);
Query OK, 0 rows affected (0.05 sec)
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from dual;
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 1 row affected (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 8 rows affected (0.03 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 16 rows affected (0.03 sec)
Records: 16 Duplicates: 0 Warnings: 0
Now I have 32 rows (16+8+4+2+1+1). I can keep running the same query as many times as I want, and each run doubles the size of the table. It doesn't take long before I have a table several gigabytes in size.
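Once the table is large enough, more or less any ALTER that forces a full table rebuild will do. One hedged option among many (ALGORITHM=COPY needs MySQL 5.6 or later and can simply be left off on older versions):
-- A "null" rebuild: the whole table is copied row by row, so the statement
-- runs for roughly as long as the table is large.
ALTER TABLE mytable ENGINE=InnoDB, ALGORITHM=COPY;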

Cannot refer to tables with computed virtual columns in triggers of MySql

This seems to be a bug in MySQL. I am posting it here to confirm my conclusion and to share my experience. We are currently migrating from MS SQL Server to MySQL Community Edition 5.7.12. There is a Dealers table which has a virtual computed column. It was referred to in the join of a query used inside a trigger. As a result, the MySQL server got restarted.
To make sure there was no other cause for the event, we created a dummy table without computed columns and referred to that table in the trigger. The trigger executed successfully. Then we created another dummy table with a computed column and referred to it in the join without referencing the computed column. When the trigger fired, the server crashed in spite of the fact that only an actual column of the table was referred to and there was no reference to the computed column. Thus, you cannot even refer to a table with computed columns in a trigger.
What we have done temporarily is convert the virtual columns into actual columns and modify the SELECT, INSERT and UPDATE queries on the table.
Is there a better alternative to solve this issue?
Can you post your test case? I can't reproduce the problem with my test example:
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.7.12    |
+-----------+
1 row in set (0.00 sec)
mysql> DROP TABLE IF EXISTS `t2`;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE IF EXISTS `t1`;
Query OK, 0 rows affected (0.00 sec)
-- Table with Generated Column
mysql> CREATE TABLE IF NOT EXISTS `t1` (
-> `c0` INTEGER UNSIGNED NOT NULL PRIMARY KEY,
-> `value` VARCHAR(20),
-> `c1` INTEGER UNSIGNED GENERATED ALWAYS AS (`c0`) VIRTUAL
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE IF NOT EXISTS `t2` (
-> `c1` INTEGER UNSIGNED NOT NULL PRIMARY KEY,
-> `value` VARCHAR(20)
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO `t2` (`c1`, `value`) VALUES (1, 'value 1');
Query OK, 1 row affected (0.00 sec)
mysql> DELIMITER ||
mysql> DROP TRIGGER IF EXISTS `t1_ins_bef`||
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> CREATE TRIGGER `t1_ins_bef` BEFORE INSERT ON `t1`
-> FOR EACH ROW
-> BEGIN
-> SET NEW.`value` := (SELECT `t2`.`value`
-> FROM `t1`
-> INNER JOIN `t2` ON `t1`.`c1` = `t2`.`c1`);
-> END||
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER ;
mysql> INSERT INTO `t1` (`c0`) VALUES (1), (2);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> SELECT
-> `c0`,
-> `value`,
-> `c1`
-> FROM
-> `t1`;
+----+---------+------+
| c0 | value   | c1   |
+----+---------+------+
|  1 | NULL    |    1 |
|  2 | value 1 |    2 |
+----+---------+------+
2 rows in set (0.00 sec)
mysql> SELECT
-> `c1`,
-> `value`
-> FROM
-> `t2`;
+----+---------+
| c1 | value   |
+----+---------+
|  1 | value 1 |
+----+---------+
1 row in set (0.00 sec)

Can't create a table through a view with a function inside it (MySQL)

I created two tables
CREATE TABLE `prova` (
    `id` int NOT NULL AUTO_INCREMENT,
    `text` varchar(255) NOT NULL,
    PRIMARY KEY (`id`)
);

CREATE TABLE `prova2` (
    `id2` int NOT NULL AUTO_INCREMENT,
    `text2` varchar(255) NOT NULL,
    PRIMARY KEY (`id2`)
);
insert into prova (text) values ('ffffff');
A function does a SELECT on table one and inserts a row into table two only if the value of the variable @test is set to 0:
CREATE FUNCTION `get_prova`()
RETURNS int(11)
BEGIN
    declare id_prova int;
    declare test int;
    set @test = 1;
    set @id_prova = (select id from prova limit 1);
    if (@test = 0) THEN
        insert into prova2 (text2) values ('dddd');
    end if;
    return @id_prova;
END;
Then I create a view that calls this function:
create view temp_prova as
select id,
       text,
       get_prova() as prova
from prova;
I want to create a third table that contains the result of the view:
CREATE TABLE zzz_prova SELECT * FROM temp_prova;
but when I try to create table zzz_prova I get this error:
[SQL]CREATE TABLE zzz_prova SELECT * FROM temp_prova; [Err] 1746 -
Can't update table 'prova2' while 'zzz_prova' is being created.
Why does this error show up?
Thank you
What version of MySQL are you running?
Changes in MySQL 5.6.2 (2011-04-11)
Incompatible Change; Replication: It is no longer possible to issue a
CREATE TABLE ... SELECT statement which changes any tables other than
the table being created. Any such statement is not executed and
instead fails with an error.
One consequence of this change is that FOR UPDATE may no longer be
used at all with the SELECT portion of a CREATE TABLE ... SELECT.
This means that, prior to upgrading from a previous release, you
should rewrite any CREATE TABLE ... SELECT statements that cause
changes in other tables so that the statements no longer do so.
This change also has implications for statement-based replication
between a MySQL 5.6 (or later slave) and a master running a previous
version of MySQL. In such a case, if a CREATE TABLE ... SELECT
statement on the master that causes changes in other tables succeeds
on the master, the statement nonetheless fails on the slave, causing
replication to stop. To keep this from happening, you should either
use row-based replication, or rewrite the offending statement before
running it on the master. (Bug #11749792, Bug #11745361, Bug #39804,
Bug #55876)
References: See also Bug #47899.
UPDATE
MySQL 5.5:
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.5.47    |
+-----------+
1 row in set (0.00 sec)
mysql> DROP FUNCTION IF EXISTS `f`;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE IF EXISTS `t1`;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE IF EXISTS `t2`;
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER |
mysql> CREATE FUNCTION `f`()
-> RETURNS INT
-> BEGIN
-> INSERT INTO `t2` VALUES (1);
-> RETURN 1;
-> END|
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER ;
mysql> CREATE TABLE `t2`(`c1` INT);
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `t1` SELECT `f`() `c1`;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT `c1` FROM `t1`;
+------+
| c1   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
mysql> SELECT `c1` FROM `t2`;
+------+
| c1   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
MySQL 5.6:
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.6.25    |
+-----------+
1 row in set (0.00 sec)
mysql> DROP FUNCTION IF EXISTS `f`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DROP TABLE IF EXISTS `t1`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DROP TABLE IF EXISTS `t2`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DELIMITER |
mysql> CREATE FUNCTION `f`()
-> RETURNS INT
-> BEGIN
-> INSERT INTO `t2` VALUES (1);
-> RETURN 1;
-> END|
Query OK, 0 rows affected (0.00 sec)
mysql> DELIMITER ;
mysql> CREATE TABLE `t2`(`c1` INT);
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `t1` SELECT `f`() `c1`;
ERROR 1746 (HY000): Can't update table 't2' while 't1' is being created.
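In practice, the rewrite the changelog asks for is to split the statement in two, so that the CREATE itself no longer changes any other table. An untested sketch (the column types are guessed from the view's definition):
CREATE TABLE zzz_prova (`id` INT, `text` VARCHAR(255), `prova` INT);
INSERT INTO zzz_prova SELECT * FROM temp_prova;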

Why doesn't this IF NOT EXISTS statement work?

I'm trying to write a query which will INSERT a random value between 0 and 9999 INTO a table, where that random value does not already exist there.
However, nothing I wrote works. It seems like a WHERE clause doesn't work with INSERT, and my SQL server fails to execute an IF NOT EXISTS query. Is it incorrect, I wonder?
What should I do? Is there a solution to my problem?
(I'm using MySQL.)
SET @rand = ROUND(RAND() * 9999);
IF NOT EXISTS (SELECT `num` FROM `nums` WHERE `num` = @rand)
    INSERT INTO `nums` (`num`) VALUES (@rand);
You can do it like here: MySQL: Insert record if not exists in table
INSERT INTO `nums` (`num`)
SELECT *
FROM (SELECT @rand) AS q
WHERE NOT EXISTS
    (SELECT `num`
     FROM `nums`
     WHERE `num` = @rand);
Try it using a stored procedure like this:
CREATE PROCEDURE my_sp()
BEGIN
    SET @rand = ROUND(RAND() * 9999);
    IF NOT EXISTS (SELECT `num` FROM `nums` WHERE `num` = @rand) THEN
        INSERT INTO `nums` (`num`) VALUES (@rand);
    END IF;
END
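Note that in the interactive mysql client you would have to change the statement DELIMITER around the CREATE PROCEDURE, so that the semicolons inside the body do not end the statement early. Once it is created, each call inserts at most one new random value:
CALL my_sp();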
Statements like IF belong inside a block of code such as a stored procedure; you won't be able to execute them directly at the mysql prompt.
If you just want to insert a random value that wasn't there before, you can also do it like this:
mysql> create table nums(num int, unique key(num));
Query OK, 0 rows affected (0.05 sec)
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert ignore into nums select round(rand()*9999);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> select * from nums;
+------+
| num  |
+------+
| 5268 |
| 9075 |
| 9114 |
| 9768 |
+------+
4 rows in set (0.00 sec)
mysql>
With INSERT IGNORE, a row is simply not inserted if it already exists (here, if it would violate the unique key on num).

2 servers, 2 memory tables, different sizes

I have got two servers, each running a MySQL instance. The first one, server1, is running MySQL 5.0.22. The other one, server2, is running MySQL 5.1.58.
When I create a MEMORY table on server1 and add a row, its size is instantly 8,190.0 KiB.
When I create a MEMORY table on server2 and add a row, its size is still only a few bytes, though.
Is this caused by the difference in MySQL version, or (hopefully) is it due to some setting I can change?
EDIT:
I haven't found the reason for this behaviour yet, but I did find a workaround. So, for future reference, this is what fixed it for me:
All my MEMORY tables are built once and are read-only from then on. When you tell MySQL the maximum number of rows your table will have, its size will shrink. The following statement does that for you:
ALTER TABLE table_name MAX_ROWS = N;
Factor of 2?
OK, the problem is likely caused by UTF-8 vs. latin1 storage requirements:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
You can check the database connection and the default character set of the database on both servers.
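For example, run something like this on both servers and compare the output (the table name is just a placeholder for one of your MEMORY tables):
SHOW VARIABLES LIKE 'character_set%';
SHOW CREATE TABLE your_memory_table;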
Here is the test I have just done:
mysql> create table test ( name varchar(10) ) engine
-> =memory;
Query OK, 0 rows affected (0.03 sec)
mysql> show create table test;
+-------+------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` varchar(10) DEFAULT NULL
) ENGINE=MEMORY DEFAULT CHARSET=latin1 |
+-------+------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> insert into test values ( 1 );
mysql> set names utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> create table test2 ( name varchar(10) ) engine =memory default charset = utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into test2 values ( convert(1 using utf8) );
Query OK, 1 row affected (0.01 sec)
mysql> select table_name, avg_row_length from information_schema.tables where TABLE_NAME in( 'test2', 'test');
+------------+----------------+
| table_name | avg_row_length |
+------------+----------------+
| test       |             12 |
| test2      |             32 |
+------------+----------------+
2 rows in set (0.01 sec)