I have a table that is in InnoDB engine, quite a simple one with 25000 rows. When I do a simple ALTER, it runs for almost 10 minutes:
mysql> ALTER TABLE `quote_followups_istvan`
ADD `customer_ip2` VARCHAR(20) NOT NULL DEFAULT '' AFTER `comment`;
Query OK, 0 rows affected (10 min 52.82 sec)
Records: 0 Duplicates: 0 Warnings: 0
But when I change it's engine to MyISAM, I get this:
mysql> alter table quote_followups_istvan engine="MyISAM";
Query OK, 25053 rows affected (0.56 sec)
Records: 25053 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE `quote_followups_istvan`
ADD `customer_ip3` VARCHAR(20) NOT NULL DEFAULT '' AFTER `comment`;
Query OK, 25053 rows affected (0.37 sec)
Records: 25053 Duplicates: 0 Warnings: 0
So 10minutes vs 0.37s....
What amd I missing here?
Let me answer my own question. Reading on, articles like this one
optimize mySql for faster alter table add column
and many more, actually say that this is the "issue" with InnoDb tables, and suggest some alternative approaches.
So I can only conclude that this is a normal behavior.
Related
I have a system and I want to test it which executes Alter queries. I'm looking for a way to simulate a long-running Alter query that I can test "panic", "resource usage", "concurrency", ... when it's running.
Is there any way that exists I can simulate a long-running Alter query?
I'm using gh-ost for alter execution.
Here's what I do when I want to test a long-running ALTER TABLE:
Create a table.
Fill it with a few million rows of random data, until it's large enough that ALTER TABLE takes a few minutes. How many rows are required depends on the speed of your computer.
Run ALTER TABLE on it.
I have not found a better solution, and I've been using MySQL since 2001.
Here's a trick for filling lots of rows without needing a client app or script:
mysql> create table mytable (id int auto_increment primary key, t text);
Query OK, 0 rows affected (0.05 sec)
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from dual;
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 1 row affected (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 8 rows affected (0.03 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> insert into mytable (t) select repeat(sha1(rand()), 255) from mytable;
Query OK, 16 rows affected (0.03 sec)
Records: 16 Duplicates: 0 Warnings: 0
Now I have 32 rows (16+8+4+2+1+1). I can continue the same query as many times as I want, which doubles the size of the table each time. It doesn't take long before I have a table several gigabytes
in size.
I have a MySQL very large database (1 billion rows) like this:
database : products("name","caracteristics")
Both columns are VARCHAR(50).
actually, it have no KEY sat, but "name" will be unique, so I think I will alter it as "name" PRIMARY_KEY. (I should have done that before.. now I need to perform a remove duplicate query before adding primary_key option I guess)
My problem is, when performing a simple query on the table, it takes ages literally.
SELECT caracteristics WHERE name=blabla LIMIT 1; //takes ages.
I was thinking of partitioning the existing table.
So here are the question:
Is it a good idea to fix my performance issues?
How can I achieve that?
Is my idea of ALTER TABLE to set 'name' column as PRIMARY_KEY a good idea also?
also about the duplicate query, I found this around here, am I doing it properly? (don't want to mess up my table...)
delete a
from products a
left join(
select max(name) maxname, caracteristics
from products
group by caracteristics) b
on a.name = maxname and
a.caracteristics= b.caracteristics
where b.maxname IS NULL;
you can also direct set a PRIMARY KEY with the ignore option like this:
ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
this will delete all duplicates from name.
sample
MariaDB [l]> CREATE TABLE `products` (
-> `name` varchar(50) NOT NULL DEFAULT '',
-> `caracteristics` varchar(50) DEFAULT NULL
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]> INSERT INTO `products` (`name`, `caracteristics`)
-> VALUES
-> ('val1', 'asdfasdfasdf'),
-> ('val2', 'asdasDasd'),
-> ('val3', 'aesfawfa'),
-> ('val1', '99999999');
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf |
| val2 | asdasDasd |
| val3 | aesfawfa |
| val1 | 99999999 |
+------+----------------+
4 rows in set (0.00 sec)
MariaDB [l]> ALTER IGNORE TABLE `products` ADD PRIMARY KEY(name);
Query OK, 4 rows affected (0.03 sec)
Records: 4 Duplicates: 1 Warnings: 0
MariaDB [l]> select * from products;
+------+----------------+
| name | caracteristics |
+------+----------------+
| val1 | asdfasdfasdf |
| val2 | asdasDasd |
| val3 | aesfawfa |
+------+----------------+
3 rows in set (0.00 sec)
MariaDB [l]>
test ADD PRIMARY KEY / INSERT IGNORE
Here is a test between add Primary key and insert ignore into. and you can see that add Primary key (90 sec / 120 sec) is a little bit faster in this sample
MariaDB [l]> CREATE TABLE `bigtable10m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable10m`
-> select lpad(seq,8,'0') from seq_1_to_10000000;
Query OK, 10000000 rows affected (24.24 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> SELECT * FROM `bigtable10m` LIMIT 10;
+----------+
| id |
+----------+
| 00000001 |
| 00000002 |
| 00000003 |
| 00000004 |
| 00000005 |
| 00000006 |
| 00000007 |
| 00000008 |
| 00000009 |
| 00000010 |
+----------+
10 rows in set (0.00 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (28.49 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (29.01 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (32.98 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> ALTER IGNORE TABLE `bigtable30m` ADD PRIMARY KEY(id);
Query OK, 30000000 rows affected (1 min 32.34 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 0
MariaDB [l]>
MariaDB [l]> DROP TABLE `bigtable30m`;
Query OK, 0 rows affected (0.52 sec)
MariaDB [l]>
MariaDB [l]> CREATE TABLE `bigtable30m` (
-> `id` varchar(32) NOT NULL DEFAULT ''
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)
MariaDB [l]>
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (37.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (41.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]> INSERT INTO `bigtable30m` SELECT * FROM `bigtable10m`;
Query OK, 10000000 rows affected (30.87 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
MariaDB [l]>
MariaDB [l]> CREATE TABLE bigtable_unique (
-> `id` varchar(32) NOT NULL DEFAULT '',
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.02 sec)
MariaDB [l]>
MariaDB [l]> INSERT IGNORE bigtable_unique SELECT * FROM `bigtable30m`;
Query OK, 10000000 rows affected, 65535 warnings (1 min 57.99 sec)
Records: 30000000 Duplicates: 20000000 Warnings: 20000000
MariaDB [l]>
I think partitionning is not the way you should go for this particular problem. How would you partition? On what criteria?
I think your main concern is architectural and should be fixed prior to anything else: unique records are not unique.
Because of the volumetry I think any solution will take a while to execute. But my bet is that this one is the fastest:
CREATE TABLE products_unique (
name VARCHAR(50) NOT NULL,
characteristics VARCHAR(50),
PRIMARY KEY (name)
);
INSERT IGNORE INTO products_unique SELECT * FROM products;
RENAME TABLE products TO products_backup;
RENAME TABLE products_unique TO products;
Duplicates will be evinced arbitrarily, but I think it is what you are looking for anyway.
If it takes too long, you should try running it overnight... I just hope the transaction buffer does not explode on you in which case we'd have to work on some stored procedure to separate the inserts in batches.
Yes, it is a good idea to fix performance issues. That is the correct answer always when you have performance issues serious-enough to be wondering about performance fixes.
You can achieve that by altering the table and making name a primary key, as you have already realized.
Your query should not be necessary. You should create a temporary table instead where you would insert the values you deem necessary. Let's suppose the name of that table is mytemptable. Then:
insert into mytemptable(name, characteristics)
select name, characteristics
from products
where not exists (select 1
from mytemptable t
where products.name = t.name);
Then remove your records from products using
delete from products;
then alter products, make sure it has name as a primary key and then
insert into products(name, characteristics)
select name, characteristics
from mytemptable;
and finally drop your temporary table.
As about your query:
Since you remove records, max(name) will be equal to all other names in your group if you have a single possible name associated to a given characteristics value, which is pretty safe to assume.. So, if you have a possible characteristics value matching a single name, you will remove all instances of that name, so yes, your query will mess your data.
I'm using InnoDb engine by default. And this is what looks strange:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> set session transaction isolation level serializable;
Query OK, 0 rows affected (0.00 sec)
mysql> create table test_1(id int);
Query OK, 0 rows affected (0.07 sec)
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)
mysql> show tables;
+------------------+
| Tables_in_reestr |
+------------------+
| test_1 |
+------------------+
1 rows in set (0.00 sec)
It looks strange, because I started transaction and rollbacked, but to no avail. So, what I'm doing wrong?
To expand on the comment above: in MySQL, basically all operations that alter database objects perform auto-commit. The main categories are:
any DDL on your objects, like CREATE/ALTER/DROP TABLE/VIEW/INDEX...,
anything that modifies the system database mysql, like ALTER/CREATE USER,
any administrative commands, like ANALYZE,
any data loading/replication statements.
Actually, I find it best to assume that INSERT, UPDATE and DELETE are safe, and anything else is not.
Source: https://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html
In the following MySQL code, the first two blocks drop and create a temporary table _temp (with different column labels) and select * from it without a problem. Then, I create a stored procedure that does the same thing (i.e., select * from _temp), and it works first time, but not the second, failing with
ERROR 1054 (42S22): Unknown column 'test._temp.f' in 'field list'
It seems like select * from _temp on its own correctly handles the change in table columns, but the previous columns names are remembered across stored procedure calls. Am I doing something wrong, or is there a workaround?
MySQL Code
drop temporary table if exists _temp;
create temporary table _temp select 'first' as f;
select * from _temp;
drop temporary table if exists _temp;
create temporary table _temp select 'second' as s;
select * from _temp;
drop procedure if exists selectTemp;
create procedure selectTemp()
select * from _temp;
drop temporary table if exists _temp;
create temporary table _temp select 'first' as f;
call selectTemp();
drop temporary table if exists _temp;
create temporary table _temp select 'second' as s;
call selectTemp();
Transcript
$ mysql --version
mysql Ver 14.14 Distrib 5.5.38, for debian-linux-gnu (x86_64) using readline 6.2
mysql> source temp.sql
Query OK, 0 rows affected (0.01 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
+-------+
| f |
+-------+
| first |
+-------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
+--------+
| s |
+--------+
| second |
+--------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
+-------+
| f |
+-------+
| first |
+-------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
ERROR 1054 (42S22): Unknown column 'test._temp.f' in 'field list'
After paring this down to a minimal working example, and distilling the essential elements, searching for a bug report, this became much easier. It turns out that this was reported all the way back in 2005 as:
Bug #12257 SELECT * inside PROCEDURE gives "Unknown column" on second loop if tbl changed
Some of the bugs marked as a duplicate of that are actually more along the lines of the example:
Bug #15766 select * from table inside stored procedure uses old field names
Bug #49333 Unknown column 'test.TEMPTABLE.column1' in 'field list'
Bug #62406 new cursor, on table with same name but different structure as used before fails
The bug is closed, but apparently not fixed yet, though 5.6 mentions the behavior. From the comments in the bug report:
Noted in 5.6.6 changelog.
"Unknown column" errors or bad data could result from changing the set
of columns in a table used within a stored program between executions
of the program or while the table was used within a program loop.
Is it possible to create a mysql view and table in same name
for example i have a table hs_hr_employee i want create a view as a same name
create VIEW hs_hr_employee AS SELECT * from hs_hr_employee;
I m getting following error
#1050 - Table 'hs_hr_employee' already exists
Any help Thankful
Regards
you can't , give to view different name like
hs_hr_employee_view
from manual
Within a database, base tables and views share the same namespace, so
a base table and a view cannot have the same name.
As stated, you can't do it with views, but you can with temporary tables.
If you create a temporary table with the same name as an actual table, the temporary table will shadow (hide) the actual table. This means that you can't access the actual table until you have dropped the temporary table:
mysql> create table t select 1; # actual table t
Query OK, 1 row affected (0.58 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> create temporary table t select*from t; # temp table t
Query OK, 1 row affected (0.53 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert t select 2; # t refers to the temp table
Query OK, 1 row affected (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> select*from t; # temp table
+---+
| 1 |
+---+
| 1 |
| 2 |
+---+
2 rows in set (0.00 sec)
mysql> drop table t; # temp table
Query OK, 0 rows affected (0.06 sec)
mysql> show tables like "t"; # verify that actual table still exists. show tables will not show temporary tables
+--------------------+
| Tables_in_test (t) |
+--------------------+
| t |
+--------------------+
1 row in set (0.00 sec)
mysql>
mysql> select*from t; # now t refers to the actual table
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
mysql> drop table t; # actual table
Query OK, 0 rows affected (0.14 sec)
mysql>
However, temporary tables are gone (even if you do not drop them) once your session is disconnected. You'll need to re-create them every time you connect.
Yes, if you create it under a different schema.
for example
dbo.Customers
user.customers (this could be a view)
You can call it by including the schema or if you call it from a user that is assigned the schema.
This is for MS SQL server.
Not only in mysql server but in any sqldialect, tables and views in the same schema can't be with the same name as it will lead to ambiguity when you fire a select query like from where to select the columns either from table or view.