MySQL: unique and index equivalence

I created a Java application that uses Hibernate ORM; with Hibernate Tools I get an automated script that installs or upgrades the DB schema from the Java objects used as entities.
The program works properly on MySQL, but on Oracle an error is triggered when a column is declared with a "unique" constraint and an index is then defined on the same column. Oracle says that a "unique" constraint creates an index by default, so two indexes on the same column cannot be declared.
So my question is: in MySQL, is there an equivalence or relation between a unique constraint and an index?
Please clarify. Thanks in advance.

A unique constraint requires an index so it can be enforced. Both DBMSs create an appropriate index when you declare columns as unique. The only difference is that Oracle prevents you from creating a redundant index, while MySQL doesn't:
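For example, the SHOW INDEX output below can be produced by DDL roughly like this (a sketch; the column types are assumptions):
CREATE TABLE test_table (
  id INT PRIMARY KEY,                -- backs the PRIMARY index
  foo VARCHAR(50),                   -- column type is an assumption here
  UNIQUE KEY foo_unique (foo)        -- MySQL creates the index foo_unique to enforce this
);
CREATE INDEX foo_key ON test_table (foo);  -- redundant, but MySQL accepts it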
show index from test_table;
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table      | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test_table |          0 | PRIMARY    |            1 | id          | A         |           0 |     NULL |   NULL |      | BTREE      |         |
| test_table |          0 | foo_unique |            1 | foo         | A         |           0 |     NULL |   NULL |      | BTREE      |         |
| test_table |          1 | foo_key    |            1 | foo         | A         |           0 |     NULL |   NULL |      | BTREE      |         |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

MySQL just doesn't care about useless indexes. For updates it will check all unique indexes, and for SELECTs it will pick an arbitrary one of them.
So, to make Oracle happy, drop the plain index before creating the unique one.
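A minimal sketch of that fix, using the names from the example above (in the Hibernate case, the real fix is to stop declaring the extra index on the entity so the generated schema never includes it):
-- The unique constraint's index already covers the column; drop the redundant plain index:
ALTER TABLE test_table DROP INDEX foo_key;
-- SHOW INDEX FROM test_table now lists only PRIMARY and foo_unique.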

Related

Why would cardinality of an index in a restored table be different from cardinality in the original table?

I'm testing a proprietary tool that dumps a table from a MySQL RDS instance to the Parquet format and then restores it into another MySQL RDS instance.
Both tables have the same number of rows:
mysql> SELECT COUNT(*) FROM fox_owners;
+----------+
| COUNT(*) |
+----------+
| 118950 |
+----------+
The table itself is configured the same way in both cases:
mysql> SHOW CREATE TABLE fox_owners;
+------------+-------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------+
| fox_owners | CREATE TABLE `fox_owners` (
`name` mediumtext,
`owner_id` bigint NOT NULL,
PRIMARY KEY (`owner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------+-------------------------------------------------------+
So far so good, right?
However, the table sizes are different.
The original:
+----------+----------------------+------------+
| Database | Table                | Size in MB |
+----------+----------------------+------------+
| stam_db  | fox_owners           |    5582.52 |
+----------+----------------------+------------+
The restored one:
+----------+----------------------+------------+
| Database | Table                | Size in MB |
+----------+----------------------+------------+
| stam_db  | fox_owners           |    5584.52 |
+----------+----------------------+------------+
The restored one is 2 MB bigger!
However, what's really bugging me is the change in the cardinality of the indexes between the two tables.
Original:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 118728 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Restored:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 117518 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Why would cardinality drop from 118728 to 117518?
If the restored table is less unique than the original, isn't this a clear sign that the table is different? How can I verify that these two tables in separate RDS databases have identical content?
And shouldn't the cardinality be 118950 in both of them anyway, since for a table with a single primary-key column the cardinality must equal the number of rows in the table?
I've run ANALYZE TABLE on both tables; the values didn't change.
No problem.
The "cardinality" is determined by making a small number of 'random' probes into the table. This leads to estimates. Sometimes the estimates are off by a factor of two or even more. 118728 and 117518 are unusually close to each other.
When loading/copying/altering a table, the BTrees are rebuilt. This leads to some variation in how the blocks of the BTree are laid out. So it is normal to see the on-disk size of a table change; a change by a factor of 2 would be rare, though.
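As for verifying that the two tables really hold identical data: index statistics are estimates, so they are not the tool for that. A row-level checksum is a more direct check; a minimal sketch (CHECKSUM TABLE ... EXTENDED reads every row, so expect it to be slow on a ~5 GB table):
-- Run the same statement on both RDS instances and compare the results:
CHECKSUM TABLE fox_owners EXTENDED;
-- Matching checksums indicate identical content; differing checksums prove a difference.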

Extremely slow MySQL performance

I have a table with about 500K jobs. Each job has a unique ID that is used as the primary key, and a status that indicates whether the job is pending, complete, or failed. The status is an integer and is not a key.
My problem is that a simple query that selects jobs based on the status takes too much time, more than 10 minutes. There are about 46 threads connected to the DB, and I also did a restart, but it didn't help performance.
I pasted the table schema and the query I'm trying to run here:
http://pastie.org/10416054
Is there any way to find the bottleneck and optimize the table so it doesn't take that long?
After hours, I would fire off the following command:
CREATE INDEX idx_qry_status ON queries(status);
Your query is currently doing a table scan, employing no index whatsoever.
See the manual page on CREATE INDEX.
A visual of the "after", table-wise (not performance-wise):
CREATE TABLE queries
( id BIGINT AUTO_INCREMENT PRIMARY KEY,
  status INT NULL
  -- partial definition
);
INSERT INTO queries (status) VALUES (7),(2),(1),(4),(1),(5),(9),(11);
CREATE INDEX idx_qry_status ON queries(status);
show indexes from queries;
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| queries |          0 | PRIMARY        |            1 | id          | A         |           8 |     NULL |   NULL |      | BTREE      |         |               |
| queries |          1 | idx_qry_status |            1 | status      | A         |           8 |     NULL |   NULL | YES  | BTREE      |         |               |
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
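To confirm the optimizer actually picks up the new index for the status filter, a quick check (a sketch against the toy table above; on the real 500K-row table the row estimate will of course differ):
-- A "ref" access on idx_qry_status in the key column means the table scan is gone:
EXPLAIN SELECT * FROM queries WHERE status = 1;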

composite index on large table, optimizing aggregate query

We have a large table (around 160 million records) in MySQL 5.5.
The machine where we installed MySQL has 4 GB of RAM.
Table schema:
+---------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| domain | varchar(50) | YES | MUL | NULL | |
| uid | varchar(100) | YES | | NULL | |
| sid | varchar(100) | YES | MUL | NULL | |
| vurl | varchar(2500) | YES | | NULL | |
| ip | varchar(20) | YES | | NULL | |
| ref | varchar(2500) | YES | | NULL | |
| stats_time | datetime | YES | MUL | NULL | |
| country | varchar(50) | YES | | NULL | |
| region | varchar(50) | YES | | NULL | |
| place | varchar(50) | YES | | NULL | |
| email | varchar(100) | YES | MUL | NULL | |
+---------------+---------------+------+-----+---------+-------+
Indexes
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visit_views | 1 | sid_index | 1 | sid | A | 157531031 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | domain_index | 1 | domain | A | 17 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | email_index | 1 | email | A | 392845 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | stats_time_index | 1 | stats_time | A | 78765515 | NULL | NULL | YES | BTREE | | |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Example query
SELECT count(*)
FROM visit_views
WHERE domain ='our'
AND email!=''
AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';
We are seeing very slow performance on queries like the one above, so I want to add a composite index on this table.
I ran the following command:
ALTER TABLE visit_views ADD INDEX domain_statstime_email (domain,stats_time,email);
After running this command, our table got locked and the server reached its connection limit (the limit is 1000). Now the table is not responding to any INSERTs or SELECTs.
Here are my questions:
1. Why did the table get locked, and why is it not releasing existing connections?
2. How much time will it take to build the index? I started it 3 hours ago and the index is still not created.
3. How can I see the index creation progress?
4. Why did the connection count suddenly increase to the maximum while adding the index?
5. Is it safe to add composite indexes to a table this large?
6. If I partition this table, will it perform any better?
I don't know much about indexes.
Some stats:
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                3221225472 |
+---------------------------+
Your query has three conditions: an equality, an inequality, and a range.
WHERE domain ='our'
AND email!=''
AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';
To make this work, you should try the following indexes to see which one works better.
(email, domain, stats_time)
(domain, email, stats_time)
Why these? MySQL indexes are BTREE. That is, they're sorted in order. So to satisfy the query MySQL finds the first element in the index matching your query. That's based on domain, email, and the starting stats_time value. It then scans the index sequentially looking for the last matching value. Along the way it counts the records, and that satisfies your query. In other words it does a range scan on stats_time.
Why the choice? I don't know what MySQL will do with the inequality in your email matching predicate. That's why I suggest trying both.
If you have not simplified the query you showed us, you might also try a compound covering index on
(domain, stats_time, email)
This will random-access immediately to the first matching domain/stats_time combination, and then scan to the last one. As it scans, it will look at the email values from the index (that's why this is called a covering index) and pick out the rows matching. Along the way it counts the rows.
You should consider declaring your email column NOT NULL to help your inequality test use its index more efficiently. Read http://use-the-index-luke.com/ for good background information.
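The covering index described above is essentially the (domain, stats_time, email) index the question is already trying to build. Once it exists, a quick sanity check (sketch):
-- "Using index" in the Extra column means the query is answered from the index alone:
EXPLAIN SELECT COUNT(*)
FROM visit_views
WHERE domain = 'our'
  AND email != ''
  AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';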
As to your questions:
Why did the table get locked, and why is it not releasing existing connections?
Why did the connection count suddenly increase to the maximum while adding the index?
It can take a long time to add an index to a large table. Yours, at 160 megarows, is large. While that indexing operation is going on, other users of the table must wait. So, if you're accessing this from a web app, the connections pile up waiting for the table to become available.
How much time will it take to build the index? I started it 3 hours ago and the index is still not created.
It will be much faster on a quiet system. It is also possible you have some redundant single-column indexes you could drop. You may wish to copy the table and index the copy, then, when it's ready, rename it.
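A sketch of that copy-and-swap approach (the _new/_old table names are made up here; rows written to the original while the copy runs would be missed and need to be reconciled separately):
-- Build an indexed copy, then swap it in atomically:
CREATE TABLE visit_views_new LIKE visit_views;
ALTER TABLE visit_views_new ADD INDEX domain_statstime_email (domain, stats_time, email);
INSERT INTO visit_views_new SELECT * FROM visit_views;
RENAME TABLE visit_views TO visit_views_old, visit_views_new TO visit_views;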
How can I see the index creation progress?
SHOW FULL PROCESSLIST will display all the action in your MySQL server. You'll need a command line interface to give this command.
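For example, from a second connection while the ALTER TABLE is running:
-- The State column of the ALTER's thread (e.g. "copy to tmp table") hints at which phase it is in:
SHOW FULL PROCESSLIST;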
Is it safe to add composite indexes to a table this large?
Yes, of course, but it takes time on a production system.
If I partition this table, will it perform any better?
Probably not. What WILL help is DELETEing rows that are old, if you don't need them.

Order by and pagination

I have thousands of records in my MySQL database, and I use pagination to retrieve just 10 results.
When I add an ORDER BY to my query it slows down, but when I omit it the query runs very fast.
I know the problem comes from the query loading the whole result set, sorting it, and only then taking the 10 records.
I don't add an index because the column used for ordering is the PK, and, if I'm not wrong, in MySQL an index is created automatically on every primary key.
Why is the index on my PK, which is the column I'm ordering by, not used?
Is there any alternative solution to perform the sorting without loading all the data?
How can I add newly inserted data as the first row of the table and not at the end?
My SQL query:
select distinct ...... order by appeloffre0_.ID_APPEL_OFFRE desc limit 10
and my indexes
mysql> show index from appel_offre;
+-------------+------------+--------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| appel_offre | 0 | PRIMARY | 1 | ID_APPEL_OFFRE | A | 13691 | NULL | NULL | | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_1 | 1 | ID_APPEL_OFFRE_MERE | A | 2 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_2 | 1 | ID_ACHETEUR | A | 2 | NULL | NULL | | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_3 | 1 | USER_SAISIE | A | 2 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_4 | 1 | USER_VALIDATION | A | 4 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | ao_fk_3 | 1 | TYPE_MARCHE | A | 2 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | ao_fk_5 | 1 | USER_CONTROLE | A | 2 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.03 sec)
No index was chosen in the EXPLAIN output:
+----+-------------+---------------+--------+-------------------------------------+--------------------+---------+----------------
| id | select_type | table | type | possible_keys | key | key_len | ref
+----+-------------+---------------+--------+-------------------------------------+--------------------+---------+----------------
| 1 | SIMPLE | appeloffre0_ | ALL | NULL | NULL | NULL | NULL
UPDATE / SOLUTION
The problem was the DISTINCT: when I removed it, the query finally used the index.
Because you already use an index on USER_VALIDATION, MySQL won't use the ID index as well.
Try rebuilding the USER_VALIDATION index to include the ID too:
CREATE UNIQUE INDEX appel_offre_ibfk_4 ON appel_offre (USER_VALIDATION, ID_APPEL_OFFRE);
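Since the name appel_offre_ibfk_4 is already taken by the existing single-column index, a minimal sketch of the rebuild, assuming appel_offre is InnoDB with a foreign key on USER_VALIDATION (as the ibfk naming suggests); the name idx_user_validation_id is just illustrative:
-- Create the wider index first; USER_VALIDATION stays the leftmost column, so the FK remains covered:
CREATE INDEX idx_user_validation_id ON appel_offre (USER_VALIDATION, ID_APPEL_OFFRE);
-- The old single-column index is now redundant and can be dropped:
DROP INDEX appel_offre_ibfk_4 ON appel_offre;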
Update
Log all Hibernate queries, extract the slow one, and use EXPLAIN in a DB console to understand what execution plan MySQL selects for it. The DB may use a FULL TABLE SCAN even when you have an index, for example because the index is too large to fit into memory. Try giving it a hint as explained in this post.
According to the MySQL ORDER BY optimization documentation:
To increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
• Increase the sort_buffer_size variable value.
• Increase the read_rnd_buffer_size variable value.
• Use less RAM per row by declaring columns only as large as they need to be to hold the values stored in them. For example, CHAR(16) is better than CHAR(200) if values never exceed 16 characters.
• Change the tmpdir system variable to point to a dedicated file system with large amounts of free space. The variable value can list several paths that are used in round-robin fashion; you can use this feature to spread the load across several directories. Paths should be separated by colon characters (“:”) on Unix and semicolon characters (“;”) on Windows, NetWare, and OS/2. The paths should name directories in file systems located on different physical disks, not different partitions on the same disk.
Also make sure DISTINCT doesn't overrule your index. Try removing it and see if it helps.
Add an index to the column by which you are ordering.
You can't add rows to the beginning of the table, just like you can't add rows to the end of the table. Database tables are multisets. Multisets are by definition unordered collections. The notion of a first element or a last element makes no sense for multisets.
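In other words, "newest first" is something you ask for in the query, not something you arrange by insert position. Assuming ID_APPEL_OFFRE is an ascending (e.g. AUTO_INCREMENT) key, which the question implies but does not state, a sketch:
-- Newest rows first, regardless of physical storage order:
SELECT * FROM appel_offre ORDER BY ID_APPEL_OFFRE DESC LIMIT 10;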

Composite index not used in MySQL

According to the MySQL docs, a composite index will still be used if the leftmost fields are part of the criteria. However, this table will not join using the primary key; I had to add another index on the leftmost two fields, which is then used.
One of the tables is a MEMORY table, and I know that by default the MEMORY engine uses hash indexes, which can't be used for group/order. However, I'm using all rows of the memory table and not its index, so I don't think that relates to the problem.
What am I missing?
mysql> show create table pr_temp;
| pr_temp | CREATE TEMPORARY TABLE `pr_temp` (
`player_id` int(10) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`) USING BTREE,
KEY `insert_date` (`insert_date`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
mysql> show create table player_game_record;
| player_game_record | CREATE TABLE `player_game_record` (
`player_id` int(10) unsigned NOT NULL,
`game_id` smallint(5) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`,`game_id`),
KEY `insert_date` (`insert_date`),
KEY `player_date` (`player_id`,`insert_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 DATA DIRECTORY='...' INDEX DIRECTORY='...' |
mysql> explain select pgr.* from player_game_record pgr inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY,insert_date,player_date | player_date | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 21 | |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
mysql> explain select pgr.* from player_game_record pgr force index (primary) inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY | PRIMARY | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 2873031 | |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
2 rows in set (0.00 sec)
I think the primary key should work, with its two leftmost columns (player_id, insert_date) being used. However, it will use the player_date index by default, and if I force it to use the primary key it looks like it's only using one field rather than both.
Update 2: MySQL version 5.5.27-log
Update 3:
(note this is after removing the player_date index while trying some other tests)
mysql> show indexes in player_game_record;
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_game_record | 0 | PRIMARY | 1 | player_id | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 2 | insert_date | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 3 | game_id | A | 576276246 | NULL | NULL | | BTREE | | |
| player_game_record | 1 | insert_date | 1 | insert_date | A | 33304 | NULL | NULL | | BTREE | | |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (1.08 sec)
mysql> select count(*) from player_game_record;
+-----------+
| count(*) |
+-----------+
| 576276246 |
+-----------+
1 row in set (0.00 sec)
I agree that your use of the MEMORY storage engine for one of the tables should not at all be an issue here, since we're talking about the other table.
I also agree that the leftmost prefix of an index can be used exactly how you are trying to use it, and I cannot think of any reason why the primary key could not be used in exactly the same way as any other index.
This has been a head-scratcher. The new index you created "should" be the same as the left side of the primary key, so why don't they behave the same way? I have two thoughts, both of which lead me to the same recommendation, even though I am not as familiar with the internals of MyISAM as I am with InnoDB. (As an aside, I'd recommend InnoDB over MyISAM.)
The index on your primary key was presumably on the table when you began inserting data, while the new index was added after most or all of the data was already there. This suggests that your new index is nicely and cleanly organized internally, while your primary key index may be highly fragmented, having been built as the data was loaded.
The row count the optimizer shows is based on index statistics, which may be inaccurate on your primary key due to the insert order.
The fragmentation theory may explain why querying with the primary key as your index is not as fast; the index statistics theory may explain why the optimizer comes up with such a different row count and it may explain why the optimizer might have been choosing a full table scan instead of using that index (which is only a guess, since we don't have the explain available).
The thing I would suggest based on these two thoughts is running OPTIMIZE TABLE on your table. If it took 12 hours to build that new index, then optimizing the table may very possibly take that long or longer.
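A minimal sketch of both options (OPTIMIZE TABLE rebuilds the MyISAM data file and indexes and refreshes statistics; ANALYZE TABLE is a much cheaper way to refresh only the statistics, if you want to try that first):
-- Cheap: refresh index statistics only.
ANALYZE TABLE player_game_record;
-- Expensive: rebuild the table and all of its indexes (expect hours at ~576M rows).
OPTIMIZE TABLE player_game_record;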
Possibly helpful: http://www.dbasquare.com/2012/07/09/data-fragmentation-problem-in-mysql-myisam/