composite index on large table, optimizing aggregate query

We have a large table (around 160 million records) in MySQL 5.5.
The machine where MySQL is installed has 4 GB of RAM.
Table schema:
+---------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| domain | varchar(50) | YES | MUL | NULL | |
| uid | varchar(100) | YES | | NULL | |
| sid | varchar(100) | YES | MUL | NULL | |
| vurl | varchar(2500) | YES | | NULL | |
| ip | varchar(20) | YES | | NULL | |
| ref | varchar(2500) | YES | | NULL | |
| stats_time | datetime | YES | MUL | NULL | |
| country | varchar(50) | YES | | NULL | |
| region | varchar(50) | YES | | NULL | |
| place | varchar(50) | YES | | NULL | |
| email | varchar(100) | YES | MUL | NULL | |
+---------------+---------------+------+-----+---------+-------+
Indexes
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visit_views | 1 | sid_index | 1 | sid | A | 157531031 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | domain_index | 1 | domain | A | 17 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | email_index | 1 | email | A | 392845 | NULL | NULL | YES | BTREE | | |
| visit_views | 1 | stats_time_index | 1 | stats_time | A | 78765515 | NULL | NULL | YES | BTREE | | |
+------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Example query
SELECT count(*)
FROM visit_views
WHERE domain ='our'
AND email!=''
AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';
Queries like the one above are very slow, so I want to add a composite index to this table.
I ran the following command:
ALTER TABLE visit_views ADD INDEX domain_statstime_email (domain,stats_time,email);
After I ran this command, the table got locked and the server reached its connection limit (the limit is 1000). The table is now not responding to any INSERTs or SELECTs.
Here are my questions:
1. Why did the table get locked, and why is it not releasing existing connections?
2. How long will it take to build the index? I started it 3 hours ago and it still has not been created.
3. How can I see the index creation progress?
4. Why did the connection count suddenly climb to the maximum while adding the index?
5. Is it safe to add composite indexes to a table this large?
6. If I partition this table, will performance improve?
I don't know much about indexes.
some stats
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
| 3221225472 |
+---------------------------+

Your query has three conditions: an inequality, an equality, and a range.
WHERE domain ='our'
AND email!=''
AND stats_time BETWEEN '2010-06-21 00:00:00' AND '2015-08-21 00:00:00';
To make this work, you should try the following indexes to see which one works better.
(email, domain, stats_time)
(domain, email, stats_time)
Why these? MySQL indexes are BTREE. That is, they're sorted in order. So to satisfy the query MySQL finds the first element in the index matching your query. That's based on domain, email, and the starting stats_time value. It then scans the index sequentially looking for the last matching value. Along the way it counts the records, and that satisfies your query. In other words it does a range scan on stats_time.
Why the choice? I don't know what MySQL will do with the inequality in your email matching predicate. That's why I suggest trying both.
If you have not simplified the query you showed us, you also might try a compound covering index on
(domain, stats_time, email)
This will random-access immediately to the first matching domain/stats_time combination, and then scan to the last one. As it scans, it will look at the email values from the index (that's why this is called a covering index) and pick out the rows matching. Along the way it counts the rows.
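The scan described above can be mimicked with a sorted list standing in for the B-tree. This is only a sketch of the idea, not how MySQL is implemented: the sample rows and the function name `count_visits` are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample rows as (domain, email, stats_time).
# ISO-formatted timestamps sort correctly as plain strings.
rows = [
    ("our", "a@x.com", "2012-01-05 10:00:00"),
    ("our", "a@x.com", "2016-03-01 09:00:00"),    # outside the date range
    ("our", "b@x.com", "2011-07-20 08:30:00"),
    ("our", "", "2013-02-02 12:00:00"),           # empty email, filtered out
    ("other", "a@x.com", "2012-01-05 10:00:00"),  # different domain
]

# Stand-in for a B-tree index on (domain, stats_time, email): sorted key tuples.
index = sorted((domain, ts, email) for domain, email, ts in rows)


def count_visits(index, domain, start, end):
    """Return (matching_rows, index_entries_scanned) for the example query."""
    prefixes = [(d, ts) for d, ts, _ in index]
    lo = bisect_left(prefixes, (domain, start))  # first domain/stats_time match
    hi = bisect_right(prefixes, (domain, end))   # one past the last match
    scanned = index[lo:hi]
    # email is read from the index entry itself; no table row is touched.
    matches = sum(1 for _, _, email in scanned if email != "")
    return matches, len(scanned)


# prints (2, 3): three index entries scanned, two with a non-empty email
print(count_visits(index, "our", "2010-06-21 00:00:00", "2015-08-21 00:00:00"))
```

The point of the sketch is that only the contiguous slice for the domain and date range is visited, and the email test is answered from the index entries alone.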
You should consider declaring your email column NOT NULL to help your inequality test use its index more efficiently. Read http://use-the-index-luke.com/ for good background information.
As to your questions:
Why did the table get locked, and why is it not releasing existing connections?
Why did the connection count suddenly climb to the maximum while adding the index?
It can take a long time to add an index to a large table. Yours, at 160 megarows, is large. While that indexing operation is going on, other users of the table must wait. So, if you're accessing this from a web app, the connections pile up waiting for it to become available.
How long will it take to build the index? I started it 3 hours ago and it still has not been created.
It will be much faster on a quiet system. It is also possible you have some redundant single-column indexes you could drop. You may wish to copy the table and index the copy, then, when it's ready, rename it.
How can I see the index creation progress?
SHOW FULL PROCESSLIST will display all the action in your MySQL server. You'll need a command line interface to give this command.
Is it safe to add composite indexes to a table this large?
Yes, of course, but it takes time on a production system.
If I partition this table, will performance improve?
Probably not. What WILL help is DELETEing rows that are old, if you don't need them.

Related

Simple heavily-indexed table slow query in MySQL

I am having trouble with one particular slow query. Everything is heavily indexed, and some similar queries run fine and use the indexes, but this query is still slow as hell. I cannot understand why, so maybe somebody can help.
Just for the prerequisites: the write speed of the underlying table does not matter. The table contains ~3.5 million entries but I guess MySQL should handle that just fine.
The slow query takes about 2s:
SELECT DISTINCT t.`tag_3` FROM `image_tags` t
WHERE t.`type` = 1 AND t.`category` LIKE "00%" AND tag_1 = "0"
--- DESCRIBE OUTPUT
--- The used index thirdtag is just an index defined as (type, category, tag_1, tag_3)
--- The actual result is 201 rows
+----+-------------+-------+-------+-----------------+----------+---------+------+---------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+----------+---------+------+---------+-------------------------------------------+
| 1 | SIMPLE | t | range | [... A LOT ...] | thirdtag | 31 | NULL | 1652861 | Using where; Using index; Using temporary |
+----+-------------+-------+-------+-----------------+----------+---------+------+---------+-------------------------------------------+
The only thing that stands out is the enormous number of rows involved. If you compare with the 2 fast queries I attached at the end of this question, it is literally the only difference (at least from the first one). So most probably that's the problem. But that's how the data is given to me, so I need to work with it. I thought that as long as the columns were in the index, MySQL could handle the data just fine.
Does anybody have a suggestion for how to optimize the query? Are there different indexes that would suit the query better?
For comparison these 2 similar queries work blazing fast
--- just a longer category string resulting in fewer results
SELECT DISTINCT t.`tag_3` FROM `image_tags` t
WHERE t.`type` = 1 AND t.`category` LIKE "0000%" AND tag_1 = "0"
--- and additional where clause
SELECT DISTINCT t.`tag_3` FROM `image_tags` t
WHERE t.`type` = 1 AND t.`category` LIKE "00%" AND tag_1 = "0" and tag_2 = ""
The table (it has a lot of indexes probably too long to paste).
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| image | char(8) | NO | MUL | NULL | |
| category | varchar(6) | YES | MUL | NULL | |
| type | tinyint(1) | NO | MUL | NULL | |
| tag_1 | char(3) | NO | MUL | NULL | |
| tag_2 | char(3) | NO | MUL | NULL | |
| tag_3 | char(3) | NO | MUL | NULL | |
| tag_4 | char(3) | NO | MUL | NULL | |
| tag_5 | char(3) | NO | MUL | NULL | |
| tag_6 | char(3) | NO | MUL | NULL | |
+----------+------------------+------+-----+---------+----------------+

Please provide SHOW CREATE TABLE, it is more descriptive than DESCRIBE! In particular, I cannot see what indexes you have.
As my index cookbook explains, start the index with any fields that are '=', then you get one chance to add a 'range' comparison. Your category is a range, so
WHERE t.`type` = 1 AND t.`category` LIKE "00%" AND tag_1 = "0"
does not get past category in
INDEX(type, category, tag_1, tag_3)
For your 3 queries, these are the best indexes:
INDEX(type, tag_1, category)
INDEX(type, tag_1, category)
INDEX(type, tag_1, tag_2, category)
category should be last; the other columns can be in any order. Perhaps one of your existing indexes handled the 3rd case?
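To see why the column order matters, here is a rough simulation that treats each index as a sorted list of key tuples and counts how many entries a range scan has to visit. The data set, the helper `range_scan`, and the variable names are invented for illustration; real MySQL behaviour depends on the optimizer and the actual data.

```python
from bisect import bisect_left
from itertools import product

# Hypothetical data: every combination of type, category, and tag_1.
categories = ["0001", "0002", "0010", "0100", "1000"]
tags = ["0", "1", "2", "3"]
rows = [{"type": t, "category": c, "tag_1": g}
        for t, c, g in product((1, 2), categories, tags)]

# Two candidate indexes, each modelled as a sorted list of key tuples.
idx_range_first = sorted((r["type"], r["category"], r["tag_1"]) for r in rows)
idx_equal_first = sorted((r["type"], r["tag_1"], r["category"]) for r in rows)


def range_scan(index, eq_prefix, like_prefix):
    """Return the index entries visited for: leading columns = eq_prefix
    and the next column LIKE 'like_prefix%'."""
    # Smallest string that sorts after every string starting with like_prefix.
    upper = like_prefix[:-1] + chr(ord(like_prefix[-1]) + 1)
    lo = bisect_left(index, eq_prefix + (like_prefix,))
    hi = bisect_left(index, eq_prefix + (upper,))
    return index[lo:hi]


# INDEX(type, category, tag_1): the range on category ends the usable prefix,
# so tag_1 = '0' has to be checked entry by entry.
scanned_a = range_scan(idx_range_first, (1,), "00")
matches_a = [k for k in scanned_a if k[2] == "0"]

# INDEX(type, tag_1, category): both equalities narrow the scan first.
scanned_b = range_scan(idx_equal_first, (1, "0"), "00")

print(len(scanned_a), len(matches_a))  # prints 12 3
print(len(scanned_b))                  # prints 3
```

Both layouts find the same 3 rows, but the range-first layout visits four times as many entries in this toy data set.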
it has a lot of indexes probably too long to paste
Probably most of them are unused. Keep in mind that INDEX(a) is unnecessary if you also have INDEX(a,b).

Extremely slow MySQL performance

I have a table with about 500K jobs. Each job has a unique ID, which is used as the primary key, and a status that indicates whether the job is pending, complete, or failed. The status is an integer column that is not a key.
My problem is that a simple query that selects jobs based on the status takes far too long: more than 10 minutes. There are about 46 threads connected to the DB. I also restarted it, but that didn't help performance.
I pasted the table schema and the query I try to run here:
http://pastie.org/10416054
Is there any way to find what's the bottleneck and optimize the table so it doesn't take that long?

After hours (when the system is quiet), I would fire off the following command:
CREATE INDEX idx_qry_status ON queries(status);
Your query is currently doing a table scan, using no index whatsoever.
See the manual page on CREATE INDEX.
A visual of the after, table-wise (not performance-wise):
create table queries
( id bigint auto_increment primary key,
status int null
-- partial definition
);
insert queries (status) values (7),(2),(1),(4),(1),(5),(9),(11);
CREATE INDEX idx_qry_status ON queries(status);
show indexes from queries;
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| queries | 0 | PRIMARY | 1 | id | A | 8 | NULL | NULL | | BTREE | | |
| queries | 1 | idx_qry_status | 1 | status | A | 8 | NULL | NULL | YES | BTREE | | |
+---------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
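For the performance side, the change in access path can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in here: its planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the helper name `plan` is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queries (id INTEGER PRIMARY KEY AUTOINCREMENT, status INT NULL)"
)
conn.executemany(
    "INSERT INTO queries (status) VALUES (?)",
    [(s,) for s in (7, 2, 1, 4, 1, 5, 9, 11)],
)


def plan(sql):
    """Return SQLite's query plan as text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM queries WHERE status = 1"

before = plan(query)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_qry_status ON queries(status)")
after = plan(query)   # the new index is now used for the lookup

print("before:", before)
print("after: ", after)
```

On MySQL the equivalent check is to run EXPLAIN on the slow query before and after creating the index and compare the key column.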

Order by and pagination

I have thousands of records in my MySQL database and I use pagination to retrieve just 10 results.
When I add an ORDER BY to my query it slows down, but when I omit it the query runs very fast.
I know the problem is that the query loads the whole result set, sorts it, and only then takes the 10 records.
I haven't added an index because the column used for ordering is the PK and, if I'm not wrong, MySQL automatically creates an index on every primary key.
Why is the index on my PK, which is the column I'm ordering by, not used?
Is there an alternative way to sort without loading all the data?
How can I have newly inserted rows added at the beginning of the table rather than at the end?
My SQL query:
select distinct ...... order by appeloffre0_.ID_APPEL_OFFRE desc limit 10
and my indexes
mysql> show index from appel_offre;
+-------------+------------+--------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| appel_offre | 0 | PRIMARY | 1 | ID_APPEL_OFFRE | A | 13691 | NULL | NULL | | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_1 | 1 | ID_APPEL_OFFRE_MERE | A | 2 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_2 | 1 | ID_ACHETEUR | A | 2 | NULL | NULL | | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_3 | 1 | USER_SAISIE | A | 2 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | appel_offre_ibfk_4 | 1 | USER_VALIDATION | A | 4 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | ao_fk_3 | 1 | TYPE_MARCHE | A | 2 | NULL | NULL | YES | BTREE | | |
| appel_offre | 1 | ao_fk_5 | 1 | USER_CONTROLE | A | 2 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.03 sec)
No index was chosen according to EXPLAIN:
+----+-------------+---------------+--------+-------------------------------------+--------------------+---------+----------------
| id | select_type | table | type | possible_keys | key | key_len | ref
+----+-------------+---------------+--------+-------------------------------------+--------------------+---------+----------------
| 1 | SIMPLE | appeloffre0_ | ALL | NULL | NULL | NULL | NULL
UPDATE (SOLUTION)
The problem came from DISTINCT: when I removed it, the query finally used the index.

Because you already use an index on "USER_VALIDATION", MySQL won't use the ID index instead.
Try rebuilding the USER_VALIDATION index to include the ID too:
CREATE UNIQUE INDEX appel_offre_ibfk_4 ON appel_offre (USER_VALIDATION, ID_APPEL_OFFRE);
Update
Log all Hibernate queries, extract the slow query and use EXPLAIN in a db console to understand what execution plan MySQL selects for this query. It may be possible for the db to use a FULL TABLE SCAN even when you have an index, because the index is too large to fit into memory. Try giving it a HINT as explained in this post.
According to MySQL ORDER BY optimization documentation you should:
To increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
• Increase the sort_buffer_size variable value.
• Increase the read_rnd_buffer_size variable value.
• Use less RAM per row by declaring columns only as large as they need
to be to hold the values stored in them. For example, CHAR(16) is
better than CHAR(200) if values never exceed 16 characters.
• Change the tmpdir system variable to point to a dedicated file
system with large amounts of free space. The variable value can list
several paths that are used in round-robin fashion; you can use this
feature to spread the load across several directories. Paths should be
separated by colon characters (“:”) on Unix and semicolon characters
(“;”) on Windows, NetWare, and OS/2. The paths should name directories
in file systems located on different physical disks, not different
partitions on the same disk.
Also make sure DISTINCT doesn't overrule your index. Try removing it and see if it helps.

Add an index to the column by which you are ordering.
You can't add rows to the beginning of the table, just like you can't add rows to the end of the table. Database tables are multisets. Multisets are by definition unordered collections. The notion of a first element or a last element makes no sense for multisets.

Simple Query Slow In Mysql

Is there any way to get better performance out of this?
select * from p_all where sec='0P00009S33' order by date desc
Query took 0.1578 sec.
Table structure is shown below. There are more than 100 million records in this table.
+------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------------+------+-----+---------+-------+
| sec | varchar(10) | NO | PRI | NULL | |
| date | date | NO | PRI | NULL | |
| open | decimal(13,3) | NO | | NULL | |
| high | decimal(13,3) | NO | | NULL | |
| low | decimal(13,3) | NO | | NULL | |
| close | decimal(13,3) | NO | | NULL | |
| volume | decimal(13,3) | NO | | NULL | |
| unadjusted_close | decimal(13,3) | NO | | NULL | |
+------------------+---------------+------+-----+---------+-------+
EXPLAIN result
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | price_all | ref | PRIMARY | PRIMARY | 12 | const | 1731 | Using where |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
How can I speed up this query?

In your example, you do a SELECT *, but your only index contains just the columns sec and date.
As a result, MySQL's execution plan looks roughly like this:
1. Find all rows that have sec = '0P00009S33' in the index. This is fast.
2. Sort all returned rows by date. This is also possibly fast, depending on the size of your MySQL buffer. There is possibly room for improvement here by tuning sort_buffer_size.
3. Fetch all columns (= the full row) for each row returned from the previous index lookup. This is slow! see (1)
You can optimize this drastically by reducing the SELECTed fields to the minimum. Example: if you only need the open price, do a SELECT sec, date, open instead of SELECT *.
Once you have identified the minimum set of columns you need, add a combined index that contains exactly these columns (all columns involved in the WHERE, SELECT, or ORDER BY clause).
This way you can completely skip the slow part of this query, (3) in my example above. When the index already contains all necessary columns, MySQL's optimizer can avoid looking up the full rows and serve your query directly from the index.
Disclaimer: I'm unsure in which order MySQL executes the steps; I may have ordered (2) and (3) the wrong way round. That is not important for answering this question, though.
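The difference between a plain index lookup and a covering index can be illustrated with Python's sqlite3 module. This is a sketch under assumptions: SQLite is used instead of MySQL, the column types are simplified, and the index name `idx_sec_date_open` and helper `plan` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE p_all (
        sec    TEXT NOT NULL,
        date   TEXT NOT NULL,
        open   REAL NOT NULL,
        volume REAL NOT NULL
    )
    """
)
# Combined index holding every column the narrow query needs.
conn.execute("CREATE INDEX idx_sec_date_open ON p_all (sec, date, open)")


def plan(sql):
    """Return SQLite's query plan as text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


where = " FROM p_all WHERE sec = '0P00009S33' ORDER BY date DESC"

# SELECT * needs volume, which is not in the index, so rows must be fetched.
wide_plan = plan("SELECT *" + where)
# Only indexed columns are requested, so the index alone can answer the query.
narrow_plan = plan("SELECT sec, date, open" + where)

print("SELECT *       :", wide_plan)
print("SELECT 3 cols  :", narrow_plan)
```

In MySQL the corresponding signal is "Using index" in the Extra column of EXPLAIN.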

MySQL query very slow. Count(*) on indexed column

The table is an InnoDB table. Here is some information that might be helpful.
EXPLAIN SELECT COUNT(*) AS y0_ FROM db.table this_ WHERE this_.id IS NOT NULL;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | SIMPLE | this_ | index | PRIMARY | PRIMARY | 8 | NULL | 4711235 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> DESCRIBE db.table;
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| id2 | varchar(28) | YES | | NULL | |
| photo | longblob | YES | | NULL | |
| source | varchar(10) | YES | | NULL | |
| file_name | varchar(120) | YES | | NULL | |
| file_type | char(1) | YES | | NULL | |
| created_date | datetime | YES | | NULL | |
| updated_date | datetime | YES | | NULL | |
| createdby | varchar(50) | YES | | NULL | |
| updatedby | varchar(50) | YES | | NULL | |
+--------------+--------------+------+-----+---------+-------+
10 rows in set (0.05 sec)
The explain query gives me the result right there. But the actual query has been running for quite a while. How can I fix this? What am I doing wrong?
I basically need to figure out how many photos there are in this table. The original coder had a query that checked WHERE photo IS NOT NULL (which took 3+ hours), but I changed it to check the id column, as it is the primary key. I expected a huge performance gain and an answer in under a second, but that does not seem to be the case.
What sort of optimizations on the database do I need to do? I think the query is fine but feel free to correct me if I am wrong.
Edit: mysql Ver 14.14 Distrib 5.1.52, for redhat-linux-gnu (x86_64) using readline 5.1
P.S: I renamed the tables for some crazy reason. I don't actually have the database named db and the table in question named table.

How long is 'long'? How many rows are there in this table?
A MyISAM table keeps track of how many rows it has, so a simple COUNT(*) with no WHERE clause will always return almost instantly.
InnoDB, on the other hand works differently: an InnoDB table doesn't keep track of how many rows it has, and so when you COUNT(*), it literally has to go and count each row. If you have a large table, this can take a number of seconds.
EDIT: Try COUNT(ID) instead of COUNT(*), where ID is an indexed column that has no NULLs in it. That may run faster.
EDIT2: If you're storing the binary data of the files in the longblob, your table will be massive, which will slow things down.
Possible solutions:
Use MyISAM instead of InnoDB.
Maintain your own count, perhaps using triggers on inserts and deletes.
Strip out the binary data into another table, or preferably regular files.
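The "maintain your own count" option could look roughly like this. The sketch uses Python's sqlite3 module so that it is runnable as-is; MySQL trigger syntax differs (for example, it requires FOR EACH ROW), and the table names `photos` and `row_counts` and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photos (id INTEGER PRIMARY KEY, photo BLOB);

    -- Hypothetical side table that caches one row count per table.
    CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO row_counts VALUES ('photos', 0);

    CREATE TRIGGER photos_after_insert AFTER INSERT ON photos
    BEGIN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'photos';
    END;

    CREATE TRIGGER photos_after_delete AFTER DELETE ON photos
    BEGIN
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'photos';
    END;
    """
)

conn.executemany("INSERT INTO photos (photo) VALUES (?)", [(b"x",), (b"y",), (b"z",)])
conn.execute("DELETE FROM photos WHERE id = 1")

# Reading the cached counter touches a single small row.
cached = conn.execute(
    "SELECT n FROM row_counts WHERE table_name = 'photos'"
).fetchone()[0]
# The full count is run here only to check the counter.
actual = conn.execute("SELECT COUNT(*) FROM photos").fetchone()[0]

print(cached, actual)  # prints 2 2
```

The trade-off is a small extra write on every insert and delete in exchange for a constant-time count.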
