MySQL: use EXPLAIN and SELECT the rows column alone

I want to get information about the progress of a BIG insert of 10M rows into out_table while the INSERT is executing.
I found that:
query1: EXPLAIN SELECT COUNT(*) FROM out_table; gives:
+----+-------------+-----------+-------+---------------+-----------------------+---------+------+---------+-------------+
| id | select_type | table     | type  | possible_keys | key                   | key_len | ref  | rows    | Extra       |
+----+-------------+-----------+-------+---------------+-----------------------+---------+------+---------+-------------+
|  1 | SIMPLE      | out_table | index | NULL          | periodo_cuit_cuil_idx | 22      | NULL | 2518148 | Using index |
+----+-------------+-----------+-------+---------------+-----------------------+---------+------+---------+-------------+
As I want to show the progress of the INSERT, I will run a script that keeps displaying it. The problem I have is that I cannot get the 'rows' column alone from the query above.
I've tried SELECT rows FROM (query1) and assigning the output to a variable, with no luck. Any idea?
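One way to read that estimate without parsing EXPLAIN output is to query information_schema directly; for InnoDB, TABLE_ROWS is the same kind of approximate count the optimizer reports. A sketch, assuming the table lives in the current schema:

SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'out_table';

-- or capture it in a user variable for a progress script
SELECT TABLE_ROWS INTO @rows_so_far
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'out_table';

Note the number is an estimate, and for a single big transaction it may lag until rows are committed.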

MySQL: improvement on count(*) aggregation with composite index keys

I have a table with the following structure, with almost 120,000 rows:
desc user_group_report
+--------------+----------------------+------+-----+-------------------+-------+
| Field        | Type                 | Null | Key | Default           | Extra |
+--------------+----------------------+------+-----+-------------------+-------+
| user_id      | int                  | YES  | MUL | NULL              |       |
| group_id     | int(11)              | YES  | MUL | NULL              |       |
| type_id      | int(11)              | YES  |     | NULL              |       |
| group_desc   | varchar(128)         | NO   |     | NULL              |       |
| status       | enum('open','close') | NO   |     | NULL              |       |
| last_updated | datetime             | NO   |     | CURRENT_TIMESTAMP |       |
+--------------+----------------------+------+-----+-------------------+-------+
I have indexes on the following keys:
user_group_type(user_id,group_id,type_id)
group_type(group_id,type_id)
user_type(user_id,type_id)
user_group(user_id,group_id)
My issue is that I am running a count(*) aggregation on the above table, grouped by group_id, with a WHERE clause on type_id.
Here is the query :
select count(*) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
and here is the explain plan (the query takes 0.3 secs on average):
+----+-------------+-------------------+-------+---------------------------------------+------------+---------+------+--------+--------------------------+
| id | select_type | table             | type  | possible_keys                         | key        | key_len | ref  | rows   | Extra                    |
+----+-------------+-------------------+-------+---------------------------------------+------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | user_group_report | index | user_group_type,group_type,user_group | group_type | 10      | NULL | 119811 | Using where; Using index |
+----+-------------+-------------------+-------+---------------------------------------+------------+---------+------+--------+--------------------------+
As I understand it, the query almost does a full scan because of the composite indexes. When I try to add an index on group_id alone, the rows in the explain plan show a smaller number (almost half the rows), but the query execution time increases to 0.4-0.5 secs.
I have tried different ways to add/remove indexes, but none of them reduces the time taken.
Assuming the table structure cannot be changed and the query is independent of other tables, can someone suggest a better way to optimize the above query, or point out anything I am missing here?
PS:
I have already tried to modify the query to the following but couldn't find any improvement.
select count(user_id) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
Any little help is appreciated.
Edit:
As per the suggestions, I added a new index
type_group on (type_id,group_id)
This is the new explain plan. The number of rows in EXPLAIN is reduced, but the query execution time is still the same:
+----+-------------+-------------------+------+---------------------------------------+------------+---------+-------+-------+--------------------------+
| id | select_type | table             | type | possible_keys                         | key        | key_len | ref   | rows  | Extra                    |
+----+-------------+-------------------+------+---------------------------------------+------------+---------+-------+-------+--------------------------+
|  1 | SIMPLE      | user_group_report | ref  | user_group_type,type_group,user_group | type_group | 5       | const | 59846 | Using where; Using index |
+----+-------------+-------------------+------+---------------------------------------+------------+---------+-------+-------+--------------------------+
EDIT 2:
Adding details as suggested in answers/comments
select count(*)
from user_group_report
where type_id = 1
This query itself is taking 0.25 secs to execute.
and here is the explain plan:
+----+-------------+-------------------+------+---------------+------------+---------+-------+-------+-------------+
| id | select_type | table             | type | possible_keys | key        | key_len | ref   | rows  | Extra       |
+----+-------------+-------------------+------+---------------+------------+---------+-------+-------+-------------+
|  1 | SIMPLE      | user_group_report | ref  | type_group    | type_group | 5       | const | 59866 | Using index |
+----+-------------+-------------------+------+---------------+------------+---------+-------+-------+-------------+
I believe that your group_type index is wrong for this query. Try switching the attributes:
create index ix_type_group on user_group_report(type_id,group_id)
This index is better for your query because you specify type_id = 1 in the WHERE clause. The query processor finds the first record with type_id = 1 in the index, then scans only the records with this type_id and performs the aggregation. With such an index, only the relevant records in the index are accessed, which is not possible with the group_type index.
If type_id is selective (i.e. it reduces the search space significantly), creating an index on (type_id, group_id) should help significantly.
This is because it reduces the number of records that need to be grouped first (everything with type_id != 1 is filtered out), and only then does the grouping/counting.
EDIT:
Following on from the comments, it seems we need to figure out more about where the bottleneck is - finding the records, or grouping/summing.
The first step would be to measure the performance of:
select count(*)
from user_group_report
where type_id = 1
If that is significantly faster, the bottleneck is more likely in the grouping than in finding the records. If it's just as slow, it's in finding the records in the first place.
Do most of the columns really need to be NULLable? Change to NOT NULL where applicable.
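For example, the DESC output above shows type_id as nullable even though every query filters on it; a hedged sketch (verify no NULLs are actually stored before running it):

ALTER TABLE user_group_report MODIFY type_id int(11) NOT NULL;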
What percentage of the table has type_id = 1? If it is most of the table, then that would explain why you don't see much improvement. Meanwhile, the EXPLAIN seems to be thinking there are only two distinct values for type_id, hence it says only half the table will be scanned -- this number cannot be trusted.
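A quick one-off diagnostic query (not from the original thread) to check that distribution:

-- how selective is type_id really?
SELECT type_id, COUNT(*) AS cnt
FROM user_group_report
GROUP BY type_id;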
To get more insight into what is going on, please do these:
EXPLAIN FORMAT=JSON SELECT...;
And
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
We can help interpret the data you get there.
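Concretely, with the query from the question, that second sequence would be:

FLUSH STATUS;
SELECT count(*) user_count, group_id
FROM user_group_report
WHERE type_id = 1
GROUP BY group_id;
SHOW SESSION STATUS LIKE 'Handler%';

The Handler_read_% counters show how many index entries the query actually touched, which indicates whether the time goes into reading entries or into the grouping.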

MySQL: get average of a column in a join over millions of records

SELECT AVG(table1.column1) as a,
table2.column2
FROM table1
LEFT OUTER JOIN table2
ON table2.column2 = table1.column2
GROUP BY table2.column2 ORDER BY a DESC LIMIT 10
This is MySQL code. I have 1.5 million rows in table1 and 200,000 rows in table2.
I am still waiting for the query to finish.
Does anybody know a way to work in a shorter time?
There are a lot of comments in the same vein, but I thought I'd give a thorough answer. I'm going to use one of my own tables/databases here for explanation. Let's take this query:
SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%"
This query returns about 851 rows and takes 0.5 seconds. If we add the word EXPLAIN to the query, MySQL tells us what this query is doing.
mysql> EXPLAIN SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1183 | Using where |
|  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 6594 |             |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)
The important column to look at here is rows, as this is the number of records MySQL has to examine. In this case, for tables A and B it has to look at every row, even though only about 851 fit the condition. This is how tables can get out of hand quickly: this one only has 6594 records to search through, but left alone it could easily reach your 1.5 million rows.
So we can cut this down by adding an index to the table, allowing MySQL to store a reference for each record.
ALTER TABLE AmazonWishlistItemPrices ADD INDEX idx_asin (asin)
This simply says: create an index called idx_asin and use the column asin to do the indexing. If we re-run our EXPLAIN...
mysql> EXPLAIN SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%";
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref                 | rows | Extra       |
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | A     | ALL  | NULL          | NULL     | NULL    | NULL                | 1183 | Using where |
|  1 | SIMPLE      | B     | ref  | idx_asin      | idx_asin | 12      | mah_database.A.asin |    6 | Using index |
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
2 rows in set (0.00 sec)
We're down to six rows, and you can see in the possible_keys column that it's found our index. You may find that with certain joins and WHERE clauses your indexes are being ignored; that's simply MySQL saying "I'm going to have to get all the data anyway" because of the conditions you've provided in the WHERE clause.
It's best to use numeric keys for indexing; you can get away with some varchars, but they do take up disk space. You should have a PRIMARY KEY on each table where possible. So look at your database structure and consider adding some indexes.
Finally, to check whether your table has indexes, you can use SHOW CREATE TABLE followed by the table name.
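For example, with the table used above:

SHOW CREATE TABLE AmazonWishlistItemPrices;

The output ends with the PRIMARY KEY and any KEY (index) definitions, so you can see at a glance what is already indexed.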

SELECT on indexed column is slow

In a MySQL db I have a table that, for all intents and purposes, only has 2 columns: a key hash and a value. Both are INTEGER type. The hash column will have a large number of duplicates (worst case, expect ~80k duplicates for each hash; it is not possible to make it unique due to the small hash preimage), and the table contains on the order of 100 billion rows.
Right now I have the hash column indexed (CREATE INDEX idx_hash ON table(hash)); however, lookups are very slow. Something like SELECT value FROM table WHERE hash=123 LIMIT 50 will take minutes if not hours, while a similar select on a similarly sized table against a primary key column finishes in a jiffy on the same machine.
So my question is: how do I optimize for lookups in this case? Is sub-linear time SELECT possible on indexed columns? This table will be mostly read-only; rebuilding it is possible but will take a long time, so I'd like to gather some information first and do it correctly.
EXPLAIN says:
+----+-------------+----------------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
| id | select_type | table          | partitions | type | possible_keys | key  | key_len | ref  | rows      | filtered | Extra       |
+----+-------------+----------------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
|  1 | SIMPLE      | partial_lookup | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 100401571 |    10.00 | Using where |
+----+-------------+----------------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
ANALYZE:
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| partial_lookup | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (1.47 sec)
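One option worth testing here (my suggestion, not something from the thread): since the query only ever reads hash and value, a composite index on both columns is a covering index, so InnoDB can answer it from the index alone instead of following each of the ~80k duplicate entries back to the table:

-- covers SELECT value FROM partial_lookup WHERE hash = ? LIMIT 50
-- (EXPLAIN should then show Extra: Using index)
ALTER TABLE partial_lookup ADD INDEX idx_hash_value (hash, value);

If a rebuild is on the table anyway and (hash, value) pairs happen to be unique, making them the primary key would additionally cluster the duplicates of a hash physically together, since InnoDB stores rows in primary key order.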

How to improve performance of this SELECT query JOIN

I have a query which seems to be taking a long time to run on occasion. The slowness may be unrelated but I wanted to check what could be done to make this more efficient.
The user table has about 40k rows. The code table has about 30k rows. user_id and code are unique values.
SELECT *
FROM `user`, code
WHERE `user`.user_id = code.user_id
AND code.code = '50816ef96210415d1cad824bdb43';
I have an index setup on the code.user_id field. Anything else I can do? Should I have other indexes in place here?
Output from EXPLAIN on that query:
+----+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref               | rows  | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------+
|  1 | SIMPLE      | code  | ALL    | user_id       | NULL    | NULL    | NULL              | 35696 | Using where |
|  1 | SIMPLE      | user  | eq_ref | PRIMARY       | PRIMARY | 4       | mydb.code.user_id |     1 |             |
+----+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------+
2 rows in set (10.11 sec)
You also need to add indexes on the code.code and user.user_id fields, and it should start flying.
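For example (the EXPLAIN above shows user is already joined via its PRIMARY KEY, so the index that is actually missing is on code.code; the index name is illustrative):

-- lets MySQL find the matching code row directly instead of
-- scanning all ~35k rows (type ALL in the plan above)
ALTER TABLE code ADD INDEX idx_code (code);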
Besides adding an index on code.code, another thing you can do is select only the columns you need (I don't like using SELECT *).
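Putting both suggestions together, a sketch of the rewritten query (the selected columns are placeholders; list whatever you actually need):

SELECT `user`.user_id, `user`.name, code.code
FROM `user`
JOIN code ON code.user_id = `user`.user_id
WHERE code.code = '50816ef96210415d1cad824bdb43';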

Why is my query slower when a date column is in the SELECT part of my query?

I am seeing strange behavior with my MySQL query below:
SELECT domain_id, domain_name, domain_lastupdate
FROM domains
WHERE domain_id > 300000 LIMIT 2000
takes ~15 seconds...
while
SELECT domain_id, domain_name
FROM domains
WHERE domain_id > 300000 LIMIT 2000
takes ~0.05 seconds...
I've tried different ids with different limits, running one before the other and then the other way around so as not to get cached results, but I end up with dramatic time differences every time.
I have one index on domain_id and one on domain_name, but none with both columns...
I just don't get it...
The domain_lastupdate is a simple DATE column.
Here's the EXPLAIN output of both queries:
explain SELECT domain_id, domain_name, domain_lastupdate FROM domains WHERE domain_id > 255000 LIMIT 500;
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
| id | select_type | table   | type  | possible_keys | key         | key_len | ref  | rows     | Extra       |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
|  1 | SIMPLE      | domains | range | UN_domainid   | UN_domainid | 4       | NULL | 12575357 | Using where |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
1 row in set (0.00 sec)
second one:
explain SELECT domain_id, domain_name FROM domains WHERE domain_id > 255000 LIMIT 500;
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
| id | select_type | table   | type  | possible_keys | key         | key_len | ref  | rows     | Extra                    |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | domains | range | UN_domainid   | UN_domainid | 4       | NULL | 12575369 | Using where; Using index |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
1 row in set (0.01 sec)
Any idea why the first one doesn't use the index?
When you are pulling out only the non-date columns that you have indexed, the SQL server is able to pull your data directly out of the index and needn't go to the table at all. To get the date, it has to hit the table itself. Add an index that includes the date column.
Also, I suppose you could create a multi-column index. Make sure you have domain_id as the first column in the index.
What you want to use is what is called a covering index.
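A sketch of such a covering index for the first query (assuming the extra index is acceptable on a ~12.5M-row table; the index name is mine):

-- domain_id first for the range condition; domain_name and domain_lastupdate
-- are included so the query can be answered from the index alone
ALTER TABLE domains
  ADD INDEX idx_domain_cover (domain_id, domain_name, domain_lastupdate);

With this in place, EXPLAIN should show Using index for the three-column query as well.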