How to speed up my mysql query with group by? - mysql

Hi all, I have this table schema:
create table user_activities
(
    id int unsigned auto_increment
        primary key,
    user_id int unsigned not null,
    other_user_id int unsigned not null,
    activity_type_id tinyint unsigned not null,
    reason varchar(255) null,
    created_at timestamp default CURRENT_TIMESTAMP not null,
    updated_at timestamp default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP,
    constraint user_activities_user_id_other_user_id_unique
        unique (user_id, other_user_id),
    constraint user_activities_other_user_id_foreign
        foreign key (other_user_id) references users (id),
    constraint user_activities_activity_type_id_foreign
        foreign key (activity_type_id) references user_activity_types (id),
    constraint user_activities_user_id_foreign
        foreign key (user_id) references users (id)
)
    engine = InnoDB
    collate = utf8_unicode_ci;
create index user_activities_other_user_id_index
    on user_activities (other_user_id);
create index user_activities_activity_type_id_index
    on user_activities (activity_type_id);
create index user_activities_user_id_index
    on user_activities (user_id);
The table now has 6515846 rows.
Goal
I want to write a query that returns the users with the most recent activity in the last 7 days.
I need rows of user_id, mostrecentuseractivitydate.
Then I will perform some action on them in application code.
My query at the moment is:
select updated_at, user_id from user_activities
where created_at > '2022-08-08 15:16:55'
group by user_id
order by max(updated_at) desc
limit 10;
The EXPLAIN result is:
id: 1
select_type: SIMPLE
table: user_activities
partitions: NULL
type: index
possible_keys: user_activities_user_id_other_user_id_unique,user_activities_user_id_index
key: user_activities_user_id_index
key_len: 4
ref: NULL
rows: 6416255
filtered: 33.33
Extra: Using where; Using temporary; Using filesort
Problem
With this schema and row count, the query above takes around 5 minutes, and sometimes it hangs and never returns a response.
That is not acceptable for my requirements.
Any ideas how to speed it up?
As you can see from the table schema, I already have a foreign key on the user_id field, and I think InnoDB also generates an index automatically for the foreign key.
I also added the where created_at > clause to restrict the rows to the last 7 days.
I even tried without the where created_at clause, and it did not change much, to be honest.
Anyway, I am only interested in data from the last 7 days, so that where clause can stay.

You need an index on created_at. Others might be useful as well, but start there.
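As a minimal sketch of that suggestion, the statement below adds a single-column index on created_at. The index name is an assumption that simply follows the naming pattern already used in the schema; whether the optimizer actually picks it for the seven-day filter depends on how many rows fall inside that window.

```sql
-- Hypothetical index name, modelled on the existing *_index names.
-- Lets the WHERE created_at > ... filter be evaluated as a range scan.
CREATE INDEX user_activities_created_at_index
    ON user_activities (created_at);
```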

You want a composite index that contains both user_id and created_at.
Such an index should let MySQL handle the group by and the where clause at the same time.
Try this:
create index user_activities_user_id_created_at on user_activities (user_id, created_at);
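A rough sketch of how the query itself could be adjusted to go with such an index: it aggregates updated_at explicitly, because selecting a bare updated_at next to GROUP BY user_id is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5). The index below adds updated_at as a third column so the query might be answered from the index alone. Both the index name and the extra column are assumptions for illustration, not part of the answers above, so check the plan with EXPLAIN on your own data.

```sql
-- Hypothetical covering variant of the composite index suggested above:
-- user_id serves the GROUP BY, created_at the WHERE filter, and
-- updated_at lets MAX(updated_at) be read without visiting the table rows.
CREATE INDEX user_activities_user_id_created_at_updated_at
    ON user_activities (user_id, created_at, updated_at);

-- Same question as the original query, with the aggregate made explicit.
EXPLAIN
SELECT user_id,
       MAX(updated_at) AS most_recent_activity
FROM user_activities
WHERE created_at > '2022-08-08 15:16:55'
GROUP BY user_id
ORDER BY most_recent_activity DESC
LIMIT 10;
```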

Related

Why is the primary index not used here?

Recently, I have been reviewing the basics of SQL and came across a question about indexes.
My working environment is as follows:
os: CentOS 7
mysql: 5.7.39
database: sakila
table: customer
My question is why InnoDB uses idx_fk_store_id instead of the primary index when I run select count(customer_id) from customer.
mysql> explain select count(customer_id) from customer;
The result:
type: index
key: idx_fk_store_id
Extra: Using index
The code to create this table is as follows:
CREATE TABLE `customer`(
`customer_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`store_id` tinyint(3) unsigned NOT NULL,
PRIMARY KEY(`customer_id`),
KEY `idx_fk_store_id`(`store_id`)
) ENGINE=InnoDB AUTO_INCREMENT=600 DEFAULT CHARSET=utf8mb4
I suspect this is caused by MySQL's optimizer, but I find it hard to understand.
The type: index in the EXPLAIN report indicates it is doing an index-scan. That is, reading every entry in the index idx_fk_store_id.
It can get the values of the primary key from a secondary index, so it can count them.
The alternative of using the primary key would be a table-scan, which reads every row of the table.
The primary key index in InnoDB is the clustered index. It stores all the columns of the table.
The secondary index stores only the values of store_id plus the primary key values of rows where a given store_id value occurs.
So it will be able to get the same answer by doing an index-scan, by reading fewer pages than doing the table-scan.
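One way to look at this trade-off yourself, assuming the sakila customer table from the question, is to compare the plan the optimizer picks with one where the clustered index is requested through an index hint. This is only a sketch for inspecting plans; the hint is not something to keep in real queries.

```sql
-- Plan chosen by the optimizer (the question reports key = idx_fk_store_id).
EXPLAIN SELECT COUNT(customer_id) FROM customer;

-- Same query, hinting the clustered (primary key) index for comparison.
EXPLAIN SELECT COUNT(customer_id) FROM customer FORCE INDEX (PRIMARY);
```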
Saying COUNT(x) is a common mistake.
COUNT(*) counts the number of rows
COUNT(x) counts the number of rows where x is NOT NULL
So, if you try SELECT COUNT(*) FROM customer, the EXPLAIN will again say that it is using that index, but for a different reason than the one Bill gave. This time the Optimizer's logic goes this way:
find the "smallest" index. That will happen to be the same store_id index.
count the "rows" in that index. But no need to test whether customer_id IS NOT NULL.
Another note: In MySQL, the PRIMARY KEY is required to be NOT NULL, so customer_id cannot be NULL. Also you declared it to be NOT NULL. Hence COUNT(customer_id) is necessarily the same as COUNT(*).
TMI.
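The difference between the two forms of COUNT can be seen in a small, self-contained sketch. The table count_demo and its columns are made up for this illustration and are not part of sakila.

```sql
-- Hypothetical demo table: id can never be NULL, note can.
CREATE TABLE count_demo (
    id   INT NOT NULL PRIMARY KEY,
    note VARCHAR(10) NULL
) ENGINE = InnoDB;

INSERT INTO count_demo (id, note) VALUES (1, 'a'), (2, NULL), (3, 'c');

SELECT COUNT(*)    AS all_rows,       -- 3: every row is counted
       COUNT(note) AS non_null_notes, -- 2: the row with note = NULL is skipped
       COUNT(id)   AS ids             -- 3: id is NOT NULL, so same as COUNT(*)
FROM count_demo;
```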

Why EXPLAIN SQL result for KEY is NULL

I have created indexes for my table, but when I run EXPLAIN on my query, the key column is NULL.
My tables are as below:
TABLE list_country
id
id_tx
id_ref_country FK TO id in ref_country
cost
cceiling
INDEX FOR list_country:
id PRIMARY
id_tx,id_ref_country UNIQUE
id_tx KEY
id_ref_country KEY
TABLE ref_country
id
country_name
INDEX for ref_country:
id PRIMARY
I run the EXPLAIN query as below:
EXPLAIN
SELECT ctr.id_tx
, GROUP_CONCAT(rctr.country_name,':',cost) AS cost_country
, GROUP_CONCAT(rctr.country_name,':',cceiling) AS ceiling_country
, GROUP_CONCAT(rctr.country_name) AS country
FROM list_country ctr
LEFT JOIN ref_country rctr ON rctr.id = ctr.id_ref_country
GROUP BY id_tx
The EXPLAIN result for table list_country is type = ALL, key = NULL.
Why is the key NULL for list_country even though I defined the indexes?
The DDL for this table:
CREATE TABLE `list_country` (
`id` INT NOT NULL AUTO_INCREMENT,
`id_tx` INT NOT NULL,
`id_ref_country` INT NOT NULL,
`cost` DECIMAL(15,2) DEFAULT NULL,
`cceiling` DECIMAL(15,2) DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `country_unik` (`id_tx`,`id_ref_country`) USING BTREE,
KEY `id_tx` (`id_tx`) USING BTREE,
KEY `id_ref_country` (`id_ref_country`) USING BTREE,
CONSTRAINT `list_country_ibfk_1` FOREIGN KEY (`id_tx`) REFERENCES `ep_tx` (`id_tx`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `list_country_ibfk_2` FOREIGN KEY (`id_ref_country`) REFERENCES `ref_country` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=INNODB AUTO_INCREMENT=55609 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
To get the results for your query, MySQL needs the values of the following fields:
ctr.id_tx
ctr.id_ref_country
Because of this, the only index that can be used is country_unik, which contains both fields; otherwise MySQL can just read the complete table.
The "EXPLAIN Output Format" page of the MySQL manual says, about type = ALL:
A full table scan is done for each combination of rows from the
previous tables. This is normally not good if the table is the first
table not marked const, and usually very bad in all other cases.
Normally, you can avoid ALL by adding indexes that enable row
retrieval from the table based on constant values or column values
from earlier tables.
MySQL avoids the index here because the query has no WHERE clause and therefore needs every record in that table.
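To see the difference, a sketch like the following compares the unfiltered query shape with one restricted to a single id_tx. The value 123 is a made-up example. With such a filter the optimizer has a reason to consider the id_tx or country_unik index, though the actual plan depends on your data and statistics.

```sql
-- No WHERE clause: every row of list_country is needed, and no index
-- contains cost, so a full scan (type = ALL, key = NULL) is a reasonable plan.
EXPLAIN
SELECT ctr.id_tx,
       GROUP_CONCAT(rctr.country_name, ':', ctr.cost) AS cost_country
FROM list_country ctr
LEFT JOIN ref_country rctr ON rctr.id = ctr.id_ref_country
GROUP BY ctr.id_tx;

-- Hypothetical filter on one transaction: only a few rows are needed,
-- so an index that starts with id_tx can be used to find them.
EXPLAIN
SELECT ctr.id_tx,
       GROUP_CONCAT(rctr.country_name, ':', ctr.cost) AS cost_country
FROM list_country ctr
LEFT JOIN ref_country rctr ON rctr.id = ctr.id_ref_country
WHERE ctr.id_tx = 123
GROUP BY ctr.id_tx;
```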

Tuning SQL Query for a table with size more than 2GB

I have a table with millions of records; it is currently 2GB in size and is expected to grow further.
Table Structure
CREATE TABLE `test` (
`column_1` int(11) NOT NULL AUTO_INCREMENT,
`column_2` int(11) NOT NULL,
`column_3` int(11) NOT NULL,
`column_4` int(11) NOT NULL,
`column_5` datetime NOT NULL,
`column_6` time NOT NULL,
PRIMARY KEY (`column_1`),
UNIQUE KEY `index_1` (`column_2`,`column_3`),
UNIQUE KEY `index_2` (`column_2`,`column_4`),
KEY `index_3` (`column_3`),
KEY `index_4` (`column_4`),
KEY `index_5` (`column_2`),
KEY `index_6` (`column_5`,`column_2`),
CONSTRAINT `fk_1` FOREIGN KEY (`column_3`) REFERENCES `test2`(`id`),
CONSTRAINT `fk_2` FOREIGN KEY (`column_4`) REFERENCES `test2` (`id`),
CONSTRAINT `fl_3` FOREIGN KEY (`column_2`) REFERENCES `link` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14164023 DEFAULT CHARSET=utf8;
When I run the following query, it takes around 5-8 seconds for different values of column_2. Can someone help me make it run faster?
SELECT count(*) FROM test WHERE test.column_2= 26 and
test.column_5 between '2015-06-01 00:00:00' AND
'2015-06-30 00:00:00' ;
Note: The timings mentioned were captured by executing the query in MySQL Workbench.
Your index_6 currently has column_5 first, then column_2, so MySQL first tries to filter on the BETWEEN clause. However, MySQL has a limitation: once it uses an index column for a range condition, it cannot use the following columns of that index to narrow the search (more info in this blog post).
The correct way to optimize such queries is to put the equality column first in the index and the range column second. MySQL will then select the rows with a column_2 value of 26 and use the second part of the index to filter them further by the column_5 date range.
So the solution is to have an index:
KEY `ind_c2_c5` (`column_2`,`column_5`)
BTW, it is better to give indexes descriptive names, so you know at first sight what they are for.
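Applied to the existing table, the suggestion could look like the sketch below. The half-open date range is an assumption about the intent: the original BETWEEN stops at 2015-06-30 00:00:00 and so leaves out almost all of June 30.

```sql
-- Equality column first, range column second.
ALTER TABLE test ADD INDEX ind_c2_c5 (column_2, column_5);

-- Half-open range covering the whole of June 2015 (assumed intent).
EXPLAIN
SELECT COUNT(*)
FROM test
WHERE column_2 = 26
  AND column_5 >= '2015-06-01 00:00:00'
  AND column_5 <  '2015-07-01 00:00:00';
```

If the new index is used, EXPLAIN should list ind_c2_c5 in the key column; index_6 can then be reviewed to see whether other queries still need it.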

MySQL: optimization of table (indexing, foreign key) with no primary keys

Each member has 0 or more orders. Each order contains at least 1 item.
memberid - varchar, not integer - that's OK (please do not mention that's not very good, I can't change it).
So, there are 3 tables: members, orders and order_items. Orders and order_items are below:
CREATE TABLE `orders` (
`orderid` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`memberid` VARCHAR( 20 ),
`Time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ,
`info` VARCHAR( 3200 ) NULL ,
PRIMARY KEY (orderid) ,
FOREIGN KEY (memberid) REFERENCES members(memberid)
) ENGINE = InnoDB;
CREATE TABLE `order_items` (
`orderid` INT(11) UNSIGNED NOT NULL,
`item_number_in_cart` tinyint(1) NOT NULL , -- 5 items in cart = 5 rows
`price` DECIMAL (6,2) NOT NULL,
FOREIGN KEY (orderid) REFERENCES orders(orderid)
) ENGINE = InnoDB;
So, order_items table looks like:
orderid - item_number_in_cart - price:
...
1000456 - 1 - 24.99
1000456 - 2 - 39.99
1000456 - 3 - 4.99
1000456 - 4 - 17.97
1000457 - 1 - 20.00
1000458 - 1 - 99.99
1000459 - 1 - 2.99
1000459 - 2 - 69.99
1000460 - 1 - 4.99
...
As you can see, the order_items table has no primary key (and I think there is no point in creating an auto_increment id for this table, because whenever we extract data, we always extract it as WHERE orderid='1000456' order by item_number_in_cart asc - the whole block; an id wouldn't be helpful in queries).
Once data is inserted into order_items, it is never UPDATEd, only SELECTed.
The questions are:
I think it's a good idea to put index on item_number_in_cart. Could anybody please confirm that?
Is there anything else I have to do with order_items to increase the performance, or that looks pretty good? I could miss something because I'm a newbie.
Thank you in advance.
Primary keys can span multiple columns. You can't use the PRIMARY attribute of columns to do this, but you can define a separate primary key with multiple columns:
CREATE TABLE `order_items` (
`orderid` INT(11) UNSIGNED NOT NULL,
`item_number_in_cart` tinyint(1) NOT NULL , -- 5 items in cart = 5 rows
`price` DECIMAL (6,2) NOT NULL,
PRIMARY KEY (orderid, item_number_in_cart),
FOREIGN KEY (orderid) REFERENCES orders(orderid)
) ENGINE = InnoDB;
Moreover, a primary key is simply a unique key with a special name in which every column is NOT NULL; you can create your own unique keys on non-nullable columns to get the same effects.
You're not likely to get much of a performance improvement by indexing item_number_in_cart; as the number of line items for a given order will tend to be small, sorting by item_number_in_cart won't take much time or memory. However, including the column in a primary key will help with data consistency.
An index on item_number_in_cart alone won't be used. It is a tinyint, not selective enough, and won't even be considered by the engine once you have 2 records. You can add it as a second column to the existing index on orderid (since you created an FK constraint on orderid, MySQL automatically adds an index on this field).
You say that data in order_items is never updated, but I think it can be deleted; doing so without a primary key will be problematic.
Well, I'd have an auto-increment column anyway, as I'm a big believer in surrogate keys, but as suggested by alex07, an index, or even a primary key, on (orderid, item_number_in_cart) should sort things out. Note that without it the order by item_number_in_cart needs a two-pass sort (get the data, then sort it into number order), so an index or key removes that step straight away; you'd want that index even with a surrogate key.
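If the order_items table already exists, the composite key discussed in these answers could be added in place rather than by recreating the table. The sketch below assumes the table from the question and reuses the sample order id 1000456; EXPLAIN is there only to check whether the sort step disappears on your data.

```sql
-- Add the composite primary key to the existing table.
-- This fails if duplicate (orderid, item_number_in_cart) pairs already exist.
ALTER TABLE order_items
    ADD PRIMARY KEY (orderid, item_number_in_cart);

-- The typical read from the question: all rows of one order, in cart order.
EXPLAIN
SELECT item_number_in_cart, price
FROM order_items
WHERE orderid = 1000456
ORDER BY item_number_in_cart ASC;
```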