Can someone tell me how I can reduce the execution time of this query?
This is the SQL query:
SELECT
`i`.`id`,
`i`.`local_file`,
`i`.`remote_file`,
`i`.`remote_file_big`,
`i`.`image_name`,
`i`.`description`,
IF(`i`.`prevent_sync`='1', '5', `i`.`status`) `status`,
GROUP_CONCAT(`il`.`user_id` SEPARATOR ',') AS `likes`,
COUNT(`il`.`user_id`) AS `likes_count`
FROM `images` `i`
LEFT JOIN `image_likes` `il` ON (`il`.`image_id`=`i`.`id`)
WHERE 1 AND `i`.`created` < DATE_SUB(CURDATE(), INTERVAL 48 HOUR)
GROUP BY `i`.`id`
ORDER BY `likes_count` DESC LIMIT 3 OFFSET 0;
On checking the query time, this is the result:
# Query_time: 9.948511 Lock_time: 0.000181 Rows_sent: 3 Rows_examined: 4730490
# Rows_affected: 0
Table images:
id (Primary) int(11)
local_file varchar(100)
orig_name varchar(100)
remote_file varchar(1000)
remote_file_big varchar(1000)
remote_page varchar(1000)
image_name varchar(50)
image_name_eng varchar(50)
user_id int(11) (Index)
author varchar(50)
credit varchar(250)
credit_eng varchar(250)
location varchar(50)
description varchar(500)
description_eng varchar(275)
notes varchar(550)
category int(11) (Index)
date_range varchar(50)
created datetime (Index)
license enum('1', '2', '3')
status enum('0', '1', '2', '3', '4')
locked enum('0', '1')
watch_list enum('0', '1', '2')
url_title varchar(100)
url_data varchar(8192)
rem_date datetime
rem_notes varchar(500)
original_url varchar(1000)
prevent_sync enum('0', '1')
checked_by int(11)
system_recommended enum('0', '1')
Please suggest.
This is an expensive task for the DB, and there is not much you can do to get the result really efficiently. You can, however, try to limit the I/O with a subquery that operates on covering indexes. Strip everything from your query that you don't need to get the three image ids:
SELECT i.id
FROM images i
JOIN image_likes il ON il.image_id = i.id
WHERE i.created < DATE_SUB(CURDATE(), INTERVAL 48 HOUR)
GROUP BY i.id
ORDER BY COUNT(il.image_id) DESC
LIMIT 3 OFFSET 0
The smallest covering indexes would be images(created, id) and image_likes(image_id). With 5M likes, both indexes together will consume something like 100–200 MB and should easily fit into memory. The temporary table that has to be sorted by the count will also be smaller.
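For reference, a sketch of those index definitions (the index names are just examples):

ALTER TABLE images ADD INDEX idx_created_id (created, id);
ALTER TABLE image_likes ADD INDEX idx_image_id (image_id);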
Use that query as a derived table (a subquery in the FROM clause) and join only the three matching rows from the images table:
SELECT
`i`.`id`,
`i`.`local_file`,
`i`.`remote_file`,
`i`.`remote_file_big`,
`i`.`image_name`,
`i`.`description`,
IF(`i`.`prevent_sync`='1', '5', `i`.`status`) `status`,
GROUP_CONCAT(`il`.`user_id` SEPARATOR ',') AS `likes`,
COUNT(`il`.`user_id`) AS `likes_count`
FROM (
SELECT i.id
FROM images i
JOIN image_likes il ON il.image_id = i.id
WHERE i.created < DATE_SUB(CURDATE(), INTERVAL 48 HOUR)
GROUP BY i.id
ORDER BY COUNT(il.image_id) DESC
LIMIT 3 OFFSET 0
) sub
JOIN images i ON i.id = sub.id
JOIN image_likes il ON il.image_id = i.id
GROUP BY i.id
ORDER BY likes_count DESC;
If that isn't fast enough, you should cache the likes_count using triggers.
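A minimal sketch of that trigger approach, assuming you add a likes_count column to images (the column and trigger names are hypothetical):

ALTER TABLE images ADD COLUMN likes_count INT NOT NULL DEFAULT 0;

-- keep the cached count in sync whenever likes are added or removed
CREATE TRIGGER image_likes_after_insert AFTER INSERT ON image_likes
FOR EACH ROW
  UPDATE images SET likes_count = likes_count + 1 WHERE id = NEW.image_id;

CREATE TRIGGER image_likes_after_delete AFTER DELETE ON image_likes
FOR EACH ROW
  UPDATE images SET likes_count = likes_count - 1 WHERE id = OLD.image_id;

The top-3 query then becomes a simple ORDER BY likes_count DESC LIMIT 3 against images alone.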
The original query probably suffers from the "inflate-deflate" syndrome that often comes with JOIN + GROUP BY; it also tends to produce incorrect aggregate values.
SELECT `id`, `local_file`, `remote_file`,
`remote_file_big`, `image_name`, `description`,
IF(`prevent_sync`='1', '5', `status`) `status`,
s.likes, s.likes_count
FROM `images` AS `i`
JOIN
( SELECT image_id,
         GROUP_CONCAT(user_id SEPARATOR ',') AS likes,
         COUNT(*) AS likes_count
  FROM `image_likes`
  GROUP BY image_id
  ORDER BY `likes_count` DESC
  LIMIT 3 OFFSET 0
) AS s ON s.`image_id`=`i`.`id`
WHERE `created` < CURDATE() - INTERVAL 2 DAY
ORDER BY `likes_count` DESC;
This variant will exclude images with likes_count = 0, but that seems reasonable.
It assumes that the PRIMARY KEY of images is id.
image_likes needs INDEX(image_id, user_id) so the derived table can be built with one scan of that index. Then only 3 lookups into images are needed.
The original query had to scan all the rows of images and repeatedly scan all of image_likes.
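A sketch of that index (the name is just an example):

ALTER TABLE image_likes ADD INDEX idx_image_user (image_id, user_id);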
Is there a way to combine two MySQL queries to display one unique value and an AVG value for each column?
Such as in this query:
SELECT `price`, `cost`, `shipping`
FROM `orders`
WHERE `year` = 2019 AND `product_id` = 5 AND `order_id` = 77
LIMIT 1
WITH
SELECT AVG(`price`), AVG(`cost`), AVG(`shipping`)
FROM `orders`
WHERE `year` = 2019 AND `product_id` = 5
I've been playing with unions and joins but I'm not finding a way to return this data in the same query. (I can make two separate queries and put the resulting data side-by-side, but I'd prefer to do it with one query if possible.)
Any ideas?
Since both queries return just one record, you could just turn them into subqueries and CROSS JOIN them:
SELECT a.price, a.cost, a.shipping, avg.price, avg.cost, avg.shipping
FROM
(
SELECT `price`, `cost`, `shipping`
FROM `orders`
WHERE `year` = 2019 AND `product_id` = 5 AND `order_id` = 77
LIMIT 1
) a
CROSS JOIN (
SELECT AVG(`price`) price, AVG(`cost`) cost, AVG(`shipping`) shipping
FROM `orders`
WHERE `year` = 2019 AND `product_id` = 5
) avg
The purpose of the LIMIT 1 clause in the first subquery is unclear: since there is no ORDER BY, it is unpredictable which record will be returned if more than one matches.
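If determinism matters, adding an ORDER BY pins down which row LIMIT 1 keeps (the created_at column below is purely hypothetical; any deterministic key works):

SELECT `price`, `cost`, `shipping`
FROM `orders`
WHERE `year` = 2019 AND `product_id` = 5 AND `order_id` = 77
ORDER BY `created_at` DESC  -- hypothetical column
LIMIT 1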
Here is an alternative approach using conditional aggregation (if several records exist with order_id 77, the maximum value of each column will be displayed):
SELECT
MAX(CASE WHEN `order_id` = 77 THEN `price` END) price,
MAX(CASE WHEN `order_id` = 77 THEN `cost` END) cost,
MAX(CASE WHEN `order_id` = 77 THEN `shipping` END) shipping,
AVG(`price`) avg_price,
AVG(`cost`) avg_cost,
AVG(`shipping`) avg_shipping
FROM `orders`
WHERE `year` = 2019 AND `product_id` = 5
I have a table that has over 100,000,000 rows and I have a query that looks like this:
SELECT
COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, null)) AS 'monthly',
COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, null)) AS 'weekly',
COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, null)) AS '30day',
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null)) AS 'recent'
FROM
items
WHERE
user_id = 123456;
The table looks like so:
CREATE TABLE `items` (
`user_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`,`created_at`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The explain looks fairly harmless, minus the massive row counts:
1 SIMPLE items ref PRIMARY,user_id user_id 4 const 559864 Using index
I use the query to gather counts for a specific user for 4 segments of time.
Is there a smarter/faster way to obtain the same data or is my only option to tally these as new rows are put into this table?
If you have an index on created_at, I would also add created_at >= '2015-06-30 07:57:56' to the WHERE clause, since that is the lowest date used by any of your segments.
Also, with the same index, it might help to split this into 4 queries:
select count(*) AS '30day'
FROM
items
WHERE
user_id = 123456
and created_at >= '2015-06-30 07:57:56'
union ....
And so on
I would add an index on the created_at field:
ALTER TABLE items ADD INDEX idx_created_at (created_at)
or (as Thomas suggested), since you are also filtering on user_id, a composite index on user_id and created_at:
ALTER TABLE items ADD INDEX idx_user_created_at (user_id, created_at)
and then I would write your query as:
SELECT 'monthly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-01 00:00:00' AND user_id = 123456
UNION ALL
SELECT 'weekly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-26 00:00:00' AND user_id = 123456
UNION ALL
SELECT '30day' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-06-30 07:57:56' AND user_id = 123456
UNION ALL
SELECT 'recent' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-29 17:03:44' AND user_id = 123456
Yes, the output format is a little different. Alternatively, you can use inline queries:
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...
and if you want each count as a share of the total, you could wrap them in a subquery:
SELECT
monthly,
weekly,
monthly / total,
weekly / total
FROM (
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...,
(SELECT COUNT(*) FROM items WHERE user_id=...) AS total
) s
INDEX(user_id, created_at) -- optimal
AND created_at >= '2015-06-30 07:57:56' -- helps because it cuts down on the number of index entries to touch
Doing a UNION does not help since it leads to 4 times as much work.
Doing four subquery SELECTs does not help, for the same reason.
Also
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null))
can be shortened to
SUM(created_at >= '2015-07-29 17:03:44')
(But probably does not speed it up much)
If the data does not change over time (only new rows are added), then summary tables of past data would give a significant speedup, but only if you could avoid cutoffs like '07:57:56' for '30day'. (Why have '00:00:00' for only some of them?) Perhaps the speedup would be another factor of 10 on top of the other changes. Want to discuss further?
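A minimal sketch of such a summary table, assuming the cutoffs can be rounded to whole days (the table and column names are illustrative):

CREATE TABLE items_daily (
  user_id INT NOT NULL,
  day     DATE NOT NULL,
  cnt     INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, day)
);

-- run once per day to fold in yesterday's rows
INSERT INTO items_daily (user_id, day, cnt)
SELECT user_id, DATE(created_at), COUNT(*)
FROM items
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY user_id, DATE(created_at);

-- a segment count then becomes a small range sum
SELECT SUM(cnt) AS monthly
FROM items_daily
WHERE user_id = 123456 AND day >= '2015-07-01';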
(I do not see any advantage in using PARTITION.)
I've got 2 tables: members and member_logs.
Members can belong to groups, which are in the members table. Given a date range and a group I'm trying to figure out how to get the 10 days with the highest number of successful logins. What I have so far is a massive nest of subquery terror.
SELECT count(member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and `reg_date` IN
(SELECT DISTINCT DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and (DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' and '2014-03-04'))
and `member_id` IN
(SELECT `member_id`
FROM members
WHERE `group_id` = 'XXXXXXX'
and `deleted` = 0)
ORDER BY `num_users` desc
LIMIT 0, 10
As far as I understand, the WHERE clause is being evaluated before the subqueries are, and I should also be using joins. If anyone can help me out or point me in the right direction, that would be incredible.
EDIT: Limit was wrong, fixed it
The first subquery is totally unnecessary because you can filter by date directly on member_logs itself. I also prefer a JOIN for the second subquery. Then all you are missing is grouping by date (day).
A query like the following one (not tested) will do the job you want:
SELECT COUNT(ml.member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs ml
INNER JOIN members m ON ml.member_id = m.member_id
WHERE `login_success` = 1
AND DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' AND '2014-03-04'
AND `group_id` = 'XXXXXXX'
AND `deleted` = 0
GROUP BY `reg_date`
ORDER BY `num_users` desc
LIMIT 10
SELECT count(member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and `login_date` IN
(SELECT `login_date`
FROM member_logs
WHERE `login_success` = 1
and (DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' and '2014-03-04'))
and `member_id` IN
(SELECT `member_id`
FROM members
WHERE `group_id` = 'XXXXXXX'
and `deleted` = 0)
Group by `login_date`
ORDER BY `num_users` desc
LIMIT 0, 10
As a slightly more index-friendly version of the previous answers: to keep a query index-friendly, you shouldn't do per-row calculations in the search conditions. This query removes the per-row string formatting of the date in the WHERE clause, so it should be faster when many rows are eliminated by the date range:
SELECT COUNT(*) num_users, DATE(login_date) reg_date
FROM member_logs JOIN members ON member_logs.member_id = members.member_id
WHERE login_success = 1 AND group_id = 'XXX' AND deleted = 0
AND login_date >= '2012-02-25'
AND login_date < DATE_ADD('2014-03-04', INTERVAL 1 DAY)
GROUP BY DATE(login_date)
ORDER BY num_users DESC
LIMIT 10
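Assuming no suitable indexes exist yet, composite indexes along these lines should let both the date range and the group filter be resolved from the indexes (the index names are illustrative):

ALTER TABLE member_logs ADD INDEX idx_success_date (login_success, login_date);
ALTER TABLE members ADD INDEX idx_group_deleted (group_id, deleted);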
Table
Each row represents a video that was on air at a particular time on a particular date. There are about 1600 videos per day.
CREATE TABLE `air_video` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL,
`time` TIME NOT NULL,
`duration` TIME NOT NULL,
`asset_id` INT(10) UNSIGNED NOT NULL,
`name` VARCHAR(100) NOT NULL,
`status` VARCHAR(100) NULL DEFAULT NULL,
`updated` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE INDEX `date_2` (`date`, `time`),
INDEX `date` (`date`),
INDEX `status` (`status`),
INDEX `asset_id` (`asset_id`)
)
ENGINE=InnoDB
Task
There are two conditions.
Each video must be shown not more than 24 times per day.
Each video must be in rotation no longer than 72 hours.
"In rotation" means the time span between the first and the last time the video was on air.
So I need to select all videos that violate those conditions, given a user-specified date range.
The result must be grouped by day and by asset_id (video id). For example:
date asset_id name dailyCount rotationSpan
2012-04-27 123 whatever_the_name 35 76
2012-04-27 134 whatever_the_name2 39 20
2012-04-28 125 whatever_the_name3 26 43
Query
So far I have written this query:
SELECT
t1.date, t1.asset_id, t1.name,
(SELECT
COUNT(t3.asset_id)
FROM air_video AS t3
WHERE t2.asset_id = t3.asset_id AND t3.date = t1.date
) AS 'dailyCount',
MIN(CONCAT(t2.date, ' ', t2.time)) AS 'firstAir',
MAX(CONCAT(t2.date, ' ', t2.time)) AS 'lastAir',
ROUND(TIMESTAMPDIFF(
MINUTE,
MIN(CONCAT(t2.date, ' ', t2.time)),
MAX(CONCAT(t2.date, ' ', t2.time))
) / 60) as 'rotationSpan'
FROM
air_video AS t1
INNER JOIN
air_video AS t2 ON
t1.asset_id = t2.asset_id
WHERE
t1.status NOT IN ('bumpers', 'clock', 'weather')
AND t1.date BETWEEN '2012-04-01' AND '2012-04-30'
GROUP BY
t1.asset_id, t1.date
HAVING
`rotationSpan` > 72
OR `dailyCount` > 24
ORDER BY
`date` ASC,
`rotationSpan` DESC,
`dailyCount` DESC
Problems
The bigger the user-specified date range, the longer the query takes to complete (about 9 seconds for a month's range).
The lastAir timestamp is not the latest time the video was aired on a particular date, but the latest time it was on air altogether.
If you need to speed up your query, you need to remove the SELECT subquery on line 3.
To still get that count, you can INNER JOIN the table again in the FROM clause with the exact conditions you used initially. This is how it should look:
SELECT
t1.date, t1.asset_id, t1.name,
COUNT(t3.asset_id) AS 'dailyCount',
MIN(CONCAT(t2.date, ' ', t2.time)) AS 'firstAir',
MAX(CONCAT(t2.date, ' ', t2.time)) AS 'lastAir',
ROUND(TIMESTAMPDIFF(
MINUTE,
MIN(CONCAT(t2.date, ' ', t2.time)),
MAX(CONCAT(t2.date, ' ', t2.time))
) / 60) as 'rotationSpan'
FROM
air_video AS t1
INNER JOIN
air_video AS t2 ON
(t1.asset_id = t2.asset_id)
INNER JOIN
air_video AS t3
ON (t2.asset_id = t3.asset_id AND t3.date = t1.date)
WHERE
t1.status NOT IN ('bumpers', 'clock', 'weather')
AND t1.date BETWEEN '2012-04-01' AND '2012-04-30'
GROUP BY
t1.asset_id, t1.date
HAVING
`rotationSpan` > 72
OR `dailyCount` > 24
ORDER BY
`date` ASC,
`rotationSpan` DESC,
`dailyCount` DESC
Since t2 is not bound by date, you are obviously looking at the whole table, instead of the date range.
Edit:
Due to all the date bindings, the query still ran too slowly, so I took a different approach and created 3 views (which you can obviously combine into a single query without the views, but I like the final query better):
--T1--
CREATE VIEW t1 AS
  SELECT date, asset_id, name
  FROM air_video
  WHERE status NOT IN ('bumpers', 'clock', 'weather')
  GROUP BY asset_id, date
  ORDER BY date;
--T2--
CREATE VIEW t2 AS
  SELECT t1.date, t1.asset_id, t1.name,
         MIN(CONCAT(t2.date, ' ', t2.time)) AS firstAir,
         MAX(CONCAT(t2.date, ' ', t2.time)) AS lastAir,
         ROUND(TIMESTAMPDIFF(MINUTE,
                             MIN(CONCAT(t2.date, ' ', t2.time)),
                             MAX(CONCAT(t2.date, ' ', t2.time))) / 60, 0) AS rotationSpan
  FROM t1
  JOIN air_video t2 ON t1.asset_id = t2.asset_id
  GROUP BY t1.asset_id, t1.date;
--T3--
CREATE VIEW t3 AS
  SELECT t2.date, t2.asset_id, t2.name,
         COUNT(t3.asset_id) AS dailyCount,
         t2.firstAir, t2.lastAir, t2.rotationSpan
  FROM t2
  JOIN air_video t3 ON t2.asset_id = t3.asset_id AND t3.date = t2.date
  GROUP BY t2.asset_id, t2.date;
From there you can then just run the following query:
SELECT
date,
asset_id,
name,
dailyCount,
firstAir,
lastAir,
rotationSpan
FROM
t3
WHERE
date BETWEEN '2012-04-01' AND '2012-04-30'
AND (
rotationSpan > 72
OR
dailyCount > 24
)
ORDER BY
date ASC,
rotationSpan DESC,
dailyCount DESC
I have a problem with optimizing this query:
SET @SEARCH = "dokumentalne";
SELECT SQL_NO_CACHE
`AA`.`version` AS `Version`,
`AA`.`contents` AS `Contents`,
`AA`.`idarticle` AS `AdressInSQL`,
`AA`.`topic` AS `Topic`,
MATCH (`AA`.`topic`, `AA`.`contents`) AGAINST (@SEARCH) AS `Relevance`,
`IA`.`url` AS `URL`
FROM `xv_article` AS `AA`
INNER JOIN `xv_articleindex` AS `IA` ON ( `AA`.`idarticle` = `IA`.`adressinsql` )
INNER JOIN (
SELECT `idarticle` , MAX( `version` ) AS `version`
FROM `xv_article`
WHERE MATCH (`topic`, `contents`) AGAINST (@SEARCH)
GROUP BY `idarticle`
) AS `MG`
ON ( `AA`.`idarticle` = `MG`.`idarticle` )
WHERE `IA`.`accepted` = "yes"
AND `AA`.`version` = `MG`.`version`
ORDER BY `Relevance` DESC
LIMIT 0 , 30
Right now, this query takes about 20 seconds. How can I optimize it?
EXPLAIN gives this:
1 PRIMARY AA ALL NULL NULL NULL NULL 11169 Using temporary; Using filesort
1 PRIMARY ALL NULL NULL NULL NULL 681 Using where
1 PRIMARY IA ALL accepted NULL NULL NULL 11967 Using where
2 DERIVED xv_article fulltext topic topic 0 1 Using where; Using temporary; Using filesort
This is an example server with my data:
user: bordeux_4prog
password: 4prog
phpmyadmin: http://phpmyadmin.bordeux.net/
chive: http://chive.bordeux.net/
Looks like your db is down. Getting rid of the inner query is the key part of the optimization. Please try this (not tested) query:
SET @SEARCH = "dokumentalne";
SELECT SQL_NO_CACHE
aa.idarticle AS `AdressInSQL`,
aa.contents AS `Contents`,
aa.topic AS `Topic`,
MATCH(aa.topic, aa.contents) AGAINST (@SEARCH) AS `Relevance`,
ia.url AS `URL`,
MAX(aa.version) AS `Version`
FROM
xv_article AS aa,
xv_articleindex AS ia
WHERE
aa.idarticle = ia.adressinsql
AND ia.accepted = "yes"
AND MATCH(aa.topic, aa.contents) AGAINST (@SEARCH)
GROUP BY
aa.idarticle,
aa.contents,
`Relevance`,
ia.url
ORDER BY
`Relevance` DESC
LIMIT
0, 30
To further optimize your query, you may also split fetching the newest article versions from the full-text search, as the latter is the most expensive part. This can be done with a subquery (also not tested on your db):
SELECT SQL_NO_CACHE
iq.idarticle AS `AdressInSQL`,
iq.topic AS `Topic`,
iq.contents AS `Contents`,
iq.url AS `URL`,
MATCH(iq.topic, iq.contents) AGAINST (@SEARCH) AS `Relevance`
FROM (
SELECT
a.idarticle,
a.topic,
a.contents,
i.url,
MAX(a.version) AS version
FROM
xv_article AS a,
xv_articleindex AS i
WHERE
i.accepted = "yes"
AND a.idarticle = i.adressinsql
GROUP BY
a.idarticle,
a.topic,
a.contents,
i.url
) AS iq
WHERE
MATCH(iq.topic, iq.contents) AGAINST (@SEARCH)
ORDER BY
`Relevance` DESC
LIMIT
0, 30
The first thing I noticed in your DB is that you don't have an index on xv_articleindex.adressinsql. Add it, and it should significantly improve the query performance. Also, one table is MyISAM, whereas the other is InnoDB. Use one engine (in general, I'd recommend InnoDB).
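For example (the index name, and which table still needs converting, are assumptions):

ALTER TABLE xv_articleindex ADD INDEX idx_adressinsql (adressinsql);
-- convert whichever table is still MyISAM, e.g.:
ALTER TABLE xv_article ENGINE=InnoDB;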