I'm trying to break down and rewrite a view that was created by a long-gone developer. The query takes well over three minutes to run, which I'm assuming is due to all the CONCATs.
CREATE VIEW `active_users_over_time` AS
select
`users_activity`.`date` AS `date`,
time_format(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),'%H:%i:%s') AS `time`,
`users_activity`.`username` AS `username`,
count(addtime(concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0'))) AS `checkouts`
from `users_activity`
group by
concat(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),
`users_activity`.`username`);
The data comes from the SQL table:
CREATE TABLE `users_activity` (
`id` int(10) unsigned NOT NULL auto_increment,
`featureid` smallint(5) unsigned NOT NULL,
`date` date NOT NULL,
`time` time NOT NULL,
`duration_checkout` int unsigned NOT NULL,
`update_date` date NOT NULL,
`username` varchar(255) NOT NULL,
`checkout` smallint(5) unsigned NOT NULL,
`licid` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `featureid_licid_username` (`featureid`,`licid`,`date`,`time`,`username`),
FOREIGN KEY(featureid) REFERENCES features(id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I'm having a hard time deciphering exactly what is needed and what isn't.
Anyone have any ideas? Thanks.
I think this does everything that the original query did, skipping a bunch of redundant steps:
select `date`
, `time`
, `username`
, count(1) as `checkouts`
from
(
select
`users_activity`.`date` AS `date`
,time_format(
addtime(`users_activity`.`date`,`users_activity`.`time`)
+ interval `users_activity`.`duration_checkout` second
,'%H:%i:%s'
) AS `time`
,`users_activity`.`username` AS `username`
from `users_activity`
) x
group by `username`, `date`, `time`
You may also want to look at what indexes are on the table to see if optimisations can be made elsewhere (e.g. if you don't already have an index on the username and date fields you'd get a lot of benefit for this query by adding one).
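A minimal sketch of such an index (the index name is just an illustration; adjust the column list to whatever your queries actually filter and group on):
CREATE INDEX idx_username_date ON `users_activity` (`username`, `date`);
-- composite index covering the columns the rewritten query groups and filters on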
You can start by rewriting the GROUP BY clause from this:
group by
concat(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),
`users_activity`.`username`);
to this one:
GROUP BY `users_activity`.`date`,
`users_activity`.`time`,
`users_activity`.`duration_checkout`,
`users_activity`.`username`
This change should give some slight savings on converting dates to strings and concatenating them, and the result of the query shouldn't change.
Then you may consider creating a composite index on GROUP BY columns.
According to this link: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index
It means, that if we create the following index:
CREATE INDEX idx_name ON `users_activity`(
`date`,`time`,`duration_checkout`,`username`
);
then MySQL might use it to optimize the GROUP BY (but there is no guarantee).
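To verify whether the index is actually picked up, you can inspect the execution plan (a sketch; the exact plan depends on your data and MySQL version):
EXPLAIN
SELECT `date`, `time`, `duration_checkout`, `username`, COUNT(*)
FROM `users_activity`
GROUP BY `date`, `time`, `duration_checkout`, `username`;
-- If the Extra column no longer shows "Using temporary; Using filesort",
-- the GROUP BY is being resolved through the index.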
Related
The table structure, as shown by SHOW CREATE TABLE:
CREATE TABLE `quote` (
`id` int(8) unsigned NOT NULL AUTO_INCREMENT,
`code` text COLLATE utf8mb4_unicode_ci,
`date` date DEFAULT NULL,
`open` double DEFAULT NULL,
`high` double DEFAULT NULL,
`low` double DEFAULT NULL,
`close` double DEFAULT NULL,
`volume` bigint(15) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=17449887 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Every code group has a different max(date). I select each code and its max(date) with:
select code,max(date) as date from quote group by code;
I want to get, for each code, the record whose date is the max(date) within its code group, along with the other columns' values.
create table b SELECT code,max(date) as date from quote group by code ;
select * from quote as a , b where a.code = b.code and a.date = b.date;
The efficiency is extremely low. How can I write a more efficient MySQL query?
You seem to be sort of asking multiple things here. To speed up this query:
SELECT code, MAX(date) AS date FROM quote GROUP BY code;
you want this index:
CREATE INDEX idx ON quote (code, date);
I suspect that the second query you have in mind is trying to find the records for each code having the maximum date. That query might look something like:
SELECT q1.*
FROM quote q1
INNER JOIN
(
SELECT code, MAX(date) AS max_date
FROM quote
GROUP BY code
) q2
ON q2.code = q1.code AND
q2.max_date = q1.date;
The same index suggested above should also help this version of the query.
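If you are on MySQL 8.0 or later, a window function is another way to get the per-code row with the maximum date (a sketch, not tested against your data):
SELECT *
FROM (
    SELECT q.*,
           ROW_NUMBER() OVER (PARTITION BY code ORDER BY `date` DESC) AS rn
    FROM quote q
) ranked
WHERE rn = 1;
-- ROW_NUMBER() keeps exactly one row per code; use RANK() instead if you
-- want all rows that tie on the maximum date.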
I have a table with lots of numeric data. I need to query this to get the closest row for a specific date_added and name.
My problem is that this data is not ordered by date, so when returning results I need to include ORDER BY date_added (or it doesn't return the correct row). Currently doing this takes a good 90 seconds to run because of this ORDER BY condition.
Are there any ways I can further optimise this? I've already indexed the date_added and name columns, so I'm not really sure what else can be done. I considered creating a new table with the data reordered in date_added order, but this isn't practical as new entries need to be added regularly.
I've stored the numeric data as decimal as it can potentially be very small, very large or both. Perhaps storing this data in a different way would be more efficient?
Add a compound index on name and date_added. The query above will then run without using filesort.
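A minimal sketch of that index (the name is arbitrary):
ALTER TABLE numeric_data ADD INDEX idx_name_date_added (name, date_added);
-- lets MySQL filter on name and read the matching rows already ordered by date_added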
An alternative way for the query:
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added =
(select min(date_added) from numeric_data where date_added >= '2018-05-03 11:00:00' and name = 'aaa')
and name = 'aaa'
limit 1;
Fiddle: http://sqlfiddle.com/#!9/4e8d89/1 .
You can use range partitioning:
https://dev.mysql.com/doc/refman/5.7/en/partitioning-range.html
You need to define your partitions depending on the date range you have.
CREATE TABLE `numeric_data` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`date_added` datetime NOT NULL,
`name` varchar(8) COLLATE utf8mb4_unicode_ci NOT NULL,
`data_1` decimal(30,17) NOT NULL,
`data_2` decimal(30,17) NOT NULL,
`data_3` decimal(30,17) NOT NULL,
`data_4` decimal(30,17) NOT NULL,
`data_5` decimal(30,17) NOT NULL,
`data_6` decimal(30,17) NOT NULL,
`data_7` decimal(30,17) NOT NULL,
`data_8` decimal(30,17) NOT NULL,
`data_9` decimal(30,17) NOT NULL,
PRIMARY KEY (`id`),
KEY `date_added` (`date_added`),
KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=60000000 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
PARTITION BY RANGE( TO_DAYS(date_added) ) (
PARTITION p1 VALUES LESS THAN (TO_DAYS('2018-01-01')),
PARTITION p2 VALUES LESS THAN (TO_DAYS('2018-02-01')),
PARTITION p3 VALUES LESS THAN (TO_DAYS('2018-03-01')),
PARTITION p4 VALUES LESS THAN (TO_DAYS('2018-04-01')),
PARTITION future VALUES LESS THAN MAXVALUE
);
The query below will only use the `future` partition:
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added >= '2018-05-03 11:00:00'
AND name = 'aaa'
ORDER BY date_added LIMIT 1
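You can check that the pruning actually happens with EXPLAIN; on MySQL 5.7 the partitions column is included in the default output (a sketch):
EXPLAIN
SELECT date_added, data_1
FROM numeric_data
WHERE date_added >= '2018-05-03 11:00:00'
  AND name = 'aaa'
ORDER BY date_added LIMIT 1;
-- the partitions column should list only the partitions MySQL needs to read
-- (here, just the "future" partition)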
How can I make my response time faster? Currently the average response time is about 0.2s (8,039 records in my items table and 81 records in my tracking table).
Query
SELECT a.name, b.cnt
FROM `items` a
LEFT JOIN (
    SELECT guid, COUNT(*) cnt
    FROM tracking
    WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
    GROUP BY guid
) b ON a.`id` = b.guid
WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC
LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
    SELECT guid, COUNT(*) cnt
    FROM tracking
    WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
    GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
  AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75;
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
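If you do want to keep the items with no recent tracking rows (showing 0 instead of NULL), a LEFT JOIN variant with COALESCE should do it while still benefiting from the same indexes (a sketch):
SELECT i.name, COALESCE(c.cnt, 0) AS cnt
FROM items AS i
LEFT JOIN
(
    SELECT guid, COUNT(*) cnt
    FROM tracking
    WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
    GROUP BY guid
) AS c ON c.guid = i.id
WHERE i.type = 'streaming'
  AND i.state = 1
ORDER BY cnt DESC
LIMIT 15 OFFSET 75;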
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here; see the sketch after this list.
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. But, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81, it won't be a full 15.)
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.
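A sketch of the type changes mentioned above (apply only if nothing relies on the current types; the new column name for date is hypothetical):
ALTER TABLE tracking
    MODIFY ip INT UNSIGNED NOT NULL,                       -- store IPv4 via INET_ATON('1.2.3.4'), read back with INET_NTOA(ip)
    CHANGE `date` `accessed_at` INT UNSIGNED DEFAULT NULL;  -- rename to make clear it is a Unix timestamp, not a DATE
ALTER TABLE items
    MODIFY vote_total INT UNSIGNED DEFAULT 0;               -- integer counter instead of FLOAT(11,0)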
I created a simple statistics tool for our user PCs. It records the state of all of our PCs every 5 minutes, and a little frontend gives me a usage chart.
Now, with growing data, the SQL queries are getting slower and slower, and I'm looking for a way to optimize them.
This is the structure. As you can see, the table "usage" contains about 6 million records and it uses MySQL InnoDB:
CREATE TABLE IF NOT EXISTS `usage` (
`id` int(11) unsigned NOT NULL,
`host_id` int(10) unsigned NOT NULL,
`time` int(10) unsigned NOT NULL,
`state` enum('LinuxTU','LinuxExt','View','Browser','Idle','Offline') CHARACTER SET latin1 NOT NULL DEFAULT 'Offline'
) ENGINE=InnoDB AUTO_INCREMENT=5963366 DEFAULT CHARSET=utf8;
ALTER TABLE `usage`
ADD PRIMARY KEY (`id`), ADD KEY `host_id` (`host_id`), ADD KEY `time` (`time`);
ALTER TABLE `usage`
MODIFY `id` int(11) unsigned NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=5963366;
The following query takes about 7 seconds to execute. It is the query that gives the data to the screenshot.
/* create pivot table */
SELECT `time`,
SUM(IF(state='LinuxTU', statecount, 0)) AS LinuxTU,
SUM(IF(state='LinuxExt', statecount, 0)) AS LinuxExt,
SUM(IF(state='View', statecount, 0)) AS View,
SUM(IF(state='Browser', statecount, 0)) AS Browser
FROM (
/* get data from last 24h grouped by state */
SELECT `time`, `state`, COUNT(`state`) statecount
FROM `usage` u
/* group by time to get every 5 minutes
group by state to get the state counter */
GROUP BY `time`, `state`
HAVING `time` > 1441271078 AND `time` < 1441357478
) AS s
GROUP BY `time`
ORDER BY `time` ASC
I don't know how to optimize it. Is there something I missed? Or do I need to reorganize the structure? Any hint?
In addition to moving the time comparison into a where clause, you can get rid of the subquery entirely:
/* create pivot table */
SELECT `time`,
SUM(state = 'LinuxTU') AS LinuxTU,
SUM(state = 'LinuxExt') AS LinuxExt,
SUM(state = 'View') AS View,
SUM(state = 'Browser') AS Browser
FROM usage u
WHERE `time` > 1441271078 AND `time` < 1441357478
GROUP BY `time`
ORDER BY `time` ASC;
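If it is still slow, a composite index on the filter and grouping columns can make the query index-only, since the single-column time index does not include state (a sketch; the index name is arbitrary):
ALTER TABLE `usage` ADD INDEX idx_time_state (`time`, `state`);
-- the range scan on `time` can then also supply `state` without touching the row data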
I think your problem is with the final
GROUP BY `time`
ORDER BY `time` ASC
Because of the subquery, your indexes can no longer be used, so you should find a way to eliminate it.
Do you also have the option to do some of the processing in your programming language? You could run just the inner SELECT (plus the columns from the outer SELECT, without the SUMs), add the ordering, and then do the pivoting in the application code.
Or do you have to write this as a single query?
I've found the bottleneck. The problem is the inner query. HAVING seems to be much slower than WHERE. So I tried some different queries and now I got this result:
Takes 7 seconds:
SELECT `time`, `state`, COUNT(`state`) statecount
FROM `usage` u
GROUP BY `time`, `state`
HAVING `time` > 1441271078 AND `time` < 1441357478
Takes 0.1 seconds:
SELECT `time`, `state`, COUNT(`state`) `statecount`
FROM `usage` u
WHERE `time` > 1441271078 AND `time` < 1441357478
GROUP BY `time`, `state`
And it gives me the same result. The frontend is now much faster.
I am having a little bit of a situation here.
The environment
I have a database for series here.
One table for the series itself, one for the seasons connected to the series table, and one for the episodes connected to the seasons table.
Since there are air dates for different countries, I have another table called `series_data`, which looks like the following:
CREATE TABLE IF NOT EXISTS `episode_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`episode_id` int(11) NOT NULL,
`country` char(3) NOT NULL,
`title` varchar(255) NOT NULL,
`date` date NOT NULL,
`tba` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `episode_id` (`episode_id`),
KEY `date` (`date`),
KEY `country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now I am trying to collect the last aired episodes from each series in the database using the following query:
SELECT
*
FROM
`episode_data` ed
WHERE
`ed`.`date` < CURDATE( ) &&
`ed`.`date` != '1970-01-01' &&
`ed`.`series_id` = 1
GROUP BY
`ed`.`country` DESC
ORDER BY
`ed`.`date` DESC
Since I have everything normalized, I replaced 'episode_id' with 'series_id' to make the query less complicated.
What I am trying to accomplish
I want to get the last aired episode for each country that has actually been announced (ed.date != '1970-01-01'), as the result of a single query.
What's the problem
I know now (I searched Google and found answers here, but none that worked for me) that the ordering takes place AFTER the grouping, so my "date" ordering is completely useless.
The other problem is that the query above works, but it always takes the entries with the lowest id matching my conditions, because those are the first ones in the table's index.
What is the question?
How may I accomplish the above? I do not know if grouping is the right way to do it. If there is no one-liner, I think the only way is a subquery, which I want to avoid since, as far as I know, it is slower than a one-liner with the right indexes set.
I hope everything you need is in here :)
Example data
CREATE TABLE IF NOT EXISTS `episode_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`episode_id` int(11) NOT NULL,
`country` char(3) NOT NULL,
`title` varchar(255) NOT NULL,
`date` date NOT NULL,
`tba` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `episode_id` (`episode_id`),
KEY `date` (`date`),
KEY `country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `episode_data` (`id`, `episode_id`, `country`, `title`, `date`, `tba`) VALUES
(4942, 2471, 'de', 'Väter und Töchter', '2013-08-06', 0),
(4944, 2472, 'de', 'Neue Perspektiven', '2013-08-13', 0),
(5013, 2507, 'us', 'Into the Deep', '2013-08-06', 0),
(5015, 2508, 'us', 'The Mirror Has Three Faces', '2013-08-13', 0);
Attention!
This is the original table data, with "EPISODE_ID", not "SERIES_ID".
The rows I want are those with the dates closest to today, which here are ids 4944 and 5015.
If you want the last aired date for each country, then use this aggregation:
SELECT country, max(date) as lastdate
FROM `episode_data` ed
WHERE `ed`.`date` < CURDATE( ) AND
`ed`.`date` != '1970-01-01' AND
`ed`.`series_id` = 1
GROUP BY `ed`.`country`;
If you are trying to get the episode_id and title as well, you can use group_concat() and substring_index():
SELECT country, max(date) as lastdate,
substring_index(group_concat(episode_id order by date desc), ',', 1
) as episode_id,
substring_index(group_concat(title order by date desc separator '|'), '|', 1
) as title
FROM `episode_data` ed
WHERE `ed`.`date` < CURDATE( ) AND
`ed`.`date` != '1970-01-01' AND
`ed`.`series_id` = 1
GROUP BY `ed`.`country`;
Note that this uses a different separator for the title, under the assumption that it might have a comma.
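If you need the full rows rather than reconstructed columns, another option is to join back on the per-country maximum date (a sketch; if two episodes share the same date for a country, both rows will be returned):
SELECT ed.*
FROM `episode_data` ed
INNER JOIN
(
    SELECT country, MAX(`date`) AS lastdate
    FROM `episode_data`
    WHERE `date` < CURDATE()
      AND `date` != '1970-01-01'
      AND `series_id` = 1
    GROUP BY country
) latest
    ON latest.country = ed.country
   AND latest.lastdate = ed.`date`
WHERE ed.`series_id` = 1;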