Related
I'm trying to optimize my query speed as much as possible. A side problem is that I cannot see the exact query speed, because it is rounded to a whole second. The query does get the expected result and takes about 1 second. The final query should be extended even more and for this reason i am trying to improve it. How can this query be improved?
The database is constructed as an electricity utility company. The query should eventually calculate an invoice. I basically have 4 tables, APX price, powerdeals, powerload, eans_power.
APX price is an hourly price, powerload is a quarterly hour volume. First step is joining these two together for each quarter of an hour.
Second step is that I currently select the EAN that is indicated in the table eans_power.
Finally I will join the Powerdeals that currently consist only of a single line and indicates from which hour, until which hour and weekday from/until it should be applicable. It consist of an hourly volume and price. Currently it is only joined on the hours, but it will be extended to weekdays as well.
MYSQL Query:
SELECT l.DATE, l.PERIOD_FROM, a.PRICE, l.POWERLOAD,
SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice a ON l.DATE = a.DATE
INNER JOIN contracts.eans_power c ON l.ean = c.ean
LEFT OUTER JOIN timeseries.powerdeals d ON d.period_from <= l.period_from
AND d.period_until >= l.period_until
WHERE l.PERIOD_FROM >= a.PERIOD_FROM
AND l.PERIOD_FROM < a.PERIOD_UNTIL
AND l.DATE >= '2018-01-01'
AND l.DATE <= '2018-12-31'
GROUP BY l.date
Explain:
1 SIMPLE c NULL system PRIMARY,ean NULL NULL NULL 1 100.00 Using temporary; Using filesort
1 SIMPLE l NULL ref EAN EAN 21 const 35481 11.11 Using index condition
1 SIMPLE d NULL ALL NULL NULL NULL NULL 1 100.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE a NULL ref DATE DATE 4 timeseries.l.date 24 11.11 Using index condition
Create table queries:
apxprice
CREATE TABLE `apxprice` (
`apx_id` int(11) NOT NULL AUTO_INCREMENT,
`date` date DEFAULT NULL,
`period_from` time DEFAULT NULL,
`period_until` time DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`apx_id`),
KEY `DATE` (`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=29664 DEFAULT CHARSET=latin1
powerdeals
CREATE TABLE `powerdeals` (
`deal_id` int(11) NOT NULL AUTO_INCREMENT,
`date_deal` date NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`weekday_from` int(11) NOT NULL,
`weekday_until` int(11) NOT NULL,
`period_from` time NOT NULL,
`period_until` time NOT NULL,
`hourly_volume` int(11) NOT NULL,
`price` int(11) NOT NULL,
`type_deal_id` int(11) NOT NULL,
`contract_id` int(11) NOT NULL,
PRIMARY KEY (`deal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
powerload
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`ean` varchar(18) DEFAULT NULL,
`date` date DEFAULT NULL,
`period_from` time DEFAULT NULL,
`period_until` time DEFAULT NULL,
`powerload` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`),
KEY `EAN` (`ean`,`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
eans_power
CREATE TABLE `eans_power` (
`ean` char(19) NOT NULL,
`contract_id` int(11) NOT NULL,
`invoicing_id` int(11) NOT NULL,
`street` varchar(255) NOT NULL,
`number` int(11) NOT NULL,
`affix` char(11) NOT NULL,
`postal` char(6) NOT NULL,
`city` varchar(255) NOT NULL,
PRIMARY KEY (`ean`),
KEY `ean` (`ean`,`contract_id`,`invoicing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Sample data tables
apx_prices
apx_id,date,period_from,period_until,price
1,2016-01-01,00:00:00,01:00:00,23.86
2,2016-01-01,01:00:00,02:00:00,22.39
powerdeals
deal_id,date_deal,start_date,end_date,weekday_from,weekday_until,period_from,period_until,hourly_volume,price,type_deal_id,contract_id
1,2019-05-15,2018-01-01,2018-12-31,1,5,08:00:00,20:00:00,1000,50,3,1
powerload
powerload_id,ean,date,period_from,period_until,powerload
1,871688520000xxxxxx,2018-01-01,00:00:00,00:15:00,9
2,871688520000xxxxxx,2018-01-01,00:15:00,00:30:00,11
eans_power
ean,contract_id,invoicing_id,street,number,affix,postal,city
871688520000xxxxxx,1,1,road,14,postal,city
Result, without sum() and group by:
DATE,PERIOD_FROM,PRICE,POWERLOAD,a.PRICE*l.POWERLOAD,d.hourly_volume/4,
2018-01-01,00:00:00,27.20,9,244.80,NULL
2018-01-01,00:15:00,27.20,11,299.20,NULL
Result, with sum() and group by:
DATE, PERIOD_FROM, PRICE, POWERLOAD, SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
2018-01-01,08:00:00,26.33,21,46193.84,12250.0000
2018-01-02, 08:00:00,47.95,43,90623.98,12250.0000
Preliminary optimizations:
Use InnoDB, not MyISAM.
Use CHAR only for constant-lenght strings
Use consistent datatypes (see ean, for example)
For an alternative to using time-to-the-second, check out the Handler counts .
Because range tests (such as l.PERIOD_FROM >= a.PERIOD_FROM AND l.PERIOD_FROM < a.PERIOD_UNTIL) are essentially impossible to optimize, I recommend you expand the table to have one entry per hour (or 1 per quarter hour, if necessary). Looking up a row via a key is much faster than doing a scan of "ALL" the table. 9K rows for an entire year is trivial.
When you get past these recommendations (and the Comments), I will have more tips on optimizing the indexes, especially InnoDB's PRIMARY KEY.
How can I proceed to make my response time more faster, approximately the average time of response is 0.2s ( 8039 records in my items table & 81 records in my tracking table )
Query
SELECT a.name, b.cnt FROM `items` a LEFT JOIN
(SELECT guid, COUNT(*) cnt FROM tracking WHERE
date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day ) GROUP BY guid) b ON
a.`id` = b.guid WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1;
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here.
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. But, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81, it won't be a full 15.)
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.
I have three tables.
One table contains submissions which has about 75,000 rows
One table contains submission ratings and only has < 10 rows
One table contains submission => competition mappings and for my test data also has about 75,000 rows.
What I want to do is
Get the top 50 submissions in a round of a competition.
Top is classified as highest average rating, followed by highest amount of votes
Here is the query I am using which works, but the problem is that it takes over 45 seconds to complete! I profiled the query (results at bottom) and the bottlenecks are copying the data to a tmp table and then sorting it so how can I speed this up?
SELECT `submission_submissions`.*
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
submission_submissions
CREATE TABLE `submission_submissions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`genre` int(11) NOT NULL,
`goals` text,
`submission` text NOT NULL,
`date_created` datetime DEFAULT NULL,
`date_modified` datetime DEFAULT NULL,
`date_deleted` datetime DEFAULT NULL,
`cover_image` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `genre` (`genre`),
KEY `account_id` (`account_id`),
KEY `date_created` (`date_created`)
) ENGINE=InnoDB AUTO_INCREMENT=115037 DEFAULT CHARSET=latin1;
submission_ratings
CREATE TABLE `submission_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`stars` tinyint(1) NOT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `submission_id` (`submission_id`),
KEY `account_id` (`account_id`),
KEY `stars` (`stars`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
competition_submissions
CREATE TABLE `competition_submissions` (
`competition_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`top_round` int(11) DEFAULT '1',
PRIMARY KEY (`submission_id`),
KEY `competition_id` (`competition_id`),
KEY `top_round` (`top_round`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SHOW PROFILE Result (ordered by duration)
state duration (summed) in sec percentage
Copying to tmp table 33.15621 68.46924
Sorting result 11.83148 24.43260
removing tmp table 3.06054 6.32017
Sending data 0.37560 0.77563
... insignificant amounts removed ...
Total 48.42497 100.00000
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE competition_submissions index_merge PRIMARY,competition_id,top_round competition_id,top_round 4,5 18596 Using intersect(competition_id,top_round); Using where; Using index; Using temporary; Using filesort
1 SIMPLE submission_submissions eq_ref PRIMARY PRIMARY 4 inkstakes.competition_submissions.submission_id 1 Using where
1 SIMPLE submission_ratings ALL submission_id 5 Using where; Using join buffer (flat, BNL join)
Assuming that in reality you won't be interested in unrated submissions, and that a given submission only has a single competition_submissions entry for a given match and top_round, I suggest:
SELECT s.*
FROM (SELECT `submission_id`,
AVG(`stars`) AvgStars,
COUNT(`id`) CountId
FROM `submission_ratings`
GROUP BY `submission_id`
ORDER BY AVG(`stars`) DESC, COUNT(`id`) DESC
LIMIT 50) r
JOIN `submission_submissions` s
ON r.`submission_id` = s.`id` AND
s.`date_deleted` IS NULL
JOIN `competition_submissions` c
ON c.`submission_id` = s.`id` AND
c.`top_round` = 1 AND
c.`competition_id` = '2'
ORDER BY r.AvgStars DESC,
r.CountId DESC
(If there is more than one competition_submissions entry per submission for a given match and top_round, then you can add the GROUP BY clause back in to the main query.)
If you do want to see unrated submissions, you can union the results of this query to a LEFT JOIN ... WHERE NULL query.
There is a simple trick that works on MySql and helps to avoid copying/sorting huge temp tables in queries like this (with LIMIT X).
Just avoid SELECT *, this copies all columns to the temporary table, then this huge table is sorted, and in the end, the query takes only 50 records from this huge table ( 50 / 70000 = 0,07 % ).
Select only columns that are really necessary to perform sort and limit, and then join missing columns only for selected 50 records by id.
select ss.*
from submission_submissions ss
join (
SELECT `submission_submissions`.id,
AVG(submission_ratings.`stars`) stars,
COUNT(submission_ratings.`id`) cnt
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
) xx
ON ss.id = xx.id
ORDER BY xx.stars DESC,
xx.cnt DESC;
Older questions seen
Counting one table of records for matching records of another table
MySQL Count matching records from multiple tables
Count records from two tables grouped by one field
Table(s) Schema
Table entries having data from 2005-01-25
CREATE TABLE `entries` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`ctg` VARCHAR(15) NOT NULL,
`msg` VARCHAR(200) NOT NULL,
`nick` VARCHAR(30) NOT NULL,
`date` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `msg` (`msg`),
INDEX `date` (`date`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Child table magnets with regular data from 2011-11-08(There might be a few entries from before that)
CREATE TABLE `magnets` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`eid` INT(10) UNSIGNED NOT NULL,
`tth` CHAR(39) NOT NULL,
`size` BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
`nick` VARCHAR(30) NOT NULL DEFAULT 'hjpotter92',
`date` DATETIME NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `eid_tth` (`eid`, `tth`),
INDEX `entriedID` (`eid`),
INDEX `tth_size` (`tth`, `size`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Question
I want to get the count of total number of entries by any particular nick(or user) entered in either of the table.
One of the entry in entries is populated at the same time as magnets and the subsequent entries of magnets can be from the same nick or different.
My Code
Try 1
SELECT `e`.id, COUNT(1), `e`.nick, `m`.nick
FROM `entries` `e`
INNER JOIN `magnets` `m`
ON `m`.`eid` = `e`.id
GROUP BY `e`.nick
Try 2
SELECT `e`.id, COUNT(1), `e`.nick
FROM `entries` `e`
GROUP BY `e`.nick
UNION ALL
SELECT `m`.eid, COUNT(1), `m`.nick
FROM `magnets` `m`
GROUP BY `m`.nick
The second try is generating some relevant outputs, but it contains double entries for all the nick which appear in both tables.
Also, I don't want to count twice, those entries/magnets which were inserted in the first query. Which is what the second UNION statement is doing. It takes in all the values from both tables.
SQL Fiddle link
Here is the link to a SQL Fiddle along with randomly populated entries.
I really hope someone can guide me through this. If it's any help, I will be using PHP for final display of data. So, my last resort would be to nest loops in PHP for the counting(which I am currently doing).
Desired output
The output that should be generated on the fiddle should be:
************************************************
** Nick ||| Count **
************************************************
** Nick1 ||| 10 **
** Nick2 ||| 9 **
** Nick3 ||| 6 **
** Nick4 ||| 10 **
************************************************
There might be a more efficient way but this works if I understand correctly:
SELECT SUM(cnt), nick FROM
(SELECT count(*) cnt, e.nick FROM entries e
LEFT JOIN magnets m ON (e.id=m.eid AND e.nick=m.nick)
WHERE eid IS NULL GROUP BY e.nick
UNION ALL
SELECT count(*) cnt, nick FROM magnets m GROUP BY nick) u
GROUP BY nick
In the same datbase I have a table messages whos columns: id, title, text I want. I want only the records of which title has no entries in the table lastlogon who's title equivalent is then named username.
I have been using this SQL command in PHP, it generally took 2-3 seconds to pull up:
SELECT DISTINCT * FROM messages WHERE title NOT IN (SELECT username FROM lastlogon) LIMIT 1000
This was all good until the table lastlogon started to have about 80% of the values table messages. Messages has about 8000 entries, lastlogon about 7000. Now it takes about a minute to 2 minutes for it to go through. MySQL shoots up to very high CPU usage.
I tried the following but had no luck reducing the time:
SELECT id,title,text FROM messages a LEFT OUTER JOIN lastlogon b ON (a.title = b.username) LIMIT 1000
Why all of a sudden is it taking so long for such low amount of entries? I tried restarting mysql and apache multiple times. I am using debian linux.
Edit: Here are the structures
--
-- Table structure for table `lastlogon`
--
CREATE TABLE IF NOT EXISTS `lastlogon` (
`username` varchar(25) NOT NULL,
`lastlogon` date NOT NULL,
`datechecked` date NOT NULL,
PRIMARY KEY (`username`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE IF NOT EXISTS `messages` (
`id` smallint(9) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`email` varchar(50) NOT NULL,
`text` mediumtext,
`folder` tinyint(2) NOT NULL,
`read` smallint(5) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`ip` varchar(15) NOT NULL,
`attachment` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`username` varchar(300) NOT NULL,
`error` varchar(500) NOT NULL,
PRIMARY KEY (`id`),
KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=9010 ;
Edit 2
Edited structure with new indexes.
After putting an index on both messages.title and lastlogon.username I came up with these results:
Showing rows 0 - 29 (623 total, Query took 74.4938 sec)
First: replace the key on title, with a compound key on title + id
ALTER TABLE messages DROP INDEX title;
ALTER TABLE messages ADD INDEX title (title, id);
Now change the select to:
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
LIMIT 1000;
Or
SELECT m.* FROM messages m
WHERE m.title NOT IN (SELECT l.username FROM lastlogon l)
-- GROUP BY m.id DESC -- faster than distinct, I don't think you need it though.
LIMIT 1000;
Another problem with the slowness is the SELECT m.* part.
By selecting all column, you are forcing MySQL to do extra work.
Only select the columns you need:
SELECT m.title, m.name, m.email, ......
This will speed up the query as well.
There's another trick you can use:
Replace the limit 1000 with a cutoff date.
Step 1: Add an index on timestamp (or whatever field you want to use for the cutoff).
SELECT m.* FROM messages m
LEFT JOIN lastlogon l ON (l.username = m.title)
WHERE (m.id > (SELECT MIN(M2.ID) FROM messages m2 WHERE m2.timestamp >= '2011-09-01'))
AND l.username IS NULL
-- GROUP BY m.id DESC -- faster replacement for distinct. I don't think you need this.
I suggest you to add an index on messages.title . Then try to run again the query and test the performance.