MySQL subquery count with calendar table slow - mysql

I have a sales table in MySQL (InnoDB) with roughly 1 million records, and I would like to show some nice charts. Fetching the right data is not a problem. Fetching it fast is...
I want to count the number of sales grouped per day (later on also per month and per year) for a given period. Concretely: for the last 30 days, I want to know how many sales records we have in the DB for each day.
So MySQL should return the data like this:
date, count
2017-04-01, 2482
2017-04-02, 1934
2017-04-03, 2701
...
The structure of the sales table is basically like this:
CREATE TABLE `sales` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `contacts_created_at_index` (`created_at`),
KEY `contacts_deleted_at_index` (`deleted_at`),
KEY `ind_created_at_deleted_at` (`created_at`,`deleted_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Some days (datapoints) might not have any results, but I don't want gaps in the data, so I also have a 'calendar' table:
CREATE TABLE `time_dimension` (
`id` int(11) NOT NULL,
`db_date` date NOT NULL,
`year` int(11) NOT NULL,
`month` int(11) NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `td_ymd_idx` (`year`,`month`,`day`),
UNIQUE KEY `td_dbdate_idx` (`db_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Fetching 30 rows (30 days) with a count per day takes 30 secs...
This is the first query I tried:
SELECT
`db_date` AS `date`,
(SELECT
COUNT(1)
FROM
sales
WHERE
DATE(created_at) = db_date) AS count
FROM
`time_dimension`
WHERE
`db_date` >= '2017-04-11'
AND `db_date` <= '2017-04-25'
ORDER BY `db_date` ASC
But like I said, it's really slow (11.9 secs). I tried all kinds of other approaches, but without luck. For example:
SELECT time_dimension.db_date AS DATE,
COUNT(1) AS count
FROM sales RIGHT JOIN time_dimension ON (DATE(sales.created_at) =
time_dimension.db_date)
WHERE
(time_dimension.db_date BETWEEN '2017-03-11' AND '2017-04-11')
GROUP BY
DATE
A query for just 1 datapoint takes only 5.4ms:
SELECT COUNT(1) FROM sales WHERE created_at BETWEEN '2017-04-11 00:00:00' AND '2017-04-25 23:59:59'
I haven't checked innodb_buffer_pool_size on my local machine; I will check that as well. Any ideas on how to make queries like this fast? In the future I will also need to add WHERE clauses and joins to filter the set of sales records.
Thanks.
Nick

You could try counting the sales per day first, then joining those counts to your calendar table.
SELECT time_dimension.db_date AS date,
COALESCE(by_date.sale_count, 0) AS sale_count
FROM time_dimension
LEFT JOIN (SELECT DATE(sales.created_at) sale_date,
COUNT(1) AS sale_count
FROM sales
WHERE created_at BETWEEN '2017-03-11 00:00:00' AND
'2017-04-11 23:59:59'
GROUP BY DATE(sales.created_at)) by_date
ON time_dimension.db_date = by_date.sale_date
WHERE time_dimension.db_date BETWEEN '2017-03-11' AND '2017-04-11'

The problematic part of your query is the conversion DATE(created_at), which effectively prevents MySQL from using the index on created_at.
Your 1-datapoint query avoids that, which is why it runs fast.
To fix this, check whether created_at falls within the range of a specific day, like this:
created_at BETWEEN db_date AND DATE_ADD(db_date,INTERVAL 1 DAY)
This way MySQL will be able to use the index on it (do a range lookup), as appropriate.
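As a rough sketch of how that advice could be applied to the join attempt from the question: the query below joins on a day range instead of DATE(sales.created_at). The aliases td and s are only illustrative. Two choices here are assumptions rather than part of the answer above: the upper bound is written as a strict < so that midnight of the next day is not counted twice, and COUNT(s.id) is used so that days without sales come out as 0.

```sql
-- Sketch only: join each calendar day to the sales created within that day.
SELECT td.db_date AS `date`,
       COUNT(s.id) AS `count`          -- counts only matched sales rows, so empty days give 0
FROM time_dimension AS td
LEFT JOIN sales AS s
       ON s.created_at >= td.db_date
      AND s.created_at <  td.db_date + INTERVAL 1 DAY
WHERE td.db_date BETWEEN '2017-03-11' AND '2017-04-11'
GROUP BY td.db_date
ORDER BY td.db_date;
```

Running EXPLAIN on it should show whether the optimizer actually picks a range lookup on one of the created_at indexes; that can vary by MySQL version.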

WHERE DATE(created_at) = db_date)
-->
WHERE created_at >= db_date
AND created_at < db_date + INTERVAL 1 DAY
This avoids including midnight of the second day (as BETWEEN does).
It works for all flavors: DATE, DATETIME, DATETIME(6).
It does not hide created_at inside a function where the index cannot see it.
For time_dimension, get rid of PRIMARY KEY (id) and change UNIQUE(db_date) to the PK.
After making these changes, your original subquery may be competitive with the LEFT JOIN ( SELECT ... ). (It depends on which version of MySQL.)
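Put together, the two suggestions might look roughly like the following. This is a sketch, not a tested migration: it assumes nothing else depends on time_dimension.id being the primary key (the column itself is kept), and the aliases td and s are illustrative.

```sql
-- Make db_date the primary key of the calendar table (the id column stays).
ALTER TABLE time_dimension
    DROP PRIMARY KEY,
    DROP INDEX td_dbdate_idx,
    ADD PRIMARY KEY (db_date);

-- Original correlated subquery, rewritten with a half-open range on created_at.
SELECT td.db_date AS `date`,
       (SELECT COUNT(*)
          FROM sales AS s
         WHERE s.created_at >= td.db_date
           AND s.created_at <  td.db_date + INTERVAL 1 DAY) AS `count`
FROM time_dimension AS td
WHERE td.db_date >= '2017-04-11'
  AND td.db_date <= '2017-04-25'
ORDER BY td.db_date ASC;
```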

Related

How I can optimize query with where index?

I have this query:
select `price`, `asset_id`
from `history_average_pairs`
where `currency_id` = 1
and date(`created_at`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
group by hour(created_at), date(created_at), asset_id
order by `created_at` asc
And this table:
CREATE TABLE IF NOT EXISTS history_average_pairs (
id bigint(20) unsigned NOT NULL,
asset_id bigint(20) unsigned NOT NULL,
currency_id bigint(20) unsigned NOT NULL,
market_cap bigint(20) NOT NULL,
price double(20,6) NOT NULL,
volume bigint(20) NOT NULL,
circulating bigint(20) NOT NULL,
change_1h double(8,2) NOT NULL,
change_24h double(8,2) NOT NULL,
change_7d double(8,2) NOT NULL,
created_at timestamp NOT NULL DEFAULT current_timestamp(),
updated_at timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
total_supply bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE history_average_pairs
ADD PRIMARY KEY (id),
ADD KEY history_average_pairs_currency_id_asset_id_foreign (currency_id,asset_id);
ALTER TABLE history_average_pairs
MODIFY id bigint(20) unsigned NOT NULL AUTO_INCREMENT;
It contains more than 10 000 000 rows, and the query takes:
Showing rows 0 - 24 (32584 total, Query took 27.8344 seconds.)
But without currency_id = 1, it takes about 4 sec.
UPDATE 1
Okay, I changed the key from (currency_id, asset_id) to (currency_id, asset_id, created_at), and now it takes:
Showing rows 0 - 24 (32784 total, Query took 6.4831 seconds.)
It's much faster. Any suggestions for making it faster still?
The GROUP BY is there to take only the first row for every hour.
For example:
19:01:10
19:02:14
19:23:15
I need only 19:01:10
You can rephrase the filtering predicate to avoid using expressions on columns. For example:
select max(`price`) as max_price, `asset_id`
from `history_average_pairs`
where `currency_id` = 1
and created_at >= date_add(curdate(), interval - 7 day)
group by hour(created_at), date(created_at), asset_id
order by `created_at` asc
Then, this query could be much faster if you added the index:
create index ix1 on `history_average_pairs` (`currency_id`, created_at);
You must make the test "sargable"; change
date(`created_at`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
to
created_at >= CURDATE() - INTERVAL 7 DAY
Then the optimal index is
INDEX(currency_id, -- 1st because of "=" test
created_at, -- 2nd to finish out WHERE
asset_id) -- only for "covering"
When designing an index, it is usually best to handle the WHERE first.
The GROUP BY cannot use the index. Did you really want the hour first?
"I need only 19:01:10" is unclear, so I have not factored that in. Where's the date? Where's the asset_id? See "only_full_group_by". Do you need "groupwise max"?
Making the ORDER BY have the same columns as the GROUP BY avoids a sort. (In your query, the order may be slightly different, but it probably does not matter.)
Datatype issues...
BIGINT takes 8 bytes; INT takes only 4 bytes and is usually big enough. Shrinking the table provides some speed.
double(8,2) takes 8 bytes -- Don't use (m,n) on FLOAT or DOUBLE; it adds an extra rounding. Perhaps you meant DECIMAL(8,2), which takes 4 bytes.
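If "first row for every hour" means the earliest row per asset within each hour, one possible groupwise sketch is shown below. That reading is an assumption, since the question leaves it unclear. The index name ix_currency_created_asset and the aliases h and f are made up for illustration. If two rows for the same asset share the same created_at, both will be returned.

```sql
-- Index in the column order suggested above.
ALTER TABLE history_average_pairs
    ADD INDEX ix_currency_created_asset (currency_id, created_at, asset_id);

-- Earliest row per asset per hour over the last 7 days (assumed interpretation).
SELECT h.asset_id, h.created_at, h.price
FROM history_average_pairs AS h
JOIN (
    SELECT asset_id, MIN(created_at) AS first_at
    FROM history_average_pairs
    WHERE currency_id = 1
      AND created_at >= CURDATE() - INTERVAL 7 DAY
    GROUP BY asset_id, DATE(created_at), HOUR(created_at)
) AS f
  ON f.asset_id = h.asset_id
 AND f.first_at = h.created_at
WHERE h.currency_id = 1
ORDER BY h.created_at, h.asset_id;
```

Because every selected column is either grouped or aggregated in the derived table, this form should also run with only_full_group_by enabled.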

Why does SQL query visit all rows and is very slow

The table 'reading' contains readings taken every 40s. The query should return averages over 180s periods for today. 'time_stamp' is indexed. The query below returns a reasonable number of rows (a few hundred) but visits ALL rows and gets slower as the table grows. The WHERE clause does not seem to restrict it to today's rows only.
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE DATE(time_stamp) = CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180)
Table schema:
CREATE TABLE reading (
id bigint(20) NOT NULL AUTO_INCREMENT,
time_stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
temp_c float NOT NULL,
pressure_hpa float NOT NULL,
wind_speed_kt int(11) NOT NULL,
wind_dir_degree int(11) NOT NULL,
rain_mm float NOT NULL,
rain_day_mm float NOT NULL,
wind_gust_kt int(11) NOT NULL,
humidity float DEFAULT NULL,
PRIMARY KEY (id),
KEY time_stamp (time_stamp),
KEY time_stamp_idx (time_stamp)
) ENGINE=InnoDB AUTO_INCREMENT=1747097 DEFAULT CHARSET=latin1;
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE DATE(time_stamp) = CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180)
When the above query is executed, the MySQL optimizer does not choose an index scan (possibly because of cost); a full table scan is performed instead, and the cause appears to be WHERE DATE(time_stamp) = CURDATE().
After changing the WHERE clause to time_stamp >= CURDATE(), I saw the index being used and fewer rows being examined, avoiding the full scan.
Hence, your final query will be:
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE time_stamp >= CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180);
I suspect DATE(time_stamp) cannot use the index efficiently. A similar topic was discussed here (see ypercube's answer).
The above query could be improved further by finding an alternative to round(UNIX_TIMESTAMP(time_stamp) / 180), since UNIX_TIMESTAMP(time_stamp) doesn't use the index, but I have not pursued that.
Hope this helps!
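Note that time_stamp >= CURDATE() would also match any rows dated after today, if such rows exist. A closer equivalent of DATE(time_stamp) = CURDATE() is a half-open range covering just today, sketched below. Two things here are additions for illustration, not part of the answer above: the column aliases, and wrapping time_stamp in MIN() so the select list is valid under only_full_group_by.

```sql
-- Sketch: restrict to today's rows with a range the time_stamp index can use.
SELECT DATE_FORMAT(MIN(time_stamp), '%Y-%m-%dT%T+00:00') AS first_reading,
       AVG(temp_c) AS avg_temp_c
FROM reading
WHERE time_stamp >= CURDATE()
  AND time_stamp <  CURDATE() + INTERVAL 1 DAY
GROUP BY ROUND(UNIX_TIMESTAMP(time_stamp) / 180);
```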

Speed up this MySQL query

I have a query which is getting slower and slower because there are more and more records in my table. So I'm trying to speed things up.
Database size:
Records: 1,200,000
Data 22,9 MiB
Index 46,8 MiB
Total 69,7 MiB
The purpose of the query is counting the number of records that exist that match the conditions. The conditions are a date (current date) and a status number. See query below:
SELECT
COUNT(id) AS total
FROM
order_process
WHERE
DATE(datetime) = CURDATE() AND
status = '7';
At the moment, this query is taking 800ms. And I need to run this query multiple times with different dates. These are all in the same script so script execution is going over the 3 seconds at the moment. How can I speed this up?
What have I already done:
Created indexes (Index on status and datetime both don't speed up the query).
Tested InnoDB engine (which is slower, mostly reading on this table)
For completeness, the current table setup is below.
CREATE TABLE IF NOT EXISTS `order_process` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`remark` text NOT NULL,
PRIMARY KEY (`id`),
KEY `orderid` (`order_id`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When you apply the DATE() function to a timestamp/datetime column, MySQL cannot use an index on that column, even if one exists.
So you need to write the condition as:
where
datetime >= concat(CURDATE(),' 00:00:00')
and datetime <= concat(CURDATE(),' 23:59:59')
and status = '7'
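A variant of the same idea, as a sketch: a half-open range avoids building the bounds with concat(), and a composite index lets both conditions be resolved from one index. The composite index is an assumption that goes beyond the answer above, and its name idx_status_datetime is made up.

```sql
-- Hypothetical composite index: equality column first, range column second.
ALTER TABLE order_process
    ADD INDEX idx_status_datetime (`status`, `datetime`);

SELECT COUNT(*) AS total
FROM order_process
WHERE `status` = 7
  AND `datetime` >= CURDATE()
  AND `datetime` <  CURDATE() + INTERVAL 1 DAY;
```

For other dates, the two CURDATE() expressions can be replaced by a date literal and that date plus one day.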

MySQL Select next 2 rows greater than time

I have a database that has stored time values for a train schedule. This is my table:
CREATE TABLE IF NOT EXISTS `bahn_hausen` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`time` time NOT NULL,
`day` varchar(12) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=132 ;
Now I want to select the next two rows after now():
SELECT time FROM bahn_hausen WHERE time > now() LIMIT 2
The problem is that when now() is later than the last time of the day (23:45:00), no row is selected. However, I want to select the next 2 values, of course (00:15:00 and 00:45:00). This only works correctly when now() is >= 0:00:00
*[edit]*For clarification: The problem I am having is that SQL doesn't recognize 00:15 to be greater than 23:45.
How do I do this?
Thanks for any help.
Your query is almost there. You just need an order by:
SELECT time
FROM bahn_hausen
ORDER BY time > now() desc, time
LIMIT 2;
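To spell out why this works: the comparison evaluates to 1 for departures later than the current time and 0 otherwise, so sorting it in descending order puts the remaining departures of the day first, and the ascending sort on time then wraps around to the earliest ones. The sketch below is the same query using CURTIME(), so that a TIME value is compared with a TIME column. It ignores the day column, which is an assumption about how the schedule is used.

```sql
-- Sketch: next two departures, wrapping around midnight.
SELECT `time`
FROM bahn_hausen
ORDER BY (`time` > CURTIME()) DESC,  -- 1 = still to come today, 0 = already passed
         `time`                      -- earliest first within each group
LIMIT 2;
```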
Have you tried using CURTIME() or DATEDIFF(...) > 0?
https://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_current-time

Counting year and month entries from datetime fields

I have a problem constructing a MySQL query:
I have this table "tSubscriber" where I store the subscribers for my newsletter mailing list.
The table looks like this (simplified):
--
-- Table structure for tSubscriber
--
CREATE TABLE tSubscriber (
fId INT UNSIGNED NOT NULL AUTO_INCREMENT,
fSubscriberGroupId INT UNSIGNED NOT NULL,
fEmail VARCHAR(255) NOT NULL DEFAULT '',
fDateConfirmed DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
fDateUnsubscribed TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (fId),
INDEX (fSubscriberGroupId)
) ENGINE=MyISAM;
Now what I want to accomplish is to have a diagram showing the subscriptions and unsubscriptions per month per subscriber group.
So I need to extract the year and months from the fDateConfirmed, fDateUnsubscribed dates, count them and show the count sorted by month and year for a subscriber group.
I think this SQL query gets quite complex and I just can't get my head around it. Is this even possible with one query?
You will need two separate queries, one for subscriptions and the other for unsubscriptions.
SELECT COUNT(*), YEAR(fDateConfirmed), MONTH(fDateConfirmed) FROM tSubscriber GROUP BY YEAR(fDateConfirmed), MONTH(fDateConfirmed)
SELECT COUNT(*), YEAR(fDateUnsubscribed), MONTH(fDateUnsubscribed) FROM tSubscriber GROUP BY YEAR(fDateUnsubscribed), MONTH(fDateUnsubscribed)
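If a single result set per subscriber group is preferred, the two queries can be stacked with UNION ALL and summed, roughly as sketched below. The alias names (events, yr, mon, subscribed, unsubscribed) are illustrative. The YEAR(...) > 0 filters assume that the '0000-00-00 00:00:00' default means "not confirmed" or "not unsubscribed".

```sql
-- Sketch: monthly subscriptions and unsubscriptions per subscriber group.
SELECT fSubscriberGroupId, yr, mon,
       SUM(subscribed)   AS subscriptions,
       SUM(unsubscribed) AS unsubscriptions
FROM (
    SELECT fSubscriberGroupId,
           YEAR(fDateConfirmed)  AS yr,
           MONTH(fDateConfirmed) AS mon,
           1 AS subscribed,
           0 AS unsubscribed
    FROM tSubscriber
    WHERE YEAR(fDateConfirmed) > 0        -- skip zero dates (assumed to mean "not confirmed")
    UNION ALL
    SELECT fSubscriberGroupId,
           YEAR(fDateUnsubscribed),
           MONTH(fDateUnsubscribed),
           0,
           1
    FROM tSubscriber
    WHERE YEAR(fDateUnsubscribed) > 0     -- skip zero dates (assumed to mean "still subscribed")
) AS events
GROUP BY fSubscriberGroupId, yr, mon
ORDER BY fSubscriberGroupId, yr, mon;
```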