Given a large table, which of the queries below would be faster?
Both queries will select rows where time is greater than the current time.
Calling NOW() within the WHERE clause:
SELECT *
FROM myTable
WHERE time > NOW()
Wrapping the call to NOW() in a sub query:
SELECT *
FROM myTable
LEFT JOIN (
SELECT NOW() AS currentTime
) AS currentTimeTable ON TRUE
WHERE time > currentTime
"Both queries will select rows where time is no older than 10 days."
Sorry, but your query is incorrect. I tested it in MySQL and got this:
mysql> SELECT DATE(NOW() - 10);
+------------------+
| DATE(NOW() - 10) |
+------------------+
| 2013-11-21 |
+------------------+
2013-11-21 is the current date, not NOW() minus 10 days.
You should use the DATE_SUB function:
mysql> SELECT DATE_SUB( NOW(), INTERVAL 10 DAY );
+-------------------------------------+
| DATE_SUB( NOW() , INTERVAL 10 DAY ) |
+-------------------------------------+
| 2013-11-11 19:40:38 |
+-------------------------------------+
Something like this:
SELECT *
FROM `myTable`
WHERE `time` > DATE_SUB(NOW(), INTERVAL 10 DAY );
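Why DATE(NOW() - 10) returns today's date can be sketched in Python. In numeric context MySQL treats NOW() as a number of the form YYYYMMDDhhmmss, so subtracting 10 only changes the seconds digits. The snippet below is a rough emulation, not MySQL itself; the timestamp is an assumption chosen to line up with the DATE_SUB output above.

```python
from datetime import datetime, timedelta

# Assumed "current" time, chosen so that minus 10 days matches the DATE_SUB output above.
now = datetime(2013, 11, 21, 19, 40, 38)

# In numeric context MySQL treats NOW() as the number YYYYMMDDhhmmss.
numeric_now = int(now.strftime("%Y%m%d%H%M%S"))
shifted = numeric_now - 10  # only the trailing seconds digits change

# Rough emulation of DATE(): keep the leading YYYYMMDD digits of that number.
digits = str(shifted)[:8]
wrong_result = f"{digits[:4]}-{digits[4:6]}-{digits[6:]}"

# What DATE_SUB(NOW(), INTERVAL 10 DAY) computes instead: real date arithmetic.
right_result = now - timedelta(days=10)

print(wrong_result)
print(right_result)
```

The first line printed is still the current date; only the second is ten days earlier.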
Here is the analysis of both of your query types:
EXPLAIN
SELECT
*
FROM
users
WHERE
created > now();
Test data
I used the users table from my Drupal installation.
mysql> EXPLAIN SELECT *
FROM users
WHERE created < NOW();
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | users | range | created | created | 4 | NULL | 1 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN
SELECT
*
FROM
users
LEFT JOIN (
SELECT NOW() AS currentTime
) AS tbl ON TRUE
WHERE created < tbl.currentTime;
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------------+
| 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | users | range | created | created | 4 | NULL | 1 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------------+
3 rows in set (0.02 sec)
Conclusion
Obviously, the first query is better, as it doesn't require creating a derived (temporary) table. Even with my small sample data, the first statement returned in 0.00 sec and the second in 0.02 sec.
1) When you talk about date optimisation, using timestamps is the way to go.
2) I suggest that, instead of calling a function to do the math, you create another column that holds each row's timestamp.
3) As PeeHaa said on this thread: « When talking about which piece of code is faster (in your OP) you're talking about micro-optimization and is really something you shouldn't have to worry about. ☺
The real question is which piece of code is: better maintainable, readable, understandable. »
I understand the concerns of managing a very large database, but once you've optimised your indexes and data types, performance will most likely be good enough.
This is my MySQL table demo, which has more than 7 million rows:
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | varchar(42) | YES | MUL | NULL | |
| date | datetime | YES | MUL | NULL | |
| text | varchar(100) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
I read that indexes work sequentially.
Case 1:
select * from demo where id="43984a7e-edcf-11ea-92c7-509a4cb89342" order by date limit 30;
I created an index on (id, date); it works well and the query executes very quickly.
But consider the cases below.
Case 2:
Below is my SQL query.
select * from demo where id>"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
To make the above query faster I created an index on (id, date), but it takes more than 10 sec.
Then I created another index on (date) alone, and the query took less than 1 sec. Why is the composite index (id, date) so much slower than the (date) index in this case?
Case 3:
select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
For this query, even with the (date) index it takes more than 1.8 sec. Why is the < operator not optimized by either index, (date) or (id, date)?
According to EXPLAIN, this query only goes through around 300 rows, yet it still takes more than 1.8 sec. Why?
mysql> explain select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | demo | NULL | index | demoindex1,demoindex2 | demoindex3 | 6 | NULL | 323 | 36.30 | Using where |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
Any suggestions for how to create an index in Case 3 to optimize it?
In your first query, the index can be used for both the where clause and the ordering. So it will be very fast.
For the second query, the index can only be used for the where clause. Because of the inequality, information about the date is no longer in order. So the engine needs to explicitly order.
In addition, I imagine that the second query returns much more data than the first: a fair amount of data if it takes 10 seconds to sort it.
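The ordering argument can be illustrated outside the database. The Python sketch below is only an analogy: it models the (id, date) index as a list of tuples sorted by id and then by date, using made-up sample values, and checks whether the dates come out in order for an equality filter versus a range filter.

```python
# Model a composite (id, date) index as tuples sorted by id, then date.
# The ids and dates are made-up sample values, not data from the question.
index = sorted([
    ("a", 3), ("a", 7),
    ("b", 2), ("b", 5),
    ("c", 1), ("c", 4),
])

def dates_for(predicate):
    """Walk the index in order and collect the dates of matching entries."""
    return [date for id_, date in index if predicate(id_)]

def is_sorted(values):
    return all(x <= y for x, y in zip(values, values[1:]))

equality_dates = dates_for(lambda id_: id_ == "b")  # like WHERE id = 'b'
range_dates = dates_for(lambda id_: id_ > "a")      # like WHERE id > 'a'

print(equality_dates, is_sorted(equality_dates))
print(range_dates, is_sorted(range_dates))
```

With a single id the dates are already in order, so ORDER BY date needs no extra sort. Across several ids the dates restart for each id, so the matching rows have to be sorted explicitly.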
I have a MySQL query that I am having performance problems with that I do not understand. When I try to debug and run the overall query as a sequence of separate subqueries they seem to perform reasonably well, given the volume of data. When I combine them into a single nested query I get much much much longer execution times.
The main ratings table mentioned below is approx 30 million rows (4GB of disk space), with a couple of foreign keys (it's a many-to-many table linking users and items with a small amount of additional supplementary user specific item information - approx 13 fields and 30 bytes).
Query 1 - approx 23s
SELECT COUNT(1) FROM (SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1) AS t1;
Query 1 saved to table - approx 65s if I save the results to a temporary table
CREATE TABLE temp_table SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1;
Query 2 - approx 3s
SELECT COUNT(1) FROM temp_table WHERE id IN (SELECT id from item_stats WHERE
ratings_count > 1000);
Based on this I would expect a combined query to take approx 30s or so, and not more than approx 70s.
Combined query (Query 1 + Query 2) - indeterminate time (tens of minutes before I gave up and cancelled)
SELECT COUNT(1) from (SELECT * FROM (SELECT fields FROM ratings WHERE (id >= 0
AND id < 10000) AND item_type = 1) AS t1 WHERE t1.id IN (SELECT id FROM
item_stats WHERE ratings_count > 1000)) as t2;
Can anyone help explain this difference and guide me in creating a query that works? If I need to I can rely on the sequential queries (which would take approx 70s), but that is cumbersome and does not seem the right way to go.
I have tried using INNER JOIN instead of IN but this did not seem to make much difference. The ID count from the item_stats table is about 2700 IDs.
It's using MySQL 8.0 on a laptop (16GB RAM, SSD).
Response to suggestions / questions:
Query 1
EXPLAIN select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1;
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | collections | NULL | ALL | user_id | NULL | NULL | NULL | 32898400 | 1.31 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Query 2
EXPLAIN select * from temp_coll where game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
| 1 | SIMPLE | temp_coll | NULL | ALL | NULL | NULL | NULL | NULL | 1674386 | 10.00 | Using where; Using join buffer (hash join) |
| 2 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
Combined query
EXPLAIN select * from (select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1) as t1 where t1.game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | <subquery3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | collections | NULL | ref | user_id,game_id | game_id | 5 | <subquery3>.game_id | 199 | 1.31 | Using where |
| 3 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
Your query appears to be functionally identical to the following (rather implausible) query:
SELECT COUNT(*) total
FROM ratings r
JOIN item_stats s
ON s.id = r.id
WHERE r.id >= 0
AND r.id < 10000
AND r.item_type = 1
AND s.ratings_count > 1000
r.id is, presumably, the PRIMARY KEY, so it's automatically included in any InnoDB secondary index, which leaves just item_type and ratings_count requiring indexes.
You would benefit a lot from an online tutorial on learning how to read the EXPLAIN plan. The EXPLAINS you shared clearly show missing indexes.
As a general rule, queries should not take 23 seconds or 65 seconds, even with millions of rows. Proper indexes + partitioning should resolve the slowness.
Query 1: The user_id index on that table is not helping performance, as 99% of users are within the range in the WHERE clause. You can add an index on item_type_id:
ALTER TABLE collections ADD KEY (item_type_id)
Query 2: The temp_coll table is missing a game_id index. Also, I'm not sure whether games_ratings_stats has an index on ratings_count, or whether that would help. I don't have experience with MySQL materialized tables.
ALTER TABLE temp_coll ADD KEY (game_id)
Query 3:
Would benefit from above indexes.
Increasing the InnoDB Buffer Pool Size (now set to 8GB) seems to have made a significant improvement. If anyone has any further setup or tuning advice on MySQL then that would be appreciated!
EDIT: It is on mysql version 5.5.62-38.14-log, that I have the problem, BTW, although the examples were run on 5.7.27-0ubuntu0.18.04.1 on my local machine. I have changed the UNIX_TIMESTAMP() in my queries to TIMESTAMP(), but no change.
Can somebody help see the light, please? I have a relatively simple table:
mysql> CREATE TABLE `game_instance` (
-> `game_instance_id` bigint(20) NOT NULL AUTO_INCREMENT,
-> `game_id` int(11) NOT NULL,
-> `currency_code` varchar(15) DEFAULT NULL,
-> `start_datetime` timestamp,
-> `status` varchar(20) NOT NULL DEFAULT '' COMMENT 'COMING, NMB = No More Bets, RESOLVED, TB= Taking Bets',
-> `created_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
-> `end_datetime` datetime DEFAULT NULL,
-> `external_ref` varchar(50) DEFAULT NULL,
-> `game_room_id` int(11) DEFAULT NULL,
-> PRIMARY KEY (`game_instance_id`,`start_datetime`),
-> KEY `GI_IDX4` (`external_ref`),
-> KEY `GI_IDX5` (`game_id`,`status`),
-> KEY `game_instance_status` (`status`),
-> KEY `game_instance_end_datetime` (`end_datetime`),
-> KEY `game_instance_start_datetime` (`start_datetime`)
-> ) ENGINE=InnoDB AUTO_INCREMENT=118386942 DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.14 sec)
mysql> explain select * from game_instance where start_datetime >= unix_timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00'));
+----+-------------+---------------+------------+------+------------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+------------------------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | game_instance | NULL | ALL | game_instance_start_datetime | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+---------------+------------+------+------------------------------+------+---------+------+------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
I have an index on start_datetime, but I still get a full table scan, according to explain.
However:
mysql> create table ex1(
-> id bigint(20),
-> start_datetime timestamp,
-> primary key (id,start_datetime),
-> key (start_datetime)
-> );
Query OK, 0 rows affected (0.02 sec)
mysql> explain select * from ex1 where start_datetime>=unix_timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00'));
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | ex1 | NULL | index | start_datetime | start_datetime | 4 | NULL | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
1 row in set, 3 warnings (0.00 sec)
The warnings are:
mysql> show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1292 | Incorrect datetime value: '1563663600' for column 'start_datetime' at row 1 |
| Warning | 1292 | Incorrect datetime value: '1563663600' for column 'start_datetime' at row 1 |
| Note | 1003 | /* select#1 */ select `ex`.`ex1`.`id` AS `id`,`ex`.`ex1`.`start_datetime` AS `start_datetime` from `ex`.`ex1` where (`ex`.`ex1`.`start_datetime` >= <cache>(unix_timestamp(concat((curdate() - interval 30 day),' ','00:00:00')))) |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
This seems to suggest that start_datetime is silently converted in the background, which would explain why the index is not used, but then why does it not happen in both queries? (And as a corollary, how do I convert my date string to whatever the MySQL TIMESTAMP is?)
EDIT 2:
I've run optimize on the table, as suggested in comments (I haven't run the analyze, since it seems to have done that already):
mysql> optimize table game_instance;
+-----------------------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+----------+----------+-------------------------------------------------------------------+
| gameiom.game_instance | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| gameiom.game_instance | optimize | status | OK |
+-----------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (21 min 31.80 sec)
However, it made no difference:
mysql> explain select * from game_instance
where start_datetime >= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00')) and
start_datetime <= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' ', '23:59:59'));
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | game_instance | ALL | game_instance_start_datetime | NULL | NULL | NULL | 19065747 | Using where |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)
This is a real problem, since the table is 19m rows (not 11m as I said earlier).
Sometimes the query planner makes decisions about whether to scan the whole table or use the index based on statistics about the number and distribution of values in the index. Sometimes it guesses that a full table scan will take less CPU and IO resources than a table lookup.
When tables have small numbers of rows, the query planner's choices often don't match intuition. Make sure you have a few thousand rows at least, before you spend a lot of time trying to make sense of EXPLAIN output.
Also, the query planner gets better at its job with each MySQL release.
Do OPTIMIZE TABLE game_instance to clean up your table, especially if you have inserted many rows.
Then do ANALYZE TABLE game_instance to recompute the statistics used by the query planner.
By the way,
where start_datetime>=unix_timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00'));
can be written more simply, and correctly, as
where start_datetime >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
MySQL knows how to use the results of date computations directly in TIMESTAMP filters, and UNIX_TIMESTAMP() yields integers, not TIMESTAMPs.
About your invalid timestamp warning, may I suggest you ask another question? Please include your time zone setting in the question.
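To make the integer-versus-TIMESTAMP point concrete, here is a small Python sketch that decodes the number quoted in the question's warning. It assumes the value is a count of seconds since 1970-01-01 UTC, which is what UNIX_TIMESTAMP() returns. The server's time zone is not stated in the question, so only the UTC reading is shown.

```python
from datetime import datetime, timezone

# Value copied from the "Incorrect datetime value" warning in the question.
epoch_seconds = 1563663600

# Interpret it as seconds since the Unix epoch, in UTC.
as_utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

print(type(epoch_seconds).__name__, epoch_seconds)
print(as_utc.isoformat())
```

The column holds date-and-time values, while the right-hand side of the comparison is a bare integer, which fits the warning MySQL raised.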
The answer by O. Jones was correct, but let me just add some notes about what I did to find out. What I was seeing was this, which I couldn't understand:
mysql> explain extended
select * from game_instance
where
start_datetime >= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00')) and
start_datetime <= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' ', '23:59:59'));
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | game_instance | ALL | game_instance_start_datetime | NULL | NULL | NULL | 18741262 | 50.00 | Using where |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
So, I found that you can force MySQL to use an index, which gave me:
mysql> explain extended select * from game_instance force index (game_instance_start_datetime) where start_datetime >= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00')) and start_datetime <= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' ', '23:59:59'));
+----+-------------+---------------+-------+------------------------------+------------------------------+---------+------+---------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+-------+------------------------------+------------------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | game_instance | range | game_instance_start_datetime | game_instance_start_datetime | 4 | NULL | 9391936 | 100.00 | Using where |
+----+-------------+---------------+-------+------------------------------+------------------------------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
IOW, using the index selects about half of all the rows in the table, and now the filtered column makes sense: it is the estimated percentage of rows that remain after the condition is applied (about 50% here). That is why MySQL doesn't use the index: reading half the table through it is less efficient than a full scan, because you'd alternate between reading the index and seeking the matching rows in the table.
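The "about half" figure follows directly from the two row estimates in the EXPLAIN outputs above. A quick check in Python, using only those two numbers:

```python
# Row estimates copied from the two EXPLAIN outputs above.
table_rows_estimate = 18741262  # plan with type=ALL (full table scan)
index_rows_estimate = 9391936   # plan with the index forced (type=range)

fraction = index_rows_estimate / table_rows_estimate
print(f"{fraction:.1%} of the table would be read through the index")
```

Both numbers are optimizer estimates rather than exact counts, so the ratio is only approximate.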
I'm trying to get a running total using a Subquery. (I'm using Metabase, which doesn't seem to accept/process variables in queries)
My Query:
SELECT date_format(t.`session_stop`, '%d') AS `session_stop`,
sum(t.`energy_used` / 1000) AS `csum`,
(
SELECT (SUM(a.`energy_used`) / 1000)
FROM `sessions` a
WHERE date_format(a.`session_stop`, '%Y-%m-%d') <= date_format(t.`session_stop`, '%Y-%m-%d')
AND str_to_date(concat(date_format(a.`session_stop`, '%Y-%m'), '-01'), '%Y-%m-%d') = str_to_date(concat(date_format(now(), '%Y-%m'), '-01'), '%Y-%m-%d')
ORDER BY str_to_date(date_format(a.`session_stop`, '%e'), '%d') ASC
) AS `sum`
FROM `sessions` t
WHERE str_to_date(concat(date_format(t.`session_stop`, '%Y-%m'), '-01'), '%Y-%m-%d') = str_to_date(concat(date_format(now(), '%Y-%m'), '-01'), '%Y-%m-%d')
GROUP BY date_format(t.`session_stop`, '%e')
ORDER BY str_to_date(date_format(t.`session_stop`, '%d'), '%d') ASC;
This takes about 1.29 secs to run. (43K rows in total, returns 14)
If I remove the sum(t.`energy_used` / 1000) AS `csum`, line, the query takes 8 mins and 40 secs.
Why is this? I'd rather not have that line, but I also can't wait 8mins for a query to process.
(I know I can create a cumulative column, but I'm especially interested why this additional sum() speeds the whole query up)
ps. tested this on both the MySQL console and the Metabase interface.
EXPLAIN query:
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------
| 1 | PRIMARY | t | ALL | NULL | NULL | NULL | NULL | 42055 | Using where; Using tempora
| 2 | DEPENDENT SUBQUERY | a | ALL | NULL | NULL | NULL | NULL | 42055 | Using where
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------
2 rows in set (0.00 sec)
Without the extra sum():
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| 1 | PRIMARY | t | ALL | NULL | NULL | NULL | NULL | 44976 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | a | ALL | NULL | NULL | NULL | NULL | 44976 | Using where |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Schema is not much more than a table with:
session_id (INT, auto incr., prim.key) | session_stop (datetime) | energy_used (INT) |
1 | 1-1-2016 10:00:00 | 123456 |
2 | 1-1-2016 10:05:00 | 123456 |
3 | 1-2-2016 10:10:00 | 123456 |
4 | 1-2-2016 12:00:00 | 123456 |
5 | 3-3-2016 14:05:00 | 123456 |
Some examples on the internets show using the ID for the WHERE-clause, but I had some poor results with this.
Your queries are not similar at all. In fact, they are poles apart.
If I remove the sum(t.energy_used / 1000) AS csum, line, the query
takes up 8 mins and 40 secs.
When you use SUM, it's an aggregation. sum(t.energy_used / 1000) will produce an entirely different result from just selecting t.energy_used, which is why there is such a huge difference in the query timings.
It is also very unclear why you are comparing dates in this manner:
WHERE date_format(a.`session_stop`, '%Y-%m-%d') <= date_format(t.`session_stop`, '%Y-%m-%d')
Why are you converting both with date_format before the comparison? Since both sides apparently have the same data type, you should be able to write a.session_stop <= t.session_stop, which will be much faster in both cases.
Because it's a range comparison rather than an equality, an index may help less than you'd hope, but it's still worth creating an index on that column to see if it has any effect.
So to recap, the performance difference is because you are not merely adding/removing an extra column but adding/removing an aggregation.
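For reference, the result the question is after (a total per day plus a running total over the month) can be sketched in Python. The session values below are hypothetical, and this only illustrates the intended output; it is not a replacement for the SQL.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical (day_of_month, energy_used) pairs for a single month.
sessions = [(1, 1000), (1, 2000), (2, 4000), (3, 500), (3, 500)]

# Aggregate per day, like GROUP BY on the day of session_stop.
daily = defaultdict(int)
for day, energy_used in sessions:
    daily[day] += energy_used

days = sorted(daily)
daily_totals = [daily[d] / 1000 for d in days]   # like sum(energy_used / 1000)
running_totals = list(accumulate(daily_totals))  # cumulative sum up to each day

for day, total, running in zip(days, daily_totals, running_totals):
    print(day, total, running)
```

Each running total is the sum of all daily totals up to and including that day, which is what the correlated subquery recomputes for every row.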
User table structure:
mysql> desc User;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| EMAIL_ID | varchar(250) | YES | | NULL | |
| IP_ADDRESS | varchar(255) | YES | | NULL | |
| CREATED_TIME | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
This table contains millions of rows and grows day by day.
Goal:
To get the details of users created in the past 12 months.
First I get the ID of a user that was created 12 months ago. My queries look like this.
Option 1:
Select * from User where ID > <ID of the account created 12 months ago>;
Option 2:
Select * from User where CREATED_TIME > UNIX_TIMESTAMP('2011-01-01 00:00:00')*1000;
Which is more efficient for fetching the details? This query will be run repeatedly for audit purposes.
Try to avoid calling functions on each row. You can write the first part of your WHERE clause like this to speed it up a lot (especially if you couple it with an index on the CREATED_TIME field):
Accounts.CREATED_TIME
BETWEEN UNIX_TIMESTAMP(CURRENT_DATE() - INTERVAL 1 DAY) * 1000
AND UNIX_TIMESTAMP(CURRENT_DATE() - INTERVAL 1 DAY) * 1000 + 999
Note that this will make the function calls only once and indices on the CREATED_TIME field can be used to resolve the query.
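The same idea, computing the boundary once rather than per row, can also be done in application code. The Python sketch below assumes CREATED_TIME stores milliseconds since the Unix epoch in UTC (the * 1000 in the queries suggests this, but it is not confirmed). The function name cutoff_ms is hypothetical.

```python
from datetime import datetime, timezone

def cutoff_ms(now, years_back=1):
    """Epoch milliseconds for the same instant `years_back` years before `now`."""
    try:
        past = now.replace(year=now.year - years_back)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        past = now.replace(year=now.year - years_back, day=28)
    return int(past.timestamp()) * 1000

# Example with a fixed, timezone-aware "now" so the result is reproducible.
example_now = datetime(2012, 1, 1, tzinfo=timezone.utc)
print(cutoff_ms(example_now))
```

The resulting number would be passed to the query as a single parameter and compared against CREATED_TIME directly, which leaves an index on that column usable.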