I am trying to run the following query on a very large table with over 90 million rows and growing:
SELECT COUNT(DISTINCT device_uid) AS cnt, DATE_FORMAT(time_start, '%Y-%m-%d') AS period
FROM game_session
WHERE account_id = -2 AND DATE_FORMAT(time_start, '%Y-%m-%d') BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
GROUP BY period
ORDER BY period DESC
I have the following table structure:
CREATE TABLE `game_session` (
`session_id` bigint(20) NOT NULL,
`account_id` bigint(20) NOT NULL,
`authentification_type` char(2) NOT NULL,
`source_ip` char(40) NOT NULL,
`device` char(50) DEFAULT NULL COMMENT 'Added 0.9',
`device_uid` char(50) NOT NULL,
`os` char(50) DEFAULT NULL COMMENT 'Added 0.9',
`carrier` char(50) DEFAULT NULL COMMENT 'Added 0.9',
`protocol_version` char(20) DEFAULT NULL COMMENT 'Added 0.9',
`lang_key` char(2) NOT NULL DEFAULT 'en',
`instance_id` char(100) NOT NULL,
`time_start` datetime NOT NULL,
`time_end` datetime DEFAULT NULL,
PRIMARY KEY (`session_id`),
KEY `game_account_session_fk` (`account_id`),
KEY `lang_key_fk` (`lang_key`),
KEY `lookup_active_session_idx` (`account_id`,`time_start`),
KEY `lookup_finished_session_idx` (`account_id`,`time_end`),
KEY `start_time_idx` (`time_start`),
KEY `lookup_guest_session_idx` (`device_uid`,`time_start`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
How can I optimize this?
Thanks for your answers.
DATE_FORMAT(time_start, '%Y-%m-%d') sounds expensive.
Every calculation on a column reduces the use of indexes. You probably run into a full index scan plus a DATE_FORMAT calculation for each value, instead of an index lookup / range scan.
Try to store the computed value in a column (or create a computed index, if MySQL supports it). Or, even better, rewrite your conditions to compare directly against the value stored in the column.
Well, 90 million rows is a lot, but I suspect it doesn't use start_time_idx because of the manipulation of the column, which you can avoid: manipulate the values you compare it with instead, which only has to be done once per query if MySQL is smart enough. Have you checked EXPLAIN?
You may want to group and sort by time_start instead of the period value you create when the query is run. Sorting by period requires all of those values to be generated before any sorting can be done.
Try swapping out your WHERE clause with the following:
WHERE account_id = -2 AND time_start BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
MySQL will still catch the dates in between. The only rows you need to worry about are today's: CURDATE() is compared as midnight at the start of today, so anything later today falls outside the range.
You can fix that by incrementing the second CURDATE( ) with CURDATE( ) + INTERVAL 1 DAY
I'd change
BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
to
> (CURDATE() - INTERVAL 90 DAY)
You don't have records from the future, do you?
Change the query to:
SELECT COUNT(DISTINCT device_uid) AS cnt
, DATE_FORMAT(time_start, '%Y-%m-%d') AS period
FROM game_session
WHERE account_id = -2
AND time_start >= CURDATE() - INTERVAL 90 DAY
AND time_start < CURDATE() + INTERVAL 1 DAY
GROUP BY DATE(time_start) DESC
so the index of (account_id, time_start) can be used for the WHERE part of the query.
If it's still slow - the DATE(time_start) does not look very good for performance - add a date_start column and store the date part of time_start.
Then add an index on (account_id, date_start, device_uid) which will further improve performance as all necessary info - for the GROUP BY date_start and the COUNT(DISTINCT device_uid) parts - will be on the index:
SELECT COUNT(DISTINCT device_uid) AS cnt
, date_start AS period
FROM game_session
WHERE account_id = -2
AND date_start BETWEEN CURDATE() - INTERVAL 90 DAY
AND CURDATE()
GROUP BY date_start DESC
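The index argument made in the answers above can be checked with a small experiment. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption, since the planners differ), a cut-down version of the table, and hypothetical date literals. It compares the query plan for a WHERE clause that wraps time_start in a function with one that compares the bare column against precomputed bounds.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption of this sketch);
# the table is cut down to the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_session ("
    " session_id INTEGER PRIMARY KEY,"
    " account_id INTEGER NOT NULL,"
    " device_uid TEXT NOT NULL,"
    " time_start TEXT NOT NULL)"
)
conn.execute(
    "CREATE INDEX lookup_active_session_idx"
    " ON game_session (account_id, time_start)"
)


def query_plan(where_clause, params):
    """Return SQLite's plan text for a COUNT(*) query with the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM game_session WHERE " + where_clause
    # The human-readable plan description is the last column of each row.
    return " | ".join(row[-1] for row in conn.execute(sql, params))


# Column wrapped in a function: only account_id can drive the index search.
wrapped_plan = query_plan(
    "account_id = ? AND date(time_start) BETWEEN ? AND ?",
    (-2, "2012-01-01", "2012-03-31"),
)

# Bare column compared against precomputed bounds: both index columns are used.
range_plan = query_plan(
    "account_id = ? AND time_start >= ? AND time_start < ?",
    (-2, "2012-01-01 00:00:00", "2012-04-01 00:00:00"),
)

print("wrapped:", wrapped_plan)
print("range:  ", range_plan)
```

On the real table, running EXPLAIN on both variants of the MySQL query is the equivalent check.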
Related
I have the following database
CREATE TABLE `table` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`time` bigint(20) DEFAULT NULL,
`name` varchar(20) DEFAULT NULL,
`messages` varchar(2000) NOT NULL,
PRIMARY KEY (`id`)
)
INSERT INTO `table` VALUES (1,1467311473,"Jim", "Jim wants a book"),
(2,1467226792,"Tyler", "Tyler wants a book"),
(3,1467336672,"Phil", "Phil wants a book");
I need to get the records between date 29 Jun 2016 and 1 July 2016 for time intervals 18:59:52 to 01:31:12.
I wrote a query but it doesn't return the desired output
SELECT l.*
FROM `table` l
WHERE ((time >=1467226792) AND (CAST(FROM_UNIXTIME(time/1000) as time) >= '18:59:52') AND (CAST(FROM_UNIXTIME(time/1000) as time) <= '01:31:12') AND (time <=1467336672))
Any suggestions??
As I understand it, you're simply interested in all rows later than '2016-06-29 18:59:52' and earlier than '2016-07-01 01:31:12' where the time element is NOT between '01:31:12' and '18:59:52'.
I think you can turn that logic into sql without further assistance
Ah, well, here's a fiddle. I left out all the FROM_UNIXTIME() stuff because it adds unnecessary complication to an understanding of the problem, but adapting this solution to your needs is literally just a case of wrapping each instance of the column time in that function:
http://rextester.com/OOGWB23993
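The rule described in this answer (inside the overall range, and time of day NOT between 01:31:12 and 18:59:52) can be sketched outside SQL as well. The snippet below is an illustration only: it assumes the time column holds Unix seconds and that times are interpreted in UTC, and the helper name in_nightly_window is made up for this example.

```python
from datetime import datetime, time, timezone

# Sample rows from the question: (id, unix time in seconds, name).
rows = [
    (1, 1467311473, "Jim"),
    (2, 1467226792, "Tyler"),
    (3, 1467336672, "Phil"),
]

RANGE_START = 1467226792  # 2016-06-29 18:59:52 UTC
RANGE_END = 1467336672    # 2016-07-01 01:31:12 UTC
EVENING = time(18, 59, 52)
NIGHT_END = time(1, 31, 12)


def in_nightly_window(ts):
    """True if ts is inside the overall range and its time of day is in 18:59:52..01:31:12."""
    if not RANGE_START <= ts <= RANGE_END:
        return False
    tod = datetime.fromtimestamp(ts, tz=timezone.utc).time()
    # The window wraps past midnight, so the two halves are joined with OR, not AND.
    return tod >= EVENING or tod <= NIGHT_END


matches = [name for _, ts, name in rows if in_nightly_window(ts)]
print(matches)
```

Joining the two time-of-day conditions with AND, as in the original query, can never match, because no time is both later than 18:59:52 and earlier than 01:31:12.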
If I got it right:
SELECT l.*
FROM `table` l
WHERE time >=1467226792
AND time <=1467336672
AND CAST(FROM_UNIXTIME(time/1000) as time) >= '18:59:52'
AND FROM_UNIXTIME(time/1000) <= DATE_ADD(DATE_ADD(DATE_ADD(CAST(FROM_UNIXTIME(time/1000) as date), INTERVAL 25 HOUR), INTERVAL 31 MINUTE), INTERVAL 12 SECOND)
I'm facing a strange MySQL behavior...
I want to return the rows from "MyTable" whose date is less than 10 seconds old, or in the future.
I also store future dates because, in my real program, I "launch" some queries with a delay, and date is actually the last query date, i.e. a kind of queue:
SELECT (NOW() - date) AS new_delay, id
FROM MyTable
WHERE (NOW() - date < 10)
ORDER BY new_delay DESC;
This one does not work as expected: it returns all the entries.
However, this one is working just fine:
SELECT (NOW() - date) AS new_delay, id
FROM MyTable
WHERE (NOW() < date + 10)
ORDER BY new_delay DESC;
DB example:
CREATE TABLE IF NOT EXISTS `MyTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
INSERT INTO `MyTable` (`id`, `date`) VALUES
(1, (NOW())),
(2, (NOW()-10)),
(3, (NOW()+100));
Any ideas??
Don't do the comparisons like that. In a numeric context, NOW() ends up being converted to an integer, and in an arcane format. Instead, use DATEDIFF() or just regular comparisons. For instance, if you want the difference in days:
SELECT datediff(curdate(), date) as new_delay, id
FROM MyTable
WHERE date >= date_sub(now(), interval 10 day)
ORDER BY new_delay DESC;
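To make the "arcane format" concrete: in a numeric context MySQL turns a DATETIME into the integer YYYYMMDDhhmmss, so subtracting two of them does not give seconds. A minimal sketch of that arithmetic follows; the helper name mysql_numeric and the two sample timestamps are made up for illustration.

```python
from datetime import datetime


def mysql_numeric(dt):
    """Integer MySQL uses for a DATETIME in numeric context: YYYYMMDDhhmmss."""
    return int(dt.strftime("%Y%m%d%H%M%S"))


now = datetime(2015, 2, 6, 12, 1, 5)      # hypothetical "NOW()"
stored = datetime(2015, 2, 6, 12, 0, 58)  # hypothetical row, 7 seconds older

# What NOW() - date computes: 20150206120105 - 20150206120058
numeric_diff = mysql_numeric(now) - mysql_numeric(stored)
# The elapsed time the query was meant to measure
real_diff = int((now - stored).total_seconds())

print(numeric_diff)
print(real_diff)
```

Here the two rows are 7 seconds apart, but the numeric subtraction yields 47 because the minute boundary is crossed, which is why the result cannot be compared against a number of seconds.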
Use MySQL's DATEDIFF():
select DATEDIFF(curdate(),date) as new_delay, id from MyTable
where date >= date_sub(curdate(), interval 10 day)
ORDER BY new_delay DESC;
The DATEDIFF() function returns the number of days between two dates.
As proposed by @Gordon in his answer, I can use the date_sub / date_add functions...
I can correct the WHERE clause to be:
WHERE NOW() < date_add(ServerRequests.date, interval 10 second)
OR
WHERE date > date_sub(now(), interval 10 second)
OR as proposed in my initial post:
WHERE (NOW() < date + 10)
But I still don't see why I cannot use the subtraction... So if anyone can give me a reason, I would be happy to understand.
In MySQL, I am using the following query to get records whose UpdatedAt field is between 2015-02-02 and 2015-02-06:
SELECT `TotalPrice`,`UpdatedAt` FROM `price` WHERE `UpdatedAt`>= '2015-02-02' AND `UpdatedAt`<='2015-02-06' ORDER BY `UpdatedAt` DESC
This displays UpdatedAt values from 2015-02-05 down to 2015-02-02, even though the table also has records for 2015-02-06. What mistake have I made?
My guess is that UpdatedAt has a time component. Here is an alternative where clause:
WHERE `UpdatedAt` >= '2015-02-02' AND `UpdatedAt` < '2015-02-07'
You might find this more readable as:
WHERE `UpdatedAt` >= '2015-02-02' AND
`UpdatedAt` < DATE_ADD('2015-02-06', INTERVAL 1 DAY)
What you should not get in the habit of doing is:
WHERE DATE(UpdatedAt) >= '2015-02-02' AND DATE(UpdatedAt) <= '2015-02-06'
Although technically correct, this prevents the use of indexes.
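The effect of the time component can be illustrated with plain datetime arithmetic. This is a sketch only; the sample row (10:30 on the last day) is hypothetical. A bare date compared with a DATETIME behaves like midnight at the start of that day, so a closed range ending on '2015-02-06' stops at 00:00:00, while a half-open range ending before the next day covers the whole day.

```python
from datetime import date, datetime, time, timedelta

first_day = date(2015, 2, 2)
last_day = date(2015, 2, 6)

# A bare date compared with a DATETIME behaves like midnight at the start of that day.
lower = datetime.combine(first_day, time.min)
inclusive_upper = datetime.combine(last_day, time.min)
exclusive_upper = datetime.combine(last_day + timedelta(days=1), time.min)

updated_at = datetime(2015, 2, 6, 10, 30)  # hypothetical row from the last day

# UpdatedAt >= '2015-02-02' AND UpdatedAt <= '2015-02-06'
in_closed_range = lower <= updated_at <= inclusive_upper
# UpdatedAt >= '2015-02-02' AND UpdatedAt < '2015-02-07'
in_half_open_range = lower <= updated_at < exclusive_upper

print(in_closed_range, in_half_open_range)
```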
I've written this SQL:
CREATE TABLE `logs` (
`id_log` INT(11) NOT NULL AUTO_INCREMENT,
`data_log` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id_log`)
)
I use it to insert a record when my server goes down, but I would like to check first that the same record wasn't already inserted in the previous 10 minutes.
So I am looking for a SELECT that returns only the records from the last 10 minutes up to NOW().
You're looking for INTERVAL # [UNIT]; there are various ways to use it (see http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html):
SELECT count(*)
FROM logs
WHERE data_log > NOW() - INTERVAL 10 MINUTE;
This will return the count of records written to the log in the last ten minutes:
SELECT COUNT(*) AS count_in_last_10 FROM logs WHERE data_log BETWEEN NOW() - INTERVAL 10 MINUTE AND NOW()
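The check-before-insert idea from the question can also be sketched in application code. The snippet below is only an illustration of the 10-minute cutoff, with a hypothetical helper name (logged_recently) and made-up timestamps; it mirrors the condition data_log > NOW() - INTERVAL 10 MINUTE.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)


def logged_recently(log_times, now):
    """True if any entry is newer than now - 10 minutes."""
    cutoff = now - WINDOW
    return any(t > cutoff for t in log_times)


now = datetime(2013, 5, 1, 12, 0, 0)  # hypothetical current time
old_only = [datetime(2013, 5, 1, 11, 30, 0)]
with_recent = old_only + [datetime(2013, 5, 1, 11, 55, 0)]

# Insert a new "server down" record only when nothing was logged in the window.
should_insert_first = not logged_recently(old_only, now)
should_insert_second = not logged_recently(with_recent, now)
print(should_insert_first, should_insert_second)
```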
I'm running a query like this in a Python script:
results = []
for day in days:
    for hour in hours:
        for _id in ids:
            query = "SELECT AVG(weight) from table WHERE date >= '%s' \
                     AND day=%s \
                     AND hour=%s AND id=%s" % \
                     (paststr, day, hour, _id)
            # exec_and_fetch() is shorthand for running the query and fetching the result
            results.append(query.exec_and_fetch())
Or, for people not used to Python: for every day, for every hour in that day, and for every id in a list, I need to get the average weight of some items.
As an example:
day 0 hour 0 id 0
day 0 hour 0 id 1
...
day 2 hour 5 id 4
day 2 hour 6 id 0
...
This results in a lot of queries, so I'm wondering whether it's possible to do this in one query instead. I've been fiddling a bit with views, but I always get stuck on the varying parameters, or the views get very, very slow; it's a rather big table.
My closest guess is this:
create or replace view testavg as
select date, day, hour, id, (select avg(weight) from cuWeight w_i
where w_i.date=w_o.date
and w_i.day=w_o.day
and w_i.hour=w_o.hour)
from cuWeight w_o;
But that hasn't returned anything yet; after waiting a minute or two I cancel the query.
The table looks like this:
CREATE TABLE `cuWeight` (
`id` int(11) NOT NULL default '0',
`date` date default NULL,
`hour` int(11) default '0',
`weight` float default '0',
`day` int(11) default '0',
KEY `id_index` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
MyISAM and latin1 are there for historical (almost fossilised) reasons.
You need a GROUP BY query:
select date, day, hour, id, avg(weight)
from cuWeight
where date > *some date*
group by date, day, hour, id ;
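How one grouped query can replace the nested loops can be tried out on a toy dataset. The following is a sketch under stated assumptions: Python's sqlite3 module with an in-memory database stands in for MySQL, the rows and the cutoff date are made up, and the query groups by (day, hour, id) only, without date, because the original loop averages over all dates after the cutoff.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cuWeight (id INTEGER, date TEXT, hour INTEGER, weight REAL, day INTEGER)"
)
rows = [
    (0, "2010-01-04", 0, 10.0, 0),
    (0, "2010-01-11", 0, 20.0, 0),
    (1, "2010-01-04", 0, 5.0, 0),
    (1, "2010-01-05", 6, 7.0, 1),
    (0, "2009-12-01", 0, 99.0, 0),  # older than the cutoff, must be ignored
]
conn.executemany("INSERT INTO cuWeight VALUES (?, ?, ?, ?, ?)", rows)

paststr = "2010-01-01"  # hypothetical cutoff date

# One grouped query instead of one query per (day, hour, id) combination.
grouped = conn.execute(
    "SELECT day, hour, id, AVG(weight) FROM cuWeight "
    "WHERE date >= ? GROUP BY day, hour, id",
    (paststr,),
).fetchall()

# Index the result so a dictionary lookup replaces each of the original queries.
averages = {(day, hour, _id): avg for day, hour, _id, avg in grouped}
print(averages)
```

Combinations with no rows simply do not appear in the result, so a lookup with averages.get((day, hour, _id)) returns None where the per-combination query would have returned NULL.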
If it's still slow you can split it up in chunks, for example:
for day in days:
    query = "select date, day, hour, id, avg(weight) \
             from cuWeight \
             where date > '%s' \
             and day = %s \
             group by date, day, hour, id " % \
             (paststr, day)
    ...