I have a table containing:
Balance, Client_ID, Date
This table has ~25 Million rows - Most days, a service executes and creates a new row for each client, with today's date, and balance of the client.
Inside a date range, let's say 01/01/2016 to 12/05/2016, I need to get the first and last row for each client.
* The service does not run every day, so filtering on Date = 12/05/2016 will not work. If today's balance is equal to yesterday's balance, no row is inserted (this saves me about 90% of the data; if I calculate correctly, the table would otherwise hold about 300 million rows).
To do this, I run these two queries:
Get the first date: 6.9433851242065 seconds
SELECT * FROM (SELECT * FROM daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016') dates
GROUP BY Client_ID
Get the last date: 32.034277915955 seconds
SELECT * FROM (SELECT * FROM daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016'
ORDER BY Date DESC) dates
GROUP BY Client_ID
The first query has no ORDER BY because the service mentioned above always inserts rows in date order, so it is much faster (7 s versus 32 s).
How can I make both queries faster, or at least the second one?
Query description:
Get the row where the date is the first date after 01/01/2016
Get the row where the date is the last date before 13/05/2016
EDIT: The checked answer gives me the following:
ASC and DESC are mine, 'combined' is the suggested answer
dates_ASC: 33.300458192825
dates_DESC: 8.9232740402222
dates_combined: 8.4357199668884
dates_ASC: 5.4825110435486
dates_DESC: 10.173403978348
dates_combined: 2.7024359703064
dates_ASC: 15.090759038925
dates_DESC: 29.375104904175
dates_combined: 3.2885720729828
Pick each client's min and max time in a derived table. Join with that table:
select *
from daily d1
join (select Client_ID, max(TIME) as maxtime, min(TIME) as mintime
from daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016'
group by Client_ID) d2
on d1.Client_ID = d2.Client_ID and d1.TIME in (d2.mintime, d2.maxtime)
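The join above can be tried on a small data set. The following is a minimal sketch, not the poster's setup: it uses Python's built-in sqlite3 as a stand-in for MySQL, ISO-formatted date strings (so that text comparison matches date order), and made-up sample rows. Only the table and column names come from the question.

```python
import sqlite3

# SQLite stands in for MySQL here; dates are ISO strings so they sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (Client_ID INTEGER, TIME TEXT, Balance REAL)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [
        (1, "2015-12-30", 50.0),   # before the range, must be ignored
        (1, "2016-01-04", 100.0),  # first row in range for client 1
        (1, "2016-03-10", 120.0),
        (1, "2016-05-11", 90.0),   # last row in range for client 1
        (2, "2016-02-01", 10.0),   # client 2 has a single row in range
        (2, "2016-06-01", 30.0),   # after the range, must be ignored
    ],
)

# Same shape as the answer's query: per-client min/max in a derived table, then a join.
FIRST_LAST_SQL = """
SELECT d1.Client_ID, d1.TIME, d1.Balance
FROM daily d1
JOIN (SELECT Client_ID, MIN(TIME) AS mintime, MAX(TIME) AS maxtime
      FROM daily
      WHERE TIME >= ? AND TIME < ?
      GROUP BY Client_ID) d2
  ON d1.Client_ID = d2.Client_ID AND d1.TIME IN (d2.mintime, d2.maxtime)
ORDER BY d1.Client_ID, d1.TIME
"""

rows = conn.execute(FIRST_LAST_SQL, ("2016-01-01", "2016-05-13")).fetchall()
for row in rows:
    print(row)
```

A client with only one row in the range comes back once, because its min and max time are the same row. On MySQL, a composite index on (Client_ID, TIME) would plausibly help the derived table, but that is an assumption and was not measured in the question.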
Try the first query as:
SELECT * FROM daily WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016' ORDER BY TIME ASC LIMIT 1
The second query as:
SELECT * FROM daily WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016' ORDER BY TIME DESC LIMIT 1
Related
Use case: I have a cron job that checks some statistics every 5 minutes and inserts them into the database table stats
**Structure**
`time` as DATETIME (index)
`skey` as VARCHAR(50) (index)
`value` as BIGINT
Primary (time and skey)
Now I want to create a graph that displays the running daily average over the course of the day, e.g. a graph for playing users:
from 0-1 I have 10 playing users (the average from 0-1 is now 10)
from 1-2 I have 6 playing users (the average is now 8 => (10+6) / 2)
from 2-3 I have 14 playing users (the average is now 10 => (10+6+14) / 3)
and the next day it starts over
I already have queries running, but they take 3.5+ seconds to run
First attempt:
SELECT *
, (SELECT AVG(value)
FROM stats as b
WHERE b.skey = stats.skey
AND b.time <= stats.time
AND DATE(b.time) = DATE(stats.time))
FROM stats
ORDER BY stats.time DESC
Second attempt:
SELECT *
, (SELECT AVG(b.value)
FROM stats as b
WHERE b.skey = stats.skey
AND DATE(b.time) = DATE(stats.time)
AND b.time <= stats.time) as avg
FROM stats
WHERE skey = 'playingUsers'
GROUP BY HOUR(stats.time), DATE(stats.time)
The first attempt fetches each entry and calculates the average.
The second attempt groups by hour (as in my example).
Either way, the performance does not change.
Is there any way to boost performance in MySQL, or do I have to change the whole logic behind it?
DB Fiddle:
https://www.db-fiddle.com/f/krFmR1yPsmnPny2zi5NJGv/4
I suggest separating the calculation of the average per hour from the calculation of the day's running average, and calculating the hourly values only once via grouping.
If you are on MySQL 8, I suggest using a CTE as follows:
with HOURLY AS (
SELECT distinct
DATE_,
HOUR_,
AVG(b.value) as avg_per_hour
FROM (SELECT s.value, DATE(s.time) DATE_, HOUR(s.time) HOUR_
FROM stats s
where skey = 'playingUsers'
) b
GROUP BY b.DATE_, b.HOUR_
ORDER BY b.DATE_ DESC, b.HOUR_ DESC
)
SELECT *
, (SELECT AVG(b.avg_per_hour)
FROM HOURLY as b
WHERE b.DATE_ = HOURLY.DATE_
AND b.HOUR_ <= HOURLY.HOUR_) as avg
FROM HOURLY
This statement takes < 300 ms in the given fiddle.
The calculation corresponds to the algorithm you described in the example above.
However, the results differ from those of the statements you presented.
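To see the two-step idea (hourly averages first, then a running average of those per day) reproduce the numbers from the question's example, here is a small sketch. It assumes Python's built-in sqlite3 in place of MySQL 8, so date() and strftime('%H', ...) replace DATE() and HOUR(); the sample rows are invented to match the 10 / 6 / 14 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (time TEXT, skey TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("2020-01-01 00:05:00", "playingUsers", 8),
        ("2020-01-01 00:35:00", "playingUsers", 12),  # hour 0 averages 10
        ("2020-01-01 01:10:00", "playingUsers", 6),   # hour 1 averages 6
        ("2020-01-01 02:20:00", "playingUsers", 14),  # hour 2 averages 14
        ("2020-01-02 00:15:00", "playingUsers", 4),   # next day starts over
        ("2020-01-01 00:05:00", "otherKey", 999),     # different key, filtered out
    ],
)

# Step 1 (CTE): one average per day and hour.
# Step 2: for each hour, average all hourly averages of the same day up to that hour.
RUNNING_SQL = """
WITH hourly AS (
    SELECT date(time) AS date_,
           CAST(strftime('%H', time) AS INTEGER) AS hour_,
           AVG(value) AS avg_per_hour
    FROM stats
    WHERE skey = 'playingUsers'
    GROUP BY date(time), strftime('%H', time)
)
SELECT h.date_, h.hour_,
       (SELECT AVG(b.avg_per_hour)
        FROM hourly b
        WHERE b.date_ = h.date_ AND b.hour_ <= h.hour_) AS running_avg
FROM hourly h
ORDER BY h.date_, h.hour_
"""

running = conn.execute(RUNNING_SQL).fetchall()
for row in running:
    print(row)
```

The running average here is an average of hourly averages, which is what the question's example describes; it is not the same as averaging all raw rows up to that point when hours contain different numbers of samples.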
I am trying to find the maximum number within a given date range.
For example, my table contains
date number
---------- --------
01-01-2019 1
05-01-2019 3
07-01-2019 2
10-01-2019 1
11-01-2019 2
and I want to find the max number for dates from 06-01-2019 to 11-01-2019.
When I use the query
select max(number) from TABLE where date between startDate and endDate;
the output is 2.
But what I wanted is if the startDate is not in the table, to include the previous row. For example in the previous case, I want to include the row 05-01-2019 and thus the output should be 3.
Is there any query for this process or do I need to write an algorithm?
Assume the dates in table are sorted and I use a MySQL database.
You can do this by using a subquery:
SELECT MAX(number)
FROM TABLE
WHERE date >= (
SELECT date
FROM TABLE
WHERE date <= startDate
ORDER BY date DESC
LIMIT 1
)
AND date <= endDate
The subquery returns the latest date on or before startDate.
That date then serves as the lower bound for the outer query.
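The subquery approach can be checked against the sample data from the question. This is a sketch under assumptions: sqlite3 stands in for MySQL, the table is given the hypothetical name readings, and the dates are rewritten in ISO format so that string comparison follows date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "readings" is a hypothetical table name; the question only calls it TABLE.
conn.execute("CREATE TABLE readings (date TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2019-01-01", 1),
        ("2019-01-05", 3),
        ("2019-01-07", 2),
        ("2019-01-10", 1),
        ("2019-01-11", 2),
    ],
)

# Lower bound = latest date on or before the requested start date.
MAX_SQL = """
SELECT MAX(number)
FROM readings
WHERE date >= (SELECT date FROM readings
               WHERE date <= ?
               ORDER BY date DESC
               LIMIT 1)
  AND date <= ?
"""


def max_including_previous(start_date, end_date):
    """Max of number over the range, extended back to the row in effect at start_date."""
    return conn.execute(MAX_SQL, (start_date, end_date)).fetchone()[0]


# Plain BETWEEN misses the 2019-01-05 row; the extended query includes it.
plain_max = conn.execute(
    "SELECT MAX(number) FROM readings WHERE date BETWEEN ? AND ?",
    ("2019-01-06", "2019-01-11"),
).fetchone()[0]
extended_max = max_including_previous("2019-01-06", "2019-01-11")
print(plain_max, extended_max)
```

If no row exists on or before the start date, the subquery yields NULL and the query returns NULL rather than the in-range maximum, so that case may need separate handling.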
In MySQL 8+, you can use lead():
select max(number)
from (select t.*, lead(date) over (order by date) as next_date
from t
) t
where (next_date > $start_date or next_date is null) and -- next_date is null for the latest row
date <= $end_date;
I have a db table with about half a million rows of user sign-in data.
simple db table:
users_signin:
id
userid
datetime
What I am trying to figure out is how to acquire the average, or most common, hour of day that a specific person signs into the website.
I want an hour returned, such as 04 or 23 (4am/11pm).
The datetime field is a Unix timestamp.
I have fiddled around with avg(), but extracting just the hour is where I am hitting a wall.
If you want the most common hour of the day for a specific user, you can try the following query:
select hour(datetime) as hr, count(*)
from users_signin
where userid = $userid
group by hour(datetime)
order by count(*) desc
limit 1;
EDIT:
If the thing you are calling datetime is really a unix time, then you should do:
select hour(from_unixtime(datetime)) as hr, count(*)
from users_signin
where userid = $userid
group by hour(from_unixtime(datetime))
order by count(*) desc
limit 1;
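As a runnable illustration of the group-by-hour idea, here is a sketch using Python's sqlite3 instead of MySQL. SQLite has no HOUR() or FROM_UNIXTIME(), so strftime('%H', ..., 'unixepoch') is swapped in; it always works in UTC, whereas MySQL's FROM_UNIXTIME() uses the session time zone. The sample sign-ins and user IDs are invented.

```python
import sqlite3
from datetime import datetime, timezone


def unix_ts(day, hour):
    """Unix timestamp for a sample sign-in in January 2024 (UTC)."""
    return int(datetime(2024, 1, day, hour, 30, tzinfo=timezone.utc).timestamp())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_signin (id INTEGER PRIMARY KEY, userid INTEGER, datetime INTEGER)"
)
conn.executemany(
    "INSERT INTO users_signin (userid, datetime) VALUES (?, ?)",
    [
        (7, unix_ts(1, 23)),
        (7, unix_ts(2, 23)),  # user 7 signs in at 23:xx twice
        (7, unix_ts(3, 4)),   # and at 04:xx once
        (8, unix_ts(1, 4)),   # another user, must not affect user 7
        (8, unix_ts(2, 4)),
    ],
)

# Count sign-ins per hour of day for one user and keep the most frequent hour.
MOST_COMMON_HOUR_SQL = """
SELECT strftime('%H', datetime, 'unixepoch') AS hr, COUNT(*) AS cnt
FROM users_signin
WHERE userid = ?
GROUP BY hr
ORDER BY cnt DESC
LIMIT 1
"""

most_common = conn.execute(MOST_COMMON_HOUR_SQL, (7,)).fetchone()
print(most_common)
```

The hour comes back as a zero-padded string such as 04 or 23, which matches the format asked for. When two hours are tied, which one is returned is not defined by this query.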
Let's say I have a table in MySQL DB with following columns
employee, status, work, start_date
Assume the start_date column is a DATETIME.
If I do
SELECT employee, status, work, start_date from table_name WHERE start_date >= CURDATE() - INTERVAL 10 DAY
this will give me records from Current date - 10 days. In this case I might get 1 record to 100 records based on the data.
I need only 10 records based on date/time (e.g. if there are 10 employees that started to work today then I should get only today's records and not 10 days record)
How can I do that?
You mean you want the ten most recent entries? You can add an ORDER BY to set the order in which the results come back, and a LIMIT to reduce the total number of results.
SELECT employee, status, work, start_date from table_name
WHERE start_date >= CURDATE() - INTERVAL 10 DAY -- CURDATE()-10 would be integer arithmetic, not a date
ORDER BY start_date DESC
LIMIT 10
You need to ORDER BY start_date and add a LIMIT:
SELECT employee, status, work, start_date
from table_name
order by start_date desc
limit 10
I want to run a MySQL query that selects a given number of rows from a single table, starting from a given offset, like
SELECT * FROM table
WHERE timestamp < '2011-11-04 09:01:05'
ORDER BY timestamp DESC
LIMIT 100
My problem is that I always want all rows of a day if any row of that day is included in the result.
Getting e.g. 102 rows instead of 100 would be no problem.
Can I do this with a single SQL statement?
Thanks for your help!
This seems to work on my system:
SELECT UserID, Created
FROM some_user
WHERE Created < '2011-11-04 09:10:11'
AND Created >= (
SELECT DATE(Created) -- note: DATE() strips out the time portion from datetime
FROM some_user
WHERE Created < '2011-11-04 09:10:11'
ORDER BY Created DESC
LIMIT 99, 1 -- note: counting starts from 0 so LIMIT 99, 1 returns 100th row
)
ORDER BY Created DESC
-- 0 rows affected, 102 rows found. Duration for 1 query: 0.047 sec.
There might be a faster alternative.
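The same idea can be exercised on a handful of rows. The sketch below assumes Python's sqlite3 rather than MySQL, uses a row count of 3 instead of 100 to keep the sample small, and writes the offset as LIMIT 1 OFFSET n-1 (equivalent to LIMIT n-1, 1). The helper name and sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_user (UserID INTEGER, Created TEXT)")
conn.executemany(
    "INSERT INTO some_user VALUES (?, ?)",
    [
        (1, "2011-11-04 08:00:00"),
        (2, "2011-11-04 07:00:00"),
        (3, "2011-11-03 22:00:00"),  # 3rd newest row before the cutoff
        (4, "2011-11-03 10:00:00"),  # same day as row 3, so it is pulled in too
        (5, "2011-11-03 01:00:00"),  # same day as row 3, so it is pulled in too
        (6, "2011-11-02 12:00:00"),  # earlier day, excluded
        (7, "2011-11-04 10:00:00"),  # after the cutoff, excluded
    ],
)

# The subquery finds the day of the n-th newest row; the outer query then
# returns everything from the start of that day up to the cutoff.
WHOLE_DAYS_SQL = """
SELECT UserID, Created
FROM some_user
WHERE Created < :cutoff
  AND Created >= (SELECT DATE(Created)
                  FROM some_user
                  WHERE Created < :cutoff
                  ORDER BY Created DESC
                  LIMIT 1 OFFSET :skip)
ORDER BY Created DESC
"""


def rows_with_whole_days(cutoff, n):
    """At least n rows before cutoff, extended to cover the whole last day."""
    params = {"cutoff": cutoff, "skip": n - 1}
    return conn.execute(WHOLE_DAYS_SQL, params).fetchall()


result = rows_with_whole_days("2011-11-04 09:10:11", 3)
print([user_id for user_id, _ in result])
```

When fewer than n rows exist before the cutoff, the subquery yields NULL and nothing is returned, which is also how the original statement behaves.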
If I understand your question correctly, you're interested in retrieving 100 rows, plus any rows that fall on the same day as ones already retrieved. You can do this using a subquery:
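SELECT table.*
FROM table, (
SELECT DISTINCT day
FROM (
SELECT TO_DAYS(timestamp) day
FROM table
WHERE timestamp < :?
ORDER BY timestamp DESC -- without this, LIMIT picks arbitrary rows
LIMIT 100
) latest -- MySQL requires an alias for every derived table
) days
WHERE TO_DAYS(table.timestamp) = days.day
ORDER BY timestamp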
Exclude the time part in the query and remove the LIMIT.
SELECT * FROM table
WHERE timestamp < '2011-11-04 00:00:00'
ORDER BY timestamp DESC