MySQL Query to get only one entry per interval from database

I have a table with this structure and some sample values:
ID | created | value | person
1 | 1 | 5 | 1
2 | 2 | 2 | 2
3 | 3 | 3 | 3
4 | 4 | 5 | 1
5 | 5 | 1 | 2
6 | 6 | 32 | 3
7 | 7 | 9 | 1
8 | 8 | 34 | 2
10 | 9 | 25 | 3
11 | 11 | 53 | 1
12 | 12 | 52 | 2
13 | 13 | 15 | 3
... etc
The created column will hold Unix timestamps, i.e. numbers like "1555073978". I just made them incremental here to show that two timestamps will rarely be the same.
So values are stored per person with creation times, and new values are added every minute. After a week this table is quite big, so when I query it to draw a graph, PHP runs out of memory because the dataset is so huge.
So what I am looking for is an easy way to query a table like this so that I get fewer values, sampled at a coarser interval.
How would I query this table, so that I get:
- only one value per person per interval
- where interval should be 15 mins, 30 mins, 60 mins etc (i.e. a parameter in the query)
I've started with an approach but don't want to spend too much time on it in case I am missing a much easier way. My way involves converting the timestamp to YEAR-MONTH-DAY-HOUR, but this only works for hourly intervals. I am also struggling to make sure that the query returns the MOST RECENT entry PER PERSON for that hour.
Any help would be greatly appreciated.

Assuming your created column is a TIMESTAMP/DATETIME and you want the max value per person for every 15 minutes, you could try:
select person, FLOOR(UNIX_TIMESTAMP(created) / (15 * 60)) AS bucket, max(value)
from my_table
group by person, FLOOR(UNIX_TIMESTAMP(created) / (15 * 60))
But if created already holds Unix timestamps (as in your example), you don't need UNIX_TIMESTAMP; just use:
group by person, FLOOR(created / (15 * 60))
Replace 15 * 60 with 30 * 60, 60 * 60 and so on for other interval lengths.
If you want the most recent value per person and interval, you could use:
select m.* from my_table m
inner join (
select person, max(created) max_created
from my_table
group by person, FLOOR(UNIX_TIMESTAMP(created) / (15 * 60))
) t on t.person = m.person and t.max_created = m.created
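To check the grouping logic outside MySQL, here is a minimal Python sketch (an illustration, not part of the answer) of what the join query computes: integer-divide created by the interval length to get a bucket, then keep the newest row per person and bucket. It assumes created holds Unix timestamps in seconds, as in the question; the function name latest_per_interval is hypothetical. One difference: on an exact tie in created, the SQL join returns both rows, while this sketch keeps the first one it sees.

```python
from typing import Dict, List, Tuple

# (id, created, value, person); created is a Unix timestamp in seconds.
Row = Tuple[int, int, int, int]


def latest_per_interval(rows: List[Row], interval_seconds: int) -> Dict[Tuple[int, int], Row]:
    """Return the most recent row for each (person, bucket) pair.

    bucket = created // interval_seconds, which equals
    FLOOR(created / interval_seconds) for non-negative timestamps.
    """
    latest: Dict[Tuple[int, int], Row] = {}
    for row in rows:
        _id, created, _value, person = row
        key = (person, created // interval_seconds)
        # Keep this row only if it is newer than the one already stored.
        if key not in latest or created > latest[key][1]:
            latest[key] = row
    return latest
```

For real data you would call latest_per_interval(rows, 15 * 60). With the question's demo timestamps 1 to 13, an interval of 5 makes the buckets visible: person 1 keeps ids 4, 7 and 11.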

Related

SQL calculate timediff between intervals including a time from a separate table

I have 2 different tables called observations and intervals.
observations:
id | type | start
------------------------------------
1 | classroom | 2017-06-07 16:18:40
2 | classroom | 2017-06-01 15:12:00
intervals:
+----+----------------+--------+------+---------------------+
| id | observation_id | number | task | time |
+----+----------------+--------+------+---------------------+
| 1 | 1 | 1 | 1 | 07/06/2017 16:18:48 |
| 2 | 1 | 2 | 0 | 07/06/2017 16:18:55 |
| 3 | 1 | 3 | 1 | 07/06/2017 16:19:00 |
| 4 | 2 | 1 | 3 | 01/06/2017 15:12:10 |
| 5 | 2 | 2 | 1 | 01/06/2017 15:12:15 |
+----+----------------+--------+------+---------------------+
I want a view that will display:
observation_id | time_on_task (total time in seconds where task = 1)
1 | 13
2 | 5
So I must first check whether the first interval has task = 1. If it does, I must record the difference between that interval's time and the start from the observations table, then add it to the total time. From then on, whenever task = 1, I just add the time difference between the current interval and the previous interval.
I know I can use:
select observation_id, TIME_TO_SEC(TIMEDIFF(max(time),min(time)))
from your_table
group by observation_id
to find the total time in the intervals table between all intervals outside of the first one.
But
1. I need to include only interval times where task = 1. (The end time of each interval is the one listed.)
2. I need the timediff between the first interval and the initial start (from the observations table) if number = 1.
I'm still new to the Stack Overflow community, but you could try the SQL LAG() window function (available in MySQL 8.0 and later).
For instance, using an outer SELECT statement:
SELECT t.COL1, t.COL2,
TIMESTAMPDIFF(SECOND, t.prevtime, t.Created_Datetime) AS Difference
FROM (SELECT COL1, COL2, Created_Datetime,
LAG(Created_Datetime) OVER (ORDER BY Created_Datetime) AS prevtime
FROM MyTable
WHERE SomeCondition) AS t;
Sorry if it looks goofy, still trying to learn to format code here.
https://explainextended.com/2009/03/12/analytic-functions-optimizing-lag-lead-first_value-last_value/
Hope it helps
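The rule from the question can be checked with a small Python sketch: a task = 1 interval counts from the previous interval's time, or from observations.start for the first interval. This mirrors what LAG(time) OVER (PARTITION BY observation_id ORDER BY number) with a fallback to the start time would compute. The in-memory tables and the time_on_task name are assumptions for illustration. The times are copied from the question and read as day/month/year.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical in-memory copies of the question's tables.
OBSERVATIONS: Dict[int, datetime] = {
    1: datetime(2017, 6, 7, 16, 18, 40),
    2: datetime(2017, 6, 1, 15, 12, 0),
}
# (observation_id, number, task, time)
INTERVALS: List[Tuple[int, int, int, datetime]] = [
    (1, 1, 1, datetime(2017, 6, 7, 16, 18, 48)),
    (1, 2, 0, datetime(2017, 6, 7, 16, 18, 55)),
    (1, 3, 1, datetime(2017, 6, 7, 16, 19, 0)),
    (2, 1, 3, datetime(2017, 6, 1, 15, 12, 10)),
    (2, 2, 1, datetime(2017, 6, 1, 15, 12, 15)),
]


def time_on_task(observations: Dict[int, datetime],
                 intervals: List[Tuple[int, int, int, datetime]]) -> Dict[int, int]:
    totals: Dict[int, int] = {obs_id: 0 for obs_id in observations}
    previous: Dict[int, datetime] = {}
    # Sorting by (observation_id, number) mirrors PARTITION BY observation_id ORDER BY number.
    for obs_id, _number, task, end in sorted(intervals, key=lambda r: (r[0], r[1])):
        # The first interval has no predecessor, so it starts at observations.start.
        begin = previous.get(obs_id, observations[obs_id])
        if task == 1:
            totals[obs_id] += int((end - begin).total_seconds())
        previous[obs_id] = end
    return totals
```

With the question's data this yields 13 seconds for observation 1 and 5 seconds for observation 2, matching the expected view.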

MySQL sum multiple column values with date condition

I have a client table with the columns below, which hold every client's daily purchases, one row per client per month.
ID|MONTH|DAY1|DAY2|DAY3|DAY4|..........|DAY31
1 | 4 | 10 | 20 | 0 | 15 |..........|10
2 | 4 | 20 | 30 | 23 | 7 |..........| 5
1 | 5 | 5 | 10 | 20 | 4 |..........| 20
1 | 6 | 12 | 0 | 10 | 5 |..........| 10
2 | 6 | 10 | 10 | 5 | 10 |..........| 5
Now I want to find the total quantity purchased by every client between 15/4/2015 and 15/6/2015.
I am new to MySQL, so I have no idea how to move forward.
Thanks in advance
It's not a good idea to use a column for every day of the month to store a value. But sometimes we are stuck with bad data formats; here's how you can get the totals you need:
SELECT ID, SUM(qty) AS total_qty
FROM (
SELECT
ID,
month AS m,
MAKEDATE(2015,1) + INTERVAL (month-1) MONTH AS dt,
DAY1 AS qty
FROM yourtable
UNION ALL
SELECT
ID,
month AS m,
MAKEDATE(2015,1) + INTERVAL (month-1) MONTH + INTERVAL 1 DAY AS dt,
DAY2 AS qty
FROM yourtable
UNION ALL
SELECT
ID,
month AS m,
MAKEDATE(2015,1) + INTERVAL (month-1) MONTH + INTERVAL 2 DAY AS dt,
DAY3 AS qty
FROM yourtable
UNION ALL
...
until DAY31 (+ INTERVAL 30 DAY), each SELECT followed by UNION ALL except the last
...
) s
WHERE
dt>='2015-04-15' AND dt<='2015-06-15'
AND MONTH(dt) = m -- skip cells such as DAY31 in April that roll over into the next month
GROUP BY ID
I'm using a subquery to normalize the data structure (one row per client per day), then computing the sums in the outer query; a simple WHERE clause and GROUP BY give the results you need.
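As a sanity check of the unpivot-then-filter idea, here is a minimal Python sketch. It assumes each row is (ID, MONTH, list of daily quantities) for 2015; the function name and the rows in the usage note are hypothetical. Like the query, it turns each day column into a dated row, drops cells that roll over into the next month (e.g. DAY31 in April), and sums the quantities inside the inclusive date range.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical row shape: (ID, MONTH, [DAY1, DAY2, ...]).
Row = Tuple[int, int, List[int]]


def total_between(rows: List[Row], year: int, start: date, end: date) -> Dict[int, int]:
    totals: Dict[int, int] = {}
    for client_id, month, days in rows:
        first = date(year, month, 1)
        for offset, qty in enumerate(days):
            dt = first + timedelta(days=offset)  # DAYn -> first of month + (n - 1) days
            if dt.month != month:
                continue  # e.g. DAY31 in April would land on May 1
            if start <= dt <= end:
                totals[client_id] = totals.get(client_id, 0) + qty
    return totals
```

For example, total_between(rows, 2015, date(2015, 4, 15), date(2015, 6, 15)) returns one total per client that has at least one day inside the range, just as GROUP BY ID does.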

SQL query to find a winner depending on winning odds

I need a SQL query to determine a random winner. Each user has its own winning odds: the higher the winning_odds value, the greater the user's chance of winning. Here's a look at the table structure:
id email winning_odds
1 test#test.com 3
2 test2#test.com 5
3 test3#test.com 2
4 test4#test.com 1
5 test5#test.com 9
MySQL database. The table has approximately 100000 rows. There is only one winner, drawn once. Emails are unique. Does anyone have a solution?
Thanks.
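Select email from user where winning_odds > 0 order by -log(1 - rand()) / winning_odds limit 1
Ordering by winning_odds*rand() ascending would favor the lowest odds. Dividing an exponentially distributed random value by the odds and taking the smallest result gives each row a chance of winning proportional to its winning_odds. The WHERE clause matters because division by zero yields NULL in MySQL, and NULLs sort first.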
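Here is a minimal Python sketch of proportional weighted selection using exponential keys (-ln(1 - U) / odds, smallest key wins). The minimum of independent exponential variables with rates w_i lands on row i with probability w_i / sum(w), which is why this works. The function name weighted_pick and the random.Random parameter are assumptions for illustration. The technique is a swapped-in replacement for the plain winning_odds * rand() ordering, not something the original answer described.

```python
import math
import random
from typing import List, Optional, Tuple


def weighted_pick(rows: List[Tuple[str, float]], rng: random.Random) -> Optional[str]:
    """Pick one email with probability proportional to its odds.

    Each row gets the key -ln(1 - U) / odds with U uniform in [0, 1);
    the smallest key wins, like ORDER BY -LOG(1 - RAND()) / winning_odds LIMIT 1.
    """
    best_email: Optional[str] = None
    best_key = math.inf
    for email, odds in rows:
        if odds <= 0:
            continue  # zero odds can never win (mirrors WHERE winning_odds > 0)
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        key = -math.log(1.0 - rng.random()) / odds
        if key < best_key:
            best_email, best_key = email, key
    return best_email
```

With the question's five rows (odds 3, 5, 2, 1, 9, total 20), test5#test.com should win about 9 / 20 = 45% of repeated draws.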
I really liked this question; here is an answer for PostgreSQL.
select
t.*, g.n
from
mytable t
cross join generate_series(1, t.winning_odds) as g(n)
order by
random()
limit 1;
This is how it works: for each row of your table, generate_series replicates the row N times, where N is its winning_odds.
So, before the LIMIT is applied, the shuffled result looks like this:
5 | test5#test.com | 9 | 9
2 | test2#test.com | 5 | 3
3 | test3#test.com | 2 | 1
1 | test#test.com | 3 | 1
5 | test5#test.com | 9 | 5
1 | test#test.com | 3 | 3
5 | test5#test.com | 9 | 2
2 | test2#test.com | 5 | 4
2 | test2#test.com | 5 | 5
5 | test5#test.com | 9 | 1
4 | test4#test.com | 1 | 1
5 | test5#test.com | 9 | 7
5 | test5#test.com | 9 | 4
5 | test5#test.com | 9 | 6
2 | test2#test.com | 5 | 1
5 | test5#test.com | 9 | 8
3 | test3#test.com | 2 | 2
1 | test#test.com | 3 | 2
2 | test2#test.com | 5 | 2
5 | test5#test.com | 9 | 3
Now, picking any row of the generated table at random reflects the probabilities in your winning_odds field.
All you have to do is order it randomly and take the first record, for example:
5 | test5#test.com | 9 | 2
Regards
I am speculating that the "odds" are not integers and that you want something that has a "9" to be nine times more likely than a "1".
The proper way to do this is with a cumulative sum: generate a random value between 0 and the sum of all odds, then choose the record whose range [cumsum, cumsum + odds) contains that value. The following query does this in MySQL:
select t.*
from (select t.*,
coalesce((select sum(odds) from t t2 where t2.id < t.id), 0) as cumsum,
const.val
from t cross join
(select rand()*sum(odds) as val from t) const
) t
where val >= cumsum and val < cumsum + t.odds
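However, this is doing a non-equijoin (t2.id < t.id), so every running total is recomputed from scratch, and it would probably be prohibitively expensive in MySQL. Other databases can compute a cumulative sum in a single pass with a window function. MySQL had no efficient way of doing this before version 8.0, which added window functions such as SUM(odds) OVER (ORDER BY id).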
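The cumulative-sum selection can be sketched in Python as follows. This is an illustration that assumes rows are (id, odds) pairs; pick_by_cumsum is a hypothetical name. itertools.accumulate plays the role of the running total that SUM(odds) OVER (ORDER BY id) would give. bisect then finds the row whose half-open range [cumsum, cumsum + odds) contains the random value in O(log n), instead of rescanning earlier rows the way the correlated subquery does.

```python
import bisect
import itertools
import random
from typing import List, Tuple


def pick_by_cumsum(rows: List[Tuple[int, float]], rng: random.Random) -> int:
    """rows are (id, odds) pairs (non-empty); returns the winning id."""
    rows = sorted(rows)  # fixed order by id, like "t2.id < t.id" in the SQL
    # upper[i] = cumsum[i] + odds[i], so row i owns [upper[i] - odds[i], upper[i]).
    upper = list(itertools.accumulate(odds for _id, odds in rows))
    val = rng.random() * upper[-1]  # like rand() * sum(odds)
    # First row whose upper bound is strictly greater than val; zero-odds rows are skipped.
    idx = min(bisect.bisect_right(upper, val), len(rows) - 1)
    return rows[idx][0]
```

Using a half-open range (rather than BETWEEN) guarantees that a value falling exactly on a boundary belongs to only one row.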
How to optimize the query depends on certain other factors in the problem. How many different values do "odds" take on? Can you use temporary tables?
I don't have the time right now to write out the full solution, but there is a more efficient way. The idea is to split the problem into two searches: the first finds which "odds" value wins, the second finds which row wins.
Here are the details:
(1) Summarize the data into a table by the odds. This table would have 11 rows, and contain the "odds" and the "count" for each.
(2) Calculate the cumulative sum of "count*odds" for each row, starting at 0 for the first row. You can use the above query as a guide; since this is such a small amount of data, it will run quickly.
(3) Calculate a random number as rand()*<sum of all odds>. Now, locate the odds value where the number is between cumsum and cumsum + count*odds.
(4) Now return to the original table and issue a query such as:
select *
from t
where odds = <winning odds>
order by rand()
limit 1
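Steps (1) to (4) above can be sketched in Python, assuming rows are (email, odds) pairs with positive integer odds; two_stage_pick is a hypothetical name. The first stage picks an odds value with probability count*odds / total. The second picks uniformly among the rows with that value (the ORDER BY RAND() LIMIT 1 step). So each row still wins with probability odds / total.

```python
import bisect
import itertools
import random
from collections import Counter
from typing import List, Tuple


def two_stage_pick(rows: List[Tuple[str, int]], rng: random.Random) -> str:
    """rows are (email, odds) pairs with positive integer odds."""
    # (1) Summarize by odds: one (odds, count) entry per distinct odds value.
    counts = sorted(Counter(odds for _email, odds in rows).items())
    # (2) Cumulative sum of count * odds; group i owns [upper[i] - count*odds, upper[i]).
    upper = list(itertools.accumulate(odds * count for odds, count in counts))
    # (3) Draw a number in [0, total) and find the group whose range contains it.
    val = rng.random() * upper[-1]
    idx = min(bisect.bisect_right(upper, val), len(counts) - 1)
    winning_odds = counts[idx][0]
    # (4) Uniform pick among the rows with the winning odds.
    return rng.choice([email for email, odds in rows if odds == winning_odds])
```

Only the small summary (one entry per distinct odds value) is searched by cumulative sum. The full table is touched once more, filtered to a single odds value.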
If I understand the question correctly, you are asking how to select a random record from the table. This should work:
SELECT *
FROM tableName
ORDER BY RAND() LIMIT 0,1;
It is still not clear how you are planning to use the winning_odds value.

SQL for last changed value before given index

My table tracks changes to a value that is otherwise constant.
For example this is what it might look like if resources 1 and 2 start with values of 100 and 50 respectively - then resource 1 increases to 110 on day 5 (but resource 2 is unchanged), then resource 2 changes to 60 on day 7.
resource_id | date_id | value
------------+---------+------
1 | 1 | 100
2 | 1 | 50
1 | 5 | 110
2 | 7 | 60
Is there a simple query to get the value of the resources on a specific day? Something like:
SELECT resource_id, date_id, value FROM Resources WHERE ??? -- day = 6
resource_id | date_id | value
------------+---------+------
1 | 5 | 110
2 | 1 | 50
Note: I don't want interpolation of the values - just the last set value.
I think this should work:
SELECT r.resource_id,
r.date_id,
r.value
FROM resources r
INNER JOIN (SELECT resource_id,
Max(date_id) AS date_id
FROM resources
WHERE date_id <= 6
GROUP BY resource_id) temp
ON r.resource_id = temp.resource_id
AND r.date_id = temp.date_id
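The "latest change on or before a given day" lookup that the subquery performs with MAX(date_id) ... WHERE date_id <= 6 can be sketched in Python with a sorted history per resource and a binary search. The row shape (resource_id, date_id, value) comes from the question. The function name values_as_of is hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


def values_as_of(rows: List[Tuple[int, int, int]], day: int) -> Dict[int, Tuple[int, int]]:
    """rows are (resource_id, date_id, value); returns {resource_id: (date_id, value)}."""
    history: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for resource_id, date_id, value in rows:
        history[resource_id].append((date_id, value))
    result: Dict[int, Tuple[int, int]] = {}
    for resource_id, changes in history.items():
        changes.sort()
        dates = [d for d, _v in changes]
        # Index of the last change with date_id <= day (the MAX(date_id) in the subquery).
        i = bisect.bisect_right(dates, day) - 1
        if i >= 0:  # resources with no change on or before `day` are omitted, as in the SQL
            result[resource_id] = changes[i]
    return result
```

With the question's four rows and day 6, this gives resource 1 → (5, 110) and resource 2 → (1, 50), the expected output.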

Counting messages per day (after 17:00 counts for next day)

I have two MySQL tables: stats (left) and messages (right)
+------------+---------+ +---------+------------+-----------+----------+
| _date | msgcount| | msg_id | _date | time | message |
+------------+---------+ +----------------------+-----------+----------+
| 2011-01-22 | 2 | | 1 | 2011-01-22 | 06:23:11 | foo bar |
| 2011-01-23 | 4 | | 2 | 2011-01-22 | 15:17:03 | baz |
| 2011-01-24 | 0 | | 3 | 2011-01-22 | 17:05:45 | foobar |
| 2011-01-25 | 1 | | 4 | 2011-01-22 | 23:58:13 | barbaz |
+------------+---------+ | 5 | 2011-01-23 | 00:06:32 | foo foo |
| 6 | 2011-01-23 | 13:45:00 | bar foo |
| 7 | 2011-01-25 | 02:22:34 | baz baz |
+---------+------------+-----------+----------+
I filled in stats.msgcount for illustration, but in reality it is still empty. I'm looking for a query to:
1. count the number of messages for every stats._date (notice the zero msgcount on 2011-01-24);
2. count all messages sent after 17:00:00 (5 p.m.; messages.time is in 24-hour format) toward the next day (notice that msg_id 3 and 4 count for 2011-01-23);
3. update stats.msgcount to hold all the counts.
I'm especially concerned about the "later than 17:00:00 count for next day" part. Is this possible in (My)SQL?
You could use:
UPDATE stats LEFT JOIN
( SELECT date(addtime(_date,time) + interval 7 hour) as corrected_date,
count(*) as message_count
FROM messages
GROUP BY corrected_date ) mc
ON stats._date = mc.corrected_date
SET stats.msgcount = COALESCE( mc.message_count, 0 )
However, this query requires the dates you are interested in to be in the stats table already. If you don't have them, make _date a primary or unique key (if it isn't one yet) and use:
INSERT IGNORE INTO stats(_date,msgcount)
SELECT date(addtime(_date,time) + interval 7 hour) as corrected_date,
count(*) as message_count
FROM messages
GROUP BY corrected_date
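The effect of date(addtime(_date, time) + interval 7 hour) combined with the LEFT JOIN and COALESCE can be reproduced in a few lines of Python on the question's sample messages. This is an illustrative sketch; msgcounts, MESSAGES and STATS_DATES are hypothetical names. Shifting every timestamp forward by 7 hours moves anything sent from 17:00:00 onward into the next calendar day. Dates with no messages then get 0, as COALESCE does in the UPDATE.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple

# The question's sample data: (msg_id, _date, time) and the dates present in stats.
MESSAGES: List[Tuple[int, str, str]] = [
    (1, "2011-01-22", "06:23:11"),
    (2, "2011-01-22", "15:17:03"),
    (3, "2011-01-22", "17:05:45"),
    (4, "2011-01-22", "23:58:13"),
    (5, "2011-01-23", "00:06:32"),
    (6, "2011-01-23", "13:45:00"),
    (7, "2011-01-25", "02:22:34"),
]
STATS_DATES: List[date] = [date(2011, 1, d) for d in (22, 23, 24, 25)]


def msgcounts(stats_dates: List[date], messages: List[Tuple[int, str, str]]) -> Dict[date, int]:
    corrected = Counter()
    for _msg_id, day, clock in messages:
        sent = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S")
        # +7 hours pushes anything from 17:00:00 onward into the next day.
        corrected[(sent + timedelta(hours=7)).date()] += 1
    # Like LEFT JOIN ... COALESCE(mc.message_count, 0): dates without messages get 0.
    return {d: corrected.get(d, 0) for d in stats_dates}
```

On this data the counts come out as 2, 4, 0 and 1 for 2011-01-22 through 2011-01-25, matching the msgcount column in the question.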
Really, all you're doing is shifting the times by 7 hours. Something like this should work:
UPDATE stats s
SET msgcount = (SELECT COUNT(m.msg_id) FROM messages m
WHERE TIMESTAMP(m._date, m.time) >= s._date - INTERVAL 7 HOUR
AND TIMESTAMP(m._date, m.time) < s._date + INTERVAL 17 HOUR);
The basic idea is that it takes each date in your stats table and counts the messages sent from 17:00 on the previous day up to (but not including) 17:00 on that date. TIMESTAMP(date, time) combines the separate DATE and TIME columns into a DATETIME; if you used a single DATETIME column instead, you could compare it directly.
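The per-date window approach (count the messages between 17:00 on the previous day and 17:00 on the stats date) can be sketched in Python as follows. The names msgcount_per_window, MESSAGES and STATS_DATES are hypothetical, and the half-open window [D - 7 hours, D + 17 hours) is an assumption chosen so that no message is counted twice at the 17:00:00 boundary.

```python
from datetime import date, datetime, time, timedelta
from typing import Dict, List, Tuple

# The question's sample data: (msg_id, _date, time) and the dates present in stats.
MESSAGES: List[Tuple[int, str, str]] = [
    (1, "2011-01-22", "06:23:11"),
    (2, "2011-01-22", "15:17:03"),
    (3, "2011-01-22", "17:05:45"),
    (4, "2011-01-22", "23:58:13"),
    (5, "2011-01-23", "00:06:32"),
    (6, "2011-01-23", "13:45:00"),
    (7, "2011-01-25", "02:22:34"),
]
STATS_DATES: List[date] = [date(2011, 1, d) for d in (22, 23, 24, 25)]


def msgcount_per_window(stats_dates: List[date],
                        messages: List[Tuple[int, str, str]]) -> Dict[date, int]:
    sent_times = [datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S")
                  for _msg_id, day, clock in messages]
    result: Dict[date, int] = {}
    for d in stats_dates:
        midnight = datetime.combine(d, time())
        # Window for stats date D: 17:00 the day before (inclusive) up to 17:00 on D (exclusive).
        start, end = midnight - timedelta(hours=7), midnight + timedelta(hours=17)
        result[d] = sum(1 for sent in sent_times if start <= sent < end)
    return result
```

Because it iterates over the stats dates rather than over the messages, a date with no messages (2011-01-24 here) naturally gets a count of 0.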
So all you'd need to do is insert a new row in the stats table with a 0 for the msgcount, and run the update command. If you only wanted to update a few days (since the message count probably isn't changing 6 days later) you just need a simple where clause on the update:
UPDATE stats s
SET ...
WHERE s._date BETWEEN '2012-04-03' AND '2012-04-08'