I have a table that's structured like so:
id value hour
1 4 176475
2 2 176475
3 3 176475
4 2 176475
1 2 184563
2 1 184563
3 4 184563
4 3 184563
... ... ...
1 2 N
2 3 N
3 1 N
4 4 N
The key property is that the data is split into hours which are in ascending order. The 'hours' are timestamps truncated to enforce 24 buckets per day. I want to do several things:
Pull all of the rows for the first hour
Sum values for each ID over 3 hours, 8 hours...N hours.
Is there a simple way to do this? I am aware that I could use NTILE to label the data but that's a very expensive operation in Spark.
EDIT:
Expected Result for aggregating hours 1-3:
id value
1 9
2 7
3 10
4 8
The values are made up, but the idea is to sum the values of the IDs in each of the 3 hours, so that I have one value per ID, instead of three.
This is the query you're looking for:
SELECT id, SUM(value) as `value`
FROM yourTableHere
WHERE hour between (NOW() - INTERVAL X HOUR) AND NOW()
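GROUP BY id
Breaking the query down:
Select the ID and sum the values from yourTable.
Keep only the rows where hour is between X hours ago and now (this assumes hour holds a datetime; with integer hour buckets, compare against the bucket numbers instead).
Group the results by id alone, so each ID gets a single total; grouping by hour as well would return one row per ID per hour.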
Replace X with 1/3/8 or more for the hours you wish.
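For integer hour buckets like the ones in the question, a hedged variant is to pick the N earliest distinct hour values and sum per id. The sketch below runs against an in-memory SQLite database from Python purely for illustration; the table name t, the helper sum_first_n_hours, and the third hour bucket 190000 with its values are assumptions made up for this example.

```python
import sqlite3

def sum_first_n_hours(conn, n):
    """Sum value per id over the n earliest hour buckets in table t."""
    sql = """
        SELECT t.id, SUM(t.value) AS value
        FROM t
        JOIN (SELECT DISTINCT hour FROM t ORDER BY hour LIMIT ?) AS h
          ON h.hour = t.hour
        GROUP BY t.id
        ORDER BY t.id
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER, hour INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 4, 176475), (2, 2, 176475), (3, 3, 176475), (4, 2, 176475),
    (1, 2, 184563), (2, 1, 184563), (3, 4, 184563), (4, 3, 184563),
    # hypothetical third hour, not taken from the question
    (1, 2, 190000), (2, 3, 190000), (3, 1, 190000), (4, 4, 190000),
])

first_hour = sum_first_n_hours(conn, 1)   # rows of the first hour, one per id
three_hours = sum_first_n_hours(conn, 3)  # one summed value per id over 3 hours
print(first_hour)
print(three_hours)
```

Passing 1 returns the first hour only; passing 3, 8 or any N widens the window without relabelling rows.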
I have a table,
Name Seconds Status_measure
a 0 10
a 10 13
a 20 -1
a 30 15
a 40 20
a 50 12
a 60 -1
Here I want for a particular name a new column which is calculated by, "The number of times the value goes >-1 only after once the -1 is met" . So in this particular data I want a new column for the name "a" which has the value=3 , because once the -1 is reached in Status_measure, we have 3 values (15 and 20 and 12)>-1
Required data frame:
Id Name Seconds Status_measure Value
1 a 0 10 3
2 a 10 13 3
3 a 20 -1 3
4 a 30 15 3
5 a 40 20 3
6 a 50 12 3
7 a 60 -1 3
I tried doing
count(status_measure>-1) over (partition by name order by seconds)
But this does not give the desired result.
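You can do it in two steps: first assign each row a group number (grp) that increases every time a -1 is met, then count the entries greater than -1 in the group with grp = 1.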
select *, sum(Status_measure > -1 and grp = 1) over(partition by name) n
from (
select *
, row_number() over(partition by name order by Seconds) - sum(Status_measure > -1 ) over(partition by name order by Seconds) grp
from tbl
) t
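To make the intermediate grp values visible, here is a minimal sketch that runs the same two-step query on the sample data. It uses an in-memory SQLite database from Python (SQLite 3.25 or later is assumed for window functions) as a stand-in for MySQL; only the column selection in the outer query is trimmed for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT, Seconds INTEGER, Status_measure INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("a", 0, 10), ("a", 10, 13), ("a", 20, -1), ("a", 30, 15),
    ("a", 40, 20), ("a", 50, 12), ("a", 60, -1),
])

# Step 1: grp stays flat on rows > -1 and grows by one on every -1 row.
inner_sql = """
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY Seconds)
           - SUM(Status_measure > -1) OVER (PARTITION BY name ORDER BY Seconds) AS grp
    FROM tbl
"""
# Step 2: count the rows > -1 that belong to the group after the first -1.
outer_sql = f"""
    SELECT Seconds, Status_measure, grp,
           SUM(Status_measure > -1 AND grp = 1) OVER (PARTITION BY name) AS n
    FROM ({inner_sql}) AS t
    ORDER BY Seconds
"""
result = conn.execute(outer_sql).fetchall()
for row in result:
    print(row)
```

The rows before the first -1 get grp = 0, the rows from the first -1 up to the second get grp = 1, and n is 3 on every row.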
Another option is a user variable that acts as a toggle, which:
starts from 0
increases to 1 when it reaches a -1
decreases back to 0 when it reaches a second -1
Once you have this column, you can run a sum over its values.
SET @change = 0;
SELECT *, SUM(CASE WHEN Status_measure = -1
THEN IF(@change=0, @change := @change + 1, @change := @change - 1)
ELSE @change END) OVER() -1 AS Value_
FROM tab
Limitations: this solution assumes you have only one range of interesting values between -1s.
Note: there's a -1 decrement from your sum because the first update of the variable will leave 1 in the same row of -1, which you don't want. For better understanding, comment out the application of SUM() OVER and see intermediate output.
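The toggle logic can be traced outside the database. The following Python sketch mirrors it row by row, including the final decrement; the function name count_after_first_marker is hypothetical, and it shares the limitation stated above (a single range between two -1 values).

```python
def count_after_first_marker(status_values, marker=-1):
    """Mirror the toggle-variable query: flag rows, sum the flags, subtract 1."""
    change = 0   # plays the role of the user variable
    total = 0
    for status in status_values:
        if status == marker:
            # first marker switches the flag on, second switches it off
            change = 1 if change == 0 else 0
        total += change  # per-row flag value that the window SUM adds up
    # the opening marker row itself carries a 1, hence the decrement
    return total - 1

flags_total = count_after_first_marker([10, 13, -1, 15, 20, 12, -1])
print(flags_total)  # 3
```

The per-row flags for the sample data are 0, 0, 1, 1, 1, 1, 0, which sum to 4 before the decrement.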
First, a clarification of your question. I want to expand your original data to contrast the run of 2 entries with the run of 3. Also, is there an auto-increment ID in your data that defines the sequence of rows, such as
Id Name Seconds Status_measure Value
1 a 0 10 3
2 a 10 13 3
3 a 20 -1 3
4 a 30 15 3
5 a 40 20 3
6 a 50 12 3
7 a 60 -1 3
If the rows are sequential, IDs 1 and 2 are above -1 before the -1 at ID 3, which gives two entries. IDs 4-6 are above -1 before the -1 at ID 7, which gives three entries.
So, what "VALUE" do you want in your result? A value of 2 for IDs 1, 2 and 3, and a value of 3 for IDs 4-7? Or should every row show the greatest count found before a -1 measure, that is 3 for all entries?
Please edit your question (you can copy and paste this table into it) and provide the requested clarification, including whether there is an auto-increment ID, since that affects the final output and how the breaks are determined.
I keep on looking around StackOverflow for a similar question but it seems that I can't find one. I would like to know the difference between timestamps in different rows grouped by employee ID.
Time Logs table:
id timestamp log_type
1 2019-06-19 12:34:50 log_in
2 2019-06-19 13:12:46 start_break
3 2019-06-19 13:13:56 end_break
4 2019-06-19 17:23:40 start_break
5 2019-06-19 17:44:36 end_break
6 2019-06-19 19:00:04 start_break
7 2019-06-19 19:03:17 end_break
8 2019-06-19 20:05:54 log_out
What I'm trying to accomplish is to calculate all duration of breaks. In this case, 1st break (id #2 and #3) is 1 minute and 10 seconds, 2nd break (id #4 and #5) is 20 minutes and 56 seconds, 3rd break (id #6 and #7) is 3 minutes and 13 seconds thus with the total of 25 minutes and 19 seconds.
Thanks for helping out! Much appreciated.
You can try below -
select SEC_TO_TIME(sum(diff)) as result from
(
select
timestampdiff(second,min(case when log_type='start_break' then `timestamp` end) ,
min(case when log_type='end_break' then `timestamp` end)) as diff
from t
-- assumes each break starts and ends within the same clock hour, with at most one break per hour
group by date(`timestamp`),hour(`timestamp`)
)A
OUTPUT:
result
00:25:19
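Because the query above groups by clock hour, it relies on every break starting and ending inside the same hour. A hedged alternative is to pair each start_break with the end_break that follows it; the Python sketch below shows only that pairing logic on the sample log, and the function name total_break_time is an assumption for this example.

```python
from datetime import datetime, timedelta

def total_break_time(logs):
    """Sum end_break - start_break over rows given in chronological order."""
    total = timedelta()
    break_start = None
    for stamp, log_type in logs:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if log_type == "start_break":
            break_start = moment
        elif log_type == "end_break" and break_start is not None:
            total += moment - break_start
            break_start = None  # wait for the next start_break
    return total

logs = [
    ("2019-06-19 12:34:50", "log_in"),
    ("2019-06-19 13:12:46", "start_break"),
    ("2019-06-19 13:13:56", "end_break"),
    ("2019-06-19 17:23:40", "start_break"),
    ("2019-06-19 17:44:36", "end_break"),
    ("2019-06-19 19:00:04", "start_break"),
    ("2019-06-19 19:03:17", "end_break"),
    ("2019-06-19 20:05:54", "log_out"),
]
total = total_break_time(logs)
print(total)  # 0:25:19
```

To group by employee as the question asks, the same loop would be run once per employee ID; that column is not in the sample table, so it is left out here.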
I am trying to get the total full time and total half time per user. Timing is stored in a single column: the integer part of each timing should be summed into fulltime, while each fractional (.5) value should be counted in halftime.
id user_id timing
1 2 1
2 2 2.5
3 1 1.5
4 1 1
5 3 3
6 2 2.5
I need the result as
user_id fulltime halftime
1 2 1
2 5 2
3 3 0
SELECT user_id
, SUM(FLOOR(timing)) AS fulltime
, SUM((timing % 1) * 2) AS halftime
FROM your_table -- placeholder name; TABLE is a reserved word in MySQL
GROUP BY user_id;
This query should help you. Please try it on your data:
SELECT user_id,
sum(floor(timing)) as fulltime,
sum(if(ceil(timing)>timing,1,0)) as halftime
FROM rest
GROUP BY user_id
Thanks
Amit
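As a quick check of the rule in the question (sum the integer parts, count the fractional entries), here is a small Python sketch run on the sample rows. The function name full_and_half_time is hypothetical, and timings are assumed to be non-negative.

```python
from collections import defaultdict

def full_and_half_time(rows):
    """Return {user_id: (fulltime, halftime)} from (user_id, timing) pairs."""
    totals = defaultdict(lambda: [0, 0])
    for user_id, timing in rows:
        whole = int(timing)            # integer part (timing assumed >= 0)
        totals[user_id][0] += whole    # fulltime: sum of integer parts
        if timing != whole:
            totals[user_id][1] += 1    # halftime: count of fractional entries
    return {user: tuple(pair) for user, pair in sorted(totals.items())}

rows = [(2, 1), (2, 2.5), (1, 1.5), (1, 1), (3, 3), (2, 2.5)]
summary = full_and_half_time(rows)
print(summary)  # {1: (2, 1), 2: (5, 2), 3: (3, 0)}
```

The output matches the expected result table in the question.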
In mysql, I need a query that returns the quantity of repeated values in the field "Info" of my table "Log".
Table Log:
ID_Log User Info
1 1 3
2 1 3
3 1 3
4 1 5
5 1 6
6 1 6
7 1 7
8 1 8
9 1 8
The query should return "4" (Info 3 appears three times, Info 6 appears two times, Info 8 appears two times).
Any suggestions?
You can get the number of values that have already appeared by using a simple subtraction. Subtract the number of distinct values from the total number of rows:
select count(*) - count(distinct info)
from log;
The difference is the number that "repeat".
This should work. Group the values of info together and only keep the results where the number of occurrences minus 1 is greater than 0. Then sum the numbers of occurrences.
select sum(repeats)
from (SELECT Info, count(*) - 1 AS repeats
FROM Log
GROUP BY Info
HAVING repeats > 0) AS t
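Both approaches can be checked against the sample table. The sketch below loads the nine rows into an in-memory SQLite database from Python (a stand-in for MySQL, chosen only so the example is self-contained) and runs each query; the derived table alias t is an assumption added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Log (ID_Log INTEGER, User INTEGER, Info INTEGER)")
conn.executemany("INSERT INTO Log VALUES (?, ?, ?)", [
    (1, 1, 3), (2, 1, 3), (3, 1, 3), (4, 1, 5), (5, 1, 6),
    (6, 1, 6), (7, 1, 7), (8, 1, 8), (9, 1, 8),
])

# Approach 1: total rows minus distinct values.
by_subtraction = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT Info) FROM Log"
).fetchone()[0]

# Approach 2: per-value repeats, keeping only values that repeat, then summed.
by_grouping = conn.execute("""
    SELECT SUM(repeats)
    FROM (SELECT Info, COUNT(*) - 1 AS repeats
          FROM Log
          GROUP BY Info
          HAVING COUNT(*) - 1 > 0) AS t
""").fetchone()[0]

print(by_subtraction, by_grouping)  # 4 4
```

Nine rows minus five distinct Info values gives 4, the same as 2 + 1 + 1 repeats for the values 3, 6 and 8.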
I've read similar questions here on stackoverflow, but the OP's table structure is never quite the same as mine, so the answer doesn't work for me. The posts I've read are only trying to GROUP BY one column as opposed to two. I'm using MySQL, latest stable release.
Here's my table "reference":
id formatID referenceTime
1 1 2011-6-12 12:40
2 2 2011-6-12 1:04
3 4 2011-6-12 1:03
4 2 2011-6-12 15:20
5 3 2011-6-12 9:30
6 3 2011-6-12 2:55
7 5 2011-6-12 13:15
8 1 2011-6-12 12:32
(etc)
I want to create a query that shows how many of each type of format occurred by hour of day. The point of this is to see what is the busiest time of day. I am trying to write a query that will create output that I can use for some simple graph web apps (Highcharts.js). I want it to look like this:
Timeofday Subgroup Count
12AM 1 2
12AM 2 6
12AM 3 7
12AM 4 2
12AM 5 0
1AM 1 3
1AM 2 3
1AM 3 0
1AM 4 0
1AM 5 1
(etc)
I'm using this query:
SELECT date_format(referenceTime,'%I %p') AS timeofday,
reference.referenceFormatID AS subgroup,
count(*) AS count
FROM reference
GROUP BY timeofday,subgroup ASC
However, the output skips "rows" where the count equals zero and so ends up looking like this:
Timeofday Subgroup Count
12AM 1 2
12AM 2 6
1AM 3 7
1AM 4 2
1AM 5 1
3AM 1 3
6AM 2 3
7AM 3 1
7AM 4 1
9AM 5 1
(etc)
I need those zeros to be able to create a properly formatted data series for my app.
The LEFT JOIN method where you put all the times into a second table isn't working for me because I am grouping by two different columns. Apparently, the LEFT JOIN criteria is satisfied as long as each hour shows up somewhere in the output table, but I need each hour to appear for each format.
Any suggestions?
You have two options: either create a lookup table with the possible hours in it, or use a strange query involving the DUAL table and UNION to generate the values that you are looking for.
In the first case, you would have a table with a single field for the moment; let's call the table hours and the field timeofday.
In the timeofday column of the hours table, you would have the following data:
timeofday
12AM
1AM
2AM
....
Then your query is as simple as
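SELECT hours.timeofday,
reference.referenceFormatID AS subgroup,
count(reference.referenceFormatID) AS count
FROM hours
-- '%l%p' yields values such as '1AM' that match the lookup table; '%I %p' would yield '01 AM'
LEFT JOIN reference on date_format(referenceTime,'%l%p') = hours.timeofday
GROUP BY hours.timeofday,subgroup
ORDER BY hours.timeofday,subgroup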
EDIT
To get all combinations, you would also need a formats table with all the possible formatIDs as was mentioned by rfausak. You could also do this with a distinct, but let's just assume that you have this table, let's call it formats. Again, this table could have a single column.
Part 1 is to get all the combinations:
SELECT hours.timeofday,
formats.ID
from hours
join formats
This is a Cartesian join that would merge all possible hours and format IDs.
Now we add in the LEFT JOIN
SELECT hours.timeofday,
formats.ID,
count(reference.referenceFormatID)
FROM hours
CROSS JOIN formats
LEFT JOIN reference on date_format(referenceTime,'%l%p') = hours.timeofday
AND reference.referenceFormatID = formats.ID
GROUP BY hours.timeofday,formats.ID
ORDER BY hours.timeofday,formats.ID
If you try to do it using a DUAL table lookup, you can use a method similar to the one used to generate days from a date range.
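The lookup-table approach (cross join hours with formats, then left join the data) can be sketched end to end as follows. This runs on an in-memory SQLite database from Python rather than MySQL, so strftime stands in for date_format, the hours table holds integers 0-23 instead of labels like 12AM, and the column is named formatID as in the question's table listing. The format IDs 1-5 and the zero-padded timestamps are assumptions based on the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hours (hour INTEGER PRIMARY KEY);
    CREATE TABLE formats (id INTEGER PRIMARY KEY);
    CREATE TABLE reference (id INTEGER PRIMARY KEY, formatID INTEGER, referenceTime TEXT);
""")
conn.executemany("INSERT INTO hours VALUES (?)", [(h,) for h in range(24)])
conn.executemany("INSERT INTO formats VALUES (?)", [(f,) for f in range(1, 6)])
conn.executemany("INSERT INTO reference VALUES (?, ?, ?)", [
    (1, 1, "2011-06-12 12:40"), (2, 2, "2011-06-12 01:04"),
    (3, 4, "2011-06-12 01:03"), (4, 2, "2011-06-12 15:20"),
    (5, 3, "2011-06-12 09:30"), (6, 3, "2011-06-12 02:55"),
    (7, 5, "2011-06-12 13:15"), (8, 1, "2011-06-12 12:32"),
])

sql = """
    SELECT h.hour, f.id, COUNT(r.id) AS n
    FROM hours AS h
    CROSS JOIN formats AS f          -- every hour paired with every format
    LEFT JOIN reference AS r         -- unmatched pairs survive with COUNT = 0
           ON CAST(strftime('%H', r.referenceTime) AS INTEGER) = h.hour
          AND r.formatID = f.id
    GROUP BY h.hour, f.id
    ORDER BY h.hour, f.id
"""
counts = {(hour, fmt): n for hour, fmt, n in conn.execute(sql)}
print(len(counts), counts[(12, 1)], counts[(0, 1)])  # 120 2 0
```

Counting r.id rather than * is what turns unmatched hour/format pairs into zeros instead of ones.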