I have a dataset that I need to do some aggregation on to display.
Date, Rank, Vote
07/20/2013, 8, -1
07/21/2013, 8, -1
07/23/2013, 7, -1
07/24/2013, 6, 1
07/25/2013, 7, 1
07/26/2013, 7, -1
07/27/2013, 8, 1
07/28/2013, 8, 1
07/29/2013, 8, -1
I'd like to group by consecutive ranks, summing the vote, and choosing the first of the grouped dates. The above data set would return:
07/20/2013,8,-2
07/23/2013,7,-1
07/24/2013,6,1
07/25/2013,7,0
07/27/2013,8,1
My first thought was GROUP BY Date and Rank, but that wouldn't consolidate multiple days. I can do this after the fact by looping through the data, or I can use a cursor within a stored procedure, but I'm curious whether there is a simpler way with a SQL query.
This does it:
SELECT firstDate, Rank, SUM(vote) Votes
FROM (
    SELECT @first_date := CASE WHEN Rank = @prev_rank
                               THEN @first_date
                               ELSE Date
                          END firstDate,
           @prev_rank := Rank curRank,
           data.*
    FROM (SELECT @first_date := NULL, @prev_rank := NULL) init
    JOIN (SELECT Date, Rank, Vote
          FROM MyTable
          ORDER BY Date) data
) grouped
GROUP BY firstDate, Rank
SQLFIDDLE
The most straightforward way I can see is what you already pointed out.
Use a GROUP BY query:
SELECT date, rank, SUM(vote) FROM YourTable
GROUP BY date, rank
Fiddle Demo: http://sqlfiddle.com/#!2/d65d5c/3
Iterate through this record set in your program and do what is needed to get your data.
(If you tag your question with a programming language, I can show you how).
So basically, no, I can't see a better way to do this. As far as I can see, doing it purely in SQL would end up as a fairly complicated and slow query. But maybe someone can teach me better.
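The "iterate through the record set" approach could look roughly like the following. This is only a sketch in Python (the question does not name a language, so that choice is an assumption), and the function name collapse_runs is made up for illustration. It assumes the rows arrive already ordered by date.

```python
from itertools import groupby
from operator import itemgetter

# Rows from the question as (date, rank, vote), already ordered by date.
rows = [
    ("07/20/2013", 8, -1),
    ("07/21/2013", 8, -1),
    ("07/23/2013", 7, -1),
    ("07/24/2013", 6, 1),
    ("07/25/2013", 7, 1),
    ("07/26/2013", 7, -1),
    ("07/27/2013", 8, 1),
    ("07/28/2013", 8, 1),
    ("07/29/2013", 8, -1),
]


def collapse_runs(rows):
    """Merge consecutive rows that share a rank: keep the first date, sum the votes."""
    result = []
    # groupby only merges adjacent items with equal keys, which is exactly
    # the "consecutive ranks" behaviour asked for.
    for rank, run in groupby(rows, key=itemgetter(1)):
        run = list(run)
        first_date = run[0][0]
        total_vote = sum(vote for _, _, vote in run)
        result.append((first_date, rank, total_vote))
    return result


for row in collapse_runs(rows):
    print(row)
```

Because itertools.groupby only groups adjacent equal keys, the two separate runs of rank 8 stay separate, which a plain GROUP BY would not do.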
I have a MySQL table with two columns: takenOn(datetime), and count(int). count contains the number of steps I have taken.
I'm trying to write a query that will tell me the time when I meet my goal of 10,000 steps every day.
So far, I have the following query:
SET @runningTotal = 0;
SELECT
    `Date`,
    DATE_FORMAT(MIN(takenOn), '%l:%i %p') AS `Time`,
    TotalCount
FROM
    (SELECT
        DATE(s.takenOn) AS `Date`,
        s.takenOn,
        s.`count`,
        @runningTotal := @runningTotal + s.`count` AS TotalCount
    FROM
        (SELECT * FROM step WHERE DATE(takenOn) = '2016-10-29') s) temp
WHERE TotalCount >= 10000;
This works, but of course gives me the MIN(takenOn) for October 29th only. How can I expand this query to give me MIN(takenOn) for all possible dates in the table?
Thank you!
I am assuming that the steps you care about are all within one day. You are on the right track. Here is the code for multiple days:
SELECT `Date`, DATE_FORMAT(MIN(takenOn), '%l:%i %p') AS `Time`,
       MIN(TotalCount)
FROM (SELECT DATE(s.takenOn) AS `Date`,
             s.takenOn,
             s.`count`,
             (@runningTotal := if(@d = DATE(s.takenOn), @runningTotal + s.`count`,
                                  if(@d := DATE(s.takenOn), s.`count`, s.`count`)
                                 )
             ) AS TotalCount
      FROM step s CROSS JOIN
           (SELECT @runningTotal := 0, @d := '') params
      ORDER BY takenOn
     ) s
WHERE TotalCount >= 10000
GROUP BY `Date`;
Note that all the variable assignments are in one expression. This is important because MySQL does not guarantee the order of evaluation of expressions in a SELECT. So, if you split the assignments across more than one expression, you are not guaranteed that the code will work.
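For readers who find the nested if() hard to follow, the same logic (reset the running total whenever the day changes, then record the first reading at which the total reaches the goal) can be sketched outside SQL. The sample rows and the function name goal_times below are hypothetical and only serve to illustrate the idea.

```python
from datetime import datetime

# Hypothetical (takenOn, count) rows standing in for the step table.
steps = [
    (datetime(2016, 10, 29, 8, 0), 4000),
    (datetime(2016, 10, 29, 12, 30), 5000),
    (datetime(2016, 10, 29, 18, 15), 2000),
    (datetime(2016, 10, 29, 21, 0), 1000),
    (datetime(2016, 10, 30, 9, 0), 6000),
    (datetime(2016, 10, 30, 20, 0), 3000),
]


def goal_times(steps, goal=10000):
    """Map each day to the first takenOn at which its running total reached goal."""
    reached = {}
    current_day = None
    running_total = 0
    for taken_on, count in sorted(steps):  # process in time order
        day = taken_on.date()
        if day != current_day:
            # New day: restart the running total, like the @d check in the query.
            current_day = day
            running_total = 0
        running_total += count
        if running_total >= goal and day not in reached:
            reached[day] = taken_on
    return reached


for day, moment in goal_times(steps).items():
    print(day, moment.strftime("%I:%M %p"))
```

Days that never reach the goal (October 30 in this sample) simply do not appear in the result, which matches the effect of the WHERE TotalCount >= 10000 filter.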
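You can use GROUP BY with a HAVING clause to find the days on which the goal was met (this gives the daily total, not the time at which the goal was reached). For example:
SELECT
    DATE(takenOn) AS `Date`, SUM(`count`) AS TotalCount
FROM
    step
GROUP BY
    DATE(takenOn)
HAVING SUM(`count`) >= 10000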
I have a SQL table in which each row holds the revenue for a specific day. I want to add a new column whose value is the change in revenue (positive or negative) between that day and the previous day. How can I implement this in SQL?
Here is an example,
original table,
...
Day1 100
Day2 200
Day3 150
...
new table (with the incremental column added at the end; the first row can be assigned zero),
Day1 100 0
Day2 200 100
Day3 150 -50
I am using MySQL/MySQL Workbench.
thanks in advance,
Lin
SELECT a.day, a.revenue, COALESCE(a.revenue - b.revenue, 0) AS incremental
FROM DailyRevenue a
LEFT JOIN DailyRevenue b ON b.day = a.day - 1
This query assumes that each day has exactly one row in the table and that day is a consecutive integer (for a DATE column, join on b.day = a.day - INTERVAL 1 DAY instead). If there could be more than one row per day, you need to create a view that sums the revenue grouped by day and query that view.
If you're okay with re-ordering the columns slightly, something like this is pretty simple to understand:
SET @prev := 0;
SELECT day, revenue - @prev AS diff, @prev := revenue AS revenue
FROM revenue ORDER BY day ASC;
The trick is that we calculate the difference to the previous first, then set the previous to the current and display it as the current in one step.
Note, this depends on the order being correct since the calculations are done during the returning of the rows, so you need to make sure you have an ORDER BY clause that returns the days in the correct order.
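The "remember the previous row" idea is easier to see when written as an explicit loop. The following is a rough sketch using the three sample rows from the question; the function name with_increment is invented for illustration, and the first row is given a difference of zero as the question requests.

```python
# Sample (day, revenue) rows from the question, already ordered by day.
revenue_rows = [("Day1", 100), ("Day2", 200), ("Day3", 150)]


def with_increment(rows):
    """Append the change from the previous row's revenue; the first row gets 0."""
    out = []
    prev = None
    for day, amount in rows:
        diff = 0 if prev is None else amount - prev
        out.append((day, amount, diff))
        prev = amount  # becomes the "previous" value for the next row
    return out


for row in with_increment(revenue_rows):
    print(row)
```

As with the SQL version, the result is only meaningful if the rows are processed in day order.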
Try:
select
    t.date_col, t.val_col,
    case when t1.val_col is null then 0
         else t.val_col - t1.val_col end diff
from (
    select t.*, @r := @r + 1 lev
    from tbl t,
         (select @r := 0) r
    order by t.date_col
) t
left join (
    select t.*, @r1 := @r1 + 1 lev
    from tbl t,
         (select @r1 := 1) r
    order by t.date_col
) t1
on t.lev = t1.lev
This calculates the difference between consecutive rows even if a date is missing.
I have a table with columns like this:
id | timestamp | ...
and I am looking for rows where the timestamp decreased since the previous row.
I tried a statement like this:
SELECT count(a.id)
FROM tbl AS a INNER JOIN tbl AS b ON a.id+1=b.id
WHERE a.timestamp<b.timestamp;
but it appears not to have worked. I get zero results even though I expect some. Any suggestions what is wrong?
I would also appreciate any ideas on a better way to write this query.
I am using MySQL.
You can get the previous value using a correlated subquery, and then use that for the comparison:
select t.*
from (select t.*,
             (select t2.timestamp from tbl t2 where t2.id < t.id order by t2.id desc limit 1
             ) as prevts
      from tbl t
     ) t
where timestamp < prevts;
The problem with your query is probably that the ids have gaps in them.
EDIT:
You can do this with variables. The challenge is getting the variable comparison and assignment in a single expression. This is needed because MySQL does not guarantee the order of evaluation of expressions in a select statement.
The following computes IsDecreasing and updates @prev in a single expression:
select t.*
from (select t.*,
             if(@prev > timestamp, if(@prev := timestamp, 1, 1),
                if(@prev := timestamp, 0, 0)
               ) IsDecreasing
      from tbl t cross join
           (select @prev := -1) vars
      order by id
     ) t
where IsDecreasing = 1;
This should be faster than the previous method -- probably even when you have the right index.
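To see why gaps in the ids break the a.id+1=b.id join while the "previous row" approach does not, here is a small sketch of the comparison as a loop. The sample readings and the name decreasing_rows are hypothetical.

```python
# Hypothetical (id, timestamp) rows; note the gaps in the ids (2 -> 5, 6 -> 9).
readings = [(1, 100), (2, 130), (5, 120), (6, 125), (9, 90)]


def decreasing_rows(rows):
    """Return the rows whose timestamp is lower than that of the previous row by id."""
    found = []
    prev_ts = None
    for row_id, ts in sorted(rows):  # walk the rows in id order
        if prev_ts is not None and ts < prev_ts:
            found.append((row_id, ts))
        prev_ts = ts  # always remember the latest timestamp
    return found


print(decreasing_rows(readings))
```

Both decreasing rows in this sample (ids 5 and 9) follow a gap, so a join on consecutive ids would find neither of them.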
I have a table containing date, id, and value, with about 1000 id rows per date. I need to calculate the percentile rank of each row, by date. I am using the following code for percentile rank for a single date, but with over 10 years of daily data this is very inefficient to run date-by-date. Seems that it should be able to be formulated in MySQL but I've not been able to make it work.
Date ID Value
date1 01 -7.2
date1 02 0.6
date2 01 1.2
date2 02 3.8
SELECT c.id, c.value, ROUND( (
    (@rank - rank) / @rank ) *100, 2) AS rank
FROM (
    SELECT * , @prev := @curr , @curr := a.value,
        @nxtRnk := @nxtRnk + 1,
        @rank := IF( @prev = @curr , @rank , @nxtRnk ) AS rank
    FROM (
        SELECT id, value
        FROM temp
        WHERE date = '2013-06-28'
    ) AS a, (
        SELECT @curr := NULL , @prev := NULL , @rank :=0, @nxtRnk :=0
    ) AS b
    ORDER BY value DESC
) AS c
So basically I want to SELECT DISTINCT(date), and then for each date perform the above SELECT, which is preceded by INSERT INTO table2( ... ) to write the results to table2.
Thanks for any help,
Hugh
I finally developed an acceptable solution by using a temporary table. It may not be the optimal solution, but it runs in about 5 seconds on a table with over a million records.
My temporary table (t1) contains date and the count of rows for date.
The third select above is changed to
SELECT t1.date, t1.cnt, id, value FROM t1 LEFT JOIN temp ON(t1.date = temp.date)
Also, the calculations in the first SELECT above were changed to use c.cnt rather than @rank, and an @prevDate variable was created to reset the rank count on date changes.
Thanks to anyone who looked at this and tried to work up a solution.
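The full modified query is not shown above, so the following is only a guess at the logic it describes: rank the values within each date in descending order, let ties share a rank, and compute (cnt - rank) / cnt * 100 using the per-date row count. The extra tied row for date1 and the function name percentile_ranks are assumptions added for illustration.

```python
from collections import defaultdict

# (date, id, value) rows: the question's sample plus one hypothetical tie for date1.
data = [
    ("date1", "01", -7.2),
    ("date1", "02", 0.6),
    ("date1", "03", 0.6),  # hypothetical row, added to show tie handling
    ("date2", "01", 1.2),
    ("date2", "02", 3.8),
]


def percentile_ranks(rows):
    """Return {(date, id): percentile} with ranks restarted for every date."""
    by_date = defaultdict(list)
    for date, row_id, value in rows:
        by_date[date].append((row_id, value))

    result = {}
    for date, items in by_date.items():
        cnt = len(items)  # plays the role of t1.cnt
        items.sort(key=lambda item: item[1], reverse=True)
        rank = 0
        prev_value = None
        for position, (row_id, value) in enumerate(items, start=1):
            if value != prev_value:
                rank = position  # ties keep the rank of the first equal value
            prev_value = value
            result[(date, row_id)] = round((cnt - rank) / cnt * 100, 2)
    return result


for key, pct in sorted(percentile_ranks(data).items()):
    print(key, pct)
```

Grouping by date first is what replaces the date-by-date loop: every date is ranked independently in a single pass over the data.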
I was trying to solve this for quite some time and then I found the following answer. Honestly brilliant. Also quite fast even for big tables (the table where I used it contained approx 5 mil records and needed a couple of seconds).
SELECT
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(field_name ORDER BY
    field_name SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
    AS `95th Per`
FROM table_name;
Just replace table_name and field_name with your table's and column's names.
For further information, check Roland Bouman's original post.
I have a table consisting of groups of, for example, five rows each. Each row in each group possesses a date value unique to that group.
What I want to do in my query is go through the table and increment a user variable (@count) when this date value changes. That is to say, @count should equal the group count, rather than the row count.
My current query looks like this, in case you're wondering:
SELECT @row := @row +1 AS rownum, date
FROM ( SELECT @row := 0 ) r, stats
Thanks a lot.
What about something like this?
SELECT
    (CASE WHEN @date <> date THEN @row := @row +1 ELSE @row END) AS rownum,
    @date:= date,
    date
FROM ( SELECT @row := 0, @date := NOW() ) r, stats
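The intent of the CASE expression (bump the counter only when the date differs from the one on the previous row) can be sketched as a plain loop. The sample dates and the name number_groups are hypothetical, and the rows are assumed to be ordered by date.

```python
# Hypothetical values of the date column, ordered by date.
dates = ["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02", "2013-01-03"]


def number_groups(dates):
    """Pair each date with a group number that increases only when the date changes."""
    numbered = []
    group = 0
    prev = object()  # sentinel that never equals a real date
    for d in dates:
        if d != prev:
            group += 1
            prev = d
        numbered.append((group, d))
    return numbered


for group, d in number_groups(dates):
    print(group, d)
```

The last group number equals the number of distinct dates, which is the "group count" the question asks for.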
You don't need a user variable to answer the query that you are doing. Is there a reason you want to use the user variable (for example, to emulate a ranking function?)
If not:
-- how many groups are there?
select count(distinct date) distinct_groups from table;
-- each group and count of rows in the group
select date, count(*) from table group by date;
-- if you want the last row (highest id) from each group, assuming you have an auto_increment column called id:
select *
from table
join ( select date, max(id) id
from table
group by date
) sq
on table.id = sq.id