Using MYSQL conditional statements and variables within a query - mysql

Hi, I just can't seem to construct the MySQL query I'm after.
Say I have a result of two columns: 1) browser name and 2) browser count.
Where it gets complicated: once 90% of the total count has been reached, I want to rename all remaining browsers as "other" and report the leftover percentage accordingly.
I know I can get the total count as a variable before I begin the main statement:
SELECT @total := COUNT(id) FROM browser_table WHERE start LIKE "2010%";
Then I can group the results by browser:
SELECT browser, COUNT(id) AS visits
FROM browser_table
WHERE start LIKE "2010%"
GROUP BY browser
I know I need to whack in a CASE statement (and a counter variable) to sort the columns, but I'm not sure how to implement it in the above query:
CASE
WHEN counter >= 0.9 * @total THEN 'other'
ELSE browser
END AS browser;
Hope that makes sense? Thanks for your time....

Here's one approach...
You can calculate a running total based on this answer. Example:
SET @rt := 0;
SELECT
browser,
visits,
(@rt := @rt + visits) AS running_total
FROM
(
SELECT
browser,
COUNT(id) AS visits
FROM
browser_table
WHERE
start LIKE '2010%'
GROUP BY
browser
ORDER BY
visits DESC
) AS sq
;
Once you have that in place, you can build on that to create an 'Other' category:
SET @threshold := (SELECT COUNT(id) FROM browser_table WHERE start LIKE '2010%') * 0.90;
SET @rt := 0;
SELECT
browser,
SUM(visits) AS total_visits
FROM
(
SELECT
IF (@rt < @threshold, browser, 'Other') AS browser,
visits,
(@rt := @rt + visits) AS running_total
FROM
(
SELECT
browser,
COUNT(id) AS visits
FROM
browser_table
WHERE
start LIKE '2010%'
GROUP BY
browser
) AS sq1
ORDER BY
visits DESC
) AS sq2
GROUP BY
browser
ORDER BY
total_visits DESC
;
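If you are on MySQL 8.0 or later, window functions can replace the user-variable running total. Below is a minimal sketch of that idea, not the method from the answer above. It runs the query on SQLite through Python's sqlite3 module (SQLite 3.25+ assumed) purely so it can be executed without a MySQL server; the sample rows and the names `label`, `visits_before` and `grand_total` are made up for illustration. A browser keeps its own name while the visits accumulated before it are still under 90% of the total, which mirrors the IF test in the query above.

```python
import sqlite3

# Hypothetical window-function version; SQLite stands in for MySQL 8.0+ here.
OTHER_BUCKET_SQL = """
WITH per_browser AS (
    SELECT browser, COUNT(id) AS visits
    FROM browser_table
    WHERE start LIKE '2010%'
    GROUP BY browser
),
ranked AS (
    SELECT browser,
           visits,
           -- visits accumulated by the larger browsers BEFORE this row
           COALESCE(SUM(visits) OVER (
               ORDER BY visits DESC, browser
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ), 0) AS visits_before,
           SUM(visits) OVER () AS grand_total
    FROM per_browser
)
SELECT CASE WHEN visits_before < 0.9 * grand_total
            THEN browser ELSE 'Other' END AS label,
       SUM(visits) AS total_visits
FROM ranked
GROUP BY label
ORDER BY total_visits DESC, label
"""

# Made-up sample data: 100 visits in 2010 plus some 2009 rows that must be ignored.
SAMPLE_ROWS = (
    [("Chrome", "2010-01-05")] * 60
    + [("Firefox", "2010-02-10")] * 25
    + [("Safari", "2010-03-15")] * 10
    + [("Opera", "2010-04-20")] * 3
    + [("Lynx", "2010-05-25")] * 2
    + [("IE", "2009-12-31")] * 50
)


def browser_share(rows):
    """Load (browser, start) rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE browser_table (id INTEGER PRIMARY KEY, browser TEXT, start TEXT)"
    )
    conn.executemany("INSERT INTO browser_table (browser, start) VALUES (?, ?)", rows)
    result = conn.execute(OTHER_BUCKET_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for label, visits in browser_share(SAMPLE_ROWS):
        print(label, visits)
```

The tie-breaker `browser` in the window's ORDER BY is my addition so that browsers with equal visit counts are accumulated in a stable order.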

Related

MySQL minimum records to add up to 10000

I have a MySQL table with two columns: takenOn(datetime), and count(int). count contains the number of steps I have taken.
I'm trying to write a query that will tell me the time when I meet my goal of 10,000 steps every day.
So far, I have the following query:
SET @runningTotal=0;
SELECT
`Date`,
DATE_FORMAT(MIN(takenOn), '%l:%i %p') AS `Time`,
TotalCount
FROM
(SELECT
DATE(s.takenOn) AS `Date`,
s.takenOn,
s.`count`,
@runningTotal := @runningTotal + s.`count` AS TotalCount
FROM
(select * from step where DATE(takenOn) = '2016-10-29') s) temp
WHERE TotalCount >= 10000;
This works, but of course gives me the MIN(takenOn) for October 29th only. How can I expand this query to give me MIN(takenOn) for all possible dates in the table?
Thank you!
I am assuming that the steps you care about are all within one day. You are on the right track. Here is the code for multiple days:
SELECT `Date`, DATE_FORMAT(MIN(takenOn), '%l:%i %p') AS `Time`,
MIN(TotalCount)
FROM (SELECT DATE(s.takenOn) AS `Date`,
s.takenOn,
s.`count`,
(@runningTotal := if(@d = DATE(s.takenOn), @runningTotal + s.`count`,
if(@d := DATE(s.takenOn), s.`count`, s.`count`)
)
) AS TotalCount
FROM step s CROSS JOIN
(SELECT @runningTotal := 0, @d := '') params
ORDER BY takenOn
) s
WHERE TotalCount >= 10000
GROUP BY `Date`;
Note that all the variable assignments are in one expression. This is important because MySQL does not guarantee the order of evaluation of expressions in a SELECT. So, if you split the assignments across more than one expression, you are not guaranteed that the code will work.
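One way to sidestep the evaluation-order concern altogether, assuming MySQL 8.0 or later is available, is a windowed SUM partitioned by date. The sketch below is only an illustration of that alternative: it runs on SQLite via Python's sqlite3 (SQLite 3.25+ assumed) so it can be tried without a server, and the sample steps and the names `step_date`, `reached_at` and `total_at_goal` are invented.

```python
import sqlite3

# Hypothetical alternative to the user-variable query; the running total
# restarts for every date because of PARTITION BY.
GOAL_SQL = """
WITH running AS (
    SELECT DATE(takenOn) AS step_date,
           takenOn,
           SUM(`count`) OVER (
               PARTITION BY DATE(takenOn)
               ORDER BY takenOn
           ) AS total_count
    FROM step
)
SELECT step_date,
       MIN(takenOn)     AS reached_at,
       MIN(total_count) AS total_at_goal
FROM running
WHERE total_count >= 10000
GROUP BY step_date
ORDER BY step_date
"""

# Made-up readings: the goal is met on the 29th and 31st, but not on the 30th.
SAMPLE_STEPS = [
    ("2016-10-29 08:00:00", 4000),
    ("2016-10-29 12:00:00", 5000),
    ("2016-10-29 18:00:00", 2000),
    ("2016-10-29 21:00:00", 1000),
    ("2016-10-30 09:00:00", 6000),
    ("2016-10-30 10:00:00", 3000),
    ("2016-10-31 07:00:00", 10000),
]


def first_goal_times(rows):
    """Return (date, first timestamp at or past 10,000 steps, total then)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE step (takenOn TEXT, `count` INTEGER)")
    conn.executemany("INSERT INTO step VALUES (?, ?)", rows)
    result = conn.execute(GOAL_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_goal_times(SAMPLE_STEPS):
        print(row)
```

Days that never reach the goal simply drop out of the result, the same as with the WHERE TotalCount >= 10000 filter in the variable-based query.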
You can use the Group By and Having clause to achieve this, refer to this example:
SELECT
sum(takenON),date
FROM
step
GROUP BY
day(date)
Having SUM(takenON)>150

Calculate medians for multiple columns in the same table in one query call

Stack Overflow to the rescue! I need to find the medians for five columns at once, in one query call.
The median calculations below work for single columns, but when they are combined, the multiple uses of "rownum" throw the query off. How can I update this to work for multiple columns? Thank you. It's for a web tool where nonprofits can compare their financial metrics to user-defined peer groups.
SELECT t1_wages.totalwages_pctoftotexp AS median_totalwages_pctoftotexp
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , d_wages.totalwages_pctoftotexp
FROM data_990_c3 d_wages, (
SELECT @rownum :=0
)r_wages
WHERE totalwages_pctoftotexp >0
ORDER BY d_wages.totalwages_pctoftotexp
) AS t1_wages, (
SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_wages
WHERE totalwages_pctoftotexp >0
) AS t2_wages
WHERE 1
AND t1_wages.row_number = FLOOR( total_rows /2 ) +1
--- [that was one median, below is another] ---
SELECT t1_solvent.solvent_days AS median_solvent_days
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , d_solvent.solvent_days
FROM data_990_c3 d_solvent, (
SELECT @rownum :=0
)r_solvent
WHERE solvent_days >0
ORDER BY d_solvent.solvent_days
) AS t1_solvent, (
SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_solvent
WHERE solvent_days >0
) AS t2_solvent
WHERE 1
AND t1_solvent.row_number = FLOOR( total_rows /2 ) +1
[those are two - there are five in total I'll eventually need to find medians for at once]
This kind of thing is a big pain in the neck in MySQL. You might be wise to use the free Oracle Express Edition or PostgreSQL if you're going to do tonnage of this statistical ranking work. Both have MEDIAN(value) aggregate functions that are either built-in or available as extensions. Here's a little sqlfiddle demonstrating that. http://sqlfiddle.com/#!4/53de8/6/0
But you didn't ask about that.
In MySQL, your basic problem is the scope of variables like @rownum. You also have a pivoting problem: that is, you need to turn rows of your query into columns.
Let's tackle the pivot problem first. What you're going to do is create a union of several big fat queries. For example:
SELECT 'median_wages' AS tag, wages AS value
FROM (big fat query making median wages) A
UNION
SELECT 'median_volunteer_hours' AS tag, hours AS value
FROM (big fat query making median volunteer hours) B
UNION
SELECT 'median_solvent_days' AS tag, days AS value
FROM (big fat query making median solvency days) C
So here are your results in a table of tag / value pairs. You can pivot that table like so, to get one row with a value in each column.
SELECT SUM( CASE tag WHEN 'median_wages' THEN value ELSE 0 END
) AS median_wages,
SUM( CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END
) AS median_volunteer_hours,
SUM( CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END
) AS median_solvent_days
FROM (
/* the above gigantic UNION query */
) Q
That's how you pivot up rows (from the UNION query in this case) to columns. Here's a tutorial on the topic. http://www.artfulsoftware.com/infotree/qrytip.php?id=523
Now we need to tackle the median-computing subqueries. The code in your question looks pretty good. I don't have your data so it's hard for me to evaluate it.
But you need to avoid reusing the @rownum variable. Call it @rownum1 in one of your queries, @rownum2 in the next one, and so on. Here's a dinky sql fiddle doing just one of these. http://sqlfiddle.com/#!2/2f770/1/0
Now let's build it up a bit, doing two different medians. Here's the fiddle http://sqlfiddle.com/#!2/2f770/2/0 and here's the UNION query. Notice the second half of the union query uses @rownum2 instead of @rownum.
Finally, here's the full query with the pivoting. http://sqlfiddle.com/#!2/2f770/13/0
SELECT SUM( CASE tag WHEN 'Boston' THEN value ELSE 0 END ) AS Boston,
SUM( CASE tag WHEN 'Bronx' THEN value ELSE 0 END ) AS Bronx
FROM (
SELECT 'Boston' AS tag, pop AS VALUE
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , pop
FROM pops,
(SELECT @rownum :=0)r
WHERE pop >0 AND city = 'Boston'
ORDER BY pop
) AS ordered_rows,
(
SELECT COUNT( * ) AS total_rows
FROM pops
WHERE pop >0 AND city = 'Boston'
) AS rowcount
WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
UNION ALL
SELECT 'Bronx' AS tag, pop AS VALUE
FROM (
SELECT @rownum2 := @rownum2 +1 AS `row_number` , pop
FROM pops,
(SELECT @rownum2 :=0)r
WHERE pop >0 AND city = 'Bronx'
ORDER BY pop
) AS ordered_rows,
(
SELECT COUNT( * ) AS total_rows
FROM pops
WHERE pop >0 AND city = 'Bronx'
) AS rowcount
WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
) D
This is just two medians. You need five. I think it's easy to make the case that this median computation is absurdly difficult to do in MySQL in a single query.
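For what it's worth, on MySQL 8.0+ the per-column row numbering can be done with ROW_NUMBER() instead of separate user variables, which removes the variable-scope problem. Here is a rough sketch under that assumption. It runs on SQLite through Python's sqlite3 (SQLite 3.25+ assumed) so it can be executed locally, the sample rows are invented, and `median_subquery` and `medians` are hypothetical helper names. It keeps the question's rule of picking row FLOOR(n/2)+1 of the positive values.

```python
import sqlite3

# Columns taken from the question; the remaining three would be added here.
COLUMNS = ["totalwages_pctoftotexp", "solvent_days"]


def median_subquery(column):
    """Build one scalar subquery that returns the median row for `column`.

    Column names are interpolated directly, so they must come from a fixed,
    trusted list like COLUMNS, never from user input.
    SQLite's integer division makes n / 2 + 1 equal to FLOOR(n / 2) + 1;
    in MySQL you would spell out FLOOR(n / 2) + 1.
    """
    return f"""(
        SELECT {column}
        FROM (
            SELECT {column},
                   ROW_NUMBER() OVER (ORDER BY {column}) AS rn,
                   COUNT(*) OVER () AS n
            FROM data_990_c3
            WHERE {column} > 0
        ) AS ranked
        WHERE rn = n / 2 + 1
    ) AS median_{column}"""


def medians(rows):
    """Return one row holding the median of every column in COLUMNS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_990_c3 (totalwages_pctoftotexp INTEGER, solvent_days INTEGER)"
    )
    conn.executemany("INSERT INTO data_990_c3 VALUES (?, ?)", rows)
    sql = "SELECT " + ",\n".join(median_subquery(c) for c in COLUMNS)
    result = conn.execute(sql).fetchone()
    conn.close()
    return result


# Made-up data; zeros are excluded by the "> 0" filter, as in the question.
SAMPLE_ROWS = [(10, 5), (20, 0), (30, 15), (0, 0), (40, 35)]

if __name__ == "__main__":
    print(medians(SAMPLE_ROWS))
```

Each column gets its own derived table, so the row counters cannot interfere with each other, and the result already comes back as one row with one column per median, so no UNION-plus-pivot step is needed.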
Suppose you have a table with three columns like table(key, value1, value2).
In PostgreSQL (the array_agg and ::float syntax below is not MySQL), this query gives you the median of each of the two value columns for each key:
SELECT key,
((array_agg(value1 order by value1 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value1 order by value1 asc) )[ceiling( (count(*)+1)::float/2) ] )/2,
((array_agg(value2 order by value2 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value2 order by value2 asc) )[ceiling( (count(*)+1)::float/2) ] )/2
FROM table
GROUP BY key

Running count based on field

I'd like to get a running count for each user with the query below
Does anyone know how I can do this? If I remove where user = 1 it gives me an overall running count due to the grouping.
I'd like to get running counts for user 1 through N. I'd rather not run the query for each user individually as there are a few million.
SET @runtot:=0;
SELECT q1.t, q1.user, q1.c, (@runtot := @runtot + q1.c) AS rt
FROM (
SELECT
time AS t,
user,
COUNT(distinct `post`) AS c
FROM
interactions
WHERE user = 1
GROUP BY
user,time
ORDER BY
user
) AS q1
You can use this query:
SET @runtot:=0;
SET @last_user:=NULL;
SELECT
q1.t,
q1.user,
q1.c,
CASE WHEN @last_user=q1.user
THEN @runtot := @runtot + q1.c
ELSE @runtot:=q1.c END AS rt,
@last_user:=q1.user
FROM (
...
) AS q1
This query keeps the last user in the @last_user variable, and whenever the user changes it starts the count again.
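If MySQL 8.0+ is an option, a windowed SUM with PARTITION BY user gives the same per-user running count without tracking the previous user by hand. The following is only a sketch of that alternative: it runs on SQLite via Python's sqlite3 (SQLite 3.25+ assumed), and the sample interactions are invented.

```python
import sqlite3

# Hypothetical window-function version: the running total restarts per user.
RUNNING_COUNT_SQL = """
WITH q1 AS (
    SELECT time AS t,
           user,
           COUNT(DISTINCT post) AS c
    FROM interactions
    GROUP BY user, time
)
SELECT t,
       user,
       c,
       SUM(c) OVER (PARTITION BY user ORDER BY t) AS rt
FROM q1
ORDER BY user, t
"""

# Made-up rows: (time, user, post). User 1 repeats post 'a' at time 1,
# which COUNT(DISTINCT post) counts only once.
SAMPLE_ROWS = [
    (1, 1, "a"),
    (1, 1, "a"),
    (1, 1, "b"),
    (2, 1, "c"),
    (1, 2, "x"),
    (3, 2, "y"),
    (3, 2, "z"),
]


def running_counts(rows):
    """Return (t, user, c, rt) rows with rt restarting for each user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interactions (time INTEGER, user INTEGER, post TEXT)")
    conn.executemany("INSERT INTO interactions VALUES (?, ?, ?)", rows)
    result = conn.execute(RUNNING_COUNT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in running_counts(SAMPLE_ROWS):
        print(row)
```

Unlike the variable-based query, this does not depend on the order in which the rows of the subquery happen to be read.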

Getting latest rows in MySQL based on date (grouped by another column)

This type of question is asked every now and then. The queries provided work, but they hurt performance.
I have tried the JOIN method:
SELECT *
FROM nbk_tabl
INNER JOIN (
SELECT ITEM_NO, MAX(REF_DATE) as LDATE
FROM nbk_tabl
GROUP BY ITEM_NO) nbk2
ON nbk_tabl.REF_DATE = nbk2.LDATE
AND nbk_tabl.ITEM_NO = nbk2.ITEM_NO
And the tuple one (way slower):
SELECT *
FROM nbk_tabl
WHERE REF_DATE IN (
SELECT MAX(REF_DATE)
FROM nbk_tabl
GROUP BY ITEM_NO
)
Is there any other performance friendly way of doing this?
EDIT: To be clear, I'm applying this to a table with thousands of rows.
Yes, there is a faster way.
select *
from nbk_tabl
order by ref_date desc
limit <n>
Where <n> is the number of rows that you want to return.
Hold on. I see you are trying to do this for a particular item. You might try this:
select *
from nbk_tabl n
where ref_date = (select max(ref_date) from nbk_tabl n2 where n.item_no = n2.item_no)
It might optimize better than the "in" version.
Also, in MySQL you can use user variables (assuming nbk_tabl.ITEM_NO is never 0):
select *
from (
select nbk_tabl.*,
@i := if(@ITEM_NO = ITEM_NO, @i + 1, 1) as row_num,
@ITEM_NO := ITEM_NO as t_itemNo
from nbk_tabl,(select @i := 0, @ITEM_NO := 0) t
order by Item_no, REF_DATE DESC
) as x where x.row_num = 1;
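The same "row number 1 per item" idea can be written with ROW_NUMBER() on MySQL 8.0+, which drops the user variables and the ITEM_NO <> 0 assumption. A minimal sketch follows, run on SQLite via Python's sqlite3 (SQLite 3.25+ assumed) so it is easy to try. The `note` column and the sample rows are invented, and I have not measured how this compares in speed with the JOIN version on a real table.

```python
import sqlite3

# Hypothetical window-function version of "latest row per ITEM_NO".
LATEST_SQL = """
SELECT ITEM_NO, REF_DATE, note
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY ITEM_NO
               ORDER BY REF_DATE DESC
           ) AS row_num
    FROM nbk_tabl AS t
) AS x
WHERE row_num = 1
ORDER BY ITEM_NO
"""

# Made-up rows: item 1 has two dates, item 2 has one.
SAMPLE_ROWS = [
    (1, "2020-01-01", "old"),
    (1, "2020-03-01", "new"),
    (2, "2020-02-01", "only"),
]


def latest_per_item(rows):
    """Return the most recent row for every ITEM_NO."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nbk_tabl (ITEM_NO INTEGER, REF_DATE TEXT, note TEXT)")
    conn.executemany("INSERT INTO nbk_tabl VALUES (?, ?, ?)", rows)
    result = conn.execute(LATEST_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_item(SAMPLE_ROWS):
        print(row)
```

Note one behavioural difference: if two rows of the same item share the latest REF_DATE, ROW_NUMBER() returns just one of them, while the JOIN on MAX(REF_DATE) returns both.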

Rank in MySQL table

I have a MySQL table called "MyTable" and it basically lists usernames and points (two columns, name and points). I want to say something like "what is joe1928's rank?", which of course is based off his points. How could I do this in MySQL without having to download all that data and sort it and determine the rank myself?
The person with the highest number of points would be ranked 1.
Try getting the number of people with a higher score than your user:
select count(*) from MyTable where points > (select points from MyTable where name = 'joe1928');
That will return 0 for the top user, so add 1 to get the rank.
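To make the counting idea concrete, here is a small sketch that computes the rank two ways and shows they agree: the count of users with more points plus 1, and RANK() from MySQL 8.0+ window functions. It runs on SQLite via Python's sqlite3 (SQLite 3.25+ assumed), the sample users are invented, and `user_rank` is a hypothetical alias chosen because RANK is a reserved word in MySQL 8.0.

```python
import sqlite3

# Rank = number of users with strictly more points, plus one.
COUNT_RANK_SQL = """
SELECT COUNT(*) + 1
FROM MyTable
WHERE points > (SELECT points FROM MyTable WHERE name = ?)
"""

# Same result via a window function (ties share a rank).
WINDOW_RANK_SQL = """
SELECT user_rank
FROM (
    SELECT name,
           RANK() OVER (ORDER BY points DESC) AS user_rank
    FROM MyTable
) AS ranked
WHERE name = ?
"""

# Made-up data: joe1928 and bob are tied for second place.
SAMPLE_ROWS = [
    ("alice", 300),
    ("joe1928", 200),
    ("bob", 200),
    ("carol", 100),
]


def rank_of(rows, name):
    """Return (rank by counting, rank by RANK()) for the given user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (name TEXT, points INTEGER)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    by_count = conn.execute(COUNT_RANK_SQL, (name,)).fetchone()[0]
    by_window = conn.execute(WINDOW_RANK_SQL, (name,)).fetchone()[0]
    conn.close()
    return by_count, by_window


if __name__ == "__main__":
    for user, _ in SAMPLE_ROWS:
        print(user, rank_of(SAMPLE_ROWS, user))
```

The counting query only touches the rows it needs, so with an index on points it should avoid sorting the whole table, though that is worth checking with EXPLAIN on your own data.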
This page seems to describe and solve your problem.
Notes from that page:
SET @rownum := 0;
SELECT rank, correct FROM (
SELECT @rownum := @rownum + 1 AS rank, correct, uid
FROM quiz_user ORDER BY correct DESC
) as result WHERE uid=xxxxxxxx
SELECT @r AS Rank
FROM MyTable u, (SELECT @r := 0) r
WHERE (@r := @r + 1) * (u.Username = 'joe1928')
ORDER BY u.Score DESC
LIMIT 1
select * from [TABLENAME] where [USERNAME] = blah order by [POINTS] desc limit 1;
Based on the link posted by @Dave, your query will look something like this:
select Rank,name from
(select @rownum:=@rownum+1 AS 'Rank', p.name
from calls p, (select @rownum:=0) r
order by p.points desc) as rankResults
where name = 'joe';
This is from another Stack Overflow page, and it seems to solve your problem.
SELECT uo.*,
(
SELECT COUNT(*)
FROM users ui
WHERE (ui.points, ui.id) >= (uo.points, uo.id)
) AS rank
FROM users uo
WHERE id = @id