order by lower confidence bound in mysql - mysql

I have data in a MySQL database that looks something like this:
name |score
----------
alice|60
mary |55
...
A name can appear many times in the list, but can also appear just once. What I would like is to order the list based on the lower bound of a 95% confidence interval for the name. I tried the following:
SELECT name, count(*) as count_n, stddev_samp(score) as stdv, avg(score) as mean
FROM `my.table`
GROUP BY name
ORDER BY avg(score)-1.96*std(score)/sqrt(count(*)) desc
This produces an output that is ok. Ideally though, I would like to vary the value 1.96, since this should depend on the value of count_n for that name. In fact, it should be a value based on the t-distribution for count_n-1 degrees of freedom. Are there MySQL functions that can do this for me?
I have seen the following answer which looks good but doesn't vary the value as I wold like.

I solved my problem by creating a sepearate table 'tdistribution' with the following structure:
dof | tvalue
------------
1 | -12.706
2 | -4.3026
It contains the degree of freedom and the asscociated t value. Then this table can be joined with the original styled query.
SELECT table2.name,
round(table2.mean-abs(tdistribution.tvalue*table2.stdv/sqrt(table2.nn)),2) AS LCB,
round(table2.mean+abs(tdistribution.tvalue*table2.stdv/sqrt(table2.nn)),2) AS UCB
FROM
(SELECT table1.name, count(table1.name) AS nn, avg(table1.score) AS mean, stddev_samp(table1.score) AS stdv
FROM
(SELECT name, score FROM my.table) AS table1
GROUP BY name
) AS table2
LEFT JOIN tdistribution
ON table2.nn-1=tdistribution.dof
WHERE nn>1
ORDER BY LCB DESC
It seems to work!

Related

MySQL sort data by specific data entries

I'm not too sure if the title made much sense but I will explain in more detail.
So I have a table named 'members' which has a list of ranks (Col, Maj, Cpt) under the field name 'rank'
I would like to order the data by the rank name going from highest rank (Col) to lowest rank (Rct).
I will include a screenshot of my table just in case I don't make sense.
Table screenshot
I think Alex K.'s solution is the best. You're looking for something like
select * from members order by rank desc; This will return them in descending order alphabetically. If you defined a rank table like Alex suggests you could do a:
select t.*
from members t join rank_table rt on t.rankname = rt.rankname
order by rt.rankvalue desc;
In this solution the rank_table looks like:
Col | 2
Maj | 1
Cap | 0

count rows where date is equal but separated by name

I think it will be easiest to start with the table I have and the result I am aiming for.
Name | Date
A | 03/01/2012
A | 03/01/2012
B | 02/01/2012
A | 02/01/2012
B | 02/01/2012
A | 02/01/2012
B | 01/01/2012
B | 01/01/2012
A | 01/01/2012
I want the result of my query to be:
Name | 01/01/2012 | 02/01/2012 | 03/01/2012
A | 1 | 2 | 2
B | 2 | 2 | 0
So basically I want to count the number of rows that have the same date, but for each individual name. So a simple group by of dates won't do because it would merge the names together. And then I want to output a table that shows the counts for each individual date using php.
I've seen answers suggest something like this:
SELECT
NAME,
SUM(CASE WHEN GRADE = 1 THEN 1 ELSE 0 END) AS GRADE1,
SUM(CASE WHEN GRADE = 2 THEN 1 ELSE 0 END) AS GRADE2,
SUM(CASE WHEN GRADE = 3 THEN 1 ELSE 0 END) AS GRADE3
FROM Rodzaj
GROUP BY NAME
so I imagine there would be a way for me to tweak that but I was wondering if there is another way, or is that the most efficient?
I was perhaps thinking if the while loop were to output just one specific name and date each time along with the count, so the first result would be A,01/01/2012,1 then the next A,02/01/2012,2 - A,03/01/2012,3 - B,01/01/2012,2 etc. then perhaps that would be doable through a different technique but not sure if something like that is possible and if it would be efficient.
So I'm basically looking to see if anyone has any ideas that are a bit outside the box for this and how they would compare.
I hope I explained everything well enough and thanks in advance for any help.
You have to include two columns in your GROUP BY:
SELECT name, COUNT(*) AS count
FROM your_table
GROUP BY name, date
This will get the counts of each name -> date combination in row-format. Since you also wanted to include a 0 count if the name didn't have any rows on a certain date, you can use:
SELECT a.name,
b.date,
COUNT(c.name) AS date_count
FROM (SELECT DISTINCT name FROM your_table) a
CROSS JOIN (SELECT DISTINCT date FROM your_table) b
LEFT JOIN your_table c ON a.name = c.name AND
b.date = c.date
GROUP BY a.name,
b.date
SQLFiddle Demo
You're asking for a "pivot". Basically, it is what it is. The real problem with a pivot is that the column names must adapt to the data, which is impossible to do with SQL alone.
Here's how you do it:
SELECT
Name,
SUM(`Date` = '01/01/2012') AS `01/01/2012`,
SUM(`Date` = '02/01/2012') AS `02/01/2012`,
SUM(`Date` = '03/01/2012') AS `03/01/2012`
FROM mytable
GROUP BY Name
Note the cool way you can SUM() a condition in mysql, becasue in mysql true is 1 and false is 0, so summing a condition is equivalent to counting the number of times it's true.
It is not more efficient to use an inner group by first.
Just in case anyone is interested in what was the best method:
Zane's second suggestion was the slowest, I loaded in a third of the data I did for the other two and it took quite a while. Perhaps on smaller tables it would be more efficient, and although I am not working with a huge table roughly 28,000 rows was enough to create significant lag, with the between clause dropping the result to about 4000 rows.
Bohemian's answer gave me the least amount to code, I threw in a loop to create all the case statements and it worked with relative ease. The benefit of this method was the simplicity, besides creating the loop for the cases, the results come in without the need for any php tricks, just simple foreach to get all the columns. Recommended for those not confident with php.
However, I found Zane's first suggestion the quickest performing and despite the need for extra php coding it seems I will be sticking with this method. The disadvantage of this method is that it only gives the dates that actually have data, so creating a table with all the dates becomes a bit more complicated. What I did was create a variable that keeps track of what date it is supposed to be compared to the table column which is reset on each table row, when the result of the query is equal to that date it echoes the value otherwise it does a while loop echoing table cells with 0 until the dates do match. It also had to do a check to see if the 'Name' value is still the same and if not it would switch to the next row after filling in any missing cells with 0 to the end of that row. If anyone is interested in seeing the code you can message me.
Results of the two methods over 3 months of data (a column for each day so roughly 90 case statements) ~ 12,000 rows out of 28,000:Bohemian's Pivot - ~0.158s (highest seen ~0.36s)Zane's Double Group by - ~0.086s (highest seen ~0.15s)

Selecting most recent as part of group by (or other solution ...)

I've got a table where the columns that matter look like this:
username
source
description
My goal is to get the 10 most recent records where a user/source combination is unique. From the following data:
1 katie facebook loved it!
2 katie facebook it could have been better.
3 tom twitter less then 140
4 katie twitter Wowzers!
The query should return records 2,3 and 4 (assume higher IDs are more recent - the actual table uses a timestamp column).
My current solution 'works' but requires 1 select to generate the 10 records, then 1 select to get the proper description per row (so 11 selects to generate 10 records) ... I have to imagine there's a better way to go. That solution is:
SELECT max(id) as MAX_ID, username, source, topic
FROM events
GROUP BY source, username
ORDER BY MAX_ID desc;
It returns the proper ids, but the wrong descriptions so I can then select the proper descriptions by the record ID.
Untested, but you should be able to handle this with a join:
SELECT
fullEvent.id,
fullEvent.username,
fullEvent.source,
fullEvent.topic
FROM
events fullEvent JOIN
(
SELECT max(id) as MAX_ID, username, source
FROM events
GROUP BY source, username
) maxEvent ON maxEvent.MAX_ID = fullEvent.id
ORDER BY fullEvent.id desc;

MySQL: adding a position column

Given a table:
id | score | position
=====================
1 | 20 | 2
2 | 10 | 3
3 | 30 | 1
What query can I use to optimally set the position columns using the (not nullable) score column?
(I'm sure I've seen this before but I can't seem to get the correct keywords to find it!)
set #rank = 0;
update tbl a join (select score, #rank:=#rank+1 as rank from tbl group by score
order by score desc) b on a.score = b.score set a.position = b.rank;
to update the position in one fell swoop that would do the trick. equal scores get equal position
SELECT * FROM your_table ORDER BY score;
Your position column appears to be redundant, although it is hard to tell from the information given in the question. The functionality you appear to want is accomplished using database row ordering, seen in the above example with the 'ORDER BY' expression. Even if you would want a position column for some good reason, remember that most likely there exists an index for the score column anyway, which would in most cases be doing exactly the same thing that a position column would do. Either that, or I completely misunderstand your question.
Calculate it in the language which you are using. Thus your solution will be:
more portable
more readable
equivalently efficient

In MySQL, can I have a table returning the ten last rated games by rating?

The actual question is a little more complex than that, so here goes.
I have a website which reviews games. Ratings/reviews are posted for each game, and so I have a MySQL database to handle it all.
Thing is, I'd really like a page that showed what score (out of 10) meant what, and to illustrate it would have the game that was last reviewed as an example. I can always do it without, but this would be cooler.
So the query should return something like this (but running from 10 to 0):
|---------------*----------------*-----------------*-----------------|
* game.gameName | game.gameImage | review.ourScore | review.postedOn *
|---------------*----------------*-----------------*-----------------|
| Top Game | img | 10 | (unix timestamp)|
| NearlyTop Game| img | 9 | (unix timestamp)|
| Great Game | img | 8 | (unix timestamp)|
|---------------*----------------*-----------------*-----------------|
The information is in two tables, game and review. I think you'd use MAX() to find out the last timestamp and corresponding game information, but as far as complex queries go, I'm in way over my head.
Of course this could be done with 10 simple SELECTs but I'm sure there must be a way to do this in one query.
Thanks for any help.
Here is an ugly solution I found:
This query simply gets the IDs and scores of the reviews that you want to look at. I have included it so that you can understand what the trick is, without getting distracted by other stuff:
SELECT * FROM
(SELECT reviewID, ourScore FROM review ORDER BY postedOn DESC) as `r`
GROUP BY ourScore
ORDER BY ourScore DESC;
This exploits MySQL's 'GROUP BY' behavior. When the grouping is done, if the source rows have different values for different columns, then the value of the topmost source row is used. So if you had rows in this order:
reviewId Score
1 3
0 3
2 3
Then after you group by score, the reviewId is 1 because that row was on the top:
reviewId Score
1 3
So we want to put the most recent review on the top before we do the group by. Since ORDERing is always dones after grouping, in a single SELECT statement, I had to make a subquery to accomplish this. Now we just dress up this query a little bit to get all the fields you wanted:
SELECT `r`.*, game.gameName, game.gameImage FROM
(SELECT reviewID, ourScore, postedOn, gameID FROM review ORDER BY **postedOn DESC**) as `r`
JOIN game ON `r`.gameID = game.gameID
GROUP BY ourScore
ORDER BY ourScore DESC;
That should work.
SELECT DISTINCT game.gameName, game.gameImage, review.ourScore FROM game
LEFT JOIN review
ON game.ID = review.gameID
ORDER BY review.postedOn
LIMIT 10
Or something like that, check out how to use the Distinct first, I'm not sure on the syntax, and you may have to tell the ORDER BY DESC or ASC depending on what you want.
Well..
SELECT game.gameName, game.gameImage, review.ourScore
FROM game
LEFT JOIN review ON game.gameID = review.gameID
GROUP BY review.ourScore DESC
LIMIT 10
returns a list of games grouped by each individual score. But this isn't what I want, I want the game that is last posted - this is why the timestamp is important. With that query, MySQL returns the first result it can find.
I think this would work:
select g.gameName, g.gameImage, r.ourScore, r.postedOn
from game g, review r
where g.gameId = r.gameId
and r.postedOn = (select max(sr.postedOn)
from review sr where sr.ourScore = r.ourScore)
group by r.ourScore
order by r.ourScore desc;
Edit: above SQL was corrected after David Grayson's comment. I think this query is pretty easy to understand but probably performs poorly compared with his solution.