MySQL ranking from two tables - mysql

So I have two tables, "participations" and "votes".
Every row in the table "votes" contains a "participation_id" to reference the participation for which the vote was cast.
Now, I want to be able to select the absolute ranking a participation has based on the number of votes it has.
Normally, this would be easy with this simple query:
SELECT p.id, COUNT(v.id) as votes
FROM participations as p
JOIN votes as v on p.id = v.participation_id
GROUP BY v.participation_id
ORDER BY votes DESC;
BUT I have to be able to add some WHERE clauses in there somewhere. If I do that, I only get a relative ranking (i.e. the ranking relative to the filtered row set).
Does anybody know if this is possible with just one query? (Of course, subqueries are allowed.) I hope this question makes sense.

You need to use a variable, which is incremented every row, and select from your query's results, like this:
SET @rank := 0;
SELECT y.id, y.votes, y.rank
from (
SELECT id, votes, (@rank := @rank + 1) as rank
from (
SELECT p.id, COUNT(v.id) as votes
FROM participations as p
JOIN votes as v on p.id = v.participation_id
WHERE ... -- add your where clause, if any, here to eliminate completely from results
GROUP BY v.participation_id
ORDER BY votes DESC
) x
) y
-- Now join to apply filtering to ranked results, for example:
JOIN participations p1 on p1.id = y.id
where p1.date between '2011-06-01' and now() -- just making up an example
and p1.gender = 'female'; -- for example
Some explanation:
The phrase (@rank := @rank + 1) increments the variable and returns the result of the increment, which has been given the alias rank.
The x at the end is the alias of your query's results and is required by the syntax (but any alias will do - I just chose x). It is necessary to use the inner query, because that is what provides the ordering - you can't add rank until that's been done.
Note:
Any where clause or other order by processing you want must happen in the inner query - the outer query is only for the rank processing - it takes the final query row set and adds a rank to it.
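On MySQL 8.0 or later, the same "rank everything first, filter afterwards" idea can probably be written with the RANK() window function instead of a user variable. The snippet below is only a sketch under stated assumptions: it runs the SQL through Python's built-in sqlite3 module (window functions need SQLite 3.25+) so that it is self-contained, the sample rows are invented, and the gender column is the made-up filter from the answer above rather than something the question's schema guarantees. Unlike the variable counter, RANK() gives tied participations the same rank.

```python
import sqlite3

def ranked_then_filtered(conn, gender):
    """Rank every participation by vote count, then filter the ranked rows."""
    sql = """
        SELECT r.id, r.votes, r.rnk
        FROM (
            -- absolute rank over ALL participations
            SELECT c.id, c.votes,
                   RANK() OVER (ORDER BY c.votes DESC) AS rnk
            FROM (
                SELECT p.id, COUNT(v.id) AS votes
                FROM participations AS p
                JOIN votes AS v ON p.id = v.participation_id
                GROUP BY p.id
            ) AS c
        ) AS r
        -- the filter is applied only after the ranks exist
        JOIN participations AS p1 ON p1.id = r.id
        WHERE p1.gender = ?
        ORDER BY r.rnk
    """
    return conn.execute(sql, (gender,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participations (id INTEGER PRIMARY KEY, gender TEXT);  -- gender is hypothetical
    CREATE TABLE votes (id INTEGER PRIMARY KEY, participation_id INTEGER);
    INSERT INTO participations VALUES (1, 'male'), (2, 'female'), (3, 'female');
    INSERT INTO votes (participation_id) VALUES (1), (1), (1), (2), (2), (3);
""")
rows = ranked_then_filtered(conn, "female")  # each row is (id, votes, absolute rank)
```

With this sample data the two filtered participations keep ranks 2 and 3, because rank 1 belongs to a participation that the filter removed.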

Related

Unwanted result with two "Join" in SQL

I'm currently stuck on a problem with my database. I have a table of film reviews, a table of positives and another one of negatives. The latter two are linked to a review by its id.
Here are the positive and negative tables:
I'd like to get this result:
But I have this one instead:
Here's my SQL code to get this result:
SELECT positives.libelle AS positive, negatives.libelle AS negative FROM reviews LEFT JOIN positives ON positives.review_id = reviews.id LEFT JOIN negatives ON negatives.review_id = reviews.id WHERE reviews.id = 1
The result that you want is not really in a relational format -- because the column values on a given row really have nothing to do with each other.
MySQL does not support full join, so my recommendation is union all with row_number() to enumerate the rows and group by to bring them together:
SELECT MAX(positive) as positive, MAX(negative) as negative
FROM ((SELECT p.review_id, p.libelle as positive, NULL as negative,
ROW_NUMBER() OVER (PARTITION BY p.review_id ORDER BY p.id) as seqnum
FROM positives p
WHERE p.review_id = 1
) UNION ALL
(SELECT n.review_id, NULL, n.libelle,
ROW_NUMBER() OVER (PARTITION BY n.review_id ORDER BY n.id) as seqnum
FROM negatives n
WHERE n.review_id = 1
)
) pn
GROUP BY review_id, seqnum
ORDER BY review_id, seqnum;
Note this will return no rows if there are no reviews (positive and negative). You can incorporate a left join if that really is a consideration.
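To see how the row-number pairing lines the two lists up, here is a small sketch with invented sample rows. It assumes each of positives and negatives has id, review_id and libelle columns, and it runs through Python's built-in sqlite3 module (SQLite 3.25+ for window functions) purely so it can be executed as-is. SQLite does not accept parentheses around the members of a UNION, so they are dropped here.

```python
import sqlite3

def paired_points(conn, review_id):
    """Pair the n-th positive with the n-th negative of one review."""
    sql = """
        SELECT MAX(positive) AS positive, MAX(negative) AS negative
        FROM (
            SELECT p.review_id, p.libelle AS positive, NULL AS negative,
                   ROW_NUMBER() OVER (PARTITION BY p.review_id ORDER BY p.id) AS seqnum
            FROM positives AS p
            WHERE p.review_id = ?
            UNION ALL
            SELECT n.review_id, NULL, n.libelle,
                   ROW_NUMBER() OVER (PARTITION BY n.review_id ORDER BY n.id) AS seqnum
            FROM negatives AS n
            WHERE n.review_id = ?
        ) AS pn
        GROUP BY review_id, seqnum   -- one output row per sequence number
        ORDER BY seqnum
    """
    return conn.execute(sql, (review_id, review_id)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE positives (id INTEGER PRIMARY KEY, review_id INTEGER, libelle TEXT);
    CREATE TABLE negatives (id INTEGER PRIMARY KEY, review_id INTEGER, libelle TEXT);
    INSERT INTO positives VALUES (1, 1, 'good acting'), (2, 1, 'great score');
    INSERT INTO negatives VALUES (1, 1, 'too long');
""")
pairs = paired_points(conn, 1)
```

The shorter list is simply padded with NULL, so the second row here has a positive but no negative.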
If you don't need any information from the reviews table, why join it at all? Just restrict the positives and negatives tables with review_id = 1.
The following query doesn't fully cover your need, but it may be helpful for your problem. Consider that UNION is much more efficient than JOIN. If you can use UNION, avoid using JOIN.
(SELECT positives.libelle AS positive,NULL AS negative FROM positives WHERE review_id=1)
UNION
(SELECT NULL,negatives.libelle FROM negatives WHERE review_id=1)

Creating a ranking column from a queried column in MySQL

I want to build a ranking column into my query. I've found some similar cases on Stack but this one's a little different and I can't quite make it work. I have a single table, EnrollmentX, with two columns, a unique StudentID and a GroupId (for sake of argument, groups 1:3). I need to simultaneously count the number of students in each of these three groups and then rank the groups by number of students. I've made it as far as the counting:
SELECT
EnrollmentX.GroupId,
COUNT(EnrollmentX.StudentId) AS StudentCnt
FROM EnrollmentX
GROUP BY
EnrollmentX.GroupId
This puts out two columns, one for GroupId, 1:3, and one for StudentCnt, with the correct number of students in each group. What I can't work out is how to use that StudentCnt column after building it to create a third ranking column.
If you are on MySQL 8 there are more readable options. Change the order in the inner query if you want a different rank.
SELECT GroupId, StudentCnt, @Rank:=@Rank + 1 AS rank FROM
(SELECT EnrollmentX.GroupId,
COUNT(EnrollmentX.StudentId) AS StudentCnt
FROM EnrollmentX
GROUP BY
EnrollmentX.GroupId
ORDER BY StudentCnt DESC
) x CROSS JOIN (SELECT @Rank:=0) y
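For reference, one of the more readable MySQL 8 options mentioned above is presumably a window function. A minimal sketch, assuming the EnrollmentX table from the question and some invented rows, run here through Python's built-in sqlite3 module (SQLite 3.25+) only so that it is self-contained. The alias GroupRank is my own choice, picked because RANK is a reserved word in MySQL 8.

```python
import sqlite3

def rank_groups(conn):
    """Count students per group, then rank the groups by that count."""
    sql = """
        SELECT g.GroupId, g.StudentCnt,
               RANK() OVER (ORDER BY g.StudentCnt DESC) AS GroupRank
        FROM (
            SELECT GroupId, COUNT(StudentId) AS StudentCnt
            FROM EnrollmentX
            GROUP BY GroupId
        ) AS g
        ORDER BY GroupRank, g.GroupId
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EnrollmentX (StudentId INTEGER PRIMARY KEY, GroupId INTEGER);
    INSERT INTO EnrollmentX VALUES (1, 1), (2, 2), (3, 2), (4, 2), (5, 3), (6, 3);
""")
group_ranks = rank_groups(conn)  # rows of (GroupId, StudentCnt, GroupRank)
```

Groups with the same count share a rank with RANK(); swapping in ROW_NUMBER() would number them 1, 2, 3 regardless of ties, which is closer to what the variable version does.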
Try this query:
select ex.GroupId, ex.StudentId, exg.cnt from EnrollmentX ex
left join (
SELECT GroupId, COUNT(*) cnt
FROM EnrollmentX
GROUP BY GroupId
) exg on ex.GroupId = exg.GroupId
order by exg.cnt
Try it:
SET @Rank = 0;
SELECT @Rank:=@Rank + 1 rank, EnrollmentX.GroupId,
COUNT(EnrollmentX.StudentId) StudentCnt
FROM EnrollmentX
GROUP BY
EnrollmentX.GroupId
ORDER BY StudentCnt DESC;

Joining on "greater than" returning more than one row for left table

I have a query.
SELECT * FROM users LEFT JOIN ranks ON ranks.minPosts <= users.postCount
This returns a row for every match. By using GROUP BY users.id I get one row per user id.
However, when they are grouped I only get the first row. I would instead like the row with the highest value of ranks.minPosts.
Is there a way to do this? Also, would it be faster (use fewer resources) to just use two different queries?
Assuming there is only one column in ranks that you want, you can do this using a correlated subquery:
SELECT u.*,
(select r.minPosts
from ranks r
where r.minPosts <= u.PostCount
order by minPosts desc
limit 1
) as minPosts
FROM users u;
If you need the entire row from ranks, then join it back in:
SELECT ur.*, r.*
FROM (SELECT u.*,
(select r.minPosts
from ranks r
where r.minPosts <= u.PostCount
order by minPosts desc
limit 1
) as minPosts
FROM users u
) ur join
ranks r
on ur.minPosts = r.minPosts;
(The * is for convenience; you should list out the columns you want.)
Because you're using MySQL, this will work:
SELECT * FROM (
SELECT *, users.id user_id
FROM users
LEFT JOIN ranks ON ranks.minPosts <= users.postCount
ORDER BY ranks.minPosts DESC
) x
GROUP BY user_id
MySQL always returns the first row encountered for each unique group, so if you first order the data, then use the non-standard grouping behaviour, you'll get the row you want.
Disclaimer:
Although this works reliably in practice, the MySQL documentation says not to rely on it. It also needs the ONLY_FULL_GROUP_BY SQL mode to be disabled, and that mode is enabled by default from MySQL 5.7.5 onwards. If you use this convenient approach (which will reliably pass any test you can write), you should consider that it is not recommended by MySQL and that later releases of MySQL may not continue to behave in this way.
What we'd really like to do would be to order the rows by ranks.minPosts before the group by. Unfortunately MySQL doesn't support that without using a subquery of some form.
If the ranks are already ordered by their ids then you can extract the id by selecting MAX(ranks.id), and if they're not, you can still get the highest ranks.minPosts by selecting MAX(ranks.minPosts). However, it would be nice to be able to get the entire record. I guess you're left with the subquery solution, which is as follows:
SELECT <fields> FROM users LEFT JOIN
(SELECT * FROM ranks ORDER BY minPosts DESC) as r
ON r.minPosts <= users.postCount GROUP BY users.id
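The MAX(ranks.minPosts) idea mentioned above can also be used to get the entire ranks record without relying on row order: aggregate first, then join the result back to ranks. This is only a sketch under assumptions: minPosts is taken to be unique in ranks, the title column and the sample rows are hypothetical, and the SQL is run through Python's built-in sqlite3 module just to keep the example self-contained.

```python
import sqlite3

def users_with_rank(conn):
    """Attach to each user the rank with the highest minPosts <= postCount."""
    sql = """
        SELECT u.id, u.postCount, r.minPosts, r.title
        FROM users AS u
        LEFT JOIN (
            -- highest threshold each user has reached
            SELECT u2.id AS user_id, MAX(r2.minPosts) AS minPosts
            FROM users AS u2
            JOIN ranks AS r2 ON r2.minPosts <= u2.postCount
            GROUP BY u2.id
        ) AS best ON best.user_id = u.id
        -- join back to fetch the whole ranks row (assumes minPosts is unique)
        LEFT JOIN ranks AS r ON r.minPosts = best.minPosts
        ORDER BY u.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, postCount INTEGER);
    CREATE TABLE ranks (minPosts INTEGER PRIMARY KEY, title TEXT);  -- title is hypothetical
    INSERT INTO users VALUES (1, 5), (2, 10), (3, 250), (4, 0);
    INSERT INTO ranks VALUES (1, 'Newbie'), (10, 'Member'), (100, 'Veteran');
""")
user_ranks = users_with_rank(conn)
```

User 4 has fewer posts than the lowest threshold, so the LEFT JOINs leave its rank columns NULL instead of dropping the user.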

Selecting a Good, Better, Best suggestion from Best Selling Products in MySQL Database

I have two tables:
Products and SalesRecords
From this I can create a simple SQL statement to get me the top 100 best selling products
SELECT p.item, p.price, COUNT(s.itemId)
FROM
SalesRecords s
LEFT JOIN Products p ON p.id=s.itemId
GROUP BY p.id
ORDER BY COUNT(s.itemId) DESC
LIMIT 100
(Incidentally, I am selecting from SalesRecords and then JOINing Products as I found it to be much faster than the other way around - I'd like to know why, but that's not the primary question!)
Hopefully, the database schema is clear enough from that to know what is going on. We have an ID column in Products which relates to the itemId column in SalesRecords; we join the tables on this relation and then sort by how many times each product row appears in SalesRecords.
What I want to be able to do now, is re-order that list by price and split it into three sections, then randomly return two rows from each of the three sections.
The intended result being:
Two Items from the top third of prices
Two Items from the middle third
Two Items from the bottom third
Thus returning a Good, Better, Best suggestion of products from the top sellers.
(In practice there'll be other WHERE clauses etc. to make this more relevant, but the basis of the query is what I need.)
Is this possible with SQL? (MySQL)
I am not sure that I understand what you want correctly, but here goes.
This is your SQL with a WHERE clause rather than a LEFT JOIN (I hope you don't have NULLs):
SELECT p.item, p.price, COUNT(s.itemId)
FROM SalesRecords AS s,Products AS p
WHERE p.id = s.itemid
GROUP BY p.id
ORDER BY COUNT(s.itemId) DESC
LIMIT 100
So the above query returns the table you need, but you want it ordered by price.
SELECT t.item, t.price
FROM ( "put your above query here" ) AS t
ORDER BY t.price DESC
This will sort your query by price.
I think we should add row numbers for your selection operation.
SET @rownumber := 0;
SELECT @rownumber := @rownumber + 1 AS rownum, t.item, t.price
FROM ( "put above query ordered by price" ) AS t;
This will return a table with the 100 best-selling items ordered by price, along with each row's index in that list. Now I guess you want to select the first 2 (indexes 1, 2), 2 from the middle (indexes 51, 52) and 2 from the bottom (indexes 99, 100):
SET @rownumber := 0;
SELECT * FROM (
SELECT @rownumber := @rownumber + 1 AS rownum, t.item, t.price
FROM ( "put above query ordered by price" ) AS t
) AS numbered
WHERE rownum = 1 OR rownum = 2 OR rownum = 51 OR rownum = 52 ....
These queries seem highly inefficient and might cause table crashes. You might want to create a VIEW in your database, which is a derived virtual table. This view would hold your best 100 items ordered by price, with row indexes (and be updated frequently). If you select itemid where rowindex is 1, 2, 51, ... from that VIEW, it would be much safer. Of course, you will only need this if your system overloads the database in future.
Also, VIEWs can be very handy when you need advanced, strict MySQL queries.
You explicitly want random rows from the top third, middle, and bottom third. What happens if you don't have 100 items?
The place to start is by counting the number of items actually returned and also enumerating them. You can do this with variables. After the first pass, the variable @rn will contain the number of returned values, so this query takes advantage of that. The following assigns the price group to each row:
SELECT ps.*, floor((rn*3 - 1)/ @rn) as pricegroup
FROM (SELECT p.item, p.price, COUNT(s.itemId) as cnt, @rn := @rn + 1 as rn
FROM SalesRecords s LEFT JOIN
Products p
ON p.id=s.itemId CROSS JOIN
(select @rn := 0) const
GROUP BY p.id
ORDER BY COUNT(s.itemId) DESC
LIMIT 100
) ps;
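To check what the floor((rn*3 - 1) / total) expression does when fewer than 100 rows come back, the arithmetic can be replayed outside the database. This is just a sketch of the formula in Python, with rn running from 1 to total as in the query. The function names are my own.

```python
from collections import Counter

def price_group(rn: int, total: int) -> int:
    """Mirror floor((rn*3 - 1) / total) from the query, for rn in 1..total."""
    return (rn * 3 - 1) // total

def group_sizes(total: int) -> dict:
    """Count how many row numbers land in each price group."""
    counts = Counter(price_group(rn, total) for rn in range(1, total + 1))
    return dict(sorted(counts.items()))

sizes_100 = group_sizes(100)  # {0: 33, 1: 33, 2: 34}
sizes_7 = group_sizes(7)      # {0: 2, 1: 2, 2: 3}
sizes_2 = group_sizes(2)      # {1: 1, 2: 1}
```

So the groups stay within one row of each other in size, but with fewer than three rows group 0 ends up empty, which matters if you expect two items from every third.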
Next, you want to get two random ids from each one. This is a pain in MySQL. Here is a method where the ids are concatenated:
SELECT floor((rn*3 - 1)/ @rn) as pricegroup,
substring_index(group_concat(ps.item order by rand()), ',', 2) as randitems
FROM (SELECT p.item, p.price, COUNT(s.itemId) as cnt, @rn := @rn + 1 as rn
FROM SalesRecords s LEFT JOIN
Products p
ON p.id=s.itemId CROSS JOIN
(select @rn := 0) const
GROUP BY p.id
ORDER BY COUNT(s.itemId) DESC
LIMIT 100
) ps
GROUP BY floor((rn*3 - 1)/ @rn);
Finally, we can join back to get the fuller information:
SELECT p.*, pricegroup
FROM (SELECT floor((rn*3 - 1)/ @rn) as pricegroup,
substring_index(group_concat(ps.item order by rand()), ',', 2) as randitems
FROM (SELECT p.item, p.price, COUNT(s.itemId) as cnt, @rn := @rn + 1 as rn
FROM SalesRecords s LEFT JOIN
Products p
ON p.id=s.itemId CROSS JOIN
(select @rn := 0) const
GROUP BY p.id
ORDER BY COUNT(s.itemId) DESC
LIMIT 100
) ps
GROUP BY floor((rn*3 - 1)/ @rn)
) pg join
products p
on find_in_set(p.item, pg.randitems);
These operations would not be recommended on large data. However, you are limiting the data to 100 rows, so the performance should be pretty reasonable.

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, (to avoid the indeterminate results) you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says this should work as expected. So it may be working now, but it may not work in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
A compound index on (id_people, year) would help efficiency.
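Here is a small, self-contained sketch of that groupwise-maximum pattern together with the compound index. It is an illustration only: the index name idx_visits_people_year, the name and note columns and the sample rows are assumptions, and the SQL is run through Python's built-in sqlite3 module, so it shows the shape of the query and its results, not MySQL's execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    -- compound index covering both the GROUP BY / MAX and the join back
    CREATE INDEX idx_visits_people_year ON visits (id_people, year);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO visits (id_people, year, note) VALUES
        (1, 2010, 'a'), (1, 2012, 'b'), (2, 2011, 'c');
""")

def latest_visits(conn):
    """Every person once, with the visit row(s) from their newest year."""
    sql = """
        SELECT p.id, p.name, v.year, v.note
        FROM people AS p
        LEFT JOIN (
            SELECT id_people, MAX(year) AS year
            FROM visits
            GROUP BY id_people
        ) AS vm ON vm.id_people = p.id
        LEFT JOIN visits AS v
               ON v.id_people = vm.id_people AND v.year = vm.year
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()

latest = latest_visits(conn)
```

A person without any visits still appears, with NULL in the visit columns. On MySQL itself, running EXPLAIN on the query is the way to confirm that the index is actually used.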
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
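If MySQL 8.0 or later is available, the same tie-break (newest year first, then the largest visit id) can probably be expressed in one pass with ROW_NUMBER(). This is a swapped-in technique, not the nested MAX queries above, and only a sketch: the sample rows are invented and the SQL is run through Python's built-in sqlite3 module (SQLite 3.25+) so that it is self-contained.

```python
import sqlite3

def latest_visit_per_person(conn):
    """Exactly one visit per person: newest year, ties broken by highest id."""
    sql = """
        SELECT p.id, p.name, v.year, v.note
        FROM people AS p
        JOIN (
            SELECT id, id_people, year, note,
                   ROW_NUMBER() OVER (
                       PARTITION BY id_people
                       ORDER BY year DESC, id DESC
                   ) AS rn
            FROM visits
        ) AS v ON v.id_people = p.id AND v.rn = 1
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO visits VALUES
        (1, 1, 2012, 'first'), (2, 1, 2012, 'second'),
        (3, 1, 2010, 'old'),   (4, 2, 2011, 'only');
""")
one_per_person = latest_visit_per_person(conn)
```

Ann has two visits in 2012, and the explicit ORDER BY inside the window is what makes the choice between them deterministic.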