Unwanted result with two "Join" in SQL - mysql

I'm currently stuck on a problem with my database. I have a table of film reviews, a table of positives and another one of negatives. These last ones are linked to the id of an review.
Here are the positive and negative tables:
I'd like to get this result:
But I have this one instead:
Here's my SQL code to get this result:
SELECT positives.libelle AS positive, negatives.libelle AS negative FROM reviews LEFT JOIN positives ON positives.review_id = reviews.id LEFT JOIN negatives ON negatives.review_id = reviews.id WHERE reviews.id = 1

The result that you want is not really in a relational format -- because the column values on a given row really have nothing to do with each other.
MySQL does not support full join, so my recommendation is union all with row_number() to enumerate the rows and group by to bring them together:
SELECT MAX(positive) as positive), MAX(negative) as negative)
FROM ((SELECT p.review_id, p.libelle as positive, NULL as negative,
ROW_NUMBER() OVER (PARTITION BY p.review_id ORDER BY id) as seqnum
FROM positives p
WHERE p.review_id = 1
) UNION ALL
(SELECT n.review_id, NULL, n.libelle,
ROW_NUMBER() OVER (PARTITION BY n.review_id ORDER BY id) as seqnum
FROM negatives n
WHERE n.review_id = 1
)
) pn
GROUP BY review_id, id
ORDER BY review_id, id;
Note this will return no rows if there are no reviews (positive and negative). You can incorporate a left join if that really is a consideration.

If you don't need any information about reviewer, why you join tables with reviewer? just limit tables by review_id=1.
The following query doesn't cover your need totally, however, maybe be helpful for your problem. Consider that Union is much more efficient that Join. If you can use Union, avoid using Join.
(SELECT positives.libelle AS positive,NULL AS negative FROM positives WHERE review_id=1)
UNION
(SELECT NULL,negatives.libelle FROM negatives WHERE review_id=1)

Related

MySQL Spring complicated query - ways to order and query efficiency

I run this complicated query on Spring JPA Repository.
My goal is to get all info from the site table, ordering it by events severity on each site.
This is my query:
SELECT alls.* FROM sites AS alls JOIN
(
SELECT distinct ets.id FROM
(
SELECT s.id, et.`type`, et.severity_level, COUNT(et.`type`) FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
ORDER BY et.severity_level, COUNT(et.`type`) DESC
) AS ets
) as etsd ON alls.id = etsd.id
The second select (the one with "distinct") returns site_ids ordered correctly by severity.
Note that there are different event_types + severity in each site, and I use pagination on the answer, so I need the distinct.
The problem is - the main select doesn't keep this order.
Is there any way to keep the order in one complicated query?
Another related question - one of my ideas was making two queries:
The "select distinct" query that will return me the order --> saved in a list "order list"
The main "sites" query (that becomes very simple) with "where id in {"order list"}
Order the second query in code by "order list".
I use the query every 10 seconds, so it is very sensitive on performance.
What seems to be faster in this case - original complicated query or those 2?
Any insight will be appreciated.
Tnx a lot.
A quirk of SQL's declarative set-oriented syntax for us procedural programmers: ORDER by clauses in subqueries are not carried through to the outer query, except sometimes by accident. If you want ordering at any query level, you must specify it at that level or you will get unpredictable results. The query optimizers are usually smart enough to avoid wasting sort operations.
Your requirement: give at most one sites row for each sites.id value, ordered by the worst event. Worst: lowest event severity, and if there are more than one event with lowest severity, the largest count.
Use this sort of thing to get the "worst" for each id, in place of DISTINCT.
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
This gives at most one row per sites.id value. Then your outer query is
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
Putting it all together:
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
SELECT s.id, et.severity_level, COUNT(et.`type`) num
FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
An index on users.user_id will help performance for these single-user queries.
If you still have performance trouble, please read this and ask another question.

How to select comment count, Sum of votes, and whether active user has voted

Im having trouble structuring my MySQL query to return an accurate comment count, sum of votes, and the active users vote.
My tables are
wall_posts ( id, message, username, etc )
comments ( id, wall_id, username, text, etc )
votes ( id, wall_id, vote (+1 or -1), username )
My query looks like this
SELECT
wall_posts.*,
COUNT( comments.wall_id ) AS comment_count,
COALESCE( SUM( v1.vote ), 0 ) AS vote_tally,
v2.vote
FROM
wall_posts
LEFT JOIN comments ON wall_posts.id = comments.wall_id
LEFT JOIN votes v1 ON wall_posts.id = v1.wall_id
LEFT JOIN votes v2 ON wall_posts.id = v2.wall_id AND v2.username=:username
WHERE
symbol =: symbol
GROUP BY
wall_posts.id
ORDER BY
date DESC
LIMIT 15
It works for always returning the correct value for the specific active users vote (+1 or -1) or null if hasnt voted. If there are no comments on an item, the total vote sum is correct. If there are any comments, the vote sum will always be equal to the comment count, possibly with a negative sign if there are down votes but always equal to the amount of comments.
I think its obviously the way ive connected my tables but i just cant figure out why its copying the comment count, 1000000 points to someone who can explain this to me :)
You need to perform the aggregate operations in subqueries. Right now instead you're JOINing all of the tables (pre-aggregation) together. If you remove the aggregates (and the GROUP BY) you'll see the large mass of data which doesn't really mean anything.
Instead, try this (note I'm using a VIEW):
CREATE VIEW walls_posts_stats AS
SELECT
wall_posts.id,
COALESCE( comments_stats.comment_count, 0 ) AS comment_count,
COALESCE( votes_stats.vote_tally, 0 ) AS vote_tally
FROM
wall_posts
LEFT OUTER JOIN
(
SELECT
wall_id,
COUNT(*) AS comment_count
FROM
comments
GROUP BY
wall_id
) AS comments_stats ON wall_posts.id = comments_stats.wall_id
LEFT OUTER JOIN
(
SELECT
wall_id,
SUM( vote ) AS vote_tally
FROM
votes
GROUP BY
wall_id
) AS votes_stats ON wall_posts.id = votes_stats.wall_id
Then you can query it JOINed with your original wall data:
SELECT
wall_posts.*, -- note: avoid the use of * in production queries
stats.comment_count,
stats.vote_tally,
user_votes.vote
FROM
wall_posts
INNER JOIN walls_posts_stats AS stats ON wall_posts.id = stats.id
LEFT OUTER JOIN
(
SELECT
wall_id,
vote
FROM
votes
WHERE
username = :username
) AS user_votes ON wall_posts.id = user_votes.wall_id
ORDER BY
date DESC
LIMIT 15
Hypothetically you could combine it into a single large query (basically copy+paste the VIEW body into the INNER JOIN walls_posts_stats clause) but I feel that would introduce maintainability issues.
While MySQL does support views, it does not support parameterized views (aka composable table-valued functions; stored procedures are not composable) so that's why the user_votes subquery isn't in the walls_posts_stats VIEW.

Joining on "greater than" returning more than one row for left table

I have a query.
SELECT * FROM users LEFT JOIN ranks ON ranks.minPosts <= users.postCount
This returns a row every time it is matched. By using a GROUP BY users.id I get each row as a individual id.
However, when they group I only get the first row. I would instead like the row with the highest value of ranks.minPosts
Is there a way to do this, also, would it be faster (less resources) to just use two different queries?
Assuming there is only one column in ranks that you want, you can do this using a correlated subquery:
SELECT u.*,
(select r.minPosts
from ranks r
where r.minPosts <= u.PostCount
order by minPosts desc
limit 1
) as minPosts
FROM users u;
If you need the entire row from ranks, then join it back in:
SELECT ur.*, r.*
FROM (SELECT u.*,
(select r.minPosts
from ranks r
where r.minPosts <= u.PostCount
order by minPosts desc
limit 1
) as minPosts
FROM users u
) ur join
ranks r
on ur.minPosts = r.minPosts;
(The * is for convenience; you should list out the columns you want.)
Because you're using mysql, this will work:
SELECT * FROM (
SELECT *, users.id user_id
FROM users
LEFT JOIN ranks ON ranks.minPosts <= users.postCount
ORDER BY ranks.minPosts DESC
) x
GROUP BY user_id
Mysql always returns the first row encountered for each unique group, so if you first order the data, then use the non-standard grouping behaviour, you'll get the row you want.
Disclaimer:
Although this works reliably in practice, the mysql documentation says not to rely on it. If you use this convenient approach (which will reliably pass any test you can write), you should consider that it is not recommended by mysql and that later releases of mysql may not continue behave in this way.
What we'd really like to do would be to order the rows by ranks.minPosts before the group by. Unfortunately MySQL doesn't support that without using a subquery of some form.
If the ranks are already ordered by their ids then you can extract the id by selecting MAX(ranks.id), and if they're not, you can still get the highest ranks.minPosts by selecting MAX(ranks.minPosts). However, it would be nice to be able to get the entire record. I guess you're left with the subquery solution, which is as follows:
SELECT <fields> FROM users LEFT JOIN
(SELECT * FROM ranks ORDER BY minPosts DESC) as r
ON r.minPosts <= users.postCount GROUP BY users.id

How to select n% random rows with specific condition in SQL?

I have two tables.
Table1 has two columns: brand and review_counter.
Table2 also has two columns: brand and review.
There are several reviews for each brand. Is there any way in SQL to randomly select around 10% of reviews for each brand, without using "top n" command?
For example for 'Sony' there are 2,005 reviews, and I need to select 10% of them, 200 reviews.
Thank you in advance.
Possible method using user counters. Compared to a solution similar to that above this is likely to bring back a more accurate 10% of reviews per brand, but it also likely to be slower (neither solution will be fast as both rely on using RAND() on every row on the tables).
This gets all the rows, ordered by brand and then RAND(). It uses that as a sub query and adds a sequence number, resetting back to 1 for the first record for each brand. Then that in turn is used as the source for a query which eliminates records where the generated sequence number is <= to a tenth of the reviews for that brand.
SELECT sub1.brand, sub1.review
FROM
(
SELECT sub0.brand, sub0.reviews_wanted, sub0.review, #cnt:=IF(#brand = brand, #cnt+1, 1) AS cnt, #brand := brand
FROM
(
SELECT Table1.brand, (Table1.review_counter * 0.1) AS reviews_wanted, Table2.review
FROM Table1
INNER JOIN Table2
ON Table1.brand = Table2.brand
ORDER BY Table1.brand, RAND()
) sub0
CROSS JOIN (SELECT #cnt:=0, #brand:='') sub2
) sub1
WHERE cnt <= sub1.reviews_wanted
EDIT.
This might be a bit more memory efficient (although probably slower).
This has a sub query that gets the unique id of all the reviews for a brand in a random order, along with a count that is 1 tenth of the number of reviews for the brand. It then uses the count with SUBSTRING_INDEX to get the ids of the first random 10%, and joins that using FIND_IN_SET with the reviews table.
SELECT sub0.brand, Table2.review
FROM
(
SELECT Table1.brand, CEIL(Table1.review_counter * 0.1) AS reviews_wanted, GROUP_CONCAT(Table2.id ORDER BY RAND()) AS id
FROM Table1
INNER JOIN Table2
ON Table1.brand = Table2.brand
GROUP BY Table1.brand, reviews_wanted
) sub0
INNER JOIN Table2
ON FIND_IN_SET(Table2.id, SUBSTRING_INDEX(sub0.id, ',', reviews_wanted))
You might be able to do something a bit more efficient using one of the solutions here:-
How can i optimize MySQL's ORDER BY RAND() function?
RAND() generates random values between 0 and 1. Why don't you try this?
UPDATED2
SELECT review
FROM
(
SELECT review, (review_counter * RAND()) / review_counter AS rand
FROM Table1 INNER JOIN Table2 ON Table1.brand = Table2.brand
) t
WHERE rand < 0.1

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, is using non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not sepend on the grouping items). This can give indeterminate (semi-random) results.
Second, ( to avoid the indeterminate results) you have added an ORDER BY inside a subquery which (non-standard or not) is not documented anywhere in MySQL documentation that it should work as expected. So, it may be working now but it may not work in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.