MySQL alternative to nested queries? - mysql

I have read that using nested queries is not a good idea. It was said that nested queries slow MySQL down a lot, so I figured I should avoid them. But what is the alternative?
For example I have a comments rating system which helps bring top-rated comments to the top and it goes in 2 tables:
comments which stores comments
comment_ratings which stores the comment ID and the person who has rated it.
Note: there are only positive ratings, so if a record exists in the comment_ratings table it's +1.
So now if I wanted to pick up comments for some stuff I'd go like
SELECT stuff, (SELECT COUNT(*) FROM comment_ratings s WHERE s.id = c.id) as votes
FROM comments c
ORDER BY votes DESC
How would I do this without using a nested query?

Whether nested queries are good or bad depends on the situation. In your particular example, if you have an index on comment_ratings(id), then there is probably no issue. Maybe that should be comment_ratings(comment_id) -- the naming convention is poor for these tables.
You could replace this with an aggregation query:
select c.*, count(cr.id) as votes
from comments c left join
comment_ratings cr
on c.id = cr.id
group by c.id
order by votes desc;
However, because of the way that MySQL implements group by, this might perform worse than your original query. I prefer the group by. To me, it more clearly describes what you want and most other database engines will optimize it well.
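To check that the two forms return the same counts, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the column names (comment_id, user_id, body) and the index name are assumptions, not the asker's real schema. It says nothing about which form is faster on MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; table layout and names are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE comment_ratings (comment_id INTEGER, user_id INTEGER);
    CREATE INDEX idx_ratings_comment ON comment_ratings (comment_id);
    INSERT INTO comments VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO comment_ratings VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Correlated (nested) subquery, as in the question.
subquery_rows = conn.execute("""
    SELECT c.id,
           (SELECT COUNT(*) FROM comment_ratings cr
             WHERE cr.comment_id = c.id) AS votes
    FROM comments c
    ORDER BY votes DESC, c.id
""").fetchall()

# LEFT JOIN plus GROUP BY, as in the answer.
join_rows = conn.execute("""
    SELECT c.id, COUNT(cr.comment_id) AS votes
    FROM comments c
    LEFT JOIN comment_ratings cr ON cr.comment_id = c.id
    GROUP BY c.id
    ORDER BY votes DESC, c.id
""").fetchall()

print(subquery_rows)
print(join_rows)
```

Both queries keep comment 3, which has no ratings, with a count of 0.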

select stuff, count(*) as votes
from comments c, comment_ratings cr
where c.id = cr.id
group by stuff
order by votes desc;
And as Gordon mentioned, to keep the comments that have no ratings, use a left join:
select stuff, count(cr.id) as votes
from comments c left join
comment_ratings cr on c.id = cr.id
group by stuff
order by votes desc;
http://sqlfiddle.com/#!2/79e54/2
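The difference between the inner join, the left join, and the choice of COUNT(*) versus COUNT(column) can be seen in a small sketch. This uses Python's sqlite3 as a stand-in for MySQL; the votes helper and the comment_id column are hypothetical names made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL; schema and helper names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, stuff TEXT);
    CREATE TABLE comment_ratings (comment_id INTEGER, user_id INTEGER);
    INSERT INTO comments VALUES (1, 'rated twice'), (2, 'never rated');
    INSERT INTO comment_ratings VALUES (1, 10), (1, 11);
""")

def votes(join_kind, count_expr):
    """Run the vote-count query with the given join type and COUNT expression."""
    sql = f"""
        SELECT c.stuff, {count_expr} AS votes
        FROM comments c
        {join_kind} JOIN comment_ratings cr ON cr.comment_id = c.id
        GROUP BY c.id, c.stuff
        ORDER BY votes DESC, c.id
    """
    return conn.execute(sql).fetchall()

inner_rows = votes("INNER", "COUNT(*)")                # unrated comment disappears
left_star_rows = votes("LEFT", "COUNT(*)")             # unrated comment counted as 1
left_col_rows = votes("LEFT", "COUNT(cr.comment_id)")  # unrated comment counted as 0

print(inner_rows)
print(left_star_rows)
print(left_col_rows)
```

With a left join, COUNT(*) counts the NULL-padded row too, which is why the answers count a column from comment_ratings instead.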

Related

Why does COUNT(*) do the same as COUNT(column) in this query?

I am finishing a SQL course and they have a query example that I don't quite understand.
I have these tables, and I need to count how many 'etiquetas' (tags in Spanish) are in each post. The course gives this solution:
SELECT posts.titulo, COUNT(*) num_etiquetas
FROM posts
INNER JOIN posts_etiquetas ON posts.id = posts_etiquetas.post_id
INNER JOIN etiquetas ON etiquetas.id = posts_etiquetas.etiqueta_id
GROUP BY posts.id
ORDER BY num_etiquetas DESC;
I have been trying to understand this query and two questions came up:
Why does COUNT(*) do the same as COUNT(etiquetas.nombre)? To me only the latter makes sense. I don't understand why COUNT(*) works in the given solution: isn't COUNT(*) supposed to count the total number of rows? Maybe the issue is that I don't really understand how GROUP BY works.
Why doesn't deleting this line change the result? What is its use in the original solution?
INNER JOIN etiquetas ON etiquetas.id = posts_etiquetas.etiqueta_id

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, which count_vote and which r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but with ONLY_FULL_GROUP_BY disabled it silently treats count_vote, r.text_c as ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
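If the intent was "the top-voted review per fruit, lowest r_id winning ties" (that is only a guess from the ORDER BY in the question), one deterministic way to write it is a NOT EXISTS filter. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL and made-up sample rows:

```python
import sqlite3

# SQLite stands in for MySQL; the sample data and the assumed intent
# (best review per fruit) are illustrative, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviews (r_id INTEGER PRIMARY KEY, fruit_id INTEGER,
                          customer_id INTEGER, count_vote INTEGER, text_c TEXT);
    INSERT INTO Reviews VALUES
        (1, 1, 100, 5, 'ok apple'),
        (2, 1, 101, 9, 'great apple'),
        (3, 2, 100, 4, 'fine pear'),
        (4, 2, 102, 4, 'nice pear');
""")

# Keep a review only if no other review of the same fruit beats it:
# more votes wins, and on equal votes the lower r_id wins.
top_reviews = conn.execute("""
    SELECT r.fruit_id, r.r_id, r.count_vote, r.text_c
    FROM Reviews r
    WHERE NOT EXISTS (
        SELECT 1
        FROM Reviews better
        WHERE better.fruit_id = r.fruit_id
          AND (better.count_vote > r.count_vote
               OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
    )
    ORDER BY r.fruit_id
""").fetchall()

print(top_reviews)
```

Unlike GROUP BY without aggregates, every column here comes from the same, well-defined row.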
Your Categories table does not seem to be joined/related to the others, so this produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery.
And you don't need an ORDER BY on a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = 'Fruits';
For better readability you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.
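The row multiplication caused by joining Categories without a join condition, which several answers point out, can be sketched like this. Python's sqlite3 stands in for MySQL, and the category_id column and sample rows are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; column names and data are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fruits (fruit_id INTEGER PRIMARY KEY, fruit_name TEXT);
    CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, category TEXT);
    INSERT INTO Fruits VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO Categories VALUES (1, 'Fruits'), (2, 'Vegetables'), (3, 'Nuts');
""")

# Without a join condition every fruit is paired with every category.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM Fruits b CROSS JOIN Categories c"
).fetchone()[0]

# The WHERE filter only trims the product; it does not relate the tables.
filtered = conn.execute("""
    SELECT b.fruit_name, c.category
    FROM Fruits b CROSS JOIN Categories c
    WHERE c.category = 'Fruits'
    ORDER BY b.fruit_id
""").fetchall()

print(cross_count)
print(filtered)
```

Every fruit is labelled 'Fruits' here only because exactly one Categories row passes the filter, not because the tables are related.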

Multiple joins on the same table with counting in one query

I have an elementary question about an SQL query that joins the same table twice. It sounds very simple, but I am having some trouble with it. I hope someone can help me with this issue :)
I have two small tables: "peoples" (columns: id, name, ...) and "likes" (id, who, whom). People can give "likes" to each other. The relationship is many-to-many.
I want to get a table of people's likes: the count of received likes, the count of delivered likes, and the count of mutual likes.
Everything is correct when I use only one join. But with two joins (or more) MySQL combines all rows (as expected) and I get wrong values in the counts. I don't know how I should use the count/sum/group-by operators in this case :( I would like to do this in one query, without subqueries.
I used queries like these:
SELECT *, count(l1.whom), count(l2.whom)
FROM people p
LEFT JOIN likes l1 ON l1.who = p.id
LEFT JOIN likes l2 ON l2.whom = p.id
GROUP BY p.id;
SELECT p.id, name,
count(lwho.who) delivered_likes,
count(lwhom.whom) received_likes,
count(lmut.who) mutual_likes
FROM people AS p
LEFT JOIN likes AS lwho ON p.id = lwho.who
LEFT JOIN likes AS lwhom ON lwhom.id = lwho.id
LEFT JOIN likes AS lmut ON lwhom.who = lmut.whom AND lwhom.whom = lmut.who
GROUP BY p.id;
But they calculate the like counts incorrectly.
This is just a training exercise and performance is not important, but I guess that three joins in my last query are too many. Can I do it using 2 joins?
Thanks in advance for any help.
I surmise that there is a 1:N relationship between people and likes.
One problem with your second query, as far as I can tell, is that the lwhom correlation of likes is joined to lwho via id=id. Basically lwhom is lwho. I'd recommend changing the ON clause for this correlation from lwhom.id = lwho.id to p.id = lwhom.whom.
The counts will still be affected by the JOINs, however. Supposing that you have an ID column in the likes table, though, you could then have each COUNT tally the distinct like IDs per person. If not, consider using COUNT(DISTINCT correlation.who, correlation.whom) instead, since MySQL accepts a list of expressions there but not correlation.*.
Digressions aside, the following should hopefully work:
SELECT p.id, name,
count(distinct lwho.id) delivered_likes,
count(distinct lwhom.id) received_likes,
count(distinct lmut.id) mutual_likes
FROM people AS p
LEFT JOIN likes AS lwho ON p.id = lwho.who
LEFT JOIN likes AS lwhom ON p.id = lwhom.whom
LEFT JOIN likes AS lmut ON lwhom.who = lmut.whom AND lwhom.whom = lmut.who
GROUP BY p.id,p.name;
I have an SQL Fiddle here.
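To see why the plain counts come out wrong and why COUNT(DISTINCT ...) repairs them, here is a minimal sketch with four likes among three people. Python's sqlite3 stands in for MySQL, and the names and sample rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL; sample people and likes are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, who INTEGER, whom INTEGER);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Ann likes Bob and Cid, Bob likes Ann, Cid likes Bob.
    INSERT INTO likes VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1), (4, 3, 2);
""")

# Two joins multiply rows per person, so plain COUNT over-counts.
naive = conn.execute("""
    SELECT p.id, COUNT(l1.whom), COUNT(l2.whom)
    FROM people p
    LEFT JOIN likes l1 ON l1.who = p.id
    LEFT JOIN likes l2 ON l2.whom = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Counting distinct like IDs undoes the multiplication.
fixed = conn.execute("""
    SELECT p.id, p.name,
           COUNT(DISTINCT lwho.id)  AS delivered_likes,
           COUNT(DISTINCT lwhom.id) AS received_likes,
           COUNT(DISTINCT lmut.id)  AS mutual_likes
    FROM people AS p
    LEFT JOIN likes AS lwho  ON p.id = lwho.who
    LEFT JOIN likes AS lwhom ON p.id = lwhom.whom
    LEFT JOIN likes AS lmut  ON lwhom.who = lmut.whom AND lwhom.whom = lmut.who
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

print(naive)
print(fixed)
```

Ann delivers 2 likes and receives 1, but the naive query reports 2 and 2 because her two delivered rows are each paired with the single received row.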

Adding count(*) from a join to a query

Ugh... I really struggle with these MySQL joins...
Here's what I'm after. My current query looks like this.
SELECT profiles.photo, postings.postid, postings.text, postings.date, members.fname, members.lname, members.userid
FROM profiles, postings, members
WHERE postings.wallid=postings.posterid
AND postings.wallid=members.userid
AND postings.wallid=profiles.userid
I'm trying to add in a count(*) of matching records from another table. The other table is called likes, and postings.postid = likes.post_id.
What I've tried (and returns no results) is this...
SELECT profiles.photo, postings.postid, postings.text, postings.date, members.fname, members.lname, members.userid,
(SELECT COUNT(*) FROM likes WHERE postings.postid=likes.post_id)
FROM profiles, postings, members, likes
WHERE postings.wallid=postings.posterid
AND postings.wallid=members.userid
AND postings.wallid=profiles.userid
What I've done here is add the nested SELECT, which I thought would resolve this.
Basically, what I'm asking... given my first query, how can I also obtain the count of the number of records in the likes table where postings.postid = likes.post_id? As always, any help / tips / suggestions is always appreciated.
Try this:
SELECT profiles.photo, postings.postid, postings.text, postings.date, members.fname,members.lname, members.userid,
likes.like_count
FROM profiles, members, postings
LEFT JOIN (SELECT post_id, COUNT(*) like_count FROM likes GROUP BY post_id) as likes
ON postings.postid=likes.post_id
WHERE postings.wallid=postings.posterid
AND postings.wallid=members.userid
AND postings.wallid=profiles.userid
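One thing to watch with the pre-aggregated LEFT JOIN above: postings without any likes get NULL rather than 0 for like_count. A minimal sketch of that behaviour and a COALESCE fix, using Python's sqlite3 as a stand-in for MySQL; the tables are trimmed to the relevant columns, and user_id is an assumed column.

```python
import sqlite3

# SQLite stands in for MySQL; tables are reduced to the columns that matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postings (postid INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE likes (post_id INTEGER, user_id INTEGER);
    INSERT INTO postings VALUES (1, 'hello'), (2, 'world');
    INSERT INTO likes VALUES (1, 7), (1, 8);
""")

derived = """
    LEFT JOIN (SELECT post_id, COUNT(*) AS like_count
               FROM likes
               GROUP BY post_id) AS l
           ON l.post_id = p.postid
"""

# As written in the answer: posts with no likes get NULL (None in Python).
raw_rows = conn.execute(
    "SELECT p.postid, l.like_count FROM postings p" + derived + "ORDER BY p.postid"
).fetchall()

# COALESCE turns the missing count into 0.
rows = conn.execute(
    "SELECT p.postid, p.text, COALESCE(l.like_count, 0) FROM postings p"
    + derived + "ORDER BY p.postid"
).fetchall()

print(raw_rows)
print(rows)
```

Because the likes are aggregated before the join, each posting still appears exactly once.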
First off, it's a lot easier to use the new-style sql syntax for this. Conceptually, it's going to be confusing and tough to do outer joins with that old syntax. Second, you're missing the "profile" table and mistakenly seem to have self-joined the postings table. Try writing your queries more like this:
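SELECT
profiles.photo,
postings.postid,
postings.text,
postings.date,
members.fname,
members.lname,
members.userid,
COUNT(DISTINCT likes.like_id) as CountOfLikes -- like_id: assumed key column of likes
FROM profiles
INNER JOIN postings
ON postings.wallid=profiles.userid
INNER JOIN members
ON members.userid=postings.wallid
LEFT JOIN likes
ON likes.post_id = postings.postid
-- one output row per posting; without GROUP BY the COUNT collapses everything into one row
GROUP BY
profiles.photo, postings.postid, postings.text, postings.date,
members.fname, members.lname, members.userid;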

MySQL alternative to using a subquery

So let's say I have the following tables Person and Wage. It's a 1-N relation, where a person can have more than one wage.
**Person**
id
name
**Wage**
id
person_id
amount
effective_date
Now, I want to query a list of all persons and their latest wages. I can get the results by doing the following query:
SELECT
p.*,
( SELECT w.amount
FROM wages as w
WHERE w.person_id = p.id
ORDER BY w.effective_date DESC -- DESC so that LIMIT 1 picks the latest wage
LIMIT 1
) as wage_amount,
( SELECT w.effective_date
FROM wages as w
WHERE w.person_id = p.id
ORDER BY w.effective_date DESC
LIMIT 1
) as effective_date
FROM person as p
The problem is, my query will have multiple sub-queries from different tables. I want to make it as efficient as possible. Is there an alternative to using sub-queries that would be faster and give me the same results?
Proper indexing would probably make your version work efficiently (that is, an index on wages(person_id, effective_date)).
The following produces the same results with a single subquery:
SELECT p.*, w.amount, w.effective_date
from person p left outer join
(select person_id, max(effective_date) as maxdate
from wages
group by person_id
) maxw
on maxw.person_id = p.id left outer join
wages w
on w.person_id = p.id and w.effective_date = maxw.maxdate;
And this version might make better use of indexes than the above version:
SELECT p.*, w.amount, w.effective_date
from person p left outer join
wages w
on w.person_id = p.id
where not exists (select * from wages w2 where w2.person_id = w.person_id and w2.effective_date > w.effective_date);
Note that these versions will return multiple rows for a single person when there are two "wages" with the same maximum effective date.
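The tie behaviour described above is easy to reproduce. A minimal sketch follows, with Python's sqlite3 standing in for MySQL and made-up sample rows. The NOT EXISTS form is written with the subquery correlated on person_id, which it needs in order to compare only wages of the same person.

```python
import sqlite3

# SQLite stands in for MySQL; the sample people and wages are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wages (id INTEGER PRIMARY KEY, person_id INTEGER,
                        amount INTEGER, effective_date TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO wages VALUES
        (1, 1, 100, '2020-01-01'),
        (2, 1, 120, '2021-01-01'),
        (3, 2,  90, '2021-06-01'),
        (4, 2,  95, '2021-06-01');  -- same date as row 3: a tie
""")

# Version 1: join to the per-person maximum date.
max_join = conn.execute("""
    SELECT p.name, w.amount, w.effective_date
    FROM person p
    LEFT JOIN (SELECT person_id, MAX(effective_date) AS maxdate
               FROM wages GROUP BY person_id) maxw
           ON maxw.person_id = p.id
    LEFT JOIN wages w
           ON w.person_id = p.id AND w.effective_date = maxw.maxdate
    ORDER BY p.id, w.id
""").fetchall()

# Version 2: keep a wage only if the same person has no later one.
not_exists = conn.execute("""
    SELECT p.name, w.amount, w.effective_date
    FROM person p
    LEFT JOIN wages w ON w.person_id = p.id
    WHERE NOT EXISTS (SELECT 1 FROM wages w2
                      WHERE w2.person_id = w.person_id
                        AND w2.effective_date > w.effective_date)
    ORDER BY p.id, w.id
""").fetchall()

print(max_join)
print(not_exists)
```

Bob appears twice because of the tie, and Cid, who has no wages, is kept with NULLs by the left join.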
Subqueries can be a good solution like Sam S mentioned in his answer but it really depends on the subquery, the dbms you are using, and your indexes. See this question and answers for a good discussion on the performance of subqueries vs. joins: Join vs. sub-query
If performance is an issue for you, you must consider using the EXPLAIN command of your dbms. It will show you how the query is being built and where the bottlenecks are. Based on its results, you might consider rewriting your query some other way.
For instance, a join has usually yielded better performance, so you could rewrite your query according to this answer: https://stackoverflow.com/a/2111420/362298 and compare the two.
Note that creating the right indexes will also make a big difference.
Hope it helps.
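As a rough illustration of checking a plan before and after adding an index, here is a sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. This is a stand-in only: MySQL's EXPLAIN output looks different. The plan helper and the index name idx_wages_person_date are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; MySQL's EXPLAIN has a different output format.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE wages (id INTEGER PRIMARY KEY, person_id INTEGER,
                        amount INTEGER, effective_date TEXT)
""")

query = """
    SELECT amount FROM wages
    WHERE person_id = 1
    ORDER BY effective_date DESC
    LIMIT 1
"""

def plan(sql):
    """Return the human-readable plan steps (last column of each plan row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)
# Composite index suggested in the thread: wages(person_id, effective_date).
conn.execute("CREATE INDEX idx_wages_person_date ON wages (person_id, effective_date)")
after = plan(query)

print(before)
print(after)
```

The exact plan wording depends on the SQLite version; the point is only that the index shows up in the plan once it exists.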
Subqueries are very efficient as long as you make sure you use indexes. Try running EXPLAIN on your query and see if it uses the correct indexes.
SELECT p.name, w.amount, MAX(w.effective_date)
FROM Person p
LEFT JOIN Wage w ON w.person_id = p.id
GROUP BY p.name
I didn't test this query. Note that w.amount is neither grouped nor aggregated, so it is not guaranteed to come from the row with the latest effective_date.