My website contains pieces of content on which users can vote (like/dislike similar to reddit upvotes). When selecting an individual piece of content, I run the following subqueries to get the number of likes, the number of dislikes and the current user's vote.
The votes are stored in a separate table {contentId, userId, vote}
SELECT
[... BUNCH OF FIELDS ...]
(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::LIKE.') AS likes,
(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::DISLIKE.') AS dislikes,
COALESCE((SELECT vote FROM votes vt WHERE vt.cId = c.contentId AND userId = '.USER_ID.'), '.Constants::NO_VOTE.') AS myVote
FROM content
[... OTHER STUFF ... ]
Is there a better way to achieve this (combine those subqueries or otherwise)?
In terms of performance, those correlated subqueries can eat your lunch. And devour your lunchbox too, for large sets, because of the way MySQL processes them. Each of those subqueries gets executed for every row returned in the outer query. And that can get very expensive for large sets.
An alternative approach is to use an inline view to materialize the likes and dislikes for all content, and then do a join operation to that.
But, this approach can be expensive too, particularly when you are only needing the vote "counts" for just a few content rows, out of a bazillion rows. Often, there is a predicate from the outer query that can also be incorporated into the inline view, to limit the number of rows that need to be examined and returned.
We want to use an OUTER join to that inline view, so it returns a result equivalent to your query; returning a row from content when there are no matching rows in the vote table.
SELECT [... BUNCH OF FIELDS ...]
, COALESCE(v.likes,0) AS likes
, COALESCE(v.dislikes,0) AS dislikes
, COALESCE(v.myvote,'.Constants::NO_VOTE.') AS myvote
FROM content c
LEFT
JOIN ( SELECT vt.cId
, SUM(vt.vote = '.Constants::LIKE.') AS likes
, SUM(vt.vote = '.Constants::DISLIKE.') AS dislikes
, MAX(IF(vt.userId = '.USER_ID.',vt.vote,NULL)) AS myvote
FROM votes vt
GROUP
BY vt.cId
) v
ON v.cId = c.contentId
[... OTHER STUFF ... ]
Note that the inline view query (aliased as v) is going to look at EVERY single row from the votes table. If you only need a subset, then consider adding an appropriate predicate (either in a WHERE clause or as a JOIN to another table). There's no indication from the [... OTHER STUFF ...] in your query whether it's returning just a few rows from content or if you are needing all of the rows because you are ordering by likes, etc.
For a small number of rows selected from the content table, using the correlated subqueries (like in your query) can actually be faster than materializing a huge inline view and performing a join operation against it.
Oh... and for both queries, it goes without saying that an appropriate index on the votes table with a leading column of cId will benefit performance. For the inline view, you don't want the overhead of MySQL having to perform a filesort operation on all of those rows to do the GROUP BY. And for the correlated subqueries, you want them to use a index range scan, not a full scan.
Your problem is simple your current sub queries are running for every single row returned.
You need to join to that data instead. You will need to change the code I've added to give you the correct counts but this should point you in the right direction.
SELECT
BLAH
Likes,
Dislikes
FROM CONTENT as C
INNER JOIN (
SELECT
cID,
COUNT(votes) as Likes, --you will need to alter this
COUNT(votes) as Dislikes --to count your up and downvotes
FROM Votes
GROUP BY cID
) AS V
ON V.cID = C.ContentID
Related
I have written a MYSQL query without much expertise in this area but as my database has increased in size, I'm finding the results are taking far too long to be returned. I can understand why but I can't figure out how to better group my query so that MYSQL isn't searching through the entire database to return the results. I know there is a far more efficient way to do this but I can't figure out how. If I remove the ORDER BY statement, the results are returned in less than a quarter of the time. As it stands now with a table that has 180,000 entries in it (members), it's taking about 4 seconds to return the results.
SELECT members.mem_id, members.username, members.online,
members.dob, members.regdate, members.sex,
members.mem_type, members.aboutme,
geo_cities.name AS city,
geo_countries.name AS country, photos.photo_path
FROM members
LEFT JOIN geo_cities
ON members.cty_id=geo_cities.cty_id
LEFT OUTER JOIN geo_countries
ON geo_cities.con_id=geo_countries.con_id
RIGHT OUTER JOIN photos
ON members.mem_id=photos.mem_id
WHERE (photos.main=1
AND photos.approved=1
AND members.banned!="1"
AND members.profile_photo="1"
AND members.profile_essentials="1"
AND members.profile_user="1")
ORDER BY lastdate DESC
LIMIT 12
It looks like you want to show the most recent 12 members who meet certain criteria.
A few things.
Your RIGHT JOIN on photos is actually an ordinary inner JOIN: its columns appear in your WHERE clause.
You probably need compound indexes on the members and photos tables.
SELECT many columns FROM ... JOIN ... ORDER BY column... LIMIT 12 is a notorious performance antipattern: It constructs a complex result set, then sorts the whole thing, then discards almost all of it. Wasteful.
You have WHERE....members.banned != "1" Inequality filters like this make SQL work harder (==slower) than equalities. If you can change that to = "0" or something like that do it.
(I guess your lastdate column is in your members table, but you didn't tell us that in your question.)
So try something like this to find the twelve members you want to display.
SELECT members.mem_id
FROM members
JOIN photos ON members.mem_id=photos.mem_id
WHERE photos.main=1
AND photos.approved=1
AND members.banned!="1"
AND members.profile_photo="1"
AND members.profile_essentials="1"
AND members.profile_user="1")
ORDER BY lastdate DESC
LIMIT 12
That gets you the ids of the twelve members you want. Use it in your main query.
SELECT members.mem_id, members.username, members.online,
members.dob, members.regdate, members.sex,
members.mem_type, members.aboutme,
geo_cities.name AS city,
geo_countries.name AS country, photos.photo_path
FROM members
LEFT JOIN geo_cities ON members.cty_id=geo_cities.cty_id
LEFT JOIN geo_countries ON geo_cities.con_id=geo_countries.con_id
JOIN photos ON members.mem_id=photos.mem_id
WHERE members.mem_id IN (
SELECT members.mem_id
FROM members
JOIN photos ON members.mem_id=photos.mem_id
WHERE photos.main=1
AND photos.approved=1
AND members.banned!="1"
AND members.profile_photo="1"
AND members.profile_essentials="1"
AND members.profile_user="1")
ORDER BY lastdate DESC
LIMIT 12
)
ORDER BY lastdate DESC
LIMIT 12
This finds the twelve members you care about, then pulls out only their records, instead of pulling all the records.
Then, create a compound index on members(profile_photo, profile_essentials, profile_user, banned, lastdate). That compound index will speed up your WHERE clause a great deal.
Likewise, create a compound index on photos(mem_id, main, approved, photo_path).
Things always get exciting when databases start to grow! Read Markus Winand's online book https://use-the-index-luke.com/
I have a relatively simple query that returns a user profile, together with 2 counts related to that user (stream events & inventory);
SELECT u.*, r.regionName, COUNT(i.fmid) AS invcount, COUNT(s.fmid) AS streamcount
FROM fm_users u
JOIN fm_regions r
ON u.region=r.regionid
LEFT OUTER JOIN fm_inventory i
ON u.fmid=i.fmid
LEFT OUTER JOIN fm_stream s
ON u.fmid=s.fmid
WHERE u.username='sampleuser'
Both the inventory & stream values could be zero, hence using a left outer join.
However, the values for both currently return numbers for all users, not the specific user (always identified as integer 'fmid' in each table). This is obviously because the query doesn't specify a 'where' for either count - but I'm not sure where. If I change the last part to this;
WHERE u.username='sampleuser' AND s.fmid=u.fmid AND i.fmid=u.fmid
GROUP BY u.fmid
It still returns incorrect numbers, albeit 'different' numbers depending on the search criteria - and the name number for both invcount and streamcount.
Your query cross joins the fm_inventory and fm_stream tables for each user. For example if a specific user has 2 matching fm_inventory rows, and fm_stream also has two, the result will have 4 (=2 x 2) rows and each will be counted in both COUNT() functions.
The way to accomplish what you are doing is to use correlated subselects for the counts.
SELECT u.*, r.regionName,
(select COUNT(*) from fm_inventory i
where i.fmid = u.fmid) AS invcount,
(select COUNT(*) from fm_stream s
where s.fmid = u.fmid) AS streamcount
FROM fm_users u
JOIN fm_regions r
ON u.region=r.regionid
WHERE u.username='sampleuser'
This allows the counts to be independent of each other. This works even if more than just one user is selected. In that case you need a group by clause, but otherwise it is the same syntax.
When you mix normal columns with aggregates, include all normal columns in the GROUP BY. In this particular case, you are missing the r.regionName in the GROUP BY.
See longer explanation.
SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.active and s.public are index as well as u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.
Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.
Perhaps write it as a stored procedure or view... You could also try selecting all the IDs first then running the count on the result... if you do it all as one query it may be faster. Generally optimisation is done by using nested selects or making the server do the work so in this context that is all I can think of.
SELECT Count(*) FROM
(SELECT song.user_id FROM
(SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) as t
JOIN user AS u
ON(t.user_id = u.user_id))
Also be sure you are using the correct kind of join.
Its particular query pops up in the slow query log all the time for me. Any way to improve its efficiency?
SELECT
mov_id,
mov_title,
GROUP_CONCAT(DISTINCT genres.genre_name) as all_genres,
mov_desc,
mov_added,
mov_thumb,
mov_hits,
mov_numvotes,
mov_totalvote,
mov_imdb,
mov_release,
mov_type
FROM movies
LEFT JOIN _genres
ON movies.mov_id = _genres.gen_movieid
LEFT JOIN genres
ON _genres.gen_catid = genres.grenre_id
WHERE mov_status = 1 AND mov_incomplete = 0 AND mov_type = 1
GROUP BY mov_id
ORDER BY mov_added DESC
LIMIT 0, 20;
My main concern is in regard to the group_concat function, which outputs a comma separated list of genres associated with the particular film, which I put through a for loop and make click-able links.
Do you need the genre names? If you can do with just the genre_id, you can eliminate the second join. (You can fill in the genre name later, in the UI, using a cache).
What indexes do you have?
You probably want
create index idx_movies on movies
(mov_added, mov_type, mov_status, mov_incomplete)
and most certainly the join index
create index ind_genres_movies on _genres
(gen_mov_id, gen_cat_id)
Can you post the output of EXPLAIN? i.e. put EXPLAIN in front of the SELECT and post the results.
I've had quite a few wins with using SELECT STRAIGHT JOIN and ordering the tables according to their size.
STRAIGHT JOIN stops mysql guess which order to join tables and does it in the order specified so if you use your smallest table first you can reduce the amount of rows being joined.
I'm assuming you have indexes on mov_id, gen_movieid, gen_catid and grenre_id?
The 'using temporary, using filesort' is from the group concat distinct genres.genre_name.
Trying to get a distinct on a column without an index will cause a temp table to be used.
Try adding an index on genre_name column.
I have the following query:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123;
I have the following questions:
Is the USING syntax synonymous with ON syntax?
Are these joins evaluated left to right? In other words, does this query say: x = companies JOIN users; y = x JOIN jobs; z = y JOIN useraccounts;
If the answer to question 2 is yes, is it safe to assume that the companies table has companyid, userid and jobid columns?
I don't understand how the WHERE clause can be used to pick rows on the companies table when it is referring to the alias "j"
Any help would be appreciated!
USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.
SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.
That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the third JOIN as joining the new table to the results of the first JOIN, the fourth JOIN as joining the results of the second JOIN, and so on.
More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.
No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid and useraccounts has a userid. However, only one of companies or user needs a userid for the JOIN to work.
The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.
I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.
But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:
Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.
Is the JOIN performed on a foreign key field? This also gets high priority.
Does an index exist on the joined field? If so, bump the priority.
Is a JOIN operation performed on a field in WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.
What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.
How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.
Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
And here's a good article to read:
http://www.informit.com/articles/article.aspx?p=377652
ON EDIT:
To answer your 4th question: You aren't querying the "companies" table. You're querying the joined cross-product of ALL four tables in your FROM and USING clauses.
The "j.jobid" alias is just the fully-qualified name of one of the columns in that joined collection of tables.
In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:
EXPLAIN SELECT [...]
See "7.2.1 Optimizing Queries with EXPLAIN"
Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.
Build schema:
CREATE TABLE users (
name text
);
CREATE TABLE orders (
order_id text,
user_name text
);
CREATE TABLE shipments (
order_id text,
fulfiller text
);
Add data:
INSERT INTO users VALUES ('Bob'), ('Mary');
INSERT INTO orders VALUES ('order1', 'Bob');
INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
Run query:
SELECT *
FROM users
LEFT OUTER JOIN orders
ON orders.user_name = users.name
JOIN shipments
ON shipments.order_id = orders.order_id
Result:
Only the Bob row is returned
Analysis:
In this query the LEFT OUTER JOIN was evaluated first and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN.
Second query:
SELECT *
FROM users
LEFT OUTER JOIN (
orders
JOIN shipments
ON shipments.order_id = orders.order_id)
ON orders.user_name = users.name
Result:
One row for Bob (with the fulfillment data) and one row for Mary with NULLs for fulfillment data.
Analysis:
The parenthesis changed the evaluation order.
Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html
SEE http://dev.mysql.com/doc/refman/5.0/en/join.html
AND start reading here:
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.
These changes have five main aspects:
The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).
Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.
Resolution of column names in NATURAL or USING joins.
Transformation of NATURAL or USING joins into JOIN ... ON.
Resolution of column names in the ON condition of a JOIN ... ON.
Im not sure about the ON vs USING part (though this website says they are the same)
As for the ordering question, its entirely implementation (and probably query) specific. MYSQL most likely picks an order when compiling the request. If you do want to enforce a particular order you would have to 'nest' your queries:
SELECT c.*
FROM companies AS c
JOIN (SELECT * FROM users AS u
JOIN (SELECT * FROM jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123)
)
as for part 4: the where clause limits what rows from the jobs table are eligible to be JOINed on. So if there are rows which would join due to the matching userids but don't have the correct jobid then they will be omitted.
1) Using is not exactly the same as on, but it is short hand where both tables have a column with the same name you are joining on... see: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm
It is more difficult to read in my opinion, so I'd go spelling out the joins.
3) It is not clear from this query, but I would guess it does not.
2) Assuming you are joining through the other tables (not all directly on companyies) the order in this query does matter... see comparisons below:
Origional:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123
What I think it is likely suggesting:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = u.userid
JOIN useraccounts AS us on us.userid = u.userid
WHERE j.jobid = 123
You could switch you lines joining jobs & usersaccounts here.
What it would look like if everything joined on company:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = c.userid
JOIN useraccounts AS us on us.userid = c.userid
WHERE j.jobid = 123
This doesn't really make logical sense... unless each user has their own company.
4.) The magic of sql is that you can only show certain columns but all of them are their for sorting and filtering...
if you returned
SELECT c.*, j.jobid....
you could clearly see what it was filtering on, but the database server doesn't care if you output a row or not for filtering.