Does LIMIT take effect before or after JOIN? - MySQL

It's a simple question, but I've found conflicting answers to it, so I'm not sure. Let me describe:
If you have a query, like this one:
SELECT logins.timestamp, users.name
FROM logins
LEFT JOIN users
ON users.id = logins.user_id
LIMIT 10
This would list 10 entries of the logins table (not necessarily the last 10, since there is no ORDER BY), replacing the user_id with the user's name via the JOIN.
Now my question is: does LIMIT take effect while the JOIN happens (so that only the first 10 entries are joined) or after the JOIN (so that the whole table is joined and only then cut down to the first 10 entries)?
I'm asking because the logins table in this example will have many entries, and I'm not sure whether the JOIN would be too costly performance-wise. If LIMIT caused only 10 JOINs to happen, that wouldn't be a problem.
A second question came up: is the behavior the same if DISTINCT is added? Will it still stop at 10 entries? And no, the results won't be sorted with ORDER BY.

Don't worry. The LIMIT is applied while the join runs: MySQL will not read through the entire logins table, but fetches it row by row (joining each row to users) until it has found 10 rows.
Do note that if a users.id appears twice in the users table, the JOIN will duplicate the matching logins row, once for each matching users row. The total number of rows will still be 10, but they will cover only 9 logins.
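The row-at-a-time behaviour described in this answer can be modelled with a lazy generator: the join only produces rows as they are requested, so taking 10 of them stops the scan of logins early. This is a conceptual sketch of a nested-loop join with made-up in-memory tables, not MySQL's actual executor; it also reproduces the duplicated-user case (users.id 3 appears twice).

```python
from itertools import islice

# Hypothetical in-memory stand-ins for the logins and users tables.
logins = [{"timestamp": t, "user_id": t % 5} for t in range(1000)]
users = [{"id": i, "name": f"user{i}"} for i in range(5)]
users.append({"id": 3, "name": "duplicate3"})  # users.id 3 now appears twice

stats = {"logins_read": 0}  # how many logins rows the "executor" has fetched

def nested_loop_left_join(logins, users, stats):
    """Yield joined rows lazily, fetching one logins row at a time."""
    for login in logins:
        stats["logins_read"] += 1
        # A real server would use an index lookup on users.id here.
        matches = [u for u in users if u["id"] == login["user_id"]]
        if not matches:
            # LEFT JOIN keeps a logins row even when no user matches.
            yield (login["timestamp"], None)
        for user in matches:
            yield (login["timestamp"], user["name"])

# LIMIT 10: stop pulling from the join as soon as 10 rows have been produced.
result = list(islice(nested_loop_left_join(logins, users, stats), 10))
print(len(result), "rows from", stats["logins_read"], "logins rows read")
```

Running it prints `10 rows from 9 logins rows read`: only 9 of the 1000 logins rows are ever fetched, and the login with user_id 3 contributes two of the 10 rows.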

Related

Executing extremely slow MySQL query

This query has multiple JOINs as well as aggregate functions.
Executing it for approximately 6000 users took 20 seconds.
Is there any way to make it run faster?
SELECT users.id, SUM(orders.totalCost) AS bought, COUNT(comment.id) AS commentsCount, COUNT(topics.id) AS topicsCount, COUNT(users_login.id) AS loginCount, COUNT(users_download.id) AS downloadsCount
FROM users
LEFT JOIN orders ON users.id=orders.userID AND orders.status=1
LEFT JOIN comment ON users.id=comment.userID
LEFT JOIN topics ON users.id=topics.userID
LEFT JOIN users_login ON users.id=users_login.userID
LEFT JOIN users_download ON users.id=users_download.userID
WHERE users.id='$userID'
GROUP BY users.id
ORDER BY `bought` DESC
[Image: result of running EXPLAIN on the query]
The EXPLAIN output shows you are doing full-table scans on everything except users. You need to create secondary (non-unique) indexes on userID on all the other tables in the join. That will speed up queries on individual users.
However, if you're going to process all users, do it in one pass: run a single SELECT without the WHERE users.id= clause. Your aggregation returns only one row per user, so create a single result set containing all the rows and iterate over that, instead of reissuing the query once per user. In this case the secondary indexes may still help, as counts can be determined from the index alone without looking at the tables themselves.
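A sketch of the single-pass version, assuming SQLite in place of MySQL and a trimmed-down, hypothetical schema (only orders, comment and topics; users_login and users_download are omitted, and the data is invented). Aggregating each child table in its own derived table before joining also avoids a side effect of chaining several one-to-many LEFT JOINs: every combination of a user's orders, comments and topics becomes a joined row, which multiplies both the work and the SUM/COUNT results.

```python
import sqlite3

# SQLite stands in for MySQL; the schema is a trimmed-down, hypothetical
# version of the question's tables (users_login/users_download omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY);
CREATE TABLE orders  (id INTEGER PRIMARY KEY, userID INT, totalCost REAL, status INT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, userID INT);
CREATE TABLE topics  (id INTEGER PRIMARY KEY, userID INT);
-- Secondary (non-unique) indexes on the join columns.
CREATE INDEX idx_orders_user  ON orders(userID);
CREATE INDEX idx_comment_user ON comment(userID);
CREATE INDEX idx_topics_user  ON topics(userID);
INSERT INTO users VALUES (1), (2);
INSERT INTO orders (userID, totalCost, status) VALUES (1, 10, 1), (1, 5, 1);
INSERT INTO comment (userID) VALUES (1), (1), (1);
INSERT INTO topics (userID) VALUES (1), (1);
""")

# Original shape: chained one-to-many LEFT JOINs, aggregated afterwards.
naive = conn.execute("""
SELECT users.id, SUM(orders.totalCost), COUNT(comment.id), COUNT(topics.id)
FROM users
LEFT JOIN orders  ON users.id = orders.userID AND orders.status = 1
LEFT JOIN comment ON users.id = comment.userID
LEFT JOIN topics  ON users.id = topics.userID
GROUP BY users.id ORDER BY users.id
""").fetchall()

# Single pass for all users: aggregate each child table first, then join.
per_user = conn.execute("""
SELECT u.id,
       COALESCE(o.total, 0) AS bought,
       COALESCE(c.cnt, 0)   AS commentsCount,
       COALESCE(t.cnt, 0)   AS topicsCount
FROM users u
LEFT JOIN (SELECT userID, SUM(totalCost) AS total FROM orders
           WHERE status = 1 GROUP BY userID) o ON o.userID = u.id
LEFT JOIN (SELECT userID, COUNT(*) AS cnt FROM comment GROUP BY userID) c ON c.userID = u.id
LEFT JOIN (SELECT userID, COUNT(*) AS cnt FROM topics GROUP BY userID) t ON t.userID = u.id
ORDER BY bought DESC
""").fetchall()

print(naive)     # user 1's sum and counts are inflated by the row fan-out
print(per_user)
```

With this data the first query reports user 1 with a sum of 90.0 and counts of 12, instead of 15.0, 3 and 2, because 2 orders × 3 comments × 2 topics produce 12 joined rows; the second query reports the intended values for every user in one result set.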

MySQL Selecting things where a condition on a row is met 2 or more times, but showing the two or more results

If I use GROUP BY, I get just 1 row per group. For example:
Sessions table: SessionId (other things)
Actions table: ActionId, SessionId, (other things)
With:
SELECT S.*, A.* FROM ActionList A JOIN SessionList S ON A.SessionId = S.SessionId
WHERE 1 /*various criteria to filter*/
ORDER BY S.SessionId DESC, ActionId DESC;
This shows me the most recent session at the top. Now I want to look only at sessions with 2 or more actions.
If I use GROUP BY A.SessionId, I can get COUNT(ActionId) and use HAVING to keep only the groups with the required count, but I won't get both (or all) rows, just one.
I suspect I can do this by JOINing a table of SessionIds and their action counts, but I'm fairly new to joins (I could do this via a subquery and ANY).
If a view would help, I would create a view of the form:
SELECT SessionId, COUNT(*) FROM Actions GROUP BY SessionId;
Or I could put this in parentheses and JOIN on it (but I confess I'd have to look up 3-table joins).
What is the neatest way to do this?
Also, is this where "foreign keys" come into play? They'd probably stop the "ambiguity errors" I get if I don't qualify SessionId. I've avoided them for fear of TRIGGERs; until recently I also didn't know about JOINs and just used subqueries. I've realised it is stupid to avoid things that were added to help.
Additionally, I'm quite timid with joins because I know what they do in the worst case: if I JOIN a table with m rows to another with n rows, I end up with m*n rows. That could be VERY large! I'm dealing with large tables (as in: the schema won't fit in RAM), so that is quite scary. I do know MySQL optimises well (it can move conditions from HAVING to WHERE and so forth), but still!
If you want to look at sessions with two or more actions, then use a join:
select sl.*
from SessionList sl join
(select SessionId, count(*) as cnt
from Actions
group by SessionId
) a
on sl.SessionId = a.SessionId and cnt > 1;
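The query above returns one row per session with more than one action. Since the question also asks to see the action rows themselves, the same derived table can be joined back to the actions table. A minimal sketch, assuming SQLite in place of MySQL, the answer's table names (SessionList, Actions), a made-up `started` column standing in for the "other things", a hypothetical index name, and invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SessionList (SessionId INTEGER PRIMARY KEY, started TEXT);
CREATE TABLE Actions (ActionId INTEGER PRIMARY KEY, SessionId INT);
CREATE INDEX idx_actions_session ON Actions(SessionId);
INSERT INTO SessionList VALUES (1, 'mon'), (2, 'tue'), (3, 'wed');
-- session 1 has two actions, session 2 has one, session 3 has three
INSERT INTO Actions VALUES (10, 1), (11, 1), (20, 2), (30, 3), (31, 3), (32, 3);
""")

rows = conn.execute("""
SELECT sl.SessionId, act.ActionId
FROM SessionList sl
JOIN (SELECT SessionId, COUNT(*) AS cnt
      FROM Actions
      GROUP BY SessionId) a
  ON sl.SessionId = a.SessionId AND a.cnt > 1
-- join back to Actions to list every action of the qualifying sessions
JOIN Actions act ON act.SessionId = sl.SessionId
ORDER BY sl.SessionId DESC, act.ActionId DESC
""").fetchall()
print(rows)
```

Session 2 is filtered out, and sessions 3 and 1 appear once per action. Each join is on SessionId, which is unique in SessionList and in the derived table, so every action row matches at most one row on the other side; the result grows with the number of qualifying actions rather than with the product of the table sizes.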

mysql count rows in query containing LEFT JOIN hangs server

I have two tables: people with 16,500 rows and visits with 17,000 rows.
My query contains a LEFT JOIN because I have to link visits to people. I'm aware that if a people record has no visits record, the visits columns will be NULL.
This simple query works like a charm.
SELECT * FROM people LEFT JOIN visits ON people.id = visits.id_people;
But when I try to count the returned rows, MySQL hangs (or keeps counting) for 30+ seconds, or until I kill it. That is not acceptable in a production environment.
Here are the different methods I tried for counting the resulting rows; all of them hang in the same way.
SELECT COUNT(*) FROM people LEFT JOIN visits ON people.id = visits.id_people;
SELECT SQL_CALC_FOUND_ROWS * FROM people LEFT JOIN visits ON people.id = visits.id_people;
SELECT FOUND_ROWS();
Strangely, these methods work fine on small test tables (5 rows each).
Can anyone help?
If you are creating a new MySQL table, you can specify a column to index by using the INDEX keyword. Indexes are something extra you can enable on your MySQL tables to increase performance; for this query, the one that matters is an index on the join column visits.id_people.
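For this query, the lookup that matters is the join column on the visits side, visits.id_people: without an index on it, the join may end up comparing every people row with every visits row. A minimal sketch, assuming SQLite in place of MySQL (where you would inspect the plan with EXPLAIN rather than EXPLAIN QUERY PLAN), a hypothetical index name `idx_visits_people`, made-up `name`/`visited` columns, and small invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INT, visited TEXT);
""")
conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
                 [(i, f"p{i}") for i in range(1, 101)])
# 170 visits spread over people 1..50; people 51..100 have no visits.
conn.executemany("INSERT INTO visits (id_people, visited) VALUES (?, ?)",
                 [(i % 50 + 1, "2020-01-01") for i in range(170)])

# Index the join column on the "many" side of the LEFT JOIN.
conn.execute("CREATE INDEX idx_visits_people ON visits(id_people)")

query = ("SELECT COUNT(*) FROM people "
         "LEFT JOIN visits ON people.id = visits.id_people")
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
count = conn.execute(query).fetchone()[0]
print(plan)
print(count)
```

The plan shows visits being searched through idx_visits_people instead of scanned, and the count comes out as 220: 170 matched visits plus one NULL-extended row for each of the 50 people without visits.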
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm
http://www.tutorialspoint.com/mysql/mysql-indexes.htm is also worth a look; it gives you a good idea of how they work.
cheers

How to grab most popular rows in table?

I have a comments table with almost 2 million rows, and we receive roughly 500 new comments per day. Each comment is assigned to a specific ID, and I want to grab the most popular "discussions" based on that ID.
I have an index on the ID column.
What is best practice? Do I just group by this ID and then sort by the IDs with the most comments? Is this the most efficient approach for a table this size?
Do I just group by this ID and then sort by the IDs with the most comments?
That's pretty much how I would do it. Assuming you want to retrieve the top 50:
SELECT id
FROM comments
GROUP BY id
ORDER BY COUNT(1) DESC
LIMIT 50
If your users execute this query quite frequently and you find it isn't running as fast as you'd like, one way to optimize it is to store the result of the above query in a separate table (topdiscussions) and have a script or cron job update that table every five minutes or so.
Then, in your application, have your users select from the topdiscussions table, so that they only read 50 rows rather than 2 million.
The downside, of course, is that the result is no longer real-time but can be out of date by up to five minutes, or however often you update the table. How close to real-time it needs to be depends on the requirements of your system.
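A sketch of the periodic refresh, assuming SQLite in place of MySQL, a hypothetical `comment_id` primary key and `comment_count` column, invented data, and a hypothetical `refresh_top_discussions` function that a cron job would call every few minutes. The DELETE and INSERT run in one transaction, so they commit together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, id INT);
CREATE INDEX idx_comments_id ON comments(id);
CREATE TABLE topdiscussions (id INT PRIMARY KEY, comment_count INT);
""")
# Made-up data: discussion 7 has 5 comments, 3 has 3, 9 has 1.
conn.executemany("INSERT INTO comments (id) VALUES (?)",
                 [(7,)] * 5 + [(3,)] * 3 + [(9,)])

def refresh_top_discussions(conn, limit=50):
    """Rebuild the summary table from the full comments table."""
    with conn:  # commits both statements together (rolls back on error)
        conn.execute("DELETE FROM topdiscussions")
        conn.execute(
            "INSERT INTO topdiscussions (id, comment_count) "
            "SELECT id, COUNT(1) FROM comments "
            "GROUP BY id ORDER BY COUNT(1) DESC LIMIT ?", (limit,))

refresh_top_discussions(conn)
# What the application would run: a read of at most `limit` rows.
top = conn.execute(
    "SELECT id, comment_count FROM topdiscussions ORDER BY comment_count DESC"
).fetchall()
print(top)
```

Between refreshes the application keeps reading the old snapshot, which is exactly the staleness trade-off described above.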
Edit: As per your comments to this answer, I know a little more about your schema and requirements. The following query retrieves the discussions that are the most active within the past day:
SELECT a.id, etc...
FROM discussions a
INNER JOIN comments b ON
a.id = b.discussion_id AND
b.date_posted > NOW() - INTERVAL 1 DAY
GROUP BY a.id
ORDER BY COUNT(1) DESC
LIMIT 50
I don't know your field names, but that's the general idea.
If I understand your question, the ID indicates the discussion to which a comment is attached. So first you need some notion of "most popular".
1) Initialize a "Comment total" table by counting up comments by ID and setting a column called 'delta' to 0.
2) Periodically
2.1) Count the comments by ID
2.2) Subtract the old count from the new count and store the value into the delta column.
2.3) Replace the count of comments with the new count.
3) Select the 10 'hottest' discussions by selecting 10 rows from the comment total table in order of descending delta.
The rest is trivial: it's just the comments whose discussion ID matches the ones you found in step 3.
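The numbered steps above can be sketched as follows, again with SQLite standing in for MySQL. The table name `comment_total`, its `total` column, and the helper functions are hypothetical; the recount is done in a Python loop for clarity, and a tie-break on id is added so the ordering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, id INT);
-- Hypothetical name for the answer's "Comment total" table.
CREATE TABLE comment_total (id INT PRIMARY KEY, total INT, delta INT);
""")

def add_comments(conn, discussion_id, n):
    conn.executemany("INSERT INTO comments (id) VALUES (?)", [(discussion_id,)] * n)

def initialize(conn):
    # Step 1: count comments per discussion; delta starts at 0.
    conn.execute("INSERT INTO comment_total (id, total, delta) "
                 "SELECT id, COUNT(*), 0 FROM comments GROUP BY id")

def refresh(conn):
    # Step 2: recount, store (new - old) in delta, then replace the old count.
    new_counts = conn.execute("SELECT id, COUNT(*) FROM comments GROUP BY id").fetchall()
    for disc_id, new_total in new_counts:
        row = conn.execute("SELECT total FROM comment_total WHERE id = ?",
                           (disc_id,)).fetchone()
        old_total = row[0] if row else 0  # discussions new since the last run
        conn.execute("INSERT OR REPLACE INTO comment_total (id, total, delta) "
                     "VALUES (?, ?, ?)", (disc_id, new_total, new_total - old_total))

def hottest(conn, n=10):
    # Step 3: the n discussions with the largest delta.
    return [r[0] for r in conn.execute(
        "SELECT id FROM comment_total ORDER BY delta DESC, id LIMIT ?", (n,))]

# Example: an initial load, then new activity before the next refresh.
for disc_id, n in [(1, 50), (2, 5), (3, 20)]:
    add_comments(conn, disc_id, n)
initialize(conn)
for disc_id, n in [(2, 30), (3, 2), (4, 7)]:
    add_comments(conn, disc_id, n)
refresh(conn)
print(hottest(conn, 3))
```

Discussion 1 has the most comments overall but gained none since the last run, so it ranks below 2 (+30), 4 (+7) and 3 (+2): the delta measures recent activity rather than total size.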

mysql query and performance

I would like to know the performance impact of running this query under the following conditions.
Query:
select `players`.*, count(`clicks`.`id`) as `clicks_count`
from `players` left join `clicks` on `clicks`.`player_id` = `players`.`id`
group by `players`.`id`
order by `clicks_count` desc
limit 1
Conditions:
1) The clicks table will receive about 1000 inserts per minute.
2) The clicks table will contain more than 1,000,000 rows.
3) The players table will contain 10,000 rows.
4) The players table gets new inserts every 5 minutes.
I would like to know what to expect performance-wise if I run the query 1000 times in 1 minute.
Thanks
That query will never run in milliseconds with any meaningful amounts of data in your tables. It'll run two full table scans, join the two together, aggregate the mess, and fetch the top row from that.
Use a trigger to store the total in the players table, and index that field. You'll then be able to avoid the join altogether:
select p.* from players p order by clicks_count desc limit 1
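A minimal sketch of the trigger approach, with SQLite standing in for MySQL and hypothetical names (`clicks_after_insert`, `idx_players_clicks`, a `name` column, and a `clicks_count` column added to players). In MySQL, a single-statement trigger body can be written without BEGIN/END. Only inserts are handled here; if click rows can be deleted, a matching AFTER DELETE trigger would be needed to keep the counter correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT,
                      clicks_count INT NOT NULL DEFAULT 0);
CREATE INDEX idx_players_clicks ON players(clicks_count);
CREATE TABLE clicks (id INTEGER PRIMARY KEY, player_id INT);

-- Keep the denormalized counter in sync on every insert into clicks.
CREATE TRIGGER clicks_after_insert AFTER INSERT ON clicks
FOR EACH ROW
BEGIN
    UPDATE players SET clicks_count = clicks_count + 1 WHERE id = NEW.player_id;
END;

INSERT INTO players (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
""")
conn.executemany("INSERT INTO clicks (player_id) VALUES (?)",
                 [(2,), (1,), (2,), (3,), (2,)])

# The read no longer touches clicks at all.
top = conn.execute(
    "SELECT p.id, p.name, p.clicks_count FROM players p "
    "ORDER BY clicks_count DESC LIMIT 1").fetchone()
print(top)
```

The cost moves to the write side: each insert into clicks now also runs one indexed UPDATE on players, in exchange for a read that only touches the players table.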
First and foremost, you should worry about your schema if you want decent performance with that number of records and frequent writes; i.e. proper indexes and constraints must be created if not already in place.
Next, in the query itself, select only the fields you need (if you do not need ALL players fields, avoid using "players.*").
As a personal preference, I'd restructure the tables (e.g. playerID in place of id) and query like so:
SELECT p.*, COUNT(c.id) as clicks_count
FROM players p
JOIN clicks c USING(playerID)
GROUP BY p.playerID
ORDER BY clicks_count desc
LIMIT 1
Again, see if you really need ALL player table fields; if not, omit "p.*" and replace with p.foo, p.bar, etc.