I am a complete beginner in SQL. I am using a program that queries a database, and then processes the results. The default query is:
SELECT *
FROM data,
questions,
users
where users.U_Id = data.Subj_Id
and data.Subj_Id between 1 and 10
and data.Q_Id = questions.Q_Id
and questions.Q_Id between 1 and 10
order by Subj_Id;
I'd like it to query every Subj_Id and every Q_Id. I do not know how many there are of either, and different subjects have different numbers of questions. How should I alter the above query?
You can rewrite the above query like this.
select *
from
data
inner join
users on users.U_Id = data.Subj_Id
inner join
questions on data.Q_Id = questions.Q_Id
where data.Subj_Id between 1 and 10
and questions.Q_Id between 1 and 10
order by Subj_Id;
This makes it clearer by separating the joins between tables from the filters on the data.
So to query the entire database, you just remove the where clause from the above...
select *
from
data
inner join
users on users.U_Id = data.Subj_Id
inner join
questions on data.Q_Id = questions.Q_Id
order by Subj_Id;
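To check that the comma-style query and the INNER JOIN rewrite really return the same rows, and that dropping the where clause returns everything, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for your actual database, and the sample rows and the Name, Body and Answer columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (U_Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE questions (Q_Id INTEGER PRIMARY KEY, Body TEXT);
    CREATE TABLE data      (Subj_Id INTEGER, Q_Id INTEGER, Answer TEXT);
    -- Hypothetical sample rows; subject 42 / question 99 fall outside 1..10.
    INSERT INTO users     VALUES (1, 'ann'), (2, 'bob'), (42, 'cat');
    INSERT INTO questions VALUES (1, 'q1'), (2, 'q2'), (99, 'q99');
    INSERT INTO data      VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c'), (42, 99, 'd');
""")

# Original style: tables listed with commas, joins and filters mixed in WHERE.
implicit_filtered = conn.execute("""
    SELECT data.Subj_Id, data.Q_Id
    FROM data, questions, users
    WHERE users.U_Id = data.Subj_Id
      AND data.Subj_Id BETWEEN 1 AND 10
      AND data.Q_Id = questions.Q_Id
      AND questions.Q_Id BETWEEN 1 AND 10
    ORDER BY data.Subj_Id, data.Q_Id
""").fetchall()

# Rewritten style: join conditions in ON, filters in WHERE.
explicit_filtered = conn.execute("""
    SELECT data.Subj_Id, data.Q_Id
    FROM data
    INNER JOIN users     ON users.U_Id = data.Subj_Id
    INNER JOIN questions ON data.Q_Id = questions.Q_Id
    WHERE data.Subj_Id BETWEEN 1 AND 10
      AND questions.Q_Id BETWEEN 1 AND 10
    ORDER BY data.Subj_Id, data.Q_Id
""").fetchall()

# Same joins with the WHERE clause removed: every subject and question.
explicit_all = conn.execute("""
    SELECT data.Subj_Id, data.Q_Id
    FROM data
    INNER JOIN users     ON users.U_Id = data.Subj_Id
    INNER JOIN questions ON data.Q_Id = questions.Q_Id
    ORDER BY data.Subj_Id, data.Q_Id
""").fetchall()

print(implicit_filtered == explicit_filtered)
print(explicit_all)
```

With this sample data the unfiltered query picks up the extra row for subject 42, which the two filtered queries leave out.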
I'm a bit of a db noob and have a nasty query that is taking over 30 seconds to run. I'm trying to learn a bit more about EXPLAIN and optimize the query but am at a loss. Here is the query:
SELECT
feed.*, users.username, smf_attachments.id_attach AS avatar,
games.name AS item_name, games.image, feed.item_id, u2.username AS follow_name
FROM feed
INNER JOIN following ON following.follow_id = feed.user_id AND following.user_id = 1
LEFT JOIN users ON users.id = feed.user_id
LEFT JOIN smf_members ON smf_members.member_name = users.username
LEFT JOIN smf_attachments ON smf_attachments.id_member = smf_members.id_member
LEFT JOIN games ON games.id = feed.item_id
LEFT JOIN users u2 ON u2.id = feed.item_id
ORDER BY feed.timestamp DESC
LIMIT 25
Explain results:
The result you want to avoid in your execution plan (the output of an EXPLAIN statement) is a full table scan, which MySQL reports as ALL in the type column. To avoid it, you need to create the right indexes on your tables.
A table scan means the query engine reads every row of the table sequentially. With index access, the query engine goes more directly to the relevant rows.
More explanation here: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
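To see the difference between a scan and an index lookup in a plan, here is a small sketch using Python's built-in sqlite3 module. It is an analogy rather than MySQL itself: SQLite's EXPLAIN QUERY PLAN prints SCAN or SEARCH where MySQL's EXPLAIN has its own columns, and the cut-down feed table and the index name idx_feed_user are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the feed table from the question.
conn.execute(
    "CREATE TABLE feed (id INTEGER PRIMARY KEY, user_id INTEGER, item_id INTEGER)"
)
conn.executemany(
    "INSERT INTO feed (user_id, item_id) VALUES (?, ?)",
    [(i % 50, i) for i in range(1000)],
)

QUERY = "SELECT * FROM feed WHERE user_id = 7"


def plan(sql):
    """Return the human-readable detail text of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index on user_id the engine has to read the whole table.
plan_before = plan(QUERY)

# After adding an index on the filtered column it can look rows up directly.
conn.execute("CREATE INDEX idx_feed_user ON feed (user_id)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The same habit carries over to MySQL: run EXPLAIN, add an index on the columns used in the join and filter conditions, then run EXPLAIN again and compare.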
I have to run this query and it is pretty slow (4.86 seconds):
SELECT DISTINCT (users.id), users . *
FROM users
LEFT JOIN user_stages ON users.id = user_stages.user_id
LEFT JOIN user_tags ON users.id = user_tags.user_id
LEFT JOIN log ON log.user_id = users.id
ORDER BY last_activity DESC
When I do profiling it looks like Copying to tmp table takes 91% of the time (3.710409 seconds).
The size of the tables: users - almost 100,000 records, log - 1,443,000 records, user_stages - 66,000 records, user_tags - 260,000 records.
The indexes are properly added; if you want, I can list them all. How can I rewrite the query or modify the MySQL settings to make this query faster?
Assuming last_activity is in the users table, you can change the query to the following:
SELECT users.*
FROM users
ORDER BY last_activity DESC
Your query is selecting only columns from the users table. The left join ensures that all rows from the table appear at least once. The distinct is removing duplicates added by the other tables. Hence, the joins are unnecessary.
If last_activity is in another table, then you might need to join that information in.
Your joins are probably taking so much time because you are getting cross products of rows for each user from the various tables.
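To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite standing in for MySQL). The columns other than id, user_id and last_activity, and all sample rows, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, name TEXT, last_activity TEXT);
    CREATE TABLE user_stages (user_id INTEGER, stage_id INTEGER);
    CREATE TABLE user_tags   (user_id INTEGER, tag_id INTEGER);
    CREATE TABLE log         (user_id INTEGER, message TEXT);
    -- Hypothetical data: user 1 has 2 stages, 3 tags and 4 log rows; user 2 has none.
    INSERT INTO users       VALUES (1, 'ann', '2024-01-02'), (2, 'bob', '2024-01-01');
    INSERT INTO user_stages VALUES (1, 5), (1, 6);
    INSERT INTO user_tags   VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO log         VALUES (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd');
""")

JOINS = """
    FROM users
    LEFT JOIN user_stages ON users.id = user_stages.user_id
    LEFT JOIN user_tags   ON users.id = user_tags.user_id
    LEFT JOIN log         ON log.user_id = users.id
"""

# Rows produced by the joins before DISTINCT collapses them:
# 2 * 3 * 4 = 24 for user 1, plus 1 for user 2.
joined_count = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

# The question's query (DISTINCT over the joined rows)...
with_joins = conn.execute(
    "SELECT DISTINCT users.* " + JOINS + " ORDER BY last_activity DESC"
).fetchall()

# ...and the simplified query without any joins.
without_joins = conn.execute(
    "SELECT users.* FROM users ORDER BY last_activity DESC"
).fetchall()

print(joined_count, len(with_joins), with_joins == without_joins)
```

Two users turn into 25 intermediate rows here, and DISTINCT then has to throw 23 of them away, which is where the temporary table work comes from.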
SELECT `users`.*
FROM `users`
LEFT JOIN `user_stages` ON `users`.`id` = `user_stages`.`user_id`
LEFT JOIN `user_tags` ON `users`.`id` = `user_tags`.`user_id`
LEFT JOIN `log` ON `log`.`user_id` = `users`.`id`
GROUP BY `users`.`id`
ORDER BY `last_activity` DESC;
The query is built on the fly based on the user's input. Sometimes it looks like this:
SELECT DISTINCT (users.id), users . *
FROM users
LEFT JOIN user_stages ON users.id = user_stages.user_id
LEFT JOIN user_tags ON users.id = user_tags.user_id
LEFT JOIN log ON log.user_id = users.id
WHERE user_stages.stage_id = 5
AND user_tags.tag_id = 10
ORDER BY last_activity DESC
The query was originally written using GROUP BY, but it was slower (about 8 seconds). I replaced GROUP BY with DISTINCT and it was faster, but not fast enough. If you have any suggestions, I would appreciate them.
I am often confronted with such db queries:
Get all entries (e.g. comments) of userX and also all entries of the friends of userX
What is the best way to do this in SQL (MySQL), assuming userX is not a friend of himself?
1. Make two queries and merge them later with PHP
a = SELECT *
FROM comments
WHERE user = X
b = SELECT c.*
FROM comments c
INNER JOIN relation r ON r.user2 = c.user
WHERE r.user1 = X
merge(a, b)
That is what I have usually done. It performs reasonably well, but I cannot use things like ORDER BY or LIMIT on the merged result.
2. Subqueries with IN and UNION
SELECT c.*
FROM comments c
WHERE user IN (
SELECT "X"
UNION
SELECT user2 FROM relation WHERE user1 = X
)
This seems to be very slow, and therefore a bad idea, isn't it?
3. Other solutions? Conditional Joins or something...
One way to do this is to start the query from the users table and join it with the relation table. Then you can join comments onto that with a conditional ON clause. This way MySQL can use the indexes.
select c.* from users a
left outer join relation friend on a.id = friend.user1_id
join comments c on (c.user_id = a.id or c.user_id = friend.user2_id)
where a.id = 1
group by c.id;
Here's a working example: http://sqlfiddle.com/#!2/da298/1
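In case the fiddle link goes away, here is a small local sketch of the same conditional join using Python's built-in sqlite3 module (SQLite standing in for MySQL). The sample rows and the body column are made up; the other column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY);
    CREATE TABLE relation (user1_id INTEGER, user2_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    -- Hypothetical data: user 1 is friends with users 2 and 3, not with 4.
    INSERT INTO users    VALUES (1), (2), (3), (4);
    INSERT INTO relation VALUES (1, 2), (1, 3);
    INSERT INTO comments VALUES (1, 1, 'mine'), (2, 2, 'friend'),
                                (3, 3, 'friend'), (4, 4, 'stranger');
""")

CONDITIONAL_JOIN = """
    SELECT c.id
    FROM users a
    LEFT OUTER JOIN relation friend ON a.id = friend.user1_id
    JOIN comments c ON (c.user_id = a.id OR c.user_id = friend.user2_id)
    WHERE a.id = 1
"""

# Without GROUP BY, user 1's own comment matches once per friend row.
ungrouped_ids = sorted(row[0] for row in conn.execute(CONDITIONAL_JOIN))

# GROUP BY c.id collapses those repeats, as in the query above.
grouped_ids = [
    row[0]
    for row in conn.execute(CONDITIONAL_JOIN + " GROUP BY c.id ORDER BY c.id")
]

print(ungrouped_ids)
print(grouped_ids)
```

This also shows why the group by is needed: the user's own comments are matched once for every friend row, so they come back repeated without it.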
Why not:
SELECT c.*
FROM comments c
WHERE user = X OR user IN (
SELECT user2 FROM relation WHERE user1 = X
)
If that's slow, you should look at the execution plan. You might be missing an index.
There is nothing wrong with your #2. You most likely don't have the needed indexes.
As an academic case study, here are various forms that should return the same result set. Try each one with EXPLAIN and see which performs best for you, making sure the correct indexes are in place. I'm partial to the third because I have found it to perform just as well, if not better, and it allows multiple fields to be related. You can search SO or Google for articles about MySQL performance of IN vs EXISTS.
UNION the results - this is an improved version of your #1; the data is still UNIONed in MySQL, not PHP
SELECT *
FROM comments
WHERE user = X
UNION
SELECT c.*
FROM comments c
INNER JOIN relation r
ON r.user2 = c.user
WHERE r.user1 = X
Your option #2 - IN
SELECT c.*
FROM comments c
WHERE user IN (
SELECT "X"
UNION
SELECT user2 FROM relation WHERE user1 = X)
EXISTS
Generally, I prefer this to the IN since it allows for multiple fields to be related
SELECT c.*
FROM comments AS c
WHERE c.user = X
OR EXISTS(SELECT *
FROM relation AS r
WHERE r.user1 = X
AND r.user2 = c.user)
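To check that the three forms really do return the same result set, here is a minimal sketch using Python's built-in sqlite3 module (SQLite standing in for MySQL, so it says nothing about MySQL's performance). The sample rows and the body column are made up, and the placeholder X is bound as the named parameter :x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user INTEGER, body TEXT);
    CREATE TABLE relation (user1 INTEGER, user2 INTEGER);
    -- Hypothetical data: user 1 is friends with users 2 and 3, not with 4.
    INSERT INTO relation VALUES (1, 2), (1, 3);
    INSERT INTO comments VALUES (1, 1, 'mine'), (2, 2, 'friend'),
                                (3, 3, 'friend'), (4, 4, 'stranger');
""")

UNION_FORM = """
    SELECT c.id FROM comments c WHERE c.user = :x
    UNION
    SELECT c.id
    FROM comments c
    INNER JOIN relation r ON r.user2 = c.user
    WHERE r.user1 = :x
"""

IN_FORM = """
    SELECT c.id
    FROM comments c
    WHERE c.user IN (
        SELECT :x
        UNION
        SELECT user2 FROM relation WHERE user1 = :x)
"""

EXISTS_FORM = """
    SELECT c.id
    FROM comments AS c
    WHERE c.user = :x
       OR EXISTS (SELECT *
                  FROM relation AS r
                  WHERE r.user1 = :x
                    AND r.user2 = c.user)
"""


def comment_ids(sql, user_id):
    """Run one form for the given user and return the sorted comment ids."""
    return sorted(row[0] for row in conn.execute(sql, {"x": user_id}))


union_ids = comment_ids(UNION_FORM, 1)
in_ids = comment_ids(IN_FORM, 1)
exists_ids = comment_ids(EXISTS_FORM, 1)

print(union_ids, in_ids, exists_ids)
```

Because all three are single queries, ORDER BY and LIMIT can be added to any of them, which was the problem with merging in PHP.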
I have this statement:
SELECT board.*, numlikes
FROM board
LEFT JOIN (SELECT
pins.board_id, COUNT(source_user_id) AS numlikes
FROM likes
INNER JOIN pins ON pins.id = likes.pin_id
GROUP BY pins.board_id) likes ON board.id = likes.board_id
WHERE who_can_tag = ''
ORDER BY numlikes DESC LIMIT 10
But I need to also join these other two statements to it:
SELECT COUNT(owner_user_id)
FROM repin
INNER JOIN pins ON pins.id = repin.from_pin_id
WHERE pins.board_id = '$id'
and
SELECT COUNT(is_following_board_id)
FROM follow
WHERE is_following_board_id = '$id'
I managed to get the first one joined but I'm having trouble with the others - thinking it might get too long.
Is there a quicker way to execute?
Ideally, start with the smallest result set, and then start joining to the next smallest table.
You don't want the database to do full table joins on a bunch of big tables, and then at the end have a where clause that removes 99% of the rows the database just created.
In Oracle, I do this:
SELECT *
FROM big_table bt
JOIN DUAL ON bt.best_filter_column='the_value'
--now there are only a few rows
JOIN other_table_1 ...
LEFT JOIN outer_join_tables ...
Include all OUTER JOINS last, since they don't drop any rows, so hopefully you've already filtered out a lot of rows.
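For the board query in the question, one way to keep each joined result set small is to repeat the pattern the question already uses for likes: aggregate each count down to one row per board in its own derived table, then LEFT JOIN it. This is only a sketch in Python's built-in sqlite3 module (SQLite standing in for the real database), not a tested answer for your schema. The sample rows, the name and follower_id columns, and the aliases l, r, f and cnt are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board  (id INTEGER PRIMARY KEY, name TEXT, who_can_tag TEXT);
    CREATE TABLE pins   (id INTEGER PRIMARY KEY, board_id INTEGER);
    CREATE TABLE likes  (pin_id INTEGER, source_user_id INTEGER);
    CREATE TABLE repin  (from_pin_id INTEGER, owner_user_id INTEGER);
    CREATE TABLE follow (is_following_board_id INTEGER, follower_id INTEGER);
    -- Hypothetical sample data.
    INSERT INTO board  VALUES (1, 'cats', ''), (2, 'dogs', '');
    INSERT INTO pins   VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO likes  VALUES (10, 1), (10, 2), (11, 1);
    INSERT INTO repin  VALUES (20, 5);
    INSERT INTO follow VALUES (1, 7), (1, 8), (2, 7);
""")

# Each derived table returns at most one row per board, so the LEFT JOINs
# cannot multiply rows. COALESCE turns "no matching rows" into a count of 0.
rows = conn.execute("""
    SELECT board.id,
           COALESCE(l.cnt, 0) AS numlikes,
           COALESCE(r.cnt, 0) AS numrepins,
           COALESCE(f.cnt, 0) AS numfollowers
    FROM board
    LEFT JOIN (SELECT pins.board_id, COUNT(source_user_id) AS cnt
               FROM likes
               INNER JOIN pins ON pins.id = likes.pin_id
               GROUP BY pins.board_id) l ON l.board_id = board.id
    LEFT JOIN (SELECT pins.board_id, COUNT(owner_user_id) AS cnt
               FROM repin
               INNER JOIN pins ON pins.id = repin.from_pin_id
               GROUP BY pins.board_id) r ON r.board_id = board.id
    LEFT JOIN (SELECT is_following_board_id AS board_id, COUNT(*) AS cnt
               FROM follow
               GROUP BY is_following_board_id) f ON f.board_id = board.id
    WHERE board.who_can_tag = ''
    ORDER BY numlikes DESC, board.id
    LIMIT 10
""").fetchall()

print(rows)
```

Joining the raw likes, repin and follow tables directly and counting afterwards would multiply rows per board and inflate the counts, so the per-table GROUP BY is the important part.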
I have a noob question, but a rather troublesome one for me. I am using SELECT on three tables: the first is a table of users, the last is a table of places, and the middle one is relational (it holds relations, the ID of a user against the ID of a place). I have written this perfectly working query:
$query = "SELECT users.Username,usrxplc.User,places.Name
FROM users,usrxplc,places
WHERE usrxplc.Place=places.ID AND usrxplc.User=users.ID";
That spits out all places associated with all users. Fine, but I would like to limit it only to a certain user. Seems simple, but I am stuck.
You use a WHERE clause to filter the results, so just add a clause for users.ID:
select users.Username,
usrxplc.User,
places.name
from users,
usrxplc,
places
where usrxplc.Place = places.ID
and usrxplc.User = users.ID
and users.ID = 123
Just felt the need to post the alternative: instead of selecting from all the tables at once, you can use INNER JOIN to join one table onto another.
SELECT
users.Username,
places.Name
FROM users
INNER JOIN usrxplc ON usrxplc.User=users.ID
INNER JOIN places ON places.ID = usrxplc.Place
WHERE users.ID = 111
It's functionally the same as the other answer. However, as queries and tables get more complex, JOINs let you put a filter directly on the join it applies to, which can limit the rows each individual JOIN has to handle. For example, the following is also valid; here the users row is restricted in the join condition itself rather than in a WHERE clause:
SELECT
users.Username,
places.Name
FROM places
INNER JOIN usrxplc ON usrxplc.Place = places.ID
INNER JOIN users ON users.ID = usrxplc.User AND users.ID = 111
In more complicated queries, or with far larger tables, this can give a more efficient query, although for inner joins the optimizer will often produce the same plan whether the filter sits in ON or in WHERE.
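To confirm that the three ways of writing this return the same rows, here is a minimal sketch using Python's built-in sqlite3 module (SQLite standing in for MySQL). The sample users and places are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (ID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE places  (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE usrxplc (User INTEGER, Place INTEGER);
    -- Hypothetical sample data.
    INSERT INTO users   VALUES (111, 'alice'), (123, 'bob');
    INSERT INTO places  VALUES (1, 'Park'), (2, 'Cafe'), (3, 'Library');
    INSERT INTO usrxplc VALUES (111, 1), (111, 3), (123, 2);
""")

# Comma-style join, with the user filter in WHERE.
comma_style = conn.execute("""
    SELECT users.Username, places.Name
    FROM users, usrxplc, places
    WHERE usrxplc.Place = places.ID
      AND usrxplc.User = users.ID
      AND users.ID = 111
    ORDER BY places.Name
""").fetchall()

# INNER JOINs, with the user filter in WHERE.
filter_in_where = conn.execute("""
    SELECT users.Username, places.Name
    FROM users
    INNER JOIN usrxplc ON usrxplc.User = users.ID
    INNER JOIN places  ON places.ID = usrxplc.Place
    WHERE users.ID = 111
    ORDER BY places.Name
""").fetchall()

# INNER JOINs, with the user filter inside the ON condition.
filter_in_on = conn.execute("""
    SELECT users.Username, places.Name
    FROM places
    INNER JOIN usrxplc ON usrxplc.Place = places.ID
    INNER JOIN users   ON users.ID = usrxplc.User AND users.ID = 111
    ORDER BY places.Name
""").fetchall()

print(comma_style)
print(comma_style == filter_in_where == filter_in_on)
```

Note that this equivalence only holds for inner joins. With a LEFT JOIN, moving a filter from WHERE into ON changes which rows come back.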