I have a query that I know can be done using a subselect, but due to large table sizes (100k+ rows per table) I would like to find an alternative using a join. This is not a homework question, but it's easier to share an example in such terms.
Suppose there are two tables:
Students
:id :name
1 Tom
2 Sally
3 Ben
Books
:id :student_id :book
1 1 Math 101
2 1 History
3 2 NULL
4 3 Math 101
I want to find all students who don't have a history book. Working subselect is:
select name from students where id not in (select student_id from books where book = 'History');
This returns Sally and Ben.
Thanks for your replies!
Is performance the problem? Or is this just some theoretical (homework?) question to avoid a subquery? If it's performance then this:
SELECT *
FROM students s
WHERE NOT EXISTS
(SELECT id FROM books WHERE student_id = s.id AND book = 'History')
will perform a lot better than the IN you're doing on MySQL (on some other databases, they will perform equivalently). This can also be rephrased as a join:
SELECT s.*
FROM students s
LEFT JOIN books b ON s.id = b.student_id AND b.book = 'History'
WHERE b.id IS NULL
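To check that the three forms agree on the sample data from the question, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, so it only demonstrates that the results match and says nothing about relative performance on MySQL.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, student_id INTEGER, book TEXT);
    INSERT INTO students VALUES (1, 'Tom'), (2, 'Sally'), (3, 'Ben');
    INSERT INTO books VALUES
        (1, 1, 'Math 101'), (2, 1, 'History'), (3, 2, NULL), (4, 3, 'Math 101');
""")

queries = {
    # Original subselect from the question.
    "not_in": """
        SELECT name FROM students
        WHERE id NOT IN (SELECT student_id FROM books WHERE book = 'History')
    """,
    # Correlated NOT EXISTS.
    "not_exists": """
        SELECT s.name FROM students s
        WHERE NOT EXISTS
            (SELECT id FROM books WHERE student_id = s.id AND book = 'History')
    """,
    # Anti-join: keep only students with no matching 'History' row.
    "left_join": """
        SELECT s.name FROM students s
        LEFT JOIN books b ON s.id = b.student_id AND b.book = 'History'
        WHERE b.id IS NULL
    """,
}

results = {
    label: sorted(row[0] for row in conn.execute(sql))
    for label, sql in queries.items()
}
for label, names in results.items():
    print(label, names)
```

Note that the book = 'History' test sits in the ON clause of the LEFT JOIN, not in WHERE; moving it to WHERE would discard the unmatched rows the anti-join relies on.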
I have 2 tables:
Table 1: users
id  name  faculty_id  level_id
1   john  1           1
2   mark  1           1
3   sam   1           2
Table 2: subjects
id  title      faculty_id
1   physics    1
2   chemistry  1
3   english    2
SQL query:
SELECT count(subjects.id) FROM users INNER JOIN subjects ON users.faculty_id = subjects.faculty_id WHERE users.level_id = 1
I'm trying to get the count of subjects available to users with users.level_id = 1, which should be 2 in this case (physics and chemistry).
But it's returning more than 2.
Why is that, and how do I get only 2?
I would recommend exists:
SELECT COUNT(*)
FROM subjects s
WHERE EXISTS (SELECT 1
FROM users u
WHERE u.faculty_id = s.faculty_id AND
u.level_id = 1
);
This counts subjects where a user exists with a level of 1.
You are joining users and subjects on faculty_id; this produces every combination of user and subject rows that share a faculty_id (2 level-1 users and 2 subjects make 4 combined rows); change your query to SELECT users.*, subjects.* FROM... to see how this works.
count(subjects.id) counts the number of non-null subjects.id values in your results; you can just do count(distinct subjects.id).
The two tables are not directly related as none is parent to the other. The faculty table is parent to both tables and this is what relates the two tables indirectly.
When joining the faculties' students with the faculties' subjects per faculty, you get all combinations (john|physics, mark|physics, sam|physics, john|chemistry, mark|chemistry, ...). Whether John really has the subject Physics cannot even be gathered from the database. We see that John studies in a faculty containing the subjects Physics and Chemistry, but does every student have every subject belonging to their faculty? You probably know, but we don't. That shows that in order to write proper queries, one should know their database :-)
Now you are joining the tables and get all students per faculty multiplied with all subjects per faculty. You limit this to level_id = 1, which gets you 2 students x 2 subjects = 4. You could use COUNT(*) for this, because you are counting rows. By applying COUNT(subjects.id) instead you are only counting rows for which the subject ID is not null, but that is true for all rows, because all four combined rows have either subject ID 1 (Physics) or 2 (Chemistry). Counting something that cannot be null makes no sense, except for counting distinct, as has already been suggested. You can COUNT(DISTINCT subjects.id) to get the distinct number of subjects matching your conditions.
This, however, has two drawbacks. First, the query doesn't clearly show your intention. Why do you join all students with all subjects when you are not really interested in the (four) combinations? Second, you are building an unnecessary intermediate result (four rows in your small example) that must be searched for duplicates, so these can be removed from the counting. That means more memory consumed and more work for the DBMS.
What you want to count is subjects. So select from the subjects table. Your condition is that a student exists with level 1 for the same faculty. Conditions belong in the WHERE clause. Use EXISTS as Gordon suggests in his answer or use IN which is slightly shorter to write and may hence be considered a tad more readable (but that boils down to personal preference, as EXISTS and IN express exactly the same thing here).
select count(*)
from subjects
where faculty_id in (select faculty_id from users where level_id = 1);
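The difference between the counts discussed above can be reproduced on the sample data. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) instead of MySQL; the helper name scalar is made up for the illustration.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                        faculty_id INTEGER, level_id INTEGER);
    CREATE TABLE subjects (id INTEGER PRIMARY KEY, title TEXT,
                           faculty_id INTEGER);
    INSERT INTO users VALUES (1, 'john', 1, 1), (2, 'mark', 1, 1), (3, 'sam', 1, 2);
    INSERT INTO subjects VALUES (1, 'physics', 1), (2, 'chemistry', 1), (3, 'english', 2);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Original query: counts joined rows (2 level-1 users x 2 subjects).
join_count = scalar("""
    SELECT COUNT(subjects.id)
    FROM users INNER JOIN subjects ON users.faculty_id = subjects.faculty_id
    WHERE users.level_id = 1
""")

# Same join, but duplicates are removed before counting.
distinct_count = scalar("""
    SELECT COUNT(DISTINCT subjects.id)
    FROM users INNER JOIN subjects ON users.faculty_id = subjects.faculty_id
    WHERE users.level_id = 1
""")

# Count subjects directly; the user condition is only a filter.
exists_count = scalar("""
    SELECT COUNT(*) FROM subjects s
    WHERE EXISTS (SELECT 1 FROM users u
                  WHERE u.faculty_id = s.faculty_id AND u.level_id = 1)
""")
in_count = scalar("""
    SELECT COUNT(*) FROM subjects
    WHERE faculty_id IN (SELECT faculty_id FROM users WHERE level_id = 1)
""")

print(join_count, distinct_count, exists_count, in_count)
```

On this data the plain join counts 4 rows, while the DISTINCT, EXISTS and IN variants all count 2 subjects.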
You can just add "distinct" before subjects.id, so your SQL query looks like this:
SELECT count(distinct subjects.id) FROM users INNER JOIN subjects ON users.faculty_id = subjects.faculty_id WHERE users.level_id = 1
You want to count by level_id, but your code counts subjects.id. I would suggest first joining the two tables:
SELECT users.name, users.level_id,
subjects.title
FROM users
INNER JOIN subjects ON
users.faculty_id = subjects.faculty_id
After joining the tables you can get the count by using the join as a derived table (aliased new_table here). Note that WHERE has to come before GROUP BY:
SELECT level_id, COUNT(level_id)
FROM (SELECT users.level_id
FROM users
INNER JOIN subjects ON
users.faculty_id = subjects.faculty_id) AS new_table
WHERE level_id = 1
GROUP BY level_id
(You have not mentioned group by in your code.)
I'm looking for some guidance on the following. I have two SQL tables A and B.
Table A contains a list of languages
language_id  language
1            English
2            French
3            Spanish
I would like to select all languages, and check Table B to see whether each language exists for a user in Table B. A user can select multiple languages, and many users could opt for any given language. The query will be for a specific userid.
Table B
language_id  userid
1            1
2            1
2            2
3            2
This is my query so far. I am not sure how to add the condition "WHERE userid = 1" to it:
$stmt = "SELECT A.language_id, A.language, COUNT(B.userid) FROM A INNER JOIN B.language_id = A.language_id ORDER BY A.language";
I am hoping that for userid 1 it would produce the following.
language_id  language  COUNT()
1            English   1
2            French    1
3            Spanish   0
I am just trying to check if the user has selected each language or not. Thank you in advance.
You can use the query below for this.
select A.language_id, A.language, COALESCE(B.counter, 0) AS counter
FROM tableA A
LEFT OUTER JOIN (
    SELECT language_id, COUNT(1) AS counter
    FROM tableB
    WHERE userid = 1
    GROUP BY language_id
) B ON A.language_id = B.language_id
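An alternative to the derived table is to put the userid condition into the ON clause of the LEFT JOIN. The sketch below is not the query from the answer above; it assumes SQLite (via Python's sqlite3 module) and the table names A and B from the question, and it also shows why the condition cannot simply go into WHERE.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (language_id INTEGER PRIMARY KEY, language TEXT);
    CREATE TABLE B (language_id INTEGER, userid INTEGER);
    INSERT INTO A VALUES (1, 'English'), (2, 'French'), (3, 'Spanish');
    INSERT INTO B VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

# Condition in ON: every language is kept, unmatched ones count as 0
# because COUNT(B.userid) ignores the NULLs produced by the outer join.
in_on = conn.execute("""
    SELECT A.language_id, A.language, COUNT(B.userid)
    FROM A
    LEFT JOIN B ON B.language_id = A.language_id AND B.userid = ?
    GROUP BY A.language_id, A.language
    ORDER BY A.language
""", (1,)).fetchall()

# Condition in WHERE: rows where B.userid is NULL are filtered out,
# so languages the user has not selected disappear from the result.
in_where = conn.execute("""
    SELECT A.language_id, A.language, COUNT(B.userid)
    FROM A
    LEFT JOIN B ON B.language_id = A.language_id
    WHERE B.userid = ?
    GROUP BY A.language_id, A.language
    ORDER BY A.language
""", (1,)).fetchall()

print(in_on)
print(in_where)
```

With the condition in ON, Spanish is returned with a count of 0; with the condition in WHERE, the Spanish row is missing entirely.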
I have an issue with data redundancy. My JOIN query in MySQL creates a very large result set (~8 MB), and a lot of the data is redundant. After analysis, I can see that the query itself is fast, but the data transfer can take several seconds. What options do I have?
For example, say that I have the two tables
Users:
user_id  user_name
1        Alex
2        Joe
And Purchases:
user_id  purchase_id  purchase_amount
1        A            100
2        B            200
1        C            300
1        D            400
If I simply LEFT JOIN the tables with
SELECT users.user_id, users.user_name, purchase_id, purchase_amount
FROM Users
LEFT JOIN purchases ON users.user_id = purchases.user_id
I will end up with a result:
user_id  user_name  purchase_id  purchase_amount
1        Alex       A            100
2        Joe        B            200
1        Alex       C            300
1        Alex       D            400
However, as we can see, the user_id 1 and user_name Alex exists in three places. For very large result sets this can become an issue.
I'm thinking about using GROUP BY and GROUP_CONCAT to reduce the redundancy. Is this a good idea in general? My first tests seem to work, but I have to run SET SESSION group_concat_max_len = 1000000; in MySQL, which might not be a good thing since I don't know what value to set it to.
For example I could do something like
SELECT users.user_id, users.user_name, GROUP_CONCAT(CONCAT(purchase_id, ':', purchase_amount))
FROM Users
LEFT JOIN purchases ON users.user_id = purchases.user_id
GROUP BY users.user_id, users.user_name
And end up with a result:
user_id  user_name  GROUP_CONCAT...
1        Alex       A:100,C:300,D:400
2        Joe        B:200
Are there any other options for me? Is this the way to go? Parsing the concatenated column is not an issue. I am trying to solve the large data set being returned.
Could you use a temp table in between?
Or use Apache Spark's map-reduce to get the data into the desired format.
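The GROUP_CONCAT idea from the question can be tried out on the sample data. This is only a sketch: it assumes SQLite (via Python's sqlite3 module), which has group_concat and the || operator instead of MySQL's GROUP_CONCAT/CONCAT, and it has no equivalent of group_concat_max_len, so it says nothing about that limit.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE purchases (user_id INTEGER, purchase_id TEXT,
                            purchase_amount INTEGER);
    INSERT INTO users VALUES (1, 'Alex'), (2, 'Joe');
    INSERT INTO purchases VALUES
        (1, 'A', 100), (2, 'B', 200), (1, 'C', 300), (1, 'D', 400);
""")

# Plain join: one row per purchase, user columns repeated on every row.
joined_rows = len(conn.execute("""
    SELECT users.user_id, users.user_name, purchase_id, purchase_amount
    FROM users
    LEFT JOIN purchases ON users.user_id = purchases.user_id
""").fetchall())

# Grouped: one row per user, purchases packed into a single string.
packed_sql = """
    SELECT users.user_id, users.user_name,
           group_concat(purchase_id || ':' || purchase_amount)
    FROM users
    LEFT JOIN purchases ON users.user_id = purchases.user_id
    GROUP BY users.user_id, users.user_name
"""

# Unpack on the client. The order inside the packed string is not
# guaranteed, so sort it; a user without purchases yields NULL (None).
grouped = {
    user_id: sorted(packed.split(",")) if packed else []
    for user_id, _name, packed in conn.execute(packed_sql)
}

print(joined_rows)
print(grouped)
```

The packed form returns one row per user instead of one per purchase, at the cost of parsing the string on the client and making sure the separators never appear in the data.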
I've been searching for hours but can't find a solution. It's a bit complicated, so I'll break it down into a very simple example.
I have two tables: people and cars.
people:
name_id firstname
1 john
2 tony
3 peter
4 henry
cars:
name_id car_name
1 vw gulf
1 ferrari
2 mustang
4 toyota
As can be seen, they are linked by name_id: john has 2 cars, tony has 1, peter has 0 and henry has 1.
I simply want to do a single MySQL query for who has a car (1 or more), so the answer should be john, tony, henry.
The people table is the master table, and I'm using LEFT JOIN to add the cars. My problem arises from the duplicates: the table I'm joining has 2 entries for 1 id in the master.
I'm playing around with DISTINCT and GROUP BY but I can't seem to get it to work.
Any help is much appreciated.
EDIT: adding the query:
$query = "
SELECT profiles.*, invoices.paid, COUNT(*) as num
FROM profiles
LEFT JOIN invoices ON (profiles.id=invoices.profileid)
WHERE (profiles.id LIKE '%$id%')
GROUP BY invoices.profileid
";
try this
select distinct p.name_id, firstname
from people p, cars c
where p.name_id = c.name_id
or use joins
select distinct p.name_id, firstname
from people p
inner join cars c
on p.name_id = c.name_id
If you only want to show people that have a car, then you should use a RIGHT JOIN. This prevents rows from the left table (people) from being returned if they have no match in the cars table.
Group by the person's name to remove duplicates.
SELECT firstname
FROM people P
RIGHT JOIN cars C ON C.name_id = P.name_id
GROUP BY firstname
SELECT DISTINCT firstname
FROM people
JOIN cars ON cars.name_id = people.name_id;
If this doesn't work you might have to show us the full problem.
The way you describe it, there's no need for a left join, since you need at least one car per person. A left join is implicitly an OUTER join and is intended to also return rows that have 0 corresponding records in the joined table.
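To see where the duplicates come from and how the suggestions above remove them, here is a minimal sketch on the sample data. It assumes SQLite (via Python's sqlite3 module) in place of MySQL; the EXISTS variant is an extra illustration that none of the answers above proposed.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name_id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE cars (name_id INTEGER, car_name TEXT);
    INSERT INTO people VALUES (1, 'john'), (2, 'tony'), (3, 'peter'), (4, 'henry');
    INSERT INTO cars VALUES
        (1, 'vw gulf'), (1, 'ferrari'), (2, 'mustang'), (4, 'toyota');
""")

# LEFT JOIN keeps peter (no car) and repeats john once per car.
left_join_rows = conn.execute("""
    SELECT p.firstname, c.car_name
    FROM people p
    LEFT JOIN cars c ON c.name_id = p.name_id
""").fetchall()

# INNER JOIN drops peter; DISTINCT collapses john's two rows into one.
distinct_join = conn.execute("""
    SELECT DISTINCT p.name_id, p.firstname
    FROM people p
    INNER JOIN cars c ON p.name_id = c.name_id
    ORDER BY p.name_id
""").fetchall()

# EXISTS never produces duplicates, so no DISTINCT is needed.
with_exists = conn.execute("""
    SELECT p.name_id, p.firstname
    FROM people p
    WHERE EXISTS (SELECT 1 FROM cars c WHERE c.name_id = p.name_id)
    ORDER BY p.name_id
""").fetchall()

print(len(left_join_rows))
print(distinct_join)
print(with_exists)
```

Selecting name_id along with firstname keeps two different people who share a first name apart, which grouping by firstname alone would not.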
table user
____________________________________________
id name nickname info_id
1 john apple 11
2 paul banana 12
3 pauline melon 13
table info
_____________________________________________
id job location
11 model usa
12 engineer russia
13 seller brazil
result I want
______________________________________________
1 john apple model usa
my query
left join:
select * from user a left join info b on b.id = a.info_id where a.id=1
subquery:
select a.*, b.* from (user a, info b) where b.id = a.info_id
which is better?
SELECT a.`name`, a.`nickname`, b.`job`, b.`location`
FROM `user` AS a
LEFT JOIN `info` AS b
ON ( a.`info_id` = b.`id` )
That should be pretty efficient. Try using MySQL EXPLAIN if you are concerned (also make sure there are indexes on the ID fields):
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
UPDATE
After seeing that you are not having performance problems just yet, I would not worry about it. "Don't fix what ain't broken". If you find that it is slowing down in the future, or it is bottle-necking on that function, then worry about it.
The query I gave should be pretty efficient.
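For a quick local check of the join and its plan, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module), whose EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, and it adds the a.id filter from the question to get the single wanted row.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info (id INTEGER PRIMARY KEY, job TEXT, location TEXT);
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT,
                       info_id INTEGER);
    INSERT INTO info VALUES
        (11, 'model', 'usa'), (12, 'engineer', 'russia'), (13, 'seller', 'brazil');
    INSERT INTO user VALUES
        (1, 'john', 'apple', 11), (2, 'paul', 'banana', 12), (3, 'pauline', 'melon', 13);
""")

sql = """
    SELECT a.id, a.name, a.nickname, b.job, b.location
    FROM user AS a
    LEFT JOIN info AS b ON a.info_id = b.id
    WHERE a.id = ?
"""

# Fetch the single row for user 1.
row = conn.execute(sql, (1,)).fetchone()
print(row)

# Ask SQLite how it would execute the query; the last column of each
# plan row is a human-readable description of one step.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]
for step in plan:
    print(step)
```

Both id columns are primary keys here, so each lookup can use an index; on MySQL the equivalent check is to run EXPLAIN on the query, as suggested above.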