Sub select or group by - mysql

Hello, I have a users table that is related many-to-many with another table.
fiddle
CREATE TABLE users (
id INT NOT NULL PRIMARY KEY,
name varchar(50)
);
CREATE TABLE items (
id INT NOT NULL PRIMARY KEY,
name varchar(50)
);
CREATE TABLE user_items (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id int NOT NULL,
item_id int NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (item_id) REFERENCES items(id)
);
What is the best way to display the user information plus a count of the related items?
I tried two queries, but both have disadvantages:
SELECT u.*,
(SELECT count(id) FROM user_items WHERE user_id = u.id) as items_count
FROM users u;
SELECT u.*,
COUNT(ui.id) as items_count
FROM users u
JOIN user_items ui ON ui.user_id = u.id
GROUP BY u.id, u.name;
The subselect query will become heavy when these tables grow large (and they will).
The grouping query forces me to list every user column in the GROUP BY to get a valid result, which becomes inconvenient with real data and whenever the structure changes.
What is the best practice in this scenario?
Also, what is the best approach for speed if the tables become large?

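Actually, in MySQL you don't have to either aggregate or group by a column to include it in a grouped result set. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) this is allowed when the column is functionally dependent on the grouped columns, as every users column is on the primary key u.id; with ONLY_FULL_GROUP_BY disabled, MySQL does not check at all: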
SELECT u.*,
COUNT(*) as items_count
FROM users u
JOIN user_items ui ON ui.user_id = u.id
GROUP BY u.id
Alternatively, you could group inside an inline view in the from clause:
SELECT u.*,
ui.items_count
FROM users u
JOIN (select user_id, COUNT(*) as items_count
from user_items
group by user_id) ui
ON ui.user_id = u.id
SQLFiddle here.
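A minimal sketch comparing the two shapes discussed above. It runs against SQLite through Python's sqlite3 module rather than MySQL, so it is self-contained; the sample rows and the names inner_join and derived are made up for illustration. It also shows one difference worth knowing: an inner JOIN drops users with no items, while a LEFT JOIN onto the grouped inline view keeps them with a count of 0, which matches what the subselect version returns.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(id),
    item_id INTEGER NOT NULL REFERENCES items(id)
);
-- Hypothetical sample data: user 3 owns no items.
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
INSERT INTO items VALUES (10, 'pen'), (11, 'cup');
INSERT INTO user_items (user_id, item_id) VALUES (1, 10), (1, 11), (2, 10);
""")

# Grouping on the primary key only; users without items disappear.
inner_join = conn.execute("""
    SELECT u.id, u.name, COUNT(*) AS items_count
    FROM users u
    JOIN user_items ui ON ui.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Grouping inside an inline view, then LEFT JOIN so every user is kept.
derived = conn.execute("""
    SELECT u.id, u.name, COALESCE(ui.items_count, 0) AS items_count
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS items_count
               FROM user_items
               GROUP BY user_id) ui ON ui.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(inner_join)
print(derived)
```

With the inline view, no users column ever needs to appear in a GROUP BY, so adding columns to users later does not touch the query.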

Related

Select three different things from three different tables

I want a query that returns the first_name of each student and the first_name of each teacher who have the most courses together, along with the number of those courses.
Table Student:
CREATE TABLE Student(
id INT AUTO_INCREMENT,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255),
email VARCHAR(255) UNIQUE,
PRIMARY KEY (id)
);
Table Teacher:
CREATE TABLE Teacher(
id INT AUTO_INCREMENT,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255),
email VARCHAR(255) UNIQUE,
degree VARCHAR(10) NOT NULL,
PRIMARY KEY (id)
);
Table Course:
CREATE TABLE Course(
id INT AUTO_INCREMENT,
code INT NOT NULL UNIQUE,
name VARCHAR(255) NOT NULL,
st_id INT,
teach_id INT,
PRIMARY KEY (id),
FOREIGN KEY (st_id) REFERENCES Student (id),
FOREIGN KEY (teach_id) REFERENCES Teacher (id)
);
Is the below query correct? i.e. Can I use 3 SELECT in a query?
query1:
SELECT S.first_name
FROM Student AS S
INNER JOIN Course AS C
ON C.st_id = S.id
SELECT T.first_name
FROM Teacher AS T
INNER JOIN Course AS CC
ON CC.teach_id = T.id
SELECT COUNT(*)
FROM Course
WHERE Course.st_id = S.id
AND Course.teach_id = T.id
GROUP BY COUNT(*)
ORDER BY DESC;
query2:
SELECT S.first_name, T.first_name, COUNT(*)
FROM Student AS S, Teacher AS T, Course
WHERE Course.st_id = S.id
AND Course.teach_id = T.id
GROUP BY COUNT(*)
ORDER BY DESC;
If the above queries are not correct (the first one probably is wrong), please guide me to the correct answer.
NOTE: If the ordering isn't unique, order by the teachers' names first, then by the students' names (for clarity; it is not that important to me).
Your second query is closer to being right but it has some issues. I would recommend using JOIN statements rather than implied joins. This makes the query easier to read.
Something like this should work:
SELECT t.first_name,
t.id,
s.first_name,
s.id,
COUNT(*) AS course_count
FROM Course c
JOIN Student s ON c.st_id = s.id
JOIN Teacher t ON c.teach_id = t.id
GROUP BY t.id, s.id
ORDER BY course_count DESC, t.first_name, s.first_name;
You need to add a GROUP BY in order to get your count per teacher and student pair. Putting the GROUP BY on the id columns rather than the names makes sure you get counts for unique students and teachers in case multiple records in your tables share the same first name. I am also adding the id columns to the SELECT for the same reason, but these are not necessary and can be removed without affecting the accuracy of the query.
SELECT t.first_name, t.id, s.first_name, s.id, COUNT(c.id) AS course_count
FROM course c
JOIN student s ON c.st_id = s.id
JOIN teacher t ON c.teach_id = t.id
GROUP BY t.id, s.id
ORDER BY course_count DESC, t.first_name, s.first_name
The essential data is contained in the Course table, with the Student and Teacher tables only required for gathering the names. This query joins the 3 tables in question, computing the count of courses shared by each teacher and student and listing the pairs with the most courses first.
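As a rough check of the grouped join described above, here is a small sketch using Python's sqlite3 module in place of MySQL. The rows and the name pair_counts are invented for illustration, and the tables are trimmed to the columns the query needs.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
CREATE TABLE Teacher (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
CREATE TABLE Course (
    id INTEGER PRIMARY KEY,
    code INTEGER NOT NULL UNIQUE,
    name TEXT NOT NULL,
    st_id INTEGER REFERENCES Student(id),
    teach_id INTEGER REFERENCES Teacher(id)
);
INSERT INTO Student VALUES (1, 'Sam'), (2, 'Sue');
INSERT INTO Teacher VALUES (1, 'Tom'), (2, 'Tia');
INSERT INTO Course VALUES
    (1, 101, 'Algebra',   1, 1),
    (2, 102, 'Biology',   1, 1),
    (3, 103, 'Chemistry', 2, 1),
    (4, 104, 'Drama',     2, 2);
""")

# One row per (teacher, student) pair, largest course count first,
# ties broken by teacher name and then student name.
pair_counts = conn.execute("""
    SELECT t.first_name, s.first_name, COUNT(*) AS course_count
    FROM Course c
    JOIN Student s ON c.st_id = s.id
    JOIN Teacher t ON c.teach_id = t.id
    GROUP BY t.id, s.id
    ORDER BY course_count DESC, t.first_name, s.first_name
""").fetchall()

print(pair_counts)
```

Tom and Sam share two courses and so come first; the two single-course pairs are then ordered by teacher name.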

What Am I Missing? MySQL Left Join Most Newest Entry From 2nd Table

I need a fresh pair of eyes on this. I have two tables, one of which has users and the second which contains login records, multiple records for each user. What I'm trying to do is select all entries from the first table, and the most recent record from the second table, e.g., a list of all users but only show the most recent activity. Both tables have auto increment in the ID column.
My code currently is thus:
SELECT u.user_id, u.name, u.email, r.rid, r.user_id
FROM users AS u
LEFT JOIN login_records AS r ON r.user_id = u.user_id
WHERE
r.rid = (
SELECT MAX( rid )
FROM login_records
WHERE user_id = u.user_id
)
I've scoured answers to similar questions on SO and tried all of them, but results have been either returning nothing or only getting odd results (not necessarily the newest one). ID in both tables is auto-increment, so I thought it should be a relatively simple matter to get the only or highest ID for a particular user, but it either returns nothing or a completely different selection each time.
It's my first time using JOIN - do I have the wrong JOIN? Do I need to ORDER or GROUP things differently?
Thanks for your help. It's got to be something simple, since Danny Coulombe's answer appearing here seems to work for other users.
You will need a subquery I believe:
https://www.db-fiddle.com/f/2wudMDVxReYJz4FEyG19Va/0
CREATE TABLE users (
user_id INT UNSIGNED NOT NULL
AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE users_logins (
user_login_id INT UNSIGNED NOT NULL
AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED NOT NULL
);
INSERT INTO users SELECT 1;
INSERT INTO users SELECT 2;
INSERT INTO users_logins SELECT 1,1;
INSERT INTO users_logins SELECT 2,1;
INSERT INTO users_logins SELECT 3,1;
INSERT INTO users_logins SELECT 4,1;
INSERT INTO users_logins SELECT 5,2;
INSERT INTO users_logins SELECT 6,2;
And the query:
SELECT
u.user_id, ul.latest_login_id
FROM users u
LEFT JOIN
(
SELECT user_id, MAX(user_login_id) latest_login_id
FROM users_logins
GROUP BY user_id
) ul ON u.user_id = ul.user_id
You have to ORDER BY the column you want to sort on in descending order, for example ORDER BY last_login DESC.
Replace last_login with the column you want to order by, but that column must first appear in the SELECT list.
How about replacing every rid in the WHERE clause and the correlated subquery with record_id?
SELECT u.user_id, u.name, u.email, r.rid, r.record_id, r.user_id
FROM test_users AS u
LEFT JOIN test_login_records AS r ON r.user_id = u.user_id
WHERE
(r.record_id = (
SELECT MAX(record_id)
FROM test_login_records
WHERE user_id = u.user_id
) OR r.record_id is null);
Test here
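To see why the placement of the filter matters, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. It reuses the users and users_logins tables from the answer above and adds a hypothetical user 3 who has never logged in; the names where_filtered and derived are illustrative. Comparing the joined row against MAX(...) in the WHERE clause discards the NULL row that the LEFT JOIN produced for that user, while joining onto a grouped subquery keeps it.

```python
import sqlite3

# SQLite stands in for MySQL; user 3 (no logins) is an added assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE users_logins (
    user_login_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL
);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO users_logins VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
""")

# Filter in WHERE: for user 3 the comparison is NULL = NULL, which is not
# true, so the row is removed and the LEFT JOIN behaves like an INNER JOIN.
where_filtered = conn.execute("""
    SELECT u.user_id, r.user_login_id
    FROM users u
    LEFT JOIN users_logins r ON r.user_id = u.user_id
    WHERE r.user_login_id = (SELECT MAX(user_login_id)
                             FROM users_logins
                             WHERE user_id = u.user_id)
    ORDER BY u.user_id
""").fetchall()

# Grouped subquery joined with LEFT JOIN: every user survives.
derived = conn.execute("""
    SELECT u.user_id, ul.latest_login_id
    FROM users u
    LEFT JOIN (SELECT user_id, MAX(user_login_id) AS latest_login_id
               FROM users_logins
               GROUP BY user_id) ul ON u.user_id = ul.user_id
    ORDER BY u.user_id
""").fetchall()

print(where_filtered)
print(derived)
```

The OR ... IS NULL variant shown above addresses the same problem from the other direction, by explicitly letting the unmatched rows through the WHERE clause.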

Why is this query really slow with 70k+ rows?

First of all, this is my table structure:
CREATE TABLE IF NOT EXISTS `site_forum_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`forum_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`data` int(11) NOT NULL,
`comment` longtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Before importing my backup, it had like 10-15 rows and I made a ranking system based on number of comments and this query was working flawlessly:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users AS u
LEFT JOIN site_forum_comments AS f ON (f.user_id = u.id)
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
But now, with more than 70k rows inserted, the script won't even load and just crashes the server.
What have I possibly done wrong? Is this problem about the query specifically or is it the table structure?
Thanks in advance, cheers!
This is your query:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users u LEFT JOIN
site_forum_comments f
ON f.user_id = u.id
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
Because you are choosing the highest ranked user, you can probably use an inner join rather than an outer join. In any case, this version doesn't have a great many optimization opportunities. But, you need an index on site_forum_comments(user_id, id).
You might get better performance with the same index and a correlated subquery:
SELECT u.id, u.username,
(SELECT COUNT(*)
FROM site_forum_comments f
WHERE f.user_id = u.id
) as rank
FROM site_users u
ORDER BY rank DESC
LIMIT :l;
You are currently joining all users to their comments without an index on the user_id column, and that is slow.
The following query will select the highest user first and only join that one user with the highest rank with the site_users table (using the index over site_users.id). So it should be faster.
SELECT site_users.id, site_users.username, a.rank
FROM (
SELECT user_id, COUNT(*) as rank
FROM site_forum_comments
GROUP BY user_id
ORDER BY rank DESC
LIMIT 1
) AS a
LEFT JOIN site_users ON a.user_id = site_users.id
Note that with this query you won't get a result if the rank is 0, that is, when there are no comments at all.
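The following sketch puts the suggestions from both answers side by side, using Python's sqlite3 module rather than MySQL. The index name idx_comments_user, the sample rows, and the variable names are assumptions for illustration, and the table is trimmed to a few columns. The alias is called comment_count instead of rank because RANK is a reserved word in MySQL 8.0. This only checks that the queries agree on a tiny data set; it says nothing about timing on 70k rows.

```python
import sqlite3

# SQLite stands in for MySQL; schema trimmed and data invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE site_forum_comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    forum_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    comment TEXT NOT NULL
);
-- The composite index recommended above (index name is hypothetical).
CREATE INDEX idx_comments_user ON site_forum_comments (user_id, id);
INSERT INTO site_users VALUES (1, 'amy'), (2, 'ben'), (3, 'cat');
INSERT INTO site_forum_comments (forum_id, user_id, comment) VALUES
    (1, 1, 'a'), (1, 2, 'b'), (1, 2, 'c'), (2, 2, 'd'), (2, 1, 'e');
""")

# Correlated subquery: one indexed count per user, no GROUP BY needed.
ranking = conn.execute("""
    SELECT u.id, u.username,
           (SELECT COUNT(*)
            FROM site_forum_comments f
            WHERE f.user_id = u.id) AS comment_count
    FROM site_users u
    ORDER BY comment_count DESC, u.id
    LIMIT ?
""", (3,)).fetchall()

# Aggregate first, keep only the top row, then look up that single user.
top_user = conn.execute("""
    SELECT su.id, su.username, a.comment_count
    FROM (SELECT user_id, COUNT(*) AS comment_count
          FROM site_forum_comments
          GROUP BY user_id
          ORDER BY comment_count DESC
          LIMIT 1) AS a
    LEFT JOIN site_users su ON a.user_id = su.id
""").fetchall()

print(ranking)
print(top_user)
```

The correlated subquery still lists users with zero comments, whereas the aggregate-first form only ever sees users who appear in site_forum_comments.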

MySQL select from multiple tables with multiple where clauses and find_in_set

I'm having an issue trying to avoid partial duplicate results with a MySQL query. I admit I'm really new at MySQL but I have learned from research on SO that the schema I'm about to lay out for you below could definitely be done a better way (the linked_users column of the users table should be a separate table). However, I cannot change the way it is set up right now.
I'm trying to return the user names of the user ids assigned to each item in t2 or the user names of the users who are linked to those users. The query is returning two sets of names for each item, however. I think this is happening because it is searching them all twice, and I've attempted to read this tutorial on SO regarding returning multiple values from multiple tables but I can't seem to wrap my mind around the JOINS and UNIONS and whatnot.
My question is two-fold:
What can I do to solve this issue without changing the way the database is set up?
How should the database be changed to better allow for queries like this in the future?
Thank you for your time.
Schema:
create table users (user_id int, user_name varchar (55), linked_users varchar (55));
insert into users( user_id, user_name, linked_users)values(1, 'user1', '2,154,4,45');
insert into users( user_id, user_name, linked_users)values(2, 'user2', '13,1,200');
create table t2 (t2_id int, user_id int);
insert into t2( t2_id, user_id)values(1, 2);
insert into t2( t2_id, user_id)values(2, 1);
insert into t2( t2_id, user_id)values(3, 1);
insert into t2( t2_id, user_id)values(4, 2);
insert into t2( t2_id, user_id)values(5, 3);
Query:
SELECT t.*, u.user_name
FROM t2 t, users u
WHERE t.user_id = u.user_id
OR find_in_set(t.user_id, u.linked_users) > 0
Fiddle: http://sqlfiddle.com/#!9/c5540/10
You should definitely change the structure of your tables. The correct structure would use two tables, one for the entity ("users") and the other for the relationships between them:
create table users (
user_id int not null primary key auto_increment,
user_name varchar(55)
);
create table userlinks (
user_id int not null references users(user_id),
linked_user_id int not null references users(user_id)
);
create table t2 (
t2_id int,
user_id int not null references users(user_id)
);
Your structure has several obvious flaws:
You are storing integer values as string representations of them.
You are storing lists in a string, and SQL has very little support for comma-delimited lists.
You have no definitions of primary or foreign keys.
With this structure, you can get the list of users for a given "t" value without duplicates and without a distinct by using exists:
select u.user_id
from users u cross join
(select user_id from t2 where t2.t2_id = 1) t
where u.user_id = t.user_id or
exists (select 1
from userlinks ul
where ul.user_id = t.user_id and
u.user_id = ul.linked_user_id
);
The query says: "Get all the users where the user id matches the corresponding id in t2 or the user is linked to the user_id in t2."
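A small runnable sketch of the EXISTS approach against the normalized structure, using Python's sqlite3 module in place of MySQL. The table names follow the proposed schema (users, userlinks, t2); the sample rows and the name linked are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE userlinks (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    linked_user_id INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE TABLE t2 (
    t2_id INTEGER,
    user_id INTEGER NOT NULL REFERENCES users(user_id)
);
INSERT INTO users VALUES (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4');
INSERT INTO userlinks VALUES (1, 2), (1, 4), (2, 1);
INSERT INTO t2 VALUES (1, 2), (2, 1);
""")

# For one t2 row: the assigned user plus every user linked to that user.
# EXISTS yields each user at most once, so no DISTINCT is needed.
linked = conn.execute("""
    SELECT u.user_id
    FROM users u
    CROSS JOIN (SELECT user_id FROM t2 WHERE t2.t2_id = ?) t
    WHERE u.user_id = t.user_id
       OR EXISTS (SELECT 1
                  FROM userlinks ul
                  WHERE ul.user_id = t.user_id
                    AND u.user_id = ul.linked_user_id)
    ORDER BY u.user_id
""", (2,)).fetchall()

print(linked)
```

Here t2 row 2 is assigned to user 1, whose links are users 2 and 4, so three users come back and user 3 does not.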
As a note, you can also do this with your structure by doing:
select u.user_id
from users u cross join
(select user_id from t2 where t2.t2_id = 1) t
where u.user_id = t.user_id or
find_in_set(u.user_id,
(select u2.linked_users from users u2 where u2.user_id = t.user_id)
) > 0;
You have a many-to-many relationship within the same table. One user may have many linked users, and a user may be the linked user of more than one user. To represent that you need an additional table, which you may call linked_users, defined with two columns:
user_id int foreign key to user_id in users table
linked_user_id int foreign key to user_id in users table
The primary key of this table should be both user_id and linked_user_id
Now, let's say you have users 1 to 10 with 1 having 5, 6, 8 as linked users and 3 having 1, 6, 10 as linked users. You should insert
1, 5
1, 6
1, 8
3, 1
3, 6
3, 10
into linked_users. Your query becomes:
select u1.*, u2.*
from users u1 inner join linked_users lu on u1.user_id = lu.user_id
inner join users u2 on lu.linked_user_id = u2.user_id
If you have to stick to your design:
select u1.*, u2.*
from users u1 inner join users u2 on
find_in_set(u2.user_id, u1.linked_users) > 0
Thank you to Gordon and Tarik for your responses. I tried the code you supplied in my fiddle but was unable to pull the t2 information, so after messing around with it some more I found this query solved the issue I was having:
SELECT t.*, u.user_name
FROM users u, t2 t
WHERE t.user_id = u.user_id
AND (t.user_id = '$user_id'
OR t.user_id IN
(SELECT u2.user_id
FROM users u2
WHERE find_in_set('$user_id', u2.linked_users)
)
)
Again, I understand that this is not the optimal way to do it, but since I can't change the structure this will serve the purpose.

joining tables with not exists query [duplicate]

Possible Duplicate:
Mysql: Perform of NOT EXISTS. Is it possible to improve permofance?
Is there a better or more optimal way to do this? Should I use EXISTS instead of a JOIN? Or two separate queries? And what about temporary tables? I was reading about those but am unsure.
I am getting members' emails from a group and checking that they have not yet received an item.
SELECT m.email,g.id
FROM group g
LEFT JOIN members m
ON g.mid = m.id
AND g.gid='1'
WHERE NOT EXISTS
( SELECT id
FROM items AS i
WHERE i.mid=m.id
AND i.item_id='5'
)
Here's the same thing written as a JOIN:
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = '1'
LEFT JOIN items i ON i.mid = m.id AND i.item_id = '5'
WHERE i.id IS NULL
Use the following compound indexes:
group (mid, gid)
items (mid, item_id)
I reversed the LEFT JOIN on members and group because it seems like you're returning members, not groups, and I changed the LEFT JOIN into an INNER JOIN since you only want members from that group.
I think this one might read better:
SELECT m.email, g.id
FROM members m
JOIN `group` g ON g.mid = m.id
LEFT JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE g.gid = 1
AND i.id IS NULL
You might be wondering if we can move the i.item_id = 5 part to the WHERE clause also. You can't because there are no rows where i.id IS NULL and i.item_id = 5. You must do the join first and then eliminate the NULL rows in the WHERE clause.
I don't believe a temporary table is necessary. We'd really only go that route if we can't get acceptable performance.
From your query, we gather your schema looks like this:
group (id INT PK, gid INT, mid INT)
items (id INT PK, item_id INT, mid INT)
members (id INT PK, email VARCHAR)
It looks like your group table is really a "membership" table, which resolves/implements a many-to-many relationship between a group and a person. (That is, a person can be a member of zero, one or more groups; a group can have zero, one or more persons as members.)
You are using a LEFT JOIN between group and members. This will return a row for group (returning group.id) when there are no matching members, with a NULL for members.email (which may be what you want). But if you only want to return email addresses, then this can be changed to an INNER JOIN.
The NOT EXISTS predicate can be replaced with an OUTER JOIN and a test for a NULL value returned from the JOINED table. If the group.gid and/or items.item_id columns are numeric datatype, then you can remove the quotes from around the integer literals in the predicates.
Here is an alternative which will return an equivalent resultset, and may perform better:
SELECT m.email
, g.id
FROM members m
JOIN `group` g ON g.mid = m.id AND g.gid = 1
LEFT
JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE i.id IS NULL
ADDENDUM:
TEST CASE (provided in comment on selected answer) demonstrates difference in result set between queries with the predicate items.item_id = 5 in the ON clause and in the WHERE clause. (Moving this predicate to the WHERE clause messes with the anti-join.)
CREATE TABLE `group` (`id` INT PRIMARY KEY, `gid` INT, `mid` INT);
CREATE TABLE `items` (`id` INT PRIMARY KEY, `item_id` INT, `mid` INT);
CREATE TABLE `members` (`id` INT PRIMARY KEY, `email` VARCHAR(40));
INSERT INTO `group` VALUES (1,1,1), (2,1,2);
INSERT INTO `items` VALUES (1,5,1);
INSERT INTO `members` VALUES (1,'one#m.com'),(2,'two#m.com');
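The test case above can be run as a short sketch with Python's sqlite3 module standing in for MySQL (SQLite accepts the backtick quoting for compatibility). The variable names on_clause, where_clause, and not_exists are illustrative. It compares the anti-join with item_id = 5 in the ON clause, the same predicate moved to WHERE, and the original NOT EXISTS form.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows are copied from the test case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `group` (`id` INT PRIMARY KEY, `gid` INT, `mid` INT);
CREATE TABLE `items` (`id` INT PRIMARY KEY, `item_id` INT, `mid` INT);
CREATE TABLE `members` (`id` INT PRIMARY KEY, `email` VARCHAR(40));
INSERT INTO `group` VALUES (1,1,1), (2,1,2);
INSERT INTO `items` VALUES (1,5,1);
INSERT INTO `members` VALUES (1,'one#m.com'),(2,'two#m.com');
""")

# Anti-join with the item filter in ON: members lacking item 5 are kept.
on_clause = conn.execute("""
    SELECT m.email, g.id
    FROM members m
    JOIN `group` g ON g.mid = m.id AND g.gid = 1
    LEFT JOIN items i ON i.mid = m.id AND i.item_id = 5
    WHERE i.id IS NULL
""").fetchall()

# Same filter moved to WHERE: no row can have item_id = 5 and a NULL id.
where_clause = conn.execute("""
    SELECT m.email, g.id
    FROM members m
    JOIN `group` g ON g.mid = m.id AND g.gid = 1
    LEFT JOIN items i ON i.mid = m.id
    WHERE i.item_id = 5 AND i.id IS NULL
""").fetchall()

# NOT EXISTS form, for comparison with the anti-join.
not_exists = conn.execute("""
    SELECT m.email, g.id
    FROM members m
    JOIN `group` g ON g.mid = m.id AND g.gid = 1
    WHERE NOT EXISTS (SELECT 1 FROM items i
                      WHERE i.mid = m.id AND i.item_id = 5)
""").fetchall()

print(on_clause, where_clause, not_exists)
```

Member 1 already has item 5 and is excluded; member 2 has no items and is returned by the ON-clause and NOT EXISTS forms, while the WHERE-clause form returns nothing.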