Avoid duplicated rows in SQL query when JOINing many to many tables

I have a basic blog system with tables for posts, authors and tags.
One author can write many posts, but a post can only be written by one author (one-to-many relationship). One tag can appear in many different posts, and any post can have several tags (many-to-many relationship). For that case I've created a 4th table to link posts and tags, as follows:
| post_id | tag_id |
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
| 4 | 1 |
I need a single query that lists every post along with its user and its tags (if any). I'm pretty close with a double JOIN query, but I get duplicated rows for posts with more than one tag (everything in those rows is duplicated except the tag). The query I'm using is as follows:
SELECT title,
table_users.username author,
table_tags.tagname tag
FROM table_posts
JOIN table_users
ON table_posts.user_id = table_users.id
LEFT JOIN table_posts_tags
ON table_posts.id = table_posts_tags.post_id
LEFT JOIN table_tags
ON table_tags.id = table_posts_tags.tag_id
Could anyone suggest an amendment to this query, or a new one, that solves the row duplication issue (*) when more than one tag is associated with the same post? Thanks.
(*) To make it clear: for the table above the query returns 4 rows when it should return 3: one for post #1 (with 2 tags), one for post #2 and one for post #4.
Table Recreate
CREATE TABLE `table_posts` (
`id` int NOT NULL AUTO_INCREMENT,
`title` varchar(120) NOT NULL,
`content` text NOT NULL,
PRIMARY KEY (`id`)
)
CREATE TABLE `table_tags` (
`id` int NOT NULL AUTO_INCREMENT,
`name_tag` varchar(18) NOT NULL,
PRIMARY KEY (`id`)
)
CREATE TABLE `table_posts_tags` (
`id` int NOT NULL AUTO_INCREMENT,
`post_id` int NOT NULL,
`tag_id` int NOT NULL,
PRIMARY KEY (`id`),
KEY `tag_id` (`tag_id`),
KEY `FK_t_posts_tags_t_posts` (`post_id`),
CONSTRAINT `FK_t_posts_tags_t_posts` FOREIGN KEY (`post_id`) REFERENCES `t_posts` (`id`),
CONSTRAINT `FK_t_posts_tags_t_tags` FOREIGN KEY (`tag_id`) REFERENCES `t_tags` (`id`)
)
CREATE TABLE `table_users` (
`id` int NOT NULL AUTO_INCREMENT,
`username` varchar(16) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`banned` tinyint DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `FK_t_users_t_roles` (`role_id`),
CONSTRAINT `FK_t_users_t_roles` FOREIGN KEY (`role_id`) REFERENCES `t_roles` (`id`)
)

One option aggregates the tags in a CSV list using group by and group_concat():
select p.title, u.username author, group_concat(t.tagname) tagnames
from table_posts p
inner join table_users u on u.id = p.user_id
left join table_posts_tags pt on pt.post_id = p.id
left join table_tags t on t.id = pt.tag_id
group by p.id, p.title, u.username
Note that I added table aliases to the query, and used them to qualify all columns; this makes the query shorter and easier to write and read.
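As a rough way to try the idea out without a MySQL server, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is only a stand-in here (an assumption of this sketch, not part of the question), although it also provides GROUP_CONCAT(). The sample usernames, titles and tag names are made up, and the tables are cut down to the columns the query needs.

```python
import sqlite3

# SQLite stands in for MySQL here; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE table_posts (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
CREATE TABLE table_tags (id INTEGER PRIMARY KEY, tagname TEXT);
CREATE TABLE table_posts_tags (post_id INTEGER, tag_id INTEGER);

INSERT INTO table_users VALUES (1, 'alice');
INSERT INTO table_posts VALUES (1, 'First', 1), (2, 'Second', 1), (4, 'Fourth', 1);
INSERT INTO table_tags VALUES (1, 'sql'), (2, 'mysql');
INSERT INTO table_posts_tags VALUES (1, 1), (1, 2), (2, 2), (4, 1);
""")

# Plain joins: one row per (post, tag) pair, so post 1 appears twice.
plain_join = conn.execute("""
SELECT p.title, t.tagname
FROM table_posts p
LEFT JOIN table_posts_tags pt ON pt.post_id = p.id
LEFT JOIN table_tags t ON t.id = pt.tag_id
""").fetchall()

# Grouped: the tags of each post are collapsed into one comma-separated value.
grouped = conn.execute("""
SELECT p.title, u.username AS author, GROUP_CONCAT(t.tagname) AS tagnames
FROM table_posts p
INNER JOIN table_users u ON u.id = p.user_id
LEFT JOIN table_posts_tags pt ON pt.post_id = p.id
LEFT JOIN table_tags t ON t.id = pt.tag_id
GROUP BY p.id, p.title, u.username
ORDER BY p.id
""").fetchall()

print(len(plain_join), "rows without grouping")
print(len(grouped), "rows with GROUP BY")
for row in grouped:
    print(row)
```

The order of the tags inside each concatenated value is not guaranteed unless it is requested explicitly; in MySQL that can be done with an ORDER BY inside GROUP_CONCAT().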

Related

Query gets data from two different users when I try to get data from just one user

I want to get data from only one specific user, but I get data from both users. Why is that? How can I solve this?
I have three tables:
/*User*/
CREATE TABLE `User` (
`IDUser` INT NOT NULL AUTO_INCREMENT,
`Name` VARCHAR(50) NOT NULL,
PRIMARY KEY (`IDUser`)
);
/*Category*/
CREATE TABLE `Category` (
`IDCategory` CHAR(3) NOT NULL,
`FK_User` INT NOT NULL,
`CategoryName` VARCHAR(40) NOT NULL,
PRIMARY KEY (`IDCategory`, `FK_User`)
);
/*Product*/
CREATE TABLE `Product` (
`IDProduct` VARCHAR(18) NOT NULL,
`FK_User` INT NOT NULL,
`ProductName` VARCHAR(150) NOT NULL,
`FK_Category` CHAR(3) NOT NULL,
PRIMARY KEY (`IDProduct`, `FK_User`)
);
ALTER TABLE `Product` ADD FOREIGN KEY (`FK_User`) REFERENCES `User`(`IDUser`);
ALTER TABLE `Product` ADD FOREIGN KEY (`FK_Category`) REFERENCES `Category`(`IDCategory`);
ALTER TABLE `Category` ADD FOREIGN KEY (`FK_User`) REFERENCES `User`(`IDUser`);
insert into User(Name) values('User1');
insert into User(Name) values('User2');
insert into Category(IDCategory,FK_User,CategoryName) values('CT1',1,'Category1User1');
insert into Category(IDCategory,FK_User,CategoryName) values('CT1',2,'Category1User2');
If two different users each insert the same product with the same ID:
insert into Product values('001',1,'shoe','CT1');
insert into Product values('001',2,'shoe','CT1');
Why do I keep getting data from both users if I try a query like this one:
SELECT P.IDProduct,P.ProductName,P.FK_Category,C.CategoryName
FROM Product P inner join Category C on P.FK_Category=C.IDCategory
WHERE P.FK_User=1
this is the result I get:

You are getting two rows because both categories have the same IDCategory value which is the value you are JOINing on. You need to also JOIN on the FK_User values so that you don't also get User2's category values:
SELECT P.IDProduct,P.ProductName,P.FK_Category,C.CategoryName
FROM Product P
INNER JOIN Category C ON P.FK_Category=C.IDCategory AND P.FK_User = C.FK_User
WHERE P.FK_User=1
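The difference between joining on part of the composite key and joining on all of it can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), with the Category and Product tables and sample rows from the question, minus the foreign keys.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (
    IDCategory TEXT NOT NULL,
    FK_User INTEGER NOT NULL,
    CategoryName TEXT NOT NULL,
    PRIMARY KEY (IDCategory, FK_User)
);
CREATE TABLE Product (
    IDProduct TEXT NOT NULL,
    FK_User INTEGER NOT NULL,
    ProductName TEXT NOT NULL,
    FK_Category TEXT NOT NULL,
    PRIMARY KEY (IDProduct, FK_User)
);
INSERT INTO Category VALUES ('CT1', 1, 'Category1User1'), ('CT1', 2, 'Category1User2');
INSERT INTO Product VALUES ('001', 1, 'shoe', 'CT1'), ('001', 2, 'shoe', 'CT1');
""")

# Joining on IDCategory alone matches the categories of both users.
partial_key = conn.execute("""
SELECT P.IDProduct, P.ProductName, P.FK_Category, C.CategoryName
FROM Product P
INNER JOIN Category C ON P.FK_Category = C.IDCategory
WHERE P.FK_User = 1
ORDER BY C.CategoryName
""").fetchall()

# Joining on both key columns keeps only the category owned by the same user.
full_key = conn.execute("""
SELECT P.IDProduct, P.ProductName, P.FK_Category, C.CategoryName
FROM Product P
INNER JOIN Category C ON P.FK_Category = C.IDCategory AND P.FK_User = C.FK_User
WHERE P.FK_User = 1
""").fetchall()

print(partial_key)
print(full_key)
```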

You need to add the condition P.FK_User = C.FK_User to your join clause:
SELECT P.IDProduct,P.ProductName,P.FK_Category,C.CategoryName
FROM Product P inner join Category C
on P.FK_Category=C.IDCategory and P.FK_User=C.FK_User
WHERE P.FK_User=1

A PRIMARY KEY is a UNIQUE key. Shouldn't IDCategory be unique? That is, shouldn't Category have PRIMARY KEY(IDCategory)?
(Check other tables for a similar problem.)

Why is this simple SELECT taking 6 seconds when a user has many comments?

I have this SELECT to show users a notification when someone comments on one of their posts.
I noticed that for users whose posts have many comments, it can take more than 6 seconds.
select 'comments' prefix, c.foto, c.data as data, c.user,
concat(k.user, ' comments your post') as logs
from comments c
inner join posts p on c.foto = p.id
inner join cadastro k on c.user = k.id
where p.user = 1 and c.user <> 1 and c.delete = 0
order by c.data desc
limit 5
I'd like to show users a notification like "someone commented on your post". To do so, I used an inner join on posts (to know whether the comment is on a post by user 1) and an inner join on cadastro (to get the nickname of the user who commented on user 1's post).
The WHERE clause checks that the post belongs to user 1, that c.user <> 1 (so the user's own comments don't generate notifications), and that c.delete = 0 (the comment is not deleted).
my tables:
`posts` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`user` int(11) UNSIGNED NOT NULL,
`foto` varchar(400),
`data` datetime NOT NULL,
`delete` tinyint(1) NOT NULL DEFAULT '0',
FOREIGN KEY (`user`) REFERENCES cadastro (`id`),
PRIMARY KEY (`id`)
)
`comments` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`foto` int(11) UNSIGNED NOT NULL,
`user` int(11) UNSIGNED NOT NULL,
`texto` varchar(3000) NOT NULL,
`data` datetime NOT NULL,
`delete` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `foto_delete` (foto, `delete`),
FOREIGN KEY (`foto`) REFERENCES posts (`id`) ON DELETE CASCADE
)
Any ideas why it takes so long when a user has about 200,000 comments? (If the user has 1,000, it is fast.)

Without indexes, the engine usually has to scan all rows to find the values required by the ON, WHERE and ORDER BY clauses.
A simple thing you can do is create these indexes:
CREATE INDEX cadastro_id ON cadastro(id);
CREATE INDEX posts_id ON posts(id);
CREATE INDEX posts_user ON posts(user);
CREATE INDEX comments_foto ON comments(foto);
CREATE INDEX comments_user ON comments(user);
CREATE INDEX comments_delete ON comments(`delete`);
CREATE INDEX comments_data ON comments(data);
Measure the time the query currently takes, then apply these indexes, measure again, and report the results here.
See also:
https://dev.mysql.com/doc/refman/5.7/en/create-index.html
https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
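To see whether an index is actually picked up, it helps to look at the query plan before and after creating it. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; this is an assumption made for portability, since SQLite's planner is not MySQL's and the plan text differs. In MySQL the equivalent check is to prefix the query with EXPLAIN. The helper name `plan` and the simplified single-table query are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; only the comments table is modelled.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    foto INTEGER NOT NULL,
    user INTEGER NOT NULL,
    data TEXT NOT NULL,
    "delete" INTEGER NOT NULL DEFAULT 0
)
""")


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)


query = "SELECT id FROM comments WHERE user = 1"

before = plan(query)  # no index on user yet: full table scan
conn.execute("CREATE INDEX comments_user ON comments(user)")
after = plan(query)   # the planner can now search the new index

print("before:", before)
print("after: ", after)
```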

Display unmatched data along with aggregate functions and multiple joins

I have a system that uses MySQL to store donations made by people (donators). Each donation is entered into the system by an authorized user.
Here are the CREATE TABLE statements for all 4 tables:
CREATE TABLE `donator` (
`DONATOR_ID` int(11) NOT NULL AUTO_INCREMENT,
`DONATOR_NAME` varchar(50) NOT NULL,
`STATUS` char(1) NOT NULL DEFAULT 'A',
PRIMARY KEY (`DONATOR_ID`)
)
CREATE TABLE `user` (
`USER_ID` int(11) NOT NULL AUTO_INCREMENT,
`USERNAME` varchar(100) NOT NULL,
`PASSWORD` varchar(200) NOT NULL,
`TYPE` char(1) NOT NULL,
PRIMARY KEY (`USER_ID`)
)
CREATE TABLE `sif_res` (
`RES_ID` int(11) NOT NULL AUTO_INCREMENT,
`RES_NAME` varchar(50) NOT NULL,
`MON_VAL` double NOT NULL,
PRIMARY KEY (`RES_ID`)
)
CREATE TABLE `donations` (
`DONATION_ID` int(11) NOT NULL AUTO_INCREMENT,
`RESOURCE` int(11) NOT NULL,
`AMOUNT` int(11) NOT NULL,
`DONATOR` int(11) NOT NULL,
`ENTRY_DATE` datetime NOT NULL,
`ENTERED_BY_USER` int(11) NOT NULL,
PRIMARY KEY (`DONATION_ID`),
KEY `fk_resurs` (`RESOURCE`),
KEY `fk_donator` (`DONATOR`),
KEY `fk_user` (`ENTERED_BY_USER`),
CONSTRAINT `fk_1` FOREIGN KEY (`DONATOR`) REFERENCES `donator` (`DONATOR_ID`) ON UPDATE CASCADE,
CONSTRAINT `fk_2` FOREIGN KEY (`RESOURCE`) REFERENCES `sif_res` (`RES_ID`) ON UPDATE CASCADE,
CONSTRAINT `fk_3` FOREIGN KEY (`ENTERED_BY_USER`) REFERENCES `user` (`USER_ID`) ON UPDATE CASCADE
)
As you can see, I have a list of donators, users and resources that can be donated.
Now I want to display every donator's name and ID, and in a third column their balance (the sum of the value of all items they donated). The value of each donation is calculated as
donations.AMOUNT * sif_res.MON_VAL
The SQL SELECT I have written works, but donators who haven't donated anything are left out (they are not matched by the JOIN). I need it to display everyone (with STATUS != 'D') even if they have no entries (in that case their balance may be 0 or NULL).
This is the SQL I have written:
SELECT DONATOR_ID
, DONATOR_NAME
, round(SUM(d.AMOUNT * sr.MON_VAL)) as BALANCE
from donator c
join donations d on c.DONATOR_ID=d.DONATOR
join sif_res sr on sr.RES_ID=d.RESOURCE
where c.STATUS!='D'
group by DONATOR_ID, DONATOR_NAME
So, if I execute the following statements:
INSERT INTO donator(DONATOR_NAME, STATUS) VALUES('John', 'A'); -- assigns id=1
INSERT INTO donator(DONATOR_NAME, STATUS) VALUES('Willie', 'A'); -- assigns id=2
INSERT INTO user (USERNAME, PASSWORD, TYPE) VALUES('user', 'pass', 'A'); -- assigns id=1
INSERT INTO sif_res(RES_NAME, MON_VAL) VALUES('Flour', 0.5); -- assigns id=1
INSERT INTO donations(RESOURCE, AMOUNT, DONATOR, ENTRY_DATE, ENTERED_BY_USER) VALUES(1, 100, 1, '2017-02-02', 1);
I will get this output (with my SELECT statement above):
DONATOR_ID | DONATOR_NAME | BALANCE
--------------------------------------------
1 | John | 50
What I want to get is:
DONATOR_ID | DONATOR_NAME | BALANCE
--------------------------------------------
1 | John | 50
2 | Willie | 0
I have tried all kinds of joins (left, right, outer, full, ...), but none of them worked for me (probably because I was using them wrong).
If it were just a problem of unmatched data I would be able to solve it, but the aggregate function SUM and the second JOIN make it all more complicated.

Using a left outer join on the second two tables should do the trick:
SELECT c.DONATOR_ID
, c.DONATOR_NAME
, ifnull(round(SUM(d.AMOUNT * sr.MON_VAL)),0) as BALANCE
from donator c
left outer join donations d on c.DONATOR_ID=d.DONATOR
left outer join sif_res sr on sr.RES_ID=d.RESOURCE
where c.STATUS!='D'
group by DONATOR_ID, DONATOR_NAME
I also wrapped the BALANCE expression in ifnull to display 0 instead of null.
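The effect of switching both joins can be reproduced with a minimal sketch using Python's sqlite3 module. SQLite is an assumption here, standing in for MySQL; it also supports IFNULL() and ROUND(), though its ROUND() returns a floating-point value. The tables are reduced to the columns the query touches, and the rows are the sample data from the question.

```python
import sqlite3

# SQLite stands in for MySQL; tables are trimmed to the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donator (DONATOR_ID INTEGER PRIMARY KEY, DONATOR_NAME TEXT, STATUS TEXT DEFAULT 'A');
CREATE TABLE sif_res (RES_ID INTEGER PRIMARY KEY, RES_NAME TEXT, MON_VAL REAL);
CREATE TABLE donations (DONATION_ID INTEGER PRIMARY KEY, RESOURCE INTEGER, AMOUNT INTEGER, DONATOR INTEGER);

INSERT INTO donator (DONATOR_NAME, STATUS) VALUES ('John', 'A'), ('Willie', 'A');
INSERT INTO sif_res (RES_NAME, MON_VAL) VALUES ('Flour', 0.5);
INSERT INTO donations (RESOURCE, AMOUNT, DONATOR) VALUES (1, 100, 1);
""")

# The same query is run twice; only the join type changes.
QUERY = """
SELECT c.DONATOR_ID, c.DONATOR_NAME,
       IFNULL(ROUND(SUM(d.AMOUNT * sr.MON_VAL)), 0) AS BALANCE
FROM donator c
{join} donations d ON c.DONATOR_ID = d.DONATOR
{join} sif_res sr ON sr.RES_ID = d.RESOURCE
WHERE c.STATUS != 'D'
GROUP BY c.DONATOR_ID, c.DONATOR_NAME
ORDER BY c.DONATOR_ID
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
outer_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()

print(inner_rows)  # Willie is missing: he has no matching donations row
print(outer_rows)  # Willie is kept, with the NULL sum replaced by 0
```

Both joins need to be outer joins: if only the first one were, the inner join to sif_res would drop the rows of donators without donations again.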

Possible multiple returned matches joined to a MySQL table

I am not even sure how to ask this question, but here's my situation. I use Plex to stream movies at home. I've built a database that I translate to a webpage and use as an index. Within this database I have a few tables. The main one is called movie_list. One of its fields is called Rating, which refers to an association table. There is also a table called assc_movie_genre, which simply stores the movie ID generated by the main table and a genre ID read from another association table. The same movie ID can appear more than once, with one row per matching genre. For instance, if The Matrix falls under the categories Action and Sci-Fi, there will be 2 entries for its MovieId, each one matching the corresponding genre code. My question is: I need a query (if possible) that can join all genres to the appropriate row. Right now I have the following query:
SELECT a.`Title`,a.`Year`,b.`Rating` FROM movie_list a, assc_rating b WHERE b.`Id` = a.`Rating`
But I guess I would need to expand it to join the multiple genres that match. I hope that all makes sense.
Thanks in advance
Update
Thanks to your help I am almost there. Here is my current query:
SELECT a.Title, c.Rating,
GROUP_CONCAT(DISTINCT b.GenreId ORDER BY b.GenreId)
AS Genres FROM assc_movie_genre b, movie_list a, assc_rating c
WHERE a.Id = b.MovieId AND a.Rating = c.Id group by a.Title
ORDER BY a.Title;
But the issue remains that I am just getting the GenreId instead of the genre name. I assume I need to put a SELECT in there somewhere so that it pulls the name from the assc_genres table; I'm just not 100% sure where.
Here's what the current output looks like:
Title | Rating | Genres
--------------------------------------------
28 Days Later... | R | 11,16,17
The concat works great and I'm so close. Thanks again
Update
Here are the queries to create my tables, you can get the structure from here (obviously)
CREATE TABLE IF NOT EXISTS `assc_genres` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Genre` varchar(50) NOT NULL DEFAULT '0',
PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `assc_movie_genre` (
`MovieId` int(11) NOT NULL DEFAULT '0',
`GenreId` int(11) NOT NULL DEFAULT '0',
KEY `FK_assc_movie_genre_movie_list` (`MovieId`),
KEY `FK_assc_movie_genre_assc_genres` (`GenreId`),
CONSTRAINT `FK_assc_movie_genre_movie_list` FOREIGN KEY (`MovieId`) REFERENCES `movie_list` (`Id`),
CONSTRAINT `FK_assc_movie_genre_assc_genres` FOREIGN KEY (`GenreId`) REFERENCES `assc_genres` (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `assc_rating` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Rating` char(50) NOT NULL DEFAULT '0',
PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `assc_status` (
`Id` tinyint(4) NOT NULL,
`Status` char(50) NOT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `movie_list` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Title` varchar(100) NOT NULL DEFAULT '0',
`Year` year(4) NOT NULL DEFAULT '2000',
`Rating` int(11) NOT NULL DEFAULT '0',
`Folder` varchar(50) NOT NULL DEFAULT '0',
PRIMARY KEY (`Id`),
UNIQUE KEY `Title_Year` (`Title`,`Year`),
KEY `FK_movie_list_assc_rating` (`Rating`),
CONSTRAINT `FK_movie_list_assc_rating` FOREIGN KEY (`Rating`) REFERENCES `assc_rating` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=614 DEFAULT CHARSET=latin1;

I am not sure this is what you are asking, but you can join all 3 tables to get the data, like:
SELECT a.`Title`,
a.`Year`,
b.`Rating`
FROM movie_list a
JOIN assc_movie_genre c ON a.Id = c.MovieId
JOIN assc_rating b ON b.`Id` = a.`Rating`;
Per your comment, you can use GROUP_CONCAT() like:
SELECT a.`Title`,
a.`Year`,
b.`Rating`,
xx.genre_list
FROM movie_list a
JOIN ( select MovieId, group_concat(GenreId) as genre_list
from assc_movie_genre
group by MovieId) xx ON a.Id = xx.MovieId
JOIN assc_rating b ON b.`Id` = a.`Rating`;
You can modify your query like
SELECT a.Title, c.Rating,
GROUP_CONCAT(DISTINCT d.`Genre` ORDER BY d.`Genre`) AS Genres
FROM movie_list a
JOIN assc_movie_genre b ON a.Id = b.MovieId
JOIN assc_rating c ON a.Rating = c.Id
JOIN `assc_genres` d ON b.`GenreId` = d.Id
group by a.Title
ORDER BY a.Title;

SQL select entries in other table linked by foreign keys

I have redesigned my database structure to use PRIMARY and FOREIGN KEYs to link the entries in my 3 tables together, and I am having problems writing queries that select data in one table given data in another table. Here are my 3 CREATE TABLE statements:
CREATE TABLE IF NOT EXISTS players (
id INT(10) NOT NULL AUTO_INCREMENT,
username VARCHAR(16) NOT NULL,
uuid VARCHAR(200) NOT NULL DEFAULT 0,
joined TIMESTAMP DEFAULT 0,
last_seen TIMESTAMP DEFAULT 0,
PRIMARY KEY (id)
);
/* ^
One |
To
| One
v
*/
CREATE TABLE IF NOT EXISTS accounts (
id INT(10) NOT NULL AUTO_INCREMENT,
account_id INT(10) NOT NULL,
pass_hash VARCHAR(200) NOT NULL,
pass_salt VARCHAR(200) NOT NULL,
created BIGINT DEFAULT 0,
last_log_on BIGINT DEFAULT 0,
PRIMARY KEY (id),
FOREIGN KEY (account_id) REFERENCES players(id) ON DELETE CASCADE
) ENGINE=InnoDB;
/* ^
One |
To
| Many
v
*/
CREATE TABLE IF NOT EXISTS purchases (
id INT(10) NOT NULL AUTO_INCREMENT,
account_id INT(10) NOT NULL,
status VARCHAR(20) NOT NULL,
item INT NOT NULL,
price DOUBLE DEFAULT 0,
description VARCHAR(200) NOT NULL,
buyer_name VARCHAR(200) NOT NULL,
buyer_email VARCHAR(200) NOT NULL,
transaction_id VARCHAR(200) NOT NULL,
payment_type VARCHAR(20) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE
) ENGINE=InnoDB;
Say, for example, I want to select the usernames of all users who purchased anything costing more than $30. All the usernames are stored in the players table, which is linked to the accounts table, which in turn is linked to the purchases table. Is this the best way to design this relational database? If so, how would I run queries similar to the above example?
I was able to get all of a user's purchase history given their username, but I did it with 2 sub-queries... Getting that data should be easier than that!
Here is the SELECT query I ran to get all of a player's purchase data:
SELECT *
FROM purchases
WHERE account_id = (SELECT id FROM accounts WHERE account_id = (SELECT id FROM players WHERE username = 'username'));
Also, when I try to make references to the other tables using something like 'players.username', I get an error saying that the column doesn't exist...
I appreciate any help! Thanks!

Your design is OK in my opinion. Note that the relation between players and accounts is one-to-many rather than one-to-one, because as designed two account rows can reference the same player.
I would write the query you need as:
SELECT DISTINCT p.id, p.username
FROM players p INNER JOIN accounts a ON (p.id = a.account_id)
INNER JOIN purchases pc ON (a.id = pc.account_id)
WHERE (pc.price > 30);
As Sam suggested, I added DISTINCT to avoid repeating the id and username when a user has multiple purchases.
Note the id is here to avoid confusion among repeated usernames.
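A quick way to see what DISTINCT changes is to run the join with and without it on a few rows. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption for portability); the tables are reduced to the columns the join needs, and the player names and prices are made up.

```python
import sqlite3

# SQLite stands in for MySQL; sample players and prices are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
CREATE TABLE purchases (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, price REAL);

INSERT INTO players VALUES (1, 'steve'), (2, 'alex');
INSERT INTO accounts VALUES (1, 1), (2, 2);
-- steve has two purchases over 30, alex has one purchase under 30
INSERT INTO purchases (account_id, price) VALUES (1, 40), (1, 50), (2, 10);
""")

JOINED = """
FROM players p
INNER JOIN accounts a ON p.id = a.account_id
INNER JOIN purchases pc ON a.id = pc.account_id
WHERE pc.price > 30
"""

# Without DISTINCT the player is repeated once per qualifying purchase.
without_distinct = conn.execute("SELECT p.id, p.username " + JOINED).fetchall()
with_distinct = conn.execute("SELECT DISTINCT p.id, p.username " + JOINED).fetchall()

print(without_distinct)
print(with_distinct)
```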
