Join operation duplication - mysql

Let's imagine we have two tables: Users (UserId, UserName, UserPhoto) and Articles (ArticleId, UserId, ArticleText). Now we execute an inner join query to retrieve users with their articles:
SELECT u.UserId, u.UserName, u.UserPhoto, a.ArticleId, a.ArticleText
FROM Users AS u INNER JOIN Articles AS a ON u.UserId = a.UserId
The structure of the query result will be the following:
UserId1 UserName1 UserPhoto1 ArticleId1 ArticleText1
UserId1 UserName1 UserPhoto1 ArticleId2 ArticleText2
So the first user has two articles, and UserName1 and UserPhoto1 are duplicated. What if UserPhoto stores a blob of several gigabytes?
I hope database protocols have some optimization for such situations (maybe a mapping that says UserPhoto is the same for the first and second rows), but I have never seen any notes about this. So I just want to be sure that such an optimization exists and that I don't need to work around it myself.

First, create a third table for Photos and associate UserId with Photo. Second, you'll need to run two separate queries in order to retrieve:
Each photo submitted by a user
Each article associated with a specific user/photo
You'll loop over all user/photo pairs, and query the articles inside your loop.

You could run two queries, one to get the User data (so each photo will travel once):
SELECT u.UserId
, u.UserName
, u.UserPhoto
FROM Users as u
and another to get the rest (Article) data:
SELECT a.UserId -- only UserId this time
, a.ArticleId
, a.ArticleText
FROM Users as u
INNER JOIN Articles as a
ON u.UserId = a.UserId
Finally, combine the results in your application code, using the userids.
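The two-query approach above can be sketched in application code as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the three-byte "photos" are made up for the demo.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT, UserPhoto BLOB);
    CREATE TABLE Articles (ArticleId INTEGER PRIMARY KEY, UserId INTEGER, ArticleText TEXT);
    INSERT INTO Users VALUES (1, 'UserName1', x'FFD8FF'), (2, 'UserName2', x'89504E');
    INSERT INTO Articles VALUES
        (1, 1, 'ArticleText1'),
        (2, 1, 'ArticleText2'),
        (3, 2, 'ArticleText3');
""")

# Query 1: each user row, and therefore each photo, is transferred exactly once.
users = {
    user_id: {"name": name, "photo": photo, "articles": []}
    for user_id, name, photo in conn.execute(
        "SELECT UserId, UserName, UserPhoto FROM Users"
    )
}

# Query 2: article rows carry only the UserId key, never the photo.
for user_id, article_id, text in conn.execute(
    "SELECT UserId, ArticleId, ArticleText FROM Articles ORDER BY ArticleId"
):
    users[user_id]["articles"].append((article_id, text))

for user_id, user in users.items():
    print(user_id, user["name"], len(user["photo"]), user["articles"])
```

The dictionary keyed by UserId plays the role of the join: the photo is stored once per user no matter how many articles that user has.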

You can avoid fetching the photos multiple times like this:
SELECT * FROM (
SELECT u.UserId, u.UserName, u.UserPhoto, a.ArticleId, a.ArticleText
FROM Users AS u INNER JOIN Articles AS a ON u.UserId = a.UserId
WHERE a.ArticleId IN (SELECT MIN(ArticleId) FROM Articles GROUP BY UserId)
UNION ALL
SELECT u.UserId, u.UserName, NULL, a.ArticleId, a.ArticleText
FROM Users AS u INNER JOIN Articles AS a ON u.UserId = a.UserId
WHERE a.ArticleId NOT IN (SELECT MIN(ArticleId) FROM Articles GROUP BY UserId)
) base
ORDER BY ArticleId; -- ORDER BY UserId, ArticleId also works if you want the rows sorted by user.
This fetches the photo only with each user's first article (the one with the lowest ArticleId) and returns NULL for that user's subsequent articles. Your application can cache the photo on first read.

1) No matter how many times the photo blob appears in your result set, it will be read (from disk into memory on the server) only once. There are optimizations built in to make sure of this.
2) However, it can be transported (from server to client) multiple times; there is no optimization built in for that.
3) The best solution would be to wrap this in a stored procedure that returns two record sets and do the join in the client code. This differs from running two queries, which needs two round trips.
4) If you don't want to do that, you can get all of the user's article IDs in CSV format and then easily split the CSV into separate strings in the client code.
Here is the sample output:
UserId UserName UserPhoto CSV_ArticleId CSV_ArticleText
------- --------- ---------- ------------------------ ----------------------------
UserId1 UserName1 UserPhoto1 ",ArticleId1,ArticleId2" ",ArticleText1,ArticleText2"
UserId2 UserName2 UserPhoto2 ",ArticleId3" ",ArticleText3"
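Here is how you can do it. Note that the script below uses SQL Server (T-SQL) syntax (TOP, FOR XML PATH); in MySQL, GROUP_CONCAT serves the same purpose. Run the code verbatim on a SQL Server test database to see the result: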
CREATE TABLE Users(UserId int , UserName nvarchar(256), UserPhoto nvarchar(256))
CREATE TABLE Articles (ArticleId int , UserId int , ArticleText nvarchar(256))
INSERT INTO Users(UserId,UserName,UserPhoto)
VALUES (2,'2a','2pa')
INSERT INTO Users(UserId,UserName,UserPhoto)
VALUES (1,'a','pa')
INSERT INTO Articles (ArticleId, UserId, ArticleText)
VALUES (2,2,'text2')
INSERT INTO Articles (ArticleId, UserId, ArticleText)
VALUES (1,2,'text1')
;WITH tArticles AS (SELECT ArticleId, UserId, ArticleText FROM Articles)
SELECT
UserId,
UserName,
UserPhoto,
(SELECT TOP 1 LTRIM(
(SELECT ',' + CONVERT(nvarchar(256),A.ArticleId) FROM Articles A WHERE U.UserId = A.UserId ORDER BY A.ArticleId FOR XML PATH(''))
)) as CSV_ArticleId,
(SELECT TOP 1 LTRIM(
(SELECT ',' + CONVERT(nvarchar(256),A.ArticleText) FROM Articles A WHERE U.UserId = A.UserId ORDER BY A.ArticleId FOR XML PATH(''))
)) as CSV_ArticleText
FROM Users U

Related

Finding Values of same ID from different table

I have 2 tables:
1. Users (contains all the information about users, such as name, Userid, mobileno)
2. Transaction (contains the information about all of a user's transactions)
UserID is the same in both tables.
I have some filter conditions that I want to apply to the Transaction table, like:
[ TransactionType=1 AND status=1 and (RealCash>0 or Bonus>0 or Winning>0)]
Once I apply the condition, I will have some UserIDs.
Now I want the information from the Users table for the users whose UserID I obtained from the Transaction table above.
How can I do that in MySQL?
Use JOIN: https://www.mysqltutorial.org/mysql-join/
For example:
SELECT
u.name,
u.Userid,
u.mobileno,
t.TransactionType
FROM
Users u
INNER JOIN Transaction t ON t.Userid = u.Userid
WHERE t.TransactionType=1 AND t.status=1 and (t.RealCash>0 or t.Bonus>0 or t.Winning>0)
But read carefully about other join types (left, right, cross) as you may get different results.
SELECT
name, Userid, mobileno
FROM
Users
WHERE
UserID IN (SELECT
UserID
FROM
Transaction
WHERE
TransactionType=1 AND status=1 and (RealCash>0 or Bonus>0 or Winning>0));
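The JOIN and IN forms above are not interchangeable when a user has several matching transactions: the join yields one row per matching transaction, while IN yields each user once. A small sketch of the difference, with assumptions labeled: it runs on Python's built-in sqlite3 rather than MySQL, the table is renamed to a hypothetical `Transactions` because TRANSACTION is a keyword in SQLite, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Userid INTEGER PRIMARY KEY, name TEXT, mobileno TEXT);
    CREATE TABLE Transactions (
        id INTEGER PRIMARY KEY, Userid INTEGER, TransactionType INTEGER,
        status INTEGER, RealCash REAL, Bonus REAL, Winning REAL
    );
    INSERT INTO Users VALUES (1, 'alice', '111'), (2, 'bob', '222');
    -- alice has two matching transactions; the only one for bob fails the status filter
    INSERT INTO Transactions VALUES
        (1, 1, 1, 1, 10, 0, 0),
        (2, 1, 1, 1, 0, 5, 0),
        (3, 2, 1, 0, 10, 0, 0);
""")

# The filter from the question, shared by both queries.
FILTER = (
    "TransactionType = 1 AND status = 1 "
    "AND (RealCash > 0 OR Bonus > 0 OR Winning > 0)"
)

# JOIN: one output row per matching transaction.
join_rows = conn.execute(f"""
    SELECT u.name FROM Users u
    INNER JOIN Transactions t ON t.Userid = u.Userid
    WHERE {FILTER}
""").fetchall()

# IN: each user appears at most once.
in_rows = conn.execute(f"""
    SELECT name FROM Users
    WHERE Userid IN (SELECT Userid FROM Transactions WHERE {FILTER})
""").fetchall()

print(join_rows)
print(in_rows)
```

If only the user information is needed, the IN form (or the JOIN form with DISTINCT) avoids the repeated rows.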

Count concatenated values in MySQL

I'm generating a query in which I get a comma-separated list of user IDs using GROUP_CONCAT. I want to count these IDs in the same query. Can I do so?
$query="SELECT id,
longitude,
latitude,
game_date,
min_player,
game_description,
is_public,
is_user_coming,
allow_player_invite,
location,
game_type,
game_status,
cdate,
ownerid,
COUNT(j.users) as joinees,
users.username
FROM games
left join
(SELECT gameid, GROUP_CONCAT(userid, ',') as users
from user_game_join where games.id=user_game_join.gameid) j on j.gameid=id
join (select id as uid,name as username from users) users on users.uid=ownerid
AND (`location` LIKE '$location%' or `location` LIKE '".ucfirst($location)."%')";
This is my query, and I need to get the number of joinees. Attached herewith is the snapshot of my tables:
SELECT gameid,
GROUP_CONCAT(userid) as users,
count(userid) as user_count
from user_game_join
group by gameid
A friend of mine helped me with that. It turned out I was using the join in the wrong place. Sharing the query in case someone finds it helpful:
SELECT games.*,ugj.joinees,u.username FROM games JOIN
(select id as uid, name as username from users) users on users.uid = games.ownerid
AND games.id='$gid' left join
(select count(userid) as joinees,gameid as gid from user_game_join group by gameid ) ugj on games.id=ugj.gid LEFT JOIN
(select id,name as username from users) u on u.id=games.ownerid
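For reference, the idea shared by the answers above (aggregate user_game_join per game in a derived table, then LEFT JOIN it to games) can be tried out in isolation. The sketch below is an assumption-laden demo, not the original schema: it uses Python's built-in sqlite3 instead of MySQL, a cut-down games table, and invented rows. SQLite's GROUP_CONCAT does not guarantee element order, so the demo does not rely on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE user_game_join (gameid INTEGER, userid INTEGER);
    INSERT INTO games VALUES (1, 'Park'), (2, 'Gym');
    -- game 1 has three joinees, game 2 has none
    INSERT INTO user_game_join VALUES (1, 10), (1, 11), (1, 12);
""")

rows = conn.execute("""
    SELECT g.id, j.users, COALESCE(j.joinees, 0) AS joinees
    FROM games g
    LEFT JOIN (
        -- one row per game: the id list and its length, computed together
        SELECT gameid,
               GROUP_CONCAT(userid) AS users,
               COUNT(userid) AS joinees
        FROM user_game_join
        GROUP BY gameid
    ) j ON j.gameid = g.id
    ORDER BY g.id
""").fetchall()

for game_id, users, joinees in rows:
    print(game_id, users, joinees)
```

COALESCE turns the NULL produced by the LEFT JOIN for a game without joinees into 0.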

How to query data without repeats and minimize the time?

There are 3 entities - articles, journals and subscribers. There are no restrictions on how to store data in database.
The same article can be simultaneously published in several journals.
How can I select all published articles from subscribed journals, sorted by date of publication and without repeats?
The easiest way:
Create a table with articles:
posts
p_id, j1_id, j2_id, text, date
Create a table with subscriptions:
follows
f_id, u_id, j_id (u_id is a user id from the users table)
Execute:
select posts.* from posts inner join follows on (j_id = j1_id or j_id = j2_id) where u_id = 1 order by date desc
This query returns data with duplicates. You can remove them with DISTINCT or GROUP BY, but that adds an extra sorting operation.
Another way is to use UNION, but it also performs a DISTINCT.
(select posts.* from posts inner join follows on j_id = j1_id where u_id = 1)
union
(select posts.* from posts inner join follows on j_id = j2_id where u_id = 1)
order by date desc
Perhaps I chose the wrong storage structure.
So the actual question is: is it possible to do something about this problem to minimize the time required for big data?
You can use the following table structure:
posts : pid, text, date
journals : jid, jtext
journals_posts : jid, pid
follows : fid, uid, jid
select distinct posts.* from posts
inner join journals_posts on journals_posts.pid = posts.pid
inner join follows on follows.jid = journals_posts.jid
where follows.uid = <userid>
To take care of speed, you can create indexes on:
journals_posts(jid)
follows(uid)
You might need to create indexes on other fields; check with EXPLAIN which tables are scanned without using indexes.
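The structure and query above can be checked on a small data set. This is a minimal sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, the index names and sample rows are invented, and an ORDER BY on date is added because the question asks for sorted output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (pid INTEGER PRIMARY KEY, text TEXT, date TEXT);
    CREATE TABLE journals (jid INTEGER PRIMARY KEY, jtext TEXT);
    CREATE TABLE journals_posts (jid INTEGER, pid INTEGER);
    CREATE TABLE follows (fid INTEGER PRIMARY KEY, uid INTEGER, jid INTEGER);
    CREATE INDEX idx_journals_posts_jid ON journals_posts (jid);
    CREATE INDEX idx_follows_uid ON follows (uid);

    INSERT INTO posts VALUES
        (1, 'first', '2020-01-01'),
        (2, 'second', '2020-01-02'),
        (3, 'third', '2020-01-03');
    INSERT INTO journals VALUES (1, 'J1'), (2, 'J2'), (3, 'J3');
    -- post 2 is published in journals 1 and 2; post 3 only in journal 3
    INSERT INTO journals_posts VALUES (1, 1), (1, 2), (2, 2), (3, 3);
    -- user 1 follows journals 1 and 2
    INSERT INTO follows VALUES (1, 1, 1), (2, 1, 2);
""")

rows = conn.execute("""
    SELECT DISTINCT posts.pid, posts.text, posts.date
    FROM posts
    INNER JOIN journals_posts ON journals_posts.pid = posts.pid
    INNER JOIN follows ON follows.jid = journals_posts.jid
    WHERE follows.uid = ?
    ORDER BY posts.date DESC
""", (1,)).fetchall()

for row in rows:
    print(row)
```

Post 2 is reachable through two followed journals but is returned once, and post 3 is excluded because user 1 does not follow journal 3.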

Stacked SQL Query error and complications

I just learned you can stack SQL queries instead of running 4 different ones and combining the data. I've read tutorials and such but still can't figure this one out.
SELECT ID,
(SELECT firstname
FROM user
WHERE ID = fundraiser.user_ID) AS firstname,
(SELECT lastname
FROM user
WHERE ID = fundraiser.user_ID) AS lastname,
(SELECT org_fund_id
FROM fundraiser
WHERE ID = fundraiser.ID) AS org_fund_ID,
(SELECT ref_ID
FROM fundraiser
WHERE ID = fundraiser.ID) AS ref_ID
FROM fundraiser
WHERE 1
ORDER BY org_fund_ID ASC
Here's the basic setup for the database/tables being called:
[fundraiser] - (ID, ref_ID, user_ID, org_fund_ID) and
[user] - (firstname, lastname)
Basically, I want to pull all of the fields from "fundraiser" from the database but get the corresponding "user.firstname" and "user.lastname" where "fundraiser.user_ID" = "user.ID".
So it would come out something like this as a row:
fundraiser.ID, fundraiser.user_ID, fundraiser.ref_ID, user.firstname, user.lastname
I've tried like 30 different ways of writing this query and all have failed. The error I get is "#1242 - Subquery returns more than 1 row".
Not sure how I can give you more information so you can visualize what I'm talking about, but I will provide whatever data I can.
Thanks in advance.
To select all columns:
SELECT *
FROM fundraiser f
INNER JOIN user u
ON u.ID = f.user_ID
ORDER BY f.org_fund_ID ASC;
To select only the needed columns:
SELECT
u.firstname,
u.lastname,
f.org_fund_ID,
f.ref_ID
FROM fundraiser f
INNER JOIN user u ON u.ID = f.user_ID
ORDER BY f.org_fund_ID ASC;
This should be what you need. See this Wikipedia page.

MYSQL join subquery using count(*) to find the number of relationships a user has

I have a table named users which has fields id, email, username, firstname, and lastname.
I have another table named friends which has fields id, user1, user2, and relationship.
I am having a really hard time with this join query that shouldn't be so hard :(.
I want to find the most popular users who are not already related to you. For example, I already have an array of your relationships, and for the users not in it I want their user info and the number of relationships each one has.
Here is my query so far, but I can't get it to work for some reason.
select id, email,username,firstname,lastname
from users as userInformation
left join (select count(*)
from friends
where friends.user1 = userInformation.id or friends.user2 = userInformation.id
) as x
where users.id NOT IN (2,44,26,33,1)
The "2,44,26,33,1" in the NOT IN part is arbitrary and depends on the logged-in user.
The part that I can't get working properly is the left join that adds the relationship count.
Just to help out, here are the two queries that work. I just need the second one to become a column of the first query for each user:
select id, email,username,firstname,lastname from users where id NOT IN (2,44,26,33,1)
select count(*) from friends where user1 =2 or user2 = 2
But the second query should run for each id in the first query. Hope that clears it up.
This is getting closer:
select id, email,username,firstname,lastname
from users as help
left join (
select count(*)
from friends
where user1 = help.id or user2 = help.id) as friendCounter
where help.id NOT IN (2,44,26,33,1)
For some reason it won't recognize help.id in the where clause at the end.
How 'bout this?
select userInformation.id, email, username, firstname, lastname, count(friends.id)
from users as userInformation
left join friends on friends.user1 = userInformation.id or friends.user2 = userInformation.id
where userInformation.id NOT IN (2,44,26,33,1)
group by userInformation.id, email, username, firstname, lastname
I'm going to rephrase your problem statement as I understand it. Let me know if it's wrong.
For a given user, find the most popular unrelated users:
set @GivenUser = '1'; -- replace '1' here with the user id you want
select id, email, username, firstname, lastname, Connections
from users u1
inner join (
select TheUser, count(*) as Connections
from (
select user1 as TheUser
from friends
where user1 <> @GivenUser
and user2 <> @GivenUser
union all
select user2 as TheUser
from friends
where user1 <> @GivenUser
and user2 <> @GivenUser
) u
group by TheUser
) u2
on u1.id = u2.TheUser
order by Connections desc
select * from (
select userInformation.id, email, username, firstname, lastname, count(friends.id) N
from users as userInformation
left join friends on friends.user1 = userInformation.id
or friends.user2 = userInformation.id
where userInformation.id NOT IN (2,44,26,33,1)
group by userInformation.id, email, username, firstname, lastname) Aliased
order by N desc
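One detail worth checking with LEFT JOIN counts like these: COUNT(*) also counts the NULL-extended row, so a user with no rows in friends would be reported with 1, whereas counting a column from friends gives 0. The sketch below illustrates the pattern under assumptions: it runs on Python's built-in sqlite3 rather than MySQL, the users table is reduced to id and username, and the data and the `excluded` list are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE friends (
        id INTEGER PRIMARY KEY, user1 INTEGER, user2 INTEGER, relationship TEXT
    );
    INSERT INTO users VALUES (1, 'me'), (2, 'ann'), (3, 'bo'), (4, 'cy'), (5, 'di');
    INSERT INTO friends (user1, user2, relationship) VALUES
        (1, 2, 'friend'),
        (3, 4, 'friend'),
        (3, 2, 'friend'),
        (4, 2, 'friend');
""")

# Hypothetical exclusion list: the logged-in user plus users already related to them.
excluded = (1, 2)
placeholders = ",".join("?" * len(excluded))

rows = conn.execute(f"""
    SELECT u.id, u.username, COUNT(f.id) AS n
    FROM users AS u
    LEFT JOIN friends AS f ON f.user1 = u.id OR f.user2 = u.id
    WHERE u.id NOT IN ({placeholders})
    GROUP BY u.id, u.username
    ORDER BY n DESC, u.id
""", excluded).fetchall()

for row in rows:
    print(row)
```

Binding the excluded ids as parameters, rather than pasting them into the SQL string, also keeps the query safe when the list comes from user-dependent data.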