I am designing a simple architecture with a table that stores users and the items they like, so my table structure is something like this:
+---------+---------+
| user_id | like_id |
+---------+---------+
| 1 | 4 |
| 2 | 2 |
| 4 | 4 |
| 4 | 3 |
| 5 | 4 |
| 6 | 7 |
| 7 | 5 |
| 34 | 6 |
| 3 | 8 |
| 2 | 3 |
| 2 | 5 |
| 1 | 3 |
| 1 | 10 |
| 1 | 12 |
| 2 | 10 |
+---------+---------+
Now, given the id of any user (let's say user_id = 1), I want a query that returns all the other users who share likes with that user.
So the output for user_id = 1 would be:
+---------------------------+------------------------+----------------+
| users_with_common_likes | no_of_common_likes | common_likes |
+---------------------------+------------------------+----------------+
| 4 | 2 | 3,4 |
| 2 | 2 | 3,10 |
| 5 | 1 | 4 |
+---------------------------+------------------------+----------------+
What I have achieved so far:
I can partly do this with a subquery:
SELECT user_id
FROM `user_likes`
WHERE `like_id`
IN (
SELECT GROUP_CONCAT( `like_id` )
FROM user_likes
WHERE user_id =1
)
AND user_id !=1
LIMIT 0 , 30
However, this query does not return all the users: it misses user_id = 2, which has like_id 3 in common with user_id = 1.
I also can't figure out how to get the remaining two columns.
Also, I feel this is not the best way to do this, as the table will contain thousands of rows and it may affect system performance.
I would like to do this with a single MySQL query.
This assumes a primary key on (user_id, like_id):
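SELECT y.user_id users_with_common_likes
, COUNT(*) no_of_common_likes
, GROUP_CONCAT(y.like_id ORDER BY y.like_id) common_likes
FROM user_likes x
JOIN user_likes y
ON y.like_id = x.like_id
AND y.user_id <> x.user_id
WHERE x.user_id = 1
GROUP
BY y.user_id
ORDER
BY no_of_common_likes DESC;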
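To sanity-check the self-join, here is a minimal sketch that runs it against the question's sample rows, using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's group_concat does not guarantee the order of the concatenated values, so the like ids are sorted in Python. The table name user_likes is taken from the question; everything else is illustrative.

```python
import sqlite3

# (user_id, like_id) rows from the question's sample table
LIKES = [(1, 4), (2, 2), (4, 4), (4, 3), (5, 4), (6, 7), (7, 5), (34, 6),
         (3, 8), (2, 3), (2, 5), (1, 3), (1, 10), (1, 12), (2, 10)]

# Same self-join as the answer; SQLite's group_concat stands in for GROUP_CONCAT.
QUERY = """
SELECT y.user_id, COUNT(*) AS total, group_concat(y.like_id) AS likes
FROM user_likes x
JOIN user_likes y
  ON y.like_id = x.like_id
 AND y.user_id <> x.user_id
WHERE x.user_id = ?
GROUP BY y.user_id
ORDER BY total DESC, y.user_id
"""


def common_likes(user_id):
    """Return (other_user_id, no_of_common_likes, sorted common like_ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_likes (user_id INTEGER, like_id INTEGER,"
                 " PRIMARY KEY (user_id, like_id))")
    conn.executemany("INSERT INTO user_likes VALUES (?, ?)", LIKES)
    rows = conn.execute(QUERY, (user_id,)).fetchall()
    conn.close()
    # group_concat's element order is unspecified, so sort the ids here
    return [(uid, total, sorted(int(v) for v in likes.split(",")))
            for uid, total, likes in rows]


print(common_likes(1))  # [(2, 2, [3, 10]), (4, 2, [3, 4]), (5, 1, [4])]
```

Users 2 and 4 both share two likes with user 1, which matches the expected table in the question (tie order aside).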
I am having an issue fetching a particular kind of record from the database.
I have three different tables:
Friends
Followers
PictureGalleries
Here is a sample of what the tables look like:
Friends:
|id | senderId | receiverId | accepted |
|---|----------| -----------| ---------|
| 1 | 1 | 12 | 1 |
| 2 | 12 | 2 | 1 |
| 2 | 12 | 2 | 1 |
Followers:
| id | userId | UserIsFollowing |
| -- | ------ | --------------- |
| 1 | 12 | 63 |
| 2 | 22 | 12 |
PictureGalleries:
| id | UserId |
| -- | ------ |
| 1 | 13 |
| 2 | 12 |
| 3 | 1 |
| 4 | 10 |
| 5 | 2 |
| 6 | 63 |
So here is the issue:
I want to select everything from PictureGalleries
Where the userid has a friendship relationship with userId 12 where accepted is 1
And
Where the userID 12 is following a particular user
So basically, the result I want to see is the picture galleries of user IDs 1, 2, and 63, which would look like this:
| id | UserID |
| -- | ------ |
| 3 | 1 |
| 5 | 2 |
| 6 | 63 |
Use UNION and a subquery to get the desired result:
select p.id, p.UserId from
( select UserIsFollowing as id from Followers fl
where fl.userId = 12
union select senderId from Friends f
where f.receiverId = 12 and accepted = 1
union select receiverId from Friends f
where f.senderId = 12 and accepted = 1
) as t join PictureGalleries p on t.id = p.UserId
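The UNION approach can be checked by running it on the sample rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, keeps the duplicated Friends row exactly as posted (UNION removes the duplicate anyway), and replaces the hard-coded 12 with a named parameter :uid, which is an assumption made for reuse.

```python
import sqlite3


def make_db():
    """In-memory copy of the question's sample tables (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Friends (id INTEGER, senderId INTEGER, receiverId INTEGER, accepted INTEGER);
        CREATE TABLE Followers (id INTEGER, userId INTEGER, UserIsFollowing INTEGER);
        CREATE TABLE PictureGalleries (id INTEGER, UserId INTEGER);
        INSERT INTO Friends VALUES (1, 1, 12, 1), (2, 12, 2, 1), (2, 12, 2, 1);
        INSERT INTO Followers VALUES (1, 12, 63), (2, 22, 12);
        INSERT INTO PictureGalleries VALUES (1, 13), (2, 12), (3, 1), (4, 10), (5, 2), (6, 63);
    """)
    return conn


# Users followed by :uid, plus accepted friends of :uid in either direction
UNION_QUERY = """
SELECT p.id, p.UserId
FROM (SELECT UserIsFollowing AS id FROM Followers WHERE userId = :uid
      UNION SELECT senderId FROM Friends WHERE receiverId = :uid AND accepted = 1
      UNION SELECT receiverId FROM Friends WHERE senderId = :uid AND accepted = 1
     ) AS t
JOIN PictureGalleries p ON t.id = p.UserId
ORDER BY p.id
"""


def galleries_for(uid):
    conn = make_db()
    rows = conn.execute(UNION_QUERY, {"uid": uid}).fetchall()
    conn.close()
    return rows


print(galleries_for(12))  # [(3, 1), (5, 2), (6, 63)]
```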
I think this query shows what you want:
select PictureGalleries.* from PictureGalleries
join Followers on Followers.UserIsFollowing = PictureGalleries.UserId
where exists (select 1 from Friends where ((Friends.senderId = PictureGalleries.UserId and Friends.receiverId = :user_id) or (Friends.receiverId = PictureGalleries.UserId and Friends.senderId = :user_id)) and accepted = 1)
and Followers.userId = :user_id
But I think your model can be improved.
EDIT:
Maybe you phrased it wrong at first when you said:
"Where the userid has a friendship relationship with userId 12 where accepted is 1
AND Where the userID 12 is following a particular user"
I think you mean OR, so the SQL should be something like:
select * from PictureGalleries
where exists (select 1 from Friends where ((Friends.senderId = PictureGalleries.UserId and Friends.receiverId = :user_id) or (Friends.receiverId = PictureGalleries.UserId and Friends.senderId = :user_id)) and accepted = 1)
OR exists (select 1 from Followers where Followers.userId = :user_id and Followers.UserIsFollowing = PictureGalleries.UserId)
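The OR-of-EXISTS form can be checked the same way. In this sketch (Python's sqlite3 standing in for MySQL), both EXISTS subqueries are tied to :user_id, and the follower test reads the relationship as "user :user_id follows the gallery owner", which is how the question phrases it; treat that direction as an assumption about the intended meaning.

```python
import sqlite3


def make_db():
    """In-memory copy of the question's sample tables (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Friends (id INTEGER, senderId INTEGER, receiverId INTEGER, accepted INTEGER);
        CREATE TABLE Followers (id INTEGER, userId INTEGER, UserIsFollowing INTEGER);
        CREATE TABLE PictureGalleries (id INTEGER, UserId INTEGER);
        INSERT INTO Friends VALUES (1, 1, 12, 1), (2, 12, 2, 1), (2, 12, 2, 1);
        INSERT INTO Followers VALUES (1, 12, 63), (2, 22, 12);
        INSERT INTO PictureGalleries VALUES (1, 13), (2, 12), (3, 1), (4, 10), (5, 2), (6, 63);
    """)
    return conn


# Gallery owner is an accepted friend of :user_id (either direction),
# OR :user_id follows the gallery owner.
OR_QUERY = """
SELECT PictureGalleries.*
FROM PictureGalleries
WHERE EXISTS (SELECT 1 FROM Friends
              WHERE accepted = 1
                AND ((senderId = PictureGalleries.UserId AND receiverId = :user_id)
                  OR (receiverId = PictureGalleries.UserId AND senderId = :user_id)))
   OR EXISTS (SELECT 1 FROM Followers
              WHERE Followers.userId = :user_id
                AND Followers.UserIsFollowing = PictureGalleries.UserId)
ORDER BY PictureGalleries.id
"""


def galleries_or(user_id):
    conn = make_db()
    rows = conn.execute(OR_QUERY, {"user_id": user_id}).fetchall()
    conn.close()
    return rows


print(galleries_or(12))  # [(3, 1), (5, 2), (6, 63)]
```

Unlike the UNION version, each gallery row is tested once, so duplicates in Friends cannot multiply the output.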
I want a query that selects all rows whose UploadedbyUserID equals a randomly chosen UploadedbyUserID from the table (in this case 4, 3 or 22, and only those three, not 2 or 5).
For example, if the random pick is 4, it should output this:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 1 | 2222 | Testing | 4 |
| 2 | Jack | description| 4 |
| 6 | Zara | 2007-02-06 | 4 |
+------+------+------------+--------------------+
This is the whole table:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 1 | 2222 | Testing | 4 |
| 2 | Jack | description| 4 |
| 3 | ffdsd| 2007-05-06 | 4 |
| 4 | dsm | 2007-05-27 | 3 |
| 5 | dddd | 2007-04-06 | 3 |
| 6 | Zara | 2007-02-06 | 4 |
| 7 | John | 2007-01-24 | 22 |
+------+------+------------+--------------------+
And if the random pick is 3, it should output this:
+------+------+------------+--------------------+
| id | name | date | UploadedbyUserID |
+------+------+------------+--------------------+
| 4 | dsm | 2007-05-27 | 3 |
| 5 | dddd | 2007-04-06 | 3 |
+------+------+------------+--------------------+
Ask if you need more information.
Hmmm. This is one way:
select t.*
from (select uploadedbyuserid
from t
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
First, note that this is weighted by the number of rows each user has uploaded. In your example, user 4 (4 of the 7 rows) would be picked twice as often as user 3 (2 rows). If this is an issue, pick from the distinct values instead:
select t.*
from (select uploadedbyuserid
from (select distinct uploadedbyuserid from t) t
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
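To see both the weighting and the DISTINCT variant on the question's rows, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. SQLite's RANDOM() plays the role of MySQL's RAND(), the table name t follows the answer, and the inner derived table is renamed d only for readability. pick_probabilities just counts rows: the first query picks user 4 with probability 4/7, user 3 with 2/7 and user 22 with 1/7, while the DISTINCT variant gives each uploader 1/3.

```python
import sqlite3
from collections import Counter

# The question's full table: (id, name, date, UploadedbyUserID)
ROWS = [(1, "2222", "Testing", 4), (2, "Jack", "description", 4),
        (3, "ffdsd", "2007-05-06", 4), (4, "dsm", "2007-05-27", 3),
        (5, "dddd", "2007-04-06", 3), (6, "Zara", "2007-02-06", 4),
        (7, "John", "2007-01-24", 22)]

# The DISTINCT variant; SQLite spells MySQL's RAND() as RANDOM().
UNIFORM_PICK = """
SELECT t.*
FROM (SELECT uploadedbyuserid
      FROM (SELECT DISTINCT uploadedbyuserid FROM t) d
      ORDER BY RANDOM()
      LIMIT 1) u
JOIN t USING (uploadedbyuserid)
"""


def random_uploader_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE t (id INTEGER, name TEXT, "date" TEXT,'
                 ' uploadedbyuserid INTEGER)')
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(UNIFORM_PICK).fetchall()
    conn.close()
    return rows


def pick_probabilities():
    """Chance of each uploader: first query (row-weighted) vs. DISTINCT variant."""
    counts = Counter(row[3] for row in ROWS)
    weighted = {uid: n / len(ROWS) for uid, n in counts.items()}
    uniform = {uid: 1 / len(counts) for uid in counts}
    return weighted, uniform


print(random_uploader_rows())  # all rows of one randomly chosen uploader
print(pick_probabilities())
```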
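The next observation is that this can be compute-intensive. If you have lots of rows, there are various ways to speed it up. For instance, one simple method is to keep only about 1 in 1,000 of the distinct values before sorting (if none of them pass the filter, the query returns no rows, so this only makes sense when there are many distinct uploaders):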
select t.*
from (select uploadedbyuserid
from (select distinct uploadedbyuserid
from t
) t
where rand() < 0.001
order by rand()
limit 1
) u join
t
using (uploadedbyuserid);
I'm looking to implement custom ordering logic in MySQL for the following data set:
+----+-----------------+------------+-------+
| ID | item | Popularity | Views |
+----+-----------------+------------+-------+
| 1 | A special place | 3 | 10 |
| 2 | Another title | 5 | 12 |
| 3 | Words go here | 1 | 15 |
| 4 | A wonder | 2 | 8 |
+----+-----------------+------------+-------+
The rows should be returned in an order that alternates, row by row, between popularity and views, so the results look like:
+----+-----------------+------------+-------+
| ID | item | Popularity | Views |
+----+-----------------+------------+-------+
| 3 | Words go here | 1 | 15 |
| 2 | Another title | 5 | 12 |
| 4 | A wonder | 2 | 8 |
| 1 | A special place | 3 | 10 |
+----+-----------------+------------+-------+
The first row is the most popular item (lowest Popularity value), the second is the item with the most views, the third is the next most popular, and the fourth is the next most viewed, with each row skipping items that are already listed.
Currently I'm fetching the entire table from MySQL twice and then merging the results in PHP. This isn't going to cut it when the database is large. Is this possible in MySQL at all?
I guess something along these lines could work. Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,x INT NOT NULL
,y INT NOT NULL
);
INSERT INTO my_table VALUES
(1,3,10),
(2,5,12),
(3,1,15),
(4,2, 8),
(5,4, 1);
We can rank x and y in turn and then arrange those ranks in a single list, so we get x1, y1, x2, y2, etc., but every row will appear twice: once for its x rank and once for its y rank...
SELECT * FROM
(
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.x <= a.x GROUP BY a.id )
UNION ALL
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.y <= a.y GROUP BY a.id )
) n
ORDER BY `rank`
+----+---+----+------+
| id | x | y | rank |
+----+---+----+------+
| 5 | 4 | 1 | 1 |
| 3 | 1 | 15 | 1 |
| 4 | 2 | 8 | 2 |
| 4 | 2 | 8 | 2 |
| 1 | 3 | 10 | 3 |
| 1 | 3 | 10 | 3 |
| 5 | 4 | 1 | 4 |
| 2 | 5 | 12 | 4 |
| 2 | 5 | 12 | 5 |
| 3 | 1 | 15 | 5 |
+----+---+----+------+
Now we can just grab the lowest rank for each id...
SELECT id
, x
, y
FROM
(
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.x <= a.x GROUP BY a.id )
UNION ALL
( SELECT a.*, COUNT(*) `rank` FROM my_table a JOIN my_table b ON b.y <= a.y GROUP BY a.id )
) m
GROUP
BY id,x,y
ORDER
BY MIN(`rank`);
+----+---+----+
| id | x | y |
+----+---+----+
| 3 | 1 | 15 |
| 5 | 4 | 1 |
| 4 | 2 | 8 |
| 1 | 3 | 10 |
| 2 | 5 | 12 |
+----+---+----+
Incidentally, this should be faster with variables - but I cannot make that solution work at present - senior moment, perhaps.
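For checking any SQL attempt against the ordering the question actually describes, a small reference implementation of that greedy alternation may help. This is not a MySQL solution, only a sketch of the target ordering: take the most popular unused row, then the most viewed unused row, and repeat. Note that the answer's y rank counts b.y <= a.y, so rank 1 there is the fewest views; ranking views from most to fewest would need b.y >= a.y. The Item type below is hypothetical.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    id: int
    item: str
    popularity: int  # lower value = more popular, as in the question's example
    views: int


def alternate(items: List[Item]) -> List[Item]:
    """Most popular, most viewed, next most popular, ... skipping rows already output."""
    queues = [
        sorted(items, key=lambda r: (r.popularity, r.id)),  # popularity ascending
        sorted(items, key=lambda r: (-r.views, r.id)),      # views descending
    ]
    pos = [0, 0]  # next unread index in each queue
    used = set()
    out = []
    turn = 0
    while len(out) < len(items):
        q = turn % 2
        while queues[q][pos[q]].id in used:  # skip rows the other ordering already took
            pos[q] += 1
        row = queues[q][pos[q]]
        pos[q] += 1
        used.add(row.id)
        out.append(row)
        turn += 1
    return out


ITEMS = [Item(1, "A special place", 3, 10), Item(2, "Another title", 5, 12),
         Item(3, "Words go here", 1, 15), Item(4, "A wonder", 2, 8)]
print([r.id for r in alternate(ITEMS)])  # [3, 2, 4, 1]
```

Each queue is sorted once and read forward, so the merge is O(n log n), which is essentially what the PHP merge in the question has to do.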
I am losing my mind over the following problem:
I have a table with participants and points. Each participant can have up to 11 point entries, of which I only want the sum of the top 6.
In this example, let's say we want the top 2 of 3:
+----+---------------+--------+
| id | participantid | points |
+----+---------------+--------+
| 1 | 1 | 11 |
+----+---------------+--------+
| 2 | 3 | 1 |
+----+---------------+--------+
| 3 | 3 | 4 |
+----+---------------+--------+
| 4 | 2 | 3 |
+----+---------------+--------+
| 5 | 1 | 5 |
+----+---------------+--------+
| 6 | 2 | 10 |
+----+---------------+--------+
| 7 | 2 | 9 |
+----+---------------+--------+
| 8 | 1 | 3 |
+----+---------------+--------+
| 9 | 3 | 4 |
+----+---------------+--------+
As a result, I want something like:
+---------------+--------+
| participantid | points |
+---------------+--------+
| 2 | 19 |
+---------------+--------+
| 1 | 16 |
+---------------+--------+
| 3 | 8 |
+---------------+--------+
(It should be ordered by the resulting points, descending.)
Is this possible at all with MySQL, in one query?
Oh, and the resulting participant IDs should be resolved to the real names from another 'participant' table:
+----+------+
| id | name |
+----+------+
| 1 | what |
+----+------+
| 2 | ev |
+----+------+
| 3 | er |
+----+------+
But that should be doable with a join at some point, I know.
Using one of the answers to "ROW_NUMBER() in MySQL" for the row numbering, then modifying it to keep only the top rows:
SELECT c.participantid, SUM(c.points) AS total_points
FROM
(
SELECT a.participantid, a.points, a.id, COUNT(*) AS rn
FROM scores a
JOIN scores b ON a.participantid = b.participantid
AND (b.points > a.points OR (b.points = a.points AND b.id <= a.id))
GROUP BY a.participantid, a.points, a.id
) c
WHERE c.rn <= 2
GROUP BY c.participantid
ORDER BY total_points DESC
I had an issue with ties until I arbitrarily broke them using the id.
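Here is a sketch that runs the same ranking idea on the question's sample rows, including the name lookup the question asks for. It uses Python's sqlite3 as a stand-in for MySQL; the table names scores and participant are assumptions (the first follows the answer, the second the question), and the cut-off is a named parameter :n so the "top 6 of 11" case is the same query with n = 6.

```python
import sqlite3

# (id, participantid, points) and (id, name) rows from the question
SCORES = [(1, 1, 11), (2, 3, 1), (3, 3, 4), (4, 2, 3), (5, 1, 5),
          (6, 2, 10), (7, 2, 9), (8, 1, 3), (9, 3, 4)]
PARTICIPANTS = [(1, "what"), (2, "ev"), (3, "er")]

# rn = number of the participant's rows ranking at or above this one
# (more points, or equal points and a lower or equal id); keep rn <= :n.
TOP_N_SUM = """
SELECT p.name, SUM(c.points) AS total_points
FROM (SELECT a.participantid, a.points, a.id, COUNT(*) AS rn
      FROM scores a
      JOIN scores b
        ON a.participantid = b.participantid
       AND (b.points > a.points OR (b.points = a.points AND b.id <= a.id))
      GROUP BY a.participantid, a.points, a.id) c
JOIN participant p ON p.id = c.participantid
WHERE c.rn <= :n
GROUP BY c.participantid, p.name
ORDER BY total_points DESC
"""


def top_n_totals(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY,"
                 " participantid INTEGER, points INTEGER)")
    conn.execute("CREATE TABLE participant (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", SCORES)
    conn.executemany("INSERT INTO participant VALUES (?, ?)", PARTICIPANTS)
    rows = conn.execute(TOP_N_SUM, {"n": n}).fetchall()
    conn.close()
    return rows


print(top_n_totals(2))  # [('ev', 19), ('what', 16), ('er', 8)]
```

Participant 3 has two rows worth 4 points (ids 3 and 9); comparing ids directly gives them ranks 1 and 2, so the tie is broken without any string-to-number conversion.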
I've got four MySQL tables:
users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)
Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.
Suppose this is our data set:
TABLE users:
+------+-------+
| id | name |
+------+-------+
| 1 | Abe |
| 2 | Bob |
| 3 | Che |
| 4 | Den |
+------+-------+
TABLE polls:
+------+-----------------------+
| id | text |
+------+-----------------------+
| 1 | Do you like apples? |
| 2 | What is your gender? |
| 3 | What is your height? |
| 4 | Do you like polls? |
+------+-----------------------+
TABLE options:
+------+----------+---------+
| id | poll_id | text |
+------+----------+---------+
| 1 | 1 | Yes |
| 2 | 1 | No |
| 3 | 2 | Male |
| 4 | 2 | Female |
| 5 | 3 | Short |
| 6 | 3 | Tall |
| 7 | 4 | Yes |
| 8 | 4 | No |
+------+----------+---------+
TABLE responses:
+------+----------+------------+----------+
| id | poll_id | option_id | user_id |
+------+----------+------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
| 10 | 3 | 6 | 2 |
| 11 | 3 | 5 | 3 |
| 12 | 3 | 6 | 4 |
| 13 | 4 | 7 | 1 |
| 14 | 4 | 7 | 2 |
| 15 | 4 | 7 | 3 |
| 16 | 4 | 7 | 4 |
+------+----------+------------+----------+
Given the poll ID 1 and the option ID 2, the generated table should be something like this:
+----------+------------+-----------------------+
| poll_id | option_id | percent_correlated |
+----------+------------+-----------------------+
| 4 | 7 | 100 |
| 2 | 3 | 66.66 |
| 3 | 6 | 66.66 |
| 2 | 4 | 33.33 |
| 3 | 5 | 33.33 |
| 4 | 8 | 0 |
+----------+------------+-----------------------+
So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.
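The definition in the last paragraph can be pinned down with a small reference computation over the sample responses: among the users who chose option 2 in poll 1, the share of their answers to each other poll that went to each option. This is a sketch in plain Python for checking query output, not a MySQL solution; it rounds to two decimals, so the 66.66 in the table above shows up as 66.67 here.

```python
from collections import defaultdict

# (poll_id, option_id, user_id) from the question's responses table
RESPONSES = [(1, 1, 1), (1, 2, 2), (1, 2, 3), (1, 2, 4),
             (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 4),
             (3, 5, 1), (3, 6, 2), (3, 5, 3), (3, 6, 4),
             (4, 7, 1), (4, 7, 2), (4, 7, 3), (4, 7, 4)]
OPTIONS = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}  # option_id -> poll_id


def percent_correlated(poll_id, option_id):
    # Users who picked the given option in the given poll
    selected = {u for p, o, u in RESPONSES if p == poll_id and o == option_id}
    per_poll = defaultdict(int)    # answers from those users, per other poll
    per_option = defaultdict(int)  # answers from those users, per option
    for p, o, u in RESPONSES:
        if u in selected and p != poll_id:
            per_poll[p] += 1
            per_option[o] += 1
    result = {}
    for o, p in OPTIONS.items():
        if p != poll_id and per_poll[p]:  # options with zero picks still get 0.0
            result[(p, o)] = round(100 * per_option[o] / per_poll[p], 2)
    return result


print(percent_correlated(1, 2))
```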
I don't have an instance handy to test this; can you check whether it gives the right results?
select
poll_id,
option_id,
((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
n
from
(
select
poll_id,
option_id,
SUM(score) AS sum1,
SUM(score_rev) AS sum2,
SUM(score * score) AS sum1sq,
SUM(score_rev * score_rev) AS sum2sq,
SUM(score * score_rev) AS psum,
COUNT(*) AS n
from
(
select
responses.poll_id,
responses.option_id,
CASE
WHEN user_resp.user_id IS NULL THEN 0
ELSE 1
END AS score,
CASE
WHEN user_resp.user_id IS NULL THEN 1
ELSE 0
END AS score_rev
from responses left outer join
(
select
user_id
from
responses
where
poll_id = 1 and
option_id = 2
)user_resp
ON (user_resp.user_id = responses.user_id)
) temp1
group by
poll_id,
option_id
)components
After a few hours of trial and error, I managed to put together a query that works correctly:
SELECT poll_id AS p_id,
option_id AS o_id,
COUNT(*) AS optCount,
(SELECT COUNT(*) FROM response WHERE option_id = o_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2')) /
(SELECT COUNT(*) FROM response WHERE poll_id = p_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2'))
AS percentage
FROM response
INNER JOIN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2') AS user_ids
ON response.user_id = user_ids.user_id
WHERE poll_id != '1'
GROUP BY poll_id, option_id
ORDER BY percentage DESC, optCount DESC
Based on tests with a small data set, this query looks reasonably fast, but I'd like to modify it so the "IN" subquery is not repeated three times. Any suggestions?
This seems to give the right results for me:
select poll_stats.poll_id,
option_stats.option_id,
(100 * option_responses / poll_responses) as percent_correlated
from (select response.poll_id,
count(*) as poll_responses
from response selecting_response
join response on response.user_id = selecting_response.user_id
where selecting_response.poll_id = 1 and selecting_response.option_id = 2
group by response.poll_id) poll_stats
join (select options.poll_id,
options.id as option_id,
count(response.id) as option_responses
from options
left join response on response.poll_id = options.poll_id
and response.option_id = options.id
and exists (
select 1 from response selecting_response
where selecting_response.user_id = response.user_id
and selecting_response.poll_id = 1
and selecting_response.option_id = 2)
group by options.poll_id, options.id
) as option_stats
on option_stats.poll_id = poll_stats.poll_id
where poll_stats.poll_id <> 1
order by 3 desc, option_responses desc
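Here is a quick check of this query on the question's sample data, using Python's sqlite3 as a stand-in for MySQL. Two assumptions: the tables are named response and options as in the query (the question's schema says responses), and 100 is written as 100.0 because SQLite performs integer division on integers, whereas MySQL's / returns a decimal.

```python
import sqlite3

# (poll_id, option_id, user_id) in the question's order; ids are assigned 1..16
RESPONSES = [(1, 1, 1), (1, 2, 2), (1, 2, 3), (1, 2, 4),
             (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 4),
             (3, 5, 1), (3, 6, 2), (3, 5, 3), (3, 6, 4),
             (4, 7, 1), (4, 7, 2), (4, 7, 3), (4, 7, 4)]
OPTIONS = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4), (8, 4)]  # (id, poll_id)

# The answer's query with the hard-coded 1 and 2 turned into :poll and :opt
QUERY = """
SELECT poll_stats.poll_id, option_stats.option_id,
       (100.0 * option_responses / poll_responses) AS percent_correlated
FROM (SELECT response.poll_id, COUNT(*) AS poll_responses
      FROM response selecting_response
      JOIN response ON response.user_id = selecting_response.user_id
      WHERE selecting_response.poll_id = :poll AND selecting_response.option_id = :opt
      GROUP BY response.poll_id) poll_stats
JOIN (SELECT options.poll_id, options.id AS option_id,
             COUNT(response.id) AS option_responses
      FROM options
      LEFT JOIN response ON response.poll_id = options.poll_id
       AND response.option_id = options.id
       AND EXISTS (SELECT 1 FROM response selecting_response
                   WHERE selecting_response.user_id = response.user_id
                     AND selecting_response.poll_id = :poll
                     AND selecting_response.option_id = :opt)
      GROUP BY options.poll_id, options.id) AS option_stats
  ON option_stats.poll_id = poll_stats.poll_id
WHERE poll_stats.poll_id <> :poll
ORDER BY 3 DESC, option_responses DESC
"""


def correlated(poll_id, option_id):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE response (id INTEGER PRIMARY KEY, poll_id INTEGER,"
                 " option_id INTEGER, user_id INTEGER)")
    conn.execute("CREATE TABLE options (id INTEGER PRIMARY KEY, poll_id INTEGER)")
    conn.executemany("INSERT INTO response VALUES (?, ?, ?, ?)",
                     [(i, *r) for i, r in enumerate(RESPONSES, start=1)])
    conn.executemany("INSERT INTO options VALUES (?, ?)", OPTIONS)
    rows = conn.execute(QUERY, {"poll": poll_id, "opt": option_id}).fetchall()
    conn.close()
    return [(p, o, round(pct, 2)) for p, o, pct in rows]


for row in correlated(1, 2):
    print(row)
```

The LEFT JOIN from options is what keeps option 8 in the result with 0 percent; an inner join would drop it.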