sql matching users that have activities in common

I have a users table and an activities table:
users
id | activities
1  | "-2-3-4-"
2  | "-3-4-"
3  | "-1-2-3-4-"
activities
id | title
1  | running
2  | walking
3  | climbing
4  | singing
For the user with id 3, I am trying to find the users that share at least two activities with that user.
This is what I tried:
SELECT u.id FROM users u
WHERE ( SELECT COUNT(a.id) FROM activities a
WHERE a.id IN(TRIM( ',' FROM REPLACE( u.activities, '-', ',' ) ))
AND a.id IN(1,2,3) ) >= 2
any ideas?

For the love of god, make a 3rd table that contains user_id and activity_id.
This isn't a suitable solution in any way.
You should have a table which makes the connection between users and activities, not store all activities in a row in your users table.
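A minimal sketch of what that third table could look like, and of the "at least two activities in common" query on top of it. The table name user_activities and the helper users_sharing_activities are made up for illustration, and SQLite (via Python's sqlite3 module) is used only as a stand-in so the example runs on its own; the SQL itself is plain enough that it should carry over to MySQL.

```python
import sqlite3

# Hypothetical junction table user_activities(user_id, activity_id),
# filled with the same data as the "-2-3-4-" strings in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_activities (
        user_id     INTEGER NOT NULL,
        activity_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, activity_id)
    );
    INSERT INTO user_activities VALUES
        (1, 2), (1, 3), (1, 4),
        (2, 3), (2, 4),
        (3, 1), (3, 2), (3, 3), (3, 4);
""")


def users_sharing_activities(conn, user_id, minimum=2):
    """Return (other_user_id, shared_count) rows, ordered by user id."""
    return conn.execute(
        """
        SELECT other.user_id, COUNT(*) AS shared
        FROM user_activities AS mine
        JOIN user_activities AS other
          ON other.activity_id = mine.activity_id
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        HAVING COUNT(*) >= ?
        ORDER BY other.user_id
        """,
        (user_id, minimum),
    ).fetchall()


matches = users_sharing_activities(conn, 3)
print(matches)
```

The self join pairs every activity of the given user with the same activity of other users, so counting rows per other user gives the number of shared activities without any string parsing.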

You can first create a function (T-SQL, i.e. SQL Server syntax) that takes the users.activities string and splits it into integer activity ids, like this:
CREATE FUNCTION dbo.SplitStringToIds (@acts nvarchar(MAX))
RETURNS @activityids TABLE (Id int)
AS
BEGIN
    DECLARE @stringToInsert nvarchar(max)
    SET @stringToInsert = ''
    DECLARE @intToInsert int
    SET @intToInsert = 0
    DECLARE @stidx int
    SET @stidx = 0
    DECLARE @endidx int
    SET @endidx = 0
    -- Loop while at least one more "-<id>-" token remains.
    -- (A fixed length check such as LEN(@acts) > 3 skips a lone "-1-"
    -- and fails on ids with three or more digits.)
    WHILE CHARINDEX('-', @acts, CHARINDEX('-', @acts) + 1) > 0
    BEGIN
        -- Drop everything up to and including the leading '-'
        SET @stidx = CHARINDEX('-', @acts, 1)
        SET @acts = SUBSTRING(@acts, @stidx + 1, LEN(@acts))
        -- The id is the text before the next '-'
        SET @endidx = CHARINDEX('-', @acts, 1) - 1
        SET @stringToInsert = SUBSTRING(@acts, 1, @endidx)
        SET @intToInsert = CAST(@stringToInsert AS int)
        INSERT INTO @activityids
        VALUES (@intToInsert)
    END
    -- Return the result of the function
    RETURN
END
GO
Then you can try something like this to get the users that share two or more activities with the user with id = 3:
SELECT u.id, COUNT(u.id) AS ActivitiesCounter
FROM users AS u
CROSS APPLY dbo.SplitStringToIds(u.activities) AS v
WHERE v.Id IN (SELECT v3.Id
               FROM users AS u3
               CROSS APPLY dbo.SplitStringToIds(u3.activities) AS v3
               WHERE u3.id = 3)
  AND u.id <> 3 -- exclude user 3 itself
GROUP BY u.id
HAVING COUNT(u.id) >= 2
However, I think that storing relationships between tables this way is only going to give you trouble, and it is better to add a relationship table if you can.

Related

sql query to find potential duplicate records

I am working with employer data and need to find duplicate employers based on their names.
The data looks like this:
Employer ID | Legal Name | Operating Name
------------- | ---------------| --------------------
1 | AA | AA
2 | BB | AA
3 | CC | BB
4 | DD | DD
5 | ZZ | ZZ
Now, if I search for all duplicates of employer AA, the query should return the following result:
Employer ID | Legal Name | Operating Name
------------- | ---------------| --------------------
1 | AA | AA
2 | BB | AA
3 | CC | BB
Employer 1's legal name and employer 2's operating name match the search string directly.
The catch is employer 3, which is not directly related to the search string, but employer 2's legal name matches employer 3's operating name.
I need the search results up to the nth level. I am not sure whether that can be achieved with a recursive query or something like that.
Please help.
I tried to achieve this with a recursive CTE, but then I realized that it goes into infinite recursion. Here is the code:
DECLARE @SearchName VARCHAR(50)
SET @SearchName = 'AA'
;WITH CTE_EmployerNames
AS
(
    -- Anchor member definition
    SELECT *
    FROM [dbo].[Name_Table]
    WHERE Leg_Name = @SearchName
       OR Op_Name = @SearchName
    UNION ALL
    -- Recursive member definition
    SELECT N.*
    FROM [dbo].[Name_Table] N
    JOIN CTE_EmployerNames C
      ON N.ID <> C.ID
     AND (N.Leg_Name = C.Leg_Name
       OR N.Leg_Name = C.Op_Name
       OR N.Op_Name = C.Leg_Name
       OR N.Op_Name = C.Op_Name)
)
SELECT *
FROM CTE_EmployerNames
Update:
I created a stored procedure that does what I want, but it is a bit slow because of the loop and the cursor. For now it solves my problem at the cost of some execution time. Any suggestion for optimizing it, or another way to do this, would be highly appreciated. Thanks, guys. Here is the code:
CREATE PROCEDURE [dbo].[Get_Similar_Name_Employers]
    @P_BaseName VARCHAR(100)
AS
BEGIN
    DECLARE @ID INT
    DECLARE @Leg_Name VARCHAR(50)
    DECLARE @Op_Name VARCHAR(50)
    -- Create temp table to hold data temporarily
    CREATE TABLE #Temp_Employers
    (
        [ID] [int] NULL,
        [Leg_Name] [varchar](50) NULL,
        [Op_Name] [varchar](50) NULL,
        [Status] [bit] NULL -- Tracks whether the record has been processed
    )
    -- Insert all records that match the search criteria directly
    INSERT INTO #Temp_Employers
    SELECT NT.ID, NT.Leg_Name, NT.Op_Name, 0
    FROM dbo.Name_Table NT
    WHERE NT.Leg_Name = @P_BaseName
       OR NT.Op_Name = @P_BaseName
    WHILE EXISTS (SELECT 1 FROM #Temp_Employers WHERE Status = 0) -- until all rows are processed
    BEGIN
        DECLARE @EmployerCursor CURSOR
        SET @EmployerCursor = CURSOR FAST_FORWARD
        FOR
            SELECT ID, Leg_Name, Op_Name
            FROM #Temp_Employers
            WHERE Status = 0
        OPEN @EmployerCursor
        FETCH NEXT
        FROM @EmployerCursor
        INTO @ID, @Leg_Name, @Op_Name
        WHILE @@FETCH_STATUS = 0
        BEGIN
            -- For every unprocessed record in the temp table, check for possible duplicates
            -- and insert them into the same table so their duplicates are searched as well
            INSERT INTO #Temp_Employers
            SELECT ID, Leg_Name, Op_Name, 0
            FROM dbo.Name_Table
            WHERE (Leg_Name = @Leg_Name
                OR Op_Name = @Op_Name
                OR Leg_Name = @Op_Name
                OR Op_Name = @Leg_Name)
              AND ID NOT IN (SELECT ID
                             FROM #Temp_Employers)
            -- Mark the record as processed so it is not processed again
            UPDATE #Temp_Employers
            SET Status = 1
            WHERE ID = @ID
            FETCH NEXT
            FROM @EmployerCursor
            INTO @ID, @Leg_Name, @Op_Name
        END
        -- Close cursor and deallocate memory
        CLOSE @EmployerCursor
        DEALLOCATE @EmployerCursor
    END
    SELECT ID,
           Leg_Name,
           Op_Name
    FROM #Temp_Employers
    ORDER BY ID
    DROP TABLE #Temp_Employers
END

You are basically trying to build a directed acyclic graph in which the nodes are names, and you want to find all the names that lead to your employer.
There is a beginning tutorial at Oracle Tip: Solving directed graph problems with SQL, part 1, and a related StackOverflow question at Directed graph SQL.
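As a rough illustration of the graph idea, the sketch below walks the name links with a recursive CTE that uses UNION instead of UNION ALL, so rows that were already found are discarded and the recursion stops by itself. It runs on SQLite through Python's sqlite3 module purely as a stand-in; the helper find_related is hypothetical, and SQL Server's recursive CTEs accept only UNION ALL, so there the same idea needs a different guard (for example the temp table loop from the question).

```python
import sqlite3

# Sample rows from the question; SQLite is only a stand-in engine here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Name_Table (ID INTEGER PRIMARY KEY, Leg_Name TEXT, Op_Name TEXT);
    INSERT INTO Name_Table VALUES
        (1, 'AA', 'AA'), (2, 'BB', 'AA'), (3, 'CC', 'BB'),
        (4, 'DD', 'DD'), (5, 'ZZ', 'ZZ');
""")


def find_related(conn, name):
    """Return every employer reachable from `name` through shared names."""
    sql = """
        WITH RECURSIVE found(ID, Leg_Name, Op_Name) AS (
            -- Anchor: rows that match the search name directly
            SELECT ID, Leg_Name, Op_Name
            FROM Name_Table
            WHERE Leg_Name = ? OR Op_Name = ?
            UNION  -- not UNION ALL: duplicates are dropped, so recursion ends
            -- Recursive step: rows sharing any name with a row already found
            SELECT n.ID, n.Leg_Name, n.Op_Name
            FROM Name_Table AS n
            JOIN found AS f
              ON n.Leg_Name IN (f.Leg_Name, f.Op_Name)
              OR n.Op_Name  IN (f.Leg_Name, f.Op_Name)
        )
        SELECT ID, Leg_Name, Op_Name FROM found ORDER BY ID
    """
    return conn.execute(sql, (name, name)).fetchall()


related = find_related(conn, "AA")
print(related)
```

For the sample data, a search for AA reaches employers 1 and 2 directly and employer 3 through the name BB, while DD and ZZ stay isolated.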

You can do this with two self joins. I used DISTINCT to be safe - you don't need it for your example, but probably will for your actual data:
SELECT DISTINCT T2.EMPID, T2.LEGAL_NAME, T.LEGAL_NAME
FROM TABLE T
INNER JOIN TABLE T2 ON T.LEGAL_NAME = T2.OPERATING_NAME
INNER JOIN TABLE T3 ON T2.OPERATING_NAME = T3.OPERATING_NAME
WHERE T.LEGAL_NAME <> T3.LEGAL_NAME
Rename and alias tables and columns as you like.
SQL Fiddle Example
Edit - If you also want records where the op name is simply different from the legal name, UNION those in:
SELECT DISTINCT T2.EMPID, T2.LEGAL_NAME, T.LEGAL_NAME
FROM TABLE T
INNER JOIN TABLE T2 ON T.LEGAL_NAME = T2.OPERATING_NAME
INNER JOIN TABLE T3 ON T2.OPERATING_NAME = T3.OPERATING_NAME
WHERE T.LEGAL_NAME <> T3.LEGAL_NAME
UNION
SELECT EMPID, LEGAL_NAME, OPERATING_NAME
FROM TABLE
WHERE LEGAL_NAME <> OPERATING_NAME
SQL Fiddle Example 2

Stored Procedure combining resultsets

When I run the MySQL stored procedure below, I get three separate result sets as output. Is there any way to combine them and display the output as a single table with different columns and a single row?
CREATE DEFINER=`root`@`localhost` PROCEDURE `retrieveApplicantStatus`(IN in_userId INT)
BEGIN
SELECT applicants_id, approval FROM applicants WHERE users_id = in_userId;
SELECT pass_fail.result AS test_result, pass_fail.license_approval FROM pass_fail WHERE user_id = in_userId;
SELECT result AS trial_result FROM trial_result WHERE user_id = in_userId;
END
Required Output :
--------------------------------------------------------------------------
applicants_id | approval | test_result | trial_result | license_approval |
--------------------------------------------------------------------------
| | | | |
--------------------------------------------------------------------------

If user_id is constrained to be unique in each table, then you can use:
CREATE DEFINER=`root`@`localhost` PROCEDURE `retrieveApplicantStatus`(IN in_userId INT)
BEGIN
SELECT u.in_userId,
a.applicants_id,
a.approval,
pf.result AS test_result,
pf.license_approval,
tr.result AS trial_result
FROM (SELECT in_userId) AS u
LEFT JOIN applicants AS a
ON a.users_id = u.in_userId
LEFT JOIN pass_fail AS pf
ON pf.user_id = u.in_userId
LEFT JOIN trial_result AS tr
ON tr.user_id = u.in_userId;
END
However, if it is not constrained to be unique, this will give you a Cartesian product: 2 rows per user in each table will give you 8 rows in total, and 3 rows per user in each table will give you 27.
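The row multiplication is easy to check on toy data. The sketch below recreates the three tables with made-up rows (the values are invented for illustration) and counts the rows produced by the same chain of LEFT JOINs; SQLite via Python's sqlite3 module is used only so that the example is self-contained, and combined_row_count is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applicants   (applicants_id INTEGER, users_id INTEGER, approval TEXT);
    CREATE TABLE pass_fail    (user_id INTEGER, result TEXT, license_approval TEXT);
    CREATE TABLE trial_result (user_id INTEGER, result TEXT);

    -- User 1 has one row per table, user 2 has two rows per table.
    INSERT INTO applicants   VALUES (10, 1, 'yes'), (20, 2, 'yes'), (21, 2, 'no');
    INSERT INTO pass_fail    VALUES (1, 'pass', 'yes'), (2, 'pass', 'yes'), (2, 'fail', 'no');
    INSERT INTO trial_result VALUES (1, 'pass'), (2, 'pass'), (2, 'fail');
""")


def combined_row_count(conn, user_id):
    """Count the rows the three chained LEFT JOINs produce for one user."""
    return conn.execute(
        """
        SELECT COUNT(*)
        FROM (SELECT ? AS in_userId) AS u
        LEFT JOIN applicants   AS a  ON a.users_id = u.in_userId
        LEFT JOIN pass_fail    AS pf ON pf.user_id = u.in_userId
        LEFT JOIN trial_result AS tr ON tr.user_id = u.in_userId
        """,
        (user_id,),
    ).fetchone()[0]


print(combined_row_count(conn, 1))  # one row per table
print(combined_row_count(conn, 2))  # two rows per table
```

A user with no rows at all still yields one output row (all NULLs) because of the LEFT JOINs, which may or may not be what the caller wants.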

Try the following code:
BEGIN
SELECT applicants_id, approval
FROM applicants WHERE users_id = in_userId
UNION ALL
SELECT pass_fail.result AS test_result, pass_fail.license_approval
FROM pass_fail
WHERE user_id = in_userId
UNION ALL
-- padded with NULL so that every branch returns two columns
SELECT result AS trial_result, NULL
FROM trial_result
WHERE user_id = in_userId;
END

Should I redesign my tables or can I make this work?

Right now I'm working on expanding my website with new functionality. I want to enable notifications from different sources, similar to groups and people on Facebook. Here is my table layout right now:
course_updates
id | CRN (id of course) | update_id
------------------------------------
courses
id | course_name | course_subject | course_number
-------------------------------------------------
users
id | name | facebook_name
---------------------------------------------------
user_updates
id | user_id | update_id
------------------------
updates
id | timestamp | updateObj
---------------------------
What I would like to do is take course_updates and user_updates in one query and join them with updates, along with the relevant information from the related tables. For course_updates I would want course_name, course_subject, etc., and for user_updates I would want the user name and Facebook name.
This honestly probably belongs in two separate queries, but I would like to order everything by the timestamp of the updates table, and I feel like sorting everything in PHP would be inefficient. What is the best way to do this? If I were to use something like a UNION, I would need a way to distinguish between notification types, because user_updates and course_updates can reference the same row in updates. Any ideas?

You might not need the updates table at all. You can add timestamp columns to the course_updates and user_updates tables:
CREATE TABLE course_updates
(
`id` int,
`CRN` int,
`timestamp` datetime -- or timestamp type
);
CREATE TABLE user_updates
(
`id` int,
`user_id` int,
`timestamp` datetime -- or timestamp type
);
To get an ordered, column-wise unified result set of all updates, you can pack the details of each update type into a delimited string in one column (let's call it details) using CONCAT_WS(), add a column that identifies the update type (let's call it obj_type), and combine the two queries with UNION ALL:
SELECT 'C' obj_type, u.id, u.timestamp,
CONCAT_WS('|',
c.id,
c.course_name,
c.course_subject,
c.course_number) details
FROM course_updates u JOIN courses c
ON u.CRN = c.id
UNION ALL
SELECT 'U' obj_type, u.id, u.timestamp,
CONCAT_WS('|',
s.id,
s.name,
s.facebook_name) details
FROM user_updates u JOIN users s
ON u.user_id = s.id
ORDER BY timestamp DESC
Sample output:
| OBJ_TYPE | ID | TIMESTAMP | DETAILS |
-------------------------------------------------------------------------
| C | 3 | July, 30 2013 22:00:00+0000 | 3|Course3|Subject3|1414 |
| U | 2 | July, 11 2013 14:00:00+0000 | 1|Name1|FB Name1 |
| U | 2 | July, 11 2013 14:00:00+0000 | 3|Name3|FB Name3 |
...
Here is SQLFiddle demo
You can then easily explode details values while you iterate over the resultset in php.

I don't think you should mix both of those concepts (user and course) together in a query. They have different number of columns and relate to different concepts.
I think you really should use two queries. One for users and one for courses.
SELECT courses.course_name, courses.course_subject, courses.course_number,
updates.updateObj, updates.timestamp
FROM courses, updates, course_updates
WHERE courses.id = course_updates.CRN
AND course_updates.update_id = updates.id
ORDER BY updates.timestamp;
SELECT users.name,users.facebook_name,updates.updateObj,updates.timestamp
FROM users ,updates, user_updates
WHERE users.id = user_updates.user_id
AND user_updates.update_id = updates.id
ORDER BY updates.timestamp;

If you are going to merge the two tables, you need to keep two things in mind:
The number of columns must be the same in both queries.
There should be a way to distinguish the source of the data.
Here is one way you could do this:
SELECT * FROM
(SELECT courses.course_name AS name, courses.course_subject AS details,
updates.updateObj AS updateObj, updates.timestamp AS timestamp,
'course' AS type
FROM courses, updates, course_updates
WHERE courses.id = course_updates.CRN
AND course_updates.update_id = updates.id
UNION ALL
SELECT users.name AS name, users.facebook_name AS details,
updates.updateObj AS updateObj, updates.timestamp AS timestamp,
'user' AS type
FROM users, updates, user_updates
WHERE users.id = user_updates.user_id
AND user_updates.update_id = updates.id) AS out_table
ORDER BY out_table.timestamp DESC
The type column lets you distinguish between user and course updates and could be used by your front end to colour the rows differently. The course id does not appear in this result, but you can add it; just keep in mind that you will then have to add a dummy value to the user SELECT statement so that both queries return the same number of columns. Note that if an update refers to both a user and a course, it will appear twice.
You could also order by type to separate user and course data.

Stored procedure counting trouble

I have a table [users] in which I want to count the occurrences of each Movie_ID and store the result in a different table called [total]. So for Movie_ID = 81212 it would write the value 2 to my [total] table, like below:
------------------------------------
| [users] | [total]
+---------+---------+ +---------+-------------+
|Movie_ID |Player_ID| |Movie_ID | Player_Count|
+---------+---------+ +---------+-------------+
|81212 |P3912 | | 81212 | 2 |
+---------+---------+ +---------+-------------+
|12821 |P4851 | | 12821 | 1 |
+---------+---------+ +---------+-------------+
|81212 |P5121 |
+---------+---------+
(movie_ID + player_ID form composite key
so Movie_ID does not need to be unique)
I'm trying to accomplish this with a stored procedure, and this is what I have so far. I'm not sure how to code the part that loops through every entry in the [users] table, finds each occurrence of movie_id and sums it up.
DELIMITER //
CREATE PROCEDURE `movie_total` (OUT movie_count int(5))
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
DECLARE movie_count int(5);
SELECT count(movie_id) AS movie_count FROM users
foreach unique row in Users ;
IF (SELECT COUNT(*) FROM users WHERE movie_id) > 0
THEN
INSERT INTO total (:movie_id, :Player_Count) VALUES (movie_id, movie_count);
END //

To update this field you can use a query like this -
UPDATE
total t
JOIN (SELECT Movie_ID, COUNT(*) cnt FROM users GROUP BY Movie_ID) m
ON t.Movie_ID = m.Movie_ID
SET
t.Player_Count = m.cnt
But do you really need a total table? You can always get this information with a SELECT query, and the information in the total table may be out of date.
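One way to act on that remark is to replace the stored totals with a view that computes them on demand. The sketch below assumes a hypothetical view name, total_view, and uses SQLite via Python's sqlite3 module only so that it runs without a MySQL server; CREATE VIEW with a GROUP BY query like this also exists in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        Movie_ID  INTEGER NOT NULL,
        Player_ID TEXT    NOT NULL,
        PRIMARY KEY (Movie_ID, Player_ID)
    );
    INSERT INTO users VALUES
        (81212, 'P3912'), (12821, 'P4851'), (81212, 'P5121');

    -- Hypothetical view name; it takes the place of the stored [total] table.
    CREATE VIEW total_view AS
        SELECT Movie_ID, COUNT(*) AS Player_Count
        FROM users
        GROUP BY Movie_ID;
""")


def current_totals(conn):
    """Read the per-movie player counts from the view."""
    return conn.execute(
        "SELECT Movie_ID, Player_Count FROM total_view ORDER BY Movie_ID"
    ).fetchall()


before = current_totals(conn)
conn.execute("INSERT INTO users VALUES (12821, 'P7777')")  # made-up extra player
after = current_totals(conn)
print(before, after)
```

Because the view is evaluated at query time, the count for movie 12821 changes as soon as the new row is inserted, with no UPDATE step to forget.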

I think you can do this without a loop:
update total set total.Player_Count = (select COUNT(Movie_ID) from users where total.Movie_ID=users.Movie_ID group by (Movie_ID));

Split a MYSQL string from GROUP_CONCAT into an ( array, like, expression, list) that IN () can understand

This question follows on from MYSQL join results set wiped results during IN () in where clause?
So, the short version of the question: how do you turn the string returned by GROUP_CONCAT into a comma-separated expression list that IN() will treat as a list of multiple items to loop over?
N.B. The MySQL docs appear to refer to the "( comma, separated, lists )" used by IN() as 'expression lists', and interestingly the pages on IN() seem to be more or less the only pages in the MySQL docs that ever refer to expression lists. So I'm not sure whether functions intended for making arrays or temp tables would be of any use here.
Long example-based version of the question: From a 2-table DB like this:
SELECT id, name, GROUP_CONCAT(tag_id) FROM person INNER JOIN tag ON person.id = tag.person_id GROUP BY person.id;
+----+------+----------------------+
| id | name | GROUP_CONCAT(tag_id) |
+----+------+----------------------+
| 1 | Bob | 1,2 |
| 2 | Jill | 2,3 |
+----+------+----------------------+
How can I turn this, which since it uses a string is treated as logical equivalent of ( 1 = X ) AND ( 2 = X )...
SELECT name, GROUP_CONCAT(tag.tag_id) FROM person LEFT JOIN tag ON person.id = tag.person_id
GROUP BY person.id HAVING ( ( 1 IN (GROUP_CONCAT(tag.tag_id) ) ) AND ( 2 IN (GROUP_CONCAT(tag.tag_id) ) ) );
Empty set (0.01 sec)
...into something where the GROUP_CONCAT result is treated as a list, so that for Bob, it would be equivalent to:
SELECT name, GROUP_CONCAT(tag.tag_id) FROM person INNER JOIN tag ON person.id = tag.person_id AND person.id = 1
GROUP BY person.id HAVING ( ( 1 IN (1,2) ) AND ( 2 IN (1,2) ) );
+------+--------------------------+
| name | GROUP_CONCAT(tag.tag_id) |
+------+--------------------------+
| Bob | 1,2 |
+------+--------------------------+
1 row in set (0.00 sec)
...and for Jill, it would be equivalent to:
SELECT name, GROUP_CONCAT(tag.tag_id) FROM person INNER JOIN tag ON person.id = tag.person_id AND person.id = 2
GROUP BY person.id HAVING ( ( 1 IN (2,3) ) AND ( 2 IN (2,3) ) );
Empty set (0.00 sec)
...so the overall result would be an exclusive search clause requiring all listed tags that doesn't use HAVING COUNT(DISTINCT ... ) ?
Note: this logic works without the AND, where it applies to the first character of the string, e.g.:
SELECT name, GROUP_CONCAT(tag.tag_id) FROM person LEFT JOIN tag ON person.id = tag.person_id
GROUP BY person.id HAVING ( ( 2 IN (GROUP_CONCAT(tag.tag_id) ) ) );
+------+--------------------------+
| name | GROUP_CONCAT(tag.tag_id) |
+------+--------------------------+
| Jill | 2,3 |
+------+--------------------------+
1 row in set (0.00 sec)

Instead of using IN(), would using FIND_IN_SET() be an option too?
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
mysql> SELECT FIND_IN_SET('b','a,b,c,d');
-> 2
Here's a full example based on the example problem in the question, confirmed as tested by the asker in an earlier edit to the question:
SELECT name FROM person LEFT JOIN tag ON person.id = tag.person_id GROUP BY person.id
HAVING ( FIND_IN_SET(1, GROUP_CONCAT(tag.tag_id)) ) AND ( FIND_IN_SET(2, GROUP_CONCAT(tag.tag_id)) );
+------+
| name |
+------+
| Bob |
+------+
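To make the difference from IN() concrete, here is a rough Python model of what FIND_IN_SET computes: it splits the comma-separated string and returns the 1-based position of the match, or 0 when there is none, whereas IN() sees the whole GROUP_CONCAT result as one single string value. The function find_in_set and the tags_by_person dictionary are illustrative stand-ins, not part of MySQL.

```python
def find_in_set(needle, csv):
    """Rough model of MySQL's FIND_IN_SET: 1-based position, 0 if absent."""
    if needle is None or csv is None:
        return None  # MySQL returns NULL if either argument is NULL
    items = csv.split(",") if csv else []
    try:
        return items.index(str(needle)) + 1
    except ValueError:
        return 0


# GROUP_CONCAT(tag_id) results from the question, keyed by person name.
tags_by_person = {"Bob": "1,2", "Jill": "2,3"}

# Equivalent of: HAVING FIND_IN_SET(1, ...) AND FIND_IN_SET(2, ...)
with_both_tags = [
    name
    for name, csv in tags_by_person.items()
    if find_in_set(1, csv) and find_in_set(2, csv)
]
print(with_both_tags)
```

Since any position greater than 0 is truthy, chaining the calls with AND keeps only the people whose tag list contains every requested tag.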

You can pass a string as an array by choosing a separator and then splitting it inside a function that works on the resulting items.
As a trivial example, if you have a string array like 'one|two|three|four|five' and want to know whether two is in the array, you can do it this way:
delimiter |
create function str_in_array( split_index varchar(10), arr_str varchar(200), compares varchar(20) )
returns boolean
begin
declare resp boolean default 0;
declare arr_data varchar(20);
-- While the string is not empty
while( length( arr_str ) > 0 ) do
-- if the split index is in the string
if( locate( split_index, arr_str ) ) then
-- get the last data in the string
set arr_data = ( select substring_index(arr_str, split_index, -1) );
-- remove the last data in the string
set arr_str = ( select
replace(arr_str,
concat(split_index,
substring_index(arr_str, split_index, -1)
)
,'')
);
-- if the split index is not in the string
else
-- get the unique data in the string
set arr_data = arr_str;
-- empties the string
set arr_str = '';
end if;
-- in this trivial example, it returns if a string is in the array
if arr_data = compares then
set resp = 1;
end if;
end while;
return resp;
end
|
delimiter ;
I want to create a set of useful MySQL functions that work with this method. Anyone interested, please contact me.
For more examples, visit http://blog.idealmind.com.br/mysql/how-to-use-string-as-array-in-mysql-and-work-with/
