I'm looking for a way to calculate differences between integers within a single table.
I'm planning a MySQL table that looks like this:
user question answer
1 1 3
1 2 3
1 3 2
1 4 5
1 5 1
2 1 2
2 2 3
2 3 1
2 4 5
2 5 3
3 1 3
3 2 3
3 3 4
3 4 5
3 5 3
4 1 5
4 2 3
4 3 2
4 4 5
4 5 1
Each user (in this example) has answered 5 questions, giving an answer on a scale of 1 to 5.
What I'm looking to work out is which of the users 2, 3 and 4 have given answers that are most similar to those provided by user 1.
What I have in mind is calculating the difference between the answers given by each user for each question, in comparison to those of user 1, and then adding up those differences.
The user with the lowest number after that addition would be most similar to user 1.
I'm sorry to say that I don't really know where to begin constructing a query that does this efficiently and was wondering if anyone could point me in the right direction?
I'm also open to any suggestions for any better or more logical way to build the same results.
SELECT SUM(ABS(t2.answer - t1.answer)) AS total_diff, t2.user
FROM my_table AS t1
LEFT JOIN my_table AS t2 USING(question)
WHERE t1.user = 1 AND t2.user != t1.user
GROUP BY t2.user
ORDER BY total_diff ASC
result:
total_diff user
2 4
4 2
4 3
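To check the summed-absolute-difference idea against the sample data, here is a minimal sketch using Python's standard sqlite3 module. It assumes SQLite rather than MySQL, the function name rank_by_total_difference is made up for illustration, and a secondary sort on user is added only to make ties deterministic.

```python
import sqlite3


def rank_by_total_difference(rows, reference_user=1):
    """Return (user, total_diff) pairs, most similar to reference_user first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (user INT, question INT, answer INT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t2.user, SUM(ABS(t2.answer - t1.answer)) AS total_diff
        FROM my_table AS t1
        JOIN my_table AS t2 ON t2.question = t1.question
        WHERE t1.user = ? AND t2.user <> t1.user
        GROUP BY t2.user
        ORDER BY total_diff, t2.user  -- tie-break on user (an assumption)
        """,
        (reference_user,),
    ).fetchall()
    conn.close()
    return result


# Sample answers from the question, one list of five answers per user.
answers = {
    1: [3, 3, 2, 5, 1],
    2: [2, 3, 1, 5, 3],
    3: [3, 3, 4, 5, 3],
    4: [5, 3, 2, 5, 1],
}
rows = [(u, q, a) for u, ans in answers.items() for q, a in enumerate(ans, start=1)]

print(rank_by_total_difference(rows))  # [(4, 2), (2, 4), (3, 4)]
```

User 4 comes out closest to user 1 with a total difference of 2, while users 2 and 3 tie at 4.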
SELECT
yt1.user,
SUM(CASE WHEN yt1.answer = yt2.answer THEN 1 ELSE 0 END) AS howMuchAnswersInCommon
FROM yourTable yt1
INNER JOIN yourTable yt2 ON yt1.question = yt2.question
WHERE yt2.user = 1 AND yt1.user != 1
GROUP BY yt1.user
ORDER BY howMuchAnswersInCommon DESC
;
This puts the user who has the most answers in common with user 1 at the top.
Test data:
/*
create table yourTable (user int, question int, answer int);
insert into yourTable values
(1, 1, 3),
(1, 2, 3),
(1, 3, 2),
(1, 4, 5),
(1, 5, 1),
(2, 1, 2),
(2, 2, 3),
(2, 3, 1),
(2, 4, 5),
(2, 5, 3),
(3, 1, 3),
(3, 2, 3),
(3, 3, 4),
(3, 4, 5),
(3, 5, 3),
(4, 1, 5),
(4, 2, 3),
(4, 3, 2),
(4, 4, 5),
(4, 5, 1);
*/
OUTPUT:
user howMuchAnswersInCommon
4 4
3 3
2 2
These three tables are part of a larger order management system:
orders
o_id c_id
1 1
2 1
3 2
4 3
5 3
6 4
7 5
order_items
o_id p_id
1 1
2 2
3 1
3 2
3 8
4 1
4 2
5 8
5 9
6 4
6 5
7 12
customers
c_id name
1 Doug
2 Tammy
3 Bill
4 Don
5 Kate
I want to find ALL pairs of customers where the second customer in the pair has purchased NONE of the products that the first customer in the pair has purchased. I can't seem to figure this out! My best attempt was grabbing the count of all unique products and trying to see if I could group and reduce by leveraging that count.
Expected Output
c_id1 c_id2
4 1
4 2
4 3
4 5
5 1
5 2
5 3
Or the exact opposite (no duplicates).
CREATE TABLE orders (
o_id INT,
c_id INT
);
INSERT INTO orders (o_id, c_id) VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 3),
(5, 3),
(6, 4),
(7, 5);
CREATE TABLE order_items (
o_id INT,
p_id INT
);
INSERT INTO order_items (o_id, p_id) VALUES
(1, 1),
(2, 2),
(3, 1),
(3, 2),
(3, 8),
(4, 1),
(4, 2),
(5, 8),
(5, 9),
(6, 4),
(6, 5),
(7, 12);
CREATE TABLE customers (
c_id INT,
name VARCHAR(10)
);
INSERT INTO customers (c_id, name) VALUES
(1, 'Doug'),
(2, 'Tammy'),
(3, 'Bill'),
(4, 'Don'),
(5, 'Kate');
Test
WITH cte AS ( SELECT *
FROM orders
NATURAL JOIN order_items
NATURAL JOIN customers )
SELECT t1.c_id id1, t2.c_id id2
FROM customers t1
JOIN customers t2 ON t1.c_id < t2.c_id
WHERE NOT EXISTS ( SELECT NULL
FROM cte cte1, cte cte2
WHERE cte1.c_id = t1.c_id
AND cte2.c_id = t2.c_id
AND cte1.p_id = cte2.p_id );
The idea is to generate all pairs of customers (using a cross join).
Then check that the two customers have no product in common. This is a little tricky: use not exists, join order_items to itself on the product, and join each side up to orders so it can be tied to one customer of the pair:
select c1.c_id, c2.c_id
from customers c1 cross join
customers c2
where not exists (select 1
from order_items oi1 join
order_items oi2
on oi1.p_id = oi2.p_id join
orders o1
on o1.o_id = oi1.o_id join
orders o2
on o2.o_id = oi2.o_id
where o1.c_id = c1.c_id and
o2.c_id = c2.c_id
);
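The NOT EXISTS approach can be tried against the sample data with a small sketch in Python's sqlite3 module. This assumes SQLite, uses a hypothetical helper name disjoint_customer_pairs, and keeps only pairs with c_id1 < c_id2 so each pair is listed once (the "exact opposite" orientation the question allows).

```python
import sqlite3


def disjoint_customer_pairs(customer_ids, orders, order_items):
    """Return pairs (c1, c2), c1 < c2, whose purchased products do not overlap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (c_id INT)")
    conn.execute("CREATE TABLE orders (o_id INT, c_id INT)")
    conn.execute("CREATE TABLE order_items (o_id INT, p_id INT)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(c,) for c in customer_ids])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO order_items VALUES (?, ?)", order_items)
    pairs = conn.execute(
        """
        WITH purchased AS (          -- one row per (customer, product)
            SELECT DISTINCT o.c_id, oi.p_id
            FROM orders AS o
            JOIN order_items AS oi ON oi.o_id = o.o_id
        )
        SELECT c1.c_id, c2.c_id
        FROM customers AS c1
        JOIN customers AS c2 ON c1.c_id < c2.c_id
        WHERE NOT EXISTS (           -- no product bought by both customers
            SELECT 1
            FROM purchased AS p1
            JOIN purchased AS p2 ON p1.p_id = p2.p_id
            WHERE p1.c_id = c1.c_id AND p2.c_id = c2.c_id
        )
        ORDER BY c1.c_id, c2.c_id
        """
    ).fetchall()
    conn.close()
    return pairs


ORDERS = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 4), (7, 5)]
ORDER_ITEMS = [
    (1, 1), (2, 2), (3, 1), (3, 2), (3, 8), (4, 1),
    (4, 2), (5, 8), (5, 9), (6, 4), (6, 5), (7, 12),
]

print(disjoint_customer_pairs([1, 2, 3, 4, 5], ORDERS, ORDER_ITEMS))
# [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```

These are the same seven pairs as in the expected output, just with the smaller c_id listed first.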
I'm working on another SQL query.
I have the following table.
PURCHASES
ID CUST_ID PROD_CODE PURCH_DATE
1 1 'WER' 01/12/2012
2 2 'RRE' 02/10/2005
3 3 'RRY' 02/11/2011
4 3 'TTB' 15/05/2007
5 3 'GGD' 20/06/2016
6 2 'SSD' 02/10/2011
I'm trying to add another column PURCH_COUNT that would display the purchase count for the CUST_ID based on PURCH_DATE.
If this is a first purchase it would return 1, if second then 2, and so on.
So the result I'm hoping is:
ID CUST_ID PROD_CODE PURCH_DATE PURCH_COUNT
1 1 'WER' 01/12/2012 1
2 2 'RRE' 02/10/2005 1
3 3 'RRY' 02/11/2011 2
4 3 'TTB' 15/05/2007 1
5 3 'GGD' 20/06/2016 3
6 2 'SSD' 02/10/2011 2
Thanks in advance!
Sample Data
DECLARE @Table1 TABLE
(ID int, CUST_ID int, PROD_CODE varchar(7), PURCH_DATE datetime)
;
INSERT INTO @Table1
(ID, CUST_ID, PROD_CODE, PURCH_DATE)
VALUES
(1, 1, 'WER', '2012-01-12 05:30:00'),
(2, 2, 'RRE', '2005-02-10 05:30:00'),
(3, 3, 'RRY', '2011-02-11 05:30:00'),
(4, 3, 'TTB', '2008-03-05 05:30:00'),
(5, 3, 'GGD', '2017-08-06 05:30:00'),
(6, 2, 'SSD', '2011-02-10 05:30:00')
;
In SQL Server:
select ID,
CUST_ID,
PROD_CODE,
PURCH_DATE,
-- number each customer's purchases from oldest to newest
ROW_NUMBER() OVER (PARTITION BY CUST_ID ORDER BY PURCH_DATE) AS PURCH_COUNT
from @Table1
In MySQL (assuming the same rows are stored in a regular table named Table1, since MySQL has no table variables):
SELECT a.ID, a.CUST_ID, a.PROD_CODE, a.PURCH_DATE, (
SELECT COUNT(*) FROM Table1 b WHERE b.CUST_ID = a.CUST_ID AND b.PURCH_DATE <= a.PURCH_DATE
) AS PURCH_COUNT FROM Table1 a
Use a correlated sub-query to get the counts per customer.
SELECT t.*,
(SELECT 1+count(*)
FROM table1
WHERE t.cust_id = cust_id
AND t.purch_date > purch_date) as purch_cnt
FROM table1 t
ORDER BY cust_id,purch_date
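As a quick check of the correlated sub-query, here is a minimal sketch with Python's sqlite3 module. It assumes SQLite, stores the dates as ISO-8601 text (so string comparison matches date order), and uses a made-up helper name purchase_counts. Note that two purchases by the same customer with identical timestamps would get the same number.

```python
import sqlite3

# Sample rows taken from the answers in this thread.
PURCHASES = [
    (1, 1, "WER", "2012-01-12 05:30:00"),
    (2, 2, "RRE", "2005-02-10 05:30:00"),
    (3, 3, "RRY", "2011-02-11 05:30:00"),
    (4, 3, "TTB", "2008-03-05 05:30:00"),
    (5, 3, "GGD", "2017-08-06 05:30:00"),
    (6, 2, "SSD", "2011-02-10 05:30:00"),
]


def purchase_counts(purchases):
    """Map each purchase ID to its position among that customer's purchases."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INT, cust_id INT, prod_code TEXT, purch_date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", purchases)
    rows = conn.execute(
        """
        SELECT t.id,
               (SELECT 1 + COUNT(*)          -- earlier purchases, plus this one
                FROM table1 AS earlier
                WHERE earlier.cust_id = t.cust_id
                  AND earlier.purch_date < t.purch_date) AS purch_cnt
        FROM table1 AS t
        """
    ).fetchall()
    conn.close()
    return dict(rows)


print(purchase_counts(PURCHASES))  # {1: 1, 2: 1, 3: 2, 4: 1, 5: 3, 6: 2}
```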
A correlated subquery or window function like the ones above can be expressed as a join, too. Sometimes a join is easier to understand, or is produced from components you can re-use, and sometimes the DBMS doesn't support the fancier feature. (A scalar subquery in a SELECT clause is standard SQL, so choosing the join here is a matter of preference rather than portability.)
create table T
(ID, CUST_ID, PROD_CODE, PURCH_DATE);
INSERT INTO T
(ID, CUST_ID, PROD_CODE, PURCH_DATE)
VALUES
(1, 1, 'WER', '2012-01-12 05:30:00'),
(2, 2, 'RRE', '2005-02-10 05:30:00'),
(3, 3, 'RRY', '2011-02-11 05:30:00'),
(4, 3, 'TTB', '2008-03-05 05:30:00'),
(5, 3, 'GGD', '2017-08-06 05:30:00'),
(6, 2, 'SSD', '2011-02-10 05:30:00')
;
select PURCH_COUNT, T.*
from T join (
select count(b.ID) as PURCH_COUNT
, a.CUST_ID, a.PURCH_DATE
from T as a join T as b
on a.CUST_ID = b.CUST_ID
and b.PURCH_DATE <= a.PURCH_DATE
group by a.CUST_ID, a.PURCH_DATE
) as Q
on T.CUST_ID = Q.CUST_ID
and T.PURCH_DATE = Q.PURCH_DATE
;
Output of subquery:
PURCH_COUNT CUST_ID PURCH_DATE
----------- ---------- -------------------
1 1 2012-01-12 05:30:00
1 2 2005-02-10 05:30:00
2 2 2011-02-10 05:30:00
1 3 2008-03-05 05:30:00
2 3 2011-02-11 05:30:00
3 3 2017-08-06 05:30:00
Output of query:
PURCH_COUNT ID CUST_ID PROD_CODE PURCH_DATE
----------- ---------- ---------- ---------- -------------------
1 1 1 WER 2012-01-12 05:30:00
1 2 2 RRE 2005-02-10 05:30:00
2 3 3 RRY 2011-02-11 05:30:00
1 4 3 TTB 2008-03-05 05:30:00
3 5 3 GGD 2017-08-06 05:30:00
2 6 2 SSD 2011-02-10 05:30:00
I have a requirement to distribute records equally across two categories. If one category falls short of records, the remaining count should be taken from the other category.
Sample data:
Suppose subject s1 has 12 students and subject s2 has 20, and I need to pick 30 students. An even split would be 15 per subject, but since s1 has only 12 students in total, I should get 12 from s1 and 18 from s2.
This should do the trick:
DECLARE @t TABLE(ID INT, Student VARCHAR(10), Subject CHAR(2))
INSERT INTO @t VALUES
(1, 'Stud1', 's1'),
(2, 'Stud2', 's1'),
(3, 'Stud3', 's2'),
(4, 'Stud4', 's2'),
(5, 'Stud5', 's2'),
(6, 'Stud6', 's2'),
(7, 'Stud7', 's2'),
(8, 'Stud8', 's2'),
(9, 'Stud9', 's2')
;WITH cte AS(SELECT *, ROW_NUMBER() OVER(PARTITION BY Subject ORDER BY ID) AS rn FROM @t)
SELECT TOP 7 *
FROM cte
ORDER BY rn, Subject
The idea is that you are numbering the rows within subjects like:
1 Stud1 s1 1
3 Stud3 s2 1
2 Stud2 s1 2
4 Stud4 s2 2
5 Stud5 s2 3
6 Stud6 s2 4
7 Stud7 s2 5
8 Stud8 s2 6
9 Stud9 s2 7
So, when you select the top N rows ordered by that row number, the rows are spread evenly across subjects automatically, and once one subject runs out the remaining slots are filled from the other.
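The same numbering trick can be sketched outside the database. The following is a minimal Python illustration, not the answer's T-SQL: pick_balanced is a hypothetical helper, and the 12/20 student lists are generated just to mirror the numbers in the question.

```python
from collections import Counter, defaultdict


def pick_balanced(records, n):
    """Pick up to n (student, subject) records, alternating between subjects.

    Each record gets a row number within its subject (like ROW_NUMBER() with
    PARTITION BY Subject); sorting by (row number, subject) and taking the
    first n mimics SELECT TOP n ... ORDER BY rn, Subject.
    """
    seen = defaultdict(int)
    numbered = []
    for student, subject in records:
        seen[subject] += 1
        numbered.append((seen[subject], subject, student))
    numbered.sort(key=lambda item: (item[0], item[1]))
    return [(student, subject) for _, subject, student in numbered[:n]]


# 12 students in s1 and 20 in s2, as in the question.
records = [(f"s1-stud{i}", "s1") for i in range(1, 13)]
records += [(f"s2-stud{i}", "s2") for i in range(1, 21)]

picked = pick_balanced(records, 30)
print(Counter(subject for _, subject in picked))  # Counter({'s2': 18, 's1': 12})
```

The first 24 picks alternate between the two subjects; after s1 is exhausted, the last 6 all come from s2.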
In a table I have records as follows:
ID, ID1, ID2
1, 2, 3
2, 2, 4
3, 2, 5
4, 3, 3
4, 3, 4
4, 4, 3
4, 4, 4
4, 4, 5
I want to be able to find all ID1 values which exist in the table which have ALL of the ID2 values 3, 4 AND 5
So in this case I would want some SQL to pull out only ID1 = 2 and ID1 = 4, but not ID1 = 3 because there exist only ID2=3 and ID2=4 for ID1=3... so it's missing a row for ID2=5 and hence I do not want it included in my result set.
Is there an efficient way to do this?
TY!
You will want to use the following which selects all rows that have an id2 with a value of 3, 4 or 5 and then applies a group by with a having clause to make sure that you return 3 distinct id2 values:
select id1
from yourtable
where id2 in (3, 4, 5)
group by id1
having count(distinct id2) = 3
See SQL Fiddle with Demo.
This type of query is known as relational division.
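For anyone who wants to try the GROUP BY / HAVING form of relational division locally, here is a minimal sketch with Python's sqlite3 module. It assumes SQLite, and the helper name id1_with_all is invented for this example; the rows are the (ID1, ID2) pairs from the question.

```python
import sqlite3


def id1_with_all(rows, required):
    """Return the id1 values that have every id2 value listed in required."""
    wanted = sorted(set(required))  # duplicates would break the count check
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (id1 INT, id2 INT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT id1
        FROM yourtable
        WHERE id2 IN ({placeholders})
        GROUP BY id1
        HAVING COUNT(DISTINCT id2) = ?
        ORDER BY id1
    """
    found = [row[0] for row in conn.execute(query, (*wanted, len(wanted)))]
    conn.close()
    return found


ROWS = [(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 3), (4, 4), (4, 5)]

print(id1_with_all(ROWS, [3, 4, 5]))  # [2, 4]
```

ID1 = 3 is dropped because it only matches two of the three required ID2 values.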
I have a single table with a self reference InReplyTo with some data like this:
PostID InReplyTo Depth
------ --------- -----
1 null 0
2 1 1
3 1 1
4 2 2
5 3 2
6 4 3
7 1 1
8 5 3
9 2 2
I want to write a query that will return this data in its threaded form, so that the post with ID=2 and all its descendants are output before PostID=3, and so on for unlimited depth:
PostID InReplyTo Depth
------ --------- -----
1 null 0
2 1 1
4 2 2
6 4 3
9 2 2
3 1 1
5 3 2
8 5 3
7 1 1
Is there a simple way to achieve this? I am able to modify the DB structure at this stage so would the new hierarchy datatype be the easiest way to go? Or perhaps a recursive CTE?
-- Test table
declare @T table (PostID int, InReplyTo int, Depth int)
insert into @T values (1, null, 0), (2, 1, 1), (3, 1, 1), (4, 2, 2),
(5, 3, 2), (6, 4, 3), (7, 1, 1), (8, 5, 3),(9, 2, 2)
-- The post to get the hierarchy from
declare @PostID int = 1
-- Recursive cte that builds a string to use in order by
;with cte as
(
select T.PostID,
T.InReplyTo,
T.Depth,
right('0000000000'+cast(T.PostID as varchar(max)), 10)+'/' as Sort
from @T as T
where T.PostID = @PostID
union all
select T.PostID,
T.InReplyTo,
T.Depth,
C.Sort+right('0000000000'+cast(T.PostID as varchar(max)), 10)+'/'
from @T as T
inner join cte as C
on T.InReplyTo = C.PostID
)
select PostID,
InReplyTo,
Depth,
Sort
from cte
order by Sort
Result:
PostID InReplyTo Depth Sort
----------- ----------- ----------- --------------------------------------------
1 NULL 0 0000000001/
2 1 1 0000000001/0000000002/
4 2 2 0000000001/0000000002/0000000004/
6 4 3 0000000001/0000000002/0000000004/0000000006/
9 2 2 0000000001/0000000002/0000000009/
3 1 1 0000000001/0000000003/
5 3 2 0000000001/0000000003/0000000005/
8 5 3 0000000001/0000000003/0000000005/0000000008/
7 1 1 0000000001/0000000007/
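The sort-path idea ports to other engines that support recursive CTEs. Below is a minimal sketch in Python's sqlite3 module, assuming SQLite: the table name posts and the helper threaded_order are made up for this example, and substr(..., -10) stands in for T-SQL's right(..., 10) to zero-pad each PostID.

```python
import sqlite3

POSTS = [
    (1, None, 0), (2, 1, 1), (3, 1, 1), (4, 2, 2), (5, 3, 2),
    (6, 4, 3), (7, 1, 1), (8, 5, 3), (9, 2, 2),
]


def threaded_order(posts, root_id=1):
    """Return PostIDs under root_id in threaded (depth-first) order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (PostID INT, InReplyTo INT, Depth INT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", posts)
    rows = conn.execute(
        """
        WITH RECURSIVE cte (PostID, Sort) AS (
            -- anchor: the root post, with its zero-padded id as the path
            SELECT PostID, substr('0000000000' || PostID, -10) || '/'
            FROM posts
            WHERE PostID = ?
            UNION ALL
            -- step: append each reply's padded id to its parent's path
            SELECT p.PostID,
                   c.Sort || substr('0000000000' || p.PostID, -10) || '/'
            FROM posts AS p
            JOIN cte AS c ON p.InReplyTo = c.PostID
        )
        SELECT PostID FROM cte ORDER BY Sort
        """,
        (root_id,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(threaded_order(POSTS))  # [1, 2, 4, 6, 9, 3, 5, 8, 7]
```

Because every path segment has a fixed width of ten digits, plain text ordering on the path gives the threaded order, as long as PostID values stay below ten digits.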
What you are looking for is indeed a recursive query.
A matching example to your case can be found here