SQL question. Find the two person having same hobbies in one table

TABLE [tbl_hobby]
person_id (int) , hobby_id(int)
has many records. I want a SQL query that finds all pairs of person_id values that have exactly the same hobbies (the same set of hobby_id values).
If A has hobby_id 1, B must have it too; if A doesn't have hobby_id 2, B must not have it either. In that case we output the person_ids of A and B.
If A, B and C all match each other, we output A & B, B & C, and A & C.
I've solved it with a very, very clumsy method: multiple self-joins of the table plus multiple sub-queries. And of course my team leader laughed at it.
Is there a high-performance way to do this in a single SQL query?
I have been thinking hard about this for the last 36 hours...
sample data in mysql dump
CREATE TABLE `tbl_hobby` (
`person_id` int(11) NOT NULL,
`hobby_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbl_hobby` (`person_id`, `hobby_id`) VALUES
(1, 1),(1, 2),(1, 3),(1, 4),(1, 5),(2, 2),
(2, 3),(2, 4),(3, 1),(3, 2),(3, 3),(3, 4),
(4, 1),(4, 3),(4, 4),(5, 1),(5, 5),(5, 9),
(6, 2),(6, 3),(6, 4),(7, 1),(7, 3),(7, 7),
(8, 2),(8, 3),(8, 4),(9, 1),(9, 2),(9, 3),
(9, 4),(10, 1),(10, 5),(10, 9),(10, 11);
COMMIT;
Expected result (2, 6 and 8 are the same; 3 and 9 are the same):
2,6
2,8
6,8
3,9
The order of the result records and the order of the two numbers within one record are not important. A result in one column or in two columns is equally acceptable, since it can easily be concatenated or separated.

Aggregate per person to get a string of each person's hobbies. Then aggregate per hobby list to find out which lists belong to more than one person.
select hobbies, group_concat(person_id order by person_id) as persons
from
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) persons
group by hobbies
having count(*) > 1
order by hobbies;
This gives a list of persons per hobby list, which is the easiest way to output a solution, as we would otherwise have to build all possible pairs.
UPDATE: If you want pairs, you'll have to query the table twice:
select p1.person_id as person1, p2.person_id as person2
from
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) p1
join
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) p2 on p2.person_id > p1.person_id and p2.hobbies = p1.hobbies
order by person1, person2;

Alternative version, without using any proprietary string handling:
select distinct t1.person_id, t2.person_id
from tbl_hobby t1
join tbl_hobby t2
on t1.person_id < t2.person_id
where 2 = all (select count(*)
from tbl_hobby
where person_id in (t1.person_id, t2.person_id)
group by hobby_id);
Perhaps less efficient, but portable!
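Another portable variant, offered only as a sketch and not taken from the answers above: join the table to itself on hobby_id, count the shared hobbies per pair, and keep the pairs whose shared count equals both persons' total hobby counts. It assumes each (person_id, hobby_id) combination occurs at most once. The harness below runs it on the question's sample data using Python's sqlite3 module in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_hobby (person_id INT NOT NULL, hobby_id INT NOT NULL)")
conn.executemany(
    "INSERT INTO tbl_hobby (person_id, hobby_id) VALUES (?, ?)",
    [
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2),
        (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4),
        (4, 1), (4, 3), (4, 4), (5, 1), (5, 5), (5, 9),
        (6, 2), (6, 3), (6, 4), (7, 1), (7, 3), (7, 7),
        (8, 2), (8, 3), (8, 4), (9, 1), (9, 2), (9, 3),
        (9, 4), (10, 1), (10, 5), (10, 9), (10, 11),
    ],
)

# COUNT(*) per group is the number of hobbies the two persons share.
# The pair matches only if that number equals each person's own total.
query = """
SELECT a.person_id, b.person_id
FROM tbl_hobby a
JOIN tbl_hobby b
  ON b.hobby_id = a.hobby_id
 AND a.person_id < b.person_id
GROUP BY a.person_id, b.person_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM tbl_hobby WHERE person_id = a.person_id)
   AND COUNT(*) = (SELECT COUNT(*) FROM tbl_hobby WHERE person_id = b.person_id)
ORDER BY a.person_id, b.person_id
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

An index on (hobby_id, person_id) would likely help the self-join on a large table, though that has not been measured here.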

Related

SQL multiple JOINs or subqueries but avoid cartesian product

I want to realize an SQL database for a game. There are a number of players that participate in different tournaments. For each tournament, a player has a separate account. All games are listed in one large table in which the tournament accounts are used to describe winner, loser, along with the score of the game.
The schema is given in http://sqlfiddle.com/#!9/55378a or here again
CREATE TABLE `players` (
`id` int NOT NULL,
`name` varchar(5),
PRIMARY KEY (`id`)
);
CREATE TABLE `tournamentAccounts` (
`tId` int NOT NULL,
`playerId` int NOT NULL,
`handicap` int NOT NULL DEFAULT 10,
PRIMARY KEY (`tId`)
);
CREATE TABLE `games` (
`gameId` int NOT NULL,
`winnerTId` int NOT NULL,
`loserTId` int NOT NULL,
`score` int NOT NULL DEFAULT 0,
PRIMARY KEY (`gameId`)
);
INSERT INTO `players` (`id`, `name`) VALUES
(1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO `tournamentAccounts` (`tId`, `playerId`, `handicap`) VALUES
(1, 1, 10), (2, 1, 2), (3, 2, 0);
INSERT INTO `games` (`gameId`, `winnerTId`, `loserTId`, `score`) VALUES
(1, 1, 3, 3), (2, 1, 3, 2), (3, 3, 1, 6);
What I want to achieve: List for a specific player all tournament scores, i.e. handicap + scorepoints of won games - scorepoints of lost games. For the given inputs, the result set should contain two rows with total scores 9 (for tId=1) and 2 (for tId=2), respectively. The example here is simplified, as in my example there are more conditions to match between the tournamentAccounts and games tables (e.g. time slots etc.), but I guess I can extend it myself once I understood the basic approach :-)
My approaches until now failed as I cannot get a nice JOIN or subqueries to work (I would like to avoid stored procedures).
Attempt 1: straight forward join
SELECT t.*, (t.handicap +COALESCE(SUM(w.score),0) -COALESCE(SUM(l.score),0)) AS score
FROM tournamentAccounts t
LEFT JOIN games w ON w.winnerTId = t.tId
LEFT JOIN games l ON l.loserTId = t.tId
WHERE playerId = 1
GROUP BY t.tId
Although this returns the correct number of rows, the double LEFT JOIN causes a cartesian product as it seems: the two won games are joined with the lost game into two datasets, hence 10 + 3 - 6 + 2 - 6. This effect obviously becomes worse the more matching rows I have in the games table.
Attempt 2: UNION with JOIN (similar to sql avoid cartesian product)
SELECT SUM(COALESCE(x.aa,0))
FROM
((SELECT -l.score AS aa FROM games l LEFT JOIN tournamentAccounts t ON l.loserTId = t.tId WHERE t.playerId = 1)
UNION
(SELECT w.score AS aa FROM games w LEFT JOIN tournamentAccounts t ON w.winnerTId = t.tId WHERE t.playerId = 1)) x
With this I get the proper score value summed up, however it is not yet combined with the corresponding handicap value, and also I don't know how to extend from here to cover all tournament accounts of that player (here, I just took a small snapshot of data) in an SQL manner.

I would just make the games portion of your query into a union, not the whole thing:
SELECT t.*, (t.handicap +COALESCE(SUM(win_score),0) -COALESCE(SUM(loss_score),0)) AS score
FROM tournamentAccounts t
LEFT JOIN (
SELECT w.winnerTId AS tId, w.score AS win_score, 0 AS loss_score FROM games w
UNION ALL
SELECT l.loserTId, 0, l.score FROM games l
) games_won_or_lost ON games_won_or_lost.tId=t.tId
WHERE playerId = 1
GROUP BY t.tId
The other alternative is to undo the effects of the cartesian product. You know the win score is too high by a factor of the number of lost games, so replace SUM(w.score) with ROUND(SUM(w.score)/GREATEST(COUNT(DISTINCT l.gameId),1)). And similarly, SUM(l.score) becomes ROUND(SUM(l.score)/GREATEST(COUNT(DISTINCT w.gameId),1)).
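To see the row multiplication and the UNION ALL fix side by side, here is a small sketch that loads the question's sample data into an in-memory SQLite database (used here only as a stand-in for MySQL) and runs both queries. The alias `g` and the variable names `naive` and `fixed` are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tournamentAccounts (
    tId INT PRIMARY KEY, playerId INT NOT NULL, handicap INT NOT NULL DEFAULT 10);
CREATE TABLE games (
    gameId INT PRIMARY KEY, winnerTId INT NOT NULL, loserTId INT NOT NULL,
    score INT NOT NULL DEFAULT 0);
INSERT INTO tournamentAccounts VALUES (1, 1, 10), (2, 1, 2), (3, 2, 0);
INSERT INTO games VALUES (1, 1, 3, 3), (2, 1, 3, 2), (3, 3, 1, 6);
""")

# Attempt 1 from the question: every won game is paired with every lost game,
# so the single lost game (score 6) is subtracted twice for tId = 1.
naive = conn.execute("""
SELECT t.tId,
       t.handicap + COALESCE(SUM(w.score), 0) - COALESCE(SUM(l.score), 0)
FROM tournamentAccounts t
LEFT JOIN games w ON w.winnerTId = t.tId
LEFT JOIN games l ON l.loserTId = t.tId
WHERE t.playerId = 1
GROUP BY t.tId
ORDER BY t.tId
""").fetchall()

# UNION ALL version: each game contributes exactly one row per account,
# either as a win or as a loss, so nothing is multiplied.
fixed = conn.execute("""
SELECT t.tId,
       t.handicap + COALESCE(SUM(g.win_score), 0) - COALESCE(SUM(g.loss_score), 0)
FROM tournamentAccounts t
LEFT JOIN (
    SELECT winnerTId AS tId, score AS win_score, 0 AS loss_score FROM games
    UNION ALL
    SELECT loserTId, 0, score FROM games
) g ON g.tId = t.tId
WHERE t.playerId = 1
GROUP BY t.tId
ORDER BY t.tId
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

The naive query yields 3 for tId 1 (10 + 3 - 6 + 2 - 6), while the UNION ALL version yields the 9 the question asks for.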

How about the following:
SELECT t.*, (t.handicap + coalesce(wscore,0) - coalesce(lscore,0)) AS score
FROM tournamentAccounts t
LEFT JOIN (
select sum(score) wscore, winnerTId wid
from games
group by winnerTid
) as w ON w.wid = t.tid
left join (
select sum(score) lscore, loserTid lid
from games
group by loserTid
) as l ON l.lid = t.tid
where playerId = 1
I got the result as
tId playerId handicap score
1 1 10 9
2 1 2 2

Divide MAX score by Grade in different tables MySQL

table a
no name
2001 jon
2002 jonny
2003 mik
2004 mike
2005 mikey
2006 tom
2007 tomo
2008 tommy
table b
code name credits courseCode
A2 JAVA 25 wer
A3 php 25 wer
A4 oracle 25 wer
B2 p.e 50 oth
B3 sport 50 oth
C2 r.e 25 rst
C3 science 25 rst
C4 networks 25 rst
table c
studentNumber grade coursecode
2003 68 A2
2003 72 A3
2003 53 A4
2005 48 A2
2005 52 A3
2002 20 A2
2002 30 A3
2002 50 A4
2008 90 B2
2007 73 B2
2007 63 B3
SELECT a.num, a.Fname,
b.courseName, b.cMAXscore, b.cCode, c.stuGrade
FROM a
INNER JOIN c
ON a.no = c.no
INNER JOIN b
ON c.moduleCode = b.cCode
INNER JOIN b
ON SUM(b.cMAXscore) / (c.stuGrade)
AND b.cMAXscore = c.stug=Grade
GROUP BY a.Fname, b.cMAXscore, b.cCode, b.courseName,c.stuGrade
"calculate and display every student name(a.Fname) and their ID number(a.num) along with their grade (c.grade) versus the coursse name(b.courseName) and the courses max score(b.cMAXscoure). "
I can't figure out how to divide the MAX by the grade. Can someone help?

From the specification, it doesn't look like an aggregate function or a GROUP BY would be necessary. But the specification is ambiguous. There are no table definitions (beyond the unfortunate names and some column references).
Definitions of the tables, along with example data and an example of the desired resultset, would go a long way toward removing the ambiguity.
Based on the join predicates in the OP query, I'd suggest something like this query, as a starting point:
SELECT a.Fname
, a.num
, c.grade
, b.courseName
, b.cMAXscore
FROM a
JOIN c
ON c.no = a.no
JOIN b
ON b.cCode = c.moduleCode
ORDER
BY a.Fname
, a.num
, c.grade
, b.courseName
, b.cMAXscore
It seems like that would return the specified result (based on my interpretation of the vague specification.) If that's insufficient i.e. if that doesn't return the desired resultset, then in what way does the desired result differ from the result from this query?
(For more help with your question, I suggest you setup a sqlfiddle example with tables and example data. That will make it easier for someone to help you.)
FOLLOWUP
Based on the additional information provided in the question (table definitions and example data...
To get the maximum (highest) grade for a given course, you could use a query like this:
SELECT MAX(c.grade)
FROM c
WHERE c.coursecode = 'A2'
To get the highest grade for all courses:
SELECT c.coursecode
, MAX(c.grade) AS max_grade
FROM c
GROUP BY c.coursecode
ORDER BY c.coursecode
To match the highest grade for each course to each student grade, use that previous query as an inline view in another query. Something like this:
SELECT g.studentNumber
, g.grade
, g.coursecode
, h.coursecode
, h.highest_grade
FROM c g
JOIN ( SELECT c.coursecode
, MAX(c.grade) AS highest_grade
FROM c
GROUP BY c.coursecode
) h
ON h.coursecode = g.coursecode
To perform a calculation, you can use an expression in the SELECT list of the outer query.
For example, to divide the value of one column by another, you can use the division operator:
SELECT g.studentNumber AS student_number
, g.grade AS student_grade
, g.coursecode AS student_coursecode
, h.coursecode
, h.highest_grade
, g.grade / h.highest_grade AS `student_grade_divided_by_highest_grade`
FROM c g
JOIN ( SELECT c.coursecode
, MAX(c.grade) AS highest_grade
FROM c
GROUP BY c.coursecode
) h
ON h.coursecode = g.coursecode
If you want to also return the name of the student, you can perform a join operation to (the unfortunately named) table a. Assuming that no is UNIQUE in a:
LEFT
JOIN a
ON a.no = g.studentNumber
And include a.name AS student_name in the SELECT list.
If you also need columns from table b, then join that table as well. Assuming that code is UNIQUE in b:
LEFT
JOIN b
ON b.code = g.coursecode
Then b.credits can be referenced in an expression in the SELECT list.
Beyond that, you need to be a little more explicit about what result should be returned by the query.
If you are after a "total overall grade" for a student, you'd need to specify how that result should be obtained.
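The inline-view query from the FOLLOWUP can be tried on the question's sample data. The sketch below is only a test harness: it loads table c into an in-memory SQLite database through Python's sqlite3 module. One difference from MySQL is an assumption worth flagging: SQLite divides two integers as integers, so the ratio is written as `1.0 * g.grade / h.highest_grade` here, whereas MySQL's `/` already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c (studentNumber INT, grade INT, coursecode VARCHAR(5));
INSERT INTO c VALUES
  (2003, 68, 'A2'), (2003, 72, 'A3'), (2003, 53, 'A4'),
  (2005, 48, 'A2'), (2005, 52, 'A3'),
  (2002, 20, 'A2'), (2002, 30, 'A3'), (2002, 50, 'A4'),
  (2008, 90, 'B2'), (2007, 73, 'B2'), (2007, 63, 'B3');
""")

# h is the inline view holding the highest grade per course;
# each student grade g is matched to the row for its own course.
rows = conn.execute("""
SELECT g.studentNumber,
       g.coursecode,
       g.grade,
       h.highest_grade,
       1.0 * g.grade / h.highest_grade AS ratio
FROM c g
JOIN (SELECT coursecode, MAX(grade) AS highest_grade
      FROM c
      GROUP BY coursecode) h
  ON h.coursecode = g.coursecode
ORDER BY g.coursecode, g.studentNumber
""").fetchall()

for row in rows:
    print(row)
```

Every student with the top grade in a course gets a ratio of 1.0, and everyone else gets a value below 1.0.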

Without knowing the table definitions it is very hard to provide a solution to your problem.
Here is my version of what you are trying to look for:
-- Note: this is SQL Server (T-SQL) syntax; table variables are prefixed with @
DECLARE @Student TABLE
(StudentID INT IDENTITY,
FirstName VARCHAR(255),
LastName VARCHAR(255)
);
DECLARE @Course TABLE
(CourseID INT IDENTITY,
CourseCode VARCHAR(25),
CourseName VARCHAR(255),
MaxScore INT
);
DECLARE @Grade TABLE
(ID INT IDENTITY,
CourseID INT,
StudentID INT,
Score INT
);
--Student
insert into @Student(FirstName, LastName)
values ('Test', 'B')
insert into @Student(FirstName, LastName)
values ('Test123', 'K')
--Course
insert into @Course(CourseCode, CourseName, MaxScore)
values ('MAT101', 'MATH',100.00)
insert into @Course(CourseCode, CourseName, MaxScore)
values ('ENG101', 'ENGLISH',100.00)
--Grade
insert into @Grade(CourseID, StudentID, Score)
values (1, 1,93)
insert into @Grade(CourseID, StudentID, Score)
values (1, 1,65)
insert into @Grade(CourseID, StudentID, Score)
values (1, 1,100)
insert into @Grade(CourseID, StudentID, Score)
values (2, 1,100)
insert into @Grade(CourseID, StudentID, Score)
values (2, 1,69)
insert into @Grade(CourseID, StudentID, Score)
values (2, 1,95)
insert into @Grade(CourseID, StudentID, Score)
values (1, 2,100)
insert into @Grade(CourseID, StudentID, Score)
values (1, 2,65)
insert into @Grade(CourseID, StudentID, Score)
values (1, 2,100)
insert into @Grade(CourseID, StudentID, Score)
values (2, 2,100)
insert into @Grade(CourseID, StudentID, Score)
values (2, 2,88)
insert into @Grade(CourseID, StudentID, Score)
values (2, 2,96)
SELECT a.StudentID,
a.FirstName,
a.LastName,
c.CourseCode,
SUM(b.Score) AS 'StudentScore',
SUM(c.MaxScore) AS 'MaxCourseScore',
SUM(CAST(b.Score AS DECIMAL(5, 2))) / SUM(CAST(c.MaxScore AS DECIMAL(5, 2))) AS 'Grade'
FROM @Student a
INNER JOIN @Grade b ON a.StudentID = b.StudentID
INNER JOIN @Course c ON c.CourseID = b.CourseID
GROUP BY a.StudentID,
a.FirstName,
a.LastName,
c.CourseCode;

The problem statement doesn't say anything about dividing by the max, I think you're misunderstanding it.
You need to write a subquery that gets the maximum score for each class, using MAX and GROUP BY. You can then join this with the other tables.
SELECT s.name AS student_name, c.name AS course_name, g.grade, m.max_grade
FROM student AS s
JOIN grade AS g ON s.no = g.studentNumber
JOIN course AS c ON c.code = g.courseCode
JOIN (SELECT courseCode, MAX(grade) AS max_grade
FROM grade
GROUP BY courseCode) AS m
ON m.courseCode = c.code
If you did need to divide the grade by the maximum, you can use g.grade/m.max_grade.

How select data from distinct Companies in the same row?

Is it possible to select distinct company names from the customer table while also displaying the related IDs?
At the moment I'm using
SELECT company,id, COUNT(*) as count FROM customers GROUP BY company HAVING COUNT(*) > 1;
which returns
MyDuplicateCompany1 64 2
MyDuplicateCompany2 20 3
MyDuplicateCompany6 175 2
but what I'm after is all the duplicate IDs for each company.
so
CompanyName, TimesDuplicated, DuplicateId1, DuplicateId2, DuplicateId3
or a row for each so
MyDuplicateCompany1, DuplicateId1, TimesDuplicated
MyDuplicateCompany1, DuplicateId2, TimesDuplicated
MyDuplicateCompany2, DuplicateId1, TimesDuplicated
MyDuplicateCompany2, DuplicateId2, TimesDuplicated
MyDuplicateCompany2, DuplicateId3, TimesDuplicated
is this possible?

Not sure if this would be acceptable, but MySQL has a function, GROUP_CONCAT(field), that combines the values from multiple rows into one string. With it you can keep one row per company and still show every value of a chosen column (like ID in this case).
SELECT company
, COUNT(*) as count
, group_concat(ID) as DupCompanyIDs
FROM customers
GROUP BY company
HAVING COUNT(*) > 1;
SQL Fiddle
showing similar results with duplicate companies listed in one field.
If you need it in multiple columns or multiple rows, you could wrap the above as an inline view and inner join it back to customers on the name to list the duplicates and times duplicated.
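The "inline view joined back to customers" idea could look roughly like the sketch below. The company names and ids are invented for illustration, the alias `d` and column name `times_duplicated` are my own, and SQLite (through Python's sqlite3 module) stands in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INT NOT NULL, company VARCHAR(50) NOT NULL);
-- hypothetical sample rows: Acme twice, Bolt once, Core three times
INSERT INTO customers VALUES
  (1, 'Acme'), (2, 'Acme'), (3, 'Bolt'),
  (4, 'Core'), (5, 'Core'), (6, 'Core');
""")

# d lists only the duplicated companies with their counts;
# joining back to customers yields one row per duplicate id.
rows = conn.execute("""
SELECT c.company, c.id, d.times_duplicated
FROM customers c
JOIN (SELECT company, COUNT(*) AS times_duplicated
      FROM customers
      GROUP BY company
      HAVING COUNT(*) > 1) d
  ON d.company = c.company
ORDER BY c.company, c.id
""").fetchall()

for row in rows:
    print(row)
```

Companies that appear only once (Bolt here) drop out because the inner join finds no matching row in the inline view.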

You can use GROUP_CONCAT(id) to concat your id by comma, your query should be:
SELECT company, GROUP_CONCAT(id) as ids, COUNT(id) as cant FROM customers GROUP BY company HAVING cant > 1
You can test the query with this
CREATE TABLE IF NOT EXISTS `customers` (
`id` int(11) NOT NULL,
`company` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `customers` (`id`, `company`) VALUES
(1, 'MyDuplicateCompany1'),
(2, 'MyDuplicateCompany1'),
(3, 'MyDuplicateCompany1'),
(4, 'MyDuplicateCompany2'),
(5, 'MyDuplicateCompany2'),
(6, 'MyDuplicateCompany3'),
(7, 'MyDuplicateCompany3'),
(8, 'MyDuplicateCompany3'),
(9, 'MyDuplicateCompany3'),
(10, 'MyDuplicateCompany4');
Read more at:
http://monksealsoftware.com/mysql-group_concat-and-postgres-array_agg/

You are not looking for companies with more than 1 entry (GROUP BY company), but for duplicate company IDs (GROUP BY company, id):
SELECT company, id, COUNT(*)
FROM customers
GROUP BY company, id
HAVING COUNT(*) > 1;

This should give exactly what you're looking for without GROUP_CONCAT()
SELECT
company, id,
( SELECT COUNT(*) from customers AS b
WHERE a.company = b.company
) AS cnt
FROM customers AS a
GROUP BY company, id
HAVING cnt > 1
;
Note: GROUP_CONCAT does the same thing, just all in one row per company.

Convert columns into rows with inner join in mysql

Please take a look at this fiddle.
I'm working on a search filter select box and I want to insert the field names of a table as rows.
Here's the table schema:
CREATE TABLE general
(`ID` int, `letter` varchar(21), `double-letters` varchar(21))
;
INSERT INTO general
(`ID`,`letter`,`double-letters`)
VALUES
(1, 'A','BB'),
(2, 'A','CC'),
(3, 'C','BB'),
(4, 'D','DD'),
(5, 'D','EE'),
(6, 'F','TT'),
(7, 'G','UU'),
(8, 'G','ZZ'),
(9, 'I','UU')
;
CREATE TABLE options
(`ID` int, `options` varchar(15))
;
INSERT INTO options
(`ID`,`options`)
VALUES
(1, 'letter'),
(2, 'double-letters')
;
The ID field in options table acts as a foreign key, and I want to get an output like the following and insert into a new table:
id field value
1 1 A
2 1 C
3 1 D
4 1 F
5 1 G
6 1 I
7 2 BB
8 2 CC
9 2 DD
10 2 EE
11 2 TT
12 2 UU
13 2 ZZ
My failed attempt:
select DISTINCT(a.letter),'letter' AS field
from general a
INNER JOIN
options b ON b.options = field
union all
select DISTINCT(a.double-letters), 'double-letters' AS field
from general a
INNER JOIN
options b ON b.options = field

Pretty sure you want this:
select distinct a.letter, 'letter' AS field
from general a
cross JOIN options b
where b.options = 'letter'
union all
select distinct a.`double-letters`, 'double-letters' AS field
from general a
cross JOIN options b
where b.options = 'double-letters'
Fiddle: http://sqlfiddle.com/#!2/bbf0b/18/0
A couple of things to point out: you can't join on a column alias. Because the column you're aliasing is a literal that you're selecting, you can specify that literal as criteria in the WHERE clause.
You're not really joining on anything between GENERAL and OPTIONS, so what you really want is a CROSS JOIN; the criteria that you're putting into the ON clause actually belongs in the WHERE clause.
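To get the numeric field ID and the running id column from the question's desired output, one possible sketch is to select options.ID instead of the literal name and number the rows afterwards. This goes beyond the answer above: the numbering is done in Python rather than SQL, the quoting of "double-letters" uses SQLite's double quotes instead of MySQL backticks, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE general (ID INT, letter VARCHAR(21), "double-letters" VARCHAR(21));
INSERT INTO general VALUES
  (1, 'A', 'BB'), (2, 'A', 'CC'), (3, 'C', 'BB'),
  (4, 'D', 'DD'), (5, 'D', 'EE'), (6, 'F', 'TT'),
  (7, 'G', 'UU'), (8, 'G', 'ZZ'), (9, 'I', 'UU');
CREATE TABLE options (ID INT, options VARCHAR(15));
INSERT INTO options VALUES (1, 'letter'), (2, 'double-letters');
""")

# Each branch picks the options row whose name matches the column being
# unpivoted, so o.ID supplies the field number for that branch.
rows = conn.execute("""
SELECT DISTINCT o.ID AS field, g.letter AS value
FROM general g
JOIN options o ON o.options = 'letter'
UNION ALL
SELECT DISTINCT o.ID, g."double-letters"
FROM general g
JOIN options o ON o.options = 'double-letters'
ORDER BY field, value
""").fetchall()

# Add the running id (1, 2, 3, ...) in the order returned by the query.
result = [(i, field, value) for i, (field, value) in enumerate(rows, start=1)]
for row in result:
    print(row)
```

The rows can then be written to the new table with an ordinary INSERT.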

I just made this query on Oracle.
It works and produces the output you described :
SELECT ID, CASE WHEN LENGTH(VALUE) = 2 THEN 2 ELSE 1 END AS FIELD, VALUE
FROM (
SELECT rownum AS ID, letter AS VALUE FROM (SELECT DISTINCT letter FROM general ORDER BY letter)
UNION
SELECT (SELECT COUNT(DISTINCT LETTER) FROM general) +rownum AS ID, double_letters AS VALUE
FROM (
SELECT DISTINCT double_letters FROM general ORDER BY double_letters)
)
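Note that rownum is an Oracle pseudocolumn, so the query will not run on MySQL as written; the row numbering would have to be done differently there.
I did not use the options table. I do not understand its role, and for this example and this type of output it seems unnecessary.
Hope this helps you too.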

Percentage of students that have purchased rewards

I have a Reward System based on MySQL/PHP. Teachers award students points; students can then purchase rewards using their accrued points.
What I'd like to do is display a single percentage figure of students who have purchased rewards.
This sounds quite simple, but in practice it may be a little bit more complicated to achieve.
I don't have a list of students in the database, I simply have transactions
If a student has received a point, their ID will be displayed in the transactions table under the Recipient_ID column
The transactions table looks like: Transaction_ID, Datetime, Giver_ID, Recipient_ID, Points, Category_ID, Reason.
The purchases table looks like: Purchase_ID, Datetime, Reward_ID, Quantity, Student_ID, Student_Name (blank), Date_DealtWith, Date_Collected
So, for example, I can list all of my student IDs using SELECT DISTINCT Recipient_ID FROM transactions.
All I basically need is:
[students] A count of students with a point+ (i.e. Recipient_ID in transactions)
[purchases] A count of students with a purchase+ (i.e. Student_ID in purchases)
[purchases] / [students] * 100
.. but I'm not sure how to do that in one query!
EDIT: Insert Statements*
purchases...
INSERT INTO `purchases` (`Purchase_ID`, `Datetime`, `Reward_ID`, `Quantity`, `Student_ID`, `Student_Name`, `Date_DealtWith`, `Date_Collected`) VALUES
(1, '2011-09-27 16:55:16', 1, 1, 34240, '', '2011-09-27 16:55:16', '2011-12-12 15:45:43'),
(2, '2011-09-28 13:02:26', 1, 1, 137636, '', '2011-09-27 16:55:16', '2011-09-27 16:55:16'),
(3, '2011-09-29 11:29:09', 1, 1, 137685, '', NULL, NULL);
transactions...
INSERT INTO `transactions` (`Transaction_ID`, `Datetime`, `Giver_ID`, `Recipient_ID`, `Points`, `Category_ID`, `Reason`) VALUES
(1, '2011-09-07', 36754, 34401, 5, 6, 'Gave excellent feedback on the new student notebook'),
(2, '2011-09-07', 34972, 137615, 10, 9, 'Helping TG'),
(6, '2011-09-07', 35006, 90185, 2, 1, '');

Here's an ugly version, which manages to avoid doing the count of students from transactions twice through a hackish join between 2 sub-queries.
SELECT
numrewards_students,
numpurchase_students,
numpurchase_students / numrewards_students * 100.0
FROM
( SELECT 0 AS joiner, COUNT( DISTINCT Recipient_ID ) AS numrewards_students FROM transactions ) AS trs
JOIN ( SELECT 0 AS joiner, COUNT( DISTINCT Student_ID ) AS numpurchase_students FROM purchases ) AS prs ON trs.joiner = prs.joiner
Another version avoids the dummy joiner column by cross joining the two single-row sub-queries (MySQL requires an alias on each derived table):
SELECT
numrewards_students,
numpurchase_students,
numpurchase_students / numrewards_students * 100.0
FROM
(SELECT COUNT( DISTINCT Recipient_ID ) AS numrewards_students FROM transactions) AS trs
CROSS JOIN
(SELECT COUNT( DISTINCT Student_ID ) AS numpurchase_students FROM purchases) AS prs
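The percentage calculation can be checked on a toy data set. The sketch below uses invented student IDs and only the columns that matter for the count; the other columns from the question are left out as a simplification. It runs on SQLite through Python's sqlite3 module rather than MySQL, so the multiplication by 100.0 is placed first to force floating-point division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (Transaction_ID INT, Recipient_ID INT, Points INT);
CREATE TABLE purchases (Purchase_ID INT, Student_ID INT, Reward_ID INT);
-- hypothetical data: four distinct students received points,
-- one of them (101) made two purchases
INSERT INTO transactions VALUES
  (1, 101, 5), (2, 102, 10), (3, 101, 2), (4, 103, 1), (5, 104, 3);
INSERT INTO purchases VALUES (1, 101, 1), (2, 101, 2);
""")

# Two single-row derived tables, cross joined into one result row.
students, buyers, pct = conn.execute("""
SELECT trs.n, prs.n, 100.0 * prs.n / trs.n
FROM (SELECT COUNT(DISTINCT Recipient_ID) AS n FROM transactions) AS trs
CROSS JOIN
     (SELECT COUNT(DISTINCT Student_ID) AS n FROM purchases) AS prs
""").fetchone()

print(students, buyers, pct)
```

COUNT(DISTINCT ...) matters on both sides: without it, a student with several transactions or purchases would be counted more than once.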
