Most common values for a group dependent on a select query - mysql

I'm breaking my head over how to do this one in SQL. I have a table:
| User_id | Question_ID | Answer_ID |
| 1 | 1 | 1 |
| 1 | 2 | 10 |
| 2 | 1 | 2 |
| 2 | 2 | 11 |
| 3 | 1 | 1 |
| 3 | 2 | 10 |
| 4 | 1 | 1 |
| 4 | 2 | 10 |
It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})
I'm trying to answer this query: for a particular question and answer ID (belonging to that question), I want to find the most common answer given to the OTHER questions by the users who gave that answer.
For example, For the above table:
For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
For Answer_ID = 2 - (Question 2 - Answer ID 11)
Is it possible to do in one query? Should it be done in one query? Shall I just use stored procedure or Java for that one?

Though @rick-james is right, I am not sure it is easy to get started when you do not know how queries like this are usually written for MySQL.
You need a query to find out the most common answers to questions:
SELECT
question_id,
answer_id,
COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC
This would return a table where for each question_id we output counts in descending order.
| 1 | 1 | 3 |
| 1 | 2 | 1 |
| 2 | 10 | 3 |
| 2 | 11 | 1 |
And now we have to solve a so-called greatest-n-per-group task. The problem is that in MySQL, for the sake of performance, tasks like this are usually solved not in pure SQL but with hacks that rely on knowledge of how queries are processed internally.
In this case we know that we can define a variable and, while iterating over the prepared table, remember the previous row, which lets us distinguish the first row in a group from the others.
SELECT
question_id, answer_id, cnt,
IF(question_id=@q_id, NULL, @q_id:=question_id) as v
FROM (
SELECT
question_id, answer_id, COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC) cnts
JOIN (
SELECT @q_id:=-1
) as init;
Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly cast later). Here is the result:
| 1 | 1 | 3 | 1 |
| 1 | 2 | 1 |(null)|
| 2 | 10 | 3 | 2 |
| 2 | 11 | 1 |(null)|
Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:
SELECT
question_id, answer_id
FROM (
SELECT
question_id, answer_id
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, COUNT(*) DESC) cnts
JOIN (
SELECT @q_id:=-1
) as init
WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;
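The "remember the previous row" idea behind the variable trick can also be sketched outside the database. The snippet below is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the names counts, previous_question and top_answers are made up for the example. It runs the same counting query and keeps the first row of each question_id group.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_answers (
    user_id INTEGER,
    question_id INTEGER,
    answer_id INTEGER,
    UNIQUE (question_id, answer_id, user_id)
);
INSERT INTO user_answers VALUES
    (1, 1, 1), (1, 2, 10),
    (2, 1, 2), (2, 2, 11),
    (3, 1, 1), (3, 2, 10),
    (4, 1, 1), (4, 2, 10);
""")

# Same counting query as in the answer, most frequent answer first per question.
counts = conn.execute("""
    SELECT question_id, answer_id, COUNT(*) AS cnt
    FROM user_answers
    GROUP BY question_id, answer_id
    ORDER BY question_id, cnt DESC, answer_id
""").fetchall()

# Keep a row only when its question_id differs from the previous row's,
# which is what the user-variable comparison does inside MySQL.
top_answers = []
previous_question = None
for question_id, answer_id, cnt in counts:
    if question_id != previous_question:
        top_answers.append((question_id, answer_id))
        previous_question = question_id

print(top_answers)
```

Ties are broken here by the lower answer_id, which is a choice made for the sketch and not something the original query specifies.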
The last thing worth mentioning: for the query to be efficient you need the right indexes. This query requires an index starting with the (question_id, answer_id) columns. Since you need a UNIQUE index anyway, it makes sense to define it in this order: (question_id, answer_id, user_id).
CREATE TABLE user_answers (
user_id INTEGER,
question_id INTEGER,
answer_id INTEGER,
UNIQUE INDEX (question_id, answer_id, user_id)
) engine=InnoDB;
Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.

Do you want a fish? Or do you want to learn how to fish?
Your question seems to have multiple steps.
Fetch info about "questions by users with the given answer". Devise this SELECT and imagine that the results form a new table.
Apply the "OTHER" restriction. This is probably a minor AND ... != ... added to SELECT #1.
Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1. It is likely to use a derived table:
SELECT ...
FROM ( select#2 )
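The three steps above can be sketched roughly as follows. This is one possible reading, not the answerer's own query: it assumes the table is called user_answers, uses Python's sqlite3 module in place of MySQL, and the function name most_common_other_answers is hypothetical. A self-join picks the users who gave the answer, the <> condition applies the "OTHER" restriction, and the first row per question in the ordered counts is the most common answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER, answer_id INTEGER);
INSERT INTO user_answers VALUES
    (1, 1, 1), (1, 2, 10),
    (2, 1, 2), (2, 2, 11),
    (3, 1, 1), (3, 2, 10),
    (4, 1, 1), (4, 2, 10);
""")


def most_common_other_answers(conn, question_id, answer_id):
    """Map each OTHER question to (most common answer_id, count) among the
    users who gave answer_id to question_id."""
    rows = conn.execute("""
        SELECT o.question_id, o.answer_id, COUNT(*) AS cnt
        FROM user_answers AS g                      -- step 1: users with the given answer
        JOIN user_answers AS o
          ON o.user_id = g.user_id
         AND o.question_id <> g.question_id         -- step 2: only OTHER questions
        WHERE g.question_id = ? AND g.answer_id = ?
        GROUP BY o.question_id, o.answer_id
        ORDER BY o.question_id, cnt DESC, o.answer_id
    """, (question_id, answer_id)).fetchall()

    # Step 3: rows are sorted by count within each question, so the first
    # row seen for a question is its most common answer.
    best = {}
    for other_question, other_answer, cnt in rows:
        best.setdefault(other_question, (other_answer, cnt))
    return best


print(most_common_other_answers(conn, 1, 1))
print(most_common_other_answers(conn, 1, 2))
```

For a single other question, the Python loop could be replaced by ORDER BY COUNT(*) DESC LIMIT 1 as suggested in step 3.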

Your question has several conditions. First get the questions, along with the users who asked them, from the question table:
select question_id,user_id from question
Then insert the answer to the asked question and do some checks in your Java code, such as whether the answering user is the same user who asked the question, or whether the user has already answered this question before.
select question_id,user_id from question where user_id=asking-user_id // gets all questions and show on UI
select answer_id,user_id from answer where user_id=answering-user_id // checks the answers that particular user

How to count distinct values from two columns into one number

The two tables I'm working on are these:
Submissions:
+----+------------+
| id | student_id |
+----+------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+----+------------+
Group_submissions:
+----+---------------+------------+
| id | submission_id | student_id |
+----+---------------+------------+
| 1 | 1 | 2 |
| 2 | 2 | 1 |
+----+---------------+------------+
Only one student actually makes the submission and goes into the submissions table while the others go to the group_submissions table(if the submission is a group submission)
I want to count the unique number of students that have made submission either as a group or alone
I want just the number to be returned in the end (3 based on the data on the tables above)
A student that is in the submissions table should not be counted twice if he is in the group_submission table and vice-versa.
Students who have only made individual submissions (who are not in the group_submissions table) should also be counted, regardless of whether they have ever been in a group submission.
I'm already doing some other operations on these table in a query I'm building so if you can give me a solution based on joining these two tables that would help.
This is what i have tried:
count(distinct case when group_submissions.student_id is not null then group_submissions.student_id end) + count(distinct case when submissions.student_id is not null then submissions.student_id end)
But it gives me duplicates so if a student is in both tables he is counted two times.
Any ideas?
NOTE: This is a MySQL database.
I think you want union and a count:
select count(*)
from ((select student_id
from submissions
)
union -- on purpose to remove duplicates
(select student_id
from group_submissions
)
) s;
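A quick way to check the UNION approach against the sample data is sketched below, assuming Python's sqlite3 module as a stand-in for MySQL. SQLite does not accept the parenthesised SELECTs used above, so the inner parentheses are dropped; the variable name total is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submissions (id INTEGER PRIMARY KEY, student_id INTEGER);
CREATE TABLE group_submissions (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER,
    student_id INTEGER
);
INSERT INTO submissions VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO group_submissions VALUES (1, 1, 2), (2, 2, 1);
""")

# UNION (without ALL) removes duplicate student_ids before they are counted.
total = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT student_id FROM submissions
        UNION
        SELECT student_id FROM group_submissions
    ) AS s
""").fetchone()[0]

print(total)
```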
After the clarification, I don't think it is wise to force this computation into the join. You can instead compute the count as a separate, simple expression: UNION the two student_id lists (UNION removes duplicates) and count the result.
OLD ANSWER BELOW THAT DOES NOT FIT THE PROBLEM:
Very simple fix is needed to your current version...
count(distinct case when group_submissions.student_id is not null then
group_submissions.student_id when assignment_submissions.student_id is
not null then assignment_submissions.student_id end)
Note:
your original expression adds two CASE expressions, each with a single WHEN inside;
I have turned it into a single CASE expression with two WHENs.

MySQL: select all rows where just the name is distinct [duplicate]

I'm currently trying to select unique entries in only the name column. I have tried using this query but it will not return prices that are the same as well. I've tried other variations with no success either.
SELECT DISTINCT name, price from table;
Here's the table I'm working with:
+----+-----------+-------+
| id | name      | price |
+----+-----------+-------+
| 1  | Henry     | 20    |
| 2  | Henry     | 30    |
| 3  | Robert    | 20    |
| 4  | Joshua    | 10    |
| 5  | Alexander | 30    |
+----+-----------+-------+
The output that I'm seeking is:
+----+-----------+-------+
| id | name      | price |
+----+-----------+-------+
| 1  | Henry     | 20    |
| 3  | Robert    | 20    |
| 4  | Joshua    | 10    |
| 5  | Alexander | 30    |
+----+-----------+-------+
The desired output as you can tell only removed the duplicate name and none of the prices. Is there something I can add to my query above to only select unique entries in the name column? Any help is really appreciated as I have tried to find a solution on here, Google, DuckDuckGo, etc. with no luck.
From your sample data, this should work.
SELECT MIN(Id) AS Id, name, MIN(price) AS price
FROM table
GROUP BY name;
This is what GROUP BY is for:
SELECT * FROM `table` GROUP BY `name`
Usually people run into trouble because they now get an arbitrarily chosen row when more than one matches a given name; you have to use aggregate functions to pick a specific one, e.g. "the one with the maximum price".
But in your case, since you don't seem to care which row is returned, this works as-is, provided ONLY_FULL_GROUP_BY is disabled; since MySQL 5.7.5 that mode is on by default and rejects this query.
So you want a distinct list of names and then the entire row for each of them? Try this query: the derived table is just a list of unique ids (the lowest id per name), and each id is joined back to the table to fetch the full row.
Select n.*
From nameprices n
Join (Select MIN(id) as id
From nameprices
Group by name
Order By id) aTemp On (aTemp.id=n.id);
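The join-back pattern can be tried on the sample data with the sketch below. It assumes Python's sqlite3 module in place of MySQL and reuses the nameprices table name from the query above; the alias first_ids and the variable rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nameprices (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
INSERT INTO nameprices VALUES
    (1, 'Henry', 20),
    (2, 'Henry', 30),
    (3, 'Robert', 20),
    (4, 'Joshua', 10),
    (5, 'Alexander', 30);
""")

# The derived table holds the lowest id per name; joining it back to the
# table returns the complete row for each of those ids.
rows = conn.execute("""
    SELECT n.id, n.name, n.price
    FROM nameprices AS n
    JOIN (
        SELECT MIN(id) AS id
        FROM nameprices
        GROUP BY name
    ) AS first_ids ON first_ids.id = n.id
    ORDER BY n.id
""").fetchall()

for row in rows:
    print(row)
```

Because every selected column comes from a real row, name and price always belong together, which is not guaranteed when MIN() is applied to each column separately.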
This is a common problem in SQL: we want the full row data, but the filter uses a DISTINCT/GROUP BY formula.

SQL query to find duplicate rows and return both IDs

I have a table of customers:
id | name | email
--------------------------
1 | Rob | spam@email.com
2 | Jim | spam@email.com
3 | Dave | ham@email.com
4 | Fred | eggs@email.com
5 | Ben | ham@email.com
6 | Tom | ham@email.com
I'm trying to write an SQL query that returns all the rows with duplicate email addresses but... I'd like the query result to return the original ID and the duplicate ID. (The original ID is the first occurrence of the duplicate email.)
The desired result:
original_id | duplicate_id | email
-------------------------------------------
1 | 2 | spam@email.com
3 | 5 | ham@email.com
3 | 6 | ham@email.com
My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?
We could handle this using a join, but I might actually go for an option which generates a CSV list of id corresponding to duplicates:
SELECT
email,
GROUP_CONCAT(id ORDER BY id) AS duplicate_ids
FROM yourTable
GROUP BY email
HAVING COUNT(*) > 1
Functionally speaking, this gives you the same information you wanted in your question, but in what is a much simplified form in my opinion. Because we order the id values when concatenating, the original id will always appear first, on the left side of the CSV list. Also, if you have many duplicates your requested output could become verbose and harder to read.
select
orig.original_id,
t.id as duplicate_id,
orig.email
from t
inner join (select min(id) as original_id, email
from t
group by email
having count(*)>1) orig on orig.email = t.email
where t.id != orig.original_id
The subquery finds, for each email that has duplicates, the minimal id, which we treat as the original.
We then join the subquery back to the table by email and keep every row whose id differs from the original.
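A runnable sketch of this self-join on the question's sample data is given below. It assumes Python's sqlite3 module as a stand-in for MySQL and a table named customers; it filters with WHERE, since SQLite only allows HAVING on aggregate queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
INSERT INTO customers VALUES
    (1, 'Rob',  'spam@email.com'),
    (2, 'Jim',  'spam@email.com'),
    (3, 'Dave', 'ham@email.com'),
    (4, 'Fred', 'eggs@email.com'),
    (5, 'Ben',  'ham@email.com'),
    (6, 'Tom',  'ham@email.com');
""")

# orig: one row per duplicated email, carrying the lowest id as the original.
# Joining back by email and excluding the original leaves the duplicates.
pairs = conn.execute("""
    SELECT orig.original_id, t.id AS duplicate_id, orig.email
    FROM customers AS t
    JOIN (
        SELECT MIN(id) AS original_id, email
        FROM customers
        GROUP BY email
        HAVING COUNT(*) > 1
    ) AS orig ON orig.email = t.email
    WHERE t.id <> orig.original_id
    ORDER BY orig.original_id, t.id
""").fetchall()

for pair in pairs:
    print(pair)
```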
UPDATE: http://rextester.com/BLIHK20984 cloned @Tim Biegeleisen's answer

SELECT * FROM questions WHERE test_id

I have two SQL tables:
Tests
id | name
1 | simple
2 | advanced
3 | professional
Questions
id | Question | answer a | answer b | test_id
1 | Working? | Yes | no | 1,3
2 | Do you smoke? | Yes | no | 2,3
3 | You have a driving license? | Yes | No | 2
Questions should be displayed only if in "test_id" is the id of the desired test, for example, test 2 should contain only questions 2 and 3
SELECT * FROM questions WHERE... (array "test_id" contains 2)
This is not a duplicate: the suggested question uses a query like "WHERE ... value=(1,2,3,4)", whereas in my problem the query is like "... WHERE (1,2,3,4,5)=value". It is the opposite, so the "in" method does not apply.
You should normalize your database, creating a third table to link questions with the tests they appear on:
Tests:
id | name
1 | simple
2 | advanced
3 | professional
Questions:
id | question | answer a | answer b
1 | Working? | Yes | No
2 | Do you smoke? | Yes | No
3 | You have a driving license? | Yes | No
TestQuestions:
id | test_id | question_number | question_id
1 | 1 | 1 | 1
2 | 2 | 1 | 2
3 | 2 | 2 | 3
4 | 3 | 1 | 1
5 | 3 | 2 | 2
You can then fetch the questions for an individual test by joining the TestQuestions table with Questions:
SELECT Questions.*
FROM TestQuestions
INNER JOIN Questions ON Questions.id = TestQuestions.question_id
WHERE TestQuestions.test_id = 2
ORDER BY TestQuestions.question_number
If you are unable to modify your database tables, then you could also determine the questions in a test using MySQL's FIND_IN_SET function:
SELECT * From Questions WHERE FIND_IN_SET('2', test_id) > 0
It is preferable to use the normalized database structure though since this allows the ordering of questions on a test to be expressed and indexes can also be created to allow better performance.
To evaluate the FIND_IN_SET query, the database must consider every row in the Questions table to see if it matches. By adding the following index to the normalized database, the database would be able to seek directly to the relevant questions for a test:
CREATE UNIQUE INDEX test_id_question_number ON TestQuestions (test_id, question_number)
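The normalized layout can be tried out with the sketch below, which assumes Python's sqlite3 module in place of MySQL. The answer columns are left out of Questions for brevity, and the helper question_ids_for_test is hypothetical; the table contents are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tests (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Questions (id INTEGER PRIMARY KEY, question TEXT);
CREATE TABLE TestQuestions (
    id INTEGER PRIMARY KEY,
    test_id INTEGER,
    question_number INTEGER,
    question_id INTEGER
);
CREATE UNIQUE INDEX test_id_question_number
    ON TestQuestions (test_id, question_number);

INSERT INTO Tests VALUES (1, 'simple'), (2, 'advanced'), (3, 'professional');
INSERT INTO Questions VALUES
    (1, 'Working?'),
    (2, 'Do you smoke?'),
    (3, 'You have a driving license?');
INSERT INTO TestQuestions VALUES
    (1, 1, 1, 1), (2, 2, 1, 2), (3, 2, 2, 3), (4, 3, 1, 1), (5, 3, 2, 2);
""")


def question_ids_for_test(conn, test_id):
    """Return the question ids of one test, in question_number order."""
    rows = conn.execute("""
        SELECT Questions.id
        FROM TestQuestions
        INNER JOIN Questions ON Questions.id = TestQuestions.question_id
        WHERE TestQuestions.test_id = ?
        ORDER BY TestQuestions.question_number
    """, (test_id,)).fetchall()
    return [row[0] for row in rows]


print(question_ids_for_test(conn, 2))
print(question_ids_for_test(conn, 3))
```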
You really should include another table that maps the question to the test. The way you have it now, the questions cannot be easily linked to the test. For example, if you try to use a LIKE comparison you may get questions back that you didn't intend. Say you are looking for questions for test 2:
SELECT * FROM questions WHERE test_id LIKE '%2%';
You would also retrieve items for tests 12, anything in the 20s and 32, 42, etc.
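The false-positive problem is easy to reproduce. The sketch below assumes Python's sqlite3 module in place of MySQL and adds one hypothetical question (id 4) that belongs only to a made-up test 12. It compares the naive LIKE with a match on the list wrapped in commas; SQLite concatenates with ||, where MySQL would use CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (id INTEGER PRIMARY KEY, question TEXT, test_id TEXT);
INSERT INTO questions VALUES
    (1, 'Working?', '1,3'),
    (2, 'Do you smoke?', '2,3'),
    (3, 'You have a driving license?', '2'),
    (4, 'Hypothetical question for test 12', '12');  -- not in the original data
""")

# Naive substring match: '12' also contains the character '2'.
naive = [row[0] for row in conn.execute(
    "SELECT id FROM questions WHERE test_id LIKE '%2%' ORDER BY id")]

# Wrapping the list in commas makes every id, including the first and the
# last, look like ',<id>,' so only whole ids match.
exact = [row[0] for row in conn.execute(
    "SELECT id FROM questions "
    "WHERE ',' || test_id || ',' LIKE '%,2,%' ORDER BY id")]

print(naive)
print(exact)
```

Either form still has to scan every row, so this only fixes correctness, not performance.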
You should create a new table called testquestions (for example). This would have two fields
test_id
question_id
And then this data (using what you showed above)
test_id | question_id
1 | 1
2 | 2
2 | 3
3 | 1
3 | 2
Now you can run a query like this
SELECT
tests.name,
questions.question,
questions.`answer a`,
questions.`answer b`
FROM (
tests LEFT JOIN testquestions ON tests.id = testquestions.test_id
) LEFT JOIN
questions ON questions.id = testquestions.question_id
WHERE
tests.id = 2;
If you need to search one test id at a time you can use
SELECT * FROM questions WHERE CONCAT(',', test_id, ',') LIKE '%,1,%'
I think Phil Ross offers the best solution (the only thing I would change is to store the Yes/No answers as true/false or as a small int 0/1). But if you want to keep your tables as they are, you can do something like this:
SELECT q.* FROM Questions q
INNER JOIN tests t
ON t.id = 2
AND q.test_id regexp(CONCAT('(^|,)',t.id, '(,|$)'));
Here is SQL Fiddle to see how it works...
P.S. The query also joins the first table, so you can select values from it if needed...
GL!
EDIT: Fixed on Phil suggestion... Tnx Phil :)

Counting votes in a MySQL table only once or twice

I've got the following table:
+-----------------+
| id| user | vote |
+-----------------+
| 1 | 1 | text |
| 2 | 1 | text2|
| 3 | 2 | text |
| 4 | 3 | text3|
| 5 | 2 | text |
+-----------------+
What I want to do is to count the "votes"
SELECT COUNT(vote), vote FROM table GROUP BY vote
That works fine. Output:
+-------------------+
| count(vote)| vote |
+-------------------+
| 3 | text |
| 1 | text2|
| 1 | text3|
+-------------------+
But now I only want to count the first or the first and the second vote from a user.
So result what I want is (if I count only the first vote):
+-------------------+
| count(vote)| vote |
+-------------------+
| 2 | text |
| 1 | text3|
+-------------------+
I tried to work with count(distinct...) but can't get it to work.
Any hint in the right direction?
You can do this in a single SQL statement with something like this:
SELECT t.vote, COUNT(*)
FROM table1 t
JOIN
(
SELECT user, MIN(id) AS first_id
FROM table1
GROUP BY user
) d ON d.first_id = t.id
GROUP BY t.vote
Note that this only gives you 1 vote not 1 or 2.
The easiest way would be to use one of the "row numbering" solutions listed in this SO question. Then your original query's almost there:
SELECT
COUNT(vote),
vote
FROM tableWithRowNumberAdded
WHERE MadeUpRowNumber IN (1,2)
GROUP BY vote
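One way to get such a row number without window functions is a correlated subquery that counts how many votes the same user cast up to and including the current one. The sketch below assumes Python's sqlite3 module in place of MySQL, a table named votes, and that a lower id means an earlier vote; the helper count_first_votes is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (id INTEGER PRIMARY KEY, user INTEGER, vote TEXT);
INSERT INTO votes VALUES
    (1, 1, 'text'),
    (2, 1, 'text2'),
    (3, 2, 'text'),
    (4, 3, 'text3'),
    (5, 2, 'text');
""")


def count_first_votes(conn, n):
    """Count votes, considering only each user's first n votes (by id)."""
    return conn.execute("""
        SELECT v.vote, COUNT(*) AS cnt
        FROM votes AS v
        WHERE (
            -- position of this vote among the same user's votes
            SELECT COUNT(*)
            FROM votes AS earlier
            WHERE earlier.user = v.user AND earlier.id <= v.id
        ) <= ?
        GROUP BY v.vote
        ORDER BY v.vote
    """, (n,)).fetchall()


print(count_first_votes(conn, 1))
print(count_first_votes(conn, 2))
```

The subquery runs once per row, so on a large table a proper row-numbering approach is likely to be faster.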
My alternative is much longer winded and calls for working tables. These can be "real" tables in your schema, or whatever flavour of intermediate resultsets you are comfortable with.
Start by getting the first vote for each user:
SELECT user, min(id) FROM table GROUP BY user
Put this in a working table; let's call it FirstVote. Next we can get each user's second vote, if any:
SELECT user, min(id) FROM table WHERE id not in (select id from FirstVote) GROUP BY user
Let's call the result of this SecondVote. UNION FirstVote to SecondVote, join this to the original table and group by vote. There's your answer!
SELECT
vote,
COUNT(*)
FROM table
INNER JOIN
(
SELECT id FROM FirstVote
UNION ALL
SELECT id FROM SecondVote
) as BothVotes
ON BothVotes.id = table.id
GROUP BY vote
Of course it could be structured as a single statement with multiple sub-queries but that would be horrendous to maintain, or read in this forum.
This is a very tricky question for MySQL. Other systems have window functions, which perform a calculation across a set of table rows that are somehow related to the current row.
MySQL versions before 8.0 lack this functionality, so one has to look for a workaround. Here is the problem description and a couple of suggested solutions: MySQL and window functions.
I also assume that first 2 votes by the User can be determined by Id: earlier vote has smaller Id.
Based on this I would suggest this solution to your problem:
SELECT
Vote,
COUNT(*)
FROM
Table,
(
SELECT
user_id, SUBSTRING_INDEX(GROUP_CONCAT(Id ORDER BY Id ASC), ',', 2) AS top_IDs_per_user
FROM
Table
GROUP BY
user_id
) s_top_IDs_per_User
WHERE
Table.user_id = s_top_IDs_per_User.User_id and
FIND_IN_SET(Id, s_top_IDs_per_User.top_IDs_per_user)
GROUP BY Vote
;