How do I fix this query? - mysql

I'm writing an application where people ask questions, and get answers in the form of a survey. Each question has 2 options, plus a default option(s). When a person answers the question, they can choose from either the 2 options set by the asker, or the default option(s) chosen by me. For instance, if the question is Vanilla vs. Chocolate, options will be Vanilla, Chocolate, and Neither. I want to be able to tabulate the percentage of options chosen for a question, i.e., 25% say chocolate, 30% say vanilla, 45% say neither.
I'll start by showing the table structure and the query I'm running.
These are the tables involved (Note: these are not the full table structures):
--questions--
id
user_id
topic
description
--options--
id
text
default (bool)
--questions_options--
question_id
option_id
--answers--
id
question_id
user_id
option_id
Here is the query:
SELECT
options.id AS option_id, options.text, options.default,
ROUND(
IFNULL(
(COUNT(answers.option_id) * 100)
/
(SELECT COUNT(answers.option_id) FROM answers WHERE question_id = QUESTION_ID)
, 0)
, 2) AS percentage
FROM options
LEFT JOIN questions_options ON questions_options.option_id = options.id
LEFT JOIN answers ON answers.option_id = options.id
WHERE questions_options.question_id = QUESTION_ID
OR options.default = '1'
GROUP BY options.id
ORDER BY percentage DESC, option_id ASC
Where QUESTION_ID is an integer constant.
The problem is the query is not limiting answers to only those given for a particular question, and because the options are many to many with questions, I'm getting results like 600% for vanilla (if multiple questions use vanilla as an option). In cases where the options are unique to ONE question, then the percentages make sense, except for the default options, which are present for all questions. I tried putting WHERE answers.question_id = QUESTION_ID in there, but it did not work.
Any solutions?
Thanks

You're doing the joins in the wrong direction - you're looking at options first, even though you have specifically stated you want things tabulated by question. This means that you're getting results for all options, regardless of whether or not they even relate to your question...
Oh, and I'm assuming that answers are mapped to questions through answers.question_id, or you're not going to be able to get any meaningful results (that is - answers are not otherwise mapped to questions...)
Try this query instead:
SELECT b.id, b.text, b.default, (SELECT IFNULL(
ROUND((COUNT(c.id) * 100) /
(SELECT COUNT(d.id)
FROM answers as d
WHERE d.question_id = a.question_id)
, 2)
, 0)
FROM answers as c
WHERE c.question_id = a.question_id
AND c.option_id = a.option_id) as percentage
FROM questions_options as a
JOIN options as b
ON b.id = a.option_id
WHERE a.question_id = QUESTION_ID
ORDER BY percentage DESC, a.option_id ASC
Please note that I don't have a copy of MySQL to run this against, and I would normally implement with CTEs (which I have been informed are not supported for MySQL).
EDIT:
In light of the fact that 'default' options may not be mapped through the questions_options table, try this:
SELECT a.id, a.text, a.default, IFNULL(
ROUND((b.answerCount * 100) /
(SELECT COUNT(c.id)
FROM answers as c
WHERE c.question_id = QUESTION_ID)
, 2)
, 0) AS percentage
FROM options as a
LEFT JOIN (SELECT c.option_id, count(c.id) as answerCount
FROM answers as c
WHERE c.question_id = QUESTION_ID
GROUP BY c.option_id) as b
ON b.option_id = a.id
Please note that you will still get "meaningless" '0' results for every 'default' answer that was not presented to survey respondents, with no way to distinguish these from actual '0' results for 'default' answers that were presented to respondents. You are likely to be far better off placing the so-called 'default' options in the questions_options table. As it is, you have no way to determine all the options that were presented to respondents (just which ones you have answers to, which is quite different); this may be a huge business-accountability issue for your company. In addition, some 'default' options may not make sense in context - "Do you prefer your tea hot or cold", "Yes".
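To make the idea concrete, here is a minimal sketch of the "count answers per option for one question, then join" approach. It is not the answerer's exact query: it runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, the sample rows are made up, and it keeps only options that are either mapped to the question or flagged as default. The column name default is quoted because it is a keyword (MySQL would use backticks).

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE options (id INTEGER PRIMARY KEY, text TEXT, "default" INTEGER);
CREATE TABLE questions_options (question_id INTEGER, option_id INTEGER);
CREATE TABLE answers (id INTEGER PRIMARY KEY, question_id INTEGER,
                      user_id INTEGER, option_id INTEGER);
-- Made-up sample data: Vanilla is an option for questions 1 and 2.
INSERT INTO options VALUES (1,'Vanilla',0),(2,'Chocolate',0),
                           (3,'Strawberry',0),(4,'Neither',1);
INSERT INTO questions_options VALUES (1,1),(1,2),(2,1),(2,3);
INSERT INTO answers (question_id, user_id, option_id) VALUES
  (1,10,1),(1,11,2),(1,12,4),(1,13,4),
  (2,10,1),(2,11,1),(2,12,3);
""")

PERCENTAGES = """
SELECT o.id, o.text,
       ROUND(IFNULL(a.cnt, 0) * 100.0 /
             (SELECT COUNT(*) FROM answers WHERE question_id = :qid), 2) AS percentage
FROM options AS o
LEFT JOIN (SELECT option_id, COUNT(*) AS cnt      -- answers to this question only
           FROM answers
           WHERE question_id = :qid
           GROUP BY option_id) AS a ON a.option_id = o.id
WHERE o."default" = 1
   OR o.id IN (SELECT option_id FROM questions_options WHERE question_id = :qid)
ORDER BY percentage DESC, o.id ASC
"""

rows = conn.execute(PERCENTAGES, {"qid": 1}).fetchall()
for row in rows:
    print(row)
```

Because the answer counts are restricted to one question before the join, Vanilla's answers to question 2 no longer leak into question 1, and the percentages for a question add up to 100.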

Issues that I can see:
You GROUP BY options.id which means you are getting random values for options.text and options.default. This may or may not change your results depending on the structure of your data. If there are multiple rows per id then it will be inaccurate or misleading data.
You have a WHERE clause for your divisor but not your dividend in the percentage calculation - this means you will never have a lower count for the dividend. Try putting a WHERE question_id = QUESTION_ID to the first COUNT statement.

Related

Show only results from table where a certain combination of variables is not present in other table?

I want to build a system in which users can submit quiz questions and users can also rate those questions.
There are two tables relevant to my problem:
"Question", which contains a questionID, a userID (from the user who submitted the question), a correct answer, a wrong answer, another wrong answer and a boolean that is set to true when the question survived the validation process.
"Validation", which contains a questionID for the question being validated, a userID for the user who validated it and a validation (0, 1 or 2 depending on the rating the user gave the question).
To give users questions that need rating, I need a MySQL query with which I receive a question that is:
not through validation yet (validated bool = false)
not made by the user requesting it
not already validated by the user requesting it
the first question in the list of results with these factors
EDIT: I found a solution, look at the bottom edit.
I have tried the following query:
set @UserId=5;
SELECT q.id, q.question, q.correct_answer, q.wrong_answer1, q.wrong_answer2
FROM question q
LEFT JOIN validation v ON v.question_id=q.id
WHERE q.validated = 0
AND q.user_id!=@UserId
AND v.user_id!=@UserId
ORDER BY q.id
LIMIT 1
I exclude questions already through validation with WHERE q.validated = 0.
I make sure it's the first question in the list of results with ORDER BY q.id LIMIT 1
I exclude questions made by the user requesting it with q.user_id!=@UserId
This query returns nothing, though.
The question table contains some unvalidated questions. The validation table is empty.
I know the mistake lies somewhere within the LEFT JOIN validation v ON v.question_id=q.id and v.user_id!=@UserId parts, but I don't know how to translate my intent into MySQL.
EDIT: I found a solution that worked for my problem:
set @UserId=5;
SELECT q.id, q.question, q.correct_answer, q.wrong_answer1, q.wrong_answer2 FROM question q
WHERE NOT EXISTS
(SELECT * FROM validation v WHERE v.user_id=@UserId AND q.id = v.question_id)
AND q.validated = 0 AND q.user_id!=@UserId
But, I read this method is very bad for performance.
Is there a more performant method?
You have to check for null in v.user_id because there is no entry.
This query works:
set @UserId=5;
SELECT q.id, q.question, q.correct_answer, q.wrong_answer1, q.wrong_answer2
FROM question q
-- Join only this user's validations, so a NULL v.user_id means "not yet rated by this user".
LEFT JOIN validation v ON v.question_id=q.id AND v.user_id=@UserId
WHERE q.validated = 0
AND q.user_id!=@UserId
AND v.user_id IS NULL
ORDER BY q.id
LIMIT 1
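For anyone who wants to check the behaviour, here is a small sketch that compares three ways of writing this filter. It uses SQLite via Python's sqlite3 as a stand-in for MySQL, a cut-down question table, and made-up rows in which user 5 wrote question 1 and has already rated question 2. The constant names (LEFT_JOIN, NOT_EXISTS, WHERE_FILTER) are only labels for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id INTEGER PRIMARY KEY, user_id INTEGER, validated INTEGER);
CREATE TABLE validation (question_id INTEGER, user_id INTEGER, validation INTEGER);
-- Made-up data: user 5 wrote question 1 and has already rated question 2.
INSERT INTO question VALUES (1,5,0),(2,7,0),(3,7,0),(4,8,1);
INSERT INTO validation VALUES (2,5,1),(2,6,2),(3,6,0);
""")

# Anti-join: the user filter sits in ON, the NULL test in WHERE.
LEFT_JOIN = """
SELECT q.id FROM question q
LEFT JOIN validation v ON v.question_id = q.id AND v.user_id = :uid
WHERE q.validated = 0 AND q.user_id != :uid AND v.user_id IS NULL
ORDER BY q.id LIMIT 1
"""
NOT_EXISTS = """
SELECT q.id FROM question q
WHERE q.validated = 0 AND q.user_id != :uid
  AND NOT EXISTS (SELECT 1 FROM validation v
                  WHERE v.user_id = :uid AND v.question_id = q.id)
ORDER BY q.id LIMIT 1
"""
# Filtering v.user_id in WHERE after an unfiltered join: another user's
# validation row is enough to let an already-rated question through.
WHERE_FILTER = """
SELECT q.id FROM question q
LEFT JOIN validation v ON v.question_id = q.id
WHERE q.validated = 0 AND q.user_id != :uid
  AND v.user_id IS NULL OR v.user_id != :uid
ORDER BY q.id LIMIT 1
"""

left_join_id = conn.execute(LEFT_JOIN, {"uid": 5}).fetchone()[0]
not_exists_id = conn.execute(NOT_EXISTS, {"uid": 5}).fetchone()[0]
where_filter_id = conn.execute(WHERE_FILTER, {"uid": 5}).fetchone()[0]
print(left_join_id, not_exists_id, where_filter_id)
```

On this data the first two forms agree on question 3, while the WHERE-only form hands back question 2, which user 5 has already rated. Which of the two correct forms is faster depends on the MySQL version and indexes, so it is worth checking with EXPLAIN rather than assuming.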

Getting count from joined table

Here's my problem: I need to get the amount of test cases and issues associated to a project that meet certain conditions (test cases that are successful, and issues that are flaws of the application), but for some reason the amount doesn't add up. I have 10 test cases in a project, of which 6 are successful; and 8 issues, of which only 4 are flaws. However, the respective results for COUNT each show 24, which makes no sense. I did notice, though, that 24 happens to be 6 times 4, but I don't see how the query would multiply them.
Anyway... Can someone help me find which part of my query is wrong? How can I get the correct result? Thanks in advance.
Here's the query:
SELECT
p.codigo_proyecto,
p.nombre,
IFNULL(COUNT(iep.id_incidencia_etapa_proyecto), 0) AS cantidad_defectos,
IFNULL(COUNT(tc.id_test_case), 0) AS test_cases_exitosos,
CASE IFNULL(COUNT(tc.id_test_case), 0) WHEN 0 THEN 'No aplica'
ELSE CONCAT((IFNULL(COUNT(tc.id_test_case), 0) / IFNULL(COUNT(tc.id_test_case), 0)) * 100, '%') END AS tasa_defectos
FROM proyecto p
INNER JOIN etapa_proyecto ep ON p.codigo_proyecto = ep.codigo_proyecto
INNER JOIN incidencia_etapa_proyecto iep ON ep.id_etapa_proyecto = iep.id_etapa_proyecto
INNER JOIN incidencia i ON iep.id_incidencia = i.id_incidencia
INNER JOIN test_case tc ON ep.id_etapa_proyecto = tc.id_etapa_proyecto
INNER JOIN etapa_proyecto ep_ultima ON ep_ultima.id_etapa_proyecto =
(SELECT ep_ultima2.id_etapa_proyecto FROM etapa_proyecto ep_ultima2
WHERE p.codigo_proyecto = ep_ultima2.codigo_proyecto ORDER BY ep_ultima2.fecha_termino_real DESC LIMIT 1)
WHERE p.esta_cerrado = 1
AND i.es_defecto = 1
AND tc.resultado = 'Exitoso'
AND ep_ultima.fecha_termino_real BETWEEN '2015-01-01' AND '2016-12-31';
I would have thought it obvious that you're not going to get the expected output from an aggregate query without a GROUP BY (which suggests you're not really in a position to evaluate any advice given here effectively).
You've not said how the states of your data are represented in the database - so I'm having to make a lot of guesses based on SQL which is clearly very wrong. And I don't speak Spanish/Portuguese or whatever your native language is.
It looks like you are inferring that a defect exists if the primary key of the defects table is null. Primary keys cannot be null. The only way this would make any sort of sense (BTW it still won't give you the answer you're looking for) is to do a LEFT JOIN rather than an INNER JOIN.
But even then, a simple COUNT(*) will count a row of NULLs (no record in the source table) as 1 record in the output set; only COUNT(column) skips NULL values.
Then you've got the problem that you will have the product of defects and test cases in your output - consider the case where you have no defects, but 2 test cases (1,2) - the result of an outer join will be:
defect test
------ ----
null 1
null 2
If you just count the rows, you'll get 2 defects in your output.
Taking a simpler schema, this demonstrates the 2 methods for getting the values - note that they have very different performance characteristics.
SELECT project.id
, dilv.defects
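, (SELECT COUNT(*)
FROM test_cases
-- correlate to the outer project; assumes test_cases has a project_id column
WHERE test_cases.project_id = project.id) AS tests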
FROM project
LEFT JOIN ( SELECT project_id, COUNT(*) AS defects
FROM defect_table
GROUP BY project_id) AS dilv
ON project.id=dilv.project_id
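The 6 times 4 = 24 effect from the question is easy to reproduce. Below is a minimal sketch on the simplified schema, run on SQLite through Python's sqlite3 as a stand-in for MySQL. The project_id column on test_cases and the row counts (4 defects, 6 tests) are assumptions chosen to mirror the numbers in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY);
CREATE TABLE defect_table (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE test_cases (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO project VALUES (1);
""")
conn.executemany("INSERT INTO defect_table (project_id) VALUES (?)", [(1,)] * 4)
conn.executemany("INSERT INTO test_cases (project_id) VALUES (?)", [(1,)] * 6)

# Joining both child tables first pairs every defect with every test case,
# so each COUNT sees 4 * 6 rows.
fan_out = conn.execute("""
SELECT COUNT(d.id), COUNT(t.id)
FROM project p
JOIN defect_table d ON d.project_id = p.id
JOIN test_cases t ON t.project_id = p.id
GROUP BY p.id
""").fetchone()

# Aggregating each child table on its own before joining keeps the counts apart.
pre_aggregated = conn.execute("""
SELECT IFNULL(d.defects, 0), IFNULL(t.tests, 0)
FROM project p
LEFT JOIN (SELECT project_id, COUNT(*) AS defects
           FROM defect_table GROUP BY project_id) d ON d.project_id = p.id
LEFT JOIN (SELECT project_id, COUNT(*) AS tests
           FROM test_cases GROUP BY project_id) t ON t.project_id = p.id
""").fetchone()

print(fan_out)         # (24, 24)
print(pre_aggregated)  # (4, 6)
```

The IFNULL wrappers turn a missing derived-table row into 0, so a project with no defects still shows up with a count of zero.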

Selecting multiple values from multiple tables

MySQL.
I have two tables, one is "Questions" and the other is "Answers"
The Questions table:
- question_id
- user_id
- question
The Answers table:
- answer_id
- question_id
- user_id
- answer
- correct
The goal is to get all questions (and associated answers) based on a user's id. I've been able to get all of the answers, however I'm only getting one question. I can see why it's only getting a single question, but I don't have any idea how to go about getting the question text for each answer.
Here's the code that I'm using right now, where id_in is an input value on a stored procedure. The issue is that it gives me all of the answers for each question, but all of them return the same question text. I feel like possibly a type of join would be better here, but we haven't started learning about them yet and I hardly know anything about them as is.
BEGIN
DECLARE question_text VARCHAR(40);
SELECT question INTO question_text FROM questions WHERE user_id = id_in;
SELECT question_text, Q.* FROM answers AS Q WHERE user_id = id_in;
END
Yes, this is homework. I'm just completely lost as to what I need to be doing.
A LEFT JOIN returns every row from the left table, plus only the matching rows from the right table. In my example I may have A and Q mixed up, but I think this is the general gist of it. Note that with a LEFT JOIN, a condition on the left table placed in the ON clause does not remove any left-table rows, so the user_id = id_in filter belongs in the WHERE clause.
SELECT
Q.QUESTION
, A.ANSWER
, A.CORRECT
FROM QUESTIONS Q
LEFT JOIN ANSWERS A
ON A.QUESTION_ID = Q.QUESTION_ID
AND A.USER_ID = Q.USER_ID
WHERE Q.USER_ID = ID_IN
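One detail worth checking for yourself is where a filter on the left table of a LEFT JOIN goes. The sketch below, on SQLite via Python's sqlite3 with made-up rows, runs the same join twice: once with the user filter in the ON clause and once in WHERE. The variable names in_on and in_where are only labels for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, user_id INTEGER, question TEXT);
CREATE TABLE answers (answer_id INTEGER PRIMARY KEY, question_id INTEGER,
                      user_id INTEGER, answer TEXT, correct INTEGER);
INSERT INTO questions VALUES (1,1,'2 + 2?'),(2,1,'Capital of France?'),(3,2,'3 * 3?');
INSERT INTO answers VALUES (1,1,1,'4',1),(2,1,1,'5',0),(3,2,1,'Paris',1),(4,3,2,'9',1);
""")
id_in = 1  # plays the role of the procedure's input parameter

# Filter in ON: every answers row survives; non-matching ones get a NULL question.
in_on = conn.execute("""
SELECT q.question, a.answer FROM answers a
LEFT JOIN questions q ON a.question_id = q.question_id AND a.user_id = ?
ORDER BY a.answer_id
""", (id_in,)).fetchall()

# Filter in WHERE: only this user's answers remain.
in_where = conn.execute("""
SELECT q.question, a.answer FROM answers a
LEFT JOIN questions q ON a.question_id = q.question_id
WHERE a.user_id = ?
ORDER BY a.answer_id
""", (id_in,)).fetchall()

print(in_on)
print(in_where)
```

With the filter in ON, the other user's answer still comes back, paired with a NULL question; with the filter in WHERE it is gone.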

MySQL joining table to itself and comparing results

MySQLFiddle here: http://sqlfiddle.com/#!2/15d447/1
I have a single table I am trying to work with:
Table 1: user_answers table (stores users answers to various questions)
The notable values that are stored are the users id (column uid), the question id for the question they are answering (column quid), the answer to the question (column answer) and the importance of their answer (column importance).
The end result I want:
I'd like to be able to grab all the questions that any two users have answered, excluding any answers from questions that have either not been answered by the other party, or answers to the same question which have a value of 1 for either user in importance. Again, this will only ever be used to compare two users at a time.
I've been pretty unsuccessful in my attempts, but here is what I've tried, just piecing things together:
#attempt one: trying to exclude answers that were not answered by both users
SELECT * FROM user_answers AS uid1
JOIN user_answers AS uid2 ON uid1.uid = uid2.uid
WHERE uid1.uid = 1
AND uid2.uid = 20008
AND uid1.quid IS NOT NULL
AND uid2.quid IS NOT NULL;
This returns no results but I'm not exactly sure why.
#attempt two: trying to exclude where answers are the same for both users
SELECT * FROM user_answers AS uid1
LEFT JOIN user_answers AS uid2 ON (uid1.uid = uid2.uid AND uid1.answer <> uid2.answer)
This gives me results, but seems to be doubling up on everything because of the join. I also tried in this attempt to eliminate any answers that were the same, which seems to be working in that sense.
Any guidance is appreciated.
Thanks.
You can answer your question using an aggregation query. The idea is to use the having clause to filter the rows for the conditions you are looking at.
Because you are not interested at all in questions with importance = 1 those are filtered using a where clause:
select ua.quid
from user_answers ua
where importance <> 1 and uid in (1, 20008)
group by ua.quid
having sum(uid = 1) > 0 and
sum(uid = 20008) > 0;
If you want to include the answers, you can do:
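select ua.quid,
group_concat(concat(uid, ':', answer) order by uid) as answers
from user_answers ua where importance <> 1 and uid in (1, 20008)
group by ua.quid having sum(uid = 1) > 0 and sum(uid = 20008) > 0;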
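Here is a small runnable sketch of the where-plus-having idea, using SQLite through Python's sqlite3 as a stand-in for MySQL (both treat a comparison such as uid = 1 as 0 or 1, so it can be summed). The table is cut down to the four columns mentioned in the question and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_answers (uid INTEGER, quid INTEGER, answer TEXT, importance INTEGER);
INSERT INTO user_answers VALUES
  (1, 100, 'a', 3), (20008, 100, 'b', 2),   -- both answered, kept
  (1, 101, 'a', 1), (20008, 101, 'a', 3),   -- importance 1 for user 1, dropped
  (1, 102, 'c', 2),                         -- only user 1 answered, dropped
  (20008, 103, 'd', 2), (7, 103, 'd', 2);   -- second answer is from another user, dropped
""")

# WHERE removes importance = 1 rows and other users; HAVING then requires
# at least one surviving row from each of the two users per question.
shared = conn.execute("""
SELECT ua.quid
FROM user_answers ua
WHERE ua.importance <> 1 AND ua.uid IN (?, ?)
GROUP BY ua.quid
HAVING SUM(ua.uid = ?) > 0 AND SUM(ua.uid = ?) > 0
""", (1, 20008, 1, 20008)).fetchall()

print(shared)
```

Only question 100 survives: 101 loses user 1's row to the importance filter, and 102 and 103 each have a row from just one of the two users.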
Just a simple version of what you need.
select *
from user_answers a,
user_answers b
where a.quid = b.quid
and a.uid <> b.uid
and 1 not in (a.importance, b.importance)
If you'd like just the questions, change the * to distinct a.quid
See it here on fiddle: http://sqlfiddle.com/#!2/15d447/15

Database Design/SQL Optimisation: WHERE <id> NOT IN (thousands of IDs)

I've been asked to add functionality to an application that lets users vote between two options: A and B. The table for those questions is very basic:
QUESTIONS
question_id (PK)
option_id_1(FK)
option_id_2(FK)
urgent (boolean)
Each time a user votes, the fact that they voted is stored in an equally simple table:
USER VOTES
vote_id (PK)
user_id (FK)
question_id (FK)
The algorithm for selecting which question appears when a user requests a new one is complex, but for our purposes we can assume it's random. So, the issue?
Each user will be voting on many questions. Likely hundreds, and possibly thousands. I need to ensure no user is presented with a question they've already voted on, and the only way I can think to do that will, I'm guessing, pound the server into oblivion. Specifically, something like:
SELECT * from questions WHERE question_id NOT in (SELECT question_id from user_votes WHERE user_id = <user_id>) ORDER BY RAND() LIMIT 1.
[Note: RAND() is not actually in the query - it's just there as a substitute for a slightly complex (order_by).]
So, keeping in mind that many users could well have voted on hundreds if not thousands of questions, and that it's not possible to present the questions in a set order...any ideas on how to exclude voted-on questions without beating my server into the ground?
All advice appreciated - many thanks.
The JOIN operator performs much better than nested queries in MySQL (that might have changed with the latest MySQL releases, but if you are experiencing performance problems then I guess my statement still holds).
What you could do is simply left join votes onto questions and pick only those records where no votes were joined (this user has not voted):
SELECT *
FROM questions q
LEFT JOIN user_votes uv ON
uv.question_id = q.question_id AND
uv.user_id = '<user_id>'
WHERE vote_id IS NULL
RAND() is nasty; however, this may mitigate the problem while giving you the results you need. Since you have mentioned that RAND() is only an example, I can't really provide more specific suggestions than the query below, but replacing the ORDER BY should work just fine.
The more you are able to limit the number of rows in the inner query, the faster the entire query will perform.
SELECT
q.*
FROM (
-- First get the questions which have not been answered
SELECT
questions.*
FROM questions
LEFT JOIN user_votes
ON user_votes.question_id = questions.question_id
AND user_votes.user_id = <user_id>
WHERE user_votes.user_id IS NULL
) q
-- Now get a random 1. I hate RAND().
ORDER BY RAND()
LIMIT 1
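As a rough way to try this out, the sketch below wraps the left-join exclusion in a small helper and checks that it never returns a question the user has voted on. It runs on SQLite via Python's sqlite3 (so RANDOM() replaces MySQL's RAND()); next_question is a hypothetical helper name, the rows are made up, and the table is reduced to the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, urgent INTEGER);
CREATE TABLE user_votes (vote_id INTEGER PRIMARY KEY, user_id INTEGER, question_id INTEGER);
INSERT INTO questions VALUES (1,0),(2,0),(3,1),(4,0),(5,0);
-- User 42 has voted on 1 and 3; user 7 has voted on everything.
INSERT INTO user_votes (user_id, question_id) VALUES
  (42,1),(42,3),(99,2),(7,1),(7,2),(7,3),(7,4),(7,5);
""")

def next_question(conn, user_id):
    """Return one random question_id the user has not voted on, or None."""
    row = conn.execute("""
        SELECT q.question_id
        FROM questions q
        LEFT JOIN user_votes uv
               ON uv.question_id = q.question_id AND uv.user_id = ?
        WHERE uv.vote_id IS NULL
        ORDER BY RANDOM()      -- SQLite spelling; MySQL uses RAND()
        LIMIT 1
    """, (user_id,)).fetchone()
    return row[0] if row else None

picks = {next_question(conn, 42) for _ in range(50)}
print(picks)
print(next_question(conn, 7))
```

Binding user_id as a parameter instead of pasting it into the SQL string also keeps the query safe if the id ever comes from user input.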