How to get data from multiple tables using just one single query? - mysql

I think I have a design problem rather than a code problem. I have 4 tables in my database: QUESTION, INTERESTS, QUESTION_TAG, and USER.
My table structure:
INTERESTS -- id
-- user
-- tagparent
QUESTION_TAG -- id
-- tagparent
-- tagchild ( unnecessary )
-- question
QUESTION -- id
-- content and when published ( unnecessary )
I want to get 10 questions, ordered by the index_order field, that are interesting for the USER. I know which questions interest him through the QUESTION_TAG association table, which links a question to a tag.
So the logic is this:
request 0 : SELECT * FROM QUESTION WHERE id = ( results of request 1 - returns many ) ORDER BY index_order LIMIT 0,10
This request must return many results:
request 1 : SELECT DISTINCT question FROM question_tag WHERE tagparent = ( results of request 2 - returns many )
This request returns many results too:
request 2 : SELECT tagparent FROM interests WHERE user=9
So using subqueries is not really that helpful.
I am stuck, and I would like to find a solution that uses a single request, without filtering the data in a back-end language.
Any help would be much appreciated.

Maybe this is what you are looking for:
select distinct question.*
from question
inner join question_tag on question_tag.question = question.id
inner join interests on interests.tagparent = question_tag.tagparent
where interests.user = 9
order by question.index_order
limit 10
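The join above can be tried out with a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the index_order values are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, content TEXT, index_order INTEGER);
    CREATE TABLE question_tag (id INTEGER PRIMARY KEY, tagparent INTEGER, question INTEGER);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, user INTEGER, tagparent INTEGER);

    INSERT INTO question VALUES (1, 'q1', 30), (2, 'q2', 10), (3, 'q3', 20);
    -- question 1 carries two tags that user 9 follows, so a plain join would list it twice
    INSERT INTO question_tag (tagparent, question) VALUES (100, 1), (101, 1), (100, 2), (200, 3);
    INSERT INTO interests (user, tagparent) VALUES (9, 100), (9, 101), (7, 200);
""")

# One query: join the three tables, de-duplicate, order, and cap at 10 rows.
rows = conn.execute("""
    SELECT DISTINCT question.*
    FROM question
    INNER JOIN question_tag ON question_tag.question = question.id
    INNER JOIN interests ON interests.tagparent = question_tag.tagparent
    WHERE interests.user = ?
    ORDER BY question.index_order
    LIMIT 10
""", (9,)).fetchall()

question_ids = [row[0] for row in rows]
print(question_ids)
```

Question 3 is left out because only user 7 follows its tag, and DISTINCT keeps question 1 from appearing once per matching tag.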

Related

How could I perform a JOIN over 5 tables and get the data I need?

I have a DB with 5 tables representing surveys and their questions, users' responses and a mailing list to keep info about email that I've sent.
I generate guids for users to make it impossible to pass other user's survey (there's no authentication in my app) and store those guids both in mailing list table and response table.
Here is the DB schema:
When frontend requests survey by guid so that user could answer it, I put a simple SELECT query:
SELECT
se.survey_id, se.id, se.`type`, se.title
FROM
survey_element AS se
JOIN mailing_list AS ml
ON se.survey_id = ml.survey_id
WHERE ml.guid = '7658bc0f768bd5e0'
ORDER BY se.`order`
this gets me
survey_id id type title
1 1 head_question How do you feel?
1 5 idea Do you have any ideas?
(also see https://www.db-fiddle.com/f/rktX4mjfxKLwGA1izbQFoV/2)
Everything works fine, but now I want to select the answers to the questions if there were any. So, same guid, same questions (survey elements), and I want to get this output for the user who answered '1' to the first question and didn't answer the second question:
survey_id id type title answer
1 1 head_question How do you feel? 1
1 5 idea Do you have any ideas? (NULL)
I've tried different ways, but I get either multiple rows with the same title (since multiple users answered the question) or the same answer for a question the user hasn't actually answered.
Like that (gives multiple entries, see https://www.db-fiddle.com/f/2yQJ1q7jDzQMNWGBxNB6br/3):
SELECT
se.survey_id, se.id , se.`type`, se.title, re.answer
FROM
survey_element AS se
JOIN mailing_list AS ml
ON se.survey_id=ml.survey_id
LEFT OUTER JOIN response_element AS re
ON re.survey_element_id = se.id
WHERE ml.guid = '7658bc0f768bd5e0'
ORDER BY se.`order`
All the data is there, but I just can't select it the right way :-\
Any ideas on how to do that?
DB server is MySQL 5.7.
I think you are possibly missing a JOIN between response and response_element? It doesn't show in your data model, but I assume it would work the same as survey_element to survey does? If it does work this way then this query appears to work:
SELECT
se.survey_id,
se.id,
se.`type`,
se.title,
re.answer
FROM
survey_element AS se
INNER JOIN mailing_list AS ml ON se.survey_id = ml.survey_id
INNER JOIN response r ON r.guid = ml.guid
LEFT JOIN response_element AS re ON re.survey_element_id = se.id
AND re.response_id = r.id
WHERE
ml.guid = '7658bc0f768bd5e0'
ORDER BY
se.`order`
This gives you a list of questions and responses for Joe, where there is a NULL because he didn't answer the second question.
survey_id id type title answer
1 1 head_question How do you feel? 1
1 5 idea Do you have any ideas? null
Actually, re-reading your question I can see that response_id in response_element has a (FK) postfix, which makes me even more sure that this is just an omission from your schema/ data model?
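To check the effect of the extra join condition, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL 5.7. The column lists are assumptions reconstructed from the queries above (the schema image is not available), and the second mailing-list entry with guid 'aaaa' is invented to play the role of another user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element (id INTEGER PRIMARY KEY, survey_id INTEGER,
                                 `type` TEXT, title TEXT, `order` INTEGER);
    CREATE TABLE mailing_list (survey_id INTEGER, guid TEXT);
    CREATE TABLE response (id INTEGER PRIMARY KEY, guid TEXT);
    CREATE TABLE response_element (response_id INTEGER, survey_element_id INTEGER, answer TEXT);

    INSERT INTO survey_element VALUES
        (1, 1, 'head_question', 'How do you feel?', 1),
        (5, 1, 'idea', 'Do you have any ideas?', 2);
    INSERT INTO mailing_list VALUES (1, '7658bc0f768bd5e0'), (1, 'aaaa');
    INSERT INTO response VALUES (1, '7658bc0f768bd5e0'), (2, 'aaaa');
    -- the first user answered only element 1; the other user answered both
    INSERT INTO response_element VALUES (1, 1, '1'), (2, 1, '3'), (2, 5, 'more pizza');
""")

rows = conn.execute("""
    SELECT se.survey_id, se.id, se.`type`, se.title, re.answer
    FROM survey_element AS se
    INNER JOIN mailing_list AS ml ON se.survey_id = ml.survey_id
    INNER JOIN response AS r ON r.guid = ml.guid
    LEFT JOIN response_element AS re
           ON re.survey_element_id = se.id
          AND re.response_id = r.id   -- restricts answers to this user's response
    WHERE ml.guid = ?
    ORDER BY se.`order`
""", ("7658bc0f768bd5e0",)).fetchall()

for row in rows:
    print(row)
```

Because the response_id condition sits in the ON clause rather than in WHERE, the unanswered question still comes back, with a NULL (None in Python) answer.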

Complex MySQL Query to calculate percentage improvement by specific categories

I'm stumped after countless hours of trial. I'm not a SQL guru, so I appeal to those who are. I'd like to know if and how it's possible to write a single query whose output looks like this:
Specialty Performance on PreTest Questions
Infectious Disease 25% (37/148)
Internal Medicine 17% (2/12)
Pathology 20% (3/15)
This is an Exam database. What I want is a listing by specialty that shows a percentage. The first number represents the total number of people who got a question correct (37). The second is the total number who answered it at all, right or wrong (148)
Pre-Test Questions
A pretest consists of a set of modules and questions
module 2 questions (1,2),
module 3 questions (1,2,3),
module 4 question (1),
module 5 question (1),
module 6 question (1)
Where Clause
This is part of the where clause. It's how we calculate a "correct" question:
(q.type = 'PASS_FAIL' and e.correct = 'T' )
Here is the part that derives our total of those who answered it at all:
(q.type = 'PASS_FAIL' )
My Best Attempt
I'm convinced that we can't do this for the entire set of pre-test questions as one query, so doing it per question is OK. I think a parametrized query where we drop in the module and question numbers would be fine.
The best I could come up with is totals by specialty using two separate queries. I couldn't figure out how to make this a single query, nor could I link in the percentage calculation (per specialty). Is it possible?
I am a sponge for knowledge!
-thanks
CREATE
ALGORITHM = UNDEFINED
VIEW `PretestTotals_M2_Q1_by_specialty_degree`
AS
(select a.specialty, count( e.question ) as totals_M2_Q1
FROM Exam as e
JOIN
Questions as q using(module,question) join Accounts a using (user_id)
WHERE
(q.type = 'PASS_FAIL' )
and
(e.module = 2 and e.question = 1)
group by a.specialty
);
CREATE
ALGORITHM = UNDEFINED
VIEW `PretestCorrect_M2_Q1_by_specialty_degree`
AS
(select a.specialty, count( e.question ) as Correct_M2_Q1
FROM Exam as e
JOIN
Questions as q using(module,question) join Accounts a using (user_id)
WHERE
(q.type = 'PASS_FAIL' and e.correct = 'T' )
and
(e.module = 2 and e.question = 1)
group by a.specialty
);
Accounts Table
[typical stuff, but I've noted the fields that are critical to this query]
user_id,
degree, #college degree as selected from a dropdown on a form
specialty #medical specialization
Exam Table
[*Records the result of an online exam: the user_id, the module and question number, the attempt counter, the actual answer and the correctness (T/F) of that answer*]
user_id,
module,
question,
attempt,
answer,
correct
Questions table
[*Records the module number, question number, text of the actual question and the 'type' of question it is. Three possible types (ALWAYS_PASS,PASS_FAIL,POLLING) as enumerations*]
module,
question,
text,
type
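One way the two views could be folded into a single query is conditional aggregation: count every PASS_FAIL row, and sum only the correct ones, in the same GROUP BY. The sketch below is only an illustration of that idea. It runs on Python's built-in sqlite3 module rather than MySQL, the sample rows are invented, and it counts every Exam row, so repeated attempts by the same user would each be counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (user_id INTEGER PRIMARY KEY, degree TEXT, specialty TEXT);
    CREATE TABLE Questions (module INTEGER, question INTEGER, text TEXT, type TEXT);
    CREATE TABLE Exam (user_id INTEGER, module INTEGER, question INTEGER,
                       attempt INTEGER, answer TEXT, correct TEXT);

    INSERT INTO Accounts VALUES
        (1, 'MD', 'Infectious Disease'), (2, 'MD', 'Infectious Disease'),
        (3, 'DO', 'Infectious Disease'), (4, 'MD', 'Infectious Disease'),
        (5, 'MD', 'Pathology'), (6, 'PhD', 'Pathology');
    INSERT INTO Questions VALUES (2, 1, 'sample text', 'PASS_FAIL'),
                                 (2, 2, 'sample text', 'POLLING');
    INSERT INTO Exam VALUES
        (1, 2, 1, 1, 'a', 'T'), (2, 2, 1, 1, 'b', 'F'), (3, 2, 1, 1, 'b', 'F'),
        (4, 2, 1, 1, 'c', 'F'), (5, 2, 1, 1, 'a', 'T'), (6, 2, 1, 1, 'b', 'F'),
        (1, 2, 2, 1, 'x', 'T');  -- POLLING question, must not be counted
""")

# Module and question are parameters, as suggested in the question.
rows = conn.execute("""
    SELECT a.specialty,
           SUM(CASE WHEN e.correct = 'T' THEN 1 ELSE 0 END) AS correct_count,
           COUNT(*) AS answered_count
    FROM Exam AS e
    JOIN Questions AS q USING (module, question)
    JOIN Accounts AS a USING (user_id)
    WHERE q.type = 'PASS_FAIL' AND e.module = ? AND e.question = ?
    GROUP BY a.specialty
    ORDER BY a.specialty
""", (2, 1)).fetchall()

report = [f"{spec} {round(100 * ok / total)}% ({ok}/{total})" for spec, ok, total in rows]
print("\n".join(report))
```

The percentage is formatted in Python here; in MySQL the same ratio could be computed in the SELECT list from the two aggregates.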

MySQL query with limited subqueries

I'm trying to get some statistical data from a few tables. We have a users table, a quiz table, a quiz question set table, and a quiz questions table. Each quiz has many sets, and each set has one or many questions. There's also a questions table, which is where the question comes from (the quiz questions table links a question to a question set, which then links to a quiz, which then links to the user).
What I need is to see the number of questions answered correctly and the number of questions answered, but only for the most recent 50 questions. So if one user has answered 120 questions, only the most recent 50 should be used in this query; if a user has answered 37 questions, then all of their questions should be used. I'd like to get this laid out so there's the user_id, questions_answered, questions_answered_correctly.
I currently have this working, but I'm going through each user and grabbing their 50 most recent questions, and with some additional tables limiting organization being joined on, I have to do hundreds, if not thousands, of these to get one statistical report.
I'm guessing I need to do a subquery somewhere to only pull the most recent questions from the user, but I'm not sure how a subquery like that would work. Here's what I have so far, but I'm sure I'm totally off on this. It executes, but incorrectly. Some of the results are over 50 when they shouldn't be:
SELECT users.id, (SELECT COUNT(grammar_quiz_questions.id) FROM `grammar_quiz_questions`
INNER JOIN `grammar_quiz_question_sets` ON `grammar_quiz_question_sets`.`id` = `grammar_quiz_questions`.`grammar_quiz_question_set_id`
INNER JOIN `grammar_quizzes` ON `grammar_quizzes`.`id` = `grammar_quiz_question_sets`.`grammar_quiz_id`
INNER JOIN `grammar_questions` ON `grammar_questions`.`id` = `grammar_quiz_questions`.`grammar_question_id`
WHERE (grammar_quiz_questions.finished is not null AND grammar_quizzes.user_id = users.id)
ORDER BY grammar_quiz_questions.finished DESC LIMIT 50) AS `questions_answered`, (SELECT COUNT(grammar_quiz_questions.id) FROM `grammar_quiz_questions`
INNER JOIN `grammar_quiz_question_sets` ON `grammar_quiz_question_sets`.`id` = `grammar_quiz_questions`.`grammar_quiz_question_set_id`
INNER JOIN `grammar_quizzes` ON `grammar_quizzes`.`id` = `grammar_quiz_question_sets`.`grammar_quiz_id`
INNER JOIN `grammar_questions` ON `grammar_questions`.`id` = `grammar_quiz_questions`.`grammar_question_id`
WHERE (grammar_quiz_questions.finished is not null AND grammar_quizzes.user_id = users.id AND grammar_quiz_question_sets.correct_on_first_attempt = 1)
ORDER BY grammar_quiz_questions.finished DESC LIMIT 50) AS `questions_answered_correctly`
FROM users
Thanks,
James
UPDATE:
The following update is not a complete answer to the question, just some nudges. I am not sure why you are querying all these tables. Are grammar_quiz_question_sets mutually exclusive subsets of grammar_quiz_questions? How about grammar_quizzes and grammar_questions: what is the set relation? I don't know these answers, but you do, so look at the following code snippet. I hope it guides you:
set @correct:=0;
select users.id, count(p.id), sum(if(r.correct_on_first_attempt = 1,1,0)) as correct
from grammar_quiz_questions p, grammar_quiz_question_sets r, users;
ORIGINAL:
I imagine you have a control and data access layer (Java, PHP, Python, etc.) through which records are added and manipulated. Further, I imagine you need to grab statistics more than once in the lifetime of a user. Therefore, while you may need a query like yours to recalibrate once in a while (if that is ever necessary), you need something less heavy for everyday use. Hence the following proposal.
1] create a statistics table:
create table statistics(
user_id int(11) not null, -- foreign key
questions_answered int(11) not null default 0,
questions_answered_correctly int(11) not null default 0
-- for primary key, you may use user_id or some auto record_id
)
2] the first time around, run your "heavy/administrative" query
3] subsequently, update the stats for a user after each quiz or each answered question. The idea here is that you will have that information in memory (i.e. in your programming layer) since you have to update the quiz table; during that time do some math to update the stats table. e.g. imagine java:
public void updateStats(int userId, int questions, int correct){
String query =
"insert into statistics(user_id,questions_answered,questions_answered_correctly) "+
"values("+userId+", "+questions+", "+correct+") "+
"on duplicate key update "+
"questions_answered=questions_answered+values(questions_answered), "+
"questions_answered_correctly = questions_answered_correctly + values(questions_answered_correctly)";
... //execute the statement
}
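The same running-totals idea can be sketched with bound parameters instead of string concatenation. This is an illustration only: it uses Python's built-in sqlite3 module, where the upsert is spelled ON CONFLICT ... DO UPDATE (SQLite 3.24 or newer) rather than MySQL's ON DUPLICATE KEY UPDATE, and it assumes user_id is the primary key of the statistics table. The update_stats name mirrors the Java method above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statistics (
        user_id INTEGER PRIMARY KEY,  -- assumption: one stats row per user
        questions_answered INTEGER NOT NULL DEFAULT 0,
        questions_answered_correctly INTEGER NOT NULL DEFAULT 0
    )
""")

def update_stats(conn, user_id, questions, correct):
    """Insert the user's first totals, or add to the existing row."""
    conn.execute("""
        INSERT INTO statistics (user_id, questions_answered, questions_answered_correctly)
        VALUES (?, ?, ?)
        ON CONFLICT(user_id) DO UPDATE SET
            questions_answered = questions_answered + excluded.questions_answered,
            questions_answered_correctly =
                questions_answered_correctly + excluded.questions_answered_correctly
    """, (user_id, questions, correct))
    conn.commit()

update_stats(conn, 9, 10, 7)   # first quiz: creates the row
update_stats(conn, 9, 5, 5)    # second quiz: increments the same row
totals = conn.execute("SELECT * FROM statistics WHERE user_id = 9").fetchone()
print(totals)
```

Note that running totals like these cover all answers, not only the most recent 50, so the periodic recalculation query is still needed for that part of the requirement.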
Now for the "heavy" query: I am rewriting it below with a bit more clarity to encourage others to take a stab at it:
SELECT users.id,
(
SELECT COUNT(p.id)
FROM grammar_quiz_questions p, grammar_quiz_question_sets r, grammar_quizzes t, grammar_questions u
WHERE r.id = p.grammar_quiz_question_set_id
AND t.id = r.grammar_quiz_id
AND u.id = p.grammar_question_id
AND p.finished is not null
AND t.user_id = users.id
ORDER BY p.finished DESC LIMIT 50
) AS questions_answered,
(
SELECT COUNT(p.id)
FROM grammar_quiz_questions p, grammar_quiz_question_sets r, grammar_quizzes t, grammar_questions u
WHERE r.id = p.grammar_quiz_question_set_id
AND t.id = r.grammar_quiz_id
AND u.id = p.grammar_question_id
AND p.finished is not null
AND t.user_id = users.id
AND r.correct_on_first_attempt = 1
ORDER BY p.finished DESC LIMIT 50
) AS questions_answered_correctly
FROM users
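A likely reason the counts exceed 50 is that LIMIT is applied after COUNT has already collapsed the rows into one, so it never trims the rows being counted. One possible fix is to number each user's answers from newest to oldest and aggregate only the first N. The sketch below shows that with ROW_NUMBER(), which needs window-function support (MySQL 8.0 or newer, SQLite 3.25 or newer). It runs on Python's built-in sqlite3 module, flattens the four joined tables into one hypothetical answers table, and uses N = 3 instead of 50 to keep the sample data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical flattened stand-in for the joined quiz tables
    CREATE TABLE answers (user_id INTEGER, finished INTEGER, correct INTEGER);

    INSERT INTO answers VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 0), (1, 4, 0), (1, 5, 1),  -- user 1: five answers
        (2, 1, 1), (2, 2, 1), (2, NULL, 0);                     -- user 2: one unfinished
""")

RECENT_LIMIT = 3  # would be 50 in the original question

rows = conn.execute("""
    SELECT user_id,
           COUNT(*)     AS questions_answered,
           SUM(correct) AS questions_answered_correctly
    FROM (
        SELECT user_id, correct,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY finished DESC) AS rn
        FROM answers
        WHERE finished IS NOT NULL
    ) AS ranked
    WHERE rn <= ?          -- keep only each user's most recent rows
    GROUP BY user_id
    ORDER BY user_id
""", (RECENT_LIMIT,)).fetchall()

print(rows)
```

User 1 has five finished answers, but only the three most recent are counted; user 2 has fewer than three, so all of them are used.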

efficiently showing the user only items he did not already see before (mysql)

This question applies to any kind of system that contains items (e.g: news articles) and users that watch these items.
So let's say I have a users table ([id],[username]), an articles table ([id],[title],[text]) and a table that contains all the articles viewed by all the users ([user_id],[article_id]).
What I want to do is efficiently show the user only the articles he has not already read.
I know I can just do something like
select id,title,text from articles where id not in (select article_id
from article_views where user_id = 123)
But what if the current user has already read 1M articles? The query will become something like
select id,... from articles where id not in (1,2,3,......1000000)
This, I assume, is too slow to be practical.
It is also bad because the more articles a user reads, the slower his response time when retrieving new (unread) articles.
Any other suggestions, db-wise?
Sometimes a LEFT JOIN that returns only the NULL (i.e. not found) entries can be faster than a sub-select. It does a direct join A:B and only includes those rows where NO match is found.
select
a.id,
a.title,
a.text
from
articles a
LEFT JOIN article_views av
on av.User_ID = 123
AND a.id = av.article_id
where
av.article_id IS NULL
I would ensure an index on (user_id, article_id) (which I believe would be your primary key on that table anyway).
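The anti-join can be checked against a tiny data set. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, with invented rows; the composite primary key on (user_id, article_id) plays the role of the index suggested above. A NOT EXISTS form is included only to show that it returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, text TEXT);
    CREATE TABLE article_views (user_id INTEGER, article_id INTEGER,
                                PRIMARY KEY (user_id, article_id));

    INSERT INTO articles VALUES (1, 'a', '...'), (2, 'b', '...'), (3, 'c', '...');
    INSERT INTO article_views VALUES (123, 1), (123, 3), (456, 2);
""")

# LEFT JOIN anti-join: keep articles with no matching view row for this user.
unread_join = conn.execute("""
    SELECT a.id, a.title
    FROM articles AS a
    LEFT JOIN article_views AS av
           ON av.user_id = ? AND av.article_id = a.id
    WHERE av.article_id IS NULL
    ORDER BY a.id
""", (123,)).fetchall()

# Equivalent NOT EXISTS formulation.
unread_exists = conn.execute("""
    SELECT a.id, a.title
    FROM articles AS a
    WHERE NOT EXISTS (SELECT 1 FROM article_views AS av
                      WHERE av.user_id = ? AND av.article_id = a.id)
    ORDER BY a.id
""", (123,)).fetchall()

print(unread_join)
```

The user filter has to stay in the ON clause; moved to WHERE, it would discard the NULL rows the anti-join depends on.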
Instead of adding them directly in to the statement, you could run something like:
select articles.id, ... from articles where not exists (select 1 from article_views where article_views.user_id = [useridhere] and article_views.article_id = articles.id)
It alleviates the issue with having huge queries, but you're still comparing a million articles if you have a million articles.

MySQL: Using Count to limit results?

Hi: I need a way to limit the results of a query with SUM or COUNT, as if using LIMIT. Is this possible? I have a table with questions and answers. Each entry is marked with "0" if it is a question or "1" if it is a response. Each answer has the questionID, so I can order by this field. I want to show the first 20 questions with all their answers without using LIMIT, because LIMIT 0,20 would show the first 20 entries regardless of whether they are questions or answers.
I would like to see some logic like this:
SELECT *, SUM(IF(level = '0', 1,0)) AS MyCount FROM table
WHERE MyCount<20 ORDER BY questionID,timestamp
How could I accomplish this? Any suggestion is appreciated.
Since you're doing a GROUP function (sum), try using HAVING instead of WHERE:
SELECT *, SUM(IF(level = '0', 1,0)) AS MyCount
FROM table
HAVING MyCount < 20
ORDER BY
questionID,timestamp
-- Edit --
Because you are storing two different types of data (response and question) in the same table, you can try a self join (untested):
SELECT
question.id AS question_id,
question.name AS question_name,
response.id AS response_id,
response.name AS response_name
FROM
table AS question
JOIN
table as response
ON
response.questionID = question.id AND response.level = 1
WHERE
question.level = 0;
This sounds like a badly designed schema. Have your questions in one table and your answers in another, and make the question ID a foreign key on the answer table. Chucking both of them together in the same table is why you now have this problem!
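If the single-table layout has to stay, one possible way to get "the first N questions with all their answers" is to select the first N question IDs in a derived table and join the full table back to it, so LIMIT applies to questions only. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name posts, the integer level values, and the sample rows are assumptions, and N is 2 instead of 20 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical table: level 0 = question, level 1 = answer
    CREATE TABLE posts (id INTEGER PRIMARY KEY, questionID INTEGER,
                        level INTEGER, body TEXT, ts INTEGER);

    INSERT INTO posts VALUES
        (1, 1, 0, 'question 1', 1), (2, 1, 1, 'answer', 2), (3, 1, 1, 'answer', 3),
        (4, 2, 0, 'question 2', 4), (5, 2, 1, 'answer', 5),
        (6, 3, 0, 'question 3', 6), (7, 3, 1, 'answer', 7);
""")

FIRST_QUESTIONS = 2  # would be 20 in the original question

rows = conn.execute("""
    SELECT p.id, p.questionID, p.level
    FROM posts AS p
    JOIN (
        SELECT questionID
        FROM posts
        WHERE level = 0          -- limit applies to questions only
        ORDER BY questionID
        LIMIT ?
    ) AS first_q ON first_q.questionID = p.questionID
    ORDER BY p.questionID, p.ts
""", (FIRST_QUESTIONS,)).fetchall()

post_ids = [row[0] for row in rows]
print(post_ids)
```

The LIMIT sits in the FROM-clause subquery, which MySQL accepts; it does not accept LIMIT inside an IN (...) subquery.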