Database Design/SQL Optimisation: WHERE <id> NOT IN (thousands of IDs)


I've been asked to add functionality to an application that lets users vote between two options: A and B. The table for those questions is very basic:
QUESTIONS
question_id (PK)
option_id_1 (FK)
option_id_2 (FK)
urgent (boolean)
Each time a user votes, the fact that they voted is stored in an equally simple table:
USER VOTES
vote_id (PK)
user_id (FK)
question_id (FK)
The algorithm for selecting which question appears when a user requests a new one is complex, but for our purposes we can assume it's random. So, the issue?
Each user will be voting on many questions. Likely hundreds, and possibly thousands. I need to ensure no user is presented with a question they've already voted on, and the only way I can think to do that will, I'm guessing, pound the server into oblivion. Specifically, something like:
SELECT * FROM questions WHERE question_id NOT IN (SELECT question_id FROM user_votes WHERE user_id = <user_id>) ORDER BY RAND() LIMIT 1
[Note: RAND() is not actually in the query; it's just a stand-in for a slightly complex ORDER BY.]
So, keeping in mind that many users could well have voted on hundreds if not thousands of questions, and that it's not possible to present the questions in a set order, any ideas on how to exclude voted-on questions without beating my server into the ground?
All advice appreciated - many thanks.

JOINs perform much better than nested subqueries in MySQL (that may have changed in the latest MySQL releases, but if you are experiencing performance problems then I guess the statement still holds).
What you could do is simply LEFT JOIN the votes onto the questions and keep only the rows where no vote was joined (i.e. this user hasn't voted on that question):
SELECT q.*
FROM questions q
LEFT JOIN user_votes uv ON
uv.question_id = q.question_id AND
uv.user_id = '<user_id>'
WHERE uv.vote_id IS NULL
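To check that this anti-join returns exactly the questions a user hasn't voted on, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the LEFT JOIN ... IS NULL pattern is written the same way in both). The sample rows and the composite index on user_votes (user_id, question_id) are my own assumptions; an index like that is what would let the join look up one user's votes without scanning the whole table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the anti-join is written the same way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (
        question_id INTEGER PRIMARY KEY,
        option_id_1 INTEGER,
        option_id_2 INTEGER,
        urgent      INTEGER
    );
    CREATE TABLE user_votes (
        vote_id     INTEGER PRIMARY KEY,
        user_id     INTEGER,
        question_id INTEGER
    );
    -- Assumed index: lets the join find one user's votes without a full scan.
    CREATE INDEX idx_votes_user_question ON user_votes (user_id, question_id);
""")
conn.executemany("INSERT INTO questions VALUES (?, ?, ?, 0)",
                 [(q, 2 * q, 2 * q + 1) for q in range(1, 6)])
# Hypothetical votes: user 7 voted on questions 1 and 3, user 8 on question 2.
conn.executemany("INSERT INTO user_votes (user_id, question_id) VALUES (?, ?)",
                 [(7, 1), (7, 3), (8, 2)])


def unvoted_questions(user_id):
    """Ids of questions the given user has not voted on yet."""
    rows = conn.execute("""
        SELECT q.question_id
        FROM questions q
        LEFT JOIN user_votes uv
               ON uv.question_id = q.question_id
              AND uv.user_id = ?
        WHERE uv.vote_id IS NULL   -- no vote by this user was joined
        ORDER BY q.question_id
    """, (user_id,))
    return [row[0] for row in rows]


print(unvoted_questions(7))  # [2, 4, 5]
```

Putting the user_id condition in the ON clause rather than the WHERE clause is what matters: it keeps other users' votes from being joined at all, so a NULL vote_id means "this user hasn't voted".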

ORDER BY RAND() is nasty, but this approach may mitigate the problem while still giving you the results you need. Since you mentioned that RAND() is only an example, I can't be more specific than the query below, but replacing its ORDER BY with your own should work just fine.
The more you are able to limit the number of rows in the inner query, the faster the entire query will perform.
SELECT
q.*
FROM (
-- First get the questions which have not been answered
SELECT
questions.*
FROM questions
LEFT JOIN user_votes
ON user_votes.question_id = questions.question_id
AND user_votes.user_id = <user_id>
WHERE user_votes.user_id IS NULL
) q
-- Now get a random 1. I hate RAND().
ORDER BY RAND()
LIMIT 1
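If the ordering really is random, one common alternative to ORDER BY RAND() (a swapped-in technique, not something from the answer above) is to count the unvoted questions, pick a random offset in application code, and fetch a single row with LIMIT 1 OFFSET n. A minimal sketch, again with sqlite3 standing in for MySQL and hypothetical sample data; it does not apply if your real ORDER BY is not random.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY);
    CREATE TABLE user_votes (vote_id INTEGER PRIMARY KEY,
                             user_id INTEGER, question_id INTEGER);
""")
conn.executemany("INSERT INTO questions VALUES (?)", [(q,) for q in range(1, 11)])
# Hypothetical: user 7 has already voted on questions 1, 2, 3, 5 and 8.
conn.executemany("INSERT INTO user_votes (user_id, question_id) VALUES (?, ?)",
                 [(7, q) for q in (1, 2, 3, 5, 8)])

# FROM/WHERE shared by both statements: questions this user hasn't voted on.
UNVOTED = """
    FROM questions q
    LEFT JOIN user_votes uv
           ON uv.question_id = q.question_id AND uv.user_id = ?
    WHERE uv.vote_id IS NULL
"""


def random_unvoted_question(user_id, rng=random):
    """Pick one unvoted question id at random, or None if none are left."""
    (remaining,) = conn.execute("SELECT COUNT(*) " + UNVOTED, (user_id,)).fetchone()
    if remaining == 0:
        return None
    offset = rng.randrange(remaining)  # uniform in 0 .. remaining - 1
    row = conn.execute(
        "SELECT q.question_id " + UNVOTED + " ORDER BY q.question_id LIMIT 1 OFFSET ?",
        (user_id, offset),
    ).fetchone()
    return row[0] if row else None  # row may vanish if data changed in between


print(random_unvoted_question(7))  # one of 4, 6, 7, 9, 10
```

OFFSET still walks past n rows, so this avoids computing RAND() per row and sorting the whole set, not the scan itself. The count and the fetch are separate statements, which is why the function tolerates the row disappearing in between.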

Related

MySQL Return least common rows based on another Table

I was wondering if I could get some help please? Apologies for the title, it was hard to summarise what I needed.
I have 2 tables, one contains a bank of questions and the other one stores people's responses to said questions. Both are linked by a unique identifier string.
What I need is to be able to order the questions in the question bank based on which ones have the least amount of entries in the answers table.
Is this easy to do? I can elaborate if this isn't clear enough or provide examples of tables etc... if needed.
Many thanks.

I think I figured it out using the below query, thanks for having a look at my question though guys.
SELECT Question_Bank.*,
(
-- correlated subquery: number of answers for this question (0 if none)
SELECT COUNT(Question_Answers.ID)
FROM Question_Answers
WHERE Question_Answers.Question_ID = Question_Bank.Question_ID
) AS cnt
FROM Question_Bank
ORDER BY cnt ASC
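An alternative that avoids the correlated subquery is a LEFT JOIN with GROUP BY: COUNT over a column of the joined table counts only matched rows, so unanswered questions get 0 and sort first. A sketch with sqlite3 standing in for MySQL; the Question and Answer text columns and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Question_Bank (Question_ID TEXT PRIMARY KEY, Question TEXT);
    CREATE TABLE Question_Answers (ID INTEGER PRIMARY KEY,
                                   Question_ID TEXT, Answer TEXT);
""")
conn.executemany("INSERT INTO Question_Bank VALUES (?, ?)",
                 [("q-a", "First?"), ("q-b", "Second?"), ("q-c", "Third?")])
conn.executemany("INSERT INTO Question_Answers (Question_ID, Answer) VALUES (?, ?)",
                 [("q-a", "x"), ("q-a", "y"), ("q-a", "z"), ("q-c", "w")])

# LEFT JOIN keeps questions with no answers; COUNT(a.ID) ignores the NULLs
# produced for them, so their count is 0.
rows = conn.execute("""
    SELECT b.Question_ID, COUNT(a.ID) AS cnt
    FROM Question_Bank b
    LEFT JOIN Question_Answers a ON a.Question_ID = b.Question_ID
    GROUP BY b.Question_ID
    ORDER BY cnt ASC, b.Question_ID
""").fetchall()
print(rows)  # [('q-b', 0), ('q-c', 1), ('q-a', 3)]
```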

Mysql Left join multiple 1 to many tables

I am trying to get the following data from 3 or 4 tables in one MySQL query; is this possible? The tables are:
TOPIC
topicid (FK)(PK)
groupid
topic
user
LIKED
likeid
topicid (FK)
user
COMMENT
commentid (PK)
topicid (FK)
comment
user
I write my topics and store them in the TOPIC table, each with a unique topicid, and I group topics using groupid.
The other tables may have zero or more rows per topicid.
I am trying to get every topic in a particular group along with the related data from the other tables. I checked "How to left join multiple one to many tables in mysql?" and got a few ideas, but that question is about counts, whereas I want the details from those tables (the users who liked a topic, and the users and their comments).
I have tried
SELECT t.topicid,
topic,
group_concat(DISTINCT likeid,l.user SEPARATOR '|') AS likes,
group_concat(DISTINCT commentid,comment,c.user SEPARATOR '|') AS comments
FROM TOPIC t
LEFT JOIN LIKED l ON l.topicid = t.topicid
LEFT JOIN COMMENT c ON c.topicid = t.topicid
WHERE t.groupid='some_value'
GROUP BY t.topicid
This partly works: I do get the details, but only if there is one topic in the group. If there are 2 or more topics in a group, all the concatenated details end up in the first record and the later topics show no likes or comments.
Can someone please help me correct this, or point out a MySQL function I am missing?

I am very, very sorry for wasting your time. After a thorough re-check I found that my table data was wrong (I checked after making an SQL Fiddle; thanks @Barmar).
I had mistakenly inserted wrong data into the LIKED and COMMENT tables: the likes and comments for the 2nd topic (topicid='2' of groupid='1') were inserted as topicid='1', which is why the details only showed up for the 1st topic and nothing came out for the second.
The SQL above is correct, and thank you for helping me find the fault.
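For reference, a sketch (sqlite3 standing in for MySQL, with made-up rows) showing the same query shape returning separate likes and comments for several topics in one group once the data is right. SQLite's group_concat accepts only one argument together with DISTINCT and has no SEPARATOR keyword, so each like and comment is first concatenated into a single string; that adaptation is mine. DISTINCT matters here because joining two one-to-many tables produces every like/comment combination for a topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TOPIC   (topicid INTEGER PRIMARY KEY, groupid INTEGER,
                          topic TEXT, user TEXT);
    CREATE TABLE LIKED   (likeid INTEGER PRIMARY KEY, topicid INTEGER, user TEXT);
    CREATE TABLE COMMENT (commentid INTEGER PRIMARY KEY, topicid INTEGER,
                          comment TEXT, user TEXT);
    -- Hypothetical data: topics 1, 2 and 4 are in group 1, topic 3 in group 2.
    INSERT INTO TOPIC VALUES (1, 1, 'Topic one', 'ann'), (2, 1, 'Topic two', 'bob'),
                             (3, 2, 'Other group', 'cat'), (4, 1, 'Quiet topic', 'dan');
    INSERT INTO LIKED VALUES (1, 1, 'ann'), (2, 1, 'bob'), (3, 2, 'dan'), (4, 3, 'eve');
    INSERT INTO COMMENT VALUES (1, 1, 'hi', 'cat'), (2, 2, 'nice', 'ann'),
                               (3, 2, 'meh', 'eve');
""")

QUERY = """
    SELECT t.topicid,
           t.topic,
           group_concat(DISTINCT l.likeid || ':' || l.user) AS likes,
           group_concat(DISTINCT c.commentid || ':' || c.comment || ':' || c.user) AS comments
    FROM TOPIC t
    LEFT JOIN LIKED l   ON l.topicid = t.topicid
    LEFT JOIN COMMENT c ON c.topicid = t.topicid
    WHERE t.groupid = ?
    GROUP BY t.topicid
    ORDER BY t.topicid
"""


def topics_in_group(groupid):
    """Map topicid -> (topic, likes, comments) for one group.

    group_concat joins with ',' here and its order is unspecified, so the
    parts are split and sorted (this assumes no commas inside comments).
    """
    result = {}
    for topicid, topic, likes, comments in conn.execute(QUERY, (groupid,)):
        result[topicid] = (
            topic,
            sorted(likes.split(",")) if likes else [],
            sorted(comments.split(",")) if comments else [],
        )
    return result


for topicid, details in topics_in_group(1).items():
    print(topicid, details)
```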
Extremely sorry for posting again.

MYSQL joins not working as expected

I am trying to create a join statement to solve my problem but cannot get my head around it.
I am new to join statements, so please bear with me if my SQL statement is nonsense.
I have two tables, one is a table of questions where users have asked questions about items for sale.
Second is a table of items that the user has asked a question about.
Table one called questions consists of question_ref, questioner_user_ref, item_ref, seller_ref, question_text, timestamp
questions
===========
+--------------+--------------------+---------+-----------+--------------+----------+
| question_ref |questioner_user_ref |item_ref |seller_ref |question_text |timestamp |
+--------------+--------------------+---------+-----------+--------------+----------+
Table two called my_item_comments consists of questioner_ref, item_ref, last_question_ref
my_item_comments
===========
+---------------+---------+------------------+
|questioner_ref |item_ref |last_question_ref |
+---------------+---------+------------------+
I have set up table two to keep track of items that the user has asked questions about so they can be informed when someone else asks the seller a question or the seller answers.
So I want to create a recordset of questions where:
a) someone has asked a question about an item the user is selling,
b) a seller has replied to a question the user asked, or
c) a third user has asked a question about an item that the user has also asked a question about.
A bit like Facebook's commenting system, where you are notified about comments people have made on statuses that you have commented on.
So my current sql statement is as follows
$user_ref= current logged in user
$sql=mysql_query("
SELECT * FROM questions
LEFT JOIN my_item_comments
ON questions.item_ref=my_item_comments.item_ref
WHERE questions.questioner_user_ref!='$user_ref'
AND (questions.seller_ref='$user_ref' OR questions.item_ref=my_item_comments.item_ref
ORDER BY timestamp DESC");
The results don't work, and I think it's because of the OR questions.item_ref=my_item_comments.item_ref, but for the life of me I cannot work it out.
Any help would be greatly appreciated, even if it means restructuring my database with more tables or new fields in the tables.
thanks in advance, Barry

SELECT * FROM questions
LEFT JOIN my_item_comments ON questions.item_ref=my_item_comments.item_ref
WHERE
questions.questioner_user_ref != '$user_ref' AND
(questions.seller_ref='$user_ref' OR questions.item_ref=my_item_comments.item_ref)
ORDER BY timestamp DESC
I think you were missing a ) to enclose the OR clause (in the example above I added it)

You should try to improve readability for yourself and us so that it is easier to debug and easier for us to help you :-)
$q = "
SELECT *
FROM questions
RIGHT JOIN my_item_comments ON questions.item_ref = my_item_comments.item_ref
WHERE questions.questioner_user_ref != '".$user_ref."'
AND ( questions.seller_ref = '".$user_ref."'
OR questions.item_ref=my_item_comments.item_ref
)
ORDER BY timestamp DESC";
$rs = mysql_query($q);
And preferably just the query with dummy data
SELECT *
FROM questions
RIGHT JOIN my_item_comments ON questions.item_ref = my_item_comments.item_ref
WHERE questions.questioner_user_ref != 'dummy_ref'
AND ( questions.seller_ref = 'dummy_ref'
OR questions.item_ref=my_item_comments.item_ref
)
ORDER BY timestamp DESC
Try a right join for your query. Left join will indeed get all the rows.
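One more thing that may be worth checking, based on my reading of the schema (an assumption, not from either answer): the join above matches my_item_comments rows belonging to every user, not just the current one, so any item that anyone follows passes the OR test. Keeping the LEFT JOIN (so case a still works for items the user sells but never asked about) and restricting the joined rows to the current user in the ON clause might look like this, sketched with sqlite3 and hypothetical data. It assumes one my_item_comments row per (questioner_ref, item_ref) and that seller replies are stored as rows in questions, neither of which the question states. It also binds the user as a parameter instead of concatenating $user_ref into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (question_ref INTEGER PRIMARY KEY,
                            questioner_user_ref TEXT, item_ref TEXT,
                            seller_ref TEXT, question_text TEXT,
                            timestamp INTEGER);
    CREATE TABLE my_item_comments (questioner_ref TEXT, item_ref TEXT,
                                   last_question_ref INTEGER);
    -- Hypothetical data: u1 sells i1 and has asked about i2 (sold by s1).
    INSERT INTO questions VALUES
        (1, 'u2', 'i1', 'u1', 'Is it new?', 100),  -- (a) someone asks seller u1
        (2, 'u1', 'i2', 's1', 'Colour?',    110),  -- u1's own question
        (3, 'u3', 'i2', 's1', 'Size?',      120),  -- (c) third user, followed item
        (4, 's1', 'i2', 's1', 'It is red',  130),  -- (b) seller reply, followed item
        (5, 'u2', 'i3', 's1', 'Price?',     140);  -- unrelated to u1
    INSERT INTO my_item_comments VALUES
        ('u2', 'i1', 1), ('u1', 'i2', 2), ('u2', 'i3', 5);
""")

FEED = """
    SELECT q.question_ref
    FROM questions q
    LEFT JOIN my_item_comments m
           ON m.item_ref = q.item_ref
          AND m.questioner_ref = :user      -- only items this user follows
    WHERE q.questioner_user_ref <> :user    -- skip the user's own questions
      AND (q.seller_ref = :user             -- (a) items the user sells
           OR m.item_ref IS NOT NULL)       -- (b)/(c) items the user follows
    ORDER BY q.timestamp DESC
"""


def feed(user_ref):
    """Question refs the user should be notified about, newest first."""
    return [row[0] for row in conn.execute(FEED, {"user": user_ref})]


print(feed("u1"))  # [4, 3, 1]
```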

How do I fix this query?

I'm writing an application where people ask questions and get answers in the form of a survey. Each question has 2 options, plus one or more default options. When a person answers the question, they can choose either of the 2 options set by the asker or one of the default options chosen by me. For instance, if the question is Vanilla vs. Chocolate, the options will be Vanilla, Chocolate, and Neither. I want to be able to tabulate the percentage of answers for each option of a question, e.g., 25% say chocolate, 30% say vanilla, 45% say neither.
I'll start by showing the table structure and the query I'm running.
These are the tables involved (Note: these are not the full table structures):
--questions--
id
user_id
topic
description
--options--
id
text
default (bool)
--questions_options--
question_id
option_id
--answers--
id
question_id
user_id
option_id
Here is the query:
SELECT
options.id AS option_id, options.text, options.default,
ROUND(
IFNULL(
(COUNT(answers.option_id) * 100)
/
(SELECT COUNT(answers.option_id) FROM answers WHERE question_id = QUESTION_ID)
, 0)
, 2) AS percentage
FROM options
LEFT JOIN questions_options ON questions_options.option_id = options.id
LEFT JOIN answers ON answers.option_id = options.id
WHERE questions_options.question_id = QUESTION_ID
OR options.default = '1'
GROUP BY options.id
ORDER BY percentage DESC, option_id ASC
Where QUESTION_ID is an integer constant.
The problem is the query is not limiting answers to only those given for a particular question, and because the options are many to many with questions, I'm getting results like 600% for vanilla (if multiple questions use vanilla as an option). In cases where the options are unique to ONE question, then the percentages make sense, except for the default options, which are present for all questions. I tried putting WHERE answers.question_id = QUESTION_ID in there, but it did not work.
Any solutions?
Thanks

You're doing the joins in the wrong direction - you're looking at options first, even though you have specifically stated you want things tabulated by question. This means that you're getting results for all options, regardless of whether or not they even relate to your question...
Oh, and I'm assuming that answers.question_id maps each answer to its question, or you're not going to be able to get any meaningful results (that is, answers are not otherwise mapped to questions...)
Try this query instead:
SELECT b.id, b.text, b.default, (SELECT IFNULL(
ROUND((COUNT(c.id) * 100) /
(SELECT COUNT(d.id)
FROM answers as d
WHERE d.question_id = a.question_id)
, 2)
, 0)
FROM answers as c
WHERE c.question_id = a.question_id
AND c.option_id = a.option_id) as percentage
FROM questions_options as a
JOIN options as b
ON b.id = a.option_id
WHERE a.question_id = QUESTION_ID
ORDER BY percentage DESC, a.option_id ASC
Please note that I don't have a copy of MySQL to run this against, and I would normally implement this with CTEs (which I have been informed MySQL does not support; MySQL 8.0 and later do support them).
EDIT:
Since 'default' options may not be mapped through the questions_options table, try this:
SELECT a.id, a.text, a.default, IFNULL(
ROUND((b.answerCount * 100) /
(SELECT COUNT(c.id)
FROM answers as c
WHERE c.question_id = QUESTION_ID)
, 2)
, 0) AS percentage
FROM options as a
LEFT JOIN (SELECT c.option_id, count(c.id) as answerCount
FROM answers as c
WHERE c.question_id = QUESTION_ID
GROUP BY c.option_id) as b
ON b.option_id = a.id
-- keep only this question's own options plus the default options
WHERE a.id IN (SELECT option_id FROM questions_options WHERE question_id = QUESTION_ID)
OR a.default = '1'
ORDER BY percentage DESC, a.id ASC
Please note that you will still get "meaningless" '0' results for every 'default' answer that was not presented to survey respondents, and no way to distinguish these from actual '0' results for 'default' answers that were presented to respondents. You are likely to be far better off placing the so-called 'default' options in the questions_options table: as it is, you have no way to determine all the options that were presented to respondents (just the ones you have answers for, which is quite different); this may be a huge business-accountability issue for your company. In addition, some 'default' options may not make sense in context: "Do you prefer your tea hot or cold?" "Yes".

Issues that I can see:
You GROUP BY options.id, which means you may get arbitrary values for options.text and options.default. This may or may not change your results depending on the structure of your data: if there are multiple rows per id, it will give inaccurate or misleading data.
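You restrict the divisor of the percentage calculation to the question but not the dividend, so COUNT(answers.option_id) counts an option's answers across every question that uses it (hence results above 100%). Try restricting the answers being counted to the question, e.g. by adding AND answers.question_id = QUESTION_ID to the LEFT JOIN answers ON clause; putting that condition in the WHERE clause instead would drop the options that have no answers.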
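Putting those fixes together: filter both questions_options and answers by the question inside their ON clauses (so an option shared by several questions isn't multiplied, and options with no answers survive), then keep the default options through the WHERE clause. A sketch with sqlite3 standing in for MySQL; the sample data is invented, and 100.0 is used because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE options (id INTEGER PRIMARY KEY, text TEXT, "default" INTEGER);
    CREATE TABLE questions_options (question_id INTEGER, option_id INTEGER);
    CREATE TABLE answers (id INTEGER PRIMARY KEY, question_id INTEGER,
                          user_id INTEGER, option_id INTEGER);
    -- Hypothetical data: option 3 is a default shared by every question,
    -- and Vanilla (1) is used by questions 10 and 30.
    INSERT INTO options VALUES (1, 'Vanilla', 0), (2, 'Chocolate', 0),
                               (3, 'Neither', 1), (4, 'Tea', 0), (5, 'Coffee', 0);
    INSERT INTO questions_options VALUES (10, 1), (10, 2), (20, 4), (20, 5),
                                         (30, 1), (30, 4);
    INSERT INTO answers (question_id, user_id, option_id) VALUES
        (10, 1, 1), (10, 2, 2), (10, 3, 2), (10, 4, 3),
        (20, 1, 3), (20, 2, 3), (20, 3, 4),
        (30, 1, 1), (30, 2, 1), (30, 3, 1);
""")

PERCENTAGES = """
    SELECT o.id, o.text,
           ROUND(IFNULL(COUNT(a.id) * 100.0 /
                        (SELECT COUNT(*) FROM answers WHERE question_id = :qid),
                        0), 2) AS percentage
    FROM options o
    LEFT JOIN questions_options qo            -- this question's options only
           ON qo.option_id = o.id AND qo.question_id = :qid
    LEFT JOIN answers a                       -- this question's answers only
           ON a.option_id = o.id AND a.question_id = :qid
    WHERE qo.question_id IS NOT NULL OR o."default" = 1
    GROUP BY o.id
    ORDER BY percentage DESC, o.id ASC
"""


def percentages(question_id):
    """(option id, text, percentage) rows for one question."""
    return conn.execute(PERCENTAGES, {"qid": question_id}).fetchall()


print(percentages(10))
# [(2, 'Chocolate', 50.0), (1, 'Vanilla', 25.0), (3, 'Neither', 25.0)]
```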

What's the best way to find common rows in a table

We have a web project with some multiple-choice questions that registered users answer.
I want to find the common answers of two users to give a comparison. I mean, the logged in user will go to some other user's profile page and see the comparison of their answers for their common questions only...
The tables are like (simplified):
questions
id
question
active -> enum('Y', 'N')
answers
id
question_id
answer
users
id
nick
user_answers
user_id
question_id
answer_id
private -> enum('Y', 'N')
I can join user_answers table with itself by giving an alias but I have to join other tables too. Private answers should not be taken into account, only active questions should be taken...
The site is expected to get some load, so I'm afraid there are too many joins and WHERE conditions. From my experience, I know that these complex queries can lock the tables and cause serious performance issues, especially under heavy load.
So what would be the best practice when scalability and performance are taken into account?
Would something like Sphinx or Solr help, or any other software-based solution for doing the comparison?
The results will be paginated, by the way.
Right now I'm thinking of separating question details and answer details and caching them, so the query will be something like:
select ua1.answer_id as her_answer_id,
ua2.answer_id as my_answer_id,
ua1.question_id
from user_answers ua1
inner join user_answers ua2 on ua1.question_id=ua2.question_id
where ua1.user_id=$herId
and ua2.user_id=$myId
and ua1.private='N'
order by ua1.question_id desc
questions.question and answers.answer will be taken from the cache. In this case passive (inactive) questions will be a problem, but I think I'll try to move them to some backup database, which will complicate things...
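A sketch of that self-join with the active-question filter and pagination folded in, using sqlite3 as a stand-in for MySQL and hypothetical data. Like the query above, it only filters her private answers; the index on user_answers (user_id, question_id) is my assumption about what would keep each side of the join to an index lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, active TEXT);
    CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER,
                               answer_id INTEGER, private TEXT);
    -- Assumed index so each side of the self-join is an index lookup.
    CREATE INDEX idx_ua_user_question ON user_answers (user_id, question_id);
    -- Hypothetical data: question 4 is inactive; user 1 is "her", user 2 is "me".
    INSERT INTO questions VALUES (1, 'Y'), (2, 'Y'), (3, 'Y'), (4, 'N'), (5, 'Y');
    INSERT INTO user_answers VALUES
        (1, 1, 11, 'N'), (2, 1, 12, 'N'),
        (1, 2, 21, 'Y'), (2, 2, 22, 'N'),   -- her answer is private
        (1, 3, 31, 'N'),                    -- only she answered
        (1, 4, 42, 'N'), (2, 4, 41, 'N'),   -- inactive question
        (1, 5, 51, 'N'), (2, 5, 52, 'N');
""")

COMMON = """
    SELECT ua1.question_id,
           ua1.answer_id AS her_answer_id,
           ua2.answer_id AS my_answer_id
    FROM user_answers ua1
    JOIN user_answers ua2 ON ua2.question_id = ua1.question_id
    JOIN questions q      ON q.id = ua1.question_id AND q.active = 'Y'
    WHERE ua1.user_id = :her
      AND ua2.user_id = :me
      AND ua1.private = 'N'
    ORDER BY ua1.question_id DESC
    LIMIT :page_size OFFSET :offset
"""


def common_answers(her_id, my_id, page=0, page_size=20):
    """One page of (question_id, her_answer_id, my_answer_id), newest first."""
    params = {"her": her_id, "me": my_id,
              "page_size": page_size, "offset": page * page_size}
    return conn.execute(COMMON, params).fetchall()


print(common_answers(1, 2))  # [(5, 51, 52), (1, 11, 12)]
```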

I would use a conditional in the query:
-- my_id and other_id stand for the ids of the two users being compared
select
user_answers.question_id `QuestionId`,
max(if((user_answers.user_id = my_id), user_answers.answer_id, 0)) `MyAnswer`,
max(if((user_answers.user_id = other_id), user_answers.answer_id, 0)) `OtherAnswer`
from user_answers
where user_answers.private = 'N'
and user_answers.user_id IN (my_id, other_id)
group by user_answers.question_id
having count(*) = 2
order by user_answers.question_id
Haven't tested it, but you should get the idea!
Hope this works out for you...
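Here is the same conditional-aggregation idea checked against sqlite3 with invented rows. I've used CASE with no ELSE (portable, and it yields NULL rather than 0 for a missing answer) and COUNT(DISTINCT user_id) = 2 so that a duplicate row for one user can't satisfy the HAVING; both are my adjustments rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER,
                               answer_id INTEGER, private TEXT);
    -- Hypothetical data: user 2 is "me", user 1 is the other user.
    INSERT INTO user_answers VALUES
        (1, 1, 11, 'N'), (2, 1, 12, 'N'),
        (1, 2, 21, 'Y'), (2, 2, 22, 'N'),   -- other user's answer is private
        (1, 3, 31, 'N'),                    -- only the other user answered
        (1, 5, 51, 'N'), (2, 5, 52, 'N');
""")

PIVOT = """
    SELECT question_id,
           MAX(CASE WHEN user_id = :me    THEN answer_id END) AS my_answer,
           MAX(CASE WHEN user_id = :other THEN answer_id END) AS other_answer
    FROM user_answers
    WHERE private = 'N'
      AND user_id IN (:me, :other)
    GROUP BY question_id
    HAVING COUNT(DISTINCT user_id) = 2   -- both users answered publicly
    ORDER BY question_id
"""


def compare(my_id, other_id):
    """(question_id, my_answer, other_answer) for questions both answered."""
    return conn.execute(PIVOT, {"me": my_id, "other": other_id}).fetchall()


print(compare(2, 1))  # [(1, 12, 11), (5, 52, 51)]
```

A single pass over both users' rows replaces the self-join; an active-question filter could be added with a join to questions as in the asker's version.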
