What's the best way to find common rows in a table

We have a web project with multiple-choice questions that registered users answer.
I want to find the answers two users have in common so that I can show a comparison: a logged-in user visits another user's profile page and sees how their answers compare, but only for the questions both of them have answered.
The tables look like this (simplified):
questions
    id
    question
    active -> enum('Y', 'N')
answers
    id
    question_id
    answer
users
    id
    nick
user_answers
    user_id
    question_id
    answer_id
    private -> enum('Y', 'N')
I can join the user_answers table with itself using an alias, but I have to join other tables too. Private answers should not be taken into account, and only active questions should be included.
The site is expected to get some load, so I'm afraid there are too many joins and WHERE conditions. From experience I know that complex queries like this can lock tables and cause serious performance issues, especially under heavy load.
So what would be the best practice when scalability and performance are taken into account?
Would something like Sphinx or Solr help, or any other software-based solution for doing the comparison?
The results will be paginated, by the way.
Right now I'm thinking of separating the question details and answer details and caching them, so the query would be something like:
select ua1.answer_id as her_answer_id,
ua2.answer_id as my_answer_id,
ua1.question_id
from user_answers ua1
inner join user_answers ua2 on ua1.question_id=ua2.question_id
where ua1.user_id=$herId
and ua2.user_id=$myId
and ua1.private='N'
order by ua1.question_id desc
questions.question and answers.answer will be taken from the cache. In this case passive questions will be a problem, but I think I'll try to move the passive questions to some backup database, which will complicate things...

I would use a conditional in the query:
select
    user_answers.question_id `QuestionId`,
    max(if(user_answers.user_id = my_id, user_answers.answer_id, 0)) `MyAnswer`,
    max(if(user_answers.user_id = other_id, user_answers.answer_id, 0)) `OtherAnswer`
from user_answers
where user_answers.private = 'N'
and user_answers.user_id in (my_id, other_id)
group by user_answers.question_id
having count(*) = 2
order by user_answers.question_id
Haven't tested it, but you should get the idea!
Hope this works out for you...
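For anyone who wants to try the conditional-aggregation idea before pointing it at a real database, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the common_answers helper and the composite primary key are my own assumptions, not part of the question. It uses CASE instead of MySQL's IF() so that it runs on SQLite, and the HAVING COUNT(*) = 2 test only works if each user has at most one answer per question.

```python
import sqlite3

def common_answers(conn, my_id, other_id):
    """Return (question_id, my_answer_id, other_answer_id) for questions
    that both users answered publicly."""
    sql = """
        SELECT question_id,
               MAX(CASE WHEN user_id = :me    THEN answer_id END) AS my_answer,
               MAX(CASE WHEN user_id = :other THEN answer_id END) AS other_answer
        FROM user_answers
        WHERE private = 'N'
          AND user_id IN (:me, :other)
        GROUP BY question_id
        HAVING COUNT(*) = 2          -- both users have a public answer
        ORDER BY question_id
    """
    return conn.execute(sql, {"me": my_id, "other": other_id}).fetchall()

conn = sqlite3.connect(":memory:")
# Assumption: one answer per user per question, enforced by the primary key.
conn.execute("""
    CREATE TABLE user_answers (
        user_id INTEGER, question_id INTEGER, answer_id INTEGER, private TEXT,
        PRIMARY KEY (user_id, question_id)
    )
""")
conn.executemany("INSERT INTO user_answers VALUES (?, ?, ?, ?)", [
    (1, 10, 101, 'N'), (2, 10, 102, 'N'),   # both answered question 10
    (1, 11, 111, 'N'), (2, 11, 112, 'Y'),   # user 2's answer is private
    (1, 12, 121, 'N'),                      # only user 1 answered
    (1, 13, 131, 'N'), (2, 13, 131, 'N'),   # both gave the same answer
])

rows = common_answers(conn, my_id=1, other_id=2)
print(rows)
```

Only questions 10 and 13 should come back, since question 11 has a private answer and question 12 has a single answer. The query touches user_answers only once, which is the point of this approach compared with the self-join.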

Related

MySQL Return least common rows based on another Table

I was wondering if I could get some help please? Apologies for the title, it was hard to summarise what I needed.
I have 2 tables, one contains a bank of questions and the other one stores people's responses to said questions. Both are linked by a unique identifier string.
What I need is to be able to order the questions in the question bank so that the ones with the fewest entries in the answers table come first.
Is this easy to do? I can elaborate if this isn't clear enough or provide examples of tables etc... if needed.
Many thanks.

I think I figured it out using the query below; thanks for having a look at my question, though.
SELECT Question_Bank.*,
(
    SELECT COUNT(ID)
    FROM Question_Answers
    WHERE Question_ID = Question_Bank.Question_ID
) AS cnt
FROM Question_Bank
ORDER BY cnt ASC
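An alternative way to write the same ordering is a LEFT JOIN with GROUP BY instead of a correlated subquery. The sketch below runs it on SQLite through Python's sqlite3 module, purely as a stand-in for MySQL. The column types and the sample rows are assumptions, since the original tables were not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema; only the column names used in the post are real.
    CREATE TABLE Question_Bank (Question_ID TEXT PRIMARY KEY, Question TEXT);
    CREATE TABLE Question_Answers (
        ID INTEGER PRIMARY KEY, Question_ID TEXT, Answer TEXT
    );
    INSERT INTO Question_Bank VALUES
        ('q1', 'First'), ('q2', 'Second'), ('q3', 'Third');
    INSERT INTO Question_Answers (Question_ID, Answer) VALUES
        ('q1', 'a'), ('q1', 'b'), ('q3', 'c');
""")

# LEFT JOIN keeps questions with no answers; COUNT(qa.ID) ignores the
# NULLs produced for them, so they get a count of 0 and sort first.
least_answered = conn.execute("""
    SELECT qb.Question_ID, COUNT(qa.ID) AS cnt
    FROM Question_Bank qb
    LEFT JOIN Question_Answers qa ON qa.Question_ID = qb.Question_ID
    GROUP BY qb.Question_ID
    ORDER BY cnt ASC, qb.Question_ID ASC
""").fetchall()
print(least_answered)
```

Here q2 has no answers at all and still appears, with a count of 0. Which of the two forms is faster on MySQL depends on the data and indexes, so it is worth comparing the query plans.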

Filtering Results with JOIN

I have a page called questions where the user is asked questions and has the option to answer them. The questions are pulled from a table called questions. When a question is answered, a table called answered_questions records the id of the question and the id of the user who answered it. The purpose of this is to hide the answered questions when the user visits the page again.
On page load I'm trying to join the two tables and check whether the question_id exists in both tables for the userID of the logged-in user. If the id does exist in both tables, then the question shouldn't be displayed, which is why I used <>. The problem is that each question comes back several times when I try the following query:
SELECT questions.question_id, questions.user_id
FROM `questions`
JOIN `answered_questions`
ON questions.question_id <> answered_questions.question_id
WHERE answered_questions.user_id = ".$userID."
But it works fine when I use this
SELECT questions.question_id, questions.user_id
FROM `questions`
JOIN `answered_questions`
ON questions.question_id = answered_questions.question_id
WHERE answered_questions.user_id = ".$userID."
I sense that I'm doing something wrong with the logic of it all. Any help or clues would be highly appreciated.

To get unanswered questions you can use LEFT JOIN:
SELECT questions.question_id, questions.user_id
FROM questions
LEFT JOIN answered_questions
ON answered_questions.question_id = questions.question_id
AND answered_questions.user_id = ".$userID."
WHERE answered_questions.question_id IS NULL
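To see why the <> join repeats rows while the LEFT JOIN ... IS NULL form does not, here is a small sketch on SQLite via Python's sqlite3 module (a stand-in for MySQL). The sample data and the unanswered_questions helper are made up for illustration. It also passes the user id as a bound parameter rather than concatenating it into the SQL string.

```python
import sqlite3

def unanswered_questions(conn, user_id):
    """Question ids the given user has not answered yet (anti-join)."""
    sql = """
        SELECT q.question_id
        FROM questions q
        LEFT JOIN answered_questions aq
               ON aq.question_id = q.question_id
              AND aq.user_id = ?
        WHERE aq.question_id IS NULL
        ORDER BY q.question_id
    """
    return [row[0] for row in conn.execute(sql, (user_id,))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: three questions, user 5 answered two of them.
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE answered_questions (question_id INTEGER, user_id INTEGER);
    INSERT INTO questions VALUES (1, 9), (2, 9), (3, 9);
    INSERT INTO answered_questions VALUES (1, 5), (2, 5), (2, 6);
""")

# The <> join pairs every question with every answered row that has a
# different id, so it yields more rows than there are questions.
neq_row_count = conn.execute("""
    SELECT COUNT(*)
    FROM questions q
    JOIN answered_questions aq ON q.question_id <> aq.question_id
    WHERE aq.user_id = ?
""", (5,)).fetchone()[0]

print(neq_row_count)
print(unanswered_questions(conn, 5))
```

With this data the <> join produces four rows for three questions, including the two that user 5 has already answered, while the anti-join returns only question 3.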

Mysql Table and Index Design for Dating Portal [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am creating a dating portal where we will ask each user around 40-50 questions, such as religion, caste, date of birth, food preference, and smoking/non-smoking.
I am asking similar questions about the user's preferences, such as age range, religion preference, and smoking preference.
I have around 30-40 such preferences.
Now I want to show the user matches based on the preferences they have set.
I want to know how I should design the MySQL tables and indexes.
Should I create one big user_preferences table and index all of the preferences?
Should I use multiple-column indexes or rely on index merge?
Should I keep the set of questions in different tables and join them when fetching the data?

I think this could be a case for EAV:
You should be able to get the matching user pairs in the descending order (from most matching to least) similar to this:
SELECT *
FROM (
SELECT U1.USER_ID AS USER_ID_1, U2.USER_ID AS USER_ID_2, COUNT(*) MATCH_COUNT
FROM USER U1
JOIN USER_PREFERENCE P1
ON (U1.USER_ID = P1.USER_ID)
JOIN USER_PREFERENCE P2
ON (P1.NAME = P2.NAME AND P1.VALUE = P2.VALUE)
JOIN USER U2
ON (P2.USER_ID = U2.USER_ID)
WHERE U1.USER_ID < U2.USER_ID -- To avoid matching the user with herself and duplicated pairs with flipped user IDs.
GROUP BY U1.USER_ID, U2.USER_ID
) Q
ORDER BY MATCH_COUNT DESC
This just matches the preferences by their exact values. You may want to create additional "preference" tables for range or enum-like values, and replace P1.VALUE = P2.VALUE accordingly. And you may still need special processing if the match is against data in the USER table (such as whether a user's age falls into another user's preferred age range).
Note the index on {NAME, VALUE}, which is meant to help P1.NAME = P2.NAME AND P1.VALUE = P2.VALUE. InnoDB tables are clustered, and one consequence is that secondary indexes contain a copy of the PK fields, which in this case causes the index I1 to completely cover the table. Whether MySQL will actually use it is another matter; as always, look at the query plan and measure on representative data.
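A small runnable sketch of the EAV matching idea, again on SQLite through Python's sqlite3 module rather than MySQL. The table layout is an assumption: I am guessing a primary key of (user_id, name) and a secondary index on (name, value), and I have left out the USER table since the pair counting only needs the preference rows. The sample preferences are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed EAV layout: one row per (user, preference name).
    CREATE TABLE user_preference (
        user_id INTEGER, name TEXT, value TEXT,
        PRIMARY KEY (user_id, name)
    );
    -- Secondary index meant to support the self-join on name and value.
    CREATE INDEX i1 ON user_preference (name, value);

    INSERT INTO user_preference VALUES
        (1, 'religion', 'none'), (1, 'smoking', 'no'),  (1, 'food', 'vegetarian'),
        (2, 'religion', 'none'), (2, 'smoking', 'no'),  (2, 'food', 'vegan'),
        (3, 'religion', 'none'), (3, 'smoking', 'yes'), (3, 'food', 'vegan');
""")

# Count exact-value matches for every unordered pair of users.
pairs = conn.execute("""
    SELECT p1.user_id, p2.user_id, COUNT(*) AS match_count
    FROM user_preference p1
    JOIN user_preference p2
      ON p1.name = p2.name AND p1.value = p2.value
    WHERE p1.user_id < p2.user_id      -- skip self-pairs and flipped duplicates
    GROUP BY p1.user_id, p2.user_id
    ORDER BY match_count DESC, p1.user_id, p2.user_id
""").fetchall()
print(pairs)
```

Users 1 and 2 share two values, as do users 2 and 3, while users 1 and 3 share only one. Note that this computes all pairs at once; for a single user's match list you would add a filter on p1.user_id or p2.user_id, which should shrink the work considerably.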

I see something like this:
questions is the list of questions to be answered. question_type is an enumeration that indicates what type of answer is expected (e.g. lookup from question_choices, a date, a number, text, etc.) - whatever types of data you expect to be entered. This, along with the other columns in this table, can drive your input form.
question_answers contains a list of predefined answers to questions (such as a predefined list of religions, or hair color, or eye color, etc.). This can be used to build a drop-down list of values on your input form.
users is pretty self explanatory.
user_characteristics contains a list of my answers to the questionnaire. The weight column indicates how important it is to me that someone looking for me have this same answer. The question_choices_id would be populated if the answer came from a select list built from the question_choices table. Otherwise question_choices_id would be NULL. The converse is true for the value column. value would be NULL if the answer came from a select list built from the question_choices table. Otherwise value would contain the user's hand crafted answer to the question.
user_preferences contains answers to the questionnaire for who I am looking for. The weight column indicates how important it is to me that the person I am looking for have this same answer. The question_choices_id and value columns behave the same as in the user_characteristics table.
SQL to find my match might look something like:
SELECT uc.users_id
      ,SUM(up.weight) AS my_weighted_score_of_them
      ,SUM(uc.weight) AS their_weighted_score_of_me
      ,SUM(up.weight) + SUM(uc.weight) AS combined_weighted_score
FROM user_preferences up
JOIN user_characteristics uc
  ON uc.questions_id = up.questions_id
 AND uc.question_choices_id <=> up.question_choices_id -- NULL-safe: one of the two columns is always NULL
 AND uc.value <=> up.value
 AND uc.users_id != up.users_id
WHERE up.users_id = :my_user_id
GROUP BY uc.users_id
ORDER BY SUM(up.weight) + SUM(uc.weight) DESC
        ,SUM(up.weight) DESC
        ,SUM(uc.weight) DESC
For performance reasons, an index on user_characteristics(id, question_id, question_choices_id, value, and user_id) and an index on user_preferences(id, question_id, question_choices_id, value, and user_id) would be advisable.
Note that the above SQL will return one row for EVERY user except the one making the request. This certainly is NOT desirable. Consequently, one might consider adding HAVING SUM(up.weight) + SUM(uc.weight) > :some_minimum_value - or some other way to further filter the results.
Further tweaks might include only returning people who value an answer as much as or more than I do (i.e. their characteristic weight is >= my preference weight).
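Here is a rough sketch of the weighted match with the HAVING filter, run on SQLite via Python's sqlite3 module as a stand-in for MySQL. The column list, the sample rows and the weighted_matches helper are assumptions, since the schema diagram is not available. One detail worth noting: because either question_choices_id or value is NULL in every row, the comparison has to be NULL-safe. SQLite spells that IS, while MySQL spells it <=>.

```python
import sqlite3

def weighted_matches(conn, my_user_id, min_score):
    """Other users ranked by combined weight of matching answers."""
    sql = """
        SELECT uc.users_id,
               SUM(up.weight)                  AS my_weighted_score_of_them,
               SUM(uc.weight)                  AS their_weighted_score_of_me,
               SUM(up.weight) + SUM(uc.weight) AS combined_weighted_score
        FROM user_preferences up
        JOIN user_characteristics uc
          ON uc.questions_id = up.questions_id
         AND uc.question_choices_id IS up.question_choices_id  -- NULL-safe
         AND uc.value IS up.value                              -- NULL-safe
         AND uc.users_id != up.users_id
        WHERE up.users_id = :me
        GROUP BY uc.users_id
        HAVING SUM(up.weight) + SUM(uc.weight) > :min_score
        ORDER BY combined_weighted_score DESC
    """
    return conn.execute(sql, {"me": my_user_id, "min_score": min_score}).fetchall()

conn = sqlite3.connect(":memory:")
# Assumed columns: users_id, questions_id, question_choices_id, value, weight.
for table in ("user_preferences", "user_characteristics"):
    conn.execute(f"""
        CREATE TABLE {table} (
            users_id INTEGER, questions_id INTEGER,
            question_choices_id INTEGER, value TEXT, weight INTEGER
        )
    """)
conn.executemany("INSERT INTO user_preferences VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 10, None, 5),        # user 1 wants choice 10 on question 1
    (1, 2, None, "hiking", 2),  # and the free-text answer 'hiking' on question 2
])
conn.executemany("INSERT INTO user_characteristics VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 10, None, 1),        # user 1's own row, excluded by the join
    (2, 1, 10, None, 3), (2, 2, None, "hiking", 1),
    (3, 1, 11, None, 4), (3, 2, None, "hiking", 4),
])

all_matches = weighted_matches(conn, my_user_id=1, min_score=0)
strong_matches = weighted_matches(conn, my_user_id=1, min_score=6)
print(all_matches)
print(strong_matches)
```

With these rows, user 2 matches on both questions and user 3 on one, so raising the minimum score to 6 leaves only user 2.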

Database Design/SQL Optimisation: WHERE <id> NOT IN (thousands of IDs)

I've been asked to add functionality to an application that lets users vote between two options: A and B. The table for those questions is very basic:
QUESTIONS
question_id (PK)
option_id_1(FK)
option_id_2(FK)
urgent (boolean)
Each time a user votes, the fact that the user has voted is stored in an equally simple table:
USER VOTES
vote_id (PK)
user_id (FK)
question_id (FK)
The algorithm for selecting which question appears when a user requests a new one is complex, but for our purposes we can assume it's random. So, the issue?
Each user will be voting on many questions. Likely hundreds, and possibly thousands. I need to ensure no user is presented with a question they've already voted on, and the only way I can think to do that will, I'm guessing, pound the server into oblivion. Specifically, something like:
SELECT * FROM questions
WHERE question_id NOT IN (SELECT question_id FROM user_votes WHERE user_id = <user_id>)
ORDER BY RAND() LIMIT 1
[Note: RAND() is not actually in the query - it's just there as a substitute for a slightly complex (order_by).]
So, keeping in mind that many users could well have voted on hundreds if not thousands of questions, and that it's not possible to present the questions in a set order...any ideas on how to exclude voted-on questions without beating my server into the ground?
All advice appreciated - many thanks.

The JOIN operator performs much better than nested queries in MySQL (that might have changed in the latest MySQL releases, but if you are experiencing performance problems then I guess my statement still holds).
What you could do is simply left join the votes onto the questions and pick only those records where no vote was joined (i.e. this user has not voted on the question):
SELECT *
FROM questions q
LEFT JOIN user_votes uv ON
uv.question_id = q.question_id AND
uv.user_id = '<user_id>'
WHERE vote_id IS NULL

RAND() is nasty; however, this may mitigate the problem while giving you the results you need. Since you mentioned that RAND() is only an example, I can't really provide more specific suggestions than the one below; however, replacing the ORDER BY should work just fine.
The more you are able to limit the number of rows in the inner query, the faster the entire query will perform.
SELECT
q.*
FROM (
-- First get the questions which have not been answered
SELECT
questions.*
FROM questions
LEFT JOIN user_votes
ON user_votes.question_id = questions.question_id
AND user_votes.user_id = <user_id>
WHERE user_votes.user_id IS NULL
) q
-- Now get a random 1. I hate RAND().
ORDER BY RAND()
LIMIT 1
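Another common way to express the same exclusion is NOT EXISTS. I can't say which form MySQL will execute faster on your data, so treat this as something to compare with EXPLAIN rather than a recommendation. The sketch below runs on SQLite through Python's sqlite3 module as a stand-in. The index name, the sample rows and the unvoted_question_ids helper are hypothetical; the composite index on (user_id, question_id) is my assumption about what would let the per-question lookup stay cheap.

```python
import sqlite3

def unvoted_question_ids(conn, user_id):
    """Ids of questions the given user has not voted on yet."""
    sql = """
        SELECT q.question_id
        FROM questions q
        WHERE NOT EXISTS (
            SELECT 1
            FROM user_votes uv
            WHERE uv.question_id = q.question_id
              AND uv.user_id = ?
        )
        ORDER BY q.question_id
    """
    return [row[0] for row in conn.execute(sql, (user_id,))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY, urgent INTEGER);
    CREATE TABLE user_votes (
        vote_id INTEGER PRIMARY KEY, user_id INTEGER, question_id INTEGER
    );
    -- Hypothetical index: one entry per (user, question) vote.
    CREATE UNIQUE INDEX idx_user_votes_user_question
        ON user_votes (user_id, question_id);

    INSERT INTO questions VALUES (1, 0), (2, 0), (3, 1), (4, 0), (5, 0);
    INSERT INTO user_votes (user_id, question_id) VALUES
        (7, 1), (7, 2), (7, 4), (8, 3);
""")

print(unvoted_question_ids(conn, 7))
print(unvoted_question_ids(conn, 8))
```

The real ORDER BY from your selection algorithm would replace the ORDER BY q.question_id used here, followed by LIMIT 1.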

MYSQL joins not working as expected

I am trying to create a join statement to solve my problem but cannot get my head around it.
I am new to join statements, so please bear with me if my SQL statement is nonsense.
I have two tables, one is a table of questions where users have asked questions about items for sale.
Second is a table of items that the user has asked a question about.
Table one called questions consists of question_ref, questioner_user_ref, item_ref, seller_ref, question_text, timestamp
questions
===========
+--------------+--------------------+---------+-----------+--------------+----------+
| question_ref |questioner_user_ref |item_ref |seller_ref |question_text |timestamp |
+--------------+--------------------+---------+-----------+--------------+----------+
Table two called my_item_comments consists of questioner_ref, item_ref, last_question_ref
my_item_comments
===========
+---------------+---------+------------------+
|questioner_ref |item_ref |last_question_ref |
+---------------+---------+------------------+
I have set up table two to keep track of items that the user has asked questions about so they can be informed when someone else asks the seller a question or the seller answers.
So I want to create a recordset of questions that
a). Someone has answered a question about an item the user is selling
b). A seller has replied to a question the user asked,
c). A third user has asked a question about an item that the user has also asked a question about.
A bit like facebook's commenting system, where you are informed about comments people have made on statuses that you have commented on.
So my current sql statement is as follows
$user_ref= current logged in user
$sql=mysql_query("
SELECT * FROM questions
LEFT JOIN my_item_comments
ON questions.item_ref=my_item_comments.item_ref
WHERE questions.questioner_user_ref!='$user_ref'
AND (questions.seller_ref='$user_ref' OR questions.item_ref=my_item_comments.item_ref
ORDER BY timestamp DESC");
The results don't work and I think it's because of the OR questions.item_ref=my_item_comments.item_ref, but for the life of me I cannot work it out.
Any help would be greatly appreciated, even if it means restructuring my database with more tables or new fields in the tables.
thanks in advance, Barry

SELECT * FROM questions
LEFT JOIN my_item_comments ON questions.item_ref=my_item_comments.item_ref
WHERE
questions.questioner_user_ref != '$user_ref' AND
(questions.seller_ref='$user_ref' OR questions.item_ref=my_item_comments.item_ref)
ORDER BY timestamp DESC
I think you were missing a ) to enclose the OR clause (in the example above I added it)

You should try to improve readability for yourself and us so that it is easier to debug and easier for us to help you :-)
$q = "
SELECT *
FROM questions
RIGHT JOIN my_item_comments ON questions.item_ref = my_item_comments.item_ref
WHERE questions.questioner_user_ref != '".$user_ref."'
AND ( questions.seller_ref = '".$user_ref."'
OR questions.item_ref=my_item_comments.item_ref
)
ORDER BY timestamp DESC";
$rs = mysql_query($q);
And preferably just the query with dummy data
SELECT *
FROM questions
RIGHT JOIN my_item_comments ON questions.item_ref = my_item_comments.item_ref
WHERE questions.questioner_user_ref != 'dummy_ref'
AND ( questions.seller_ref = 'dummy_ref'
OR questions.item_ref=my_item_comments.item_ref
)
ORDER BY timestamp DESC
Try a right join for your query. Left join will indeed get all the rows.
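One thing none of the queries above does is restrict my_item_comments to the logged-in user, so a question about any item that anybody is tracking can match. If the intent is "items that I have asked about" (that is my reading of the question, not something it states outright), a different technique is to drop the join and use EXISTS. Here is a sketch on SQLite via Python's sqlite3 module as a stand-in for MySQL. The sample rows and the notifications helper are invented for illustration.

```python
import sqlite3

def notifications(conn, user_ref):
    """Questions from other users about items the user sells or has asked about."""
    sql = """
        SELECT q.question_ref
        FROM questions q
        WHERE q.questioner_user_ref != :me
          AND (q.seller_ref = :me
               OR EXISTS (SELECT 1
                          FROM my_item_comments m
                          WHERE m.item_ref = q.item_ref
                            AND m.questioner_ref = :me))
        ORDER BY q.timestamp DESC
    """
    return [row[0] for row in conn.execute(sql, {"me": user_ref})]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (
        question_ref INTEGER PRIMARY KEY, questioner_user_ref TEXT,
        item_ref TEXT, seller_ref TEXT, question_text TEXT, timestamp INTEGER
    );
    CREATE TABLE my_item_comments (
        questioner_ref TEXT, item_ref TEXT, last_question_ref INTEGER
    );
    -- Hypothetical data: alice sells itemA and has asked about carol's itemB.
    INSERT INTO questions VALUES
        (1, 'bob',   'itemA', 'alice', 'q', 100),
        (2, 'alice', 'itemB', 'carol', 'q', 200),
        (3, 'dave',  'itemB', 'carol', 'q', 300),
        (4, 'dave',  'itemC', 'carol', 'q', 400);
    INSERT INTO my_item_comments VALUES
        ('bob', 'itemA', 1), ('alice', 'itemB', 2),
        ('dave', 'itemB', 3), ('dave', 'itemC', 4);
""")

print(notifications(conn, 'alice'))
```

For alice this returns question 3 (someone else asked about an item she asked about) and question 1 (someone asked about an item she sells), but not question 4, which concerns an item only dave is tracking. Using bound parameters also avoids building the SQL from '$user_ref' by string concatenation.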
