Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I'm building functionality similar to Tinder. People can 'like' or 'skip' photos of someone else; if both people 'like' each other, there is a match.
What is the best database structure for this functionality? I want to be able to get a list of all matches and all matches per person.
Approach 1:
Person | JudgedPerson | Like
------ | ------------ | ----
1 | 2 | yes
2 | 1 | yes
1 | 3 | yes
3 | 1 | no
2 | 3 | yes
This looks like a logical approach, but it is difficult to write a MySQL query to discover matches. Or is there a simple way to discover them?
Approach 2
Person1 | Person2 | P1LikesP2 | P2LikesP1
------- | ------- | --------- | ---------
1 | 2 | yes | yes
1 | 3 | yes | no
2 | 3 | yes | null
It's easy to create queries to get matches, but the data model might not be the best.
What is the best approach?
If approach 1 is the best approach, what MySQL queries can I use to discover the matches?
I don't have a formal reason for why I prefer the first option, but it is clear that the second option is not completely normalized.
To query the first table and find pairs of people who like each other, you can try the following self join:
-- LIKE is a reserved word in MySQL, so the column name needs backticks
SELECT DISTINCT LEAST(t1.Person, t1.JudgedPerson) AS Person1,
                GREATEST(t1.Person, t1.JudgedPerson) AS Person2
FROM yourTable t1
INNER JOIN yourTable t2
    ON t1.JudgedPerson = t2.Person AND
       t1.Person = t2.JudgedPerson
WHERE t1.`Like` = 'yes' AND
      t2.`Like` = 'yes'
Note: I added DISTINCT along with LEAST/GREATEST to the SELECT clause because the join returns each match twice: the pair 1 -> 2, 2 -> 1 is one matching row, and 2 -> 1, 1 -> 2 is a second one. LEAST/GREATEST puts both rows in the same order so that DISTINCT can collapse them.
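To try the self join without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answer's exact query: the table name likes and the column name liked are made up here, and SQLite's two-argument MIN/MAX stand in for MySQL's LEAST/GREATEST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring "Approach 1" from the question.
conn.execute("CREATE TABLE likes (person INTEGER, judged_person INTEGER, liked TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?, ?)",
    [(1, 2, "yes"), (2, 1, "yes"), (1, 3, "yes"), (3, 1, "no"), (2, 3, "yes")],
)

# Self join: a row t1 (A judged B) pairs with a row t2 (B judged A).
# MIN/MAX with two arguments are scalar functions in SQLite and play
# the role of LEAST/GREATEST, so each pair is reported once.
matches = conn.execute(
    """
    SELECT DISTINCT MIN(t1.person, t1.judged_person) AS person1,
                    MAX(t1.person, t1.judged_person) AS person2
    FROM likes t1
    JOIN likes t2
      ON t1.judged_person = t2.person
     AND t1.person = t2.judged_person
    WHERE t1.liked = 'yes' AND t2.liked = 'yes'
    ORDER BY person1, person2
    """
).fetchall()

print(matches)  # only persons 1 and 2 like each other
```

With the sample rows from the question, only the pair (1, 2) comes back: person 3 said 'no' to person 1 and has not judged person 2 yet.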
Personally, I would consider adding another option to the presented ones: having 2 tables - likes and matches:
Matches
Person1 | Person2
------ | --------
1 | 2
1 | 3
2 | 1
3 | 1
Likes
Who | Whom | Likes
--- | -----|---------
2 | 3 | 'no'
Getting matches would be a simple query:
SELECT p.*
FROM Persons p
INNER JOIN Matches m ON p.Id = m.Person2
WHERE m.Person1 = @judgedPersonId
The idea is to precompute matches instead of resolving them on each query, either in a background process or during the Like operation (removing the two-way likes and adding records to the matches table).
This way one gets faster and easier queries when selecting matches, but the approach involves additional complexity computing "matches" and doing related queries (e.g. finding people who are not yet matched and not disliked).
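A rough sketch of the "during the Like operation" variant, again with Python's sqlite3 as a stand-in for MySQL. The function name record_like and the lowercase table and column names are assumptions for illustration; a real implementation would also need to handle repeated judgements and concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (who INTEGER, whom INTEGER, likes TEXT)")
conn.execute("CREATE TABLE matches (person1 INTEGER, person2 INTEGER)")


def record_like(conn, who, whom, likes):
    """Store a judgement; return True if it completed a match (hypothetical helper)."""
    with conn:  # run the check and the writes in one transaction
        reciprocal = conn.execute(
            "SELECT 1 FROM likes WHERE who = ? AND whom = ? AND likes = 'yes'",
            (whom, who),
        ).fetchone()
        if likes and reciprocal is not None:
            # Two-way like: drop the pending like and store the match in
            # both directions so it can be looked up from either side.
            conn.execute("DELETE FROM likes WHERE who = ? AND whom = ?", (whom, who))
            conn.executemany(
                "INSERT INTO matches (person1, person2) VALUES (?, ?)",
                [(who, whom), (whom, who)],
            )
            return True
        conn.execute(
            "INSERT INTO likes (who, whom, likes) VALUES (?, ?, ?)",
            (who, whom, "yes" if likes else "no"),
        )
        return False


first = record_like(conn, 1, 2, True)    # nothing to pair with yet
second = record_like(conn, 2, 1, True)   # completes the match
third = record_like(conn, 2, 3, False)   # a skip never matches

matches_of_1 = conn.execute("SELECT person2 FROM matches WHERE person1 = 1").fetchall()
pending = conn.execute("SELECT who, whom, likes FROM likes").fetchall()
```

After these three calls the matches table holds 1-2 in both directions and the likes table only keeps the unresolved 'no' from 2 to 3.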
Related
I'm breaking my head over how to do this one in SQL. I have a table:
| User_id | Question_ID | Answer_ID |
| 1 | 1 | 1 |
| 1 | 2 | 10 |
| 2 | 1 | 2 |
| 2 | 2 | 11 |
| 3 | 1 | 1 |
| 3 | 2 | 10 |
| 4 | 1 | 1 |
| 4 | 2 | 10 |
It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})
I'm trying to find an answer to this query: for a particular question and answer id (related to the same question), I want to find the most common answer given to OTHER questions by users who gave that answer.
For example, For the above table:
For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
For Answer_ID = 2 - (Question 2 - Answer ID 11)
Is it possible to do in one query? Should it be done in one query? Shall I just use stored procedure or Java for that one?
Though @rick-james is right, I am not sure that it is easy to get started when you do not know how queries like this are usually written for MySQL.
You need a query to find out the most common answers to questions:
SELECT
question_id,
answer_id,
COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC
This would return a table where for each question_id we output counts in descending order.
| 1 | 1 | 3 |
| 1 | 2 | 1 |
| 2 | 10 | 3 |
| 2 | 11 | 1 |
Now we need to solve a so-called greatest-n-per-group task. In MySQL, for the sake of performance, tasks like this are usually solved not in pure SQL but with hacks that rely on knowledge of how queries are processed internally.
In this case we know that we can define a variable and, while iterating over the already-sorted table, remember the previous row, which lets us distinguish the first row in a group from the others.
SELECT
    question_id, answer_id, cnt,
    IF(question_id=@q_id, NULL, @q_id:=question_id) as v
FROM (
    SELECT
        question_id, answer_id, COUNT(*) as cnt
    FROM user_answers
    GROUP BY 1, 2
    ORDER BY 1, 3 DESC) cnts
JOIN (
    SELECT @q_id:=-1
) as init;
Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly cast later). Here is the result:
| 1 | 1 | 3 | 1 |
| 1 | 2 | 1 |(null)|
| 2 | 10 | 3 | 2 |
| 2 | 11 | 1 |(null)|
Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:
SELECT
    question_id, answer_id
FROM (
    SELECT
        question_id, answer_id
    FROM user_answers
    GROUP BY 1, 2
    ORDER BY 1, COUNT(*) DESC) cnts
JOIN (
    SELECT @q_id:=-1
) as init
WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;
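The "remember the previous row" idea behind the variable trick can also be sketched outside the database. The following Python snippet (sqlite3 as a stand-in for MySQL, data taken from the question) sorts the counts the same way and keeps only the first row of each question_id group. It illustrates the technique and is not a replacement for the in-database query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER, answer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO user_answers VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 10), (2, 1, 2), (2, 2, 11),
     (3, 1, 1), (3, 2, 10), (4, 1, 1), (4, 2, 10)],
)

# Same inner query as above: counts per (question, answer),
# most frequent answer first within each question.
counted = conn.execute(
    """
    SELECT question_id, answer_id, COUNT(*) AS cnt
    FROM user_answers
    GROUP BY question_id, answer_id
    ORDER BY question_id, cnt DESC
    """
).fetchall()

top_answers = []
previous_question = None  # plays the role of the @q_id variable
for question_id, answer_id, cnt in counted:
    if question_id != previous_question:
        # First row of a new group, i.e. the most common answer.
        top_answers.append((question_id, answer_id))
        previous_question = question_id

print(top_answers)
```

For the sample data this keeps answer 1 for question 1 and answer 10 for question 2, the same rows that the filtered MySQL query returns.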
The last thing worth mentioning: for the query to be efficient you should have the correct indexes. This query requires an index starting with the (question_id, answer_id) columns. Since you need a UNIQUE index anyway, it makes sense to define it in this order: (question_id, answer_id, user_id).
CREATE TABLE user_answers (
user_id INTEGER,
question_id INTEGER,
answer_id INTEGER,
UNIQUE INDEX (question_id, answer_id, user_id)
) engine=InnoDB;
Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.
Do you want a fish? Or do you want to learn how to fish?
Your question seems to have multiple steps.
1. Fetch info about "questions by users with the given answer". Devise this SELECT and imagine that the results form a new table.
2. Apply the "OTHER" restriction. This is probably a minor AND ... != ... added to SELECT #1.
3. Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1. It is likely to use a derived table:
SELECT ...
FROM ( select#2 )
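One possible way to put those steps together, sketched with Python's sqlite3 as a stand-in for MySQL. The function name most_common_other_answer and the aliases given and other are assumptions for illustration. Note that LIMIT 1 returns a single most common answer across all other questions, not one per question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER, answer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO user_answers VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 10), (2, 1, 2), (2, 2, 11),
     (3, 1, 1), (3, 2, 10), (4, 1, 1), (4, 2, 10)],
)


def most_common_other_answer(conn, question_id, answer_id):
    """Return (question_id, answer_id, count) or None (hypothetical helper)."""
    return conn.execute(
        """
        SELECT other.question_id, other.answer_id, COUNT(*) AS cnt
        FROM user_answers given              -- step 1: users with the given answer
        JOIN user_answers other
          ON other.user_id = given.user_id
         AND other.question_id != given.question_id   -- step 2: OTHER questions only
        WHERE given.question_id = ? AND given.answer_id = ?
        GROUP BY other.question_id, other.answer_id
        ORDER BY cnt DESC                    -- step 3: most common first
        LIMIT 1
        """,
        (question_id, answer_id),
    ).fetchone()


result_for_answer_1 = most_common_other_answer(conn, 1, 1)
result_for_answer_2 = most_common_other_answer(conn, 1, 2)
```

With the sample table, answer 1 to question 1 leads to answer 10 on question 2 (three users), and answer 2 leads to answer 11 (one user), which agrees with the example in the question.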
Your question has several conditions. First get the questions, together with the users who asked them, from the question table:
select question_id, user_id from question
Then insert the answer to the asked question and make some checks in your Java code (for example: has the user answered the same question as the user who is asking it, or has the user answered this question more than once).
select question_id, user_id from question where user_id = asking-user_id -- gets all questions to show in the UI
select answer_id, user_id from answer where user_id = answering-user_id -- checks the answers given by that particular user
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
In my application I need to assign multiple groups to my users. There are 1000+ users and 10-15 groups.
Which database design is better?
One-to-many:
USER_ID | GROUP_1 | GROUP_2 | ... | GROUP_15
--------------------------------------------
1 | true | false | ... | true
2 | false | true | ... | true
3 | true | true | ... | true
. | . | . | ... | .
. | . | . | ... | .
. | . | . | ... | .
or many-to-many:
USER_ID | GROUP_ID
------------------
1 | 1
1 | 15
2 | 2
2 | 15
3 | 1
3 | 2
3 | 15
. | .
. | .
. | .
?
The many-to-many is the better design without a doubt.
The first design makes writing queries difficult. Consider the following routine queries.
Is a specified user in a specified group? To do this you have to use a different query for each group. This is undesirable. Also if you are using column names for groups, then the list of groups is part of the database schema rather than being part of the data, where the users are data.
What groups is a specified user in? You could simply return the single row, though many applications would probably prefer (and are versed in) iterating through a result set. Iterating through a subset of columns is doable but unnatural.
What users does a specified group contain? Now you are back to a different query for each group.
I'll leave the demonstration of these things as an exercise to the reader.
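For readers who want to see it anyway, here is a small sketch of the three routine queries against the many-to-many table, using Python's sqlite3 module. The table name user_groups and the helper function names are assumptions for illustration; the point is that each question needs one parameterised query, whatever the group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_groups ("
    " user_id INTEGER, group_id INTEGER,"
    " PRIMARY KEY (user_id, group_id))"
)
conn.executemany(
    "INSERT INTO user_groups VALUES (?, ?)",
    [(1, 1), (1, 15), (2, 2), (2, 15), (3, 1), (3, 2), (3, 15)],
)


def is_member(conn, user_id, group_id):
    # Is a specified user in a specified group?
    row = conn.execute(
        "SELECT 1 FROM user_groups WHERE user_id = ? AND group_id = ?",
        (user_id, group_id),
    ).fetchone()
    return row is not None


def groups_of(conn, user_id):
    # What groups is a specified user in?
    rows = conn.execute(
        "SELECT group_id FROM user_groups WHERE user_id = ? ORDER BY group_id",
        (user_id,),
    )
    return [group_id for (group_id,) in rows]


def users_in(conn, group_id):
    # What users does a specified group contain?
    rows = conn.execute(
        "SELECT user_id FROM user_groups WHERE group_id = ? ORDER BY user_id",
        (group_id,),
    )
    return [user_id for (user_id,) in rows]
```

With the multi-column design, is_member and users_in would each need the column name spliced into the SQL text, because a column cannot be passed as a query parameter.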
The relational model, which SQL databases approximate, was intended to deal with relations and keys (tables and primary/foreign keys). Information should exist in one (and ONLY ONE) place AS DATA (not metadata). The multi-column approach lacks normalization and will be a maintenance headache into the future.
Note: I edited this response to correct a misreading on my part of the original code. The thrust of the comments is the same however. The second (many-to-many) is the way to go.
If you want to follow the rules of an entity relationship model:
Many-to-many: users can belong to different groups & groups can have multiple users.
One-to-many: a user belongs to one group & groups can have multiple users.
Your second example is a many-to-many, your first isn't a one-to-many. A one-to-many would be:
USER_ID | GROUP_ID
------------------
1 | 1
2 | 15
3 | 2
4 | 15
5 | 1
6 | 2
7 | 15
Where user_id must be unique.
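The uniqueness requirement can be enforced by the database itself. In this sketch (Python's sqlite3, with a hypothetical table name user_group) user_id is the primary key, so a second group for the same user is rejected, which is exactly what makes the design one-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# user_id as PRIMARY KEY means each user can appear only once.
conn.execute(
    "CREATE TABLE user_group (user_id INTEGER PRIMARY KEY, group_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO user_group VALUES (?, ?)", [(1, 1), (2, 15), (3, 2)])

try:
    conn.execute("INSERT INTO user_group VALUES (1, 15)")  # second group for user 1
    second_group_allowed = True
except sqlite3.IntegrityError:
    second_group_allowed = False

print(second_group_allowed)
```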
Option 2 is the standard approach: you can add more groups at any time, and simple SQL join queries handle it easily.
I have two SQL tables:
Tests
id | name
1 | simple
2 | advanced
3 | professional
Questions
id | Question | answer a | answer b | test_id
1 | Working? | Yes | no | 1,3
2 | Do you smoke? | Yes | no | 2,3
3 | You have a driving license? | Yes | No | 2
Questions should be displayed only if in "test_id" is the id of the desired test, for example, test 2 should contain only questions 2 and 3
SELECT * FROM questions WHERE... (array "test_id" contains 2)
This isn't a duplicate: the suggested question uses a query like "WHERE ... value=(1,2,3,4)", whereas my problem needs the opposite, "... WHERE (1,2,3,4,5)=value", so the "in" method is not correct here.
You should normalize your database, creating a third table to link questions with the tests they appear on:
Tests:
id | name
1 | simple
2 | advanced
3 | professional
Questions:
id | question | answer a | answer b
1 | Working? | Yes | No
2 | Do you smoke? | Yes | No
3 | You have a driving license? | Yes | No
TestQuestions:
id | test_id | question_number | question_id
1 | 1 | 1 | 1
2 | 2 | 1 | 2
3 | 2 | 2 | 3
4 | 3 | 1 | 1
5 | 3 | 2 | 2
You can then fetch the questions for an individual test by joining the TestQuestions table with Questions:
SELECT Questions.*
FROM TestQuestions
INNER JOIN Questions ON Questions.id = TestQuestions.question_id
WHERE TestQuestions.test_id = 2
ORDER BY TestQuestions.question_number
If you are unable to modify your database tables, then you could also determine the questions in a test using MySQL's FIND_IN_SET function:
SELECT * From Questions WHERE FIND_IN_SET('2', test_id) > 0
It is preferable to use the normalized database structure though since this allows the ordering of questions on a test to be expressed and indexes can also be created to allow better performance.
To evaluate the FIND_IN_SET query, the database must consider every row in the Questions table to see if it matches. By adding the following index to the normalized database, the database would be able to seek directly to the relevant questions for a test:
CREATE UNIQUE INDEX test_id_question_number ON TestQuestions (test_id, question_number)
You really should include another table that maps the question to the test. The way you have it now, the questions cannot be easily linked to the test. For example, if you try to use a LIKE comparison you may get questions back that you didn't intend. Say you are looking for the questions for test 2:
SELECT * FROM questions WHERE test_id LIKE '%2%';
You would also retrieve items for tests 12, anything in the 20s and 32, 42, etc.
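The false positives are easy to reproduce. The sketch below uses Python's sqlite3 module with the question's three rows plus one made-up row for a hypothetical test 12. It also shows a delimiter-aware pattern that avoids the problem, written with SQLite's || operator for string concatenation (in MySQL, CONCAT would be used instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, test_id TEXT)")
conn.executemany(
    "INSERT INTO questions VALUES (?, ?)",
    [(1, "1,3"), (2, "2,3"), (3, "2"), (4, "12")],  # row 4 is a made-up example
)

# Plain substring match: '12' contains '2', so question 4 is returned too.
naive = [row[0] for row in conn.execute(
    "SELECT id FROM questions WHERE test_id LIKE '%2%' ORDER BY id"
)]

# Wrapping both the list and the wanted id in commas matches whole items only.
delimited = [row[0] for row in conn.execute(
    "SELECT id FROM questions WHERE ',' || test_id || ',' LIKE '%,2,%' ORDER BY id"
)]

print(naive, delimited)
```

Even the delimiter-aware version still has to scan every row, so it is a workaround rather than a substitute for the mapping table.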
You should create a new table called testquestions (for example). This would have two fields
test_id
question_id
And then this data (using what you showed above)
test_id | question_id
1 | 1
2 | 2
2 | 3
3 | 1
3 | 2
Now you can run a query like this
SELECT
    tests.name,
    questions.question,
    questions.`answer a`,
    questions.`answer b`
FROM (
    tests LEFT JOIN testquestions ON tests.id = testquestions.test_id
) LEFT JOIN
    questions ON questions.id = testquestions.question_id
WHERE
    tests.id = 2;
If you need to search one test id at a time you can use
SELECT * FROM questions WHERE CONCAT(',', test_id, ',') LIKE '%,1,%'
I think Phil Ross offers the best solution (the only thing I would change is to store the Yes/No answers as true/false or as a small int with 0/1). But if you want to keep your tables as they are, you can do something like this:
SELECT q.* FROM Questions q
INNER JOIN tests t
ON t.id = 2
AND q.test_id regexp(CONCAT('(^|,)',t.id, '(,|$)'));
P.S. The query also joins the tests table, so you can select values from it if needed.
GL!
EDIT: Fixed following Phil's suggestion. Thanks, Phil :)
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have a "relationship" matrix like:
+---------+------------+------------+------------+------------+------------+
| name | Albert | Bob | Charles | Dale | Ethan |
+---------+------------+------------+------------+------------+------------+
| Albert | | 0 | 1 | 1 | -1 |
| Bob | | | 1 | -1 | 1 |
| Charles | | | | 0 | 1 |
| Dale | | | | | 0 |
| Ethan | | | | | |
+---------+------------+------------+------------+------------+------------+
0 means they don't know each other
1 means they like each other
-1 means they don't like each other
Now, I want to input two names and get the number of mutual known people and 'speculate' on their relationship by adding up the 'likes' (preferable in one single SELECT).
For example take the pair of Charles and Dale:
Charles knows Albert and Bob, who also know Dale. The relationship
between Charles and Dale would probably be friendly since Charles
likes Albert (+1) who likes Dale (+1) and Charles likes Bob (+1)
though Bob does not like Dale (-1).
So, the output would be 2 mutual known people and a 'speculation' of +3.
I can't get my head around a functional subselect query, plus the fact that the matrix is only half-filled seems to make it more complicated (sometimes a name is the first index, sometimes it is the second).
Could someone help me formulate a useful query, please?
As per a comment above, you should modify your table structure to something more sensible.
So we assume tables like:
Person - Columns: (PersonId, Name)
PersonRelationships - Columns: (Person1Id, Person2Id, Relationship)
Then a query might look like:
DECLARE @Person1Id INT;
DECLARE @Person2Id INT;
SET @Person1Id = 1;
SET @Person2Id = 2;
SELECT SUM(r1.Relationship + r2.Relationship)
FROM
(
    -- everyone linked to person 1, in either direction
    SELECT
        Person2Id AS CommonRelatedPersonId, Relationship
    FROM PersonRelationships
    WHERE Person1Id = @Person1Id
    UNION
    SELECT
        Person1Id AS CommonRelatedPersonId, Relationship
    FROM PersonRelationships
    WHERE Person2Id = @Person1Id
) r1
JOIN
(
    -- everyone linked to person 2, in either direction
    SELECT
        Person2Id AS CommonRelatedPersonId, Relationship
    FROM PersonRelationships
    WHERE Person1Id = @Person2Id
    UNION
    SELECT
        Person1Id AS CommonRelatedPersonId, Relationship
    FROM PersonRelationships
    WHERE Person2Id = @Person2Id
) r2 ON r1.CommonRelatedPersonId = r2.CommonRelatedPersonId;
Please excuse any syntax errors - I'm more used to MS SQL Server syntax.
Still you should be able to see the concept - you need a relationship table, linking people, and you need to assume the link could be in either direction (hence the unions above)
Join the two unioned (A -> B + B -> A) copies together on the common related person, sum the total, and you're there.
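Here is a runnable sketch of that concept using Python's sqlite3 module and the matrix from the question, with names used directly as keys to keep it short. The table and column names are assumptions. The sum follows the query above as written, so it also includes people that only one of the two knows; the mutual count, which is an addition of this sketch, only counts people whom both know (non-zero on both sides).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_relationships (person1 TEXT, person2 TEXT, relationship INTEGER)"
)
# Upper half of the matrix from the question, one row per pair.
conn.executemany(
    "INSERT INTO person_relationships VALUES (?, ?, ?)",
    [
        ("Albert", "Bob", 0), ("Albert", "Charles", 1), ("Albert", "Dale", 1),
        ("Albert", "Ethan", -1), ("Bob", "Charles", 1), ("Bob", "Dale", -1),
        ("Bob", "Ethan", 1), ("Charles", "Dale", 0), ("Charles", "Ethan", 1),
        ("Dale", "Ethan", 0),
    ],
)

QUERY = """
SELECT SUM(CASE WHEN r1.relationship <> 0 AND r2.relationship <> 0
                THEN 1 ELSE 0 END) AS mutual_known,
       SUM(r1.relationship + r2.relationship) AS speculation
FROM (
    -- links of the first person, in either direction
    SELECT person2 AS other, relationship FROM person_relationships WHERE person1 = :a
    UNION ALL
    SELECT person1 AS other, relationship FROM person_relationships WHERE person2 = :a
) r1
JOIN (
    -- links of the second person, in either direction
    SELECT person2 AS other, relationship FROM person_relationships WHERE person1 = :b
    UNION ALL
    SELECT person1 AS other, relationship FROM person_relationships WHERE person2 = :b
) r2 ON r1.other = r2.other
"""

mutual_known, speculation = conn.execute(
    QUERY, {"a": "Charles", "b": "Dale"}
).fetchone()
print(mutual_known, speculation)
```

For Charles and Dale the common rows are Albert (1 + 1), Bob (1 - 1) and Ethan (1 + 0), giving 2 mutual known people and a total of 3. Ethan contributes to the sum although Dale does not know him; filtering on non-zero relationships in both subqueries would exclude such rows.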
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
My database is like this:
users table
_id_____favorites_____
1 | -53-87-96 |
2 | -12-54-87 |
images table
_id_____url____________
1 | smile.jpg |
2 | lol.jpg |
I stored favorites with the separator "-"; these favorite numbers are image ids. When a user logs in, they want to see their favorites. How can I write this query?
Favorites should be retrieved via a one-to-many relationship. The favorites table should look like
id | favorite
---+---------
1 | 53
1 | 87
1 | 96
2 | 12
2 | 54
2 | 87
Then you
SELECT * FROM favorites WHERE id = 1
Or whatever the id is.
Never, never, never store multiple values in one column!
That will give you serious problems, like in your case. You should change your database structure. You could add a favorite table
favorites table
+--------+------------+
|user_id |favorite_id |
| 1 | 53 |
| 1 | 87 |
| 1 | 96 |
+--------+------------+
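If the existing data has to be moved into such a table, the packed strings can be split once in application code. A minimal sketch with Python's sqlite3 module follows; the column name image_id is an assumption, and the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, favorites TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "-53-87-96"), (2, "-12-54-87")]
)
conn.execute(
    "CREATE TABLE favorites ("
    " user_id INTEGER, image_id INTEGER,"
    " PRIMARY KEY (user_id, image_id))"
)

# One-off migration: split each '-'-separated string into one row per favorite.
for user_id, packed in conn.execute("SELECT id, favorites FROM users").fetchall():
    image_ids = [int(part) for part in packed.split("-") if part]  # skip the empty leading piece
    conn.executemany(
        "INSERT INTO favorites (user_id, image_id) VALUES (?, ?)",
        [(user_id, image_id) for image_id in image_ids],
    )

favorites_of_1 = [row[0] for row in conn.execute(
    "SELECT image_id FROM favorites WHERE user_id = ? ORDER BY image_id", (1,)
)]
print(favorites_of_1)
```

After the migration, fetching a user's favorites is a plain indexed lookup, and joining to the images table on image_id returns the URLs.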
Your current schema design is bad; you should properly normalize it into a 3-table design.
User Table
UserID (PK)
UserName
other columns...
Images Table
ImageID (PK)
ImageLink
other columns...
Favorites Table
UserID
ImageID
other columns...
Anyway, to answer your question, MySQL's FIND_IN_SET will help but you need to replace - with , using REPLACE().
SELECT a.ID,
b.url
FROM users a
INNER JOIN images b
ON FIND_IN_SET(b.id, REPLACE(a.favorites, '-', ',')) > 0
WHERE a.ID = 1