I'm looking for some help with a problem. I've a table with the following columns/fields -
COLUMNS = userName | userEmail | userInterests | .... and other fields
VALUE = userA | userA#email.com | 1,2,10
VALUE = userB | userB#email.com | 15,27,9,7,2
userName(varchar) = the primary key
userEmail(varchar) = user's email
userInterests(varchar) = comma-separated numbers for interests. I've mapped these numbers to their actual values (interests) in a PHP script, e.g. 1 - Painting, 2 - Dancing and so on...
Now I'm trying to find users with similar interests. The number of interests a user has can vary. My goal is to find the best match for any user.
Suppose User A has 3 interests and User B has 5 interests, and 1 interest is common to both; then I want to get the matched user's name. If there's another user with 2 common interests, return that profile instead. In general, I want to find the best match (where all interests match the other user's) and, failing that, the second-best match (at least 1 or 2 common interests).
I have already looked at this solution but I'm unable to get the desired output. There's only one table, so I can't use joins either.
SQL - Finding Users with Similar Interests
Please help me with this. I've been struggling with it for the past 2 days and couldn't find the right solution. I tried my best to explain the problem; if I failed to do so, please let me know and I can elaborate. Thanks
I think the best approach would be to first normalize your database so you don't have an array-like structure in a column. This would make this type of query more efficient and much easier to write.
If that's not an option, you can have a look at this answer on how to do the normalization via a query into a temporary table and work from there. Once you have a separate table for interests, you can proceed with the solution you already pointed to in your question, counting the occurrences of shared interests.
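To make the normalized approach concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs as-is; the SQL itself should carry over to MySQL with little change. The table name user_interests, the column interestId, the function best_matches and the extra user userC are assumptions made up for illustration. Note that a single table can still be joined to itself.

```python
import sqlite3

# Hypothetical normalized table: one row per (user, interest) pair
# instead of a comma-separated userInterests column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_interests (
        userName   VARCHAR(64) NOT NULL,
        interestId INTEGER     NOT NULL,
        PRIMARY KEY (userName, interestId)
    );
""")

# Split the comma-separated values into rows.
# userA and userB come from the question; userC is invented for the demo.
raw = {"userA": "1,2,10", "userB": "15,27,9,7,2", "userC": "1,2,33"}
rows = [(name, int(i)) for name, csv in raw.items() for i in csv.split(",")]
conn.executemany("INSERT INTO user_interests VALUES (?, ?)", rows)


def best_matches(conn, user_name):
    """Return (otherUser, commonInterestCount) pairs, best match first."""
    return conn.execute(
        """
        SELECT other.userName, COUNT(*) AS common
        FROM user_interests AS mine
        JOIN user_interests AS other          -- self-join on the same table
          ON other.interestId = mine.interestId
         AND other.userName <> mine.userName
        WHERE mine.userName = ?
        GROUP BY other.userName
        ORDER BY common DESC, other.userName
        """,
        (user_name,),
    ).fetchall()


matches = best_matches(conn, "userA")
print(matches)  # [('userC', 2), ('userB', 1)]
```

The first row is the best match; users with no common interest simply do not appear in the result.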
Related
So I am working on a booking system where I post small available jobs for the kids in the community. I am not looking for a direct booking system in the sense that the user can just press a "booking button" and get the job directly. The approach I want to take is that you can SUBMIT INTEREST and then the poster of the job can accept one of the applicants.
I have a few tables, but the essential ones for this question are these two.
|users|
|id | name | age |......
|jobs|
|id | date | salary |
What I am looking for, explained in its simplest form, is a way to store multiple user IDs in a column so that I can later display/control the users connected to the job in question.
I would very much appreciate a solution, or just a tip on how to go about solving the problem.
(I am using a MySQL database, if that adds any value to the question.)
Best regards.
That is an n:m relation. A user can be interested in multiple jobs and a job can be interesting to multiple users. You should have a third table user_jobs for this where you store one record per user interested in a job.
Something like
user_jobs
userid
jobid
date
status
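A rough sketch of how such a user_jobs table could be used follows. It runs on Python's built-in sqlite3 purely for convenience; the column types, the status values 'pending' and 'accepted', the sample rows and the helper names applicants and accept are all assumptions, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(64), age INTEGER);
    CREATE TABLE jobs  (id INTEGER PRIMARY KEY, date DATE, salary INTEGER);
    -- One row per user interested in a job (the n:m link table).
    CREATE TABLE user_jobs (
        userid INTEGER NOT NULL REFERENCES users(id),
        jobid  INTEGER NOT NULL REFERENCES jobs(id),
        date   DATE    NOT NULL,
        status VARCHAR(16) NOT NULL DEFAULT 'pending',
        PRIMARY KEY (userid, jobid)   -- a user can apply to a job only once
    );
    -- Invented sample data.
    INSERT INTO users VALUES (1, 'Anna', 14), (2, 'Ben', 15);
    INSERT INTO jobs  VALUES (10, '2024-06-01', 100);
    INSERT INTO user_jobs (userid, jobid, date) VALUES
        (1, 10, '2024-05-20'), (2, 10, '2024-05-21');
""")


def applicants(conn, job_id):
    """List (name, status) for every user who submitted interest in a job."""
    return conn.execute(
        """
        SELECT u.name, uj.status
        FROM user_jobs AS uj
        JOIN users AS u ON u.id = uj.userid
        WHERE uj.jobid = ?
        ORDER BY uj.date
        """,
        (job_id,),
    ).fetchall()


def accept(conn, job_id, user_id):
    """Mark one applicant as accepted for the job."""
    conn.execute(
        "UPDATE user_jobs SET status = 'accepted' WHERE jobid = ? AND userid = ?",
        (job_id, user_id),
    )


before = applicants(conn, 10)
accept(conn, 10, 2)
after = applicants(conn, 10)
```

The job poster's view is then just the applicants query, and accepting someone is a single-row update rather than string surgery on a list of IDs.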
Is it a good idea to store like count in the following format?
like table:
u_id | post_id | user_id
And count(u_id) of a post?
What if there were thousands of likes for each post? The like table is going to be filled with billions of rows after a few months.
What are other efficient ways to do so?
In short, the answer is: yes, it is OK (to store a row for each like any user gives to any post).
But I would like to split this into several questions:
Q. Is there another way to get count(u_id)? Or, put better, an alternative to:
SELECT COUNT(u_id) FROM likes WHERE post_id = ?
A. Why not? You can store the count in your posts table and increment/decrement it every time a user likes/unlikes a post. You can set up a trigger (or a stored procedure) to automate this. Then, to get the counter, you just need:
SELECT counter FROM posts WHERE post_id = ?
If you like the previous Q/A and think it is a good idea, I have the next question:
Q. Why do we need likes table then?
A. That depends on your application design and requirements. Judging by the set of columns you posted, u_id, post_id, user_id (I would even add a timestamp column), your requirement is to store which user liked which post (and, with the timestamp, when). That means you can detect whether a user has already liked a post and refuse multiple likes. If you don't care about multiple likes or about the historical timeline and stats, you can drop your likes table.
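Putting the two answers above together, a sketch could look like the following. It uses Python's built-in sqlite3 so it is runnable; MySQL trigger syntax differs slightly (for example it requires FOR EACH ROW), so treat the trigger bodies as an illustration. The trigger names, the UNIQUE constraint and the helpers like, unlike and like_count are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        counter INTEGER NOT NULL DEFAULT 0     -- cached like count
    );
    CREATE TABLE likes (
        u_id    INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(post_id),
        user_id INTEGER NOT NULL,
        UNIQUE (post_id, user_id)              -- refuses multiple likes
    );
    -- Keep posts.counter in step with the likes table.
    CREATE TRIGGER likes_after_insert AFTER INSERT ON likes
    BEGIN
        UPDATE posts SET counter = counter + 1 WHERE post_id = NEW.post_id;
    END;
    CREATE TRIGGER likes_after_delete AFTER DELETE ON likes
    BEGIN
        UPDATE posts SET counter = counter - 1 WHERE post_id = OLD.post_id;
    END;
    INSERT INTO posts (post_id) VALUES (1);
""")


def like(conn, post_id, user_id):
    """Record a like; return False if this user already liked the post."""
    try:
        conn.execute(
            "INSERT INTO likes (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def unlike(conn, post_id, user_id):
    conn.execute(
        "DELETE FROM likes WHERE post_id = ? AND user_id = ?",
        (post_id, user_id),
    )


def like_count(conn, post_id):
    """Read the cached counter instead of running COUNT(u_id)."""
    row = conn.execute(
        "SELECT counter FROM posts WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]


results = [like(conn, 1, 7), like(conn, 1, 8), like(conn, 1, 7)]
count_after_likes = like_count(conn, 1)
unlike(conn, 1, 8)
count_after_unlike = like_count(conn, 1)
```

The third like is rejected by the uniqueness constraint, so the counter only ever reflects distinct (post, user) pairs.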
Last question I see here:
Q. The like table is going to be filled with billions of rows after a few months, isn't it?
A. I wish you that success, but IMHO you are 99% wrong. To get just 1M records you need 1000 active users (which is a very, very good number for a personal startup; are you building the whole app with no architect or designer involved?), and EVERY one of those users would have to like EVERY one of 1000 posts, if you have that many.
My point here is: fortunately you have plenty of time before your database becomes big enough to hurt your application. Until your table reaches 10-20M records, you don't need to worry about size and performance.
Let's say I create something similar to Facebook where users can add interests to their profile.
One way to do this is to have a field called Interests in the profile table and list the interest IDs there. The Interests field would then look like this: 1,4,43,66, where each number refers to an interest in the interest table. I would then have to explode the Interests field using PHP to get each interest's name.
Another way to do this is to have a third table that looks like this:
profileID, interestID
1 1
1 4
1 43
1 66
Which would achieve the same thing.
I haven't worked much with databases. I use MySQL. Which is the best way to go? Let me know if anything is unclear.
Thanks for your help!
Your second approach would be the one to go with. It all really depends on your requirements, but the second one is more flexible. With both you can answer the question: "What interests does user X have?", but only with the second one can you answer "What users are interested in Y?". Also, it allows you to do a join, so instead of doing 2 lookups/queries, you can do one.
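As an illustration of the second approach, here is a small sketch run through Python's built-in sqlite3 (the SQL should look much the same in MySQL). The table names interests and profile_interests, the interest names, the second profile and the helper functions are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interests (interestID INTEGER PRIMARY KEY, name VARCHAR(64));
    CREATE TABLE profile_interests (
        profileID  INTEGER NOT NULL,
        interestID INTEGER NOT NULL REFERENCES interests(interestID),
        PRIMARY KEY (profileID, interestID)
    );
    -- Interest names are made up; the IDs match the question's example.
    INSERT INTO interests VALUES
        (1, 'Painting'), (4, 'Dancing'), (43, 'Chess'), (66, 'Hiking');
    -- Profile 1 as in the question, plus an invented profile 2.
    INSERT INTO profile_interests VALUES (1, 1), (1, 4), (1, 43), (1, 66), (2, 4);
""")


def interests_of(conn, profile_id):
    """'What interests does user X have?' answered with one join."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM profile_interests AS pi
        JOIN interests AS i ON i.interestID = pi.interestID
        WHERE pi.profileID = ?
        ORDER BY i.interestID
        """,
        (profile_id,),
    )
    return [r[0] for r in rows]


def profiles_interested_in(conn, interest_id):
    """'What users are interested in Y?', awkward with a 1,4,43,66 string."""
    rows = conn.execute(
        "SELECT profileID FROM profile_interests "
        "WHERE interestID = ? ORDER BY profileID",
        (interest_id,),
    )
    return [r[0] for r in rows]


names = interests_of(conn, 1)
fans = profiles_interested_in(conn, 4)
```

Both questions become plain indexed lookups, and no PHP-side explode is needed to get the interest names.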
I can't understand where Facebook really uses MySQL:
All the Database can be seen as a graph:
Account - Like -> Comment
Account <- friend -> Account2
Account - Like -> Link
So what is stored in MySQL?
The text of the posts and notes?
Does Facebook keep all these entities (account, post, comment) in its graph DB?
Well, I assume that everything you mentioned is stored in MySQL. Every piece of data that is subject to change, including:
Users
Posts
Comments
Information about uploaded pictures (but not pictures themselves)
Likes
Data about users logging in
Ads
Data about users liking / not liking ads
User settings
etc.
Any data that is subject to change needs to be saved in a database for indexing and fast access. The filesystem is fine for write-only data, for example logs, or if you only ever need to access the whole data at once rather than parts of it.
But if you need data to be structured and ready to be accessed quickly, then you need to use a database. You may want to read about binary trees (database indexes are typically B-trees, a related structure): http://en.wikipedia.org/wiki/Binary_tree
About Facebook: if I had to guess, I would say that there are probably hundreds more databases. I don't have access to their servers, so I can't really comment on that :) But as another example, if you install WordPress, it creates 11 different tables. http://codex.wordpress.org/Database_Description
PS. There is no reason Facebook has to use MySQL, though. There are a lot of different databases out there.
EDIT: Thanks for pointing out that I misunderstood your question.
Let's take this case: Account <- friend -> Account2
As said before, they have a table like "Users".
The Users table will have columns:
ID (It is the PRIMARY KEY. This is meant to give a unique ID to each row.)
Username (Text field with some length, for example 64 characters)
...And many more...
Now there will be a table "Friends". It will have fields:
ID (again, PRIMARY KEY)
Person1
Person2
Both fields Person1 and Person2 will be integers pointing to an ID in the table "Users".
So if table users has three rows:
ID Username
1 rodi
2 rauni
3 superman
Then table "Friends" would be for example:
ID Person1 Person2
1 1 2
2 2 1
3 1 3
4 3 1
Here row 1 means "rodi is friends with rauni" and row 2 means "rauni is friends with rodi". This is redundant, but I wanted to keep the example simple.
Here is a good tutorial: http://www.tizag.com/mysqlTutorial/mysqltables.php
There are many pages there; just keep clicking Next to skip what you already know (I don't know how much you already know).
This one is about joining info from two tables: http://www.tizag.com/mysqlTutorial/mysqljoins.php
You could use this to select all of rodi's friends from our two tables in one query.
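For example, the "all of rodi's friends in one query" idea could be sketched like this, with the sample rows from above loaded into an in-memory SQLite database via Python's built-in sqlite3 (the SELECT should work the same way in MySQL). The lowercase table names and the helper friends_of are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        ID       INTEGER PRIMARY KEY,
        Username VARCHAR(64)
    );
    CREATE TABLE friends (
        ID      INTEGER PRIMARY KEY,
        Person1 INTEGER REFERENCES users(ID),
        Person2 INTEGER REFERENCES users(ID)
    );
    INSERT INTO users VALUES (1, 'rodi'), (2, 'rauni'), (3, 'superman');
    INSERT INTO friends VALUES (1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 3, 1);
""")


def friends_of(conn, username):
    """Return the usernames of everyone listed as a friend of `username`."""
    rows = conn.execute(
        """
        SELECT friend.Username
        FROM users   AS me
        JOIN friends AS f      ON f.Person1 = me.ID      -- my friendship rows
        JOIN users   AS friend ON friend.ID = f.Person2  -- the other person
        WHERE me.Username = ?
        ORDER BY friend.Username
        """,
        (username,),
    )
    return [r[0] for r in rows]


rodi_friends = friends_of(conn, "rodi")
print(rodi_friends)  # ['rauni', 'superman']
```

Because each friendship is stored in both directions in this example, matching on Person1 alone is enough; with only one row per friendship the query would need to check both columns.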
I am working on an auto-suggest feature and as part of it I want to weight the results based on a particular user's preference. For example: If most of my users frequently type in fish, then the algorithm will return fish as the most popular result once the user types f. However, if a particular user mostly types in food, then I want to apply a weight such that it takes that particular user's preference into account.
I initially thought of doing this by having a large auto-suggest index, with a field userids and whenever a user types in a letter, the algorithm would check if that particular user's userid was present in the userids field and if present would apply a corresponding weight to that particular result.
A few records would look like:
word |count |userids
------------------------------------------------------------------------------
food |2 |aa,b,ccd
fish |12 |a,b,c,d,e,f,gg,he,jkl,sd
However, I do not think this is an approach that would scale all that well with even a few hundred active users. What would be a better way to design this DB?
Thanks in advance,
asleepysamurai
P.S. I'm a newbie when it comes to DB design, so please explain your solution in layman terms.
This is not a good idea. The table is not normalized, and you will end up with complicated queries when you need to join on this field.
A better design is to have a wordid field on this table as the primary key (identifying the word) and a many-to-many table to connect words with users (words_to_users, with wordid and userid fields).
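In layman's terms, that design could be sketched as below, using Python's built-in sqlite3 so it can be run directly. The per-user usercount column, the sample numbers and the suggest function are assumptions I added to show how a weight might be applied; the answer itself only calls for wordid and userid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (
        wordid INTEGER PRIMARY KEY,
        word   VARCHAR(64) NOT NULL UNIQUE,
        count  INTEGER NOT NULL            -- how often all users typed it
    );
    CREATE TABLE words_to_users (
        wordid    INTEGER NOT NULL REFERENCES words(wordid),
        userid    VARCHAR(16) NOT NULL,
        usercount INTEGER NOT NULL DEFAULT 1,  -- hypothetical per-user weight
        PRIMARY KEY (wordid, userid)
    );
    INSERT INTO words VALUES (1, 'food', 2), (2, 'fish', 12);
    -- Invented usage numbers: user 'aa' mostly types 'food'.
    INSERT INTO words_to_users VALUES (1, 'aa', 5), (1, 'b', 1), (2, 'b', 1), (2, 'a', 3);
""")


def suggest(conn, prefix, userid):
    """Suggest words for a prefix, the user's own favourites first."""
    rows = conn.execute(
        """
        SELECT w.word
        FROM words AS w
        LEFT JOIN words_to_users AS wu
               ON wu.wordid = w.wordid AND wu.userid = ?
        WHERE w.word LIKE ? || '%'
        ORDER BY COALESCE(wu.usercount, 0) DESC,  -- personal preference
                 w.count DESC                     -- then overall popularity
        """,
        (userid, prefix),
    )
    return [r[0] for r in rows]


for_aa = suggest(conn, "f", "aa")
for_newcomer = suggest(conn, "f", "zz")
```

A user with no history falls back to the global ranking, while each known user only adds one small row per word instead of growing a comma-separated userids string.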