I have a question about how to structure a DB. I have a Reddit-esque voting system. Items can get votes, but each item belongs to a topic and each topic to a category. While only items can get votes, I'd like to be able to access the number of votes within a topic and within a category as well. Any suggestions on how to accomplish this?
I see 4 main ways of doing this:
1. De-normalize the votes and store the vote count in an attribute on the item table, the topic table, and the category table. I would then need to update all 3 whenever an upvote or downvote occurs.
2. Create a separate 'vote' model. Votes belong to items, items to topics, and topics to categories. Then I can just query the number of votes through the chain whenever I need to access anything.
3. Just have items and votes. Items would have a category and a topic attribute, then I'd query for items within a topic and count the votes on them.
4. Learn to use a NoSQL DB system.
Extra info: I'm using Rails and I only really know MySQL at the moment. Is this a time I should learn something like Mongo? Can this only really be accomplished with Hadoop? Can I accomplish this in MySQL? Thanks!
Create a separate 'vote' model. Votes belong to items, items to topics, and topics to categories. Then I can just query number of votes through the chain whenever I need to access anything.
That's the most flexible way to do what you're talking about.
Learn to use a NoSQL db system.
Not for your current project.
Is this a time I should learn something like Mongo?
No.
Can this only really be accomplished with Hadoop?
No. Any SQL database can do this. Whether any SQL database can manage whatever you're planning is a different question. Different platforms scale differently.
Can I accomplish this in MySQL?
Yes, easily.
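To make option 2 concrete, here is a minimal sketch that assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL; the schema and every name in it are hypothetical, and the SQL itself should carry over to MySQL with little change. It sums +1/-1 vote values into a net score; use COUNT(*) instead of SUM(v.value) if you want the raw number of votes. The UNIQUE (user_id, item_id) constraint is one way to limit each user to one vote per item.

```python
import sqlite3

# Hypothetical schema for option 2: votes -> items -> topics -> categories.
SCHEMA = """
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE topics (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    name TEXT NOT NULL
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES topics(id),
    title TEXT NOT NULL
);
CREATE TABLE votes (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES items(id),
    user_id INTEGER NOT NULL,
    value INTEGER NOT NULL CHECK (value IN (-1, 1)),
    UNIQUE (user_id, item_id)  -- one vote per user per item
);
"""

def votes_per_topic(conn, topic_id):
    # Follow votes -> items and sum only the items in this topic.
    return conn.execute(
        """SELECT COALESCE(SUM(v.value), 0)
             FROM votes v JOIN items i ON i.id = v.item_id
            WHERE i.topic_id = ?""",
        (topic_id,),
    ).fetchone()[0]

def votes_per_category(conn, category_id):
    # One more hop: votes -> items -> topics.
    return conn.execute(
        """SELECT COALESCE(SUM(v.value), 0)
             FROM votes v
             JOIN items i ON i.id = v.item_id
             JOIN topics t ON t.id = i.topic_id
            WHERE t.category_id = ?""",
        (category_id,),
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO categories VALUES (1, 'tech');
INSERT INTO topics VALUES (10, 1, 'databases'), (11, 1, 'languages');
INSERT INTO items VALUES (100, 10, 'MySQL tips'), (101, 11, 'Ruby tips');
INSERT INTO votes (item_id, user_id, value) VALUES
    (100, 1, 1), (100, 2, 1), (100, 3, -1), (101, 1, 1);
""")
print(votes_per_topic(conn, 10))     # 1
print(votes_per_category(conn, 1))   # 2
```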
I think you should go for option 2.
You need to create a vote model anyway, since you'll probably want to limit users to one vote on each item.
If you have performance issues later on, you can always cache the number of votes in an item, topic or category.
How you update those numbers should be carefully considered. A trigger on votes that auto-updates all the numbers above might cause too many write operations. Another way may be to run a statistics stored procedure periodically.
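A minimal sketch of both caching strategies, assuming hypothetical vote_total columns and SQLite in place of MySQL. MySQL's trigger syntax differs slightly, and there the periodic job would more likely be a stored procedure or scheduled event than a Python function. The trigger shown only handles new votes; deleted or changed votes would need their own triggers, adding to the write cost.

```python
import sqlite3

# Hypothetical cached totals (vote_total) on items, topics and categories.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY,
                         vote_total INTEGER NOT NULL DEFAULT 0);
CREATE TABLE topics (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL,
                     vote_total INTEGER NOT NULL DEFAULT 0);
CREATE TABLE items (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL,
                    vote_total INTEGER NOT NULL DEFAULT 0);
CREATE TABLE votes (item_id INTEGER NOT NULL, user_id INTEGER NOT NULL,
                    value INTEGER NOT NULL, UNIQUE (user_id, item_id));

-- Approach A: a trigger. Each new vote costs three extra UPDATEs.
CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
BEGIN
  UPDATE items SET vote_total = vote_total + NEW.value
   WHERE id = NEW.item_id;
  UPDATE topics SET vote_total = vote_total + NEW.value
   WHERE id = (SELECT topic_id FROM items WHERE id = NEW.item_id);
  UPDATE categories SET vote_total = vote_total + NEW.value
   WHERE id = (SELECT t.category_id FROM topics t
                 JOIN items i ON i.topic_id = t.id
                WHERE i.id = NEW.item_id);
END;
""")

def recompute_totals(conn):
    """Approach B: rebuild every cached total from the votes table in one
    batch (e.g. from a scheduled job) instead of on every vote."""
    with conn:
        conn.execute("""UPDATE items SET vote_total =
            (SELECT COALESCE(SUM(value), 0) FROM votes
              WHERE votes.item_id = items.id)""")
        conn.execute("""UPDATE topics SET vote_total =
            (SELECT COALESCE(SUM(vote_total), 0) FROM items
              WHERE items.topic_id = topics.id)""")
        conn.execute("""UPDATE categories SET vote_total =
            (SELECT COALESCE(SUM(vote_total), 0) FROM topics
              WHERE topics.category_id = categories.id)""")

conn.executescript("""
INSERT INTO categories (id) VALUES (1);
INSERT INTO topics (id, category_id) VALUES (10, 1), (11, 1);
INSERT INTO items (id, topic_id) VALUES (100, 10), (101, 11);
INSERT INTO votes VALUES (100, 1, 1), (100, 2, 1), (101, 3, -1);
""")
after_trigger = conn.execute(
    "SELECT vote_total FROM categories WHERE id = 1").fetchone()[0]

conn.execute("UPDATE categories SET vote_total = 0")  # simulate drift
recompute_totals(conn)                                 # repair it in batch
```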
Anyway, the real point is - don't optimize until you know there's a problem.
Related
I need to store a lot of similar data about my system of questions and answers, such as votes, follows, bookmarks, etc.
Taking voting as an example, what is the best table layout for storing votes for questions, answers, and posts?
1. Store the votes separately, giving 3 tables: UserQuestionVotes, UserAnswerVotes and UserPostVotes.
2. Store all votes in one table:
   UserVotes (id, user_id, item_id, item_type, vote),
   where item_id and item_type are the id and type of the question, answer or post, and vote is -1 or 1.
If I go the first way, I will have at least 9 tables.
And if I go the second way, all the data is in one heap, so in the future, as the table fills up, it will work more slowly.
Which way is more efficient in my case?
If you're looking for my opinion, I would pick door #1. Questions, Answers, and Posts are all separate, albeit related, "things." And, each of these "things" happen to also have "votes" associated with them ... but, really, a "vote" is not a "thing."
A "vote for a question" is tightly associated with "the question." "A vote for ..." anything else is the same. So now I start thinking about the queries I'm most likely to actually write. I'm most likely to want to write queries that, say, count how many votes a particular question has ... and I don't really want to muddy-up that query and make it either "hard to write" or obliged to look through a bunch of records that are not "votes for questions." The other types of votes wouldn't be relevant and I'd rather not have to filter them out. (If I need to write a query to count "how many votes for anything has this user cast?", I could very easily write that regardless.)
That's my opinion. (The database manager can take care of "efficiency" on its own. Design your database so that the queries you need to write are easy and clear to write.)
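To illustrate the trade-off, here is a sketch of both layouts with hypothetical names, again assuming SQLite in place of MySQL; only the question table is shown for layout 1. It makes two things visible: the per-type table can declare a real foreign key while the shared item_id cannot, and a composite UNIQUE (item_type, item_id, user_id) index lets the shared table answer the filtered query with an index lookup, which speaks to the "it will work more slowly" worry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Layout 1: one table per kind of vote (only questions shown here).
CREATE TABLE questions (id INTEGER PRIMARY KEY);
CREATE TABLE user_question_votes (
    user_id     INTEGER NOT NULL,
    question_id INTEGER NOT NULL REFERENCES questions(id),  -- real foreign key
    vote        INTEGER NOT NULL CHECK (vote IN (-1, 1)),
    PRIMARY KEY (question_id, user_id)
);
-- Layout 2: one shared table, discriminated by item_type.
CREATE TABLE user_votes (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    item_id   INTEGER NOT NULL,  -- no foreign key: the target table varies
    item_type TEXT NOT NULL CHECK (item_type IN ('question', 'answer', 'post')),
    vote      INTEGER NOT NULL CHECK (vote IN (-1, 1)),
    UNIQUE (item_type, item_id, user_id)  -- also serves as the lookup index
);
INSERT INTO questions VALUES (7);
INSERT INTO user_question_votes VALUES (1, 7, 1), (2, 7, 1), (3, 7, -1);
INSERT INTO user_votes (user_id, item_id, item_type, vote) VALUES
    (1, 7, 'question', 1), (2, 7, 'question', 1), (3, 7, 'question', -1),
    (1, 7, 'answer', -1);  -- same id, different kind of thing
""")

def question_score_layout1(conn, question_id):
    return conn.execute(
        "SELECT COALESCE(SUM(vote), 0) FROM user_question_votes"
        " WHERE question_id = ?", (question_id,)).fetchone()[0]

def question_score_layout2(conn, question_id):
    # Every query against the shared table must remember the item_type filter.
    return conn.execute(
        "SELECT COALESCE(SUM(vote), 0) FROM user_votes"
        " WHERE item_type = 'question' AND item_id = ?",
        (question_id,)).fetchone()[0]
```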
I'm writing an application that allows users to create a poll. They ask a question and set n predefined answers to it. Other users can vote on the answers provided for that question.
Current structure
I have designed the database like this:
Storing the vote count
Current thinking is, I create a new column called vote_count on the link table and every time that answer gets voted, it updates the record.
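A minimal sketch of the vote_count approach, assuming a hypothetical poll_answers table (the actual link table only appears in the original diagram) and SQLite in place of MySQL. The key detail is doing the increment inside the UPDATE itself rather than reading the count into application code and writing it back, so concurrent votes are not lost. Note that a bare counter cannot record who voted, so on its own it cannot stop someone voting twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poll_answers (
    id          INTEGER PRIMARY KEY,
    poll_id     INTEGER NOT NULL,
    answer_text TEXT NOT NULL,
    vote_count  INTEGER NOT NULL DEFAULT 0
);
INSERT INTO poll_answers (id, poll_id, answer_text)
VALUES (1, 1, 'Yes'), (2, 1, 'No');
""")

def record_vote(conn, answer_id):
    """Atomically add one vote; return False if the answer does not exist."""
    with conn:
        cur = conn.execute(
            "UPDATE poll_answers SET vote_count = vote_count + 1 WHERE id = ?",
            (answer_id,))
    return cur.rowcount == 1

record_vote(conn, 1)
record_vote(conn, 1)
record_vote(conn, 2)
missing = record_vote(conn, 99)  # no such answer
counts = dict(conn.execute("SELECT answer_text, vote_count FROM poll_answers"))
```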
This works. But is it right? I'm new to database systems, so I can't imagine I'm doing much right. What are some more efficient ways to achieve this?
As far as it goes, yes, that's OK. However, these tables will be incomplete. When your second quiz is created, you'll have to extend the QUESTIONS table. If this second quiz's Q1 also has a yes/no answer, you're going to have to extend the LINK/VOTES table.
You also have to think about how it's going to be queried and design indexes to support those queries.
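As an example of letting the query drive the index, here is a sketch with a hypothetical poll_answers table in SQLite. If the common query is "all answers for one poll", an index on poll_id turns a full table scan into an index search. EXPLAIN QUERY PLAN is SQLite's tool for checking this; in MySQL the equivalent is EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE poll_answers (
    id INTEGER PRIMARY KEY, poll_id INTEGER NOT NULL,
    answer_text TEXT NOT NULL, vote_count INTEGER NOT NULL DEFAULT 0)""")

# The query the application runs most often (assumed for illustration).
RESULTS_SQL = "SELECT answer_text, vote_count FROM poll_answers WHERE poll_id = ?"

def plan(conn, sql, params):
    # Column 3 of each EXPLAIN QUERY PLAN row is the readable description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(conn, RESULTS_SQL, (1,))
conn.execute("CREATE INDEX idx_poll_answers_poll ON poll_answers (poll_id)")
after = plan(conn, RESULTS_SQL, (1,))
print(before)  # expected: a SCAN of poll_answers
print(after)   # expected: a SEARCH using idx_poll_answers_poll
```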
Cheers -
I asked a similar question before (1 table, 150,000,000,000 rows); now I will add some details.
500,000 Items
Unlimited # Categories
15 Sections
The site allows users to create their own categories and place as many items as they like into each category. Before they can add anything, they must choose which section best represents the category.
Each of the above will have: id, title, description, imageURL
I have two issues:
Each CATEGORY/ITEM will be able to rearrange items/categories from greatest to worst. COLUMN: rank
Users will be acknowledged for contributing most to category. COLUMN: king
This feature of the site is pretty simple, but the ranking process is throwing me for a loop. I have tried multiple test runs cramming as much data into one table as possible, but the results are crushing my spirits. Dividing the data into tables is not easy because of the individual ranking for each category.
The original design was to have the above 3 tables plus an individual table for each category/item to allow individual ranking (to boost speed/performance), then:
User contributor: sectionID, categoryID, itemID, userID
Individual Rank: categoryID/itemID, rank
The outcome would be a 150,000,000,000-row labyrinth of tables. Has anyone dealt with this concept before? What is the best plan of action? Am I on the right track?
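One common alternative to a table per category, offered as a sketch under assumptions rather than a tested design: keep ids, titles and so on in the three main tables and put the per-category rank in a single link table keyed by (category_id, item_id), so each category's ordering is just an indexed query. All names are hypothetical; the column is called item_rank because RANK is a reserved word in MySQL 8.0 and later, and "king" is assumed here to mean the user who added the most items to a category. SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sections   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                         section_id INTEGER NOT NULL REFERENCES sections(id));
CREATE TABLE items      (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- One row per item placed in a category; the per-category rank lives here
-- instead of in a separate table for every category.
CREATE TABLE category_items (
    category_id INTEGER NOT NULL REFERENCES categories(id),
    item_id     INTEGER NOT NULL REFERENCES items(id),
    added_by    INTEGER NOT NULL,  -- user who contributed the item
    item_rank   INTEGER NOT NULL,  -- 1 = greatest
    PRIMARY KEY (category_id, item_id)
);
-- Lets "items of category X in rank order" be read straight from the index.
CREATE INDEX idx_category_rank ON category_items (category_id, item_rank);

INSERT INTO sections VALUES (1, 'Music');
INSERT INTO categories VALUES (5, 'Best albums', 1);
INSERT INTO items VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO category_items VALUES (5, 1, 9, 2), (5, 2, 9, 1), (5, 3, 8, 3);
""")

def ranked_items(conn, category_id):
    return [row[0] for row in conn.execute(
        "SELECT item_id FROM category_items WHERE category_id = ?"
        " ORDER BY item_rank", (category_id,))]

def category_king(conn, category_id):
    # Assumed meaning of "king": the user who added the most items here.
    row = conn.execute(
        "SELECT added_by FROM category_items WHERE category_id = ?"
        " GROUP BY added_by ORDER BY COUNT(*) DESC, added_by LIMIT 1",
        (category_id,)).fetchone()
    return row[0] if row else None
```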
I just got High Performance MySQL: Optimization, Backups, and Replication, 3rd Edition and Beginning PHP and MySQL: From Novice to Professional, 4th Edition. I am not promoting or endorsing these books...
These are the first of many steps I will be taking to tackle this design problem. Any comments and assistance will be appreciated. Thoughts? Concerns?
Apologies if this is redundant, and it probably is, I gave it a look but couldn't find a question here that fell in with what I wanted to know.
Basically we have a table with about 50,000 rows, and it's expected to grow much bigger than that. We need to allow admin users to add custom data to an item based on its category, and users can then pick which of the administrator-defined fields they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields to item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values which links values with fields, which is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XML-ish formatted data. This is just for the values of the custom fields.
No problems if I want to fetch the item by its id as the required data is stored in the items table, but what if I wanted to do a search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user input related issues aside) be practical if I wanted to fetch items made of plastic in this case? Like how slow would that be?
Thanks.
Edit: I was afraid of that as realistically this thing will be around 400k rows for that one table at launch, thanks guys.
Any LIKE pattern that starts with % cannot use an index on the column to narrow the search, so the query will scan the whole table to find the result.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.
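A sketch comparing the two designs in SQLite, with hypothetical names modelled on the question's item_fields/item_field_values tables. An index on (field_id, value) lets an exact-match search on a custom field use an index search, while the leading % forces a full scan of item_custom_fields. The plan text shown is SQLite-specific; in MySQL, EXPLAIN shows the same distinction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Field definitions plus one row per (item, field) value.
CREATE TABLE item_fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_field_values (
    item_id  INTEGER NOT NULL,
    field_id INTEGER NOT NULL REFERENCES item_fields(id),
    value    TEXT NOT NULL,
    PRIMARY KEY (item_id, field_id)
);
-- Serves "field = X AND value = Y" with an index search.
CREATE INDEX idx_field_value ON item_field_values (field_id, value);

-- The proposed XML-ish blob, for comparison.
CREATE TABLE item_custom_fields (item_id INTEGER PRIMARY KEY, custom_data TEXT);

INSERT INTO item_fields VALUES (1, 'material');
INSERT INTO item_field_values VALUES (1, 1, 'Plastic'), (2, 1, 'Steel');
INSERT INTO item_custom_fields VALUES
    (1, '<material>Plastic</material>'), (2, '<material>Steel</material>');
""")

EAV_SQL = "SELECT item_id FROM item_field_values WHERE field_id = ? AND value = ?"
LIKE_SQL = "SELECT item_id FROM item_custom_fields WHERE custom_data LIKE ?"

def plan(sql, params):
    # Column 3 of each EXPLAIN QUERY PLAN row is the readable description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

material_id = conn.execute(
    "SELECT id FROM item_fields WHERE name = ?", ("material",)).fetchone()[0]
plastic_eav = conn.execute(EAV_SQL, (material_id, "Plastic")).fetchall()
plastic_like = conn.execute(LIKE_SQL, ("%<material>Plastic</material>%",)).fetchall()
like_plan = plan(LIKE_SQL, ("%<material>Plastic</material>%",))  # full SCAN
eav_plan = plan(EAV_SQL, (material_id, "Plastic"))  # SEARCH via idx_field_value
```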
For websites like Digg, how can you use MySQL to track when someone likes an article?
It seems simple enough to just keep track of the total number of likes. The part I don't understand, is how to
1. keep users from voting on something more than once, and
2. allow users to click on their profile to see the stories they have liked.
Would you have a column in the table containing the story info to which you just add comma-separated user names? You could keep track of who has liked a story, but the data would get huge, especially for websites like Digg that have 100,000 users or more. And how would you allow the user to see all the stories they have liked?
Thank you.
You would need a row for each like. Don't use comma-separated lists.
how to 1. keep users from voting on something more than once
Create a unique index on (articleid, userid).
And how would you allow the user to see all the stories they have liked?
SELECT articleid FROM likes WHERE userid = 42
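A minimal sketch of that answer using its table and column names (likes, articleid, userid), with SQLite in place of MySQL; the index names and the like() helper are hypothetical. A second index on userid is added because the unique index leads with articleid and so cannot serve the per-user lookup efficiently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE likes (
    articleid INTEGER NOT NULL,
    userid    INTEGER NOT NULL
);
-- One row per like; the unique index rejects a second like by the same user.
CREATE UNIQUE INDEX ux_likes_article_user ON likes (articleid, userid);
-- Serves "what has user X liked?" (the unique index leads with articleid).
CREATE INDEX ix_likes_user ON likes (userid);
""")

def like(conn, articleid, userid):
    """Record a like; return False if this user already liked the article."""
    try:
        with conn:
            conn.execute("INSERT INTO likes (articleid, userid) VALUES (?, ?)",
                         (articleid, userid))
        return True
    except sqlite3.IntegrityError:
        return False

first = like(conn, 1, 42)
second = like(conn, 2, 42)
repeat = like(conn, 1, 42)  # same user, same article: rejected
liked = [row[0] for row in conn.execute(
    "SELECT articleid FROM likes WHERE userid = ? ORDER BY articleid", (42,))]
```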
but the data would get huge
Yes, it could get huge. Most websites will easily be able to cope with just a single database. Very large websites will need to use a cluster to store data on several machines. The data needs to be partitioned so that the application knows on which server to find the data.
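One simple way for the application to know which server holds the data is hash partitioning on a key such as the user id. This is a minimal sketch with placeholder server names, not a recommendation for any particular product. Partitioning by user keeps "stories this user liked" on one server, but "who liked this story" would then have to ask every server (or keep a second copy partitioned by story).

```python
import hashlib

# Placeholder server names; not real hosts.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]

def shard_for_user(user_id: int) -> str:
    # Hash the partition key so users spread evenly across servers; the same
    # user always maps to the same server, so all their likes live together.
    digest = hashlib.sha1(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

print(shard_for_user(42))  # always the same server for user 42
```

Plain modulo hashing remaps most users when a server is added; consistent hashing or a lookup directory avoids that, at the cost of more machinery.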
Social networks these days are like the graph data structure.
Every entity, such as people, photos, videos, status updates and comments, is a node of the graph, and likes/unlikes are connections between two nodes.
Ideally you would have a table for likes where you just add a like,
storing who liked, what is liked, and any other info in columns.
Complex social networks do more than just this.
You can store the likes in a separate table called story_likes with two columns: story_id and user_id.
1) Put a constraint in the database that the combination of these should be unique. That way your user can like a story only once.
2) You can pull the stories that the user likes from this table and pull other story details using the story id you have. 100,000 rows is not that big for a MySQL database.
You can also allow your users to dislike a story by adding a state column, e.g. state ENUM('LIKED', 'DISLIKED').
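A sketch of the state column in SQLite, which has no ENUM type, so a CHECK constraint plays that role here; the table follows the answer's story_likes, and set_reaction/tally are hypothetical helpers. With (story_id, user_id) as the primary key, INSERT OR REPLACE turns a second reaction into a change of state rather than a duplicate row. In MySQL, INSERT ... ON DUPLICATE KEY UPDATE would be the usual equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE story_likes (
    story_id INTEGER NOT NULL,
    user_id  INTEGER NOT NULL,
    -- SQLite has no ENUM; this CHECK restricts the allowed values instead.
    state    TEXT NOT NULL CHECK (state IN ('LIKED', 'DISLIKED')),
    PRIMARY KEY (story_id, user_id)
)""")

def set_reaction(conn, story_id, user_id, state):
    # The primary key keeps one row per user per story; REPLACE swaps in the
    # new state if the user has already reacted.
    with conn:
        conn.execute("INSERT OR REPLACE INTO story_likes VALUES (?, ?, ?)",
                     (story_id, user_id, state))

def tally(conn, story_id):
    return dict(conn.execute(
        "SELECT state, COUNT(*) FROM story_likes WHERE story_id = ?"
        " GROUP BY state", (story_id,)))

set_reaction(conn, 1, 10, "LIKED")
set_reaction(conn, 1, 11, "LIKED")
set_reaction(conn, 1, 10, "DISLIKED")  # user 10 changes their mind
```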