I was wondering what the best solution to this problem is:
I have articles and categories; an article belongs to a category. There are two types of categories: user-defined and system-defined (like "inbox", "trash", etc.).
So the question is: should I have only one row for each system-defined category, with the articles of all users attached to these shared categories, or should I create all the system-defined categories when creating a user and attach these categories to that user?
The first solution will result in a lot of articles in each system-defined category, and the second will result in a lot of "redundant" rows in the category table. Which solution is best?
It's your choice. Both approaches have their pros and cons.
If you go with the first solution, your select statement will have a WHERE clause similar to this:
WHERE (category.type = 'SYSTEM' OR category.user_id = :USER_ID)
The second solution will simplify your select statements:
WHERE category.user_id = :USER_ID
The second solution will make it more complex to add a user. You need to create the system defined categories for the user:
INSERT INTO CATEGORY (user_id, category_name, ...) SELECT :USER_ID, system_category, ... FROM SYSTEM_CATEGORY;
And if you add a new system category you need to insert it for all users.
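The per-user approach can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for the real database; the table names, column names, and helper functions are hypothetical and not taken from the question.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE system_category (name TEXT PRIMARY KEY);
    CREATE TABLE category (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name    TEXT NOT NULL,
        type    TEXT NOT NULL CHECK (type IN ('SYSTEM', 'USER')),
        UNIQUE (user_id, name)
    );
    INSERT INTO system_category (name) VALUES ('inbox'), ('trash');
""")


def create_user_categories(conn, user_id):
    """Copy every system-defined category into the new user's own rows."""
    conn.execute(
        "INSERT INTO category (user_id, name, type) "
        "SELECT ?, name, 'SYSTEM' FROM system_category",
        (user_id,),
    )


def categories_for(conn, user_id):
    """With per-user copies, a single equality filter is enough."""
    rows = conn.execute(
        "SELECT name FROM category WHERE user_id = ? ORDER BY name", (user_id,)
    )
    return [name for (name,) in rows]


create_user_categories(conn, 1)
conn.execute("INSERT INTO category (user_id, name, type) VALUES (1, 'work', 'USER')")
print(categories_for(conn, 1))  # ['inbox', 'trash', 'work']
```

The trade-off described above is visible here: reads stay simple, but every new user (and every new system category) needs an extra seeding step.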
Related
Context
I have implemented a basic posting system in Rails, hence, I have a "posts" table with the following columns: id, user_id, body, created_at, updated_at.
I would like to enable users to pin their posts (only one post at a time, so a has_one relationship).
Question
I see two ways of implementing this:
adding a "pinned" boolean column inside the "posts" table
adding a "pinned_posts" table with the following columns: id, user_id, post_id, created_at, updated_at
What are the pros and cons of these two approaches?
Because you are limiting to one post per user, you could have the post_id filled in or NULL in the Users table.
This enforces the 1-per-user rule and has (perhaps) minimal overhead.
You have not said what happens to pinned posts; there could be other issues. Do you need to find all pinned posts (would require a costly scan of Users)? Do you want to flag a particular post when displaying user info? (That will be quite efficient.)
May I suggest you learn some of the basics of databases; third-party software (Rails, in this case) can't really shield programmers from database concepts.
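The nullable-column idea from this answer might look like the sketch below. It assumes Python's sqlite3 module rather than a Rails migration, and the column name pinned_post_id and both helper functions are hypothetical.

```python
import sqlite3

# Hypothetical schema: pinned_post_id and the helper names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
    CREATE TABLE users (
        id             INTEGER PRIMARY KEY,
        pinned_post_id INTEGER REFERENCES posts(id) ON DELETE SET NULL
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT NOT NULL
    );
    INSERT INTO users (id) VALUES (1);
    INSERT INTO posts (id, user_id, body) VALUES (10, 1, 'first'), (11, 1, 'second');
""")


def pin_post(conn, user_id, post_id):
    """Pin a post; returns False if the post does not belong to the user."""
    cur = conn.execute(
        "UPDATE users SET pinned_post_id = ? "
        "WHERE id = ? "
        "AND EXISTS (SELECT 1 FROM posts WHERE id = ? AND user_id = ?)",
        (post_id, user_id, post_id, user_id),
    )
    return cur.rowcount == 1


def pinned_post(conn, user_id):
    """Return the pinned post id, or None when nothing is pinned."""
    row = conn.execute(
        "SELECT pinned_post_id FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


pin_post(conn, 1, 10)
pin_post(conn, 1, 11)  # pinning again simply overwrites the previous pin
print(pinned_post(conn, 1))  # 11
```

Because the pin is a single column on the user row, the one-pin-per-user rule needs no extra constraint; the cost, as noted above, is that listing all pinned posts means scanning users.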
I have a bunch of products and a bunch of category pages. One product can be in multiple categories. So in my database I have a products table with a "categories" column. In this column I store the IDs of all the categories that the product belongs to, as a string separated by semicolons.
Example: 1;5;23;35;49;.
When I browse to Category Page ID 5, I want to see all products that have 5; in its categories-column. I currently do this by
SELECT * FROM products WHERE categories LIKE "%".category.";%"
The problem is that this matches more than just 5. It matches 15; or 25; as well.
So questions:
How do I make sure that I only select the number I want? If category is "5" I do not want it to match 15, 25, 35 and so on.
Maybe this is a very bad way of storing the category-ids. Do you have any suggestions of a different way of storing what products that belong to what category?
Others have mentioned that a junction table is the right way to design the database. SQL has a very nice data structure for storing lists. It is not called a "string", it is called a "table".
But sometimes one is stuck with data in this format and needs to work with it. In that case, the key is to put the delimiters on both sides to prevent the problem you are having:
SELECT *
FROM products
WHERE concat(';', categories) LIKE "%;".category.";%"
Your list already ends in a semicolon, so appending a trailing delimiter is not necessary.
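To see why the leading delimiter matters, here is a small sketch. It assumes Python's sqlite3 module, where string concatenation is written with || instead of MySQL's concat(); the sample rows and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, categories TEXT)")
conn.executemany(
    "INSERT INTO products (id, categories) VALUES (?, ?)",
    [(1, "1;5;23;"), (2, "15;25;"), (3, "5;"), (4, "35;49;")],
)


def naive_match(conn, category_id):
    """The original pattern: also matches ids that merely end in the digits."""
    pattern = f"%{int(category_id)};%"
    rows = conn.execute(
        "SELECT id FROM products WHERE categories LIKE ? ORDER BY id", (pattern,)
    )
    return [pid for (pid,) in rows]


def delimited_match(conn, category_id):
    """Prefix the list with ';' so every id is wrapped in delimiters."""
    pattern = f"%;{int(category_id)};%"
    rows = conn.execute(
        "SELECT id FROM products WHERE ';' || categories LIKE ? ORDER BY id",
        (pattern,),
    )
    return [pid for (pid,) in rows]


print(naive_match(conn, 5))      # [1, 2, 3, 4]
print(delimited_match(conn, 5))  # [1, 3]
```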
Another more typical MySQL solution is find_in_set():
SELECT *
FROM products
WHERE find_in_set(category, replace(categories, ';', ',')) > 0;
It is designed for comma-delimited lists. Odd that MySQL supports such a function when storing lists this way is generally a bad idea, but it does. Still, a junction table is better for performance reasons (and for other reasons).
Answers/comments to your two questions:
The only way I can think of to do this without modifying your schema (see #2) is to use a MySQL regular expression, but this is really not a good idea. See http://dev.mysql.com/doc/refman/5.1/en/regexp.html for documentation, though.
You are right: this is not a good way to store categories. What you want is a join table, also known as a junction table (see http://en.wikipedia.org/wiki/Junction_table). One way would be to have three tables: product, category, and product_categories. Product and category would each have a unique ID, as you already have, and the product_categories table would have two columns: product_id and category_id. If product 1 belongs to categories 10 and 11, you would have two rows in the product_categories table: 1,10 and 1,11.
I can elaborate if you need more help but this should get you started in re-architecting your database (more) correctly.
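As a starting point, the three-table layout described above could be sketched like this. It assumes Python's sqlite3 module as a stand-in for MySQL; the product and category names, the index name, and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per (product, category) pair.
    CREATE TABLE product_categories (
        product_id  INTEGER NOT NULL REFERENCES product(id),
        category_id INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (product_id, category_id)
    );
    -- Lets "all products in category X" use an index lookup.
    CREATE INDEX idx_pc_category ON product_categories (category_id);

    INSERT INTO product  VALUES (1, 'lamp'), (2, 'desk');
    INSERT INTO category VALUES (5, 'home'), (10, 'office'), (11, 'sale'), (15, 'garden');
    -- Product 1 belongs to categories 10 and 11; product 2 to category 15.
    INSERT INTO product_categories VALUES (1, 10), (1, 11), (2, 15);
""")


def products_in_category(conn, category_id):
    """Exact match on the id, so 5 can never match 15 or 25."""
    rows = conn.execute(
        "SELECT p.name FROM product p "
        "JOIN product_categories pc ON pc.product_id = p.id "
        "WHERE pc.category_id = ? ORDER BY p.id",
        (category_id,),
    )
    return [name for (name,) in rows]


print(products_in_category(conn, 10))  # ['lamp']
print(products_in_category(conn, 5))   # []
```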
You can try changing your like criteria to "%;".category.";%"
I am looking for guidelines on how to set up my database for an auction site.
My problem is that there are a lot of different product types, say paintings, clothes, computers, etc. They have different specifications, and it should be possible to put just Product A in size L up for auction, or the whole stock of Product B, for example.
How should I build my database for optimal performance - and coding - in this case?
I would suggest the following database/object structure:
[Auction] n..1 [Category] 1..n [Variation Attribute] 1..n [Attribute Value]
An auction then has a category and several attribute values referring the variation attribute as well:
[Auction] = [Category], [Name], [Description]
[Auction_AttrVal] = [AuctionID], [VarAttrID], [AttrValID]
First of all you can have some kind of category table, which holds items like "Paintings", "Clothes", "Computers". An auction / product is assigned to one category.
Each category then defines variation attributes for this specific category. An example would be "Size" for the category "Clothes" or "CPU" for the category "Computers". You can also add predefined values for the variation attributes to limit the number of variations and avoid differentiations like "3GhZ" vs "3 GhZ".
This mechanism also allows for easy filtering of search results. You select a category and simply load all variation attributes as filters (or add a flag to an attribute to declare it as such) and offer the values for filtering to the end-user.
Furthermore you can make variation attributes for a category mandatory to force users who create the auctions (I'm assuming it's Consumer-to-Consumer) to provide sufficient information for their auction.
The code will probably be quite generic and simple. The database structure is highly flexible and extensible. Performance is much better than having all in one table. You probably should create an index (for the field AuctionID) for the Auction_AttrVal table. Please let me know if the database structure is not explained properly.
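One possible reading of this structure as concrete tables is sketched below. It assumes Python's sqlite3 module; every table name, column name, and sample row is a hypothetical interpretation of the bracketed notation above, not a definitive schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE variation_attribute (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(id),
        name        TEXT NOT NULL,
        mandatory   INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE attribute_value (
        id          INTEGER PRIMARY KEY,
        var_attr_id INTEGER NOT NULL REFERENCES variation_attribute(id),
        value       TEXT NOT NULL
    );
    CREATE TABLE auction (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(id),
        name        TEXT NOT NULL,
        description TEXT
    );
    -- The composite key starts with auction_id, so it doubles as that index.
    CREATE TABLE auction_attr_val (
        auction_id  INTEGER NOT NULL REFERENCES auction(id),
        var_attr_id INTEGER NOT NULL REFERENCES variation_attribute(id),
        attr_val_id INTEGER NOT NULL REFERENCES attribute_value(id),
        PRIMARY KEY (auction_id, var_attr_id)
    );

    INSERT INTO category VALUES (1, 'Clothes');
    INSERT INTO variation_attribute VALUES (1, 1, 'Size', 1);
    INSERT INTO attribute_value VALUES (1, 1, 'M'), (2, 1, 'L');
    INSERT INTO auction VALUES (1, 1, 'Product A, size L', NULL),
                               (2, 1, 'Product A, size M', NULL);
    INSERT INTO auction_attr_val VALUES (1, 1, 2), (2, 1, 1);
""")


def find_auctions(conn, category, attribute, value):
    """Filter auctions in a category by one predefined attribute value."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM auction a
        JOIN category c             ON c.id = a.category_id
        JOIN auction_attr_val av    ON av.auction_id = a.id
        JOIN variation_attribute va ON va.id = av.var_attr_id
        JOIN attribute_value v      ON v.id = av.attr_val_id
        WHERE c.name = ? AND va.name = ? AND v.value = ?
        ORDER BY a.id
        """,
        (category, attribute, value),
    )
    return [name for (name,) in rows]


print(find_auctions(conn, "Clothes", "Size", "L"))  # ['Product A, size L']
```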
Summary: What is the most efficient way to store information similar to the like system on FB? That is, a tally of likes is kept, the person who liked it is recorded, etc.
It needs to be associated with a user ID so as to know who liked it. The issue is: do you have a column that holds a comma-delimited list of the IDs of things that were liked, or do you have a separate column for each like (way too many columns)? The info that's stored would be a boolean value (1/0) but needs to be associated with the user as well as the "page" that was liked.
My thought was this:
Column name = likes eg.:
1,2,3,4,5
That is, the user has liked the pages with IDs 1, 2, 3, 4 and 5. To calculate total "likes", a tally would need to be taken and then stored in the database alongside the pages themselves (that table already exists).
That seems the best way to me but is there a better option that anyone can think of?
P.S. I'm not doing FB likes but it's the easiest explanation.
EDIT: Similar idea to the plus/neg here on stackoverflow.
In this case the best way would be to create a new table to keep track of the likes. So supposing you have table posts, which has a column post_id which contains all the posts (on which the users can vote). And you have another table users with a column user_id, which contains all the users.
You should create a table likes which has at least two columns, something like like_postid and like_userid. Now, every time a user likes a post, create a new row in this table with the ID of the post that is liked (the value of post_id from posts) and the ID of the user who likes it (the value of user_id from users). Of course you can add more columns to the likes table (for instance, to keep track of when a like was created).
What you have here is called a many-to-many relationship. Google it to get more information and to find advice on how to implement it correctly (you will find that a comma-separated list of IDs is not one of the best practices).
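A minimal sketch of such a likes table follows, assuming Python's sqlite3 module. The column names like_postid and like_userid come from the answer; the created_at column, the composite primary key, and the helper functions are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE likes (
        like_postid INTEGER NOT NULL,
        like_userid INTEGER NOT NULL,
        created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        -- The composite key stops a user from liking the same post twice.
        PRIMARY KEY (like_postid, like_userid)
    )
""")


def like(conn, post_id, user_id):
    """Record a like; repeats are ignored (MySQL spells this INSERT IGNORE)."""
    conn.execute(
        "INSERT OR IGNORE INTO likes (like_postid, like_userid) VALUES (?, ?)",
        (post_id, user_id),
    )


def like_count(conn, post_id):
    """The tally is derived from the rows, so no separate counter is needed."""
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE like_postid = ?", (post_id,)
    ).fetchone()
    return row[0]


def liked_by(conn, post_id):
    rows = conn.execute(
        "SELECT like_userid FROM likes WHERE like_postid = ? ORDER BY like_userid",
        (post_id,),
    )
    return [uid for (uid,) in rows]


like(conn, 1, 7)
like(conn, 1, 7)  # duplicate, ignored
like(conn, 1, 8)
print(like_count(conn, 1))  # 2
print(liked_by(conn, 1))    # [7, 8]
```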
Update based on comments:
If I'm correct; you want to get a list of all users (ordered by name) who have voted on an artist. You should do that something like:
SELECT Artists.Name, Users.Users_Name
FROM Artists
JOIN Votes
ON Votes.page_ID = Artists.ID
JOIN Users
ON Votes.Votes_Userid = Users.User_ID
WHERE Artists.Name = "dfgdfg"
ORDER BY Users.Users_Name
There is a strange thing here: the column in your Votes table that contains the artist ID seems to be called page_ID. Your column names are also a bit inconsistent (not really bad, but something to keep in mind if you want to be able to understand your code after leaving it alone for 6 months). In your comment you say that you only make one join, but you actually do two. If you specify two table names (as you do with JOIN Users, Votes), SQL actually joins these two tables.
Based on the query you posted in the comments I can tell you haven't got much experience using joins. I suggest you read up on how to use them, it will really improve your ability to write good code.
What is the best way to store user relationships, e.g. friendships, that must be bidirectional (you're my friend, thus I'm your friend) in a relational database, e.g. MySQL?
I can think of two ways:
Every time a user friends another user, I'd add two rows to the database: row A consisting of the user ID of the initiating user followed by the UID of the accepting user in the next column. Row B would be the reverse.
You'd only add one row, UID(initiating user) followed by UID(accepting user); and then just search through both columns when trying to figure out whether user 1 is a friend of user 2.
Surely there is something better?
I would have a link table for friends, or whatever, with two columns that together form the primary key, each also being a foreign key to the User table.
Both columns would hold a UID, and you would have two rows per friend relationship (A,B and B,A). As long as both columns are part of the primary key, it should still be in normal form (although others are free to correct me on this).
It's a slightly more complex query, but nothing that can't be abstracted away by a stored procedure or some business logic, and it's in normal form, which is usually nice to have.
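The two-rows-per-friendship design might be sketched as follows, assuming Python's sqlite3 module; the table name, column names, and helper functions are hypothetical. Both rows are written in one transaction so the relationship can never end up half-stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friends (
        user_id   INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id),
        CHECK (user_id <> friend_id)
    )
""")


def add_friendship(conn, a, b):
    """Insert (a, b) and (b, a) together; 'with conn' commits or rolls back both."""
    with conn:
        conn.executemany(
            "INSERT INTO friends (user_id, friend_id) VALUES (?, ?)",
            [(a, b), (b, a)],
        )


def friends_of(conn, user_id):
    """Only one column needs to be searched, whichever side initiated."""
    rows = conn.execute(
        "SELECT friend_id FROM friends WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    )
    return [fid for (fid,) in rows]


add_friendship(conn, 1, 2)
add_friendship(conn, 3, 1)
print(friends_of(conn, 1))  # [2, 3]
print(friends_of(conn, 2))  # [1]
```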
You could check which of the two user IDs is lower and store them in a fixed order. This way you don't need double rows for one friendship and still keep your queries simple.
user_id_low | user_id_high
A simple query to check if you're already friends with someone would be:
<?php
$my_id = 2999;
$friend_id = 500;
$lowest = min($my_id, $friend_id);
$highest= max($my_id, $friend_id);
query("SELECT * FROM friends WHERE user_id_low=$lowest AND user_id_high=$highest");
?>
Or you could find the lowest/highest user ID using MySQL:
<?php
query("SELECT * FROM friends WHERE user_id_low=LEAST($my_id, $friend_id) AND user_id_high=GREATEST($my_id, $friend_id)");
?>
And to get all your friends' IDs:
<?php
query("SELECT IF(user_id_low=$my_id,user_id_high,user_id_low) AS friend_id FROM friends WHERE $my_id IN (user_id_low, user_id_high)");
?>
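The same low/high idea can be sketched with bound parameters instead of interpolating variables into the SQL string. This assumes Python's sqlite3 module; the CHECK constraint and the helper functions are additions for illustration, and CASE is used in place of MySQL's IF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friends (
        user_id_low  INTEGER NOT NULL,
        user_id_high INTEGER NOT NULL,
        PRIMARY KEY (user_id_low, user_id_high),
        -- Rejects rows stored in the wrong order (and self-friendships).
        CHECK (user_id_low < user_id_high)
    )
""")


def add_friend(conn, a, b):
    conn.execute(
        "INSERT INTO friends (user_id_low, user_id_high) VALUES (?, ?)",
        (min(a, b), max(a, b)),
    )


def are_friends(conn, a, b):
    row = conn.execute(
        "SELECT 1 FROM friends WHERE user_id_low = ? AND user_id_high = ?",
        (min(a, b), max(a, b)),
    ).fetchone()
    return row is not None


def friends_of(conn, user_id):
    """Return the 'other' id from every row that mentions user_id."""
    rows = conn.execute(
        "SELECT CASE WHEN user_id_low = ? THEN user_id_high ELSE user_id_low END "
        "FROM friends WHERE ? IN (user_id_low, user_id_high) ORDER BY 1",
        (user_id, user_id),
    )
    return [fid for (fid,) in rows]


add_friend(conn, 2999, 500)
add_friend(conn, 100, 2999)
print(are_friends(conn, 500, 2999))  # True
print(friends_of(conn, 2999))        # [100, 500]
```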
Using double rows, while it creates extra data, will greatly simplify your queries and allow you to index smartly. I also remember seeing info on Twitter's custom MySQL solution wherein they used an additional field (friend #, basically) to do automatic limiting and paging. It seems pretty smooth:
https://blog.twitter.com/2010/introducing-flockdb
Use a key-value store such as Cassandra.