Should databases be simple and repetitive? - mysql

Basically, I'm creating a rails app with users and posts. I want to be able to soft delete them. To do this, all I need to do is create a boolean column deleted on the users and then use a conditional to change what information is displayed to a non admin user:
(rails)
def administrated_content
  # Admins always see the content; everyone else only sees posts that are not deleted.
  if !deleted || current_user.is_admin?
    content
  else
    "This post has been removed"
  end
end
Now my question is, is it best to keep databases simple and repetitive? Because a few days ago I would have said it would be better to create a third table, a state table and set up a has_one belongs_to relationship between the user and a state, and a post and a state. Why? Because state is an attribute shared by both users and posts.
However, then I realised that this would result in more queries being executed.
So, is it best to keep it simple and repeat yourself with attributes?

Yes, in general we keep each attribute in the table which it applies to, instead of needlessly adding a state table. It's okay for another table to have a similar state attribute.
That's far better than polymorphic associations, which break the fundamental definition of a relation and, as you found, require you to write more complex queries.
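To make the "one flag per table" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Rails. The table layout and the visible_content helper are assumptions for illustration, not code from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag for users
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        content TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0   -- same kind of flag, repeated for posts
    );
    INSERT INTO users (id, name) VALUES (1, 'alice');
    INSERT INTO posts (id, user_id, content, deleted) VALUES
        (1, 1, 'hello world', 0),
        (2, 1, 'spam', 1);
""")


def visible_content(post_id, viewer_is_admin):
    """Return the post text, hiding soft-deleted posts from non-admins."""
    content, deleted = conn.execute(
        "SELECT content, deleted FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if deleted and not viewer_is_admin:
        return "This post has been removed"
    return content
```

Each table carries its own flag, so no join or extra lookup is needed to decide what to show.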

It depends on the use case you want to optimise for. If it is speed you want to achieve, then a little bit of denormalization should be OK (again, depending on the scenario).
In my opinion, it makes sense for what you presented here to live in both the user and the post table, because it will not lead to duplicated data, and the attribute is meaningful for both users and posts.
Think of the state as userState and postState. This way each makes sense in its own context. Maybe in the future the user gets another state other than deleted (Ex: 'in process of deletion') and that would not be true for posts.

Related

Should denormalization take place when data is mainly accessed with joins? [duplicate]

I have been struggling for the past few hours thinking about which route I should go. I have a Notification model. Up until now I have used a notification_type column to manage the types but I think it will be better to create separate classes for the types of notifications as they behave differently.
Right now, there are 3 ways notifications can get sent out: SMS, Twitter, Email
Each notification would have:
id
subject
message
valediction
sent_people_count
deliver_by
geotarget
event_id
list_id
processed_at
deleted_at
created_at
updated_at
Seems like STI is a good candidate right? Of course Twitter/SMS won't have a subject and Twitter won't have a sent_people_count, valediction. I would say in this case they share most of their fields. However what if I add a "reply_to" field for twitter and a boolean for DM?
My point here is that right now STI makes sense but is this a case where I may be kicking myself in the future for not just starting with MTI?
To further complicate things, I want a Newsletter model which is sort of a notification but the difference is that it won't use event_id or deliver_by.
I could see all subclasses of notification using about 2/3 of the notification base class fields. Is STI a no-brainer, or should I use MTI?
Have you considered the mixed model approach?
Where you use single table inheritance for your core notification fields. Then offload all the unique items to specific tables/models in a belongs to/has one relationship with your notification subclasses.
It's a little more overhead to set up, but works out to be pretty DRY, once all the classes and tables are defined. Seems like a pretty efficient way to store things. With eager loading you shouldn't be causing too much additional strain on the database.
For the purposes of this example, let's assume that Emails have no unique details. Here's how it maps out.
class Notification < ActiveRecord::Base
  # common methods/validations/associations
  ...
  def self.relate_to_details
    class_eval <<-EOF
      has_one :details, :class_name => "#{self.name}Detail"
      accepts_nested_attributes_for :details
      default_scope -> { includes(:details) }
    EOF
  end
end

class SMS < Notification
  relate_to_details
  # sms specific methods
  ...
end

class Twitter < Notification
  relate_to_details
  # twitter specific methods
  ...
end

class Email < Notification
  # email specific methods
  ...
end

class SMSDetail < ActiveRecord::Base
  belongs_to :sms, :class_name => "SMS"
  # sms specific validations
  ...
end

class TwitterDetail < ActiveRecord::Base
  belongs_to :twitter
  # twitter specific validations
  ...
end
Each of the detail tables will contain a notification ID plus only those columns that its form of communication needs and that aren't already in the notifications table. It does mean an extra method call to get media-specific information, though.
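At the table level, the layout described above might look roughly like the following sketch, written with Python's sqlite3 module instead of ActiveRecord. The reply_to and dm columns come from the question. The table names, sample rows and the load_notifications helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    -- Shared columns live in one table; `type` is the STI discriminator.
    CREATE TABLE notifications (
        id      INTEGER PRIMARY KEY,
        type    TEXT NOT NULL,
        message TEXT NOT NULL
    );
    -- Only Twitter-specific columns go here, keyed by the notification ID.
    CREATE TABLE twitter_details (
        notification_id INTEGER PRIMARY KEY REFERENCES notifications(id),
        reply_to        TEXT,
        dm              INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO notifications VALUES (1, 'Email', 'Weekly digest');
    INSERT INTO notifications VALUES (2, 'Twitter', 'Doors open at 7');
    INSERT INTO twitter_details VALUES (2, '@venue', 1);
""")


def load_notifications():
    """Fetch every notification plus its optional details in a single query."""
    rows = conn.execute("""
        SELECT n.id, n.type, n.message, t.reply_to, t.dm
        FROM notifications AS n
        LEFT JOIN twitter_details AS t ON t.notification_id = n.id
        ORDER BY n.id
    """).fetchall()
    return [dict(row) for row in rows]
```

Types without a details table (Email here) simply come back with NULLs for the detail columns, which is the rough equivalent of the eager-loaded has_one association.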
This is great to know but do you think it's necessary?
Very few things are necessary in terms of design. As CPU and storage space drop in cost so do those necessary design concepts. I proposed this scheme because it provides the best of both STI and MTI, and removes a few of their weaknesses.
As far as advantages go:
This scheme provides the consistency of STI, with tables that do not need to be recreated.
The linked table gets around having dozens of columns that are empty in 75% of your rows. You also get easy subclass creation: you only need to create a matching Details table if your new type isn't completely covered by the basic notification fields. It also keeps iterating over all Notifications simple.
From MTI, you get the storage savings and the ease of customization in meeting a class's needs, without needing to redefine the same columns for each new notification type; you define only the unique ones.
However, this scheme also carries over the major flaw of STI: one table is going to replace four, which can start causing slowdown once it gets huge.
The short answer is: no, this approach is not necessary. I see it as the most DRY way to handle the problem efficiently. In the very short run, STI is the way to do it. In the very long run, MTI is the way to go, but we're talking about the point where you hit millions of notifications. This approach is a nice middle ground that is easily extensible.
Detailed gem
I've built a gem over your solution: https://github.com/czaks/detailed. Using it you can simplify your Notification class to:
class Notification < ActiveRecord::Base
include Detailed
end
The rest goes the previous way.
As an added bonus, you can now access (read, write, relate) the subclass-specific attributes directly, as in notification.phone_number, without resorting to notification.details.phone_number. You can also write all code in the main classes and subclasses, leaving the Details model empty. You will also be able to run fewer queries on large datasets (in the above example, 4 instead of N+1) by using Notification.all_with_details instead of the regular Notification.all.
Be aware that, at the current time, this gem isn't tested very well, though it works in my use case.
Given the limited info, I'd say stick with STI.
The key question is: Are there places in your app where you want to consider all types of Notifications together? If so, then that's a strong sign that you want to stick with STI.
I know this is old, but after having come up with a solution I see potential answers which could use it everywhere! I recently forked a promising project to implement multiple table inheritance and class inheritance in Rails. I have spent a few days subjecting it to rapid development, fixes, commenting and documentation and have re-released it as CITIER (Class Inheritance and Table Inheritance Embeddings for Rails).
I think it should allow you to do what you needed by simply constructing the models where Twitter, Email and SMS inherit from Notification. Then have the migration for Notifications only include common attributes, and the ones for the three subtypes include their unique attributes.
Or even define a function in the root Notification class and overload it in subclasses to return something different.
Consider giving it a look: https://github.com/PeterHamilton/citier
I am finding it so useful! I would (by the way) welcome any help from the community with issues, testing, code cleanup, etc. I know this is something many people would appreciate.
Please make sure you update regularly however because like I said, it has been improving/evolving by the day.
has_one :details, :class_name => "#{self.class.name}Detail"
doesn't work. self.class.name in the context of a class definition is 'Class' so :class_name is always 'ClassDetail'
So it must be:
has_one :details, :class_name => "#{self.name}Detail"
But very nice idea!

Separate get request and database hit for each post to get like status

So I am trying to make a social network on Django. Like any other social network, users get the option to like a post, and each of these likes is stored in a model that is different from the model used for the posts that show up in the news feed. I have tried two ways of getting the like status on the go.
1. Fewest database hits:
Make one SQL query and get the like entry for every post id, if it exists. I then use a custom Django template tag to check whether a like entry for the current post exists in the QuerySet, by searching an array that contains the like statuses of all posts.
This way I use the database to get all the values and search for a particular value in the list using Python.
2. Separate database query for each post:
Here I use the same custom template tag, but rather than searching through a QuerySet I let the MySQL database do most of the heavy lifting.
I use model.objects.get() for each entry.
Which is the more efficient approach? Also, I am planning on getting a separate database server; would that change the choice if network latency is only around 0.1 ms?
Is there any way I can get these like statuses on the go, as boolean values, along with all the posts in a single database query?
An example query for the first method can be like
Let post_list be the post QuerySet
models.likes.objects.filter(user=current_user, post__in=post_list)
This is not a direct answer to your question, but I hope it is useful nonetheless.
and each of these likes are stored in a model that is different from the model used for news feed
I think you have a design issue here. It is better if you create a model that describes a post, and then add a field users_that_liked_it as a many-to-many relationship to your user model. Then you can do something like post.users_that_liked_it and get a query set of all users that liked that post.
In my eyes you should also avoid putting logic in templates as much as possible. They are simply not made for it. As a rule of thumb, logic belongs in the model class or, if it depends on the page visited, in the view.
Lastly, if performance is your main worry, you probably shouldn't be using Django anyway. It is just not that fast. What Django gives you is the ability to write clean, concise code. This is much more important for a new project than performance. Ask yourself: How many (personal) projects fail because their performance is bad? And how many fail because the creator gets caught in messy code?
Here is my advice: Favor clarity over performance. Especially in a young project.
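On the question of getting the like status as a boolean along with all the posts in a single query: at the SQL level this can be done with an EXISTS subquery. Below is a minimal sketch using Python's sqlite3 module; the table names, sample data and the feed_with_like_status helper are assumptions for illustration. Django's ORM has Exists and OuterRef annotations that express the same kind of subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE likes (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        PRIMARY KEY (user_id, post_id)
    );
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO likes VALUES (7, 1), (7, 3), (8, 2);
""")


def feed_with_like_status(user_id):
    """Return (post_id, body, liked) tuples using one database round trip."""
    rows = conn.execute("""
        SELECT p.id, p.body,
               EXISTS (SELECT 1 FROM likes AS l
                       WHERE l.post_id = p.id AND l.user_id = ?) AS liked
        FROM posts AS p
        ORDER BY p.id
    """, (user_id,)).fetchall()
    # SQLite returns 0/1 for EXISTS, so convert to a real boolean.
    return [(post_id, body, bool(liked)) for post_id, body, liked in rows]
```

With this shape the template only has to read a boolean per post, so no per-post lookup or list search is needed.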

User submitted content to mysql with moderation: separate table?

I would like to collect data from users in a MySQL table, but the data needs to be moderated by an admin first. Is it normal to just insert into the original table and use a field as a flag for the moderation status? Or should I have a separate table of pre-moderated posts and insert into the main table only once a post has been moderated?
I think both methods would work, but I am not sure whether I am missing other considerations here. I hope someone experienced can tell me the established/preferred way to do this.
If you're working with a not-huge data set I'd recommend just adding a flag column that allows you to show or hide user data. This will require fewer and easier queries to work with and should make your life a lot easier than juggling the data between multiple identical tables. Additionally, if you want to add something like a button for "report this content as BAD" you could remove the content from other results while only "soft deleting" it from public visibility.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same goes for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles [idUser, idArticle], LikedPlace [idUser, idPlace]. The number of likes will be calculated by queries (which, I assume, is bad). And...
I really don't like this design because of the last part; it smells bad to me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
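typeId = SELECT idType FROM ElementType WHERE TypeName = 'Photo'
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (userId, typeId, photoId)
and for places:
typeId = SELECT idType FROM ElementType WHERE TypeName = 'Place'
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (userId, typeId, placeId)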
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
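A minimal sketch of that third approach (one base table plus one table per concrete type), using Python's sqlite3 module. All table, column and function names here are made up for illustration; only two concrete types and the likes table are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Abstract base: every likeable/commentable thing gets one row here.
    CREATE TABLE entity (entity_id INTEGER PRIMARY KEY);

    -- Concrete types share the base table's key.
    CREATE TABLE photo (
        entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id),
        url       TEXT NOT NULL
    );
    CREATE TABLE article (
        entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id),
        title     TEXT NOT NULL
    );

    -- Likes reference the base table only, so a new type needs no new likes table.
    CREATE TABLE likes (
        user_id   INTEGER NOT NULL,
        entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
        PRIMARY KEY (user_id, entity_id)
    );
""")


def add_photo(url):
    """Create the base row first, then the photo row that shares its key."""
    entity_id = conn.execute("INSERT INTO entity DEFAULT VALUES").lastrowid
    conn.execute("INSERT INTO photo VALUES (?, ?)", (entity_id, url))
    return entity_id


def add_article(title):
    """Same pattern as add_photo, for the article subtype."""
    entity_id = conn.execute("INSERT INTO entity DEFAULT VALUES").lastrowid
    conn.execute("INSERT INTO article VALUES (?, ?)", (entity_id, title))
    return entity_id


def like(user_id, entity_id):
    conn.execute("INSERT INTO likes VALUES (?, ?)", (user_id, entity_id))


def like_count(entity_id):
    return conn.execute(
        "SELECT COUNT(*) FROM likes WHERE entity_id = ?", (entity_id,)
    ).fetchone()[0]
```

Adding a Place type would mean one more subtype table and nothing else; comments and tags would hang off entity in the same way as likes.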
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please don't pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
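For what it's worth, here is a runnable sketch of the same actions table using Python's sqlite3 module. The id_Stuff, typeStuff and typeAction columns follow the pseudocode above; the id_User and body columns, the sample rows and the count_actions helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (
        id         INTEGER PRIMARY KEY,
        id_User    INTEGER NOT NULL,   -- hypothetical: who performed the action
        id_Stuff   INTEGER NOT NULL,   -- id of the photo/article/place
        typeStuff  TEXT NOT NULL,      -- 'photo', 'article', ...
        typeAction TEXT NOT NULL,      -- 'like', 'comment', ...
        body       TEXT                -- hypothetical: comment text, NULL for likes
    );
    INSERT INTO actions (id_User, id_Stuff, typeStuff, typeAction, body) VALUES
        (1, 5, 'photo',   'like',    NULL),
        (2, 5, 'photo',   'like',    NULL),
        (2, 5, 'photo',   'comment', 'Nice shot'),
        (1, 5, 'article', 'like',    NULL);
""")


def count_actions(stuff_type, stuff_id, action_type):
    """Count actions of one kind on one item, e.g. likes on photo 5."""
    return conn.execute(
        """
        SELECT COUNT(*) FROM actions
        WHERE id_Stuff = ? AND typeStuff = ? AND typeAction = ?
        """,
        (stuff_id, stuff_type, action_type),
    ).fetchone()[0]
```

Note that id_Stuff alone is ambiguous (photo 5 and article 5 both exist), so every query has to filter on typeStuff as well.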
As far as I understand, several tables are required, with a many-to-many relation between them:
A table that stores the user data, such as name, surname and birth date, with an identity field.
A table that stores the data types. These types may be photos, shares or links. Each type must have its own table, so there is a relation between those individual tables and this one.
A separate table for each data type, for example status updates, photos and links.
A last table for the many-to-many relation, storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to be made particularly difficult or inefficient by one design choice or the other?
If not, favour the one that requires fewer tables.
In this case:
Add Comment: you either pick a particular many-to-many table or insert into a common table with a known identifier for the kind of thing being commented on. I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems a little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using a table per entity for comments and so on. More tables mean better sharding and scaling, and managing many similar tables is not a problem in any framework I know.
One day you'll need to optimize reads from such a structure. You can easily create aggregating tables over the base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach, where you have one table and store the element type for each row; it will give you a lot more flexibility. Basically, when something can logically be done with fewer tables, it is almost always better to go with fewer tables. Two advantages come to mind for your particular case. First, if you want to delete all liked elements of a certain user, the first approach needs one query per element type, while the second approach needs only one query. Second, if you want to add a new element type, the first approach involves creating a new table for each new type, while with the second approach you only need to add a row to ElementType.

Proper way to store requests in Mysql (or any) database

What is the "proper" (most normalized?) way to store requests in the database? For example, a user submits an article. This article must be reviewed and approved before it is posted to the site.
Which is the more proper way:
A) store it in the Articles table with an "Approved" field which is either 0, 1 or 2 (denied, approved, pending)
OR
B) Have an ArticleRequests table which has the same fields as Articles, and upon approval, move the row data from ArticleRequests to Articles.
Thanks!
Since every article is going to have an approval status, and each time an article is requested you're very likely going to need to know that status - keep it inline with the table.
Do consider calling the field ApprovalStatus, though. You may want to add a related table to contain each of the statuses unless they aren't going to change very often (or ever).
EDIT: Reasons to keep fields in related tables are:
If the related field is not always applicable, or may frequently be null.
If the related field is only needed in rare scenarios and is better described by using a foreign key into a related table of associated attributes.
In your case those above reasons don't apply.
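A minimal sketch of option A with an inline status column and an optional lookup table for the status names, using Python's sqlite3 module. The 0/1/2 codes come from the question; the table names, column names and helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Optional lookup table naming each status code.
    CREATE TABLE approval_statuses (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    INSERT INTO approval_statuses VALUES (0, 'denied'), (1, 'approved'), (2, 'pending');

    -- The status lives inline on the article; new submissions start as pending.
    CREATE TABLE articles (
        id              INTEGER PRIMARY KEY,
        title           TEXT NOT NULL,
        approval_status INTEGER NOT NULL DEFAULT 2
                        REFERENCES approval_statuses(id)
    );
""")


def submit_article(title):
    """Insert a new article; it is pending until a reviewer acts on it."""
    return conn.execute(
        "INSERT INTO articles (title) VALUES (?)", (title,)
    ).lastrowid


def approve(article_id):
    """Approval is a single UPDATE; no row has to move between tables."""
    conn.execute(
        "UPDATE articles SET approval_status = 1 WHERE id = ?", (article_id,)
    )


def published_titles():
    """Only approved articles are shown on the site."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE approval_status = 1 ORDER BY id"
    ).fetchall()
    return [title for (title,) in rows]
```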
Definitely do 'A'.
If you do B, you'll be creating a new table with the same fields as the other one and that means you're doing something wrong. You're repeating yourself.
I think it's better to store the data in the main table with a specific status. That way there is no need to move data between tables when an article is approved, and the article appears on the site at the moment of approval. If you don't want to keep disapproved articles, you should create a cron script which removes the unnecessary data or moves it to an archive table. This puts less load on your db, because you can schedule the removal of old articles for a quiet time, for example at night.
Regarding the problem of using the approval status in each query: if you are planning a very popular, high-load site that does a lot of searching or article listing, you will use a standalone search server like Sphinx or Solr (MySQL is not a good solution for this purpose) and feed it only the data with status='Approved'. Delta indexing helps you keep that data up to date.