The best way to structure this database? - mysql

At the moment I'm doing this:
gems(id, name, colour, level, effects, source)
id is the primary key and is not auto-increment.
A typical row of data would look like this:
id => 40153
name => Veiled Ametrine
colour => Orange
level => 80
effects => +12 sp, +10 hit
source => Ametrine
(Some of you gamers might see what I'm doing here :) )
But I realise this could be sorted a lot better. I have studied database relationships and secondary keys in my A-Level computing class but never got as far as to set one up properly. I just need help with how this database should be organised, like what tables should have what data with what secondary and foreign keys?
I was thinking maybe 3 tables: gem, effects, source. Which then have relationships to each other?
Can anyone shed some light on this? Is a complex way like I'm proposing really the way to go or should I just carry on with what I'm doing?
Cheers.

I happen to be passingly familiar with the environment you're describing (:))
Despite what you have convinced yourself, what you are doing is not particularly complex.
Anyway, currently, you have a table with no relationships. It's simple. It's easy. Each gem exists in the database.
If you were to move to the three tables that you proposed, you would also need to include link tables to assemble the tables into useable data, especially since (and mind, I'm not quite sure how your distinctions boil out) the effects and source table are involved in a many-to-x relationship: each gem has up to two effects, and each effect has up to Y gems where it is present // each source has up to Z gems.
I'd stick with the single table. The individual records may be longer, but it's much simpler, and you'll encounter fewer errors than if you were trying to establish linking tables or the like.

Questions to ask yourself:
Is there a 1 to 1 relationship between gem, effects, and source?
Would you more often be pulling effects without pulling data from gem?
If the proposed tables have a 1 to 1 relationship then I'd suggest leaving them combined in one table. The only time I would consider splitting them out in this condition is if I only needed data from effects without needing other data AND these tables were going to be large enough to justify having them stored on different drives. Otherwise, you're just making work for yourself, adding more storage requirements and reaping exactly zero benefits.

You should also consider whether you need the effects information for actual use or for display only. If it is display only, it is no big deal to keep it in one column of a table. If you have to use it, for example to apply the +12 and +10 appropriately, then each occurrence should be stored separately. That means a separate table for effects, and another table recording which gems have which effects, say GemEffects. The Effects table can hold a fuller description of what "sp" stands for, maybe the min and max ranges, etc. The GemEffects table would just have the gem id, the effect itself, and the value. For example:
Effects
effect => hit
desc => How many hit points
minimum => 0
maximum => 100
GemEffects
id => 40153
effect => sp
value => 12
and
id => 40153
effect => hit
value => 10
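To make the Effects / GemEffects idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can be run as-is; the table and column names (gems, effects, gem_effects) and the placeholder description for "sp" are assumptions for illustration, not something taken from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE gems (
    id     INTEGER PRIMARY KEY,
    name   TEXT    NOT NULL,
    colour TEXT    NOT NULL,
    level  INTEGER NOT NULL,
    source TEXT    NOT NULL
);
CREATE TABLE effects (
    effect      TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
-- One row per (gem, effect) pair, with the numeric value kept separately
-- so the application can actually apply it.
CREATE TABLE gem_effects (
    gem_id INTEGER NOT NULL REFERENCES gems(id),
    effect TEXT    NOT NULL REFERENCES effects(effect),
    value  INTEGER NOT NULL,
    PRIMARY KEY (gem_id, effect)
);
INSERT INTO gems VALUES (40153, 'Veiled Ametrine', 'Orange', 80, 'Ametrine');
INSERT INTO effects VALUES
    ('sp',  'placeholder description'),
    ('hit', 'How many hit points');
INSERT INTO gem_effects VALUES (40153, 'sp', 12), (40153, 'hit', 10);
""")

# Fetch the effects of one gem as numbers rather than as a display string.
rows = conn.execute(
    "SELECT effect, value FROM gem_effects WHERE gem_id = ? ORDER BY effect",
    (40153,),
).fetchall()
bonuses = dict(rows)
```

With the values stored as integers, summing or comparing bonuses is a plain query instead of string parsing of "+12 sp, +10 hit".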

You would answer your own question if you do a simple exercise: describe in a natural, descriptive language your system. Which entities, their attributes, how they interact with other entities, etc. Underline substantives and verbs. Ask what entities do you mean to manage (eg: will there be an interface to manage the "effects" table?) You'll be surprised how it all gets assembled naturally.
Now for your example, I'd suggest two approaches (without syntactic details)
1) to gain experience in relational design, with some complexity overhead, and granular over each entity
gem (id, name,color_id,source_id,effect_assoc_id)
color (id, name)
source (id, name)
effect (id,value,nature_id)
nature (id, name)
effect_assoc (id, gem_id, effect_id)
2) straight to the point, possibly valid depending on the cardinality of your relations
just carry on ;)
From your description, I'd go with #1.

I would recommend the following:
1. Move all effects into their own table (e.g., ID, Name, Description, Enabled, ...)
2. Move source into its own table (e.g., ID, Name, Description, Enabled, ...)
3. Drop the gems "effects" column (migrates to step 5 below)
4. Convert the gems "source" column into a foreign key value that corresponds to the PK from the "source" table
5. Add a new table to link a single gem entity to zero or more effect entities
Example: tbl_GemsEffectsLink, with two columns named "GemID" and "EffectID" that by themselves are foreign keys back to the entity tables and, when taken together, make up the composite primary key.
A sample view of this link table would be as follows:
GemID EffectID
1 1
1 2
2 1
2 2
2 3
So, in summary, you would have the following tables:
gems
effects
source
gemseffectslink
With each table having the following columns:
gems
id (PK)
name
colour
level
sourceid (FK)
effects
id (PK)
name
description
enabled
...
source
id (PK)
name
description
enabled
...
gemseffectslink
gemid (FK)
effectid (FK)
Lastly, this assumes each gem can have zero or more effects, a single source (you can enforce NULL or NOT NULL for this gem.sourceid FK field), and that the level integer value is just that (i.e., not representing something more robust and exhaustive in that there exists some type of "Level" entity and the value of "80" in your sample data row uniquely identifies one of these "Level" entities).
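As a rough sketch of the four tables summarised above, assuming SQLite (via Python's sqlite3) in place of MySQL and leaving out the description/enabled columns for brevity. The effect names are simply the strings from the question's sample row; the point is the composite primary key on the link table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE source  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE effects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gems (
    id       INTEGER PRIMARY KEY,
    name     TEXT    NOT NULL,
    colour   TEXT    NOT NULL,
    level    INTEGER NOT NULL,
    sourceid INTEGER NOT NULL REFERENCES source(id)
);
-- Link table: each column is a foreign key, and together they form the PK.
CREATE TABLE gemseffectslink (
    gemid    INTEGER NOT NULL REFERENCES gems(id),
    effectid INTEGER NOT NULL REFERENCES effects(id),
    PRIMARY KEY (gemid, effectid)
);
INSERT INTO source  VALUES (1, 'Ametrine');
INSERT INTO effects VALUES (1, '+12 sp'), (2, '+10 hit');
INSERT INTO gems    VALUES (40153, 'Veiled Ametrine', 'Orange', 80, 1);
INSERT INTO gemseffectslink VALUES (40153, 1), (40153, 2);
""")

# Reassemble a gem with its source and effects through the link table.
gem_rows = conn.execute("""
    SELECT g.name, s.name, e.name
    FROM gems g
    JOIN source s          ON s.id = g.sourceid
    JOIN gemseffectslink l ON l.gemid = g.id
    JOIN effects e         ON e.id = l.effectid
    ORDER BY e.id
""").fetchall()

# The composite primary key stops the same effect being linked twice.
try:
    conn.execute("INSERT INTO gemseffectslink VALUES (40153, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```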
Hope this helps!
Michael

Related

Better way to organize lots of columns and data?

I'm creating a real-estate website and I was wondering if there is a better way of organizing my columns or tables. I'm not sure what the best way to go about it would be; I currently have a lot of columns and I'm worried about performance issues.
The columns are as follows
5 for things like property id, add date, duration, owner/user id.
35 columns for things like title, description, price, energy rating, location, etc.
40 columns for features like swimming-pool, central heating, river front, garage, well, etc.
15 for image locations which are stored on server
15 for the image descriptions
Is 110+ columns bad practice in MySQL? Everything is lightning fast, but I'm on localhost at the moment. Won't the monstrous size of the table slow queries, especially once I have a couple hundred properties?
Am I OK with my current setup? What would best practice be? How do e-commerce websites that have many feature options go about this?
It is not a good practice since the data can be stored in separate tables. What would help you most would be to create an ERD to visualize how you can organize your tables. Even if you do not understand the ins and outs of ERDs, you can still use it to at least organize your thoughts.
It seems that you already have your tables separated based on the bullet points that you made within your question. One thing that I would add to your bullets is maybe breaking down your features into categories and creating a table for each.
For example, swimming pools and riverfront can be placed in a table called
LandscapeFeatures or OutdoorFeatures.
Most likely, the property features would be better stored in a separate table, with one row per property feature, rather than as columns in your main table. I understand this as a many-to-many relationship between a property and its features, so this suggests two more tables:
properties (property_id (pk), date_added, title, description)
features (feature_id (pk), description)
property_features (property_id (fk), feature_id (fk))
Such structure is much more flexible and easier to query than having one column per feature. As examples:
easy to add features by creating new rows in the features table (while in the old structure you had to create a new column)
easy to aggregate the features, and answer a question like: count how many features each property has
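The two points above can be sketched as follows, again with SQLite through Python's sqlite3 rather than MySQL. The property titles and feature rows are made-up sample data; the table names follow the three tables listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE features (
    feature_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE property_features (
    property_id INTEGER NOT NULL REFERENCES properties(property_id),
    feature_id  INTEGER NOT NULL REFERENCES features(feature_id),
    PRIMARY KEY (property_id, feature_id)
);
INSERT INTO properties VALUES (1, 'Riverside cottage'), (2, 'Town flat');
INSERT INTO features VALUES (1, 'swimming-pool'), (2, 'garage'), (3, 'river front');
INSERT INTO property_features VALUES (1, 1), (1, 3), (2, 2);
""")

# Adding a new kind of feature is an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO features VALUES (4, 'central heating')")
conn.execute("INSERT INTO property_features VALUES (2, 4)")

# Count how many features each property has (LEFT JOIN keeps
# properties that have no features at all).
feature_counts = conn.execute("""
    SELECT p.title, COUNT(pf.feature_id)
    FROM properties p
    LEFT JOIN property_features pf ON pf.property_id = p.property_id
    GROUP BY p.property_id
    ORDER BY p.property_id
""").fetchall()
```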
As for images, they should have their own table too. If an image may belong to several properties, then it's a many-to-many relationship, and you can follow the above pattern. If each image belongs to a single property, one more table is enough:
properties (property_id (pk), date_added, title, description)
features (feature_id (pk), description)
property_features (property_id (fk), feature_id (fk))
images (image_id, location, description, property_id (fk))
One table:
Columns for the dozen or so values that you are most likely to search on.
Devise several composite indexes that involve those columns, starting with the more commonly searched columns.
Devise a TEXT column and put "words" in it for a FULLTEXT index. If this is home sales, consider words like "swimming pool septic tank gazebo Eichler". This will help with certain "boolean" type queries. (If you like this idea, let's discuss how to make use of filtering with indexes and/or fulltext; it gets tricky.)
Put the rest into a JSON (or TEXT) column. Do not plan on searching it; instead bring the row(s) into your app code for further filtering after searching by the actual INDEXes.
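A minimal sketch of that last idea, assuming SQLite via Python's sqlite3 in place of MySQL: the commonly searched columns are real, indexed columns, and everything else sits in a JSON string that is only inspected in application code. The column names (location, price, extras) and the sample listings are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    location    TEXT    NOT NULL,
    price       INTEGER NOT NULL,
    extras      TEXT    NOT NULL   -- JSON document, never searched in SQL
);
-- Composite index starting with the more commonly searched column.
CREATE INDEX idx_location_price ON properties (location, price);
""")

listings = [
    (1, "Cork",   250000, {"garage": True,  "well": False}),
    (2, "Cork",   300000, {"garage": False, "well": True}),
    (3, "Galway", 200000, {"garage": True,  "well": True}),
]
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?, ?)",
    [(pid, loc, price, json.dumps(extras)) for pid, loc, price, extras in listings],
)

# Step 1: narrow down using the indexed columns only.
candidates = conn.execute(
    "SELECT property_id, extras FROM properties "
    "WHERE location = ? AND price <= ? ORDER BY property_id",
    ("Cork", 300000),
).fetchall()

# Step 2: finish the filtering on the JSON part in application code.
with_garage = [pid for pid, extras in candidates if json.loads(extras).get("garage")]
```

This only pays off if the indexed columns cut the candidate set down to something small before the app-side filtering starts.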

How to display item as 'in transit' instead of to specific location id (foreign key)?

I have following requirements for item management.
Item can be moved from location 'A' to 'B'. And later on it can also be moved from 'B' to 'C' location.
History should be maintained for each item, so that I can display the items at each location for a specific period and also display the history of a single item.
Also I need to display items 'in transit' on particular date.
Given below is the database design:
item_master
-----------
- ItemId
- Item name
- etc...
item_location_history
------------------
- ItemId
- LocationId (foreign key of location_master)
- Date
While item is being transported I want to insert data in following way:
1. At the time of transport, I want to record the item as moved from location 'A' to 'In Transit' on a particular date, as there is a possibility that the item remains 'in transit' for several days.
2. At the time of receipt at location 'B', I want to record the item as moved from 'In Transit' to location 'B' on a particular date, and so on.
This way I will keep track of both the 'In Transit' state and the item location.
What is the best way to achieve this? What changes do I need to apply to the above schema? Thanks.
Initial Response
What is the best way to achieve this?
This is a simple and common Data Modelling Problem, and the answer (at least in the Relational Database context) is simple. I would say, every database has at least a few of these. Unfortunately, because the authors who write books about the Relational Model, are in fact completely ignorant of it, they do not write about this sort of simple straight-forward issue, or the simple solution.
What you are looking for is an OR gate. In this instance, because the Item is in a Location XOR it is InTransit, you need an XOR gate.
In Relational terms, this is a Basetype::Subtype structure. If it is implemented properly, it provides full integrity, and eliminates Nulls.
As far as I know, it is the only Relational method. Beware, the methods provided by famous writers are non-relational, monstrous, massively inefficient, and they don't work.
Record ID
But first ... I would not be serving you if I didn't mention that your files have no integrity right now, you have a Record Filing System. This is probably not your fault, in that the famous writers know only pre-1970's Record Filing Systems, so that is all that they can teach, but the problem is, they badge it "relational", and that is untrue. They also have various myths about the RM, such as it doesn't support hierarchies, etc.
By starting with an ID stamped on every table, the data modelling process is crippled.
You have no Row Uniqueness, as is required for an RDBMS.
An ID is not a Key.
If you do not understand that, please read this answer.
I have partially corrected those errors:
In Item, I have given a more useful PK. I have never heard any user discuss an Item RecordId, they always use Codes.
Often those codes are made up of components, if so, you need to record those components in separate columns (otherwise you break 1NF).
Item needs an Alternate Key on Name, otherwise you will allow duplicate Names.
In Location, I have proposed a Key, which identifies an unique physical location. Please modify to suit.
If Location has a Name, that needs to be an AK.
I have not given you the Predicates. These are very important, for many reasons. The main reason here, is that it will prove the insanity of Record IDs. If you want them, please ask.
If you would like more information on Predicates, visit this Answer, scroll down (way down!) to Predicate, and read that section. Also check the ERD for them.
Solution
What changes [do] I need to apply to the above schema?
Try this:
Item History Data Model
(Obsolete, refer below for the updated model, in the context of the progression)
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation for a full explanation, or Model Anatomy.
If you have not encountered Subtypes implemented properly before, please read this Subtype Overview
That is a self-contained document, with links to code examples
There is also an SO discussion re How to implement referential integrity in subtypes.
When contemplating a Subtype cluster, consider each Basetype::Subtype pair as a single unit, do not perceive them as two fragments, or two halves. Each pair is one fact.
ItemHistory is an event (a fact) in the history of an Item.
Each ItemHistory fact is either a Location fact XOR an InTransit fact.
Each of those facts has different attributes.
Notice that the model represents the simple, honest, truth about the real world that you are engaging. In addition to the integrity, etc, as discussed above, the result is simple straight-forward code: every other "solution" makes the code complex, in order to handle exception cases. And some "solutions" are more horrendous than others.
Dr E F Codd gave this to us in 1970. It was implemented as a modelling method in 1984, named IDEF1X. That became the standard for Relational Databases in 1993. I have used it exclusively since 1987.
But the authors who write books, allegedly on the Relational Model, have no knowledge whatsoever, about any of these items. They know only pre-1970's ISAM Record Filing Systems. They do not even know that they do not have the Integrity, Power, or Speed of Relational Databases, let alone why they don't have it.
Date, Darwen, Fagin, Zaniolo, Ambler, Fowler, Kimball, are all promoting an incorrect view of the RM.
Response to Comments
1) ItemHistory, contains Discriminator column 'InTransit'.
Correct. And all the connotations that go with that: it is a control element; its values better be constrained; etc.
Shall it be enum with the value Y / N?
First, understand that the value-stored has meaning. That meaning can be expressed any way you like. In English it means {Location|InTransit}.
For the storage, the values for the proposition InTransit are {True|False}, ...
In SQL (if you want the real article, which is portable), I intended it as a BIT or BOOLEAN. Think about what you want to show up in the reports. In this case it is a control element, so it won't be present in the user reports. There I would stick to InTransit={0|1}.
But if you prefer {Y|N}, that is fine. Just keep that consistent across the database (do not use {0|1} in one place and {Y|N} in another).
For values that do show up in reports, or columns such as EventType, I would use {InTransit|Location}.
In SQL, for implementation, if it is BOOLEAN, the domain (range-of-values) is already constrained; nothing further is required.
If the column were other than BOOLEAN, you have two choices:
CHECK Constraint
CHECK ( InTransit IN ( 'Y', 'N' ) )
Reference or Lookup Table
Implement a table that contains only the valid domain. The requirement is a single column, the Code itself. And you can add a column for a short Descriptor that shows up in reports. CHAR(12) works nicely for me.
ENUM
There is no ENUM in SQL. Some of the non-SQL databases have it. Basically it implements option [2] above, with a Lookup table, under the covers. It doesn't realise that the rows are unique, and so it Enumerates the rows, hence the name, but it adds a column for the number, which is of course an ID replete with AUTOINCREMENT, so MySQL falls into the category of Stupid Thing to Do as described in this answer (scroll down to the Lookup Table section).
So no, do not use ENUM unless you wish to be glued at the hip to a home-grown, stupid, non-SQL platform, and suffer a rewrite when the database is ported to a real SQL platform. The platform might be stupid, but that is not a good reason to go down the same path. Even if MySQL is all you have, use one of the two SQL facilities given above, do not use ENUM.
2) Why is 'ItemHistoryTransit' needed as 'Date' column
(DATETIME, not DATE, but I don't think that matters.)
[It] is there in ItemHistory?
The standard method of constraining (everything in the database is constrained) the nature of the Basetype::Subtype relationship is to implement the exact same PK of the Basetype in the Subtype. The Basetype PK is (ItemCode, DateTime).
[Why] will only Discriminator not work?
It is wrong, because it doesn't follow the standard requirement, and thus allows weird and wonderful values. I can't think of an instance where that could be justified, even if a replacement constraint was provided.
Second, there can well be more than two occs of ItemEvents that are InTransit per ItemCode, which that does not allow.
Third, it does not match the Basetype PK value.
Solution
Actually, a better name for the table would be ItemEvent. Labels are keys to understanding.
I have given the Predicates, please review carefully.
Data model updated.
Item Event Data Model
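Since the diagram is not reproduced here, the following is only a guess at the general shape of such a Basetype::Subtype cluster, not the model from this answer. It assumes SQLite through Python's sqlite3, hypothetical table names (ItemEvent, ItemEventLocation, ItemEventTransit), a hypothetical Remarks attribute, and it swaps in a composite foreign key that includes the discriminator as the way to keep each event in exactly one subtype; the linked referential-integrity discussion may use a different mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Item     (ItemCode TEXT PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Location (LocationCode TEXT PRIMARY KEY);

-- Basetype: one row per event; InTransit is the discriminator.
CREATE TABLE ItemEvent (
    ItemCode  TEXT    NOT NULL REFERENCES Item(ItemCode),
    DateTime  TEXT    NOT NULL,
    InTransit INTEGER NOT NULL CHECK (InTransit IN (0, 1)),
    PRIMARY KEY (ItemCode, DateTime),
    UNIQUE (ItemCode, DateTime, InTransit)
);
-- Subtypes carry the same PK as the Basetype, plus a fixed discriminator
-- value, so a Location row can only attach to a non-transit event.
CREATE TABLE ItemEventLocation (
    ItemCode     TEXT    NOT NULL,
    DateTime     TEXT    NOT NULL,
    InTransit    INTEGER NOT NULL CHECK (InTransit = 0),
    LocationCode TEXT    NOT NULL REFERENCES Location(LocationCode),
    PRIMARY KEY (ItemCode, DateTime),
    FOREIGN KEY (ItemCode, DateTime, InTransit)
        REFERENCES ItemEvent (ItemCode, DateTime, InTransit)
);
CREATE TABLE ItemEventTransit (
    ItemCode  TEXT    NOT NULL,
    DateTime  TEXT    NOT NULL,
    InTransit INTEGER NOT NULL CHECK (InTransit = 1),
    Remarks   TEXT,   -- hypothetical transit-only attribute
    PRIMARY KEY (ItemCode, DateTime),
    FOREIGN KEY (ItemCode, DateTime, InTransit)
        REFERENCES ItemEvent (ItemCode, DateTime, InTransit)
);
INSERT INTO Item VALUES ('ITM-1', 'Sample item');
INSERT INTO Location VALUES ('A'), ('B');
INSERT INTO ItemEvent VALUES ('ITM-1', '2015-04-01 09:00', 0);
INSERT INTO ItemEventLocation VALUES ('ITM-1', '2015-04-01 09:00', 0, 'A');
INSERT INTO ItemEvent VALUES ('ITM-1', '2015-04-02 09:00', 1);
INSERT INTO ItemEventTransit VALUES ('ITM-1', '2015-04-02 09:00', 1, 'on the road');
""")

# Attaching a Location subtype row to an InTransit event violates the
# composite foreign key, so the XOR is enforced by the database.
try:
    conn.execute(
        "INSERT INTO ItemEventLocation VALUES ('ITM-1', '2015-04-02 09:00', 0, 'B')"
    )
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

in_transit_events = conn.execute(
    "SELECT ItemCode, DateTime FROM ItemEvent WHERE InTransit = 1"
).fetchall()
```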
You could add a boolean field for in_transit to item_location_history so when it is getting moved from Location A to Location B, you set the LocationId to Location B (so you know where it is going) but then when it actually arrives you log another row with LocationId as LocationB but with in_transit as false. That way you know when it arrived also.
If you don't need to know where it is headed when it is "in transit" then you could just add "In Transit" as a location and keep your schema the same. In the past with an inventory application, I went as far as making each truck a location so that we knew what specific truck the item was in.
One of the techniques I've adopted over the years is to normalize transitional attributes (qty, status, location, etc.) from the entity table. If you also want to track the history, just version (versionize?) the subsequent status table.
create table ItemLocation(
ItemID int,
Effective date,
LocationID int,
Remarks varchar( 256 ),
constraint PK_ItemLocation primary key( ItemID, Effective ),
constraint FK_ItemLocation_Item foreign key( ItemID )
references Items( ID ),
constraint FK_ItemLocation_Location foreign key( LocationID )
references Locations( ID )
);
There are several good design options, I've shown the simplest, where "In transit" is implied. Consider the following data:
ItemID Effective LocationID Remarks
====== ========= ========== ===============================
1001 2015-04-01 15 In location 15
1001 2015-04-02 NULL In Transit [to location xx]
1001 2015-04-05 17 In location 17
Item 1001 appears in the database when it arrives at location 15, where it spends one whole day. The next day it is removed and shipped. Three days later it arrives at location 17, where it remains to this day.
Implied meanings are generally frowned upon and are indeed easy to overdo. If desired, you can add an actual status field to contain "In location" and "In Transit" values. You may well consider such a course if you think there could be other status values added later (QA Testing, Receiving, On Hold, etc.). But for just two possible values, In Location or In Transit, implied meaning works.
At any rate, you know the current whereabouts of any item by fetching the LocationID with the latest Effective date. You also have a history of where the item is at any date -- and both can be had with the same query.
declare AsOf date = sysdate;
select i.*, il.Effective, IfNull( l.LocationName, 'In Transit' ) as Location
from Items i
join ItemLocation il
on il.ItemID = i.ID
and il.Effective =(
select Max( Effective )
from ItemLocation
where ItemID = il.ItemID
and Effective <= AsOf )
left join Locations l
on l.ID = il.LocationID;
Set the AsOf value to "today" to get the most recent location or set it to any date to see the location as of that date. Since the current location will be far and away the most common query, define a view that generates just the current location and use that in the join.
join CurrentItemLocation cil
on cil.ItemID = i.ID
left join Locations l
on l.ID = cil.LocationID;
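The view itself is not shown above, so here is one way it might look, run against SQLite through Python's sqlite3 instead of the original SQL dialect. The view definition, the LocationName values and the helper function location_as_of are assumptions for illustration; the ItemLocation rows are the sample data from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items     (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Locations (ID INTEGER PRIMARY KEY, LocationName TEXT NOT NULL);
CREATE TABLE ItemLocation (
    ItemID     INTEGER NOT NULL REFERENCES Items(ID),
    Effective  TEXT    NOT NULL,            -- ISO date string
    LocationID INTEGER REFERENCES Locations(ID),  -- NULL means in transit
    Remarks    TEXT,
    PRIMARY KEY (ItemID, Effective)
);
-- Assumed definition: the latest row per item.
CREATE VIEW CurrentItemLocation AS
SELECT il.ItemID, il.Effective, il.LocationID
FROM ItemLocation il
WHERE il.Effective = (
    SELECT MAX(Effective) FROM ItemLocation WHERE ItemID = il.ItemID
);
INSERT INTO Items VALUES (1001, 'Sample item');
INSERT INTO Locations VALUES (15, 'Location 15'), (17, 'Location 17');
INSERT INTO ItemLocation VALUES
    (1001, '2015-04-01', 15,   'In location 15'),
    (1001, '2015-04-02', NULL, 'In Transit'),
    (1001, '2015-04-05', 17,   'In location 17');
""")

def location_as_of(item_id, as_of):
    """Return the location name (or 'In Transit') for an item on a date."""
    row = conn.execute("""
        SELECT IFNULL(l.LocationName, 'In Transit')
        FROM ItemLocation il
        LEFT JOIN Locations l ON l.ID = il.LocationID
        WHERE il.ItemID = ?
          AND il.Effective = (
              SELECT MAX(Effective) FROM ItemLocation
              WHERE ItemID = il.ItemID AND Effective <= ?)
    """, (item_id, as_of)).fetchone()
    return row[0] if row else None

# Current whereabouts through the view.
current = conn.execute("""
    SELECT i.ID, IFNULL(l.LocationName, 'In Transit')
    FROM Items i
    JOIN CurrentItemLocation cil ON cil.ItemID = i.ID
    LEFT JOIN Locations l ON l.ID = cil.LocationID
""").fetchall()
```

Calling location_as_of(1001, '2015-04-03') falls on the NULL-location row, which is how the implied "In Transit" state shows up for a given date.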

Many highly similar objects in the same database table

Hello, stackoverflow community!
I am working on a rather large database-driven web application. The underlying database is growing in complexity as more components are being added, but so far I've had absolutely no trouble normalizing the data quite nicely.
However, this final component implies a table that can hold products.
Each product has a category, and depending on the category, has different fields.
Making a table for each product category doesn't seem right, as there are currently five types, and they still have quite a lot of fields in common. (but in weird ways - a few general fields such as description and price are common to all 5 categories, but some attributes are shared between 1 and 2, others 3,4,5 and so on).
I'm trying to steer away from the EAV model for obvious performance reasons.
The thing is that according to what product type the user wants to enter into the database there is a somewhat (but not completely) different field structure - all of them have a name and general description, but other attributes such as "area covered" can be applied only to certain categories such as seeds and pesticides, but not fuel, which would have a diesel/gasoline boolean and a bunch of other fuel-related attributes.
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
My current idea would be to have the product table contain all the fields from all the possible categories, and then just have another table to describe which category from the product table has which fields.
product: id | type | name | description | price | composition | area covered | etc.
fields: id | name (contains a list of the fields in the above table)
product-fields: id | product_type | field_id (links a bunch of fields to the product table based on the product type)
I reckon this wouldn't be too slow, would be easy to search (no need to actually join the other tables, just perform the search on the main product table based on some inputs), and would facilitate things like form generation and data validation with just one lightweight additional query/join: fetch a product from the db and join a concatenated list of the fields actually used in a string, then split that and display the proper form fields based on what it contains, i.e. the fields actually associated with that product.
Thanks for your trouble!
Andrei Bârsan
EAV can actually be quite good at storing data and fetching that data back again when you know the key. It also excels in its ability to add fields without changing the schema. But where it's quite poor is when you need the equivalent of WHERE field1 = x AND field2 = y.
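To illustrate both halves of that claim, here is a small sketch using SQLite via Python's sqlite3. The product_attr table and its rows are invented for the example: fetching by key is one simple query, while a two-field condition needs the table joined to itself once per field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A bare-bones EAV table: one row per (product, field) pair.
CREATE TABLE product_attr (
    product_id INTEGER NOT NULL,
    field      TEXT    NOT NULL,
    value      TEXT    NOT NULL,
    PRIMARY KEY (product_id, field)
);
INSERT INTO product_attr VALUES
    (1, 'type', 'fuel'),  (1, 'diesel', 'yes'),
    (2, 'type', 'fuel'),  (2, 'diesel', 'no'),
    (3, 'type', 'seeds'), (3, 'area_covered', '500');
""")

# Fetching everything for a known key is easy.
by_key = dict(conn.execute(
    "SELECT field, value FROM product_attr WHERE product_id = ?", (1,)
).fetchall())

# The equivalent of WHERE type = 'fuel' AND diesel = 'yes' needs a
# self-join, and one more join for every additional field.
diesel_fuels = conn.execute("""
    SELECT a.product_id
    FROM product_attr a
    JOIN product_attr b ON b.product_id = a.product_id
    WHERE a.field = 'type'   AND a.value = 'fuel'
      AND b.field = 'diesel' AND b.value = 'yes'
""").fetchall()
```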
So while I agree the data behaviour is important (how many products share the same fields, etc), the use of that data is also important.
Which fields need searching, which fields are always just data storage, etc
In most cases I'd suggest keeping all fields that need searching, in combination with each other, in the same table.
In practice this often leads to a single table solution.
New fields require schema changes, new indexes, etc
Potential for sparsely populated data, using more space than is 'required'
Allows simple queries, simple indexing and often the fastest queries
Often, though not always, the space overhead is marginal
Where the sparse-data overheads reach a critical point, I would then head towards additional tables grouped by what fields they contain. More specifically, I would not create tables by product. This is on the dual assumption that most/all fields will be shared across at least some products, and that those fields will need searching.
This gives a schema more like...
Main_table ( PK, Product_Type, Field1, Field2, Field3 )
Geo_table ( PK, county, longitude, latitude )
Value ( PK, cost, sale_price, tax )
etc
You may also have a meta-data table describing which product types have which fields, etc.
What this schema allows is a more densely populated set of tables, which can be easily indexed and so quickly searched, while minimising table clutter and joins by grouping related fields.
In the end, there isn't a true answer, it's all a balancing act. My general rule of thumb is to stay with a single table until I actually have a real and pressing reason not to, not just a theoretical one.
In my experience, unless you are writing a complete framework that can render fully described fields (we are talking about a lot of metadata describing each field), it is not worth separating field definitions from the main object. Modern frameworks (like Grails) allow for virtually zero-pain addition of a new column to a domain/Model class and table.
If your common field overlap is about 80% between all object types, I would put them all in one table and use the Table per Hierarchy inheritance model, where a discriminator field helps you tell your object types apart. On the other hand, if you have 20% overlap of common fields, then go with the Table per Class inheritance model, with a base class and table containing the common fields and the other joined tables hanging off the base.
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
This is called a SuperType - SubType relationship. It works very well if most of your queries are one of two types:
If you will be querying mostly the SuperType table and only drilling down into the SubType table infrequently.
If you will be querying the database after being filtered to a specific SubType.
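A minimal sketch of the two query shapes, assuming SQLite through Python's sqlite3 and made-up table names (product, fuel_product, seed_product) based on the fuel and seeds categories from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- SuperType: the fields every product has.
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL CHECK (type IN ('fuel', 'seeds')),
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
-- SubTypes: share the SuperType's key and add category-specific fields.
CREATE TABLE fuel_product (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    is_diesel  INTEGER NOT NULL CHECK (is_diesel IN (0, 1))
);
CREATE TABLE seed_product (
    product_id   INTEGER PRIMARY KEY REFERENCES product(id),
    area_covered REAL NOT NULL
);
INSERT INTO product VALUES
    (1, 'fuel',  'Diesel drum', 120.0),
    (2, 'seeds', 'Wheat seed',   35.5);
INSERT INTO fuel_product VALUES (1, 1);
INSERT INTO seed_product VALUES (2, 500.0);
""")

# Query shape 1: work on the SuperType only, no joins needed.
all_names = [row[0] for row in conn.execute("SELECT name FROM product ORDER BY id")]

# Query shape 2: already filtered to one SubType, so a single join suffices.
fuels = conn.execute("""
    SELECT p.name, f.is_diesel
    FROM product p
    JOIN fuel_product f ON f.product_id = p.id
""").fetchall()
```

Queries that need subtype-specific fields from several categories at once are where this layout starts to cost extra joins.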

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
INSERT INTO LikedElement (idUser, idElementType, idLikedElement)
SELECT @userId, idType, @photoId FROM ElementType WHERE TypeName = 'Photo'
and for places:
INSERT INTO LikedElement (idUser, idElementType, idLikedElement)
SELECT @userId, idType, @placeId FROM ElementType WHERE TypeName = 'Place'
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
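As the diagram is not reproduced here, the following is only a rough sketch of the third option, using SQLite via Python's sqlite3. All table and column names (entity, photo, article, app_user, entity_like) are assumptions; comments and tags would hang off entity in the same way as likes do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- The "base" table: anything that can be liked, tagged or commented on.
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY);

-- "Inherited" tables share the base table's key.
CREATE TABLE photo (
    entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id),
    file_path TEXT NOT NULL
);
CREATE TABLE article (
    entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id),
    title     TEXT NOT NULL
);
-- Likes reference the base table only, so new entity kinds need no changes here.
CREATE TABLE entity_like (
    user_id   INTEGER NOT NULL REFERENCES app_user(user_id),
    entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
    PRIMARY KEY (user_id, entity_id)
);
INSERT INTO app_user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO entity VALUES (10), (11);
INSERT INTO photo   VALUES (10, '/photos/10.jpg');
INSERT INTO article VALUES (11, 'Hello');
INSERT INTO entity_like VALUES (1, 10), (2, 10), (1, 11);
""")

# Like counts for every entity, whatever its concrete type.
like_counts = conn.execute("""
    SELECT entity_id, COUNT(*)
    FROM entity_like
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()

# The composite key prevents a user from liking the same entity twice.
try:
    conn.execute("INSERT INTO entity_like VALUES (1, 10)")
    double_like_rejected = False
except sqlite3.IntegrityError:
    double_like_rejected = True
```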
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea.
Please don't pay much attention to the styling of the field names, but more to the relations and structure.
This pseudocode gets all the comments on the photo with ID 5:
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff = 'photo'
AND actions.typeAction = 'comment';
This pseudocode gets all the likes on the photo with ID 5, and so the users who liked it
(you can use COUNT() to get just the number of likes):
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff = 'photo'
AND actions.typeAction = 'like';
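For a runnable version of these queries, here is a small sketch using SQLite. The columns id_Stuff, typeStuff and typeAction come from the pseudocode; id_User and content are assumed extras, and count_actions is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE actions (
    id         INTEGER PRIMARY KEY,
    id_User    INTEGER NOT NULL,   -- assumed: who performed the action
    id_Stuff   INTEGER NOT NULL,   -- id of the photo, link, etc.
    typeStuff  TEXT NOT NULL,      -- 'photo', 'link', ...
    typeAction TEXT NOT NULL,      -- 'comment' or 'like'
    content    TEXT                -- assumed: comment text, NULL for likes
)
""")
conn.executemany(
    "INSERT INTO actions (id_User, id_Stuff, typeStuff, typeAction, content) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 5, "photo", "comment", "Nice shot"),
        (2, 5, "photo", "like", None),
        (3, 5, "photo", "like", None),
        (2, 6, "photo", "like", None),   # a different photo
        (1, 5, "link", "like", None),    # same id, different kind of thing
    ],
)


def count_actions(conn, stuff_id, stuff_type, action_type):
    """Count actions of one kind on one item, using bound parameters."""
    row = conn.execute(
        "SELECT COUNT(*) FROM actions "
        "WHERE id_Stuff = ? AND typeStuff = ? AND typeAction = ?",
        (stuff_id, stuff_type, action_type),
    ).fetchone()
    return row[0]


photo_likes = count_actions(conn, 5, "photo", "like")
photo_comments = count_actions(conn, 5, "photo", "comment")
```

Filtering on typeStuff as well as id_Stuff matters here, because photo 5 and link 5 share the same id in this layout.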
As far as I understand, several tables are required, with a many-to-many relation between them.
A table that stores user data such as name, surname and birth date, with an identity field.
A table that stores the data types. These types may be photos, shares or links. Each type must have its own table, so there is a relation between the individual tables and this one.
Each data type has its own table, for example status updates, photos and links.
The last table implements the many-to-many relation, storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to be made particularly difficult or inefficient by one design choice or the other?
If not, favour the one that requires fewer tables.
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked. I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems a little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using a table per entity for comments and so on. More tables mean better sharding and scaling. Managing many similar tables is not a problem in any framework I know.
One day you'll need to optimize reads from such a structure. You can easily create aggregating tables over the base ones and lose a bit on writes.
One big table with a dictionary may become uncontrollable one day.
Definitely go with the second approach, where you have one table and store the element type for each row; it will give you a lot more flexibility. When something can logically be done with fewer tables, it is almost always better to go with fewer tables. Two advantages come to mind for your particular case. If you want to delete all liked elements of a certain user, the first approach needs one query per element type, while the second needs only one query. And if you want to add a new element type, the first approach involves creating a new table for each new type, while the second requires no schema change at all.
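The two advantages can be sketched as follows, assuming a single likes table with an element_type column. The table and column names are hypothetical, and SQLite is used only to keep the example self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE likes (
    user_id      INTEGER NOT NULL,
    element_type TEXT NOT NULL,     -- discriminator: 'photo', 'link', ...
    element_id   INTEGER NOT NULL,
    PRIMARY KEY (user_id, element_type, element_id)
)
""")
conn.executemany(
    "INSERT INTO likes (user_id, element_type, element_id) VALUES (?, ?, ?)",
    [
        (1, "photo", 5),
        (1, "link", 9),
        (1, "status", 2),
        (2, "photo", 5),
    ],
)

# Delete everything user 1 liked, across all element types, in one statement.
deleted = conn.execute("DELETE FROM likes WHERE user_id = ?", (1,)).rowcount

# A new element type is just a new value in the discriminator column.
conn.execute(
    "INSERT INTO likes (user_id, element_type, element_id) VALUES (?, ?, ?)",
    (2, "video", 1),
)

remaining = conn.execute("SELECT COUNT(*) FROM likes").fetchone()[0]
```

With one table per element type, the delete would need a separate statement for each of the photo, link and status tables, and the video like would need a new table first.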

What Db structure should I use with 2 the same comment tables for 2 different parent tables

This is a tough design question for an application I'm working on. I have 2 different items in my app that will both use comments, but I can't decide how to design my database.
There are 2 possibilities here. The first is a different comment table for every table that needs comments (normalized way):
movies -> movie_comments
articles -> article_comments
The second way I was thinking of is a generic comments table, with a many-to-many relationship for the comment and movie|article relations. E.g.:
comments
comments_movies (movie_id, comment_id)
comments_articles (article_id, comment_id)
What is your opinion on which method would be best, and can you give a good reason so I can decide?
I personally opt for the 2nd solution:
comments
comments_movies (movie_id, comment_id)
comments_articles (article_id, comment_id)
It is much simpler to maintain only one table for the logical Comment model. For example, when you want to add some feature to comments, you do it just once, and counting the comments of a specific user is much easier because they are all in one table.
Of course, someone else could list the advantages of keeping them in multiple tables, but you asked for opinions, so here is mine :)
Keeping them separate has the benefit of supporting change without impacting the comments for the other entity (movie vs articles). Assuming there are differences in attributes for a comment against an article vs. a movie. Otherwise...
I suppose there could be a need for displaying a comment with both an article and a movie. Consolidation would also help if you want to provide comment functionality for other entities in the future.
The answer depends on what you need currently, and a best guess of what you want to do in the future. More details help us to know what to suggest.
There is no "best" method, because it is a straight-forward Normalisation question: the proposal is either correctly Normalised or it is not.
Actually, the first option is not Normalised, the Normalisation is not complete. You have identical repeating groups of columns in two tables which have not been identified and grouped into a single table.
The second option is Normalised. You have identified that, and placed them in a single table.
At the logical level then, you have a many-to-many relation (not a table) between Movie and Comment, and between Article and Comment. End of story at the logical level.
At the physical level, where n::n relations are implemented as Associative tables, you have CommentMovie and CommentArticle.
As the Db expands and grows, life is simple, because:
any new column that is 1::1 with Movie.PK is placed in Movie
any new column that is 1::1 with Article.PK is placed in Article
any new column that is 1::1 with Comment.PK is placed in Comment
any new column that is 1::1 with CommentArticle.PK (the relation; PK is as shown (ArticleId, CommentId) ) is placed in CommentArticle. This (adding attributes to an n::n relation) will now cause the table to show up on the Logical model.
any new column that is 1::1 with CommentMovie.PK (the relation; PK is as shown (MovieId, CommentId) ) is placed in CommentMovie. This (adding attributes to an n::n relation) will now cause the table to show up on the Logical model.
I would suggest your second choice:
movies -> movie_comments -> comments
articles -> article_comments -> comments
One comments table, two pivot tables(many to many).
This keeps all the comment data in one table and links it only loosely to movies and articles. I usually recommend joins only for things that don't need to scale, because joining can be a performance hit and, in some cases, a nightmare. But this would be the best option for your case.
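A minimal sketch of the one-comments-table, two-pivot-tables layout, run against SQLite. The pivot table names follow the question (comments_movies, comments_articles); the remaining columns and the sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, body  TEXT NOT NULL);

-- Pivot tables: the composite primary key is the pair of foreign keys.
CREATE TABLE comments_movies (
    movie_id   INTEGER NOT NULL REFERENCES movies(id),
    comment_id INTEGER NOT NULL REFERENCES comments(id),
    PRIMARY KEY (movie_id, comment_id)
);
CREATE TABLE comments_articles (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    comment_id INTEGER NOT NULL REFERENCES comments(id),
    PRIMARY KEY (article_id, comment_id)
);

INSERT INTO movies   VALUES (1, 'Alien');
INSERT INTO articles VALUES (1, 'Schema design');
INSERT INTO comments VALUES (1, 'Great film'), (2, 'Scary'), (3, 'Useful read');
INSERT INTO comments_movies   VALUES (1, 1), (1, 2);
INSERT INTO comments_articles VALUES (1, 3);
""")

# Comments for movie 1: join the shared comments table through the pivot.
movie_comments = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.body
        FROM comments AS c
        JOIN comments_movies AS cm ON cm.comment_id = c.id
        WHERE cm.movie_id = ?
        ORDER BY c.id
        """,
        (1,),
    )
]

# Totals across all parents need no join at all.
total_comments = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Fetching the comments of one movie costs a single join, while anything that concerns comments as such, like the total count, touches only the comments table.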
comment_table
-------------
comment_id (int)
object_id (int)
comment (varchar(max))
type (int)
--------------
object_id refers to the object being commented on, such as a movie or an article.
type equals 1: the comment was made on a movie
type equals 2: the comment was made on an article
You can design your tables like this.
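Here is a small runnable sketch of this single-table design in SQLite. The column names follow the listing above; the CHECK constraint, the MOVIE and ARTICLE constants and the comments_for helper are assumptions added for the example:

```python
import sqlite3

MOVIE, ARTICLE = 1, 2  # type codes as described above

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE comment_table (
    comment_id INTEGER PRIMARY KEY,
    object_id  INTEGER NOT NULL,                      -- id of the movie or article
    comment    TEXT NOT NULL,
    type       INTEGER NOT NULL CHECK (type IN (1, 2))
)
""")
conn.executemany(
    "INSERT INTO comment_table (object_id, comment, type) VALUES (?, ?, ?)",
    [
        (10, "Loved it", MOVIE),
        (10, "Well argued", ARTICLE),   # same object_id, different parent table
        (11, "Too long", MOVIE),
    ],
)


def comments_for(conn, object_id, type_code):
    """Return the comments on one object; both id and type are required."""
    rows = conn.execute(
        "SELECT comment FROM comment_table "
        "WHERE object_id = ? AND type = ? ORDER BY comment_id",
        (object_id, type_code),
    )
    return [row[0] for row in rows]


movie_10 = comments_for(conn, 10, MOVIE)
article_10 = comments_for(conn, 10, ARTICLE)
```

One trade-off of this layout is that object_id cannot carry a foreign key constraint, since it points at a different table depending on type, so referential integrity has to be enforced by the application.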