database design for tagging multiple sources (MySQL)

database design for tagging multiple sources (MySQL) - mysql

I'm working on a project where I have the following (edited) table structures: (MySQL)
Blog
id
title
description
Episode
id
title
description
Tag
id
text
The idea is that that tags can be applied to any Blog or Episode (and to other types of sources), new tags can be created by the user if it doesn't exist already in the tag table.
The purpose of the tags is that a user will be able to search the site, and the results will search across all types of material on the site. Also, at the bottom of each blog article/episode description it would have a list of tags for that item.
I'd thought too much about the search mechanism, but I guess it'd be flexible between an OR and AND searches, if that has any impact on choices, and probably allow the user to filter the results for particular types of sources.
Originally I was planning to create multiple tag mapping tables:
BlogTag
id
tag_id
blog_id
EpisodeTag
id
episode_id
tag_id
But now I wonder if I would be better off with:
TaggedStuff
id
source_type
source_id
tag_id
Where source_type would be an integer related to whether it was an Episode, Blog, or some other type that I've not included in the structures above, and source_id would be the reference in that particular table.
I'm just wondering what the optimum structure would be for this, the first choice or the second?

In a clean (academic) design you would often see to have a supertype Resource (or something similar) for Blog and Episode with it's own table. Another table for the tags. And since it's a N:M relationship between Tag and Resource you have an extra mapping table between them.
So in such a design you would associate the Tag-Entities with your resources by having a relationship to their generalization.
After that you can put general attributes to the generalization. (i.e. title, description)
You can add attributes to the relationship between Tag and Resource like a counter how often a specific resource was tagged with a specific tag. Or how often a tag was used and and and (i.e. something like you see on stackoverflow in the upper right here)

The biggest loss in going with structure 2 is loss of referential integrity. If you can say "whatever" to that, it might be easier to go with this structure.
When I say structure 2 I mean:
TaggedStuff
id
source_type
source_id
tag_id

If I understand you correctly, the point is to optimize search mechanism...
So it has sense to make some kind of index_table and demoralize the data there...
I mean smth like this:
Url, Type, Title, Search_Field etc..
where Url is the path to the article or episode, Type (article|episode), Name (what users will see), Search_Field ( list of tags, other important data for search )
thats why both variants are quite good)))

Related

Database design to assign specific tags to an item

I am trying to build a little system that would assign specific tags to an item, or a person to be precise. So, I have a list of persons and a list of tags. I need to assign 3 specific tags to each person (that correspond to 3 different skills this person might have).
In a nutshell, the output would look like this :
Person 1 | webdesign, ux, jquery
Person 2 | blogging, photography, wordpress
Person 3 | graphic-design, 3d, inventor
...
For now, those lists are stored in two different tables :
persons
-------
person_id
person_name
tags
-------
tag_id
tag_name
My main goal is to avoid repetition and to simply assign 3 existing tags to an existing person.
Could you give me a few hints on how to implement this? I know that a three-table design is common for a tagging system, but is it relevant in my situation?
Thanks for your help.

If you want to ensure that you don't have any duplicates and to be able to add N tags to a person, then to properly implement a normalized design you would need a third table to link the tags to each person
persons_2_tags
--------------
person_id
tag_id
To guarantee uniqueness, you can either use a composite primary key, or add a unique index to the table including both columns.
See an example of the above in this SQL Fiddle.
If you need to enforce the 3 tag limit at the database level, you can add a third column to the persons_2_tags table (tag_number for example) that is an enum with values of 1, 2, 3 and add that to your unique index. Insert logic would need to be handled at the application level, but would be enforced by the index.

Do your requirements specify "exactly" 3 tags?
The third table is recommended to stay normalized. It's a typical Many-to-many relationship. This offeres the greatest amount of flexibility since you can have an unlimited, yet unique list of user/tag pairs.
You could have 3 columns in the user table for each tag. Performance would be improved at the expense of flexibility. Queries like, "List all the users with tag = 'X'" are a little harder. There may be several null values if you allow fewer than 3 tags. Of course in this setup, you'll have to create a new column and a lot of code to expand beyond three columns.

I think that I would probably do the three table design which Jeff O mentioned, however, just to present an alternative view...
If you're just talking about tags, that is, a short string with no other meta data, I don't know that you'd need a tags table. Essentially, the tag itself could be the its id.
persons (person_id, person_name);
tags (person_id, tag);
Yeah, you'd get a bit of repetition there, but they're short strings anyway and it should really make a difference.

How to build database for variant management in a webshop

I am searching for a guideline on how to set up my database for a auction side.
My problem is, that there is a lot of different product types - let's say paintings, clothes, computers etc. They have different specifications, and it should be possible to set just Product A in size L on auction - or the whole stock of Product B e.g.
How should I build my database for optimal performance - and coding - in this case?

I would suggest the following database/object structure:
[Auction] n..1 [Category] 1..n [Variation Attribute] 1..n [Attribute Value]
An auction then has a category and several attribute values referring the variation attribute as well:
[Auction] = [Category], [Name], [Description]
[Auction_AttrVal] = [AuctionID], [VarAttrID], [AttrValID]
First of all you can have some kind of category table, which holds items like "Paintings", "Clothes", "Computers". An auction / product is assigned to one category.
Each category then defines variation attributes for this specific category. An example would be "Size" for the category "Clothes" or "CPU" for the category "Computers". You can also add predefined values for the variation attributes to limit the number of variations and avoid differentiations like "3GhZ" vs "3 GhZ".
This mechanism also allows for easy filtering of search results. You select a category and simply load all variation attributes as filters (or add a flag to an attribute to declare it as such) and offer the values for filtering to the end-user.
Furthermore you can make variation attributes for a category mandatory to force users who create the auctions (I'm assuming it's Consumer-to-Consumer) to provide sufficient information for their auction.
The code will probably be quite generic and simple. The database structure is highly flexible and extensible. Performance is much better than having all in one table. You probably should create an index (for the field AuctionID) for the Auction_AttrVal table. Please let me know if the database structure is not explained properly.

How to structure table Activities in a database?

I have a site written in cakephp with a mysql database.
Into my site I want to track the activities of every users, for example (like this site) if a user insert a product I want to put this activity into my database.
I have 2 ways:
1) One table called Activities with:
- id
- user_id
- title
- text
- type (the type of activity: comment, post edit)
2) more table differenced by activities
- table activities_comment
- table activities_post
- table activities_badges
The problem is when I go to the page activities of a user I can have different type of activities and I don't know which of this solution is better because a comment has a title and a comment, a post has only a text, a badge has an external id to its table (for example) ecc...
Help me please

I'm not familiar with CakePHP, but from purely database perspective your data model should probably look similar to this:
The symbol denotes category (aka. inheritance, subclass, subtype, generalization hierarchy etc.). Take a look at "Subtype Relationships" in ERwin Methods Guide for more info.
There are generally 3 strategies for implementing the category:
All types in single table. This requires a lot of NULLs and requires CHECKs to make sure separate subtypes are not inappropriately "intermingled".
All concrete types in separate tables (excluding the base, which is ACTIVITY in your case), which means common fields and relationships must be repeated in all child tables.
All types in separate tables (including the base). This implementation requires a little more JOINing, but is flexible and clean. It should be your default, unless there are strong reasons against it.

How to store these field descriptions in mysql?

Apologize for the long topic, I didn't intend for it to be this long, but it's a pretty simple issue I've been having. :)
Let's say you have a simple table called tags that has columns tag_id and tag. The tag_id is simply an auto increment column and the tag is the title of the tag. If I need to add a description field, that would be around 1-2 paragraphs on average (max around 3-4 paragraphs probably), should I simply add a description field to the table or should I create a new table called tag_descriptions and store the descriptions with the tag_id?
I remember reading that it is better to do this because if you do a query that doesn't select the description, that description field will still slow down mysql. Is this true? I don't even remember where I read that from, but I've been kind of following it for a couple years now... Finally I question if I need to do this, I have a feeling I don't. You'd also need to inner join whenever you need the description field.
Another question I have is, is it generally bad to create new tables that will only hold very few rows at the max? What if this data doesn't fit anywhere else?
I have a simple case below which relates to these two questions.
I have three tables content, tags, and content_tags that make up a many to many relationship:
content
content_id
region (enum column with
about 6-7 different values and most
likely won't grow later on)
tags
tag_id
tag
content_tags
content_id
tag_id
I want to store a description around 1-2 paragraphs for each tag, but also for each region. I'm wondering what would be the best way to do this?
Option A:
Just add a description column to the
tags table
Create a new table for
region_descriptions
Option B:
Create a new table called
descriptions with fields: id,
description, and type
The id would be id of the content or
id of the enum field
The type would be whether it is a tag
description, or region description
(Would use the enum column for this)
Maybe have a primary key on the id and type?
Option C:
Create a new table for tag_descriptions
Create a new table for region_descriptions
Option A seems to be a good choice if adding the description column doesn't slow down mysql select queries that don't need the description.
Assuming the description column would slow down mysql, option B might be a good choice. It also removes the need for a small table with just 6-7 rows that would hold the region descriptions. Although now that I think of it, would it be slow to connect to this table if originally to get a region description you'd only need to go through very little rows.
Option C would be ideal if the description columns would slow down mysql and if a small table like region descriptions would not matter.
Maybe none of these options are the best, feel free to offer another option. Thanks.
P.S. What would be an ideal column type to use to hold data that usually 1-2 paragraphs, but might be a little more sometimes?

I don't think it really matters if you don't handle thousands of queries per minute. If you are going to have a zillion queries per minute, then I would implement the various options and perform benchmarks for all these options. Based on the results, you can make a decision.

In my (admittedly somewhat uninformed) opinion, it really depends on how much you'll be using both of them.
If properly indexed, that JOIN should not be very expensive. Also, a larger table will be slower. It inhibits caching, and takes longer to access stuff, although indexing seriously mitigates this problem.
If you'll be joining tag names to tag IDs a LOT, and only rarely will be using the descriptions, I'd say go with separate tables. If you'll be using the descriptions more often, go with one table.

For the first part of your question: if you have a tag with an id, a name and a description, you should save it in 1 table.
Now, this query
SELECT name FROM tags WHERE id = 1;
will NOT slow down if you have 1, 2 or 20 extra fields in there.

What Db structure should I use with 2 the same comment tables for 2 different parent tables

This is a tough design question for a application I'm working on. I have 2 different items in my app that both will use comments. What but I can't decide how to design my database.
There are 2 possibilities here. The first is a different comment table for every table that needs comments (normalized way):
movies -> movie_comments
articles -> article_comments
The second way I was thinking of was the use of a generic comments table and then have a many 2 many relationship for the comment and movie|article relations. Eg
comments
comments_movies (movie_id, comment_id)
comments_articles (article_id, comment_id)
What is your opinion on that the best method would be and can you give a good reason so I can decide.

i personally opt for 2nd solution
comments
comments_movies (movie_id, comment_id)
comments_articles (article_id, comment_id)
it is much more simple to maintain only on table model for logical Comment model e.g. when You wan't to add some feature to comments You just do it once or when You wan't count comments for specific user is much more easier because there are in one table
of course someone else could write his advantages of keeping that in multiple tables but You asked for opinions so here is mine :)

Keeping them separate has the benefit of supporting change without impacting the comments for the other entity (movie vs articles). Assuming there are differences in attributes for a comment against an article vs. a movie. Otherwise...
I suppose there could be a need for displaying a comment with an article and a movie. But the consolidation would also support if you want to provide comment functionality for other entities in the future.
The answer depends on what you need currently, and a best guess of what you want to do in the future. More details help us to know what to suggest.

There is no "best" method, because it is a straight-forward Normalisation question: the proposal is either correctly Normalised or it is not.
Actually, the first option is not Normalised, the Normalisation is not complete. You have identical repeating groups of columns in two tables which have not been identified and grouped into a single table.
The second option is Normalised. You have identified that, and placed them in a single table.
at the logical level then, you have a many-to-many relation (not a table) between Movie and Comment, and between Article and Comment. End of story at the logical level.
at the physical level, where n::n relations are implemented as Associative tables, you have CommentMovie and CommentArticle.
as the Db expands and grows, life is simple, because:
any new column that is 1::1 with Movie.PK is placed in Movie
any new column that is 1::1 with Article.PK is placed in Article
any new column that is 1::1 with Comment.PK is placed in Comment
any new column that is 1::1 with CommentArticle.PK (the relation; PK is as shown (ArticleId, CommentId) ) is placed in CommentArticle. This (adding attributes to an n::n relation) will now cause the table to show up on the Logical model.
any new column that is 1::1 with CommentMovie.PK (the relation; PK is as shown (MovieId, CommentId) ) is placed in CommentMovie. This (adding attributes to an n::n relation) will now cause the table to show up on the Logical model.

I would suggest your second choice:
movies -> movie_comments -> comments
articles -> article_comments -> comments
One comments table, two pivot tables(many to many).
This will keep all the same data in one table and just loosely linking them. If you can get away with joins I usually recommend that for things that don't need to scale because joining can be a performance hit and a nightmare in cases. But this would be best for your case.

comment_table
-------------
comment_id (int)
object_id (int)
comment (varchar(max))
type (int)
--------------
object_id refers to object such as movie ,i articles and so on.
type equals 1: comment was done to movie ,
type equals 2: comment was done to article
You can design your tables like this.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008