While trying to move table fields that need to be translated into several languages into *_i18n tables, I ended up in a situation where my tables (author, abstract_book, publisher) were left with only one field. My teacher says it is bad practice to have tables with no fields besides the id. Is there a better way to do this?
I was about to say that an author's name (or any person's name, for that matter) is always the same, but then I remembered that Jules Verne is actually called Julio Verne in some Spanish-speaking countries, and we call Mr. Фёдор Михайлович Достоевский Dostojewski in German and Dostoevsky in English.
While ID-only tables look a bit strange, they can still serve to enforce referential integrity. For instance, you cannot have a book with an author ID that doesn't exist. But you could still have rows without any language entry: e.g. book 1 was written by authors 2 and 3, but we don't know the book's title or the authors' names. That's the big drawback of this design.
For this reason I'd add columns to the tables for identification. There are three approaches to this:
Add the real name (author's name, book's title, publisher's name) to the table. That would be the author 'Достоевский' for instance.
Add the name in a default language (e.g. English) to the table. That would be the author 'Dostoevsky' for instance. With this approach you would ensure that data for the default language would be complete. (While data in other languages may have gaps.)
Add a reference to the i18n table's row (again in the original or the default language) to the table. This, however, has a problem: you want the reference to be non-nullable, so the tables would reference each other. This calls for deferrable constraints, which are not available in MySQL.
Whichever of these three approaches you prefer, they all achieve the same thing: the table has an ID and a name/title. Thus you have a default when an I18N entry is missing, and you can detect errors. If author 1 is called 'Jules Verne' in the author table, his French name is 'Jules Verne' and his English name is 'Mary Shelley', then you know which entry is incorrect :-)
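A minimal sketch of the second approach (default-language name in the base table), assuming SQLite through Python's built-in sqlite3 module as a stand-in for MySQL. The author/author_i18n layout and column names are illustrative, not taken from the question. The base table always holds a name, and a LEFT JOIN with COALESCE falls back to it when a translation is missing.

```python
import sqlite3

# Illustrative schema: author.name is the default-language (English) name,
# author_i18n holds optional translations keyed by (author_id, lang).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL            -- default-language name, always present
);
CREATE TABLE author_i18n (
    author_id INTEGER NOT NULL REFERENCES author(id),
    lang      TEXT    NOT NULL,   -- e.g. 'de', 'es'
    name      TEXT    NOT NULL,
    PRIMARY KEY (author_id, lang)
);
INSERT INTO author VALUES (1, 'Jules Verne'), (2, 'Dostoevsky');
INSERT INTO author_i18n VALUES (1, 'es', 'Julio Verne'), (2, 'de', 'Dostojewski');
""")

def author_name(author_id, lang):
    """Return the translated name, falling back to the default-language name."""
    row = conn.execute(
        """
        SELECT COALESCE(t.name, a.name)
        FROM author a
        LEFT JOIN author_i18n t ON t.author_id = a.id AND t.lang = ?
        WHERE a.id = ?
        """,
        (lang, author_id),
    ).fetchone()
    return row[0] if row else None  # None if the author does not exist

print(author_name(2, "de"))  # Dostojewski
print(author_name(2, "fr"))  # no French entry, falls back to Dostoevsky
```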
I have two sets of data that are near identical, one set for books, the other for movies.
So we have things such as:
Title
Price
Image
Release Date
Published
etc.
The only difference between the two sets of data is that Books have an ISBN field and Movies have a Budget field.
My question is: even though the data is similar, should both be combined into one table, or should they be two separate tables?
I've looked at similar questions on SO, but I am asking because most of the time my application will need to get a single list of both books and movies. It would be rare to fetch only books or only movies. So if the data is split into two tables, I would need to look up two tables for most queries.
Doing this (cataloging books and movies) perfectly is the work of several lifetimes. Don't strive for perfection, because you'll likely never get there. Take a look at Worldcat.org for excellent cataloging examples. Just two:
https://www.worldcat.org/title/coco/oclc/1149151811
https://www.worldcat.org/title/designing-data-intensive-applications-the-big-ideas-behind-reliable-scalable-and-maintainable-systems/oclc/1042165662
My suggestion: add a table called metadata. Your titles table should have a one-to-many relationship with your metadata table.
Then, for example, titles might contain:
title_id | title | price | release
103 | Designing Data-Intensive Applications | 34.96 | 2017
104 | Coco | 34.12 | 2017
Then metadata might contain:
metadata_id | title_id | key | value
1 | 103 | ISBN-13 | 978-1449373320
2 | 103 | ISBN-10 | 1449373320
3 | 104 | budget | USD175000000
4 | 104 | EIDR | 10.5240/EB14-C407-C74B-C870-B5B6-C
5 | 104 | Sound Designer | Barney Jones
Then, if you want to get items with their ISBN-13 values (I'm not familiar with IBAN, but I guess that's the same sort of thing), you do this:
SELECT titles.*, isbn13.value AS isbn13
FROM titles
LEFT JOIN metadata isbn13 ON titles.title_id = isbn13.title_id
AND isbn13.key = 'ISBN-13'
This is a good way to go because it's future-proof. If somebody turns up tomorrow and wants, let's say, the name of the most important character in the book or movie, you can add it easily.
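The titles/metadata design can be tried out with a small sketch. It assumes SQLite via Python's sqlite3 module rather than MySQL, names the year column release_year to stay clear of RELEASE (a reserved word in MySQL), and quotes the key column because KEY is a keyword in both databases. Once the metadata table is aliased (isbn13), the join condition has to use that alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (
    title_id     INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    price        NUMERIC,
    release_year INTEGER
);
CREATE TABLE metadata (
    metadata_id INTEGER PRIMARY KEY,
    title_id    INTEGER NOT NULL REFERENCES titles(title_id),
    "key"       TEXT NOT NULL,   -- quoted: KEY is a keyword
    value       TEXT
);
INSERT INTO titles VALUES
    (103, 'Designing Data-Intensive Applications', 34.96, 2017),
    (104, 'Coco', 34.12, 2017);
INSERT INTO metadata VALUES
    (1, 103, 'ISBN-13', '978-1449373320'),
    (2, 103, 'ISBN-10', '1449373320'),
    (3, 104, 'budget', 'USD175000000');
""")

# One LEFT JOIN per metadata key you want as a column; titles without
# that key (here, the movie) get NULL instead of disappearing.
rows = conn.execute("""
    SELECT titles.title, isbn13.value AS isbn13
    FROM titles
    LEFT JOIN metadata AS isbn13
           ON isbn13.title_id = titles.title_id
          AND isbn13."key" = 'ISBN-13'
    ORDER BY titles.title_id
""").fetchall()
print(rows)
```

Adding a new attribute later is then just another row in metadata, plus another aliased LEFT JOIN in queries that need it as a column.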
The only difference between the two sets of data is that Books have an ISBN field and Movies have a Budget field.
Are you sure that the difference you have now will not be extended to other differences that you may have to take into account in the future?
Are you sure that you will not have to deal with any other types of entities (other than books and movies) in the future, which would complicate things?
If the answer to both questions is "Yes", then you could use one table.
But if I had to design this, I would keep a separate table for each entity.
If needed, it's easy to combine their data in a view.
What is not easy is adding or modifying columns in a table (or even naming them) just to match the requirements of two or more entities.
You must be very sure about future requests/features for your application.
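Combining separate tables in a view could look roughly like this. It is a sketch assuming SQLite (via Python's sqlite3) in place of MySQL, with illustrative column names; the view exposes only the shared columns plus a kind discriminator, so "one list of books and movies" is a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    price        NUMERIC,
    release_year INTEGER,
    isbn         TEXT           -- book-only column
);
CREATE TABLE movies (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    price        NUMERIC,
    release_year INTEGER,
    budget       INTEGER        -- movie-only column
);
-- Shared columns only, plus a discriminator telling the rows apart.
CREATE VIEW catalog AS
    SELECT 'book'  AS kind, id, title, price, release_year FROM books
    UNION ALL
    SELECT 'movie' AS kind, id, title, price, release_year FROM movies;
INSERT INTO books  VALUES (1, 'Designing Data-Intensive Applications', 34.96, 2017, '978-1449373320');
INSERT INTO movies VALUES (1, 'Coco', 34.12, 2017, 175000000);
""")

rows = conn.execute("SELECT kind, title FROM catalog ORDER BY kind, title").fetchall()
for row in rows:
    print(row)
```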
I can't imagine what kind of linked books and movies you store, since a lot of movies have different titles from the books they are based on. For example: 25 films that changed the name.
If you are sure that your data will be persistent and always the same for books and movies, then you can create a new table, for example Productions, and store the attributes Title, Price, Image, Release Date and Published there. Then you can store foreign keys to the Productions entity in your Books and Movies tables.
But if anything changes in the future, you will need to rebuild the structure or revise your assumptions. Either way, it will be easier with a Productions entity: you just create a new row with the modified values and assign it to the selected book or movie.
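A sketch of the Productions idea, assuming SQLite via Python's sqlite3 and hypothetical table/column names and placeholder data. It shows the "create a new row and reassign" step: when the movie's attributes diverge from the book's, the movie is repointed at a new production while the book keeps the original.

```python
import sqlite3

# Hypothetical schema: shared attributes in productions, type-specific
# columns in books/movies, linked by production_id.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE productions (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    price        NUMERIC,
    release_year INTEGER
);
CREATE TABLE books (
    id            INTEGER PRIMARY KEY,
    production_id INTEGER NOT NULL REFERENCES productions(id),
    isbn          TEXT
);
CREATE TABLE movies (
    id            INTEGER PRIMARY KEY,
    production_id INTEGER NOT NULL REFERENCES productions(id),
    budget        INTEGER
);
-- Placeholder data: a book and the movie based on it share one production.
INSERT INTO productions VALUES (1, 'Example Story', 9.99, 2001);
INSERT INTO books  (production_id) VALUES (1);
INSERT INTO movies (production_id) VALUES (1);
""")

def titles():
    return conn.execute("""
        SELECT 'book', p.title FROM books b JOIN productions p ON p.id = b.production_id
        UNION ALL
        SELECT 'movie', p.title FROM movies m JOIN productions p ON p.id = m.production_id
        ORDER BY 1
    """).fetchall()

before = titles()
# The movie's attributes diverge: create a new production row and
# repoint the movie at it; the book keeps the original one.
new_id = conn.execute(
    "INSERT INTO productions (title, price, release_year) VALUES (?, ?, ?)",
    ("Example Story: The Movie", 14.99, 2003),
).lastrowid
conn.execute("UPDATE movies SET production_id = ? WHERE id = 1", (new_id,))
after = titles()
print(before)
print(after)
```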
A single table for both books and movies is the worst solution, because if one of the attributes diverges you will have to add a new row, and then you will have one row for the first set (a real book and a non-existent movie) and one for the second set (a non-existent book and a real movie).
Of course, all of this assumes there may be changes in the future. If you are 100% sure there won't be, then one table is a sufficient solution, but not a correct one from the database normalization perspective.
I would personally create separate tables for books and movies.
I have an issue with how to change a database model:
Currently we have a predefined table Categories,
and, let's say, tables Places and People, which can be assigned to categories, so it looks like this:
People <=> PeopleCategories <=> Categories <=> PlaceCategories <=> Places
(People can have many categories, categories can have many people, places can have many categories, categories can have many places)
But now there is a new requirement:
On a person's profile, show all corresponding places based on categories (so far no problem) and add a tick box modeling some attribute (for example, showing it on the front end as a favorite place). The same from the other side: on a place's profile, mark people assigned to at least one common category with a tick box.
I wonder whether there is a nice way to model this. The only thing that came to my mind is adding a new PeoplePlaces table, but then I would have to manually check whether people or places have changed their categories and are still assigned, and so on. There would be quite a problem with data consistency, which I would have to manage in the application layer.
The other thing I could probably do is delete categories entirely and handle it only at the PeoplePlaces level, but I would lose some simplicity for the user: there are about 10 predefined categories the user can select, so the linking between People and Places is fairly automatic on the front end, and only an admin should see which places are assigned to which people and manage the tick box I mentioned.
What would you suggest for this architecture? Thanks in advance! (It is a MySQL DB, in case that matters for a particular solution, but this is more of a general architecture question.)
If I understood your question correctly, you need to ensure that a person can only favor a place that is connected to the same category as the person herself?
If so, take a look at the following model:
We don't link the "endpoints" directly, and instead "link the links". This allows us to migrate PERSON_CATEGORY.CATEGORY_ID and PLACE_CATEGORY.CATEGORY_ID into the FAVORED_PLACE table, and "merge" them there, producing a single FAVORED_PLACE.CATEGORY_ID field (note FK1,FK2 in the diagram above).
As a consequence, if a person is connected to a place, that must be done through a common category.
Furthermore, since CATEGORY_ID is outside FAVORED_PLACE's PK, a particular combination of person and place can be used only once, even if they match through multiple categories. Effectively, you pick one common category as "special". If a place (or person) is removed from the special category, you'll need to pick another common category to serve as special. If there are no common categories left, the corresponding row in FAVORED_PLACE will not be allowed to exist anymore.
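The model described above can be sketched in SQL as follows, run through Python's sqlite3 module (SQLite in place of MySQL; column types and sample IDs are assumptions). The single category_id column in favored_place takes part in both composite foreign keys, which is what forces the person and the place to share that category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person   (person_id   INTEGER PRIMARY KEY);
CREATE TABLE place    (place_id    INTEGER PRIMARY KEY);
CREATE TABLE category (category_id INTEGER PRIMARY KEY);
CREATE TABLE person_category (
    person_id   INTEGER REFERENCES person(person_id),
    category_id INTEGER REFERENCES category(category_id),
    PRIMARY KEY (person_id, category_id)
);
CREATE TABLE place_category (
    place_id    INTEGER REFERENCES place(place_id),
    category_id INTEGER REFERENCES category(category_id),
    PRIMARY KEY (place_id, category_id)
);
-- category_id is shared by both FKs (FK1, FK2) and is not part of the PK,
-- so each (person, place) pair can appear only once.
CREATE TABLE favored_place (
    person_id   INTEGER NOT NULL,
    place_id    INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (person_id, place_id),
    FOREIGN KEY (person_id, category_id) REFERENCES person_category(person_id, category_id),
    FOREIGN KEY (place_id,  category_id) REFERENCES place_category(place_id, category_id)
);
INSERT INTO person VALUES (1);
INSERT INTO place VALUES (10), (20);
INSERT INTO category VALUES (100), (200);
INSERT INTO person_category VALUES (1, 100);
INSERT INTO place_category VALUES (10, 100), (20, 200);
""")

conn.execute("INSERT INTO favored_place VALUES (1, 10, 100)")  # OK: shared category 100

try:
    # Place 20 is only in category 200, which person 1 does not have.
    conn.execute("INSERT INTO favored_place VALUES (1, 20, 200)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

try:
    # Removing the person's only common category would orphan the favorite.
    conn.execute("DELETE FROM person_category WHERE person_id = 1 AND category_id = 100")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(rejected, delete_blocked)  # True True
```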
I don't think deleting Categories is a good idea.
What you are doing is introducing a new entity, PersonsFavouritePlaces, which relates People and Places directly rather than via a Category. It is sensible that a PersonsFavouritePlace be limited to a Person and a Place linked by a Category, so it should probably reference PeopleCategories and PlaceCategories rather than the People and Places tables.
The table would look like:
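create table PeopleFavouritePlace
(
ID int not null primary key,
PeopleCategoriesId int not null, -- FK to PK of PeopleCategories
PlaceCategoriesId int not null, -- FK to PK of PlaceCategories
-- assumes both link tables have a surrogate primary key column named ID
foreign key (PeopleCategoriesId) references PeopleCategories (ID),
foreign key (PlaceCategoriesId) references PlaceCategories (ID)
);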
MySQL (with the InnoDB storage engine) supports cascading deletes, so the two FKs should have ON DELETE CASCADE turned on: when someone deselects a category (deleting the PeopleCategories row), any favourite place linked through that category gets deleted too.
However, if a person links to a place via multiple categories, then it gets complicated...
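To see the cascading behavior end to end, here is a small sketch. It uses SQLite through Python's sqlite3 module instead of MySQL/InnoDB, turns foreign keys on explicitly because SQLite leaves them off by default, and assumes the link tables have surrogate ID primary keys plus hypothetical PersonId/PlaceId/CategoryId columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PeopleCategories (ID INTEGER PRIMARY KEY, PersonId INTEGER, CategoryId INTEGER);
CREATE TABLE PlaceCategories  (ID INTEGER PRIMARY KEY, PlaceId  INTEGER, CategoryId INTEGER);
CREATE TABLE PeopleFavouritePlace (
    ID                 INTEGER PRIMARY KEY,
    PeopleCategoriesId INTEGER NOT NULL
        REFERENCES PeopleCategories(ID) ON DELETE CASCADE,
    PlaceCategoriesId  INTEGER NOT NULL
        REFERENCES PlaceCategories(ID) ON DELETE CASCADE
);
INSERT INTO PeopleCategories VALUES (1, 7, 3);   -- person 7 in category 3
INSERT INTO PlaceCategories  VALUES (1, 42, 3);  -- place 42 in category 3
INSERT INTO PeopleFavouritePlace VALUES (1, 1, 1);
""")

def favourite_count():
    return conn.execute("SELECT COUNT(*) FROM PeopleFavouritePlace").fetchone()[0]

print(favourite_count())  # 1
# The person deselects category 3: the favourite that depended on it goes too.
conn.execute("DELETE FROM PeopleCategories WHERE ID = 1")
print(favourite_count())  # 0
```

Note that this table on its own does not check that the two referenced link rows share the same category; that is the part the composite-key "link the links" model enforces.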
I have a site written in CakePHP with a MySQL database.
On my site I want to track the activities of every user; for example (like this site), if a user inserts a product, I want to record this activity in my database.
I see two ways:
1) One table called Activities with:
- id
- user_id
- title
- text
- type (the type of activity: comment, post edit)
2) Multiple tables, one per activity type:
- table activities_comment
- table activities_post
- table activities_badges
The problem is that on a user's activities page I can have different types of activities, and I don't know which of these solutions is better, because a comment has a title and a comment, a post has only a text, a badge has an external id to its table (for example), etc.
Help me please
I'm not familiar with CakePHP, but from a purely database perspective your data model should probably look similar to this:
The symbol denotes category (aka inheritance, subclass, subtype, generalization hierarchy etc.). Take a look at "Subtype Relationships" in the ERwin Methods Guide for more info.
There are generally 3 strategies for implementing the category:
All types in single table. This requires a lot of NULLs and requires CHECKs to make sure separate subtypes are not inappropriately "intermingled".
All concrete types in separate tables (excluding the base, which is ACTIVITY in your case), which means common fields and relationships must be repeated in all child tables.
All types in separate tables (including the base). This implementation requires a little more JOINing, but is flexible and clean. It should be your default, unless there are strong reasons against it.
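A minimal sketch of the third strategy applied to the activities question, assuming SQLite via Python's sqlite3 and hypothetical table and column names (activity, activity_comment, activity_post, activity_badge) with placeholder data. Each subtype table shares its primary key with the base activity row, and a user's activity page becomes one query with a LEFT JOIN per subtype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    type        TEXT NOT NULL CHECK (type IN ('comment', 'post', 'badge')),
    created_at  TEXT NOT NULL
);
-- Subtype tables: PK is also an FK to the base row (1:1 with activity).
CREATE TABLE activity_comment (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
    title       TEXT,
    comment     TEXT NOT NULL
);
CREATE TABLE activity_post (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
    post_text   TEXT NOT NULL
);
CREATE TABLE activity_badge (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
    badge_id    INTEGER NOT NULL   -- would reference a badges table
);
INSERT INTO activity VALUES
    (1, 5, 'comment', '2024-01-01'), (2, 5, 'post', '2024-01-02'), (3, 5, 'badge', '2024-01-03');
INSERT INTO activity_comment VALUES (1, 'Nice', 'Great product!');
INSERT INTO activity_post    VALUES (2, 'My first post');
INSERT INTO activity_badge   VALUES (3, 99);
""")

# A user's activity page: common columns come from the base table and
# the LEFT JOINs pull in whichever subtype row exists.
rows = conn.execute("""
    SELECT a.activity_id, a.type, c.comment, p.post_text, b.badge_id
    FROM activity a
    LEFT JOIN activity_comment c ON c.activity_id = a.activity_id
    LEFT JOIN activity_post    p ON p.activity_id = a.activity_id
    LEFT JOIN activity_badge   b ON b.activity_id = a.activity_id
    WHERE a.user_id = ?
    ORDER BY a.created_at
""", (5,)).fetchall()
for row in rows:
    print(row)
```

Nothing in this sketch stops one activity from having rows in two subtype tables; enforcing exclusivity would need additional constraints.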
I'm creating a list of members on my site, and I want to enable them to look for each other by first name and last name, or either one. The catch is that a user can have several names, such as given names and nicknames; a person can also have more than one last name, e.g. their maiden name and their last name after marriage.
Once users fill out their first and last names, each user could have several of each. For example, there could be a person with 3 first names and 2 last names: first names Eleonora, Ela, El and last names Smith, Brown.
Then if someone looks for Ela Brown, Eleonora Brown, Eleonora Smith or any other combination, they should find this person.
My question is: how should I set this all up in SQL (MySQL) so that the schema and search are efficient and fast? I didn't want to reinvent the wheel, so I'm turning to the pros and asking here.
Thanks guys
P.S. I guess the standard solution would be to have a user table, an fname table, an lname table, a userfname table with userid and fnameid, and a userlname table with userid and lnameid, but I'm not sure if this is the best way to do this or whether search would be fast...
Do you need to differentiate between first names and last names?
I would suggest a Users table having UserID
and also a UsersNames table having UserID and Name, in a one-to-many relationship.
If you need to, you could also add an IsLastName bit to the UsersNames table (or just a LastName column, but the bit is better imho).
But this way you search one table and can easily locate user ID's, plus you don't limit the number of names each user can have.
EDIT:
You could easily take your input string and split it out too. So if somebody put in "John Smith" you could search for both or either name simply by splitting the string and using it in the WHERE clause using either OR or AND depending on your intended functionality.
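A sketch of the Users/UsersNames approach combined with the split-and-search idea, assuming SQLite via Python's sqlite3 and illustrative names (users, users_names, is_last_name). This version uses AND semantics: every word in the query must match one of the user's names. Matching is exact and case-sensitive here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE users_names (
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    name         TEXT    NOT NULL,
    is_last_name INTEGER NOT NULL DEFAULT 0   -- the optional "bit"
);
CREATE INDEX idx_users_names_name ON users_names(name);
INSERT INTO users VALUES (1), (2);
INSERT INTO users_names VALUES
    (1, 'Eleonora', 0), (1, 'Ela', 0), (1, 'El', 0), (1, 'Smith', 1), (1, 'Brown', 1),
    (2, 'John', 0), (2, 'Brown', 1);
""")

def find_users(query):
    """Return IDs of users whose names include every word of the query."""
    terms = sorted(set(query.split()))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)  # values are still bound as parameters
    sql = f"""
        SELECT user_id FROM users_names
        WHERE name IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT name) = ?
        ORDER BY user_id
    """
    return [r[0] for r in conn.execute(sql, (*terms, len(terms)))]

print(find_users("Ela Brown"))  # [1]
print(find_users("Brown"))      # [1, 2]
```

For OR semantics, drop the HAVING clause.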
The last time I did something like this, I processed each name into a single column in a NAMES table: all names, first/last/middle. A second table holds a link to the person record in the PERSONS table.
So each NAME field gets linked to one or more PERSONS records. If I search for "Scott", I find the name Scott in the NAMES table, find the links in the NAMES_TO_PERSONS(/PEOPLE?) table and then return all the records for that name, i.e. Scott Bruns, John Scott, David Scott Smith.
It worked very well with only a small amount of pre processing.
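The NAMES / NAMES_TO_PERSONS approach could be sketched like this, assuming SQLite via Python's sqlite3. The pre-processing step is modeled as splitting each full name on whitespace, which is a simplification, and the sample people come from the example above apart from "Jane Doe", added as a non-matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE names   (name_id   INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE names_to_persons (
    name_id   INTEGER NOT NULL REFERENCES names(name_id),
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    PRIMARY KEY (name_id, person_id)
);
""")

def add_person(full_name):
    """Pre-process: split the full name and link every part to the person."""
    person_id = conn.execute(
        "INSERT INTO persons (full_name) VALUES (?)", (full_name,)).lastrowid
    for part in set(full_name.split()):
        # Each distinct name is stored once and shared by all persons using it.
        conn.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (part,))
        name_id = conn.execute(
            "SELECT name_id FROM names WHERE name = ?", (part,)).fetchone()[0]
        conn.execute("INSERT INTO names_to_persons VALUES (?, ?)", (name_id, person_id))

for n in ("Scott Bruns", "John Scott", "David Scott Smith", "Jane Doe"):
    add_person(n)

def search(name):
    return [r[0] for r in conn.execute("""
        SELECT p.full_name
        FROM names n
        JOIN names_to_persons np ON np.name_id = n.name_id
        JOIN persons p ON p.person_id = np.person_id
        WHERE n.name = ?
        ORDER BY p.person_id
    """, (name,))]

print(search("Scott"))  # ['Scott Bruns', 'John Scott', 'David Scott Smith']
```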
Text searching is what you need: use Lucene. I've used Lucene on several projects and it's truly amazing, not hard to use and ridiculously fast.
If in your data model users may have multiple but a bounded number of name types, then the simplest solution would be to create an index on each column that stores a name type. You would add a field for first name, last name, nickname, maiden name, etc. This model would be more performant than a one-to-many names association.
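For comparison, the bounded-columns model might look like this (SQLite via Python's sqlite3; the column set is an assumption). Note the trade-off: the example user from the question has three given names, but only two columns are available for them here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (
    member_id   INTEGER PRIMARY KEY,
    first_name  TEXT,
    nickname    TEXT,
    last_name   TEXT,
    maiden_name TEXT
);
-- One index per name-type column.
CREATE INDEX idx_members_first_name  ON members(first_name);
CREATE INDEX idx_members_nickname    ON members(nickname);
CREATE INDEX idx_members_last_name   ON members(last_name);
CREATE INDEX idx_members_maiden_name ON members(maiden_name);
INSERT INTO members VALUES (1, 'Eleonora', 'Ela', 'Brown', 'Smith');
""")

def find_members(given, family):
    # Each OR branch compares against an indexed column.
    return [r[0] for r in conn.execute("""
        SELECT member_id FROM members
        WHERE (first_name = :g OR nickname = :g)
          AND (last_name = :f OR maiden_name = :f)
    """, {"g": given, "f": family})]

print(find_members("Ela", "Smith"))  # [1]
```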
You may also want to evaluate whether there are general search requirements for the rest of the application, or whether you would like the search to be more flexible. In that case you can look into a backend indexing process, such as Lucene, or into full-text search. Initially, though, I would try to avoid this if possible, because it certainly complicates the project.
This is my first question on Stack Overflow, so if I do something wrong please let me know and I will fix it as soon as possible.
So I am trying to make a database for TV shows, and I would like to know the best way to make my current database simpler (normalization).
I would like to be able to have the following structure or something similar:
Fringe
  Season 1
    Episodes 1-10 (however many there are)
  Season 2
    Episodes 1-10 (however many there are)
  ... (and so on)
Burn Notice
  Season 1
    Episodes 1-10 (however many there are)
  Season 2
    Episodes 1-10 (however many there are)
  ... (and so on)
... (more TV shows)
Sorry if this seems unclear. (Please ask for clarification)
The structure I have right now is three tables (tvshow_list, tvshow_episodes, tvshow_link):
//tvshow_list//
TvShow Name | Director | Company_Created | Language | TVDescription | tv_ID
//tvshow_episodes//
tv_ID | EpisodeNum | SeasonNum | EpTitle | EpDescription | Showdate | epid
//tvshow_link//
epid | ep_link
The Director and the company are linked by an id to another table with a list of companies and directors.
I am pretty sure that there is a simpler way of doing this.
Thanks for the help in advance,
Krishanthan Lingeswaran
The basic concept of Normalization is the idea that you should only store one copy of any item of data that you have. It looks like you've got a good start already.
There are two basic ways to model what you're trying to do here, with episodes and shows. In the database world, you might have heard the terms "one to many" and "many to many". Both are useful; which one is correct depends on your specific situation. In your case, the big question to ask yourself is whether a single episode can belong to only one show, or whether an episode can belong to multiple shows at once. I'll explain the two forms, and why you need to know the answer to that question.
The first form is simply a foreign key relationship. If you have two tables, 'episodes' and 'shows', in the episodes table, you would have a column named 'show_id' that contains the ID of one (and only one!) show. Can you see how you could never have an episode belong to more than one show this way? This is called a "one to many" relationship, i.e. a show can have many episodes.
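The one-to-many form can be sketched like this, assuming SQLite via Python's sqlite3 and illustrative table names shows and episodes; the episode titles are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- show_id holds exactly one show, so an episode cannot belong to two shows.
CREATE TABLE episodes (
    id      INTEGER PRIMARY KEY,
    show_id INTEGER NOT NULL REFERENCES shows(id),
    title   TEXT NOT NULL
);
INSERT INTO shows VALUES (1, 'Fringe'), (2, 'Burn Notice');
INSERT INTO episodes (show_id, title) VALUES
    (1, 'Placeholder A'), (1, 'Placeholder B'), (2, 'Placeholder C');
""")

# Episodes per show: one join, no association table needed.
rows = conn.execute("""
    SELECT s.name, COUNT(e.id)
    FROM shows s
    LEFT JOIN episodes e ON e.show_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()
print(rows)  # [('Fringe', 2), ('Burn Notice', 1)]
```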
The second form is to use an association table, and this is the form you used in your example. This form would allow you to associate an episode with multiple shows and is therefore called a "many to many" relationship.
There is some benefit to using the first form, but it's not really that big of a deal in most cases. Your queries will be a little shorter, because you only have to join two tables to get from episodes to shows; the association table is just one more join. It really comes down to figuring out whether you need a "one to many" or a "many to many" relationship.
An example of a situation where you would need a many-to-many relationship would be if you were modeling a library and had to keep track of who checked out which book. You'd have a table of books, a table of users, and then a table of "books to users" that would have an id, a book_id, and a user_id and would be a many-to-many relationship.
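The library example can be sketched the same way. SQLite (via Python's sqlite3) stands in for MySQL, and the books, users and their names are placeholders. The books_to_users table has the id, book_id and user_id columns described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
-- Association table: many books <-> many users.
CREATE TABLE books_to_users (
    id      INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(id),
    user_id INTEGER NOT NULL REFERENCES users(id)
);
INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
-- Alice borrowed both books; Book A was also borrowed by Bob.
INSERT INTO books_to_users (book_id, user_id) VALUES (1, 1), (2, 1), (1, 2);
""")

def borrowers(book_title):
    return [r[0] for r in conn.execute("""
        SELECT u.name
        FROM books b
        JOIN books_to_users bu ON bu.book_id = b.id
        JOIN users u ON u.id = bu.user_id
        WHERE b.title = ?
        ORDER BY u.name
    """, (book_title,))]

print(borrowers("Book A"))  # ['Alice', 'Bob']
```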
Hope that helps!
I am pretty sure that there is a simpler way of doing this.
Not as far as I know. Your schema is close to the simplest you can make for what I presume is the functionality you're asking for. "Improvements" on it really only make it more complicated, and should be added as you judge the need emerges on your side. The following examples come to mind (none of which really simplify your schema).
I would standardize your foreign key and primary key names. An example would be to have the columns shows.id, episodes.id, episodes.show_id, link.id, link.episode_id.
Putting SeasonNum (which I presume will be an int) in the Episodes table violates, in my opinion, the normalization constraint. This is not a major violation, but if you really want to stick to it, I would create a separate Seasons table, associate it many-to-one with the Shows table, and then have the Episodes associate only with the Seasons. This gives you the opportunity to, for instance, attach information to each season. It also prevents repetition of information (while the type of the season ID foreign key column in the Episodes table would ostensibly still be an INT, a foreign key philosophically stores an association, which is what you want, rather than dumb data, which is what you have).
You may consider putting language, director, and company in their own tables rather than your TV show list. This is the same concern as above and in your case a minor violation of normalization.
Language, director, and company all have interesting issues attached to them regarding the level of the association. Most TV shows have different directors for different episodes. Many are produced in multiple languages and by several different companies and sometimes networks. So at what level do you plan on storing this information? I'm not a software architect, so someone else can better answer this question than me, but I'd set up a polymorphic many-to-many association for languages, directors, and companies and an inheritance model that allows for these values to be specified on an episode-by-episode, season-by-season, or show-by-show basis, inheriting the value from its parent if none are provided.
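If you do go for the separate Seasons table suggested above, a minimal sketch (SQLite via Python's sqlite3, table and column names assumed) might look like this. The UNIQUE constraints are an extra assumption that keeps a show from having two "Season 2" rows, or a season from having two "Episode 1" rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Shows 1..n seasons; seasons 1..n episodes.
CREATE TABLE seasons (
    id      INTEGER PRIMARY KEY,
    show_id INTEGER NOT NULL REFERENCES shows(id),
    number  INTEGER NOT NULL,
    UNIQUE (show_id, number)
);
-- Episodes point at a season row instead of storing a bare season number.
CREATE TABLE episodes (
    id        INTEGER PRIMARY KEY,
    season_id INTEGER NOT NULL REFERENCES seasons(id),
    number    INTEGER NOT NULL,
    title     TEXT,
    UNIQUE (season_id, number)
);
INSERT INTO shows   VALUES (1, 'Fringe');
INSERT INTO seasons VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO episodes (season_id, number) VALUES (1, 1), (1, 2), (2, 1);
""")

rows = conn.execute("""
    SELECT sh.name, se.number, COUNT(e.id)
    FROM shows sh
    JOIN seasons se ON se.show_id = sh.id
    LEFT JOIN episodes e ON e.season_id = se.id
    GROUP BY sh.id, sh.name, se.id, se.number
    ORDER BY se.number
""").fetchall()
print(rows)  # [('Fringe', 1, 2), ('Fringe', 2, 1)]
```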
Bottom line concerning all these suggestions: Pick what's appropriate for your project. If you don't need the functionality afforded by this level of associations, and you don't mind manually entering in repetitive data (you might end up implementing an auto-complete system to help you), you can gloss over some of the normalization constraints.
Normalization is merely a suggestion. Pick what's right for you and learn from your mistakes.