Multiple foreign keys or multiple tables? - mysql

I'm stuck on the best way to manage table creation for the following situation:
let's say I have a table called descriptions and tables like items, types, and choices;
each item in items, type in types, or choice in choices could have one or more descriptions.
Say the tables items, types, and choices have the following structure (each could also have columns specific to its own table):
+----+------+-------+
| ID | NAME | PRICE |
+----+------+-------+
| 1  | CAR  | 5     |
| 2  | BUS  | 10    |
+----+------+-------+
and the descriptions table looks like this:
+----+-------+---------+---------+
| ID | DESC  | DESC_EN | DESC_RU |
+----+-------+---------+---------+
| 1  | CIAO  | HELLO   | ПРИВЕТ  | // this description has to belong to items
| 2  | PIZZA | PIZZA   | ПИЦЦА   | // this description has to belong to types
+----+-------+---------+---------+
So at this point my doubt is: should I add three foreign key columns to descriptions (one each for items, types, and choices), or should I create three separate description tables, one for each table?

An alternative solution to the translation problem is to have a table with one row per id, language, and description.
This is handy, particularly for adding a new language: nothing in the schema needs to change, you only add rows to some tables.
Another advantage is that a single table can actually hold the translations for multiple different tables -- so all the translations are in one place. That can simplify keeping them up-to-date and ensuring consistency across an application.
One disadvantage is that the translation column has a single collation. That makes it tricky to customize sort orders (and sometimes comparisons) across languages. Whether this is an issue depends on what languages you envision for your application. Some languages, such as Arabic and Hebrew, are written right-to-left, which can introduce other complications.
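For illustration, here is a minimal sketch of that design, with assumed table and column names; a single descriptions table serves items, types, and choices via a type discriminator:

-- One row per entity, language, and description (names assumed)
CREATE TABLE descriptions (
    entity_type ENUM('item', 'type', 'choice') NOT NULL, -- which table the row belongs to
    entity_id   INT NOT NULL,                            -- ID in that table
    lang        CHAR(2) NOT NULL,                        -- e.g. 'en', 'ru'
    description VARCHAR(255) NOT NULL,
    PRIMARY KEY (entity_type, entity_id, lang)
);

-- Adding a new language means adding rows, not altering the schema:
INSERT INTO descriptions VALUES
    ('item', 1, 'en', 'HELLO'),
    ('item', 1, 'ru', 'ПРИВЕТ'),
    ('type', 2, 'en', 'PIZZA');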

Related

Over 2500 tables in mysql

My application stores the login information of over 2500 employees in a table named "emp_login".
Now I have to store the activities of every employee on a daily basis. For this purpose I created a separate table for every employee, e.g. emp00001, emp00002... Each table will have about 50 columns.
After digging around a lot on Stack Overflow I'm kind of confused. Many of the experts say that a MySQL database with more than 200-300 tables is considered to be poorly designed.
My question is whether it is a good idea to have such a bulk of tables. Is my database poorly designed? Should I choose another database like MSSQL? Or is there some alternative way to handle the database of such an application?
Do not do it that way. Every employee should be in one table and have a primary key ID, i.e.:
1: Tom
2: Pete
You then assign the actions with a column that references the employee's ID number:
Action, EmployeeID
You should always group identical entities in a single table with indexed IDs and then link properties/actions to those entities by ID. Imagine what you would have to do to search a database that consisted of a different table for every employee: it would defeat the whole point of using SQL.
The event table could look like:
Punchin, 1, 2018/01/01 00:00
That would tell you Tom punched in at 2018/01/01 00:00. This is a very simple example, and you probably wouldn't want to structure an event table exactly that way, but it should get you on the right track.
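As a rough sketch of that idea (the table and column names here are made up for illustration):

-- One employees table instead of one table per employee
CREATE TABLE employees (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Actions reference the employee's ID
CREATE TABLE employee_events (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    action      VARCHAR(50) NOT NULL,
    employee_id INT NOT NULL,
    occurred_at DATETIME NOT NULL,
    FOREIGN KEY (employee_id) REFERENCES employees (id)
);

-- "Tom punched in at 2018-01-01 00:00"
INSERT INTO employees (name) VALUES ('Tom'), ('Pete');
INSERT INTO employee_events (action, employee_id, occurred_at)
VALUES ('Punchin', 1, '2018-01-01 00:00:00');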
This has nothing to do with MySQL; it has to do with your design, which is flawed. You should have one table for all your employees. It contains information unique to the employees, such as firstname, lastname and email address:
|ID | "John" | "Smith" | "john.smith#gmail.com" |
|1 | "James" | "Smith" | "james.smith#gmail.com" |
|2 | "jane" | "Jones" | "jane.jones.smith#yahoo.com" |
|3 | "Joanne" | "DiMaggio" | "jdimaggio#outlook.com" |
Note the ID column. Typically this would be an integer with AUTO_INCREMENT set, and you would make it the Primary Key. Then you get a new unique number every time you add a new user.
Now you have separate tables for every piece of RELATED data. E.g. the city they live in or their login time (which I'm guessing you want from the table name).
If it's a one to many relationship (i.e. each user has many login times), you create a single extra table which REFERENCES your first table. This is a DEPENDENT table. Like so:
| UserId | LoginTime |
| 1 | "10:00:04 13-09-2018" |
| 2 | "11:00:00 13-09-2018" |
| 3 | "11:29:07 14-09-2018" |
| 1 | "09:00:00 15-09-2018" |
| 2 | "10:00:00 15-09-2018" |
Now when you query your database you do a JOIN on the UserId field to connect the two tables. If it were only their LAST login time, then you could put it in the user table because it would be a single piece of data. But because they will have many login times, then login times needs to be its own table.
(N.b. I haven't put an ID column on this table but it's a good idea.)
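A sketch of that one-to-many layout (names assumed; the dependent table is left without its own ID column, as in the example above):

CREATE TABLE users (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    firstname VARCHAR(50) NOT NULL,
    lastname  VARCHAR(50) NOT NULL,
    email     VARCHAR(255) NOT NULL
);

-- Dependent table: many login times per user
CREATE TABLE login_times (
    user_id    INT NOT NULL,
    login_time DATETIME NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users (id)
);

-- JOIN on the UserId field to connect the two tables
SELECT u.firstname, u.lastname, l.login_time
FROM users u
JOIN login_times l ON l.user_id = u.id;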
If it's data that ISN'T unique to each user, i.e. it's a MANY to MANY relationship, such as the city they live in, then you need two tables. One contains the cities and the other is an INTERMEDIARY table that joins the two. So as follows:
(city table)
| ID | City |
| 1 | "London" |
| 2 | "Paris" |
| 3 | "New York" |
(city-user table)
| UserID | CityID |
| 1 | 1 |
| 2 | 1 |
| 3 | 3 |
Then you would do two JOINs to connect all three tables and get which city each employee lives in. Again, I haven't added an ID field and PRIMARY KEY to the intermediary table because it isn't strictly necessary (you could instead create a unique composite key, which is a separate discussion), but it would be a good idea.
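A sketch of the many-to-many case, reusing the hypothetical users table from the previous sketch; here the intermediary table gets a composite primary key, one of the options mentioned above:

CREATE TABLE cities (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    city VARCHAR(100) NOT NULL
);

-- Intermediary table joining users and cities
CREATE TABLE city_user (
    user_id INT NOT NULL,
    city_id INT NOT NULL,
    PRIMARY KEY (user_id, city_id),
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (city_id) REFERENCES cities (id)
);

-- Two JOINs to get which city each employee lives in
SELECT u.firstname, c.city
FROM users u
JOIN city_user cu ON cu.user_id = u.id
JOIN cities c ON c.id = cu.city_id;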
That's the basic thing you need to know. Always divide your data up by function. Do NOT divide it up by the data itself (i.e. a table per user). The thing you want to look up right now is called "Database Normalization". Stick that into a search engine and read a good overview. It won't take long and will help you enormously.

More efficient to have more columns or more rows?

I'm currently redesigning a database which could contain a lot of data - I have the option to either include a number of different columns in the database or use a lot of rows instead. It's probably easier if I did some kind of outline below:
item_id | user_id | title | description | content | category | template | comments | status
-------------------------------------------------------------------------------------------
1 | 1 | ABC | DEF | GHI | 1 | default | 1 | 1
2 | 1 | ZYX | | QWE | 2 | default | 0 | 1
3 | 1 | A | | RTY | 2 | default | 0 | 0
4 | 2 | ABC | DEF | GHI | 3 | custom | 1 | 1
5 | 2 | CBA | | GHI | 3 | custom | 1 | 1
Versus something in the following structure:
item_id | user_id | attribute | value
---------------------------------------
1 | 1 | title | ABC
1 | 1 | description | DEF
1 | 1 | content | GHI
... | ... | ... | ...
I may want to create additional attributes in the future (50 for arguments sake) - so there could be a lot of empty cells if using multiple columns. The attribute names would be reused, where possible, across different types of content - say a blog entry, event, and gallery - title would easily be reused.
So my question is, is it more efficient to use multiple columns or multiple rows - in terms of query speed and disk space. Or would you instead recommend relationship tables, so there's a table for blogs, a table for events, etc. I'm just trying to come up with an easily expandable solution, where I ideally do not want to create a table for every kind of content as I'm thinking of developers creating new kinds of content via an app/API system (with attributes being tightly controlled).
Supplementary Question if Multiple Rows
How could I, in MySQL, convert multiple rows into a usable column format (I guess temporary tables) - so I could do some filtering by content type, as an example.
Basically, MySQL has a variable row length as long as one does not change the ROW_FORMAT on a per-table level. Thus, empty columns will not use any space (well, almost).
But with BLOB or TEXT columns, it might be better to normalize those out, as they may hold large data that needs to be read or skipped every time the table is scanned. Even if the column is not in the result set, a query that can't use an index will take its time on a large number of rows.
As a good practice, I think it is fastest to put all administrative and often-used columns in one table and normalize all the rest. A "vertical" design like your second example will be complex to read, and as soon as you work with temporary tables you will run into performance issues sooner or later.
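For the supplementary question: one way to turn attribute rows back into columns without temporary tables is conditional aggregation. A sketch, assuming the vertical table is called item_attributes (a hypothetical name):

SELECT item_id,
       MAX(CASE WHEN attribute = 'title'       THEN value END) AS title,
       MAX(CASE WHEN attribute = 'description' THEN value END) AS description,
       MAX(CASE WHEN attribute = 'content'     THEN value END) AS content
FROM item_attributes
GROUP BY item_id;

Each CASE picks out one attribute per group, so every item_id collapses back into a single wide row, which you can then filter like an ordinary table.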
For a traditional row-based store, the cost of spooling through rows will depend on their width, so scanning a table with wide rows will take longer than one with narrow rows.
That said, if you're using an index to locate the rows that are of interest, this won't be so much of an issue.
If you normalise your data by replacing columns with keys to rows in other tables, you can reduce the amount of storage if the linked tables end up being significantly smaller than the original table, however any query will need to include the cost of required joins into the related table.
As with all these things, it's a balancing act that depends on your requirements, but understanding what's going on under the hood can certainly help you to make more informed decisions.
This question is very difficult to answer, as it all comes down to what you are looking for and how your database will grow in size and complexity over time. I find the best way to answer these types of questions is to read case studies from other successful sites. For example, Reddit would be a case study where they use a lot of rows but very few tables and/or columns. The article is here and a question on it is here.
There is also the option of exploring a NoSQL solution which may be more applicable to what you are trying to achieve.
Google case studies of sites that would have a similar structure to your own and see how they accomplished it as they have most likely encountered all the issues you will and already overcome them.

MySql Prevent/track duplicate field across multiple fields

I'm looking for an easy way to check across multiple part tables to determine if a given part number is already present before adding it to a given table.
The current best idea I have come up with is a secondary table that simply lists every PN from all tables in a single column with a unique key; however I was wondering if there is a way to do it without creating a new table and index?
For the visual learner types, I have forty-some tables that more or less follow this pattern:
Table 1
| id | PN | Other Columns |
----------------------------------------------
| 1 | SomePn | ... ... ... ... |
...
Table 2
| id | PN | Still Other Columns |
--------------------------------------------------
| 1 | OtherPn | ... ... ... ... ... |
...
and about forty more as above, with up to 50 columns and up to 8 million records per table.
The goal is, whether through software (Java) or MySql rejecting the records, to prevent duplicate part numbers from creeping in across multiple tables. Is a master PN table the only possible or reasonable solution?
I know that the data structure is not the best design, and a rework is in progress, but I would like to know some suggested best practices as well as suggested solutions for this problem.
Adding a table is your best option.
Table 1:
| t1_id | pn_id | Other Columns |
----------------------------------------------
| 1     | 1     | ... ... ... ... |
...
Table 2:
| t2_id | pn_id | Other Columns |
----------------------------------------------
| 1     | 2     | ... ... ... ... |
...
Table 3 (the master part-number table):
| pn_id | PN      |
-------------------
| 1     | SomePn  |
| 2     | OtherPn |
...
Although this may not be the easiest solution to implement in your case, it is the best overall solution, as you will have no scaling problems now or in the future. If you instead opted for a solution which checked all the tables for duplicate part numbers on update/creation, it would take longer and longer as your tables got bigger.
If you could guarantee that they would never get bigger or you would never add part numbers, you could probably get away with just writing a script to check for duplicates once and not worry about another table at all. But, in the long run, you'll want to add another table just to keep track of the part numbers.
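A minimal sketch of that master table (names assumed); the UNIQUE constraint is what makes MySQL itself reject duplicates:

CREATE TABLE part_numbers (
    pn_id INT AUTO_INCREMENT PRIMARY KEY,
    pn    VARCHAR(50) NOT NULL UNIQUE
);

-- Each part table stores pn_id instead of the raw PN
CREATE TABLE table1 (
    t1_id INT AUTO_INCREMENT PRIMARY KEY,
    pn_id INT NOT NULL,
    -- ... other columns ...
    FOREIGN KEY (pn_id) REFERENCES part_numbers (pn_id)
);

-- Inserting 'SomePn' a second time fails with a duplicate-key error
INSERT INTO part_numbers (pn) VALUES ('SomePn');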

Find records where CSV column values match

I am making a website. In the database I have a table of articles that kind of looks like this:
id | name | cats | etc.
------------------------------------------------------
1 | "alice" | "this, that, those, them" |
2 | "bob" | "this, that, those" |
3 | "carol" | "this, banana, cupcake" |
4 | "dave" | "other, unrelated, words" |
5 | "errol" | "those, them, fishstick" |
When viewing an article I want to also show some of the most related articles, based on the amount of categories in common.
For example, if I was viewing the Alice article I would want to pick out (in order of preference) Bob (3 cats in common), Errol (2), and Carol (1).
I am aware that this would be easier if the data was normalised (I could for example do this) but unfortunately that's not really an option.
I ended up creating a couple of extra tables and populating them with properly normalized data every time something was saved. These run alongside the existing tables so it's not the cleanest of solutions but it works and the query speeds are excellent.
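For reference, a sketch of that side-table approach with assumed names: a normalized article_cats table kept in sync on every save, plus a self-join that counts the categories each article shares with the one being viewed:

CREATE TABLE article_cats (
    article_id INT NOT NULL,
    cat        VARCHAR(64) NOT NULL,
    PRIMARY KEY (article_id, cat)
);

-- Articles sharing the most categories with article 1 (Alice)
SELECT other.article_id,
       COUNT(*) AS cats_in_common
FROM article_cats mine
JOIN article_cats other
  ON other.cat = mine.cat
 AND other.article_id <> mine.article_id
WHERE mine.article_id = 1
GROUP BY other.article_id
ORDER BY cats_in_common DESC;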

Database modeling for international and multilingual purposes

I need to create a large scale DB Model for a web application that will be multilingual.
One doubt I have every time I think about how to do it is how to resolve having multiple translations for a field. An example case:
The table for language levels, which administrators can edit from the backend, can have multiple items like basic, advance, fluent, mattern... and in the near future there will probably be one more type. The admin goes to the backend and adds a new level, and it gets sorted into the right position... but how do I handle all the translations for the final users?
Another problem with internationalization of a database is that user studies can differ from the USA to the UK to Germany... every country will have its own levels (each probably equivalent to another country's, but in the end different). And what about billing?
How do you model this at a large scale?
Here is the way I would design the database:
[Diagram: visualization by DB Designer Fork]
The i18n table only contains a PK, so that any table just has to reference this PK to internationalize a field. The translation table is then in charge of linking this generic ID with the correct list of translations.
locale.id_locale is a VARCHAR(5) to manage both the en and en_US ISO syntaxes.
currency.id_currency is a CHAR(3) to manage the ISO 4217 syntax.
You can find two examples: page and newsletter. Both of these admin-managed entities need to internationalize their fields, respectively title/description and subject/content.
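A possible sketch of those tables in MySQL; the column names follow the query below, while the types are assumptions:

CREATE TABLE i18n (
    id_i18n INT AUTO_INCREMENT PRIMARY KEY
);

CREATE TABLE locale (
    id_locale VARCHAR(5) PRIMARY KEY          -- 'en' or 'en_US'
);

CREATE TABLE translation (
    id_i18n        INT NOT NULL,
    id_locale      VARCHAR(5) NOT NULL,
    tx_translation TEXT NOT NULL,
    PRIMARY KEY (id_i18n, id_locale),
    FOREIGN KEY (id_i18n) REFERENCES i18n (id_i18n),
    FOREIGN KEY (id_locale) REFERENCES locale (id_locale)
);

CREATE TABLE newsletter (
    id_newsletter INT AUTO_INCREMENT PRIMARY KEY,
    i18n_subject  INT NOT NULL,
    i18n_content  INT NOT NULL,
    FOREIGN KEY (i18n_subject) REFERENCES i18n (id_i18n),
    FOREIGN KEY (i18n_content) REFERENCES i18n (id_i18n)
);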
Here is an example query:
select
    t_subject.tx_translation as subject,
    t_content.tx_translation as content
from newsletter n
-- join for subject
inner join translation t_subject
    on t_subject.id_i18n = n.i18n_subject
-- join for content
inner join translation t_content
    on t_content.id_i18n = n.i18n_content
inner join locale l
    -- condition for subject
    on l.id_locale = t_subject.id_locale
    -- condition for content
    and l.id_locale = t_content.id_locale
-- locale condition
where l.id_locale = 'en_GB'
-- other conditions
and n.id_newsletter = 1
Note that this is a normalized data model. If you have a huge dataset, maybe you could think about denormalizing it to optimize your queries. You can also play with indexes to improve query performance (in some DBs, foreign keys are automatically indexed, e.g. MySQL/InnoDB).
Some previous StackOverflow questions on this topic:
What are best practices for multi-language database design?
What's the best database structure to keep multilingual data?
Schema for a multilanguage database
How to use multilanguage database schema with ORM?
Some useful external resources:
Creating multilingual websites: Database Design
Multilanguage database design approach
Propel Gets I18n Behavior, And Why It Matters
The best approach is often, for every existing table, to create a new table into which the text items are moved; the PK of the new table is the PK of the old table together with the language.
In your case:
The table for language levels, which administrators can edit from the backend, can have multiple items like basic, advance, fluent, mattern... and in the near future there will probably be one more type. The admin goes to the backend and adds a new level, and it gets sorted into the right position... but how do I handle all the translations for the final users?
Your existing table probably looks something like this:
+----+-------+---------+
| id | price | type |
+----+-------+---------+
| 1 | 299 | basic |
| 2 | 299 | advance |
| 3 | 399 | fluent |
| 4 | 0 | mattern |
+----+-------+---------+
It then becomes two tables:
+----+-------+   +----+------+-------------+
| id | price |   | id | lang | type        |
+----+-------+   +----+------+-------------+
| 1  | 299   |   | 1  | en   | basic       |
| 2  | 299   |   | 2  | en   | advance     |
| 3  | 399   |   | 3  | en   | fluent      |
| 4  | 0     |   | 4  | en   | mattern     |
+----+-------+   | 1  | fr   | élémentaire |
                 | 2  | fr   | avance      |
                 | 3  | fr   | couramment  |
                 :    :      :             :
                 +----+------+-------------+
Another problem with internationalization of a database is that user studies can differ from the USA to the UK to Germany... every country will have its own levels (each probably equivalent to another country's, but in the end different). And what about billing?
All localisation can occur through a similar approach. Instead of just moving text fields to the new table, you could move any localisable fields - only those which are common to all locales will remain in the original table.
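A sketch of this split for the language-levels example (table names assumed); the composite primary key is the (old PK, language) pair described above:

CREATE TABLE level (
    id    INT PRIMARY KEY,
    price INT NOT NULL                    -- common to all locales, stays here
);

CREATE TABLE level_translation (
    id   INT NOT NULL,
    lang VARCHAR(5) NOT NULL,
    type VARCHAR(50) NOT NULL,            -- the localized text
    PRIMARY KEY (id, lang),
    FOREIGN KEY (id) REFERENCES level (id)
);

-- French labels for every level
SELECT l.id, l.price, t.type
FROM level l
JOIN level_translation t ON t.id = l.id AND t.lang = 'fr';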