Best practice to store array-like data in MySQL or similar database? - mysql

I have two tables that I want to relate to each other. The issue is that any product can have n POs, so individual columns wouldn't work in a traditional DB.
I was thinking of using a JSON field to store an array, or using XML. I would need to insert additional POs later, so I'm concerned about the lack of editing support for XML.
What is the standard way of handling n-number of attributes in a single field?
|id | Product | Work POs|
| - | ------- | ------- |
| 1 | bicycle | 002,003 |
| 2 | unicycle| 001,003 |
|PO | Job |
|-- | ---------------- |
|001|Install 1 wheel |
|002|Install 2 wheels |
|003|Install 2 seats |

The standard way to store multi-valued attributes in a relational database is to create another table, so you can store one value per row. This makes it easy to add or remove a value, search for a specific value, count POs per product, and run many other types of queries.
| id | Product  |
| -- | -------- |
| 1  | bicycle  |
| 2  | unicycle |

| product_id | PO  |
| ---------- | --- |
| 1          | 002 |
| 1          | 003 |
| 2          | 001 |
| 2          | 003 |

| PO  | Job              |
| --- | ---------------- |
| 001 | Install 1 wheel  |
| 002 | Install 2 wheels |
| 003 | Install seat     |
I also recommend reading my answer to "Is storing a delimited list in a database column really that bad?"
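As a sketch, the three tables above can be created and queried like this. SQLite is used here for brevity, and the table names (products, pos, product_pos) are illustrative, not from the question:

```python
import sqlite3

# Sketch of the three-table design, using SQLite for brevity.
# Table and column names are illustrative, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE pos (po TEXT PRIMARY KEY, job TEXT);
CREATE TABLE product_pos (
    product_id INTEGER REFERENCES products(id),
    po TEXT REFERENCES pos(po),
    PRIMARY KEY (product_id, po)       -- one PO per product per row
);
INSERT INTO products VALUES (1, 'bicycle'), (2, 'unicycle');
INSERT INTO pos VALUES ('001', 'Install 1 wheel'),
                       ('002', 'Install 2 wheels'),
                       ('003', 'Install seat');
INSERT INTO product_pos VALUES (1, '002'), (1, '003'), (2, '001'), (2, '003');
""")

# With one value per row, "POs per product" is a plain GROUP BY.
rows = conn.execute("""
    SELECT p.product, COUNT(pp.po)
    FROM products p
    JOIN product_pos pp ON pp.product_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
print(rows)  # [('bicycle', 2), ('unicycle', 2)]
```

Adding or removing a single PO from a product is then a one-row INSERT or DELETE on product_pos, with no string manipulation.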

In some cases you really do need to store array-like data in one field.
In MySQL 5.7.8+ you can use the JSON data type:
ALTER TABLE `some_table` ADD `po` JSON NOT NULL;
UPDATE `some_table` SET `po` = '["002", "003"]' WHERE `some_table`.`id` = 1;
Note that the column must hold valid JSON; bare values with leading zeros such as 002 are not valid JSON numbers, so store them as strings.
See examples here: https://sebhastian.com/mysql-array/
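A sketch of searching inside such a JSON array. SQLite's json_each is used here to stand in for MySQL's JSON functions (in MySQL itself, JSON_CONTAINS(po, '"001"') expresses the same lookup), and the data mirrors the question's products:

```python
import sqlite3

# Sketch: searching a JSON array column. SQLite's json_each stands in for
# MySQL's JSON functions; in MySQL, JSON_CONTAINS(po, '"001"') would
# express the same lookup. Data mirrors the question's products.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, po TEXT)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)",
                 [(1, '["002", "003"]'), (2, '["001", "003"]')])

# Find every row whose JSON array contains the PO "001".
ids = [r[0] for r in conn.execute("""
    SELECT s.id FROM some_table s, json_each(s.po)
    WHERE json_each.value = '001'
""")]
print(ids)  # [2]
```

Be aware that such lookups generally cannot use an ordinary index, which is one reason the separate-table design is usually preferred.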


MySQL - Multiple Rows or JSON

I'm building an app in Laravel and have a design question regarding my MySQL DB.
Currently I have a table which defines the skills for all the default characters in my game. Because the traits are pulled from a pool of skills and vary in number, one of my tables looks something like this:
+----+--------+---------+------------+
| ID | CharID | SkillID | SkillScore |
+----+--------+---------+------------+
| 1  | 1      | 15      | 200        |
| 2  | 1      | 16      | 205        |
| 3  | 1      | 12      | 193        |
| 4  | 2      | 15      | 180        |
+----+--------+---------+------------+
Note the variable number of rows for any given CharID. With my Base Characters entered, I'm at just over 300 rows.
My issue is storing users' copies of their (customized) characters. I don't think storing 300+ rows per user makes sense. Should I store this data as a JSON blob in another table? Should I be looking at a NoSQL solution like Mongo? I'd appreciate the guidance.
NB: The entire app centers around using the character's different skills. Mostly reporting from them, but users will also be able to update their SkillScore (likely a few times a week).
P.S. Should I consider breaking each character out into its own table and tracking users' characters that way? Users won't be able to add/remove the skills from characters, only update them.
TIA.
Your pivot table looks good to me.
I'd consider dropping the ID column (unless you need it) and using a composite primary key:
PRIMARY KEY (CharID, SkillID)
Primary keys are indexed, so you will get efficient lookups.
As for your other suggestions, if you store this in a JSON column, you'll lose the ability to perform joins, and will therefore end up executing more queries.
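A minimal sketch of that composite-key pivot table, using SQLite for brevity (the table name char_skills is illustrative; the column names are from the question):

```python
import sqlite3

# Sketch of the pivot table with a composite primary key (SQLite syntax;
# column names from the question, table name illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE char_skills (
        CharID INTEGER,
        SkillID INTEGER,
        SkillScore INTEGER,
        PRIMARY KEY (CharID, SkillID)  -- no surrogate ID needed
    )
""")
conn.executemany("INSERT INTO char_skills VALUES (?, ?, ?)",
                 [(1, 15, 200), (1, 16, 205), (1, 12, 193), (2, 15, 180)])

# The primary-key index makes per-character lookups efficient.
score = conn.execute(
    "SELECT SkillScore FROM char_skills WHERE CharID = 1 AND SkillID = 16"
).fetchone()[0]
print(score)  # 205

# The key also rejects duplicate (CharID, SkillID) pairs outright.
try:
    conn.execute("INSERT INTO char_skills VALUES (1, 15, 999)")
except sqlite3.IntegrityError:
    pass  # duplicate rejected
```

As a bonus, the composite key enforces the rule that a character has each skill at most once, which a surrogate ID alone would not.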

Many to many relationship with different data types

I am trying to create a database for different types of events. Each event has arbitrary, user-created properties of different types - for example "number of guests", "special song to play", "time the clown arrives". Not every event has a clown, but one user could still have multiple events with a clown. My basic concept is:
propID | name   | type
------ | ------ | ------
1      | #guest | number
2      | clown  | time
and another table with every event with a unique eventID. The Problem is that a simple approach like
eventID | propID | value
------- | ------ | -----
1       | 1      | 20
1       | 2      | 10:00
does not really work because of the different data types.
Now I thought about some possible solutions but I don't really know which one is best, or if there is an even better solution?
1. I store all values as strings and use the datatype in the property table. I think this is called EAV and is not considered good practice.
2. There are only a limited amount of meaningful datatypes, which could lead to a table like this:
eventID | propID | stringVal | timeVal | numberVal
------- | ------ | --------- | ------- | ---------
1       | 1      | null      | null    | 20
1       | 2      | null      | 10:00   | null
3. Use the possible datatypes for multiple tables like:
propDateEvent               propNumberEvent
-------------------------   -------------------------
eventID | propId | value    eventID | propId | value
--------|--------|-------   --------|--------|-------
1       | 2      | 10:00    1       | 1      | 20
Somehow I think every solution has its ups and downs. #1 feels like the simplest but least robust. #3 seems like the cleanest solution, but pretty complicated if I wanted to add e.g. a priority for the properties per event.
All the options you propose are variations on entity/attribute/value or EAV. The basic concept is that you store entities (in your case events), their attributes (#guest, clown), and the values of those attributes as rows, not columns.
There are lots of EAV questions on Stack Overflow, discussing the benefits and drawbacks.
Your 3 options provide different ways of storing the data - but you don't address the ways in which you want to retrieve that data, or verify the data you're about to store. This is the biggest problem with EAV.
How will you enforce the rule that all events must have "#guests" as a mandatory field (for instance)? How will you find all events that have at least 20 guests and no clown booked? How will you show a list of events between two dates, ordered by date and number of guests?
If those requirements don't matter to you, EAV is fine. If they do, consider using a document to store this user-defined data (JSON or XML). MySQL can query those documents natively, you can enforce business logic much more easily, and you won't have to write horribly convoluted queries for even the simplest business cases.
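A sketch of that document approach, using SQLite's json_extract to stand in for MySQL's JSON functions; the schema and property names are illustrative:

```python
import sqlite3
import json

# Sketch of the document approach: user-defined properties live in one JSON
# column and are queried natively. SQLite's json_extract stands in for
# MySQL's JSON_EXTRACT; schema and property names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, props TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, json.dumps({"guests": 20, "clown": "10:00"})),
    (2, json.dumps({"guests": 35})),
])

# "At least 20 guests and no clown booked" becomes one readable predicate
# (a missing JSON path extracts as NULL):
ids = [r[0] for r in conn.execute("""
    SELECT id FROM events
    WHERE json_extract(props, '$.guests') >= 20
      AND json_extract(props, '$.clown') IS NULL
""")]
print(ids)  # [2]
```

Compare this single predicate with the multi-join or multi-column queries the three EAV options would require for the same question.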

MySQL Table structure: Multiple attributes for each item

I wanted to ask which would be the best approach for creating my MySQL database structure in the following case.
I've got a table with items; there's no need to describe it, as the only important field here is the ID.
Now, I'd like to be able to assign some attributes to each item - by its ID, of course. But I don't know exactly how to do it, as I'd like to keep it dynamic (so I don't have to modify the table structure when I want to add a new attribute type).
What I think
I think - and, in fact, this is the structure I have right now - that I can make a table items_attributes with the following structure:
+----+---------+----------------+-----------------+
| id | item_id | attribute_name | attribute_value |
+----+---------+----------------+-----------------+
| 1  | 1       | place          | Barcelona       |
| 2  | 2       | author_name    | Matt            |
| 3  | 1       | author_name    | Kate            |
| 4  | 1       | pages          | 200             |
| 5  | 1       | author_name    | John            |
+----+---------+----------------+-----------------+
I put data in as an example so you can see that those attributes can be repeated (it's not a 1-to-1 relation).
The problem with this approach
I need to run some queries, some of them for statistical purposes, and if I have a lot of attributes for a lot of items, this can be a bit slow.
Furthermore - maybe because I'm not an expert in MySQL - every time I want to search for "those items that have 'place' = 'Barcelona' AND 'author_name' = 'John'", I end up having to make multiple JOINs, one for every condition.
Repeating the example before, my query would end up like:
SELECT *
FROM items its
JOIN items_attributes attr
ON its.id = attr.item_id
AND attr.attribute_name = 'place'
AND attr.attribute_value = 'Barcelona'
AND attr.attribute_name = 'author_name'
AND attr.attribute_value = 'John';
As you can see, this will return nothing, as an attribute_name cannot have two values at once in the same row, and an OR condition would not be what I'm searching for, as the items MUST have both attribute values as stated.
So the only possibility is to make a JOIN on the same table, repeated for every condition to search, which I think is very slow when there are a lot of terms to search for.
What I'd like
As I said, I'd like to keep the attribute types dynamic, so adding a new entry in 'attribute_name' would be enough, without having to add a new column to a table. Also, as it is a 1-N relationship, they cannot be put in the 'items' table as new columns.
If this structure is, in your opinion, the only one that can achieve what I want, it would also be great if you could offer some ideas so the search queries don't require a ton of JOINs.
I don't know if this is hard to follow - I've been struggling with it until now and haven't come up with a solution. Hope you guys can help me with that!
In any case, thank you for your time and attention!
Kind regards.
You're thinking in the right direction: the direction of normalization. The normal form you would like to have in your database is the fifth normal form (or even the sixth). See Stack Overflow on this matter.
Table Attribute:
+----+----------------+
| id | attribute_name |
+----+----------------+
| 1 | place |
| 2 | author name |
| 3 | pages |
+----+----------------+
Table ItemAttribute
+--------+----------------+
| item_id| attribute_id |
+--------+----------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
+--------+----------------+
So for each property of an object (item in this case) you create a new table and name it accordingly. It requires lots of joins, but your database will be highly flexible and organized. Good luck!
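For the multi-condition search in the question, the usual pattern on the original items_attributes layout is to join the attribute table once per condition, rather than ANDing contradictory conditions on a single alias. A sketch, using SQLite for brevity:

```python
import sqlite3

# Sketch of the join-per-condition pattern on the question's original
# items_attributes layout: the attribute table is joined once per condition
# instead of ANDing contradictory conditions on one alias.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY);
CREATE TABLE items_attributes (
    id INTEGER PRIMARY KEY, item_id INTEGER,
    attribute_name TEXT, attribute_value TEXT);
INSERT INTO items VALUES (1), (2);
INSERT INTO items_attributes VALUES
    (1, 1, 'place', 'Barcelona'), (2, 2, 'author_name', 'Matt'),
    (3, 1, 'author_name', 'Kate'), (4, 1, 'pages', '200'),
    (5, 1, 'author_name', 'John');
""")

ids = [r[0] for r in conn.execute("""
    SELECT its.id
    FROM items its
    JOIN items_attributes a1 ON a1.item_id = its.id
        AND a1.attribute_name = 'place' AND a1.attribute_value = 'Barcelona'
    JOIN items_attributes a2 ON a2.item_id = its.id
        AND a2.attribute_name = 'author_name' AND a2.attribute_value = 'John'
""")]
print(ids)  # [1]
```

With a composite index on (attribute_name, attribute_value, item_id) each join becomes an index lookup, which keeps this workable for a modest number of conditions.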
In my opinion it should be something like this. I know there are a lot of tables, but it actually normalizes your DB.
Maybe that is why I can't understand where you get your att_value column from, or what those columns should contain.

Database modeling for international and multilingual purposes

I need to create a large-scale DB model for a web application that will be multilingual.
One doubt I have every time I think about how to do it is how to handle multiple translations for a field. An example case:
The table for language levels, which administrators can edit from the backend, can have multiple items like: basic, advance, fluent, mattern... In the near future there will probably be one more type. The admin goes to the backend, adds a new level, and it gets sorted into the right position... but how do I handle all the translations for the final users?
Another problem with internationalization of a database is that user studies can probably differ from the USA to the UK to DE... every country will have its own levels (each probably equivalent to another, but ultimately different). And what about billing?
How do you model this at a large scale?
Here is the way I would design the database:
Visualization by DB Designer Fork
The i18n table only contains a PK, so that any table just has to reference this PK to internationalize a field. The table translation is then in charge of linking this generic ID with the correct list of translations.
locale.id_locale is a VARCHAR(5) to manage both of en and en_US ISO syntaxes.
currency.id_currency is a CHAR(3) to manage the ISO 4217 syntax.
You can find two examples: page and newsletter. Both of these admin-managed entities need to internationalize their fields - respectively, title/description and subject/content.
Here is an example query:
select
    t_subject.tx_translation as subject,
    t_content.tx_translation as content
from newsletter n
-- join for subject
inner join translation t_subject
    on t_subject.id_i18n = n.i18n_subject
-- join for content
inner join translation t_content
    on t_content.id_i18n = n.i18n_content
inner join locale l
    -- condition for subject
    on l.id_locale = t_subject.id_locale
    -- condition for content
    and l.id_locale = t_content.id_locale
-- locale condition
where l.id_locale = 'en_GB'
-- other conditions
and n.id_newsletter = 1
Note that this is a normalized data model. If you have a huge dataset, maybe you could think about denormalizing it to optimize your queries. You can also play with indexes to improve the queries performance (in some DB, foreign keys are automatically indexed, e.g. MySQL/InnoDB).
Some previous Stack Overflow questions on this topic:
What are best practices for multi-language database design?
What's the best database structure to keep multilingual data?
Schema for a multilanguage database
How to use multilanguage database schema with ORM?
Some useful external resources:
Creating multilingual websites: Database Design
Multilanguage database design approach
Propel Gets I18n Behavior, And Why It Matters
The best approach is often this: for every existing table, create a new table into which the text items are moved; the PK of the new table is the PK of the old table together with the language.
In your case:
The table for language levels, which administrators can edit from the backend, can have multiple items like: basic, advance, fluent, mattern... In the near future there will probably be one more type. The admin goes to the backend, adds a new level, and it gets sorted into the right position... but how do I handle all the translations for the final users?
Your existing table probably looks something like this:
+----+-------+---------+
| id | price | type |
+----+-------+---------+
| 1 | 299 | basic |
| 2 | 299 | advance |
| 3 | 399 | fluent |
| 4 | 0 | mattern |
+----+-------+---------+
It then becomes two tables:
+----+-------+ +----+------+-------------+
| id | price | | id | lang | type |
+----+-------+ +----+------+-------------+
| 1 | 299 | | 1 | en | basic |
| 2 | 299 | | 2 | en | advance |
| 3 | 399 | | 3 | en | fluent |
| 4 | 0 | | 4 | en | mattern |
+----+-------+ | 1 | fr | élémentaire |
| 2 | fr | avance |
| 3 | fr | couramment |
: : : :
+----+------+-------------+
Another problem with internationalization of a database is that user studies can probably differ from the USA to the UK to DE... every country will have its own levels (each probably equivalent to another, but ultimately different). And what about billing?
All localisation can occur through a similar approach. Instead of just moving text fields to the new table, you could move any localisable fields - only those which are common to all locales will remain in the original table.
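A minimal sketch of the two-table split above, using SQLite for brevity (the table names levels and level_translations are illustrative):

```python
import sqlite3

# Sketch of the two-table split: localisable text moves to a table keyed
# by (id, lang), and a locale-filtered join brings it back.
# Table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE levels (id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE level_translations (
    id INTEGER REFERENCES levels(id),
    lang TEXT,
    type TEXT,
    PRIMARY KEY (id, lang));
INSERT INTO levels VALUES (1, 299), (2, 299), (3, 399);
INSERT INTO level_translations VALUES
    (1, 'en', 'basic'), (2, 'en', 'advance'), (3, 'en', 'fluent'),
    (1, 'fr', 'élémentaire'), (2, 'fr', 'avance'), (3, 'fr', 'couramment');
""")

# Fetch every level in the requested locale.
rows = conn.execute("""
    SELECT l.id, l.price, t.type
    FROM levels l
    JOIN level_translations t ON t.id = l.id
    WHERE t.lang = 'fr'
    ORDER BY l.id
""").fetchall()
print(rows)
```

Adding a new language is then purely a data change (new rows in level_translations), with no schema change at all.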

Reformatting MySQL table as grid

I have a table for holding translations. It is laid out as follows:
id | iso | token | content
-----------------------------------------------
1 | GB | test1 | Test translation 1 (English)
2 | GB | test2 | Test translation 2 (English)
3 | FR | test1 | Test translation 1 (French)
4 | FR | test2 | Test translation 2 (French)
// etc
For the translation management tool to go along with the table I need to output it in something more like a spreadsheet grid:
token | GB | FR | (other languages) -->
-------------------------------------------------------------------------------------------
test1 | Test translation 1 (English) | Test translation 1 (French) |
test2          | Test translation 2 (English) | Test translation 2 (French) |
(other tokens) | | |
| | | |
| | | |
V | | |
I thought this would be easy, but it turned out to be far more difficult than I expected!
After a lot of searching and digging around I did find group_concat, which for the specific case above I can get to work and generate the output I'm looking for:
select
token,
group_concat(if (iso = 'FR', content, NULL)) as 'FR',
group_concat(if (iso = 'GB', content, NULL)) as 'GB'
from
translations
group by token;
However, this is, of course, totally inflexible. It only works for the two languages I have specified so far. The instant I add a new language I have to manually update the query to take it into account.
I need a generalized version of the query above, that will be able to generate the correct table output without having to know anything about the data stored in the source table.
Some sources claim you can't easily do this in MySQL, but I'm sure it must be possible. After all, this is the sort of thing databases exist for in the first place.
Is there a way of doing this? If so, how?
Because of MySQL's limitations, if I had to do something like this on the query side, in one query, I would do it like this:
query:
select token, group_concat(concat(iso,'|',content)) as contents
from translations
group by token
"token";"contents"
"test1";"GB|Test translation 1 (English),FR|Test translation 1 (French),IT|Test translation 1 (Italian)"
"test2";"GB|Test translation 2 (English),FR|Test translation 2 (French),IT|Test translation 2 (Italian)"
Then, while binding the rows, I would split the contents on commas to get the cells, and on the pipe to get the header for each.
What you seek is often called a dynamic crosstab wherein you dynamically determine the columns in the output. Fundamentally, relational databases are not designed to dynamically determine the schema. The best way to achieve what you want is to use a middle-tier component to build the crosstab SQL statement similar to what you have shown and then execute that.
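A sketch of that middle-tier approach, using Python and SQLite for brevity: the distinct iso codes are read first, then the pivot query is generated from them (SQLite's CASE stands in for MySQL's IF):

```python
import sqlite3

# Sketch of the middle-tier crosstab: discover the iso codes first, then
# build the pivot query from them. SQLite's CASE stands in for MySQL's IF;
# the schema is the one from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE translations (
    id INTEGER PRIMARY KEY, iso TEXT, token TEXT, content TEXT)""")
conn.executemany("INSERT INTO translations VALUES (?, ?, ?, ?)", [
    (1, 'GB', 'test1', 'Test translation 1 (English)'),
    (2, 'GB', 'test2', 'Test translation 2 (English)'),
    (3, 'FR', 'test1', 'Test translation 1 (French)'),
    (4, 'FR', 'test2', 'Test translation 2 (French)'),
])

# Step 1: the column list comes from the data, not from the query text.
isos = [r[0] for r in conn.execute(
    "SELECT DISTINCT iso FROM translations ORDER BY iso")]

# Step 2: one conditional aggregate per language. The iso values come from
# our own query above, so interpolating them here is safe.
cols = ",\n       ".join(
    f"group_concat(CASE WHEN iso = '{iso}' THEN content END) AS '{iso}'"
    for iso in isos)
sql = f"SELECT token,\n       {cols}\nFROM translations GROUP BY token ORDER BY token"

grid = conn.execute(sql).fetchall()
print(grid)
# [('test1', 'Test translation 1 (French)', 'Test translation 1 (English)'),
#  ('test2', 'Test translation 2 (French)', 'Test translation 2 (English)')]
```

Adding a new language then requires no change to the code at all: the next run of step 1 picks up the new iso code and step 2 emits the extra column.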