Many to many relationship with different data types - mysql

I am trying to create a database for different types of events. Each event has arbitrary, user-created properties of different types, for example "number of guests", "special song to play", "time the clown arrives". Not every event has a clown, but one user could still have several events with a clown. My basic concept is:
propID | name   | type
------ | ------ | ------
1      | #guest | number
2      | clown  | time
and another table listing every event with a unique eventID. The problem is that a simple approach like
eventID | propID | value
------- | ------ | -----
1       | 1      | 20
1       | 2      | 10:00
does not really work because of the different data types.
Now I have thought about some possible solutions, but I don't really know which one is best, or whether there is an even better solution:
1. I store all values as strings and use the datatype in the property table. I think this is called EAV and is not considered good practice.
2. There are only a limited number of meaningful data types, which could lead to a table like this:
eventID | propID | stringVal | timeVal | numberVal
------- | ------ | --------- | ------- | ---------
1       | 1      | null      | null    | 20
1       | 2      | null      | 10:00   | null
3. Use a separate table per data type, like:

propDateEvent

eventID | propId | value
------- | ------ | -----
1       | 2      | 10:00

propNumberEvent

eventID | propId | value
------- | ------ | -----
1       | 1      | 20
I think every solution has its ups and downs. #1 feels like the simplest but least robust. #3 seems like the cleanest solution, but gets pretty complicated if I want to add, e.g., a priority for the properties per event.

All the options you propose are variations on entity/attribute/value or EAV. The basic concept is that you store entities (in your case events), their attributes (#guest, clown), and the values of those attributes as rows, not columns.
There are lots of EAV questions on Stack Overflow, discussing the benefits and drawbacks.
Your 3 options provide different ways of storing the data - but you don't address the ways in which you want to retrieve that data, or verify the data you're about to store. This is the biggest problem with EAV.
How will you enforce the rule that all events must have "#guests" as a mandatory field (for instance)? How will you find all events that have at least 20 guests and no clown booked? How will you show a list of events between two dates, ordered by date and number of guests?
If those requirements don't matter to you, EAV is fine. If they do, consider using a document to store this user-defined data (JSON or XML). MySQL can query those documents natively, you can enforce business logic much more easily, and you won't have to write horribly convoluted queries for even the simplest business cases.
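To make that concrete, here is a minimal sketch of the JSON approach (requires MySQL 5.7+; the table and column names are assumptions for illustration, not part of the question):

-- User-defined properties in a JSON column (names assumed)
CREATE TABLE events (
    eventID INT AUTO_INCREMENT PRIMARY KEY,
    props   JSON
);

INSERT INTO events (props)
VALUES ('{"guests": 20, "clown": "10:00"}'),
       ('{"guests": 35}');

-- Events with at least 20 guests and no clown booked
SELECT eventID
FROM events
WHERE props->>'$.guests' >= 20
  AND props->>'$.clown' IS NULL;

If a property is filtered on frequently, MySQL also lets you index it by extracting it into a generated column and indexing that.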

Related

MySQL - Multiple Rows or JSON

I'm building an app in Laravel and have a design question regarding my MySQL db.
Currently I have a table which defines the skills for all the default characters in my game. Because the traits are pulled from a pool of skills, and have a variable number, one of my tables looks something like this:
+----+--------+---------+------------+
| ID | CharID | SkillID | SkillScore |
+----+--------+---------+------------+
|  1 |      1 |      15 |        200 |
|  2 |      1 |      16 |        205 |
|  3 |      1 |      12 |        193 |
|  4 |      2 |      15 |        180 |
+----+--------+---------+------------+
Note the variable number of rows for any given CharID. With my Base Characters entered, I'm at just over 300 rows.
My issue is storing users' copies of their (customized) characters. I don't think storing 300+ rows per user makes sense. Should I store this data in a JSON blob in another table? Should I be looking at a NoSQL solution like Mongo? Appreciate the guidance.
NB: The entire app centers around using the characters' different skills. Mostly reporting on them, but users will also be able to update their SkillScore (likely a few times a week).
P.S. Should I consider breaking each character out into its own table and tracking users' characters that way? Users won't be able to add or remove skills from characters, only update them.
TIA.
Your pivot table looks good to me.
I'd consider dropping the ID column (unless you need it) and using a composite primary key:
PRIMARY KEY (CharID, SkillID)
Primary keys are indexed so you will get efficient lookups.
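As a minimal sketch of that suggestion (the table name character_skills is an assumption):

CREATE TABLE character_skills (
    CharID     INT NOT NULL,
    SkillID    INT NOT NULL,
    SkillScore INT NOT NULL,
    -- one row per character/skill pair; the composite key doubles as the index
    PRIMARY KEY (CharID, SkillID)
);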
As for your other suggestions, if you store this in a JSON column, you'll lose the ability to perform joins, and will therefore end up executing more queries.

MySQL Table structure: Multiple attributes for each item

I'd like to ask which approach would be best for my MySQL database structure in the following case.
I've got a table with items; it doesn't need describing, as the only important field here is the ID.
Now, I'd like to be able to assign some attributes to each item - by its ID, of course. But I don't know exactly how to do it, as I'd like to keep it dynamic (so that I don't have to modify the table structure to add a new attribute type).
What I think
I think - and, in fact, this is the structure I have right now - that I can make a table items_attributes with the following structure:
+----+---------+----------------+-----------------+
| id | item_id | attribute_name | attribute_value |
+----+---------+----------------+-----------------+
|  1 |       1 | place          | Barcelona       |
|  2 |       2 | author_name    | Matt            |
|  3 |       1 | author_name    | Kate            |
|  4 |       1 | pages          | 200             |
|  5 |       1 | author_name    | John            |
+----+---------+----------------+-----------------+
I put in data as an example so you can see that those attributes can be repeated (it's not a 1-to-1 relationship).
The problem with this approach
I need to run some queries, some of them for statistical purposes, and if I have a lot of attributes for a lot of items, this can be a bit slow.
Furthermore - maybe because I'm not an expert in MySQL - every time I want to find "those items that have 'place' = 'Barcelona' AND 'author_name' = 'John'", I end up having to make multiple JOINs, one for every condition.
Repeating the earlier example, my query would end up like:
SELECT *
FROM items its
JOIN items_attributes attr
ON its.id = attr.item_id
AND attr.attribute_name = 'place'
AND attr.attribute_value = 'Barcelona'
AND attr.attribute_name = 'author_name'
AND attr.attribute_value = 'John';
As you can see, this will return nothing, as attribute_name cannot have two values at once in the same row, and an OR condition is not what I'm looking for, since the items MUST have both attribute values as stated.
So the only possibility is to join the same table again for every condition in the search, which I think will be very slow to perform when there are a lot of terms to search for.
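For concreteness, that repeated-join pattern looks something like this (one join on items_attributes per condition):

SELECT its.*
FROM items its
JOIN items_attributes place
  ON place.item_id = its.id
 AND place.attribute_name = 'place'
 AND place.attribute_value = 'Barcelona'
JOIN items_attributes author
  ON author.item_id = its.id
 AND author.attribute_name = 'author_name'
 AND author.attribute_value = 'John';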
What I'd like
As I said, I'd like to keep the attribute types dynamic, so that adding a new value to 'attribute_name' would be enough, without having to add a new column to a table. Also, as it's a 1-N relationship, they cannot be put in the 'items' table as new columns.
If, in your opinion, this structure is the only one that fits my needs, it would also be great if you could suggest some ideas so the search queries don't turn into a ton of JOINs.
I don't know if it's hard to get right; I've been racking my brain over this and haven't come up with a solution. I hope you can help me with that!
In any case, thank you for your time and attention!
Kind regards.
You're thinking in the right direction: the direction of normalization. The normal form you would like to have in your database is the fifth normal form (or even the sixth). There are Stack Overflow discussions on this matter.
Table Attribute:
+----+----------------+
| id | attribute_name |
+----+----------------+
|  1 | place          |
|  2 | author name    |
|  3 | pages          |
+----+----------------+
Table ItemAttribute:
+---------+--------------+
| item_id | attribute_id |
+---------+--------------+
|       1 |            1 |
|       2 |            1 |
|       3 |            2 |
+---------+--------------+
So for each property of an object (item in this case) you create a new table and name it accordingly. It requires lots of joins, but your database will be highly flexible and organized. Good luck!
In my opinion it should be something like this. I know there are a lot of tables, but it actually normalizes your DB.
Maybe that's because I can't understand where your att_value column comes from, and what that column should contain.

More efficient to have more columns or more rows?

I'm currently redesigning a database which could contain a lot of data. I have the option either to include a number of different columns or to use a lot of rows instead. It's probably easier if I sketch an outline below:
item_id | user_id | title | description | content | category | template | comments | status
--------|---------|-------|-------------|---------|----------|----------|----------|-------
1       | 1       | ABC   | DEF         | GHI     | 1        | default  | 1        | 1
2       | 1       | ZYX   |             | QWE     | 2        | default  | 0        | 1
3       | 1       | A     |             | RTY     | 2        | default  | 0        | 0
4       | 2       | ABC   | DEF         | GHI     | 3        | custom   | 1        | 1
5       | 2       | CBA   |             | GHI     | 3        | custom   | 1        | 1
Versus something in the following structure:
item_id | user_id | attribute   | value
--------|---------|-------------|------
1       | 1       | title       | ABC
1       | 1       | description | DEF
1       | 1       | content     | GHI
...     | ...     | ...         | ...
I may want to create additional attributes in the future (50, for argument's sake), so there could be a lot of empty cells if using multiple columns. The attribute names would be reused, where possible, across different types of content - say a blog entry, an event, and a gallery - 'title' would easily be reused.
So my question is: is it more efficient to use multiple columns or multiple rows, in terms of query speed and disk space? Or would you instead recommend relationship tables, so there's a table for blogs, a table for events, etc.? I'm just trying to come up with an easily expandable solution, where I ideally do not want to create a table for every kind of content, as I'm thinking of developers creating new kinds of content via an app/API system (with attributes being tightly controlled).
Supplementary Question if Multiple Rows
How could I, in MySQL, convert multiple rows into a usable column format (I guess with temporary tables), so that I could do some filtering by content type, for example?
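(One common way to do this without temporary tables is conditional aggregation; a sketch, assuming the attribute table from the second layout above is named item_attributes:)

-- Pivot attribute rows into columns, one output row per item
SELECT item_id,
       MAX(CASE WHEN attribute = 'title'       THEN value END) AS title,
       MAX(CASE WHEN attribute = 'description' THEN value END) AS description,
       MAX(CASE WHEN attribute = 'content'     THEN value END) AS content
FROM item_attributes
GROUP BY item_id;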
Basically, MySQL has a variable row length as long as one does not change the row format at the table level. Thus, empty columns will use almost no space.
But with BLOB or TEXT columns, it might be better to normalize those out, as they may hold large data that needs to be read or skipped every time the table is scanned. Even if the column is not in the result set, queries that can't use an index will take their time on a large number of rows.
As a good practice, I think it is fastest to put all administrative and often-used columns in one table and normalize all the rest. A kind of "vertical" design as in your second example will be complex to read, and as soon as you work with temporary tables you will run into performance issues sooner or later.
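A minimal sketch of that hybrid layout (all names assumed): small, often-used columns stay in the main table, while large TEXT data moves to a 1:1 side table:

CREATE TABLE items (
    item_id  INT AUTO_INCREMENT PRIMARY KEY,
    user_id  INT NOT NULL,
    title    VARCHAR(255) NOT NULL,
    category INT NOT NULL,
    status   TINYINT NOT NULL
    -- other small, administrative columns
);

CREATE TABLE item_content (
    item_id INT PRIMARY KEY,          -- 1:1 with items
    content TEXT,
    FOREIGN KEY (item_id) REFERENCES items (item_id)
);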
For a traditional row-based store, the cost of spooling through rows will depend on their width, so scanning a table with wide rows will take longer than one with narrow rows.
That said, if you're using an index to locate the rows that are of interest, this won't be so much of an issue.
If you normalise your data by replacing columns with keys to rows in other tables, you can reduce the amount of storage if the linked tables end up being significantly smaller than the original table, however any query will need to include the cost of required joins into the related table.
As with all these things, it's a balancing act that depends on your requirements, but understanding what's going on under the hood can certainly help you to make more informed decisions.
This question is very difficult to answer, as it all comes down to what you are looking for and how your database will grow in size and complexity over time. I find the best way to answer these types of questions is to read case studies from other successful sites. For example, Reddit would be a case study where they use a lot of rows but very few tables and/or columns. The article is here and a question on it is here.
There is also the option of exploring a NoSQL solution which may be more applicable to what you are trying to achieve.
Google case studies of sites that would have a similar structure to your own and see how they accomplished it as they have most likely encountered all the issues you will and already overcome them.

MySql Prevent/track duplicate field across multiple fields

I'm looking for an easy way to check across multiple part tables to determine if a given part number is already present before adding it to a given table.
The current best idea I have come up with is a secondary table that simply lists every PN from all tables in a single column with a unique key; however, I was wondering whether there is a way to do it without creating a new table and index.
For the visual learner types, I have forty-some tables that more or less follow this pattern:
Table 1
| id | PN     | Other Columns   |
---------------------------------
| 1  | SomePn | ... ... ... ... |
...
Table 2
| id | PN      | Still Other Columns |
--------------------------------------
| 1  | OtherPn | ... ... ... ... ... |
...
and about forty more as above, with up to 50 columns and up to 8 million records per table.
The goal is, whether through software (Java) or MySql rejecting the records, to prevent duplicate part numbers from creeping in across multiple tables. Is a master PN table the only possible or reasonable solution?
I know that the data structure is not the best design, and a rework is in progress, but I would like to know some suggested best practices as well as suggested solutions for this problem.
Adding a table is your best option.
Table 1:
| t1_id | PN    | Other Columns   |
-----------------------------------
| 1     | pn_id | ... ... ... ... |
...
Table 2:
| t2_id | PN    | Other Columns   |
-----------------------------------
| 1     | pn_id | ... ... ... ... |
...
Table 3:
| pn_id | PN |
--------------
| 1     | ## |
...
Although this may not be the easiest solution to implement in your case, it is the best overall solution, as you will have no scaling problems now or in the future. If you instead opted for a solution that checked all the tables for duplicate part numbers on update/creation, this would take longer and longer as your tables got bigger.
If you could guarantee that they would never get bigger or you would never add part numbers, you could probably get away with just writing a script to check for duplicates once and not worry about another table at all. But, in the long run, you'll want to add another table just to keep track of the part numbers.
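A sketch of that design, with uniqueness enforced in exactly one place (all names assumed):

CREATE TABLE part_numbers (
    pn_id INT AUTO_INCREMENT PRIMARY KEY,
    pn    VARCHAR(50) NOT NULL UNIQUE   -- duplicate part numbers rejected here, globally
);

CREATE TABLE table1 (
    t1_id INT AUTO_INCREMENT PRIMARY KEY,
    pn_id INT NOT NULL,
    -- other columns ...
    FOREIGN KEY (pn_id) REFERENCES part_numbers (pn_id)
);
-- ... and likewise for the remaining part tables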

Find records where CSV column values match

I am making a website. In the database I have a table of articles that kind of looks like this:
id | name | cats | etc.
------------------------------------------------------
1 | "alice" | "this, that, those, them" |
2 | "bob" | "this, that, those" |
3 | "carol" | "this, banana, cupcake" |
4 | "dave" | "other, unrelated, words" |
5 | "errol" | "those, them, fishstick" |
When viewing an article I want to also show some of the most related articles, based on the amount of categories in common.
For example, if I was viewing the Alice article I would want to pick out (in order of preference) Bob (3 cats in common), Errol (2), Carol (1).
I am aware that this would be easier if the data were normalised (I could, for example, do this), but unfortunately that's not really an option.
I ended up creating a couple of extra tables and populating them with properly normalized data every time something was saved. These run alongside the existing tables, so it's not the cleanest of solutions, but it works and the query speeds are excellent.
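With a normalized side table along those lines (a hypothetical article_cats(article_id, cat) is assumed here), the "most related" lookup becomes a self-join plus a count:

-- Articles sharing the most categories with article 1 ("alice")
SELECT a.id, a.name, COUNT(*) AS shared_cats
FROM article_cats viewed
JOIN article_cats other
  ON other.cat = viewed.cat
 AND other.article_id <> viewed.article_id
JOIN articles a
  ON a.id = other.article_id
WHERE viewed.article_id = 1
GROUP BY a.id, a.name
ORDER BY shared_cats DESC;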