MySQL Table structure: Multiple attributes for each item

I'd like to ask what the best approach would be for structuring my MySQL database in the following case.
I have a table of items; there's no need to describe it, since the only important field here is the ID.
Now, I'd like to be able to assign some attributes to each item (by its ID, of course). But I don't know exactly how to do it, as I'd like to keep it dynamic (so that I don't have to modify the table structure if I want to add a new attribute type).
What I think
I think (and in fact this is the structure I have right now) that I can create a table items_attributes with the following structure:
+----+---------+----------------+-----------------+
| id | item_id | attribute_name | attribute_value |
+----+---------+----------------+-----------------+
| 1 | 1 | place | Barcelona |
| 2 | 2 | author_name | Matt |
| 3 | 1 | author_name | Kate |
| 4 | 1 | pages | 200 |
| 5 | 1 | author_name | John |
+----+---------+----------------+-----------------+
I put in example data so you can see that attributes can be repeated (it's not a one-to-one relationship).
The problem with this approach
I need to run some queries, some of them for statistical purposes, and with a lot of attributes for a lot of items this can get slow.
Furthermore (maybe because I'm not a MySQL expert), every time I want to search for "those items that have 'place' = 'Barcelona' AND 'author_name' = 'John'", I end up having to write one JOIN per condition.
Using the example above, my query would end up like this:
SELECT *
FROM items its
JOIN items_attributes attr
ON its.id = attr.item_id
AND attr.attribute_name = 'place'
AND attr.attribute_value = 'Barcelona'
AND attr.attribute_name = 'author_name'
AND attr.attribute_value = 'John';
As you can see, this returns nothing, because a single row cannot have two different attribute_name values at once, and an OR condition is not what I'm looking for, since the items MUST have both attribute values as stated.
So the only possibility is to JOIN the same table again for every search condition, which I think is very slow when there are a lot of terms to search for.
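The join-per-condition query described above can be sketched like this. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is self-contained; the SQL is close to what MySQL accepts). The composite index and the GROUP BY / HAVING variant at the end are not from the question: they are one common way to cut down on repeated JOINs, shown here as a hypothetical alternative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for a
# self-contained example; the SQL is close to what MySQL accepts).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE items_attributes (
        id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES items(id),
        attribute_name TEXT NOT NULL,
        attribute_value TEXT NOT NULL
    );
    -- Hypothetical index (not in the question) so each join is an index lookup.
    CREATE INDEX idx_name_value_item
        ON items_attributes (attribute_name, attribute_value, item_id);
    INSERT INTO items (id) VALUES (1), (2);
    INSERT INTO items_attributes VALUES
        (1, 1, 'place', 'Barcelona'),
        (2, 2, 'author_name', 'Matt'),
        (3, 1, 'author_name', 'Kate'),
        (4, 1, 'pages', '200'),
        (5, 1, 'author_name', 'John');
""")

# One alias of items_attributes per condition: each alias can match a
# different row, so both conditions can hold for the same item.
join_per_condition = conn.execute("""
    SELECT its.id
    FROM items its
    JOIN items_attributes a1
      ON a1.item_id = its.id
     AND a1.attribute_name = 'place' AND a1.attribute_value = 'Barcelona'
    JOIN items_attributes a2
      ON a2.item_id = its.id
     AND a2.attribute_name = 'author_name' AND a2.attribute_value = 'John'
""").fetchall()
print(join_per_condition)  # [(1,)]

# Alternative (not from the question): scan once and keep the items that
# matched all N conditions. Counting distinct attribute names assumes every
# condition uses a different attribute name.
group_by_having = conn.execute("""
    SELECT item_id
    FROM items_attributes
    WHERE (attribute_name = 'place' AND attribute_value = 'Barcelona')
       OR (attribute_name = 'author_name' AND attribute_value = 'John')
    GROUP BY item_id
    HAVING COUNT(DISTINCT attribute_name) = 2
""").fetchall()
print(group_by_having)  # [(1,)]
```

The second form stays a single query no matter how many conditions you add; you extend the OR list and change the number in HAVING.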
What I'd like
As I said, I'd like to keep the attribute types dynamic, so that adding a new value to 'attribute_name' is enough, without having to add a new column to a table. Also, since this is a one-to-many relationship, the attributes cannot be put in the 'items' table as new columns.
If, in your opinion, this structure is the only one that can achieve what I want, it would also be great if you could suggest some ideas so the search queries are not a ton of JOINs.
I don't know whether this is hard to solve, but I've been racking my brain over it and haven't come up with a solution. Hope you guys can help me with that!
In any case, thank you for your time and attention!
Kind regards.

You're thinking in the right direction, the direction of normalization. The normal form you would like to have in your database is fifth normal form (or even sixth). See Stack Overflow on this matter.
Table Attribute:
+----+----------------+
| id | attribute_name |
+----+----------------+
| 1 | place |
| 2 | author name |
| 3 | pages |
+----+----------------+
Table ItemAttribute:
+--------+----------------+
| item_id| attribute_id |
+--------+----------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
+--------+----------------+
So for each property of an object (item in this case) you create a new table and name it accordingly. It requires lots of joins, but your database will be highly flexible and organized. Good luck!
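A minimal sketch of the Attribute / ItemAttribute layout above, again using Python's sqlite3 as a stand-in for MySQL. Note that the tables as drawn have no place to store the value itself, so this sketch adds an attribute_value column to ItemAttribute as an assumption, and fills it with the question's example data rather than the rows shown in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attribute (
        id INTEGER PRIMARY KEY,
        attribute_name TEXT NOT NULL UNIQUE
    );
    -- attribute_value is an assumption: the tables above omit it, but the
    -- question's data needs somewhere to store the value.
    CREATE TABLE ItemAttribute (
        item_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES Attribute(id),
        attribute_value TEXT NOT NULL
    );
    INSERT INTO Attribute VALUES (1, 'place'), (2, 'author name'), (3, 'pages');
    INSERT INTO ItemAttribute VALUES
        (1, 1, 'Barcelona'),
        (2, 2, 'Matt'),
        (1, 2, 'Kate'),
        (1, 3, '200'),
        (1, 2, 'John');
""")

# Adding a new attribute type is an INSERT into Attribute, not an ALTER TABLE.
# 'publisher' is a hypothetical example attribute.
conn.execute("INSERT INTO Attribute (attribute_name) VALUES ('publisher')")

# Look up every author of item 1 by joining through the Attribute table.
authors = [value for (value,) in conn.execute("""
    SELECT ia.attribute_value
    FROM ItemAttribute ia
    JOIN Attribute a ON a.id = ia.attribute_id
    WHERE ia.item_id = 1 AND a.attribute_name = 'author name'
    ORDER BY ia.attribute_value
""")]
print(authors)  # ['John', 'Kate']
```

The UNIQUE constraint on attribute_name keeps each attribute type defined exactly once, which is the main gain over repeating the name as free text in every row.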

In my opinion it should be something like this. I know there are a lot of tables, but it actually normalizes your DB.
Maybe that's because I can't understand where your att_value column comes from and what these columns should contain.

Related

How do I handle linking a record to another table?

I'm very new to Access and my teacher is... hard to follow. So I feel like there's something pretty basic I'm probably missing here. I think the biggest problem I'm having with this question is that I'm struggling to find the words to communicate what I actually need to do, which is really putting a damper on my google-fu.
In terms of what I think I want to do, I want to make a record reference another table in its entirety.
Main
+----+-------+--------+-------+----------------------------+
| PK | Name | Phone# | [...] | Cards |
+----+-------+--------+-------+----------------------------+
| 1 | Bob | [...] | [...] | < Reference to 2nd table > |
| 2 | Harry | [...] | [...] | [...] |
| 3 | Ted | [...] | [...] | [...] |
+----+-------+--------+-------+----------------------------+
Bob's Cards
+----+-------------+-----------+-------+-------+-------+
| PK | Card Name | Condition | Year | Price | [...] |
+----+-------------+-----------+-------+-------+-------+
| 1 | Big Slugger | Mint | 1987 | .20 | [...] |
| 2 | Quick Pete | [...] | [...] | [...] | [...] |
| 3 | Mac Donald | [...] | [...] | [...] | [...] |
+----+-------------+-----------+-------+-------+-------+
This would necessitate an entire new table for each record in the main table though, if it's even possible.
But the only alternative solution I can think of is to add 'Card1, Condition1, [...], Card2, Condition2, [...], Card3, [...]' fields to the main table and having to add another set of fields any time someone increases the maximum number of cards stored.
So I'm sort of left believing there is some other approach I should be taking that our teacher has failed to properly explain. We haven't even touched on forms and reports yet so I don't need to worry about working them in.
Any pointers?
(Also, the entirety of this data and structure is only a rough facsimile of my own, as I'd rather learn how to do it and apply it myself than be like 'here's my data, pls fix.')
Third option successfully found in comments by the helpful Minty.
This depends on a number of things; however, to keep it simple you would normally add one field to the cards table, with a number data type, called CardOwnerID. In your example it would be 1, indicating Bob. This is known as a foreign key (FK). However, if you have a table of cards and multiple possible owners, then you need a third table: a junction table. This would consist of the Main Person ID and the Card ID. – Minty
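Both options from the comment can be sketched with Python's sqlite3 (a stand-in here; Access has its own designer, but the table layout is the same idea). Table and column names follow the question where possible; the CardOwners junction table, its rows, and the 'Orphan Card' row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Main (PK INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Option 1: one owner per card, stored as a foreign key on Cards.
    CREATE TABLE Cards (
        PK INTEGER PRIMARY KEY,
        CardName TEXT NOT NULL,
        CardOwnerID INTEGER NOT NULL REFERENCES Main(PK)
    );
    INSERT INTO Main VALUES (1, 'Bob'), (2, 'Harry'), (3, 'Ted');
    INSERT INTO Cards VALUES
        (1, 'Big Slugger', 1), (2, 'Quick Pete', 1), (3, 'Mac Donald', 1);

    -- Option 2: many owners per card, via a junction table.
    CREATE TABLE CardOwners (
        PersonID INTEGER NOT NULL REFERENCES Main(PK),
        CardID INTEGER NOT NULL REFERENCES Cards(PK),
        PRIMARY KEY (PersonID, CardID)
    );
    -- Hypothetical data: Bob and Harry share 'Big Slugger'.
    INSERT INTO CardOwners VALUES (1, 1), (2, 1), (1, 2);
""")

# "Bob's Cards" is just a query, not a separate table.
bobs_cards = [n for (n,) in conn.execute("""
    SELECT c.CardName FROM Cards c
    JOIN Main m ON m.PK = c.CardOwnerID
    WHERE m.Name = 'Bob' ORDER BY c.PK
""")]

# With the junction table, one card can list several owners.
owners = [n for (n,) in conn.execute("""
    SELECT m.Name FROM CardOwners co
    JOIN Main m ON m.PK = co.PersonID
    WHERE co.CardID = 1 ORDER BY m.Name
""")]

# The foreign key rejects a card whose owner does not exist.
try:
    conn.execute("INSERT INTO Cards VALUES (4, 'Orphan Card', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Either way, adding a card is a new row rather than a new set of Card1, Condition1, ... columns.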

Several separated tables vs one integrated table with an additional column?

I have 3 tables, all with the same structure:
// table1 // table2 // table3
+----+------+ +----+------+ +----+------+
| id | name | | id | name | | id | name |
+----+------+ +----+------+ +----+------+
| 1 | jack | | 1 | ali | | 1 | peter|
+----+------+ +----+------+ +----+------+
I'd like to know whether my current structure is better, or a single integrated table with one additional column, something like this:
+----+------+-------+
| id | name | which |
+----+------+-------+
| 1 | jack | table1|
| 2 | ali | table2|
| 3 | peter| table3|
+----+------+-------+
Note: with the current structure (several tables), my query looks something like this:
select id, name from table1
union all
select id, name from table2
union all
select id, name from table3
So, is converting those tables into one table with a new column better or not? (I think the new column is a kind of overhead; is that true?)
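A small sketch comparing the two layouts with Python's sqlite3 (standing in for MySQL). The table "combined" and the index on which are hypothetical names; the data is the question's. One concrete difference shows up immediately: each of the three tables has its own id sequence, so the UNION ALL returns id 1 three times, while the single table gives every row a distinct id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Current structure: three tables with identical columns.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 VALUES (1, 'jack');
    INSERT INTO table2 VALUES (1, 'ali');
    INSERT INTO table3 VALUES (1, 'peter');

    -- Proposed structure: one table plus a discriminator column.
    CREATE TABLE combined (id INTEGER PRIMARY KEY, name TEXT, which TEXT NOT NULL);
    CREATE INDEX idx_combined_which ON combined (which);
    INSERT INTO combined (name, which)
        SELECT name, 'table1' FROM table1
        UNION ALL SELECT name, 'table2' FROM table2
        UNION ALL SELECT name, 'table3' FROM table3;
""")

# Old way: gather everything with UNION ALL.
union_rows = conn.execute("""
    SELECT id, name FROM table1
    UNION ALL SELECT id, name FROM table2
    UNION ALL SELECT id, name FROM table3
""").fetchall()
union_ids = [i for i, _ in union_rows]
union_names = [n for _, n in union_rows]

# New way: a plain SELECT returns the same names, each with its own id.
combined_rows = conn.execute("SELECT id, name FROM combined").fetchall()
combined_ids = [i for i, _ in combined_rows]
combined_names = [n for _, n in combined_rows]

# Picking "one table" becomes a WHERE clause on the extra column.
table2_rows = [n for (n,) in conn.execute("SELECT name FROM combined WHERE which = 'table2'")]
```

Whether the extra column is worth it depends on the questions raised in the answer (what the rows actually represent, and how they are read), not on the column's storage cost.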
This has practical consequences and also philosophical consequences. From a practical point of view, it's very hard to know without knowing a lot more about how the data is going to be used. What's the read-to-write ratio for this data? How often is data from two or more tables going to be selected in a single query? If you have to do a UNION to gather all the data, it's both slower and more cumbersome.
I prefer the philosophical approach, starting with the subject matter. Is there only one kind of entity here, or are there three different entities that all happen to have the same attribute? That nearly always tells me whether to put them in the same table or not, and most of the time it also turns out to give the right answer to the practical issue.
I will say that I would be looking around for some better name for the values of the extra attribute. "table1", "table2" and "table3" seem terribly opaque to me. The subject matter should provide a clue here as well.
Edit:
Now that I understand the subject matter, I'm going to opine in favor of a single table. It is an opinion rather than a hard and fast rule. So it would be something like this:
+----+-----------+----------+--------------+
| id | word | language |translation |
+----+-----------+----------+--------------+
| 1 | butterfly | Spanish | mariposa |
| 2 | butterfly | French | papillon |
| 3 | butterfly | Italian | farfalla |
| 4 | chair | Spanish | silla |
+----+-----------+----------+--------------+
If you are sure that all three tables will keep having common attributes, then the single-table option is fine; if that may not hold, don't consider it.
This thread may help you more.

MySql Prevent/track duplicate field across multiple fields

I'm looking for an easy way to check across multiple part tables to determine if a given part number is already present before adding it to a given table.
The current best idea I have come up with is a secondary table that simply lists every PN from all tables in a single column with a unique key; however I was wondering if there is a way to do it without creating a new table and index?
For the visual learner types, I have forty-some tables that more or less follow this pattern:
Table 1
| id | PN | Other Columns |
----------------------------------------------
| 1 | SomePn | ... ... ... ... |
...
Table 2
| id | PN | Still Other Columns |
--------------------------------------------------
| 1 | OtherPn | ... ... ... ... ... |
...
and about forty more as above, with up to 50 columns and up to 8 million records per table.
The goal is to prevent duplicate part numbers from creeping in across multiple tables, whether through software (Java) or by having MySQL reject the records. Is a master PN table the only possible or reasonable solution?
I know that the data structure is not the best design, and a rework is in progress, but I would like to know some suggested best practices as well as suggested solutions for this problem.
Adding a table is your best option.
Table 1:
| t1_id | PN | Other Columns |
----------------------------------------------
| 1 | pn_id | ... ... ... ... |
...
Table 2:
| t2_id | PN | Other Columns |
----------------------------------------------
| 1 | pn_id | ... ... ... ... |
...
Table 3:
| pn_id | PN |
------------------
| 1 | ## |
...
Although this may not be the easiest solution to implement in your case, it is the best overall solution, as you will have no scaling problems now or in the future. If you instead opted for a solution that checked all the tables for duplicate part numbers on every update or insert, it would take longer and longer as your tables got bigger.
If you could guarantee that they would never get bigger or you would never add part numbers, you could probably get away with just writing a script to check for duplicates once and not worry about another table at all. But, in the long run, you'll want to add another table just to keep track of the part numbers.
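A minimal sketch of the master-table idea using Python's sqlite3 as a stand-in for MySQL. The UNIQUE constraint on the master table is what rejects a duplicate PN, whichever part table it is headed for. The table part_numbers and the helper add_part are hypothetical names; a real version would do the same two inserts in one transaction from the Java side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Master table: the UNIQUE constraint is what rejects duplicates.
    CREATE TABLE part_numbers (
        pn_id INTEGER PRIMARY KEY,
        PN TEXT NOT NULL UNIQUE
    );
    -- Part tables store pn_id, a foreign key into the master table.
    CREATE TABLE table1 (
        t1_id INTEGER PRIMARY KEY,
        pn_id INTEGER NOT NULL REFERENCES part_numbers(pn_id)
    );
    CREATE TABLE table2 (
        t2_id INTEGER PRIMARY KEY,
        pn_id INTEGER NOT NULL REFERENCES part_numbers(pn_id)
    );
""")

def add_part(conn, table, pn):
    """Hypothetical helper: register pn in the master table, then link it
    from `table`. Raises sqlite3.IntegrityError if pn already exists.
    `table` is interpolated into the SQL, so pass only trusted names."""
    with conn:  # one transaction: both inserts succeed or neither does
        cur = conn.execute("INSERT INTO part_numbers (PN) VALUES (?)", (pn,))
        conn.execute(f"INSERT INTO {table} (pn_id) VALUES (?)", (cur.lastrowid,))

add_part(conn, "table1", "SomePn")
add_part(conn, "table2", "OtherPn")

# Same PN, different table: rejected by the master table's UNIQUE index.
try:
    add_part(conn, "table2", "SomePn")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The duplicate check is a single index lookup on part_numbers, so its cost does not grow with the number of part tables.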

Find records where CSV column values match

I am making a website. In the database I have a table of articles that kind of looks like this:
id | name | cats | etc.
------------------------------------------------------
1 | "alice" | "this, that, those, them" |
2 | "bob" | "this, that, those" |
3 | "carol" | "this, banana, cupcake" |
4 | "dave" | "other, unrelated, words" |
5 | "errol" | "those, them, fishstick" |
When viewing an article I want to also show some of the most related articles, based on the amount of categories in common.
For example, if I were viewing the Alice article, I would want to pick out (in order of preference) Bob (3 cats in common), Errol (2), Carol (1).
I am aware that this would be easier if the data was normalised (I could for example do this) but unfortunately that's not really an option.
I ended up creating a couple of extra tables and populating them with properly normalized data every time something was saved. These run alongside the existing tables so it's not the cleanest of solutions but it works and the query speeds are excellent.
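A sketch of that side-table approach with Python's sqlite3 standing in for MySQL. The table article_cats and the helper save_article are hypothetical; the helper rebuilds the normalized rows from the CSV column on every save, and a self-join then counts shared categories. INSERT OR REPLACE and INSERT OR IGNORE are SQLite syntax; MySQL has its own equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT, cats TEXT);
    -- Hypothetical side table kept in sync with articles.cats on every save.
    CREATE TABLE article_cats (
        article_id INTEGER NOT NULL REFERENCES articles(id),
        cat TEXT NOT NULL,
        PRIMARY KEY (article_id, cat)
    );
""")

def save_article(conn, article_id, name, cats):
    """Upsert the article row and rebuild its normalized category rows."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO articles VALUES (?, ?, ?)",
                     (article_id, name, cats))
        conn.execute("DELETE FROM article_cats WHERE article_id = ?", (article_id,))
        conn.executemany(
            "INSERT OR IGNORE INTO article_cats VALUES (?, ?)",
            [(article_id, c.strip()) for c in cats.split(",") if c.strip()],
        )

for row in [(1, "alice", "this, that, those, them"),
            (2, "bob", "this, that, those"),
            (3, "carol", "this, banana, cupcake"),
            (4, "dave", "other, unrelated, words"),
            (5, "errol", "those, them, fishstick")]:
    save_article(conn, *row)

# Every other article sharing at least one category, most shared first.
related = conn.execute("""
    SELECT a.name, COUNT(*) AS shared
    FROM article_cats mine
    JOIN article_cats other
      ON other.cat = mine.cat AND other.article_id <> mine.article_id
    JOIN articles a ON a.id = other.article_id
    WHERE mine.article_id = ?
    GROUP BY a.id, a.name
    ORDER BY shared DESC
""", (1,)).fetchall()
print(related)  # [('bob', 3), ('errol', 2), ('carol', 1)]
```

Articles with nothing in common (dave here) simply drop out of the join.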

Is this good Database Normalization?

I am a beginner at using mysql and I am trying to learn the best practices. I have setup a similar structure as seen below.
(main table that contains all unique entries) TABLE = 'main_content'
+------------+---------------+------------------------------+-----------+
| content_id | (deleted) | title | member_id |
+------------+---------------+------------------------------+-----------+
| 6 | | This is a very spe?cal t|_st | 1 |
+------------+---------------+------------------------------+-----------+
(Provides the total of each difficulty and joins id --> actual name) TABLE = 'difficulty'
+---------------+-------------------+------------------+
| difficulty_id | difficulty_name | difficulty_total |
+---------------+-------------------+------------------+
| 1 | Absolute Beginner | 1 |
| 2 | Beginner | 1 |
| 3 | Intermediate | 0 |
| 4 | Advanced | 0 |
| 5 | Expert | 0 |
+---------------+-------------------+------------------+
(This table ensures that multiple values can be inserted for each entry. For example,
this specific entry indicates that there are 2 difficulties associated with the submission)
TABLE = 'lookup_difficulty'
+------------+---------------+
| content_id | difficulty_id |
+------------+---------------+
| 6 | 1 |
| 6 | 2 |
+------------+---------------+
I am joining all of this into a readable query:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN difficulty ON difficulty.difficulty_id
IN (SELECT difficulty_id FROM main_content, lookup_difficulty WHERE lookup_difficulty.content_id = main_content.content_id )
INNER JOIN member ON member.member_id = main_content.member_id
The above works fine, but I am wondering if this is good practice. I basically followed the structure laid out in Wikipedia's Database Normalization example.
When I run the above query with EXPLAIN, it says 'Using where; Using join buffer', and also that I am using two DEPENDENT SUBQUERYs. I don't see any way to achieve the same effect without sub-queries, but then again I'm a noob, so perhaps there is a better way...
The DB design looks fine - regarding your query, you could rewrite it exclusively with joins like:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
INNER JOIN member ON member.member_id = main_content.member_id
GROUP BY main_content.content_id, member.member_name
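To check that the join-only version returns one row per content item, here is a sketch using Python's sqlite3 as a stand-in for MySQL. The member table is not shown in the question, so its layout and the member name 'member_one' are hypothetical, and difficulty_total is left out. group_concat joins with a comma by default in both SQLite and MySQL, but the order of the names inside the string is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT);
    CREATE TABLE main_content (
        content_id INTEGER PRIMARY KEY, title TEXT, member_id INTEGER
    );
    CREATE TABLE difficulty (difficulty_id INTEGER PRIMARY KEY, difficulty_name TEXT);
    CREATE TABLE lookup_difficulty (
        content_id INTEGER NOT NULL,
        difficulty_id INTEGER NOT NULL,
        PRIMARY KEY (content_id, difficulty_id)
    );
    INSERT INTO member VALUES (1, 'member_one');  -- hypothetical member
    INSERT INTO main_content VALUES (6, 'This is a very spe?cal t|_st', 1);
    INSERT INTO difficulty VALUES
        (1, 'Absolute Beginner'), (2, 'Beginner'), (3, 'Intermediate'),
        (4, 'Advanced'), (5, 'Expert');
    INSERT INTO lookup_difficulty VALUES (6, 1), (6, 2);
""")

# Join-only version: no subqueries, one output row per content item.
rows = conn.execute("""
    SELECT main_content.content_id,
           group_concat(difficulty.difficulty_name) AS difficulty,
           member.member_name
    FROM main_content
    INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
    INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
    INNER JOIN member ON member.member_id = main_content.member_id
    GROUP BY main_content.content_id, member.member_name
""").fetchall()
```

The GROUP BY matters on MySQL with ONLY_FULL_GROUP_BY enabled: without it, selecting member_name next to group_concat is rejected.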
If lookup_difficulty provides the link between content and difficulty, I would suggest removing the difficulty_id column from your main_content table. Since each content_id can have multiple lookups, you would need some extra business logic to decide which difficulty_id to put in main_content (e.g. the biggest, the smallest, or a random one), or else multiple main_content rows, one per difficulty_id, which goes against normalization practices. Either way, it does not make much sense.
Other than that the table looks fine.
Update
Saw you updated the table :)
Just as a side note: using IN with a subquery can slow down your query (it can cause a table scan). At least it used to be that way; I'm fairly sure the query optimizer handles it much better these days.