I've been thinking a lot trying to figure out how to make a flexible system to hold many values trying to avoid the option of adding more fields to table in the future.
The only thing I could think off, is to make a table that will look like this:
CREATE TABLE IF NOT EXISTS `form_data` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`value` varchar(500) default NULL,
`form_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
)
+--------+---------+----------+--------+
| id | name | value | form_id|
+--------+---------+----------+--------+
| 100 |fullname | Steve | 1 |
+--------+---------+----------+--------+
| 101 |email |ab#c.com | 1 |
+--------+---------+----------+--------+
| 102 |fullname | John | 1 |
+--------+---------+----------+--------+
| 103 |email |cd#c.com | 1 |
+--------+---------+----------+--------+
This way, I could save each value in a row, and it would be as dynamic as I'd want.
I'm aware of the bad performance in a very long tables.
Now I've also figured out how to make the View(front end) of the values in a "Regular" table. Looks just like a normal table.
+--------+---------+----------+
| ID | Email |Fullname |
+--------+---------+----------+
| 1 |ab#c.com | Steve |
+--------+---------+----------+
| 2 |cd#c.com | John |
+--------+---------+----------+
Now I want to create a temporary table instead of PHP loops.
Any ideas how to make this work?
How do I create a stored procedure that will receive the form_id as parameter and will return a table like this?
Congratulations. You have re-invented the Entity-Attribute-Value model.
This model has existed for quite a while, but has proven to perform quite bad in a relational database system. You should probably not use it.
This answer makes a nice list of the pro and cons of EAV. The biggest pro is what you have discovered, it's easier to design. The biggest con is what I'm telling here: it's worse on performance.
Since usually you design far less often than your queries run, it might be better to think a bit longer while designing, and have faster queries.
NoSQL is considered more dynamic
Try the schemaless approach.
this may be a goot starting point:http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/
Related
The Problem
I landed a small gig to develop an online quoting system for an electronic distributor. He has roughly a half million parts - one little screw is considered a part, one little led, etc. So there are a LOT of parts.
One Important Note: This is only a RFQ ( Request for Quote ). There are no prices client-side, or totals, or anything to do with money. Just collecting a list of part numbers to send to my client.
I had to collect the part data from multiple sources (vendor website, scanned paper catalog, Excel spreadsheets, CSV files, and even a few JSON files. It was exhausting, but I got it done.
Results
Confusing at first. I had dozens of product categories, and some products had so many attributes that were not common to any other products. I could see this project getting very complicated, and given the fact I bid this job at $900 even, I had to simplify this somehow.
This is what I came up with, and received client approval.
Current Columns
+--------------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+--------------+------+-----+---------+-------+
| Datasheets | varchar(128) | YES | | NULL | |
| Image | varchar(85) | YES | | NULL | |
| DigiKey_Part_Number | varchar(46) | YES | | NULL | |
| Manufacturer_Part_Number | varchar(47) | YES | | NULL | |
| Manufacturer | varchar(49) | YES | | NULL | |
| Description | varchar(34) | YES | | NULL | |
| Quantity_Available | int(11) | YES | | NULL | |
| Minimum_Quantity | int(11) | YES | | NULL | |
+--------------------------+--------------+------+-----+---------+-------+
so all products will fit this page template (menu on bottom is error in screenshot):
Autocomplete Off The Table?
Early on in the design, I implemented a nice autocomplete feature:
BUT .. given the number of products in the table, is this even
practical anymore ???
FINAL PRODUCT COUNT: 223,347
What changes do I need to make to PRODUCTS table so that querying the table will not take forever?
These are the only queries the app will be making ( not sure if this info will help in your solution advice )...
Get all products by category:
Select * from products where category = 'semiconductors'
Get single product:
Select * from products where Manufacturer_Part_Number = '12345'
Get product count by category
I think those three actually cover everything I need to do. Maybe a couple more, but not many.
In closing...
Is there a way to "index" this table with 223000 records where searching by one or more columns can be done efficiently?
I am very new to database design, and know I do need to index SOMETHING, but ... WHAT???
Thank you for taking the time to look at this post.
Regards,
John
Listing the queries is mandatory to answering your question. Thanks for including them.
INDEX(category)
INDEX(Manufacturer_Part_Number)
But I suggest your second query should include Manufacturer, too. Then this would be better it:
INDEX(Manufacturer, Manufacturer_Part_Number)
Everything NULL? Seems unlikely.
(I've done jobs like yours; I can't imagine bidding only $900 for all that scraping.)
What will you do when there are a thousand items in a single category or manufacturer? A UI with a thousand-item list sucks.
For how to handle "so many attributes", I recommend http://mysql.rjweb.org/doc.php/eav (I should charge you $899 for the research that went into that document. Just kidding.)
Don't they need other lookups, like "Flash drive", which need to match "FLASH DRV"?
223K rows -- no problem. The VARCHARs seem to be too short; were they based on the data?
And the table needs a PRIMARY KEY.
I've looked for solutions but couldn't find anything for my specific problem and can't manage to sort it on my own, the queries I'm trying are too heavy and timeout my server.
I have a table 'events' like this :
+-----------+--------------------------------+
| field | informations |
+-----------+--------------------------------+
| id | INT(11) PRIMARY AUTO_INCREMENT |
| repeat_id | INT(11) |
| timestamp | TIMESTAMP |
| title | VARCHAR(255) |
| details | TEXT |
+-----------+--------------------------------+
My events cane be unique ('repeat_id'=0) or repeated ('repeat_id'>0), and my problem is, I want to find all "outdated" events, that is to say the events that are repeated AND whose max timestamp is under current timestamp.
It might be easy using the good synthax and/or functions but I can't manage to do it... any help would be appreciated!
Thanks a lot,
Arthur
What if you try like below
select * from events
where repeat_id > 0
group by title
having max(`timestamp`) <= CURRENT_TIMESTAMP;
if I do have a table player_res:
+-----------+-------+
| player_id | score |
+-----------+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 22 |
| 4 | 22 |
+-----------+-------+
Would it be better to have score being a foreign key referencing lets say num.value, with num.value being UNIQUE? So a certain score can only appear once in num.value field.
Is there advantages in database size and/or speed as both types (player_res.score and num.value) are INT(11)? I believe it doesn't matter but someone else just tried to convince me that using a second table to store the scores uniquely would be better for performance in case the table would grow really large, that's why I decided asking here!
Regards
INT(11) are the same size as INT that is 4 bytes in MySQL. For integer types the value in parenthesis is just the number of digit displayed. Not the type size.
http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
Using an secondary table to deduplicate score values doesn't seems to be a very serious idea... or you and I are really missing some subtleties?
If size is really an issue, you might perhaps use a smaller integer (like a tinyint) to store your scores?
I'm at the moment developping a quite big application that will manipulate a lot of data.
I'm designing the data model and I wonder how to tune this model for big amount of data. (My DBMS is MySQL)
I have a table that will contain objects called "values". There are 6 columns called :
id
type_bool
type_float
type_date
type_text
type_int
Depending of the type of that value (that is written elsewhere), one of these columns has a data, the others are NULL values.
This table is aimmed to contain millions lines (growing very fastly). It's also going to be read a lot of times.
My design is going to make a lot of lines with few data. I wonder if it's better to make 5 different tables, each will contain only one type of data. With that solution there would be much more jointures.
Can you give me a piece of advice ?
Thank you very much !
EDIT : Description of my tables
TABLE ELEMENT In the application there are elements thats contains attributes.
There will be a LOT of rows.
There is a lot of read/write, few update/delete.
TABLE ATTRIBUTEDEFINITION Each attribute is described (design time) in the table attributeDefinition that tells which is the type of the attribute.
There will not be a lot of rows
There is few writes at the begining but a LOT of reads.
TABLE ATTRIBUTEVALUE After that, another table "attributeValue" contains the actual data of each attributeDefinition for each element.
There will be a LOT of rows ([nb of Element] x [nb of attribute])
There is a LOT of read/write/UPDATE
TABLE LISTVALUE *Some types are complex, like the list_type. The set of values available for this type are in another table called LISTVALUE. The attribute value table then contains an id that is a key of the ListValue Table*
Here are the create statements
CREATE TABLE `element` (
`id` int(11),
`group` int(11), ...
CREATE TABLE `attributeDefinition` (
`id` int(11) ,
`name` varchar(100) ,
`typeChamps` varchar(45)
CREATE TABLE `attributeValue` (
`id` int(11) ,
`elementId` int(11) , ===> table element
`attributeDefinitionId` int(11) , ===> table attributeDefinition
`type_bool` tinyint(1) ,
`type_float` decimal(9,8) ,
`type_int` int(11) ,
`type_text` varchar(1000) ,
`type_date` date,
`type_list` int(11) , ===> table listValue
CREATE TABLE `listValue` (
`id` int(11) ,
`name` varchar(100), ...
And there is a SELECT example that retrieve all elements of a group that id is 66 :
SELECT elementId,
attributeValue.id as idAttribute,
attributeDefinition.name as attributeName,
attributeDefinition.typeChamps as attributeType,
listValue.name as valeurDeListe,
attributeValue.type_bool,
attributeValue.type_int,
DATE_FORMAT(vdc.type_date, '%d/%m/%Y') as type_date,
attributeValue.type_float,
attributeValue.type_text
FROM element
JOIN attributeValue ON attributeValue.elementId = element.id
JOIN attributeDefinition ON attributeValue.attributeDefinitionId = attributeDefinition.id
LEFT JOIN listValue ON attributeValue.type_list = listValue.id
WHERE `e`.`group` = '66'
In my application, foreach row, I print the value that corresponds to the type of the attribute.
As you are only inserting into a single column each time, create a different table for each data type - if you are inserting large quantities of data you will be wasting a lot of space with this design.
Having fewer rows in each table will increase index lookup speed.
Your column names should describe the data in them, not the column type.
Read up on Database Normalisation.
Writing will not be an issue here. Reading will
You have to ask yourself :
how often are you gonna query this ?
are old data modified or is it just "append" ?
==> if the answers are frequently / append only, or minor modification of old data, a cache may solve your read issues, as you won't query the base so often.
There will be a lot of null fields at each row. If the table is not big ok, but as you said there will be millions of rows so you are wasting space and the queries will take longer to execute. Do someting like this:
table1
id | type
table2
type | other fields
Advice I have, although it might not be the kind you want :-)
This looks like an entity-attribute-value schema; using this kind of schema leads to all kind of maintenance / performance nightmares:
complicated queries to get all values for a master record (essentially, you'll have to left join your result table N times with itself to obtain N attributes for a master record)
no referential integrity (I'm assuming you'll have lookup values with separate master data tables; you cannot use foreign key constraints for this)
waste of disk space (since your table will be sparsely filled)
For a more complete list of reasons to avoid this kind of schema, I'd recommend getting a copy of SQL Antipatterns
Finally I tried to implement both solutions and then I benched them.
For both solution, there were a table element and a table attribute definition as follow :
[attributeDefinition]
| id | group | name | type |
| 12 | 51 | 'The Bool attribute' | type_bool |
| 12 | 51 | 'The Int attribute' | type_int |
| 12 | 51 | 'The first Float attribute' | type_float |
| 12 | 51 | 'The second Float attribute'| type_float |
[element]
| id | group | name
| 42 | 51 | 'An element in the group 51'
First Solution (Best one)
One big table with one column per type and many empty cells. Each value of each attribute of each element.
[attributeValue]
| id | element | attributeDefinition | type_int | type_bool | type_float | ...
| 1 | 42 | 12 | NULL | TRUE | NULL | NULL...
| 2 | 42 | 13 | 5421 | NULL | NULL | NULL...
| 3 | 42 | 14 | NULL | NULL | 23.5 | NULL...
| 4 | 42 | 15 | NULL | NULL | 56.8 | NULL...
One table for attributeDefinition that describe each attribute of every element in a group.
Second Solution (Worse one)
8 tables, one for each type :
[type_float]
| id | group | element | value |
| 3 | 51 | 42 | 23.5 |
| 4 | 51 | 42 | 56.8 |
[type_bool]
| id | group | element | value |
| 1 | 51 | 42 | TRUE |
[type_int]
| id | group | element | value |
| 2 | 51 | 42 | 5421 |
Conclusion
My bench was first looking at the database size. I had 1 500 000 rows in the big table which means approximatly 150 000 rows in each small table if there are 10 datatypes.
Looking in phpMyAdmin, sizes are nearly exactly the same.
First Conclusion : Empty cells doesn't take place.
After that, my second bench was for performance tests, getting all values of all attributes of all elements in one group. There are 15 groups in the database. Each group has :
400 elements
30 attributes per element
So that is 12 000 rows in [attributeValue] or 1200 rows in each table [type_*].
The First SELECT only does one join between [attributeValue] and [element] to put a WHERE on the group.
The second SELECT uses a UNION with 10 SELECT in each table [type_*].
That second SELECT is 10 times longer !
Second Conclusion : One table is better that many.
I'm attempting to design a small database for a customer. My customer has an organization that works with public and private schools; for every school that's involved, there's an implementation (a chapter) at each school.
To design this, I've put together two tables; one for schools and one for chapters. I'm not sure, however, if I should merge the two together. The tables are as follows:
mysql> describe chapters;
+--------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| school_id | int(10) unsigned | NO | MUL | | |
| is_active | tinyint(1) | NO | | 1 | |
| registration_date | date | YES | | NULL | |
| state_registration | varchar(10) | YES | | NULL | |
| renewal_date | date | YES | | NULL | |
| population | int(10) unsigned | YES | | NULL | |
+--------------------+------------------+------+-----+---------+----------------+
7 rows in set (0.01 sec)
mysql> describe schools;
+----------------------+------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+------------------------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| full_name | varchar(255) | NO | MUL | | |
| classification | enum('high','middle','elementary') | NO | | | |
| address | varchar(255) | NO | | | |
| city | varchar(40) | NO | | | |
| state | char(2) | NO | | | |
| zip | int(5) unsigned | NO | | | |
| principal_first_name | varchar(20) | YES | | NULL | |
| principal_last_name | varchar(20) | YES | | NULL | |
| principal_email | varchar(20) | YES | | NULL | |
| website | varchar(20) | YES | | NULL | |
| population | int(10) unsigned | YES | | NULL | |
+----------------------+------------------------------------+------+-----+---------+----------------+
12 rows in set (0.01 sec)
(Note that these tables are incomplete - I haven't implemented foreign keys yet. Also, please ignore the varchar sizes for some of the fields, they'll be changing.)
So far, the pros of keeping them separate are:
Separate queries of schools and
chapters are easier. I don't know if
it's necessary at the moment, but
it's nice to be able to do.
I can make a chapter inactive
without directly affecting the
school information.
General separation of data - the fields in
"chapters" are directly related to
the chapter itself, not the school
in which it exists. (I like the
organization - it makes more sense
to me. Also follows the "nothing but the key" mantra.)
If possible, we can collect school
data without having a chapter
associated with it, which may make
sense if we eventually want people
to select a school and autopopulate
the data.
And the cons:
Separate IDs for schools and
chapters. As far as I know, there
will only ever be a one-to-one
relationship between the two, so
doing this might introduce more
complexity that could lead to errors
down the line (like importing data
from a spreadsheet, which is unfornately
something I'll be doing a lot of).
If there's a one-to-one ratio, and
the IDs are auto_increment fields,
I'm guessing that the chapter_id and
school_id will end up being the same - so why not just put them in a single table?
From what I understand, the chapters
aren't really identifiable on their
own - they're bound to a school, and
as such should be a subset of a
school. Should they really be
separate objects in a table?
Right now, I'm leaning towards keeping them as two separate tables; it seems as though the pros outweigh the cons, but I want to make sure that I'm not creating a situation that could cause problems down the line. I've been in touch with my customer and I'm trying to get more details about the data they store and what they want to do with it, which I think will really help. However, I'd like some opinions from the well-informed folks on here; is there anything I haven't thought of? The bottom line here is just that I want to do things right the first time around.
I think they should be kept separate. But, you can make the chapter a subtype of a school (and the school the supertype) and use the same ID. Elsewhere in the database where you use SchoolID you mean the school and where you use ChapterID you mean the chapter.
CREATE TABLE School (
SchoolID int unsigned NOT NULL AUTO_INCREMENT,
CONSTRAINT PK_School PRIMARY KEY (SchoolID)
)
CREATE TABLE Chapter (
ChapterID int unsigned NOT NULL,
CONSTRAINT PK_Chapter PRIMARY KEY (ChapterID)
CONSTRAINT FK_Chapter_School FOREIGN KEY (ChapterID) REFERENCES School (SchoolID)
)
Now you can't have a chapter unless there's a school first. If such a time occurred that you had to allow multiple chapters per school, you would recreate the Chapter table with ChapterID as identity/auto-increment, add a SchoolID column populated with the same value and put the FK on this one to School, and continue as before, only inserting the ID to SchoolID instead of ChapterID. If MySQL supports inserting explicit values to an autoincrement column, then making it SchoolID autoincrement ahead of time could save you trouble later (unless switching a regular column to autoincrement is supported in which case no issues there).
Additional benefits of keeping them separate:
You can make foreign key relationships directly with SchoolID or ChapterID so that the data you're storing is always correct (for example, if no chapter exists yet you can't store related data for such a thing until it is created).
Querying each table separately will perform better as the rows don't contain extraneous information.
A school can be created with certain required columns, but the chapter left uncreated (temporarily). Then, when it is created, you can have some NOT NULL columns in it as well.
keep them separate.
they may be 1-1 currently... however these are clearly separate concepts.
will they eventually want to input schools which do not have chapters? perhaps as part of a sales lead system?
can there really only be one chapter per school or just one active chapter ? what about across time? is it possible they will request a report with all chapters in the past 10 years at x school ?
You said the links will always be 1 to 1, but does a school always have a chapter can it change chapters? If so, then keeping chapters separate is a good idea.
Another reason to keep them separate is if the amount of information about the two entities combined would make the length of the records longer than the database backend can handle. One-to_one tables are often built to keep the amount of data that needs to be stored in a record down to an appropriate size.
Further is the requirement a firm 1-1 or is does it have the potential to be 1-many? If the second, make it a separate table now. Id there the potential to have schools without chapters? Again I'd keep them separate.
And how are you intending to query this data, will you generally need the data about both the chapter and school in the same queries, then you might put them in one table if you are sure there is no possibility of it turning into a 1-many relationship. However a proper join with the join fields indexed should be fast anyway.
I tend to see these as separate entities and would keep them separte unless there was a critcal performance problem that would lead to putting them to gether. I think that having separate entities in separate table from the start tends to be less risky than putting them together. And performance would normally be perfectly acceptable as long as the indexing is correct and may even be better if you don't normally need to query data from both tables all the time.