I have recently inherited an already-started project, and I have one challenge right now. One of the requirements is to allow a user to create a "database" inside the application that can have a variable number of user-defined columns (it's an Excel-like structure).
Here's the sqlfiddle for my current structure.
Here's a query I am using to fetch rows:
select `row`,
group_concat(dd.value order by field(`col`, 1, 2, 3) asc) as `values`
from db_record dr,
db_dictionary dd
where dr.database_id in (1, 2, 3)
and dr.database_dictionary_id = dd.id
group by `row`
order by group_concat(dd.value order by field(`col`, 1, 2, 3) asc);
Ability to sort by any column is achieved by using group_concat().
I am having second thoughts about that design, because I have some doubts regarding performance and meeting the requirements:
It has to be sortable (by any column), meaning that user sorts asc by column 2, and rows are ordered properly.
It has to be searchable/filterable. User can filter by values in any column, and only rows containing search phrase should be returned.
The first requirement, I think, is handled by the query pasted above. For the second one, I also tried adding a HAVING clause with LIKE to the query, but it compared against the whole GROUP_CONCAT() result.
Can someone advise whether the current DB structure is OK for the purpose and help me with the latter requirement? Or maybe there's a better approach to the problem?
Last question, is it possible to return values for each column in one query? In DB, records look like this:
-------------------------------------------
| database_id | dictionary_id | row | col |
-------------------------------------------
| 1 | 1 | 1 | 1 |
-------------------------------------------
| 2 | 2 | 1 | 2 |
-------------------------------------------
| 3 | 3 | 1 | 3 |
-------------------------------------------
And I would like to get a query result grouped by row, similar to this (column 1 .. 3 values are dictionary_id values):
----------------------------------------
| row | column 1 | column 2 | column 3 |
----------------------------------------
| 1 | 1 | 2 | 3 |
----------------------------------------
Is that achievable in MySQL, or is the only solution to use GROUP_CONCAT() and then split the result into columns in PHP?
I need a flexible and efficient structure, and I hope someone can advise me on that. I would really appreciate any help or suggestions.
Excel-2-MySQL
A Flexible, Dynamic Adaption of Excel Format to a MySQL Relational Schema
The approach of this solution may work for other relational database systems, as it does not rely on any specific features of MySQL except for SQL-compliant DDL and DML commands. The maintenance of this database can be handled through a combination of internal DB constraints and stored-procedure APIs, or externally by an alternate scripting language and user interface. The focus of this walk-through is the purpose of the schema design, the organization of the data and supporting values, as well as potential points of expansion for additional enhancements.
Schema Overview and Design Concepts to Adapt a Spreadsheet
The schema leverages an assumption that each data point on the spreadsheet grid can be represented by a unique combination of keys. The simplest combination would be a row-column coordinate pair, such as "A1" (Column A, Row Number 1) or "G72" (Column G, Row Number 72).
This walk-through demonstration will show how to adapt the following data sample in spreadsheet form into a reusable, multi-user relational database format.
A pair of coordinates can also include a uniquely assigned spreadsheet/mini-db ID value. For a multi-user environment, the same schema can still be used by adding a supporting user ID value to associate with each spreadsheet ID.
Defining the Smallest Schema Unit: The Vector
After bundling together all the identifying meta info about each data point, the collection is now tagged with a single, globally unique ID, which to some may now appear like a catalog of "vectors".
A VECTOR by mathematical definition is a collection of multiple components and their values used to simplify solutions for problems which exist in spaces that are described through multiple (n) dimensions.
The solution is scalable: mini-databases can be as small as 2 rows x 2 columns or hundreds to thousands of rows and columns wide.
Search, Sort and Pivot Easily
You can build search queries from the data values of vectors that have common attributes such as:
Database/Spreadsheet ID and Owner (Example, 10045, Owner = 'HELEN')
Same Column: (Example, Column "A")
Your data set would be all vector IDs and their associated data values that share these common attributes. Pivot outputs could be accomplished generically with probably some simple matrix algebra transformations... a spreadsheet grid is only two dimensions, so it can't be that hard!
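For example, a sketch of such a search against the single-table DB_VECTOR defined in the next section (the owner value comes from the sample inserts below; the sheet-id column is omitted in that single-table version):
-- All data values owned by RICHARD that live in column 1 (Column "A")
SELECT vid, row_id, string_data, numeric_data, date_data
FROM DB_VECTOR
WHERE user_id = 'RICHARD'
  AND col_id = 1
ORDER BY row_id;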
Handling Different Data Types: Some Design Considerations
The simple approach: Store all the data as VARCHAR types but keep track of the original data type so that when you query the vector's data value you can apply the right conversion function. Just be consistent and use your API or input process to vigilantly police the population of your data in the data store... the last thing you'll want to end up debugging is a Numeric conversion function that has encountered a STRING typed character.
The next section contains the DDL code to set up a one-table solution which uses multiple columns to manage the different possible data types that may be hosted within a given spreadsheet grid.
A Single Table Solution for Serving a Spreadsheet Grid Through MySQL
Below is the DDL worked out on MySQL 5.5.32.
-- First Design Idea... Using a Single Table Solution.
CREATE TABLE DB_VECTOR
(
vid int auto_increment primary key,
user_id varchar(40),
row_id int,
col_id int,
data_type varchar(10),
string_data varchar(500),
numeric_data int,
date_data datetime
);
-- Populate Column A with CITY values
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 2, 1, 'STRING', 'ATLANTA', NULL, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 3, 1, 'STRING', 'MACON', NULL, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 4, 1, 'STRING', 'SAVANNAH', NULL, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 5, 1, 'STRING', 'FORT BENNING', NULL, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 6, 1, 'STRING', 'ATHENS', NULL, NULL);
-- Populate Column B with POPULATION values
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 2, 2, 'NUMERIC', NULL, 1500000, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 3, 2, 'NUMERIC', NULL, 522000, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 4, 2, 'NUMERIC', NULL, 275200, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 5, 2, 'NUMERIC', NULL, 45000, NULL);
INSERT INTO DB_VECTOR (user_id, row_id, col_id, data_type,
string_data, numeric_data, date_data)
VALUES ('RICHARD', 6, 2, 'NUMERIC', NULL, 1325700, NULL);
There is a temptation to run off and start over-normalizing this table, but redundancy may not be that bad. Separate off information that is related to the spreadsheets (Such as OWNER/USER Name and other demographic info) but otherwise keep things together until you understand the purpose of the vector-based design and some of the performance trade-offs.
One such tradeoff with an over-normalized schema is that the required data values are now scattered across multiple tables. Filter criteria may then have to be applied to different tables involved in these joins. Ironic as it may seem, I have observed that flattened, singular table structures fare well when it comes to querying and reporting, despite some apparent redundancy.
An Additional Note: Creating tables for supporting data linked to the main data source via Foreign Key relations is a different story... an implied relation exists between tables, but many RDBMS systems actually self-optimize based on Foreign Key connections.
For Example: Searching the USER_OWNER column with several million records benefits from a potential boost if it is linked by a FK to a supporting table which identifies a finite user list of 20 people... This is also known as an issue of CARDINALITY, which helps the database build execution plans that can take short-cuts through an otherwise unknown data set.
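As a sketch of that idea (the table and constraint names here are assumptions, not part of the original design):
-- A small lookup table for owners; the FK links DB_VECTOR.user_id
-- to a finite list of users, as described above.
CREATE TABLE DB_USER
(
  user_id varchar(40) primary key,
  full_name varchar(100)
);

ALTER TABLE DB_VECTOR
  ADD CONSTRAINT fk_db_vector_user
  FOREIGN KEY (user_id) REFERENCES DB_USER (user_id);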
Getting Your Data Back Out: Some Sample Queries
The first is a base query to pull the data back out in an organized, grid-like format... just like the original Excel page.
SELECT base_query.CITY, base_query.POPULATION
FROM (
SELECT CASE WHEN col_a.data_type = 'STRING'
THEN col_a.string_data
WHEN col_a.data_type = 'NUMERIC'
THEN col_a.numeric_data
WHEN col_a.data_type = 'DATETIME'
THEN col_a.date_data ELSE NULL END as CITY,
CASE WHEN col_b.data_type = 'STRING'
THEN col_b.string_data
WHEN col_b.data_type = 'NUMERIC'
THEN col_b.numeric_data
WHEN col_b.data_type = 'DATETIME'
THEN col_b.date_data ELSE NULL END as POPULATION
FROM db_vector col_a, db_vector col_b
WHERE ( col_a.col_id = 1 AND col_b.col_id = 2 )
AND ( col_a.row_id = col_b.row_id)
) base_query WHERE base_query.POPULATION >= 500000
ORDER BY base_query.POPULATION DESC
Even the base query here is still a little specific to manage a scalable, generic solution for a spreadsheet of one or many values in width or length. But you can see how the internal query in this example remains untouched and a complete data set can quickly be filtered or sorted in different ways.
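One way to make that reuse explicit (a sketch; the view name is made up) is to wrap the inner query in a view and vary only the outer filter and sort:
CREATE VIEW v_sheet_grid AS
SELECT CASE WHEN col_a.data_type = 'STRING'   THEN col_a.string_data
            WHEN col_a.data_type = 'NUMERIC'  THEN col_a.numeric_data
            WHEN col_a.data_type = 'DATETIME' THEN col_a.date_data
            ELSE NULL END as CITY,
       CASE WHEN col_b.data_type = 'STRING'   THEN col_b.string_data
            WHEN col_b.data_type = 'NUMERIC'  THEN col_b.numeric_data
            WHEN col_b.data_type = 'DATETIME' THEN col_b.date_data
            ELSE NULL END as POPULATION
FROM db_vector col_a, db_vector col_b
WHERE col_a.col_id = 1 AND col_b.col_id = 2
  AND col_a.row_id = col_b.row_id;

-- Same data set, different filter and sort:
SELECT CITY, POPULATION FROM v_sheet_grid
WHERE CITY LIKE 'A%'
ORDER BY CITY ASC;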
Some Parting Thoughts: (a.k.a. Some Optional Homework)
It is possible to solve this with a flexible, multi-table solution. I was able to accomplish this in THREE tables.
DB_VECTOR (as you have already seen) underwent some modifications: data values were moved out and strictly positional information (row and column id's) plus a globally unique spreadsheet id was left behind.
DB_DATA was used as the final home for the raw data fields: STRING_DATA, NUMERIC_DATA, and DATE_DATA... each record uniquely identified by a VID (vector id).
In the multi-table solution, I used the unique VID instead as a pointer with multiple associated dimensions (owner, sheet id, row, column, etc.) to point to its corresponding data value.
An example of the utility of this design: the possibility of a "look-up" function or query that identifies a collection of vector ids and the data they point to based on the properties of the data itself, or the vector components (row, column, sheet id, etc.)... or a combination.
The possibility is that instead of circulating a whole lot of data (the spreadsheet itself) between different parts of the code handling this schema, queries deal with only the specific properties and just push around lists (arrays?) or sets of universally unique ids which point to the data as it is needed.
Initializing New Spreadsheets: If you pursue the multi-table design, your DB_VECTOR table becomes a hollow collection of bins with pointers to the actual data. Before you populate the raw data values, the VECTOR_ID (vid) will need to exist first so you can link the two values.
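A sketch of that two-step flow (the sheet_id column and the DB_DATA column list are assumptions based on the description above):
-- 1. Create the hollow vector (positional info only) first...
INSERT INTO DB_VECTOR (user_id, sheet_id, row_id, col_id)
VALUES ('RICHARD', 10045, 2, 1);

-- 2. ...then attach the raw data to it through the generated vector id.
INSERT INTO DB_DATA (vid, string_data, numeric_data, date_data)
VALUES (LAST_INSERT_ID(), 'ATLANTA', NULL, NULL);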
Which Way is UP???: Using numeric values for row and column id's seemed like the easy way at first, but I noticed that: (a) I was easily mixing up columns and rows... and worse, not noticing it until it was too late; (b) Excel actually has a convention: Rows (numeric), Columns (Alphabetic: A through ZZ+?). Will users miss the convention or get lost when using our schema? Are there any problems with adopting a non-numeric identification scheme for our data vectors?
Yet Another Dimension: Excel Spreadsheets have MULTIPLE sheets. How would support for this convention change the design of your VECTORS? Engineers and scientists even push this limit to more than the three dimensions humans can see. How would that change things? If you tried it, did you find out if it imposed a limitation, or did it matter at all?
Stumbled Into This One...: My current DB_VECTOR table contains an extra VARCHAR value called "DETAILS". I found it a useful catch-bin for a miscellaneous, custom attribute that can be unique all the way down to the lowest (VECTOR ID/POINTER) level... or you can use it to create a custom label for an unusual collection of vectors that may not have an easily definable relation (like Excel's "Range Name" property)... What would you use it for?
If you're still with me... thanks. This one was a challenging thought exercise in database design. I have purposely left out fully expanded discussions on optimization and performance considerations for the sake of clarity... perhaps something to consider at a later time.
Best Wishes on Your Project.
Why not model tabular storage as a table? Just build the ALTER|CREATE|DROP TABLE statements ad hoc, and you can reap all the benefits of actually having a database server. Indexes and SQL come to mind.
Example schema:
CREATE TABLE Worksheets
(
WorksheetID int auto_increment primary key,
WorkbookID int not null,
Name varchar(256) not null,
TableName nvarchar(256) not null
);
CREATE TABLE Columns
(
ColumnID int auto_increment primary key,
WorksheetID int not null,
ColumnSequenceNo int not null,
Name varchar(256) not null,
PerceivedDatatype enum ('string', 'number') not null
);
-- Example of a dynamically generated data table:
-- Note: The number in the column name would correspond to
-- ColumnSequenceNo in the Columns table
CREATE TABLE `data_e293c71b-b894-4652-a833-ba817339809e`
(
RowID int auto_increment primary key,
RowSequenceNo int not null,
Column1String varchar(256) null,
Column1Numeric double null,
Column2String varchar(256) null,
Column2Numeric double null,
Column3String varchar(256) null,
Column3Numeric double null,
-- ...
ColumnNString varchar(256) null,
ColumnNNumeric double null
);
INSERT INTO Worksheets (WorkbookID, Name, TableName)
VALUES (1, 'Countries', 'data_e293c71b-b894-4652-a833-ba817339809e');
SET @worksheetID = LAST_INSERT_ID();
INSERT INTO Columns (WorksheetID, ColumnSequenceNo, Name, PerceivedDatatype)
VALUES (@worksheetID, 1, 'Country Name', 'string'),
(@worksheetID, 2, 'Population', 'number'),
(@worksheetID, 3, 'GDP/person', 'number');
-- example of an insert for a new row:
-- if the new data violates any perceived types, update them first
INSERT INTO `data_e293c71b-b894-4652-a833-ba817339809e` (
RowSequenceNo,
Column1String,
Column2String, Column2Numeric,
Column3String, Column3Numeric)
VALUES (
1,
'United States of America',
'3000000', 3000000,
'34500', 34500);
-- example of a query on the first column:
select *
from `data_e293c71b-b894-4652-a833-ba817339809e`
where Column1String like 'United%';
-- example of a query on a column with a numeric perceived datatype:
select *
from `data_e293c71b-b894-4652-a833-ba817339809e`
where Column3Numeric between 4000 and 40000;
Moral of the story is that you shouldn't fight the database server — use it to your advantage.
select `row`,
group_concat(if(field(`row`, 1), dd.value, null)) as row1,
group_concat(if(field(`row`, 2), dd.value, null)) as row2,
group_concat(if(field(`row`, 3), dd.value, null)) as row3
from db_record dr
left join db_dictionary dd on (dr.dictionary_id = dd.id)
where dr.database_id = 0
group by `column`
having row1 like '%biu%'
order by `row` asc;
My first impression is that you may be overthinking this quite a bit. I'm guessing that you wish to get a permutation of 3 or more player combinations across all db dictionaries (players). And the sqlfiddle suggests recording all these in the db_record table to be retrieved later on.
Using group_concat is pretty expensive, and so is the use of HAVING. When you view the original sqlfiddle's execution plan, the "Extra" column says:
Using where; Using temporary; Using filesort
"Using temporary; Using filesort" are indications of inefficiency around using temporary tables and having to hit the disk multiple times during filesort. The first execution time was 25ms (before it was cached, bringing that fictitiously down to 2ms on the second execution onwards)
As to the original question, creating a "database" inside the "application"? If you mean a flexible DB within a DB, you're probably overusing the relational DB. Try shifting some of the responsibilities out to the application-layer code (PHP?), yes, outside of the DB, and leave the relational DB to do what it's best at: relating relevant tables of data. Keep it simple.
After some thinking, I think I might have a solution, but I am not sure if it's the best one. Before running the query in the app, I already know how many columns that virtual "database" has, and since I know which column I need to search (column 3 in this example), I can build a query like this:
select `row`,
group_concat(if(field(`column`, 1), dd.value, null)) as column1,
group_concat(if(field(`column`, 2), dd.value, null)) as column2,
group_concat(if(field(`column`, 3), dd.value, null)) as column3
from db_record dr
left join db_dictionary dd on (dr.dictionary_id = dd.id)
where dr.database_id = 1
group by `row`
having column3 like '%biu%'
order by column1 asc;
So, in PHP I can add a group_concat(if(...)) expression for each column and add a HAVING clause to search.
But I would like to get some feedback about that solution if possible.
How can I insert multiple values into one row?
My query:
insert into table_RekamMedis values ('RM001', '1999-05-01', 'D01', 'Dr Zurmaini', 'S11', 'Tropicana', 'B01', 'Sulfa', '3dd1');
I can't insert two values into one row. Is there another way to do it?
I'm ignorant of the human language you use, so this is a guess.
You have two entities in your system. One is dokter, the other is script (prescription). Your requirement is to store zero or more scripts for each dokter. That is, the relationship between your entities is one-to-many.
In a relational database management system (SQL system) you do that with two tables, one per entity. Your dokter table will contain a unique identifier for each doctor, and the doctor's descriptive attributes.
CREATE TABLE dokter(
dokter_id BIGINT AUTO_INCREMENT PRIMARY KEY NOT NULL,
nama VARCHAR (100),
kode VARCHAR(10),
/* others ... */
);
And you'll have a second table for script
CREATE TABLE script (
script_id BIGINT AUTO_INCREMENT PRIMARY KEY NOT NULL,
dokter_id BIGINT NOT NULL,
kode VARCHAR(10),
nama VARCHAR(100),
dosis VARCHAR(100),
/* others ... */
);
Then, when a doctor writes two prescriptions, you insert one row in dokter and two rows in script. You make the relationship between script and dokter by putting the correct dokter_id into each script row.
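For example, a sketch of those inserts (the values are illustrative, loosely based on the question's data; LAST_INSERT_ID() captures the generated dokter_id):
INSERT INTO dokter (nama, kode) VALUES ('Dr Zurmaini', 'D01');
SET @dokter_id = LAST_INSERT_ID();

-- two prescriptions written by the same doctor
INSERT INTO script (dokter_id, kode, nama, dosis) VALUES
  (@dokter_id, 'B01', 'Sulfa', '3dd1'),
  (@dokter_id, 'B02', 'Anymiem', '2dd1');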
Then you can retrieve this information with a query like this:
SELECT dokter.dokter_id, dokter.nama, dokter.kode,
script.script_id, script.kode, script.nama, script.dosis
FROM dokter
LEFT JOIN script ON dokter.dokter_id = script.dokter_id
Study up on entity-relationship data design. It's worth your time to learn and will enhance your career immeasurably.
You can't store multiple values in a single field but there are various options to achieve what you're looking for.
If you know that a given field can only have a set number of values then it might make sense to simply create multiple columns to hold these values. In your case, perhaps Nama obat only ever has 2 different values so you could break out that column into two columns: Nama obat primary and Nama obat secondary.
But if a given field could have any amount of values, then it would likely make sense to create a table to hold those values so that it looks something like:
---------------------
| NoRM  | NamaObat  |
---------------------
| RM001 | Sulfa     |
| RM001 | Anymiem   |
| RM001 | ABC       |
| RM002 | XYZ       |
---------------------
And then you can combine that with your original table with a simple join:
SELECT * FROM table_RekamMedis JOIN table_NamaObat ON table_RekamMedis.NoRM = table_NamaObat.NoRM
The above takes care of storing the data. If you then want to query the data such that the results are presented in the way you laid out in your question, you could combine the multiple NamaObat fields into a single field using GROUP_CONCAT which could look something like:
SELECT GROUP_CONCAT(NamaObat SEPARATOR '\n')
...
GROUP BY NoRM
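Put together, a fuller sketch of that query might look like this (assuming NoRM is the join key, as in the join shown above):
SELECT table_RekamMedis.NoRM,
       GROUP_CONCAT(table_NamaObat.NamaObat SEPARATOR '\n') AS NamaObat
FROM table_RekamMedis
JOIN table_NamaObat ON table_RekamMedis.NoRM = table_NamaObat.NoRM
GROUP BY table_RekamMedis.NoRM;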
MySQL 5.7.24
Let's say I have 3 rows like this:
ID (PK) | Name (VARCHAR) | Data (JSON)
--------+----------------+-------------------------------------
1 | Admad | [{"label":"Color", "value":"Red"}, {"label":"Age", "value":40}]
2 | Saleem | [{"label":"Color", "value":"Green"}, {"label":"Age", "value":37}, {"label":"Hoby", "value":"Chess"}]
3 | Daniel | [{"label":"Food", "value":"Grape"}, {"label":"Age", "value":47}, {"label":"State", "value":"Sel"}]
Rule #1: The JSON column is dynamic, meaning not every record will have the same structure.
Rule #2: Assuming I can't modify the data structure
My question: is it possible to query so that I can get the ID of records where the Age is >= 40? In this case, 1 and 3.
Additional info (after being pointed to a duplicate): if you look at my data, the parent container is an array. If I store my data like
{"Age":"40", "Color":"Red"}
then I can simply use
Data->>'$.Age' >= 40
My current thinking is to use a stored procedure to loop over the array, but I hope I don't have to take that route. The second option is to use a regex (which I also hope to avoid). If you think "JSON search" is the solution, kindly point me to which function (or some sample for a noob like me). The documentation is too general for my specific needs.
Here's a demo:
mysql> create table letsayi (id int primary key, name varchar(255), data json);
mysql> insert into letsayi values
-> (1, 'Admad', '[{"label":"Color", "value":"Red"}, {"label":"Age", "value":"40"}]'),
-> (2, 'Saleem', '[{"label":"Color", "value":"Green"}, {"label":"Age", "value":"37"}, {"label":"Hoby", "value":"Chess"}]');
mysql> select id, name from letsayi
where json_contains(data, '{"label":"Age","value":"40"}');
+----+-------+
| id | name |
+----+-------+
| 1 | Admad |
+----+-------+
I have to say this is the least efficient way you could store your data. There's no way to use an index to search for your data, even if you use indexes on generated columns. You're not even storing the integer "40" as an integer — you're storing the numbers as strings, which makes them take more space.
Using JSON in MySQL when you don't need to is a bad idea.
Is it still possible to query age >= 40?
Not using JSON_CONTAINS(). That function is not like an inequality condition in a WHERE clause. It only matches exact equality of a subdocument.
To do an inequality, you'd have to upgrade to MySQL 8.0 and use JSON_TABLE(). I answered another question recently about that: MySQL nested JSON column search and extract sub JSON
In other words, you have to convert your JSON into a format as if you had stored it in traditional rows and columns. But you have to do this every time you query your data.
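For illustration only, a sketch of what such a JSON_TABLE() query could look like on MySQL 8.0 against the table above (the column names inside the COLUMNS clause are made up):
SELECT l.id, l.name
FROM letsayi AS l
CROSS JOIN JSON_TABLE(
  l.data, '$[*]'
  COLUMNS (
    label VARCHAR(32) PATH '$.label',
    val   VARCHAR(32) PATH '$.value'
  )
) AS jt
WHERE jt.label = 'Age'
  AND CAST(jt.val AS UNSIGNED) >= 40;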
If you need to use conditions in the WHERE clause, you're better off not using JSON. It just makes your queries much too complex. Listen to this old advice about programming:
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
— Brian Kernighan
how people tackle dynamically added form fields
You could create a key/value table for the dynamic form fields:
CREATE TABLE keyvalue (
user_id INT NOT NULL,
label VARCHAR(64) NOT NULL,
value VARCHAR(255) NOT NULL,
PRIMARY KEY (user_id, label),
INDEX (label)
);
Then you can add key/value pairs for each user's dynamic form entries:
INSERT INTO keyvalue (user_id, label, value)
VALUES (123, 'Color', 'Red'),
(123, 'Age', '40');
This is still a bit inefficient in storage compared to real columns, because the label names are stored every time you enter a user's data, and you still store integers as strings. But if the users are really allowed to store any labels of their own choosing, you can't make those real columns.
With the key/value table, querying for age > 40 is simpler:
SELECT user_id FROM keyvalue
WHERE label = 'Age' AND value >= 40
We're developing a monitoring system. In our system, values are reported by agents running on different servers. The observations reported can be values like:
A numeric value, e.g. "CPU USAGE" = 55 (meaning 55% of the CPU is in use).
A certain event was fired, e.g. "Backup completed".
A status, e.g. SQL Server is offline.
We want to store these observations (which are not known in advance and will be added dynamically to the system without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we wish to store is a number, we can use IntMeasure or FloatMeasure according to the type. If the value is a status, we can store the status literal string (or a status id if we decide to add a Statuses(id, name) table).
We suppose a more correct design is possible, but it would probably become too slow and obscure due to joins and dynamic table names depending on types. How would a join even work if we can't specify the tables in advance in the query?
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID Name Meaning
== =========== =======
M Measurement The value of some system metric (CPU_Usage).
E Event An event has been detected.
S Status A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
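For completeness, a minimal sketch of that lookup table (this DDL is not in the original answer; the column names follow the listing above and the FK reference below):
create table ObservationKinds(
  ID char( 1 ) primary key,
  Name varchar( 32 ) not null,
  Meaning varchar( 255 )
);

insert into ObservationKinds( ID, Name, Meaning ) values
  ( 'M', 'Measurement', 'The value of some system metric (CPU_Usage).' ),
  ( 'E', 'Event', 'An event has been detected.' ),
  ( 'S', 'Status', 'A change in a status has been detected.' );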
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
ID int identity primary key,
ObservationKind char( 1 ) not null,
DateEntered date not null,
...,
constraint FK_ObservationKind foreign key( ObservationKind )
references ObservationKinds( ID ),
constraint UQ_ObservationIDKind unique( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID ObservationKind Name Value ...
==== =============== ========= =====
1001 M CPU Usage 55.0 ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind value of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table and cannot exist at all in any other of the "kind" tables (Events, Status). This works the same way for all the kind tables.
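For illustration, a sketch of the two inserts that would produce that tuple (assuming 1001 happens to be the ID generated for the Observations row; the date is arbitrary):
insert into Observations( ObservationKind, DateEntered )
values( 'M', '2017-01-15' );

-- the child row must repeat the same ID and kind to satisfy the composite FK
insert into Measurements( ID, ObservationKind, Name, Value )
values( 1001, 'M', 'CPU Usage', 55.0 );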
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
Just create it as a VARCHAR
This will allow you to store whatever data you require in it. It is much more difficult, however, to do queries based on the number in the field, such as:
Select * from observations where MyVARCHARField > 50 -- get CPU > 50
However if you think you want to do this, then either you need a field per item or a generalised table such as
CREATE TABLE observations (
  Description VARCHAR(255),
  ValueType   VARCHAR(10),  -- can be 'String', 'Float', 'Int'
  ValueString VARCHAR(255),
  ValueFloat  FLOAT,
  ValueInt    INT
);
Then when you are filling the data you can put your value in the correct field and select like this.
Select Description, ValueInt from observations where Description like '%cpu%' and ValueInt > 50
I used two columns for a similar problem. The first column was for the data type and the second contained the data as a VARCHAR.
The first column had codes (e.g. 1 = integer, 2 = string, 3 = date and so on), which could be combined to compare values (e.g. find the max integer where type = 1).
I did not have joins, but I think you can use this approach. It will also help you if more data types are introduced tomorrow.
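A sketch of what that might look like (the table and column names here are made up for illustration):
-- find the largest integer value stored in the two-column layout
SELECT MAX(CAST(value_text AS SIGNED)) AS max_int
FROM typed_values
WHERE type_code = 1;  -- 1 = integer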
I've got two tables where I'm trying to insert data from one into the other. I've been able to find a few examples of how this can be accomplished on the web, but the problem is these examples mostly rely on identical table structure between the two... you see, I'm trying to insert some data from one table into another table with quite a different structure.
I'm trying to insert data from a table called 'catalog_product_entity_media_gallery' into a table called 'catalog_product_entity_varchar'. Below is a simple description of their structure
The 'catalog_product_entity_varchar' looks as follows:
value_id | entity_type_id | attribute_id | store_id | entity_id | value
PK INT INT INT INT VARCHAR
And the 'catalog_product_entity_media_gallery' table looks as follows:
value_id | attribute_id | entity_id | value
PK INT INT VARCHAR
I need to insert the entity_id and value columns from catalog_product_entity_media_gallery into catalog_product_entity_varchar. However, as you can see, the structure is quite different.
The query I'm trying to use is as follows
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4,
74,
0,
catalog_product_entity_media_gallery.entity_id,
catalog_product_entity_media_gallery.value
FROM catalog_product_entity_media_gallery;
I only need the entity_id and value from media_gallery, and the other values are always the same. I have tried to do this using the above, but it just hangs in MySQL (no errors).
I think it's due to the fact that I'm trying to select 4, 74 and 0 from catalog_product_entity_media_gallery but I'm not 100% sure (apologies, I'm a bit of a novice with MySQL)
Can anybody point me in the right direction? Is there any way I can insert some data from the media table whilst inserting static values for some columns? (I hope this all makes sense.)
The query syntax is ok.
However, there may be issues with the unique and foreign keys in the catalog_product_entity_varchar table, which don't allow you to insert data. Alternatively, the query may be waiting for some other query to complete (if your query is just a part of a bigger scenario), so it would be an issue with locking. The first case is the most probable.
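If you suspect the locking case, a quick sketch of how to check from a second MySQL session:
-- shows whether the INSERT ... SELECT is stuck waiting on another connection
SHOW FULL PROCESSLIST;

-- for InnoDB tables, the TRANSACTIONS section reveals lock waits in more detail
SHOW ENGINE INNODB STATUS;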
Currently, the question lacks important details:
The MySQL client / programming code you use to perform the query, so we are not able to see the case in full and reproduce it correctly.
The scenario you perform, i.e. whether you do it inside the Magento application in some module during a web request, or whether there are other queries in your script, some opened transactions, other people accessing the DB server, etc.
Based on the most probable assumption (that you just don't see the actual error with unique/foreign keys), you may try the following queries.
1) Unique index failure.
Try this:
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4 as etid,
74 as aid,
0 as sid,
catalog_product_entity_media_gallery.entity_id as eid,
catalog_product_entity_media_gallery.value as val
FROM
catalog_product_entity_media_gallery
GROUP BY
eid, aid, sid;
There is a strong possibility that you are inserting non-unique entries, because catalog_product_entity_media_gallery can hold multiple entries for the same product, while catalog_product_entity_varchar cannot. If the query above completes successfully, then the issue really is with the unique key. In that case you must re-verify what you want to achieve, because the initial aim (not the query itself) is wrong.
2) Wrong foreign key (non-existing attribute 74)
Try this (replacing ATTRIBUTE_CODE and ATTRIBUTE_ENTITY_TYPE_ID with the values you need, e.g. 'firstname' and 6):
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4 as etid,
eav_attribute.attribute_id as aid,
0 as sid,
gallery.entity_id as eid,
gallery.value as val
FROM
catalog_product_entity_media_gallery AS gallery
INNER JOIN
eav_attribute
ON
eav_attribute.attribute_code = '<ATTRIBUTE_CODE>'
AND eav_attribute.entity_type_id = <ATTRIBUTE_ENTITY_TYPE_ID>
GROUP BY
eid, aid, sid;
If it executes successfully AND
Some rows are added to catalog_product_entity_varchar: then it seems that 74 was the wrong id for the attribute you needed, and the foreign key in catalog_product_entity_varchar therefore didn't allow you to insert the records.
No rows are added to catalog_product_entity_varchar: then it seems you have mixed up the attribute id, attribute code or entity type. Recheck what you put as ATTRIBUTE_CODE and ATTRIBUTE_ENTITY_TYPE_ID.
If both queries still hang, then you have issues with your MySQL client, server, or execution scenario.
Note: your initial query may make sense in your specific case, but some issues are signalling that something may be wrong with your approach, because:
You're using direct numbers for ids. But ids are different for different installations and Magento versions. It is expected that you use more stable values, like the attribute code in the second query, from which you should extract the actual attribute id.
You copy data from the storage catalog_product_entity_media_gallery, which can store multiple entries for the same product, to the storage catalog_product_entity_varchar, which is able to store only one entry for the product. It means, that you cannot copy all the data in such a way. Probably, your query doesn't reflect the goal you want to achieve.
The entity type id inserted into catalog_product_entity_varchar is not related to the attribute id, while in Magento these are deeply connected things. Putting the wrong entity type id in a table will either make Magento behave incorrectly, or it won't notice your changes at all.
try this
INSERT INTO catalog_product_entity_varchar (entity_id, value)
SELECT entity_id, value
FROM catalog_product_entity_media_gallery
WHERE value_id = <row_id>;  -- the value_id of the row which has those values 4, 74, 0
Assuming the value_id in the catalog_product_entity_varchar table is an auto-increment, could you not do the following?
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, store_id, entity_id, value)
SELECT
4,
74,
catalog_product_entity_media_gallery.entity_id,
catalog_product_entity_media_gallery.value
FROM catalog_product_entity_media_gallery;
Note that the attribute_id column is not included in the column list of this insert.