I am about to design a database for a logging system.
Many of the string columns will have a limited set of values, but these are not known in advance:
like the names of the modules sending the messages, or the source hostnames.
I would like to store them as MySQL Enums to save space.
So the idea would be that the Enums grow as they encounter new values.
I would start with a column like:
host ENUM('localhost')
Then, in Java, I would load the ENUM values currently defined for the hostnames at startup (how do I do that with MySQL/JDBC?), and I would ALTER the ENUM whenever I encounter a new host.
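A sketch of what I have in mind: the current members could be read from information_schema and parsed on the Java side (the schema name logging and table name log are just placeholders):
SELECT COLUMN_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'logging'
  AND TABLE_NAME = 'log'
  AND COLUMN_NAME = 'host';
-- returns e.g. enum('localhost','web01')
-- and growing the ENUM later means re-listing every member:
ALTER TABLE log MODIFY host ENUM('localhost','web01','web02');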
Do you think it is feasible / a good idea?
Have you ever done something like that?
Thanks in advance for your advice.
Raphael
This is not a good idea; ENUM was not designed for that.
You can just create a separate table (host_id, host_name) and use a reference in the main table. Example:
CREATE TABLE `host` (
`host_id` INT(10) NOT NULL AUTO_INCREMENT,
`host_name` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`host_id`)
);
CREATE TABLE `log` (
`log_id` INT(10) NOT NULL AUTO_INCREMENT,
`host_id` INT(10) NULL DEFAULT NULL,
...
PRIMARY KEY (`log_id`),
INDEX `FK__host` (`host_id`),
CONSTRAINT `FK__host` FOREIGN KEY (`host_id`) REFERENCES `host` (`host_id`) ON UPDATE CASCADE ON DELETE CASCADE
);
UPD:
I think the best way to store the host is a varchar/text field. It is the easiest and fastest way, and I don't think you need to worry about the space.
Nonetheless:
Using a second table for hosts will reduce the size, but will complicate writing logs. Using ENUM complicates writing and significantly reduces performance.
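For completeness, writing a log row through the lookup table takes an extra step, roughly like this (a sketch; it assumes a unique index on host_name, which the DDL above does not declare):
ALTER TABLE `host` ADD UNIQUE KEY `uq_host_name` (`host_name`);
INSERT IGNORE INTO `host` (`host_name`) VALUES ('web01');
INSERT INTO `log` (`host_id`)
SELECT `host_id` FROM `host` WHERE `host_name` = 'web01';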
I'm revisiting my database and noticed I had some primary keys that were of type INT.
This wasn't unique enough, so I thought I would use a GUID.
I come from a Microsoft SQL background, and in SSMS you can choose the type "uniqueidentifier" and have it generated automatically.
In MySQL, however, I've found that you have to create triggers that execute on insert for the tables you want to generate a GUID for. Example:
Table:
CREATE TABLE `tbl_test` (
`GUID` char(40) NOT NULL,
`Name` varchar(50) NOT NULL,
PRIMARY KEY (`GUID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Trigger:
CREATE TRIGGER `t_GUID` BEFORE INSERT ON `tbl_test`
FOR EACH ROW BEGIN
SET new.GUID = uuid();
END;
Alternatively, you have to insert the GUID yourself in the backend.
I'm no DB expert, but I still remember that triggers cause performance problems.
The above is something I found here and it is 9 years old, so I was hoping something has changed?
As far as I can tell from the documentation, you can use uuid() as a column default starting with version 8.0.13, so something like this should work:
create table tbl_test (
guid binary(16) default (uuid_to_bin(uuid())) not null primary key,
name varchar(50) not null
);
This is pretty much copied from the documentation. I don't have a recent enough version of MySQL at hand to test this.
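If it does work, the stored value can be read back in text form with BIN_TO_UUID(), which is likewise an 8.0 addition:
select bin_to_uuid(guid) as guid, name from tbl_test;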
You can do a
INSERT INTO `tbl_test` VALUES (uuid(),'testname');
This generates a new UUID each time it is called.
Or you can use the more modern UUIDv4 via one of the functions in the link below instead of the standard uuid(); it is more random than the uuid() in MySQL:
How to generate a UUIDv4 in MySQL?
Since 8.0.13 you can use
CREATE TABLE t1 (
uuid_field VARCHAR(40) DEFAULT (uuid())
);
But you wanted more than just unique, and only built-in functions are allowed as column defaults, not user-defined ones such as a UUIDv4 generator; for that you still need the trigger.
As per the documentation, BINARY(x) adds some hidden padding bytes to the end of each entry, & VARCHAR(40) also wastes space by not being encoded directly in binary. Using VARBINARY(16) would be more efficient.
Also, more entropy (unguessability / security) per byte is available from RANDOM_BYTES(16) than standardized UUIDs, because they use some sections to encode constant metadata.
Perhaps the below will work for your needs.
-- example
CREATE TABLE `tbl_test` (
`GUID` VARBINARY(16) DEFAULT (RANDOM_BYTES(16)) NOT NULL PRIMARY KEY,
`Name` VARCHAR(50) NOT NULL
);
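When querying, HEX() gives a readable form of such a key; a small usage sketch:
SELECT HEX(`GUID`) AS guid_hex, `Name` FROM `tbl_test`;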
I am currently facing an issue with designing a database table and updating/inserting values into it.
The table is used to collect and aggregate statistics that are identified by:
the source
the user
the statistic
an optional material (e.g. item type)
an optional entity (e.g. animal)
My main issue is, that my proposed primary key is too large because of VARCHARs that are used to identify a statistic.
My current table is created like this:
CREATE TABLE `Statistics` (
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL)
In particular, the server_id is configurable, the player_id is a UUID, statistic is the representation of an enumeration that may change, material and entity likewise. The value is then aggregated using SUM() to calculate the overall statistic.
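For illustration, that aggregation looks roughly like this (a sketch against the table as defined above):
SELECT server_id, player_id, statistic, SUM(`value`) AS total
FROM Statistics
GROUP BY server_id, player_id, statistic;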
So far it works, but I have to use DELETE and INSERT statements whenever I want to update a value, because I have no primary key and I can't figure out how to create one within the constraints of MySQL.
My main question is: How can I efficiently update values in this table and insert them when they are not currently present without resorting to deleting all the rows and inserting new ones?
The main issue seems to be the restriction MySQL puts on the primary key. I don't think adding an id column would solve this.
Simply add an auto-incremented id:
CREATE TABLE `Statistics` (
statistics_id int auto_increment primary key,
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL
);
Voila! A primary key. But you probably want an index. One that comes to mind:
create index idx_statistics_server_player_statistic on statistics(server_id, player_id, statistic);
Depending on what your code looks like, you might want additional or different keys in the index, or more than one index.
Follow the below; I hope it will solve your problem:
- First, add a numeric column, let's call it "detailed", to your table.
- In your project, before running an INSERT statement, get the maximum of detailed (SELECT MAX(detailed)+1 AS maxid FROM TABLE_NAME) and use that number as an identifier, which will help you to FETCH and DELETE the record.
- You can also UPDATE with it, but during an UPDATE the maximum of detailed is not required.
I hope you understand this and that it helps you.
I have dug a bit more through the internet and optimized my code a lot.
I asked this question because of bad performance, which I assumed was because of the DELETE and INSERT statements following each other.
I was thinking that I could try to reduce the load by doing INSERT IGNORE statements followed by UPDATE statements, or INSERT ... ON DUPLICATE KEY UPDATE statements. But those require keys to be useful, which I didn't have access to because of the key-length constraints in MySQL.
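For reference, one way to get such a key past the length limit would have been a generated hash column over the natural key; a sketch I did not end up using (generated columns need MySQL 5.7+):
ALTER TABLE Statistics
  ADD COLUMN stat_hash BINARY(16)
    AS (UNHEX(MD5(CONCAT_WS('|', server_id, HEX(player_id), statistic,
        COALESCE(material, ''), COALESCE(entity, ''))))) STORED,
  ADD UNIQUE KEY uq_stat_hash (stat_hash);
-- after which the upsert becomes possible:
INSERT INTO Statistics (server_id, player_id, statistic, material, entity, `value`)
VALUES ('server-1', UNHEX('00112233445566778899AABBCCDDEEFF'), 'mine_block', NULL, NULL, 1)
ON DUPLICATE KEY UPDATE `value` = `value` + VALUES(`value`);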
I have fixed the performance issues though:
By reducing the number of statements generated asynchronously (I know JDBC is blocking, but it worked; it just blocked thousands of threads) and disabling auto-commit, I was able to improve the performance by a factor of 600 (from 60 seconds down to 0.1 seconds).
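At the SQL level, the auto-commit change amounts to wrapping the whole batch in a single transaction, roughly:
SET autocommit = 0;
-- ... thousands of INSERT statements ...
COMMIT;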
Next steps are to improve the connection string and gain even more performance.
I have a database design where I store image filenames in a table called resource_file.
CREATE TABLE `resource_file` (
`resource_file_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`resource_id` int(11) NOT NULL,
`filename` varchar(200) NOT NULL,
`extension` varchar(5) NOT NULL DEFAULT '',
`display_order` tinyint(4) NOT NULL,
`title` varchar(255) NOT NULL,
`description` text NOT NULL,
`canonical_name` varchar(200) NOT NULL,
PRIMARY KEY (`resource_file_id`)
) ENGINE=InnoDB AUTO_INCREMENT=592 DEFAULT CHARSET=utf8;
These "files" are gathered under another table called resource (which is something like an album):
CREATE TABLE `resource` (
`resource_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`resource_id`)
) ENGINE=InnoDB AUTO_INCREMENT=285 DEFAULT CHARSET=utf8;
The logic behind this design comes in handy if I want to assign a certain type of "resource" (album) to a certain type of "item" (product, user, project, etc.), for example:
CREATE TABLE `resource_relation` (
`resource_relation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`module_code` varchar(32) NOT NULL DEFAULT '',
`resource_id` int(11) NOT NULL,
`data_id` int(11) NOT NULL,
PRIMARY KEY (`resource_relation_id`)
) ENGINE=InnoDB AUTO_INCREMENT=328 DEFAULT CHARSET=utf8;
This table holds the relationship of a resource to a certain type of item like:
Product
User
Gallery
& etc.
I do exactly this by giving the module_code a value like "product" or "user" and assigning data_id to the corresponding unique id, in this case product_id or user_id.
So at the end of the day, if I want to query the resources assigned to a product with the id of 123, I query the resource_relation table: (very simplified pseudo-query)
SELECT * FROM resource_relation WHERE data_id = 123 AND module_code = 'product'
And this gives me the resources, for which I can then find the corresponding images.
I find this approach very practical, but I don't know if it is a correct approach to this particular problem.
What is the name of this approach?
Is it a valid design?
Thank you
This one uses super-type/sub-type. Note how the primary key propagates from the super-type table into the sub-type tables.
To answer your second question first: the table resource_relation is an implementation of an Entity-attribute-value model.
So the answer to the next question is, it depends. According to relational database theory it is bad design, because we cannot enforce a foreign key relationship between data_id and say product_id, user_id, etc. It also obfuscates the data model, and it can be harder to undertake impact analysis.
On the other hand, lots of people find, as you do, that EAV is a practical solution to a particular problem, with one table instead of several. Although, if we're talking practicality, EAV doesn't scale well (at least in relational products, there are NoSQL products which do things differently).
From which it follows, the answer to your first question, is it the correct approach?, is "Strictly, no". But does it matter? Perhaps not.
" I can't see a problem why this would "not" scale. Would you mind
explaining it a little bit further? "
There are two general problems with EAV.
The first is that small result sets (say DATA_ID=USER_ID) and big result sets (say DATA_ID=PRODUCT_ID) use the same query, which can lead to sub-optimal execution plans.
The second is that adding more attributes to the entity means the query needs to return more rows, whereas a relational solution would return the same number of rows, with more columns. This is the major scaling cost. It also means we end up writing horrible queries like this one.
Now, in your specific case perhaps neither of these concerns are relevant. I'm just explaining the reasons why EAV can cause problems.
"How would i be supposed to assign "resources" to for example, my
product table, "the normal way"?"
The more common approach is to have a different intersection table (AKA junction table) for each relationship, e.g. USER_RESOURCES, PRODUCT_RESOURCES, etc. Each table would consist of a composite primary key, e.g. (USER_ID, RESOURCE_ID), and probably not much else.
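A minimal sketch of one such intersection table, with hypothetical names (it assumes a product table exists; resource is as defined in the question):
CREATE TABLE `product_resource` (
`product_id` int(11) unsigned NOT NULL,
`resource_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`product_id`, `resource_id`),
CONSTRAINT `FK_pr_resource` FOREIGN KEY (`resource_id`) REFERENCES `resource` (`resource_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;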
The other approach is to use a generic super-type table with specific sub-type tables. This is the implementation which Damir has modelled. The normal use case for super-types is when we have a bunch of related entities which have some attributes, behaviours and usages in common, plus some distinct features of their own. For instance, PERSON and USER, CUSTOMER, SUPPLIER.
Regarding your scenario, I don't think USER, PRODUCT and GALLERY fit this approach. Sure, they are all consumers of RESOURCE, but that is pretty much all they have in common. So trying to map them to an ITEM super-type is a procrustean solution; gaining a generic ITEM_RESOURCE table is likely to be a small reward for the additional hoops you're going to have to jump through elsewhere.
"I have a database design where i store images in a table called resource_file."
You're not storing images; you're storing filenames. The filename may or may not identify an image. You'll need to keep database and filesystem permissions in sync.
Your resource_file table structure says, "Image filenames are identifiable in the database, but are unidentifiable in the filesystem." It says that because resource_file_id is the primary key, but there are no unique constraints besides that id. I suspect your image files actually are identifiable in the filesystem, and you'd be better off with database constraints that match that reality. Maybe a unique constraint on (filename, extension).
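Something like this, assuming (filename, extension) really is unique in your filesystem:
ALTER TABLE `resource_file` ADD UNIQUE KEY `uq_resource_file` (`filename`, `extension`);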
Same idea for the resource table.
For resource_relation, you probably need a unique constraint on either (resource_id, data_id) or (resource_id, data_id, module_code). But . . .
I'll try to give this some more thought later. It's kind of hard to figure out what you're trying to do with resource_relation, which is usually a red flag.
TARGET_RDBMS: MySQL-5.X-InnoDB ("X" equals current stable release)
BACKGROUND: I'm building my first database with true referential-integrity constraints. In an effort to get feedback after creating the "real" DDL, I've made an abstraction that I believe covers the "feel" of the database; this is only 3 tables of about 20, all with referential-integrity constraints. The only pattern I see missing is a composite-key table, which has no data to be loaded right now anyway, so I'm just focusing on the first iteration.
Sample Data / Unit Test: One thing I do not know is how to build out a sample data set that will offer 100% coverage of the referential integrity modeled -- AND how to build "Unit Tests" around that sample data and this DDL (see the sketch after the DDL below):
Sample DDL:
(Note: Just to be clear, the LEGEND and naming standards are JUST for this example, which I've abstracted from the "real" database. The column names are robotic in nature, and meant to make the meaning and relationship of a given instance as clear as possible. If you have suggestions on the notation system used, please feel free to comment. I'm open to any suggestions. Thanks!)
CREATE DATABASE sampleDB;
use sampleDB;
# ###############
# LEGEND
# - sID = surrogate key
# - nID = natural key
# - cID = common/shared across tables, but NOT unique/natural-key
# - PK = Primary Key
# - FK = Foreign Key
# - data01 = Sample data (non-key,not-shared-across-tables)
# - data02 = Sample data NOT NULL (non-key,not-shared-across-tables)
#
# - uID = user defined unique/natural key (NOTE: not used)
# ###############
# Behavior
# - create_timestamp (NOT NULL, updated on record creation, NOT update)
# - update_timestamp (NOT NULL, updated on record creation AND updates)
CREATE TABLE `TABLE_01` (
`TABLE_01_sID_PK` MEDIUMINT NOT NULL AUTO_INCREMENT,
`TABLE_01_cID` int(8) NOT NULL,
`TABLE_01_data01` varchar(128) default NULL,
`TABLE_01_data02` varchar(128) default NULL,
`create_timestamp` DATETIME DEFAULT NULL,
`update_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`TABLE_01_sID_PK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `TABLE_02` (
`TABLE_02_sID_PK` MEDIUMINT NOT NULL AUTO_INCREMENT,
`TABLE_02_nID_FK__TABLE_01_sID_PK` MEDIUMINT NOT NULL, # FK type must match the referenced MEDIUMINT PK
`TABLE_02_cID` int(8) NOT NULL,
`TABLE_02_data01` varchar(128) default NULL,
`TABLE_02_data02` varchar(128) NOT NULL,
`create_timestamp` DATETIME DEFAULT NULL,
`update_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`TABLE_02_sID_PK`),
FOREIGN KEY (TABLE_02_nID_FK__TABLE_01_sID_PK) REFERENCES TABLE_01(TABLE_01_sID_PK),
INDEX `TABLE_02_nID_FK__TABLE_01_sID_PK` (`TABLE_02_nID_FK__TABLE_01_sID_PK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `TABLE_03` (
`TABLE_03_sID_PK` MEDIUMINT NOT NULL AUTO_INCREMENT,
`TABLE_03_nID_FK__TABLE_01_sID_PK` MEDIUMINT NOT NULL,
`TABLE_03_nID_FK__TABLE_02_sID_PK` MEDIUMINT NOT NULL,
`TABLE_03_cID` int(8) NOT NULL,
`TABLE_03_data01` varchar(128) default NULL,
`TABLE_03_data02` varchar(128) NOT NULL,
`create_timestamp` DATETIME DEFAULT NULL,
`update_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`TABLE_03_sID_PK`),
FOREIGN KEY (TABLE_03_nID_FK__TABLE_01_sID_PK) REFERENCES TABLE_01(TABLE_01_sID_PK),
FOREIGN KEY (TABLE_03_nID_FK__TABLE_02_sID_PK) REFERENCES TABLE_02(TABLE_02_sID_PK),
INDEX `TABLE_03_nID_FK__TABLE_01_sID_PK` (`TABLE_03_nID_FK__TABLE_01_sID_PK`),
INDEX `TABLE_03_nID_FK__TABLE_02_sID_PK` (`TABLE_03_nID_FK__TABLE_02_sID_PK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SHOW TABLES;
# DROP DATABASE `sampleDB`;
# #######################
# View table definition
# DESC inserttablename;
# #######################
# View table create statement
# SHOW CREATE TABLE example;
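# Regarding the sample-data question above: a minimal set that exercises
# every foreign key might look like this (values made up; IDs assume a fresh
# auto-increment starting at 1):
INSERT INTO TABLE_01 (TABLE_01_cID, TABLE_01_data01) VALUES (1, 'alpha');
INSERT INTO TABLE_02 (TABLE_02_nID_FK__TABLE_01_sID_PK, TABLE_02_cID, TABLE_02_data02) VALUES (1, 1, 'beta');
INSERT INTO TABLE_03 (TABLE_03_nID_FK__TABLE_01_sID_PK, TABLE_03_nID_FK__TABLE_02_sID_PK, TABLE_03_cID, TABLE_03_data02) VALUES (1, 1, 1, 'gamma');
# A negative test; this should fail with a foreign-key error:
# INSERT INTO TABLE_02 (TABLE_02_nID_FK__TABLE_01_sID_PK, TABLE_02_cID, TABLE_02_data02) VALUES (999, 1, 'orphan');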
Questions:
Any and all feedback on missing, wrong, or "better" ways to do this database build are welcome. If you have questions, just comment -- and I'll respond ASAP. Again, thanks~!
UPDATE (1):
Just added "MEDIUMINT NOT NULL AUTO_INCREMENT" to the PKs -- not sure how I left that off.
First of all, I want to applaud you for defining a standard. There is no end to how much it will come to help you in the future.
Having said that, a couple of very subjective opinions from my part:
I don't like to embed type information in names, such as "TABLE_PERSON" or "PERSON_T" because it becomes confusing the second you replace a table with a view instead. At this point you could of course search and replace "PERSON_T" with "PERSON_VW" instead, but it kind of misses the point :)
The same goes for columns (although i can't see this in your example). Think of the "n_is_dead" column that gets changed from numeric to varchar.
Can a row exist in a table without being created (create_timestamp)? Declare columns as NOT NULL if they really can't be null. In fact, I start off having NOT NULL on most of my columns because it makes me think harder about the nature of the data.
I'm a fan of naming the primary key column something other than ID. For example
company(company_id, etc)
person(person_id, company_id, firstname etc)
I've heard some people have problems with O/R mappers that want you to have the primary key named "ID" at all times, but I don't know if this is still true or if this has changed recently.
It's not clear to me if you intended to embed (s, n, c) in the column names to indicate whether they are surrogate, natural or common keys. But I also don't think this is a good idea. I feel that would "reveal" some implementation detail that doesn't fit naturally in the logical model.
It looks like you are exposing/embedding the foreign-key relationship in the column names. I have never thought of this, but I think you will deeply regret it, if only because it makes the column names unbearably ugly :)
When choosing a name for an index: the only time I regret naming an index something is when I look at an execution plan and see "index_01" being used. I always wish I had put the column names in the index name to make them visible in the plan. I don't know the limit for an index name, but I always run into it on Oracle. So try to come up with some rule for how to abbreviate the table name. The column names are the important thing here.
Regarding mixed case. I always (no exceptions) go with either ALL_UPPER_CASE or all_lower_case. The reason is that in the past I've been burned when migrating queries between databases when they treat case differently. Lately, I use all_lower_case because the typical font of our editors makes it easier to spot spelling errors in lower case than in upper case. And when I fail at things, it doesn't seem like the editor is SHOUTING AT ME ;)