MySQL autogenerated id vs custom autogenerated id

I have to say that this is my first question here. Currently I have a table `journals` in a MySQL server with an ID autogenerated by AUTO_INCREMENT:
`id` int(10) NOT NULL AUTO_INCREMENT,
-- more columns
PRIMARY KEY (`id`)
I'm thinking about changing this ID to a custom one, because the client would like to generate the id in the following way:
ABC0000001, where `ABC` is the acronym of a department and `0000001` would be auto-incremented. We already have different acronyms for different departments.
I can get this ID with a trigger (a similar example: How to make MySQL table primary key auto increment with some prefix), but I have doubts about performance and efficiency, because I could get the same "result" with the initial solution plus another column to store the department's acronym. In the backend I would create one method to unify both columns before returning the result to the frontend, and another method to split a search key back into the two columns.
Has anybody faced a similar situation who could guide me on this issue in terms of performance or efficiency? Thanks in advance.

Creating custom-generated ids is fun, but masochistic. What is your subconscious goal?
AUTO_INCREMENT is well optimized; anything else is somewhere between slightly and seriously less efficient. For example, a trigger might double the cost of an INSERT; but if INSERTs are not a performance problem, then "so what".
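If INSERT overhead is the concern, the question's own alternative - a plain AUTO_INCREMENT id plus a department column, combined only at read time - stays entirely in SQL. A minimal sketch, assuming a 3-letter acronym and 7-digit numbering (table and column names are illustrative):
CREATE TABLE journals (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  dept CHAR(3) NOT NULL, -- department acronym, e.g. 'ABC'
  -- more columns
  PRIMARY KEY (id)
);
-- Build the display id at read time instead of storing it:
SELECT CONCAT(dept, LPAD(id, 7, '0')) AS display_id
FROM journals WHERE id = 42;
-- Split an incoming key such as 'ABC0000001' back into its parts:
SELECT * FROM journals
WHERE dept = SUBSTRING('ABC0000001', 1, 3)
  AND id = CAST(SUBSTRING('ABC0000001', 4) AS UNSIGNED);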
Semi-related: UUIDs are terribly inefficient for huge tables -- due to their randomness.

Related

MySQL query and insertion optimisation with varchar(255) UUIDs

I think this question has been asked in some way, shape, or form, but I couldn't find one that asked exactly what I wish to understand, so I thought I'd put the question here.
Problem statement
I have built a web application with a MySQL database of, say, customer records with an INT(11) id PK AI field and a VARCHAR(255) uuid field. The uuid field is not indexed nor set as unique. The uuid field is used as a public identifier, so it's part of URLs etc. - e.g. https://web.com/get_customer/[uuid]. This was done because the UUID is 'harder' to guess for a regular John Doe - but I understand that it is certainly not 'unguessable' in theory. The issue now is that as the database grows larger, I have observed that the query to retrieve a particular customer record is taking longer to complete.
My thoughts on how to solve the issue
The solution that comes to mind is to make the uuid field unique and also to index it. But I've been doing some reading in relation to this, and various blog posts and StackOverflow answers have described putting indexes on UUIDs as being really bad for performance. I also read that it will increase the time it takes to insert a new customer record, as the MySQL database will take time to find the correct location in which to place the record as part of the index.
The above mentioned https://web.com/get_customer/[uuid] can be accessed without having to authenticate, which is why I'm not using the id field for the same. It is possible for me to consider moving to integer-based UUIDs (I don't need the UUIDs to be universally unique - they just need to be unique for that particular table) - will that improve the indexing performance, and in turn the insertion and querying performance?
Is there a good blog post or information page on how to best set up a database for such a requirement - I need the ability to store a customer record which is 'hard' to guess, easy to insert, and easy to query in a large data set.
Any assistance is most appreciated. Thank you!
The received wisdom you mention about putting indexes on UUIDs only comes up when you use them in place of autoincrementing primary keys. Why? The entire table (InnoDB) is built behind the primary key as a clustered index, and bulk loading works best when the index values are sequential.
You certainly can put an ordinary index on your UUID column. If you want your INSERT operations to fail in the astronomically unlikely event that you get a random duplicate UUID value, you can use an index like this.
ALTER TABLE customer ADD UNIQUE INDEX uuid_constraint (uuid);
But duplicate UUIDv4s are very rare indeed. They have 122 random bits, and most software generating them these days uses cryptographic-quality random number generators. Omitting the UNIQUE index is, I believe, an acceptable risk. (Don't use UUIDv1, 2, 3, or 5: they're not hard enough to guess to keep your data secure.)
If your UUID index isn't unique, you save time on INSERTs and UPDATEs: they don't need to look at the index to detect uniqueness constraint violations.
Edit. When UUID data is in a UNIQUE index, INSERTs are more costly than they are in a similar non-unique index. Should you use a UNIQUE index? Not if you have a high volume of INSERTs. If you have a low volume of INSERTs it's fine to use UNIQUE.
This is the index to use if you omit UNIQUE:
ALTER TABLE customer ADD INDEX uuid (uuid);
To make lookups very fast you can use covering indexes. If your most common lookup query is, for example,
SELECT uuid, givenname, surname, email
FROM customer
WHERE uuid = :uuid
you can create this so-called covering index.
ALTER TABLE customer
ADD INDEX uuid_covering (uuid, givenname, surname, email);
Then your query will be satisfied directly from the index and therefore be faster.
There's always an extra cost to INSERT and UPDATE operations when you have more indexes. But the cost of a full table scan for a query is, in a large table, far far greater than the extra INSERT or UPDATE cost. That's doubly true if you do a lot of queries.
In computer science there's often a space / time tradeoff. SQL indexes use space to save time. It's generally considered a good tradeoff.
(There's all sorts of trickery available to you by using composite primary keys to speed things up. But that's a topic for when you have gigarows.)
(You can also save index and table space by storing UUIDs in BINARY(16) columns, using the UUID_TO_BIN() and BIN_TO_UUID() functions to convert them.)
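A minimal sketch of that BINARY(16) approach, assuming MySQL 8.0+ (where UUID_TO_BIN() and BIN_TO_UUID() are available; names are illustrative):
CREATE TABLE customer (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  uuid BINARY(16) NOT NULL,
  -- other columns
  PRIMARY KEY (id),
  INDEX uuid_idx (uuid)
);
-- The UUIDv4 itself would come from your application:
INSERT INTO customer (uuid)
VALUES (UUID_TO_BIN('f7f89fd6-0cc4-4fcf-ac54-00c478a05e23'));
SELECT BIN_TO_UUID(uuid) AS uuid
FROM customer
WHERE uuid = UUID_TO_BIN('f7f89fd6-0cc4-4fcf-ac54-00c478a05e23');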

create index on creation of table in Oracle

In MySQL, one can create an index on a (non-unique) column along with a table, e.g.
create table orders(
  orderid varchar(20) not null unique,
  customerid varchar(20),
  index(customerid)
);
Having not found a corresponding option in Oracle, i.e. a way to create the index at table creation rather than as a separate command afterwards, I suspect it is not possible. Is this correct? If so, what is the reason behind this - efficiency, as for example discussed in Insertion of data after creating index on empty table or creating unique index after inserting data on oracle?
Thanks in advance!
Other than indexes defined as part of a primary or unique constraint there does not appear to be a way to define an index as part of a CREATE TABLE statement in Oracle. Although the USING INDEX clause is part of the constraint-state element of the CREATE TABLE statement, a missing right parenthesis error is issued if you try to include a USING INDEX clause in any constraint definition except a PRIMARY or UNIQUE constraint - see this db<>fiddle for examples.
As to "why" - that's a question only someone on the architecture team at Oracle could answer. From my personal user-oriented point of view, I see no particular value to being able to create an index as part of the CREATE TABLE statement, but then I'm accustomed to how Oracle works and have my thought patterns oriented in that particular direction. YMMV.
The root reason for the difference is that MySQL and Oracle are two distinctly different products, developed at different times by different software engineering teams. The fact that MySQL is owned by Oracle means nothing in this case. MySQL was a separate and separately developed product which was subsequently purchased by Oracle. As for why the two separate and distinct design teams made the decisions they did ... you'd have to ask them. But I'm pretty certain it has nothing to do with operational efficiency as you suggest. Once a table and index are created, there is no difference between having created an index as part of the CREATE TABLE vs. creating the index separately. And so there would be no difference in efficiency of any DML on said table.
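For illustration, the Oracle equivalent of the MySQL example above is simply two statements (a sketch; the index name is arbitrary):
CREATE TABLE orders (
  orderid    VARCHAR2(20) NOT NULL UNIQUE,
  customerid VARCHAR2(20)
);
-- The non-unique index must be created separately:
CREATE INDEX orders_customerid_ix ON orders (customerid);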

Performance suggestions for a MySQL table definition

I am concerned with the performance of a database table I have to use to store data related to a customer survey application.
I have a database table storing customer responses from a survey. Since the survey questions change according to the customer, I thought that instead of defining the table schema with each question id as a column, I would define it as follows:
customerdata(customerid varchar,
  partkey varchar,
  questionkey varchar,
  value varchar,
  version int,
  lastupdate timestamp)
Where:
partkey: the shortcode of the part (part1, part2, ...)
questionkey: the shortcode of the question, e.g. age, gender, etc.
Since some customers fill in the survey twice, thrice, etc., I have added the version column.
With this design, customerid, partkey, questionkey and version are primary keys.
I am concerned about the performance of such a design. Should I define the other primary keys as indexes? Would that help? So far, for 30 customers, I have 7000 records. I expect to have a maximum of 300-500 customers. What do you think?
Sounds like a pretty small database. I doubt you'll have performance issues but if you detect any when querying on partkey, questionkey, or version later on you can always add one or more indexes to solve the problem at that time. There's no need to solve a performance problem you don't have and probably never will have.
Performance issues will arise only if you have to perform time-sensitive queries that don't use the customerid field as the primary filter. I suspect you'll have some queries like that (when you want to aggregate data across customers) but I doubt they'll be time-sensitive enough to be impacted by the one second or less response time I would expect to see from such a small collection of data. If they are, add the index(es) then.
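For instance, an across-customer aggregation on the table described above might look like this (a sketch using the question's column names):
SELECT questionkey, value, COUNT(*) AS responses
FROM customerdata
GROUP BY questionkey, value;
Without an index starting with questionkey, this scans the whole table - which is fine at this size.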
Also, note that a table only has a single PRIMARY KEY. That key can use more than one column, so you can say that the columns customerid, partkey, questionkey, and version are part of the PRIMARY KEY, but you can't say they're all "primary keys".
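For example, the survey table above would declare one composite PRIMARY KEY over the four columns, along these lines (a sketch; the VARCHAR lengths are assumptions):
CREATE TABLE customerdata (
  customerid  VARCHAR(36)  NOT NULL,
  partkey     VARCHAR(20)  NOT NULL,
  questionkey VARCHAR(20)  NOT NULL,
  value       VARCHAR(255),
  version     INT          NOT NULL,
  lastupdate  TIMESTAMP,
  PRIMARY KEY (customerid, partkey, questionkey, version)
);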
Row-count-wise, I have experienced MySQL databases with over 100,000 rows running just fine, so you should be okay.
Although it's a different case if you run complicated queries, which depend more on database design than on row counts.

mysql table structure proposal?

Is this table any good for MySQL? I wanted to make it flexible for this type of data storage in the future. With this table structure, you can't use a PRIMARY KEY, only an index ...
Should I change the format of the table to have the headers Primary Key, Width, Length, Space, Coupling ...?
ID_NUM  Param        Value
1       Width        5e-081
1       Length       12
1       Space        5e-084
1       Coupling     1.511
1       Metal Layer  M3-0
2       Width        5e-082
2       Length       1.38e-061
2       Space        5e-081
2       Coupling     1.5
2       Metal Layer  M310
No, this is a bad design for a relational database. This is an example of the Entity-Attribute-Value design. It's flexible, but it breaks most rules of what it means to be a relational database.
Before you descend into the EAV design as a solution for a flexible database, read this story: Bad CaRMa.
More specifically, some of the problems with EAV include:
You don't know what attributes exist for any given ID_NUM without querying for them.
You can't make any attribute mandatory, the equivalent of NOT NULL.
You can't use database constraints.
You can't use SQL data types; the value column must be a long VARCHAR.
Particularly in MySQL, each VARCHAR is stored on its own data page, so this is very wasteful.
Queries are also incredibly complex when you use the EAV design. Magento, an open-source ecommerce platform, uses EAV extensively, and many users say it's very slow and hard to query if you need custom reports.
To be relational, you should store each different attribute in its own column, with its own name and an appropriate datatype.
I have written more about EAV in my presentation Practical Object-Oriented Models in SQL and in my blog post EAV FAIL, and in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
What you suggest is called the EAV model (Entity–Attribute–Value).
It has several drawbacks, like severe difficulties in enforcing referential integrity constraints. In addition, the queries you'll have to come up with will be a bit more complicated than with a normalized table, as in your second suggestion (a table with the columns Primary Key, Width, Length, Space, Coupling, etc.).
So, for a simple project, do not use the EAV model.
If your plans are for a more complex project and you want maximum flexibility, do not use EAV either. You should look into 6NF (6th Normal Form), which is even harder to implement and certainly not an easy task in MySQL. But if you succeed, you'll have both: flexibility and normalization to the highest level (some people describe EAV as "6NF done wrongly").
In my experience, this idea of storing fields row-wise needs to be considered extremely carefully - although it seems to give many advantages, it makes many common tasks much more difficult.
On the positive side: It is easily extensible without changes to the structure of the database and in some ways abstracts the details of the data storage.
On the negative side: you need to look at all the everyday things that storing fields column-wise gives you automatically in the DBMS: simple inner/outer joins, one-statement inserts/updates, uniqueness, foreign keys and other db-level constraint checking, and simple filtering and ordering of search results.
Consider, in your architecture, a query to return all items with MetalLayer = X and Width between y and z, with the results sorted by Coupling then by Length. This query is much harder for you to construct and much, much harder for the DBMS to execute than it would be using columns to store specific fields.
On balance, the only time I have used a structure like the one you suggest was in a context where random unstructured additional data needed to be added on an ad-hoc basis. In my opinion it would be a last-resort strategy, for when there is no way to make a more traditional table structure work.
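To make that concrete, here is roughly what such a query looks like against the EAV layout from the question - one self-join per attribute, plus string-to-number conversion (a sketch; the table name eav_table and the bounds @y/@z are hypothetical):
SELECT w.ID_NUM
FROM eav_table w
JOIN eav_table m ON m.ID_NUM = w.ID_NUM
                AND m.Param = 'Metal Layer' AND m.Value = 'X'
JOIN eav_table c ON c.ID_NUM = w.ID_NUM AND c.Param = 'Coupling'
WHERE w.Param = 'Width'
  AND w.Value + 0 BETWEEN @y AND @z -- Value is stored as a string
ORDER BY c.Value + 0;
With one column per attribute, the same query would be a single-table SELECT with a plain WHERE and ORDER BY.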
A few things to consider here:
There is no single primary key. This can be overcome by making the primary key consist of two columns (as in the second example from Carl T).
The Param column is repeated; to normalize this you should look at the example from MGA.
Thirdly, the "Metal Layer" column is a string and not a float value like the others.
So it's best to go for a table definition like this:
create table Param(
  ID int primary key,
  Name varchar(20)
); -- created first so the foreign key below can reference it
create table yourTable(
  ID int primary key,
  ParamId int not null,
  Width float,
  Length float,
  Space float,
  Coupling float,
  Metal_layer varchar(20),
  Value varchar(20),
  foreign key(ParamId) references Param(ID)
);
The main question you have to ask when creating a table, especially one intended for future use, is how the data will be retrieved and what purpose it serves. Personally, I always have a unique identifier, usually an ID, in the table.
Looking at your list, you do not seem to have anything that uniquely defines the entries, so you will not be able to track duplicate entries nor uniquely retrieve a record.
If you want to keep this design you could create a composite primary key composed of the name and the param value.
CREATE TABLE testtest (
  ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  Name VARCHAR(100) NOT NULL,
  value DOUBLE NOT NULL
  /* add other fields here */
);
CREATE TABLE testtest (
  name VARCHAR(100) NOT NULL,
  value INT NOT NULL,
  /* add other fields here */
  PRIMARY KEY (name, value)
);
These CREATE TABLE statements express the two options mentioned above.

Scalable one to many table (MySQL)

I have a MySQL database, and a particular table in that database will need to be self-referencing, in a one-to-many fashion. For scalability, I need to find the most efficient solution possible. The two ways most evident to me are:
1) Add a text field to the table, and store a serialized list of primary keys there
2) Keep a linker table, with each row representing a single link.
In case #1, I see the table growing very very wide (using a spatial analogy), but in case #2, I see the linker table growing to a very large number of rows, which would slow down lookups (by far the most common operation).
What's the most efficient manner in which to implement such a one-to-many relationship in MySQL? Or, perhaps, there is a much saner solution keeping the data all directly on the filesystem somehow, or else some other storage engine?
Just keep a table for the "many", with a key column for the primary table.
I guarantee you'll have lots of other, more important problems to solve before you run into efficiency or capacity constraints in a standard industrial-strength relational DBMS.
IMHO the most likely second option (with numerous alternative products) is to use an ISAM-style engine.
If you need to do deep/recursive traversals into the data, a graph database like Neo4j (where I'm on the team) is a good choice. You'll find some information in the article Should you go Beyond Relational Databases? and in this post at High Scalability. For a use case that may be similar to yours, read this thread on MetaFilter. For information on language bindings and other things you may also find the Neo4j wiki and mailing list useful.
Not so much an answer, but a few questions and a possible approach ...
If you want to make the table self-referencing and only use one field, there are some options. A calculated, maskable 'join' field describes a way to associate many rows with each other.
The best solution will probably depend on the nature of the data and its relationships.
What is the nature of the data and lookups? What sort of relationship are you trying to contain? Association? Related? Parent/Children?
My first comment would be that you'll get better responses if you can describe how the data will be used (frequency of adds/updates vs. lookups, adds vs. updates, etc.) in addition to what you've already described. That being said, my first thought would be to just go with a generic representation:
CREATE TABLE IF NOT EXISTS one_table (
  `one_id` INT UNSIGNED NOT NULL AUTO_INCREMENT
    COMMENT 'The ID of the items in the one table',
  -- ... other data
  PRIMARY KEY (`one_id`)
);
CREATE TABLE IF NOT EXISTS many_table (
  `many_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
    COMMENT 'The ID of the items in the many table',
  `one_id` INT UNSIGNED NOT NULL
    COMMENT 'The ID of the item in the one table that this many item belongs to',
  -- ... other data
  PRIMARY KEY (`many_id`)
);
Making sure, of course, to create an index on one_id in both tables (the primary key already covers it in one_table; many_table needs an explicit index).
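A sketch of that index and a typical lookup, using the same hypothetical names:
-- one_id is already indexed in one_table via its primary key;
-- the many side needs an explicit index:
CREATE INDEX many_one_idx ON many_table (one_id);
-- Fetch all "many" rows belonging to one parent row:
SELECT m.*
FROM one_table o
JOIN many_table m ON m.one_id = o.one_id
WHERE o.one_id = 42;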