Understanding MySQL's behaviour when adding an auto-incremented primary key afterwards - mysql

Let's say we have an (InnoDB) table associations in a MySQL database which has the following structure:
CREATE TABLE `associations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fk_id_1` int(11) NOT NULL,
`fk_id_2` int(11) NOT NULL,
`fk_id_3` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `some_unique_constraint` (`fk_id_1`,`fk_id_2`),
KEY `fk_id_2_INDEX` (`fk_id_2`),
KEY `fk_id_3_INDEX` (`fk_id_3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin$$
There are jumps in the column id (I know this is a consequence of how the auto-increment value is generated while multiple threads try to get one). Since no other table uses the column id as a reference, I plan to drop the column id and create it again, hoping the gaps in the numbering will be gone. I backed up my database and tested that. The result was a little confusing: the order of the rows seemed to have changed. If I am not mistaken, the order is now first by fk_id_1, then fk_id_2, then fk_id_3.
Is this the natural order in which MySQL arranges the table when assigning a newly generated auto-increment key to the rows?
Is there more I should know, that happened during this process?
The reason I need to know about this is that I need to make the column id useful for another task I intend to accomplish, where gaps are a no-go.

There is no natural order to a table in any mainstream RDBMS.
Only the outermost ORDER BY in a SELECT statement will guarantee the order of results.
If you want "order":
create a new table
INSERT..SELECT..ORDER BY fk_id_1, fk_id_2, fk_id_3
Drop old table
Rename new table
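The steps above might look like the following (a sketch using the table and column names from the question; untested against your schema, and it assumes no foreign keys point at the table, as the question states):

```sql
-- 1. Create an empty copy; the AUTO_INCREMENT counter starts fresh at 1
CREATE TABLE associations_new LIKE associations;

-- 2. Copy the rows in a deterministic order, omitting id so that
--    AUTO_INCREMENT assigns new, gap-free values
INSERT INTO associations_new (fk_id_1, fk_id_2, fk_id_3)
SELECT fk_id_1, fk_id_2, fk_id_3
FROM associations
ORDER BY fk_id_1, fk_id_2, fk_id_3;

-- 3. Swap the tables
DROP TABLE associations;
RENAME TABLE associations_new TO associations;
```

The ORDER BY in step 2 is what controls which row gets which new id; without it, the assignment order is whatever the SELECT happens to return.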
Or live with gaps... OCD isn't good for developers
Edit:
Question says "no dependency" on this value but turns out there is.
If gaps are not allowed then don't use autonumber and use fk_id_1, fk_id_2, fk_id_3 as your key, with a ROW_NUMBER emulation. Or code your downstream to deal with gaps.
Autonumbers will have gaps: immutable fact of life.
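For the ROW_NUMBER approach: a gap-free sequence can be derived at query time instead of being stored, for example (column names taken from the question; the pre-8.0 user-variable trick is a common but version-sensitive workaround):

```sql
-- MySQL 8.0+: a true window function
SELECT ROW_NUMBER() OVER (ORDER BY fk_id_1, fk_id_2, fk_id_3) AS rn,
       fk_id_1, fk_id_2, fk_id_3
FROM associations;

-- Pre-8.0 emulation: number the rows of an already-ordered derived table
SELECT (@rn := @rn + 1) AS rn, s.fk_id_1, s.fk_id_2, s.fk_id_3
FROM (SELECT fk_id_1, fk_id_2, fk_id_3
      FROM associations
      ORDER BY fk_id_1, fk_id_2, fk_id_3) AS s
CROSS JOIN (SELECT @rn := 0) AS init;
```

Because the number is computed per query, it can never develop gaps, no matter how many inserts fail or roll back.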

Related

Is it good practice to add primary keys to a table during the ALTER statement, or does it make no difference to add them when I am creating the table?

I created/defined an admin table; now I have seen other programmers alter the table and add keys to it.
CREATE TABLE `admin` (
`admin_id` int(11) NOT NULL AUTO_INCREMENT,
`admin_name` varchar(255) NOT NULL,
`admin_surname` varchar(255) NOT NULL,
`phone` CHAR(10) NOT NULL,
`admin_email` varchar(255) NOT NULL,
`password` varchar(255) NOT NULL,
PRIMARY KEY (`admin_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `admin`
ADD PRIMARY KEY (`admin_id`),
ADD UNIQUE KEY `admin_email` (`admin_email`);
If I have already defined the table why should I alter the definition again here?
In InnoDB a clustered index always exists.
When a primary key exists in a table, it is used as the clustered index.
When there is no primary key but there are unique indexes whose expressions do not include NULLable columns, the uppermost such unique index in the table definition is used as the clustered index.
When there is no such unique index, an internal hidden row number is used as the clustered index.
Hence, if you create a table (so some expression is chosen for the clustered index) and then use ALTER TABLE to add a primary key, the table must be rebuilt. That doesn't matter while the table is empty, but when there is data in it the process may take a long time (because the COPY algorithm is used).
If you create the primary key as part of CREATE TABLE, this is always fast.
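To illustrate the difference (hypothetical table names; the column list is invented for the example):

```sql
-- Primary key declared up front: the clustered index is right from the start
CREATE TABLE t_fast (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Primary key added later: InnoDB was clustering on a hidden row id,
-- so the ALTER must re-cluster (rebuild) the whole table --
-- cheap while empty, potentially very slow once it holds data
CREATE TABLE t_slow (
  id INT NOT NULL,
  name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

ALTER TABLE t_slow ADD PRIMARY KEY (id);
```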
I like to put all the index definitions inside the CREATE TABLE, and put them at the end instead of sitting on the column definitions.
Potential problem 1:
But I notice that some dump utilities like to add the indexes later. This may be a kludge to handle FOREIGN KEY definitions. Those have trouble if the tables are not created in just the right order.
It would seem better to simply ADD FOREIGN KEY... after all the tables are created and indexed.
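A sketch of that pattern (the child table admin_log and its columns are hypothetical, invented for illustration): create the tables without foreign keys, then wire them up once everything exists, so creation order no longer matters.

```sql
CREATE TABLE admin (
  admin_id INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (admin_id)
) ENGINE=InnoDB;

CREATE TABLE admin_log (
  log_id INT NOT NULL AUTO_INCREMENT,
  admin_id INT NOT NULL,
  PRIMARY KEY (log_id),
  KEY admin_id (admin_id)
) ENGINE=InnoDB;

-- Added last, after all tables (and their indexes) exist
ALTER TABLE admin_log
  ADD CONSTRAINT fk_log_admin
  FOREIGN KEY (admin_id) REFERENCES admin (admin_id);
```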
Potential problem 2:
If you will be inserting a huge number of rows, it is usually more efficient to build the secondary keys after loading the data rather than augmenting the indexes as each row is inserted. For small tables (under, say, a million rows), this is not a big deal.
I do not understand why they ADD PRIMARY KEY after loading the data. That requires (as Akina points out) tossing the fabricated PK, sorting the data, and adding the real PK. That seems like extra work, even for a huge table.
If the rows are sorted in PK order, the loading is more efficient. The table is ordered by the PK (for InnoDB); inserting in that order is faster than jumping around. (mysqldump will necessarily provide them in PK order, so it is usually a non-issue.)
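The load-then-index pattern might look like this (admin_bulk is a hypothetical name; the point is that only the PRIMARY KEY exists during the load):

```sql
-- Create with only the PRIMARY KEY so the bulk load touches one index
CREATE TABLE admin_bulk (
  admin_id INT NOT NULL AUTO_INCREMENT,
  admin_email VARCHAR(255) NOT NULL,
  PRIMARY KEY (admin_id)
) ENGINE=InnoDB;

-- ... bulk INSERT / LOAD DATA here, ideally already sorted by admin_id ...

-- Build the secondary index in one pass over the loaded data
ALTER TABLE admin_bulk ADD UNIQUE KEY admin_email (admin_email);
```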

MySQL serial vs auto-increment for id column

Disclaimer: I have only novice knowledge of and experience with databases.
I'm following a Laravel course on Laracasts, and in the database video, the instructor sets the ID column to a type of SERIAL. This is different to how I've seen this done in all other database tutorials, where they will usually check the A_I (auto-increment) checkbox, and this automatically makes the column primary, and leaves the type to be something like INT.
Hovering over the SERIAL type in PHPMyAdmin tells me that it's an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE, but is there ever a particular reason to prefer it over the settings that checking the A_I checkbox sets up? Does either way offer any advantages or disadvantages?
I did find this for PostgreSQL, indicating SERIAL is old and outdated, but I couldn't find an equivalent for MySQL and I'm unsure if the same applies to it.
I'm sure MySQL's SERIAL type was implemented to make it easy for folks who were accustomed to PostgreSQL to have one set of CREATE TABLE statements that would work on both brands of database, and do more or less the same thing.
In an old version of the MySQL manual, it was stated that SERIAL is a compatibility feature (without naming the brand it was intended to be compatible with). The language about compatibility was removed (see https://bugs.mysql.com/bug.php?id=7978).
Now that even PostgreSQL has changed its recommended practice and they use IDENTITY columns instead of SERIAL, the MySQL feature is really unnecessary.
There is no advantage to using SERIAL in MySQL. On the contrary, if you use it in a CREATE TABLE statement, you will see that the syntax isn't saved; it is just expanded to BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE, as documented.
I find that it's actually wasteful to do this, because I typically declare the auto-increment column as a PRIMARY KEY anyway, and this makes the UNIQUE redundant. So you end up with two unique indexes for no reason.
mysql> create table mytable (id serial primary key);
mysql> show create table mytable\G
*************************** 1. row ***************************
Table: mytable
Create Table: CREATE TABLE `mytable` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`) -- this one is superfluous
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
P.S. This question is almost but not quite a duplicate of What is the difference between SERIAL and AUTO_INCREMENT in mysql

This table does not contain a unique column. Grid edit, checkbox, Edit, Copy and Delete features are not available, 2020

Why does phpMyAdmin give me this warning, and lack of function, when selecting from a table named only in lowercase (and underscores) that does have a (single-column) primary key? I checked these elements after seeing this.
Specifically my query is
SELECT su.* FROM `r8u2d_comps_testsubitem` su
JOIN `r8u2d_comps_testitem` ti ON ti.id=su.testitemid
JOIN `r8u2d_comps_test` t ON ti.testid=t.id
WHERE t.id=241
ORDER BY ti.ordering
The table aliased as "su" has a column "id" (int(11), autoincrement) and a primary key using only this field. It looks to me like this query avoids all the restrictions listed in this answer, so what's the problem? Is it phpMyAdmin (my hosting company has 4.7.9, but I get the same problem locally with 5.0.4) or MySQL (host has 5.7.29-0ubuntu0.16.04.1 - (Ubuntu), I have 10.4.17-MariaDB - MariaDB Server, not strictly comparable I suppose).
Table structure
`id` INT NOT NULL AUTO_INCREMENT,
`testitemid` INT NOT NULL,
`marker` CHAR(20) NULL,
`text` TEXT NOT NULL,
`ordering` TINYINT NOT NULL,
PRIMARY KEY (`id`),
KEY `testitemid` (`testitemid`),
KEY `ordering` (`ordering`),
CONSTRAINT `subelementToElement`
FOREIGN KEY (`testitemid`) REFERENCES `#__comps_testitem`(`id`)
ON DELETE CASCADE
ON UPDATE NO ACTION
phpMyAdmin makes an effort to detect a primary/unique key for the purposes of enabling grid editing, but that detection logic doesn't hold up very well when the query contains multiple JOINs. It is difficult for the phpMyAdmin parser to work backwards through some queries and determine which columns come from which tables and whether there's a primary key that could be used for editing the data. I suppose the warning message could better be written as something like "This table or query does not contain a unique column, or your query is a join that obfuscates the original table structure enough that we don't want to risk damaging your data."
Unfortunately, aside from someone rewriting this part of phpMyAdmin, the best solution I can recommend right now is to find the data you want to modify through your JOIN query then open that individual table and scroll through the Browse view to (or use Search to find) the row you wish to modify from the table directly.

Duplicate row in database with Unique key constraint

I have the following table:
CREATE TABLE `some_table` (
`ReferenceId` int(11) DEFAULT NULL,
`ten` int(10) DEFAULT NULL,
`so` bigint(18) DEFAULT NULL,
`mc` bigint(18) DEFAULT NULL,
`ev` bigint(18) DEFAULT NULL,
`sclso` bigint(18) DEFAULT NULL,
`sowbse` bigint(18) DEFAULT NULL,
`AsOfDate` date DEFAULT NULL,
`dud` date NOT NULL,
UNIQUE KEY `ReferenceId` (`ReferenceId`,`AsOfDate`),
KEY `fk_main_table` (`ReferenceId`),
CONSTRAINT `fk_main_table` FOREIGN KEY (`ReferenceId`) REFERENCES `some_other_table` (`Id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
In this table I have added a multiple column UNIQUE index on columns ReferenceId and AsOfDate. But I just noticed that there is a duplicate entry in the table even when we have this constraint.
Check the 2 highlighted records. The constraint is applied on the first and second-last columns; their values are identical, which should not have been possible.
What could be the possible issues? The data in this table is not inserted/updated from any web/desktop application but only from 1 script that runs in background.
Edit: I have identified only this 1 duplicate, and the script has been running for the past 3 months.
Either one of two things is true:
You're mistaken
Your database is corrupt
To verify your assertion, write a query to show only the invalid condition:
select count(*) as N, ReferenceId, AsOfDate
from some_table
group by ReferenceId, AsOfDate
having count(*) > 1
(You can dispense with the unnecessary, nonstandard backtick-quotes, by the way. You'll find it makes SQL more pleasant to deal with.)
If that query produces any rows, your database is corrupt, by definition: the table cannot be declared unique on two columns and admit two rows of the same values. Find out what's wrong, and fix it.
If it doesn't produce any rows, it might still be corrupt, but that's evidence you're mistaken. You'll want to re-check your facts, and see if there's another explanation for what you're seeing. Get your hands on the verbatim SQL that produced that output (or is supposed to have done). Execute it, redirecting the output to a temporary table or file, and verify the duplication. If you don't find it, it's not there. If you do, see "corruption" in your friendly manual.
One last thing, just as an aside. This line:
KEY `fk_main_table` (`ReferenceId`),
is likely not doing you much good. You already have
UNIQUE KEY `ReferenceId` (`ReferenceId`,`AsOfDate`),
and your DBMS probably creates an index to enforce that, and probably uses that index to locate rows by ReferenceId.
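If you want to tidy that up, the single-column index can usually be dropped, because ReferenceId is the leftmost column of the composite UNIQUE key and InnoDB will let the foreign key lean on that instead (MySQL refuses the drop if no suitable index would remain):

```sql
-- The UNIQUE KEY (ReferenceId, AsOfDate) already serves leftmost-prefix
-- lookups on ReferenceId, so this index is redundant
ALTER TABLE some_table DROP INDEX fk_main_table;
```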

Table schema design while using innodb

I have encountered a problem when designing the table schema for our system.
Here is the situation:
our system has a lot of items (more than 20 million); each item has a unique id, but for each item there can be many records. For example, the item with id 1 has about 5000 records, and each record has more than 20 attributes. A record needs to be identified by its item's id and the status of one or more of its attributes for use in select, update or delete.
I want to use InnoDB.
But the problem is that with InnoDB there must be a clustered index.
Given the situation described above, it seems hard to find a natural clustered index, so I can only use an auto_increment int as the key.
The current design is as follows:
create table record (
item_key int(10) unsigned NOT NULL AUTO_INCREMENT,
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_key`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=latin1
the sql statement:
select * from record
where item_id=1 and attribute_1='a1' and attribute_2 between 10 and 1000;
the update and delete statement are similar.
I don't think this is a good design, but I can't think of anything else; all suggestions welcome.
Sorry if I didn't make the question clear.
What I want to access ( select, update, delete, insert) is the records, not the items.
The items have their own attributes, but in the description above, the attributes that I mentioned belong to the records.
Every item can have many records; for example, item 1 has about 5000 records.
Every record has 42 attributes, some of which can be NULL. Every record has a unique id; this id is unique among different items, but it is a string, not a number.
I want to access the records in this way:
A. I will only get (or update or delete) the records that belong to one specific item at one time or in one query
B. I will get or update the values of all attributes or some specific attributes in the query
C. The attributes in the condition of the query may not be the same as the attributes that I want to retrieve.
So there could be some SQL statements like:
Select attribute_1, attribute_N from record_table_1 where item_id=1 and attribute_K='some value' and attribute_M between 10 and 100
And the reasons that why I think the original design is not good are:
I can't choose an attribute or the record id as the primary key, because it would be of no use: in every query I have to supply the item id and some attributes as the query condition (like "where item_id=1 and attribute_1='value1' and attribute_2 between 2 and 3"), so I can only use an auto_increment int as the primary key. The result is that every query has to scan two B-trees, and the scan of the secondary index does not look efficient.
Also, compound keys seem useless, because the condition of the query can vary among many attributes.
With the original design it seems I have to add a lot of indexes to satisfy different queries, otherwise I have to deal with full table scans, but it is obvious that too many indexes is not good for update, delete and insert operations.
If you want a cluster index and don't want to use the myisam engine, it sounds like you should use two tables: one for the unique properties of the items and the other for each instance of the item (with the specified attributes).
You're right, the schema is wrong. Having attributes 1..20 as fields within the table is not the way to do this; you need a separate table to store this information. That table would have the item_key from this record together with its own key and a value, and therefore this second table would have indexes that allow much better searching.
Something like the following:
Looking at the diagram, it is obvious that something is wrong because the record table is too empty; it doesn't look right to me, so maybe I'm missing something in the original question....
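A hypothetical sketch of that two-table (attribute-per-row) layout, with all names invented for illustration since the original answer showed it as a diagram:

```sql
CREATE TABLE record (
  item_key INT UNSIGNED NOT NULL AUTO_INCREMENT,
  item_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (item_key),
  KEY item_id (item_id)
) ENGINE=InnoDB;

CREATE TABLE record_attribute (
  item_key INT UNSIGNED NOT NULL,          -- the record this value belongs to
  attribute_no TINYINT UNSIGNED NOT NULL,  -- which of the ~20 attributes
  value VARCHAR(255) NULL,                 -- the attribute's value
  PRIMARY KEY (item_key, attribute_no),
  KEY by_value (attribute_no, value)
) ENGINE=InnoDB;
```

The trade-off of this entity-attribute-value style is that conditions on several attributes become self-joins on record_attribute, so it buys indexing flexibility at the cost of more complex queries.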
Compound Keys
I think maybe you are looking to have a compound key rather than a clustered index, which is a different thing. You can achieve this by:
create table record (
item_id int(10) unsigned NOT NULL,
attribute_1 char(32) NOT NULL,
attribute_2 int(10) unsigned NOT NULL,
.
.
.
.
.
attribute_20 int(10) unsigned NOT NULL,
PRIMARY KEY (`item_id`,`attribute_1`,`attribute_2`),
KEY `iattribute_1` (`item_id`,`attribute_1`),
KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1