MySQL: UNIQUE constraint without index

Is it possible to add a constraint like
ALTER TABLE `t1` ADD UNIQUE(`col1`, `col2`);
without creating an index? The index wouldn't be used by any queries, so it would just waste space.
It wouldn't be a problem if inserts and updates became noticeably slower, because the table doesn't get updated very often.

No, this is not possible. A UNIQUE constraint contains an index definition, and it is hard to imagine how it could be implemented without creating an index (in DBMS terms).
You should realize that indexes are not just 'wizardry': they are a real data structure, which takes space to store and special procedures to maintain, etc. A unique constraint, in itself, means unique index entries, not just unique column values.
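As a quick check (a minimal sketch; the table and column names mirror the question and are otherwise made up), you can see the index MySQL creates behind the scenes:

CREATE TABLE t1 (
  col1 INT NOT NULL,
  col2 INT NOT NULL
);

ALTER TABLE t1 ADD UNIQUE (col1, col2);

-- Lists the index that now backs the UNIQUE constraint
SHOW INDEX FROM t1;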

Related

MySQL: Create a Foreign key without an Index

Is it possible to have a foreign key without an index in MySQL 5.6.34? I want this because I added a nullable column, with a foreign key to another table, to a table with 20M rows. As this is a new feature, only the new rows MAY have this column filled with an actual value, and, as you might expect, the cardinality of that index becomes horrible. So, most of the time, using that index is actually a bad idea. The problem: I have tons of queries that share this same restriction:
[...] from large_table where tenant_id = ? and nullable_foreign_key_with_index is null and [...]
The issue? MySQL thinks it's a good idea to use an index_merge/intersect strategy for query resolution. In this case MySQL does 2 lookups in parallel: one with tenant_id (which uses a valid and good index) and another with nullable_foreign_key_with_index, which is bad, almost a "full table scan in parallel", given that the cardinality of this index is <1000 in a table with >20M rows. More details about this "problem" here.
So, what are the possible solutions, given that MySQL "forces" a foreign key to have an index attached?
Drop the foreign key and the index. This is bad because, in the case of a bug in the app, we may compromise referential integrity.
SET FOREIGN_KEY_CHECKS=0; drop the index; SET FOREIGN_KEY_CHECKS=1. This is bad because, even though the foreign key still exists, MySQL no longer validates the column to check that the value actually exists. Is that a bug?
Use query hints in all existing queries to make sure that we only use the old and efficient "tenant_id_index". This is bad because I would have to hunt down all existing queries and also remember to add the hint whenever new queries are built.
So, how can I say: "MySQL, don't bother creating an index for this foreign key, but keep validating its content against the related table, which is indexed by primary key anyway"? Am I missing something? The best idea so far is to remove the foreign key and just trust that the app is working as expected, which it probably is, but this would start the classic discussion about having constraints in the APP vs the DATABASE. Any ideas?
For this query:
from large_table
where tenant_id = ? and
nullable_foreign_key_with_index is null and [...]
Just add the index large_table(tenant_id, nullable_foreign_key_with_index).
MySQL should use this index for the table.
I'm pretty sure you can also do this backwards. (I would be 100% sure if the comparison were to anything other than NULL, but I'm fairly confident MySQL does the right thing with NULL as well.)
large_table(nullable_foreign_key_with_index, tenant_id)
And MySQL will recognize that this index works for the foreign key and not create any other index.
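As a sketch (the index names are hypothetical; the table and column names come from the question):

-- Index ordered for the query (tenant_id first):
ALTER TABLE large_table
  ADD INDEX idx_tenant_fk (tenant_id, nullable_foreign_key_with_index);

-- Or reversed, so the same index can also satisfy the foreign key,
-- which needs the FK column as the leading column:
ALTER TABLE large_table
  ADD INDEX idx_fk_tenant (nullable_foreign_key_with_index, tenant_id);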
Q: How can I say: "MySQL, don't bother creating an index for this foreign key, but keep validating its content against the related table, which is indexed by primary key anyway"?
A: No can do. InnoDB requires a suitable index to support the enforcement of foreign key constraint.
Consider the flip side of it... if we are going to DELETE a row in the parent table, then InnoDB needs to check the foreign key constraint.
That means InnoDB needs to check the contents of the child table, to find rows that have a specific value in the foreign key column. Essentially equivalent to:
SELECT ... FROM child_table c WHERE c.foreign_key_col = ?
And to do that, InnoDB requires that there be an index on child_table that has foreign_key_col as the leading column.
The options suggested in the question (disabling or dropping the foreign key) will work, but only because InnoDB then isn't going to enforce the foreign key. As noted in the question, this means the foreign key isn't enforced, which defeats its purpose. The application code could be made responsible for enforcing referential integrity, or we could write some ug-gghhh-ly triggers (no, we don't want to go there).
As Gordon already noted in his (as usual excellent) answer... the problem isn't really dropping the index on the foreign key column. The actual problem is the inefficient execution plan. And the most likely fix for that is to make sure a more suitable index is available.
Composite indexes are the way to go. An index like this:
... ON child_table (foreign_key_col,tenant_id,...)
would satisfy the foreign key's requirement for an index with the foreign key column as a leading column. You can then drop the (now redundant) index on just the singleton foreign_key_col.
This index could also be used to satisfy the query that's using a horrible index merge access plan. (Verify with EXPLAIN.)
Also, consider adding columns (such as foreign_key_col) to the index that has tenant_id as the leading column:
... ON child_table (tenant_id,...,foreign_key_col,...)
and drop the redundant index on the singleton tenant_id col.
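Put together, the change might look like this (a sketch; the names of the existing singleton indexes are hypothetical, and the composite index must be added before the singleton FK index is dropped, since InnoDB always needs some index covering the FK column):

-- Add the composite index first so the FK constraint stays covered
ALTER TABLE child_table
  ADD INDEX idx_fk_tenant (foreign_key_col, tenant_id);

-- Now the singleton FK index is redundant and can go
ALTER TABLE child_table
  DROP INDEX idx_foreign_key_col;   -- hypothetical name

-- Same idea for the tenant_id side
ALTER TABLE child_table
  ADD INDEX idx_tenant_fk (tenant_id, foreign_key_col),
  DROP INDEX idx_tenant_id;         -- hypothetical name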
Summary: it is almost always better to have a composite index than to depend on "index merge intersect".
If both columns are tested with = (or IS NULL), it does not matter which order the columns are in the index definition. That is, cardinality is irrelevant.

Order of columns in a primary key, performance

I have a small question for performance reasons.
I'm working with Symfony and Doctrine. I have always used annotations in my entities and recently decided to switch to YML files.
So I exported all my entities and generated the YML files.
I compared the YML files with the database. A diff file was generated that drops the primary key on certain tables and then adds it back, simply with the columns in a different order. These primary keys have multiple columns.
It seems that this happens only when one of the columns is a foreign key.
The question is whether I can apply this change to my database and switch the order of the key columns, or whether it will hurt performance.
Primary keys in MySQL are implemented with unique indexes. Indeed, that's true for most, if not all, SQL dbms nowadays.
The order of columns in an index is significant. Changing the order can certainly change performance.
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
There might be a good reason for changing the order. See Using Foreign Key Constraints.
MySQL requires indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist. This index might be silently dropped later, if you create another index that can be used to enforce the foreign key constraint.
If your programs are putting the foreign key columns first in the new primary key, this might be the problem they're trying to solve. They're trying to avoid creating both an index on the primary key columns and an additional index on the foreign key columns alone.
That doesn't mean it won't hurt performance of particular queries, though.
There are at least two ways to test this. First, you can bring up a new database, connect your application to it, and run it. Does it seem fast enough?
Second, you can bring up a new database, and run some or all of your queries manually, using EXPLAIN.
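For example (a sketch; the table and column names are placeholders for your own schema), run the same statement against both versions and compare the key and rows columns of the output:

-- Same query, old key order vs. new key order
EXPLAIN SELECT *
FROM some_table
WHERE fk_col = 42
  AND other_col = 'x';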

Should primary keys always be added to an innodb table?

I have some InnoDB tables with only 2 int columns, which are foreign keys to the primary keys of other tables.
E.g. one table is user_items; it has 2 columns, userId and itemId, both foreign keys to the user and item tables, set to cascade on update or delete.
Should I add a 3rd column to such tables and make it a primary key, or is it better the way it is right now, in terms of performance or any other benefits?
Adding a third ID column just for the sake of adding an ID column makes no sense. In fact it simply adds processing overhead (index maintenance) when you insert or delete rows.
A primary key is not necessarily "an ID column".
If you only allow a single association between user and item (a user cannot be assigned the same item twice), then it does make sense to define (userId, itemId) as the primary key of your table.
If you do allow the same pair to appear more than once, then of course you don't need that constraint.
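In that case, the table could be declared like this (a sketch; it assumes the parent tables are named user and item and have an integer id primary key, per the question):

CREATE TABLE user_items (
  userId INT NOT NULL,
  itemId INT NOT NULL,
  PRIMARY KEY (userId, itemId),   -- one row per user/item pair, no extra ID column
  FOREIGN KEY (userId) REFERENCES user (id)
    ON UPDATE CASCADE ON DELETE CASCADE,
  FOREIGN KEY (itemId) REFERENCES item (id)
    ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;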
You already have a natural key {userId, itemId}. Unless there is a specific reason to add another (surrogate) key, just use your existing key as primary.
Some reasons for the surrogate may include:
Keeping child FKs "slimmer".
Elimination of child cascading updates.
ORM-friendliness.
I don't think that any of this applies to your case.
Also, please be aware that InnoDB tables are clustered, and secondary indexes in clustered tables are more expensive than secondary indexes in heap-based tables. So ideally, you should avoid secondary indexes whenever you can.
In general, if it adds no real complexity to the code you're writing and the table is expected to contain 100,000-500,000 rows or less, I'd recommend adding the primary key. I also sometimes recommend adding created_at and updated_at columns.
Yes, they require more storage -- but it's minimal. There's also the issue that the primary key index will have to be maintained and so inserts and updates may be slower if the table becomes large. But unless the table is large (100's of thousands or millions of rows) it will probably make no difference in processing speed.
So unless the table is going to be quite large, the space and processing speed impact are insignificant -- so you make the decision on how much effort it takes to maintain it and the potential utility it provides. If it takes very little extra code to do, then virtually any utility it provides might make it worthwhile.
One of the best reasons to have a primary key is to give the rows a natural order based on the order they were inserted. If you ever want to retrieve the last 100 (or first 100) rows added, it's very simple and fast if you have an auto-increment primary key on the table.
Adding created_at and updated_at columns can provide similar utility in terms of fetching data based on date ranges. Again, unless the number of rows is going to be very large, it may be worth evaluating these as well.
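For instance (a sketch; it assumes an AUTO_INCREMENT primary key column named id):

-- The 100 most recently inserted rows, cheap to fetch via the PK
SELECT *
FROM some_table
ORDER BY id DESC
LIMIT 100;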

What does "The indexes PRIMARY and id seem to be equal and one of them could possibly be removed." mean?

What does this mean and how can I fix it?
You have two separate indexes on the same field of your table (id). One of them is implied by having set id as a PRIMARY KEY; the other you probably created explicitly. Only one of them is needed, and having both may result in a performance drop due to the additional index updates.
Just drop one of them to resolve this issue.
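For example (a sketch; the duplicate index name is hypothetical, use whatever SHOW INDEX reports):

-- Confirm the duplicate, then drop it; the PRIMARY index stays
SHOW INDEX FROM your_table;
ALTER TABLE your_table DROP INDEX id;   -- 'id' here is the index name, not the column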
Having a PRIMARY KEY or UNIQUE constraint on a column (or field, if you wish) of a table essentially means that for each row inserted, the value of that column should be unique and therefore it should not already exist in the table. The naive approach would be to read all existing rows before inserting, but that would make the DB very slow once a number of rows has been inserted.
In order to deal with this, most (all?) decent database engines will implicitly create indexes for such fields, so that they can quickly detect if a value already exists in the table, without having to scan all its rows.
As a result, manually creating indexes on fields declared PRIMARY KEY or UNIQUE is not only redundant, but may also cause performance loss due to the duplication of the work needed to maintain the indexes.
It looks to me like it's saying that both of these indices have exactly the same properties and just have different key names. Having two such indices costs extra storage space as well as extra work on inserts, and there is no reason for it that I can think of. More detail on the topic can be found here:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
The PRIMARY one was created by phpMyAdmin, I believe, and it appears the other index was created manually; it just duplicates the same work.

Can a database table be without a primary key?

Can anyone tell me if a table in a relational database (such as MySQL / SQL SERVER) can be without a primary key?
For example, I could have a table day_temperature, where I register a temperature and a time. I don't see a reason to have a primary key for such a table.
Technically, you can declare such a table.
But in your case, the time should be made the PRIMARY KEY, since it's probably wrong to have different temperatures for the same time, and probably useless to record the same reading more than once.
Logically, each table should have a PRIMARY KEY so that you could distinguish two records.
If you don't have a candidate key in your data, just create a surrogate one (AUTO_INCREMENT, SERIAL, or whatever your database offers).
The only excuse for not having a PRIMARY KEY is a log or similar table that is subject to heavy DML, where having an index on it would impact performance beyond the level of tolerance.
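Adding such a surrogate is a one-liner; a sketch (the column name is just an example):

-- Works when the table has no primary key yet
ALTER TABLE day_temperature
  ADD COLUMN temp_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;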
As always, it depends.
A table does not have to have a primary key. It is much more important to have the correct indexes. How a primary key affects the indexes (i.e. whether it creates a unique index on the primary key column or columns) depends on the database engine.
However, in your case (and in 99% of other cases too), I would add a new auto-increment unique column like temp_id and make it a surrogate primary key.
It makes maintaining this table much easier, for example finding and removing records (such as duplicated records), and believe me, for every table there comes a time to fix things :(.
If the possibility of having duplicate entries (for example, for the same time) is not a problem, and you don't expect to have to query for specific records or ranges of records, you can do without any kind of key.
You don't need a PK, but it's recommended that you have one. It's the best way to identify unique rows. Sometimes you don't want an auto-increment int PK, but rather want to create the PK on something else. For example, in your case, if there's only one unique row per time, you should create the PK on the time column. It makes lookups based on time faster, plus it ensures that the rows are unique (you can be sure that data integrity isn't violated):
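Something along these lines (a sketch; the column types are guesses):

-- One temperature reading per timestamp; time is the natural key
CREATE TABLE day_temperature (
  time DATETIME NOT NULL,
  temperature DECIMAL(5,2) NOT NULL,
  PRIMARY KEY (time)
);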
Even if you do not add a primary key to an InnoDB table in MySQL, MySQL adds a hidden clustered index to that table. If you do not define a primary key, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
If the table has no primary key or suitable UNIQUE index, InnoDB internally generates a clustered index GEN_CLUST_INDEX on a synthetic column containing row ID values.
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
The time would then become your primary key. It will help index that column so that you can query data based on say a date range. The PK is what ultimately makes your row unique, so in your example, the datetime is the PK.
I would include a surrogate/auto-increment key, especially if there is any possibility of duplicate time/temperature readings. You would have no other way to uniquely identify a duplicate row.
I ran into the same question with one of the tables I designed.
The problem was that the PK was supposed to be composed of all of the table's columns. That works, but it means the table size grows very fast with each row inserted.
I chose not to have a PK, and to have only an index on the column I do the lookups on.
When you replicate a database in MySQL, a table without a primary key may cause replication delays.
http://lists.mysql.com/mysql/227217
The most common mistake when using ROW or MIXED is the failure to verify that every table you want to replicate has a PRIMARY KEY on it. This is a mistake because when a ROW event (such as the one documented above) is sent to the slave and neither the master's copy nor the slave's copy of the table has a PRIMARY KEY on the table, there is no way to easily identify which unique row you want replication to change.
Based on your answers I would consider three options:
Put a PK on both columns. This way, for each time there can be only one temp and vice versa. This solution allows multiple rows with the same temp or the same time; there just can't be two rows with the same temp AND time.
Don't put a PK at all, but do put a single unique index covering both columns (see the sketch after this list). This would allow NULLs in temp and time, but takes more space to maintain the index.
These two options are best for retrieval speed if you have heavy reads, but they lower the insert rate, since the indices have to be updated as well.
Don't put any index at all, nor a PK. This is best for inserts but very bad for searching; it is useful for logging, where retrieval is done by another mechanism or where the inserting device is not required to check for duplicates.
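To make the second option concrete (a sketch; the column types are assumptions):

-- No primary key, just one composite unique index; NULLs are allowed
CREATE TABLE day_temperature (
  time DATETIME NULL,
  temperature DECIMAL(5,2) NULL,
  UNIQUE KEY uq_time_temp (time, temperature)
);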
Also, it is very important to consider cardinality here and to think about the future consequences of using an auto-incremented number. If you're planning to do A LOT OF inserts, then even an auto-incremented unsigned BIGINT would be a risk, because it would eventually run out. In your example I guess you'll be saving data daily. For how long? It would be problematic if you saved a temp every minute, so I'll take that as an extreme example.
I guess it is best to think about what you need from the table. Are you doing "save-and-forget" for the entire year for the temp at every minute? Are you going to use this table frequently for real-time decision making in your business logic? I think it is best to segregate the data needed in real time (OLTP) from the long-term data that is required seldom and whose retrieval latency is allowed to be high (OLAP). It may even be worth duplicating the data into two different tables: one heavily indexed, which gets erased once in a while to control cardinality, and a second actually stored on a magnetic disk with almost no indices at all (it is possible to place a schema on a filesystem other than your main one).
I've got a better example of a table that doesn't need a primary key: a joiner table. Say I have a table with something called "capabilities", and another table with something called "groups", and I want a joiner table that tells me all the capabilities that all the groups might have, so it's basically:
CREATE TABLE capability_group (
  capability_id VARCHAR(32),
  group_id VARCHAR(32)
);
There is no reason to have a primary key on that, because you never address a single row: you either want all the capabilities for a given group, or all the groups for a given capability. It would be better to have a unique constraint on (capability_id, group_id), and separate indexes on both fields.
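With the suggested constraint and index it might look like this (a sketch; the index names are hypothetical):

CREATE TABLE capability_group (
  capability_id VARCHAR(32) NOT NULL,
  group_id VARCHAR(32) NOT NULL,
  -- no duplicate pairs; capability_id lookups can use this index too
  UNIQUE KEY uq_capability_group (capability_id, group_id),
  -- lookups by group need their own index
  KEY idx_group (group_id)
);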