Order of columns in a primary key, performance - mysql

I have a small question about performance.
I'm working with Symfony and Doctrine. I have always used annotations in my entities and recently decided to switch to yml files.
So I exported all my entities and generated the yml files.
I compared the yml files with the database. A diff was generated that drops the primary key on certain tables and then adds it back, simply with the columns in a different order. These primary keys have multiple columns.
It seems that this happens only when one of the columns is a foreign key.
The question is: can I apply the change to my database and switch the order of the key columns, or will it affect performance?

Primary keys in MySQL are implemented with unique indexes. Indeed, that's true for most, if not all, SQL DBMSs nowadays.
The order of columns in an index is significant. Changing the order can certainly change performance.
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
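As a minimal sketch of the leftmost-prefix rule quoted above, assume a hypothetical table `order_item` (the table and column names are illustrative and do not come from the question):

```sql
-- Hypothetical table; all names are illustrative only.
CREATE TABLE order_item (
    order_id   INT NOT NULL,
    product_id INT NOT NULL,
    quantity   INT NOT NULL,
    PRIMARY KEY (order_id, product_id)
) ENGINE=InnoDB;

-- Filters on the leading column: the primary key can be used.
EXPLAIN SELECT * FROM order_item WHERE order_id = 42;

-- Filters on both columns: the primary key can be used.
EXPLAIN SELECT * FROM order_item WHERE order_id = 42 AND product_id = 7;

-- Filters only on product_id, which is not a leftmost prefix of the key,
-- so an index seek on the primary key is not expected here.
EXPLAIN SELECT * FROM order_item WHERE product_id = 7;
```

If the key were declared as `(product_id, order_id)` instead, the first and third queries would be expected to swap roles, which is why reordering the columns can matter.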
There might be a good reason for changing the order. See Using Foreign Key Constraints.
MySQL requires indexes on foreign keys and referenced keys so that
foreign key checks can be fast and not require a table scan. In the
referencing table, there must be an index where the foreign key
columns are listed as the first columns in the same order. Such an
index is created on the referencing table automatically if it does not
exist. This index might be silently dropped later, if you create
another index that can be used to enforce the foreign key constraint.
If your programs are putting the foreign key columns first in the new primary key, this might be the problem they're trying to solve. They're trying to avoid creating both an index on the primary key columns and an additional index on the foreign key columns alone.
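A rough illustration of that trade-off, assuming hypothetical `customer`, `note_a`, and `note_b` tables (none of these names come from the question):

```sql
-- Hypothetical parent table; names are illustrative only.
CREATE TABLE customer (
    id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

-- Variant A: the foreign key column is second in the primary key.
-- InnoDB needs an index that starts with customer_id, so it adds one.
CREATE TABLE note_a (
    seq         INT NOT NULL,
    customer_id INT NOT NULL,
    PRIMARY KEY (seq, customer_id),
    FOREIGN KEY (customer_id) REFERENCES customer (id)
) ENGINE=InnoDB;

-- Variant B: the foreign key column is first in the primary key.
-- The primary key itself can serve the foreign key check.
CREATE TABLE note_b (
    customer_id INT NOT NULL,
    seq         INT NOT NULL,
    PRIMARY KEY (customer_id, seq),
    FOREIGN KEY (customer_id) REFERENCES customer (id)
) ENGINE=InnoDB;

-- Compare the index lists of the two variants.
SHOW INDEX FROM note_a;
SHOW INDEX FROM note_b;
```

Variant A should show an extra secondary index on `customer_id`, while variant B should show only the primary key.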
That doesn't mean it won't hurt performance of particular queries, though.
There are at least two ways to test this. First, you can bring up a new database, connect your application to it, and run it. Does it seem fast enough?
Second, you can bring up a new database, and run some or all of your queries manually, using EXPLAIN.
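One way to set up the second test is sketched below. It assumes a hypothetical `membership` table standing in for one of the affected tables; substitute your own table and queries.

```sql
-- Hypothetical table standing in for one of the affected tables.
CREATE TABLE membership (
    user_id  INT NOT NULL,
    group_id INT NOT NULL,
    PRIMARY KEY (user_id, group_id)
) ENGINE=InnoDB;

-- Copy the structure, then swap the key order on the copy only.
CREATE TABLE membership_swapped LIKE membership;
ALTER TABLE membership_swapped
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (group_id, user_id);
INSERT INTO membership_swapped SELECT * FROM membership;

-- Run the same query against both and compare the type, key, and rows columns.
EXPLAIN SELECT * FROM membership         WHERE user_id = 1;
EXPLAIN SELECT * FROM membership_swapped WHERE user_id = 1;
```

Note that `CREATE TABLE ... LIKE` does not copy foreign key definitions, so the copy is only useful for comparing query plans, not for checking constraint behaviour.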

Related

MySQL: Create a Foreign key without an Index

Is it possible to have a foreign key without an index in MySQL 5.6.34? I want that because I created a nullable column in a table with 20M rows, with a foreign key to another table. As this is a new feature, only the new rows MAY have this column filled with an actual value, and as you may expect, the cardinality of that index becomes horrible. So, most of the time, using that index is actually a bad idea. The problem: I have tons of queries that share this same restriction:
[...] from large_table where tenant_id = ? and nullable_foreign_key_with_index is null and [...]
The issue? MySQL thinks that it's a good idea to use an index_merge/intersect strategy for query resolution. In this case MySQL would do 2 queries in parallel: one with tenant_id (which uses a valid and good index) and another one with nullable_foreign_key_with_index which is bad, almost a "full table scan in parallel" given that the cardinality of this index is <1000 in a table with >20M rows. More details about this "problem" in here
So, what are the possible solutions? Given that MySQL "forces" a foreign key to have an index attached:
Drop the foreign key and the index. This is bad, because in the case of a bug in the app we may compromise the referential integrity.
FOREIGN_KEY_CHECKS=0; Drop index; FOREIGN_KEY_CHECKS=1. This is bad, because even though the foreign key still exists, MySQL doesn't validate the column anymore to check whether the value actually exists. Is that a bug?
Use query hints in all existing queries to make sure that we are only using the old and efficient "tenant_id_index". This is bad because I have to hunt down all existing queries and also remember to use it again when new queries are built.
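For reference, the hint-based option would look roughly like the sketch below. The index name `tenant_id_index` is taken from the question; the selected column `id` and the name `fk_index` are assumptions made for illustration.

```sql
-- Force the known-good index (the selected column id is assumed).
SELECT id
FROM large_table USE INDEX (tenant_id_index)
WHERE tenant_id = 1
  AND nullable_foreign_key_with_index IS NULL;

-- Alternative: leave the optimizer free but exclude the weak index.
-- fk_index is a hypothetical name for the foreign key's index.
SELECT id
FROM large_table IGNORE INDEX (fk_index)
WHERE tenant_id = 1
  AND nullable_foreign_key_with_index IS NULL;
```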
So, how can I say: "MySQL, don't bother creating an index for this foreign key, but keep validating its content in the related table, which is indexed by primary key anyway". Am I missing something? The best idea so far is to remove the foreign key and just trust that the app is working as expected, which it probably is, but this would start a classic discussion about having constraints in APP vs DATABASE. Any ideas?
For this query:
from large_table
where tenant_id = ? and
nullable_foreign_key_with_index is null and [...]
Just add the index large_table(tenant_id, nullable_foreign_key_with_index).
MySQL should use this index for the table.
I'm pretty sure you can do this backwards (I would be 100% sure if the comparison were to anything other than NULL, but I'm pretty sure MySQL does the right thing with NULL as well.)
large_table(nullable_foreign_key_with_index, tenant_id)
And MySQL will recognize that this index works for the foreign key and not create any other index.
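A minimal sketch of the first suggestion, using the table and column names from the question (the index name `idx_tenant_fk` is made up for this example):

```sql
-- Index name is hypothetical; table and column names follow the question.
CREATE INDEX idx_tenant_fk
    ON large_table (tenant_id, nullable_foreign_key_with_index);

-- Check whether the plan now uses idx_tenant_fk instead of index_merge.
EXPLAIN
SELECT *
FROM large_table
WHERE tenant_id = 1
  AND nullable_foreign_key_with_index IS NULL;
```

Because `tenant_id` is the leading column here, this index does not satisfy the foreign key requirement by itself; only the reversed column order does.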
Q: How can I say: "MySQL, don't bother creating an index for this foreign key, but keep validating its content in the related table, which is indexed by primary key anyway"
A: No can do. InnoDB requires a suitable index to support the enforcement of a foreign key constraint.
Consider the flip side of it... if we are going to DELETE a row in the parent table, then InnoDB needs to check the foreign key constraint.
That means InnoDB needs to check the contents of the child table, to find rows that have a specific value in foreign key column. Essentially equivalent to
SELECT ... FROM child_table c WHERE c.foreign_key_col = ?
And to do that, InnoDB requires that there be an index on child_table that has foreign_key_col as the leading column.
The options suggested in the question (disabling or dropping the foreign key) will work because then InnoDB isn't going to enforce the foreign key. But as noted in the question, what this means is that the foreign key isn't enforced, which defeats the purpose of the foreign key. The application code could be responsible for enforcing referential integrity, or we could write some ug-gghhh-ly triggers (no, we don't want to go there).
As Gordon already noted in his (as usual excellent) answer... the problem isn't really dropping the index on the foreign key column. The actual problem is the inefficient execution plan. And the most likely fix for that is to make sure a more suitable index is available.
Composite indexes are the way to go. An index like this:
... ON child_table (foreign_key_col,tenant_id,...)
would satisfy the requirement of the foreign key, an index with the foreign key column as a leading column. And drop the (now redundant) index on just the singleton foreign_key_col.
This index could also be used to satisfy the query that's using a horrible index merge access plan. (Verify with EXPLAIN.)
Also, consider adding columns (such as foreign_key_col) to the index that has tenant_id as the leading column
... ON child_table (tenant_id,...,foreign_key_col,...)
and drop the redundant index on the singleton tenant_id col.
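The foreign-key-first variant could be sketched as follows. The index names `idx_fk_tenant` and `idx_fk` are assumptions; check `SHOW INDEX FROM child_table` for the real name of the existing single-column index.

```sql
-- Add the composite index with the foreign key column leading.
-- idx_fk_tenant is a hypothetical name.
ALTER TABLE child_table
    ADD INDEX idx_fk_tenant (foreign_key_col, tenant_id);

-- Once the composite index exists, the single-column index is redundant.
-- idx_fk is a hypothetical name for that existing index.
ALTER TABLE child_table
    DROP INDEX idx_fk;
```

Doing this in two steps keeps an index that can enforce the foreign key in place at every point.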
Summary: Almost always it is better to have a composite index instead of depending on "index merge intersect".
If both columns are tested with = (or IS NULL), it does not matter which order the columns are in the index definition. That is, cardinality is irrelevant.

mySQL table, primary key

I have a table which is used for two purposes. One purpose is a calculation that is done every 5 seconds by a PHP cron job. For this purpose I need a primary key that is a combination of about 5 fields. With this primary key the cron job runs really efficiently and really fast. The second purpose of the table is to retrieve data to display on the web page after a user signs in. For that purpose, a totally different primary key would be needed; the one that I use for cron makes it slow. I am tempted to create two tables with identical fields and data but with different primary keys. I know it would add a lot of overhead, but the website would be really quick and responsive. Is that something that would be recommended?
You can create an index on whatever combination of fields you need.
In general, I prefer an auto-incremented integer primary key on tables. Very useful.
You can have such a key and then build two more indexes on the other columns:
create index my_table_col1_col2_col3_col4_col5 on my_table(col1, col2, col3, col4, col5);
for the first index (my_table is a placeholder name, since table itself is a reserved word). If you like, you can make this a unique index and the database will then enforce uniqueness among rows for these five columns. Then you can create another index for surfing the table in another way:
create index my_table_col6_col7 on my_table(col6, col7);
This can be used for retrieval.
There is some overhead to maintaining the indexes on insert/update/delete operations. You would want to test in your environment to see if this is a problem (typically it is not).
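Put together, the layout described in this answer might look like the sketch below. The table name `readings`, the column types, and the index names are assumptions; only the `col1` to `col7` placeholders come from the answer.

```sql
-- Hypothetical table: one surrogate key plus two access paths.
CREATE TABLE readings (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    col5 INT NOT NULL,
    col6 INT NOT NULL,
    col7 INT NOT NULL,
    PRIMARY KEY (id),
    -- Access path for the cron calculation; also enforces uniqueness.
    UNIQUE KEY uq_calc (col1, col2, col3, col4, col5),
    -- Access path for the web page lookups.
    KEY idx_display (col6, col7)
) ENGINE=InnoDB;
```

This keeps a single copy of the data, so there is nothing to keep in sync between two tables.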

MySQL: UNIQUE constraint without index

Is it possible to add a constraint like
ALTER TABLE `t1` ADD UNIQUE(`col1`, `col2`);
without creating an index? The index wouldn't be used for any queries so it would be a waste of space.
It wouldn't be a problem if inserts and updates would be way slower, because the table doesn't get updated very often.
No, this is not possible. A UNIQUE constraint contains an index definition, and I can hardly imagine how it might be implemented without creating an index (in DBMS terms).
You should realize that indexes are not just 'wizardry': they are a real data structure, which takes up space and needs special procedures to maintain, etc. A unique constraint, itself, means unique index values, not unique column values.
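A quick way to see this for yourself, assuming a throwaway table `t1` with two integer columns (the column types are an assumption):

```sql
-- Throwaway table for the experiment; column types are assumed.
CREATE TABLE t1 (
    col1 INT NOT NULL,
    col2 INT NOT NULL
) ENGINE=InnoDB;

ALTER TABLE t1 ADD UNIQUE (`col1`, `col2`);

-- Should list an index covering col1 and col2 with Non_unique = 0.
SHOW INDEX FROM t1;
```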

Using Primary Keys as Index

In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?
Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.
Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates, fields that have no uniqueness at all are useless to index. The sweet spot is the cardinality in the middle (.5).
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
Primary keys will have an index only if you formally define them as primary keys. Where most people forget to create indexes is on foreign keys, which are generally not indexed automatically and will almost always be involved in joins, and thus should be indexed. Other candidates for indexes are things you frequently filter data on that have a large number of possible values, such as names, part numbers, start dates, etc.
1) Is it a good idea to make your primary key an index? (assuming the primary key is unique, an id)
All DBMSes I know of will automatically create an index underneath the PK.
In the case of MySQL/InnoDB, the PK will not just be indexed; that index will be the clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
2) how would I know what columns to index ?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed-up the SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow-down the INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
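A small sketch of that trade-off, using the `T` and `FOO` names from the example above (the other columns and the index name are assumptions):

```sql
-- Hypothetical table matching the names used above.
CREATE TABLE T (
    ID  INT NOT NULL PRIMARY KEY,
    FOO INT NOT NULL,
    BAR INT
) ENGINE=InnoDB;

-- The secondary index that speeds up lookups by FOO.
CREATE INDEX idx_t_foo ON T (FOO);

-- Reads: the plan should be able to use idx_t_foo.
EXPLAIN SELECT * FROM T WHERE FOO = 10;

-- Writes: each insert now has to maintain idx_t_foo as well as the primary key.
INSERT INTO T VALUES (1, 10, 100);
```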
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.
Your primary key will always be an index.
Always create indexes on columns that help to narrow the search; for example, if a column has only 3 distinct values among more than a thousand rows, it is a good sign to index it.

Index a mysql table of 3 integer fields

I have a mysql table of 3 integer fields. None of the fields have a unique value - but the three of them combined are unique.
When I query this table, I only search by the first field.
Which approach is recommended for indexing such table?
Having a multiple-field primary key on the 3 fields, or setting an index on the first field, which is not unique?
Thanks,
Doori Bar
Both. You'll need the multi-field primary key to ensure uniqueness, and you'll want the index on the first field for speed during searches.
You can have a UNIQUE Constraint on the three fields combined to meet your data quality standards. If you are primarily searching by Field1 then you should have an index on it.
You should also consider how you JOIN this table.
Your indexes should really support the bigger workload first - you will have to look at the execution plan to determine what suits you best.
The primary key will prevent your application from accidentally inserting dupe rows. You probably want that.
Order the columns in the PK correctly though or make an index on the first column clustered for better performance. Compare how the query runs (with the PK present) and with and without the index on the first column.
If you're using InnoDB, you must have a clustered index. If you don't specify one, MySQL will use one in the background anyway. So, you may as well use a clustered (unique) primary key by combining all three columns.
The primary key will also then prevent duplicates, which is a bonus.
If you're returning all three integer fields, then you'll have a covering index, which means that the database won't even have to touch the actual record. It will get everything it needs right from the index.
The only caveat would be inserts (and appends). Updating a clustered index, especially on multiple columns, does carry some performance penalty. It will be up to you to test and determine the best approach.
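A minimal sketch of the composite-primary-key approach, assuming a hypothetical table `triple` with columns `f1`, `f2`, and `f3` (the question does not name them):

```sql
-- Hypothetical table; the three columns together are unique.
CREATE TABLE triple (
    f1 INT NOT NULL,
    f2 INT NOT NULL,
    f3 INT NOT NULL,
    PRIMARY KEY (f1, f2, f3)
) ENGINE=InnoDB;

-- f1 is the leading column, so a search by f1 alone can use the primary key.
EXPLAIN SELECT f1, f2, f3 FROM triple WHERE f1 = 10;
```

With `f1` first in the key, a separate single-column index on `f1` would largely duplicate what the primary key already provides, so it is worth comparing plans before adding one.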