How to use index in my table correctly? - mysql

I realized that when I create foreign keys in a table, indexes are added automatically.
In my table:
CREATE TABLE `SupplierOrderGoods` (
`shopOrder_id` INT(11) NOT NULL,
`supplierGood_id` INT(11) NOT NULL,
`count` INT(11) NOT NULL,
PRIMARY KEY (`shopOrder_id`, `supplierGood_id`),
CONSTRAINT `FK_SupplierOrderGoods_ShopOrders` FOREIGN KEY (`shopOrder_id`) REFERENCES `shoporders` (`id`),
CONSTRAINT `FK_SupplierOrderGoods_SupplierGoods` FOREIGN KEY (`supplierGood_id`) REFERENCES `suppliergoods` (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
The index
INDEX `FK_SupplierOrderGoods_SupplierGoods` (`supplierGood_id`)
has been created automatically.
It is okay that the index has been created, as I found in another post. I looked up what indexes are used for and found that they are used to optimize searches in tables.
Now I know that I have to use indexes to optimize work with the database.
I also found that indexes can be composite (on several fields rather than one). In that case, I want to ask: should I use a composite index:
INDEX `FK_ShopOrders_SupplierGoods` (`shopOrder_id`, `supplierGood_id`),
or two simple indexes?
INDEX `FK_SupplierOrderGoods_SupplierGoods` (`supplierGood_id`),
INDEX `FK_SupplierOrderGoods_ShopOrders` (`shopOrder_id`),

I'm still learning about indexes myself, but I believe it's going to depend on what kind of data you will be querying the DB for.
For example, if you have a report for a certain record that will be run a lot, you'll want an index on it. If the report filters on just one column, make a one-column index; if it filters on two, like a first name and a last name, you'll probably want one index covering both.
You do not want to put an index on everything, though, as that can cause performance issues: both the record and the index need to be updated on every write. For tables with a high volume of inserts or updates, think about whether each index hurts or helps.
There's a lot of information to cover with indexes.
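For the table in the question specifically, here is a rough sketch of how the options compare. It assumes the usual InnoDB behavior that the PRIMARY KEY is itself an index and that a query can use a leftmost prefix of a multi-column index. The SELECT statements and the literal values are made-up examples, not taken from the question.

```sql
-- The composite PRIMARY KEY (shopOrder_id, supplierGood_id) is already an index.
-- Lookups by shopOrder_id alone, or by both columns, can use it:
EXPLAIN SELECT `count`
FROM SupplierOrderGoods
WHERE shopOrder_id = 10;

EXPLAIN SELECT `count`
FROM SupplierOrderGoods
WHERE shopOrder_id = 10 AND supplierGood_id = 42;

-- supplierGood_id is not the leftmost column of the primary key, so a lookup
-- by supplierGood_id alone relies on the automatically created index:
EXPLAIN SELECT shopOrder_id
FROM SupplierOrderGoods
WHERE supplierGood_id = 42;
```

Under those assumptions, an extra index on (shopOrder_id, supplierGood_id) or on (shopOrder_id) alone would duplicate what the primary key already provides, which is probably why only the supplierGood_id index was created automatically.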

Is this second multicolumn unique key in my MySQL statement redundant or does it improve performance?

I found this old code and I'm not sure if it's optimized or just doing something silly.
I have a SQL create statement like this:
CREATE TABLE `wp_pmpro_memberships_categories` (
`membership_id` int(11) unsigned NOT NULL,
`category_id` int(11) unsigned NOT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY `membership_category` (`membership_id`,`category_id`),
UNIQUE KEY `category_membership` (`category_id`,`membership_id`)
);
Is that second UNIQUE KEY there redundant with the PRIMARY KEY on the same 2 columns? Or would the second one help for queries that filter by the category_id first then by the membership_id? Is it being ignored?
I'm trying to remember why I coded it that way, way back when. Seems similar to what this comment is describing: https://dba.stackexchange.com/a/1793/245678
Thanks!
It depends on your query patterns. If you do SELECT, UPDATE, DELETE only on the category_id column, then the 2nd index makes sense but you should omit the membership_id column (redundant) and the UNIQUE constraint.
MySQL will automatically use the PRIMARY KEY index if you use either membership_id or both columns. It doesn't matter in which order these columns appear in your WHERE clauses.
The secondary index does improve performance when going from a "category" to a "membership".
You coded it with those two indexes because some queries start with a "membership" and need to locate a "category"; some queries go the 'other' direction.
That's a well-coded "many-to-many mapping table".
InnoDB provides better performance than MyISAM.
The "Uniqueness" constraint in the UNIQUE key is redundant.
Checking for Uniqueness slows down writes by a very small amount. (The constraint must be checked before finishing the update to the index's BTree. A non-unique index can put off the update until later; see "change buffering".)
I like to say this to indicate that I have some reason for the pair of columns being together in the index:
INDEX(`category_id`,`membership_id`)
I discuss the schema pattern here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
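Putting the points above together, the mapping table might look like the following sketch. The ENGINE clause is an assumption (the original statement does not specify one), and the two SELECTs with their literal values are illustrative only.

```sql
CREATE TABLE `wp_pmpro_memberships_categories` (
  `membership_id` int unsigned NOT NULL,
  `category_id` int unsigned NOT NULL,
  `modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  -- membership -> category direction; also enforces uniqueness of the pair
  PRIMARY KEY (`membership_id`, `category_id`),
  -- category -> membership direction; plain INDEX, since uniqueness is
  -- already guaranteed by the primary key
  INDEX `category_membership` (`category_id`, `membership_id`)
) ENGINE=InnoDB;

-- Can be served by the PRIMARY KEY:
SELECT category_id
FROM wp_pmpro_memberships_categories
WHERE membership_id = 5;

-- Can be served by the secondary index:
SELECT membership_id
FROM wp_pmpro_memberships_categories
WHERE category_id = 9;
```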

Optimising a query that uses index merge by intersection

I have a MySQL 8 database table accounts that has the following columns:
id (primary)
city_id (foreign key)
province_id (foreign key)
country_id (foreign key)
school_id (foreign key)
age (indexed)
EDIT: See bottom for complete table structure.
Now, imagine the following SQL query:
SELECT
COUNT(`id`) AS AGGREGATE
FROM
`accounts`
WHERE
`city_id` = 1
AND
`country_id` = 7
AND
`age` = 3
At 1 million records, this query becomes slow (~200ms).
When running EXPLAIN, I receive the following output:
id: 1
select_type: SIMPLE
table: accounts
partitions: NULL
type: index_merge
possible_keys: accounts_city_id_foreign,accounts_country_id_foreign,accounts_age_index
key: accounts_city_id_foreign,accounts_country_id_foreign,accounts_age_index
key_len: 9,2,9
ref: NULL
rows: 15542
filtered: 100.00
Extra: Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index
Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?
EDIT: In the future, the table will include more columns that will make it impossible to use a composite index as it will exceed the 16 column limit.
EDIT: Here's the complete table structure:
CREATE TABLE `accounts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`city_id` bigint unsigned DEFAULT NULL,
`school_id` bigint unsigned DEFAULT NULL,
`country_id` bigint unsigned DEFAULT NULL,
`province_id` bigint unsigned DEFAULT NULL,
`age` tinyint unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `accounts_city_id_foreign` (`city_id`),
KEY `accounts_school_id_foreign` (`school_id`),
KEY `accounts_country_id_foreign` (`country_id`),
KEY `accounts_province_id_foreign` (`province_id`),
KEY `accounts_age_index` (`age`),
CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);
Indexes are there to help your querying. So, as suggested by Marko and agreed by others, having an index on (city_id, country_id, age) should significantly help. Now, yes, you will add other columns to the table, but are you trying to filter on 16+ criteria??? I doubt it. And of the queries you would be running, even if you have multiple composite indexes to help optimize those queries, how many columns might you need at any single time? 4, 5, 6? After that, how granular do you plan on getting with your data? Country, State/Province, City, Town, Village, Neighborhood, Street, House? By the time you are that low in the data, you would be at the page-level data anyhow, wouldn't you?
So, your query of Country = 7, that already chops off a ton of stuff. Then to a given city within that country? Great, now you are at a finite level.
If you are going to be running queries against large data that require aggregations, and the data is rather fixed from a historical perspective, maybe having pre-aggregated tables by some common elements might help long term.
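A minimal sketch of the pre-aggregated table idea, under stated assumptions: the table name accounts_counts and the column account_count are hypothetical, and the summary would need to be refreshed (or maintained incrementally) whenever accounts changes, which this sketch does not handle.

```sql
-- Hypothetical summary table: one row per (country, city, age) combination.
CREATE TABLE accounts_counts (
  country_id    BIGINT UNSIGNED  NOT NULL,
  city_id       BIGINT UNSIGNED  NOT NULL,
  age           TINYINT UNSIGNED NOT NULL,
  account_count INT UNSIGNED     NOT NULL,
  PRIMARY KEY (country_id, city_id, age)
) ENGINE=InnoDB;

-- Populate it from the base table. Rows with NULL in any of the three
-- columns are skipped; an equality filter would never match them anyway.
INSERT INTO accounts_counts (country_id, city_id, age, account_count)
SELECT country_id, city_id, age, COUNT(*)
FROM accounts
WHERE country_id IS NOT NULL
  AND city_id IS NOT NULL
  AND age IS NOT NULL
GROUP BY country_id, city_id, age;

-- The count from the question becomes a single primary-key lookup:
SELECT account_count
FROM accounts_counts
WHERE country_id = 7 AND city_id = 1 AND age = 3;
```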
FEEDBACK
The performance of querying is not necessarily where you will be hit; it would be in the inserts, updates, and deletes, as whatever changes has to update all the indexes on the table, single or composite. If you are getting more than 5 columns in an index, ask yourself, really??? How granular does the index need to be to be optimized? Querying the data should be very fast with proper indexes. Updating indexes is also quick, but what if you are dealing with millions of inserts in a month, quarter, or year? The user doing theirs may see a slight delay (1/4 second?), but a million of those delays adds up. But again, over what period of time would the inserts/updates/deletes be done anyhow?
You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.
You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.
You don't have to add ALL the columns to the composite index!
Index design is not magic. It follows rules. You will create indexes designed to support specific queries that you need to run. You don't add columns to an index unless they help the given query. You may have multiple composite indexes in the table, created to help different queries.
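As a rough illustration of "one index per query pattern": the first index below targets the query from the question, while idx_school_age and the query pattern it serves are hypothetical, added only to show that several narrow composite indexes can coexist on the same table.

```sql
-- Supports the query from the question (equality on all three columns):
CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);

-- Hypothetical second query pattern, e.g. WHERE school_id = ? AND age = ?:
CREATE INDEX idx_school_age ON accounts (school_id, age);

-- Check which index the optimizer actually picks for the original query:
EXPLAIN SELECT COUNT(`id`) AS aggregate
FROM accounts
WHERE city_id = 1 AND country_id = 7 AND age = 3;
```

If the composite index is chosen, the EXPLAIN output should no longer report an index_merge intersection for this query, though the optimizer's choice depends on the data and statistics.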
You might like my presentation How to Design Indexes, Really (or the video).
Re your comment:
"I won't know every possible query combination ahead of time."
Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.
In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.

How to speed up a highly active big data table (MySQL)?

I'll begin to try and explain my problem and what I meant with the title.
Currently I have a table with around 8 million rows.
This table is highly active, meaning there are constant updates, inserts and deletes.
These are caused by users (it's like a collecting game). Meaning I also need to make sure the data is accurately displayed.
I've looked so far into:
indexing
partitioning
sharding
mapreduce
optimize
I applied indexing, but I'm not sure I applied it correctly, and it doesn't seem to help as much as I expected.
As I said, my table is highly active, so if I added partitioning to this table, there would be additional inserts/deletes, and the process would become more complex than I can understand. I do not have that much experience with databases.
Sharding this database is way too complex for me and I only have one service I can run this database on, so this option is a no-go.
As for mapreduce, I am not entirely sure what it does, but as far as I understand, it has more to do with the code than with the database.
I applied optimize, but in my experience it didn't seem to have much effect either.
I have tried not to use * in SELECT statements, and I got rid of most DISTINCT, COUNT and similar SQL features, so that these wouldn't affect the speed of the database.
However, even after narrowing down the data in each table, and specifically this table, it's currently slower than it was before.
This table consists of:
CREATE TABLE `claim` (
`global_id` bigint NOT NULL AUTO_INCREMENT,
`fk_user_id` bigint NOT NULL,
`fk_series_id` smallint NOT NULL,
`fk_character_id` smallint NOT NULL,
`fk_image_id` int NOT NULL,
`fk_gif_id` smallint DEFAULT NULL,
`rarity` smallint NOT NULL,
`emoji` varchar(31) DEFAULT NULL,
PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
KEY `fk_claim_character_id` (`fk_character_id`),
KEY `fk_claim_image_id` (`fk_image_id`),
KEY `fk_claim_series_id` (`fk_series_id`),
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
KEY `fk_claim_gif_id` (`fk_gif_id`) /*!80000 INVISIBLE */,
KEY `fk_claim_rarity` (`rarity`) /*!80000 INVISIBLE */,
KEY `fk_claim_emoji` (`emoji`),
CONSTRAINT `fk_claim_character_id` FOREIGN KEY (`fk_character_id`) REFERENCES `character` (`character_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_claim_image_id` FOREIGN KEY (`fk_image_id`) REFERENCES `image` (`image_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_claim_series_id` FOREIGN KEY (`fk_series_id`) REFERENCES `series` (`series_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_claim_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=7622452 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Is there possibly another solution to speed up the database? If so, how? I'm currently at my wits' end and stuck on it. The database needs to respond preferably within 300ms.
EXAMPLE SLOW QUERIES:
SELECT PK FROM <table> WHERE fk_user_id = ?;
SELECT PK FROM <table> WHERE fk_user_id = ? GROUP BY fk_character_id HAVING MAX(fk_character_id) = 1;
SELECT PK, fk_user_id, fk_character_id, etc, etc, etc FROM <table> WHERE fk_user_id = ? ORDER BY PK ASC LIMIT 0, 20
Redundant
PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
A PRIMARY KEY, in MySQL, is a UNIQUE KEY. So the UNIQUE KEY is redundant, wastes disk space, and slows down INSERT.
Need VISIBLE index starting with user_id for Q1 and Q2
Replace this
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
with
INDEX(fk_user_id, fk_character_id)
in that order -- this will help with your first 2 queries.
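One way to apply that replacement, as an untested sketch: it assumes the table in the example queries is `claim`, and the index name idx_claim_user_character is made up. The index is added before the old one is dropped so that the foreign key on fk_user_id always has a usable index.

```sql
-- Add the composite index first...
ALTER TABLE claim
  ADD INDEX idx_claim_user_character (fk_user_id, fk_character_id);

-- ...then drop the invisible single-column index it replaces.
ALTER TABLE claim
  DROP INDEX fk_claim_user_id;

-- Alternative, smaller change (MySQL 8.0): make the existing index visible again.
-- ALTER TABLE claim ALTER INDEX fk_claim_user_id VISIBLE;
```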
Query 3
The 3rd query may still need (in the given order)
INDEX(fk_user_id, global_id)
If you need some of the DISTINCTs/COUNTs, let's see them. Changing indexes may help.
Strange query
As for
SELECT PK FROM <table> WHERE fk_user_id = ?;
Why would you just want the PK? Is global_id useful by itself? Or is it useful only for looking up something else? If the latter, let's see it; it is often more practical to optimize a single, complex, query than two queries that are artificially split.
Tuning
How much RAM is available to MySQL? What is the value of innodb_buffer_pool_size? 30s for 50K rows -- sounds like being I/O-bound. Maybe that setting is too low.
In some cases, DISTINCT speeds up a query -- if for no other reason than that less data is shoveled back to the client.
Redesign PK
Based on the names "claim" and "user_id" and the test for "user_id" in all 3 queries, I deduce that you are frequently looking up stuff for a single "user"? What, if anything, is global_id needed for outside this table?
If you need global_id elsewhere or nothing else could be used for uniqueness, do
PRIMARY KEY(fk_user_id, global_id), -- for locality of reference
INDEX(global_id) -- to keep AUTO_INCREMENT happy
If (fk_user_id, xx) is known to be unique (for some column(s) xx), toss global_id and change to
PRIMARY KEY(fk_user_id, xx)
In either case, these go away:
PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
InnoDB stores the data in PK order. By having the PK start with user_id, all the rows for one user are "adjacent" on the disk, thereby more readily cached in RAM (in the buffer_pool).
Given a user with 100 claims, I am restructuring the table so that the data is found in a couple of consecutive blocks (16KB unit of storage by InnoDB) instead of upwards of 100 scattered blocks.
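The first variant (keeping global_id) could be applied roughly as follows. This is an untested sketch: the index name idx_claim_global_id is made up, it assumes no other table has a foreign key pointing at claim.global_id, and rebuilding a table of about 8 million rows will take a while, so it is worth trying on a copy first.

```sql
ALTER TABLE claim
  DROP PRIMARY KEY,
  DROP INDEX global_id_UNIQUE,
  -- rows for one user are now clustered together
  ADD PRIMARY KEY (fk_user_id, global_id),
  -- an AUTO_INCREMENT column must be the first column of some index
  ADD INDEX idx_claim_global_id (global_id);

-- The old single-column index on fk_user_id is then covered by the new
-- primary key and can be dropped in a separate statement if it still exists.
```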

Is there a performance benefit to creating a multiple index on a primary key + foreign key?

If I have a table that has a primary key and a foreign key, and searches are frequently done with queries that include both (...WHERE primary=n AND foreign=x), is there any performance benefit to making a multiple index in MySQL using the two keys?
I understand that they are both indexes already, but I am uncertain if the foreign key is still seen as an index when included in another table. For example, would MySQL go to the primary key, and then compare all values of the foreign key until the right one is found, or does it already know where it is because the foreign key is also an index?
Update: I am using InnoDB tables.
For equality comparisons, you cannot get an improvement over the primary key index (because at that point, there is at most just one row that can match).
The access path would be:
look at the primary key index for primary = n
get the single matching row from the table
check any other conditions using the row in the table
A composite index might make some sense if you have a range scan on the primary key and want to narrow that down by the other column.
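To make the two cases concrete, here is a sketch against a hypothetical table `items`, where id is the PRIMARY KEY and owner_id is a foreign key column. None of these names come from the question.

```sql
-- Equality on the primary key: at most one row can match,
-- so no additional index improves on the primary key lookup.
SELECT * FROM items WHERE id = 1000 AND owner_id = 7;

-- Range on the primary key plus equality on the other column:
-- an index leading with owner_id can narrow the range scan.
CREATE INDEX idx_items_owner_id ON items (owner_id, id);

SELECT * FROM items WHERE id BETWEEN 1000 AND 2000 AND owner_id = 7;
```

In InnoDB, secondary indexes already carry the primary key columns, so the existing foreign key index on owner_id may behave much like this composite index; comparing EXPLAIN output before and after adding it would show whether it makes a difference.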

Partitioning MySQL tables that have foreign keys?

What would be an appropriate way to do this, since MySQL obviously doesn't enjoy this?
Leaving either partitioning or the foreign keys out of the database design would not seem like a good idea to me. I'm guessing there is a workaround for this?
Update 03/24:
http://opendba.blogspot.com/2008/10/mysql-partitioned-tables-with-trigger.html
How to handle foreign key while partitioning
Thanks!
It depends on the extent to which the size of rows in the partitioned table is the reason for partitions being necessary.
If the row size is small and the reason for partitioning is the sheer number of rows, then I'm not sure what you should do.
If the row size is quite big, then have you considered the following:
Let P be the partitioned table and F be the table referenced in the would-be foreign key. Create a new table X:
CREATE TABLE `X` (
`P_id` INT UNSIGNED NOT NULL,
-- I'm assuming an INT is adequate, but perhaps
-- you will actually require a BIGINT
`F_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`P_id`, `F_id`),
-- Caution: MySQL rejects a foreign key that references a
-- user-partitioned InnoDB table, so this constraint may not be accepted
CONSTRAINT `Constr_X_P_fk`
FOREIGN KEY `P_fk` (`P_id`) REFERENCES `P` (`id`)
ON DELETE CASCADE ON UPDATE RESTRICT,
CONSTRAINT `Constr_X_F_fk`
FOREIGN KEY `F_fk` (`F_id`) REFERENCES `F` (`id`)
ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=INNODB CHARACTER SET ascii COLLATE ascii_general_ci;
and crucially, create a stored procedure for adding rows to table P. Your stored procedure should make certain (use transactions) that whenever a row is added to table P, a corresponding row is added to table X. You must not allow rows to be added to P in the "normal" way! You can only guarantee that referential integrity will be maintained if you keep to using your stored procedure for adding rows. You can freely delete from P in the normal way, though.
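A minimal sketch of such a stored procedure, with several assumptions: P has an AUTO_INCREMENT `id` and a single data column called `payload` (both hypothetical, since the real columns of P are not given), and the procedure name add_p_row is made up. DELIMITER is a mysql client command, not server SQL.

```sql
DELIMITER //

CREATE PROCEDURE add_p_row(IN p_payload VARCHAR(255), IN p_f_id INT UNSIGNED)
BEGIN
  -- On any error, undo both inserts and pass the error to the caller.
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;

  -- Insert the row into the partitioned table P (hypothetical column).
  INSERT INTO P (payload) VALUES (p_payload);

  -- Record the P -> F relationship in X, using the id just generated.
  -- The foreign key from X to F rejects an unknown p_f_id, which
  -- triggers the handler above and rolls back the insert into P.
  INSERT INTO X (P_id, F_id) VALUES (LAST_INSERT_ID(), p_f_id);

  COMMIT;
END //

DELIMITER ;
```

A call would then look like CALL add_p_row('some data', 42); where 42 must be an existing F.id.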
The idea here is that your table X has sufficiently small rows that you should hopefully not need to partition it, even though it has many many rows. The index on the table will nevertheless take up quite a large chunk of memory, I guess.
Should you need to query P on the foreign key, you will of course query X instead, as that is where the foreign key actually is.
I would strongly suggest sharding using Date as the key for archiving data to archive tables. If you need to report off multiple archive tables, you can use Views, or build the logic into your application.
However, with a properly structured DB, you should be able to handle tens of millions of rows in a table before partitioning, or sharding is really needed.