Partitioning MySQL tables that have foreign keys?

What would be an appropriate way to do this, since MySQL obviously doesn't enjoy it?
Leaving either partitioning or the foreign keys out of the database design doesn't seem like a good idea to me. I guess there is a workaround for this?
Update 03/24:
http://opendba.blogspot.com/2008/10/mysql-partitioned-tables-with-trigger.html
How to handle foreign key while partitioning
Thanks!

It depends on the extent to which the size of rows in the partitioned table is the reason for partitions being necessary.
If the row size is small and the reason for partitioning is the sheer number of rows, then I'm not sure what you should do.
If the row size is quite big, then have you considered the following:
Let P be the partitioned table and F be the table referenced in the would-be foreign key. Create a new table X:
CREATE TABLE `X` (
`P_id` INT UNSIGNED NOT NULL,
-- I'm assuming an INT is adequate, but perhaps
-- you will actually require a BIGINT
`F_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`P_id`, `F_id`),
CONSTRAINT `Constr_X_P_fk`
FOREIGN KEY `P_fk` (`P_id`) REFERENCES `P` (`id`)
ON DELETE CASCADE ON UPDATE RESTRICT,
CONSTRAINT `Constr_X_F_fk`
FOREIGN KEY `F_fk` (`F_id`) REFERENCES `F` (`id`)
ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=INNODB CHARACTER SET ascii COLLATE ascii_general_ci
and crucially, create a stored procedure for adding rows to table P. Your stored procedure should use a transaction to make certain that whenever a row is added to table P, a corresponding row is added to table X. You must not allow rows to be added to P in the "normal" way! You can only guarantee that referential integrity will be maintained if you stick to using your stored procedure for adding rows. You can freely delete from P in the normal way, though.
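A minimal sketch of such a procedure, assuming P has an AUTO_INCREMENT `id` and a single hypothetical `payload` column (adjust the parameters to your real schema):
DELIMITER //
CREATE PROCEDURE `insert_P` (IN p_payload VARCHAR(255), IN p_F_id INT UNSIGNED)
BEGIN
    -- Roll back both inserts if either fails, then re-raise the error
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;
    START TRANSACTION;
    INSERT INTO `P` (payload) VALUES (p_payload);
    -- LAST_INSERT_ID() is the id just generated by the insert into P
    INSERT INTO `X` (P_id, F_id) VALUES (LAST_INSERT_ID(), p_F_id);
    COMMIT;
END //
DELIMITER ;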
The idea here is that your table X has sufficiently small rows that you should hopefully not need to partition it, even though it has many many rows. The index on the table will nevertheless take up quite a large chunk of memory, I guess.
Should you need to query P on the foreign key, you will of course query X instead, as that is where the foreign key actually is.
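For example, "find the rows of P attached to a given F id" becomes a join through X (a sketch using the table definitions above):
SELECT p.*
FROM `X`
JOIN `P` AS p ON p.id = X.P_id
WHERE X.F_id = ?;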

I would strongly suggest sharding using Date as the key for archiving data to archive tables. If you need to report off multiple archive tables, you can use Views, or build the logic into your application.
However, with a properly structured DB, you should be able to handle tens of millions of rows in a table before partitioning or sharding is really needed.
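A sketch of the view approach, with hypothetical archive table names (UNION ALL avoids the deduplication cost of plain UNION):
CREATE VIEW log_all AS
    SELECT * FROM log_2023
    UNION ALL SELECT * FROM log_2024
    UNION ALL SELECT * FROM log_current;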

Related

How to speed up a highly active big data table (MySQL)?

I'll try to explain my problem and what I meant by the title.
Currently I have a table with around 8 million rows.
This table is highly active, meaning there are constant updates, inserts and deletes.
These are caused by users (it's like a collecting game), which means I also need to make sure the data is displayed accurately.
I've looked so far into:
indexing
partitioning
sharding
mapreduce
optimize
I applied indexing; however, I'm not sure if I applied it correctly, and it doesn't seem to help as much as I had hoped.
As I said, my table is highly active, meaning that if I added partitioning to this table, there would be additional inserts/deletes, making the process far more complex than I can understand. I do not have that much experience with databases.
Sharding this database is way too complex for me, and I only have one service I can run this database on, so this option is a no-go.
As for MapReduce, I am not entirely sure what it does, but as far as I understood, it has more to do with the application code than with the database.
I applied OPTIMIZE, but it didn't really seem to have much effect either, as far as I could tell.
I have tried to avoid * in SELECT statements, and I made sure to get rid of most DISTINCT, COUNT and similar SQL functions, so that these wouldn't affect the speed of the database.
However, even after narrowing down the data in each table, and this table specifically, it's currently slower than it was before.
This table consists of:
CREATE TABLE `claim` (
`global_id` bigint NOT NULL AUTO_INCREMENT,
`fk_user_id` bigint NOT NULL,
`fk_series_id` smallint NOT NULL,
`fk_character_id` smallint NOT NULL,
`fk_image_id` int NOT NULL,
`fk_gif_id` smallint DEFAULT NULL,
`rarity` smallint NOT NULL,
`emoji` varchar(31) DEFAULT NULL,
PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
KEY `fk_claim_character_id` (`fk_character_id`),
KEY `fk_claim_image_id` (`fk_image_id`),
KEY `fk_claim_series_id` (`fk_series_id`),
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
KEY `fk_claim_gif_id` (`fk_gif_id`) /*!80000 INVISIBLE */,
KEY `fk_claim_rarity` (`rarity`) /*!80000 INVISIBLE */,
KEY `fk_claim_emoji` (`emoji`),
CONSTRAINT `fk_claim_character_id` FOREIGN KEY (`fk_character_id`) REFERENCES `character` (`character_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_claim_image_id` FOREIGN KEY (`fk_image_id`) REFERENCES `image` (`image_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_claim_series_id` FOREIGN KEY (`fk_series_id`) REFERENCES `series` (`series_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_claim_user_id` FOREIGN KEY (`fk_user_id`) REFERENCES `user` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=7622452 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Is there possibly another solution to speed up the database? If so, how? I'm currently at wit's end and stuck on it. The database needs to respond preferably within 300ms.
EXAMPLE SLOW QUERIES:
SELECT PK FROM <table> WHERE fk_user_id = ?;
SELECT PK FROM <table> WHERE fk_user_id = ? GROUP BY fk_character_id HAVING MAX(fk_character_id) = 1;
SELECT PK, fk_user_id, fk_character_id, etc, etc, etc FROM <table> WHERE fk_user_id = ? ORDER BY PK ASC LIMIT 0, 20
Redundant
PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
A PRIMARY KEY, in MySQL, is a UNIQUE KEY. So the UNIQUE KEY is redundant; it wastes disk space and slows down INSERTs.
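Dropping it should be a one-liner (a sketch against the table definition above; test on a copy first):
ALTER TABLE claim DROP INDEX `global_id_UNIQUE`;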
Need VISIBLE index starting with user_id for Q1 and Q2
Replace this
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
with
INDEX(fk_user_id, fk_character_id)
in that order -- this will help with your first 2 queries.
Query 3
The 3rd query may still need (in the given order)
INDEX(fk_user_id, global_id)
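A sketch of both index changes in one statement (the index names here are arbitrary; test on a staging copy first). Dropping the old single-column index is safe because both new composites begin with fk_user_id, which still satisfies the fk_claim_user_id foreign key:
ALTER TABLE claim
    ADD INDEX `idx_user_character` (fk_user_id, fk_character_id),
    ADD INDEX `idx_user_global` (fk_user_id, global_id),
    DROP INDEX `fk_claim_user_id`;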
If you need some of the DISTINCTs/COUNTs, let's see them. Changing indexes may help.
Strange query
As for
SELECT PK FROM <table> WHERE fk_user_id = ?;
Why would you just want the PK? Is global_id useful by itself? Or is it useful only for looking up something else? If the latter, let's see it; it is often more practical to optimize a single, complex query than two queries that are artificially split.
Tuning
How much RAM is available to MySQL? What is the value of innodb_buffer_pool_size? 30s for 50K rows -- that sounds I/O-bound. Maybe that setting is too low.
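To check it, and (on MySQL 5.7.5+) resize it at runtime, something like:
-- Current size, in bytes:
SELECT @@innodb_buffer_pool_size;
-- Dynamic resize (the value is rounded to a multiple of the chunk size);
-- 8GB here is only an example -- size it to your available RAM:
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;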
In some cases, DISTINCT speeds up a query -- if for no other reason than that less data is shoveled back to the client.
Redesign PK
Based on the names "claim" and "user_id" and the test for "user_id" in all 3 queries, I deduce that you are frequently looking up stuff for a single "user"? What, if anything, is global_id needed for outside this table?
If you need global_id elsewhere, or nothing else could be used for uniqueness, do
PRIMARY KEY(user_id, global_id), -- for locality of reference
INDEX(global_id) -- to keep AUTO_INCREMENT happy
If (user_id, xx) is known to be unique (for some column(s) xx), toss global_id and change to
PRIMARY KEY(user_id, xx)
In either case, these go away:
PRIMARY KEY (`global_id`),
UNIQUE KEY `global_id_UNIQUE` (`global_id`),
KEY `fk_claim_user_id` (`fk_user_id`) /*!80000 INVISIBLE */,
InnoDB stores the data in PK order. By having the PK start with user_id, all the rows for one user are "adjacent" on the disk, thereby more readily cached in RAM (in the buffer_pool).
Given a user with 100 claims, I am restructuring the table so that the data is found in a couple of consecutive blocks (a block being InnoDB's 16KB unit of storage) instead of upwards of 100 scattered blocks.
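A sketch of the whole restructuring against the CREATE TABLE shown earlier (this rebuilds the table, so time it on a copy first):
ALTER TABLE claim
    DROP PRIMARY KEY,
    DROP INDEX `global_id_UNIQUE`,
    DROP INDEX `fk_claim_user_id`,
    ADD PRIMARY KEY (fk_user_id, global_id),
    ADD INDEX (global_id);   -- keeps AUTO_INCREMENT happy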

Does the order of KEYs in a CREATE TABLE statement matter?

Note: I searched around to see if this question has been asked before. All the existing questions I've been able to find are asking about composite index ordering, or the ordering of columns for queries on an existing table.
Say I have the following table:
CREATE TABLE `foobar` (
`foo_id` int(11),
`bar_id` int(11),
KEY `foo_id` (`foo_id`),
KEY `bar_id` (`bar_id`)
);
It has two unrelated indexes on it. If I swap the definition of the two indexes, it might look like this:
CREATE TABLE `foobar` (
`foo_id` int(11),
`bar_id` int(11),
KEY `bar_id` (`bar_id`),
KEY `foo_id` (`foo_id`)
);
If I run SHOW CREATE TABLE foobar on each of these I can see that there is a difference between the ordering of the KEYs for each table. My question is, does the ordering in this specific case matter? I know it would matter if foo_id and bar_id were used together in a composite index, but here they are not.
If it does indeed matter, is there a way to arbitrarily rearrange the keys once the table has been created? (Something akin to ALTER TABLE foobar ADD INDEX foo_id (foo_id) AFTER bar_id, which I'm pretty sure is invalid as written.)
The order of the key definitions has no visual representation, and arranging them one way or another adds no overhead. A key definition simply adds qualities to existing columns, and those definitions can be rearranged freely.
The only exception I could see would be if you went through some IDE (DB Forge, HeidiSQL, SequelPro, etc.) that arranged key values at the top of some list it generated. That, however, is on the side of the tool interpreting the schema and has nothing to do with database performance.
No.
All keys are 'equal'. All are considered when deciding which to use. The key that has the least 'cost' is used. ('Cost' is a complicated formula involving the effort to perform the query one way versus another.)
If you are using InnoDB, it is really a good idea to have a PRIMARY KEY.
If your table is a "many-to-many" mapping table, here is further advice:
http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
A related topic: A "composite" index is one that has multiple columns. (Eg, INDEX(a,b).) The order of the columns does make a difference.
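A quick illustration of that leftmost-prefix rule, with a hypothetical table t:
CREATE TABLE t (a INT, b INT, INDEX ab (a, b));
SELECT * FROM t WHERE a = 1 AND b = 2;  -- can use index ab fully
SELECT * FROM t WHERE a = 1;            -- can use ab (leftmost prefix)
SELECT * FROM t WHERE b = 2;            -- cannot use ab; would need INDEX(b)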
You can use MODIFY in an ALTER TABLE command:
ALTER TABLE table_name MODIFY foo_id int AFTER bar_id

delete by primary key takes forever mysql

I am trying to delete by primary key from Table (300 rows), and it runs up to the max query execution time and at the end returns ERROR 2013: 2013: Lost connection to MySQL server during query. This table has a foreign key to the large table (200k rows). What could the issue be?
Query: DELETE FROM Table Where table_id=x
EDIT:
There are no triggers associated with this DELETE statement. DELETE/INSERT/UPDATE statements on some tables work really slowly, while SELECT statements across the whole database work perfectly fine.
EDIT#2:
Additional information from the INFORMATION_SCHEMA.INNODB_TRX table for the query:
trx_lock_structs 429704
trx_lock_memory_bytes 34698792
trx_rows_locked 214938
trx_isolation_level REPEATABLE READ
trx_unique_checks 1
trx_foreign_key_checks 1
This query deletes 1 row and doesn't have child rows, so why is the locked-rows value so high?
EDIT#3
Investigating the situation further, I have determined that the tables with slow insert/update/delete operations are the ones that have a foreign key to the big table (200k). Is it necessary to remove these foreign keys, or is data integrity more important? Although 200k rows is not that much, what could be the reasons for these slow operations?
EDIT#4
SHOW CREATE TABLE:
CREATE TABLE `Table` (
`table_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`tableb_id` bigint(20) unsigned NOT NULL,
`tablec_id` bigint(20) unsigned DEFAULT NULL,
`bigtable_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`table_id`),
KEY `fk_tableb_id_idx` (`tableb_id`),
KEY `fk_bigtable_id_idx` (`bigtable_id`),
KEY `fk_tablec_id_idx` (`tablec_id`),
CONSTRAINT `fk_bigtable_id` FOREIGN KEY (`bigtable_id`) REFERENCES `Bigtable`
(`bigtable_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `fk_tableb_id` FOREIGN KEY (`tableb_id`) REFERENCES `tablebs`
(`tableb_id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `fk_tablec_id` FOREIGN KEY (`tablec_id`) REFERENCES `tablecs`
(`tablec_id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=271 DEFAULT CHARSET=utf8
BigTable is a typical users table: an id plus additional information.
EDIT#5
EXPLAIN DELETE:
select_type : SIMPLE,
table : Table,
type : range,
possible_keys : PRIMARY,
key : PRIMARY,
key_len : 8,
ref : const,
rows : 1,
Extra : Using where
The reason is the cascade on the big table. Do you understand the complexity of this computation? It is not just a delete operation on 300 rows; it is basically 300 rows * 200k. Each delete has to pass through the big table and perform a delete operation on the corresponding rows (based on the id).
Just follow the steps below:
1) Remove the references to this table's primary id from other tables (if any)
2) Alter this table and change the ON UPDATE / ON DELETE CASCADE actions of the big table's foreign key to NO ACTION (see the sketch after these steps)
OR
remove all the foreign key constraints
3) delete from tableName
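A sketch of step 2 against the SHOW CREATE TABLE output above (take a backup first); note that in InnoDB, NO ACTION behaves like RESTRICT:
ALTER TABLE `Table` DROP FOREIGN KEY `fk_bigtable_id`;
ALTER TABLE `Table`
    ADD CONSTRAINT `fk_bigtable_id`
    FOREIGN KEY (`bigtable_id`) REFERENCES `Bigtable` (`bigtable_id`)
    ON DELETE NO ACTION ON UPDATE NO ACTION;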
I may suggest an empirical approach...
Delete ops with cascades have an impact on indexes; the slow execution time may depend on the time needed to work through the foreign-key indexes, though your tables aren't huge. You say you have no triggers, functions or procedures, so you have to benchmark your query to find which part is taking so much time...
I assume that you have already verified index efficiency.
So to benchmark your delete query you should hand-write the "inner deletion" queries and benchmark each one, using BENCHMARK().
This way you will find which part of that deletion took so long.
What if each single part is reasonably fast?
You may have some misconfiguration in your my.cnf, so you could try checking it against the recommendations from https://tools.percona.com/wizard ...
Maybe you have some memory allocation limit or some thread/memory limit.
You can find many tutorials on mysql optimization like http://www.codingpedia.org/ama/optimizing-mysql-server-settings/
You can also find some scripts that may help you in finding mysql optimizations.
Without knowing your architecture, MySQL configuration and database structure, it's quite hard to give a 100% solution, but I hope you find your way; I'll be glad to read about your findings. If anything more comes to mind I'll keep you posted.

How to use index in my table correctly?

I realized that when I create foreign keys in a table, indexes are added automatically.
In my table:
CREATE TABLE `SupplierOrderGoods` (
`shopOrder_id` INT(11) NOT NULL,
`supplierGood_id` INT(11) NOT NULL,
`count` INT(11) NOT NULL,
PRIMARY KEY (`shopOrder_id`, `supplierGood_id`),
CONSTRAINT `FK_SupplierOrderGoods_ShopOrders` FOREIGN KEY (`shopOrder_id`) REFERENCES `shoporders` (`id`),
CONSTRAINT `FK_SupplierOrderGoods_SupplierGoods` FOREIGN KEY (`supplierGood_id`) REFERENCES `suppliergoods` (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
The index
INDEX `FK_SupplierOrderGoods_SupplierGoods` (`supplierGood_id`)
has been created automatically.
It is okay that this index was created, as I found in another post. I looked into what indexes are used for and found that they are used to optimize searches in tables.
Now I know that I have to use indexes to optimize work with the database.
Also, I found that indexes can be composite (not on one field, but on several fields). In that case, I want to ask: should I use a composite index:
INDEX `FK_ShopOrders_SupplierGoods` (`shopOrder_id`, `supplierGood_id`),
or two simple indexes?
INDEX `FK_SupplierOrderGoods_SupplierGoods` (`supplierGood_id`),
INDEX `FK_SupplierOrderGoods_ShopOrders` (`shopOrder_id`),
I'm still learning about indexes myself, but I believe it's going to depend on what kind of data you will be querying the DB for.
For example, if you have a report for a certain record that will be run a lot, you'll want an index on it. If the report pulls just one column, then make a one-column index; if it's comprised of two, like a first name and a last name, you'll probably want one covering both.
You do not want to put an index on everything, though, as that can cause performance issues: both the record and the index need to be updated. As such, for tables that see a high amount of inserts or updates, you'll want to think about whether an index hurts or helps.
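For example, a report that always filters on both columns is usually served best by one composite index rather than two single-column ones (hypothetical table and names):
CREATE INDEX idx_last_first ON customers (last_name, first_name);
SELECT last_name, first_name
FROM customers
WHERE last_name = 'Smith' AND first_name = 'Anna';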
There is a lot of information to cover with indexes.

Mysql design for logtable

I would like some advice about a MySQL table design for an event logger.
Our needs:
- track a lot of actions
- 10,000 actions / second
- 1 billion rows at this time
Our hardware:
- 2*Xeon (seen as 32 CPUs by the system)
- 128 GB RAM
- 6*600 SSD in RAID 10
Our table design:
CREATE TABLE IF NOT EXISTS `log_event` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_event` smallint(6) NOT NULL,
`id_user` bigint(20) NOT NULL,
`date` int(11) NOT NULL,
`data` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_event_2` (`id_event`,`data`),
KEY `id_user` (`id_user`),
KEY `date` (`date`),
KEY `id_event_4` (`id_event`,`date`,`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
ALTER TABLE `log_event`
ADD CONSTRAINT `log_event_ibfk_1` FOREIGN KEY (`id_user`) REFERENCES `inscription` (`id_inscri`) ON DELETE CASCADE ON UPDATE CASCADE;
Our problem:
- We have an auto-increment as the primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it => how do we identify a row?
- We would like to do partitioning, but with the foreign key it seems to be impossible?
- We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data over every 5 minutes?
- Do you have any ideas for optimization? Do you have best practices for this kind of system?
Thanks !
François
Primary keys of relational tables (relations) can be of two types:
Natural - exists in the subject area and completely determines each row of the relational table.
Natural primary keys can be simple (consisting of only one column) or complex (consisting of more than one column). It is not recommended to set a natural primary key on a large string column.
Artificial - a special column injected by the database designer / developer to boost table performance: when the natural key is complex and has to be used in a related table (is a foreign key for something); when it is simple but large and would produce data overhead when copied into related tables as a foreign key; or when it is slow to search on (for example, CRUD operations on VARCHAR IDs can be slower than on INT IDs). There may be other reasons. TL;DR: an artificial key is one special column that completely determines each row of the relational table and boosts its performance for CRUD operations.
We have an auto-increment as the primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it => how do we identify a row?
If you do not need to reference your table from other tables (as a source), then you can probably remove the artificial key without any consequences. Still, I recommend you set some other PRIMARY KEY on this table to avoid data duplication, and for clarity (if it matters).
Your table by itself (if properly normalized) will have a natural key as one of its "candidate keys". It might be a complex one (consisting of a few columns). That is normal. But don't set the primary key on strings, because a PRIMARY KEY always has an index, which will produce data overhead. If it is a combination of INT or "small" VARCHAR columns, then it is fine.
Consider as an option: id_event + id_user + date.
We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data over every 5 minutes?
It is not a bad idea. But it is not a good idea either, until it is properly tested. Try to perform a load test before real use.
If you do not reference the MEMORY table from others, then you can still join it with any InnoDB table. But you will lose InnoDB functionality (referential integrity) on it. If losing the parent table's ON DELETE CASCADE ON UPDATE CASCADE is not a concern, then it can be done. As for me, InnoDB is not so slow that switching the table engine is worth it in your case.
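A minimal sketch of the staging idea, using the log_event columns from the question. Note that the flush below is not atomic with respect to concurrent inserts (rows arriving between the INSERT...SELECT and the TRUNCATE would be lost), so an atomic RENAME TABLE swap between two staging tables is safer under real load:
CREATE TABLE log_event_staging (
    id_event smallint(6) NOT NULL,
    id_user bigint(20) NOT NULL,
    `date` int(11) NOT NULL,
    data bigint(20) NOT NULL
) ENGINE=MEMORY;
-- Every ~5 minutes:
INSERT INTO log_event (id_event, id_user, `date`, data)
    SELECT id_event, id_user, `date`, data FROM log_event_staging;
TRUNCATE TABLE log_event_staging;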