Really Slow MySQL Insert Query - mysql

I've got a table with about half a million records in it. It's not huge. A couple varchar(255) fields, some ints, a float, and a couple timestamps. There are indices on the ints as well as foreign key constraints. Inserts are taking forever. I'm talking 1-4 seconds to insert one row. I've had to deal with slow select queries plenty of times, but I'm stuck trying to figure out what's going on with this insert.
EDIT: Okay, I was really just asking for ideas on how to debug this, but here are all the tables involved. Inserting into "ingredients" is what takes forever. Hopefully throwing a good portion of my schema onto the web doesn't bite me later...
CREATE TABLE `ingredients` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`quantity` float DEFAULT NULL,
`food` varchar(255) NOT NULL,
`unit_id` int(11) DEFAULT NULL,
`ingredient_group_id` int(11) DEFAULT NULL,
`order_by` int(11) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`description` varchar(255) DEFAULT NULL,
`range` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `unit_id` (`unit_id`),
KEY `ingredient_group_id` (`ingredient_group_id`),
CONSTRAINT `ingredients_ibfk_1` FOREIGN KEY (`unit_id`) REFERENCES `units` (`id`),
CONSTRAINT `ingredients_ibfk_2` FOREIGN KEY (`ingredient_group_id`) REFERENCES `ingredient_groups` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=269974 DEFAULT CHARSET=utf8
CREATE TABLE `units` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`abbreviation` varchar(255) CHARACTER SET latin1 NOT NULL,
`type` int(11) NOT NULL,
`si` float NOT NULL,
`lower_bound` float DEFAULT NULL,
`lower_unit_id` int(11) DEFAULT NULL,
`upper_bound` float DEFAULT NULL,
`upper_unit_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `lower_unit_id` (`lower_unit_id`),
KEY `upper_unit_id` (`upper_unit_id`),
CONSTRAINT `units_ibfk_1` FOREIGN KEY (`lower_unit_id`) REFERENCES `units` (`id`),
CONSTRAINT `units_ibfk_2` FOREIGN KEY (`upper_unit_id`) REFERENCES `units` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8
CREATE TABLE `ingredient_groups` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`recipe_id` int(11) NOT NULL,
`order_by` int(11) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `recipe_id` (`recipe_id`),
CONSTRAINT `ingredient_groups_ibfk_1` FOREIGN KEY (`recipe_id`) REFERENCES `recipes` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=32739 DEFAULT CHARSET=utf8

Lots of information missing, but the things I'd check first:
If on MyISAM tables: extremely fragmented files, especially index files. Use filefrag to check that. This can happen if the database grew slowly over time. If so, just shut down MySQL, copy the database directory, swap the copy in for the original by renaming, and restart MySQL.
If you use InnoDB tables: a file-based datastore, again too fragmented. In this case, fragmentation can exist both at the filesystem level (check and handle as above) and at the datastore level; for the latter, use the InnoDB tools. In the worst case, a block-device-based datastore (which can't get externally fragmented) can still exhibit a bad case of internal fragmentation.
Some index with extremely low cardinality, that is, a non-unique index with few distinct values and therefore lots of repeats. Such indexes asymptotically approach a linear list, with O(n) time profiles. The culprit can be either an index on the table itself or the index behind a referenced foreign key (a quick way to check cardinality is sketched just after this list).
Reader contention: unlikely, but a huge number of concurrent readers can stall a single writer.
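For the low-cardinality point, a quick check is to compare what SHOW INDEX reports in its Cardinality column against the row count, or to count distinct values directly (a sketch against the ingredients table above):
-- Rough cardinality check for every index on the table
SHOW INDEX FROM ingredients;
-- Or count distinct values directly for suspect columns
SELECT COUNT(DISTINCT unit_id), COUNT(DISTINCT ingredient_group_id), COUNT(*)
FROM ingredients;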
Edit:
After reading your definitions, I think ingredients.unit_id and ingredients.ingredient_group_id are the first candidates to check, since they seem to have very low cardinality.
The first one is unlikely to be useful (do you plan to select all ingredients that are measured in spoons?), so you can probably just drop it.
The second one can be very useful; but if there are few ingredient groups, the cardinality can be very low, degrading performance. To raise cardinality, add some part to make it more discriminating. If no other field is likely to appear in a query together with group id, just add the main id or creation date, making it (ingredient_group_id, id) or (ingredient_group_id, created_at). Seems counterintuitive to add complexity to make it faster, but it can really help. As a bonus, you can add a sort by created_at to any query that selects by ingredient_group_id without performance penalty.
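A sketch of what that change could look like on the ingredients table (the index name is arbitrary; adding the composite index first keeps the foreign key on ingredient_group_id satisfied when the old single-column index is dropped):
-- Add a more discriminating composite index
ALTER TABLE ingredients ADD KEY idx_group_created (ingredient_group_id, created_at);
-- The old single-column index is now redundant: ingredient_group_id is the
-- leftmost column of the new index, so the foreign key still has an index
ALTER TABLE ingredients DROP KEY ingredient_group_id;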

You might want to look at the ingredients.unit_id index, since it has low selectivity.
Are the inserts happening concurrently?

Turns out I had a trigger that was falling prey to this bug:
http://bugs.mysql.com/bug.php?id=9021
I turned it from an IN to an = and now inserts are running in 0.00 seconds.
I totally forgot I had a trigger hooked up to this table. That's my fault. Sorry to anyone who wasted their time trying to help me out, but thank you so much anyway.
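For anyone hitting the same thing, the change was along these lines (a purely hypothetical trigger for illustration; the actual trigger isn't shown in this post, and the column names here are made up):
-- Hypothetical trigger, for illustration only
DELIMITER //
CREATE TRIGGER ingredients_after_insert AFTER INSERT ON ingredients
FOR EACH ROW
BEGIN
  -- Slow form that hit the bug:
  --   ... WHERE id IN (SELECT recipe_id FROM ingredient_groups WHERE id = NEW.ingredient_group_id);
  -- Fast form, since the subquery returns at most one row:
  UPDATE recipes SET updated_at = NOW()
  WHERE id = (SELECT recipe_id FROM ingredient_groups WHERE id = NEW.ingredient_group_id);
END//
DELIMITER ;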

Related

MySQL Repeatable Read and Phantoms, unique username example

Martin Kleppmann in his book "Designing Data-Intensive Applications" is showcasing the following problem:
Claiming a username
On a website where each user has a unique username, two users may try to create
accounts with the same username at the same time. You may use a transaction to
check whether a name is taken and, if not, create an account with that name.
However, like in the previous examples, that is not safe under snapshot isolation.
Fortunately, a unique constraint is a simple solution here (the second transaction
that tries to register the username will be aborted due to violating the constraint).
I have a very similar use case, where 2 transactions are trying to claim the name of the entity.
At the beginning of each transaction, I run a select to see if such name was already taken. If it wasn't - create or update, depending on the operation requested by the user. This logic crumbles under concurrent attempts to claim/modify the name.
I am trying to see if there is a mechanism that allows implementing correct behavior under the Repeatable Read isolation level. Unique constraint violation thrown by the DB is not acceptable in my case, neither is a downgrade to Serializable execution.
Can I employ SELECT ... FOR UPDATE here? Obviously, I won't be locking the concrete rows, but rather an entire table (correct me if I'm wrong in that assumption), as I won't have the PK index columns in the WHERE clause?
Table structure:
CREATE TABLE `application_domains` (
`id` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`description` varchar(10000) DEFAULT NULL,
`org_id` varchar(255) NOT NULL,
`created_time` bigint(20) NOT NULL,
`updated_time` bigint(20) NOT NULL,
`created_by` varchar(16) NOT NULL,
`changed_by` varchar(16) NOT NULL,
`revision_id` varchar(16) DEFAULT NULL,
`topic_domain` varchar(255) NOT NULL,
`enforce_unique_topic_names` tinyint(1) NOT NULL DEFAULT '1',
`sample_id` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UK_orgId_name` (`org_id`,`name`),
UNIQUE KEY `UK_orgId_sampleId` (`org_id`,`sample_id`),
KEY `FK_references_application_domains_organization` (`org_id`),
KEY `FK_app_domain_samples_id_references_application_domains_tbl` (`sample_id`),
CONSTRAINT `FK_app_domain_samples_id_references_application_domains_tbl` FOREIGN KEY (`sample_id`) REFERENCES `application_domain_samples` (`id`) ON DELETE SET NULL ON UPDATE SET NULL,
CONSTRAINT `FK_references_application_domains_organization` FOREIGN KEY (`org_id`) REFERENCES `organizations` (`org_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
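For illustration, a minimal sketch of the check-then-claim pattern described above using a locking read (assumptions: the lookup goes through the UK_orgId_name unique index, and all values shown are made up; this only illustrates the pattern being asked about, not a verified solution):
START TRANSACTION;
-- Locking read; with the (org_id, name) unique index this locks the matching
-- index record, or the gap where it would be inserted if no such row exists yet
SELECT id FROM application_domains
WHERE org_id = 'org-123' AND name = 'my-domain'
FOR UPDATE;
-- If the SELECT returned no row, the name is free: claim it
INSERT INTO application_domains
  (id, name, org_id, created_time, updated_time, created_by, changed_by, topic_domain)
VALUES
  ('dom-1', 'my-domain', 'org-123', 1700000000000, 1700000000000, 'user-1', 'user-1', 'my-domain');
COMMIT;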

Update index values extremely slow on MySQL

I have three tables, one is in database db1 and two are in database db2, all on the same MySQL server:
CREATE TABLE `db1`.`user` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`user_name` varchar(20) NOT NULL,
`password_hash` varchar(71) DEFAULT NULL,
`email_address` varchar(100) NOT NULL,
`registration_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`registration_hash` char(16) DEFAULT NULL,
`active` bit(1) NOT NULL DEFAULT b'0',
`public` bit(1) NOT NULL DEFAULT b'0',
`show_name` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`id`),
UNIQUE KEY `user_name` (`user_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `db2`.`ref` (
`id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `db2`.`combination` (
`ref_id` bigint(20) UNSIGNED NOT NULL,
`user_id` bigint(20) UNSIGNED NOT NULL,
`arbitrary_number` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`ref_id`,`user_id`),
KEY `combination_user` (`user_id`),
KEY `combination_number` (`user_id`,`arbitrary_number`),
CONSTRAINT `combination_ref` FOREIGN KEY (`ref_id`) REFERENCES `ref` (`id`) ON UPDATE CASCADE,
CONSTRAINT `combination_user` FOREIGN KEY (`user_id`) REFERENCES `db1`.`user` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The table db1.user has around 600 records, the table db2.ref has around 800 records and the table db2.combination has around 300K records.
Now using Perl and DBD::mysql I perform the following query:
UPDATE `db1`.`user` SET `id` = (`id` + 1000)
ORDER BY `id` DESC
However, this query always aborts with a message that the connection to the MySQL server was lost. Executing this same query via phpMyAdmin also results in a timeout. Somehow the query just takes a very long time to execute. I guess this is because all the foreign key values need to be updated as well.
Setting the FOREIGN_KEY_CHECKS variable to OFF would mean the user_id column in the db2.combination table is not updated, but it does need to be updated.
I have also tried to manipulate the different timeouts (as suggested all over the internet), like this:
SET SESSION net_read_timeout=3000;
SET SESSION net_write_timeout=3000;
SET SESSION wait_timeout=6000;
I have verified that the new values are actually set, by retrieving the values again. However, even with these long timeouts, the query still fails to execute and after about 30 seconds the connection to the MySQL server is again lost (while amidst executing the UPDATE query)
Any suggestions on how to speed up this query are more than welcome.
BTW: The PK columns have a very large integer type. I will also make this type smaller (change to INT). Could this type change also improve the speed significantly?
UPDATE
I also performed an EXPLAIN for the query and it mentions in the Extra column that the query is doing a filesort. I would have expected that due to the indexes on the table (added them, as they were not there in the first place), no filesort would take place.
The 300K CASCADEs is probably the really slow part of the task. So, let's avoid it. (However, you may want a check afterwards to verify the resulting links; this should be not-too-slow.)
Disable FOREIGN KEY processing
Create new tables without FOREIGN KEYs. new_user, new_combination. (I don't know if new_ref is needed.)
Do this to populate the tables:
INSERT INTO new_xx (user_id, ...)
SELECT user_id + 1000, ...;
ALTER TABLE new_xx ADD FOREIGN KEY ...; (for each xx)
RENAME TABLE xx TO old_xx, new_xx TO xx;
DROP TABLE old_xx;
Enable FOREIGN KEY processing
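A concrete sketch of those steps for these particular tables (an illustration only; it assumes the application is stopped during the swap, CREATE TABLE ... LIKE copies indexes but not foreign keys, and re-adding the constraints at the end doubles as the verification step mentioned above):
SET FOREIGN_KEY_CHECKS = 0;
-- New copies without foreign keys
CREATE TABLE db1.new_user LIKE db1.user;
CREATE TABLE db2.new_combination LIKE db2.combination;
-- Populate with the shifted ids
INSERT INTO db1.new_user
  SELECT id + 1000, user_name, password_hash, email_address, registration_time,
         registration_hash, active, public, show_name
  FROM db1.user;
INSERT INTO db2.new_combination (ref_id, user_id, arbitrary_number)
  SELECT ref_id, user_id + 1000, arbitrary_number FROM db2.combination;
-- Swap and clean up
RENAME TABLE db1.user TO db1.old_user, db1.new_user TO db1.user,
             db2.combination TO db2.old_combination, db2.new_combination TO db2.combination;
DROP TABLE db1.old_user, db2.old_combination;
SET FOREIGN_KEY_CHECKS = 1;
-- Re-adding the constraints now also verifies the shifted links
ALTER TABLE db2.combination
  ADD CONSTRAINT combination_ref FOREIGN KEY (ref_id) REFERENCES db2.ref (id) ON UPDATE CASCADE,
  ADD CONSTRAINT combination_user FOREIGN KEY (user_id) REFERENCES db1.user (id) ON UPDATE CASCADE;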

Error when creating foreign key of type CHAR with mysql workbench: Error 1005: Can't create table (errno: 150)

I have defined the following 2 tables:
record_status
SHOW CREATE TABLE record_status
CREATE TABLE `record_status` (
`record_status_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` char(6) NOT NULL,
`status_description` varchar(15) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`record_status_id`,`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
user
SHOW CREATE TABLE user
CREATE TABLE `user` (
`user_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`handle` varchar(45) NOT NULL,
`email` varchar(255) NOT NULL,
`password` char(64) DEFAULT NULL,
`password_salt` binary(1) DEFAULT NULL,
`first_name` varchar(50) NOT NULL,
`last_name` varchar(50) NOT NULL,
`gender` char(1) DEFAULT NULL,
`birthday` date NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_status` char(6) DEFAULT NULL,
PRIMARY KEY (`user_id`),
KEY `usr_status_idx` (`user_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
and I tried adding the foreign key user_status of type CHAR using mysql Workbench as follows:
ALTER TABLE `mydatabase`.`user`
ADD CONSTRAINT `usr_status`
FOREIGN KEY (`user_status`)
REFERENCES `mydatabase`.`record_status` (`status`)
ON DELETE NO ACTION
ON UPDATE NO ACTION;
but I am getting the following error:
Error:
Executing SQL script in server
ERROR: Error 1005: Can't create table 'mydatabase.#sql-420_1b0' (errno: 150)
ALTER TABLE 'mydatabase'.'user'
ADD CONSTRAINT 'usr_status'
FOREIGN KEY ('user_status')
REFERENCES 'mydatabase'.'record_status'('status')
ON DELETE NO ACTION
ON UPDATE NO ACTION
SQL script execution finished: statements: 4 succeeded, 1 failed.
Question
My intention is to have the status column clearly show the current status for each user (ACTIVE, INACTV, DELETD) while still having the flexibility to join the record_status table with the user table using the record_status_id to find any rows with a given status for better performance.
I found a similar post here
Adding foreign key of type char in mysql
which suggests changing my primary key's collation, but how would that affect my user table?
Will I have to change the collation to the user_status field in my user table as well? The user table will be queried every time a user logs in and I am concerned about performance or any constraints this may cause.
I also intend to add a foreign key for the status to a few other tables as well. I would just like to know how this affects performance, or does it add any constraints?
Any input regarding my design will also be appreciated. Thank you for your help!
The issue you're facing isn't actually related to collation (though collation can be a cause of the error you're experiencing under different circumstances).
Your FOREIGN KEY constraint is failing because you don't have an index individually on record_status.status. You have that column as part of the composite PRIMARY KEY (record_status_id, status), but for successful foreign key constraint creation, both the referencing table and the referenced table must have indexes on exactly the columns used in the key relationship (in addition to the same data types).
Adding the FOREIGN KEY constraint implicitly creates the necessary index on the referencing table, but you must still ensure you have the corresponding index on the referenced table.
So given what you have now, if you added a single index on record_status.status, the constraint would correctly be created.
CREATE TABLE `record_status` (
`record_status_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` char(6) NOT NULL,
`status_description` varchar(15) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`record_status_id`,`status`),
-- This would make your relationship work...
KEY (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
However, I don't think that's the best course of action. I don't see a need for the composite primary key on (record_status_id, status), chiefly because the record_status_id is itself AUTO_INCREMENT and guaranteed to be unique. That column alone could be the PRIMARY KEY, while still adding an additional UNIQUE KEY on status to satisfy the foreign key constraint's indexing requirement. After all, it is not the combination of record_status_id and status which uniquely identifies each row (making a primary key).
CREATE TABLE `record_status` (
`record_status_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` char(6) NOT NULL,
`status_description` varchar(15) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- Primary only on record_status_id
PRIMARY KEY (`record_status_id`),
-- Additional UNIQUE index on status
UNIQUE KEY (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
About the design -- eliminating record_status_id...
Without knowing how the rest of your application currently uses record_status_id, I can't say for sure if it is required by your application code. But, if you wish to make the actual status value easily available to other tables, and it is merely CHAR(6), it is possible that you actually have no need for record_status_id as an integer value. After all, if the status string is intended to be unique, then it is perfectly capable of serving as the PRIMARY KEY on its own, without any auto-increment integer key.
In that case, your record_status table would look like below, and your FOREIGN KEY constraint would correctly be added to users.
CREATE TABLE `record_status` (
-- Remove the auto_increment column!!
`status` char(6) NOT NULL,
`status_description` varchar(15) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- Status is unique, and therefore can be the PK on its own
PRIMARY KEY (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Given this setup, here's a sample showing the successful creation of the tables and addition of the FK constraint.
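A minimal sketch of that sequence, assuming the simplified record_status above (only the relevant columns are shown):
CREATE TABLE `record_status` (
`status` char(6) NOT NULL,
`status_description` varchar(15) NOT NULL,
PRIMARY KEY (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- Same ALTER as in the question; it now succeeds because record_status.status
-- is indexed (as the primary key) and both sides are char(6) with charset latin1
ALTER TABLE `user`
ADD CONSTRAINT `usr_status`
FOREIGN KEY (`user_status`)
REFERENCES `record_status` (`status`)
ON DELETE NO ACTION
ON UPDATE NO ACTION;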
You asked about performance implications of adding a status FK to other tables as well. It's tough to speculate on that without knowing the purpose, but if other tables share the same status values, then it makes sense to create their FK constraints to link to it in the same way you're doing with user. And if that's the case, I would recommend doing it the same way, wherein the status column is CHAR(6) (or consider changing all of them to VARCHAR(6)). The value of record_status.status still makes sense as the true primary key, and can be used as the FK in as many related tables as necessary.
In all but the most gigantic scale, there should be no appreciable performance difference between using an INT value and a CHAR(6)/VARCHAR(6) value as the foreign key. And the storage size difference between them is equally tiny. It isn't worth worrying about unless you must scale this to positively enormous proportions.

mysql - 500 tables with 100K-1M rows or 1 table with 50-500M rows

I've read many similar posts, yet I don't understand what to choose.
From a software perspective it is a game leaderboard. One table for all leaderboards, or 500 small tables, one for each game level?
I've tested both variants, and have found:
1 big table works slower (with all needed indexes created).
1 big table should be partitioned at least into 10 files for adequate speed.
500 small tables are not that convenient, but they are twice as fast (50M-row big table vs 100K-row small tables).
500 small tables don't need partitioning (I heard about some problems with it in MySQL; maybe everything is fixed in MariaDB 10.0, which I use, but just in case).
The only problem here is possibly having many open tables at once. I didn't think it was a problem until I read the setup suggestions in phpMyAdmin, so now I doubt whether I should use that many tables.
Just in case, here are the schemas.
"small" table:
CREATE TABLE IF NOT EXISTS `level0` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT '0',
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE IF NOT EXISTS `leaderboard` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT '0',
`level_no` int(11) NOT NULL,
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `level_no` (`level_no`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`),
KEY `lev_sc` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (id)
PARTITIONS 10 */
Queries for ranking:
SELECT COUNT(score) FROM level0 WHERE score > $current_score
ORDER BY score desc
SELECT COUNT(score) FROM leaderboard WHERE
level_no = 0 and score > $current_score ORDER BY score desc
update
I've learned about indexes and ended up with the following schema for big table (20M rows):
CREATE TABLE IF NOT EXISTS `leaderboard` (
`user_id` int(11) NOT NULL DEFAULT '0',
`level_no` smallint(5) unsigned NOT NULL,
`score` int(11) unsigned NOT NULL,
`timestamp` int(11) unsigned NOT NULL,
PRIMARY KEY (`level_no`,`user_id`),
KEY `user_id` (`user_id`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
and for a small one (100K rows, taken from leaderboard where level_no=20):
CREATE TABLE IF NOT EXISTS `level20` (
`user_id` int(11) NOT NULL DEFAULT '0',
`score` int(11) NOT NULL,
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`user_id`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
shared table with long literal user ids:
CREATE TABLE IF NOT EXISTS `player_ids` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`store_user_id` char(64) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `store_user_id` (`store_user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
For tests I've used these queries:
SELECT COUNT(*) AS rank FROM level20 lev WHERE score >
(SELECT score FROM level20 lt INNER JOIN player_ids pids ON
pids.id = lt.user_id WHERE pids.store_user_id='3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' );
SELECT COUNT(*) AS rank FROM leaderboard lev WHERE level_no=20 and score >
(SELECT score FROM leaderboard lt INNER JOIN player_ids pids ON
pids.id = lt.user_id WHERE pids.store_user_id='3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' and level_no=20 ) ;
I like the idea of using one big table, yet, while I'm getting similar timings (~0.050 s for the small table and ~0.065 s for the big one) on both queries, EXPLAIN still confuses me a little:
for small table
type | key | key_len | ref | rows | Extra
index | score | 4 | (null) | 50049 | Using where; Using index
and for big table:
type | key | key_len | ref | rows | Extra
ref | PRIMARY | 2 | const | 164030 | Using where
As you can see, about 3x fewer rows were scanned in the small table. The data in all tables is identical; level20 was filled with this query:
INSERT INTO level20 (user_id, score, timestamp) SELECT user_id, score,
timestamp FROM leaderboard WHERE level_no=20;
another update
I have experimented with the tables today and found that changing INT to MEDIUMINT barely changes the size of the table. Here are the statistics after OPTIMIZE (recreate + analyze):
#medium ints
CREATE TABLE IF NOT EXISTS `leaderboard1` (
`user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`level_no` smallint(5) unsigned NOT NULL DEFAULT '0',
`score` mediumint(8) unsigned NOT NULL DEFAULT '0',
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`level_no`,`user_id`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Data 628 Mb
Index 521.6 Mb
Total 1.1 Gb
#ints
CREATE TABLE IF NOT EXISTS `leaderboard` (
`user_id` int(11) NOT NULL DEFAULT '0',
`level_no` smallint(5) unsigned NOT NULL,
`score` int(11) unsigned NOT NULL,
`timestamp` int(11) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`level_no`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Data 670 Mb
Index 597.8Mb
Total 1.2 Gb
And my queries perform almost the same on both tables. I have a feeling that the table with MEDIUMINTs is better, so I'll keep it, yet I'm still a little bit confused.
Your queries are a bit strange. Try this
SELECT COUNT(*)
FROM leaderboard
WHERE level_no = 0 and score > $current_score
Your ORDER BY here is pointless because this query can only return a single row: it's an aggregate query without any GROUP BY.
Five hundred tables is a terrible idea. Your administration tasks will be quite unpleasant.
Also, partitioning your tables rarely helps query performance. In the case you've proposed, partitioning on hash(id), will definitely wreck performance for the query you've shown; every query will have to read every partition.
Keep it simple. One table. When it gets reasonably big, use EXPLAIN to analyze your query performance, and consider adding appropriate compound indexes.
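For example, a sketch of that kind of check against the big leaderboard table from the question (with the lev_sc index on (level_no, score) in place, the plan should report it as the chosen key, with "Using index" in Extra):
EXPLAIN
SELECT COUNT(*)
FROM leaderboard
WHERE level_no = 0 AND score > 1000;  -- 1000 stands in for $current_score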
Don't create indexes you don't need. They slow down inserts and waste hard drive space. Read this http://use-the-index-luke.com/ .
Edit: MySQL is built for this sort of four-longword table with half a billion rows. You will get this working if you're patient and learn about indexing. Don't waste your irreplaceable time with hundreds of smaller tables or with partitioning. More RAM may help, though.
The best thing for performance with InnoDB is making sure that all of your frequently used data fits in the buffer pool. With your posted table structures, it looks like you'll need roughly 500MB of buffer pool space to keep all of the data in the buffer pool.
A better structure for the leaderboard table would be:
CREATE TABLE IF NOT EXISTS `leaderboard` (
`user_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`level_no` SMALLINT(5) UNSIGNED NOT NULL,
`score` int(10) NOT NULL,
`timestamp` int(10) UNSIGNED NOT NULL,
PRIMARY KEY (`level_no`,`user_id`),
KEY `user_id` (`user_id`),
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Changes:
The timestamp and user_id columns are UNSIGNED: this doubles the usable positive range (I assume you're not using negative time values or negative user IDs, so nothing is lost), and for the timestamp it pushes the overflow point from 2038 out to 2106.
The timestamp may be easier to use as a TIMESTAMP type: the TIMESTAMP uses 4 bytes like INT but is shown as a datetime.
Removed the level_no index: it is redundant with the level_no_score index since prefixes of indexes can be used instead of the whole thing.
Using (level_no, user_id) as the primary key will help if you frequently use those columns in queries and removes an unneeded column (id). InnoDB does implicitly create a primary key only if one is not explicitly defined, so creating the id column only to use as a primary key is a waste.
The "correct" primary index also depends on the data and access pattern. What is unique in the table? Is it really level_no and user_id or is it just user? If it's just user_id that will probably be a better primary key.
To save space (hence make things more cacheable, hence faster), shrink from INT (4 bytes) to MEDIUMINT UNSIGNED (3 bytes, 0-16M range) or smaller.
CHAR(64) -- are the strings always 64 characters? If not, use VARCHAR(64) to save space. ('3FGTOHQN6UMwXI47IiRRMf9WI777SSJ6A' is only 33?)
For leaderboard, I think you can get rid of one index:
PRIMARY KEY (`user_id`, `level_no`), -- reversed
# KEY `user_id` (`user_id`), -- not needed
KEY `score` (`score`),
KEY `level_no_score` (`level_no`,`score`) -- takes care of any lookup by just `level_no`
Re "3x": "Rows" in EXPLAIN is an estimate. Sometimes it is a crude estimate.
You know SQL; why go to the effort to code "SELECT" yourself for NoSQL?
PARTITIONing does not automatically provide any performance boost. And you have not shown any queries that would benefit.
I agree that 500 similar tables is more trouble than it is worth.
2GB of RAM? Better keep innodb_buffer_pool_size down at maybe 300M. Swapping is much worse than shrinking the buffer_pool.
leaderboard PK -- You are saying that one user_id can be in multiple levels?

Optimizing aggregation on MySQL Table with 850 million rows

I have a query that I'm using to summarize via aggregations.
The table is called 'connections' and has about 843 million rows.
CREATE TABLE `connections` (
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
KEY `app_id` (`app_id`),
KEY `time_started_dt` (`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
When I try to run a query, such as the one below, it takes over 10 hours and I end up killing it. Does anyone see any mistakes that I'm making, or have any suggestions as to how I could optimize the query?
SELECT
app_id,
MAX(time_started_dt),
MIN(time_started_dt),
COUNT(*)
FROM
connections
GROUP BY
app_id
I suggest you create a composite index on (app_id, time_started_dt):
ALTER TABLE connections ADD INDEX(app_id, time_started_dt)
To get that query to perform, you really need a suitable covering index, with app_id as the leading column, e.g.
CREATE INDEX `connections_IX1` ON `connections` (`app_id`,`time_started_dt`);
NOTE: creating the index may take hours, and the operation will prevent insert/update/delete to the table while it is running.
An EXPLAIN will show the proposed execution plan for your query. With the covering index in place, you'll see "Using index" in the plan. (A "covering index" is an index that can be used by MySQL to satisfy a query without having to access the underlying table. That is, the query can be satisfied entirely from the index.)
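For instance, a sketch of that check (assuming the connections_IX1 covering index above has been created):
EXPLAIN
SELECT app_id, MAX(time_started_dt), MIN(time_started_dt), COUNT(*)
FROM connections
GROUP BY app_id;
-- With the covering index in place, the Extra column should report "Using index".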
With the large number of rows in this table, you may also want to consider partitioning.
I have tried your query on randomly generated data (around 1 million rows). Adding a PRIMARY KEY improved the performance of your query by about 10%.
As already suggested by other people, a composite index should be added to the table. The index on time_started_dt alone is useless for this query.
CREATE TABLE `connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `composite_idx` (`app_id`,`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;