MySQL LIMIT performance - mysql

I have a large table in MySQL, about 1 million records.
I'm using a dynamic query with different parameters in the WHERE clause and the ORDER BY, so I can't use something like AND id > 34000 LIMIT 10.
I have indexes on the fields used in WHERE, LIMIT and ORDER BY, but an index alone doesn't help.
I need a better way than LIMIT 34000, 10. Is there any way to solve the offset delay?
I'm including my table schema below, but I've only copied the more useful fields and left out the indexes, because I'm using dynamic queries.
CREATE TABLE IF NOT EXISTS `p_apartmentbuy` (
`property_id` mediumint(8) unsigned NOT NULL,
`dateadd` int(10) unsigned NOT NULL,
`sqm` smallint(5) unsigned NOT NULL,
`sqmland` smallint(5) unsigned NOT NULL,
`age` tinyint(2) unsigned NOT NULL,
`price` bigint(12) unsigned NOT NULL,
`pricemeter` int(11) unsigned NOT NULL,
`floortotal` tinyint(3) unsigned NOT NULL,
`floorno` tinyint(3) unsigned NOT NULL,
`unittotal` smallint(4) unsigned NOT NULL,
`unitthisfloor` tinyint(3) unsigned NOT NULL,
`room` tinyint(1) unsigned NOT NULL,
`parking` tinyint(1) unsigned NOT NULL,
`renovate` tinyint(1) unsigned NOT NULL,
`address` varchar(255) COLLATE utf8_general_ci NOT NULL,
`describe` varchar(500) COLLATE utf8_general_ci NOT NULL,
`featured` tinyint(1) unsigned NOT NULL,
`l_location_id` smallint(5) unsigned NOT NULL,
`l_city_id` smallint(4) unsigned NOT NULL,
`pf_furnished_id` tinyint(2) unsigned NOT NULL,
PRIMARY KEY (`property_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

The problem with a table of 1 million records won't be the AND id > 34000 LIMIT 10 versus LIMIT 34000, 10; it will come down to the structure and the rest of the query. I.e., you need indexes, a PK, and FKs to speed up the query. Beyond that, an ORDER BY will probably slow it down, and a search like '%text%' will make your query SLOW. It also depends on the table's engine.
So don't expect that changing LIMIT 10 will make a huge difference. There are a couple of tools that will help you determine a 'better' query, but not all queries work the same way, so don't expect the "best solution", because it doesn't exist.
You can use SHOW CREATE TABLE, DESCRIBE SELECT ..., or EXPLAIN to see what's going on, or use the BENCHMARK command to see the approximate time of a function you are applying, in order to improve it.
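For example, a minimal diagnostic pass against the posted table might look like this (the WHERE and ORDER BY values below are placeholders, not the asker's actual dynamic query):
-- Inspect the table definition and its indexes
SHOW CREATE TABLE p_apartmentbuy;
-- See how MySQL plans to execute a representative paginated query
EXPLAIN SELECT property_id FROM p_apartmentbuy
WHERE l_city_id = 1 AND room >= 2
ORDER BY dateadd DESC
LIMIT 34000, 10;
-- BENCHMARK only times an expression; useful for comparing functions, not whole queries
SELECT BENCHMARK(1000000, CRC32('some text'));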
EDIT:
Some tools for MySQL
I recommend you take a look at these programs; they will help you with this part of performance.
mysqlslap (it's like BENCHMARK, but you can customize the results more).
SysBench (tests CPU performance, I/O performance, mutex contention, memory speed, database performance).
MySQLTuner (with this you can analyze general statistics, storage engine statistics, and performance metrics).
mk-query-profiler (performs analysis of a SQL statement).
mysqldumpslow (good for finding out which queries are causing problems).

MySQL is able to optimize LIMIT clauses (i.e. only scan / evaluate the rows in the range specified by LIMIT) if it is able to use only indexes to find rows matching the query.
For queries like SELECT * FROM users WHERE active = 1 ORDER BY created_at, adding an index on (active, created_at) is enough.
See http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/
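One common technique for deep offsets is a deferred join: paginate over a narrow covering index first, then fetch the full rows. Sketched against the asker's table (the index and the filter column are illustrative assumptions, since the real queries are dynamic):
-- Illustrative covering index for one concrete filter/sort combination
ALTER TABLE p_apartmentbuy ADD INDEX idx_city_dateadd_id (l_city_id, dateadd, property_id);
-- The inner query can be resolved from the index alone, so even a deep offset only
-- scans index entries; only the final 10 rows are fetched in full via the primary key
SELECT p.*
FROM (
    SELECT property_id
    FROM p_apartmentbuy
    WHERE l_city_id = 1
    ORDER BY dateadd DESC
    LIMIT 34000, 10
) t
JOIN p_apartmentbuy p ON p.property_id = t.property_id
ORDER BY p.dateadd DESC;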

Related

MySQL Query Optimization that touches three tables via a union of two of them

I have a query that returns results from a single table based on the provided ID existing in a column in one of two, or both, tables. The DB schema for the relevant tables is provided below, as well as the initial query and then what a peer later recommended to me. I go into some detail below about why this query works, but I need to optimize it further for larger datasets and pagination.
CREATE TABLE `killmails` (
`id` BIGINT(20) UNSIGNED NOT NULL,
`hash` VARCHAR(255) NOT NULL,
`moon_id` BIGINT(20) NULL DEFAULT NULL,
`solar_system_id` BIGINT(20) UNSIGNED NOT NULL,
`war_id` BIGINT(20) NULL DEFAULT NULL,
`is_npc` TINYINT(1) NOT NULL DEFAULT '0',
`is_awox` TINYINT(1) NOT NULL DEFAULT '0',
`is_solo` TINYINT(1) NOT NULL DEFAULT '0',
`dropped_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`destroyed_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`fitted_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`total_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`killmail_time` DATETIME NOT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`, `hash`),
INDEX `total_value` (`total_value`),
INDEX `killmail_time` (`killmail_time`),
INDEX `solar_system_id` (`solar_system_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_attackers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_done` BIGINT(20) UNSIGNED NOT NULL,
`final_blow` TINYINT(1) NOT NULL DEFAULT '0',
`security_status` DECIMAL(17,15) NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`weapon_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `weapon_type_id` (`weapon_type_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_attackers_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_victim` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_taken` BIGINT(20) UNSIGNED NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NOT NULL,
`ship_value` DECIMAL(18,4) NOT NULL DEFAULT '0.0000',
`pos_x` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_y` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_z` DECIMAL(30,10) NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_victim_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This first query is where the problem started:
SELECT
*
FROM
killmails k
LEFT JOIN killmail_attackers ka ON k.id = ka.killmail_id
LEFT JOIN killmail_victim kv ON k.id = kv.killmail_id
WHERE
ka.character_id = ?
OR kv.character_id = ?
ORDER BY k.killmail_time DESC
LIMIT ? OFFSET ?
This worked okay, but query times were long. We optimized it to this:
SELECT
killmails.*
FROM (
SELECT killmail_victim.killmail_id FROM killmail_victim
WHERE killmail_victim.corporation_id = ?
UNION
SELECT killmail_attackers.killmail_id FROM killmail_attackers
WHERE killmail_attackers.corporation_id = ?
) SELECTED_KMS
LEFT JOIN killmails ON killmails.id = SELECTED_KMS.killmail_id
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?
I saw a huge improvement in query times when looking up killmails for characters; however, when I started querying larger datasets like corporation and alliance killmails, the query slows down. This is because the queries that are union'd together can potentially return large sets of data, and the time it takes to read all of that into memory so that the SELECTED_KMS table can be created is, I believe, what takes so long. Most of the time, with alliances, my connection to the database times out from the application. One alliance returned 900K killmail IDs from one of the union'd tables; I'm not sure what the other returned.
I can easily add limit statements to the internal queries, but this will introduce a lot of complications when I get to paginating the data or when I introduce a feature to search for KMs by date for example.
I am looking for suggestions on how this query can be optimized and still allow for easy pagination in the near future.
Thank You
Change INDEX(corporation_id) in both tables to INDEX(corporation_id, killmail_id) so that the inner queries will be "covering".
In general, INDEX(a) is useless when you also have INDEX(a, b). Any query that needs just a can use either of those indexes. (This rule does not apply to b, only to the "leftmost" column(s).)
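A sketch of that change against the posted schema (dropping the old single-column indexes as redundant, per the rule above):
-- The inner UNION branches only need killmail_id for a given corporation_id,
-- so these composite indexes make them "covering"
ALTER TABLE killmail_attackers
    ADD INDEX corporation_id_killmail_id (corporation_id, killmail_id),
    DROP INDEX corporation_id;
ALTER TABLE killmail_victim
    ADD INDEX corporation_id_killmail_id (corporation_id, killmail_id),
    DROP INDEX corporation_id;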
Where does killmails.id come from? It's not AUTO_INCREMENT; it is not alone in the PRIMARY KEY, so there is no specified "uniqueness" constraint. Is it unique by some other design? Is it computed somewhere else in the code? (I ask because I need a feel for its uniqueness and other characteristics.)
Add INDEX(id, killmail_time).
What version are you using?
Perhaps UNION ALL would give the same results? It would be faster because it would not need to de-dup.
How much RAM do you have? What is the value of innodb_buffer_pool_size?
Do you really need 8-byte BIGINTs? Even if your application is using longlong (or whatever it calls it), you can probably change the schema without changing the app.
Do you need this much precision and range? DECIMAL(30,10) -- it takes 14 bytes each. DOUBLE would give you about 16 significant digits in 8 bytes, with a wider range of values (up to about 10^308). What "units" are you using? (Overkill for light-years or parsecs; inadequate for miles or km. Perhaps AUs? Then the bottom digit would be a precision of a few meters?)
The last few questions are aimed at shrinking the table and seeing if we can avoid it being as I/O-bound as it apparently is now.
Important
innodb_buffer_pool_size = 128M is terribly small, especially for a 32GB machine, and especially if your dataset is much bigger than 128MB. If there are not any other apps running on the server, bump that setting up to 20G.

INSERT INTO SELECT takes long time on cluster

My mysql cluster: Ver 5.6.30-76.3-56 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel76.3, Revision aa929cb, WSREP version 25.16, wsrep_25.16)
I have a complicated SQL query which inserts about 36k rows into a table with this syntax:
INSERT INTO `sometable` (SELECT ...);
The SELECT is a bit complicated but not slow (0.0023 s), yet the INSERT takes about 40-50 s. The table is not in use while I'm inserting the rows.
My questions are:
Can I speed it up somehow?
The slow insert causes locking problems on the other tables (because of the SELECT).
Is this workflow good or bad practice? Is there anything better?
Thanks
UPDATE:
The table schema:
CREATE TABLE `sometable` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned DEFAULT NULL,
`a` varchar(255) DEFAULT NULL,
`b` smallint(6) unsigned DEFAULT NULL,
`c` smallint(6) unsigned DEFAULT NULL,
`d` smallint(6) unsigned DEFAULT NULL,
`e` smallint(6) unsigned DEFAULT NULL,
`f` varchar(255) DEFAULT '',
`country_id` int(10) unsigned DEFAULT NULL,
`city_id` int(10) unsigned DEFAULT NULL,
`g` smallint(6) unsigned DEFAULT NULL,
`h` smallint(6) unsigned DEFAULT NULL,
`i` smallint(6) unsigned DEFAULT NULL,
`j` smallint(6) unsigned DEFAULT NULL,
`k` smallint(6) unsigned DEFAULT NULL,
`l` varchar(3) DEFAULT NULL,
`m` varchar(3) DEFAULT NULL,
`n` text,
`o` varchar(255) DEFAULT NULL,
`p` varchar(32) DEFAULT NULL,
`q` varchar(32) DEFAULT NULL,
`r` varchar(32) DEFAULT NULL,
`s` time DEFAULT NULL,
`t` time DEFAULT NULL,
`u` text,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `country_id` (`country_id`),
KEY `city_id` (`city_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
UPDATE2:
When I try to run the query I get an error in some cases:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
MY SOLUTION:
Here is my final solution, if somebody is interested:
gist
The main problem was that while I filled mytable the other queries got stuck, and the cluster had serious performance problems. In this solution I create a temporary table and fill it with data in "dirty read" mode, then I copy that data to mytable in chunks. It takes a bit more time, but there are no performance problems and the other queries don't get stuck.
A SELECT operation that returns a row of the length you describe every 64 nanoseconds is very fast. That's what 36 kilorows in 2.3 milliseconds works out to. It seems likely that your SELECT query timing doesn't account for the transport of the result set to the MySQL client. At any rate, using that performance as a comparison to an INSERT operation sets your expectations unreasonably high.
You might try issuing this command before starting your operation. It will allow your SELECT operation to proceed with less contention against your application's traffic on the source tables for the SELECT. See here: https://dev.mysql.com/doc/refman/5.7/en/set-transaction.html
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
You might try a two step process, involving a temporary table. This will have the advantage of not having to update all the indexes in some_table at the same time as the SELECT operation. That operation will look like this.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TEMPORARY TABLE insert_batch AS SELECT ... ;
INSERT INTO some_table SELECT * FROM insert_batch;
DROP TEMPORARY TABLE insert_batch;
You should understand that InnoDB posts your batch of insertions to your table as a single transaction. If you can do this in a way that handles about 500 rows at a time rather than 36K, you'll have more transactions, but they will be smaller. That's generally a way to get higher throughput.
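A sketch of that batching idea layered on the temporary-table approach above; the batch_id column, the 500-row slice, and the listed column names are illustrative assumptions:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Add a numeric key to the staging rows so they can be copied in slices
CREATE TEMPORARY TABLE insert_batch (
    batch_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
) AS SELECT ... ;                        -- the same complicated SELECT as before
-- One small transaction per ~500-row slice; repeat with 501-1000, 1001-1500, ... until done
START TRANSACTION;
INSERT INTO some_table (user_id, a, b)
SELECT user_id, a, b FROM insert_batch WHERE batch_id BETWEEN 1 AND 500;
COMMIT;
DROP TEMPORARY TABLE insert_batch;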
If all else fails, this may be a viable solution. First, see http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
Load your corrections into a temp table (or non-replicated MyISAM table).
Loop through the temp table (using code similar to that link). Pick 100 rows at a time.
Do the INSERT ... SELECT ... of 100 rows in a separate transaction.
This technique may (or may not) take longer than 40-50 s, but at least it is much less likely to time out or deadlock.
In general, avoid running any transaction that lasts longer than a few seconds. The link above gives somewhat generic guidance on how to "chunk" lengthy (and repetitive) operations to avoid long transactions.

MySQL OUTER LEFT JOIN performance

I am updating an existing web-based inventory system that pulls data from a MySQL database. The main structures for the stored data are "items" and "tags", with a one-to-many relationship (items can have multiple corresponding tags).
The existing front-end system for the data is a Backbone.js app that pulls the entire datastore on login and manipulates that data in-memory, committing back to the database when necessary via a RESTful interface. (This is not how I would have designed the system, but it is now a common pattern in Backbone and Spine apps, and how most of the tutorials and books teach these frameworks.)
To serve the initial fetch performed by the front-end, in which it captures the entire dataset (about 1,000 items and 10,000 item tags at this point), the back-end performs a SELECT query for the items table and then subsequent SELECT queries against the tags table for each item fetched. Performance sucks, obviously. I thought this could be improved with a JOIN, figuring one select query is better than 1,000. The following query fetches the data I need but takes over 15 s to execute, even on my local development server. What gives? Can we improve this system or query without setting up additional infrastructure like a caching key-value store?
SELECT items.*, itemtags.id as `tag_id`, itemtags.tag, itemtags.type
FROM items LEFT OUTER JOIN
itemtags
ON items.id = itemtags.item_id
ORDER BY items.id;
Here are the table structures:
CREATE TABLE `items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`num` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`length_inches` int(10) unsigned DEFAULT NULL,
`length_feet` int(10) unsigned DEFAULT NULL,
`width_inches` int(10) unsigned DEFAULT NULL,
`width_feet` int(10) unsigned DEFAULT NULL,
`height_inches` int(10) unsigned DEFAULT NULL,
`height_feet` int(10) unsigned DEFAULT NULL,
`depth_inches` int(10) unsigned DEFAULT NULL,
`depth_feet` int(10) unsigned DEFAULT NULL,
`retail_price` int(10) unsigned DEFAULT NULL,
`discount` int(10) unsigned DEFAULT NULL,
`decorator_price` int(10) unsigned DEFAULT NULL,
`new_price` int(10) unsigned DEFAULT NULL,
`sold` int(10) unsigned NOT NULL,
`push_date` int(10) unsigned DEFAULT NULL,
`updated` int(10) unsigned NOT NULL,
`created` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1747 DEFAULT CHARSET=latin1;
CREATE TABLE `itemtags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`item_id` int(10) unsigned NOT NULL,
`tag` varchar(100) NOT NULL,
`type` varchar(100) NOT NULL,
`created` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=61474 DEFAULT CHARSET=latin1;
I think you could use this:
SELECT *, a.id as `tag_id`, a.tag, a.type
FROM items LEFT OUTER JOIN
(SELECT id, item_id, tag, type from itemtags ORDER BY 1,2,3) a
ON items.id = a.item_id
ORDER BY items.id;
I didn't really change much, just the alias. a doesn't signify anything important.
I didn't fill the tables but your original query took 4ms, mine took 1ms.
http://sqlfiddle.com/#!2/b9551/6
Your application can pull the entire data store regardless of what you have in your dataset, as "data store" and "dataset" are not synonymous.
You don't have an index on the join column either. You should put an index on id and item_id so the table can return results more quickly; in my sub-query I only imposed an ordering with the ORDER BY. Hope this helps.
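Concretely, the missing piece in the posted schema is an index on the join column; something along these lines (a sketch, not code from the thread):
-- The join is items.id = itemtags.item_id, but itemtags has no index on item_id,
-- so MySQL scans every tag row for every item; this index fixes that
ALTER TABLE itemtags ADD INDEX idx_item_id (item_id);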
In terms of performance, you are probably not comparing like-to-like.
The SQL query is doing all of the following things in full:
Joining the two tables together
Sorting the results by items.id
Returning all the results
Is the original version doing all three of these and waiting until they are completed?
My guess is that the original code is pulling the items back in the order you want them, and then only pulling the tags for a handful that are actually needed at any given time.
In addition, it is unclear how large the items.* data is. The way the query is formulated, you are pulling this about 10 times for each item -- potentially a much larger return set than the original data.
The real question is why you need all this information in the memory of the application. You have the database; just pull back what you need when you need it. Are you familiar with LIMIT and OFFSET? These may be what you are really looking for.
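For illustration, a paged version of the same join; the page size of 50 is an arbitrary placeholder:
-- Fetch one page of items first, then attach their tags, instead of loading the whole datastore
SELECT i.*, itemtags.id AS tag_id, itemtags.tag, itemtags.type
FROM (
    SELECT * FROM items ORDER BY id LIMIT 50 OFFSET 0   -- page 1; page N uses OFFSET (N - 1) * 50
) i
LEFT OUTER JOIN itemtags ON i.id = itemtags.item_id
ORDER BY i.id;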

Mysql partition indexing

I want to create a table from batch data for data mining purposes. About 25 million rows of data a day will go into this table. There are several indexes defined on the table, so the insertion speed (I do batch insertions) is quite slow: with no indexes I can insert around 40K rows, while with indexes it is more like 3-4K, which makes this whole thing infeasible. So the idea is to partition the data by day, disable the keys, do the day's insertions, and then re-enable the indexes. Re-enabling indexes on a day's worth of data takes, say, 20 minutes, which is fine.
This brings me to my question: when you re-enable the indexes, will it have to recalculate the indexes on all partitions, or just for that day? It seems clear that for the index the partitions are on (date in this case), it should be for that day only. But what about the other indexes? If it needs to recalculate the indexes for all partitions, there is no way this can be done in a reasonable amount of time. Does anyone know?
SHOW CREATE TABLE output is like this:
CREATE TABLE `sts` (
`userid` int(10) unsigned DEFAULT NULL,
`urlid` int(10) unsigned DEFAULT NULL,
`geoid` mediumint(8) unsigned DEFAULT NULL,
`cid` mediumint(8) unsigned DEFAULT NULL,
`m` smallint(5) unsigned DEFAULT NULL,
`t` smallint(5) unsigned DEFAULT NULL,
`d` tinyint(3) unsigned DEFAULT NULL,
`requested` int(10) unsigned DEFAULT NULL,
`rate` tinyint(4) DEFAULT NULL,
`mode` varchar(12) DEFAULT NULL,
`session` smallint(5) unsigned DEFAULT NULL,
`sins` smallint(5) unsigned DEFAULT NULL,
`tos` mediumint(8) unsigned DEFAULT NULL,
PRIMARY KEY (userid, urlid, requested),
KEY `id_index` (`m`),
KEY `id_index2` (`t`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
It is not currently partitioned.
You disable/enable indexes on a table. That means the indexes will be disabled/enabled on all parts of the table.
Consider this scenario for loading new data:
Create a staging table defining all partitions you will need
Load data into staging table without indexes.
Create indexes on this table.
Move partition to target table, which is partitioned same as staging table.
Drop indexes on staging table
To partition your existing data in a controllable manner, you can use the same logic to move the data into a new partitioned table.
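A rough sketch of that staging flow, assuming MySQL 5.6+ (for ALTER TABLE ... EXCHANGE PARTITION) and hypothetical table/partition names; the exchanged staging table must be unpartitioned and have the same structure, indexes and engine as the target:
-- sts_part is the day-partitioned target table, p20130101 is today's (empty) partition
CREATE TABLE sts_staging LIKE sts_part;
ALTER TABLE sts_staging REMOVE PARTITIONING;
ALTER TABLE sts_staging DISABLE KEYS;     -- skip non-unique index maintenance while loading (MyISAM)
-- ... bulk-load the day's rows into sts_staging ...
ALTER TABLE sts_staging ENABLE KEYS;      -- rebuild indexes for this one day only
-- Swap the loaded, indexed day into the partitioned table in one cheap operation
ALTER TABLE sts_part EXCHANGE PARTITION p20130101 WITH TABLE sts_staging;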

MySQL indexes creation strategy and inner logic

This question expects a generic answer to the broad problem of index creation on a MySQL database.
Let's take this table example :
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) NOT NULL DEFAULT '0',
`author_id` int(11) unsigned NOT NULL,
`modificator_id` int(11) unsigned DEFAULT NULL,
`category_id` int(11) unsigned DEFAULT NULL,
`title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`headline` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`url_alias` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`priority` mediumint(11) unsigned NOT NULL DEFAULT '50',
`publication_date` datetime NOT NULL,
`creation_date` datetime NOT NULL,
`modification_date` datetime NOT NULL,
PRIMARY KEY (`id`)
);
Over such a sample there is a wide range of queries that could be performed on different criteria:
category_id
published
publication_date
e.g.:
SELECT id FROM article WHERE NOT published AND category_id = '2' ORDER BY publication_date;
On many tables you can see a wide range of state fields (like published here), date fields, or reference fields (like author_id or category_id). What strategy should be picked to create indexes?
This can be broken down into the following points:
Should I make an index on every field that can be used in a query (either as a WHERE argument or in ORDER BY), even if this can lead to a lot of indexes per table?
Should I also make an index on fields that have only a small set of values, like a boolean or enum? This only reduces the scanned range by a factor of n (assuming n is the number of distinct values and every value is used homogeneously).
I've read that MySQL prior to 5.0 used only one index per query; how does the system pick it? (By choosing the most restrictive one?)
How is an OR condition processed?
How much is this going to slow down inserts?
Does InnoDB vs. MyISAM change anything about this problem?
I know the EXPLAIN statement can be used to find out whether a query is optimized or not, but a bit of concrete theory would really be more constructive than a purely empirical approach!
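For concreteness, the kind of composite index this discussion is about, sketched against the example query above (an illustration of the leftmost-prefix idea, not an answer from the thread):
-- Serves the published and category_id equality filters and delivers rows already
-- ordered by publication_date, so the example query can avoid a filesort.
-- Note: writing the filter as published = 0 (rather than NOT published) helps the
-- optimizer match the index.
ALTER TABLE article ADD INDEX idx_pub_cat_date (published, category_id, publication_date);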