I'm dealing with a table where new rows are added several times per minute. Just as often, I need to run the query below, which I believe is the cause of recent lag spikes.
Is there any way I can make this query more efficient?
SELECT IFNULL((SELECT SUM(amount)
               FROM transactions
               WHERE to_account = :account), 0)
     - IFNULL((SELECT SUM(amount)
               FROM transactions
               WHERE from_account = :account), 0) AS balance;
CREATE TABLE IF NOT EXISTS `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_account` int(11) NOT NULL,
`to_account` int(11) NOT NULL,
`amount` int(11) unsigned NOT NULL,
`fee` int(11) NOT NULL DEFAULT '0',
`ip` varchar(39) DEFAULT NULL,
`time` int(10) NOT NULL,
`type` enum('CLAIM','REFERRAL','DEPOSIT','WITHDRAWAL') NOT NULL,
`is_processed` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `from_account` (`from_account`),
KEY `to_account` (`to_account`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=24099 ;
It looks like your two table-reading queries are
SELECT SUM(amount)
FROM transactions
WHERE from_account = constant
and
SELECT SUM(amount)
FROM transactions
WHERE to_account = constant
The first of these can be optimized by a compound index on (from_account, amount), because it allows the query to be satisfied entirely by an index range scan. The second query can be satisfied by a similar index on (to_account, amount). You can then get rid of your single-column indexes on from_account and to_account, since the compound indexes replace them.
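For example, as a sketch (the compound index names here are my own):

ALTER TABLE transactions
    -- replace the single-column keys with covering compound indexes
    DROP INDEX from_account,
    DROP INDEX to_account,
    ADD INDEX from_account_amount (from_account, amount),
    ADD INDEX to_account_amount (to_account, amount);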
It is still possible that you have contention between your INSERT and SELECT queries. But indexing for the SELECTs should reduce that.
OK, the lag you're experiencing might not be 100% preventable, but there is a small index trick which can reduce the chance of it:
Use a multi-column index on all the columns you filter on and/or return in your query.
In your case that would be:
CREATE INDEX idx_nn_1 ON transactions(from_account,amount);
CREATE INDEX idx_nn_2 ON transactions(to_account,amount);
What this does is ensure that the actual table rows do not have to be read to produce your result: the query can be answered from the index alone (a covering index).
There is a drawback: updating/inserting records with a multi-column index is slower than with a single-column index. The trade-off here can only be determined by testing.
Also, simplify the query:
SELECT
( SELECT IFNULL(SUM(amount), 0) FROM transactions WHERE to_account = ... ) -
( SELECT IFNULL(SUM(amount), 0) FROM transactions WHERE from_account = ... )
AS balance;
The table does not seem very big, but check your setting of innodb_buffer_pool_size.
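You can check the current value with:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';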
I have a query that is taking an embarrassingly long time. ~7 minutes embarrassing. I would really appreciate some help. Missing indexes? Rewrite the query? All of the above?
Many thanks
mysql Ver 14.14 Distrib 5.7.25, for Linux (x86_64)
The query looks like:
SELECT COUNT(*) AS count_all, name
FROM api_events ae
INNER JOIN products p on p.token=ae.product_token
WHERE (ae.created_at > '2019-01-21 12:16:53.853732')
GROUP BY name
Here are the two table definitions
api_events has ~31 million records
CREATE TABLE `api_events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`api_name` varchar(200) NOT NULL,
`hostname` varchar(200) NOT NULL,
`controller_action` varchar(2000) NOT NULL,
`duration` decimal(12,5) NOT NULL DEFAULT '0.00000',
`view` decimal(12,5) NOT NULL DEFAULT '0.00000',
`db` decimal(12,5) NOT NULL DEFAULT '0.00000',
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`product_token` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `product_token` (`product_token`)
) ENGINE=InnoDB AUTO_INCREMENT=64851218 DEFAULT CHARSET=latin1;
and
products has only 12 records
CREATE TABLE `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(30) NOT NULL,
`name` varchar(100) NOT NULL,
`description` varchar(2000) NOT NULL,
`token` varchar(50) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=19 DEFAULT CHARSET=latin1;
You could improve the join performance by adding these indexes:
create index idx1 on api_events(product_token, created_at);
create index idx2 on products(token);
You could also try inverting the columns for api_events:
create index idx1 on api_events(created_at, product_token);
and try adding redundancy to the products index:
create index idx2 on products(token, name);
For the query as stated, you need
api_events: INDEX(created_at, product_token)
products: INDEX(token, name)
Because the WHERE mentions api_events, the Optimizer is likely to start with that table. created_at is in the WHERE, so the index starts with that, even though starting with a 'range' is usually wrong. In this case, the pair is "covering".
Then, INDEX(token, name) is also "covering".
"Covering" indexes give a small, but widely varying, amount of performance improvement.
What happens if you group by the token instead of the name?
SELECT ae.product_token, COUNT(*) AS count_all
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP BY ae.product_token;
For this query, an index on api_events(created_at, product_token) will probably help.
If this is faster, then you can bring in the name using a subquery.
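A sketch of that approach (assuming products.token is unique, so the scalar subquery returns at most one name):

SELECT (SELECT p.name
        FROM products p
        WHERE p.token = t.product_token) AS name,
       t.count_all
FROM (
    SELECT ae.product_token, COUNT(*) AS count_all
    FROM api_events ae
    WHERE ae.created_at > '2019-01-21 12:16:53.853732'
    GROUP BY ae.product_token
) AS t;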
It seems like the criterion on created_at is very selective (looking at only the past 7 days?). That cries out for exploring an index with created_at as the leading column.
The query is also referencing the product_token column from the same table, so we can include that column in the index, to make it a covering index.
CREATE INDEX api_events_IX ON api_events (created_at, product_token);
Using that index, we can probably avoid looking at the vast majority of the 31 million rows, and quickly narrow in on the subset of rows we actually need to look at.
Using the index, the query will still need a "Using filesort" operation to satisfy the GROUP BY.
(My guess here is that the join to the 12 rows in products doesn't exclude a lot of rows, i.e. that for the vast majority of rows in api_events the product_token refers to a row that exists in products.)
Use MySQL EXPLAIN to see the query execution plan.
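For example, prefix the original query with EXPLAIN:

EXPLAIN
SELECT COUNT(*) AS count_all, name
FROM api_events ae
INNER JOIN products p ON p.token = ae.product_token
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP BY name;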
A further possible refinement (to test the performance of) would be to do some of the aggregation in an inline view:
SELECT SUM(s.count_all) AS count_all
, p.name
FROM ( SELECT COUNT(*) AS count_all
, ae.product_token
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP
BY ae.product_token
) s
JOIN products p
ON p.token = s.product_token
GROUP
BY p.name
If the assumption about product_token is misinformed, i.e. if there are lots of rows in api_events that have product_token values that don't reference a row in products ... we might take a different tack ...
I have a setup like this:
using mysql 5.5
a building has a set of measurement equipment
measurements are stored in the measurements table
there are multiple different types
users can have access to either a whole building, or a set of equipment
a few million measurements
The table creation looks like this:
CREATE TABLE `measurements` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`building_id` INT(10) UNSIGNED NOT NULL,
`equipment_id` INT(10) UNSIGNED NOT NULL,
`state` ENUM('normal','high','low','alarm','error') NOT NULL DEFAULT 'normal',
`measurement_type` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `building_id` (`building_id`),
INDEX `equipment_id` (`equipment_id`),
INDEX `state` (`state`),
INDEX `timestamp` (`timestamp`),
INDEX `measurement_type` (`measurement_type`),
INDEX `building_timestamp_type` (`building_id`, `timestamp`, `measurement_type`),
INDEX `building_type_state` (`building_id`, `measurement_type`, `state`),
INDEX `equipment_type_state` (`equipment_id`, `measurement_type`, `state`),
INDEX `equipment_type_state_stamp` (`equipment_id`, `measurement_type`, `state`, `timestamp`)
) COLLATE='utf8_unicode_ci' ENGINE=InnoDB;
Now I need to query for the last 50 measurements of certain types that the user has access to. If the user has access to a whole building, the query runs very, very fast. However, if the user only has access to individual pieces of equipment, the query execution time seems to grow linearly with the number of equipment_ids. For example, I tested with only 2 equipment_ids in the IN query and the execution time was around 10 ms; at 130 equipment_ids, it took 2.5 s. The query I'm using looks like this:
SELECT *
FROM `measurements`
WHERE
`measurements`.`state` IN ('high','low','alarm')
AND `measurements`.`equipment_id` IN (
SELECT `equipment_users`.`equipment_id`
FROM `equipment_users`
WHERE `equipment_users`.`user_id` = 1
)
AND (`measurements`.`measurement_type` IN ('temperature','luminosity','humidity'))
ORDER BY `measurements`.`timestamp` DESC
LIMIT 50
The query seems to favor the measurement_type index, which makes it take 15 seconds; forcing it to use the equipment_type_state_stamp index drops that down to the numbers listed above. Still, why does the execution time go up linearly with the number of ids, and is there something I could do to prevent that?
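For reference, the forced-index variant mentioned above looks roughly like this (a sketch; only the hint differs from the query shown):

SELECT *
FROM measurements FORCE INDEX (equipment_type_state_stamp)
WHERE state IN ('high','low','alarm')
  AND equipment_id IN (
      SELECT equipment_id
      FROM equipment_users
      WHERE user_id = 1
  )
  AND measurement_type IN ('temperature','luminosity','humidity')
ORDER BY timestamp DESC
LIMIT 50;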
A table with a few million rows, something like this:
CREATE TABLE `my_table` (
`CONTVISITID` bigint(20) NOT NULL AUTO_INCREMENT,
`NODE_ID` bigint(20) DEFAULT NULL,
`CONT_ID` bigint(20) DEFAULT NULL,
`NODE_NAME` varchar(50) DEFAULT NULL,
`CONT_NAME` varchar(100) DEFAULT NULL,
`CREATE_TIME` datetime DEFAULT NULL,
`HITS` bigint(20) DEFAULT NULL,
`UPDATE_TIME` datetime DEFAULT NULL,
`CLIENT_TYPE` varchar(20) DEFAULT NULL,
`TYPE` bigint(1) DEFAULT NULL,
`PLAY_TIMES` bigint(20) DEFAULT NULL,
`FIRST_PUBLISH_TIME` bigint(20) DEFAULT NULL,
PRIMARY KEY (`CONTVISITID`),
KEY `cont_visit_contid` (`CONT_ID`),
KEY `cont_visit_createtime` (`CREATE_TIME`),
KEY `cont_visit_publishtime` (`FIRST_PUBLISH_TIME`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=57676834 DEFAULT CHARSET=utf8
I had a query that I have managed to optimize to the following, starting from a flat SELECT:
SELECT a.cont_id, SUM(a.hits)
FROM (
    SELECT cont_id, hits, type, first_publish_time
    FROM my_table
    WHERE create_time > '2017-03-10 00:00:00'
      AND first_publish_time > 1398310263000
      AND type = 1
) AS a
GROUP BY a.cont_id
ORDER BY SUM(hits) DESC
LIMIT 10;
Can this be further optimized?
Edit:
I started with a flat SELECT, as I mentioned before. By a flat SELECT I mean one without a composite (derived-table) structure like my current query, i.e. the single SELECT that someone responded with. A single SELECT is twice as slow, so it is not viable in my case.
Edit 2: A DBA friend suggested changing the query to this:
SELECT a.cont_id, SUM(a.hits)
FROM (
    SELECT cont_id, hits
    FROM my_table
    WHERE create_time > '2017-03-10 00:00:00'
      AND first_publish_time > 1398310263000
      AND type = 1
) AS a
GROUP BY a.cont_id
ORDER BY SUM(hits) DESC
LIMIT 10;
As I do not need the extra fields (type, first_publish_time) and the temporary table is smaller, this makes the query faster, at about 1/4 of the total time of the fastest version I had. He also suggested adding a composite index on (create_time, cont_id, hits). He says that with this index I will get really good performance, but I have not done that yet, as this is a production DB and the ALTER might affect replication. I will post results once done.
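For reference, that suggestion as DDL (the index name is illustrative):

ALTER TABLE my_table ADD INDEX idx_create_cont_hits (create_time, cont_id, hits);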
INDEX(type, first_publish_time)
INDEX(type, create_time)
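As DDL, those would look something like this (names are illustrative):

CREATE INDEX idx_type_first_publish ON my_table (type, first_publish_time);
CREATE INDEX idx_type_create ON my_table (type, create_time);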
Then do
SELECT cont_id, SUM(hits) AS tot_hits
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time > 1398310263000
AND type = 1
group by cont_id
order by tot_hits DESC
LIMIT 10;
Start the index with any '=' filters (type, in this case); after that you get one chance to use a range.
The reason for two indexes: the Optimizer will look at the statistics and decide which one looks better based on the values given.
Consider shrinking the BIGINTs (8 bytes) to some smaller INT type. Saving space will help speed, especially if the table is too big to be cached.
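For example, if the values fit in the smaller types (a sketch only; verify the actual value ranges first, and note that the ALTER rewrites the table):

ALTER TABLE my_table
    MODIFY `NODE_ID` INT UNSIGNED DEFAULT NULL,
    MODIFY `CONT_ID` INT UNSIGNED DEFAULT NULL,
    MODIFY `HITS` INT UNSIGNED DEFAULT NULL,
    MODIFY `TYPE` TINYINT UNSIGNED DEFAULT NULL;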
For further discussion, please provide EXPLAIN SELECT ...;.
I am optimising my queries and found something I can't get my head around.
I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:
SELECT `c`.`id` AS `category.id`,
(SELECT `alias`
FROM `aliases`
WHERE category_id = c.id
AND `old` = 0
AND `lang_id` = 1
ORDER BY `id` DESC
LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` = 1 AND `c`.`parent_id` = '11';
There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.
Still, if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category as well (in this case; it can be more). Everything is indexed, so if I understand correctly it should find the correct alias immediately.
Now here's the weird thing. When I don't compare the aliases by the categories from the conditions, but manually by the category ids the query returns, it does process only 1 row, as intended with the index.
So I replace WHERE category_id = c.id with WHERE category_id IN (37, 43) and the query gets faster.
The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!
Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!
Create table schema
CREATE TABLE `aliases` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lang_id` int(2) unsigned NOT NULL DEFAULT '1',
`alias` varchar(255) DEFAULT NULL,
`product_id` int(10) unsigned DEFAULT NULL,
`category_id` int(10) unsigned DEFAULT NULL,
`brand_id` int(10) unsigned DEFAULT NULL,
`page_id` int(10) unsigned DEFAULT NULL,
`campaign_id` int(10) unsigned DEFAULT NULL,
`old` tinyint(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `category_id` (`category_id`),
KEY `page_id` (`page_id`),
KEY `alias_product_id` (`product_id`,`alias`),
KEY `alias_category_id` (`category_id`,`alias`),
KEY `alias_page_id` (`page_id`,`alias`),
KEY `alias_brand_id` (`brand_id`,`alias`),
KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
SELECT ...
WHERE x=1 AND y=2
ORDER BY id DESC
LIMIT 1
will be performed in one of several ways.
Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...
INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.
Some of the EXPLAIN clues:
Using where -- does not say much
Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.
Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
With the CREATE TABLE, I may have suggestions on how to improve the index.
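Following the INDEX(x, y, id) pattern above, a guess at an index for the subquery in the question (equality columns first, then id for the ORDER BY ... LIMIT 1; the name is my own):

ALTER TABLE aliases ADD INDEX category_lang_old_id (category_id, lang_id, old, id);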
I have the following table in MySQL:
CREATE TABLE `history` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`code` CHAR(32) NOT NULL,
`value` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `timestamp_code` (`timestamp`, `code`),
INDEX `code` (`code`),
INDEX `timestamp` (`timestamp`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;
I would like to know the best practice for accessing, as efficiently as possible, the last available value before a certain date for a certain set of codes.
So far I came up with the following query:
SELECT h.* FROM history h
JOIN (
SELECT code, MAX(timestamp) as 'last_ts'
FROM history WHERE
timestamp < '2015-09-04 13:50:00' AND
code IN ('119813249', '12087792', '12087797',
'127012151', '131014335', '131014378',
'132757371', '15016059', '15016062',
'150250238', '153462747', '155802712',
'156974389', '162277696', '166330444',
'166483001', '167220356', '167264923',
'167867931', '172283682', '177539478',
'177583937', '177648754', '177649011',
'187532416', '189230667', '70273253',
'70342790', '79342386', '82460282',
'98693280', '98693380')
GROUP BY code) last_price
ON last_price.last_ts = h.timestamp
AND last_price.code = h.code
The query above works, but becomes slow as the number of entries in the table grows (100'000'000 rows).
You can download sample data to populate the table.
Create an index on (code, timestamp) rather than (timestamp, code). This lets MySQL sort out codes before looking for the max timestamp per code, and should be much faster. Use EXPLAIN to verify that the index is used.
And if you create that index, you should not have to modify your query.
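As DDL, something like this (the index name is illustrative):

ALTER TABLE history ADD INDEX code_timestamp (code, timestamp);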