I have this query:
SELECT `country`
FROM `geoip_base`
WHERE 1840344811 BETWEEN `start` AND `stop`
It uses the index poorly (an index is used, but a big part of the table is still scanned) and the query is too slow.
I tried ORDER BY and LIMIT, but it hasn't helped.
"start <= 1840344811 AND 1840344811 <= stop" performs similarly.
CREATE TABLE IF NOT EXISTS `geoip_base` (
`start` decimal(10,0) NOT NULL,
`stop` decimal(10,0) NOT NULL,
`inetnum` char(33) collate utf8_bin NOT NULL,
`country` char(2) collate utf8_bin NOT NULL,
`city_id` int(11) NOT NULL,
PRIMARY KEY (`start`,`stop`),
UNIQUE KEY `start` (`start`),
UNIQUE KEY `stop` (`stop`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
The table has 57,424 rows.
EXPLAIN for the query "... BETWEEN `start` AND `stop` ORDER BY `start` LIMIT 1":
it uses the key `stop` and examines 24,099 rows.
Without ORDER BY and LIMIT, MySQL doesn't use any key and examines all rows.
If your table is MyISAM, you can improve this query using SPATIAL indexes:
ALTER TABLE
geoip_base
ADD ip_range LineString;
UPDATE geoip_base
SET ip_range =
LineString
(
Point(-1, `start`),
Point(1, `stop`)
);
ALTER TABLE
geoip_base
MODIFY ip_range LineString NOT NULL;
CREATE SPATIAL INDEX
sx_geoip_range ON geoip_base (ip_range);
SELECT country
FROM geoip_base
WHERE MBRContains(ip_range, Point(0, 1840344811));
This article may be of interest to you:
Banning IP's
Alternatively, if your ranges do not intersect (and given the nature of the database I expect they don't), you can create a UNIQUE index on geoip_base.start and use this query:
SELECT *
FROM geoip_base
WHERE 1840344811 BETWEEN `start` AND `stop`
ORDER BY
`start` DESC
LIMIT 1;
Note the ORDER BY and LIMIT conditions, they are important.
This query is similar to this:
SELECT *
FROM geoip_base
WHERE `start` <= 1840344811
AND `stop` >= 1840344811
ORDER BY
`start` DESC
LIMIT 1;
Using ORDER BY / LIMIT makes the query choose a descending index scan on start, which stops on the first match (i. e. on the range whose start is closest to the IP you enter). The additional filter on stop just checks whether that range contains the IP.
Since your ranges do not intersect, either this range or no range at all will contain the IP you're after.
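The descending-scan trick can be illustrated outside MySQL; here is a minimal sketch using Python's sqlite3 (the table contents are made up, and SQLite's planner differs from MySQL's, but the query shape and the gap behaviour are the same):

```python
import sqlite3

# In-memory stand-in for geoip_base (schema simplified; names from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geoip_base (start INTEGER, stop INTEGER, country TEXT)")
conn.execute("CREATE UNIQUE INDEX ix_start ON geoip_base (start)")
conn.executemany(
    "INSERT INTO geoip_base VALUES (?, ?, ?)",
    [(0, 999, "AA"), (1000, 1999, "BB"), (3000, 3999, "CC")],  # non-intersecting ranges
)

def lookup(ip):
    # Descending scan on start: the first row with start <= ip is the only
    # candidate range; the stop predicate decides whether ip is inside it.
    row = conn.execute(
        "SELECT country FROM geoip_base"
        " WHERE start <= ? AND stop >= ?"
        " ORDER BY start DESC LIMIT 1",
        (ip, ip),
    ).fetchone()
    return row[0] if row else None

print(lookup(1500))   # BB
print(lookup(2500))   # None: 2500 falls in the gap between ranges
```

Because the ranges don't intersect, at most one range can contain the IP, which is what makes the LIMIT 1 safe.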
While Quassnoi's answer https://stackoverflow.com/a/5744860/1095353 is perfectly fine, the MySQL (5.7) function MBRContains(g1,g2) does not cover the full range of IPs in the SELECT: MBRContains matches [g1,g2[, excluding g2 itself.
Using MBRTouches(g1,g2) allows the closed interval [g1,g2] to be matched. With IP blocks stored in the database as start and stop columns, this function is the more suitable choice.
On a database table with ~6m rows (AWS db.m4.xlarge)
SELECT *, AsWKT(`ip_range`) AS `ip_range`
FROM `geoip_base` where `start` <= 1046519788 AND `stop` >= 1046519788;
~ 2-5 seconds
SELECT *, AsWKT(`ip_range`) AS `ip_range`
FROM `geoip_base` where MBRTouches(`ip_range`, Point(0, INET_ATON('XX.XX.XX.XX')));
~ < 0.030 seconds
Source: MBRTouches(g1,g2) - https://dev.mysql.com/doc/refman/5.7/en/spatial-relation-functions-mbr.html#function_mbrtouches
Your table design is off.
You're using DECIMAL(10,0), which stores no fractional digits, yet each value immediately costs 5 bytes where a simple INT would suffice (4 bytes).
After that, you create a compound primary key (5 + 5 bytes) followed by two unique constraints (again 5 bytes each), effectively making your index file almost the same size as the data file.
That way, no matter what you do, your indexes are extremely ineffective.
Using LIMIT doesn't force MySQL to use indexes, at least not the way you constructed your query. What happens is that MySQL obtains the dataset satisfying the condition and then discards the rows that don't conform to the offset/limit.
Also, using MySQL reserved words (such as START) as column names is a bad idea; you should never name your columns after reserved words.
What would be useful is to create your primary key as it is and not index the columns separately.
Also, configuring MySQL to use more memory would speed up execution.
For testing purposes I created a table similar to yours, I defined a compound key of start and stop and used the following query:
SELECT `country` FROM table WHERE 1500 BETWEEN `start` AND `stop` AND start >= 1500
My table is InnoDB type, I have 100k rows inserted, the query examines 87 rows this way and executes in a few milliseconds, my buffer pool size is 90% of the memory at my test machine. That might give insight into optimizing your query / db instance.
SELECT id FROM GEODATA WHERE start_ip <=(select INET_ATON('113.0.1.63')) AND end_ip >=(select INET_ATON('113.0.1.63')) ORDER BY start_ip DESC LIMIT 1;
The above example from Michael J.V. will not work:
SELECT country FROM table WHERE 1500 BETWEEN start AND stop AND start >= 1500
BETWEEN start AND stop
is the same as
start <= 1500 AND stop >= 1500
Thus the query combines start <= 1500 with start >= 1500 in the same clause, so it can only succeed when start = 1500; that is also why the optimizer knows it can use the start index.
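A quick sketch of the collapse, using Python's sqlite3 with made-up rows (the sample data is an assumption; only the row whose start is exactly 1500 survives the combined predicates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (start INTEGER, stop INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1000, 1999), (1500, 1599), (1501, 1600)],   # made-up ranges
)

# 1500 BETWEEN start AND stop already implies start <= 1500, so adding
# start >= 1500 leaves only rows whose start is exactly 1500.
rows = conn.execute(
    "SELECT start, stop FROM t "
    "WHERE 1500 BETWEEN start AND stop AND start >= 1500"
).fetchall()
print(rows)   # [(1500, 1599)]
```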
Related
Specs:
MySQL version: 5.6.19 (Ubuntu)
Also tried MariaDB, and got the same problem
Table:
CREATE TABLE `x` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`a` INT(10) UNSIGNED NOT NULL,
`time` DECIMAL(16,6) NOT NULL,
PRIMARY KEY (`id`),
INDEX `a` (`a`),
INDEX `time` (`time`),
INDEX `time_a` (`time`, `a`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
AUTO_INCREMENT=298846
;
Query:
SELECT COUNT(DISTINCT `a`) c
FROM `x`
WHERE `time` >= (UNIX_TIMESTAMP()- (60 * 24));
This query is very SLOW, if there are a lot of rows with time in the given range. Also note that while there might be a lot of matching rows (thousands or tens of thousands or more), the amount of DISTINCT a's is always rather small (a few hundred).
The query is fast (basically instant), no matter the size of the table, when:
there are only few rows with time in the given range or when
there is no WHERE part (because of the index on a)
That makes me think that it is somehow unable to use the index on a when counting, even though EXPLAIN mentions all three indexes in possible_keys.
The issue remains even if:
time is of type BIGINT or DATETIME (with corresponding changes to the query)
ENGINE=MyISAM
Any suggestions?
SELECT COUNT(DISTINCT `a`)
FROM `x`;
will leapfrog through INDEX(a). See EXPLAIN FORMAT=JSON SELECT ... and look for "using_index_for_group_by": true. This makes it quite fast when there are only a small number of distinct a values.
I suspect that using the WHERE clause will say "using_index_for_group_by": "scanning", implying that it is less efficient. I suspect the implementers did the single-key case, but not the multi-key case.
Was that the entire table definition? I see AUTO_INCREMENT without any index for it. What's up? About the only difference between MyISAM and InnoDB that is relevant to this discussion is the handling of the PRIMARY KEY.
The datatype of time is probably not significant.
If I have not satisfied your "Any suggestions?" question, please rephrase the question.
Try using index hinting to force the query to use the index you want it to use.
SELECT COUNT(DISTINCT `a`) c
FROM `x` FORCE INDEX (the_index_you_want_to_use)
WHERE `time` >= (UNIX_TIMESTAMP()- (60 * 24));
It is best not to do any computations in such where clauses.
SET @unixtime = UNIX_TIMESTAMP() - (60 * 24);
SELECT COUNT(DISTINCT `a`) c
FROM `x` FORCE INDEX (the_index_you_want_to_use)
WHERE `time` >= @unixtime;
If I had to guess, the problem is types. UNIX_TIMESTAMP() returns an unsigned integer. Your time variable is decimal. These are not the same thing. And, type mismatches can confuse the optimizer.
It sounds like the table is big, so changing the type is not feasible (you might want to test this, though, if you can by selecting the data into a new table with the right types).
The following might help:
WHERE `time` >= cast(UNIX_TIMESTAMP() - (60 * 24) as unsigned);
You could also declare a local unsigned variable and store the "constant" in the variable to see if that fixes the performance problem.
Finally, if the index on time, a is not being used, try this variation of the query:
SELECT COUNT(*) as c
FROM (SELECT DISTINCT a
FROM `x`
WHERE `time` >= CAST(UNIX_TIMESTAMP() - (60 * 24) AS UNSIGNED)
) ax
I have seen this re-structuring improve performance on other databases, although not on MySQL.
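The equivalence of the two query forms can be sanity-checked outside MySQL; a small sketch with Python's sqlite3 and synthetic data (column names from the question, everything else is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, a INTEGER, time REAL)")
conn.execute("CREATE INDEX time_a ON x (time, a)")

now = 1_700_000_000.0
# Many rows, but only a few hundred distinct values of a, as in the question.
conn.executemany(
    "INSERT INTO x (a, time) VALUES (?, ?)",
    [(i % 300, now - (i % 5000)) for i in range(20000)],
)

cutoff = now - 60 * 24   # the question's expression, kept as-is

direct = conn.execute(
    "SELECT COUNT(DISTINCT a) FROM x WHERE time >= ?", (cutoff,)
).fetchone()[0]
derived = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT a FROM x WHERE time >= ?)", (cutoff,)
).fetchone()[0]
print(direct, derived)
```

Both forms return the same count; whether the derived-table form is faster depends entirely on the optimizer, so it has to be measured on the actual server.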
EDIT: Thank you everyone for your comments. I have tried most of your suggestions, but they did not help. I need to add that I am running this query through Matlab using Connector/J 5.1.26 (sorry for not mentioning this earlier). I think this is the source of the increase in execution time, since when I run the query directly it takes 0.2 seconds. However, I have never come across such a huge performance hit using Connector/J. Given this new information, do you have any suggestions?
I have the following table in MySQL (CREATE code taken from HeidiSQL):
CREATE TABLE `data` (
`PRIMARY` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`ID` VARCHAR(5) NULL DEFAULT NULL,
`DATE` DATE NULL DEFAULT NULL,
`PRICE` DECIMAL(14,4) NULL DEFAULT NULL,
`QUANT` INT(10) NULL DEFAULT NULL,
`TIME` TIME NULL DEFAULT NULL,
INDEX `DATE` (`DATE`),
INDEX `ID` (`ID`),
INDEX `PRICE` (`PRICE`),
INDEX `QUANT` (`QUANT`),
INDEX `TIME` (`TIME`),
PRIMARY KEY (`PRIMARY`)
)
It is populated with approximately 360,000 rows of data.
The following query takes over 10 seconds to execute:
Select ID, DATE, PRICE, QUANT, TIME FROM database.data WHERE DATE
>= "2007-01-01" AND DATE <= "2010-12-31" ORDER BY ID, DATE, TIME ASC;
I have other tables with millions of rows in which a similar query would take a fraction of a second. I can't figure out what might be causing this one to be so slow. Any ideas/tips?
EXPLAIN:
id = 1
select_type = SIMPLE
table = data
type = ALL
possible_keys = DATE
key = (NULL)
key_len = (NULL)
ref = (NULL)
rows = 361161
Extra = Using where; Using filesort
You are asking for a wide range of data. The time is probably being spent sorting the results.
Is a query on a smaller date range faster? For instance,
WHERE DATE >= '2007-01-01' AND DATE < '2007-02-01'
One possibility is that the optimizer may be using the index on id for the sort and doing a full table scan to filter out the date range. Using indexes for sorts is often suboptimal. You might try the query as:
select t.*
from (Select ID, DATE, PRICE, QUANT, TIME
FROM database.data
WHERE DATE >= "2007-01-01" AND DATE <= "2010-12-31"
) t
ORDER BY ID, DATE, TIME ASC;
I think this will force the optimizer to use the date index for the selection and then sort using file sort -- but there is the cost of a derived table. If you do not have a large result set, this might significantly improve performance.
I assume you already tried to OPTIMIZE TABLE and got no results.
You can either try to use a covering index (at the expense of more disk space, and a slight slowing down on UPDATEs) by replacing the existing date index with
CREATE INDEX data_date_ndx ON data (DATE, TIME, PRICE, QUANT, ID);
and/or you can try and create an empty table data2 with the same schema. Then just SELECT all the contents of data table into data2 and run the same query against the new table. It could be that the data table needed to be compacted more than OPTIMIZE could - maybe at the filesystem level.
Also, check out the output of EXPLAIN SELECT... for that query.
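As a rough illustration of what a covering index buys, here is a sketch with Python's sqlite3 (SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, and the schema is trimmed, but the "COVERING INDEX" marker shows the same idea: the query can be answered from the index alone, with no row lookups):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE data ("PRIMARY" INTEGER PRIMARY KEY, ID TEXT, DATE TEXT,'
    " PRICE REAL, QUANT INTEGER, TIME TEXT)"
)
# The suggested wide index: every column of the query is in the index.
conn.execute("CREATE INDEX data_date_ndx ON data (DATE, TIME, PRICE, QUANT, ID)")

plan = "\n".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT ID, DATE, PRICE, QUANT, TIME FROM data "
        "WHERE DATE >= '2007-01-01' AND DATE <= '2010-12-31'"
    )
)
print(plan)
```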
I'm not familiar with MySQL, only MSSQL, but maybe: what about providing an index which fully covers all fields in your SELECT query?
Yes, it duplicates data, but it lets us move on to the next point of the discussion.
I'm trying to run what I believe to be a simple query on a fairly large dataset, and it's taking a very long time to execute -- it stalls in the "Sending data" state for 3-4 hours or more.
The table looks like this:
CREATE TABLE `transaction` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uuid` varchar(36) NOT NULL,
`userId` varchar(64) NOT NULL,
`protocol` int(11) NOT NULL,
... A few other fields: ints and small varchars
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `uuid` (`uuid`),
KEY `userId` (`userId`),
KEY `protocol` (`protocol`),
KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'
And the query is here:
select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;
The table has approximately 222 million rows, and the where clause in the query filters down to about 20 million rows. The distinct option will bring it down to about 700,000 distinct rows, and then after grouping, (and when the query finally finishes), 4 to 5 rows are actually returned.
I realize that it's a lot of data, but it seems that 4-5 hours is an awfully long time for this query.
Thanks.
Edit: For reference, this is running on AWS on a db.m2.4xlarge RDS database instance.
Why don't you profile the query and see what exactly is happening?
SET PROFILING = 1;
SET profiling_history_size = 0;
SET profiling_history_size = 15;
/* Your query should be here */
SHOW PROFILES;
SELECT state, ROUND(SUM(duration),5) AS `duration (summed) in sec` FROM information_schema.profiling WHERE query_id = 3 GROUP BY state ORDER BY `duration (summed) in sec` DESC;
SET PROFILING = 0;
EXPLAIN /* Your query again should appear here */;
I think this will help you see exactly where the query spends its time, and based on the result you can perform optimization operations.
This is a really heavy query. To understand why it takes so long you should understand the details.
You have a range condition on the indexed field: MySQL finds the smallest created value in the index and, for each index record, gets the corresponding primary key, retrieves the row from disk, and fetches the required fields (protocol, userId) missing from the current index record, putting them into a "temporary table" and making the groupings over those 700,000 rows. The index can be, and is, used here only to speed up the range condition.
The only way to speed it up is to have an index that contains all the necessary data, so that MySQL does not need to make on-disk lookups for the rows. That is called a covering index. But you should understand that the index will reside in memory and will contain ~ sizeOf(created+protocol+userId+PK)*rowCount bytes, which may become a burden in itself for queries that update the table and for the other indexes. It is easier to create a separate aggregates table and periodically update it using your query.
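A sketch of the aggregates-table idea with Python's sqlite3 (table and column names such as daily_users are assumptions, and the refresh step would run on a schedule in practice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (id INTEGER PRIMARY KEY, userId TEXT, protocol INTEGER, created TEXT);
-- One row per (day, protocol, userId); the table and column names are made up.
CREATE TABLE daily_users (
    day TEXT, protocol INTEGER, userId TEXT,
    PRIMARY KEY (day, protocol, userId)
);
""")
conn.executemany(
    "INSERT INTO txn (userId, protocol, created) VALUES (?, ?, ?)",
    [("u1", 1, "2012-02-01 10:00:00"),
     ("u1", 1, "2012-02-01 11:00:00"),   # same user/day: collapses in the aggregate
     ("u2", 1, "2012-02-02 09:00:00"),
     ("u1", 2, "2012-02-01 09:30:00")],
)

# Periodic refresh step (would run on a schedule, e.g. once per day):
conn.execute(
    "INSERT OR IGNORE INTO daily_users (day, protocol, userId) "
    "SELECT date(created), protocol, userId FROM txn"
)

# The report now scans the small aggregate instead of millions of raw rows.
# COUNT(DISTINCT ...) is still needed: a user may appear on several days.
result = conn.execute(
    "SELECT protocol, COUNT(DISTINCT userId) FROM daily_users "
    "WHERE day BETWEEN '2012-01-16' AND '2012-02-14' "
    "GROUP BY protocol ORDER BY protocol"
).fetchall()
print(result)   # [(1, 2), (2, 1)]
```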
Both distinct and group by will need to sort and store temporary data on the server. With that much data that might take a while.
Indexing different combinations of userId, created and protocol will help, but I can't say how much or what index will help the most.
Starting from a certain version of MariaDB (maybe since 10.5), I noticed that after importing a dump with
mysql dbname < dump.sql
the optimizer's idea of the data no longer matches reality, and it makes wrong decisions about indexes.
In general, even listing InnoDB tables with phpMyAdmin becomes very, very slow.
I noticed that running
ANALYZE TABLE myTable;
fixes it.
So after each import I run the following, which is equivalent to running ANALYZE on each table:
mysqlcheck -aA
I have this table:
CREATE TABLE `table1` (
`object` varchar(255) NOT NULL,
`score` decimal(10,3) NOT NULL,
`timestamp` datetime NOT NULL,
KEY `ex` (`object`,`score`,`timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
with 9.1 million rows and I am running the following query:
SELECT `object`, `timestamp`, AVG(score) as avgs
from `table1`
where timestamp >= '2011-12-13'
AND timestamp <= '2011-12-14'
group by `object`
order by `avgs` ASC limit 100;
The dates come from user input. The query takes 6-10 seconds, depending on the range of dates; the run time seems to increase with the number of rows matched.
What can I do to improve this?
I have tried:
fiddling with indexes (brought query time down from max 13sec to max 10sec)
moving storage to fast SAN (brought query time down by around 0.1sec, regardless of parameters).
The CPU and memory load on the server doesn't appear to be too high when the query is running.
The reason the fast SAN performs much better
is that your query requires a copy to a temporary table,
and a filesort for a large result set.
You have five problematic factors:
range query
group-by
sorting
varchar(255) for object
a wrong index
Break timestamp down into two fields:
date, time
Build another reference table for object,
so that you use an integer, such as object_id (instead of varchar(255)), to represent each object.
Rebuild the index on
date (DATE type), object_id
Change the query to
where date IN('2011-12-13', '2011-12-14', ...)
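A sketch of the object_id reference-table idea using Python's sqlite3 (names like objects and object_id are assumptions; the point is replacing the repeated varchar with an integer key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each object string is stored once; the names here are assumptions.
CREATE TABLE objects (object_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE scores (object_id INTEGER, score REAL, date TEXT, time TEXT);
CREATE INDEX ix_scores ON scores (date, object_id);
""")

def object_id(name):
    # Insert-or-lookup: map an object string to its integer id.
    conn.execute("INSERT OR IGNORE INTO objects (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT object_id FROM objects WHERE name = ?", (name,)
    ).fetchone()[0]

for name, score, day in [("widget", 1.5, "2011-12-13"),
                         ("widget", 2.5, "2011-12-14"),
                         ("gadget", 4.0, "2011-12-14")]:
    conn.execute(
        "INSERT INTO scores (object_id, score, date, time) VALUES (?, ?, ?, '00:00')",
        (object_id(name), score, day),
    )

rows = conn.execute("""
SELECT o.name, AVG(s.score) AS avgs
FROM scores s JOIN objects o USING (object_id)
WHERE s.date IN ('2011-12-13', '2011-12-14')
GROUP BY s.object_id
ORDER BY avgs ASC LIMIT 100
""").fetchall()
print(rows)   # [('widget', 2.0), ('gadget', 4.0)]
```

The index entries on (date, object_id) are small and fixed-width, which is the main win over indexing a varchar(255) directly.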
I have a table Cars with a datetime column (DATE) and a bit column (PUBLIC).
Now I would like to fetch the rows with PUBLIC = 1, ordered by DATE, so I use:
select
c.*
from
Cars c
WHERE
c.PUBLIC = 1
ORDER BY
DATE DESC
But unfortunately when I use explain to see what is going on I have this:
1 SIMPLE a ALL IDX_PUBLIC,DATE NULL NULL NULL 103 Using where; Using filesort
And it takes 0.3 ms to fetch this data while I have only 100 rows. Is there any way to avoid the filesort?
As for indexes, I have a non-unique index on (PUBLIC, DATE).
Table def:
CREATE TABLE IF NOT EXISTS `Cars` (
`ID` int(11) NOT NULL auto_increment,
`DATE` datetime NOT NULL,
`PUBLIC` binary(1) NOT NULL default '0',
PRIMARY KEY (`ID`),
KEY `IDX_PUBLIC` (`PUBLIC`),
KEY `DATE` (`PUBLIC`,`DATE`)
) ENGINE=MyISAM AUTO_INCREMENT=186 ;
You need to have a composite index on (public, date)
This way, MySQL will filter on public and sort on date.
From your EXPLAIN I see that you don't have a composite index on (public, date).
Instead you have two different indexes on public and on date. At least, that's what their names IDX_PUBLIC and DATE tell.
Update:
Your public column is not a BIT, it's a BINARY(1). That is a character type and uses character comparison.
When comparing integers to characters, MySQL converts the latter to the former, not vice versa.
These queries return different results:
CREATE TABLE t_binary (val BINARY(2) NOT NULL);
INSERT
INTO t_binary
VALUES
(1),
(2),
(3),
(10);
SELECT *
FROM t_binary
WHERE val <= 10;
---
1
2
3
10
SELECT *
FROM t_binary
WHERE val <= '10';
---
1
10
Either change your public column to be a bit or rewrite your query as this:
SELECT c.*
FROM Cars c
WHERE c.PUBLIC = '1'
ORDER BY
DATE DESC
i. e. compare characters with characters, not characters with integers.
If you are ordering by date, a sort will be required. If there isn't an index by date, then a filesort will be used. The only way to get rid of that would be to either add an index on date or not do the order by.
Also, a filesort does not always imply that the file will be sorted on disk. It could be sorting it in memory if the table is small enough or the sort buffer is large enough. It just means that the table itself has to be sorted.
Looks like you have an index on date already, and since you are using PUBLIC in your where clause, MySQL should be able to use that index. However, the optimizer may have decided that since you have so few rows it isn't worth bothering with the index. Try adding 10,000 or so rows to the table, re-analyze it, and see if that changes the plan.
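For what it's worth, the effect of the composite index can be seen in SQLite too (its planner and plan output differ from MySQL's, but the principle carries over: with an index on (PUBLIC, DATE), the equality on PUBLIC fixes the prefix and DATE is already ordered within it, so no separate sort step appears):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (ID INTEGER PRIMARY KEY, DATE TEXT, PUBLIC INTEGER)")
conn.execute("CREATE INDEX idx_public_date ON Cars (PUBLIC, DATE)")

# Collect the plan text; the composite index satisfies both the filter and
# the sort, so no "USE TEMP B-TREE FOR ORDER BY" step should appear.
plan = "\n".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM Cars WHERE PUBLIC = 1 ORDER BY DATE DESC"
    )
)
print(plan)
```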