I am trying to find a way to improve performance for my mysql table containing ip ranges (it's gonna have up to 500 SELECT queries per second (!) in peak hours so I am little worried).
I have a table of this structure:
id smallint(5) Auto Increment
ip_start char(16)
ip_end char(16)
Coding is utf8_general_ci(on whole table and each columns except id), table is type of MyISAM (only SELECT queries, no insert/delete needed here). Indexes for this table are PRIMARY id.
At this momen table has almost 2000 rows. All of them contains ranges for ip.
For example:
ip_start 128.6.230.0
ip_end 128.6.238.255
When user comes to a website I am checking if his ip is in some of those ranges in my table. I use this query (dibi sql library):
SELECT COUNT(*)
FROM ip_ranges
WHERE %s", $user_ip, " BETWEEN ip_start AND ip_end
If result of query is not zero then the ip of the user is in one of those ranges in table - which is all i need it to do.
I was thinking maybe about putting some indexes to that table? But i am not quite sure how it works and if it's such a good idea (since there is maybe nothing to really index, right? most of those ip ranges are different).
I also had varchar type on those ip_start and ip_end columns but i switched it to just char (guess its faster?).
Anyone any ideas about how to improve this table/queries even further?
You don't want to use aggregation. Instead, check whether the following returns any rows:
SELECT 1
FROM ip_ranges
WHERE %s", $user_ip, " BETWEEN ip_start AND ip_end
LIMIT 1;
The LIMIT 1 says to stop at the first match, so it is faster.
For this query, you want an index on ip_ranges(ip_start, ip_end).
This still has a performance problem when there is no match. The entire index after the ip being tested has to be scanned. I think the following should be an improvement:
SELECT COUNT(*)
FROM (SELECT i.start, ip_end
FROM ip_ranges i
WHERE %s", $user_ip, " >= ip_start
ORDER BY ip_start
LIMIT 1
) i
WHERE $user_ip <= ip_end;
The inner subquery should use the index but pull back the first match. The outer query should should then check the end of the range. Here the count(*) is okay, because there is only one row.
Related
I have problem with MySQL ORDER BY, it slows down query and I really don't know why, my query was a little more complex so I simplified it to a light query with no joins, but it stills works really slow.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows, with memory taking 500 MB, those fields I selecting are all indexed as follow:
oid - INT PRIMARY KEY AUTOINCREMENT
status - INT INDEXED
sent_eRacun - TINYINT INDEXED
Drzava - VARCHAR(3) INDEXED
I am posting screenshoots of explain query first:
The next is the query executed to database:
And this is speed after I remove ORDER BY.
I have also tried sorting with DATETIME field which is also indexed, but I get same slow query as with ordering with primary key, this started from today, usually it was fast and light always.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
Pro tip: Indexes on single columns are generally harmful to performance unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries. Edit You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of easily handling the WHERE..ORDER BY..LIMIT completely efficiently. The possible solution is to treat that as OR, then convert to UNION and do some tricks with the LIMIT, especially if you might also be using OFFSET.
( SELECT ... WHERE .. = 8 AND ... ORDER BY oid LIMIT 10 )
UNION ALL
( SELECT ... WHERE .. = 9 AND ... ORDER BY oid LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or
I have geoip data in a table, network_start_ip and network_end_ip are varbinary(16) columns with the result of INET6_ATON(ip_start/end) as values. 2 other columns are latitude and longitude.
CREATE TABLE `ipblocks` (
`network_start_ip` varbinary(16) NOT NULL,
`network_last_ip` varbinary(16) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
KEY `network_start_ip` (`network_start_ip`),
KEY `network_last_ip` (`network_last_ip`),
KEY `idx_range` (`network_start_ip`,`network_last_ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
As you can see I have created 3 indexes for testing. Why does my (quite simple) query
SELECT
latitude, longitude
FROM
ipblocks b
WHERE
INET6_ATON('82.207.219.33') BETWEEN b.network_start_ip AND b.network_last_ip
not use any these indexes?
The query takes ~3 seconds which is way too long to use it in production.
It doesn't work because there are two columns referenced -- and that is really hard to optimize. Assuming that there are no overlapping IP ranges, you can restructure the query as:
SELECT b.*
FROM (SELECT b.*
FROM ipblocks b
WHERE b.network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY b.network_start_ip DESC
LIMIT 1
) b
WHERE INET6_ATON('82.207.219.33') <= network_last_ip;
The inner query should use an index on ipblocks(network_start_ip). The outer query is only comparing one row, so it does not need any index.
Or as:
SELECT b.*
FROM (SELECT b.*
FROM ipblocks b
WHERE b.network_last_ip >= INET6_ATON('82.207.219.33')
ORDER BY b.network_end_ip ASC
LIMIT 1
) b
WHERE network_last_ip <= INET6_ATON('82.207.219.33');
This would use an index on (network_last_ip). MySQL (and I think MariaDB) does a better job with ascending sorts than descending sorts.
Thanks to Gordon Linoff I found the optimal query for my question.
SELECT b.* FROM
(SELECT b.* FROM ipblocks b WHERE b.network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY b.network_start_ip DESC LIMIT 1 )
b WHERE INET6_ATON('82.207.219.33') <= network_last_ip
Now we select the blocks smaller than INET6_ATON(82.207.219.33) in the inner query but we order them descending which enables us to use the LIMIT 1 again.
Query response time is now .002 to .004 seconds. Great!
Does this query give you correct results? Your start/end IPs seem to be stored as a binary string while you're searching for an integer representation.
I would first make sure that network_start_ip and network_last_ip are unsigned INT fields with the integer representation of the IP addresses. This is assuming that you work with IPv4 only:
CREATE TABLE ipblocks_int AS
SELECT
INET_ATON(network_start_ip) as network_start_ip,
INET_ATON(network_last_ip) as network_last_ip,
latitude,
longitude
FROM ipblocks
Then use (network_start_ip,network_last_ip) as primary key.
It's a tough problem. There is no simple solution.
The reason it is tough is that it is effectively
start <= 123 AND
last >= 123
Regardless of what indexes are available, the Optimizer will work with one or the other of those. With INDEX(start, ...), it will pick start <= 123 it will scan the first part of the index. Similarly for the other clause. One of those scans more than half the index, the other scans less -- but not enough less to be worth using an index. Moving it into the PRIMARY KEY will help with some cases, but it is hardly worth the effort.
Bottom line, not matter what you do in the way of INDEX or PRIMARY KEY, most IP constants will lead to more than 1.5 seconds for the query.
Do your start/last IP ranges overlap? If so, that adds complexity. In particular, overlaps would probably invalidate Gordon's LIMIT 1.
My solution involves requires non-overlapping regions. Any gaps in IPs necessitate 'unowned' ranges of IPs. This is because there is only a start_ip; the last_ip is implied by being less than the start of the next item in the table. See http://mysql.rjweb.org/doc.php/ipranges (It includes code for IPv4 and for IPv6.)
Meanwhile, DOUBLE for lat/lng is overkill: http://mysql.rjweb.org/doc.php/latlng#representation_choices
I'm looking for a reason and suggestions.
My table have about 1.4Million rows and when I run following query it took over 3 minutes. I added count just for showing result. My real query is without count.
MariaDB [ams]> SELECT count(asin) FROM asins where asins.is_active = 1
and asins.title is null and asins.updated < '2018-10-28' order by sortorder,id;
+-------------+
| count(asin) |
+-------------+
| 187930 |
+-------------+
1 row in set (3 min 34.34 sec)
Structure
id int(9) Primary
asin varchar(25) UNIQUE
is_active int(1) Index
sortorder int(9) Index
Please let me know if you need more information.
Thanks in advance.
EDIT
Query with EXPLAIN
MariaDB [ams]> EXPLAIN SELECT asin FROM asins where asins.is_active = 1 and asins.title is null and asins.updated < '2018-10-28' order by sortorder,id;
The database is scanning all the rows to answer the query. I imagine you have a really big table.
For this query, the ORDER BY is unnecessary (but it should have no impact on performance:
SELECT count(asin)
FROM asins
WHERE asins.is_active = 1 AND
asins.title is null AND
asins.updated < '2018-10-28' ;
Then you want an index on (is_active, title, updated).
Looks like you have an index on is_active and updated. So that index is going to be scanned (like a table scan, every record in the index read), but since title is not in the index, there is going to be a second operation which looks up title in the table. You can think of this as a join between the index and the table. If most of the records in the index match your conditions, then the join is going to involve most of the data in the table. Large joins are slow.
You might be better off with a full table scan if the conditions against the index are going to result in a large number of records returned.
See https://dba.stackexchange.com/questions/110707/how-can-i-force-mysql-to-ignore-all-indexes for a way to force the full table scan. Give it a try and see if your query is faster.
Try these:
INDEX(is_active, updated),
INDEX(is_active, sortorder, id)
And please provide SHOW CREATE TABLE.
With the first of these indexes, some of the filtering will be done, but then it will still have to sort the results.
With the second index, the Optimizer may chose to filter on the only = column, then avoid the sort by launching into the ORDER BY. The risk is that it will still have to hit so many rows that avoiding the sort is not worth it.
What percentage of the table has is_active = 1? What percentage has a null title? What percentage is in that date range?
When you create a compound index, and part of it is range based, you want the range based part first.
So try the index (updated, is_active, title)
This way updated becomes a prefix and can be used in range queries.
I have to sort the query result on an alias column (total_reports) which is in group by condition with limit of having 50 number of records.
Please let me know where I am missing,
SELECT Count(world_name) AS total_reports,
name,
Max(last_update) AS report
FROM `world`
WHERE ( `id` = ''
AND `status` = 1 )
AND `time` >= '2017-07-16'
AND `name` LIKE '%%'
GROUP BY `name`
HAVING `total_reports` >= 2
ORDER BY `total_reports` DESC
LIMIT 50 offset 0
Query return what I need. However it runs on all records of table then return result and takes too many time which is not right way. I have thousands of records so its take time. I want to apply key index on alias which is total_reports in my situation.
Create an index on an column from an aggregated result? No, I'm sorry, but MySQL cannot do that natively.
What you need is probably a Materialized View that you could index. Not supported in MySQL (yet), unless you install extra plugins. See How to Create a Materialized View in MySQL.
The Long Answer
You cannot create an index on a column resulting from a GROUP BY statement. That column does not exist on the table, and cannot be derived at the row level (not a virtual column).
You query may be slow since it's probably reading the whole table. To only read the specific range of rows, add the index:
create index ix1 on `world` (`status`, `id`, `time`);
That should make the query use the filtering condition in a much better way and hopefully will speed up your query, by using and Index Range Scan.
Also, please change '%%' for '%'. Double % doesn't make too much sense. Actually, you should remove this condition altogether -- it's not filtering anything.
Finally, if the query is still slow, please post the execution plan, using:
explain <my_query_here>
I have a geoencoding database with ranges of integers (ip addresses equivalent) in each row
fromip(long) toip (long). the integers are created from ip addresses by php ip2long
I need to find the row in which a given ip address (converted to long) is within the range.
What would be the most efficient way to do it? (keys and query)
If I do (the naive solution) select * from ipranges where fromip <= givenip and toip >= givenip limit 1 and the key is fromip, toip. then for the case where the ip address is not in any given ranges the search goes through all the rows.
SOME MORE INFO:
explain select * from ipranges where
ipfrom <= 2130706433 and ipto >=
2130706433 order by ipfrom Asc
limit 1|
gives me 2.5M rows (total 3.6M in the table).
The key is:
PRIMARY KEY (ipfrom,ipto)
that does not seem to be efficient at all. (the ip above is in none of the ranges)
Your query is fine, put an index on (fromip, toip) which will be a covering index for the query. The table won't have to be examined at all, only the sorted index gets searched, which is as fast as you can be.
The search will not actually go through all the rows. Not only will it go through none of the rows, only the index, but it won't examine every entry in the index either. The index is stored as a sorted tree, and only one path through that tree will have to be followed to determine that your IP is not in the table.