I have a MySQL database with information about IP ranges and locations. The table has the following structure:
id (integer not null, auto inc)
from_ (bigint(20))
to_ (bigint(20))
region(integer)
The region field is a foreign key to a table cities (id, city_name).
To find which country an IP address belongs to, we have to run something like the following query:
select region from ipcountry where ip >= from_ and ip <= to_
Because of the number of records, the query is too slow for what I need.
Any ideas on how to optimize this?
Do you have an index on (from_, to_)? That is the place to start.
The next idea is to keep the index and change the query to:
select region
from ipcountry
where ip >= from_
order by from_ desc
limit 1;
If that doesn't give a performance boost, then you will have to think about how to optimize the data structure. The extreme approach is to list every IP address with its region, but the billions of resulting rows may actually hurt performance.
If you go down this path, you need to be smarter. One idea is to have separate tables for whole Class A, Class B, and Class C blocks (/8, /16, /24) that map to a single region, and then a separate table for the remaining, finer-grained ranges.
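The following is a minimal Python sketch of this layering. In-memory dicts stand in for the per-block tables, and a sorted list stands in for the range table. The table names and contents are made up for illustration, and the ranges are assumed not to overlap.

Note the final check against to_. On its own, the ORDER BY from_ DESC LIMIT 1 query returns the nearest lower range even when ip falls in a gap after that range, so the end of the range still has to be compared.

```python
import ipaddress
from bisect import bisect_right

# Hypothetical block tables: whole /8, /16 and /24 blocks that map to one region.
# Keys are the leading octets of the address; values are region ids.
class_a = {10: 1}                  # 10.0.0.0/8     -> region 1
class_b = {(172, 16): 2}           # 172.16.0.0/16  -> region 2
class_c = {(192, 168, 1): 3}       # 192.168.1.0/24 -> region 3

# Hypothetical table of remaining ranges: sorted, non-overlapping (from_, to_, region).
ranges = [
    (int(ipaddress.IPv4Address("8.8.8.0")), int(ipaddress.IPv4Address("8.8.8.127")), 4),
    (int(ipaddress.IPv4Address("8.8.8.128")), int(ipaddress.IPv4Address("8.8.8.255")), 5),
]
starts = [r[0] for r in ranges]

def lookup_region(ip: str):
    """Try the coarse block tables first, then fall back to the range table."""
    o = tuple(int(p) for p in ip.split("."))
    if o[0] in class_a:
        return class_a[o[0]]
    if o[:2] in class_b:
        return class_b[o[:2]]
    if o[:3] in class_c:
        return class_c[o[:3]]
    n = int(ipaddress.IPv4Address(ip))
    i = bisect_right(starts, n) - 1          # last range whose from_ <= n
    if i >= 0 and n <= ranges[i][1]:         # still inside that range's to_?
        return ranges[i][2]
    return None                              # ip falls in a gap
```

Most lookups are answered by a dict probe. Only addresses outside the uniform blocks reach the range search.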
I have GeoIP data in a table. network_start_ip and network_last_ip are varbinary(16) columns holding the result of INET6_ATON(ip_start/end). Two other columns are latitude and longitude.
CREATE TABLE `ipblocks` (
`network_start_ip` varbinary(16) NOT NULL,
`network_last_ip` varbinary(16) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
KEY `network_start_ip` (`network_start_ip`),
KEY `network_last_ip` (`network_last_ip`),
KEY `idx_range` (`network_start_ip`,`network_last_ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
As you can see, I have created three indexes for testing. Why does my (quite simple) query
SELECT
latitude, longitude
FROM
ipblocks b
WHERE
INET6_ATON('82.207.219.33') BETWEEN b.network_start_ip AND b.network_last_ip
not use any of these indexes?
The query takes ~3 seconds, which is far too long to use in production.
It doesn't use the indexes because the condition references two columns, and that is really hard to optimize. Assuming that there are no overlapping IP ranges, you can restructure the query as:
SELECT b.*
FROM (SELECT b.*
FROM ipblocks b
WHERE b.network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY b.network_start_ip DESC
LIMIT 1
) b
WHERE INET6_ATON('82.207.219.33') <= network_last_ip;
The inner query should use an index on ipblocks(network_start_ip). The outer query is only comparing one row, so it does not need any index.
Or as:
SELECT b.*
FROM (SELECT b.*
FROM ipblocks b
WHERE b.network_last_ip >= INET6_ATON('82.207.219.33')
ORDER BY b.network_last_ip ASC
LIMIT 1
) b
WHERE network_start_ip <= INET6_ATON('82.207.219.33');
This would use an index on (network_last_ip). MySQL (and I think MariaDB) does a better job with ascending sorts than descending sorts.
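To see both rewrites side by side, here is a small runnable sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. It makes these assumptions:

- The bounds are stored as integers instead of VARBINARY(16).
- A hypothetical region column replaces latitude/longitude.
- The two ranges, with a gap between them, are invented test data.

Both variants should return the same answer. Both should also return no row for an address that falls in the gap.

```python
import ipaddress
import sqlite3

def ip_int(s: str) -> int:
    return int(ipaddress.ip_address(s))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipblocks (network_start_ip INTEGER, network_last_ip INTEGER, region TEXT)")
conn.execute("CREATE INDEX idx_start ON ipblocks (network_start_ip)")
conn.execute("CREATE INDEX idx_last ON ipblocks (network_last_ip)")
conn.executemany("INSERT INTO ipblocks VALUES (?, ?, ?)", [
    (ip_int("82.207.0.0"), ip_int("82.207.127.255"), "A"),
    (ip_int("82.207.192.0"), ip_int("82.207.255.255"), "B"),  # gap before this range
])

# Variant 1: greatest start <= ip, then check the end.
BY_START = """
SELECT region FROM (
  SELECT * FROM ipblocks WHERE network_start_ip <= ?
  ORDER BY network_start_ip DESC LIMIT 1
) b WHERE ? <= network_last_ip"""

# Variant 2: smallest end >= ip, then check the start.
BY_LAST = """
SELECT region FROM (
  SELECT * FROM ipblocks WHERE network_last_ip >= ?
  ORDER BY network_last_ip ASC LIMIT 1
) b WHERE network_start_ip <= ?"""

def find(sql: str, ip: str):
    n = ip_int(ip)
    row = conn.execute(sql, (n, n)).fetchone()
    return row[0] if row else None
```

For example, find(BY_START, "82.207.219.33") and find(BY_LAST, "82.207.219.33") both pick range "B". An address in the gap, such as 82.207.150.1, yields None from both.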
Thanks to Gordon Linoff I found the optimal query for my question.
SELECT b.* FROM
(SELECT b.* FROM ipblocks b WHERE b.network_start_ip <= INET6_ATON('82.207.219.33')
ORDER BY b.network_start_ip DESC LIMIT 1 )
b WHERE INET6_ATON('82.207.219.33') <= network_last_ip
Now the inner query selects the blocks whose start is less than or equal to INET6_ATON('82.207.219.33'), but orders them descending, which lets us use LIMIT 1.
Query response time is now .002 to .004 seconds. Great!
Does this query give you correct results? Your start/end IPs seem to be stored as a binary string while you're searching for an integer representation.
I would first make sure that network_start_ip and network_last_ip are unsigned INT fields with the integer representation of the IP addresses. This is assuming that you work with IPv4 only:
CREATE TABLE ipblocks_int AS
SELECT
INET_ATON(INET6_NTOA(network_start_ip)) AS network_start_ip,
INET_ATON(INET6_NTOA(network_last_ip)) AS network_last_ip,
latitude,
longitude
FROM ipblocks;
Then use (network_start_ip, network_last_ip) as the primary key.
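For IPv4, INET_ATON('a.b.c.d') computes a·256³ + b·256² + c·256 + d. INET6_ATON returns the same address as a 4-byte big-endian binary string.

The small standard-library Python sketch below shows both representations. It also shows that comparing equal-length big-endian byte strings byte by byte orders addresses the same way as the integers do. That bears on the binary-versus-integer question raised in the comment, but only while both sides of the comparison have the same length.

```python
import ipaddress

def inet_aton(s: str) -> int:
    """Same arithmetic as MySQL's INET_ATON for a dotted-quad IPv4 string."""
    a, b, c, d = (int(p) for p in s.split("."))
    return ((a * 256 + b) * 256 + c) * 256 + d

def inet6_aton_v4(s: str) -> bytes:
    """4-byte big-endian form, which is what INET6_ATON returns for IPv4 input."""
    return ipaddress.IPv4Address(s).packed

addrs = ["82.207.219.33", "9.255.255.255", "10.0.0.0", "128.6.230.0"]
by_int = sorted(addrs, key=inet_aton)        # integer order
by_bytes = sorted(addrs, key=inet6_aton_v4)  # byte-wise order of the packed form
```

Integer columns are still smaller and simpler to work with. Mixing 4-byte IPv4 values with 16-byte IPv6 values in the same VARBINARY column breaks this equivalence.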
It's a tough problem. There is no simple solution.
The reason it is tough is that it is effectively
start <= 123 AND
last >= 123
Regardless of what indexes are available, the Optimizer will work with only one of those two conditions. With INDEX(start, ...), it will pick start <= 123 and scan the first part of the index. Similarly for the other clause. One of those scans more than half the index; the other scans less, but not enough less to be worth using an index. Moving it into the PRIMARY KEY will help with some cases, but it is hardly worth the effort.
Bottom line: no matter what you do in the way of INDEX or PRIMARY KEY, most IP constants will lead to more than 1.5 seconds for the query.
Do your start/last IP ranges overlap? If so, that adds complexity. In particular, overlaps would probably invalidate Gordon's LIMIT 1.
My solution requires non-overlapping regions. Any gaps in IPs necessitate 'unowned' ranges of IPs. This is because there is only a start_ip; the last_ip is implied by being less than the start of the next item in the table. See http://mysql.rjweb.org/doc.php/ipranges (It includes code for IPv4 and for IPv6.)
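Here is a rough Python sketch of the start-only layout. It illustrates the idea and is not the code from the linked article. Bisecting the sorted start values plays the role of WHERE start_ip <= ? ORDER BY start_ip DESC LIMIT 1. The toy integer addresses and owner names are hypothetical.

Gaps are stored as explicit unowned rows and the first row starts at 0. As a result, every address resolves to exactly one row and no end-of-range check is needed.

```python
from bisect import bisect_right

# Hypothetical start-only table, sorted by start_ip. Each row owns the
# addresses from its start_ip up to (but not including) the next row's
# start_ip. Gaps are covered by explicit "unowned" rows (owner None).
starts = [0, 100, 200, 300]
owners = [None, "org-A", None, "org-B"]   # last row runs to the top of the address space

def owner_of(ip: int):
    # Equivalent of: SELECT owner FROM t WHERE start_ip <= ? ORDER BY start_ip DESC LIMIT 1
    i = bisect_right(starts, ip) - 1      # never -1, because starts[0] == 0
    return owners[i]
```

So owner_of(150) is "org-A", owner_of(250) falls in an unowned gap row, and anything from 300 upward belongs to "org-B".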
Meanwhile, DOUBLE for lat/lng is overkill: http://mysql.rjweb.org/doc.php/latlng#representation_choices
I am trying to find a way to improve the performance of my MySQL table containing IP ranges (it will get up to 500 SELECT queries per second (!) at peak hours, so I am a little worried).
I have a table of this structure:
id smallint(5) Auto Increment
ip_start char(16)
ip_end char(16)
The collation is utf8_general_ci (on the whole table and every column except id), and the table is MyISAM (only SELECT queries; no inserts or deletes are needed). The only index is the PRIMARY KEY on id.
At the moment the table has almost 2000 rows, all of them IP ranges.
For example:
ip_start 128.6.230.0
ip_end 128.6.238.255
When a user comes to the website, I check whether their IP is in any of the ranges in my table. I use this query (dibi SQL library):
SELECT COUNT(*)
FROM ip_ranges
WHERE %s", $user_ip, " BETWEEN ip_start AND ip_end
If the result is not zero, the user's IP is in one of the ranges in the table, which is all I need to know.
I was thinking about adding some indexes to the table, but I am not sure how they work or whether it's a good idea (maybe there is nothing to really index, since most of those IP ranges are different?).
I also had VARCHAR on the ip_start and ip_end columns, but I switched them to CHAR (I guess it's faster?).
Does anyone have ideas on how to improve this table or these queries further?
You don't want to use aggregation. Instead, check whether the following returns any rows:
SELECT 1
FROM ip_ranges
WHERE %s", $user_ip, " BETWEEN ip_start AND ip_end
LIMIT 1;
The LIMIT 1 says to stop at the first match, so it is faster.
For this query, you want an index on ip_ranges(ip_start, ip_end).
This still has a performance problem when there is no match. The entire index after the ip being tested has to be scanned. I think the following should be an improvement:
SELECT COUNT(*)
FROM (SELECT i.ip_start, i.ip_end
FROM ip_ranges i
WHERE %s", $user_ip, " >= ip_start
ORDER BY ip_start DESC
LIMIT 1
) i
WHERE $user_ip <= ip_end;
The inner subquery should use the index and pull back only the closest range that starts at or below the IP. The outer query then checks the end of that range. Here COUNT(*) is fine, because there is at most one row.
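Here is a runnable sketch of the subquery, using Python's sqlite3 module in place of MySQL/dibi, with ordinary ? parameters. The two ranges are test data.

One assumption is worth stating. The sketch stores the addresses as integers rather than CHAR(16), because dotted-quad strings compare character by character. As a result, '9.0.0.1' sorts after '10.0.0.0', and both BETWEEN and ORDER BY can give wrong answers on such strings.

```python
import ipaddress
import sqlite3

def ip_int(s: str) -> int:
    return int(ipaddress.IPv4Address(s))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_ranges (ip_start INTEGER, ip_end INTEGER)")
conn.execute("CREATE INDEX idx_range ON ip_ranges (ip_start, ip_end)")
conn.executemany("INSERT INTO ip_ranges VALUES (?, ?)", [
    (ip_int("128.6.230.0"), ip_int("128.6.238.255")),
    (ip_int("10.0.0.0"), ip_int("10.0.0.255")),
])

# Closest range starting at or below the IP, then check that IP <= its end.
IN_RANGE = """
SELECT COUNT(*) FROM (
  SELECT ip_start, ip_end FROM ip_ranges
  WHERE ip_start <= ?
  ORDER BY ip_start DESC LIMIT 1
) i WHERE ? <= ip_end"""

def is_listed(ip: str) -> bool:
    n = ip_int(ip)
    return conn.execute(IN_RANGE, (n, n)).fetchone()[0] > 0
```

is_listed("128.6.231.7") is True, while is_listed("128.6.239.0") is False, because that address lies past the end of the nearest range.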
I have a table of 1.6M IP ranges with organization names.
The IP addresses are converted to integers. The table is in the form of:
I have a list of 2000 unique ip addresses (e.g. 321223, 531223, ....) that need to be translated to an organization name.
I loaded the translation table as a mysql table with an index on IP_from and IP_to. I looped through the 2000 IP addresses, running one query per ip address, and after 15 minutes the report was still running.
The query I'm using is
select organization from iptable where ip_addr BETWEEN ip_start AND ip_end
Is there a more efficient way to do this batch look-up? I'll use my fingers if it's a good solution. And in case someone has a Ruby-specific solution, I want to mention that I'm using Ruby.
Given that you already have an index on ip_start, this is how to use it best, assuming that you want to make one access per IP (1234 in this example):
select organization from (
select ip_end, organization
from iptable
where ip_start <= 1234
order by ip_start desc
limit 1
) subqry where 1234 <= ip_end
This will use your index to start a scan that stops immediately because of the LIMIT 1. The cost should be only marginally higher than that of a simple indexed access. Of course, this technique relies on the ranges defined by ip_start and ip_end never overlapping.
The problem with your original approach is that mysql, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query.
Possibly the most efficient way of doing a lookup of this kind is loading the list of addresses you want to look up into a temporary table in the database and finding the intersection with an SQL join, rather than checking each address with a separate SQL statement.
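Here is a sketch of the temporary-table approach, again with sqlite3 standing in for MySQL and made-up integer data. It combines the join with the LIMIT 1 idea from the previous answer. That idea is written as a correlated MAX(ip_start) subquery, which picks the same row as ORDER BY ip_start DESC LIMIT 1.

The sketch assumes the ranges never overlap. The table and column names follow the question; the lookup table name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iptable (ip_start INTEGER, ip_end INTEGER, organization TEXT)")
conn.execute("CREATE INDEX idx_start ON iptable (ip_start)")
conn.executemany("INSERT INTO iptable VALUES (?, ?, ?)",
                 [(1000, 1999, "Org A"), (2000, 2999, "Org B"), (5000, 5999, "Org C")])

# Load the addresses to translate into a temporary table in one batch.
conn.execute("CREATE TEMP TABLE lookup (ip_addr INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO lookup VALUES (?)", [(1234,), (2500,), (4000,), (5999,)])

# One statement for the whole batch: for each address, match the row with the
# greatest ip_start <= ip_addr, provided the address is not past its ip_end.
# LEFT JOIN keeps addresses that fall in no range (organization is NULL).
BATCH = """
SELECT l.ip_addr, t.organization
FROM lookup l
LEFT JOIN iptable t
  ON t.ip_start = (SELECT MAX(ip_start) FROM iptable WHERE ip_start <= l.ip_addr)
 AND l.ip_addr <= t.ip_end
ORDER BY l.ip_addr"""

result = conn.execute(BATCH).fetchall()
```

Address 4000 comes back with None because it lies between the "Org B" and "Org C" ranges. In MySQL the same statement could run against a real TEMPORARY table filled from Ruby in one batch insert.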
In any case you'll need to have an index on (IP_from, IP_to).
I have a MySQL query which takes a long time to process. I am querying a large table of IP ranges which map to country codes, to discover the country of origin for each IP in the url_click table. (IP database from hxxp://ip-to-country.webhosting.info/)
It works brilliantly, albeit slowly.
Is there a more efficient way to write this query?
Table and output JPG: http://tiny.cx/a4e00d
SELECT ip_addr AS IP, geo_ip.ctry, count(ip_addr) as count
FROM `admin_adfly`.`url_click`,admin_adfly.geo_ip
WHERE INET_ATON (ip_addr)
BETWEEN geo_ip.ipfrom AND geo_ip.ipto
AND url_id = 165
GROUP BY ip_addr;
The use of a function in the join between the two tables is going to be slower than a normal join, so you probably want to defer that particular operation as long as possible. So, I'd summarize the data and then join it:
SELECT S.IP_Addr, G.Ctry AS Country, S.Count
FROM (SELECT ip_addr, COUNT(ip_addr) AS Count
FROM admin_adfly.url_click
WHERE url_id = 165
GROUP BY ip_addr) AS S
JOIN admin_adfly.geo_ip AS G
ON INET_ATON(S.ip_addr) BETWEEN G.ipfrom AND G.ipto;
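The point of the derived table S is that the expensive range lookup runs once per distinct address rather than once per click. Here is a plain-Python sketch of that ordering with made-up data:

- Integers stand in for the ip_addr strings that INET_ATON would convert.
- A bisect over a sorted, non-overlapping range list stands in for the join against geo_ip.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical click log (url_id, ip_addr) and range table (ipfrom, ipto, ctry).
clicks = [(165, 1500), (165, 1500), (165, 2500), (7, 1500), (165, 1500)]
geo_ip = [(1000, 1999, "AA"), (2000, 2999, "BB")]
starts = [g[0] for g in geo_ip]

def country(ip: int):
    i = bisect_right(starts, ip) - 1
    return geo_ip[i][2] if i >= 0 and ip <= geo_ip[i][1] else None

# Step 1: filter on url_id and aggregate (the derived table S).
counts = Counter(ip for url_id, ip in clicks if url_id == 165)
# Step 2: one range lookup per distinct address instead of one per click.
report = sorted((ip, country(ip), n) for ip, n in counts.items())
```

Here four clicks for url_id 165 collapse into two lookups.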
If you can redesign the schema and are going to be doing a lot of this analysis, rework one of the two tables so that the join condition doesn't need to use INET_ATON().
Presumably, you have an index on the url_id column; that is the only one that will give you much benefit here.
IP addresses have a tree-like structure, and the ranges in your geo_ip table most probably respect that structure.
If your IP begins with 193.167, then you should have an index that filters the geo_ip table very quickly, so that only the rows related to a subrange of 193.167 are processed.
I think that you should be able to dramatically improve the response time with this approach.
I hope this will help you
That INET_ATON worries me just a bit. It'd make any index on the ip_addr column useless. If you have a way of putting the info all in the same format, say by converting the data to a number before putting it in the DB, that might help.
Other than that, the standard advice about judicious use of indexes applies. You might want indexes on ipfrom and ipto, and/or url_id columns.
MySQL does not optimize queries like this well.
You would need to convert your ipfrom-ipto ranges into LineStrings, thus allowing building an R-Tree index over them:
ALTER TABLE
geo_ip
ADD `range` LINESTRING;
UPDATE geo_ip
SET `range` = LINESTRING(POINT(-1, ipfrom), POINT(1, ipto));
ALTER TABLE
geo_ip
MODIFY `range` LINESTRING NOT NULL;
CREATE SPATIAL INDEX
sx_geoip_range
ON geo_ip (`range`);
SELECT ip_addr AS IP, geo_ip.ctry, COUNT(*)
FROM `admin_adfly`.`url_click`
JOIN admin_adfly.geo_ip
ON MBRContains
(
`range`,
Point(0, INET_ATON(ip_addr))
)
WHERE url_id = 165
GROUP BY
ip_addr;
geo_ip should be a MyISAM table (before MySQL 5.7, InnoDB does not support SPATIAL indexes).
See here for more details:
Banning IPs
I have a geocoding database where each row holds a range of integers (IP address equivalents):
fromip (long), toip (long). The integers are created from IP addresses with PHP's ip2long.
I need to find the row in which a given ip address (converted to long) is within the range.
What would be the most efficient way to do it? (keys and query)
If I use the naive solution, select * from ipranges where fromip <= givenip and toip >= givenip limit 1, with the key on (fromip, toip), then whenever the IP address is not in any of the ranges, the search goes through all the rows.
SOME MORE INFO:
explain select * from ipranges where
ipfrom <= 2130706433 and ipto >=
2130706433 order by ipfrom Asc
limit 1;
estimates 2.5M rows (3.6M in the table in total).
The key is:
PRIMARY KEY (ipfrom,ipto)
That does not seem efficient at all (the IP above is in none of the ranges).
Your query is fine, put an index on (fromip, toip) which will be a covering index for the query. The table won't have to be examined at all, only the sorted index gets searched, which is as fast as you can be.
The search will not actually go through all the rows. Not only will it go through none of the rows, only the index, but it won't examine every entry in the index either. The index is stored as a sorted tree, and only one path through that tree will have to be followed to determine that your IP is not in the table.