I have a geocoding database with a range of integers (equivalent to IP addresses) in each row: fromip (long) and toip (long). The integers are created from IP addresses by PHP's ip2long().
I need to find the row in which a given IP address (converted to long) falls within the range.
What would be the most efficient way to do it (keys and query)?
If I do the naive solution, select * from ipranges where fromip <= givenip and toip >= givenip limit 1, with the key on (fromip, toip), then in the case where the IP address is not in any of the ranges the search goes through all the rows.
SOME MORE INFO:
explain select * from ipranges where
ipfrom <= 2130706433 and ipto >= 2130706433
order by ipfrom asc
limit 1
reports an estimated 2.5M rows examined (out of 3.6M total in the table).
The key is:
PRIMARY KEY (ipfrom,ipto)
That does not seem efficient at all. (The IP above is not in any of the ranges.)
Your query is fine. Put an index on (fromip, toip), which will be a covering index for the query. The table won't have to be examined at all; only the sorted index gets searched, which is as fast as you can get.
The search will not actually go through all the rows. Not only will it go through none of the rows, only the index, but it won't examine every entry in the index either. The index is stored as a sorted tree, and only one path through that tree will have to be followed to determine that your IP is not in the table.
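A minimal sketch of that setup, assuming the table and column names from the question (ipranges with fromip and toip; the index name is arbitrary):
CREATE INDEX idx_fromip_toip ON ipranges (fromip, toip);
-- selecting only the indexed columns keeps the query answerable from the index alone
SELECT fromip, toip
FROM ipranges
WHERE fromip <= 2130706433 AND toip >= 2130706433
LIMIT 1;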
I am using MySQL to query a DB. What I would like to do is query NOT the full database, but only the last 1000 rows ordered by timestamp.
I have tried to use this query, which doesn't work as I would like. I know LIMIT is used to return a fixed number of selected elements, but that is not what I want: I want the query itself to run against only a fixed number of elements.
select * from mutable where name = 'myschema' ORDER BY start_time DESC LIMIT 1000;
Any help?
tl;dr Sort Less Data
I guess your mutable table has an autoincrementing primary key called mutable.mutable_id.
You can then do this:
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000;
It gives you a result set of the ids of all the relevant rows. The ORDER BY ... LIMIT work then only has to sort mutable_id and start_time values, not the whole table, so it takes less space and time on the MySQL server.
Then you use that query to retrieve the details:
SELECT *
FROM mutable
WHERE mutable_id IN (
    -- the extra derived table works around MySQL's restriction on
    -- LIMIT inside IN (...) subqueries
    SELECT mutable_id
    FROM (
        SELECT mutable_id
        FROM mutable
        WHERE name = 'myschema'
        ORDER BY start_time DESC
        LIMIT 1000
    ) AS last1000
)
ORDER BY start_time DESC;
This will fetch all the data you need without needing to scan and sort the whole table.
If you create an index on name and start_time the subquery will be faster: the query can random-access the index to the appropriate name, then scan the start_time entries one by one until it finds 1000. No need to sort; the index is presorted.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time);
If you're on MySQL 8 you can create a descending index and it's even faster.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time DESC);
This works only with auto_increment
The trick is to sort less data, as O. Jones mentioned. The problem is telling MySQL how to do so.
MySQL can't know what the "last 1000 records" are unless it sorts them based on the query. That's exactly what you want to avoid, so you need to tell MySQL how to find the "last 1000 records" another way.
This trick consists of telling MySQL at which auto_increment value to start looking for the data. The problem is that you're using timestamps, so I'm not sure whether this fits your particular use case.
Here's the query:
SELECT * FROM mutable
WHERE `name` = 'myschema'
AND id > (SELECT MAX(id) - 1000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;
Problems:
auto_increment values have gaps. They aren't consecutive, only unique and increasing, since they come from a sequential increment algorithm. To get better results, increase the subtracted number (see the sketch after this list): you might get 1000 results, but you might also get only 500, depending on your dataset
if you don't have an auto_increment column, this is useless
if rows are not inserted in timestamp order (so the largest ids do not hold the newest timestamps), this is useless
Advantages:
- the primary key is used to define the value range (WHERE id > x), so the dataset reduction will be as fast as possible.
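A hedged variant of the query above, widening the id window to compensate for gaps (the padding value 5000 is an arbitrary assumption, not something from the question):
SELECT * FROM mutable
WHERE `name` = 'myschema'
  -- subtract more than 1000 so that gaps in the auto_increment ids still leave
  -- at least 1000 candidate rows; LIMIT then trims the result back to 1000
  AND id > (SELECT MAX(id) - 5000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;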
I have a table with 550,000 records.
SELECT * FROM logs WHERE user = 'user1' ORDER BY date DESC LIMIT 0, 25
This query takes 0.0171 sec; without the LIMIT, there are 3537 results.
SELECT * FROM logs WHERE user = 'user2' ORDER BY date DESC LIMIT 0, 25
This query takes 3.0868 sec; without the LIMIT, there are 13 results.
table keys are:
PRIMARY KEY (`id`),
KEY `date` (`date`)
When using "LIMIT 0,25", if there are fewer matching records than 25, the query slows down. How can I solve this problem?
Using LIMIT 25 allows the query to stop once it has found 25 rows.
If you have 3537 matching rows out of 550,000, it will, on average (assuming equal distribution), have found 25 rows after examining 550,000 / 3537 * 25 = 3887 rows, whether it reads a list ordered by date (the index on date) or a list that is not ordered at all.
If you have only 13 matching rows out of 550,000, LIMIT 25 has to examine all 550,000 rows (141 times as many), so we expect 0.0171 sec * 141 = 2.4 s. There are obviously other factors that determine runtime too, but the order of magnitude fits.
There is an additional effect. Unfortunately, the index on date does not contain the value of user, so MySQL has to look up that value in the original table, jumping back and forth in that table (because the data itself is ordered by the primary key). This is slower than reading the unordered table directly.
So not using an index at all could actually be faster than using one if you have a lot of rows to read. You can force MySQL to skip it with e.g. FROM logs IGNORE INDEX (date), but this means it now has to read the whole table in absolutely every case: the last row could be the newest and thus has to be in the result set, because you ordered by date. So it might slow down your first query - reading the full 550,000 rows fast can be slower than reading 3887 rows slowly by jumping back and forth. (MySQL doesn't know this beforehand either, so it took a chance - for your second query, obviously the wrong one.)
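For illustration, forcing the table scan described above would look something like this (same table, column, and index names as in the question):
SELECT * FROM logs IGNORE INDEX (`date`)
WHERE user = 'user2'
ORDER BY `date` DESC
LIMIT 0, 25;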
So how to get faster results?
Have an index that is ordered by user. Then the query for 'user2' can stop after 13 rows, because it knows there are no more matching rows. This will now be faster than the query for 'user1', which has to look through 3537 rows and then order them by date afterwards.
The best index for your query would therefore be (user, date): the query then knows when to stop looking for further rows AND the list is already ordered the way you want it (this should beat your 0.0171 s in all cases).
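A minimal sketch of that index, assuming the table is called logs as in the question (the index name is arbitrary):
CREATE INDEX idx_user_date ON logs (user, `date`);
-- with this index, MySQL can read the rows for one user already ordered by date
-- and stop as soon as it has 25 of them (or runs out of matching rows)
SELECT * FROM logs WHERE user = 'user2' ORDER BY `date` DESC LIMIT 0, 25;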
Indexes require some resources too (e.g. hdd space and time to update the index when you update your table), so adding the perfect index for every single query might be counterproductive sometimes for the system as a whole.
I am trying to find a way to improve performance for my MySQL table containing IP ranges (it is going to get up to 500 SELECT queries per second (!) in peak hours, so I am a little worried).
I have a table of this structure:
id smallint(5) Auto Increment
ip_start char(16)
ip_end char(16)
The collation is utf8_general_ci (on the whole table and every column except id), and the table type is MyISAM (only SELECT queries; no inserts or deletes needed here). The only index on this table is the PRIMARY KEY on id.
At the moment the table has almost 2000 rows, all of which contain IP ranges.
For example:
ip_start 128.6.230.0
ip_end 128.6.238.255
When a user comes to the website, I check whether his IP is in one of those ranges in my table. I use this query (dibi SQL library):
SELECT COUNT(*)
FROM ip_ranges
WHERE %s", $user_ip, " BETWEEN ip_start AND ip_end
If the result of the query is not zero, then the user's IP is in one of the ranges in the table - which is all I need it to do.
I was thinking about maybe putting some indexes on that table, but I am not quite sure how that works and whether it's even a good idea (since there is maybe nothing to really index, right? Most of those IP ranges are different).
I also had the varchar type on the ip_start and ip_end columns, but I switched it to plain char (I guess it's faster?).
Does anyone have any ideas about how to improve this table/these queries even further?
You don't want to use aggregation. Instead, check whether the following returns any rows:
SELECT 1
FROM ip_ranges
WHERE %s", $user_ip, " BETWEEN ip_start AND ip_end
LIMIT 1;
The LIMIT 1 says to stop at the first match, so it is faster.
For this query, you want an index on ip_ranges(ip_start, ip_end).
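For example (the index name is just a placeholder):
CREATE INDEX idx_ip_start_end ON ip_ranges (ip_start, ip_end);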
This still has a performance problem when there is no match. The entire index after the ip being tested has to be scanned. I think the following should be an improvement:
SELECT COUNT(*)
FROM (SELECT ip_start, ip_end
      FROM ip_ranges i
      WHERE %s", $user_ip, " >= ip_start
      ORDER BY ip_start DESC
      LIMIT 1
     ) i
WHERE $user_ip <= ip_end;
The inner subquery should use the index and pull back only the first match. The outer query then checks the end of the range. Here the COUNT(*) is okay, because there is at most one row.
I have a table of 1.6M IP ranges with organization names.
The IP addresses are converted to integers, so each row has a range start, a range end, and the organization name.
I have a list of 2000 unique IP addresses (e.g. 321223, 531223, ...) that need to be translated to an organization name.
I loaded the translation table into MySQL with an index on IP_from and IP_to. I looped through the 2000 IP addresses, running one query per address, and after 15 minutes the report was still running.
The query I'm using is
select organization from iptable where ip_addr BETWEEN ip_start AND ip_end
Is there a more efficient way to do this batch look-up? I'm open to a different approach if it's a good solution. And in case someone has a Ruby-specific solution, I want to mention that I'm using Ruby.
Given that you already have an index on ip_start, this is how to use it best, assuming that you want to make one access per IP (1234 in this example):
select organization from (
select ip_end, organization
from iptable
where ip_start <= 1234
order by ip_start desc
limit 1
) subqry where 1234 <= ip_end
This will use your index to start a scan which stops immediately because of the limit 1. The cost should only be marginally higher than the one of a simple indexed access. Of course, this technique relies on the fact that the ranges defined by ip_start and ip_end never overlap.
The problem with your original approach is that MySQL, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query.
Possibly the most efficient way of doing a lookup of this kind is loading the list of addresses you want to look up into a temporary table in the database and finding the intersection with an SQL join, rather than checking each address with a separate SQL statement.
In any case you'll need to have an index on (IP_from, IP_to).
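A rough sketch of that approach, assuming the table and column names from the question's query (iptable with ip_start, ip_end, organization); the temporary table lookup_ips and its ip_addr column are illustrative names:
-- load the addresses to translate into a temporary table
CREATE TEMPORARY TABLE lookup_ips (ip_addr BIGINT UNSIGNED NOT NULL, PRIMARY KEY (ip_addr));
INSERT INTO lookup_ips (ip_addr) VALUES (321223), (531223);  -- ... and the rest of the 2000
-- one join instead of 2000 separate statements
SELECT l.ip_addr, t.organization
FROM lookup_ips l
JOIN iptable t ON l.ip_addr BETWEEN t.ip_start AND t.ip_end;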
I have a DB in MySQL Server with information about IP ranges and locations. It has the following structure:
id (integer not null, auto inc)
from_ (bigint(20))
to_ (bigint(20))
region(integer)
The field region is a foreign key to a table cities (id, city_name).
As we know, to find out which country an IP address belongs to, we have to execute something like the following query:
select region from ipcountry where ip >= from_ and ip <= to_
Due to the number of records, the query is too slow for what I need.
Any ideas on how to optimize this?
Do you have an index on (from_, to_)? That is the place to start.
Then, the next idea is to have the index and change the query to:
select region
from ipcountry
where ip >= from_
order by from_ desc
limit 1;
If that doesn't give the performance boost, then you are going to have to think about how to optimize the data structure. The extreme approach here is to list out all ip addresses with their region. But, the billions of resulting rows may actually hinder performance.
If you go down this path, you need to be smarter. One idea is to have separate tables for Type A, Type B, and Type C addresses which have constant regions. Then a separate table for ranges of Type D addresses.