I have a MySQL query which takes a long time to process. I am querying a large table of IP ranges, which map to country codes, to discover the country of origin for each IP in the url_click table. (IP database from hxxp://ip-to-country.webhosting.info/)
It works brilliantly, albeit slowly.
Is there a more efficient way to write this query?
Table and output JPG: http://tiny.cx/a4e00d
SELECT ip_addr AS IP, geo_ip.ctry, count(ip_addr) as count
FROM `admin_adfly`.`url_click`,admin_adfly.geo_ip
WHERE INET_ATON(ip_addr)
BETWEEN geo_ip.ipfrom AND geo_ip.ipto
AND url_id = 165
GROUP BY ip_addr;
The use of a function in the join between the two tables is going to be slower than a normal join, so you want to defer that operation as long as possible. I'd summarize the data first and then join it:
SELECT S.ip_addr, G.ctry AS Country, S.Count
FROM (SELECT ip_addr, COUNT(ip_addr) AS Count
        FROM admin_adfly.url_click
       WHERE url_id = 165
       GROUP BY ip_addr) AS S
JOIN admin_adfly.geo_ip AS G
  ON INET_ATON(S.ip_addr) BETWEEN G.ipfrom AND G.ipto;
If you can redesign the schema and are going to be doing a lot of this analysis, rework one of the two tables so that the join condition doesn't need to use INET_ATON().
Presumably, you have an index on the url_id column; that is the only one that will give you much benefit here.
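For instance, a minimal sketch of that rework (the new column and index names are invented, and it assumes ipfrom/ipto in geo_ip are already stored as integers):

-- Store the numeric form of the IP once, at insert/import time,
-- so the range comparison no longer has to call INET_ATON() per row.
ALTER TABLE admin_adfly.url_click
    ADD COLUMN ip_num INT UNSIGNED NULL,
    ADD INDEX idx_url_click_url_id (url_id);

UPDATE admin_adfly.url_click
SET ip_num = INET_ATON(ip_addr);

-- The join condition can then compare plain integers:
--   ... ON S.ip_num BETWEEN G.ipfrom AND G.ipto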
IP addresses have a tree-like structure, and the ranges in your geo_ip table most probably respect that structure.
If an IP begins with 193.167, an index on the range boundaries should let you filter the geo_ip table very quickly, so that only the rows covering a subrange of 193.167 are examined.
I think that you should be able to dramatically improve the response time with this approach.
I hope this helps.
That INET_ATON worries me just a bit. It'd make any index on the ip_addr column useless. If you have a way of putting the info all in the same format, say by converting the data to a number before putting it in the DB, that might help.
Other than that, the standard advice about judicious use of indexes applies. You might want indexes on ipfrom and ipto, and/or url_id columns.
MySQL does not optimize queries like this well.
You would need to convert your ipfrom-ipto ranges into LineStrings, which allows building an R-Tree index over them:
ALTER TABLE geo_ip
    ADD `range` LINESTRING;

UPDATE geo_ip
SET `range` = LINESTRING(POINT(-1, ipfrom), POINT(1, ipto));

ALTER TABLE geo_ip
    MODIFY `range` LINESTRING NOT NULL;

CREATE SPATIAL INDEX sx_geoip_range
    ON geo_ip (`range`);
SELECT ip_addr AS IP, geo_ip.ctry, COUNT(*)
FROM `admin_adfly`.`url_click`
JOIN admin_adfly.geo_ip
  ON MBRContains(`range`, POINT(0, INET_ATON(ip_addr)))
WHERE url_id = 165
GROUP BY ip_addr;
geo_ip should be a MyISAM table.
See here for more details:
Banning IPs
This query is inefficient and never finishes executing; the track and desiredspeed tables each have almost a million records. After this we want to self-join the track table for further processing. Any efficient approach to executing the query below is appreciated.
select
    t_id,
    route_id,
    t.timestamp,
    s_lat,
    s_long,
    longitude,
    latitude,
    SQRT(POW((latitude - d_lat), 2) + POW((longitude - d_long), 2)) as dst,
    SUM(speed * 18 / 5) / count(*) as speed,
    '20' as actual_speed,
    ((20 - (speed * 18 / 5)) / (speed * 18 / 5)) * 100 as speed_variation
from
    track t,
    desiredspeed s
WHERE
    LEFT(s_lat, 6) = LEFT(latitude, 6)
    AND LEFT(s_long, 6) = LEFT(longitude, 6)
    AND t_id > 53445
group by
    route_id,
    s_lat,
    s_long
order by
    t_id asc
Firstly, you are using old Sybase-style (comma) join syntax; I would change that.
You are also performing two computations per join across large datasets, which is likely to be inefficient.
The join will not be able to use an index because you are applying a function to the columns. Either store the data pre-computed, or add computed columns based on the rule above and index them accordingly (see the sketch below).
Finally, it may be quicker to use temporary tables or common table expressions (although I don't know MySQL well enough to say).
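To sketch the pre-computed column idea (the column and index names below are invented, and it assumes the coordinates are stored as strings, since LEFT() is applied to them):

-- Pre-computed columns holding the truncated coordinates the join compares,
-- so the comparison becomes a plain, indexable equality.
ALTER TABLE track
    ADD COLUMN lat_prefix CHAR(6),
    ADD COLUMN long_prefix CHAR(6),
    ADD INDEX idx_track_prefix (lat_prefix, long_prefix);

ALTER TABLE desiredspeed
    ADD COLUMN lat_prefix CHAR(6),
    ADD COLUMN long_prefix CHAR(6),
    ADD INDEX idx_speed_prefix (lat_prefix, long_prefix);

UPDATE track SET lat_prefix = LEFT(latitude, 6), long_prefix = LEFT(longitude, 6);
UPDATE desiredspeed SET lat_prefix = LEFT(s_lat, 6), long_prefix = LEFT(s_long, 6);

-- The join then becomes:
--   ... FROM track t JOIN desiredspeed s
--         ON s.lat_prefix = t.lat_prefix AND s.long_prefix = t.long_prefix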
I have a table of 1.6M IP ranges with organization names.
The IP addresses are converted to integers. The table has the form (ip_start, ip_end, organization).
I have a list of 2000 unique ip addresses (e.g. 321223, 531223, ....) that need to be translated to an organization name.
I loaded the translation table as a MySQL table with an index on ip_start and ip_end. I looped through the 2000 IP addresses, running one query per IP address, and after 15 minutes the report was still running.
The query I'm using is
select organization from iptable where ip_addr BETWEEN ip_start AND ip_end
Is there a more efficient way to do this batch look-up? I'm open to a different approach if it's a good solution. And in case someone has a Ruby-specific solution, I want to mention that I'm using Ruby.
Given that you already have an index on ip_start, this is how to use it best, assuming that you want to make one access per IP (1234 in this example):
select organization from (
    select ip_end, organization
    from iptable
    where ip_start <= 1234
    order by ip_start desc
    limit 1
) subqry
where 1234 <= ip_end
This will use your index to start a scan which stops immediately because of the limit 1. The cost should only be marginally higher than the one of a simple indexed access. Of course, this technique relies on the fact that the ranges defined by ip_start and ip_end never overlap.
The problem with your original approach is that mysql, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query.
Possibly the most efficient way of doing a lookup of this kind is loading the list of addresses you want to look up into a temporary table in the database and finding the intersection with an SQL join, rather than checking each address with a separate SQL statement.
In any case you'll need to have an index on (ip_start, ip_end).
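A rough sketch of that batch approach (the staging table is hypothetical, and it assumes the ranges in iptable do not overlap):

-- Load the ~2000 addresses to look up into a temporary table once.
CREATE TEMPORARY TABLE ip_lookup (
    ip_num INT UNSIGNED NOT NULL,
    PRIMARY KEY (ip_num)
);

INSERT INTO ip_lookup (ip_num) VALUES (321223), (531223);  -- ... and so on

-- One join instead of 2000 separate statements.
SELECT l.ip_num, t.organization
FROM ip_lookup l
JOIN iptable t
  ON l.ip_num BETWEEN t.ip_start AND t.ip_end;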
I have a table with nearly 30M records, 6.6 GB in size. I need to query some data from it using GROUP BY and ORDER BY. It takes too long to query the data, and I have lost the connection to the DB many times...
I have indexes on all the necessary fields, both single-column and composite. What else can I do to make the query faster?
Example query:
select id, max(price), avg(order) from table group by id, date order by id, location.
Use EXPLAIN query, where query is your query. For example: EXPLAIN select * from table group by id, date order by id, location.
You'll see a table where MySQL analyses your query and shows which indices it looks for. Possibly you don't have sufficient (good enough) indices.
I don't think you can. With no filter (WHERE clause) and an AVG, the entire table has to be read.
The only thing I can think of is to have a new table with ID, AVG_ORDER, MAX_PRICE (or whatever you need) and update that using a trigger or stored procedure when you insert/update new rows.
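A minimal sketch of that summary-table idea, with placeholder names (it reuses the column names from the example query, so `order` and `table` need quoting):

-- Summary table holding the pre-aggregated values.
CREATE TABLE id_summary (
    id        INT PRIMARY KEY,
    max_price DECIMAL(10,2),
    avg_order DECIMAL(10,2)
);

-- Rebuild it in the background (cron job, MySQL event, or a stored procedure),
-- so the expensive GROUP BY no longer runs on every read.
REPLACE INTO id_summary (id, max_price, avg_order)
SELECT id, MAX(price), AVG(`order`)
FROM `table`
GROUP BY id;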
An index on (ID, PRICE) might help you if you didn't need that pesky average.
Indexing isn't going to do you any good. You're averaging a column, so you have to read every row in the table. That's going to take time.
I have a problem with this query:
SELECT DISTINCT s.city, pc.start, pc.end
FROM postal_codes pc LEFT JOIN suspects s ON (s.postalcode BETWEEN pc.start AND pc.end)
WHERE pc.user_id = "username"
ORDER BY pc.start
The suspects table has about 340,000 entries and there is an index on postalcode. I have several users, but this individual query takes about 0.5s. When I run this SQL with EXPLAIN, I get something like this: http://my.jetscreenshot.com/7536/20111225-myhj-41kb.jpg - do these NULLs mean that the query isn't using an index? The index is a BTREE, so I think this should run a little faster.
Can you please help me with this? If any other information is needed, just let me know.
Edit: I have indexes on suspects.postalcode, postal_codes.start, postal_codes.end, postal_codes.user_id.
Basically what I'm trying to achieve: I have a table where each user ID has multiple postalcode ranges assigned, so it looks like:
user_id | start | end
Then I have a table of suspects where each suspect has an address (which contains a postalcode), so in this query I'm trying to get the postalcode range (start and end) and also the name of each city in that range.
Hope this helps.
Whenever a LEFT JOIN is used, all the records of the first table are picked up rather than being selected via an index. I would suggest using an inner join, something like the query below.
select distinct
    s.city,
    pc.start,
    pc.end
from postal_codes pc
inner join suspects s
    on s.postalcode between pc.start and pc.end
where pc.user_id = "username"
order by pc.start
It's using only one index, and not for the fields involved in the join. Try creating an index for the start and end fields, or using >= and <= instead of BETWEEN.
Not 100% sure, but this might be relevant:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
So try testing with LIMIT, and if it uses the index then, you've found your cause.
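For example, re-running the query from the question with EXPLAIN and a LIMIT added, purely as a diagnostic:

EXPLAIN
SELECT DISTINCT s.city, pc.start, pc.end
FROM postal_codes pc LEFT JOIN suspects s ON (s.postalcode BETWEEN pc.start AND pc.end)
WHERE pc.user_id = "username"
ORDER BY pc.start
LIMIT 10;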
I have to say I'm a little confused by your table naming convention; I would expect the "suspects" table to have a user_id, not the postal_codes table, but you must have your reasons. If you leave this query as it is, you can add an index on postal_codes (start, end) to avoid the complete table scan.
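For instance (a sketch; the index name is arbitrary):

ALTER TABLE postal_codes ADD INDEX idx_postal_range (`start`, `end`);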
I think you can restructure your query like the following:
SELECT DISTINCT s.city, pc1.start, pc1.end
FROM (SELECT pc.start, pc.end FROM postal_codes pc WHERE pc.user_id = "username") AS pc1,
     suspects s
WHERE s.postalcode BETWEEN pc1.start AND pc1.end
ORDER BY pc1.start
Your query is not picking up the index on the suspects table because of the LEFT JOIN and your BETWEEN condition. Having an index on a table doesn't necessarily mean that it will be used in all queries.
Try FORCE INDEX.
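For example (the index name below is a placeholder for whatever your index on suspects.postalcode is actually called):

SELECT DISTINCT s.city, pc.start, pc.end
FROM postal_codes pc
LEFT JOIN suspects s FORCE INDEX (idx_postalcode)
    ON s.postalcode BETWEEN pc.start AND pc.end
WHERE pc.user_id = "username"
ORDER BY pc.start;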
I basically have two tables, a 'server' table and a 'server_ratings' table. I need to optimize the current query that I have (it works, but it takes around 4 seconds). Is there any way I can do this better?
SELECT ROUND(AVG(server_ratings.rating), 0), server.id, server.name
FROM server LEFT JOIN server_ratings ON server.id = server_ratings.server_id
GROUP BY server.id;
Query looks ok, but make sure you have proper indexes:
on the id column in the server table (probably the primary key),
on the server_id column in the server_ratings table.
If that does not help, then add a rating column to the server table and calculate it on a regular basis (see this answer about cron jobs). This way you save the time spent on calculations at query time. They can be made separately, e.g. every minute, but some less frequent schedule is probably enough (depending on how dynamic your data is).
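A sketch of that idea (the avg_rating column is an assumption, not part of your current schema):

-- Denormalized rating column, recalculated periodically (e.g. from a cron job).
ALTER TABLE server ADD COLUMN avg_rating INT NULL;

UPDATE server s
LEFT JOIN (
    SELECT server_id, ROUND(AVG(rating), 0) AS avg_rating
    FROM server_ratings
    GROUP BY server_id
) r ON r.server_id = s.id
SET s.avg_rating = r.avg_rating;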
Also make sure you query the proper table - in the question you mentioned a servers table, but the code references a server table. Probably a typo :)
This should be slightly faster, because the aggregate is computed first, so fewer rows have to be joined.
SELECT s.id, s.name, r.avg_rating
FROM server s
LEFT JOIN (
SELECT server_id, ROUND(AVG(rating), 0) AS avg_rating
FROM server_ratings
GROUP BY server_id
) r ON r.server_id = s.id
But the major point is matching indexes. Primary keys are indexed automatically; make sure you have an index on server_ratings.server_id, too.
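For instance (a sketch; skip it if such an index already exists):

ALTER TABLE server_ratings ADD INDEX idx_server_ratings_server_id (server_id);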