I'm working with a MariaDB (MySQL) table which contains information about some map points (latitude and longitude) and a quantity.
I'm making a lot of queries that retrieve some of these points, and I want to optimize the queries using indexes, but I don't know how to do it well.
My queries are like this:
SELECT p.id, p.lat, p.lon, p.quantity
FROM Points p
WHERE ((p.lat BETWEEN -10.0 AND 50.5) AND
(p.lon BETWEEN -30.1 AND 20.2) AND
(100 <= p.quantity AND 2000 >= p.quantity))
ORDER BY p.name DESC;
So, the columns involved in the WHERE clause are lat, lon, and quantity (plus name in the ORDER BY).
Could anyone help me?
What you want here is a spatial index. You will need to alter the schema of your table (by turning lat and lon into a single POINT or GEOMETRY value) to support this, and use specific functions to query that value. Once you've done this, you can create a spatial index using CREATE SPATIAL INDEX; this index will allow you to perform a variety of highly optimized queries against the value.
There's more information on using spatial types in MySQL in the "Spatial Data Types" section of the MySQL manual.
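For illustration, here is a minimal sketch of that conversion against the Points table from the question; the coords column and index name are mine, and exact function names vary slightly across MySQL/MariaDB versions:

ALTER TABLE Points ADD coords POINT;

UPDATE Points SET coords = POINT(lon, lat);

ALTER TABLE Points MODIFY coords POINT NOT NULL;

-- Older versions support spatial indexes only on MyISAM tables
CREATE SPATIAL INDEX sx_points_coords ON Points (coords);

-- Bounding-box search using the spatial index (x = lon, y = lat)
SELECT id, lat, lon, quantity
FROM Points
WHERE MBRContains(
        ST_GeomFromText('POLYGON((-30.1 -10.0, 20.2 -10.0, 20.2 50.5, -30.1 50.5, -30.1 -10.0))'),
        coords)
  AND quantity BETWEEN 100 AND 2000;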
When you have multiple range conditions, even if you have a standard B-tree index on all the columns, the index can only optimize the first range condition.
WHERE ((p.lat BETWEEN -10.0 AND 50.5) -- index on `lat` helps
AND (p.lon BETWEEN -30.1 AND 20.2) -- no help from index
AND (100 <= p.quantity AND 2000 >= p.quantity)) -- no help from index
You can index lat, or lon, or quantity, but your query will only be able to use a B-tree index to optimize one of these conditions.
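For example, given a single composite B-tree index (the name is illustrative):

CREATE INDEX idx_lat_lon_qty ON Points (lat, lon, quantity);

-- The optimizer can seek on the lat range only; the lon and quantity
-- conditions are just filters applied to every row within that lat stripe.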
This is why the answer from @achraflakhdhar is wrong, and it's why the answer from @duskwuff suggested using a spatial index.
A spatial index is different from a B-tree index. A spatial index is designed to help exactly this sort of case, where you need range conditions in two dimensions.
Sorry, this sounds like it will cause some rework for your project, but if you want the query optimized, that's what you will have to do.
Toss the indexes you have, and add these:
INDEX(lat, lon),   -- lets the optimizer range-scan on lat and filter lon within the index
INDEX(lon, lat),   -- the same starting from lon; the optimizer picks whichever is more selective
INDEX(quantity)    -- for when the quantity range is the most selective filter
Some discussion is provided here.
Related
Hello, I have a table with 500k records and the following columns:
id, id_route, id_point, lat, lng, distance, status
I want to select the id_routes that are inside a radius from my defined point.
That's no problem:
SELECT id_route
FROM route_path
WHERE (((lat < 48.7210 + 2.0869) AND
(lat > 48.7210 - 2.0869)) AND
((lng < 21.2578 + 2.0869) AND
(lng > 21.2578 - 2.0869)))
GROUP BY id_route
But according to phpMyAdmin it takes 0.2 s. That is too much, since I am going to build a huge query and this is just the beginning.
I also have an index on id_route.
The primary key is id, and the storage engine is MyISAM.
EXPLAIN of SELECT:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, route_path, ALL, NULL, NULL, NULL, NULL, 506902, Using where; Using temporary; Using filesort
How can I reduce the time? I think 500k records is not so many that it should take this long. Thanks
If a query takes a long time and you have set up the indexes properly, then you need a more powerful server to compute the query quickly!
A 2-dimensional search is inherently slow. The tools won't tell you how to improve this particular query.
You seem to have no indexes on your table?? You should at least try INDEX(lat). That will limit the effort to a stripe of about 4 degrees (in your example). That stripe probably still includes thousands of rows; most of them are then eliminated by checking lng, but not until after fetching all of those thousands.
So, you are tempted to try INDEX(lat, lng) only to find that it ignores lng. And perhaps it runs slower because the index is bigger.
INDEX(lat, lng, id), plus a subquery to find the ids and a self-join back to the table to do the rest of the work, is perhaps the simplest semi-straightforward solution. It is slightly beneficial because that is a "covering index" for the subquery: although you scan thousands of rows in the index, you don't have to fetch many rows from the data.
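A sketch of that approach, using the table, columns, and bounds from the question (the index and alias names are mine):

ALTER TABLE route_path ADD INDEX idx_lat_lng_id (lat, lng, id);

SELECT rp.id_route
FROM ( SELECT id
         FROM route_path
        WHERE lat BETWEEN 48.7210 - 2.0869 AND 48.7210 + 2.0869
          AND lng BETWEEN 21.2578 - 2.0869 AND 21.2578 + 2.0869 ) AS hits
JOIN route_path AS rp ON rp.id = hits.id
GROUP BY rp.id_route;

-- The subquery is answered entirely from the covering index; only the
-- matching rows are then fetched from the table for the GROUP BY.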
Can it be made faster? Yes. However, the complexity is beyond the space available here. See Find the nearest 10 pizza parlors. It involves InnoDB (to get index clustering), PARTITIONs (as crude 2D indexing) and modifications to the original data (to turn lat/lng into integers for PARTITION keys).
Click on the following links to learn how to improve MySQL performance:
MySQL Query Analyzer
MySQL performance tools
I'm not very experienced with indexes, so that's why I'm asking this silly question. I've searched just about everywhere but didn't get a clear answer.
I will have a table items with the columns: id, name, category, price
There will be 3 indexes:
id - Primary Index
name - FullText Index
category,price - Composite Index
I estimate my table will eventually reach 700,000-1,000,000 rows.
I need to do a fulltext search on name, where category is a specified category, ordered by price.
So my query will be this:
SELECT * FROM items
WHERE MATCH(name) AGAINST('my search') and category='my category' order by price
My question is:
How many indexes will be used to perform this search?
Will it use 2 indexes?
[fulltext index] & [category,price] index - it would get results for the words, then use the second index to match my category and sort by price.
Or will it use 1 index?
[fulltext index] only - it would get results for the words, but would then have to match category and sort by price manually.
I want my query to be fast; what are your opinions? I know fulltext search is fast, but what happens if I apply clauses like the category filter and the price ordering? Will it be just as fast?
As a rule, MySQL will use only one index per table in a search. The reason is that using two indexes would require two separate lookups, which would usually make the query slower, so the optimizer rarely does it (index merge is the narrow exception). You can force MySQL to use a specific index in a query, but this is not a good idea.
In summary: for a query like yours, MySQL will use one index; it won't combine the fulltext index with the (category, price) index.
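You can check which index is chosen with EXPLAIN (a sketch against the items table from the question):

EXPLAIN
SELECT * FROM items
WHERE MATCH(name) AGAINST('my search')
  AND category = 'my category'
ORDER BY price;

-- Expect type = fulltext and key = the fulltext index on name; the category
-- filter and the ORDER BY price are then applied to the matched rows,
-- typically with a filesort.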
explain
select *
from zipcode_distances z
inner join venues v on z.zipcode_to = v.zipcode
inner join events e on v.id = e.venue_id
where z.zipcode_from = '92108'
  and z.distance <= 5
I'm trying to find all "events at venues within 5 miles of zipcode 92108"; however, I am having a hard time optimizing this query.
Here is what the explain looks like:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, e, ALL, idx_venue_id, , , , 60024,
1, SIMPLE, v, eq_ref, PRIMARY,idx_zipcode, PRIMARY, 4, comedyworld.e.venue_id, 1,
1, SIMPLE, z, ref, idx_zip_from_distance,idx_zip_to_distance,idx_zip_from_to, idx_zip_from_to, 30, const,comedyworld.v.zipcode, 1, Using where; Using index
I'm getting a full table scan on the "e" table, and I can't figure out what index I need to create to make it fast.
Any advice would be appreciated
Thank you
Based on the EXPLAIN output in your question, you already have all the indexes the query should be using, namely:
CREATE INDEX idx_zip_from_distance
ON zipcode_distances (zipcode_from, distance, zipcode_to);
CREATE INDEX idx_zipcode ON venues (zipcode, id);
CREATE INDEX idx_venue_id ON events (venue_id);
(I'm not sure from your index names whether idx_zip_from_distance really includes the zipcode_to column. If not, you should add it to make it a covering index. Also, I've included the venues.id column in idx_zipcode for completeness, but, assuming it's the primary key for the table and that you're using InnoDB, it will be included automatically anyway.)
However, it looks like MySQL is choosing a different, and possibly suboptimal, query plan, where it scans through all events, finds their venues and zip codes, and only then filters the results on distance. This could be the optimal query plan if the cardinality of the events table were low enough, but from the fact that you're asking this question I assume it's not.
One reason for the suboptimal query plan could be the fact that you have too many indexes which are confusing the planner. For instance, do you really need all three of those indexes on the zipcode table, given that the data it stores is presumably symmetric? Personally, I'd suggest only the index I described above, plus a unique index (which can also be the primary key, if you don't have an artificial one) on (zipcode_to, zipcode_from) (preferably in that order, so that any occasional queries on zipcode_to=? can make use of it).
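That extra index might look like this (the name is mine):

CREATE UNIQUE INDEX ux_zip_to_from
    ON zipcode_distances (zipcode_to, zipcode_from);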
However, based on some testing I did, I suspect the main issue why MySQL is choosing the wrong query plan comes simply down to the relative cardinalities of your tables. Presumably, your actual zipcode_distances table is huge, and MySQL isn't smart enough to realize quite how much the conditions in the WHERE clause really narrow it down.
If so, the best and simplest fix may be to simply force MySQL to use the indexes you want:
select *
from zipcode_distances z FORCE INDEX (idx_zip_from_distance)
inner join venues v FORCE INDEX (idx_zipcode)
    on z.zipcode_to = v.zipcode
inner join events e FORCE INDEX (idx_venue_id)
    on v.id = e.venue_id
where z.zipcode_from = '92108'
  and z.distance <= 5
With that query, you should indeed get the desired query plan. (You do need FORCE INDEX here, since with just USE INDEX the query planner could still decide to use a table scan instead of the suggested index, defeating the purpose. I had this happen when I first tested this.)
P.S. Here's a demo on SQLize, both with and without FORCE INDEX, demonstrating the issue.
Have you indexed the columns in both tables?
events.venue_id and venues.id
If you have not, create indexes on both. If you already have them, it could be that one or more tables hold few records and the optimizer detects that a full scan is more efficient than an indexed read.
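For the record, creating the missing index would look like this, although the EXPLAIN in the question suggests idx_venue_id already exists and venues.id is presumably the primary key:

CREATE INDEX idx_venue_id ON events (venue_id);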
You could use a subquery:
select * from zipcode_distances z, venues v, events e
where
    z.id in (select id from zipcode_distances
             where zipcode_from = '92108' and distance <= 5)
    and z.zipcode_to = v.zipcode
    and v.id = e.venue_id
You are selecting all columns from all tables (select *), so there is little point in the optimizer using an index when the query engine will then have to do a lookup from the index into the table for every single row.
I have a MySQL query which takes a long time to process. I am querying a large table of IP ranges that map to country codes, to discover the country of origin for each IP in the url_click table. (IP database from hxxp://ip-to-country.webhosting.info/)
It works brilliantly, albeit slowly.
Is there a more efficient way to write this query?
Table and output JPG: http://tiny.cx/a4e00d
SELECT ip_addr AS IP, geo_ip.ctry, count(ip_addr) as count
FROM `admin_adfly`.`url_click`, admin_adfly.geo_ip
WHERE INET_ATON(ip_addr)
BETWEEN geo_ip.ipfrom AND geo_ip.ipto
AND url_id = 165
GROUP BY ip_addr;
The use of a function in the join between the two tables is going to be slower than a normal join, so you probably want to defer that particular operation as long as possible. I'd therefore summarize the data first and then join it:
SELECT S.IP_Addr, G.Ctry AS Country, S.Count
FROM (SELECT ip_addr AS IP_Addr, COUNT(ip_addr) AS Count
        FROM admin_adfly.url_click
       WHERE url_id = 165
       GROUP BY ip_addr) AS S
JOIN admin_adfly.geo_ip AS G
    ON INET_ATON(S.IP_Addr) BETWEEN G.ipfrom AND G.ipto;
If you can redesign the schema and are going to be doing a lot of this analysis, rework one of the two tables so that the join condition doesn't need to use INET_ATON().
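A hypothetical version of that rework (the ip_num column name is mine): store the numeric form once, so the join no longer has to wrap a function around the column on every row.

ALTER TABLE admin_adfly.url_click ADD COLUMN ip_num INT UNSIGNED;

UPDATE admin_adfly.url_click SET ip_num = INET_ATON(ip_addr);

-- With ip_num carried through the subquery, the join condition becomes:
--     ON S.ip_num BETWEEN G.ipfrom AND G.ipto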
Presumably, you have an index on the url_id column; that is the only one that will give you much benefit here.
IP addresses have a tree-like structure, and the ranges you have in your geo_ip table most probably respect that structure.
If your IP begins with 193.167, then an index should help you filter the geo_ip table very quickly, so that only the rows related to a subrange of 193.167 are touched.
I think that you should be able to dramatically improve the response time with this approach.
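One hypothetical way to act on that idea (the ip_class column and index names are mine, and it assumes no range crosses a /8 boundary):

ALTER TABLE geo_ip ADD ip_class TINYINT UNSIGNED;

UPDATE geo_ip SET ip_class = ipfrom DIV 16777216;   -- 2^24, i.e. the first octet

CREATE INDEX idx_geoip_class ON geo_ip (ip_class);

-- The join then gains an equality that the index can use:
--     ON ip_class = INET_ATON(ip_addr) DIV 16777216
--    AND INET_ATON(ip_addr) BETWEEN ipfrom AND ipto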
I hope this will help you.
That INET_ATON worries me just a bit. It'd make any index on the ip_addr column useless. If you have a way of putting the info all in the same format, say by converting the data to a number before putting it in the DB, that might help.
Other than that, the standard advice about judicious use of indexes applies. You might want indexes on ipfrom and ipto, and/or url_id columns.
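Those suggested indexes as DDL (the names are mine):

CREATE INDEX idx_geoip_from_to ON geo_ip (ipfrom, ipto);
CREATE INDEX idx_click_url_id ON url_click (url_id);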
MySQL does not optimize queries like this well.
You would need to convert your ipfrom-ipto ranges into LineStrings, which makes it possible to build an R-Tree (spatial) index over them:
ALTER TABLE geo_ip
    ADD `range` LINESTRING;

UPDATE geo_ip
SET `range` = LINESTRING(POINT(-1, ipfrom), POINT(1, ipto));

ALTER TABLE geo_ip
    MODIFY `range` LINESTRING NOT NULL;

CREATE SPATIAL INDEX sx_geoip_range
    ON geo_ip (`range`);
SELECT ip_addr AS IP, geo_ip.ctry, COUNT(*)
FROM `admin_adfly`.`url_click`
JOIN admin_adfly.geo_ip
    ON MBRContains(`range`, POINT(0, INET_ATON(ip_addr)))
WHERE url_id = 165
GROUP BY ip_addr;
geo_ip should be a MyISAM table (before MySQL 5.7, spatial indexes were only supported on MyISAM).
See here for more details:
Banning IPs
I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE '%whatever%' matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.
will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column?
Nope. The order of the columns in the index is very important. Let's suppose you have an index like this: create unique index index_name on table_name (headline, coupon_code, description, expiration_date)
In this case, these queries will use the index:
select * from table_name where headline = 1
select * from table_name where headline = 1 and coupon_code = 2
and these queries won't use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and coupon_code = 2
So the rule is something like this: when you have multiple fields indexed together, you have to specify the first k fields (a leftmost prefix of the index) to be able to use it.
So if you want to be able to search on any one of these fields, then you should create an index on each of them separately (besides the combined unique index); a sketch follows.
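A sketch of those separate indexes, reusing the table and columns from the example above (the index names are mine; if headline or description are long text columns, MySQL needs a prefix length, assumed here to be 50):

CREATE INDEX ix_headline ON table_name (headline(50));
CREATE INDEX ix_coupon_code ON table_name (coupon_code);
CREATE INDEX ix_description ON table_name (description(50));
CREATE INDEX ix_expiration ON table_name (expiration_date);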
Also, be careful with the LIKE operator.
This will use the index: SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
and this will not: SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
Index usage: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Multiple-column indexes: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html