MySQL geolat/geolng multi-indexed query

I have a table with fields 'lat', and 'lng'. Both are pretty much continuous, meaning they don't repeat themselves much. This has led me to believe that making a multi-column index for lat and lng wouldn't really help me. What I'd LIKE to do is this:
Make an index on both lat and lng, and then perform a query like:
select * from tableName where
lat >= 13.1232 and lat <= 14.123 and
lng >= -80.123 and lng <= -79.232 and
name like '%greg%'
and have MySQL perform this process:
1. Select all lats between 13.1232 and 14.123 (this should be indexed, and fast).
2. Within the group that step 1 found, find lngs >= -80.123 and <= -79.232 (this should also be indexed and very fast).
3. Within the group created by steps 1 and 2, perform the more time-consuming keyword search.
How can I do this? I'm pretty sure that the first part of the query (the indexed lat) is narrowing things down for me... but after that I'm not sure... and this is what I've been struggling to find in the docs.

MySQL handles conventional B-tree indexes like most implementations do: the index can assist equality conditions on the leftmost columns, but it helps only the first range condition it encounters.
The analogy I use is to a telephone book. If I search for a specific last name, first name pair like "Smith, John" the index helps. My search for the last name "Smith" is quick, and within the Smiths, the search for "John" is quick.
But if I search for a range condition like "all people whose last name begins with 'S'", then I get a subset of the telephone book but not all the people named "John" are sorted together. They're scattered through the subset I selected based on last name.
It's for this reason that MySQL searches a B-tree index up to the first range condition, and then does not use the index any further. You can still make conditions for the other dimension, but it will do a manual search through all the rows matched by the first dimension.
In other words, even if you have a compound index on (lat, long), MySQL will not use the long part of the index:
select ... from tableName
where lat >= 13.1232 and lat <= 14.123 /* index-assisted */
and lng >=-80.123 and lng <=-79.232 /* full scan */
and name like '%greg%' /* pattern search never uses index anyway */
(By the way, watch the order of your bounds: a condition written as lat >= 14.1232 AND lat <= 13.123 can never be true.)
This makes latitude and longitude searches inefficient, since both are range conditions.
For this reason, MySQL has another type of index, which is not a B-tree index. It's a SPATIAL index, which does support multiple range conditions.
CREATE TABLE mytable (
  name TEXT NOT NULL,
  coord POINT NOT NULL,
  SPATIAL INDEX (coord)
);

INSERT INTO mytable (name, coord)
VALUES ('name', ST_GeomFromText('POINT(14.0 -80)'));

SELECT name FROM mytable
WHERE MBRContains(
  ST_GeomFromText('POLYGON((
    13.123 -80.123,
    14.1232 -80.123,
    14.1232 -79.232,
    13.123 -79.232,
    13.123 -80.123))'),
  coord);
Yes, this is more complex, but it's the only way you can get truly index-optimized latitude/longitude searches.
Read more about it here: http://dev.mysql.com/doc/refman/5.7/en/using-spatial-data.html

If you absolutely want each WHERE clause to limit the result set in order, you could try something like this, although the SQL optimizer may rearrange things under the covers. I think a good index or two is still your best bet, but I believe this is what you are asking for. I recommend using EXPLAIN to optimize your queries.
select * from
(
  select * from
  (
    select * from tableName
    where lat >= 13.1232 and lat <= 14.123
  ) as t1
  where lng >= -80.123 and lng <= -79.232
) as t2
where name like '%greg%'

Index not used against MySQL SET column

I have a large data table containing details by date and across 3 independent criteria, with around 12 discrete values for each criterion. That is, each criteria field in the table is defined as a 12-value ENUM. Users pull summary data by date with any filtering across the three criteria, including none at all. To make a single-criterion lookup efficient, 3 separate indexes are required: (date, CriteriaA), (date, CriteriaB), (date, CriteriaC); 4 indexes if you want to look up against any combination of the 3: (date, A, B, C), (date, A, C), (date, B, C), (date, C).
In an attempt to make the lookup more efficient, I built a SET column containing all 36 values from the 3 criteria. All values across the criteria are unique and none is a subset of any other. I added an index on this set: (date, set_col). Queries against this table using a set lookup fail to take advantage of the index, however. Neither FIND_IN_SET('Value', set_col), set_col LIKE '%Value%', nor set_col & [pos. in set] triggers the index (according to EXPLAIN and the overall result-set return speed).
Is there a trick to indexing SET columns?
I tried queries like
Select Date, count(*)
FROM tbl
where DATE between [Start] and [End]
and FIND_IN_SET('Value',set_col)
group by Date
I would expect it to run nearly as fast as a lookup against an individual criteria column that has an index on it. Instead it runs only as fast as it does with just an index on DATE. Same number of rows processed, according to EXPLAIN.
It's not possible to index SET columns for arbitrary queries.
A SET type is basically a bitfield, with one bit set for each of the values defined for your set. You could search for a specific bit pattern in such a bitfield, or you could search for a range of specific bit patterns, or an inequality, etc. But searching for rows where one specific bit is set in the bitfield is not going to be indexable.
FIND_IN_SET() is really searching for a specific bit set in the bitfield. It will not use an index for this predicate. The best you can hope to do for optimization is to have an index that narrows down the examined rows based on the other search term on date. Then among the rows matching the date range, the FIND_IN_SET() will be applied row-by-row.
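For example, a minimal sketch of that approach (the table and column names are from the question; the index name and date literals are made up):

-- Let the index narrow the rows by the date range first;
-- FIND_IN_SET() is then evaluated row-by-row on the survivors.
ALTER TABLE tbl ADD INDEX idx_date (`Date`);

EXPLAIN
SELECT `Date`, COUNT(*)
FROM tbl
WHERE `Date` BETWEEN '2023-01-01' AND '2023-01-31'
  AND FIND_IN_SET('Value', set_col)
GROUP BY `Date`;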
It's the same problem as searching for substrings. The following predicates will not use an index on the column:
SELECT ... WHERE SUBSTRING(mytext, 5, 8) = 'word'
SELECT ... WHERE LOCATE('word', mytext) > 0
SELECT ... WHERE mytext LIKE '%word%'
A conventional index on the data would be alphabetized from the start of the string, not from some arbitrary point in the middle of the string. This is why fulltext indexing was created as an alternative to a simple B-tree index on the whole string value. But there's no special index type for bitfields.
I don't think the SET data type is helping in your case.
You should use your multi-column indexes with permutations of the columns.
Go back to 3 ENUMs. Then have
INDEX(A, date),
INDEX(B, date),
INDEX(C, date)
Those should significantly help with queries like
WHERE A = 'foo' AND date BETWEEN...
and somewhat help for
WHERE A = 'foo' AND date BETWEEN...
AND B = 'bar'
If you will also have queries without A/B/C, then add
INDEX(date)
Note: INDEX(date, A) is no better than INDEX(date) when using a "range". That is, I recommend against the indexes you mentioned.
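For illustration, here is a sketch of that layout (column and value names are made up, and the ENUMs are truncated; the real ones would have 12 values each):

CREATE TABLE tbl (
  `date` DATE NOT NULL,
  A ENUM('a1','a2','a3') NOT NULL,  -- 12 values in practice
  B ENUM('b1','b2','b3') NOT NULL,
  C ENUM('c1','c2','c3') NOT NULL,
  INDEX (A, `date`),
  INDEX (B, `date`),
  INDEX (C, `date`),
  INDEX (`date`)
);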
FIND_IN_SET(), like virtually all other function calls, is not sargable. However, enum = const is sargable, since an ENUM is implemented as a simple integer.
You did not mention
WHERE A IN ('x', 'y') AND ...
That is virtually un-indexable. However, my suggestions are better than nothing.

How to get the closest matches first from MySQL

I have a table of 10 million records and am trying to select user details like firstname, lastname and country. I need to get results back in an order where (ORDER BY column = 'abc') ranks the matching rows at the top.
what I have tried
Query one
-- this is much slower, taking 45+ seconds
select firstname, lastname, town
from user_db
order by town="abc" DESC
limit 25;
Query two
-- much faster with 0.00019 seconds
select firstname, lastname, town
from user_db
order by town DESC
limit 25;
The problem
The first query also works but takes 45+ seconds, while if I remove the equality expression from the ORDER BY clause, as in the second query, it's much faster. And obviously I do use WHERE clauses, but this is a simplified example.
other notes
There are currently no joins in the query, as it is just a simple SELECT statement of user details, and my setup is pretty good, with 30GB RAM and 2TB of storage, all local.
Indexes: all columns mentioned have indexes, but the ORDER BY town = 'abc' clause triggers a full table scan, and as a result this ends up finishing in 2 minutes.
Is there a way to get results ranked by closest matches first faster within a single query?
Any help will gladly be appreciated. Thank you.
It looks to me like your user_db table has an index on your town column. That means ORDER BY town DESC LIMIT 25 can be satisfied in O(1) constant time by random-accessing the index to the last row and then scanning 25 rows of the index.
But your ORDER BY town='abc' DESC LIMIT 25 has to look at, and sort, every single row in the table. MySQL doesn't use an index to help compute that town='abc' condition when it appears in an ORDER BY clause.
Many people with requirements like yours use FULLTEXT searching and ordering by the MATCH() function. That gets a useful ordering for a person looking at the closest matches like in the searching location bar of a web browser. But don't expect Google-like match accuracy from MySQL.
You can decouple the query into two queries each one being very fast.
First, create an index on town.
create index ix1 on user_db (town);
Then get the matches, with a limit of 25 rows:
select * from user_db where town = 'abc' limit 25
The query above may return any number of rows between 0 and 25: let's call this number R. Then, get the non-matches:
select * from user_db where town <> 'abc' limit 25 - R
Assemble both result sets (computing 25 - R in your application, since LIMIT does not accept expressions) and the problem is solved. Even if the second query results in a table scan, it will stop early, resulting in a low cost.
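If you would rather keep it to a single round trip, the same idea can be sketched as one statement (the is_match flag is made up; MySQL allows a LIMIT inside each parenthesized part of a UNION):

(SELECT firstname, lastname, town, 1 AS is_match
 FROM user_db WHERE town = 'abc' LIMIT 25)
UNION ALL
(SELECT firstname, lastname, town, 0
 FROM user_db WHERE town <> 'abc' LIMIT 25)
ORDER BY is_match DESC
LIMIT 25;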
One way is to add a computed column whose value is town = 'abc' to the select list, then sort by this column.
I'm rebuilding my workspace right now so I cannot try it properly, but something like:
select firstname, lastname, town, town="abc" as sortme
from user_db
order by sortme desc, town, lastname, firstname
limit 25;
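On MySQL 5.7+ you could take this idea a step further and materialize the flag as a stored generated column so it can be indexed. A sketch (the is_abc column and index name are made up, and this only helps when the compared value is fixed, since the expression is baked into the column definition):

ALTER TABLE user_db
  ADD COLUMN is_abc TINYINT AS (town = 'abc') STORED,
  ADD INDEX idx_is_abc (is_abc);

-- The index can now drive the sort, avoiding a full filesort.
SELECT firstname, lastname, town
FROM user_db
ORDER BY is_abc DESC
LIMIT 25;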
As long as it is unclear what you mean by "closest match", it is difficult to answer your question. Are "abd", "bc", etc. regarded as close matches to "abc"? Should the word "abc" appear in the town name, matching "abcville"?
There are a number of options.
Appearance of search string
Using a LIKE "%abc%" WHERE clause will find all towns with the string "abc" appearing anywhere in them.
select firstname, lastname
from user_db
where town like "%abc%"
order by town
Leave out the leading % (i.e. use "abc%") if you want only towns starting with "abc"; the advantage is that a prefix pattern like that can probably use the index on town, if there is one. There is no ranking, but you could add a sort.
Use a fulltext index
Create a FULLTEXT index on town:
ALTER TABLE user_db
ADD FULLTEXT(town);
And use this with a match:
SELECT
MATCH(town) AGAINST('abc') AS Relevance,
firstname, lastname
FROM user_db
WHERE MATCH(town) AGAINST('abc')
ORDER BY Relevance DESC
LIMIT 15
MATCH uses words to calculate the relevance, so in this case the string "abc" must appear as a separate word in the town name in order to match. The NATURAL LANGUAGE option works well for plain text but might not do so for town names.
To be honest, I have no experience with FULLTEXT and MATCH performance, but it is probably well optimized and works fairly well on large tables.
Create additional fields
As storage is cheap and time is not, you might want to consider adding additional fields with search strings or alternative spellings for town, creating all the indexes you'll need, and using that as a search source. As this needs analysis of your use case, it is difficult to provide a concrete solution.
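For example, a sketch of such a search table (all names are made up):

-- 'abc', 'abcville' and other spellings all point at a canonical town.
CREATE TABLE town_aliases (
  alias VARCHAR(100) NOT NULL,
  town  VARCHAR(100) NOT NULL,
  INDEX (alias)
);

SELECT u.firstname, u.lastname, u.town
FROM town_aliases a
JOIN user_db u ON u.town = a.town
WHERE a.alias = 'abc';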

Creating good indexes for a table

I'm working with a MariaDB (MySQL) table which contains information about some map points (latitude and longitude) and a quantity.
I'm making a lot of queries that retrieve some of these points, and I want to optimize them using indexes, but I don't know how to do it well.
My queries are like this:
SELECT p.id, p.lat, p.lon, p.quantity
FROM Points p
WHERE ((p.lat BETWEEN -10.0 AND 50.5) AND
(p.lon BETWEEN -30.1 AND 20.2) AND
(100 <= p.quantity AND 2000 >= p.quantity))
ORDER BY p.name DESC;
So, the columns involved in the queries are: lat, lon and quantity.
Could anyone help me?
What you want here is a spatial index. You will need to alter the schema of your table (by turning lat and lon into a single POINT or GEOMETRY value) to support this, and use specific functions to query that value. Once you've done this, you can create a spatial index using CREATE SPATIAL INDEX; this index will allow you to perform a variety of highly optimized queries against the value.
There's more information on using spatial types in MySQL in the "Spatial Data Types" section of the MySQL manual.
When you have multiple range conditions, even if you have a standard B-tree index on all the columns, the index can only optimize the first range condition.
WHERE ((p.lat BETWEEN -10.0 AND 50.5) -- index on `lat` helps
AND (p.lon BETWEEN -30.1 AND 20.2) -- no help from index
AND (100 <= p.quantity AND 2000 >= p.quantity)) -- no help from index
You can index lat, or lon, or quantity, but your query will only be able to use a B-tree index to optimize one of those conditions.
This is why the answer from @achraflakhdhar is wrong, and it's why the answer from @duskwuff suggested using a spatial index.
A spatial index is different from a B-tree index. A spatial index is designed to help exactly this sort of case, where you need range conditions in two dimensions.
Sorry this sounds like it will cause some rework for your project, but if you want it to be optimized, that's what you will have to do.
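A sketch of that rework, assuming MySQL 5.7+ or a recent MariaDB, with the column names from the question (the new column is added as nullable first, because a geometry column has no implicit default to fill existing rows):

ALTER TABLE Points ADD COLUMN coord POINT;

-- POINT(x, y): x = longitude, y = latitude here.
UPDATE Points SET coord = POINT(lon, lat);

ALTER TABLE Points
  MODIFY coord POINT NOT NULL,
  ADD SPATIAL INDEX (coord);

SELECT id, lat, lon, quantity
FROM Points
WHERE MBRContains(
        ST_GeomFromText('POLYGON((-30.1 -10.0, 20.2 -10.0, 20.2 50.5,
                                   -30.1 50.5, -30.1 -10.0))'),
        coord)
  AND quantity BETWEEN 100 AND 2000
ORDER BY name DESC;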
Toss indexes you have, and add these:
INDEX(lat, lon),
INDEX(lon, lat),
INDEX(quantity)
Some discussion is provided here

MySQL - Poor performance in a select from a simple table

I have a very simple table with three columns:
- A BIGINT,
- Another BIGINT,
- A string.
The first two columns are indexed and contain no repeated values. Moreover, both columns have values in growing order.
The table has nearly 400K records.
I need to select the string when a test value falls between the values of columns 1 and 2; in other words:
SELECT MyString
FROM MyTable
WHERE Col_1 <= Test_Value
AND Test_Value <= Col_2 ;
The result may be either a NOT FOUND or a single value.
The query takes nearly a whole second while, intuitively (imagining a binary search throughout an array), it should take just a small fraction of a second.
I checked the index type and it is BTREE for both columns (1 and 2).
Any idea how to improve performance?
Thanks in advance.
EDIT:
The EXPLAIN reads:
Select type: SIMPLE
Type: range
Possible keys: PRIMARY
Key: PRIMARY
Key length: 8
Rows: 441
Filtered: 33.33
Extra: Using where
If I understand your obfuscation correctly, you have a start and end value such as a datetime or an ip address in a pair of columns? And you want to see if your given datetime/ip is in the given range?
Well, there is no way to generically optimize such a query on such a table. The optimizer does not know whether a given value could be in multiple ranges. Or, put another way, whether the ranges are disjoint.
So, the optimizer will, at best, use an index starting with either start or end and scan half the table. Not efficient.
Are the ranges non-overlapping, as with IP address ranges?
What can you say about the result? Perhaps a kludge like this will work: SELECT ... WHERE Col_1 <= Test_Value ORDER BY Col_1 DESC LIMIT 1.
Your query, rewritten with shorter identifiers, is this
SELECT s FROM t WHERE t.low <= v AND v <= t.high
To satisfy this query using indexes would go like this: First we must search a table or index for all rows matching the first of these criteria
t.low <= v
We can think of that as a half-scan of a BTREE index. It starts at the beginning and stops when it gets to v.
It requires another half-scan in another index to satisfy v <= t.high. It then requires a merge of the two resultsets to identify the rows matching both criteria. The problem is, the two resultsets to merge are large, and they're almost entirely non-overlapping.
So, the query planner probably should just choose a full table scan instead to satisfy your criteria. That's especially true in the case of MySQL, where the query planner isn't very good at using more than one index.
You may, or may not, be able to speed up this exact query with a compound index on (low, high, s) -- with your original column names (Col_1, Col_2, MyString). This is called a covering index and allows MySQL to satisfy the query completely from the index. It sometimes helps performance. (It would be easier to guess whether this will help if the exact definition of your table were available; the efficiency of covering indexes depends on stuff like other indexes, primary keys, column size, and so forth. But you've chosen minimal disclosure for that information.)
What will really help here? Rethinking your algorithm could do you a lot of good. It seems you're trying to retrieve rows where a test point v lies in the range [t.low, t.high]. Does your application offer an a-priori limit on the width of the range? That is, is there a known maximum value of t.high - t.low? If so, let's call that value maxrange. Then you can rewrite your query like this:
SELECT s
FROM t
WHERE t.low BETWEEN v-maxrange AND v
AND t.low <= v AND v <= t.high
When maxrange is available we can add the col BETWEEN const1 AND const2 clause. That turns into an efficient range scan on an index on low. In that case, the covering index I mentioned above will certainly accelerate this query.
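Putting the two together, a sketch with your original names (assuming a known maxrange of 1000, and that MyString is a VARCHAR; a TEXT column would need an index prefix length):

ALTER TABLE MyTable ADD INDEX idx_cover (Col_1, Col_2, MyString);

SET @v := 123456;  -- the test value

SELECT MyString
FROM MyTable
WHERE Col_1 BETWEEN @v - 1000 AND @v
  AND Col_1 <= @v AND @v <= Col_2;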
Read this. http://use-the-index-luke.com/
Well... I found a suitable solution for me (not sure you guys will like it but, as stated, it works for me).
I simply partitioned my 400K records into a number of tables and created a simple table that serves as a selector:
The selector table holds the minimal value of the first column for each partition along with a simple index (i.e. 1, 2, ,...).
I then use the following to get the index of the table that is supposed to contain the searched-for range:
SELECT Table_Index
FROM tbl_selector
WHERE start_range <= Test_Val
ORDER BY start_range DESC LIMIT 1 ;
This will give me the Index of the table I wish to select from.
I then have a CASE on the retrieved index to select the correct partition table from which to perform the actual search.
(I guess that more elegant would be to use Dynamic SQL, but will take care of that later; for now just wanted to test the approach).
The result is that I get the response well below a second (~0.08s), and it is uniform regardless of the number being used for the test. This, by the way, was not the case with the previous approach: there, if the number was "close" to the beginning of the table, the result was produced quite fast; if, on the other hand, the record was near the end of the table, it would take several seconds to complete.
[By the way, I assume you understand what I mean by beginning and end of the table]
Again, I'm sure people might dislike this, but it does the job for me.
Thank you all for the effort to assist!!

Would adding more specificity to a SELECT query make it faster?

I have a rather large table with 150,000+ records. It has a lot of fields like country_id, region_id, city_id, latitude, longitude and postal_code, to name just a few.
I need to select from this table based on the latitude and longitude, nothing else. My query looks like this:
SELECT * FROM `mytable`
WHERE `latitude` = '$latitude'
AND `longitude` = '$longitude';
Now although my sole criteria for selecting from this table is latitude/longitude I was wondering if adding more specificity would quicken the query like:
SELECT * FROM `mytable`
WHERE `city_id` = '320'
AND `latitude` = '$latitude'
AND `longitude` = '$longitude';
It seems counterintuitive to me that adding more conditions to the query would speed it up, but at the same time I am narrowing down the number of possible results by a large margin, by first making sure all the selected records are from a particular city_id which I know lies in the latitude and longitude range I am about to specify.
There may be around 150k total records but only about 10k from that particular city.
So is this sensible at all or am I just making the query more time consuming?
In general, adding conditions to a WHERE clause in your query will not have much impact on execution time. However, if you add indexes to the fields mentioned in your WHERE clause, you could considerably improve performance.
So in your example, if there is not an index on city_id and you were to add that condition to your WHERE clause and create an index on that field, you would likely see a considerable performance improvement.
Adding criteria could go either way, but usually more criteria do help performance, especially if the columns have indexes.
In your case, if you have an index that includes city_id, longitude and latitude, you'll get much better performance.
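For example, a sketch of such an index (the index name is made up; column names are from the question):

ALTER TABLE mytable
  ADD INDEX idx_city_geo (city_id, latitude, longitude);

-- The added city_id condition now lets MySQL seek straight to the
-- matching city before checking latitude and longitude.
SELECT * FROM mytable
WHERE city_id = '320'
  AND latitude = '$latitude'
  AND longitude = '$longitude';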