MySQL: can a query be served by a B-tree and a geospatial index?

We use MySQL 5.7 to store geospatial data. A table contains 5 columns: store_id (string), tenant_id (string), type (enum), status (enum), and boundaries (MultiPolygon). The boundaries column holds only one polygon per row, but the type was set to MultiPolygon.
Our query is:
SELECT DISTINCT store_id
FROM ${boundariesTable}
WHERE tenant_id = '${tenantId}'
  AND status = '${status}'
  AND type <> 'Z'
  AND ST_Contains(boundaries, GeomFromText('Point(${coords.lng} ${coords.lat})'))
This DB call is very slow when the boundary data contains circles made of many geolocation points. Hence, we want to use a geospatial index for the boundaries column. Would we need to modify the above query to use a geospatial index on the boundaries column? If yes, how should we structure the query? Without the other parameters like type and tenantId, the number of matching rows increases multifold, so I am apprehensive about removing all other constraints and retaining only the ST_Contains part of the query.
Thanks.
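For reference, here is a minimal sketch of what adding a spatial index could look like in MySQL 5.7. The table name boundaries_table and the literal tenant, status and point values are placeholders rather than values from the question; note that a SPATIAL index in 5.7 requires the geometry column to be NOT NULL, and that ST_GeomFromText is the non-deprecated 5.7 spelling of GeomFromText.
-- Hypothetical sketch: make the geometry column NOT NULL, then add a SPATIAL index.
ALTER TABLE boundaries_table MODIFY boundaries MULTIPOLYGON NOT NULL;
ALTER TABLE boundaries_table ADD SPATIAL INDEX idx_boundaries (boundaries);

-- The query keeps its shape; ST_Contains() against the indexed column is what
-- lets the optimizer consider the spatial index.
SELECT DISTINCT store_id
FROM boundaries_table
WHERE tenant_id = 'tenant-1'
  AND status = 'ACTIVE'
  AND type <> 'Z'
  AND ST_Contains(boundaries, ST_GeomFromText('POINT(77.59 12.97)'));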

Related

About the SQL performance of SELECT ... IN

MySQL 5.7.21
I use a connection pool to connect to the database and run the SQL:
let mysql = require('mysql');
let pool = mysql.createPool(db);

pool.getConnection((err, conn) => {
    if (err) {
        ...
    } else {
        console.log('allConnections:' + pool._allConnections.length);
        let q = conn.query(sql, val, (err, rows, fields) => {
            ...
I have a table with around 1,000,000 records. I wrote a select to fetch the records:
select * from tableA where trackingNo in (?)
I pass the trackingNo values as an array parameter; the array contains around 20,000 values.
I created an index on the trackingNo column. (The column is a varchar, not unique, and can hold null, blank, or any other value.)
The problem is that it takes around 5 minutes to get the results! Those 5 minutes are purely backend SQL handling time. I think that is too slow for matching 20,000 values against 1,000,000 records. Do you have any suggestions for SELECT ... IN?
Explain SQL:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE tableA null ALL table_tracking_no_idx null null null 999507 50 Using where
You could consider populating a table with the tracking numbers you want to match. Then, you could use an inner join instead of your current WHERE IN approach:
SELECT *
FROM tableA a
INNER JOIN tbl b
ON a.trackingNo = b.trackingNo;
This has the advantage that you may index the new tbl table on the trackingNo column to make the join lookup extremely fast.
This assumes that tbl would have a single column trackingNo which contains the 20K+ values you need to consider.
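A minimal sketch of that approach, assuming tbl is a temporary staging table and that the VARCHAR length matches tableA.trackingNo, might look like this:
-- Hypothetical sketch: stage the ~20,000 tracking numbers in an indexed table,
-- then join against it instead of using a huge IN list.
CREATE TEMPORARY TABLE tbl (
  trackingNo VARCHAR(64) NOT NULL,
  INDEX idx_trackingNo (trackingNo)
);

INSERT INTO tbl (trackingNo) VALUES ('TN0001'), ('TN0002');  -- ...remaining values

SELECT a.*
FROM tableA a
INNER JOIN tbl b ON a.trackingNo = b.trackingNo;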
MySQL evaluates an IN list made up of constants by sorting the values and binary-searching them. As explained in the documentation:
If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
In general, creating a separate table with constants does not provide much improvement in performance.
I suppose there could be some subtle issue with type compatibility -- such as collations -- that interferes with this process.
This type of query probably requires a full table scan. If the rows are wide, then the combination of the scan and returning the data may be accounting for the performance. I do agree that five minutes is a long time, but it could be entirely due to the network connection between the app/GUI and the database.

MySQL Index on date field

I'm using a MySQL database. I have sales data in one table, and I've created an index on the date column, OrderDate.
The data is retrieved quickly when we use a query like:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE OrderDate BETWEEN '2012-01-08' AND '2012-05-08';
But retrieving the details for a particular quarter is slow and scans the whole table.
The query looks like:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE Quarter(OrderDate) = Quarter('2012-01-08')
AND Year(OrderDate) = Year('2012-01-08');
Is there any way to index the quarter function, or any other way to speed up data retrieval for a quarter cycle?
Explain statement:
For 1st Query
id Selecttype table type possible_keys key key_len ref rows Extra
1 SIMPLE SalesData range csrv csrv 4 138 Using where
For 2nd Query (scanning 104785 rows data)
id Selecttype table type possible_keys key key_len ref rows Extra
1 SIMPLE SalesData All 104785 Using where
You found the solution yourself, why don't you use it?
Just calculate the boundary dates for the quarter in question and use them with BETWEEN:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE OrderDate BETWEEN '2012-01-01' AND '2012-03-31';
You can calculate the boundary dates in your application, or also via MySQL as shown in this article:
http://use-the-index-luke.com/sql/where-clause/obfuscation/dates?db_type=mysql
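If you prefer to compute the boundaries in MySQL itself, a sketch along these lines should work (this is an illustration, not necessarily the linked article's exact approach):
-- Hypothetical sketch: derive the quarter boundaries from a reference date so
-- the WHERE clause stays a plain range and the index on OrderDate can be used.
SET @ref := '2012-01-08';
SET @qstart := MAKEDATE(YEAR(@ref), 1) + INTERVAL (QUARTER(@ref) - 1) QUARTER;

SELECT CustID
      ,CustName
      ,CustPhone
FROM SalesData
WHERE OrderDate >= @qstart
  AND OrderDate < @qstart + INTERVAL 1 QUARTER;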
There is no way to index anyFunction(OrderDate) in MySQL, unless you store the result separately.
I think you can handle it more efficiently in two ways:
1- Migrate from MySQL to MariaDB; it has computed/virtual columns, so you can create virtual columns such as OrderDateQuarter and OrderDateYear and index them without much overhead.
(MariaDB is a community-developed fork of MySQL; I think it is much better than native MySQL.)
2- You can store OrderDateQuarter and OrderDateYear in separate columns and index them; a trigger that fills the two columns from OrderDate makes this easy (see the sketch below).
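A sketch of option 2, assuming you would also add a matching BEFORE UPDATE trigger in practice:
-- Hypothetical sketch: extra indexed columns kept in sync by a trigger.
ALTER TABLE SalesData
  ADD COLUMN OrderDateYear SMALLINT,
  ADD COLUMN OrderDateQuarter TINYINT,
  ADD INDEX idx_year_quarter (OrderDateYear, OrderDateQuarter);

DELIMITER //
CREATE TRIGGER salesdata_bi BEFORE INSERT ON SalesData
FOR EACH ROW
BEGIN
  SET NEW.OrderDateYear = YEAR(NEW.OrderDate);
  SET NEW.OrderDateQuarter = QUARTER(NEW.OrderDate);
END//
DELIMITER ;

-- The quarter query can then hit the composite index directly:
SELECT CustID, CustName, CustPhone
FROM SalesData
WHERE OrderDateYear = 2012 AND OrderDateQuarter = 1;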

Use timestamp(or datetime) as part of primary key (or part of clustered index)

I use the following query frequently:
SELECT * FROM table WHERE Timestamp > [SomeTime] AND Timestamp < [SomeOtherTime] and publish = 1 and type = 2 order by Timestamp
I would like to optimize this query, and I am thinking about making the timestamp part of the primary key (the clustered index). I think that if timestamp is part of the primary key, inserted data will be written to disk sequentially by the timestamp field. I also think this would improve my query a lot, but I am not sure whether it would help.
The table has 3-4 million+ rows.
The timestamp field never changes.
I use MySQL 5.6.11.
Another point: if this does improve my query, is it better to use timestamp (4 bytes in MySQL 5.6) or datetime (5 bytes in MySQL 5.6)?
Four million rows isn't huge.
A one-byte difference between the data types datetime and timestamp is the last thing you should consider in choosing between those two data types. Review their specs.
Making a timestamp part of your primary key is a bad, bad idea. Think about reviewing what primary key means in a SQL database.
Put an index on your timestamp column. Get an execution plan, and paste that into your question. Determine your median query performance, and paste that into your question, too.
Returning a single day's rows from an indexed, 4 million row table on my desktop computer takes 2ms. (It returns around 8000 rows.)
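Following that advice, the change and the plan check could look roughly like this (myTable stands in for the real table name):
-- Hypothetical sketch: a plain secondary index on the timestamp column,
-- then EXPLAIN to see how the range query is executed.
ALTER TABLE myTable ADD INDEX idx_timestamp (`Timestamp`);

EXPLAIN
SELECT *
FROM myTable
WHERE `Timestamp` > '2013-01-01 00:00:00'
  AND `Timestamp` < '2013-02-01 00:00:00'
  AND publish = 1
  AND type = 2
ORDER BY `Timestamp`;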
1) If the timestamp values are unique, you can make the column the primary key. If not, create an index on the timestamp column anyway, since you frequently use it in the WHERE clause.
2) Using a BETWEEN clause looks more natural here. I suggest you use a BTREE index (the default index type), not HASH.
3) When the timestamp column is indexed, you don't need the ORDER BY - the rows already come back sorted (provided, of course, that the index is BTREE, not HASH).
4) An integer Unix timestamp is better than datetime in terms of both memory usage and performance - comparing dates is a more complex operation than comparing integers.
Searching on an indexed field takes O(log(rows)) tree lookups. Comparing integers is O(1) and comparing dates is O(date_string_length). So the difference is (number of tree lookups) * (per-comparison cost), i.e. O(log(rows)) * O(date_string_length) / O(1) = O(date_string_length * log(rows)).
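A sketch of points 2 and 4 combined, assuming an extra integer column ts that stores UNIX_TIMESTAMP() values (table and column names are placeholders):
-- Hypothetical sketch: integer Unix-timestamp column with an ordinary (BTREE)
-- index, queried with BETWEEN.
ALTER TABLE myTable
  ADD COLUMN ts INT UNSIGNED,
  ADD INDEX idx_ts (ts);

UPDATE myTable SET ts = UNIX_TIMESTAMP(`Timestamp`);

SELECT *
FROM myTable
WHERE ts BETWEEN UNIX_TIMESTAMP('2013-01-01 00:00:00')
             AND UNIX_TIMESTAMP('2013-01-31 23:59:59')
  AND publish = 1
  AND type = 2
ORDER BY ts;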

Should I index a field if I'll only search with FIELD is not null?

I have a table with a nullable datetime field.
I'll execute queries like this:
select * from TABLE where FIELD is not null
select * from TABLE where FIELD is null
Should I index this field, or is it not necessary? I will NOT search for specific datetime values in that field.
It's probably not necessary.
The only edge case where the index can be used (and actually help) is when the ratio of NULL to non-NULL rows is heavily skewed (e.g. you have 100 NULL datetimes in a table with 100,000 rows). In that case select * from TABLE where FIELD is null would use the index and be considerably faster for it.
In short: yes.
Slightly longer: yeeees. ;-)
(From http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html) - "A search using col_name IS NULL employs indexes if col_name is indexed."
It would depend on the number of unique values and the number of records in the table. If you're just searching on whether or not a column is null, you'll probably have one query use the index and one not, depending on the proportion of nulls in the table overall.
For example: if you have a table where 99% of the records have NULL in the queried column, you put an index on the column, and then execute:
SELECT columnIndexed FROM blah WHERE columnIndexed is null;
the optimizer most likely won't use the index. It won't because it costs more to read the index and then read the associated data for the matching records than to just access the table directly. Index usage is based on the statistical analysis of a table, and one major factor in that is the cardinality of the values. In general, indexes work best and give the best performance when they select a small subset of the rows in the table. So if you change the above query to select where columnIndexed is not null, you're bound to use the index.
For more details check out the following: http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html
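To see which side the optimizer picks for your own data, a quick check along these lines can help (the table and column names here are placeholders):
-- Hypothetical sketch: index the nullable column and compare the two plans.
CREATE INDEX idx_field ON myTable (myField);

EXPLAIN SELECT * FROM myTable WHERE myField IS NULL;
EXPLAIN SELECT * FROM myTable WHERE myField IS NOT NULL;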

How many fields should be indexed and how should I create them?

I've got a table in a MySQL database that has the following fields:
ID | GENDER | BIRTHYEAR | POSTCODE
Users can search the table using any of the fields in any combination (e.g., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000); or SELECT * FROM table WHERE BIRTHYEAR = 1973;)
From the MySQL docs, indexes are used via their leftmost prefix. So if I create one index on all 4 columns, it won't be used when the ID field isn't in the query. Do I need to create an index for every possible combination of fields (ID; ID/GENDER; ID/BIRTHYEAR; etc.), or will creating one index over all the fields be sufficient?
If it makes any difference, there are upwards of 3 million records in this table.
In this situation I typically log the search criteria, the number of results returned and the time taken to perform the search. Just because you're creating the flexibility to search by any field doesn't mean your users will make use of that flexibility. I'd normally create indexes on sensible combinations and then, once I've determined the usage patterns, drop the rarely used indexes or create new ones I hadn't anticipated.
I'm not sure whether MySQL supports statistics or histograms for skewed data, so the index on gender may or may not work. If MySQL supports statistics, they will indicate the selectivity of an index. In a general population, an index on a field with a 50/50 split won't help. If your sample data is computer programmers and the data is 95% male, then a search for females would use the index.
Use EXPLAIN.
(I'd say, use Postgres, too, lol).
It seems recent versions of MySQL can use several indexes in the same query; they call this Index Merge. In that case one index per column will be enough.
Gender is a special case: since its selectivity is about 50%, you don't need an index on it; it would be counterproductive.
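A sketch of that one-index-per-column approach (the table name demographics is a placeholder); EXPLAIN may then show an index_merge access type when the optimizer decides to combine the indexes:
-- Hypothetical sketch: single-column indexes that Index Merge can combine.
ALTER TABLE demographics
  ADD INDEX idx_birthyear (BIRTHYEAR),
  ADD INDEX idx_postcode (POSTCODE);

EXPLAIN
SELECT * FROM demographics
WHERE BIRTHYEAR = 1973 AND POSTCODE IN (1000, 2000);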
Creating indexes on single fields is useful, but it would be really useful if your data were of varchar type and each record had a different value; since birthyear and postcode are numbers, they already index well.
You can index birthyear because it should differ across many of the records (though I guess there are at most around 120 distinct birth years in total).
Gender, in my opinion, doesn't need an index.
You can find out which field combinations are most likely to narrow the results and index those, for example: birthyear - postcode, id - birthyear, id - postcode.