I am getting slow query results when searching for nearby coordinates (for now the query only uses latitude). This is the MySQL query:
select ABS(propertyCoordinatesLat - 3.33234) as diff from tablename order by diff asc limit 0,20
Is there a way to improve this besides relying on server-side scripting to do the sorting?
Table dump:
CREATE TABLE `property` (
`propertyID` bigint(20) NOT NULL,
`propertyName` varchar(100) NOT NULL,
`propertyCoordinatesLat` varchar(100) NOT NULL,
`propertyCoordinatesLng` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `property`
--
ALTER TABLE `property`
ADD PRIMARY KEY (`propertyID`),
ADD KEY `propertyCoordinatesLat` (`propertyCoordinatesLat`,`propertyCoordinatesLng`),
ADD KEY `propertyCoordinatesLat_2` (`propertyCoordinatesLat`),
ADD KEY `propertyCoordinatesLng` (`propertyCoordinatesLng`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `property`
--
ALTER TABLE `property`
MODIFY `propertyID` bigint(20) NOT NULL AUTO_INCREMENT;
COMMIT;
The query is ordering by the difference between a string and a float. This odd calculation confuses and angers MySQL and results in a slow filesort.
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by diff
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| 1 | SIMPLE | property | NULL | index | NULL | propertyCoordinatesLat_2 | 302 | NULL | 1 | 100.00 | Using index; Using filesort |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
Changing propertyCoordinatesLat and propertyCoordinatesLng to a more sensible numeric type lets MySQL optimize better. No more filesort. This should perform much better.
alter table property change propertyCoordinatesLat propertyCoordinatesLat numeric(10,8) not null;
alter table property change propertyCoordinatesLng propertyCoordinatesLng numeric(11,8) not null;
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by propertyCoordinatesLat asc limit 0,20;
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | property | NULL | index | NULL | propertyCoordinatesLat_2 | 5 | NULL | 1 | 100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
If you want to get fancy, look into MySQL's spatial types. These will probably perform better, and definitely be more accurate.
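A minimal sketch of what the spatial route could look like (MySQL 5.7+ with InnoDB; the column and index names here are made up, and 101.0 is just a placeholder longitude). Note that ORDER BY distance still scans every row; the spatial index pays off once you first narrow things down with a bounding box (for example via MBRContains):
-- Add a POINT column, fill it, then make it NOT NULL so it can carry a SPATIAL index
ALTER TABLE property ADD COLUMN propertyLocation POINT NULL;
UPDATE property SET propertyLocation = POINT(propertyCoordinatesLng, propertyCoordinatesLat);
ALTER TABLE property
  MODIFY propertyLocation POINT NOT NULL,
  ADD SPATIAL INDEX idx_property_location (propertyLocation);
-- Distance in meters from a reference point; ST_Distance_Sphere expects POINT(lng, lat)
SELECT propertyID,
       ST_Distance_Sphere(propertyLocation, POINT(101.0, 3.33234)) AS distance_m
FROM property
ORDER BY distance_m
LIMIT 20;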
Related
I have a MySQL database with the following structure:
mysql> describe company;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
mysql> describe nameserver;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| companyId | int | NO | MUL | NULL | |
| ns | varchar(250) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+----------------+
mysql> describe domain;
+--------------+--------------+------+-----+-------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+-------------------+-------------------+
| id | int | NO | PRI | NULL | auto_increment |
| nameserverId | int | NO | MUL | NULL | |
| domain | varchar(250) | NO | MUL | NULL | |
| tld | varchar(20) | NO | MUL | NULL | |
| createDate | datetime | NO | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
| updatedAt | datetime | YES | | NULL | |
| status | tinyint | NO | | NULL | |
| fileNo | smallint | NO | MUL | NULL | |
+--------------+--------------+------+-----+-------------------+-------------------+
The index structure:
-- Indexes for table `company`
--
ALTER TABLE `company`
ADD PRIMARY KEY (`id`);
--
-- Indexes for table `domain`
--
ALTER TABLE `domain`
ADD PRIMARY KEY (`id`),
ADD KEY `nameserver` (`nameserverId`),
ADD KEY `domain` (`domain`),
ADD KEY `tld` (`tld`),
ADD KEY `fileNo` (`fileNo`);
--
-- Indexes for table `nameserver`
--
ALTER TABLE `nameserver`
ADD PRIMARY KEY (`id`),
ADD KEY `company` (`companyId`),
ADD KEY `ns` (`ns`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `company`
--
ALTER TABLE `company`
MODIFY `id` int NOT NULL AUTO_INCREMENT;
--
-- AUTO_INCREMENT for table `domain`
--
ALTER TABLE `domain`
MODIFY `id` int NOT NULL AUTO_INCREMENT;
--
-- AUTO_INCREMENT for table `nameserver`
--
ALTER TABLE `nameserver`
MODIFY `id` int NOT NULL AUTO_INCREMENT;
--
-- Constraints for dumped tables
--
--
-- Constraints for table `domain`
--
ALTER TABLE `domain`
ADD CONSTRAINT `nameserver` FOREIGN KEY (`nameserverId`) REFERENCES `nameserver` (`id`);
--
-- Constraints for table `nameserver`
--
ALTER TABLE `nameserver`
ADD CONSTRAINT `company` FOREIGN KEY (`companyId`) REFERENCES `company` (`id`);
The amount of data is as follows:
domain table: about 500 million records
nameserver table: about 2 million records
Running this query takes about 4 hours to return the result:
SELECT distinct domain FROM domain
INNER join nameserver on nameserver.id = domain.nameserverId
WHERE nameserver.companyId = 2
The EXPLAIN result for the above query:
+----+-------------+------------+------------+------+-------------------+------------+---------+-----------------------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+-------------------+------------+---------+-----------------------+------+----------+------------------------------+
| 1 | SIMPLE | nameserver | NULL | ref | PRIMARY,company | company | 4 | const | 1738 | 100.00 | Using index; Using temporary |
| 1 | SIMPLE | domain | NULL | ref | nameserver,domain | nameserver | 4 | tldzone.nameserver.id | 716 | 100.00 | NULL |
+----+-------------+------------+------------+------+-------------------+------------+---------+-----------------------+------+----------+------------------------------+
My question is: how can I improve the speed of this query?
It is possible for me to change the DB structure or even replace it with another DBMS.
MySQL is running on a VPS with 8 GB RAM and a dual-core CPU.
nameserver: INDEX(companyId, id) -- in this order (you have this)
domain: INDEX(nameserverId, domain) -- in this order
("MUL" does not tell me whether you already have either of these composite indexes. SHOW CREATE TABLE is more descriptive than DESCRIBE.)
1. Add indexes to the relevant columns: adding indexes to the companyId, nameserverId, and domain columns in the nameserver and domain tables can speed up the query by allowing the database to quickly locate the relevant rows.
2. Use a covering index: a covering index includes all the columns that are used in the query. By creating a covering index on the companyId, nameserverId, and domain columns, you avoid the need for the database to look up the data in the actual tables, which can improve query performance.
3. Use a column-store index: a column-store index stores data by column rather than by row. Column-store indexes can be more efficient for querying large datasets and could improve the performance of the query you provided.
4. Use a database management system that is optimized for large datasets: if your current DBMS is not well suited to handling large datasets, you may see improved performance by switching. Options to consider include column-oriented systems such as Vertica or ClickHouse, or distributed systems such as Cassandra or HBase.
5. Consider a distributed database: if you have a very large dataset and still see slow query performance, a distributed database management system lets you spread your data across multiple servers and can improve scalability and performance.
Keep in mind that the solutions that work best for you will depend on the specific requirements of your database and the workload you place on it. It may be helpful to do some benchmarking and testing to determine which approaches work best for your needs.
Table (row count: about 21 million):
CREATE TABLE `prefix` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`number` int(11) NOT NULL,
`string` varchar(750) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_string_prefix10` (`string`(10)),
KEY `idx_string` (`string`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Selectivity of the 10-character prefix:
select count(distinct(left(string,10)))/count(*) from prefix;
+-------------------------------------------+
| count(distinct(left(string,10)))/count(*) |
+-------------------------------------------+
| 0.9999 |
+-------------------------------------------+
Results:
select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string <"1505d28b"
243.96s,241.88s
select sql_no_cache count(*) from prefix force index(idx_string)
where string < "1505d28b"
7.96s,7.21s,7.53s
Why is the prefix index slower than the full-column index in MySQL? (Forgive my broken English.)
explain select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b";
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | prefix | NULL | range | idx_string_prefix10 | idx_string_prefix10 | 42 | NULL | 3489704 | 100.00 | Using where |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
When you use a prefix index, MySQL has to read the index entry and then also read the row of data, to make sure the full value really satisfies the WHERE condition. That's two reads, and a lot more data scanned.
When you use a non-prefix index, MySQL can read the whole string value from the index, and it knows immediately whether the value is selected by the condition or can be skipped.
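Not from the original answer, but a practical follow-up sketch: since idx_string already handles range scans like this (as well as equality and LIKE 'prefix%' lookups) on its own, the 10-character prefix index is redundant for this workload and could be dropped, assuming nothing else depends on it:
ALTER TABLE prefix DROP INDEX idx_string_prefix10;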
I know there are many questions already posted on this topic, but I haven't found a solution for my problem.
I have just one table, which has 1,042,162 rows.
Table definition:
CREATE TABLE `tbllinks` (
`idLinks` int(11) NOT NULL AUTO_INCREMENT,
`linksText` varchar(500) DEFAULT NULL,
`linksLastChecked` datetime DEFAULT NULL,
`linksLastNewData` datetime DEFAULT NULL,
PRIMARY KEY (`idLinks`),
UNIQUE KEY `idtblLinks_UNIQUE` (`idLinks`),
UNIQUE KEY `linksText_UNIQUE` (`linksText`),
KEY `fasterDate` (`linksLastChecked`),
KEY `faster2` (`linksText`)
) ENGINE=InnoDB AUTO_INCREMENT=3029595 DEFAULT CHARSET=latin1;
This one:
SELECT * FROM tbllinks order by linksLastChecked asc limit 9324;
needs 0.094 sec.
And this one:
SELECT * FROM tbllinks order by linksLastChecked asc limit 9325;
needs 42.559 sec.
Both queries return only NULL values in the linksLastChecked column (which is correct), but most rows do have a value.
There is also nothing special about row 9325, so I really have no idea why it takes so much longer for just one more row.
Edit:
With EXPLAIN it makes sense why it takes so long.
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------+
| 1 | SIMPLE | tbllinks | index | NULL | fasterDate | 6 | NULL | 9324 | NULL |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------+
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | tbllinks | ALL | NULL | NULL | NULL | NULL | 805031 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+----------------+
The first one uses the index, the second one does not! But I still have the question: why is that, and why does the second one examine 805031 rows?
Edit (answer):
OK, sorry that I even asked; I was just searching with the wrong question.
While searching for "mysql doesn't use index" I found this nice article:
http://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index
While USE INDEX didn't work for me, FORCE INDEX helped.
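A sketch of what that looks like for the slower query (the index name comes from the table definition above; past some LIMIT the optimizer apparently estimates a full scan plus filesort to be cheaper than walking the index, which is why the plan flips):
SELECT * FROM tbllinks FORCE INDEX (fasterDate)
ORDER BY linksLastChecked ASC
LIMIT 9325;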
I have the following MySQL table (table size - around 10K records):
CREATE TABLE `tmp_index_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`m_id` int(11) DEFAULT NULL,
`r_id` int(11) DEFAULT NULL,
`price` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `m_key` (`m_id`),
KEY `r_key` (`r_id`),
KEY `price_key` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=16390 DEFAULT CHARSET=utf8;
As you can see, I have two INTEGER fields (r_id and m_id) and one FLOAT field (price).
For each of these fields I have an index.
Now, when I run a query with a condition on the first integer AND on the second one, everything is fine:
mysql> explain select * from tmp_index_test where m_id=1 and r_id=2;
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| 1 | SIMPLE | tmp_index_test | index_merge | m_key,r_key | r_key,m_key | 5,5 | NULL | 1 | Using intersect(r_key,m_key); Using where |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
It seems like MySQL handles this very well, since Using intersect(r_key,m_key) appears in the Extra field.
I'm not a MySQL expert, but as I understand it, MySQL first computes the intersection of the indexes, and only then collects the resulting rows from the table itself.
HOWEVER, when I run a very similar query, but with the condition on an integer and a float instead of two integers, MySQL refuses to intersect the indexes:
mysql> explain select * from tmp_index_test where m_id=3 and price=100;
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | price_key | 5 | const | 1 | Using where |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
As you can see, MySQL decides to use the price index only.
My first question is why, and how do I fix it?
In addition, I need to run queries with a greater-than sign (>) instead of the equals sign (=) on price. Currently EXPLAIN shows that for such queries, MySQL uses the integer key only.
mysql> explain select * from tmp_index_test where m_id=3 and price > 100;
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | m_key | 5 | const | 2 | Using where |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
I need to somehow make MySQL do the intersection on the indexes first. Does anybody have an idea how?
Thanks a lot in advance!
From the MySQL manual:
ref is used if the join uses only a leftmost prefix of the key or if
the key is not a PRIMARY KEY or UNIQUE index (in other words, if the
join cannot select a single row based on the key value). If the key
that is used matches only a few rows, this is a good join type.
price is not unique or primary, so ref is chosen. I don't believe you can force an intersect.
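Not part of the original answer, but the usual alternative to relying on index_merge here is a single composite index, which also covers the range case (m_id = 3 AND price > 100); a sketch with an illustrative index name:
ALTER TABLE tmp_index_test ADD INDEX idx_m_price (m_id, price);
-- Equality column first, range column last, so both predicates can be satisfied from one index.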
I have the following query:
explain select * from users, dls where dls.user_id=users.id and users.status = 'accepted' and users.acc = 0 order by users.user_name desc limit 18416, 16
which results in the following EXPLAIN:
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
| 1 | SIMPLE | dls | ALL | PRIMARY,user_id | NULL | NULL | NULL | 19910 | Using temporary; Using filesort |
| 1 | SIMPLE | users | ref | PRIMARY,id_user_name | id_user_name | 4 | dls.user_id | 1 | Using where |
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
2 rows in set (0.00 sec)
This query is really, really slow and I cannot figure out how to fix it. I tried all kinds of indexes from reading articles on how to optimize order by / limit queries, but the result remains the same. Can anyone please help?
Edit: schemas:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_name` varchar(100) character set utf8 NOT NULL,
`status` enum('accepted','rejected') character set utf8 NOT NULL,
`acc` varchar(6) character set utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `user_name` (`user_name`),
KEY `id_user_name` (`id`,`user_name`)
)
CREATE TABLE `dls` (
`user_id` int(10) unsigned NOT NULL,
`category_id` bigint(20) NOT NULL,
`download_url` varchar(255) character set utf8 NOT NULL,
PRIMARY KEY (`user_id`,`category_id`),
KEY `user_id` (`user_id`)
)
Output for the query by Scrummeister:
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
| 1 | SIMPLE | u | ALL | PRIMARY,id_user_name | NULL | NULL | NULL | 10838 | Using where; Using filesort |
| 1 | SIMPLE | dls | ref | PRIMARY,user_id | user_id | 4 | u.id | 2 | |
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
MySQL is known to have issues with a LIMIT using a large offset.
The STRAIGHT_JOIN keyword tells MySQL to first scan the users table and then, for every user, look up the matching rows in the dls table.
SELECT STRAIGHT_JOIN *
FROM users u JOIN dls ON dls.user_id = u.id
WHERE u.status = 'accepted' AND u.acc = 0
ORDER BY u.user_name DESC
LIMIT 18416, 16
Using STRAIGHT_JOIN is not recommended unless there is a need for it. In this specific case I believe it might work, since it can use the user_name index for sorting.
Other options you have:
Increase the size of sort_buffer_size
Increase the size of read_rnd_buffer_size (with caution!)
Do the paging on the users table only, regardless of how many dls rows each user has, and only then apply the JOIN.
Handle the paging in your code. Assuming a user goes from page to page without skipping too many, store the first and last user names for each page. When the user clicks the next page, add WHERE user_name < '{LastPageLastUsername}' (for the descending sort) and keep LIMIT 0,16; the offset then stays at zero instead of growing with every page. See the sketch after this list.
For other optimizations, read the MySQL documentation on ORDER BY Optimization and LIMIT Query Optimization.
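A sketch of that keyset-style paging (not from the original answer; '{LastPageLastUsername}' is a placeholder the application fills in from the previous page):
SELECT u.*, dls.*
FROM users u
JOIN dls ON dls.user_id = u.id
WHERE u.status = 'accepted'
  AND u.acc = 0
  AND u.user_name < '{LastPageLastUsername}'
ORDER BY u.user_name DESC
LIMIT 16;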
Try adding an index to the users table with the following columns:
status, acc, user_name
or
acc, status, user_name
whichever is faster.
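A sketch of the first option (the index name is illustrative):
ALTER TABLE users ADD INDEX idx_status_acc_name (status, acc, user_name);
One caveat: acc is a varchar(6) in the schema above, so comparing it with acc = 0 forces a string-to-number conversion on every row; comparing against '0' (or making acc numeric) keeps such an index usable.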