I've only just begun to run EXPLAIN on my queries, and I see that the type is ALL and the query is using filesort.
I'm not sure how to optimise even the simplest of queries. Could anyone provide guidance on the following one, which just retrieves users ordered by first name primarily and second name secondarily?
SELECT UserID, TRIM(FName) AS FName, TRIM(SName) as SName, pic
FROM users WHERE Blocked <> 1
ORDER BY FName, SName
LIMIT ?, 10
Table is created as follows:
CREATE TABLE IF NOT EXISTS `users` (
`UserID` int(11) NOT NULL,
`FName` varchar(25) NOT NULL,
`SName` varchar(25) NOT NULL,
`Pword` varchar(50) NOT NULL,
`Longitude` double NOT NULL,
`Latitude` double NOT NULL,
`DateJoined` bigint(20) NOT NULL,
`Email` varchar(254) NOT NULL,
`NotificationID` varchar(256) NOT NULL,
`Pic` varchar(500) DEFAULT NULL,
`Radius` int(11) NOT NULL,
`ads` tinyint(1) NOT NULL,
`Type` varchar(5) NOT NULL,
`Blocked` tinyint(4) NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=1469 DEFAULT CHARSET=latin1;
Explain gives the following:
id : 1
select_type : SIMPLE
table : users
type : ALL
possible_keys : NULL
key : NULL
key_len : NULL
ref : NULL
rows : 1141
Extra : Using where; Using filesort
Add an index on (Blocked, FName, SName),
and change the WHERE clause to Blocked = 0, if that is possible.
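A sketch of both suggestions together (the index name idx_blocked_name is just an assumed label):
ALTER TABLE users ADD INDEX idx_blocked_name (Blocked, FName, SName);

SELECT UserID, TRIM(FName) AS FName, TRIM(SName) AS SName, Pic
FROM users
WHERE Blocked = 0
ORDER BY FName, SName
LIMIT ?, 10;
With the equality on Blocked, the rows can be read in index order, so both the ALL scan and the filesort should disappear from EXPLAIN.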
If you want to optimize this query, you could create an index on the field in the WHERE condition:
CREATE INDEX id_users_blocked ON users (Blocked);
How much this helps depends on the number of users with Blocked <> 1. If there are only a few, don't expect a particular improvement, but in EXPLAIN you should no longer see ALL.
You could also add FName and SName to the index, but the use of TRIM on those columns and the need for the Pic field keep such an index from performing well: columns wrapped in a function like TRIM are normally not taken from the index, and a wide column like Pic is not a good fit for an index, so access to the table row is still mandatory.
SELECT UserID, TRIM(FName) AS FName, TRIM(SName) as SName, pic
FROM users WHERE Blocked <> 1
ORDER BY FName, SName
LIMIT ?, 10
Let's analyze your query. You've used a WHERE clause to select the rows where the column Blocked has a value <> 1. How much this clause can be improved depends on the distribution of values in the Blocked column. If only a small part of the data satisfies
blocked <> 1
then an index on the Blocked column will give you a performance increase. Otherwise, the index will not help.
You have also applied the TRIM function to every record returned from your table. If you remove it, you will gain some performance.
Of course, the sort will also affect query performance.
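For example, one way to drop TRIM from the query is a one-time cleanup of the stored values (a sketch only; take a backup first):
UPDATE users
SET FName = TRIM(FName),
    SName = TRIM(SName);
After that, the query can select FName and SName directly.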
I have a stored procedure that I've used to 'de-identify' client information when I want to use it in a test environment. I am replacing actual names and addresses with random values. I have database tables in a database called dict (for dictionary) for female names, male names, last names, and addresses.
Each of these has a field called f_row_id that is a sequential number from 1 to x, one for each record in the table.
We recently upgraded to MySQL 8 and the stored procedure quit working. I ended up with NULL for every field where I tried filling in a random value out of the other table. In trying to find what will now work, I'm unable to get the following query to work as I expect:
SELECT
f_enroll_id,
(SELECT f_name FROM dict.dummy_female_first_name fn WHERE fn.f_row_id = (FLOOR(RAND() * 850) + 1) LIMIT 1)
FROM
t_enroll
My data table (the one I eventually want to contain random names) is called t_enroll. There is an ID field in it (f_enroll_id), and I want to get a list of each ID and a random first name for each record in that table.
There are 850 records in the table of random first names (dummy_female_first_name) (in my stored procedure this is a session variable that I compute at the start of the procedure).
When I first tried running this I got an error that my sub-query returned more than one value. I don't understand why it would do that since (FLOOR(RAND() * 850) + 1) should return a single integer. So I added the LIMIT 1. But when I run this, about half of the returned rows have NULL for the first name.
I have verified that all the rows in my first-name table have a row ID, that the row ID is unique, and that there are no gaps in the numbers.
What do you think is causing this?
Thanks in advance!
Here is the schema for the table that I'm updating:
CREATE TABLE `t_enroll` (
`f_enroll_id` int(15) NOT NULL AUTO_INCREMENT,
`f_status` int(2) DEFAULT NULL,
`f_date_enrolled` date NOT NULL DEFAULT '0000-00-00',
`f_first_name` varchar(20) DEFAULT NULL,
`f_mi` char(1) DEFAULT NULL,
`f_last_name` varchar(20) NOT NULL DEFAULT '',
`f_maiden_name` varchar(20) DEFAULT NULL,
`f_dob` date NOT NULL DEFAULT '0000-00-00',
`f_date_fee_received` date NOT NULL DEFAULT '0000-00-00',
`f_gender` int(11) NOT NULL DEFAULT '2',
`f_address_1` varchar(40) DEFAULT NULL,
`f_address_2` varchar(20) DEFAULT NULL,
`f_quadrant` char(2) DEFAULT NULL,
`f_city` varchar(25) DEFAULT NULL,
`f_state` char(2) NOT NULL DEFAULT '',
`f_county` varchar(3) NOT NULL,
`f_zip_code` varchar(10) DEFAULT NULL,
PRIMARY KEY (`f_enroll_id`),
KEY `f_date_enrolled` (`f_date_enrolled`),
KEY `f_last_name` (`f_last_name`),
KEY `f_first_name` (`f_first_name`),
KEY `f_dob` (`f_dob`),
KEY `f_gender` (`f_gender`)
) ENGINE=InnoDB AUTO_INCREMENT=532 DEFAULT CHARSET=latin1 COMMENT='InnoDB free: 15360 kB';
Here is the schema for the dictionary table where I pull names from:
CREATE TABLE `dummy_female_first_name` (
`f_row_id` int(11) NOT NULL,
`f_name` varchar(25) NOT NULL,
PRIMARY KEY (`f_row_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As I mentioned in my comment, I have found an alternate approach using the ORDER BY RAND() LIMIT 1 variation. But I am still curious about what is going on that caused my original method to fail. This is something that changed in the more recent MySQL version, because it used to work.
Thanks again.
It is a much more expensive approach, but you can use:
SELECT f_enroll_id,
(SELECT f_name FROM dict.dummy_female_first_name fn ORDER BY rand() LIMIT 1)
FROM t_enroll;
You can make this more efficient using:
SELECT f_enroll_id,
(SELECT f_name
FROM dict.dummy_female_first_name fn
WHERE rand() < 0.01
ORDER BY rand() LIMIT 1
)
FROM t_enroll;
The WHERE clause means that only about 8 or 9 rows will filter through, so the sorting will be much faster.
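A likely explanation for the original NULLs (and the earlier "returns more than one row" error) is that RAND() in the subquery's WHERE clause is re-evaluated for every dictionary row that MySQL examines, so the comparison can match zero rows or several. A hedged sketch of an alternative is to compute the random id once per enrollment row in a derived table and join on it; the NO_MERGE hint (or SET optimizer_switch = 'derived_merge=off') is there to keep the derived table materialized, since merging it would put RAND() back into the join condition:
SELECT /*+ NO_MERGE(t) */
       t.f_enroll_id,
       fn.f_name
FROM ( SELECT f_enroll_id,
              FLOOR(RAND() * 850) + 1 AS rnd_id   -- one random id per enrollment row
       FROM t_enroll
     ) AS t
JOIN dict.dummy_female_first_name fn
  ON fn.f_row_id = t.rnd_id;
The 850 is hard-coded here only for illustration; in the stored procedure it would stay a session variable.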
I need to fill the location field in the users table with a country name from the geoip table, based on the user's IP.
Here is the CREATE code for the tables.
CREATE TABLE `geoip` (
`IP_FROM` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
`IP_TO` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
`COUNTRY_NAME` VARCHAR(50) NOT NULL DEFAULT '',
PRIMARY KEY (`IP_FROM`, `IP_TO`)
)
ENGINE=InnoDB;
CREATE TABLE `users` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`login` VARCHAR(25) NOT NULL DEFAULT '',
`password` VARCHAR(64) NOT NULL DEFAULT '',
`ip` VARCHAR(128) NULL DEFAULT '',
`location` VARCHAR(128) NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE INDEX `login` (`login`),
INDEX `ip` (`ip`(10))
)
ENGINE=InnoDB
ROW_FORMAT=DYNAMIC;
The update query I try to run is:
UPDATE users u
SET u.location =
(SELECT COUNTRY_NAME FROM geoip WHERE INET_ATON(u.ip) BETWEEN IP_FROM AND IP_TO)
The problem is that this query refuses to use PRIMARY index on the geoip table, though it would speed things up a lot. The EXPLAIN gives me:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY u index NULL PRIMARY 4 NULL 1254395
2 DEPENDENT SUBQUERY geoip ALL PRIMARY NULL NULL NULL 62271 Using where
I've ended up converting the geoip table to the MEMORY engine for this query only, but I'd like to know what the right way to do it would have been.
UPDATE
The DBMS I'm using is MariaDB 10.0.17, if it could make a difference.
Did you try to force the index, like this?
UPDATE users u
SET u.location =
(SELECT COUNTRY_NAME FROM geoip FORCE INDEX (PRIMARY)
WHERE INET_ATON(u.ip) BETWEEN IP_FROM AND IP_TO)
Also, since ip can be NULL, it is probably messing with index optimization.
The IP ranges are non-overlapping, correct? You are not getting any IPv6 addresses? (The world ran out of IPv4 a couple of years ago.)
No, the index won't be used, or at least won't perform as well as you would like. So, I have devised a scheme to solve that. However, it requires reformulating the schema and building a stored routine. See my IP-ranges blog; it has links to code for IPv4 and for IPv6. The routine will usually touch only one row in the table rather than scanning half of it.
Edit
MySQL does not know that there is only one range (from/to) that should match. So, it scans far too much. The difference between the two encodings of IP (INT UNSIGNED vs VARCHAR) makes it difficult to use a JOIN (instead of a subquery). Alas a JOIN would not be any better because it does not understand that there is exactly one match. Give this a try:
UPDATE users u
SET u.location =
( SELECT COUNTRY_NAME
FROM geoip
WHERE INET_ATON(u.ip) BETWEEN IP_FROM AND IP_TO
LIMIT 1 -- added
)
If that fails to significantly improve the speed, then change from VARCHAR to INT UNSIGNED in users and try again (without INET_ATON).
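A hedged sketch of that change (ip_num is a hypothetical column name, and INET_ATON returns NULL for anything that is not a valid IPv4 string):
ALTER TABLE users ADD COLUMN ip_num INT UNSIGNED NULL;
UPDATE users SET ip_num = INET_ATON(ip);

UPDATE users u
SET u.location =
    ( SELECT COUNTRY_NAME
        FROM geoip
       WHERE u.ip_num BETWEEN IP_FROM AND IP_TO
       LIMIT 1
    );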
I am trying to run a grouping query on a large table (more than 8 million rows). However, I can reduce the need to group all the data by restricting on date. I have a view that captures the dates I require, and this limits the query, but it's not much better.
Finally, I need to join to another table to pick up a field.
I am showing the query, the CREATE for the main table, and the EXPLAIN output below.
Main Query:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
INNER JOIN summary_max
ON pgi_raw_data.wsp_channel = summary_max.wsp_channel
AND pgi_raw_data.dated > summary_max.race_date
INNER JOIN pgi_accounts
ON pgi_raw_data.account = pgi_accounts.account
GROUP BY pgi_raw_data.event_id
ORDER BY NULL
The create table:
CREATE TABLE `pgi_raw_data` (
`event_id` char(25) NOT NULL DEFAULT '',
`wsp_channel` varchar(5) NOT NULL,
`dated` date NOT NULL,
`time` time DEFAULT NULL,
`program` varchar(5) NOT NULL,
`track` varchar(25) NOT NULL,
`raceno` tinyint(2) NOT NULL,
`detail` varchar(30) DEFAULT NULL,
`ticket` varchar(20) NOT NULL DEFAULT '',
`breed` varchar(12) NOT NULL,
`pool` varchar(10) NOT NULL,
`gross` decimal(11,2) NOT NULL,
`refunds` decimal(11,2) NOT NULL,
`handle` decimal(11,2) NOT NULL,
`payout` decimal(11,4) NOT NULL,
`rebate` decimal(11,4) NOT NULL,
`profit` decimal(11,4) NOT NULL,
`account` mediumint(10) NOT NULL,
PRIMARY KEY (`event_id`,`ticket`),
KEY `idx_account` (`account`),
KEY `idx_wspchannel` (`wsp_channel`,`dated`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This is my view for summary_max:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW
`summary_max` AS select `pgi_summary_tbl`.`wsp_channel` AS
`wsp_channel`,max(`pgi_summary_tbl`.`race_date`) AS `race_date`
from `pgi_summary_tbl` group by `pgi_summary_tbl`.`wsp_channel`;
And here is the EXPLAIN output:
1 PRIMARY <derived2> ALL 6 Using temporary
1 PRIMARY pgi_raw_data ref idx_account,idx_wspchannel idx_wspchannel 7 summary_max.wsp_channel 470690 Using where
1 PRIMARY pgi_accounts ref PRIMARY PRIMARY 3 gf3data_momutech.pgi_raw_data.account 29 Using index
2 DERIVED pgi_summary_tbl ALL 42282 Using temporary; Using filesort
Any help on indexing would help.
At a minimum you need indexes on these fields:
pgi_raw_data.wsp_channel,
pgi_raw_data.dated,
pgi_raw_data.account
pgi_raw_data.event_id,
summary_max.wsp_channel,
summary_max.race_date,
pgi_accounts.account
The general rule (though not an absolute one) is that anything you are sorting, grouping, filtering, or joining on should have an index.
Also: pgi_summary_tbl.wsp_channel
Also, why the order by null?
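As a hedged sketch of the one index that is clearly missing: summary_max is a view, so the index has to be created on its base table pgi_summary_tbl; most of the other columns listed above are already leading columns of the keys shown in the CREATE TABLE and the EXPLAIN output.
ALTER TABLE pgi_summary_tbl
    ADD INDEX idx_channel_date (wsp_channel, race_date);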
The first thing is to be sure that you have indexes on pgi_summary_tbl(wsp_channel, race_date) and pgi_accounts(account). For this query, you don't need indexes on these columns in the raw data.
MySQL has a tendency to use indexes even when they are not the most efficient path. I would start by looking at the performance of the "full" query, without the joins:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
-- pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
GROUP BY pgi_raw_data.event_id
If this has better performance, you may have a situation where the indexes are working against you. The specific problem is called "thrashing". It occurs when a table is too big to fit into memory. Often, the fastest way to deal with such a table is just to read the whole thing. Accessing the table through an index can result in an extra I/O operation for most of the rows.
If this works, then do the joins after the aggregate. Also, consider getting more memory, so the whole table will fit into memory.
Second, if you have to deal with this type of data, then partitioning the table by date may prove to be a very useful option. This will allow you to significantly reduce the overhead of reading the large table. You do have to be sure that the summary table can be read the same way.
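A sketch of the "join after the aggregate" idea, keeping the original GROUP BY event_id semantics (which relies on ONLY_FULL_GROUP_BY being disabled, as the original query already does); the summary_max date filter could be added back inside the derived table:
SELECT agg.wsp_channel,
       'IOM' AS wsp,
       agg.dated,
       a.`master`,
       agg.event_id,
       agg.breed,
       agg.handle, agg.payout, agg.rebate, agg.profit
FROM ( SELECT event_id, wsp_channel, dated, breed, account,
              SUM(handle) AS handle,
              SUM(payout) AS payout,
              SUM(rebate) AS rebate,
              SUM(profit) AS profit
       FROM pgi_raw_data
       GROUP BY event_id
     ) AS agg
INNER JOIN pgi_accounts a ON a.account = agg.account;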
I have a simple MySQL query, but when I have a lot of records (currently around 103,000) the performance is really slow, and EXPLAIN says it is using filesort. I'm not sure if this is why it is slow. Does anyone have any suggestions to speed it up, or to stop it using filesort?
The MySQL query:
SELECT *
FROM adverts
WHERE (price >= 0)
AND (status = 1)
AND (approved = 1)
ORDER BY date_updated DESC
LIMIT 19990, 10
The EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE adverts range price price 4 NULL 103854 Using where; Using filesort
Here is the adverts table and indexes:
CREATE TABLE `adverts` (
`advert_id` int(10) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL,
`type_id` tinyint(1) NOT NULL,
`breed_id` int(10) NOT NULL,
`advert_type` tinyint(1) NOT NULL,
`headline` varchar(50) NOT NULL,
`description` text NOT NULL,
`price` int(4) NOT NULL,
`postcode` varchar(7) NOT NULL,
`town` varchar(60) NOT NULL,
`county` varchar(60) NOT NULL,
`latitude` float NOT NULL,
`longitude` float NOT NULL,
`telephone1` varchar(15) NOT NULL,
`telephone2` varchar(15) NOT NULL,
`email` varchar(80) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`approved` tinyint(1) NOT NULL DEFAULT '0',
`date_created` datetime NOT NULL,
`date_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`expiry_date` datetime NOT NULL,
PRIMARY KEY (`advert_id`),
KEY `price` (`price`),
KEY `user` (`user_id`),
KEY `type_breed` (`type_id`,`breed_id`),
KEY `headline_keywords` (`headline`),
KEY `date_updated` (`date_updated`),
KEY `type_status_approved` (`advert_type`,`status`,`approved`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The problem is that MySQL only uses one index when executing the query. If you add a new index that uses the 3 fields in your WHERE clause, it will find the rows faster.
ALTER TABLE `adverts` ADD INDEX price_status_approved(`price`, `status`, `approved`);
According to the MySQL documentation ORDER BY Optimization:
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the ORDER BY.
This is what happens in your case.
As the output of EXPLAIN tells us, the optimizer uses the key price to find the rows. However, the ORDER BY is on the field date_updated which does not belong to the key price.
To find the rows faster AND sort the rows faster, you need to add an index that contains all the fields used in the WHERE and in the ORDER BY clauses:
ALTER TABLE `adverts` ADD INDEX status_approved_date_updated(`status`, `approved`, `date_updated`);
The field used for sorting must be in the last position in the index. It is useless to include price in the index, because the condition used in the query will return a range of values.
If EXPLAIN still shows that it is using filesort, you may try forcing MySQL to use an index you choose:
SELECT adverts.*
FROM adverts
FORCE INDEX(status_approved_date_updated)
WHERE price >= 0
AND adverts.status = 1
AND adverts.approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10
It is usually not necessary to force an index, because the MySQL optimizer most often does the correct choice. But sometimes it makes a bad choice, or not the best choice. You will need to run some tests to see if it improves performance or not.
Remove the ticks around the '0'; they may currently prevent the index from being used, but I am not sure.
Nevertheless, it is better style, since price is an int column and not a character column.
SELECT adverts .*
FROM adverts
WHERE (
price >= 0
)
AND (
adverts.status = 1
)
AND (
adverts.approved = 1
)
ORDER BY date_updated DESC
LIMIT 19990 , 10
MySQL does not make use of the key date_updated for the sorting; it just uses the price key, as that is used in the WHERE clause. You could try to use index hints:
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Add something like
USE KEY FOR ORDER BY (date_updated)
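For example (a sketch only; KEY and INDEX are interchangeable in index hints, and the hint only helps if the optimizer agrees the index is usable):
SELECT *
FROM adverts USE INDEX FOR ORDER BY (date_updated)
WHERE price >= 0
  AND status = 1
  AND approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10;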
I have two suggestions. First, remove the quotes around the zero in your where clause. That line should be:
price >= 0
Second, create this index:
CREATE INDEX `helper` ON `adverts`(`status`,`approved`,`price`,`date_updated`);
This should allow MySQL to find the 10 rows specified by your LIMIT clause by using only the index. Filesort itself is not a bad thing... the number of rows that need to be processed is.
Your WHERE condition uses price, status, approved to select, and then date_updated is used to sort.
So you need a single index with those fields; I'd suggest indexing on approved, status, price and date_updated, in this order.
The general rule is placing WHERE equalities first, then ranges (more than, less or equal, between, etc), and sorting fields last. (Note that leaving one field out might make the index less usable, or even unusable, for this purpose).
CREATE INDEX advert_ndx ON adverts (approved, status, price, date_updated);
This way, access to the table data is only needed after LIMIT has worked its magic, and the slower row retrieval happens for only a small number of records.
I'd also remove any unneeded indexes, which would speed up INSERTs and UPDATEs.
Having some real issues with a few queries, this one in particular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query; it takes 3 seconds or so:
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL r_parent NULL NULL NULL 201133 Using temporary; Using filesort
1 SIMPLE g eq_ref PRIMARY,site_id PRIMARY 4 engine_comp.r.r_parent 1 Using where
I am just trying to grab the 15 most viewed games, then grab a single review (doesn't really matter which; I guess highest rated, by r_score, would be ideal) for each game.
Can someone help me figure out why this is so horribly inefficient?
I don't understand the purpose of the GROUP BY g_name in your query, but it makes MySQL perform aggregation over the selected columns from both tables. So please try excluding it and check whether that helps.
Also, the RIGHT JOIN makes the database query tgmp_reviews first, which is not what you want. I suppose a LEFT JOIN is a better choice here. Please try changing the join type.
If neither of the first options helps, you need to redesign your query. As you need to obtain the 15 most viewed games for the site, the query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such a construct forces the database to execute the first query first (sorry for the tautology) and gives you the maximal score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.
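If you would rather have it in a single statement, a hedged sketch is to put the first query in a derived table and join to it (MySQL does not allow LIMIT directly inside an IN (...) subquery):
SELECT r.r_parent, MAX(r.r_score) AS best_score
FROM tgmp_reviews r
INNER JOIN ( SELECT g_id
             FROM tgmp_games
             WHERE site_id = 34
             ORDER BY g_impressions DESC
             LIMIT 15
           ) AS top_games ON top_games.g_id = r.r_parent
GROUP BY r.r_parent;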
Your MyISAM table is small, so you can try converting it to InnoDB to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.
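A sketch of both suggestions:
-- Convert the small MyISAM table to InnoDB (take a backup first):
ALTER TABLE tgmp_games ENGINE=InnoDB;

-- Refresh the index statistics on both tables:
ANALYZE TABLE tgmp_games;
ANALYZE TABLE tgmp_reviews;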