MySQL query optimisation - SUM() function

I've created the following table:
CREATE TABLE `clicks_summ` (
`dt` INT(7) UNSIGNED NOT NULL,
`banner` SMALLINT(6) UNSIGNED NOT NULL,
`client` SMALLINT(6) UNSIGNED NOT NULL,
`channel` SMALLINT(6) UNSIGNED NOT NULL,
`cnt` INT(11) UNSIGNED NOT NULL,
`lpid` INT(11) NULL DEFAULT NULL,
UNIQUE INDEX `dt` (`dt`, `banner`, `client`, `channel`, `lpid`),
INDEX `banner` (`banner`),
INDEX `channel` (`channel`),
INDEX `client` (`client`),
INDEX `lpid` (`lpid`),
INDEX `cnt` (`cnt`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB;
and I am using the following query to fetch rows/records from this table:
select client, sum(cnt) cnt
from clicks_summ cs
group by client;
and it's awful! It takes about a second to perform this query. EXPLAIN shows me:
[EXPLAIN screenshot omitted]
So, the question is: how can I speed up this query? I've tried indexing this table on different fields without any reasonable success. There are now 331,036 rows in this table, which I guess is not that big.

Try creating INDEX client_cnt (client, cnt). Another way to make the query faster is to upgrade your hardware :)
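A minimal sketch of that suggestion (the index name client_cnt is taken from the comment above). With (client, cnt) indexed, the GROUP BY can be answered from the index alone, without touching the table rows:
ALTER TABLE clicks_summ ADD INDEX client_cnt (client, cnt);
-- The original query, unchanged, should now show "Using index" in EXPLAIN:
SELECT client, SUM(cnt) AS cnt
FROM clicks_summ cs
GROUP BY client;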

One cool thing: if you always have the same 5 columns in your WHERE clause, then a single composite index on those 5 columns will outperform 5 individual indexes ;)

Related

Why is the index on my MySQL table not being used?

I have a table in MySQL (InnoDB engine) with 100M records. Structure is as below:
CREATE TABLE LEDGER_AGR (
`ID` BIGINT(20) NOT NULL AUTO_INCREMENT,
`Booking` int(11) NOT NULL,
`LType` varchar(5) NOT NULL,
`PType` varchar(5) NOT NULL,
`FType` varchar(5) DEFAULT NULL,
`TType` varchar(10) DEFAULT NULL,
`AccountCode` varchar(55) DEFAULT NULL,
`AgAccountId` int(11) DEFAULT '0',
`TransactionDate` date NOT NULL,
`DebitAmt` decimal(37,6) DEFAULT '0.000000',
`CreditAmt` decimal(37,6) DEFAULT '0.000000',
PRIMARY KEY (`ID`),
KEY `TRANSACTION_DATE` (`TransactionDate`)
)
ENGINE=InnoDB;
When I am doing:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate >= '2000-08-01'
AND TransactionDate <= '2017-08-01'
It is not using the TRANSACTION_DATE index. But when I am doing:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate = '2000-08-01'
it is using the TRANSACTION_DATE index. Could someone please explain?
Range query #1 has poor selectivity; equality query #2 has excellent selectivity. The optimizer is very likely to choose the index access path when the result rows will be under about 1% of the total rows in the table. It is unlikely to prefer the index when the result will be a large fraction of the total, for example a half or a quarter of all rows.
A range of '2000-08-01' thru '2000-08-03' would likely exploit the index.
cf: mysql not using index?
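To see the cutover for yourself, compare the two plans below (a sketch; FORCE INDEX is shown only to illustrate overriding the optimizer, not as a recommendation):
-- A narrow, selective range: the optimizer should pick TRANSACTION_DATE.
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate BETWEEN '2000-08-01' AND '2000-08-03';
-- Forcing the index for the wide range (usually slower, because reading
-- most of the table through the index costs more than a full scan):
EXPLAIN
SELECT * FROM LEDGER_AGR FORCE INDEX (TRANSACTION_DATE)
WHERE TransactionDate >= '2000-08-01'
AND TransactionDate <= '2017-08-01';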

Strange index behavior in MySQL

I usually pride myself on being a database pro, but I can't really wrap my head around this behavior. I hope someone can explain how this is working.
I have two MySQL tables, orders:
CREATE TABLE `orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL,
`total` decimal(7,2) NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`voucher_code` varchar(127) DEFAULT NULL,
`voucher_id` int(11) unsigned DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
`billing_address_id` int(11) unsigned NOT NULL,
`shipping_address_id` int(11) unsigned NOT NULL,
`reference_id` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reference_id` (`reference_id`),
KEY `address_id` (`billing_address_id`)
) ENGINE=InnoDB AUTO_INCREMENT=168067 DEFAULT CHARSET=latin1;
and addresses:
CREATE TABLE `addresses` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` tinyint(4) DEFAULT NULL,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
`street` varchar(255) NOT NULL,
`street2` varchar(255) DEFAULT NULL,
`company_name` varchar(255) DEFAULT NULL,
`city` varchar(45) NOT NULL,
`postcode` varchar(45) DEFAULT NULL,
`region` varchar(45) DEFAULT NULL,
`country` varchar(45) NOT NULL,
`phone` varchar(45) DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_addresses_users1_idx` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=95277 DEFAULT CHARSET=latin1;
Now, as you can see, I have created an index in the orders table on billing_address_id, called address_id, which should match the id in the addresses table.
This is the query I am trying to run:
SELECT
o.id, a.first_name, a.last_name, o.total, o.date_created
FROM
orders o USE INDEX FOR JOIN (PRIMARY) JOIN
addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
If I run the query without any index specification, it will pick up and use the address_id index, which I would expect to be the fastest way to match the two tables.
Strangely enough, with the address_id index the query runs in 2 seconds.
If I use the normal PRIMARY index, which works on the order id, it takes 0.000 seconds.
This is bugging me out. I thought I was supposed to create indexes to expedite the joining process between tables.
If I run EXPLAIN on the two queries I get:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE a ALL PRIMARY 95234 100.00 Using temporary; Using filesort
1 SIMPLE o ref address_id address_id 4 my_basket.a.id 1 100.00
With the index:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o USE INDEX FOR
JOIN (PRIMARY)
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE o index PRIMARY 4 50 332632.00
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 my_basket.o.billing_address_id 1 100.00
Thank you for finding the time to answer this question.
For ORDER BY ... LIMIT queries it will often be beneficial to use a query execution plan that avoids sorting. This is not necessarily because the sorting is expensive, but because it makes it possible to stop the query execution once the number of requested rows (here 50) has been found.
In your case, if one starts with table a, the full join result will have to be generated before selecting the "top" 50 rows. If you start with scanning table o using the PRIMARY index, the join result will be sorted on o.id, and the join execution can stop once 50 rows have been found.
The cost model used to select between the two approaches has been improved since MySQL 5.6. I suggest you try out MySQL 5.7 to see if its optimizer is now able to select the better plan.
I'm surprised that the two queries even compile -- ORDER BY id is ambiguous, since both tables have an id column.
When doing a JOIN, always qualify all columns.
Meanwhile, remove the USE INDEX.
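Putting both suggestions together, a sketch of the corrected query (assuming o.id is the intended sort column):
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY o.id DESC
LIMIT 0, 50;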

Optimizing aggregation on MySQL Table with 850 million rows

I have a query that I'm using to summarize via aggregations.
The table is called 'connections' and has about 843 million rows.
CREATE TABLE `connections` (
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
KEY `app_id` (`app_id`),
KEY `time_started_dt` (`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
When I try to run a query, such as the one below, it takes over 10 hours and I end up killing it. Does anyone see any mistakes I'm making, or have any suggestions as to how I could optimize the query?
SELECT
app_id,
MAX(time_started_dt),
MIN(time_started_dt),
COUNT(*)
FROM
connections
GROUP BY
app_id
I suggest you create a composite index on (app_id, time_started_dt):
ALTER TABLE connections ADD INDEX (app_id, time_started_dt);
To get that query to perform, you really need a suitable covering index, with app_id as the leading column, e.g.
CREATE INDEX `connections_IX1` ON `connections` (`app_id`,`time_started_dt`);
NOTE: creating the index may take hours, and the operation will prevent insert/update/delete to the table while it is running.
An EXPLAIN will show the proposed execution plan for your query. With the covering index in place, you'll see "Using index" in the plan. (A "covering index" is an index that can be used by MySQL to satisfy a query without having to access the underlying table. That is, the query can be satisfied entirely from the index.)
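A quick way to confirm this, assuming the covering index above is in place:
EXPLAIN
SELECT app_id, MAX(time_started_dt), MIN(time_started_dt), COUNT(*)
FROM connections
GROUP BY app_id;
-- With the (app_id, time_started_dt) index, the Extra column should read "Using index".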
With the large number of rows in this table, you may also want to consider partitioning.
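If you explore partitioning, here is a hedged sketch of RANGE partitioning by year (the partition names are mine; note that MySQL requires every PRIMARY/UNIQUE key to include the partitioning column, which is not an issue for this table as defined, since it has neither):
ALTER TABLE connections
PARTITION BY RANGE (YEAR(time_started_dt)) (
PARTITION p2015 VALUES LESS THAN (2016),
PARTITION p2016 VALUES LESS THAN (2017),
PARTITION pmax VALUES LESS THAN MAXVALUE
);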
I have tried your query on randomly generated data (around 1 million rows). Adding a PRIMARY KEY improved the performance of your query by 10%.
As already suggested by other people, a composite index should be added to the table. The index on time_started_dt alone is useless for this query.
CREATE TABLE `connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `composite_idx` (`app_id`,`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

MySQL join not using an index for the BETWEEN operator

So basically I have three tables:
CREATE TABLE `cdIPAddressToLocation` (
`IPADDR_FROM` int(10) unsigned NOT NULL COMMENT 'Low end of the IP Address block',
`IPADDR_TO` int(10) unsigned NOT NULL COMMENT 'High end of the IP Address block',
`IPLOCID` int(10) unsigned NOT NULL COMMENT 'The Location ID for the IP Address range',
PRIMARY KEY (`IPADDR_TO`),
KEY `Index_2` USING BTREE (`IPLOCID`),
KEY `Index_3` USING BTREE (`IPADDR_FROM`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `cdIPLocation` (
`IPLOCID` int(10) unsigned NOT NULL default '0',
`Country` varchar(4) default NULL,
`Region` int(10) unsigned default NULL,
`City` varchar(90) default NULL,
`PostalCode` varchar(10) default NULL,
`Latitude` float NOT NULL,
`Longitude` float NOT NULL,
`MetroCode` varchar(4) default NULL,
`AreaCode` varchar(4) default NULL,
`State` varchar(45) default NULL,
`Continent` varchar(10) default NULL,
PRIMARY KEY (`IPLOCID`)
) ENGINE=MyISAM AUTO_INCREMENT=218611 DEFAULT CHARSET=latin1;
and
CREATE TABLE `data` (
`IP` varchar(50),
`SCORE` int
);
My task is to join these three tables and find the location data for given IP address.
My query is as follows:
select
t.ip,
l.Country,
l.State,
l.City,
l.PostalCode,
l.Latitude,
l.Longitude,
t.score
from
(select
ip, inet_aton(ip) ipv, score
from
data
order by score desc
limit 5) t
join
cdIPAddressToLocation a ON t.ipv between a.IPADDR_FROM and a.IPADDR_TO
join
cdIPLocation l ON l.IPLOCID = a.IPLOCID
While this query works, it's very, very slow: it took about 100 seconds to return the result on my dev box.
I'm using MySQL 5.1; cdIPAddressToLocation has 5.9 million rows and cdIPLocation has about 0.3 million rows.
When I checked the execution plan, I found it's not using any index on the cdIPAddressToLocation table, so for each row in the data table it does a full table scan against cdIPAddressToLocation.
This is very weird to me. Since there are already two indexes on cdIPAddressToLocation, on the columns IPADDR_FROM and IPADDR_TO, the execution plan should exploit them to improve performance, so why didn't it use them?
Or is there something wrong with my query?
Please help, thanks a lot.
Have you tried using a composite index on the columns cdIPAddressToLocation.IPADDR_FROM and cdIPAddressToLocation.IPADDR_TO?
Multiple-Column Indexes
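A sketch of that suggestion (the index name is mine):
ALTER TABLE cdIPAddressToLocation
ADD INDEX ip_range (IPADDR_FROM, IPADDR_TO);
Note that even with this index, a predicate like t.ipv BETWEEN IPADDR_FROM AND IPADDR_TO can only use the leading column to bound the range scan, so the optimizer may still judge a full scan cheaper; the composite index simply gives it a better option to weigh.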

optimize query (2 simple left joins)

SELECT fcat.id,fcat.title,fcat.description,
count(DISTINCT ftopic.id) as number_topics,
count(DISTINCT fpost.id) as number_posts FROM fcat
LEFT JOIN ftopic ON fcat.id=ftopic.cat_id
LEFT JOIN fpost ON ftopic.id=fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
Indexes exist on ftopic.cat_id, fpost.topic_id, and fcat.ord.
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE fcat ALL PRIMARY NULL NULL NULL 11 Using temporary; Using filesort
1 SIMPLE ftopic ref PRIMARY,cat_id_2 cat_id_2 4 bloki.fcat.id 72
1 SIMPLE fpost ref topic_id_2 topic_id_2 4 bloki.ftopic.id 245
fcat - 11 rows,
ftopic - 1106 rows,
fpost - 363000 rows
Query takes 4.2 sec
TABLES:
CREATE TABLE IF NOT EXISTS `fcat` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(250) collate utf8_unicode_ci NOT NULL,
`description` varchar(250) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`ord` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ord` (`ord`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=12 ;
CREATE TABLE IF NOT EXISTS `ftopic` (
`id` int(11) NOT NULL auto_increment,
`cat_id` int(11) NOT NULL,
`title` varchar(100) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP,
`lastname` varchar(200) collate utf8_unicode_ci NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`closed` tinyint(4) NOT NULL default '0',
`views` int(11) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `cat_id_2` (`cat_id`,`updated`,`visible`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1116 ;
CREATE TABLE IF NOT EXISTS `fpost` (
`id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
`pet_id` int(11) NOT NULL,
`content` text collate utf8_unicode_ci NOT NULL,
`imageName` varchar(300) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`reply_id` int(11) NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`md5` varchar(100) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `topic_id_2` (`topic_id`,`created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=390971 ;
Thanks,
hamlet
You need to create a key with both fcat.id and fcat.ord.
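A sketch of that suggestion (the index name is mine):
ALTER TABLE fcat ADD INDEX id_ord (id, ord);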
Bold rewrite
This code is not functionally identical, but...
Because you want to count distinct ftopic.id and fpost.id values, I'm going to be bold and suggest two INNER JOINs instead of LEFT JOINs.
Then, because the two ids are auto-incrementing, they will no longer repeat, so you can drop the DISTINCT.
SELECT
fcat.id
, fcat.title
, fcat.description
, count(ftopic.id) as number_topics
, count(fpost.id) as number_posts
FROM fcat
INNER JOIN ftopic ON fcat.id = ftopic.cat_id
INNER JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
It depends on your data whether this is what you are looking for, but I'm guessing it will be faster.
All your indexes seem to be in order, though.
MySQL does not use indexes for small sample sizes!
Note that the EXPLAIN shows that MySQL only has 11 rows to consider for fcat. This is not enough for MySQL to really start worrying about indexes, so it doesn't.
Going to the index for small row counts actually slows things down.
MySQL is trying to speed things up, so it chooses not to use the index; this confuses a lot of people because we are trained so hard to use indexes. Small sample sizes don't give good EXPLAINs!
Increase the size of the test data so MySQL has more rows to consider, and you should start seeing the index being used.
Common misconceptions about FORCE INDEX
FORCE INDEX does not force MySQL to use an index as such.
It hints that MySQL should use a different index from the one it might naturally use, and it pushes MySQL into using an index by assigning a very high cost to a table scan.
(In your case MySQL is not using a table scan, so FORCE INDEX has no effect.)
MySQL (like most other DBMSes on the planet) has a very strong urge to use indexes, so if it doesn't use any, that's because using no index at all is faster.
How does MySQL know which index to use
One of the parameters the query optimizer uses is the stored cardinality of the indexes.
Over time these values change... But studying the table takes time, so MySQL doesn't do that unless you tell it to.
Another parameter that affects index selection is the predicted disk-seek-times that MySQL expects to encounter when performing the query.
Tips to improve index usage
ANALYZE TABLE instructs MySQL to re-evaluate the indexes and update its key distribution (cardinality). (Consider running it daily/weekly in a cron job.)
SHOW INDEX FROM table displays the key distribution.
MyISAM tables and indexes fragment over time. Use OPTIMIZE TABLE to defragment the tables and recreate the indexes.
FORCE/USE/IGNORE INDEX limits the options MySQL's query optimizer has to perform your query. Only consider it on complex queries.
Time the effect of your meddling with indexes on a regular basis. A forced index that speeds up your query today might slow it down tomorrow because the underlying data has changed.
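For reference, the statements mentioned above look like this in practice (table names borrowed from the question):
ANALYZE TABLE fcat, ftopic, fpost; -- refresh index cardinality statistics
SHOW INDEX FROM fpost; -- inspect the stored key distribution
OPTIMIZE TABLE fpost; -- defragment a MyISAM table and rebuild its indexes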