MySQL queries in big data table - mysql

I have a problem with my MySQL database table, which has more than 20 million rows. The table structure is shown below. The main problem is that queries take a really long time to execute (some queries take more than 20 seconds). I use indexes where I can, but many queries use a date range, and with a date range my indexes don't work. Queries also touch almost every column. What do I need to change in my table to improve efficiency?
CREATE TABLE `history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`barcode` varchar(100) DEFAULT NULL,
`bag` varchar(100) DEFAULT NULL,
`action` int(10) unsigned DEFAULT NULL,
`place` int(10) unsigned DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`old_price` decimal(10,2) DEFAULT NULL,
`user` int(11) DEFAULT NULL,
`amount` int(10) DEFAULT NULL,
`rotation` int(10) unsigned DEFAULT NULL,
`discount` decimal(10,2) DEFAULT NULL,
`discount_type` tinyint(2) unsigned DEFAULT NULL,
`original` int(10) unsigned DEFAULT NULL,
`was_in_shop` int(10) unsigned DEFAULT NULL,
`cate` int(10) unsigned DEFAULT NULL COMMENT 'group',
`sub_cate` int(10) unsigned DEFAULT NULL,
`comment` varchar(255) DEFAULT NULL,
`helper` varchar(255) DEFAULT NULL,
`ywd` varchar(255) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `barcode` (`barcode`) USING BTREE,
KEY `action` (`action`) USING BTREE,
KEY `original` (`original`) USING BTREE,
KEY `created_at` (`created_at`) USING BTREE,
KEY `bag` (`bag`) USING BTREE
) ENGINE=InnoDB;
Some of my queries:
select SUM(amount) as amount,
SUM(comment) as price,
cate
from `history`
where ( `action` = '4'
and `place` = '28'
and `created_at` >= '2018-04-01 00:00:00'
and `created_at` <= '2018-04-30 23:59:59'
)
and `history`.`deleted_at` is null
group by `cate`;
select cate,
SUM(amount) AS kiekis,
SUM(IF(discount>0,(price*amount)-discount,(price*amount))) AS suma,
SUM(IF(discount>0,IF(discount_type=1,(discount*price)/100,discount),0)) AS nuolaida
from `history`
where ( `history`.`action` = '4'
and `history`.`created_at` >= '2018-01-01 00:00:00'
and `history`.`created_at` <= '2018-01-23 23:59:59'
)
and LENGTH(barcode) > 7
and `history`.`deleted_at` is null
group by `cate`;

Your first query is better written as:
select SUM(h.amount) as amount,
SUM(h.comment) as price,
h.cate
from history h
where h.action = 4 and
h.place = 28 and
h.created_at >= '2018-04-01' and
h.created_at < '2018-05-01' and
h.deleted_at is null
group by h.cate;
Why?
place and action are numbers. The comparison should be to a number. Mixing types can prevent the use of indexes.
The time component is not useful for the date comparison.
Qualifying all column names is just a good idea.
Then, for this query, a reasonable index is history(action, place, created_at, deleted_at).
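In DDL form, that might look like the following (the index name is my own choice; the column order transcribes the suggestion above):
ALTER TABLE `history`
    ADD INDEX `idx_action_place_created_deleted`
        (`action`, `place`, `created_at`, `deleted_at`);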
So, I would start with multi-column indexes.
If you continue to have performance issues, you should then consider partitioning the data based on the created_at date.

INDEX(a), INDEX(b) serves some purposes, but the "composite" INDEX(a,b) better serves some queries.
where ( `action` = '4'
and `place` = '28'
and `created_at` >= '2018-04-01 00:00:00'
and `created_at` <= '2018-04-30 23:59:59'
)
and `history`.`deleted_at` is null
Needs
INDEX(action, place, -- first, but in either order
deleted_at,
created_at) -- last
I prefer to write the date range thus:
and `history`.`created_at` >= '2018-04-01'
and `history`.`created_at` < '2018-04-01' + INTERVAL 1 MONTH
It's a lot easier than dealing with leap year, end of year, etc. And it works 'correctly' for DATE, DATETIME, DATETIME(6), TIMESTAMP, and TIMESTAMP(6).
For this
where ( `history`.`action` = '4'
and `history`.`created_at` >= '2018-01-01 00:00:00'
and `history`.`created_at` <= '2018-01-23 23:59:59'
)
and LENGTH(barcode) > 7
and `history`.`deleted_at` is null
I would try this as the most likely:
INDEX(action, deleted_at, created_at) -- in this order
Do not have separate tables for separate years. If you will be deleting old data, then consider PARTITION BY RANGE(TO_DAYS(...)) in order to get the speed of DROP PARTITION. (But that is another discussion.)
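A minimal sketch of that approach (partition names and boundaries are illustrative). Two caveats: MySQL requires the partitioning column to appear in every unique key, so the primary key must be widened first; and because created_at here is a TIMESTAMP, the partitioning function has to be UNIX_TIMESTAMP() rather than TO_DAYS() (TO_DAYS applies to DATE/DATETIME columns).
-- Widen the PK so it includes the partitioning column (MySQL requirement).
ALTER TABLE `history`
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (`id`, `created_at`);

-- Yearly partitions; add more as time goes on.
ALTER TABLE `history`
PARTITION BY RANGE (UNIX_TIMESTAMP(`created_at`)) (
    PARTITION p2017 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-01 00:00:00')),
    PARTITION p2018 VALUES LESS THAN (UNIX_TIMESTAMP('2019-01-01 00:00:00')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Dropping a whole year is then nearly instantaneous:
-- ALTER TABLE `history` DROP PARTITION p2017;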

If I were in your situation, I would consider paged table names. By this I mean having multiple history_X tables, where X is an integer related to the content.
Since this is a history table, is it possible to include part of the date in the name?
You said that you use date ranges to search the data, so if you were to use the year in the table name you could have
history_2014
history_2015
history_2016
history_2017
history_2018
etc.
Then you could query the table that applies to your date range.
If you need data from a range that spans two tables, you could use a UNION query to bridge the two result sets into one, as sketched below.
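A hedged sketch of that bridge, reusing the shape of the first query above; UNION ALL (rather than plain UNION) keeps matching rows from both years before aggregation:
SELECT SUM(amount) AS amount, cate
FROM (
    SELECT amount, cate FROM history_2017
    WHERE `action` = 4 AND place = 28
      AND created_at >= '2017-12-01' AND deleted_at IS NULL
    UNION ALL
    SELECT amount, cate FROM history_2018
    WHERE `action` = 4 AND place = 28
      AND created_at < '2018-02-01' AND deleted_at IS NULL
) AS h
GROUP BY cate;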

Related

What is the best index for these two queries?

I want to avoid redundant indexes, so what is the best composite index for these two queries? Based on my understanding, these two queries cannot share the same composite index, since one needs country and the other needs product_id. But if I create the indexes below, will they be redundant and affect DB performance?
combine merchant_id, created_at and product_id
combine merchant_id, created_at and country
Query 1
SELECT * FROM (
SELECT * from shop_order
WHERE shop_order.merchant_id = ?
AND shop_order.created_at >= TIMESTAMP(?)
AND shop_order.created_at <= TIMESTAMP(?)
AND shop_order.product_id = ?) AS mytable
WHERE product_id IS NOT NULL GROUP BY product_id, title;
Query 2
SELECT COALESCE(SUM(total_price_usd),0) AS revenue,
COUNT(*) as total_order, COALESCE(province, 'Unknown') AS name
FROM shop_order
WHERE DATE(created_at) >= '2021-02-08 13:37:42'
AND DATE(created_at) <= '2021-02-14 22:44:13'
AND merchant_id IN (18,19,20,1)
AND country = 'Malaysia' GROUP BY province;
Table structure
CREATE TABLE `shop_order` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`merchant_id` bigint(20) DEFAULT NULL,
`order_id` bigint(20) NOT NULL,
`customer_id` bigint(20) DEFAULT NULL,
`customer_orders_count` varchar(45) DEFAULT NULL,
`customer_total_spent` varchar(45) DEFAULT NULL,
`customer_email` varchar(100) DEFAULT NULL,
`customer_last_order_name` varchar(45) DEFAULT NULL,
`currency` varchar(10) NOT NULL,
`total_price` decimal(20,8) NOT NULL,
`subtotal_price` decimal(20,8) NOT NULL,
`transaction_fee` decimal(20,8) DEFAULT NULL,
`total_discount` decimal(20,8) DEFAULT '0.00000000',
`shipping_fee` decimal(20,8) DEFAULT '0.00000000',
`total_price_usd` decimal(20,8) DEFAULT NULL,
`transaction_fee_usd` decimal(20,8) DEFAULT NULL,
`country` varchar(50) DEFAULT NULL,
`province` varchar(45) DEFAULT NULL,
`processed_at` datetime DEFAULT NULL,
`refunds` json DEFAULT NULL,
`ffm_status` varchar(50) DEFAULT NULL,
`gateway` varchar(45) DEFAULT NULL,
`confirmed` tinyint(1) DEFAULT NULL,
`cancelled_at` datetime DEFAULT NULL,
`cancel_reason` varchar(100) DEFAULT NULL,
`created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`order_number` bigint(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`financial_status` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `shop_order_unique` (`merchant_id`,`order_id`),
KEY `merchant_id` (`merchant_id`),
KEY `combine_idx1` (`country`,`merchant_id`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2237 DEFAULT CHARSET=utf8mb4;
Please help me
Query 1:
INDEX(merchant_id, product_id, -- Put columns for "=" tests first (in any order)
created_at) -- then range
Query 2. First, avoid hiding created_at in a function call (DATE()); it prevents using it in an index.
INDEX(country, -- "="
merchant_id, -- IN
created_at) -- range (after removing DATE)
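For example, Query 2 might become the following once DATE() is removed. This is a sketch keeping the asker's literals; note that the original DATE() comparisons against full datetime strings were already ambiguous, so double-check the intended boundaries:
SELECT COALESCE(SUM(total_price_usd), 0) AS revenue,
       COUNT(*) AS total_order,
       COALESCE(province, 'Unknown') AS name
FROM shop_order
WHERE created_at >= '2021-02-08 13:37:42'
  AND created_at <= '2021-02-14 22:44:13'
  AND merchant_id IN (18,19,20,1)
  AND country = 'Malaysia'
GROUP BY province;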
You are correct in saying that you need separate indexes for those queries. And possibly some of your existing indexes are needed for other queries.
Also, you already have a redundant index. Drop KEY merchant_id (merchant_id) -- you have at least one other index starting with merchant_id.
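Put together, the suggested changes might look like this (the index names are illustrative):
ALTER TABLE shop_order
    DROP INDEX merchant_id,
    ADD INDEX idx_merchant_product_created (merchant_id, product_id, created_at),
    ADD INDEX idx_country_merchant_created (country, merchant_id, created_at);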
Having extra indexes is only a minor performance drag. And the hit is during INSERT or if you UPDATE any column in an index. Generally, the benefit of the 'right' index to a SELECT outweighs the hit to the writes.
Having multiple unique indexes is somewhat a burden. Do you really need id since you have a "natural" PK made up of those two columns? Check to see if other tables need to join on id.
Consider shrinking many of the datasizes. BIGINT takes 8 bytes and has a range that is rarely needed. decimal(20,8) takes 10 bytes and allows up to a trillion dollars; this seems excessive, too. Is customer_orders_count a number?
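As an illustration of shrinking datasizes (a sketch only; verify the actual value ranges in your data before narrowing any column):
-- Hypothetical narrower types; check your data first.
ALTER TABLE shop_order
    MODIFY merchant_id INT UNSIGNED,           -- 4 bytes instead of 8
    MODIFY total_price DECIMAL(12,2) NOT NULL; -- 6 bytes instead of 10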

Reduce MySQL Query Runtime

I have a query like this:
SELECT DISTINCT `cr`.`idCustomer`, `rbase`.`id`
FROM `customers` `t`
JOIN `customersregion` `cr` ON t.idCustomer = cr.idCustomer
and cr.isDeleted = 0
JOIN `calendaritems` `rbase` ON rbase.idAgentsRegion = cr.idRegion
and rbase.isDeleted = 0
where (
(rbase.startDate <= '2020-07-06 00:00:00' and rbase.endDate >= '2020-07-06 00:00:00') or
(rbase.startDate <= '2020-07-28 00:00:00' and rbase.endDate >= '2020-07-28 00:00:00') or
(rbase.startDate >= '2020-07-06 00:00:00' and rbase.startDate <= '2020-07-28 23:59:59') or
(rbase.endDate >= '2020-07-06 00:00:00' and rbase.endDate <= '2020-07-28 23:59:59')
)
Database: MySQL
Customers: 132,000 rows
CustomersRegion: 1,754,000 rows
CalendarItems: 3,838,000 rows (with the conditions, reduced to 555,000 rows)
t.idCustomer, cr.idCustomer, cr.isDeleted, rbase.idAgentsRegion, cr.idRegion, and rbase.isDeleted are indexed
This query's runtime is about 100 seconds, and I want to reduce it.
I can't put a LIMIT on the rows or add other conditions to the tables.
Can you help me?
Thank you
Explain Query: (the EXPLAIN output was posted as an image; the first answer below summarizes it)
Customers DDL:
create table customers
(
idCustomer int auto_increment
primary key,
CustomerName varchar(255) not null comment 'store name',
FirstName varchar(60) null comment 'contact name',
LastName varchar(60) null comment 'customer name',
idUser int null comment '!#dont show',
idPayment int null,
idCompany int default 0 not null,
LatitudePoint decimal(18, 12) default 0.000000000000 null comment 'gpslat',
LongitudePoint decimal(18, 12) default 0.000000000000 null comment 'gpslongs',
LastOrderDate datetime default '0000-00-00 00:00:00' null comment 'lastorderdate',
VisitPeriod int default 0 null comment 'visitperiod',
LastVisit datetime default '0000-00-00 00:00:00' null comment 'LastVisitDate',
LastNoOrderDate datetime default '0000-00-00 00:00:00' null,
Credit decimal(20, 4) default 0.0000 null comment 'credit',
RemainCredit decimal(20, 4) default 0.0000 null comment 'remaincredit',
Balance decimal(20, 4) default 0.0000 null comment '!#dont show',
RFID varchar(60) null comment 'rfid',
ReturnCheck tinyint(1) default 0 null comment '!#dont show',
AccountStatus tinyint(1) default 0 null comment 'accountstatus',
FaxNumber varchar(20) null,
LiquidationDate date default '0000-00-00' null comment '!#dont show',
EldestDue date default '0000-00-00' null comment '!#dont show',
MaturityDate date default '0000-00-00' null comment '!#dont show',
PriceKind int null,
isDefault tinyint(1) default 0 not null comment '!#dont show',
TimeStamp timestamp default current_timestamp() not null on update current_timestamp(),
isDeleted tinyint(1) default 0 not null,
Address varchar(255) null,
PhoneNumber varchar(60) null,
MobileNumber varchar(60) null,
CustomerErpCode varchar(60) null comment '!#dont show',
StoreType int null,
country varchar(255) null,
state varchar(255) null,
City varchar(30) null,
Region varchar(30) null,
idUserCreator int null,
idBranche int null,
idTagsinfo int null,
shop_id int null,
shop_id_address int null,
lastActivityDate datetime null,
lastActivityType tinyint(1) null,
duplicateOf int null,
isConfirmed tinyint(1) default 2 not null comment '0:rejected - 1:confirmed - 2:notChecked',
Status tinyint(1) default 1 not null,
createDate datetime null,
idProcess int null comment 'does not necessarily need to have a process',
idUserConfirmer int null comment 'this is refered to agents table',
nextDate datetime null,
prevDate datetime null,
idImage int null,
idColor int null,
idRate int null,
LastImageDate datetime null,
LastOrderAgentName varchar(255) null,
LastVisitAgentName varchar(255) null,
LastNoOrderAgentName varchar(255) null,
LastImageAgentName varchar(255) null,
LastOrderIdAgent int null,
LastVisitIdAgent int null,
LastNoOrderIdAgent int null,
LastImageIdAgent int null,
isSaleActive tinyint(1) default 1 null,
isReturnActive tinyint(1) default 1 null,
alley varchar(256) null,
street varchar(256) null,
plaque varchar(256) null,
secondAddress varchar(255) null,
description varchar(255) null,
appType varchar(50) default 'iorder' not null,
idPipeline varchar(255) default '0' null,
constraint shop_id
unique (shop_id),
constraint shop_id_address
unique (shop_id_address),
constraint ux_customererp
unique (CustomerErpCode),
constraint customers_ibfk_1
foreign key (idBranche) references branches (idBranche)
on update set null on delete set null,
constraint customers_ibfk_2
foreign key (idTagsinfo) references tagsinfo (idTag)
on update set null on delete set null,
constraint customers_ibfk_3
foreign key (idRate) references rates (idRate)
on update set null on delete set null,
constraint customers_ibfk_4
foreign key (idColor) references colors (idColor)
on update set null on delete set null,
constraint customers_ibfk_5
foreign key (idRate) references rates (idRate)
on update set null on delete set null,
constraint customers_ibfk_6
foreign key (idColor) references colors (idColor)
on update set null on delete set null,
constraint fk_customer_agents
foreign key (idUser) references agents (idAgents)
on update set null on delete set null,
constraint fk_customer_paymant
foreign key (idPayment) references payment (idPayment),
constraint fk_customer_pricelist
foreign key (PriceKind) references pricelist (idPriceList),
constraint fk_customer_storeinfo
foreign key (StoreType) references storesinfo (idStore)
)
charset = utf8;
create index fk_customer_agents_idx
on customers (idUser);
create index fk_customer_paymant_idx
on customers (idPayment);
create index fk_customer_pricelist_idx
on customers (PriceKind);
create index fk_customer_storeinfo_idx
on customers (StoreType);
create index idBranche
on customers (idBranche);
create index idColor
on customers (idColor);
create index idProcess
on customers (idProcess);
create index idRate
on customers (idRate);
create index idTagsinfo
on customers (idTagsinfo);
create index idx_isdeleted_customername
on customers (isDeleted, CustomerName);
create index isdeleted_lat_lng
on customers (isDeleted, LatitudePoint, LongitudePoint);
create index isdeleted_status_isconfirmed
on customers (isDeleted, Status, isConfirmed);
create index lat_lng
on customers (LatitudePoint, LongitudePoint);
CalendarItems DDL:
create table calendaritems
(
id int auto_increment
primary key,
TimeStamp timestamp default current_timestamp() not null on update current_timestamp(),
isDone tinyint(1) null,
isDeleted tinyint(1) default 0 not null,
subject varchar(255) null,
startDate datetime not null,
endDate datetime not null,
isAllDayEvent tinyint(1) default 1 null,
message varchar(255) null,
color varchar(200) null,
rMessage varchar(255) null,
rTime datetime null,
rLocationLat decimal(18, 12) null,
rLocationLong decimal(18, 12) null,
idAgent int not null,
idCustomer int null,
idVisitPath int null,
isFinal tinyint(1) null,
idUserCreator int not null,
idAgentsRegion int null,
type int(5) default 1 not null,
systemFill tinyint(1) default 0 not null,
createDate datetime null,
reqUp tinyint(1) default 0 not null,
dependOn int null,
idPlan int null comment 'to keep track of customer types of a region inside a plan',
idPlanTour int null,
startTime time null,
endTime time null,
constraint calendaritems_ibfk_agents
foreign key (idAgent) references agents (idAgents),
constraint calendaritems_ibfk_agents2
foreign key (idUserCreator) references agents (idAgents),
constraint calendaritems_ibfk_customers
foreign key (idCustomer) references customers (idCustomer)
on delete set null
)
charset = utf8;
create index `Index 10`
on calendaritems (isDeleted, idAgent, startDate, idCustomer);
create index `Index 14`
on calendaritems (isDeleted, idAgent, idAgentsRegion, idPlan, startDate, endDate);
create index `Index 7`
on calendaritems (startDate);
create index `Index 8`
on calendaritems (isDeleted, idAgent, startDate, idVisitPath);
create index `Index 9`
on calendaritems (isDeleted, idAgent, startDate, idAgentsRegion);
create index createDate
on calendaritems (createDate);
create index idAgent
on calendaritems (idAgent);
create index idAgentsRegion
on calendaritems (idAgentsRegion);
create index idCustomer
on calendaritems (idCustomer);
create index idUserCreator
on calendaritems (idUserCreator);
create index idVisitPath
on calendaritems (idVisitPath);
create index reqUp
on calendaritems (reqUp);
create index `systemFill-startDate-idAgent-idPlan`
on calendaritems (systemFill, startDate, idAgent, idPlan);
CustomersRegion DDL:
create table customersregion
(
idCustomer int not null,
idRegion int not null,
idCompany int default 0 null,
isDeleted tinyint(1) default 0 null,
TimeStamp timestamp default current_timestamp() null on update current_timestamp(),
ERPCode varchar(255) default '' null,
createDate datetime null,
primary key (idCustomer, idRegion),
constraint customersregion_ibfk_1
foreign key (idCustomer) references customers (idCustomer)
on update cascade on delete cascade,
constraint customersregion_ibfk_2
foreign key (idRegion) references region (idRegion)
on update cascade on delete cascade
)
charset = utf8;
create index idRegion
on customersregion (idRegion);
create index isdeleted_idregion_idcustomer
on customersregion (isDeleted, idRegion, idCustomer);
The EXPLAIN plan shows that the first step taken is to scan the calendaritems table ("rbase"), covering an estimated 1.6 million rows.
There is an index being used, but it is not a good fit, as it has too many extra columns that are not actually used. A better index would be one consisting of (isDeleted, startDate, endDate, idAgentsRegion), in that order; the first three columns would be perfect for the first three OR parts of the WHERE condition, but unfortunately not for the last one.
The idAgentsRegion column is not needed for the WHERE or JOIN conditions at all, but adding it makes the index a "covering" one, so that all the data needed can be retrieved from the index alone, without extra lookup steps for actual table rows.
What I would do in this case is have two indexes, one on (isDeleted, startDate, endDate, idAgentsRegion) and one on (isDeleted, endDate, startDate, idAgentsRegion), and then split the query into two separate ones combined by UNION:
SELECT DISTINCT `cr`.`idCustomer`, `rbase`.`id`
FROM `customers` `t`
JOIN `customersregion` `cr` ON t.idCustomer = cr.idCustomer and cr.isDeleted = 0
JOIN `calendaritems` `rbase` ON rbase.idAgentsRegion = cr.idRegion and rbase.isDeleted = 0
where (
(rbase.startDate <= '2020-07-06 00:00:00' and rbase.endDate >= '2020-07-06 00:00:00') or
(rbase.startDate <= '2020-07-28 00:00:00' and rbase.endDate >= '2020-07-28 00:00:00') or
(rbase.startDate >= '2020-07-06 00:00:00' and rbase.startDate <= '2020-07-28 23:59:59')
)
UNION
SELECT DISTINCT `cr`.`idCustomer`, `rbase`.`id`
FROM `customers` `t`
JOIN `customersregion` `cr` ON t.idCustomer = cr.idCustomer and cr.isDeleted = 0
JOIN `calendaritems` `rbase` ON rbase.idAgentsRegion = cr.idRegion and rbase.isDeleted = 0
where (rbase.endDate >= '2020-07-06 00:00:00' and rbase.endDate <= '2020-07-28 23:59:59')
For the first part the first index is perfect, for the 2nd part the 2nd index is perfect, leading to much smaller index range scans, and in the end the results just need to be combined and duplicates removed.
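In DDL, those two indexes might be created as follows (the names are my own):
CREATE INDEX idx_isdel_start_end_region
    ON calendaritems (isDeleted, startDate, endDate, idAgentsRegion);
CREATE INDEX idx_isdel_end_start_region
    ON calendaritems (isDeleted, endDate, startDate, idAgentsRegion);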
First of all, there is a foreign key relation from customersregion to customers, so you don't need the customers table in your query. You don't select anything from it, and the foreign key relation already ensures that you won't select any idCustomer values that are not in the customers table. This doesn't reduce your 100 seconds significantly, but every bit helps.
To get the full gain from indexes, you will need two extra indexes:
CREATE INDEX firstindextoadd ON calendaritems(idAgentsRegion, isDeleted, startDate, endDate);
CREATE INDEX secondindextoadd ON calendaritems(idAgentsRegion, isDeleted, endDate);
The first index will be used for your first 3 conditions:
(rbase.startDate <= '2020-07-06 00:00:00' and rbase.endDate >= '2020-07-06 00:00:00') or
(rbase.startDate <= '2020-07-28 00:00:00' and rbase.endDate >= '2020-07-28 00:00:00') or
(rbase.startDate >= '2020-07-06 00:00:00' and rbase.startDate <= '2020-07-28 23:59:59')
The second will be used for the fourth condition:
(rbase.endDate >= '2020-07-06 00:00:00' and rbase.endDate <= '2020-07-28 23:59:59')
Whether you should include isDeleted depends on the number of deleted records, but I added it 'just in case'.
I didn't test this on a huge dataset, so you will need to tell me whether it works for you.
In addition, you can simplify your conditions to:
SELECT DISTINCT `cr`.`idCustomer`, `rbase`.`id`
FROM `customersregion` `cr`
JOIN `calendaritems` `rbase` ON rbase.idAgentsRegion = cr.idRegion and rbase.isDeleted = 0
where cr.isDeleted = 0
and (
rbase.startDate <= '2020-07-06 00:00:00' and rbase.endDate >= '2020-07-28 00:00:00' OR
rbase.startDate BETWEEN '2020-07-06 00:00:00' and '2020-07-28 00:00:00' OR
rbase.endDate BETWEEN '2020-07-06 00:00:00' and '2020-07-28 00:00:00'
)
rbase: INDEX(isDeleted, startDate, endDate, idAgentsRegion, id)
rbase: INDEX(isDeleted, endDate, startDate, idAgentsRegion, id)
Those have these qualities:
First two columns are useful in ON and WHERE.
Optimizer will pick between them based on whether startDate or endDate is more selective.
Covering
That assumes that the Optimizer will start with rbase. If, instead, it starts with cr, then have both of these for the Optimizer to choose between:
rbase: INDEX(idAgentsRegion, isDeleted, startDate, endDate, id)
rbase: INDEX(idAgentsRegion, isDeleted, endDate, startDate, id)
cr is the only other table that the Optimizer might start with. (There is a WHERE clause to filter by.)
cr: INDEX(isDeleted, idRegion, -- first, (in either order)
idCustomer) -- last
Assuming that start <= end, the range test can probably be simplified to only this:
WHERE rbase.startDate < '2020-07-28'
AND rbase.endDate >= '2020-07-06'
(I don't understand the funny business with '2020-07-28' versus '2020-07-28 23:59:59'.)
I recommend using "< midnight" and ">= midnight" consistently. A plain date is equivalent to midnight for that morning. Another way to specify '2020-07-28' is '2020-07-06' + INTERVAL 22 DAY. The latter is convenient when you know the span (22 days) and don't want to fuss with leap days, etc.
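For example:
SELECT '2020-07-06' + INTERVAL 22 DAY;  -- 2020-07-28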
It is "proper" for the ON to specify how the tables are 'related', and the WHERE to be used for filtering. That is, the isDeleted tests belong in the WHERE clause. (The execution is unaffected for JOIN, but important for LEFT JOIN.)
The Last...Id... and Last...Name columns seem to be redundant? Somewhere else there is a mapping from id to name?
Rates and Colors -- Those seem like things that are not worth normalizing? If you ever need to search on either, undoing this normalization will help performance, possibly a lot.
This combo seems 'wrong'; is there a reason for it:
startDate DATETIME
startTime TIME
When you have both of these,
INDEX(a) -- drop
INDEX(a,b) -- keep (it takes care of the other case)
LatitudePoint decimal(18, 12) takes 9 bytes; gross overkill. Suggested alternatives: http://mysql.rjweb.org/doc.php/latlng#representation_choices
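For instance, one commonly suggested narrower pair, assuming ~16 cm resolution is enough (a sketch; the MODIFY statements drop the original defaults and comments):
-- 4 + 5 bytes instead of 9 + 9.
ALTER TABLE customers
    MODIFY LatitudePoint  DECIMAL(8,6),   -- latitude:  -90..90
    MODIFY LongitudePoint DECIMAL(9,6);   -- longitude: -180..180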
Unless you can assume startDate is no more than e.g. 30 days before endDate, there is no practical way to index for this query to avoid having to check all rows. You can try a composite index on (startDate,endDate) and that may help some.
You can try doing a union with some of your where conditions using a start date index and some using an end date index, but if you really are expecting half a million of your 3.8 million rows to get selected, it may not help at all.

Why is the index on my MySQL table not being used?

I have a table in MySQL (InnoDB engine) with 100M records. Structure is as below:
CREATE TABLE LEDGER_AGR (
ID BIGINT(20) NOT NULL AUTO_INCREMENT,
`Booking` int(11) NOT NULL,
`LType` varchar(5) NOT NULL,
`PType` varchar(5) NOT NULL,
`FType` varchar(5) DEFAULT NULL,
`TType` varchar(10) DEFAULT NULL,
`AccountCode` varchar(55) DEFAULT NULL,
`AgAccountId` int(11) DEFAULT '0',
`TransactionDate` date NOT NULL,
`DebitAmt` decimal(37,6) DEFAULT '0.000000',
`CreditAmt` decimal(37,6) DEFAULT '0.000000',
PRIMARY KEY (`ID`),
KEY `TRANSACTION_DATE` (`TransactionDate`)
)
ENGINE=InnoDB;
When I am doing:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate >= '2000-08-01'
AND TransactionDate <= '2017-08-01'
It is not using the TRANSACTION_DATE index. But when I am doing:
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate = '2000-08-01'
it is using the TRANSACTION_DATE index. Could someone please explain?
Range query #1 has poor selectivity. Equality query #2 has excellent selectivity. The optimizer is very likely to choose the index access path when result rows will be < 1% of total rows in table. The backend optimizer is unlikely to prefer the index when result rows will be a large fraction of the total, for example a half or a quarter of all rows.
A range of '2000-08-01' thru '2000-08-03' would likely exploit the index.
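One way to see the cutoff in action on the schema above: run EXPLAIN with a narrow range and watch the index get picked.
-- With only a few days of data in range, the estimated row count is small,
-- so the optimizer should choose the TRANSACTION_DATE index.
EXPLAIN
SELECT * FROM LEDGER_AGR
WHERE TransactionDate >= '2000-08-01'
  AND TransactionDate <= '2000-08-03';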
cf: mysql not using index?

Select a record from millions of records slowness

I have a standalone table; we insert its data through a weekly job and retrieve data in our search module.
The table has around 4 million records (and will get bigger). When I execute the straightforward select query below, it takes a long time (around 15 seconds). I am using a MySQL DB.
Here is my table structure
CREATE TABLE `myTable` (
`myTableId` int(11) NOT NULL AUTO_INCREMENT,
`date` varchar(255) DEFAULT NULL,
`startTime` int(11) DEFAULT NULL,
`endTime` int(11) DEFAULT NULL,
`price` decimal(19,4) DEFAULT NULL,
`total` decimal(19,4) DEFAULT NULL,
`taxes` decimal(19,4) DEFAULT NULL,
`persons` int(11) NOT NULL DEFAULT '0',
`length` int(11) DEFAULT NULL,
`totalPerPerson` decimal(19,4) DEFAULT NULL,
`dayId` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`myTableId`)
);
When I run the following statement, it takes around 15 seconds to retrieve results.
How can I optimize it to be faster?
SELECT
tt.testTableId,
(SELECT
totalPerPerson
FROM
myTable mt
WHERE
mt.venueId = tt.venueId
ORDER BY totalPerPerson ASC
LIMIT 1) AS minValue
FROM
testTable tt
WHERE
status is NULL;
Please note that the testTable table has only around 15 records.
This is the query:
SELECT tt.testTableId,
(SELECT mt.totalPerPerson
FROM myTable mt
WHERE mt.venueId = tt.venueId
ORDER BY mt.totalPerPerson ASC
LIMIT 1
) as minValue
FROM testTable tt
WHERE status is NULL;
For the subquery, you want an index on myTable(venueId, totalPerPerson). For the outer query, an index is unnecessary. However, if the table were larger, you would want an index on testTable(status, venueId, testTableId).
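A sketch of that first index (the posted myTable DDL omits venueId even though the query uses it, so this assumes the column exists; the index name is my own):
CREATE INDEX idx_venue_totalperperson ON myTable (venueId, totalPerPerson);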
Using MIN and GROUP BY may be faster.
SELECT tt.testTableId, MIN(totalPerPerson)
FROM testTable tt
INNER JOIN mytable mt ON tt.venueId = mt.venueId
WHERE tt.status is NULL
GROUP BY tt.testTableId

False Positives outside date range in this MySQL JOIN

I am getting historical count data together in an automated report. The two main table schemas are below. The third table referenced is person, which has its ids as foreign keys in email_list_subscription. That table's primary key consists of the two foreign keys email_list and person.
SQLFIDDLE HERE
The query below is coming up with a count which is outside the date ranges allowed in the query, and I can't figure out why. It has rows for an email list that definitely has no rows in 2014 at all.
CREATE TABLE `email_list` (
`id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`handle` varchar(50) NOT NULL DEFAULT '',
`title` varchar(255) DEFAULT NULL,
`operator` varchar(255) DEFAULT NULL,
`operator_contact_name` varchar(255) DEFAULT NULL,
`operator_contact_email` varchar(150) DEFAULT NULL,
`operator_contact_phone` varchar(20) DEFAULT NULL,
`operator_listid` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `handle` (`handle`),
KEY `handle_2` (`handle`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `email_list_subscription` (
`email_list` smallint(5) unsigned NOT NULL DEFAULT '0',
`person` int(10) unsigned NOT NULL DEFAULT '0',
`as_email_address` varchar(150) DEFAULT NULL,
`datetime_synced_to_operator` datetime DEFAULT NULL,
`opted_in` datetime DEFAULT NULL,
`opted_out` datetime NOT NULL,
`undeliverable` datetime NOT NULL,
PRIMARY KEY (`email_list`,`person`),
KEY `email_list` (`email_list`),
KEY `person` (`person`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Here is a query dumped from the script, with its results checked directly in the MySQL monitor:
SELECT
el.id, el.handle,
els.`email_list` ,
COUNT( els.person ) AS c
FROM
`email_list` el,
`email_list_subscription` els
WHERE
el.id = els.email_list
AND (
DATE( els.`datetime_synced_to_operator` ) >= '2014-04-01'
OR
DATE( els.`opted_in` ) >= '2014-04-01'
)
AND (
DATE( els.`datetime_synced_to_operator` ) <= '2014-05-18'
OR
DATE( els.`opted_in` ) <= '2014-05-18'
)
GROUP BY els.`email_list`
How is this capturing els rows whose dates are not in the range?
Those DATE() calls are going to kill your performance; much better to do
els.`datetime_synced_to_operator` >= '2014-04-01 00:00:00'
(for example).
Also, it is not clear your date ranges are going to work as intended; this seems more clear (but may have different results depending on data):
WHERE el.id = els.email_list
AND (
( els.`datetime_synced_to_operator` BETWEEN '2014-04-01 00:00:00' AND '2014-05-18 23:59:59')
OR
( els.`opted_in` BETWEEN '2014-04-01 00:00:00' AND '2014-05-18 23:59:59')
)
;
Also: What was wrong with the original where (below)?
AND (
DATE( els.`datetime_synced_to_operator` ) >= '2014-04-01'
OR
DATE( els.`opted_in` ) >= '2014-04-01'
)
AND (
DATE( els.`datetime_synced_to_operator` ) <= '2014-05-18'
OR
DATE( els.`opted_in` ) <= '2014-05-18'
)
Best illustrated with an example... any row with datetime_synced_to_operator any time after the start date (even after the end date) and an opted_in any time before the end date (even before the start date) gives true for this clause; and vice versa.
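To make that concrete, here is a check with invented dates: the sync date falls after the end of the range and the opt-in date before its start, yet each AND-block finds one column that satisfies it.
-- Hypothetical row: datetime_synced_to_operator = '2014-06-02' (after the range),
--                   opted_in                    = '2014-03-15' (before the range).
SELECT DATE('2014-06-02') >= '2014-04-01'
    OR DATE('2014-03-15') >= '2014-04-01' AS first_block,   -- 1, via the sync date
       DATE('2014-06-02') <= '2014-05-18'
    OR DATE('2014-03-15') <= '2014-05-18' AS second_block;  -- 1, via the opt-in date
-- Both blocks are true, so the row is counted even though neither date
-- lies inside 2014-04-01 .. 2014-05-18.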