mariadb slow count on console but very fast on dbeaver - mysql

I have this SQL query that takes a long time when executed from the mariadb console or from PHP (Laravel):
SELECT Count(*) AS aggregate
FROM   (SELECT paquetes.id,
               paquetes.codigoseguimiento,
               paquetes.direccionentrega,
               paquetes.telefono1,
               paquetes.nombrerecibe,
               paquetes.nombrequienenvia,
               paquetes.estado,
               paquetes.marcadevolucion,
               tipos_paquetes.nombre AS tipo,
               paquetes.created_at,
               paquetes.devolucion,
               ciudades.nombre AS ciudadEntrega
        FROM   paquetes
               LEFT JOIN tipos_paquetes
                      ON tipos_paquetes.id = paquetes.tipo
               LEFT JOIN ciudades
                      ON ciudades.id = paquetes.ciudadentrega
        WHERE  paquetes.created_at BETWEEN
               "2021-10-17 00:00:00" AND "2021-11-17 23:59:59"
               AND paquetes.estado != 0
        ORDER  BY paquetes.created_at DESC) count_row_table
Result:
+-----------+
| aggregate |
+-----------+
|    763141 |
+-----------+
1 row in set (20.631 sec)
But if I run the same SQL from DBeaver (always clearing the query cache), it takes only 1.458 seconds.
What I discovered is that adding LIMIT 1 to the SQL makes it run at DBeaver's speed from both the mariadb console and PHP:
SELECT Count(*) AS aggregate
FROM   (SELECT paquetes.id,
               paquetes.codigoseguimiento,
               paquetes.direccionentrega,
               paquetes.telefono1,
               paquetes.nombrerecibe,
               paquetes.nombrequienenvia,
               paquetes.estado,
               paquetes.marcadevolucion,
               tipos_paquetes.nombre AS tipo,
               paquetes.created_at,
               paquetes.devolucion,
               ciudades.nombre AS ciudadEntrega
        FROM   paquetes
               LEFT JOIN tipos_paquetes
                      ON tipos_paquetes.id = paquetes.tipo
               LEFT JOIN ciudades
                      ON ciudades.id = paquetes.ciudadentrega
        WHERE  paquetes.created_at BETWEEN
               "2021-10-17 00:00:00" AND "2021-11-17 23:59:59"
               AND paquetes.estado != 0
        ORDER  BY paquetes.created_at DESC) count_row_table
LIMIT  1;
Result:
+-----------+
| aggregate |
+-----------+
|    763141 |
+-----------+
1 row in set (1.339 sec)
The SQL query is generated automatically by the DataTables library for Laravel.
I'm not sure whether this behavior is correct or why it happens; there are currently about 8 million records in the table.
Edit:
The execution plan for the query without LIMIT 1 seems to show it going through the whole table and not using the index.
The execution plan for the query with LIMIT 1 apparently uses the index and does not go through the whole table.
Table indexes:

This will probably run faster in both places:
SELECT COUNT(*) AS aggregate
FROM paquetes
WHERE paquetes.created_at >= "2021-10-17"
AND paquetes.created_at < "2021-10-17" + INTERVAL 1 MONTH + INTERVAL 1 DAY
AND paquetes.estado != 0;
The + INTERVAL 1 DAY was derived from your range, and may be a mistake.
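Separately, a covering index should help both forms. This is a sketch under the assumption that no such composite index exists yet (the screenshots of the table's indexes aren't reproduced here, so the name is hypothetical):
-- Hypothetical index name; covers the WHERE columns so the COUNT can be
-- answered from the index BTree alone, without touching the data rows
ALTER TABLE paquetes ADD INDEX idx_created_estado (created_at, estado);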
More
The Optimizer, in general, has two ways to perform a query -- Use an index, or scan the table. As a simplified rule, if more than 20% of the rows are needed, it will prefer to simply scan the table; Explain will say "All". Otherwise, it will tell you which Index it picked.
When using an index, it will bounce between the Index's BTree and the Data's BTree. Since the Optimizer cannot precisely predict whether this "bouncing" will be faster or slower than simply reading all rows ('table scan') and ignoring the ones that don't meet the Where clause, it will sometimes pick the 'wrong' way to perform the query.
You are probably seeing the results of this "which way?" quandary when you change the date range.
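You can test the "which way?" question directly by overriding the Optimizer's choice with an index hint. The index name below is an assumption; take the real one from SHOW INDEX FROM paquetes:
SELECT COUNT(*) AS aggregate
FROM paquetes FORCE INDEX (idx_created_estado) -- assumed index name
WHERE created_at BETWEEN "2021-10-17 00:00:00" AND "2021-11-17 23:59:59"
  AND estado != 0;
If the hinted version matches the fast (LIMIT 1) timing, the slow runs are simply the Optimizer picking the table scan.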

Related

The SQL query below is taking too much time. How can I make it faster?

SELECT
call_id
,call_date
,call_no
,call_amountdue
,rechargesamount
,call_penalty
,callpayment_received
,calldiscount
FROM `call`
WHERE calltype = 'Regular'
AND callcode = 98
AND call_connect = 1
AND call_date < '2018-01-01'
ORDER BY
`call_date` DESC
,`call_id` DESC
limit 1
Indexes already exist on call_date, callcode, calltype, and call_connect.
The table has 10 million records, and the query takes 2 minutes.
How can I get results within 3 seconds?
INDEX (calltype, callcode, call_connect, -- in any order
       call_date,                        -- next
       call_id)                          -- last
This will make it possible to find the one row that is desired without having to step over other rows.
Since you seem to have INDEX(calltype), drop it; it will get in the way and is redundant anyway. The rest of the indexes you mentioned will be ignored.
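As a sketch, both changes in one statement (the name of the existing single-column index is an assumption; check SHOW CREATE TABLE `call` for the real one):
ALTER TABLE `call`
  DROP INDEX calltype, -- assumed name of the redundant single-column index
  ADD INDEX idx_call_covering (calltype, callcode, call_connect, call_date, call_id);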
More discussion in Index Cookbook

Optimizing SQL query with sub queries

I have a SQL query that I tried to optimize. Through various means I was able to reduce the time from over 5 seconds to about 1.3 seconds, but no further. I was wondering if anyone could suggest further improvements.
The Explain diagram shows a full scan:
explain diagram
The Explain table will give you more details:
explain tabular
The query is simplified and shown below. Just for reference, I'm using MySQL 5.6.
select * from (
    select
        @row_num := if(@yacht_id = yacht_id and @charter_type = charter_type and @start_base_id = start_base_id and @end_base_id = end_base_id, @row_num + 1, 1) as row_number,
        @yacht_id := yacht_id as yacht_id,
        @charter_type := charter_type as charter_type,
        @start_base_id := start_base_id as start_base_id,
        @end_base_id := end_base_id as end_base_id,
        model, offer_type, instant, rating, reviews, loa, berths, cabins, currency, list_price, list_price_per_day,
        discount, client_price, client_price_per_day, days, date_from, date_to, start_base_city, end_base_city, start_base_country, end_base_country,
        service_binary, product_id, ext_yacht_id, main_image_url
    from (
        select
            offer.yacht_id, offer.charter_type, yacht.model, offer.offer_type, offer.instant, yacht.rating, yacht.reviews, yacht.loa,
            yacht.berths, yacht.cabins, offer.currency, offer.list_price, offer.list_price_per_day,
            offer.discount, offer.client_price, offer.client_price_per_day, offer.days, date_from, date_to,
            offer.start_base_city, offer.end_base_city, offer.start_base_country, offer.end_base_country,
            offer.service_binary, offer.product_id, offer.start_base_id, offer.end_base_id,
            yacht.ext_yacht_id, yacht.main_image_url
        from website_offer as offer
        join website_yacht as yacht
          on offer.yacht_id = yacht.yacht_id,
        (select @yacht_id := '') as init
        where date_from > CURDATE()
          and date_to <= CURDATE() + INTERVAL 3 MONTH
          and days = 7
        order by offer.yacht_id, charter_type, start_base_id, end_base_id, list_price_per_day asc, discount desc
    ) as filtered_offers
) as offers
where row_number = 1;
Thanks,
goppi
UPDATE
I had to abandon some performance improvements and replaced the original select with the new one. The select query is actually built dynamically by the backend based on which filter criteria are set, so the WHERE clause of the innermost select can expand quite a lot. However, this is the default select if no filter is set, and it is the version that takes significantly longer than 1 sec.
explain in text form - doesn't come out pretty as I couldn't figure out how to format a table, but here it is:
id | select_type | table | type   | possible_keys                    | key         | key_len | ref                         | rows   | Extra
1  | PRIMARY     |       | ref    | <auto_key0>                      | <auto_key0> | 9       | const                       | 10     |
2  | DERIVED     |       | ALL    |                                  |             |         |                             | 385967 |
3  | DERIVED     |       | system |                                  |             |         |                             | 1      | Using filesort
3  | DERIVED     | offer | ref    | idx_yachtid,idx_search,idx_dates | idx_dates   | 5       | const                       | 385967 | Using index condition; Using where
3  | DERIVED     | yacht | eq_ref | PRIMARY,id_UNIQUE                | PRIMARY     | 4       | yachtcharter.offer.yacht_id | 1      |
4  | DERIVED     |       |        |                                  |             |         |                             |        | No tables used
Sub-selects are never great.
You could sign up here: https://www.eversql.com/
Run the query there and it will suggest the indexes and optimisations you need for this query.
There's still some optimization you can use. Considering the subquery returns only 5000 rows, you could use an index for it.
First rephrase the predicate as:
select *
from website_offer
where date_from >= CURDATE() + INTERVAL 1 DAY -- rephrased here
and date(date_to) <= CURDATE() + INTERVAL 3 MONTH
and days = 7
order by yacht_id, charter_type, list_price_per_day asc, discount desc
limit 5000
Then, if you add the following index the performance could improve:
create index ix1 on website_offer (days, date_from, date_to);
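As an aside: if an upgrade beyond MySQL 5.6 ever becomes an option, window functions express the "first row per group" logic directly and drop the user-variable trickery. A minimal sketch of the idea, not a drop-in replacement (the column list is abbreviated):
-- MySQL 8.0+ only: rank offers within each group, keep the cheapest
SELECT *
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (
               PARTITION BY yacht_id, charter_type, start_base_id, end_base_id
               ORDER BY list_price_per_day ASC, discount DESC
           ) AS rn
    FROM website_offer AS o
    WHERE date_from > CURDATE()
      AND date_to <= CURDATE() + INTERVAL 3 MONTH
      AND days = 7
) AS ranked
WHERE rn = 1;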

MySQL in clause slow with 10 or more items

This query takes 18 seconds
SELECT `wd`.`week` AS `start_week`, `wd`.`hold_code`, COUNT(wd.hold_code) AS hold_code_count
FROM `weekly_data` AS `wd`
JOIN aol_reporting_hold_codes hc ON hc.hold_code = wd.hold_code AND chart = 'GR'
WHERE `wd`.`days` <= 6
AND `wd`.`hold_code` IS NOT NULL
AND NOT `wd`.`hold_code` = ''
AND `wd`.`week` >= '201717'
AND `wd`.`itemgroup` IN ('BOTDTO', 'BOTDWG', 'C&FORG', 'C&FOTO', 'MF-SUB', 'MI-SUB', 'PROPRI', 'PROPTO', 'STRSTO', 'STRSUB')
AND `production_type` = 2
AND `contract` = "1234"
AND `project` = 8
GROUP BY `start_week`, `wd`.`hold_code`
This query takes 4 seconds
SELECT `wd`.`week` AS `start_week`, `wd`.`hold_code`, COUNT(wd.hold_code) AS hold_code_count
FROM `weekly_data` AS `wd`
JOIN aol_reporting_hold_codes hc ON hc.hold_code = wd.hold_code AND chart = 'GR'
WHERE `wd`.`days` <= 6
AND `wd`.`hold_code` IS NOT NULL
AND NOT `wd`.`hold_code` = ''
AND `wd`.`week` >= '201717'
AND `wd`.`itemgroup` IN ('BOTDWG', 'C&FORG', 'C&FOTO', 'MF-SUB', 'MI-SUB', 'PROPRI', 'PROPTO', 'STRSTO', 'STRSUB')
AND `production_type` = 2
AND `contract` = "1234"
AND `project` = 8
GROUP BY `start_week`, `wd`.`hold_code`
All I have done is remove one item from the IN clause. I can remove any one of the items: it runs in 4 seconds as long as there are 9 items or fewer, and takes 18 seconds as soon as I increase to 10 items.
I thought MySQL limited the length of a command only by its size (i.e. 1 MB).
More than just the EXPLAIN, use EXPLAIN FORMAT=JSON and get the "Optimizer trace" for the query. I suspect the length of the IN leads to picking a different query plan.
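A minimal way to capture that trace for a single statement (this is the standard mechanism in recent MySQL and MariaDB):
SET optimizer_trace = 'enabled=on';  -- session-level switch
-- run the slow SELECT here, unchanged
SELECT trace FROM information_schema.OPTIMIZER_TRACE;  -- shows the plan decisions, including IN-list handling
SET optimizer_trace = 'enabled=off';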
There is virtually no limit to the number of items in IN. I have seen as many as 70K.
That aside, you may be able to speed up even the 4-sec version...
I suggest having this index. Grrr... I can't tell which columns are in which tables. So, if these are all in one table, then make such an index:
INDEX(production_type, contract, project) -- in any order
If those are all in wd, then tack on a 4th column - any of week, itemgroup, days.
Be cautious about COUNT(wd.hold_code).
COUNT(x) checks x for being non-NULL; is that what you want? If not, then simply say COUNT(*).
When JOINing, then GROUP BY, you get an "explode-implode". The number of intermediate rows is big; that is when the COUNT is performed.
It seems wrong to both COUNT(hold_code) and GROUP BY hold_code. What are you trying to do?
For further discussion, please provide SHOW CREATE TABLE and EXPLAIN.
Please note that the MySQL IN clause limit is governed by the max_allowed_packet value. You could check whether NOT IN gives faster results. I also suggest putting the values to be checked by the IN clause into a buffer string, instead of comma-separated literals, and giving that a try.
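For reference, the current packet limit can be checked with:
SHOW VARIABLES LIKE 'max_allowed_packet';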

Setting column equal to lagged value of another column in the same table

I have a column where a date is recorded, and I want to set another column to the lagged version of that date column. In other words, for every date I want the new column to hold the previous date.
I tried a lot of stuff, mostly stupid, and got nowhere. My main issue was that I was updating a column based on WHERE clauses referencing the same table and column, which MySQL doesn't allow.
An example of the data follows below. My goal is to update column PREVDATE with the previous row's DATA_DATE, with the condition that GVKEY is the same for both rows. I define "previous row" as follows: order by GVKEY and DATA_DATE ASC, and for every row (given the same GVKEY) take the one before it.
+------------+----------+---------+-------+----------+---------+
| DATA_DATE  | PREVDATE | PRICE   | GVKEY | CUR_DEBT | LT_DEBT |
+------------+----------+---------+-------+----------+---------+
| 1965-05-31 | NULL     | -17.625 | 1004  | 0.198    | 1.63    |
| 1970-05-31 | NULL     | -18.375 | 1004  | 2.298    | 1.58    |
+------------+----------+---------+-------+----------+---------+
Here's one approach that makes use of MySQL user-defined variables and behavior that is not guaranteed, but which I have seen to be consistent (at least in MySQL 5.1, 5.5 and 5.6).
WARNING: this returns every row in the table. You may want to consider doing this for a limited range of gvkey values, for testing. Add a WHERE clause...
SELECT IF(r.gvkey = @prev_gvkey, @prev_ddate, NULL) AS prev_date
     , @prev_gvkey := r.gvkey     AS gvkey
     , @prev_ddate := r.data_date AS data_date
  FROM (SELECT @prev_ddate := NULL, @prev_gvkey := NULL) i
 CROSS
  JOIN mytable r
 ORDER BY r.gvkey, r.data_date
The order of the expressions in the SELECT list is important: we need to compare the value of the current row to the value "saved" from the previous row before we save the current values into the @prev_ variables for the next row.
We need a conditional test to make sure we're still working on the same gvkey. The first data_date for a gvkey isn't going to have a "previous" data_date, so we need to return a NULL.
For best performance, we'll want to have a covering index, with gvkey and data_date as the leading columns:
... ON mytable (gvkey, data_date)
The index can include additional columns after those, but we need those two columns first, in that order. That will allow MySQL to return the rows "in order" using the index, and avoid an expensive "Using filesort" operation. (The Extra column from EXPLAIN will show "Using index".)
Once we get that working correctly, we can use that as an inline view in an UPDATE statement.
For example:
UPDATE mytable t
  JOIN (
         SELECT IF(r.gvkey = @prev_gvkey, @prev_ddate, NULL) AS prev_date
              , @prev_gvkey := r.gvkey     AS gvkey
              , @prev_ddate := r.data_date AS data_date
           FROM (SELECT @prev_ddate := NULL, @prev_gvkey := NULL) i
          CROSS
           JOIN mytable r
          ORDER BY r.gvkey, r.data_date
       ) s
    ON t.gvkey = s.gvkey
   AND t.data_date = s.data_date
   SET t.prev_date = s.prev_date
(Again, for a very large table, we probably want to break that transaction up into smaller chunks, by including a predicate on gvkey in the inline view, to limit the number of rows returned/updated.)
Doing this in batches of gvkey ranges is a reasonable approach, e.g.
/* first batch */ WHERE r.gvkey >= 1 AND r.gvkey < 100
/* second run */ WHERE r.gvkey >= 100 AND r.gvkey < 200
/* third batch */ WHERE r.gvkey >= 200 AND r.gvkey < 300
Obviously, there are other approaches/SQL patterns to accomplish an equivalent result. I've had success with this approach.
To emphasize an earlier IMPORTANT note: this relies on behavior that is not guaranteed, and which the MySQL Reference Manual warns against (using user-defined variables like this).
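For what it's worth: on MySQL 8.0+ (or MariaDB 10.2+), the LAG() window function expresses the same thing with guaranteed semantics, so none of the caveats above apply. A minimal sketch following the same UPDATE...JOIN shape:
-- MySQL 8.0+ / MariaDB 10.2+ sketch: previous DATA_DATE per GVKEY via LAG()
UPDATE mytable t
  JOIN (
         SELECT gvkey, data_date,
                LAG(data_date) OVER (PARTITION BY gvkey ORDER BY data_date) AS prev_date
           FROM mytable
       ) s
    ON t.gvkey = s.gvkey
   AND t.data_date = s.data_date
   SET t.prev_date = s.prev_date;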

How to best get daily SUM for minute-level data?

I have a data set consisting of minute-by-minute data. My goal is to return minute-by-minute records, and add calculations that create sums of a certain field for the past 24 hours, counting back from each minute record.
The query I have is the following:
SELECT main.recorded_at AS x,
       (SELECT SUM(precipitation)
          FROM data AS sub
         WHERE sub.host = main.host
           AND sub.recorded_at BETWEEN SUBTIME(main.recorded_at, '24:00:00') AND main.recorded_at) AS y
  FROM data AS main
 WHERE host = 'xxxx'
 ORDER BY x ASC;
Is there a more efficient way to write this query? I have tried, but failed, so far, using LEFT JOINS and different GROUP BYs.
When I explain this query, I get the following:
id | select_type        | table | type | possible_keys    | key  | key_len | ref   | rows | filtered | Extra
1  | PRIMARY            | main  | ref  | host             | host | 767     | const | 4038 | 100.00   | Using where; Using filesort
2  | DEPENDENT SUBQUERY | sub   | ref  | host,recorded_at | host | 767     | const | 4038 | 100.00   | Using where
In total, the query takes about 200 seconds to run with 8000 records, getting slower all the time. My goal is to get the aggregate 24-hour precipitation for each result, and somehow in under 2 seconds.
Maybe I'm going about this the wrong way? I'm open to suggestions for other avenues to get the same result. :)
Thanks!
~Mike
Assuming I'm understanding your question correctly, it looks like you can use SUM with CASE to achieve the same result without using the correlated subquery.
SELECT recorded_at AS x,
SUM(CASE WHEN recorded_at BETWEEN SUBTIME(recorded_at, '24:00:00') AND recorded_at
THEN precipitation END) As y
FROM data
WHERE host = 'xxxx'
GROUP BY recorded_at
ORDER BY x ASC;
While I'm not sure it would yield better performance, I do think an OUTER JOIN with GROUP BY would solve your issue:
SELECT main.recorded_at AS x,
SUM(sub.precipitation) As y
FROM data main LEFT JOIN data sub ON
main.host = sub.host AND
sub.recorded_at BETWEEN SUBTIME(main.recorded_at, '24:00:00') AND main.recorded_at
WHERE main.host = 'xxxx'
GROUP BY main.recorded_at
ORDER BY x ASC;
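For completeness: on MySQL 8.0+, a window function with a RANGE frame computes the rolling 24-hour sum in one pass over the rows, with no correlated subquery or self-join. A minimal sketch:
-- MySQL 8.0+ sketch: rolling 24-hour precipitation for each minute row
SELECT recorded_at AS x,
       SUM(precipitation) OVER (
           ORDER BY recorded_at
           RANGE BETWEEN INTERVAL 24 HOUR PRECEDING AND CURRENT ROW
       ) AS y
  FROM data
 WHERE host = 'xxxx'
 ORDER BY x ASC;
An index on (host, recorded_at) would also help either form, since both filter on host and order by recorded_at.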