I am trying to optimize a query (on MySQL 5.7) that takes about 1 s to execute on 3M rows. The desired result is a SUM of float values per day within the last 30 days:
Query:
SELECT
DATE,
SUM(RIX) as value
FROM
tbl1 v
JOIN tbl2 b ON v.BRAND_ID = b.ID and b.MANUFACTURER_ID = 18670
WHERE
DATE BETWEEN FROM_UNIXTIME(1613663944) AND FROM_UNIXTIME(1616255944)
AND type = 'kwd'
group by v.DATE
Tbl1 (10K rows):
create table tbl1
(
ID mediumint unsigned auto_increment
primary key,
ID_parent mediumint unsigned null,
MANUFACTURER_ID mediumint unsigned not null,
BRAND varchar(255) null,
CREATED date null,
constraint test
unique (MANUFACTURER_ID, BRAND)
);
create index idx2
on tbl1 (ID_parent);
create index idx3
on tbl1 (MANUFACTURER_ID);
create index idx4
on tbl1 (BRAND);
Tbl2 (3M rows):
create table tbl2
(
DATE date not null,
MERCHANT_ID mediumint unsigned not null,
TYPE enum ('cat', 'kwd', 'css') not null,
BRAND_ID mediumint unsigned not null,
RIX float unsigned not null,
primary key (DATE, TYPE, MERCHANT_ID, BRAND_ID)
);
create index idx1
on tbl2 (RIX);
I could not get it to perform faster. Any ideas on how to improve the query time?
primary key (DATE, TYPE, MERCHANT_ID, BRAND_ID)
Try rearranging it to
primary key (TYPE, DATE, MERCHANT_ID, BRAND_ID)
That would make this more efficient:
WHERE type = ...
AND date BETWEEN ...
tbl1 has an index starting with MANUFACTURER_ID; don't add another index on just MANUFACTURER_ID.
(Please qualify each column with its table alias; the query is hard to read without them. While you are at it, fix v.BRAND_ID and the others!)
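The effect of putting the equality column (TYPE) ahead of the range column (DATE) can be sketched with a small SQLite stand-in (SQLite's EXPLAIN QUERY PLAN replaces MySQL's EXPLAIN here, and RIX is appended to a secondary index to loosely mimic InnoDB's clustered primary key; the index name is my own):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl2 (
  date        TEXT    NOT NULL,
  merchant_id INTEGER NOT NULL,
  type        TEXT    NOT NULL,
  brand_id    INTEGER NOT NULL,
  rix         REAL    NOT NULL
);
-- equality column (type) first, then the range column (date)
CREATE INDEX idx_type_date ON tbl2 (type, date, merchant_id, brand_id, rix);
""")
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT date, SUM(rix) AS value
FROM tbl2
WHERE type = 'kwd'
  AND date BETWEEN '2021-02-18' AND '2021-03-20'
GROUP BY date
""").fetchall()
# The plan should show a SEARCH on idx_type_date: type pinned by
# equality, then date walked as a range, with no extra sort for GROUP BY.
print(plan[-1][-1])
```

With DATE first instead, the range on DATE would stop the index from narrowing on TYPE at all.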
Below is my query:
EXPLAIN SELECT
SUM(rcr.impressions),
SUM(rcr.clicks) AS clicks,
cv.id AS cup_version_id,
cv.source_id as source_id,
rcr.date AS date,
adg.subproduct_type_id
FROM
report_custom_records AS rcr
JOIN report_ad_groups AS adg ON (rcr.group_id = adg.ID)
JOIN ad_campaigns AS cmp ON (adg.campaign_id = cmp.id)
JOIN campaign_creatives AS cre ON (rcr.creative_id = cre.id)
JOIN campaign_versions AS cv ON (cre.version_id = cv.id)
WHERE
cmp.business_id IN (-1,9126,102538)
AND adg.campaign_id IN (-1,870689,870696,884963,884964,902027,907809,914889,914893,925233,930390,930391,955423,955429,1004323,1004324,1021355,1021356,1078026,1078027)
AND cast(rcr.date as date) BETWEEN '2018-03-01' AND '2018-10-31'
AND ((adg.target_type LIKE "%Pre-Roll%" AND adg.subproduct_type_id IN (2, 4)) OR adg.subproduct_type_id IN (12, 16 , 10 ))
AND cv.source_id IS NULL
GROUP BY
cup_version_id, date
With the date condition (BETWEEN the two dates), EXPLAIN reports about 11,644,969 rows processed. But without the date condition, it reports only 1,584,016 rows processed.
FYI:
the date column has the DATE datatype
the tables are joined, but there are no foreign keys
Table:
create table report_custom_records
(
group_id varchar(100) default '' not null,
creative_id int default 0 not null,
tp_creative_id varchar(20) not null,
date date not null,
impressions int(8) not null,
clicks int(8) not null,
cost float(6, 2) not null,
tp_creative_source smallint(6) not null comment '0 = TD, 1 = DFA',
primary key (group_id, tp_creative_id, creative_id, date)
);
I have the indexes below on the table. Could they be slowing down the query?
create index adg_id
on report_custom_records (group_id, date);
create index group_id_2
on report_custom_records (group_id, creative_id, date);
create index group_id_3
on report_custom_records (group_id);
create index creative_id
on report_custom_records (creative_id);
create index date
on report_custom_records (date);
I am trying to create a procedure where my transfer table is joined to my account table. In my transfer table, there are two FK columns that reference the account table id column.
account table:
CREATE TABLE account (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(30) NOT NULL,
number VARCHAR(30) NOT NULL DEFAULT '',
description VARCHAR(255) NOT NULL DEFAULT '',
is_active BIT(1) NOT NULL DEFAULT b'1',
PRIMARY KEY (id),
UNIQUE account_name (name, number)
);
transfer table:
CREATE TABLE transfer (
id INT NOT NULL AUTO_INCREMENT,
date DATE NOT NULL,
from_account INT NULL,
to_account INT NULL,
amount DECIMAL(12, 2) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (from_account)
REFERENCES account(id),
FOREIGN KEY (to_account)
REFERENCES account(id)
);
get_account procedure:
CREATE PROCEDURE get_account()
SELECT a.*,
(SUM(t.amount) - SUM(f.amount)) AS balance
FROM account a
LEFT JOIN transfer f
ON a.id = f.from_account
LEFT JOIN transfer t
ON a.id = t.to_account
GROUP BY a.id;
I am trying to subtract the total of the from_account column from the total of the to_account column. I can get the sum of just one column, but when I try to get both, the result is NULL.
This seems like it should be easy, but I can't figure it out.
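A common fix, sketched here against SQLite with simplified tables (the derived-table aliases t_in/t_out and the COALESCE wrapping are my additions), is to aggregate each direction of transfer once in a subquery and then join the two totals to account. Joining transfer twice directly multiplies rows when an account has several transfers, and an account with no transfers on one side turns the whole expression NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE transfer (
  id INTEGER PRIMARY KEY,
  from_account INTEGER,
  to_account INTEGER,
  amount NUMERIC NOT NULL
);
INSERT INTO account (id, name) VALUES (1, 'checking'), (2, 'savings');
INSERT INTO transfer (from_account, to_account, amount) VALUES
  (1, 2, 100), (1, 2, 50), (2, 1, 25);
""")
rows = con.execute("""
SELECT a.id,
       COALESCE(t_in.total, 0) - COALESCE(t_out.total, 0) AS balance
FROM account a
LEFT JOIN (SELECT to_account, SUM(amount) AS total
           FROM transfer GROUP BY to_account) t_in
       ON t_in.to_account = a.id
LEFT JOIN (SELECT from_account, SUM(amount) AS total
           FROM transfer GROUP BY from_account) t_out
       ON t_out.from_account = a.id
ORDER BY a.id
""").fetchall()
print(rows)  # → [(1, -125), (2, 125)]
```

Each subquery produces at most one row per account, so the outer joins cannot inflate the sums, and COALESCE turns a missing side into 0 instead of NULL.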
We used to index our tables based on the WHERE clause. That worked fine in our MSSQL days, but now we are using MySQL and things are different: subqueries have terrible performance. Consider this table:
# 250K records per day
create table t_101(
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`transaction_date` datetime not null,
`memo_1` nvarchar(255) not null,
`memo_2` nvarchar(255) not null,
`product_id` bigint not null,
#many more columns
PRIMARY KEY (`id`),
key `index.t_101.101`(`transaction_date`, `product_id`, `memo_1`),
key `index.t_101.102`(`transaction_date`, `product_id`, `memo_2`)
)ENGINE=MyISAM;
A temporary table where I store condition values :
# 150 records
create temporary table `temporary.user.accessibleProducts`
(
product_id bigint not null,
PRIMARY KEY (`product_id`)
)ENGINE=MyISAM;
And this is the original query :
select
COUNT(a.id) as rowCount_la1,
COUNT(DISTINCT a.product_id) as productCount
from t_101 a
where a.transaction_date = '2017-05-01'
and a.product_id in(select xa.product_id from `temporary.user.accessibleProducts` xa)
and a.memo_1 <> '';
It takes 7 seconds to execute, while this query:
select
COUNT(a.id) as rowCount_la1,
COUNT(DISTINCT a.product_id) as productCount
from t_101 a
inner join `temporary.user.accessibleProducts` b on b.product_id = a.product_id
where a.transaction_date = '2017-05-01'
and a.memo_1 <> '';
takes 0.063 seconds to execute. Even though 0.063 seconds is acceptable, I'm worried about the indexing. Given the above, how do I index t_101 properly?
We are using MySQL 5.5.42.
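The join rewrite is safe here because product_id is the temporary table's primary key, so the join cannot duplicate rows of t_101. A minimal SQLite sketch (table contents and the shortened name accessible_products are mine) confirming the two forms return the same counts:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t_101 (
  id INTEGER PRIMARY KEY,
  transaction_date TEXT NOT NULL,
  product_id INTEGER NOT NULL,
  memo_1 TEXT NOT NULL
);
CREATE TABLE accessible_products (product_id INTEGER PRIMARY KEY);
INSERT INTO t_101 VALUES
  (1, '2017-05-01', 10, 'x'),
  (2, '2017-05-01', 10, ''),
  (3, '2017-05-01', 11, 'y'),
  (4, '2017-05-02', 10, 'z');
INSERT INTO accessible_products VALUES (10);
""")
q_in = """
SELECT COUNT(a.id), COUNT(DISTINCT a.product_id)
FROM t_101 a
WHERE a.transaction_date = '2017-05-01'
  AND a.product_id IN (SELECT product_id FROM accessible_products)
  AND a.memo_1 <> ''
"""
q_join = """
SELECT COUNT(a.id), COUNT(DISTINCT a.product_id)
FROM t_101 a
JOIN accessible_products b ON b.product_id = a.product_id
WHERE a.transaction_date = '2017-05-01'
  AND a.memo_1 <> ''
"""
# Only row 1 qualifies: right date, accessible product, non-empty memo_1.
assert con.execute(q_in).fetchone() == con.execute(q_join).fetchone() == (1, 1)
```

If the lookup table were not deduplicated by a primary key, IN and JOIN would no longer be interchangeable, since the join could count the same t_101 row more than once.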
I have a table with about 50M rows and format:
CREATE TABLE `big_table` (
`id` BIGINT NOT NULL,
`t1` DATETIME NOT NULL,
`a` BIGINT NOT NULL,
`type` VARCHAR(10) NOT NULL,
`b` BIGINT NOT NULL,
`is_c` BOOLEAN NOT NULL,
PRIMARY KEY (`id`),
INDEX `a_b_index` (a,b)
) ENGINE=InnoDB;
I then define the table t2, with no indices:
Create table `t2` (
`id` BIGINT NOT NULL,
`a` BIGINT NOT NULL,
`b` BIGINT NOT NULL,
`t1min` DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I then populate t2 using a query from big_table (this will add about 12M rows).
insert into t2
(id, a,b,t1min)
SELECT id,a,b,min(t1)
FROM big_table use index (a_b_index)
where type='SUBMIT' and is_c=1
GROUP BY a,b;
I find that it takes this query about a minute to process 5000 distinct (a,b) in big_table.
Since there are 12M distinct (a,b) in big_table, it would take about 40 hours to run the query on all of big_table.
What is going wrong?
If I just run the SELECT ..., the query processes 5000 rows in about 2s. If I run SELECT ... INTO OUTFILE ..., the query still takes 60s for 5000 rows.
EXPLAIN SELECT ... gives:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,stdnt_intctn_t,index,NULL,a_b_index,16,NULL,46214255,"Using where"
I found that the problem was that the GROUP BY resulted in too many random-access reads of big_table. The following strategy allows one sequential pass through big_table. First, we add a key to t2:
Create table `t2` (
`id` BIGINT NOT NULL,
`a` BIGINT NOT NULL,
`b` BIGINT NOT NULL,
`t1min` DATETIME NOT NULL,
PRIMARY KEY (a,b),
INDEX `id` (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then we fill t2 using:
insert into t2
(id, a,b,t1min)
SELECT id,a,b,t1
FROM big_table
where type='SUBMIT' and is_c=1
ON DUPLICATE KEY UPDATE
id=if(t1<t1min,big_table.id,t2.id),
t1min=if(t1<t1min,t1,t1min);
The resulting speed-up is several orders of magnitude.
The GROUP BY might be part of the issue. You are using an index on (a,b), but it does nothing for your WHERE clause. I would add an index on
(type, is_c, a, b)
Also, you are selecting id without saying which one; you probably want MIN(id) for a consistent result.
I have the following table structure:
CREATE TABLE a (
a_id int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (a_id)
);
CREATE TABLE b (
b_id int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (b_id)
);
CREATE TABLE `cross` (
a_id int(10) unsigned NOT NULL,
b_id int(10) unsigned NOT NULL,
PRIMARY KEY (a_id),
KEY (b_id),
CONSTRAINT FOREIGN KEY (a_id) REFERENCES a (a_id),
CONSTRAINT FOREIGN KEY (b_id) REFERENCES b (b_id)
);
CREATE TABLE prices (
a_id int(10) unsigned NOT NULL,
price int(10) NOT NULL,
PRIMARY KEY (a_id),
CONSTRAINT FOREIGN KEY (a_id) REFERENCES a (a_id)
);
I would like to retrieve every b_id value for which there are inconsistent prices. A b_id value 'B' has an inconsistent price if both of the following conditions hold:
There exist two a_id values (say, 'A1' and 'A2') such that table cross contains both ('A1', 'B') and ('A2', 'B'). (For any b_id value, there may be zero or more rows in cross.)
Either 'A1' and 'A2' correspond to rows of prices that have different values of price, or else exactly one of 'A1' and 'A2' corresponds to an entry in prices.
Because of restrictions by the hosting provider, I cannot use stored procedures with this database. I haven't figured out a sensible way to do this with plain SQL queries. So far I've resorted to retrieving all the relevant data and scanning for inconsistencies in Perl, which is a lot of data retrieval. Is there a better way? (I'm using InnoDB, if it makes a difference.)
/* Condition 1 and Condition 2a */
SELECT
c.b_id
FROM
`cross` AS c
JOIN prices AS p ON (p.a_id = c.a_id)
GROUP BY
c.b_id
HAVING
COUNT(c.a_id) > 1 AND
MAX(p.price) != MIN(p.price)
UNION
/* Condition 1 and Condition 2b */
SELECT
c.b_id
FROM
`cross` AS c
LEFT JOIN prices AS p ON (p.a_id = c.a_id)
GROUP BY
c.b_id
HAVING
COUNT(c.a_id) > 1 AND
SUM(IF(p.price IS NULL, 0 ,1)) = 1;
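On a small fixture the two-part UNION behaves as intended. A SQLite sketch (the sample rows are mine, and MySQL's IF(p.price IS NULL, 0, 1) is rewritten as SUM(p.price IS NOT NULL), which counts the same thing):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE `cross` (a_id INTEGER PRIMARY KEY, b_id INTEGER NOT NULL);
CREATE TABLE prices (a_id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
-- b_id 1: two a_ids priced differently       -> inconsistent (condition 2a)
-- b_id 2: two a_ids, only one of them priced -> inconsistent (condition 2b)
-- b_id 3: two a_ids with equal prices        -> consistent
INSERT INTO `cross` VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3);
INSERT INTO prices VALUES (1, 100), (2, 200), (3, 50), (5, 75), (6, 75);
""")
rows = con.execute("""
SELECT c.b_id
FROM `cross` AS c
JOIN prices AS p ON p.a_id = c.a_id
GROUP BY c.b_id
HAVING COUNT(c.a_id) > 1 AND MAX(p.price) <> MIN(p.price)
UNION
SELECT c.b_id
FROM `cross` AS c
LEFT JOIN prices AS p ON p.a_id = c.a_id
GROUP BY c.b_id
HAVING COUNT(c.a_id) > 1 AND SUM(p.price IS NOT NULL) = 1
ORDER BY 1
""").fetchall()
print(rows)  # → [(1, ), (2, )]
```

b_id 3 is filtered out by both branches: its prices agree, and both of its a_ids are priced.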