I have two MySQL tables, A and B. Table A has 44,902 rows and table B has 109,583 rows.
I would like to compare a column from each table and return the rows from table A where there is a match. My unsuccessful queries are:
SELECT pool.domain_name FROM `pool`, `en_dict` WHERE pool.domain_string = en_dict.word
and another variant:
SELECT a.domain_name FROM `pool` as a inner join en_dict as b on a.domain_string = b.word
Both queries failed to return any values within 300 seconds.
What should I do to reduce the time it takes to find the matches?
P.S. I have tried adding a LIMIT at the end of the queries and managed to display 10 results in 245 seconds.
Edit: My table structures are as follows:
--
-- Table structure for table `en_dict`
--
CREATE TABLE `en_dict` (
`word_id` bigint(20) unsigned zerofill NOT NULL AUTO_INCREMENT,
`word` varchar(35) NOT NULL,
PRIMARY KEY (`word_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=109584 ;
-- --------------------------------------------------------
--
-- Table structure for table `pool`
--
CREATE TABLE `pool` (
`domain_id` bigint(20) unsigned zerofill NOT NULL AUTO_INCREMENT,
`domain_name` varchar(100) NOT NULL,
`domain_tld` varchar(10) NOT NULL,
`domain_string` varchar(90) NOT NULL,
`domain_lenght` int(2) NOT NULL,
`domain_expiretime` date NOT NULL,
PRIMARY KEY (`domain_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=44903 ;
Try adding an index on the relevant columns of your tables:
ALTER TABLE `pool` ADD INDEX `domain_string_idx` (`domain_string`);
ALTER TABLE `en_dict` ADD INDEX `word_idx` (`word`);
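With these indexes, MySQL can look up matches instead of scanning both tables. To check that an index is actually being picked up, you can inspect the query plan; a minimal sketch (the exact EXPLAIN output varies by MySQL version):

EXPLAIN SELECT a.domain_name
FROM `pool` AS a
INNER JOIN `en_dict` AS b ON a.domain_string = b.word;

If the indexes are being used, the key column of the output should show word_idx or domain_string_idx for the joined table rather than NULL.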
Related
This question is more or less the same as this one: MySQL select rows that do not have matching column in other table; however, the solution there is not practical for large data sets.
This table has ~120,000 rows.
CREATE TABLE `tblTimers` (
`TimerID` int(11) NOT NULL,
`TaskID` int(11) NOT NULL,
`UserID` int(11) NOT NULL,
`StartDateTime` datetime NOT NULL,
`dtStopTime` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `tblTimers`
ADD PRIMARY KEY (`TimerID`);
ALTER TABLE `tblTimers`
MODIFY `TimerID` int(11) NOT NULL AUTO_INCREMENT;
This table has ~70,000 rows.
CREATE TABLE `tblWorkDays` (
`WorkDayID` int(11) NOT NULL,
`TaskID` int(11) NOT NULL,
`UserID` int(11) NOT NULL,
`WorkDayDate` date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `tblWorkDays`
ADD PRIMARY KEY (`WorkDayID`);
ALTER TABLE `tblWorkDays`
MODIFY `WorkDayID` int(11) NOT NULL AUTO_INCREMENT;
tblWorkDays should have one row per TaskID per UserID per WorkDayDate, but due to a bug, a few work days are missing even though there are timers for those days. So I am trying to create a report that shows any timer that does not have a work day associated with it.
SELECT A.TimerID FROM tblTimers A
LEFT JOIN tblWorkDays B ON A.TaskID = B.TaskID AND A.UserID = B.UserID AND DATE(A.StartDateTime) = B.WorkDayDate
WHERE B.WorkDayID IS NULL
Doing this causes the server to time out, so I am looking for a way to do this more efficiently.
You don't have any indexes on the columns you're joining on, so it has to do full scans of both tables. Try adding the following:
ALTER TABLE tblTimers ADD INDEX (TaskID, UserID);
ALTER TABLE tblWorkDays ADD INDEX (TaskID, UserID);
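Since the join also matches DATE(A.StartDateTime) against B.WorkDayDate, it may additionally help to extend the tblWorkDays index with the date column so the whole lookup is covered; this is an assumption to verify with EXPLAIN, not a guaranteed win (the DATE() call on the tblTimers side cannot use an index in any case):

ALTER TABLE tblWorkDays ADD INDEX (TaskID, UserID, WorkDayDate);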
I have a first table containing my IPs stored as integers (500k rows), and a second one containing ranges of blacklisted IPs and the reason for the blacklisting (10M rows).
Here are the table structures:
CREATE TABLE `black_lists` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`ip_start` INT(11) UNSIGNED NOT NULL,
`ip_end` INT(11) UNSIGNED NULL DEFAULT NULL,
`reason` VARCHAR(3) NOT NULL,
`excluded` TINYINT(1) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `ip_range` (`ip_end`, `ip_start`),
INDEX `ip_start` (`ip_start`),
INDEX `ip_end` (`ip_end`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=10747741
;
CREATE TABLE `ips` (
`id` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'Id ips',
`idhost` INT(11) NOT NULL COMMENT 'Id Host',
`ip` VARCHAR(45) NULL DEFAULT NULL COMMENT 'Ip',
`ipint` INT(11) UNSIGNED NULL DEFAULT NULL COMMENT 'Int ip',
`type` VARCHAR(45) NULL DEFAULT NULL COMMENT 'Type',
PRIMARY KEY (`id`),
INDEX `host` (`idhost`),
INDEX `index3` (`ip`),
INDEX `index4` (`idhost`, `ip`),
INDEX `ipsin` (`ipint`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=675651;
My problem is that when I try to run this query, no index is used and it takes an eternity to finish:
select i.ip,s1.reason
from ips i
left join black_lists s1 on i.ipint BETWEEN s1.ip_start and s1.ip_end;
I'm using MariaDB 10.0.16
True.
The optimizer has no knowledge that start..end values are non overlapping, nor anything else obvious about them. So, the best it can do is decide between
s1.ip_start <= i.ipint -- and use INDEX(ip_start), or
s1.ip_end >= i.ipint -- and use INDEX(ip_end)
Either of those could result in upwards of half the table being scanned.
In two steps you could achieve the desired goal for one IP; let's call it @ip:
SELECT ip_start, reason
FROM black_lists
WHERE ip_start <= @ip
ORDER BY ip_start DESC
LIMIT 1
But after that, you need to check whether the ip_end corresponding to that ip_start is >= @ip before deciding whether you have a blacklisted item.
SELECT reason
FROM ( ... ) a -- fill in the above query
JOIN black_lists b USING(ip_start)
WHERE b.ip_end >= @ip
That will either return the reason or no rows.
In spite of the complexity, it will be very fast. But you seem to have a set of IPs to check, which makes things more complex.
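For a set of IPs, one possible shape is to embed the two-step lookup as a correlated subquery; a sketch only, assuming the blacklist ranges do not overlap (the @ip placeholder above becomes each row's ipint):

SELECT i.ip, b.reason
FROM ips AS i
JOIN black_lists AS b
    ON b.ip_start = ( SELECT bl.ip_start
                      FROM black_lists AS bl
                      WHERE bl.ip_start <= i.ipint
                      ORDER BY bl.ip_start DESC
                      LIMIT 1 )
WHERE b.ip_end >= i.ipint;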
For black_lists, there seems to be no need for id. Suggest you replace the 4 indexes with only 2:
PRIMARY KEY(ip_start, ip_end),
INDEX(ip_end)
In ips, isn't ip unique? If so, get rid of id and change the 5 indexes to 3:
PRIMARY KEY(ipint),
INDEX(idhost, ip),
INDEX(ip)
You have allowed more than enough room in the VARCHAR for IPv6, but an INT UNSIGNED can only hold IPv4.
I read many posts on the forum, but I am still confused about creating indexes to speed up join queries in MySQL. Here is my doubt:
I have two tables. One is a category table which contains just a few thousand rows and holds all the information about the data; the other is a geo_data table which contains a huge amount of data. I join the geo_data table on two keys, s_key1 and s_key2. The table structures are as follows:
category table
CREATE TABLE `category` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`STD_DATE` datetime DEFAULT NULL,
`LATITUDE` float DEFAULT NULL,
`LONGITUDE` float DEFAULT NULL,
`COUNTRY_CD` varchar(15) DEFAULT NULL,
`INSTR_CODE` varchar(15) DEFAULT NULL,
`CANADACR_CD` varchar(15) DEFAULT NULL,
`PROBST_T` varchar(15) DEFAULT NULL,
`TYPE` varchar(15) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=32350 DEFAULT CHARSET=latin1;
geo_data table
CREATE TABLE `geo_data` (
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`MAGNETIC` float DEFAULT NULL,
`GRAVITY` float DEFAULT NULL,
`BATHY` float DEFAULT NULL,
`CORE` float DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I have many tables like the geo_data table that contain s_key1, s_key2 and other columns. In my application I often use the fields std_date, latitude, longitude, country_cd and type from the category table.
I do an inner join, and sometimes a left join depending on the requirement. For example, my query looks like this:
SELECT
c.s_key1,
c.s_key2,
c.std_date,
c.latitude,
c.longitude,
g.magnetic,
g.bathy
FROM
category c, geo_data g
WHERE
c.s_key1 = g.s_key1 && c.s_key2 = g.s_key2;
and sometimes my where clause will have something like this too
WHERE
c.latitude BETWEEN -30 AND 30 AND
c.longitude BETWEEN 10 AND 140 AND
c.country_cd = 'INDIA' AND
c.type = 'NON_PROFIT';
So what is the right way to create indexes to speed up my query? Is the one below right? Please, someone help.
create index `myindex` on
`category` (s_key1,s_key2,std_date,latitude,longitude,country_cd)
create index `myindex` on
`geo_data` (s_key1,s_key2)
And one more doubt: should both tables (category, geo_data) have index keys to speed up performance, or only the geo_data table?
From the where condition it makes sense to simplify the first index as:
create index `myindex` on
`category` (s_key1,s_key2)
The original, however, can improve performance in the sense that it doesn't have to access the full table row to get the other values. On the other hand, it makes the index bigger and therefore slower. So it depends on whether this is an optimization for this query only, or whether there are more queries that use just s_key1 and s_key2 (or those two in combination with other columns).
Regarding the clarification: for the lat/lng check it makes sense to move std_date after lat/lng (or remove it completely):
create index `myindex` on
`category` (s_key1,s_key2,latitude,longitude,std_date,country_cd)
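For the queries that filter only on the category table (the second WHERE clause above), a separate index with the equality columns first is the usual pattern, since a range predicate stops index seeking at that column, so latitude and longitude cannot both be used for seeking. A sketch, assuming country_cd and type are your common equality filters (the index name is illustrative):

create index `category_filter_idx` on
`category` (country_cd, type, latitude);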
I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table has 2.5 million records whereas the sent_log table has 0.25 million records.
I want to filter out the records from the open_log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is that the query takes too much time to execute. I waited an hour for it to complete but didn't succeed.
Kindly suggest what I can do to make it fast, since the tables contain a lot of data.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sentlog_emailid_deliveryid on sent_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
Exists can be faster than in.
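As a cross-check, the same result can be expressed as a plain inner join; a sketch that relies on (email_id, delivery_id) being unique in sent_log, as stated in the question, so the join cannot duplicate open_log rows:

SELECT ol.*
FROM open_log ol
JOIN sent_log sl
    ON sl.email_id = ol.email_id
    AND sl.delivery_id = ol.delivery_id;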
If you run this query often, you can speed it up considerably by creating a BIGINT hash column, even if it is not unique.
For example, you can add such a column and keep it filled with a trigger:
alter table sent_log add column for_get bigint;
After that, create a trigger (or run an update) to put a hash into that bigint:
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
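A minimal sketch of such a trigger (the trigger name is illustrative; the same expression can be used in an UPDATE to backfill existing rows):

CREATE TRIGGER sent_log_hash BEFORE INSERT ON sent_log
FOR EACH ROW
SET NEW.for_get = CONV(substr(md5(concat(NEW.email_id, NEW.delivery_id)),1,10),16,10);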
If you have such a column in both tables, with an index on it, the query will look like this:
SELECT *
FROM open_log ol
left join sent_log sl on sl.for_get = ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.
I want to reduce the time taken by the query in MySQL.
There are three tables say
A ~600k rows,
B ~2K rows,
C ~100K rows
having 2 columns each.
A has one column which is used in aggregation and other to join with table B.
B has one column to join with A and other with C
C has one column to join with B and other column to group by.
What should the indexing plan be to reduce the run time? As of now the query uses a temporary table and then a filesort. Is there any way we could avoid the temporary table?
Sample query:
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
`category_groups` AS `category_groups`,
`revenue_facts` AS `revenue_facts`,
`dim_products` AS `dim_products`
WHERE
`dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk` AND
`revenue_facts`.`product_sk` = `dim_products`.`product_sk`
GROUP BY `category_groups`.`category_name`;
I already have indexes on the group by column and on the join columns.
My query currently takes 6 minutes; I want to reduce that. The table structures are as follows:
Table A:
CREATE TABLE `revenue_facts` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_sk` bigint(20) unsigned NOT NULL,
`total_price` decimal(12,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `product_sk` (`product_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table B:
CREATE TABLE `dim_products` (
`product_sk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_category_group_sk` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`product_sk`),
KEY `product_category_group_index` (`product_category_group_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table C:
CREATE TABLE `category_groups` (
`product_category_group_sk` bigint(20) unsigned NOT NULL,
`category_sk` bigint(20) unsigned NOT NULL,
`category_name` varchar(255) NOT NULL,
PRIMARY KEY (`product_category_group_sk`,`category_sk`),
KEY `category_sk` (`category_sk`),
KEY `product_category_group_sk` (`product_category_group_sk`),
KEY `category_name` (`category_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The execution plan used is:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | dim_products | index | PRIMARY,product_category_group_index | product_category_group_index | 8 | NULL | 651264 | Using index; Using temporary; Using filesort
1 | SIMPLE | category_groups | ref | PRIMARY,category_sk,product_category_group_sk,category_name | product_category_group_sk | 8 | etl_testing.dim_products.product_category_group_sk | 4 | Using index
1 | SIMPLE | revenue_facts | ref | product_sk | product_sk | 8 | etl_testing.dim_products.product_sk | 5 | NULL
Try this:
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
(`dim_products` LEFT JOIN `category_groups` ON `dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk`)
LEFT JOIN `revenue_facts` ON `dim_products`.`product_sk` = `revenue_facts`.`product_sk`
GROUP BY `category_groups`.`category_name`;
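Another option worth trying, as a sketch not verified against this data set: pre-aggregate the revenue per product in a derived table so the final GROUP BY runs over far fewer rows:

SELECT cg.category_name, SUM(r.price_sum) AS m0
FROM ( SELECT product_sk, SUM(total_price) AS price_sum
       FROM revenue_facts
       GROUP BY product_sk ) AS r
JOIN dim_products dp ON dp.product_sk = r.product_sk
JOIN category_groups cg ON cg.product_category_group_sk = dp.product_category_group_sk
GROUP BY cg.category_name;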
Also, as Abdul said:
"Post your table structures and explain plan"