Confusion on creating index to improve join performance in mysql - mysql

I read many posts on forum, but still I have confusion on creating index to speed up join queries in mysql, here is my doubt
I have two tables, one is category table which just contains few thousand lines, and contains all information about data, and another one is geo_data table which contains huge amount of data, I join geo_data table based on 2 keys s_key1 and s_key2. Following is structure of table
category table
CREATE TABLE `category` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`STD_DATE` datetime DEFAULT NULL,
`LATITUDE` float DEFAULT NULL,
`LONGITUDE` float DEFAULT NULL,
`COUNTRY_CD` varchar(15) DEFAULT NULL,
`INSTR_CODE` varchar(15) DEFAULT NULL,
`CANADACR_CD` varchar(15) DEFAULT NULL,
`PROBST_T` varchar(15) DEFAULT NULL,
`TYPE` varchar(15) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=32350 DEFAULT CHARSET=latin1;
geo_data table
CREATE TABLE `geo_data` (
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`MAGNETIC` float DEFAULT NULL,
`GRAVITY` float DEFAULT NULL,
`BATHY` float DEFAULT NULL,
`CORE` float DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I have many tables like geo_data table which contains s_key1, s_key2 and other columns, in my application I often use fields std_date,latitude,longitude,country_cd,type from category table
I do inner join, sometimes left join depending on the requirement, for example my query looks like below
SELECT
c.s_key1,
c.s_key2,
c.std_date,
c.latitude,
c.longitude,
g.magnetic,
g.bathy
FROM
category c, geo_data g
WHERE
c.s_key1 = g.s_key1 && c.s_key2 = g.s_key2;
and sometimes my where clause will have something like this too
WHERE
c.latitude between -30 to 30 AND
c.longitude between 10 to 140 AND
c.country_cd = 'INDIA' AND
c.type = 'NON_PROFIT';
So what's the right way of creating index to speed up my query, whether below one right ? please someone help
create index `myindex` on
`category` (s_key1,s_key2,std_date,latitude,longitude,country_cd)
create index `myindex` on
`geo_data` (s_key1,s_key2)
and One more doubt whether both tables (category,geo_data) should have index key to speed up performance or only geo_data table ?

From the where condition it makes sense to simplify the first index as:
create index `myindex` on
`category` (s_key1,s_key2)
The original however can improve the performance in terms that it doesn't have to access the full table row to get the other values. However it makes the index bigger and therefore slower. So it depends on whether this is optimization for only this query or there are more of them which use only the s_key1 and s_key2 (or with combination with other columns).
Regarding the clarification - for lat/lng check it will make sense to move std_date after lat/lng (or remove completely):
create index `myindex` on
`category` (s_key1,s_key2,latitude,longitude,std_date,country_cd)

Related

Speed Up A Large Insert From Select Query With Multiple Joins

I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem that I'm having is that there are 2.3 million records I need to add to the new table and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed)
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works because I was able to run it using a filter on log_set_name however that filter's condition will only work for less than 1% of the total records because one of the values for log_set_name has 2.2 million records in it which is the majority of the records. So there is nothing else I can use to break this query up into smaller chunks from what I can see. The problem is that the query is taking too long to run on the rest of the 2.2 million records and it ends up timing out after a few hours and then the transaction is rolled back and nothing is added to the new table for the 2.2 million records; only the 0.1 million records were able to be processed and that was because I could add a filter that said where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once and perhaps I should populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that those will slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node. I am also using MySQL Workbench so I increased all of it's timeout values to their maximums giving them all nines. All three of these steps helped and were necessary in order for me to get the 1% of the records into the new table but it still wasn't enough to get the 2.2 million records without timing out. Appreciate any insights as I'm not adept to this type of bulk insert from a select.
'CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT ''1'',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1'
'CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1'
...
'CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'
'CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on and since it's text I cannot index it and the largest value is above 255 characters so I cannot change it to a vachar then index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT ''0'',
`validation_result` tinyint(1) NOT NULL DEFAULT ''0'',
`archived` tinyint(1) NOT NULL DEFAULT ''0'',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT ''0'',
`random_guid` tinyint(1) NOT NULL DEFAULT ''0'',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1'
baselog_offload_location has not PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://localhost/rjweb/mysql/doc.php/uuid ; (MySQL 8.0 has similar functions.)
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
Why JOIN baselog_offload_location? Three seems to be no reference to columns in that table. If there, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop

update table takes long time in mysql?

CREATE TABLE fa (
book varchar(100) DEFAULT NULL,
PRODUCTION varchar(1000) DEFAULT NULL,
VENDOR_LEVEL varchar(100) DEFAULT NULL,
BOOK_NO int(10) DEFAULT NULL,
UNSTABLE_TIME_PERIOD varchar(100) DEFAULT NULL,
`PERIOD_YEAR` int(10) DEFAULT NULL,
promo_3_visuals_manual_drag int(10) DEFAULT NULL,
BOOK_NO int(10) DEFAULT NULL,
PRODUCT_LEVEL_DIST varchar(100) DEFAULT NULL,
PRODUCT_LEVEL_ACV_TREND varchar(100) DEFAULT NULL,
KEY book (BOOK_NO),
KEY period (PERIOD_YEAR)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Index we added to column
Index : BOOK_NO and PERIODIC_YEAR has added
we cant add unique nor primary key to both column as it has plenty of duplicate values in it.
There are 46 millions rows.
We tried partitioning to period year and catno for sub partition, but doesn't worked as it is still takes long time
When i run the update query :
update fa set UNSTABLE_TIME_PERIOD = NULL where BOOK_NO = 0 and periodic_year = 201502;
It taking me more than 7 min , how can i OPTIMIZE the query?
Instead of creating 2 different keys, create single composite key for both the columns like:
KEY book_period (BOOK_NO, PERIOD_YEAR)
Also, first filter the records based on the column which will return the small set of records as compare to other.
If you think BOOK_NO will return less number of records as compare to PERIOD_YEAR, Use BOOK_NO first in where clause else use PERIOD_YEAR first and create the key accordingly.
As Álvaro González said, you should use some sort of key (eg. a Primary Key).
Adding a Primary Key:
CREATE TABLE fa (
<your_id>,
{...},
PRIMARY KEY(<your_id>),
{...}
)
or
CREATE TABLE fa (
<your_id> PRIMARY KEY,
{...}
)
It'd be a good idea to make your PRIMARY KEY AUTO_INCREMENT too for convenience, but this is not essenitial.

MySQL query with multiple joins taking too long to execute

I have 3 tables. The first one is called map_life, the second one is called scripts and the third one is called npc_data.
I'm running the following query to get all the properties from map_life while also getting the script column from scripts and the storage_cost column from npc_data if the ids match.
SELECT life.*
, script.script
, npc.storage_cost
FROM map_life life
LEFT
JOIN scripts script
ON script.objectid = life.lifeid
AND script.script_type = 'npc'
LEFT
JOIN npc_data npc
ON npc.npcid = life.lifeid
As you can see, map_life id is lifeid, while scripts id is objectid and npc_data id is npcid.
This query is taking about 5 seconds to execute, and I have no idea why. Here's the CREATE statements for all those 3 tables, maybe I'm missing something?
CREATE TABLE `mcdb83`.`map_life` (
`id` bigint(21) unsigned NOT NULL AUTO_INCREMENT,
`mapid` int(11) NOT NULL,
`life_type` enum('npc','mob','reactor') NOT NULL,
`lifeid` int(11) NOT NULL,
`life_name` varchar(50) DEFAULT NULL COMMENT 'For reactors, specifies a handle so scripts may interact with them; for NPC/mob, this field is useless',
`x_pos` smallint(6) NOT NULL DEFAULT '0',
`y_pos` smallint(6) NOT NULL DEFAULT '0',
`foothold` smallint(6) NOT NULL DEFAULT '0',
`min_click_pos` smallint(6) NOT NULL DEFAULT '0',
`max_click_pos` smallint(6) NOT NULL DEFAULT '0',
`respawn_time` int(11) NOT NULL DEFAULT '0',
`flags` set('faces_left') NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `lifetype` (`mapid`,`life_type`)
) ENGINE=InnoDB AUTO_INCREMENT=32122 DEFAULT CHARSET=latin1;
CREATE TABLE `mcdb83`.`scripts` (
`script_type` enum('npc','reactor','quest','item','map_enter','map_first_enter') NOT NULL,
`helper` tinyint(3) NOT NULL DEFAULT '-1' COMMENT 'Represents the quest state for quests, and the index of the script for NPCs (NPCs may have multiple scripts).',
`objectid` int(11) NOT NULL DEFAULT '0',
`script` varchar(30) NOT NULL DEFAULT '',
PRIMARY KEY (`script_type`,`helper`,`objectid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Lists all the scripts that belong to NPCs/reactors/etc. ';
CREATE TABLE `mcdb83`.`npc_data` (
`npcid` int(11) NOT NULL,
`storage_cost` int(11) NOT NULL DEFAULT '0',
`flags` set('maple_tv','is_guild_rank') NOT NULL DEFAULT '',
PRIMARY KEY (`npcid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
For this query:
SELECT l.*, s.script, npc.storage_cost
FROM map_life l LEFT JOIN
scripts s
ON s.objectid = l.lifeid AND
s.script_type = 'npc' LEFT JOIN
npc_data npc
ON npc.npcid = l.lifeid;
You want indexes on: scripts(object_id, script_type, script) and npc_data(npcid, storage_cost). The order of the columns in these indexes is important.
map_life.lifeid does not have any indexes defined, therefore the joins will result in full table scans. Define an index on map_life.lifeid field.
In scripts table the primary key is defined on the following fields in that order: script_type, helper, objectid. The join is done on objectid and there is a constant filter criterion on script_type. Because the order of the fields in the index is wrong, MySQL cannot use the primary key for this query. For this query the order of the fields in the index should b: objectid, script_type, helper.
The above will significantly speed up the joins. You may further increase the speed of the query if your indexes actually cover all fields that are in the query because in this case MySQL does not even have to touch the tables.
Consider adding an index with the following fields and order to the scripts table: object_id, script_type, script and npcid, storage_cost index to npc_data table. However, these indexes may slow down insert / update / delete statements, so do some performance testing before adding these indexes to production environment.

Simple select query takes more time in very large table in MySQL database in C# application

I am using a MySQL database in my ASP.NET with C# web application. The MySQL Server version is 5.7 and there is 8 GB RAM in the PC. When I am executing the select query in MySQL database table, it takes more time in execution; a simple select query takes around 42 seconds. Across 1 crorerecord (10 million records) in the table. I have also done indexing for the table. How can I fix this?
The following is my table structure.
CREATE TABLE `smstable_read` (
`MessageID` int(11) NOT NULL AUTO_INCREMENT,
`ApplicationID` int(11) DEFAULT NULL,
`Api_userid` int(11) DEFAULT NULL,
`ReturnMessageID` varchar(255) DEFAULT NULL,
`Sequence_Id` int(11) DEFAULT NULL,
`messagetext` longtext,
`adtextid` int(11) DEFAULT NULL,
`mobileno` varchar(255) DEFAULT NULL,
`deliverystatus` int(11) DEFAULT NULL,
`SMSlength` int(11) DEFAULT NULL,
`DOC` varchar(255) DEFAULT NULL,
`DOM` varchar(255) DEFAULT NULL,
`BatchID` int(11) DEFAULT NULL,
`StudentID` int(11) DEFAULT NULL,
`SMSSentTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTimeTicks` decimal(28,0) DEFAULT '0',
`SMSSentTimeTicks` decimal(28,0) DEFAULT '0',
`Sent_SMS_Day` int(11) DEFAULT NULL,
`Sent_SMS_Month` int(11) DEFAULT NULL,
`Sent_SMS_Year` int(11) DEFAULT NULL,
`smssent` int(11) DEFAULT '1',
`Batch_Name` varchar(255) DEFAULT NULL,
`User_ID` varchar(255) DEFAULT NULL,
`Year_ID` int(11) DEFAULT NULL,
`Date_Time` varchar(255) DEFAULT NULL,
`IsGroup` double DEFAULT NULL,
`Date_Time_Ticks` decimal(28,0) DEFAULT NULL,
`IsNotificationSent` int(11) DEFAULT NULL,
`Module_Id` double DEFAULT NULL,
`Doc_Batch` decimal(28,0) DEFAULT NULL,
`SMS_Category_ID` int(11) DEFAULT NULL,
`SID` int(11) DEFAULT NULL,
PRIMARY KEY (`MessageID`),
KEY `index2` (`ReturnMessageID`),
KEY `index3` (`mobileno`),
KEY `BatchID` (`BatchID`),
KEY `smssent` (`smssent`),
KEY `deliverystatus` (`deliverystatus`),
KEY `day` (`Sent_SMS_Day`),
KEY `month` (`Sent_SMS_Month`),
KEY `year` (`Sent_SMS_Year`),
KEY `index4` (`ApplicationID`,`SMSSentTimeTicks`),
KEY `smslength` (`SMSlength`),
KEY `studid` (`StudentID`),
KEY `batchid_studid` (`BatchID`,`StudentID`),
KEY `User_ID` (`User_ID`),
KEY `Year_Id` (`Year_ID`),
KEY `IsNotificationSent` (`IsNotificationSent`),
KEY `isgroup` (`IsGroup`),
KEY `SID` (`SID`),
KEY `SMS_Category_ID` (`SMS_Category_ID`),
KEY `SMSSentTimeTicks` (`SMSSentTimeTicks`)
) ENGINE=MyISAM AUTO_INCREMENT=16513292 DEFAULT CHARSET=utf8;
The following is my select query:
SELECT messagetext, SMSSentTime, StudentID, batchid,
User_ID,MessageID,Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch
FROM smstable_read
WHERE StudentID=977 AND SID = 8582 AND MessageID>16013282
You need to learn about compound indexes and covering indexes. Read about those things.
Your query is slow because it's doing a half-scan of the table. It uses the primary key to find the first row with a qualifying MessageID, then looks at every row of the table to find matching rows.
Your filter criteria are StudentID = constant, SID = constant AND MessageID > constant. That means you need those three columns, in that order, in an index. The first two filter criteria will random-access your index to the correct place. The third criterion will scan the index starting right after the constant value in your query. It's called an Index Range Scan operation, and it's quite efficient.
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId);
This compound index should make your query efficient. Notice that in MyISAM, the primary key column of a table should appear in compound indexes. That's cool in this case because it's also part of your query criteria.
If this query is used very frequently, you could make a covering index: you could add the other columns of the query (the ones mentioned in your SELECT clause) to the index.
But, unfortunately you have defined your messageText column with a longtext data type. That allows for each message to contain up to four gigabytes. (Why? Is this really SMS data? There's a limit of 160 bytes per message in SMS. Four gigabytes >> 160 bytes.)
Now the point of a covering index is to allow the query to be satisfied entirely from the index, without referring back to the table. But when you include a longtext or any other LOB column in an index, it only contains a subset of the data. So the point of the covering index is lost.
If I were you I would change my table so messageText was a VARCHAR(255) data type, and then create this covering index:
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId,
SMSSentTime, batchid,
User_ID, Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch,
messageText);
(Notice that you should put variable-length items last in the index if you can.)
If you can't change your application to handle VARCHAR(255) then go with the first index I mentioned.
Pro tip: putting lots of single-column indexes on MySQL tables rarely helps SELECT performance and always harms INSERT and UPDATE performance. You need an index on your primary key, and you need indexes to support the queries you run. Extra indexes are harmful.
It looks like your database is not properly indexed and even not properly normalized. Normalizing your database will go a long way to speed up all your queries. Particularly in view of the fact that mysql used only one index per table in a query. Even though you have lot's of indexes, they cannot be used.
Your current query filters on StudentID,SID, and MessageID. The last is an inequality comparision so an index will not be very effective with that but the other two columns are equality comparisons. I suggest an index like this:
KEY `studid` (`StudentID`,`SID`)
Follow that up by dropping your existing index on SID. If you find that you don't want to drop it because it's used in another query, further evidence that your table is in desperate need of normalization.
Too many indexes slow down inserts and adds a little overhead to each SELECT because the query planner needs more effort to figure out which index to use.

How to optimized mysql query having large dataset

I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table have 2.5 million records where as sent_log table has 0.25 million records.
I want to filter out the records from open log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is the query is taking too much time to execute. I've waited for an hour for the query completion but didn't succeed.
Kindly, suggest what I can do to make it fast since, I have the big data size in the tables.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM send_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sendlog_emailid_deliveryid on send_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
Exists can be faster than in.
If this request is often on, you can greatly increase it by create bigint id column, enven if it not unique.
For example you can put trigger and create column like this
alter table sent_log for_get bigint;
After that create trigger/ update it to put hash into that bigint
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
If you have such column in both table and index on it, query will be like
SELECT *
FROM open_log ol
left join send_log sl on sl.for_get=ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.