I have two tables with the following schema:
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table has 2.5 million records, whereas the sent_log table has 0.25 million.
I want to filter the records from the open_log table based on that unique key (email_id, delivery_id).
I'm using the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is that the query takes too long to execute; I waited an hour for it to complete, without success.
Kindly suggest what I can do to make it faster, given the amount of data in these tables.
Thanks,
Faisal Nasir
First, rewrite your query using EXISTS:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sentlog_emailid_deliveryid on sent_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The SELECT DISTINCT in the subquery is unnecessary.
EXISTS can be faster than IN.
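To confirm the rewrite actually benefits from the new index, EXPLAIN should show the index being used for the sent_log lookup; a quick check along these lines (actual output will vary):
EXPLAIN
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);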
If this query runs often, you can speed it up considerably by creating a BIGINT hash column, even if it is not unique.
For example, you can add the column and maintain it with a trigger:
alter table sent_log add column for_get bigint;
After that, create a trigger (and run an UPDATE over the existing rows) to fill that BIGINT with a hash such as:
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
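A rough sketch of how the pieces could fit together, assuming the for_get column has been added to both tables as described (the trigger and index names are only illustrative):
-- backfill existing rows in both tables
UPDATE sent_log SET for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);
UPDATE open_log SET for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);
-- keep the column populated for new rows (a matching trigger would be needed on open_log too)
CREATE TRIGGER trg_sent_log_for_get BEFORE INSERT ON sent_log
FOR EACH ROW
SET NEW.for_get = CONV(SUBSTR(MD5(CONCAT(NEW.email_id, NEW.delivery_id)), 1, 10), 16, 10);
-- index the hash in both tables
CREATE INDEX idx_sent_log_for_get ON sent_log (for_get);
CREATE INDEX idx_open_log_for_get ON open_log (for_get);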
If you have such a column in both tables, with an index on it, the query will look like this:
SELECT *
FROM open_log ol
left join sent_log sl on sl.for_get=ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.
Related
I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem I'm having is that there are 2.3 million records I need to add to the new table, and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed):
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works: I was able to run it using a filter on log_set_name. However, that filter only covers less than 1% of the total records, because one value of log_set_name accounts for 2.2 million records, the vast majority. From what I can see, there is nothing else I can use to break this query up into smaller chunks. The problem is that the query takes too long to run on the remaining 2.2 million records: it times out after a few hours, the transaction is rolled back, and nothing from those 2.2 million records is added to the new table. Only the 0.1 million records could be processed, and that was only because I could add a filter saying where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in their own individual queries? Or is there some way to page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that they slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node, and since I'm using MySQL Workbench I increased all of its timeout values to their maximums. All three of these steps helped and were necessary to get the 1% of records into the new table, but they still weren't enough to load the 2.2 million records without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on, and since it's text I cannot index it; the largest value is over 255 characters, so I cannot change it to a varchar and index that either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://mysql.rjweb.org/doc.php/uuid (MySQL 8.0 has similar functions.)
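For example, in MySQL 8.0 the built-in conversion functions can do this; a minimal sketch for one of the tables (the _bin column name is just an illustration):
ALTER TABLE log_trees ADD COLUMN baselog_index_guid_bin BINARY(16);
UPDATE log_trees SET baselog_index_guid_bin = UUID_TO_BIN(baselog_index_guid);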
It would probably be more efficient to have a separate (optionally redundant) column for vehicle rather than needing to compute
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
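If the path format is stable, one way to get that redundant vehicle column is a stored generated column (MySQL 5.7+); a sketch, with the exact column definition being an assumption:
ALTER TABLE baselog_and_amendment_guid_to_path_mappings
ADD COLUMN vehicle VARCHAR(20)
AS (SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1)) STORED;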
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
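One hedged sketch of that hash idea: store an indexed MD5 of the long text column and join through it, keeping the original comparison as a guard against collisions (MySQL 5.7+ generated columns assumed; column and index names are illustrative):
ALTER TABLE log_trees
ADD COLUMN original_location_md5 CHAR(32) AS (MD5(original_location)) STORED,
ADD INDEX idx_original_location_md5 (original_location_md5);
-- the join condition would then become something like:
-- ON baselog_trees.original_location_md5 = MD5(logset_logs.baselog_path)
-- AND baselog_trees.original_location = logset_logs.baselog_path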
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
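A hedged sketch of one iteration of that loop, chunking on the source table's auto-increment id (the batch size of 1000, the variable name, and the looping mechanics are assumptions; the loop itself would live in application code):
SET @lo := 0; -- advance by 1000 per iteration until it passes MAX(id)
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d'),
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
FROM (
SELECT id,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
SUBSTRING_INDEX(path, '/', 9) AS baselog_path,
index_guid, path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
WHERE id > @lo AND id <= @lo + 1000 -- the chunk
) logset_logs
LEFT JOIN database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
LEFT JOIN database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid;
SET @lo := @lo + 1000;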
I have an app built on NodeJS on AWS, with an associated MySQL RDS database (server class: db.r3.large, engine: InnoDB). We are having a performance problem: when we execute simultaneous queries, the database returns the results only after the last query has finished, not as each individual query finishes.
So, as an example: if we execute a process that runs 10 simultaneous queries of 3 seconds each, we start receiving the results at approximately 30 seconds, whereas we want to start receiving them when the first query finishes (at 3 seconds).
It seems that the database receives the queries and queues them.
I'm kind of lost here: I've changed several things in the code (separate connections, connection pools, etc.) and in the AWS settings, but nothing seems to improve the result.
TableA (13M records) schema:
CREATE TABLE `TableA` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(20) DEFAULT NULL,
`columnC` varchar(15) DEFAULT NULL,
`columnD` varchar(20) DEFAULT NULL,
`columnE` varchar(255) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
`columnH` varchar(10) DEFAULT NULL,
`columnI` bigint(11) DEFAULT NULL,
`columnJ` bigint(11) DEFAULT NULL,
`columnK` varchar(5) DEFAULT NULL,
`columnL` varchar(50) DEFAULT NULL,
`columnM` varchar(20) DEFAULT NULL,
`columnN` int(1) DEFAULT NULL,
`columnO` int(1) DEFAULT '0',
`columnP` datetime NOT NULL,
`columnQ` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnO` (`columnO`),
KEY `columnK` (`columnK`),
KEY `columnN` (`columnN`),
FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;
TableB (15M records) schema:
CREATE TABLE `TableB` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(50) DEFAULT NULL,
`columnC` varchar(50) DEFAULT NULL,
`columnD` int(1) DEFAULT NULL,
`columnE` datetime NOT NULL,
`columnF` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;
Query:
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
    SELECT TableB.columnC
    FROM TableB
    WHERE TableB.columnB = "3764301"
      AND TableB.columnC NOT IN (
          SELECT field
          FROM table
          WHERE table.field = 10
            AND TableB.columnC NOT IN (
                SELECT field
                FROM table
                WHERE table.field = 10
                  AND TableB.columnC NOT IN (
                      SELECT field
                      FROM table
                      WHERE table.field = 10
                        AND TableB.columnC NOT IN (
                            SELECT field
                            FROM table
                            WHERE table.field = 10
                        )
                  )
            )
      )
)
AND columnM > 2;
A single execution returns in 2 s.
10 executions return the first result at about 20 s, and the other results after that.
To see that the queries are running I'm using SHOW FULL PROCESSLIST, and most of the time the queries are in the "Sending data" state.
It is not a performance issue with the query itself; it is a problem of concurrent load on the database. Even a very simple query like SELECT COUNT(*) FROM TableA WHERE columnM = 5 has the same problem.
UPDATE
For testing purposes only, I reduced the query to a single subquery condition. Both queries return 65k records.
-- USING IN
SELECT COUNT(*) as total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableB.columnC NOT IN (
SELECT field
FROM tableX
WHERE fieldX = 15
)
)
AND columnM > 2;
-- USING EXISTS
SELECT COUNT(*) as total
FROM TableA
WHERE EXISTS (
SELECT *
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableA.columnB = TableB.columnC
AND NOT EXISTS (
SELECT *
FROM tableX
WHERE fieldX = 15
AND fieldY = TableB.columnC
)
)
AND columnM > 2;
-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)
Using IN or EXISTS, the problem is the same: when I execute this query many times, the database lags and the responses arrive only after a long delay.
Example: if one query responds in 1.7 sec, then when I execute it 10 times, the first result arrives in 20 sec.
Recommendation 1
Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (And you may need to change the WHERE clause a bit.)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
-->
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
table needs an index on field.
IN ( SELECT ... ) performs very poorly. EXISTS is much better optimized.
Recommendation 2
To deal with the concurrency, consider doing SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query. This may keep one connection from interfering with another.
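Spelled out, that would be run on the same connection immediately before the query (the SELECT here is just a stand-in for whichever query follows):
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) AS total FROM TableA WHERE columnM > 2;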
Recommendation 3
Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.
Recommendation 4
It might help for TableB to have a composite INDEX(ColumnB, ColumnC) in that order.
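For example (the index name is arbitrary):
ALTER TABLE TableB ADD INDEX idx_columnB_columnC (columnB, columnC);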
What I can see here is that a HUGE temporary table is being built for each query. Consider a different architecture.
I have read many posts on the forum, but I am still confused about creating indexes to speed up join queries in MySQL. Here is my doubt.
I have two tables: a category table, which contains only a few thousand rows and holds all the descriptive information about the data, and a geo_data table, which contains a huge amount of data. I join the geo_data table on the two keys s_key1 and s_key2. The table structures are as follows.
category table
CREATE TABLE `category` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`STD_DATE` datetime DEFAULT NULL,
`LATITUDE` float DEFAULT NULL,
`LONGITUDE` float DEFAULT NULL,
`COUNTRY_CD` varchar(15) DEFAULT NULL,
`INSTR_CODE` varchar(15) DEFAULT NULL,
`CANADACR_CD` varchar(15) DEFAULT NULL,
`PROBST_T` varchar(15) DEFAULT NULL,
`TYPE` varchar(15) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=32350 DEFAULT CHARSET=latin1;
geo_data table
CREATE TABLE `geo_data` (
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`MAGNETIC` float DEFAULT NULL,
`GRAVITY` float DEFAULT NULL,
`BATHY` float DEFAULT NULL,
`CORE` float DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I have many tables like the geo_data table that contain s_key1, s_key2 and other columns. In my application I often use the fields std_date, latitude, longitude, country_cd and type from the category table.
I do an inner join, sometimes a left join, depending on the requirement. For example, my query looks like this:
SELECT
c.s_key1,
c.s_key2,
c.std_date,
c.latitude,
c.longitude,
g.magnetic,
g.bathy
FROM
category c, geo_data g
WHERE
c.s_key1 = g.s_key1 && c.s_key2 = g.s_key2;
and sometimes my WHERE clause has something like this too:
WHERE
c.latitude BETWEEN -30 AND 30 AND
c.longitude BETWEEN 10 AND 140 AND
c.country_cd = 'INDIA' AND
c.type = 'NON_PROFIT';
So what's the right way to create indexes to speed up my query? Is the one below right? Please help.
create index `myindex` on
`category` (s_key1,s_key2,std_date,latitude,longitude,country_cd)
create index `myindex` on
`geo_data` (s_key1,s_key2)
One more doubt: should both tables (category, geo_data) have the index to speed up performance, or only the geo_data table?
From the WHERE condition it makes sense to simplify the first index to:
create index `myindex` on
`category` (s_key1,s_key2)
The original, however, can improve performance in that it doesn't have to access the full table row to get the other values. On the other hand, it makes the index bigger and therefore slower. So it depends on whether this is an optimization for only this query, or whether there are more queries that use only s_key1 and s_key2 (or in combination with other columns).
Regarding the clarification: for the lat/lng check it makes sense to move std_date after latitude/longitude (or remove it completely):
create index `myindex` on
`category` (s_key1,s_key2,latitude,longitude,std_date,country_cd)
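To see which index a given query actually ends up using once these are in place, EXPLAIN on a representative query helps; for instance (built from the queries above):
EXPLAIN
SELECT c.s_key1, c.s_key2, c.std_date, g.magnetic, g.bathy
FROM category c
JOIN geo_data g ON c.s_key1 = g.s_key1 AND c.s_key2 = g.s_key2
WHERE c.latitude BETWEEN -30 AND 30
AND c.longitude BETWEEN 10 AND 140
AND c.country_cd = 'INDIA'
AND c.type = 'NON_PROFIT';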
I am facing an issue with query execution. Here is my case:
I have two tables: log with 2 lakh (200,000) records and logrecords with 6 lakh (600,000) records.
A single record in the log table can have multiple log messages in the logrecords table. My database schema is as below.
log Table
CREATE TABLE `log` (
`logid` varchar(50) NOT NULL DEFAULT '',
`creationtime` bigint(20) DEFAULT NULL,
`serviceInitiatorID` varchar(50) DEFAULT NULL,
PRIMARY KEY (`logid`),
KEY `idx_creationtime_wsc_log` (`creationtime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And logrecords Table
CREATE TABLE `logrecords` (
`logrecordid` varchar(50) NOT NULL DEFAULT '',
`timestamp` bigint(20) DEFAULT NULL,
`message` varchar(8000) DEFAULT NULL,
`loglevel` int(11) DEFAULT NULL,
`logid` varchar(50) DEFAULT NULL,
`indexcolumn` int(11) DEFAULT NULL,
PRIMARY KEY (`logrecordid`),
KEY `indx_logrecordid_message_logid` (`logrecordid`,`message`(767),`logid`),
KEY `logid` (`logid`),
KEY `indx_message` (`message`(767))
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The query generated by Hibernate looks like this:
select this_.logid as logid4_1_, this_.loglevel as loglevel4_1_, this_.creationtime as creation3_4_1_,
this_.serviceInitiatorID as service17_4_1_, this_.logtype as logtype4_1_,
logrecord1_.logrecordid as logrecor1_3_0_, logrecord1_.timestamp as timestamp3_0_,
logrecord1_.message as message3_0_, logrecord1_.loglevel as loglevel3_0_,
logrecord1_.logid as logid3_0_, logrecord1_.indexcolumn as indexcol6_3_0_
from log this_
inner join wsc_logrecords logrecord1_ on this_.logid=logrecord1_.logid
where (1=1) and (1=1) and logrecord1_.message like 'SecondMessage'
order by this_.creationtime desc
limit 25
It takes around 7313 ms to execute.
Query Explain is like
But when I execute the query below, it takes around 15 minutes:
select count(*) as y0_
from log this_
inner join logrecords logrecord1_ on this_.logid=logrecord1_.logid
where (1=1) and (1=1) and lower(logrecord1_.message) like 'SecondMessage'
order by this_.creationtime desc
limit 25
For above query explain is like
I am using a MySQL database. I think there is some issue with the indexing, or something else that I am not able to identify.
Any solution will be appreciated.
When you use lower(logrecord1_.message) like 'SecondMessage' instead of plain logrecord1_.message like 'SecondMessage', the DB engine stops using the index on logrecord1_.message.
You can overcome this by creating a function-based index on lower(logrecord1_.message) in place of the one on logrecord1_.message.
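Note that MySQL only gained functional indexes in 8.0.13; on earlier versions the usual substitute is an indexed generated column. A hedged sketch, using a prefix because message is VARCHAR(8000) and index key lengths are limited (column and index names are illustrative, and the query or Hibernate mapping would then need to filter on the new column):
ALTER TABLE logrecords
ADD COLUMN message_lower VARCHAR(767) AS (LOWER(LEFT(message, 767))) STORED,
ADD INDEX idx_message_lower (message_lower);
-- ... where logrecord1_.message_lower like 'secondmessage' ...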
Here are my two MySQL queries. Can someone guide me as to which is the better one to use with a MySQL database?
Query 1:
select cast(sum(G1.amount)as decimal(8,2)) as YTDRegularPay,cast(sum(b1.amount)as decimal(8,2))as YTDBonusPay
from tbl_employees_swc_grosswagedetails g1,tbl_employees_swc_grosswagedetails b1
where g1.empid=b1.empid
and g1.PayYear=b1.PayYear
and g1.PayperiodNumber=b1.PayperiodNumber
and g1.Fedtaxid=b1.Fedtaxid
and g1.fedtaxid=998899889
and g1.payyear=2011
and g1.PayperiodNumber<=26
and g1.Wage_code='GRTT'
and g1.Taxing_AuthType=b1.Taxing_AuthType
and g1.empid=1005 and b1.wage_code='GRSP'
and g1.taxing_AuthType='FED' ;
Query 2:
select abc.Amount as YTDRegularPay,def.Amount as YTDBonusPay
from (select Cast(sum(EG.Amount) as Decimal(8,2)) as Amount
from tbl_employees_swc_grosswagedetails EG
where EG.FedTaxID=998899889
and EG.EmpID=1005
and PayYear=2011
and EG.PayPeriodNumber<=26
and EG.Wage_code='GRTT'
and Taxing_AuthType='FED') as abc,
(select Cast(sum(EG.Amount) as Decimal(8,2)) as Amount
from tbl_employees_swc_grosswagedetails EG
where EG.FedTaxID=998899889
and EG.EmpID=1005
and PayYear=2011
and EG.PayPeriodNumber<=26
and EG.Wage_code='GRSP'
and Taxing_AuthType='FED') as def ;
Here is my table structure:
delimiter $$
CREATE TABLE `tbl_employees_swc_grosswagedetails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`empid` int(11) NOT NULL,
`Fedtaxid` varchar(9) NOT NULL,
`Wage_code` varchar(45) NOT NULL,
`Amount` double NOT NULL,
`Hrly_Rate` double DEFAULT NULL,
`Num_hours` double DEFAULT NULL,
`Taxing_AuthType` varchar(10) DEFAULT NULL,
`Taxing_Auth_Name` varchar(10) DEFAULT NULL,
`PayperiodNumber` int(11) NOT NULL,
`PayYear` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `empid` (`empid`),
CONSTRAINT `empid` FOREIGN KEY (`empid`) REFERENCES `tblemployee` (`EmpID`)
ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=359 DEFAULT CHARSET=latin1$$
Any better query than these two would be very much appreciated.
Thanks in advance,
Raghavendra.V
I would say the first one is better, since using JOIN is almost always better than using a subquery. It is also recommended to write the JOIN explicitly (though it does not matter in terms of performance), like this:
SELECT
CAST(SUM(G1.amount) AS decimal(8,2)) AS YTDRegularPay,
CAST(SUM(b1.amount) AS decimal(8,2)) AS YTDBonusPay
FROM
tbl_employees_swc_grosswagedetails g1
JOIN
tbl_employees_swc_grosswagedetails b1 ON g1.empid = b1.empid
AND g1.PayYear = b1.PayYear
AND g1.PayperiodNumber = b1.PayperiodNumber
AND g1.Taxing_AuthType = b1.Taxing_AuthType
AND g1.Fedtaxid = b1.Fedtaxid
WHERE
g1.fedtaxid = 998899889
AND g1.payyear = 2011
AND g1.PayperiodNumber <= 26
AND g1.Wage_code = 'GRTT'
AND b1.wage_code = 'GRSP'
AND g1.empid = 1005
AND g1.taxing_AuthType = 'FED';
Adding some indexes will probably help as well to make both queries quicker. Since you use many columns in your WHERE clause, you need to choose which ones to index according to the data structure. Try adding a bunch of indexes, run the query with EXPLAIN, and see which index is used; that one will be the most effective, and you can then drop the others.
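One hedged starting point for such an index, based only on the WHERE clause above (equality columns first, the range on PayperiodNumber last; the name and exact column order are worth verifying with EXPLAIN):
ALTER TABLE tbl_employees_swc_grosswagedetails
ADD INDEX idx_gross_lookup (empid, Fedtaxid, PayYear, Wage_code, Taxing_AuthType, PayperiodNumber);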