Trying to optimize MySQL query with LEFT OUTER JOIN - mysql

I have this query, which works fine except that it takes a long time (7 seconds, with 40k records in the jobs table and 700k in the wq table).
I ran an EXPLAIN, and it says it's looking at all the records in the jobs table and not using any of the indexes.
I don't know how to tell MySQL that it should use the jobs.status field to filter the records before looking up the wq table.
The objective is to get all the records from jobs that have a status != 331, and also any other job which has a wq status of (101, 111, 151).
Query:
SELECT jobs.*
FROM jobs
LEFT OUTER JOIN wq ON (wq.job = jobs.id AND jobs.status IN (341, 331) AND wq.status IN (101, 111, 151))
WHERE ((wq.info is not NULL) or (jobs.status != 331 and ack = 0))
EXPLAIN output:
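+----+-------------+-------+------+-----------------------+---------+---------+-------------+-------+----------------+
| id | select_type | table | type | possible_keys         | key     | key_len | ref         | rows  | Extra          |
+----+-------------+-------+------+-----------------------+---------+---------+-------------+-------+----------------+
| 1  | SIMPLE      | jobs  | ALL  | ack,status,status_ack | NULL    | NULL    | NULL        | 38111 | Using filesort |
| 1  | SIMPLE      | wq    | ref  | PRIMARY,job,status    | PRIMARY | 4       | cts.jobs.id | 20    | Using where    |
+----+-------------+-------+------+-----------------------+---------+---------+-------------+-------+----------------+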
Table definitions:
CREATE TABLE jobs ( id int(10) NOT NULL AUTO_INCREMENT,
comment varchar(100) NOT NULL DEFAULT '',
profile varchar(60) NOT NULL DEFAULT '',
start_at int(10) NOT NULL DEFAULT '0',
data text NOT NULL,
status int(10) NOT NULL DEFAULT '0',
info varchar(200) NOT NULL DEFAULT '',
finish int(10) NOT NULL DEFAULT '0',
priority int(5) NOT NULL DEFAULT '0',
ack tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (id),
KEY start_at (start_at),
KEY status (status),
KEY status_ack (status, ack)
) ENGINE=MyISAM AUTO_INCREMENT=2037530 DEFAULT CHARSET=latin1;
CREATE TABLE wq ( job int(10) NOT NULL DEFAULT '0',
process varchar(60) NOT NULL DEFAULT '',
step varchar(60) NOT NULL DEFAULT '',
status int(10) NOT NULL DEFAULT '0',
run_at int(10) NOT NULL DEFAULT '0',
original_run_at int(10) NOT NULL DEFAULT '0',
info varchar(200) NOT NULL DEFAULT '',
pos int(10) NOT NULL DEFAULT '0',
changed_at int(10) NOT NULL DEFAULT '0',
file varchar(60) NOT NULL DEFAULT '',
PRIMARY KEY (job, process, step, file),
KEY job (job),
KEY status (status)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Unfortunately MySQL (and perhaps any DBMS) cannot optimize expressions like jobs.status != 331 and ack = 0, because a B-tree is not a structure that can quickly find everything that is not equal to a constant value. Thus you'll always get a full scan.
If there were a more selective condition, like jobs.status = 331 and ack = 0 (note that I've changed != to =), then this would be my advice to speed up the query:
split the query into 2, joined by UNION ALL
in one of the two queries (the one that implies that wq.info is not NULL), replace LEFT JOIN with INNER JOIN
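The two steps above could look roughly like the following. This is only a sketch: it uses the hypothetical equality condition from this answer (status = 331) rather than the original != 331, and the NOT EXISTS guard is an added assumption so that a job already returned by the first branch is not returned a second time.

```sql
-- Branch 1: jobs that have a matching wq row.
-- A matched wq row always has wq.info NOT NULL, so INNER JOIN replaces LEFT JOIN.
SELECT jobs.*
FROM jobs
INNER JOIN wq ON wq.job = jobs.id
WHERE jobs.status IN (341, 331)
  AND wq.status IN (101, 111, 151)

UNION ALL

-- Branch 2: jobs selected by their own columns only.
SELECT jobs.*
FROM jobs
WHERE jobs.status = 331          -- hypothetical; the original query has != 331
  AND jobs.ack = 0
  AND NOT EXISTS (               -- assumption: skip jobs already returned by branch 1
        SELECT 1
        FROM wq
        WHERE wq.job = jobs.id
          AND wq.status IN (101, 111, 151)
      );
```

With an equality on status and ack, the second branch can use the existing status_ack index; with the original != 331 it would most likely still scan the whole jobs table.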

Related

Why update mysql query run slow

I have a SQL table "STG_S_CUST" which contains a lot of rows (up to 1.5 million) and another table "S_CUST" which also contains a lot of rows.
When I execute the following UPDATE query, it is very slow.
UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST ON STG.SRC_NM=ST.SRC_NM
AND STG.SRC_KEY = ST.SRC_KEY
SET UPDATE_IND = 1,
STG.S_ID = ST.S_ID,
STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0
The problem is that I get a timeout exception (unable to execute SQL).
EXPLAIN UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST ON STG.SRC_NM=ST.SRC_NM
AND STG.SRC_KEY = ST.SRC_KEY
SET UPDATE_IND = 1,
STG.S_ID = ST.S_ID,
STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0
Result:
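+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+
| 1  | SIMPLE      | ST    | ALL  | NULL          | NULL | NULL    | NULL | 10479  | NULL                                               |
| 1  | SIMPLE      | STG   | ALL  | NULL          | NULL | NULL    | NULL | 159334 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------------------------------------------+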
Here's an abbreviated version of the CREATE TABLE statements:
STG_S_CUST :
CREATE TABLE `STG_S_CUST` (
`STG_ID` int(14) NOT NULL AUTO_INCREMENT,
`STG_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`SRC_KEY` varchar(100) DEFAULT NULL,
`SRC_NM` varchar(20) DEFAULT NULL,
`M_ID` int(14) DEFAULT NULL,
`S_ID` int(14) DEFAULT NULL,
`PROCESSED_IND` int(1) NOT NULL DEFAULT '0',
`THREAD_ID` int(3) DEFAULT NULL,
`UPDATE_IND` int(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`STG_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=171998 DEFAULT CHARSET=latin1
S_CUST :
CREATE TABLE `S_CUST` (
`S_ID` int(14) NOT NULL AUTO_INCREMENT,
`SRC_KEY` varchar(100) DEFAULT NULL,
`SRC_NM` varchar(20) DEFAULT NULL,
`M_ID` int(14) DEFAULT NULL,
PRIMARY KEY (`S_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=10803 DEFAULT CHARSET=latin1
Does anyone have any ideas why this would be so slow and how to speed it up ?
Could anyone help me here for optimization?
You need some indexes to make the select part of the join update faster. Start by adding the following indexes:
alter table STG_S_CUST add index PROCESSED_IND_idx(PROCESSED_IND);
alter table STG_S_CUST add index SRC_idx(SRC_NM,SRC_KEY);
alter table S_CUST add index SRC_NM_idx(SRC_NM);
Take a backup of the tables before adding the indexes.
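Since the join compares both SRC_NM and SRC_KEY, a composite index on S_CUST may serve the join better than an index on SRC_NM alone. The sketch below is a variation on the advice above, not part of it; the index name SRC_NM_KEY_idx is made up for illustration. Re-running the EXPLAIN from the question afterwards should show whether the optimizer picks the new indexes up.

```sql
-- Hypothetical index name; covers both columns used in the join condition.
ALTER TABLE S_CUST ADD INDEX SRC_NM_KEY_idx (SRC_NM, SRC_KEY);

-- Check the plan again: ideally neither table shows type = ALL any more.
EXPLAIN UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST
        ON STG.SRC_NM = ST.SRC_NM
       AND STG.SRC_KEY = ST.SRC_KEY
SET STG.UPDATE_IND = 1,
    STG.S_ID = ST.S_ID,
    STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0;
```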

Mysql optimizing query

I have an existing MySQL query and am wondering how to improve it, because it sometimes takes up to 20 s to execute.
In fact, it takes about 0.3690 s to find the right records, but fetching 40k records takes up to 20 s.
So my question is: how can I improve my settings or my SQL code to get the records faster? Or does it now depend only on my machine (e.g. a SAS hard drive)?
First some necessary info:
My application uses MySQL Server 5.6 and the InnoDB engine.
My custom settings:
innodb_buffer_pool_size = 7G
innodb_log_buffer_size = 64M
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 0
innodb_write_io_threads = 32
join_buffer_size = 32M
tmp_table_size = 128M
max_heap_table_size = 128M
sort_buffer_size = 128M
table_open_cache = 4000
bulk_insert_buffer_size = 256M
Table definitions:
CREATE TABLE `tblusers` (
`user_id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(100) NOT NULL,
`password` varchar(50) NOT NULL,
`user_name` varchar(45) NOT NULL,
`phone` varchar(12) NOT NULL,
`machine_id` int(11) NOT NULL DEFAULT '1',
`lang_id` int(11) NOT NULL DEFAULT '14',
`user_type` tinyint(4) NOT NULL DEFAULT '1' ,
`created_on` datetime NOT NULL,
`active_open` tinyint(4) NOT NULL DEFAULT '0' ,
`email_hash` varchar(50) NOT NULL DEFAULT '1',
`profile_approved` tinyint(4) NOT NULL DEFAULT '0',
`menage_data` tinyint(4) NOT NULL DEFAULT '0' ,
`mailing_agree` tinyint(4) NOT NULL DEFAULT '0' ,
`edited` tinyint(4) NOT NULL DEFAULT '0',
`deleted` tinyint(4) NOT NULL DEFAULT '0',
`warnings` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`user_id`),
UNIQUE KEY `email_UNIQUE` (`email`),
UNIQUE KEY `user_name_UNIQUEE` (`user_name`),
KEY `fk_tblUsers_hlpLangs1_idx` (`lang_id`),
KEY `email_hash` (`email_hash`),
KEY `trio` (`user_type`,`profile_approved`,`deleted`,`email_hash`),
CONSTRAINT `tblusers_ibfk_1` FOREIGN KEY (`lang_id`) REFERENCES `hlplangs` (`lang_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `tblhostess` (
`user_id` int(11) NOT NULL,
`first_name` varchar(45) NOT NULL,
`sure_name` varchar(45) DEFAULT NULL,
`gender` tinyint(4) NOT NULL DEFAULT '0' ,
`dob` datetime NOT NULL,
`driver_license` tinyint(4) NOT NULL DEFAULT '0',
`sanepid` tinyint(4) NOT NULL DEFAULT '0',
`city_id` int(11) NOT NULL,
`province_id` int(11) NOT NULL,
`picture_id` int(11) NOT NULL DEFAULT '1',
`hair_id` int(11) NOT NULL DEFAULT '1',
`hair_color_id` int(11) NOT NULL DEFAULT '1',
`number_of_view` int(11) NOT NULL DEFAULT '0',
`who_can_see` tinyint(4) NOT NULL DEFAULT '0' ,
`complete_register` tinyint(4) NOT NULL DEFAULT '0',
`skin_id` int(11) NOT NULL DEFAULT '1',
`bra_id` int(11) NOT NULL DEFAULT '1',
`wear_id` int(11) NOT NULL DEFAULT '1',
`shoe_id` int(11) NOT NULL DEFAULT '1',
`desc` text,
`height` int(11) NOT NULL DEFAULT '0',
`weight` int(11) NOT NULL DEFAULT '0',
`bust` int(11) NOT NULL DEFAULT '0',
`waist` int(11) NOT NULL DEFAULT '0',
`hip` int(11) NOT NULL DEFAULT '0',
`redirect_url` varchar(255) DEFAULT NULL,
`friend_url` varchar(90) DEFAULT NULL,
`premium` tinyint(4) NOT NULL DEFAULT '0',
`premium_until` datetime DEFAULT NULL,
`work_as_model` tinyint(4) NOT NULL DEFAULT '0',
`work_as_hostess` tinyint(4) NOT NULL DEFAULT '1',
`work_as_fotomodel` tinyint(4) NOT NULL DEFAULT '0',
`work_in_club` tinyint(4) NOT NULL DEFAULT '0',
`work_in_party` tinyint(4) NOT NULL DEFAULT '0',
`work_in_promo` tinyint(4) NOT NULL DEFAULT '0',
`work_in_trade` tinyint(4) NOT NULL DEFAULT '0',
`work_in_event` tinyint(4) NOT NULL DEFAULT '0',
`work_in_gala` tinyint(4) NOT NULL DEFAULT '0',
`phone_ver` tinyint(4) NOT NULL DEFAULT '0',
`cert` tinyint(4) NOT NULL DEFAULT '0',
`fb_premium` tinyint(4) NOT NULL DEFAULT '0' ,
PRIMARY KEY (`user_id`),
KEY `fk_tblHostess_tblCities1_idx` (`city_id`),
KEY `fk_tblHostess_hlpProvinces1_idx` (`province_id`),
KEY `fk_tblHostess_hlpHairColor1_idx` (`hair_color_id`),
KEY `fk_tblHostess_hlpHair1_idx` (`hair_id`),
KEY `fk_tblHostess_hlpShoes1_idx` (`shoe_id`),
KEY `fk_tblHostess_hlpBra1_idx` (`bra_id`),
KEY `fk_tblHostess_hlpWear1_idx` (`wear_id`),
KEY `fk_tblHostess_hlpSkinColor1_idx` (`skin_id`),
KEY `premium` (`premium`),
KEY `num_of_views` (`number_of_view`),
KEY `views_premium` (`number_of_view`,`premium`),
CONSTRAINT `tblhostess_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `tblusers` (`user_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `tblhostess_ibfk_2` FOREIGN KEY (`city_id`) REFERENCES `hlpcities` (`city_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `tblhostess_ibfk_3` FOREIGN KEY (`province_id`) REFERENCES `hlpprovinces` (`province_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tblhostessmailings` (
`hostess_id` int(11) NOT NULL,
`new_job_offers` int(11) NOT NULL DEFAULT '2' ,
`comments` int(11) NOT NULL DEFAULT '1' ,
`job_offer_accept` int(11) NOT NULL DEFAULT '1',
`private_message` int(11) NOT NULL DEFAULT '1' ,
`job_offer_sms` int(11) NOT NULL DEFAULT '0' ,
`job_offer_private` int(11) NOT NULL DEFAULT '0' ,
PRIMARY KEY (`hostess_id`),
CONSTRAINT `tblhostessmailings_ibfk_1` FOREIGN KEY (`hostess_id`) REFERENCES `tblhostess` (`user_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tbljoboffers` (
`offer_id` int(11) NOT NULL AUTO_INCREMENT,
`employer_id` int(11) NOT NULL,
`subject` varchar(250) NOT NULL,
`content` text NOT NULL,
`content_html` text,
`date_added` datetime NOT NULL,
`active` tinyint(3) unsigned NOT NULL DEFAULT '1' ,
`approved` tinyint(3) unsigned NOT NULL DEFAULT '0',
`edited` tinyint(3) unsigned NOT NULL DEFAULT '0' ,
`email` varchar(100) DEFAULT NULL,
`freqence_id` int(11) NOT NULL DEFAULT '1' ,
`premium` tinyint(3) unsigned NOT NULL DEFAULT '0' ,
`start_date` datetime DEFAULT NULL ,
`end_date` datetime DEFAULT NULL ,
`premium_old_user` tinyint(3) unsigned NOT NULL DEFAULT '0',
`sending` tinyint(3) unsigned NOT NULL DEFAULT '0' ,
`external_sent` tinyint(3) unsigned NOT NULL DEFAULT '0' ,
`internal_sent` tinyint(3) unsigned NOT NULL DEFAULT '0',
`archiwal` tinyint(3) unsigned NOT NULL DEFAULT '0',
`friend_url` varchar(250) DEFAULT NULL,
`deleted` tinyint(3) unsigned NOT NULL DEFAULT '0',
`to_export` tinyint(3) unsigned NOT NULL DEFAULT '0',
`exported` tinyint(3) unsigned NOT NULL DEFAULT '0',
`sms_sent` tinyint(3) unsigned NOT NULL DEFAULT '0',
`sms_sending` tinyint(3) unsigned NOT NULL DEFAULT '0' ,
`private` tinyint(4) NOT NULL DEFAULT '0' ,
`private_paid` tinyint(4) NOT NULL DEFAULT '0' ,
PRIMARY KEY (`offer_id`,`freqence_id`),
KEY `fk_tblJoboffers_tblEmployers1_idx` (`employer_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `tbljobofferslocations` (
`location_id` int(11) NOT NULL AUTO_INCREMENT,
`offer_id` int(11) NOT NULL,
`city_id` int(11) NOT NULL,
`province_id` int(11) NOT NULL,
`ref_code` varchar(100) DEFAULT NULL,
`display_times` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`location_id`),
UNIQUE KEY `offer_id` (`offer_id`,`city_id`,`province_id`),
KEY `fk_tblJobOffersLocations_hlpProvinces1_idx` (`province_id`),
KEY `fk_tblJobOffersLocations_tblCities1_idx` (`city_id`),
KEY `fk_tblJobOffersLocations_tblJobOffers1_idx` (`offer_id`),
CONSTRAINT `tbljobofferslocations_ibfk_1` FOREIGN KEY (`offer_id`) REFERENCES `tbljoboffers` (`offer_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `tbljobofferslocations_ibfk_2` FOREIGN KEY (`city_id`) REFERENCES `hlpcities` (`city_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `tbljobofferslocations_ibfk_3` FOREIGN KEY (`province_id`) REFERENCES `hlpprovinces` (`province_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Approximate record counts in the database:
users: 100k
job offers: around 20k (all)
locations: around 40k
And finally my SQL example:
select email, first_name, sure_name
from tblusers u join tblhostess h on h.user_id = u.user_id
join tblhostessmailings m on m.hostess_id = h.user_id
join
(select province_id from tbljoboffers o join tbljobofferslocations l on l.offer_id = o.offer_id
where o.deleted = 0 and o.start_date > date_add(current_timestamp, INTERVAL -1 DAY) and o.approved = 1 and o.active = 1 and o.internal_sent = 1 and o.private = 0
group by l.province_id) z on z.province_id = h.province_id
where u.deleted = 0 and u.email_hash = '1' and email != '' and user_type = 1 and (m.new_job_offers = 2)
What I want to get from here is a list of all users who received at least one offer in their internal mailbox within the last day.
All jobs are divided by province, so this
(select province_id from tbljoboffers o join tbljobofferslocations l on l.offer_id = o.offer_id
where o.deleted = 0 and o.start_date > date_add(current_timestamp, INTERVAL -1 DAY) and o.approved = 1 and o.active = 1 and o.internal_sent = 1 and o.private = 0
group by l.province_id)
gets the full list of regions, and the rest gets the information about the users.
Sometimes I need to retrieve 40k records with this query, so I really need to improve it.
Thanks in advance.
I would look at the subquery you are joining by itself.
select province_id from tbljoboffers o join tbljobofferslocations l on l.offer_id = o.offer_id
where o.deleted = 0 and o.start_date > date_add(current_timestamp, INTERVAL -1 DAY) and o.approved = 1 and o.active = 1 and o.internal_sent = 1 and o.private = 0
group by l.province_id
The date comparison in the WHERE clause is likely part of the slowdown. How long does that subquery take to run on its own? A greater-than comparison against a date_add function will likely be slow. But without changing the query itself, you might gain some performance by simply making that subquery into a view, which you can then join back to the main query.
CREATE VIEW vprovince_ids AS
select province_id from tbljoboffers o join tbljobofferslocations l on l.offer_id = o.offer_id
where o.deleted = 0 and o.start_date > date_add(current_timestamp, INTERVAL -1 DAY) and o.approved = 1 and o.active = 1 and o.internal_sent = 1 and o.private = 0
group by l.province_id
Then your main query would become:
select email, first_name, sure_name
from tblusers u join tblhostess h on h.user_id = u.user_id
join tblhostessmailings m on m.hostess_id = h.user_id
join vprovince_ids z on z.province_id = h.province_id
where u.deleted = 0 and u.email_hash = '1' and email != '' and user_type = 1 and (m.new_job_offers = 2)
Hopefully that helps!
I've slightly restructured the query for readability and to follow the explanation of how I would think about and apply indexes to the respective tables.
select
u.email,
h.first_name,
h.sure_name
from
tblusers u
join tblhostess h
on u.user_id = h.user_id
join tblhostessmailings m
on h.user_id = m.hostess_id
and m.new_job_offers = 2
join (
select DISTINCT
l.province_id
from
tbljoboffers o
join tbljobofferslocations l
on o.offer_id = l.offer_id
where
o.deleted = 0
and o.approved = 1
and o.active = 1
and o.internal_sent = 1
and o.private = 0
and o.start_date > date_add( current_timestamp, INTERVAL -1 DAY)
) z
on h.province_id = z.province_id
where
u.deleted = 0
and u.email_hash = '1'
and u.user_type = 1
and u.email != ''
First, it is best to qualify every column in your queries with its table (or alias), so that others reading them later, or offering help, don't have to guess which table each column comes from (although you did provide the table structures).
A table with a bunch of individual single-column indexes is not the best way to support your queries. Instead, you need indexes that match the criteria you are searching on. In the inner query on job offers, you are explicitly filtering on 6 criteria AND applying a group by. An index on those criteria columns will improve performance significantly. Also, given the other criteria, I have moved the date criterion to the end, both visually and as part of the index. Since you are not computing any aggregates by province, I removed the group by and used DISTINCT instead.
table create index on...
tblJobOffers ( deleted, approved, active, internal_sent, private, start_date )
Think of the indexes like this: if you had individual indexes on columns like deleted, approved, or active, only one of them would be used as the basis for optimizing the query. But since your query filters on multiple fields, an index that matches all of those criteria will be faster.
Try to think of it like this: each index is a box whose content is ordered by the first field, then sub-sorted by the next field, and so forth.
If you index on just "deleted", you have two boxes: one with all the deleted stuff and one with the NOT deleted stuff. The same goes for an index on "active" alone. But with a composite index like mine, the engine can jump quickly to the records in question. Judging by your column names, they appear to be "flags" that are either on = 1 or off = 0.
Deleted = 0
   Approved = 0
      (ignored, you don't want approved = 0)
   Approved = 1
      Active = 0
         (ignored, you don't want active = 0)
      Active = 1
         Internal_Sent = 0
            (ignored, you don't want internal sent = 0)
         Internal_Sent = 1
            Private = 0
               Start_Date
                  2014 Dec...
                  2015 Jan...
                  2015 Feb...
                     2015 Feb 4
                     2015 Feb 5
                     2015 Feb 6
            Private = 1
               (ignored, you don't want private = 1)
Deleted = 1
   (ignored since you are not even looking for deleted = 1 records)
So, following the tree down, the index can jump directly to the 2015 Feb records, get those, and ignore everything else. Use that to join to the province locations and it's done.
For your OUTER query, the join TO "tblhostessmailings" is based on the hostess ID, but you are also only looking for the specific flag new_job_offers = 2. Make that a compound index too. The hostess table is joined by the user ID, so that should have an index. And tblusers has multiple criteria too, so (as with job offers above) it should have a compound index.
table index
tblhostessmailings ( hostess_id, new_job_offers )
tblhostess ( user_id )
tblusers ( deleted, user_type, email_hash, email )
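As a rough sketch, the indexes listed above could be created with statements like these. The index names are invented for illustration; the column lists are the ones given above.

```sql
-- Hypothetical index names; the column order follows the lists above.

-- Inner query: equality flags first, the start_date range last.
CREATE INDEX idx_joboffers_flags_start
    ON tbljoboffers (deleted, approved, active, internal_sent, private, start_date);

-- Outer query: join column plus the new_job_offers flag.
CREATE INDEX idx_mailings_hostess_offers
    ON tblhostessmailings (hostess_id, new_job_offers);

-- Outer query: the filters on tblusers.
CREATE INDEX idx_users_flags_email
    ON tblusers (deleted, user_type, email_hash, email);

-- tblhostess (user_id) is already the PRIMARY KEY in the posted definition,
-- so no extra index should be needed there.
```

Running EXPLAIN on the query before and after is a quick way to see which of these the optimizer actually chooses.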
See what these suggestions and the revised query do for your performance, and let us know the result, good or bad.
Index the columns used for searching, not the ones being selected
Use unique indexes
Use short indices
Do not abuse indexes

MySQL Results pivot table - of a sort

I have a table as defined below:
CREATE TABLE `z_data` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`subscriberid` INT(11) NOT NULL DEFAULT '0',
`Email Address` VARCHAR(255) NOT NULL DEFAULT '',
`Title` VARCHAR(255) NOT NULL DEFAULT '',
`First Name` VARCHAR(255) NOT NULL DEFAULT '',
`Last Name` VARCHAR(255) NOT NULL DEFAULT '',
`Postal Code` VARCHAR(255) NOT NULL DEFAULT '',
`banned` INT(11) NOT NULL DEFAULT '0',
`bounced` INT(1) NOT NULL DEFAULT '0',
`unsub` INT(1) NOT NULL DEFAULT '0',
`duplicate` INT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_subscriberid` (`subscriberid`),
KEY `idx_banned` (`banned`),
KEY `idx_bounced` (`bounced`),
KEY `idx_unsub` (`unsub`),
KEY `idx_duplicate` (`duplicate`),
KEY `idx_email` (`Email Address`),
FULLTEXT KEY `idx_emailaddress` (`Email Address`)
) ENGINE=MYISAM AUTO_INCREMENT=20 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
This table is populated from a CSV file, and a query updates the subscriberid column with a number if the email address is already present in the main database, with a query like:
UPDATE `z_data` AS z
LEFT JOIN main_subscribers AS b ON (z.`Email Address`=b.emailaddress && b.listid=1 && b.unsubscribed!=0)
SET z.subscriberid = b.subscriberid
WHERE b.subscriberid IS NOT NULL;
Now, my question: I need to select the subscriberids of all of the records in z_data in rows of ten fields; these can be comma-separated lists.
My result could be similar to:
1293572,1293573,1293574,1293575,1099590,1174275,1293576,1293577,1293578,1293579,
673070,813617,1293580,1293581,1293582,1131221,1293583,1182045,419085,1293584,
1050278,1293585,1064945,638483,737691,1293586,1293587,799800,1110596,1293588,
1293589,1293590,1293591,421394,1293592,1293593,1293594,1293595,1293596,851491,
1293597,1293598,1293599,628250,1293600,1293601,1293602,535366,1293603,256590,
1293604,1293605,736956,1293606,1209511,673075,1293607,1293608,1293609,754357,
The reason for this is so that I can store these values in a TEXT field for later use in an IN clause and yet have a reasonably human-readable script.
I have managed to get the first row thus:
SELECT GROUP_CONCAT(a.subscriberid) AS 'IDs' FROM (
SELECT subscriberid FROM
`z_data`
WHERE subscriberid!=0
LIMIT 0,10
) AS a
but I do not know how to 'walk' through the rest of the table when the number of rows with subscriberids is unknown.
Any suggestions would be gratefully received.
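One possible approach, sketched here under the assumption that the server is MySQL 8.0 or later (window functions are not available in older versions): number the qualifying rows, then group every ten consecutive numbers into one GROUP_CONCAT row. The aliases n and rn are made up for this example.

```sql
-- Assumes MySQL 8.0+ for ROW_NUMBER(); n and rn are hypothetical aliases.
SELECT GROUP_CONCAT(n.subscriberid ORDER BY n.rn) AS IDs
FROM (
    SELECT subscriberid,
           ROW_NUMBER() OVER (ORDER BY id) - 1 AS rn   -- 0, 1, 2, ...
    FROM `z_data`
    WHERE subscriberid != 0
) AS n
GROUP BY n.rn DIV 10        -- rows 0-9 form group 0, rows 10-19 group 1, ...
ORDER BY n.rn DIV 10;
```

Each result row then holds up to ten comma-separated ids, and the last row simply holds whatever remains, so the total number of rows does not need to be known in advance.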

MySQL Indexes for extremely slow queries

The following query, regardless of environment, takes more than 30 seconds to compute.
SELECT COUNT( r.response_answer )
FROM response r
INNER JOIN (
SELECT G.question_id
FROM question G
INNER JOIN answer_group AG ON G.answer_group_id = AG.answer_group_id
WHERE AG.answer_group_stat = 'statistic'
) AS q ON r.question_id = q.question_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE os.survey_id =42
AND r.response_answer = 5
AND DATEDIFF( NOW( ) , r.added_dt ) <1000000
AND r.uuid IS NOT NULL
When I explain the query,
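+----+-------------+------------+--------+----------------------------------------------------+-----------------+---------+----------------------------+------+-------------+
| id | select_type | table      | type   | possible_keys                                      | key             | key_len | ref                        | rows | Extra       |
+----+-------------+------------+--------+----------------------------------------------------+-----------------+---------+----------------------------+------+-------------+
| 1  | PRIMARY     | <derived2> | ALL    | NULL                                               | NULL            | NULL    | NULL                       | 1087 |             |
| 1  | PRIMARY     | r          | ref    | question_id,org_survey_code,code_question,uuid,uor | question_id     | 4       | q.question_id              | 1545 | Using where |
| 1  | PRIMARY     | os         | eq_ref | org_survey_code,survey_id,org_survey_code_2        | org_survey_code | 12      | survey_2.r.org_survey_code | 1    | Using where |
| 2  | DERIVED     | G          | ALL    | agid                                               | NULL            | NULL    | NULL                       | 1680 |             |
| 2  | DERIVED     | AG         | eq_ref | PRIMARY                                            | PRIMARY         | 1       | survey_2.G.answer_group_id | 1    | Using where |
+----+-------------+------------+--------+----------------------------------------------------+-----------------+---------+----------------------------+------+-------------+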
I have a very basic knowledge of indexing, but I have tried nearly every combination I can think of and cannot seem to improve the speed of this query. The responses table is right around 2 million rows, question is about 1500 rows, answer_group is about 50, and org_survey is about 8,000.
Here is the basic structure for each:
CREATE TABLE `response` (
`response_id` int(10) unsigned NOT NULL auto_increment,
`response_answer` text NOT NULL,
`question_id` int(10) unsigned NOT NULL default '0',
`org_survey_code` varchar(7) NOT NULL,
`uuid` varchar(40) default NULL,
`added_dt` datetime default NULL,
PRIMARY KEY (`response_id`),
KEY `question_id` (`question_id`),
KEY `org_survey_code` (`org_survey_code`),
KEY `code_question` (`org_survey_code`,`question_id`),
KEY `IDX_ADDED_DT` (`added_dt`),
KEY `uuid` (`uuid`),
KEY `response_answer` (`response_answer`(1)),
KEY `response_question` (`response_answer`(1),`question_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2298109 DEFAULT CHARSET=latin1
CREATE TABLE `question` (
`question_id` int(10) unsigned NOT NULL auto_increment,
`question_text` varchar(250) NOT NULL default '',
`question_group` varchar(250) default NULL,
`question_position` tinyint(3) unsigned NOT NULL default '0',
`survey_id` tinyint(3) unsigned NOT NULL default '0',
`answer_group_id` mediumint(8) unsigned NOT NULL default '0',
`seq_id` int(11) NOT NULL default '0',
PRIMARY KEY (`question_id`),
KEY `question_group` (`question_group`(10)),
KEY `survey_id` (`survey_id`),
KEY `agid` (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1860 DEFAULT CHARSET=latin1
CREATE TABLE `org_survey` (
`org_survey_id` int(11) NOT NULL auto_increment,
`org_survey_code` varchar(10) NOT NULL default '',
`org_id` int(11) NOT NULL default '0',
`org_manager_id` int(11) NOT NULL default '0',
`org_url_id` int(11) default '0',
`division_id` int(11) default '0',
`sector_id` int(11) default NULL,
`survey_id` int(11) NOT NULL default '0',
`process_batch` tinyint(4) default '0',
`added_dt` datetime default NULL,
PRIMARY KEY (`org_survey_id`),
UNIQUE KEY `org_survey_code` (`org_survey_code`),
KEY `org_id` (`org_id`),
KEY `survey_id` (`survey_id`),
KEY `org_survey_code_2` (`org_survey_code`,`total_taken`),
KEY `org_manager_id` (`org_manager_id`),
KEY `sector_id` (`sector_id`)
) ENGINE=MyISAM AUTO_INCREMENT=9268 DEFAULT CHARSET=latin1
CREATE TABLE `answer_group` (
`answer_group_id` tinyint(3) unsigned NOT NULL auto_increment,
`answer_group_name` varchar(50) NOT NULL default '',
`answer_group_type` varchar(20) NOT NULL default '',
`answer_group_stat` varchar(20) NOT NULL default 'demographic',
PRIMARY KEY (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=53 DEFAULT CHARSET=latin1
I know there are small things I can probably do to improve the efficiency of the database, such as reducing the size of integers where it's unnecessary. However, those are fairly trivial considering the ridiculous time it takes just to produce a result here. How can I properly index these tables, based on what explain has shown me? It seems that I have tried a large variety of combinations to no avail. Also, is there anything else that anyone can see that will optimize the table and reduce the query? I need it to be computed in less than a second. Thanks in advance!
1. If you want the index on r.added_dt to be used, instead of:
DATEDIFF(NOW(), r.added_dt) < 1000000
use:
CURDATE() - INTERVAL 1000000 DAY < r.added_dt
Anyway, the above condition checks whether or not added_dt is a million days old. Do you really store dates that old? If not, you can simply remove this condition.
If you do want this condition, an index on added_dt would help a lot. Your query as it is now checks every row for this condition, calling the DATEDIFF() function once per row of the response table.
2. Since r.response_answer cannot be NULL, instead of:
SELECT COUNT( r.response_answer )
use:
SELECT COUNT( * )
COUNT(*) is faster than COUNT(field).
3. Two of the three fields that you use for joining tables have different datatypes:
ON question . answer_group_id
= answer_group . answer_group_id
CREATE TABLE question (
...
answer_group_id mediumint(8) ..., <--- mediumint
CREATE TABLE answer_group (
answer_group_id tinyint(3) ..., <--- tinyint
-------------------------------
ON org_survey . org_survey_code
= response . org_survey_code
CREATE TABLE response (
...
org_survey_code varchar(7) NOT NULL, <--- 7
CREATE TABLE org_survey (
...
org_survey_code varchar(10) NOT NULL default '', <--- 10
The mediumint datatype is not the same as tinyint, and the same goes for varchar(7) and varchar(10). When they are used in a join, MySQL has to spend time converting one type to the other. Convert one of them so that they have identical datatypes. This is not the main issue with the query, but the change will also help all other queries that use these joins.
After making this change, run ANALYZE TABLE on the table. It will help MySQL make better execution plans.
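A minimal sketch of that conversion follows. Widening the smaller column in each pair is a choice made for this example (it avoids any risk of truncating existing values); narrowing the larger one would also make the types match. The column attributes are copied from the posted table definitions.

```sql
-- Assumption: widen the smaller column of each pair rather than narrow the larger.
ALTER TABLE answer_group
    MODIFY answer_group_id mediumint(8) unsigned NOT NULL auto_increment;

ALTER TABLE response
    MODIFY org_survey_code varchar(10) NOT NULL;

-- Refresh index statistics after the change.
ANALYZE TABLE answer_group, response;
```

Altering response rebuilds a table of about 2 million rows, so it is best done during a quiet period.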
You have a response_answer = 5 condition, where response_answer is text. It's not an error, but it's better to use response_answer = '5' (if you don't, MySQL will do the conversion of 5 to '5' anyway).
The real issue is that you don't have a compound index on the 3 fields that are used in the WHERE conditions. Try adding this one:
ALTER TABLE response
ADD INDEX ind_u1_ra1_aa
(uuid(1), response_answer(1), added_dt) ;
(this may take a while as your table is not small)
Can you try the following query? I've removed the sub-query from your original one. This may let the optimiser produce a better execution plan.
SELECT COUNT(r.response_answer)
FROM response r
INNER JOIN question q ON r.question_id = q.question_id
INNER JOIN answer_group ag ON q.answer_group_id = ag.answer_group_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE
ag.answer_group_stat = 'statistic'
AND os.survey_id = 42
AND r.response_answer = 5
AND DATEDIFF(NOW(), r.added_dt) < 1000000
AND r.uuid IS NOT NULL
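For illustration, the flattened query can be combined with the smaller suggestions made in this thread: COUNT(*) instead of COUNT(r.response_answer), and the string literal '5' instead of the number 5. The sketch also assumes that no response is anywhere near a million days old, in which case the DATEDIFF() test only filters out rows whose added_dt is NULL, so it is written here as a plain IS NOT NULL check. That last substitution is an assumption of this example, not something stated in the original query.

```sql
SELECT COUNT(*)
FROM response r
INNER JOIN question q      ON r.question_id = q.question_id
INNER JOIN answer_group ag ON q.answer_group_id = ag.answer_group_id
INNER JOIN org_survey os   ON os.org_survey_code = r.org_survey_code
WHERE ag.answer_group_stat = 'statistic'
  AND os.survey_id = 42
  AND r.response_answer = '5'     -- string literal, matching the text column
  AND r.added_dt IS NOT NULL      -- assumption: stands in for the DATEDIFF() test
  AND r.uuid IS NOT NULL;
```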

Need help optimizing MYSQL query with join

I'm doing a join between the "favorites" table (3 million rows) and the "items" table (600k rows).
The query is taking anywhere from .3 seconds to 2 seconds, and I'm hoping I can optimize it some.
Favorites.faver_profile_id and Items.id are indexed.
Instead of using the faver_profile_id index I created a new index on (faver_profile_id,id), which eliminated the filesort needed when sorting by id. Unfortunately this index doesn't help at all and I'll probably remove it (yay, 3 more hours of downtime to drop the index..)
Any ideas on how I can optimize this query?
In case it helps:
Favorite.removed and Item.removed are "0" 98% of the time.
Favorite.collection_id is NULL about 80% of the time.
SELECT `Item`.`id`, `Item`.`source_image`, `Item`.`cached_image`, `Item`.`source_title`, `Item`.`source_url`, `Item`.`width`, `Item`.`height`, `Item`.`fave_count`, `Item`.`created`
FROM `favorites` AS `Favorite`
LEFT JOIN `items` AS `Item`
ON (`Item`.`removed` = 0 AND `Favorite`.`notice_id` = `Item`.`id`)
WHERE ((`faver_profile_id` = 1) AND (`collection_id` IS NULL) AND (`Favorite`.`removed` = 0) AND (`Item`.`removed` = '0'))
ORDER BY `Favorite`.`id` desc LIMIT 50;
+----+-------------+----------+--------+---------------------------------------------------------------+------------------+---------+-----------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------------------------------------------------------+------------------+---------+-----------------------------------------+------+-------------+
| 1 | SIMPLE | Favorite | ref | notice_id,faver_profile_id,collection_id_idx,idx_faver_idx_id | idx_faver_idx_id | 4 | const | 7910 | Using where |
| 1 | SIMPLE | Item | eq_ref | PRIMARY | PRIMARY | 4 | gragland_imgfavebeta.Favorite.notice_id | 1 | Using where |
+----+-------------+----------+--------+---------------------------------------------------------------+------------------+---------+-----------------------------------------+------+-------------+
CREATE TABLE `favorites` (
`id` int(11) NOT NULL auto_increment COMMENT 'unique identifier',
`faver_profile_id` int(11) NOT NULL default '0',
`collection_id` int(11) default NULL,
`collection_order` int(8) default NULL,
`created` datetime NOT NULL default '0000-00-00 00:00:00' COMMENT 'date this record was created',
`modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP COMMENT 'date this record was modified',
`notice_id` int(11) NOT NULL default '0',
`removed` tinyint(1) NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `notice_id` (`notice_id`),
KEY `faver_profile_id` (`faver_profile_id`),
KEY `collection_id_idx` (`collection_id`),
KEY `idx_faver_idx_id` (`faver_profile_id`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| items | CREATE TABLE `items` (
`id` int(11) NOT NULL auto_increment COMMENT 'unique identifier',
`submitter_id` int(11) NOT NULL default '0' COMMENT 'who made the update',
`source_image` varchar(255) default NULL COMMENT 'update content',
`cached_image` varchar(255) default NULL,
`source_title` varchar(255) NOT NULL default '',
`source_url` text NOT NULL,
`width` int(4) NOT NULL default '0',
`height` int(4) NOT NULL default '0',
`status` varchar(122) NOT NULL default '',
`popular` int(1) NOT NULL default '0',
`made_popular` timestamp NULL default NULL,
`fave_count` int(9) NOT NULL default '0',
`tags` text,
`user_art` tinyint(1) NOT NULL default '0',
`nudity` tinyint(1) NOT NULL default '0',
`created` datetime NOT NULL default '0000-00-00 00:00:00' COMMENT 'date this record was created',
`modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP COMMENT 'date this record was modified',
`removed` int(1) NOT NULL default '0',
`nofront` tinyint(1) NOT NULL default '0',
`test` varchar(10) NOT NULL default '',
`recs` text,
`recs_data` text,
PRIMARY KEY (`id`),
KEY `notice_profile_id_idx` (`submitter_id`),
KEY `content` (`source_image`),
KEY `idx_popular` (`popular`),
KEY `idx_madepopular` (`made_popular`),
KEY `idx_favecount_idx_id` (`fave_count`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
First of all, you order by favorites.id, which is the clustered primary key of the favorites table. This will not be necessary if you join favorites to items instead of items to favorites.
Second, (Item.removed = '0') in WHERE is redundant, because the same condition is already used in the JOIN.
Third, change the order of the conditions in the join to:
`Favorite`.`notice_id` = `Item`.`id` AND `Item`.`removed` = 0
Then the optimizer will be able to use your primary key as the index. You may even consider creating an (id, removed) index on the items table.
Next, create a (faver_profile_id, removed) index on favorites (or, better, extend the existing faver_profile_id index) and change the order of the conditions in WHERE to the following:
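A minimal sketch of that optional index, assuming the items table definition shown above. The index name idx_id_removed is made up for illustration. Since id is already the InnoDB primary key, it is worth comparing EXPLAIN output before and after to see whether the extra index changes the plan at all.

```sql
-- Hypothetical index name; the columns come from the items table definition.
ALTER TABLE `items`
  ADD INDEX `idx_id_removed` (`id`, `removed`);

-- Confirm the index exists before re-running EXPLAIN on the original query.
SHOW INDEX FROM `items`;
```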
(`faver_profile_id` = 1)
AND (`Favorite`.`removed` = 0)
AND (`collection_id` IS NULL)
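The composite index suggested here could be created roughly as follows. This is a sketch, not a tested migration: the index name idx_faver_removed is an assumption, and the EXPLAIN uses a simplified single-table query only to check which key the optimizer picks.

```sql
-- Hypothetical index name; columns are taken from the favorites table definition.
ALTER TABLE `favorites`
  ADD INDEX `idx_faver_removed` (`faver_profile_id`, `removed`);

-- Check whether the new index is chosen for the filter conditions.
EXPLAIN
SELECT `Favorite`.`notice_id`
FROM `favorites` AS `Favorite`
WHERE `faver_profile_id` = 1
  AND `Favorite`.`removed` = 0
  AND `collection_id` IS NULL;
```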
UPD: I am sorry, I missed that you already join favorites to items. In that case the ORDER BY is not needed. You should end up with something like the following:
SELECT
`Item`.`id`,
`Item`.`source_image`,
`Item`.`cached_image`,
`Item`.`source_title`,
`Item`.`source_url`,
`Item`.`width`,
`Item`.`height`,
`Item`.`fave_count`,
`Item`.`created`
FROM `favorites` AS `Favorite`
LEFT JOIN `items` AS `Item`
ON (`Favorite`.`notice_id` = `Item`.`id` AND `Item`.`removed` = 0)
WHERE `faver_profile_id` = 1
AND `Favorite`.`removed` = 0
AND `collection_id` IS NULL
LIMIT 50;
One more thing: when you have KEY idx_faver_idx_id (faver_profile_id,id), you do not need KEY faver_profile_id (faver_profile_id), because the single-column index just duplicates the leading half of idx_faver_idx_id. I hope you will extend that single-column index instead, as I suggested.
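If you decide to drop the redundant single-column index rather than extend it, a sketch could look like this. It assumes nothing else (for example a foreign key or an index hint in application code) depends on the faver_profile_id key, so check that first.

```sql
-- List the current indexes to confirm idx_faver_idx_id already leads with faver_profile_id.
SHOW INDEX FROM `favorites`;

-- Drop the single-column index that is a prefix of idx_faver_idx_id.
ALTER TABLE `favorites`
  DROP INDEX `faver_profile_id`;
```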
Get a copy of your table from a backup and try creating an index on the favorites table that covers all WHERE and JOIN conditions, namely (removed, collection_id, faver_profile_id). Do the same with items. It might help, but it will make inserts potentially much slower.
The SQL engine won't use an index if it still has to do a full table scan because of the constraints, will it?
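One way to try this without touching the live table is sketched below. It assumes you copy the table inside the same database rather than restoring from a backup; the names favorites_test and idx_where_cols are made up for illustration, and the profile column appears as faver_profile_id in the table definition.

```sql
-- Make a scratch copy with the same structure and indexes, then load the data.
CREATE TABLE `favorites_test` LIKE `favorites`;
INSERT INTO `favorites_test` SELECT * FROM `favorites`;

-- Hypothetical index covering the three WHERE columns named above.
ALTER TABLE `favorites_test`
  ADD INDEX `idx_where_cols` (`removed`, `collection_id`, `faver_profile_id`);

-- Compare the plan against the one for the original table.
EXPLAIN
SELECT `id`, `notice_id`
FROM `favorites_test`
WHERE `removed` = 0
  AND `collection_id` IS NULL
  AND `faver_profile_id` = 1;

-- Clean up when done.
DROP TABLE `favorites_test`;
```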