MySQL LEFT JOIN takes too long for huge data

I have two tables: a property table and its related photos. One property may have many photos, but I only want any one of its related photos. When I use a LEFT JOIN, the MySQL query becomes too slow.
Here is my query:
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
CREATE TABLE IF NOT EXISTS `sp_property_images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`property_id` varchar(100) NOT NULL,
`filepath_name` text,
`label_name` varchar(45) DEFAULT NULL,
`primary` char(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `property_id` (`property_id`),
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=12941 ;
CREATE TABLE IF NOT EXISTS `sp_property` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`propertytype` varchar(50) NOT NULL,
`adv_type` varchar(45) NOT NULL,
`property_name` text,
`division` varchar(45) NOT NULL,
`township` varchar(45) NOT NULL,
`property_price` decimal(20,2) unsigned DEFAULT NULL,
`price_type` varchar(45) NOT NULL,
`availability` varchar(100) DEFAULT NULL,
`property_address` text,
`p_dimension_length` varchar(45) NOT NULL,
`p_dimension_width` varchar(45) NOT NULL,
`p_dimension_sqft` varchar(45) NOT NULL,
`p_dimension_acre` varchar(45) NOT NULL,
`floor` varchar(45) NOT NULL,
`phone` varchar(100) DEFAULT NULL,
`aircorn` varchar(45) NOT NULL,
`ownership` varchar(45) NOT NULL,
`bedroom` varchar(45) NOT NULL,
`bathroom` varchar(45) NOT NULL,
`special_feature` text,
`amentites` text,
`property_detail` text,
`submit_date` datetime DEFAULT NULL,
`published` varchar(45) NOT NULL,
`published_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`agent_id` varchar(45) NOT NULL,
`source` varchar(45) NOT NULL,
`contact_name` varchar(100) NOT NULL,
`contact_no` varchar(100) NOT NULL,
`contact_address` text NOT NULL,
`contact_email` varchar(100) NOT NULL,
`unit_type` varchar(100) DEFAULT NULL,
`map_lat` varchar(100) DEFAULT NULL,
`map_long` varchar(100) DEFAULT NULL,
`show_map` varchar(3) DEFAULT 'no',
`total_view` bigint(20) NOT NULL DEFAULT '0',
`feature_listing` varchar(10) NOT NULL DEFAULT 'no',
`new_homes_id` int(11) NOT NULL,
`publish_price` int(1) NOT NULL DEFAULT '0',
`usd` decimal(20,2) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=18524 ;

Have you added indices on your tables? You would need three indices on the following columns:
article_photo.a_id for grouping and joining
article_photo.p_id for sorting
article.a_id for joining (although this is hopefully already the PK of your table)
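For reference, the index DDL could look roughly like this (a sketch only; it uses the article / article_photo names from this example, and the index names are made up):
-- Sketch: adjust table, column, and index names to your real schema.
ALTER TABLE article_photo ADD INDEX idx_article_photo_a_id (a_id);  -- grouping and joining
ALTER TABLE article_photo ADD INDEX idx_article_photo_p_id (p_id);  -- sorting
ALTER TABLE article ADD INDEX idx_article_a_id (a_id);              -- joining (unnecessary if a_id is already the PK)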
The result of joins is not guaranteed to be sorted in any order, so you probably want to move your ORDER BY clause from the subquery to the outer query:
SELECT * from `article`
LEFT JOIN (
SELECT * from `article_photo`
GROUP BY `a_id`) as images
ON article.a_id = images.a_id
ORDER BY images.p_id DESC
Also, you have no guarantee of which article_photo you will get, since you are selecting data without an aggregate function (and only MySQL will even allow you to do that).
Now that the question contains the real tables and all information essential to answering, here's my take – first, here's your query:
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
Let's see: you are joining sp_property_images.property_id with sp_property.id. These columns have different types (int vs. varchar), and I suppose this results in a severe performance penalty, since the values have to be converted to the same type before they can be compared.
Then, you are filtering by sp_property.published, so I suggest adding an index on this column as well. Also, examine if you really need to have this column as varchar. A bool/bit flag probably suffices as well (if it doesn't, an enum might be a better choice still).
Ordering benefits from an index too. Add an index spanning both columns sp_property.feature_listing and sp_property.submit_date.
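As a sketch of the suggestions so far (the index names are made up, and changing the column type assumes every existing property_id value is numeric, so test on a copy first):
-- Sketch only: align the join column types and add the suggested indexes.
ALTER TABLE sp_property_images MODIFY property_id int(10) unsigned NOT NULL;
ALTER TABLE sp_property ADD INDEX idx_published (published);
ALTER TABLE sp_property ADD INDEX idx_feature_submit (feature_listing, submit_date);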
If all of the above still doesn't help, you might have to remove the sub-select. It might prevent the MySQL engine from recognizing (and using!) the index you have defined on the sp_property_images.property_id column.
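One way to remove it is to pick a single image per property with a correlated subquery instead of a derived table. A sketch (it assumes the join column types have been aligned as above, and keeps the original LIMIT 1,20):
-- Sketch: picks the photo with the highest id per property; properties without photos still appear (NULL filepath_name).
SELECT p.id AS propertyid, p.property_name, p.property_price, p.adv_type, p.usd, i.filepath_name
FROM sp_property p
LEFT JOIN sp_property_images i
  ON i.property_id = p.id
 AND i.id = (SELECT MAX(i2.id) FROM sp_property_images i2 WHERE i2.property_id = p.id)
WHERE p.published = 'yes'
ORDER BY p.feature_listing DESC, p.submit_date DESC
LIMIT 1, 20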

This simple query will give you what you asked for, all articles with their photos
SELECT ar.a_id, ar.a_title, ap.p_id, ap.photo_name
FROM article ar
JOIN article_photo ap on ar.a_id = ap.a_id
There is no reason for a LEFT JOIN and grouping there, unless you want a count of photos per article (see the sketch below)?
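If a per-article photo count is what you are after, a grouped variant might look like this (a sketch on the same example schema):
-- Sketch: count photos per article; the LEFT JOIN keeps articles that have no photos (count 0).
SELECT ar.a_id, ar.a_title, COUNT(ap.p_id) AS photo_count
FROM article ar
LEFT JOIN article_photo ap ON ar.a_id = ap.a_id
GROUP BY ar.a_id, ar.a_title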

Related

Long running MySQL query on indexes and sort by clause

I have a very long-running MySQL query. The query simply joins two tables, both of which are very large:
bizevents - Nearly 34 Million rows
bizevents_actions - Nearly 17 million rows
Here is the query:
select
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
bizevent0_.createdBy as createdB4_37_,
bizevent0_.createdOn as createdO5_37_,
bizevent0_.description as descript6_37_,
bizevent0_.iconCss as iconCss7_37_,
bizevent0_.modifiedBy as modified8_37_,
bizevent0_.modifiedOn as modified9_37_,
bizevent0_.name as name10_37_,
bizevent0_.version as version11_37_,
bizevent0_.fired as fired12_37_,
bizevent0_.preCreateFired as preCrea13_37_,
bizevent0_.entityRefClazz as entityR14_37_,
bizevent0_.entityRefIdAsStr as entityR15_37_,
bizevent0_.entityRefIdType as entityR16_37_,
bizevent0_.entityRefName as entityR17_37_,
bizevent0_.entityRefType as entityR18_37_,
bizevent0_.entityRefVersion as entityR19_37_
from
BizEvent bizevent0_
left outer join BizEvent_actions actions1_ on
bizevent0_.id = actions1_.BizEvent_id
where
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
and (actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
order by
bizevent0_.createdOn;
Below are the table definitions. As you can see, I have defined indexes on both tables for all of the search columns plus the sort column, but my queries still run for a very, very long time. I would appreciate any more ideas, particularly with respect to indexing.
-- bizevent definition
CREATE TABLE `bizevent` (
`id` bigint(20) NOT NULL,
`json` longtext,
`account_id` int(11) DEFAULT NULL,
`createdBy` varchar(50) NOT NULL,
`createdon` datetime(3) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`iconCss` varchar(50) DEFAULT NULL,
`modifiedBy` varchar(50) NOT NULL,
`modifiedon` datetime(3) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`version` int(11) NOT NULL,
`fired` bit(1) NOT NULL,
`preCreateFired` bit(1) NOT NULL,
`entityRefClazz` varchar(255) DEFAULT NULL,
`entityRefIdAsStr` varchar(255) DEFAULT NULL,
`entityRefIdType` varchar(25) DEFAULT NULL,
`entityRefName` varchar(255) DEFAULT NULL,
`entityRefType` varchar(50) DEFAULT NULL,
`entityRefVersion` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDXk9kxuuprilygwfwddr67xt1pw` (`createdon`),
KEY `IDXsf3ufmeg5t9ok7qkypppuey7y` (`entityRefIdAsStr`),
KEY `IDX5bxv4g72wxmjqshb770lvjcto` (`entityRefClazz`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- bizevent_actions definition
CREATE TABLE `bizevent_actions` (
`BizEvent_id` bigint(20) NOT NULL,
`action` varchar(255) DEFAULT NULL,
`objectBizType` varchar(255) DEFAULT NULL,
`objectName` varchar(255) DEFAULT NULL,
`objectRefClazz` varchar(255) DEFAULT NULL,
`objectRefIdAsStr` varchar(255) DEFAULT NULL,
`objectRefIdType` int(11) DEFAULT NULL,
`objectRefVersion` int(11) DEFAULT NULL,
`targetBizType` varchar(255) DEFAULT NULL,
`targetName` varchar(255) DEFAULT NULL,
`targetRefClazz` varchar(255) DEFAULT NULL,
`targetRefIdAsStr` varchar(255) DEFAULT NULL,
`targetRefIdType` int(11) DEFAULT NULL,
`targetRefVersion` int(11) DEFAULT NULL,
`embedJson` longtext,
`actions_ORDER` int(11) NOT NULL,
PRIMARY KEY (`BizEvent_id`,`actions_ORDER`),
KEY `IDXa21hhagjogn3lar1bn5obl48gll` (`action`),
KEY `IDX7agsatk8u8qvtj37vhotja0ce77` (`targetRefClazz`),
KEY `IDXa7tktl678kqu3tk8mmkt1mo8lbo` (`targetRefIdAsStr`),
KEY `IDXa22eevu7m820jeb2uekkt42pqeu` (`objectRefClazz`),
KEY `IDXa33ba772tpkl9ig8ptkfhk18ig6` (`objectRefIdAsStr`),
CONSTRAINT `FKr9qjs61id11n48tdn1cdp3wot` FOREIGN KEY (`BizEvent_id`) REFERENCES `bizevent` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
By the way, we are using Amazon RDS with MySQL 5.7.33, 16 GB RAM and 4 vCPUs.
I also ran an EXPLAIN EXTENDED on the query. Appreciate any help.
Initially the bizevent_actions search columns didn't have indexes defined. I have since defined the indexes and tried the query again, but to no avail.
One technique that worked for me in a similar situation was abandoning the idea of a JOIN completely and switching to queries by primary key. In more detail: find out which table in the join would yield fewer rows on average if you queried only that table with its related filter; get the primary keys from that table, and then query the other table using WHERE pk IN (...).
In your case one example would be:
SELECT
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
...
FROM BizEvent bizevent0_
WHERE
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
AND bizevent0_.id IN (
SELECT BizEvent_id
FROM BizEvent_actions actions1_
WHERE
actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
ORDER BY
bizevent0_.createdOn;
This assumes that you're not actually willing to select 33+ million rows from BizEvent, though; your code with the LEFT OUTER JOIN would have done exactly that.
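Going slightly beyond the answer above, if the IN-subquery itself stays slow, composite indexes covering each OR branch of its filter might help (a sketch; the index names are made up):
-- Sketch only: one composite index per OR branch of the subquery's WHERE clause.
CREATE INDEX ix_bea_target ON bizevent_actions (targetRefClazz, targetRefIdAsStr, action);
CREATE INDEX ix_bea_object ON bizevent_actions (objectRefClazz, objectRefIdAsStr, action);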

Huge speed difference in two similar queries (MySQL ORDER clause)

Important update (explanation):
I realized that my query with a single DESC order is 10 times slower than the same query with ASC order. The ordered field has an index. Is this normal behavior?
Original question with queries:
I have a MySQL table with a few hundred product items. It's surprising (to me) how much two similar SQL queries differ in terms of performance, and I don't know why. Can you please give me a hint or explain why the difference is so huge?
This query takes 3ms:
SELECT
*
FROM
`product_items`
WHERE
(product_items.shop_active = 1)
AND (product_items.active = 1)
AND (product_items.active_category_id is not null)
AND (has_picture is not null)
AND (price_orig is not null)
AND (category_min_discount IS NOT NULL)
AND (product_items.slug is not null)
AND `product_items`.`active_category_id` IN (6797, 5926, 5806, 6852)
ORDER BY
price asc
LIMIT 1
But the following query already takes 169ms... The only difference is that the ORDER BY clause contains two columns. Every product has a "price" value, while only roughly 1% of products have a "price_top" value.
SELECT
*
FROM
`product_items`
WHERE
(product_items.shop_active = 1)
AND (product_items.active = 1)
AND (product_items.active_category_id is not null)
AND (has_picture is not null)
AND (price_orig is not null)
AND (category_min_discount IS NOT NULL)
AND (product_items.slug is not null)
AND `product_items`.`active_category_id` IN (6797, 5926, 5806, 6852)
ORDER BY
price asc,
price_top desc
LIMIT 1
The table structure looks like this:
CREATE TABLE `product_items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`shop_id` int(11) DEFAULT NULL,
`item_id` varchar(255) DEFAULT NULL,
`productname` varchar(255) DEFAULT NULL,
`description` text,
`url` text,
`url_hash` varchar(255) DEFAULT NULL,
`img_url` text,
`price` decimal(10,2) DEFAULT NULL,
`price_orig` decimal(10,2) DEFAULT NULL,
`discount` decimal(10,2) DEFAULT NULL,
`discount_percent` decimal(10,2) DEFAULT NULL,
`manufacturer` varchar(255) DEFAULT NULL,
`delivery_date` varchar(255) DEFAULT NULL,
`categorytext` text,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`active_category_id` int(11) DEFAULT NULL,
`shop_active` int(11) DEFAULT NULL,
`active` int(11) DEFAULT '0',
`price_top` decimal(10,2) NOT NULL DEFAULT '0.00',
`attention_priority` int(11) DEFAULT NULL,
`attention_priority_over` int(11) DEFAULT NULL,
`has_picture` varchar(255) DEFAULT NULL,
`size` varchar(255) DEFAULT NULL,
`category_min_discount` int(11) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_product_items_on_url_hash` (`url_hash`),
KEY `index_product_items_on_shop_id` (`shop_id`),
KEY `index_product_items_on_active_category_id` (`active_category_id`),
KEY `index_product_items_on_productname` (`productname`),
KEY `index_product_items_on_price` (`price`),
KEY `index_product_items_on_discount_percent` (`discount_percent`),
KEY `index_product_items_on_price_top` (`price_top`)
) ENGINE=InnoDB AUTO_INCREMENT=1715708 DEFAULT CHARSET=utf8;
UPDATE
I realized that the difference is mainly in the direction of the ordering: if I use ASC+ASC for both columns the query takes around 6ms; if I use ASC+DESC or DESC+ASC, the query takes around 160ms.
Thank you.
If creating an index to help ORDER BY doesn't help, try creating an index that helps both WHERE and ORDER BY:
CREATE INDEX product_items_i1 ON product_items (
shop_active,
active,
active_category_id,
has_picture,
price_orig,
category_min_discount,
slug,
price,
price_top DESC
)
Obviously, this is a bit clunky, and you'll have to balance the performance gain for the query with the price of maintaining the index.

Creating a summary from two tables?

I have two tables and I'm trying to create a summary with the sum of amount due per person but don't have the creative ID involved.
Table 1:
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Name` varchar(255) NOT NULL,
`Lname` varchar(255) NOT NULL,
`phone` varchar(15) NOT NULL,
`address` varchar(255) DEFAULT NULL,
`city` varchar(255) NOT NULL,
`state` char(2) NOT NULL,
`zip` varchar(50) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
`business_name` varchar(255) DEFAULT NULL,
Second table:
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`C_ID` INT(10) UNSIGNED NOT NULL,
`Amount_Due` DECIMAL(7 , 2 ) not null DEFAULT 0,
`created_date` DATETIME NOT NULL,
`closed_date` DATETIME default NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=LATIN1;
Here is what I'm trying to do:
I'm trying to make a summary with dates within 5/1/18 and 6/15/18.
Have the sum of due amount for each person
Have these aliases: Business Name, Phone Number, Invoiced Amounts
I'm trying to test my code but I'm getting errors:
SELECT Name,phone,SUM(Amount_Due) FROM test_customer,test_invoices
WHERE created_date BETWEEN '2018-05-01' AND "2018-06-15'
If I understand correctly, you need to use a JOIN instead of the comma (implicit CROSS JOIN), and GROUP BY the non-aggregated columns.
SELECT Name 'Customer Name',
phone 'Phone number',
SUM(i.Amount_Due) 'Amount due'
FROM test_customer c
INNER JOIN test_invoices i ON C.id = i.C_ID
GROUP BY Name,phone
sqlfiddle
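The question also asked to restrict the summary to invoices created between 2018-05-01 and 2018-06-15 and to use specific aliases; here is a sketch adding both to the query above (column names as given in the table definitions):
SELECT c.business_name AS 'Business Name',
       c.phone AS 'Phone Number',
       SUM(i.Amount_Due) AS 'Invoiced Amounts'
FROM test_customer c
INNER JOIN test_invoices i ON c.id = i.C_ID
-- Note: with DATETIME values, BETWEEN ... '2018-06-15' stops at midnight; extend the upper bound if the full day is needed.
WHERE i.created_date BETWEEN '2018-05-01' AND '2018-06-15'
GROUP BY c.business_name, c.phone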

Query large table with 50 million rows

Trying to query a large table (senddb.order_histories) that has close to 50M rows; this is the MySQL query I am using:
FIRST APPROACH- inner join:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join (
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b
on a.id = b.order_line_item_id
and a.fulfillment_status = '2';
EXPLAIN output:
SECOND APPROACH- nested select:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
where a.fulfillment_status = '2'
and a.id in (
select b.order_line_item_id from(
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where
order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b);
I believe a nested select is a bad approach for large data, but I added it here anyway because it worked on my sample set. Either way, both queries eventually time out after 600 seconds with the message: Error Code: 2013. Lost connection to MySQL server during query.
I would like to know if there are any ways to alter the query to make it run faster. I have already tried reducing the columns in the inner select / inner join, but that should not really be an issue IMO. I also looked up a solution that says "create a clustered index", but I wasn't really able to follow it. Any help is appreciated.
TABLE order_histories :
order_histories CREATE TABLE `order_histories` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`order_status_description` varchar(255) DEFAULT NULL,
`datetime_stamp` datetime DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`fulfillment_location` int(8) DEFAULT NULL,
`order_status` int(8) DEFAULT NULL,
`user_id` int(8) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`order_line_item_id` int(11) DEFAULT NULL,
`pooled` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `order_histories_ecash_idx` (`order_number`),
KEY `order_line_item_id` (`order_line_item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=454738178 DEFAULT CHARSET=latin1
TABLE order_line_items :
order_line_items CREATE TABLE `order_line_items` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`sku_id` int(8) DEFAULT NULL,
`original_price` float DEFAULT NULL,
`dept_description` varchar(100) DEFAULT NULL,
`description` varchar(100) DEFAULT NULL,
`quantity_ordered` int(8) DEFAULT NULL,
`gift_indicator` char(1) DEFAULT NULL,
`gift_wrap_flag` char(1) DEFAULT NULL,
`shipping_record_flag` char(1) DEFAULT NULL,
`gift_comments` varchar(100) DEFAULT NULL,
`item_status` char(1) DEFAULT NULL,
`tax_amount` float DEFAULT NULL,
`tax_rate` float DEFAULT NULL,
`upc` varchar(20) DEFAULT NULL,
`final_price` float DEFAULT NULL,
`line_number` int(8) DEFAULT NULL,
`master_line_number` int(8) DEFAULT NULL,
`gift_wrap_flag_type` char(1) DEFAULT NULL,
`color_code` varchar(4) DEFAULT NULL,
`size_id` varchar(6) DEFAULT NULL,
`width_id` varchar(6) DEFAULT NULL,
`brand` varchar(15) DEFAULT NULL,
`vpn` varchar(30) DEFAULT NULL,
`dept_number` int(8) DEFAULT NULL,
`class_number` int(8) DEFAULT NULL,
`non_merch_item` char(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`chain_id` int(11) DEFAULT NULL,
`fulfillment_location` int(11) DEFAULT NULL,
`fulfillment_date` datetime DEFAULT NULL,
`fulfillment_status` int(11) DEFAULT NULL,
`fulfillment_sales_associate` int(11) DEFAULT NULL,
`gift_wrap_line_number` int(11) DEFAULT NULL,
`shipping_type` int(11) DEFAULT NULL,
`order_track_info_id` int(11) DEFAULT NULL,
`store_tlog_updated` varchar(1) DEFAULT NULL,
`shipping_tlx_code` int(11) DEFAULT NULL,
`store_closed` tinyint(1) DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`deal_based_index` int(11) DEFAULT NULL,
`tlog_calc_ret_price` float DEFAULT NULL,
`tlog_amount` float DEFAULT NULL,
`tlog_retail_price` float DEFAULT NULL,
`tlog_ext_amount` float DEFAULT NULL,
`tlog_flag_1` int(11) DEFAULT NULL,
`tlog_flag_2` int(11) DEFAULT NULL,
`tlog_flag_3` int(11) DEFAULT NULL,
`time_remaining` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_line_items_ecash_idx` (`order_number`),
KEY `order_line_item_fulfillment_location_idx` (`fulfillment_location`),
KEY `order_line_item_fulfillment_status_idx` (`fulfillment_status`),
KEY `upc_idx` (`upc`),
KEY `sku_id_idx` (`sku_id`),
KEY `order_line_items_idx001` (`order_number`,`id`,`fulfillment_status`),
KEY `order_track_info_id` (`order_track_info_id`),
KEY `shipping_type_idx` (`shipping_type`,`non_merch_item`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=11367052 DEFAULT CHARSET=latin1
This query can be simplified:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join senddb.order_histories b on a.id = b.order_line_item_id
where b.order_status in ('x','y','z')
and b.fulfillment_location = 'abcd'
and a.fulfillment_status = '2';
Since you're only selecting values from table a, you don't need to select specific values from table b and can instead just apply your conditions. Outside of this, you need to ensure that b.order_line_item_id has an index on it. You can find more about creating indexes here. I'm not an expert in MySQL but something similar to this should work if senddb.order_histories.order_line_item_id isn't already the primary key.
CREATE INDEX IX_order_histories_order_line_item_id ON order_histories (order_line_item_id);
You need to read up on the optimization section of the MySQL docs. It contains a lot of information on how you can optimize your queries and data sets. The main idea here is to add indexes to the fields that are being used as the criteria in the WHERE clause of the SQL statements.
Basically, both of your alternatives are using a "sub-SELECT", not an INNER JOIN.
The syntax of a true JOIN is one of the following:
SELECT ...
FROM X INNER JOIN Y USING (field_list)
... or ...
SELECT ...
FROM X INNER JOIN Y ON (x.field1 = y.field2) ...
But in both cases the objects being joined are tables or views.
I'm going to presume ... admittedly, without checking ... that Nick Larsen's answer #1 adequately re-expresses your original query using JOINs.
(Notice how, in his answer, the shorthand identifiers A and B are introduced as referring to each of the two table-names mentioned in his query.)
Firstly, you need to decide whether a 50-million-row result set is really what you are asking for. Big data tables are not there so that you can select all their rows; they are there so that you can ask them questions using SQL queries. SQL is a query language, not a data-loading language.
What's your purpose? If you want to copy the data, you can do that by loading it in chunks, for example 1000 rows per query in a loop, as sketched below. If you are loading the data for processing, you can do it the same way.
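For example, chunked loading could look roughly like this (a sketch using keyset pagination on the primary key; :last_seen_id is a placeholder the application fills in, and the 1000-row batch size is arbitrary):
-- Sketch: fetch the next batch after the last id already processed; repeat until no rows come back.
SELECT *
FROM senddb.order_histories
WHERE id > :last_seen_id   -- use 0 on the first iteration
ORDER BY id
LIMIT 1000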
If you want to derive statistical information, you can use an outer join and return a low number of rows using aggregate functions. But you shouldn't do that either; what you "should" do is decide what you want from the table and, preferably, run aggregate functions to store the useful information in a different table (mostly SELECT INTO style queries). You should never need to join a table of 50 million records in the first place.
Telling you how to do something wrong using indexes wouldn't be the right thing here.
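As a sketch of the "store useful information in a different table" idea above (the summary table name and columns are made up; in MySQL this is spelled CREATE TABLE ... AS SELECT rather than SELECT INTO):
-- Sketch only: a pre-aggregated table that can be refreshed periodically instead of re-joining 50M rows.
CREATE TABLE senddb.order_history_summary AS
SELECT order_line_item_id,
       MAX(updated_at) AS last_updated_at,
       COUNT(*) AS history_rows
FROM senddb.order_histories
WHERE order_status IN ('x','y','z')
  AND fulfillment_location = 'abcd'
GROUP BY order_line_item_id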

Data Structure causing impossible joins

Tables:
nodes
data_texts
data_profiles
data_locations
data_date_times
data_media
data_products
data_metas
categories
tags
categories_nodes
tags_nodes
This is a generalized question that follows on from another question.
Explanation:
Each of the "data" tables has a node_id that refers back to the id of the nodes table (hasMany/belongsTo association).
A "Node" can be anything - a TV Show, a Movie, a Person, an Article...etc (all generated via a CMS, so the user can control what type of "Nodes" they want).
When pulling data, I want to be able to query against certain fields. For example if they do a search, I want to be able to pull nodes that have data_texts.title = '%george%' or order by the datetime field in data_locations.
The problem is, when I do a join on all seven data tables (or more), the query has to hit so many combined rows that it just times out (even with a nearly empty database.... total 200 rows across the entire database).
I realize I can determine IF I need a join depending on what I'm doing - but even with five or six joins (once the database gets to 10k+ records), it's going to be horribly slow, if it works at all. Per this question, the query I'm using, which just does a join on these tables, times out completely.
Each node can have multiple rows of each data type (for multi-language reasons among others).
I'm completely defeated - I'm at the point where I think I need to restructure the entire thing, but I don't have the time for that. I've thought about combining everything into one table, but I'm not sure how, etc.
nodes
CREATE TABLE `nodes` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`),
INDEX `nodeId` (`id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
data_texts:
CREATE TABLE `data_texts` (
`id` CHAR(36) NOT NULL,
`title` VARCHAR(250) NULL DEFAULT NULL,
`subtitle` VARCHAR(500) NULL DEFAULT NULL,
`content` LONGTEXT NULL,
`byline` VARCHAR(250) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`language_id`, `node_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
data_profiles
CREATE TABLE `data_profiles` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(80) NULL DEFAULT NULL,
`email_personal` VARCHAR(100) NULL DEFAULT NULL,
`email_business` VARCHAR(100) NULL DEFAULT NULL,
`email_other` VARCHAR(100) NULL DEFAULT NULL,
`title` VARCHAR(100) NULL DEFAULT NULL,
`description` LONGTEXT NULL,
`prefix` VARCHAR(40) NULL DEFAULT NULL,
`phone_home` VARCHAR(40) NULL DEFAULT NULL,
`phone_business` VARCHAR(40) NULL DEFAULT NULL,
`phone_mobile` VARCHAR(40) NULL DEFAULT NULL,
`phone_other` VARCHAR(40) NULL DEFAULT NULL,
`foreign_key` CHAR(36) NULL DEFAULT NULL,
`model` VARCHAR(40) NULL DEFAULT NULL,
`node_id` CHAR(36) NULL DEFAULT NULL,
`language_id` CHAR(36) NULL DEFAULT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
`user_id` CHAR(36) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `nodeId` (`node_id`),
INDEX `languageId_nodeId` (`node_id`, `language_id`),
INDEX `foreignKey_model` (`foreign_key`, `model`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
categories
CREATE TABLE `categories` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(100) NOT NULL,
`node_type_id` CHAR(36) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`slug` VARCHAR(100) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `nodeTypeId` (`node_type_id`),
INDEX `slug` (`slug`)
)
COMMENT='Used to categorize nodes'
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
categories_nodes
CREATE TABLE `categories_nodes` (
`id` CHAR(36) NOT NULL,
`category_id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `categoryId_nodeId` (`category_id`, `node_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
node_tags
CREATE TABLE `node_tags` (
`id` CHAR(36) NOT NULL,
`name` VARCHAR(40) NOT NULL,
`site_id` CHAR(36) NOT NULL,
`created` DATETIME NOT NULL,
`modified` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `siteId` (`site_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
nodes_node_tags
CREATE TABLE `nodes_node_tags` (
`id` CHAR(36) NOT NULL,
`node_id` CHAR(36) NOT NULL,
`node_tag_id` CHAR(36) NOT NULL,
PRIMARY KEY (`id`),
INDEX `node_id_node_tag_id` (`node_id`, `node_tag_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
MySQL:
SELECT `Node`.`id`, `Node`.`name`, `Node`.`slug`, `Node`.`node_type_id`, `Node`.`site_id`, `Node`.`created`, `Node`.`modified`
FROM `mysite`.`nodes` AS `Node`
LEFT JOIN `mysite`.`data_date_times` AS `DataDateTime` ON (`DataDateTime`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_locations` AS `DataLocation` ON (`DataLocation`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_media` AS `DataMedia` ON (`DataMedia`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_metas` AS `DataMeta` ON (`DataMeta`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_profiles` AS `DataProfile` ON (`DataProfile`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_products` AS `DataProduct` ON (`DataProduct`.`node_id` = `Node`.`id`)
LEFT JOIN `mysite`.`data_texts` AS `DataText` ON (`DataText`.`node_id` = `Node`.`id`)
WHERE 1=1
GROUP BY `Node`.`id`
Firstly, try InnoDB, not MyISAM.
Secondly, remove the group by, see how well it runs then, and how many rows are involved. Shouldn't be that many, but it's interesting.
You don't need the 'nodeId' index on node (as you already have it as a primary key). Again, shouldn't make any difference.
The where clause is irrelevant. You can remove it with no effect one way or another.
Thirdly, well, something is seriously broken.
Have a quick look at how to start profiling (e.g. http://dev.mysql.com/doc/refman/5.0/en/show-profile.html), and run a profile command to see where all the time is going. Post it here if it doesn't immediately show that something is broken.
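A minimal profiling session might look like this (a sketch; SHOW PROFILE still works in this MySQL line, though it has been deprecated in favor of the Performance Schema):
-- Sketch: profile a single run of the slow query.
SET profiling = 1;
-- ... run the SELECT ... GROUP BY `Node`.`id` query from above here ...
SHOW PROFILES;              -- lists recent statements with their query ids and durations
SHOW PROFILE FOR QUERY 1;   -- stage-by-stage timing for the statement with query id 1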
I'm unfortunately not in a position where I can do any tests right now. I'll just throw out some ideas. I might be able to do some tests later.
Be suspicious of different collations.
Some of your ids are useless. For example, you should drop the column categories_nodes.id and put a primary key constraint on {category_id, node_id} instead (see the sketch below).
Be suspicious of any design that requires joining all the tables at run time. There are better ways.
Use InnoDB and foreign key constraints.
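A sketch of what those last suggestions could look like for one of the join tables (the constraint names are made up, and the foreign keys assume the character sets of the referenced CHAR(36) columns have been aligned first, per the collation warning above):
-- Sketch only: switch the tables to InnoDB, drop the surrogate id on the join table,
-- and make (category_id, node_id) the primary key with real foreign keys.
ALTER TABLE nodes ENGINE=InnoDB;
ALTER TABLE categories ENGINE=InnoDB;
ALTER TABLE categories_nodes ENGINE=InnoDB;
ALTER TABLE categories_nodes
  DROP PRIMARY KEY,
  DROP COLUMN id,
  DROP INDEX categoryId_nodeId,
  ADD PRIMARY KEY (category_id, node_id),
  ADD CONSTRAINT fk_cn_category FOREIGN KEY (category_id) REFERENCES categories (id),
  -- requires nodes.id and categories_nodes.node_id to share the same character set
  ADD CONSTRAINT fk_cn_node FOREIGN KEY (node_id) REFERENCES nodes (id);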