Query large table with 50 million rows - MySQL

I am trying to query a large table (senddb.order_histories) that has close to 50M rows. These are the MySQL queries I am using:
FIRST APPROACH - inner join:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join (
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b
on a.id = b.order_line_item_id
and a.fulfillment_status = '2';
EXPLAIN output: (screenshot omitted)
SECOND APPROACH - nested select:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
where a.fulfillment_status = '2'
and a.id in (
select b.order_line_item_id from(
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where
order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b);
I believe a nested select is a bad approach on large data sets, but I added it here anyway because it worked on my sample set. Either way, both queries eventually time out after 600 seconds with the message: Error Code: 2013. Lost connection to MySQL server during query.
I would like to know if there are any ways to alter the query to make it run faster. I have already tried reducing the columns in the inner select / inner join, but that should not really be an issue IMO. I also looked up a solution that says "create a clustered index", but I wasn't really able to follow it. Any help is appreciated.
TABLE order_histories :
order_histories CREATE TABLE `order_histories` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`order_status_description` varchar(255) DEFAULT NULL,
`datetime_stamp` datetime DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`fulfillment_location` int(8) DEFAULT NULL,
`order_status` int(8) DEFAULT NULL,
`user_id` int(8) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`order_line_item_id` int(11) DEFAULT NULL,
`pooled` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `order_histories_ecash_idx` (`order_number`),
KEY `order_line_item_id` (`order_line_item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=454738178 DEFAULT CHARSET=latin1
TABLE order_line_items :
order_line_items CREATE TABLE `order_line_items` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`sku_id` int(8) DEFAULT NULL,
`original_price` float DEFAULT NULL,
`dept_description` varchar(100) DEFAULT NULL,
`description` varchar(100) DEFAULT NULL,
`quantity_ordered` int(8) DEFAULT NULL,
`gift_indicator` char(1) DEFAULT NULL,
`gift_wrap_flag` char(1) DEFAULT NULL,
`shipping_record_flag` char(1) DEFAULT NULL,
`gift_comments` varchar(100) DEFAULT NULL,
`item_status` char(1) DEFAULT NULL,
`tax_amount` float DEFAULT NULL,
`tax_rate` float DEFAULT NULL,
`upc` varchar(20) DEFAULT NULL,
`final_price` float DEFAULT NULL,
`line_number` int(8) DEFAULT NULL,
`master_line_number` int(8) DEFAULT NULL,
`gift_wrap_flag_type` char(1) DEFAULT NULL,
`color_code` varchar(4) DEFAULT NULL,
`size_id` varchar(6) DEFAULT NULL,
`width_id` varchar(6) DEFAULT NULL,
`brand` varchar(15) DEFAULT NULL,
`vpn` varchar(30) DEFAULT NULL,
`dept_number` int(8) DEFAULT NULL,
`class_number` int(8) DEFAULT NULL,
`non_merch_item` char(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`chain_id` int(11) DEFAULT NULL,
`fulfillment_location` int(11) DEFAULT NULL,
`fulfillment_date` datetime DEFAULT NULL,
`fulfillment_status` int(11) DEFAULT NULL,
`fulfillment_sales_associate` int(11) DEFAULT NULL,
`gift_wrap_line_number` int(11) DEFAULT NULL,
`shipping_type` int(11) DEFAULT NULL,
`order_track_info_id` int(11) DEFAULT NULL,
`store_tlog_updated` varchar(1) DEFAULT NULL,
`shipping_tlx_code` int(11) DEFAULT NULL,
`store_closed` tinyint(1) DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`deal_based_index` int(11) DEFAULT NULL,
`tlog_calc_ret_price` float DEFAULT NULL,
`tlog_amount` float DEFAULT NULL,
`tlog_retail_price` float DEFAULT NULL,
`tlog_ext_amount` float DEFAULT NULL,
`tlog_flag_1` int(11) DEFAULT NULL,
`tlog_flag_2` int(11) DEFAULT NULL,
`tlog_flag_3` int(11) DEFAULT NULL,
`time_remaining` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_line_items_ecash_idx` (`order_number`),
KEY `order_line_item_fulfillment_location_idx` (`fulfillment_location`),
KEY `order_line_item_fulfillment_status_idx` (`fulfillment_status`),
KEY `upc_idx` (`upc`),
KEY `sku_id_idx` (`sku_id`),
KEY `order_line_items_idx001` (`order_number`,`id`,`fulfillment_status`),
KEY `order_track_info_id` (`order_track_info_id`),
KEY `shipping_type_idx` (`shipping_type`,`non_merch_item`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=11367052 DEFAULT CHARSET=latin1

This query can be simplified:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join senddb.order_histories b on a.id = b.order_line_item_id
where b.order_status in ('x','y','z')
and b.fulfillment_location = 'abcd'
and a.fulfillment_status = '2';
Since you're only selecting values from table a, you don't need to select specific values from table b and can instead just apply your conditions. Outside of this, you need to ensure that b.order_line_item_id has an index on it. You can find more about creating indexes here. I'm not an expert in MySQL but something similar to this should work if senddb.order_histories.order_line_item_id isn't already the primary key.
CREATE INDEX IX_order_histories_order_line_item_id ON order_histories (order_line_item_id);
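Since order_line_item_id already has a single-column key in the schema above, a composite index that also covers the inner query's WHERE columns might help more. This is only a sketch (the index name is made up), on the assumption that MySQL can then resolve the grouped subquery from the index alone:
-- Covers the filter columns plus the grouping/join key in one index.
ALTER TABLE senddb.order_histories
ADD INDEX idx_oh_loc_status_line (fulfillment_location, order_status, order_line_item_id);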

You need to read up the optimization section of the MySQL docs. It contains a lot of information on how you can optimize your queries and data sets. The main idea here is to add indexes to the fields that are being used as the criteria in the WHERE clause of the SQL statements.
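As a first step, you can list the indexes that already exist and compare them against the columns in your WHERE clauses:
SHOW INDEX FROM senddb.order_histories;
SHOW INDEX FROM senddb.order_line_items;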

Basically, both of your alternatives are using a "sub-SELECT", not an INNER JOIN.
The syntax of a true JOIN is one of the following:
SELECT ...
FROM X INNER JOIN Y USING (field_list)
... or ...
SELECT ...
FROM X INNER JOIN Y ON (x.field1 = y.field2) ...
But in both cases the objects being joined are tables or views.
I'm going to presume ... admittedly, without checking ... that Nick Larsen's answer #1 adequately re-expresses your original query using JOINs.
(Notice how, in his answer, the shorthand identifiers A and B are introduced as referring to each of the two table-names mentioned in his query.)

Firstly, you need to decide whether a 50-million-row result set is really what you are asking for. Big data tables are not there so that you can select all their rows; they are there so that you can ask them questions using SQL queries. SQL is a query language, not a data loading language.
What's your purpose? If you want to copy the data, you can do that by loading it in chunks, for example 1000 rows per query in a loop, as sketched below. If you are loading the data for processing, you can do it the same way.
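A keyset-pagination sketch of that chunked copy (the column list and chunk size are arbitrary; repeat, feeding in the last id seen, until no rows come back):
SELECT id, order_number, fulfillment_status, updated_at
FROM senddb.order_line_items
WHERE id > 0 -- replace 0 with the last id from the previous chunk
ORDER BY id
LIMIT 1000;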
If you want to derive statistical information, you can use an outer join and return a low number of rows using aggregate functions. But you shouldn't do that either; what you "should" do is decide what you want from the table and, preferably, run aggregate functions to store the useful information in a different table (mostly SELECT INTO queries). You should never need to join a table of 50 million records in the first place.
Telling you how to do something wrong using indexes wouldn't be the right thing here.

Related

Long running Mysql Query on Indexes and sort by clause

I have a very long-running MySQL query. The query simply joins two tables, which are very large:
bizevent - nearly 34 million rows
bizevent_actions - nearly 17 million rows
Here is the query:
select
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
bizevent0_.createdBy as createdB4_37_,
bizevent0_.createdOn as createdO5_37_,
bizevent0_.description as descript6_37_,
bizevent0_.iconCss as iconCss7_37_,
bizevent0_.modifiedBy as modified8_37_,
bizevent0_.modifiedOn as modified9_37_,
bizevent0_.name as name10_37_,
bizevent0_.version as version11_37_,
bizevent0_.fired as fired12_37_,
bizevent0_.preCreateFired as preCrea13_37_,
bizevent0_.entityRefClazz as entityR14_37_,
bizevent0_.entityRefIdAsStr as entityR15_37_,
bizevent0_.entityRefIdType as entityR16_37_,
bizevent0_.entityRefName as entityR17_37_,
bizevent0_.entityRefType as entityR18_37_,
bizevent0_.entityRefVersion as entityR19_37_
from
BizEvent bizevent0_
left outer join BizEvent_actions actions1_ on
bizevent0_.id = actions1_.BizEvent_id
where
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
and (actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
order by
bizevent0_.createdOn;
Below are the table definitions. As you can see, I have defined indexes on both tables, on all the search columns plus the sort column, but my queries still run for a very long time. I'd appreciate any further ideas, including on indexing.
-- bizevent definition
CREATE TABLE `bizevent` (
`id` bigint(20) NOT NULL,
`json` longtext,
`account_id` int(11) DEFAULT NULL,
`createdBy` varchar(50) NOT NULL,
`createdon` datetime(3) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`iconCss` varchar(50) DEFAULT NULL,
`modifiedBy` varchar(50) NOT NULL,
`modifiedon` datetime(3) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`version` int(11) NOT NULL,
`fired` bit(1) NOT NULL,
`preCreateFired` bit(1) NOT NULL,
`entityRefClazz` varchar(255) DEFAULT NULL,
`entityRefIdAsStr` varchar(255) DEFAULT NULL,
`entityRefIdType` varchar(25) DEFAULT NULL,
`entityRefName` varchar(255) DEFAULT NULL,
`entityRefType` varchar(50) DEFAULT NULL,
`entityRefVersion` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDXk9kxuuprilygwfwddr67xt1pw` (`createdon`),
KEY `IDXsf3ufmeg5t9ok7qkypppuey7y` (`entityRefIdAsStr`),
KEY `IDX5bxv4g72wxmjqshb770lvjcto` (`entityRefClazz`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- bizevent_actions definition
CREATE TABLE `bizevent_actions` (
`BizEvent_id` bigint(20) NOT NULL,
`action` varchar(255) DEFAULT NULL,
`objectBizType` varchar(255) DEFAULT NULL,
`objectName` varchar(255) DEFAULT NULL,
`objectRefClazz` varchar(255) DEFAULT NULL,
`objectRefIdAsStr` varchar(255) DEFAULT NULL,
`objectRefIdType` int(11) DEFAULT NULL,
`objectRefVersion` int(11) DEFAULT NULL,
`targetBizType` varchar(255) DEFAULT NULL,
`targetName` varchar(255) DEFAULT NULL,
`targetRefClazz` varchar(255) DEFAULT NULL,
`targetRefIdAsStr` varchar(255) DEFAULT NULL,
`targetRefIdType` int(11) DEFAULT NULL,
`targetRefVersion` int(11) DEFAULT NULL,
`embedJson` longtext,
`actions_ORDER` int(11) NOT NULL,
PRIMARY KEY (`BizEvent_id`,`actions_ORDER`),
KEY `IDXa21hhagjogn3lar1bn5obl48gll` (`action`),
KEY `IDX7agsatk8u8qvtj37vhotja0ce77` (`targetRefClazz`),
KEY `IDXa7tktl678kqu3tk8mmkt1mo8lbo` (`targetRefIdAsStr`),
KEY `IDXa22eevu7m820jeb2uekkt42pqeu` (`objectRefClazz`),
KEY `IDXa33ba772tpkl9ig8ptkfhk18ig6` (`objectRefIdAsStr`),
CONSTRAINT `FKr9qjs61id11n48tdn1cdp3wot` FOREIGN KEY (`BizEvent_id`) REFERENCES `bizevent` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
By the way, we are using Amazon RDS MySQL 5.7.33, with 16 GB RAM and 4 vCPUs.
I also ran EXPLAIN EXTENDED on the query (output screenshot omitted). Appreciate any help.
Initially the bizevent_actions search columns didn't have indexes defined. I have since defined them and retried the query, to no avail.
One technique that worked for me in a similar situation was abandoning the idea of a JOIN completely and switching to queries by primary key. In more detail: find out which table in the join would give fewer rows on average if you used only that table and its related filter to query; get the primary keys from that table and then query the other one using WHERE pk IN ().
In your case one example would be:
SELECT
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
...
FROM BizEvent bizevent0_
WHERE
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
AND bizevent0_.id IN (
SELECT BizEvent_id
FROM BizEvent_actions actions1_
WHERE
actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
ORDER BY
bizevent0_.createdOn;
This assumes that you're not actually trying to select 33+ million rows from BizEvent, though; your code with LEFT OUTER JOIN would have done exactly that.
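If the IN subquery itself is the bottleneck, composite indexes matching its two OR branches might help; this is a sketch (the index names are made up):
ALTER TABLE bizevent_actions
ADD INDEX idx_ba_target (targetRefClazz, targetRefIdAsStr),
ADD INDEX idx_ba_object (objectRefClazz, objectRefIdAsStr);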

MySQL JOIN query apparently slow

I have 2 tables. The first, called stazioni, is where I store live weather data from some weather stations; the second, called archivio2, is where archived daily data are stored. The two tables share the station ID (ID on stazioni, IDStazione on archivio2).
stazioni (1,743 rows)
CREATE TABLE `stazioni` (
`ID` int(10) NOT NULL,
`user` varchar(100) NOT NULL,
`nome` varchar(100) NOT NULL,
`email` varchar(50) NOT NULL,
`localita` varchar(100) NOT NULL,
`provincia` varchar(50) NOT NULL,
`regione` varchar(50) NOT NULL,
`altitudine` int(10) NOT NULL,
`stazione` varchar(100) NOT NULL,
`schermo` varchar(50) NOT NULL,
`installazione` varchar(50) NOT NULL,
`ubicazione` varchar(50) NOT NULL,
`immagine` varchar(100) NOT NULL,
`lat` double NOT NULL,
`longi` double NOT NULL,
`file` varchar(255) NOT NULL,
`url` varchar(255) NOT NULL,
`temperatura` decimal(10,1) DEFAULT NULL,
`umidita` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`vento_direzione` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`rate` decimal(10,1) DEFAULT NULL,
`minima` decimal(10,1) DEFAULT NULL,
`massima` decimal(10,1) DEFAULT NULL,
`orario` varchar(16) DEFAULT NULL,
`online` int(1) NOT NULL DEFAULT '0',
`tipo` int(1) NOT NULL DEFAULT '0',
`webcam` varchar(255) DEFAULT NULL,
`webcam2` varchar(255) DEFAULT NULL,
`condizioni` varchar(255) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
archivio2 (2,127,347 rows)
CREATE TABLE `archivio2` (
`ID` int(10) NOT NULL,
`IDStazione` int(4) NOT NULL DEFAULT '0',
`localita` varchar(100) NOT NULL,
`temp_media` decimal(10,1) DEFAULT NULL,
`temp_minima` decimal(10,1) DEFAULT NULL,
`temp_massima` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`records` int(10) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The indexes that I set
-- Indexes for table `archivio2`
--
ALTER TABLE `archivio2`
ADD PRIMARY KEY (`ID`),
ADD KEY `IDStazione` (`IDStazione`),
ADD KEY `Data2` (`Data2`);
-- Indexes for table `stazioni`
--
ALTER TABLE `stazioni`
ADD PRIMARY KEY (`ID`),
ADD KEY `Tipo` (`Tipo`);
ALTER TABLE `stazioni` ADD FULLTEXT KEY `localita` (`localita`);
On a map, a calendar lets me pick the date used to search the archivio2 table, with this INNER JOIN query (I put in an example date):
SELECT *, c.pioggia AS rain, c.raffica AS raff, c.vento AS wind, c.pressione AS press
FROM stazioni as o
INNER JOIN archivio2 as c ON o.ID = c.IDStazione
WHERE c.Data2 LIKE '2019-01-01%'
Everything works fine, but the time needed to show the result is really slow (4-5 seconds), even though the query execution time seems to be OK (about 0.5-1.0 s).
I tried executing the query in phpMyAdmin, and the results are the same: execution is quick, but the time to show the result is extremely slow.
EXPLAIN query result
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE o ALL PRIMARY,ID NULL NULL NULL 1743 NULL
1 SIMPLE c ref IDStazione,Data2 IDStazione 4 sccavzuq_rete.o.ID 1141 Using where
UPDATE: the query runs fine if I remove the index from IDStazione. But that way I lose all the advantages and speed on other queries... why does only this query become slow when I put an index on that field?
In your WHERE clause
WHERE c.Data2 LIKE '2019-01-01%'
the value of Data2 must be cast to a string. No index can be used for that condition.
Change it to
WHERE c.Data2 >= '2019-01-01' AND c.Data2 < '2019-01-01' + INTERVAL 1 DAY
This way the engine should be able to use the index on (Data2).
Now check the EXPLAIN result. I would expect that the table order is swapped and that the key column will show Data2 (for c) and ID (for o).
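For reference, this is the rewritten query to run EXPLAIN against (only the date predicate changes; the rest is the original query):
EXPLAIN SELECT *, c.pioggia AS rain, c.raffica AS raff, c.vento AS wind, c.pressione AS press
FROM stazioni AS o
INNER JOIN archivio2 AS c ON o.ID = c.IDStazione
WHERE c.Data2 >= '2019-01-01' AND c.Data2 < '2019-01-01' + INTERVAL 1 DAY;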
(Fixing the DATE is the main performance solution; here is a less critical issue.)
The tables are much bigger than necessary. Size impacts disk space and, to some extent, speed.
You have 1743 stations, yet the datatype is a 32-bit (4-byte) number (INT). SMALLINT UNSIGNED would allow for 64K stations and use only 2 bytes.
Does it get really, really hot there? Like 999999999.9 degrees? DECIMAL(10,1) takes 5 bytes; DECIMAL(4,1) takes only 3 and allows up to 999.9 degrees. DECIMAL(3,1) has a max of 99.9 and takes only 2 bytes.
What is "localita varchar(100)" doing in the big table? Seems like you could JOIN to the stations table when you need it? Removing that might cut the table size in half.

MySQL LEFT JOIN takes too long for huge data

I have two tables: a property table and its related photos. One property may have many photos, but I want only one of its related photos. When I use a LEFT JOIN, the MySQL query becomes too slow.
Here is my query
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
CREATE TABLE IF NOT EXISTS `sp_property_images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`property_id` varchar(100) NOT NULL,
`filepath_name` text,
`label_name` varchar(45) DEFAULT NULL,
`primary` char(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `property_id` (`property_id`),
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=12941 ;
CREATE TABLE IF NOT EXISTS `sp_property` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`propertytype` varchar(50) NOT NULL,
`adv_type` varchar(45) NOT NULL,
`property_name` text,
`division` varchar(45) NOT NULL,
`township` varchar(45) NOT NULL,
`property_price` decimal(20,2) unsigned DEFAULT NULL,
`price_type` varchar(45) NOT NULL,
`availability` varchar(100) DEFAULT NULL,
`property_address` text,
`p_dimension_length` varchar(45) NOT NULL,
`p_dimension_width` varchar(45) NOT NULL,
`p_dimension_sqft` varchar(45) NOT NULL,
`p_dimension_acre` varchar(45) NOT NULL,
`floor` varchar(45) NOT NULL,
`phone` varchar(100) DEFAULT NULL,
`aircorn` varchar(45) NOT NULL,
`ownership` varchar(45) NOT NULL,
`bedroom` varchar(45) NOT NULL,
`bathroom` varchar(45) NOT NULL,
`special_feature` text,
`amentites` text,
`property_detail` text,
`submit_date` datetime DEFAULT NULL,
`published` varchar(45) NOT NULL,
`published_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`agent_id` varchar(45) NOT NULL,
`source` varchar(45) NOT NULL,
`contact_name` varchar(100) NOT NULL,
`contact_no` varchar(100) NOT NULL,
`contact_address` text NOT NULL,
`contact_email` varchar(100) NOT NULL,
`unit_type` varchar(100) DEFAULT NULL,
`map_lat` varchar(100) DEFAULT NULL,
`map_long` varchar(100) DEFAULT NULL,
`show_map` varchar(3) DEFAULT 'no',
`total_view` bigint(20) NOT NULL DEFAULT '0',
`feature_listing` varchar(10) NOT NULL DEFAULT 'no',
`new_homes_id` int(11) NOT NULL,
`publish_price` int(1) NOT NULL DEFAULT '0',
`usd` decimal(20,2) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=18524 ;
Have you added indices on your tables? You would need three indices on the following columns:
article_photo.a_id for grouping and joining
article_photo.p_id for sorting
article.a_id for joining (although this is hopefully already the PK of your table)
The result of joins is not guaranteed to be sorted in any order, so you probably want to move your ORDER BY clause from the subquery to the outer query:
SELECT * from `article`
LEFT JOIN (
SELECT * from `article_photo`
GROUP BY `a_id`) as images
ON article.a_id = images.a_id
ORDER BY images.p_id DESC
Also, you have no guarantee of which article_photo you will get, since you select columns without an aggregate function (and only MySQL will even allow you to do that).
Now that the question contains the real tables and all information essential to answering, here's my take. First, here's your query:
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
Let's see: you are joining sp_property_images.property_id with sp_property.id. These columns have different types (int vs. varchar), and I suppose this results in a severe performance penalty (since the values have to be converted to the same type).
Then, you are filtering by sp_property.published, so I suggest adding an index on this column as well. Also, examine if you really need to have this column as varchar. A bool/bit flag probably suffices as well (if it doesn't, an enum might be a better choice still).
Ordering benefits from an index too. Add an index spanning both columns sp_property.feature_listing and sp_property.submit_date.
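A sketch of those three suggestions together (the index names and ENUM values are assumptions; the type change presumes property_id only holds numeric strings):
ALTER TABLE sp_property_images
MODIFY property_id INT(10) UNSIGNED NOT NULL; -- match sp_property.id so the join compares ints
ALTER TABLE sp_property
MODIFY published ENUM('yes','no') NOT NULL DEFAULT 'no',
ADD INDEX idx_published (published),
ADD INDEX idx_feature_submit (feature_listing, submit_date);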
If all of the above still doesn't help, you might have to remove the sub-select, as sketched below. It might prevent the MySQL engine from recognizing (and using!) the index you have defined on the sp_property_images.property_id column.
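One way to drop the derived table is a correlated lookup that picks a single photo per property; this is a sketch, assuming the lowest-id photo is acceptable and the type mismatch above has been fixed:
SELECT p.id AS propertyid, p.property_name, p.property_price,
p.adv_type, p.usd, i.filepath_name
FROM sp_property p
LEFT JOIN sp_property_images i
ON i.id = (SELECT MIN(i2.id) -- one deterministic photo per property
FROM sp_property_images i2
WHERE i2.property_id = p.id)
WHERE p.published = 'yes'
ORDER BY p.feature_listing DESC, p.submit_date DESC
LIMIT 1, 20;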
This simple query will give you what you asked for, all articles with their photos
SELECT ar.a_id, ar.a_title, ap.p_id, ap.photo_name
FROM article ar
JOIN article_photo ap on ar.a_id = ap.a_id
There is no reason for a LEFT JOIN and grouping there, unless you want some aggregate (say, a count of photos) per article.

Join data from one table based on 2 values

I have the following SQL statement:
SELECT user_accounts.uacc_id,
user_accounts.uacc_username,
ride_rides.ride_type,
ride_rides.ride_num_seats,
ride_rides.ride_price_seat,
ride_rides.ride_accept_nm,
ride_rides.ride_split_cost,
ride_rides.ride_from,
ride_rides.ride_from_lat,
ride_rides.ride_from_lng,
ride_rides.ride_to,
ride_rides.ride_to_lat,
ride_rides.ride_to_lng,
user_profiles.upro_image_name,
ride_times.ridetms_id,
ride_times.ridetms_return,
ride_times.ridetms_depart_date,
ride_times.ridetms_depart_time,
ride_times.ridetms_return_date,
ride_times.ridetms_return_time,
depart_times.dpttme_text
FROM ride_times
LEFT JOIN ride_rides
ON ride_rides.ride_id = ride_times.ridetms_ride_fk
LEFT JOIN user_accounts
ON ride_rides.ride_uacc_fk = user_accounts.uacc_id
LEFT JOIN user_profiles
ON user_profiles.upro_uacc_fk = user_accounts.uacc_id
LEFT JOIN depart_times
ON depart_times.dpttme_id = ride_times.ridetms_depart_time
WHERE ride_times.ridetms_id = ?"
Right now, I have the query pulling a text representation of the data from ride_times.ridetms_depart_time in the last join, and it works fine. However, I need to do the same with another column in the ride_times table. I think I need to use an alias, but after reading several sources on aliases, I can't wrap my head around how to change the call.
Also, 100 brownie points for any feedback about any glaring mistakes in this call. It is my first attempt at using JOINs.
take care,
lee
Thanks to the responses I've received so far. Here is the structure of the tables involved:
CREATE TABLE `depart_times` (
`dpttme_id` int(11) NOT NULL,
`dpttme_text` varchar(50) DEFAULT NULL,
PRIMARY KEY (`dpttme_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `ride_rides` (
`ride_id` int(11) NOT NULL AUTO_INCREMENT,
`ride_uacc_fk` int(11) NOT NULL,
`ride_date_added` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`ride_type` tinyint(4) DEFAULT NULL,
`ride_from` varchar(200) DEFAULT NULL,
`ride_from_lat` float(10,6) DEFAULT NULL,
`ride_from_lng` float(10,6) DEFAULT NULL,
`ride_to` varchar(200) DEFAULT NULL,
`ride_to_lat` float(10,6) DEFAULT NULL,
`ride_to_lng` float(10,6) DEFAULT NULL,
`ride_num_seats` tinyint(4) DEFAULT NULL,
`ride_price_seat` float DEFAULT NULL,
`ride_accept_nm` tinyint(1) DEFAULT '0' COMMENT 'accept non-monetary items',
`ride_split_cost` tinyint(1) DEFAULT '0',
`ride_notes` longtext,
PRIMARY KEY (`ride_id`)
) ENGINE=MyISAM AUTO_INCREMENT=34 DEFAULT CHARSET=latin1;
CREATE TABLE `ride_times` (
`ridetms_id` int(11) NOT NULL AUTO_INCREMENT,
`ridetms_ride_fk` int(11) DEFAULT NULL,
`ridetms_date_added` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`ridetms_depart_date` date NOT NULL DEFAULT '0000-00-00',
`ridetms_depart_time` tinyint(4) DEFAULT '0',
`ridetms_return` tinyint(1) DEFAULT '0',
`ridetms_return_date` date NOT NULL DEFAULT '0000-00-00',
`ridetms_return_time` tinyint(4) DEFAULT '0',
PRIMARY KEY (`ridetms_id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
CREATE TABLE `user_accounts` (
`uacc_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`uacc_group_fk` smallint(5) unsigned NOT NULL,
`uacc_email` varchar(100) NOT NULL,
`uacc_username` varchar(15) NOT NULL,
`uacc_password` varchar(60) NOT NULL,
`uacc_ip_address` varchar(40) NOT NULL,
`uacc_salt` varchar(40) NOT NULL,
`uacc_activation_token` varchar(40) NOT NULL,
`uacc_forgotten_password_token` varchar(40) NOT NULL,
`uacc_forgotten_password_expire` datetime NOT NULL,
`uacc_update_email_token` varchar(40) NOT NULL,
`uacc_update_email` varchar(100) NOT NULL,
`uacc_active` tinyint(1) unsigned NOT NULL,
`uacc_suspend` tinyint(1) unsigned NOT NULL,
`uacc_fail_login_attempts` smallint(5) NOT NULL,
`uacc_fail_login_ip_address` varchar(40) NOT NULL,
`uacc_date_fail_login_ban` datetime NOT NULL COMMENT 'Time user is banned until due to repeated failed logins',
`uacc_date_last_login` datetime NOT NULL,
`uacc_date_added` datetime NOT NULL,
PRIMARY KEY (`uacc_id`),
UNIQUE KEY `uacc_id` (`uacc_id`),
KEY `uacc_group_fk` (`uacc_group_fk`),
KEY `uacc_email` (`uacc_email`),
KEY `uacc_username` (`uacc_username`),
KEY `uacc_fail_login_ip_address` (`uacc_fail_login_ip_address`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=latin1;
CREATE TABLE `user_profiles` (
`upro_id` int(11) NOT NULL AUTO_INCREMENT,
`upro_uacc_fk` int(11) NOT NULL,
`upro_name` varchar(100) DEFAULT NULL,
`upro_blackberry_id` varchar(200) DEFAULT NULL,
`upro_yahoo_id` varchar(200) DEFAULT NULL,
`upro_skype_id` varchar(200) DEFAULT NULL,
`upro_gmail_id` varchar(200) DEFAULT NULL,
`upro_image_name` varchar(200) DEFAULT 'default.jpg',
PRIMARY KEY (`upro_id`),
UNIQUE KEY `upro_id` (`upro_id`),
KEY `upro_uacc_fk` (`upro_uacc_fk`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=latin1;
So, to clarify:
Right now I am pulling some text from depart_times based on ride_times.ridetms_depart_time. I need to also pull some text from depart_times based on ride_times.ridetms_return_time.
I think, if you have joined all the tables already, you can just put the condition in the WHERE clause, like:
WHERE
ride_times.ridetms_id = ?
AND
depart_times.column_name1 = ride_times.return_time
Does this answer your question?
Well, I must point out this line:
LEFT JOIN depart_times ON depart_times.dpttme_id = ride_times.ridetms_depart_time
It just feels like something is wrong with the schema. But that is only a guess on my part, based on the naming.
Ok,
After LOTS more searching, I found this post:
MySQL alias for SELECT * columns
I have revised my statement to the following and it is working properly:
SELECT user_accounts.uacc_id,
user_accounts.uacc_username,
ride_rides.*,
user_profiles.upro_image_name,
ride_times.*,
dpt1.dpttme_text AS dep_text,
dpt2.dpttme_text AS ret_text
FROM ride_times
LEFT JOIN ride_rides
ON ride_rides.ride_id = ride_times.ridetms_ride_fk
LEFT JOIN user_accounts
ON ride_rides.ride_uacc_fk = user_accounts.uacc_id
LEFT JOIN user_profiles
ON user_profiles.upro_uacc_fk = user_accounts.uacc_id
LEFT JOIN depart_times AS dpt1
ON dpt1.dpttme_id = ride_times.ridetms_depart_time
LEFT JOIN depart_times AS dpt2
ON dpt2.dpttme_id = ride_times.ridetms_return_time
WHERE ride_times.ridetms_id = ?
Thanks very much to everyone who attempted to help me.
take care,
lee

MySQL index help - which is faster?

What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me - practically everything gets stored in a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table, and many of the queries in the code reference the name field, which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds, and I have a feeling it's because the
name column lacks an index, forcing a full table scan. My understanding of indexes is that if I add one to the name field, it'll speed up reads but slow down inserts and updates. Does the index need to be rebuilt completely every time a record is added or updated, or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base that depend on faster writes.
My question:
Assume 100 reads and 100 writes per day: which is more likely to be the faster process for MySQL, executing the above query on the above table without the index, or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding completely novice. Thanks!
EDIT: Here is the table:
CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT 'ProjectObject',
`module` varchar(30) NOT NULL DEFAULT 'system',
`project_id` int(10) unsigned NOT NULL DEFAULT '0',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT '0',
`visibility` tinyint(4) NOT NULL DEFAULT '0',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`),
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8
As @Ray points out, indexes do not have to be rebuilt on every INSERT, UPDATE or DELETE operation. So, if you only want to improve the efficiency of this (and similar) queries, add either an index on (name, type) or on (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server, but it only has to be done once, and then all queries with conditions like yours will benefit. It may also improve the efficiency of several other types of queries that involve name only, or name and type:
WHERE name = 'Design' AND type = 'Ticket' --- your query
WHERE name = 'Design' --- condition on `name` only
GROUP BY name --- group by `name`
WHERE name LIKE 'Design%' --- range condition on `name` only
WHERE name = 'Design' --- equality condition on `name`
AND type LIKE 'Ticket%' --- and range condition on `type`
WHERE name = 'Design' --- equality condition on `name`
GROUP BY type --- and group by `type`
GROUP BY name --- group by `name`
, type --- and `type`
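To confirm the optimizer actually picks the new index, check the plan; the key column should show name_type_IDX (the name assumed above):
EXPLAIN SELECT * FROM project_objects
WHERE type = 'Ticket' AND name = 'Design';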
The insert cost of adding a single-column index on the name column is most likely negligible; it will probably amount to a constant-time increase of no more than a few milliseconds. You will eat up some extra disk space, but that's usually not a concern. Nothing like the multiple seconds you're experiencing in select performance.
Add the index, enjoy the performance improvement.
BTW: Indexes aren't 'rebuilt' on every insert. They're usually implemented as B-trees and, unless you're deleting frequently, should require very little rebalancing once they grow beyond a few levels (and rebalancing at shallow depth is pretty cheap).