MySQL index help - which is faster? - mysql

What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me. Practically everything gets stored in a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table and many of the queries in the code reference the name field which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds, and I have a feeling it's because the name column lacks an index, which forces a full table scan. My understanding of indexes is that if I add one to the name field, it'll speed up reads but slow down inserts and updates. Does the index need to be rebuilt completely every time a record is added or updated, or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base which depend on fast writes.
My question:
Assume 100 reads and 100 writes per day, which is more likely to be a faster process for MySQL - executing the above query on the above table without the index or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding completely novice. Thanks!
EDIT: Here is the table:
CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT 'ProjectObject',
`module` varchar(30) NOT NULL DEFAULT 'system',
`project_id` int(10) unsigned NOT NULL DEFAULT '0',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT '0',
`visibility` tinyint(4) NOT NULL DEFAULT '0',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`),
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8

As @Ray points out, indexes do not have to be rebuilt on every INSERT, UPDATE or DELETE operation. So, if you only want to improve the efficiency of this (or similar) queries, add an index on either (name, type) or (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server but it has to be done once and then all the queries with conditions like yours will benefit. It may also improve efficiency of several other types of queries that involve name only or name and type:
WHERE name = 'Design' AND type = 'Ticket'   -- your query

WHERE name = 'Design'                       -- condition on `name` only

GROUP BY name                               -- group by `name`

WHERE name LIKE 'Design%'                   -- range condition on `name` only

WHERE name = 'Design'                       -- equality condition on `name`
  AND type LIKE 'Ticket%'                   -- and range condition on `type`

WHERE name = 'Design'                       -- equality condition on `name`
GROUP BY type                               -- and group by `type`

GROUP BY name, type                         -- group by `name` and `type`
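To see the effect without touching the production server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption of convenience: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and only the columns the query touches are recreated. It captures the plan for the question's query before and after adding the (name, type) index.

```python
import sqlite3

QUERY = "SELECT * FROM project_objects WHERE type = ? AND name = ?"

def plan(conn):
    """Return SQLite's query-plan detail strings for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Ticket", "Design"))
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
# Only the columns the query needs; the real table has many more.
conn.execute(
    "CREATE TABLE project_objects ("
    " id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT)"
)
conn.executemany(
    "INSERT INTO project_objects (type, name) VALUES (?, ?)",
    [("Ticket", "Design"), ("Category", "Development"), ("Milestone", "Design")],
)

before = plan(conn)
conn.execute("CREATE INDEX name_type_IDX ON project_objects (name, type)")
after = plan(conn)

print("before:", before)  # a full SCAN of project_objects
print("after: ", after)   # a SEARCH using name_type_IDX
```

On the MySQL server itself, running EXPLAIN on the query before and after the ALTER TABLE should show name_type_IDX in the key column once the index exists.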

The insert cost of adding a single-column index on the name column is most likely negligible; it will probably amount to a small per-insert increase, probably no more than a few milliseconds. You will use some extra disk space, but that's usually not a concern. It's nothing like the multiple seconds you're experiencing on select performance.
Add the index, enjoy the performance improvement.
BTW: Indexes aren't 'rebuilt' on every insert. They're usually implemented as B-trees and, unless you're deleting frequently, should require very little rebalancing once you get larger than a few levels (and rebalancing with little depth is pretty cheap).
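A back-of-the-envelope sketch of why an insert stays cheap: each insert walks one root-to-leaf path, so its cost grows with the tree's height, not with the number of rows. The fanout of 100 entries per index page used below is an assumed round number (real InnoDB fanout depends on key size and page size), so treat the output as illustrative only.

```python
def btree_levels(rows: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows`."""
    if rows < 1 or fanout < 2:
        raise ValueError("need rows >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels

# Assumed fanout of 100 entries per index page (illustrative, not measured).
for rows in (450_000, 4_500_000, 45_000_000):
    print(f"{rows:>11,} rows -> {btree_levels(rows, 100)} levels")
```

With that fanout, growing the table tenfold adds at most one level, which is why an extra index typically adds a small per-insert cost rather than anything like a rebuild.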

Related

Mysql Query taking long time where using variable

I have a MySQL event that runs every day at 9:45 AM.
Begin
SET @v_ym :=(SELECT extract(year_month from DATE_SUB(SYSDATE(),INTERVAL 1 DAY)));
SELECT CAST(@ym AS CHAR);
select ssaname,extract(year_month from date_sub(sysdate(),interval 1 day)) ym,
omcr.btscount_ssa(ssaname) btscount,sum(case when duration>30 then duration else 0 end) dur_30 from
btsoutage.bts_faults
where ym=@v_ym and ssaname is not null
group by ssaname;
END;
In the query (ym is year-month and is indexed), when I substitute the variable @v_ym, it does a full table scan and the table is locked for further inserts, whereas when I give the value directly, it uses the index and the output is fast.
The table contains more than 10 million records.
Create table is
CREATE TABLE `bts_faults` (
`bts_name` varchar(250) DEFAULT NULL,
`make` varchar(10) DEFAULT NULL,
`occuredtime` datetime DEFAULT NULL,
`clearedtime` datetime DEFAULT NULL,
`duration` int(10) DEFAULT NULL,
`reason` varchar(100) DEFAULT NULL,
`site_type` varchar(10) DEFAULT NULL,
`tech` varchar(5) DEFAULT NULL,
`fault_id` bigint(20) NOT NULL AUTO_INCREMENT,
`ssaname` varchar(20) DEFAULT NULL,
`fault_type` int(1) DEFAULT '0',
`remarks` varchar(250) DEFAULT NULL,
`bts_section` varchar(100) DEFAULT NULL,
`vendor` varchar(50) DEFAULT NULL,
`occureddate` date DEFAULT NULL,
`cleareddate` date DEFAULT NULL,
`ym` varchar(6) DEFAULT NULL,
`updatedate` datetime DEFAULT NULL,
`USERNAME` varchar(100) DEFAULT NULL,
`mask` int(1) DEFAULT '0',
`mask_cat` varchar(10) DEFAULT NULL,
`outage_cat` varchar(20) DEFAULT NULL,
`site_category` varchar(50) DEFAULT NULL,
`escalated_time` datetime DEFAULT NULL,
`zone` varchar(20) DEFAULT NULL,
`zone_fault_reason` varchar(500) DEFAULT NULL,
`zone_fault_remarks` varchar(500) DEFAULT NULL,
`zone_username` varchar(20) DEFAULT NULL,
`zone_updatetime` datetime DEFAULT NULL,
`zone_fault_duration` int(11) DEFAULT NULL,
`fault_category` varchar(250) DEFAULT NULL,
`remarks_1` varchar(2500) DEFAULT NULL,
PRIMARY KEY (`fault_id`),
UNIQUE KEY `UIDX_BTS_FAULTS` (`bts_name`,`occuredtime`),
KEY `indx_btsfaults_ym` (`ym`),
KEY `indx_btsfaults_cleareddate` (`cleareddate`),
KEY `Index_btsfaults_btsname` (`bts_name`),
KEY `index_btsfaults_ssaname` (`ssaname`),
KEY `indx_btsfaults_occureddate` (`occureddate`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=3807469710 DEFAULT CHARSET=latin1
The EXPLAIN plans for the two variants: (not reproduced here)
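What percentage of the table is in the "current month"? If that is more than something like 20%, then there is no fix; a table scan is likely to be faster. If it is less than 20%, then, as you suspect, @variables may be the villain: ym is VARCHAR(6), but EXTRACT(YEAR_MONTH ...) returns a number, and MySQL cannot use an index on a string column to look up a number (many different strings convert to the same number). In that case, change the test to be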
WHERE ym = CAST(
extract(year_month from DATE_SUB(SYSDATE(),INTERVAL 1 DAY))
AS CHAR)
AND ...
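If the value is computed in application code instead of SQL, the same idea applies: produce a 'YYYYMM' string, matching the VARCHAR(6) ym column in the posted schema, rather than a number. A minimal Python sketch; the function name is hypothetical. The second half illustrates why a numeric comparison cannot walk a string index: the index is ordered lexically, and lexical order is not numeric order.

```python
from datetime import date, timedelta

def previous_day_ym(today: date) -> str:
    """Return yesterday's year-month as 'YYYYMM', the same shape as the ym column."""
    yesterday = today - timedelta(days=1)
    return f"{yesterday.year:04d}{yesterday.month:02d}"

# An index on a VARCHAR column is sorted as strings...
keys = sorted(["9", "10", "100", "0201901", "201901"])
print(keys)  # ['0201901', '10', '100', '201901', '9']
# ...but as numbers these keys are out of order, and '0201901' and '201901'
# become equal, so a numeric lookup cannot binary-search the string ordering.
print([int(k) for k in keys])  # [201901, 10, 100, 201901, 9]
```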
Much faster would be to build and maintain a Summary Table with a PRIMARY KEY of day and ssaname. This would have the subtotals for each day. It would be maintained either as the data is INSERTed or each night after midnight.
Then the 9:45 query becomes very fast. Maybe so fast that you don't even need to do it just once a day, but instead "on-demand".
More discussion: http://mysql.rjweb.org/doc.php/summarytables
I suggest you use NOW() instead of SYSDATE() -- The former is constant throughout a statement; the latter is not.
bts_faults looks like it might be a terabyte in size. If so, you probably want to hear about ways to make it smaller.
If the Auto_inc value is at 3.8B, yet there are only 10M rows, does this mean that you are purging 'old' data? Do you want to discuss speeding up the Deletes? (Start a new Question if you do.)

MySQL extremely slow on a very simple query

I'm getting very slow response running a very simple query in a small table (115k records)...
It takes about 8 seconds to respond, and I can't figure out why it's taking that long. Any advice would be awesome.
Table:
CREATE TABLE `financeiro_fluxo` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`branch` int(10) unsigned NOT NULL,
`abertura` int(10) DEFAULT NULL,
`origem` int(10) unsigned DEFAULT NULL,
`status_pagamento` tinyint(3) unsigned DEFAULT NULL,
`conta` int(10) unsigned NOT NULL,
`tipo_lancamento` tinyint(3) unsigned NOT NULL,
`categoria` int(10) unsigned NOT NULL,
`tipo_entidade` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
`entidade` int(10) unsigned DEFAULT NULL,
`entidade_input` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`tipo_pagamento` tinyint(3) unsigned NOT NULL,
`parcela` smallint(5) unsigned NOT NULL,
`parcelas` smallint(5) unsigned NOT NULL,
`valor` decimal(12,2) NOT NULL,
`valor_taxa` decimal(12,2) DEFAULT NULL,
`valor_troco` decimal(12,2) DEFAULT NULL,
`confirmado` tinyint(3) unsigned DEFAULT NULL,
`data_confirmacao` datetime DEFAULT NULL,
`vencimento` date NOT NULL,
`info` varchar(510) COLLATE utf8_unicode_ci DEFAULT NULL,
`bandeira` int(10) unsigned DEFAULT NULL,
`user_add` int(10) unsigned NOT NULL,
`user_last` int(10) unsigned NOT NULL,
`param_ref` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`param` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`file` int(10) unsigned DEFAULT NULL,
`date_created` datetime NOT NULL,
`date_modified` datetime NOT NULL,
`status` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=116749 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Query:
SELECT * from financeiro_fluxo
Explain:
id select_type table type key key_len rows
1 SIMPLE financeiro_fluxo ALL NULL NULL 116244
The same query running on localhost with the same table returns in less than a second...
Profile: (not reproduced here)
It seems you are doing a full table scan because your query does not include any limiting conditions (for example, a WHERE clause or LIMIT). To make the query perform better, use indexed columns with some kind of criteria. What happens if you add WHERE id IS NOT NULL?
I assume you need all the records; if not, limit the result set by adding conditions in a more specific WHERE clause (on an indexed column) or a LIMIT clause.
Will the "reports" aggregate data? If so, you could speed up the 8-second (remote) query by doing more work in the server, thereby shipping less data across the wire.
That is, think about whether AVG(..), COUNT(*), SUM(..), MAX(..), etc can be done in the SELECT.
Taking that another step... Build and maintain a "Summary table" that has subtotals (etc). Then, reading (or scanning) the summary table and summing up the subtotals, etc, will be even faster, both in the Server and across the wire.
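The difference in what crosses the wire can be sketched with sqlite3 standing in for the server. The column names branch and valor come from the posted financeiro_fluxo table; the data are made up, and valor is held as integer cents here (an assumption, to keep the sums exact). Both approaches give the same totals, but the aggregate query returns one row per branch instead of every row.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE financeiro_fluxo (id INTEGER PRIMARY KEY, branch INTEGER, valor INTEGER)"
)
# Made-up data: 3000 rows spread over 3 branches; valor held as integer cents.
conn.executemany(
    "INSERT INTO financeiro_fluxo (branch, valor) VALUES (?, ?)",
    [(i % 3 + 1, i) for i in range(3000)],
)

# Client-side: ship every row, then total them in the application.
raw_rows = conn.execute("SELECT branch, valor FROM financeiro_fluxo").fetchall()
client_totals = defaultdict(int)
for branch, valor in raw_rows:
    client_totals[branch] += valor

# Server-side: let the database aggregate and ship one row per branch.
agg_rows = conn.execute(
    "SELECT branch, SUM(valor) FROM financeiro_fluxo GROUP BY branch"
).fetchall()
server_totals = dict(agg_rows)

print(len(raw_rows), "rows vs", len(agg_rows), "rows")  # 3000 rows vs 3 rows
```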
(And I agree with the need to avoid *, and that the 8 seconds is probably due to network delay (and "bandwidth"). Where is the server geographically? How long does SELECT 1; take?)

Mysql JOIN query apparently slow

I have 2 tables. The first, called stazioni, stores live weather data from some weather stations; the second, called archivio2, stores archived daily data. The two tables have the station ID in common (ID in stazioni, IDStazione in archivio2).
stazioni (1,743 rows)
CREATE TABLE `stazioni` (
`ID` int(10) NOT NULL,
`user` varchar(100) NOT NULL,
`nome` varchar(100) NOT NULL,
`email` varchar(50) NOT NULL,
`localita` varchar(100) NOT NULL,
`provincia` varchar(50) NOT NULL,
`regione` varchar(50) NOT NULL,
`altitudine` int(10) NOT NULL,
`stazione` varchar(100) NOT NULL,
`schermo` varchar(50) NOT NULL,
`installazione` varchar(50) NOT NULL,
`ubicazione` varchar(50) NOT NULL,
`immagine` varchar(100) NOT NULL,
`lat` double NOT NULL,
`longi` double NOT NULL,
`file` varchar(255) NOT NULL,
`url` varchar(255) NOT NULL,
`temperatura` decimal(10,1) DEFAULT NULL,
`umidita` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`vento_direzione` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`rate` decimal(10,1) DEFAULT NULL,
`minima` decimal(10,1) DEFAULT NULL,
`massima` decimal(10,1) DEFAULT NULL,
`orario` varchar(16) DEFAULT NULL,
`online` int(1) NOT NULL DEFAULT '0',
`tipo` int(1) NOT NULL DEFAULT '0',
`webcam` varchar(255) DEFAULT NULL,
`webcam2` varchar(255) DEFAULT NULL,
`condizioni` varchar(255) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
archivio2 (2,127,347 rows)
CREATE TABLE `archivio2` (
`ID` int(10) NOT NULL,
`IDStazione` int(4) NOT NULL DEFAULT '0',
`localita` varchar(100) NOT NULL,
`temp_media` decimal(10,1) DEFAULT NULL,
`temp_minima` decimal(10,1) DEFAULT NULL,
`temp_massima` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`records` int(10) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The indexes that I set
-- Indexes for table `archivio2`
--
ALTER TABLE `archivio2`
ADD PRIMARY KEY (`ID`),
ADD KEY `IDStazione` (`IDStazione`),
ADD KEY `Data2` (`Data2`);
-- Indexes for table `stazioni`
--
ALTER TABLE `stazioni`
ADD PRIMARY KEY (`ID`),
ADD KEY `Tipo` (`Tipo`);
ALTER TABLE `stazioni` ADD FULLTEXT KEY `localita` (`localita`);
On a map, I pick a date from a calendar and search the archivio2 table for it with this INNER JOIN query (example date shown):
SELECT *, c.pioggia AS rain, c.raffica AS raff, c.vento AS wind, c.pressione AS press
FROM stazioni as o
INNER JOIN archivio2 as c ON o.ID = c.IDStazione
WHERE c.Data2 LIKE '2019-01-01%'
Everything works, but the results take a long time to display (4-5 seconds), even though the query execution time seems OK (about 0.5-1.0 s).
I tried executing the query in phpMyAdmin, and the results are the same: execution is quick, but displaying the results is extremely slow.
EXPLAIN query result
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE o ALL PRIMARY,ID NULL NULL NULL 1743 NULL
1 SIMPLE c ref IDStazione,Data2 IDStazione 4 sccavzuq_rete.o.ID 1141 Using where
UPDATE: the query runs fine if I remove the index on IDStazione, but then I lose the speed advantage on other queries. Why does only this query become slow when that field is indexed?
In your WHERE clause
WHERE c.Data2 LIKE '2019-01-01%'
the value of Data2 must be cast to a string, so no index can be used for that condition.
Change it to
WHERE c.Data2 >= '2019-01-01' AND c.Data2 < '2019-01-01' + INTERVAL 1 DAY
This way the engine should be able to use the index on (Data2).
Now check the EXPLAIN result. I would expect the table order to be swapped and the key column to show Data2 (for c) and ID (for o).
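The shape of the fix can be checked with a small sqlite3 sketch of archivio2 (only ID, IDStazione and Data2, with made-up rows). SQLite is a stand-in with its own rules: there the prefix LIKE is unindexable because of collation requirements rather than a cast, so this only mirrors the MySQL behavior. Both filters return the same rows, but only the half-open range is planned as an index search.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Data2 kept as 'YYYY-MM-DD HH:MM:SS' text, since SQLite has no DATETIME type.
conn.execute(
    "CREATE TABLE archivio2 (ID INTEGER PRIMARY KEY, IDStazione INTEGER, Data2 TEXT)"
)
conn.execute("CREATE INDEX Data2 ON archivio2 (Data2)")
conn.executemany(
    "INSERT INTO archivio2 VALUES (?, ?, ?)",
    [(1, 10, "2018-12-31 23:59:59"), (2, 10, "2019-01-01 00:00:00"),
     (3, 11, "2019-01-01 23:59:59"), (4, 11, "2019-01-02 00:00:00")],
)

def plan(sql, params):
    """Return SQLite's query-plan detail strings for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

day = date(2019, 1, 1)
like_sql = "SELECT * FROM archivio2 WHERE Data2 LIKE ?"
like_params = (day.isoformat() + "%",)
# Half-open range [day, day + 1 day), the same form as the answer's fix.
range_sql = "SELECT * FROM archivio2 WHERE Data2 >= ? AND Data2 < ?"
range_params = (day.isoformat(), (day + timedelta(days=1)).isoformat())

like_rows = conn.execute(like_sql, like_params).fetchall()
range_rows = conn.execute(range_sql, range_params).fetchall()

print(plan(like_sql, like_params))    # a SCAN of archivio2
print(plan(range_sql, range_params))  # a SEARCH using index Data2
```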
(Fixing the DATE is the main performance solution; here is a less critical issue.)
The tables are much bigger than necessary. Size impacts disk space and, to some extent, speed.
You have 1743 stations, yet the datatype is a 32-bit (4-byte) number (INT). SMALLINT UNSIGNED would allow for 64K stations and use only 2 bytes.
Does it get really, really hot there? Like 999999999.9 degrees? DECIMAL(10,1) takes 5 bytes; DECIMAL(4,1) takes only 3 and allows up to 999.9 degrees. DECIMAL(3,1) has a max of 99.9 and takes only 2 bytes.
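Those byte counts follow MySQL's documented packing rule for DECIMAL: each full group of nine digits takes 4 bytes, leftover digits take 0 to 4 bytes, and the integer and fractional parts are packed separately. A small Python sketch of that rule (function names are illustrative) reproduces the 5, 3 and 2 byte figures and estimates the saving for one column of archivio2.

```python
# Bytes needed for 0..8 leftover digits, per MySQL's DECIMAL storage table.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]

def digits_bytes(digits: int) -> int:
    """Storage for one side (integer or fractional part) of a DECIMAL value."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]

def decimal_bytes(m: int, d: int) -> int:
    """Size of DECIMAL(M, D): integer and fractional parts are packed separately."""
    if not 0 <= d <= m:
        raise ValueError("need 0 <= D <= M")
    return digits_bytes(m - d) + digits_bytes(d)

for m, d in [(10, 1), (4, 1), (3, 1)]:
    print(f"DECIMAL({m},{d}) -> {decimal_bytes(m, d)} bytes")

# Saving if one column moves from DECIMAL(10,1) to DECIMAL(4,1),
# across the 2,127,347 rows reported for archivio2.
saved_bytes = 2_127_347 * (decimal_bytes(10, 1) - decimal_bytes(4, 1))
print(f"{saved_bytes:,} bytes saved per column")  # 4,254,694 bytes saved per column
```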
What is "localita varchar(100)" doing in the big table? Seems like you could JOIN to the stations table when you need it? Removing that might cut the table size in half.

Query large table with 50 million rows

I'm trying to query a large table (senddb.order_histories) that has close to 50M rows, and this is the MySQL query I am using:
FIRST APPROACH- inner join:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join (
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b
on a.id = b.order_line_item_id
and a.fulfillment_status = '2';
EXPLAIN output: (not reproduced here)
SECOND APPROACH- nested select:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
where a.fulfillment_status = '2'
and a.id in (
select b.order_line_item_id from(
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where
order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b);
I believe a nested select is a bad approach on large data, but I added it here anyway because it worked on my sample set. Either way, both queries eventually time out after 600 seconds with the message: Error Code: 2013. Lost connection to MySQL server during query.
I would like to know if there are any ways to alter the query to make it run faster. I have already tried reducing the columns in the inner select / inner join, but that should not really be an issue IMO. I also looked up a solution that says "create a clustered index", but I wasn't really able to follow it. Any help is appreciated.
TABLE order_histories :
CREATE TABLE `order_histories` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`order_status_description` varchar(255) DEFAULT NULL,
`datetime_stamp` datetime DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`fulfillment_location` int(8) DEFAULT NULL,
`order_status` int(8) DEFAULT NULL,
`user_id` int(8) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`order_line_item_id` int(11) DEFAULT NULL,
`pooled` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `order_histories_ecash_idx` (`order_number`),
KEY `order_line_item_id` (`order_line_item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=454738178 DEFAULT CHARSET=latin1
TABLE order_line_items :
CREATE TABLE `order_line_items` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`sku_id` int(8) DEFAULT NULL,
`original_price` float DEFAULT NULL,
`dept_description` varchar(100) DEFAULT NULL,
`description` varchar(100) DEFAULT NULL,
`quantity_ordered` int(8) DEFAULT NULL,
`gift_indicator` char(1) DEFAULT NULL,
`gift_wrap_flag` char(1) DEFAULT NULL,
`shipping_record_flag` char(1) DEFAULT NULL,
`gift_comments` varchar(100) DEFAULT NULL,
`item_status` char(1) DEFAULT NULL,
`tax_amount` float DEFAULT NULL,
`tax_rate` float DEFAULT NULL,
`upc` varchar(20) DEFAULT NULL,
`final_price` float DEFAULT NULL,
`line_number` int(8) DEFAULT NULL,
`master_line_number` int(8) DEFAULT NULL,
`gift_wrap_flag_type` char(1) DEFAULT NULL,
`color_code` varchar(4) DEFAULT NULL,
`size_id` varchar(6) DEFAULT NULL,
`width_id` varchar(6) DEFAULT NULL,
`brand` varchar(15) DEFAULT NULL,
`vpn` varchar(30) DEFAULT NULL,
`dept_number` int(8) DEFAULT NULL,
`class_number` int(8) DEFAULT NULL,
`non_merch_item` char(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`chain_id` int(11) DEFAULT NULL,
`fulfillment_location` int(11) DEFAULT NULL,
`fulfillment_date` datetime DEFAULT NULL,
`fulfillment_status` int(11) DEFAULT NULL,
`fulfillment_sales_associate` int(11) DEFAULT NULL,
`gift_wrap_line_number` int(11) DEFAULT NULL,
`shipping_type` int(11) DEFAULT NULL,
`order_track_info_id` int(11) DEFAULT NULL,
`store_tlog_updated` varchar(1) DEFAULT NULL,
`shipping_tlx_code` int(11) DEFAULT NULL,
`store_closed` tinyint(1) DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`deal_based_index` int(11) DEFAULT NULL,
`tlog_calc_ret_price` float DEFAULT NULL,
`tlog_amount` float DEFAULT NULL,
`tlog_retail_price` float DEFAULT NULL,
`tlog_ext_amount` float DEFAULT NULL,
`tlog_flag_1` int(11) DEFAULT NULL,
`tlog_flag_2` int(11) DEFAULT NULL,
`tlog_flag_3` int(11) DEFAULT NULL,
`time_remaining` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_line_items_ecash_idx` (`order_number`),
KEY `order_line_item_fulfillment_location_idx` (`fulfillment_location`),
KEY `order_line_item_fulfillment_status_idx` (`fulfillment_status`),
KEY `upc_idx` (`upc`),
KEY `sku_id_idx` (`sku_id`),
KEY `order_line_items_idx001` (`order_number`,`id`,`fulfillment_status`),
KEY `order_track_info_id` (`order_track_info_id`),
KEY `shipping_type_idx` (`shipping_type`,`non_merch_item`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=11367052 DEFAULT CHARSET=latin1
This query can be simplified:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join senddb.order_histories b on a.id = b.order_line_item_id
where b.order_status in ('x','y','z')
and b.fulfillment_location = 'abcd'
and a.fulfillment_status = '2';
Since you're only selecting values from table a, you don't need to select specific values from table b and can instead just apply your conditions. Beyond that, you need to ensure that b.order_line_item_id has an index on it. You can find more about creating indexes in the MySQL documentation. I'm not an expert in MySQL, but something similar to this should work if senddb.order_histories.order_line_item_id isn't already indexed (the posted schema already includes KEY `order_line_item_id`, in which case this step is unnecessary):
CREATE INDEX IX_order_histories_order_line_item_id ON order_histories (order_line_item_id);
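One caveat with the flattened join: order_histories holds many history rows per line item, so a line item with several matching history rows is returned once per match, whereas the original GROUP BY produced one row per line item. A minimal sqlite3 sketch with tiny made-up data (the integers standing in for the question's 'x', 'y', 'z' and 'abcd' placeholders are hypothetical, since both columns are int) shows the duplicates and a WHERE EXISTS form that avoids them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line_items (id INTEGER PRIMARY KEY, fulfillment_status INTEGER);
    CREATE TABLE order_histories (
        id INTEGER PRIMARY KEY,
        order_line_item_id INTEGER,
        order_status INTEGER,
        fulfillment_location INTEGER
    );
    CREATE INDEX order_line_item_id ON order_histories (order_line_item_id);
""")
conn.executemany("INSERT INTO order_line_items VALUES (?, ?)", [(1, 2), (2, 2), (3, 1)])
# Line item 1 has two matching history rows, 2 has one, 3 has the wrong status.
conn.executemany(
    "INSERT INTO order_histories (order_line_item_id, order_status, fulfillment_location)"
    " VALUES (?, ?, ?)",
    [(1, 10, 500), (1, 11, 500), (2, 10, 500), (3, 10, 500), (2, 99, 500)],
)
STATUSES = (10, 11, 12)  # hypothetical stand-ins for 'x', 'y', 'z'
LOCATION = 500           # hypothetical stand-in for 'abcd'

join_ids = [r[0] for r in conn.execute(
    "SELECT a.id FROM order_line_items a"
    " JOIN order_histories b ON a.id = b.order_line_item_id"
    " WHERE b.order_status IN (?, ?, ?) AND b.fulfillment_location = ?"
    " AND a.fulfillment_status = 2 ORDER BY a.id",
    (*STATUSES, LOCATION),
)]
exists_ids = [r[0] for r in conn.execute(
    "SELECT a.id FROM order_line_items a"
    " WHERE a.fulfillment_status = 2 AND EXISTS ("
    "   SELECT 1 FROM order_histories b"
    "   WHERE b.order_line_item_id = a.id"
    "   AND b.order_status IN (?, ?, ?) AND b.fulfillment_location = ?)"
    " ORDER BY a.id",
    (*STATUSES, LOCATION),
)]
print(join_ids)    # [1, 1, 2]
print(exists_ids)  # [1, 2]
```

Adding DISTINCT to the join would also remove the duplicates; EXISTS simply stops at the first matching history row.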
You need to read up on the optimization section of the MySQL docs. It contains a lot of information on how you can optimize your queries and data sets. The main idea here is to add indexes to the fields that are used as criteria in the WHERE clause of the SQL statements.
Basically, both of your alternatives are using a "sub-SELECT," not an INNER JOIN.
The syntax of a true JOIN is one of the following:
SELECT ...
FROM X INNER JOIN Y USING (field_list)
... or ...
SELECT ...
FROM X INNER JOIN Y ON (X.field1 = Y.field2) ...
But in both cases the objects being joined are tables or views.
I'm going to presume ... admittedly, without checking ... that Nick Larsen's answer #1 adequately re-expresses your original query using JOINs.
(Notice how, in his answer, the shorthand identifiers A and B are introduced as referring to each of the two table-names mentioned in his query.)
Firstly, you need to decide whether a 50-million-row result set is what you are asking for. Big data tables are not there so that you can select all their rows. They are there so that you can ask them questions using SQL queries. SQL is a query language; it's not a data loading language.
What's your purpose? If you want to copy the data, you can do that by loading it in chunks, for example 1000 rows per query in a for loop. If you are loading the data for processing, you can do it the same way.
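A minimal sketch of the "N rows per query in a loop" approach, using sqlite3 as a stand-in. Rather than LIMIT/OFFSET, which makes the server read and discard all skipped rows on every batch, it resumes from the last primary-key value seen (keyset pagination), a technique swapped in here rather than named in the answer. The function name, batch size and sample data are illustrative; the WHERE id > ? ORDER BY id LIMIT ? query is also valid MySQL.

```python
import sqlite3

def iter_batches(conn, batch_size=1000):
    """Yield lists of order_histories rows in primary-key order,
    one bounded query per batch (keyset pagination on id)."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, order_line_item_id, updated_at FROM order_histories"
            " WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last id seen

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_histories ("
    " id INTEGER PRIMARY KEY, order_line_item_id INTEGER, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO order_histories (order_line_item_id, updated_at) VALUES (?, ?)",
    [(i // 5, "2019-01-01 00:00:00") for i in range(2500)],
)

batches = list(iter_batches(conn, batch_size=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```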
If you want to derive statistical information, you can use an outer join and return a small number of rows using aggregate functions. But you shouldn't do that either; what you "should" do is decide what you want from the table and, preferably, run aggregate functions to store useful information in a different table (mostly INSERT ... SELECT or CREATE TABLE ... SELECT queries; in MySQL, SELECT ... INTO writes to variables or a file, not a table). You should never need to join a table of 50 million records in the first place.
Telling you how to do something wrong using indexes wouldn't be the right thing here.

SQL Table Split-Is it necessary

I have a table with around 600,000-700,000 (6-7 lakh) records, and it's going to grow as time passes. It has around 16-20 columns in it. There are no one-to-many relationships on any of these columns.
User data entries are stored in this table.
So would it be feasible to split my table into multiple smaller tables, or else just split it into 2 halves: one with all the entries, and the other with only the fresh records that are presented to the data entry operators to fill in their entries?
In short, my question is whether MySQL execution time would be faster if I split the table into multiple tables, or if I split it into two halves.
I guess the latter would be more feasible since it would not need any join queries.
Updated:
CREATE TABLE `images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`primary_category_id` int(10) unsigned DEFAULT NULL,
`secondary_category_id` int(10) unsigned DEFAULT NULL,
`front_url` varchar(255) DEFAULT NULL,
`back_url` varchar(255) DEFAULT NULL,
`title` varchar(100) DEFAULT NULL,
`part` varchar(10) DEFAULT NULL,
`photo_id` int(10) unsigned DEFAULT NULL,
`photo_dt_month` varchar(2) DEFAULT NULL,
`photo_dt_day` varchar(2) DEFAULT NULL,
`photo_dt_yr` varchar(4) DEFAULT NULL,
`type` varchar(25) DEFAULT NULL,
`size_width` int(10) unsigned DEFAULT NULL,
`size_height` int(10) unsigned DEFAULT NULL,
`dpi` int(10) unsigned NOT NULL DEFAULT '0',
`dpix` int(10) unsigned DEFAULT NULL,
`dpiy` int(10) unsigned DEFAULT NULL,
`in_stock` varchar(50) DEFAULT NULL,
`outlet` varchar(50) DEFAULT NULL,
`source` varchar(50) DEFAULT NULL,
`keywords` varchar(255) DEFAULT NULL,
`emotional_keywords` varchar(255) DEFAULT NULL,
`mechanical_keywords` varchar(255) DEFAULT NULL,
`description` text,
`notes` text,
`comments` text,
`exported_to_ebay_dt` datetime DEFAULT NULL,
`exported_to_ebay` set('Y','N') NOT NULL DEFAULT 'N',
`updated_worker_id` int(10) unsigned DEFAULT NULL,
`updated_worker_dt` datetime DEFAULT NULL,
`locked_worker_id` int(10) unsigned DEFAULT NULL,
`locked_worker_dt` datetime DEFAULT NULL,
`updated_admin_id` int(10) unsigned DEFAULT NULL,
`updated_admin_dt` datetime DEFAULT NULL,
`added_dt` datetime DEFAULT NULL,
`updated_manager_id` int(10) unsigned DEFAULT NULL,
`updated_manager_dt` datetime DEFAULT NULL,
`manager_review` set('Y','N') NOT NULL DEFAULT 'N',
`paid_status` set('Y','N') NOT NULL DEFAULT 'N',
`exported_to_web_dt` datetime DEFAULT NULL,
`exported_to_web` set('Y','N') DEFAULT 'N',
`prefix` varchar(50) DEFAULT NULL,
`is_premium` set('Y','N') DEFAULT 'N',
`template` varchar(50) DEFAULT 'HIPE_default',
`photographer` varchar(100) DEFAULT NULL,
`copyright` varchar(100) DEFAULT NULL,
`priority` int(4) DEFAULT '1',
`step` set('1','2') DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `part` (`part`),
KEY `primary_category_id` (`primary_category_id`),
KEY `updated_worker_id` (`updated_worker_id`),
KEY `updated_worker_dt` (`updated_worker_dt`)
) ENGINE=MyISAM AUTO_INCREMENT=1013687 DEFAULT CHARSET=latin1
The above is my table structure. After around 100,000 (1 lakh) entries have been made, I would move them into another table, say images_history, with the same structure. Is this feasible, or should I split it into multiple tables to reduce the query execution time?
Why do you want to split the table? It would lead to a ton of extra code and slow down execution by adding extra queries if you still want to access both of the new tables. (If one of the tables is going to store rarely used previous versions of records of the images table, i.e. version control, it may still be a good idea.)
Before even thinking about splitting the table, see if you can increase performance by optimizing the existing code, making sure none of the following performance disasters are present:
Do all SELECTs filter by PRIMARY KEY?
Is the index cache large enough to hold all indices in the computer's RAM?
Are any string-matching SELECTs with LIKE using the indices? I.e. only exact matches or wildcards on the right, never on the left (e.g. "searchword%" and never "%searchword").
Are there any slow performing queries that use SELECT * instead of selecting only the columns you need?
Have you avoided using OR in SELECTs?
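The LIKE point can be illustrated with a plain sorted list standing in for an index. This is a simplification: real indexes are B-trees, and MySQL collations are usually case-insensitive while this sketch is case-sensitive. A right-hand wildcard maps to one contiguous range of sorted keys that binary search can jump to; a left-hand wildcard gives the sort order nothing to work with, so every key must be checked.

```python
from bisect import bisect_left

def prefix_matches(sorted_keys, prefix):
    """Like `LIKE 'prefix%'` on an index: binary-search to the first candidate,
    then read forward only while keys still share the prefix."""
    matches = []
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches

def suffix_matches(sorted_keys, suffix):
    """Like `LIKE '%suffix'`: the sort order does not help, so check every key."""
    return [key for key in sorted_keys if key.endswith(suffix)]

keys = sorted(["sunset", "sunrise", "beach sunset", "mountain", "sun", "night sky"])
print(prefix_matches(keys, "sun"))     # ['sun', 'sunrise', 'sunset']
print(suffix_matches(keys, "sunset"))  # ['beach sunset', 'sunset']
```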
Performing queries on a table with 700,000 records shouldn't be slow if tables are properly indexed and queries actually use those indices.