MySQL COUNT very slow - mysql

I have got this table:
CREATE TABLE `pertemba_client_raw_data` (
`line_id` int(11) NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL COMMENT 'References pertemba_client_feed_log.feed_id',
`data_line` int(11) NOT NULL COMMENT 'Eg. The CSV line number or JSON object index.',
`property_title` varchar(255) NOT NULL COMMENT 'Eg. The CSV header or JSON key.',
`property_value` varchar(255) NOT NULL COMMENT 'Eg. The CSV field value or JSON object value.',
`date_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`line_id`),
UNIQUE KEY `pertemba_client_raw_data_line_id_pk` (`line_id`),
KEY `feed_id` (`feed_id`),
CONSTRAINT `pertemba_client_raw_data_ibfk_1` FOREIGN KEY (`feed_id`) REFERENCES `pertemba_client_feed_log` (`feed_id`)
) ENGINE=InnoDB AUTO_INCREMENT=113121 DEFAULT CHARSET=utf8
The table currently contains about 110,000 records, but it will grow much larger.
I have a PHP process running against this table that is very slow; its run time is currently 10+ minutes. Whenever I run SHOW PROCESSLIST; this query from the process is always running:
SELECT COUNT(pcr.line_id) AS result FROM pertemba_client_raw_data AS pcr
WHERE pcr.feed_id = :feedId
AND pcr.property_title = :title
AND pcr.property_value = :optionLink
I would appreciate any optimisations that can be suggested for beating this problem.

The first step is to identify the problem. Try:
EXPLAIN SELECT COUNT(pcr.line_id) AS result
FROM pertemba_client_raw_data AS pcr
WHERE pcr.feed_id = :feedId
AND pcr.property_title = :title
AND pcr.property_value = :optionLink
For your query, as pointed out by juergen, I believe you can improve performance by extending the index on feed_id into a composite index that also covers property_title and property_value, such as:
KEY `feed_id` (`feed_id`, `property_title`, `property_value`)
After that, run EXPLAIN again to confirm whether the performance issue is solved.
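As a sketch of the suggestion above, the index could be added with ALTER TABLE instead of editing the table definition. The index name idx_feed_title_value and the literal values in the EXPLAIN are placeholders, not taken from the question.

```sql
-- Add a composite index covering all three columns in the WHERE clause.
-- The index name is hypothetical; pick whatever fits your naming scheme.
ALTER TABLE pertemba_client_raw_data
  ADD INDEX idx_feed_title_value (feed_id, property_title, property_value);

-- Re-check the plan, with literal placeholder values instead of bound parameters.
EXPLAIN SELECT COUNT(pcr.line_id) AS result
FROM pertemba_client_raw_data AS pcr
WHERE pcr.feed_id = 1
  AND pcr.property_title = 'some title'
  AND pcr.property_value = 'some value';
```

If the new index is picked up, the key column of the EXPLAIN output should name it, and the rows estimate should drop well below the table size.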

Related

Simple join makes MySQL/MariaDB COUNT(*) rows very slow

I'm joining two tables and counting returned rows with simple MySQL query:
SELECT SQL_NO_CACHE count(parc2.id)
FROM SHIP__shipments AS ship
JOIN SHIP__shipments_parcels AS parc2 ON ship.shipmentId = parc2.shipmentId
It takes approx. 2 seconds to return the result, which is a count of around 800k rows. The primary table has about 700k rows and the joined table about 800k rows.
Both tables have indexes and all that stuff. The join without counting is very fast, about 0.005s.
Counting just one table is also very fast, something like 0.01s.
Once the count and the join are in the same query, it drops to 2s, with 99% of the time in "sending data" according to the profiler.
Output from explain:
id  select_type  table  type   possible_keys           key          key_len  ref              rows    Extra
1   SIMPLE       ship   index  PRIMARY                 senderId     4        NULL             738700  Using index
1   SIMPLE       parc2  ref    shippmentId,shipmentId  shippmentId  4        ship.shipmentId  1       Using index
I ran lots of experiments during testing: combined keys, count(*), forcing a particular index, and more exotic approaches such as subqueries. Nothing really helps; it is always that slow.
Tables:
CREATE TABLE `SHIP__shipments` (
`shipmentId` int(11) NOT NULL COMMENT 'generated ID',
`externalId` varchar(255) DEFAULT NULL COMMENT 'spedition number',
`senderId` int(11) NOT NULL COMMENT 'FK - sender address',
`recipientId` int(11) DEFAULT NULL COMMENT 'Fk - recipient address',
`customerId` int(11) NOT NULL COMMENT 'FK - custromer',
`packageCount` int(11) NOT NULL COMMENT 'number of parcels',
`shipmentPickupDate` datetime NOT NULL COMMENT 'when to pickup shipent',
`shipmenmtDescription` varchar(255) DEFAULT NULL COMMENT 'free description',
`codAmount` double DEFAULT NULL COMMENT 'COD to take',
`codReference` varchar(255) DEFAULT NULL COMMENT 'customer''s COD refference',
`codCurrencyCode` varchar(50) DEFAULT NULL COMMENT 'FK - currency',
`codConfirmed` tinyint(1) NOT NULL COMMENT 'COD confirmed by spedition',
`codSent` tinyint(1) NOT NULL COMMENT 'COD paid to customer? 1/0',
`trackingCountryCode` varchar(50) NOT NULL COMMENT 'FK - country of shippment tracking',
`subscriptionDate` datetime NOT NULL COMMENT 'when to enter to the sped. system',
`speditionCode` varchar(50) NOT NULL COMMENT 'FK - spedition',
`shipmentType` enum('DIRECT','WAREHOUSE') NOT NULL DEFAULT 'WAREHOUSE' COMMENT 'internal OLZA flag',
`weight` decimal(10,3) NOT NULL COMMENT 'sum weight of parcells',
`billingPrice` decimal(10,2) NOT NULL COMMENT 'stored price of delivery',
`billingCurrencyCode` varchar(50) NOT NULL COMMENT 'storred currency of delivery price',
`invoiceCreated` tinyint(1) NOT NULL COMMENT 'invoicing has been done? 1/0',
`invoicingDate` datetime NOT NULL COMMENT 'date of creating invoice',
`pickupPlaceId` varchar(100) DEFAULT NULL COMMENT 'pickup place ID, if applicable for shipment',
`created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`lastCheckDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'last date of status check'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='shippment details';
ALTER TABLE `SHIP__shipments`
ADD PRIMARY KEY (`shipmentId`),
ADD UNIQUE KEY `senderId` (`senderId`) USING BTREE,
ADD UNIQUE KEY `externalId` (`externalId`,`trackingCountryCode`,`speditionCode`),
ADD UNIQUE KEY `recipientId_2` (`recipientId`),
ADD KEY `recipientId` (`recipientId`),
ADD KEY `customerId` (`customerId`),
ADD KEY `codCurrencyCode` (`codCurrencyCode`),
ADD KEY `trackingCountryCode` (`trackingCountryCode`),
ADD KEY `speditionCode` (`speditionCode`);
ALTER TABLE `SHIP__shipments`
MODIFY `shipmentId` int(11) NOT NULL AUTO_INCREMENT COMMENT 'generated ID';
ALTER TABLE `SHIP__shipments`
ADD CONSTRAINT `SHIP__shipments_ibfk_3` FOREIGN KEY (`recipientId`) REFERENCES `SHIP__recipient_list` (`recipientId`),
ADD CONSTRAINT `SHIP__shipments_ibfk_4` FOREIGN KEY (`customerId`) REFERENCES `CUST__customer_list` (`customerId`),
ADD CONSTRAINT `SHIP__shipments_ibfk_5` FOREIGN KEY (`codCurrencyCode`) REFERENCES `SYS__currencies` (`code`),
ADD CONSTRAINT `SHIP__shipments_ibfk_6` FOREIGN KEY (`trackingCountryCode`) REFERENCES `SYS__countries` (`code`),
ADD CONSTRAINT `SHIP__shipments_ibfk_7` FOREIGN KEY (`speditionCode`) REFERENCES `SYS__speditions` (`code`),
ADD CONSTRAINT `SHIP__shipments_ibfk_8` FOREIGN KEY (`senderId`) REFERENCES `SHIP__sender_list` (`senderId`);
CREATE TABLE `SHIP__shipments_parcels` (
`id` int(11) NOT NULL COMMENT 'generated ID',
`shipmentId` int(11) NOT NULL COMMENT 'FK - shippment',
`externalNumber` varchar(255) DEFAULT NULL COMMENT 'number from spedition',
`externalBarcode` varchar(255) DEFAULT NULL COMMENT 'Barcode ID - external reference',
`status` varchar(100) DEFAULT NULL COMMENT 'FK - current status',
`weigth` decimal(10,3) NOT NULL COMMENT 'weight of parcel',
`weightConfirmed` tinyint(1) NOT NULL COMMENT 'provided weight has been confirmed/updated by measuring',
`parcelType` varchar(255) NOT NULL COMMENT 'foreign key',
`created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='data and relations between shippment and it''s parcels';
ALTER TABLE `SHIP__shipments_parcels`
ADD PRIMARY KEY (`id`),
ADD KEY `shippmentId` (`shipmentId`,`status`),
ADD KEY `status` (`status`),
ADD KEY `parcelType` (`parcelType`),
ADD KEY `externalBarcode` (`externalBarcode`),
ADD KEY `weightConfirmed` (`weightConfirmed`),
ADD KEY `externalNumber` (`externalNumber`),
ADD KEY `shipmentId` (`shipmentId`);
ALTER TABLE `SHIP__shipments_parcels`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'generated ID';
ALTER TABLE `SHIP__shipments_parcels`
ADD CONSTRAINT `SHIP__shipments_parcels_ibfk_2` FOREIGN KEY (`status`) REFERENCES `SHIP__statuses` (`statusCode`),
ADD CONSTRAINT `SHIP__shipments_parcels_ibfk_3` FOREIGN KEY (`shipmentId`) REFERENCES `SHIP__shipments` (`shipmentId`),
ADD CONSTRAINT `SHIP__shipments_parcels_ibfk_4` FOREIGN KEY (`parcelType`) REFERENCES `SHIP__parcel_types` (`parcelType`);
Server is running on SSD disks and we are not talking about a lot of data here.
Am I missing something here? Or is 2 seconds a realistic time for counting these rows?
Can I get the count in a "normal" time, like 0.01s?
We are running MariaDB 10.
Analysis
Let's dissect some columns and the EXPLAIN:
`shipmentId` int(11) (*3) NOT NULL COMMENT 'generated ID',
`senderId` int(11) (*3) NOT NULL COMMENT 'FK - sender address',
1 SIMPLE ship index PRIMARY
senderId (*2) 4 NULL 738700 Using index (*1)
1 SIMPLE parc2 ref shippmentId,shipmentId
shippmentId (*4) 4 ship.shipmentId 1 Using index (*1)
SELECT ... count(parc2.id) (*5) ... STRAIGHT_JOIN (*6) ...
Notes:
*1 -- Both are Using index; this is likely to help a lot.
*2 -- INDEX(senderId) is probably the "smallest" index. Note that you are using InnoDB. The PK is "clustered" with the data, so it is not "small". Every secondary index has the PK implicitly tacked on, so that is effectively (senderId, shipmentId). This explains why the Optimizer mysteriously picked INDEX(senderId).
*3 -- INT takes 4 bytes, allowing numbers up to +/- 2 billion. Do you expect to have that many senders and shipments? Shrinking the datatype (and making it UNSIGNED) will save some space and I/O, and therefore may speed things up a little.
*4 -- INDEX(shipmentId) is actually like INDEX(shipmentId, id), again 2 INTs.
*5 -- COUNT(x) checks x for being NOT NULL. This is probably unnecessary in your application. Change to COUNT(*) unless you do need the null check. (The performance difference will be minor.)
*6 -- It probably does not matter which table it picks first, except perhaps for what indexes are available. Hence, STRAIGHT_JOIN did not help.
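The difference described in note *5 only shows up on a nullable column. A small illustration, assuming the SHIP__shipments_parcels table from the question, where externalNumber is declared DEFAULT NULL:

```sql
-- COUNT(*) counts rows; COUNT(col) counts only rows where col IS NOT NULL.
SELECT COUNT(*)              AS all_rows,
       COUNT(externalNumber) AS rows_with_external_number
FROM SHIP__shipments_parcels;
```

Because parc2.id is a NOT NULL primary key, COUNT(parc2.id) and COUNT(*) return the same number in the original query.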
Now let's discuss how the JOIN works. Virtually all JOINs in MySQL are "NLJ" (Nested Loop Join). This is where the code walks through one of the tables (actually just an index for one table), then reaches into the other table (also, just into an index) for each row found.
To do a COUNT(*) it only needs to check for the existence of the row.
So, it walked through the 2-column INDEX(senderId, shipmentId) to find a list of all shipmentIds in the first table. It did not waste time sorting or de-duplicating that list. And, since shipmentId is the PK (hence UNIQUE), there won't be any dups.
For each shipmentId, it then looked up all the rows in the second table. That was efficient to do because of INDEX(shipmentId, id).
I/O (or not)
Let's digress into another issue. Was there any I/O? Were all those rows of those two indexes fully cached in RAM? What is the value of innodb_buffer_pool_size?
The way InnoDB fetches a row (from a table or from an index) is to first check to see if it is in the "buffer pool". If it is not there, then it must bump something out of the buffer pool and read the desired 16KB block into the buffer pool.
At one extreme, nothing is in the buffer pool and all the blocks must be read from disk. At the other extreme, all are cached, and no I/O is needed. Since you tried all sorts of things, I assume that all the relevant blocks (those two indexes) were in RAM.
2 INTs * (800K + 700K rows) + some overhead = maybe 50MB. Assuming innodb_buffer_pool_size is more than that, and no swapping occurred, then it is reasonable for there to be no I/O.
So, how long should it take to touch 1.5M rows that are fully cached, in a JOIN? Alas, 2 seconds seems reasonable.
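The caching assumption can be checked instead of guessed. A sketch, assuming a MySQL 5.6+ or MariaDB 10.0+ server with persistent InnoDB statistics enabled (the mysql.innodb_index_stats table is only populated in that case):

```sql
-- Configured buffer pool size in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate size of each index of the two tables.
-- For stat_name = 'size', stat_value is a page count.
SELECT table_name,
       index_name,
       stat_value * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name IN ('SHIP__shipments', 'SHIP__shipments_parcels')
  AND stat_name = 'size';

-- Logical reads that could not be served from the buffer pool.
-- Compare the value before and after running the COUNT query.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

If Innodb_buffer_pool_reads does not move while the query runs, the 2 seconds are CPU time spent walking cached index entries, not disk I/O.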
User expectations
It is rare to need an exact, up-to-the-second count that is in the millions. Rethink the User requirement. Or we can discuss ways to pre-compute the value. Or dead-reckon it.
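Two of these options might look like the following. This is a sketch only: the summary table SHIP__parcel_counts and its columns are hypothetical, and the refresh would have to be scheduled by something outside this snippet (a cron job or a server event, for example).

```sql
-- Dead reckoning: InnoDB's row estimate, which is approximate by design.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'SHIP__shipments_parcels';

-- Pre-computing: a one-row summary table (hypothetical name and layout).
CREATE TABLE SHIP__parcel_counts (
  id           TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  parcel_count BIGINT UNSIGNED  NOT NULL,
  refreshed_at DATETIME         NOT NULL
) ENGINE=InnoDB;

-- Periodic refresh: pays the 2 seconds once per refresh, not once per request.
REPLACE INTO SHIP__parcel_counts (id, parcel_count, refreshed_at)
SELECT 1, COUNT(*), NOW()
FROM SHIP__shipments AS ship
JOIN SHIP__shipments_parcels AS parc2 ON ship.shipmentId = parc2.shipmentId;

-- Reading the count is then a primary-key lookup.
SELECT parcel_count, refreshed_at
FROM SHIP__parcel_counts
WHERE id = 1;
```

If the foreign key on parc2.shipmentId is enforced, every parcel matches exactly one shipment, so COUNT(*) on SHIP__shipments_parcels alone should return the same number as the join. That shortcut only holds as long as no filter on SHIP__shipments is added.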
Side notes
(These do not impact the question at hand.)
Don't blindly use 255 for all strings.
UNIQUE(x) is an INDEX, so don't also have INDEX(x).
Having more than 2 PRIMARY or UNIQUE indexes is usually a design error in the schema.
Some columns could (should?) be normalized. Example: parcelType?
Don't use FLOAT or DOUBLE for monetary values; use DECIMAL. (weight could be floating.)

MySQL composite index effect on joins

I have the following SQL query (DB is MySQL 5):
select
event.full_session_id,
DATE(min(event.date)),
event_exe.user_id,
COUNT(DISTINCT event_pat.user_id)
FROM
event AS event
JOIN event_participant AS event_pat ON
event.pat_id = event_pat.id
JOIN event_participant AS event_exe on
event.exe_id = event_exe.id
WHERE
event_pat.user_id <> event_exe.user_id
GROUP BY
event.full_session_id;
"SHOW CREATE TABLE event":
CREATE TABLE `event` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`session_id` varchar(64) DEFAULT NULL,
`full_session_id` varchar(72) DEFAULT NULL,
`pat_id` int(12) DEFAULT NULL,
`exe_id` int(12) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `SESSION_IDX` (`full_session_id`),
KEY `PAT_ID_IDX` (`pat_id`),
KEY `DATE_IDX` (`date`),
KEY `SESSLOGPATEXEC_IDX` (`full_session_id`,`date`,`pat_id`,`exe_id`)
) ENGINE=MyISAM AUTO_INCREMENT=371955 DEFAULT CHARSET=utf8
"SHOW CREATE TABLE event_participant":
CREATE TABLE `event_participant` (
`id` int(12) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) NOT NULL,
`alt_user_id` varchar(64) NOT NULL,
`username` varchar(128) NOT NULL,
`usertype` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ALL_UNQ` (`user_id`,`alt_user_id`,`username`,`usertype`),
KEY `USER_ID_IDX` (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=5397 DEFAULT CHARSET=utf8
Also, the query itself seems ugly, but this is legacy code on a production system, so we are not expected to change it (at least for now).
The problem is that there are around 36 million records in the event table (in the production system), so the DB machine has crashed frequently because of the "Using temporary; Using filesort" processing. (They provided these EXPLAIN outputs; unfortunately, I don't have them right now. I'll try to add them to this post later.)
The customer asks for a "quick fix" by adding indices. Currently we have indices on full_session_id, pat_id, date (separately) on event and user_id on event_participant.
Thus I'm thinking of creating a composite index (pat_id, exe_id, full_session_id, date) on event. This index comprises the fields used in the join (equivalent to a WHERE?), then the GROUP BY, then the aggregate (MIN) parts.
This is just an idea, because we currently don't have that kind of data volume to test with, so we are trying the best we can first.
My question is:
Could the index above help performance? (The effect is quite confusing, because I have found two contrasting results: https://dba.stackexchange.com/questions/158385/compound-index-on-inner-join-table versus "Separate Join clause in a Composite Index"; the latter suggests that a composite index on joins won't work and the former that it will.)
Does this path (adding indices) have any hope? Or should we forget it and just try to optimize the query instead?
Thanks in advance for your help :)
Update:
I have updated the full table description for the two related tables.
MySQL version is 5.1.69. But I think we don't need to worry about the ambiguous data issue mentioned in the comments, because it seems there won't be ambiguity for our data. Specifically, for each full_session_id, there is only one "event_exe.user_id" returned (it's just a business logic in the application)
So, what do you think about my 2 questions ?

Indexes for a large MYSQL table

I hope you will allow me to pick your brains so I can gain some knowledge in the process.
We have 3 tables - data_product, data_issuer, data_accountbalance
CREATE TABLE `data_issuer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`issuer_name` varchar(128) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`issuer_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_product_name_issuer_id_260fec65_uniq` (`name`,`issuer_id`),
KEY `data_product_issuer_id_d07fa696_fk_data_issuer_id` (`issuer_id`),
CONSTRAINT `data_product_issuer_id_d07fa696_fk_data_issuer_id` FOREIGN KEY
(`issuer_id`) REFERENCES `data_issuer` (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_accountbalance` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`nominee_name` varchar(128) NOT NULL,
`beneficiary_name` varchar(128) NOT NULL,
`nominee_id` varchar(128) NOT NULL,
`account_id` varchar(16) NOT NULL,
`product_id` int(11) NOT NULL,
`register_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_accountbalance_date_product_id_nominee__7b8d2c6a_uniq` (`date`,`product_id`,`nominee_id`,`beneficiary_name`),
KEY `data_accountbalance_product_id_nominee_id_date_8ef8754f_idx` (`product_id`,`nominee_id`,`date`),
KEY `data_accountbalance_register_id_4e78ec16_fk_data_register_id` (`register_id`),
KEY `data_accountbalance_product_id_date_nominee_i_c3a41e39_idx` (`product_id`,`date`,`nominee_id`,`beneficiary_name`,`balance_amount`),
CONSTRAINT `data_accountbalance_product_id_acfb18f6_fk_data_product_id` FOREIGN KEY (`product_id`) REFERENCES `data_product` (`id`),
CONSTRAINT `data_accountbalance_register_id_4e78ec16_fk_data_register_id` FOREIGN KEY (`register_id`) REFERENCES `data_register` (`id`)
) ENGINE=InnoDB
When running the query below, the system takes about an hour to respond -
SELECT SQL_NO_CACHE *
from data_product
INNER JOIN `data_issuer` ON (`data_issuer`.`id` = `data_product`.`issuer_id`)
INNER JOIN `data_accountbalance` ON (`data_accountbalance`.`product_id` = `data_product`.`id`)
LIMIT 100000000;
Both data_issuer and data_product have only a few hundred records each, but data_accountbalance is huge, with about 15,384,358 records.
The explain plan produced is below -
# id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE data_product ALL PRIMARY,data_product_issuer_id_d07fa696_fk_data_issuer_id 459 100
1 SIMPLE data_issuer eq_ref PRIMARY PRIMARY 4 pnl.data_product.issuer_id 1 100
1 SIMPLE data_accountbalance ref data_accountbalance_product_id_nominee_id_date_8ef8754f_idx,data_accountbalance_product_id_date_nominee_i_c3a41e39_idx data_accountbalance_product_id_date_nominee_i_c3a41e39_idx 4 pnl.data_product.id 493 100
Can someone help tune the query so it does not take an hour to run please? Appreciate any pointers you might have for me.
If your query is literally what you are showing there, then that's the problem: it has no WHERE clause.
That query would literally return 15,384,358 rows. As the two smaller tables are typical domain tables with NOT NULL relations all the way across, it will return exactly one result row for every row in data_accountbalance.
The actual time cost will probably be in creating a massive temp table (though I'm not sure about that). If the goal really is to download the entire database, all 3 tables, you could look into tuning your MySQL temp table configuration to possibly speed this up, or preferably arrange to read the results as MySQL produces them once the query starts executing (which avoids a temp table). Alternatively, maybe the script that runs this query is trying to read the whole data set into memory, which takes a long time?
Is there a particular reason to download all the data? Usually you download only the data you mean to operate on, or have MySQL do the grouping, summing, etc. and then return the answer you wanted based on all the data.
How many rows did you expect the query to return? If you are thinking of something less than 15 million, then the answer is to add some kind of WHERE clause or an aggregate function. Whichever table and columns you use to reduce the result set, those columns will have to be indexed.
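For illustration, here are two ways the result set could be reduced. The product id and the date range are made-up placeholder values; the column names come from the table definitions in the question.

```sql
-- Fetch only the rows for one product and one date range.
-- The existing index starting with (product_id, date, ...) can serve this filter.
SELECT p.name, i.issuer_name, ab.date, ab.nominee_id, ab.account_id
FROM data_product AS p
INNER JOIN data_issuer AS i ON i.id = p.issuer_id
INNER JOIN data_accountbalance AS ab ON ab.product_id = p.id
WHERE ab.product_id = 123
  AND ab.date BETWEEN '2020-01-01' AND '2020-01-31';

-- Or let MySQL aggregate, returning one row per product instead of 15 million rows.
SELECT p.id, p.name, COUNT(*) AS balance_rows
FROM data_product AS p
INNER JOIN data_accountbalance AS ab ON ab.product_id = p.id
GROUP BY p.id, p.name;
```

The second query still has to read every matching index entry on the server, but it sends only a few hundred rows back to the client.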
I hope this helps. :)

mysql select with order by using filesort no index used

Sorry for the long post, but this is really strange and I am close to giving up. There are 2 tables:
CREATE TABLE `endu_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
`base_yob` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_results_206a6355` (`base_name`),
KEY `endu_results_63df4402` (`base_nr`),
KEY `base_yob` (`base_yob`)
) ENGINE=InnoDB AUTO_INCREMENT=3424028 DEFAULT CHARSET=utf8;
and 2nd:
CREATE TABLE `endu_resultinterest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`result_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `endu_resultinterest_3b529087` (`result_id`),
CONSTRAINT `result_id_refs_id_19e24435` FOREIGN KEY (`result_id`) REFERENCES `endu_results` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=48590 DEFAULT CHARSET=utf8;
There are about 2 million records in the endu_results table and fewer than 100K in endu_resultinterest. I have a slow query:
explain select base_yob from endu_resultinterest
inner join endu_results
on (endu_results.id = endu_resultinterest.result_id)
order by endu_results.base_yob;
1 SIMPLE endu_resultinterest index endu_resultinterest_3b529087 endu_resultinterest_3b529087 4 NULL 47559 Using index; Using temporary; Using filesort
The question is: why is MySQL using the index endu_resultinterest_3b529087 when it should use base_yob, which is the column the sort is requested on?
To test this further, I manually created 2 additional identical tables, endu_testresults and endu_testresultinterest, and filled them with some records:
CREATE TABLE `endu_testresults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_yob` int(11) DEFAULT NULL,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_testresults_a65b2616` (`base_yob`),
KEY `endu_testresults_ba0ab39c` (`base_name`),
KEY `endu_testresults_d75ba04d` (`base_nr`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8;
So I go again for explain:
explain select base_yob from endu_testresultinterest
inner join endu_testresults
on (endu_testresults.id = endu_testresultinterest.result_id)
order by endu_testresults.base_yob;
and surprise, surprise:
1 SIMPLE endu_testresults index PRIMARY endu_testresults_a65b2616 5 NULL 19 Using index
The index on the sort column base_yob (endu_testresults_a65b2616) is now used.
Why is the index used in one case, while in the other I get "Using filesort; Using temporary"? Does size matter? I will try copying records from one table to the other, but I do not understand what is going on with the indexes. MySQL is 5.6.16.
Short answer: Because it is faster.
Long answer...
Your EXPLAINs seem to be incomplete -- I would expect 2 lines in each.
The first table is 20 (70?) times as big as the second. The optimizer picked the smaller table to start with. Hence it is initially doing 1/20th the amount of work. The sort that comes later (ORDER BY ...) is much less work than if it had to do 20 times as much work to start with.
The output is only 48K rows, correct? And that is how many rows in the 2nd table, correct?
Your test tables did not have the same bigger/smaller ratio, did they? Hence the different EXPLAIN.
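One way to test the "because it is faster" claim is to force the other plan and compare timings. A sketch using the table and index names from the question; forcing the join order and the index like this is meant as an experiment, not as a recommended fix.

```sql
-- Plan chosen by the optimizer: start from the small table, then sort.
SELECT r.base_yob
FROM endu_resultinterest AS ri
INNER JOIN endu_results AS r ON r.id = ri.result_id
ORDER BY r.base_yob;

-- Forced alternative: walk the base_yob index of the large table in order,
-- probing endu_resultinterest for each row.
SELECT r.base_yob
FROM endu_results AS r FORCE INDEX (base_yob)
STRAIGHT_JOIN endu_resultinterest AS ri ON ri.result_id = r.id
ORDER BY r.base_yob;
```

Both queries return the same rows. If the second one is slower on the real data, the optimizer's choice of a filesort over the small table was the cheaper plan.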

Working with EF4 and MySql

Need some advice working with EF4 and MySql.
I have a table with lots of data items. Each item belongs to a module and a zone. The data item also has a timestamp (ticks). The most common usage is for the app to query for data after a specified time for a module and a zone. The data should be sorted.
The problem is that the query selects too many rows and the database server runs low on memory, resulting in a very slow query. I tried to limit the query to 100 items, but the generated SQL only applies the limit after all the items have been selected and sorted.
dataRepository.GetData().WithModuleId(ModuleId).InZone(ZoneId).After(ztime).OrderBy(p => p.Timestamp).Take(100).ToList();
Generated SQL by the MySql .Net Connector 6.3.6
SELECT
`Project1`.`Id`,
`Project1`.`Data`,
`Project1`.`Timestamp`,
`Project1`.`ModuleId`,
`Project1`.`ZoneId`,
`Project1`.`Version`,
`Project1`.`Type`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`Data`,
`Extent1`.`Timestamp`,
`Extent1`.`ModuleId`,
`Extent1`.`ZoneId`,
`Extent1`.`Version`,
`Extent1`.`Type`
FROM `DataItems` AS `Extent1`
WHERE ((`Extent1`.`ModuleId` = 1) AND (`Extent1`.`ZoneId` = 1)) AND
(`Extent1`.`Timestamp` > 634376753657189002)) AS `Project1`
ORDER BY
`Timestamp` ASC LIMIT 100
Table definition
CREATE TABLE `mydb`.`DataItems` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`Data` mediumblob NOT NULL,
`Timestamp` bigint(20) NOT NULL,
`ModuleId` bigint(20) NOT NULL,
`ZoneId` bigint(20) NOT NULL,
`Version` int(11) NOT NULL,
`Type` varchar(1000) NOT NULL,
PRIMARY KEY (`Id`),
KEY `IX_FK_ModuleDataItem` (`ModuleId`),
KEY `IX_FK_ZoneDataItem` (`ZoneId`),
KEY `Index_4` (`Timestamp`),
KEY `Index_5` (`ModuleId`,`ZoneId`),
CONSTRAINT `FK_ModuleDataItem` FOREIGN KEY (`ModuleId`) REFERENCES
`Modules` (`Id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `FK_ZoneDataItem` FOREIGN KEY (`ZoneId`) REFERENCES `Zones`
(`Id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=22904 DEFAULT CHARSET=utf8;
All suggestions on how to solve this are welcome.
What's your GetData() method doing? I'd bet it's executing a query on the entire table. And that's why your Take(100) at the end isn't doing anything.
I solved this by using the Table Splitting method described here:
Table splitting in entity framework
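For comparison, a hand-written query without the derived table, plus a composite index matching its WHERE and ORDER BY, might look like this. This is a sketch, not what the connector generates: the index name IX_Module_Zone_Timestamp is hypothetical, and whether it removes the problem depends on the MySQL version and on how the query is actually issued.

```sql
-- Hypothetical composite index: equality columns first, then the sort column.
ALTER TABLE `DataItems`
  ADD INDEX `IX_Module_Zone_Timestamp` (`ModuleId`, `ZoneId`, `Timestamp`);

-- Same filter, ordering and limit as the generated SQL, without the subquery in FROM.
SELECT `Id`, `Data`, `Timestamp`, `ModuleId`, `ZoneId`, `Version`, `Type`
FROM `DataItems`
WHERE `ModuleId` = 1
  AND `ZoneId` = 1
  AND `Timestamp` > 634376753657189002
ORDER BY `Timestamp` ASC
LIMIT 100;
```

With equality conditions on the first two index columns and a range on the third, MySQL can read the matching rows already in Timestamp order and stop after 100, instead of sorting every matching row (including the mediumblob Data column) first.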