MySQL: need faster GROUP BY on a single table

I am trying to select one property per state from a table with 1,000,000 properties. This is my query:
select * from properties
where latitude is not null and longitude is not null
group by property_state;
But the query takes 3 seconds. I have an index on latitude and longitude, and an index on state. I tried adding a third index on all 3 columns, but that did not help.
Any ideas?
Here is the CREATE TABLE code, in case that helps (I removed the new index that did not help):
CREATE TABLE `t_national_comps` (
`deal_Id` INT(11) NULL DEFAULT NULL,
`nc_id` INT(11) NOT NULL AUTO_INCREMENT,
`property_id` INT(15) NULL DEFAULT NULL,
`reonomy_property_id` VARCHAR(50) NULL DEFAULT NULL,
`reonomy_url` VARCHAR(80) NULL DEFAULT NULL,
`confidence` FLOAT NULL DEFAULT NULL,
`latitude` DECIMAL(11,8) NULL DEFAULT NULL,
`longitude` DECIMAL(11,8) NULL DEFAULT NULL,
`prop_key` VARCHAR(255) NULL DEFAULT NULL,
`fmt_address` VARCHAR(255) NULL DEFAULT NULL,
`property_street_number` VARCHAR(20) NULL DEFAULT NULL,
`property_street_name` VARCHAR(40) NULL DEFAULT NULL,
`property_street_mode` VARCHAR(20) NULL DEFAULT NULL,
`property_city` VARCHAR(40) NULL DEFAULT NULL,
`property_state` VARCHAR(10) NULL DEFAULT NULL,
`property_zip` VARCHAR(10) NULL DEFAULT NULL,
`property_zip4` VARCHAR(10) NULL DEFAULT NULL,
`municipality` VARCHAR(40) NULL DEFAULT NULL,
`property_class_id` VARCHAR(15) NULL DEFAULT NULL,
`std_land_use_code` VARCHAR(15) NULL DEFAULT NULL,
`sale_doc_num` VARCHAR(30) NULL DEFAULT NULL,
`mortgage_doc_num` VARCHAR(30) NULL DEFAULT NULL,
`mortgage_date` DATE NULL DEFAULT NULL,
`lender` VARCHAR(100) NULL DEFAULT NULL,
`bank_id` INT(11) NULL DEFAULT NULL,
`loan_amount` BIGINT(15) NULL DEFAULT NULL,
`maturity_date` DATE NULL DEFAULT NULL,
`rate` VARCHAR(20) NULL DEFAULT NULL,
`sale_date` DATE NULL DEFAULT NULL,
`curr_sale_contract_date` DATE NULL DEFAULT NULL,
`curr_sale_document_type` VARCHAR(20) NULL DEFAULT NULL,
`sale_price` BIGINT(22) NULL DEFAULT NULL,
`curr_sale_buyer1_full_name` VARCHAR(60) NULL DEFAULT NULL,
`curr_sale_buyer2_full_name` VARCHAR(60) NULL DEFAULT NULL,
`reported_owner` VARCHAR(60) NULL DEFAULT NULL,
`mailing_address` VARCHAR(500) NULL DEFAULT NULL,
`curr_sale_seller1_full_name` VARCHAR(60) NULL DEFAULT NULL,
`curr_sale_seller2_full_name` VARCHAR(60) NULL DEFAULT NULL,
`sq_footage` VARCHAR(10) NULL DEFAULT NULL,
`resi_units` VARCHAR(10) NULL DEFAULT NULL,
`commercial_units` VARCHAR(10) NULL DEFAULT NULL,
`num_floors` VARCHAR(10) NULL DEFAULT NULL,
`num_buildings` VARCHAR(10) NULL DEFAULT NULL,
`price_per_sq_ft` INT(11) NULL DEFAULT NULL,
`price_per_unit` INT(11) NULL DEFAULT NULL,
`property_type_id` INT(11) NULL DEFAULT NULL,
`property_type` VARCHAR(60) NULL DEFAULT NULL,
`long_lat_point` POINT NULL DEFAULT NULL,
PRIMARY KEY (`nc_id`),
INDEX `t_national_comps_latitude_longitude_index` (`latitude`, `longitude`),
INDEX `t_national_comps_property_city_index` (`property_city`),
INDEX `t_national_comps_property_state_index` (`property_state`),
INDEX `t_national_comps_sale_date_index` (`sale_date`),
INDEX `t_national_comps_point_index` (`long_lat_point`(25)),
INDEX `t_national_comps_reonomy_id_index` (`reonomy_property_id`),
INDEX `mailing_address_index` (`mailing_address`),
INDEX `mortgage_date_index` (`mortgage_date`),
INDEX `t_national_comps_lender_index` (`lender`),
INDEX `bank_id_index` (`bank_id`),
INDEX `street_num_and_zip` (`property_street_number`, `property_zip`)
);
EDIT
The reason I did not aggregate anything in the query is that I have nothing to aggregate. I know that is not the primary use of GROUP BY, but it is commonly used this way just to get one row from each group.
I was able to speed up the query by forcing the use of an index on all 3 columns, like this:
select latitude, longitude, property_street_number, property_street_name,
property_city, property_state, property_zip from properties
USE INDEX (lat_long_state_index)
where latitude is not null and longitude is not null
group by property_state;
but I am still looking for more optimization.
Thanks to all for your help.

Group By
I am not convinced GROUP BY should be used this way. Internally MySQL may be smart enough (I am not sure) to treat a GROUP BY with no aggregates like a DISTINCT, but I don't think it's the correct way to use GROUP BY.
Index
MySQL generally uses only one index per table per query, so before you created the three-column index it was reasonable for the optimizer to pick the property_state index, because MySQL usually won't use an index for a not-equal (or IS NOT NULL) condition.
You can run EXPLAIN on the query before and after forcing the index and compare the plans; without the hint, the MySQL optimizer believes the single-column index is better.
Too many indexes also add overhead on INSERT. Once you have the three-column index, you can drop the separate property_state index, provided property_state is the leftmost column of the new index (a leftmost prefix is covered by the composite index). Your future queries can then use the new index you created.
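For example, a hedged sketch of that EXPLAIN comparison and index cleanup. Names follow the question's DDL and edit: the DDL calls the table t_national_comps (the question's queries call it properties), and lat_long_state_index is the three-column index the asker described.
-- Plan without the index hint.
EXPLAIN SELECT property_state
FROM t_national_comps
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
GROUP BY property_state;
-- Plan with the forced three-column index from the edit above.
EXPLAIN SELECT property_state
FROM t_national_comps USE INDEX (lat_long_state_index)
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
GROUP BY property_state;
-- If property_state is the leftmost column of the new index, the
-- single-column index is redundant and can be dropped.
ALTER TABLE t_national_comps DROP INDEX t_national_comps_property_state_index;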

My suggestion: do not use SELECT *. List only the columns you need, e.g. SELECT id, .... This should reduce your execution time.
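A minimal sketch of a narrower select, assuming you only need a handful of columns; ANY_VALUE() (available in MySQL 5.7+) makes the "any one row per state" intent explicit and keeps the query valid under ONLY_FULL_GROUP_BY (table and column names follow the question's DDL):
SELECT property_state,
       ANY_VALUE(nc_id)     AS nc_id,
       ANY_VALUE(latitude)  AS latitude,
       ANY_VALUE(longitude) AS longitude
FROM t_national_comps
WHERE latitude IS NOT NULL AND longitude IS NOT NULL
GROUP BY property_state;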

Related

How to use MYSQL JOIN structure? and how to optimize query time

I have a few problems with a MySQL database that I can't solve.
The first query below takes too much time and makes the system hang. I'm trying to rewrite it using JOINs, but then the SUM aggregation collapses the result to a single row. Is it OK to do this job with JOINs, or how should I improve this query?
This database serves a total of 22 client devices from an ASP.NET application. As I mentioned above, when the query time is long and the client devices hit the database at the same time, the devices freeze. What I don't understand is why one query from the browser app makes all devices wait. Isn't each query processed as a separate thread in MySQL? If a query takes 10 seconds to return, will all clients wait 10 seconds for their answers in the browser?
SELECT *,
(SELECT MachModel FROM machine WHERE MachCode=workorder.MachCode) AS MachModel,
(SELECT RawMaterialDescription FROM rawmaterials WHERE RawMaterialCode=workorder.ProductRawMaterial) AS RawMaterialDescr,
(SELECT RawMaterialColor FROM rawmaterials WHERE RawMaterialCode=workorder.ProductRawMaterial) AS RawMaterialColor,
(SELECT StaffName FROM staff WHERE AccountName=workorder.AssignStaff) AS AssignStaffName,
(SELECT StaffCode FROM staff WHERE AccountName=workorder.AssignStaff) AS AssignStaffCode,
(SELECT MachStatus FROM machine WHERE MachCode=workorder.MachCode) AS MachStatus,
(SELECT SUM(xStopTime) FROM workorderb WHERE xWoNumber=workorder.WoNumber) AS WoTotalStopTime
FROM workorder
WHERE WoStatus=3
ORDER BY PlanProdStartDate DESC, WoSortNumber, WoNumber LIMIT 100
SELECT workorder.*,machine.MachModel,machine.MachStatus,rawmaterials.RawMaterialDescription,rawmaterials.RawMaterialColor,staff.StaffName,staff.StaffCode,SUM(workorderb.xStopTime)
FROM workorder
LEFT JOIN machine ON machine.MachCode=workorder.MachCode
LEFT JOIN rawmaterials ON rawmaterials.RawMaterialCode=workorder.ProductRawMaterial
LEFT JOIN staff ON staff.AccountName=workorder.AssignStaff
LEFT JOIN workorderb ON workorderb.xWoNumber=workorder.WoNumber
WHERE workorder.WoStatus=3
ORDER BY workorder.PlanProdStartDate DESC, workorder.WoSortNumber, workorder.WoNumber LIMIT 100
CREATE TABLE `workorder` (
`WoNumber` varchar(20) NOT NULL,
`MachCode` varchar(15) NOT NULL,
`PlannedMoldCode` varchar(10) NOT NULL,
`PartyNumber` smallint(6) NOT NULL,
`PlanProdCycleTime` smallint(6) NOT NULL,
`CalAverageCycleTime` float(15,1) unsigned NOT NULL,
`ProductRawMaterial` varchar(30) NOT NULL,
`PlanProdStartDate` date NOT NULL,
`PlanProdFinishDate` int(10) unsigned NOT NULL,
`WoStartDate` datetime DEFAULT NULL,
`WoFinishDate` datetime DEFAULT NULL,
`WoWorkTime` int(10) unsigned NOT NULL,
`WoSystemProductivity` smallint(6) unsigned NOT NULL,
`AssignStaff` varchar(50) DEFAULT '',
`WoStatus` smallint(6) NOT NULL,
`WoSortNumber` int(10) unsigned NOT NULL,
`CycleCount` int(11) unsigned NOT NULL,
`ControlDate` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP,
`WoProductionStatus` smallint(6) NOT NULL DEFAULT '0',
`Creator` varchar(50) NOT NULL,
`Changer` varchar(50) NOT NULL,
`CreateDate` datetime NOT NULL,
PRIMARY KEY (`WoNumber`) USING BTREE,
KEY `WoNumber` (`WoNumber`) USING BTREE,
KEY `WoNumber_2` (`WoNumber`) USING BTREE,
KEY `WoNumber_3` (`WoNumber`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `machine` (
`MachCode` varchar(15) NOT NULL,
`MachModel` varchar(30) NOT NULL,
`FirstProdDate` date NOT NULL,
`MachCapacity` smallint(6) NOT NULL,
`MachStatus` smallint(6) NOT NULL,
`NowMoldOnMach` varchar(10) NOT NULL DEFAULT '',
`NowMachOperator` varchar(50) NOT NULL DEFAULT '',
`NowWorkOrder` varchar(20) NOT NULL DEFAULT '',
`IPNumber` varchar(15) NOT NULL,
`Creator` varchar(50) NOT NULL,
`Changer` varchar(50) NOT NULL,
`ControlDate` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`OperatorLoginDate` datetime DEFAULT NULL,
`Message` varchar(500) DEFAULT NULL,
`MessageReaded` smallint(6) DEFAULT '0',
`StaffName` varchar(50) DEFAULT 'OSIS',
`StaffImage` varchar(255) DEFAULT '',
`StopDesc` varchar(30) DEFAULT 'OSIS',
`StopTime` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`MachCode`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `rawmaterials` (
`RawMaterialCode` varchar(15) NOT NULL,
`RawMaterialDescription` varchar(30) NOT NULL,
`RawMaterialColor` varchar(30) NOT NULL,
PRIMARY KEY (`RawMaterialCode`) USING BTREE,
KEY `RawMaterialCode` (`RawMaterialCode`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `staff` (
`StaffCode` varchar(15) DEFAULT NULL,
`StaffCardCode` varchar(10) DEFAULT NULL,
`StaffName` varchar(50) NOT NULL,
`StaffPassword` varchar(10) NOT NULL,
`StaffStatus` smallint(6) NOT NULL DEFAULT '2',
`StaffDateOfStart` date NOT NULL,
`StaffBirthDay` date DEFAULT NULL,
`StaffGender` varchar(5) DEFAULT NULL,
`StaffRoleA` smallint(6) NOT NULL,
`StaffEmail` varchar(100) NOT NULL,
`StaffImageLink` varchar(255) DEFAULT NULL,
`AccountName` varchar(50) NOT NULL,
`StaffRoleB` smallint(6) NOT NULL,
`StaffRoleD` smallint(6) NOT NULL,
`StaffRoleE` smallint(6) NOT NULL,
`StaffRoleC` smallint(6) NOT NULL,
`StaffRoleF` smallint(6) NOT NULL,
`StaffRoleG` smallint(6) NOT NULL,
`StaffRoleH` smallint(6) NOT NULL,
`StaffRoleI` smallint(6) NOT NULL,
`StaffRoleJ` smallint(6) NOT NULL,
`StaffRoleK` smallint(6) NOT NULL,
`StaffRoleL` smallint(6) NOT NULL,
`StaffRoleM` smallint(6) NOT NULL,
`StaffRoleN` smallint(6) NOT NULL,
`StaffConnection` smallint(6) NOT NULL DEFAULT '2',
`MachineWorked` varchar(15) DEFAULT NULL,
`WorkOrderWorked` varchar(20) DEFAULT NULL,
`StaffGroup` varchar(50) NOT NULL,
`Creator` varchar(50) NOT NULL,
`Changer` varchar(50) DEFAULT NULL,
PRIMARY KEY (`AccountName`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `workorderb` (
`xWoNumber` varchar(20) NOT NULL,
`xMachCode` varchar(15) NOT NULL,
`xPlannedMoldCode` varchar(10) NOT NULL,
`xPartyNumber` smallint(6) NOT NULL,
`xStaffName` varchar(50) NOT NULL,
`xStopCode` smallint(6) NOT NULL,
`xStopStartTime` datetime NOT NULL,
`xStopFinishTime` datetime DEFAULT NULL,
`xStopTime` int(11) DEFAULT NULL,
PRIMARY KEY (`xMachCode`,`xStopStartTime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
Your query with the joins was nearly there: it was just missing a GROUP BY clause.
I have replaced workorder.* with workorder.WoNumber in the SELECT and added GROUP BY workorder.WoNumber.
You can add as many columns from workorder in the SELECT as you like but you must list them in the GROUP BY.
SELECT workorder.WoNumber,machine.MachModel,machine.MachStatus,rawmaterials.RawMaterialDescription,rawmaterials.RawMaterialColor,staff.StaffName,staff.StaffCode,SUM(workorderb.xStopTime)
FROM workorder
LEFT JOIN machine ON machine.MachCode=workorder.MachCode
LEFT JOIN rawmaterials ON rawmaterials.RawMaterialCode=workorder.ProductRawMaterial
LEFT JOIN staff ON staff.AccountName=workorder.AssignStaff
LEFT JOIN workorderb ON workorderb.xWoNumber=workorder.WoNumber
WHERE workorder.WoStatus=3
GROUP BY workorder.WoNumber /* <<= ADD OTHER COLUMNS HERE AS NEEDED */
ORDER BY workorder.PlanProdStartDate DESC, workorder.WoSortNumber, workorder.WoNumber LIMIT 100;
Use InnoDB, not MyISAM. MyISAM locks the entire table when INSERTing; InnoDB can often allow other threads to run when inserting.
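A minimal sketch of that conversion, assuming the table names from the question (each ALTER rebuilds the table, so test on a copy first):
ALTER TABLE workorder ENGINE=InnoDB;
ALTER TABLE workorderb ENGINE=InnoDB;
ALTER TABLE machine ENGINE=InnoDB;
ALTER TABLE rawmaterials ENGINE=InnoDB;
ALTER TABLE staff ENGINE=InnoDB;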
Other notes
workorder has 4 identical indexes on WoNumber; keep the PRIMARY KEY and drop the rest. Note that a PRIMARY KEY is an index. Check the other tables for redundant keys, too.
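For example, a sketch of dropping the duplicate keys named in the CREATE TABLE statements above (rawmaterials has a similar redundant key on its primary-key column):
ALTER TABLE workorder DROP INDEX `WoNumber`, DROP INDEX `WoNumber_2`, DROP INDEX `WoNumber_3`;
ALTER TABLE rawmaterials DROP INDEX `RawMaterialCode`;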
Do you need the mixture of DESC and ASC in ORDER BY PlanProdStartDate DESC, WoSortNumber, WoNumber? If not, there may be an optimization here.
As Kendle suggests, JOINs would be faster since there are cases where the same table is needed twice. If values might be missing, then LEFT might be useful; it won't change the performance.
Needed:
workorderb: INDEX(xWoNumber, xStopTime)
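A sketch of adding that index (the index name idx_xwonumber_xstoptime is made up):
ALTER TABLE workorderb ADD INDEX idx_xwonumber_xstoptime (xWoNumber, xStopTime);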
Is xStopTime an elapsed time? Or a time of day?

Slow query with NULL IS NULL OR condition syntax using Mysql 5.7

I'm getting some strange timing values from Mysql running a "simple" query.
This is the DDL of the table:
CREATE TABLE `frame` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`createdBy` varchar(255) DEFAULT NULL,
`createdDate` datetime(6) NOT NULL,
`lastModifiedBy` varchar(255) DEFAULT NULL,
`lastModifiedDate` datetime(6) DEFAULT NULL,
`sid` varchar(36) NOT NULL,
`version` bigint(20) NOT NULL,
`brand` varchar(255) DEFAULT NULL,
`category` varchar(255) DEFAULT NULL,
`colorCode` varchar(255) DEFAULT NULL,
`colorDescription` varchar(255) DEFAULT NULL,
`description` longtext,
`imageUrl` varchar(255) DEFAULT NULL,
`lastPurchase` datetime(6) DEFAULT NULL,
`lastPurchasePrice` decimal(19,2) DEFAULT NULL,
`lastSell` datetime(6) DEFAULT NULL,
`lastSellPrice` decimal(19,2) DEFAULT NULL,
`line` varchar(255) DEFAULT NULL,
`manufacturer` varchar(255) DEFAULT NULL,
`manufacturerCode` varchar(255) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`preset` bit(1) NOT NULL DEFAULT b'0',
`purchasePrice` decimal(19,2) DEFAULT NULL,
`salesPrice` decimal(19,2) DEFAULT NULL,
`sku` varchar(255) NOT NULL,
`stock` bit(1) NOT NULL DEFAULT b'1',
`thumbUrl` varchar(255) DEFAULT NULL,
`upc` varchar(255) DEFAULT NULL,
`arm` int(11) DEFAULT NULL,
`bridge` int(11) DEFAULT NULL,
`caliber` int(11) DEFAULT NULL,
`gender` varchar(255) DEFAULT NULL,
`lensColor` varchar(255) DEFAULT NULL,
`material` varchar(255) DEFAULT NULL,
`model` varchar(255) NOT NULL,
`sphere` decimal(10,2) DEFAULT NULL,
`type` varchar(255) NOT NULL,
`taxRate_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UK_k7s4esovkoacsc264bcjrre13` (`sid`),
UNIQUE KEY `UK_ajh6mr6a6qg6mgy8t9nevdym1` (`sku`),
UNIQUE KEY `UK_boqikmg9o89j8q0o5ujkj33b3` (`upc`),
KEY `idx_manufacturer` (`manufacturer`),
KEY `idx_brand` (`brand`),
KEY `idx_line` (`line`),
KEY `idx_colorcode` (`colorCode`),
KEY `idx_preset` (`preset`),
KEY `idx_manufacturer_model_color_caliber` (`manufacturer`,`model`,`colorCode`,`caliber`),
KEY `FK1nau29fd70s1nq905dgs6ft85` (`taxRate_id`),
CONSTRAINT `FK1nau29fd70s1nq905dgs6ft85` FOREIGN KEY (`taxRate_id`) REFERENCES `taxrate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=392179 DEFAULT CHARSET=utf8;
The query is created programmatically by my application. The "strange" syntax (NULL IS NULL OR condition) is very convenient for me because it makes my code more compact and removes the need to build a different query for each combination of parameters.
For those who understand how Hibernate HQL and JPA work, this is the query it produces.
This query is generated when the user has not set any filter, so all parameters in my conditions are null, and this is how the query comes out:
SELECT SQL_NO_CACHE COUNT(frame0_.`id`) AS col_0_0_ FROM `Frame` frame0_
WHERE (NULL IS NULL OR NULL LIKE CONCAT('%', NULL, '%') OR frame0_.`manufacturer` LIKE CONCAT('%', NULL, '%') OR frame0_.`manufacturerCode`=NULL OR frame0_.`sku`=NULL OR frame0_.`upc`=NULL OR frame0_.`line` LIKE CONCAT('%', NULL, '%') OR frame0_.`model` LIKE CONCAT('%', NULL, '%')) AND (NULL IS NULL OR frame0_.`manufacturer`=NULL) AND (NULL IS NULL OR frame0_.`line`=NULL) AND (NULL IS NULL OR frame0_.`caliber`=NULL) AND (NULL IS NULL OR frame0_.`type`=NULL) AND (NULL IS NULL OR frame0_.`material`=NULL) AND (NULL IS NULL OR frame0_.`model`=NULL) AND (NULL IS NULL OR frame0_.`colorCode`=NULL)
The query takes about 0.105s on a table of 137548 rows.
The EXPLAIN of the previous query returns:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE frame0_ \N ALL \N \N \N \N 137548 100.00 \N
The previous query is identical to this one:
SELECT SQL_NO_CACHE COUNT(frame0_.`id`) AS col_0_0_ FROM `Frame` frame0_
This query takes just 0.05s for the same result in the same table.
Why does MySQL treat them differently, and why does the first one take so much more time? Is there a way to improve the performance of the first query while keeping the "NULL IS NULL OR condition" syntax?
I think Ryan is right: the many OR conditions are what make the query so slow.
You should build the query programmatically for better performance: if the user does not select a given filter, you shouldn't include that condition in the query at all.
(HQL)
if (!StringUtils.isEmpty(manufacturer)) {
    query.and(m.manufacturer.eq(manufacturer));
}
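To make the effect concrete, a hypothetical example of the SQL that would reach MySQL with this approach, first with no filter selected and then with only the manufacturer filter set (the value 'Acme' is made up):
SELECT SQL_NO_CACHE COUNT(frame0_.`id`) AS col_0_0_ FROM `Frame` frame0_;
SELECT SQL_NO_CACHE COUNT(frame0_.`id`) AS col_0_0_ FROM `Frame` frame0_
WHERE frame0_.`manufacturer` LIKE CONCAT('%', 'Acme', '%');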

insert data into mysql syntax error

Here is the SQL that I am trying to run, and I get an error.
insert into instruments (symbol,exchange,FullName,IPOYear,Sector,Industry)
values('PIH','Nasdaq','1347 Property Insurance Holdings, Inc.','Finance','Property-Casualty Insurers','http://www.nasdaq.com/symbol/pih')
Here is my DDL, and I can't find anything wrong with my SQL.
CREATE TABLE `instruments` (
`id` INT(10) NOT NULL AUTO_INCREMENT,
`symbol` VARCHAR(100) NOT NULL,
`exchange` VARCHAR(50) NOT NULL,
`FullName` VARCHAR(100) NULL DEFAULT NULL,
`IPOYear` VARCHAR(10) NULL DEFAULT NULL,
`Sector` VARCHAR(20) NULL DEFAULT NULL,
`Industry` VARCHAR(100) NULL DEFAULT NULL,
`LastUpdated` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
Sector VARCHAR(20) NULL DEFAULT NULL
You limited Sector to 20 characters and inserted 26.
Make it Sector VARCHAR(30) NULL DEFAULT NULL (or wider) and it should work.
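A minimal sketch of that change on the existing table (30 is just an example width; pick whatever fits your data):
ALTER TABLE instruments MODIFY `Sector` VARCHAR(30) NULL DEFAULT NULL;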

Am I wrong in the table design, or in the indexes I chose when I made the table?

I've built a web application as a tool to eliminate unnecessary data in a peoples table; its main job is to filter the data down to the people who are valid to receive election rights. It wasn't a problem at first, when the main table still had few rows, but it is really bad (6 seconds) now that the table holds about 200K rows, and it will get worse because the table will grow to about 6 million rows.
My table design is below. I am joining with 4 region tables (province, city, district and town), and the region tables are related to each other through their ids:
CREATE TABLE `peoples` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`id_prov` smallint(2) NOT NULL,
`id_city` smallint(2) NOT NULL,
`id_district` smallint(2) NOT NULL,
`id_town` smallint(4) NOT NULL,
`tps` smallint(4) NOT NULL,
`urut_xls` varchar(20) NOT NULL,
`nik` varchar(20) NOT NULL,
`name` varchar(60) NOT NULL,
`place_of_birth` varchar(60) NOT NULL,
`birth_date` varchar(30) NOT NULL,
`age` tinyint(3) NOT NULL DEFAULT '0',
`sex` varchar(20) NOT NULL,
`marital_s` varchar(20) NOT NULL,
`address` varchar(160) NOT NULL,
`note` varchar(60) NOT NULL,
`m_name` tinyint(1) NOT NULL DEFAULT '0',
`m_birthdate` tinyint(1) NOT NULL DEFAULT '0' ,
`format_birthdate` tinyint(1) NOT NULL DEFAULT '0' ,
`m_sex` tinyint(1) NOT NULL DEFAULT '0',
`m_m_status` tinyint(1) NOT NULL DEFAULT '0' ,
`sex_double` tinyint(1) NOT NULL DEFAULT '0',
`id_import` bigint(10) NOT NULL,
`id_workspace` tinyint(4) unsigned NOT NULL DEFAULT '0',
`stat_valid` smallint(1) NOT NULL DEFAULT '0' ,
`add_manual` tinyint(1) unsigned NOT NULL DEFAULT '0' ,
`insert_by` varchar(12) NOT NULL,
`update_by` varchar(12) DEFAULT NULL,
`mark_as_duplicate` smallint(1) NOT NULL DEFAULT '0' ,
`mark_as_trash` smallint(1) NOT NULL DEFAULT '0' ,
`in_date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ind_import` (`id_import`),
KEY `ind_duplicate` (`mark_as_duplicate`),
KEY `id_workspace` (`id_workspace`),
KEY `tambah_manual` (`add_manual`),
KEY `il` (`stat_valid`,`mark_as_trash`,`in_date_time`),
KEY `region` (`id_prov`,`id_city`,`id_district`,`id_town`,`tps`),
KEY `name` (`name`),
KEY `place_of_birth` (`place_of_birth`),
KEY `ind_birth` (`birth_date`(10)),
KEY `ind_sex` (`sex`(2))
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
town:
CREATE TABLE `town` (
`id` smallint(4) NOT NULL,
`id_district` smallint(2) NOT NULL,
`id_city` smallint(2) NOT NULL,
`id_prov` smallint(2) NOT NULL,
`name_town` varchar(60) NOT NULL,
`handprint` blob,
`pps_1` varchar(60) DEFAULT NULL,
`pps_2` varchar(60) DEFAULT NULL,
`pps_3` varchar(60) DEFAULT NULL,
`tpscount` smallint(2) DEFAULT NULL,
`pps_4` varchar(60) DEFAULT NULL,
`pps_5` varchar(60) DEFAULT NULL,
PRIMARY KEY (`id_prov`,`id_city`,`id_district`,`id`),
KEY `name_town` (`name_town`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
and the query like
SELECT `E`.`id`, `E`.`id_prov`, `E`.`id_city`, `E`.`id_district`, `E`.`id_town`,
`B`.`name_prov`,`C`.`name_city`,`D`.`name_district`, `A`.`name_town`,
`E`.`tps`, `E`.`urut_xls`, `E`.`nik`,`E`.`name`,`E`.`place_of_birth`,
`E`.`birth_date`, `E`.age, `E`.`sex`, `E`.`marital_s`, `E`.`address`,
`E`.`note`
FROM peoples E
JOIN test_prov B ON E.id_prov = B.id
JOIN test_city C ON E.id_city = C.id
AND (C.id_prov=B.id)
JOIN test_district D ON E.id_district = D.id
AND ((D.id_city = C.id) AND (D.id_prov= B.id))
JOIN test_town A ON E.id_town = A.id
AND ((A.id_district = D.id)
AND (A.id_city = C.id)
AND (A.id_prov = B.id))
AND E.stat_valid=1
AND E.mark_as_trash=0
mark_as_trash is a flag column that contains only 1 or 0, indicating whether the row has been marked as a deleted record, and stat_valid is the filter result: if its value is 1, the person is valid to receive election rights.
I've looked at the EXPLAIN output, but no index is used for the lookups. I believe that is why the application is so slow at 200K rows. The query above shows only two conditions, but the application can also filter by name, place of birth, birth date, age ranges and so on.
How can I make this perform better?
Can a city be in two provinces? If not, why do you also check C.id_prov = B.id when E.id_city = C.id should already give you just one row?
Also, it seems that your query is slow because you're selecting 200K rows. Indexes will improve performance, but do you really need all the rows at once? You should use pagination (LIMIT, OFFSET).
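A hedged sketch of paginating the question's join (the page size of 50 and the ORDER BY E.id are arbitrary choices; table and column names follow the query and DDL above):
SELECT E.id, E.nik, E.name, E.address, A.name_town
FROM peoples E
JOIN test_town A ON A.id = E.id_town
  AND A.id_district = E.id_district
  AND A.id_city = E.id_city
  AND A.id_prov = E.id_prov
WHERE E.stat_valid = 1
  AND E.mark_as_trash = 0
ORDER BY E.id
LIMIT 50 OFFSET 0; -- next page: OFFSET 50, then 100, and so on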

Odd MySQL Behavior - Query Optimization Help

We have a central login that we use to support multiple websites. To store our users' data we have an accounts table, which stores each user account, plus a users table per site for site-specific information.
We noticed that one query that is joining the tables on their primary key user_id is executing slowly. I'm hoping that some SQL expert out there can explain why it's using WHERE to search the users_site1 table and suggest how we can optimize it. Here is the slow query & the explain results:
mysql> explain select a.user_id as 'id',a.username,a.first_name as 'first',a.last_name as 'last',a.sex,u.user_id as 'profile',u.facebook_id as 'fb_id',u.facebook_publish as 'fb_publish',u.facebook_offline as 'fb_offline',u.twitter_id as 'tw_id',u.api_session as 'mobile',a.network from accounts a left join users_site1 u ON a.user_id=u.user_id AND u.status="R" where a.status="R" AND u.status="R" AND a.facebook_id='1234567890';
+----+-------------+-------+--------+----------------+---------+---------+-----------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------+---------+---------+-----------------------+-------+-------------+
| 1 | SIMPLE | u | ALL | PRIMARY | NULL | NULL | NULL | 79769 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,status | PRIMARY | 4 | alltrailsdb.u.user_id | 1 | Using where |
+----+-------------+-------+--------+----------------+---------+---------+-----------------------+-------+-------------+
2 rows in set (0.00 sec)
Here are the definitions for each table:
CREATE TABLE `accounts` (
`user_id` int(9) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(40) DEFAULT NULL,
`facebook_id` bigint(15) unsigned DEFAULT NULL,
`facebook_username` varchar(30) DEFAULT NULL,
`password` varchar(20) DEFAULT NULL,
`profile_photo` varchar(100) DEFAULT NULL,
`first_name` varchar(40) DEFAULT NULL,
`middle_name` varchar(40) DEFAULT NULL,
`last_name` varchar(40) DEFAULT NULL,
`suffix_name` char(3) DEFAULT NULL,
`organization_name` varchar(100) DEFAULT NULL,
`organization` tinyint(1) unsigned DEFAULT NULL,
`address` varchar(200) DEFAULT NULL,
`city` varchar(40) DEFAULT NULL,
`state` varchar(20) DEFAULT NULL,
`zip` varchar(10) DEFAULT NULL,
`province` varchar(40) DEFAULT NULL,
`country` int(3) DEFAULT NULL,
`latitude` decimal(11,7) DEFAULT NULL,
`longitude` decimal(12,7) DEFAULT NULL,
`phone` varchar(20) DEFAULT NULL,
`sex` char(1) DEFAULT NULL,
`birthday` date DEFAULT NULL,
`about_me` varchar(2000) DEFAULT NULL,
`activities` varchar(300) DEFAULT NULL,
`website` varchar(100) DEFAULT NULL,
`email` varchar(150) DEFAULT NULL,
`referrer` int(4) unsigned DEFAULT NULL,
`referredid` int(9) unsigned DEFAULT NULL,
`verify` int(6) DEFAULT NULL,
`status` char(1) DEFAULT 'R',
`created` datetime DEFAULT NULL,
`verified` datetime DEFAULT NULL,
`activated` datetime DEFAULT NULL,
`network` datetime DEFAULT NULL,
`deleted` datetime DEFAULT NULL,
`logins` int(6) unsigned DEFAULT '0',
`api_logins` int(6) unsigned DEFAULT '0',
`last_login` datetime DEFAULT NULL,
`last_update` datetime DEFAULT NULL,
`private` tinyint(1) unsigned DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
PRIMARY KEY (`user_id`),
UNIQUE KEY `username` (`username`),
KEY `status` (`status`),
KEY `state` (`state`)
);
CREATE TABLE `users_site1` (
`user_id` int(9) unsigned NOT NULL,
`facebook_id` bigint(15) unsigned DEFAULT NULL,
`facebook_username` varchar(30) DEFAULT NULL,
`facebook_publish` tinyint(1) unsigned DEFAULT NULL,
`facebook_checkin` tinyint(1) unsigned DEFAULT NULL,
`facebook_offline` varchar(300) DEFAULT NULL,
`twitter_id` varchar(60) DEFAULT NULL,
`twitter_secret` varchar(50) DEFAULT NULL,
`twitter_username` varchar(20) DEFAULT NULL,
`type` char(1) DEFAULT 'M',
`referrer` int(4) unsigned DEFAULT NULL,
`referredid` int(9) unsigned DEFAULT NULL,
`session` varchar(60) DEFAULT NULL,
`api_session` varchar(60) DEFAULT NULL,
`status` char(1) DEFAULT 'R',
`created` datetime DEFAULT NULL,
`verified` datetime DEFAULT NULL,
`activated` datetime DEFAULT NULL,
`deleted` datetime DEFAULT NULL,
`logins` int(6) unsigned DEFAULT '0',
`api_logins` int(6) unsigned DEFAULT '0',
`last_login` datetime DEFAULT NULL,
`last_update` datetime DEFAULT NULL,
`ip` varchar(20) DEFAULT NULL,
PRIMARY KEY (`user_id`)
);
Add an index on the facebook_id column in the accounts table.
Currently, MySQL is scanning the entire users table, since it cannot find the record directly in the accounts table.
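A minimal sketch of that index (the index name is made up):
ALTER TABLE accounts ADD INDEX idx_accounts_facebook_id (facebook_id);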
At the least, create 3 indexes: on accounts.user_id, users_site1.user_id and accounts.facebook_id. The user_id indexes most likely already exist, though, since those columns are defined as primary keys.
Your query is looking for rows in table accounts based on the Facebook ID and on the account "status". You don't have any indexes that help with this, so MySQL is doing a table scan. I suggest the following index:
ALTER TABLE accounts ADD INDEX (facebook_id, user_id)
If you wanted, you could even include the status column in the index. Whether this is a good idea or not would really depend on whether or not it would help to make the index an attractive choice for the optimiser for any other queries you plan to run.
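If you do include it, a sketch of that variant might look like this (the index name is made up):
ALTER TABLE accounts ADD INDEX idx_facebook_id_status (facebook_id, status, user_id);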
PS. The comment "using where" is normal and is to be expected in most queries. The thing to be concerned about here is the fact that MySQL is not using an index, and that it thinks it has to examine a large number of rows (surely this should not be the case when you are passing in a specific ID number).
Maybe because you haven't created indexes on the columns you're searching on? Try indexing the columns used in the WHERE and JOIN clauses; without indexes you're scanning the whole dataset.
CREATE INDEX accounts_user_id_index ON accounts (user_id);
CREATE INDEX accounts_facebook_id_index ON accounts (facebook_id);
CREATE INDEX users_site1_user_id_index ON users_site1 (user_id);