Never-ending MySQL query during data import

I'm working on a data import routine from a set of CSV files into my main database and am stuck with this particular set of data. I've used LOAD DATA LOCAL INFILE to dump the CSV data into my table, feed_hcp_leasenote:
CREATE TABLE `feed_hcp_leasenote` (
`BLDGID` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`LEASID` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`NOTEDATE` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`REF1` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`REF2` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`LASTDATE` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`USERID` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`NOTETEXT` varchar(1000) COLLATE utf8_unicode_ci DEFAULT NULL,
`tempid` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`tempid`),
KEY `BLDGID` (`BLDGID`),
KEY `LEASID` (`LEASID`),
KEY `REF1` (`REF1`),
KEY `NOTEDATE` (`NOTEDATE`)
) ENGINE=MyISAM AUTO_INCREMENT=65002 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I'm trying to import this data into two tables, lease_notes and customfield_data. lease_notes only stores a unique ID value, the note itself, and the lid which links it to the lease table. customfield_data stores a variety of data for system- and user-created fields, with each record linked to another table via the linkid field. Here's the lease_note table:
CREATE TABLE `lease_notes` (
`lnid` int(11) NOT NULL AUTO_INCREMENT,
`notetext` longtext COLLATE utf8_unicode_ci NOT NULL,
`lid` int(11) NOT NULL COMMENT 'Lease ID',
PRIMARY KEY (`lnid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And the customfield_data table:
CREATE TABLE `customfield_data` (
`cfdid` int(11) NOT NULL AUTO_INCREMENT,
`data_int` int(11) DEFAULT NULL,
`data_date` datetime DEFAULT NULL,
`data_smtext` varchar(1000) COLLATE utf8_unicode_ci DEFAULT NULL,
`data_lgtext` longtext COLLATE utf8_unicode_ci,
`data_numeric` decimal(20,2) DEFAULT NULL,
`linkid` int(11) DEFAULT NULL COMMENT 'ID value of specific item',
`cfid` int(11) NOT NULL COMMENT 'Custom field ID',
PRIMARY KEY (`cfdid`),
KEY `data_smtext` (`data_smtext`(333)),
KEY `linkid` (`linkid`),
KEY `cfid` (`cfid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The query that is getting stuck is as follows:
SELECT NOTEDATE, REF1, REF2, LASTDATE, USERID, feed_hcp_leasenote.NOTETEXT, leases.lid, lease_notes.lnid
FROM feed_hcp_leasenote
JOIN customfield_data mrileaseid ON feed_hcp_leasenote.LEASID = mrileaseid.data_smtext AND mrileaseid.cfid = ?
JOIN leases ON mrileaseid.linkid = leases.lid
JOIN suites ON leases.sid = suites.sid
JOIN floors ON suites.fid = floors.fid
JOIN customfield_data coid ON floors.bid = coid.linkid AND coid.cfid = ? AND coid.data_smtext = feed_hcp_leasenote.BLDGID
JOIN customfield_data status ON leases.lid = status.linkid AND status.cfid = ? AND status.data_smtext <> ?
LEFT JOIN lease_notes ON leases.lid = lease_notes.lid
LEFT JOIN customfield_data notedate ON lease_notes.lnid = notedate.linkid AND notedate.data_date = feed_hcp_leasenote.NOTEDATE AND notedate.cfid = ?
LEFT JOIN customfield_data ref1 ON lease_notes.lnid = ref1.linkid AND ref1.data_smtext = feed_hcp_leasenote.REF1 AND ref1.cfid = ?
My goal with this is to return all records in feed_hcp_leasenote and, depending on whether or not lease_notes.lnid is null, insert or update the records as needed (nulls would be inserts, non-nulls would be updates). The problem is that the provided data uses a combination of four fields to determine uniqueness: BLDGID, LEASID, NOTEDATE, and REF1. A note will not exist without a proper BLDGID and LEASID (translated in my query to a valid lid). It can match an existing record with a valid lid, NOTEDATE, and REF1, but if those don't match then I can assume it's a new record.
If I chop off all of the LEFT JOINs and remove lease_notes.lnid from the SELECT, it executes properly and gives me all records. Since I couldn't get my original query to work, I played with the idea of looping over all results and performing another SELECT to see if the notedate and ref1 matched. If not, I INSERTed; otherwise I UPDATEd. While this approach works, it can only process about 20 records per second, which is a problem when I'm dealing with 30,000 at a crack.
Since I got asked about it in a previous question, here's an EXPLAIN of my query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE status ref data_smtext,linkid,cfid cfid 4 const 934 Using where
1 SIMPLE mrileaseid ref data_smtext,linkid,cfid linkid 5 rl_hpsi.status.linkid 19 Using where
1 SIMPLE leases eq_ref PRIMARY,sid PRIMARY 4 rl_hpsi.mrileaseid.linkid 1 Using where
1 SIMPLE suites eq_ref PRIMARY,fid PRIMARY 4 rl_hpsi.leases.sid 1
1 SIMPLE floors eq_ref PRIMARY,bid PRIMARY 4 rl_hpsi.suites.fid 1
1 SIMPLE feed_hcp_leasenote ref BLDGID,LEASID LEASID 153 rl_hpsi.mrileaseid.data_smtext 19 Using where
1 SIMPLE coid ref data_smtext,linkid,cfid data_smtext 1002 rl_hpsi.feed_hcp_leasenote.BLDGID 10 Using where
1 SIMPLE lease_notes ALL NULL NULL NULL NULL 15000
1 SIMPLE notedate ref linkid,cfid linkid 5 rl_hpsi.lease_notes.lnid 24
1 SIMPLE ref1 ref data_smtext,linkid,cfid data_smtext 1002 rl_hpsi.feed_hcp_leasenote.REF1 10
Can anyone point me in the right direction? Thanks!

From our comments:
The answer is to add the columns that make an entry unique to your destination table and create a compound unique key on them. Then when inserting to that table use INSERT ON DUPLICATE KEY UPDATE to prevent duplicate data. When the insert is complete you can drop those columns if they are no longer necessary, to prevent storing data in multiple tables.
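A minimal sketch of that approach, run through Python's built-in sqlite3 module so that it is self-contained. SQLite's ON CONFLICT ... DO UPDATE stands in here for MySQL's INSERT ... ON DUPLICATE KEY UPDATE; the trimmed-down lease_notes columns (notedate, ref1) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lease_notes (
        lnid     INTEGER PRIMARY KEY AUTOINCREMENT,
        lid      INTEGER NOT NULL,
        notedate TEXT    NOT NULL,
        ref1     TEXT    NOT NULL,
        notetext TEXT    NOT NULL,
        UNIQUE (lid, notedate, ref1)   -- compound key that defines "same note"
    )
""")

# One statement handles both cases: insert when the key is new,
# update the note text when the key already exists.
UPSERT = """
    INSERT INTO lease_notes (lid, notedate, ref1, notetext)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (lid, notedate, ref1)
    DO UPDATE SET notetext = excluded.notetext
"""

# Hypothetical feed rows: (lid, notedate, ref1, notetext)
feed_rows = [
    (1, "2012-01-05", "A", "first version"),
    (1, "2012-01-05", "A", "corrected version"),  # same key -> update
    (2, "2012-01-06", "B", "another lease"),      # new key  -> insert
]
conn.executemany(UPSERT, feed_rows)
conn.commit()

notes = conn.execute(
    "SELECT lid, notedate, ref1, notetext FROM lease_notes ORDER BY lid"
).fetchall()
print(notes)
```

In MySQL the tail of the statement would instead read ON DUPLICATE KEY UPDATE notetext = VALUES(notetext), and the whole feed could be loaded with a single INSERT ... SELECT rather than row by row.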

Related

MySQL doesn't use indexes in JOIN query

I have a main table with 500000+ rows.
CREATE TABLE `esc_questions`(
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`esc_id` INT(11) NOT NULL,
`question_text` LONGTEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_1` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_2` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_3` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_4` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_5` TEXT COLLATE utf8_unicode_ci NOT NULL,
`right_answer` VARCHAR(255) COLLATE utf8_unicode_ci NOT NULL,
`disciplinas_id` INT(11) UNSIGNED NOT NULL,
`assunto_id` INT(11) UNSIGNED NOT NULL,
`orgao_id` INT(11) UNSIGNED NOT NULL,
`cargo_id` INT(11) UNSIGNED NOT NULL,
`ano` INT(11) NOT NULL,
`banca_id` INT(11) UNSIGNED NOT NULL,
`question_type` TINYINT(4) NOT NULL,
`url` TEXT COLLATE utf8_unicode_ci NOT NULL,
`created_at` TIMESTAMP NULL DEFAULT NULL,
`updated_at` TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY(`id`),
KEY `idx_ano`(`ano`) USING BTREE,
KEY `idx_question_type`(`question_type`) USING BTREE,
KEY `idx_cargo_id`(`cargo_id`) USING BTREE,
KEY `idx_orgao_id`(`orgao_id`) USING BTREE,
KEY `idx_banca_id`(`banca_id`) USING BTREE,
KEY `idx_question_id`(`id`) USING BTREE,
KEY `idx_assunto_id`(`assunto_id`) USING BTREE,
KEY `idx_disciplinas_id`(`disciplinas_id`) USING BTREE,
CONSTRAINT `fk_assunto_id` FOREIGN KEY(`assunto_id`) REFERENCES `esc_assunto`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_banca_id` FOREIGN KEY(`banca_id`) REFERENCES `esc_bancas`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_cargo_id` FOREIGN KEY(`cargo_id`) REFERENCES `esc_cargo`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_disciplinas_id` FOREIGN KEY(`disciplinas_id`) REFERENCES `esc_disciplinas`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_orgao_id` FOREIGN KEY(`orgao_id`) REFERENCES `esc_orgao`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE = INNODB AUTO_INCREMENT = 516157 DEFAULT CHARSET = utf8 COLLATE = utf8_unicode_ci
Related data is stored to five additional tables, very similar to this one:
CREATE TABLE `esc_assunto`(
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY(`id`),
KEY `idx_assunto_id`(`id`) USING BTREE,
KEY `idx_assunto_name`(`name`(30)),
CONSTRAINT `fk_assunto` FOREIGN KEY(`id`) REFERENCES `esc_questions`(`assunto_id`) ON DELETE NO ACTION ON UPDATE NO ACTION) ENGINE = INNODB AUTO_INCREMENT = 3618 DEFAULT CHARSET = utf8 COLLATE = utf8_unicode_ci
I have pagination on my website. When I request the later pages, the time taken by the query keeps rising.
Here is my SELECT for this task:
SELECT
f.*,
d.name disciplinas,
o.name orgao,
c.name cargo,
b.name banca,
a.name assunto
FROM
`esc_questions` f
INNER JOIN
`esc_bancas` b
ON
f.banca_id = b.id
INNER JOIN
`esc_disciplinas` d
ON
f.disciplinas_id = d.id
INNER JOIN
`esc_assunto` a
ON
f.assunto_id = a.id
INNER JOIN
`esc_orgao` o
ON
f.orgao_id = o.id
INNER JOIN
`esc_cargo` c
ON
f.cargo_id = c.id
LIMIT 400020, 20
This query spends a long time in the Sending Data stage, as shown in the query profiler:
Sending Data 17.6 s 99.99% 1 17.6 s
EXPLAIN shows the following:
1 SIMPLE d ALL PRIMARY,idx_disciplinas_id 247
1 SIMPLE f ref idx_cargo_id,idx_orgao_id,idx_banca_id,idx_assunto_id,idx_disciplinas_id idx_disciplinas_id 4 concursos.d.id 1116
1 SIMPLE o eq_ref PRIMARY,idx_orgao_id PRIMARY 4 concursos.f.orgao_id 1
1 SIMPLE c eq_ref PRIMARY,idx_cargo_id PRIMARY 4 concursos.f.cargo_id 1
1 SIMPLE a eq_ref PRIMARY,idx_assunto_id PRIMARY 4 concursos.f.assunto_id 1
1 SIMPLE b eq_ref PRIMARY,idx_bancas_id PRIMARY 4 concursos.f.banca_id 1
I spent all day trying to make this fast, without success.
Can somebody tell me what's wrong with my select query or why MySQL doesn't use indexes?
Any help appreciated.
You have the wrong approach in several ways. First, your query has no ORDER BY clause. Without one, a query is not guaranteed to return the results in the same order on multiple executions (in practice it usually does, which makes such a problem really hard to debug).
So, you should add an order by, probably on the primary key of esc_questions and whatever secondary keys are necessary.
Second, the offset of 400020 is rather large. MySQL is going to generate 400,020 rows and discard them, before finding the 400,021st row.
My suggestion is to find the "id" used in the sort and then include a where clause:
where ?? > $last_id
. . .
order by ??
limit 20
This may not (or may) speed up the load the first time, but it should speed subsequent loads.
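The idea above (often called keyset or "seek" pagination) can be sketched as follows. This is only an illustration run against an in-memory SQLite database through Python's sqlite3 module; the two-column esc_questions table, the fetch_page helper and the 100 sample rows are assumptions, and the joins to the lookup tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE esc_questions (id INTEGER PRIMARY KEY, question_text TEXT)")
conn.executemany(
    "INSERT INTO esc_questions (id, question_text) VALUES (?, ?)",
    [(i, f"question {i}") for i in range(1, 101)],
)

def fetch_page(conn, last_id, page_size=20):
    """Return the rows that follow last_id in primary-key order.

    The WHERE clause lets the database seek straight to the right place
    in the index instead of generating and discarding OFFSET rows.
    """
    return conn.execute(
        "SELECT id, question_text FROM esc_questions "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

first_page = fetch_page(conn, last_id=0)
# The application remembers the last id it showed and passes it back in.
second_page = fetch_page(conn, last_id=first_page[-1][0])
print(second_page[0], second_page[-1])
```

The trade-off is that this supports "next page" navigation cheaply but not jumping to an arbitrary page number.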
I found a solution myself: I need to avoid LIMIT with an offset in my JOIN query.
To do this I need some preparation:
1. Get only the ids from my main table, without any joins, at the needed offset. This query took 0.0856 seconds:
SELECT id FROM `esc_questions` WHERE 1 LIMIT 489980, 20
2. Create a composite index on the columns the query will use. In my case I use the following index:
...
KEY `idx_filter_search` (`id`,`disciplinas_id`,`assunto_id`,`orgao_id`,`cargo_id`,`banca_id`) USING BTREE,
...
3. Finally make your query. Query took 0.0040 seconds:
SELECT SQL_NO_CACHE
f.*,
d.name disciplinas,
o.name orgao,
c.name cargo,
b.name banca,
a.name assunto
FROM
`esc_questions` f FORCE INDEX(idx_filter_search),
`esc_disciplinas` d,
`esc_assunto` a,
`esc_orgao` o,
`esc_cargo` c,
`esc_bancas` b
WHERE
f.id IN(
497442,
497444,
497445,
497447,
497449,
497450,
497452,
497453,
497454,
497456,
497458,
497459,
497461,
497462,
497464,
497465,
497467,
497468,
497470,
497471
) AND f.disciplinas_id = d.id AND f.assunto_id = a.id AND f.orgao_id = o.id AND f.cargo_id = c.id AND f.banca_id = b.id
ORDER BY
id
Running EXPLAIN on this query shows that it uses my newly created index:
1 | SIMPLE | f | range | idx_filter_search | idx_filter_search | 4 | NULL | 20 | Using where
Hope this helps someone.
Thanks @GordonLinoff for pointing me in the right direction.

Huge innodb tables with SELECT performance issue

I have two huge InnoDB tables (pages: 40M+ rows, 30+ GB; stat: 45M+ rows, 10+ GB). I have a query that selects rows from the join of these two tables, and it used to take about a second to execute. Recently the exact same query has been taking more than 20 seconds (sometimes up to a few minutes) to complete. I suspected that, after lots of inserts and updates, the tables might need optimizing. I ran OPTIMIZE TABLE using phpMyAdmin, but saw no improvement. I've Googled a lot but couldn't find anything that helps with this situation.
The query I mentioned earlier looks like below:
SELECT `c`.`unique`, `c`.`pub`
FROM `pages` `c`
LEFT JOIN `stat` `s` ON `c`.`unique`=`s`.`unique`
WHERE `s`.`isc`='1'
AND `s`.`haa`='0'
AND (`pubID`='24')
ORDER BY `eid` ASC LIMIT 0, 10
These are the tables structure:
CREATE TABLE `pages` (
`eid` int(10) UNSIGNED NOT NULL,
`ti` text COLLATE utf8_persian_ci NOT NULL,
`fat` text COLLATE utf8_persian_ci NOT NULL,
`de` text COLLATE utf8_persian_ci NOT NULL,
`fad` text COLLATE utf8_persian_ci NOT NULL,
`pub` varchar(100) COLLATE utf8_persian_ci NOT NULL,
`pubID` int(10) UNSIGNED NOT NULL,
`pubn` text COLLATE utf8_persian_ci NOT NULL,
`unique` tinytext COLLATE utf8_persian_ci NOT NULL,
`pi` tinytext COLLATE utf8_persian_ci NOT NULL,
`kw` text COLLATE utf8_persian_ci NOT NULL,
`fak` text COLLATE utf8_persian_ci NOT NULL,
`te` text COLLATE utf8_persian_ci NOT NULL,
`fae` text COLLATE utf8_persian_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
ALTER TABLE `pages`
ADD PRIMARY KEY (`eid`),
ADD UNIQUE KEY `UNIQ` (`unique`(128)),
ADD KEY `pub` (`pub`),
ADD KEY `unique` (`unique`(128)),
ADD KEY `pubID` (`pubID`) USING BTREE;
ALTER TABLE `pages` ADD FULLTEXT KEY `faT` (`fat`);
ALTER TABLE `pages` ADD FULLTEXT KEY `faA` (`fad`,`fae`);
ALTER TABLE `pages` ADD FULLTEXT KEY `faK` (`fak`);
ALTER TABLE `pages` ADD FULLTEXT KEY `pubn` (`pubn`);
ALTER TABLE `pages` ADD FULLTEXT KEY `faTAK` (`fat`,`fad`,`fak`,`fae`);
ALTER TABLE `pages` ADD FULLTEXT KEY `ab` (`de`,`te`);
ALTER TABLE `pages` ADD FULLTEXT KEY `Ti` (`ti`);
ALTER TABLE `pages` ADD FULLTEXT KEY `Kw` (`kw`);
ALTER TABLE `pages` ADD FULLTEXT KEY `TAK` (`ti`,`de`,`kw`,`te`);
ALTER TABLE `pages`
MODIFY `eid` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
CREATE TABLE `stat` (
`sid` int(10) UNSIGNED NOT NULL,
`unique` tinytext COLLATE utf8_persian_ci NOT NULL,
`haa` tinyint(1) UNSIGNED NOT NULL,
`isc` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
ALTER TABLE `stat`
ADD PRIMARY KEY (`sid`),
ADD UNIQUE KEY `Unique` (`unique`(128)),
ADD KEY `isc` (`isc`),
ADD KEY `haa` (`haa`);
ALTER TABLE `stat`
MODIFY `sid` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
The following query took only 0.0126 seconds with 38685601 total results as said by phpMyAdmin:
SELECT `sid` FROM `stat` `s` WHERE `s`.`isc`='1' AND `s`.`haa`='0'
and this one took 0.0005 seconds with 5159484 total results
SELECT `eid`, `unique`, `pubn`, `pi` FROM `pages` WHERE `pubID`='24'
Am I missing something? Can anybody help?
The slowdown is probably due to scanning so many rows, and that is now more than can fit in cache. So, let's try to improve the query.
Replace INDEX(pubID) with INDEX(pubID, eid) -- This may allow both the WHERE and ORDER BY to be handled by the index, thereby avoiding a sort.
Replace TINYTEXT with VARCHAR(255) or some smaller limit. This may speed up tmp tables.
Don't use a prefix index on eid -- it's an INT!
Don't say UNIQUE with prefixing -- UNIQUE(x(128)) only checks the uniqueness of the first 128 characters!
Once you change to VARCHAR(255) (or less), you can apply UNIQUE to the entire column.
The biggest performance issue is filtering on two tables -- can you move the status flags into the main table?
Change LEFT JOIN to JOIN.
What does unique look like? If it is a "UUID", that could further explain the trouble.
If that is a UUID in its 36-character text form, the string can be converted to a 16-byte column for further space savings (and speedup). Let's discuss this further if necessary.
5 million results in 0.5ms is bogus -- it was fetching from the Query cache. Either turn off the QC or run with SELECT SQL_NO_CACHE...
+1 to @RickJames's answer; following it, I have done a test.
I would also recommend you do not use the name unique for a column name, because it's an SQL reserved word.
ALTER TABLE pages
CHANGE `unique` objectId VARCHAR(128) NOT NULL COMMENT 'Document Object Identifier',
DROP KEY pubId,
ADD KEY bktest1 (pubId, eid, objectId, pub);
ALTER TABLE stat
CHANGE `unique` objectId VARCHAR(128) NOT NULL COMMENT 'Document Object Identifier',
DROP KEY `unique`,
ADD UNIQUE KEY bktest2 (objectId, isc, haa);
mysql> explain SELECT `c`.`objectId`, `c`.`pub` FROM `pages` `c` JOIN `stat` `s` ON `c`.`objectId`=`s`.`objectId` WHERE `s`.`isc`='1' AND `s`.`haa`='0' AND (`pubID`='24') ORDER BY `eid` ASC LIMIT 0, 10;
+----+-------------+-------+------------+--------+-------------------------+---------+---------+-----------------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------+---------+---------+-----------------------------+------+----------+--------------------------+
| 1 | SIMPLE | c | NULL | ref | unique,unique_2,bktest1 | bktest1 | 4 | const | 1 | 100.00 | Using where; Using index |
| 1 | SIMPLE | s | NULL | eq_ref | bktest2,haa,isc | bktest2 | 388 | test.c.objectId,const,const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+-------------------------+---------+---------+-----------------------------+------+----------+--------------------------+
Creating the multi-column indexes makes them covering indexes for this query, which is why you see "Using index" in the EXPLAIN report.
It's important to put eid second in the bktest1 index, so you avoid a filesort.
This is the best you can hope to optimize this query without denormalizing or partitioning the tables.
Next you should make sure your buffer pool is large enough to hold all the requested data.

Validate fields from one table to another in MySQL

The problem:
I have 1 table of approx. 5,000 rows called imported_cities
I have 1 table of approx. 800,000 rows called postalcodes containing the cities for each postal code
I need to validate each distinct city from imported_cities against the cities in postal codes table based on city name and its province. See tables structure below.
If they match exactly (yes, exactly. The rest of cities are manually validated) I have to update a column on imported_city and
enter both city from imported_cities and city from postal_codes (side by side) into a third table called imported_cities_equiv
What I have tried:
Adding indexes to tables and make query below. It takes forever... :(
explain SELECT DISTINCT ic.destinationCity, pc.city FROM (imported_cities ic, postalcodes pc)
WHERE LOWER(ic.destinationCity) = LOWER(pc.city)
the result
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ip index NULL company_city 478 NULL 4221 Using index; Using temporary
1 SIMPLE pc index NULL city_prov 160 NULL 765407 Using where; Using index; Using join buffer (Block...
--
-- Table structure for table postalcodes
CREATE TABLE IF NOT EXISTS `postalcodes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(11) NOT NULL,
`city` varchar(50) NOT NULL,
`province` varchar(50) NOT NULL,
`provinceISO` varchar(2) NOT NULL,
`latitude` decimal(17,13) NOT NULL,
`longitude` decimal(17,13) NOT NULL,
PRIMARY KEY (`id`),
KEY `code` (`code`),
KEY `city_prov` (`city`,`provinceISO`)
--
-- Table structure for table imported_cities
CREATE TABLE IF NOT EXISTS `imported_cities` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`companyName` varchar(30) CHARACTER SET utf8 NOT NULL,
`destinationCity` varchar(128) CHARACTER SET utf8 NOT NULL,
`destinationProvince` varchar(20) CHARACTER SET utf8 NOT NULL,
`equivCity` varchar(128) CHARACTER SET utf8 DEFAULT NULL,
`minAmount` decimal(6,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `company_city` (`companyName`,`destinationCity`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=7933 ;
--
-- Table structure for table imported_cities_equiv
CREATE TABLE IF NOT EXISTS `imported_cities_equiv` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`imported_city` varchar(128) CHARACTER SET utf8 NOT NULL,
`pc_city` varchar(128) CHARACTER SET utf8 NOT NULL,
`province` varchar(20) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=149 ;
Any help or suggestion is appreciated. Thank you.
The query you want to get your information is:
SELECT ip.*, (pc.city is not null) as exact_match
FROM imported_prices ip left join
postalcodes pc
on LOWER(ip.destinationCity) = LOWER(pc.city) and
lower(ip.province) = lower(pc.province);
However, this will have really bad performance. Getting rid of the lower() would help:
SELECT ip.*, (pc.city is not null) as exact_match
FROM imported_prices ip left join
postalcodes pc
on ip.destinationCity = pc.city and
ip.province = pc.province;
Because then you can add an index on postalcodes(city, province).
If you cannot remove lower(), then alter the tables to add new columns and put the lower-case values in those columns. Then build an index on the new columns and use them in the join.
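A rough sketch of that last suggestion, using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. The city_lc and province_lc column names, the index name and the sample rows are all assumptions; the point is only that the normalised values are computed once and the join then compares plain indexed columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postalcodes (id INTEGER PRIMARY KEY, city TEXT, province TEXT);
    CREATE TABLE imported_cities (
        id INTEGER PRIMARY KEY, destinationCity TEXT, destinationProvince TEXT
    );
    INSERT INTO postalcodes (city, province)
    VALUES ('Montreal', 'Quebec'), ('Laval', 'Quebec');
    INSERT INTO imported_cities (destinationCity, destinationProvince)
    VALUES ('MONTREAL', 'QUEBEC'), ('TORONTO', 'ONTARIO');

    -- Store the normalised values once instead of calling LOWER() per join row.
    ALTER TABLE postalcodes ADD COLUMN city_lc TEXT;
    ALTER TABLE postalcodes ADD COLUMN province_lc TEXT;
    UPDATE postalcodes SET city_lc = LOWER(city), province_lc = LOWER(province);
    ALTER TABLE imported_cities ADD COLUMN city_lc TEXT;
    ALTER TABLE imported_cities ADD COLUMN province_lc TEXT;
    UPDATE imported_cities
       SET city_lc = LOWER(destinationCity),
           province_lc = LOWER(destinationProvince);

    -- The join can now be served by an ordinary index on the new columns.
    CREATE INDEX idx_pc_lc ON postalcodes (city_lc, province_lc);
""")

matches = conn.execute("""
    SELECT ic.destinationCity, pc.city IS NOT NULL AS exact_match
    FROM imported_cities ic
    LEFT JOIN postalcodes pc
           ON pc.city_lc = ic.city_lc
          AND pc.province_lc = ic.province_lc
    ORDER BY ic.id
""").fetchall()
print(matches)
```

The extra columns have to be kept in sync whenever the source columns change, which is the price paid for the faster join.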
Thank you all for pointing me in the right direction.
Some changes have been made following your advice:
added indexes on the imported_cities table on the destinationCity and destinationProvince columns
added indexes on the postalcodes table on the city and provinceISO columns
the JOIN clause applies upper() to only one side, since the field ic.destinationCity is already in uppercase
limited the query by province in the WHERE clause for performance
The final SQL is:
SELECT DISTINCT pc.city, pc.provinceISO
FROM postalcodes pc
LEFT JOIN imported_cities ic
ON upper(pc.city) = ic.destinationCity AND
pc.provinceISO = ic.destinationProvince
WHERE ic.destinationProvince = 'QC';
And the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pc ref province province 8 const 278115 Using index condition; Using temporary
1 SIMPLE ip ref destinationCity,destinationProvince destinationCity 386 func 1 Using index condition; Using where; Distinct
Going forward, I can now build the INSERT statement in PHP and insert all equivalent cities into the third table with a single INSERT query. Thank you all.

How to create indexes efficiently

I wish to know how I can create indexes in my database according to my data structure. Most of my queries fetch data by ID and by name, joining two or three tables, with pagination. Please advise how to create indexes for the queries below.
Query:1
SELECT DISTINCT topic, type FROM books where type like 'Tutor-Books' order by topic
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE books range faith faith 102 NULL 132 Using index condition; Using temporary; Using filesort
Query:2
SELECT books.name, books.name2, books.id, books.image, books.faith,
books.topic, books.downloaded, books.viewed, books.language,
books.size, books.author as author_id, authors.name as author_name,
authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.id = '".$id."'
AND status = 1
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE books const PRIMARY PRIMARY 4 const 1 NULL
1 SIMPLE authors const aid aid 4 const 1 NULL
Can I use indexes for pagination with OFFSET, where the same query also returns the total:
SELECT SQL_CALC_FOUND_ROWS books.name, books.name2, books.id,
books.image, books.topic, books.author as author_id,
authors.name as author_name, authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.author = '$pid'
AND status = 1
ORDER BY books.name
LIMIT $limit OFFSET $offset
Do I need to update my queries after creating the indexes? Please also suggest what the table format should be.
SHOW CREATE TABLE books:
Table Create Table
books CREATE TABLE `books` (
`name` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`name2` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`author` int(100) NOT NULL,
`translator` int(120) NOT NULL,
`publisher` int(100) NOT NULL,
`pages` int(50) NOT NULL,
`date` varchar(50) CHARACTER SET latin1 NOT NULL,
`downloaded` int(100) NOT NULL,
`alt_lnk` text NOT NULL,
`viewed` int(100) NOT NULL,
`language` varchar(100) CHARACTER SET latin1 NOT NULL,
`image` varchar(200) CHARACTER SET latin1 NOT NULL,
`faith` varchar(100) CHARACTER SET latin1 NOT NULL,
`id` int(100) NOT NULL AUTO_INCREMENT,
`sid` varchar(1200) CHARACTER SET latin1 DEFAULT NULL,
`topic` varchar(100) CHARACTER SET latin1 NOT NULL,
`last_viewed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`size` double NOT NULL,
`status` int(2) NOT NULL DEFAULT '0',
`is_scroll` int(2) NOT NULL,
`is_downloaded` int(2) NOT NULL,
`pdf_not_found` int(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `downloaded` (`downloaded`),
KEY `name2` (`name2`),
KEY `topic` (`topic`),
KEY `faith` (`faith`)
) ENGINE=InnoDB AUTO_INCREMENT=12962 DEFAULT CHARSET=utf8
where type like 'Tutor-Books' order by topic (or:)
where type = 'Tutor-Books' order by topic
--> INDEX(type, topic)
where type like '%Tutor-Books' order by topic
--> INDEX(topic) -- the leading % prevents indexing
LEFT JOIN authors ON books.author = authors.aid
--> PRIMARY KEY(aid)
Do you really need LEFT JOIN? If you can change it to JOIN, the optimizer might be able to start with authors. If it does, then
--> INDEX(author) -- in `books`
My cookbook for building indexes.
Other tips:
INT(100) and INT(2) are identical -- each is a 4-byte signed integer. Read about TINYINT UNSIGNED for numbers 0..255, etc. Use that for your flags (status, is_scroll, etc)
DATE is a datatype; using a VARCHAR is problematic if you ever want to compare or order.
Learn about composite indexes, such as my first example.
Your display widths are a little funky, but that won't cause a problem.
Query 1:
You're using the LIKE operator without a wildcard search %. You can likely swap this with an = operator.
I don't see the column type in your SHOW CREATE TABLE -- but it seems you don't have an index here, unless you renamed it to faith.
Do you need type to be a string? Could it be abstracted to a types table and then joined against using an integer? Or, if you have a fixed set of types that's unlikely to change, could you use an enum?
Query 2:
You don't need to quote the ID as a string; also, concatenating it like that is probably vulnerable to SQL injection. Use ='.intval($id).' instead.
Make sure you have an index on authors.aid and that they're of the same type.
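As an alternative to casting with intval(), most database drivers support bound parameters (in PHP, prepared statements via PDO or mysqli). A minimal sketch of the idea, using Python's sqlite3 module as a stand-in driver; the three-column books table, the find_book helper and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO books (id, name, status) VALUES (?, ?, ?)",
    [(1, "Algebra", 1), (2, "Geometry", 1), (3, "Draft", 0)],
)

def find_book(conn, book_id):
    """Look up one published book; book_id is bound, never concatenated."""
    return conn.execute(
        "SELECT id, name FROM books WHERE id = ? AND status = 1",
        (book_id,),
    ).fetchall()

safe = find_book(conn, 2)
# A hostile value is treated as data, not as SQL, so it cannot widen the query.
hostile = find_book(conn, "2 OR 1=1")
print(safe, hostile)
```

Because the value travels separately from the SQL text, there is no quoting to get wrong in the first place.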

MySQL booking site: query/db optimization

I have very bad performance in most of my queries. I've read a lot on Stack Overflow, but still have some questions; maybe someone could help or give me some hints?
Basically, I am working on a booking website, which has, among others, the following tables:
objects
+----+---------+--------+---------+------------+-------------+----------+----------+-------------+------------+-------+-------------+------+-----------+----------+-----+-----+
| id | user_id | status | type_id | privacy_id | location_id | address1 | address2 | object_name | short_name | price | currency_id | size | no_people | min_stay | lat | lng |
+----+---------+--------+---------+------------+-------------+----------+----------+-------------+------------+-------+-------------+------+-----------+----------+-----+-----+
OR in MySQL:
CREATE TABLE IF NOT EXISTS `objects` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'object_id',
`user_id` int(11) unsigned DEFAULT NULL,
`status` tinyint(2) unsigned NOT NULL,
`type_id` tinyint(3) unsigned DEFAULT NULL COMMENT 'type of object, from object_type id',
`privacy_id` tinyint(11) unsigned NOT NULL COMMENT 'id from privacy',
`location_id` int(11) unsigned DEFAULT NULL,
`address1` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`address2` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`object_name` varchar(35) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'given name by user',
`short_name` varchar(12) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'short name, selected by user',
`price` int(6) unsigned DEFAULT NULL,
`currency_id` tinyint(3) unsigned DEFAULT NULL,
`size` int(4) unsigned DEFAULT NULL COMMENT 'size rounded and in m2',
`no_people` tinyint(3) unsigned DEFAULT NULL COMMENT 'number of people',
`min_stay` tinyint(2) unsigned DEFAULT NULL COMMENT '0=no min stay;else # nights',
`lat` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`lng` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1451046 ;
reservations
+----+------------+-----------+-----------+---------+--------+
| id | by_user_id | object_id | from_date | to_date | status |
+----+------------+-----------+-----------+---------+--------+
OR in MySQL:
CREATE TABLE IF NOT EXISTS `reservations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`by_user_id` int(11) NOT NULL COMMENT 'user_id of guest',
`object_id` int(11) NOT NULL COMMENT 'id of object',
`from_date` date NOT NULL COMMENT 'start date of reservation',
`to_date` date NOT NULL COMMENT 'end date of reservation',
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=890729 ;
There are a few questions:
1 - I have not set any additional key (except primary) - where should I set and which key should I set?
2 - I have read about MyISAM vs InnoDB, the conclusion for me was that MyISAM is faster when it comes to read-only, whereas InnoDB is designed for tables that get UPDATED or INSERTs more frequently. So, currently objects uses MyISAM and reservations InnoDB. Is this a good idea to mix? Is there a better choice?
3 - I need to query those objects that are available in a certain period (between from_date and end_date). I have read (among others) this post on stackoverflow: MySQL select rows where date not between date
However, when I use the suggested solution the query times out before returning any results (so it is really slow):
SELECT DISTINCT o.id FROM objects o LEFT JOIN reservations r ON(r.object_id=o.id) WHERE
COALESCE('2012-04-05' NOT BETWEEN r.from_date AND r.to_date, TRUE)
AND COALESCE('2012-04-08' NOT BETWEEN r.from_date AND r.to_date, TRUE)
AND o.location_id=201
LIMIT 20
What am I doing wrong? What is the best solution for doing such a query? How do other sites do it? Is my database structure not the best for this or is it only the query?
I would have some more questions, but I would be really grateful for getting any help on this! Thank you very much in advance for any hint or suggestion!
It appears you are looking for any "objects" that do NOT have a reservation conflict based on the from/to dates provided. Using coalesce() to always include those that are never found in reservations is an OK choice; however, since this is a left join, I would try left joining where there IS a conflicting date found, and ignoring any objects FOUND. Something like:
SELECT DISTINCT
o.id
FROM
objects o
LEFT JOIN reservations r
ON o.id = r.object_id
AND ( r.from_date between '2012-04-05' and '2012-04-08'
OR r.to_date between '2012-04-05' and '2012-04-08' )
WHERE
o.location_id = 201
AND r.object_id IS NULL
LIMIT 20
I would ensure an index on the reservations table by (object_id, from_date ) and another (object_id, to_date). By explicitly using the from_date between range, (and to date also), you are specifically looking FOR a reservation occupying this time period. If they ARE found, then don't allow, hence the WHERE clause looking for "r.object_id IS NULL" (ie: nothing is found in conflict within the date range you've provided)
Expanding from my previous answer, and by having two distinct indexes on (id, from date) and (id, to date), you MIGHT get better performance by joining on reservations for each index respectively and expecting NULL in BOTH reservation sets...
SELECT DISTINCT
o.id
FROM
objects o
LEFT JOIN reservations r
ON o.id = r.object_id
AND r.from_date between '2012-04-05' and '2012-04-08'
LEFT JOIN reservations r2
ON o.id = r2.object_id
AND r2.to_date between '2012-04-05' and '2012-04-08'
WHERE
o.location_id = 201
AND r.object_id IS NULL
AND r2.object_id IS NULL
LIMIT 20
I wouldn't mix InnoDB and MyISAM tables; I would define all the tables as InnoDB (for foreign key support). Generally all the columns with the _id suffix should be foreign keys referring to the appropriate table (object_id => objects, etc.).
You don't have to define an index on a foreign key, as it is defined automatically (since MySQL 4.1.2), but you can define additional indexes on the reservations.from_date and reservations.to_date columns for faster comparison.
I know this is a year old, but if you've tried that solution above, the logic isn't complete. It misses reservations that start before the query start AND end after the query end. Also between doesn't cope with reservations that start and end at the same time.
This worked better for me:
SELECT venues.id
FROM venues LEFT JOIN reservations r
ON venues.id = r.venue_id && (r.date_end >':start' and r.date_start <':end')
WHERE r.venue_id IS NULL
ORDER BY venues.id
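To see why the single overlap test (existing end after requested start, existing start before requested end) covers the case the BETWEEN version misses, here is a small check run in an in-memory SQLite database via Python's sqlite3 module. The venue ids, the sample reservations, the index name and the :range_start / :range_end parameter names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues (id INTEGER PRIMARY KEY);
    CREATE TABLE reservations (
        id         INTEGER PRIMARY KEY,
        venue_id   INTEGER NOT NULL,
        date_start TEXT NOT NULL,
        date_end   TEXT NOT NULL
    );
    CREATE INDEX idx_res_venue_dates
        ON reservations (venue_id, date_start, date_end);

    INSERT INTO venues (id) VALUES (1), (2), (3);
    -- venue 1: booking that starts before and ends after the requested range
    INSERT INTO reservations (venue_id, date_start, date_end)
    VALUES (1, '2012-04-01', '2012-04-30');
    -- venue 2: booking that ends exactly when the requested range starts
    INSERT INTO reservations (venue_id, date_start, date_end)
    VALUES (2, '2012-04-01', '2012-04-05');
    -- venue 3: no bookings at all
""")

AVAILABLE = """
    SELECT v.id
    FROM venues v
    LEFT JOIN reservations r
           ON r.venue_id = v.id
          AND r.date_end > :range_start
          AND r.date_start < :range_end
    WHERE r.venue_id IS NULL      -- keep venues with no overlapping booking
    ORDER BY v.id
"""

free = [
    row[0]
    for row in conn.execute(
        AVAILABLE, {"range_start": "2012-04-05", "range_end": "2012-04-08"}
    )
]
print(free)
```

Venue 1 is excluded even though neither of its dates falls inside the requested range, which is exactly the enclosing-reservation case described above. With strict comparisons, a booking that ends on the requested start date does not count as a conflict; use >= and <= if it should.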