I am trying to figure out the most efficient way of writing the query below. Right now it runs against a users table of 3k records, scheduleday of 12k records, and scheduleuser of 300k records.
The method I am using works, but it is not fast. It is plenty fast for 100 records and under, but that is not how I need it displayed. I know there must be a more efficient way of running this: if I take out the nested select, it runs in .00025 seconds; add the nested select back and we're pushing 9+ seconds.
All I am trying to do is get the most recent date a user was scheduled. The scheduleuser table only gives the scheduleid and dayid, which are then looked up in scheduleday to get the date. I can't use max(scheduleuser.rec) because the order the rows were entered may not match date order.
The result of this query would be:
Bob 4/6/2022
Ralph 4/7/2022
Please note this query works perfectly fine; I am just looking for ways to make it more efficient.
Percona Server MySQL 5.5
SELECT
(
    SELECT MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y'))
    FROM scheduleuser su1
    LEFT JOIN scheduleday ON scheduleday.scheduleid=su1.scheduleid AND scheduleday.dayid=su1.dayid
    WHERE su1.idUser=users.idUser
) as lastsecheduledate, users.usersName
FROM users
users
idUser | usersName
1      | bob
2      | ralph

scheduleday
scheduleid | dayid | ddate
1          | 1     | 4/5/2022
1          | 2     | 4/6/2022
1          | 3     | 4/7/2022

scheduleuser (su1)
rec | idUser | dayid | scheduleid
1   | 1      | 2     | 1
1   | 2      | 3     | 1
1   | 1      | 1     | 1
As requested, full query
SELECT users.iduser, users.adminName, users.firstname, users.lastname, users.lastLogin, users.area, users.type, users.terminationdate, users.termreason, users.cellphone,
(SELECT MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y')) FROM scheduleuser
 LEFT JOIN scheduleday ON scheduleday.scheduleid=scheduleuser.scheduleid AND scheduleday.dayid=scheduleuser.dayid
 WHERE scheduleuser.iduser=users.iduser
) as lastsecheduledate,
IFNULL(userrating.rating,'0.00') as userrating, IFNULL(location.area,'') as userarea, IFNULL(usertypes.name,'') as usertype, IFNULL(useropen.iduser,0) as useropen
FROM users
LEFT JOIN userrating ON userrating.iduser=users.iduser
LEFT JOIN location ON location.idarea=users.area
LEFT JOIN usertypes ON usertypes.idtype=users.type
LEFT JOIN useropen ON useropen.iduser=users.iduser
WHERE
users.type<>0 AND users.active=1
ORDER BY users.firstName
As requested, create tables
CREATE TABLE `users` (
`idUser` int(11) NOT NULL,
`usersName` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `users`
ADD PRIMARY KEY (`idUser`);
ALTER TABLE `users`
MODIFY `idUser` int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
CREATE TABLE `scheduleday` (
`rec` int(11) NOT NULL,
`scheduleid` int(11) NOT NULL,
`dayid` int(11) NOT NULL,
`ddate` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `scheduleday`
ADD PRIMARY KEY (`rec`),
ADD KEY `dayid` (`dayid`),
ADD KEY `scheduleid` (`scheduleid`);
ALTER TABLE `scheduleday`
MODIFY `rec` int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
CREATE TABLE `scheduleuser` (
`rec` int(11) NOT NULL,
`idUser` int(11) NOT NULL,
`dayid` int(11) NOT NULL,
`scheduleid` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `scheduleuser`
ADD PRIMARY KEY (`rec`),
ADD KEY `idUser` (`idUser`),
ADD KEY `dayid` (`dayid`),
ADD KEY `scheduleid` (`scheduleid`);
ALTER TABLE `scheduleuser`
MODIFY `rec` int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
I think my recommendation would be to do that subquery once with a GROUP BY and join it. Something like
SELECT users.iduser, users.adminName, users.firstname, users.lastname, users.lastLogin, users.area, users.type, users.terminationdate, users.termreason, users.cellphone,
lsd.lastscheduledate AS lastsecheduledate,
IFNULL(userrating.rating,'0.00') as userrating, IFNULL(location.area,'') as userarea, IFNULL(usertypes.name,'') as usertype, IFNULL(useropen.iduser,0) as useropen
FROM users
LEFT JOIN (SELECT iduser, MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y')) lastscheduledate FROM scheduleuser LEFT JOIN scheduleday ON scheduleday.scheduleid=scheduleuser.scheduleid AND scheduleday.dayid=scheduleuser.dayid
GROUP BY iduser
) lsd
ON lsd.iduser=users.iduser
LEFT JOIN userrating ON userrating.iduser=users.iduser
LEFT JOIN location ON location.idarea=users.area
LEFT JOIN usertypes ON usertypes.idtype=users.type
LEFT JOIN useropen ON useropen.iduser=users.iduser
WHERE
users.type<>0 AND users.active=1
ORDER BY users.firstName
This will likely be more efficient, since the DB can do the subquery once for all users, probably using your scheduleuser.iduser index.
If you are using something like above and it's still not performant, I might suggest experimenting with:
ALTER TABLE scheduleuser ADD INDEX (scheduleid, dayid)
ALTER TABLE scheduleday ADD INDEX (scheduleid, dayid)
This would ensure it can do the entire join in the subquery with the indexes. Of course, there are tradeoffs to adding more indexes, so depending on your data profile it might not be worth it (and it might not actually improve anything).
If you are using your original query, I might suggest experimenting with:
ALTER TABLE scheduleuser ADD INDEX (iduser,scheduleid, dayid)
ALTER TABLE scheduleday ADD INDEX (scheduleid, dayid)
This would allow it to do the subquery (both the JOIN and the WHERE) without touching the actual scheduleuser table at all. Again, I say "experiment" since there are tradeoffs and this might not actually improve things much.
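One way to check whether the covering index is actually being used (a quick sketch; run it against the original correlated form and look for the new index under key and "Using index" under Extra for the scheduleuser row):
EXPLAIN SELECT users.usersName,
       (SELECT MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y'))
        FROM scheduleuser su1
        LEFT JOIN scheduleday ON scheduleday.scheduleid=su1.scheduleid AND scheduleday.dayid=su1.dayid
        WHERE su1.idUser=users.idUser) AS lastsecheduledate
FROM users;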
When you nest a query in the SELECT list as you're doing, that query gets evaluated for each record in the result set, because its WHERE clause uses a column from outside the query. You really just want to calculate the result set of max dates once and join your users to it after it is done:
select usersName, last_scheduled
from users
left join (select su.iduser, max(sd.ddate) as last_scheduled
from scheduleuser as su left join scheduleday as sd on su.dayid = sd.dayid
and su.scheduleid = sd.scheduleid
group by su.iduser) recents on users.iduser = recents.iduser
I've obviously left your other columns off and just given you the name and date, but this is the general principle.
Bug:
MAX(sd.ddate)
(as in the previous answer) compares the m/d/Y strings character by character, so '4/7/2022' beats '1/5/2023'. Keep the conversion inside the aggregate:
MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y'))
or, better yet, store ddate as a real DATE column. Otherwise you will be in for a rude surprise next January.
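A quick way to see the difference (a throwaway sketch with made-up literals, not the real table):
SELECT MAX(ddate)                          AS string_max,  -- '9/1/2022': wins the character-by-character comparison
       MAX(STR_TO_DATE(ddate, '%m/%d/%Y')) AS date_max     -- '2023-01-05': the actual latest date
FROM (SELECT '9/1/2022' AS ddate UNION ALL SELECT '1/5/2023') AS t;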
Possible better indexes. Switch from MyISAM to InnoDB. The following indexes assume InnoDB; they may not work as well in MyISAM.
users: INDEX(active, type)
userrating: INDEX(iduser, rating)
location: INDEX(idarea, area)
usertypes: INDEX(idtype, name)
useropen: INDEX(iduser)
scheduleday: INDEX(scheduleid, dayid, ddate)
scheduleuser: INDEX(iduser, scheduleid, dayid)
users: INDEX(iduser)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
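For reference, adding a couple of those looks roughly like this (a sketch; the index names are arbitrary, and the single-column idUser key becomes redundant once the composite exists):
ALTER TABLE scheduleday ADD INDEX sched_day_date (scheduleid, dayid, ddate);
ALTER TABLE scheduleuser ADD INDEX user_sched (iduser, scheduleid, dayid), DROP INDEX idUser;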
I think I've optimized what I could for the following table structure:
CREATE TABLE `sal_forwarding` (
`sid` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`f_shop` INT(11) NOT NULL,
`f_offer` INT(11) DEFAULT NULL,
...
`f_affiliateId` TINYINT(3) UNSIGNED NOT NULL,
`forwardDate` DATE NOT NULL,
PRIMARY KEY (`sid`),
KEY `f_partner` (`f_partner`,`forwardDate`),
KEY `forwardDate` (`forwardDate`,`cid`),
KEY `forwardDate_2` (`forwardDate`,`f_shop`),
KEY `forwardDate_3` (`forwardDate`,`f_shop`,`f_partner`),
KEY `forwardDate_4` (`forwardDate`,`f_partner`,`cid`),
KEY `forwardDate_5` (`forwardDate`,`f_affiliateId`),
KEY `forwardDate_6` (`forwardDate`,`f_shop`,`sid`),
KEY `forwardDate_7` (`forwardDate`,`f_shop`,`cid`),
KEY `forwardDate_8` (`forwardDate`,`f_affiliateId`,`cid`)
) ENGINE=INNODB AUTO_INCREMENT=10946560 DEFAULT CHARSET=latin1
This is the explain Statement:
id: 1   select_type: SIMPLE   table: sal_forwarding   type: range
possible_keys: forwardDate, forwardDate_2, forwardDate_3, forwardDate_4, forwardDate_5, forwardDate_6, forwardDate_7, forwardDate_8
key: forwardDate_7   key_len: 3   ref: NULL   rows: 1221784
Extra: Using where; Using index; Using filesort
The following query needs 23 seconds to read 2,300 rows:
SELECT COUNT(sid),f_shop, COUNT(DISTINCT(cid))
FROM sal_forwarding
WHERE forwardDate BETWEEN "2011-01-01" AND "2011-11-01"
GROUP BY f_shop
What can I do to improve the performance?
Thank you very much.
A slight modification to what you had: use COUNT(*) instead of an actual field, and for the DISTINCT you don't need parentheses around the column. The optimizer may also be getting confused by all the indexes you have, so remove all other indexes leading with forwardDate, keeping only one based on (forwardDate, f_shop, cid) (your current forwardDate_7 index), as sketched after the query below.
SELECT
COUNT(*),
f_shop,
COUNT(DISTINCT cid )
FROM
sal_forwarding
WHERE
forwardDate BETWEEN "2011-01-01" AND "2011-11-01"
GROUP BY
f_shop
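If you do prune the forwardDate indexes as suggested above, the cleanup would be something like this (a sketch; double-check the names against SHOW INDEX FROM sal_forwarding before dropping anything):
ALTER TABLE sal_forwarding
    DROP INDEX forwardDate,
    DROP INDEX forwardDate_2,
    DROP INDEX forwardDate_3,
    DROP INDEX forwardDate_4,
    DROP INDEX forwardDate_5,
    DROP INDEX forwardDate_6,
    DROP INDEX forwardDate_8;  -- keeps forwardDate_7 (forwardDate, f_shop, cid)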
Then, for grins, and since nothing else appears to be working for you, try putting in a pre-subquery on the records, then sum from that, so it's not relying on any other index pages based on your near 11 million records (implied per Auto-increment value)...
SELECT
f_shop,
sum( PreQuery.Presum) totalCnt,
COUNT(*) dist_cid
FROM
( select f_shop, cid, count(*) presum
from sal_forwarding
WHERE forwardDate BETWEEN "2011-01-01" AND "2011-11-01"
group by f_shop, cid ) PreQuery
GROUP BY
f_shop
Since the inner pre-query is doing a simple count of records grouped by f_shop and cid (which the index can optimize), each row it returns already represents one distinct cid per shop. The outer query then just counts those rows to get the distinct cid figure and SUM()s the inner presum column for the total count. Again, just another option to try and turn the tables; hope it works for you.
I don't think the (forwardDate, f_shop, cid) index is good for this query; it is not any better than a simple (forwardDate) index, because the range condition on forwardDate prevents the columns after it from being used for the grouping.
You may try a (f_shop, cid, forwardDate) index.
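A sketch of the corresponding DDL (the index name is just a placeholder):
ALTER TABLE sal_forwarding ADD INDEX shop_cid_date (f_shop, cid, forwardDate);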
I wanted to simulate a large amount of data in a database and test how my query would perform under such conditions. I was not surprised when the query turned out to be slow. So here I am, seeking advice on how I could better index my tables and improve my queries.
Before I post the tables' SQL and the query I use, let me explain what is what. I have a user table populated with 100,000 records. Most of the columns in it are enum type, like hair color, looking_for, etc. The first query I have is generated when a search is done. It consists of a WHERE clause in which some or all column values are searched for, and only ids are retrieved, limited to 20.
Then I have 3 more tables that hold about 50-1000 records per user each, so the numbers could really grow. These tables hold information on who visited whose profile, who marked whom as a favorite, who blocked whom, plus a messaging table. My goal is to retrieve 20 records that match the search criteria, but also to determine whether I (the user who's browsing) have:
blocked them
favorited them
was favorited by them
have unread messages from them
have sent or received any messages from them
For this I tried using both joins and subqueries, but the second query, which retrieves the users and the data listed above, is still slow. I think I need better indexes and possibly better queries. Here is what I have right now: table definitions first, and the 2 queries at the end. The first does the search and determines the IDs; the second uses the ids from the first query to retrieve the data. I hope you guys can help me create better indexes and optimize the query.
CREATE TABLE user (id BIGINT AUTO_INCREMENT, dname VARCHAR(255) NOT NULL, email VARCHAR(255) NOT NULL UNIQUE, email_code VARCHAR(255), email_confirmed TINYINT(1) DEFAULT '0', password VARCHAR(255) NOT NULL, gender ENUM('male', 'female'), description TEXT, dob DATE, height MEDIUMINT, looks ENUM('thin', 'average', 'athletic', 'heavy'), looking_for ENUM('marriage', 'dating', 'friends'), looking_for_age1 BIGINT, looking_for_age2 BIGINT, color_hair ENUM('black', 'brown', 'blond', 'red'), color_eyes ENUM('black', 'brown', 'blue', 'green', 'grey'), marital_status ENUM('single', 'married', 'divorced', 'widowed'), smokes ENUM('no', 'yes', 'sometimes'), drinks ENUM('no', 'yes', 'sometimes'), has_children ENUM('no', 'yes'), wants_children ENUM('no', 'yes'), education ENUM('school', 'college', 'university', 'masters', 'phd'), occupation ENUM('no', 'yes'), country_id BIGINT, city_id BIGINT, lastlogin_at DATETIME, deleted_at DATETIME, created_at DATETIME NOT NULL, updated_at DATETIME NOT NULL, INDEX country_id_idx (country_id), INDEX city_id_idx (city_id), INDEX image_id_idx (image_id), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
CREATE TABLE block (id BIGINT AUTO_INCREMENT, blocker_id BIGINT, blocked_id BIGINT, created_at DATETIME NOT NULL, updated_at DATETIME NOT NULL, INDEX blocker_id_idx (blocker_id), INDEX blocked_id_idx (blocked_id), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
CREATE TABLE city (id BIGINT AUTO_INCREMENT, name_eng VARCHAR(30), name_geo VARCHAR(30), name_geo_shi VARCHAR(30), name_geo_is VARCHAR(30), country_id BIGINT NOT NULL, active TINYINT(1) DEFAULT '0', INDEX country_id_idx (country_id), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
CREATE TABLE country (id BIGINT AUTO_INCREMENT, code VARCHAR(2), name_eng VARCHAR(30), name_geo VARCHAR(30), name_geo_shi VARCHAR(30), name_geo_is VARCHAR(30), active TINYINT(1) DEFAULT '1', PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
CREATE TABLE favorite (id BIGINT AUTO_INCREMENT, favoriter_id BIGINT, favorited_id BIGINT, created_at DATETIME NOT NULL, updated_at DATETIME NOT NULL, INDEX favoriter_id_idx (favoriter_id), INDEX favorited_id_idx (favorited_id), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
CREATE TABLE message (id BIGINT AUTO_INCREMENT, body TEXT, sender_id BIGINT, receiver_id BIGINT, read_at DATETIME, created_at DATETIME NOT NULL, updated_at DATETIME NOT NULL, INDEX sender_id_idx (sender_id), INDEX receiver_id_idx (receiver_id), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
CREATE TABLE visitor (id BIGINT AUTO_INCREMENT, visitor_id BIGINT, visited_id BIGINT, created_at DATETIME NOT NULL, updated_at DATETIME NOT NULL, INDEX visitor_id_idx (visitor_id), INDEX visited_id_idx (visited_id), PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = INNODB;
SELECT s.id AS s__id FROM user s WHERE (s.gender = 'female' AND s.marital_status = 'single' AND s.smokes = 'no' AND s.deleted_at IS NULL) LIMIT 20
SELECT s.id AS s__id, s.dname AS s__dname, s.gender AS s__gender, s.height AS s__height, s.dob AS s__dob, s3.id AS s3__id, s3.code AS s3__code, s3.name_geo AS s3__name_geo, s4.id AS s4__id, s4.name_geo AS s4__name_geo, s5.id AS s5__id, s6.id AS s6__id, s7.id AS s7__id, s8.id AS s8__id, s9.id AS s9__id FROM user s LEFT JOIN country s3 ON s.country_id = s3.id LEFT JOIN city s4 ON s.city_id = s4.id LEFT JOIN block s5 ON ((s.id = s5.blocked_id AND s5.blocker_id = '1')) LEFT JOIN favorite s6 ON ((s.id = s6.favorited_id AND s6.favoriter_id = '1')) LEFT JOIN favorite s7 ON ((s.id = s7.favoriter_id AND s7.favorited_id = '1')) LEFT JOIN message s8 ON ((s.id = s8.sender_id AND s8.receiver_id = '1' AND s8.read_at IS NULL)) LEFT JOIN message s9 ON (((s.id = s9.sender_id AND s9.receiver_id = '1') OR (s.id = s9.receiver_id AND s9.sender_id = '1'))) WHERE (s.id IN ('22', '36', '53', '105', '152', '156', '169', '182', '186', '192', '201', '215', '252', '287', '288', '321', '330', '351', '366', '399')) GROUP BY s.id ORDER BY s.id
Here are the results of EXPLAIN of the 2 queries above:
First:
1 SIMPLE s ALL NULL NULL NULL NULL 100420 Using Where
Second:
1 SIMPLE s range PRIMARY PRIMARY 8 NULL 20 Using where; Using temporary; Using filesort
1 SIMPLE s2 eq_ref PRIMARY PRIMARY 8 sagule.s.image_id 1 Using index
1 SIMPLE s3 eq_ref PRIMARY PRIMARY 8 sagule.s.country_id 1
1 SIMPLE s4 eq_ref PRIMARY PRIMARY 8 sagule.s.city_id 1
1 SIMPLE s5 ref blocker_id_idx,blocked_id_idx blocked_id_idx 9 sagule.s.id 5
1 SIMPLE s6 ref favoriter_id_idx,favorited_id_idx favorited_id_idx 9 sagule.s.id 6
1 SIMPLE s7 ref favoriter_id_idx,favorited_id_idx favoriter_id_idx 9 sagule.s.id 6
1 SIMPLE s8 ref sender_id_idx,receiver_id_idx sender_id_idx 9 sagule.s.id 7
1 SIMPLE s9 index_merge sender_id_idx,receiver_id_idx receiver_id_idx,sender_id_idx 9,9 NULL 66 Using union(receiver_id_idx,sender_id_idx); Using where
I'm an MSSQL guy and haven't used MySQL, but the concepts should be the same.
Firstly, can you remove the GROUP BY and ORDER BY and comment out all tables except the first one, "user"? Also comment out any columns of the removed tables, as I have done below.
SELECT s.id AS s__id,
s.dname AS s__dname,
s.gender AS s__gender,
s.height AS s__height,
s.dob AS s__dob
-- s3.id AS s3__id,
-- s3.code AS s3__code,
-- s3.name_geo AS s3__name_geo,
-- s4.id AS s4__id,
-- s4.name_geo AS s4__name_geo,
-- s5.id AS s5__id,
-- s6.id AS s6__id,
-- s7.id AS s7__id,
-- s8.id AS s8__id,
-- s9.id AS s9__id
FROM user s -- LEFT JOIN
-- country s3 ON s.country_id = s3.id LEFT JOIN
-- city s4 ON s.city_id = s4.id LEFT JOIN
-- block s5 ON ((s.id = s5.blocked_id AND s5.blocker_id = '1')) LEFT JOIN
-- favorite s6 ON ((s.id = s6.favorited_id AND s6.favoriter_id = '1')) LEFT JOIN
-- favorite s7 ON ((s.id = s7.favoriter_id AND s7.favorited_id = '1')) LEFT JOIN
-- message s8 ON ((s.id = s8.sender_id AND s8.receiver_id = '1' AND s8.read_at IS NULL)) LEFT JOIN
-- message s9 ON (((s.id = s9.sender_id AND s9.receiver_id = '1') OR (s.id = s9.receiver_id AND s9.sender_id = '1')))
WHERE (s.id IN ('22', '36', '53', '105', '152', '156', '169', '182', '186', '192', '201', '215', '252', '287', '288', '321', '330', '351', '366', '399'))
Run the query and record the time. Then add one table and its columns back in at a time and run it until you find which one causes it to slow significantly.
SELECT s.id AS s__id,
s.dname AS s__dname,
s.gender AS s__gender,
s.height AS s__height,
s.dob AS s__dob,
s3.id AS s3__id,
s3.code AS s3__code,
s3.name_geo AS s3__name_geo
-- s4.id AS s4__id,
-- s4.name_geo AS s4__name_geo,
-- s5.id AS s5__id,
-- s6.id AS s6__id,
-- s7.id AS s7__id,
-- s8.id AS s8__id,
-- s9.id AS s9__id
FROM user s LEFT JOIN
country s3 ON s.country_id = s3.id -- LEFT JOIN
-- city s4 ON s.city_id = s4.id LEFT JOIN
-- block s5 ON ((s.id = s5.blocked_id AND s5.blocker_id = '1')) LEFT JOIN
-- favorite s6 ON ((s.id = s6.favorited_id AND s6.favoriter_id = '1')) LEFT JOIN
-- favorite s7 ON ((s.id = s7.favoriter_id AND s7.favorited_id = '1')) LEFT JOIN
-- message s8 ON ((s.id = s8.sender_id AND s8.receiver_id = '1' AND s8.read_at IS NULL)) LEFT JOIN
-- message s9 ON (((s.id = s9.sender_id AND s9.receiver_id = '1') OR (s.id = s9.receiver_id AND s9.sender_id = '1')))
WHERE (s.id IN ('22', '36', '53', '105', '152', '156', '169', '182', '186', '192', '201', '215', '252', '287', '288', '321', '330', '351', '366', '399'))
My guess is that it will be the block, favorite, and message joins that are giving you the performance hit (the one with the most rows will be the biggest hit).
For the block table, can you remove one of the indexes and change the other to something along the lines of the following (I am not sure of the exact syntax, but you'll get the point):
INDEX blocker_id_idx (blocker_id,blocked_id),
and try it with the column order swapped around to find which order is best for your query:
INDEX blocker_id_idx (blocked_id,blocker_id),
For the favorite table, change the indexes to
INDEX favoriter_id_idx (favoriter_id,favorited_id),
INDEX favorited_id_idx (favorited_id,favoriter_id),
Again, try it with the columns swapped around to find which gives better performance.
Do the same for the message indexes.
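In MySQL the syntax for swapping the indexes around looks roughly like this (a sketch for the block table only; favorite and message follow the same pattern, and the new index name is just a placeholder):
ALTER TABLE block
    DROP INDEX blocker_id_idx,
    DROP INDEX blocked_id_idx,
    ADD INDEX blocked_blocker_idx (blocked_id, blocker_id);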
Do that and let me know if things improved. There are a few other things that can be done to improve it further. - EDIT: It seems I lied about the few other things; what I had intended would not have made any difference. But I can speed up your first query, which is below.
EDIT: This is for your first SELECT query.
This one is a bit long, but I wanted to show you how indexes work so you can make your own.
Let's say the table contains 100,000 rows.
When you select from it, this is the general process the engine will take:
Are there any indexes that cover, or mostly cover, the columns I need? (In your case, no, there aren't.) So use the primary index and scan through every row in the table to check for a match.
Every row in the table will need to be read from disk to find which rows match your criteria. So to return the approx. 10,000 rows (this is a guess) that match your data, the database engine has read all 100,000 rows.
You do have a LIMIT 20 in your query, so it will limit the amount of rows the engine will read from disk.
Example:
read row 1: is match so add to result
read row 2: no match - skip
read row 3: no match - skip
read row 4: is match so add to result
stop after 20 rows identified
You potentially read about 5000 rows from disk to return 20.
We need to create an index that will help us read as few records as possible from the table/disk, but still get the rows we are after. So here goes.
Your query uses 4 filters to get to the data.
s.gender = 'female' AND
s.marital_status = 'single' AND
s.smokes = 'no' AND
s.deleted_at IS NULL
What we need to do now is identify which filter by itself will return the least amount of rows. I can't tell, as I don't have any data, but this is what I would guess for your table.
The gender column supports 2 values, and it would be fair to estimate that half of the records in your database are male and the other half female, so that filter will return approx. 50,000 rows.
Now for marital status: it supports four values, so if we say the data has an equal spread, we would get roughly 25,000 rows back. Of course, it depends on the actual data, and I would say there are not too many widowed in the data, so a better estimate may be a 30% share for each of the other three. So let's say 30,000 records marked as single.
Now for the smokes column. I have read that here in Australia about 10% of people smoke, which is a fairly low number compared to other countries. So let's say 25% either smoke or smoke sometimes. That leaves us with approx. 75,000 non-smokers.
Now for the last column, deleted. A fair guess on my part, but let's say 5% are marked as deleted. That leaves us with approx. 95,000 rows.
So in summary (remember, this is all pure guesswork on my part; your data may be different):
Gender 50,000 rows or 50%
Marital status 30,000 rows or 30%
Smokes 75,000 rows or 75%
Deleted 95,000 rows or 95%
So if we create an index with the four columns, putting the one that returns the least amount of rows first, we would get the following:
INDEX index01_idx (marital_status,gender,smokes,deleted_at),
Now this is what will happen when we run the SELECT:
The server will find an index that covers all the columns in the WHERE clause.
It will narrow down the result set to 30,000 "single" records.
Of those 30,000, 50% will be female; that leaves 15,000 records.
Of those 15,000, 75% will not smoke; that leaves 11,250 records.
Of those 11,250, 95% will not be deleted.
That leaves us with just over 10,000 records out of 100,000 total that we have identified as the records we want, but not yet read from disk. You also have a LIMIT 20 in the query, so the database engine just needs to read the first 20 of the 10,000 and return the result. It's super quick, the hard disk will love you, and the scary DBA will even mumble and grunt with approval.
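For completeness, the DDL for that index against the user table from the question would be along these lines (a sketch; the name index01_idx is arbitrary):
ALTER TABLE user ADD INDEX index01_idx (marital_status, gender, smokes, deleted_at);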
In your second SELECT query, you can remove the GROUP BY clause because you aren't using any Aggregate functions (count, min, max...) in your SELECT clause.
I doubt this will help much improving performance, though.
In any case, I recommend to watch the first half of this talk "A Look into a MySQL DBA's Toolchest".
(The first two thirds of the video are about free open-source admin-tools for mysql on Unix, the last third or so is about replication)
Video A Look into a MySQL DBA's Toolchest
From the same talk: The Guide To Understanding mysqlreport
Without some data to test with, it is not so easy to give good advice.
Creating an index on fields that are searched frequently can make your query faster, but with each index your inserts and updates can get slower. You have to think about the tradeoff: index the columns that get searched frequently, but test the new index on the data so you can see if the query actually runs faster.
I don't know which tools you are using, but in MySQL Workbench there is a command "Explain Current Statement" under the "Query" menu. There you can see which actions were done by MySQL and which keys were used. Your query shows "null", which means no key was used and MySQL had to run through the whole data set, comparing each row with the search term.
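You can get the same information without Workbench by prefixing the statement with EXPLAIN; for example, for the first search query from the question, check what shows up in the key and rows columns once you have added an index:
EXPLAIN SELECT s.id AS s__id
FROM user s
WHERE s.gender = 'female' AND s.marital_status = 'single' AND s.smokes = 'no' AND s.deleted_at IS NULL
LIMIT 20;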
Hope this helps a bit.