I am trying to optimise my site and would be grateful for some help.
The site is a mobile phone comparison site, and the query in question shows the offers for a particular phone. The data is in two tables: the cmp_deals table has 931,000 entries and the cmp_tariffs table has 2,600 entries. The common field is tariff_id.
###The deals table###
id int(11)
hs_id varchar(30)
tariff_id varchar(30)
hs_price decimal(4,2)
months_free int(2)
months_half int(2)
cash_back decimal(4,2)
free_gift text
retailer_id varchar(20)
###Deals Indexes###
hs_id INDEX (cardinality 430)
months_half INDEX (cardinality 33)
months_free INDEX (cardinality 25)
hs_price INDEX (cardinality 2223)
cash_back INDEX
###The tariff table###
ID int(11)
tariff_id varchar(30)
tariff_name varchar(255)
tariff_desc text
anytime_mins int(5)
offpeak_mins int(5)
texts int(5)
line_rental decimal(4,2)
cost_offset decimal(4,2)
No Indexes
The initial query is
SELECT * FROM `cmp_deals`
LEFT JOIN `cmp_tariffs` USING (tariff_id)
WHERE hs_id = 'iphone432gbwhite'
The results can then be sorted by various columns in the cmp_deals table, such as cash_back or free_gift:
SELECT * FROM `cmp_deals`
LEFT JOIN `cmp_tariffs` USING (tariff_id)
WHERE hs_id = 'iphone432gbwhite'
ORDER BY hs_price DESC
LIMIT 30
This is the result when I run an "EXPLAIN" command, but I don't really understand the results.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE cmp_deals ref hs_id hs_id 92 const 179 Using where; Using temporary; Using filesort
1 SIMPLE cmp_tariffs ALL NULL NULL NULL NULL 2582
I would like to know whether I am running these queries in the most efficient way, as they are averaging over 2 seconds.
Thanks in advance
Can't say I'm a fan of all those double IDs (numeric and human-readable). If you don't actually need the varchar versions, drop them.
Change the foreign key cmp_deals.tariff_id to reference cmp_tariffs.ID (i.e. make it an INT and, if using InnoDB, actually create foreign key constraints).
At the very least make cmp_tariffs.ID a primary key and optionally cmp_tariffs.tariff_id a unique index.
Having zero indexes on the tariffs table means it has to do a table scan to complete the join.
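For example, a minimal sketch (assuming the ID and tariff_id values are already unique; the index name is illustrative):
ALTER TABLE cmp_tariffs
  ADD PRIMARY KEY (ID),
  ADD UNIQUE INDEX idx_tariff_id (tariff_id);
With tariff_id indexed, the join no longer has to scan all 2,600 tariff rows for each matching deal.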
I have a MySQL table structured like this:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
`message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I need to query this table for various statistics using date ranges for Grafana graphs, however all of those queries are extremely slow, despite the table being indexed using a composite key of id and date.
"id" is auto-incrementing and date is also always increasing.
The queries generated by Grafana look like this:
SELECT
UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120
This query takes over 30 seconds to complete with 27 million records in the table.
Explaining the query results in this output:
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | messages | NULL | ALL | PRIMARY | NULL | NULL | NULL | 26952821 | 11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
The output lists my composite primary key under possible_keys, but MySQL still scans almost the entire table, which I do not understand. How can I optimize this table for date range queries?
Plan A:
PRIMARY KEY(date, id), -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy
Assuming the table is quite big, having date at the beginning of the PK puts the rows in a given date range next to each other. This minimizes (somewhat) the I/O.
Plan B:
PRIMARY KEY(id),
INDEX(date, serverid)
Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.
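If Plan B fits, the migration might be sketched like this (the index name is illustrative, and rebuilding the primary key on 27 million rows will take a while):
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id),
  ADD INDEX idx_date_serverid (date, serverid);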
But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
Plan C: There may be an even better way:
PRIMARY KEY(id),
INDEX(serverid, date)
In theory, the optimizer could hop through that secondary index, checking each serverid. But I am not sure that such an optimization exists.
Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.
The index on (id, date) doesn't help because the leading column is id, not date.
You can either
(a) drop the current index and index (date, id) instead -- with date in first place, the index can be used to filter on date regardless of the columns that follow -- or
(b) just create an additional index on (date) alone to support the query.
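For example (a sketch; note that for option (a), InnoDB still needs an index that starts with the AUTO_INCREMENT column, as in Plan A above):
-- (a) lead the primary key with date
ALTER TABLE messages DROP PRIMARY KEY, ADD PRIMARY KEY (date, id), ADD INDEX idx_id (id);
-- (b) or just add a secondary index on date
ALTER TABLE messages ADD INDEX idx_date (date);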
Okay, so as the title says, the queries were running fine two days ago; then all of a sudden yesterday the site was loading very slowly. I tracked it down to a couple of queries. For one I was able to add an index, which seems to have helped, but this one I just can't figure out. I tried running a repair and optimize on the tables, and that didn't help. I don't know what could have changed so much that the query went from less than a second to 20+ seconds. Any help would be much appreciated.
SELECT city
FROM listings LEFT JOIN agencies
ON listings.agencyid_fk = agencies.agencyid
WHERE listingstatus IN (1,3) AND appid_fk = 101 AND active = 1
AND auction IS NULL AND agencies.multilist = 1
AND notagency IS NULL
GROUP BY city
ORDER BY city;
I wasn't sure how to export the EXPLAIN result to make it readable on here, so here it is as plain text:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE listings ref appid_fk,listingstatus appid_fk 2 const 21699 Using where; Using temporary; Using filesort
1 SIMPLE agencies eq_ref PRIMARY,active PRIMARY 2 mls2.listings.agencyid_fk 1 Using where
And here are the tables...
listings table:
CREATE TABLE mls2.listings (
listingid INT(11) AUTO_INCREMENT NOT NULL,
appid_fk SMALLINT(3) NOT NULL DEFAULT '0',
agencyid_fk SMALLINT(3) NOT NULL DEFAULT '0',
listingstatus SMALLINT(3),
city VARCHAR(30) CHARACTER SET latin1 COLLATE latin1_swedish_ci,
multilist TINYINT(1),
auction TINYINT(1),
PRIMARY KEY (listingid)
) ENGINE = myisam ROW_FORMAT = DEFAULT CHARACTER SET latin1;
agencies table:
CREATE TABLE mls2.agencies (
agencyid SMALLINT(6) AUTO_INCREMENT NOT NULL,
multilist TINYINT(4) DEFAULT '0',
notagency TINYINT(1),
active TINYINT(1),
PRIMARY KEY (agencyid)
) ENGINE = myisam ROW_FORMAT = DEFAULT CHARACTER SET latin1;
Once you've taken on board the comments above, try adding the following indexes to your tables...
INDEX(city,listingstatus,appid_fk,auction)
INDEX(active,multilist,notagency)
In both cases, the order in which columns are arranged in the index may make a difference, so play around with those, although there are so few rows in the agencies table that that one won't really matter.
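For example (index names are just placeholders):
ALTER TABLE listings ADD INDEX idx_listings_filter (city, listingstatus, appid_fk, auction);
ALTER TABLE agencies ADD INDEX idx_agencies_filter (active, multilist, notagency);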
Next, get rid of the GROUP BY clause and write the query as follows. (Because the WHERE clause filters on agencies columns, the LEFT JOIN was behaving as an inner join anyway, so a plain JOIN is used.)
SELECT DISTINCT l.city
FROM listings l
JOIN agencies a
ON a.agencyid = l.agencyid_fk
WHERE l.listingstatus IN (1,3)
AND l.appid_fk = 101
AND a.active = 1
AND l.auction IS NULL
AND a.multilist = 1
AND a.notagency IS NULL
ORDER
BY city;
Note: Although irrelevant for this particular problem, your original question showed that this schema is desperately in need of normalisation.
Is there any way to make the following query use an index and not use filesort:
SELECT c1 FROM table WHERE c2 IN (val_1, val_2, ..., val_n) ORDER BY c3
I guess the chances are bad, so if it is not possible: is there any way to make the following problem use indexes (or otherwise be fast)?
The table contains comments from users:
CREATE TABLE `comments` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`comment` varchar(180) CHARACTER SET utf8 NOT NULL,
`timestamp` int(11) unsigned NOT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I want to output the comments of specific users (for example, the ones user_x is following), ordered by timestamp (compare the query above).
The only way I can imagine making this query fast is to add a new column that is set to 1 for, say, the last 15 entries of each user. The first query would then fetch at most 15 rows per user, so the maximum number of rows MySQL has to sort is 15*n, where n is the number of users whose comments are selected.
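A sketch of that idea, with a hypothetical is_recent flag that the application would keep up to date on each insert:
ALTER TABLE comments ADD COLUMN is_recent TINYINT(1) NOT NULL DEFAULT 0;
-- the application marks the newest ~15 comments per user, then:
SELECT `comment`, `timestamp`
FROM comments
WHERE user_id IN (1, 2, 3)
AND is_recent = 1
ORDER BY `timestamp` DESC;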
Edit: This is what EXPLAIN outputs:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comments range idx_comments_user_id_timestamp idx_comments_user_id_timestamp 4 NULL 1113 Using where; Using index; Using filesort
I have the following query:
SELECT * FROM `alltrackers`
WHERE `deviceid`='FT_99000083401624'
AND `locprovider`!='none'
ORDER BY `id` DESC
This is the show create table:
CREATE TABLE `alltrackers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`deviceid` varchar(50) NOT NULL,
`gpsdatetime` int(11) NOT NULL,
`locprovider` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `gpsdatetime` (`gpsdatetime`),
KEY `locprovider` (`locprovider`),
KEY `deviceid` (`deviceid`(18))
) ENGINE=MyISAM AUTO_INCREMENT=8665045 DEFAULT CHARSET=utf8;
I've removed the columns which I thought were unnecessary for this question.
This is the EXPLAIN output for this query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE alltrackers ref locprovider,deviceid deviceid 56 const 156416 Using where; Using filesort
This particular query is showing as taking several seconds in mytop (mtop). I'm a bit confused though, as the same query but with a different "deviceid" doesn't take as long. Although I only need the last row, I've already removed LIMIT 1 as that makes it take even longer. This table currently contains 3 million rows.
It is used for storing the locations from different GPS devices. Each GPS device has a unique device ID. Locations come in and are added to the table. For statistics I'm running the above query to find the time of the last received location from a certain device.
I'm open to advice on ways to further optimize the query or even the tables.
Many thanks in advance.
If you only need the last row, add an index on (deviceid, id, locprovider). It would be even faster with an index on (deviceid, id, locprovider, gpsdatetime):
ALTER TABLE alltrackers
ADD INDEX special_covering_IDX
(deviceid, id, locprovider, gpsdatetime) ;
Then try this out:
SELECT id, locprovider, gpsdatetime
FROM alltrackers
WHERE deviceid = 'FT_99000083401624'
AND locprovider <> 'none'
ORDER BY id DESC
LIMIT 1 ;
I am having some issues with a group query with MySQL.
Question
Is there a reason why a query won't use a 10 character partial index on a varchar(255) field to optimize a group by?
Details
My setup:
CREATE TABLE `sessions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`ref_source` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`guid` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`initial_path` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`referrer_host` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`campaign` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_sessions_on_user_id` (`user_id`),
KEY `index_sessions_on_referrer_host` (`referrer_host`(10)),
KEY `index_sessions_on_initial_path` (`initial_path`(10)),
KEY `index_sessions_on_campaign` (`campaign`(10))
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
A number of columns and indexes are not shown here since they don't really impact the issue.
What I want to do is run a query to see all of the referring hosts and the number of sessions coming from each. I don't have a huge table, but it is big enough that full table scans aren't fun. The query I want to run is:
SELECT COUNT(*) AS count_all, referrer_host AS referrer_host FROM `sessions` GROUP BY referrer_host;
The explain gives:
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | sessions | ALL | NULL | NULL | NULL | NULL | 303049 | Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
I have a partial index on referrer_host, but it isn't using it. Even if I try to USE INDEX or FORCE INDEX it doesn't help. The explain is the same, as is the performance.
If I add a full index on referrer_host, instead of a 10 character partial index, everything works better, if not instantly (350 ms vs. 10 seconds).
I have tested partial indexes that are bigger than the longest entry in the field to no avail as well. The full index is the only thing that seems to work.
With the full index, the query can scan the entire index and return the number of records pointed to by each unique key. The table isn't touched.
With the partial index, the engine doesn't know the value of referrer_host until it looks at the record, so it has to scan the whole table!
If most of the values for referrer_host are less than 10 chars, then in theory the optimiser could use the index and only check the rows that have more than 10 chars. But because this is not a clustered index, it would have to make many non-sequential disk reads to find those records. It could end up being even slower, because a table scan is at least a sequential read. Instead of making assumptions, the optimiser just does a scan.
Try this query:
EXPLAIN SELECT COUNT(referrer_host) AS count_all, referrer_host FROM `sessions` GROUP BY referrer_host;
Note that the count will be wrong for the group where referrer_host is NULL, since COUNT(referrer_host) skips NULLs, and I'm uncertain whether there's a way around this.
You're grouping on referrer_host for all the rows in the table. As your index doesn't contain the full referrer_host (only the first 10 chars!), it's going to scan the whole table.
I'll bet that this is faster, though less detailed:
SELECT COUNT(*) AS count_all, substring(referrer_host,1,10) AS referrer_host FROM `sessions` GROUP BY referrer_host;
If you need the full referrer, index it.
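For example, replacing the prefix index with a full one (reusing the existing index name; a utf8 varchar(255) key is 765 bytes, which still fits within InnoDB's 767-byte key limit):
ALTER TABLE sessions
  DROP INDEX index_sessions_on_referrer_host,
  ADD INDEX index_sessions_on_referrer_host (referrer_host);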