It's my table t1; It has one million rows.
CREATE TABLE `t1` (
`a` varchar(10) NOT NULL,
`b` varchar(10) DEFAULT NULL,
`c` varchar(10) DEFAULT NULL,
`d` varchar(10) DEFAULT NULL,
`e` varchar(10) DEFAULT NULL,
`f` varchar(10) DEFAULT NULL,
`g` varchar(10) DEFAULT NULL,
`h` varchar(10) DEFAULT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Result:
mysql> select * from t1 where a=10000000;
Empty set (1.42 sec)
mysql> select * from t1 where b=10000000;
Empty set (1.41 sec)
Why select primary key is as fast as a normal field?
Try select * from t1 where a='10000000';.
You're probably forcing MySQL to convert all of those strings to integers - because integers have a higher type precedence than varchar - in which case an index on the strings is useless
Actually, apparently, I was slightly wrong. By my reading of the conversions documentation, I believe that in MySQL we end up forcing both sides of the comparison to be converted to float, since I can't see any bullet point above:
In all other cases, the arguments are compared as floating-point (real) numbers.
that would match a string on one side and an integer on the other.
Data is stored in blocks in almost all databases. Reading a block is an elementary Unit of IO.
Indexes helps the system in zeroing in on the datablock which holds the data that we are trying to read and would avoid reading all the datablocks. In a very small table which has single or very few data blocks the usage of index could actually be a overhead and might be skipped altogether. Even if used, the indexes would rarely provide any performance benefit. Try the same experiment on a rather large table.
PS: Indexes and Key (Primary Keys) are not interchangeable concepts. The Former is Physical and the latter is logical.
Related
I have a table with spatial index. When I run this query, it takes more than 0.7 to return results. Is there a way to make this query faster??
(MySQL 5.7.17 on AWS RDS, InnoDB)
CREATE TABLE
CREATE TABLE `geofences` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`sticker_key` VARCHAR(50) NOT NULL COLLATE 'utf8mb4_bin',
`geofence` GEOMETRY NOT NULL,
`location` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8mb4_bin',
`expires` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8mb4_bin',
PRIMARY KEY (`id`),
SPATIAL INDEX `geofence` (`geofence`)
)COLLATE='utf8mb4_bin'
ENGINE=InnoDB;
QUERY
SELECT geofences.sticker_key FROM geofences WHERE ST_CONTAINS(geofences.geofence, POINT(126.924394,37.552754)) GROUP BY sticker_key;
EXPLAIN
Example Data
This is the smallest row in my geofence table.
MULTIPOLYGON(((126.982169151306 37.5796080752196,126.984980106354 37.5792084484235,126.985055208206 37.5801097313528,126.985623836517 37.5800672182523,126.985656023026 37.5802542757132,126.986804008484 37.5801947574811,126.9868683815 37.5831110948965,126.986482143402 37.5831961175971,126.986385583878 37.5830685835098,126.986138820648 37.5828475239075,126.98569893837 37.582711486903,126.985720396042 37.5819207668924,126.985355615616 37.5817762257676,126.984947919846 37.5817677233397,126.984915733337 37.5814616352904,126.9846367836 37.5814616352904,126.984508037567 37.5817082063176,126.984218358994 37.581878254826,126.983692646027 37.5820398005493,126.98362827301 37.5822608625499,126.983746290207 37.5823373838587,126.983370780945 37.5834086739237,126.983295679092 37.5834766918201,126.983370780945 37.5843269102809,126.983531713486 37.5843609188173,126.983789205551 37.5849730698168,126.982952356338 37.5850580903908,126.982877254486 37.5845139570391,126.982491016388 37.5845479654901,126.982126235962 37.5845564676004,126.982115507126 37.5846329865497,126.981954574585 37.5846329865497,126.981739997864 37.5837147539679,126.981611251831 37.5823118767645,126.981300115585 37.5819972885508,126.980999708176 37.5815041475947,126.98145031929 37.5812235659375,126.981514692307 37.581036510912,126.982115507126 37.5800162024996,126.982308626175 37.5799226735289,126.982169151306 37.5796080752196)))
Some are bigger (like entire nation and such)
Simplify your geometries!
The more complex a shape is, the longer it takes for MySQL to tell whether it contains a point or not -- and some countries have very complex shapes indeed. For example, the boundary of Indonesia (which you mentioned in a comment) contains over 18,000 points, most of which define the complex border between Sarawak and Kalimantan.
Unless your application specifically needs to know exactly where a user sits with regard to that border, you'll probably be much better off with a simpler version. There are a number of tools available for this purpose; one popular one is mapshaper.
I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table was growing, it was getting slower to read and write the data. One SELECT query used to take about 25 seconds, then I added a unique index
UNIQUE INDEX idx_userInsDate ( userID,instrumentID,utcDateTime)
This reduced the read time from 25 seconds to some milliseconds but it has increased the insert time as it has to update the index for each record.
Also If I run a SELECT query from multiple threads as the same time the queries take too long to return the data.
This is an example query
Select dateTime from sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime between 'startDate' AND 'endDate' order by dateTime asc;
Can someone help me, to improve the table schema or add an effective index to improve the performance, please.
Thank you in advance
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id) !
Is id referenced by any other tables? If not, then get rid of it all together. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME. Doing that, you can convert to DATETIME if needed, thereby eliminating one of the columns.
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
Let's see your main queries; there may be other issues to address.
It's not the indexes at fault here. It's your data types. As the size of the data on disk grows, the speed of all operations decrease. Indexes can certainly help speed up selects - provided your data is properly structured - but it appears that it isnt
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any reduncy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
1st of all: Avoid varchars for indexes and especially IDs. Each character position in the varchar generates an own index-entry internally!
2nd: Your select uses dateTime, your index is set to utcDateTime. It will only take userID and instrumentID and ignore the utcDateTime-Part.
Advise: Change your data types for the ids and change your index to match the query (dateTime, not utcDateTime)
Using an index decreases your performance on inserts, unluckily, there is nothing such as a fill factor for indexes in mysql right now. So the best thing you can do is try the indexes to be as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At a given time, build the indexes and swap the tables (may require a third table for the index creation while leaving the other ones untouched in between).
can you please advise why such a query would take so long (literally 20-30 minutes)?
I seem to have proper indexes set up, don't I?
UPDATE `temp_val_import_435` t1,
`attr_upc` t2 SET t1.`attr_id` = t2.`id` WHERE t1.`value` LIKE t2.`upc`
CREATE TABLE `attr_upc` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`upc` varchar(255) NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `upc` (`upc`),
KEY `last_update` (`last_update`)
) ENGINE=InnoDB AUTO_INCREMENT=102739 DEFAULT CHARSET=utf8
CREATE TABLE `temp_val_import_435` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`attr_id` int(11) DEFAULT NULL,
`translation_id` int(11) DEFAULT NULL,
`source_value` varchar(255) NOT NULL,
`value` varchar(255) DEFAULT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `core_value_id` (`core_value_id`),
KEY `translation_id` (`translation_id`),
KEY `source_value` (`source_value`),
KEY `value` (`value`),
KEY `count` (`count`)
) ENGINE=InnoDB AUTO_INCREMENT=32768 DEFAULT CHARSET=utf8
Ed Cottrell's solution worked for me. Using = instead of LIKE sped up a smaller test query on 1000 rows by a lot.
I measured 2 ways: 1 in phpMyAdmin, the other looking at the time for DOM load (which of course involves other processes).
DOM load went from 44 seconds to 1 second, a 98% increase.
But the difference in query execution time was much more dramatic, going from 43.4 seconds to 0.0052 seconds, a decrease of 99.988%. Pretty good. I will report back on results from huge datasets.
Use = instead of LIKE. = should be much faster than LIKE -- LIKE is only for matching patterns, as in '%something%', which matches anything with "something" anywhere in the text.
If you have this query:
SELECT * FROM myTable where myColumn LIKE 'blah'
MySQL can optimize this by pretending you typed myColumn = 'blah', because it sees that the pattern is fixed and has no wildcards. But what if you have this data in your upc column:
blah
foo
bar
%foo%
%bar
etc.
MySQL can't optimize your query in advance, because it's possible that the text it is trying to match is a pattern, like %foo%. So, it has to perform a full text search for LIKE matches on every single value of temp_val_import_435.value against every single value of attr_upc.upc. With a simple = and the indexes you have defined, this is unnecessary, and the query should be dramatically faster.
In essence you are joining on a LIKE which is going to be problematic (would need EXPLAIN to see is MySQL if utilizing indexes at all). Try this:
UPDATE `temp_val_import_435` t1
INNER JOIN `attr_upc` t2
ON t1.`value` LIKE t2.`upc`
SET t1.`attr_id` = t2.`id` WHERE t1.`value` LIKE t2.`upc`
I am trying to generate a group query on a large table (more than 8 million rows). However I can reduce the need to group all the data by date. I have a view that captures that dates I require and this limits the query bu it's not much better.
Finally I need to join to another table to pick up a field.
I am showing the query, the create on the main table and the query explain below.
Main Query:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
INNER JOIN summary_max
ON pgi_raw_data.wsp_channel = summary_max.wsp_channel
AND pgi_raw_data.dated > summary_max.race_date
INNER JOIN pgi_accounts
ON pgi_raw_data.account = pgi_accounts.account
GROUP BY pgi_raw_data.event_id
ORDER BY NULL
The create table:
CREATE TABLE `pgi_raw_data` (
`event_id` char(25) NOT NULL DEFAULT '',
`wsp_channel` varchar(5) NOT NULL,
`dated` date NOT NULL,
`time` time DEFAULT NULL,
`program` varchar(5) NOT NULL,
`track` varchar(25) NOT NULL,
`raceno` tinyint(2) NOT NULL,
`detail` varchar(30) DEFAULT NULL,
`ticket` varchar(20) NOT NULL DEFAULT '',
`breed` varchar(12) NOT NULL,
`pool` varchar(10) NOT NULL,
`gross` decimal(11,2) NOT NULL,
`refunds` decimal(11,2) NOT NULL,
`handle` decimal(11,2) NOT NULL,
`payout` decimal(11,4) NOT NULL,
`rebate` decimal(11,4) NOT NULL,
`profit` decimal(11,4) NOT NULL,
`account` mediumint(10) NOT NULL,
PRIMARY KEY (`event_id`,`ticket`),
KEY `idx_account` (`account`),
KEY `idx_wspchannel` (`wsp_channel`,`dated`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This is my view for summary_max:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`#`localhost` SQL SECURITY DEFINER VIEW
`summary_max` AS select `pgi_summary_tbl`.`wsp_channel` AS
`wsp_channel`,max(`pgi_summary_tbl`.`race_date`) AS `race_date`
from `pgi_summary_tbl` group by `pgi_summary_tbl`.`wsp
And also the evaluated query:
1 PRIMARY <derived2> ALL 6 Using temporary
1 PRIMARY pgi_raw_data ref idx_account,idx_wspchannel idx_wspchannel
7 summary_max.wsp_channel 470690 Using where
1 PRIMARY pgi_accounts ref PRIMARY PRIMARY 3 gf3data_momutech.pgi_raw_data.account 29 Using index
2 DERIVED pgi_summary_tbl ALL 42282 Using temporary; Using filesort
Any help on indexing would help.
At a minimum you need indexes on these fields:
pgi_raw_data.wsp_channel,
pgi_raw_data.dated,
pgi_raw_data.account
pgi_raw_data.event_id,
summary_max.wsp_channel,
summary_max.race_date,
pgi_accounts.account
The general (not always) rule is anything you are sorting, grouping, filtering or joining on should have an index.
Also: pgi_summary_tbl.wsp
Also, why the order by null?
The first thing is to be sure that you have indexes on pgi_summary_table(wsp_channel, race_date) and pgi_accounts(account). For this query, you don't need indexes on these columns in the raw data.
MySQL has a tendency to use indexes even when they are not the most efficient path. I would start by looking at the performance of the "full" query, without the joins:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
-- pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
GROUP BY pgi_raw_data.event_id
If this has better performance, you may have a situation where the indexes are working against you. The specific problem is called "thrashing". It occurs when a table is too bit to fit into memory. Often, the fastest way to deal with such a table is to just read the whole thing. Accessing the table through an index can result in an extra I/O operation for most of the rows.
If this works, then do the joins after the aggregate. Also, consider getting more memory, so the whole table will fit into memory.
Second, if you have to deal with this type of data, then partitioning the table by date may prove to be a very useful option. This will allow you to significantly reduce the overhead of reading the large table. You do have to be sure that the summary table can be read the same way.
Here is my table:
CREATE TABLE `letters` (
`a` bigint(20) unsigned NOT NULL,
`b` bigint(20) unsigned NOT NULL,
`c` bigint(20) unsigned NOT NULL,
`d` bigint(20) unsigned NOT NULL,
`e` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
The table will have about 1+ billion rows.
Each column can be queried; each column can be referenced. e.g.:
SELECT [any column] FROM letters WHERE [any / any other column] IN ([subquery or list]);
My question: what indices should I add to speed up any query in the format above? (Also, if possible, please try to describe 'why' it/they should be added so that I can learn from your answer).
Thanks!
-- Extra info: inserts will happen on a fairly regular basis (a few/handful every second) but select queries will happen more frequently.
Since any column can appear in the WHERE clause, you've to add an index for each of the column, except for the field a since is already the PRIMARY KEY and as such is already indexed.
UPDATE: as for the subsequent discussion, Poodlehat pointed out that the column e has a low index selectivity, i.e. "The ratio of the number of distinct values in the indexed column / columns to the number of records in the table". For this reason, it's not clear whether adding an index on column e will help or slow down queries. So Lucas will try experimentally and hopefully share the results to us.
I think you need to have a unique index on a (or it should be a primary key), definitely indexes on b,c,d (on each). No need for index on e (it won't be used anyway since as you say it has just 15 different values)