I have a query that runs really slow (15 20 seconds) when is not on memory and quite fast when is on memory (2s - 0.6s)
select count(distinct(concat(conexiones.tMacAdres,date_format(conexiones.fFecha,'%Y%m%d')))) as Conexiones,
sum(if(conexiones.tEvento='megusta',1,0)) as MeGusta,sum(if(conexiones.tEvento='megusta',conexiones.nAmigos,0)) as ImpactosMeGusta,
sum(if(conexiones.tEvento='checkin',1,0)) as CheckIn,sum(if(conexiones.tEvento='checkin',conexiones.nAmigos,0)) as ImpactosCheckIn,
min(conexiones.fFecha) Fecha_Inicio, now() Fecha_fin,datediff(now(),min(conexiones.fFecha)) as dias
from conexiones, instalaciones
where conexiones.idInstalacion=instalaciones.idInstalacion and conexiones.idInstalacion=190
and (fFecha between '2014-01-01 00:00:00' and '2016-06-18 23:59:59')
group by instalaciones.tNombre
order by instalaciones.idCliente
This is Table SCHEMAS:
Instalaciones with 1332 rows:
CREATE TABLE `instalaciones` (
`idInstalacion` int(10) unsigned NOT NULL AUTO_INCREMENT,
`idCliente` int(10) unsigned DEFAULT NULL,
`tRouterSerial` varchar(50) DEFAULT NULL,
`tFacebookPage` varchar(256) DEFAULT NULL,
`tidFacebook` varchar(64) DEFAULT NULL,
`tNombre` varchar(128) DEFAULT NULL,
`tMensaje` varchar(128) DEFAULT NULL,
`tWebPage` varchar(128) DEFAULT NULL,
`tDireccion` varchar(128) DEFAULT NULL,
`tPoblacion` varchar(128) DEFAULT NULL,
`tProvincia` varchar(64) DEFAULT NULL,
`tCodigoPosta` varchar(8) DEFAULT NULL,
`tLatitud` decimal(15,12) DEFAULT NULL,
`tLongitud` decimal(15,12) DEFAULT NULL,
`tSSID1` varchar(40) DEFAULT NULL,
`tSSID2` varchar(40) DEFAULT NULL,
`tSSID2_Pass` varchar(40) DEFAULT NULL,
`fSincro` datetime DEFAULT NULL,
`tEstado` varchar(10) DEFAULT NULL,
`tHotspot` varchar(10) DEFAULT NULL,
`fAlta` datetime DEFAULT NULL,
PRIMARY KEY (`idInstalacion`),
UNIQUE KEY `tRouterSerial` (`tRouterSerial`),
KEY `idInstalacion` (`idInstalacion`)
) ENGINE=InnoDB AUTO_INCREMENT=1332 DEFAULT CHARSET=utf8;
Conexiones with 2370365 rows
CREATE TABLE `conexiones` (
`idConexion` int(10) unsigned NOT NULL AUTO_INCREMENT,
`idInstalacion` int(10) unsigned DEFAULT NULL,
`idUsuario` int(11) DEFAULT NULL,
`tMacAdres` varchar(64) DEFAULT NULL,
`tUsuario` varchar(128) DEFAULT NULL,
`tNombre` varchar(64) DEFAULT NULL,
`tApellido` varchar(64) DEFAULT NULL,
`tEmail` varchar(64) DEFAULT NULL,
`tSexo` varchar(20) DEFAULT NULL,
`fNacimiento` date DEFAULT NULL,
`nAmigos` int(11) DEFAULT NULL,
`tPoblacion` varchar(64) DEFAULT NULL,
`fFecha` datetime DEFAULT NULL,
`tEvento` varchar(20) DEFAULT NULL,
PRIMARY KEY (`idConexion`),
KEY `idInstalacion` (`idInstalacion`),
KEY `tMacAdress` (`tMacAdres`) USING BTREE,
KEY `fFecha` (`fFecha`),
KEY `idUsuario` (`idUsuario`),
KEY `insta_fecha` (`idInstalacion`,`fFecha`)
) ENGINE=InnoDB AUTO_INCREMENT=2370365 DEFAULT CHARSET=utf8;
This is EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE instalaciones const PRIMARY,idInstalacion PRIMARY 4 const 1
1 SIMPLE conexiones ref idInstalacion,fFecha,insta_fecha idInstalacion 5 const 110234 "Using where"
Thanks !
(Edited)
SHOW TABLE STATUS LIKE 'conexiones'
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
conexiones InnoDB 10 Compact 2305296 151 350060544 0 331661312 75497472 2433305 28/06/2016 22:26 NULL NULL utf8_general_ci NULL
Here's why it is so slow. And I will end with a possible speedup.
First, please do
SELECT COUNT(*) FROM conexiones
WHERE idInstalacion=190
and fFecha >= '2014-01-01'
and fFecha < '2016-06-19
in order to see how many rows we are dealing with. The EXPLAIN suggests 110234, but that is only a crude estimate.
Assuming there are 110K rows of conexiones involved in the query, and assuming the rows were (approximately) inserted in chronological order by fFecha, then...
There are a lot of rows to work with, and
They are scattered around the table on disk, hence
The query takes a lot of I/O, unless it is cached.
Let's further check on my last claim... How much RAM do you have? What is the value of innodb_buffer_pool_size? It should be about 70% of available RAM. Use a lower percentage if you have less than 4GB of RAM.
Assuming that conexiones is too big to be 'cached' in the 'buffer_pool', we need to find a way to decrease the I/O.
There are 1332 different values for idInstalacion. Perhaps you insert 1332 rows every few minutes/hours into conexiones? Since the PRIMARY KEY merely an AUTO_INCREMENT, those rows will be 'appended' to the end of the table.
Now let's look at where the idInstalacion=190 rows are. A new one of them occurs every 1332 (or so) rows. That means they are spread out. It means that (probably) no two rows are in the same block (16KB in InnoDB). That means that the 110234 will be in 110234 different blocks. That's about 2GB. If the buffer_pool is smaller than that, then there will be I/O. Even if it is bigger than that, that's a lot of data to touch.
But what to do about it? If we could arrange the =190 rows to be consecutive in the table, then the 2GB might drop to, say, 20MB -- a much more manageable and cacheable size. But how can that be done? By changing the PRIMARY KEY.
PRIMARY KEY(idInstalacion, fFecha, idConexion),
INDEX(idConexion)
and DROP any other indexes starting with idInstalacion or idConexion. To explain:
Since the PK is "clustered" with the data, all idInstalacion=190 rows over any consecutive fFetcha range will be consecutive in the data. So, fetching one block will get about 100 rows -- much less I/O.
A PK must be unique. Assuming (idInstalacion, fFecha) is not unique, I tacked on idConexion to make it unique.
I added INDEX(idConexion) to make AUTO_INCREMENT happy.
Potential drawback... Since this change rearranges the order of the data, other queries, including the INSERTs may be slowed down. The INSERTs will be scattered, but not really slowed down. 1332 "hots spots" would be accepting the new rows; that many blocks can easily be cached.
Arithmetic... If you have spinning drives, I would expect the existing structure to take about 1102 seconds (perhaps under 110 seconds for SSD) for 110234 rows. Since it is taking under 20 seconds, I suspect there is some caching (or you have SSDs) or the 110234 is grossly overestimated. My suggested change should decrease the "worst" time significantly, and slightly improve the "in memory" time. This "slight improvement" comes from being able to use the PK instead of a secondary key.
Caveat: Since 110234 * 1332 is nowhere near 2370365, much of my numerical analysis is probably nowhere near correct. For example, 2370365 rows with that schema is possible less than 1GB. Please provide SHOW TABLE STATUS LIKE 'conexiones'.
Addenda
"server has 2GB Ram and innodb_buffer_pool_size is 5368709120" -- Either that is a typo or it is terrible. Since the buffer_pool needs to reside in RAM, do not set the buffer_pool to 5GB. 500MB might be OK for your tiny 2GB of RAM.
The SHOW TABLE STATUS confirms that it (data + indexes) won't quite fit in 500M, so you may periodically experience I/O bound queries with 500M.
Increasing your RAM and buffer_pool would temporarily (until the data gets bigger) help performance.
Before putting this into production, test the ALTER and time the various queries you use:
ALTER TABLE conexiones
DROP PRIMARY KEY,
DROP INDEX insta_fecha,
DROP INDEX idInstalacion,
PRIMARY KEY(idInstalacion, fFecha, idConexion),
INDEX(idConexion)
Caution: The ALTER will need about 1GB of free disk space.
When timing, run with the Query Cache off, and run twice -- the first may involve I/O; the second is the 'in memory' as you mentioned.
Revised analysis: Since the bigger table has 300MB of data and some amount of indexes in use, and assuming 500MB buffer pool, I suspect that blocks are bumped out of the buffer pool some of the time. This fits well with your initial comment on the query's speed. My suggested index changes should help avoid the speed variance, but may hurt the performance of other queries.
Try to use a multi column index:
CREATE idx_nn_1 ON conexiones(idInstalacion,fFecha);
You might need to have it the other way around depending on the data, so test both. This avoids reading all the records for between condition on fFecha matching the idInstalacion condition, and should improve performance.
Try the following:
Either delete the idInstalacion INDEX or tell the engine to use the correct key in the from clause:
from conexiones use index (insta_fecha), instalaciones
And you don't need to JOIN, GROUP or ORDER. You are joining on a constant value (190) with one row. And you don't use any column from instalaciones.
So all you need is this:
select count(distinct(concat(conexiones.tMacAdres,date_format(conexiones.fFecha,'%Y%m%d')))) as Conexiones,
sum(if(conexiones.tEvento='megusta',1,0)) as MeGusta,sum(if(conexiones.tEvento='megusta',conexiones.nAmigos,0)) as ImpactosMeGusta,
sum(if(conexiones.tEvento='checkin',1,0)) as CheckIn,sum(if(conexiones.tEvento='checkin',conexiones.nAmigos,0)) as ImpactosCheckIn,
min(conexiones.fFecha) Fecha_Inicio, now() Fecha_fin,datediff(now(),min(conexiones.fFecha)) as dias
from conexiones -- use index (insta_fecha)
where conexiones.idInstalacion=190
and (fFecha between '2014-01-01 00:00:00' and '2016-06-18 23:59:59')
However - it doesn't mean it will be faster. MySQL will probably optimize all that stuff away.
Related
I have 2 MySQL UPDATE Query problem on my website.
Problem 1
I run a content site that updates page views for posts when users read.
Each time I send push notifications, my server times out; when I comment on the Update query that increments the page views, everything returns to normal.
This I think maybe as a result of hundreds of UPDATE query trying to update the views on the same row.
**The query that updated the tablename**
update table set views='$newview' where id=1
Query Explain
id: 1
select_type: SIMPLE
table: new_jobs
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 1
Extra: Using where
**tablename create table**
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`company_id` int(11) DEFAULT NULL,
`job_title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
`advert_date` date DEFAULT NULL,
`expiry_date` date DEFAULT NULL,
`no_deadline` int(1) DEFAULT 0,
`source` varchar(20) DEFAULT NULL,
`featured` int(1) DEFAULT 0,
`views` int(11) DEFAULT 1,
`email_status` int(1) DEFAULT 0,
`draft` int(1) DEFAULT 0,
`created_by` int(11) DEFAULT NULL,
`show_company_name` int(1) DEFAULT 1,
`display_application_method` int(1) DEFAULT 0,
`status` int(1) DEFAULT 1,
`upload_date` datetime DEFAULT NULL,
`country` int(1) DEFAULT 1,
`followers_email_status` int(1) DEFAULT 0,
`og_img` varchar(255) DEFAULT NULL,
`old_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `new_jobs_company_id_index` (`company_id`),
KEY `new_jobs_views_index` (`views`),
KEY `new_jobs_draft_index` (`draft`),
KEY `new_jobs_country_index` (`country`)
) ENGINE=InnoDB AUTO_INCREMENT=151359 DEFAULT CHARSET=utf8
What is the best way of handling this?
[Scenario 2 removed on request]
Scenario 1. I would expect the update of a 'view' count (or 'click' or 'like' or whatever) to be more like
UPDATE t SET views = views + 1 WHERE id = 123;
I assume you have an index (probably the PRIMARY KEY) on id?
Since there are other things going on with that table, it may be wise to split off the rapidly incrementing counter into a separate table. This would avoid interfering with other queries. You can get other data, plus the counter, by using JOIN .. USING(id).
Scenario 2 does not make sense. It seems to keep the latest date for each email, but what does country mean? Since it seems like more than just a counter, you might want a separate table to log those 3 columns.
Please provide SHOW CREATE TABLE.
There are many things that novices perceive as a "crash". Please describe further -- out of connections, out of disk space, sluggishness, the client gave error message, other operations taking too long, etc. Each has a different remedy.
Query
Are you are currently logically doing
BEGIN;
$ct = SELECT views ... FOR UPDATE;
...
UPDATE ... SET views = $ct+1 WHERE ...;
COMMIT;
If so, that is much less efficient than
(with autocommit = ON)
UPDATE ... SET views = views+1 ...;
Note that the first version hangs onto the row longer. If you fail to use FOR UPDATE, you will drop some counts.
Splitting into a separate table sort of forces you to run the UPDATE as its own transaction.
Other
innodb_flush_log_at_trx_commit:
Default is 1, which is secure, but leads to at least one IOPs for each transaction.
2 leads to a flush once a second. During intense times, this is much more efficient. But a crash could lose up to one second's worth of updates. the inaccuracy of "view count" due to a rare crash is, in my opinion, acceptable.
KEY(views) needs to be updated every time views is changed. But, thanks to the "change buffer", this is unlikely to involve any extra I/O, at least now while you are doing the UPDATE.
INT(1) takes 4 bytes; the (1) has no meaning. Suggest changing to TINYINT (1 byte), thereby saving about 27 bytes per row. (7 columns plus 2 indexes)
country INT(1) -- Is it a flag? What is the meaning? Is it normalized to another table? Using 4 bytes for an id and an extra table when standard abbreviations ('US', 'UK', 'RU', 'IN', etc) would take 2 bytes? Suggest country CHAR(2) CHARACTER SET ascii COLLATE ascii_general_ci.
Indexing flags rarely benefits. Let's see the queries where you think such indexes might be used. And the EXPLAIN SELECT ... for them.
I am using Mysql 5.6 with ~150 million records in Transaction table (InnodB). As the size is increasing this table is becoming unmanageable (adding column or index) and slow even with required indexing. After searching through internet I found it is appropriate time to partition the table. I am confidant that partitioning will solve following purpose for me
Improve DML statements response time (using partitioning pruning)
Improve archival process
But I am not sure wether (and how) it will improve DDL performance for this table or not. More specifically following DDL's performance.
ALTER TABLE ADD/DROP COLUMN
ALTER TABLE ADD/DROP INDEX
I went through Mysql documentation and internet but unable to find my answer. Can anyone please help me in this or provide any relevant documentation for this.
My table structure is as following
CREATE TABLE `TRANSACTION` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`parent_uuid` char(36) DEFAULT NULL,
`order_number` varchar(64) DEFAULT NULL,
`order_id` int(11) DEFAULT NULL,
`order_uuid` char(36) DEFAULT NULL,
`order_type` char(1) DEFAULT NULL,
`business_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`store_device_id` int(11) DEFAULT NULL,
`source` char(1) DEFAULT NULL COMMENT 'instore, online, order_ahead, etc',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`customer_lang` char(2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `business_id` (`business_id`,`store_id`,`store_device_id`),
KEY `parent_uuid` (`parent_uuid`),
KEY `order_uuid` (`order_uuid`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
And I am partitioning using following statement.
ALTER TABLE TRANSACTION PARTITION BY RANGE (id)
(PARTITION p0 VALUES LESS THAN (5000000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (10000000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
Thanks!
Partitioning is not a performance panacea. Even the items you mentioned will not speed up; they may even slow down.
Instead, I will critique the table to look for ways to speed up some things.
UUIDs are terrible for performance once the index on it becomes too big to be cached. This is because of its randomness. Possible solutions: compact it into BINARY(16); shrink the table other ways; avoid UUIDs.
Why have both parent_id and parent_uuid??
Shrink the 4-byte INTs to smaller datatypes where practical.
Usually CHAR should be CHARACTER SET ascii (1-byte/character), not utf8mb4 (4 bytes/char).
Caution: 150M is getting remotely close to the 2-billion limit of INT SIGNED. Consider 4B limit of INT UNSIGNED. (Each is 4 bytes.)
Do you ever use created_at or updated_at?
MySQL 8.0.13 has a very fast ADD COLUMN and DROP COLUMN (for limited situations).
5.7.?? has a less-invasive ADD INDEX than previous versions, but I am not sure it applies to partitioned tables.
5.7.4: Online DDL support reduces table rebuild time and permits concurrent DML, which helps reduce user application downtime. For additional information, see Overview of Online DDL.
More importantly, let's see the main queries that are "too slow". There may be composite indexes and/or reformulations of the queries that will speed them up.
There is even a slim chance that partitioning will help but not on the PRIMARY KEY.
I think there are only 4 use cases where partitioning helps performance.
I have to tables with 65.5 Million rows:
1)
CREATE TABLE RawData1 (
cdasite varchar(45) COLLATE utf8_unicode_ci NOT NULL,
id int(20) NOT NULL DEFAULT '0',
timedate datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
type int(11) NOT NULL DEFAULT '0',
status int(11) NOT NULL DEFAULT '0',
branch_id int(20) DEFAULT NULL,
branch_idString varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (id,cdasite,timedate),
KEY idx_timedate (timedate,cdasite)
) ENGINE=InnoDB;
2)
Same table with partition (call it RawData2)
PARTITION BY RANGE ( TO_DAYS(timedate))
(PARTITION p20140101 VALUES LESS THAN (735599) ENGINE = InnoDB,
PARTITION p20140401 VALUES LESS THAN (735689) ENGINE = InnoDB,
.
.
PARTITION p20201001 VALUES LESS THAN (738064) ENGINE = InnoDB,
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
I'm using the same query:
SELECT count(id) FROM RawData1
where timedate BETWEEN DATE_FORMAT(date_sub(now(),INTERVAL 2 YEAR),'%Y-%m-01') AND now();
2 problems:
1. why the partitioned table runs longer then the regular table?
2. the regular table returns 36380217 in 17.094 Sec. is it normal, all R&D leaders think it is not fast enough, it need to return in ~2 Sec.
What do I need to check / do / change ?
Is it realistic to scan 35732495 rows and retrieve 36380217 in less then 3-4 sec?
You have found one example of why PARTITIONing is not a performance panacea.
Where does id come from?
How many different values are there for cdasite? If thousands, not millions, build a table mapping cdasite <=> id and switch from a bulky VARCHAR(45) to a MEDIUMINT UNSIGNED (or whatever is appropriate). This item may help the most, but perhaps not enough.
Ditto for status, but probably using TINYINT UNSIGNED. Or think about ENUM. Either is 1 byte, not 4.
The (20) on INT(20) means nothing. You get a 4-byte integer with a limit of about 2 billion.
Are you sure there are no duplicate timedates?
branch_id and branch_idString -- this smells like a pair that needs to be in another table, leaving only the id here?
Smaller -> faster.
COUNT(*) is the same as COUNT(id) since id is NOT NULL.
Do not include future partitions before they are needed; it slows things down. (And don't use partitioning at all.)
To get that query even faster, build and maintain a Summary Table. It would have at least a DATE in the PRIMARY KEY and at least COUNT(*) as a column. Then the query would fetch from that table. More on Summary tables: http://mysql.rjweb.org/doc.php/summarytables
I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table was growing, it was getting slower to read and write the data. One SELECT query used to take about 25 seconds, then I added a unique index
UNIQUE INDEX idx_userInsDate ( userID,instrumentID,utcDateTime)
This reduced the read time from 25 seconds to some milliseconds but it has increased the insert time as it has to update the index for each record.
Also If I run a SELECT query from multiple threads as the same time the queries take too long to return the data.
This is an example query
Select dateTime from sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime between 'startDate' AND 'endDate' order by dateTime asc;
Can someone help me, to improve the table schema or add an effective index to improve the performance, please.
Thank you in advance
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id) !
Is id referenced by any other tables? If not, then get rid of it all together. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME. Doing that, you can convert to DATETIME if needed, thereby eliminating one of the columns.
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
Let's see your main queries; there may be other issues to address.
It's not the indexes at fault here. It's your data types. As the size of the data on disk grows, the speed of all operations decrease. Indexes can certainly help speed up selects - provided your data is properly structured - but it appears that it isnt
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any reduncy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
1st of all: Avoid varchars for indexes and especially IDs. Each character position in the varchar generates an own index-entry internally!
2nd: Your select uses dateTime, your index is set to utcDateTime. It will only take userID and instrumentID and ignore the utcDateTime-Part.
Advise: Change your data types for the ids and change your index to match the query (dateTime, not utcDateTime)
Using an index decreases your performance on inserts, unluckily, there is nothing such as a fill factor for indexes in mysql right now. So the best thing you can do is try the indexes to be as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At a given time, build the indexes and swap the tables (may require a third table for the index creation while leaving the other ones untouched in between).
I have a giant mysql table which is growing at all the time. It's recording chat data.
this what my table looks like
CREATE TABLE `log` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`channel` VARCHAR(26) NOT NULL,
`timestamp` DATETIME NOT NULL,
`username` VARCHAR(25) NOT NULL,
`message` TEXT NOT NULL,
PRIMARY KEY (`id`),
INDEX `username` (`username`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2582573
;
Indexing the username is kinda important because queries for a username can take like 5 seconds otherwise.
Is there anyway of optimizing this table even more to prepare it for huge amounts of data.
So that even 100m rows won't be a problem.
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
Will you have more than 4 billion rows? If not, use INT UNSIGNED, saving 4 bytes per row. Plus another 4 bytes for each row in the secondary index.
`channel` VARCHAR(26) NOT NULL,
`username` VARCHAR(25) NOT NULL,
Normalize each -- that is, replace this by, say, a SMALLINT UNSIGNED and have a mapping between them. Savings: lots.
INDEX `username` (`username`)
That becomes user_id, saving even more.
Smaller --> more cacheable --> faster.
What other queries will you have?
"Memory usage" -- For InnoDB, set innodb_buffer_pool_size to about 70% of available RAM. Then, let it worry about what is in memory, what it not. Once the table is too big to be cached, you should shrink the data (as I mentioned above) and provide 'good' indexes (as mentioned in other comments) and perhaps structure the table for "locality of reference" (without knowing all the queries, I can't address this).
You grumbled about using IDs instead of strings... Let's take a closer look at that. How many distinct usernames are there? channels? How does the data come in -- do you get one row at a time, or batches? Is something doing direct INSERTs or feeding to some code that does the INSERTs? Could there be a STORED PROCEDURE to do the normalization and insertion? If you need hundreds of rows inserted per second, then I can discuss how to do both, and do them efficiently.
You did not ask about PARTITIONs. I do not recommend it for a simple username query.
2.5M rows is about the 85th percentile. 100M rows is more exciting -- 98th percentile.