We want to map the entries of calibration_data to the calibration table with the following query, but in my opinion its runtime is far too long (>24h).
Is there any optimization possible?
For testing, we added more indexes than are needed right now, but that had no impact on the duration.
[Edit]
The hardware shouldn't be the biggest bottleneck
128 GB RAM
1TB SSD RAID 5
32 cores
EXPLAIN result
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
| 1 | SIMPLE | cal | NULL | ALL | NULL | NULL | NULL | NULL | 2009 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | m | NULL | ALL | visit | NULL | NULL | NULL | 3082466 | 100.00 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
Query which takes too long:
Insert into knn_data (SELECT cal.X AS X,
cal.Y AS Y,
cal.BeginTime AS BeginTime,
cal.EndTime AS EndTime,
avg(m.dbm_ant) AS avg_dbm_ant,
m.ant_id AS ant_id,
avg(m.location) avg_location,
count(*) AS count,
m.visit
FROM calibration cal
LEFT join calibration_data m
ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
GROUP BY cal.X,
cal.Y,
cal.BeginTime,
cal.BeaconId,
m.ant_id,
m.macHash,
m.visit;
Table knn_data:
CREATE TABLE `knn_data` (
`X` int(11) NOT NULL,
`Y` int(11) NOT NULL,
`BeginTime` datetime NOT NULL,
`EndTIme` datetime NOT NULL,
`avg_dbm_ant` float DEFAULT NULL,
`ant_id` int(11) NOT NULL,
`avg_location` float DEFAULT NULL,
`count` int(11) DEFAULT NULL,
`visit` datetime NOT NULL,
PRIMARY KEY (`ant_id`,`visit`,`X`,`Y`,`BeginTime`,`EndTIme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Table calibration
BeaconId, X, Y, BeginTime, EndTime
41791, 1698, 3944, 2016-11-12 22:44:00, 2016-11-12 22:49:00
CREATE TABLE `calibration` (
`BeaconId` int(11) DEFAULT NULL,
`X` int(11) DEFAULT NULL,
`Y` int(11) DEFAULT NULL,
`BeginTime` datetime DEFAULT NULL,
`EndTime` datetime DEFAULT NULL,
KEY `x,y` (`X`,`Y`),
KEY `x` (`X`),
KEY `y` (`Y`),
KEY `BID` (`BeaconId`),
KEY `beginTime` (`BeginTime`),
KEY `x,y,beg,bid` (`X`,`Y`,`BeginTime`,`BeaconId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Table calibration_data
macHash, visit, dbm_ant, ant_id, mac, isRand, posX, posY, sources, ip, dayOfMonth, location, am, ar
'f5:dc:7d:73:2d:e9', '2016-11-12 22:44:00', '-87', '381', 'f5:dc:7d:73:2d:e9', NULL, NULL, NULL, NULL, NULL, '12', '18.077636300207715', 'inradius_41791', NULL
CREATE TABLE `calibration_data` (
`macHash` varchar(100) COLLATE utf8_bin NOT NULL,
`visit` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dbm_ant` int(3) NOT NULL,
`ant_id` int(11) NOT NULL,
`mac` char(17) COLLATE utf8_bin DEFAULT NULL,
`isRand` tinyint(4) DEFAULT NULL,
`posX` double DEFAULT NULL,
`posY` double DEFAULT NULL,
`sources` int(2) DEFAULT NULL,
`ip` int(10) unsigned DEFAULT NULL,
`dayOfMonth` int(11) DEFAULT NULL,
`location` varchar(80) COLLATE utf8_bin DEFAULT NULL,
`am` varchar(300) COLLATE utf8_bin DEFAULT NULL,
`ar` varchar(300) COLLATE utf8_bin DEFAULT NULL,
KEY `visit` (`visit`),
KEY `macHash` (`macHash`),
KEY `ant, time` (`dbm_ant`,`visit`),
KEY `beacon` (`am`),
KEY `ant_id` (`ant_id`),
KEY `ant,mH,visit` (`ant_id`,`macHash`,`visit`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
One-time task? Then the runtime may not matter. After getting this data loaded, will you incrementally update the "summary table" each day?
Shrink datatypes -- bulky data takes longer to process. Example: a 4-byte INT DayOfMonth could be a 1-byte TINYINT UNSIGNED.
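For example, since dayOfMonth can only hold 1-31, that shrink is a one-liner:
ALTER TABLE calibration_data MODIFY dayOfMonth TINYINT UNSIGNED DEFAULT NULL;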
You are moving a TIMESTAMP into a DATETIME. This may or may not work as you expect.
INT UNSIGNED is OK for IPv4, but you can't fit IPv6 in it.
COUNT(*) probably does not need a 4-byte INT; see the smaller variants.
Use UNSIGNED where appropriate.
A MAC address takes 19 bytes the way you have it; it could easily be converted to/from a 6-byte BINARY(6). See REPLACE(), UNHEX(), HEX(), etc.
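A minimal sketch of that conversion; the mac_bin BINARY(6) column is my own invention:
-- add a compact binary column (hypothetical name) and pack the colon form into it
ALTER TABLE calibration_data ADD COLUMN mac_bin BINARY(6);
UPDATE calibration_data SET mac_bin = UNHEX(REPLACE(mac, ':', ''));
-- unpack back to the familiar colon-separated form
SELECT LOWER(CONCAT_WS(':',
       SUBSTR(HEX(mac_bin), 1, 2), SUBSTR(HEX(mac_bin), 3, 2),
       SUBSTR(HEX(mac_bin), 5, 2), SUBSTR(HEX(mac_bin), 7, 2),
       SUBSTR(HEX(mac_bin), 9, 2), SUBSTR(HEX(mac_bin), 11, 2)))
FROM calibration_data;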
What is the setting of innodb_buffer_pool_size? It could be about 100G for the big RAM you have.
Do the time ranges overlap? If not, take advantage of that. Also, don't include unnecessary columns in the PRIMARY KEY, such as EndTime.
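A sketch of trimming the PRIMARY KEY, assuming the remaining columns still uniquely identify a row (verify that before running it):
ALTER TABLE knn_data
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (ant_id, visit, X, Y, BeginTime);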
Have the GROUP BY columns in the same order as the PRIMARY KEY of knn_data; this will avoid a lot of block splits during the INSERT.
The big problem is that there is no useful index in calibration_data, so the JOIN has to do a full table scan again and again: an estimated 2K scans of 3M rows! Let me focus on that problem...
There is no good way to do WHERE x BETWEEN start AND end because MySQL does not know whether the datetime ranges overlap. There is no real cure for that in this context, so let me approach it differently...
Are start and end 'regular'? Like every hour? If so, we can do some sort of computation instead of the BETWEEN. Let me know if this is the case; I will continue my thoughts.
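Purely as an illustration of that idea, and only valid if every window were exactly 5 minutes long and started on a 5-minute clock grid (the sample row above suggests yours are not grid-aligned, so verify first), the range join could collapse into an equality join on a computed slot start:
SELECT cal.X, cal.Y, cal.BeginTime, AVG(m.dbm_ant)
FROM calibration_data m
JOIN calibration cal
  ON cal.BeginTime = m.visit
     - INTERVAL (MINUTE(m.visit) % 5) MINUTE   -- snap visit back to the slot start
     - INTERVAL SECOND(m.visit) SECOND
GROUP BY cal.X, cal.Y, cal.BeginTime;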
That's a nasty and classical problem with "range" queries: the optimizer doesn't use your indexes and ends up doing a full table scan. In your EXPLAIN plan you can see this in the column type=ALL.
Ideally you should have type=range and something in the key column.
Some ideas:
I doubt that changing your join from
ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
to
ON m.visit >= cal.BeginTime AND m.visit <= cal.EndTime
will work, but still give it a try.
Do trigger an ANALYZE TABLE on both tables. This will update the stats on your tables and might help the optimizer make the right decision (i.e. use the indexes).
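In MySQL syntax that is:
ANALYZE TABLE calibration, calibration_data;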
Changing the query to the following might also help push the optimizer toward using the indexes:
Insert into knn_data (SELECT cal.X AS X,
cal.Y AS Y,
cal.BeginTime AS BeginTime,
cal.EndTime AS EndTime,
avg(m.dbm_ant) AS avg_dbm_ant,
m.ant_id AS ant_id,
avg(m.location) avg_location,
count(*) AS count,
m.visit
FROM calibration cal
LEFT join calibration_data m
ON m.visit >= cal.BeginTime
WHERE m.visit <= cal.EndTime
GROUP BY cal.X,
cal.Y,
cal.BeginTime,
cal.BeaconId,
m.ant_id,
m.macHash,
m.visit;
That's all I can think of...
Related
I've gone over this many times but I couldn't find a way to make it faster. I have a table with about 4 million records and I want to grab rows from a specific date range (which would only yield about 10000 results). My query takes 10 seconds to execute... why!?
SELECT *
FROM banjo_live.actions_activity
where userid IN (102,164,94,140)
AND actionsid=4
AND (actions_activity_timestamp between '2021-06-01 00:00:00'
AND '2021-06-31 23:23:23')
AND new_statusid NOT IN (10,13)
LIMIT 0, 50000
Surely this shouldn't take 10 seconds. What could be the issue?
Thanks
My table;
DROP TABLE IF EXISTS `actions_activity`;
CREATE TABLE `actions_activity` (
`actions_activity_id` int(11) NOT NULL AUTO_INCREMENT,
`orderid` int(11) NOT NULL,
`barcodeid` int(11) NOT NULL,
`skuid` int(11) NOT NULL,
`sku_code` varchar(50) CHARACTER SET latin1 COLLATE latin1_swedish_ci NULL DEFAULT NULL,
`actionsid` int(11) NOT NULL,
`action_note` text CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`starting_count` int(11) NOT NULL,
`new_count` int(11) NOT NULL,
`old_statusid` int(11) NOT NULL COMMENT 'Old Status',
`new_statusid` int(11) NOT NULL COMMENT 'New Status',
`userid` int(11) NOT NULL COMMENT 'Handled By',
`actions_activity_timestamp` timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP(0),
`actions_activity_created_at` timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sessionid` int(11) NULL DEFAULT NULL,
PRIMARY KEY (`actions_activity_id`) USING BTREE,
INDEX `FetchingIndex`(`barcodeid`) USING BTREE,
INDEX `skuindex`(`skuid`) USING BTREE,
INDEX `searchbysession`(`sessionid`) USING BTREE,
FULLTEXT INDEX `sku_code`(`sku_code`)
) ENGINE = InnoDB AUTO_INCREMENT = 4336767 CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
23:23:23 ?? -- Gordon's rewrite avoids typos like this. Or, I prefer this:
actions_activity_timestamp >= '2021-06-01' AND
actions_activity_timestamp < '2021-06-01' + INTERVAL 1 MONTH
Add a 2-column index where the second column is whichever of the other things in the WHERE is most selective:
INDEX(actionsid, ...)
Once you add an ORDER BY (cf, The Impaler), there may be a better index.
Are you really expecting 10K rows of output? That will choke most clients. Maybe there is some processing you could have SQL do so the output won't be as bulky?
First, I assume you intend:
SELECT *
FROM banjo_live.actions_activity
WHERE userid IN (102,164,94,140) AND
actionsid = 4 AND
actions_activity_timestamp >= '2021-06-01' AND
actions_activity_timestamp < '2021-07-01' AND
new_statusid NOT IN (10, 13)
LIMIT 0, 50000;
You want a composite index. Without knowing the sizes of the fields, I would suggest an index on (actionsid, userid, actions_activity_timestamp, new_statusid).
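A sketch of creating that index (the index name is my own):
ALTER TABLE actions_activity
  ADD INDEX idx_action_user_ts (actionsid, userid, actions_activity_timestamp, new_statusid);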
For self-education I am developing an invoicing system for an electricity company. I have multiple time series tables with different intervals. One table represents consumption, two others represent prices, and a third price table still has to be incorporated. Now I am running calculation queries, but they are slow. I would like to improve the query speed, especially since these are only the initial calculations and the queries will only become more complicated. Also, please note that this is the first database I have created and these are my first exercises with it. A simplified explanation is preferred. Thanks for any help provided.
I have indexed DATE, PERIOD_FROM, and PERIOD_UNTIL in each table. This sped up the process from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin1
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, sum(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
and i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
and i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
group by i.DATE
I have run the query with EXPLAIN and got the following result:
+----+-------------+-------+------------+---------------+------+---------+--------------------------------------------+-------+----------+----------------------------------------------------+
| id | select_type | table | partitions | possible_keys | key  | key_len | ref                                        | rows  | filtered | Extra                                              |
+----+-------------+-------+------------+---------------+------+---------+--------------------------------------------+-------+----------+----------------------------------------------------+
|  1 | SIMPLE      | a     | NULL       | NULL          | NULL | NULL    | NULL                                       | 28727 |   100.00 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | p     | NULL       | NULL          | NULL | NULL    | NULL                                       | 61038 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | i     | NULL       | DATE          | DATE | 8       | timeseries.a.DATE,timeseries.p.PERIOD_FROM |     1 |   100.00 | NULL                                               |
+----+-------------+-------+------------+---------------+------+---------+--------------------------------------------+-------+----------+----------------------------------------------------+
Preferably I would run a more complicated query for a whole year, grouped by month for example, with all price tables incorporated. However, that would be too slow. I have indexed DATE, PERIOD_FROM, and PERIOD_UNTIL in each table. The calculation result must not change: in this case it is the quarter-hourly consumption of two meters multiplied by hourly prices.
"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real-life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.
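For instance, composite indexes matching the join conditions in the query above might look like this (a sketch; the index names are mine, and imbalanceprice already has an equivalent KEY `DATE`):
ALTER TABLE apxprice  ADD INDEX idx_date_period (DATE, PERIOD_FROM, PERIOD_UNTIL);
ALTER TABLE powerload ADD INDEX idx_date_from   (DATE, PERIOD_FROM);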
I have a monitoring system where my customers can register their terminals; each terminal sends a periodic (5 min) keepalive signal to my website to report that it is online. Customers can also access a monitoring page that shows all their terminals and updates their status via Ajax at a 20-second interval.
Additional information: a terminal is an Android device; customers have to install an app from Google Play.
THE PROBLEM IS:
With a growing number of customers, many people access the monitoring page at the same time, which almost floods the server with requests; on the other side, more and more terminals come online and flood it further with their keepalive signals. So besides the common pages (login, many CRUDs, etc.) I have dozens of physical terminals sending keepalive signals over the internet into my database, and many users accessing the monitoring pages to check that their terminals are online. It feels like a time bomb, because I don't know whether MySQL will cope when the number of terminals reaches hundreds and keeps growing.
PLUS, we have already noticed that the server's performance degrades the longer it runs. After a restart it is very fast, but over time it loses performance.
SOLUTION
What can I do to improve performance or make the model more scalable? Is there a design pattern for this kind of monitoring system that scales better?
Is there any gain if I split this into two MySQL databases, one for common use (access pages, CRUDs, etc.) and another for the monitoring system?
Is there any gain in using MongoDB just for the monitoring part of the system?
additional information:
mysql Ver 14.14 Distrib 5.5.43, for Linux (x86_64) using readline 5.1
PHP 5.4.40 (cli) (built: Apr 15 2015 15:55:28)
Jetty 8.1.14 (for java server side that comunicates with android app)
Server Mon
Free memory ........: 17.84 Gb
Total memory........: 20 Gb
Used memory.........: 2.16 Gb
RAM.................: 20 Kb
JVM Free memory.....: 1.56 Gb
JVM Maximum memory..: 3.93 Gb
JVM Total available.: 1.93 Gb
**************************************
Total (cores).: 10
CPU idle......: 4.9%
CPU nice......: 0.0%
CPU system....: 4183000.0%
CPU total.....: 5.0%
CPU user......: 2.6%
**************************************
Total space (bytes)..: 600 Gb
Free space (bytes)...: 595.64 Gb
Usable space (bytes).: 595.64 Gb
PART OF MODEL AND MONITORING PAGE'S QUERY
This is the terminals table
CREATE TABLE IF NOT EXISTS `GM_PLAYER` (
`ID_PLAYER` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`DS_GCM_ID` VARCHAR(250) NULL,
`DT_CRIACAO` DATETIME NOT NULL,
`DS_PLAYER` VARCHAR(100) NOT NULL,
`DS_JANELA_HEIGHT` INT(11) NOT NULL DEFAULT '1024',
`DS_JANELA_WIDTH` INT(11) NOT NULL DEFAULT '768',
`DS_JANELA_POS_X` INT(11) NOT NULL DEFAULT '0',
`DS_JANELA_POS_Y` INT(11) NOT NULL DEFAULT '0',
`DS_WALLPAPER` VARCHAR(255) NULL DEFAULT NULL,
`FL_ATIVO` CHAR(1) NOT NULL DEFAULT 'N',
`FL_FULL_SCREEN` CHAR(1) NOT NULL DEFAULT 'S',
`FL_MOUSE_VISIBLE` CHAR(1) NOT NULL DEFAULT 'N',
`DS_SERIAL` VARCHAR(50) NULL DEFAULT NULL,
`VERSAO_APP` VARCHAR(20) NULL DEFAULT NULL,
`VERSAO_OS` VARCHAR(20) NULL DEFAULT NULL,
`FL_EXIBIR_STATUS_BAR` CHAR(1) NOT NULL DEFAULT 'S',
`ID_GRADE_PROGRAMACAO` BIGINT UNSIGNED NULL DEFAULT NULL,
`ID_CLIENTE` BIGINT UNSIGNED NULL,
`ID_PONTO` BIGINT UNSIGNED NULL,
`FL_ATIVO_SISTEMA` CHAR(1) NOT NULL DEFAULT 'S',
`FL_DEBUG` CHAR(1) NOT NULL DEFAULT 'N',
`VERSAO_APP_UPDATE` VARCHAR(20) NULL,
`FL_ESTADO_MONITOR` CHAR(1) NOT NULL DEFAULT 'L',
`FL_DEVICE_ROOTED` CHAR(1) DEFAULT 'N',
`DT_ATIVACAO` DATETIME ,
`DT_EXPIRA` DATETIME ,
`FL_EXCLUIDO` CHAR(1) DEFAULT 'N' ,
`ID_USUARIO` BIGINT UNSIGNED NOT NULL,
`ID_PACOTE` BIGINT UNSIGNED ,
`DS_IMG_BARRA` VARCHAR(255),
`FL_EXIBIR_HORA` CHAR(1),
`DS_TEXTO_BARRA` TEXT,
PRIMARY KEY (`ID_PLAYER`),
UNIQUE INDEX `UQ_GM_PLAYER_ID_PLAYER` (`ID_PLAYER` ASC),
INDEX `ID_GRADE_PROGRAMACAO` (`ID_GRADE_PROGRAMACAO` ASC),
INDEX `FK_GM_PLAYER_GM_CLIENTE_idx` (`ID_CLIENTE` ASC),
CONSTRAINT `FK_GM_PLAYER_GM_USUARIO` FOREIGN KEY (`ID_USUARIO`) REFERENCES `GM_USUARIO` (`ID_USUARIO`) ON DELETE RESTRICT,
CONSTRAINT `FK_GM_PLAYER_GM_GRADE_PROGRAMACAO` FOREIGN KEY (`ID_GRADE_PROGRAMACAO`) REFERENCES `GM_GRADE_PROGRAMACAO` (`ID_GRADE_PROGRAMACAO`) ON DELETE RESTRICT,
CONSTRAINT `FK_GM_PLAYER_GM_CLIENTE` FOREIGN KEY (`ID_CLIENTE`) REFERENCES `GM_CLIENTE` (`ID_CLIENTE`) ON DELETE RESTRICT
)
ENGINE = InnoDB
AUTO_INCREMENT = 5
DEFAULT CHARACTER SET = latin1;
Other tables used
CREATE TABLE IF NOT EXISTS `GM_CLIENTE` (
`ID_CLIENTE` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`DT_CRIACAO` DATETIME NOT NULL,
`DS_CLIENTE` VARCHAR(50) NOT NULL,
`FL_ATIVO` ENUM('S','N') NULL DEFAULT 'S',
`ID_CONTATO` BIGINT UNSIGNED NOT NULL,
`ID_ENDERECO` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`ID_CLIENTE`),
UNIQUE INDEX `UQ_Cliente_ID_CLIENTE` (`ID_CLIENTE` ASC),
INDEX `fk_GM_CLIENTE_GM_CONTATO1_idx` (`ID_CONTATO` ASC),
INDEX `fk_GM_CLIENTE_GM_ENDERECO1_idx` (`ID_ENDERECO` ASC),
CONSTRAINT `fk_GM_CLIENTE_GM_CONTATO1`
FOREIGN KEY (`ID_CONTATO`)
REFERENCES `GM_CONTATO` (`ID_CONTATO`)
ON DELETE RESTRICT,
CONSTRAINT `fk_GM_CLIENTE_GM_ENDERECO1`
FOREIGN KEY (`ID_ENDERECO`)
REFERENCES `GM_ENDERECO` (`ID_ENDERECO`)
ON DELETE RESTRICT)
ENGINE = InnoDB
AUTO_INCREMENT = 2
DEFAULT CHARACTER SET = latin1;
CREATE TABLE GM_USUARIO_CLIENTE (
ID_USUARIO_CLIENTE INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
ID_CLIENTE BIGINT UNSIGNED ,
ID_USUARIO BIGINT UNSIGNED
);
This is the table I update every time I receive a new terminal keepalive signal
CREATE TABLE IF NOT EXISTS `GM_LOG_PLAYER` (
`id_log_player` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`dt_criacao` DATETIME NOT NULL,
`id_player` BIGINT UNSIGNED NULL,
`qtd_midias_exibidas` INT(11) NULL,
`id_ultima_midia_exibida` BIGINT UNSIGNED NULL,
`up_time_android` bigint(20) unsigned default '0',
`up_time_app` bigint(20) unsigned default '0',
`mem_utilizada` BIGINT(20) NULL,
`mem_disponivel` BIGINT(20) NULL,
`hd_disponivel` BIGINT(20) NULL,
`hd_utilizado` BIGINT(20) NULL,
PRIMARY KEY (`id_log_player`),
UNIQUE INDEX `UQ_id_log_player` (`id_log_player` ASC),
INDEX `FK_GM_LOG_PLAYER_GM_PLAYER_idx` (`id_player` ASC),
INDEX `FK_GM_LOG_PLAYER_GM_MIDIA_idx` (`id_ultima_midia_exibida` ASC),
CONSTRAINT `FK_GM_LOG_PLAYER_GM_PLAYER`
FOREIGN KEY (`id_player`)
REFERENCES `GM_PLAYER` (`ID_PLAYER`)
ON DELETE CASCADE,
CONSTRAINT `FK_GM_LOG_PLAYER_GM_MIDIA`
FOREIGN KEY (`id_ultima_midia_exibida`)
REFERENCES `GM_MIDIA` (`ID_MIDIA`))
ENGINE = InnoDB
AUTO_INCREMENT = 3799
DEFAULT CHARACTER SET = latin1;
CREATE TABLE IF NOT EXISTS `GM_GRADE_PROGRAMACAO` (
`ID_GRADE_PROGRAMACAO` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`DT_CRIACAO` DATETIME NOT NULL,
`DS_GRADE_PROGRAMACAO` VARCHAR(100) NULL DEFAULT NULL,
`ID_USUARIO` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`ID_GRADE_PROGRAMACAO`),
UNIQUE INDEX `UQ_GM_GRADE_PROGRAMACAO_ID_GRADE_PROGRAMACAO` (`ID_GRADE_PROGRAMACAO` ASC),
INDEX `fk_GM_GRADE_PROGRAMACAO_GM_USUARIO1_idx` (`ID_USUARIO` ASC),
CONSTRAINT `fk_GM_GRADE_PROGRAMACAO_GM_USUARIO1`
FOREIGN KEY (`ID_USUARIO`)
REFERENCES `GM_USUARIO` (`ID_USUARIO`)
ON DELETE RESTRICT)
ENGINE = InnoDB
AUTO_INCREMENT = 3
DEFAULT CHARACTER SET = latin1;
This is the query executed periodically through Ajax requests to update the monitoring page
SELECT * FROM (
SELECT
LOG.id_log_player ,
LOG.dt_criacao ,
DATE_FORMAT (LOG.DT_CRIACAO , '%d/%m/%Y %H:%i:%s') F_DT_CRIACAO ,
(CURRENT_TIMESTAMP - LOG.DT_CRIACAO) AS IDADE_REGISTRO ,
LOG.qtd_midias_exibidas ,
LOG.id_ultima_midia_exibida ,
LOG.up_time_android ,
LOG.up_time_app ,
LOG.mem_utilizada ,
LOG.mem_disponivel ,
LOG.hd_disponivel ,
LOG.hd_utilizado ,
PLA.FL_MONITOR_LIGADO,
CLI.DS_CLIENTE ,
PLA.ID_PLAYER id_player ,
PLA.DS_PLAYER ,
PLA.ID_CLIENTE ,
PLA.VERSAO_APP ,
PLA.FL_ATIVO PLA_FL_ATIVO ,
PLA.ID_GRADE_PROGRAMACAO ,
PLA.FL_DEVICE_ROOTED ,
PLA.DS_GCM_ID ,
PLA.FL_HDMI_LIGADO ,
-- IF(PLA.FL_ATIVO='N',0,IF(PLA.ID_GRADE_PROGRAMACAO IS NULL,0,IF(PLA.ID_GRADE_PROGRAMACAO='0',0,1))) ATIVO,
IF(PLA.FL_ATIVO='N',0,1) ATIVO,
DATE_FORMAT (LOG.DT_CRIACAO , '%Y%m%d%H%i%s') TIME_STAMP_CRIACAO ,
DATE_FORMAT (LOG.DT_CRIACAO , '%d/%m às %H:%i') F_DT_CRIACAO_MIN ,
-- (CURRENT_TIMESTAMP - LOG.DT_CRIACAO) ESPERA_NOVA_COMUNICACAO ,
--GRA.ID_GRADE_PROGRAMACAO GRA_ID_GRADE ,
GRA.DS_GRADE_PROGRAMACAO GRA_DS_GRADE_PROGRAMACAO,
MID.DS_PATH_THUMB THUMB_ULTMID
FROM GM_PLAYER PLA
LEFT JOIN GM_CLIENTE CLI USING ( ID_CLIENTE )
LEFT JOIN GM_USUARIO_CLIENTE GUC USING ( ID_CLIENTE )
LEFT JOIN GM_LOG_PLAYER LOG USING ( ID_PLAYER )
LEFT JOIN GM_GRADE_PROGRAMACAO GRA USING ( ID_GRADE_PROGRAMACAO )
-- LEFT JOIN GM_GRADE_PROGRAMACAO GRA ON ( PLA.ID_GRADE_PROGRAMACAO = GRA.ID_GRADE_PROGRAMACAO )
LEFT JOIN GM_MIDIA MID ON ( LOG.ID_ULTIMA_MIDIA_EXIBIDA = MID.ID_MIDIA )
WHERE PLA.ID_USUARIO = ?
AND PLA.FL_EXCLUIDO = 'N'
AND PLA.FL_ATIVO = 'S'
ORDER BY LOG.DT_CRIACAO DESC
) TBALL
GROUP BY ID_PLAYER
ORDER BY PLA_FL_ATIVO DESC , DT_CRIACAO DESC
EXPLAIN QUERY ABOVE (taken from development database)
+----+-------------+------------+--------+------------------------------------------------------+----------------------------------------------+---------+--------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+------------------------------------------------------+----------------------------------------------+---------+--------------------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 37752 | Using temporary; Using filesort |
| 2 | DERIVED | PLA | ALL | NULL | NULL | NULL | NULL | 44 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | CLI | eq_ref | PRIMARY,UQ_Cliente_ID_CLIENTE | PRIMARY | 8 | imidiatv.PLA.ID_CLIENTE | 1 | NULL |
| 2 | DERIVED | GUC | ref | fk_GM_CLIENTE_has_GM_USUARIO_GM_CLIENTE1_idx | fk_GM_CLIENTE_has_GM_USUARIO_GM_CLIENTE1_idx | 8 | imidiatv.PLA.ID_CLIENTE | 1 | Using index |
| 2 | DERIVED | LOG | ref | FK_GM_LOG_PLAYER_GM_PLAYER_idx | FK_GM_LOG_PLAYER_GM_PLAYER_idx | 9 | imidiatv.PLA.ID_PLAYER | 858 | NULL |
| 2 | DERIVED | GRA | eq_ref | PRIMARY,UQ_GM_GRADE_PROGRAMACAO_ID_GRADE_PROGRAMACAO | PRIMARY | 8 | imidiatv.PLA.ID_GRADE_PROGRAMACAO | 1 | NULL |
| 2 | DERIVED | MID | eq_ref | PRIMARY,UQ_GM_MIDIA_ID_MIDIA | PRIMARY | 8 | imidiatv.LOG.id_ultima_midia_exibida | 1 | NULL |
+----+-------------+------------+--------+------------------------------------------------------+----------------------------------------------+---------+--------------------------------------+-------+----------------------------------------------+
Thanks in advance
Partial answer...
One aspect of scaling is to minimize the disk footprint so that caching will be more effective. Toward that end, here are some suggestions:
PRIMARY KEY (`id_log_player`),
UNIQUE INDEX `UQ_id_log_player` (`id_log_player` ASC),
A PRIMARY KEY is a UNIQUE key, so the latter is redundant and wasteful of disk space and INSERT time. DROP it.
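For example:
ALTER TABLE GM_LOG_PLAYER DROP INDEX UQ_id_log_player;
-- GM_PLAYER, GM_CLIENTE and GM_GRADE_PROGRAMACAO show the same redundant pattern
ALTER TABLE GM_PLAYER DROP INDEX UQ_GM_PLAYER_ID_PLAYER;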
INT is 4 bytes; BIGINT is 8 bytes. ID_xx INT UNSIGNED can handle up to 4 billion values; do you really need to go beyond 4 billion? In InnoDB, each secondary key contains a copy of the PRIMARY KEY, meaning that an unnecessary BIGINT PK consumes a lot more space.
Your tables are latin1; are you limiting the App to western languages? If you change to utf8 (or utf8mb4), I will point out wasted space for CHAR(1).
Please perform EXPLAIN SELECT ... with the tables as they stand; then make some of the changes below and do the EXPLAIN again. I'm thinking that the difference may be dramatic. I expect the part dealing with
LEFT JOIN GM_USUARIO_CLIENTE GUC USING ( ID_CLIENTE )
to be quite 'dramatic'.
If GM_USUARIO_CLIENTE is a "many-to-many" mapping, ...
Get rid of the AUTO_INCREMENT; instead use PRIMARY KEY(ID_CLIENTE, ID_USUARIO) to save some space and make it more efficient. (And if you do go beyond 4 billion CLIENTEs, etc, the INT would not suffice!)
Add two indexes so that lookups will be much faster. (1) the PK (above), and (2) the other direction: INDEX(ID_USUARIO, ID_CLIENTE). Without those, JOINs involving that table will be slower and slower as you scale.
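A sketch of the reworked mapping table under those suggestions:
CREATE TABLE GM_USUARIO_CLIENTE (
  ID_CLIENTE BIGINT UNSIGNED NOT NULL,
  ID_USUARIO BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (ID_CLIENTE, ID_USUARIO),   -- lookups by cliente
  INDEX (ID_USUARIO, ID_CLIENTE)          -- lookups in the other direction
) ENGINE = InnoDB;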
Date arithmetic is not this simple:
(CURRENT_TIMESTAMP - LOG.DT_CRIACAO)
Study the manual page on date functions; it is more complex to subtract TIMESTAMP - DATETIME. If you will be spanning timezones, be careful which datatype you use for what.
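For instance, TIMESTAMPDIFF() makes the intent explicit; a sketch of just that one column:
SELECT TIMESTAMPDIFF(SECOND, LOG.DT_CRIACAO, CURRENT_TIMESTAMP) AS IDADE_REGISTRO
FROM GM_LOG_PLAYER LOG;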
I see this pattern:
SELECT * FROM (
SELECT ...
ORDER BY ... -- ??
) x
GROUP BY ...
What were you hoping to achieve? The optimizer is free to ignore the ORDER BY in the subquery. (Although, it may actually be performing it.)
Don't use LEFT unless you have a reason for it.
This clause
WHERE PLA.ID_USUARIO = ?
AND PLA.FL_EXCLUIDO = 'N'
AND PLA.FL_ATIVO = 'S'
would benefit (greatly?) from INDEX(ID_USUARIO, FL_EXCLUIDO, FL_ATIVO). The order (in this case) of the columns in the index does not matter. If those two flags are changing frequently, do not include them in the INDEX -- UPDATEs might be slowed down more than SELECTs would benefit.
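A sketch (the index name is illustrative):
ALTER TABLE GM_PLAYER
  ADD INDEX idx_usuario_flags (ID_USUARIO, FL_EXCLUIDO, FL_ATIVO);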
Those were the easy-to-spot suggestions. EXPLAIN may help spot more suggestions. And do you have other SELECTs?
I said "partial solution". Was that SELECT the "monitoring select"? Let's also check the periodic UPDATEs.
I have very bad performance in most of my queries. I've read a lot on Stack Overflow, but I still have some questions; maybe someone could help or give me some hints?
Basically, I am working on a booking website, having among others the following tables:
objects
+----+---------+--------+---------+------------+-------------+----------+----------+-------------+------------+-------+-------------+------+-----------+----------+-----+-----+
| id | user_id | status | type_id | privacy_id | location_id | address1 | address2 | object_name | short_name | price | currency_id | size | no_people | min_stay | lat | lng |
+----+---------+--------+---------+------------+-------------+----------+----------+-------------+------------+-------+-------------+------+-----------+----------+-----+-----+
OR in MySQL:
CREATE TABLE IF NOT EXISTS `objects` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'object_id',
`user_id` int(11) unsigned DEFAULT NULL,
`status` tinyint(2) unsigned NOT NULL,
`type_id` tinyint(3) unsigned DEFAULT NULL COMMENT 'type of object, from object_type id',
`privacy_id` tinyint(11) unsigned NOT NULL COMMENT 'id from privacy',
`location_id` int(11) unsigned DEFAULT NULL,
`address1` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`address2` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`object_name` varchar(35) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'given name by user',
`short_name` varchar(12) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'short name, selected by user',
`price` int(6) unsigned DEFAULT NULL,
`currency_id` tinyint(3) unsigned DEFAULT NULL,
`size` int(4) unsigned DEFAULT NULL COMMENT 'size rounded and in m2',
`no_people` tinyint(3) unsigned DEFAULT NULL COMMENT 'number of people',
`min_stay` tinyint(2) unsigned DEFAULT NULL COMMENT '0=no min stay;else # nights',
`lat` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`lng` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1451046 ;
reservations
+----+------------+-----------+-----------+---------+--------+
| id | by_user_id | object_id | from_date | to_date | status |
+----+------------+-----------+-----------+---------+--------+
OR in MySQL:
CREATE TABLE IF NOT EXISTS `reservations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`by_user_id` int(11) NOT NULL COMMENT 'user_id of guest',
`object_id` int(11) NOT NULL COMMENT 'id of object',
`from_date` date NOT NULL COMMENT 'start date of reservation',
`to_date` date NOT NULL COMMENT 'end date of reservation',
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=890729 ;
There are a few questions:
1 - I have not set any additional key (except the primary key) - where and which keys should I set?
2 - I have read about MyISAM vs InnoDB; the conclusion for me was that MyISAM is faster when it comes to read-only, whereas InnoDB is designed for tables that get UPDATEs or INSERTs more frequently. So currently objects uses MyISAM and reservations InnoDB. Is it a good idea to mix? Is there a better choice?
3 - I need to query those objects that are available in a certain period (between from_date and end_date). I have read (among others) this post on stackoverflow: MySQL select rows where date not between date
However, when I use the suggested solution the query times out before returning any results (so it is really slow):
SELECT DISTINCT o.id FROM objects o LEFT JOIN reservations r ON(r.object_id=o.id) WHERE
COALESCE('2012-04-05' NOT BETWEEN r.from_date AND r.to_date, TRUE)
AND COALESCE('2012-04-08' NOT BETWEEN r.from_date AND r.to_date, TRUE)
AND o.location_id=201
LIMIT 20
What am I doing wrong? What is the best solution for doing such a query? How do other sites do it? Is my database structure not the best for this or is it only the query?
I would have some more questions, but I would be really grateful for getting any help on this! Thank you very much in advance for any hint or suggestion!
It appears you are looking for any "objects" that do NOT have a reservation conflict based on the from/to dates provided. Doing a COALESCE() to always include those that are never found in reservations is an OK choice; however, since it is a LEFT JOIN, I would try left-joining on where there IS a date found, and then excluding any objects that were found. Something like
SELECT DISTINCT
o.id
FROM
objects o
LEFT JOIN reservations r
ON o.id = r.object_id
AND ( r.from_date between '2012-04-05' and '2012-04-08'
OR r.to_date between '2012-04-05' and '2012-04-08' )
WHERE
o.location_id = 201
AND r.object_id IS NULL
LIMIT 20
I would ensure an index on the reservations table on (object_id, from_date) and another on (object_id, to_date). By explicitly using the from_date BETWEEN range (and to_date also), you are specifically looking FOR a reservation occupying this time period. If any ARE found, then don't allow the object, hence the WHERE clause looking for "r.object_id IS NULL" (i.e. nothing is found in conflict within the date range you've provided).
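For example (index names are mine):
ALTER TABLE reservations
  ADD INDEX idx_obj_from (object_id, from_date),
  ADD INDEX idx_obj_to   (object_id, to_date);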
Expanding on my previous answer: by having two distinct indexes on (object_id, from_date) and (object_id, to_date), you MIGHT get better performance by joining on reservations for each index respectively and expecting NULL in BOTH reservation sets...
SELECT DISTINCT
o.id
FROM
objects o
LEFT JOIN reservations r
ON o.id = r.object_id
AND r.from_date between '2012-04-05' and '2012-04-08'
LEFT JOIN reservations r2
ON o.id = r2.object_id
AND r2.to_date between '2012-04-05' and '2012-04-08'
WHERE
o.location_id = 201
AND r.object_id IS NULL
AND r2.object_id IS NULL
LIMIT 20
I wouldn't mix InnoDB and MyISAM tables; I would define all the tables as InnoDB (for foreign key support). Generally, all the columns with the _id suffix should be foreign keys referring to the appropriate table (object_id => objects, etc.).
You don't have to define an index on a foreign key, as one is defined automatically (since MySQL 4.1.2), but you can define additional indexes on the reservations.from_date and reservations.to_date columns for faster comparison.
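A sketch of adding such a foreign key, assuming objects has first been converted to InnoDB; note that objects.id is INT UNSIGNED while reservations.object_id is plain INT, so the types must be aligned before the constraint can be created:
ALTER TABLE reservations
  MODIFY object_id INT UNSIGNED NOT NULL COMMENT 'id of object',
  ADD CONSTRAINT fk_reservations_object
      FOREIGN KEY (object_id) REFERENCES objects (id);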
I know this is a year old, but if you've tried the solution above, the logic isn't complete: it misses reservations that start before the query start AND end after the query end. Also, BETWEEN doesn't cope with reservations that start and end at the same time.
This worked better for me:
SELECT venues.id
FROM venues LEFT JOIN reservations r
ON venues.id = r.venue_id && (r.date_end >':start' and r.date_start <':end')
WHERE r.venue_id IS NULL
ORDER BY venues.id
I have a MySQL (5.0.22) MyISAM table with roughly 300k records in it and I want to do a lat/lon distance search within a five-mile radius.
I have an index that covers the lat/lon fields and it is fast (millisecond response) when I just select for lat/lon. But when I select additional fields from the table, it slows down horribly to 5-8 seconds.
I'm using MyISAM to take advantage of fulltext search. The other indexes perform well (e.g. select * from Listing where slug = 'xxxxx').
How can I optimize my query, table or index to speed things up?
My schema is:
CREATE TABLE `Listing` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(125) collate utf8_unicode_ci default NULL,
`phone` varchar(18) collate utf8_unicode_ci default NULL,
`fax` varchar(18) collate utf8_unicode_ci default NULL,
`email` varchar(55) collate utf8_unicode_ci default NULL,
`photourl` varchar(55) collate utf8_unicode_ci default NULL,
`thumburl` varchar(5) collate utf8_unicode_ci default NULL,
`website` varchar(85) collate utf8_unicode_ci default NULL,
`categoryid` int(10) unsigned default NULL,
`addressid` int(10) unsigned default NULL,
`deleted` tinyint(1) default NULL,
`status` int(10) unsigned default '2',
`parentid` int(10) unsigned default NULL,
`organizationid` int(10) unsigned default NULL,
`listinginfoid` int(10) unsigned default NULL,
`createuserid` int(10) unsigned default NULL,
`createdate` datetime default NULL,
`lasteditdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
`lastedituserid` int(10) unsigned default NULL,
`slug` varchar(155) collate utf8_unicode_ci default NULL,
`aclid` int(10) unsigned default NULL,
`alt_address` varchar(80) collate utf8_unicode_ci default NULL,
`alt_website` varchar(80) collate utf8_unicode_ci default NULL,
`lat` decimal(10,7) default NULL,
`lon` decimal(10,7) default NULL,
`city` varchar(80) collate utf8_unicode_ci default NULL,
`state` varchar(10) collate utf8_unicode_ci default NULL,
PRIMARY KEY (`id`),
KEY `idx_fetch` USING BTREE (`slug`,`deleted`),
KEY `idx_loc` (`state`,`city`),
KEY `idx_org` (`organizationid`,`status`,`deleted`),
KEY `idx_geo_latlon` USING BTREE (`status`,`lat`,`lon`),
FULLTEXT KEY `idx_name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=DYNAMIC;
My query is:
SELECT Listing.name, Listing.categoryid, Listing.lat, Listing.lon
, 3956 * 2 * ASIN(SQRT( POWER(SIN((Listing.lat - 37.369195) * pi()/180 / 2), 2) + COS(Listing.lat * pi()/180) * COS(37.369195 * pi()/180) * POWER(SIN((Listing.lon - -122.036849) * pi()/180 / 2), 2) )) rawgeosearchdistance
FROM Listing
WHERE
Listing.status = '2'
AND ( Listing.lon between -122.10913433498 and -121.96456366502 )
AND ( Listing.lat between 37.296909665016 and 37.441480334984)
HAVING rawgeosearchdistance < 5
ORDER BY rawgeosearchdistance ASC;
Explain plan without geosearch:
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len |ref | rows | Extra |
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-------------+
| 1 | SIMPLE | Listing | range | idx_geo_latlon | idx_geo_latlon | 19 | NULL | 453 | Using where |
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-------------+
Explain plan with geosearch:
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-----------------------------+
| 1 | SIMPLE | Listing | range | idx_geo_latlon | idx_geo_latlon | 19 | NULL | 453 | Using where; Using filesort |
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-----------------------------+
Here's the explain plan with the covering index. Having the columns in the correct order made a big difference:
+----+-------------+--------+-------+---------------+---------------+---------+------+--------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------------+---------+------+--------+------------------------------------------+
| 1 | SIMPLE | Listing | range | idx_geo_cover | idx_geo_cover | 12 | NULL | 453 | Using where; Using index; Using filesort |
+----+-------------+--------+-------+---------------+---------------+---------+------+--------+------------------------------------------+
Thank you!
I think you really should consider the use of PostgreSQL (combined with Postgis).
I have given up on MySQL for geospatial data (for now) because of the following reasons:
MySQL only supports spatial datatypes / spatial indexes on MyISAM tables with the inherent disadvantages of MyISAM (concerning transactions, referential integrity...)
MySQL implements some of the OpenGIS specifications only on an MBR basis (minimum bounding rectangle), which is pretty useless for most serious geospatial query processing (see this link in the MySQL manual). Chances are you will need some of this functionality sooner or later.
PostgreSQL/Postgis with proper (GIST) spatial indexes and proper queries can be extremely fast.
Example: determining overlapping polygons between a 'small' selection of polygons and a table with over 5 million (!) very complex polygons, calculating the amount of overlap between these results + sorting. Average runtime: between 30 and 100 milliseconds (this particular machine has a lot of RAM, of course; don't forget to tune your PostgreSQL install... read the docs).
You are probably using a 'covering index' in your lat/lon only query. A covering index occurs when the index used by the query contains the data that you are selecting for. MySQL only needs to visit the index and never the data rows. See this for more info. That would explain why the lat/lon query is so fast.
I suspect that the calculations and the sheer number of rows returned slow down the longer query (plus any temp table that has to be created for the HAVING clause).
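Judging by the third EXPLAIN above ("Using index" on idx_geo_cover), a covering index along these lines is what did the trick; a sketch, where the column list must include everything the query reads:
ALTER TABLE Listing
  ADD INDEX idx_geo_cover (status, lat, lon, name, categoryid);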
When I implemented geo radius search, I just loaded all of the US zip codes into memory with their lat/long, used my starting point and radius to get a list of zip codes within the radius, and then used that in my DB query. Of course, I was using Solr to do my searching because the search space was in the 20-million-row range, but the same principles should apply. Apologies for the shallowness of this response, as I'm on my phone.
Depending on the number of your listings, could you create a view that contains
Listing1Id, Listing2ID, Distance
Basically, just have all of the distances "pre-calculated".
Then you could do something like:
SELECT listing2ID
FROM v_Distance d
WHERE distance < 5
  AND listing1ID = XXX;
You really should avoid doing that much math in your select statement. That's probably the source of a lot of your slowdowns. Remember, SQL is a query language; it's really not optimized for trigonometric functions.
SQL will be faster and your overall results will be faster if you do a very naive distance search (which will return more results) and then winnow your results.
If you want to use distance in your query, at the very least use a squared-distance calculation; SQRT calls are notoriously slow. Squared distance is much easier to use: instead of comparing the distance itself, you compare the square of the distance. For cartesian coordinate systems, since the sum of the squares of the short sides of a right triangle equals the square of the hypotenuse, it's cheaper to compute the squared distance (just sum the two squares) than the distance itself; all you have to do is square the distance you want to compare against. So instead of finding the precise distance and comparing it to your desired distance (say, 5), you find the squared distance and compare it to the square of the desired distance (25, if your desired distance was 5).
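A minimal sketch of that idea, assuming hypothetical planar x/y columns in miles (lat/lon degrees would need conversion first; the column names and origin values are made up):
-- compare squared distance to the squared radius: no SQRT needed
SET @x0 = 10.0, @y0 = 20.0;   -- the search origin (made-up values)
SELECT id
FROM Listing
WHERE POWER(x - @x0, 2) + POWER(y - @y0, 2) < 5 * 5;   -- 25 = 5 miles, squared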