Improve query speed suggestions - mysql

For self-education I am developing an invoicing system for an electricity company. I have multiple time-series tables with different intervals. One table represents consumption, two others represent prices, and a third price table still has to be incorporated. Now I am running calculation queries, but they are slow. I would like to improve the query speed, especially since these are only the initial calculations and the queries will only become more complicated. Please note that this is the first database I have created, so a simplified explanation is preferred. Thanks for any help provided.
I have indexed DATE, PERIOD_FROM, and PERIOD_UNTIL in each table. This sped up the process from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin1
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, SUM(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
AND i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
AND i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
GROUP BY i.DATE
I have run the query with EXPLAIN and got the following result (one row per table, in join order):
a (apxprice): select_type = SIMPLE, partitions = NULL, possible_keys = NULL, key = NULL, key_len = NULL, ref = NULL, rows = 28727, filtered = 100, Extra = Using where; Using temporary; Using filesort
p (powerload): select_type = SIMPLE, partitions = NULL, possible_keys = NULL, key = NULL, key_len = NULL, ref = NULL, rows = 61038, filtered = 10, Extra = Using where; Using join buffer (Block Nested Loop)
i (imbalanceprice): select_type = SIMPLE, partitions = NULL, possible_keys = DATE, key = DATE, key_len = 8, ref = timeseries.a.DATE,timeseries.p.PERIOD_FROM, rows = 1, filtered = 100, Extra = NULL
Preferably I would run a more complicated query for a whole year, grouped by month for example, with all price tables incorporated. However, this would be too slow. As mentioned, I have indexed DATE, PERIOD_FROM, and PERIOD_UNTIL in each table. The calculation result must not change: in this case, the quarter-hourly consumption of two meters multiplied by the hourly prices.

"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.
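To make that concrete, here is a hedged sketch against the tables above: composite indexes that let each join seek on (DATE, PERIOD_FROM), plus the same query rewritten with explicit INNER JOIN syntax and prefixed with EXPLAIN. The index names and column choices are assumptions for illustration (imbalanceprice already has a comparable key, and the single-column indexes the poster mentions may or may not already exist on the other two tables); this is not a confirmed solution.
-- Composite indexes so the joins can seek on (DATE, PERIOD_FROM);
-- the index names are made up for this sketch
ALTER TABLE apxprice  ADD INDEX idx_apx_date_period (DATE, PERIOD_FROM, PERIOD_UNTIL);
ALTER TABLE powerload ADD INDEX idx_pl_date_period (DATE, PERIOD_FROM);
-- The same query with explicit INNER JOINs; EXPLAIN shows how MySQL plans it
EXPLAIN
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
       a.PRICE, p.POWERLOAD, SUM(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i
INNER JOIN powerload p
        ON p.DATE = i.DATE
       AND p.PERIOD_FROM = i.PERIOD_FROM
INNER JOIN apxprice a
        ON a.DATE = i.DATE
       AND i.PERIOD_FROM >= a.PERIOD_FROM
       AND i.PERIOD_FROM < a.PERIOD_UNTIL
WHERE i.DATE BETWEEN '2018-01-01' AND '2018-01-31'
GROUP BY i.DATE;
The rewrite does not change which rows match; it only makes the join conditions explicit, which tends to be easier to follow in the EXPLAIN output.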

Related

Speed up mysql SQL query but with a huge dataset

I have a table that has over 2.5 million rows, and I would like to run the following SQL statement to get the count:
select count(*)
from workflow
where action_name= 'Workflow'
and release_date >= '2019-12-01 13:24:22'
and release_date <= '2019-12-31 13:24:22'
AND project_name= 'Web'
group by page_id, headline, release_full_name, release_date
The problem is that it takes over 2.7 seconds to return 0 rows, as expected. Is there a way to speed it up? I have 6 more SQL statements that are similar, so together they will take at least another 2.7 seconds * 6 ≈ 16 seconds.
Here is my table schema
CREATE TABLE workflow (
id int(11) NOT NULL AUTO_INCREMENT,
action_name varchar(100) NOT NULL,
project_name varchar(30) NOT NULL,
page_id int(11) NOT NULL,
headline varchar(200) NOT NULL,
create_full_name varchar(200) NOT NULL,
create_date datetime NOT NULL,
change_full_name varchar(200) NOT NULL,
change_date datetime NOT NULL,
release_full_name varchar(200) NOT NULL,
release_date datetime NOT NULL,
reject_full_name varchar(200) NOT NULL,
reject_date datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2948271 DEFAULT CHARSET=latin1
What I'm looking for in this query is the count of the pages that were released last month and that have project_name = 'Web' and action_name = 'Workflow'.
This is a bit too big for a comment.
Using GROUP BY together with COUNT(*) here doesn't make much sense: usually you want to count actual rows in the DB, not rows after aggregation. I'm not sure whether the grouping is really part of your requirement; I ask because the GROUP BY is what slows the query down.
Use a composite index on (project_name, release_date), since project_name seems to be the most selective column.
For more detail, please share the EXPLAIN plan.
Assuming that you do need counts per group (as you listed), it is better to include the grouping fields in the SELECT, essentially like:
select page_id, headline, release_full_name, release_date, count(*)
from ...
Adding an index on (page_id, headline) would also optimize this well.
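A hedged sketch of how those two suggestions might look as DDL; the index names are invented for this example and the exact column choice follows the advice above rather than anything verified against the data:
-- Equality columns first, the range column (release_date) last
ALTER TABLE workflow
  ADD INDEX idx_project_action_release (project_name, action_name, release_date);
-- Optional: help the grouping
ALTER TABLE workflow
  ADD INDEX idx_page_headline (page_id, headline);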

MYSQL - How to add index for group by / order by / sum / with where

I am processing a MySQL table with 40K rows. Current execution time is around 2 seconds with the table indexed. Could someone guide me on how to optimize this query and table better, and how to get rid of "Using where; Using temporary; Using filesort"? Any help is appreciated.
The GROUP BY will be for the following cases...
LS_CHG_DTE_OCR
LS_CHG_DTE_OCR/RES_STATE_HSE
LS_CHG_DTE_OCR/RES_STATE_HSE/RES_CITY_HSE
LS_CHG_DTE_OCR/RES_STATE_HSE/RES_CITY_HSE/POSTAL_CDE_HSE
Thanks in advance
SELECT DATE_FORMAT(`LS_CHG_DTE_OCR`, '%Y-%b') AS fmt_date,
SUM(IF(`TYPE`='Connect',COUNT_SUBS,0)) AS connects,
SUM(IF(`TYPE`='Disconnect',COUNT_SUBS,0)) AS disconnects,
SUM(IF(`TYPE`='Connect',ROUND(REV,2),0)) AS REV,
SUM(IF(`TYPE`='Upgrade',COUNT_SUBS,0)) AS upgrades,
SUM(IF(`TYPE`='Downgrade',COUNT_SUBS,0)) AS downgrades,
SUM(IF(`TYPE`='Upgrade',ROUND(REV,2),0)) AS upgradeRev
FROM `hsd`
WHERE LS_CHG_DTE_OCR != ''
GROUP BY MONTH(LS_CHG_DTE_OCR)
ORDER BY LS_CHG_DTE_OCR ASC
CREATE TABLE `hsd` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`SYS_OCR` varchar(255) DEFAULT NULL,
`PRIN_OCR` varchar(255) DEFAULT NULL,
`SERV_CDE_OHI` varchar(255) DEFAULT NULL,
`DSC_CDE_OHI` varchar(255) DEFAULT NULL,
`LS_CHG_DTE_OCR` datetime DEFAULT NULL,
`SALESREP_OCR` varchar(255) DEFAULT NULL,
`CHANNEL` varchar(255) DEFAULT NULL,
`CUST_TYPE` varchar(255) DEFAULT NULL,
`LINE_BUS` varchar(255) DEFAULT NULL,
`ADDR1_HSE` varchar(255) DEFAULT NULL,
`RES_CITY_HSE` varchar(255) DEFAULT NULL,
`RES_STATE_HSE` varchar(255) DEFAULT NULL,
`POSTAL_CDE_HSE` varchar(255) DEFAULT NULL,
`ZIP` varchar(100) DEFAULT NULL,
`COUNT_SUBS` double DEFAULT NULL,
`REV` double DEFAULT NULL,
`TYPE` varchar(255) DEFAULT NULL,
`lat` varchar(100) DEFAULT NULL,
`long` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx` (`LS_CHG_DTE_OCR`,`CHANNEL`,`CUST_TYPE`,`LINE_BUS`,`RES_CITY_HSE`,`RES_STATE_HSE`,`POSTAL_CDE_HSE`,`ZIP`,`COUNT_SUBS`,`TYPE`)
) ENGINE=InnoDB AUTO_INCREMENT=402342 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
Using where; Using temporary; Using filesort
The only condition you apply is LS_CHG_DTE_OCR != ''. Other than that you are doing a full table scan because of the aggregations; index-wise you can't do much here.
I ran into the same problem. I had fully optimized my queries (I had joins and more conditions), but the table kept growing and with it the query time. Finally I decided to mirror the data to Elasticsearch. In my case it cut query time down to between 1/20th and 1/100th (for different queries).
The only possible index for that SELECT is INDEX(LS_CHG_DTE_OCR), but it is unlikely to be used. Look at what the query has to do:
Perform the WHERE -- if there are a lot of '' values, the index may be used for filtering.
GROUP BY MONTH(...) -- you might be folding the same month from multiple years together. The Optimizer can't tell, so it will punt on using the index.
ORDER BY LS_CHG_DTE_OCR -- this is done after the GROUP BY; the ORDER BY can't be performed until the data is gathered, which is too late for any index. Also, if multiple years are folded together, you could get some strange results. Cure it by making the ORDER BY the same as the GROUP BY; that also prevents the extra sort caused by the GROUP BY and ORDER BY being different.
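As a sketch of that last point, grouping and ordering on the same year-month expression keeps different years apart and avoids the extra sort. Note this changes the displayed format from '%Y-%b' to '%Y-%m' (an assumption that the month-name formatting is not essential), and only two of the SUM columns are shown; the rest stay exactly as in the original query:
SELECT DATE_FORMAT(LS_CHG_DTE_OCR, '%Y-%m') AS fmt_date,
       SUM(IF(`TYPE`='Connect', COUNT_SUBS, 0)) AS connects,
       SUM(IF(`TYPE`='Disconnect', COUNT_SUBS, 0)) AS disconnects
       -- ...remaining SUM(IF(...)) columns unchanged...
FROM `hsd`
WHERE LS_CHG_DTE_OCR != ''
GROUP BY DATE_FORMAT(LS_CHG_DTE_OCR, '%Y-%m')
ORDER BY DATE_FORMAT(LS_CHG_DTE_OCR, '%Y-%m');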
Yeah, if that idx you added has all the columns in the SELECT, then it is a "covering index". But it won't help much here, for the reasons above; "Using index" alone won't buy you a lot.
GROUP BY LS_CHG_DTE_OCR/RES_STATE_HSE -- Eh? Divide a DATETIME by a VARCHAR? That sounds like a disaster.
This table will grow even bigger over time, correct? Consider building and maintaining Summary Table(s) with month as part of the PRIMARY KEY.
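If you go the summary-table route, one possible shape is a small table keyed by month and TYPE that you refresh periodically and then report from. Everything here (table name, column names, refresh strategy) is illustrative only:
-- Hypothetical monthly summary table
CREATE TABLE `hsd_monthly_summary` (
  `yr_month`   char(7)      NOT NULL,   -- e.g. '2016-03'
  `TYPE`       varchar(255) NOT NULL,
  `total_subs` double       NOT NULL,
  `total_rev`  double       NOT NULL,
  PRIMARY KEY (`yr_month`, `TYPE`)
) ENGINE=InnoDB;
-- Refresh (rows with NULL TYPE are skipped because TYPE is part of the key)
INSERT INTO hsd_monthly_summary (yr_month, TYPE, total_subs, total_rev)
SELECT DATE_FORMAT(LS_CHG_DTE_OCR, '%Y-%m'), `TYPE`,
       COALESCE(SUM(COUNT_SUBS), 0), COALESCE(SUM(REV), 0)
FROM `hsd`
WHERE LS_CHG_DTE_OCR != '' AND `TYPE` IS NOT NULL
GROUP BY DATE_FORMAT(LS_CHG_DTE_OCR, '%Y-%m'), `TYPE`
ON DUPLICATE KEY UPDATE total_subs = VALUES(total_subs), total_rev = VALUES(total_rev);
Monthly reports would then read a few hundred summary rows instead of scanning the whole detail table.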

Why is this query being logged as "not using indexes"?

For some reason my slow query log is reporting the following query as "not using indexes" and for the life of me I cannot understand why.
Here is the query:
update scheduletask
set active = 0
where nextrun < date_sub( now(), interval 2 minute )
and enabled = 1
and active = 1;
Here is the table:
CREATE TABLE `scheduletask` (
`scheduletaskid` int(11) NOT NULL AUTO_INCREMENT,
`schedulethreadid` int(11) NOT NULL,
`taskname` varchar(50) NOT NULL,
`taskpath` varchar(100) NOT NULL,
`tasknote` text,
`recur` int(11) NOT NULL,
`taskinterval` int(11) NOT NULL,
`lastrunstart` datetime NOT NULL,
`lastruncomplete` datetime NOT NULL,
`nextrun` datetime NOT NULL,
`active` int(11) NOT NULL,
`enabled` int(11) NOT NULL,
`creatorid` int(11) NOT NULL,
`editorid` int(11) NOT NULL,
`created` datetime NOT NULL,
`edited` datetime NOT NULL,
PRIMARY KEY (`scheduletaskid`),
UNIQUE KEY `Name` (`taskname`),
KEY `IDX_NEXTRUN` (`nextrun`)
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=latin1;
Add another index like this
KEY `IDX_COMB` (`nextrun`, `enabled`, `active`)
I'm not sure how many rows your table have but the following might apply as well
Sometimes MySQL does not use an index, even if one is available. One
circumstance under which this occurs is when the optimizer estimates
that using the index would require MySQL to access a very large
percentage of the rows in the table. (In this case, a table scan is
likely to be much faster because it requires fewer seeks.)
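A minimal sketch of actually adding the suggested index with ALTER TABLE (the index names are arbitrary). Because nextrun is compared with a range, it can also be worth trying the equality columns enabled and active first; which variant the optimizer actually picks is best checked with EXPLAIN:
-- Index as suggested above
ALTER TABLE scheduletask ADD INDEX IDX_COMB (nextrun, enabled, active);
-- Alternative ordering: equality columns first, range column last
ALTER TABLE scheduletask ADD INDEX IDX_COMB_ALT (enabled, active, nextrun);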
try using the "explain" command in mysql.
http://dev.mysql.com/doc/refman/5.5/en/explain.html
I think explain only works on select statements, try:
explain select * from scheduletask where nextrun < date_sub( now(), interval 2 minute ) and enabled = 1 and active = 1;
Maybe if you use nextrun = ... it will match the key IDX_NEXTRUN. Your WHERE clause has to use one of your keys: scheduletaskid, taskname, or nextrun.
Sorry for the short answer, but I don't have time to write a complete solution.
I believe you can fix your issue by saving date_sub( now(), interval 2 minute ) in a temporary variable before using it in the query, see here maybe: MySql How to set a local variable in an update statement (Syntax?).
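A hedged sketch of that suggestion using a user-defined variable; whether this actually changes what the slow query log reports is not guaranteed:
SET @cutoff = DATE_SUB(NOW(), INTERVAL 2 MINUTE);
UPDATE scheduletask
SET active = 0
WHERE nextrun < @cutoff
  AND enabled = 1
  AND active = 1;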

mysql select distinct date takes FOREVER on database w/ 374 million rows

I have a MYSQL DB with table definition like this:
CREATE TABLE `minute_data` (
`date` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`open` decimal(10,2) DEFAULT NULL,
`high` decimal(10,2) DEFAULT NULL,
`low` decimal(10,2) DEFAULT NULL,
`close` decimal(10,2) DEFAULT NULL,
`volume` decimal(10,2) DEFAULT NULL,
`adj_close` varchar(45) DEFAULT NULL,
`symbol` varchar(10) NOT NULL DEFAULT '',
PRIMARY KEY (`symbol`,`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
It stores 1 minute data points from the stock market. The primary key is a combination of the symbol and date columns. This way I always have only 1 data point for each symbol at any time.
I am wondering why the following query takes so long that I can't even wait for it to finish:
select distinct date from test.minute_data where date >= "2013-01-01"
order by date asc limit 100;
However I can select count(*) from minute_data; and that finishes very quickly.
I know that it must have something to do with the fact that there are over 374 million rows of data in the table, and my desktop computer is pretty far from a super computer.
Does anyone know something I can try to speed up with query? Do I need to abandon all hope of using a MySQL table this big??
Thanks a lot!
When you have a composite index on 2 columns, like your (symbol, date) primary key, searching and grouping by a prefix of the key will be fast. But searching for something that doesn't include the first column in the index requires scanning all rows or using some other index.
You can either change your primary key to (date, symbol) if you don't usually need to search for symbol without date. Or you can add an additional index on date:
alter table minute_data add index (date)
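If you instead swap the primary key around, a sketch of the change is below. Be aware this rebuilds the entire 374-million-row MyISAM table, so it can run for a long time and needs enough free disk space:
ALTER TABLE minute_data
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`date`, `symbol`);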

Ordering in MySQL Bogs Down

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from #Gordon, below
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
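A sketch of creating the channelitem index suggested above (uninet_channelitem_read already has KEY `ciid` (`ciid`), so nothing new should be needed there); the index name is made up for this example:
ALTER TABLE channelitem
  ADD INDEX idx_cid_ciid_creationdate (cid, ciid, creationdate);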
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing