Optimize MySQL query with huge data using indexing

Every minute rows are added to the table, which has 3 columns: TimeInsert (timestamp), errorMsg (varchar(50), holding 'Success' or 'Failure') and amount (int). When I run the query below to generate reports for the last 3 days, it takes a lot of time. How can I make it run faster using multi-column indexes? Should I index the two columns TimeInsert and errorMsg? The table is MyISAM.
select sum(amount) from sdpcallbackairtel where TimeInsert >= NOW() - INTERVAL 3 DAY and errorMsg ='Success' and hour(timeinsert)=16 ;
CREATE TABLE tablename (
`Id` int(10) NOT NULL DEFAULT '0',
`errorMsg` varchar(10) DEFAULT NULL,
`amount` int(5) DEFAULT NULL,
`TimeInsert` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
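Following the same pattern as the answers further down, a composite index with the equality column first would be the usual fix; a minimal sketch (using the table name from the query, and noting that HOUR(TimeInsert) = 16 is not sargable, so it is still evaluated row by row after the index narrows the date range):

ALTER TABLE sdpcallbackairtel ADD KEY idx_err_time (errorMsg, TimeInsert);

SELECT SUM(amount)
FROM sdpcallbackairtel
WHERE errorMsg = 'Success'                    -- "=" test, first column of the index
  AND TimeInsert >= NOW() - INTERVAL 3 DAY    -- range, second column of the index
  AND HOUR(TimeInsert) = 16;                  -- not covered by the index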

Related

How can I optimize a query with a WHERE index?

I have this query:
select `price`, `asset_id`
from `history_average_pairs`
where `currency_id` = 1
and date(`created_at`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
group by hour(created_at), date(created_at), asset_id
order by `created_at` asc
And the table:
CREATE TABLE IF NOT EXISTS history_average_pairs (
id bigint(20) unsigned NOT NULL,
asset_id bigint(20) unsigned NOT NULL,
currency_id bigint(20) unsigned NOT NULL,
market_cap bigint(20) NOT NULL,
price double(20,6) NOT NULL,
volume bigint(20) NOT NULL,
circulating bigint(20) NOT NULL,
change_1h double(8,2) NOT NULL,
change_24h double(8,2) NOT NULL,
change_7d double(8,2) NOT NULL,
created_at timestamp NOT NULL DEFAULT current_timestamp(),
updated_at timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
total_supply bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE history_average_pairs
ADD PRIMARY KEY (id),
ADD KEY history_average_pairs_currency_id_asset_id_foreign (currency_id,asset_id);
ALTER TABLE history_average_pairs
MODIFY id bigint(20) unsigned NOT NULL AUTO_INCREMENT;
It contains more than 10,000,000 rows, and the query takes:
Showing rows 0 - 24 (32584 total, Query took 27.8344 seconds.)
But without currency_id = 1, it takes like 4 sec.
UPDATE 1
Okay, I updated the key from (currency_id, asset_id) to (currency_id, asset_id, created_at) and now it takes:
Showing rows 0 - 24 (32784 total, Query took 6.4831 seconds.)
It's much faster. Any proposal to make it even faster?
The GROUP BY is here to take only the first row for every hour.
For example:
19:01:10
19:02:14
19:23:15
I need only 19:01:10
You can rephrase the filtering predicate to avoid using expressions on columns. For example:
select max(`price`) as max_price, `asset_id`
from `history_average_pairs`
where `currency_id` = 1
and created_at >= date_add(curdate(), interval - 7 day)
group by hour(created_at), date(created_at), asset_id
order by `created_at` asc
Then, this query could be much faster if you added the index:
create index ix1 on `history_average_pairs` (`currency_id`, created_at);
You must make the test "sargable"; change
date(`created_at`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
to
created_at >= CURDATE() - INTERVAL 7 DAY
Then the optimal index is
INDEX(currency_id, -- 1st because of "=" test
created_at, -- 2nd to finish out WHERE
asset_id) -- only for "covering"
When designing an index, it is usually best to handle the WHERE first.
The GROUP BY cannot use the index. Did you really want the hour first?
"I need only 19:01:10" is unclear, so I have not factored that in. Where's the date? Where's the asset_id? See "only_full_group_by". Do you need "groupwise max"?
Making the ORDER BY have the same columns as the GROUP BY avoids a sort. (In your query, the order may be slightly different, but it probably does not matter.)
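For example, a sketch of the rewritten query with the ORDER BY spelled the same way as the GROUP BY (putting the date before the hour is an assumption here, per the question above about wanting the hour first):

SELECT MAX(price) AS max_price, asset_id
FROM history_average_pairs
WHERE currency_id = 1
  AND created_at >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE(created_at), HOUR(created_at), asset_id
ORDER BY DATE(created_at), HOUR(created_at), asset_id;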
Datatype issues...
BIGINT takes 8 bytes; INT takes only 4 bytes and is usually big enough. Shrinking the table provides some speed.
double(8,2) takes 8 bytes -- Don't use (m,n) on FLOAT or DOUBLE; it adds an extra rounding. Perhaps you meant DECIMAL(8,2), which takes 4 bytes.
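A sketch of those datatype changes (assuming the values fit the smaller types, nothing else references these columns, and you can afford an ALTER on 10M+ rows):

ALTER TABLE history_average_pairs
  MODIFY asset_id INT UNSIGNED NOT NULL,
  MODIFY currency_id INT UNSIGNED NOT NULL,
  MODIFY change_1h DECIMAL(8,2) NOT NULL,
  MODIFY change_24h DECIMAL(8,2) NOT NULL,
  MODIFY change_7d DECIMAL(8,2) NOT NULL;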

How to design a Cassandra schema for a user actions log?

I have a table like this in MySQL to log user actions:
CREATE TABLE `actions` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`module` VARCHAR(32) NOT NULL,
`controller` VARCHAR(64) NOT NULL,
`action` VARCHAR(64) NOT NULL,
`date` Timestamp NOT NULL,
`userid` BIGINT(20) NOT NULL,
`ip` VARCHAR(32) NOT NULL,
`duration` DOUBLE NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=MyISAM
AUTO_INCREMENT=1
I have a MySQL query like this to find the count of a specific action per day:
SELECT COUNT(*) FROM actions WHERE actions.action = "join" AND
YEAR(date)=2017 AND MONTH(date)=06 GROUP BY YEAR(date), MONTH(date),
DAY(date)
This takes 50-60 seconds to give me a list of days with the count of the "join" action, with only 5 million rows and an index on date and action.
So I want to log actions using Cassandra instead. How can I design the Cassandra schema, and how should I query it so that such a request takes less than 1 second?
CREATE TABLE actions (
id timeuuid,
module varchar,
controller varchar,
action varchar,
date_time timestamp,
userid bigint,
ip varchar,
duration double,
year int,
month int,
dt date,
PRIMARY KEY ((action,year,month),dt,id)
);
Explanation:
With the above table definition, the query
SELECT COUNT(*) FROM actions WHERE action = 'join' AND year = 2017 AND month = 6 GROUP BY action, year, month, dt
will hit a single partition.
The dt column holds only the date; you could change it to just a day number with int as the datatype. Since id is a timeuuid, it will be unique.
Note: GROUP BY is supported by Cassandra 3.10 and above.
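With this design the application has to fill in year, month and dt itself at write time; a minimal CQL sketch with purely illustrative values:

INSERT INTO actions (id, module, controller, action, date_time, userid, ip, duration, year, month, dt)
VALUES (now(), 'user', 'auth', 'join', toTimestamp(now()), 12345, '10.0.0.1', 0.42, 2017, 6, '2017-06-15');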

Performance challenge on MySQL RDS with simultaneous queries

I have an app developed in NodeJS on AWS with an associated MySQL RDS database (server class: db.r3.large, engine: InnoDB). We are having a performance problem: when we execute simultaneous queries (at the same time), the database returns the results only after the last query has finished, not as each individual query finishes.
So, as an example: if we execute a process that runs 10 simultaneous queries of 3 seconds each, we start receiving the results at approximately 30 seconds, and we want to start receiving them when the first query finishes (3 seconds).
It seems that the database is receiving the queries and queuing them.
I'm kind of lost here since I have changed several things in the code (separate connections, connection pools, etc.) and in the AWS settings, but nothing seems to improve the result.
TableA (13M records) schema:
CREATE TABLE `TableA` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(20) DEFAULT NULL,
`columnC` varchar(15) DEFAULT NULL,
`columnD` varchar(20) DEFAULT NULL,
`columnE` varchar(255) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
`columnH` varchar(10) DEFAULT NULL,
`columnI` bigint(11) DEFAULT NULL,
`columnJ` bigint(11) DEFAULT NULL,
`columnK` varchar(5) DEFAULT NULL,
`columnL` varchar(50) DEFAULT NULL,
`columnM` varchar(20) DEFAULT NULL,
`columnN` int(1) DEFAULT NULL,
`columnO` int(1) DEFAULT '0',
`columnP` datetime NOT NULL,
`columnQ` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnO` (`columnO`),
KEY `columnK` (`columnK`),
KEY `columnN` (`columnN`),
FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;
TableB (15M records) schema:
CREATE TABLE `TableB` (
`columnA` int(11) NOT NULL AUTO_INCREMENT,
`columnB` varchar(50) DEFAULT NULL,
`columnC` varchar(50) DEFAULT NULL,
`columnD` int(1) DEFAULT NULL,
`columnE` datetime NOT NULL,
`columnF` datetime NOT NULL,
PRIMARY KEY (`columnA`),
KEY `columnB` (`columnB`),
KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;
Query:
SELECT COUNT(*) AS total
FROM TableA
WHERE TableA.columnB IN (
    SELECT TableB.columnC
    FROM TableB
    WHERE TableB.columnB = "3764301"
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
    AND TableB.columnC NOT IN (
        SELECT field
        FROM table
        WHERE table.field = 10
    )
)
AND columnM > 2;
1 execution returns in 2 s.
10 executions return the first result after 20 s, and the other results after that.
To see that the queries are running I'm using "SHOW FULL PROCESSLIST", and the queries are most of the time in the "Sending data" state.
It is not a performance issue with the query itself; it is a problem of concurrency on the database. Even a very simple query like "SELECT COUNT(*) FROM TableA WHERE columnM = 5" has the same problem.
UPDATE
For testing purposes only, I reduced the query to a single subquery condition. Both results have 65k records.
-- USING IN
SELECT COUNT(*) as total
FROM TableA
WHERE TableA.columnB IN (
SELECT TableB.columnC
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableB.columnC NOT IN (
SELECT field
FROM tableX
WHERE fieldX = 15
)
)
AND columnM > 2;
-- USING EXISTS
SELECT COUNT(*) as total
FROM TableA
WHERE EXISTS (
SELECT *
FROM TableB
WHERE TableB.columnB = "103550181"
AND TableA.columnB = TableB.columnC
AND NOT EXISTS (
SELECT *
FROM tableX
WHERE fieldX = 15
AND fieldY = TableB.columnC
)
)
AND columnM > 2;
-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)
Whether I use IN or EXISTS, the problem is the same: when I execute this query many times, the database lags and the responses come back only after a long time.
Example: if one query responds in 1.7 sec, then when I execute this query 10 times, the first result arrives in 20 sec.
Recommendation 1
Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (And you may need to change the WHERE clause a bit.)
AND TableB.columnC NOT IN (
SELECT field
FROM table
WHERE table.field = 10
-->
AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )
table needs an index on field.
IN ( SELECT ... ) performs very poorly. EXISTS is much better optimized.
Recommendation 2
To deal with the concurrency, consider running SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query. This may keep one connection from interfering with another.
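For example (run on the same connection immediately before the query; the SELECT is a simplified stand-in for the real one):

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) AS total FROM TableA WHERE columnM > 2;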
Recommendation 3
Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.
Recommendation 4
It might help for TableB to have a composite INDEX(ColumnB, ColumnC) in that order.
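As a sketch, the ALTER statements for Recommendations 1 and 4 (the index names are arbitrary, and `table`/`field` are the placeholders from the question's obfuscated query):

ALTER TABLE TableB ADD INDEX idx_columnb_columnc (columnB, columnC);
ALTER TABLE `table` ADD INDEX idx_field (field);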
What I can see here is that a HUGE temporary table is being built for each query. Consider a different architecture.

Why is this query being logged as "not using indexes"?

For some reason my slow query log is reporting the following query as "not using indexes" and for the life of me I cannot understand why.
Here is the query:
update scheduletask
set active = 0
where nextrun < date_sub( now(), interval 2 minute )
and enabled = 1
and active = 1;
Here is the table:
CREATE TABLE `scheduletask` (
`scheduletaskid` int(11) NOT NULL AUTO_INCREMENT,
`schedulethreadid` int(11) NOT NULL,
`taskname` varchar(50) NOT NULL,
`taskpath` varchar(100) NOT NULL,
`tasknote` text,
`recur` int(11) NOT NULL,
`taskinterval` int(11) NOT NULL,
`lastrunstart` datetime NOT NULL,
`lastruncomplete` datetime NOT NULL,
`nextrun` datetime NOT NULL,
`active` int(11) NOT NULL,
`enabled` int(11) NOT NULL,
`creatorid` int(11) NOT NULL,
`editorid` int(11) NOT NULL,
`created` datetime NOT NULL,
`edited` datetime NOT NULL,
PRIMARY KEY (`scheduletaskid`),
UNIQUE KEY `Name` (`taskname`),
KEY `IDX_NEXTRUN` (`nextrun`)
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=latin1;
Add another index like this
KEY `IDX_COMB` (`nextrun`, `enabled`, `active`)
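To add it to the existing table, that would be roughly:

ALTER TABLE scheduletask ADD KEY IDX_COMB (nextrun, enabled, active);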
I'm not sure how many rows your table has, but the following might apply as well:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
Try using the EXPLAIN command in MySQL:
http://dev.mysql.com/doc/refman/5.5/en/explain.html
I think EXPLAIN only works on SELECT statements, so try:
explain select * from scheduletask where nextrun < date_sub( now(), interval 2 minute ) and enabled = 1 and active = 1;
Maybe if you use nextrun = ..., it will match the key IDX_NEXTRUN. Your WHERE clause has to use one of your keys: scheduletaskid, taskname or nextrun.
Sorry for the short answer but I don't have time to write a complete solution.
I believe you can fix your issue by saving date_sub( now(), interval 2 minute ) in a temporary variable before using it in the query, see here maybe: MySql How to set a local variable in an update statement (Syntax?).
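A sketch of that approach, putting the cutoff into a user variable first:

SET @cutoff := DATE_SUB(NOW(), INTERVAL 2 MINUTE);

UPDATE scheduletask
SET active = 0
WHERE nextrun < @cutoff
  AND enabled = 1
  AND active = 1;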

Speed up this MySQL query

I have a query which is getting slower and slower because there are more and more records in my table. So I'm trying to speed things up.
Database size:
Records: 1,200,000
Data: 22.9 MiB
Index: 46.8 MiB
Total: 69.7 MiB
The purpose of the query is to count the number of records that match the conditions. The conditions are a date (the current date) and a status number. See the query below:
SELECT
COUNT(id) AS total
FROM
order_process
WHERE
DATE(datetime) = CURDATE() AND
status = '7';
At the moment, this query takes 800 ms, and I need to run it multiple times with different dates. These are all in the same script, so the script execution is going over 3 seconds at the moment. How can I speed this up?
What have I already done:
Created indexes (indexes on status and datetime; neither speeds up the query).
Tested the InnoDB engine (it is slower; this table is mostly read).
To make it complete, below the current table setup.
CREATE TABLE IF NOT EXISTS `order_process` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`remark` text NOT NULL,
PRIMARY KEY (`id`),
KEY `orderid` (`order_id`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When you use the DATE() function on a timestamp/datetime column, the query can't use the index even if the column is indexed.
So you need to construct the query as
where
datetime >= concat(CURDATE(),' 00:00:00')
and datetime <= concat(CURDATE(),' 23:59:59')
and status = '7'
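Putting that together with the original query; adding a composite index so the whole WHERE can be satisfied from one index is an extra assumption here:

ALTER TABLE order_process ADD KEY idx_status_datetime (status, `datetime`);

SELECT COUNT(id) AS total
FROM order_process
WHERE status = '7'
  AND `datetime` >= CONCAT(CURDATE(), ' 00:00:00')
  AND `datetime` <= CONCAT(CURDATE(), ' 23:59:59');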