MySQL - Select unique column with max timestamp - mysql

I'm trying to prepare a query and I'm having a hard time with it. I need some MySQL gurus to help please...
Take the following table as an example...
CREATE TABLE order_revision (
id int(11) NOT NULL,
parent_order_id int(11) NOT NULL,
user_id int(11) DEFAULT NULL,
sub_total decimal(19,4) NOT NULL DEFAULT '0.0000',
tax_total decimal(19,4) NOT NULL DEFAULT '0.0000',
status smallint(6) NOT NULL DEFAULT '1',
created_at int(11) NOT NULL,
updated_at int(11) DEFAULT NULL
)
I need a query to select all unique 'parent_order_id' with the max 'updated_at' value. This query should return all rows that have unique 'parent_order_id's based on the max timestamp of the 'updated_at' column.
In other words, each row returned should have an unique 'parent_order_id' and be the maximum timestamp of the'updated_at' column.
Basically this query would find the latest "order revision" for each "parent order"

You mean:
SELECT parent_order_id,max(updated_at) FROM order_revision GROUP BY parent_order_id
For MySQL, the GROUP BY-clause isn't even necessary, nevertheless I would include it for clarification (and most other SQL-conform servers require it).

For anyone interested, this query turned out to be the one I was looking for...
SELECT main.*
FROM order_revision AS main
WHERE main.id = (
SELECT sub.id
FROM order_revision AS sub
WHERE main.parent_order_id = sub.parent_order_id
ORDER BY sub.updated_at DESC
LIMIT 1
);

Related

Speed up mysql SQL query but with a huge dataset

I have a table that has over 2.5 million rows and I would like to run the following SQL Statment to get the
select count(*)
from workflow
where action_name= 'Workflow'
and release_date >= '2019-12-01 13:24:22'
and release_date <= '2019-12-31 13:24:22'
AND project_name= 'Web'
group
by page_id
, headline
, release_full_name
, release_date
The problem is that it takes over 2.7 seconds to return 0 rows as expected. Is there a way to speed it up more? I have 6 more SQL Statements that are similiar so that will take almost (2.7 seconds * 6) = 17 seconds at least.
Here is my table schema
CREATE TABLE workflow (
id int(11) NOT NULL AUTO_INCREMENT,
action_name varchar(100) NOT NULL,
project_name varchar(30) NOT NULL,
page_id int(11) NOT NULL,
headline varchar(200) NOT NULL,
create_full_name varchar(200) NOT NULL,
create_date datetime NOT NULL,
change_full_name varchar(200) NOT NULL,
change_date datetime NOT NULL,
release_full_name varchar(200) NOT NULL,
release_date datetime NOT NULL,
reject_full_name varchar(200) NOT NULL,
reject_date datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2948271 DEFAULT CHARSET=latin1
What I'm looking for in this query is to get the count of the pages that were released last month. that have project_name = "web" and action_name = "Workflow"
This is bit bigger for comments
Using Group by with Count function doesn't make any sense. Usually you need to count actual rows in DB not after aggregation. Not sure if this is your actual requirement reason being GROUP BY causes slowness of the query.
Use composite Index on (Web, start_date) as column project seems highest selective.
For other information, Please share the explain plan.
Assuming that you need counts for groups (you had listed), better to include the group fields in select (essentially) like
select page_id, headline, release_full_name, release_date, count(*)
from ...
Adding an index with (page_id, headline) would optimize well.

Improve query speed suggestions

For self education I am developing an invoicing system for an electricity company. I have multiple time series tables, with different intervals. One table represents consumption, two others represent prices. A third price table should be still incorporated. Now I am running calculation queries, but the queries are slow. I would like to improve the query speed, especially since this is only the beginning calculations and the queries will only become more complicated. Also please note that this is my first database i created and exercises I have done. A simplified explanation is preferred. Thanks for any help provided.
I have indexed: DATE, PERIOD_FROM, PERIOD_UNTIL in each table. This speed up the process from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, sum(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
and i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
and i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
group by i.DATE
I have run the query with explain and get the following result: Select_type, all simple partitions all null possible keys a,p = null i = DATE Key a,p = null i = DATE key_len a,p = null i = 8 ref a,p = null i = timeseries.a.DATE,timeseries.p.PERIOD_FROM rows a = 28727 p = 61038 i = 1 filtered a = 100 p = 10 i = 100 a extra: using where using temporary using filesort b extra: using where using join buffer (block nested loop) c extra: null
Preferably I run a more complicated query for a whole year and group by month for example with all price tables incorporated. However, this would be too slow. I have indexed: DATE, PERIOD_FROM, PERIOD_UNTIL in each table. The calculation result may not be changed, in this case quarter hourly consumption of two meters multiplied by hourly prices.
"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real-life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.

Connecting two tables and having count, distinct, and average of time difference

First, I apologize if my question is not correctly organized.
I am trying to run an SQL Query in Java in order to return all the records of time difference. So to explain more:
I have two tables. Table A has the following structure:
Table `A` (
`interaction_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`job_id` int(11) NOT NULL,
`task_id` varchar(250) NOT NULL,
`task_time` datetime DEFAULT NULL,
`task_assessment` float DEFAULT NULL,
)
Table `B` (
`task_id` varchar(250) NOT NULL,
`task_type` varchar(250) DEFAULT NULL,
`task_weight` float DEFAULT NULL,
`task_due` datetime DEFAULT NULL,
`Job_id` int(11) NOT NULL
)
what I need is to get the count(distinct) from table A -and I do that using the interaction_id
and then get their times -using the task_time for each user and i use "WHERE user_id='" + userId (a java parameter).
After that I want to link Table A with Table B using Job_id
so that I can get the difference date (in hour, so i used SELECT TIMEDIFF(Hour, A(task_time), B(task_due)).
Finally, i need to get Average of the time difference.
I believe its a bit complicated when describing. But, I would appreciate your advanced help!
Thank you very much
This query should gather the results that you are expecting:
select count(*) as countLines,
avg(time_to_sec(timediff(A.task_time, B.task_due)) / 3600)
from A
inner join B on A.job_id = B.job_id
where A.user_id = #userId

MySQL group by desc -how to explain this sql: SELECT ts,avg_time FROM xx WHERE ip="MAX" GROUP BY id DESC LIMIT 300?

table is like:
CREATE TABLE `api_stats` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip` varchar(30) DEFAULT NULL,
`app_name` varchar(50) DEFAULT NULL,
`api_name` varchar(100) DEFAULT NULL,
`avg_time` float(10,5) DEFAULT NULL,
`ok` int(10) DEFAULT NULL,
`err` int(10) DEFAULT NULL,
`ts` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6741231 DEFAULT CHARSET=latin1
this weird sql is:
SELECT ts,avg_time FROM api_stats WHERE ip="MAX" GROUP BY id DESC LIMIT 300
seems wrong, but it runs...
my questions:
select columns is not in aggregate function like sum, count
group by id desc is what?
The query is a bit dumb.
It selects all records with ip = 'MAX' (yes, single quotes should be used for string literals).
It groups the results per ID, which doesn't change anything because this is the primary key.
It limits the results to 300 arbitrary rows, as there is no ORDER BY clause. (In older versions GROUP BY was guaranteed to also sort in MySQL, which is why the DESC keyword is allowed for GROUP BY, which otherwise wouldn't make any sense. So this may have been valid one time and should nowadays read GROUP BY id ORDER BY id DESC LIMIT 300 instead.)
It shows non-aggregated ts, avg_time, but well, as mentioned no aggregation takes ever place in this query anyway.
Maybe it's just a typo and GROUP BY id was meant to be ORDER BY id really, which would make the query perfectly valid (aside from the non-standard quotes that are valid in MySQL however).

MySQL select optimization

A table with a few Million rows, something like this:
my_table (
`CONTVISITID` bigint(20) NOT NULL AUTO_INCREMENT,
`NODE_ID` bigint(20) DEFAULT NULL,
`CONT_ID` bigint(20) DEFAULT NULL,
`NODE_NAME` varchar(50) DEFAULT NULL,
`CONT_NAME` varchar(100) DEFAULT NULL,
`CREATE_TIME` datetime DEFAULT NULL,
`HITS` bigint(20) DEFAULT NULL,
`UPDATE_TIME` datetime DEFAULT NULL,
`CLIENT_TYPE` varchar(20) DEFAULT NULL,
`TYPE` bigint(1) DEFAULT NULL,
`PLAY_TIMES` bigint(20) DEFAULT NULL,
`FIRST_PUBLISH_TIME` bigint(20) DEFAULT NULL,
PRIMARY KEY (`CONTVISITID`),
KEY `cont_visit_contid` (`CONT_ID`),
KEY `cont_visit_createtime` (`CREATE_TIME`),
KEY `cont_visit_publishtime` (`FIRST_PUBLISH_TIME`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=57676834 DEFAULT CHARSET=utf8
I had a query that I have managed to optimize to the following departing from a flat select:
SELECT a.cont_id, SUM(a.hits)
FROM (
SELECT cont_id,hits,type,first_publish_time
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time>1398310263000
AND type=1) as a group by a.cont_id
order by sum(HITS) DESC LIMIT 10;
Can this be further optimized?
Edit:
I started with a FLAT select like I mentioned before, what I mean by flat select not to have a composite select like my current one. Instead of the single select that someone responded with. A single select is twice slower, so not viable in my case.
Edit2: I have a DBA friend who suggested me to change the query to this:
SELECT a.cont_id, SUM(a.hits)
FROM (
SELECT cont_id,hits
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time>1398310263000
AND type=1) as a group by a.cont_id
order by sum(HITS) DESC LIMIT 10;
As I do not need the fields extra (type,first_publish_time) and the TMP table is smaller, this makes the query faster about about 1/4 total time of the fastest version I have. He also suggested to add a composite index between (create_time, cont_id, hits). He says with this index I will get really good performance, but I have not done that as this is a production DB and the alter might affect replication. I will post results once done.
INDEX(type, first_publish_time)
INDEX(type, create_time)
Then do
SELECT cont_id, SUM(hits) AS tot_hits
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time > 1398310263000
AND type = 1
group by cont_id
order by tot_hits DESC
LIMIT 10;
Start the index with any = filters (type, in this case); then you get one chance to us a range.
The reason for 2 indexes -- The Optimizer will look at statistics and decide which look better based on the values given.
Consider shrinking the BIGINTs (8 bytes) to some smaller INT type. Saving space will help speed, especially if the table is too big to be cached.
For further discussion, please provide EXPLAIN SELECT ...;.