Calculate the average date difference - mysql

This is the essential setup of the table (only the DDL for the relevant columns is shown). The MySQL version is 8.0.15.
The goal is to show the average interval between a customer's orders.
CREATE TABLE final (
prim_id INT(11) NOT NULL AUTO_INCREMENT,
order_ID INT(11) NOT NULL,
cust_ID VARCHAR(45) NOT NULL,
created_at DATETIME NOT NULL,
item_name VARCHAR(255) NOT NULL,
cust_name VARCHAR(255) NOT NULL,
PRIMARY KEY (prim_id)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=145699;
Additional information:
cust_ID -> cust_name (one-to-many)
cust_ID -> order_ID (one-to-many)
order_ID -> item_name (one-to-many)
order_ID -> created_at (one-to-one)
prim_id -> *everything* (one-to-many)
I've thought of using min(created_at) and max(created_at), but that would ignore all the orders between the oldest and the newest. I need a more refined solution.
The end result should be like this:
The average time interval, measured in days, between all of a customer's orders (not just between the first and the last, because customers quite often have more than two), next to a column showing the client's name (cust_name).

If I get this right, you can use a subquery to get the date of the previous order. Use datediff() to get the difference between the dates and avg() to get the average of those differences.
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM final f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM final f1
GROUP BY f1.cust_id;
Edit:
If there can be more than one row per order ID, as KIKO Software mentioned, we need to select from the distinct set of orders, like this:
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM (SELECT DISTINCT f3.cust_id,
f3.created_at,
f3.order_id
FROM final f3) f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM (SELECT DISTINCT f3.cust_id,
f3.created_at,
f3.order_id
FROM final f3) f1
GROUP BY f1.cust_id;
This may fail if there can be two rows for an order with different customer IDs or different creation time stamps. But in that case the data is just complete garbage and needs to be corrected before anything else.
2nd Edit:
Or alternatively getting the maximum creation timestamp per order if these can differ:
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM (SELECT max(f3.cust_id) cust_id,
max(f3.created_at) created_at,
f3.order_id
FROM final f3
GROUP BY f3.order_id) f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM (SELECT max(f3.cust_id) cust_id,
max(f3.created_at) created_at,
f3.order_id
FROM final f3
GROUP BY f3.order_id) f1
GROUP BY f1.cust_id;
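A side note on the min/max idea from the question: consecutive differences telescope, so the average gap between a customer's n orders equals datediff(newest, oldest) / (n - 1), and the orders in between are not actually lost. The sketch below checks this on invented rows. It uses Python's sqlite3 module as a stand-in for MySQL, so julianday(date(...)) substitutes for datediff(); the orders view and the sample data are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows. Order 1 has two items, hence two rows.
conn.executescript("""
CREATE TABLE final (order_ID INTEGER, cust_ID TEXT, created_at TEXT, item_name TEXT);
INSERT INTO final VALUES
  (1, 'A', '2024-01-01 09:00:00', 'pen'),
  (1, 'A', '2024-01-01 09:00:00', 'ink'),
  (2, 'A', '2024-01-11 15:30:00', 'pad'),
  (3, 'A', '2024-01-31 08:00:00', 'pen'),
  (4, 'B', '2024-02-01 10:00:00', 'cup'),
  (5, 'B', '2024-02-04 09:00:00', 'mug');
CREATE VIEW orders AS SELECT DISTINCT cust_ID, order_ID, created_at FROM final;
""")

# Average of the gaps to the previous order (correlated subquery, as in the answer).
by_subquery = dict(conn.execute("""
SELECT f1.cust_ID,
       AVG(julianday(date(f1.created_at)) - julianday(date(
           (SELECT f2.created_at
            FROM orders f2
            WHERE f2.cust_ID = f1.cust_ID
              AND (f2.created_at < f1.created_at
                   OR (f2.created_at = f1.created_at AND f2.order_ID < f1.order_ID))
            ORDER BY f2.created_at DESC, f2.order_ID DESC
            LIMIT 1))))
FROM orders f1
GROUP BY f1.cust_ID
"""))

# Telescoped form: (newest - oldest) / (number of orders - 1).
telescoped = dict(conn.execute("""
SELECT cust_ID,
       (julianday(date(MAX(created_at))) - julianday(date(MIN(created_at))))
       / (COUNT(*) - 1.0)
FROM orders
GROUP BY cust_ID
HAVING COUNT(*) > 1
"""))

print(by_subquery)
print(telescoped)
```

Customers with a single order have no gap at all: the subquery form yields NULL for them, while the telescoped form filters them out with HAVING.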

Related

Deleting rows using a limit and offset without using IN clause

I want to delete rows with an offset, so I am forced to use a nested query since an offset is not supported in a plain DELETE statement.
I know this would work (ID is the primary key):
DELETE FROM customers
WHERE ID IN (
SELECT ID
FROM customers
WHERE register_date > '2012-01-01'
ORDER BY register_date ASC
LIMIT 5, 10
);
However, this is unsupported in my version as I get the error
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.
Server version: 10.4.22-MariaDB
What can I do to achieve the same result as above in a way that my version supports?
CREATE TABLE customers (
ID INT PRIMARY KEY AUTO_INCREMENT,
NAME VARCHAR(32) NOT NULL,
REGISTER_DATE DATETIME NOT NULL
);
Join the table to a subquery that uses ROW_NUMBER() window function to sort the rows and filter the rows that you want to be deleted:
DELETE c
FROM customers c
INNER JOIN (
SELECT *, ROW_NUMBER() OVER (ORDER BY register_date) rn
FROM customers
WHERE register_date > '2012-01-01'
) t ON t.ID = c.ID
WHERE t.rn > 5 AND t.rn <= 15; -- get 10 rows with row numbers 6 - 15
See the demo.
If I did not miss something a simple delete with join will do the job...
delete customers
from (select *
from customers
WHERE register_date > '2012-01-01'
order by register_date asc
limit 5, 10) customers2
join customers on customers.id = customers2.id
Here is a demo for your version of MariaDB
You could try assigning a rank to your rows with the ROW_NUMBER window function, then pick the rows whose rank is between 6 and 15 (the ten rows that LIMIT 5, 10 would return).
DELETE FROM customers
WHERE ID IN (
SELECT ID
FROM (SELECT ID,
ROW_NUMBER() OVER(
-- rows matching the date filter sort first; assumes at least 15 rows match
ORDER BY IF(register_date>'2012-01-01', 0, 1),
register_date ) AS rn
FROM customers) ranked_ids
WHERE rn > 5
AND rn < 16
);
This avoids LIMIT while achieving the same result.
EDIT. Doing it with a join.
DELETE c
FROM customers c
INNER JOIN (SELECT ID,
ROW_NUMBER() OVER(
ORDER BY IF(register_date>'2012-01-01', 0, 1),
register_date ) AS rn
FROM customers) ids_to_delete
ON c.ID = ids_to_delete.ID
AND ids_to_delete.rn > 5
AND ids_to_delete.rn < 16;
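To see how row-number bounds map onto LIMIT 5, 10 (offset 5, count 10, so row numbers 6 through 15), here is a small sketch. It uses Python's sqlite3 module as a stand-in for MariaDB and assumes the bundled SQLite is 3.25 or newer (the release that added window functions). SQLite has no multi-table DELETE, so the sketch filters with IN instead of a join; the sample rows are invented.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (ID INTEGER PRIMARY KEY, NAME TEXT, REGISTER_DATE TEXT)"
)
# Hypothetical data: IDs 1..20 registered on consecutive days, 2011-12-29 onward.
start = date(2011, 12, 28)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, f"c{i}", (start + timedelta(days=i)).isoformat()) for i in range(1, 21)],
)

offset, count = 5, 10

# What the LIMIT/OFFSET subquery from the question would select.
limit_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM customers WHERE register_date > '2012-01-01' "
    "ORDER BY register_date LIMIT ? OFFSET ?", (count, offset))]

# Same rows expressed as row numbers: offset < rn <= offset + count.
conn.execute("""
DELETE FROM customers
WHERE ID IN (
    SELECT ID
    FROM (SELECT ID, ROW_NUMBER() OVER (ORDER BY register_date) AS rn
          FROM customers
          WHERE register_date > '2012-01-01') AS ranked
    WHERE rn > ? AND rn <= ?)
""", (offset, offset + count))

remaining_ids = [r[0] for r in conn.execute("SELECT ID FROM customers ORDER BY ID")]
print(limit_ids)
print(remaining_ids)
```

The rows removed by the row-number filter are exactly the ones the LIMIT/OFFSET query selected beforehand.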

Get Data According to Group by date field

Here is my table.
It has a field type, where 1 means income and 2 means expense.
The requirement: for example, two transactions were made on 2-10-2018, so I want the data as follows.
Expected Output
id created_date total_amount
1 1-10-18 10
2 2-10-18 20 (the sum of all income transactions made on the 2nd)
3 3-10-18 10
and so on...
It should return a new field that contains only the income transactions made on a particular day.
What I have tried is
SELECT * FROM `transaction`WHERE type = 1 ORDER BY created_date ASC
UNION
SELECT()
//But it wont work
SELECT created_date,amount,status FROM
(
SELECT COUNT(amount) AS totalTrans FROM transaction WHERE created_date = created_date
) x
transaction
You can also see the schema here: http://sqlfiddle.com/#!9/6983b9
You can Count() the total number of expense transactions using the conditional function If(), grouped by created_date.
Similarly, you can Sum() the expense amounts using If(), per created_date.
Try the following:
SELECT
`created_date`,
SUM(IF (`type` = 2, `amount`, 0)) AS total_expense_amount,
COUNT(IF (`type` = 2, `id`, NULL)) AS expense_count
FROM
`transaction`
GROUP BY `created_date`
ORDER BY `created_date` ASC
Do you just want a WHERE clause?
SELECT t.created_date, SUM(amount) as total_amount
FROM transaction t
WHERE type = 2
GROUP BY t.created_date
ORDER BY created_date ASC ;
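As a quick check of the conditional-aggregation idea, the sketch below computes per-day income (type 1) and expense (type 2) totals side by side. It runs on Python's sqlite3 module rather than MySQL, so CASE WHEN stands in for IF(); the rows are invented to resemble the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE `transaction` ("
    "id INTEGER PRIMARY KEY, created_date TEXT, type INTEGER, amount INTEGER)"
)
# Hypothetical rows: type 1 = income, type 2 = expense.
conn.executemany(
    "INSERT INTO `transaction` VALUES (?, ?, ?, ?)",
    [
        (1, "2018-10-01", 1, 10),
        (2, "2018-10-02", 1, 5),
        (3, "2018-10-02", 1, 15),
        (4, "2018-10-02", 2, 7),
        (5, "2018-10-03", 1, 10),
    ],
)

# One output row per day; each SUM only counts rows of the matching type.
rows = conn.execute("""
SELECT created_date,
       SUM(CASE WHEN type = 1 THEN amount ELSE 0 END) AS total_income,
       SUM(CASE WHEN type = 2 THEN amount ELSE 0 END) AS total_expense
FROM `transaction`
GROUP BY created_date
ORDER BY created_date
""").fetchall()

for row in rows:
    print(row)
```

Filtering with WHERE, as in the second answer, is simpler when only one type is needed; the conditional form is useful when both totals should appear in the same row.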

How to speed up a very slow MySQL query?

I have a very slow MySQL query which has become basically unusable since the table grew to over 5000 entries. It takes more than 30 seconds, so the server returns an error code and quits.
The query is:
SELECT
id,
user_id,
date
FROM
table
WHERE
id IN (
SELECT
MAX(id)
FROM
table
GROUP BY date
)
AND
company_id = '1'
AND
date > '1473700785'
AND
complete = '1'
AND
name = "random string"
ORDER BY id ASC
Structure:
id - int(11)
user_id - int(10)
company_id - int(11)
date - varchar(20)
complete - varchar(2)
name - varchar(75)
Do you have any idea what could be slowing it? It used to function as expected with a much smaller table size (under 1000 entries).
Apart from rewriting the subquery (as below), the best method is indexing, as most people here suggested.
SELECT min.id, min.user_id, min.date
FROM table min
-- joins to a derived table sometimes run faster than IN / NOT IN
JOIN (
SELECT MAX(id) AS id
FROM table
GROUP BY date
)
max on max.id = min.id
WHERE min.company_id = '1'
AND min.date > '1473700785'
AND min.complete = '1'
AND min.name = "random string"
ORDER BY min.id ASC
First, you need an index on the date field.
You should also store date as an integer, because you compare it as a number in this expression:
date > '1473700785'
Indexing is good, but I don't see the need for a SUB-SELECT
SELECT
MAX(t.id) as id,
u.user_id,
t.date
FROM table t
JOIN table u ON u.id=MAX(t.id )
WHERE
t.company_id = '1'
AND
t.date > '1473700785'
AND
t. complete = '1'
AND
t.name = "random string"
GROUP BY t.date
ORDER BY t.id ASC
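The advice to store date as an integer matters for correctness as well as speed: a text column compared with a text literal is compared character by character, not numerically. A minimal sketch of the difference, using Python's sqlite3 module as a stand-in (MySQL likewise compares a varchar column with a string literal as strings); the table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, date_text TEXT, date_int INTEGER)"
)
# The same three timestamps stored both as text and as integers.
conn.executemany(
    "INSERT INTO t (date_text, date_int) VALUES (?, ?)",
    [("999", 999), ("1473700000", 1473700000), ("1473800000", 1473800000)],
)

# Text comparison: '999' sorts after '1473700785' because '9' > '1'.
as_text = [r[0] for r in conn.execute(
    "SELECT date_int FROM t WHERE date_text > '1473700785' ORDER BY id")]

# Numeric comparison: only the genuinely later timestamp qualifies.
as_int = [r[0] for r in conn.execute(
    "SELECT date_int FROM t WHERE date_int > 1473700785 ORDER BY id")]

print(as_text)
print(as_int)
```

With timestamps that all have the same number of digits the two comparisons agree, which is why the problem can go unnoticed.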

Get rid of the subqueries for the sake of sorting grouped data

Tables
CREATE TABLE `aircrafts_in` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city_from` int(11) NOT NULL COMMENT 'From',
`city_to` int(11) NOT NULL COMMENT 'To',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 COMMENT='Aircraft by route'
CREATE TABLE `aircrafts_in_parsed_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`price` int(11) NOT NULL COMMENT 'Price',
`airline` varchar(255) NOT NULL COMMENT 'Airline',
`date` date NOT NULL COMMENT 'Departure date',
`info_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `info_id` (`info_id`),
KEY `price` (`price`),
KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=940682 DEFAULT CHARSET=utf8
date - departure date
CREATE TABLE `aircrafts_in_parsed_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('success','error') DEFAULT NULL,
`type` enum('roundtrip','oneway') NOT NULL,
`date` datetime NOT NULL COMMENT 'Parse date',
`aircrafts_in_id` int(11) DEFAULT NULL COMMENT 'Route ID',
PRIMARY KEY (`id`),
KEY `aircrafts_in_id` (`aircrafts_in_id`)
) ENGINE=InnoDB AUTO_INCREMENT=577759 DEFAULT CHARSET=utf8
date - creation date (when the data was parsed)
Task
Get the lowest ticket price and its departure date for each month. Note that the minimum must be the current (up-to-date) price, not just the lowest value on record. If several dates share the minimum price, we need the first one.
My solution
I think there's something not quite right here.
I don't like using subqueries for grouping. How can I solve this problem?
select *
from (
select * from (
select airline,
price,
pdata.`date` as `date`
from aircrafts_in_parsed_data `pdata`
inner join aircrafts_in_parsed_info `pinfo`
on pdata.`info_id` = pinfo.`id`
where pinfo.`aircrafts_in_id` = {$id}
and pinfo.status = 'success'
and pinfo.`type` = 'roundtrip'
and `price` <> 0
group by pdata.`date`, year(pinfo.`date`) desc, month(pinfo.`date`) desc, day(pinfo.`date`) desc
) base
group by `date`
order by price, year(`date`) desc, month(`date`) desc, day(`date`) asc
) minpriceperdate
group by year(`date`) desc, month(`date`) desc
It takes 0.015 s without cache; the table sizes can be seen from the AUTO_INCREMENT values.
SELECT MIN(price) AS min_price,
LEFT(date, 7) AS yyyy_mm
FROM aircrafts_in_parsed_data
GROUP BY LEFT(date, 7)
will get the lowest price for each month. But it can't say 'first'.
From my groupwise-max cheat-sheet, I derive this:
SELECT
yyyy_mm, date, price, airline -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT LEFT(date, 7) != @prev AS first,
@prev := LEFT(date, 7),
LEFT(date, 7) AS yyyy_mm, date, price, airline
FROM aircrafts_in_parsed_data
ORDER BY
LEFT(date, 7), -- The 'GROUP BY'
price ASC, -- ASC to do "MIN()"
date -- To get the 'first' if there are dup prices for a month
) x
WHERE first -- extract only the first of the lowest price for each month
ORDER BY yyyy_mm; -- Whatever you like
Sorry, but subqueries are necessary. (I avoided YEAR(), MONTH(), and DAY().)
You are right, your query is not correct.
Let's start with the innermost query: You group by pdata.date + pinfo.date, so you get one result row per date combination. As you don't specify which price or airline you are interested in for each date combination (such as MAX(airline) and MIN(price)), you get one airline arbitrarily chosen for a date combination and one price also arbitrarily chosen. These don't even have to belong to the same record in the table; the DBMS is free to choose one airline and one price matching the dates. Well, maybe the date combination of pdata.date and pinfo.date is already unique, but then you wouldn't have to group by at all. So however we look at this, this isn't proper.
In the next query you group by pdata.date only, thus again getting arbitrary matches for airline and price. You could have done that in the innermost query already. It makes no sense to say: "give me a randomly picked price per pdata.date and pinfo.date and from these give me a randomly picked price per pdata.date", you could just as well say it directly: "give me a randomly picked price per pdata.date". Then you order your result rows. This is completely useless, as you are using the results as a subquery (derived table) again, and a derived table is considered an unordered set. So the ORDER BY gives the DBMS more work to do, but is in no way guaranteed to influence the main query's results.
In your main query then you group by year and month, again resulting in arbitrarily picked values.
Here is the same query a tad shorter and cleaner:
select
pdata.airline, -- some arbitrarily chosen airline matching year and month
pdata.price, -- some arbitrarily chosen price matching year and month
pdata.date -- some arbitrarily chosen date matching year and month
from aircrafts_in_parsed_data pdata
inner join aircrafts_in_parsed_info pinfo on pdata.info_id = pinfo.id
where pinfo.aircrafts_in_id = {$id}
and pinfo.status = 'success'
and pinfo.type = 'roundtrip'
and pdata.price <> 0
group by year(pdata.date), month(pdata.date)
order by year(pdata.date) desc, month(pdata.date) desc
As to the original task (as far as I understand it): Find the records with the lowest price per month. Per month means GROUP BY month. The lowest price is MIN(price).
select
min_price_record.departure_year,
min_price_record.departure_month,
min_price_record.min_price,
full_record.departure_date,
full_record.airline
from
(
select
year(`date`) as departure_year,
month(`date`) as departure_month,
min(price) as min_price
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
group by year(`date`), month(`date`)
) min_price_record
join
(
select
`date` as departure_date,
year(`date`) as departure_year,
month(`date`) as departure_month,
price,
airline
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
) full_record on full_record.departure_year = min_price_record.departure_year
and full_record.departure_month = min_price_record.departure_month
and full_record.price = min_price_record.min_price
order by
min_price_record.departure_year desc,
min_price_record.departure_month desc;
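On servers that support window functions (MySQL 8.0+ or MariaDB 10.2+), the "first date among ties" requirement can also be handled with ROW_NUMBER(). This is a different technique from the answers above, shown only as a sketch: it runs on Python's sqlite3 module (assuming SQLite 3.25 or newer), uses substr() where MySQL would use LEFT(), leaves out the join to aircrafts_in_parsed_info for brevity, and works on invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aircrafts_in_parsed_data ("
    "id INTEGER PRIMARY KEY, price INTEGER, airline TEXT, date TEXT)"
)
# Hypothetical rows: January has two dates tied at the lowest price.
conn.executemany(
    "INSERT INTO aircrafts_in_parsed_data (price, airline, date) VALUES (?, ?, ?)",
    [
        (100, "AA", "2024-01-05"),
        (80, "BB", "2024-01-10"),
        (80, "CC", "2024-01-20"),
        (120, "AA", "2024-02-03"),
        (90, "DD", "2024-02-14"),
    ],
)

# Rank rows within each month by price, then by date, and keep rank 1.
rows = conn.execute("""
SELECT yyyy_mm, date, price, airline
FROM (SELECT substr(date, 1, 7) AS yyyy_mm, date, price, airline,
             ROW_NUMBER() OVER (PARTITION BY substr(date, 1, 7)
                                ORDER BY price ASC, date ASC) AS rn
      FROM aircrafts_in_parsed_data
      WHERE price <> 0) ranked
WHERE rn = 1
ORDER BY yyyy_mm
""").fetchall()

for row in rows:
    print(row)
```

The second sort key (date) is what picks the earliest departure when several dates share the month's minimum price.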

MYSQL Query : How to get values per category?

I have a huge table with millions of records that stores stock values by timestamp. The structure is as below:
Stock, timestamp, value
goog,1112345,200.4
goog,112346,220.4
Apple,112343,505
Apple,112346,550
I would like to query this table by timestamp. If the timestamp matches, all corresponding stock records should be returned; if a stock has no record for that timestamp, its immediately preceding record should be returned. In the above example, if I query by timestamp=1112345, the query should return 2 records:
goog,1112345,200.4
Apple,112343,505 (immediate previous record)
I have tried several different ways to write this query without success, and I'm sure I'm missing something. Can someone help, please?
(SELECT `Stock`, `timestamp`, `value`
FROM `myTable`
WHERE `timestamp` = 1112345)
UNION ALL
(SELECT `Stock`, `timestamp`, `value`
FROM `myTable`
WHERE `timestamp` < 1112345
ORDER BY `timestamp` DESC
LIMIT 1)
select Stock, timestamp, value from thisTbl where timestamp = ? and fill in timestamp to whatever it should be? Your demo query is available on this fiddle
I don't think there is an easy way to do this query. Here is one approach:
select tprev.*
from (select s.stock,
(select t.timestamp from t where t.stock = s.stock and t.timestamp <= <whatever> order by t.timestamp desc limit 1
) as prevtimestamp
from (select distinct stock
from t
) s
) s join
t tprev
on tprev.timestamp = s.prevtimestamp and tprev.stock = s.stock
This is getting the previous or equal timestamp for the record and then joining it back in. If you have indexes on (stock, timestamp) then this may be rather fast.
Another phrasing of it uses group by:
select tprev.*
from (select t.stock,
max(timestamp) as prevtimestamp
from t
where timestamp <= YOURTIMESTAMP
group by t.stock
) s join
t tprev
on tprev.timestamp = s.prevtimestamp and tprev.stock = s.stock
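The group-by phrasing can be tried out on a few rows. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the table name t follows the answer, and the rows are hypothetical values adapted from the question so that one stock has an exact match and the other only an earlier record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stock TEXT, timestamp INTEGER, value REAL)")
# Hypothetical rows: goog has a record at 112345, Apple only before and after it.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("goog", 112345, 200.4),
        ("goog", 112346, 220.4),
        ("Apple", 112343, 505),
        ("Apple", 112346, 550),
    ],
)

wanted = 112345

# Step 1: per stock, the latest timestamp at or before the requested one.
# Step 2: join back to the table to fetch the full record for that timestamp.
rows = conn.execute("""
SELECT tprev.stock, tprev.timestamp, tprev.value
FROM (SELECT stock, MAX(timestamp) AS prevtimestamp
      FROM t
      WHERE timestamp <= ?
      GROUP BY stock) s
JOIN t tprev
  ON tprev.stock = s.stock AND tprev.timestamp = s.prevtimestamp
ORDER BY tprev.stock
""", (wanted,)).fetchall()

for row in rows:
    print(row)
```

An index on (stock, timestamp) serves both the grouped MAX() lookup and the join back to the table.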