How to add calculated column with LAG in SQL? - mysql

I have a MySQL table (version 8.0.26) with stock prices and want to calculate the log price change for future analysis. Here's my table and data.
CREATE TABLE `prices` (
`ticker` varchar(7) NOT NULL,
`date` datetime NOT NULL,
`price` double DEFAULT NULL,
PRIMARY KEY (`ticker`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `sandbox`.`prices` (`ticker`, `date`, `price`) VALUES ('A', '2021-01-01', '10');
INSERT INTO `sandbox`.`prices` (`ticker`, `date`, `price`) VALUES ('A', '2021-01-02', '10.1');
INSERT INTO `sandbox`.`prices` (`ticker`, `date`, `price`) VALUES ('A', '2021-01-03', '11');
INSERT INTO `sandbox`.`prices` (`ticker`, `date`, `price`) VALUES ('B', '2021-01-01', '50');
INSERT INTO `sandbox`.`prices` (`ticker`, `date`, `price`) VALUES ('B', '2021-01-02', '51.5');
INSERT INTO `sandbox`.`prices` (`ticker`, `date`, `price`) VALUES ('B', '2021-01-03', '49');
I can write this query but the column isn't saved.
SELECT *, LN(price / LAG(price, 1) OVER (PARTITION BY ticker ORDER BY date)) AS ln_open_return FROM sandbox.prices;
I put together this code from these answers but I'm still getting a "1064 syntax error: 'WITH' is not valid at this position. Expecting an expression."
ALTER TABLE sandbox.prices
ADD COLUMN ln_change DOUBLE AS (
WITH temp AS (
SELECT
*,
LAG(price, 1) OVER(PARTITION BY ticker ORDER BY date) AS prior
FROM sandbox.prices
)
SELECT
*,
COALESCE(LN(price / prior)) AS ln_change
FROM temp) PERSISTED;

Calculated columns (aka computed columns, aka generated columns) in a TABLE (as in CREATE TABLE or ALTER TABLE) cannot contain queries; they can only be expressions derived from other columns in the same row.
https://dev.mysql.com/doc/refman/8.0/en/create-table-generated-columns.html
Values of a generated column are computed from an expression included in the column definition.
Generated column expressions must adhere to the following rules:
[...]
Subqueries are not permitted.
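For contrast, a generated column that reads only from its own row is accepted. A minimal sketch, assuming a hypothetical column name `ln_price` and strictly positive prices: it stores LN(price) per row, and the change is still computed at query time, because LN(p_t / p_(t-1)) = LN(p_t) - LN(p_(t-1)).

```sql
-- Allowed: the expression uses only `price` from the same row.
-- `ln_price` is a hypothetical name; LN(NULL) yields NULL.
ALTER TABLE sandbox.prices
  ADD COLUMN ln_price DOUBLE AS (LN(price)) STORED;

-- The row-to-row difference still needs a window function at query time.
SELECT ticker, `date`, price,
       ln_price - LAG(ln_price, 1) OVER (PARTITION BY ticker ORDER BY `date`) AS ln_change
FROM sandbox.prices
ORDER BY ticker, `date`;
```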
Instead, you can do this using a VIEW. Your application code or reports would then query the view (prices_with_delta), not the base table (prices):
CREATE VIEW sandbox.prices_with_delta AS
SELECT
p2.*,
COALESCE( LN( p2.price / p2.prior ) ) AS ln_change
FROM
(
SELECT
p.*,
LAG( p.price, 1 ) OVER( PARTITION BY p.ticker ORDER BY p.date ) AS prior
FROM
sandbox.prices AS p
) AS p2
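If the values really must be saved in the table (the asker noted the column isn't saved), one option is a plain column filled by a multi-table UPDATE joined to a windowed derived table. This is a sketch assuming MySQL 8.0; unlike the view, it is a snapshot and has to be re-run after new prices are inserted.

```sql
-- A plain (non-generated) column to hold the stored result.
ALTER TABLE sandbox.prices ADD COLUMN ln_change DOUBLE NULL;

-- Derived table d computes the log change per (ticker, date). Window functions
-- prevent it from being merged into the outer query, so it is materialized
-- first and the UPDATE can read from the same table it modifies.
UPDATE sandbox.prices AS p
JOIN (
  SELECT ticker, `date`,
         LN(price / LAG(price, 1) OVER (PARTITION BY ticker ORDER BY `date`)) AS ln_change
  FROM sandbox.prices
) AS d
  ON d.ticker = p.ticker AND d.`date` = p.`date`
SET p.ln_change = d.ln_change;
-- Each ticker's first row gets NULL, because LAG() has no earlier price.
```

If safe update mode (sql_safe_updates) is on, as it is by default in MySQL Workbench, this UPDATE may be rejected until that mode is turned off for the session.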


Find all records where between 2 dates have a certain value for each record

I have a table of vacation houses with their availability per date (column value; 1 means available).
How can I find all houses (column unit_id) that are available on every day between 2 dates?
table
CREATE TABLE `houseavailability` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` varchar(100) DEFAULT NULL,
`value` varchar(100) DEFAULT NULL,
`unit_id` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `houseavailability_unit_id_IDX` (`unit_id`,`date`) USING BTREE,
KEY `houseavailability_unit_id_IDX_solo` (`unit_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=16648943 DEFAULT CHARSET=latin1;
test data
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814115, '2022-07-23', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814116, '2022-07-24', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814117, '2022-07-25', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814118, '2022-07-26', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814119, '2022-07-27', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814120, '2022-07-28', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814121, '2022-07-29', '1', '1004004');
INSERT INTO houseavailability
(id, `date`, value, unit_id)
VALUES(15814122, '2022-07-30', '0', '1004004');
attempt
SELECT houseavailability.*
FROM houseavailability
WHERE houseavailability.date BETWEEN '2022-07-23' AND '2022-07-30'
AND houseavailability.unit_id = 1004004;
http://sqlfiddle.com/#!9/094547/2
For example, find a unit_id which is available during the whole specified period.
SELECT unit_id
FROM houseavailability
WHERE date BETWEEN '2022-07-23' AND '2022-07-30'
GROUP BY unit_id
HAVING sum(value) = datediff('2022-07-30','2022-07-23') + 1;
You can use conditional aggregation in HAVING to check that every row within your date range has value = '1'.
Query 1:
SELECT unit_id
FROM houseavailability
WHERE date BETWEEN '2022-07-23' AND '2022-07-30'
GROUP BY unit_id
HAVING COUNT(DISTINCT date) = COUNT(DISTINCT CASE WHEN value = '1' THEN date END)
DISTINCT inside the aggregate functions makes each date count only once, even if a unit has duplicate rows for the same day with value 1. If you want duplicates counted separately, remove DISTINCT from the aggregate functions.
EDIT
Because there is a UNIQUE constraint on (unit_id, date), you don't need DISTINCT in the aggregate functions:
SELECT unit_id
FROM houseavailability
WHERE date BETWEEN '2022-07-23' AND '2022-07-30'
GROUP BY unit_id
HAVING COUNT(*) = COUNT(CASE WHEN value = '1' THEN date END)
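One caveat: unlike the SUM(value) = DATEDIFF(...) + 1 query above, both conditional-aggregation queries only look at rows that exist, so a unit with no row at all for some day in the range would still qualify. A sketch that also requires one row per day, assuming dates are stored as 'YYYY-MM-DD' strings as in the sample data and at most one row per unit and day (guaranteed by the UNIQUE key):

```sql
-- A unit qualifies only if it has a row for every day in the range
-- (COUNT(*) equals the number of days) and all of those rows have value = '1'.
SELECT unit_id
FROM houseavailability
WHERE `date` BETWEEN '2022-07-23' AND '2022-07-30'
GROUP BY unit_id
HAVING COUNT(*) = DATEDIFF('2022-07-30', '2022-07-23') + 1
   AND SUM(value = '1') = COUNT(*);
```

With the sample data, unit 1004004 is not returned for 2022-07-23 to 2022-07-30 (its value is 0 on the 30th), but it is returned for 2022-07-23 to 2022-07-29.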

Operand should contain 1 column(s) Or Error in sql syntax

I checked almost all related topics, but nothing works for me.
I am a beginner.
I get either "Operand should contain 1 column(s)" or "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ' ' at line 1".
Here is my query:
create database makla;
use makla;
create table orders(
order_id int auto_increment primary key,
order_date DATE
);
create table productionitem(
order_id int not null,
item_name varchar (20),
item_description varchar (100),
constraint order_fk foreign key (order_id) references orders (order_id)
);
insert into orders(order_date) values ('2014/11/4');
insert into orders(order_date) values ('2017/9/30');
insert into orders(order_date) values ('2019/4/13');
insert into productionitem(order_id, item_name, item_description)
values (1, 'tv', 'samsung X');
insert into productionitem(order_id, item_name, item_description)
values (1, 'watch', 'swatch X');
insert into productionitem(order_id, item_name, item_description)
values (2, 'pan', 'metal X');
insert into productionitem(order_id, item_name, item_description)
values (3, 'cup', 'world X');
insert into productionitem(order_id, item_name, item_description)
values (3, 'chair', 'plastic X');
select *
from productionitem
where order_id in (select order_id
from orders
where order_date between '2015/11/4' and '2020/11/4')
Please help.
You may need to put the dates in the proper yyyy-mm-dd format:
insert into orders(order_date) values ('2014-11-04');
insert into orders(order_date) values ('2017-09-30'); -- notice 09 not just 9
insert into orders(order_date) values ('2019-04-13');
Use the same date format in your SELECT queries.
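Applied to the SELECT from the question, that could look like the sketch below. Separately, "Operand should contain 1 column(s)" is the error MySQL raises when a subquery compared with IN returns more than one column, so the subquery should select only order_id, as it does here.

```sql
-- Only order_id is selected inside IN (...); selecting more columns there
-- triggers "Operand should contain 1 column(s)".
SELECT p.*
FROM productionitem AS p
WHERE p.order_id IN (
  SELECT o.order_id
  FROM orders AS o
  WHERE o.order_date BETWEEN '2015-11-04' AND '2020-11-04'
);
```

With the sample data, this returns the pan, cup and chair items (orders 2 and 3).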

Max() + group by not working as expected in mysql

CREATE TABLE IF NOT EXISTS `order_order_status` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`order_status_id` int(11) NOT NULL,
`order_id` int(11) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_order_status_order_status_id_index` (`order_status_id`),
KEY `order_order_status_order_id_index` (`order_id`),
KEY `order_order_status_created_at_index` (`created_at`),
KEY `order_order_status_updated_at_index` (`updated_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=5 ;
--
-- Dumping data for table `order_order_status`
--
INSERT INTO `order_order_status` (`id`, `order_status_id`, `order_id`, `created_at`, `updated_at`) VALUES
(1, 2, 1, '2016-10-01 01:57:37', '2016-10-01 01:57:37'),
(2, 2, 2, '2016-10-01 01:57:54', '2016-10-01 01:57:54'),
(3, 2, 3, '2016-10-02 02:12:49', '2016-10-02 02:12:49'),
(4, 6, 3, '2016-10-02 02:14:19', '2016-10-02 02:14:19');
What I want to select is:
1, 2, 1, '2016-10-01 01:57:37', '2016-10-01 01:57:37'
2, 2, 2, '2016-10-01 01:57:54', '2016-10-01 01:57:54'
4, 6, 3, '2016-10-02 02:14:19', '2016-10-02 02:14:19'
That is, the newest entry of order_order_status for each order_id.
Now the problem: running
select *, max(created_at) from `order_order_status` group by `order_order_status`.`order_id`
does NOT return the newest entry; for order_id 3 it returns the older one instead.
MySQL is working exactly as expected. The problem is your expectations.
select * with a group by doesn't make sense here. If you want the row with the maximum created_at, do something like this:
select oos.*
from order_order_status oos
where oos.created_at = (select max(oos2.created_at)
from order_order_status oos2
where oos2.order_id = oos.order_id
);
Aggregation (group by) produces one row per group. An aggregation function such as max() gets the maximum value of a column -- nothing more. It just operates on a column.
When you use select *, you have a bunch of columns that are not in the group by and are not arguments to aggregation functions. MySQL allows this syntax when the ONLY_FULL_GROUP_BY SQL mode is disabled (unfortunately; few other databases do, and since MySQL 5.7.5 that mode is enabled by default, which rejects such a query). The values for the unaggregated columns are arbitrary values from indeterminate rows in the group.
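On MySQL 8.0 or later (window functions are not available in 5.x), an alternative sketch ranks the rows within each order_id and keeps the newest one. Unlike the correlated subquery, it returns exactly one row per order even if two statuses share the same created_at; breaking ties by the higher id is an assumption about which row should win.

```sql
SELECT id, order_status_id, order_id, created_at, updated_at
FROM (
  SELECT oos.*,
         -- rn = 1 marks the newest status row for each order_id; id breaks ties
         ROW_NUMBER() OVER (PARTITION BY oos.order_id
                            ORDER BY oos.created_at DESC, oos.id DESC) AS rn
  FROM order_order_status AS oos
) AS ranked
WHERE rn = 1
ORDER BY order_id;
```

With the sample data, this returns the rows with id 1, 2 and 4.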
using order by order_id desc may solve your problem
select *, max(created_at) from `order_order_status` group by `order_order_status`.`order_id` order by `order_order_status`.`order_id` desc

Getting Stratified data from MySQL

I have two tables, Teams and Players. I want a query that gives me some statistics about the salaries of the largest team. Specifically, I want a count of how many players make less than 5K, how many make between 5K and 10K, and so on in increments of 5K up to the highest-paid player.
Here is the SQL:
CREATE TABLE `formsfiles`.`Teams` (
`ID` INT NOT NULL AUTO_INCREMENT ,
`Name` VARCHAR(45) NULL ,
PRIMARY KEY (`ID`) );
INSERT INTO `Teams` (`Name`) VALUES ('Sharks');
INSERT INTO `Teams` (`Name`) VALUES ('Jets');
INSERT INTO `Teams` (`Name`) VALUES ('Fish');
INSERT INTO `Teams` (`Name`) VALUES ('Dodgers');
CREATE TABLE `Players` (
`ID` INT NOT NULL AUTO_INCREMENT ,
`Name` VARCHAR(45) NULL ,
`Team_ID` INT NULL ,
`Salary` INT NULL ,
PRIMARY KEY (`ID`) );
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Jim', '1', '4800');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Tom', '1', '12000');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Harry', '2', '1230');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Dave', '2', '19870');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Tim', '3', '1540');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Trey', '4','7340');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Jay', '4', '4800');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Steve', '4','6610');
INSERT INTO `Players` (`Name`, `Team_ID`, salary) VALUES ('Chris', '4','17754');
Given this data, the Dodgers are the largest team (ID = 4).
We would like an output of:
0-5000 1
5000-10000 2
10000-15000 0
15000-20000 1
If this code looks familiar, it is because it is an evolution of a prior problem I posted here. Kindly don't beat me down!
Here is my attempt at this. It uses joins to satisfy the conditions:
select sr.salary_range,
SUM(case when p.salary >= sr.low and p.salary < sr.high then 1 else 0 end) as player_count
from Players p join
(select t.id
from Players p join
Teams t
on p.team_id = t.id
group by t.id
order by SUM(p.salary) desc
limit 1
) team
on p.team_id = team.id cross join
(select '0-5000' as salary_range, 0 as low, 5000 as high union all
select '5000-10000', 5000, 10000 union all
select '10000-15000', 10000, 15000 union all
select '15000-20000', 15000, 20000
) sr
group by sr.salary_range
order by min(sr.low);
Notice the use of a separate query for the ranges, to be sure that you get rows with a count of 0.
This code will do almost what you want
SELECT 5000 * FLOOR(Salary / 5000) AS range_low, COUNT(*) AS player_count
FROM Players
WHERE Team_ID = 4
GROUP BY range_low
ORDER BY range_low;
It returns the lower bound of each range and the number of players in it:
0 1
5000 2
15000 1
Note that it does not return empty ranges.
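Combining the two ideas, a sketch for MySQL 8.0+ (recursive CTEs are not available earlier) can generate the 5000-wide ranges up to the team's top salary and LEFT JOIN the players, so empty ranges still appear with a count of 0. Team_ID = 4 is hard-coded, as in the answer above; the names buckets, top_salary, salary_range and player_count are illustrative.

```sql
WITH RECURSIVE buckets (low, top_salary) AS (
  -- Anchor row: the first bucket starts at 0; carry the team's top salary along.
  SELECT 0, MAX(Salary) FROM Players WHERE Team_ID = 4
  UNION ALL
  -- Add 5000-wide buckets until the next one would start above the top salary.
  SELECT low + 5000, top_salary
  FROM buckets
  WHERE low + 5000 <= top_salary
)
SELECT CONCAT(b.low, '-', b.low + 5000) AS salary_range,
       COUNT(p.ID) AS player_count  -- counts matched players only, so empty buckets give 0
FROM buckets AS b
LEFT JOIN Players AS p
  ON p.Team_ID = 4
 AND p.Salary >= b.low
 AND p.Salary < b.low + 5000
GROUP BY b.low
ORDER BY b.low;
```

With the sample data, this yields 0-5000: 1, 5000-10000: 2, 10000-15000: 0 and 15000-20000: 1.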

Select top 10 records calculated by amount spent over a period of time?

I'm struggling to retrieve the top 10 records ranked by amount spent over a period of time.
MySQL table:
create table `payment_holder` (
`user_id` int (11),
`amount` Decimal (6,2),
`date_added` datetime
);
Demo data:
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('4','3.75','2012-03-15 00:41:39');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('5','32.20','2012-03-15 00:42:10');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('6','32.20','2012-03-15 00:42:58');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('7','0.89','2012-03-15 00:48:05');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('8','3.75','2012-03-15 00:50:54');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('4','3.75','2012-03-15 00:41:39');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('5','32.20','2012-03-15 00:42:10');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('6','32.20','2012-03-15 00:42:58');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('7','0.89','2012-03-15 00:48:05');
insert into `payment_holder` (`user_id`, `amount`, `date_added`) values('8','3.75','2012-03-15 00:50:54');
From this example, I would like to retrieve a result like the following:
user_id amount
------------------
6 64.40
5 64.40
4 7.5
8 7.5
7 1.78
So in short: which user_ids have the highest total purchase amounts, based on date_added in 2012?
Have you tried something like this? It returns totals across all years:
select user_id, sum(amount) Amount
from payment_holder
group by user_id
order by amount desc
limit 0, 10
But if you want to limit by year, you can add a WHERE clause which will apply the YEAR() function to the date_added field:
select user_id, sum(amount) Amount
from payment_holder
where year(date_added) = 2012
group by user_id
order by amount desc
limit 0, 10
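Applying YEAR() to date_added means MySQL cannot use an index on that column (if one is added; the sample table has none). A sketch of an index-friendly alternative expresses the year as a half-open date range, which works for any period by changing the two bounds; the alias total_amount is illustrative.

```sql
SELECT user_id, SUM(amount) AS total_amount
FROM payment_holder
-- Half-open range: all of 2012, including times later on 2012-12-31.
WHERE date_added >= '2012-01-01'
  AND date_added <  '2013-01-01'
GROUP BY user_id
ORDER BY total_amount DESC
LIMIT 10;
```

Assuming amount is declared with two decimal places, the sample data gives 64.40 for users 5 and 6, 7.50 for users 4 and 8, and 1.78 for user 7; the order between tied users is not guaranteed.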