Stuck on a SQL query - mysql

I am running MySQL 5.1.41 on Ubuntu 10.04.
I have two tables. stocks contains basic info about a group of stocks and has the following columns: pk, name, pur_date, pur_price, avg_vol, mkt_cap. The other table is data, which contains the price history of the stocks in the stocks table. The data table contains the following columns: pk, ticker, date, open, high, low, close, volume, adj_close.
I need a query that will show the high for a stock since its purchase date and the date it occurred.
I have this query:
SELECT ticker, date, MAX(high)
FROM data, stocks
WHERE ticker = sym AND date > pur_date
GROUP BY ticker
ORDER BY ticker
LIMIT 0, 100
The query gives me each stock's high but always returns the latest date in the data table, which happens to be 2011-12-23. How do I change the query to show the date the stock reached its high?
Thanks for your help

That's the classic greatest-n-per-group problem. One solution is to look up the maximum high for the ticker in a correlated subquery:
select ticker
, date
, high
from data d
join stocks s
on d.ticker = s.sym
where d.date > pur_date
and d.high =
(
select max(high)
from data d2
where d2.ticker = d.ticker
and d2.date > pur_date
)
See Quassnoi's explain extended blog for a detailed discussion.

Your original query groups by stock ticker, which may not give the results you expect, as I'll explain below. Instead, I run a pre-query per purchase entry in stocks that gets the highest price since that purchase. Consider the following scenario.
A person buys 100 shares of Stock X on Jan 1 at $20 per share, ANOTHER 100 shares on Jan 25 at $23 per share, and another 100 on Feb 18 at $24 per share. The price data shows Stock X hit a high of $27 per share on Jan 22 (before the second purchase), dropped to $23 by Jan 25, moved up and down before the Feb purchase, and only on Feb 27 climbed back up to $25 per share.
Each trade then has its own high:
Jan 1 Bought $20 would show a high of $27 on Jan 22
Jan 25 Bought $23 would show the high of $25 on Feb 27
Feb 18 Bought $24 would also show high of $25 on Feb 27
So grouping by ticker does not necessarily show what each purchase actually saw. Given the scenario above, would you want Stock X to show a high of $27 across the board for all purchases, even though the most recent purchases never saw that price?
select
s2.*,
d2.*
from
( select
s.pk,
s.sym,
max( d.high ) as HighSincePurchDay
from
stocks s
join data d
on s.sym = d.ticker
AND s.pur_date <= d.date
group by
s.pk,
s.sym
) PreQuery
JOIN stocks s2
on PreQuery.pk = s2.pk
JOIN data d2
on PreQuery.sym = d2.ticker
AND s2.pur_date <= d2.date
AND d2.high = PreQuery.HighSincePurchDay
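As a rough check of the per-purchase approach, here is a minimal sketch that loads the Stock X scenario into an in-memory SQLite database and runs the same shape of query. The year 2011, the intermediate daily highs, and the trimmed-down table definitions are assumptions made up for illustration; SQLite stands in for MySQL here, and only the join logic is meant to carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down, hypothetical versions of the two tables.
CREATE TABLE stocks (pk INTEGER PRIMARY KEY, sym TEXT, pur_date TEXT, pur_price REAL);
CREATE TABLE data (ticker TEXT, date TEXT, high REAL);
INSERT INTO stocks VALUES
  (1, 'X', '2011-01-01', 20),
  (2, 'X', '2011-01-25', 23),
  (3, 'X', '2011-02-18', 24);
INSERT INTO data VALUES
  ('X', '2011-01-03', 21),
  ('X', '2011-01-22', 27),
  ('X', '2011-01-25', 23.5),
  ('X', '2011-02-18', 24.5),
  ('X', '2011-02-27', 25);
""")

# Pre-query: highest high per purchase row; outer joins: find the day it happened.
rows = conn.execute("""
SELECT s2.pk, s2.pur_date, d2.date, d2.high
FROM (SELECT s.pk, s.sym, MAX(d.high) AS high_since_purchase
      FROM stocks s
      JOIN data d ON d.ticker = s.sym AND d.date >= s.pur_date
      GROUP BY s.pk, s.sym) pre
JOIN stocks s2 ON s2.pk = pre.pk
JOIN data d2 ON d2.ticker = pre.sym
            AND d2.date >= s2.pur_date
            AND d2.high = pre.high_since_purchase
ORDER BY s2.pk
""").fetchall()

for row in rows:
    print(row)
```

With this sample data the Jan 1 purchase reports the $27 high on Jan 22, while the two later purchases both report the $25 high on Feb 27. If the same high occurs on several days, the query returns one row per day.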

Related

Daily consumption delta based on purchase dates

I need to make a (Tableau) daily graph depicting consumption dynamics against previous day grouped by those clients who increased consumption, decreased consumption, and net change overall. Sample is below.
Calculation logic for sample: for every day for every client calculate difference vs previous day for that client, sum those above 0, sum those below 0, sum total.
The sample was made manually from a relatively small data set.
The real table has over 2 mil rows, and is not very consistent in that clients start buying at different days, may skip various periods buying nothing.
Initial table structure is like that:
client_id  date        sales
1          2018-09-01  4
1          2018-09-02  5
1          2018-09-04  3
2          2018-09-01  2
2          2018-09-02  2
While calculating the total difference per date is simple, calculating pure growth and pure churn is hard, because the dates are not continuous for every client.
I thought of adding the delta_to_previous column to each row when loading the initial dataset from the data storage, like:
WITH orders AS (
SELECT client_id,
date,
SUM(sales) as sales
FROM dwh_orders
GROUP BY client_id, date
)
SELECT
client_id,
date,
sales,
LAG(sales, 1) OVER (
PARTITION BY client_id
ORDER BY date
) as prev_date_order_value,
sales - LAG(sales, 1) OVER (
PARTITION BY client_id
ORDER BY date
) as prev_date_order_delta
FROM
orders;
Then for each date I can just show sum of positive values, negative values, total.
The problem is that this approach shows the consumption change on the next purchase date: if a client buys 5 items on March 1 and then 5 on May 1, it shows no change for him at all. What it should do is show -5 for March 2 and +5 for May 1.
I am a bit puzzled at the optimal approach to this. The general solution could also use some review probably.
If someone dealt with a similar problem, I could really use your advice.
If you are experienced with sql, I could use your advice on how to convert the initial dataset (see sample above) into something like
client_id  date        sales  delta
1          2018-09-01  4      0
1          2018-09-02  5      1
1          2018-09-03  0      -5
1          2018-09-04  3      3
2          2018-09-01  2      0
2          2018-09-02  2      0
If you know a bit about Tableau, I could use help on building graphs like this using its tools.
with cdates as (
select client_id, min(date) as dte, max(date) as maxd
from dwh_orders
group by client_id
union all
select client_id, dateadd(day, 1, dte), maxd
from cdates
where dte < maxd
),
cd as (
select client_id, date, sum(sales) as sales
from dwh_orders
group by client_id, date
)
select cdates.client_id, cdates.dte as date,
coalesce(cd.sales, 0) as sales,
(coalesce(cd.sales, 0) -
lag(coalesce(cd.sales, 0)) over (partition by cdates.client_id order by cdates.dte)
) as delta
from cdates left join
cd
on cdates.client_id = cd.client_id and
cdates.dte = cd.date
option (maxrecursion 0);
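The same idea (build a continuous run of dates per client, treat missing days as 0 sales, then diff against the previous day) can be sketched outside the database. This is only an illustration in plain Python on the five sample rows; the function name `daily_deltas` is made up, and the first day of each client is given a delta of 0 to match the desired output table, whereas `lag()` in SQL would return NULL there.

```python
from collections import defaultdict
from datetime import date, timedelta

# The sample rows from the question: (client_id, date, sales).
orders = [
    (1, date(2018, 9, 1), 4),
    (1, date(2018, 9, 2), 5),
    (1, date(2018, 9, 4), 3),
    (2, date(2018, 9, 1), 2),
    (2, date(2018, 9, 2), 2),
]

def daily_deltas(rows):
    """Fill each client's missing days with 0 sales and diff against the previous day."""
    per_client = defaultdict(lambda: defaultdict(int))
    for client_id, day, sales in rows:
        per_client[client_id][day] += sales  # same role as SUM(sales) ... GROUP BY

    result = []
    for client_id in sorted(per_client):
        by_day = per_client[client_id]
        day, last = min(by_day), max(by_day)
        previous = None
        while day <= last:  # walk every calendar day between first and last purchase
            sales = by_day.get(day, 0)
            delta = 0 if previous is None else sales - previous
            result.append((client_id, day.isoformat(), sales, delta))
            previous = sales
            day += timedelta(days=1)
    return result

table = daily_deltas(orders)
for row in table:
    print(row)
```

For client 1 this yields the extra 2018-09-03 row with sales 0 and delta -5, followed by delta 3 on 2018-09-04. Note that, like the query, it stops at each client's last purchase date, so a client who never buys again produces no trailing negative delta.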

Find out the percentage for each in SQL

I have two tables here, trips and users.
My question is: between Oct 1, 2013 at 10am PDT and Oct 22, 2013 at 5pm PDT, what percentage of requests made by unbanned clients each day were canceled in each city?
SELECT
t.city_id,
DATE(t.request_at) AS day,
(100*SUM(IF(t.status = 0, 0, 1))/COUNT(t.id)) AS cancellation_percentage
FROM trips t INNER JOIN users u ON t.client_id = u.users_id
WHERE u.banned = FALSE
AND t.request_at BETWEEN '2013-10-01 17:00:00' AND '2013-10-23'
GROUP BY t.city_id, DATE(t.request_at)
This should produce output showing the city_id, the day, and the cancellation percentage for each city and day. This assumes that no users have been deleted, or that you don't want to include trips for deleted users. It also assumes that both the numerator and the denominator should exclude banned users.
The GROUP BY t.city_id, DATE(t.request_at) tells it that you want aggregated numbers, aggregating by the city ID and requested day. We output the t.city_id, so we know which city (or at least its ID). We exclude banned users with the where clause after joining the users table in the from clause.
The where clause also limits the rows to between 10 AM PDT on 10/1/2013 and 5 PM PDT on 10/22/2013. Because both of those times fall within daylight saving time in the United States in 2013, the offset from UTC is seven hours. Adding it moves the latter time to the 23rd (at midnight, or 00:00:00, the default when no time is given). The former simply becomes 5 PM UTC, or 17:00:00.
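The seven-hour shift can be double-checked with a few lines of Python. This is just a sketch of the arithmetic: it uses a fixed UTC-7 offset for PDT and assumes, as this answer does, that request_at is stored in UTC.

```python
from datetime import datetime, timedelta, timezone

# PDT as a fixed UTC-7 offset (valid for both boundary dates in October 2013).
PDT = timezone(timedelta(hours=-7), "PDT")

start_pdt = datetime(2013, 10, 1, 10, 0, tzinfo=PDT)
end_pdt = datetime(2013, 10, 22, 17, 0, tzinfo=PDT)

# Convert both boundaries to UTC.
start_utc = start_pdt.astimezone(timezone.utc)
end_utc = end_pdt.astimezone(timezone.utc)

print(start_utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2013-10-01 17:00:00
print(end_utc.strftime("%Y-%m-%d %H:%M:%S"))    # 2013-10-23 00:00:00
```

These are the two literals used in the BETWEEN condition of the query.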
The COUNT(t.id) simply counts all trips requested by unbanned clients between the described date times. The IF(t.status = 0, 0, 1) checks if the trip was completed. If so it represents that as 0 trips cancelled on that trip. If not, it counts 1 trip as cancelled on that trip. We then sum all those, getting a count of the cancelled trips. We divide the cancelled count by the total count and multiply the result by 100 to get the percentage.
It is possible that you are expected to do a
set time_zone = 'PST8PDT'
or similar rather than changing the dates. In that case the condition would be
AND t.request_at BETWEEN '2013-10-01 10:00:00' AND '2013-10-22 17:00:00'
These are the requested date/times in PDT.
It's possible that you should do another join to get the city name rather than ID, but there isn't enough schema information here for that.
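To see the sum-of-flags-over-count aggregation at work, here is a small sketch against an in-memory SQLite database. The table layouts and the six sample trips are invented for illustration (the question gives no schema), status 0 is taken to mean "completed" as in the explanation above, and `CASE WHEN` replaces MySQL's `IF()` because SQLite is standing in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical minimal schema; user 2 is banned.
CREATE TABLE users (users_id INTEGER PRIMARY KEY, banned INTEGER);
CREATE TABLE trips (id INTEGER PRIMARY KEY, client_id INTEGER, city_id INTEGER,
                    status INTEGER, request_at TEXT);
INSERT INTO users VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO trips VALUES
  (1, 1, 10, 0, '2013-10-02 09:00:00'),
  (2, 1, 10, 1, '2013-10-02 12:00:00'),
  (3, 3, 10, 0, '2013-10-02 18:00:00'),
  (4, 3, 10, 0, '2013-10-02 20:00:00'),
  (5, 2, 10, 1, '2013-10-02 21:00:00'),
  (6, 3, 20, 1, '2013-10-03 08:00:00');
""")

# Each non-completed trip contributes 1 to the numerator; COUNT gives the denominator.
rows = conn.execute("""
SELECT t.city_id,
       DATE(t.request_at) AS day,
       100.0 * SUM(CASE WHEN t.status = 0 THEN 0 ELSE 1 END) / COUNT(t.id)
           AS cancellation_percentage
FROM trips t
JOIN users u ON t.client_id = u.users_id
WHERE u.banned = 0
  AND t.request_at BETWEEN '2013-10-01 17:00:00' AND '2013-10-23 00:00:00'
GROUP BY t.city_id, DATE(t.request_at)
ORDER BY t.city_id, day
""").fetchall()

for row in rows:
    print(row)
```

City 10 has four trips by unbanned clients on 2013-10-02, one of them cancelled, so it reports 25.0; the banned client's cancelled trip is ignored. City 20 has a single cancelled trip and reports 100.0.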
This query should work:
SELECT
A.usersid,
A.email,
((SELECT
COUNT(B.id)
FROM
trips B
WHERE
B.client_id = A.usersid
AND request_at BETWEEN '2013-10-01 10:00:00' AND '2013-10-22 17:00:00'
AND status <> 0) / (SELECT
COUNT(C.id)
FROM
trips C
WHERE
C.client_id = A.usersid
AND request_at BETWEEN '2013-10-01 10:00:00' AND '2013-10-22 17:00:00') * 100) AS 'PERCENT'
FROM
users A
WHERE
A.role = 0 AND banned = FALSE

Calculate new users subscription amount MySQL

I have a dataset where I need to find out New subscribers revenue.
These are subscribers that are paying either weekly or monthly depending on the subscription they are on.
The unique identifier is "customer" and the data is at timestamp level, but I want it rolled up at monthly level.
Now for each month, we need to find out revenue for only NEW subscribers.
Basically, imagine customers being on monthly/weekly subscriptions and we only want their FIRST Payments to be counted here.
Here's a sample dataset:
created             customer            amount
16-Feb-18 14:03:55  cus_BwcisIF1YR1UlD  33300
16-Feb-18 14:28:13  cus_BpLsCvjuubYZAe  156250
15-Feb-18 19:19:14  cus_C3vT6uVBqJC1wz  50000
14-Feb-18 23:00:24  cus_BME5vNeXAeZSN2  162375
9-Feb-18 14:27:26   cus_BpLsCvjuubYZAe  156250
....and so on...
Here is the final desired output:
yearmonth   new_amount
Jan - 2018  100000
Feb - 2018  2000
Dec - 2017  100002
This needs to be done in MySQL interface.
Basically, you want to filter the data down to each customer's first payment. One method of doing this involves a correlated subquery.
The rest is just aggregating by year and month. So, overall the query is not that complicated, but it does consist of two distinct parts:
select year(created) as yyyy, month(created) as mm,
count(*) as num_news,
sum(amount) as amount_news
from t
where t.created = (select min(t2.created)
from t t2
where t2.customer = t.customer
)
group by yyyy, mm
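Here is a small sketch of the first-payment filter, run against an in-memory SQLite database. It loads the five sample rows (timestamps rewritten in ISO format) plus one invented January payment for cus_BwcisIF1YR1UlD, added only so that a repeat customer spans two months. SQLite's `strftime` replaces MySQL's `year()` and `month()`, and the five rows are treated as if they were the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (created TEXT, customer TEXT, amount INTEGER);
INSERT INTO t VALUES
  ('2018-01-20 10:00:00', 'cus_BwcisIF1YR1UlD', 33300),   -- hypothetical extra row
  ('2018-02-16 14:03:55', 'cus_BwcisIF1YR1UlD', 33300),
  ('2018-02-16 14:28:13', 'cus_BpLsCvjuubYZAe', 156250),
  ('2018-02-15 19:19:14', 'cus_C3vT6uVBqJC1wz', 50000),
  ('2018-02-14 23:00:24', 'cus_BME5vNeXAeZSN2', 162375),
  ('2018-02-09 14:27:26', 'cus_BpLsCvjuubYZAe', 156250);
""")

# Keep only each customer's earliest payment, then roll up by month.
rows = conn.execute("""
SELECT strftime('%Y-%m', t.created) AS yearmonth,
       COUNT(*) AS num_news,
       SUM(t.amount) AS amount_news
FROM t
WHERE t.created = (SELECT MIN(t2.created) FROM t t2 WHERE t2.customer = t.customer)
GROUP BY yearmonth
ORDER BY yearmonth
""").fetchall()

for row in rows:
    print(row)
```

January counts one new customer (33300). February counts three (368625): the repeat payments of cus_BwcisIF1YR1UlD and cus_BpLsCvjuubYZAe on 16 February are skipped because each has an earlier payment.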
We can use a subquery to keep only the first payment of each new customer, then sum the amount for every month and year.
The query is as follows:
SELECT month(created) as mm, year(created) as yyyy,
sum(amount) as new_amount
FROM t
WHERE t.created = (select min(t2.created) from t t2 where
t2.customer = t.customer)
GROUP BY yyyy, mm

How get earnings per month MYSQL Query

I have two tables, one called "tbl_materiales" and another called "tbl_pedidos". The table "tbl_materiales" holds information about all my products, such as Description and, most importantly, Price.
In the table "tbl_pedidos" I record every product order that users place on the website.
For example:
tbl_materiales:
IdProduct  Description    Price
5          Product one    8
6          Product three  10
7          Product four   15
tbl_pedidos:
IdProduct  Quantity  Month
5          10        January
6          5         January
7          2         February
So, I want to know the total earnings PER month.
I want to get the result below. The Earnings column is tbl_pedidos.Quantity * tbl_materiales.Price, so it depends on the price of each product and the quantity sold.
Month     Earnings
January   130
February  30
Now, I have this, but it doesn't bring me the correct information...
SELECT tbl_pedidos.Mes
, (SUM(tbl_pedidos.Cantidad) * tbl_materiales.Precio) as Total
FROM tbl_pedidos
INNER JOIN tbl_materiales
ON tbl_pedidos.IdMaterial = tbl_materiales.IdMaterial
GROUP BY tbl_pedidos.Mes
ORDER BY tbl_pedidos.Fecha;
SELECT tbl_pedidos.Mes , SUM(tbl_pedidos.Cantidad*tbl_materiales.Precio) as Total
FROM tbl_pedidos
INNER JOIN tbl_materiales
ON tbl_pedidos.IdMaterial = tbl_materiales.IdMaterial
GROUP BY tbl_pedidos.Mes
ORDER BY tbl_pedidos.Fecha;
Check http://sqlfiddle.com/#!9/de665b/1
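The difference between the two queries is where the multiplication happens. For January the expected figure is 10*8 + 5*10 = 130, which requires multiplying per row before summing; summing the quantities first (15) and multiplying by a single price gives 120 or 150 depending on which price the engine picks. A minimal sketch with the sample data, using the English column names from the sample tables and an in-memory SQLite database in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_materiales (IdProduct INTEGER PRIMARY KEY, Description TEXT, Price INTEGER);
CREATE TABLE tbl_pedidos (IdProduct INTEGER, Quantity INTEGER, Month TEXT);
INSERT INTO tbl_materiales VALUES
  (5, 'Product one', 8),
  (6, 'Product three', 10),
  (7, 'Product four', 15);
INSERT INTO tbl_pedidos VALUES
  (5, 10, 'January'),
  (6, 5, 'January'),
  (7, 2, 'February');
""")

# Multiply quantity by price on every joined row, then sum per month.
rows = conn.execute("""
SELECT p.Month, SUM(p.Quantity * m.Price) AS Earnings
FROM tbl_pedidos p
JOIN tbl_materiales m ON m.IdProduct = p.IdProduct
GROUP BY p.Month
ORDER BY Earnings DESC
""").fetchall()

for row in rows:
    print(row)
```

This reproduces the expected table: January 130 and February 30.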
The query can be like this:
SELECT tbl_p.Month
,SUM(tbl_m.Price * tbl_p.Quantity) AS Earnings
FROM tbl_materiales AS tbl_m
JOIN tbl_pedidos AS tbl_p
ON tbl_m.IdProduct = tbl_p.IdProduct
GROUP
BY tbl_p.Month;
In this case I have used WHERE instead of JOIN; maybe the next statement solves your problem:
select TP.Month, sum(TM.Price*TP.Quantity) as Earnings
from TBL_Pedidos TP, TBL_Materiales TM
where TP.IdProduct = TM.IdProduct
group by TP.Month
Group by is the solution

Complex MySQL SELECT Query

Hi, I'm still getting my head around MySQL, so I was hoping someone could shed some light on this one.
I have a table named customers which has the following columns
msisdn BIGINT 20
join_date DATETIME
The msisdn is a unique value to identify customers.
There is a second table named ws_billing_all which has the following structure
id INTEGER 11 (Primary Key)
msisdn BIGINT 20
event_time DATETIME
revenue INTEGER
So this table stores all transactions for each of the customers in the customers table as identified by the msisdn.
What I need to do is determine the revenue generated, within 30 days of joining, by all customers that joined on a particular day.
So for example, on the 2nd of Dec 2010, 1,100 customers were acquired. Based on the data in ws_billing_all, how much total revenue did the customers that joined on this day generate in the 30 days from this date?
I will probably need another table for this but not sure and really not sure on how to go about extracting this data. Any help would be appreciated.
@Cularis was very close... You only care about those customers that joined on the ONE DAY, and want all THEIR REVENUE earned over the next 30 days... In this scenario, a customer would never have sales prior to their join date, so I didn't add an explicit BETWEEN on the sales dates under consideration.
SELECT
date( c.Join_Date ) DateJoined,
count( distinct c.msisdn ) DistinctMembers,
count(*) NumberOfOrders,
SUM(w.revenue) AmountOfRevenue
FROM
customers c
JOIN ws_billing_all w
ON c.msisdn = w.msisdn
AND date( w.event_time ) <= date_add( c.Join_Date, INTERVAL 30 DAY )
WHERE
c.Join_Date >= SomeDateParameterValue
group by
date( c.Join_Date )
order by
date( c.Join_Date )
EDIT -- For clarification...
If you had 150 people join on Dec 1, 45 people on Dec 2, 83 people on Dec 3, you want to see the total revenue per group of people based on the day they joined going out 30 days of their sales... So...
Joined on Number of People Total Revenue after 30 days
Dec 1 150 $21,394 (up to Dec 31)
Dec 2 45 $ 4,182 (up to Jan 1)
Dec 3 83 $ 6,829 (up to Jan 2)
Does this better clarify what you want? Then we can adjust the query...
FINAL EDIT ...
I think I have what you INTENDED (with a count of orders too that might be useful). In the future, providing a sample output of something of complex nature would be helpful, even if it was as simple as I've done here.
With respect to my WHERE clause from the customers table.... Say you only cared about customers who joined within a given time frame, or only after a given date... THIS is where you would update the clause... if you want based on ALL people, then just remove it completely.
SELECT c.msisdn, SUM(w.revenue)
FROM customers c
INNER JOIN ws_billing_all w ON c.msisdn=w.msisdn
WHERE w.event_time BETWEEN c.join_date AND DATE_ADD(c.join_date, INTERVAL 30 DAY)
GROUP BY c.msisdn
You have to join both tables on the customer id. Then select all events that happened between the join date and 30 days after that. Group by the customer id and use SUM() to get total revenue per customer.
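To tie the two answers together, here is a sketch that applies the 30-day window per customer but groups by join day, run on an in-memory SQLite database. The three customers and five billing rows are invented for illustration, and SQLite's `DATETIME(x, '+30 days')` stands in for MySQL's `DATE_ADD(x, INTERVAL 30 DAY)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (msisdn INTEGER PRIMARY KEY, join_date TEXT);
CREATE TABLE ws_billing_all (id INTEGER PRIMARY KEY, msisdn INTEGER,
                             event_time TEXT, revenue INTEGER);
-- Hypothetical sample data.
INSERT INTO customers VALUES
  (1001, '2010-12-02 08:00:00'),
  (1002, '2010-12-02 15:30:00'),
  (1003, '2010-12-03 09:00:00');
INSERT INTO ws_billing_all (msisdn, event_time, revenue) VALUES
  (1001, '2010-12-02 09:00:00', 50),
  (1001, '2010-12-20 10:00:00', 30),
  (1001, '2011-01-15 10:00:00', 99),   -- more than 30 days after joining
  (1002, '2010-12-31 12:00:00', 20),
  (1003, '2010-12-05 12:00:00', 10);
""")

# Window each customer's billing rows to 30 days from joining, then sum per join day.
rows = conn.execute("""
SELECT DATE(c.join_date) AS joined_on,
       COUNT(DISTINCT c.msisdn) AS members,
       SUM(w.revenue) AS revenue_30_days
FROM customers c
JOIN ws_billing_all w
  ON w.msisdn = c.msisdn
 AND w.event_time BETWEEN c.join_date AND DATETIME(c.join_date, '+30 days')
GROUP BY DATE(c.join_date)
ORDER BY joined_on
""").fetchall()

for row in rows:
    print(row)
```

The 2 December cohort shows 2 members and 100 in revenue (the 99 billed on 15 January falls outside the window); the 3 December cohort shows 1 member and 10. Because this is an inner join, customers with no billing rows in their window are not counted as members.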