MySQL get subquery value - mysql

I'm trying to calculate the distance from my centroid point, which is computed from the total number of tags and the sum of the instants (timecodes) at which the tags appear. So the centroid is (tc_sum/cnt).
However, the SELECT in the subquery doesn't let me use the centroid, because "centr" has not been calculated yet at that point, so I can't get the "distance".
Any help?
SELECT cnt, tc_sum, ROUND(tc_sum/cnt) as centr, distance
FROM (
SELECT SUM(timecode) as tc_sum, count(timecode) as cnt, ABS( centr - '".$timecode."' ) AS distance
FROM dados d
WHERE tag = 'donald'
AND filename = 'donald.mp4'
AND group_id = '1'
) d

SELECT
SUM(timecode) as tc_sum,
COUNT(timecode) as cnt,
ABS( SUM(timecode) / COUNT(timecode) - '".$timecode."' ) AS distance,
ROUND(SUM(timecode) / COUNT(timecode)) AS centr
FROM dados d
WHERE tag = 'donald'
AND filename = 'donald.mp4'
AND group_id = '1'
A query is evaluated row by row, and you can't refer to column aliases defined in the same SELECT list this way. You have to "recalculate" the expressions. "Recalculate" is not quite the right word, since the result isn't really calculated multiple times. The optimizer will take care of it being calculated only once. But an alias is only known after that level of the query has run. I'm afraid my English isn't good enough to explain it well :)
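To make the alias rule concrete, here is a minimal sketch that aggregates in a derived table and only uses the aliases one level up. It runs against an in-memory SQLite database from Python as a stand-in for MySQL (an assumption for the sake of a runnable example); the sample rows are made up, and the timecode is passed as a bound parameter instead of being concatenated into the string.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dados (tag TEXT, filename TEXT, group_id TEXT, timecode INTEGER)"
)
# Made-up sample rows: three 'donald' tags and one unrelated tag.
conn.executemany(
    "INSERT INTO dados VALUES (?, ?, ?, ?)",
    [("donald", "donald.mp4", "1", t) for t in (10, 20, 40)]
    + [("mickey", "donald.mp4", "1", 500)],
)


def centroid_distance(conn, timecode):
    """Return (cnt, tc_sum, centr, distance) for the 'donald' tags."""
    sql = """
        SELECT cnt, tc_sum,
               ROUND(tc_sum * 1.0 / cnt)          AS centr,
               ABS(ROUND(tc_sum * 1.0 / cnt) - ?) AS distance
        FROM (
            -- The inner level only aggregates; its aliases exist one level up.
            SELECT SUM(timecode) AS tc_sum, COUNT(timecode) AS cnt
            FROM dados
            WHERE tag = 'donald' AND filename = 'donald.mp4' AND group_id = '1'
        ) AS d
    """
    return conn.execute(sql, (timecode,)).fetchone()


row = centroid_distance(conn, 30)
print(row)
```

The outer SELECT can read tc_sum and cnt because they are real columns of the derived table d. The centr alias still cannot be reused inside the same SELECT list, which is why its expression is repeated in distance.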

Try
SELECT cnt, tc_sum, ROUND(tc_sum/cnt) as centr, ABS( ROUND(tc_sum/cnt) - '".$timecode."' ) AS distance
FROM (
SELECT SUM(timecode) as tc_sum, count(timecode) as cnt
FROM dados d
WHERE tag = 'donald'
AND filename = 'donald.mp4'
AND group_id = '1'
) d

Related

correlation query in mysql

I have also seen this link, which calculates the correlation between two columns. My question is different because my query is more complex: the correlation is not between two columns. I want to find the correlation between the results of two different conditions in a query.
I have a table that holds the search query history of a website. I want to calculate the correlation of search_no between different days. To calculate the number of search queries, I have implemented the following:
select to_date(time), query, platform, count(query) as search_no
from search
where `_month` = 2 and time between '2021-02-05 00:00:00' and '2021-02-05 23:59:59' and platform = 'application'
group by to_date(time), query, platform
order by search_no desc limit 1000
It works perfectly. It calculates the number of searches as search_no for 2021-02-05. What I want to find is the correlation between two different dates, like 2021-02-05 and 2021-01-29.
The correlation formula is as follows:
PS: x is data of the first day (2021-02-05) and y is the data of the second day (2021-01-29).
What I have tried
select (sum((x.search_no - avg(x.search_no)) * (y.search_no - avg(y.search_no))) / ((count(x.search_no) - 1) * (stddev_samp(x.search_no) * stddev_samp(y.search_no))
from (
(
select to_date(time), query, platform, count(query) as search_no
from search
where `_month` = 2 and time between '2021-02-05 00:00:00' and '2021-02-05 23:59:59' and platform = 'application'
group by to_date(time), query, platform
order by search_no desc limit 1000
) as x,
(
select to_date(time), query, platform, count(query) as search_no
from search
where `_month` = 1 and time between '2021-01-29 00:00:00' and '2021-01-29 23:59:59' and platform = 'application'
group by to_date(time), query, platform
order by search_no desc limit 1000
) as y
)
I don't know how to implement it.
If I understand correctly, you want the correlation of the summaries from two different days. That would be starting with this data:
select query,
sum(date(time) = '2021-02-05') as x,
sum(date(time) = '2021-02-06') as y,
count(*) as cnt
from search
where `_month` = 2 and
time >= '2021-02-05' and
time < '2021-02-07' and
platform = 'application'
group by query;
You can then plug this directly into your formula:
with dataset as (
select query,
sum(date(time) = '2021-02-05') as x,
sum(date(time) = '2021-02-06') as y,
count(*) as cnt
from search
where `_month` = 2 and
time >= '2021-02-05' and
time < '2021-02-07' and
platform = 'application'
group by query
)
select (sum( (x - avg_x) * (y - avg_y) ) /
sqrt(nullif( sum(power(x - avg_x, 2)) * sum(power(y - avg_y, 2)), 0))
) as pearson_correlation
from (select d.*,
avg(x) over () as avg_x,
avg(y) over () as avg_y
from dataset d
) d;
Obviously, you need to adjust the date range in the where clause for whatever days you want. I see no reason to use limit -- that will just throw the population of queries off.
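As a cross-check on the formula, here is a small Python sketch of the Pearson correlation over two per-query count lists. The function name and the sample counts are made up for illustration. The denominator is the square root of the product of the two sums of squared deviations, which equals (n - 1) * stddev_samp(x) * stddev_samp(y) from the question.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    # Numerator: sum of the products of the paired deviations.
    cov = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    # Denominator: each sum of squares is taken separately, then multiplied.
    ss_x = sum((x - avg_x) ** 2 for x in xs)
    ss_y = sum((y - avg_y) ** 2 for y in ys)
    denom = math.sqrt(ss_x * ss_y)
    # Mirrors NULLIF(..., 0) in the SQL: a constant series has no correlation.
    return cov / denom if denom else None


# Made-up search counts for four queries on two days.
day_x = [1, 2, 3, 4]
day_y = [2, 4, 6, 8]
r_same = pearson(day_x, day_y)
r_reversed = pearson(day_x, list(reversed(day_y)))
r_constant = pearson(day_x, [5, 5, 5, 5])
print(r_same, r_reversed, r_constant)
```

Running the same lists through the SQL and through this function is a quick way to confirm that the query computes what you expect.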

Sub select of row that has max column

I'm trying to do something that sounds really simple, but I have been going round in circles a little with it.
I have a stored procedure that currently works as required and is missing only one bit of functionality: returning a name for the corresponding max calculation.
So I return the average calculation and the max calculation, but I also want to return the name from another column for the row with the max value.
Here is an example of my SP. Apologies that it may not seem very natural; I have had to rename and omit non-relevant bits, so it may seem a little contrived:
SELECT
IFNULL(ROUND(AVG(TABLE1.TotalCapacityPercentageUsage / TABLE1.TotalSnapshotsForTimeSegment), 2), 0.0) AS TotalAvgCapacityPercentageUsage,
IFNULL(ROUND(MAX(TABLE1.MaxCapacityPercentageUsage), 2), 0.0) AS TotalMaxCapacityPercentageUsage,
-- TODO return the QueuesTmp.QueueName for max calculation (This could be more than one row, so I was going to use something like the following:
-- (SELECT GROUP_CONCAT(QueuesTmp.QueueName SEPARATOR ' ') to ensure only one field is returned..
FROM TABLE1
INNER JOIN QueuesTmp ON QueuesTmp.QueueID = TABLE1.QueueID
RIGHT JOIN TimesTmp ON TABLE1.TimeSegment = TimesTmp.QuarterHour AND
TABLE1.Date = DATE(TimesTmp.StartOfRangeUTC)
GROUP BY TimesTmp.QuarterHour;
I started by doing a Sub select but it seems I would then have to repeat all of the Joins, WHERE and Group By (Seems this is not even possible because that's what having is for)..
Can anybody guide me in the right direction as to how this can be achieved?
Thanks in advance.
WORKING SOLUTION
GROUP_CONCAT(DISTINCT QueuesTmp.QueueName ORDER BY MYCOLUMN DESC
SEPARATOR ':') AS MaxColumnQueueName,
I'm not sure I'm on the right track. You need the QueueName of the row with the max calculation. So use GROUP_CONCAT with an ORDER BY on this calculation, and take the first element of the list with SUBSTRING_INDEX.
substring_index(
GROUP_CONCAT(DISTINCT QueuesTmp.QueueName ORDER BY `maxCalculation` DESC SEPARATOR ':'),
':',
1
)
Additional question.
Sorry, unfortunately the maximum comment length has been reached, so here is a query.
I used your example query as the subquery (sub) and additionally selected the QueueIDs as a comma-separated list and max(maxColumn).
After that I join to the queue table again on QueueID and maxColumn. I can't guarantee that it works.
SELECT
sub.TotalAvgCapacityPercentageUsage,
sub.TotalMaxCapacityPercentageUsage,
GROUP_CONCAT(DISTINCT QueuesTmp.QueueName ORDER BY MYCOLUMN DESC SEPARATOR ':') AS MaxColumnQueueName
FROM(
SELECT
TimesTmp.QuarterHour,
IFNULL(
ROUND(
AVG(
TABLE1.TotalCapacityPercentageUsage /
TABLE1.TotalSnapshotsForTimeSegment
),
2
),
0.0
) AS TotalAvgCapacityPercentageUsage,
IFNULL(
ROUND(
MAX(TABLE1.MaxCapacityPercentageUsage),
2
),
0.0
) AS TotalMaxCapacityPercentageUsage,
max(QueuesTmp.maxColumn) AS maxColumn,
group_concat(DISTINCT TABLE1.QueueID) AS QueueID
FROM TABLE1
INNER JOIN QueuesTmp
ON QueuesTmp.QueueID = TABLE1.QueueID
RIGHT JOIN TimesTmp
ON TABLE1.TimeSegment = TimesTmp.QuarterHour
AND TABLE1.Date = DATE(TimesTmp.StartOfRangeUTC)
GROUP BY TimesTmp.QuarterHour
) AS sub
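LEFT JOIN QueuesTmp
ON FIND_IN_SET(QueuesTmp.QueueID, sub.QueueID) > 0
AND QueuesTmp.maxColumn = sub.maxColumn
GROUP BY sub.QuarterHour, sub.TotalAvgCapacityPercentageUsage, sub.TotalMaxCapacityPercentageUsage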

sql calculate change and percent by year

I have a data set that simulates the rate of return for a trading account. There is an entry for each day showing the balance and the open equity. I want to calculate the yearly, quarterly, or monthly change and the percent gain or loss. I have this working for daily data, but for some reason I can't seem to get it to work for yearly data.
The code for daily data follows:
SELECT b.`Date`, b.Open_Equity, delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM tim_account_history p
CROSS JOIN
(SELECT @pequity:= NULL
FROM tim_account_history
ORDER by `Date` LIMIT 1) as a
ORDER BY `Date`) as b
ORDER by `Date` ASC
Grouping by YEAR(Date) doesn't seem to make the desired difference. I have tried everything I can think of, but it still seems to return daily rate of change even if you group by month or year, etc. I think I'm not using windowing correctly, but I can't seem to figure it out. If anyone knows of a good book about this sort of query I'd appreciate that also.
Thanks.
sqlfiddle example
Using what Lolo contributed, I have added some code so the data comes from the last day of the year, instead of the first. I also just need the Open_Equity, not the sum.
I'm still not certain I understand why this works, but it does give me what I was looking for. Using another select statement as a from seems to be the key here; I don't think I would have come up with this without Lolo's help. Thank you.
SELECT b.`yyyy`, b.Open_Equity,
concat('$',round(delta, 2)) as delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM (SELECT (EXTRACT(YEAR FROM `Date`)) as `yyyy`,
(SUBSTRING_INDEX(GROUP_CONCAT(CAST(`Open_Equity` AS CHAR) ORDER BY `Date` DESC), ',', 1 )) AS `Open_Equity`
FROM tim_account_history GROUP BY `yyyy` ORDER BY `yyyy` DESC) p
CROSS JOIN
(SELECT @pequity:= NULL) as a
ORDER BY `yyyy` ) as b
ORDER by `yyyy` ASC
Try this:
SELECT b.`Date`, b.Open_Equity, delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM (SELECT YEAR(`Date`) `Date`, SUM(Open_Equity) Open_Equity FROM tim_account_history GROUP BY YEAR(`Date`)) p
CROSS JOIN
(SELECT @pequity:= NULL) as a
ORDER BY `Date` ) as b
ORDER by `Date` ASC
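On why the nested SELECT matters: the running comparison is applied to whatever rows it is given, so the rows have to be reduced to one per year before the change is computed. Here is a minimal sketch of that idea, run against in-memory SQLite from Python as a stand-in for MySQL (an assumption). It swaps the user-variable trick for a plain self-join on the previous year, and the sample balances are made up. On MySQL 8.0 the LAG() window function would be another option.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tim_account_history (Date TEXT, Open_Equity REAL)")
# Made-up daily balances; only the last day of each year should be used.
conn.executemany(
    "INSERT INTO tim_account_history VALUES (?, ?)",
    [
        ("2019-06-30", 900.0),
        ("2019-12-31", 1000.0),
        ("2020-06-30", 1050.0),
        ("2020-12-31", 1100.0),
        ("2021-12-31", 990.0),
    ],
)

sql = """
WITH year_end AS (
    -- Step 1: reduce the daily rows to one row per year (the last day).
    SELECT CAST(strftime('%Y', h.Date) AS INTEGER) AS yyyy, h.Open_Equity
    FROM tim_account_history AS h
    JOIN (SELECT MAX(Date) AS last_day
          FROM tim_account_history
          GROUP BY strftime('%Y', Date)) AS m
      ON h.Date = m.last_day
)
-- Step 2: compare each year with the previous one.
SELECT cur.yyyy,
       cur.Open_Equity,
       cur.Open_Equity - prev.Open_Equity AS delta,
       (cur.Open_Equity - prev.Open_Equity) / prev.Open_Equity AS delta_p
FROM year_end AS cur
LEFT JOIN year_end AS prev ON prev.yyyy = cur.yyyy - 1
ORDER BY cur.yyyy
"""
rows = conn.execute(sql).fetchall()
for r in rows:
    print(r)
```

The first year has no previous row, so its delta and delta_p come back as NULL, just as the first row does in the user-variable version.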

Sum of a joined table

It is possible that this has been answered somewhere already but I couldn't find it.
So would appreciate if someone could help me with this sql statement again.
This is the sql statement which I have so far:
SELECT * , Round( (Rate * TIME_TO_SEC( Total ) /3600 ) , 2) AS revenue
FROM (SELECT event.eventID, event.staffID, event.role, TIMEDIFF( Time, Pause )
AS Total,
CASE WHEN Position = 'Teamleader'
THEN (Teamleader)
WHEN Position = 'Waiter'
THEN (Waiter)
ELSE '0'
END AS Rate
FROM event, rates, eventoverview
WHERE Storno =0
AND event.eventID= eventoverview.eventID
AND event.clientid = rates.clientid
GROUP BY event.eventID, event.clientID)q1
GROUP BY q1.staffID
The table I am getting is now giving me a total rate per staff and event.
But what I would like to achieve is a sum of those rates per staff.
So basically a sum of the revenue.
Hope someone can help me. Thanks in advance
You can enclose your query in a subquery and do that in the outer query like this:
SELECT *,
SUM(revenue)
FROM
(
SELECT * ,
Round( (Rate * TIME_TO_SEC( Total ) /3600 ) , 2) AS revenue
FROM
(
SELECT
event.eventID,
event.staffID,
event.role,
TIMEDIFF( Time, Pause ) AS Total,
CASE WHEN Position = 'Teamleader' THEN (Teamleader)
WHEN Position = 'Waiter' THEN (Waiter)
ELSE '0'
END AS Rate
FROM event
INNER JOIN rates ON event.clientid = rates.clientid
INNER JOIN eventoverview ON event.eventID = eventoverview.eventID
WHERE Storno =0
GROUP BY event.eventID, event.clientID
)q1
GROUP BY q1.staffID
) AS t
GROUP BY staffID;
Note that you might get inconsistent data because of the use of SELECT * with GROUP BY staffID only. The columns that are not in the GROUP BY clause need to be enclosed in an aggregate function; otherwise MySQL will return an arbitrary value for them. This is not recommended, and it is not standard SQL.
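A minimal sketch of the safer shape: select only the grouping column plus aggregates. It runs against in-memory SQLite from Python as a stand-in for MySQL (an assumption), and the shifts table with a precomputed rate and duration in seconds is a hypothetical simplification of the joined event, rates and eventoverview tables.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. The 'shifts' table is hypothetical:
# one row per staff member and event, with rate and duration already resolved.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts (eventID INTEGER, staffID INTEGER, rate REAL, seconds INTEGER)"
)
conn.executemany(
    "INSERT INTO shifts VALUES (?, ?, ?, ?)",
    [
        (1, 10, 20.0, 7200),  # staff 10: 2 h at 20 per hour
        (2, 10, 20.0, 3600),  # staff 10: 1 h at 20 per hour
        (1, 11, 15.0, 5400),  # staff 11: 1.5 h at 15 per hour
    ],
)

sql = """
SELECT staffID,
       COUNT(*)               AS events,
       ROUND(SUM(revenue), 2) AS total_revenue
FROM (
    -- Revenue per staff member and event, as in the question.
    SELECT staffID, ROUND(rate * seconds / 3600.0, 2) AS revenue
    FROM shifts
) AS per_event
GROUP BY staffID          -- every selected column is grouped or aggregated
ORDER BY staffID
"""
totals = conn.execute(sql).fetchall()
print(totals)
```

Because nothing outside the GROUP BY is selected without an aggregate, the result does not depend on which row the server happens to pick for a group.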

How to make sure HAVING happens before GROUP BY

I'm grabbing a list of banks that are within a certain distance of a point:
ICBC 6805 119.86727673154
Bank of Shanghai 7693 372.999006839511
Bank of Ningbo 7626 379.19406334356
ICBC 6790 399.580754911156
Minsheng Bank 8102 485.904900718796
Standard Chartered Bank 8205 551.038506011767
Guangdong Development Bank 8048 563.713291030103
Bank of Shanghai 7688 575.327270234431
Bank of Nanjing 7622 622.249663674778
However, I just want to grab one venue of each chain.
The query so far
SELECT name, id , (
GLength( LineStringFromWKB( LineString( `lnglat` , POINT( 121.437478728836, 31.182877821277 ) ) ) )
) *95000 AS `distance`
FROM `banks`
WHERE (
lnglat != ""
)
AND (
published =1
)
HAVING (
distance <700
)
ORDER BY `distance` ASC
Using GROUP BY name doesn't work, because the row chosen for each group may not fall within the distance range. In other words, if there is an ICBC over 700 m away with a lower id, then ICBC will not appear in the results even though two ICBC branches are within 700 m. So I suspect this happens because GROUP BY happens before HAVING.
Or maybe there is a different solution?
I could not move the distance check to the WHERE clause, as it is not a real column: #1054 - Unknown column 'distance' in 'where clause'
Select your entire query as a derived table and then do the GROUP BY on that.
E.g.
Select * FROM
(SELECT name, id , (
GLength( LineStringFromWKB( LineString( `lnglat` , POINT( 121.437478728836, 31.182877821277 ) ) ) )
) *95000 AS `distance`
FROM `banks`
WHERE (
lnglat != ""
)
AND (
published =1
)
HAVING (
distance <700
)
ORDER BY `distance` ASC) t
GROUP BY t.name
Are you sure your sample is complete, as there is no GROUP BY clause? If you want banks within 700 m, then put that in the WHERE condition. If you only want one bank reported, then put that in the GROUP BY. You might need to repeat the GLength expression in the GROUP BY rather than using the alias, depending on your version of SQL. You are not grabbing a list of banks a certain distance from a point; you are grabbing banks with a position and calculating the distance from a certain point. You want only banks within 700 of the calculated distance, and if a bank is repeated you only want it listed once.
SELECT name, id, (GLength( ...) AS [distance]
FROM [banks]
WHERE [lnglat] != "" ... AND [distance] <700
Group By [name], [id], [distance]
ORDER BY [distance] ASC
I'm not sure if this is what you are looking for, just getting one bank with the least distance.
SELECT banks.name, banks.id, banks_with_least_distance.distance
FROM banks JOIN
(
SELECT name, min(
GLength( LineStringFromWKB( LineString( `lnglat` , POINT( 121.437478728836, 31.182877821277 ) ) ) )
*95000) AS `distance`
FROM `banks`
WHERE (lnglat != "") AND (published =1) AND (GLength( LineStringFromWKB( LineString( `lnglat` , POINT( 121.437478728836, 31.182877821277 ) ) ) ) *95000 < 700)
GROUP BY `name`
) AS banks_with_least_distance ON banks.name = banks_with_least_distance.name
ORDER BY banks_with_least_distance.distance DESC
Edited: changed the distance in the where clause to the actual formula.
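To see the "filter first, then pick one row per chain" idea in isolation, here is a minimal sketch run against in-memory SQLite from Python as a stand-in for MySQL (an assumption). To keep it short, distance is stored as a plain column, which is hypothetical; in the real table it would be the GLength expression. The rows are made up, loosely following the sample output above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. 'distance' is a stored column here
# (hypothetical); in the question it is computed from lnglat.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE banks (id INTEGER, name TEXT, distance REAL)")
conn.executemany(
    "INSERT INTO banks VALUES (?, ?, ?)",
    [
        (6700, "ICBC", 950.0),  # lower id but out of range
        (6790, "ICBC", 399.58),
        (6805, "ICBC", 119.87),
        (7688, "Bank of Shanghai", 575.33),
        (7693, "Bank of Shanghai", 372.99),
        (7626, "Bank of Ningbo", 379.19),
    ],
)

sql = """
SELECT b.name, b.id, b.distance
FROM banks AS b
JOIN (
    -- Filter to the range first, then keep the smallest distance per chain.
    SELECT name, MIN(distance) AS distance
    FROM banks
    WHERE distance < 700
    GROUP BY name
) AS nearest
  ON nearest.name = b.name
 AND nearest.distance = b.distance   -- match the branch, not just the chain
ORDER BY b.distance ASC
"""
nearest_branches = conn.execute(sql).fetchall()
for row in nearest_branches:
    print(row)
```

Joining on both name and distance is what ties each chain to its nearest branch, so the id returned belongs to that branch and not to an arbitrary row of the group.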