How to combine rows and take highs and lows from those rows - mysql

I am trying to generate a 5M OHLC chart (Open, High, Low, Close). I'm currently reading in data by the minute, but I want to make a 5-minute chart as well, which I can do pretty easily by simply doing:
SELECT * FROM intraday_data.intraday WHERE intraday.id mod 5 = 0;
However, this doesn't accurately represent the data. For the OHLC chart to be accurate, each 5-minute row would also need the open from the very first row and the close from the very last row, plus the highest high and the lowest low across all 5 rows, if that makes sense. I would also like it to add up all of the volumes.
Here is the current schema:
As you can see, the highlighted open in the first row would need to be carried into the final row, along with the highlighted highest high in the 5th row, the lowest low across the rows, and the total volume. So essentially, after running the function, the row presented should be:
id 5: open 4402.75, high 4403, low 4402.5, volume 12+24+37+32+29 = 134
Obviously, I would need to iterate over all of the rows and return every 5th row with the data combined from the last 5 rows.
Current Updated Query:
select open_close_t.cross_id,open_close_t.open_val,open_close_t.close_val,high_low_t.high,high_low_t.low,high_low_t.total_volume from (
select open_close.max_id as cross_id,open_t.open as open_val,close_t.close as close_val from
(select max(id) as max_id,min(id) as min_id from intraday group by FLOOR(id/5)) as open_close
inner join intraday as open_t on (open_t.id=open_close.min_id)
inner join intraday as close_t on (close_t.id=open_close.min_id)
) as open_close_t
left join (
select max(id) as cross_id,max(high_val) as high,min(low_val) as low,sum(volume) as total_volume
from (select id,GREATEST(open,high,low,close) as high_val,GREATEST(open,high,low,close) as low_val,volume from intraday_data.intraday) as _t
group by FLOOR(id/5)
) as high_low_t on (open_close_t.cross_id=high_low_t.cross_id)
Current Updated Results:

We can separate this problem into two main queries:
1. Get the open and close values:
   - get the max and min id of every group of 5 rows;
   - use that max_id and min_id to look up the close and open values.
2. Get the highest and lowest values:
   - first compute the highest and lowest of open, high, low and close for each row;
   - group the previous result by every 5 rows and take the highest and lowest values.
3. Join the two generated tables on the max_id (cross_id in the SQL) of each group of 5 rows.
select open_close_t.cross_id,open_close_t.open_val,open_close_t.close_val,high_low_t.high,high_low_t.low,high_low_t.total_volume from (
select open_close.max_id as cross_id,open_t.open as open_val,close_t.close as close_val from
(select max(id) as max_id,min(id) as min_id from intraday group by CEILING(id/5)) as open_close
inner join intraday as open_t on (open_t.id=open_close.min_id)
inner join intraday as close_t on (close_t.id=open_close.max_id)
) as open_close_t
left join (
select max(id) as cross_id,max(high_val) as high,min(low_val) as low,sum(volume) as total_volume
from (select id,greatest(open,high,low,close) as high_val,least(open,high,low,close) as low_val,volume from intraday) as _t
group by CEILING(id/5)
) as high_low_t on (open_close_t.cross_id=high_low_t.cross_id)
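Changes from the query in the question:
- Fixed the "every 4 instead of every 5" bug: with ids starting at 1, FLOOR(id/5) puts only ids 1-4 in the first group, so CEILING(id/5) should be used instead.
- Fixed the open_close join: close_t.id=open_close.min_id is changed to close_t.id=open_close.max_id, so the close comes from the last row of each group.
- Fixed the lowest value: it now uses LEAST, not GREATEST.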
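To see why FLOOR produced groups of 4, here is a small Python sketch (not part of the original answer) that buckets hypothetical ids 1 to 10 the way FLOOR(id/5) and CEILING(id/5) would, assuming the ids start at 1 and have no gaps:

```python
import math


def bucket_ids(ids, key):
    """Group ids by key(id), keeping the ids of each bucket in order."""
    buckets = {}
    for i in ids:
        buckets.setdefault(key(i), []).append(i)
    return buckets


ids = range(1, 11)  # hypothetical ids 1..10 with no gaps

# Mimics GROUP BY FLOOR(id/5): the first bucket only gets ids 1-4.
floor_buckets = bucket_ids(ids, lambda i: math.floor(i / 5))

# Mimics GROUP BY CEILING(id/5): every bucket holds exactly 5 ids.
ceil_buckets = bucket_ids(ids, lambda i: math.ceil(i / 5))

print(floor_buckets)  # {0: [1, 2, 3, 4], 1: [5, 6, 7, 8, 9], 2: [10]}
print(ceil_buckets)   # {1: [1, 2, 3, 4, 5], 2: [6, 7, 8, 9, 10]}
```

If the ids have gaps or do not start at 1, neither expression guarantees exactly 5 rows per group.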
I made a db-fiddle example; if there are further problems, we can test them on db-fiddle.
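For anyone without db-fiddle handy, here is a sketch of the corrected query run through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here: CEILING may not be available (math functions are a compile-time option) and GREATEST/LEAST are spelled as the multi-argument scalar max()/min(), so (id + 4) / 5 with integer division replaces CEILING(id/5) for positive integer ids. The bar values are hypothetical, chosen so the first group reproduces the expected row from the question.

```python
import sqlite3

# Hypothetical 1-minute bars: (id, open, high, low, close, volume).
BARS = [
    (1, 4402.75, 4402.75, 4402.50, 4402.50, 12),
    (2, 4402.50, 4402.75, 4402.50, 4402.75, 24),
    (3, 4402.75, 4402.75, 4402.50, 4402.75, 37),
    (4, 4402.75, 4402.75, 4402.50, 4402.50, 32),
    (5, 4402.50, 4403.00, 4402.50, 4403.00, 29),
    (6, 4403.00, 4403.25, 4402.75, 4403.25, 10),
    (7, 4403.25, 4403.50, 4403.00, 4403.00, 5),
]

# Same shape as the answer's query, with SQLite stand-ins:
#   (id + 4) / 5             -> CEILING(id/5) for positive integer ids
#   max(a, b, ...)/min(...)  -> GREATEST(...)/LEAST(...)
QUERY = """
SELECT oc.cross_id, o.open AS open_val, c.close AS close_val,
       hl.high, hl.low, hl.total_volume
FROM (SELECT MAX(id) AS cross_id, MIN(id) AS min_id
      FROM intraday GROUP BY (id + 4) / 5) AS oc
JOIN intraday AS o ON o.id = oc.min_id      -- open of the first row
JOIN intraday AS c ON c.id = oc.cross_id    -- close of the last row
LEFT JOIN (SELECT MAX(id) AS cross_id, MAX(high_val) AS high,
                  MIN(low_val) AS low, SUM(volume) AS total_volume
           FROM (SELECT id, volume,
                        max(open, high, low, close) AS high_val,
                        min(open, high, low, close) AS low_val
                 FROM intraday) AS _t
           GROUP BY (id + 4) / 5) AS hl
  ON hl.cross_id = oc.cross_id
ORDER BY oc.cross_id
"""


def five_minute_bars(bars):
    """Load 1-minute bars into an in-memory DB and return one row per 5 ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intraday (id INTEGER PRIMARY KEY, open REAL, "
                 "high REAL, low REAL, close REAL, volume INTEGER)")
    conn.executemany("INSERT INTO intraday VALUES (?, ?, ?, ?, ?, ?)", bars)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in five_minute_bars(BARS):
    print(row)
```

The last group (ids 6 and 7) has only two rows, so a partial bar appears at the end whenever the row count is not a multiple of 5.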

Related

Problem with last record in ms access query

I have 3 Access queries. The first query (Q0) filters a table to get specific data:
SELECT Table1.ID, Table1.Machine, Table1.Po, Table1.Priority, Table1.Zdate, Table1.Status
FROM Table1
WHERE (((Table1.Status)<>"Not Needed"));
The second query (Q1) sorts the first query by multiple conditions, such as date and priority:
SELECT Q0.Machine, Q0.Priority, Q0.Zdate, Q0.Po, Q0.ID, Q0.Status
FROM Q0
GROUP BY Q0.Machine, Q0.Priority, Q0.Zdate, Q0.Po, Q0.ID, Q0.Status
ORDER BY Q0.Machine, Q0.Priority DESC , Q0.Zdate;
The third query (Q2):
SELECT Table2.MachineNumber, Last(Q1.Priority) AS LastOfPriority, Last(Q1.Zdate) AS LastOfZdate, Last(Q1.Po) AS LastOfPo, Last(Q1.ID) AS LastOfID, Last(Q1.Status) AS LastOfStatus
FROM Table2 LEFT JOIN Q1 ON Table2.MachineNumber = Q1.Machine
GROUP BY Table2.MachineNumber;
Q2 should get the last record for each machine number in Q1, along with the rest of the data from that same record, so the priority should all be "1" with the right date, the right Po and the right status. Unfortunately, that doesn't happen; instead it gets the record with the last ID for each machine number.
The correct result should look like this:
Sorry for the long question, guys. Thanks in advance.
Edit #1
I just need the last record for each machine number in Q1, regardless of the conditions, because it is not always the latest date or the highest priority. I need Q2 to be strictly tied to the last record of each machine number.
Don't use Last; use Max instead:
SELECT Table2.MachineNumber, Max(Q1.Priority) AS HighPriority, Max(Q1.Zdate) As LatestDate, ... etc.
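Access itself isn't scriptable here, so this is a rough sketch of the Max-based Q2 using Python's sqlite3 as a stand-in, with hypothetical machines and records. Dates are stored as ISO-8601 strings (an assumption) so that MAX compares them chronologically; Po is included only to show how one more column behaves.

```python
import sqlite3


def max_per_machine(machines, q1_rows):
    """Run a Max-based Q2 against in-memory stand-ins for Table2 and Q1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table2 (MachineNumber TEXT)")
    conn.execute("CREATE TABLE Q1 (Machine TEXT, Priority INTEGER, Zdate TEXT, Po TEXT)")
    conn.executemany("INSERT INTO Table2 VALUES (?)", [(m,) for m in machines])
    conn.executemany("INSERT INTO Q1 VALUES (?, ?, ?, ?)", q1_rows)
    rows = conn.execute("""
        SELECT Table2.MachineNumber,
               MAX(Q1.Priority) AS HighPriority,
               MAX(Q1.Zdate)    AS LatestDate,
               MAX(Q1.Po)       AS MaxPo
        FROM Table2 LEFT JOIN Q1 ON Table2.MachineNumber = Q1.Machine
        GROUP BY Table2.MachineNumber
        ORDER BY Table2.MachineNumber
    """).fetchall()
    conn.close()
    return rows


# Hypothetical data: M1 has two records, M2 one, M3 none.
rows = max_per_machine(
    ["M1", "M2", "M3"],
    [("M1", 1, "2019-01-05", "PO7"),
     ("M1", 0, "2019-01-09", "PO3"),
     ("M2", 1, "2019-01-02", "PO1")],
)
for row in rows:
    print(row)
```

In this data the M1 row combines Priority 1 and Po "PO7" from one record with the date 2019-01-09 from another: each Max is computed independently per column, which is worth keeping in mind given the Edit #1 requirement that Q2 be tied to a single record. M3 has no Q1 records, so the LEFT JOIN returns NULLs for it.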

MySQL Left Join throwing off my count numbers

I'm doing a left join on a table to get the number of leads we've generated today and how many times we've called those leads. I figured a left join would be the best thing to do, so I wrote the following query:
SELECT
COUNT(rad.phone_number) as lead_number, rals.lead_source_name as source, COUNT(racl.phone_number) as calls, SUM(case when racl.contacted = 1 then 1 else 0 end) as contacted
FROM reporting_app_data rad
LEFT JOIN reporting_app_call_logs racl ON rad.phone_number = racl.phone_number, reporting_app_lead_sources rals
WHERE DATE(rad.created_at) = CURDATE() AND rals.list_id = rad.lead_source
GROUP BY rad.lead_source;
But the problem is that if the reporting_app_call_logs table has multiple entries for the same phone number (that is, a phone number has been called multiple times that day), then lead_number (which should count how many leads were generated on the current day, grouped by lead_source) equals the number of calls. So the count from the LEFT table equals the count from the RIGHT table.
How do I write a SQL query that gets the number of leads and the total number of calls per lead source?
Try COUNT(DISTINCT expression)
In other words, change COUNT(rad.phone_number) to COUNT(DISTINCT rad.phone_number)
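A minimal way to see the fan-out, sketched with Python's sqlite3 and hypothetical rows; the date filter and the reporting_app_lead_sources lookup are dropped for brevity. Lead 555-0001 has three call-log rows, so the LEFT JOIN repeats it three times:

```python
import sqlite3


def lead_counts():
    """Compare COUNT vs COUNT(DISTINCT) after a LEFT JOIN that multiplies rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reporting_app_data (phone_number TEXT, lead_source INTEGER)")
    conn.execute("CREATE TABLE reporting_app_call_logs (phone_number TEXT, contacted INTEGER)")
    # Hypothetical data: two leads from source 1; the first one was called three times.
    conn.executemany("INSERT INTO reporting_app_data VALUES (?, ?)",
                     [("555-0001", 1), ("555-0002", 1)])
    conn.executemany("INSERT INTO reporting_app_call_logs VALUES (?, ?)",
                     [("555-0001", 1), ("555-0001", 0), ("555-0001", 0)])
    row = conn.execute("""
        SELECT rad.lead_source,
               COUNT(rad.phone_number)          AS naive_leads,
               COUNT(DISTINCT rad.phone_number) AS lead_number,
               COUNT(racl.phone_number)         AS calls,
               SUM(CASE WHEN racl.contacted = 1 THEN 1 ELSE 0 END) AS contacted
        FROM reporting_app_data rad
        LEFT JOIN reporting_app_call_logs racl ON rad.phone_number = racl.phone_number
        GROUP BY rad.lead_source
    """).fetchone()
    conn.close()
    return row


print(lead_counts())
```

COUNT(rad.phone_number) counts one per joined row (4), while COUNT(DISTINCT rad.phone_number) returns the 2 actual leads; the calls and contacted columns are unaffected.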

SQL Capture duplicate records across two DIFFERENT columns

I am writing an Exception Catching Page using MySQL to catch duplicate billing entries in the following scenario.
Item details are entered in a table which has the following two columns (among others).
ItemCode VARCHAR(50), BillEntryDate DATE
It often happens that the same item's bill is entered multiple times, but over a period of a few days. Like:
"Football","2019-01-02"
"Basketball","2019-01-02"
...
...
"Football","2019-01-05"
"Rugby","2019-01-05"
...
"Handball","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
In the above example, the item Football is billed twice: first on 2 Jan and again on 5 Jan. Similarly, the item Rugby is billed three times, on 5, 7 and 10 Jan.
I am looking to write simple SQL which can pick up each item (say, using a DISTINCT(ItemCode) clause) and then display all the records which are duplicates within a period of 30 days.
In the above case, the expected output should be the following 5 records:
"Football","2019-01-02"
"Football","2019-01-05"
"Rugby","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
I am trying to run the following SQL:
select * from tablen a, tablen b where a.ItemCode=b.ItemCode and a.BillEntryDate = b.BillEntryDate+30;
However, this seems highly inefficient: it runs for a long time without displaying any records.
Is there a less complex and faster method?
I did explore existing topics (like How do I find duplicates across multiple columns?), but they catch duplicates where BOTH columns have the same value. My requirement is one column with the same value and the second column varying within a month-long date range.
You can use:
select t.*
from tablen t
where exists (select 1
from tablen t2
where t2.ItemCode = t.ItemCode and
t2.BillEntryDate <> t.BillEntryDate and
t2.BillEntryDate >= t.BillEntryDate - interval 30 day and t2.BillEntryDate <= t.BillEntryDate + interval 30 day
);
This will pick up both duplicates in the pair.
For performance, you want an index on (ItemCode, BillEntryDate).
With EXISTS:
select ItemCode, BillEntryDate
from tablename t
where exists (
select 1 from tablename
where
ItemCode = t.ItemCode
and
abs(datediff(BillEntryDate, t.BillEntryDate)) between 1 and 30
)
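Both answers can be checked against the question's sample rows. This sketch runs the second (EXISTS) version through Python's sqlite3; SQLite has no DATEDIFF, so the difference of julianday() values stands in for it, and the index follows the (ItemCode, BillEntryDate) suggestion from the first answer:

```python
import sqlite3

# The question's sample rows (the "..." rows omitted).
BILLS = [
    ("Football", "2019-01-02"), ("Basketball", "2019-01-02"),
    ("Football", "2019-01-05"), ("Rugby", "2019-01-05"),
    ("Handball", "2019-01-05"), ("Rugby", "2019-01-07"),
    ("Rugby", "2019-01-10"),
]


def duplicates_within(bills, days=30):
    """Return bills that have another bill for the same item 1..days days away."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (ItemCode TEXT, BillEntryDate TEXT)")
    conn.execute("CREATE INDEX idx_item_date ON tablename (ItemCode, BillEntryDate)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?)", bills)
    rows = conn.execute("""
        SELECT ItemCode, BillEntryDate
        FROM tablename t
        WHERE EXISTS (
            SELECT 1 FROM tablename t2
            WHERE t2.ItemCode = t.ItemCode
              -- julianday() difference stands in for MySQL's DATEDIFF
              AND abs(julianday(t2.BillEntryDate) - julianday(t.BillEntryDate))
                  BETWEEN 1 AND ?
        )
        ORDER BY ItemCode, BillEntryDate
    """, (days,)).fetchall()
    conn.close()
    return rows


for row in duplicates_within(BILLS):
    print(row)
```

It returns exactly the five expected records; Basketball and Handball appear only once, so they are excluded. Because of BETWEEN 1 AND 30, two bills for the same item on the same day are not flagged.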

Hive Query that returns distinct value that each User has

I have a MySQL table:
User Value
A 1
A 12
A 3
B 4
B 3
B 1
C 1
C 1
C 8
D 34
D 1
E 1
F 1
G 56
G 1
H 1
H 3
C 3
F 3
E 3
G 3
I need to run a query which returns the 2nd distinct value that each user has.
That is, if some values are accessed by every user, then, based on their occurrence, pick the 2nd distinct value.
So in the data above, 1 and 3 are accessed by every user. 1 occurs more often than 3, so the 2nd distinct value will be 3.
So I thought I would first get all the distinct users:
create table temp AS Select distinct user from table;
Then I would have an outer query:
Select value from table where value in (...)
Programmatically, I could iterate through each user's values like a Map, but I just couldn't write that as a Hive query.
This will return the second most frequent value among the values in your list that span all users. There isn't a second such value in the table, which I expect is a typo in the data. In real data you will likely have multiple ties that you need to figure out how to handle.
Select value as second_distinct from
(select value, rank() over (order by occurrences desc) as rank
from
(SELECT value, unique_users, max(count_users) as count_users, count(value) as occurrences
from
(select value, size(collect_set(user) over (partition by value))
as count_users from my_table
) t
left outer join
(select count(distinct user) as unique_users from my_table
) t2 on (1=1)
where unique_users=count_users
group by value, unique_users
) t3
) t4
where rank = 2;
This works. It returns nothing because there is only one value that every user visited (the value 1). Value 3 is not a solution because not every user has seen that value in your data. I expect you intended 3 to be returned, but again it doesn't span all the users (user D did not see value 3).
Not sure how #invoketheshell's answer was marked correct; it doesn't run and it needs 6 MR jobs. This will get you there in 4 and is less code.
Query:
select value
from (
select value, value_count, rank() over (order by value_count desc) rank
from (
select value, count(value) value_count
from (
select value, num_users, max(num_users) over () max_users
from (
select value
, size(collect_set(user) over (partition by value)) num_users
from db.table ) x ) y
where num_users = max_users
group by value ) z ) f
where rank = 2
Output:
3
EDIT: Let me clarify my solution, as there seems to be some confusion. The OP's example says
"So as above 1 & 3 is being accessed by each User ... "
As my comment below the question suggests, in the example given, user D never accesses value 3. I assumed this was a typo, added that row to the dataset, and then added another 1 as well so that there are more 1's than 3's. So my code correctly returns 3, which was the desired output. If you run this script on the actual dataset, it will also produce the correct output, which is nothing, because there isn't a "2nd distinct" value. The only time it could produce an incorrect value is if there were no one specific number accessed by all users, which illustrates the point I was trying to make to #invoketheshell: if there is no single number that every user has accessed, running a query with 6 map-reduce jobs is an absurd way to find that out. Since we are using Hive, I believe it is fair to assume that if this were a "real-world" problem, it would most likely be executed on at least hundreds of TBs of data (probably more). In the interest of preserving time and resources, it would behoove an individual to at least check that one number had been accessed by all users before running a massive query whose analysis hinges on that assumption being true.
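Here is a compact sketch of the same idea (keep only the values seen by every user, then take the second most frequent) using Python's sqlite3 instead of Hive, so it says nothing about MR job counts. It uses GROUP BY/HAVING and LIMIT 1 OFFSET 1 rather than window functions; with ties, that simply takes the second row in sort order, which is not the same as rank() = 2. The data is the question's table:

```python
import sqlite3

# The question's (User, Value) rows.
DATA = [
    ("A", 1), ("A", 12), ("A", 3), ("B", 4), ("B", 3), ("B", 1),
    ("C", 1), ("C", 1), ("C", 8), ("D", 34), ("D", 1), ("E", 1),
    ("F", 1), ("G", 56), ("G", 1), ("H", 1), ("H", 3), ("C", 3),
    ("F", 3), ("E", 3), ("G", 3),
]


def second_common_value(rows):
    """Among values seen by every user, return the 2nd most frequent, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (user TEXT, value INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    row = conn.execute("""
        SELECT value
        FROM my_table
        GROUP BY value
        -- keep only values that every distinct user has accessed
        HAVING COUNT(DISTINCT user) = (SELECT COUNT(DISTINCT user) FROM my_table)
        ORDER BY COUNT(*) DESC
        LIMIT 1 OFFSET 1
    """).fetchone()
    conn.close()
    return row[0] if row else None


print(second_common_value(DATA))
print(second_common_value(DATA + [("D", 3)]))
```

On the data as posted it returns None, because only 1 is seen by all users; adding the row ("D", 3), as the answer above assumed, makes it return 3.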

SSRS List formatting

I need to make a list in SSRS that has a numbered range from 0-30 in the rows and shows null values as a dash. For example, if 8 players scored 10 points, the list would show an 8 next to the row value 10, but for the other 30 numbers it would show a dash (-).
You may have to make a couple of adjustments, but this should get you most of the way there. There wasn't a lot of information to go by to determine exactly what you want.
Adjust your dataset query in SSRS to the following, replacing the subquery Z with your current query that provides the points and player count. I inserted dummy data in there for now so I would have data for this example (8 = 1, 13 = 1, 17 = 2).
With X as
(
select top (30) n = ROW_NUMBER() OVER (ORDER BY m1.number)
from master.dbo.spt_values as m1
CROSS JOIN master.dbo.spt_values as m2
)
,Y as
(
Select Points, PlayerCount, ROW_NUMBER() OVER (Order by PlayerCount) RowNum
from (
--replace this with your query to return the data
--with your 2 columns for points and player count
Select 8 as Points, 1 as PlayerCount
UNION
Select 17 as Points, 2 as PlayerCount
UNION
Select 13 as Points, 1 as PlayerCount) Z
)
Select x.n as Points, /*isnull(Y.Points,0),*/ Isnull(Y.PlayerCount,0) PlayerCount
from X
left join Y on X.n = Y.Points;
The CTE labeled X is what creates the 30 spots (1-30). If you want 31 spots (0-30 inclusive), change the query in X to:
select top (31) n = ROW_NUMBER() OVER (ORDER BY m1.number)-1
from master.dbo.spt_values as m1
CROSS JOIN master.dbo.spt_values as m2
You end up with a data set with 2 columns: Points and PlayerCount.
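As a quick check of that data set's shape, here is a sketch in Python's sqlite3 (standing in for SQL Server, so a recursive CTE replaces the spt_values cross join) that builds the 0-30 points range, left-joins the answer's dummy data, and renders 0 as a dash the way the SSRS "Show zero as: -" option does:

```python
import sqlite3

# The answer's dummy (Points, PlayerCount) rows from subquery Z.
SCORES = [(8, 1), (17, 2), (13, 1)]


def points_0_to_30(scores):
    """Return (Points, PlayerCount) for every points value 0..30, 0 if nobody scored it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE z (Points INTEGER, PlayerCount INTEGER)")
    conn.executemany("INSERT INTO z VALUES (?, ?)", scores)
    rows = conn.execute("""
        WITH RECURSIVE x(n) AS (
            SELECT 0
            UNION ALL
            SELECT n + 1 FROM x WHERE n < 30
        )
        SELECT x.n, COALESCE(z.PlayerCount, 0)
        FROM x LEFT JOIN z ON x.n = z.Points
        ORDER BY x.n
    """).fetchall()
    conn.close()
    return rows


def render(rows):
    """Mimic the textbox's "Show zero as: -" setting."""
    return [(points, str(count) if count else "-") for points, count in rows]


table = render(points_0_to_30(SCORES))
for points, shown in table[:10]:
    print(points, shown)
```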
Now create your list in SSRS.
Insert a list. In that list, insert 2 columns to the right. Then delete the left most (original) column.
Set the expression for the left textbox in the list to the points field.
Set the expression for the right textbox in the list to the PlayerCount field.
Add a row outside the group above. Type in column headers for each column. I used Points and Player Count.
In the Group Properties for the Details row group, go to Sorting and set the Sort By column to points. The order should be A to Z.
Adjust the height of the rows to whatever looks best to you. I used .25 for the header and detail rows.
In the Tablix properties, check the box next to Keep together on one page if possible.
On the text box containing the Player Count field (bottom right), go to text box properties, Number category. Set it to 0 decimal places. Check the box next to Show zero as:. Make sure - is selected.
That gives you a list that looks like this:
If for some reason you want to see the numbers that have more than 0 players with that many points at the top, followed by the rest, you can do this with a calculated column.
Right-click on your data set and add a calculated field. Set the Name to PointsCountSort and the expression to =iif(Fields!PlayerCount.Value = 0, 2, 1). Click OK.
In the Sort Order of the Details group, change the sort order to sort by PointsCountSort A to Z, then Points A to Z.
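The calculated-field sort amounts to a two-part sort key. A small Python sketch with hypothetical (Points, PlayerCount) rows:

```python
def ssrs_sort(rows):
    """Order (Points, PlayerCount) rows: scored points first, then the rest, each by Points."""
    # First key mirrors the PointsCountSort field: =iif(Fields!PlayerCount.Value = 0, 2, 1)
    return sorted(rows, key=lambda row: (2 if row[1] == 0 else 1, row[0]))


rows = [(0, 0), (1, 0), (8, 1), (13, 1), (17, 2)]
print(ssrs_sort(rows))
```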
That makes the list sorted like this: