How to stack ols results as a table in dolphindb? - regression

I need to run a regression with group by in dolphindb. Each group's regression result is a vector and I would like to stack all results as a table, with each coefficient estimate as a column.
t=table(1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 as id, rand(1.0, 15) as y, 1..15 as x1, 3 2 9 5 0 6 7 8 9 10 3 5 0 8 0 as x2)
select ols(y, (x1, x2)) from t group by id
I got the following error message:
select ols(y, x1, x2) as ols_y from t group by id => The column 'ols(y, x1, x2)' must use aggregate function.
Does anyone know a better way to stack the results than to use to a -for loop?

As the result of ols is a vector for each group, you can specify the result as a composite column in dolphindb in the following way:
t=table(1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 as id, rand(1.0, 15) as y, 1..15 as x1, 3 2 9 5 0 6 7 8 9 10 3 5 0 8 0 as x2)
select ols(y, (x1, x2)) as `intercept`x1`x2 from t group by id

Related

Count the duplicate values on multiple array and sort it to highest

I have a target where I have this DUPLICATE checker with COUNT for multiple arrays. As we can see. Any suggestions on how I make this work? Thank you and have a nice day.
DATA:
id
value
1
[5,6,8,4,2]
2
[2,3,4,1,8]
3
[9,3,2,1,10]
Normal result:
number
count
1
2
2
3
3
2
4
2
5
1
6
1
7
0
8
2
9
1
10
1
This is my target result with sorting (Highest count):
number
count
2
3
1
2
3
2
4
2
8
2
5
1
6
1
10
1
9
1

Query with CASE statement in Group By I need to create an entry where no results occur in the WHEN

I have a query which groups up incoming payments into date ranges (1-7 days, 3-6 months etc.) and it largely works as I had hoped. However, I want to return a row which says 0 when no income is expected in the date range.
The group by looks like this:
group by
CASE WHEN timestampdiff(day,curdate(),data.duedate) between 0 and 7 then 1
WHEN timestampdiff(day,curdate(),data.duedate) between 8 and 14 then 2
WHEN timestampdiff(day,curdate(),data.duedate) between 15 and 30 then 3
WHEN timestampdiff(month,curdate(),data.duedate) between 1 and 2 then 4
WHEN timestampdiff(month,curdate(),data.duedate) between 2 and 3 then 5
WHEN timestampdiff(month,curdate(),data.duedate) between 3 and 6 then 6
WHEN timestampdiff(month,curdate(),data.duedate) between 6 and 12 then 7
WHEN timestampdiff(year,curdate(),data.duedate) between 1 and 2 then 8
WHEN timestampdiff(year,curdate(),data.duedate) between 2 and 3 then 9
WHEN timestampdiff(year,curdate(),data.duedate) between 3 and 4 then 10
WHEN timestampdiff(year,curdate(),data.duedate) between 5 and 6 then 11
WHEN timestampdiff(year,curdate(),data.duedate) >= 7 then 12
This works correctly in that it will give me the correct amounts, but I want to force the code to give me a 0. So I currently get this:
1 300000
5 150000
8 300000
What I actually want is this:
1 300000
2 0
3 0
4 0
5 150000
6 0
7 0
8 300000
etc.
This is the entire query - I've tried using an IFNULL() but had no success:
select
sum(data.principaldue+data.interestdue) as m
from
(select
la.id
,rep.duedate
,rep.PRINCIPALDUE
,rep.INTERESTDUE
from repayment rep
join loanaccount la on la.ENCODEDKEY = rep.PARENTACCOUNTKEY
join loanproduct lp on lp.ENCODEDKEY = la.PRODUCTTYPEKEY
group by
CASE WHEN timestampdiff(day,curdate(),data.duedate) between 0 and 7 then 1
WHEN timestampdiff(day,curdate(),data.duedate) between 8 and 14 then 2
WHEN timestampdiff(day,curdate(),data.duedate) between 15 and 30 then 3
WHEN timestampdiff(month,curdate(),data.duedate) between 1 and 2 then 4
WHEN timestampdiff(month,curdate(),data.duedate) between 2 and 3 then 5
WHEN timestampdiff(month,curdate(),data.duedate) between 3 and 6 then 6
WHEN timestampdiff(month,curdate(),data.duedate) between 6 and 12 then 7
WHEN timestampdiff(year,curdate(),data.duedate) between 1 and 2 then 8
WHEN timestampdiff(year,curdate(),data.duedate) between 2 and 3 then 9
WHEN timestampdiff(year,curdate(),data.duedate) between 3 and 4 then 10
WHEN timestampdiff(year,curdate(),data.duedate) between 5 and 6 then 11
WHEN timestampdiff(year,curdate(),data.duedate) >= 7 then 12
END
Order by
CASE WHEN timestampdiff(day,curdate(),data.duedate) between 0 and 7 then 1
WHEN timestampdiff(day,curdate(),data.duedate) between 8 and 14 then 2
WHEN timestampdiff(day,curdate(),data.duedate) between 15 and 30 then 3
WHEN timestampdiff(month,curdate(),data.duedate) between 1 and 2 then 4
WHEN timestampdiff(month,curdate(),data.duedate) between 2 and 3 then 5
WHEN timestampdiff(month,curdate(),data.duedate) between 3 and 6 then 6
WHEN timestampdiff(month,curdate(),data.duedate) between 6 and 12 then 7
WHEN timestampdiff(year,curdate(),data.duedate) between 1 and 2 then 8
WHEN timestampdiff(year,curdate(),data.duedate) between 2 and 3 then 9
WHEN timestampdiff(year,curdate(),data.duedate) between 3 and 4 then 10
WHEN timestampdiff(year,curdate(),data.duedate) between 5 and 6 then 11
WHEN timestampdiff(year,curdate(),data.duedate) >= 7 then 12
END
This is not a complete answer, but would be too big for comments;
A temporary table with numbers could be useful:
MySql temporary tables:
CREATE TEMPORARY TABLE TempTable (num int);
INSERT INTO TmpTable VALUES(1,2,3,4,5,6,7,8,9,10,11,12 ...);
Then you could right join on this table to make sure missing values are included.
Lets say you have this:
results(num, val):
1 300000
5 150000
8 300000
This should result in your desired output:
SELECT numbers.num, COALESCE(results.val, 0) as val
FROM results RIGHT JOIN TempTable numbers on results.num = numbers.num
WHERE numbers.num <= 12 --or other max number
1 300000
2 0
...
5 150000
...
Hope this helps.
Edit:
If you don't have permission to create temporary tables, look for a workaround to select consecutive integers, for example:
SELECT #row := #row + 1 as row, t.*
FROM some_table t, (SELECT #row := 0) r
Where some_table is any table with enough rows.
Probably use a top N on that.
Another dirty workaround, might be good enough if you don't need many numbers:
SELECT 1 num
UNION
SELECT 2 num
UNION
...
Edit:
Slightly tidier workaround:
SELECT * FROM (VALUES (1), (2), (3), ... ) x(i)

MySQL COUNT values fo each column

I’m trying to figure out how to count the values in more than one column.
It seem the first COUNT I do gives me the correct results but everything I’ve tried to get the second column count gives the wrong result.
For example, with the following two columns,
Q2 Q3
1 1
1 1
2 2
1 1
1 1
5 5
3 5
5 3
4 1
2 2
3 3
3 3
5 5
3 3
2 1
2 1
3 2
4 1
1 1
1 1
2 2
5 5
3 3
2 1
3 3
1 1
2 1
SELECT COUNT(Q2) AS QU2 FROM mytable GROUP BY Q2
QU2 = 7 7 7 2 4
gives me the count for Q2. 7 one’s, 7 two’s and so on...
However, the following gives me an unexpected result.
SELECT COUNT(Q2) AS QU2, COUNT(Q3) AS QU3 FROM mytable GROUP BY Q2, Q3
7 4 3 1 5 1 2 1 3
I think its something with the GROUP BY but I don’t know how to get around it to get the needed result.
So I'm tying to get the result of
QU2 = 7 7 7 2 4
QU3 = 13 4 6 4
Or
QU2 QU3
7 13
7 4
7 6
2 4
4
and so on for QU4 QU5 ... I would appreciate any help.
Thank you
I think that this will get you closest to what you want. You can replace the numbers table with any method that generates the numbers 1 to whatever the max value is in Q2 or Q3.
CREATE TABLE dbo.Numbers (num INT)
INSERT INTO dbo.Numbers (num) VALUES (1), (2), (3), (4), (5)
SELECT
N.num,
SUM(CASE WHEN MT.Q2 = N.num THEN 1 ELSE 0 END) AS QU2,
SUM(CASE WHEN MT.Q3 = N.num THEN 1 ELSE 0 END) AS QU3
FROM
dbo.Numbers N
CROSS JOIN dbo.My_Table MT
GROUP BY
N.num
Adding additional columns (for Q4, etc.) just means adding another SUM(CASE...)
How GROUP BY works?
Let's talk about your first query (I added the column Q2 in the SELECT clause to make its output more clear):
SELECT Q2, COUNT(*) AS QU2
FROM mytable
GROUP BY Q2
First, it gets all the rows matching the WHERE criteria, if a WHERE clause exists. Because your query doesn't have a WHERE clause, all the rows from the table are read.
On the next step the rows read on the previous step are grouped by the expression specified in the GROUP BY clause (let's assume it contains only one expression, as the query above does). Internally, grouping the rows requires sorting them first.
This is how the data is organized on this step. I added horizontal separators between the rows that go in each group to make everything clear:
Q2 Q3
-------
1 1
1 1
1 1
1 1
1 1
1 1
1 1
-------
2 1
2 1
2 1
2 1
2 2
2 2
2 2
-------
3 2
3 3
3 3
3 3
3 3
3 3
3 5
-------
4 1
4 1
-------
5 3
5 5
5 5
5 5
-------
On the next step, from each group it creates a single row that goes to the generated result set.
The query above clearly returns:
Q2 QU2
--------
1 7
2 7
3 7
4 2
5 4
What happens when the GROUP BY clause contains more than one expression?
Let's take your second query (again, I added some columns to show its behaviour):
SELECT Q2, Q3, COUNT(*) AS cnt
FROM mytable
GROUP BY Q2, Q3
It works similar with the previous query but, because the GROUP BY clause contains two expression, each group created for the values of Q2 is split in sub-groups based on the value of Q3. Assuming there is another expression (let' say, Q4) in the GROUP BY clause, each sub-group created for a pair (Q2, Q3) is further divided into sub-groups for all the values of Q4 and so on.
For your table, the groups and sub-groups are as follows:
Q2 Q3
=======
1 1
1 1
1 1
1 1
1 1
1 1
1 1
=======
2 1
2 1
2 1
2 1
---
2 2
2 2
2 2
=======
3 2
---
3 3
3 3
3 3
3 3
3 3
---
3 5
=======
4 1
4 1
=======
5 3
---
5 5
5 5
5 5
=======
I used double lines to separate the groups and smaller single lines to separate the subgroups inside each group.
The output of this query is:
Q2 Q3 cnt
------------
1 1 7
2 1 4
2 2 3
3 2 1
3 3 5
3 5 1
4 1 2
5 3 1
5 5 3
How to get the desired result?
It is not possible to get the result you want using a single query. Even more, the result sets you suggest doesn't make much sense.
You can combine two queries using UNION in order to get the data you need and additional information that helps you know where those numbers come from:
SELECT 'Q2' AS source, Q2 AS q, COUNT(Q2) AS cnt FROM mytable GROUP BY Q2
UNION
SELECT 'Q3' AS source, Q3 AS q, COUNT(Q3) AS cnt FROM mytable GROUP BY Q3
The output is:
source q cnt
----------------
Q2 1 7
Q2 2 7
Q2 3 7
Q2 4 2
Q2 5 4
Q3 1 13
Q3 2 4
Q3 3 6
Q3 5 4
Pretty clear, isn't it? The first 5 rows come from the query GROUP BY Q2 and their value in the column q tells what was the value of Q2 for each group (there are 7 occurrences of 1 in column Q2, 7 of 2, 7 of 3, 2 of 4 and so on). The last 4 rows tell the similar story about Q3 (13 rows have 1 in column Q3 and so on).
Remark
There is a difference between COUNT(*) and COUNT(Q2): COUNT(*) counts the rows from the group, COUNT(Q2) counts the not-NULL values in the column Q2. It doesn't care about duplicate, it only ignore the NULL values. If you want to count the distinct values then you have to add the DISTINCT keyword: COUNT(DISTINCT Q2).
I think you need to unpivot the data. In this case, that just means multiple group by connected by union all:
select 'q2' as which, q2, count(*) as cnt
from mytable
group by q2
union all
select 'q3' as which, q3, count(*) as cnt
from mytable
group by q3;
You can add as many more subqueries as you like.
Note: this puts the values in separate rows, rather than in separate columns.
I reckon Asaph is on the right track , however I would alter this slightly try select distinct count(Q2) as QU2, count(Q3) as QU3 from myTable;

how to join two different datasets in ssrs

I have two different datasets source and destination datsets
Source Dataset
Type A B C D E F G
X 1 2 3 4 5 6 7
Y 2 1 3 5 6 7 8
Z 3 4 5 6 7 8 9
Destination Dataset
Type A B C D E F G
X 0 2 3 6 3 7 9
Y 1 1 5 5 4 8 0
Z 2 3 4 4 5 9 9
Is it possible two create a report in the following format?
Type A B C D E F G
Source X 1 2 3 4 5 6 7
Destin X 0 2 3 6 3 7 9
Source Y 2 1 3 5 6 7 8
Destin Y 1 1 5 5 4 8 0
Source Z 3 4 5 6 7 8 9
Destin Z 2 3 4 4 5 9 9
Handle this in SQL itself with query like this:
SELECT * FROM
(SELECT 'Source' AS myField, Type, A, B, C, D, E, F, G
FROM Table1 T1
UNION ALL
SELECT 'Destination' AS myField, Type, A, B, C, D, E, F, G
FROM Table1 T2 ) A
ORDER BY myField Desc, Type
It will be better way instead of handling it in SSRS.
To solve it in SSRS, you would need to know if the Types in both the datasets are mutually exclusive or not. If there are Types which exists in one but not in other, then you would have to do lot of hardcoding. All changes in the input data you would need to change the report. If the types in both dataset are not mutually exclusive then you might be able to use Lookup functions.
You can use Lookup functionality,
OR instead of doing the join in SSRS it's better to do this is in SQL.

How to order by max(a,b) in SQL?

Consider the following table:
id a b
--------------
1 5 1
2 2 3
3 4 2
4 3 6
5 0 1
6 2 2
I would like to order it by max(a,b) in descending order, so that the result will be:
id a b
--------------
4 3 6
1 5 1
3 4 2
2 2 3
6 2 2
5 0 1
What will be the SQL query to perform such ordering ?
Use GREATEST :
SELECT *
FROM table
ORDER BY GREATEST(a, b) DESC