Counting distinct multi-column patterns - sql-server-2014

I'm using SQL Server 2014 and I need some help with a hard query.
I have the following table (MyTable). The column names are just for the example; the real names are completely different from each other.
id int,
col1 int,
col2 int,
..
..
..
col70 int
For each pair of sequential columns {(col1, col2), (col2, col3), ..., (col69, col70)}, I need to calculate the following: for each row, the number of distinct pairs that its value forms, where col_i is the static column and col_i+1 is the other one. Each count needs to be divided by the total number of records in the table. For example:
col1 | col2
45 | 789
56 | 345
99 | 234
45 | 789
45 | 222
89 | 678
89 | 345
45 | 789
90 | 234
12 | 567
Calculation:
((45, 789)+(45, 222))/10
(56, 345)/10
(99, 234)/10
((45, 789)+(45, 222))/10
((45, 789)+(45, 222))/10
((89, 678)+(89, 345))/10
((89, 678)+(89, 345))/10
((45, 789)+(45, 222))/10
(90, 234)/10
(12, 567)/10
Output:
col1_col2
0.2
0.1
0.1
0.2
0.2
0.2
0.2
0.2
0.1
0.1
Explanation for the first record:
45 is the value of the static column, so now I check how many different combinations it forms with col2:
45 | 789
45 | 789
45 | 222
45 | 789
Total distinct combinations divided by the number of records in the table: 2/10 = 0.2
This calculation is needed for each pair of sequential columns. Any recommendations? Is there a smart way to calculate it automatically instead of writing a query with a line for each pair?

An example assuming you have a primary key:
create table my_table
(column_id int not null,
column1 int not null,
column2 int not null);
insert into my_table
(column_id, column1, column2)
values
(1, 45,789),
(2, 56,345),
(3, 99,234),
(4, 45,789),
(5, 45,222),
(6, 89,678),
(7, 89,345),
(8, 45,789),
(9, 90,234),
(10, 12,567);
declare @column_a as nvarchar(100) = N'column1';
declare @column_b as nvarchar(100) = N'column2';
declare @result_column as nvarchar(100) = N'column1_2';
declare @sql_string as nvarchar(4000);
-- for each row, count the distinct column_b values that share its column_a value,
-- then divide by the number of rows in the table
set @sql_string =
'select a.column_id,
1.0 * count( distinct b.' + @column_b + ') / (count(a.' + @column_a + ') over ()) as ' + @result_column
+ ' from my_table a
inner join my_table b
on a.' + @column_a + ' = b.' + @column_a +
' group by a.column_id, a.' + @column_a +
' order by a.column_id';
-- print @sql_string;
execute(@sql_string);
If there's no primary key, you could use the row_number() function to create an identifier, but the result order would change. The print command, commented out here, can be useful for checking the dynamic SQL string.
Putting the dynamic SQL into a stored procedure:
create procedure column_freq @column_a nvarchar(100), @column_b nvarchar(100), @result_column nvarchar(100)
as
begin
declare @sql_string as nvarchar(4000);
set @sql_string =
'select a.column_id,
1.0 * count( distinct b.' + @column_b + ') / (count(a.' + @column_a + ') over ()) as ' + @result_column
+ ' from my_table a
inner join my_table b
on a.' + @column_a + ' = b.' + @column_a +
' group by a.column_id, a.' + @column_a +
' order by a.column_id';
execute(@sql_string);
end;
go
exec column_freq N'column1', N'column2', N'column1_2';
go
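To sanity-check what the dynamic query should return, the same calculation can be reproduced outside the database. The following is only a minimal Python sketch run against the sample rows from the question; the function name `distinct_pair_ratio` is hypothetical and not part of the answer above.

```python
from collections import defaultdict

def distinct_pair_ratio(rows, static_idx, other_idx):
    """For each row, count the distinct values of the other column that
    co-occur with the row's static-column value, divided by the row count."""
    partners = defaultdict(set)
    for row in rows:
        partners[row[static_idx]].add(row[other_idx])
    total = len(rows)
    return [len(partners[row[static_idx]]) / total for row in rows]

# Sample data from the question: (col1, col2)
rows = [(45, 789), (56, 345), (99, 234), (45, 789), (45, 222),
        (89, 678), (89, 345), (45, 789), (90, 234), (12, 567)]

# One result list per pair of sequential columns (only one pair here).
column_count = len(rows[0])
results = {
    (i, i + 1): distinct_pair_ratio(rows, i, i + 1)
    for i in range(column_count - 1)
}
print(results[(0, 1)])
```

With 70 columns, the same loop over adjacent indexes would produce all 69 result lists, which mirrors calling the stored procedure once per column pair.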

Related

How to split multiple subvalues with multiple sub-indexes into columns in MySQL?

CREATE TABLE tablename (id INT,C1 text);
INSERT INTO tablename VALUES
(1, '[AU 1] string 1; [AU 2] string 2; [AU 3] string 3.1; string 3.2; [AU 4] string 4.1; string 4.2; [AU 5] string 5'),
(2, '[AU 1; AU 2] string 1'),
(3, '[AU 1] string 1; [AU 2] string 2');
CREATE TABLE numbers (n INT PRIMARY KEY);
INSERT INTO numbers VALUES (1),(2),(3),(4),(5),(6);
This is as close as I got by following the examples of @fthiella and @RGarcia.
Please see fiddle here.
The result I get is different from what I expect.
I want output like this:
| ID | AU | ORG |
| 1 |[AU 1]|string_1|
| 1 |[AU 2]|string_2|
| 1 |[AU 3]|string_3.1|
| 1 |[AU 3]|string_3.2|
| 1 |[AU 4]|string_4.1|
| 1 |[AU 4]|string_4.2|
| 1 |[AU 5]|string_5|
| 2 |[AU 1; AU 2]|string_1|
| 3 |[AU 1]|string_1|
| 3 |[AU 2]|string_2|
WITH RECURSIVE
cte1 AS ( SELECT id,
TRIM(TRAILING ';' FROM TRIM(SUBSTRING_INDEX(C1, '[', 2))) one_group,
SUBSTRING(C1 FROM LENGTH(SUBSTRING_INDEX(C1, '[', 2))) slack,
1 ordinality_au
FROM tablename
UNION ALL
SELECT id,
TRIM(TRAILING ';' FROM TRIM(SUBSTRING_INDEX(slack, '[', 2))),
SUBSTRING(slack FROM LENGTH(SUBSTRING_INDEX(slack, '[', 2))),
ordinality_au + 1
FROM cte1
WHERE LOCATE('[', slack) ),
cte2 AS ( SELECT id,
CONCAT(SUBSTRING_INDEX(one_group, ']', 1), ']') AU,
TRIM(SUBSTRING_INDEX(one_group, ']', -1)) ORG,
ordinality_au
FROM cte1 ),
cte3 AS ( SELECT id,
AU,
ordinality_au,
SUBSTRING_INDEX(ORG, ';', 1) ORG,
TRIM(TRIM(LEADING ';' FROM TRIM(LEADING SUBSTRING_INDEX(ORG, ';', 1) FROM ORG))) slack,
1 ordinality_org
FROM cte2
UNION ALL
SELECT id,
AU,
ordinality_au,
SUBSTRING_INDEX(slack, ';', 1),
TRIM(TRIM(LEADING ';' FROM TRIM(LEADING SUBSTRING_INDEX(slack, ';', 1) FROM slack))),
ordinality_org + 1
FROM cte3
WHERE TRIM(slack) != '' )
SELECT id,
AU,
ORG
FROM cte3
ORDER BY id, ordinality_au, ordinality_org;
https://dbfiddle.uk/?rdbms=mariadb_10.4&fiddle=a3258f8f1cd92eca0c480ea6673f13f1

How to find median given frequency of numbers?

The Numbers table keeps the value of number and its frequency.
+----------+-------------+
| Number | Frequency |
+----------+-------------|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+----------+-------------+
In this table, the numbers are 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3, so the median is (0 + 0) / 2 = 0. How to find median (output shown) given frequency of numbers?
+--------+
| median |
+--------|
| 0.0000 |
+--------+
I found the following solution here. However, I am unable to understand it. Can someone please explain the solution and/or post a different solution with explanation?
SELECT AVG(n.Number) AS median
FROM Numbers n LEFT JOIN
(
SELECT Number, @prev := @count AS prevNumber, (@count := @count + Frequency) AS countNumber
FROM Numbers,
(SELECT @count := 0, @prev := 0, @total := (SELECT SUM(Frequency) FROM Numbers)) temp ORDER BY Number
) n2
ON n.Number = n2.Number
WHERE
(prevNumber < floor((@total+1)/2) AND countNumber >= floor((@total+1)/2))
OR
(prevNumber < floor((@total+2)/2) AND countNumber >= floor((@total+2)/2))
Here's the SQL script for reproducibility:
CREATE TABLE `Numbers` (
`Number` INT NULL,
`Frequency` INT NULL);
INSERT INTO `Numbers` (`Number`, `Frequency`) VALUES ('0', '7');
INSERT INTO `Numbers` (`Number`, `Frequency`) VALUES ('1', '1');
INSERT INTO `Numbers` (`Number`, `Frequency`) VALUES ('2', '3');
INSERT INTO `Numbers` (`Number`, `Frequency`) VALUES ('3', '1');
Thanks!
You can use a cumulative sum and then take the midway point. I think the logic looks like this:
select avg(number)
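from (select t.*, (@rf := @rf + frequency) as running_frequency
from (select n.* from Numbers n order by number) t cross join
(select @rf := 0) params
) t
-- keep the rows whose position range (running_frequency - frequency, running_frequency]
-- contains the lower or the upper middle position of the expanded list
where running_frequency - frequency < floor((@rf + 2) / 2) and
running_frequency >= floor((@rf + 1) / 2);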
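The cumulative-sum idea can be cross-checked outside the database. The sketch below is a minimal Python version, assuming the two middle positions are floor((N+1)/2) and floor((N+2)/2) as in the query quoted in the question; the function name `median_from_frequencies` is hypothetical.

```python
def median_from_frequencies(pairs):
    """Median of the list obtained by repeating each number `frequency` times.

    `pairs` is an iterable of (number, frequency) tuples.
    """
    ordered = sorted(pairs)
    total = sum(freq for _, freq in ordered)
    low_pos = (total + 1) // 2   # lower middle position (1-based)
    high_pos = (total + 2) // 2  # upper middle position (same as low_pos when total is odd)

    running = 0
    picked = []
    for number, freq in ordered:
        if freq <= 0:
            continue  # a number that never occurs cannot hold a position
        previous = running
        running += freq
        # This number occupies positions previous+1 .. running.
        if previous < high_pos and running >= low_pos:
            picked.append(number)
    return sum(picked) / len(picked)

# Sample data from the question: 0 x7, 1 x1, 2 x3, 3 x1
print(median_from_frequencies([(0, 7), (1, 1), (2, 3), (3, 1)]))
```

A number is kept when the range of positions it covers includes either middle position, so at most two numbers are averaged.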

Row and column total in dynamic pivot

In SQL Server 2008, I have a table (tblStock) with 3 columns:
PartCode (NVARCHAR (50))
StockQty (INT)
Location (NVARCHAR(50))
some example data below:
PartCode StockQty Location
......... ......... .........
A 10 WHs-A
B 22 WHs-A
A 1 WHs-B
C 20 WHs-A
D 39 WHs-F
E 3 WHs-D
F 7 WHs-A
A 9 WHs-C
D 2 WHs-A
F 54 WHs-E
How can I create a procedure to get the result below?
PartCode WHs-A WHs-B WHs-C WHs-D WHs-E WHs-F Total
........ ..... ..... ..... ...... ..... ..... .....
A 10 1 9 0 0 0 20
B 22 0 0 0 0 0 22
C 20 0 0 0 0 0 20
D 2 0 0 0 0 39 41
E 0 0 0 3 0 0 3
F 7 0 0 0 54 0 61
Total 61 1 9 3 54 39 167
Your help is much appreciated, thanks.
SAMPLE TABLE
SELECT * INTO #tblStock
FROM
(
SELECT 'A' PartCode, 10 StockQty, 'WHs-A' Location
UNION ALL
SELECT 'B', 22, 'WHs-A'
UNION ALL
SELECT 'A', 1, 'WHs-B'
UNION ALL
SELECT 'C', 20, 'WHs-A'
UNION ALL
SELECT 'D', 39, 'WHs-F'
UNION ALL
SELECT 'E', 3, 'WHs-D'
UNION ALL
SELECT 'F', 7, 'WHs-A'
UNION ALL
SELECT 'A', 9, 'WHs-C'
UNION ALL
SELECT 'D', 2, 'WHs-A'
UNION ALL
SELECT 'F', 54, 'WHs-E'
)TAB
Get the columns for dynamic pivoting and replace NULL with zero
DECLARE @cols NVARCHAR (MAX)
SELECT @cols = COALESCE (@cols + ',[' + Location + ']', '[' + Location + ']')
FROM (SELECT DISTINCT Location FROM #tblStock) PV
ORDER BY Location
-- Total must be the last column, so append it at the end
SELECT @cols += ',[Total]'
-- Variable to replace NULL with zero
DECLARE @NulltoZeroCols NVARCHAR (MAX)
SELECT @NullToZeroCols = SUBSTRING((SELECT ',ISNULL(['+Location+'],0) AS ['+Location+']'
FROM (SELECT DISTINCT Location FROM #tblStock)TAB
ORDER BY Location FOR XML PATH('')),2,8000)
SELECT @NullToZeroCols += ',ISNULL([Total],0) AS [Total]'
You can use CUBE to find row and column total and replace NULL with Total for the rows generated from CUBE.
DECLARE @query NVARCHAR(MAX)
SET @query = 'SELECT PartCode,' + @NulltoZeroCols + ' FROM
(
SELECT
ISNULL(CAST(PartCode AS VARCHAR(30)),''Total'')PartCode,
SUM(StockQty)StockQty ,
ISNULL(Location,''Total'')Location
FROM #tblStock
GROUP BY Location,PartCode
WITH CUBE
) x
PIVOT
(
MIN(StockQty)
FOR Location IN (' + @cols + ')
) p
ORDER BY CASE WHEN (PartCode=''Total'') THEN 1 ELSE 0 END,PartCode'
EXEC SP_EXECUTESQL @query
NOTE: If you want NULL instead of zero as values, use @cols instead of @NulltoZeroCols in the dynamic pivot code.
EDIT :
1. Show only Row Total
Do not use the code SELECT @cols += ',[Total]' and SELECT @NullToZeroCols += ',ISNULL([Total],0) AS [Total]'.
Use ROLLUP instead of CUBE.
2. Show only Column Total
Use the code SELECT @cols += ',[Total]' and SELECT @NullToZeroCols += ',ISNULL([Total],0) AS [Total]'.
Use ROLLUP instead of CUBE.
Change GROUP BY Location,PartCode to GROUP BY PartCode,Location.
Instead of ORDER BY CASE WHEN (PartCode=''Total'') THEN 1 ELSE 0 END,PartCode, use WHERE PartCode<>''TOTAL'' ORDER BY PartCode.
UPDATE : To bring PartName for OP
I am updating the query below to add PartName to the result. Since PartName would add extra rows with CUBE, and to avoid confusion with AND or OR conditions, it's better to join the pivoted result with the DISTINCT values in your source table.
DECLARE @query NVARCHAR(MAX)
SET @query = 'SELECT P.PartCode,T.PartName,' + @NulltoZeroCols + ' FROM
(
SELECT
ISNULL(CAST(PartCode AS VARCHAR(30)),''Total'')PartCode,
SUM(StockQty)StockQty ,
ISNULL(Location,''Total'')Location
FROM #tblStock
GROUP BY Location,PartCode
WITH CUBE
) x
PIVOT
(
MIN(StockQty)
FOR Location IN (' + @cols + ')
) p
LEFT JOIN
(
SELECT DISTINCT PartCode,PartName
FROM #tblStock
)T
ON P.PartCode=T.PartCode
ORDER BY CASE WHEN (P.PartCode=''Total'') THEN 1 ELSE 0 END,P.PartCode'
EXEC SP_EXECUTESQL @query
You need to use case-based aggregation to pivot the data.
To get the total row, use UNION.
If the Location values are not known in advance, you need to construct a dynamic query.
You can also use the PIVOT keyword to do the same.
select partCode,
sum( case when Location='WHs-A' then StockQty
else 0 end
) as 'Whs-A',
sum( case when Location='WHs-B' then StockQty
else 0 end
) as 'Whs-B',
sum(StockQty) as 'Total'
from tblStock
group by partCode
union all
select 'Total' as 'partCode',
sum( case when Location='WHs-A' then StockQty
else 0 end ) as 'Whs-A',
sum( case when Location='WHs-B' then StockQty
else 0 end) as 'Whs-B',
sum(StockQty) as 'Total'
from tblStock
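For checking the expected grid, the pivot with a total column and a total row can be reproduced in a few lines of Python. This is only a sketch against the sample data from the question; `pivot_with_totals` is a hypothetical helper, and it discovers the locations from the data, which is the part the dynamic SQL has to do with string building.

```python
from collections import defaultdict

def pivot_with_totals(rows):
    """Pivot (part, qty, location) rows into one row per part and one
    column per location, plus a Total column and a Total row."""
    locations = sorted({loc for _, _, loc in rows})
    parts = sorted({part for part, _, _ in rows})

    cells = defaultdict(int)  # missing (part, location) cells count as 0
    for part, qty, loc in rows:
        cells[(part, loc)] += qty

    header = ["PartCode"] + locations + ["Total"]
    table = []
    for part in parts:
        values = [cells[(part, loc)] for loc in locations]
        table.append([part] + values + [sum(values)])

    # Column totals, including the grand total in the last position.
    column_totals = [sum(row[i] for row in table) for i in range(1, len(header))]
    table.append(["Total"] + column_totals)
    return header, table

stock = [("A", 10, "WHs-A"), ("B", 22, "WHs-A"), ("A", 1, "WHs-B"),
         ("C", 20, "WHs-A"), ("D", 39, "WHs-F"), ("E", 3, "WHs-D"),
         ("F", 7, "WHs-A"), ("A", 9, "WHs-C"), ("D", 2, "WHs-A"),
         ("F", 54, "WHs-E")]

header, table = pivot_with_totals(stock)
print(header)
for row in table:
    print(row)
```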

SQL dynamically pivot and group results

I have a table set up like below:
CLIENTNAME MONTHANDYEAR RESOURCE COST
abc JAN2011 res1 1000
abc FEB2011 res1 2000
def JAN2011 res2 1500
def MAR2011 res1 2000
ghi MAR2011 res3 2500
I need an output like below. Months are to be generated dynamically in 3-month intervals. In this case, is there a way to pivot by MONTHANDYEAR as well as group by clientname?
RESOURCE CLIENTNAME JAN2011 FEB2011 MAR2011
res1 abc 1000 1000
res1 def 2000
res2 def 1500
res3 ghi 2500
This is what the PIVOT operator is for:
SELECT
Resource, ClientName,
[JAN2011], [FEB2011], [MAR2011]
FROM
(
SELECT
*
FROM tblname
) AS SourceTable
PIVOT
(
SUM(COST)
FOR MONTHANDYEAR IN ([JAN2011], [FEB2011], [MAR2011])
) AS PivotTable;
Since your months are selected dynamically using @startDate as a base month, you can use the following dynamic query:
DECLARE @startDate datetime
SET @startDate = '2011-01-01'
DECLARE @sql varchar(MAX)
SET @sql = 'SELECT
Resource, ClientName, [' +
REPLACE(SUBSTRING(CONVERT(varchar, @startDate, 13), 4, 8), ' ', '') + '], [' +
REPLACE(SUBSTRING(CONVERT(varchar, DATEADD(MONTH, 1, @startDate), 13), 4, 8), ' ', '') + '], [' +
REPLACE(SUBSTRING(CONVERT(varchar, DATEADD(MONTH, 2, @startDate), 13), 4, 8), ' ', '') + ']
FROM
(
SELECT
*
FROM tblName
) AS SourceTable
PIVOT
(
SUM(COST)
FOR MONTHANDYEAR IN (' +
QUOTENAME(REPLACE(SUBSTRING(CONVERT(varchar, @startDate, 13), 4, 8), ' ', '')) + ', ' +
QUOTENAME(REPLACE(SUBSTRING(CONVERT(varchar, DATEADD(MONTH, 1, @startDate), 13), 4, 8), ' ', '')) + ', ' +
QUOTENAME(REPLACE(SUBSTRING(CONVERT(varchar, DATEADD(MONTH, 2, @startDate), 13), 4, 8), ' ', '')) + ')
) AS PivotTable'
execute(@sql)
working sqlfiddle here
This data transformation can be done with the PIVOT function.
If you know the values, then you can hard-code the monthandyear dates:
select resource,
clientname,
isnull(jan2011, '') Jan2011,
isnull(feb2011, '') Feb2011,
isnull(mar2011, '') Mar2011
from
(
select clientname, monthandyear, resource, cost
from yourtable
) src
pivot
(
sum(cost)
for monthandyear in (Jan2011, Feb2011, Mar2011)
) piv;
See SQL Fiddle with Demo.
But if the dates are unknown, then you will need to use dynamic SQL:
DECLARE @cols AS NVARCHAR(MAX),
@colNames AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(monthandyear)
from yourtable
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
select @colNames = STUFF((SELECT distinct ', isnull(' + QUOTENAME(monthandyear)+', 0) as '+QUOTENAME(monthandyear)
from yourtable
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT resource, clientname,' + @colNames + ' from
(
select clientname, monthandyear, resource, cost
from yourtable
) x
pivot
(
sum(cost)
for monthandyear in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo.
The result of both is:
| RESOURCE | CLIENTNAME | JAN2011 | FEB2011 | MAR2011 |
-------------------------------------------------------
| res1 | abc | 1000 | 2000 | 0 |
| res1 | def | 0 | 0 | 2000 |
| res2 | def | 1500 | 0 | 0 |
| res3 | ghi | 0 | 0 | 2500 |
SELECT Resource, Clientname
, SUM(CASE WHEN MonthAndYear = 'JAN2011' THEN COST ELSE 0 END) AS JAN2011
, SUM(CASE WHEN MonthAndYear = 'FEB2011' THEN COST ELSE 0 END) AS FEB2011
, SUM(CASE WHEN MonthAndYear = 'MAR2011' THEN COST ELSE 0 END) AS MAR2011
FROM yourtable
GROUP BY Resource, Clientname
You can also remove the ELSE 0 to return a NULL value for resource/clientname combinations without data
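The answers above differ mainly in how the month columns are obtained. As a rough illustration of the "three months from a start date" variant, here is a Python sketch that builds the labels and then aggregates the sample rows from the question; `month_labels` and `pivot_costs` are hypothetical names, and the label format (upper-case month abbreviation plus year) is assumed from the sample data.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def month_labels(start, count=3):
    """Labels such as 'JAN2011' for `count` consecutive months from `start`."""
    labels = []
    year, month = start.year, start.month
    for _ in range(count):
        labels.append(f"{MONTHS[month - 1]}{year}")
        month += 1
        if month > 12:  # roll over into the next year
            month, year = 1, year + 1
    return labels

def pivot_costs(rows, labels):
    """Sum cost per (resource, client) into one column per label; 0 when absent."""
    totals = {}
    for client, month_year, resource, cost in rows:
        if month_year not in labels:
            continue  # outside the requested window
        by_month = totals.setdefault((resource, client), dict.fromkeys(labels, 0))
        by_month[month_year] += cost
    return [[resource, client] + [by_month[label] for label in labels]
            for (resource, client), by_month in sorted(totals.items())]

data = [("abc", "JAN2011", "res1", 1000), ("abc", "FEB2011", "res1", 2000),
        ("def", "JAN2011", "res2", 1500), ("def", "MAR2011", "res1", 2000),
        ("ghi", "MAR2011", "res3", 2500)]

labels = month_labels(date(2011, 1, 1))
for row in pivot_costs(data, labels):
    print(row)
```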

Sql query to select the values and group them by Series

I am working on a SQL Server 2008 database.
I have a table like this:
Id Year Series Value
----+------+--------+------
1 1990 a 1.5
1 1990 b 1.6
1 1990 c 1.7
1 1991 a 1.8
1 1991 b 1.9
1 1991 c 2.5
Is there a query that can select the values and return them like this?
Year a b c
------+------+--------+------
1990 1.5 1.6 1.7
1991 1.8 1.9 2.5
Thanks a lot for any help.
If series is fixed to a,b,c you can do this:
CREATE TABLE #t (Id INT, Year INT,
Series VARCHAR(5), Value DECIMAL(10,1))
INSERT #t
VALUES
(1, 1990, 'a', 1.5),
(1, 1990, 'b', 1.6),
(1, 1990, 'c', 1.7),
(1, 1991, 'a', 1.8),
(1, 1991, 'b', 1.9),
(1, 1991, 'c', 2.5)
SELECT pvt.Year,
pvt.a,
pvt.b,
pvt.c
FROM #t
PIVOT (
MIN(Value) FOR Series IN ([a], [b], [c])
) pvt
If there would be other values you can use dynamic pivot:
DECLARE @series VARCHAR(100) =
STUFF(( SELECT DISTINCT ',[' + Series + ']'
FROM #t
FOR XML PATH(''))
,1, 1, '')
DECLARE @query VARCHAR(2000) = '
SELECT pvt.Year, ' + @series +'
FROM #t
PIVOT (
MIN(Value) FOR Series IN (' + @series + ')
) pvt
';
EXEC(@query)
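In a scenario with fixed series, conditional aggregation is another possibility. Note that the CROSS JOIN below only duplicates rows, which does not change the MAX results, so it can be dropped: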
SELECT a.Year,
MAX(CASE WHEN a.Series = 'a' THEN a.Value END) a,
MAX(CASE WHEN a.Series = 'b' THEN a.Value END) b,
MAX(CASE WHEN a.Series = 'c' THEN a.Value END) c
FROM #t a
CROSS JOIN #t b
GROUP BY a.Id, a.Year
ORDER BY a.Year
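Whether the CROSS JOIN matters in the last query can be checked quickly. The sketch below runs the conditional-aggregation query with and without the join using Python's built-in sqlite3 module; this is SQLite rather than SQL Server, so the temp table `#t` is replaced by a plain table `t` and the output columns are renamed `series_a`, `series_b`, `series_c` (both are assumptions made only for this check).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, Year INTEGER, Series TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1990, "a", 1.5), (1, 1990, "b", 1.6), (1, 1990, "c", 1.7),
     (1, 1991, "a", 1.8), (1, 1991, "b", 1.9), (1, 1991, "c", 2.5)],
)

# Same conditional aggregation as above; {join} is filled in per run.
PIVOT_SQL = """
SELECT a.Year,
       MAX(CASE WHEN a.Series = 'a' THEN a.Value END) AS series_a,
       MAX(CASE WHEN a.Series = 'b' THEN a.Value END) AS series_b,
       MAX(CASE WHEN a.Series = 'c' THEN a.Value END) AS series_c
FROM t a
{join}
GROUP BY a.Id, a.Year
ORDER BY a.Year
"""

with_join = conn.execute(PIVOT_SQL.format(join="CROSS JOIN t b")).fetchall()
without_join = conn.execute(PIVOT_SQL.format(join="")).fetchall()

print(with_join)
print(without_join)
```

Both runs return one row per year with the three series values, because MAX is unaffected by the duplicated rows the join produces.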