Unexpected results when joining two subqueries with SQLAlchemy

I have a large SQL table as follows,
Id Firstname Lastname tran_id insert_datetime
==============================================================
1 Tom Smith 0 2020-08-07 15:37:32
2 Tom Smith 0 2020-08-06 06:33:44
3 Tom Smith 1 2020-08-07 12:43:53
4 Foo Bar 7 2020-08-24 23:43:21
5 Foo Bar 0 2020-08-25 14:23:24
....
and I'm trying to group it by (firstname, lastname, date) and obtain the number of transactions except those with tran_id = 0 as follows:
Firstname Lastname date num_tran
==============================================================
Tom Smith 2020-08-25 0
Tom Smith 2020-08-26 2
Foo Bar 2020-08-25 1
Foo Bar 2020-08-26 0
....
I was able to achieve this in MySQL by doing the following:
SELECT a.firstname, a.lastname, a.date,
CASE
WHEN b.num_tran IS NOT NULL THEN b.num_tran
ELSE 0
END AS num_tran
FROM
(SELECT firstname, lastname, DATE(insert_datetime) AS date
FROM table
WHERE insert_datetime >= '2020-08-25' AND insert_datetime <= '2020-08-27'
GROUP BY firstname, lastname, date
) a
LEFT OUTER JOIN
(SELECT firstname, lastname, DATE(insert_datetime) AS date, COUNT(Id) AS num_tran
FROM table
WHERE tran_id != 0 AND insert_datetime >= '2020-08-25' AND insert_datetime <= '2020-08-27'
GROUP BY firstname, lastname, date
) b
ON a.firstname = b.firstname AND a.lastname = b.lastname AND a.date = b.date
I tried to transform this to SQLAlchemy as shown below:
from sqlalchemy import func, case, cast, Date
from sqlalchemy.sql import label
# First full table subquery
full_table = (
request.dbsession.query(
table.firstname,
table.lastname,
label('date', cast(table.insert_datetime, Date))
)
.filter(
table.insert_datetime >= '2020-08-25',
table.insert_datetime <= '2020-08-27')
.group_by(
table.firstname,
table.lastname,
cast(table.insert_datetime, Date))
.subquery()
)
# Second subquery
num_transactions = (
request.dbsession.query(
table.firstname,
table.lastname,
label('date', cast(table.insert_datetime, Date)),
label('num_tran', func.count(table.id))
)
.filter(
table.tran_id != 0,
table.insert_datetime >= '2020-08-25',
table.insert_datetime <= '2020-08-27')
.group_by(
table.firstname,
table.lastname,
cast(table.insert_datetime, Date))
.subquery()
)
# Left outer join to get final table
result_table = (
request.dbsession.query(
full_table.c.firstname,
full_table.c.lastname,
full_table.c.date,
case(
[(num_transactions.c.num_tran == None, 0)],
else_=num_transactions.c.num_tran)
.label('num_tran'))
.join(num_transactions,
(full_table.c.firstname == num_transactions.c.firstname) &
(full_table.c.lastname == num_transactions.c.lastname) &
(full_table.c.date == num_transactions.c.date),
isouter=True)
)
However, I get tens of thousands of results and it obviously does not match the results of the query in MySQL. Is there somewhere I'm going wrong in how I'm writing the query using SQLAlchemy?
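One way to sidestep the two-subquery join altogether is a single GROUP BY with a conditional COUNT, which should give the same rows as the LEFT OUTER JOIN version. The following is only a minimal sketch using Python's built-in sqlite3 module rather than SQLAlchemy and MySQL; the table name `transactions`, the column alias `day`, and the date range are assumptions made for illustration, and the rows are the five sample rows from the question.

```python
import sqlite3

# In-memory database; "transactions" is a hypothetical stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, "
    "tran_id INTEGER, insert_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Tom", "Smith", 0, "2020-08-07 15:37:32"),
        (2, "Tom", "Smith", 0, "2020-08-06 06:33:44"),
        (3, "Tom", "Smith", 1, "2020-08-07 12:43:53"),
        (4, "Foo", "Bar", 7, "2020-08-24 23:43:21"),
        (5, "Foo", "Bar", 0, "2020-08-25 14:23:24"),
    ],
)

# CASE without ELSE yields NULL for tran_id = 0, and COUNT() skips NULLs,
# so groups that only contain tran_id = 0 still appear, with a count of 0.
query = """
SELECT firstname, lastname, DATE(insert_datetime) AS day,
       COUNT(CASE WHEN tran_id != 0 THEN 1 END) AS num_tran
FROM transactions
WHERE insert_datetime >= :start AND insert_datetime < :end
GROUP BY firstname, lastname, DATE(insert_datetime)
ORDER BY firstname, lastname, day
"""
# The date range is an assumption chosen to cover the sample rows.
rows = conn.execute(query, {"start": "2020-08-01", "end": "2020-09-01"}).fetchall()
for row in rows:
    print(row)
```

Because every (firstname, lastname, date) group is produced by the same scan, there is no join condition to get wrong, which is the part of the original query most likely to multiply rows.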

Related

How to make a query more readable and scalable?

I have this query:
SELECT user_id FROM merchant_data
WHERE user_id IN (
SELECT user_id FROM merchant_data
WHERE merchant_id = 1134
AND created_date = '2022-12-02'
GROUP BY user_id
HAVING COUNT(*) > 2)
AND merchant_id = 1167
AND created_date = '2022-12-02'
GROUP BY user_id
HAVING COUNT(*) = 2;
This query returns data from something like a log table. In this case I need all users that have 2 or more rows with merchant_id = 1134 and 2 rows with merchant_id = 1167. But how do I extend it to 4, 5, or 6 conditions like merchant_id = ...?

SELECT user_id FROM merchant_data
WHERE created_date = '2022-12-02'
AND merchant_id IN (1134, 1167, 1186, ...)
GROUP BY user_id
HAVING SUM(merchant_id = 1134) >= 2
AND SUM(merchant_id = 1167) >= 2
AND SUM(merchant_id = 1186) >= 2
AND ...
That depends on an odd MySQL feature that booleans are literally the integer values 1 for true and 0 for false, so you can SUM() a boolean expression. You can't do that in standard SQL.
You could make it more standard SQL by using CASE expressions with no ELSE clause. CASE returns NULL if there is no match, and COUNT() will ignore NULLs.
SELECT user_id FROM merchant_data
WHERE created_date = '2022-12-02'
AND merchant_id IN (1134, 1167, 1186, ...)
GROUP BY user_id
HAVING COUNT(CASE merchant_id WHEN 1134 THEN 1 END) >= 2
AND COUNT(CASE merchant_id WHEN 1167 THEN 1 END) >= 2
AND COUNT(CASE merchant_id WHEN 1186 THEN 1 END) >= 2
AND ...
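To make the COUNT(CASE ...) pattern scale to any number of merchants, the HAVING clause can be generated from a mapping of merchant_id to required row count. This is a rough sketch using Python's sqlite3 module; the `required` mapping and the sample rows are hypothetical, and only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merchant_data (user_id INTEGER, merchant_id INTEGER, created_date TEXT)"
)
day = "2022-12-02"
# Hypothetical sample rows: only user 1 meets both thresholds on the target day.
sample = (
    [(1, 1134, day)] * 2 + [(1, 1167, day)] * 2
    + [(2, 1134, day)] * 3 + [(2, 1167, day)]
    + [(3, 1134, "2022-12-03")] * 2 + [(3, 1167, "2022-12-03")] * 2
)
conn.executemany("INSERT INTO merchant_data VALUES (?, ?, ?)", sample)

# merchant_id -> minimum number of rows required (assumed thresholds).
required = {1134: 2, 1167: 2}

# One placeholder per merchant in the IN list, one condition per merchant in HAVING.
in_list = ", ".join("?" for _ in required)
having = " AND ".join(
    "COUNT(CASE merchant_id WHEN ? THEN 1 END) >= ?" for _ in required
)
sql = (
    "SELECT user_id FROM merchant_data "
    f"WHERE created_date = ? AND merchant_id IN ({in_list}) "
    f"GROUP BY user_id HAVING {having} ORDER BY user_id"
)
# Parameters follow placeholder order: date, IN list, then (merchant, minimum) pairs.
params = [day, *required] + [v for pair in required.items() for v in pair]

users = [row[0] for row in conn.execute(sql, params)]
print(users)
```

Adding a fourth or fifth merchant then only means adding an entry to `required`; the values stay bound as parameters rather than being interpolated into the SQL text.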

I am trying to fetch all those "num" from the given table where the "OUT" > "IN"

We can have multiple records for a single "num" value. We need to print the output in ascending order of num.
Table :
create table bill
(
type varchar(5),
num varchar(12),
dur int
);
insert into bill values
('OUT',1818,13),
('IN', 1818,10),
('OUT',1818,7),
('OUT',1817,15),
('IN',1817,18),
('IN',1819,18),
('OUT',1819,40),
('IN',1819,18)
This is my query: I group the records by "type" in separate subqueries and fetch the records where "OUT" > "IN".
select a.num
from
(select num,sum(dur) as D
from bill
where type ='OUT'
group by num) a ,
(select num,sum(dur) as D
from bill
where type ='IN'
group by num) b
where a.D > b.D
group by a.num
order by 1
My output:    Expected output:
num           num
1817          1818
1818          1819
1819
Thank you

Use conditional aggregation:
SELECT num
FROM bill
GROUP BY num
HAVING SUM(CASE WHEN type = 'OUT' THEN dur ELSE 0 END) >
SUM(CASE WHEN type = 'IN' THEN dur ELSE 0 END);
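The reason the original query returns 1817 is that the two subqueries are combined without a join condition on num, so every OUT total is compared against every IN total. Here is a small sketch with Python's sqlite3 module that runs both the original query and the conditional aggregation on the question's data (SQLite stands in for the asker's database, and num is stored as text to match the varchar column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill (type TEXT, num TEXT, dur INTEGER)")
conn.executemany(
    "INSERT INTO bill VALUES (?, ?, ?)",
    [
        ("OUT", "1818", 13), ("IN", "1818", 10), ("OUT", "1818", 7),
        ("OUT", "1817", 15), ("IN", "1817", 18),
        ("IN", "1819", 18), ("OUT", "1819", 40), ("IN", "1819", 18),
    ],
)

# Original approach: no "a.num = b.num", so this is a cross join and
# 1817's OUT total (15) is compared with 1818's IN total (10).
cross_join_sql = """
SELECT a.num
FROM (SELECT num, SUM(dur) AS d FROM bill WHERE type = 'OUT' GROUP BY num) a,
     (SELECT num, SUM(dur) AS d FROM bill WHERE type = 'IN' GROUP BY num) b
WHERE a.d > b.d
GROUP BY a.num
ORDER BY 1
"""

# Conditional aggregation: both totals are computed per num in one pass.
conditional_sql = """
SELECT num
FROM bill
GROUP BY num
HAVING SUM(CASE WHEN type = 'OUT' THEN dur ELSE 0 END) >
       SUM(CASE WHEN type = 'IN' THEN dur ELSE 0 END)
ORDER BY num
"""

cross_join = [r[0] for r in conn.execute(cross_join_sql)]
conditional = [r[0] for r in conn.execute(conditional_sql)]
print(cross_join)   # ['1817', '1818', '1819']
print(conditional)  # ['1818', '1819']
```

Adding `AND a.num = b.num` to the original WHERE clause would also remove 1817, but it would drop any num that has OUT rows and no IN rows at all, which the conditional aggregation keeps.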

I would use conditional aggregation in a subquery and add WHERE type IN ('IN','OUT'), which might perform better if you create an index with type as its first column.
SELECT num
FROM (
SELECT num,
SUM(CASE WHEN type = 'OUT' THEN dur ELSE 0 END) outval,
SUM(CASE WHEN type = 'IN' THEN dur ELSE 0 END) inval
FROM bill
WHERE type in ('IN','OUT')
GROUP BY num
) t1
WHERE outval > inval

Count Age With Distinctly in MySQL

I have a table like this
PersonID Gender Age CreatedDate
================================
1 M 32 10/09/2011
2 F 33 10/09/2011
2 F 33 10/11/2011
1 M 32 10/11/2011
3 F 33 10/11/2011
I want to find the gender count by age, grouped by created date. The age range is 30-34, and each person should be counted only once.
The desired output should look like this:
Gender AgeRange CreatedDate CountResult
================================
M 30_34 10/09/2011 1
F 30_34 10/09/2011 1
F 30_34 10/11/2011 1
I tried this, but it did not help:
SELECT t.Gender,'30_34' AS AgeRange,t.CreatedDate,
SUM(CASE WHEN t.Age BETWEEN 30 AND 34 THEN 1 ELSE 0 END) AS CountResult,
FROM (
SELECT DISTINCT PersonID,Gender,Age,CreatedDate
FROM MyTable
GROUP PersonID,Gender,Age,CreatedDate
HAVING COUNT(PersonID)=1
) t
How can I solve this?
Thanks

If you want the earliest created date per PersonID, this might do:
drop table if exists mytable;
create table mytable(PersonID int, Gender varchar(1),Age int, CreatedDate date);
insert into mytable values
(1 , 'M', 32 , '2011-09-10'),
(2 , 'F', 33 , '2011-09-10'),
(2 , 'F', 33 , '2011-11-10'),
(1 , 'M', 32 , '2011-11-10'),
(3 , 'F', 33 , '2011-11-10');
select mt.gender,
mt.createddate,
sum(case when mt.age between 32 and 34 then 1 else 0 end) as Age32to34
from mytable mt
where createddate = (select min(mt1.createddate) from mytable mt1 where mt1.personid = mt.personid)
group by gender,mt.createddate

How about:
SELECT
Gender
, '30_34' AS AgeRange
, CreatedDate
, COUNT(*) AS CountResult
FROM MyTable A
JOIN (
SELECT PersonID, MIN(CreatedDate) MinCreatedDate
FROM MyTable GROUP BY PersonID
) B ON B.PersonID = A.PersonID AND B.MinCreatedDate = A.CreatedDate
WHERE Age BETWEEN 30 AND 34
GROUP BY Gender, CreatedDate
ORDER BY CreatedDate, Gender DESC

You would appear to want:
SELECT t.Gender, '30_34' AS AgeRange, t.CreatedDate,
COUNT(DISTINCT t.PersonId) AS CountResult
FROM MyTable t
WHERE t.Age BETWEEN 30 AND 34
GROUP BY t.Gender, t.CreatedDate;
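The answers above interpret "distinctly" in two different ways, and the difference shows up on the sample data. Below is a short sketch with Python's sqlite3 module that runs the earliest-date join and the COUNT(DISTINCT) query side by side. It assumes the dates are day/month/year and stores them as ISO strings, as one of the answers above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (PersonID INTEGER, Gender TEXT, Age INTEGER, CreatedDate TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, "M", 32, "2011-09-10"),
        (2, "F", 33, "2011-09-10"),
        (2, "F", 33, "2011-11-10"),
        (1, "M", 32, "2011-11-10"),
        (3, "F", 33, "2011-11-10"),
    ],
)

# Count each person once overall, on the first date they appear.
first_seen = conn.execute("""
SELECT A.Gender, '30_34' AS AgeRange, A.CreatedDate, COUNT(*) AS CountResult
FROM MyTable A
JOIN (SELECT PersonID, MIN(CreatedDate) AS MinCreatedDate
      FROM MyTable GROUP BY PersonID) B
  ON B.PersonID = A.PersonID AND B.MinCreatedDate = A.CreatedDate
WHERE A.Age BETWEEN 30 AND 34
GROUP BY A.Gender, A.CreatedDate
ORDER BY A.CreatedDate, A.Gender DESC
""").fetchall()

# Count each person once per date (a person can reappear on a later date).
per_date = conn.execute("""
SELECT Gender, '30_34' AS AgeRange, CreatedDate,
       COUNT(DISTINCT PersonID) AS CountResult
FROM MyTable
WHERE Age BETWEEN 30 AND 34
GROUP BY Gender, CreatedDate
ORDER BY CreatedDate, Gender DESC
""").fetchall()

print(first_seen)
print(per_date)
```

On this data only the earliest-date variant reproduces the three rows of the desired output; COUNT(DISTINCT PersonID) per date also reports persons 1 and 2 again on the second date.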

How to split SQL query results into columns based on two WHERE conditions and two calculated COUNT fields?

I have the following (simplified) database schema:
Persons:
[Id] [Name]
-------------------
1 'Peter'
2 'John'
3 'Anna'
Items:
[Id] [ItemName] [ItemStatus]
-------------------
10 'Cake' 1
20 'Dog' 2
ItemDocuments:
[Id] [ItemId] [DocumentName] [Date]
-------------------
101 10 'CakeDocument1' '2016-01-01 00:00:00'
201 20 'DogDocument1' '2016-02-02 00:00:00'
301 10 'CakeDocument2' '2016-03-03 00:00:00'
401 20 'DogDocument2' '2016-04-04 00:00:00'
DocumentProcessors:
[PersonId] [DocumentId]
-------------------
1 101
1 201
2 301
I have also set up an SQL fiddle to play with: http://www.sqlfiddle.com/#!3/e6082
The relation logic is the following: every Person can work on zero or more ItemDocuments (many-to-many); each ItemDocument belongs to exactly one Item (one-to-many). An Item has status 1 (Active) or 2 (Closed).
What I need is a report that fulfills the following requirements:
for each person in Persons table, display count of Items that have ItemDocuments related to this person
the counts should be split in two columns by ItemStatus
the query should be filterable by two optional date periods (using two BETWEEN conditions on ItemDocuments.Date field) and the Item counts should also be split into two periods
if a Person does not have any ItemDocuments assigned, it still should be shown in the results with all count values set to 0
if a Person has more than one ItemDocument for an Item, the Item still should be counted only once
Essentially, here is how the results should look if I set both periods to NULL (to read all the data):
[PersonName] [Active Items for period 1] [Closed Items for period 1] [Active Items for period 2] [Closed Items for period 2]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
'Peter' 1 1 1 1
'John' 1 0 1 0
'Anna' 0 0 0 0
While I can create an SQL query for each requirement separately, I have trouble understanding how to combine all of them into one.
For example, I can split ItemStatus counts in two columns using
COUNT(CASE WHEN t.ItemStatus = 1 THEN 1 ELSE NULL END) AS Active,
COUNT(CASE WHEN t.ItemStatus = 2 THEN 1 ELSE NULL END) AS Closed
and I can filter by two periods (with max/min date constants from MS SQL server specification to avoid NULLs for optional period dates) using
between coalesce(@start1, '1753-01-01') and coalesce(@end1, '9999-12-31')
between coalesce(@start2, '1753-01-01') and coalesce(@end2, '9999-12-31')
but how do I combine all of this, also taking the JOINs between tables into account?
Is there any technique, join, or MS SQL Server specific approach to do this efficiently?
My first attempt seems to work as required, but it is ugly: the same subquery is duplicated multiple times:
DECLARE @start1 DATETIME, @start2 DATETIME, @end1 DATETIME, @end2 DATETIME
-- SET @start2 = '2017-01-01'
SELECT
p.Name,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start1, '1753-01-01') AND COALESCE(@end1, '9999-12-31')
)
) AS Active1,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start1, '1753-01-01') AND COALESCE(@end1, '9999-12-31')
)
) AS Closed1,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start2, '1753-01-01') AND COALESCE(@end2, '9999-12-31')
)
) AS Active2,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(@start2, '1753-01-01') AND COALESCE(@end2, '9999-12-31')
)
) AS Closed2
FROM Persons p

I'm not absolutely sure if I really got what you want, but you might try this
WITH AllData AS
(
SELECT p.Id AS PersonId
,p.Name AS Person
,id.Date AS DocDate
,id.DocumentName AS DocName
,i.ItemName AS ItemName
,i.ItemStatus AS ItemStatus
,CASE WHEN id.Date BETWEEN COALESCE(@start1, '1753-01-01') AND COALESCE(@end1, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod1
,CASE WHEN id.Date BETWEEN COALESCE(@start2, '1753-01-01') AND COALESCE(@end2, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod2
FROM Persons AS p
LEFT JOIN DocumentProcessors AS dp ON p.Id=dp.PersonId
LEFT JOIN ItemDocuments AS id ON dp.DocumentId=id.Id
LEFT JOIN Items AS i ON id.ItemId=i.Id
)
SELECT PersonID
,Person
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ActiveIn1
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ClosedIn1
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ActiveIn2
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ClosedIn2
FROM AllData
GROUP BY PersonID,Person
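Note that COUNT(CASE ...) as written counts ItemDocuments rows, so an Item with two documents for the same person would be counted twice. A possible variation is to count distinct ItemIds instead. The sketch below tries this with Python's sqlite3 module instead of SQL Server, so the optional periods are passed as named parameters rather than T-SQL variables. The extra DocumentProcessors row (1, 301) is hypothetical and is added only to give Peter two documents for the same Item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Items (Id INTEGER PRIMARY KEY, ItemName TEXT, ItemStatus INTEGER);
CREATE TABLE ItemDocuments (Id INTEGER PRIMARY KEY, ItemId INTEGER,
                            DocumentName TEXT, "Date" TEXT);
CREATE TABLE DocumentProcessors (PersonId INTEGER, DocumentId INTEGER);
INSERT INTO Persons VALUES (1, 'Peter'), (2, 'John'), (3, 'Anna');
INSERT INTO Items VALUES (10, 'Cake', 1), (20, 'Dog', 2);
INSERT INTO ItemDocuments VALUES
  (101, 10, 'CakeDocument1', '2016-01-01 00:00:00'),
  (201, 20, 'DogDocument1', '2016-02-02 00:00:00'),
  (301, 10, 'CakeDocument2', '2016-03-03 00:00:00'),
  (401, 20, 'DogDocument2', '2016-04-04 00:00:00');
-- (1, 301) is a hypothetical extra row: Peter now has two Cake documents.
INSERT INTO DocumentProcessors VALUES (1, 101), (1, 201), (2, 301), (1, 301);
""")

# CASE returns the ItemId only when status and period match; COUNT(DISTINCT ...)
# then counts each Item once and ignores the NULLs from non-matching rows.
query = """
WITH AllData AS (
  SELECT p.Id AS PersonId, p.Name AS Person, i.Id AS ItemId, i.ItemStatus,
         CASE WHEN d."Date" BETWEEN COALESCE(:start1, '1753-01-01')
                               AND COALESCE(:end1, '9999-12-31')
              THEN 1 ELSE 0 END AS InPeriod1,
         CASE WHEN d."Date" BETWEEN COALESCE(:start2, '1753-01-01')
                               AND COALESCE(:end2, '9999-12-31')
              THEN 1 ELSE 0 END AS InPeriod2
  FROM Persons AS p
  LEFT JOIN DocumentProcessors AS dp ON p.Id = dp.PersonId
  LEFT JOIN ItemDocuments AS d ON dp.DocumentId = d.Id
  LEFT JOIN Items AS i ON d.ItemId = i.Id
)
SELECT Person,
       COUNT(DISTINCT CASE WHEN ItemStatus = 1 AND InPeriod1 = 1 THEN ItemId END) AS ActiveIn1,
       COUNT(DISTINCT CASE WHEN ItemStatus = 2 AND InPeriod1 = 1 THEN ItemId END) AS ClosedIn1,
       COUNT(DISTINCT CASE WHEN ItemStatus = 1 AND InPeriod2 = 1 THEN ItemId END) AS ActiveIn2,
       COUNT(DISTINCT CASE WHEN ItemStatus = 2 AND InPeriod2 = 1 THEN ItemId END) AS ClosedIn2
FROM AllData
GROUP BY PersonId, Person
ORDER BY PersonId
"""
# None plays the role of a NULL (unset) period boundary.
no_filter = {"start1": None, "end1": None, "start2": None, "end2": None}
report = conn.execute(query, no_filter).fetchall()
for row in report:
    print(row)
```

With all four boundaries left as NULL this reproduces the expected table from the question, and Peter's Cake item is counted once even though he has two documents for it.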

Concatenating row values sql server 2008 r2

I have two tables register and att_bottom and I want to display only the students at a certain building who have been tardy based on today's date with the periods separated by a comma.
This is the way the data is displayed when joining both tables:
Student ID | Building | Period | Grade
12345 2 1 11
12345 2 5 11
43210 2 1 12
I want this:
Student ID | Building | Period | Grade
12345 2 1,5 11
43210 2 1 12
This is my query:
select r.STUDENT_ID,
r.BUILDING ,
(select ab.attendancePeriod + ','
from att_bottom ab
where ab.STUDENT_ID = r.student_id
and ab.building = '2'
and ab.attendance_c ='T'
and ab.SCHOOL_YEAR =2014
and CONVERT(date,ab.attendance_date,102) = convert(date,getdate(),102)
FOR XML PATH ('') ) AS PERIODS,
r.GRADE
FROM register r
where r.CURRENT_STATUS = 'A'
and r.BUILDING ='2'
I'm getting all the students at building 2, even those who don't have an attendance_c of T; for them, a NULL value for Periods is retrieved:
Student ID | Building | Period | Grade
12345 2 1 , 5 11
43210 2 1 , 12
95687 2 NULL 09
78417 2 NULL 10
20357 2 NULL 11
I have tried adding "and ab.attendancePeriod is not NULL" and I still get the same results.
Any thoughts?

The outer query doesn't listen to any filters in the subquery; it will return NULL for any rows that aren't matched by the join conditions. You need to filter differently. Here is one way (this also eliminates the errant trailing comma, and avoids comparing dates by converting them expensively to strings):
;WITH x AS
(
SELECT DISTINCT s = r.Student_ID, r.building,
p = ab.attendancePeriod, r.grade
FROM dbo.Register AS r
INNER JOIN dbo.att_bottom AS ab
ON r.Student_ID = ab.Student_ID
AND r.building = ab.building
WHERE ab.building = '2'
AND ab.attendance_c = 'T'
AND ab.SCHOOL_YEAR = 2014
AND ab.attendance_date >= CONVERT(DATE, GETDATE())
AND ab.attendance_date < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()))
AND r.building = '2'
AND r.CURRENT_STATUS = 'A'
)
SELECT DISTINCT
[Student ID] = x.s,
x.building,
Period = STUFF((SELECT ',' + x2.p FROM x AS x2 WHERE x2.s = x.s
FOR XML PATH(''),
TYPE).value(N'./text()[1]',N'nvarchar(max)'),1,1,''),
x.grade
FROM x;
Another way:
SELECT DISTINCT
r.Student_ID,
r.building,
Period = STUFF(b.p.value(N'./text()[1]', N'nvarchar(max)'),1,1,''),
r.grade
FROM dbo.Register AS r
CROSS APPLY
(
SELECT p = ',' + ab.attendancePeriod
FROM dbo.att_bottom AS ab
WHERE ab.building = '2'
AND ab.attendance_c = 'T'
AND ab.SCHOOL_YEAR = 2014
AND ab.attendance_date >= CONVERT(DATE, GETDATE())
AND ab.attendance_date < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()))
AND ab.student_id = r.student_id
AND ab.building = r.building
FOR XML PATH(''),TYPE
) AS b(p)
WHERE b.p IS NOT NULL
AND r.building = '2'
AND r.CURRENT_STATUS = 'A';
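The difference between filtering inside a correlated subquery and filtering through a join can be reproduced on a small scale. The sketch below uses Python's sqlite3 module, with SQLite's group_concat standing in for the FOR XML PATH / STUFF technique, so it is an analogy rather than SQL Server code. The rows for students 95687 and 78417, the fixed date, and the trimmed-down column list are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE register (student_id TEXT, building TEXT, grade TEXT, current_status TEXT);
CREATE TABLE att_bottom (student_id TEXT, building TEXT, attendancePeriod TEXT,
                         attendance_c TEXT, school_year INTEGER, attendance_date TEXT);
INSERT INTO register VALUES
  ('12345', '2', '11', 'A'), ('43210', '2', '12', 'A'),
  ('95687', '2', '09', 'A'), ('78417', '2', '10', 'A');
INSERT INTO att_bottom VALUES
  ('12345', '2', '1', 'T', 2014, '2014-03-10'),
  ('12345', '2', '5', 'T', 2014, '2014-03-10'),
  ('43210', '2', '1', 'T', 2014, '2014-03-10'),
  ('95687', '2', '3', 'A', 2014, '2014-03-10'),  -- hypothetical: not tardy
  ('78417', '2', '2', 'T', 2014, '2014-03-09');  -- hypothetical: tardy on another day
""")
today = {"today": "2014-03-10"}  # fixed date standing in for GETDATE()

# Correlated subquery: the filters only shape the Periods value, so every
# register row is still returned, with NULL where nothing matched.
subquery_rows = conn.execute("""
SELECT r.student_id,
       (SELECT group_concat(ab.attendancePeriod, ',')
        FROM att_bottom ab
        WHERE ab.student_id = r.student_id AND ab.attendance_c = 'T'
          AND ab.school_year = 2014 AND ab.attendance_date = :today) AS periods
FROM register r
WHERE r.current_status = 'A' AND r.building = '2'
ORDER BY r.student_id
""", today).fetchall()

# Inner join: the same filters now decide which students appear at all.
tardy = conn.execute("""
SELECT r.student_id, r.building,
       group_concat(ab.attendancePeriod, ',') AS periods, r.grade
FROM register r
JOIN att_bottom ab ON ab.student_id = r.student_id AND ab.building = r.building
WHERE r.current_status = 'A' AND r.building = '2'
  AND ab.attendance_c = 'T' AND ab.school_year = 2014
  AND ab.attendance_date = :today
GROUP BY r.student_id, r.building, r.grade
ORDER BY r.student_id
""", today).fetchall()

print(subquery_rows)
print(tardy)
```

The first query returns all four students, two of them with a NULL period list, while the joined query returns only the two students who were tardy on the chosen date. The order of values inside group_concat is not guaranteed by SQLite, so it should not be relied on without further work.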

Move the AS PERIODS select to be an inner join to r.
