CASE condition issue in ClickHouse (query ported from MySQL)

I wrote this query in MySQL and it runs fine there, but ClickHouse reports an error on the CASE condition:
select
BTOP.id,
BTOP.customer_id,
BTOP.brand_id,
(case when (SELECT count(customer_id) from c_bill_transaction where customer_id = BTOP.customer_id AND added_on < '2023-01-01 00:00:00') <=0 then "new" else "exiting" end) as "customer_type",
BT.id,
BT.status,
BTOP.added_on
from c_bill_transaction_online_precheckout BTOP
Left Join c_bill_transaction BT on BT.precheckout_id = BTOP.id
where BTOP.added_on > '2023-01-01 00:00:00'
The error ClickHouse throws:
ClickHouse exception, code: 1002, host: 127.0.0.1, port: 8123; Code: 47. DB::Exception: Missing columns: 'BTOP.customer_id' while processing query: 'SELECT count(customer_id) FROM c_bill_transaction WHERE (customer_id = BTOP.customer_id) AND (added_on < '2023-01-01 00:00:00')', required columns: 'customer_id' 'BTOP.customer_id' 'added_on', maybe you meant: ['customer_id','customer_id','added_on']: While processing (SELECT count(customer_id) FROM c_bill_transaction WHERE (customer_id = BTOP.customer_id) AND (added_on < '2023-01-01 00:00:00')) AS _subquery8: While processing ((SELECT count(customer_id) FROM c_bill_transaction WHERE (customer_id = BTOP.customer_id) AND (added_on < '2023-01-01 00:00:00')) AS _subquery8) <= 0: While processing multiIf(((SELECT count(customer_id) FROM c_bill_transaction WHERE (customer_id = BTOP.customer_id) AND (added_on < '2023-01-01 00:00:00')) AS _subquery8) <= 0, new, exiting) AS customer_type. (UNKNOWN_IDENTIFIER) (version 22.7.1.2484 (official build))

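You are using a subquery that refers to a column value from the outer query. This is called a correlated subquery, and it is not yet supported by ClickHouse. Per https://github.com/ClickHouse/ClickHouse/issues/6697 this may become supported this year.
You can instead join c_bill_transaction a second time and group by the selected columns. Three details matter: string literals need single quotes (ClickHouse reads "new" as an identifier, as the multiIf(..., new, exiting) in the error shows), the count has to be compared with 0 to keep the original meaning, and with the default join_use_nulls = 0 ClickHouse fills unmatched LEFT JOIN columns with default values instead of NULL, so the setting must be enabled for count() to skip them. Something like:
select
BTOP.id,
BTOP.customer_id,
BTOP.brand_id,
(case when count(BTcust.customer_id) = 0 then 'new' else 'exiting' end) as customer_type,
BT.id,
BT.status,
BTOP.added_on
from c_bill_transaction_online_precheckout BTOP
left join c_bill_transaction BT on BT.precheckout_id = BTOP.id
left join c_bill_transaction BTcust on BTcust.customer_id = BTOP.customer_id and BTcust.added_on < '2023-01-01 00:00:00'
where BTOP.added_on > '2023-01-01 00:00:00'
group by BTOP.id, BTOP.customer_id, BTOP.brand_id, BT.id, BT.status, BTOP.added_on
settings join_use_nulls = 1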
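The join-and-count idea can be tried on a small data set. The following is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for ClickHouse, and the table names, columns, and sample rows are hypothetical. It shows a correlated count replaced by a LEFT JOIN plus GROUP BY, where a count of 0 marks a new customer.

```python
import sqlite3

# In-memory SQLite database; tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE precheckout (id INTEGER, customer_id INTEGER, added_on TEXT);
CREATE TABLE bill_transaction (id INTEGER, customer_id INTEGER, added_on TEXT);
INSERT INTO precheckout VALUES
  (1, 10, '2023-01-05 10:00:00'),
  (2, 20, '2023-01-06 11:00:00');
-- customer 10 has a transaction before 2023, customer 20 does not
INSERT INTO bill_transaction VALUES
  (100, 10, '2022-12-01 09:00:00'),
  (101, 20, '2023-01-06 11:05:00');
""")

# The earlier transactions are joined in and counted per precheckout row.
# COUNT(prev.customer_id) ignores the NULLs produced by unmatched rows.
rows = conn.execute("""
SELECT p.id,
       p.customer_id,
       CASE WHEN COUNT(prev.customer_id) = 0 THEN 'new' ELSE 'existing' END
           AS customer_type
FROM precheckout AS p
LEFT JOIN bill_transaction AS prev
       ON prev.customer_id = p.customer_id
      AND prev.added_on < '2023-01-01 00:00:00'
WHERE p.added_on > '2023-01-01 00:00:00'
GROUP BY p.id, p.customer_id
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

SQLite always returns NULL for unmatched LEFT JOIN columns, which is why no extra setting appears here.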

Related

SQL how to create aggregate of two aggregated columns for each row

I'm working on a big query in SQL (MySQL 5.7) to calculate aggregated columns based on raw values in my table. I've created several aggregated columns (see attached screenshot and SQL) and I now need to create a conversion_percent column for each tlp_aff_id in my query.
This conversion_percent should be a division of the aggregated JoinedSessions.total_sessions and COUNT(Report.tlp_aff_id) as leads_total.
My current SQL:
SELECT
# Application details
Report.tlp_aff_id,
# Revenue
JoinedRevenue.total_revenue,
# Commission
JoinedCommission.total_commission,
# Profit
JoinedProfit.total_profit,
# Sessions
JoinedSessions.total_sessions,
# Submits
COUNT(Report.tlp_aff_id) as total_submits,
# Leads
COUNT(Report.tlp_aff_id) as leads_total,
SUM(case when Report.application_result = 'Accepted' then 1 else 0 end) leads_accepted,
SUM(case when Report.application_result = 'Rejected' then 1 else 0 end) leads_rejected
# Conversion percent
# JoinedConversion.conversion_percent
FROM
tlp_payout_report_minute AS Report
INNER JOIN
(
SELECT
JoinedRevenue.tlp_aff_id,
JoinedRevenue.minute_rounded_timestamp,
SUM(commission) AS total_revenue
FROM
tlp_payout_report_minute AS JoinedRevenue
WHERE
JoinedRevenue.minute_rounded_timestamp >= 1664841600
AND
JoinedRevenue.minute_rounded_timestamp <= 1664927999
GROUP BY
JoinedRevenue.tlp_aff_id
) AS JoinedRevenue
ON JoinedRevenue.tlp_aff_id = Report.tlp_aff_id
INNER JOIN
(
SELECT
ReportCommission.tlp_aff_id,
ReportCommission.seller_code,
ReportCommission.minute_rounded_timestamp,
SUM(commission) AS total_commission
FROM
tlp_payout_report_minute AS ReportCommission
WHERE
ReportCommission.minute_rounded_timestamp >= 1664841600
AND
ReportCommission.minute_rounded_timestamp <= 1664927999
AND
ReportCommission.seller_code != 44
GROUP BY
ReportCommission.tlp_aff_id
) AS JoinedCommission
ON JoinedCommission.tlp_aff_id = Report.tlp_aff_id
INNER JOIN
(
SELECT
ReportProfit.tlp_aff_id,
ReportProfit.seller_code,
ReportProfit.application_result,
ReportProfit.minute_rounded_timestamp,
SUM(commission) AS total_profit
FROM
tlp_payout_report_minute AS ReportProfit
WHERE
ReportProfit.minute_rounded_timestamp >= 1664841600
AND
ReportProfit.minute_rounded_timestamp <= 1664927999
AND
ReportProfit.application_result = 'Accepted'
AND
ReportProfit.seller_code = 44
GROUP BY
ReportProfit.tlp_aff_id
) AS JoinedProfit
ON JoinedProfit.tlp_aff_id = Report.tlp_aff_id
INNER JOIN
(
SELECT
Conversion.aff_id,
Conversion.conversion_type,
COUNT(Conversion.ip_address) as total_sessions
FROM
tlp_conversions AS Conversion
WHERE
Conversion.conversion_time >= '2022-10-04 00:00:00'
AND
Conversion.conversion_time <= '2022-10-04 23:59:59'
AND
Conversion.aff_id IS NOT NULL
AND
Conversion.conversion_type = 2
GROUP BY
Conversion.aff_id
) AS JoinedSessions
ON JoinedSessions.aff_id = Report.tlp_aff_id
WHERE
Report.minute_rounded_timestamp >= 1664841600
AND
Report.minute_rounded_timestamp <= 1664927999
GROUP BY
Report.tlp_aff_id
ORDER BY
JoinedRevenue.total_revenue DESC
I'm thinking something along the lines of:
INNER JOIN
(
...
) AS JoinedConversion
ON JoinedConversion.aff_id = Report.tlp_aff_id
But I don't think this is necessary for conversion_percent.
What's the right approach here?
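One possible approach, sketched here under assumptions, is to divide the two aggregates directly in the outer SELECT instead of adding another joined subquery. The example uses SQLite through Python's sqlite3 module rather than MySQL 5.7, the tables and rows are hypothetical, and it assumes conversion_percent means leads_total divided by total_sessions, times 100.

```python
import sqlite3

# In-memory SQLite database; tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report (aff_id INTEGER, application_result TEXT);
CREATE TABLE conversions (aff_id INTEGER, ip_address TEXT);
INSERT INTO report VALUES (1, 'Accepted'), (1, 'Rejected'), (2, 'Accepted');
INSERT INTO conversions VALUES
  (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'),
  (2, 'e'), (2, 'f'), (2, 'g'), (2, 'h');
""")

# total_sessions comes from the joined subquery, leads_total from the outer
# aggregate; the ratio is computed in the same SELECT list.
rows = conn.execute("""
SELECT r.aff_id,
       s.total_sessions,
       COUNT(r.aff_id) AS leads_total,
       ROUND(100.0 * COUNT(r.aff_id) / s.total_sessions, 2) AS conversion_percent
FROM report AS r
JOIN (
    SELECT aff_id, COUNT(ip_address) AS total_sessions
    FROM conversions
    GROUP BY aff_id
) AS s ON s.aff_id = r.aff_id
GROUP BY r.aff_id, s.total_sessions
ORDER BY r.aff_id
""").fetchall()

for row in rows:
    print(row)
```

If the ratio is meant the other way around, swapping numerator and denominator is the only change needed.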

Query using group by month

I have a problem with a query that groups by month. The query returns total_revenue per month, but if a month in the range doesn't contain any data, total_revenue is increased unexpectedly.
SELECT COUNT(CT.cumTxnReportId),
CT.cumTxnReportId,
CT.ticketNum,
DATE_FORMAT(CT.exitDateTimeUtc,'%m-%Y'),
sum(netAmount) AS total_revenue,
D.name,
HOUR(CT.entranceDateTimeUtc) AS entryHour,
HOUR(CT.exitDateTimeUtc) AS exitHour,
CT.entranceDateTimeUtc,
CT.exitDateTimeUtc,
CT.netAmount AS netAmount,
CT.grossAmount,
CT.discountAmount,
CT.rate,
CT.txnType,
CT.ticketType,
CT.txnNum,
CT.numDiscounts
FROM Parkloco.ParkingArea PA
JOIN IParcPro.Device D ON PA.id = D.parkingAreaId
JOIN Parkloco.RateCard RC ON PA.id = RC.parkingAreaId
JOIN IParcPro.CumTxn CT ON D.id = CT.deviceId
WHERE PA.uuid = '27d842c1-7057-11e6-a0eb-1245b0d35d23'
AND (CT.txnType = 'Allowed'
OR CT.txnType = 'Add'
OR CT.txnType = 'Normal'
OR CT.txnType = 'Offline'
OR CT.txnType = 'Repay')
AND ((CT.entranceDateTimeUtc >= '2016-08-01 00:00:00'
AND CT.exitDateTimeUtc <= '2017-04-31 23:59:59'))
AND (RC.state = 'active'
OR RC.state = 'archived')
AND RC.fromDateTimeUtc <= '2017-04-31 23:59:59'
AND (RC.thruDateTimeUtc IS NULL
OR RC.thruDateTimeUtc >= '2016-08-01 00:00:00')
AND (TIMESTAMPDIFF (SECOND, CT.entranceDateTimeUtc, CT.exitDateTimeUtc) >= '0' * 60)
AND (TIMESTAMPDIFF (SECOND, CT.entranceDateTimeUtc, CT.exitDateTimeUtc) < '1441' * 60)
AND CT.numDiscounts=0
AND CT.ticketNum !=0
GROUP BY DATE_FORMAT(CT.exitDateTimeUtc,'%m-%Y')
But when I extend the month range, I get an unexpected increase in total_revenue:
SELECT COUNT(CT.cumTxnReportId),
CT.cumTxnReportId,
CT.ticketNum,
DATE_FORMAT(CT.exitDateTimeUtc,'%m-%Y'),
sum(netAmount) AS total_revenue,
D.name,
HOUR(CT.entranceDateTimeUtc) AS entryHour,
HOUR(CT.exitDateTimeUtc) AS exitHour,
CT.entranceDateTimeUtc,
CT.exitDateTimeUtc,
CT.netAmount AS netAmount,
CT.grossAmount,
CT.discountAmount,
CT.rate,
CT.txnType,
CT.ticketType,
CT.txnNum,
CT.numDiscounts
FROM Parkloco.ParkingArea PA
JOIN IParcPro.Device D ON PA.id = D.parkingAreaId
JOIN Parkloco.RateCard RC ON PA.id = RC.parkingAreaId
JOIN IParcPro.CumTxn CT ON D.id = CT.deviceId
WHERE PA.uuid = '27d842c1-7057-11e6-a0eb-1245b0d35d23'
AND (CT.txnType = 'Allowed'
OR CT.txnType = 'Add'
OR CT.txnType = 'Normal'
OR CT.txnType = 'Offline'
OR CT.txnType = 'Repay')
AND ((CT.entranceDateTimeUtc >= '2016-08-01 00:00:00'
AND CT.exitDateTimeUtc <= '2017-07-31 23:59:59'))
AND (RC.state = 'active'
OR RC.state = 'archived')
AND RC.fromDateTimeUtc <= '2017-07-31 23:59:59'
AND (RC.thruDateTimeUtc IS NULL
OR RC.thruDateTimeUtc >= '2016-08-01 00:00:00')
AND (TIMESTAMPDIFF (SECOND, CT.entranceDateTimeUtc, CT.exitDateTimeUtc) >= '0' * 60)
AND (TIMESTAMPDIFF (SECOND, CT.entranceDateTimeUtc, CT.exitDateTimeUtc) < '1441' * 60)
AND CT.numDiscounts=0
AND CT.ticketNum !=0
GROUP BY DATE_FORMAT(CT.exitDateTimeUtc,'%m-%Y')
output such as :
Can anyone help me with this? Thanks in advance.
Although MySQL allows these loose GROUP BY rules, in my opinion you should avoid relying on them. To explain: normally, every non-aggregated field in the SELECT clause should also appear in the GROUP BY clause:
select a,b,c, sum(z)
from t
group by a,b,c
vs
select a,b,c, sum(z)
from t
group by a #<--- MySQL allows this!
So if b and c are not in the GROUP BY, how does MySQL decide which values to return? For versions before 5.6, the documentation says:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In my opinion, your query makes no sense as written. Look at entryHour and total_revenue: one belongs to a single entry, the other to the whole month.
I think you should rethink the whole SQL statement, because the result of this one is incoherent.
Also, remember this is not a code review service. Please read how to create a Minimal, Complete, and Verifiable example so that your question can also help other users.
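The GROUP BY rule described above can be illustrated with a query that selects only the grouping expression and aggregates. This is a sketch with assumptions: it runs on SQLite through Python's sqlite3 module, so strftime stands in for MySQL's DATE_FORMAT, and the txn table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (id INTEGER, exit_time TEXT, net_amount REAL);
INSERT INTO txn VALUES
  (1, '2016-08-03 10:00:00', 5.0),
  (2, '2016-08-20 18:30:00', 7.5),
  (3, '2016-09-01 09:15:00', 4.0);
""")

# Every selected column is either the grouping expression or an aggregate,
# so each output row has one well-defined value per month.
monthly = conn.execute("""
SELECT strftime('%m-%Y', exit_time) AS month,
       COUNT(*) AS txn_count,
       SUM(net_amount) AS total_revenue
FROM txn
GROUP BY strftime('%m-%Y', exit_time)
ORDER BY MIN(exit_time)
""").fetchall()

for row in monthly:
    print(row)
```

Per-transaction columns such as an entry hour have no single value per month, which is why they are left out of this SELECT list.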

Passing column in subquery is not working

Please take a look at my query
SELECT ht_tutor.id
,(
SELECT group_concate(days)
FROM (
SELECT fkTutorId
,days
,(
CASE
WHEN endTime <= '14:00:00'
THEN '00:00:00'
WHEN TIMEDIFF(startTime, '14:00:00') < '00:00:00' && TIMEDIFF('17:00:00', endTime) < '00:00:00'
THEN TIMEDIFF('17:00:00', '14:00:00')
ELSE '00:00:00'
END
) AS intersect_time
FROM ht_tutorAvailablity
WHERE ht_tutorAvailablity.fkTutorId = ht_tutor.id
) AS avail_table
) AS days_avail
FROM ht_tutor
LIMIT 0,10
ERROR: #1054 - Unknown column 'ht_tutor.id' in 'where clause'
How can I pass ht_tutor.id there?
If I pass the tutor id manually, like ht_tutorAvailablity.fkTutorId = "12", then it works fine.
You need to use a JOIN in this scenario. A derived table (a subquery in the FROM clause) cannot see columns of the outer query in the MySQL versions that raise this error, which is why ht_tutor.id is unknown there. The query below joins on TUT.id = AVA.fkTutorId and groups by tutor so that GROUP_CONCAT returns one row per tutor:
SELECT id, GROUP_CONCAT(days) AS days_avail
FROM (
SELECT AVA.fkTutorId, AVA.days, TUT.id,
CASE
WHEN AVA.endTime <= '14:00:00'
THEN '00:00:00'
WHEN TIMEDIFF(AVA.startTime, '14:00:00') < '00:00:00' && TIMEDIFF('17:00:00', AVA.endTime) < '00:00:00'
THEN TIMEDIFF('17:00:00', '14:00:00')
ELSE '00:00:00'
END AS intersect_time
FROM ht_tutorAvailablity AVA
JOIN ht_tutor TUT ON TUT.id = AVA.fkTutorId
) AS avail_table
GROUP BY id
LIMIT 0, 10
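The join-then-aggregate pattern from this answer can be checked on toy data. The sketch below assumes SQLite (through Python's sqlite3 module) instead of MySQL, and the tutor and availability tables with their rows are hypothetical. It shows GROUP_CONCAT over a JOIN with GROUP BY producing one row per tutor.

```python
import sqlite3

# In-memory SQLite database; tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tutor (id INTEGER PRIMARY KEY);
CREATE TABLE availability (tutor_id INTEGER, days TEXT);
INSERT INTO tutor VALUES (1), (2);
INSERT INTO availability VALUES (1, 'Mon'), (1, 'Wed'), (2, 'Fri');
""")

# The join supplies the tutor id to every availability row, so no
# correlated reference into a derived table is needed.
rows = conn.execute("""
SELECT t.id, GROUP_CONCAT(a.days) AS days_avail
FROM tutor AS t
JOIN availability AS a ON a.tutor_id = t.id
GROUP BY t.id
ORDER BY t.id
""").fetchall()

# The order of items inside each concatenated string is not guaranteed,
# so the values are split and sorted before being displayed.
days_by_tutor = {tutor_id: sorted(days.split(",")) for tutor_id, days in rows}
print(days_by_tutor)
```

An inner join drops tutors that have no availability rows; a LEFT JOIN from the tutor table would keep them with a NULL list.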

SQL Distinct Multiple Query Combine

I need to get information from two tables, as you can see below, broken down by shift. Currently the same query is basically run 4 times, 2 times for each shift.
For first shift they grab all carton numbers that are NOT EQUAL to DELC and the second query for all cartons EQUAL to DELC. The problem is that we want distinct carton numbers. If a carton was half processed on first shift and finished on second shift, it shows up twice even though we use distinct, because it's only distinct within each query.
Is there a way to run ALL 4 queries together and do a distinct over the entire data?
1 Day old 1st shift = DELC
select count(distinct a.Barcode)
from [RL_Ship].[dbo].[mSCAN] as a inner join
[RL_Ship].[dbo].[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where BagToteFlag = 'Y' and direction = 'Send'
and timeStamp >= '2016-06-14 03:00:00'
and timeStamp < '2016-06-14 15:00:00'
and substring(message,64,4) = 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
1 Day old 1st shift <> DELC
select count(distinct a.Barcode)
from [RL_Ship].[dbo].[mSCAN] as a inner join
[RL_Ship].[dbo].[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where a.BagToteFlag = 'Y' and a.direction = 'Send'
and a.timeStamp >= '2016-06-14 03:00:00'
and a.timeStamp < '2016-06-14 15:00:00'
and substring(message,64,4) <> 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
1 Day old 2nd shift = DELC
select count(distinct a.Barcode)
from [RL_Ship].[dbo].[mSCAN] as a inner join [RL_Ship].[dbo].
[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where a.BagToteFlag = 'Y' and a.direction = 'Send'
and a.timeStamp >= '2016-06-14 15:00:00'
and a.timeStamp < '2016-06-15 03:00:00'
and substring(message,64,4) = 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
1 Day old 2nd shift <> DELC
select count(distinct a.Barcode)
from [RL_Ship].[dbo].[mSCAN] as a inner join [RL_Ship].[dbo].
[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where a.BagToteFlag = 'Y' and a.direction = 'Send'
and a.timeStamp >= '2016-06-14 15:00:00'
and a.timeStamp < '2016-06-15 03:00:00'
and substring(message,64,4) <> 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
If I have understood your question correctly, the first query I would try to get the result you want is something like:
SELECT count(distinct barcode) FROM (
select distinct a.Barcode <rest of query>
UNION
select distinct a.Barcode <rest of next query>
UNION ... )
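The effect of counting over a UNION can be seen on a few rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than SQL Server, with a hypothetical scans table in which carton A is touched on both shifts. It compares per-shift distinct counts with a count over the UNION of both shifts.

```python
import sqlite3

# In-memory SQLite database; table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scans (barcode TEXT, shift INTEGER, status TEXT);
INSERT INTO scans VALUES
  ('A', 1, 'PICK'),   -- carton A started on first shift ...
  ('A', 2, 'DELC'),   -- ... and finished on second shift
  ('B', 1, 'DELC'),
  ('C', 2, 'PICK');
""")

# Distinct within each shift: carton A is counted once per shift.
per_shift = conn.execute("""
SELECT shift, COUNT(DISTINCT barcode)
FROM scans
GROUP BY shift
ORDER BY shift
""").fetchall()

# UNION removes duplicates across both SELECTs before the outer COUNT runs.
overall = conn.execute("""
SELECT COUNT(*) FROM (
    SELECT barcode FROM scans WHERE shift = 1
    UNION
    SELECT barcode FROM scans WHERE shift = 2
) AS combined
""").fetchone()[0]

print(per_shift, overall)
```

The per-shift counts add up to 4 while the UNION count is 3, which is the duplicate described in the question. The derived table carries an alias because SQL Server requires one.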
If I do a union and take off the count part, I get the correct number because the duplicates are removed across ALL queries. But if I do a union and a count, the number is off; it doesn't remove the duplicates.
I also need a total count per query, as they are different shifts/time ranges.
The answer of doing a count and then a from is good, but (1) I couldn't get that to work and (2) I think it is only going to give me a count of ALL, right?
I tried the following but it gave me this error:
Msg 156, Level 15, State 1, Line 36
Incorrect syntax near the keyword 'select'.
EDIT: OK, I got this working by adding 'as table1' at the end. As I suspected, it only gives me a total without duplicates, which is great, but I need a count for each of the 4 queries. Any ideas on that?
select count(distinct Barcode) from
(select distinct a.barcode
from [RL_Ship].[dbo].[mSCAN] as a inner join [RL_Ship].[dbo].
[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where BagToteFlag = 'Y' and direction = 'Send'
and timeStamp >= '2016-06-14 03:00:00'
and timeStamp < '2016-06-14 15:00:00'
and substring(message,64,4) = 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
union
select distinct a.Barcode
from [RL_Ship].[dbo].[mSCAN] as a inner join [RL_Ship].[dbo].
[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where a.BagToteFlag = 'Y' and a.direction = 'Send'
and a.timeStamp >= '2016-06-14 03:00:00'
and a.timeStamp < '2016-06-14 15:00:00'
and substring(message,64,4) <> 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
union
select distinct a.Barcode
from [RL_Ship].[dbo].[mSCAN] as a inner join [RL_Ship].[dbo].
[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where a.BagToteFlag = 'Y' and a.direction = 'Send'
and a.timeStamp >= '2016-06-14 15:00:00'
and a.timeStamp < '2016-06-15 03:00:00'
and substring(message,64,4) = 'DELC'
and SUBSTRING(rawdata,21,20) > '0'
union
select distinct a.Barcode
from [RL_Ship].[dbo].[mSCAN] as a inner join [RL_Ship].[dbo].
[wmsInboundQueue] on
a.barcode = substring(message,26,20)
Where a.BagToteFlag = 'Y' and a.direction = 'Send'
and a.timeStamp >= '2016-06-14 15:00:00'
and a.timeStamp < '2016-06-15 03:00:00'
and substring(message,64,4) <> 'DELC'
and SUBSTRING(rawdata,21,20) > '0')

SQL statement for GROUP BY

I am really stuck with one SQL SELECT statement.
This is the output I get from the SQL statement below:
What I need: the columns assignedVouchersNumber and usedVouchersNumber should be combined into one row per msisdn. For example, for msisdn 723709656 there are now two rows, one with assignedVouchersNumber = 1 and a second one also with assignedVouchersNumber = 1.
But I need one row with assignedVouchersNumber = 2. Do you know where the problem is?
SELECT eu.msisdn,
eu.id as userId,
sum(case ev.voucherstate when '1' then 1 else 0 end) as assignedVouchersNumber,
sum(case ev.voucherstate when '2' then 1 else 0 end) as usedVouchersNumber,
ev.extra_offer_id,
ev.create_time,
ev.use_time,
ev.id as voucherId,
ev.voucherstate
FROM extra_users eu
JOIN (SELECT sn.msisdn AS telcislo,
stn.numberid
FROM stats_number sn
JOIN stats_target_number AS stn
ON ( sn.numberid = stn.numberid )
WHERE stn.targetid = 1) xy
ON eu.msisdn = xy.telcislo
JOIN extra_vouchers AS ev
ON ( eu.id = ev.extra_user_id )
WHERE ev.create_time BETWEEN '2012-07-23 00:00:00' AND '2013-08-23 23:59:59'
AND ev.use_time <= '2013-08-23 23:59:59'
AND ev.use_time >= '2012-07-23 00:00:00'
AND ev.voucherstate IN ( 1, 2 )
AND Ifnull(ev.extra_offer_id IN( 2335, 3195, 30538 ), 1)
GROUP BY eu.msisdn, ev.extra_offer_id, ev.voucherState
ORDER BY eu.msisdn ASC
You have two different extra_offer_id values for the same msisdn and VouchersNumber. That's why you get two rows.
I got it: there should be no grouping by ev.voucherState in
GROUP BY eu.msisdn, ev.extra_offer_id, ev.voucherState
After I removed ev.voucherState it works.
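How the GROUP BY columns decide the number of output rows can be shown on a tiny table. The sketch below assumes SQLite through Python's sqlite3 module, a hypothetical vouchers table, and that one row per msisdn is wanted. It runs the same conditional sums with a fine-grained grouping and with a grouping on msisdn only.

```python
import sqlite3

# In-memory SQLite database; table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vouchers (msisdn TEXT, offer_id INTEGER, state INTEGER);
INSERT INTO vouchers VALUES
  ('723709656', 2335, 1),
  ('723709656', 3195, 1),
  ('723709656', 2335, 2),
  ('111',       2335, 2);
""")

SUMS = """
       SUM(CASE state WHEN 1 THEN 1 ELSE 0 END) AS assigned,
       SUM(CASE state WHEN 2 THEN 1 ELSE 0 END) AS used
"""

# Grouping by offer and state as well splits each msisdn into several rows.
fine = conn.execute(
    "SELECT msisdn, " + SUMS +
    " FROM vouchers GROUP BY msisdn, offer_id, state ORDER BY msisdn"
).fetchall()

# Grouping by msisdn alone collapses them into one row per msisdn.
per_msisdn = conn.execute(
    "SELECT msisdn, " + SUMS +
    " FROM vouchers GROUP BY msisdn ORDER BY msisdn"
).fetchall()

print(len(fine), per_msisdn)
```

Every extra column in the GROUP BY list creates a separate group for each of its distinct values, so the sums are split across those groups.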