I have three tables, TableItem, TableAbcd and TablePqrs, as below:
TableItem
ID item
1 item1
TableAbcd
ID Item ColA ColB ColC ColD
1 item1 A1 B1 C1 D1
TablePqrs
ID item ColA ColB ColC ColD ColValue
1 item1 A1 B1 null null 10000
2 item1 A1 B1 C1 D1 100
Here, for a given item, there has to be just one record in the output: the one with the maximum number of matching columns between TableAbcd and TablePqrs.
Row 1 of TableAbcd has the most matching columns with row 2 of TablePqrs,
so my output for the join of the above three tables should be:
item ColA ColB ColC ColD ColValue
item1 A1 B1 C1 D1 100
Code tried so far,
Select item, ColA, ColB, ColC, ColD, ColValue
FROM TableItem a
LEFT OUTER JOIN TableAbcd b
ON a.item = b.item
LEFT OUTER JOIN TablePqrs c
ON (b.ColA = c.ColA AND b.ColB = c.ColB AND b.ColC = c.ColC AND b.ColD = c.ColD)
OR (b.ColA = c.ColA AND b.ColB = c.ColB AND b.ColC = c.ColC)
OR (b.ColA = c.ColA AND b.ColB = c.ColB)
It fetches two records for me. I know there may be design issues, but we are getting the data from a third-party legacy system, which has a table structure suited to its own needs, and we are sending it on to another interface.
Please suggest.
Here the question is: How many columns match between B and C?
For the join clause you only need that at least one column of b matches the same column in c:
from c
left join b
on c.A = b.A or c.B = b.B or c.C = b.C or c.D = b.D
You can calculate the number of matches with:
(case when c.A = b.A then 1 else 0 end)
+ (case when c.B = b.B then 1 else 0 end)
+ (case when c.C = b.C then 1 else 0 end)
+ (case when c.D = b.D then 1 else 0 end) as matches
Then simply order by the number of matches (descending) and limit the result to 1 row.
select
c.id, c.item, c.A, c.B, c.C, c.D, c.colValue,
(case when c.A = b.A then 1 else 0 end)
+ (case when c.B = b.B then 1 else 0 end)
+ (case when c.C = b.C then 1 else 0 end)
+ (case when c.D = b.D then 1 else 0 end) as matches
from c
left join b
on c.A = b.A or c.B = b.B or c.C = b.C or c.D = b.D
order by
((case when c.A = b.A then 1 else 0 end)
+ (case when c.B = b.B then 1 else 0 end)
+ (case when c.C = b.C then 1 else 0 end)
+ (case when c.D = b.D then 1 else 0 end)) desc
limit 1;
I've set up a rextester example just to check it: http://rextester.com/IPA67860
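To make the idea concrete, here is a minimal sketch that runs the same match-counting query against the sample rows from the question. It uses Python's built-in sqlite3 module rather than MySQL, and it joins on item instead of the OR condition; both are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableAbcd (ID INTEGER, item TEXT, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT);
    CREATE TABLE TablePqrs (ID INTEGER, item TEXT, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT,
                            ColValue INTEGER);
    INSERT INTO TableAbcd VALUES (1, 'item1', 'A1', 'B1', 'C1', 'D1');
    INSERT INTO TablePqrs VALUES (1, 'item1', 'A1', 'B1', NULL, NULL, 10000);
    INSERT INTO TablePqrs VALUES (2, 'item1', 'A1', 'B1', 'C1', 'D1', 100);
""")

# Count matching columns per TablePqrs row, then keep the row with the most matches.
# A comparison with NULL is not true, so the CASE falls through to ELSE 0.
best = conn.execute("""
    SELECT c.item, c.ColA, c.ColB, c.ColC, c.ColD, c.ColValue,
           (CASE WHEN c.ColA = b.ColA THEN 1 ELSE 0 END)
         + (CASE WHEN c.ColB = b.ColB THEN 1 ELSE 0 END)
         + (CASE WHEN c.ColC = b.ColC THEN 1 ELSE 0 END)
         + (CASE WHEN c.ColD = b.ColD THEN 1 ELSE 0 END) AS matches
    FROM TablePqrs c
    JOIN TableAbcd b ON b.item = c.item   -- assumption: rows are paired by item
    ORDER BY matches DESC
    LIMIT 1
""").fetchone()

print(best)
```

Note that LIMIT 1 returns a single row for the whole result, so this form only works when you query one item at a time.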
With TableAbcd called a and TablePqrs called p, the number of matches is COALESCE(p.cola = a.cola, 0) + COALESCE(p.colb = a.colb, 0) + COALESCE(p.colc = a.colc, 0) + COALESCE(p.cold = a.cold, 0), because in MySQL true is 1 and false is 0. The COALESCE is needed because a comparison with NULL yields NULL, and a single NULL term would turn the whole sum into NULL.
Now you are looking for the p records for which no other p record exists with a higher number of matches:
select *
from tablepqrs p1
where not exists
(
select *
from tablepqrs p2
join tableabcd a on a.item = p2.item
where p2.item = p1.item
and COALESCE(p2.cola = a.cola, 0) + COALESCE(p2.colb = a.colb, 0) + COALESCE(p2.colc = a.colc, 0) + COALESCE(p2.cold = a.cold, 0) >
COALESCE(p1.cola = a.cola, 0) + COALESCE(p1.colb = a.colb, 0) + COALESCE(p1.colc = a.colc, 0) + COALESCE(p1.cold = a.cold, 0)
);
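As a rough check of the NOT EXISTS idea, the sketch below runs it on the question's sample rows with Python's sqlite3 module (an assumption; SQLite also evaluates a comparison to 1, 0 or NULL). Each comparison is wrapped in COALESCE(..., 0) because the NULL columns in TablePqrs would otherwise make the whole score NULL. The helper name score is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableAbcd (ID INTEGER, item TEXT, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT);
    CREATE TABLE TablePqrs (ID INTEGER, item TEXT, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT,
                            ColValue INTEGER);
    INSERT INTO TableAbcd VALUES (1, 'item1', 'A1', 'B1', 'C1', 'D1');
    INSERT INTO TablePqrs VALUES (1, 'item1', 'A1', 'B1', NULL, NULL, 10000);
    INSERT INTO TablePqrs VALUES (2, 'item1', 'A1', 'B1', 'C1', 'D1', 100);
""")

# Why COALESCE matters: one NULL comparison poisons the whole sum.
null_sum = conn.execute("SELECT (NULL = 'C1') + 1").fetchone()[0]

def score(alias):
    """Hypothetical helper: SQL text for the NULL-safe match count of one alias."""
    cols = ("ColA", "ColB", "ColC", "ColD")
    return " + ".join(f"COALESCE({alias}.{c} = a.{c}, 0)" for c in cols)

score_p1 = score("p1")
score_p2 = score("p2")

# Keep the p1 rows for which no p2 row of the same item scores higher.
winners = conn.execute(f"""
    SELECT p1.ID, p1.ColValue
    FROM TablePqrs p1
    WHERE NOT EXISTS (
        SELECT 1
        FROM TablePqrs p2
        JOIN TableAbcd a ON a.item = p2.item
        WHERE p2.item = p1.item
          AND {score_p2} > {score_p1}
    )
""").fetchall()

print(null_sum, winners)
```

Unlike ORDER BY ... LIMIT 1, this form returns the best row for every item in one query, and it returns all tied rows when several share the top score.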
The code below shows another way to filter. It's a similar approach to the one proposed by McNets, but uses window functions.
The key is to compute a ranking that identifies the TablePqrs row with the best match. If two rows have the same ranking for the same item value, an additional criterion is needed to break the tie; in this example it is the ID of the TablePqrs row (the highest ID wins). I'm not using outer joins, so TableItem records without any ranked match produce no result.
I'm not sure whether it fits exactly what you want, so try it and draw your own conclusions.
SELECT TableItem.id,
TableItem.item,
TablePqrs.colA,
TablePqrs.colB,
TablePqrs.colC,
TablePqrs.colD,
TablePqrs.ColValue
FROM TableItem
INNER JOIN (SELECT DISTINCT
tableItemId,
FIRST_VALUE(tablePqrsId) OVER (PARTITION BY tableItemId ORDER BY ranking DESC, tablePqrsId DESC) tablePqrsId
FROM (SELECT rankTableItem.ID tableItemId,
rankTablePqrs.ID tablePqrsId,
CASE WHEN rankTablePqrs.colA IS NULL THEN 0 ELSE 1 END +
CASE WHEN rankTablePqrs.colB IS NULL THEN 0 ELSE 1 END +
CASE WHEN rankTablePqrs.colC IS NULL THEN 0 ELSE 1 END +
CASE WHEN rankTablePqrs.colD IS NULL THEN 0 ELSE 1 END ranking
FROM TableItem rankTableItem
INNER JOIN TableAbcd rankTableAbcd ON rankTableItem.item = rankTableAbcd.item
INNER JOIN TablePqrs rankTablePqrs ON rankTablePqrs.item = rankTableAbcd.item
AND (rankTableAbcd.colA = rankTablePqrs.colA
OR rankTableAbcd.colB = rankTablePqrs.colB
OR rankTableAbcd.colC = rankTablePqrs.colC
OR rankTableAbcd.colD = rankTablePqrs.colD))) pivotTable ON pivotTable.tableItemId = TableItem.Id
INNER JOIN TablePqrs ON TablePqrs.Id = pivotTable.tablePqrsId
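For comparison, here is a shorter sketch of the same ranking idea using ROW_NUMBER() instead of FIRST_VALUE with DISTINCT. It is not the query above: it ranks by the number of matching columns, breaks ties by the higher TablePqrs ID, and runs on Python's sqlite3 module (window functions need SQLite 3.25 or later). The item2 rows are invented here to show that each item gets its own winner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableAbcd (ID INTEGER, item TEXT, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT);
    CREATE TABLE TablePqrs (ID INTEGER, item TEXT, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT,
                            ColValue INTEGER);
    INSERT INTO TableAbcd VALUES (1, 'item1', 'A1', 'B1', 'C1', 'D1'),
                                 (2, 'item2', 'A2', 'B2', 'C2', 'D2');   -- item2 is made up
    INSERT INTO TablePqrs VALUES (1, 'item1', 'A1', 'B1', NULL, NULL, 10000),
                                 (2, 'item1', 'A1', 'B1', 'C1', 'D1', 100),
                                 (3, 'item2', 'A2', NULL, NULL, NULL, 7),
                                 (4, 'item2', 'A2', 'B2', NULL, NULL, 70);
""")

# Number the TablePqrs rows of each item from best to worst match, keep rank 1.
rows = conn.execute("""
    SELECT item, ColA, ColB, ColC, ColD, ColValue
    FROM (
        SELECT p.item, p.ColA, p.ColB, p.ColC, p.ColD, p.ColValue,
               ROW_NUMBER() OVER (
                   PARTITION BY p.item
                   ORDER BY (CASE WHEN p.ColA = a.ColA THEN 1 ELSE 0 END)
                          + (CASE WHEN p.ColB = a.ColB THEN 1 ELSE 0 END)
                          + (CASE WHEN p.ColC = a.ColC THEN 1 ELSE 0 END)
                          + (CASE WHEN p.ColD = a.ColD THEN 1 ELSE 0 END) DESC,
                            p.ID DESC            -- tie-breaker
               ) AS rn
        FROM TablePqrs p
        JOIN TableAbcd a ON a.item = p.item
    ) AS ranked
    WHERE rn = 1
    ORDER BY item
""").fetchall()

for row in rows:
    print(row)
```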
I tried the approach below and it worked. COALESCE lets me prioritise which value to pick, based on the order in which I list the arguments.
Select item, ColA, ColB, ColC, ColD, ColValue
FROM TableItem a
LEFT OUTER JOIN (
SELECT item,
COALESCE(c1.ColValue,c2.ColValue,c3.ColValue) ColValue
FROM abc b
LEFT OUTER JOIN pqr c1
ON b.ColA = c1.ColA AND b.ColB = c1.ColB AND b.ColC = c1.ColC AND b.ColD = c1.ColD
LEFT OUTER JOIN pqr c2
ON b.ColA = c2.ColA AND b.ColB = c2.ColB AND b.ColC = c2.ColC
LEFT OUTER JOIN pqr c3
ON b.ColA = c3.ColA AND b.ColB = c3.ColB
GROUP BY item
) as Fact
ON Fact.item = a.item
I'm having an issue: my FarmerGroups table has multiple records per BSI_CODE, and because of the inner join I am getting doubled results for GallonsIssued. Is there a way to get the unique value of GallonsIssued, or to get results per individual BSI_CODE?
With Summary as (
Select B_NAME as Branch, LOC as Location
,SUM(payment) as Gallons
,SUM(case when printed = 1 THEN Fee ELSE NULL END) as FeeCollected
,SUM(case when printed = 0 THEN Fee ELSE NULL END) as FeeNotCollected
,SUM(case when printed = 1 THEN Payment ELSE NULL END) as GallonsIssued
,SUM(case when printed = 0 THEN Payment ELSE NULL END) as GallonsNotIssued
From SicbWeeklyDeliveriesFuel F Inner Join FarmerGroups G ON G.BSI_CODE = F.BSI_CODE AND G.CROP_SEASON = F.CROP_SEASON AND F.B_NAME = G.BRANCH
Where F.CROP_SEASON = @cropseason
Group By B_NAME, LOC
)
SELECT Branch
,Location
,Gallons
,GallonsIssued
,GallonsNotIssued
,FeeCollected
,FeeNotCollected
,((GallonsIssued/Gallons) * 100) as pct_GallonsCollected
FROM Summary
Order by Location, Branch
For SicbWeeklyDeliveriesFuel
BSI_CODE | Payment | LOC | CROP_SEASON | Fee | B_NAME | FNAME
66 | 125 | CZ | 5 | 12.5 | DOUGLAS | John K
55 | 147 | OW | 5 | 14.7 | CALEDONIA | Tim H
66 | 95 | CZ | 5 | 9.5 | DOUGLAS | John K
For Farmer Groups
BSI_CODE | Farmer | CROP_SEASON | BRANCH | TEST_GROUP
66 | John K | 5 | DOUGLAS | 1A
55 | Tim H | 5 | CALEDONIA | 1B
66 | John K | 5 | DOUGLAS | 2A
Your selection for the JOIN of G.BSI_CODE = F.BSI_CODE AND G.CROP_SEASON = F.CROP_SEASON AND F.B_NAME = G.BRANCH does not uniquely define the rows.
You will need to also include F.FNAME = G.Farmer, otherwise the first row of SicbWeeklyDeliveriesFuel (BSI_CODE = 66, CROP_SEASON = 5 and B_NAME = DOUGLAS) matches both the first and last rows of FarmerGroups. Likewise the third row also matches the same two rows in FarmerGroups.
The reason for the duplication is the field TEST_GROUP in FarmerGroups Table.
But you don't need this field in the Join.
First, use a CTE to get the information you need for the join without duplicates.
Then join your original query to the new CTE.
Try this:
WITH FarmersGroup AS
(
SELECT DISTINCT
BSI_CODE
, CROP_SEASON
, BRANCH
FROM FarmerGroups
)
, Summary AS
(
SELECT
Branch = B_NAME
, Location = LOC
, Gallons = SUM(payment)
, FeeCollected = SUM(case when printed = 1 THEN Fee ELSE NULL END)
, FeeNotCollected = SUM(case when printed = 0 THEN Fee ELSE NULL END)
, GallonsIssued = SUM(case when printed = 1 THEN Payment ELSE NULL END)
, GallonsNotIssued = SUM(case when printed = 0 THEN Payment ELSE NULL END)
FROM SicbWeeklyDeliveriesFuel F
JOIN FarmersGroup G ON G.BSI_CODE = F.BSI_CODE
AND G.CROP_SEASON = F.CROP_SEASON
AND G.BRANCH = F.B_NAME
WHERE F.CROP_SEASON = @cropseason
GROUP BY
B_NAME, LOC
)
SELECT
Branch
, Location
, Gallons
, GallonsIssued
, GallonsNotIssued
, FeeCollected
, FeeNotCollected
, pct_GallonsCollected = ((GallonsIssued/Gallons) * 100)
FROM Summary
ORDER BY
Location
, Branch
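The effect of the duplicate FarmerGroups rows can be reproduced with the sample data from the question. The sketch below uses Python's sqlite3 module as a stand-in for the real database (an assumption) and leaves out the printed column, which is not part of the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SicbWeeklyDeliveriesFuel (BSI_CODE INTEGER, Payment REAL, LOC TEXT,
                                           CROP_SEASON INTEGER, Fee REAL, B_NAME TEXT, FNAME TEXT);
    CREATE TABLE FarmerGroups (BSI_CODE INTEGER, Farmer TEXT, CROP_SEASON INTEGER,
                               BRANCH TEXT, TEST_GROUP TEXT);
    INSERT INTO SicbWeeklyDeliveriesFuel VALUES
        (66, 125, 'CZ', 5, 12.5, 'DOUGLAS', 'John K'),
        (55, 147, 'OW', 5, 14.7, 'CALEDONIA', 'Tim H'),
        (66, 95, 'CZ', 5, 9.5, 'DOUGLAS', 'John K');
    INSERT INTO FarmerGroups VALUES
        (66, 'John K', 5, 'DOUGLAS', '1A'),
        (55, 'Tim H', 5, 'CALEDONIA', '1B'),
        (66, 'John K', 5, 'DOUGLAS', '2A');
""")

JOIN_COND = "G.BSI_CODE = F.BSI_CODE AND G.CROP_SEASON = F.CROP_SEASON AND G.BRANCH = F.B_NAME"

# Joining the raw table: each DOUGLAS delivery matches two group rows (1A and 2A).
naive = conn.execute(f"""
    SELECT F.B_NAME, F.LOC, SUM(F.Payment)
    FROM SicbWeeklyDeliveriesFuel F
    JOIN FarmerGroups G ON {JOIN_COND}
    GROUP BY F.B_NAME, F.LOC
    ORDER BY F.B_NAME
""").fetchall()

# Joining the de-duplicated key set: each delivery matches exactly one row.
deduped = conn.execute(f"""
    SELECT F.B_NAME, F.LOC, SUM(F.Payment)
    FROM SicbWeeklyDeliveriesFuel F
    JOIN (SELECT DISTINCT BSI_CODE, CROP_SEASON, BRANCH FROM FarmerGroups) G ON {JOIN_COND}
    GROUP BY F.B_NAME, F.LOC
    ORDER BY F.B_NAME
""").fetchall()

print(naive)    # DOUGLAS total is doubled
print(deduped)  # DOUGLAS total is 125 + 95
```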
You can use Andy's code above and it should do the job or you can just replace the table join in your current query
Change the following
Inner Join FarmerGroups G ON G.BSI_CODE = F.BSI_CODE AND G.CROP_SEASON = F.CROP_SEASON AND F.B_NAME = G.BRANCH
to
Inner join (SELECT DISTINCT
BSI_CODE
, CROP_SEASON
, BRANCH
FROM FarmerGroups ) G
ON G.BSI_CODE = F.BSI_CODE AND G.CROP_SEASON = F.CROP_SEASON AND F.B_NAME = G.BRANCH
I am getting abnormal values when I run this part of my SQL code. Is everything okay with it syntax-wise?
select
COUNT(CASE WHEN bt.idBillingStatus = 2
THEN 1
ELSE NULL END) AS successfulbillinghits,
SUM(CASE WHEN bt.idBillingStatus = 2
THEN price
ELSE 0.0 END)
AS old_revenue
from table
The overall query is below. The result of successfulbillinghits should be equal to timesbilled.
SELECT
cs.idCustomerSubscription,
cs.msisdn,
pro.name AS promoterName,
c.name AS ClubName,
c.idClub AS ClubID,
o.name AS operatorName,
o.idOperator AS OperatorID,
co.name AS country,
-- cu.customerSince AS CustomerSince,
cs.subscribeddate AS subscribeddate,
-- cs.subscriptionNotificationSent AS SubNotificationSent,
-- cs.eventId AS EventId,
cs.unsubscribeddate AS unsubscribeddate,
cs.firstBillingDate AS FirstBillingDate,
cs.lastBilledDate As LastBilledDate,
cs.lastAttemptDate AS LastAttemptDate,
-- smp.code AS packageName,
-- o.mfactor AS mmfactor,
-- cs.idSubscriptionSource AS SubscriptionChannel,
-- cs.idUnsubscribeSource AS UnsubscriptionChannel,
-- DATE(bt.creationDate) AS BillingCreationDate,
-- bt.price AS pricePerBilling,
-- cs.lastRetryDate As LastRetryDate,
-- cs.lastRenewalDate AS LastRenewalDate,
-- cs.isActive AS ActiveStatus,
-- COUNT(bt.idBillingTransaction) AS BillingAttempts,
curr.idcurreny_symbol AS CurrencyID,
curr.symbol AS currency,
date(bt.creationDate) AS BillingDate,
cs.lastBilledAmount As LastBilledAmount,
cs.timesbilled,
price,
-- sum(price),
-- revenueShareAmountLocal,
-- o.mfactor,
-- count(IFF (bt.idBillingStatus = 2,1,0)) as otherversion,
count(CASE WHEN bt.idBillingStatus = 2
THEN 1
ELSE 0 END) AS successfulbillinghits,
SUM(CASE WHEN bt.idBillingStatus = 2
THEN price
ELSE 0.0 END)
AS old_revenue
FROM
customersubscription cs
LEFT JOIN
billing_transaction bt
ON CONVERT(cs.msisdn USING latin1) = bt.msisdn
AND cs.idClub = bt.idClub
AND bt.creationDate BETWEEN cs.SubscribedDate AND COALESCE(cs.UnsubscribedDate, now())
INNER JOIN customer cu ON (cs.idCustomer = cu.idcustomer)
INNER JOIN operator o ON (o.idoperator = cu.idoperator)
INNER JOIN country co ON (co.`idCountry` = o.idCountry)
INNER JOIN curreny_symbol curr ON (curr.idcurreny_symbol = co.idCurrencySymbol)
LEFT JOIN Promoter pro ON cs.idPromoter = pro.id
INNER JOIN club_operator_relationships cor ON cor.clubId = cs.idClub
INNER JOIN club c ON c.idClub = cs.idClub
-- INNER JOIN operator op ON op.idOperator = cu.idOperator
WHERE
-- (cs.timesbilled > 0 and cs.subscribeddate < '2016-09-01 00:00:00' )
cs.subscribeddate between '2017-04-20 00:00:00' and '2017-04-21 00:00:00'
AND cs.idClub IN (39)
GROUP BY idCustomerSubscription, ClubName, operatorName, promoterName
successfulbillinghits is much greater than timesbilled in the result.
Use SUM instead of COUNT. COUNT counts every non-NULL value, and the ELSE 0 branch returns 0 rather than NULL, so every joined row is counted:
select
SUM(CASE WHEN bt.idBillingStatus = 2
THEN 1
ELSE 0 END) AS successfulbillinghits,
SUM(CASE WHEN bt.idBillingStatus = 2
THEN price
ELSE 0.0 END)
AS old_revenue
from table
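The difference between the three variants is easy to see on a few rows. This is a small sketch with Python's sqlite3 module; the four billing rows are made up for the illustration, and only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billing_transaction (idBillingStatus INTEGER, price REAL)")
# Made-up rows: two successful hits (status 2) and two other attempts.
conn.executemany(
    "INSERT INTO billing_transaction VALUES (?, ?)",
    [(2, 1.5), (2, 1.5), (3, 1.5), (1, 1.5)],
)

count_else_zero, count_else_null, sum_else_zero = conn.execute("""
    SELECT COUNT(CASE WHEN idBillingStatus = 2 THEN 1 ELSE 0 END),  -- counts the 0s too
           COUNT(CASE WHEN idBillingStatus = 2 THEN 1 END),         -- NULLs are skipped
           SUM(CASE WHEN idBillingStatus = 2 THEN 1 ELSE 0 END)     -- adds up 1s and 0s
    FROM billing_transaction
""").fetchone()

print(count_else_zero, count_else_null, sum_else_zero)
```

So COUNT with ELSE NULL (or no ELSE at all) and SUM with ELSE 0 agree; only COUNT with ELSE 0 counts every row.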
Instead of using CASE, you can use WHERE clause with these aggregate functions, e.g.:
SELECT COUNT(*) as `successfulbillinghits`, SUM(price) as `old_revenue`
FROM table bt
WHERE bt.idBillingStatus = 2;
I have 5 tables
1. SCHOOL[id(bigInt, primary), name(varchar)]
2. SELECTED_INDICATOR[id(bigInt, primary), school_id(bigint)]
3. TEACHER[id(bigint, primary), indicator_id(bigInt), attendance_id(int)]
4. STUDENT[id(bigint, primary), indicator_id(bigInt), attendance_id(int)]
5. MIDDAY_MEAL[id(bigint,primary), indicator_id(bigint), served(boolean), consumed_number(int)]
in TEACHER table, attendance_id can have value: 1 or 2 or 3.
Similarly, in STUDENT table, attendance_id can have value: 1 or 2.
I have to generate a report based on the SELECTED_INDICATOR id, in a format as:
School_id | School_Name | Total_Teacher | Teacher_1 | Teacher_2 | Teacher_3 | Total_Student | Student_1 | Student_2 | served | consumed_number
for this I have tried as:
select A.id, A.school_id, SC.name,
SUM(CASE WHEN T.attendance_id IN (1,2,3) THEN 1 ELSE 0 END) as TOTAL_TEACHER,
SUM(CASE WHEN T.attendance_id IN (1) THEN 1 ELSE 0 END) as TEACHERS_1,
SUM(CASE WHEN T.attendance_id IN (2) THEN 1 ELSE 0 END) as TEACHERS_2,
SUM(CASE WHEN T.attendance_id IN (3) THEN 1 ELSE 0 END) as TEACHERS_3,
SUM(CASE WHEN S.attendance_id IN (1,2) THEN 1 ELSE 0 END) as TOTAL_STUDENT,
SUM(CASE WHEN S.attendance_id IN (1) THEN 1 ELSE 0 END) as STUDENTS_1,
SUM(CASE WHEN S.attendance_id IN (2) THEN 1 ELSE 0 END) as STUDENTS_2,
M.served, M.consumed_number
from SELECTED_INDICATOR A
join SCHOOL SC on A.school_id = SC.id
join TEACHER T on A.id = T.indicator_id
join STUDENT S on A.id = S.indicator_id
join MIDDAY_MEAL M on A.id = M.indicator_id
WHERE A.STATUS = 'COMPLETED' group by A.id;
When I join TEACHER or STUDENT with SELECTED_INDICATOR one at a time, it gives me the correct data. But when I join both the TEACHER and STUDENT with SELECTED_INDICATOR as in the above query, I get huge numbers for teacher and student related fields.
What is wrong with my query? Please help me correct it, or suggest an alternative query.
Try COUNT(DISTINCT ...), which counts each distinct value only once. The problem is that joining both TEACHER and STUDENT multiplies the rows.
select A.id, A.school_id, SC.name,
COUNT(DISTINCT CASE WHEN T.attendance_id IN (1,2,3) THEN T.id END) as TOTAL_TEACHER,
COUNT(DISTINCT CASE WHEN T.attendance_id IN (1) THEN T.id END) as TEACHERS_1,
....
FROM ....
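Here is a small sketch of why the numbers blow up and how COUNT(DISTINCT ...) corrects them. It uses Python's sqlite3 module and a cut-down schema with invented rows (one indicator, three teachers, two students), so treat it as an illustration rather than the full report query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SELECTED_INDICATOR (id INTEGER PRIMARY KEY, school_id INTEGER);
    CREATE TABLE TEACHER (id INTEGER PRIMARY KEY, indicator_id INTEGER, attendance_id INTEGER);
    CREATE TABLE STUDENT (id INTEGER PRIMARY KEY, indicator_id INTEGER, attendance_id INTEGER);
    INSERT INTO SELECTED_INDICATOR VALUES (1, 10);
    INSERT INTO TEACHER VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2);
    INSERT INTO STUDENT VALUES (1, 1, 1), (2, 1, 2);
""")

# Joining both child tables yields 3 x 2 = 6 rows for the indicator,
# so a plain SUM counts every teacher once per student.
inflated, total_teacher, teachers_1, total_student = conn.execute("""
    SELECT SUM(CASE WHEN T.attendance_id IN (1,2,3) THEN 1 ELSE 0 END),
           COUNT(DISTINCT CASE WHEN T.attendance_id IN (1,2,3) THEN T.id END),
           COUNT(DISTINCT CASE WHEN T.attendance_id = 1 THEN T.id END),
           COUNT(DISTINCT CASE WHEN S.attendance_id IN (1,2) THEN S.id END)
    FROM SELECTED_INDICATOR A
    JOIN TEACHER T ON A.id = T.indicator_id
    JOIN STUDENT S ON A.id = S.indicator_id
    GROUP BY A.id
""").fetchone()

print(inflated, total_teacher, teachers_1, total_student)
```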
SELECT A.id, A.school_id, SC.name,
(SELECT COUNT(T.attendance_id) FROM TEACHER T GROUP BY T.indicator_id HAVING A.id = T.indicator_id) AS TOTAL_TEACHER,
(SELECT COUNT(T.attendance_id) FROM TEACHER T WHERE T.attendance_id = 1 GROUP BY T.indicator_id HAVING A.id = T.indicator_id) AS TEACHERS_1,
(SELECT COUNT(T.attendance_id) FROM TEACHER T WHERE T.attendance_id = 2 GROUP BY T.indicator_id HAVING A.id = T.indicator_id) AS TEACHERS_2,
(SELECT COUNT(T.attendance_id) FROM TEACHER T WHERE T.attendance_id = 3 GROUP BY T.indicator_id HAVING A.id = T.indicator_id) AS TEACHERS_3,
(SELECT COUNT(S.attendance_id) FROM STUDENT S GROUP BY S.indicator_id HAVING A.id = S.indicator_id) AS TOTAL_STUDENT,
(SELECT COUNT(S.attendance_id) FROM STUDENT S WHERE S.attendance_id = 1 GROUP BY S.indicator_id HAVING A.id = S.indicator_id) AS STUDENTS_1,
(SELECT COUNT(S.attendance_id) FROM STUDENT S WHERE S.attendance_id = 2 GROUP BY S.indicator_id HAVING A.id = S.indicator_id) AS STUDENTS_2,
M.served, M.consumed_number
FROM SELECTED_INDICATOR A
JOIN SCHOOL SC on A.school_id = SC.id
JOIN MIDDAY_MEAL M on A.id = M.indicator_id
I tried adding an index on datetime, but the result is still the same.
SELECT s.id, s.player,
COUNT(case when dg.winner = 1 AND dp.colour <= 5 then 1 when dg.winner = 2 AND dp.colour > 5 then 1 else null end) as totalwin,
COUNT(case when dg.winner = 2 AND dp.colour <= 5 then 1 when dg.winner = 1 AND dp.colour > 5 then 1 else null end) as totallose,
COUNT(dg.winner) as totalgames
FROM dotaplayers AS dp
LEFT JOIN gameplayers AS gp ON gp.gameid = dp.gameid and dp.colour = gp.colour
LEFT JOIN stats AS s ON s.player_lower = gp.name
LEFT JOIN dotagames AS dg ON dg.gameid = dp.gameid
LEFT JOIN games AS g ON g.id = dp.gameid
LEFT JOIN bans as b ON b.name=gp.name
WHERE MONTH(g.datetime) = 4
GROUP by gp.name
ORDER BY totalwin DESC LIMIT 0,10
Showing rows 0 - 9 (10 total, Query took 7.7552 seconds.)
I want to rank the players with the most wins in the 4th month (April), showing id, username, totalwins, totallose, totaldraw and totalgames. The CASE expressions in my query are how I compute that. The result is correct, but slow.
Assuming g.datetime is indexed, try this instead:
WHERE g.`datetime` BETWEEN 20150401000000 AND 20150430235959
Using the MONTH function, or any other function, on the field data in the WHERE eliminates the benefits of any indexes you might have on those fields; this results in the query requiring a full scan of the values in the table.
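The same effect can be observed in any engine that exposes its query plan. The sketch below uses Python's sqlite3 module, not MySQL, so the plan text and the date function (strftime stands in for MONTH) are assumptions specific to SQLite; on MySQL you would run EXPLAIN on your own query instead. The helper name plan_detail is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, datetime TEXT);
    CREATE INDEX idx_games_datetime ON games (datetime);
""")

def plan_detail(sql):
    """Return the human-readable detail of the first EXPLAIN QUERY PLAN row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Function applied to the column: the index cannot be used to narrow the range.
func_plan = plan_detail(
    "SELECT id FROM games WHERE strftime('%m', datetime) = '04'"
)

# Bare column compared against a range: the index can be searched directly.
range_plan = plan_detail(
    "SELECT id FROM games "
    "WHERE datetime BETWEEN '2015-04-01 00:00:00' AND '2015-04-30 23:59:59'"
)

print(func_plan)   # a SCAN over all rows
print(range_plan)  # a SEARCH using idx_games_datetime
```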
Rearranging the order of JOINs will probably help as well:
SELECT s.id, s.player
, SUM(case
when dg.winner = 1 AND dp.colour <= 5 then 1
when dg.winner = 2 AND dp.colour > 5 then 1
else 0
end
) as totalwin
, SUM(case
when dg.winner = 2 AND dp.colour <= 5 then 1
when dg.winner = 1 AND dp.colour > 5 then 1
else 0
end
) as totallose
, COUNT(dg.winner) as totalgames -- Not sure of the nature of dg.`winner`; a SUM might be more appropriate here as well.
FROM games AS g
INNER JOIN dotaplayers AS dp ON g.id = dp.gameid
LEFT JOIN gameplayers AS gp ON gp.gameid = dp.gameid and dp.colour = gp.colour
LEFT JOIN stats AS s ON s.player_lower = gp.name
LEFT JOIN dotagames AS dg ON dg.gameid = dp.gameid
LEFT JOIN bans as b ON b.name=gp.name
WHERE g.`datetime` BETWEEN 20150401000000 AND 20150430235959
GROUP by gp.name
ORDER BY totalwin DESC
LIMIT 0,10
;
Another thing to note: Depending on the relationship between tables, some of the intermediate joins may result in effectively multiplying the resulting totals; this can be resolved by doing the sums in subqueries and joining those instead.
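A minimal sketch of that last point, using Python's sqlite3 module and two hypothetical child tables (kills and assists are invented names, not tables from the question): joining both children directly multiplies the rows, while summing each child in its own subquery first keeps the totals correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY);
    CREATE TABLE kills (gameid INTEGER, n INTEGER);     -- hypothetical child table
    CREATE TABLE assists (gameid INTEGER, n INTEGER);   -- hypothetical child table
    INSERT INTO games VALUES (1);
    INSERT INTO kills VALUES (1, 3), (1, 2);
    INSERT INTO assists VALUES (1, 1), (1, 1), (1, 1);
""")

# Direct joins: 2 kill rows x 3 assist rows = 6 rows, so both sums are inflated.
naive = conn.execute("""
    SELECT g.id, SUM(k.n), SUM(a.n)
    FROM games g
    JOIN kills k ON k.gameid = g.id
    JOIN assists a ON a.gameid = g.id
    GROUP BY g.id
""").fetchall()

# Pre-aggregated subqueries: one row per game from each child, no multiplication.
pre_aggregated = conn.execute("""
    SELECT g.id, k.total, a.total
    FROM games g
    JOIN (SELECT gameid, SUM(n) AS total FROM kills GROUP BY gameid) k ON k.gameid = g.id
    JOIN (SELECT gameid, SUM(n) AS total FROM assists GROUP BY gameid) a ON a.gameid = g.id
""").fetchall()

print(naive)
print(pre_aggregated)
```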
I have 2 tables, odds and matches:
matches: has match_id and match_date
odds: has id, timestamp, result, odd_value, user_id, match_id
I have a query that gets the following information from those tables for each user:
winnings: the winning bets for each user (when odds.result = 1).
loses: the lost bets for each user (when odds.result != 1).
points: the points of each user (the sum of odds.odd_value).
bonus: for every 5 consecutive winnings I want to add an extra bonus to this variable (for each user).
How do I calculate the bonus?
I tried the query below and ran into a problem (you can check it here: SQL Fiddle).
The calculated bonuses are not right for all the users:
first user: (winnings: 13, bonus = 2).
second user: (winnings: 8, bonus = 2); bonus here should be 1.
third user: (winnings: 14, bonus = 3); bonus here should be 2.
Why does the query not calculate the bonus correctly?
select d.user_id,
sum(case when d.result = 1 then 1 else 0 end) as winnings,
sum(case when d.result = 2 then 1 else 0 end) as loses,
sum(case when d.result = 1 then d.odd_value else 0 end) as points,
f.bonus
FROM odds d
INNER JOIN
(
SELECT
user_id,SUM(CASE WHEN F1=5 THEN 1 ELSE 0 END) AS bonus
FROM
(
SELECT
user_id,
CASE WHEN result=1 and @counter<5 THEN @counter:=@counter+1 WHEN result=1 and @counter=5 THEN @counter:=1 ELSE @counter:=0 END AS F1
FROM odds o
cross join (SELECT @counter:=0) AS t
INNER JOIN matches mc on mc.match_id = o.match_id
WHERE MONTH(STR_TO_DATE(mc.match_date, '%Y-%m-%d')) = 2 AND
YEAR(STR_TO_DATE(mc.match_date, '%Y-%m-%d')) = 2015 AND
(YEAR(o.timestamp)=2015 AND MONTH(o.timestamp) = 02)
) Temp
group by user_id
)as f on f.user_id = d.user_id
group by d.user_id
I am not sure how your result relates to the matches table; you can add the WHERE / INNER JOIN clause back if you need it.
Here is a link to the fiddle, with the last iteration according to your comments.
And here is the query:
SET @user:=0;
select d.user_id,
sum(case when d.result = 1 then 1 else 0 end) as winnings,
sum(case when d.result = 2 then 1 else 0 end) as loses,
sum(case when d.result = 1 then d.odd_value else 0 end) as points,
f.bonus
FROM odds d
INNER JOIN
(
SELECT
user_id,SUM(bonus) AS bonus
FROM
(
SELECT
user_id,
CASE WHEN result=1 and @counter<5 AND @user=user_id THEN @counter:=@counter+1
WHEN result=1 and @counter=5 AND @user=user_id THEN @counter:=1
WHEN result=1 and @user<>user_id THEN @counter:=1
ELSE
@counter:=0
END AS F1,
@user:=user_id,
CASE WHEN @counter=5 THEN 1 ELSE 0 END AS bonus
FROM odds o
ORDER BY user_id , match_id
) Temp
group by user_id
)as f on f.user_id = d.user_id
group by d.user_id
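The counter logic is easier to follow outside SQL. Below is a plain Python sketch of the same rule as I read it from the query: the streak restarts for every user, resets on a loss, and earns one bonus each time it reaches 5. The function name bonuses and the sample rows are made up for the illustration.

```python
from itertools import groupby

def bonuses(rows):
    """rows: iterable of (user_id, match_id, result), where result 1 means a win.

    Returns {user_id: bonus}, with one bonus per 5 consecutive wins.
    """
    result_by_user = {}
    ordered = sorted(rows)  # by user_id, then match_id, like ORDER BY user_id, match_id
    for user_id, group in groupby(ordered, key=lambda r: r[0]):
        streak = 0   # restarts for every user, so streaks never leak across users
        bonus = 0
        for _, _, result in group:
            streak = streak + 1 if result == 1 else 0
            if streak == 5:
                bonus += 1
                streak = 0   # the next win starts a new streak at 1
        result_by_user[user_id] = bonus
    return result_by_user

# Made-up data: user 1 wins 11 in a row, user 2 has a loss after 4 wins,
# user 3 wins only 4 times.
sample = (
    [(1, m, 1) for m in range(1, 12)]
    + [(2, m, r) for m, r in enumerate([1, 1, 1, 1, 2, 1, 1, 1, 1, 1], start=1)]
    + [(3, m, 1) for m in range(1, 5)]
)
totals = bonuses(sample)
print(totals)
```

The original query went wrong in exactly the spot the streak = 0 per user line covers: without a per-user reset, a streak started by one user carries over into the next user's rows.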