Two tables, table_a and table_b, have to be compared using the employee_id column, which is present in both of them.
Both tables have MILLIONS of rows.
Three results must be displayed:
employee_id present in table_a but not present in table_b.
employee_id present in table_b but not present in table_a.
A particular employee_id may be present in both tables while the data in the other columns for that employee_id differs between the tables. These rows must also be displayed, showing the columns where there is a data mismatch.
Since there are millions of rows in both tables, the process must be fast so that both tables can be compared quickly.
I am using MySQL server to write the query.
This is rather tricky, but here is an example assuming that employee_id is unique in each table:
select employee_id,
       (case when max(which) = 'a' then 'A-only'
             when min(which) = 'b' then 'B-only'
             else 'both'
        end) as which,
       concat_ws(',',
                 (case when count(*) = 2 and not (min(col1) <=> max(col1)) then 'col1' end),
                 (case when count(*) = 2 and not (min(col2) <=> max(col2)) then 'col2' end)
                ) as differences
from ((select 'a' as which, employee_id, col1, col2
       from table_a
      ) union all
      (select 'b' as which, employee_id, col1, col2
       from table_b
      )
     ) ab
group by employee_id;
Note that this uses the NULL-safe comparison operator, <=>. Because min() and max() ignore NULLs, though, a column that is NULL in one table and non-NULL in the other is not reported as a difference; checking count(col1) = 1 as well would catch that case.
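For comparison, the three result sets can also be produced with plain joins instead of union all plus group by. This is only a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, the sample rows and the col1/col2 columns are made up, and SQLite's IS NOT stands in for MySQL's NOT (a <=> b). Whether it beats the grouped query on millions of rows would have to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (employee_id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
CREATE TABLE table_b (employee_id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
-- hypothetical sample rows
INSERT INTO table_a VALUES (1, 'x', 'y'), (2, 'x', 'y'), (3, 'x', NULL);
INSERT INTO table_b VALUES (2, 'x', 'z'), (3, 'x', 'y'), (4, 'x', 'y');
""")

# Result 1: employee_id present in table_a but not in table_b (anti-join).
a_only = [r[0] for r in conn.execute("""
    SELECT a.employee_id
    FROM table_a a
    LEFT JOIN table_b b ON b.employee_id = a.employee_id
    WHERE b.employee_id IS NULL
    ORDER BY a.employee_id
""")]

# Result 2: the reverse direction.
b_only = [r[0] for r in conn.execute("""
    SELECT b.employee_id
    FROM table_b b
    LEFT JOIN table_a a ON a.employee_id = b.employee_id
    WHERE a.employee_id IS NULL
    ORDER BY b.employee_id
""")]

# Result 3: ids present in both tables with at least one differing column.
# IS NOT is SQLite's NULL-safe inequality; it yields 1 or 0, never NULL.
mismatch = conn.execute("""
    SELECT a.employee_id,
           a.col1 IS NOT b.col1 AS col1_differs,
           a.col2 IS NOT b.col2 AS col2_differs
    FROM table_a a
    JOIN table_b b ON b.employee_id = a.employee_id
    WHERE a.col1 IS NOT b.col1 OR a.col2 IS NOT b.col2
    ORDER BY a.employee_id
""").fetchall()

print(a_only, b_only, mismatch)
```

In this sample, employee 3 has col2 NULL in table_a and 'y' in table_b, and the NULL-safe comparison reports it as a mismatch.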
Related
I want to combine the columns of multiple rows into a single record in MySQL
For example, the actual data:
--------------------------------------------------------------
id    name    Loantype    Amount
--------------------------------------------------------------
1     ABC     1           500000
2     ABC     2           3500000
3     XYZ     1           250000
4     XYZ     2           2500000
I tried with the following query
SELECT
id,
(
CASE Loantype
WHEN 1 THEN Amount
ELSE NULL
END
) AS PersonalLoan,
(
CASE Loantype
WHEN 2 THEN Amount
ELSE NULL
END
) AS HomeLoan
FROM
customer
WHERE
name = 'ABC'
but the result comes out as below:
--------------------------------------------------------------
id    name    PersonalLoan    HomeLoan
--------------------------------------------------------------
1     ABC     500000          NULL
2     ABC     NULL            3500000
Expected result set:
--------------------------------------------------------------
id    name    PersonalLoan    HomeLoan
--------------------------------------------------------------
1     ABC     500000          3500000
You can self-join the table so that you can combine the two kinds of loans into a single row:
SELECT t1.id, t1.name, t1.amount AS personal, t2.amount AS home
FROM customer AS t1
LEFT JOIN customer AS t2 ON t1.name = t2.name AND t2.loantype = 2
WHERE t1.loantype = 1
If you are looking for just one user, you can speed up the query by limiting the size of the JOIN:
SELECT t1.id, t1.name, t1.amount AS personal, t2.amount AS home
FROM (SELECT * FROM customer WHERE name = 'ABC' AND loantype = 1) AS t1
LEFT JOIN customer AS t2 ON t1.name = t2.name AND t2.loantype = 2
Note that generally speaking you shouldn't denormalize data in SQL (e.g. converting rows to columns: SQL is row-oriented, not column-oriented) - I assume this is to simplify display logic - just be careful of using queries like this when you want to pass meaningful data to other parts of the database rather than directly to the user.
You need to GROUP BY first.
You can also simplify your CASE expressions:
ELSE NULL is always implicit and if the CASE WHEN expression is an equality comparison then you can use the simpler switch-style syntax.
So CASE WHEN a = b THEN c ELSE NULL END can be simplified to CASE a WHEN b THEN c END.
I've added COALESCE so the query will return 0 for the SUM aggregates if there are no matching rows instead of NULL. Note that COUNT (unlike SUM) doesn't need to be wrapped in a COALESCE: it returns 0, not NULL, when there are no non-NULL values to count.
Note that the database design you posted seems to allow the same customer.name to have multiple loans of the same loantype.
You can avoid this by adding a UNIQUE constraint or a UNIQUE index, or by using a composite primary key:
CREATE UNIQUE INDEX UX_name_loantype ON customer ( name, loantype );
To prevent those rows from causing issues in this query, this uses a SUM and a COUNT to make it clear to readers that the data is an aggregate over multiple rows:
SELECT
name,
COUNT( CASE Loantype WHEN 1 THEN 1 END ) AS CountPersonalLoans,
COALESCE( SUM( CASE Loantype WHEN 1 THEN Amount END ), 0 ) AS SumPersonalLoans,
COUNT( CASE Loantype WHEN 2 THEN 1 END ) AS CountHomeLoans,
COALESCE( SUM( CASE Loantype WHEN 2 THEN Amount END ), 0 ) AS SumHomeLoans
FROM
customer
GROUP BY
name
To maximise query code reuse if you want to filter by name, convert this to a VIEW. If it's a one-off, make it a CTE query (CTEs need MySQL 8.0 or later), like so:
WITH aggs AS (
SELECT
name,
COUNT( CASE Loantype WHEN 1 THEN 1 END ) AS CountPersonalLoans,
COALESCE( SUM( CASE Loantype WHEN 1 THEN Amount END ), 0 ) AS SumPersonalLoans,
COUNT( CASE Loantype WHEN 2 THEN 1 END ) AS CountHomeLoans,
COALESCE( SUM( CASE Loantype WHEN 2 THEN Amount END ), 0 ) AS SumHomeLoans
FROM
customer
GROUP BY
name
)
SELECT
name,
CountPersonalLoans,
SumPersonalLoans,
CountHomeLoans,
SumHomeLoans
FROM
aggs
WHERE
name = 'ABC'
ORDER BY
name
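To see what the grouped query returns, here is a small sketch that runs it on the question's sample rows. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the extra customer PQR (personal loan only) is a hypothetical row added to show the COALESCE and COUNT behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, Loantype INTEGER, Amount INTEGER);
INSERT INTO customer VALUES
    (1, 'ABC', 1, 500000),
    (2, 'ABC', 2, 3500000),
    (3, 'XYZ', 1, 250000),
    (4, 'XYZ', 2, 2500000),
    (5, 'PQR', 1, 100000);  -- hypothetical: no home loan
""")

# One output row per name; each CASE picks out the rows of one loan type.
rows = conn.execute("""
    SELECT
        name,
        COUNT( CASE Loantype WHEN 1 THEN 1 END ) AS CountPersonalLoans,
        COALESCE( SUM( CASE Loantype WHEN 1 THEN Amount END ), 0 ) AS SumPersonalLoans,
        COUNT( CASE Loantype WHEN 2 THEN 1 END ) AS CountHomeLoans,
        COALESCE( SUM( CASE Loantype WHEN 2 THEN Amount END ), 0 ) AS SumHomeLoans
    FROM customer
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```

For PQR both home-loan columns come back as 0: COUNT yields 0 on its own, and SUM yields NULL, which COALESCE turns into 0.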
Hi, I am trying to create a MySQL query that will convert multiple rows in a table to unique columns.
The data I have is as follows:
The table I would like to see is as follows:
GEID|Username|First Name|Last Name|Email|Country|Dealer Code
The statement which could be used is:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Sorry, my SQL isn't the best, but I hope the statement helps.
This is a real pain, because you don't have an id identifying groups that are the same. In other words, you are missing the entity id.
I think you can construct one by counting the number of GEID values before any given row. The rest is just aggregation:
select max(case when fieldname = 'GEID' then fieldData end) as GEID,
max(case when fieldname = 'Username' then fieldData end) as Username,
. . .
from (select t.*,
(select count(*) from t t2 where t2.id <= t.id and t2.fieldName = 'GEID'
) as grp
from t
) t
group by grp;
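Since the question's data did not survive, here is a sketch of the grouping trick on made-up rows. Everything about the table is an assumption: the name t, the id/fieldName/fieldData columns (taken from the query above), the sample values, and the Email field. It runs on SQLite via Python's sqlite3 module, not MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, fieldName TEXT, fieldData TEXT);
-- hypothetical rows: each entity starts with a GEID row
INSERT INTO t VALUES
    (1, 'GEID', 'G001'), (2, 'Username', 'alice'), (3, 'Email', 'alice@example.com'),
    (4, 'GEID', 'G002'), (5, 'Username', 'bob'),   (6, 'Email', 'bob@example.com');
""")

# grp = number of GEID rows at or before this row, so it increases by one
# every time a new entity starts; aggregating by grp then pivots each entity.
rows = conn.execute("""
    SELECT MAX(CASE WHEN fieldName = 'GEID'     THEN fieldData END) AS GEID,
           MAX(CASE WHEN fieldName = 'Username' THEN fieldData END) AS Username,
           MAX(CASE WHEN fieldName = 'Email'    THEN fieldData END) AS Email
    FROM (SELECT t.*,
                 (SELECT COUNT(*) FROM t t2
                  WHERE t2.id <= t.id AND t2.fieldName = 'GEID') AS grp
          FROM t) AS numbered
    GROUP BY grp
    ORDER BY grp
""").fetchall()

for row in rows:
    print(row)
```

This only works if the rows of one entity are contiguous in id order and each entity begins with its GEID row.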
I have two tables, TblA and TblB. TblA has columns A B C ... Z and TblB also has columns A B C D ... Z. I want to get the names of the columns where TblA and TblB data differ for a particular row. Assume column A is the primary key and never changes, i.e. a join can be performed on column A.
Sadly, when comparing a version/default/history table to another, there's no better way to do it in a query than column by column:
select
case when a.B!=b.B then 'B' else null end,
case when a.C!=b.C then 'C' else null end,
....(repeat for each column)
from tbla a
left join tblb b
on a.A=b.A
Keep in mind that if columns can contain null, null=anything is null (not true or false), so you might need to wrap each column in ifnull() to compare
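A quick sketch of that NULL behaviour, run on SQLite through Python's sqlite3 module rather than on MySQL. The sentinel -1 is an assumption and must be a value the column can never hold; SQLite's IS NOT plays the role that NOT (a <=> b) would play in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare a NULL on one side with 5 on the other in three ways.
plain, wrapped, null_safe = conn.execute("""
    SELECT NULL <> 5,                         -- plain comparison: NULL, so the CASE never fires
           IFNULL(NULL, -1) <> IFNULL(5, -1), -- both sides wrapped in ifnull()
           NULL IS NOT 5                      -- NULL-safe comparison
""").fetchone()

print(plain, wrapped, null_safe)
```

The plain comparison comes back as None in Python (SQL NULL), while the wrapped and NULL-safe forms both report a difference.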
EDIT: I have not handled nulls here. You can use a function for your database to handle those.
This is sample query for 3 columns. You can extend it for other columns.
with s1 (A ,B ,C) as
(select 1,22,23 from dual union
select 2,45,47 from dual union
select 3,66, 68 from dual
),
t1 (A ,B ,C) as
(select 1,23,24 from dual union
select 2,45,47 from dual union
select 3,66, 69 from dual
),
chng as(
select s1.*, case when s1.B = t1.B then '' else 'B' end as B1 , case when s1.C = t1.C then '' else 'C' end as C1
from s1 ,t1
where s1.A = t1.A
)
,chkChange as(
select a, (B1||C1) as Changes from chng
)
select * from chkChange
where changes is not null
Given this table with 2 million+ entries, where ID is auto-incrementing, and the indexes index1(MainId, SubID, Column1) and index2(MainId, SubID, Column2):
ID MainID SubID Column1 Column2
--------------------------------------
1 1 A 1A_data_1
2 1 A 1A_data_2
3 2 B 2B_data_1
4 2 B 2B_data_2
5 1 A ignore_me
6 1 A 1A_data_3
I can get the row ID that contains the desired column value using indexes with:
Select max(ID)
From table where column1 is not null and column1 <>'ignore_me'
Group By MainID,SubID;

Select max(id)
From table where column2 is not null and column2 <>'ignore_me'
Group By MainID,SubID;
But what I can't do is find an efficient way to join these against a MainID,SubID group by to get these results:
MainID SubID Column1 Column2
--------------------------------
1 A 1A_data_1 1A_data_3
2 B 2B_data_1 2B_data_2
I've tried a lot of different approaches, but nothing that doesn't take forever. Do I need another index? I feel like I'm overlooking something simple, as the group by queries are super fast. Can anyone point me in the right direction?
You can calculate the two IDs in a single query using conditional aggregation:
SELECT
MainID,
SubID,
MAX(CASE WHEN Column1 <> 'ignore_me' THEN ID END) AS ID1,
MAX(CASE WHEN Column2 <> 'ignore_me' THEN ID END) AS ID2
FROM atable
GROUP BY
MainID,
SubID
;
You could also explicitly add AND ColumnN IS NOT NULL to the WHEN conditions, but that's not necessary: NULL values would be ignored anyway.
Now you can simply do two left joins with the above subquery as a derived table:
SELECT
tm.MainID,
tm.SubID,
t1.Column1,
t2.Column2
FROM (
SELECT
MainID,
SubID,
MAX(CASE WHEN Column1 <> 'ignore_me' THEN ID END) AS ID1,
MAX(CASE WHEN Column2 <> 'ignore_me' THEN ID END) AS ID2
FROM atable
GROUP BY
MainID,
SubID
) tm
LEFT JOIN atable t1 ON tm.ID1 = t1.ID
LEFT JOIN atable t2 ON tm.ID2 = t2.ID
;
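Here is a sketch that runs the join query on the question's six sample rows, using SQLite through Python's sqlite3 module in place of MySQL. The question's table lost its column alignment, so which of Column1/Column2 each value sits in is my assumption, chosen so that the result matches the desired output (in particular, 'ignore_me' is placed in Column1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atable (
    ID INTEGER PRIMARY KEY, MainID INTEGER, SubID TEXT,
    Column1 TEXT, Column2 TEXT
);
-- column placement of each value is assumed
INSERT INTO atable VALUES
    (1, 1, 'A', '1A_data_1', NULL),
    (2, 1, 'A', NULL,        '1A_data_2'),
    (3, 2, 'B', '2B_data_1', NULL),
    (4, 2, 'B', NULL,        '2B_data_2'),
    (5, 1, 'A', 'ignore_me', NULL),
    (6, 1, 'A', NULL,        '1A_data_3');
""")

# The derived table finds, per group, the latest usable ID for each column;
# the two left joins then fetch the values stored at those IDs.
rows = conn.execute("""
    SELECT tm.MainID, tm.SubID, t1.Column1, t2.Column2
    FROM (
        SELECT MainID, SubID,
               MAX(CASE WHEN Column1 <> 'ignore_me' THEN ID END) AS ID1,
               MAX(CASE WHEN Column2 <> 'ignore_me' THEN ID END) AS ID2
        FROM atable
        GROUP BY MainID, SubID
    ) tm
    LEFT JOIN atable t1 ON tm.ID1 = t1.ID
    LEFT JOIN atable t2 ON tm.ID2 = t2.ID
    ORDER BY tm.MainID, tm.SubID
""").fetchall()

for row in rows:
    print(row)
```

Rows where a column is NULL drop out of the MAX without an explicit IS NOT NULL test, because the CASE condition evaluates to NULL for them.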
UPDATE (converting to a view, in answer to comments)
So far I can see only one alternative that would be VIEW-friendly:
SELECT
MainID,
SubID,
(
SELECT Column1
FROM atable
WHERE MainID = t.MainID
AND SubID = t.SubID
AND Column1 <> 'ignore_me'
ORDER BY ID DESC
LIMIT 1
) AS Column1,
(
SELECT Column2
FROM atable
WHERE MainID = t.MainID
AND SubID = t.SubID
AND Column2 <> 'ignore_me'
ORDER BY ID DESC
LIMIT 1
) AS Column2
FROM atable t
GROUP BY
MainID,
SubID
;
This query may be slower than the previous one, though: it uses two correlated subqueries, and I'm not sure if queries (or, in particular, views) with correlated subqueries can be efficient in MySQL. Proper indexing might help. In general, you'll probably need to test this for yourself.
Hi guys I've got this table structure
field 1 field 2
---------------------------------------------
1 1
1 2
2 1
Then I want it to be like this when selecting Key Field2 = 1
field 1 field 2
---------------------------------------------
2 1
I don't want to return field1 = 1 because it contains different values field1 IN (1,2)
Thank you so much.
Your post seems unclear because I think you mixed up column names in certain parts of your description. However, judging by your sample output, I'm going to assume you mean the following:
Select rows from the table where field2 contains identical values for the same field1.
If you only need to output field1 and field2, you could do the following:
SELECT field1, MAX(field2) AS field2
FROM atable
GROUP BY field1
HAVING COUNT(DISTINCT field2) = 1
You can omit DISTINCT if your table cannot hold duplicate pairs of (field1, field2).
However, if there are more columns in the table and some or all of them need to be returned too, you could first just get the field1 values like above, then join that row set back to atable to get complete rows, like this:
SELECT t.* /* or specify the necessary columns explicitly */
FROM atable AS t
INNER JOIN (
SELECT field1
FROM atable
GROUP BY field1
HAVING COUNT(DISTINCT field2) = 1
) s ON t.field1 = s.field1
Again, DISTINCT can be omitted, as explained above.
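As a quick check, this sketch runs the GROUP BY / HAVING query on the three rows from the question. It uses Python's sqlite3 module with an in-memory SQLite database, which is an assumption of convenience and not the asker's server; the table name atable follows the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atable (field1 INTEGER, field2 INTEGER);
INSERT INTO atable VALUES (1, 1), (1, 2), (2, 1);
""")

# Keep only the field1 groups that contain exactly one distinct field2 value.
rows = conn.execute("""
    SELECT field1, MAX(field2) AS field2
    FROM atable
    GROUP BY field1
    HAVING COUNT(DISTINCT field2) = 1
""").fetchall()

print(rows)
```

The group field1 = 1 is dropped because it holds two different field2 values, leaving only the row (2, 1).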
Since you are using SQL Server 2008, you could also use windowed aggregating. If your table doesn't contain duplicates of (field1, field2), you could use the following:
;
WITH counted AS (
SELECT
*,
cnt = COUNT(*) OVER (PARTITION BY field1)
FROM atable
)
SELECT
field1,
field2,
…
FROM counted
WHERE cnt = 1
But if the duplicates are allowed, you'll need to use a slightly different approach, because there's no windowing counterpart for COUNT(DISTINCT …). Here's what you could try:
;
WITH minmaxed AS (
SELECT
*,
f2min = MIN(field2) OVER (PARTITION BY field1),
f2max = MAX(field2) OVER (PARTITION BY field1)
FROM atable
)
SELECT
field1,
field2,
…
FROM minmaxed
WHERE f2min = f2max
That is, you are getting the minimum and the maximum value of field2 for every field1 value. Then you are filtering out rows where f2min is not the same as f2max, because that would imply that there are different field2 values in the group.
SELECT * FROM table WHERE field_2 = 1 AND field_1 <> 1