Locate duplicates with a temp table - duplicates

I am trying to locate all valid (non-expired) duplicate IDs using a temp table that the script creates. When I run this script I get 240,000 results, but only about half of them are duplicates (checked after transferring them into Excel). What am I missing to get only duplicate results?
I have looked through previous questions about duplicates and none of the suggestions have worked. HAVING COUNT(E.ID_NUMBER) > 1 seems to be a common way to do it.
My knowledge of temp tables is limited, and I am not sure whether that has something to do with why I am not getting the results I need.
I have tried putting HAVING COUNT(E.ID_NUMBER) > 1 in the second part of the script, but then I only get about 700 results (there should be more than this).
Any help or suggestions would be greatly appreciated.
IF OBJECT_ID('TempDb..#DUPLICATE_ID') IS NOT NULL DROP TABLE #DUPLICATE_ID ;
CREATE TABLE #DUPLICATE_ID
(ID CHAR(9))
INSERT INTO
#DUPLICATE_ID
SELECT
distinct
E.ID_NUMBER
FROM
EXAMPLEDb..ENROLLEES E
WHERE
E.ID_NUMBER IS NOT NULL
GROUP BY
E.ID_NUMBER
HAVING
COUNT(E.ID_NUMBER) > 1
-----------------------------------------------
SELECT
DI.ID, E.LAST_NAME, E.FIRST_NAME, E.ADDRESS1, E.CITY, E.STATE, E.ZIP
FROM
EXAMPLEDb..ENROLLEES E
LEFT JOIN #DUPLICATE_ID DI ON E.ID_NUMBER = DI.ID
WHERE
DI.ID IS NOT NULL
and ISNULL(e.TERMINATION_DATE,'1/1/2020') > '4/27/2016'
group BY
DI.ID, E.LAST_NAME, E.FIRST_NAME, E.ADDRESS1, E.CITY, E.STATE, E.ZIP
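One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: the temp table counts duplicates across all rows, expired ones included, while the second query filters to non-expired rows only. An ID with one active and one expired row is then reported once, which looks like a non-duplicate. Applying the expiry filter before counting avoids that. The example below uses Python's sqlite3 with a tiny made-up table (the sample rows, ISO date strings and COALESCE in place of ISNULL are all assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollees (id_number TEXT, last_name TEXT, termination_date TEXT);
INSERT INTO enrollees VALUES
  ('A1', 'Smith', NULL),          -- active
  ('A1', 'Smyth', NULL),          -- active, duplicate of A1
  ('B2', 'Jones', NULL),          -- active
  ('B2', 'Jones', '2015-01-01'),  -- expired, so B2 has only one active row
  ('C3', 'Brown', NULL);          -- unique
""")

# Same "not expired" rule as the question, written with ISO dates for SQLite.
ACTIVE = "COALESCE(termination_date, '2020-01-01') > '2016-04-27'"

dup_rows = conn.execute(f"""
    SELECT e.id_number, e.last_name
    FROM enrollees AS e
    JOIN (SELECT id_number
          FROM enrollees
          WHERE id_number IS NOT NULL AND {ACTIVE}   -- filter BEFORE counting
          GROUP BY id_number
          HAVING COUNT(*) > 1) AS d
      ON d.id_number = e.id_number
    WHERE {ACTIVE}
    ORDER BY e.id_number, e.last_name
""").fetchall()

print(dup_rows)
```

Here only A1 is returned; B2 drops out because just one of its rows is still active.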

Related

Case statement-How to achieve the target format in MYSQL?

My source data looks something like this:
[image: source data]
Now, for every distinct PK, I want to extract only the rows for code Z99 whose date is closest to that of either a Z10 or a Z39 row, and also calculate the difference between the Z99 date and the Z10/Z39 date.
Expected result: [image: expected result]
Can someone guide me on how to achieve this using a CASE statement or any other better way?
Thanks.
I tried the code below and it worked for me.
Note:
I created intermediate tables to get the results. If you don't have permission to create tables, you can use a temporary table or a CTE instead.
If the difference is the same for z39 and z10, both rows are returned, since you did not specify any priority.
/* tbl refers to your input table */
/* daten refers to date */
create table ads as
select
a.daten,
b.daten as z99_dt,
a.pk, a.code as codea,
b.code as codeb,
datediff(a.daten, b.daten) as diff /* difference in days; assumes daten is a DATE column */
from tbl as a
inner join (select distinct code, pk, daten
from tbl
where code='z99') as b
on a.code<>b.code and a.pk = b.pk
where a.code IN ('z39', 'z10'); /* a column alias such as codea cannot be referenced in WHERE */
The snippet above creates a table named ads with the necessary calculations.
The following snippet finds the minimum diff for each pk:
create table ref as
select pk, min(diff) as min_diff
from ads
group by pk;
The following snippet joins the ref and ads tables to get the final result:
select a.pk, codea as st_code, daten as st_dt, codeb as end_code, z99_dt as end_dt, diff from ads as a
inner join ref as b
on a.pk=b.pk and a.diff=b.min_diff;
Please try this out and let me know if this works!
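A small sketch of the same three steps, run with Python's sqlite3 so it can be tried without creating permanent tables. Two things here are my assumptions and not part of the answer above: the sample rows are invented, and the closest date is picked with MIN(ABS(diff)), because a plain MIN(diff) would prefer the most negative difference over the smallest gap. julianday() is SQLite's way to get a day count; in MySQL, DATEDIFF would play that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (pk INTEGER, code TEXT, daten TEXT);
INSERT INTO tbl VALUES
  (1, 'z10', '2016-01-01'),
  (1, 'z39', '2016-01-20'),
  (1, 'z99', '2016-01-25'),
  (2, 'z10', '2016-03-01'),
  (2, 'z99', '2016-02-20');
""")

closest = conn.execute("""
    WITH ads AS (            -- every z10/z39 row paired with the z99 row of the same pk
        SELECT a.pk,
               a.code  AS st_code,  a.daten AS st_dt,
               b.code  AS end_code, b.daten AS end_dt,
               CAST(julianday(a.daten) - julianday(b.daten) AS INTEGER) AS diff
        FROM tbl AS a
        JOIN tbl AS b ON a.pk = b.pk AND b.code = 'z99'
        WHERE a.code IN ('z10', 'z39')
    )
    SELECT ads.*
    FROM ads
    JOIN (SELECT pk, MIN(ABS(diff)) AS min_diff   -- smallest gap per pk
          FROM ads
          GROUP BY pk) AS ref
      ON ref.pk = ads.pk AND ABS(ads.diff) = ref.min_diff
    ORDER BY ads.pk
""").fetchall()

for row in closest:
    print(row)
```

For pk 1 the z39 row (5 days before the z99 date) wins over the z10 row (24 days before); ties would still return both rows, as noted above.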

Get only one entry for columns that have the same id

I have two tables:
Table 1:
employe(id_em, nom_em)
Table 2:
travailler(id_em, id_depart, date_chnge)
A given id_em can have multiple entries in the travailler table, but I only want the latest entry for each id_em, i.e. the one with the most recent date.
So the result of my query should look something like this:
(id_em, nom_em, id_depart, date_chnge)
but with only one entry for every id_em, the one that has the latest date.
I've tried this, but it returns all of them and I don't know what's wrong:
SELECT employe.nom_em,
travailler.id_em,
travailler.id_depart,
Max(travailler.date_chnge)
FROM employe
INNER JOIN travailler
ON employe.id_em = travailler.id_em
GROUP BY employe.id_em
Please help!
SELECT
e.nom_em,
t.id_em,
t.id_depart,
t.date_chnge
FROM
employe e
INNER JOIN
travailler t
ON e.id_em=t.id_em
WHERE
t.date_chnge = (select max(tr.date_chnge) from travailler tr where tr.id_em = e.id_em)
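To see the correlated subquery at work, here is a minimal sketch using Python's sqlite3 with the table and column names from the question. The sample rows are made up for illustration. The subquery is evaluated per employee, so only the travailler row whose date_chnge equals that employee's maximum survives the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employe (id_em INTEGER PRIMARY KEY, nom_em TEXT);
CREATE TABLE travailler (id_em INTEGER, id_depart INTEGER, date_chnge TEXT);
INSERT INTO employe VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO travailler VALUES
  (1, 10, '2015-01-01'),
  (1, 20, '2016-06-01'),   -- latest entry for employee 1
  (2, 30, '2014-03-15');   -- only entry for employee 2
""")

latest = conn.execute("""
    SELECT e.nom_em, t.id_em, t.id_depart, t.date_chnge
    FROM employe AS e
    INNER JOIN travailler AS t ON e.id_em = t.id_em
    WHERE t.date_chnge = (SELECT MAX(tr.date_chnge)
                          FROM travailler AS tr
                          WHERE tr.id_em = e.id_em)
    ORDER BY t.id_em
""").fetchall()

print(latest)
```

If two rows for the same employee share the same latest date, both are returned; a tie-breaker column would be needed to force a single row.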

Crosstab Query from a Query with a Subquery

Can anyone help me with a problem I am having with a crosstab query that compares current prices from our suppliers?
The select query it is based on has a subquery that selects only the most recent prices for our price comparison, and this works perfectly for the data we need. See below:
qryPriceComp:
SELECT tblPriceComp.SupplyerID, tblPriceComp.ProductID,
tblPriceComp.Effdt, tblPriceComp.CostPrice,
tblProduct.Product, tblSupplier.Supplier
FROM tblSupplier INNER JOIN
(tblProduct INNER JOIN tblPriceComp ON tblProduct.ProductID = tblPriceComp.ProductID)
ON tblSupplier.SupplierID = tblPriceComp.SupplyerID
WHERE (((tblPriceComp.Effdt) In
(SELECT MAX(B.EffDt) AS MaxOfDt FROM tblPriceComp AS B
WHERE tblPriceComp.ProductID=B.ProductID
AND tblPriceComp.SupplyerID=B.SupplyerID
AND B.EffDt <= Date()+1)));
This is then used for the crosstab query
qryPriceComp_Crosstab:
TRANSFORM Sum(qryPriceComp.CostPrice) AS SumOfCostPrice
SELECT qryPriceComp.Product
FROM qryPriceComp
GROUP BY qryPriceComp.Product
ORDER BY qryPriceComp.Product, qryPriceComp.Supplier
PIVOT qryPriceComp.Supplier;
But when run, it gives an error saying that both tblPriceComp.ProductID and tblSupplier.SupplierID are invalid. I have tried adding them as parameters, but then running the query brings up a box asking for the ID numbers, which is no good because we want to see all ProductIDs and SupplyerIDs. If anyone can help it would be greatly appreciated!
Not a real solution, but a usable workaround:
Change qryPriceComp to an INSERT INTO tempTable query, and then base the crosstab query on tempTable.
Before each INSERT run, a DELETE * FROM tempTable must be executed.

MySQL Query gets too complex for me

I'm trying to write a MySQL query that updates a column in table1 with information gathered from two other tables.
Gathering the data from the other two tables works without much trouble. It is slow, but that's because one of the two tables has 4601537 records in it (each row of a report is stored as a separate record, so one report has more than 200 records).
The query that I use to join the two tables together is:
# First Table, containing Report_ID's: RE
# Table that has to be updated: REGI
# Join Table: JT
SELECT JT.report_id as ReportID, REGI.Serienummer as SerialNo FROM Blancco_Registration.TrialTable as REGI
JOIN (SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92) AS JT ON JT.Value_string = REGI.Serienummer
WHERE REGI.HardwareType="PC" AND REGI.BlanccoReport=0 LIMIT 100
This returns 100 records (I limit it because the database is in use during work hours and I don't want to take up all the resources).
However, I want to use these results in a query that updates the REGI table (the same table it uses to select the 100 records in the first place).
I get the error that I cannot select from a table while updating it (which makes sense). So I tried selecting the statement above into a temp table and then updating from it; however, then I get too many results (again logical: I only need 1 result and get 100). I'm getting stuck in my own thoughts. Ultimately I need to fill the ReportID into each record of REGI.
I know it should be possible, but I'm no expert in MySQL. Is there anybody who can point me in the right direction?
P.S. Fixing the table containing 400k records is not an option; it's a program from an external developer and I can only read that database.
The errors I'm talking about are as follows:
Error Code: 1093. You can't specify target table 'TrialTable' for update in FROM clause
When I use:
UPDATE TrialTable SET TrialTable.BlanccoReport =
(SELECT JT.report_id as ReportID, REGI.Serienummer as SerialNo FROM Blancco_Registration.TrialTable as REGI
JOIN (SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92) AS JT ON JT.Value_string = REGI.Serienummer
WHERE REGI.HardwareType="PC" AND REGI.BlanccoReport=0 LIMIT 100)
WHERE TrialTable.HardwareType="PC" AND TrialTable.BlanccoReport=0)
Then I tried:
UPDATE TrialTable SET TrialTable.BlanccoReport = (SELECT ReportID FROM (<<and the rest of the SQL>>> ) as x WHERE X.SerialNo = TrialTable.Serienummer)
but that gave me the following error:
Error Code: 1242. Subquery returns more than 1 row
Using the query above with LIMIT 1 gives every row the same result.
Firstly, your query seems to be functionally identical to the following:
SELECT RE.report_id ReportID
, REGI.Serienummer SerialNo
FROM Blancco_Registration.TrialTable REGI
JOIN Blancco_new.mc_report_Entry RE
ON RE.Value_string = REGI.Serienummer
WHERE REGI.HardwareType = "PC"
AND REGI.BlanccoReport=0
AND RE.path_id=92
LIMIT 100
So, why not use that?
EDIT:
I still don't get it. I can't see what part of the problem the following fails to solve...
UPDATE TrialTable REGI
JOIN Blancco_new.mc_report_Entry RE
ON RE.Value_string = REGI.Serienummer
SET REGI.BlanccoReport = RE.report_id
WHERE REGI.HardwareType = "PC"
AND REGI.BlanccoReport=0
AND RE.path_id=92;
(This is not an answer, but maybe a pointer towards a few points that need further attention)
Your JT sub query looks suspicious to me:
(SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92
GROUP BY RE.report_id)
You use group by but don't actually use any aggregate functions. The column RE.Value_string should strictly be something like MAX(RE.Value_string) instead.
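As a hedged illustration of the "more than 1 row" problem: if several report entries can match one serial number, the update needs a rule for picking a single report_id. The sketch below uses Python's sqlite3 with simplified, hypothetical table names (trial_table, report_entry) and invented rows, and picks MAX(report_id) per serial number. That choice is an assumption; the question does not say which report should win. The subquery reads only the report table, so the target table is never selected from while it is being updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trial_table (serienummer TEXT, hardware_type TEXT, blancco_report INTEGER);
CREATE TABLE report_entry (report_id INTEGER, path_id INTEGER, value_string TEXT);
INSERT INTO trial_table VALUES
  ('SN1', 'PC', 0), ('SN2', 'PC', 0), ('SN3', 'Laptop', 0);
INSERT INTO report_entry VALUES
  (101, 92, 'SN1'),
  (105, 92, 'SN1'),   -- second report for the same serial number
  (102, 92, 'SN2'),
  (103,  7, 'SN2'),   -- different path_id, must be ignored
  (104, 92, 'SN3');   -- matches a non-PC row, must be ignored
""")

conn.execute("""
    UPDATE trial_table
    SET blancco_report = (SELECT MAX(re.report_id)      -- exactly one value per row
                          FROM report_entry AS re
                          WHERE re.path_id = 92
                            AND re.value_string = trial_table.serienummer)
    WHERE hardware_type = 'PC'
      AND blancco_report = 0
      AND EXISTS (SELECT 1                              -- skip rows without a report
                  FROM report_entry AS re
                  WHERE re.path_id = 92
                    AND re.value_string = trial_table.serienummer)
""")

updated = conn.execute(
    "SELECT serienummer, blancco_report FROM trial_table ORDER BY serienummer"
).fetchall()
print(updated)
```

SN1 gets the higher of its two report ids, SN2 ignores the entry with a different path_id, and the laptop row is left untouched.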

how to perform a table join when the other table has many rows corresponding to one row in mother table

I have two tables: tbl_book_details and tbl_table_traking
tbl_book_details has columns bd_book_code,
bd_isbn,
bd_title,
bd_edition,
bd_author,
bd_publisher,
bd_supplier,
bd_page,
bd_price_type,
bd_cost_price,
bd_price,
bd_Tax,
bd_covering,
bd_availability,
bd_keywords,
bd_notes,
bd_details,
bd_news_latter,
bd_etDate,
bd_weight,
bd_expire_date,
bd_status
tbl_table_traking has columns
tt_id,
tt_action,
tt_table,
tt_record_id,
tt_on_date,
tt_user,
tt_status
The process is as follows: a trigger is defined on tbl_book_details which, on insert or modify, inserts a row into tbl_table_traking to track when and by whom the record was modified.
Until now I have been using the following query, which is not a join:
SELECT
tbl_books_details.bd_book_code AS bkid,
tbl_books_details.bd_isbn,
tbl_books_details.bd_title AS title,
-- This part is what I believe is slowing down my query
(SELECT
tt_on_date
FROM
tbl_table_tracking
WHERE tt_action = 'MODIFY'
AND tt_record_id = tbl_books_details.bd_book_code ORDER BY tt_on_date) AS bd_etdate
It worked fine while the record count was below 3 million, but now the script times out.
I have created indexes on tbl_table_traking on tt_on_date and on tt_action.
Is there any way I can convert it to a join or otherwise improve the performance?
The tracking subquery returns the most recent date on which the record was modified.
My database is MySQL.
You should be able to do something like this.
SELECT
tbl_books_details.bd_book_code AS bkid,
tbl_books_details.bd_isbn,
tbl_books_details.bd_title AS title,
tbl_table_tracking.tt_on_date
FROM
tbl_books_details
INNER JOIN tbl_table_tracking
ON tbl_books_details.bd_book_code = tbl_table_tracking.tt_record_id
WHERE tt_action = 'MODIFY';
Hope this helps...
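Note that a plain join returns one row per MODIFY entry, not just the most recent one. A sketch of one way to keep a single row per book is to aggregate the tracking table first and join the result, shown here with Python's sqlite3. The sample rows, the LEFT JOIN (to keep books that were never modified) and the composite index are my assumptions; whether such an index actually helps on the real MySQL tables would need to be checked with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_books_details (bd_book_code INTEGER PRIMARY KEY, bd_isbn TEXT, bd_title TEXT);
CREATE TABLE tbl_table_tracking (
    tt_id INTEGER PRIMARY KEY, tt_action TEXT, tt_record_id INTEGER, tt_on_date TEXT);
-- Hypothetical composite index covering the filter, the join key and the date.
CREATE INDEX idx_tracking_lookup
    ON tbl_table_tracking (tt_action, tt_record_id, tt_on_date);
INSERT INTO tbl_books_details VALUES (1, '111', 'Book A'), (2, '222', 'Book B');
INSERT INTO tbl_table_tracking VALUES
  (1, 'INSERT', 1, '2015-01-01'),
  (2, 'MODIFY', 1, '2015-02-01'),
  (3, 'MODIFY', 1, '2015-03-01'),   -- most recent modification of book 1
  (4, 'INSERT', 2, '2015-01-05');   -- book 2 was never modified
""")

books = conn.execute("""
    SELECT b.bd_book_code AS bkid,
           b.bd_isbn,
           b.bd_title      AS title,
           m.last_modified AS bd_etdate
    FROM tbl_books_details AS b
    LEFT JOIN (SELECT tt_record_id, MAX(tt_on_date) AS last_modified
               FROM tbl_table_tracking
               WHERE tt_action = 'MODIFY'
               GROUP BY tt_record_id) AS m   -- one row per modified book
      ON m.tt_record_id = b.bd_book_code
    ORDER BY b.bd_book_code
""").fetchall()

print(books)
```

The derived table is computed once instead of once per book, which is the main difference from the per-row subquery in the question.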