Data inconsistencies between two tables - mysql

I have an SQL question that I am struggling to understand, and I can't find relevant resources to help me.
The question is:
"Write an SQL query to identify data inconsistencies between two tables."
I need to compare the following tables of data:
AssetManager
AssetManagerName
----------------
John Doe
Joe Smith
Dave Grey
Lisa Sparks
Kate Green
Trip
PropertyCode | AssetManagerName | Date
--------------------------------------
P001 | John Doe | 2022-01-22
P001 | Joe Smith | 2022-01-19
P002 | Dave Grey | 2022-02-25
P002 | John Doe | 2022-04-23
P003 | Kate Greens | 2022-02-25
P004 | Joe Smith | 2022-05-29
P002 | Dave Grey | 2022-01-25
P001 | John Doe | 2022-02-24
What are the inconsistencies in this case? Is it maybe that "Kate Green" is in the AssetManager table, and you have "Kate Greens" in the Trip table? That's the only thing I can see.
What MySQL commands could I use that would help me to achieve this query?

In SQL, when we talk about inconsistencies, we are generally referring to data that would not correctly translate into a normalised form: when we try to join between tables, this results in missing data or orphaned rows. Inconsistencies commonly arise when there are no referential constraints in a schema to maintain consistency. In such cases simple spelling mistakes can easily creep into the dataset, but entirely wrong values could also be used. In this case, if there is a table that represents all the possible Asset Managers, then we would expect other tables that refer to Asset Managers to use only values from the AssetManager table; spelling mistakes and entirely missing names are treated the same.
In the Trip table we can identify inconsistencies with the AssetManager table by looking for any records in Trip that do not have a match in AssetManager on the AssetManagerName column.
One simple way to do this is to use an OUTER JOIN and to exclude all the matches:
SELECT Trip.*
FROM Trip
LEFT OUTER JOIN AssetManager ON Trip.AssetManagerName = AssetManager.AssetManagerName
WHERE AssetManager.AssetManagerName IS NULL
This returns the following result (see db-fiddle):
PropertyCode | AssetManagerName | Date
--------------------------------------
P003 | Kate Greens | 2022-02-25
The LEFT OUTER JOIN (or LEFT JOIN) will return all the rows from the Trip table, even if there is no corresponding match in the AssetManager table on the AssetManagerName column. For the rows that do not match, all the values for the AssetManager table in the result set will be NULL.
We can then use a WHERE clause to exclude all the matched records and only return those that DO NOT MATCH, by keeping only the rows where AssetManager.AssetManagerName is NULL.
There are no records in Trip with a legitimate NULL value in AssetManagerName; the NULL only exists in the result set as a result of the LEFT OUTER JOIN evaluation.
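To try the LEFT JOIN / IS NULL query locally, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for MySQL here; the query itself is the one from the answer, unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AssetManager (AssetManagerName TEXT);
    CREATE TABLE Trip (PropertyCode TEXT, AssetManagerName TEXT, Date TEXT);
""")
conn.executemany("INSERT INTO AssetManager VALUES (?)",
                 [(n,) for n in ["John Doe", "Joe Smith", "Dave Grey",
                                 "Lisa Sparks", "Kate Green"]])
conn.executemany("INSERT INTO Trip VALUES (?, ?, ?)", [
    ("P001", "John Doe", "2022-01-22"),
    ("P001", "Joe Smith", "2022-01-19"),
    ("P002", "Dave Grey", "2022-02-25"),
    ("P002", "John Doe", "2022-04-23"),
    ("P003", "Kate Greens", "2022-02-25"),
    ("P004", "Joe Smith", "2022-05-29"),
    ("P002", "Dave Grey", "2022-01-25"),
    ("P001", "John Doe", "2022-02-24"),
])

# Keep every Trip row, then discard the ones that found a match.
orphans = conn.execute("""
    SELECT Trip.*
    FROM Trip
    LEFT OUTER JOIN AssetManager
        ON Trip.AssetManagerName = AssetManager.AssetManagerName
    WHERE AssetManager.AssetManagerName IS NULL
""").fetchall()
print(orphans)  # [('P003', 'Kate Greens', '2022-02-25')]
```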
You could also use a NOT EXISTS clause. This syntax is sometimes easier to read because it states the intent directly: we want to find the records that DO NOT MATCH. However, in MySQL specifically its execution plan is generally less efficient than the LEFT OUTER JOIN expression above.
SELECT Trip.*
FROM Trip
WHERE NOT EXISTS (
SELECT AssetManager.AssetManagerName
FROM AssetManager
WHERE AssetManager.AssetManagerName = Trip.AssetManagerName
)
Another variation is to use NOT IN. For this query we first evaluate the list of possible values for AssetManagerName and use it to identify the values that do not match.
Be careful when either table might contain legitimate NULL values in AssetManagerName, because IN handles NULL values differently to EXISTS: if the subquery returns any NULL, NOT IN returns no rows at all, and a Trip row whose AssetManagerName is NULL is never returned by NOT IN (NOT EXISTS does return it).
SELECT Trip.*
FROM Trip
WHERE Trip.AssetManagerName NOT IN (
SELECT AssetManager.AssetManagerName
FROM AssetManager
)
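The NULL difference between NOT EXISTS and NOT IN is easy to miss, so here is a small sketch on hypothetical rows: one Trip row with a NULL manager, then a NULL added to AssetManager. It again uses SQLite as a stand-in for MySQL; both apply SQL's three-valued logic to IN and EXISTS in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AssetManager (AssetManagerName TEXT);
    CREATE TABLE Trip (PropertyCode TEXT, AssetManagerName TEXT);
    INSERT INTO AssetManager VALUES ('John Doe'), ('Kate Green');
    INSERT INTO Trip VALUES ('P001', 'John Doe'),
                            ('P003', 'Kate Greens'),
                            ('P005', NULL);
""")

NOT_EXISTS = """
    SELECT PropertyCode FROM Trip
    WHERE NOT EXISTS (SELECT 1 FROM AssetManager
                      WHERE AssetManager.AssetManagerName = Trip.AssetManagerName)
    ORDER BY PropertyCode
"""
NOT_IN = """
    SELECT PropertyCode FROM Trip
    WHERE Trip.AssetManagerName NOT IN (SELECT AssetManagerName FROM AssetManager)
    ORDER BY PropertyCode
"""

def codes(sql):
    return [row[0] for row in conn.execute(sql)]

# No NULLs in AssetManager yet:
before_exists = codes(NOT_EXISTS)  # ['P003', 'P005']: the NULL trip has no match
before_in = codes(NOT_IN)          # ['P003']: NULL NOT IN (...) is UNKNOWN

# A single NULL in the subquery makes every non-matching NOT IN test UNKNOWN:
conn.execute("INSERT INTO AssetManager VALUES (NULL)")
after_exists = codes(NOT_EXISTS)   # ['P003', 'P005']
after_in = codes(NOT_IN)           # []
print(before_exists, before_in, after_exists, after_in)
```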
For an interesting analysis of these options and their performance considerations, have a read of this article:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

Related

How to loop through this specific relation model in SQL?

I have a database with a table called "Relations" that looks as follows:
Relations (PersonId1, PersonId2, RelationTypeId)
The primary key is (PersonId1, PersonId2, RelationTypeId)
There are two other tables, referenced by the foreign keys, but that does not really matter here.
So a relation is defined, for example, as (Mary, Andre, 3), where 3 references another table and means, for example, "a friend".
My requirement is to see all friends of a specific person but also the friends of that person's friends, so not only the first layer but also the second.
For example this would be the relation table
Andre Mary 3
Mary Carl 3
Chris James 3 (irrelevant in our case)
So I want a query where I supply the PersonId of Andre and the RelationTypeId. The result should be this:
Andre Mary 3
Mary Carl 3
In my understanding it is not possible to build a query that would give this result, but I am not sure, which is why I am asking.
Hope you understand my question, thanks in advance.
The query below returns the list of friends of person1 and their friends.
select
distinct personId2
from
relations
where
personId1 in (select distinct personId2 from relations where personId1 = <person_name>)
or personId1 = <person_name>
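A runnable sketch of this two-level approach, using names as person IDs to mirror the question's example. Two assumptions not in the answer: the <person_name> placeholder is bound as a query parameter, and a relationTypeId filter is added so only "friend" rows are followed. SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relations (
        personId1 TEXT, personId2 TEXT, relationTypeId INTEGER,
        PRIMARY KEY (personId1, personId2, relationTypeId));
    INSERT INTO relations VALUES ('Andre', 'Mary', 3),
                                 ('Mary', 'Carl', 3),
                                 ('Chris', 'James', 3);
""")

# First layer: rows where the person is personId1.
# Second layer: rows whose personId1 is one of the person's friends.
friends = conn.execute("""
    SELECT DISTINCT personId2
    FROM relations
    WHERE relationTypeId = :rel
      AND (personId1 = :person
           OR personId1 IN (SELECT personId2 FROM relations
                            WHERE personId1 = :person AND relationTypeId = :rel))
    ORDER BY personId2
""", {"person": "Andre", "rel": 3}).fetchall()
print(friends)  # [('Carl',), ('Mary',)]
```

Note that this returns only the friend IDs; it stops at exactly two layers.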
A recursive CTE (common table expression) can do this. A CTE is the part of a SELECT query that starts with WITH; in a recursive CTE the recursive part references the CTE itself, so it is evaluated repeatedly, each pass building on the rows produced by the previous one, until no new rows appear. This returns the data subsets you're looking for.
I use it to boost data access efficiency when I need to, e.g., select, paginate, or display rows linked to a specific page. It works in MySQL 8.
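The answer's own CTE code did not survive, so the following is only a sketch of what a recursive CTE for the question could look like, not the answerer's code. The CTE name friends and the depth cap of 2 are assumptions chosen to match the "two layers" requirement; WITH RECURSIVE is accepted by MySQL 8 and by SQLite, which is used here to run it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Relations (PersonId1 TEXT, PersonId2 TEXT, RelationTypeId INTEGER);
    INSERT INTO Relations VALUES ('Andre', 'Mary', 3),
                                 ('Mary', 'Carl', 3),
                                 ('Chris', 'James', 3);
""")

# Anchor member: the person's direct relations of the requested type.
# Recursive member: relations of the people found in the previous pass.
# depth < 2 stops after the second layer and also guards against cycles.
rows = conn.execute("""
    WITH RECURSIVE friends (p1, p2, rel, depth) AS (
        SELECT PersonId1, PersonId2, RelationTypeId, 1
        FROM Relations
        WHERE PersonId1 = :person AND RelationTypeId = :rel
        UNION ALL
        SELECT r.PersonId1, r.PersonId2, r.RelationTypeId, f.depth + 1
        FROM Relations r
        JOIN friends f ON r.PersonId1 = f.p2
        WHERE r.RelationTypeId = :rel AND f.depth < 2
    )
    SELECT p1, p2, rel FROM friends ORDER BY depth, p1, p2
""", {"person": "Andre", "rel": 3}).fetchall()
print(rows)  # [('Andre', 'Mary', 3), ('Mary', 'Carl', 3)]
```

Unlike the IN-based query, this returns whole relation rows, which is the output shape the question asks for; raising the depth cap follows more layers.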

MYSQL select query on multiple tables

I'm not seeing a clean way to write this query without subselects, which I avoid because they are generally not portable and are harder to read and debug than individual queries.
Table A has exactly 2 foreign keys to table B, which are always different, but always defined. Sort of like:
MARRIAGE_TABLE (M_KEY, LAST_NAME, PERSON_HUSBAND_FK, PERSON_WIFE_FK)
PERSON_TABLE (PERSON_KEY, SEX, FIRST_NAME)
PERSON_HUSBAND_FK will always point at a row with SEX=MALE, and PERSON_WIFE_FK will always point at a female. There will always be one of each. (This is in no way a statement on same-sex marriage, BTW; I'm all for it.)
I want to create a result like:
MARRIAGE HUSBAND WIFE
-------- ------- ----
SMITH TOM KATHY
JONES BILL EVE
My current approach is to get all records from MARRIAGE_TABLE and store them in a hash. Then I augment the hash with the names {wife_name} and {husband_name} using 2 more queries on the husband and wife FKs. Then I format and print the hash. It works, but I'm not wild about 3 queries per row.
I'm not sure I ever encountered a table having >1 FK to another table. I've done years of table-design, but I'm not really sure this design even meets normalization. It seems like no, to me. Like they created a many-many without an intermediate table; a cheat?
Just join table PERSON_TABLE twice:
SELECT m.last_name AS marriage, p1.first_name AS husband, p2.first_name AS wife
FROM marriage_table m
INNER JOIN person_table p1 ON p1.person_key = m.person_husband_fk
INNER JOIN person_table p2 ON p2.person_key = m.person_wife_fk
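A quick check of the double join, using hypothetical rows that reproduce the expected output from the question. SQLite stands in for MySQL; the lowercase table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_table (person_key INTEGER PRIMARY KEY, sex TEXT, first_name TEXT);
    CREATE TABLE marriage_table (m_key INTEGER PRIMARY KEY, last_name TEXT,
        person_husband_fk INTEGER REFERENCES person_table(person_key),
        person_wife_fk INTEGER REFERENCES person_table(person_key));
    INSERT INTO person_table VALUES (1, 'MALE', 'TOM'), (2, 'FEMALE', 'KATHY'),
                                    (3, 'MALE', 'BILL'), (4, 'FEMALE', 'EVE');
    INSERT INTO marriage_table VALUES (1, 'SMITH', 1, 2), (2, 'JONES', 3, 4);
""")

# person_table appears twice under different aliases:
# p1 resolves the husband FK, p2 resolves the wife FK.
result = conn.execute("""
    SELECT m.last_name AS marriage, p1.first_name AS husband, p2.first_name AS wife
    FROM marriage_table m
    INNER JOIN person_table p1 ON p1.person_key = m.person_husband_fk
    INNER JOIN person_table p2 ON p2.person_key = m.person_wife_fk
    ORDER BY m.m_key
""").fetchall()
print(result)  # [('SMITH', 'TOM', 'KATHY'), ('JONES', 'BILL', 'EVE')]
```

One query replaces the three-queries-per-row approach; two foreign keys to the same table are an ordinary design, and each FK simply gets its own alias.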

Finding non-matches on same table in MS Access

I'm a bit of a novice in MS Access but I've started doing some data validation at work and figured it was time to get down to a more simplified way of doing it.
First time posting. I'm having an issue trying to "only" display non-matching values within the same table, i.e. errors.
I have a table (query) containing employee details, one set from one database and one from another. Both have the same information in them, however there are some details in both which are not correct and need to be updated. As an example see below:
Table1
Employee ID | Surname | EmpID | Surname1
---------------------------------------
123456789 | Smith | 123456789 | Smith
654987321 | Daniels | 654987321 | Volate
987654321 | Hanks | 987654321 | Hanks
741852963 | Donald | 741852963 | Draps
Now what I want to identify are the rows where "Surname" and "Surname1" do not match.
The result should be:
Employee ID | Surname | EmpID | Surname1
741852963 | Donald | 741852963 | Draps
654987321 | Daniels | 654987321 | Volate
I'm going to append this to an Errors table where I can list all the errors where values don't match.
What I've tried is the following:
Field: Matches: IIf([Table1].[Surname]<>[Table1].[Surname1],"Yes","No")
This doesn't seem to work as all the results display as Yes and I know for a fact there are inconsistencies.
Does anyone know what or how to do this? Ask any questions if need be.
Thanks
UPDATE
Ok I think it might be better if I gave you all the actual names of the columns. I thought it would be easier to simplify it but maybe not.
Assignment | PayC | HRIS Assignment No | WAPayCycle
---------------------------------------------------
12345678 | No Payroll | 12345678 | Pay Cycle 1
20001868 | SCP Pay Cycle 1 | 20001868 | SCP Pay Cycle 1
20003272-2 | SCP Pay Cycle 1 | #Error |
20014627 | SCP Pay Cycle 1 | 20014627 | SCP Pay Cycle 1
So this gives an idea of what I am doing and the possible errors I need to account for. The first row has a mismatch, so I expect it to be flagged as an error. The 3rd row has a Null value in one column and a Null in another; however, one shows #Error while the other is just blank. The rest are matched.
LINK TO SCREEN DUMPS
https://drive.google.com/open?id=0B-5TRrOketfyb0tCbElYSWNSM1k
This option handles errors and nulls in [HRIS Assignment No]:
SELECT *, IIf([Assignment]<>IIf(IsError([HRIS Assignment No]),"",Nz([HRIS Assignment No],"")),"Yes","No") AS Err
FROM [pc look up]
WHERE [Assignment]<>IIf(IsError([HRIS Assignment No]),"",Nz([HRIS Assignment No],""))
This should work:
SELECT *
FROM Table1
WHERE [Employee ID] = EmpID
AND (Surname <> Surname1
OR Len(Nz(Surname,'')) = 0
OR Len(Nz(Surname1,'')) = 0)
Kind regards,
Rene
In your question you state "one from one database and one from another".
Assuming you start with two tables (you've shown us a query joining the four fields together?) then this query would work:
SELECT T1.[Employee ID]
,T1.Surname
,T2.EmpID
,T2.Surname1
FROM Table1 T1 INNER JOIN Table2 T2 ON T1.[Employee ID] = T2.EmpID AND
T1.Surname <> T2.Surname1
ORDER BY T1.[Employee ID]
An INNER JOIN will give you the result you're after. A LEFT JOIN will show all the rows in Table1 (aliased as T1) and only the matching ones from Table2 (aliased as T2), with NULL for the missing values; a RIGHT JOIN works the other way around.
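Neither answer spells out what happens with NULLs: a plain <> comparison evaluates to NULL when either side is NULL, so such rows silently drop out, which is why the first answer wraps the columns in Nz(). The sketch below illustrates this with SQLite rather than Access; COALESCE is the standard-SQL counterpart of Nz(), the two-table layout follows the assumption in the answer above, and the 111111111/Jones row with a missing Surname1 is hypothetical. The comparison sits in WHERE instead of ON, which is equivalent for an INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (EmployeeID INTEGER, Surname TEXT);
    CREATE TABLE Table2 (EmpID INTEGER, Surname1 TEXT);
    INSERT INTO Table1 VALUES (123456789, 'Smith'), (654987321, 'Daniels'),
                              (987654321, 'Hanks'), (741852963, 'Donald'),
                              (111111111, 'Jones');
    INSERT INTO Table2 VALUES (123456789, 'Smith'), (654987321, 'Volate'),
                              (987654321, 'Hanks'), (741852963, 'Draps'),
                              (111111111, NULL);
""")

def mismatches(condition):
    # condition is a fixed string from this script, not user input
    sql = f"""
        SELECT T1.EmployeeID, T1.Surname, T2.Surname1
        FROM Table1 T1
        INNER JOIN Table2 T2 ON T1.EmployeeID = T2.EmpID
        WHERE {condition}
        ORDER BY T1.EmployeeID
    """
    return conn.execute(sql).fetchall()

# Plain <> is UNKNOWN when either side is NULL, so 111111111 is skipped.
plain = mismatches("T1.Surname <> T2.Surname1")
# COALESCE plays the role of Access's Nz(): NULL becomes '' before comparing.
null_safe = mismatches("COALESCE(T1.Surname, '') <> COALESCE(T2.Surname1, '')")
print(plain)
print(null_safe)
```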

Mysql two ways to select where. Which way uses less resources and is faster?

For example, I have a URL like domain.com/transport/cars
Based on the URL, I want to select from MySQL and show a list of ads for cars.
I want to choose the fastest method (the one that takes less time to show results and uses fewer resources).
Comparing 2 ways
First way
Mysql table transport with rows like
FirstLevSubcat | Text
---------------------------------
1 | Text1 car
2 | Text1xx lorry
1 | Text another car
FirstLevSubcat Type is int
Then another mysql table subcategories
Id | NameOfSubcat
---------------------------------
1 | cars
2 | lorries
3 | dogs
4 | flats
Query like
SELECT Text, AndSoOn FROM transport
WHERE
FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'cars')
Or, instead of SELECT Id FROM subcategories, get the Id from an XML file or a PHP array.
Second way
Mysql table transport with rows like
FirstLevSubcat | Text
---------------------------------
cars | Text1 car
lorries | Text1xx lorry
cars | Text another car
FirstLevSubcat Type is varchar or char
And query simply
SELECT Text, AndSoOn FROM transport
WHERE FirstLevSubcat = 'cars'
Please advise which way would use fewer resources and take less time to show results. I read that selecting WHERE on an int is better than on a varchar (SQL SELECT speed int vs varchar).
So, as I understand it, the first way would be better?
The first design is much better, because you separate two facts in your data:
There is a category 'cars'.
'Text1 car' is in the Category 'cars'.
Imagine, in your second design you enter another car, but type in 'cors' instead of 'cars'. The dbms doesn't see this, and so you have created another category with a single entry. (Well, in MySQL you could use an enum column instead to circumvent this issue, but this is not available in most other dbms. And anyhow, whenever you want to rename your category, say from 'cars' to 'vans', then you would have to change all existing records plus alter the table, instead of simply renaming the entry once in the subcategories table.)
So stay away from your second design.
As to Praveen Prasannan's comment on subqueries and joins: that is nonsense. Your query is straightforward and good. You want to select from transport where the category is the desired one. Perfect. There are two groups of people who would prefer a join here:
Beginners who simply don't know better and always join from the start and try to sort things out in the end.
Experienced programmers who know that some dbms often handle joins better than sub-queries. But this is a pessimistic habit. Better write your queries such that they are easy to read and maintain, as you are already doing, and only change this in case grave performance issues occur.
Yup. As the SO link in your question suggests, int comparison is faster than character comparison and yields faster fetches. Keeping this in mind, the first design would be considered the better design. However, subqueries are never recommended. Use a join instead.
e.g.:
SELECT t.Text, t.AndSoOn FROM transport t
INNER JOIN subcategories s ON s.ID = t.FirstLevSubcat
WHERE s.NameOfSubcat = 'cars'
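For the first design, the subquery from the first answer and the join from the second return the same rows. The sketch below runs both side by side on the question's sample data, with SQLite standing in for MySQL and the category name passed as a parameter instead of a literal. It says nothing about relative speed, which depends on the engine, its version and the available indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subcategories (Id INTEGER PRIMARY KEY, NameOfSubcat TEXT);
    CREATE TABLE transport (FirstLevSubcat INTEGER REFERENCES subcategories(Id), Text TEXT);
    INSERT INTO subcategories VALUES (1, 'cars'), (2, 'lorries'), (3, 'dogs'), (4, 'flats');
    INSERT INTO transport VALUES (1, 'Text1 car'), (2, 'Text1xx lorry'), (1, 'Text another car');
""")

# Scalar subquery: resolve the category name to its Id, then filter.
via_subquery = conn.execute("""
    SELECT Text FROM transport
    WHERE FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = ?)
""", ("cars",)).fetchall()

# Join: filter on the category name through the joined table.
via_join = conn.execute("""
    SELECT t.Text FROM transport t
    INNER JOIN subcategories s ON s.Id = t.FirstLevSubcat
    WHERE s.NameOfSubcat = ?
""", ("cars",)).fetchall()

print(sorted(via_subquery) == sorted(via_join))  # True
```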

Check if a cat and subCat exist in two different table using a single sql statement

I have two tables...
#mainCats
catId | catName
----------------
1 | Furniture
2 | Cutlery
#subCats
subCatId | subCatName | catId | catName
-------------------------------------
1 | Tables | 1 | Furniture
2 | Chairs | 1 | Furniture
3 | Knives | 2 | Cutlery
When adding items to a third table, items, I need to check that a valid category and subcategory exist.
The way the data comes in right now, is like this:
http://www.example.com/additem/?cat=1&sub=2&add=Table_Lamps
And this is how it's done now (a stripped-down example):
1st:
Select count(catId) as hasCat from mainCats where catId=1
if(hasCat == 1)
{
Select count(subCatId) as hasSubCat from subCats where subCatId=2 and catId=1;
if(hasSubCat == 1)
{
//Do the adding to the table here
}else{
echo 'A subCategory was not found';
}
}else{
echo 'A category was not found';
}
Is there a good way to check if a cat and subcat exist in one single step, rather than with all this code?
This thing comes from an old site from 1998.
If you can accept providing a more generic error message like "Category or subcategory not found", then you can skip the first select. You could even skip both selects: assuming that the third table has foreign keys pointing to catId and subCatId, you can perform the insert without any check, catching exceptions to find out whether the cat or subcat is missing.
If you need to distinguish the two error cases, you can perform the second select before the first one: if the second select succeeds, you don't need to execute the first one; if it fails (no matching subcategory), you execute the first one to find out whether at least the main category exists. In this way, assuming that in most cases categories exist, you avoid 50% of queries. Again, you could also perform the insert without any check, catch exceptions and then perform the first select to find out whether at least the main category exists. In this way you avoid almost all queries but can still distinguish the two error cases.
Of course, I'm assuming that your goal is to reduce queries for better efficiency, not to remove some lines of code.
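A sketch of the "insert first, diagnose only on failure" idea from the answer above, using Python's sqlite3 instead of the site's PHP/MySQL. Assumptions: the items table, its columns and the add_item helper are hypothetical; subCats is simplified (catName dropped) and given UNIQUE (subCatId, catId) so items can carry a composite foreign key, which also rejects a subcategory that exists but belongs to another category. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; MySQL's InnoDB enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE mainCats (catId INTEGER PRIMARY KEY, catName TEXT);
    CREATE TABLE subCats (subCatId INTEGER PRIMARY KEY, subCatName TEXT,
        catId INTEGER NOT NULL REFERENCES mainCats(catId),
        UNIQUE (subCatId, catId));
    CREATE TABLE items (itemId INTEGER PRIMARY KEY, catId INTEGER NOT NULL,
        subCatId INTEGER NOT NULL, name TEXT,
        FOREIGN KEY (subCatId, catId) REFERENCES subCats(subCatId, catId));
    INSERT INTO mainCats VALUES (1, 'Furniture'), (2, 'Cutlery');
    INSERT INTO subCats VALUES (1, 'Tables', 1), (2, 'Chairs', 1), (3, 'Knives', 2);
""")

def add_item(cat_id, sub_id, name):
    """Hypothetical helper: insert first, query only when the insert fails."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO items (catId, subCatId, name) VALUES (?, ?, ?)",
                         (cat_id, sub_id, name))
        return "added"
    except sqlite3.IntegrityError:
        # Only now spend one query to tell the two error cases apart.
        has_cat = conn.execute("SELECT 1 FROM mainCats WHERE catId = ?",
                               (cat_id,)).fetchone()
        return "A subCategory was not found" if has_cat else "A category was not found"

first = add_item(1, 2, "Table_Lamps")   # Chairs (2) belongs to Furniture (1)
second = add_item(2, 2, "Spoons")       # subcategory 2 is not under category 2
third = add_item(9, 2, "Bikes")         # category 9 does not exist
print(first, second, third, sep="\n")
```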
Maybe you want this:
SELECT COUNT(mc.catId) AS hasCat, COUNT(sc.subCatId) AS hasSubCat
FROM mainCats mc
LEFT JOIN subCats sc ON sc.catId = mc.catId AND sc.subCatId = 2
WHERE mc.catId = 1;
This will work if you have matching rows in neither table, in both tables, or in mainCats but not in subCats.
The remaining case (a row in subCats but not in mainCats) won't show with this query.
If MySQL had FULL OUTER JOIN, you could get all four options in one query.
Interestingly, Microsoft SQL Server has FULL OUTER JOIN, while MySQL doesn't (and neither did Oracle before version 9i).
I learned Microsoft SQL Server first, and then I took a university course that taught Oracle. When we learned about joins, the prof only explained CROSS, INNER, LEFT, and RIGHT. So I asked about FULL join, and she asked me why I would ever need it. I gave her several examples from work where I had used FULL joins, and she insisted that was just bad database design...
I then did some research on my own and was shocked to realize that Oracle didn't have full joins. When I started using MySQL, I was just slightly disappointed, but not shocked to discover it doesn't have full joins either.
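MySQL has no FULL OUTER JOIN, but one common workaround is to take the LEFT JOIN and append, with UNION ALL, the rows from the other table that found no partner. Below is a sketch with hypothetical rows (a 'Lighting' category without subcategories and an 'Orphan' subcategory pointing at a missing catId), run on SQLite as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mainCats (catId INTEGER PRIMARY KEY, catName TEXT);
    CREATE TABLE subCats (subCatId INTEGER PRIMARY KEY, subCatName TEXT, catId INTEGER);
    INSERT INTO mainCats VALUES (1, 'Furniture'), (2, 'Cutlery'), (3, 'Lighting');
    INSERT INTO subCats VALUES (1, 'Tables', 1), (2, 'Chairs', 1),
                               (3, 'Knives', 2), (4, 'Orphan', 7);
""")

# Part 1: every category, with its subcategories if any (LEFT JOIN).
# Part 2: subcategories whose catId has no category (the rows a RIGHT JOIN
# would add). UNION ALL is safe because part 2 cannot repeat part 1.
rows = conn.execute("""
    SELECT mc.catId, mc.catName, sc.subCatId, sc.subCatName
    FROM mainCats mc
    LEFT JOIN subCats sc ON sc.catId = mc.catId
    UNION ALL
    SELECT mc.catId, mc.catName, sc.subCatId, sc.subCatName
    FROM subCats sc
    LEFT JOIN mainCats mc ON mc.catId = sc.catId
    WHERE mc.catId IS NULL
    ORDER BY 3
""").fetchall()
for row in rows:
    print(row)
```

All four cases from the answer show up: matched rows, a category with no subcategory (NULLs on the right), and a subcategory with no category (NULLs on the left).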