How to minimize duplicate queries - data-analysis

Suppose I have two datasets: MARRIAGE_TABLE and PERSON_TABLE.
In QlikView, if I try to include these in a load using a query like the following:
sql select marriage_id, primary_person_id, secondary_person_id, marriage_start_date, marriage_end_date from marriage_table;
sql select person_id as primary_person_id, person_id as secondary_person_id, first_name, middle_name, last_name, date_of_birth from person_table;
I get an error warning that this could lead to inaccurate data, because QlikView has two potential paths to PERSON_TABLE. That makes sense, but I really hate the idea of duplicating the selects and tables like this:
sql select marriage_id, primary_person_id, secondary_person_id, marriage_start_date, marriage_end_date from marriage_table;
sql select person_id as primary_person_id, first_name, middle_name, last_name, date_of_birth from person_table;
sql select person_id as secondary_person_id, first_name, middle_name, last_name, date_of_birth from person_table;
Is there a better way to deal with this that I'm missing?

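What you are describing is a data-model problem: QlikView links tables through fields that have the same name, so PERSON_TABLE ends up connected to MARRIAGE_TABLE through two fields at once. QlikView calls a link over several shared fields a "synthetic key" and a model with more than one path between tables a "circular reference" (loop). You should really try to avoid both: they will not make your app crash, but they can make it show incorrect results, which is worse.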
In my opinion you have two options:
Option 1: Duplicate your PERSON_TABLE so that PRIMARY_PERSON_ID links to PERSON_TABLE_1 and SECONDARY_PERSON_ID links to PERSON_TABLE_2.
PERSON_TABLE_1:
SQL SELECT person_id AS primary_person_id,
first_name AS first_name_1,
middle_name AS middle_name_1,
last_name AS last_name_1,
date_of_birth AS date_of_birth_1
FROM person_table;
PERSON_TABLE_2:
SQL SELECT person_id AS secondary_person_id,
first_name AS first_name_2,
middle_name AS middle_name_2,
last_name AS last_name_2,
date_of_birth AS date_of_birth_2
FROM person_table;
The problem with this option is that you have to choose a different alias for each field, which can be inconvenient depending on the type of analysis you do in your app.
Option 2: Create a single MARRIAGE_TABLE that already includes the data of both people. To do that, write a SQL query that joins PERSON_TABLE twice (I will only use first and middle names for simplicity, but you can add all the other fields):
MARRIAGE_TABLE:
SQL SELECT T1.*,
T2.first_name AS first_name_1, T2.middle_name AS middle_name_1,
T3.first_name AS first_name_2, T3.middle_name AS middle_name_2
FROM MARRIAGE_TABLE AS T1
LEFT JOIN PERSON_TABLE AS T2 ON T1.primary_person_id = T2.person_id
LEFT JOIN PERSON_TABLE AS T3 ON T1.secondary_person_id = T3.person_id;
This produces a single table with the following columns:
MARRIAGE_ID PRIMARY_PERSON_ID SECONDARY_PERSON_ID MARRIAGE_START_DATE MARRIAGE_END_DATE FIRST_NAME_1 MIDDLE_NAME_1 FIRST_NAME_2 MIDDLE_NAME_2
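The double self-join in Option 2 can be checked outside QlikView with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the real database, and the sample rows are hypothetical; it selects explicit columns instead of T1.* so the output is easy to read.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_table (
    person_id INTEGER PRIMARY KEY,
    first_name TEXT, middle_name TEXT, last_name TEXT, date_of_birth TEXT
);
CREATE TABLE marriage_table (
    marriage_id INTEGER PRIMARY KEY,
    primary_person_id INTEGER, secondary_person_id INTEGER,
    marriage_start_date TEXT, marriage_end_date TEXT
);
INSERT INTO person_table VALUES
    (1, 'Ann', 'B', 'Smith', '1980-01-01'),
    (2, 'Carl', 'D', 'Jones', '1979-05-05');
INSERT INTO marriage_table VALUES (10, 1, 2, '2005-06-01', NULL);
""")

# person_table is joined twice under different aliases (T2, T3), once per role,
# so each marriage row carries both people's names side by side.
rows = conn.execute("""
SELECT T1.marriage_id,
       T2.first_name AS first_name_1, T2.middle_name AS middle_name_1,
       T3.first_name AS first_name_2, T3.middle_name AS middle_name_2
FROM marriage_table AS T1
LEFT JOIN person_table AS T2 ON T1.primary_person_id = T2.person_id
LEFT JOIN person_table AS T3 ON T1.secondary_person_id = T3.person_id
""").fetchall()

print(rows)  # [(10, 'Ann', 'B', 'Carl', 'D')]
```

Because both person lookups end up as columns of one table, QlikView only sees MARRIAGE_TABLE and there is no second path to resolve.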

Related

Beginner MySQL query with COUNT-"filter"

I'm posting this question to get more clarity about my query while learning SQL.
(The example is simplified.)
I have the following tables:
BookTable(bookID, isbn, title) // Holds every book as a written work, not the number of copies
CopyTable(copyID, bookID) // Represents a physical copy
AuthorTable(authorID, fName, lName) // Represents an author
WriteTable(authorID, bookID) // Represents who wrote what
I want to select every author (preferably as {authorID, fName, lName}) who has written a book that has more than 5 copies.
I am trying something like this:
SELECT DISTINCT authorID, fname, lname // My final "output table"
FROM T_Author
WHERE authorID IN
SELECT authorID, bookID
FROM T_Write
WHERE bookID IN
SELECT bookID, COUNT(*) AS count
FROM T_Copy
GROUP BY bookID // This part I doubt the most
WHERE count > 5
So my idea is:
Select every BookID that appears more than 5 times in CopyTable
Select every author that wrote any of those books from WriteTable
Write out the name of the author with data from AuthorTable
I am not able to test whether this actually works, but is this the "right" way to think about this problem?
Thanks in advance for any guidance.
You are pretty close. Try this:
SELECT a.authorID, a.fname, a.lname  -- final "output table"
FROM T_Author a
WHERE a.authorID IN (SELECT w.authorID
                     FROM T_Write w
                     WHERE w.bookID IN (SELECT c.bookID
                                        FROM T_Copy c
                                        GROUP BY c.bookID  -- one row per book
                                        HAVING COUNT(*) > 5
                                       )
                    );
Notes:
Subqueries need their own parentheses.
With IN, the subquery must return exactly the column being compared. Comparing one column against a subquery that returns two columns is an error.
Use HAVING to filter after aggregation.
SELECT DISTINCT is not needed in the outer query, because IN never repeats a row of T_Author. DISTINCT just adds processing overhead.
Use table aliases and qualified column names in any query that has more than one table reference.
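The asker could not test the query, so here is a minimal sketch that runs the corrected version. It assumes Python's built-in sqlite3 module in place of MySQL and uses hypothetical sample data; it also confirms that an IN subquery returning two columns is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Author (authorID INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE T_Write  (authorID INTEGER, bookID INTEGER);
CREATE TABLE T_Copy   (copyID INTEGER PRIMARY KEY, bookID INTEGER);
INSERT INTO T_Author VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
-- Author 1 wrote books 100 and 101; author 2 wrote only book 101.
INSERT INTO T_Write VALUES (1, 100), (1, 101), (2, 101);
""")
# Book 100 gets 6 copies, book 101 only 2 (copyID is assigned automatically).
conn.executemany("INSERT INTO T_Copy (bookID) VALUES (?)",
                 [(100,)] * 6 + [(101,)] * 2)

authors = conn.execute("""
SELECT a.authorID, a.fname, a.lname
FROM T_Author a
WHERE a.authorID IN (SELECT w.authorID
                     FROM T_Write w
                     WHERE w.bookID IN (SELECT c.bookID
                                        FROM T_Copy c
                                        GROUP BY c.bookID
                                        HAVING COUNT(*) > 5))
ORDER BY a.authorID
""").fetchall()
print(authors)  # [(1, 'Ada', 'Lovelace')]

# A one-column IN against a two-column subquery fails when the query is prepared.
try:
    conn.execute("SELECT authorID FROM T_Author "
                 "WHERE authorID IN (SELECT authorID, bookID FROM T_Write)")
    two_column_in_failed = False
except sqlite3.OperationalError:
    two_column_in_failed = True
```

Only Ada qualifies, because book 101 has just 2 copies, so the HAVING filter drops it before the IN comparisons run.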

Query error with ambiguous column name in SQL on Vendors ID

I'm trying to run this query, but it keeps telling me "ambiguous column name" on VendorID. I need help.
Select VendorID
, VendorName
, InvoiceNumber
, InvoiceDate
, InvoiceTotal
FROM Vendors
JOIN Invoices
ON Vendors.VendorID = Invoices.InvoiceID
Just qualify all your column names, and you will never have this problem again. I also think your ON condition is wrong; it should compare VendorID in both tables, not VendorID to InvoiceID:
SELECT v.VendorID, v.VendorName, i.InvoiceNumber, i.InvoiceDate, i.InvoiceTotal
FROM Vendors v JOIN
Invoices i
ON v.VendorID = i.VendorID;  -- was: Invoices.InvoiceID
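For completeness, I will note that you can fix this particular problem with the USING clause (JOIN Invoices USING (VendorID)) in databases that support it, such as MySQL, PostgreSQL, and SQLite; SQL Server does not. However, it is better to write code defensively so that queries don't generate errors in the first place.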
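A minimal sketch of the error and both fixes, assuming Python's built-in sqlite3 module (SQLite supports USING) and hypothetical vendor and invoice rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendors  (VendorID INTEGER PRIMARY KEY, VendorName TEXT);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, VendorID INTEGER,
                       InvoiceNumber TEXT, InvoiceDate TEXT, InvoiceTotal REAL);
INSERT INTO Vendors VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Invoices VALUES (10, 2, 'INV-001', '2024-01-15', 99.5);
""")

# An unqualified VendorID exists in both joined tables, so the engine refuses to guess.
try:
    conn.execute("SELECT VendorID FROM Vendors JOIN Invoices "
                 "ON Vendors.VendorID = Invoices.VendorID")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Fix 1: qualify every column and join VendorID to VendorID.
qualified = conn.execute("""
SELECT v.VendorID, v.VendorName, i.InvoiceNumber, i.InvoiceDate, i.InvoiceTotal
FROM Vendors v JOIN Invoices i ON v.VendorID = i.VendorID
""").fetchall()

# Fix 2: USING merges the two VendorID columns into one, so the bare name is allowed.
using = conn.execute("""
SELECT VendorID, VendorName, InvoiceNumber
FROM Vendors JOIN Invoices USING (VendorID)
""").fetchall()
```

Qualifying columns works in every SQL dialect, which is why it is the safer habit; USING is only a shortcut where the dialect provides it.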

Using MERGE in SQL Server 2008

I just found out about this nifty little feature, and I have a couple of questions. Consider the statement below.
This is how I interpret it: the USING clause is what gets compared to see if there is a match, correct? I want to keep it as it is, but I also want to use 2 other columns from the source table in the WHEN MATCHED portion, and I can't. Is there a way to use those 2 columns, decesed (I know it's spelled wrong :) ) and hicno_enc?
Another thing I would like to do, if it's possible: if a row exists in the target but not in the source, mark it inactive.
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE
FROM
aco.tmpimport i
INNER JOIN aco.patients p
ON p.hicnoenc = i.hicno_enc
MERGE aco.patients AS target
USING (
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE
FROM aco.tmpimport
) AS source
ON target.hicnoenc = source.hicno_enc
WHEN MATCHED AND target.isdeceased <> CONVERT(BIT,source.decesed) THEN
UPDATE
SET
target.isdeceased = source.decesed,
updatedat = getdate(),
updatedby = 0
WHEN NOT MATCHED THEN
INSERT (firstname, lastname, gender, dob, isdeceased, hicnoenc)
VALUES (source.FIRST_NAME,
source.LAST_NAME,
source.sex1,
source.BIRTH_DATE,
source.decesed,
source.hicno_enc);
So is there a way that I can use the 2 columns (decesed (I know it's spelled wrong :) ) and hicno_enc)?
Add the columns you need to the SELECT statement in the USING clause. Only columns selected there can be referenced as source.<column> in the ON and WHEN clauses:
USING (
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE, decesed, hicno_enc
FROM aco.tmpimport
) AS source
if the row exists in target but not source, then mark it inactive.
Add a WHEN NOT MATCHED BY SOURCE clause and do the update there (this assumes the target table has an active column):
WHEN NOT MATCHED BY SOURCE THEN
UPDATE
SET active = 0
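To make the three branches concrete, here is a hypothetical sketch that emulates them with separate statements. It swaps in Python's built-in sqlite3 module, which has no MERGE, so this is not SQL Server's implementation; the trimmed-down columns and sample rows are assumptions, and the active column is the one assumed above.

```python
import sqlite3

# SQLite has no MERGE, so each MERGE branch runs as its own statement.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (hicnoenc TEXT PRIMARY KEY, firstname TEXT,
                       isdeceased INTEGER, active INTEGER DEFAULT 1);
CREATE TABLE tmpimport (hicno_enc TEXT, FIRST_NAME TEXT, decesed INTEGER);
INSERT INTO patients (hicnoenc, firstname, isdeceased) VALUES
    ('A1', 'Ann', 0), ('B2', 'Bob', 0);
INSERT INTO tmpimport VALUES ('A1', 'Ann', 1), ('C3', 'Cy', 0);
""")

with conn:  # commits the three statements together, or rolls back on error
    # WHEN MATCHED AND target.isdeceased <> source.decesed THEN UPDATE
    conn.execute("""
        UPDATE patients
        SET isdeceased = (SELECT s.decesed FROM tmpimport s
                          WHERE s.hicno_enc = patients.hicnoenc)
        WHERE EXISTS (SELECT 1 FROM tmpimport s
                      WHERE s.hicno_enc = patients.hicnoenc
                        AND s.decesed <> patients.isdeceased)""")
    # WHEN NOT MATCHED (by target) THEN INSERT
    conn.execute("""
        INSERT INTO patients (hicnoenc, firstname, isdeceased)
        SELECT s.hicno_enc, s.FIRST_NAME, s.decesed FROM tmpimport s
        WHERE NOT EXISTS (SELECT 1 FROM patients p
                          WHERE p.hicnoenc = s.hicno_enc)""")
    # WHEN NOT MATCHED BY SOURCE THEN UPDATE SET active = 0
    conn.execute("""
        UPDATE patients SET active = 0
        WHERE NOT EXISTS (SELECT 1 FROM tmpimport s
                          WHERE s.hicno_enc = patients.hicnoenc)""")

result = conn.execute(
    "SELECT hicnoenc, isdeceased, active FROM patients ORDER BY hicnoenc"
).fetchall()
print(result)  # [('A1', 1, 1), ('B2', 0, 0), ('C3', 0, 1)]
```

A1 is updated, C3 is inserted, and B2, which is missing from the import, is marked inactive. In SQL Server a single MERGE does all of this in one statement.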

write a string based on mysql query

If I have a query that returns first and last name, how would I go about combining them into 1 field?
I want to be able to select an id - then the first and last names are found based on that id, and then they are concatenated for the final returned string.
I am working on a complicated query that returns many columns. In the past I've run additional queries afterwards and replaced some of the values in PHP, but I would like to have it all work in the original query.
Right now I'm doing something similar to this:
SELECT id From....etc
Then, in PHP, I run this query and replace the id with the returned first and last name:
SELECT lastName, firstName FROM people, patient WHERE idpatient = $data AND people_id = id
The result is then changed from:
id:1 ---> id: lastname, firstname
Is there a way to combine both of these queries into one?
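For the first part of your question, use CONCAT, with a separator so the names don't run together:
SELECT CONCAT( lastName, ', ', firstName ) AS full_name
FROM people, patient
WHERE idpatient = $data AND people_id = id
You can also rewrite this query using explicit JOINs:
SELECT CONCAT( lastName, ', ', firstName ) AS full_name
FROM people
INNER JOIN patient ON people_id = id
WHERE idpatient = $data
if you're only interested in rows in people that have an entry in patient, or
SELECT CONCAT( lastName, ', ', firstName ) AS full_name
FROM people
LEFT JOIN patient ON people_id = id
WHERE idpatient = $data
if you're also interested in rows in people that might not have an entry in patient. Note that if idpatient is a column of patient, the WHERE condition discards the NULL rows that the LEFT JOIN adds, so this query behaves like the INNER JOIN one; move that condition into the ON clause if you need to keep people without a patient entry.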
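A minimal sketch of the single-query lookup, assuming Python's built-in sqlite3 module and a hypothetical schema people(id, firstName, lastName) and patient(idpatient, people_id). SQLite writes concatenation as || where MySQL uses CONCAT, and the id is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3

# Hypothetical schema and rows inferred from the question's column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people  (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE patient (idpatient INTEGER PRIMARY KEY, people_id INTEGER);
INSERT INTO people VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Roe');
INSERT INTO patient VALUES (7, 1);
""")

def full_name(patient_id):
    # MySQL equivalent of the expression: CONCAT(p.lastName, ', ', p.firstName)
    row = conn.execute("""
        SELECT p.lastName || ', ' || p.firstName AS full_name
        FROM people p
        INNER JOIN patient pt ON pt.people_id = p.id
        WHERE pt.idpatient = ?""", (patient_id,)).fetchone()
    return row[0] if row else None

print(full_name(7))  # Doe, John

# LEFT JOIN without a WHERE filter keeps people who have no patient row.
everyone = conn.execute("""
    SELECT p.lastName || ', ' || p.firstName, pt.idpatient
    FROM people p LEFT JOIN patient pt ON pt.people_id = p.id
    ORDER BY p.id""").fetchall()
print(everyone)  # [('Doe, John', 7), ('Roe, Jane', None)]
```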

Dynamic query string

I want to add some dynamic content to the FROM clause based on the value of one particular column. Is it possible?
For example:
SELECT BILL.BILL_NO AS BILLNO,
IF(BILL.PATIENT_ID IS NULL,"CUS.CUSTOMERNAME AS NAME","PAT.PATIENTNAME AS NAME")
FROM
BILL_PATIENT_BILL AS BILL
LEFT JOIN IF(BILL.PATIENT_ID IS NULL," RT_TICKET_CUSTOMER AS CUS ON BILL.CUSTOMER_ID=CUS.ID"," RT_TICKET_PATIENT AS PAT ON BILL.PATIENT_ID=PAT.ID")
But this query does not work.
Here, BILL_PATIENT_BILL is a common table.
Each record has either a PATIENT_ID or a CUSTOMER_ID. If a record has a PATIENT_ID, I want PATIENTNAME from RT_TICKET_PATIENT as NAME; otherwise it holds a CUSTOMER_ID, and I want CUSTOMERNAME from RT_TICKET_CUSTOMER as NAME.
I am sure that every BILL_PATIENT_BILL record has either a PATIENT_ID or a CUSTOMER_ID.
Can anyone help me?
You can join both tables and use IF() to pick the right value for each row, instead of constructing your query from strings:
SELECT
BILL.BILL_NO AS BILLNO,
IF( BILL.PATIENT_ID IS NULL, cus.CUSTOMERNAME, pat.PATIENTNAME ) AS NAME
FROM
BILL_PATIENT_BILL AS BILL
LEFT JOIN RT_TICKET_CUSTOMER cus ON BILL.CUSTOMER_ID = cus.ID
LEFT JOIN RT_TICKET_PATIENT pat ON BILL.PATIENT_ID = pat.ID
It would also be possible to PREPARE a statement from a string and EXECUTE it, but this technique is prone to SQL injection, so I advise against it.
Read more here: Is it possible to execute a string in MySQL?
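The two-LEFT-JOIN approach from the answer can be checked with a short sketch. It assumes Python's built-in sqlite3 module, so MySQL's IF() is written as the equivalent CASE expression, and the bills and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RT_TICKET_CUSTOMER (ID INTEGER PRIMARY KEY, CUSTOMERNAME TEXT);
CREATE TABLE RT_TICKET_PATIENT  (ID INTEGER PRIMARY KEY, PATIENTNAME TEXT);
CREATE TABLE BILL_PATIENT_BILL  (BILL_NO INTEGER PRIMARY KEY,
                                 PATIENT_ID INTEGER, CUSTOMER_ID INTEGER);
INSERT INTO RT_TICKET_CUSTOMER VALUES (1, 'Walk-in Co');
INSERT INTO RT_TICKET_PATIENT  VALUES (5, 'Mary Major');
INSERT INTO BILL_PATIENT_BILL  VALUES (100, 5, NULL), (101, NULL, 1);
""")

# Both lookup tables are LEFT JOINed for every bill; the CASE (MySQL: IF())
# then picks which joined column to return per row, so no dynamic SQL is needed.
rows = conn.execute("""
SELECT BILL.BILL_NO AS BILLNO,
       CASE WHEN BILL.PATIENT_ID IS NULL THEN cus.CUSTOMERNAME
            ELSE pat.PATIENTNAME END AS NAME
FROM BILL_PATIENT_BILL AS BILL
LEFT JOIN RT_TICKET_CUSTOMER cus ON BILL.CUSTOMER_ID = cus.ID
LEFT JOIN RT_TICKET_PATIENT  pat ON BILL.PATIENT_ID  = pat.ID
ORDER BY BILL.BILL_NO
""").fetchall()
print(rows)  # [(100, 'Mary Major'), (101, 'Walk-in Co')]
```

Since each bill has exactly one of the two IDs set, COALESCE(pat.PATIENTNAME, cus.CUSTOMERNAME) would return the same result here.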