At least one X but no Ys Query - mysql

I come across this pattern occasionally and I haven't found a terribly satisfactory way to solve it.
Say I have an employee table and a review table. Each employee can have more than one review. I want to find all the employees who have at least one "good" review but no "bad" reviews.
I haven't figured out how to make subselects work without knowing the employee ID beforehand, and I haven't figured out the right combination of joins to make this happen.
Is there a way to do this WITHOUT stored procedures, functions or bringing the data server side? I've gotten it to work with those, but I'm sure there's another way.

Since you haven't posted your DB structure, I made some assumptions and simplifications (regarding the rating column, which is probably a number and not a character field). Adjust accordingly.
Solution 1: Using Joins
select distinct e.EmployeeId, e.Name
from employee e
left join reviews r1 on e.EmployeeId = r1.EmployeeId and r1.rating = 'good'
left join reviews r2 on e.EmployeeId = r2.EmployeeId and r2.rating = 'bad'
where r1.ReviewId is not null -- meaning there's at least one good review
and r2.ReviewId is null -- meaning there's no bad review
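A minimal sketch of Solution 1 against an in-memory SQLite database, assuming SQLite behaves like MySQL for this query (it does for plain LEFT JOINs). The sample rows are hypothetical. It also shows why the second join has to filter on its own alias: if it tests `r1.rating` instead, its ON clause can never match, so an employee with a bad review slips through.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; the table
# layout and the sample rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE reviews (ReviewId INTEGER PRIMARY KEY, EmployeeId INTEGER, rating TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
-- Ann: two good; Bob: good and bad; Cid: bad only; Dee: no reviews
INSERT INTO reviews (EmployeeId, rating) VALUES
  (1, 'good'), (1, 'good'), (2, 'good'), (2, 'bad'), (3, 'bad');
""")

# Solution 1: each LEFT JOIN filters its own alias (r1 for good, r2 for bad).
solution_1 = """
SELECT DISTINCT e.EmployeeId, e.Name
FROM employee e
LEFT JOIN reviews r1 ON e.EmployeeId = r1.EmployeeId AND r1.rating = 'good'
LEFT JOIN reviews r2 ON e.EmployeeId = r2.EmployeeId AND r2.rating = 'bad'
WHERE r1.ReviewId IS NOT NULL  -- at least one good review
  AND r2.ReviewId IS NULL      -- no bad review
"""
rows = conn.execute(solution_1 + "ORDER BY e.EmployeeId").fetchall()
print(rows)  # [(1, 'Ann')]

# If the second join tests r1.rating instead of r2.rating, its ON clause can
# never be true (r1.rating is 'good' or NULL), so r2 is always NULL and Bob
# is returned despite his bad review.
buggy = solution_1.replace("r2.rating = 'bad'", "r1.rating = 'bad'")
buggy_rows = conn.execute(buggy + "ORDER BY e.EmployeeId").fetchall()
print(buggy_rows)  # [(1, 'Ann'), (2, 'Bob')]
```

The DISTINCT matters here: Ann has two good reviews, so the join produces two rows for her before deduplication.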
Solution 2: Grouping By and Filtering with Conditional Count
select e.EmployeeId, max(e.Name) Name
from employee e
left join reviews r on e.EmployeeId = r.EmployeeId
group by e.EmployeeId
having count(case r.rating when 'good' then 1 else null end) > 0
and count(case r.rating when 'bad' then 1 else null end) = 0
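A small sketch of Solution 2 on SQLite (standing in for MySQL; the rows are hypothetical). The first query prints the per-employee conditional counts so you can see what the HAVING clause filters on: COUNT ignores NULLs, so a CASE without an ELSE branch (equivalent to `else null`) counts only the matching rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE reviews (ReviewId INTEGER PRIMARY KEY, EmployeeId INTEGER, rating TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
INSERT INTO reviews (EmployeeId, rating) VALUES
  (1, 'good'), (1, 'good'), (2, 'good'), (2, 'bad'), (3, 'bad');
""")

# COUNT skips NULLs, so a CASE with no ELSE branch counts only matching rows.
per_employee = conn.execute("""
SELECT e.EmployeeId,
       COUNT(CASE r.rating WHEN 'good' THEN 1 END) AS good,
       COUNT(CASE r.rating WHEN 'bad' THEN 1 END) AS bad
FROM employee e
LEFT JOIN reviews r ON e.EmployeeId = r.EmployeeId
GROUP BY e.EmployeeId
ORDER BY e.EmployeeId
""").fetchall()
print(per_employee)  # [(1, 2, 0), (2, 1, 1), (3, 0, 1), (4, 0, 0)]

# Solution 2: keep only the groups with good > 0 and bad = 0.
rows = conn.execute("""
SELECT e.EmployeeId, MAX(e.Name) AS Name
FROM employee e
LEFT JOIN reviews r ON e.EmployeeId = r.EmployeeId
GROUP BY e.EmployeeId
HAVING COUNT(CASE r.rating WHEN 'good' THEN 1 END) > 0
   AND COUNT(CASE r.rating WHEN 'bad' THEN 1 END) = 0
""").fetchall()
print(rows)  # [(1, 'Ann')]
```

Dee, who has no reviews at all, still appears in the per-employee counts (thanks to the LEFT JOIN) with 0 good reviews, which is why she is filtered out.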
Both solutions are ANSI SQL compatible, which means both work with any RDBMS that supports the ANSI SQL standard (which is true for most RDBMSs).
As pointed out by onedaywhen, the code will not work in MS Access (I have not tested this; I'm trusting his expertise on the subject).
But I have one thing to say on this (which might upset some people): I hardly consider MS Access an RDBMS. I have worked with it in the past. Once you move on (Oracle, SQL Server, Firebird, PostgreSQL, MySQL, you name it), you never want to come back. Seriously.

The question (return rows on side A based on the nonexistence of a match in B, i.e. employees with no "bad" reviews) describes an "anti-semi-join". There are numerous ways to write this kind of query; I've found at least 5 in SQL Server 2005 and above.
I know this solution works in SQL Server 2000 and above, and it was the most efficient of the 5 ways I tried in SQL Server 2005 and 2008. I am not sure whether it works in MySQL, but it should, as it only uses standard correlated subqueries.
Note that each subquery is correlated: it references the employee table (EE) from the outer query.
SELECT EE.*
FROM employee EE
WHERE
EE.EmpKey IN (
SELECT RR.EmpKey
FROM review RR
WHERE RR.EmpKey = EE.EmpKey
AND RR.ScoreCategory = 'good'
)
AND
EE.EmpKey NOT IN (
SELECT RR.EmpKey
FROM review RR
WHERE RR.EmpKey = EE.EmpKey
AND RR.ScoreCategory = 'bad'
)
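For comparison, here is a sketch that runs the IN / NOT IN query above next to the same logic written with EXISTS / NOT EXISTS (a swapped-in formulation for illustration, not the answer's own). SQLite stands in for the real database, and the rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server/MySQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EmpKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE review (EmpKey INTEGER, ScoreCategory TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
INSERT INTO review VALUES
  (1, 'good'), (1, 'good'), (2, 'good'), (2, 'bad'), (3, 'bad');
""")

# Semi-join (EXISTS) for "at least one good", anti-semi-join (NOT EXISTS) for "no bad".
exists_rows = conn.execute("""
SELECT EE.EmpKey, EE.Name
FROM employee EE
WHERE EXISTS (SELECT 1 FROM review RR
              WHERE RR.EmpKey = EE.EmpKey AND RR.ScoreCategory = 'good')
  AND NOT EXISTS (SELECT 1 FROM review RR
                  WHERE RR.EmpKey = EE.EmpKey AND RR.ScoreCategory = 'bad')
ORDER BY EE.EmpKey
""").fetchall()

# The IN / NOT IN form from the answer above.
in_rows = conn.execute("""
SELECT EE.EmpKey, EE.Name
FROM employee EE
WHERE EE.EmpKey IN (SELECT RR.EmpKey FROM review RR
                    WHERE RR.EmpKey = EE.EmpKey AND RR.ScoreCategory = 'good')
  AND EE.EmpKey NOT IN (SELECT RR.EmpKey FROM review RR
                        WHERE RR.EmpKey = EE.EmpKey AND RR.ScoreCategory = 'bad')
ORDER BY EE.EmpKey
""").fetchall()
print(exists_rows, in_rows)  # [(1, 'Ann')] [(1, 'Ann')]
```

One reason some people prefer NOT EXISTS in general: if an uncorrelated NOT IN subquery returns a NULL, the NOT IN predicate is never true and the query returns no rows. In the correlated form above that cannot happen, because `RR.EmpKey = EE.EmpKey` already excludes NULL keys.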

It's possible. The particular syntax depends on how you store 'good' and 'bad' reviews.
Suppose you had a classification column in review that had values 'good' and 'bad'.
Then you could do:
SELECT employee.*
FROM employee
JOIN review
ON employee.id=review.employee_id
GROUP BY employee.id
HAVING SUM(IF(classification='good',1,0))>0 -- count up #good reviews, > 0
AND SUM(IF(classification='bad',1,0))=0 -- count up #bad reviews, = 0.
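A runnable sketch of the conditional-sum approach, assuming SQLite as a stand-in for MySQL, so MySQL's `IF(cond,1,0)` is written as the portable `CASE WHEN cond THEN 1 ELSE 0 END`. The threshold is a parameter here only to show the difference between `> 0` ("at least one") and `> 1` ("at least two"); the rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, so IF(cond, 1, 0) becomes
# CASE WHEN cond THEN 1 ELSE 0 END. Rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE review (employee_id INTEGER, classification TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee'), (5, 'Eve');
INSERT INTO review VALUES
  (1, 'good'), (1, 'good'), (2, 'good'), (2, 'bad'), (3, 'bad'), (5, 'good');
""")

def good_but_no_bad(min_good):
    """Employees with more than min_good good reviews and no bad ones."""
    return conn.execute("""
    SELECT employee.id, employee.name
    FROM employee
    JOIN review ON employee.id = review.employee_id
    GROUP BY employee.id, employee.name
    HAVING SUM(CASE WHEN classification = 'good' THEN 1 ELSE 0 END) > ?
       AND SUM(CASE WHEN classification = 'bad' THEN 1 ELSE 0 END) = 0
    ORDER BY employee.id
    """, (min_good,)).fetchall()

print(good_but_no_bad(0))  # [(1, 'Ann'), (5, 'Eve')]: at least one good review
print(good_but_no_bad(1))  # [(1, 'Ann')]: "> 1" demands at least two
```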

SELECT ???? FROM employee, review
WHERE employee.id = review.employee_id
GROUP BY employee.id
HAVING SUM(IF(review='good',1,0)) > 0 AND SUM(IF(review='bad',1,0)) = 0

Related

Getting count from joined table

Here's my problem: I need to get the number of test cases and issues associated with a project that meet certain conditions (test cases that are successful, and issues that are flaws of the application), but for some reason the counts don't add up. I have 10 test cases in a project, of which 6 are successful, and 8 issues, of which only 4 are flaws. However, both COUNTs return 24, which makes no sense. I did notice that 24 happens to be 6 times 4, but I don't see how the query would multiply them.
Anyway... Can someone help me find which part of my query is wrong? How can I get the correct result? Thanks in advance.
Here's the query:
SELECT
p.codigo_proyecto,
p.nombre,
IFNULL(COUNT(iep.id_incidencia_etapa_proyecto), 0) AS cantidad_defectos,
IFNULL(COUNT(tc.id_test_case), 0) AS test_cases_exitosos,
CASE IFNULL(COUNT(tc.id_test_case), 0) WHEN 0 THEN 'No aplica'
ELSE CONCAT((IFNULL(COUNT(tc.id_test_case), 0) / IFNULL(COUNT(tc.id_test_case), 0)) * 100, '%') END AS tasa_defectos
FROM proyecto p
INNER JOIN etapa_proyecto ep ON p.codigo_proyecto = ep.codigo_proyecto
INNER JOIN incidencia_etapa_proyecto iep ON ep.id_etapa_proyecto = iep.id_etapa_proyecto
INNER JOIN incidencia i ON iep.id_incidencia = i.id_incidencia
INNER JOIN test_case tc ON ep.id_etapa_proyecto = tc.id_etapa_proyecto
INNER JOIN etapa_proyecto ep_ultima ON ep_ultima.id_etapa_proyecto =
(SELECT ep_ultima2.id_etapa_proyecto FROM etapa_proyecto ep_ultima2
WHERE p.codigo_proyecto = ep_ultima2.codigo_proyecto ORDER BY ep_ultima2.fecha_termino_real DESC LIMIT 1)
WHERE p.esta_cerrado = 1
AND i.es_defecto = 1
AND tc.resultado = 'Exitoso'
AND ep_ultima.fecha_termino_real BETWEEN '2015-01-01' AND '2016-12-31';
I would have thought it obvious that you're not going to get the expected output from an aggregate query without a GROUP BY (which suggests you're not really in a position to evaluate any advice given here effectively).
You've not said how the states of your data are represented in the database, so I'm having to make a lot of guesses based on SQL which is clearly very wrong. And I don't speak Spanish/Portuguese or whatever your native language is.
It looks like you are inferring that a defect exists if the primary key of the defects table is null. Primary keys cannot be null. The only way this would make any sort of sense (and it still won't give you the answer you're looking for) is to use a LEFT JOIN rather than an INNER JOIN.
But even then, counting rows with COUNT(*) treats a null case (no record in the source table) as 1 record in the output set; only COUNT(column) skips the NULLs.
Then you've got the problem that you will have the product of defects and test cases in your output. Consider the case where you have no defects but 2 test cases (1, 2); the result of an outer join will be:
defect  test
------  ----
null    1
null    2
If you just count the rows, you'll get 2 defects in your output.
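A minimal sketch reproducing the 24 from the question, assuming SQLite as a stand-in for MySQL and hypothetical simplified tables in which the 4 flaws and 6 successful test cases are already filtered. Joining two independent child tables to the same parent pairs every defect with every test case; the second query reproduces the outer-join case above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; simplified, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY);
CREATE TABLE defect (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE test_case (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO project VALUES (1), (2);
""")
# Project 1: 4 defects (flaws) and 6 successful test cases.
conn.executemany("INSERT INTO defect (project_id) VALUES (?)", [(1,)] * 4)
conn.executemany("INSERT INTO test_case (project_id) VALUES (?)", [(1,)] * 6)
# Project 2: no defects and 2 test cases.
conn.executemany("INSERT INTO test_case (project_id) VALUES (?)", [(2,)] * 2)

# Every defect row is paired with every test case row: 4 * 6 = 24 rows.
fanned_out = conn.execute("""
SELECT COUNT(d.id), COUNT(t.id)
FROM project p
JOIN defect d ON d.project_id = p.id
JOIN test_case t ON t.project_id = p.id
WHERE p.id = 1
""").fetchone()
print(fanned_out)  # (24, 24)

# LEFT JOIN with no defects: COUNT(*) counts the two null-extended rows,
# COUNT(d.id) skips them.
outer = conn.execute("""
SELECT COUNT(*), COUNT(d.id)
FROM project p
LEFT JOIN defect d ON d.project_id = p.id
JOIN test_case t ON t.project_id = p.id
WHERE p.id = 2
""").fetchone()
print(outer)  # (2, 0)
```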
Taking a simpler schema, this demonstrates the 2 methods for getting the values - note that they have very different performance characteristics.
SELECT project.id
, COALESCE(dilv.defects, 0) AS defects
, (SELECT COUNT(*)
FROM test_cases
WHERE test_cases.project_id = project.id) AS tests
FROM project
LEFT JOIN ( SELECT project_id, COUNT(*) AS defects
FROM defect_table
GROUP BY project_id) AS dilv
ON project.id=dilv.project_id
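A sketch checking that counting each child table separately gives independent per-project totals. This assumes SQLite as a stand-in for MySQL, that `test_cases` has a `project_id` column like `defect_table` does, and hypothetical rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY);
CREATE TABLE defect_table (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE test_cases (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO project VALUES (1), (2), (3);
""")
conn.executemany("INSERT INTO defect_table (project_id) VALUES (?)", [(1,)] * 4)
conn.executemany("INSERT INTO test_cases (project_id) VALUES (?)",
                 [(1,)] * 6 + [(2,)] * 2)

# Defects via a grouped derived table, tests via a correlated subquery;
# neither count is multiplied by the other.
rows = conn.execute("""
SELECT project.id,
       COALESCE(dilv.defects, 0) AS defects,
       (SELECT COUNT(*) FROM test_cases
         WHERE test_cases.project_id = project.id) AS tests
FROM project
LEFT JOIN (SELECT project_id, COUNT(*) AS defects
           FROM defect_table
           GROUP BY project_id) AS dilv
       ON project.id = dilv.project_id
ORDER BY project.id
""").fetchall()
print(rows)  # [(1, 4, 6), (2, 0, 2), (3, 0, 0)]
```

The COALESCE turns the NULL that the LEFT JOIN produces for projects without defects into 0.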

Getting stuck doing a complicated SQL query for patent research purposes

I am trying to gather data for a research study for my university thesis. Unfortunately I am not a computer science or programming expert and do not have any SQL experience.
For my thesis I need to write a SQL query answering the question: "Give me all patents of a company X where there is more than one applicant (other company) in a specific time span". The data I want to extract is stored in a database called PATSTAT (where I have a 1-month trial), which uses (don't be surprised) SQL.
I tried a lot of queries but all the time I am getting different syntax errors.
This is what the interface looks like:
http://www10.pic-upload.de/07.07.13/7u5bqf7jsow.png
I think I have a really good understanding of what (also from an SQL POV) needs to be done but I cannot execute it.
My idea: as the result, I want the names of the companies (with reference to the company entered below):
SELECT person_name from tls206_person table
Now because I need a criteria like
WHERE nb_applicants > 1 from tls201_appln table
I need to join these two tables, tls206 and tls201. I read a brief introduction to SQL (provided by the European Patent Office), and because the two tables have no common "reference key", we need to use the table tls207_pers_appln as an "intermediate", so to speak. That's the point where I am getting stuck. I tried the following, but it is not working:
SELECT person_name, tls201_appln.nb_applicants
FROM tls206_person
INNER JOIN tls207_pers_appln ON tls206_person.person_id= tls207_pers_appln.person_id
INNER JOIN tls207_pers_appln ON tls201_appln.appln_id=tls201_appln.appln_id
WHERE person_name = "%Samsung%"
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = "PI"
I get the following error: "0:37:11 [SELECT - 0 row(s), 0 secs] [Error Code: 1064, SQL State: 0] Not unique table/alias: 'tls207_pers_appln'"
I think for just 4 hours of SQL my approach is not too bad, but I really need some guidance on how to proceed because I am not making any progress.
Ideally I would like to count, for every company and for every row respectively, how many "nb_applicants" were found.
If you need further information for giving me guidance, just let me know.
Looking forward to your answers.
Best regards
Kendels
Another way of doing the same thing, which you might find easier to understand (if you are new to SQL, it is impressive you have got so far), is:
SELECT tls206_person.person_name, tls201_appln.nb_applicants
FROM tls206_person, tls207_pers_appln, tls201_appln
WHERE tls206_person.person_id = tls207_pers_appln.person_id
AND tls201_appln.appln_id = tls207_pers_appln.appln_id
AND tls206_person.person_name LIKE '%Samsung%'
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = 'PI'
(It's equivalent to the other answer, but instead of using the JOIN syntax, you just write all the join conditions in the WHERE clause and the database works out the joins. This is usually called the "implicit" or "old-style" (SQL-89) join syntax, if you want to search for more info; explicit JOIN ... ON is the newer SQL-92 form. Virtually every database still supports the comma syntax.)
You are referencing the table tls201_appln, but it is not in the from clause. I am guessing that the second reference to tls207_pers_appln should be to the other table:
SELECT person_name, tls201_appln.nb_applicants
FROM tls206_person
INNER JOIN tls207_pers_appln ON tls206_person.person_id = tls207_pers_appln.person_id
INNER JOIN tls201_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
WHERE person_name LIKE '%Samsung%'
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = 'PI'
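A sketch of the person → link table → application chain, plus the per-company count asked for in the question. It assumes SQLite as a stand-in for PATSTAT and uses hypothetical, heavily trimmed versions of the three tables (only the columns named in the question) with invented rows.

```python
import sqlite3

# Assumption: SQLite stands in for PATSTAT; these are hypothetical, trimmed
# versions of the three PATSTAT tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY,
                           nb_applicants INTEGER, ipr_type TEXT);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER);
INSERT INTO tls206_person VALUES
  (1, 'Samsung Electronics'), (2, 'Samsung SDI'), (3, 'Acme');
INSERT INTO tls201_appln VALUES
  (10, 2, 'PI'), (11, 1, 'PI'), (12, 3, 'PI'), (13, 2, 'UM');
INSERT INTO tls207_pers_appln VALUES
  (1, 10), (3, 10), (1, 11), (1, 12), (2, 12), (3, 12), (2, 13), (3, 13);
""")

FILTERED_JOIN = """
FROM tls206_person p
INNER JOIN tls207_pers_appln pa ON pa.person_id = p.person_id
INNER JOIN tls201_appln a ON a.appln_id = pa.appln_id
WHERE p.person_name LIKE '%Samsung%'
  AND a.nb_applicants > 1
  AND a.ipr_type = 'PI'
"""

# One row per (company, application) pair.
rows = conn.execute("SELECT p.person_name, a.appln_id, a.nb_applicants"
                    + FILTERED_JOIN
                    + "ORDER BY p.person_name, a.appln_id").fetchall()

# One row per company with the number of matching applications.
per_company = conn.execute("SELECT p.person_name, COUNT(*) AS applications"
                           + FILTERED_JOIN
                           + "GROUP BY p.person_name ORDER BY p.person_name").fetchall()
print(rows)
print(per_company)  # [('Samsung Electronics', 2), ('Samsung SDI', 1)]
```

Each INNER JOIN names a different table, which is what avoids the "Not unique table/alias" error. Whether LIKE is case-insensitive depends on the database and collation (it is for ASCII in SQLite).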
For my thesis I need to do a SQL query answering the question: "Give me all patents of a company X where there is more than one applicant (other company) in a specific time span".
Let me rephrase that for you:
SELECT * FROM patents p -- : "Give me all patents
WHERE p.company = 'X' -- of a company X
AND EXISTS ( -- where there is
SELECT *
FROM applicants x1
WHERE x1.patent_id = p.patent_id
AND x1.company <> 'X' -- another company:: exclude ourselves
AND x1.application_date >= $begin_date -- in a specific time span
AND x1.application_date < $end_date
-- more than one applicant (other company)
-- To avoid aggregation: Just repeat the same subquery
AND EXISTS ( -- where there is
SELECT *
FROM applicants x2
WHERE x2.patent_id = p.patent_id
AND x2.company <> 'X' -- another company:: exclude ourselves
AND x2.company <> x1.company -- :: exclude other other company, too
AND x2.application_date >= $begin_date -- in a specific time span
AND x2.application_date < $end_date
)
)
;
[Note: Since the OP did not give any table definitions, I had to invent these]
This is not the perfect query, but it does express your intentions. Given sane keys/indexes it will perform reasonably, too.
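A sketch comparing the nested-EXISTS query with an aggregate version (`COUNT(DISTINCT ...)`, swapped in for comparison; the answer deliberately avoids aggregation). It uses the same invented `patents` / `applicants` tables, SQLite as a stand-in, hypothetical rows, and named parameters in place of `$begin_date` / `$end_date`. Dates are ISO strings, so text comparison orders them correctly.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; "patents" and
# "applicants" are invented tables, and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patents (patent_id INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE applicants (patent_id INTEGER, company TEXT, application_date TEXT);
INSERT INTO patents VALUES (1, 'X'), (2, 'X'), (3, 'X'), (4, 'Y');
INSERT INTO applicants VALUES
  (1, 'X', '2012-03-01'), (1, 'A', '2012-03-01'), (1, 'B', '2012-05-01'),
  (2, 'A', '2012-03-01'), (2, 'A', '2012-06-01'),
  (3, 'A', '2012-03-01'), (3, 'B', '2014-01-01'),
  (4, 'A', '2012-03-01'), (4, 'B', '2012-03-01');
""")
params = {"company": "X", "start_date": "2012-01-01", "end_date": "2013-01-01"}

# Aggregate version: count distinct other companies per patent in the window.
aggregate = conn.execute("""
SELECT p.patent_id
FROM patents p
JOIN applicants a ON a.patent_id = p.patent_id
WHERE p.company = :company
  AND a.company <> :company
  AND a.application_date >= :start_date
  AND a.application_date < :end_date
GROUP BY p.patent_id
HAVING COUNT(DISTINCT a.company) > 1
ORDER BY p.patent_id
""", params).fetchall()

# Nested EXISTS version from the answer, with the placeholders bound.
nested = conn.execute("""
SELECT p.patent_id
FROM patents p
WHERE p.company = :company
  AND EXISTS (
    SELECT 1 FROM applicants x1
    WHERE x1.patent_id = p.patent_id
      AND x1.company <> :company
      AND x1.application_date >= :start_date
      AND x1.application_date < :end_date
      AND EXISTS (
        SELECT 1 FROM applicants x2
        WHERE x2.patent_id = p.patent_id
          AND x2.company <> :company
          AND x2.company <> x1.company
          AND x2.application_date >= :start_date
          AND x2.application_date < :end_date))
ORDER BY p.patent_id
""", params).fetchall()
print(aggregate, nested)  # [(1,)] [(1,)]
```

Patent 2 fails because its only other applicant is A (twice), patent 3 because B applied outside the window, and patent 4 because it belongs to company Y.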

mysql query finding results using joined status table that needs EXIST and NOT EXIST results

I have been looking around for ages for a solution to my problem.
I have something that works, but I am not sure it is the most efficient way of doing things, and I can't find anyone trying to do this when googling around.
I have a table with customers and a table with the statuses each customer has had.
If I want to find customers who have had a given status, I have managed to get the required results using a join, but sometimes I want to find clients where not only has a status been reached but also where a few other statuses haven't been.
Currently I am doing this with a NOT EXISTS subquery, but it seems a bit slow. Thinking about it, if after finding a result that matches the first status I have to check through all the results again to see whether it doesn't match another, that could explain the slowness.
For instance, a client could have a status of invoiced and a status of paid.
If I want to see which clients have been invoiced, that's fine. If I want to see which clients have been invoiced and paid, that's fine. But if I want to see which clients have been invoiced but NOT paid, that's where I start having to use a NOT EXISTS subquery.
Is there a more efficient way around this? Or is this the best way to proceed, and I need to sort out how MySQL uses indexes with these tables to make it more efficient?
I can provide more detail of the actual SQL if that helps.
Thanks
Matt
If this is over multiple clients then the usual solution would be to have a subselect for the status per client and then use LEFT OUTER JOIN to connect this.
Something like
SELECT *
FROM Clients a
LEFT OUTER JOIN (SELECT ClientId, COUNT(*) FROM ClientsStatus WHERE Status IN (1,2) GROUP BY ClientId) b
ON a.ClientId = b.ClientId
WHERE b.ClientId IS NULL
This (very rough) example is to give you a list of clients who do not have a status of 1 or 2.
You should be able to expand this basic idea to cover the scenarios/data you are dealing with.
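Expanding the idea to the "invoiced but NOT paid" case from the question might look like the sketch below: inner join to the clients that have the wanted status, LEFT JOIN to those with the unwanted status, and keep the non-matches. SQLite stands in for MySQL, and the text status values and rows are hypothetical (the real schema uses numeric status ids).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; status values are hypothetical
# strings rather than numeric ids, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (ClientId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ClientsStatus (ClientId INTEGER, Status TEXT);
INSERT INTO Clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
INSERT INTO ClientsStatus VALUES
  (1, 'invoiced'), (1, 'paid'),
  (2, 'invoiced'), (2, 'invoiced'),
  (4, 'paid');
""")

# DISTINCT keeps a client who was invoiced twice from appearing twice.
rows = conn.execute("""
SELECT a.ClientId, a.Name
FROM Clients a
JOIN (SELECT DISTINCT ClientId FROM ClientsStatus
      WHERE Status = 'invoiced') inv ON inv.ClientId = a.ClientId
LEFT OUTER JOIN (SELECT ClientId FROM ClientsStatus
                 WHERE Status = 'paid' GROUP BY ClientId) paid
       ON paid.ClientId = a.ClientId
WHERE paid.ClientId IS NULL
ORDER BY a.ClientId
""").fetchall()
print(rows)  # [(2, 'Bob')]
```

Whichever form you use (this one or NOT EXISTS), an index covering the status table's client id and status columns is likely to matter more for speed than the choice of syntax; check with EXPLAIN on your own data.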
Edit, based on the SQL you posted:
I have had a play with your SQL. I think you can use a JOIN onto the subselect fairly easily, but this doesn't seem to be checking anything other than whether a claim has had a status of 3 or 95.
SELECT claims.ID, claims.vat_rate, claims.com_rate,
claims.offer_val, claims.claim_value, claims.claim_ppi_amount, claims.claim_amount, claims.approx_loan_val, claims.salutationsa, claims.first_namesa, claims.last_namesa,
clients.salutation, clients.first_name,clients.last_name, clients.phone, clients.phone2, clients.mobile, clients.dob,clients.postcode, clients.address1, clients.address2, clients.town, client_claim_status.person,clients.ID
AS client_id,claims.ID AS claim_id, claims.date_added AS status_date_added,client_claim_status.date_added AS last_client_claim_status_date_added,work_suppliers.name AS refname, financial_institution.name AS lendname, clients.date_added AS client_date_added,ppi_claim_type_2.claim_type AS ppi_claim_type_name
FROM claims
RIGHT JOIN clients ON claims.client_id = clients.ID
RIGHT JOIN client_claim_status
ON claims.ID = client_claim_status.claim_id
AND client_claim_status.deleted != 'yes'
AND ((client_claim_status.status_id IN (1, 170))
AND client_claim_status.date_added < '2012-12-02 00:00:00' )
LEFT OUTER JOIN (SELECT claim_id FROM client_claim_status WHERE status_id IN (3, 95 )) Sub1
ON claims.ID = Sub1.claim_id
LEFT JOIN financial_institution ON claims.claim_against = financial_institution.ID
LEFT JOIN work_suppliers ON clients.work_supplier_id = work_suppliers.ID
LEFT JOIN ppi_claim_type_2 ON claims.ppi_claim_type_id = ppi_claim_type_2.ID
WHERE claims.deleted != 'yes'
AND Sub1.claim_id IS NULL
ORDER BY last_client_claim_status_date_added DESC
To be honest, though, I would suggest rearranging the code to remove the RIGHT OUTER JOINs. Mixing left and right joins tends to be very confusing.

union query to combine fields

I don't know SQL, so I am hoping that someone can provide the SQL I can copy and paste to merge all of the different unit price fields into one field called "merged_unit_price". Please note that many of the unit price values are null, so I would prefer that the null values not be merged.
Thank you very much in advance, Nathaniel
SELECT p.ID AS Part_ID,
p.UNIT_PRICE,
d.UNIT_PRICE_1,
d.UNIT_PRICE_2,
d.UNIT_PRICE_3
FROM tbl_local_SYSADM_PART AS p
LEFT JOIN SYSADM_DISCOUNT_PRICE AS d
ON p.ID = d.PART_ID;
First off, make sure your query excludes the Null values from your dataset. In Access SQL, as in standard SQL, test for nulls with IS NULL / IS NOT NULL: a comparison with <> does not work for nulls, and Nothing is a VBA keyword, not SQL.
SELECT p.ID AS Part_ID, p.UNIT_PRICE, d.UNIT_PRICE_1, d.UNIT_PRICE_2, d.UNIT_PRICE_3
FROM tbl_local_SYSADM_PART AS p
LEFT JOIN SYSADM_DISCOUNT_PRICE AS d ON p.ID = d.PART_ID
WHERE p.UNIT_PRICE IS NOT NULL OR d.UNIT_PRICE_1 IS NOT NULL OR d.UNIT_PRICE_2 IS NOT NULL OR d.UNIT_PRICE_3 IS NOT NULL;
As well, I would suggest you learn more about SQL statements in general and in Access, because you have limited yourself to just 4 UNIT_PRICE fields and will eventually have to add more fields to your table. If it were me, I would break this table out into a join table, so you can have multiple Part_IDs and multiple UNIT_PRICEs. Currently you are restricted to one Part_ID and, figuratively, just one UNIT_PRICE (counting the 4 price fields as 1 record).
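The question asks for a single merged_unit_price column, which is what a union query does: one SELECT per price column, stacked with UNION ALL, each dropping its own NULLs. This is a sketch using SQLite as a stand-in for Access (not tested in Access itself). The column names follow the question, and the rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Access; column names follow the question,
# the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_local_SYSADM_PART (ID INTEGER PRIMARY KEY, UNIT_PRICE REAL);
CREATE TABLE SYSADM_DISCOUNT_PRICE (PART_ID INTEGER, UNIT_PRICE_1 REAL,
                                    UNIT_PRICE_2 REAL, UNIT_PRICE_3 REAL);
INSERT INTO tbl_local_SYSADM_PART VALUES (1, 10.0), (2, NULL), (3, 7.5);
INSERT INTO SYSADM_DISCOUNT_PRICE VALUES
  (1, 9.0, NULL, 8.0), (2, NULL, NULL, NULL), (3, 7.0, 6.5, NULL);
""")

# Each branch contributes one price column; NULLs are filtered per branch.
rows = conn.execute("""
SELECT ID AS Part_ID, UNIT_PRICE AS merged_unit_price
FROM tbl_local_SYSADM_PART WHERE UNIT_PRICE IS NOT NULL
UNION ALL
SELECT PART_ID, UNIT_PRICE_1 FROM SYSADM_DISCOUNT_PRICE WHERE UNIT_PRICE_1 IS NOT NULL
UNION ALL
SELECT PART_ID, UNIT_PRICE_2 FROM SYSADM_DISCOUNT_PRICE WHERE UNIT_PRICE_2 IS NOT NULL
UNION ALL
SELECT PART_ID, UNIT_PRICE_3 FROM SYSADM_DISCOUNT_PRICE WHERE UNIT_PRICE_3 IS NOT NULL
ORDER BY 1, 2
""").fetchall()
print(rows)
# [(1, 8.0), (1, 9.0), (1, 10.0), (3, 6.5), (3, 7.0), (3, 7.5)]
```

UNION ALL keeps a price that appears in two columns twice; plain UNION would collapse duplicate (Part_ID, price) pairs. The column names of the result come from the first SELECT.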

MySQL advanced sub query repetition

I'm writing some reasonably complex reporting queries for an app I am developing. I could probably achieve all of the following with higher-level PHP, but I would like to get it all done in MySQL, which will simplify things greatly.
The report I need is a typical sales-report query, which lists people along with some relevant totals for each. The only minor difference is that this system relates to freight/haulage, so the "sales people" are actually lorry drivers, and the "sales" are individual consignments. Also, consignments are only tied to their respective driver through "routes", which record who delivers/collects what on a specific day.
Naturally, I could use an INNER JOIN to get a list of each driver with all of the consignments they have delivered/collected, and SUM the revenue made from these. The problem comes, however, when I need to show both a column for total delivery revenue and one for collection revenue. These figures can come from a consignments table, which lists every consignment. Each consignment has a flag (ENUM "D","C") which denotes whether it is a delivery or a collection. This could be done fairly easily with subqueries, but there would be a lot of repetition.
What I have so far:
SELECT pr.driver_callsign, d.first_name, d.last_name, sum(pc.revenue) AS total_revenue
FROM pallet_routes AS pr
INNER JOIN drivers AS d ON d.driver_callsign = pr.driver_callsign
INNER JOIN pallet_consignments AS pc ON pc.route_id = pr.route_id
GROUP BY pr.driver_callsign
ORDER BY d.driver_callsign ASC
This obviously returns a list of each driver, with the total amount of revenue made from all consignments they have tied to them.
What would be the most efficient way to further split this revenue SUM field up to show a SUM(revenue) WHERE type = "C" and SUM(revenue) WHERE type="D"? Subqueries? UNION?
It may also be worth mentioning that the end query will be narrowed down to a date range. So for example there will be a WHERE date BETWEEN x AND y put against the pallet_routes table.
Any advice would be gratefully received. Please do ask if you want me to elaborate.
I don't know where your column type is, but if it's on pallet_consignments, you can try the following:
SELECT pr.driver_callsign, d.first_name, d.last_name,
SUM(IF(pc.`type` = 'C', pc.revenue, 0)) collection_revenue,
SUM(IF(pc.`type` = 'D', pc.revenue, 0)) delivery_revenue
FROM pallet_routes AS pr
INNER JOIN drivers AS d ON d.driver_callsign = pr.driver_callsign
INNER JOIN pallet_consignments AS pc ON pc.route_id = pr.route_id
GROUP BY pr.driver_callsign
ORDER BY d.driver_callsign ASC
Otherwise, please mention where the column type is.
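A sketch of the conditional-SUM answer with the date-range filter the question mentions. It assumes SQLite as a stand-in for MySQL (so `IF()` becomes `CASE`), that `type` lives on pallet_consignments, a hypothetical `route_date` column on pallet_routes, and invented rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL (CASE instead of IF); route_date and
# all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers (driver_callsign TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE pallet_routes (route_id INTEGER PRIMARY KEY, driver_callsign TEXT, route_date TEXT);
CREATE TABLE pallet_consignments (route_id INTEGER, type TEXT, revenue INTEGER);
INSERT INTO drivers VALUES ('D1', 'Ann', 'Smith'), ('D2', 'Bob', 'Jones');
INSERT INTO pallet_routes VALUES
  (1, 'D1', '2013-01-05'), (2, 'D1', '2013-02-10'), (3, 'D2', '2013-01-07');
INSERT INTO pallet_consignments VALUES
  (1, 'C', 100), (1, 'D', 50), (2, 'D', 30), (3, 'C', 20), (3, 'C', 5);
""")

# One pass over the joined rows; each SUM only adds revenue of its own type.
rows = conn.execute("""
SELECT pr.driver_callsign, d.first_name, d.last_name,
       SUM(CASE WHEN pc.type = 'C' THEN pc.revenue ELSE 0 END) AS collection_revenue,
       SUM(CASE WHEN pc.type = 'D' THEN pc.revenue ELSE 0 END) AS delivery_revenue
FROM pallet_routes AS pr
INNER JOIN drivers AS d ON d.driver_callsign = pr.driver_callsign
INNER JOIN pallet_consignments AS pc ON pc.route_id = pr.route_id
WHERE pr.route_date BETWEEN ? AND ?
GROUP BY pr.driver_callsign, d.first_name, d.last_name
ORDER BY pr.driver_callsign
""", ("2013-01-01", "2013-01-31")).fetchall()
print(rows)  # [('D1', 'Ann', 'Smith', 100, 50), ('D2', 'Bob', 'Jones', 25, 0)]
```

Route 2 (February) falls outside the range, so its delivery revenue of 30 is excluded. Listing every non-aggregated column in GROUP BY keeps the query valid even with MySQL's ONLY_FULL_GROUP_BY mode.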