Left join not returning all results - mysql

I am attempting to join the two tables below to show all the columns for the incident table and just a count of the corresponding records from the tickets table with the incident_id as the same in the incidents table.
As you can see below, none of the tickets have an incident_id assigned yet. The goal of my query is to show all of the records in the incident table with a count of the ticket_ids assigned to that incident. I thought that this would work, but it's returning only one row:
SELECT inc.incident_id, inc.title, inc.date_opened, inc.date_closed, inc.status, inc.description, issue_type, COUNT(ticket_id) as example_count
FROM fin_incidents AS inc
LEFT OUTER JOIN fin_tickets ON inc.incident_id = fin_tickets.incident_id;
What query can I use to return all of the incidents and their count of tickets, even if that count is 0?
Images:
Incident Table
Tickets Table
Result of my query

Your query should not work at all -- and would fail in the more recent versions of MySQL. The reason is that it is missing a GROUP BY clause:
SELECT inc.incident_id, inc.title, inc.date_opened,
inc.date_closed, inc.status, inc.description, inc.issue_type,
COUNT(t.ticket_id) as example_count
FROM fin_incidents inc LEFT OUTER JOIN
fin_tickets t
ON inc.incident_id = t.incident_id
GROUP BY inc.incident_id, inc.title, inc.date_opened,
inc.date_closed, inc.status, inc.description, inc.issue_type
You have an aggregation query with no GROUP BY. Such a query returns exactly one row, even if the tables referred to are empty.
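A minimal sketch of the grouped LEFT JOIN, run through Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable). The table and column names follow the question; the sample rows are hypothetical, and the column list is shortened for brevity.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fin_incidents (incident_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE fin_tickets (ticket_id INTEGER PRIMARY KEY, incident_id INTEGER);
    INSERT INTO fin_incidents VALUES (1, 'Outage'), (2, 'Slow login'), (3, 'Billing error');
    -- Two tickets for incident 1, one unassigned ticket, none for incidents 2 and 3.
    INSERT INTO fin_tickets VALUES (10, 1), (11, 1), (12, NULL);
""")

# COUNT(t.ticket_id) skips the NULLs produced by unmatched LEFT JOIN rows,
# so incidents without tickets report 0 instead of disappearing.
rows = conn.execute("""
    SELECT inc.incident_id, inc.title, COUNT(t.ticket_id) AS example_count
    FROM fin_incidents inc
    LEFT OUTER JOIN fin_tickets t ON inc.incident_id = t.incident_id
    GROUP BY inc.incident_id, inc.title
    ORDER BY inc.incident_id
""").fetchall()

for row in rows:
    print(row)
```

Every incident appears exactly once, and the unassigned ticket is not counted for any of them.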

Your code is not a valid aggregation query. You have an aggregate function in the SELECT clause (the COUNT()), but no GROUP BY clause. When this is executed with SQL mode ONLY_FULL_GROUP_BY disabled, MySQL gives you a single row with an overall count of the tickets that are related to an incident, and arbitrary values from one incident row. If that SQL mode were enabled, you would get an error instead.
I find that the logic you want is simpler expressed with a correlated subquery:
select i.*,
(select count(*) from fin_tickets t where t.incident_id = i.incident_id) as example_count
from fin_incidents i
This query will take advantage of an index on fin_tickets(incident_id) - if you have defined a foreign key (as you should have), that index is already there.
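The correlated-subquery form can be tried out with a small sketch like the one below. It uses Python's sqlite3 module in place of MySQL, with hypothetical sample rows. Note one difference: SQLite does not create an index for a foreign key, so the index is created explicitly here, and the name idx_tickets_incident is made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fin_incidents (incident_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE fin_tickets (
        ticket_id INTEGER PRIMARY KEY,
        incident_id INTEGER REFERENCES fin_incidents(incident_id)
    );
    -- Explicit index on the lookup column (SQLite does not add one for the foreign key).
    CREATE INDEX idx_tickets_incident ON fin_tickets(incident_id);
    INSERT INTO fin_incidents VALUES (1, 'Outage'), (2, 'Slow login');
    INSERT INTO fin_tickets VALUES (10, 1), (11, 1), (12, NULL);
""")

# The subquery runs once per incident row and yields 0 when nothing matches.
rows = conn.execute("""
    SELECT i.*,
           (SELECT COUNT(*) FROM fin_tickets t
             WHERE t.incident_id = i.incident_id) AS example_count
    FROM fin_incidents i
    ORDER BY i.incident_id
""").fetchall()

for row in rows:
    print(row)
```

No GROUP BY is needed in this form, because the aggregate lives entirely inside the subquery.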

Related

mySQL group by function showing lack of data

What I'm after is to see what is the fastest lap time for particular races, which will be identified by using race name and race date.
SELECT lapName AS Name, lapDate AS Date, T
FROM Lap
INNER JOIN Race ON Lap.lapName = Race.Name
AND Lap.lapDate = Race.Date
GROUP BY Date;
It currently only displays 3 different race names, with 4 different dates, meaning I've got 4 combinations total, when there are in fact 9 unique race name, race date combinations.
Unique race data is stored in the Race table. Laptimes are stored in the LapInfo table.
I'm also getting a warning about my group statement saying it is ambiguous though it still runs.
You don't seem to need a join for this:
SELECT l.lapRaceName, l.lapRaceDate,
MIN(l.lapTime)
FROM LapInfo l
GROUP BY l.lapRaceName, l.lapRaceDate;
If you don't need a JOIN, it is superfluous to put one in the query.
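As a rough illustration of grouping on both the race name and the race date, here is a sketch run against an in-memory SQLite database from Python (standing in for MySQL). The LapInfo column names follow the answer above; the lap times and the fastestLap alias are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; lap data is invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LapInfo (lapRaceName TEXT, lapRaceDate TEXT, lapTime REAL);
    INSERT INTO LapInfo VALUES
        ('Monaco', '2015-05-24', 78.1),
        ('Monaco', '2015-05-24', 76.4),
        ('Monaco', '2014-05-25', 79.0),
        ('Spa',    '2015-08-23', 110.2),
        ('Spa',    '2015-08-23', 109.8);
""")

# Grouping by name AND date keeps the two Monaco races apart.
rows = conn.execute("""
    SELECT l.lapRaceName, l.lapRaceDate, MIN(l.lapTime) AS fastestLap
    FROM LapInfo l
    GROUP BY l.lapRaceName, l.lapRaceDate
    ORDER BY l.lapRaceName, l.lapRaceDate
""").fetchall()

for row in rows:
    print(row)
```

Grouping by the date alone would merge races that share a date, and grouping by the name alone would merge different editions of the same race.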
First of all, your query is actually invalid SQL. You need to use the MIN function to get the fastest lapTime. Also, you have to GROUP BY lapRaceName, raceDate instead of just lapRaceName. Unfortunately, in this case, MySQL is lax enough to execute it without error.
Also, you JOIN LapInfo with Race, and return the joined columns from LapInfo, aliased to names that can be found in Race. That's OK from an SQL point of view, but it is also uselessly complicated: return the columns directly from the Race table, as they already have the names that you are looking for.
Finally, it would be far better to indicate which table each column belongs to. Here, column lapTime belongs to table LapInfo, so let's make it explicit.
Query :
SELECT
Race.raceName,
Race.raceDate,
MIN(LapInfo.lapTime)
FROM
Race
INNER JOIN LapInfo
ON LapInfo.lapRaceName = Race.raceName
AND LapInfo.lapRaceDate = Race.raceDate
GROUP BY
Race.raceName,
Race.raceDate
;

Specific where clause in Mysql query

So i have a mysql table with over 9 million records. They are call records. Each record represents 1 individual call. The columns are as follows:
CUSTOMER
RAW_SECS
TERM_TRUNK
CALL_DATE
There are others but these are the ones I will be using.
So I need to count the total number of calls for a certain week in a certain Term Trunk. I then need to sum up the number of seconds for those calls. Then I need to count the total number of calls that were below 7 seconds. I always do this in 2 queries and combine them but I was wondering if there were ways to do it in one? I'm new to mysql so i'm sure my syntax is horrific but here is what I do...
Query 1:
SELECT CUSTOMER, SUM(RAW_SECS), COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;
Query 2:
SELECT CUSTOMER, COUNT(*)
FROM Mytable2
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') AND RAW_SECS < 7
GROUP BY CUSTOMER;
Is there any way to combine these two queries into one? Or maybe just a better way of doing it? I appreciate all the help!
There are 2 ways of achieving the expected outcome in a single query:
1. conditional counting: use a case expression or if() function within the count() (or sum()) to count only specific records
2. self join: left join the table on itself using the id field of the table, and in the join condition filter the alias on the right hand side of the join on calls shorter than 7 seconds
The advantage of the 2nd approach is that you may be able to use indexes to speed it up, while the conditional counting cannot use indexes.
SELECT m1.CUSTOMER, SUM(m1.RAW_SECS), COUNT(m1.customer), count(m2.customer)
FROM Mytable m1
LEFT JOIN Mytable m2 ON m1.id=m2.id and m2.raw_secs<7
WHERE m1.TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY m1.CUSTOMER;
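The conditional-counting approach is described above but not shown, so here is a minimal sketch of it. It runs against in-memory SQLite from Python as a stand-in for MySQL; the id column, the sample calls, and the aliases total_secs, total_calls, and short_calls are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the call records are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mytable (
        id INTEGER PRIMARY KEY, CUSTOMER TEXT, RAW_SECS INTEGER, TERM_TRUNK TEXT
    );
    INSERT INTO Mytable VALUES
        (1, 'Acme',   5,   'Mytrunk1'),
        (2, 'Acme',   60,  'Mytrunk1'),
        (3, 'Acme',   3,   'Mytrunk2'),
        (4, 'Acme',   2,   'Othertrunk'),
        (5, 'Globex', 120, 'Mytrunk2'),
        (6, 'Globex', 6,   'Mytrunk1');
""")

# One pass over the filtered rows: the CASE expression contributes 1 only
# for calls shorter than 7 seconds, so the SUM acts as a conditional count.
rows = conn.execute("""
    SELECT CUSTOMER,
           SUM(RAW_SECS) AS total_secs,
           COUNT(*) AS total_calls,
           SUM(CASE WHEN RAW_SECS < 7 THEN 1 ELSE 0 END) AS short_calls
    FROM Mytable
    WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
    GROUP BY CUSTOMER
    ORDER BY CUSTOMER
""").fetchall()

for row in rows:
    print(row)
```

The call on 'Othertrunk' is excluded by the WHERE clause before any counting happens, so it affects neither total.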

Remove Duplicate record from Mysql Table using Group By

I have a table structure and data below.
I need to remove duplicate records from the table. My confusion is that when I run this query:
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
it gives me the correct list (12 records).
But when I use the same query as a subquery:
SELECT *
FROM `table` WHERE id IN (SELECT id FROM `table` GROUP BY CONCAT(`name`,department))
it returns all the records, which is wrong.
So my question is: why is GROUP BY in the subquery not working?
Actually, as Tim mentioned in his answer, getting the first record of each group from a GROUP BY clause is not a standard SQL feature. MySQL allows it up to version 5.6.16, but the behavior has changed from 5.6.21 onward.
Just change the MySQL version in your SQL Fiddle and you will get what you want.
In the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
You are selecting the id column, which is a non-aggregate column. Many RDBMS would give you an error, but MySQL allows this for performance reasons. This means MySQL has to choose which record to retain in the result set. Based on the result set in your original problem, it appears that MySQL is retaining the id of the first duplicate record, in cases where a group has more than one member.
In the query
SELECT *
FROM `table`
WHERE id IN
(
SELECT id FROM `table` GROUP BY CONCAT(`name`,department)
)
you are also selecting a non-aggregate column in the subquery. It appears that MySQL actually decides which id value to be retained in the subquery based on the id value in the outer query. That is, for each id value in table, MySQL performs the subquery and then selectively chooses to retain a record in the group if two id values match.
You should avoid using a non-aggregate column in a query with GROUP BY, because it is a violation of the ANSI standard, and as you have seen here it can result in unexpected results. If you give us more information about what result set you want, we can give you a correct query which will avoid this problem.
I welcome anyone who has documentation to support these observations to either edit my answer or post a new one.
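One way to avoid the non-aggregate column altogether is to pick the surviving id explicitly with MIN(id). The sketch below shows the idea on in-memory SQLite from Python (a stand-in for MySQL). The table name staff and the alias keep_id are hypothetical (the question's table is literally called `table`, which is a reserved word), and the rows are invented.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    INSERT INTO staff VALUES
        (1, 'Ann', 'IT'),
        (2, 'Ann', 'IT'),
        (3, 'Bob', 'HR'),
        (4, 'Ann', 'HR'),
        (5, 'Bob', 'HR');
""")

# MIN(id) makes the choice of the retained row deterministic, and grouping by
# the two columns directly avoids CONCAT collisions such as 'ab'+'c' vs 'a'+'bc'.
rows = conn.execute("""
    SELECT t.*
    FROM staff t
    JOIN (SELECT MIN(id) AS keep_id
          FROM staff
          GROUP BY name, department) f
      ON t.id = f.keep_id
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

Because every selected column in the subquery is either aggregated or grouped, the result does not depend on which row the server happens to pick.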
You can JOIN the grouped ids with that of table ids, so that you can get desired results.
Example:
SELECT t.* FROM so_q32175332 t
JOIN ( SELECT id FROM so_q32175332
GROUP BY CONCAT( name, department ) ) f
ON t.id = f.id
ORDER BY CONCAT( name, department );
Here ORDER BY was added only so that the results can be compared directly with those of the SELECT * ... GROUP BY query.
Demo on SQL Fiddle: http://sqlfiddle.com/#!9/d715a/1

Subquery in SELECT or Subquery in JOIN?

I have a MYSQL query of this form:
SELECT
employee.name,
totalpayments.totalpaid
FROM
employee
JOIN (
SELECT
paychecks.employee_id,
SUM(paychecks.amount) totalpaid
FROM
paychecks
GROUP BY
paychecks.employee_id
) totalpayments on totalpayments.employee_id = employee.id
I've recently found that this returns MUCH faster in this form:
SELECT
employee.name,
(
SELECT
SUM(paychecks.amount)
FROM
paychecks
WHERE
paychecks.employee_id = employee.id
) totalpaid
FROM
employee
It surprises me that there would be a difference in speed, and that the lower query would be faster. I prefer the upper form for development, because I can run the subquery independently.
Is there a way to get the "best of both worlds": speedy results return AND being able to run the subquery in isolation?
Likely, the correlated subquery is able to make effective use of an index, which is why it's fast, even though that subquery has to be executed multiple times.
For the first query with the inline view, that causes MySQL to create a derived table, and for large sets, that's effectively a MyISAM table.
In MySQL 5.6.x and later, the optimizer may choose to add an index on the derived table, if that would allow a ref operation and the estimated cost of the ref operation is lower than the nested loops scan.
I recommend you try using EXPLAIN to see the access plan. (Based on your report of performance, I suspect you are running on MySQL version 5.5 or earlier.)
The two statements are not entirely equivalent in the case where there are rows in employee for which there are no matching rows in paychecks.
An equivalent result could be obtained entirely avoiding a subquery:
SELECT e.name
, SUM(p.amount) AS total_paid
FROM employee e
JOIN paychecks p
ON p.employee_id = e.id
GROUP BY e.id
(Use an inner join to get a result equivalent to the first query, use a LEFT outer join to be equivalent to the second query. Wrap the SUM() aggregate in an IFNULL function if you want to return a zero rather than a NULL value when no matching row with a non-null value of amount is found in paychecks.)
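The difference between the inner and LEFT join variants, and the effect of IFNULL, can be seen in a small sketch. It runs on in-memory SQLite from Python rather than MySQL, and the employees and paycheck amounts are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE paychecks (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
    INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob');
    -- Bob has no paychecks at all.
    INSERT INTO paychecks VALUES (1, 1, 1000), (2, 1, 1500);
""")

# Inner join: employees without paychecks drop out (matches the derived-table query).
inner_rows = conn.execute("""
    SELECT e.name, SUM(p.amount) AS total_paid
    FROM employee e
    JOIN paychecks p ON p.employee_id = e.id
    GROUP BY e.id, e.name
    ORDER BY e.id
""").fetchall()

# LEFT join: every employee is kept; IFNULL turns the NULL sum into 0.
left_rows = conn.execute("""
    SELECT e.name, IFNULL(SUM(p.amount), 0) AS total_paid
    FROM employee e
    LEFT JOIN paychecks p ON p.employee_id = e.id
    GROUP BY e.id, e.name
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Without the IFNULL wrapper, Bob's total in the LEFT join result would be NULL, which is what the correlated-subquery form returns for him.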
A join is basically a Cartesian product, which means all the records of table A are combined with all the records of table B. The output will be
number of records in table A * number of records in table B = rows in the new table
10 * 10 = 100
and out of those 100 records, the ones that match the filters will be returned by the query.
In nested queries, there is an inner query, and the total set of records it returns becomes the input to the outer query; that is why nested queries are faster than joins.

wrapping inside aggregate function in SQL query

I have 2 tables called Orders and Salesperson shown below:
And I want to retrieve the names of all salespeople that have more than 1 order from the tables above.
Running the following query produces an error:
SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The error is:
Column 'Name' is invalid in the select list because it is
not contained in either an aggregate function or
the GROUP BY clause.
From the error and from searching on Google, I understand that the Name column must be either part of the GROUP BY clause or wrapped in an aggregate function.
I also tried to understand why the selected column has to be in the GROUP BY clause or part of an aggregate function, but I didn't understand it clearly.
So, how to fix this error?
SELECT max(Name) as Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The basic idea is that columns that are not in the GROUP BY clause need to be wrapped in an aggregate function. Here, because the name is probably the same for every salesperson_id, MIN or MAX makes no real difference (the result is the same).
Example:
Looking at your data, you have 3 entries for Dan (7). When the join is created, the row with Dan (Name) gets multiplied by 3 (one Dan for every order number), and the server does not know which "Dan" to pick, because to the server those are 3 rows, even though they are semantically the same.
also try this so that you see what I am talking about:
SELECT Orders.Number, Salesperson.Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
As far as the query goes, INNER JOIN is a better solution since it is more or less the standard. For this simple query it should not matter. In some cases INNER JOIN may produce better results, but as far as I know this is more of a legacy thing, since these days the server should produce pretty much the same execution plan.
For code clarity I would stick with INNER JOIN
Assuming the name is unique to the salesperson.id then simply add it to your group by clause
GROUP BY salesperson_id, salesperson.Name
Otherwise use any Agg function
Select Min(Name)
The reason for this is that SQL doesn't know whether there are multiple names per salesperson.id.
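Adding the name to the GROUP BY clause could look like the sketch below, which also uses the explicit INNER JOIN syntax. It runs on in-memory SQLite from Python as a stand-in for the asker's database; the table and column names follow the question, while the rows are hypothetical (Dan with ID 7 and three orders mirrors the data mentioned above).

```python
import sqlite3

# In-memory SQLite stands in for the asker's database; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (Number INTEGER PRIMARY KEY, salesperson_id INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Abe'), (2, 'Bob'), (7, 'Dan');
    INSERT INTO Orders VALUES (10, 7), (20, 7), (30, 7), (40, 2), (50, 2), (60, 1);
""")

# Name is listed in GROUP BY, so it may appear in the select list unaggregated.
rows = conn.execute("""
    SELECT s.Name
    FROM Orders o
    INNER JOIN Salesperson s ON o.salesperson_id = s.ID
    GROUP BY s.ID, s.Name
    HAVING COUNT(*) > 1
    ORDER BY s.Name
""").fetchall()

names = [name for (name,) in rows]
print(names)
```

Grouping by the ID as well as the name keeps two different salespeople who happen to share a name in separate groups.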
For readability and correctness, I usually split aggregate queries into two parts:
The aggregate query
Any additional queries to support fields not contained in aggregate functions
So:
1. Aggregate query - salespeople with more than 1 order
SELECT salesperson_id
FROM ORDERS
GROUP BY salesperson_id
HAVING COUNT(Number) > 1
2. Use the aggregate as a subquery (basically a select joining onto another select) to join on any additional fields:
SELECT *
FROM Salesperson SP
INNER JOIN
(
SELECT salesperson_id
FROM ORDERS
GROUP BY salesperson_id
HAVING COUNT(Number) > 1
) AGG_QUERY
ON AGG_QUERY.salesperson_id = SP.ID
There are other approaches, such as selecting the additional fields via aggregation functions (as shown by the other answers). These get the code written quickly so if you are writing the query under time pressure you may prefer that approach. If the query needs to be maintained (and hence readable) I would favour subqueries.