MySQL INNER JOIN with GROUP BY and COUNT(*) - mysql

I've never been able to get my head around INNER JOINs (or any other JOIN types for that matter) so I'm struggling to work out how to use it in my specific situation. In fact, I'm not even sure if it's what I need. I've looked at other examples and read tutorials but my brain just doesn't seem to work the way needed to truly get it (or it doesn't function at all).
Here's the scenario:
I have two tables -
phone_numbers - this table has a list of phone numbers that
belong to lots of different customers. A single customer can have
multiple numbers. For simplicity's sake, we'll say the fields are
'number_id', 'customer_id', 'phone_number'.
call_history - this table has a record of every single call that one of these
numbers in the first table could have had. There's a record for
every individual call going back years. Again, for simplicity,
we'll say the relevant fields are customer_id, phone_number,
call_start_time.
What I'm trying to accomplish is to find all of the numbers that belong to a particular customer_id in the phone numbers table and use that information to search through the call_history table and find the number of calls each phone number has received, and group that by the number of calls for each number, preferably also showing zeros where a number hasn't received any calls at all.
The reason the zero calls is important is because that's the data I'm interested in. Otherwise, I could just get all the information out of the call_history table. But what I'm trying to achieve is find the numbers with no activity.
All I've been able to accomplish is run one query to get all of the numbers belonging to one customer:
SELECT customer_id, phone_number FROM phone_numbers WHERE customer_id = Y;
Then run a second query to get all phone calls for that customer_id for a set duration:
SELECT customer_id, phone_number, COUNT(*) FROM call_history WHERE customer_id = Y and call_start_time >= DATE_SUB(SYSDATE(), INTERVAL 30 DAY) GROUP BY phone_number;
I've then had to use the data returned from both queries and use a VLOOKUP function in Excel to match number of calls for each individual number from the second query to the list of all numbers from the first query, thus leaving blanks in my "all numbers" table and identifying those numbers that had no calls for that time period.
I'm hoping there's some way to do all of this with a single query and return a table of results, listing the zero number of calls with it and eliminate the whole manual Excel bit as it's not overly efficient and prone to human error.

Without at least a workable example from you, it's not easy to re-create your situation. Anyway, INNER JOIN might not return the result as how you expected. In my short time with MySQL, I mainly use 2 types of JOIN; one is already mentioned and the other is LEFT JOIN. From what I can understand in your question, what you want to achieve can be done by using LEFT JOIN instead of INNER JOIN. I may not be the best person to explain this to you but this is how I understand it:
INNER JOIN - only return anything that match in ON clause between two (or more) tables.
LEFT JOIN - will return everything from the table on the left side of the join and return NULL if ON get no match in the table on the right side of the join .. unless you specify some WHERE condition from something on the right table.
Now, here is my query suggestion and hopefully it'll be useful for you:
SELECT A.customer_id, A.phone_number,
SUM(CASE WHEN call_start_time >= DATE_SUB(SYSDATE(), INTERVAL 30 DAY)
THEN 1 ELSE 0 END) AS Total
FROM phone_numbers A
LEFT JOIN call_history B
ON A.customer_id=B.customer_id
GROUP BY A.customer_id,A.phone_number;
What I did here is I LEFT JOIN phone_numbers table with call_history on customer_id and I re-position the WHERE call_start_time >= .. condition into a CASE expression in the SELECT since putting it at WHERE will turn this into a normal join or inner join instead.
Here is an example fiddle : https://www.db-fiddle.com/f/hriFWqVy5RGbnsdj8i3aVG/1

For Inner join You should have to do like this way..
SELECT customer_id,phone_number FROM phone_numbers as pn,call_history as ch where pn.customer_id = ch.customer_id and call_start_time >= DATE_SUB(SYSDATE(), INTERVAL 30 DAY) GROUP BY phone_number;
Just add table name whatever you want to join and add condition

Related

Join error and order by

I'm trying to write a query which does the below:
For every guest who has the word “Edinburgh” in their address show the total number of nights booked. Be sure to include 0 for those guests who have never had a booking. Show last name, first name, address and number of nights. Order by last name then first name.
I am having problems with making the join work properly,
ER Diagram Snippet:
Here is my current (broken) solution:
SELECT last_name, first_name, address, nights
FROM booking
RIGHT JOIN guest ON (booking.booking_id = guest.id)
WHERE address LIKE '%Edinburgh%';
Here is the results from that query:
The query is partially complete, hoping someone can help me out and create a working version. I'm currently in the process of learning SQL so apologies if its a rather basic or dumb question!
Your query seems almost correct. You were joining the booking id with guets id which gave you some results because of overlapping (matching) ids, but this most likely doesn't correspond to the foreign keys. You should join on guest_id from booking to id from guest.
I'd add grouping to sum all booked nights for a particular guest (assuming that nights is an integer):
SELECT g.last_name, g.first_name, g.address, SUM(b.nights) AS nights
FROM guest AS g
LEFT JOIN booking AS b ON b.guest_id = g.id
WHERE g.address LIKE '%Edinburgh%'
GROUP BY g.last_name, g.first_name, g.address;
Are you sure that nights spent should be calculated using nights field? Why can it be null? If you'd like to show zero for null values just wrap it up with a coalesce function like that:
COALESCE(SUM(b.nights), 0)
Notes:
Rewriten RIGHT JOIN into LEFT JOIN, but that doesn't affect results - it's just cleaner for me
Using aliases eg. AS g makes the code shorter when specifying joining columns
Reference every column with their table alias to avoid ambiguity
SELECT g.first_name,
g.last_name,
g.address,
COALESCE(Sum(b.nights), 0)
FROM booking b
RIGHT JOIN guest g
ON ( b.guest_id = g.id )
WHERE address LIKE 'edinburgh%'
GROUP BY g.last_name,
g.first_name,
g.address;
This post answers your questions about how to make the query.
MySQL SUM with same ID
You can simply use COALESCE as referenced here to avoid the NULL Values
How do I get SUM function in MySQL to return '0' if no values are found?

Multitable counting and multiplying in same query

I have got a somewhat complicated problem. This is my situation (ERD).
For a dashboard i need to create a pivot table that shows me the total amount of competences used by the vacancies. Therefore I need to:
Count the amount of vacancies per template
Count the amount of templates per competence
and last: multiply these numbers to get the total amount of comps used.
I have the first query:
SELECT vacancytemplate_id, count(id)
FROM vacancies
group by vacancytemplate_id;
And the second query isn't that difficult either, but I don't know what the right solution will be. I'm literally brainstuck. My mind can't comprehend how I can achieve the next step and put it down in a query. Please kind stranger, help me out :)
EDIT: my desired result is something like this
NameOfComp, NrOfTimesUsed
Leading, 17
Inspiring, 2
EDIT2: the meta query it should look like:
SELECT NameOfComp, (count of the competences used by templates) * (number of vacancies per template)
EDIT3: http://sqlfiddle.com/#!9/2773ca SQLFiddle
Thanks a lot!
If I am understanding your request correctly, you are wanting a count of competences per vacancy. This can be done very simply due to your table structure:
Select v.ID, count(*) from vacancy as v inner join CompTemplate_Table as CT
on v.Template_ID = CT.Template_ID group by v.ID;
The reason you can do only one join is because there will be a record in the CompTemplate_Table for every competency in each template. Additionally, the same key is used to join vacancy to templates as is used to join templates to CompTemplate_Table, so they represent the same key value (and you can skip joining the Templates table if you don't need data from there).
If you are wanting to add this data to a pivot table, I will leave that exercise to you. There are a number of tutorials available if you do a quick google search and it should not be that hard.
UPDATE: For the second query you are looking at something like:
Select cp.NameOfComp, count(*) from vacancy as v inner join CompTemplate_Table as CT
on v.Template_ID = CT.Template_ID inner join competencies as CP
on CP.ID = CT.Comp_ID
group by CP.NameOfComp
The differences here are you are adding in the comptetencies table, as you need data from that, and grouping by the CP.NameOfComp instead of the vacancy id. You can also restrict this to specific templates, competencies, or vacancies by adding in search conditions (e.g. where CP.ID = 12345)

How to store SQL Query result in table column

I'm aware of the INSERT INTO table_name QUERY; however, I'm unsure how to go about achieving the desired result in this case.
Here's a slightly contrived example to explain what I'm looking for, but I'm afraid I cannot put it more succiently.
I have two tables in a database designed for a hotel.
BOOKING and CUSTOMER_BOOKING
Where BOOKING contains PK_room_number, room_type, etc. and CUSTOMER_BOOKING contains FK_room_number, FK_cusomer_id
CUSTOMER_BOOKING is a linking table (many customers can make many bookings, and many bookings can consist of many customers).
Ultimately, in the application back-end I want to be able to list all rooms that have less than 3 customers associated with them. I could execute this a separate query and save the result in the server-side scripting.
However, a more elegant solution (from my point of view) is to store this within the BOOKING table itself. That is to add a column no_of_bookings that counts the number of times the current PK_room_number appears as the foreign key FK_room_number within the CUSTOMER_BOOKING table. And why do this instead? Because it would be impossible for me to write a single complicated query which will both include the information from all ROOMS, among other tables, and also count the occurrences of bookings, without excluding ROOMS that don't have any bookings. A very bad thing for a hotel website attempting to show free rooms!
So it would look like this
BOOKING: PK_room_number (104B) room_type (double) room_price (high), no_of_bookings (3)
BOOKING: PK_room_number (108C) room_type (single) room_price (low), no_of_bookings (1)
CUSTOMER_BOOKING: FK_room_number (104B) FK_customer_id (4312)
CUSTOMER_BOOKING: FK_room_number (104B) FK_customer_id (6372)
CUSTOMER_BOOKING: FK_room_number (104B) FK_customer_id (1112)
CUSTOMER_BOOKING: FK_room_number (108C) FK_customer_id (9181)
How would I go about creating this?
Because it would be impossible for me to write a single complicated
query which will both include the information from all ROOMS, among
other tables, and also count the occurrences of bookings, without
excluding ROOMS that don't have any bookings.
I wouldn't say it's impossible and unless you're running into performance issues, it's easier to implement than adding a new summary column:
select b.*, count(cb.room_number)
from bookings b
left join customer_booking cb on b.room_number = cb.room_number
group by b.room_number
Depending on your query may need to use a derived table containing the booking counts for each room instead instead
select b.*, coalesce(t1.number_of_bookings,0) number_of_bookings
from bookings b
left join (
select room_number, count(*) number_of_bookings
from customer_booking
group by room_number
) t1 on t1.room_number = b.room_number
You have to left join the derived table and select coalesce(t1.number_of_bookings,0) in case a room does not have any entries in the derived table (i.e. 0 bookings).
A summary column is a good idea when you're running into performance issues with counting the # of bookings each time. In that case I recommend creating insert and delete triggers on the customer_booking table that either increment or decrement the number_of_bookings column.
You could do it in a single straight select like this:
select DISTINCT
b1.room_pk,
c1.no_of_bookings
from cust_bookings b1,
(select room_pk, count(1) as no_of_bookings
from cust_bookings
group by room_pk) c1
where b1.room_pk = c1.room_pk
having c1.no_of_bookings < 3
Sorry i used my own table names to test it but you should figure it out easily enough. Also, the "having" line is only there to limit the rows returned to rooms with less than 3 bookings. If you remove that line you will get everything and could use the same sql to update a column on the bookings table if you still want to go that route.
Consider below solutions.
A simple aggregate query to count the customers per each booking:
SELECT b.PK_room_number, Count(c.FK_customer_id)
FROM Booking b
INNER JOIN Customer_Booking c ON b.PK_room_number = c.FK_room_number
GROUP BY b.PK_room_number
HAVING Count(c.FK_customer_id) < 3; # ADD 3 ROOM MAX FILTER
And if you intend to use a new column no_of_booking, here is an update query (using aggregate subquery) to run right after inserting new value from web frontend:
UPDATE Booking b
INNER JOIN
(SELECT b.PK_room_number, Count(c.FK_customer_id) As customercount
FROM Booking b
INNER JOIN Customer_Booking c ON b.PK_room_number = c.FK_room_number
GROUP BY b.PK_room_number) As r
ON b.PK_room_number = r.PK_room_number
SET b.no_of_booking = r.customercount;
the following generates a list showing all of the bookings and a flag of 0 or 1 if the the room has a customer for each of the rooms. it will display some rooms multiple times if there are multiple customers.
select BOOKING.*,
case CUSTOMER_BOOKING.FK_ROOM_NUMBER is null THEN 0 ELSE 1 END AS BOOKING_FLAG
from BOOKING LEFT OUTER JOIN CUSTOMER_BOOKING
ON BOOKING.PK_room_numer = CUSTOMER_BOOKING.FK_room_number
summing and grouping we arrive at:
select BOOKING.*,
SUM(case when CUSTOMER_BOOKING.FK_ROOM_NUMBER is null THEN 0 ELSE 1 END) AS BOOKING_COUNT
from BOOKING LEFT OUTER JOIN CUSTOMER_BOOKING
ON BOOKING.PK_room_number = CUSTOMER_BOOKING.FK_room_number
GROUP BY BOOKING.PK_room_number
there are at least two other solutions I can think of off the top of my head...

Removing duplicate rows in a complex MySQL Query - not sure if I'm doing this wrong, or it's a MySQL bug

I have a reasonably complex MySQL query being run on another developer's database. I am trying to copy over his data to our new database structure, so I'm running this query to get a load of the data over to copy. The main table has around 45,000 rows.
As you can see from the query below, there's a lot of fields from several different tables. My problem is that the field Ref.refno (as ref_id) is being pulled through, in some cases, two or three times. This is because in the table LandlordOnlineRef (LLRef) there are sometimes multiple rows with this same reference number - in this case, because the row should have been edited, but instead was duplicated...
Here's what I've tried doing: -
SELECT DISTINCT(Ref.refno) [...] - this makes no difference to the output at all, although I would've assumed it would stop selecting duplicate refno IDs
Is this a MySQL bug, or me? - I also tried adding GROUP BY ref_id to the end of my query. The query normally takes a few milliseconds to run, but when I add GROUP BY to the end, it seems to run infinitely - I waited several minutes but nothing was happening. I thought it might be struggling because I'm using LIMIT 1000, so I also tried LIMIT 10 but still get the same effect.
Here's the problem query - thanks!
SELECT
-- progress
Ref.refno AS ref_id,
Ref.tenantid AS tenant_id,
Ref.productid AS product_id,
Ref.guarantorid AS guarantor_id,
Ref.agentid AS agent_id,
Ref.companyid AS company_id,
Ref.status AS status,
Ref.startdate AS ref_start_date,
Ref.enddate AS ref_end_date,
-- ReferenceDetails
RefDetails.creditscore AS credit_score,
-- LandlordOnlineRef
LLRef.propaddress AS prev_ll_address,
LLRef.rent AS prev_ll_rent,
LLRef.startdate AS prev_ll_start_date,
LLRef.enddate AS prev_ll_end_date,
LLRef.arrears AS prev_ll_arrears,
LLRef.arrearsreason AS prev_ll_arrears_reason,
LLRef.propertycondition AS prev_ll_property_condition,
LLRef.conditionreason AS prev_ll_condition_reason,
LLRef.consideragain AS prev_ll_consider_again,
LLRef.completedby AS prev_ll_completed_by,
LLRef.contactno AS prev_ll_contact_no,
LLRef.landlordagent AS prev_ll_or_agent,
-- EmpDetails
EmpRef.cempname AS emp_name,
EmpRef.cempadd1 AS emp_address_1,
EmpRef.cempadd2 AS emp_address_2,
EmpRef.cemptown AS emp_address_town,
EmpRef.cempcounty AS emp_address_county,
EmpRef.cemppostcode AS emp_address_postcode,
EmpRef.ctelephone AS emp_telephone,
EmpRef.cemail AS emp_email,
EmpRef.ccontact AS emp_contact,
EmpRef.cgross AS emp_income,
EmpRef.cyears AS emp_years,
EmpRef.cmonths AS emp_months,
EmpRef.cposition AS emp_position,
-- EmpLlodReference
ELRef.lod_ref_status AS prev_ll_status,
ELRef.lod_ref_email AS prev_ll_email,
ELRef.lod_ref_tele AS prev_ll_telephone,
ELRef.emp_ref_status AS emp_status,
ELRef.emp_ref_tele AS emp_telephone,
ELRef.emp_ref_email AS emp_email
FROM ReferenceDetails AS RefDetails
LEFT JOIN progress AS Ref ON Ref.refno
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
LEFT JOIN EmpLlodReference AS ELRef ON ELRef.refno = Ref.refno
LEFT JOIN EmpDetails AS EmpRef ON EmpRef.tenantid = Ref.tenantid
-- For testing purposes to speed things up, limit it to 1000 rows
LIMIT 1000
LEFT JOIN progress AS Ref ON Ref.refno
is going to basically turn that into a cartesian join. You're not doing an explicit comparison, you're saying "join all records where there's a non-null value".
Shouldn't it be
LEFT JOIN progress AS Ref ON Ref.refno = RefDetails.something
?
Put all of the selected columns into DISTINCT, separated by ,. If you want to keep the renaming, wrap another SELECT DISTINCT(*) FROM (YOUR_SELECT) around.
Are there indexes on the columns in the GROUP BY clause? LIMIT is applied after GROUP BY. So limiting does not affect the query runtime.
General rule is to never group by more columns then you have to. Use a subquery with a group by on the table thats returning duplicate rows to get rid of them.
Change:
LEFT JOIN LandlordOnlineRef AS LLRef ON LLRef.refno = Ref.refno
to:
Left Join (select refno
, othercolumns you need
from LandlordOnlineRef
group by refno,othercolumn) as LLRef
Not sure on which columns you'll want to include here, but at any table level, you can change that table to a subquery to eliminate duplicate rows before the join. As MarkBannister says, you'll need some logic to identify a unique refno within LLRef . You can also use a date column for 'most recent' or any other logic you can think of to get back a unique LLRef and the info related to that record.
ugh, auto spell check is changing refno to refine. ha

simple joins between 2 mysql tables returning all results every time.. Help!

I just imported a large amount of data into two tables. Let's call them shipments and returns.
When trying to do a simple join (left or inner) based on any criteria in these two tables. query looks like it tries to do a cross join or find every combination instead of what the query should be pulling.
each table has an PK id field, but there is not FK relationship between the two other than some shared field.
I'm currently just trying to related them on shipment_id.
I feel this is a simple answer. Am I missing a reference or something obvious that is causing this? Thanks!
here's an example. This should returned under 100 rows. This instead returns hundreds of thousands.
SELECT r.*
FROM returns as r
left outer join shipments as s
on r.shipment_id = s.shipment_id
where r.date = '2011-06-20'
Here is a query that should work:
SELECT T0.*, T1.*
FROM shipments AS T0 LEFT JOIN returns AS T1 ON T0.shipment_id = T1.shipment_id
ORDER BY T0.shipment_id;
This query join assumes 1:1 on the shipment_id
It would be nice if you included the query you were using
You need to specify what you are joining on, otherwise it will do a cartesian join:
SELECT r.*
FROM returns as r
LEFT JOIN shipments as s ON s.shipment_id = r.shipment_id
where r.date = '2011-06-20'
Josh,
I would be interested in seeing what would happen if you forced a join to a specific record or set of records instead of the whole table. Assuming there is a shipment with an id of 5 in your table, you could try:
SELECT r.* FROM returns as r
left join shipments as s
ON 5 = r.shipment_id
WHERE r.date = '2011-06-20'
While just a fancy where clause, it would at least prove that the join you are attempting will eventually work correctly. The issue is that your on clause is always returning true, no matter what the value is. This could be because it's not interpreting the shipment_id as an integer, but instead as a true/false variable where any value evaluates to true.
Original Rejected Solution:
No Foreign Key relationship should be needed in order to make the joins happen. The PK id fields I'm assuming are an integer (or number, or whatever your rdms equivalent is)?
Can you past a snippet of your sql query?
Updating based on posted query:
I would add your explicit join criteria in order to rule out any funny business (my guess is since no criteria is specified, it's using 1=1, which always joins). So I would change your query to look like:
SELECT r.*
FROM returns as r
left join shipments as s ON
s.ShipId = R.ReturnId
where r.date = '2011-06-20'
The issue turned out to be very simple, just not readily apparent until going through all the columns. It turns out that the shipment ID was duplicated through every row as it hit the upper limit for the int datatype. This is why joins were returning every record.
After switching the datatype to bigint and reimporting, everything worked great. Thanks all for looking into it.