Get max row per group from a related table - mysql

This is my first time asking a question on here. It has been very helpful with learning.
I am trying to select from a table and get only the rows that have the maximum value for their group in another table. One of the best answers I found is very close but not quite there (SQL Select only rows with Max Value on a Column), because it only deals with a single table. I have found some others that use multiple tables, but I am not sure exactly how to apply them.
I have a table with (simplified)
prodID, quantity, mach, etc
I then have a table with
prodStatusID, prodID, userID, subStatusID
a last table with sub status names
subStatusID, subStatusName
I am trying to get a result with everything from the first table joined to the second table, but only the row that has the maximum status number for each prodID, and including the matching status name.
My other concern, which may not matter now but will in a year or two when this thing starts to really fill up, is performance. I don't know how bad it is to have a select inside a select, but if I am trying to return all productions then it will be doing a query for every production.
Just to be clearer: in the second table, prodStatus, there might be 2 rows with a prodID of 4, where the subStatusID for the first one is 1 and for the second one is 2. The userID will be different. All I want to get back is the second row, because it has the highest status number, and I need the userID and statusName associated with that row.
I have been googling for 2 days to find this answer. I saw one about auctions, but I just don't fully understand it even after researching it.

You need to create a subquery which gets the maximum value of subStatusID for each prodID, then join it back to the status table.
SELECT a.*, -- select only columns that you want to show
c.*, -- asterisks means all columns
d.*
FROM table1 a
INNER JOIN
(
SELECT prodID, max(subStatusID) maxID
FROM table2
GROUP BY prodID
) b ON a.prodID = b.prodID
INNER JOIN table2 c
ON b.prodID = c.prodID AND
b.maxID = c.subStatusID
INNER JOIN table3 d
ON c.subStatusID = d.subStatusID
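To make the join above concrete, here is a minimal sketch run against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, and the sample rows, the status names and the `LATEST_STATUS_SQL` name are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: prodID 4 has two status rows (subStatusID 1 and 2)
# with different users, as described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (prodID INTEGER PRIMARY KEY, quantity INTEGER, mach TEXT);
CREATE TABLE table2 (prodStatusID INTEGER PRIMARY KEY, prodID INTEGER,
                     userID INTEGER, subStatusID INTEGER);
CREATE TABLE table3 (subStatusID INTEGER PRIMARY KEY, subStatusName TEXT);
INSERT INTO table1 VALUES (4, 100, 'M1'), (5, 50, 'M2');
INSERT INTO table2 VALUES (1, 4, 10, 1), (2, 4, 11, 2), (3, 5, 12, 1);
INSERT INTO table3 VALUES (1, 'Started'), (2, 'Finished');
""")

# The derived table b holds one row per prodID with its highest subStatusID;
# joining it back to table2 picks the full status row (including userID).
LATEST_STATUS_SQL = """
SELECT a.prodID, a.quantity, c.userID, d.subStatusName
FROM table1 a
INNER JOIN (
    SELECT prodID, MAX(subStatusID) AS maxID
    FROM table2
    GROUP BY prodID
) b ON a.prodID = b.prodID
INNER JOIN table2 c ON b.prodID = c.prodID AND b.maxID = c.subStatusID
INNER JOIN table3 d ON c.subStatusID = d.subStatusID
ORDER BY a.prodID
"""

rows = conn.execute(LATEST_STATUS_SQL).fetchall()
for row in rows:
    print(row)
```

On the performance worry: the derived table is written once as a grouped query and joined, rather than as a subquery correlated to each production. An index on table2 (prodID, subStatusID) is likely to help as the table grows, though that is worth checking with EXPLAIN on real data.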

Related

MySQL query - Select statement from two tables with group by returning records with largest ids

I really need your help. I've already spent a lot of time trying to figure this out, without success :(
I have two tables:
What I need is to group everything by sea_id / bat_season and get the greatest IDs for these seasons. So bat_ids 3 and 5 should be returned with their linked data.
But if there is no data in Table 2, I should still see the details of the two seasons, just without the Table 2 details.
My closest result is here with the below statement:
SELECT b.bat_id, b.bat_trophies, b.bat_ranking, s.sea_id, s.sea_name, s.sea_start
FROM gvg_seasons s
LEFT JOIN (SELECT bat_id, bat_trophies, bat_ranking, bat_season FROM gvg_battles ORDER BY bat_id DESC LIMIT 1) b
ON s.sea_id = b.bat_season
WHERE s.sea_gl_id = 1
GROUP BY s.sea_id DESC
The result: [image: Result]
If someone can help me here please I will be very grateful.
I haven't tried this as I didn't fancy transcribing the table data from your images but it should provide the result you are looking for.
The innermost sub-query gets the max(bat_id) per bat_season. This is joined back to the gvg_battles to give the latest battle per season.
SELECT *
FROM gvg_seasons s
LEFT JOIN (
SELECT b1.*
FROM gvg_battles b1
JOIN (
SELECT bat_season, MAX(bat_id) AS max_bat_id
FROM gvg_battles
GROUP BY bat_season
) b_max ON b1.bat_id = b_max.max_bat_id
) b2 ON s.sea_id = b2.bat_season;
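Since the answer above was untested, here is a small sketch that runs the same shape of query against made-up data in an in-memory SQLite database (standing in for MySQL). The rows are assumptions chosen so that bat_id 3 and 5 are the latest battles of seasons 1 and 2, and season 3 has no battles at all; the column list is trimmed for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gvg_seasons (sea_id INTEGER PRIMARY KEY, sea_name TEXT);
CREATE TABLE gvg_battles (bat_id INTEGER PRIMARY KEY, bat_season INTEGER,
                          bat_trophies INTEGER);
-- Hypothetical rows: season 3 deliberately has no battles.
INSERT INTO gvg_seasons VALUES (1, 'Season 1'), (2, 'Season 2'), (3, 'Season 3');
INSERT INTO gvg_battles VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30),
                               (4, 2, 40), (5, 2, 50);
""")

# Inner part: latest battle per season. Outer LEFT JOIN: keep every season,
# with NULL battle columns where a season has no battles.
LATEST_BATTLE_SQL = """
SELECT s.sea_id, s.sea_name, b2.bat_id, b2.bat_trophies
FROM gvg_seasons s
LEFT JOIN (
    SELECT b1.*
    FROM gvg_battles b1
    JOIN (
        SELECT bat_season, MAX(bat_id) AS max_bat_id
        FROM gvg_battles
        GROUP BY bat_season
    ) b_max ON b1.bat_id = b_max.max_bat_id
) b2 ON s.sea_id = b2.bat_season
ORDER BY s.sea_id
"""

rows = conn.execute(LATEST_BATTLE_SQL).fetchall()
for row in rows:
    print(row)
```

The season without battles still appears, with None (NULL) in the battle columns. A filter such as WHERE s.sea_gl_id = 1 from the original attempt could be added to the outer query.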

Cannot query a sum and compare the sum with another total in a mysql query

I want to check 2 tables to see whether the money payments add up to the total. That is possible, but I get a very long result:
select
transaction_id
,total_low+total_high a
, sum(money_received) b
from
archive_transaction inner join archive_transaction_payment
on archive_transaction.id=archive_transaction_payment.transaction_id
group by transaction_id;
Actually, I only want the transactions where the total is wrong.
So now I want to add the condition a!=b, but that gives an invalid query. How should I proceed?
Table archive_transaction has 1 row per transaction, but archive_transaction_payment can have multiple payments for one transaction. This makes it complicated for me.
select
transaction_id
,total_low+total_high a
, sum(money_received) b
from archive_transaction inner join archive_transaction_payment
on archive_transaction.id=archive_transaction_payment.transaction_id
where
a!=b
group by transaction_id;
Joins are still problematic for me, but I found an answer without join to find faults in the database.
SELECT id
FROM archive_transaction a
WHERE total_low + total_high != (SELECT Sum(money_received)
FROM archive_transaction_payment b
WHERE a.id = b.transaction_id);
Now I get a short list of problems in my database. Thanks for helping me out.
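For comparison, the join version can also be made to work. The a!=b condition fails in WHERE because WHERE is evaluated before grouping, so the SUM does not exist yet; a condition on an aggregate belongs in HAVING. A minimal sketch, using an in-memory SQLite database in place of MySQL and made-up rows (the amounts and the `expected`/`received` aliases are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE archive_transaction (id INTEGER PRIMARY KEY,
                                  total_low INTEGER, total_high INTEGER);
CREATE TABLE archive_transaction_payment (transaction_id INTEGER,
                                          money_received INTEGER);
-- Hypothetical data: transaction 1 is paid in full, transaction 2 is short.
INSERT INTO archive_transaction VALUES (1, 10, 5), (2, 20, 0);
INSERT INTO archive_transaction_payment VALUES (1, 10), (1, 5), (2, 15);
""")

# HAVING is applied after GROUP BY, so it can compare against SUM(...).
MISMATCH_SQL = """
SELECT t.id,
       t.total_low + t.total_high AS expected,
       SUM(p.money_received)      AS received
FROM archive_transaction t
INNER JOIN archive_transaction_payment p ON t.id = p.transaction_id
GROUP BY t.id, t.total_low, t.total_high
HAVING t.total_low + t.total_high != SUM(p.money_received)
"""

rows = conn.execute(MISMATCH_SQL).fetchall()
print(rows)
```

As with the subquery version, a transaction with no payment rows at all is not reported: the inner join drops it here, and the comparison against a NULL sum drops it there.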

How to store SQL Query result in table column

I'm aware of the INSERT INTO table_name QUERY; however, I'm unsure how to go about achieving the desired result in this case.
Here's a slightly contrived example to explain what I'm looking for, but I'm afraid I cannot put it more succinctly.
I have two tables in a database designed for a hotel.
BOOKING and CUSTOMER_BOOKING
Where BOOKING contains PK_room_number, room_type, etc. and CUSTOMER_BOOKING contains FK_room_number, FK_customer_id
CUSTOMER_BOOKING is a linking table (many customers can make many bookings, and many bookings can consist of many customers).
Ultimately, in the application back-end I want to be able to list all rooms that have fewer than 3 customers associated with them. I could execute this as a separate query and save the result in the server-side scripting.
However, a more elegant solution (from my point of view) is to store this within the BOOKING table itself. That is to add a column no_of_bookings that counts the number of times the current PK_room_number appears as the foreign key FK_room_number within the CUSTOMER_BOOKING table. And why do this instead? Because it would be impossible for me to write a single complicated query which will both include the information from all ROOMS, among other tables, and also count the occurrences of bookings, without excluding ROOMS that don't have any bookings. A very bad thing for a hotel website attempting to show free rooms!
So it would look like this
BOOKING: PK_room_number (104B) room_type (double) room_price (high), no_of_bookings (3)
BOOKING: PK_room_number (108C) room_type (single) room_price (low), no_of_bookings (1)
CUSTOMER_BOOKING: FK_room_number (104B) FK_customer_id (4312)
CUSTOMER_BOOKING: FK_room_number (104B) FK_customer_id (6372)
CUSTOMER_BOOKING: FK_room_number (104B) FK_customer_id (1112)
CUSTOMER_BOOKING: FK_room_number (108C) FK_customer_id (9181)
How would I go about creating this?
Because it would be impossible for me to write a single complicated
query which will both include the information from all ROOMS, among
other tables, and also count the occurrences of bookings, without
excluding ROOMS that don't have any bookings.
I wouldn't say it's impossible and unless you're running into performance issues, it's easier to implement than adding a new summary column:
select b.*, count(cb.room_number)
from bookings b
left join customer_booking cb on b.room_number = cb.room_number
group by b.room_number
Depending on your query, you may need to use a derived table containing the booking counts for each room instead:
select b.*, coalesce(t1.number_of_bookings,0) number_of_bookings
from bookings b
left join (
select room_number, count(*) number_of_bookings
from customer_booking
group by room_number
) t1 on t1.room_number = b.room_number
You have to left join the derived table and select coalesce(t1.number_of_bookings,0) in case a room does not have any entries in the derived table (i.e. 0 bookings).
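A minimal sketch of the derived-table version, run on an in-memory SQLite database in place of MySQL. The rows for 104B and 108C follow the question; room 110A, which has no bookings, and the simplified column names are assumptions added to show the zero case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (room_number TEXT PRIMARY KEY, room_type TEXT);
CREATE TABLE customer_booking (room_number TEXT, customer_id INTEGER);
INSERT INTO bookings VALUES ('104B', 'double'), ('108C', 'single'),
                            ('110A', 'single');  -- 110A: hypothetical empty room
INSERT INTO customer_booking VALUES ('104B', 4312), ('104B', 6372),
                                    ('104B', 1112), ('108C', 9181);
""")

# LEFT JOIN keeps rooms that are missing from the derived table;
# COALESCE turns their NULL count into 0.
ROOMS_UNDER_3_SQL = """
SELECT b.room_number, COALESCE(t1.number_of_bookings, 0) AS number_of_bookings
FROM bookings b
LEFT JOIN (
    SELECT room_number, COUNT(*) AS number_of_bookings
    FROM customer_booking
    GROUP BY room_number
) t1 ON t1.room_number = b.room_number
WHERE COALESCE(t1.number_of_bookings, 0) < 3
ORDER BY b.room_number
"""

rows = conn.execute(ROOMS_UNDER_3_SQL).fetchall()
print(rows)
```

Room 104B is filtered out because it already has 3 customers, while the room with no bookings is listed with a count of 0.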
A summary column is a good idea when you're running into performance issues with counting the # of bookings each time. In that case I recommend creating insert and delete triggers on the customer_booking table that either increment or decrement the number_of_bookings column.
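If you do go the summary-column route, the trigger idea might look roughly like this. It is a sketch in SQLite's trigger syntax, run through Python's sqlite3 module; MySQL's CREATE TRIGGER is similar in spirit but not identical (for example, multi-statement bodies typically need a DELIMITER change in the mysql client). The trigger names and the simplified columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (room_number TEXT PRIMARY KEY, room_type TEXT,
                       number_of_bookings INTEGER NOT NULL DEFAULT 0);
CREATE TABLE customer_booking (room_number TEXT, customer_id INTEGER);

-- Hypothetical trigger names; each keeps the summary column in step.
CREATE TRIGGER customer_booking_after_insert
AFTER INSERT ON customer_booking
FOR EACH ROW
BEGIN
    UPDATE bookings
    SET number_of_bookings = number_of_bookings + 1
    WHERE room_number = NEW.room_number;
END;

CREATE TRIGGER customer_booking_after_delete
AFTER DELETE ON customer_booking
FOR EACH ROW
BEGIN
    UPDATE bookings
    SET number_of_bookings = number_of_bookings - 1
    WHERE room_number = OLD.room_number;
END;

INSERT INTO bookings (room_number, room_type)
VALUES ('104B', 'double'), ('108C', 'single');
""")

# Three customers book 104B, one books 108C, then one 104B booking is removed.
conn.executemany("INSERT INTO customer_booking VALUES (?, ?)",
                 [("104B", 4312), ("104B", 6372), ("104B", 1112), ("108C", 9181)])
conn.execute("DELETE FROM customer_booking WHERE customer_id = 1112")

counts = conn.execute(
    "SELECT room_number, number_of_bookings FROM bookings ORDER BY room_number"
).fetchall()
print(counts)
```

An update that moves a booking from one room to another would need a third trigger (or a delete followed by an insert), which this sketch leaves out.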
You could do it in a single straight select like this:
select DISTINCT
b1.room_pk,
c1.no_of_bookings
from cust_bookings b1,
(select room_pk, count(1) as no_of_bookings
from cust_bookings
group by room_pk) c1
where b1.room_pk = c1.room_pk
having c1.no_of_bookings < 3
Sorry, I used my own table names to test it, but you should be able to adapt it easily enough. Also, the "having" line is only there to limit the rows returned to rooms with fewer than 3 bookings. If you remove that line you will get everything, and you could use the same SQL to update a column on the bookings table if you still want to go that route.
Consider below solutions.
A simple aggregate query to count the customers per each booking:
SELECT b.PK_room_number, Count(c.FK_customer_id)
FROM Booking b
LEFT JOIN Customer_Booking c ON b.PK_room_number = c.FK_room_number -- LEFT JOIN keeps rooms with no bookings (count 0)
GROUP BY b.PK_room_number
HAVING Count(c.FK_customer_id) < 3; # ADD 3 ROOM MAX FILTER
And if you intend to use a new column no_of_bookings, here is an update query (using an aggregate subquery) to run right after inserting a new value from the web frontend:
UPDATE Booking b
INNER JOIN
(SELECT b.PK_room_number, Count(c.FK_customer_id) As customercount
FROM Booking b
LEFT JOIN Customer_Booking c ON b.PK_room_number = c.FK_room_number -- LEFT JOIN so empty rooms are reset to 0
GROUP BY b.PK_room_number) As r
ON b.PK_room_number = r.PK_room_number
SET b.no_of_bookings = r.customercount;
The following generates a list showing all of the bookings, with a flag of 0 or 1 depending on whether the room has a customer. It will display some rooms multiple times if there are multiple customers.
select BOOKING.*,
case when CUSTOMER_BOOKING.FK_ROOM_NUMBER is null THEN 0 ELSE 1 END AS BOOKING_FLAG
from BOOKING LEFT OUTER JOIN CUSTOMER_BOOKING
ON BOOKING.PK_room_number = CUSTOMER_BOOKING.FK_room_number
Summing and grouping, we arrive at:
select BOOKING.*,
SUM(case when CUSTOMER_BOOKING.FK_ROOM_NUMBER is null THEN 0 ELSE 1 END) AS BOOKING_COUNT
from BOOKING LEFT OUTER JOIN CUSTOMER_BOOKING
ON BOOKING.PK_room_number = CUSTOMER_BOOKING.FK_room_number
GROUP BY BOOKING.PK_room_number
There are at least two other solutions I can think of off the top of my head...

How to deal with bad data in mysql?

I have three tables that I want to combine.
I have the following query to run:
DROP TABLE
IF EXISTS testgiver.smart_curmonth_downs;
CREATE TABLE testgiver.smart_curmonth_downs
SELECT
ldap_karen.uid,
ldap_karen.supemail,
ldap_karen.regionname,
smart_curmonth_downs_raw.username,
smart_curmonth_downs_raw.email,
smart_curmonth_downs_raw.publisher,
smart_curmonth_downs_raw.itemtitle,
smart_items.`Owner`
FROM
smart_curmonth_downs_raw
INNER JOIN ldap_karen ON smart_curmonth_downs_raw.username = ldap_karen.uid
INNER JOIN smart_items ON smart_curmonth_downs_raw.itemtitle = smart_items.Title
I want to know how to create the joins while maintaining a one to one relationship at all times with rows in table smart_curmonth_downs_raw.
For instance if there is not a uid in ldap_karen I have issues. And then the last issue I have found is that our CMS is allowing for duplicate itemtitle. So if I run my query I am getting a lot more rows because it is creating a row for each itemtitle. For example would there be a way to only catch the last itemtitle that is in smart_items. I would just really like to maintain the same number of rows - and I have no control over the integrity issues of the other tables.
The smart_curmonth_downs_raw table is the raw download information (download stats), the karen table adds unique user information, and the smart_items table adds unique items (download) info. They are all important. If a user made a download but is knocked off the karen table I would like to see NULLs for the user info and if there is more than one item in smart_items that has the same name then I would like to see just the item with the highest ID.
It sounds like the relationship between smart_curmonth_downs_raw and ldap_karen is optional, which means you want a LEFT JOIN: it returns all the rows from the first table and, where no matching row exists in the right table, uses NULL for the right table's column values.
In terms of the last item in the smart_items table, you could use this query.
SELECT title, MAX(id) AS max_id
FROM smart_items
GROUP BY title;
Combining that query with the other logic, try this query as a solution.
SELECT COALESCE(ldap_karen.uid, 'Unknown') AS uid,
COALESCE(ldap_karen.supemail, 'Unknown') AS supemail,
COALESCE(ldap_karen.regionname, 'Unknown') AS regionname,
smart_curmonth_downs_raw.username,
smart_curmonth_downs_raw.email,
smart_curmonth_downs_raw.publisher,
smart_curmonth_downs_raw.itemtitle,
smart_items.`Owner`
FROM smart_curmonth_downs_raw
INNER JOIN (SELECT title, MAX(id) AS max_id
FROM smart_items
GROUP BY title) AS most_recent
ON smart_curmonth_downs_raw.itemtitle = most_recent.Title
INNER JOIN smart_items
ON most_recent.max_id = smart_items.id
LEFT JOIN ldap_karen
ON smart_curmonth_downs_raw.username = ldap_karen.uid;
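Here is a small sketch that checks the row count is preserved. It uses an in-memory SQLite database in place of MySQL, trimmed-down columns, and made-up rows. Unlike the query above, it is a variation: it also uses LEFT JOIN for smart_items, so a download whose title is missing from smart_items would still be kept, and it leaves missing user info as NULL rather than 'Unknown'. It assumes uid is unique in ldap_karen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE smart_curmonth_downs_raw (username TEXT, itemtitle TEXT);
CREATE TABLE ldap_karen (uid TEXT PRIMARY KEY, regionname TEXT);
CREATE TABLE smart_items (id INTEGER PRIMARY KEY, Title TEXT, Owner TEXT);
-- Hypothetical data: 'bob' is missing from ldap_karen, 'Guide' is duplicated.
INSERT INTO smart_curmonth_downs_raw VALUES
    ('alice', 'Guide'), ('bob', 'Guide'), ('carol', 'Manual');
INSERT INTO ldap_karen VALUES ('alice', 'East'), ('carol', 'West');
INSERT INTO smart_items VALUES
    (1, 'Guide', 'old owner'), (2, 'Guide', 'new owner'), (3, 'Manual', 'docs');
""")

# most_recent collapses duplicate titles to the highest id, so each raw row
# matches at most one item; the LEFT JOINs never drop a raw row.
ONE_TO_ONE_SQL = """
SELECT r.username, r.itemtitle, i.Owner, k.regionname
FROM smart_curmonth_downs_raw r
LEFT JOIN (
    SELECT Title, MAX(id) AS max_id
    FROM smart_items
    GROUP BY Title
) most_recent ON r.itemtitle = most_recent.Title
LEFT JOIN smart_items i ON most_recent.max_id = i.id
LEFT JOIN ldap_karen k ON r.username = k.uid
ORDER BY r.username
"""

rows = conn.execute(ONE_TO_ONE_SQL).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM smart_curmonth_downs_raw").fetchone()[0]
for row in rows:
    print(row)
```

The result has exactly as many rows as the raw table, with the highest-id item chosen for the duplicated title and None (NULL) for the user who is missing from ldap_karen.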

Retrieve top 1 and 2 records from each group from table

I have a query that needs to get the first and second highest SKU in each member's wishlist. The query below works, but it takes way too long: there are about 9 million users and each user has about 10 wishlist items, so as written it will never finish.
SELECT MAX(CASE WHEN wl.rank = 1 THEN wl.SKU ELSE NULL END) AS [highestSku],
MAX(CASE WHEN wl.rank = 2 THEN wl.SKU ELSE NULL END) AS [secondHighestSku]
FROM Member m
LEFT JOIN (SELECT *
FROM (SELECT DENSE_RANK() OVER (PARTITION BY wl.MemberID ORDER BY wli.Price DESC) AS rank, wl.MemberID, wli.SKU
FROM WishListItem wli
INNER JOIN WishList wl ON wli.WishListID = wl.ID) T1) w ON w.MemberID = m.ID
My question is: is there a better way to get the first and second records for each user? If not, is there a way I can optimize this query? Ideally, restricting the number of items pulled back from the ranking query (the one with the DENSE_RANK()) would help. I wanted to do something like WHERE DENSE_RANK() <= 2, but that's not possible, and doing it outside of the brackets defeats the purpose of the solution.
Also, this is just part of the query. I actually have even more left joins across more tables that have just as many items, and I need to get the top 1 and 2 records for each user.
And this needs to be done in one query, or as much as possible in one because I'm throwing it in a data table. I can also reduce the number of records, ie. TOP 1000, and break up the query, but I will need to be able to continue from where I left off... also, I did try TOP 1000, and after 10 minutes, I cancelled the query because I need to get all 9 million records out.
I'd grab a relatively small subset of the data, stick it in a table variable, and run the query off that instead of the main (and likely very "busy") tables:
DECLARE @Member TABLE -- table variables are declared with @ (a leading # denotes a temp table)
(
ID int IDENTITY (1, 1) PRIMARY KEY NOT NULL,
-- add necessary columns to this definition.
)
INSERT INTO @Member (field1, field2...)
SELECT field1, field2 -- etc.
FROM YourTables
WHERE SomeCriteria = Whatever
Make sure that the WHERE clause defines a narrower subset of data than your production tables. If performance still suffers, you could create table variables for the other tables you're joining, then use those in the final query.
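On the ranking part of the question: a window function cannot be referenced in the WHERE clause of the same SELECT, but the rank can be filtered one level up, so only ranks 1 and 2 reach the pivot. Below is a minimal sketch of that shape. It runs on an in-memory SQLite database (version 3.25 or later is needed for window functions) rather than SQL Server, and the sample rows and the `rnk` alias are assumptions. It does not show whether this is fast enough for 9 million members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WishList (ID INTEGER PRIMARY KEY, MemberID INTEGER);
CREATE TABLE WishListItem (WishListID INTEGER, SKU TEXT, Price INTEGER);
-- Hypothetical data: member 100 has three items, member 200 has one.
INSERT INTO WishList VALUES (1, 100), (2, 200);
INSERT INTO WishListItem VALUES (1, 'A', 5), (1, 'B', 9), (1, 'C', 7),
                                (2, 'D', 3);
""")

# Rank inside the derived table, filter on the rank outside it, then pivot
# ranks 1 and 2 into two columns per member.
TOP_TWO_SQL = """
SELECT MemberID,
       MAX(CASE WHEN rnk = 1 THEN SKU END) AS highestSku,
       MAX(CASE WHEN rnk = 2 THEN SKU END) AS secondHighestSku
FROM (
    SELECT wl.MemberID, wli.SKU,
           DENSE_RANK() OVER (PARTITION BY wl.MemberID
                              ORDER BY wli.Price DESC) AS rnk
    FROM WishListItem wli
    INNER JOIN WishList wl ON wli.WishListID = wl.ID
) ranked
WHERE rnk <= 2
GROUP BY MemberID
ORDER BY MemberID
"""

rows = conn.execute(TOP_TWO_SQL).fetchall()
for row in rows:
    print(row)
```

A member with a single wishlist item gets None (NULL) for the second SKU. With DENSE_RANK, tied prices share a rank, so MAX picks one SKU among the ties; ROW_NUMBER would be the alternative if exactly one row per rank is wanted.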