I have a query which is used to obtain information about the owner of a vehicle at a particular point in time, when it is sighted (from the vehicle_sightings table). I have attached a snippet of part of the query below:
SELECT
sighting_id
FROM
vehicle_sightings
INNER JOIN
vehicle_vrn ON vehicle_sightings.plate = vehicle_vrn.vrnno
INNER JOIN
vehicle_ownership ON vehicle_vrn.fk_sysno = vehicle_ownership.fk_sysno
WHERE
vehicle_sightings.seenDate >= vehicle_ownership.ownership_start_date
AND (vehicle_sightings.seenDate <= vehicle_ownership.ownership_end_date
OR vehicle_ownership.ownership_end_date IS NULL
OR vehicle_ownership.ownership_end_date = '0001-01-01 00:00:00')
This works well for most scenarios where a vehicle only has one owner in its history. However, there are some instances where the ownership_end_date field is not filled in (it is filled in for most cases, as it indicates that a vehicle would have changed hands, and is from that stage onwards passed on to a new owner). In these instances where it is not filled in (or left default), all entries for that ownership history are returned, such as the below case:
In that case above, the query returns both of those records as the seenDate fits in both of them since the end date is not filled in (and has the default value in that case). I therefore need to modify my query to return the record with the highest ownership_start_date in those cases.
I tried to do this by adding the following at the end:
GROUP BY sighting_id HAVING seenDate >= MAX(ownership_start_date)
This, however, did not work, as far fewer records were returned. Is there a clean way this can be achieved, maybe without the GROUP BY?
To start with, using a default date as you have is an extremely bad idea. You should already have an idea of that, because now you're stuck coding against a hard-coded date in some scenarios. When another developer is coding against the database (or even you at a later date) they now have to both know and remember that they have to code for exceptions based on some hard-coded date.
Further, your ownership_end_date should always be after the ownership_start_date, which this "default" date is now going to violate. If you don't know the date or the date doesn't exist yet then it should be NULL - that's exactly what NULL is for - unknown.
For your specific issue, you can do this with a LEFT JOIN that checks for other owners that fit the criteria and excludes the row if a better one exists. You didn't provide all of your table structures and it was a little unclear on whether or not you just wanted the latest owner before the sighted date (what I've done) or all owners who owned the car after the sighted date, so I don't know if this works, but something like this:
SELECT
VS.sighting_id -- ALWAYS use table aliases or prefixes for clarity
FROM
vehicle_sightings VS
INNER JOIN vehicle_vrn VRN ON VRN.vrnno = VS.plate
INNER JOIN vehicle_ownership VO ON VO.fk_sysno = VRN.fk_sysno
LEFT OUTER JOIN vehicle_ownership VO2 ON
VO2.fk_sysno = VRN.fk_sysno AND
VO2.ownership_start_date <= VS.seenDate AND
(
VO2.ownership_end_date >= VS.seenDate OR
VO2.ownership_end_date IS NULL OR
VO2.ownership_end_date = '0001-01-01 00:00:00'
) AND
VO2.ownership_start_date > VO.ownership_start_date
WHERE
VS.seenDate >= VO.ownership_start_date AND
(
VS.seenDate <= VO.ownership_end_date OR
VO.ownership_end_date IS NULL OR
VO.ownership_end_date = '0001-01-01 00:00:00'
) AND
VO2.id IS NULL -- Or some other non-nullable column
One last caveat: decide on a naming convention and stick to it (seenDate vs ownership_end_date for example) and use names that make sense (what is an fk_sysno??)
Here is a solution which doesn't require a subquery. It ensures that no ownership record exists with a later start date than the record which is returned.
SELECT
sighting_id
FROM
vehicle_sightings
INNER JOIN
vehicle_vrn ON vehicle_sightings.plate = vehicle_vrn.vrnno
INNER JOIN
vehicle_ownership ON vehicle_vrn.fk_sysno = vehicle_ownership.fk_sysno
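LEFT OUTER JOIN
vehicle_ownership vo2 ON vo2.fk_sysno = vehicle_ownership.fk_sysno
AND vo2.ownership_start_date > vehicle_ownership.ownership_start_date
AND vo2.ownership_start_date <= vehicle_sightings.seenDate -- ignore owners who only start after the sighting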
WHERE
vehicle_sightings.seenDate >= vehicle_ownership.ownership_start_date
AND (vehicle_sightings.seenDate <= vehicle_ownership.ownership_end_date
OR vehicle_ownership.ownership_end_date IS NULL
OR vehicle_ownership.ownership_end_date = '0001-01-01 00:00:00')
AND vo2.fk_sysno IS NULL
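To see how the LEFT JOIN ... IS NULL pattern picks the latest matching owner, here is a minimal sketch run through Python's sqlite3 module. The rows, the owner column and the id column are made up for illustration (the question never showed the real data), and the dates are stored as ISO text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle_sightings (sighting_id INTEGER, plate TEXT, seenDate TEXT);
CREATE TABLE vehicle_vrn (fk_sysno INTEGER, vrnno TEXT);
CREATE TABLE vehicle_ownership (
    id INTEGER PRIMARY KEY, fk_sysno INTEGER, owner TEXT,
    ownership_start_date TEXT, ownership_end_date TEXT);

INSERT INTO vehicle_vrn VALUES (1, 'ABC123');
-- Owner A still has the default end date, so both rows look open-ended.
INSERT INTO vehicle_ownership VALUES
    (1, 1, 'A', '2010-01-01 00:00:00', '0001-01-01 00:00:00'),
    (2, 1, 'B', '2014-01-01 00:00:00', NULL);
INSERT INTO vehicle_sightings VALUES
    (100, 'ABC123', '2012-06-01 00:00:00'),
    (101, 'ABC123', '2015-06-01 00:00:00');
""")

rows = conn.execute("""
SELECT vs.sighting_id, vo.owner
FROM vehicle_sightings vs
JOIN vehicle_vrn vrn ON vrn.vrnno = vs.plate
JOIN vehicle_ownership vo ON vo.fk_sysno = vrn.fk_sysno
LEFT JOIN vehicle_ownership vo2
       ON vo2.fk_sysno = vo.fk_sysno
      AND vo2.ownership_start_date > vo.ownership_start_date
      AND vo2.ownership_start_date <= vs.seenDate
WHERE vs.seenDate >= vo.ownership_start_date
  AND (vs.seenDate <= vo.ownership_end_date
       OR vo.ownership_end_date IS NULL
       OR vo.ownership_end_date = '0001-01-01 00:00:00')
  AND vo2.id IS NULL  -- keep vo only if no later owner had started by seenDate
ORDER BY vs.sighting_id
""").fetchall()

print(rows)
```

With this sample data, the 2012 sighting resolves to owner A and the 2015 sighting to owner B, even though A's end date was never filled in.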
I have 3 tables in my DB; Transactions, transaction_details, and accounts - basically as below.
transactions :
id
details
by_user
created_at
trans_details :
id
trans_id (foreign key)
account_id
account_type (Enum -[c,d])
amount
Accounts :
id
sub_name
In each transaction each account may be creditor or debtor. What I'm trying to get is an account statement (ex : bank account movements) so I need to query each movement when the account is type = c (creditor) or the account type is = d (debtor)
trans_id, amount, created_at, creditor_account, debtor_account
Update: I tried the following query, but I get all NULL values in the debtor column!
SELECT transactions.created_at,trans_details.amount,(case WHEN trans_details.type = 'c' THEN sub_account.sub_name END) as creditor,
(case WHEN trans_details.type = 'd' THEN sub_account.sub_name END) as debtor from transactions
JOIN trans_details on transactions.id = trans_details.trans_id
JOIN sub_account on trans_details.account_id = sub_account.id
GROUP by transactions.id
After the help of @Jalos I had to convert the query to Laravel, which took me 2 more hours to convert and get the correct result :) Below is the Laravel code in case someone needs to perform such a query.
I also added filtering between two dates.
public function accountStatement($from_date,$to_date)
{
$statement = DB::table('transactions')
->join('trans_details as credit_d',function($join) {
$join->on('credit_d.trans_id','=','transactions.id');
$join->where('credit_d.type','c');
})
->join('sub_account as credit_a','credit_a.id','=','credit_d.account_id')
->join('trans_details as debt_d',function($join) {
$join->on('debt_d.trans_id','=','transactions.id');
$join->where('debt_d.type','d');
})
->join('sub_account as debt_a','debt_a.id','=','debt_d.account_id')
->whereBetween('transactions.created_at',[$from_date,$to_date])
->select('transactions.id','credit_d.amount','transactions.created_at','credit_a.sub_name as creditor','debt_a.sub_name as debtor')
->get();
return response()->json(['status_code'=>2000,'data'=>$statement , 'message'=>''],200);
}
Your transactions table denotes transaction records, while your accounts table denotes account records. Your trans_details table denotes links between transactions and accounts. So, since in a transaction there is a creditor and a debtor, I assume that trans_details has exactly two records for each transaction:
select transactions.id, creditor_details.amount, transactions.created_at, creditor.sub_name as creditor_account, debtor.sub_name as debtor_account
from transactions
join trans_details creditor_details
on transactions.id = creditor_details.trans_id and creditor_details.account_type = 'c'
join accounts creditor
on creditor_details.account_id = creditor.id
join trans_details debtor_details
on transactions.id = debtor_details.trans_id and debtor_details.account_type = 'd'
join accounts debtor
on debtor_details.account_id = debtor.id;
EDIT
As promised, I am looking into the query you have written. It looks like this:
SELECT transactions.id,trans_details.amount,(case WHEN trans_details.type = 'c' THEN account.name END) as creditor,
(case WHEN trans_details.type = 'd' THEN account.name END) as debtor from transactions
JOIN trans_details on transactions.id = trans_details.trans_id
JOIN account on trans_details.account_id = account.id
GROUP by transactions.id
and it is almost correct. The problem is that, because of the GROUP BY, MySQL can show only a single value per transaction for creditor and for debtor. However, each transaction produces two joined rows: the creditor expression is NULL on the row that matches the debtor and holds the proper name on the row that matches the creditor, and the debtor expression behaves the same way in reverse. I would have expected MySQL to throw an error, because these computed CASE WHEN fields are neither grouped nor aggregated even though they take several values per group (it does raise that error when the ONLY_FULL_GROUP_BY SQL mode is enabled), but it seems MySQL can surprise me after so many years :)
From the result we see that MySQL probably took the values of the first row in each group for both creditor and debtor. Since the first row it met was the creditor match, it had a proper creditor value and a NULL debtor value. However, if you write bullet-proof code, you will never run into this strange behavior. In our case, a minimal improvement to your code turns it into a bullet-proof version that provides correct results:
SELECT transactions.id,trans_details.amount,max((case WHEN trans_details.type = 'c' THEN account.name END)) as creditor,
max((case WHEN trans_details.type = 'd' THEN account.name END)) as debtor from transactions
JOIN trans_details on transactions.id = trans_details.trans_id
JOIN account on trans_details.account_id = account.id
group by transactions.id
Note that the only change I made to your code is to wrap a max() function call around the CASE WHEN definitions. Since max() ignores NULL values, each group keeps the non-NULL name, so your approach was VERY close to a bullet-proof solution.
Fiddle: http://sqlfiddle.com/#!9/d468dc/10/0
However, even though your thought process was theoretically correct (theoretically there is no difference between theory and practice, but in practice they are usually different) and a slight change turns it into working code, I still prefer my query because it avoids the GROUP BY clause. GROUP BY is useful when necessary, but here it is unnecessary, and leaving it out is probably better in terms of performance and memory usage, is easier to read, and keeps more options open for future customisations. Still, your attempt was very close to a solution.
As for my query, the trick I used was to join the same tables several times, aliasing them so that from that point on they can be treated as if they were different tables. This is a very useful trick that you will need a lot in the future.
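For comparison, here is a small sketch that runs both approaches side by side using Python's sqlite3 module. The sample rows and account names are invented, and the column names follow the schema listed in the question (account_type, accounts) rather than the names used in the asker's own query. Unlike the query above, the sketch also aggregates amount so that it does not depend on how the engine treats ungrouped columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE accounts (id INTEGER PRIMARY KEY, sub_name TEXT);
CREATE TABLE trans_details (
    id INTEGER PRIMARY KEY, trans_id INTEGER, account_id INTEGER,
    account_type TEXT, amount REAL);

INSERT INTO accounts VALUES (1, 'Cash'), (2, 'Sales');
INSERT INTO transactions VALUES (10, '2020-01-05');
INSERT INTO trans_details VALUES
    (1, 10, 1, 'd', 250.0),
    (2, 10, 2, 'c', 250.0);
""")

# Approach 1: one join, then collapse the two detail rows per transaction.
# MAX() skips the NULL produced by the non-matching CASE branch.
pivot = conn.execute("""
SELECT t.id,
       MAX(d.amount) AS amount,
       MAX(CASE WHEN d.account_type = 'c' THEN a.sub_name END) AS creditor,
       MAX(CASE WHEN d.account_type = 'd' THEN a.sub_name END) AS debtor
FROM transactions t
JOIN trans_details d ON d.trans_id = t.id
JOIN accounts a ON a.id = d.account_id
GROUP BY t.id
""").fetchall()

# Approach 2: join the same tables twice under different aliases.
self_join = conn.execute("""
SELECT t.id, cd.amount, c.sub_name AS creditor, b.sub_name AS debtor
FROM transactions t
JOIN trans_details cd ON cd.trans_id = t.id AND cd.account_type = 'c'
JOIN accounts c ON c.id = cd.account_id
JOIN trans_details dd ON dd.trans_id = t.id AND dd.account_type = 'd'
JOIN accounts b ON b.id = dd.account_id
""").fetchall()

print(pivot)
print(self_join)
```

Both queries return the same single row for the sample transaction, provided each transaction has exactly one creditor row and one debtor row.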
Good day,
I have a small issue with MySQL Distinct.
Trying the following query in my system :
SELECT DISTINCT `booking_id`, `booking_ticket`, `booking_price`, `bookingcomment_id`, `bookingcomment_message` FROM `mysystem_booking`
LEFT JOIN `mysystem_bookingcomment` ON `mysystem_booking`.`booking_id` = `mysystem_bookingcomment`.`bookingcomment_link`
WHERE `booking_id` = 29791
The point is that there are bookings like 29791 that have many comments added.
Let's say 10. Then when running the above query I see 10 results instead of one.
And that's not the way DISTINCT is supposed to work.
I simply want to know if there are any comments. If the comment ID is not 0 then there is a comment. Of course I can add COUNT(blabla) as comment_number but that's a whole different story. For me now I'd like just to have this syntax right.
You may try aggregating here, to find which bookings have at least a single comment associated with them:
SELECT
b.booking_id,
b.booking_ticket,
b.booking_price
FROM mysystem_booking b
LEFT JOIN mysystem_bookingcomment bc
ON b.booking_id = bc.bookingcomment_link
WHERE
b.booking_id = 29791
GROUP BY
b.booking_id
HAVING
COUNT(bc.bookingcomment_link) > 0;
Note that depending on your MySQL server mode, you might have to also add the booking_ticket and booking_price columns to the GROUP BY clause to get the above query to run.
You can try the query below, which uses a CASE WHEN expression. The comment ID is left out of the select list so that DISTINCT can collapse the joined rows into one:
SELECT DISTINCT `booking_id`, `booking_ticket`, `booking_price`,
case when COALESCE(`bookingcomment_id`, 0) <> 0 then 'Yes' else 'No' end as comments
FROM `mysystem_booking`
LEFT JOIN `mysystem_bookingcomment` ON `mysystem_booking`.`booking_id` = `mysystem_bookingcomment`.`bookingcomment_link`
WHERE `booking_id` = 29791
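The reason the original query returns ten rows is that DISTINCT compares whole result rows, and every joined row carries a different comment ID. As a rough illustration, the sketch below (Python's sqlite3, with invented sample rows) shows that effect and then uses an EXISTS subquery, which is a different technique from the two answers above, to get exactly one row per booking with a yes/no flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mysystem_booking (
    booking_id INTEGER PRIMARY KEY, booking_ticket TEXT, booking_price REAL);
CREATE TABLE mysystem_bookingcomment (
    bookingcomment_id INTEGER PRIMARY KEY, bookingcomment_link INTEGER,
    bookingcomment_message TEXT);

INSERT INTO mysystem_booking VALUES (29791, 'T-1', 50.0), (29792, 'T-2', 75.0);
INSERT INTO mysystem_bookingcomment VALUES
    (1, 29791, 'first'), (2, 29791, 'second'), (3, 29791, 'third');
""")

# DISTINCT cannot merge these rows: each one has a different comment ID.
distinct_rows = conn.execute("""
SELECT DISTINCT b.booking_id, c.bookingcomment_id
FROM mysystem_booking b
LEFT JOIN mysystem_bookingcomment c ON b.booking_id = c.bookingcomment_link
WHERE b.booking_id = 29791
""").fetchall()

# EXISTS never multiplies rows, so each booking appears once with a 1/0 flag.
flagged = conn.execute("""
SELECT b.booking_id, b.booking_ticket, b.booking_price,
       EXISTS (SELECT 1 FROM mysystem_bookingcomment c
               WHERE c.bookingcomment_link = b.booking_id) AS has_comments
FROM mysystem_booking b
ORDER BY b.booking_id
""").fetchall()

print(len(distinct_rows))
print(flagged)
```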
Using MySQL 5.7, I am executing a LEFT JOIN, and the WHERE clause calls a user-defined function of mine. The query fails to return a matching row which it should find.
[Originally I simplified my actual code a bit for the purpose of this post. However in view of a user's proposed response, I post the actual code as it may be relevant.]
My user function is:
CREATE FUNCTION `jfn_rent_valid_email`(
rent_mail_to varchar(1),
agent_email varchar(45),
contact_email varchar(60)
)
RETURNS varchar(60)
BEGIN
IF rent_mail_to = 'A' AND agent_email LIKE '%@%' THEN
RETURN agent_email;
ELSEIF contact_email LIKE '%@%' THEN
RETURN contact_email;
ELSE
RETURN NULL;
END IF;
END
My query is:
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email)
AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL)
This produces no rows.
However, when a.AgentEmail is NULL, if I only change from
AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL)
to
AND (jfn_rent_valid_email(r.MailTo, NULL, co.Email) IS NOT NULL)
it does correctly produce a matching row:
RentCode, MailTo, AgentEmail, Email, ValidEmail
ZAKC17, N, <NULL>, name@email, name@email
So, when a.AgentEmail is NULL (from non-matching LEFT JOINed row), why in the world does passing it to the function as a.AgentEmail act differently from passing it as a literal NULL?
[BTW: I believe I have used this kind of construct under MS SQL server in the past and it has worked as I would expect. Also, I can reverse the test of AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL) to AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NULL) yet I still get no match. It's as though any reference to a.... as a parameter to the function causes no matching row...]
Most likely this is an issue with the optimizer turning the LEFT JOIN into an INNER JOIN. The optimizer may do this when it believes that the WHERE condition is always false for the generated NULL row (which in this case it is not).
You can take a look at the query plan with the EXPLAIN command; you will likely see a different table order depending on the query variation.
If the actual logic of the function is to check all emails with one function call, you may have better luck with using a function that takes just one email address as parameter and use that for each email-column.
You can try without the function in the WHERE clause:
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email)
AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
AND ((r.MailTo='A' AND a.AgentEmail LIKE '%@%') OR co.Email LIKE '%@%' )
Or wrap the function in a subquery:
SELECT q.RentCode, q.MailTo, q.AgentEmail, q.Email, q.ValidEmail
FROM (
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
) as q
WHERE q.ValidEmail IS NOT NULL
Changing the call to the function in the WHERE clause to read
jfn_rent_valid_email(r.MailTo, IFNULL(a.AgentEmail, NULL), IFNULL(co.Email, NULL)) IS NOT NULL
solves the issue.
It appears that the optimizer wrongly assumes that the function will return NULL in the non-matching LEFT JOIN case if a plain reference to a.AgentEmail is passed as any parameter. But if the column reference is inside any kind of expression, the optimizer backs off. Wrapping it inside a "dummy", seemingly pointless IFNULL(column, NULL) is thus enough to restore correct behaviour.
I am marking this as the accepted solution because it is by far the simplest workaround, requiring the least code change/complete query rewrite.
However, full credit is due to @slaakso's post here in this topic for analysing the problem. Note that he states that the behaviour has been fixed/altered in MySQL 8 such that this workaround is unnecessary, so it may only be necessary in MySQL 5.7 or earlier.
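For reference, this is the result the original query is expected to produce when the function really is evaluated for the NULL-extended row. The sketch uses Python's sqlite3 with a Python stand-in for jfn_rent_valid_email registered as a SQL function. It does not reproduce the MySQL 5.7 optimizer behaviour; it only illustrates the intended semantics. The table contents are invented, and the stand-in assumes the LIKE pattern is meant to test for an @ sign.

```python
import sqlite3

def rent_valid_email(rent_mail_to, agent_email, contact_email):
    """Python stand-in for jfn_rent_valid_email; None plays the role of NULL."""
    if rent_mail_to == "A" and agent_email is not None and "@" in agent_email:
        return agent_email
    if contact_email is not None and "@" in contact_email:
        return contact_email
    return None

conn = sqlite3.connect(":memory:")
conn.create_function("jfn_rent_valid_email", 3, rent_valid_email)
conn.executescript("""
CREATE TABLE rents (RentCode TEXT, MailTo TEXT);
CREATE TABLE contacts (RentCode TEXT, Email TEXT);
CREATE TABLE link (RentCode TEXT, AgentCode TEXT);
CREATE TABLE agents (AgentCode TEXT, AgentEmail TEXT);

INSERT INTO rents VALUES ('ZAKC17', 'N');
INSERT INTO contacts VALUES ('ZAKC17', 'name@email');
""")

# link and agents are empty, so a.AgentEmail is NULL for the joined row.
rows = conn.execute("""
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
       jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode
LEFT JOIN link l ON r.RentCode = l.RentCode
LEFT JOIN agents a ON l.AgentCode = a.AgentCode
WHERE r.RentCode = 'ZAKC17'
  AND jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL
""").fetchall()

print(rows)
```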
I have a MySQL table called EssayStats with three columns, EssayDate, WordCount and EssayId.
Each row is a record of when the bot recorded how many words were in an essay at a particular point in time.
I'm trying to write a query that will group by EssayId and sort by the largest increase in WordCount from a particular EssayDate to an ending EssayDate.
I'm not really sure where to start. I've tried a handful of things but they obviously don't accomplish what I need. My most recent query attempt was
SELECT *
FROM EssayStats
WHERE EssayDate >= "2014-01-01" AND EssayDate <= "2014-05-31"
GROUP BY EssayId
ORDER BY (WordCount)
Start by getting the dates at the beginning and end for each essay. Then join back the original tables to get the counts and do some arithmetic:
select es.EssayId, (esmax.WordCount - esmin.WordCount)
from (select es.EssayId, min(es.EssayDate) as mined, max(es.EssayDate) as maxed
from EssayStats es
group by es.EssayId
) es join
EssayStats esmin
on es.EssayId = esmin.EssayId and es.mined = esmin.EssayDate join
EssayStats esmax
on es.EssayId = esmax.EssayId and es.maxed = esmax.EssayDate;
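The query above looks at every recorded date and does not sort. A possible way to restrict it to the date window from the question and order by the largest increase is sketched below with Python's sqlite3. The sample rows are invented, and the sketch assumes at most one row per essay per EssayDate; otherwise the joins back to EssayStats would return duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EssayStats (EssayId INTEGER, EssayDate TEXT, WordCount INTEGER);
INSERT INTO EssayStats VALUES
    (1, '2014-01-10', 100), (1, '2014-03-01', 400), (1, '2014-05-20', 900),
    (2, '2014-02-01', 500), (2, '2014-04-15', 650),
    (3, '2013-12-01', 50),  (3, '2014-06-15', 5000);
""")

rows = conn.execute("""
SELECT es.EssayId, esmax.WordCount - esmin.WordCount AS increase
FROM (SELECT EssayId, MIN(EssayDate) AS mined, MAX(EssayDate) AS maxed
      FROM EssayStats
      WHERE EssayDate >= ? AND EssayDate <= ?  -- only rows inside the window
      GROUP BY EssayId) es
JOIN EssayStats esmin
  ON esmin.EssayId = es.EssayId AND esmin.EssayDate = es.mined
JOIN EssayStats esmax
  ON esmax.EssayId = es.EssayId AND esmax.EssayDate = es.maxed
ORDER BY increase DESC
""", ("2014-01-01", "2014-05-31")).fetchall()

print(rows)
```

Essay 3 has no rows inside the window in this sample, so it does not appear in the result at all.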
I am trying to build an Access query with multiple criteria. The table to be queried is "tblVendors", which has information about vendor shipment data as shown below:
The second table is "tblSchedule" which has the schedule for each Vendor cutoff date. This table has cutoff dates for data analysis.
For each vendor, I need to select records which have the ShipDate >= CutoffDate. Although not shown in the data here, it may be possible that multiple vendors have same CutoffDate.
For a small number of records in "tblCutoffDate", I can write a query which looks like:
SELECT tblVendors.ShipmentId, tblVendors.VendorNumber, tblVendors.VendorName,
tblVendors.Units, tblVendors.ShipDate
FROM tblVendors INNER JOIN tblCutoffDate ON tblVendors.VendorNumber =
tblCutoffDate.VendorNumber
WHERE (((tblVendors.VendorNumber) In (SELECT VendorNumber FROM [tblCutoffDate] WHERE
[tblCutoffDate].[CutoffDate] = #2/1/2014#)) AND ((tblVendors.ShipDate)>=#2/1/2014#)) OR
(((tblVendors.VendorNumber) In (SELECT VendorNumber FROM [tblCutoffDate] WHERE
[tblCutoffDate].[CutoffDate] = #4/1/2014#)) AND ((tblVendors.ShipDate)>=#4/1/2014#));
As desired, the query gives me a result which looks like:
What concerns me now is that I have a lot of records being added to the "tblCutoffDate" which makes it difficult for me to hardcode the dates in the query. Is there a better way to write the above SQL statement without any hardcoding?
You might try something like the query below. To handle vendors that have no past cutoff or no future cutoff, note the following:
Date() returns today's date without a time part, which is what the comparison needs.
The "=" in the comparison may go on both, one, or neither of the Max/Min conditions.
Null may be replaced by #1/1/1900# in Max and #12/31/3999# in Min; as written, BETWEEN evaluates to Null (so the row is dropped) when either bound is Null.
SELECT tblVendors.ShipmentId,
tblVendors.VendorNumber,
tblVendors.VendorName,
tblVendors.Units,
tblVendors.ShipDate
FROM tblVendors
LEFT JOIN
( SELECT VendorNumber,
Max(IIf(CutoffDate < Date(), CutoffDate, Null)) AS PriorCutoff,
Min(IIf(CutoffDate >= Date(), CutoffDate, Null)) AS NextCutoff
FROM tblCutoffDate
GROUP BY VendorNumber
) AS VDates
ON tblVendors.VendorNumber = VDates.VendorNumber
WHERE tblVendors.ShipDate BETWEEN VDates.PriorCutoff AND VDates.NextCutoff
ORDER BY tblVendors.VendorNumber, tblVendors.ShipDate, tblVendors.ShipmentId
A simpler WHERE clause should give you what you want.
SELECT
v.ShipmentId,
v.VendorNumber,
v.VendorName,
v.Units,
v.ShipDate
FROM
tblVendors AS v
INNER JOIN tblCutoffDate AS cd
ON v.VendorNumber = cd.VendorNumber
WHERE v.ShipDate >= cd.CutoffDate;
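A quick way to check the simpler join is to run it against a few made-up rows. The sketch below uses Python's sqlite3 rather than Access, with ISO date strings instead of #...# literals, and assumes each vendor has exactly one row in tblCutoffDate (several cutoff rows per vendor would repeat the matching shipments).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblVendors (
    ShipmentId INTEGER, VendorNumber INTEGER, VendorName TEXT,
    Units INTEGER, ShipDate TEXT);
CREATE TABLE tblCutoffDate (VendorNumber INTEGER, CutoffDate TEXT);

INSERT INTO tblCutoffDate VALUES (1, '2014-02-01'), (2, '2014-04-01');
INSERT INTO tblVendors VALUES
    (1, 1, 'Acme',   10, '2014-01-15'),
    (2, 1, 'Acme',   20, '2014-02-01'),
    (3, 2, 'Globex',  5, '2014-03-20'),
    (4, 2, 'Globex',  8, '2014-04-10');
""")

# Each shipment is compared with the cutoff of its own vendor,
# so no dates need to be hardcoded in the query.
rows = conn.execute("""
SELECT v.ShipmentId, v.VendorNumber, v.ShipDate
FROM tblVendors AS v
INNER JOIN tblCutoffDate AS cd ON v.VendorNumber = cd.VendorNumber
WHERE v.ShipDate >= cd.CutoffDate
ORDER BY v.ShipmentId
""").fetchall()

print(rows)
```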