Error in SQL query on Adventure Work database - sql-server-2008

I am looking for a solution to the problem from Beginning Microsoft SQL ServerĀ® 2008 Programming book.(AdventureWorks database used)
Ques> Show the most recent five orders that were purchased from account numbers that have spent more than $70,000 with AdventureWorks.
I found a solution to this on See query but this is not working. What's the error in this query?
SELECT ss.AccountNumbeR, OrderDate
FROM Sales.SalesOrderHeader ss
WHERE ss.TotalDue > 70000
AND ss.AccountNumber NOT IN (SELECT TOP 5 ss.AccountNumber
FROM Sales.SalesOrderDetail so
JOIN Sales.SalesOrderHeader ss
ON SO.SalesOrderID = ss.SalesOrderID
HAVING SUM(so.LineTotal > 70000))
GROUP BY ss.TotalDue, ss.AccountNumber, OrderDate
ORDER BY OrderDate DESC

I'm not entirely sure why you though this query would work:
It's querying for orders where the account number is not in the top 5 which is the opposite of what we want, moreover TOP 5 here has no ordering so it's entirely random. And this is entirely in the wrong place, the question wants the top 5 orders, not the top 5 accounts.
It's also filtering for orders where each of the orders individually has a value greater than 70000. And HAVING SUM(so.LineTotal > 70000) is a syntax error.
I would advise you instead to use window functions here
SELECT s.AccountNumber, s.OrderDate
FROM (
SELECT so.*,
TotalPerAccount = SUM(ss.LineTotal) OVER (PARTITION BY so.AccountNumber),
rn = ROW_NUMBER() OVER (PARTITION BY so.AccountNumber ORDER BY so.OrderDate DESC)
FROM Sales.SalesOrderHeader so
JOIN Sales.SalesOrderDetail ss ON so.SalesOrderID = ss.SalesOrderID
) s
WHERE s.TotalPerAccount > 70000
AND s.rn <= 5;
db<>fiddle

Related

SQL: Select records based on comparison of two most recent associated records

Let's say we have a person table and survey table. survey is a set of attributes collected from a person at some point in time. Let's say survey has columns address and marriage_status
How do I select all persons whose address or marriage status has changed in the last survey?
Here's how I would write it if MySQL were able to magically interpret my intention:
SELECT *
FROM person
JOIN
(SELECT *
FROM survey
GROUP BY survey.person_id
ORDER BY survey.timestamp DESC
LIMIT 2 EACH) -- of course this part doesn't actually work. Trying to get last 2 records per person
surveys
ON surveys.person_id = person.id
WHERE surveys[0].address != surveys[1].address
OR surveys[0].marriage_status != surveys[1].marriage_status;
OR
SELECT *
FROM person
JOIN
(SELECT MOST RECENT survey FOR EACH person) latest_survey
ON latest_survey.person_id = person.id
JOIN
(SELECT SECOND MOST RECENT survey FOR EACH person) previous_survey
ON previous_survey.person_id = person.id
WHERE latest_survey.address != previous_survey.address
OR latest_survey.marriage_status != previous_survey.marriage_status;
This seems like a relatively straightforward query, but it's driving me crazy. I suspect I have tunnel vision and I'm not approaching this the right way.
EDIT: I am on MySQL v5. Based on the first couple answers, it seems like this might be the time to migrate to v8 (among other reasons)
So here's how I ended up doing it. It's a little long, but I think it's pretty straightforward? This felt amazing to get working.
(Note that underscores are used as prefixes in table aliases to help keep track of subquery depth)
SELECT person.*
FROM person
JOIN (
-- Join full survey data against each 'most recent' survey timestamp
SELECT s1.*
FROM survey s1
JOIN (
-- get most recent timestamp for each person
SELECT _s1.person_id, MAX(_s1.timestamp) timestamp
FROM survey _s1
GROUP BY person_id
) latest_surveys
ON latest_surveys.person_id = s1.person_id and latest_surveys.timestamp = s1.timestamp
) latest
ON latest.person_id = person.id
JOIN (
-- Join full survey data against each 'SECOND most recent' survey timestamp
select s2.*
from survey s2
JOIN (
-- to get SECOND most recent survey timestamp, do similar query, but exclude latest timestamp
SELECT _s2.person_id, MAX(_s2.timestamp) timestamp
FROM survey _s2
JOIN (
-- get most recent timestamp for each person (again)
SELECT __s2.person_id, MAX(__s2.timestamp) timestamp
FROM survey __s2
GROUP BY person_id
) _latest_surveys
-- Note the *NOT* equal here
ON _latest_surveys.person_id = _s2.person_id and _latest_surveys.timestamp != _s2.timestamp
GROUP BY _s2.person_id
) previous_surveys
ON previous_surveys.person_id = s2.person_id and previous_surveys.timestamp = s2.timestamp
) previous
ON previous.person_id = person.id
WHERE latest.address != previous.address
OR latest.marriage_status != previous.marriage_status;
Analytic functions make your question much more tractable. If you are not yet using MySQL 8+, then now would be a good time to upgrade. Assuming you are using MySQL 8+, we can try:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.timestamp DESC) rn
FROM person p
INNER JOIN survey s ON p.id = s.person_id
)
SELECT id
FROM cte
GROUP BY id
HAVING
MAX(CASE WHEN rn = 1 THEN address END) <> MAX(CASE WHEN rn = 2 THEN address END) OR
MAX(CASE WHEN rn = 1 THEN marriage_status END) <> MAX(CASE WHEN rn = 2 THEN marriage_status END);
The above query uses a pivot trick to isolate the latest, and second latest, addresses and marriage statuses for each person. It retains person id values for those whose latest and second latest addresses or marriage statuses are not identical.
This might be how you can achieve that:
SELECT *
FROM person
JOIN (
SELECT *,
MAX(survey_date) latest_survey,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(survey_date ORDER BY person_id, survey_date ASC),',',-2),',',1) previous_survey,
SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-1) curadd,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-2),',',1) prevadd,
SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-1) curms,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-2),',',1) prevms
FROM survey GROUP BY person_id
HAVING curadd != prevadd OR curms != prevms) A
ON person.id=A.person_id;
Using GROUP_CONCAT and SUBSTRING_INDEX to combine the data value then separate it again and using those to compare at the end. I know there are a bunch of ways to achieve without all these, like your second example is something that I think can be done but when I think about it, it's going to be a very long query. This query however, since you're not using MySQL 8+ is much shorter but the performance of this query is a concern especially on a large table.
It is not given, but I hope you have at least MySQL 8 or similar to have ability to use Common Table Expression. It can simplify the complex query.
The trick part is getting survey records #1 and #2 for each user. I will do it this way: see cte1 and cte2 definition
WITH
cte1 AS (
SELECT MAX(x1.id) AS id, x1.person_id
FROM survey x1
GROUP BY x1.person_id),
cte2 AS (
SELECT MAX(x2.id) AS id, x2.person_id
FROM survey x2
JOIN cte1 ON cte1.person_id = x2.person_id
AND cte1.id > x2.id
GROUP BY x2.person_id)
SELECT
p.*,
s1.address, s2.address address2,
s1.marriage_status, s2.marriage_status marriage_status2
FROM person AS p
JOIN (
cte1 JOIN survey s1 ON s1.id = cte1.id
) ON cte1.person_id = p.id
JOIN (
cte2 JOIN survey s2 ON s2.id = cte2.id
) ON cte2.person_id = p.id
WHERE
(s1.address <> s2.address)
OR (s1.marriage_status <> s2.marriage_status)
https://www.db-fiddle.com/f/hLwdHiZin4MkdUZ4aBz67H/2
Update: Thanks to Ian, I replaced MIN to MAX to get recent records

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer Price ItemNum FeeSched Date
5 70.75 01202 12 12-06-2017
5 70.80 01202 12 06-07-2016
5 70.80 01202 12 07-21-2017
5 70.80 01202 12 10-26-2016
5 82.63 02144 61 12-06-2017
5 84.46 02144 61 06-07-2016
5 84.46 02144 61 07-21-2017
5 84.46 02144 61 10-26-2016
I don't have access to create temporary tables, or views and there is no such thing as a #variable in C-tree, but in most ways it acts like MySql. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
I feel it quite simple when I'd read the first three paragraphs, but I get a little confused when I've read the whole question.
Whatever you have done to get the data posted above, once you've got the data like that it's easy to retrive "the most recent record grouped by ItemNum and FeeSched".
How to:
Firstly, sort the whole result set by Date DESC.
Secondly, select fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregation methods.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
When your data is grouped and you select rows without aggregation methods, it will only return you the first row of each group. As you have sorted all rows before grouping, so the first row would exactly be "the most recent record".
Contact me if you got any problems or errors with this approach.
You can also try like this:
Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);
Internal sql query return maximum date group by ItemNum and FeeSched and IN statement fetch only the records with maximum date.

finding change between records in MySQL

I have a table where I am storing the stored number of barrels inside of many tanks. I am storing values here every night at midnight, and at the beggining and end of any operator initiated transfer.
What I want to return is the number of barrels difference since the previous event record for that specific tank. I have the correct ID for the self join to get the previous record number, however the barrels is incorrect.
Here is what I currently have.
SELECT
inventory.id,
MAX(inventory2.id) AS id2,
inventory.tankname,
inventory.barrels,
inventory.eventstamp,
inventory2.barrels
FROM
inventory
LEFT JOIN
inventory inventory2 ON inventory2.tankname = inventory.tankname AND inventory2.eventstamp < inventory.eventstamp
GROUP BY
inventory.id,
inventory.tankname,
inventory.barrels,
inventory.eventstamp
ORDER BY
inventory.tankname,
inventory.eventstamp
That returns the following
Just use correlated subqueries:
SELECT i.*,
(SELECT i2.id
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_id,
(SELECT i2.barrels
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_barrels
FROM inventory i
ORDER BY i.tankname, i.eventstamp;
Your query doesn't work because you have columns in the SELECT that are not in the GROUP BY and are not aggregated. That shouldn't be allowed in any database; it is unfortunate that MySQL does allow it.

Best way to subtract SUM and COUNT with 2 table select?

I have came up with solution to count total of fields based on specific group, but it looks quite lengthy to get to the result i expect.
I have some basic knowledge when it comes to sql.
Is there obvious improvements to be made and why?
Why i would like to shorten this: Easier to implement in ORM type systems.
Changing scheme is not an option.
Schema and sample data: http://sqlfiddle.com/#!9/62df6
Query i'm using:
SELECT s.release_id,
(s.shipments_total - IFNULL(sh.shipment_entries, 0)) AS shipments_left
FROM
( SELECT release_id,
SUM(shipments) AS shipments_total
FROM subscriptions
WHERE is_paid = 1
AND shipments > 1
GROUP BY release_id ) AS s
LEFT JOIN
( SELECT release_id,
COUNT(*) AS shipment_entries
FROM shipments
GROUP BY release_id ) AS sh ON s.release_id = sh.release_id
Expected result on sample data is in sqlfiddle.
If you bring the condition in-line and remove the group by, then you don't need ifnull():
SELECT s.release_id,
(SUM(s.Shipments) -
(SELECT COUNT(*)
FROM shipments sh
WHERE sh.release_id = s.release_id
)
) AS shipments_left
FROM subscriptions s
WHERE is_paid = 1 AND shipments > 1
GROUP BY s.release_id;
The subquery returns 0 if nothing matches, not NULL (with the GROUP BY, it would return NULL). I am not sure if this is easier with your ORM model. Your original version is fine from a SQL point of view.
You can bring the join inline instead:
SELECT s.release_id,
SUM(s.Shipments) - IFNULL(( SELECT COUNT(*) AS shipment_entries
FROM shipments sh
WHERE sh.release_id = s.release_id
GROUP BY sh.release_id ), 0) AS shipments_left
FROM subscriptions s
WHERE is_paid = 1
AND shipments > 1
GROUP BY s.release_id
The execution plan for this is more performant too.

optimise and merge sql queries into one - newbie alert

how is the best way to merge these sql queries from the db.
I'm using this but wants to add the two others into it if possible
SELECT m.username, sum(d.liter) AS TOT_LITERS, min(krl) AS TOT_KM
FROM members m JOIN
diesel d
ON (d.userid = m.id)
WHERE d.dato >= '$dx'
GROUP BY m.username
ORDER BY 2 DESC;
and the other two
SELECT SUM(liter) AS totalfuel
FROM diesel
WHERE diesel.dato >= '$dx' AND userid = ".$_COOKIE['userid']."
ORDER BY diesel.dato DESC
SELECT MIN(km) AS minikm, MAX(km) AS maxkm
FROM diesel
WHERE diesel.dato >= '$dx' AND userid = ".$_COOKIE['userid']."
ORDER BY diesel.dato DESC
They all three does what I expected them to, but I guess there has to be a better way than having three queries on the same page
Is UNION the way to go here ? I don't see how though cause they need to have the same amount of tables :P ?
EDIT: Well I don't need the user ID and that cookie thing this time. Just forgot to remove it. I'm making a highscore and would like to show the usernames, average gas, km driven, liters filled etc ..
Here's an example of a sub-query:
SELECT m.username,
(
SELECT SUM(liter) AS SumOfLiters
FROM diesel
WHERE diesel.dato >= '$dx'
AND diesel.IDField = d.IDField
) AS TotalFuel
FROM members m
JOIN diesel d ON d.userid = m.id
WHERE d.dato >= '$dx'
The sub-query (in brackets) returns a value that you can assign to a field (TotalFuel in this case).
rather than restricting everything in the where clause then trying to marry it all back you could use your top query with a few sum case whens e.g.
sum(case when userid = '.$_COOKIE['userid'].' then liter else 0 end)