EDIT: See edit below for explanation of why min() and max() are NOT adequate.
=========================
The MS documentation on the functions first() and last() says “Because records are usually returned in no particular order (unless the query includes an ORDER BY clause), the records returned by these functions will be arbitrary.”
Obviously, that makes these functions pretty useless for their intended purpose unless the query includes an ORDER BY. But including that in the query is not a straightforward thing to do because these are "aggregate" functions, so a query that SELECTs on them cannot ORDER BY any other field that is not also submitted to an aggregate function.
I have found that a query based on a single table generally returns results in the order of that table’s primary key. But apparently, that cannot be relied on to always be true and may fail under certain circumstances. There's an excellent discussion of this issue in an article, DFirst/DLast and the Myth of the Sorted Result Set.
That article offers two solutions to this problem:
Option one; you first use the DMin/DMax-Function to retrieve the value from the “sortable” column ... and use this as an additional criterion to your query to retrieve the target record.
Second option; you first create a query just containing the primary key and the max value of the sortable column (e.g. CustomerId and maximum of order date). Then you create a second query on the orders and join the first query in there on these two fields. The results will be all column from the orders table but only for the most recent order of each customer.
Those instructions are pretty complicated, so I'd need to see an example of them implemented in code in order to trust myself to use them myself.
This issue has got to be very common because a lot of businesses need to know the first or last order by a customer that meets some condition. But when I Google "Access query first last "order by"", there are several results that explain the problem, including on StackOverflow, but none that lay out a solution with sample SQL code.
What is the right way to do this, including sample code of doing it?
=========================
Edit:
Many sources online, as well as the comment below by Gustav and the proposed answer by Albert D. Kallal, say you can just use min() and max() instead of first() and last(). Obviously, that's okay if what you want is the value of a field in the record in which that field has the smallest or largest value. That's a trivial problem. What I'm talking about is how to get the value of a field in the record in which some other field has the smallest or largest value.
For example, in the answer by Albert D. Kallal, he wants the first and last tour for each customer, so he can just use min() and max() on the dates of the tours. But what if I want to know the location of the first tour for each customer? Obviously, I can't use min(location). If first() would work in a sensible way and if table [Tours] has the primary key [Date], I should be able to use something like:
(SELECT first(location) from [Tours] where [Customer] = ID_Customer)
I am using code like that and it usually gives me the right answer, but not always. So that is what I need to fix. I understand that I may need to use min() instead of first(). But how do I use min() for this since, as I said, I obviously can't just use min(location)?
Never really grasped what first() and last() does in Access.
As you note, rather common to want say last invoice or whatever.
So, say we have a table of Tours. I want the first tour date, and the last tour date.
Well, this query works:
SELECT MAX(FromDate) as LastTourDate, min(FromDate) as FirstTourDate
FROM tblTours
WHERE FromDate is not null
When I run above, I get this:
So, that gets you the min, and max - and gets you this in one query.
No real need for a order by.
However, often there are more then one table involved.
So, I might in place of JUST the first and last tour date?
I probably want a list of customers, and their first tour they took, and say their last tour. But, then again, that's a different question.
But, you again can order your main table ANY way you want, and still pluck out
(pull the min and max).
So, you can do it this way:
Say, tblMain client (people - customers whatever).
Say, tblMyTours - a list of tours they took (child table).
So, the query can look like this:
SELECT tblMainClient.FirstName, tblMainClient.LastName,
(SELECT Min(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS FirstTourDate,
(SELECT MAX(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS LastTourDate
FROM tblMainClient
so, the main query is still tblMainClient - I can order, filter, sort by any column in that main table, but we used two sub-query to get the first tour date and the last tour date. So, it will look say like this:
So, typical, we can use a sub-query, pull the max (or min) value, but restrict the sub query to the one row from our parent/main table.
edit: Get last reocrd, but SOME OTHER column
Ok, so say in our simple example, we want the last tour, but NOT the date, but say some other column - like say the last Tour name.
Ok, so we just modify the sub query to return ONLY the last reocrd, but a different column.
And since dates (say 2 invoices on the same day, or yearly tours might have the SAME name, then we need to ensure that ONLY one reocrd is returned. We do this by using top 1, but ALSO add a order by to be 100%, 200%, 300% sure that ONLY ONE top record is returned.
So, our query to get the last tour name, but based on say most recent tour date?
We can do this:
SELECT FirstName, LastName,
(SELECT TOP 1 TourName FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
AS LastTour
FROM tblMainClient
And that will give us the tour name, but the last one.
This:
So, you ceratinly not limited to using "max()" in that sub query.
However, what happens if we want the Tour Name, Hotel Name, and City of that tour?
In other words, it certainly reasonable that we may well want multiple columns.
There are more ways to do this then flavors of ice cream.
However, I like using the query builder for the first part.
What I do is use the standard query builder, do a join to the table and simple slect all the columns I need.
So, for above tblMainClient, and their tours from tblMyTours?
I build a join - use query builder like this:
So, note how I added the columns TourName, FromDate, HotelName and city from that child table (tblMyTours).
Now, of course the above will return 10 rows for anyone who gone on 10 trips.
So, what we do is add a WHERE clause to the child table, get the LAST pk "id" from tblMyTours, and restrict that child table to the ONE row.
So, the above query builder gives us this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName, tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON
tblMainClient.ID = tblMyTours.Main_id;
(but, I did not have to write above).
So, we add a where clause to that child table join - get the CHILD table "id" in place of TourName, or Tourdate).
So above becomes this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName,
tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON tblMainClient.ID = tblMyTours.Main_id
WHERE tblMyTours.ID =
(SELECT TOP 1 ID FROM tblMyTours
WHERE tblMyTours.Main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
Now, above is a bit advanced, but OFTEN we want SEVERAL columns. But, at least the first part of the query, the two tables, and the join was done using the query builder - I did not have to type that part in.
so, if you want JUST one column - differnt then the max() critera, then use top 1 with a order by. Do keep in mind that ONLY ONE RECORD can EVER be retunred by that query - if more then one reocrd is returned, the query enginer will fail and you get a message to this fact.
So, for a produce bought, invoice date? They could by the 1 product 2 times, or 2 invoices on the same day might occur. So, by introduction of the 2nd ORDER BY clause (by ID DESC), then that top 1 will ONLY ever return one row.
So, which of the above two?
Well, if just one column from the child table - easy. But, if you want multiple columns? Then you could probably write up a "messy" solution, but I perfect to just fire up query builder, join in the child table, click on the "several" child values I want. Get the query working - and hey, it all up to this point 100% GUI.
Then we toss in the EXTRA criteria to restrict that child table row to the ONE last row, be it simple last one based on ID DESC, or say TourDate, or whatever.
And now we get this:
Suppose i have a simple table with this columns:
| id | user_id | order_id |
About 1,000,000 rows is inserted to this table per month and as it is clear relation between user_id and order_id is 1 to M.
The records in the last month needed for accounting issues and the others is just for showing order histories to the users.To archive records before last past month,i have two options in my mind:
first,create a similar table and each month copy old records to it.so it will get bigger and bigger each month according to growth of orders.
second,create a table like below:
| id | user_id | order_idssss |
and each month, for each row to be inserted to this table,if there exist user_id,just update order_ids and add new order_id to the end of order_ids.
in this solution number of rows in the table will be get bigger according to user growth ratio.
suppose for each solution we have an index on user_id.
.
Now question is which one is more optimized for SELECT all order_ids per user in case of load on server.
the first one has much more records than the second one,but in the second one some programming language is needed to split order_ids.
The first choice is the better choice from among the two you have shown. With respect, I should say your second choice is a terrible idea.
MySQL (with all SQL dbms systems) is excellent at handling very large numbers of rows of uniformly laid out (that is, normalized) data.
But, your best choice is to do nothing except create appropriate indexes to make it easy to look up order history by date or by user. Leave all your data in this table and optimize lookup instead.
Until this table contains at least fifty million rows (at least four years' worth of data), the time you spend reprogramming your system to allow it to be split into a current and an archive version will be far more costly than just keeping it together.
If you want help figuring out which indexes you need, you should ask another question showing your queries. It's not clear from this question how you look up orders by date.
In a 1:many relationship, don't make an extra table. Instead have the user_id be a column in the Orders table. Furthermore, this is likely to help performance:
PRIMARY KEY(user_id, order_id),
INDEX(order_id)
Is a "month" a calendar month? Or "30 days ago until now"?
If it is a calendar month, consider PARTITION BY RANGE(TO_DAYS(datetime)) and have an ever-increasing list of monthly partitions. However, do not create future months in advance; create them just before they are needed. More details: http://mysql.rjweb.org/doc.php/partitionmaint
Note: This would require adding datetime to the end of the PK.
At 4 years' worth of data (48 partitions), it will be time to rethink things. (I recommend not going much beyond that number of partitions.)
Read about "transportable tablespaces". This may become part of your "archiving" process.
Use InnoDB.
With that partitioning, either of these becomes reasonably efficient:
WHERE user_id = 123
AND datetime > CURDATE() - INTERVAL 30 DAY
WHERE user_id = 123
AND datetime >= '2017-11-01' -- or whichever start-of-month you need
Each of the above will hit at most one non-empty partition more than the number of months desired.
If you want to discuss this more, please provide SHOW CREATE TABLE (in any variation), plus some of the important SELECTs.
I use a number of pre-defined conditions to compare the Date field from Trade table against GETDATE() function to work out the trades for this month, this year, previous year etc.
Now I need to create an additional table, where I will have a set of dates representing the start and the end dates for reporting periods, e.g. the start and end dates of current year reporting, the start and the end dates for previous year reporting etc.
This additional table and Trade table are not joined. In fact the additional table is not linked to anything.
I need to create a new set of pre-defined conditions where Date from Trade table will be compared from the values from the new dates table, i.e. I would like the new conditions to look like the following
Trade.Date < ReportingDates.CurrentYearEndDate
AND
Trade.Date > ReportingDates.CurrentYearStartDate
The condition validates fine, but unfortunately I get the "The query cannot run because it contains incompatible objects. (IES 00008)" error" when I try to execute the condition.
You could use subselects in the predefined condition like:
trade.date between (select currentyearstartdate from reportingdates)
and (select currentyearenddate from reportingdates)
Since reportingdates is not actually joined to anything, it does not need to be in the universe structure. This does have one disadvantage -- it's not obvious from the list of tables that it is actually needed in the universe.
Alternatively, you can create a derived table that joins trade and reportingdates as a cartesian product. The derived table would include all columns from both tables, so your condition would simply be (assuming the derived table is named DT):
dt.date between dt.currentyearstartdate and dt.currentyearenddate
The downside to this approach is that reportingdates is included in all queries, whether it's needed or not, and the SQL can be a little harder to read.
I have an order table that contains dates and amounts for each order, this table is big and contains more that 1000000 records and growing.
We need to create a set of queries to calculate certain milestones, is there a way in mysql to figure out on which date we reached an aggregate milestone of x amount.
For e.g we crossed 1 m sales on '2011-01-01'
Currently we scan the entire table then use the logic in PHP to figure out the date, but it would be great if this could be done in mysql without reading so many records at 1 time.
There maybe elegant approaches, but what you can do is maintain a row in another table which contains, current_sales and date it occurred. Every time you have a sale, increment the value, and store sales date. If the expected milestones(1 Million, 2 Million etc) are known in advance, you can store them away when they occur(in same or different table)
i think using gunner's logic with trigger will be a good option as it reduce your efforts to maintain the row and after that you can send mail notification through trigger to know the milestone status
I am trying to build an access report based on data from multiple different tables within the database.
I have 3 columns which perform calculations, and I am wondering how to put this query together. All 3 columns deal with dates, but calculate them differently.
The first column retrieves the most recent date of action for a userid if the type of action is "B":
select pid, Max(date) as most_recent
from actions
where ref = 'B'
group by pid;
The second column performs a calculation based on 2 fields, one is a date and one is a number in months. I am unsure how to add these two fields so that the number is added to the date as a number of months.
what i have so far is:
select nummonths,Max(lastvisit) from users
the third column I need to select the first date thats in the future for each user (next appointment date), there will be dates before and after this date so its a little difficult:
select uid,date from visits
The code for the last 2 queries needs to be slightly modified, and I was wondering what the best approach would be to join these all together? A type of join?
If you need to build a report with data from the 3 queries, you will need related data to join them. In that case, please send the structure of the tables.
If you need to show 3 lists in one report, you can use subreports: create a new empty report. In design mode, you can add 3 subreports from the toolbox bar. To each of the subreport assign the record source property to the corresponding sql.
regards
I am unsure how to add these two fields so that the number is added to the date as a number of months.
Use the DateAdd() function:
SELECT DateAdd("m", 2, LastVisit) FROM ...
Results in a date two months from the LastVisit date.