EDIT: See edit below for explanation of why min() and max() are NOT adequate.
=========================
The MS documentation on the functions first() and last() says “Because records are usually returned in no particular order (unless the query includes an ORDER BY clause), the records returned by these functions will be arbitrary.”
Obviously, that makes these functions pretty useless for their intended purpose unless the query includes an ORDER BY. But including that in the query is not a straightforward thing to do because these are "aggregate" functions, so a query that SELECTs on them cannot ORDER BY any other field that is not also submitted to an aggregate function.
I have found that a query based on a single table generally returns results in the order of that table’s primary key. But apparently, that cannot be relied on to always be true and may fail under certain circumstances. There's an excellent discussion of this issue in an article, DFirst/DLast and the Myth of the Sorted Result Set.
That article offers two solutions to this problem:
Option one; you first use the DMin/DMax-Function to retrieve the value from the “sortable” column ... and use this as an additional criterion to your query to retrieve the target record.
Second option; you first create a query just containing the primary key and the max value of the sortable column (e.g. CustomerId and maximum of order date). Then you create a second query on the orders and join the first query in there on these two fields. The results will be all column from the orders table but only for the most recent order of each customer.
Those instructions are pretty complicated, so I'd need to see an example of them implemented in code in order to trust myself to use them myself.
This issue has got to be very common because a lot of businesses need to know the first or last order by a customer that meets some condition. But when I Google "Access query first last "order by"", there are several results that explain the problem, including on StackOverflow, but none that lay out a solution with sample SQL code.
What is the right way to do this, including sample code of doing it?
=========================
Edit:
Many sources online, as well as the comment below by Gustav and the proposed answer by Albert D. Kallal, say you can just use min() and max() instead of first() and last(). Obviously, that's okay if what you want is the value of a field in the record in which that field has the smallest or largest value. That's a trivial problem. What I'm talking about is how to get the value of a field in the record in which some other field has the smallest or largest value.
For example, in the answer by Albert D. Kallal, he wants the first and last tour for each customer, so he can just use min() and max() on the dates of the tours. But what if I want to know the location of the first tour for each customer? Obviously, I can't use min(location). If first() would work in a sensible way and if table [Tours] has the primary key [Date], I should be able to use something like:
(SELECT first(location) from [Tours] where [Customer] = ID_Customer)
I am using code like that and it usually gives me the right answer, but not always. So that is what I need to fix. I understand that I may need to use min() instead of first(). But how do I use min() for this since, as I said, I obviously can't just use min(location)?
Never really grasped what first() and last() does in Access.
As you note, rather common to want say last invoice or whatever.
So, say we have a table of Tours. I want the first tour date, and the last tour date.
Well, this query works:
SELECT MAX(FromDate) as LastTourDate, min(FromDate) as FirstTourDate
FROM tblTours
WHERE FromDate is not null
When I run above, I get this:
So, that gets you the min, and max - and gets you this in one query.
No real need for a order by.
However, often there are more then one table involved.
So, I might in place of JUST the first and last tour date?
I probably want a list of customers, and their first tour they took, and say their last tour. But, then again, that's a different question.
But, you again can order your main table ANY way you want, and still pluck out
(pull the min and max).
So, you can do it this way:
Say, tblMain client (people - customers whatever).
Say, tblMyTours - a list of tours they took (child table).
So, the query can look like this:
SELECT tblMainClient.FirstName, tblMainClient.LastName,
(SELECT Min(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS FirstTourDate,
(SELECT MAX(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS LastTourDate
FROM tblMainClient
so, the main query is still tblMainClient - I can order, filter, sort by any column in that main table, but we used two sub-query to get the first tour date and the last tour date. So, it will look say like this:
So, typical, we can use a sub-query, pull the max (or min) value, but restrict the sub query to the one row from our parent/main table.
edit: Get last reocrd, but SOME OTHER column
Ok, so say in our simple example, we want the last tour, but NOT the date, but say some other column - like say the last Tour name.
Ok, so we just modify the sub query to return ONLY the last reocrd, but a different column.
And since dates (say 2 invoices on the same day, or yearly tours might have the SAME name, then we need to ensure that ONLY one reocrd is returned. We do this by using top 1, but ALSO add a order by to be 100%, 200%, 300% sure that ONLY ONE top record is returned.
So, our query to get the last tour name, but based on say most recent tour date?
We can do this:
SELECT FirstName, LastName,
(SELECT TOP 1 TourName FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
AS LastTour
FROM tblMainClient
And that will give us the tour name, but the last one.
This:
So, you ceratinly not limited to using "max()" in that sub query.
However, what happens if we want the Tour Name, Hotel Name, and City of that tour?
In other words, it certainly reasonable that we may well want multiple columns.
There are more ways to do this then flavors of ice cream.
However, I like using the query builder for the first part.
What I do is use the standard query builder, do a join to the table and simple slect all the columns I need.
So, for above tblMainClient, and their tours from tblMyTours?
I build a join - use query builder like this:
So, note how I added the columns TourName, FromDate, HotelName and city from that child table (tblMyTours).
Now, of course the above will return 10 rows for anyone who gone on 10 trips.
So, what we do is add a WHERE clause to the child table, get the LAST pk "id" from tblMyTours, and restrict that child table to the ONE row.
So, the above query builder gives us this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName, tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON
tblMainClient.ID = tblMyTours.Main_id;
(but, I did not have to write above).
So, we add a where clause to that child table join - get the CHILD table "id" in place of TourName, or Tourdate).
So above becomes this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName,
tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON tblMainClient.ID = tblMyTours.Main_id
WHERE tblMyTours.ID =
(SELECT TOP 1 ID FROM tblMyTours
WHERE tblMyTours.Main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
Now, above is a bit advanced, but OFTEN we want SEVERAL columns. But, at least the first part of the query, the two tables, and the join was done using the query builder - I did not have to type that part in.
so, if you want JUST one column - differnt then the max() critera, then use top 1 with a order by. Do keep in mind that ONLY ONE RECORD can EVER be retunred by that query - if more then one reocrd is returned, the query enginer will fail and you get a message to this fact.
So, for a produce bought, invoice date? They could by the 1 product 2 times, or 2 invoices on the same day might occur. So, by introduction of the 2nd ORDER BY clause (by ID DESC), then that top 1 will ONLY ever return one row.
So, which of the above two?
Well, if just one column from the child table - easy. But, if you want multiple columns? Then you could probably write up a "messy" solution, but I perfect to just fire up query builder, join in the child table, click on the "several" child values I want. Get the query working - and hey, it all up to this point 100% GUI.
Then we toss in the EXTRA criteria to restrict that child table row to the ONE last row, be it simple last one based on ID DESC, or say TourDate, or whatever.
And now we get this:
Related
I'm in a fix with Alteryx. I'm trying to select the top N rows where N=a cell value for that partition. The business question is:
"We need to know, out of our orders (TicketIDs), those that have
least 1 combination of Type of discount item AND drink AND side."
The SQL query would join this table onto itself and partition to get the TopNtoIncludeInItems for that row, however, I just can't seem to find a way to do this in Alteryx. I've tried the community, but the question has gone ananswered.
In other words, select thusly:
<pseudocode>
for each (TicketID)
for each(Type)
select top(TopNtoIncludeInItems for this.TicketID)
next
next
</pseudocode>
or indeed select just the green records
Here's my solution:
MultiRow Formula: create new field ComboCount (or whatever) as Int32, 0 or empty for rows that don't exists, Group By TicketID and Type, with the Expression [Row-1:ComboCount]+1 ... this counts up each group; we'll want the first topN of each group, ensuring the group actuall has that many, and not going beyond TopN.
Filter on [ComboCount] <= [TopN] ... which excludes unnecessary rows beyond TopN
Summarize: group by TicketID and Type, doing Max(ComboCount) ... if this value is less than TopN for any group, the group should be excluded:
Join the summary back to the earlier pre-summary data on TicketID and Type
Filter on [Max_ComboCount] = [TopN] ... this excludes the groups where any ItemType falls short of TopN
And that's it. Pictorally, this is what my workflow looks like, along with data results based on data similar to that in your screenshot:
I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows" - any row value may participate in making the row unique
You are getting phone numbers duplicated because you're only looking at the column in isolation. The database is looking at phone number and also date. The rows you posted have different dates, and these hence cause the rows to be different
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM battle
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one set of value combinations for anything that is in the group by list. In this case, only unique phone numbers. But, because you want other values as well (I.e. Date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
if you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregating function is applied to any of the values in the list associated with the key
The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
i.e. You can copy paste from your select list to your group list. MySQL does this implicitly for you if you don't do it (i.e. If you use one or more aggregate functions like max in your select list, but forget or omit the group by clause- it will take everything that isn't in an agggregate function and run the grouping as if you'd written it). Whether group by is hence largely redundant is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of he deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, hat is the latest date...
I have the issue of using GROUP BY when select all the column from the table and in result with the poor performance in term of speed.
Select * from employee
group by customer_id;
The query above wouldn't be change,it is mandatory and fixed.It takes 17720ms is to long and the result must take shorter time, which is below 1 minute as my desired result.Since the table has many column and record, so it take much time in query searching.Is there any solution to solve this problem.Thanks.
For as simple as your query is, it appears almost pointless... You would not have duplicate employee IDs within an employee table, and doing a group by would still result in returning every row, every column.
However, that said, to optimize a GROUP BY, you would need an index on that column ... which I would think would already exist as the employee ID would probably be the primary key to the table.
Additionally, you don't have any aggregate columns what would warrant a group by. Are you instead just trying to LOOK for a specific employee? If so, that would be a different query using a WHERE clause for the criteria you are looking for.
FEEDBACK...
You updated your question and did a group by CUSTOMER ID (not employee ID). Ok, but what do you really mean to group by..
OR... Did you want to ORDER by a customer... In other words, I want a list of all employees, but want them sorted by the customer they are associated with... If this is the case, you would want something like...
select *
from employees
ORDER BY
customerID,
employeeLastName,
employeeFirstName
Without seeing your table structure(s), but if the employee table DOES have a column for the customer ID they are associated with, this query would put all employees for the same customer in a common PRE-SORT output by customer, then within that customer, sorted by the employees name (last, first).
If you have another table(s) with relationships between employees and customers, we would need to see that too to better offer an answer.
Column with heavy type LIKE BLOB, TEXT, NVARCHAR(200 or more) will slowdown your query by a lot if you have a lot of records. I suggest to check if it is really necessary to load them all from the start.
Also, you GROUP BY seem weird. What exactly are you trying to achieve with it?
The GROUP BY is not just weird, it is wrong. If you don't specify all the non-aggregate columns in the GROUP BY, you get seemingly random values for each column. Remove the GROUP BY or explain why you think you need it.
Or maybe the "*" is not correct. OK, you cannot show us your real column names, at least show us the real pattern to the SELECT, even if it has bogus column names.
I'm also confused as to why you call it a "search". There is no WHERE clause, which is where "search" criteria goes.
I've searched and searched and can't find an answer to this question, I'm probably asking it in the wrong way.
I am querying an employee database.
I need to get details based on a position id, however there could be multiple records for that position id as this organisation has permanent employees and temporary employees that act against the same position.
So, in order to get the CURRENT occupant of the position id, I need my query to SELECT the FIRST record that matches the position string, from the TOP DOWN.
will this select the first matched record from the top?
SELECT * WHERE `position_id`="00000000" LIMIT 1;
Thanks in advance.
You need an ORDER BY clause to define the ordering between the individual records your table. If you do not use ORDER BY you can assume no fixed order between the records, and you could get a new order each time you executed the query.
From the manual:
With one argument, the value specifies the number of rows to return from the beginning of the result set
So with LIMIT 1 you get the first row from the result set. What the result set is depends on engine used and which indexes you have. If you want the first row added, you need to create another column to define that.
It just gets one at random*. There's no way to tell which one it will be, unless you add an ORDER BY clause.
* Not really at random, of course. It depends on the way the records are stored and repeated queries will probably return the same result every time, at least as long as you don't modify the table or its contents. I actually mean, you cannot be sure.
I have the following SQL query , it seems to run ok , but i am concerned as my site grows it may not perform as expected ,I would like some feeback as to how effective and efficient this query really is:
select * from articles where category_id=XX AND city_id=XXX GROUP BY user_id ORDER BY created_date DESC LIMIT 10;
Basically what i am trying to achieve - is to get the newest articles by created_date limited to 10 , articles must only be selected if the following criteria are met :
City ID must equal the given value
Category ID must equal the given value
Only one article per user must be returned
Articles must be sorted by date and only the top 10 latest articles must be returned
You've got a GROUP BY clause which only contains one column, but you are pulling all the columns there are without aggregating them. Do you realise that the values returned for the columns not specified in GROUP BY and not aggregated are not guaranteed?
You are also referencing such a column in the ORDER BY clause. Since the values of that column aren't guaranteed, you have no guarantee what rows are going to be returned with subsequent invocations of this script even in the absence of changes to the underlying table.
So, I would at least change the ORDER BY clause to something like this:
ORDER BY MAX(created_date)
or this:
ORDER BY MIN(created_date)
some potential improvements (for best query performance):
make sure you have an index on all columns you querynote: check if you really need an index on all columns because this has a negative performance when the BD has to build the index. -> for more details take a look here: http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html
SELECT * would select all columns of the table. SELECT only the ones you really require...