There are several good posts on how to number rows within groups with MySQL, but how does the code actually work? I'm unclear on what MySQL evaluates first in the code below.
For instance, placing @yearqt := yearqt as bloc before the IF() call produces different results, and I'm unclear on the role of the s1 subquery in initializing the @ variables: when are they updated as MySQL runs through the data rows? Is the ORDER BY clause run before the SELECT?
The code below selects three random records per yearqt group. There may be other ways to do this, but the question pertains to how the code works, not how I could do this differently or whether I can do this more efficiently. Thank you.
select * from (
select customer_id , yearqt , a ,
IF(@yearqt = yearqt , @rownum := @rownum + 1 , @rownum := 1) as rownum ,
@yearqt := yearqt as bloc
from
( select customer_id , yearqt , rand(123) as a from tbl
order by rand(123)
) a join ( select @rownum := 0 , @yearqt := '' ) s1
order by yearqt
) s2
where rownum <= 3
order by bloc
This question is related to how the engine retrieves SQL SELECT query results. The order is roughly the following:
Calculate explain plan
Calculate sets and join them using plan's directives (FROM / JOIN phase)
Apply WHERE clause
Apply GROUP BY/HAVING clause
Apply ORDER BY clause
Projection phase: every row returned is ordered and can now be 'displayed'.
So, with respect to the variables, you can now see why there's a subquery to initialize them. This subquery is evaluated only once, at the beginning of the process.
After that, the projection phase seems to treat each selected expression in the order you listed them, which is why moving @yearqt := yearqt as bloc up one position changes the outcome of the IF expression. Since each row is projected once, any work you do on the variables is done as many times as there are rows in the final result set.
The purpose of this
join ( select @rownum := 0 , @yearqt := '' ) s1
is to initialize the user-defined variables at the beginning of statement execution. Because this is a rowsource for the outer query (MySQL calls it a derived table) this will be executed BEFORE the outer query runs. We aren't really interested in what this query returns, except that it returns a single row, because of the JOIN operation.
So this inline view s1 could be omitted from the query and be replaced by a couple of SET statements that are executed immediately before the query:
SET @rownum := 0;
SET @yearqt := 0;
But then we'd have three separate statements to run, and we'd get different output from the query if these weren't run, if those variables were set to some other value. By including this in the query itself, it's a single statement, and we remove the dependency on separate SET statements.
This is the query that's really doing the work, whittled down to just the two expressions that matter in this case
SELECT IF(@yearqt = t.yearqt , @rownum := @rownum + 1 , @rownum := 1) as rownum
, @yearqt := t.yearqt as bloc
FROM ( ... ) t
ORDER BY t.yearqt
Some key points that make this "work"
MySQL processes the expressions in the SELECT list in the order that they appear in the SELECT list.
MySQL processes the rows in the order specified in the ORDER BY.
The references to user-defined variables are evaluated for each row, not once at the beginning of the statement.
Note that the MySQL Reference Manual points out that this behavior is not guaranteed. (So, it may change in a future release.)
So, the processing of that can be described as
for the first expression:
compare the value of the yearqt column from the current row with the current value of the @yearqt user-defined variable
set the value of the @rownum user-defined variable
return the result of the IF() expression in the resultset
for the second expression:
set the value of the @yearqt user-defined variable to the value of the yearqt column from the current row
return the value of the yearqt column in the resultset
The net effect is that for each row processed, we're comparing the value in the yearqt column to the value from the "previously" processed row, and we're saving the current value to compare to the next row.
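To make the evaluation order concrete, here is a minimal Python sketch that mimics the per-row processing described above. It is only a simulation, not MySQL itself: the function name `number_rows`, the `assign_first` flag, and the sample rows are all made up for illustration, and it assumes the rows arrive already sorted by yearqt.

```python
def number_rows(rows, assign_first=False):
    """Mimic how the two SELECT expressions are evaluated for each row.

    rows must already be sorted by yearqt, as ORDER BY yearqt would do.
    assign_first=True imitates moving `@yearqt := yearqt` before the IF().
    """
    rownum, prev_yearqt = 0, ""      # the s1 subquery: runs once, before any row
    out = []
    for customer_id, yearqt in rows:
        if assign_first:
            prev_yearqt = yearqt     # assignment evaluated before the comparison
        if prev_yearqt == yearqt:    # IF(@yearqt = yearqt, ...)
            rownum += 1              # same group as the previous row
        else:
            rownum = 1               # new group: restart the counter
        prev_yearqt = yearqt         # @yearqt := yearqt as bloc
        out.append((customer_id, yearqt, rownum))
    return out


rows = [(1, "2012Q1"), (2, "2012Q1"), (3, "2012Q2"), (4, "2012Q2"), (5, "2012Q2")]
original = number_rows(rows)                     # counter restarts per group
swapped = number_rows(rows, assign_first=True)   # counter never restarts
```

With the assignment evaluated first, the comparison is always true, so the counter just keeps climbing across groups. That is consistent with the "different results" reported in the question.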
I have been looking for a solution to this seemingly simple problem. The query only breaks when I introduce the last sum() function in the where clause. I get the error "Error Code: 1111. Invalid use of group function." I can't figure out why it won't allow me to make this one where statement.
Query
select
dIntervalStart,
weekofyear(dIntervalStart) WeekOfYear,
weekday(dIntervalStart) DayOfWeek,
cast(dIntervalStart as time) IntervalStart,
sum(nExternToInternAcdCalls) CallVolume
from
iwrkgrpqueuestats
where
SiteId = 1
and cname in ('applications' , 'employer',
'factfind',
'general',
'other')
and cReportGroup = '*'
and CAST(dIntervalStart as time) between '07:00' and '17:30'
and weekday(dIntervalStart) not in (5 , 6)
and sum(nExternToInternAcdCalls) <> 0
group by weekofyear(dIntervalStart) , weekday(dIntervalStart) , cast(dIntervalStart as time)
order by IntervalStart asc, dIntervalStart desc
Relocate the predicate on the aggregate expression to the HAVING clause, e.g.:
GROUP BY ...
HAVING sum(nExternToInternAcdCalls) <> 0
ORDER BY ...
The predicates in the WHERE clause are evaluated when rows are accessed, they pick out which rows are included. The value of the aggregate expression (e.g. SUM(foo)) isn't available when the rows are accessed. The value for that expression can't be determined until after the rows are accessed.
The predicates in the HAVING clause are applied after the rows are accessed, and after the resultset is prepared. When the HAVING predicates are evaluated, the aggregate (SUM(foo)) will be available.
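The same WHERE versus HAVING distinction can be reproduced in a few lines. This is a sketch using Python's built-in sqlite3 module rather than MySQL, so the error text differs from error 1111, and the `calls` table and its data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (slot TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("07:00", 3), ("07:00", 2), ("07:30", 0), ("07:30", 0)],
)

# Aggregate in WHERE: rejected, because SUM() has no value while rows are filtered.
try:
    conn.execute("SELECT slot, SUM(n) FROM calls WHERE SUM(n) <> 0 GROUP BY slot")
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# Same predicate in HAVING: applied after the groups have been summed.
having_rows = conn.execute(
    "SELECT slot, SUM(n) FROM calls GROUP BY slot HAVING SUM(n) <> 0"
).fetchall()
```

Here the 07:30 group sums to zero and is dropped by HAVING, while the WHERE version never gets as far as running.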
NOTE: the GROUP BY clause would typically include all of the non-aggregate expressions in the SELECT list. MySQL is more relaxed than other databases, which can be both a blessing and a curse; if the GROUP BY "collapses" rows, for non-aggregates in the SELECT list, MySQL returns a value from one of the rows.
That is, if you GROUP BY weekofyear(foo), and return foo in the SELECT list, and there are multiple rows with foo values that evaluate to the same weekofyear(foo) value, MySQL will return one row for the given weekofyear(foo) and also return one of the foo values from one of the rows. Other databases (Oracle, SQL Server, et al.) would throw an error rather than returning a resultset.
You cannot use GROUP BY like this. GROUP BY collapses rows into distinct groups and defines the set of rows each aggregate function is computed over.
Try this :
select
dIntervalStart,
weekofyear(dIntervalStart) WeekOfYear,
weekday(dIntervalStart) DayOfWeek,
cast(dIntervalStart as time) IntervalStart,
sum(nExternToInternAcdCalls) CallVolume
from
iwrkgrpqueuestats
where
SiteId = 1
and cname in ('applications' , 'employer',
'factfind',
'general',
'other')
and cReportGroup = '*'
and CAST(dIntervalStart as time) between '07:00' and '17:30'
and weekday(dIntervalStart) not in (5 , 6)
group by dIntervalStart,
WeekOfYear,
DayOfWeek,
IntervalStart -- columns not using aggregate functions
having sum(nExternToInternAcdCalls) <> 0
order by IntervalStart asc, dIntervalStart desc
I'm interested in learning some (ideally) database agnostic ways of selecting the nth row from a database table. It would also be interesting to see how this can be achieved using the native functionality of the following databases:
SQL Server
MySQL
PostgreSQL
SQLite
Oracle
I am currently doing something like the following in SQL Server 2005, but I'd be interested in seeing others' more agnostic approaches:
WITH Ordered AS (
SELECT ROW_NUMBER() OVER (ORDER BY OrderID) AS RowNumber, OrderID, OrderDate
FROM Orders)
SELECT *
FROM Ordered
WHERE RowNumber = 1000000
Credit for the above SQL: Firoz Ansari's Weblog
Update: See Troels Arvin's answer regarding the SQL standard. Troels, have you got any links we can cite?
There are ways of doing this in optional parts of the standard, but a lot of databases support their own way of doing it.
A really good site that talks about this and other things is http://troels.arvin.dk/db/rdbms/#select-limit.
Basically, PostgreSQL and MySQL support the non-standard:
SELECT...
LIMIT y OFFSET x
Oracle, DB2 and MSSQL support the standard windowing functions:
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
) AS foo
WHERE rownumber <= n
(which I just copied from the site linked above since I never use those DBs)
Update: As of PostgreSQL 8.4 the standard windowing functions are supported, so expect the second example to work for PostgreSQL as well.
Update: SQLite added window functions support in version 3.25.0 on 2018-09-15 so both forms also work in SQLite.
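As a quick check that the two forms agree, here is a small sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25.0 or newer so that window functions are available; the `orders` table and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i * 10, f"order-{i}") for i in range(1, 31)],
)

n = 21  # 1-based position of the row we want

# Non-standard form: skip n-1 rows, then take one.
via_limit = conn.execute(
    "SELECT order_id, label FROM orders ORDER BY order_id LIMIT 1 OFFSET ?",
    (n - 1,),
).fetchone()

# Window-function form: number the rows, then filter on the number.
via_window = conn.execute(
    """
    SELECT order_id, label FROM (
        SELECT order_id, label,
               ROW_NUMBER() OVER (ORDER BY order_id) AS rownumber
        FROM orders
    ) AS numbered
    WHERE rownumber = ?
    """,
    (n,),
).fetchone()
```

Note the off-by-one between the two: OFFSET counts rows to skip (n-1), whereas ROW_NUMBER() is 1-based and is compared against n directly.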
PostgreSQL supports windowing functions as defined by the SQL standard, but they're awkward, so most people use (the non-standard) LIMIT / OFFSET:
SELECT
*
FROM
mytable
ORDER BY
somefield
LIMIT 1 OFFSET 20;
This example selects the 21st row. OFFSET 20 is telling Postgres to skip the first 20 records. If you don't specify an ORDER BY clause, there's no guarantee which record you will get back, which is rarely useful.
I'm not sure about any of the rest, but I know SQLite and MySQL don't have any "default" row ordering. In those two dialects, at least, the following snippet grabs the 15th entry from the_table, sorting by the date/time it was added:
SELECT *
FROM the_table
ORDER BY added DESC
LIMIT 14, 1
(of course, you'd need to have an added DATETIME field, and set it to the date/time that entry was added...)
SQL Server 2005 and above has this feature built in. Use the ROW_NUMBER() function. It is excellent for web pages with << Prev and Next >> style browsing:
Syntax:
SELECT
*
FROM
(
SELECT
ROW_NUMBER () OVER (ORDER BY MyColumnToOrderBy) AS RowNum,
*
FROM
Table_1
) sub
WHERE
RowNum = 23
I suspect this is wildly inefficient but is quite a simple approach, which worked on a small dataset that I tried it on.
select top 1 field
from table
where field in (select top 5 field from table order by field asc)
order by field desc
This would get the 5th item, change the second top number to get a different nth item
SQL server only (I think) but should work on older versions that do not support ROW_NUMBER().
Verify it on SQL Server:
Select top 10 * From emp
EXCEPT
Select top 9 * From emp
This will give you 10th ROW of emp table!
Contrary to what some of the answers claim, the SQL standard is not silent regarding this subject.
Since SQL:2003, you have been able to use "window functions" to skip rows and limit result sets.
And in SQL:2008, a slightly simpler approach was added, using
OFFSET skip ROWS
FETCH FIRST n ROWS ONLY
Personally, I don't think that SQL:2008's addition was really needed, so if I were ISO, I would have kept it out of an already rather large standard.
1 small change: n-1 instead of n.
select *
from thetable
limit n-1, 1
SQL SERVER
Select the nth record from the top
SELECT * FROM (
SELECT
ID, NAME, ROW_NUMBER() OVER(ORDER BY ID) AS ROW
FROM TABLE
) AS TMP
WHERE ROW = n
Select the nth record from the bottom
SELECT * FROM (
SELECT
ID, NAME, ROW_NUMBER() OVER(ORDER BY ID DESC) AS ROW
FROM TABLE
) AS TMP
WHERE ROW = n
When we used to work in MSSQL 2000, we did what we called the "triple-flip":
DECLARE @InnerPageSize int
DECLARE @OuterPageSize int
DECLARE @Count int
SELECT @Count = COUNT(<column>) FROM <TABLE>
SET @InnerPageSize = @PageNum * @PageSize
SET @OuterPageSize = @Count - ((@PageNum - 1) * @PageSize)
IF (@OuterPageSize < 0)
SET @OuterPageSize = 0
ELSE IF (@OuterPageSize > @PageSize)
SET @OuterPageSize = @PageSize
DECLARE @sql NVARCHAR(4000)
SET @sql = 'SELECT * FROM
(
SELECT TOP ' + CAST(@OuterPageSize AS nvarchar(5)) + ' * FROM
(
SELECT TOP ' + CAST(@InnerPageSize AS nvarchar(5)) + ' * FROM <TABLE> ORDER BY <column> ASC
) AS t1 ORDER BY <column> DESC
) AS t2 ORDER BY <column> ASC'
PRINT @sql
EXECUTE sp_executesql @sql
It wasn't elegant, and it wasn't fast, but it worked.
In Oracle 12c, you can use the OFFSET ... FETCH ... ROWS option with ORDER BY.
For example, to get the 3rd record from top:
SELECT *
FROM sometable
ORDER BY column_name
OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY;
Here is a quick solution.
SELECT * FROM table ORDER BY `id` DESC LIMIT N, 1
Here you get the last row with N=0, the second-to-last with N=1, the fourth-to-last with N=3, and so on.
This is a very common interview question, and this is a very simple answer to it.
Further, if you want to sort numerically by an amount, ID, or other numeric field, you can use the CAST function in MySQL.
SELECT DISTINCT (`amount`)
FROM cart
ORDER BY CAST( `amount` AS SIGNED ) DESC
LIMIT 4 , 1
Here, with N = 4, you get the fifth-highest distinct amount from the cart table. Substitute your own field and table names to adapt the solution.
Add:
LIMIT n,1
That limits the output to a single row after skipping the first n rows, so use n-1 as the offset to get the nth row.
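Oracle:
select foo from (select foo, ROWNUM rn from (select foo from bar order by foo)) where rn = x
(Filtering directly on ROWNUM = x returns no rows for x > 1, because ROWNUM is assigned only as rows pass the filter, so the row number is captured under an alias first.)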
For example, if you want to select every 10th row in MSSQL, you can use:
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY ColumnName1 ASC) AS rownumber, ColumnName1, ColumnName2
FROM TableName
) AS foo
WHERE rownumber % 10 = 0
Just take the modulus, and change the number 10 to any interval you want.
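The modulo filter above is not specific to MSSQL. As a rough illustration, the sketch below runs the same pattern through Python's built-in sqlite3 module (assuming SQLite 3.25.0 or newer for ROW_NUMBER()); the `items` table and its 35 rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(f"item-{i:02d}",) for i in range(1, 36)],  # zero-padded so names sort numerically
)

# Number the rows by name, then keep only row numbers divisible by 10.
every_tenth = conn.execute(
    """
    SELECT rownumber, name FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY name ASC) AS rownumber, name
        FROM items
    ) AS numbered
    WHERE rownumber % 10 = 0
    ORDER BY rownumber
    """
).fetchall()
```

With 35 rows this keeps rows 10, 20 and 30; the trailing five rows never reach a multiple of ten.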
For SQL Server, SET ROWCOUNT limits how many rows a statement returns:
SET ROWCOUNT @row --@row = the number of rows to return
For example:
set rowcount 20 --stop after 20 rows
select meat, cheese from dbo.sandwich --returns at most the first 20 rows
set rowcount 0 --sets rowcount back to all rows
This returns the first 20 rows, so the 20th row is the last one in the result (add an ORDER BY to make "20th" meaningful). Be sure to set rowcount back to 0 afterward.
Here's a generic version of a sproc I recently wrote for Oracle that allows for dynamic paging/sorting - HTH
-- p_LowerBound = first row # in the returned set; if second page of 10 rows,
-- this would be 11 (-1 for unbounded/not set)
-- p_UpperBound = last row # in the returned set; if second page of 10 rows,
-- this would be 20 (-1 for unbounded/not set)
OPEN o_Cursor FOR
SELECT * FROM (
SELECT
Column1,
Column2,
rownum AS rn
FROM
(
SELECT
tbl.Column1,
tbl.column2
FROM MyTable tbl
WHERE
tbl.Column1 = p_PKParam OR
tbl.Column1 = -1
ORDER BY
DECODE(p_sortOrder, 'A', DECODE(p_sortColumn, 1, Column1, 'X'),'X'),
DECODE(p_sortOrder, 'D', DECODE(p_sortColumn, 1, Column1, 'X'),'X') DESC,
DECODE(p_sortOrder, 'A', DECODE(p_sortColumn, 2, Column2, sysdate),sysdate),
DECODE(p_sortOrder, 'D', DECODE(p_sortColumn, 2, Column2, sysdate),sysdate) DESC
))
WHERE
(rn >= p_lowerBound OR p_lowerBound = -1) AND
(rn <= p_upperBound OR p_upperBound = -1);
But really, isn't all this really just parlor tricks for good database design in the first place? The few times I needed functionality like this it was for a simple one off query to make a quick report. For any real work, using tricks like these is inviting trouble. If selecting a particular row is needed then just have a column with a sequential value and be done with it.
Nothing fancy, no special functions, in case you use Caché like I do...
SELECT TOP 1 * FROM (
SELECT TOP n * FROM <table>
ORDER BY ID Desc
)
ORDER BY ID ASC
Given that you have an ID column or a datestamp column you can trust.
For SQL Server, the following returns the first row from the given table.
declare @rowNumber int = 1;
select TOP(@rowNumber) * from [dbo].[someTable]
EXCEPT
select TOP(@rowNumber - 1) * from [dbo].[someTable];
You can loop through the values with something like this:
WHILE @constVar > 0
BEGIN
declare @rowNumber int = @constVar;
select TOP(@rowNumber) * from [dbo].[someTable]
EXCEPT
select TOP(@rowNumber - 1) * from [dbo].[someTable];
SET @constVar = @constVar - 1;
END;
LIMIT n,1 doesn't work in MS SQL Server. I think it's just about the only major database that doesn't support that syntax. To be fair, it isn't part of the SQL standard, although it is so widely supported that it should be. In everything except SQL server LIMIT works great. For SQL server, I haven't been able to find an elegant solution.
In Sybase SQL Anywhere:
SELECT TOP 1 START AT n * from table ORDER BY whatever
Don't forget the ORDER BY or it's meaningless.
T-SQL - Selecting the Nth record from a table
select * from
(select row_number() over (order by Rand() desc) as Rno,* from TableName) T where T.Rno = RecordNumber
Where RecordNumber --> Record Number to Select
TableName --> To be Replaced with your Table Name
For example, to select the 5th record from the Employee table, your query should be
select * from
(select row_number() over (order by Rand() desc) as Rno,* from Employee) T where T.Rno = 5
SELECT
top 1 *
FROM
table_name
WHERE
column_name IN (
SELECT
top N column_name
FROM
table_name
ORDER BY
column_name
)
ORDER BY
column_name DESC
I've written this query for finding Nth row.
Example with this query would be
SELECT
top 1 *
FROM
Employee
WHERE
emp_id IN (
SELECT
top 7 emp_id
FROM
Employee
ORDER BY
emp_id
)
ORDER BY
emp_id DESC
I'm a bit late to the party here but I have done this without the need for windowing or using
WHERE x IN (...)
SELECT TOP 1
--select the value needed from t1
[col2]
FROM
(
SELECT TOP 2 --the Nth row, alter this to taste
UE2.[col1],
UE2.[col2],
UE2.[date],
UE2.[time],
UE2.[UID]
FROM
[table1] AS UE2
WHERE
UE2.[col1] = ID --this is a subquery
AND
UE2.[col2] IS NOT NULL
ORDER BY
UE2.[date] DESC, UE2.[time] DESC --sorting by date and time newest first
) AS t1
ORDER BY t1.[date] ASC, t1.[time] ASC --this reverses the order of the sort in t1
It seems to work fairly fast although to be fair I only have around 500 rows of data
This works in MSSQL
SELECT * FROM emp a
WHERE n = (
SELECT COUNT( _rowid)
FROM emp b
WHERE a. _rowid >= b. _rowid
);
unbelievable that you can find a SQL engine executing this one ...
WITH sentence AS
(SELECT
stuff,
row = ROW_NUMBER() OVER (ORDER BY Id)
FROM
SentenceType
)
SELECT
sen.stuff
FROM sentence sen
WHERE sen.row = (ABS(CHECKSUM(NEWID())) % 100) + 1
select * from
(select * from ordered order by order_id limit 100) x order by
x.order_id desc limit 1;
First select the top 100 rows in ascending order, then select the last of them by ordering in descending order and limiting to 1. However, this is a very expensive statement, as it accesses the data twice.
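The two-step flip can be tried out with a small sketch like the one below, which uses Python's built-in sqlite3 module. The table name `ordered` and column `order_id` follow the query above; the 250 sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ordered (order_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ordered VALUES (?)", [(i,) for i in range(1, 251)])


def nth_by_flip(n):
    """Take the first n rows ascending, then the last of those by sorting descending."""
    return conn.execute(
        """
        SELECT * FROM (
            SELECT * FROM ordered ORDER BY order_id LIMIT ?
        ) AS x
        ORDER BY x.order_id DESC
        LIMIT 1
        """,
        (n,),
    ).fetchone()


hundredth = nth_by_flip(100)
past_the_end = nth_by_flip(1000)  # table only has 250 rows
```

One caveat this exposes: when the table has fewer than n rows, the flip silently returns the last row instead of nothing, unlike the OFFSET or ROW_NUMBER() approaches.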
It seems to me that, to be efficient, you need to 1) generate a random number between 0 and one less than the number of database records, and 2) be able to select the row at that position. Unfortunately, different databases have different random number generators and different ways to select a row at a position in a result set - usually you specify how many rows to skip and how many rows you want, but it's done differently for different databases. Here is something that works for me in SQLite:
select *
from Words
limit abs(random()) % (select count(*) from Words), 1;
It does depend on being able to use a subquery in the limit clause (which in SQLite is LIMIT <recs to skip>,<recs to take>) Selecting the number of records in a table should be particularly efficient, being part of the database's meta data, but that depends on the database's implementation. Also, I don't know if the query will actually build the result set before retrieving the Nth record, but I would hope that it doesn't need to. Note that I'm not specifying an "order by" clause. It might be better to "order by" something like the primary key, which will have an index - getting the Nth record from an index might be faster if the database can't get the Nth record from the database itself without building the result set.
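A variation on the same idea is to pick the random offset on the client side instead of inside the LIMIT clause. The sketch below does that with Python's built-in sqlite3 and random modules; the `words` table and its four rows are invented, and the ORDER BY on the primary key is added so that "the row at that position" is well defined.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [(w,) for w in ["alpha", "beta", "gamma", "delta"]],
)

# 1) Random number between 0 and one less than the number of records.
(count,) = conn.execute("SELECT COUNT(*) FROM words").fetchone()
offset = random.randrange(count)

# 2) Select the single row at that position.
picked = conn.execute(
    "SELECT id, word FROM words ORDER BY id LIMIT 1 OFFSET ?",
    (offset,),
).fetchone()
```

This costs an extra round trip for the count, but it keeps the query portable to databases that do not accept a subquery in the LIMIT clause.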
Most suitable answer I have seen on this article for sql server
WITH myTableWithRows AS (
SELECT (ROW_NUMBER() OVER (ORDER BY myTable.SomeField)) as row,*
FROM myTable)
SELECT * FROM myTableWithRows WHERE row = 3