I try to get 10 random rows of a table. Many similar questions are out there but I have got an additional condition: I want this list to update weekly, 52 times a year. These 10 items are going to be the "random featured ones of this week."
So, I found this snippet of getting random rows that change each time you invoke the script
select top 10 percent * from [yourtable] order by newid();
The problem with it is that I can't simply add a seed, which in this case would be the week-of-year. Any idea how to implement this?
Your code sample is T-SQL (SQL Server). So, to do this in SQL Server is pretty easy:
select *
from (select t.*,
row_number() over (partition by datepart(wk, thedate) order by newid()) as seqnum
from [yourtable] t
) t
where seqnum <= 10;
MySQL does not support row_number(), so the approach would be different in that database.
Related
I have a blog SQL system with 'posts' table like this (id, theme, title, content, created).
I would like to get the last 3 titles of each post theme, to display in a table, but I didn't get it.
I tried this SQL code:
SELECT *
FROM posts
GROUP BY theme
ORDER BY created DESC
LIMIT 0,3
But this gives me the last 3 posts of the table!!!
Can you please help me?
Thanks in advance.
You're doing some parts right, but in others you're not understanding the use of the GROUP BY clause.
The LIMIT above works on the whole query results. Besides you need an aggregate function for the GROUP BY to be applied on, e.g. SUM, AVG, etc. You cannot simply SELECT * and expect grouping by theme.
The GROUP BY returns one row per item you're grouping by, for instance, the latest, and not three rows.
Therefore you need to perform the query per theme you're grouping by, but in this case you don't need to group by at all. Write a loop that executes this query for each theme you require:
SELECT *
FROM posts
WHERE theme = "your-theme"
ORDER BY created DESC
LIMIT 0,3
Josvic Zammit's answer is in parts right (+1 from me) but you don't have to perform the query per theme. This is a way you can do it in MySQL, although it would surely be easier to "perform the query per theme" in PHP or whatever.
SELECT * FROM
(
SELECT
IF(#prev != sq.theme, #rownum:=1, #rownum:=#rownum + 1) AS rownumber, #prev:=sq.theme, sq.*
FROM
(
SELECT
*
FROM
posts
, (SELECT #rownum:=0, #prev:=0) r
ORDER BY theme, created DESC
) sq
) q
WHERE q.rownumber <= 3
Note that I unfortunately can't test it right now, but it should work. Also note, that this query might be slower than performing a simpler query per theme.
I have the following query I use and it works great:
SELECT * FROM
(
SELECT * FROM `Transactions` ORDER BY DATE DESC
) AS tmpTable
GROUP BY Machine
ORDER BY Machine ASC
What's not great, is when I try to create a view from it. It says that subqueries cannot be used in a view, which is fine - I've searched here and on Google and most people say to break this down into multiple views. Ok.
I created a view that orders by date, and then a view that just uses that view to group by and order by machines - the results however, are not the same. It seems to have taken the date ordering and thrown it out the window.
Any and all help will be appreciated, thanks.
This ended up being the solution, after hours of trying, apparently you can use a subquery on a WHERE but not FROM?
CREATE VIEW something AS
SELECT * FROM Transactions AS t
WHERE Date =
(
SELECT MAX(Date)
FROM Transactions
WHERE Machine = t.Machine
)
You don't need a subquery here. You want to have the latest date in the group of machines, right?
So just do
SELECT
t.*, MAX(date)
FROM Transactions t
GROUP BY Machine
ORDER BY Machine ASC /*this line is obsolete by the way, since in MySQL a group by automatically does sort, when you don't specify another sort column or direction*/
A GROUP BY is used together with a aggregate function (in your case MAX()) anyway.
Alternatively you can also specify multiple columns in the ORDER BY clause.
SELECT
*
FROM
Transactions
GROUP BY Machine
ORDER BY Date DESC, Machine ASC
should give you also what you want to achieve. But using the MAX() function is definitely the better way to go here.
Actually I have never used a GROUP BY without an aggregate function.
This simple SQL problem is giving me a very hard time. Either because I'm seeing the problem the wrong way or because I'm not that familiar with SQL. Or both.
What I'm trying to do: I have a table with several columns and I only need two of them: the datetime when the entry was created and the id of the entry. Note that the hours/minutes/seconds part is important here.
However, I want to group my selection according to the DATE part only. Otherwise all groups will most likely have 1 element.
Here's my query:
SELECT MyDate as DateCr, COUNT(Id) as Occur
FROM MyTable tb WITH(NOLOCK)
GROUP BY CAST(tb.MyDate as Date)
ORDER BY DateCr ASC
However I get the following error from it:
Column "MyTable.MyDate" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I don't do the cast in the GROUP BY, everything fine. If I cast MyDate to DATE in the SELECT and keep the CAST from GROUP BY, everything fine once more. Apparently it wants to keep the same DATE or DATETIME format in the GROUP BY as in the SELECT.
My approach can be completely wrong so I am not necessarily looking to fix the above query, but to find the proper way to do it.
LE: I get the above error on line 1.
LE2: On a second look, my question indeed is not very explicit. You can ignore the above approach if it is completely wrong. Below is a sample scenario
Let me tell you what I need: I want to retrieve (1) the DateTime when each entry was created. So if I have 20 entries, then I want to get 20 DateTimes. Then if I have multiple entries created on the same DAY, I want the number of those entries. For example, let's say I created 3 entries on Monday, 1 on Tuesday and 2 today. Then from my table I need the datetimes of these 6 entries + the number of entries which were created on each day (3 for 19/03/2012, 1 for 20/03/2012 and 2 for 21/03/2012).
I'm not sure why you're objecting to performing the CONVERT in both the SELECT and the GROUP BY. This seems like a perfectly logical way to do this:
SELECT
DateCr = CONVERT(DATE, MyDate),
Occur = COUNT(Id)
FROM dbo.MyTable
GROUP BY CONVERT(DATE, MyDate)
ORDER BY DateCr;
If you want to keep the time portion of MyDate in the SELECT list, why are you bothering to group? Or how do you expect the results to look? You'll have a row for every individual date/time value, where the grouping seems to indicate you want a row for each day. Maybe you could clarify what you want with some sample data and example desired results.
Also, why are you using NOLOCK? Are you willing to trade accuracy for a haphazard turbo button?
EDIT adding a version for the mixed requirements:
;WITH d(DateCr,d,Id) AS
(
SELECT MyDate, d = CONVERT(DATE, MyDate), Id
FROM dbo.MyTable)
SELECT DateCr, Occur = (SELECT COUNT(Id) FROM d AS d2 WHERE d2.d = d.d)
FROM d
ORDER BY DateCr;
Even though this is an old post, I thought I would answer it. The solution below will work with SQL Server 2008 and above. It uses the over clause, so that the individual lines will be returned, but will also count the rows grouped by the date (without time).
SELECT MyDate as DateCr,
COUNT(Id) OVER(PARTITION BY CAST(tb.MyDate as Date)) as Occur
FROM MyTable tb WITH(NOLOCK)
ORDER BY DateCr ASC
Darren White
I've been doing a lot of reading on alternatives to the LIMIT clause for SQL SERVER. It's so frustrating that they still refuse to adapt it. Anyway, I really havn't been able to get my head around this. The query I'm trying to convert is this...
SELECT ID, Name, Price, Image FROM Products ORDER BY ID ASC LIMIT $start_from, $items_on_page
Any assistance would be much appreciated, thank you.
In SQL Server 2012, there is support for the ANSI standard OFFSET / FETCH syntax. I blogged about this and here is the official doc (this is an extension to ORDER BY). Your syntax converted for SQL Server 2012 would be:
SELECT ID, Name, Price, Image
FROM Products
ORDER BY ID ASC
OFFSET (#start_from - 1) ROWS -- not sure if you need -1
-- because I don't know how you calculated #start_from
FETCH NEXT #items_on_page ROWS ONLY;
Prior to that, you need to use various workarounds, including the ROW_NUMBER() method. See this article and the follow-on discussion. If you are not on SQL Server 2012, you can't use standard syntax or MySQL's non-standard LIMIT but you can use a more verbose solution such as:
;WITH o AS
(
SELECT TOP ((#start_from - 1) + #items_on_page)
-- again, not sure if you need -1 because I
-- don't know how you calculated #start_from
RowNum = ROW_NUMBER() OVER (ORDER BY ID ASC)
/* , other columns */
FROM Products
)
SELECT
RowNum
/* , other columns */
FROM
o
WHERE
RowNum >= #start_from
ORDER BY
RowNum;
There are many other ways to skin this cat, this is unlikely to be the most efficient but syntax-wise is probably simplest. I suggest reviewing the links I posted as well as the duplicate suggestions noted in the comments to the question.
For SQL Server 2005 and 2008
This is an example query to select rows from 11 to 20 from Report table ordered by LastName.
SELECT a.* FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY LastName) as row FROM Report) a
WHERE a.row > 10 and a.row <= 20
Try this:
SELECT TOP $items_on_page ID, Name, Price, Image
FROM (SELECT TOP $start_from + $items_on_page - 1 * FROM Products ORDER BY ID) as T
ORDER BY ID DESC
EDIT: Explanation-
No getting around the subquery, but this is an elegant solution.
Say you wanted 10 items per page, starting on the 5th row, this would give you the bottom 10 rows of the top 14 rows. Essentially LIMIT 5,10
you can use ROW COUNT : Returns the number of rows affected by the last statement. when you don, you reset the rowcont.
SET ROWCOUNT 100
or
you can try using TOP query
SELECT TOP 100 * FROM Sometable ORDER BY somecolumn
If you allow the application to store a tad of state, you can do this using just TOP items_on_page. What you do is to store the ID of the last item you retrieved, and for each subsequent query add AND ID > [last ID]. So you get the items in batches of items_on_page, starting where you left off each time.
I need to select sample rows from a set. For example if my select query returns x rows then if x is greater than 50 , I want only 50 rows returned but not just the top 50 but 50 that are evenly spread out over the resultset. The table in this case records routes - GPS locations + DateTime.
I am ordering on DateTime and need a reasonable sample of the Latitude & Longitude values.
Thanks in advance
[ SQL Server 2008 ]
To get sample rows in SQL Server, use this query:
SELECT TOP 50 * FROM Table
ORDER BY NEWID();
If you want to get every n-th row (10th, in this example), try this query:
SELECT * From
(
SELECT *, (Dense_Rank() OVER (ORDER BY Column ASC)) AS Rank
FROM Table
) AS Ranking
WHERE Rank % 10 = 0;
Source
More examples of queries selecting random rows for other popular RDBMS can be found here: http://www.petefreitag.com/item/466.cfm
Every n'th row to get 50:
SELECT *
FROM table
WHERE row_number() over() MOD (SELECT Count(*) FROM table) / 50 == 0
FETCH FIRST 50 ROWS ONLY
And if you want a random sample, go with jimmy_keen's answer.
UPDATE:
In regard to the requirement for it to run on MS SQL, I think it should be changed to this (no MS SQL Server around to test though):
SELECT TOP 50 *
FROM (
SELECT t.*, row_number() over() AS rn, (SELECT count(*) FROM table) / 50 AS step
FROM table t
)
WHERE rn % step == 0
I suggest that you add a calculated column to your resultset on selection that is obtained as a random number, and then select the top 50 sorted by that column. That will give you a random sample.
For example:
SELECT TOP 50 *, RAND(Id) AS Random
FROM SourceData
ORDER BY Random
where SourceData is your source data table or view. This assumes T-SQL on SQL Server 2008, by the way. It also assumes that you have an Id column with unique ids on your data source. If your ids are very low numbers, it is a good practice to multiply them by a large integer before passing them to RAND, like this:
RAND(Id * 10000000)
If you want an statically correct sample, tablesample is a wrong solution. A good solution as I described in here based on a Microsoft Research paper, is to create a materialized view over your table which includes an additional column like
CAST( ROW_NUMBER() OVER (...) AS BYTE ) AS RAND_COL_, then you can add an index on this column, plus other interesting columns and get statistically correct samples for your queries fairly quickly. (by using WHERE RAND_COL_ = 1).