I am running the below query and getting the following error -
MySQL Database Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(PARTITION BY STUDY_SITE_ID ORDER BY VISIT_START_DATE DESC) AS cnt
FROM DESIRE' at line 1
Query -
SELECT *, ROW_NUMBER() OVER (PARTITION BY STUDY_SITE_ID ORDER BY VISIT_START_DATE DESC) AS cnt
FROM TABLE_a
You can use variables . . . but it is very important that the assignment be within a single expression. So:
SELECT a.*,
       (@rn := if(@ss = a.STUDY_SITE_ID, @rn + 1,
                  if(@ss := a.STUDY_SITE_ID, 1, 1)
                 )
       ) as cnt
FROM (SELECT a.*
      FROM TABLE_a a
      ORDER BY a.STUDY_SITE_ID, a.VISIT_START_DATE DESC
     ) a CROSS JOIN
     (SELECT @ss := -1, @rn := 0) params
I suspect the comment is correct and your MySQL version doesn't support ROW_NUMBER().
You can fake it with a self join if you have numeric/incremental ids, by joining the table on id <= id and grouping/counting the result. If you want a partition, join on partitioncol = partitioncol as well as id <= id.
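For example, here is a minimal sketch of that self-join approach against the table from the question, assuming it has a unique incremental column id (that column name is an assumption):
SELECT a.*, COUNT(*) AS cnt
FROM TABLE_a a
JOIN TABLE_a b
  ON b.STUDY_SITE_ID = a.STUDY_SITE_ID   -- same "partition"
 AND b.id <= a.id                        -- count rows up to and including this one
GROUP BY a.id;
Note this numbers rows by id within each STUDY_SITE_ID rather than by VISIT_START_DATE, and the self join can get expensive on large tables.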
You can also use in-query variables in a pattern like this:
SELECT
    t.*,
    @r := @r + 1 AS rn
FROM
    t,
    (SELECT @r := 0) x
You can get more funky with this if you need a partition by col:
SELECT
    t.*,
    @r := CASE
              WHEN col = @prevcol THEN @r + 1
              WHEN (@prevcol := col) = null THEN null
              ELSE 1
          END AS rn
FROM
    t,
    (SELECT @r := 0, @prevcol := null) x
ORDER BY col
Order of assignment of prevcol is important - prevcol has to be compared to the current row's value before we assign it a value from the current row (otherwise it would be the current row's col value, not the previous row's col value). The MySQL documentation states that the order of evaluation of select list items isn't guaranteed, so we need a way to guarantee that we will first compare the current row's col to prevcol, and only then assign to it.
To do this we use a construct that does have a guaranteed order of execution: the CASE WHEN.
The first WHEN is evaluated. If this row's col is the same as the previous row's col, then @r is incremented and returned from the CASE, and stored in @r. The assignment returns the new value of @r into the result rows.
For the first row of the result set, @prevcol is null, so this predicate is false. This predicate also returns false every time col changes (the current row is different from the previous row). This causes the second WHEN to be evaluated.
The second WHEN is always false, and it exists purely to assign a new value to @prevcol. Because this row's col is different from the previous row's col, we have to assign the new value to keep it for testing next time. The assignment is made and then the result of the assignment is compared with null, and anything equated with null is false, so this predicate is always false. But it did its job of keeping the value.
This means that in situations where the partition-by col has changed, it is the ELSE that gives a new value for @r, restarting the numbering from 1.
Let's say there are millions of records in my_table.
Here is my query to extract rows with a specific name from list:
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4')
How do I limit the returned result per name1, name2, etc?
The following query would limit the whole result (to 100).
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4') LIMIT 100
I need to limit to 100 for each name.
This is a bit of a pain in MySQL, but the best method is probably variables:
select t.*
from (select t.*,
             (@rn := if(@n = name, @rn + 1,
                        if(@n := name, 1, 1)
                       )
             ) as rn
      from my_table t cross join
           (select @n := '', @rn := 0) params
      order by name
     ) t
where rn <= 100;
If you want to limit this to a subset of the names, then add the where clause to the subquery.
Note: If you want to pick certain rows -- such as the oldest or newest or biggest or tallest -- just add a second key to the order by in the subquery.
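For example, to keep the 100 newest rows per name rather than an arbitrary 100, the subquery's order by gets a second key (created_at is a hypothetical timestamp column):
select t.*
from (select t.*,
             (@rn := if(@n = name, @rn + 1,
                        if(@n := name, 1, 1)
                       )
             ) as rn
      from my_table t cross join
           (select @n := '', @rn := 0) params
      order by name, created_at desc   -- hypothetical created_at: newest rows first within each name
     ) t
where rn <= 100;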
Try
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4') FETCH FIRST 100 ROWS ONLY
I have a query like
Select *
From y
WHERE y.z = (
SELECT a, (adding rownumber here)
FROM b
)
I want to add a clause where it only selects every second row. To do this I need to add row_number() to the subquery, and have a clause where rownumber % 2 = 0.
My question is, am I able to add rownumber to the select of the subquery and somehow hide it so it doesn't affect the query?
Rownumbering in MySQL is a notorious pain in the neck.
You can number your rows in MySQL like this.
SELECT (@rownum := @rownum+1) rownum, b.*
FROM b
JOIN (SELECT @rownum := 0) init
ORDER BY b.whatever
Don't forget the ORDER BY clause here. Without explicit ordering the query engine is free to randomize the order of rows it returns.
Then, you can use that mess as a subquery and do things with the rownum.
SELECT *
FROM (
    SELECT (@rownum := @rownum+1) rownum, b.*
    FROM b
    JOIN (SELECT @rownum := 0) init
    ORDER BY b.whatever
) table_with_rownum
WHERE rownum % 2 = 0
If you don't want to show the rownumbers, change your SELECT from SELECT * to SELECT col, col, col and leave out rownum.
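For example, keeping the same structure but naming columns explicitly (id and whatever here stand in for your real columns):
SELECT id, whatever          -- list only the columns you want returned; rownum stays hidden
FROM (
    SELECT (@rownum := @rownum+1) rownum, b.*
    FROM b
    JOIN (SELECT @rownum := 0) init
    ORDER BY b.whatever
) table_with_rownum
WHERE rownum % 2 = 0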
I have a table with columns like this:
id | timestamp | ...
and I am looking for rows where the timestamp decreased since the previous row.
I tried a statement like this:
SELECT count(a.id)
FROM tbl AS a INNER JOIN tbl AS b ON a.id+1=b.id
WHERE a.timestamp<b.timestamp;
but it appears not to have worked. I get zero results even though I expect some. Any suggestions what is wrong?
I would also appreciate any ideas on a better way to write this query.
I am using MySQL.
You can get the previous value using a correlated subquery, and then use that for the comparison:
select t.*
from (select t.*,
(select t2.timestamp from tbl t2 where t2.id < t.id order by t2.id desc limit 1
) as prevts
from tbl t
) t
where timestamp < prevts;
The problem with your query is probably that the ids have gaps in them.
EDIT:
You can do this with variables. The challenge is getting the variable comparison and assignment in a single expression. This is needed because MySQL does not guarantee the order of evaluation of expressions in a select statement.
The following assigns a value to IsDecreasing while updating the variable within a single expression:
select t.*
from (select t.*,
             if(@prev > timestamp, if(@prev := timestamp, 1, 1),
                if(@prev := timestamp, 0, 0)
               ) IsDecreasing
      from tbl t cross join
           (select @prev := -1) vars
      order by id
     ) t
where IsDecreasing = 1;
This should be faster than the previous method -- probably even when you have the right index.
Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;
MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rna as "AVG"
from (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(column is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(column, 0)) as sumcol
            from table t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by column
           ) t
      order by rn desc
      limit 1
     ) t
This latter formulation only works in MySQL.
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as signed) as "COUNT",
       (case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
     ) n left outer join
     (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(column is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(column, 0)) as sumcol
            from table t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by column
           ) t
      order by rn desc
      limit 1
     ) t
     on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
min/max can be replaced with something like this:
select t1.pk_column,
t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
from the_table t2
where t2.pk_column <> t1.pk_column);
For getting the max you need to replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK, it only needs to be unique).
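The MAX version then looks like this (as with the MIN version, this sketch assumes the values in some_column are distinct; with ties, no row is strictly greater than all the others and nothing is returned):
select t1.pk_column,
       t1.some_column
from the_table t1
where t1.some_column > ALL (select t2.some_column
                            from the_table t2
                            where t2.pk_column <> t1.pk_column);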
I don't think there is an alternative for count() or avg() (at least I can't think of one)
I used some_column and the_table because column and table are reserved words.
SET @t1=0, @t2=0, @t3=0, @t4=0;
COUNT:
Select @t1:=@t1+1 as CNT from table
order by @t1:=@t1+1 DESC
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...
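In the same spirit, here is a hedged sketch of AVG using running variables and LIMIT 1 (the_table and col are placeholders; NULLs and the empty-table case are ignored here, and the next answer handles those corner cases properly):
SET @cnt = 0, @total = 0;

SELECT running_sum / running_cnt AS avg_col
FROM (SELECT (@cnt := @cnt + 1)        AS running_cnt,
             (@total := @total + col)  AS running_sum
      FROM the_table) t
ORDER BY running_cnt DESC   -- the last row processed carries the full count and sum
LIMIT 1;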
Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The answer for the MIN and MAX functions in Gordon's answer is spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn)     AS `COUNT(*)`
     , IF(s.cc>0,s.tc,NULL)        AS `SUM(col)`
     , IF(s.cc>0,s.tc/s.cc,NULL)   AS `AVG(col)`
  FROM ( SELECT v.rn
              , v.cc
              , v.tc
              , e.ne
           FROM ( SELECT @rn := @rn + 1                   AS rn
                       , @cc := @cc + (t.col IS NOT NULL) AS cc
                       , @tc := @tc + IFNULL(t.col,0)     AS tc
                    FROM (SELECT @rn := 0, @cc := 0, @tc := 0) c
                    LEFT
                    JOIN `foo` t
                      ON 1=1
                ) v
           LEFT
           JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
             ON 1=1
          ORDER BY v.rn DESC
          LIMIT 1
       ) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value, like a 0 for a COUNT, we need to ensure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user-defined variables), we'll use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But, if the table is empty, our row counter (@rn) coming out of v is going to have a value of 1. We'll deal with that: we have e.ne to check, so we know whether the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter; we have to divide by the number of rows where col was not null. We use the @cc user-defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out our accumulation.) So we do a conditional test to check whether t.col IS NULL, to avoid accidentally wiping out the accumulation. Our accumulator will be 0 if there aren't any non-NULL rows, but that's not a problem, because we make sure to check @cc to see whether any rows were included. We need to check it anyway, to avoid a "divide by zero" issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
This is essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.