Funny one this,
I've got a table, "addresses", with a list of address details, some with missing fields.
I want to identify these rows and replace them with the previous address row. However, these must only be addresses that are NOT the most recent address on the account; they must be previous addresses.
Each address has a sequence number (1, 2, 3, 4, etc.), so I can easily identify the MAX address and make sure that it's not the most recent address on the account. However, how do I then scan for what is effectively "Max - 1", or "one less than max"?
Any help would be hugely appreciated.
Try this:
SELECT MAX(field) FROM table WHERE field < (SELECT MAX(field) FROM table)
By the way: Here is a good article, which describes how to achieve nth row.
SELECT TOP 1 field
FROM (
    SELECT DISTINCT TOP 2 field
    FROM table
    ORDER BY field DESC
) tbl
ORDER BY field;
This returns 1st or nth max record.
;WITH Distincts AS (
    SELECT DISTINCT field FROM table
),
NextMax AS (
    SELECT field, ROW_NUMBER() OVER (ORDER BY field DESC) AS RN
    FROM Distincts
)
SELECT * FROM NextMax WHERE RN = 2
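For the per-account case in the question, here is a minimal sketch of the "less than the max" subquery idea, correlated on the account. It runs through Python's built-in sqlite3 module purely so that it can be executed as-is. The column names account_id, seq and postcode are assumptions, since the post only names the table "addresses".

```python
import sqlite3

# Hypothetical schema: account_id, seq and postcode are assumed column names.
def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE addresses (account_id INTEGER, seq INTEGER, postcode TEXT)"
    )
    conn.executemany(
        "INSERT INTO addresses VALUES (?, ?, ?)",
        [
            (1, 1, "AB1"), (1, 2, None), (1, 3, "AB3"),
            (2, 1, "CD1"), (2, 2, "CD2"),
            (3, 1, "EF1"),  # only one address, so there is no "max - 1"
        ],
    )
    return conn

def previous_address_seq(conn):
    """Return {account_id: second-highest seq} for accounts with 2+ addresses."""
    rows = conn.execute(
        """
        SELECT a.account_id, MAX(a.seq) AS prev_seq
        FROM addresses AS a
        WHERE a.seq < (SELECT MAX(b.seq)
                       FROM addresses AS b
                       WHERE b.account_id = a.account_id)
        GROUP BY a.account_id
        """
    ).fetchall()
    return dict(rows)

print(previous_address_seq(make_demo_db()))
```

Accounts with a single address drop out, because no row has a sequence number below that account's maximum.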
Above is the table on which I had to answer the question below in a past interview.
Q. What is the most recent order value for each customer?
The answer I gave in the interview:
select customerID, ordervalue, max(orderdate)
from office
group by customerID;
I know that since ordervalue is neither inside an aggregate function nor in the GROUP BY, this query will throw an error in SQL, but I want to know how to answer this question.
In past interviews I have often been asked questions where I need to use a column in the SELECT statement that is neither in an aggregate function nor in the GROUP BY. So I want to know, in general, what the workaround is, with an example, so that I can answer these types of questions.
The workaround depends on what is being asked. For the requirements you have above, I think it makes sense to create (customerid, MAX(orderdate)) pairs.
SELECT customerid, MAX(orderdate)
FROM office
GROUP BY customerid;
Then you can use them to match the row you need from the table.
SELECT customerid, ordervalue, orderdate
FROM office
WHERE (customerid, orderdate) IN
(SELECT customerid, MAX(orderdate)
FROM office
GROUP BY customerid);
Note that this assumes there is only one order per customer per day. If there were more than one, you would see all of the most recent orders for that customer. You could also add a GROUP BY to the outer query if needed.
SELECT customerid, MAX(ordervalue), orderdate
FROM office AS tt
WHERE (customerid, orderdate) IN
(SELECT customerid, MAX(orderdate)
FROM office
GROUP BY customerid)
GROUP BY customerid, orderdate;
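To see the effect of a tie on the latest date, here is a small sketch that runs both queries above against made-up rows. It uses Python's built-in sqlite3 module only to make the SQL executable. The data is invented, since the original table was posted as an image, and row-value IN needs SQLite 3.15 or later.

```python
import sqlite3

# Illustrative rows only: (customerid, ordervalue, orderdate).
ROWS = [
    (1, 100, "2021-01-01"),
    (1, 250, "2021-03-05"),
    (1, 300, "2021-03-05"),  # second order on the customer's latest date
    (2, 40, "2021-01-15"),
    (2, 80, "2021-02-10"),
]

def make_office():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE office (customerid INTEGER, ordervalue INTEGER, orderdate TEXT)"
    )
    conn.executemany("INSERT INTO office VALUES (?, ?, ?)", ROWS)
    return conn

def latest_orders(conn):
    """All rows that fall on each customer's most recent order date."""
    return conn.execute(
        """
        SELECT customerid, ordervalue, orderdate
        FROM office
        WHERE (customerid, orderdate) IN
              (SELECT customerid, MAX(orderdate) FROM office GROUP BY customerid)
        ORDER BY customerid, ordervalue
        """
    ).fetchall()

def latest_single(conn):
    """Same filter, collapsed to one row per customer with the outer GROUP BY."""
    return conn.execute(
        """
        SELECT customerid, MAX(ordervalue), orderdate
        FROM office
        WHERE (customerid, orderdate) IN
              (SELECT customerid, MAX(orderdate) FROM office GROUP BY customerid)
        GROUP BY customerid, orderdate
        ORDER BY customerid
        """
    ).fetchall()

conn = make_office()
print(latest_orders(conn))
print(latest_single(conn))
```

With this data the first query returns two rows for customer 1, and the second keeps only the higher order value.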
If the non-aggregate column you need in the SELECT is functionally dependent on the column in the GROUP BY, you can add a subquery in the SELECT.
We can extend your example by adding a name column, where the name of different customers could be the same. If you wanted name instead of ordervalue, just match the customerid of the outer query to get name.
SELECT customerid,
(SELECT name FROM office WHERE customerid=o.customerid LIMIT 1) AS name,
MAX(orderdate)
FROM office AS o
GROUP BY customerid;
You are approaching the task as follows: Aggregate all rows to get one result line per customer, showing the maximum order date and its order value. The problem with this: you'd need an aggregate function to get the value for the maximum order date. The only DBMS I know of featuring such a function is Oracle with KEEP FIRST/LAST.
So look at the task from a different angle. Don't think aggregation-wise where you could count and add up values for a group and get the minimum or maximum value over all the group's rows, because after all you just want to pick single rows. (That is, pick the top 1 row per customer.) In order to pick rows, you'll use a WHERE clause.
One option has been shown by Steve in his answer:
select *
from office
where (customerid, orderdate) in
(
select customerid, max(orderdate)
from office
group by customerid
);
This is a good, straightforward approach. (Some DBMS, though, don't support tuples in IN clauses.)
Another way to get the "best" row for a customer is to pick those rows for which no better row exists:
select *
from office
where not exists
(
select null
from office better
where better.customerid = office.customerid
and better.orderdate > office.orderdate
);
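A minimal runnable sketch of the NOT EXISTS variant, using Python's built-in sqlite3 module and invented rows (the original table is not available). Both instances of the table are aliased here, which is my own choice to keep the column references unambiguous.

```python
import sqlite3

def make_office():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE office (customerid INTEGER, ordervalue INTEGER, orderdate TEXT)"
    )
    # Illustrative rows only.
    conn.executemany(
        "INSERT INTO office VALUES (?, ?, ?)",
        [
            (1, 100, "2021-01-01"),
            (1, 250, "2021-03-05"),
            (2, 80, "2021-02-10"),
            (2, 40, "2021-01-15"),
        ],
    )
    return conn

def rows_without_better(conn):
    """Keep a row only if no later order exists for the same customer."""
    return conn.execute(
        """
        SELECT o.customerid, o.ordervalue, o.orderdate
        FROM office AS o
        WHERE NOT EXISTS (
            SELECT NULL
            FROM office AS better
            WHERE better.customerid = o.customerid
              AND better.orderdate > o.orderdate
        )
        ORDER BY o.customerid
        """
    ).fetchall()

print(rows_without_better(make_office()))
```

As with the IN approach, if two orders share a customer's latest date, both survive, because neither is strictly later than the other.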
And then there is the option to use a window function (aka analytic function) in order to get those rows. One example is to get the maximum dates along with the rows' data:
select customerid, ordervalue, orderdate
from
(
select
customerid, ordervalue, orderdate,
max(orderdate) over (partition by customerid) as max_orderdate
from office
) t
where orderdate = max_orderdate;
And with ROW_NUMBER, RANK, and DENSE_RANK there are window functions to assign numbers to your rows in the order you want. You number them such that the best rows get number 1 and pick them. The big advantage here: you can apply any order, deal with ties and not only get the top 1, but the top n rows.
select customerid, ordervalue, orderdate
from
(
select
customerid, ordervalue, orderdate,
row_number() over (partition by customerid order by orderdate desc) as rn
from office
) t
where rn = 1;
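The difference in how ties are handled can be sketched like this, again through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The rows are invented, and the extra ordervalue tie-breaker in ROW_NUMBER is my own addition to make the pick deterministic.

```python
import sqlite3

def make_office():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE office (customerid INTEGER, ordervalue INTEGER, orderdate TEXT)"
    )
    # Illustrative rows only; customer 1 has two orders on the latest date.
    conn.executemany(
        "INSERT INTO office VALUES (?, ?, ?)",
        [
            (1, 100, "2021-01-01"),
            (1, 250, "2021-03-05"),
            (1, 300, "2021-03-05"),
            (2, 80, "2021-02-10"),
        ],
    )
    return conn

def top_by_row_number(conn):
    """Exactly one row per customer; ties are broken by ordervalue DESC."""
    return conn.execute(
        """
        SELECT customerid, ordervalue, orderdate
        FROM (
            SELECT customerid, ordervalue, orderdate,
                   ROW_NUMBER() OVER (PARTITION BY customerid
                                      ORDER BY orderdate DESC, ordervalue DESC) AS rn
            FROM office
        ) AS t
        WHERE rn = 1
        ORDER BY customerid
        """
    ).fetchall()

def top_by_rank(conn):
    """RANK gives tied rows the same number, so every tied row is returned."""
    return conn.execute(
        """
        SELECT customerid, ordervalue, orderdate
        FROM (
            SELECT customerid, ordervalue, orderdate,
                   RANK() OVER (PARTITION BY customerid
                                ORDER BY orderdate DESC) AS rnk
            FROM office
        ) AS t
        WHERE rnk = 1
        ORDER BY customerid, ordervalue
        """
    ).fetchall()

conn = make_office()
print(top_by_row_number(conn))
print(top_by_rank(conn))
```

Changing the filter to rn <= n would give the top n rows per customer instead of the top 1.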
I am looking to combine rows into a single row if the data is consecutive. I've looked over gaps and islands and I know I used to do this very regularly with SQL Server. The solution escapes me, but I recall doing some type of ROW_NUMBER() over (Partition BY groupId, name, email, phone ORDER BY id) - ROW_NUMBER() over (ORDER BY id) seq with a calculation. I was not successful in getting this to work.
Here is a sample data set:
Desired Result:
Any help would be greatly appreciated.
If you only want the last row in each group, you can use lead():
select t.*
from (select t.*,
lead(id) over (order by id) as next_id,
lead(id) over (partition by groupid, name, email, phone order by id) as next_id_grp
from t
) t
where next_id_grp is null or next_id_grp <> next_id;
This is looking at the next id and the next id for members in the same group. When these are different, then the row is the last row in the group.
This is better than the row_number() approach because that would typically require an aggregation.
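Since the sample data did not survive in the post, here is a sketch with invented rows that shows which ids the lead() query keeps. It uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions); the table layout t(id, groupid, name, email, phone) is assumed from the column names in the question.

```python
import sqlite3

def make_t():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, groupid INTEGER, name TEXT, email TEXT, phone TEXT)"
    )
    # Invented rows: Ann appears in two separate consecutive runs.
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
        [
            (1, 10, "Ann", "a@x", "111"),
            (2, 10, "Ann", "a@x", "111"),
            (3, 10, "Bob", "b@x", "222"),
            (4, 10, "Ann", "a@x", "111"),
            (5, 10, "Ann", "a@x", "111"),
            (6, 20, "Cid", "c@x", "333"),
        ],
    )
    return conn

def last_id_of_each_run(conn):
    """Ids of rows whose next row overall is not the next row with the same values."""
    rows = conn.execute(
        """
        SELECT id
        FROM (
            SELECT t.*,
                   LEAD(id) OVER (ORDER BY id) AS next_id,
                   LEAD(id) OVER (PARTITION BY groupid, name, email, phone
                                  ORDER BY id) AS next_id_grp
            FROM t
        ) AS s
        WHERE next_id_grp IS NULL OR next_id_grp <> next_id
        ORDER BY id
        """
    ).fetchall()
    return [r[0] for r in rows]

print(last_id_of_each_run(make_t()))
```

Row 2 is kept because the next row overall is id 3 while the next row with the same values is id 4, so the run ends there.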
Let's say I have a table with the following rows/values:
I need a way to select the values in amount, but only once if they're duplicated. So from this example I'd want to select the amount for A, B and C only once. The SQL result should then look like this:
Use LAG() function and compare previous amount with current row amount for name.
-- MySQL (v8.0)
SELECT t.name
, CASE WHEN t.amount = t.prev_val THEN '' ELSE amount END amount
FROM (SELECT *
, LAG(amount) OVER (PARTITION BY name ORDER BY name) prev_val
FROM test) t
See the fiddle at https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8c1af9afcadf2849a85ad045df7ed580
You can handle situations like these with different functions, depending on what you need:
Case 1: You have the same value for every row of a name:
select distinct name, amount from [table name]
Case 2: You have duplicates with different values for each name and you want to pick the one with the highest value. Use min() if you need the minimum one to show up.
select name, max(amount) from [table name] group by 1
Case 3: The one you need, with blanks for the rest of the duplicates.
ROW_NUMBER() numbers the rows within each name in order of amount. Since the values are the same, the numbers simply increase, and you can then use IF to create a new column that is blank wherever rank_ > 1. This also covers the case where you would like to select just the minimum value and have blanks for the rest of the rows for that name.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
Case 4: You need to select the maximum and leave the other rows for the name blank.
Just change the ORDER BY clause of ROW_NUMBER() to DESC. This gives rank 1 to the highest amount per name, and the rest are filled with blanks.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT DESC) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
If you are using mysql 8 you can use row_number for this:
with x as (
select *, row_number() over(partition by name order by amount) rn
from t
)
select name, case when rn=1 then amount else '' end amount
from x
See example Fiddle
The other answers are missing a really important point: a SQL query returns an unordered set unless there is an explicit order by.
The data that you have provided has rows that are exact duplicates. For this reason, I think the best approach uses row_number() and an order by in the outer query:
select name, (case when seqnum = 1 then amount end) as amount
from (select t.*,
row_number() over (partition by name, amount) as seqnum
from t
) t
order by name, seqnum;
Note that MySQL does not require an order by argument for row_number().
More commonly, though, you would have some other column (say a date or id) that would be used for ordering. I should also emphasize that this type of formatting is often handled at the application layer and not in the database.
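As a rough check of the row_number() plus outer order by approach, here is a sketch with invented rows, run through Python's built-in sqlite3 module (SQLite 3.25 or later, which also accepts row_number() without an order by argument). The table t(name, amount) and its contents are assumptions, since the sample data is missing from the post.

```python
import sqlite3

def make_t():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
    # Invented rows containing exact duplicates.
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("A", 10), ("A", 10), ("B", 20), ("C", 5), ("C", 5), ("C", 5)],
    )
    return conn

def amount_once_per_name(conn):
    """Show the amount on the first row of each (name, amount) and NULL after."""
    return conn.execute(
        """
        SELECT name, (CASE WHEN seqnum = 1 THEN amount END) AS amount
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY name, amount) AS seqnum
            FROM t
        ) AS s
        ORDER BY name, seqnum
        """
    ).fetchall()

for row in amount_once_per_name(make_t()):
    print(row)
```

The repeated rows come back with NULL (None in Python) rather than an empty string, because the CASE expression has no ELSE branch.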
I have a subquery that aggregates some UNION ALL selects. Over that I prepare the SELECT to create cross-tab and limit it to let's say 20. I would like to be able to retrieve the total COUNT of sub query results before I am limiting them in main query. This is for the purpose of trying to build a pagination that receives the total number of records and then the specific page record grid.
Sample query:
SELECT
    name,
    sumIf(metric_value, metric_name = 'data') AS data,
    sumif(....
FROM
    (SELECT
        name, metric_name, SUM(metric_value) as metric_value
    FROM
        (SELECT
            name, 'data' AS metric_name, SUM(data) AS metric_value
        FROM
            table
        WHERE
            date > '2017-01-01 00:00:00'
        GROUP BY
            name
        UNION ALL
        SELECT
            name, 'data' AS metric_name, SUM(data) AS metric_value
        FROM
            table2
        WHERE
            date > '2017-01-01 00:00:00'
        GROUP BY
            name
        UNION ALL
        SELECT
            name, 'data' AS metric_name, SUM(data) AS metric_value
        FROM
            table3
        WHERE
            date > '2017-01-01 00:00:00'
        GROUP BY
            name
        UNION ALL
        .
        .
        .)
    GROUP BY
        name, metric_name)
GROUP BY
    name
ORDER BY
    name ASC
LIMIT 0,20;
The first subselect returns tons of data, so I thought I could count it and return the count as one column value or row, which would propagate to the main select that limits the output to 20 results. I need to know the size of the entire result set but don't want to run the same query twice, once without the limit and once with it, just to get the COUNT. There are at least 12 UNION ALL third-level subselects, so why waste resources? I am looking for generic SQL solutions, not necessarily related to ClickHouse.
I was thinking of using count(*) OVER (), but that is not supported, so if that's the only option I know I need to run the query twice.
The first thing to mention is that usually nobody is interested in the exact number of pages of a query result. It can easily be estimated, and almost no one will care how exact the estimate is. However, if you have a link to the last page in your GUI, people will often click it just to see whether it works.
Nevertheless, there are cases where an analyst has to visit all the pages, and then the GUI should display the exact amount of work. The good news is that in this latter case a better strategy is to cache a snapshot of the whole results table, and then counting the rows in that table is no longer a problem.
In other words, it makes sense to discuss with the customers whether they really need it, because unneeded full scans many times per day can affect the database load and the bill.
Anyway, if you still need to estimate the number of rows, you can simplify the query so that it only counts rows. As I understand it, this is something like:
SELECT SUM(cnt) as row_count
FROM (
SELECT COUNT(DISTINCT name) as cnt FROM table1 WHERE date > ...
UNION ALL
SELECT COUNT(DISTINCT name) as cnt FROM table2 WHERE date > ...
...
) as counts;
or if data is a constant metric name
SELECT COUNT(DISTINCT name) as row_count
FROM (
SELECT DISTINCT name FROM table1 WHERE date > ...
UNION ALL
SELECT DISTINCT name FROM table2 WHERE date > ...
...
) as names;
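To compare the two counting queries, here is a small sketch using Python's built-in sqlite3 module with two invented tables. The point it illustrates is an assumption about the data: if the same name can appear in more than one table, the per-table sum counts it once per table, whereas the paginated query groups by name and returns it once.

```python
import sqlite3

SINCE = "2017-01-01 00:00:00"

def make_db():
    conn = sqlite3.connect(":memory:")
    for tbl in ("table1", "table2"):  # stand-ins for the real source tables
        conn.execute(f"CREATE TABLE {tbl} (name TEXT, data INTEGER, date TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [("a", 1, "2017-02-01"), ("a", 2, "2017-03-01"), ("b", 5, "2017-02-01")],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?, ?)",
        [("b", 7, "2017-02-01"), ("c", 3, "2017-02-01"), ("c", 4, "2016-12-31")],
    )
    return conn

def per_table_sum(conn):
    """Sum of per-table distinct name counts (name 'b' is counted twice)."""
    return conn.execute(
        """
        SELECT SUM(cnt) FROM (
            SELECT COUNT(DISTINCT name) AS cnt FROM table1 WHERE date > :since
            UNION ALL
            SELECT COUNT(DISTINCT name) AS cnt FROM table2 WHERE date > :since
        ) AS counts
        """,
        {"since": SINCE},
    ).fetchone()[0]

def union_distinct(conn):
    """Distinct names across all tables, i.e. rows in the unpaginated result."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT name) FROM (
            SELECT DISTINCT name FROM table1 WHERE date > :since
            UNION ALL
            SELECT DISTINCT name FROM table2 WHERE date > :since
        ) AS names
        """,
        {"since": SINCE},
    ).fetchone()[0]

def page(conn, limit, offset):
    """One page of the per-name totals, for comparison with the counts."""
    return conn.execute(
        """
        SELECT name, SUM(data) AS data FROM (
            SELECT name, data FROM table1 WHERE date > :since
            UNION ALL
            SELECT name, data FROM table2 WHERE date > :since
        ) AS u
        GROUP BY name ORDER BY name LIMIT :limit OFFSET :offset
        """,
        {"since": SINCE, "limit": limit, "offset": offset},
    ).fetchall()

conn = make_db()
print(per_table_sum(conn), union_distinct(conn), page(conn, 2, 0))
```

If names never overlap between tables, the two counts agree and the cheaper per-table sum is enough.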
I have a voting application that writes values to a MySQL db table. It is a preference/weighted voting system, so people choose a first option, second option, and third option. These all go into separate fields in the table. I'm looking for a way to write a query that will assign numerical values to the responses (3 for a first response, 2 for a second, 1 for a third) and then display each option with its summed score. I've been able to do this for total number of votes
select count(name) as votes,name
from (select 1st_option as name from votes
union all
select 2nd_option from votes
union all
select 3rd_option from votes) as tbl
group by name
having count(name) > 0
order by 1 desc;
but haven't quite figured out how to assign values to response in each column and then pull them together. Any help is much appreciated. Thanks!
You could do something like this:
select sum(score) as votes,name
from (select 1st_option as name, 3 as score from votes
union all
select 2nd_option as name, 2 as score from votes
union all
select 3rd_option as name, 1 as score from votes) as tbl
group by name;
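A quick way to check the weighting is to run it on a few invented ballots. The sketch below uses Python's built-in sqlite3 module, so the column names are double-quoted because they start with a digit (MySQL would use backticks or accept them bare). The ORDER BY is my own addition.

```python
import sqlite3

def make_votes():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE votes ("1st_option" TEXT, "2nd_option" TEXT, "3rd_option" TEXT)'
    )
    # Three invented ballots.
    conn.executemany(
        "INSERT INTO votes VALUES (?, ?, ?)",
        [("A", "B", "C"), ("B", "A", "C"), ("A", "C", "B")],
    )
    return conn

def weighted_scores(conn):
    """3 points per first choice, 2 per second, 1 per third, summed per option."""
    return conn.execute(
        """
        SELECT name, SUM(score) AS total
        FROM (
            SELECT "1st_option" AS name, 3 AS score FROM votes
            UNION ALL
            SELECT "2nd_option" AS name, 2 AS score FROM votes
            UNION ALL
            SELECT "3rd_option" AS name, 1 AS score FROM votes
        ) AS tbl
        GROUP BY name
        ORDER BY total DESC, name
        """
    ).fetchall()

print(weighted_scores(make_votes()))
```

Here A scores 3 + 2 + 3 = 8, B scores 2 + 3 + 1 = 6 and C scores 1 + 1 + 2 = 4.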