I'm becoming frustrated with a curious limitation of SQL - its apparent inability to relate one record to another outside of aggregate functions. My problem is summarized thusly.
I have a table, already sorted. I need to find its maximum values (note the plural!) and minimum values. No, I am not looking for a single maximum or single minimum. More specifically I'm trying to generate a list of the local peaks of a numeric sequence. A rough description of an algorithm to generate this is:
WHILE NOT END_OF_TABLE
    IF RECORD != FIRST_RECORD AND RECORD != LAST_RECORD THEN
        IF ((RECORD(Field) < RECORD_PREVIOUS(Field) AND RECORD(Field) < RECORD_NEXT(Field)) OR
            (RECORD(Field) > RECORD_PREVIOUS(Field) AND RECORD(Field) > RECORD_NEXT(Field))) THEN
            ADD_RESULT RECORD
        END IF
    END IF
END WHILE
See the problem? I need a query in which each record is compared against the previous and next records' values. Can this even be accomplished in standard SQL?
Your frustration is shared by many. While SQL is great for working with general sets, it's terribly deficient when working with ordered sets (whether the order is physical in the table or an implicit or explicit logical order is irrelevant). There are some things that can help (for example, the rank() and row_number() functions), but the solutions can differ across RDBMSs.
If you can be specific about which platform you're working with, I or someone else can provide a more detailed answer.
You have to self-join twice and generate a rownumber without gaps:
In T-SQL:
WITH ordered AS (
SELECT ROW_NUMBER() OVER (ORDER BY your_sort_order) AS RowNumber
,* -- other columns here
FROM your_table -- placeholder for the table being analysed
)
SELECT *
FROM ordered
LEFT JOIN ordered AS prev
ON prev.RowNumber = ordered.RowNumber - 1
LEFT JOIN ordered AS next
ON next.RowNumber = ordered.RowNumber + 1
WHERE -- here you put in your local min/local max and end-point handling logic - end points will have NULL in next/prev
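For what it's worth, on engines with window functions the "previous" and "next" values can also be fetched with LAG and LEAD instead of the two self-joins. This is a swapped-in technique, not the query above. Below is a minimal sketch that runs the SQL through Python's sqlite3 module so it can be tried without a server; the table readings and its columns id and val are made up for illustration, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany(
    "INSERT INTO readings (id, val) VALUES (?, ?)",
    list(enumerate([1, 3, 2, 5, 4, 4, 6], start=1)),
)

LOCAL_EXTREMA_SQL = """
SELECT id, val,
       CASE WHEN val > prev_val THEN 'max' ELSE 'min' END AS kind
FROM (
    SELECT id, val,
           LAG(val)  OVER (ORDER BY id) AS prev_val,
           LEAD(val) OVER (ORDER BY id) AS next_val
    FROM readings
) AS w
-- The first and last rows have a NULL neighbour, so their comparisons
-- evaluate to NULL and those rows drop out (the pseudocode's end-point rule).
WHERE (val > prev_val AND val > next_val)
   OR (val < prev_val AND val < next_val)
ORDER BY id
"""

local_extrema = conn.execute(LOCAL_EXTREMA_SQL).fetchall()
print(local_extrema)
```

The strict comparisons mean the plateau 4, 4 in the sample data is reported as neither a peak nor a trough.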
Yes. You need a self join - but without a database schema, it's hard to be specific about the solution.
Specifically, I'm wondering about the "ordering" thing you mention - but I'm going to assume there's an "ID" field we can use for this.
(Oh, and I'm using old-school join syntax, coz I'm a dinosaur).
select *
from myTable main,
myTable previous,
myTable next
where previous.id = main.id - 1
and next.id = main.id + 1
and previous.record > main.record
and next.record < main.record
(I think I've interpreted your requirement correctly in the greater/less than clauses, but adjust to taste).
SELECT
current.RowID,
current.Value,
CASE WHEN
(
(current.Value < COALESCE(previous.Value, current.Value + 1))
AND
(current.Value < COALESCE(subsequent.Value, current.Value + 1))
)
THEN
'Minima'
ELSE
'Maxima'
END
FROM
myTable current
LEFT JOIN
myTable previous
ON previous.RowID = (SELECT MAX(RowID) FROM myTable WHERE RowID < current.ROWID)
LEFT JOIN
myTable subsequent
ON subsequent.RowID = (SELECT MIN(RowID) FROM myTable WHERE RowID > current.ROWID)
WHERE
(
(current.Value < COALESCE(previous.Value, current.Value + 1))
AND
(current.Value < COALESCE(subsequent.Value, current.Value + 1))
)
OR
(
(current.Value > COALESCE(previous.Value, current.Value - 1))
AND
(current.Value > COALESCE(subsequent.Value, current.Value - 1))
)
Note: The < and > logic is copied from you, but does not cater for local maxima/minima that are equal across one or more consecutive records.
Note: I've created a fictional RowID to join the records in order; all that is important is that the joins get the "previous" and "subsequent" records.
Note: The LEFT JOINs and COALESCE expressions cause the first and last values to always be counted as a maximum or minimum.
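To see the notes above in action, here is a small sketch that runs essentially the same query against SQLite from Python. The table samples, its columns seq and val, and the aliases cur and nxt are stand-ins I picked for illustration (seq plays the role of RowID and deliberately has gaps).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (seq INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany(
    "INSERT INTO samples (seq, val) VALUES (?, ?)",
    [(10, 5), (20, 7), (30, 9), (40, 2), (50, 7)],
)

NEIGHBOUR_SQL = """
SELECT cur.seq,
       CASE WHEN cur.val < COALESCE(prev.val, cur.val + 1)
             AND cur.val < COALESCE(nxt.val,  cur.val + 1)
            THEN 'Minima' ELSE 'Maxima' END AS kind
FROM samples AS cur
LEFT JOIN samples AS prev
       ON prev.seq = (SELECT MAX(s.seq) FROM samples AS s WHERE s.seq < cur.seq)
LEFT JOIN samples AS nxt
       ON nxt.seq = (SELECT MIN(s.seq) FROM samples AS s WHERE s.seq > cur.seq)
WHERE (cur.val < COALESCE(prev.val, cur.val + 1)
       AND cur.val < COALESCE(nxt.val, cur.val + 1))
   OR (cur.val > COALESCE(prev.val, cur.val - 1)
       AND cur.val > COALESCE(nxt.val, cur.val - 1))
ORDER BY cur.seq
"""

labelled = conn.execute(NEIGHBOUR_SQL).fetchall()
print(labelled)
```

Row 20 (value 7, sitting between 5 and 9) is the only one filtered out; both end points are reported because COALESCE substitutes a value that makes the missing neighbour's comparison succeed.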
I'm stuck on a MySQL problem that I have not been able to find a solution for yet. I have the following query that gives me the month-year and the number of new users for each period on my platform:
select
u.period ,
u.count_new as new_users
from
(select DATE_FORMAT(u.registration_date,'%Y-%m') as period, count(distinct u.id) as count_new from users u group by DATE_FORMAT(u.registration_date,'%Y-%m')) u
order by period desc;
The result is the table:
period,new_users
2016-10,103699
2016-09,149001
2016-08,169841
2016-07,150672
2016-06,148920
2016-05,160206
2016-04,147715
2016-03,173394
2016-02,157743
2016-01,173013
So, for each month-year I need to calculate the difference between that period and the previous month-year. I need a result table like this:
period,new_users
2016-10,calculate(103699 - 149001)
2016-09,calculate(149001- 169841)
2016-08,calculate(169841- 150672)
2016-07,So on...
2016-06,...
2016-05,...
2016-04,...
2016-03,...
2016-02,...
2016-01,...
Any ideas? =/
Thanks
You should be able to use a similar approach as I posted in another S/O question. You are on a good track to start. You have your inner query get the counts and have it ordered in the final direction you need. By using inline mysql variables, you can have a holding column of the previous record's value, then use that as computation base for the next result, then set the variable to the new balance to be used for each subsequent cycle.
The JOIN to the SqlVars alias does not have any "ON" condition as the SqlVars would only return a single row anyhow and would not result in any Cartesian product.
select
u.period,
if( @prevCount = -1, 0, u.count_new - @prevCount ) as new_users,
@prevCount := u.count_new as HoldColumnForNextCycle
from
( select
DATE_FORMAT(u.registration_date,'%Y-%m') as period,
count(distinct u.id) as count_new
from
users u
group by
DATE_FORMAT(u.registration_date,'%Y-%m') ) u
JOIN ( select @prevCount := -1 ) as SqlVars
order by
u.period desc;
You may have to play with it a little, as there is no "starting" point in the counts, so the first entry in either sorted direction may look strange. I am starting the "@prevCount" variable at -1, so the first record processed gets a new user count of 0 in the "new_users" column. THEN, whatever the distinct new user count was for that record, I assign it back to @prevCount as the basis for all subsequent records being processed. Yes, it is an extra column in the result set that can be ignored, but it is needed. Again, it is just a per-line place-holder, and you can see in the result how it gets its value as each line progresses. Note that each row is compared with the row processed just before it; with the descending sort shown that is the following month, so sort ascending if you need the difference from the preceding month.
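If window functions are available (MySQL 8.0 and later have LAG), the same month-over-month difference can be written without user variables. That is a different technique from the one above, so treat this as a sketch only. It runs through Python's sqlite3 so it is easy to try; the table monthly is a made-up stand-in for the grouped inner query, loaded with four of the rows from the question, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "monthly" is a hypothetical stand-in for the grouped subquery in the question.
conn.execute("CREATE TABLE monthly (period TEXT PRIMARY KEY, count_new INTEGER)")
conn.executemany(
    "INSERT INTO monthly (period, count_new) VALUES (?, ?)",
    [("2016-10", 103699), ("2016-09", 149001),
     ("2016-08", 169841), ("2016-07", 150672)],
)

DIFF_SQL = """
SELECT period,
       -- LAG looks one row back in ascending period order, i.e. the previous
       -- month present in the data, regardless of the final display order.
       count_new - LAG(count_new) OVER (ORDER BY period) AS diff_vs_prev
FROM monthly
ORDER BY period DESC
"""

diffs = conn.execute(DIFF_SQL).fetchall()
for period, diff in diffs:
    print(period, diff)
```

The oldest period has nothing to compare against, so its difference comes back as NULL (None in Python). If a month is missing from the data, LAG compares against the nearest earlier month that does exist.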
I would create a temp table with two columns and then fill it using a cursor that
does something like this (don't remember the exact syntax - so this is just a pseudo-code):
@val = CURSOR.col2 - (select col2 from OriginalTable t2 where t2.Period = (CURSOR.Period-1))
INSERT tmpTable (Period, NewUsers) Values ( CURSOR.Period, @val)
I have created an application to track progress in League of Legends for me and my friends. For this purpose, I collect information about the current rank several times a day into my MySQL database. To fetch the results and show them in the graph, I use the following queries:
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
WHERE
lol_summoner.name IN (". str_repeat('?, ', count($names) - 1) ."?)
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
The first query is used when I want to retrieve all players saved in the database. The grid table is a temporary table holding timestamps generated at a specific interval, so that the information is retrieved in chunks of this interval. The two parameters in this query are both the interval. The second query is used if I want to retrieve information for specific players only.
The grid table is produced by the following stored procedure, which is called with three parameters (n_first - first timestamp, n_last - last timestamp, n_increment - increment between two timestamps):
BEGIN
-- Create tmp table
DROP TEMPORARY TABLE IF EXISTS series_tmp;
CREATE TEMPORARY TABLE series_tmp (
series bigint
) engine = memory;
WHILE n_first <= n_last DO
-- Insert in tmp table
INSERT INTO series_tmp (series) VALUES (n_first);
-- Increment value by n_increment
SET n_first = n_first + n_increment;
END WHILE;
END
The query works and finishes in reasonable time (~10 seconds) but I am thankful for any help to improve the query by either rewriting it or adding additional indexes to the database.
/Edit:
After reviewing @Rick James's answer, I modified the queries as follows:
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
WHERE lol_summoner.name IN (<NAMES>)
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
This improves the query execution time by a really good margin (it now finishes in well under 1 s).
Problem 1 and Solution
You need a series of integers between two values? And they differ by 1? Or by some larger value?
First, create a permanent table of the numbers from 0 to some large enough value:
CREATE TABLE Num10 ( n INT );
INSERT INTO Num10 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE Nums ( n INT, PRIMARY KEY(n))
SELECT a.n*1000 + b.n*100 + c.n*10 + d.n AS n
FROM Num10 AS a
JOIN Num10 AS b -- no ON clause, so this is a cross join
JOIN Num10 AS c
JOIN Num10 AS d;
Now Nums has 0..9999. (Make it bigger if you might need more.)
To get a sequence of consecutive numbers from 123 through 234:
SELECT 123 + n FROM Nums WHERE n < 234-123+1;
To get a sequence of consecutive numbers from 12345 through 23456, in steps of 15:
SELECT 12345 + 15*n FROM Nums WHERE n < (23456-12345+1)/15;
JOIN to a SELECT like one of those instead of to series_tmp.
Barring other issues, that should significantly speed things up.
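Here is a rough, runnable version of the numbers-table idea, using SQLite through Python so it can be tried without a MySQL server. The helper name series is my own, and the filter is written as first + step*n <= last rather than the division above, because SQLite's / truncates integers where MySQL's does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Num10 (n INTEGER);
INSERT INTO Num10 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE Nums (n INTEGER PRIMARY KEY);
-- Four-way cross join of the digits 0..9 yields 0..9999.
INSERT INTO Nums (n)
SELECT a.n*1000 + b.n*100 + c.n*10 + d.n
FROM Num10 AS a
CROSS JOIN Num10 AS b
CROSS JOIN Num10 AS c
CROSS JOIN Num10 AS d;
""")


def series(conn, first, last, step=1):
    """Return first, first+step, ... up to and including last (hypothetical helper)."""
    rows = conn.execute(
        "SELECT :first + :step * n FROM Nums "
        "WHERE :first + :step * n <= :last ORDER BY n",
        {"first": first, "last": last, "step": step},
    )
    return [row[0] for row in rows]


consecutive = series(conn, 123, 234)
stepped = series(conn, 12345, 23456, 15)
print(len(consecutive), consecutive[0], consecutive[-1])
print(len(stepped), stepped[0], stepped[-1])
```

With only 10,000 rows in Nums the helper silently stops early for longer series, so size the table for the largest range you expect.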
Problem 2
You are GROUPing BY series, but ORDERing by timestamp. They are related, so you might get the 'right' answer. But think about it.
Problem 3
You seem to be building "buckets" (called "series"?) from "timestamps". Is this correct? If so, let's work backwards -- Turn a "timestamp" into a "bucket" number:
bucket_number = FLOOR((timestamp - start) / bucket_size)
By doing that throughout, you can avoid 'Problem 1' and eliminate my solution to it. That is, reformulate the entire queries in terms of buckets.
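A minimal sketch of that bucket reformulation, run against SQLite from Python. For brevity it assumes a single lol table that carries the summoner name directly (the real schema joins to lol_summoner), takes start as 0, and uses made-up sample rows. In MySQL the integer division would be written with DIV; in SQLite / already truncates for integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: the name is stored inline instead of joined.
conn.execute("CREATE TABLE lol (summoner TEXT, timestamp INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO lol (summoner, timestamp, points) VALUES (?, ?, ?)",
    [("a", 10, 50), ("a", 60, 70), ("a", 120, 0), ("a", 150, 90), ("b", 30, 40)],
)

BUCKET_SQL = """
SELECT summoner,
       (timestamp / :range) * :range + :half_range AS bucket_mid,
       AVG(NULLIF(points, 0)) AS points   -- zero points are ignored, not averaged
FROM lol
GROUP BY summoner, timestamp / :range
ORDER BY summoner, bucket_mid
"""

buckets = conn.execute(BUCKET_SQL, {"range": 100, "half_range": 50}).fetchall()
print(buckets)
```

Each timestamp maps straight to its bucket, so no grid table and no range join are needed; buckets with no rows simply do not appear in the result.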
I'm getting ORA-01427 ("single-row subquery returns more than one row") while executing the procedure below. I believe the issue is in this subquery:
SELECT paymentterm FROM temp_pay_term WHERE pid = d.xProject_id
But how can I get rid of it? I have now added the complete code. Please check it and let me know what is wrong, and tell me if more information is needed.
CREATE OR REPLACE PROCEDURE paytermupdate IS
recordcount INT;
vardid NUMBER(38);
varpaymentterm VARCHAR2(200 CHAR);
BEGIN
recordcount := 0;
SELECT COUNT(1) INTO recordcount
FROM temp_pay_term;
IF recordcount > 0 THEN
FOR x IN (SELECT DISTINCT r.ddocname
FROM temp_pay_term p, docmeta d, revisions r
WHERE TO_CHAR(p.pid) = d.xproject_id AND r.did = d.did )
LOOP
SELECT MAX(did) INTO vardid
FROM revisions r
WHERE r.ddocname = x.ddocname
GROUP BY r.ddocname;
UPDATE docmeta d
SET paymentterm = (
SELECT paymentterm
FROM temp_pay_term
WHERE pid = d.xproject_id
)
WHERE d.did = vardid;
INSERT INTO documenthistory (dactionmillis, dactiondate, did, drevclassid,
duser, ddocname, daction, dsecuritygroup, paymentterm)
SELECT
to_number(TO_CHAR(systimestamp, 'FF')) AS dactionmillis,
TRUNC(systimestamp, 'dd') AS dactiondate,
did,
drevclassid,
'sysadmin' AS duser,
ddocname,
'Update' AS daction,
dsecuritygroup,
paymentterm
FROM revisions
WHERE did = vardid;
END LOOP;
COMMIT;
END IF;
END paytermupdate;
Do you use something like
select x,y,z, (subquery) from ?
If you are getting ORA-01427 you should think how to make filter conditions in your subquery more restrictive, and these restrictions should be business reasonable, not just simply "and rownum <=1".
As you want to update a record through that sub query you should put more filter conditions in it. You can decide on the filter conditions on the basis of the value you want to update in the table in outer query. If there are more values which satisfy the condition (which I do not believe is ideal but just in case) then rownum <=1 would suffice.
Two basic options come to mind. I'll start with the simplest.
First, add distinct to the subquery.
SET paymentterm =
(SELECT distinct paymentterm
FROM temp_pay_term
WHERE pid = d.xProject_id
)
Second, if you're receiving multiple distinct values from the subquery, then you will either have to (a) rework your script to not use a subquery or (b) limit values returned (as @Baljeet suggested) using more filter criteria or (c) pick which of the multiple distinct values you want using an aggregate function.
Using the aggregate method, I'm guessing PaymentTerm is a number of months or years? Even if it's a n/varchar field (i.e., "6 months"), you can still use the MIN() and MAX() aggregates (or at least you can in t-sql). If it's a numeric field, you could also use average. You'll have to figure out which works best for your business needs.
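Before choosing between those options it may help to see which pid values are actually the problem. The diagnostic below is plain GROUP BY / HAVING and should carry over to Oracle as written; it is run through SQLite from Python here only so it can be tried quickly, and the sample rows are invented. Note that SQLite itself does not raise ORA-01427; the point is just the diagnostic query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_pay_term (pid INTEGER, paymentterm TEXT)")
conn.executemany(
    "INSERT INTO temp_pay_term (pid, paymentterm) VALUES (?, ?)",
    [(1, "30 days"), (1, "30 days"), (2, "30 days"), (2, "60 days"), (3, "90 days")],
)

DUPLICATES_SQL = """
SELECT pid,
       COUNT(*)                    AS row_count,
       COUNT(DISTINCT paymentterm) AS distinct_terms
FROM temp_pay_term
GROUP BY pid
HAVING COUNT(*) > 1        -- only pids that would trip a single-row subquery
ORDER BY pid
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
for pid, row_count, distinct_terms in duplicates:
    print(pid, row_count, distinct_terms)
```

A pid with distinct_terms = 1 (pid 1 here) is cured by adding DISTINCT; a pid with distinct_terms > 1 (pid 2) needs extra filter criteria or an aggregate such as MIN() or MAX().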
I have a MySQL table with an 'Order' field, but when a record gets deleted a gap appears.
How can I update my 'Order' field sequentially?
If possible, in one query.
id.........order
1...........1
5...........2
4...........4
3...........6
5...........8
to
id.........order
1...........1
5...........2
4...........3
3...........4
5...........5
I could do this record by record:
get a SELECT ordered by Order and change the Order field row by row,
but to be honest I don't like it.
thanks
Extra info :
I also would like to change it this way :
id.........order
1...........1
5...........2
4...........3
3...........3.5
5...........4
to
id.........order
1...........1
5...........2
4...........3
3...........4
5...........5
In MySQL you can do this:
update t join
(select t.*, (@rn := @rn + 1) as rn
from t cross join
(select @rn := 0) const
order by t.`order`
) torder
on t.id = torder.id
set `order` = torder.rn;
In most databases, you can also do this with a correlated subquery. But this might be a problem in MySQL because it doesn't allow the table being updated to appear in a subquery:
update t
set `order` = (select count(*)
from t t2
where t2.`order` < t.`order` or
(t2.`order` = t.`order` and t2.id <= t.id)
);
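One way around that restriction, sketched here as an assumption rather than something from the answer above, is to compute the new numbering into a temporary table first and then update from it. The rank is the same COUNT(*) logic as the correlated subquery. The example runs on SQLite through Python; the table items, the column ord (standing in for `order`) and the temp table renum are all made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, ord INTEGER)")
conn.executemany(
    "INSERT INTO items (id, ord) VALUES (?, ?)",
    [(1, 1), (5, 2), (4, 4), (3, 6), (6, 8)],
)

conn.executescript("""
-- Step 1: work out each row's rank (rows ordered by ord, ties broken by id).
CREATE TEMP TABLE renum AS
SELECT a.id, COUNT(*) AS rn
FROM items AS a
JOIN items AS b
  ON b.ord < a.ord OR (b.ord = a.ord AND b.id <= a.id)
GROUP BY a.id;

-- Step 2: copy the ranks back; the subquery only reads the temp table.
UPDATE items
SET ord = (SELECT rn FROM renum WHERE renum.id = items.id);

DROP TABLE renum;
""")

renumbered = conn.execute("SELECT id, ord FROM items ORDER BY ord").fetchall()
print(renumbered)
```

The self-join in step 1 is quadratic in the number of rows, so for a large table a window function such as ROW_NUMBER() is likely the better way to fill the temp table where it is available.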
There is no need to re-number or re-order. The table just gives you all your data. If you need it presented a certain way, that is the job of a query.
You don't even need to change the order value in the query either, just do:
SELECT * FROM MyTable WHERE mycolumn = 'MyCondition' ORDER BY `order`;
The above answer is excellent but it took me a while to grok it so I offer a slight rewrite which I hope brings clarity to others faster:
update
originalTable
join (select originalTable.ID,
(@newValue := @newValue + 10) as newValue
from originalTable
cross join (select @newValue := 0) newTable
order by originalTable.Sequence)
originalTable_reordered
on originalTable.ID = originalTable_reordered.ID
set originalTable.Sequence = originalTable_reordered.newValue;
Note that originalTable.* is NOT required - only the field used for the final join.
My example assumes the field to be updated is called Sequence (perhaps clearer in intent than order, but mainly it sidesteps the reserved keyword issue).
What took me a while to get was that "const" in the original answer is not a MySQL keyword; it is just a table alias. (I'm never a fan of abbreviations for that reason: they can be interpreted many ways, especially at the very times when it is best they not be misinterpreted. It makes for verbose code, I know, but clarity always trumps convenience in my book.)
The select @newValue := 0 is there to initialize the variable to 0 before the outer select starts incrementing it.
The value of this update is, of course, that it is an atomic update to all the rows in question, rather than pulling the data and updating single rows one by one programmatically.
My next question, which should not be difficult to ascertain (though I've learned that SQL can be a tricky beast at the best of times), is whether this can be safely done on a subset of data (where some originalTable.parentID is a set value).
Let's assume I have the following tables:
items table
item_id|view_count
item_views table
view_id|item_id|ip_address|last_view
What I would like to do is:
If last view of item with given item_id by given ip_address was 1+ hour ago I would like to increment view_count of item in items table. And as a result get the view count of item. How I will do it normally:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge it into one SQL query? And how can it affect the processing time? Will it be faster or slower than previous method?
I don't think your logic is correct for what you describe that you want. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
is counting the number of views longer ago than your time frame. I think you want:
last_view > current_time-60*60
and then have if q = 0 on the next line.
MySQL is pretty good with the performance of not exists, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on items(item_id).
In MySQL scripting, you could then write:
. . .
set view_count = (@q := view_count+1)
. . .
This would also give you the variable you are looking for.
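For anyone who wants to try the NOT EXISTS update end to end, here is a small sketch using SQLite from Python. It assumes last_view is stored as a Unix timestamp in seconds (the question does not say), the function name bump_view_count is mine, and it deliberately leaves out recording the new view in item_views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, view_count INTEGER);
CREATE TABLE item_views (view_id INTEGER PRIMARY KEY, item_id INTEGER,
                         ip_address TEXT, last_view INTEGER);
""")

NOW = 1_000_000  # fixed "current time" in seconds, so the example is repeatable
conn.execute("INSERT INTO items VALUES (1, 10)")
conn.executemany(
    "INSERT INTO item_views VALUES (?, ?, ?, ?)",
    [(1, 1, "10.0.0.1", NOW - 7200),   # last seen two hours ago
     (2, 1, "10.0.0.2", NOW - 60)],    # last seen a minute ago
)


def bump_view_count(conn, item_id, ip_address, now):
    """Increment view_count unless this IP viewed the item within the last hour."""
    conn.execute(
        """
        UPDATE items
        SET view_count = view_count + 1
        WHERE item_id = :item_id
          AND NOT EXISTS (SELECT 1
                          FROM item_views AS v
                          WHERE v.item_id = :item_id
                            AND v.ip_address = :ip
                            AND v.last_view > :now - 3600)
        """,
        {"item_id": item_id, "ip": ip_address, "now": now},
    )
    row = conn.execute(
        "SELECT view_count FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


after_old_visitor = bump_view_count(conn, 1, "10.0.0.1", NOW)
after_recent_visitor = bump_view_count(conn, 1, "10.0.0.2", NOW)
after_new_visitor = bump_view_count(conn, 1, "10.0.0.3", NOW)
print(after_old_visitor, after_recent_visitor, after_new_visitor)
```

That is still two statements per call (the update and the read-back), which is the portable shape; the user-variable trick above is MySQL-specific.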
update target
set target.view_count = target.view_count + 1
from items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id;
You can only combine the update statement with the condition using a join as in the above example; but you'll still need a separate select statement.
It may be slower on a very large set and/or an unindexed table.