Partitioning SQL query by arbitrary number of rows - mysql

I have a SQL table with periodic measurements. I'd like to be able to return some summary method (say SUM) over the value column, for an arbitrary number of rows at a time. So if I had
| id | reading |
|----|---------|
| 1 | 10 |
| 5 | 14 |
| 7 | 10 |
| 11 | 12 |
| 13 | 18 |
| 14 | 16 |
I could sum over 2 rows at a time, getting (24, 22, 34), or I could sum 3 rows at a time and get (34, 46), if that makes sense. Note that the ID might not be contiguous -- I just want to operate by row count, in sort order.
In the real world, the identifier is a timestamp, but I figure that (maybe after applying a unix_timestamp() call) anything that works for the simple case above should be applicable. If it matters, I'm trying to gracefully scale the number of results returned for a plot query -- maybe there's a smarter way to do this? I'd like the solution to be general, and not impose a particular storage mechanism/schema on the data.

You can resequence the query result and then group it:
SET @seq = 0;
SELECT SUM(data), MIN(ts) FROM (
SELECT @seq := @seq + 1 AS seq, data, ts FROM `table` ORDER BY ts LIMIT 50
) AS tmp GROUP BY FLOOR((tmp.seq - 1) / 3); -- seq starts at 1, so shift it to get full groups of 3
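As a rough cross-check of the grouping arithmetic, here is a minimal Python sketch that uses the standard sqlite3 module rather than MySQL. The `readings` table and the `bucket_sums` helper are hypothetical names chosen for illustration. The zero-based sequence number comes from a correlated COUNT(*) subquery instead of a user variable, which assumes the sort key (`id`) is unique.

```python
import sqlite3

# Sample data from the question: (id, reading); the ids are not contiguous.
SAMPLE = [(1, 10), (5, 14), (7, 10), (11, 12), (13, 18), (14, 16)]

def bucket_sums(rows, rows_per_bucket):
    """Sum `reading` over consecutive groups of `rows_per_bucket` rows, ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, reading INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    sql = """
        SELECT SUM(reading)
        FROM (
            -- seq is the zero-based position of each row in id order
            SELECT r1.reading AS reading,
                   (SELECT COUNT(*) FROM readings AS r2 WHERE r2.id < r1.id) AS seq
            FROM readings AS r1
        )
        GROUP BY seq / ?      -- integer division puts N consecutive rows in one bucket
        ORDER BY MIN(seq)
    """
    result = [total for (total,) in conn.execute(sql, (rows_per_bucket,))]
    conn.close()
    return result
```

With the sample data, `bucket_sums(SAMPLE, 2)` gives `[24, 22, 34]` and `bucket_sums(SAMPLE, 3)` gives `[34, 46]`, matching the sums in the question.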

How to keep the newest rows only in a WITH RECURSIVE statement effectively?

I have some initial rows in a table. I would like to modify them with a recursive call. In my example code this function is a simple multiplication by two, and I would like to execute it 5 times:
WITH RECURSIVE cte (n,v) AS
(
-- initial values
SELECT 0,2
UNION ALL
SELECT 0,3
UNION ALL
-- generator
SELECT n + 1, v * 2 FROM cte WHERE n < 5
)
SELECT v FROM cte where n = 5;
It works, but my problem is that it only filters out the unneeded values at the end of the query. If I start with many more rows, performance can degrade, because I keep far more rows in memory than I should. Is it possible to keep only the newest values in each iteration?
SQLFiddle: http://sqlfiddle.com/#!5/9eecb7/6761
In SQLite you can use the OFFSET clause. From the SQLite documentation:
The OFFSET clause, if it is present and has a positive value N,
prevents the first N rows from being added to the recursive table. The
first N rows are still processed by the recursive-select — they just
are not added to the recursive table. Rows are not counted toward
fulfilling the LIMIT until all OFFSET rows have been skipped.
Demo: http://sqlfiddle.com/#!5/9eecb7/6804
WITH RECURSIVE cte (n,v) AS
(
-- initial values
SELECT 0,2
UNION ALL
SELECT 0,3
UNION ALL
-- generator
SELECT n + 1, v * 2 FROM cte WHERE n < 5 LIMIT 1000 OFFSET 10
)
SELECT * FROM cte
| n | v |
|---|----|
| 5 | 64 |
| 5 | 96 |
In the example above, the offset is the number of rows in the initial select (2) times the number of iterations (5): 2 * 5 = 10.
By the way, in this concrete example a better solution would be to calculate X * 2^5 (X multiplied by 2 to the power of 5) directly instead of using recursion.
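The offset rule above can be tried out with a small Python sketch using the standard sqlite3 module. The `seed` table and the `last_generation` helper are hypothetical names introduced here, and the sketch assumes rows are processed in the default first-in, first-out order (no ORDER BY in the recursive part).

```python
import sqlite3

def last_generation(initial_values, iterations):
    """Return only the rows produced by the final iteration of the recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seed (v INTEGER)")
    conn.executemany("INSERT INTO seed VALUES (?)", [(v,) for v in initial_values])
    # Every generation yields len(initial_values) rows, so skip that many per iteration.
    offset = len(initial_values) * iterations
    sql = """
        WITH RECURSIVE cte(n, v) AS (
            SELECT 0, v FROM seed
            UNION ALL
            SELECT n + 1, v * 2 FROM cte WHERE n < :iterations
            LIMIT -1 OFFSET :offset   -- a negative LIMIT means no limit in SQLite
        )
        SELECT n, v FROM cte ORDER BY v
    """
    rows = conn.execute(sql, {"iterations": iterations, "offset": offset}).fetchall()
    conn.close()
    return rows
```

For the initial values 2 and 3 and five iterations, this returns `[(5, 64), (5, 96)]`, the same values as the closed form X * 2^5.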
In SQLite, the CTE is implemented as a coroutine (as shown by the EXPLAIN output), so only the current row is kept in memory, and performance will not degrade due to memory usage.
MySQL does not allow LIMIT in the recursive SELECT part. If I interpret WL#3634 correctly, the implementation in version 8.0 always completely materializes recursive CTEs.
So in SQLite, you do not need to do anything, and in MySQL, you cannot do anything.

select random value based on probability chance

How do I select a random row from the database based on the probability assigned to each row?
Example:
| Make | Chance | Value |
|------|--------|-------|
| ALFA ROMEO | 0.0024 | 20000 |
| AUDI | 0.0338 | 35000 |
| BMW | 0.0376 | 40000 |
| CHEVROLET | 0.0087 | 15000 |
| CITROEN | 0.016 | 15000 |
........
How do I select a random make name and its value based on its probability of being chosen?
Would a combination of rand() and ORDER BY work? If so, what is the best way to do this?
You can do this with rand() and a cumulative sum, assuming the chances add up to 100%:
select t.*
from (select t.*, (@cumep := @cumep + chance) as cumep
from t cross join
(select @cumep := 0, @r := rand()) params
) t
where @r between cumep - chance and cumep
limit 1;
Notes:
rand() is called once in a subquery to initialize a variable. Multiple calls to rand() are not desirable.
There is a remote chance that the random number will fall exactly on the boundary between two values. The limit 1 arbitrarily chooses one of them.
This could be made more efficient by stopping the subquery when cumep > @r.
The values do not have to be in any particular order.
This can be modified to handle chances where the sum is not equal to 1, but that would be another question.
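The same cumulative-sum idea can be sketched outside the database. The Python function below is only an illustration: `pick_weighted` and the sample rows are made up, and it assumes, like the query, that the chances sum to 1.

```python
import random
from itertools import accumulate

def pick_weighted(rows, r=None):
    """Pick one (name, chance, value) row; the chances are assumed to sum to 1.

    r is the single random draw in [0, 1); pass it explicitly for repeatable results.
    """
    if r is None:
        r = random.random()  # drawn once, like the single rand() call in the query
    cumulative = accumulate(chance for _, chance, _ in rows)
    for row, cum in zip(rows, cumulative):
        if r < cum:          # first row whose cumulative chance exceeds the draw
            return row
    return rows[-1]          # guards against a floating-point sum slightly below 1

# Hypothetical data, not the values from the question.
ROWS = [("A", 0.2, 100), ("B", 0.5, 200), ("C", 0.3, 300)]
```

Calling `pick_weighted(ROWS)` many times should return "B" about half of the time. The order of the rows does not change the probabilities, only which draw maps to which row.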

MySQL query to assign values to a field based in an iterative manner

I am using a MySQL table with 500,000 records. The table contains a field (abbrevName) which stores the first two letters of another field, name.
For example AA, AB, AC and so on.
What I want to achieve is to set the value of another field (pgNo), which stores a page number, based on the value of that record's abbrevName.
So a record with an abbrevName of 'AA' might get a page number of 1, 'AB' might get a page number of 2, and so on.
The catch is that although multiple records may have the same page number (after all, multiple entities might have a name beginning with 'AA'), once the number of records with the same page number reaches 250, the page number must increment by one. So after 250 'AA' records with a page number of 1, we must assign further 'AA' records a page number of 2, and so on.
My Pseudocode looks something like this:
-Count distinct abbrevNames
-Count distinct abbrevNames with more than 250 records
-For the above abbrevNames count the sum of each divided by 250
-Output a temporary table sorted by abbrevName
-Use the total number of distinct page numbers with 250 or less records to assign page numbers incrementally
I am really struggling to put anything together in a query that comes close to this. Can anyone help with my logic or some code?
Please try this one:
SELECT abbrevName, CAST(pagenumber AS SIGNED) AS pagenumber FROM (
SELECT
abbrevName
-- position of this row on its page; restarts at 1 for a new abbreviation or after 250 rows
, @rows_per_abbrev := IF(@prev = abbrevName AND @rows_per_abbrev < 250, @rows_per_abbrev + 1, 1) AS row_on_page
-- a restarted counter means a new page begins with this row
, @pagenr := IF(@rows_per_abbrev = 1, @pagenr + 1, @pagenr) AS pagenumber
, @prev := abbrevName AS prev_name
FROM
yourTable
, (SELECT @pagenr := 0, @prev := NULL, @rows_per_abbrev := 0) variables_initialization
ORDER BY abbrevName
) subquery_alias
UPDATE: I had misunderstood the question a bit. Now it should work
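To make the intended paging rule easier to check, here is a small Python sketch of the same logic. The `assign_pages` helper is a hypothetical name, and the sketch assumes every new abbreviation starts a new page, as the question's 'AA' / 'AB' example suggests.

```python
def assign_pages(abbrev_names, page_size=250):
    """Return (abbrev_name, page_number) pairs for the names taken in sorted order."""
    pages = []
    page_no = 0
    rows_on_page = 0
    prev = None
    for name in sorted(abbrev_names):
        # Start a new page when the abbreviation changes or the current page is full.
        if name != prev or rows_on_page == page_size:
            page_no += 1
            rows_on_page = 0
        rows_on_page += 1
        prev = name
        pages.append((name, page_no))
    return pages
```

With a page size of 2, five 'AA' rows followed by two 'AB' rows get the page numbers 1, 1, 2, 2, 3, 4, 4.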

Break Numbers List Into Min and Max Ranges

Brain is not working today and my google skills are failing me.
I have a column of numbers ranging from 1 - 1000. I want to dump the min and max values for 100 (or whatever I choose) record ranges into a temp table. The plan is to use this temp table to process ranges of records (in this example 100 at a time) in a larger table.
Swear I have done this before with a CTE but then I had something to group on. Here I just want to break up a single list of numbers into ranges of X.
The output from the temp table should look like:
Min Max
0 99
100 199
200 299
300 399
etc.
Thanks!
You can use this trick from Stuart Ainsworth:
http://codegumbo.com/index.php/2009/01/25/building-ranges-using-a-dynamically-generated-numbers-table/
Numbers tables are awesome, but he uses a dynamically generated numbers table, which is even awesome...r.
If you know all numbers are present in the source table, you can use a recursive CTE to generate the number ranges:
; with numbers as
(
select 0 as a
, 99 as b
union all
select a+100
, b+100
from numbers
where a < 900
)
select *
from numbers
If the source table is sparsely populated, you can limit it to numbers that are actually present like:
... insert CTE from above here ...
select min(ot.NumberColumn)
, max(ot.NumberColumn)
from numbers
left join
OtherTable ot
on ot.NumberColumn between numbers.a and numbers.b
group by
numbers.a
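The range-generating CTE and the join against a sparse table can be tried out with Python's standard sqlite3 module. This is a sketch under assumptions: `other_table`, `number_column` and `range_bounds` are illustrative names, the segment size and upper bound are parameters, and an inner join is used so that empty ranges are dropped instead of returning NULLs.

```python
import sqlite3

def range_bounds(values, segment=100, upper=1000):
    """Return (min, max) of the values that fall into each `segment`-wide range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE other_table (number_column INTEGER)")
    conn.executemany("INSERT INTO other_table VALUES (?)", [(v,) for v in values])
    sql = """
        WITH RECURSIVE numbers(a, b) AS (
            SELECT 0, :segment - 1
            UNION ALL
            SELECT a + :segment, b + :segment
            FROM numbers
            WHERE a + :segment < :upper      -- stop once the next range would pass upper
        )
        SELECT MIN(ot.number_column), MAX(ot.number_column)
        FROM numbers
        JOIN other_table AS ot
          ON ot.number_column BETWEEN numbers.a AND numbers.b
        GROUP BY numbers.a
        ORDER BY numbers.a
    """
    rows = conn.execute(sql, {"segment": segment, "upper": upper}).fetchall()
    conn.close()
    return rows
```

For the values 3, 57, 140, 199 and 950, this returns `[(3, 57), (140, 199), (950, 950)]`: one row per range that actually contains data.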
I have been having a play with a CTE after you posted this and came up with the following. I would be interested to hear if it works for you at all.
DECLARE @segment int = 100
;
WITH _CTE
(rowNum, value)
AS
(
SELECT ROW_NUMBER() OVER(ORDER BY col01) -1, col01
FROM dbo.testTable
)
SELECT rowNum/@segment AS Bucket, MIN(Value) AS MinVal, MAX(Value) AS MaxVal
FROM _CTE
group by rowNum/@segment
ORDER BY Bucket
;
Here col01 is the column that you want the min/max range values from, and dbo.testTable is the table that contains it.

How can I optimize my query (rank query)?

For the last two days, I have been asking questions about rank queries in MySQL. So far, I have working queries to
query all the rows from a table and order by their rank.
query ONLY one row with its rank
Here is a link for my question from last night
How to get a row rank?
As you might notice, btilly's query is pretty fast.
Here is a query for getting ONLY one row with its rank that I made based on btilly's query.
set @points = -1;
set @num = 0;
select * from (
SELECT id
, points
, @num := if(@points = points, @num, @num + 1) as point_rank
, @points := points as dummy
FROM points
ORDER BY points desc, id asc
) as test where test.id = 3
The above query uses a subquery, so I am worried about its performance.
Are there any faster queries that I can use?
Table points
| id | points |
|----|--------|
| 1 | 50 |
| 2 | 50 |
| 3 | 40 |
| 4 | 30 |
| 5 | 30 |
| 6 | 20 |
Don't get into a panic about subqueries. Subqueries aren't always slow - only in some situations. The problem with your query is that it requires a full scan.
Here's an alternative that should be faster:
SELECT COUNT(DISTINCT points) + 1
FROM points
WHERE points > (SELECT points FROM points WHERE id = 3)
Add an index on id (I'm guessing that you probably want a primary key here) and another index on points to make this query perform efficiently.
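As a quick sanity check of the counting query against the sample table, here is a sketch using Python's standard sqlite3 module instead of MySQL. The helper names `make_sample_db` and `rank_of` and the index name are assumptions made for the example.

```python
import sqlite3

def make_sample_db():
    """Build the sample `points` table from the question, with the suggested indexes."""
    conn = sqlite3.connect(":memory:")
    # id is the primary key; a second index covers the points column.
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, points INTEGER)")
    conn.execute("CREATE INDEX idx_points_points ON points (points)")
    conn.executemany(
        "INSERT INTO points VALUES (?, ?)",
        [(1, 50), (2, 50), (3, 40), (4, 30), (5, 30), (6, 20)],
    )
    return conn

def rank_of(conn, row_id):
    """Rank of one row: 1 plus the number of distinct higher point totals."""
    sql = """
        SELECT COUNT(DISTINCT points) + 1
        FROM points
        WHERE points > (SELECT points FROM points WHERE id = ?)
    """
    return conn.execute(sql, (row_id,)).fetchone()[0]
```

For the sample data, `rank_of(conn, 3)` returns 2, the same rank the variable-based query assigns to id 3, because ties share a rank.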