My brain is not working today and my Google skills are failing me.
I have a column of numbers ranging from 1 to 1000. I want to dump the min and max values for ranges of 100 records (or whatever size I choose) into a temp table. The plan is to use this temp table to process ranges of records (in this example 100 at a time) in a larger table.
I swear I have done this before with a CTE, but then I had something to group on. Here I just want to break up a single list of numbers into ranges of X.
The output from the temp table should look like:
Min Max
0 99
100 199
200 299
300 399
etc.
Thanks!
You can use this trick from Stuart Ainsworth:
http://codegumbo.com/index.php/2009/01/25/building-ranges-using-a-dynamically-generated-numbers-table/
Numbers tables are awesome, but he uses a dynamically generated numbers table, which is even awesome...r.
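For illustration, here is a minimal sketch of the general idea (not the blog post's exact code): generate a small numbers table on the fly with a cross join, then turn each number into a min/max pair. The bucket width of 100 and the 1-1000 domain come from the question; the names are assumptions.
-- Build bucket numbers 0..99 on the fly, then derive [min, max] ranges of width 100
;WITH digits AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS d(n)
), buckets AS (
    SELECT tens.n * 10 + ones.n AS bucket
    FROM digits AS ones
    CROSS JOIN digits AS tens
)
SELECT bucket * 100      AS RangeMin,
       bucket * 100 + 99 AS RangeMax
FROM buckets
WHERE bucket * 100 <= 1000   -- keep only the buckets that cover values 1-1000
ORDER BY bucket;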
If you know all numbers are present in the source table, you can use a recursive CTE to generate the number ranges:
; with numbers as
(
select 0 as a
, 99 as b
union all
select a+100
, b+100
from numbers
where a < 900
)
select *
from numbers
If the source table is sparsely populated, you can limit it to numbers that are actually present like:
... insert CTE from above here ...
select min(ot.NumberColumn)
, max(ot.NumberColumn)
from numbers
left join
OtherTable ot
on ot.NumberColumn between numbers.a and numbers.b
group by
numbers.a
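If you want the result in a temp table, as the question describes, either query above can feed an INSERT; here is a minimal T-SQL sketch, where #Ranges is a hypothetical temp table name:
-- Hypothetical temp table to hold the ranges; adjust names and types as needed
CREATE TABLE #Ranges (RangeMin int NOT NULL, RangeMax int NOT NULL);

;WITH numbers AS (
    SELECT 0 AS a, 99 AS b
    UNION ALL
    SELECT a + 100, b + 100
    FROM numbers
    WHERE a < 900
)
INSERT INTO #Ranges (RangeMin, RangeMax)
SELECT a, b
FROM numbers;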
I have been having a play with a CTE since you posted this and came up with the following; I would be interested to hear whether it works for you at all.
DECLARE @segment int = 100
;
WITH _CTE
(rowNum, value)
AS
(
SELECT ROW_NUMBER() OVER(ORDER BY col01) -1, col01
FROM dbo.testTable
)
SELECT rowNum/@segment AS Bucket, MIN(Value) AS MinVal, MAX(Value) AS MaxVal
FROM _CTE
GROUP BY rowNum/@segment
ORDER BY Bucket
;
col01 in this case is the column that you want the min/max range values from, and dbo.testTable is your source table.
I would like to calculate the standard deviation, min and max of the mer_data array into 3 other fields called std_dev, min_mer and max_mer, grouped by mac and timestamp.
This needs to be done without flattening the data, as each mer_data row consists of 4,000 float values and multiplying that by 700k rows gives a very high-dimensional table.
The mer_data field is currently saved as varchar(30000); maybe JSON format would help, I'm not sure.
This can be done in Snowflake or MySQL.
Also, the query needs to be optimized so that it does not take much computation time.
While you don't want to split the data up, you will need to if you want to do it in pure SQL. Snowflake has no problems with such aggregations.
WITH fake_data(mac, mer_data) AS (
SELECT * FROM VALUES
('abc','43,44.25,44.5,42.75,44,44.25,42.75,43'),
('def','32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43')
)
SELECT f.mac,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM fake_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
ORDER BY 1;
I would, however, discourage the use of strings in the grouping process, so I would break it apart like so:
WITH fake_data(mac, mer_data, timestamp) AS (
SELECT * FROM VALUES
('abc','43,44.25,44.5,42.75,44,44.25,42.75,43', '01-01-22'),
('def','32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43', '02-01-22')
), boost_data AS (
SELECT seq8() as seq, *
FROM fake_data
), math_step AS (
SELECT f.seq,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM boost_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
)
SELECT b.mac,
m.avg_dev,
m.std_dev,
m.MIN_MER,
m.Max_MER,
b.timestamp
FROM boost_data b
JOIN math_step m
ON b.seq = m.seq
ORDER BY 1;
MAC | AVG_DEV      | STD_DEV      | MIN_MER | MAX_MER | TIMESTAMP
abc | 43.5625      | 0.7529703087 | 42.75   | 44.5    | 01-01-22
def | 34.611111111 | 3.226141056  | 32.75   | 43      | 02-01-22
Performance testing:
Using this SQL to make 70K rows of 4,000 values each:
create table fake_data_tab AS
WITH cte_a AS (
SELECT SEQ8() as s
FROM TABLE(GENERATOR(ROWCOUNT =>70000))
), cte_b AS (
SELECT a.s, uniform(20::float, 50::float, random()) as v
FROM TABLE(GENERATOR(ROWCOUNT =>4000))
CROSS JOIN cte_a a
)
SELECT s::text as mac
,LISTAGG(v,',') AS mer_data
,dateadd(day,s,'2020-01-01')::date as timestamp
FROM cte_b
GROUP BY 1,3;
takes 79 seconds on an XTRA_SMALL warehouse.
Now with that we can test the two solutions.
The second set of code (group by numbers, with a join):
WITH boost_data AS (
SELECT seq8() as seq, *
FROM fake_data_tab
), math_step AS (
SELECT f.seq,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM boost_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
)
SELECT b.mac,
m.avg_dev,
m.std_dev,
m.MIN_MER,
m.Max_MER,
b.timestamp
FROM boost_data b
JOIN math_step m
ON b.seq = m.seq
ORDER BY 1;
takes 1m47s
The original (group by strings/dates):
SELECT f.mac,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER,
f.timestamp
FROM fake_data_tab f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1,6
ORDER BY 1;
takes 1m46s
Hmm, so leaving "mac" as a number made the code very fast (~3s), while dealing with strings either way increased the data processed: roughly 1.5GB for strings versus 150MB for numbers.
If the numbers were in rows, not packed together like that, we could discuss how to do it in SQL.
With rows, GROUP_CONCAT(...) can construct a commalist like the one you show, and MIN(), STDDEV(), etc. can do the other stuff.
If you keep the commalist, then do the rest of the work in your application programming language. (It is very ugly to have SQL pick apart an array.)
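For illustration, a minimal MySQL sketch of that row-based approach, assuming a hypothetical normalized table mer_rows(mac, ts, mer_value) with one float value per row (the table and column names are made up):
-- Hypothetical layout: CREATE TABLE mer_rows (mac VARCHAR(20), ts DATE, mer_value FLOAT);
SELECT mac,
       ts,
       STDDEV(mer_value) AS std_dev,
       MIN(mer_value)    AS min_mer,
       MAX(mer_value)    AS max_mer,
       GROUP_CONCAT(mer_value ORDER BY mer_value) AS mer_data  -- rebuilds the comma list if still needed
FROM mer_rows
GROUP BY mac, ts;
Note that GROUP_CONCAT output is capped by group_concat_max_len, so rebuilding a 4,000-value list would require raising that setting.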
I have two tables (ProductionDetails, Production). I need to populate Production based on calculated values from ProductionDetails. ProductionDetails has the number of buckets gathered; Production needs to be populated with BINS. For example, if I have 100 buckets and I divide those by 30, that equals 3.33 bins, so I need to create 3 BINS.
select Round( sum(ep.Buckets)/30),ep.lot,ep.crewid
from On_EmpProdDetails ep
group by ep.buckets, ep.lot,ep.crewid
Results in:
Bins | Lot            | Crew
3    | 556790186SOCC1 | SOCC1
Now I want to insert 3 unique rows into the Production table.
BinID is an auto-increment column.
BinID | LOT
1     | 556790186SOCC1
2     | 556790186SOCC1
3     | 556790186SOCC1
You can use a recursive CTE to generate the rows:
insert into production (lot)
with recursive cte as (
select Round( sum(ep.Buckets)/30) as num_bins, ep.lot, ep.crewid, 1 as bin
from On_EmpProdDetails ep
group by ep.buckets, ep.lot,ep.crewid
union all
select num_bins, lot, crewid, bin + 1
from cte
where bin < num_bins
)
select lot
from cte;
Here is a db<>fiddle.
The SQL query is:
Select ProductName from Products;
The above query returns 5000 rows.
How can the result of 5000 rows be divided into two result sets of 2500 rows each, i.e., one result set from 1 to 2500 and the other from 2501 to 5000?
Note:
Here ProductName is the primary key. No ProductID column is present in the table.
It can be done either in the back end or in the front end.
An approach that works for MySQL (based on this answer: https://stackoverflow.com/a/4741301/14015737):
Upper half
SELECT *
FROM (
SELECT test.*, @counter := @counter + 1 counter
FROM (select @counter := 0) initvar, test
ORDER BY num
) X
WHERE counter <= round(50/100 * @counter)
ORDER BY num;
Lower half
Invert the sort order and remove the rounding
SELECT *
FROM (
SELECT test.*, @counter := @counter + 1 counter
FROM (select @counter := 0) initvar, test
ORDER BY num DESC
) X
WHERE counter <= (50/100 * @counter)
ORDER BY num;
In case of an uneven number of records, the middle record is added to the upper half in this example. If you want it the other way around, move the round() to the other statement. If you don't want it at all, remove round().
Dbfiddle example: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=fb70eae0f7f1434a24099b5bb19f0878
If you know the numbers that you want, just use limit:
select ProductName
from Products
order by ProductName
And then either:
limit 2500
limit 2500 offset 2500
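For example, the full statement for the second half would look something like this (assuming MySQL-style LIMIT/OFFSET syntax):
-- rows 2501-5000, ordered by the primary key
select ProductName
from Products
order by ProductName
limit 2500 offset 2500;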
If you simply want the results split into half, then you can use:
select t.*
from (select p.*,
             ntile(2) over (order by ProductName) as tile
      from Products p
     ) t
where tile = 1; -- or 2 for the other half
The easiest and probably fastest approach is to use the table's primary key if you are fine with getting the rows in its order.
Run
select productname, id from products order by id;
and fetch 2500 rows. Then with the last ID, say ID 3456, run
select productname, id from products where id > 3456 order by id;
and fetch 2500 rows again. Etc.
UPDATE: Since I got a downvote for this, I'd better explain :-)
The query returns 5000 rows now and the OP doesn't want that many rows, so they want to cut this in half. But the query may well return 10000 rows next year. Will the OP suddenly be fine with getting 5000 rows at once? This doesn't seem likely. It is more likely that there is an amount of rows that shall not be surpassed. This is why I cut the result into slices of 2500.
The other approach, numbering all rows and returning the first n rows, has a severe drawback: all rows must be read again. Even if it is decided to cut the result into chunks of 100 each, every time all rows must be read, sorted and numbered before a chunk can be fetched. Reading all rows from a table and sorting them is a lot of work for a DBMS.
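Since the question notes that ProductName is the primary key and there is no id column, the same keyset idea can use the name itself as the paging key; a minimal sketch (the quoted placeholder stands for the last name fetched in the previous chunk, and LIMIT is MySQL-style syntax):
-- first chunk
select productname from products order by productname limit 2500;

-- next chunk: replace the placeholder with the last productname fetched above
select productname
from products
where productname > 'last-productname-from-previous-chunk'
order by productname
limit 2500;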
I have a table with a column called "Points"; the values in this column range from a min of 0 to a max of 100,000. I have to do an analysis per range of this column, so I wrote a query like this:
select case when Points between 0 and 5000 then '0 - 5000'
            when Points between 5001 and 20000 then '5001 - 20000'
            when Points > 20000 then '> 20000'
            else '0' end as RangeP
from Sales
The problem is that the customer wants to see a range for every 2,000 points from 0 to 100,000.
If I write the query using CASE, it will get really big, so I'd like a way to generate these ranges dynamically.
Is that possible? Thank you very much.
You may create a table which contains the ranges, and their text labels, and then join to it, e.g.
SELECT
s.Points,
COALESCE(r.label, 'range not found') AS range
FROM Sales s
LEFT JOIN
(
SELECT 0 AS range_start, 2000 AS range_end, '0 - 2000' AS label UNION ALL
SELECT 2001, 4000, '2001 - 4000' UNION ALL
...
SELECT 98000, 100000, '98000 - 100000'
) r
ON s.Points BETWEEN r.range_start AND r.range_end;
I have inlined the table of ranges, but you may create a formal table (or maybe a temp table) instead, and then replace the above subquery with just a reference to that table.
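As an alternative to maintaining a range table, you can compute the bucket arithmetically, since every range here is 2,000 points wide; a minimal sketch, assuming Points is an integer column (note the boundaries come out as 0-1999, 2000-3999, and so on, slightly different from the labels above):
SELECT bucket * 2000        AS range_start,
       bucket * 2000 + 1999 AS range_end,
       COUNT(*)             AS sales_in_range
FROM (
    SELECT FLOOR(Points / 2000) AS bucket
    FROM Sales
) s
GROUP BY bucket
ORDER BY bucket;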
I've got a dataset with a bunch of rows of monthly salary payments for each account. We have 6 columns for this:
Salary_1, Salary_2, Salary_3, Salary_4, Salary_5 and Salary_6.
Sometimes salaries 3, 4, 5, 6 and occasionally 2 aren't populated, and sometimes none are populated because the person is unemployed. In those cases the field holds 0.
What I need to do is combine all the salaries and find the MAX and MIN across these columns:
Select
Greatest(Salary_1, Salary_2, Salary_3, Salary_4, Salary_5, Salary_6) as MaxSal,
Least(COALESCE(Salary_1, Salary_2, Salary_3, Salary_4, Salary_5, Salary_6),0) as MinSal
from
(select
sal1 as Salary_1, sal2 as Salary_2, Sal3 as Salary_3, sal4 as Salary_4, Sal5 as Salary_5, Sal6 as Salary_6
from ....)a
The problem is, this returns the correct value for MaxSal but 0.00 for MinSal, because 0 is the minimum value and it won't let me ignore the 0s; but 0 is not a minimum salary value I want, I need the lowest non-zero value here.
I've tried wrapping the original Sal1-Sal6 values in NULLIF(..., 0) and it returns NULL for max and 0 for min.
What else could I have a look at? COALESCE combined with NULLIF, which is what has been recommended on previous questions, has not worked for me. Thanks!
Greatest and Least do not ignore nulls like aggregation functions do; you'll need to do something to avoid them. One option is something like this:
Greatest(IFNULL(Salary_1 ,0), ...)
Least(
CASE WHEN Salary_1 IS NULL OR Salary_1 = 0 THEN /*some huge value*/ ELSE Salary_1 END
, CASE WHEN Salary_2
....)
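Put together, a complete version of that idea might look like the sketch below (MySQL-flavoured, matching the IFNULL above; the sentinel 999999999 is an arbitrary "huge value", and the outer NULLIF turns an all-zero row back into NULL):
SELECT
    GREATEST(IFNULL(Salary_1, 0), IFNULL(Salary_2, 0), IFNULL(Salary_3, 0),
             IFNULL(Salary_4, 0), IFNULL(Salary_5, 0), IFNULL(Salary_6, 0)) AS MaxSal,
    NULLIF(
        LEAST(COALESCE(NULLIF(Salary_1, 0), 999999999),
              COALESCE(NULLIF(Salary_2, 0), 999999999),
              COALESCE(NULLIF(Salary_3, 0), 999999999),
              COALESCE(NULLIF(Salary_4, 0), 999999999),
              COALESCE(NULLIF(Salary_5, 0), 999999999),
              COALESCE(NULLIF(Salary_6, 0), 999999999)),
        999999999) AS MinSal
FROM salaries; -- "salaries" stands in for your source table or subquery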
It might be simplest to unpivot and aggregate the data:
select id, max(salary), min(salary)
from ((select id, salary_1 as salary from t) union all
(select id, salary_2 as salary from t) union all
. . .
) t
group by id;
This is definitely more expensive than a giant case expression. On the other hand, it is less prone to error.
The real suggestion is to fix your data model. Trying to store an array in multiple columns is generally a sign of a poor data model. The more appropriate method would have one row per salary rather than putting them in separate columns.
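For reference, a minimal sketch of that one-row-per-salary model (table and column names are made up); the min and max then become ordinary aggregates, and zeros can be ignored with NULLIF:
-- Hypothetical normalized layout: one salary payment per row
-- CREATE TABLE account_salary (account_id INT, salary_no TINYINT, salary DECIMAL(10,2));
SELECT account_id,
       MAX(salary)            AS MaxSal,
       MIN(NULLIF(salary, 0)) AS MinSal -- aggregates skip NULLs, so zeros are ignored
FROM account_salary
GROUP BY account_id;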