Possible Duplicate:
Generating a series of dates
What is the best way in MySQL to generate a series of numbers in a given range?
The application I have in mind is to write a report query that returns a row for every number, regardless of whether there is any data to report. An example in its simplest form might be:
SELECT numbers.num, COUNT(answers.id)
FROM <series of numbers between X and Y> numbers
LEFT JOIN answers ON answers.selection_number = numbers.num
GROUP BY 1
I have tried creating a table with lots of numbers, but that seems like a poor workaround.
First, create a table called ints which will contain one record for each digit from 0 to 9.
CREATE TABLE ints ( i tinyint );
Then populate that table with data.
INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Now you can use a query such as the following to generate a sequence of numbers.
SELECT generator.num, COUNT(answers.id)
FROM (
SELECT a.i*10 + b.i AS num
FROM ints a, ints b
) generator
LEFT JOIN answers ON answers.selection_number = generator.num
WHERE generator.num BETWEEN 18 AND 43
GROUP BY generator.num
ORDER BY generator.num
To add another place value to the generated numbers, just add more joins of the ints table and adjust the calculations accordingly. The following will generate three-digit numbers:
SELECT generator.num, COUNT(answers.id)
FROM (
SELECT a.i*100 + b.i*10 + c.i AS num
FROM ints a, ints b, ints c
) generator
LEFT JOIN answers ON answers.selection_number = generator.num
WHERE generator.num BETWEEN 328 AND 643
GROUP BY generator.num
ORDER BY generator.num
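As a quick check of the digit-table technique, here is a minimal sketch that runs the two-digit generator against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, and the sample rows in answers are made up for illustration. It uses GROUP BY generator.num so that every number in the range gets its own row, including those with no answers.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ints (i INTEGER);
    INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
    CREATE TABLE answers (id INTEGER PRIMARY KEY, selection_number INTEGER);
    -- Hypothetical sample data: 99 lies outside the reported range.
    INSERT INTO answers (selection_number) VALUES (18), (18), (20), (43), (99);
""")

rows = conn.execute("""
    SELECT generator.num, COUNT(answers.id)
    FROM (
        SELECT a.i * 10 + b.i AS num      -- tens digit * 10 + units digit
        FROM ints a CROSS JOIN ints b
    ) generator
    LEFT JOIN answers ON answers.selection_number = generator.num
    WHERE generator.num BETWEEN 18 AND 43
    GROUP BY generator.num
    ORDER BY generator.num
""").fetchall()

counts = dict(rows)
print(len(rows), counts[18], counts[19], counts[43])
```

Numbers without any matching answer still appear, with a count of 0, because COUNT(answers.id) ignores the NULLs produced by the LEFT JOIN.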
You can try GROUP BY numbers.num instead of GROUP BY 1:
SELECT numbers.num, COUNT(answers.id)
FROM numbers
LEFT JOIN answers ON answers.selection_number = numbers.num
WHERE numbers.num between X and Y
GROUP BY numbers.num
I have data in a MySQL table that looks like this:
Key, value
A 1
A 2
A 3
A 6
A 7
A 8
A 9
B 1
B 2
and I want to group it based on the continuous sequence. Data is sorted in the table.
Key, min, max
A 1 3
A 6 9
B 1 2
I tried googling it but couldn't find any solution. Can someone please help me with this?
This is way easier with a modern DBMS that supports window functions, but you can find the upper bounds by checking that there is no successor. In the same way, you can find the lower bounds via the absence of a predecessor. By combining each lower bound with the lowest upper bound that is not below it, we get the intervals.
select low.keyx, low.valx, min(high.valx)
from (
select t1.keyx, t1.valx from t t1
where not exists (
select 1 from t t2
where t1.keyx = t2.keyx
and t1.valx = t2.valx + 1
)
) as low
join (
select t3.keyx, t3.valx from t t3
where not exists (
select 1 from t t4
where t3.keyx = t4.keyx
and t3.valx = t4.valx - 1
)
) as high
on low.keyx = high.keyx
and low.valx <= high.valx
group by low.keyx, low.valx;
I changed your identifiers since key and value are SQL keywords (key is a reserved word in MySQL).
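To see the lower-bound/upper-bound query at work on the sample data, here is a small sketch that runs it in an in-memory SQLite database via Python's sqlite3 module. SQLite is only a stand-in for MySQL, and the table name t with columns keyx and valx follows the renamed identifiers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (keyx TEXT, valx INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3), ("A", 6), ("A", 7), ("A", 8), ("A", 9),
     ("B", 1), ("B", 2)],
)

islands = conn.execute("""
    SELECT low.keyx, low.valx, MIN(high.valx)
    FROM (
        -- lower bounds: rows with no predecessor (valx - 1) for the same key
        SELECT t1.keyx, t1.valx FROM t t1
        WHERE NOT EXISTS (
            SELECT 1 FROM t t2
            WHERE t1.keyx = t2.keyx AND t1.valx = t2.valx + 1
        )
    ) AS low
    JOIN (
        -- upper bounds: rows with no successor (valx + 1) for the same key
        SELECT t3.keyx, t3.valx FROM t t3
        WHERE NOT EXISTS (
            SELECT 1 FROM t t4
            WHERE t3.keyx = t4.keyx AND t3.valx = t4.valx - 1
        )
    ) AS high
      ON low.keyx = high.keyx AND low.valx <= high.valx
    GROUP BY low.keyx, low.valx
    ORDER BY low.keyx, low.valx
""").fetchall()

print(islands)
```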
Using a window function is way more compact and efficient. If at all possible, consider upgrading to MySQL 8+; it is superior to 5.7 in many respects.
We can create a group by looking at the difference between valx and a row numbering of the values: within a run of consecutive values the difference stays constant, and after each gap it increases. Then we simply pick the min and max for each group:
select keyx, min(valx), max(valx)
from (
select keyx, valx
, valx - row_number() over (partition by keyx order by valx) as grp
from t
) as tt
group by keyx, grp;
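The same "value minus row number" idea can be sketched outside the database. The following Python snippet is only an illustration of the grouping logic, not the MySQL implementation; it assumes the values are distinct within each key, and the function name islands is made up for this example.

```python
from itertools import groupby

rows = [("A", 1), ("A", 2), ("A", 3), ("A", 6), ("A", 7), ("A", 8), ("A", 9),
        ("B", 1), ("B", 2)]


def islands(rows):
    """Return (key, min, max) for each run of consecutive values per key."""
    result = []
    # sorted() + groupby on the key mimics PARTITION BY keyx ORDER BY valx.
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        # enumerate(..., start=1) plays the role of row_number().
        numbered = [(val - rn, val) for rn, (_, val) in enumerate(group, start=1)]
        # Rows in the same consecutive run share the same difference (grp).
        for _, run in groupby(numbered, key=lambda pair: pair[0]):
            vals = [val for _, val in run]
            result.append((key, min(vals), max(vals)))
    return result


print(islands(rows))
```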
I would like to calculate the standard deviation, min, and max of the mer_data array into three other fields called std_dev, min_mer, and max_mer, grouped by mac and timestamp.
This needs to be done without flattening the data, as each mer_data row consists of 4000 float values, and multiplying that by 700k rows gives a very large table.
The mer_data field is currently saved as varchar(30000); maybe JSON format might help, I'm not sure.
Input:
Output:
This can be done in Snowflake or MySQL.
Also, the query needs to be optimized so that it does not take much computation time.
While you don't want to split the data up, you will need to if you want to do it in pure SQL. Snowflake has no problems with such aggregations.
WITH fake_data(mac, mer_data) AS (
SELECT * FROM VALUES
('abc','43,44.25,44.5,42.75,44,44.25,42.75,43'),
('def','32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43')
)
SELECT f.mac,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM fake_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
ORDER BY 1;
I would however discourage the use of strings in the grouping process, so would break it apart like so:
WITH fake_data(mac, mer_data, timestamp) AS (
SELECT * FROM VALUES
('abc','43,44.25,44.5,42.75,44,44.25,42.75,43', '01-01-22'),
('def','32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43', '02-01-22')
), boost_data AS (
SELECT seq8() as seq, *
FROM fake_data
), math_step AS (
SELECT f.seq,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM boost_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
)
SELECT b.mac,
m.avg_dev,
m.std_dev,
m.MIN_MER,
m.Max_MER,
b.timestamp
FROM boost_data b
JOIN math_step m
ON b.seq = m.seq
ORDER BY 1;
MAC  AVG_DEV       STD_DEV       MIN_MER  MAX_MER  TIMESTAMP
-------------------------------------------------------------
abc  43.5625       0.7529703087  42.75    44.5     01-01-22
def  34.611111111  3.226141056   32.75    43       02-01-22
Performance testing:
I used this SQL to make 70K rows of 4000 values each:
create table fake_data_tab AS
WITH cte_a AS (
SELECT SEQ8() as s
FROM TABLE(GENERATOR(ROWCOUNT =>70000))
), cte_b AS (
SELECT a.s, uniform(20::float, 50::float, random()) as v
FROM TABLE(GENERATOR(ROWCOUNT =>4000))
CROSS JOIN cte_a a
)
SELECT s::text as mac
,LISTAGG(v,',') AS mer_data
,dateadd(day,s,'2020-01-01')::date as timestamp
FROM cte_b
GROUP BY 1,3;
This takes 79 seconds on an extra-small (XSMALL) warehouse.
With that table we can test the two solutions.
The second set of code (grouping by numbers, with a join):
WITH boost_data AS (
SELECT seq8() as seq, *
FROM fake_data_tab
), math_step AS (
SELECT f.seq,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM boost_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
)
SELECT b.mac,
m.avg_dev,
m.std_dev,
m.MIN_MER,
m.Max_MER,
b.timestamp
FROM boost_data b
JOIN math_step m
ON b.seq = m.seq
ORDER BY 1;
This takes 1m47s.
The original version, grouping by strings/dates:
SELECT f.mac,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER,
f.timestamp
FROM fake_data_tab f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1,6
ORDER BY 1;
This takes 1m46s.
Hmm, so leaving "mac" as a number made the code very fast (~3s), and dealing with strings either way raised the data processed from 150MB for numbers to 1.5GB for strings.
If the numbers were in rows, not packed together like that, we could discuss how to do it in SQL.
In rows, GROUP_CONCAT(...) can construct a comma-separated list like the one you show, and MIN(), STDDEV(), etc. can do the other stuff.
If you continue to store the comma-separated list, then do the rest of the work in your application programming language. (It is very ugly to have SQL pick apart an array.)
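If the work is moved to the application, it could look roughly like the sketch below. It uses Python's statistics module on the two sample strings from the Snowflake answer; the function name summarize and the result keys are made up for this example. statistics.stdev computes the sample standard deviation, which matches the STD_DEV figures in the result table above, and it needs at least two values per row.

```python
import statistics


def summarize(mer_data):
    """Parse one comma-separated mer_data string and compute its statistics."""
    values = [float(x) for x in mer_data.split(",")]
    return {
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min_mer": min(values),
        "max_mer": max(values),
    }


# Sample rows taken from the answer above: (mac, mer_data, timestamp).
rows = [
    ("abc", "43,44.25,44.5,42.75,44,44.25,42.75,43", "01-01-22"),
    ("def", "32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43", "02-01-22"),
]

stats = {(mac, ts): summarize(mer_data) for mac, mer_data, ts in rows}
for key, result in stats.items():
    print(key, result)
```

Each row is processed independently, so nothing is flattened into a larger table; memory use stays at one row's worth of floats at a time.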
So in my database, I got the tables
Product (prodId,Name,Price)
Box (BoxId,prodId,From,To,Available)
'From' represents the first serial number and 'To' the last serial number.
Subtracting 'From' from 'To' gives the quantity of products.
A client comes and orders a given quantity of a given product. What I need is, given the 'From' serial number, to calculate 'From' + Quantity.
If the serial numbers were only sequential integers, this would be easy. But this applies to all types of products with different serial number formats.
For example:
Box( 1,1,ABC00000C,ABC00099K,100)
What I want to achieve is this:
SELECT From + 50 FROM BOX
How can I work with the serial number to get the ending serial of the order?
To deal with such serial numbers you need either (option 1) a function fn(x) that calculates a serial number from an integer x, or (option 2) a list of available serial numbers.
Option 1 is the easiest to implement, but it requires that whoever designed the serial number format also defined a formula for converting an integer into a serial number. If such a formula exists, all you need to do is determine the integer value 'x' of the "from" value, add 50 to it, and determine the serial number for 'x + 50'.
Option 2 requires that you have a list, or can generate a list, of serial numbers, and that those serial numbers are (somehow) logically ordered. Option 2 then uses one of the many ways SQL Server provides to get the next 50 rows from this list, starting from the row with the value "From" in it. Examples of such methods are "select top (50) ...", the window function "row_number() over (order by ...)", "select ... order by ... offset n rows fetch next 50 rows only", or even a cursor.
Added after comment from Wildfire:
I suggest you create a table holding the serials for option 2. Let me explain with an example: what would you do if the item with serial n + 5 happens to have fallen off the production line and was damaged beyond repair? That is, this one serial number will never be shipped to a customer. I bet you are not going to ship a box with one item fewer when this happens, nor are you going to discard 49 undamaged products because one item is missing. Instead you will probably put all products with serials n to n + 4 and n + 6 to n + 50 in a box, leaving serial n + 5 out. In a perfect world this would of course never happen, but in real life things do go wrong sometimes, so you need to be able to cope with, for example, missing serials. So I would really suggest creating a table with all serials available for boxing, and simply having your boxing process read its next 50 serials from this table.
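To illustrate option 2 outside the database, here is a rough Python sketch of "read the next 50 serials from an ordered list of available serials". The serial format S000, S001, ... and the function name next_serials are invented for this example; in practice the list would come from the serials table ordered by whatever defines the logical order.

```python
def next_serials(available, start, quantity):
    """Return `quantity` serials from the ordered list, beginning at `start`."""
    begin = available.index(start)  # raises ValueError if start is not available
    batch = available[begin:begin + quantity]
    if len(batch) < quantity:
        raise ValueError("not enough serials available")
    return batch


# Hypothetical production run: serial S005 was damaged and never made it
# into the list of serials available for boxing.
available = [f"S{n:03d}" for n in range(60) if n != 5]

box = next_serials(available, "S000", 50)
print(box[0], box[-1], len(box))
```

Because the damaged serial is simply absent from the list, the box still holds 50 items and ends one serial later than it otherwise would.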
Option 1 can still work even if the serial itself is non-numeric, as long as it can be converted into a number. It's just a little more complicated. That's why I said a formula must exist for the serials for this method to work. Here's an example of how you could add 50 to the serial "ABC00000C", making "ABC00001Y" the 'To' serial:
declare @from varchar(9) = 'ABC00000C';
declare @from_int bigint;
with cteSerialCharacters as (
-- The set of characters used in a serial.
-- As an example I've taken all number characters plus
-- all capital letters from the alphabet excluding any
-- of these that are easily misread.
select '0123456789ABCDEFGHJKLMNPRSTVWXYZ' as chars
),
cteNumberGenerator as (
-- Generates the positions 1 to 9, one per character of the serial.
select cast (row_number() over (order by (select null)) as bigint) as n
from ( select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1
union all select 1 union all select 1
) t (xyz)
)
select
@from_int = sum(power(s.base, (n.n - 1)) * (-1 + charindex(substring(reverse(s.serial), n.n, 1), s.characterset)))
from (
select
@from,
cast(len(ch.chars) as bigint),
ch.chars
from cteSerialCharacters ch
) s (serial, base, characterset)
inner join cteNumberGenerator n on (n.n <= len(s.serial));
select @from, @from_int;
declare @to varchar(9);
declare @to_int bigint;
select @to_int = @from_int + 50;
with cteSerialCharacters as (
-- The set of characters used in a serial.
-- As an example I've taken all number characters plus
-- all capital letters from the alphabet excluding any
-- of these that are easily misread.
select '0123456789ABCDEFGHJKLMNPRSTVWXYZ' as chars
),
cteNumberGenerator as (
select cast (row_number() over (order by (select null)) as bigint) as n
from ( select 1 union all select 1 union all select 1 union all select 1
union all select 1 union all select 1 union all select 1
union all select 1 union all select 1
) t (xyz)
)
select
@to = (
select
substring(s.characterset, 1 + (@to_int / power(s.base, n.n - 1)) % s.base, 1) as [text()]
from (
select
cast(len(ch.chars) as bigint),
ch.chars
from cteSerialCharacters ch
) s (base, characterset)
cross join cteNumberGenerator n
order by n.n desc
for xml path(''), type
).value('text()[1]', 'varchar(9)');
select @to, @to_int;
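The conversion behind option 1 is plain positional notation: the serial is read as a base-32 number over the character set above. A compact Python sketch of the same idea follows; the function names are made up for illustration, and it assumes fixed-width serials that use only characters from that set.

```python
# Character set from the example above: digits plus capital letters,
# excluding the easily misread I, O, Q and U. Its length is the base (32).
ALPHABET = "0123456789ABCDEFGHJKLMNPRSTVWXYZ"
BASE = len(ALPHABET)


def serial_to_int(serial):
    """Interpret the serial as a base-32 number, most significant character first."""
    value = 0
    for ch in serial:
        value = value * BASE + ALPHABET.index(ch)  # ValueError on unknown characters
    return value


def int_to_serial(value, width):
    """Convert an integer back to a serial of exactly `width` characters."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    if value:
        raise OverflowError("value does not fit in the requested width")
    return "".join(reversed(digits))


def add_to_serial(serial, offset):
    return int_to_serial(serial_to_int(serial) + offset, len(serial))


print(add_to_serial("ABC00000C", 50))
```

With this character set, 'C' has the value 12, so adding 50 gives 62 = 1 * 32 + 30: the last character becomes 'Y' (value 30) and a 1 is carried into the position before it.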
The problem:
We have a number of entries within a table, but we are only interested in the ones that appear in a given sequence. For example, we are looking for three specific "GFTitle" entries ('Pearson Grafton', 'Woolworths (P and O)', 'QRX - Brisbane'); however, they have to appear in that particular order to be considered a valid route (see the table below).
RowNum GFTitle
------------------------------
1 Pearson Grafton
2 Woolworths (P and O)
3 QRX - Brisbane
4 Pearson Grafton
5 Woolworths (P and O)
6 Pearson Grafton
7 QRX - Brisbane
8 Pearson Grafton
9 Pearson Grafton
So rows (1,2,3) satisfy this rule but rows (4,5,6) don't even though the first two entries (4,5) do.
I am sure there is a way to do this via CTEs, but some help would be great.
Cheers
This is very simple even with good old tools :-) Try this quick-and-dirty solution, assuming your table is named GFTitles and the RowNum values are sequential:
SELECT a.[RowNum]
,a.[GFTitle]
,b.[GFTitle]
,c.[GFTitle]
FROM [dbo].[GFTitles] as a
join [dbo].[GFTitles] as b on b.RowNum = a.RowNum + 1
join [dbo].[GFTitles] as c on c.RowNum = a.RowNum + 2
WHERE a.[GFTitle] = 'Pearson Grafton' and
b.[GFTitle] = 'Woolworths (P and O)' and
c.[GFTitle] = 'QRX - Brisbane'
Assuming RowNum has neither duplicates nor gaps, you could try the following method.
Assign row numbers to the sought sequence's items and join the row set to your table on GFTitle.
For every match, calculate the difference between your table's row number and that of the sequence. If there's a matching sequence in your table, the corresponding rows' RowNum differences will be identical.
Count the rows per difference and return only those where the count matches the number of sequence items.
Here's a query that implements the above logic:
WITH SoughtSequence AS (
SELECT *
FROM (
VALUES
(1, 'Pearson Grafton'),
(2, 'Woolworths (P and O)'),
(3, 'QRX - Brisbane')
) x (RowNum, GFTitle)
)
, joined AS (
SELECT
t.*,
SequenceLength = COUNT(*) OVER (PARTITION BY t.RowNum - ss.RowNum)
FROM atable t
INNER JOIN SoughtSequence ss
ON t.GFTitle = ss.GFTitle
)
SELECT
RowNum,
GFTitle
FROM joined
WHERE SequenceLength = (SELECT COUNT(*) FROM SoughtSequence)
;
You can try it at SQL Fiddle too.
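The row-number-difference logic can also be traced by hand with a short Python sketch. This is an illustration of the idea rather than a translation of the T-SQL; it assumes, like the query, that RowNum has neither gaps nor duplicates, and additionally that the titles in the sought sequence are distinct. The function name find_sequences is made up.

```python
from collections import defaultdict


def find_sequences(rows, sought):
    """Return the RowNum lists of every place where `sought` occurs in order."""
    # 1-based position of each title within the sought sequence.
    position = {title: i for i, title in enumerate(sought, start=1)}
    by_offset = defaultdict(list)
    for row_num, title in rows:
        if title in position:
            # Rows belonging to one occurrence share the same difference.
            by_offset[row_num - position[title]].append(row_num)
    # Keep only the differences matched by every item of the sequence.
    return [sorted(nums) for _, nums in sorted(by_offset.items())
            if len(nums) == len(sought)]


rows = [
    (1, "Pearson Grafton"), (2, "Woolworths (P and O)"), (3, "QRX - Brisbane"),
    (4, "Pearson Grafton"), (5, "Woolworths (P and O)"), (6, "Pearson Grafton"),
    (7, "QRX - Brisbane"), (8, "Pearson Grafton"), (9, "Pearson Grafton"),
]
sought = ["Pearson Grafton", "Woolworths (P and O)", "QRX - Brisbane"]

print(find_sequences(rows, sought))
```

On the sample data, only the difference 0 collects three rows (1, 2, 3); rows 4 and 5 share the difference 3, but nothing completes that group, so it is dropped.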
I'm having trouble with this SQL:
$sql = mysql_query("SELECT $menucompare ,
(COUNT($menucompare ) * 100 / (SELECT COUNT( $menucompare )
FROM data WHERE $ww = $button )) AS percentday FROM data WHERE $ww >0 ");
$menucompare is the name of whichever table field is selected; it contains the data shown below.
$button is the selected week number (let's say week 6).
$ww is the name of the table field that holds the week number.
For example, I have data in $menucompare like that:
123456bool
521478bool
122555heel
147788itoo
and I want to group the entries that end with the same word and calculate a percentage for each group.
The output should be like that:
bool -- 50% (2 entries)
heel -- 25% (1 entry)
itoo -- 25% (1 entry)
Any clarification of my SQL would be much appreciated.
I didn't find anything like that around.
Well, keeping data in such a format is probably not the best approach; if possible, split the field into two separate ones.
First, you need to extract the string part from the end of the field.
If the length of the string or numeric part is fixed, then it's quite easy.
If not, you need a regular-expression replace, which MySQL versions before 8.0 unfortunately do not provide by default. There's a solution; check this question: How to do a regular expression replace in MySQL?
I'll assume that the numeric part has a fixed length of six characters:
SELECT s.str, CAST(count(s.str) AS decimal) / t.cnt * 100 AS pct
FROM (SELECT substr(entry, 7) AS str FROM data) AS s
JOIN (SELECT count(*) AS cnt FROM data) AS t ON 1=1
GROUP BY s.str, t.cnt;
If you have a regexp_replace function, then substr(entry, 7) should be replaced with regexp_replace(entry, '^[0-9]*', '') to achieve the required result.
Variant with substr can be tested here.
When sorting out problems like this, I would do it in two steps:
Sort out the SQL independently of the presentation language (PHP?).
Sort out the parameterization of the query and the presentation of the results after you know you've got the correct query.
Since this question is tagged 'SQL', I'm only going to address the first step.
The first step is to unclutter the query:
SELECT menucompare,
(COUNT(menucompare) * 100 / (SELECT COUNT(menucompare) FROM data WHERE ww = 6))
AS percentday
FROM data
WHERE ww > 0;
This removes the $ signs from most of the variable bits, and substitutes 6 for the button value. That makes it a bit easier to understand.
Your desired output seems to need the last four characters of the string held in menucompare for grouping and counting purposes.
The data to be aggregated would be selected by:
SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
The divisor in the percentage is the count of such rows, but the sub-stringing isn't necessary to count them, so we can write:
SELECT COUNT(*) FROM Data WHERE ww = 6
This is exactly what you have anyway.
The dividend in the percentage will be the group count of each substring.
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
) AS Week6
GROUP BY Last4
ORDER BY Last4;
When you've demonstrated that this works, you can re-parameterize the query and deal with the presentation of the results.
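As a cross-check for the expected numbers, the same grouping can be sketched in a few lines of Python. This is only a reference calculation on the four sample values from the question, assuming (as above) that the word is always the last four characters; the function name suffix_percentages is made up.

```python
from collections import Counter


def suffix_percentages(entries, suffix_len=4):
    """Map each trailing word to (count, percentage of all entries)."""
    counts = Counter(entry[-suffix_len:] for entry in entries)
    total = sum(counts.values())
    return {suffix: (count, 100.0 * count / total)
            for suffix, count in sorted(counts.items())}


entries = ["123456bool", "521478bool", "122555heel", "147788itoo"]

for suffix, (count, pct) in suffix_percentages(entries).items():
    print(f"{suffix} -- {pct:g}% ({count})")
```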