The idea is this: say you have the following table.
--------------
| oID | Area |
--------------
|  1  |  5   |
|  2  |  2   |
|  3  |  3   |
|  5  |  3   |
|  6  |  4   |
|  7  |  5   |
--------------
If grouping by continuity were possible, this pseudo-query
SELECT SUM(Area) FROM sample_table GROUP BY CONTINUITY(oID)
would return
-------------
| SUM(Area) |
-------------
|        10 |
|        12 |
-------------
The continuity break arises at oID 4, or rather from the lack of an entry representing oID 4.
Does such functionality exist within the standard functions of SQL?
There is no such functionality in "standard functions of SQL", but it is possible to get the desired result set by using some tricks.
With the subquery illustrated below we create a virtual field which you can use to GROUP BY in the outer query. The value of this virtual field is incremented each time there is a gap in the sequence of oID. This way we create an identifier for each of those "data islands":
SELECT SUM(Area), COUNT(*) AS Count_Rows
FROM (
    /* @group_enumerator is incremented each time there is a gap in oID continuity */
    SELECT @group_enumerator := @group_enumerator + (@prev_oID != oID - 1) AS group_enumerator,
           @prev_oID := oID AS prev_oID,
           sample_table.*
    FROM (
        SELECT @group_enumerator := 0,
               @prev_oID := -1
    ) vars,
    sample_table
    /* correct order is very important */
    ORDER BY
        oID
) q
GROUP BY
    group_enumerator
Test table and data generation:
CREATE TABLE sample_table (oID INT auto_increment, Area INT, PRIMARY KEY(oID));
INSERT INTO sample_table (oID, Area) VALUES (1,5), (2,2), (3,3), (5,3), (6,4), (7,5);
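For what it's worth, on MySQL 8.0 or later the user variables can be avoided entirely. A sketch of the same island trick using a window function: oID minus ROW_NUMBER() is constant within each unbroken run of oIDs, so it can serve directly as the island identifier.
-- MySQL 8.0+ sketch: oID - ROW_NUMBER() is constant within each island
SELECT SUM(Area) AS sum_area, COUNT(*) AS count_rows
FROM (
    SELECT Area,
           oID - ROW_NUMBER() OVER (ORDER BY oID) AS island
    FROM sample_table
) q
GROUP BY island;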
I need to thank Quassnoi for pointing out this trick in my related question ;-)
UPDATE: added test table and data and fixed duplicate column name in example query.
Here's a blog post that provides a very thorough explanation and example related to grouping by contiguous data. If you have any issues comprehending it or implementing it, I can attempt to provide an implementation for your problem.
I'm writing a cronjob that runs analysis on a flags table in my database, structured as such:
| id | item | def | time_flagged | time_resolved | status  |
+----+------+-----+--------------+---------------+---------+
|  1 |    1 | foo |   1519338608 |    1519620669 | MISSED  |
|  2 |    1 | bar |   1519338608 |        (NULL) | OPEN    |
|  3 |    2 | bar |   1519338608 |    1519620669 | IGNORED |
|  4 |    1 | foo |   1519620700 |        (NULL) | OPEN    |
For each distinct def, for each unique item, I want to get the "latest" row (IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.
For the above data set, I would like to select:
| def | item | time       | status  |
+-----+------+------------+---------+
| foo |    1 | 1519620700 | OPEN    |
| bar |    1 | 1519338608 | OPEN    |
| bar |    2 | 1519620669 | IGNORED |
Row 1 is not included because it's "overridden" by row 4, as both rows have the same def-item combination, and the latter has a more recent time.
The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.
How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions, but I don't see any that involve my specific circumstance of needing "nested grouping" across two columns.
You could try:
select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
    select MAX(IFNULL(time_resolved, time_flagged))
    from flags B
    where A.item = B.item
      and A.def = B.def
)
I know it's not the best approach, but it might work for you.
Do you mean 'for each unique def and each unique item'? If so, a GROUP BY on multiple columns seems like it would work (shown as a derived table t), joined back to the original table to grab the rest of the data. Since there is no actual time column, COALESCE(time_resolved, time_flagged) stands in for it:
select
    f.def,
    f.item,
    coalesce(f.time_resolved, f.time_flagged) as time,
    f.status
from
    flags f
join (select
          def,
          item,
          max(coalesce(time_resolved, time_flagged)) as time
      from flags
      group by def, item) t
on
    f.def = t.def and
    f.item = t.item and
    coalesce(f.time_resolved, f.time_flagged) = t.time
If you're on MySQL 8.0 or later, you can use a window function:
SELECT def, item, time, status
FROM (
    SELECT
        def,
        item,
        COALESCE(time_resolved, time_flagged) AS time,
        status,
        RANK() OVER (PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) AS MyRank -- Rank each (def, item) combination by "time"
    FROM MyTable
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping
If a (def, item) combo can have two rows with the same "time" value, then change RANK() to ROW_NUMBER(). This will guarantee you only get one row per grouping.
select f.def, f.item, a.time, f.status
from flags f
join (select
          def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
      from flags
      group by def, item) a
  on f.def = a.def and
     f.item = a.item and
     COALESCE(f.time_resolved, f.time_flagged) = a.time
If I have a MySQL table such as:
I want to use SQL to calculate the sum of the PositiveResult column and also the NegativeResult column. Normally I could simply do SUM(PositiveResult) in a query.
But what if I wanted to go a step further and place the totals in a row at the bottom of the result set:
Can this be achieved at the data level, or is it a presentation-layer issue? If it can be done in SQL, how might I do it? I am a bit of an SQL newbie.
Thanks to the respondents. I will now check things with the customer.
Also, can a text column be added so that the value of the last row of data is not shown in the summary row? Like this:
I would also do this in the presentation layer, but you can do it in MySQL...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,pos DECIMAL(5,2)
,neg DECIMAL(5,2)
);
INSERT INTO my_table VALUES
(1,0,0),
(2,1,-2.5),
(3,1.6,-1),
(4,1,-2);
SELECT COALESCE(id,'total') my_id,SUM(pos),SUM(neg) FROM my_table GROUP BY id WITH ROLLUP;
+-------+----------+----------+
| my_id | SUM(pos) | SUM(neg) |
+-------+----------+----------+
| 1     |     0.00 |     0.00 |
| 2     |     1.00 |    -2.50 |
| 3     |     1.60 |    -1.00 |
| 4     |     1.00 |    -2.00 |
| total |     3.60 |    -5.50 |
+-------+----------+----------+
5 rows in set (0.02 sec)
Here's a hack for the amended problem - it ain't pretty but I think it works...
-- assumes a text column `string` has been added to my_table to hold the label
SELECT COALESCE(id,'') my_id
     , SUM(pos)
     , SUM(neg)
     , COALESCE(string,'') n
FROM my_table
GROUP BY id
       , string
WITH ROLLUP
HAVING n <> '' OR my_id = ''
;
select keyword, sum(positiveResults) + sum(negativeResults)
from mytable
group by keyword
If you need the absolute value, use sum(abs(negativeResults)).
This should be handled at least one layer above the SQL query layer.
The initial query can fetch the detail info and then the application layer can calculate the aggregation (summary row). Or, a second db call to fetch the summary directly can be used (although this would be efficient only for cases where the calculation of the summary is very resource-intensive and a second db call is really necessary - most of the time the app layer can do it more efficiently).
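For the second-call variant, the summary is just a plain aggregate (a sketch; the question's table name isn't shown, so my_table is an assumption):
-- Summary row fetched in a separate call (table name assumed)
SELECT SUM(PositiveResult) AS total_positive,
       SUM(NegativeResult) AS total_negative
FROM my_table;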
The ordering/layout of the results (i.e. the detail rows followed by the "footer" summary row) should be handled at the presentation layer.
I'd recommend doing this at the presentation layer. To do something like this in SQL is also possible.
create table test (
keywordid int,
positiveresult decimal(10,2),
negativeresult decimal(10,2)
);
insert into test values
(1, 0, 0), (2, 1, -2.5), (3, 1.6, -1), (4, 1, -2);
select * from (
select keywordid, positiveresult, negativeresult
from test
union all
select null, sum(positiveresult), sum(negativeresult) from test
) main
order by
case when keywordid is null then 1000000 else keywordid end;
I added ordering using an arbitrarily high number when keywordid is null, to make sure the ordered recordset can be pulled easily by the view for display.
Result:
+-----------+----------------+----------------+
| keywordid | positiveresult | negativeresult |
+-----------+----------------+----------------+
| 1 | 0.00 | 0.00 |
| 2 | 1.00 | -2.50 |
| 3 | 1.60 | -1.00 |
| 4 | 1.00 | -2.00 |
| NULL | 3.60 | -5.50 |
+-----------+----------------+----------------+
In my projects I often need to store the result of a SELECT in another table (we call this a "resultset"). The reason is to dynamically display a large number of rows in a web application while loading only small chunks as necessary.
Typically, this is done by queries such as this one:
SET @counter := 0;
INSERT INTO resultsetdata
SELECT "12345", @counter := @counter + 1, a.ID
FROM sometable a
JOIN bigtable b
WHERE (a.foo = b.bar)
ORDER BY a.whatever DESC;
The fixed "12345" value is just a value to identify the "resultset" as a whole and changes for each query. The second column is a incrementing index counter that is meant to allow direct access to a specific row in the result and the ID column references the specific row in the source data table.
When the application needs a certain range of the result, I just join resultsetdata with the source table to get the detailed data. This is quick, as opposed to the resultsetdata query above, which may take 2-3 seconds to complete (which explains why I need this intermediary table).
The SELECT query itself is not relevant for this question.
resultsetdata has the following structure:
CREATE TABLE `resultsetdata` (
`ID` int(11) NOT NULL,
`ContIdx` int(11) NOT NULL,
`Value` int(11) NOT NULL,
PRIMARY KEY (`ID`,`ContIdx`)
) ENGINE=InnoDB;
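To illustrate the range fetch mentioned above (a sketch; sometable stands in for the real source table), pulling rows 100-119 of resultset 12345 is then a simple join on the primary key:
-- Fetch one "page" of resultset 12345; the (ID, ContIdx) primary key
-- makes the range scan cheap
SELECT s.*
FROM resultsetdata r
JOIN sometable s ON s.ID = r.Value
WHERE r.ID = 12345
  AND r.ContIdx BETWEEN 100 AND 119
ORDER BY r.ContIdx;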
This usually works like a charm but lately we noticed that in some cases the ORDER of the result is not correct. This depends on the query itself (for example, adding DISTINCT is a typical cause), the server version and the data contained in the source tables, so I guess one can say that the row order is unpredictable with this method. Probably it depends on internal optimizations.
However, the problem is now that I can't think of any alternative solution that gives me the expected result.
Since the resultset can reach several thousand rows, loading all the data into memory and then manually INSERTing it is not feasible.
Any suggestions?
EDIT: For further clarification, have a look at these queries:
DROP TABLE IF EXISTS test;
CREATE TABLE test (ID INT NOT NULL, PRIMARY KEY(ID)) ENGINE=InnoDB;
INSERT INTO test (ID) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
SET @counter:=0;
SELECT "12345", @counter:=@counter+1, ID
FROM test
ORDER BY ID DESC;
This produces the following result as "expected":
+-------+----------------------+----+
| 12345 | @counter:=@counter+1 | ID |
+-------+----------------------+----+
| 12345 | 1 | 10 |
| 12345 | 2 | 9 |
| 12345 | 3 | 8 |
| 12345 | 4 | 7 |
| 12345 | 5 | 6 |
| 12345 | 6 | 5 |
| 12345 | 7 | 4 |
| 12345 | 8 | 3 |
| 12345 | 9 | 2 |
| 12345 | 10 | 1 |
+-------+----------------------+----+
10 rows in set (0.00 sec)
As said, in some cases (I can't provide a testcase here, sorry), this may lead to a result similar to this:
+-------+----------------------+----+
| 12345 | @counter:=@counter+1 | ID |
+-------+----------------------+----+
| 12345 | 10 | 10 |
| 12345 | 9 | 9 |
| 12345 | 8 | 8 |
| 12345 | 7 | 7 |
| 12345 | 6 | 6 |
| 12345 | 5 | 5 |
| 12345 | 4 | 4 |
| 12345 | 3 | 3 |
| 12345 | 2 | 2 |
| 12345 | 1 | 1 |
+-------+----------------------+----+
I'm not saying this is a MySQL bug and I fully understand that my method currently provides unpredictable results. Still, I don't know how to tweak this to get predictable results.
This is because the order in which records are inserted has no bearing on the order in which they are retrieved.
When you retrieve them a query plan will be created. If no ORDER BY is specified in your SELECT statement then the order will depend on the query plan produced. This is why it is unpredictable and adding DISTINCT can change the order.
The solution is to store enough data that you can retrieve them in the correct order using an ORDER BY clause. In your case you have ordered your data by a.whatever. Can a.whatever be stored in resultsetdata? If so then you can read the records out in the correct order.
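If so, the change is mechanical (a sketch; the SortKey name and type are assumptions): persist the sort key with each row at insert time and repeat the ORDER BY when reading.
-- Add a column to hold the sort key (name and type assumed)
ALTER TABLE resultsetdata ADD COLUMN SortKey VARCHAR(255) NOT NULL DEFAULT '';

SET @counter := 0;
INSERT INTO resultsetdata
SELECT "12345", @counter := @counter + 1, a.ID, a.whatever
FROM sometable a
JOIN bigtable b
WHERE a.foo = b.bar
ORDER BY a.whatever DESC;

-- Reading back: the stored key reproduces the original order regardless
-- of the order in which the rows were actually inserted
SELECT Value
FROM resultsetdata
WHERE ID = 12345
ORDER BY SortKey DESC, ContIdx;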
Maybe you could wrap the select into another select:
SET @counter := 0;
INSERT INTO resultsetdata
SELECT tmp.rsid, @counter := @counter + 1, tmp.ID
FROM (
    SELECT "12345" AS rsid, a.ID
    FROM sometable a
    JOIN bigtable b
    WHERE a.foo = b.bar
    ORDER BY a.whatever DESC
) AS tmp
... but you are still at the mercy of the dumbness of MySQL's optimizer.
That's all I found about this topic, but I couldn't find a hard guarantee:
Pure-SQL Technique for Auto-Numbering Rows in Result Set
http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/
http://www.xaprb.com/blog/2005/09/27/simulating-the-sql-row_number-function/
I'm currently working with a database table that is structured as follows:
___________________________
| id | content | next_id  |
|----|---------|----------|
| 1  | (value) | 4        |
| 2  | (value) | 1        |
| 3  | (value) | (NULL)   |
| 4  | (value) | 3        |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
The value of the next_id field defines the id of the row of data that should follow it. A value of NULL means that no row follows it.
Is there a way I can query the database in such a way that in the resulting rows will be ordered using this method? For example, in the case I gave above, the rows should be returned ordered so that the ids are in this order: 2, 1, 4, 3. I'm looking for a solution that can do this regardless of the number of rows in this sequence.
I know that it is possible to reorder the results after retrieving them from the database (using the programming language I'm working with), but I'm hoping that there is a way that I can do it in SQL.
I can't see a solution without as many self-joins as you have rows. Instead, I would build a nested set out of it in a temp table using a push-down stack algorithm and then retrieve the full tree.
I've got something that's close.
/* one select to init the @next variable to the first row */
select @next := id from table1 order by isnull(next_id) asc, next_id asc limit 1;
select distinct a.id, a.next_id from table1 b
inner join
(
    select @rank := id as id, @next := next_id as next_id from table1
    where id = @next
) a
on (b.id = b.id);
This outputs
+----+---------+
| id | next_id |
+----+---------+
| 2 | 1 |
| 1 | 4 |
And then stops. If only I could find a way for it to continue....
Anyway, this sort of force-feeding of values into a query is dodgy enough when doing ranking, let alone for this sort of thing, so maybe I'm going down a dead end.
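For the record, on MySQL 8.0 or later a recursive CTE can walk the chain directly (a sketch, reusing the table1 name from the attempt above): anchor on the head row, i.e. the one no next_id points to, then repeatedly follow next_id.
WITH RECURSIVE chain (id, content, next_id, pos) AS (
    -- anchor: the head of the list (no other row points to it)
    SELECT t.id, t.content, t.next_id, 1
    FROM table1 t
    WHERE NOT EXISTS (SELECT 1 FROM table1 x WHERE x.next_id = t.id)
    UNION ALL
    -- step: follow next_id to the next row in the chain
    SELECT t.id, t.content, t.next_id, c.pos + 1
    FROM chain c
    JOIN table1 t ON t.id = c.next_id
)
SELECT id, content
FROM chain
ORDER BY pos;
For the sample data this returns the rows in the order 2, 1, 4, 3.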
I've got a table in MySQL that looks roughly like:
value | count
-------------
Fred  |     7
FRED  |     1
Roger |     3
roger |     1
That is, it was created with string ops outside of MySQL, so the values are case- and trailing-whitespace-sensitive.
I want it to look like:
value | count
-------------
Fred  |     8
Roger |     4
That is, managed by MySQL, with value a primary key. It's not important which one (of "Fred" or "FRED") is kept.
I know how to do this in code. I also know how to generate a list of problem values (with a self-join). But I'd like to come up with a SQL update/delete to migrate my table, and I can't think of anything.
If I knew that no pair of records had variants of one value, with the same count (like ("Fred",4) and ("FRED",4)), then I think I can do it with a self-join to copy the counts, and then an update to remove the zeros. But I have no such guarantee.
Is there something simple I'm missing, or is this one of those cases where you just write a short function outside of the database?
Thanks!
As an example of how to obtain the results you are looking for with a SQL query alone:
SELECT UPPER(value) AS name, SUM(count) AS qty FROM `table` GROUP BY name;
If you make a new table to hold the correct values, you can INSERT the results of the above query to populate the new table like so:
INSERT INTO newtable
SELECT UPPER(value) AS name, SUM(count) AS qty FROM `table` GROUP BY name;
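Building on that, a complete migration might look like this (a sketch; the newtable definition, column sizes, and collation are assumptions): the new table makes value a case-insensitive primary key so duplicates cannot come back, TRIM handles the trailing-whitespace variants, and MIN() keeps one arbitrary casing variant, per "not important which one".
-- Case-insensitive primary key prevents future duplicates (sizes assumed)
CREATE TABLE newtable (
    value VARCHAR(64) COLLATE utf8mb4_general_ci NOT NULL PRIMARY KEY,
    count INT NOT NULL
);

INSERT INTO newtable (value, count)
SELECT MIN(TRIM(value)), SUM(count)
FROM `table`
GROUP BY UPPER(TRIM(value));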
Strangely, MySQL seems to do this for you. I just tested this in MySQL 5.1.47:
create table c (value varchar(10), count int);
insert into c values ('Fred',7), ('FRED',1), ('Roger',3), ('roger',1);
select * from c;
+-------+-------+
| value | count |
+-------+-------+
| Fred | 7 |
| FRED | 1 |
| Roger | 3 |
| roger | 1 |
+-------+-------+
select value, sum(count) from c group by value;
+-------+------------+
| value | sum(count) |
+-------+------------+
| Fred | 8 |
| Roger | 4 |
+-------+------------+
I was surprised to see MySQL transform the strings like that; I was expecting to get four distinct rows, and to have to use some string functions to map the values to a canonical form. The explanation is the column's collation: the default (latin1_swedish_ci here) is case-insensitive, so GROUP BY treats 'Fred' and 'FRED' as the same value.
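Conversely, if the case-insensitive merge is ever not what you want, forcing a binary comparison brings the four distinct rows back (a sketch against the same test table, on the 5.1 defaults used above):
-- BINARY forces a byte-wise, case-sensitive comparison
select value, sum(count) from c group by binary value;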