I have read answers to similar questions and solved my problem based on them. However, my solution is rather complex: it involves two-step views and an additional table with a left join. I wonder if there is a simpler, more elegant solution to the problem illustrated below. The table Test has a column B that needs to be summed per value of A, counting only rows where column Marker has a value of 4 or over. The table TestResult is the desired output of the query. As I say, I do have an awkward, complex solution, but I'm curious whether there is a simple, elegant method.
mysql> select * FROM Test;
+----+------+------+--------+
| ID | A | B | Marker |
+----+------+------+--------+
| 1 | 11 | 2 | 4 |
| 2 | 11 | 1 | 5 |
| 3 | 14 | 4 | 4 |
| 4 | 11 | 2 | 1 |
| 5 | 12 | 2 | 2 |
| 6 | 13 | 2 | 3 |
| 7 | 14 | 2 | 2 |
+----+------+------+--------+
7 rows in set (0.00 sec)
mysql> select * FROM TestResult;
+----+------+
| A | SumB |
+----+------+
| 11 | 3 |
| 12 | 0 |
| 13 | 0 |
| 14 | 4 |
+----+------+
4 rows in set (0.00 sec)
Could be achieved by using group by and case:
select A, sum(case when Marker >= 4 then B else 0 end) as SumB
from Test
group by A;
Check out this SQL Fiddle
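As a quick check of the query above, here is a minimal sketch that runs it against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The table and column names are the ones from the question, and the ORDER BY is added just to make the row order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ID INTEGER, A INTEGER, B INTEGER, Marker INTEGER)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?, ?)",
    [(1, 11, 2, 4), (2, 11, 1, 5), (3, 14, 4, 4), (4, 11, 2, 1),
     (5, 12, 2, 2), (6, 13, 2, 3), (7, 14, 2, 2)],
)

# Conditional aggregation: rows with Marker < 4 contribute 0 to the sum,
# but their A value still forms a group, so every A appears in the result.
rows = conn.execute(
    """
    SELECT A, SUM(CASE WHEN Marker >= 4 THEN B ELSE 0 END) AS SumB
    FROM Test
    GROUP BY A
    ORDER BY A
    """
).fetchall()

for a, sum_b in rows:
    print(a, sum_b)
```

The groups for A = 12 and A = 13 survive with a sum of 0 because the filter lives inside the SUM rather than in a WHERE clause.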
Edit: I originally overlooked the "or over" part of the question; leading me to think a different (more difficult to reach) answer was desired. I've corrected the condition, but left the solution for the more difficult answer since someone has apparently found it helpful.
Using the LEFT JOIN ensures you get all A values from Test; the subselect gives you an indicator of which A values have rows satisfying the condition.
SELECT t0.A, IF(t1.A IS NULL, 0, SUM(t0.B)) AS sumB
FROM Test AS t0
LEFT JOIN (SELECT DISTINCT A FROM Test WHERE Marker >= 4) AS t1
ON t0.A = t1.A
GROUP BY t0.A, t1.A;
Alternatively, this is the most succinct, but it might be less reliable or even rejected through some types of connections.
SELECT A, IF(SUM(IF(Marker>=4,1,0))>0, SUM(B),0) AS sumB
FROM Test
GROUP BY A;
I'd do it like this. Start with this:
SELECT t.A
FROM Test t
GROUP BY t.A
That's easy enough.
For the second column, we want a SUM of an expression...
SELECT t.A
, SUM(expr) AS SumB
FROM Test t
GROUP BY t.A
The trick is to use an expression that performs a conditional test to determine whether the value of column B should be returned or not. For example, in MySQL-specific syntax, something like:
IF(t.Marker >= 4, t.B, 0)
Or, a more ANSI standards compliant equivalent:
CASE WHEN t.Marker >= 4 THEN t.B ELSE 0 END
Putting that all together...
SELECT t.A
, SUM(IF(t.Marker >= 4, t.B, 0)) AS SumB
FROM Test t
GROUP BY t.A
EDIT
What if there are rows like this in Test
+----+------+------+--------+
| ID | A | B | Marker |
+----+------+------+--------+
| 1 | 11 | 2 | 4 |
| 2 | 11 | 1 | 5 |
| 17 | 11 | 4 | 1 |
My question is whether the value of B from the row with a Marker value of 1 (which is not greater than or equal to 4) should be included in the SUM.
With the query above, you'd get a SumB value of 3. Only the values from the first two rows are included in the SUM.
If you need to return a value of 7 for SumB (including that third row), then you'll need a different query pattern.
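To make the difference between the two readings concrete, the sketch below runs both patterns on just the three rows shown above. It is an illustration under assumptions: Python's sqlite3 module stands in for MySQL, and the MySQL IF() calls are written as CASE expressions so the SQL stays portable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ID INTEGER, A INTEGER, B INTEGER, Marker INTEGER)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?, ?)",
    [(1, 11, 2, 4), (2, 11, 1, 5), (17, 11, 4, 1)],
)

# Reading 1: only rows that satisfy the condition contribute to the sum.
per_row = conn.execute(
    """
    SELECT A, SUM(CASE WHEN Marker >= 4 THEN B ELSE 0 END) AS SumB
    FROM Test
    GROUP BY A
    """
).fetchall()

# Reading 2: if any row in the group satisfies the condition,
# sum B over the whole group; otherwise return 0.
per_group = conn.execute(
    """
    SELECT A,
           CASE WHEN SUM(CASE WHEN Marker >= 4 THEN 1 ELSE 0 END) > 0
                THEN SUM(B) ELSE 0 END AS SumB
    FROM Test
    GROUP BY A
    """
).fetchall()

print("per-row filter:  ", per_row)
print("per-group filter:", per_group)
```

The first query yields 3 for A = 11 and the second yields 7, which are the two values discussed above.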
Related
Is there a way to limit a query in discrete groups? For example, let's say I have this query below.
| col1 | col2 |
---------------
| 1 | A |
| 1 | B |
| 2 | C |
| 2 | D |
| 3 | E |
| 3 | F |
I want the limit on this query to be 5 rows. However, I only want it to show discrete complete groups based on the first column. So that means I don't want to show (3, E) since (3, F) would be cut off. So it would only show the first 4 rows.
Is there a way to write this dynamic logic into a MySQL query?
Count rows in a subquery:
select col1, col2
from mytable m1
where (select count(*) from mytable m2 where m2.col1 <= m1.col1) <= 5
order by col1, col2;
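Here is a small sketch that runs the query above on the six sample rows, assuming the table is called mytable as in the answer and using Python's sqlite3 module in place of MySQL. For each row, the subquery counts how many rows belong to that row's group or to earlier groups; a group is kept only if that running total still fits within the limit of 5.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "C"), (2, "D"), (3, "E"), (3, "F")],
)

# Running totals per group: col1 = 1 -> 2 rows, col1 = 2 -> 4 rows,
# col1 = 3 -> 6 rows. Only groups whose total is <= 5 are returned.
rows = conn.execute(
    """
    SELECT col1, col2
    FROM mytable m1
    WHERE (SELECT COUNT(*) FROM mytable m2 WHERE m2.col1 <= m1.col1) <= 5
    ORDER BY col1, col2
    """
).fetchall()

print(rows)
```

Group 3 is dropped as a whole, so (3, E) never appears without (3, F).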
Full disclosure, I'm a noob at SQL
Given two sparse matrices A and B, defined as:
A(row_number, column_number, value) and B(row_number, column_number, value)
I don't understand how this query represents the multiplication of the two matrices:
SELECT A.row_number, B.column_number, SUM(A.value * B.value)
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number
My confusion lies in the SUM syntax and the GROUP BY / SELECT syntax
So for my GROUP BY / SELECT confusion, I don't understand why the expressions
A.row_number and B.column_number are necessary after the SELECT statement
Why do we have to specify that when we're already using SELECT and WHERE ? To me that seems like we're saying we want to SELECT using those expressions (A.row_number and B.column_number) even though we're given back a table from WHERE already. Would it not make more sense to just say SELECT * ? I'm assuming that GROUP BY just requires you to type out the expressions it uses in the SELECT statement, but I don't know for sure.
For the SUM, I just want to clarify, the SUM is only using the A.value and the B.value from whatever is returned by the WHERE correct? Otherwise, you would be multiplying all A.value with all B.value.
Clarifying either of these would be immensely helpful. Thank you!
create table A
( column_number int,
row_number int,
value int
);
create table B
( column_number int,
row_number int,
value int
);
insert A (column_number,row_number,value) values (1,1,1),(1,2,2),(2,1,3),(2,2,4);
insert B (column_number,row_number,value) values (1,1,10),(1,2,20),(2,1,30),(2,2,40);
Data with your old-style (non-explicit) join, without aggregate or GROUP BY:
SELECT A.row_number as Ar, B.column_number as Bc,
A.value as Av,B.value as Bv,A.value*B.value as product
FROM A, B
WHERE A.column_number = B.row_number
+------+------+------+------+---------+
| Ar | Bc | Av | Bv | product |
+------+------+------+------+---------+
| 1 | 1 | 1 | 10 | 10 |
| 2 | 1 | 2 | 10 | 20 |
| 1 | 1 | 3 | 20 | 60 |
| 2 | 1 | 4 | 20 | 80 |
| 1 | 2 | 1 | 30 | 30 |
| 2 | 2 | 2 | 30 | 60 |
| 1 | 2 | 3 | 40 | 120 |
| 2 | 2 | 4 | 40 | 160 |
+------+------+------+------+---------+
Having seen the above, the query below becomes a little clearer:
SELECT A.row_number, B.column_number,sum(A.value * B.value) as theSum
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number
+------------+---------------+--------+
| row_number | column_number | theSum |
+------------+---------------+--------+
| 1 | 1 | 70 |
| 1 | 2 | 150 |
| 2 | 1 | 100 |
| 2 | 2 | 220 |
+------------+---------------+--------+
Qualifying a column with its table name in the SELECT list identifies which table it comes from. This is mainly useful when both tables have the same column names, as they do here.
GROUP BY aggregates the data and returns one record per grouped-by value. That is, in your case, you'll end up with only one record per row-column combination.
By definition, multiplication of two matrices A(n,m) and B(m,p) produces a matrix C(n,p).
So the SQL for multiplication should return the same data structure as was used for storage of A and B, which is three columns:
row_number
column_number
value
with one value per (row, column) combination.
This is why you need the first two in the GROUP BY clause.
The WHERE clause is independent of SELECT. The first is responsible for getting the right records, the second for getting the right columns.
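One way to convince yourself that the query really is matrix multiplication is to compare it with the textbook definition C[i][j] = sum over k of A[i][k] * B[k][j]. The sketch below does that with the sample data from the answer above. It is only an illustration: Python's sqlite3 module stands in for MySQL, and the dictionaries a, b and expected are helper names introduced here.

```python
import sqlite3

# Sample data from the answer above, as (column_number, row_number, value).
a_rows = [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4)]
b_rows = [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 2, 40)]

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
for name, data in (("A", a_rows), ("B", b_rows)):
    conn.execute(
        f"CREATE TABLE {name} (column_number INTEGER, row_number INTEGER, value INTEGER)"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", data)

# The query from the question: the WHERE pairs A[i][k] with B[k][j],
# and GROUP BY collects all k for one output cell (i, j).
sql_result = {
    (i, j): total
    for i, j, total in conn.execute(
        """
        SELECT A.row_number, B.column_number, SUM(A.value * B.value)
        FROM A, B
        WHERE A.column_number = B.row_number
        GROUP BY A.row_number, B.column_number
        """
    )
}

# Textbook definition, computed directly in Python for comparison.
a = {(row, col): val for col, row, val in a_rows}
b = {(row, col): val for col, row, val in b_rows}
expected = {
    (i, j): sum(a[(i, k)] * b[(k, j)] for k in (1, 2))
    for i in (1, 2)
    for j in (1, 2)
}

print(sql_result == expected)
print(sql_result)
```

The join condition plays the role of the shared index k, and the two GROUP BY columns are the indices i and j of the output cell.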
This seems to be a convoluted problem, but I'll try my best to articulate the idea and illustrate a scenario. Essentially I have two tables that need to be combined and returned as the result set for a single query. One table needs to be merged into the other in a specific order.
Say table one is called Articles and table two is called Features. Both tables have an ID field with unique numbers. Articles has a date field which will be used to initially sort its records in descending order. The Features table has a Delta field which will be used initially to sort its records. Some of the records in the Features table are placeholders and are not meant to be included in the final set. Their only purpose is to affect the sort order. Each record has a unique value in the Delta field, from 1 - X, which will be used to sort these records. Another field called Skip has a value of 1 if the record should be eliminated when merging the two tables together. Again, the only purpose of the skipped records is to take up space during the initial sort on the Features table. Even though they are unnecessary, they exist and can't be deleted.
The tricky part is that when the results from both tables are merged, any non-skipped records from the Features table need to be inserted into the results from the Articles table in the exact order they appear in the Features table.
So let's say I have 6 records in the Features table, A - F, and the order (Delta) field ranges from 1 - 6. Records A, B, D and E all have a value of 1 in the Skip field. That means I'm only interested in records C and F, both of which need to be inserted into the final record set, in positions 3 and 6 respectively.
The records may look something like this for the Articles table:
+----+------------+
| id | date |
+----+------------+
| 1 | 9999999999 |
+----+------------+
| 2 | 9999999998 |
+----+------------+
| 3 | 9999999997 |
+----+------------+
| 4 | 9999999996 |
+----+------------+
| 5 | 9999999995 |
+----+------------+
| 6 | 9999999994 |
+----+------------+
| 7 | 9999999993 |
+----+------------+
| 8 | 9999999992 |
+----+------------+
| 9 | 9999999991 |
+----+------------+
| 10 | 9999999990 |
+----+------------+
The Features table may look something like this:
+----+------+-------+------+
| id | name | delta | skip |
+----+------+-------+------+
| 11 | A | 1 | 1 |
+----+------+-------+------+
| 12 | B | 2 | 1 |
+----+------+-------+------+
| 13 | C | 3 | 0 |
+----+------+-------+------+
| 14 | D | 4 | 1 |
+----+------+-------+------+
| 15 | E | 5 | 1 |
+----+------+-------+------+
| 16 | F | 6 | 0 |
+----+------+-------+------+
The results would look something like this (not including any additional fields that might be needed to achieve my goal):
+----+
| id |
+----+
| 1 |
+----+
| 2 |
+----+
| 13 | (record from the Features table in the third position)
+----+
| 3 |
+----+
| 4 |
+----+
| 16 | (record from the Features table in the sixth position)
+----+
| 5 |
+----+
| 6 |
+----+
| 7 |
+----+
| 8 |
+----+
| 9 |
+----+
| 10 |
+----+
Hope my explanation makes sense. Any ideas?
Thanks,
Howie
I assume that there is a mistake in your example: the record with id=16 is the sixth row in the Features table, so it should come after id=5 in the results, not before.
Try the query below. Here is SQLFiddle.
select id from (
select `date`, null delta, id
from Articles
union all
select a.`date`, f.delta, f.id
from (
select (@x:=@x+1) rn, a.*
from Articles a, (select @x:=0) z
order by a.`date` desc
) a
join (
select (@y:=@y+1) rn, f.id, f.delta, f.skip
from Features f, (select @y:=0) z
order by f.delta
) f
on a.rn = f.rn
where f.skip <> 1
order by `date` desc, isnull( delta ), delta
) merge
Looks like this example in SQL Fiddle did it for me.
SELECT id, sort_order FROM (
SELECT `date`, NULL delta, id, (@a_count:=@a_count+1) sort_order
FROM Articles a_main, (SELECT @a_count:=-1) z
UNION ALL
SELECT a.`date`, f.delta, f.id, f.weighted_rn
FROM (
SELECT (@x:=@x+1) rn, a.*
FROM Articles a, (SELECT @x:=-1) z
ORDER BY a.`date` DESC
) a
JOIN (
SELECT (@y:=@y+1) rn, TRUNCATE((f.delta - @y - (1/@y)),2) AS weighted_rn, f.id, f.delta, f.skip
FROM Features f, (SELECT @y:=-1) z
WHERE f.skip <> 1
ORDER BY f.delta
) f
ON a.rn = f.rn
ORDER BY sort_order
) merge
Thanks to Kordirko for the framework.
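For reference, the positional merge the question asks for (non-skipped Features land at the position given by their Delta) can be sketched outside SQL. This is not a translation of either query above, just a plain Python illustration of the target ordering, using the sample rows from the question; the variable names are made up for the example.

```python
# Sample data from the question.
articles = [(i, 10_000_000_000 - i) for i in range(1, 11)]  # (id, date)
features = [  # (id, name, delta, skip)
    (11, "A", 1, 1), (12, "B", 2, 1), (13, "C", 3, 0),
    (14, "D", 4, 1), (15, "E", 5, 1), (16, "F", 6, 0),
]

# Start with the article ids, newest first.
merged = [art_id for art_id, _ in sorted(articles, key=lambda r: r[1], reverse=True)]

# Walk the features in Delta order and drop each non-skipped one into
# the 1-based position named by its Delta. Processing in ascending order
# means earlier insertions never push a later feature off its slot.
for feat_id, _name, delta, skip in sorted(features, key=lambda r: r[2]):
    if skip != 1:
        merged.insert(delta - 1, feat_id)

print(merged)
```

With this reading, ids 13 and 16 end up in positions 3 and 6, which is the ordering shown in the question's expected result.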
What I would like to do is select a specific set of rows from one table (table A) and join with another table (table B), such that only one record will appear from table A, joined with the most recent record from table B, based on a datetime column.
For example, table A has this structure (heavily simplified):
id | col_1 | col_2
---+-----------+----------------
1 | something | something else
2 | val_1 | val_2
3 | stuff | ting
4 | goats | sheep
And table B looks like this:
id | fk_A | datetime_col | col_3
---+-----------+---------------------+--------
1 | 1 | 2012-02-01 15:42:14 | Note 1
2 | 1 | 2012-02-02 09:46:54 | Note 2
3 | 1 | 2011-11-14 11:18:32 | Note 3
4 | 2 | 2009-04-30 16:49:01 | Note 4
5 | 4 | 2013-06-21 15:42:14 | Note 5
6 | 4 | 2011-02-01 18:44:24 | Note 6
What I would like is a result set that looks like this:
id | col_1 | col_2 | datetime_col | col_3
---+-----------+----------------+---------------------+--------
1 | something | something else | 2012-02-02 09:46:54 | Note 2
2 | val_1 | val_2 | 2009-04-30 16:49:01 | Note 4
3 | stuff | ting | NULL | NULL
4 | goats | sheep | 2013-06-21 15:42:14 | Note 5
So you can see that table B has been joined with table A on B.fk_A = A.id, but only the most recent corresponding record from B has been included in the results.
I have tried various combinations of SELECT DISTINCT, LEFT JOIN and sub-queries and I just can't get it to work, I either get no results or something like this:
id | col_1 | col_2 | datetime_col | col_3
---+-----------+----------------+---------------------+--------
1 | something | something else | 2012-02-01 15:42:14 | Note 1
1 | something | something else | 2012-02-02 09:46:54 | Note 2
1 | something | something else | 2011-11-14 11:18:32 | Note 3
2 | val_1 | val_2 | 2009-04-30 16:49:01 | Note 4
3 | stuff | ting | NULL | NULL
4 | goats | sheep | 2013-06-21 15:42:14 | Note 5
4 | goats | sheep | 2011-02-01 18:44:24 | Note 6
...with the records from table A repeated.
Obviously my SQL-fu is just not good enough for this task, so I would be most grateful if one of you kind people could point me in the right direction. I have done quite a bit of Googling and searching around SO and I have not found anything that matches this specific task, although I am sure the question has been asked before - I suspect there is an SQL keyword that I am forgetting/unaware of and if I searched for that I would find the answer instantly.
I think this question deals with the same problem although I am not 100% sure and the accepted answer involves SELECT TOP, which I thought (?) was not valid in MySQL.
As my actual query is much more complicated and joins several tables, I shall show it in case it makes any difference to how this is done:
SELECT `l` . * , `u`.`name` AS 'owner_name', `s`.`name` AS 'acquired_by_name', `d`.`type` AS `dtype` , `p`.`type` AS `ptype`
FROM `leads` l
LEFT JOIN `web_users` u ON `u`.`id` = `l`.`owner`
LEFT JOIN `web_users` s ON `s`.`id` = `l`.`acquired_by`
LEFT JOIN `deal_types` d ON `d`.`id` = `l`.`deal_type`
LEFT JOIN `property_types` p ON `p`.`id` = `l`.`property_type`
This query works and returns the data I want (sometimes I also add a WHERE clause but this works fine), but I would now like to:
LEFT JOIN `notes` n ON `n`.`lead_id` = `l`.`id`
...where notes contains the "many records" and leads contains the "one record" they relate to.
It should also be noted that potentially I would also want to return the oldest record (in a different query) but I imagine this will be a simple case of inverting an ASC/DESC somewhere, or something similarly easy.
I think this will help you:
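-- Caution: this relies on GROUP BY keeping the first row of the ordered
-- subquery, which MySQL does not guarantee; it may also be rejected
-- when ONLY_FULL_GROUP_BY is enabled.
SELECT A.id, A.col_1, A.col_2, A.datetime_col, A.col_3
FROM
(SELECT B.id, B.col_1, B.col_2, C.datetime_col, C.col_3
-- join each tableA row to its tableB rows through the foreign key fk_A
FROM tableA B LEFT OUTER JOIN tableB C ON B.id = C.fk_A
ORDER BY C.datetime_col desc) as A
GROUP BY A.id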
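Grouping over an ordered subquery depends on which row the server happens to keep for each group. A commonly used alternative that does not depend on row order is to put a correlated MAX() subquery in the join condition. The sketch below is one possible version, not the only one: it assumes the tables are named tableA and tableB as in the answer above, and it runs on Python's sqlite3 module instead of MySQL so that it is self-contained.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, col_1 TEXT, col_2 TEXT)")
conn.execute(
    "CREATE TABLE tableB (id INTEGER, fk_A INTEGER, datetime_col TEXT, col_3 TEXT)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [(1, "something", "something else"), (2, "val_1", "val_2"),
     (3, "stuff", "ting"), (4, "goats", "sheep")],
)
conn.executemany(
    "INSERT INTO tableB VALUES (?, ?, ?, ?)",
    [(1, 1, "2012-02-01 15:42:14", "Note 1"),
     (2, 1, "2012-02-02 09:46:54", "Note 2"),
     (3, 1, "2011-11-14 11:18:32", "Note 3"),
     (4, 2, "2009-04-30 16:49:01", "Note 4"),
     (5, 4, "2013-06-21 15:42:14", "Note 5"),
     (6, 4, "2011-02-01 18:44:24", "Note 6")],
)

# Keep a tableB row only if its timestamp is the latest one for that
# tableA id. The LEFT JOIN keeps tableA rows that have no notes at all.
rows = conn.execute(
    """
    SELECT a.id, a.col_1, a.col_2, b.datetime_col, b.col_3
    FROM tableA a
    LEFT JOIN tableB b
      ON b.fk_A = a.id
     AND b.datetime_col = (SELECT MAX(b2.datetime_col)
                           FROM tableB b2
                           WHERE b2.fk_A = a.id)
    ORDER BY a.id
    """
).fetchall()

for row in rows:
    print(row)
```

Swapping MAX for MIN returns the oldest record instead. If two tableB rows for the same tableA id share the exact same latest timestamp, both are returned, so a tie-breaker on another column would be needed in that case.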
i've got this lil' mysql table:
+----+-------+
| id | value |
+----+-------+
| 1 | 1240 |
| 2 | 1022 |
| 3 | 802 |
| .. | .. |
+----+-------+
i'm searching for a sql-query summing up the difference between the rows:
difference of row 1 and 2 + difference of row 2 and 3 + ...
is that possible with sql?
Sure! Your query will look something like this:
SELECT a.id,
b.VALUE - a.VALUE difference
FROM mytable a
JOIN mytable b
ON b.id = a.id + 1
The idea is to join the table with itself offset by one row -- then you can do math with values that were originally on adjacent rows.
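The query above returns one difference per pair of adjacent rows; wrapping the expression in SUM() gives the single total the question asks for. The sketch below shows both steps on the three visible sample rows. It assumes the ids are consecutive with no gaps (the join on id + 1 needs that), that the table is called mytable as in the answer, and it uses Python's sqlite3 module in place of MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1240), (2, 1022), (3, 802)],
)

# Step 1: the self-join pairs every row with its successor (id + 1).
differences = conn.execute(
    """
    SELECT a.id, b.value - a.value AS difference
    FROM mytable a
    JOIN mytable b ON b.id = a.id + 1
    ORDER BY a.id
    """
).fetchall()

# Step 2: aggregate the per-pair differences into one number.
(total,) = conn.execute(
    """
    SELECT SUM(b.value - a.value)
    FROM mytable a
    JOIN mytable b ON b.id = a.id + 1
    """
).fetchone()

print(differences)
print(total)
```

Because the intermediate values cancel out, the total equals the last value minus the first (802 - 1240 here), so for a signed sum the self-join is not strictly necessary.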