Find Rows Using Nested Count, Join, or Having - mysql

I have data in a table that might look like so:
id | streetnum | name | item
-----------------------------
1 | 100 | a | 0
2 | 100 | b | NULL
3 | 100 | c | NULL
4 | 101 | d | NULL
5 | 101 | e | NULL
6 | 102 | f | 1
I'm trying to put together a query that identifies the streetnum values whose rows include both a non-NULL item and one or more NULL items. In the example above, the query's result should be:
100
My first instinct is to put together a nested query involving count(*) but I want to see what other ideas you guys come up with.

Also possible with a self join:
SELECT DISTINCT a1.streetnum FROM atable AS a1, atable AS a2 WHERE a1.streetnum = a2.streetnum AND a1.item IS NULL AND a2.item IS NOT NULL;

Here is a query that works in SQL Server. I haven't tested the syntax in MySQL.
SELECT streetnum FROM YourTable
WHERE streetnum IN
(SELECT streetnum FROM YourTable
WHERE item IS NULL
GROUP BY streetnum)
AND streetnum IN
(SELECT streetnum FROM YourTable
WHERE item IS NOT NULL
GROUP BY streetnum)
GROUP BY streetnum

SELECT streetnum
FROM atable
GROUP BY streetnum
HAVING MAX(item) IS NOT NULL
AND COUNT(CASE WHEN item IS NULL THEN 1 END) > 0
MAX(item) can be replaced by MIN(item), SUM(item), or AVG(item), since each of them returns NULL only when every item in the group is NULL. That half of the condition can also be written as COUNT(item) > 0.
The trickier part is detecting the presence of NULLs. Aggregates skip NULLs, so you have to use CASE to turn each NULL into a value first. Once it is a value, you can COUNT or SUM it (MAX, MIN etc. would do as well).
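To see the HAVING approach run end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), loads the sample rows from the question, and uses the COUNT(item) > 0 variant mentioned above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for runnability);
# the GROUP BY / HAVING logic is the same as in the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atable (id INTEGER, streetnum INTEGER, name TEXT, item INTEGER)"
)
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?, ?)",
    [(1, 100, "a", 0), (2, 100, "b", None), (3, 100, "c", None),
     (4, 101, "d", None), (5, 101, "e", None), (6, 102, "f", 1)],
)

query = """
SELECT streetnum
FROM atable
GROUP BY streetnum
HAVING COUNT(item) > 0                                -- at least one non-NULL item
   AND COUNT(CASE WHEN item IS NULL THEN 1 END) > 0   -- at least one NULL item
"""
mixed_streets = [row[0] for row in conn.execute(query)]
print(mixed_streets)
```

Only streetnum 100 has both kinds of rows: 101 has no non-NULL item and 102 has no NULL item.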

Related

How to add up rows and return one of them in MySQL

I need to retrieve rows that have a numeric or NULL value in the HomeID column and, when several rows share the same symbol, return the one with a numeric HomeID.
// my table
+--------+---------+-------+
| Symbol | Home ID | Value |
+--------+---------+-------+
| test | 1 | value |
| test | NULL | value |
| test1 | 2 | value |
| test2 | 3 | value |
+--------+---------+-------+
So far I have something like this. It counts the rows per symbol, but I don't know how to return the row I need:
SELECT
[Symbol],
COUNT(*) AS CNT
FROM [DB].[dbo].[Table]
GROUP BY
[Symbol]
HAVING COUNT(*) > 1;
One method is:
select t.*
from t
where t.homeid is not null or
not exists (select 1
from t t2
where t2.symbol = t.symbol and t2.homeid is not null
);
You can also do this with aggregation, if you just have these three columns and you want exactly one row per symbol:
select symbol, max(homeid) as homeid,
coalesce(max(case when homeid is not null then value end),
max(value)
) as value
from t
group by symbol;
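A small runnable sketch of the aggregation variant follows. It assumes SQLite (via Python's sqlite3) in place of the asker's database, and the distinct value strings and the extra test3 row are made up for illustration so that the effect of the CASE is visible.

```python
import sqlite3

# SQLite stands in for the real database (assumption). The value strings and
# the 'test3' row are hypothetical, chosen so the result shows which row won.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (symbol TEXT, homeid INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("test", 1, "with_id"), ("test", None, "without_id"),
     ("test1", 2, "x"), ("test2", 3, "y"), ("test3", None, "only_null")],
)

query = """
SELECT symbol,
       MAX(homeid) AS homeid,
       COALESCE(MAX(CASE WHEN homeid IS NOT NULL THEN value END),  -- prefer rows with a homeid
                MAX(value)) AS value                               -- fall back to any row
FROM t
GROUP BY symbol
ORDER BY symbol
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For symbol test the value comes from the row that has a homeid, even though 'without_id' would win a plain MAX(value); test3 falls back to its only row.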

Mysql IN function

class_table
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1,2,3,4 |
+----+-------+--------------+
student_mark
+----+----------+--------+
| id |student_id| marks |
+----+----------+--------+
| 1 | 1 | 12 |
+----+----------+--------+
| 2 | 2 | 80 |
+----+----------+--------+
| 3 | 3 | 20 |
+----+----------+--------+
I have these two tables and I want to calculate the total marks of the students. My SQL is:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN
(SELECT `student_id` FROM `class_table` WHERE `teac_id` = '1')
But this returns NULL. Please help!
First, you should never store comma-separated data in a column; you should really normalize your data. For example, you could have a many-to-many table mapping teachers to students, with teac_id and student_id columns.
In this particular case, you can use the FIND_IN_SET() function.
From your current query, it seems that you are trying to get the total marks for a teacher (summing up the marks of all his/her students).
Try:
SELECT SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
If you want the total marks per student instead, you need to add a GROUP BY. The query would look like:
SELECT sm.`student_id`,
SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
GROUP BY sm.`student_id`
In case you want to know why: it returned NULL because the subquery returned '1,2,3,4' as a single string. What you need is for it to return 1, 2, 3 and 4 as separate values.
What your query effectively ran:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN ('1,2,3,4')
What you expected:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN (1,2,3,4)
The best fix is to normalize, as @madhur said. In your case, store the teacher-to-student link as one row per pair:
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1 |
+----+-------+--------------+
| 2 | 1 | 2 |
+----+-------+--------------+
| 3 | 1 | 3 |
+----+-------+--------------+
| 4 | 1 | 4 |
+----+-------+--------------+
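With one row per teacher/student pair, the original question reduces to a plain join. The sketch below assumes SQLite (through Python's sqlite3) instead of MySQL, and the table name teacher_student is a hypothetical name for the normalized mapping table.

```python
import sqlite3

# SQLite stands in for MySQL (assumption). 'teacher_student' is a hypothetical
# name for the normalized mapping table shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_mark (id INTEGER PRIMARY KEY, student_id INTEGER, marks INTEGER);
CREATE TABLE teacher_student (teac_id INTEGER, student_id INTEGER,
                              PRIMARY KEY (teac_id, student_id));
INSERT INTO student_mark (id, student_id, marks) VALUES (1, 1, 12), (2, 2, 80), (3, 3, 20);
INSERT INTO teacher_student (teac_id, student_id) VALUES (1, 1), (1, 2), (1, 3), (1, 4);
""")

# An ordinary equality join replaces FIND_IN_SET on the comma-separated column.
total = conn.execute("""
SELECT SUM(sm.marks)
FROM student_mark AS sm
JOIN teacher_student AS ts ON ts.student_id = sm.student_id
WHERE ts.teac_id = 1
""").fetchone()[0]
print(total)
```

Student 4 has no row in student_mark, so the inner join simply leaves it out of the sum.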
If you want to filter your table based on a comma-separated list of IDs, my approach is to add extra commas at the beginning and end of the list as well as at the beginning and end of the ID, e.g. 1 becomes ,1, and the list becomes ,1,2,3,4,. This avoids ambiguous matches, such as 1 matching 21 or 12 in the list.
EXISTS is also well suited to this situation and, together with the INSTR function, should work:
SELECT SUM(`marks`)
FROM `student_mark` sm
WHERE EXISTS(SELECT 1 FROM `class_table`
WHERE `teac_id` = '1' AND
INSTR(CONCAT(',', student_id, ','), CONCAT(',', sm.student_id, ',')) > 0)
But you shouldn't store related IDs in one cell as a comma-separated list; it should be a foreign key column forming a proper relation. Joins become trivial then.
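The reason for wrapping both sides in commas can be checked with a tiny sketch. It assumes SQLite's instr() and || concatenation (through Python's sqlite3) in place of MySQL's INSTR and CONCAT; the helper names naive_match and wrapped_match are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def naive_match(id_list, wanted):
    # Plain substring search: '1' is found inside '12', a false positive.
    return bool(conn.execute("SELECT instr(?, ?) > 0", (id_list, wanted)).fetchone()[0])

def wrapped_match(id_list, wanted):
    # Wrap both the list and the ID in commas so only whole IDs can match.
    sql = "SELECT instr(',' || ? || ',', ',' || ? || ',') > 0"
    return bool(conn.execute(sql, (id_list, wanted)).fetchone()[0])

print(naive_match("12,21", "1"))     # substring hit although 1 is not in the list
print(wrapped_match("12,21", "1"))   # no whole-ID match
print(wrapped_match("12,21", "21"))  # whole-ID match
```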

Select the same column multiple times with MySQL

Assuming I have something like this :
MySQL Table
Date | Name | Val
22/11 | a | 1
22/11 | b | 2
22/11 | a | 3
22/11 | a | 4
23/11 | b | 1
23/11 | a | 2
23/11 | a | 3
23/11 | a | 5
I need a query that returns, for each day, one column with the sum of the values where Name = 'a' and another column with the sum of all the values.
With my example, the result would be something like this :
Date | a.Total | Total
22/11 | 8 | 10
23/11 | 10 | 11
I tried something like this :
SELECT date, SUM(Val) AS a.Total, SUM(Val) AS Total FROM tbl1 Where Name = 'a'
The point is that I need a WHERE clause (WHERE Name = 'a') to get the "a.Total" values, but I don't want it to be applied when computing the total.
I also tried queries with Left Join but it didn't work.
Any help is much appreciated.
You should use GROUP BY and CASE inside of the first SUM()
SELECT date,
SUM( CASE WHEN Name='a'
THEN Val
ELSE 0
END) AS a_Total,
SUM(Val) AS Total
FROM tbl1
GROUP BY `Date`
This is a type of problem called cross-tabbing (see https://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/)
What you're after is a CASE expression, which lets you sum values only when a condition is met. Note that the alias must be a plain identifier such as a_Total; a.Total is not valid unless quoted.
SELECT date, SUM(CASE WHEN Name='a' THEN Val END) AS a_Total, SUM(Val) AS Total FROM tbl1 GROUP BY date
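Here is a runnable sketch of the conditional sum against the sample data from the question. It assumes SQLite (via Python's sqlite3) rather than MySQL; the query text itself uses only standard SQL.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); the sample rows are from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (Date TEXT, Name TEXT, Val INTEGER)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?, ?)",
    [("22/11", "a", 1), ("22/11", "b", 2), ("22/11", "a", 3), ("22/11", "a", 4),
     ("23/11", "b", 1), ("23/11", "a", 2), ("23/11", "a", 3), ("23/11", "a", 5)],
)

query = """
SELECT Date,
       SUM(CASE WHEN Name = 'a' THEN Val ELSE 0 END) AS a_Total,  -- only rows named 'a'
       SUM(Val) AS Total                                          -- every row of the day
FROM tbl1
GROUP BY Date
ORDER BY Date
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

The condition lives inside the aggregate instead of in WHERE, so the second SUM still sees every row of the day.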

How to merge column data using the last updated value in MySQL?

This is somewhat confusing, so it's easier if I start with an example and the expected output.
I have a table that could look like this: (Unit1 - Unit2 columns could span up to 30 columns in the same general format)
| ID | Name | Unit1_left | Unit2_left |
| 1 | Tom | 50 | NULL |
| 2 | Tom | NULL | 1 |
| 3 | Tom | 45 | NULL |
| 4 | Dan | NULL | NULL |
What I am trying to select is a table like this:
| Name | Unit1_left | Unit2_left |
| Tom | 45 | 1 |
| Dan | NULL | NULL |
What that is doing is grouping by name and attempting to find the last values in the 2 other columns if they exist (if not then it returns NULL).
I have looked at various other questions and they all say to use MAX(); however, this will not work since it selects the highest value, not the latest. I have seen that in MsSQL there is a Last() function which looks vaguely like what I want, but it's not implemented in MySQL and isn't exactly what I need anyway.
What I am trying to ask is, does anyone know of a possible method of selecting the data like this or if I will have to use a separate programming language to do this?
This will produce the result set you've described
SELECT dname.name,
l1value.unit1_left,
l2value.unit2_left
FROM (SELECT DISTINCT `name`
FROM table1) dname
LEFT JOIN (SELECT `name`,
Max(id) id
FROM table1
WHERE unit1_left IS NOT NULL
GROUP BY `name`) l1
ON dname.`name` = l1.`name`
LEFT JOIN table1 l1value
ON l1.id = l1value.id
LEFT JOIN (SELECT `name`,
Max(id) id
FROM table1
WHERE unit2_left IS NOT NULL
GROUP BY `name`) l2
ON dname.`name` = l2.`name`
LEFT JOIN table1 l2value
ON l2.id = l2value.id ;
I did it by creating two inline views (l1 and l2) that find the highest id with a non-null value for unit1_left and unit2_left respectively. Each is joined back to the original table to get the values (l1value and l2value), and all of that hangs off a third inline view (dname) that produces the distinct names.
It's quite messy, and it might make more sense just to store your data in a more sensible structure.
You can use subqueries in your SELECT list. Using SQL Fiddle I came up with this:
select o.name,
(select o2.Unit1_left
from original as o2
where o.name = o2.name
and o2.Unit1_left is not null
order by o2.id desc
LIMIT 1) as Unit1_left,
(select o3.Unit2_left
from original as o3
where o.name = o3.name
and o3.Unit2_left is not null
order by o3.id desc
LIMIT 1) as Unit2_left
from original as o
group by o.name
order by min(o.id);
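The correlated-subquery approach can be tried with the sketch below. It assumes SQLite (via Python's sqlite3) in place of MySQL; both accept LIMIT inside a scalar subquery. The table name original follows the answer above.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample rows are from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original (id INTEGER PRIMARY KEY, name TEXT,"
    " Unit1_left INTEGER, Unit2_left INTEGER)"
)
conn.executemany(
    "INSERT INTO original VALUES (?, ?, ?, ?)",
    [(1, "Tom", 50, None), (2, "Tom", None, 1), (3, "Tom", 45, None), (4, "Dan", None, None)],
)

query = """
SELECT o.name,
       (SELECT o2.Unit1_left FROM original AS o2
        WHERE o2.name = o.name AND o2.Unit1_left IS NOT NULL
        ORDER BY o2.id DESC LIMIT 1) AS Unit1_left,   -- latest non-NULL Unit1_left
       (SELECT o3.Unit2_left FROM original AS o3
        WHERE o3.name = o.name AND o3.Unit2_left IS NOT NULL
        ORDER BY o3.id DESC LIMIT 1) AS Unit2_left    -- latest non-NULL Unit2_left
FROM original AS o
GROUP BY o.name
ORDER BY MIN(o.id)
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

A subquery that finds no non-NULL row yields NULL, which is why Dan comes back with NULL in both columns.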

How to `SELECT` and manufacture missing rows from previous values?

I have the following (simplified) result from SELECT * FROM table ORDER BY tick,refid:
tick refid value
----------------
1 1 11
1 2 22
1 3 33
2 1 1111
2 3 3333
3 3 333333
Note the "missing" rows for refid 1 (tick 3) and refid 2 (ticks 2 and 3)
If possible, how can I make a query to add these missing rows using the most recent prior value for that refid? "Most recent" means the value for the row with the same refid as the missing row and largest tick such that the tick is less than the tick for the missing row. e.g.
tick refid value
----------------
1 1 11
1 2 22
1 3 33
2 1 1111
2 2 22
2 3 3333
3 1 1111
3 2 22
3 3 333333
Additional conditions:
All refids will have values at tick=1.
There may be many 'missing' ticks for a refid in sequence, (as above for refid 2).
There are many refids and it's not known which will have sparse data where.
There will be many ticks beyond 3, but all sequential. In the correct result, each refid will have a result for each tick.
Missing rows are not known in advance - this will be run on multiple databases, all with the same structure, and different "missing" rows.
I'm using MySQL and cannot change db just now. Feel free to post answer in another dialect, to help discussion, but I'll select an answer in MySQL dialect over others.
Yes, I know this can be done in the code, which I've implemented. I'm just curious if it can be done with SQL.
What value should be returned when a given tick-refid combination does not exist? In this solution, I simply returned the lowest value for that given refid.
Revision
I've updated the logic to determine what value to use in the case of a null. It should be noted that I'm assuming that ticks+refid is unique in the table.
Select Ticks.tick
, Refs.refid
, Case
When Table.value Is Null
Then (
Select T2.value
From Table As T2
Where T2.refid = Refs.refId
And T2.tick = (
Select Max(T1.tick)
From Table As T1
Where T1.tick < Ticks.tick
And T1.refid = T2.refid
)
)
Else Table.value
End As value
From (
Select Distinct refid
From Table
) As Refs
Cross Join (
Select Distinct tick
From Table
) As Ticks
Left Join Table
On Table.tick = Ticks.tick
And Table.refid = Refs.refid
If you know in advance what your 'tick' and 'refid' values are, make a helper table that contains all possible tick and refid values. Then left join from the helper table on tick and refid to your data table.
If you don't know exactly what your 'tick' and 'refid' values are, you could maybe still use this method, but instead of a static helper table, it would have to be dynamically generated.
The following has too many sub-selects for my taste, but it generates the desired result in MySQL, as long as every tick and every refid occurs separately at least once in the table.
Start with a query that generates every pair of tick and refid. The following uses the table to generate the pairs, so if any tick never appears in the underlying table, it will also be missing from the generated pairs. The same holds true for refids, though the restriction that "All refids will have values at tick=1" should ensure the latter never happens.
SELECT tick, refid FROM
(SELECT refid FROM chadwick WHERE tick=1) AS r
JOIN
(SELECT DISTINCT tick FROM chadwick) AS t
Using this, generate every missing tick, refid pair, along with the largest tick that exists in the table by equijoining on refid and θ≥-joining on tick. Group by the generated tick, refid since only one row for each pair is desired. The key to filtering out existing tick, refid pairs is the HAVING clause. Strictly speaking, you can leave out the HAVING; the resulting query will return existing rows with their existing values.
SELECT tr.tick, tr.refid, MAX(c.tick) AS ctick
FROM
(SELECT tick, refid FROM
(SELECT refid FROM chadwick WHERE tick=1) AS r
JOIN
(SELECT DISTINCT tick FROM chadwick) AS t
) AS tr
JOIN chadwick AS c ON tr.tick >= c.tick AND tr.refid=c.refid
GROUP BY tr.tick, tr.refid
HAVING tr.tick > MAX(c.tick)
One final select from the above as a sub-select, joined to the original table to get the value for the given ctick, returns the new rows for the table.
INSERT INTO chadwick
SELECT missing.tick, missing.refid, c.value
FROM (SELECT tr.tick, tr.refid, MAX(c.tick) AS ctick
FROM
(SELECT tick, refid FROM
(SELECT refid FROM chadwick WHERE tick=1) AS r
JOIN
(SELECT DISTINCT tick FROM chadwick) AS t
) AS tr
JOIN chadwick AS c ON tr.tick >= c.tick AND tr.refid=c.refid
GROUP BY tr.tick, tr.refid
HAVING tr.tick > MAX(c.tick)
) AS missing
JOIN chadwick AS c ON missing.ctick = c.tick AND missing.refid=c.refid
;
Performance on the sample table, along with (tick, refid) and (refid, tick) indices:
+----+-------------+------------+-------+-------------------+----------+---------+----------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+-------------------+----------+---------+----------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | c | ALL | tick_ref,ref_tick | NULL | NULL | NULL | 6 | Using where; Using join buffer |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 9 | Using temporary; Using filesort |
| 2 | DERIVED | c | ref | tick_ref,ref_tick | ref_tick | 5 | tr.refid | 1 | Using where; Using index |
| 3 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 3 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 3 | Using join buffer |
| 5 | DERIVED | chadwick | index | NULL | tick_ref | 10 | NULL | 6 | Using index |
| 4 | DERIVED | chadwick | ref | tick_ref | tick_ref | 5 | | 2 | Using where; Using index |
+----+-------------+------------+-------+-------------------+----------+---------+----------+------+---------------------------------+
As I said, too many sub-selects. A temporary table may help matters.
To check for missing ticks:
SELECT clo.tick+1 AS missing_tick
FROM chadwick AS chi
RIGHT JOIN chadwick AS clo ON chi.tick = clo.tick+1
WHERE chi.tick IS NULL;
This will return at least one row with tick equal to 1 + the largest tick in the table. Thus, the largest value in this result can be ignored.
In order to have the list of pairs (tick, refid) to insert get a whole list:
SELECT a.tick, b.refid
FROM ( SELECT DISTINCT tick FROM t) a
CROSS JOIN ( SELECT DISTINCT refid FROM t) b
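Now subtract the existing pairs from that query. Note that MINUS is Oracle syntax and is not supported by MySQL; there, a NOT EXISTS or LEFT JOIN ... IS NULL filter does the same job: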
SELECT a.tick tick, b.refid refid
FROM ( SELECT DISTINCT tick FROM t) a
CROSS JOIN ( SELECT DISTINCT refid FROM t) b
MINUS
SELECT DISTINCT tick, refid FROM t
Now you can join with t to obtain the final query (note that I use inner join + left join to obtain previous result but you could adapt):
INSERT INTO t(tick, refid, value)
SELECT c.tick, c.refid, t1.value
FROM ( SELECT a.tick tick, b.refid refid
FROM ( SELECT DISTINCT tick FROM t) a
CROSS JOIN ( SELECT DISTINCT refid FROM t) b
MINUS
SELECT DISTINCT tick, refid FROM t
) c
INNER JOIN t t1 ON t1.refid = c.refid and t1.tick < c.tick
LEFT JOIN t t2 ON t2.refid = c.refid AND t1.tick < t2.tick AND t2.tick < c.tick
WHERE t2.tick IS NULL
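For a variant that avoids MINUS, the sketch below filters out existing pairs with NOT EXISTS and picks the most recent prior value with a correlated subquery. It is run here against SQLite through Python's sqlite3 (an assumption for the sake of a self-contained example); as far as I can tell the same SELECT is also valid MySQL, but I have only exercised it this way. The missing rows are fetched first and inserted in a second step to keep the two phases easy to inspect.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); table t and its rows follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tick INTEGER, refid INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 11), (1, 2, 22), (1, 3, 33), (2, 1, 1111), (2, 3, 3333), (3, 3, 333333)],
)

missing_sql = """
SELECT a.tick, b.refid,
       (SELECT p.value FROM t AS p
        WHERE p.refid = b.refid AND p.tick < a.tick
        ORDER BY p.tick DESC LIMIT 1) AS value        -- most recent prior value
FROM (SELECT DISTINCT tick FROM t) AS a
CROSS JOIN (SELECT DISTINCT refid FROM t) AS b        -- every (tick, refid) pair
WHERE NOT EXISTS (SELECT 1 FROM t AS e                -- keep only pairs not in t
                  WHERE e.tick = a.tick AND e.refid = b.refid)
ORDER BY a.tick, b.refid
"""
missing = conn.execute(missing_sql).fetchall()
conn.executemany("INSERT INTO t (tick, refid, value) VALUES (?, ?, ?)", missing)

filled = conn.execute("SELECT tick, refid, value FROM t ORDER BY tick, refid").fetchall()
for row in filled:
    print(row)
```

Because the missing rows are computed before any insert happens, a gap of several ticks (refid 2 here) is filled from the last real row rather than from another generated one.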