MYSQL grouping by field that is not null - mysql

I have a table where a field is populated if the record is a duplicate. The code that checks for duplicates is already running and works properly.
The table looks like this:
id | dupe_ids | id_subscription
1  | NULL     | 5343
2  | 3, 4     | 5343
3  | 2, 4     | 5343
4  | 2, 3     | 5343
5  | NULL     | 5343
6  | 7        | 5343
7  | 6        | 5343
The query should return a count of entries per subscription, with each set of duplicated ids counted only once. In the example above, the count for subscription 5343 would be 4: records 1 and 5 each count as one, record 2 counts as one (with 3 and 4 grouped or skipped), and record 6 counts as one (with 7 grouped or skipped).
The query now looks like this:
SELECT app.id_subscription, app.id_site, app.id_customer, COUNT(*) AS app_count, site.url
FROM web_manager.app, web_manager.site
WHERE app.id_customer = :wm_id
AND (app.received_at BETWEEN :sdate AND :edate)
AND app.id_site = site.id
AND app.dupe_ids IS NULL
GROUP BY app.id_subscription
ORDER BY app_count DESC

If the values in dupe_ids are a list of numeric id values, and the list is always "in order" with the lowest value first, then as a dirty solution...
The query in my original answer (below) can be modified to replace the constant 0 with an expression like this: LEAST(a.id,SUBSTRING_INDEX(a.dupe_ids,',',1)+0).
That expression is saying: take the first value from the dupe_ids list, evaluate it in a numeric context, compare the numeric value to the id value from the row, and return the lower of the two.
SELECT COUNT(DISTINCT IF(a.dupe_ids IS NULL,a.id,LEAST(a.id,SUBSTRING_INDEX(a.dupe_ids,',',1)+0))) AS my_funky_cnt
, a.id_subscription
FROM web_manager.app a
JOIN web_manager.site s
ON s.id = a.id_site
WHERE ...
GROUP BY a.id_subscription
ORDER BY my_funky_cnt DESC
Again, removing the GROUP BY and the aggregate, to see what is actually being returned by the expression...
SELECT a.id
, a.dupe_ids
, a.id_subscription
, IF(a.dupe_ids IS NULL,a.id,LEAST(a.id,SUBSTRING_INDEX(a.dupe_ids,',',1)+0)) AS expr
FROM web_manager.app a
JOIN web_manager.site s
ON s.id = a.id_site
WHERE ...
ORDER BY a.id_subscription, a.dupe_ids IS NULL, a.id
We'd expect that to return:
id | dupe_ids | id_subscription | expr
2  | 3, 4     | 5343            | 2    -- id=2 is less than fv=3
3  | 2, 4     | 5343            | 2    -- fv=2 is less than id=3
4  | 2, 3     | 5343            | 2    -- fv=2 is less than id=4
6  | 7        | 5343            | 6    -- id=6 is less than fv=7
7  | 6        | 5343            | 6    -- fv=6 is less than id=7
1  | NULL     | 5343            | 1
5  | NULL     | 5343            | 5
So a GROUP BY id_subscription and COUNT(DISTINCT expr) would return a count of 4.
(This is not tested.)
This approach depends on dupe_ids having the lowest id value listed first (first value in the list), evaluating that first value in a numeric context, and comparing that to the id value from the row.
If dupe_ids is an empty string, or starts with a comma, or the first non-blank characters can't be interpreted as a numeric value, then expr is going to return a 0.
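To check the grouping logic outside the database, here is a minimal Python sketch that mirrors the IF/LEAST/SUBSTRING_INDEX expression on the sample rows from the question. The names ROWS, group_key and count_per_subscription are made up for illustration, and the numeric conversion only loosely imitates MySQL's "+0" (it treats anything that is not all digits as 0).

```python
from collections import defaultdict

# Sample rows copied from the question: (id, dupe_ids, id_subscription).
ROWS = [
    (1, None, 5343),
    (2, "3, 4", 5343),
    (3, "2, 4", 5343),
    (4, "2, 3", 5343),
    (5, None, 5343),
    (6, "7", 5343),
    (7, "6", 5343),
]


def group_key(row_id, dupe_ids):
    """Mimic IF(dupe_ids IS NULL, id, LEAST(id, SUBSTRING_INDEX(dupe_ids, ',', 1) + 0))."""
    if dupe_ids is None:
        return row_id
    # Text before the first comma, like SUBSTRING_INDEX(dupe_ids, ',', 1).
    first = dupe_ids.split(",", 1)[0].strip()
    # Loose imitation of "+0": non-numeric text becomes 0.
    first_value = int(first) if first.isdigit() else 0
    return min(row_id, first_value)


def count_per_subscription(rows):
    """Count distinct group keys per id_subscription, like COUNT(DISTINCT expr)."""
    keys = defaultdict(set)
    for row_id, dupe_ids, id_subscription in rows:
        keys[id_subscription].add(group_key(row_id, dupe_ids))
    return {sub: len(found) for sub, found in keys.items()}


counts = count_per_subscription(ROWS)
print(counts)
```

An empty dupe_ids string yields a key of 0 here, which matches the caveat above: such rows would all collapse into one group.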
EDIT
The original answer (below) was based on collapsing all of the rows with non-NULL values for a given id_subscription... returning a count of 3. The question has been updated, adding more example rows with non-NULL values which should not be collapsed together. Desired return for "count" is now 4. The query in the original answer would return a count of 3.
Getting a count of rows with a NULL value of dupe_ids is straightforward.
The sticky wicket is the bizarre contents of the dupe_ids column, the comma separated list of id values...
id dupe_ids
---- --------
2 '3,4'
3 '2,4'
4 '2,3'
6 '7'
7 '6'
This would be easier if we weren't dealing with a "comma separated list" of values: if we instead had foreign key references to the rows in a separate table, or some criteria other than the dupe_ids column to identify rows that are "duplicates".
But that wasn't the question asked. The question didn't ask whether it would be better to avoid storing a comma separated list, or whether there was a better approach.
The question leaves us dealing with a comma separated list. (It serves as an example of why we strongly recommend avoiding comma separated lists in the first place).
If we had an expression that has the values in dupe_ids along with the id value, together, so that we had identical values on the rows...
id dupe_ids expr
---- -------- ------
2 '3,4' '2,3,4'
3 '2,4' '2,3,4'
4 '2,3' '2,3,4'
6 '7' '6,7'
7 '6' '6,7'
Then we could use a COUNT(DISTINCT expr) to get us the return we're after. The ugly part is getting that value of expr. It would be easy to prepend or append id onto dupe_ids, but the resulting string values wouldn't be identical. The lists would be in a different order.
There's no simple built-in function in MySQL to return the values shown for expr based on the contents of id and dupe_ids.
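If the rows can be processed in application code instead, building that sorted expr value is short. The following is only a sketch of that alternative, not something the SQL above does; canonical_key and SAMPLE are hypothetical names, and it assumes every entry in dupe_ids is an integer.

```python
# Sample rows copied from the question: (id, dupe_ids).
SAMPLE = [
    (1, None),
    (2, "3, 4"),
    (3, "2, 4"),
    (4, "2, 3"),
    (5, None),
    (6, "7"),
    (7, "6"),
]


def canonical_key(row_id, dupe_ids):
    """Return the row's id merged with its dupe_ids as a sorted, comma separated string."""
    ids = {row_id}
    if dupe_ids is not None:
        # int() tolerates the surrounding spaces seen in values such as "3, 4".
        ids.update(int(part) for part in dupe_ids.split(",") if part.strip())
    return ",".join(str(value) for value in sorted(ids))


distinct_keys = {canonical_key(row_id, dupe_ids) for row_id, dupe_ids in SAMPLE}
print(sorted(distinct_keys))
```

Rows 2, 3 and 4 all map to the same key, as do rows 6 and 7, so counting the distinct keys gives the desired 4 without relying on the list order.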
ORIGINAL ANSWER
The approach I would take is to use an expression, and count distinct values of that.
If dupe_ids is null, then return a unique value. If id is unique in the table, I would just use the value of that column. If dupe_ids is not null, then substitute a constant that is not a valid id value. Assuming id values are positive integers, I would use 0 or a negative value.
As an example:
SELECT COUNT(DISTINCT IF(a.dupe_ids IS NULL,a.id,0)) AS my_funky_cnt
, a.id_subscription
FROM web_manager.app a
JOIN web_manager.site s
ON s.id = a.id_site
WHERE ...
GROUP BY a.id_subscription
ORDER BY my_funky_cnt DESC
I'd verify the expression is "working" by first doing a query without the GROUP BY and aggregate...
SELECT a.id
, a.dupe_ids
, a.id_subscription
, IF(a.dupe_ids IS NULL,a.id,0) AS derived_col
FROM web_manager.app a
JOIN web_manager.site s
ON s.id = a.id_site
WHERE ...
ORDER BY a.id_subscription, a.dupe_ids IS NULL, a.id
We'd expect that to return:
id | dupe_ids | id_subscription | derived_col
1  | NULL     | 5343            | 1
2  | 3, 4     | 5343            | 0
3  | 2, 4     | 5343            | 0
4  | 2, 3     | 5343            | 0
5  | NULL     | 5343            | 5
So all of the rows with non-null dupe_ids have the same value, and the rows with NULL dupe_ids have a unique value.
And a COUNT(DISTINCT ...) of that expression will return 3.

Related

adding more table rows depending on the first filtered result with mysql

Using MariaDB version 10.5.15 (and SQLAlchemy with Python 3.9).
After filtering the following table with e.g. count == 3, I would get the rows with ids
2, 3, 4, 7 and 12.
Then, for each of these rows, I want to add every row of the same table that has the same group_id (excluding null) but a different group_leader value. So I would like to add
(same group_id, not same group_leader)
1, 3 (coming from id 2)
5 (coming from id 4)
10 (coming from id 7 and only id 10, because group_leader must be different)
id | count | group_id | group_leader
1  | 7     | 1        | null
2  | 3     | 1        | 1
3  | 2     | 1        | null
4  | 3     | 2        | 1
5  | 6     | 2        | null
6  | 2     | 3        | null
7  | 3     | 3        | null
8  | 1     | 3        | null
9  | 2     | 3        | null
10 | 5     | 3        | 1
11 | 5     | null     | null
12 | 3     | null     | null
Is it possible to first do the select...from...where... and then add these other rows, or do I first have to do something like a join?
This is the actual example:
def query_positions(position_filter: dict):
    result = db.session.query(Positions).join(
        ProjectCrafts, Positions.project_craft_id == ProjectCrafts.project_craft_id).join(
        Projects, Positions.project_id == Projects.project_id
    )
    if "firm_id" in position_filter:
        result = result.filter(Positions.firm_id == position_filter["firm_id"])
    if "craft" in position_filter:
        result = result.filter(ProjectCrafts.craft == position_filter["craft"])
    if "craft_name" in position_filter:
        result = result.filter(ProjectCrafts.craft_name == position_filter["craft_name"])
    positions1 = aliased(Positions)
    result = result.join(positions1, Positions.is_parent == 1, Positions.family_id == positions1.family_id).join(
        Positions.family_id == positions1.family_id)
    positions = result.all()
    return positions
The problem comes after the positions1 = aliased(Positions) line, where I get this error:
...
in _join_determine_implicit_left_side
raise sa_exc.InvalidRequestError( sqlalchemy.exc.InvalidRequestError: Don't know how to join to
<AliasedInsp at 0x7fabd1ad30; Positions(Positions)>. Please use the
.select_from() method to establish an explicit left side, as well as
providing an explicit ON clause if not present already to help resolve
the ambiguity.
You can join the table filtered on count_ with the original table, imposing two conditions:
"group_id" values are the same
"group_leader" values are different
Then apply a UNION between the two result sets, optionally followed by an ORDER BY clause to order the rows by id.
Comparing a NULL with a non-NULL value yields NULL rather than true or false, so one way to compare them is to transform NULL values to -1 with the COALESCE function (assuming -1 can never be a real "group_leader" value).
WITH cte AS (
SELECT * FROM tab WHERE count_ = 3
)
SELECT tab.*
FROM tab
INNER JOIN cte
ON tab.group_id = cte.group_id
AND COALESCE(tab.group_leader, -1) <> COALESCE(cte.group_leader, -1)
UNION
SELECT * FROM cte
ORDER BY id
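As a quick check of the query above against the question's sample rows, here is a small Python sketch that runs it in an in-memory SQLite database. SQLite is only a stand-in chosen so the example is self-contained; MariaDB 10.5 supports the same CTE, UNION and COALESCE syntax, but this has not been run against it. The helper name selected_ids is made up for the example.

```python
import sqlite3

# Rows copied from the question's table: (id, count_, group_id, group_leader).
ROWS = [
    (1, 7, 1, None), (2, 3, 1, 1), (3, 2, 1, None), (4, 3, 2, 1),
    (5, 6, 2, None), (6, 2, 3, None), (7, 3, 3, None), (8, 1, 3, None),
    (9, 2, 3, None), (10, 5, 3, 1), (11, 5, None, None), (12, 3, None, None),
]

QUERY = """
WITH cte AS (
    SELECT * FROM tab WHERE count_ = 3
)
SELECT tab.*
FROM tab
INNER JOIN cte
        ON tab.group_id = cte.group_id
       AND COALESCE(tab.group_leader, -1) <> COALESCE(cte.group_leader, -1)
UNION
SELECT * FROM cte
ORDER BY id
"""


def selected_ids():
    """Load the sample rows into SQLite and return the ids the query selects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (id INTEGER PRIMARY KEY, count_ INTEGER, "
        "group_id INTEGER, group_leader INTEGER)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(selected_ids())
```

With the table as given, the filter matches ids 2, 4, 7 and 12, and the join adds 1, 3, 5 and 10, which agrees with the additions listed in the question. Row 12 adds nothing because a NULL group_id never satisfies the equality in the join.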

MySQL Query to replace string with value

I have a requirement as described below.
I need a MySQL query that replaces a value matching the condition below.
I have a table containing the Product ID:
Product_ID
1
2
3
4
5
15
25
I want to replace 5 with the value 1.111. My requirement is that it should only replace the value 5, not the value 15.
For example, 5 should become 1.111, but 15 should not be changed.
You can use IF() or CASE to select a different value when the value meets a condition.
SELECT IF(product_id = '5', '1.111', product_id)
FROM yourTable
or
SELECT CASE product_id
WHEN '5' THEN '1.111'
ELSE product_id
END
FROM yourTable
CASE generalizes more easily to other values that you want to replace, since you can have multiple WHEN clauses.
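To see that an exact comparison leaves 15 and 25 alone, here is a minimal Python sketch that runs the CASE form in an in-memory SQLite database. SQLite is used purely as a stand-in for MySQL, and storing product_id as TEXT is an assumption made so the comparison with '5' is a plain string match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: product_id is stored as text, matching the quoted '5' in the query.
conn.execute("CREATE TABLE yourTable (product_id TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?)",
    [(value,) for value in ["1", "2", "3", "4", "5", "15", "25"]],
)

# Only the row whose whole value equals '5' is replaced.
replaced = [
    row[0]
    for row in conn.execute(
        """
        SELECT CASE product_id
                   WHEN '5' THEN '1.111'
                   ELSE product_id
               END
        FROM yourTable
        ORDER BY rowid
        """
    )
]
conn.close()
print(replaced)
```

Because the test is equality on the whole value rather than a substring replacement, values that merely contain the digit 5 are returned unchanged.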

Return rows matching one condition and if there aren't any then another in MYSQL

I have the following table as an example:
numbers type
--------------
1 1
5 2
6 1
8 2
9 3
14 2
3 1
From this table I would like to select the closest number that is less than or equal to 5 AND of type 1. If there is no such row, then (and only then) I would like to return the closest number larger than 5 of type 2.
I can solve this by running two queries:
SELECT number FROM numbers WHERE number <= 5 AND type = 1 ORDER BY number DESC LIMIT 1
and if above query returns 0 results, I simply run the second query:
SELECT number FROM numbers WHERE number > 5 AND type = 2 ORDER BY number LIMIT 1
But is it possible, to achieve the same result by only using one query?
I was thinking something like
SELECT number FROM numbers WHERE (number <= 5 AND type = 1) OR (number > 5 AND type = 2) ORDER BY number LIMIT 1
But that would only work if MySQL first checked the first parenthesized condition against all rows, returned a match if it found one, and only otherwise checked all rows against the second parenthesized condition. It will not work if it checks each row against both conditions before moving to the next row, which is how I suspect it works.
This query will do what you want. It selects all numbers that match your two query constraints, and orders the results first by type (so that if there is a result for type 1 it will appear first) and then by either -number or number dependent on type (so that numbers <= 5 sort in descending order but numbers > 5 sort in ascending order):
SELECT number
FROM numbers
WHERE ( number <= 5 AND type = 1 )
OR ( number > 5 AND type = 2 )
ORDER BY type, CASE WHEN type = 1 THEN -number ELSE number END
LIMIT 1
Output:
3
Combine the two conditions. You always prefer type 1 over type 2, hence the ORDER BY on type and the LIMIT. The ABS(number-5) term means that, within whichever type sorts first, the row closest to 5 is returned.
SELECT number, type
FROM numbers
WHERE (number <=5 AND type=1) OR
(number > 5 AND type=2)
ORDER BY type ASC, ABS(number-5) ASC
LIMIT 1
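Both ORDER BY variants can be tried side by side on the sample data, including the fallback case where no type 1 row qualifies. This is a small Python sketch using an in-memory SQLite database as a stand-in for MySQL; the names pick, BY_NEGATION, BY_DISTANCE and no_type1_match are invented for the example.

```python
import sqlite3

# Sample rows copied from the question: (number, type).
ROWS = [(1, 1), (5, 2), (6, 1), (8, 2), (9, 3), (14, 2), (3, 1)]

BY_NEGATION = """
SELECT number FROM numbers
WHERE (number <= 5 AND type = 1) OR (number > 5 AND type = 2)
ORDER BY type, CASE WHEN type = 1 THEN -number ELSE number END
LIMIT 1
"""

BY_DISTANCE = """
SELECT number FROM numbers
WHERE (number <= 5 AND type = 1) OR (number > 5 AND type = 2)
ORDER BY type ASC, ABS(number - 5) ASC
LIMIT 1
"""


def pick(query, rows):
    """Run one of the queries against the given rows; return the number or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (number INTEGER, type INTEGER)")
    conn.executemany("INSERT INTO numbers VALUES (?, ?)", rows)
    row = conn.execute(query).fetchone()
    conn.close()
    return row[0] if row else None


# Same data without the type 1 rows that are <= 5, to exercise the fallback.
no_type1_match = [r for r in ROWS if not (r[1] == 1 and r[0] <= 5)]

print(pick(BY_NEGATION, ROWS), pick(BY_DISTANCE, ROWS))
print(pick(BY_NEGATION, no_type1_match), pick(BY_DISTANCE, no_type1_match))
```

Sorting by type first is what makes the second condition act as a fallback: a type 2 row can only come first when no type 1 row passed the WHERE clause.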

count comma-separated values from a column - sql

I want to count the number of values in a comma-separated column.
I have used this
(LENGTH(Col2) - LENGTH(REPLACE(Col2,",","")) + 1)
in my select query.
Demo:
id | mycolumn
1 2,5,8,60
2 4,5,1
3 5,Null,Null
The query result for the first two rows is correct (4 for id 1, 3 for id 2), but for the third row it also counts the Null values.
Here is what I believe the actual state of your data is:
id | mycolumn
1 2,5,8,60
2 4,5,1
3 NULL
In other words, the entire value for mycolumn in your third record is NULL, likely from doing an operation involving a NULL value. If you actually had the text NULL your current query should still work.
The way to get around this would be to use COALESCE(val, "") when handling the NULL values in your strings.
A crude way of doing it is to replace the occurrences of ',Null' with nothing first:
SELECT a.id, (LENGTH(REPLACE(mycolumn, ',Null', '')) - LENGTH(REPLACE(REPLACE(mycolumn, ',Null', ''),",","")) + 1)
FROM some_table a
If the values refer to the id of rows in another table then you can join against that table using FIND_IN_SET and then count the matches (assuming that the string 'Null' is not an id on that other table)
SELECT a.id, COUNT(b.id)
FROM some_table a
INNER JOIN id_list_table b
ON FIND_IN_SET(b.id, a.mycolumn)
GROUP BY a.id
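The difference between the plain counting expression and the ',Null'-stripping variant can be seen on the demo rows. The sketch below runs both in an in-memory SQLite database, used here only as a stand-in for MySQL (LENGTH and REPLACE behave the same way for this ASCII data); it assumes the third row really holds the literal text '5,Null,Null'.

```python
import sqlite3

# Plain count: number of commas plus one.
COUNT_EXPR = "LENGTH(mycolumn) - LENGTH(REPLACE(mycolumn, ',', '')) + 1"

# Same count after stripping every ',Null' entry from the list.
CLEANED = "REPLACE(mycolumn, ',Null', '')"
COUNT_CLEANED = f"LENGTH({CLEANED}) - LENGTH(REPLACE({CLEANED}, ',', '')) + 1"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER, mycolumn TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, "2,5,8,60"), (2, "4,5,1"), (3, "5,Null,Null")],
)

# Map each id to its count under both expressions.
naive = dict(conn.execute(f"SELECT id, {COUNT_EXPR} FROM some_table"))
cleaned = dict(conn.execute(f"SELECT id, {COUNT_CLEANED} FROM some_table"))
conn.close()

print(naive)
print(cleaned)
```

Note that the stripping variant only handles 'Null' entries that follow a comma; a list that starts with 'Null' would need extra handling.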

MySQL return max value or null if one column has no value

I am trying to get the max value in a MySQL select, but I want it to be null/empty/0 if there is a row containing no timestamp.
Table stats (simplified):
ID  CLIENT  ORDER_DATE  CANCEL_DATE
1   5       1213567200
2   5       1213567200
3   6       1210629600  1281736799
4   6       1210629600  1281736799
5   7       1201042800  1248386399
6   7       1201042800
7   8       1205449200  1271282399
I'm now looking to get the lowest order date (no problem, as it is never empty), and
the maximum cancel date. If the client has already cancelled his subscription, the cancel date is filled, but if he is still active, there is no cancel date at all.
Query:
SELECT ID, min(ORDER_DATE) AS OD, max(CANCEL_DATE) AS CD FROM stats GROUP BY CLIENT
Returns:
ID  OD          CD
5   1213567200              // fine
6   1210629600  1281736799  // fine
7   1201042800  1248386399  // Should be empty
8   1205449200  1271282399  // fine
I can't figure out how to return empty/0/NULL if there are one or more empty columns for a client. I also tried with NULL fields.
Thanks for any hint.
I don't know how fast it will be but I guess it can be solved like this:
SELECT ID, min(ORDER_DATE) AS OD,
IF(COUNT(*)=COUNT(CANCEL_DATE),max(CANCEL_DATE),NULL) AS CD
FROM stats GROUP BY CLIENT
I couldn't test it, but the idea behind this solution is that COUNT(CANCEL_DATE) counts only the non-NULL entries. If it equals COUNT(*), there are no NULL values and the query returns MAX(CANCEL_DATE); otherwise it returns NULL.
You could use a query like this:
SELECT
client,
min(ORDER_DATE) AS OD,
case when MAX(CANCEL_DATE IS NULL)=0 THEN max(CANCEL_DATE) END AS CD
FROM
stats
GROUP BY
CLIENT
CANCEL_DATE IS NULL evaluates to 0 when CANCEL_DATE is not null, and to 1 when it is null.
MAX(CANCEL_DATE IS NULL) evaluates to 0 if no row in the group has a null cancel_date; otherwise its value is 1.
When MAX(CANCEL_DATE IS NULL)=0, there are no rows where CANCEL_DATE is null, so we return MAX(CANCEL_DATE). Otherwise the CASE has no matching branch and returns NULL.
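Both ideas can be compared on the sample table. The sketch below runs them in an in-memory SQLite database as a stand-in for MySQL; the IF(...) of the first answer is written as an equivalent CASE so it does not depend on a MySQL-only function, and it assumes the empty CANCEL_DATE cells are stored as NULL. The names BY_COUNT, BY_FLAG and run are illustrative.

```python
import sqlite3

# Rows copied from the question: (id, client, order_date, cancel_date).
ROWS = [
    (1, 5, 1213567200, None),
    (2, 5, 1213567200, None),
    (3, 6, 1210629600, 1281736799),
    (4, 6, 1210629600, 1281736799),
    (5, 7, 1201042800, 1248386399),
    (6, 7, 1201042800, None),
    (7, 8, 1205449200, 1271282399),
]

# First idea: COUNT(col) skips NULLs, so it equals COUNT(*) only without NULLs.
BY_COUNT = """
SELECT client, MIN(order_date) AS od,
       CASE WHEN COUNT(*) = COUNT(cancel_date) THEN MAX(cancel_date) END AS cd
FROM stats GROUP BY client ORDER BY client
"""

# Second idea: MAX over the 0/1 flag is 1 as soon as one row is NULL.
BY_FLAG = """
SELECT client, MIN(order_date) AS od,
       CASE WHEN MAX(cancel_date IS NULL) = 0 THEN MAX(cancel_date) END AS cd
FROM stats GROUP BY client ORDER BY client
"""


def run(query):
    """Load the sample rows into SQLite and return the query result as a list."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (id INTEGER, client INTEGER, "
        "order_date INTEGER, cancel_date INTEGER)"
    )
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in run(BY_COUNT):
    print(row)
```

On this data the two queries return the same rows, with a NULL cancel date for clients 5 and 7.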