BigQuery: Count non-null values across all columns with REGEX - json

I have the following query, which counts how many null values each column of a BigQuery table contains:
SELECT col_name, COUNT(1) nulls_count
FROM table t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(\w+)":null')) col_name
GROUP BY col_name
;
I need to adjust it so that it counts the non-null values instead. I tried to use a negative lookahead, but it doesn't seem to work.
My end goal is to indicate whether a certain column reports at least one non-null value.
Input example (the table):
Output example:
column_c is not present since all of its values are nulls.

You can try this solution (without REGEX):
select * from (select column, countif(val!= 'null') non_null
from `dataset.table` table1
,unnest(array(
select as struct trim(ar[offset(0)], '"') column, trim(ar[offset(1)], '"') val
from unnest(split(trim(to_json_string(table1), '{}'))) pb,
unnest([struct(split(pb, ':') as ar)])
)) record
group by column) where non_null!=0
output:

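If you would rather keep the regex from the question: BigQuery's regular expression functions use the RE2 engine, which does not support lookahead, and that would explain why the negative lookahead attempt failed. A lookahead-free alternative is to match a key followed by any character other than "n", because in compact JSON only a null value starts with "n". The sketch below checks that idea in Python, with json.dumps standing in for TO_JSON_STRING (assumptions: compact output with no spaces, top-level columns only, since keys inside nested structs would match as well; the sample rows are hypothetical). The BigQuery counterpart would presumably be REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(\w+)":[^n]'), which is untested here.

```python
import json
import re
from collections import Counter

# A key followed by any character except "n". In compact JSON, only a null
# value starts with "n", so this matches non-null values without lookahead.
NON_NULL_KEY = re.compile(r'"(\w+)":[^n]')


def non_null_counts(rows):
    """Count non-null values per key, approximating the BigQuery query."""
    counts = Counter()
    for row in rows:
        # Compact separators approximate TO_JSON_STRING output (no spaces).
        text = json.dumps(row, separators=(",", ":"))
        counts.update(NON_NULL_KEY.findall(text))
    return counts


# Hypothetical sample rows; column_c is null everywhere.
rows = [
    {"column_a": 1, "column_b": None, "column_c": None},
    {"column_a": None, "column_b": "x", "column_c": None},
    {"column_a": 3, "column_b": "y", "column_c": None},
]
counts = non_null_counts(rows)
print(dict(counts))  # {'column_a': 2, 'column_b': 2}
```

Because a column whose values are all null never produces a match, it simply does not appear in the result, which matches the expected output where column_c is absent.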

Order by first non-null result that comes from two different columns

I want to go through all the values of two columns in a table:
if the value in column 1 is not null, select it; otherwise, select the value in column 2 instead.
Then sort the final result in ascending alphabetical order, regardless of which column each value came from.
I tried the following query, but it doesn't work, and I'm not even sure it is supposed to do what I want.
SELECT *
FROM table
ORDER BY (CASE WHEN col1 IS NOT NULL THEN 1 ELSE 2 END ),
col1 DESC,
col2 DESC)
Besides the fact that it doesn't work (nothing is output), it seems to sort the values of each column separately, while I want to sort the final set of retrieved values, regardless of the column they come from.
Thank you for your help.
If you want to fix it with the CASE expression, it'd look like the following:
SELECT *,
CASE WHEN col1 IS NOT NULL
THEN col1
ELSE col2
END AS col
FROM table
ORDER BY col
A neater option, though, is the COALESCE function, which returns the first non-null value in its list of arguments.
SELECT *, COALESCE(col1, col2) AS col
FROM table
ORDER BY col
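To see the COALESCE ordering end to end, here is a small check that uses Python's built-in sqlite3 module as a stand-in for your database (an assumption: SQLite's COALESCE and ORDER BY behave the same way for this case; the table t and its rows are hypothetical, and t is used because table is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("pear", None), (None, "apple"), ("kiwi", "zebra"), (None, "banana")])

# COALESCE picks col1 when it is not NULL, otherwise col2; the merged
# values are then sorted together, whichever column they came from.
ordered = [r[0] for r in conn.execute(
    "SELECT COALESCE(col1, col2) AS col FROM t ORDER BY col")]
print(ordered)  # ['apple', 'banana', 'kiwi', 'pear']
```

Note that "zebra" never appears: col1 is not null on that row, so col2 is ignored.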

Combining JSON_SEARCH and JSON_EXTRACT get me: "Invalid JSON path expression."

I have a table named "campaigns". One of its columns is named "filter_apps", and its type is JSON.
I have five rows, and each one just contains an array of tokens, like so:
["be3beb1fe916ee653ab825fd8fe022", "c130b917983c719495042e31306ffb"]
["4fef3f1999c78cf987960492da4d2a"]
["106c274e319bdeae8bcf8daf515b1f"]
["2521f0df6cffb7487d527319674cf3"]
["c130b917983c719495042e31306ffb"]
Examples:
SELECT JSON_SEARCH(filter_apps, 'one', 'c130b917983c719495042e31306ffb') FROM campaigns;
Result:
"$[1]"
null
null
null
"$[0]"
So far everything is correct: the matching paths come back. A quick test proves it:
SELECT JSON_EXTRACT(filter_apps, '$[1]') FROM campaigns;
Result
"c130b917983c719495042e31306ffb"
null
null
null
null
So at this point I thought I could extract the values by passing that path to JSON_EXTRACT. My query:
SELECT JSON_EXTRACT(filter_apps, JSON_SEARCH(filter_apps, 'one', 'c130b917983c719495042e31306ffb')) FROM campaigns;
That leads me to an error:
"[42000][3143] Invalid JSON path expression. The error is around character position 1."
SOLUTION
Simple as that:
SELECT JSON_EXTRACT(filter_apps, JSON_UNQUOTE(JSON_SEARCH(filter_apps, 'one', 'c130b917983c719495042e31306ffb'))) FROM campaigns;
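Problem resolved: I wrapped JSON_SEARCH in JSON_UNQUOTE. JSON_SEARCH returns a JSON string value with its quotes ("$[1]"), and a path that starts with a double quote is not a valid path expression, hence the error at character position 1. JSON_UNQUOTE strips the quotes and leaves a plain path such as $[1].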
A little tip, I found the solution here: https://dev.mysql.com/doc/refman/5.7/en/json-function-reference.html
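To see why the quotes matter, here is a toy Python model of the three functions for a flat array of strings. The helpers json_search_one, json_unquote and json_extract are hypothetical and only mimic the behaviour described above; they are not MySQL's implementation, and json_extract here only understands paths of the form $[index].

```python
import json


def json_search_one(doc_text, needle):
    """Toy model of JSON_SEARCH(doc, 'one', needle) for a flat array of strings."""
    for i, value in enumerate(json.loads(doc_text)):
        if value == needle:
            return json.dumps(f"$[{i}]")  # a JSON string: the quotes are part of it
    return None


def json_unquote(json_text):
    """Toy model of JSON_UNQUOTE: parse the JSON string back to plain text."""
    return None if json_text is None else json.loads(json_text)


def json_extract(doc_text, path):
    """Toy model of JSON_EXTRACT that only understands paths like $[0]."""
    if path is None:
        return None
    if not (path.startswith("$[") and path.endswith("]")):
        raise ValueError(f"Invalid JSON path expression: {path!r}")
    items = json.loads(doc_text)
    index = int(path[2:-1])
    return json.dumps(items[index]) if index < len(items) else None


row = '["be3beb1fe916ee653ab825fd8fe022", "c130b917983c719495042e31306ffb"]'
raw_path = json_search_one(row, "c130b917983c719495042e31306ffb")
print(raw_path)                                   # "$[1]" (quoted, so rejected as a path)
print(json_extract(row, json_unquote(raw_path)))  # "c130b917983c719495042e31306ffb"
```

Calling json_extract(row, raw_path) directly raises the ValueError, which is the toy equivalent of the error in the question.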
It took me hours, since my real JSON object is much more complex, but I also found a solution for the 'all' option:
SELECT *,
REPLACE(REPLACE(LTRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(filter_apps, ',', n), ',', -1)), '[', ''), ']', '') AS all_json
FROM (
SELECT *, JSON_EXTRACT(filter_apps, JSON_UNQUOTE(JSON_SEARCH(filter_apps, 'all', 'c130b917983c719495042e31306ffb'))) AS hit
FROM campaigns
) AS t
JOIN (SELECT @N := @N + 1 AS n FROM campaigns, (SELECT @N := 0) dum LIMIT 10) numbers
ON CHAR_LENGTH(filter_apps) - CHAR_LENGTH(REPLACE(filter_apps, ',', '')) >= n - 1
WHERE hit IS NOT NULL;
# for the "JOIN-FROM", use a table that has at least as many rows as the length of your longest JSON array
# make sure the "JOIN-LIMIT" is greater than or equal to the length of your longest JSON array
Query explanation:
1) Inner SELECT:
the main SELECT asked about in the question, using JSON_SEARCH with the 'all' option.
2) JOIN:
a) SELECT for the table 'numbers':
creates a derived table containing the numbers from 1 to the user-defined LIMIT (compare "SQL SELECT to get the first N positive integers").
b) JOIN ON combined with SUBSTRING_INDEX in the outer SELECT:
splits the array column 'filter_apps' into one row per array element. Note that the user-defined limit from 2)a) must be equal to or greater than the length of the longest array to split (compare "SQL split values to multiple rows").
3) REPLACE and LTRIM in the outer SELECT:
remove the brackets and spaces left over from the original array.
4) WHERE clause:
shows only the rows matched by the inner SELECT.
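The splitting step (the numbers join plus SUBSTRING_INDEX, LTRIM and REPLACE) can be traced in Python with a hypothetical substring_index helper that follows MySQL's SUBSTRING_INDEX semantics: a positive count keeps everything left of the count-th delimiter, a negative count keeps everything right of the count-th delimiter from the end. This models the string manipulation only, not the join itself.

```python
def substring_index(s, delim, count):
    """Mimics MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def split_array_text(filter_apps, limit=10):
    """One output row per array element, like the numbers-table JOIN."""
    comma_count = len(filter_apps) - len(filter_apps.replace(",", ""))
    rows = []
    for n in range(1, limit + 1):  # the numbers table: 1..LIMIT
        if comma_count >= n - 1:  # the JOIN ... ON condition
            element = substring_index(substring_index(filter_apps, ",", n), ",", -1)
            # LTRIM removes leading spaces; REPLACE drops the brackets
            rows.append(element.lstrip(" ").replace("[", "").replace("]", ""))
    return rows


print(split_array_text('["be3beb1fe916ee653ab825fd8fe022", "c130b917983c719495042e31306ffb"]'))
# ['"be3beb1fe916ee653ab825fd8fe022"', '"c130b917983c719495042e31306ffb"']
```

As in the SQL, the elements keep their double quotes, and if limit is smaller than the number of elements the extra elements are silently dropped, which is why the LIMIT has to cover the longest array.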

mysql get max number from a string field

I need to get the maximum number from part of a value that generally starts with a year followed by a slash (/). So I need the maximum number after the slash, but the year should be 2016:
2016/422
2016/423
2016/469
2016/0470
2014/777
2015/123
2015/989
I tried this query
SELECT columname FROM tablename WHERE columname LIKE '2016/%' ORDER BY id DESC
The above query always gives '2016/469' as the first record. How do I get '2016/0470' as the maximum number?
any help will be much appreciated.
Thank you.
If columname follows the pattern YEAR/0000, you can use MySQL's SUBSTRING function to remove the part of the string you don't want.
SELECT value FROM (
SELECT CAST(SUBSTRING(columname, 1, 4) AS UNSIGNED) as year, CAST(SUBSTRING(columname FROM 6) AS UNSIGNED) as value FROM tablename
) total
ORDER BY year DESC, value DESC
LIMIT 1;
You need to split the string into two parts and evaluate them as numbers instead of strings. The following expression returns the number after the / in the column. All functions used below are described in the string functions section of the MySQL documentation. This way you can get the number after the / character even if what precedes the / is not a year but something else. The + 0 converts the string to a number, eliminating any leading zeros.
select right(columnname, char_length(columnname)-locate('/',columnname)) + 0
from tablename
Just take the max() of the above expression to get the expected results.
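A quick Python check of why the numeric conversion matters, using the sample values from the question: compared as text, '2016/469' beats '2016/0470' because '4' sorts after '0'. The number_after_slash helper is a hypothetical stand-in for the right(...) + 0 expression; int() drops the leading zero the same way + 0 does.

```python
# Sample values from the question
values = ["2016/422", "2016/423", "2016/469", "2016/0470",
          "2014/777", "2015/123", "2015/989"]


def number_after_slash(value):
    """Roughly what right(col, char_length(col) - locate('/', col)) + 0 computes."""
    return int(value[value.index("/") + 1:])


in_2016 = [v for v in values if v.startswith("2016/")]
print(max(in_2016))                          # 2016/469  (text comparison)
print(max(in_2016, key=number_after_slash))  # 2016/0470 (numeric comparison)
```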
UPDATE:
If you need the original number and the result has to be restricted to a specific year, then you need to join back the results to the original table:
select columnname
from tablename t1
inner join (select max(right(t.columnname, char_length(t.columnname)-locate('/',t.columnname)) + 0) as max_num
from tablename t
where left(t.columnname,4)='2016'
) t2
on right(t1.columnname, char_length(t1.columnname)-locate('/',t1.columnname)) + 0 = t2.max_num
where left(t1.columnname,4)='2016'
There are lots of suggestions already given as answers, but some of them seem like overkill to me.
Seems like the only change needed to the OP query is the expression in the ORDER BY clause.
Instead of:
ORDER BY id
We just need to order by the numeric value following the slash. And there are several approaches, several expressions, that will get that from the example data.
Since the query already includes the condition columname LIKE '2016/%', we can take the characters after the first five characters and then convert that string to a numeric value by adding zero.
ORDER BY SUBSTRING(columname,6) + 0 DESC
If we only want to return one row, add
LIMIT 1
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_substring
If we only want to return the numeric value, we could use the same expression in the SELECT list, in addition to columname.
This isn't the only approach. There are lots of other approaches that will work, and don't use SUBSTRING.
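A runnable sketch of this ORDER BY using Python's sqlite3 module as a stand-in for MySQL (an assumption: SQLite's SUBSTR and its implicit text-to-number conversion on + 0 behave like MySQL's SUBSTRING(...) + 0 for these values). The table reuses the names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, columname TEXT)")
conn.executemany(
    "INSERT INTO tablename (columname) VALUES (?)",
    [(v,) for v in ["2016/422", "2016/423", "2016/469", "2016/0470",
                    "2014/777", "2015/123", "2015/989"]],
)

# Order by the numeric value after "2016/" (characters 6 onward), largest first.
best = conn.execute(
    "SELECT columname FROM tablename WHERE columname LIKE '2016/%' "
    "ORDER BY SUBSTR(columname, 6) + 0 DESC LIMIT 1"
).fetchone()[0]
print(best)  # 2016/0470
```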
Try like this:
SELECT
MAX(CAST(SUBSTRING(t.name,
LOCATE('/', t.name) + 1)
AS UNSIGNED)) AS max_value
FROM
tablename AS t;
You can try this slightly ugly approach:
SELECT t.id, t2.secondNumber FROM `table` AS t
JOIN (SELECT id,
CONCAT(SUBSTRING(field,1,5),
if(SUBSTRING(SUBSTRING(field, 6),1,1)='0',
SUBSTRING(field, 6),
CONCAT('0', SUBSTRING(field, 6))
)
) as secondNumber FROM `table` ) AS t2 ON t2.id=t.id
ORDER BY t2.secondNumber DESC
This is valid only if there is at most one leading zero before the second number (after the slash).
Or, if the year doesn't matter, you can try to order them by the second number only:
SELECT t.id, t2.secondNumber FROM `table` AS t
JOIN (SELECT id,
if(SUBSTRING(SUBSTRING(field, 6),1,1)='0',
SUBSTRING(field, 6),
CONCAT('0', SUBSTRING(field, 6))
) as secondNumber FROM `table` ) AS t2 ON t2.id=t.id
ORDER BY t2.secondNumber DESC

Temporary Variable with aggregate field and group by doesn't work in mysql

I am trying to get cumulative value of a column using temporary variable.
SELECT sum(price), @temp := @temp + sum(price) AS cumulative_price FROM `table`, (SELECT @temp := 0) B GROUP BY item
It works when there is no GROUP BY and no aggregate. However, with GROUP BY, the value of cumulative_price is the same as sum(price), which is not what I expect.
What could be the reason of this inconsistency?
It should not work. According to the MySQL documentation:
In a SELECT statement, each select expression is evaluated only when
sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY
clause, referring to a variable that is assigned a value in the select
expression list does not work as expected:
mysql> SELECT (@aa:=id) AS a, (@aa+3) AS b FROM tbl_name HAVING b=5;
The reference to b in the HAVING clause refers to an alias for an
expression in the select list that uses @aa. This does not work as
expected: @aa contains the value of id from the previous selected row,
not from the current row.
So a variable assigned in the select list works on the current row, not on the set of rows in a group.
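One way around this limitation, swapping in a different technique than user variables: compute the grouped sums first, then take the running total over those grouped rows with a window function (available in MySQL 8.0 and later). The sketch runs in Python's sqlite3 (SQLite 3.25 or later is needed for window functions) with a hypothetical sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("a", 5), ("a", 5), ("b", 3), ("c", 2), ("c", 8)])

# Aggregate per item first, then accumulate over the grouped rows.
rows = conn.execute("""
    SELECT item, total,
           SUM(total) OVER (ORDER BY item) AS cumulative_price
    FROM (SELECT item, SUM(price) AS total FROM sales GROUP BY item) AS g
    ORDER BY item
""").fetchall()
print(rows)  # [('a', 10, 10), ('b', 3, 13), ('c', 10, 23)]
```

Doing the aggregation in a derived table means the running total only ever sees one row per item, so it cannot collapse back to sum(price).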

MySQL COUNT() and nulls

Am I correct in saying that
COUNT(expr)
where expr is not *
will count only non-null values?
Will COUNT(*) always count all rows? And what if all columns are null?
Correct. COUNT(*) is all rows in the table, COUNT(Expression) is where the expression is non-null only.
If all columns are NULL (which indicates you don't have a primary key, so this shouldn't happen in a normalized database), COUNT(*) still returns all of the rows inserted. Just don't do that.
You can think of the * symbol as meaning "in the table" and not "in any column".
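A quick check of these rules with Python's sqlite3 module as a stand-in (an assumption: SQLite treats COUNT(*) and COUNT(expr) the same way MySQL does here). The table t and its rows are hypothetical, including one row where every column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "x"), (None, "y"), (None, None)])  # last row: all NULL

all_rows, non_null_a, non_null_b = conn.execute(
    "SELECT COUNT(*), COUNT(a), COUNT(b) FROM t"
).fetchone()
print(all_rows, non_null_a, non_null_b)  # 3 1 2
```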
This is covered in the MySQL Reference Manual.
If you want to count NULLs as well, try
SELECT COUNT(IFNULL(col, 1)) FROM table;
Just checked:
select count(*)
returns 1 for a single record filled with NULLs, while
select count(field)
returns 0.
I don't see the point of a record with only NULL values. Such a record should not exist.
count(*) is not for non-null columns, it's just the way to ask to count all rows. Roughly equivalent to count(1).
Using MySQL I found this simple way:
SELECT count(ifnull(col,1)) FROM table WHERE col IS NULL;
This way will not work:
SELECT count(col) FROM table WHERE col IS NULL;
If you want to count only the nulls you can also use COUNT() with IF.
Example:
select count(*) as allRows, count(if(nullableField is null, 1, NULL)) as missing from myTable;
You can change the IF condition to count what you actually want, so you can have multiple counts in one query.
select count(*) as 'total', sum(if(columna is null, 1, 0)) as 'nulos' from tabla;
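The null-counting variants from these answers can be compared side by side in sqlite3. SQLite has IFNULL but no IF() function, so CASE WHEN stands in for MySQL's IF here (an assumption that the two behave the same for this test); the table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (None,), (4,)])

total, missing_case, missing_sum = conn.execute(
    "SELECT COUNT(*),"
    " COUNT(CASE WHEN col IS NULL THEN 1 END),"  # like count(if(col is null, 1, NULL))
    " SUM(col IS NULL)"                          # like sum(if(col is null, 1, 0))
    " FROM t"
).fetchone()

# With a WHERE filter: IFNULL turns the NULLs into 1, so they are counted...
via_ifnull = conn.execute(
    "SELECT COUNT(IFNULL(col, 1)) FROM t WHERE col IS NULL").fetchone()[0]
# ...whereas COUNT(col) skips every remaining row because they are all NULL.
via_count = conn.execute(
    "SELECT COUNT(col) FROM t WHERE col IS NULL").fetchone()[0]
print(total, missing_case, missing_sum, via_ifnull, via_count)  # 4 2 2 2 0
```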