After reading the Postgres manual and many posts here, I wrote this function keeping in mind everything I found regarding security. It works great and does everything I was looking for.
It takes a JSON object where each key maps to an array [visible, filter, arg1, optional arg2]:
SELECT public.goods__list_json2('{"name": [true, 7, "Ad%"], "category": [true], "stock": [false, 4, 0]}', 20, 0);
returns a json array with requested data.
[{"name": "Adventures of TRON", "category": "Atari 2600"}, {"name": "Adventure", "category": "Atari 2600"}]
My question is: how can I really be sure that building the query from user-input arguments, passing them through format() as %L, is injection-safe?
By my DB design, everything is done through functions, most of them running as SECURITY DEFINER and only allowing certain roles to execute them.
If this is secure, my intention is to convert old functions to this dynamic logic and save myself writing a lot of lines of code for new or specific queries.
I would really appreciate it if an experienced Postgres developer could give me advice on this.
I'm using Postgres 13.
CREATE FUNCTION public.goods__list_json (IN option__j jsonb, IN limit__i integer, IN offset__i integer)
    RETURNS jsonb
    LANGUAGE plpgsql
    VOLATILE
    STRICT
    SECURITY DEFINER
    COST 1
AS $$
DECLARE
    table__v varchar := 'public.goods_full';
    column__v varchar[] := ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'];
    filter__v varchar[] := ARRAY['<', '>', '<=', '>=', '=', '<>', 'LIKE', 'NOT LIKE', 'ILIKE', 'NOT ILIKE', 'BETWEEN', 'NOT BETWEEN'];
    select__v varchar[];
    where__v varchar[];
    sql__v varchar;
    key__v varchar;
    format__v varchar;
    temp__v varchar;
    temp__i integer;
    betw__v varchar;
    result__j jsonb;
BEGIN
    FOR key__v IN SELECT jsonb_object_keys(option__j) LOOP
        IF key__v = ANY(column__v) THEN
            IF (option__j->key__v->0)::bool THEN
                select__v := array_append(select__v, key__v);
            END IF;
            temp__i := (option__j->key__v->1)::int;
            IF temp__i > 0 AND temp__i <= array_length(filter__v, 1) THEN
                temp__v := (option__j->key__v->>2)::varchar;
                IF temp__i >= 11 THEN
                    betw__v := (option__j->key__v->>3)::varchar;
                    format__v := format('%I %s %L AND %L', key__v, filter__v[temp__i], temp__v, betw__v);
                ELSE
                    format__v := format('%I %s %L', key__v, filter__v[temp__i], temp__v);
                END IF;
                where__v := array_append(where__v, format__v);
            END IF;
        END IF;
    END LOOP;
    sql__v := 'SELECT jsonb_agg(t) FROM (SELECT '
           || array_to_string(select__v, ', ')
           || format(' FROM %s WHERE ', table__v)
           || array_to_string(where__v, ' AND ')
           || format(' OFFSET %L LIMIT %L', offset__i, limit__i)
           || ') t';
    RAISE NOTICE 'SQL: %', sql__v;
    EXECUTE sql__v INTO result__j;
    RETURN result__j;
END;
$$;
A word of warning: this style with dynamic SQL in SECURITY DEFINER functions can be elegant and convenient. But don't overuse it. Do not nest multiple levels of functions this way:
The style is much more error prone than plain SQL.
The context switch with SECURITY DEFINER has a price tag.
Dynamic SQL with EXECUTE cannot save and reuse query plans.
No "function inlining".
And I'd rather not use it for big queries on big tables at all. The added sophistication can be a performance barrier. Like: parallelism is disabled for query plans this way.
That said, your function looks good; I see no way for SQL injection. format() is proven good for concatenating and quoting values and identifiers for dynamic SQL. If anything, you might remove some redundancy to make it cheaper.
Function parameters offset__i and limit__i are integer. SQL injection is impossible through integer numbers, so there is really no need to quote them (even though SQL allows quoted string constants for LIMIT and OFFSET). So just:
format(' OFFSET %s LIMIT %s', offset__i, limit__i)
Also, after verifying that each key__v is among your legal column names - and while those are all legal, unquoted column names - there is no need to run it through %I. It can just be %s.
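To illustrate (a quick sketch, separate from the function): format() only adds quotes where the specifier and the value require them.
SELECT format('%I', 'name')     AS ident_plain,   -- name        (valid identifier, no quotes added)
       format('%I', 'some col') AS ident_quoted,  -- "some col"  (identifier that needs quoting)
       format('%L', 'Ad%')      AS literal,       -- 'Ad%'       (safely quoted literal)
       format('%s', 42)         AS plain;         -- 42          (no quoting; safe for integers)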
I'd rather use text instead of varchar. Not a big deal, but text is the "preferred" string type.
Related:
Format specifier for integer variables in format() for EXECUTE?
Function to return dynamic set of columns for given table
COST 1 seems too low. The manual:
COST execution_cost
A positive number giving the estimated execution cost for the
function, in units of cpu_operator_cost. If the function
returns a set, this is the cost per returned row. If the cost is not
specified, 1 unit is assumed for C-language and internal functions,
and 100 units for functions in all other languages. Larger values
cause the planner to try to avoid evaluating the function more often
than necessary.
Unless you know better, leave COST at its default 100.
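If the function was already created with COST 1, it can be adjusted in place; a minimal sketch using the signature from the question:
ALTER FUNCTION public.goods__list_json(jsonb, integer, integer) COST 100;  -- back to the PL/pgSQL default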
Single set-based operation instead of all the looping
The whole looping can be replaced with a single SELECT statement. Should be noticeably faster. Assignments are comparatively expensive in PL/pgSQL. Like this:
CREATE OR REPLACE FUNCTION goods__list_json (_options json, _limit int = NULL, _offset int = NULL, OUT _result jsonb)
    RETURNS jsonb
    LANGUAGE plpgsql SECURITY DEFINER AS
$func$
DECLARE
    _tbl  CONSTANT text   := 'public.goods_full';
    _cols CONSTANT text[] := '{id, id__category, category, name, barcode, price, stock, sale, purchase}';
    _oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN"}';
    _sql text;
BEGIN
    SELECT concat('SELECT jsonb_agg(t) FROM ('
             , 'SELECT ' || string_agg(t.col, ', ' ORDER BY ord) FILTER (WHERE t.arr->>0 = 'true')
               -- ORDER BY to preserve order of objects in input
             , ' FROM ' || _tbl
             , ' WHERE ' || string_agg(
                   CASE WHEN (t.arr->>1)::int BETWEEN 1 AND 10 THEN
                           format('%s %s %L', t.col, _oper[(arr->>1)::int], t.arr->>2)
                        WHEN (t.arr->>1)::int BETWEEN 11 AND 12 THEN
                           format('%s %s %L AND %L', t.col, _oper[(arr->>1)::int], t.arr->>2, t.arr->>3)
                        -- ELSE NULL -- = default - or raise exception for illegal operator index?
                   END
                   , ' AND ' ORDER BY ord)  -- ORDER BY only cosmetic
             , ' OFFSET ' || _offset  -- SQLi-safe, no quotes required
             , ' LIMIT '  || _limit   -- SQLi-safe, no quotes required
             , ') t'
             )
    FROM   json_each(_options) WITH ORDINALITY t(col, arr, ord)
    WHERE  t.col = ANY(_cols)  -- only allowed column names - or raise exception for illegal column?
    INTO   _sql;

    IF _sql IS NULL THEN
        RAISE EXCEPTION 'Invalid input resulted in empty SQL string! Input: %', _options;
    END IF;

    RAISE NOTICE 'SQL: %', _sql;
    EXECUTE _sql INTO _result;
END
$func$;
db<>fiddle here
Shorter, faster and still safe against SQLi.
Quotes are only added where necessary for syntax or to defend against SQL injection. Burns down to filter values only. Column names and operators are verified against the hard-wired list of allowed options.
Input is json instead of jsonb. Order of objects is preserved in json, so you can determine the sequence of columns in the SELECT list (which is meaningful) and WHERE conditions (which is purely cosmetic). The function observes both now.
Output _result is still jsonb. Using an OUT parameter instead of the variable. That's totally optional, just for convenience. (No explicit RETURN statement required.)
Note the strategic use of concat() to silently ignore NULL, and of the concatenation operator || so that NULL makes the concatenated string NULL (see the short illustration after these notes). This way, FROM, WHERE, LIMIT, and OFFSET are only inserted where needed. A SELECT statement works without any of those. An empty SELECT list (also legal, but I suppose unwanted) results in a syntax error. All intended.
Using format() only for WHERE filters, for convenience and to quote values. See:
String concatenation using operator "||" or format() function
The function isn't STRICT anymore. _limit and _offset have the default value NULL, so only the first parameter _options is required. _limit and _offset can be NULL or omitted; then each is stripped from the statement (see the example calls below).
Using text instead of varchar.
Made constant variables actually CONSTANT (mostly for documentation).
Other than that the function does what your original does.
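To make the NULL handling and the optional parameters concrete, here is a short illustration plus example calls (reusing the sample input from the question):
-- concat() ignores NULL arguments; || propagates NULL, so a clause built with || disappears entirely:
SELECT concat('a', NULL, 'b');                 -- 'ab'
SELECT 'LIMIT ' || NULL;                       -- NULL
SELECT concat('SELECT 1', ' LIMIT ' || NULL);  -- 'SELECT 1'

-- _limit and _offset may be omitted or passed explicitly:
SELECT goods__list_json('{"name": [true, 7, "Ad%"], "category": [true]}');
SELECT goods__list_json('{"name": [true, 7, "Ad%"], "category": [true]}', 20, 0);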
I tried to put everything I learned here into practice and came up with the function below, plus some new questions =D.
Is there any advantage to declaring _oper this way, '{LIKE, "NOT LIKE"}', instead of ARRAY['LIKE', 'NOT LIKE']?
Casting _limit and _offset as int, I'm assuming no SQLi is possible, right?
Is this an elegant way to handle the 'IN' and 'NOT IN' cases? I wonder why string_agg() can be nested in concat(), but not in the place where I had to use a subquery instead.
This is a naive private function.
Edit: Removed "SECURITY DEFINER" as identified dangerous.
CREATE FUNCTION public.crud__select (IN _tbl text, IN _cols text[], IN _opts json, OUT _data jsonb)
    LANGUAGE plpgsql STRICT AS
$$
DECLARE
    _oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN", IN, "NOT IN"}';
BEGIN
    EXECUTE (
        SELECT concat('SELECT jsonb_agg(t) FROM ('
                 , 'SELECT ' || string_agg(e.col, ', ' ORDER BY ord) FILTER (WHERE e.arr->>0 = 'true')
                 , ' FROM ', _tbl
                 , ' WHERE ' || string_agg(
                       CASE
                          WHEN (e.arr->>1)::int BETWEEN 1 AND 10 THEN
                             format('%s %s %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2)
                          WHEN (e.arr->>1)::int BETWEEN 11 AND 12 THEN
                             format('%s %s %L AND %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2, e.arr->>3)
                          WHEN (e.arr->>1)::int BETWEEN 13 AND 14 THEN
                             format('%s %s (%s)', e.col, _oper[(e.arr->>1)::int],
                                    (SELECT string_agg(format('%L', ee), ',') FROM json_array_elements_text(e.arr->2) ee))
                       END, ' AND ')
                 , ' OFFSET ' || (_opts->>'_offset')::int
                 , ' LIMIT '  || (_opts->>'_limit')::int
                 , ') t'
                 )
        FROM json_each(_opts) WITH ORDINALITY e(col, arr, ord)
        WHERE e.col = ANY(_cols)
    ) INTO _data;
END;
$$;
Then, for each table or view, I create a wrapper function executable by certain roles.
CREATE FUNCTION public.goods__select (IN _opts json, OUT _data jsonb)
LANGUAGE sql STRICT SECURITY DEFINER AS
$$
SELECT public.crud__select(
'public.goods_full',
ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'],
_opts
);
$$;
SELECT public.goods__select('{"_limit": 10, "name": [true, 9, "a%"], "id__category": [true, 13, [1, 2]], "category": [true]}'::json);
[{"name": "Atlantis II", "category": "Atari 2600", "id__category": 1}, .. , {"name": "Amidar", "category": "Atari 2600", "id__category": 1}]
I wish to extract a number only if it is between 8 and 12 digits long; otherwise I want a blank. The column can also contain text, which I also want returned as a blank.
I have tried many variations of the code below with different brackets, but I keep getting an error.
SELECT CASE WHEN
isnumeric(dbo.worksheet_pvt.MPRNExpected) = 0 THEN '' ELSE(
CASE WHEN(
len(dbo.worksheet_pvt.MPRNExpected) >= 8
AND len(dbo.worksheet_pvt.MPRNExpected) < 13
) THEN dbo.worksheet_pvt.MPRNExpected ELSE ''
END
) AS [ MPRN Expected ]
Assuming you are using SQL Server, I would suggest:
select (case when p.MPRNExpected not like '%[^0-9]%' and
len(p.MPRNExpected) between 8 and 12
then p.MPRNExpected
end) as MPRN_Expected
. . .
from dbo.worksheet_pvt p
Presumably, you don't want isnumeric(), because it allows characters such as '.', '-', and 'e' in the "number".
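For example (SQL Server), ISNUMERIC() happily accepts values that are not plain digit strings, which is why the NOT LIKE '%[^0-9]%' test above is the safer check:
SELECT ISNUMERIC('1e5')    AS scientific,  -- 1
       ISNUMERIC('12,345') AS with_comma,  -- 1
       ISNUMERIC('1.5')    AS with_dot;    -- 1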
The problem with your code is that you have two case expressions and they are not terminated correctly.
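For reference, here is a sketch of the original expression with both CASE expressions properly terminated (keeping the question's isnumeric() check):
SELECT CASE WHEN isnumeric(p.MPRNExpected) = 0 THEN ''
            ELSE CASE WHEN len(p.MPRNExpected) >= 8 AND len(p.MPRNExpected) < 13
                      THEN p.MPRNExpected
                      ELSE ''
                 END
       END AS [MPRN Expected]
FROM dbo.worksheet_pvt p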
As a note, in MySQL, you would use regular expressions:
select (case when p.MPRNExpected regexp '^[0-9]{8,12}$'
then p.MPRNExpected
end) as MPRN_Expected
. . .
from dbo.worksheet_pvt p
Try this query
SELECT CASE WHEN ISNUMERIC(COLUMN_NAME) = 0 THEN ''
            ELSE CASE WHEN LEN(COLUMN_NAME) >= 8 AND LEN(COLUMN_NAME) < 13
                      THEN COLUMN_NAME
                      ELSE ''
                 END
       END AS data
FROM TABLE_NAME
Thanks.
I have the following query. According to http://dev.mysql.com/doc/refman/5.0/en/case.html, NULL = NULL does not evaluate to true, so the first WHEN branch will never match. What would be an alternative query to accomplish this?
SELECT CASE c1
WHEN NULL THEN CONCAT("Hi",name," Your value is NULL")
WHEN "v1" THEN CONCAT("Hello ",name," The value is ",
CASE c2
WHEN "v11" THEN "bla"
WHEN "v12" THEN "bla bla"
END
,")")
WHEN "v2" THEN CONCAT("Howdy there ",name," Yep, the value is ",
CASE c3
WHEN "v21" THEN "beebop"
WHEN "v22" THEN "bopbee"
END
,")")
END
AS myLabel
FROM mytable
WHERE bla="bla";
So use the WHEN <condition> form of CASE:
SELECT CASE WHEN c1 is NULL THEN CONCAT('Hi', name, ' Your value is NULL')
WHEN c1 = 'v1'
THEN CONCAT('Hello ', name, ' The value is ',
(CASE WHEN c2 = 'v11' THEN 'bla'
WHEN c2 = 'v12' THEN 'bla bla'
END), ')'
)
WHEN c1 = 'v2'
THEN CONCAT('Howdy there ', name, ' Yep, the value is ',
(CASE WHEN c3 = 'v21' THEN 'beebop'
WHEN c3 = 'v22' THEN 'bopbee'
END), ')'
)
END
FROM mytable
WHERE bla = 'bla';
Also, use single quotes for string (and date) constants. Don't use them for anything else, such as quoting identifiers.
An alternative would be to use COALESCE to get a default value when the field is NULL, and then compare against that default where you are currently comparing with NULL, like so:
SELECT CASE COALESCE(c1,'')
WHEN '' THEN CONCAT("Hi",name," Your value is NULL")
Of course, '' may be a legitimate value on its own, in which case you need to use another value as the default. Owing to this, I would personally recommend Gordon's solution as the way to go.
I have a query that I am trying to convert from MS SQL Server 2008 to MySQL. It runs fine on MSSQL, but in MySQL I get the error
"Incorrect parameter count in the call to native function 'ISNULL'".
How do I solve this?
SELECT DISTINCT
dbo.`#EIM_PROCESS_DATA`.U_Tax_year,
dbo.`#EIM_PROCESS_DATA`.U_Employee_ID,
CASE
WHEN dbo.`#EIM_PROCESS_DATA`.U_PD_code = 'SYS033' THEN SUM(dbo.`#EIM_PROCESS_DATA`.U_Amount)
END AS PAYE,
CASE
WHEN dbo.`#EIM_PROCESS_DATA`.U_PD_code = 'SYS014' THEN SUM(dbo.`#EIM_PROCESS_DATA`.U_Amount)
END AS TOTALTAXABLE,
dbo.OADM.CompnyName,
dbo.OADM.CompnyAddr,
dbo.OADM.TaxIdNum,
dbo.OHEM.lastName + ', ' + ISNULL(dbo.OHEM.middleName, '') + '' + ISNULL(dbo.OHEM.firstName, '') AS EmployeeName
FROM
dbo.`#EIM_PROCESS_DATA`
INNER JOIN
dbo.OHEM ON dbo.`#EIM_PROCESS_DATA`.U_Employee_ID = dbo.OHEM.empID
CROSS JOIN
dbo.OADM
GROUP BY dbo.`#EIM_PROCESS_DATA`.U_Tax_year , dbo.`#EIM_PROCESS_DATA`.U_Employee_ID , dbo.OADM.CompnyName , dbo.OADM.CompnyAddr , dbo.OADM.TaxIdNum , dbo.OHEM.lastName , dbo.OHEM.firstName , dbo.OHEM.middleName , dbo.`#EIM_PROCESS_DATA`.U_PD_code
MySQL
SELECT DISTINCT
processdata.taxYear, processdata.empID,
CASE WHEN processdata.edCode = 'SYS033' THEN SUM (processdata.amount) END AS PAYE,
CASE WHEN processdata.edCode = 'SYS014' THEN SUM (processdata.amount) END AS TOTALTAXABLE,
company.companyName, company.streetAddress, company.companyPIN, employeemaster.lastName + ', ' + IFNULL(employeemaster.middleName, '')
+ ' ' + IFNULL(employeemaster.firstName, '') AS EmployeeName
FROM
processdata INNER JOIN
employeemaster ON processdata.empID = employeemaster.empID
CROSS JOIN company
GROUP BY processdata.taxYear, processdata.empID, company.companyName, company.streetAddress, company.companyPIN,
employeemaster.lastName, employeemaster.firstName, employeemaster.middleName, processdata.edCode
The MySQL equivalent of ISNULL is IFNULL
If expr1 is not NULL, IFNULL() returns expr1; otherwise it returns
expr2.
Maybe also look at SQL NULL Functions
The ISNULL from MySQL is used to check if a value is null
If expr is NULL, ISNULL() returns 1, otherwise it returns 0.
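A quick illustration of the difference in MySQL:
SELECT IFNULL(NULL, 'fallback');    -- 'fallback' : two arguments, returns the second when the first is NULL
SELECT IFNULL('abc', 'fallback');   -- 'abc'
SELECT ISNULL(NULL), ISNULL('abc'); -- 1, 0       : one argument, only tests for NULL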
I would suggest that you switch to the ANSI standard function coalesce():
(dbo.OHEM.lastName + ', ' + coalesce(dbo.OHEM.middleName, '') + '' + coalesce(dbo.OHEM.firstName, '')
) AS EmployeeName
You could also make your query easier to read by including table aliases.
EDIT:
As a note, I seem to have missed the direction of the conversion. The MySQL query would use concat():
CONCAT(OHEM.lastName, ', ', coalesce(OHEM.middleName, ''),
coalesce(concat(' ', OHEM.firstName), '')
) AS EmployeeName
I was getting an error when running JUnit tests against a query that used ISNULL(value); the error said ISNULL needed two parameters. I fixed this by changing the query to use value IS NULL; the code behaves the same and the tests now pass.
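In other words, a change along these lines (table and column names are made up for illustration):
-- before: the test database insists on the two-argument ISNULL(expr, replacement)
SELECT * FROM t WHERE ISNULL(col);
-- after: standard SQL, behaves the same here
SELECT * FROM t WHERE col IS NULL;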