After reading Postgres manual and many posts here, I wrote this function tacking in mind all I found regarding security. It works great and does everything I looked for.
takes a json, each key has an array [visible, filter, arg1, optional arg2]
SELECT public.goods__list_json2('{"name": [true, 7, "Ad%"], "category": [true], "stock": [false, 4, 0]}', 20, 0);
returns a json array with requested data.
[{"name": "Adventures of TRON", "category": "Atari 2600"}, {"name": "Adventure", "category": "Atari 2600"}]
My question is, how could I really be sure that when I create the query using user input arguments, passing them as %L with format is injection safe?
By my db design, all is done through functions, running most of them as security definer only allowing certain roles to execute them.
Being secure, my intention is to convert old functions to this dynamic logic and save myself to write a lot of lines of code creating new or specific queries.
I would really appreciate a experienced Postgres developer could give me an advice on this.
I'm using Postgres 13.
CREATE FUNCTION public.goods__list_json (IN option__j jsonb, IN limit__i integer, IN offset__i integer)
RETURNS jsonb
LANGUAGE plpgsql
VOLATILE
STRICT
SECURITY DEFINER
COST 1
AS $$
DECLARE
table__v varchar := 'public.goods_full';
column__v varchar[] := ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'];
filter__v varchar[] := ARRAY['<', '>', '<=', '>=', '=', '<>', 'LIKE', 'NOT LIKE', 'ILIKE', 'NOT ILIKE', 'BETWEEN', 'NOT BETWEEN'];
select__v varchar[];
where__v varchar[];
sql__v varchar;
key__v varchar;
format__v varchar;
temp__v varchar;
temp__i integer;
betw__v varchar;
result__j jsonb;
BEGIN
FOR key__v IN SELECT jsonb_object_keys(option__j) LOOP
IF key__v = ANY(column__v) THEN
IF (option__j->key__v->0)::bool THEN
select__v := array_append(select__v, key__v);
END IF;
temp__i := (option__j->key__v->1)::int;
IF temp__i > 0 AND temp__i <= array_length(filter__v, 1) THEN
temp__v := (option__j->key__v->>2)::varchar;
IF temp__i >= 11 THEN
betw__v := (option__j->key__v->>3)::varchar;
format__v := format('%I %s %L AND %L', key__v, filter__v[temp__i], temp__v, betw__v);
ELSE
format__v := format('%I %s %L', key__v, filter__v[temp__i], temp__v);
END IF;
where__v := array_append(where__v, format__v);
END IF;
END IF;
END LOOP;
sql__v := 'SELECT jsonb_agg(t) FROM (SELECT '
|| array_to_string(select__v, ', ')
|| format(' FROM %s WHERE ', table__v)
|| array_to_string(where__v, ' AND ')
|| format(' OFFSET %L LIMIT %L', offset__i, limit__i)
|| ') t';
RAISE NOTICE 'SQL: %', sql__v;
EXECUTE sql__v INTO result__j;
RETURN result__j;
END;
$$;
A word of warning: this style with dynamic SQL in SECURITY DEFINER functions can be elegant and convenient. But don't overuse it. Do not nest multiple levels of functions this way:
The style is much more error prone than plain SQL.
The context switch with SECURITY DEFINER has a price tag.
Dynamic SQL with EXECUTE cannot save and reuse query plans.
No "function inlining".
And I'd rather not use it for big queries on big tables at all. The added sophistication can be a performance barrier. Like: parallelism is disabled for query plans this way.
That said, your function looks good, I see no way for SQL injection. format() is proven good to concatenate and quote values and identifiers for dynamic SQL. On the contrary, you might remove some redundancy to make it cheaper.
Function parameters offset__i and limit__i are integer. SQL injection is impossible through integer numbers, there is really no need to quote them (even though SQL allows quoted string constants for LIMIT and OFFSET). So just:
format(' OFFSET %s LIMIT %s', offset__i, limit__i)
Also, after verifying that each key__v is among your legal column names - and while those are all legal, unquoted column names - there is no need to run it through %I. Can just be %s
I'd rather use text instead of varchar. Not a big deal, but text is the "preferred" string type.
Related:
Format specifier for integer variables in format() for EXECUTE?
Function to return dynamic set of columns for given table
COST 1 seems too low. The manual:
COST execution_cost
A positive number giving the estimated execution cost for the
function, in units of cpu_operator_cost. If the function
returns a set, this is the cost per returned row. If the cost is not
specified, 1 unit is assumed for C-language and internal functions,
and 100 units for functions in all other languages. Larger values
cause the planner to try to avoid evaluating the function more often
than necessary.
Unless you know better, leave COST at its default 100.
Single set-based operation instead of all the looping
The whole looping can be replaced with a single SELECT statement. Should be noticeably faster. Assignments are comparatively expensive in PL/pgSQL. Like this:
CREATE OR REPLACE FUNCTION goods__list_json (_options json, _limit int = NULL, _offset int = NULL, OUT _result jsonb)
RETURNS jsonb
LANGUAGE plpgsql SECURITY DEFINER AS
$func$
DECLARE
_tbl CONSTANT text := 'public.goods_full';
_cols CONSTANT text[] := '{id, id__category, category, name, barcode, price, stock, sale, purchase}';
_oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN"}';
_sql text;
BEGIN
SELECT concat('SELECT jsonb_agg(t) FROM ('
, 'SELECT ' || string_agg(t.col, ', ' ORDER BY ord) FILTER (WHERE t.arr->>0 = 'true')
-- ORDER BY to preserve order of objects in input
, ' FROM ' || _tbl
, ' WHERE ' || string_agg (
CASE WHEN (t.arr->>1)::int BETWEEN 1 AND 10 THEN
format('%s %s %L' , t.col, _oper[(arr->>1)::int], t.arr->>2)
WHEN (t.arr->>1)::int BETWEEN 11 AND 12 THEN
format('%s %s %L AND %L', t.col, _oper[(arr->>1)::int], t.arr->>2, t.arr->>3)
-- ELSE NULL -- = default - or raise exception for illegal operator index?
END
, ' AND ' ORDER BY ord) -- ORDER BY only cosmetic
, ' OFFSET ' || _offset -- SQLi-safe, no quotes required
, ' LIMIT ' || _limit -- SQLi-safe, no quotes required
, ') t'
)
FROM json_each(_options) WITH ORDINALITY t(col, arr, ord)
WHERE t.col = ANY(_cols) -- only allowed column names - or raise exception for illegal column?
INTO _sql;
IF _sql IS NULL THEN
RAISE EXCEPTION 'Invalid input resulted in empty SQL string! Input: %', _options;
END IF;
RAISE NOTICE 'SQL: %', _sql;
EXECUTE _sql INTO _result;
END
$func$;
db<>fiddle here
Shorter, faster and still safe against SQLi.
Quotes are only added where necessary for syntax or to defend against SQL injection. Burns down to filter values only. Column names and operators are verified against the hard-wired list of allowed options.
Input is json instead of jsonb. Order of objects is preserved in json, so you can determine the sequence of columns in the SELECT list (which is meaningful) and WHERE conditions (which is purely cosmetic). The function observes both now.
Output _result is still jsonb. Using an OUT parameter instead of the variable. That's totally optional, just for convenience. (No explicit RETURN statement required.)
Note the strategic use of concat() to silently ignore NULL and the concatenation operator || so that NULL makes the concatenated string NULL. This way, FROM, WHERE, LIMIT, and OFFSET are only inserted where needed. A SELECT statement works without either of those. An empty SELECT list (also legal, but I suppose unwanted) results in a syntax error. All intended.
Using format() only for WHERE filters, for convenience and to quote values. See:
String concatenation using operator "||" or format() function
The function isn't STRICT anymore. _limit and _offset have default value NULL, so only the first parameter _options is required. _limit and _offset can be NULL or omitted, then each is stripped from the statement.
Using text instead of varchar.
Made constant variables actually CONSTANT (mostly for documentation).
Other than that the function does what your original does.
I tried to put all that I learned here and I came up with this below and new questions =D.
Is there any advantage declaring _oper this way '{LIKE, "NOT LIKE"}' instead of ARRAY['LIKE', 'NOT LIKE']?
Casting as int _limit and _offset, I'm assuming no SQLi, right?
Is it an elegant way for 'IN' and 'NOT IN' CASE? I wonder why string_agg() is allowed nested in concat() but not there where I needed to use a sub query.
This is a naive private function.
Edit: Removed "SECURITY DEFINER" as identified dangerous.
CREATE FUNCTION public.crud__select (IN _tbl text, IN _cols text[], IN _opts json, OUT _data jsonb)
LANGUAGE plpgsql STRICT AS
$$
DECLARE
_oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN", IN, "NOT IN"}';
BEGIN
EXECUTE (
SELECT concat('SELECT jsonb_agg(t) FROM ('
, 'SELECT ' || string_agg(e.col, ', ' ORDER BY ord) FILTER (WHERE e.arr->>0 = 'true')
, ' FROM ', _tbl
, ' WHERE ' || string_agg(
CASE
WHEN (e.arr->>1)::int BETWEEN 1 AND 10 THEN
format('%s %s %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2)
WHEN (e.arr->>1)::int BETWEEN 11 AND 12 THEN
format('%s %s %L AND %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2, e.arr->>3)
WHEN (e.arr->>1)::int BETWEEN 13 AND 14 THEN
format('%s %s (%s)', e.col, _oper[(e.arr->>1)::int], (
SELECT string_agg(format('%L', ee), ',') FROM json_array_elements_text(e.arr->2) ee)
)
END, ' AND ')
, ' OFFSET ' || (_opts->>'_offset')::int
, ' LIMIT ' || (_opts->>'_limit')::int
, ') t'
)
FROM json_each(_opts) WITH ORDINALITY e(col, arr, ord)
WHERE e.col = ANY(_cols)
) INTO _data;
END;
$$;
Then for table or view, I create wrapper function executable for some roles.
CREATE FUNCTION public.goods__select (IN _opts json, OUT _data jsonb)
LANGUAGE sql STRICT SECURITY DEFINER AS
$$
SELECT public.crud__select(
'public.goods_full',
ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'],
_opts
);
$$;
SELECT public.goods__select('{"_limit": 10, "name": [true, 9, "a%"], "id__category": [true, 13, [1, 2]], "category": [true]}'::json);
[{"name": "Atlantis II", "category": "Atari 2600", "id__category": 1}, .. , {"name": "Amidar", "category": "Atari 2600", "id__category": 1}]
I have created the below code:
with t as (select *,
case
when `2kids`= '1' then '2kids' else'' end as new_2kids,
case
when `3kids`= '1' then '3kids' else'' end as new_3kids,
case
when kids= '1' then 'kids' else'' end as kids
from test.family)
select concat_ws('/',new_2kids, new_3kids, new_kids) as 'nc_kids'
from t;
If I run this query my output will be:
nc_kids
2kids/new_3kids/
2kids//
/new_3kids/new_kids
2kids/new_3kids/new_kids
How can I remove all the unnecessary '/' which not followed by character.
For example:
nc_kids
2kids/new_3kids
2kids
new_3kids/new_kids
2kids/new_3kids/new_kids
concat_ws() ignore nulls, so you can just turn the empty strings to null values at concatenation time:
select concat_ws('/',
nullif(new_2kids, ''),
nullif(new_3kids, ''),
nullif(new_kids, '')
) as nc_kids
from t;
Better yet, fix the case expressions so they produce null values instead of empty stings in the first place:
with t as (
select f.*,
case when `2kids`= 1 then '2kids' end as new_2kids,
case when `3kids`= 1 then '3kids' end as new_3kids,
case when kids = 1 then 'kids' end as kids
from test.family f
)
select concat_ws('/',new_2kids, new_3kids, new_kids) as nc_kids
from t;
We have x2 columns min and max. Each can be null or integer. When we start search throw table we cannot use BETWEEN command... Question is, how to find in range with this conditions
value is greater then min (if it's not null)
and
value is less then max (if it's not null)
and
value is in range of min and max (if they BOTH not null)
value - our integer number. As you can see we cannot use BETWEEN command.
So NULL means no limit. You can still use BETWEEN:
select *
from mytable
where #value between coalesce(minvalue, #value) and coalesce(maxvalue, #value);
Or simply AND:
select *
from mytable
where #value >= coalesce(minvalue, #value)
and #value <= coalesce(maxvalue, #value);
Or the very basic AND and OR:
select *
from mytable
where (#value >= minvalue or minvalue is null)
and (#value <= maxvalue or maxvalue is null);
Use this:
WHERE col BETWEEN COALESCE(min, -2147483648) AND COALESCE(max, 2147483647)
According to your logic, if either the min or max be NULL, then the restriction should be ignored. In the above WHERE clause, if min be NULL then col will always be greater than the lower boundary, assuming that col is an integer. Similar logic applies to the max condition.
The large (and small) numbers you see represent the largest and smallest possible values for an integer in MySQL.
Without the option of using BETWEEN, I would recommend using a simple WHERE-AND clause.
If null values are not allowed, you should use the COALESCE function
http://dev.mysql.com/doc/refman/5.7/en/comparison-operators.html#function_coalesce
Returns the first non-NULL value in the list, or NULL if there are no non-NULL values.
SELECT *
FROM SCORES
WHERE score >= COALESCE(min_score, score)
AND score <= COALESCE(max_score, score)
Here is a sample fiddle I created
http://sqlfiddle.com/#!9/306947/2/0
My solution Yii2 AR like
$query
->joinWith(['vacancySalary'])
->andWhere([
'and',
'IF (vacancy_salary.min IS NULL, ' . $this->salaryMin . ', vacancy_salary.min) >= ' . $this->salaryMin,
'IF (vacancy_salary.max IS NULL, ' . $this->salaryMin . ', vacancy_salary.max) <= ' . $this->salaryMin
]);
Simple answer is use IF condition and proper values.
ADDED:
Another way to go
$query
->joinWith(['vacancySalary'])
->andWhere($this->salaryMin . ' BETWEEN IF(vacancy_salary.min IS NULL, 0, vacancy_salary.min) AND IF(vacancy_salary.max IS NULL, 0, vacancy_salary.max)');
Is it possible to sum the digits in a string and sort by that?
Example values: 19, 21
19 Should be transformed to 10. Explanation: 1+9=10
21 Should be transformed to 3. Explanation: 2+1= 3
After calculating these results, the table needs to be sorted by the resulting values (using SORT BY).
Originally, I have those values stored as JSON array, so it's ["1","9"] and ["2","1"], and in order to parse the JSON I'm using replace as follows:
REPLACE(REPLACE(REPLACE(item_qty, '["', ''), '"]', ''), '","', '')
How about trying something like:
SELECT (
SUBSTRING('["1","9"]', 3, 3) +
SUBSTRING('["1","9"]', 7, 7)
) AS sumOfDigits;
And then if the value ["1","9"] is stored in a column named json and the table is named table you van do:
SELECT * FROM (
SELECT table.*, (
SUBSTRING('json', 3, 3) +
SUBSTRING('json', 7, 7)
) AS sumOfDigits
FROM table
) tmp
ORDER BY sumOfDigits;
I would define a function to do the sum, as follow:
DELIMITER //
CREATE FUNCTION add_digits
(
number INTEGER
) RETURNS INTEGER
BEGIN
DECLARE my_sum INTEGER;
SET my_sum = 0;
SET number = ABS(number);
WHILE (number > 0) DO
SET my_sum = my_sum + (number MOD 10);
SET number = number DIV 10;
END WHILE;
RETURN my_sum;
END //
DELIMITER ;
You could also create a function that works directly on your json sting, parsing it for digits and adding their values.
I have a table which looks like this: http://i.stack.imgur.com/EyKt3.png
And I want a result like this:
Conditon COL
ted1 4
ted2 1
ted3 2
I.e., the count of the number of '1' only in this case.
I want to know the total no. of 1's only (check the table), neglecting the 0's. It's like if the condition is true (1) then count +1.
Also consider: what if there are many columns? I want to avoid typing expressions for every single one, like in this case ted1 to ted80.
Using proc means is the most efficient method:
proc means data=have noprint;
var ted:; *captures anything that starts with Ted;
output out=want sum =;
run;
proc print data=want;
run;
Try this
select
sum(case when ted1=1 then 1 else 0 end) as ted1,
sum(case when ted2=1 then 1 else 0 end) as ted2,
sum(case when ted3=1 then 1 else 0 end) as ted3
from table
In PostgreSQL (tested with version 9.4) you could unpivot with a VALUES expression in a LATERAL subquery. You'll need dynamic SQL.
This works for any table with any number of columns matching any pattern as long as selected columns are all numeric or all boolean. Only the value 1 (true) is counted.
Create this function once:
CREATE OR REPLACE FUNCTION f_tagcount(_tbl regclass, col_pattern text)
RETURNS TABLE (tag text, tag_ct bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT
'SELECT l.tag, count(l.val::int = 1 OR NULL)
FROM ' || _tbl || ', LATERAL (VALUES '
|| string_agg( format('(%1$L, %1$I)', attname), ', ')
|| ') l(tag, val)
GROUP BY 1
ORDER BY 1'
FROM pg_catalog.pg_attribute
WHERE attrelid = _tbl
AND attname LIKE col_pattern
AND attnum > 0
AND NOT attisdropped
);
END
$func$;
Call:
SELECT * FROM f_tagcount('tbl', 'ted%');
Result:
tag | tag_ct
-----+-------
ted1 | 4
ted2 | 1
ted3 | 2
The 1st argument is a valid table name, possibly schema-qualified. Defense against SQL-injection is built into the data type regclass.
The 2nd argument is a LIKE pattern for the column names. Hence the wildcard %.
db<>fiddle here
Old sqlfiddle
Related:
Select columns with particular column names in PostgreSQL
SELECT DISTINCT on multiple columns