Optimize escape JSON in PostgreSQL 9.0 - json

I'm currently using this JSON escaping function in PostgreSQL as a stand-in for future native JSON support. While it works, it's also limiting our system's performance. How can I go about optimizing it? Maybe some kind of lookup array?
CREATE OR REPLACE FUNCTION escape_json(i_text TEXT)
RETURNS TEXT AS
$body$
DECLARE
    idx INTEGER;
    text_len INTEGER;
    cur_char_unicode INTEGER;
    rtn_value TEXT := i_text;
BEGIN
    -- $Rev: $ --
    text_len = LENGTH(rtn_value);
    idx = 1;
    WHILE (idx <= text_len) LOOP
        cur_char_unicode = ASCII(SUBSTR(rtn_value, idx, 1));
        IF cur_char_unicode > 255 THEN
            rtn_value = OVERLAY(rtn_value PLACING (E'\\u' || LPAD(UPPER(TO_HEX(cur_char_unicode)),4,'0')) FROM idx FOR 1);
            idx = idx + 5;
            text_len = text_len + 5;
        ELSE
            /* is the current character one of the following: " \ / bs ff nl cr tab */
            IF cur_char_unicode IN (34, 92, 47, 8, 12, 10, 13, 9) THEN
                rtn_value = OVERLAY(rtn_value PLACING (E'\\' || (CASE cur_char_unicode
                        WHEN 34 THEN '"'
                        WHEN 92 THEN E'\\'
                        WHEN 47 THEN '/'
                        WHEN 8 THEN 'b'
                        WHEN 12 THEN 'f'
                        WHEN 10 THEN 'n'
                        WHEN 13 THEN 'r'
                        WHEN 9 THEN 't'
                    END)
                )
                FROM idx FOR 1);
                idx = idx + 1;
                text_len = text_len + 1;
            END IF;
        END IF;
        idx = idx + 1;
    END LOOP;
    RETURN rtn_value;
END;
$body$
LANGUAGE plpgsql;

Confession: I am the Google Summer of Code 2010 student who was going to try to bring JSON support to PostgreSQL 9.1. Although my code was fairly feature-complete, it wasn't completely ready for upstream, and the PostgreSQL development community was looking at some alternative implementations. However, with spring break coming up, I'm hoping to finish my rewrite and give it a final push this week.
In the meantime, you can download and install the work-in-progress JSON data type module, which should work on PostgreSQL 8.4.0 and up. It is a PGXS module, so you can compile and install it without having to compile all of PostgreSQL. However, you will need the PostgreSQL server development headers.
Installation goes something like this:
git clone git://git.postgresql.org/git/json-datatype.git
cd json-datatype/
USE_PGXS=1 make
sudo USE_PGXS=1 make install
psql -f json.sql <DBNAME1> # requires database superuser privileges
Although the build and install only needs to be done once, json.sql needs to be run on every database you plan to use the JSON data type on.
With that installed, you can now run:
=> SELECT to_json(E'"quotes and \n newlines"\n'::TEXT);
to_json
--------------------------------
"\"quotes and \n newlines\"\n"
(1 row)
Note that this does not escape non-ASCII characters.

All my approaches boil down to "do it some other way":
Write it in some other language, e.g. use pl/perl, pl/python, pl/ruby
Write a wrapper round some external JSON library written in C
Do the JSON escaping in the client rather than in the query (assuming your client has some good JSON escaping support)
In my experience, pl/pgsql isn't fast at this sort of thing; its strength is in its integral support for exchanging data with the database, not as a general-purpose programming language.
Example:
create or replace function escape_json_perl(text) returns text
strict immutable
language plperlu as $$
use JSON;
return JSON->new->allow_nonref->encode($_[0]);
$$;
A quick test suggests this is on the order of 15x faster than the plpgsql function (although it returns quotes around the value, which you probably want to strip off).
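For comparison, here is a rough Python sketch of the same "hand the work to a real JSON encoder" idea, including the quote stripping. It is only an illustration: the name escape_json is reused from the question, and using this as the body of a PL/Python function is an assumption, not something timed here. Note that Python's encoder does not escape "/", unlike the original plpgsql function.

```python
import json


def escape_json(text: str) -> str:
    """Escape text so it can be placed inside a JSON string literal."""
    # json.dumps wraps the result in double quotes; strip them so the
    # return value matches what the plpgsql escape_json() produced.
    # ensure_ascii=True turns every non-ASCII character into a \uXXXX escape.
    return json.dumps(text, ensure_ascii=True)[1:-1]


if __name__ == "__main__":
    print(escape_json('"quotes and \n newlines"\n'))  # \"quotes and \n newlines\"\n
    print(escape_json("ł"))                           # \u0142
```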

I have found a PostgreSQL function implemented in C here : http://code.google.com/p/pg-to-json-serializer/
I have not compared it with your PL/pgSQL method, but it should be faster than any interpreted language.
Another one : http://miketeo.net/wp/index.php/projects/json-functions-for-postgresql

Related

How to create a json file from postgres table with parameters as a filename

Is there any way for me to create a JSON file from a Postgres table?
I was able to create one using COPY ... TO, but when I do the same inside a Postgres function that is going to be scheduled, it is unable to capture the parameters as the file name:
copy v_originaltext to '/var/lib/postgresql/sftp/smrptesting1.json';
You could likely accomplish what you're looking to do in a couple of different ways, though both would require you to GRANT a combination of pg_write_server_files and/or pg_execute_server_program to the executing role/user.
Method 1: Using Dynamic SQL (with pg_write_server_files)
Here, you'd build on your initial plan of using the COPY command, changing
COPY v_originaltext to '/var/lib/postgresql/sftp/smrptesting1.json';
to instead DECLARE a SQL string variable (v_sql) within the function block that you'd then EXECUTE, i.e. something like
CREATE OR REPLACE FUNCTION json_putter(_parameters TEXT)
RETURNS VOID
LANGUAGE plpgsql
AS $function$
DECLARE
    v_sql TEXT DEFAULT NULL;
    v_originaltext JSONB;
    v_filename TEXT;
BEGIN
    -- < ... code to populate v_originaltext here ... >
    v_filename := '/var/lib/postgresql/sftp/' || _parameters || '.json';
    -- COPY is a utility command, so it cannot take $1/$2 parameters through
    -- EXECUTE ... USING, and PL/pgSQL variables are not visible inside the
    -- dynamic string. Interpolate both values as quoted literals instead.
    v_sql := format('COPY (SELECT %L::jsonb) TO %L', v_originaltext::text, v_filename);
    EXECUTE v_sql;
END
$function$;
which will build up the COPY statement as if you were manually running from the client.
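The key point is that the statement text reaching the server has to contain both the JSON value and the file name as quoted literals. A rough Python sketch of that string-building step follows; quote_literal and build_copy_statement are hypothetical helpers written for this illustration, and quote_literal is a simplified stand-in for PostgreSQL's function of the same name (it only doubles single quotes and ignores backslash handling).

```python
def quote_literal(value: str) -> str:
    """Simplified stand-in for PostgreSQL's quote_literal():
    wrap in single quotes and double any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_statement(json_text: str, parameters: str) -> str:
    """Build the COPY statement text for a given JSON value and file name part."""
    filename = "/var/lib/postgresql/sftp/" + parameters + ".json"
    return "COPY (SELECT {}::jsonb) TO {}".format(
        quote_literal(json_text), quote_literal(filename)
    )


if __name__ == "__main__":
    print(build_copy_statement('{"a": "it\'s"}', "smrptesting1"))
```

Because the parameter ends up in a server-side path, it is probably worth validating it (for example, rejecting "/" and "..") before building the file name.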
Method 2: Using server-side psql command (with pg_execute_server_program)
You can output the JSON directly from the psql command and redirect stdout to a file whose name you generate from the parameters.
To implement this, you'd probably want to create two atomic functions for simplicity's sake:
One that acts as a wrapper to generate the value of v_originaltext [e.g. fx_originaltext(_parameters => text)]
One that will run a generic command via psql using the COPY ... FROM PROGRAM utility and consumes the stdout with a temp table or similar:
--
CREATE OR REPLACE FUNCTION psql_runner(_function TEXT, _parameters TEXT)
RETURNS VOID
LANGUAGE plpgsql
AS $function$
DECLARE
    v_filename TEXT;
BEGIN
    v_filename := '/var/lib/postgresql/sftp/' || _parameters || '.json';
    CREATE TEMP TABLE IF NOT EXISTS _out ( stdout text );
    -- COPY ... FROM PROGRAM only accepts a string literal as the command,
    -- so the statement is built and run as dynamic SQL.
    EXECUTE format(
        'COPY _out FROM PROGRAM %L',
        'psql -tq -c "SELECT ' || _function || '(_parameters := ' || quote_literal(_parameters) || ')" > ' || v_filename
    );
END
$function$;
Notably, the second method requires a broader set of privileges, and because the command string is built at run time, you may need to execute the COPY as dynamic SQL, as in the first method.

unescape diacritics in \u0 format (json) in ms sql (SQL Server)

I'm getting a JSON file, which I load into an Azure SQL database. This JSON is the direct output of an API, so there is nothing I can do with it before loading it into the DB.
In that file, all Polish diacritics are escaped in the "C/C++/Java source code" form (based on: http://www.fileformat.info/info/unicode/char/0142/index.htm).
So for example:
ł is \u0142
I was trying to find some method to convert (unescape) those to proper Polish letters.
In the worst-case scenario, I can write a function which will replace all the combinations:
REPLACE(REPLACE(string, '\u0142', N'ł'), '\u0144', N'ń')
And so on, making one big, terrible function...
I was looking for a ready-made function like the ones for URL decoding, which have been answered here on Stack in many topics, and here: https://www.codeproject.com/Articles/1005508/URL-Decode-in-T-SQL
Using this solution would be possible, but I cannot figure out the cast/convert with the proper collation and types to get the result I'm looking for.
So if anyone knows of or has a function that unescapes those \u sequences in a string, that would be great, but I will manage to write something on my own if I can get the conversion right. For example I tried:
select convert(nvarchar(1), convert(varbinary, 0x0142, 1))
I assumed that changing \u to 0x would be the answer, but it gives some Chinese characters. So this is the wrong direction...
Edit:
After googling more, I found exactly the same question here on Stack from @Pasetchnik: Json escape unicode in SQL Server
And it looks like this is the best solution there is in MS SQL.
The only thing I needed to change was using NVARCHAR instead of the VARCHAR in the linked solution:
CREATE FUNCTION dbo.Json_Unicode_Decode(@escapedString NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @pos INT = 0,
            @char NVARCHAR,
            @escapeLen TINYINT = 2,
            @hexDigits TINYINT = 4
    SET @pos = CHARINDEX('\u', @escapedString, @pos)
    WHILE @pos > 0
    BEGIN
        SET @char = NCHAR(CONVERT(varbinary(8), '0x' + SUBSTRING(@escapedString, @pos + @escapeLen, @hexDigits), 1))
        SET @escapedString = STUFF(@escapedString, @pos, @escapeLen + @hexDigits, @char)
        SET @pos = CHARINDEX('\u', @escapedString, @pos)
    END
    RETURN @escapedString
END
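To make the logic easy to check outside the database, here is a rough Python sketch of the same replacement loop, plus a demonstration of why the earlier varbinary-to-nvarchar cast produced a Chinese character: NVARCHAR stores UTF-16 little-endian, so the bytes 0x01 0x42 are read as code point U+4201 rather than U+0142. The function name json_unicode_decode is made up for this example, and the sketch does not combine surrogate pairs or treat an escaped backslash (\\u) specially.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def json_unicode_decode(escaped: str) -> str:
    """Replace every \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


if __name__ == "__main__":
    print(json_unicode_decode(r"\u0142 \u0144\u0142"))  # ł ńł

    # Byte order matters: the same two bytes give different characters.
    print(bytes.fromhex("0142").decode("utf-16-le"))  # U+4201, a CJK character
    print(bytes.fromhex("0142").decode("utf-16-be"))  # ł
```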
Instead of nested REPLACE you could use:
DECLARE @string NVARCHAR(MAX) = N'\u0142 \u0144\u0142';
SELECT @string = REPLACE(@string, u, ch)
FROM (VALUES ('\u0142', N'ł'), ('\u0144', N'ń')) s(u, ch);
SELECT @string;

check each character from a string by pgsql function

I want to make a function which will check each character of a string.
For example, let's take the word "pppppoooossssttt". In this case the function should return a warning if the same character is repeated more than 2 times. Here 'p' is repeated 5 times, so the function will return a warning message.
If you're able to install plpython on your setup, this is what I would do. Then you could simply place this test inside a "WHERE" clause of a standard SQL function. It's immutable so it will only be called once, and it always returns True so it won't affect the results of a SQL query. Postgres has had some pretty shaky python implementations but either they or EnterpriseDB cleaned things up in the latest release.
CREATE OR REPLACE FUNCTION two_or_less(v text)
RETURNS BOOLEAN AS $$
    # If the string is two characters
    # or less, we can quit now.
    if len(v) < 3:
        return True
    # plpy is imported automatically inside PL/Python functions.
    warned = set()
    # Slide a three-character window over the string.
    for a, b, c in zip(v, v[1:], v[2:]):
        if a == b == c and a not in warned:
            warned.add(a)
            plpy.warning('The character %r is repeated more than twice in a row.' % a)
    return True
$$ LANGUAGE plpython3u IMMUTABLE;
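If you want to try the run detection outside the database first, a plain Python sketch along these lines may help. It uses itertools.groupby instead of a sliding window, and the helper name long_runs and its limit parameter are made up for this example.

```python
from itertools import groupby
from typing import List, Tuple


def long_runs(text: str, limit: int = 2) -> List[Tuple[str, int]]:
    """Return (character, run_length) for every run longer than `limit`."""
    runs = []
    # groupby yields one group per stretch of identical consecutive characters.
    for char, group in groupby(text):
        length = sum(1 for _ in group)
        if length > limit:
            runs.append((char, length))
    return runs


if __name__ == "__main__":
    print(long_runs("pppppoooossssttt"))
    # [('p', 5), ('o', 4), ('s', 4), ('t', 3)]
```

Inside the PL/Python function, each returned pair could be passed to plpy.warning instead of being printed.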

In Delphi using MyDAC, how do I write an entire record as a string?

As the title suggests, using Delphi 2010 and MyDAC 7.1, how do I output an entire record as a string in a format like JSON / XML / CSV or some other plain text option?
eg output:
{user_id:1;username:testuser;password:testpass}
Presuming that MyDAC is a standard TDataSet descendant, you can build the string manually. For instance, for JSON:
var
  i: Integer;
  Output: string;
begin
  Output := '{'; // #123
  for i := 0 to MyQry.FieldCount - 1 do
    Output := Output +
      MyQry.Fields[i].FieldName + ':' + // #58
      MyQry.Fields[i].AsString + ';'; // #59
  // Replace final ; with closing } instead
  Output[Length(Output)] := '}'; // #125
end;
Or you can Google to find a Delphi JSON library (like SuperObject) and use it instead, as is done here.
For XML, use the same type loop with TXMLDocument. You can search for previous posts here tagged with Delphi to find examples.
For CSV, it can get complicated based on your data and the requirements. For instance, do you want or need a header row with field names? Does your data contain data that contains spaces or punctuation or CR/LFs? The easiest solution is to search for an already-existing library (via Google or Bing, not here) that exports to CSV.
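To show why a library is worth it, here is a small Python sketch (Python only for illustration; a Delphi JSON or CSV library would play the same role) of what proper escaping looks like when a field value contains quotes or separators. The record contents are made up for this example.

```python
import csv
import io
import json

# Hypothetical record; the password deliberately contains a quote and a semicolon.
record = {"user_id": 1, "username": "testuser", "password": 'te"st;pass'}

# A JSON library quotes keys and escapes special characters in values.
as_json = json.dumps(record)

# A CSV library writes the header row and quotes fields only where needed.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
as_csv = buffer.getvalue()

if __name__ == "__main__":
    print(as_json)
    print(as_csv)
```

With naive string concatenation, the same password value would break both the ";" separated format from the question and any CSV reader.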
According to the documentation, you can use the SaveToXML procedures. It should be something like this:
var
  MyQuery: TMyQuery;
begin
  // TMyQuery is a TComponent descendant, so Create takes an owner (nil here)
  MyQuery := TMyQuery.Create(nil);
  try
    MyQuery.Connection := AConnection;
    MyQuery.SQL.Text := ACommand;
    MyQuery.Execute();
    MyQuery.SaveToXML(<tstream or file>);
  finally
    MyQuery.Free; // release the query even if an exception is raised
  end;
end;

VHDL, using functions in for generate statement

I have a component that should be instantiated about 8000 times. I used a for-generate statement with the help of some constant values to reduce the amount of code, but I had to declare a function to parametrize the component connections.
My function looks like this:
function dim1_calc (
cmp_index : integer;
prt_index : integer
) return integer is
variable updw : integer := 0;
variable shft_v : integer := 0;
variable result : integer := 0;
begin
if (cmp_index < max_up) then
updw := 1;
else
updw := 2;
end if;
case prt_index is
when 1 =>
shft_v := cnst_rom(updw)(1) + (i-1);
when 2 =>
shft_v := cnst_rom(updw)(2) + (i);
--
--
--
when 32 =>
shft_v := cnst_rom(updw)(32) + (i);
when others =>
shft_v := 0;
end case;
if (updw = 1) then
if (shft_v = min_up and ((prt_index mod 2) = 0)) then
result := max_up;
elsif (shft_v = max_up and ((prt_index mod 2) = 1)) then
result := min_up;
elsif (shft_v < max_up) then
result := shft_v;
else
result := shft_v - max_up;
end if;
else
--something like first condition statements...
--
--
end if;
return result;
end function;
and part of my code that uses this function plus some related part looks like this:
--these type definitions are in my package
type nx_bits_at is array (natural range <>) of std_logic_vector (bits-1 downto 0);
type mxn_bits_at is array (natural range <>) of nx_bits_at;
--
--
--
component pn_cmpn is
port(
clk : in std_logic;
bn_to_pn : in nx_bits_at(1 to row_wght);
pn_to_bn : out nx_bits_at(1 to row_wght)
);
end component;
--
--
--
signal v2c : mxn_bits_at(1 to bn_num)(1 to col_wght);
signal c2v : mxn_bits_at(1 to pn_num)(1 to row_wght);
--
--
--
gen_pn : for i in (1 to pn_num) generate
ins_pn : pn_cmpn port map (
clk => clk,
bn_to_pn(1) => b2p (dim1_calc(i, 1)) (dim2_calc(i, 1)),
bn_to_pn(2) => b2p (dim1_calc(i, 2)) (dim2_calc(i, 2)),
.
.
.
bn_to_pn(32) => b2p (dim1_calc(i, 32)) (dim2_calc(i, 32)),
pn_to_bn => p2b (i)
);
end generate;
I know that using too many sequential statements together is not appropriate in general, and I'm avoiding them as much as possible, but in this case I assumed that this function won't synthesize into real hardware, and that the synthesizer just calculates the output value and puts it in the corresponding instantiation of that component. Am I right, or does this way of coding lead to extra hardware compared to just 8000 instantiations?
PS1: Initially I used "0 to..." for defining the ranges of the 2nd and 3rd dimensions of my arrays, but because of the confusion this caused in the dimension calculation function based on the for-generate parameter, I replaced them with "1 to...". Is that an OK coding style, or should I avoid it?
PS2: Is there a way to combine the port mapping part of the above code into something like this:
(I know this is badly wrong; it's just a clarification of what I want)
gen_pn : for i in (1 to pn_num) generate
ins_pn : pn_cmpn port map (
clk => clk,
gen_bn_to_pn : for j in (1 to 32) generate
bn_to_pn(j) => b2p (dim1_calc(i, j)) (dim2_calc(i, j)),
end generate;
pn_to_bn => p2b (i)
);
end generate;
Let me give another example
Assume that I have a component instantiation like this:
ins_test : test_comp port map (
clk => clk,
test_port(1) => test_sig(2),
test_port(2) => test_sig(3),
test_port(3) => test_sig(4)
);
Is there a way that I can use for generate here? something like:
ins_test : test_comp port map (
clk => clk,
gen_pn : for i in (1 to 3) generate
test_port(i) => test_sig(i+1)
end generate;
);
PS3: Is it possible to call a function inside another function in VHDL?
Functions are usable this way. If you encounter problems, I am sure they will concern details of the design or the design tools, rather than the basic approach.
One potential issue is that the function refers to some external "things" such as max_up, i, cnst_rom whose declarations are not part of the function nor parameters to it. This makes it an "impure function" which - because it refers to external state or even modifies it - has restrictions on calling it (because the external state may change, results may depend on order of evaluation etc).
If you can make it pure, do so. I have a feeling that max_up, cnst_rom are constants : if they aren't used elsewhere, declare them local to the function. And i probably ought to be a parameter.
If this is not possible, make the external declarations constants, and preferably wrap them and the function together in a package.
This will just generate the values you need in a small, comprehensible, maintainable form, and not an infinite volume of hardware. I have used a complex nest of functions performing floating point arithmetic then fiddly range reduction and integer rounding to initialise a lookup table, so fundamentally the approach does work.
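The "pure function evaluated once per instance" idea can be mimicked in a few lines of Python. This is a toy sketch, not the asker's dim1_calc: wrap_index, build_table, the offsets list and the sizes are all invented for illustration. The point is that every input is a parameter, so the whole index table is fixed before any "hardware" exists, much like constants computed at elaboration time.

```python
from typing import List


def wrap_index(cmp_index: int, prt_index: int, offsets: List[int], size: int) -> int:
    """Pure function: the result depends only on its arguments.
    Returns a 1-based index in the range 1..size, wrapping around."""
    shifted = offsets[prt_index - 1] + cmp_index
    return (shifted - 1) % size + 1


def build_table(n_components: int, n_ports: int, offsets: List[int], size: int) -> List[List[int]]:
    """Precompute the index for every (component, port) pair."""
    return [
        [wrap_index(i, j, offsets, size) for j in range(1, n_ports + 1)]
        for i in range(1, n_components + 1)
    ]


if __name__ == "__main__":
    table = build_table(n_components=4, n_ports=3, offsets=[0, 1, 2], size=4)
    print(table)  # [[1, 2, 3], [2, 3, 4], [3, 4, 1], [4, 1, 2]]
```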
Potential pitfall:
Some design tools have trouble with perfectly valid VHDL if its use is slightly unorthodox. Synplicity cannot synthesise some forms of function (which DO generate hardware), though it has no trouble with the equivalent procedure returning the result through an OUT parameter! XST is considerably better.
XST parsing my lookup table init has an absurd slowdown, quadratic in the number of function calls, but only if you are using the old VHDL parser (the default for Spartan-3). Spartan-6 uses the new parser and works fine (under a second instead of half an hour!), as do Modelsim and Isim. (I haven't tried Synplicity on that project.)
Some tools object to unorthodox things in port maps: you may get away with function calls there, or you may have to work around tool bugs by initialising constants with the calls and using those constants in the port maps.
And your supplementary questions:
PS1) The correct coding style for an array range is ... whatever makes your intent clear.
If you find yourself mentally offsetting by 1 and getting confused or even making errors, STOP! and improve the design.
Some good array indexing styles:
type colour is (red, green, blue);
subtype brightness is natural range 0 to 255;
hue : array (colour) of brightness;
gamma : array (brightness) of brightness;
-- here 0 is a legitimate value
channel : array (1 to 99) of frequency;
PS2) I think you're asking if you can nest generate statements. Yes.
Details may be awkward and difficult, but yes.
PS3) Yes of course! You can even declare functions local to others; eliminating the possibility they will be accidentally called somewhere they make no sense. They (impure functions) can access the execution scope of the outer function (or process), simplifying parameter lists.
Q1 - in this case I assumed that this function won't synthesize into some ...
It depends on which synthesizer you're using. See this relevant question and comments below.
Q2 - PS1: Initially I used "0 to..." for defining ranges of the ...
Surely it's OK. And please allow me to post a suggestion on coding style here (from this book):
When defining the loop parameter specification, either use a type (or subtype) definition, or use predefined object attributes (e.g., PredefinedObject'range, PredefinedObject'length - 1 downto 0). Avoid using discrete range (e.g., 1 to 4).
This rule makes the code more reusable and flexible for maintenance.
Q3 - PS2: Is there a way that port mapping part in above code combines into ...
I think this is why you asked the 4th question. So refer to the next answer:).
Q4 - Is it possible to call a function inside another function in VHDL?
Though I can't find an official reference for this, the answer is yes.
PS: Coding rules are defined by the synthesizer tools. So the best way to find an answer is to try it yourself.