Inserting valid json with copy into postgres table - json

Valid JSON can naturally have the backslash character: \. When you insert data in a SQL statement like so:
sidharth=# create temp table foo(data json);
CREATE TABLE
sidharth=# insert into foo values( '{"foo":"bar", "bam": "{\"mary\": \"had a lamb\"}" }');
INSERT 0 1
sidharth=# select * from foo;
data
------------------------------------------------------
{"foo":"bar", "bam": "{\"mary\": \"had a lamb\"}" }
(1 row)
Things work fine.
But if I copy the JSON to a file and run the copy command I get:
sidharth=# \copy foo from './tests/foo' (format text);
ERROR: invalid input syntax for type json
DETAIL: Token "mary" is invalid.
CONTEXT: JSON data, line 1: {"foo":"bar", "bam": "{"mary...
COPY foo, line 1, column data: "{"foo":"bar", "bam": "{"mary": "had a lamb"}" }"
It seems like Postgres is not processing the backslashes. I think, because of http://www.postgresql.org/docs/8.3/interactive/sql-syntax-lexical.html, I am forced to use double backslashes. And that works, i.e. when the file contents are:
{"foo":"bar", "bam": "{\\"mary\\": \\"had a lamb\\"}" }
The copy command works. But is it correct to expect special treatment for the json data type? After all, the line above is not valid JSON.

http://adpgtech.blogspot.ru/2014/09/importing-json-data.html
copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';
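With those control characters as quote and delimiter, neither ever occurs in the data, so each line of the file is read as a single literal column and COPY's text-mode backslash processing never kicks in. A minimal sketch of the same trick applied to the table from the question, assuming ./tests/foo still contains the raw, unescaped JSON line:
\copy foo from './tests/foo' with (format csv, quote e'\x01', delimiter e'\x02')
select data->>'bam' as bam from foo;  -- should return {"mary": "had a lamb"}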

PostgreSQL's default bulk-load format, text, is a tab-separated format. It requires backslashes to be escaped because they have special meaning, for example in the \N null placeholder.
Observe what PostgreSQL generates:
regress=> COPY foo TO stdout;
{"foo":"bar", "bam": "{\\"mary\\": \\"had a lamb\\"}" }
This isn't a special case for json at all; it's true of any string. Consider, for example, that a string - including json - might contain embedded tabs. Those must be escaped to prevent them from being seen as another field.
You'll need to generate your input data properly escaped. Rather than trying to use the PostgreSQL-specific text format, it'll generally be easier to use format csv and a tool that writes correct CSV, with the escaping done for you on writing.
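For example, here is a sketch of what a correct CSV encoding of the row inserted above looks like (the whole field quoted, every embedded double quote doubled, backslashes left alone) - roughly what COPY foo TO stdout WITH (FORMAT csv) would emit:
"{""foo"":""bar"", ""bam"": ""{\""mary\"": \""had a lamb\""}"" }"
A file in that form loads cleanly with (the ./tests/foo.csv path is just an assumed name for the re-exported file):
\copy foo from './tests/foo.csv' with (format csv)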

Related

Unable to load csv file into Snowflake

I am getting the below error when I try to load a CSV from my system into a Snowflake table:
Unable to copy files into table.
Numeric value '"4' is not recognized
File '#EMPP/ui1591621834308/snow.csv', line 2, character 25
Row 1, column "EMPP"["SALARY":5]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
You appear to be loading your CSV with the file format option of FIELD_OPTIONALLY_ENCLOSED_BY='"' specified.
This option will allow reading any fields properly quoted with the " character, and even support such fields carrying the delimiter character as well as the " character if properly escaped. Some examples that could be considered valid:
CSV FORM | ACTUAL DATA
------------------------
abc | abc
"abc" | abc
"a,bc" | a,bc
"a,""bc""" | a,"bc"
In particular, notice that the final example follows the specified rule:
When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:
A ""B"" C
If your CSV file carries quote marks within the data but is not necessarily quoting the fields (and delimiters and newlines do not appear within data fields), you can remove the FIELD_OPTIONALLY_ENCLOSED_BY option from your file format definition and just read the file at the delimited (,) fields.
If your CSV does use quoting, ensure that whatever is producing the CSV files is using a valid CSV format writer and not simple string munging, and recreate it with the quotes properly escaped. If the above data example is to be considered valid in quoted form, it must instead appear within the file as "4" or 4.
The error message is saying that you have a value in your file that contains a "4, which is being loaded into a table column that expects a number. Since that isn't a number, it fails. This appears to be happening in the very first row of your file, so you could open it up and take a look at the value. If it's just one record, you can add ON_ERROR = 'CONTINUE' to your command so that it skips it and moves on.
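As a rough sketch of where those options sit (the stage name and SKIP_HEADER setting are assumptions, not taken from the question):
COPY INTO EMPP
  FROM @my_stage/snow.csv                -- hypothetical stage holding the uploaded file
  FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
  ON_ERROR = 'CONTINUE';                 -- skip bad rows instead of aborting the load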

Importing csv with json value with psql COPY (problem with escaping)

I am trying to import a csv file into a table in postgres using the COPY command. My problem is that one column is of the json data type. I tried to escape the json data in the csv using dollar quoting ($$...$$), as described in the docs (section 4.1.2.2).
This is the first line of the csv:
3f382d8c-bd27-4092-bd9c-8b50e24df7ec;370038757|PRIMARY_RESIDENTIAL;$${"CustomerData": "{}", "PersonModule": "{}"}$$
This is the command used for the import:
psql -c "COPY table(id, name, details) FROM '/path/table.csv' DELIMITER ';' ENCODING 'UTF-8' CSV;"
This is the error I get:
ERROR: invalid input syntax for type json
DETAIL: Token "$" is invalid.
CONTEXT: JSON data, line 1: $...
COPY table, line 1, column details: "$${CustomerData: {}, PersonModule: {}}$$"
How should I escape/import the json value using COPY? Should I give up and use something like pg_loader instead? Thank you.
In case the JSON data keeps failing to import, please give the following setup a try - this worked for me even for quite complicated data:
COPY "your_schema_name.yor_table_name" (your, column_names, here)
FROM STDIN
WITH CSV DELIMITER E'\t' QUOTE '\b' ESCAPE '\';
--here rows data
\.
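Another option in the same spirit, assuming the $$ markers are removed from the file first (COPY does not understand dollar quoting - that is SQL string-literal syntax), is to keep the semicolon delimiter and pick a quote character that cannot appear in the data. A sketch, with the table name as a placeholder:
psql -c "\copy your_table(id, name, details) FROM '/path/table.csv' WITH (FORMAT csv, DELIMITER ';', QUOTE E'\x01', ENCODING 'UTF-8')"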

Using \COPY to load CSV with JSON fields into Postgres

I'm attempting to load TSV data from a file into a Postgres table using the \COPY command.
Here's an example data row:
2017-11-22 23:00:00 "{\"id\":123,\"class\":101,\"level\":3}"
Here's the psql command I'm using:
\COPY bogus.test_table (timestamp, sample_json) FROM '/local/file.txt' DELIMITER E'\t'
Here's the error I'm receiving:
ERROR: invalid input syntax for type json
DETAIL: Token "sample_json" is invalid.
CONTEXT: JSON data, line 1: "{"sample_json...
COPY test_table, line 1, column sample_json: ""{\"id\":123,\"class\":101,\"level\":3}""
I verified the JSON is in the correct format and read a couple of similar questions, but I'm still not sure what's going on here. An explanation would be awesome.
To load your data file as it is:
\COPY bogus.test_table (timestamp, sample_json) FROM '/local/file.txt' CSV DELIMITER E'\t' QUOTE '"' ESCAPE '\'
Your json is quoted. It shouldn't have surrounding " characters, and the " characters surrounding the field names shouldn't be escaped.
It should look like this:
2017-11-22 23:00:00 {"id":123,"class":101,"level":3}
Aeblisto's answer above almost did the trick for my crazy JSON fields, but I needed to modify only one small bit - the QUOTE with a backslash (E'\b' instead of '\b') - here it is in full form:
COPY "your_schema_name.yor_table_name" (your, column_names, here)
FROM STDIN
WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE '\';
--here rows data
\.

Insert JSON into PostgreSQL that contains quotation marks

I'm trying to import a JSON file into a table. I'm using the solution mentioned here: https://stackoverflow.com/a/33130304/1663462:
create temporary table temp_json (values text) on commit drop;
copy temp_json from 'data.json';

select
  values->>'annotations' as annotationstext
from (
  select json_array_elements(replace(values,'\','\\')::json) as values
  from temp_json
) a;
Json file content is:
{"annotations": "<?xml version=\"1.0\"?>"}
I have verified that this is a valid JSON file.
The json file contains a \" which I presume is responsible for the following error:
CREATE TABLE
COPY 1
psql:insertJson2.sql:13: ERROR: invalid input syntax for type json
DETAIL: Expected "," or "}", but found "1.0".
CONTEXT: JSON data, line 1: {"annotations": "<?xml version="1.0...
Are there any additional characters that need to be escaped?
Because the COPY command processes escape ('\') characters for the text format, and there is no option to turn that off, there are two ways to import such data.
1) Process the file with an external utility via copy ... from program, for example using sed:
copy temp_json from program 'sed -e ''s/\\/\\\\/g'' data.json';
It replaces all backslashes with doubled backslashes, which COPY then converts back to single ones.
2) Use csv import:
copy temp_json from 'data.json' with (format csv, quote '|', delimiter E'\t');
Here you should set the quote and delimiter characters to ones that do not occur anywhere in your file.
And after that just use direct conversion:
select values::json->>'annotations' as annotationstext from temp_json;

Loading json data from a file into Postgres

I need to load data from multiple JSON files, each having multiple records within them, into a Postgres table. I am using the following code but it does not work (I am using pgAdmin III on Windows):
COPY tbl_staging_eventlog1 ("EId", "Category", "Mac", "Path", "ID")
from 'C:\\SAMPLE.JSON'
delimiter ','
;
The content of the SAMPLE.JSON file is like this (showing two records out of many):
[{"EId":"104111","Category":"(0)","Mac":"ABV","Path":"C:\\Program Files (x86)\\Google","ID":"System.Byte[]"},{"EId":"104110","Category":"(0)","Mac":"BVC","Path":"C:\\Program Files (x86)\\Google","ID":"System.Byte[]"}]
Try this:
BEGIN;
-- let's create a temp table to bulk data into
create temporary table temp_json (values text) on commit drop;
copy temp_json from 'C:\SAMPLE.JSON';

-- uncomment the line below to insert records into your table
-- insert into tbl_staging_eventlog1 ("EId", "Category", "Mac", "Path", "ID")
select values->>'EId' as EId,
       values->>'Category' as Category,
       values->>'Mac' as Mac,
       values->>'Path' as Path,
       values->>'ID' as ID
from (
  select json_array_elements(replace(values,'\','\\')::json) as values
  from temp_json
) a;
COMMIT;
As mentioned in Andrew Dunstan's PostgreSQL and Technical blog
In text mode, COPY will be simply defeated by the presence of a backslash in the JSON. So, for example, any field that contains an embedded double quote mark, or an embedded newline, or anything else that needs escaping according to the JSON spec, will cause failure. And in text mode you have very little control over how it works - you can't, for example, specify a different ESCAPE character. So text mode simply won't work.
so we have to turn to the CSV format mode instead.
copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';
In the official documentation for COPY (sql-copy), some of the relevant parameters are listed here:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
where option can be one of:
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
FORMAT
Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
DELIMITER
Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
HEADER
Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.
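Putting those parameters together for the file from this question, a sketch of the staging COPY in the newer option syntax, re-using the single-column temp_json table from the answer above (the quote and delimiter are control characters chosen only because they cannot appear in the JSON):
copy temp_json from 'C:\SAMPLE.JSON' with (format csv, quote e'\x01', delimiter e'\x02');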
You can use spyql.
Running the following command would generate INSERT statements that you can pipe into psql:
$ jq -c .[] *.json | spyql -Otable=tbl_staging_eventlog1 "SELECT json->EId, json->Category, json->Mac, json->Path, json->ID FROM json TO sql"
INSERT INTO "tbl_staging_eventlog1"("EId","Category","Mac","Path","ID") VALUES ('104111','(0)','ABV','C:\Program Files (x86)\Google','System.Byte[]'),('104110','(0)','BVC','C:\Program Files (x86)\Google','System.Byte[]');
jq is used to transform the json arrays from all json files in the current directory into json lines (1 json object per line) and then spyql takes care of converting json lines into INSERT statements.
To import the data into PostgreSQL:
$ jq -c .[] *.json | spyql -Otable=tbl_staging_eventlog1 "SELECT json->EId, json->Category, json->Mac, json->Path, json->ID FROM json TO sql" | psql -U your_user_name -h your_host your_database
Disclaimer: I am the author of spyql.