I'm attempting to load TSV data from a file into a Postgres table using the \COPY command.
Here's an example data row:
2017-11-22 23:00:00 "{\"id\":123,\"class\":101,\"level\":3}"
Here's the psql command I'm using:
\COPY bogus.test_table (timestamp, sample_json) FROM '/local/file.txt' DELIMITER E'\t'
Here's the error I'm receiving:
ERROR: invalid input syntax for type json
DETAIL: Token "sample_json" is invalid.
CONTEXT: JSON data, line 1: "{"sample_json...
COPY test_table, line 1, column sample_json: ""{\"id\":123,\"class\":101,\"level\":3}""
I verified that the JSON is well formed and read a couple of similar questions, but I'm still not sure what's going on here. An explanation would be awesome.
To load your data file as it is:
\COPY bogus.test_table (timestamp, sample_json) FROM '/local/file.txt' CSV DELIMITER E'\t' QUOTE '"' ESCAPE '\'
Your JSON is quoted. It shouldn't have surrounding " characters, and the " characters around the field names shouldn't be escaped. (In the default text format, COPY strips the backslashes itself, so the JSON parser receives "{"sample_json... — exactly what the error's CONTEXT line shows — which is not valid JSON.)
It should look like this:
2017-11-22 23:00:00 {"id":123,"class":101,"level":3}
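A quick throwaway check in psql that the unquoted form is what the json type expects:
SELECT '{"id":123,"class":101,"level":3}'::json ->> 'id';  -- returns 123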
Aeblisto's answer almost did the trick for my crazy JSON fields, but I needed one small modification: the QUOTE clause. Using the backspace character E'\b' as the quote effectively disables CSV quoting, since a backspace should never occur in the data. Here it is in full form:
COPY "your_schema_name.yor_table_name" (your, column_names, here)
FROM STDIN
WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE '\';
--here rows data
\.
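For illustration, here is the same pattern applied to the first question's table, with one real data row in place of the placeholder (the two fields are separated by a literal tab):
COPY bogus.test_table (timestamp, sample_json)
FROM STDIN
WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE '\';
2017-11-22 23:00:00	{"id":123,"class":101,"level":3}
\.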
The data that I need to insert into a Postgres JSON column from a text file is as follows:
"{\"server\":\"[localhost:9001]\",\"event\":\"STARTED\",\"success\":true}"
Inserting directly will result in the following error:
ERROR: invalid input syntax for type json
DETAIL: Token "\" is invalid.
How can I insert this data without doing text pre-processing, i.e. without replacing the \ escape characters?
You're using the wrong quoting. The value shouldn't be wrapped in outer " characters, and the inner " characters shouldn't be backslash-escaped. As a SQL literal it should look like this:
'{"server":"[localhost:9001]","event":"STARTED","success":true}'
I am trying to import a CSV file into a Postgres table using the COPY command. My problem is that one column is of the json data type. I tried to escape the JSON data in the CSV using dollar quoting ($$...$$, see section 4.1.2.2 of the documentation).
This is the first line of the CSV:
3f382d8c-bd27-4092-bd9c-8b50e24df7ec;370038757|PRIMARY_RESIDENTIAL;$${"CustomerData": "{}", "PersonModule": "{}"}$$
This is the command used for the import:
psql -c "COPY table(id, name, details) FROM '/path/table.csv' DELIMITER ';' ENCODING 'UTF-8' CSV;"
This is the error I get:
ERROR: invalid input syntax for type json
DETAIL: Token "$" is invalid.
CONTEXT: JSON data, line 1: $...
COPY table, line 1, column details: "$${CustomerData: {}, PersonModule: {}}$$"
How should I escape/import a JSON value using COPY? Should I give up and use something like pgloader instead? Thank you.
If importing the JSON data keeps failing, give the following setup a try; it worked for me even with quite complicated data. (Dollar quoting is SQL string-literal syntax: COPY does not parse SQL literals, so the $$ characters are passed through verbatim to the JSON parser, which is exactly what the error shows.)
COPY "your_schema_name.yor_table_name" (your, column_names, here)
FROM STDIN
WITH CSV DELIMITER E'\t' QUOTE '\b' ESCAPE '\';
--here rows data
\.
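Alternatively, if you can regenerate the file, drop the dollar quoting and use ordinary CSV quoting instead: wrap the field in double quotes and double any embedded double quotes, which PostgreSQL's CSV mode understands by default. The example row from the question would become:
3f382d8c-bd27-4092-bd9c-8b50e24df7ec;370038757|PRIMARY_RESIDENTIAL;"{""CustomerData"": ""{}"", ""PersonModule"": ""{}""}"
This loads with the same COPY ... DELIMITER ';' ... CSV command shown in the question.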
I've been trying to load a csv file with the following row in it:
91451960_NE,-1,171717198,50075943,"MARTIN LUTHER KING, JR WAY",1,NE
Note the comma in the name. I've tried all permutations of REMOVEQUOTES, DELIMITER ',', etc... and none of them work.
I have other rows with quotes in the middle of the name, so the ESCAPE option has to be there as well.
According to other posts,
DELIMITER ',' ESCAPE REMOVEQUOTES IGNOREHEADER 1;
should work but does not. Redshift gives a "Delimiter not found" error.
Is the ESCAPE causing issues and do I have to escape the comma?
I tried loading your data using CSV as the data format parameter and it worked for me. Keep in mind that CSV cannot be used with FIXEDWIDTH, REMOVEQUOTES, or ESCAPE.
create TEMP table awscptest (a varchar(40),b int,c bigint,d bigint,e varchar(40),f int,g varchar(10));
copy awscptest from 's3://sds-dev-db-replica/test.txt'
iam_role 'arn:aws:iam::<accountID>:role/<IAM_role>'
delimiter as ',' EMPTYASNULL CSV NULL AS '\0';
References: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html
http://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-run-copy.html
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#load-from-csv
This is a commonly recurring question. If you are actually using the CSV format for your files (not just some ad hoc text file that uses commas), then you need to enclose the field in double quotes. If the field contains both commas and double quotes, then you need to enclose it in double quotes and escape the embedded double quotes by doubling them.
There is a definition of the CSV file format: RFC 4180. All text characters can be represented correctly in CSV if you follow the format.
https://www.ietf.org/rfc/rfc4180.txt
Use the CSV option to the Redshift COPY command, not just TEXT with a DELIMITER of ','. Redshift will follow the official file format if you tell it that the file is CSV.
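For illustration, here is the row from the question alongside a hypothetical second row with an embedded double quote, both written per RFC 4180 (embedded quotes are doubled):
91451960_NE,-1,171717198,50075943,"MARTIN LUTHER KING, JR WAY",1,NE
91451961_NE,-1,171717199,50075944,"JOHN ""JJ"" SMITH WAY",1,NE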
In this case, you have a comma (,) in the name field. One workaround is to clean the data by removing that comma before loading it into Redshift:
from pyspark.sql import functions as F

# replace any commas in the name column with spaces
df = df.withColumn('name', F.regexp_replace(F.col('name'), ',', ' '))
Store the new dataframe in S3 and then use the copy command below to load it into Redshift:
COPY table_name
FROM 's3 path'
IAM_ROLE 'iam role'
DELIMITER ','
ESCAPE
IGNOREHEADER 1
MAXERROR AS 5
COMPUPDATE FALSE
ACCEPTINVCHARS
ACCEPTANYDATE
FILLRECORD
EMPTYASNULL
BLANKSASNULL
NULL AS 'null';
I'm trying to import a JSON file into a table. I'm using the solution mentioned here: https://stackoverflow.com/a/33130304/1663462:
create temporary table temp_json ("values" text) on commit drop;  -- "values" must be quoted: VALUES is a reserved word
copy temp_json from 'data.json';
select
  "values"->>'annotations' as annotationstext
from (
  select json_array_elements(replace("values",'\','\\')::json) as "values"
  from temp_json
) a;
The JSON file content is:
{"annotations": "<?xml version=\"1.0\"?>"}
I have verified that this is a valid JSON file.
The JSON file contains \" sequences, which I presume are responsible for the following error:
CREATE TABLE
COPY 1
psql:insertJson2.sql:13: ERROR: invalid input syntax for type json
DETAIL: Expected "," or "}", but found "1.0".
CONTEXT: JSON data, line 1: {"annotations": "<?xml version="1.0...
Are there any additional characters that need to be escaped?
Because the copy command always processes escape ('\') characters in text format, and there is no option to turn this off, there are two ways to import such data.
1) Process the file with an external utility via copy ... from program, for example using sed:
copy temp_json from program 'sed -e ''s/\\/\\\\/g'' data.json';
It replaces all backslashes with doubled backslashes, which copy then converts back to single ones.
2) Use csv import:
copy temp_json from 'data.json' with (format csv, quote '|', delimiter E'\t');
Here you should set the quote and delimiter characters to ones that do not occur anywhere in your file.
And after that, just use a direct conversion:
select "values"::json->>'annotations' as annotationstext from temp_json;
Valid JSON can naturally have the backslash character: \. When you insert data in a SQL statement like so:
sidharth=# create temp table foo(data json);
CREATE TABLE
sidharth=# insert into foo values( '{"foo":"bar", "bam": "{\"mary\": \"had a lamb\"}" }');
INSERT 0 1
sidharth=# select * from foo;
data
-----------------------------------------------------
{"foo":"bar", "bam": "{\"mary\": \"had a lamb\"}" }
(1 row)
Things work fine.
But if I copy the JSON to a file and run the copy command I get:
sidharth=# \copy foo from './tests/foo' (format text);
ERROR: invalid input syntax for type json
DETAIL: Token "mary" is invalid.
CONTEXT: JSON data, line 1: {"foo":"bar", "bam": "{"mary...
COPY foo, line 1, column data: "{"foo":"bar", "bam": "{"mary": "had a lamb"}" }"
It seems like Postgres is not processing the backslashes. I think that, because of http://www.postgresql.org/docs/8.3/interactive/sql-syntax-lexical.html, I am forced to use double backslashes. And that works, i.e. the copy command succeeds when the file contents are:
{"foo":"bar", "bam": "{\\"mary\\": \\"had a lamb\\"}" }
But is it correct to expect special treatment for the json data type? After all, the above is not valid JSON.
This approach is described at http://adpgtech.blogspot.ru/2014/09/importing-json-data.html: choose quote and delimiter characters that cannot occur in the data (here the control characters \x01 and \x02), so CSV mode passes each line through untouched:
copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';
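If the load succeeds, the JSON then casts cleanly; a quick sanity check, assuming jsonfield is of type json and using the data from the question above:
select jsonfield->>'foo' from the_table;  -- returns bar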
PostgreSQL's default bulk-load format, text, is a tab-separated format. It requires backslashes to be escaped because they have special meaning, for example in the \N null placeholder.
Observe what PostgreSQL generates:
regress=> COPY foo TO stdout;
{"foo":"bar", "bam": "{\\"mary\\": \\"had a lamb\\"}" }
This isn't a special case for json at all, it's true of any string. Consider, for example, that a string - including json - might contain embedded tabs. Those must be escaped to prevent them from being seen as another field.
You'll need to generate your input data properly escaped. Rather than trying to use the PostgreSQL-specific text format, it'll generally be easier to use format csv and a tool that writes correct CSV, with the escaping done for you on writing.
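One way to see what correctly escaped input looks like is to let PostgreSQL write it for you; a minimal sketch in psql (the path is hypothetical):
\copy foo TO './tests/foo.csv' WITH (FORMAT csv)
\copy foo FROM './tests/foo.csv' WITH (FORMAT csv)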