ERROR: invalid input syntax for type json, DETAIL: Token "'" is invalid, while importing CSV in pgAdmin

I have created a new table with three columns
customer_id, media_urls, survey_taste
in a database in pgAdmin, with the column types
int, text[], jsonb
respectively.
I have a CSV that I was trying to import into this table using pgAdmin, and
the contents of that file look like this:
1,"{'http://example.com','http://example.com'}","{'taste':[1,2,3,4]}"
but I am getting this error
ERROR: invalid input syntax for type json
DETAIL: Token "'" is invalid.
CONTEXT: JSON data, line 1: '...
COPY survey_taste, line 2, column survey_taste: "{'taste': [-0.19101654669350904, 0.08575981750112513, 0.07133783942655376, -0.10579014363010293, 0.0..."

To address your comments in reverse order: to have this entered in one field, you would need to have it as:
'[{"http":"abc","http":"abc"},{"taste":[1,2,3,4]}]'
Per:
select '[{"http":"abc","http":"abc"},{"taste":[1,2,3,4]}]'::json;
json
---------------------------------------------------
[{"http":"abc","http":"abc"},{"taste":[1,2,3,4]}]
As to the quoting issue:
When you pass a dict to csv it is written using str(), so you will get:
d = {"taste": [1, 2, 3, 4]}
print(d)
{'taste': [1, 2, 3, 4]}
What you need is:
import json
json.dumps(d)
'{"taste": [1, 2, 3, 4]}'
Using json.dumps will turn the dict into a proper JSON string representation.
Putting it all together:
# Create list of dicts (note the duplicate 'http' key collapses to one entry)
l = [{'http': 'abc', 'http': 'abc'}, {'taste': [1,2,3,4]}]
# Create JSON string representation
json.dumps(l)
'[{"http": "abc"}, {"taste": [1, 2, 3, 4]}]'


Parsing a Pandas column in JSON format

I am parsing a Pandas column of type string that is in JSON format such as the following
kafka_data["MESSAGE_DATA__C"].iloc[0]
Out[20]: '{"userId":"af33f42e","trackingCategory":"ACTION","trackedItem":{"id":"PERSONAL_IDENTIFICATION_STARTED","category":"PERSONAL_IDENTIFICATION","title":"Personal Identification Started"}}'
When I parse a single row everything works
json.loads(kafka_data["MESSAGE_DATA__C"].iloc[0])
Out[25]:
{'userId': 'af33f42e',
'trackingCategory': 'ACTION',
'trackedItem': {'id': 'PERSONAL_IDENTIFICATION_STARTED',
'category': 'PERSONAL_IDENTIFICATION',
'title': 'Personal Identification Started'}}
But when I try to parse the whole column at once, an error is raised.
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 212 (char 211)
Am I missing anything?
I need to read this column into a new dataframe.
When applying a function across an entire column, use the axis=1 parameter.
Please try:
kafka_data[["MESSAGE_DATA__C"]].apply(lambda row: json.loads(str(row["MESSAGE_DATA__C"])), axis=1)

Cannot read geojson string with single quotes into Postgres table

I'm trying to import a large geojson file into a Postgres table. In order to do so, I first convert the json into csv with python:
import pandas as pd
df = pd.read_json('myjson.txt')
df.to_csv('myjson.csv',sep='\t')
The resulting csv looks like:
name type features
0 geobase FeatureCollection {'type': 'Feature', 'geometry': {'type': 'LineString', 'coordinates': [[-73.7408048408216, 45.5189595588608], [-73.7408749973688, 45.5189893490944], [-73.7409267622838, 45.5190212771795], [-73.7418867072278, 45.519640108602204], [-73.7419636417947, 45.5196917400376]]}, 'properties': {'ID_TRC': 1010001, 'DEB_GCH': 12320, 'FIN_GCH': 12340}}
The first three lines in json file were:
{"name":"geobase","type":"FeatureCollection"
,"features":[
{"type":"Feature","geometry":{"type":"LineString","coordinates":[[-73.7408048408216,45.5189595588608],[-73.7408749973688,45.5189893490944],[-73.7409267622838,45.5190212771795],[-73.7418867072278,45.5196401086022],[-73.7419636417947,45.5196917400376]]},"properties":{"ID_TRC":1010001,"DEB_GCH":12320,"FIN_GCH":12340}}
Following that, the copy command into my postgres table is:
psql -h (host) -U (user) -d (database) -c "\COPY geometries.geobase_tmp(id,name,type,features) FROM '.../myjson.csv' with (format csv,header true, delimiter E'\t');"
This results in my table being filled with name, type and features. The first feature (a text field) is, for example, the following string:
{'type': 'Feature', 'geometry': {'type': 'LineString', 'coordinates': [[-73.7408048408216, 45.5189595588608], [-73.7408749973688, 45.5189893490944], [-73.7409267622838, 45.5190212771795], [-73.7418867072278, 45.519640108602204], [-73.7419636417947, 45.5196917400376]]}, 'properties': {'ID_TRC': 1010001, 'DEB_GCH': 12320, 'FIN_GCH': 12340}}
In Postgres, when I try to read from this tmp table into another one:
SELECT features::json AS fc FROM geometries.geobase_tmp
I get the error :
SQL Error [22P02]: ERROR: invalid input syntax for type json
Detail : Token "'" is invalid.
Where : JSON data, line 1: {'...
It's as if Postgres expects double quotes rather than single quotes when parsing the JSON text. What can I do to avoid this problem?
EDIT: I followed the procedure described here (datatofish.com/json-string-to-csv-python) to convert json to csv. The source (the json txt file) is a valid json and contains only double quotes. After conversion, it's not a valid json anymore (it contains single quotes instead of double quotes). Is there a way to output a csv while keeping the double quotes?
I figured it out:
Json to csv:
import pandas as pd
import json
import csv
df = pd.read_json('myjson.txt')
df['geom'] = df['features'].apply(lambda x:json.dumps(x['geometry']))
df['properties'] = df['features'].apply(lambda x:json.dumps(x['properties']))
df[['geom','properties']].to_csv('myjson.csv',sep='\t',quoting=csv.QUOTE_ALL)
Now CSV file looks like:
"" "geom" "properties"
"0" "{""type"": ""LineString"", ""coordinates"": [[-73.7408048408216, 45.5189595588608], [-73.7408749973688, 45.5189893490944], [-73.7409267622838, 45.5190212771795], [-73.7418867072278, 45.519640108602204], [-73.7419636417947, 45.5196917400376]]}" "{""ID_TRC"": 1010001, ""DEB_GCH"": 12320, ""FIN_GCH"": 12340}"
...
Postgres tmp table created with:
CREATE TABLE geometries.geobase_tmp (
id int,
geom TEXT,
properties TEXT
)
Copy CSV content into tmp table:
psql -h (host) -U (user) -d (database) -c "\COPY geometries.geobase_tmp(id,geom,properties) FROM 'myjson.csv' with (format csv,header true, delimiter E'\t');"
Creation of final postgres table which contains geometry and properties (each property in its own field):
drop table if exists geometries.troncons;
SELECT
row_number() OVER () AS gid,
ST_GeomFromGeoJSON(geom) as geom,
properties::json->'ID_TRC' AS ID_TRC,
properties::json->'DEB_GCH' AS DEB_GCH,
properties::json->'FIN_GCH' AS FIN_GCH
INTO TABLE geometries.troncons
FROM geometries.geobase_tmp

python error on string format with "\n" exec(compile(contents+"\n", file, 'exec'), glob, loc)

I'm trying to construct JSON with a string that contains "\n" in it, like this:
ver_str= 'Package ID: version_1234\nBuild\nnumber: 154\nBuilt\n'
proj_ver_str = 'Version_123'
comb = '{"r_content": {0}, "s_version": {1}}'.format(ver_str,proj_ver_str)
json_content = json.loads()
d =json.dumps(json_content )
getting this error:
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Dev/python/new_tester/simple_main.py", line 18, in <module>
comb = '{"r_content": {0}, "s_version": {1}}'.format(ver_str,proj_ver_str)
KeyError: '"r_content"'
The error arises not because of newlines in your values, but because of { and } characters in your format string other than the placeholders {0} and {1}. If you want to have an actual { or a } character in your string, double them.
Try replacing the line
comb = '{"r_content": {0}, "s_version": {1}}'.format(ver_str,proj_ver_str)
with
comb = '{{"r_content": {0}, "s_version": {1}}}'.format(ver_str,proj_ver_str)
However, this will give you a different error on the next line, loads() missing 1 required positional argument: 's'. This is because you presumably forgot to pass comb to json.loads().
Replacing json.loads() with json.loads(comb) gives you another error: json.decoder.JSONDecodeError: Expecting value: line 1 column 15 (char 14). This tells you that you've given json.loads malformed JSON to parse. If you print out the value of comb, you see the following:
{"r_content": Package ID: version_1234
Build
number: 154
Built
, "s_version": Version_123}
This isn't valid JSON, because the string values aren't surrounded by quotes. So a JSON parsing error is to be expected.
At this point, let's take a look at what your code is doing and what you seem to want it to do. It seems you want to construct a JSON string from your data, but your code puts together a JSON string from your data, parses it to a dict and then formats it back as a JSON string.
If you want to create a JSON string from your data, it's far simpler to create a dict with your values and use json.dumps on that:
d = json.dumps({"r_content": ver_str, "s_version": proj_ver_str})
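As a quick check, the one-liner above round-trips cleanly: json.dumps escapes the embedded newlines as the two-character sequence \n, and json.loads recovers the original values unchanged.

```python
import json

ver_str = 'Package ID: version_1234\nBuild\nnumber: 154\nBuilt\n'
proj_ver_str = 'Version_123'

# Serialize the dict directly; real newlines become the escape sequence \n
d = json.dumps({"r_content": ver_str, "s_version": proj_ver_str})
print(d)

# Parsing the string back recovers the original values, newlines and all
assert json.loads(d)["r_content"] == ver_str
```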

parse json data to pandas column with missing quotes

I have a pandas dataframe "df". The dataframe has a field in it, "json_field", that contains JSON data (I think it's actually YAML format). I'm trying to parse the values into their own columns. I have example data below. When I run the code below, I get an error when it hits the 'name' field; the code and the error are shown below. It seems like it's having trouble because the value associated with the name field isn't quoted, or has spaces. Does anyone see what the issue might be, or can suggest how to fix it?
example:
print(df['json_field'][0])
output:
- start:"test"
name: Price Unit
field: priceunit
type: number
operator: between
value:
- '40'
- '60'
code:
import yaml
pd.io.json.json_normalize(yaml.load(df['json_field'][0]), 'name','field','type','operator','value').head()
error:
An error was encountered:
mapping values are not allowed here
in "<unicode string>", line 3, column 7:
name: Price Unit
^
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 94, in safe_load
return load(stream, SafeLoader)
File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 72, in load
return loader.get_single_data()
File "/usr/local/lib64/python3.6/site-packages/yaml/constructor.py", line 35, in get_single_data
node = self.get_single_node()
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 82, in compose_node
node = self.compose_sequence_node(anchor)
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 110, in compose_sequence_node
while not self.check_event(SequenceEndEvent):
File "/usr/local/lib64/python3.6/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/local/lib64/python3.6/site-packages/yaml/parser.py", line 382, in parse_block_sequence_entry
if self.check_token(BlockEntryToken):
File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
return self.fetch_value()
File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 580, in fetch_value
self.get_mark())
yaml.scanner.ScannerError: mapping values are not allowed here
in "<unicode string>", line 3, column 7:
name: Price Unit
desired output:
name field type operator value
Price Unit priceunit number between - '40' - '60'
Update:
I tried the suggestion below
import yaml
df['json_field']=df['json_field'].str.replace('"test"', "")
pd.io.json.json_normalize(yaml.safe_load(df['json_field'][0].lstrip()), 'name','field','type','operator','value').head()
and got the output below:
operator0 typefield
0 P priceunit
1 r priceunit
2 i priceunit
3 c priceunit
4 e priceunit
Seems to me that it's having an issue because your yaml is improperly formatted. That "test" should not be there at all if you want a dictionary named "start" with 5 mappings inside of it.
import yaml
a = """
- start:
name: Price Unit
field: priceunit
type: number
operator: between
value:
- '40'
- '60'
""".lstrip()
# Single dictionary named start, with 5 entries
yaml.safe_load(a)[0]
{'start':
{'name': 'Price Unit',
'field': 'priceunit',
'type': 'number',
'operator': 'between',
'value': ['40', '60']}
}
To do this with your data, try:
data = yaml.safe_load(df['json_field'][0].replace('"test"', ""))
df = pd.json_normalize(data[0]["start"])
print(df)
name field type operator value
0 Price Unit priceunit number between [40, 60]
It seems that your real problem is one line earlier.
Note that your YAML fragment starts with start:"test".
One of key principles of YAML is that a key is separated
from its value by a colon and a space, whereas your
sample does not contain this (mandatory) space.
Maybe you should manually edit your input, i.e. add this
missing space.
Other solution is to write a "specialized pre-parser",
which adds such missing spaces and its result is fed into
the YAML parser. This can be an option if the number of such
cases is big.
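A minimal sketch of such a pre-parser, assuming the only defect is a missing space after a colon (note that a rule this blunt would also mangle values that legitimately contain colons, such as URLs):

```python
import re
import yaml

# The question's fragment, reduced to three lines for the sketch
raw = '- start:"test"\n  name: Price Unit\n  field: priceunit\n'

# Insert a space after any colon that is directly followed by a
# non-space character; existing "key: value" pairs are left alone
fixed = re.sub(r':(?=\S)', ': ', raw)
data = yaml.safe_load(fixed)
print(data)
```

After the fix, `start:"test"` becomes `start: "test"` and the fragment parses as a single mapping with three keys.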

Saving json file by dumping dictionary in a for loop, leading to malformed json

So I have the following dictionaries that I get by parsing a text file
keys = ["scientific name", "common names", "colors]
values = ["somename1", ["name11", "name12"], ["color11", "color12"]]
keys = ["scientific name", "common names", "colors]
values = ["somename2", ["name21", "name22"], ["color21", "color22"]]
and so on. I am dumping the key value pairs using a dictionary to a json file using a for loop where I go through each key value pair one by one
for loop starts
d = dict(zip(keys, values))
with open("file.json", 'a') as j:
json.dump(d, j)
If I open the saved json file I see the contents as
{"scientific name": "somename1", "common names": ["name11", "name12"], "colors": ["color11", "color12"]}{"scientific name": "somename2", "common names": ["name21", "name22"], "colors": ["color21", "color22"]}
Is this the right way to do it?
The purpose is to query the common name or colors for a given scientific name. So then I do
with open("file.json", "r") as j:
data = json.load(j)
I get the error, json.decoder.JSONDecodeError: Extra data:
I think this is because I am not dumping the dictionaries to JSON correctly in the for loop. I would have to insert some square brackets programmatically; just doing json.dump(d, j) won't suffice.
JSON may only have one root element. This root element can be [], {} or most other datatypes.
In your file, however, you get multiple root elements:
{...}{...}
This isn't valid JSON, and the error Extra data refers to the second {}, where valid JSON would end instead.
You can write multiple dicts to a JSON string, but you need to wrap them in an array:
[{...},{...}]
But now off to how I would fix your code. First, I rewrote what you posted, because your code was rather pseudo-code and didn't run directly.
import json
inputs = [(["scientific name", "common names", "colors"],
["somename1", ["name11", "name12"], ["color11", "color12"]]),
(["scientific name", "common names", "colors"],
["somename2", ["name21", "name22"], ["color21", "color22"]])]
for keys, values in inputs:
d = dict(zip(keys, values))
with open("file.json", 'a') as j:
json.dump(d, j)
with open("file.json", 'r') as j:
print(json.load(j))
As you correctly realized, this code fails with
json.decoder.JSONDecodeError: Extra data: line 1 column 105 (char 104)
The way I would write it, is:
import json
inputs = [(["scientific name", "common names", "colors"],
["somename1", ["name11", "name12"], ["color11", "color12"]]),
(["scientific name", "common names", "colors"],
["somename2", ["name21", "name22"], ["color21", "color22"]])]
jsonData = list()
for keys, values in inputs:
d = dict(zip(keys, values))
jsonData.append(d)
with open("file.json", 'w') as j:
json.dump(jsonData, j)
with open("file.json", 'r') as j:
print(json.load(j))
Also, with Python's json library it is important to write the entire JSON file in one go, i.e. open with 'w' instead of 'a'.
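If appending record by record really is a requirement (say, the records arrive one at a time), one alternative worth knowing is the JSON Lines convention: write one complete JSON document per line, then parse line by line on the way back in. This is a sketch of that approach, not part of the original answer:

```python
import json

records = [
    {"scientific name": "somename1", "common names": ["name11", "name12"]},
    {"scientific name": "somename2", "common names": ["name21", "name22"]},
]

# Appending is safe here because each line is a standalone JSON document
with open("file.jsonl", "a") as j:
    for d in records:
        j.write(json.dumps(d) + "\n")

# Read back one document per line instead of a single json.load() call
with open("file.jsonl") as j:
    loaded = [json.loads(line) for line in j]
print(loaded)
```

The trade-off is that the file as a whole is no longer a single valid JSON document, so tools that expect one root element cannot read it directly.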