Replacing a character in a MySQL table to export to a JSON file - mysql

I have a very large table (over 4M records; I work with MySQL), and many of its records contain this string: \\"
I'm trying to export this table to MongoDB, but when I import the JSON file, mongoimport reports this error:
Failed: error processing document #18: invalid character 't' after object key:value pair
This is my query:
SELECT json_object(
    "id", id,
    "execution_id", execution_id,
    "type", type,
    "info", info,
    "position", position,
    "created_at", json_object("$date", DATE_FORMAT(created_at, '%Y-%m-%dT%TZ')),
    "updated_at", json_object("$date", DATE_FORMAT(updated_at, '%Y-%m-%dT%TZ'))
) AS 'json'
FROM myTable
INTO OUTFILE 'myPath';
I know the problem is that string. My question is: how can I change this particular string to \"? Changing it manually is not an option, and my knowledge of SQL is limited. Please help, and thank you for reading.
The column that has this character is "info", here is an example:
{
  "id": 30,
  "execution_id": 2,
  "type": "PHASE",
  "info": "{ \\r\\n \\"title\\": \\"Phase\\", \\r\\n \\"order\\": \\"1\\", \\r\\n \\"description\\": \\"Example Phase 1\\", \\r\\n \\"step\\": \\"end\\", \\r\\n \\"status\\": \\"True\\"\\r\\n}",
  "position": 24,
  "created_at": {"$date": "2018-01-11T15:01:46Z"},
  "updated_at": {"$date": "2018-01-11T15:01:46Z"}
}

You should be able to do this using the MySQL REPLACE() function.
The backslash is a special case in MySQL string literals, so you will need \\ to represent each literal \. Thus, to replace \\ with \, you need to run something like this:
REPLACE(info,'\\\\','\\')
Your full query would look something like this:
SELECT json_object(
    "id", id,
    "execution_id", execution_id,
    "type", type,
    "info", REPLACE(info, '\\\\', '\\'),
    "position", position,
    "created_at", json_object("$date", DATE_FORMAT(created_at, '%Y-%m-%dT%TZ')),
    "updated_at", json_object("$date", DATE_FORMAT(updated_at, '%Y-%m-%dT%TZ'))
) AS 'json'
FROM myTable
INTO OUTFILE 'myPath';
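Before running the full export, it can be worth sanity-checking REPLACE() on a literal built from the sample record above (a minimal sketch, assuming the default sql_mode where backslash is the escape character; every backslash below is escaped once more for the SQL string literal):
-- The input literal holds the doubled-escape form from the "info" column.
SELECT REPLACE(
    '{ \\\\r\\\\n \\\\"title\\\\": \\\\"Phase\\\\" }',
    '\\\\', '\\'
) AS fixed;
-- Expected result: { \r\n \"title\": \"Phase\" }
If this returns the singly-escaped form, the full export query should produce JSON that mongoimport accepts.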

Related

Analysing and formatting JSON using PostgreSQL

I have a table called api_details where I dump the JSON value below into the JSON column raw_data.
Now I need to build a report from this JSON string, and the expected output is something like this:
action_name                sent_timestamp                            Sent     Delivered
campaign_2475              1600416865.928737 - 1601788183.440805    7504     7483
campaign_d_1084_SUN15_ex   1604220248.153903 - 1604222469.087918    63095    62961
Below is the sample JSON output:
{
  "header": [
    "#0 action_name",
    "#1 sent_timestamp",
    "#0 Sent",
    "#1 Delivered"
  ],
  "name": "campaign - lifetime",
  "rows": [
    [
      "campaign_2475",
      "1600416865.928737 - 1601788183.440805",
      7504,
      7483
    ],
    [
      "campaign_d_1084_SUN15_ex",
      "1604220248.153903 - 1604222469.087918",
      63095,
      62961
    ],
    [
      "campaign_SUN15",
      "1604222469.148829 - 1604411016.029794",
      63303,
      63211
    ]
  ],
  "success": true
}
I tried the query below, but it does not produce the desired results. I can do this in Python by looping through all the elements in the rows list, but is there an easy solution in PostgreSQL (version 11)?
SELECT raw_data->'rows'->0
FROM api_details
You can use the JSONB_ARRAY_ELEMENTS() function, such as:
SELECT (j.value)->>0 AS action_name,
(j.value)->>1 AS sent_timestamp,
(j.value)->>2 AS Sent,
(j.value)->>3 AS Delivered
FROM api_details
CROSS JOIN JSONB_ARRAY_ELEMENTS(raw_data->'rows') AS j
P.S. In this case the data type of raw_data is assumed to be JSONB; otherwise the argument within the function, raw_data->'rows', should be replaced with raw_data::JSONB->'rows' in order to perform an explicit type cast.
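If you want Sent and Delivered back as numbers rather than text, a possible variant of the same query (same assumptions about raw_data as above) casts the extracted values:
SELECT (j.value)->>0        AS action_name,
       (j.value)->>1        AS sent_timestamp,
       ((j.value)->>2)::INT AS sent,
       ((j.value)->>3)::INT AS delivered
FROM api_details
CROSS JOIN JSONB_ARRAY_ELEMENTS(raw_data->'rows') AS j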

How to generate a schema from a newline-delimited JSON file in Python

I want to generate a schema from a newline-delimited JSON file, where each row has a variable set of key/value pairs. File size can vary from 5 MB to 25 MB.
Sample Data:
{"col1":1,"col2":"c2","col3":100.75}
{"col1":2,"col3":200.50}
{"col1":3,"col3":300.15,"col4":"2020-09-08"}
Expected schema:
[
  {"name": "col1", "type": "INTEGER"},
  {"name": "col2", "type": "STRING"},
  {"name": "col3", "type": "FLOAT"},
  {"name": "col4", "type": "DATE"}
]
Notes:
There is no scope to use any external tool, as files are loaded into an inbound location dynamically. The code will be used to trigger an event as soon as a file arrives and to perform the schema comparison.
Your first problem is that JSON does not have a date type, so you will get str there.
What I would do in your place is this:
import json

# Wherever your input comes from
inp = """{"col1":1,"col2":"c2","col3":100.75}
{"col1":2,"col3":200.50}
{"col1":3,"col3":300.15,"col4":"2020-09-08"}"""

schema = {}

# Split it at newlines
for line in inp.split('\n'):
    # each line contains a "dict"
    tmp = json.loads(line)
    for key in tmp:
        # if we have not seen the key before, add it
        if key not in schema:
            schema[key] = type(tmp[key])
        # otherwise check the type
        else:
            if schema[key] != type(tmp[key]):
                raise Exception("Schema mismatch")

# format however you like
out = []
for item in schema:
    out.append({"name": item, "type": schema[item].__name__})
print(json.dumps(out, indent=2))
I'm using Python types for simplicity, but you can write your own function to determine the type, e.g. if you want to check whether a string is actually a date.
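If you need the exact names from the expected schema (INTEGER, STRING, FLOAT, DATE), one possible extension is a type-detection helper. This is only a sketch; it assumes Python 3.7+ (for date.fromisoformat) and that any string in yyyy-mm-dd format should be reported as a DATE:
from datetime import date

# Hypothetical mapping from Python types to the names in the expected schema.
TYPE_NAMES = {int: "INTEGER", str: "STRING", float: "FLOAT", date: "DATE"}

def detect_type(value):
    # Report ISO-formatted strings (yyyy-mm-dd) as dates;
    # everything else keeps its plain Python type.
    if isinstance(value, str):
        try:
            date.fromisoformat(value)
            return date
        except ValueError:
            pass
    return type(value)
In the loop above you would then use schema[key] = detect_type(tmp[key]) instead of type(tmp[key]), and emit TYPE_NAMES[schema[item]] instead of schema[item].__name__ when building out.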

PostgreSQL jsonb string format

I'm using PostgreSQL jsonb and have the following in my database record:
{"tags": "[\"apple\",\" orange\",\" pineapple\",\" fruits\"]",
"filename": "testname.jpg", "title_en": "d1", "title_ja": "1",
"description_en": "d1", "description_ja": "1"}
and both SELECT statements below retrieved no results:
SELECT "photo"."id", "photo"."datadoc", "photo"."created_timestamp","photo"."modified_timestamp"
FROM "photo"
WHERE datadoc #> '{"tags":> ["apple"]}';
SELECT "photo"."id", "photo"."datadoc", "photo"."created_timestamp", "photo"."modified_timestamp"
FROM "photo"
WHERE datadoc -> 'tags' ? 'apple';
I wonder whether it is because of the extra backslashes added to the JSON array string, or whether the SELECT statements are incorrect.
I'm running "PostgreSQL 10.1, compiled by Visual C++ build 1800, 64-bit" on Windows 10.
As far as any JSON parser is concerned, the value of your tags key is a string, not an array.
"tags": "[\"apple\",\" orange\",\" pineapple\",\" fruits\"]"
The string itself happens to be another JSON document, like the common case in XML where the contents of a string happen to be an XML or HTML document.
["apple"," orange"," pineapple"," fruits"]
What you need to do is extract that string, then parse it as a new JSON object, and then query that new object.
I can't test it right now, but I think that would look something like this:
(datadoc ->> 'tags') ::jsonb ? 'apple'
That is, "extract the tags value as text, cast that text value as jsonb, then query that new jsonb value.
I know this is a very late answer, but here is a good approach, with the data I have.
Initial data in the DB:
"{\"data\":{\"title\":\"test\",\"message\":\"string\",\"image\":\"string\"},\"registration_ids\":[\"s
tring\"],\"isAllUsersNotification\":false}"
To convert it to JSON:
select (notificationData #>> '{}')::jsonb from sent_notification
result:
{"data": {"image": "string", "title": "string", "message": "string"}, "registration_ids": ["string"], "isAllUsersNotification": false}
Getting the data object from the JSON:
select (notificationData #>> '{}' )::jsonb -> 'data' from sent_notification;
result:
{"image": "string", "title": "string", "message": "string"}
Getting a field from the above result:
select (notificationData #>> '{}' )::jsonb -> 'data' ->>'title' from sent_notification;
result:
string
Performing WHERE operations.
Q: get records where title = 'string'
A:
select * from sent_notification where (notificationData #>> '{}' )::jsonb -> 'data' ->>'title' ='string'

Error DeserializeJSON() MySQL json_object

I am getting back a JSON string from a MySQL 5.7 query in ColdFusion 9.0.1. Here is my query:
SELECT (
    SELECT GROUP_CONCAT(
        JSON_OBJECT(
            'nrtype', nrt.nrtype,
            'number', nr.number
        )
    )
) AS nrJSON
FROM ...
The returned data looks like this:
{"nrtype": "Phone 1", "number": "12345678"},{"nrtype": "E-Mail 1", "number": "some#email.com"}
But as soon as I try to use DeserializeJSON() on it I am getting the following error:
JSON parsing failure at character 44:',' in {"nrtype": "Phone 1", "number": "12345678"},{"nrtype": "E-Mail 1", "number": "some#email.com"}
I am a little confused. What I want to get is a structure created by the DeserializeJSON() function.
What can I do?
That is not valid JSON, as the parser is describing. If you wrap that JSON within square brackets '[' and ']' it becomes valid (or at least parsable); that will make it an array of structures. Not sure how to make MySQL return the data within those brackets, though.
I guess you could add the brackets using ColdFusion but I would prefer to have the source do it correctly.
jsonhack = '[' & queryname.nrJSON & ']';
datarecord = DeserializeJSON(jsonhack);
writeDump(datarecord);
I created an example with your data that you can see here - trycf.com gist
From the comments
The solution indeed was [to add the following to the SQL statement]:
CONCAT('[',
    GROUP_CONCAT(
        JSON_OBJECT(...)
    ),
']')
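Combined with the JSON_OBJECT() call from the original query, that looks something like this (the FROM clause is elided as in the question):
SELECT CONCAT(
    '[',
    GROUP_CONCAT(
        JSON_OBJECT(
            'nrtype', nrt.nrtype,
            'number', nr.number
        )
    ),
    ']'
) AS nrJSON
FROM ...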
If you have columns where some already contain JSON-formatted strings, try this: https://stackoverflow.com/a/45278722/2282880
Portion of code with JSON_MERGE():
...
CONCAT(
    '{"elements": [',
    GROUP_CONCAT(
        JSON_MERGE(
            JSON_OBJECT(
                'type', T2.`type`,
                'data', T2.`data`
            ),
            CONCAT('{"info": ', T2.`info`, '}')
        )
    ),
    ']}'
) AS `elements`,
...

How to query by "full" JSON field?

I updated a few fields like this:
UPDATE designs
SET prices = '{ "at": 507, "ch": 751, "de": 447 }'
WHERE prices IS NULL;
Now I want to find all those rows:
SELECT * FROM designs
WHERE prices = '{ "at": 507, "ch": 751, "de": 447 }';
But I get this error:
ERROR: operator does not exist: json = unknown
Variations like WHERE prices LIKE '%"at": 507, "ch": 751, "de": 447%' don't work either.
The prices field is of type json, and the PostgreSQL version in use is 9.3.
There is jsonb in Postgres 9.4, which has an equality operator. This data type effectively ignores insignificant white space (and some other insignificant details), so your query would work as is:
SELECT *
FROM designs
WHERE prices = '{ "at": 507, "ch": 751, "de": 447 }';
The same is not possible with json, which preserves insignificant white space, so "equality" between two json values is hard to establish. You could compare text representations, but that's not reliable:
How to query a json column for empty objects?
Using pg 9.4 once more, you could also make this work with a json column, by casting the value to jsonb on the fly:
SELECT *
FROM designs
WHERE prices::jsonb = '{ "at": 507, "ch": 751, "de": 447 }'::jsonb;
Unfortunately the operator = is not defined for JSON fields. If you really want to do this, your only option is to cast as TEXT, but I'm sure you understand the potential problems with that approach, e.g.,
SELECT * FROM designs WHERE prices::TEXT = '{ "x": 3 }';
However, it just occurred to me that a safe approach to that would be:
SELECT * FROM designs WHERE prices::TEXT = '{ "x": 3 }'::JSON::TEXT;
Nope, this doesn't work. Apparently, the JSON data type preserves the whitespace of the original JSON, so if the whitespace in the two strings is different, it won't work. (I regard this as a bug, but others might disagree.)
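On 9.4+, you can see the difference in one line (jsonb normalizes the stored text, while json preserves it byte for byte):
SELECT '{ "x": 3 }'::json::text  AS as_json,   -- { "x": 3 }  (whitespace kept)
       '{ "x": 3 }'::jsonb::text AS as_jsonb;  -- {"x": 3}    (normalized)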
My answer is correct for 9.3, which the questioner is using, but if you are using 9.4+, Erwin Brandstetter's answer is the better choice.