Missing quotes in JSON file

I have a very large JSON file. It contains 27,000 records.
A record looks like this:
{
adlibJSON: {
recordList: {
record: [
{
@attributes: {
priref: "4372",
created: "2011-12-09T23:09:57",
modification: "2012-08-11T17:07:51",
selected: "False"
},
acquisition.date: [
"1954"
],
acquisition.method: [
"bruikleen"
],
association.person: [
"Backer, Bregitta"
],
association.subject: [
"heraldiek"
],
collection: [
"Backer, collectie"
], ... ...
The problem is that this is not valid JSON: the quotes are missing from the names.
For example, acquisition.date should be "acquisition.date":
I need to edit this big JSON file and add all the quotation marks; otherwise the file doesn't parse with, for example, D3.js.
What is the best way to repair this JSON file?

I'd use a decent text editor with regex find-and-replace capability (e.g., Visual Studio, UltraEdit, etc.).
Then do: find
^\s*(\w+\.\w+)\s*:
and replace with
"$1":
Or you could use PowerShell:
$allText = gc yourfile.txt
$allText -replace '^\s*(\w+\.\w+)\s*:', '"$1":'
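Note that this pattern only matches the dotted names like acquisition.date. If the other bare keys (adlibJSON, recordList, priref, @attributes, and so on) are unquoted as well, a broader pattern along these lines might work; this is only a sketch, so test it on a copy of the file first:
$allText = Get-Content yourfile.txt -Raw
# quote any bare key at the start of a line (word characters, dots, @ or #)
$fixed = $allText -replace '(?m)^(\s*)([@#\w.]+)\s*:', '$1"$2":'
Set-Content repaired.json $fixed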

If you can open it in a text editor, I think you can simply use a replace function for:
], --> ],"
and
: [ --> ": [
If your JSON is formatted the same throughout and the values don't contain those character sequences, this should work.
--
Note that you'll have to manually edit the first key yourself.
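Whichever approach you use, it's worth validating the repaired file before handing it to D3.js, for example with Python's built-in JSON tool (assuming the result was saved as repaired.json):
python -m json.tool repaired.json
It pretty-prints the file on success and reports the line and column of the first remaining syntax error otherwise.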


Perl add content to JSON data after processing

I have a JSON file which I am processing using the Perl JSON module.
Once I have processed it, I want to insert some content into it.
Here is my input json file:
{
"sequence" : [
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "11"
}
}
],
"current" : 0,
"next" : 1
}
And below is my script:
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use Data::Dumper;
my $needed = 2;
my $filename = "test_file.json";
my $json_text = do {
open(my $json_fh, "<:encoding(UTF-8)", $filename)
or die("Can't open \$filename\": $!\n");
local $/;
<$json_fh>
};
my $json = JSON->new;
my $data = $json->decode($json_text);
my $aref = $data->{sequence};
print Dumper($aref);
my $number;
for my $element (@$aref) {
$number = $element->{attribute}->{contentNumber}."\n";
}
print "Number:$number\n";
my $total = $number + $needed;
foreach my $each_number ($number+1..$total){
print $each_number."\n";
}
print Dumper $data;
So what I need here is to fetch contentNumber from the given JSON file, increment its value by 1 as many times as $needed specifies, and form a new JSON file.
Finally it should produce a JSON file with content like below,
where the value of the $needed variable determines how many entries the JSON should contain in addition to the initial data:
{
"sequence" : [
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "11"
}
},
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "12"
}
},
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "13"
}
}
],
"current" : 0,
"next" : 1
}
I was thinking of pushing the data in a foreach loop, but I have no clue how to put it into the data object so that it gives me output in JSON format.
From the desired output it appears that you need the hashref which is in sequence's array. Then you need to add $needed number of its copies to that array, with contentNumber incremented in each. (I can't reconcile that with the shown code and I'll go with the desired output, which seems clear.)
Don't forget that the copies must be deep copies;† here I use dclone from Storable for that.
use Storable qw(dclone);
...
my $seq_href = dclone( $data->{sequence}[0] );
for (1..$needed) {
++$seq_href->{attribute}{contentNumber};
push @{$data->{sequence}}, dclone( $seq_href );
}
my $new_json_string = $json->encode($data); # then write it to file
This produces the desired output JSON in my tests.
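To actually write the new file, the encoded string can be written out much as the input was read; a minimal sketch (new_file.json is only an example name):
open my $out_fh, ">:encoding(UTF-8)", "new_file.json"
    or die "Can't open new_file.json: $!";
print $out_fh $json->pretty->encode($data);  # pretty gives indented output like the sample
close $out_fh;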
† A variable or data structure containing references cannot be copied into a new, independent one by merely assigning
my @copy = @ary;  # oops ... any references in there?
The problem is that when the elements that are references in @ary are copied into @copy, those elements in @copy, being the same references, point to the same memory locations as the ones from @ary! So @copy and @ary are by no means independent -- they share data.
Sometimes that may be desired but if we need an independent copy, like in this problem, then we need to follow those references all the way and actually copy the data so that the copied structure indeed has its own data. And there are modules that do that of course.
Complex (nested) data structures by definition have references for elements and so we certainly can't get independent copies by one top-level assignment.
This is a very skinny description of a potentially sneaky and subtle bug. I'd suggest reading up more on it. One resource that comes up is an Effective Perler article.

What JSON format does STRIP_OUTER_ARRAY support?

I have a file composed of a single array containing multiple records.
{
"Client": [
{
"ClientNo": 1,
"ClientName": "Alpha",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "12345"
},
{
"BusinessNo": 2,
"IndustryCode": "23456"
}
]
},
{
"ClientNo": 2,
"ClientName": "Bravo",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "34567"
},
{
"BusinessNo": 2,
"IndustryCode": "45678"
}
]
}
]
}
I load it with the following code:
create or replace stage stage.test
url='azure://xxx/xxx'
credentials=(azure_sas_token='xxx');
create table if not exists stage.client (json_data variant not null);
copy into stage.client
from @stage.test/client_test.json
file_format = (type = 'JSON' strip_outer_array = true);
Snowflake imports the entire file as one row.
I would like the COPY INTO command to remove the outer array structure and load the records into separate table rows.
When I load larger files, I hit the size limit for variant and get the error Error parsing JSON: document is too large, max size 16777216 bytes.
If you can import the file into Snowflake, into a single row, then you can use LATERAL FLATTEN on the Client field to generate one row per element in the array.
Here's a blog post on LATERAL and FLATTEN (or you could look them up in the snowflake docs):
https://support.snowflake.net/s/article/How-To-Lateral-Join-Tutorial
If the format of the file is, as specified, a single object with a single property that contains an array with 500 MB worth of elements in it, then perhaps importing it will still work -- if that works, then LATERAL FLATTEN is exactly what you want. But that form is not particularly great for data processing. You might want to use some text processing script to massage the data if that's needed.
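For example, assuming the whole document did land as a single row in stage.client with the variant column json_data, a sketch like this would emit one row per client:
select c.value:ClientNo::integer  as client_no,
       c.value:ClientName::string as client_name
from stage.client,
     lateral flatten(input => json_data:Client) c;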
RECOMMENDATION #1:
The problem with your JSON is that it doesn't have an outer array. It has a single outer object containing a property with an inner array.
If you can fix the JSON, that would be the best solution, and then STRIP_OUTER_ARRAY will work as you expected.
You could also try to recompose the JSON (an ugly business) after reading it line by line with:
CREATE OR REPLACE TABLE X (CLIENT VARCHAR);
COPY INTO X FROM (SELECT $1 CLIENT FROM @My_Stage/Client.json);
User Response to Recommendation #1:
Thank you. So from what I gather, COPY with STRIP_OUTER_ARRAY can handle a file starting and ending with square brackets, and parse the file as if they were not there.
The real files don't have line breaks, so I can't read the file line by line. I will see if the source system can change the export.
RECOMMENDATION #2:
Also, if you would like to see what the JSON parser does, you can experiment using the code below; I have parsed JSON in the COPY command using similar code. Working with your JSON data in a small project can help you shape the COPY command to work as intended.
CREATE OR REPLACE TABLE SAMPLE_JSON
(ID INTEGER,
DATA VARIANT
);
INSERT INTO SAMPLE_JSON(ID,DATA)
SELECT
1,parse_json('{
"Client": [
{
"ClientNo": 1,
"ClientName": "Alpha",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "12345"
},
{
"BusinessNo": 2,
"IndustryCode": "23456"
}
]
},
{
"ClientNo": 2,
"ClientName": "Bravo",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "34567"
},
{
"BusinessNo": 2,
"IndustryCode": "45678"
}
]
}
]
}');
SELECT
C.value:ClientNo AS ClientNo
,C.value:ClientName::STRING AS ClientName
,ClientBusiness.value:BusinessNo::Integer AS BusinessNo
,ClientBusiness.value:IndustryCode::Integer AS IndustryCode
from SAMPLE_JSON f
,table(flatten( f.DATA,'Client' )) C
,table(flatten(c.value:ClientBusiness,'')) ClientBusiness;
User Response to Recommendation #2:
Thank you for the parse_json example!
Trouble is, the real files are sometimes 500 MB, so the parse_json function chokes.
Follow-up on Recommendation #2:
The JSON needs to be in the NDJSON (http://ndjson.org/) format. Otherwise large files become impossible to parse, since the whole document has to fit into a single variant.
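For reference, the same data reshaped as NDJSON would be one complete object per line, with no outer wrapper:
{"ClientNo": 1, "ClientName": "Alpha", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "12345"}, {"BusinessNo": 2, "IndustryCode": "23456"}]}
{"ClientNo": 2, "ClientName": "Bravo", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "34567"}, {"BusinessNo": 2, "IndustryCode": "45678"}]}
Snowflake then loads each line as its own row, no STRIP_OUTER_ARRAY needed, and no single variant has to hold the whole file.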
Hope the above helps others running into similar questions!

How to mass-replace text in "sounds" : ["test/test"]

I'm trying to make a resource pack in Minecraft, and I'm editing it so there's only one sound. When I went to edit sounds.json in VS Code, I wanted to set all the sound locations to just one file, so that each entry looks like this:
"sounds" : [
"test/test"
],
that is, test/test for all the "sounds". But I have no idea how to do this. The sounds.json file is so big that it would take more than a day to do all the work by hand. So I checked whether VS Code had any option to replace the text, and couldn't find one.
I've tried looking around in VS Code and found nothing useful.
I've tried replacing all the sounds by pasting .ogg files and renaming them, but it took too long, so I realized I could just point all the locations at one sound file.
I've done some research on Google but found nothing of use.
"block.enderchest.open": {
"sounds": [
"test/test"
],
"subtitle": "subtitles.block.chest.open"
},
"block.fence_gate.close": {
"sounds": [
"block/fence_gate/close1",
"block/fence_gate/close2"
],
"subtitle": "subtitles.block.fence_gate.toggle"
},
"block.fence_gate.open": {
"sounds": [
"block/fence_gate/open1",
"block/fence_gate/open2"
],
"subtitle": "subtitles.block.fence_gate.toggle"
},
"block.fire.ambient": {
"sounds": [
"fire/fire"
],
I expect a convenient way to edit all the "sounds": [ ] values at once.
The actual result is that there is no convenient way, and editing all of the sounds values by hand is a time waster.
One way to do it is with regex; see the regex101 demo.
Search for: ("sounds": \[\n)((\s*)[^\]]*\n)*(\s*\],)
Replace with: $1$3"test/test"\n$4
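To apply this in VS Code: open Replace (Ctrl+H), click the .* icon to enable regular-expression mode, paste the search pattern and replacement above, and hit Replace All. Running it on a copy of sounds.json first is a good idea.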

Using logical operators in a JSON path file

Is there functionality to use logical operators in the JSON path file used in a COPY command?
For example, I have JSON data in which a key can be either
Desc
Or
Description
So in the JSON it would be something like:
{
"Desc": "Hello",
"City" : "City1",
"Age": "21"
}
{
"Description" : "World",
"City" : "City2",
"Age": "25"
}
I'm using a COPY command to pull the data from the JSON above into my table in Redshift. The table has a column named "description_data", which would store the value of either "Desc" or "Description". So I want my path file to identify the right key using an "OR" condition.
This is the path file that I'm currently using:
{
"jsonpaths": [
"$['Desc']",
"$['City']",
"$['Age']"
]
}
Which is working fine.
What I'm trying to do is the below (this is where I'm unsure whether there is any syntax or functionality to achieve the objective):
{
"jsonpaths": [
"$['Desc']" or "$['Description']",
"$['City']",
"$['Age']"
]
}
No, Redshift doesn't support this.
You can issue two COPY commands, one with Desc and another with Description, to load the data into two temporary tables. After that, you can merge the two into your final table.
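A sketch of that approach, with made-up table names and placeholder S3 paths; the two jsonpaths files differ only in using $['Desc'] versus $['Description'], and the filters rely on Redshift loading NULL for a jsonpath that doesn't resolve (worth verifying on a small sample):
create temp table t_desc        (description_data varchar, city varchar, age varchar);
create temp table t_description (like t_desc);

copy t_desc from 's3://my-bucket/data/'
iam_role 'arn:aws:iam::123456789012:role/my-role'
json 's3://my-bucket/paths_desc.json';

copy t_description from 's3://my-bucket/data/'
iam_role 'arn:aws:iam::123456789012:role/my-role'
json 's3://my-bucket/paths_description.json';

insert into my_table (description_data, city, age)
select description_data, city, age from t_desc where description_data is not null
union all
select description_data, city, age from t_description where description_data is not null;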

How to display 'c' array values alone from the given JSON document below using MongoDB?

I am a newbie to MongoDB. I am experimenting with the various ways of extracting fields from a document inside a collection.
With the JSON document below, I am finding it difficult to extract what I need.
{
"_id":1,
"dependencies":{
"a":[
"hello",
"hi"
],
"b":[
"Hmmm"
],
"c":[
"Vanilla",
"Strawberry",
"Pista"
],
"d":[
"Carrot",
"Cauliflower",
"Potato",
"Cabbage"
]
},
"productid":"25",
"date":"Thu Jul 30 11:36:49 PDT 2015"
}
I need to display the following output:
c:[
"Vanilla",
"Strawberry",
"Pista"
]
Can anyone please help me in solving it?
MongoDB aggregation comes to the rescue to get the result you are looking for:
$project --> Passes along the documents with only the specified fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
db.collection.aggregate( [
{ $project :
{ c: "$dependencies.c", _id : 0 }
}
]).pretty();
As per the output you required, we just need to project (display) the field "dependencies.c", so we create a new field "c" and assign the value of "dependencies.c" to it.
Also, by default the "_id" field is displayed along with the result. Since you don't need it, we suppress it by assigning "_id": <0 or false>, so that it is not displayed in the output.
The above query will fetch the result below:
"c" : [
"Vanilla",
"Strawberry",
"Pista"
]
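As a side note, a plain find() projection gets close but keeps the nesting:
db.collection.find({ "_id": 1 }, { "dependencies.c": 1, "_id": 0 })
This returns { "dependencies" : { "c" : [ "Vanilla", "Strawberry", "Pista" ] } }, so the aggregation with $project is the way to go when "c" should appear at the top level of the output.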