I have a use case where we have a text file in a key-value format.
The file doesn't follow any fixed format but is built up as key=value pairs.
We need to create JSON out of that file.
I am able to create JSON, but when the text has an array-like structure, my code creates flat key-value JSON instead of the array JSON structure.
This is my input:
[DOCUMENT]
Headline=This is Headline
MainLanguage=EN
DocType.MxpCode=1000
Subject[0].MxpCode=BUSNES
Subject[1].MxpCode=CONS
Subject[2].MxpCode=ECOF
Author[0].MxpCode=6VL6
Industry[0].CtbCode=53
Industry[1].CtbCode=5340
Industry[2].CtbCode=534030
Industry[3].CtbCode=53403050
Symbol[0].Name=EXPE.OQ
Symbol[1].Name=ABNB.OQ
WorldReg[0].CtbCode=G4
Country[0].CtbCode=G26
Country[1].CtbCode=G2V
[ENDOFFILE]
Existing code to create the JSON is below:
import json

with open("file1.csv") as f:
    lines = f.readlines()

data = {}
for line in lines:
    parts = line.split('=')
    if len(parts) == 2:
        data[parts[0].strip()] = parts[1].strip()

print(json.dumps(data, indent=' '))
The current output is below
{
"Headline": "This is Headline",
"MainLanguage": "EN",
"DocType.MxpCode": "1000",
"Subject[0].MxpCode": "BUSNES",
"Subject[1].MxpCode": "CONS",
"Subject[2].MxpCode": "ECOF",
"Author[0].MxpCode": "6VL6",
"Industry[0].CtbCode": "53",
"Industry[1].CtbCode": "5340",
"Industry[2].CtbCode": "534030",
"Industry[3].CtbCode": "53403050",
"Symbol[0].Name": "EXPE.OQ",
"Symbol[1].Name": "ABNB.OQ",
"WorldReg[0].CtbCode": "G4",
"Country[0].CtbCode": "G26",
"Country[1].CtbCode": "G2V"
}
Expected output is something like below for the Subject key, and likewise for the others:
{
"subject": [
{
"mxcode": 123
},
{
"mxcode": 123
},
{
"mxcode": 123
}
]
}
Likewise for Industry, Symbol and Country.
So the idea is: when a key carries a position index in the text file, it should be treated as an array in the JSON output.
Use one more loop, since the structure is nested: start a for loop from where Subject begins and try it that way. A sketch of that idea is below.
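Here is a minimal sketch of that approach, assuming the indexed keys always have the shape Name[i].Field as in the sample (the regex and variable names are mine, not from the original code):

import json
import re

# Matches keys like "Subject[0].MxpCode": name, numeric index, field
indexed_key = re.compile(r'^(\w+)\[(\d+)\]\.(\w+)$')

data = {}
with open("file1.csv") as f:
    for line in f:
        parts = line.strip().split('=')
        if len(parts) != 2:
            continue  # skips the [DOCUMENT] and [ENDOFFILE] markers
        key, value = parts[0].strip(), parts[1].strip()
        m = indexed_key.match(key)
        if m:
            name, index, field = m.group(1), int(m.group(2)), m.group(3)
            items = data.setdefault(name, [])
            while len(items) <= index:     # grow the list up to the position
                items.append({})
            items[index][field] = value
        else:
            data[key] = value              # plain keys stay flat

print(json.dumps(data, indent=2))

With the sample input this emits "Subject" as a list of {"MxpCode": ...} objects, and likewise for Author, Industry, Symbol, WorldReg and Country.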
I have a JSON file which I am processing using the Perl JSON module.
Once I process it, I want to insert some content into it.
Here is my input JSON file:
{
"sequence" : [
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "11"
}
}
],
"current" : 0,
"next" : 1
}
And below is my script:
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use Data::Dumper;
my $needed = 2;
my $filename = "test_file.json";
my $json_text = do {
open(my $json_fh, "<:encoding(UTF-8)", $filename)
or die("Can't open \"$filename\": $!\n");
local $/;
<$json_fh>
};
my $json = JSON->new;
my $data = $json->decode($json_text);
my $aref = $data->{sequence};
print Dumper($aref);
my $number;
for my $element (@$aref) {
$number = $element->{attribute}->{contentNumber}."\n";
}
print "Number:$number\n";
my $total = $number + $needed;
foreach my $each_number ($number+1..$total){
print $each_number."\n";
}
print Dumper $data;
So what I need here is to fetch contentNumber from the given JSON file, increment its value by 1 as many times as $needed specifies, and form a new JSON file.
Finally it should produce a JSON file with content like below, where the data appears $needed more times in addition to the initial data:
{
"sequence" : [
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "11"
}
},
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "12"
}
},
{
"type" : "event",
"attribute" : {
"contentText" : "Test Content",
"contentNumber" : "13"
}
}
],
"current" : 0,
"next" : 1
}
I was thinking of pushing the data in the foreach loop, but I have no clue how to put it into the data object so that it gives me output in JSON format.
From the desired output it appears that you need the hashref which is in sequence's array. Then you need to add $needed number of its copies to that array, with contentNumber incremented in each. (I can't reconcile that with the shown code and I'll go with the desired output, which seems clear.)
Don't forget that the copies must be deep copies;† here I use dclone from Storable for that.
use Storable qw(dclone);
...
my $seq_href = dclone( $data->{sequence}[0] );
for (1..$needed) {
++$seq_href->{attribute}{contentNumber};
push @{$data->{sequence}}, dclone( $seq_href );
}
my $new_json_string = $json->encode($data); # then write it to file
This produces the desired output JSON in my tests.
† A variable or data structure containing references cannot be copied into a new, independent one by merely assigning
my @copy = @ary;  # oops ... any references in there?
The problem is that when the elements that are references in @ary are copied into @copy, those elements in @copy, being the same references, point to the same memory locations as the ones from @ary! So @copy and @ary are by no means independent -- they share data.
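A tiny illustration of that sharing (my example, not from the original answer):

my @ary  = ( { name => 'n1' } );     # one element: a hashref
my @copy = @ary;                     # copies the reference, not the hash
$copy[0]{name} = 'changed';
print $ary[0]{name}, "\n";           # prints 'changed' -- @ary sees it too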
Sometimes that may be desired but if we need an independent copy, like in this problem, then we need to follow those references all the way and actually copy the data so that the copied structure indeed has its own data. And there are modules that do that of course.
Complex (nested) data structures by definition have references for elements and so we certainly can't get independent copies by one top-level assignment.
This is a very skinny description of a potentially sneaky and subtle bug; I'd suggest reading up more on it. One resource that comes up is an Effective Perler article.
slurperresponse = new JsonSlurper().parseText(responseContent)
log.info (slurperresponse.WorkItems[0].WorkItemExternalId)
The above code helps me get the node value "WorkItems[0].WorkItemExternalId" using Groovy. Below is the response.
{
"TotalRecordCount": 1,
"TotalPageCount": 1,
"CurrentPage": 1,
"BatchSize": 10,
"WorkItems": [ {
"WorkItemUId": "4336c111-7cd6-4938-835c-3ddc89961232",
"WorkItemId": "20740900",
"StackRank": "0",
"WorkItemTypeUId": "00020040-0200-0010-0040-000000000000",
"WorkItemExternalId": "79853"
}
]
}
I need to append the string "WorkItems[0].WorkItemExternalId" (being read from an Excel file), and multiple other such nodes, dynamically to slurperresponse to get the node values, rather than hard-coding slurperresponse.WorkItems[0].WorkItemExternalId directly.
I tried append and the "+" operator but I get a compilation error. What other way can I do this?
slurperresponse is an object, not a string, which is why the concatenation does not work.
JsonSlurper creates an object out of the input string. This object is dynamic by nature: you can access it, add fields to it, or alter the existing fields. Concatenation won't work here.
Here is an example:
import groovy.json.*
def text = '{"total" : 2, "students" : [{"name": "John", "age" : 20}, {"name": "Alice", "age" : 21}] }'
def json = new JsonSlurper().parseText(text)
json.total = 3 // alter the value of the existing field
json.city = 'LA' // add a totally new field
json.students[0].age++ // change the field in a list
println json
This yields the output:
[total:3, students:[[name:John, age:21], [name:Alice, age:21]], city:LA]
Now, if I've got you right, you want to add a new student dynamically, where the input is text that you've read from Excel. Here is an example:
json.students << new JsonSlurper().parseText('{"name" : "Tom", "age" : 25}')
// now there are 3 students in the list
Update
It's also possible to get the values without 'hardcoding' the property name:
// option 1
println json.city // prints 'LA'
// option 2
println json.get('city') // prints 'LA' but here 'city' can be a variable
// option 3
println json['city'] // the same as option 2
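And if the Excel cell holds a whole path such as "WorkItems[0].WorkItemExternalId", one way is to walk the parsed object segment by segment. This is a sketch of my own (getByPath is a hypothetical helper, not part of JsonSlurper):

def getByPath(obj, String path) {
    def current = obj
    path.split(/\./).each { part ->
        def m = part =~ /(\w+)\[(\d+)\]/
        if (m.matches()) {
            current = current[m.group(1)][m.group(2) as int]  // indexed step, e.g. WorkItems[0]
        } else {
            current = current[part]                           // plain property step
        }
    }
    return current
}

println getByPath(slurperresponse, 'WorkItems[0].WorkItemExternalId') // 79853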
I have a file composed of a single array containing multiple records.
{
"Client": [
{
"ClientNo": 1,
"ClientName": "Alpha",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "12345"
},
{
"BusinessNo": 2,
"IndustryCode": "23456"
}
]
},
{
"ClientNo": 2,
"ClientName": "Bravo",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "34567"
},
{
"BusinessNo": 2,
"IndustryCode": "45678"
}
]
}
]
}
I load it with the following code:
create or replace stage stage.test
url='azure://xxx/xxx'
credentials=(azure_sas_token='xxx');
create table if not exists stage.client_test (json_data variant not null);
copy into stage.client_test
from @stage.test/client_test.json
file_format = (type = 'JSON' strip_outer_array = true);
Snowflake imports the entire file as one row.
I would like the COPY INTO command to remove the outer array structure and load the records into separate table rows.
When I load larger files, I hit the size limit for variant and get the error Error parsing JSON: document is too large, max size 16777216 bytes.
If you can import the file into Snowflake, into a single row, then you can use LATERAL FLATTEN on the Clients field to generate one row per element in the array.
Here's a blog post on LATERAL and FLATTEN (or you could look them up in the snowflake docs):
https://support.snowflake.net/s/article/How-To-Lateral-Join-Tutorial
If the format of the file is, as specified, a single object with a single property that contains an array with 500 MB worth of elements in it, then perhaps importing it will still work -- if that works, then LATERAL FLATTEN is exactly what you want. But that form is not particularly great for data processing. You might want to use some text processing script to massage the data if that's needed.
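For reference, a minimal form of such a query against the table loaded above might look like this (a sketch assuming the single-row load landed in stage.client_test with the column json_data):

SELECT c.value:ClientNo::integer  AS client_no,
       c.value:ClientName::string AS client_name
FROM stage.client_test,
     LATERAL FLATTEN(input => json_data:Client) c;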
RECOMMENDATION #1:
The problem with your JSON is that it doesn't have an outer array. It has a single outer object containing a property with an inner array.
If you can fix the JSON, that would be the best solution, and then STRIP_OUTER_ARRAY will work as you expected.
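For illustration, a fixed file would put the array at the top level, so STRIP_OUTER_ARRAY can peel it off and load one element per row:

[
  { "ClientNo": 1, "ClientName": "Alpha", "ClientBusiness": [ ... ] },
  { "ClientNo": 2, "ClientName": "Bravo", "ClientBusiness": [ ... ] }
]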
You could also try to recompose the JSON (an ugly business) after reading it line by line with:
CREATE OR REPLACE TABLE X (CLIENT VARCHAR);
COPY INTO X FROM (SELECT $1 CLIENT FROM @My_Stage/Client.json);
User Response to Recommendation #1:
Thank you. So from what I gather, COPY with STRIP_OUTER_ARRAY can handle a file starting and ending with square brackets, and parse the file as if they were not there.
The real files don't have line breaks, so I can't read the file line by line. I will see if the source system can change the export.
RECOMMENDATION #2:
Also, if you would like to see what the JSON parser does, you can experiment using the code below; I have parsed JSON in the COPY command using similar code. Working with your JSON data in a small project can help you shape the COPY command to work as intended.
CREATE OR REPLACE TABLE SAMPLE_JSON
(ID INTEGER,
DATA VARIANT
);
INSERT INTO SAMPLE_JSON(ID,DATA)
SELECT
1,parse_json('{
"Client": [
{
"ClientNo": 1,
"ClientName": "Alpha",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "12345"
},
{
"BusinessNo": 2,
"IndustryCode": "23456"
}
]
},
{
"ClientNo": 2,
"ClientName": "Bravo",
"ClientBusiness": [
{
"BusinessNo": 1,
"IndustryCode": "34567"
},
{
"BusinessNo": 2,
"IndustryCode": "45678"
}
]
}
]
}');
SELECT
C.value:ClientNo AS ClientNo
,C.value:ClientName::STRING AS ClientName
,ClientBusiness.value:BusinessNo::Integer AS BusinessNo
,ClientBusiness.value:IndustryCode::Integer AS IndustryCode
from SAMPLE_JSON f
,table(flatten( f.DATA,'Client' )) C
,table(flatten(c.value:ClientBusiness,'')) ClientBusiness;
User Response to Recommendation #2:
Thank you for the parse_json example!
Trouble is, the real files are sometimes 500 MB, so the parse_json function chokes.
Follow-up on Recommendation #2:
The JSON needs to be in the NDJSON (http://ndjson.org/) format; otherwise the JSON will be impossible to parse once the files get large.
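For illustration, the sample data recomposed as NDJSON would hold one complete client object per line, which COPY INTO can then load as one row per line without any STRIP_OUTER_ARRAY:

{"ClientNo": 1, "ClientName": "Alpha", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "12345"}, {"BusinessNo": 2, "IndustryCode": "23456"}]}
{"ClientNo": 2, "ClientName": "Bravo", "ClientBusiness": [{"BusinessNo": 1, "IndustryCode": "34567"}, {"BusinessNo": 2, "IndustryCode": "45678"}]}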
Hope the above helps others running into similar questions!
I'm trying to get Python to create JSON formatted like:
[
{
"machine_working": true
},
{
"MachineName": "TBL165-169",
"MachineType": "Rig Test"
}
]
However, I can't seem to do it. This is the code I currently have, but it's giving me an error:
this_is_a_dict_too=[]
this_is_a_dict_too = dict(State="on",dict(MachineType="machinetype1",MachineName="MachineType2"))
File "c:\printjson.py", line 40
this_is_a_dict_too = dict(Statedsf="test",dict(MachineType="Rig Test",MachineName="TBL165-169")) SyntaxError: non-keyword arg after
keyword arg
this_is_a_dict_too = [dict(machine_working=True),dict(MachineType="machinetype1",MachineName="MachineType2")]
print(this_is_a_dict_too)
You are trying to make a dictionary inside a dictionary; the error message says that you tried to add an element without a name (a corresponding keyword).
dict(a='b', b=dict(state='on'))
will work, but
dict(a='b', dict(state='on'))
won't.
The thing that you presented is a list, so you can use
list((dict(a='b'), dict(b='a')))
Note that the example above uses two dictionaries packed into a tuple.
or
[ dict(a='b'), dict(b='a') ]
I have a column of text type containing a JSON value.
{
"customer": [
{
"details": {
"customer1": {
"name": "john",
"addresses": {
"address1": {
"line1": "xyz",
"line2": "pqr"
},
"address2": {
"line1": "abc",
"line2": "efg"
}
}
},
"customer2": {
"name": "robin",
"addresses": {
"address1": null
}
}
}
}
]
}
How can I extract the 'address1' JSON field of the column with a query?
First I am trying to fetch the JSON value; then I will go on to parsing.
SELECT JSON customer from text_column;
With my query, I get the following error.
com.datastax.driver.core.exceptions.SyntaxError: line 1:12 no viable
alternative at input 'customer' (SELECT [JSON] customer...)
Cassandra version 2.1.13
You can't use SELECT JSON in Cassandra v2.1.x CQL v3.2.x
For Cassandra v2.1.x CQL v3.2.x :
The only supported operations after SELECT are:
DISTINCT
COUNT (*)
COUNT (1)
column_name AS new_name
WRITETIME (column_name)
TTL (column_name)
dateOf(), now(), minTimeuuid(), maxTimeuuid(), unixTimestampOf(), typeAsBlob() and blobAsType()
Cassandra v2.2.x CQL v3.3.x introduces SELECT JSON:
With SELECT statements, the new JSON keyword can be used to return each row as a single JSON-encoded map. The remainder of the SELECT statement's behavior is the same.
The result map keys are the same as the column names in a normal result set. For example, a statement like "SELECT JSON a, ttl(b) FROM ..." would result in a map with keys "a" and "ttl(b)". However, there is one notable exception: for symmetry with INSERT JSON behavior, case-sensitive column names with upper-case letters will be surrounded with double quotes. For example, "SELECT JSON myColumn FROM ..." would result in a map key "\"myColumn\"" (note the escaped quotes).
The map values will be JSON-encoded representations (as described below) of the result set values.
If your Cassandra version is 2.1.x or below, you can use a Python-based approach.
Write a Python script using the Cassandra-Python API.
Here you have to fetch your row first and then use Python json's loads method, which will convert your JSON text column value into a JSON object (a dict in Python). Then you can work with the Python dictionaries and extract your required nested keys. See the code snippet below.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import json

if __name__ == '__main__':
    auth_provider = PlainTextAuthProvider(username='xxxx', password='xxxx')
    cluster = Cluster(['0.0.0.0'],
                      port=9042, auth_provider=auth_provider)
    session = cluster.connect("keyspace_name")
    print("session created successfully")
    rows = session.execute('select * from user limit 10')
    for user_row in rows:
        customer_dict = json.loads(user_row.customer)
        print(customer_dict.keys())
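From there, pulling out address1 is plain dictionary traversal. A sketch against the sample document above (assuming the missing comma after the customer1 block is fixed so the text parses as JSON):

# customer -> [ { details -> customerN -> addresses -> address1 } ]
for entry in customer_dict['customer']:
    for cust_name, details in entry['details'].items():
        addresses = details.get('addresses') or {}
        print(cust_name, addresses.get('address1'))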