Problems with converting to JSON

I'm trying to convert the term below to JSON via jiffy and get an exception, even though it seems correct to me:
{"PurchaseOrder",
[{"PurchaseOrderNumber","99503"},
{"OrderDate","1999-10-20"},
{"Address",
[[{"Type","Shipping"},
{"Name",[{<<"#text">>,"Ellen Adams"}]},
{"Street",[{<<"#text">>,"123 Maple Street"}]},
{"City",[{<<"#text">>,"Mill Valley"}]},
{"State",[{<<"#text">>,"CA"}]},
{"Zip",[{<<"#text">>,"10999"}]},
{"Country",[{<<"#text">>,"USA"}]}],
[{"Type","Billing"},
{"Name",[{<<"#text">>,"Tai Yee"}]},
{"Street",[{<<"#text">>,"8 Oak Avenue"}]},
{"City",[{<<"#text">>,"Old Town"}]},
{"State",[{<<"#text">>,"PA"}]},
{"Zip",[{<<"#text">>,"95819"}]},
{"Country",[{<<"#text">>,"USA"}]}]]},
{"DeliveryNotes",
[{<<"#text">>,"Please leave packages in shed by driveway."}]},
{"Items",
[{"Item",
[[{"PartNumber","872-AA"},
{"ProductName",[{<<"#text">>,"Lawnmower"}]},
{"Quantity",[{<<"#text">>,"1"}]},
{"USPrice",[{<<"#text">>,"148.95"}]},
{"Comment",[{<<"#text">>,"Confirm this is electric"}]}],
[{"PartNumber","926-AA"},
{"ProductName",[{<<"#text">>,"Baby Monitor"}]},
{"Quantity",[{<<"#text">>,"2"}]},
{"USPrice",[{<<"#text">>,"39.98"}]},
{"ShipDate",[{<<"#text">>,"1999-05-21"}]}]]}]}]}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Help please, what's wrong?

Put your proplists/objects in a tuple:
instead of [{a,b}] it should be {[{a,b}]}
Use binary strings instead of lists:
instead of "string" it should be <<"string">>
RTFM on the jiffy data format: https://github.com/davisp/jiffy#data-format
Example:
{[{<<"PurchaseOrder">>,
{[{<<"PurchaseOrderNumber">>,<<"99503">>},
{<<"OrderDate">>,<<"1999-10-20">>},
{<<"Address">>,
[{[{<<"Type">>,<<"Shipping">>},
{<<"Name">>,{[{<<"#text">>,<<"Ellen Adams">>}]}},
{<<"Street">>,{[{<<"#text">>,<<"123 Maple Street">>}]}},
{<<"City">>,{[{<<"#text">>,<<"Mill Valley">>}]}},
{<<"State">>,{[{<<"#text">>,<<"CA">>}]}},
{<<"Zip">>,{[{<<"#text">>,<<"10999">>}]}},
{<<"Country">>,{[{<<"#text">>,<<"USA">>}]}}]},
{[{<<"Type">>,<<"Billing">>},
{<<"Name">>,{[{<<"#text">>,<<"Tai Yee">>}]}},
{<<"Street">>,{[{<<"#text">>,<<"8 Oak Avenue">>}]}},
{<<"City">>,{[{<<"#text">>,<<"Old Town">>}]}},
{<<"State">>,{[{<<"#text">>,<<"PA">>}]}},
{<<"Zip">>,{[{<<"#text">>,<<"95819">>}]}},
{<<"Country">>,{[{<<"#text">>,<<"USA">>}]}}]}]},
{<<"DeliveryNotes">>,
{[{<<"#text">>,<<"Please leave packages in shed by driveway.">>}]}},
{<<"Items">>,
{[{<<"Item">>,
[{[{<<"PartNumber">>,<<"872-AA">>},
{<<"ProductName">>,{[{<<"#text">>,<<"Lawnmower">>}]}},
{<<"Quantity">>,{[{<<"#text">>,<<"1">>}]}},
{<<"USPrice">>,{[{<<"#text">>,<<"148.95">>}]}},
{<<"Comment">>,{[{<<"#text">>,<<"Confirm this is electric">>}]}}]},
{[{<<"PartNumber">>,<<"926-AA">>},
{<<"ProductName">>,{[{<<"#text">>,<<"Baby Monitor">>}]}},
{<<"Quantity">>,{[{<<"#text">>,<<"2">>}]}},
{<<"USPrice">>,{[{<<"#text">>,<<"39.98">>}]}},
{<<"ShipDate">>,{[{<<"#text">>,<<"1999-05-21">>}]}}]}]}]}}]}}]}

Related

Flutter: how to make a list from MySQL data?

From the MySQL query I get data like this:
(Fields: {IDAufgaben: 2630, Aufgabe: erste Aufgabe},
Fields: {IDAufgaben: 2627, Aufgabe: Testen})
json.decode gives a FormatException, I think because the quotes are missing.
How can I convert the MySQL data I receive into a Dart list?
Thanks a lot for your help, I am a newbie in Flutter and Dart…
The data should contain quote marks too, but when you take it from the terminal log the quotation marks are not included. The solution is to convert it to JSON using jsonEncode, like this:
import 'dart:convert';

final myField = {"IDAufgaben": "2630", "Aufgabe": "erste Aufgabe"};
print(JsonEncoder.withIndent(" ").convert(myField));
// The result in the terminal is:
{
"IDAufgaben": "2630",
"Aufgabe": "erste Aufgabe"
}
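To actually get a Dart list out of the query result, you do not need to go through the printed text at all. A hedged sketch, assuming the rows come from the mysql1 package, where each row exposes a fields map (adapt to your driver):
import 'dart:convert';

// `results` stands for the rows returned by the query (hypothetical name).
final list = results.map((row) => row.fields).toList();
// Optional: round-trip through JSON if you need plain JSON-safe values.
final jsonList = jsonDecode(jsonEncode(list)) as List<dynamic>;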

How do I parse a nested JSON string in Snowflake?

Thanks for reading, and I hope you can help me.
This is what my json string looks like. I'm struggling to find a way to parse it in Snowflake.
{"date":"2020-07-13T00:00:00.0000000","Reason":"{\"description\":\"Test\",\"alternates\":{},\"position\":10}","forename":"Tester","surname":"Test","title":"Mr","dateOfBirth":"2000-11-22T00:00:00.0000000"}
When I try PARSE_JSON() I get the following error
SQL Error [100069] [22P02]: Error parsing JSON: missing comma, pos 51
I'm exploring the possibility of cleansing/transforming the data before ingestion but perhaps someone out there has a better solution to deal with this issue within Snowflake.
So far I haven't been able to parse this or create a regular expression to replace the quote marks after the backwards slash.
Any help is much appreciated
Thanks!
jc
JCB,
I am unable to reproduce your issue. Here is what I am using:
WITH X AS (
SELECT PARSE_JSON($1) AS MY_JSON
FROM VALUES ($$
{
"date": "2020-07-13T00:00:00.0000000",
"Reason": "{\"description\":\"Test\",\"alternates\":{},\"position\":10}",
"forename": "Tester",
"surname": "Test",
"title": "Mr",
"dateOfBirth": "2000-11-22T00:00:00.0000000"
}
$$)
)
SELECT MY_JSON
FROM X
;
Please provide the EXACT SQL that you are using, so that others here can assist you better.
I managed to parse the JSON with Darren's help. I also managed to list the new keys and attributes with a lateral join to a FLATTEN subquery.
SELECT DISTINCT
f.path,
typeof(f.value)
FROM
REPORT_DATA,
LATERAL FLATTEN(INPUT => SRC, RECURSIVE => TRUE) f
WHERE
TYPEOF(f.value) != 'OBJECT';
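Note that in this document the Reason field is itself a JSON-encoded string inside the outer object, so it needs a second PARSE_JSON pass. A minimal sketch building on the CTE X from Darren's example (output column names are illustrative):
SELECT
    MY_JSON:forename::STRING AS forename,
    PARSE_JSON(MY_JSON:Reason::STRING) AS reason,
    PARSE_JSON(MY_JSON:Reason::STRING):description::STRING AS reason_description
FROM X;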

Encoding Erlang Tuple as JSON

How do I convert a list containing tuples, atoms, and binary strings into JSON?
I see Erlang : Tuple List into JSON
and I found https://github.com/rustyio/BERT-JS
I want an API I can call like
erlang_json:convert([{a, b, {{c, d}}, 1}, {"a", "b", {{cat, dog}}, 2}])
where the atoms would be converted to strings or some other standard way to process on the Javascript side.
I have complicated Erlang lists I need to send to my webpage.
It's unclear what [{a, b, {{c, d}}, 1}, {"a", "b", {{cat, dog}}, 2}... would turn into as JSON, but you might take a look at jiffy or jsx. Both of them work on simple key/value structures. For instance:
> Term = #{a => b, c => 1, <<"x">> => <<"y">>}.
#{a => b,c => 1,<<"x">> => <<"y">>}
> jiffy:encode(Term).
<<"{\"x\":\"y\",\"c\":1,\"a\":\"b\"}">>
> jsx:encode(Term).
<<"{\"a\":\"b\",\"c\":1,\"x\":\"y\"}">>
If you can say what JSON you want your example input to turn into, I might be able to give you a better suggestion.
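One generic option, if plain arrays on the JavaScript side are acceptable: walk the term, turning tuples into lists and atoms into binaries, then hand the result to jiffy. A hedged sketch; to_ejson/1 is a hypothetical helper, not a library function:
%% Convert an arbitrary Erlang term into jiffy-compatible EJSON:
%% tuples become JSON arrays and atoms become JSON strings.
to_ejson(T) when is_tuple(T) -> [to_ejson(E) || E <- tuple_to_list(T)];
to_ejson(L) when is_list(L) ->
    case io_lib:printable_list(L) of
        true  -> list_to_binary(L);   %% "abc" -> <<"abc">> (caveat: [] becomes <<>>)
        false -> [to_ejson(E) || E <- L]
    end;
to_ejson(A) when is_atom(A) -> atom_to_binary(A, utf8);
to_ejson(X) -> X.                     %% numbers and binaries pass through unchanged

%% Usage:
%% jiffy:encode(to_ejson([{a, b, {{c, d}}, 1}])).
%% => <<"[[\"a\",\"b\",[[\"c\",\"d\"]],1]]">>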
Just for you:
https://github.com/romanr321/t2j
You don't need to wrap it in a list though; it takes one tuple argument and returns a JSON-formatted string.
>Tuple = {{key, value}, { key2, {key3, [value1, 2,3]}}}.
>t2j:t2jp(Tuple).
{"key":"value", "key2, {"key3":["value1", 2,3]}}
The library jsone is pretty good. It can translate between maps and tuples:
https://github.com/sile/jsone
I've used it extensively and it's lightning fast.
The only problem I've found is that a map that contains a list of maps throws an error. I hope this is fixed, but maybe I'm the only tart trying to do that.

Parse complex JSON string contained in Hadoop

I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed by using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use a 'Loader'-based approach since I use HCatalog Loader to load data from Hive.)
But, the problem is the structure of my data; each value of Map structure contains value part of complex JSON. For example,
1. My table looks like this (WARNING: the type of 'complex_data' is not STRING but a MAP<STRING, STRING>!):
TABLE temp_table
(
user_id BIGINT COMMENT 'user ID.',
complex_data MAP <STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY(created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains the following (the values I want are marked with two *s; essentially I want #'d'#'f' from each element of PARSED_STRING(complex_data#'c')):
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried... (same approach for Elephant Bird)
REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
data = LOAD temp_table USING org.apache.hive.hcatalog.pig.HCatLoader();
values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows correct chararray data for each row,
-- e.g. ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--        {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--        {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
--       ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--        {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--        {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...
attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSES AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- THIS ALSO DOES NOT WORK
I guessed that attempt1 failed because the value doesn't contain a full JSON document. However, when I CONCAT as in attempt2, additional \ marks are generated (so each line starts with {\"key\":). I'm not sure whether these additional marks break the parsing rules or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library with a Jython UDF.
I know that I could solve it with Java or other languages.
But I think that Jython with jyson is the simplest answer to this issue.
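For reference, a hedged sketch of what such a UDF could look like; extract_d_f and its schema are illustrative, not from the original post, and jyson.jar is assumed to be on the classpath:
# udf.py -- Jython UDF that pulls d.f out of each element of the JSON array string.
from com.xhaus.jyson import JysonCodec as json

@outputSchema("values: {t: (f: chararray)}")
def extract_d_f(json_str):
    if json_str is None:
        return None
    # json.loads parses the chararray into Jython lists/dicts.
    return [(item['d']['f'],) for item in json.loads(json_str)]

And on the Pig side:
REGISTER 'udf.py' USING jython AS myudf;
result = FOREACH data GENERATE myudf.extract_d_f(complex_data#'c');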

Importing and converting specific attributes of JSON files in R

I have been given a rather large corpus of conversational data with which to import the relevant information into R and run some statistical analysis.
The problem is I do not need half the information provided in each entry. Each line in a specific JSON file from the dataset relates to a particular conversation of the form A->B->A. The attributes provided are contained within a nested array for each of the respective statements in the conversation.
What I need is to simply extract the 'actual_sentence' attribute from each turn (turn_1, turn_2, turn_3, i.e. A->B->A) and discard the rest.
So far my efforts have been in vain as I have been using the jsonlite package which seems to import the JSON fine but lacks the 'tree depth' to discern between the specific attributes of each turn.
An example:
The following is an example of one row/record of a provided JSON formatted .txt file:
{"semantic_distance_1": 0.375, "semantic_distance_2": 0.6486486486486487, "turn_2": "{\"sentence\": [\"A\", \"transmission\", \"?\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"AT\", null, \".\"], \"semantic_set\": [\"infection.n.04\", \"vitamin_a.n.01\", \"angstrom.n.01\", \"transmittance.n.01\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"adenine.n.01\", \"a.n.07\", \"a.n.06\", \"deoxyadenosine_monophosphate.n.01\"], \"additional_info\": [], \"original_sentence\": \"A transmission?\", \"actual_sentence\": \"A transmission?\", \"dependency_grammar\": null, \"actor\": \"standard\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 58}", "turn_3": "{\"sentence\": [\"A\", \"voice\", \"transmission\", \".\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"AT\", \"NN\", null, \".\"], \"semantic_set\": [\"vitamin_a.n.01\", \"voice.n.10\", \"voice.n.09\", \"angstrom.n.01\", \"articulation.n.03\", \"deoxyadenosine_monophosphate.n.01\", \"a.n.07\", \"a.n.06\", \"infection.n.04\", \"spokesperson.n.01\", \"transmittance.n.01\", \"voice.n.02\", \"voice.n.03\", \"voice.n.01\", \"voice.n.06\", \"voice.n.07\", \"voice.n.05\", \"voice.v.02\", \"voice.v.01\", \"part.n.11\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"adenine.n.01\"], \"additional_info\": [], \"original_sentence\": \"A voice transmission.\", \"actual_sentence\": \"A voice transmission.\", \"dependency_grammar\": null, \"actor\": \"computer\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 59}", "turn_1": "{\"sentence\": [\"I\", \"have\", \"intercepted\", \"a\", \"transmission\", \"of\", \"unknown\", \"origin\", \".\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"PPSS\", \"HV\", \"VBD\", \"AT\", null, \"IN\", \"JJ\", \"NN\", \".\"], \"semantic_set\": [\"i.n.03\", \"own.v.01\", \"receive.v.01\", \"consume.v.02\", \"accept.v.02\", \"rich_person.n.01\", \"vitamin_a.n.01\", \"have.v.09\", \"have.v.07\", \"nameless.s.01\", \"have.v.01\", \"obscure.s.04\", \"have.v.02\", \"stranger.n.01\", \"angstrom.n.01\", \"induce.v.02\", \"hold.v.03\", \"wiretap.v.01\", \"give_birth.v.01\", \"a.n.07\", \"a.n.06\", \"deoxyadenosine_monophosphate.n.01\", \"infection.n.04\", \"unknown.n.03\", \"unknown.s.03\", \"get.v.03\", \"origin.n.03\", \"origin.n.02\", \"transmittance.n.01\", \"origin.n.05\", \"origin.n.04\", \"one.s.01\", \"have.v.17\", \"have.v.12\", \"have.v.10\", \"have.v.11\", \"take.v.35\", \"experience.v.03\", \"intercept.v.01\", \"unknown.n.01\", \"iodine.n.01\", \"strange.s.02\", \"suffer.v.02\", \"beginning.n.04\", \"one.n.01\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"lineage.n.01\", \"unknown.a.01\", \"adenine.n.01\"], \"additional_info\": [], \"original_sentence\": \"I have intercepted a transmission of unknown origin.\", \"actual_sentence\": \"I have intercepted a transmission of unknown origin.\", \"dependency_grammar\": null, \"actor\": \"computer\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 57}", "syntax_distance_1": null, "syntax_distance_2": null}
As you can see, there is a great deal of information that I do not need, and given my poor knowledge of R, importing it (and the rest of the file it is contained within) in this form produces a largely unstructured result in R. The command used for this was:
json = fromJSON(paste("[",paste(readLines("JSONfile.txt"),collapse=","),"]"))
Essentially it is picking up on syntax_distance_1, syntax_distance_2, semantic_distance_1,semantic_distance_2 and then lumping all of the turn data into three enormous and unstructured arrays.
What I would like to know is if I can somehow either:
Specify a tree depth that enables R to discern between each of the 'turn' variables
OR
Simply cherry pick the turn$actual_sentence information from the outset and remove all the rest in the import process.
Hope that is enough information, please let me know if there is anything else I can add to clear it up.
Since in this case you know that you need to go one level deeper, what you can do is use one of the apply functions to parse the turn_x strings. The following snippet of code illustrates the basic idea:
# Load jsonlite and read the JSON file
library(jsonlite)
json_file <- fromJSON("JSONfile.json")
# Use lapply to parse the turn_x strings, which are themselves JSON.
# Checking that the element is a character helps avoid
# issues with numerical values and nulls.
pjson_file <- lapply(json_file, function(x) {if (is.character(x)) {fromJSON(x)}})
If we look at the results, we see that the whole data structure has been parsed this time. To access the actual_sentence field, what you can do is:
> pjson_file$turn_1$actual_sentence
[1] "I have intercepted a transmission of unknown origin."
> pjson_file$turn_2$actual_sentence
[1] "A transmission?"
> pjson_file$turn_3$actual_sentence
[1] "A voice transmission."
If you want to scale this logic so that it works with a large dataset, you can encapsulate it in a function that would return the three sentences as a character vector or a dataframe if you wish.
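For instance, here is a hedged sketch of such a wrapper; extract_sentences is an illustrative name, and it assumes one JSON record per line of the file, as in the example above:
library(jsonlite)

# Parse one record (one line of the file) and return its three sentences.
extract_sentences <- function(line) {
  rec <- fromJSON(line)
  vapply(c("turn_1", "turn_2", "turn_3"),
         function(k) fromJSON(rec[[k]])$actual_sentence,
         character(1))
}

# One row per conversation, columns turn_1, turn_2, turn_3.
sentences <- t(vapply(readLines("JSONfile.txt"), extract_sentences,
                      character(3), USE.NAMES = FALSE))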