converting a string to json format in scala - json

I have to convert a string to json format in scala. The string is like this:
"classification" : "Map(Metals -> List(Cu, Co, Ni), Nonmetals -> List(N,O,C), Noblegases -> List(Ar, Kr))"
The desired json format is like this:
"classification" : {"Metals": [Cu, Co, Ni],
"Nonmetals":[N,O,C],
"Noblegases":[Ar, Kr]
}
Any quick suggestions would be appreciated.

Your question is not very specific, so my answer is a bit vague as well.
First you will have to parse the input string and extract the values. I would use a combination of regular expressions and simple String operations, such as searching for the first occurrence of a certain character (e.g. a colon) and splitting the string there.
In the next step you create the JSON object. There are several libraries you can use. I suggest JSON-Java (org.json) or, if you prefer a Scala library, play-json.
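The two steps above can be sketched as follows. This is a rough sketch in Python rather than Scala, assuming the input always has the shape Map(Key -> List(...), ...) with no nested parentheses inside the lists; map_string_to_dict is a hypothetical helper name, and the same regular expression should carry over to Scala's Regex class.

```python
import json
import re

def map_string_to_dict(text):
    """Parse 'Map(K -> List(a, b), ...)' into {K: [a, b], ...} (hypothetical helper)."""
    outer = re.fullmatch(r"\s*Map\((.*)\)\s*", text)
    if outer is None:
        raise ValueError("expected a string of the form Map(...)")
    # Each entry looks like: Key -> List(item, item, ...)
    pairs = re.findall(r"(\w+)\s*->\s*List\(([^)]*)\)", outer.group(1))
    return {
        key: [item.strip() for item in items.split(",") if item.strip()]
        for key, items in pairs
    }

raw = "Map(Metals -> List(Cu, Co, Ni), Nonmetals -> List(N,O,C), Noblegases -> List(Ar, Kr))"
result = {"classification": map_string_to_dict(raw)}
print(json.dumps(result))
```

Note that the serializer quotes the list items ("Cu", "Co", ...), because bare words such as Cu are not valid JSON values.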

Related

Reading the Json string which is put together in one field

I have a JSON string in a text file. I have to parse the string below and write the result to an external file.
Please let me know how this can be handled with Informatica PowerCenter, Unix, or Python.
{"CONTACTID":"3b2a25b2","ANI":"+16146748702","DNIS":"+18006081123","START_TIME":"01/22/2023 03:31:42","MODULE":[{"Name":"MainIVR","Time":"01/22/2023 03:31:42",Dialog":[{"name":"offer_Spanish","dialogeresult":"(|raw:7|R|7|1.0|nm=0|ni=0|2023/22/21 03:02:01)"}],"backend":[{"Time":"01/22/2023)"}],"END_STATE":"XC"}
In the above sample string, the special characters should be removed and the values assigned to the corresponding columns, as in the two output formats below:
Output:
CONTACTID, ANI, DNIS, START_TIME, MODULE, Time,Dialog,dialogeresult,END_STATE
3b2a25b2,+16146748702 +18006081123 01/22/2023 03:31:42,Name:MainIVR,
or
Output:
CONTACTID : 3b2a25b2
ANI:16146748702
DNI :+18006081123
I tried reading this through Informatica PowerCenter using Expression transformations, but nothing worked. I tried Python too.
For a start, your JSON is invalid. The opening double quotes for Dialog are missing and it's not properly closed - MODULE array is not closed and root is not closed. Here's the fixed JSON:
{"CONTACTID":"3b2a25b2","ANI":"+16146748702","DNIS":"+18006081123","START_TIME":"01/22/2023 03:31:42","MODULE":[{"Name":"MainIVR","Time":"01/22/2023 03:31:42","Dialog":[{"name":"offer_Spanish","dialogeresult":"(|raw:7|R|7|1.0|nm=0|ni=0|2023/22/21 03:02:01)"}],"backend":[{"Time":"01/22/2023)"}],"END_STATE":"XC"}]}
Use a JSON validation tool; it helps a lot.
Next, here's some starter code you may use to achieve the required result:
import json
# some JSON:
x = '{"CONTACTID":"3b2a25b2","ANI":"+16146748702","DNIS":"+18006081123","START_TIME":"01/22/2023 03:31:42","MODULE":[{"Name":"MainIVR","Time":"01/22/2023 03:31:42","Dialog":[{"name":"offer_Spanish","dialogeresult":"(|raw:7|R|7|1.0|nm=0|ni=0|2023/22/21 03:02:01)"}],"backend":[{"Time":"01/22/2023)"}],"END_STATE":"XC"}]}'
# parse x:
y = json.loads(x)
# the result is a Python dictionary:
print(y.keys())
You may test it on Replit
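Building on the starter code, one possible way to get the "KEY : value" output from the question is to walk the parsed dictionary recursively. This is only a sketch: the flatten helper and the dotted column names (e.g. MODULE.Dialog.name) are my own assumptions, not something the question specifies.

```python
import json

raw = '{"CONTACTID":"3b2a25b2","ANI":"+16146748702","DNIS":"+18006081123","START_TIME":"01/22/2023 03:31:42","MODULE":[{"Name":"MainIVR","Time":"01/22/2023 03:31:42","Dialog":[{"name":"offer_Spanish","dialogeresult":"(|raw:7|R|7|1.0|nm=0|ni=0|2023/22/21 03:02:01)"}],"backend":[{"Time":"01/22/2023)"}],"END_STATE":"XC"}]}'

def flatten(value, prefix=""):
    """Yield (column, value) pairs for every scalar in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}{key}.")
    elif isinstance(value, list):
        # List items share the parent's column prefix (an assumption of this sketch).
        for child in value:
            yield from flatten(child, prefix)
    else:
        yield prefix.rstrip("."), value

rows = list(flatten(json.loads(raw)))
for column, val in rows:
    print(f"{column} : {val}")
```

For the comma-separated format, the same rows could be passed to csv.writer, with the column names as the header line.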
Finally, regarding Informatica PowerCenter: it is a terrible choice for complex string processing. You would need a Hierarchy Parser transformation. Long story short: it's very tedious, but possible. I would highly recommend picking a different approach if this is not a regular data loading process you will need to build.

Custom Formatting of JSON output using Spark

I have a dataset with a bunch of BigDecimal values. I would like to output these records to a JSON file, but when I do, the BigDecimal values are often written with trailing zeros (123.4000000000000), and the spec we must conform to does not allow this (for reasons I don't understand).
I am trying to see if there is a way to override how the data is printed to JSON.
Currently, my best idea is to convert each record to a string using Jackson and then write the data using df.write().text(..) rather than JSON.
I suggest converting the Decimal type to String before writing to JSON.
The code below is in Scala, but you can easily adapt it to Java.
import org.apache.spark.sql.types.StringType
// COLUMN_NAME is your DataFrame column name.
val new_df = df.withColumn("COLUMN_NAME_TMP", df("COLUMN_NAME").cast(StringType)).drop("COLUMN_NAME").withColumnRenamed("COLUMN_NAME_TMP", "COLUMN_NAME")
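If the string form still carries the trailing zeros, they would have to be stripped during the conversion. A minimal sketch of that step in plain Python (outside Spark), using the standard decimal module; trim_decimal is a hypothetical helper name, and whether the target spec accepts the value as a quoted string is something I can't tell from the question.

```python
from decimal import Decimal

def trim_decimal(value: Decimal) -> str:
    """Render a Decimal in plain notation without trailing zeros."""
    # normalize() drops trailing zeros but may switch to exponent form
    # (e.g. 1E+2), so format with "f" to force plain notation.
    return format(value.normalize(), "f")

values = [Decimal("123.4000000000000"), Decimal("100.00"), Decimal("0.500")]
trimmed = [trim_decimal(v) for v in values]
print(trimmed)
```

In Spark, a function like this could be wrapped in a UDF and applied to the column in place of the plain cast, at some cost in performance.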

How do I store a JSON String inside of a JSON String?

I'm using jsoncpp to store information in JSON format. I now have the need to store a json string inside of another json string... In other words, I need to store a sub_item inside of an item.
Using jsoncpp to generate the JSON string, I get this...
{"id":"1","name":"Advil","sub_item":"{\"id\":\"2\",\"name\":\"Liquid Gel Advil\"}\n"}
Which works perfectly fine during runtime. However, when my program saves this information into a MySQL database (on exit) and then loads it back up when I restart the program, it loads the same JSON string from the MySQL database, but it now looks like this...
{"id":"1","name":"Advil","sub_item":"{"id":"2","name":"Liquid Gel Advil"}"}
Which is an invalid JSON string. I'm not sure why this is happening. Can someone please tell me what the heck is going on?
My MySQL query string reads like so:
UPDATE json_string_test SET jsonstring='{"id":"1","name":"Advil","sub_item":"{\"id\":\"2\",\"name\":\"Liquid Gel Advil\"}\n"}';
Upon further research I found out that FastWriter was deprecated and that StreamWriterBuilder was the recommended writer. However, it still produced the same problem as FastWriter.
I managed to rig a fix by doing the following...
1) Before saving to the database, I replaced ALL substrings matching \" to \\" in ONLY the child JSON string (with id 2).
2) Upon loading the JSON string, I replaced ALL substrings matching \\" to \" in ONLY the child JSON string (with id 2).
I don't understand why the heck I have to do this, so if anyone has a better solution or an explanation... I'd love to hear it.
I think you need to transpose your last 2 characters (assuming your posted string is verbatim). You have
Advil\"}\n"}
but I think you need
Advil\"}\n}"
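For what it's worth, the round trip in the question can be reproduced without a database. Assuming the server runs in MySQL's default SQL mode, the backslash is an escape character inside a string literal, so each \" in the UPDATE statement is stored as a bare ". The Python sketch below only simulates that with a string replace (it does not talk to MySQL), and compares it with storing the child as a nested object; is_valid_json is a hypothetical helper.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

child = {"id": "2", "name": "Liquid Gel Advil"}

# Option 1: child stored as a JSON string inside the parent (as in the question).
as_string = json.dumps({"id": "1", "name": "Advil", "sub_item": json.dumps(child)})

# Option 2: child stored as a nested object, so no inner quotes need escaping.
as_object = json.dumps({"id": "1", "name": "Advil", "sub_item": child})

# Simulate a SQL string literal swallowing the backslashes: \" becomes ".
damaged_string = as_string.replace('\\"', '"')
damaged_object = as_object.replace('\\"', '"')

print(is_valid_json(damaged_string))  # the outer document is broken
print(is_valid_json(damaged_object))  # nothing to damage, still valid
```

This would also explain why doubling the backslashes before saving works as a workaround. Passing the JSON through a prepared statement with a bound parameter, instead of pasting it into the query text, should avoid the manual escaping altogether.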

Library to convert JSON string to Erlang record

I have a large JSON string that I want to convert into an Erlang record.
I found the jiffy library, but it doesn't completely convert to a record.
For example:
jiffy:decode(<<"{\"foo\":\"bar\"}">>).
gives
{[{<<"foo">>,<<"bar">>}]}
but I want the following output:
{ok,{obj,[{"foo",<<"bar">>}]},[]}
Is there any library that can be used for the desired output?
Or is there any library that can be used in combination with jiffy to further modify its output?
Keep in mind that the JSON string is large, and I want the output in minimum time.
Take a look at ejson, from the documentation:
JSON library for Erlang on top of jsx. It gives a declarative interface for jsx by which we need to specify conversion rules and ejson will convert tuples according to the rules.
I made this library to make easy not just the encoding but rather the decoding of JSONs to Erlang records...
In order for ejson to take effect the source files need to be compiled with parse_transform ejson_trans. All record which has -json attribute can be converted to JSON later.

Parsing large JSON file with Scala and JSON4S

I'm working with Scala in IntelliJ IDEA 15 and trying to parse a large twitter record json file and count the total number of hashtags. I am very new to Scala and the idea of functional programming. Each line in the json file is a json object (representing a tweet). Each line in the file starts like so:
{"in_reply_to_status_id":null,"text":"To my followers sorry..
{"in_reply_to_status_id":null,"text":"#victory","in_reply_to_screen_name"..
{"in_reply_to_status_id":null,"text":"I'm so full I can't move"..
I am most interested in a property called "entities" which contains a property called "hashtags" with a list of hashtags. Here is an example:
"entities":{"hashtags":[{"text":"thewayiseeit","indices":[0,13]}],"user_mentions":[],"urls":[]},
I've browsed the various scala frameworks for parsing json and have decided to use json4s. I have the following code in my Scala script.
import org.json4s.native.JsonMethods._
var json: String = ""
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line
val data = parse(json)
My logic here is that I am trying to read each line from twitter38.json into a string and then parse the entire string with parse(). The parse function is throwing an error claiming:
"Type mismatch, expected: Nothing, found:String."
I have seen examples that use parse() on strings that hold json objects such as
val jsontest =
"""{
|"name" : "bob",
|"age" : "50",
|"gender" : "male"
|}
""".stripMargin
val data = parse(jsontest)
but I have received the same error. I am coming from an object oriented programming background, is there something fundamentally wrong with the way I am approaching this problem?
You have most likely imported the dependencies into your IntelliJ project, or the modules into your file, incorrectly. Make sure you have the following import:
import org.json4s.native.JsonMethods._
Even if you import this module correctly, parse(json: String) will not work for you, because your JSON is malformed. Your JSON string will look like this:
"""{"in_reply_...":"someValue1"}{"in_reply_...":"someValues2"}"""
but should look as follows to be a valid json that can be parsed:
"""{{"in_reply_...":"someValue1"},{"in_reply_...":"someValues2"}}"""
i.e. you need starting and ending brackets for the JSON, and a comma between each line of tweets. Please read the json4s documentation for more information.
Although this question is almost 6 years old, I think it deserves another try.
There are a few common misunderstandings about the JSON format, especially about how documents are stored and how they are read back.
JSON documents are stored either as a single object containing all the other fields, or as an array of multiple objects, possibly in the same format. This second form is important because arrays in almost every programming language are written with square brackets and comma-separated values (note that here I used a person object as my single value):
[
{"name":"John","surname":"Doe"},
{"name":"Jane","surname":"Doe"}
]
Also note that everything except brackets, numbers, booleans and null is enclosed in quotes when written to a file.
However, there is another usage that is not official but is preferred for transferring datasets easily (often called JSON Lines or NDJSON), where every object, or document in NoSQL/MongoDB terms, is stored on its own line, like this:
{"name":"John","surname":"Doe"}
{"name":"Jane","surname":"Doe"}
So, for the question: the OP has a document written in this second form, but tries an algorithm written to read the first form. The following code makes a few simple changes to handle this; whoever reads the file must know which form it is in:
var json: String = "["
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line + ","
json=json.splitAt(json.length()-1)._1
json+= "]"
val data = parse(json)
PS: although @sbrannon has the correct idea, the example he/she gave mistakenly uses curly braces instead of square brackets to surround the data.
EDIT: I have added json=json.splitAt(json.length()-1)._1 because the code above otherwise ends with a trailing comma, which causes a parse error per the JSON format definition.
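An alternative that avoids building one huge string is to parse each line on its own and count as you go. Here is a rough sketch of that idea in Python rather than Scala, assuming every non-empty line is a complete tweet object; count_hashtags and the sample lines are made up for illustration. With json4s the same approach would call parse on each line instead of on the whole file.

```python
import json
from collections import Counter

def count_hashtags(lines):
    """Count hashtag texts across an iterable of one-JSON-object-per-line strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Tweets without entities or hashtags simply contribute nothing.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"]] += 1
    return counts

# Made-up sample lines in the shape shown in the question.
sample = [
    '{"text": "#victory", "entities": {"hashtags": [{"text": "victory", "indices": [0, 8]}]}}',
    '{"text": "no tags here", "entities": {"hashtags": []}}',
    '{"text": "#victory #thewayiseeit", "entities": {"hashtags": [{"text": "victory", "indices": [0, 8]}, {"text": "thewayiseeit", "indices": [9, 22]}]}}',
]
counts = count_hashtags(sample)
total = sum(counts.values())
print(total, counts.most_common(1))
```

For the real file, the function could be given the open file object directly, e.g. with open("twitter38.json") as f: count_hashtags(f), so only one line is held in memory at a time.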