Accessing value from mongodb with Scala - json

After executing a MongoDB query, my result is of type Seq[Document] (res: Seq[Document]).
To access the BsonString I use: res(0).get("n")
Which returns:
Some(BsonString{value='value'})
How can I access the value 'value' from the BsonString as a String?
Unwrapping Some(BsonString{value='value'}) returns BsonString{value='value'}. Do I need to convert BsonString{value='value'} to a Scala object using a library (for example Jackson) and then access the value?

I suppose you are using the MongoDB Scala driver (and not ReactiveMongo).
In that case, the returned BsonString is a Java object; the driver's scaladoc points to the javadoc for it.
You can access the value via the getValue method.
Since you are getting back Option objects, I would recommend using a for comprehension to avoid runtime exceptions; something like:
val optionalResult = for {
  doc <- res.headOption
  element <- doc.get[BsonString]("n")
} yield (element.getValue)
optionalResult will be of type Option[String].
You can then check if you have a value and use it; via map, flatMap, foreach or even if (optionalResult.isDefined).


convert nested json string column into map type column in spark

overall aim
I have data landing in blob storage from an Azure service in the form of JSON files, where each line in a file is a nested JSON object. I want to process this with Spark and finally store it as a Delta table with nested struct/map type columns, which can later be queried downstream using the dot notation columnName.key.
data nesting visualized
{
  key1: value1
  nestedType1: {
    key1: value1
    keyN: valueN
  }
  nestedType2: {
    key1: value1
    nestedKey: {
      key1: value1
      keyN: valueN
    }
  }
  keyN: valueN
}
current approach and problem
I am not using the default Spark JSON reader because it parses some of the files incorrectly. Instead, I load the files as text and parse them with UDFs that use Python's json module (e.g. below), after which I use explode and pivot to get the first level of keys into columns.
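import json
from pyspark.sql.functions import udf

@udf('MAP<STRING,STRING>')
def get_key_val(x):
    # Parse one line of JSON text; return None for malformed or null input.
    try:
        return json.loads(x)
    except Exception:
        return None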
After this initial transformation I need to convert the nestedType columns to valid map types as well. Since the initial function returns map<string,string>, the values in the nestedType columns are no longer valid JSON, so I cannot use json.loads; instead I use regex-based string operations:
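import re
from pyspark.sql.functions import udf

@udf('MAP<STRING,STRING>')
def convert_map(string):
    try:
        # Match key=value pairs up to the next separating comma or closing brace.
        regex = re.compile(r"""\w+=.*?(?:(?=,(?!"))|(?=}))""")
        # Split each pair on the first '=' only, so values containing '=' survive.
        return dict((a.split('=', 1)[0].strip(), a.split('=', 1)[1]) for a in regex.findall(string))
    except Exception:
        # A MAP-typed UDF cannot return an exception object; return null instead.
        return None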
This is fine for the second level of nesting, but going further would require another UDF and subsequent complications.
question
How can I use a Spark UDF or native Spark functions to parse the nested JSON data so that it is queryable in columnName.key format?
There is no restriction on the Spark version. Hopefully I was able to explain this properly; let me know if you want me to add some sample data and code. Any help is appreciated.
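One possible way around the invalid-JSON problem described in the question, sketched in plain Python so it can be tried outside Spark: when parsing a level, re-serialize every nested value with json.dumps instead of letting it be stringified. Each value then stays valid JSON, and the same function can be applied again at any depth. The name parse_level and the sample line are made up for illustration, and wrapping the function in a UDF with a MAP<STRING,STRING> return type is assumed rather than shown.

```python
import json


def parse_level(text):
    """Parse one JSON object and return a flat dict of str -> str.

    Nested objects, arrays, numbers and booleans are re-serialized with
    json.dumps, so nested values remain valid JSON text and can be passed
    to parse_level again. Hypothetical helper, not part of any library.
    """
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):
        return None
    if not isinstance(parsed, dict):
        return None
    result = {}
    for key, value in parsed.items():
        if value is None or isinstance(value, str):
            result[key] = value
        else:
            result[key] = json.dumps(value)
    return result


# Made-up sample line with two levels of nesting.
line = '{"key1": "v1", "nestedType2": {"key1": "v2", "nestedKey": {"keyN": "v3"}}}'

level1 = parse_level(line)
level2 = parse_level(level1["nestedType2"])
level3 = parse_level(level2["nestedKey"])
```

Because every level uses the same function, a third or fourth level of nesting needs no new regex, only another call on the relevant column.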

Unable to return a json inside Future[JsValue] from a WebSocket in Play 2.4

I have implemented Play framework's WebSocket support to communicate with the server over a WebSocket instead of HTTP. I have created a function using WebSocket.using[JsValue]. My JSON response is stored in a Future[JsValue] variable, and I am trying to return the JSON value from inside that future, but I have been unable to do so. When I tried declaring the function as WebSocket.using[Future[JsValue]] instead, I was unable to create a JSON FrameFormatter for it.
def socketTest = WebSocket.using[JsValue] { request =>
  val in = Iteratee.ignore[JsValue]
  val out = Enumerator[JsValue](
    Json.toJson(futureJsonVariable)
  ).andThen(Enumerator.eof)
  (in, out)
}
futureJsonVariable is a variable of type Future[JsValue]. The above code fails at runtime with the error: No Json serializer found for type scala.concurrent.Future[play.api.libs.json.JsValue]. Try to implement an implicit Writes or Format for this type.
How can I return JSON from a WebSocket method in Scala? How can it be achieved using an Actor class instance? Pointers to good online tutorials for WebSockets in the Play framework would also be welcome. Any help is appreciated.
Use tryAccept to return either the result of the future when it is redeemed, or an error:
def socketTest = WebSocket.tryAccept[JsValue] { request =>
  futureJsonVariable.map { json =>
    val in = Iteratee.ignore[JsValue]
    val out = Enumerator(json).andThen(Enumerator.eof)
    Right((in, out))
  } recover {
    case err => Left(InternalServerError(err.getMessage))
  }
}
This is similar to using but returns a Future[Either[Result, (Iteratee[A, _], Enumerator[A])]]. The Either[Result, ...] allows you to handle the case where something unexpected occurs calculating the future value A by providing a play.api.mvc.Result in the Left branch. The corollary is that you need to also wrap the "happy path" (where nothing goes wrong) in Right, in this case the iteratee/enumerator tuple you'd ordinarily return from using.
You can do something similar with the tryAcceptWithActor function.

Convert JOOQ results to a Map

I am developing a Web API using Scala with Scalatra and JOOQ. I would like to deal with Maps instead of Records, Case Classes etc
Using JacksonJsonSupport to automatically serialize my data to JSON :
get("/test") {
  val r = DBManager.query select(MODULE.ID, MODULE.NAME) from MODULE fetchArrays
  Map("result" -> r)
}
Hitting 0.0.0.0:8080/test produces the following output:
{"result":[[1,"VelanRT"],[2,"GeobodyMorphologicalConvolution"],[3,"Sismofacies"]]}
But when using fetchMaps instead of fetchArrays, the output is:
{"result":[{},{},{}]}
What I expected is a Map[String, AnyVal], with the column name as the key and the DB tuple value as the value.
Is there any additional setup I need to do? Is there a chance that the JSON serialization from JacksonJsonSupport is messing with things?

How to find node exists in JSON

I have the following JSON:
{
  "subscription": {
    "callbackReference": "xyz",
    "criteria": "Vote",
    "destinationAddress": "3456",
    "notificationFormat": "JSON"
  }
}
I want to check whether the "notificationFormat" element exists using a JSONPath expression. I can get the value of that element using the following JSONPath expression:
$.subscription.notificationFormat
Can I use a similar kind of expression that returns a boolean indicating whether the element exists?
If I understood your question correctly, here is an answer.
This checks whether notificationFormat exists in your JSON:
$.subscription[?(@.notificationFormat)]
This gets every destinationAddress if notificationFormat exists:
$.subscription[?(@.notificationFormat)].destinationAddress
ReadContext ctx = JsonPath.parse("{}", com.jayway.jsonpath.Configuration.defaultConfiguration().addOptions(Option.SUPPRESS_EXCEPTIONS));
assertThat(ctx.read("$.components"), nullValue());
If you're using Jayway's Java implementation of JSONPath (the JsonPath library) and parsing the JSON once, ahead of time, to make multiple reads more efficient, then there is an arguably clearer way to check whether an optional JSON property exists than using a JSONPath filter expression. Use the ReadContext object representation of the parsed JSON to return the parent JSON object as a Java HashMap, then check whether a map entry exists with the property's name. Using the JSON from the question, the (Java) code to check whether the optional 'notificationFormat' property exists would be:
ReadContext parsedJson = JsonPath.parse(jsonString);
HashMap subscription = parsedJson.read("$.subscription");
if (subscription.containsKey("notificationFormat")) {
    ...
}
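For comparison, the same parse-once-then-check-membership idea can be sketched in Python using only the standard json module rather than a JSONPath library. The helper name has_path is made up for this illustration. It tests for the key itself, so a property whose value is null still counts as present, which a truthiness test on the value would miss.

```python
import json


def has_path(document, *keys):
    """Return True if every key in `keys` exists along the nested path."""
    current = document
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


json_string = """
{"subscription": {
    "callbackReference": "xyz",
    "criteria": "Vote",
    "destinationAddress": "3456",
    "notificationFormat": "JSON"
}}
"""

parsed = json.loads(json_string)  # parse once, query many times

format_present = has_path(parsed, "subscription", "notificationFormat")
missing_present = has_path(parsed, "subscription", "noSuchKey")
```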

Standardized way to serialize JSON to query string?

I'm trying to build a RESTful API and I'm struggling with how to serialize JSON data to an HTTP query string.
There are a number of mandatory and optional arguments that need to be passed in the request, e.g. (represented as a JSON object below):
{
  "-columns" : [
    "name",
    "column"
  ],
  "-where" : {
    "-or" : {
      "customer_id" : 1,
      "services" : "schedule"
    }
  },
  "-limit" : 5,
  "return" : "table"
}
I need to support a variety of clients, so I'm looking for a standardized way to convert this JSON object to a query string. Is there one, and what does it look like?
Another alternative is to allow users to just pass along the JSON object in a message body, but I read that I should avoid it (HTTP GET with request body).
Any thoughts?
Edit for clarification:
Here is how some different languages encode the JSON object given above:
jQuery using $.param: -columns[]=name&-columns[]=column&-where[-or][customer_id]=1&-where[-or][services]=schedule&-limit=5&return=column
PHP using http_build_query: -columns[0]=name&-columns[1]=column&-where[-or][customer_id]=1&-where[-or][services]=schedule&-limit=5&return=column
Perl using URI::query_form: -columns=name&-columns=column&-where=HASH(0x59d6eb8)&-limit=5&return=column
Perl using complex_to_query: -columns:0=name&-columns:1=column&-limit=5&-where.-or.customer_id=1&-where.-or.services=schedule&return=column
jQuery and PHP are very similar. Perl using complex_to_query is also pretty similar to them. But none look exactly the same.
URL-encode (https://en.wikipedia.org/wiki/Percent-encoding) your JSON text and put it into a single query string parameter. For example, if you want to pass {"val": 1}:
mysite.com/path?json=%7B%22val%22%3A%201%7D
Note that if your JSON gets too long then you will run into a URL length limitation problem. In which case I would use POST with a body (yes, I know, sending a POST when you want to fetch something is not "pure" and does not fit well into the REST paradigm, but neither is your domain specific JSON-based query language).
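A minimal sketch of that round trip in Python, using only the standard library. The https:// scheme and the parameter name json are assumptions made for the example; the payload is the {"val": 1} object from the answer.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

payload = {"val": 1}

# Serialize, then percent-encode every reserved character (safe="" also
# encodes "/"), so the JSON text fits in a single query parameter.
encoded = quote(json.dumps(payload), safe="")
url = "https://mysite.com/path?json=" + encoded

# Receiving side: parse_qs percent-decodes the value before json.loads sees it.
query = parse_qs(urlsplit(url).query)
decoded = json.loads(query["json"][0])

print(url)
```

Encoding the whole JSON text as one opaque value sidesteps the question of how to represent nesting, at the cost of a query string that is harder to read and grows quickly with the size of the object.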
There is no single standard for JSON to query string serialization, so I made a comparison of some JSON serializers and the results are as follows:
JSON: {"_id":"5973782bdb9a930533b05cb2","isActive":true,"balance":"$1,446.35","age":32,"name":"Logan Keller","email":"logankeller@artiq.com","phone":"+1 (952) 533-2258","friends":[{"id":0,"name":"Colon Salazar"},{"id":1,"name":"French Mcneil"},{"id":2,"name":"Carol Martin"}],"favoriteFruit":"banana"}
Rison: (_id:'5973782bdb9a930533b05cb2',age:32,balance:'$1,446.35',email:'logankeller@artiq.com',favoriteFruit:banana,friends:!((id:0,name:'Colon Salazar'),(id:1,name:'French Mcneil'),(id:2,name:'Carol Martin')),isActive:!t,name:'Logan Keller',phone:'+1 (952) 533-2258')
O-Rison: _id:'5973782bdb9a930533b05cb2',age:32,balance:'$1,446.35',email:'logankeller@artiq.com',favoriteFruit:banana,friends:!((id:0,name:'Colon Salazar'),(id:1,name:'French Mcneil'),(id:2,name:'Carol Martin')),isActive:!t,name:'Logan Keller',phone:'+1 (952) 533-2258'
JSURL: ~(_id~'5973782bdb9a930533b05cb2~isActive~true~balance~'!1*2c446.35~age~32~name~'Logan*20Keller~email~'logankeller*40artiq.com~phone~'*2b1*20*28952*29*20533-2258~friends~(~(id~0~name~'Colon*20Salazar)~(id~1~name~'French*20Mcneil)~(id~2~name~'Carol*20Martin))~favoriteFruit~'banana)
QS: _id=5973782bdb9a930533b05cb2&isActive=true&balance=$1,446.35&age=32&name=Logan Keller&email=logankeller@artiq.com&phone=+1 (952) 533-2258&friends[0][id]=0&friends[0][name]=Colon Salazar&friends[1][id]=1&friends[1][name]=French Mcneil&friends[2][id]=2&friends[2][name]=Carol Martin&favoriteFruit=banana
URLON: $_id=5973782bdb9a930533b05cb2&isActive:true&balance=$1,446.35&age:32&name=Logan%20Keller&email=logankeller@artiq.com&phone=+1%20(952)%20533-2258&friends@$id:0&name=Colon%20Salazar;&$id:1&name=French%20Mcneil;&$id:2&name=Carol%20Martin;;&favoriteFruit=banana
QS-JSON: isActive=true&balance=%241%2C446.35&age=32&name=Logan+Keller&email=logankeller%40artiq.com&phone=%2B1+(952)+533-2258&friends(0).id=0&friends(0).name=Colon+Salazar&friends(1).id=1&friends(1).name=French+Mcneil&friends(2).id=2&friends(2).name=Carol+Martin&favoriteFruit=banana
The shortest among them is URL Object Notation (URLON).
How about sending them as follows:
http://example.com/api/wtf?
[-columns][]=name&
[-columns][]=column&
[-where][-or][customer_id]=1&
[-where][-or][services]=schedule&
[-limit]=5&
[return]=table&
I tried this with a REST client, and on the server side (Ruby with Sinatra) I checked the params; it gives me exactly what you want. :-)
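A rough sketch of how a client could generate bracket-style parameters like the ones compared in the question, written in Python. It follows the jQuery-like form (top-level keys without brackets, key[] for array items), not any standard, and the function names to_bracket_pairs and to_query_string are made up for this example.

```python
from urllib.parse import quote


def to_bracket_pairs(value, prefix=""):
    """Flatten nested dicts/lists into (key, value) pairs using brackets.

    Dict keys become prefix[key]; list items become prefix[] for scalars
    and prefix[index] for nested containers, similar to jQuery's $.param.
    """
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            child = f"{prefix}[{key}]" if prefix else str(key)
            pairs.extend(to_bracket_pairs(item, child))
        return pairs
    if isinstance(value, list):
        pairs = []
        for index, item in enumerate(value):
            nested = isinstance(item, (dict, list))
            child = f"{prefix}[{index}]" if nested else f"{prefix}[]"
            pairs.extend(to_bracket_pairs(item, child))
        return pairs
    return [(prefix, str(value))]


def to_query_string(obj, encode=True):
    """Join the flattened pairs with '&', percent-encoding when requested."""
    pairs = to_bracket_pairs(obj)
    if encode:
        pairs = [(quote(k, safe=""), quote(v, safe="")) for k, v in pairs]
    return "&".join(f"{k}={v}" for k, v in pairs)


# The JSON object from the question, as a Python dict.
query = {
    "-columns": ["name", "column"],
    "-where": {"-or": {"customer_id": 1, "services": "schedule"}},
    "-limit": 5,
    "return": "table",
}

readable = to_query_string(query, encode=False)  # for display only
wire = to_query_string(query)                    # percent-encoded form
```

The unencoded form is only for reading; brackets are reserved characters, so the encoded form is what would normally go on the wire. Whether the server rebuilds the nested structure from these keys depends on its framework, as the Sinatra test in the answer illustrates.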
Another option might be node-querystring. It also uses a similar scheme to the ones you've so far listed.
It's available in both npm and bower, which is why I have been using it.
Works well for nested objects.
Passing complex objects as query parameters of a url.
In the example below, obj is the JSON object to pass into query parameters.
Injecting JSON object as query parameters:
value = JSON.stringify(obj);
Use URLSearchParams to convert the string to an object representing search params, then toString to get back a string for appending to the URL:
queryParams = new URLSearchParams(value).toString();
Pass the query parameters using template literals:
url = `https://some-url.com?key=${queryParams}`;
Now url will contain the JSON object as query parameters under key (user-defined name)
Extracting JSON from the URL:
This is assuming you have access to the url (either as string or URL object)
url_obj = new URL(url); (only if url is NOT a URL object, otherwise ignore this step)
Extract all query parameters in the url:
queryParams = new URLSearchParams(url_obj.search);
Use the key to extract the specific value:
obj = JSON.parse(queryParams.get('key').slice(0, -1));
slice() is used to remove the trailing = from the query parameter value, which is not needed.
Here obj will be the same object passed in the query params.
I recommend trying these steps in the web console to understand them better.
You can test with JSON examples here: https://json.org/example.html