Convert JOOQ results to a Map - json

I am developing a Web API using Scala with Scalatra and JOOQ. I would like to deal with Maps instead of Records, Case Classes etc
Using JacksonJsonSupport to automatically serialize my data to JSON :
get("/test") {
val r = DBManager.query select(MODULE.ID, MODULO.NAME) from MODULE fetchArrays
Map("result" -> r)
}
hitting 0.0.0.0:8080/test produces the following output:
{"result":[[1,"VelanRT"],[2,"GeobodyMorphologicalConvolution"], [3,"Sismofacies"]}
but, if using fetchMaps instead of fetchArrays :
{"result":[{},{},{}]}
what I expected is a Map[String, AnyVal], with the column name as the key and the value being the DB tuple value
Is there any additional setup I need to do? I there a chance that the json serialization from JacksonSupport is messing with things?

Related

convert nested json string column into map type column in spark

overall aim
I have data landing into blob storage from an azure service in form of json files where each line in a file is a nested json object. I want to process this with spark and finally store as a delta table with nested struct/map type columns which can later be queried downstream using the dot notation columnName.key
data nesting visualized
{
key1: value1
nestedType1: {
key1: value1
keyN: valueN
}
nestedType2: {
key1: value1
nestedKey: {
key1: value1
keyN: valueN
}
}
keyN: valueN
}
current approach and problem
I am not using the default spark json reader as it is resulting in some incorrect parsing of the files instead I am loading the files as text files and then parsing using udfs by using python's json module ( eg below ) post which I use explode and pivot to get the first level of keys into columns
#udf('MAP<STRING,STRING>' )
def get_key_val(x):
try:
return json.loads(x)
except:
return None
Post this initial transformation I now need to convert the nestedType columns to valid map types as well. Now since the initial function is returning map<string,string> the values in nestedType columns are not valid jsons so I cannot use json.loads, instead I have regex based string operations
#udf('MAP<STRING,STRING>' )
def convert_map(string):
try:
regex = re.compile(r"""\w+=.*?(?:(?=,(?!"))|(?=}))""")
obj = dict([(a.split('=')[0].strip(),(a.split('=')[1])) for a in regex.findall(s)])
return obj
except Exception as e:
return e
this is fine for second level of nesting but if I want to go further that would require another udf and subsequent complications.
question
How can I use a spark udf or native spark functions to parse the nested json data such that it is queryable in columnName.key format.
also there is no restriction of spark version, hopefully I was able to explain this properly. do let me know if you want me to put some sample data and the code for ease. Any help is appreciated.

GCP Proto Datastore encode JsonProperty in base64

I store a blob of Json in the datastore using JsonProperty.
I don't know the structure of the json data.
I am using endpoints proto datastore in order to retrieve my data.
The probleme is the json property is encoded in base64 and I want a plain json object.
For the example, the json data will be:
{
first: 1,
second: 2
}
My code looks something like:
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Model(EndpointsModel):
data = ndb.JsonProperty()
#endpoints.api(name='myapi', version='v1', description='My Sample API')
class DataEndpoint(remote.Service):
#Model.method(path='mymodel2', http_method='POST',
name='mymodel.insert')
def MyModelInsert(self, my_model):
my_model.data = {"first": 1, "second": 2}
my_model.put()
return my_model
#Model.method(path='mymodel/{entityKey}',
http_method='GET',
name='mymodel.get')
def getMyModel(self, model):
print(model.data)
return model
API = endpoints.api_server([DataEndpoint])
When I call the api for getting a model, I get:
POST /_ah/api/myapi/v1/mymodel2
{
"data": "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="
}
where eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ== is the base64 encoded of {"second": 2, "first": 1}
And the print statement give me: {u'second': 2, u'first': 1}
So, in the method, I can explore the json blob data as a python dict.
But, in the api call, the data is encoded in base64.
I expeted the api call to give me:
{
'data': {
'second': 2,
'first': 1
}
}
How can I get this result?
After the discussion in the comments of your question, let me share with you a sample code that you can use in order to store a JSON object in Datastore (it will be stored as a string), and later retrieve it in such a way that:
It will show as plain JSON after the API call.
You will be able to parse it again to a Python dict using eval.
I hope I understood correctly your issue, and this helps you with it.
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Sample(EndpointsModel):
column1 = ndb.StringProperty()
column2 = ndb.IntegerProperty()
column3 = ndb.StringProperty()
#endpoints.api(name='myapi', version='v1', description='My Sample API')
class MyApi(remote.Service):
# URL: .../_ah/api/myapi/v1/mymodel - POSTS A NEW ENTITY
#Sample.method(path='mymodel', http_method='GET', name='Sample.insert')
def MyModelInsert(self, my_model):
dict={'first':1, 'second':2}
dict_str=str(dict)
my_model.column1="Year"
my_model.column2=2018
my_model.column3=dict_str
my_model.put()
return my_model
# URL: .../_ah/api/myapi/v1/mymodel/{ID} - RETRIEVES AN ENTITY BY ITS ID
#Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
if not my_model.from_datastore:
raise endpoints.NotFoundException('MyModel not found.')
dict=eval(my_model.column3)
print("This is the Python dict recovered from a string: {}".format(dict))
return my_model
application = endpoints.api_server([MyApi], restricted=False)
I have tested this code using the development server, but it should work the same in production using App Engine with Endpoints and Datastore.
After querying the first endpoint, it will create a new Entity which you will be able to find in Datastore, and which contains a property column3 with your JSON data in string format:
Then, if you use the ID of that entity to retrieve it, in your browser it will show the string without any strange encoding, just plain JSON:
And in the console, you will be able to see that this string can be converted to a Python dict (or also a JSON, using the json module if you prefer):
I hope I have not missed any point of what you want to achieve, but I think all the most important points are covered with this code: a property being a JSON object, store it in Datastore, retrieve it in a readable format, and being able to use it again as JSON/dict.
Update:
I think you should have a look at the list of available Property Types yourself, in order to find which one fits your requirements better. However, as an additional note, I have done a quick test working with a StructuredProperty (a property inside another property), by adding these modifications to the code:
#Define the nested model (your JSON object)
class Structured(EndpointsModel):
first = ndb.IntegerProperty()
second = ndb.IntegerProperty()
#Here I added a new property for simplicity; remember, StackOverflow does not write code for you :)
class Sample(EndpointsModel):
column1 = ndb.StringProperty()
column2 = ndb.IntegerProperty()
column3 = ndb.StringProperty()
column4 = ndb.StructuredProperty(Structured)
#Modify this endpoint definition to add a new property
#Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
if not my_model.from_datastore:
raise endpoints.NotFoundException('MyModel not found.')
#Add the new nested property here
dict=eval(my_model.column3)
my_model.column4=dict
print(json.dumps(my_model.column3))
print("This is the Python dict recovered from a string: {}".format(dict))
return my_model
With these changes, the response of the call to the endpoint looks like:
Now column4 is a JSON object itself (although it is not printed in-line, I do not think that should be a problem.
I hope this helps too. If this is not the exact behavior you want, maybe should play around with the Property Types available, but I do not think there is one type to which you can print a Python dict (or JSON object) without previously converting it to a String.

Accessing value from mongodb with Scala

After executing a MongoDB query my result is of type : res = Seq[Document]
To access the BsonString I use : res (0).get("n"))
Which returns :
Some(BsonString{value='value'})
How can I access the value value from the BsonString as a String ?
Accessing the value of Some(BsonString{value='value'}) returns BsonString{value='value'} do I need to convert BsonString{value='value'} to a Scala object using a library (for example Jackson) and then access the value ?
I suppose you are using the mongo scala driver (and not ReactiveMongo).
In that case, the returned BsonString is a java object; here is the scaladoc that points to the javadoc.
And you can access the value via the getValue method.
As you are getting back Option objects, I would recommend to use proper for comprehension to avoid runtime exceptions; something like:
val optionalResult = for {
doc <- res.headOption
element <- doc.get[BsonString]("n")
} yield (element.getValue)
optionalResult will be of type Option[String].
You can then check if you have a value and use it; via map, flatMap, foreach or even if (optionalResult.isDefined).

Parsing the JSON representation of database rows in Scala.js

I am trying out scala.js, and would like to use simple extracted row data in json format from my Postgres database.
Here is an contrived example of the type of json I would like to parse into some strongly typed scala collection, features to note are that there are n rows, various column types including an array just to cover likely real life scenarios, don't worry about the SQL which creates an inline table to extract the JSON from, I've included it for completeness, its the parsing of the JSON in scala.js that is causing me problems
with inline_t as (
select * from (values('One',1,True,ARRAY[1],1.0),
('Six',6,False,ARRAY[1,2,3,6],2.4494),
('Eight',8,False,ARRAY[1,2,4,8],2.8284)) as v (as_str,as_int,is_odd,factors,sroot))
select json_agg(t) from inline_t t;
[{"as_str":"One","as_int":1,"is_odd":true,"factors":[1],"sroot":1.0},
{"as_str":"Six","as_int":6,"is_odd":false,"factors":[1,2,3,6],"sroot":2.4494},
{"as_str":"Eight","as_int":8,"is_odd":false,"factors":[1,2,4,8],"sroot":2.8284}]
I think this should be fairly easy using something like upickle or prickle as hinted at here: How to parse a json string to a case class in scaja.js and vice versa but I haven't been able to find a code example, and I'm not up to speed enough with Scala or Scala.js to work it out myself. I'd be very grateful if someone could post some working code to show how to achieve the above
EDIT
This is the sort of thing I've tried, but I'm not getting very far
val jsparsed = scala.scalajs.js.JSON.parse(jsonStr3)
val jsp1 = jsparsed.selectDynamic("1")
val items = jsp1.map{ (item: js.Dynamic) =>
(item.as_str, item.as_int, item.is_odd, item.factors, item.sroot)
.asInstanceOf[js.Array[(String, Int, Boolean, Array[Int], Double)]].toSeq
}
println(items._1)
So you are in a situation where you actually want to manipulate JSON values. Since you're not serializing/deserializing Scala values from end-to-end, serialization libraries like uPickle or Prickle will not be very helpful to you.
You could have a look at a cross-platform JSON manipulation library, such as circe. That would give you the advantage that you wouldn't have to "deal with JavaScript data structures" at all. Instead, the library would parse your JSON and expose it as a Scala data structure. This is probably the best option if you want your code to also cross-compile.
If you're only writing Scala.js code, and you want a more lightweight version (no dependency), I recommend declaring types for your JSON "schema", then use those types to perform the conversion in a safer way:
import scala.scalajs.js
import scala.scalajs.js.annotation._
// type of {"as_str":"Six","as_int":6,"is_odd":false,"factors":[1,2,3,6],"sroot":2.4494}
#ScalaJSDefined
trait Row extends js.Object {
val as_str: String
val as_int: Int
val is_odd: Boolean
val factors: js.Array[Int]
val sroot: Double
}
type Rows = js.Array[Row]
val rows = js.JSON.parse(jsonStr3).asInstanceOf[Rows]
val items = (for (row <- rows) yield {
import row._
(as_str, as_int, is_odd, factors.toArray, sroot)
}).toSeq

Standardized way to serialize JSON to query string?

I'm trying to build a restful API and I'm struggling on how to serialize JSON data to a HTTP query string.
There are a number of mandatory and optional arguments that need to be passed in the request, e.g (represented as a JSON object below):
{
"-columns" : [
"name",
"column"
],
"-where" : {
"-or" : {
"customer_id" : 1,
"services" : "schedule"
}
},
"-limit" : 5,
"return" : "table"
}
I need to support a various number of different clients so I'm looking for a standardized way to convert this json object to a query string. Is there one, and how does it look?
Another alternative is to allow users to just pass along the json object in a message body, but I read that I should avoid it (HTTP GET with request body).
Any thoughts?
Edit for clarification:
Listing how some different languages encodes the given json object above:
jQuery using $.param: -columns[]=name&-columns[]=column&-where[-or][customer_id]=1&-where[-or][services]=schedule&-limit=5&return=column
PHP using http_build_query: -columns[0]=name&-columns[1]=column&-where[-or][customer_id]=1&-where[-or][services]=schedule&-limit=5&return=column
Perl using URI::query_form: -columns=name&-columns=column&-where=HASH(0x59d6eb8)&-limit=5&return=column
Perl using complex_to_query: -columns:0=name&-columns:1=column&-limit=5&-where.-or.customer_id=1&-where.-or.services=schedule&return=column
jQuery and PHP is very similar. Perl using complex_to_query is also pretty similar to them. But none look exactly the same.
URL-encode (https://en.wikipedia.org/wiki/Percent-encoding) your JSON text and put it into a single query string parameter. for example, if you want to pass {"val": 1}:
mysite.com/path?json=%7B%22val%22%3A%201%7D
Note that if your JSON gets too long then you will run into a URL length limitation problem. In which case I would use POST with a body (yes, I know, sending a POST when you want to fetch something is not "pure" and does not fit well into the REST paradigm, but neither is your domain specific JSON-based query language).
There is no single standard for JSON to query string serialization, so I made a comparison of some JSON serializers and the results are as follows:
JSON: {"_id":"5973782bdb9a930533b05cb2","isActive":true,"balance":"$1,446.35","age":32,"name":"Logan Keller","email":"logankeller#artiq.com","phone":"+1 (952) 533-2258","friends":[{"id":0,"name":"Colon Salazar"},{"id":1,"name":"French Mcneil"},{"id":2,"name":"Carol Martin"}],"favoriteFruit":"banana"}
Rison: (_id:'5973782bdb9a930533b05cb2',age:32,balance:'$1,446.35',email:'logankeller#artiq.com',favoriteFruit:banana,friends:!((id:0,name:'Colon Salazar'),(id:1,name:'French Mcneil'),(id:2,name:'Carol Martin')),isActive:!t,name:'Logan Keller',phone:'+1 (952) 533-2258')
O-Rison: _id:'5973782bdb9a930533b05cb2',age:32,balance:'$1,446.35',email:'logankeller#artiq.com',favoriteFruit:banana,friends:!((id:0,name:'Colon Salazar'),(id:1,name:'French Mcneil'),(id:2,name:'Carol Martin')),isActive:!t,name:'Logan Keller',phone:'+1 (952) 533-2258'
JSURL: ~(_id~'5973782bdb9a930533b05cb2~isActive~true~balance~'!1*2c446.35~age~32~name~'Logan*20Keller~email~'logankeller*40artiq.com~phone~'*2b1*20*28952*29*20533-2258~friends~(~(id~0~name~'Colon*20Salazar)~(id~1~name~'French*20Mcneil)~(id~2~name~'Carol*20Martin))~favoriteFruit~'banana)
QS: _id=5973782bdb9a930533b05cb2&isActive=true&balance=$1,446.35&age=32&name=Logan Keller&email=logankeller#artiq.com&phone=+1 (952) 533-2258&friends[0][id]=0&friends[0][name]=Colon Salazar&friends[1][id]=1&friends[1][name]=French Mcneil&friends[2][id]=2&friends[2][name]=Carol Martin&favoriteFruit=banana
URLON: $_id=5973782bdb9a930533b05cb2&isActive:true&balance=$1,446.35&age:32&name=Logan%20Keller&email=logankeller#artiq.com&phone=+1%20(952)%20533-2258&friends#$id:0&name=Colon%20Salazar;&$id:1&name=French%20Mcneil;&$id:2&name=Carol%20Martin;;&favoriteFruit=banana
QS-JSON: isActive=true&balance=%241%2C446.35&age=32&name=Logan+Keller&email=logankeller%40artiq.com&phone=%2B1+(952)+533-2258&friends(0).id=0&friends(0).name=Colon+Salazar&friends(1).id=1&friends(1).name=French+Mcneil&friends(2).id=2&friends(2).name=Carol+Martin&favoriteFruit=banana
The shortest among them is URL Object Notation.
How about you try this sending them as follows:
http://example.com/api/wtf?
[-columns][]=name&
[-columns][]=column&
[-where][-or][customer_id]=1&
[-where][-or][services]=schedule&
[-limit]=5&
[return]=table&
I tried with a REST Client
And on the server side (Ruby with Sinatra) I checked the params, it gives me exactly what you want. :-)
Another option might be node-querystring. It also uses a similar scheme to the ones you've so far listed.
It's available in both npm and bower, which is why I have been using it.
Works well for nested objects.
Passing complex objects as query parameters of a url.
In the example below, obj is the JSON object to pass into query parameters.
Injecting JSON object as query parameters:
value = JSON.stringify(obj);
URLSearchParams to convert a string to an object representing search params. toString to retain string type for appending to url:
queryParams = new URLSearchParams(value).toString();
Pass the query parameters using template literals:
url = `https://some-url.com?key=${queryParams}`;
Now url will contain the JSON object as query parameters under key (user-defined name)
Extracing JSON from url:
This is assuming you have access to the url (either as string or URL object)
url_obj = new URL(url); (only if url is NOT a URL object, otherwise ignore this step)
Extract all query parameters in the url:
queryParams = new URLSearchParams(url_obj.search);
Use the key to extract the specific value:
obj = JSON.parse(queryParams.get('key').slice(0, -1));
slice() is used to extract a tailing = in the query params which is not required.
Here obj will be the same object passed in the query params.
I recommend to try these steps in the web console to understand better.
You can test with JSON examples here: https://json.org/example.html