How do I leverage Scala's type system to maintain type information for a dynamic set of attributes - json

I've run into a use case where I need to parse a bunch of information from a txt file. The meat of the payload is a bunch of key:value pairs, none of which I know at development time.
Wombat
Area: "Northern Alberta"
Tank Level (ft): -3.395
Temperature (C): 19.3
Batt Voltage: 13.09
Last Maintained: 2012-01-01
Secured: "Yes"
As you can see, there is potential for Strings, Numbers, Dates and Booleans. There is also a use case where the user needs to create rules for certain attributes, things like:
When the Tank Level exceeds n, please notify some.user#someplace.com
When a Site that contains "Alberta" is Not Secured, please notify some.user#someplace.com
Depending on the type of attribute, the available rule types will differ. I may also need to do some kind of aggregation on the numeric types. Anyway, to make a long story short, I need the type information. So what kind of data structure is best?
Initially I was going to go with distinct tuples.
val stringAttributes: Array[(String, String)]
val doubleAttributes: Array[(String, Double)]
val dateAttributes: Array[(String, Date)]
Now that seems wrong, or at the very least ugly. Then I though maybe something like:
val attributes: Array[(String, Any)]
Now I have a pattern match in many of places. Also note I'm using a JSON protocol for the web application and database (MongoDB). It'd be convenient to give the front end something like this:
{
site: "Wombat",
attributes: [
{ "Area": "Northern Alberta" },
{ "Tank Level (ft)": -3.395 },
{ "Temperature (C)": 19.3 }
]
}
But in the back end, do I encode the types? Do I parse raw JSON? In the end, I'm looking for the best way to maintain type information for dynamic set of attributes while supporting JSON to both the web client and the database.

+1 on #david's comment.
Why not use JSON end-to-end and derive the type desired for each individual rule? Then use a single function to transform a JSON value to the correct type (using defaults or Option when it doesn't really match).
Then you can use something like this to run the ruleset over all of the parsed objects:
for (object <- parseObjects(); rule <- rules) {
val value = coerce(object, rule.desiredType)
rule(value)
}
Or something similar, such as having the rule itself call the coercion code. It seems like you don't really need the objects to have a type until you're doing rule processing, so why try to?
P.S.
Typesafe Config may be able to parse the file for you already, although it looks to be a simple format to parse.

Related

Mapping JSON to Apama Events

I am trying to use the Mapper codec in my connectivity chain to convert a JSON object that looks like this:
{"test2":[
["column1","column2","column3"],
["16091", "449", "05302018"],
["16092", "705", "05302018"]
]}
to an EPL type. To me it looks like a sequence of sequences, so I used
event test1 {
sequence<string> values;
}
event test2 {
sequence<test1> tests;
}
But this gives me the error
Unable to parse event test.1: Incorrect type in get (you asked for map but its' actually list)
Any ideas how I should be using the Mapper codec to this end?
Unless explicitly remapped, that won't quite work. You have to consider the entire structure of the document from top to bottom. It's not a sequence of strings - it's a JSON object/dictionary at top-level, with a value that is a sequence of sequences of string.
A JSON object/dictionary can map to an event type based on field names. So as Matt's answer said, a JSON document like yours would need an event type like
event SomeEventType {
sequence<sequence<string > > test2;
}
If it's not appropriate to create an event type that exactly corresponds to the JSON document's structure, then you'll need to use the mapping codec to rearrange the fields in the JSON document to match the fields and sub-fields in an event type. Or possibly a custom codec; I think Matt's right that the mapper can't do exactly what you want.
Further, because JSON documents are type-less at the top-level, you'll need to make sure that the event type is defined somehow. There are multiple ways of doing that.
(1) If this particular connectivity will only send you events of one type, you can use the 'defaultEventType' configuration option of the apama.eventMap host plug-in at the top of your chain e.g.
apama.eventMap:
defaultEventMap: SomeEventType
(2) If it depends on the structure of the document, you'll need to use the classifier codec. That can take a message going towards the correlator, and assign it an event type based on the content of fields (or simply their presence). You can learn about it in the documentation.
(3) The transport will sometimes define it on messages being sent towards the correlator. For example, in the case of the Universal Messaging transport, then the 'tag' of the UM event will be used as the type. This may or may not be appropriate.
If you do end up doing anything non-trivial with the classifier or mapper, I'd strongly recommend use of the 'diagnostic codec' to help in developing the classifier or mapper rules. This is a codec you can put anywhere in the chain of codecs that will log every event going through it, so you can see how your rules are operating by seeing what happens before and after classification/mapping. You can read about it in the documentation, but it's usually as simple as putting '- diagnosticCodec' somewhere in your chain. I've found it absolutely invaluable when debugging connectivity chains.
you want your event type to look like:
event type1 {
sequence<sequence<string> > data;
}
it's not possible in the mapper directly to convert to your type2/type1 schema, but you'd be able to write your own codec to do that or do post-filtering in EPL.
HTH,
Matt

Excessive use of map[string]interface{} in go development?

The majority of my development experience has been from dynamically typed languages like PHP and Javascript. I've been practicing with Golang for about a month now by re-creating some of my old PHP/Javascript REST APIs in Golang. I feel like I'm not doing things the Golang way most of the time. Or more generally, I'm not use to working with strongly typed languages. I feel like I'm making excessive use of map[string]interface{} and slices of them to box up data as it comes in from http requests or when it gets shipped out as json http output. So what I'd like to know is if what I'm about to describe goes against the philosophy of golang development? Or if I'm breaking the principles of developing with strongly typed languages?
Right now, about 90% of the program flow for REST Apis I've rewritten with Golang can be described by these 5 steps.
STEP 1 - Receive Data
I receive http form data from http.Request.ParseForm() as formvals := map[string][]string. Sometimes I will store serialized JSON objects that need to be unmarshaled like jsonUserInfo := json.Unmarshal(formvals["user_information"][0]) /* gives some complex json object */.
STEP 2 - Validate Data
I do validation on formvals to make sure all the data values are what I expect before using it in SQL queries. I treat everyting as a string, then use Regex to determine if the string format and business logic is valid (eg. IsEmail, IsNumeric, IsFloat, IsCASLCompliant, IsEligibleForVoting,IsLibraryCardExpired etc...). I've written my own Regex and custom functions for these types of validations
STEP 3 - Bind Data to SQL Queries
I use golang's database/sql.DB to take my formvals and bind them to my Query and Exec functions like this Query("SELECT * FROM tblUser WHERE user_id = ?, user_birthday > ? ",formvals["user_id"][0], jsonUserInfo["birthday"]). I never care about the data types I'm supplying as arguments to be bound, so they're all probably strings. I trust the validation in the step immediately above has determined they are acceptable for SQL use.
STEP 4 - Bind SQL results to []map[string]interface{}{}
I Scan() the results of my queries into a sqlResult := []map[string]interface{}{} because I don't care if the value types are null, strings, float, ints or whatever. So the schema of an sqlResult might look like:
sqlResult =>
[0] {
"user_id":"1"
"user_name":"Bob Smith"
"age":"45"
"weight":"34.22"
},
[1] {
"user_id":"2"
"user_name":"Jane Do"
"age":nil
"weight":"22.22"
}
I wrote my own eager load function so that I can bind more information like so EagerLoad("tblAddress", "JOIN ON tblAddress.user_id",&sqlResult) which then populates sqlResult with more information of the type []map[string]interface{}{} such that it looks like this:
sqlResult =>
[0] {
"user_id":"1"
"user_name":"Bob Smith"
"age":"45"
"weight":"34.22"
"addresses"=>
[0] {
"type":"home"
"address1":"56 Front Street West"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
[1] {
"type":"work"
"address1":"5 Kennedy Avenue"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
},
[1] {
"user_id":"2"
"user_name":"Jane Do"
"age":nil
"weight":"22.22"
"addresses"=>
[0] {
"type":"home"
"address1":"56 Front Street West"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
}
STEP 5 - JSON Marshal and send HTTP Response
then I do a http.ResponseWriter.Write(json.Marshal(sqlResult)) and output data for my REST API
Recently, I've been revisiting articles with code samples that use structs in places I would have used map[string]interface{}. For example, I wanted to refactor Step 2 with a more standard approach that other golang developers would use. So I found this https://godoc.org/gopkg.in/go-playground/validator.v9, except all it's examples are with structs . I also noticed that most blogs that talk about database/sql scan their SQL results into typed variables or structs with typed properties, as opposed to my Step 4 which just puts everything into map[string]interface{}
Hence, i started writing this question. I feel the map[string]interface{} is so useful because majority of the time,I don't really care what the data is and it gives me to the freedom in Step 4 to construct any data schema on the fly before I dump it as JSON http response. I do all this with as little code verbosity as possible. But this means my code is not as ready to leverage Go's validation tools, and it doesn't seem to comply with the golang community's way of doing things.
So my question is, what do other golang developers do with regards to Step 2 and Step 4? Especially in Step 4...do Golang developers really encourage specifying the schema of the data through structs and strongly typed properties? Do they also specify structs with strongly typed properties along with every eager loading call they make? Doesn't that seem like so much more code verbosity?
It really depends on the requirements just like you have said you don't require to process the json it comes from the request or from the sql results. Then you can easily unmarshal into interface{}. And marshal the json coming from sql results.
For Step 2
Golang has library which works on validation of structs used to unmarshal json with tags for the fields inside.
https://github.com/go-playground/validator
type Test struct {
Field `validate:"max=10,min=1"`
}
// max will be checked then min
you can also go to godoc for validation library. It is very good implementation of validation for json values using struct tags.
For STEP 4
Most of the times, We use structs if we know the format and data of our JSON. Because it provides us more control over the data types and other functionality. For example if you wants to empty a JSON feild if you don't require it in your JSON. You should use struct with _ json tag.
Now you have said that you don't care if the result coming from sql is empty or not. But if you do it again comes to using struct. You can scan the result into struct with sql.NullTypes. With that also you can provide json tag for omitempty if you wants to omit the json object when marshaling the data when sending a response.
Struct values encode as JSON objects. Each exported struct field
becomes a member of the object, using the field name as the object
key, unless the field is omitted for one of the reasons given below.
The encoding of each struct field can be customized by the format
string stored under the "json" key in the struct field's tag. The
format string gives the name of the field, possibly followed by a
comma-separated list of options. The name may be empty in order to
specify options without overriding the default field name.
The "omitempty" option specifies that the field should be omitted from
the encoding if the field has an empty value, defined as false, 0, a
nil pointer, a nil interface value, and any empty array, slice, map,
or string.
As a special case, if the field tag is "-", the field is always
omitted. Note that a field with name "-" can still be generated using
the tag "-,".
Example of json tags
// Field appears in JSON as key "myName".
Field int `json:"myName"`
// Field appears in JSON as key "myName" and
// the field is omitted from the object if its value is empty,
// as defined above.
Field int `json:"myName,omitempty"`
// Field appears in JSON as key "Field" (the default), but
// the field is skipped if empty.
// Note the leading comma.
Field int `json:",omitempty"`
// Field is ignored by this package.
Field int `json:"-"`
// Field appears in JSON as key "-".
Field int `json:"-,"`
As you can analyze from above information given in Golang spec for json marshal. Struct provide so much control over json. That's why Golang developer most probably use structs.
Now on using map[string]interface{} you should use it when you don't the structure of your json coming from the server or the types of fields. Most Golang developers stick to structs wherever they can.

Typescript String Based Enums

So I've read all the posts on String Based Enums in Typescript, but I couldn't find a solution that meets my requirements. Those would be:
Enums that provide code completion
Enums that can be iterated over
Not having to specify an element twice
String based
The possibilities I've seen so far for enums in typescript are:
enum MyEnum {bla, blub}: This fails at being string based, so I can't simply read from JSONs which are string based...
type MyEnum = 'bla' | 'blub': Not iterable and no code completion
Do it yourself class MyEnum { static get bla():string{return "bla"} ; static get blub():string{return "blub"}}: Specifies elements twice
So here come the questions:
There's no way to satisfy those requirements simultaneously? If no, will it be possible in the future?
Why didn't they make enums string based?
Did someone experience similar problems and how did you solve them?
I think that implementing Enum in a C-like style with numbers is fine, because an Enum (similar to a Symbol) is usually used to declare a value that is uniquely identifiable on development time. How the machine represents that value on run time doesn't really concern the developer.
But what we developer sometimes want (because we're all lazy and still want to have all the benefits!), is to use the Enum as an API or with an API that does not share that Enum with us, even though the API is essentially an Enum because the valid value of a property only is foo and bar.
I guess this is the reason why some languages have string based Enums :)
How TypeScript handles Enums
If you look at the transpiled JavaScript you can see that TypeScript just uses a plain JavaScript Object to implement an Enum. For example:
enum Color {
Red,
Green,
Blue
}
will be transpiled to:
{
0: "Red",
1: "Green",
2: "Blue",
Blue: 2,
Green: 1,
Red: 0
}
This means you can access the string value like Color[Color.Red]. You will still have code completion and you do not have to specify the values twice. But you can not just do Object.keys(Color) to iterate over the Enum, because the values exist "twice" on the object.
Why didn't they make enums string based
To be clear Enums are both number and string based in that direct access is number and reverse map is string (more on this).
Meeting your requirement
You key reason for ruling out raw enums is
so I can't simply read from JSONs which are string based...
You will experience the same thing e.g. when reading Dates cause JSON has no date data type. You would new Date("someServerDateTime") to convert these.
You would use the same strategy to go from server side enum (string) to TS enum (number). Easy done thanks to the reverse lookup MyEnum["someServerString"]
Hydration
This process of converting server side data to client side active data is sometimes called Hydration. My favorite lib for this at the moment is https://github.com/pleerock/class-transformer
I personally handle this stuff myself at the server access level i.e. hand write an API that makes the XHR + does the serialization.
At my last job we automated this with code generation that did even more than that (supported common validation patterns between server and client code).

If JSON represents the 'object', what represents the 'class'?

JSON appears to be a nice way to represent a complex data structure in plain text. If we think of this complex data structure as analogous to an OOP object - an instance of a class - then is there a commonly used JSON-like format that represents the class itself (just the data part - forget methods)? Can JSON itself be used for this?
To put it another way, if JSON encodes name-value pairs, what should I use if I want to encode only the names?
The reason I want this is that I am designing a protocol to use with jQuery (to which I am a complete novice by the way). The client will communicate to the server the structure of the JSON object it wants back, and the server will return a JSON object of that structure with the values added.
The key point is that it is the client that is in full control of what data fields (name-value pairs) the server returns. It's a bit different from all the examples of jQuery that I've found so far on the web where the client makes a request (which usually includes a very limited set of parameters, if any) and the server makes the decision as to what fields to return in the JSON reply.
(Obviously, what the client asks for must be congruent with the server's data model; if the server has an array of widgets each with its own price, the client can't ask for an array of prices each with its own widget.)
This must be a common problem, and I don't want to reinvent the wheel. I want to adopt a solution that is already in common use across the web.
Edit
I just found JSON Schema. This is not what I am looking for. It contains way more than I need.
Edit
I'm looking more for a 'this is how it is usually done' answer, rather than a 'you could try…' answer. (I can invent dozens of possible answers myself.)
To encode only names within JSON, you could use a key/value pair where the key is either the class name or just a key named 'values' - with the value being an array of strings that are the names to be returned by the server. For example:
{ 'class_name' : [ "name1", "name2", "name3" ] }
The server can then either detect the class name from the key used and return the supplied values for the names in the array if the class supports it or ignore if it does not.
I'm looking more for a 'this is how it is usually done' answer
There is no single "correct" way to do what you want. Many people have their implementation. It depends on various factors -- what you want to do, where you want to do, how efficiently you want it to do?
For simple structures I would prefer and suggest the answer given by #dbr9979.
For nested structures, you can have nested arrays. Something like:
{
"nestedfield1": {
"nestedfield11":["nestedfield111", "nestedfield112"],
"nestedfield12":["nestedfield121", "nestedfield122"],
"__SIMPLE_FIELDS__": ["simplefield13", "simplefield14"]
}
}
The point is, if the key is __SIMPLE_FIELDS__, the value is an array of simple fields (string, numbers etc..), else the key stands for the key in the object.
For something more complex, what I would suggest is you have predefined structures, that both the server and the client know of. This is particularly useful when you have to make multiple identical requests. Assign some unique number for each of them. Something like:
1 => <the structure above>
2 => ["simplefield1", "simplefield2" ..]
3 => etc .. etc
The server stores the above structure and the relevant number in the database or something. And now, as it may be obvious by now, client sends across the id of the required structure, and the server responds in the appropriate fashion.
I think what you meant by this:
the client that is in full control of what data fields (name-value pairs) the server returns.
is like the difference between SELECT * FROM Bags and SELECT color, price FROM Bag in SQL. Am I interpreting you correctly?
You could query with:
{
'resource': 'Bag',
'field_names': ['color', 'price']
}
which will return the response:
{
'status': 'success',
'result': [
{'color': 'red', 'price': 50},
{'color': 'blue', 'price': 45},
]
}
most likely though, you may not actually need your request to be a JSON object; I've seen implementations where the field names is taken from the query string, like http://foo.com/bag?fields=color,price
I was looking for Partial Response.
RESTful API Design: can your API give developers just the information they need? explains it all and gives examples from LinkedIn, Facebook, and Google. Google and Facebook both have similar approaches. Here's how Lie Ryan's example would look using Google's approach:
url?fields=status,result(color,price)
Since Google and Facebook are behind this, I would not be surprised to see this become a de facto standard.
In my case I am likely to run into a length limitation on the URL and so have to use POST instead, but this is an excellent starting point for me.

How should I specify the type of JSON-like unstructured data in Scala?

I'm considering porting a very simple text-templating library to scala, mostly as an exercise in learning the language. The library is currently implemented in both Python and Javascript, and its basic operation more or less boils down to this (in python):
template = CompiledTemplate('Text {spam} blah {eggs[1]}')
data = { 'spam': 1, 'eggs': [ 'first', 'second', { 'key': 'value' }, true ] }
output = template.render(data)
None of this is terribly difficult to do in Scala, but the thing I'm unclear about is how to best express the static type of the data parameter.
Basically this parameter should be able to contain the sorts of things you'd find in JSON: a few primitives (strings, ints, booleans, null), or lists of zero or more items, or maps of zero or more items. (For the purposes of this question the maps can be constrained to having string keys, which seems to be how Scala likes things anyways.)
My initial thought was just to use a Map[string, Any] as a top-level object, but that's doesn't seem entirely correct to me. In fact I don't want to add arbitrary objects of any sort of class in there; I want only the elements I outlined above. At the same time, I think in Java the closest I'd really be able to get would be Map<String, ?>, and I know one of the Scala authors designed Java's generics.
One thing I'm particularly curious about is how other functional languages with similar type systems handle this sort of problem. I have a feeling that what I really want to do here is come up with a set of case classes that I can pattern-match on, but I'm not quite able to envision how that would look.
I have Programming in Scala, but to be honest my eyes started glazing over a bit at the covariance / contravariance stuff and I'm hoping somebody can explain this to me a bit more clearly and succinctly.
You're spot on that you want some sort of case classes to model your datatypes. In functional languages these sorts of things are called "Abstract Data Types", and you can read all about how Haskell uses them by Googling around a bit. Scala's equivalent of Haskell's ADTs uses sealed traits and case classes.
Let's look at a rewrite of the JSON parser combinator from the Scala standard library or the Programming in Scala book. Instead of using Map[String, Any] to represent JSON objects, and instead of using Any to represent arbitrary JSON values, it uses an abstract data type, JsValue, to represnt JSON values. JsValue has several subtypes, representing the possible kinds of JSON values: JsString, JsNumber, JsObject, JsArray, JsBoolean (JsTrue, JsFalse), and JsNull.
Manipulating JSON data of this form involves pattern matching. Since the JsValue is sealed, the compiler will warn you if you haven't dealt with all the cases. For example, the code for toJson, a method that takes a JsValue and returns a String representation of that values, looks like this:
def toJson(x: JsValue): String = x match {
case JsNull => "null"
case JsBoolean(b) => b.toString
case JsString(s) => "\"" + s + "\""
case JsNumber(n) => n.toString
case JsArray(xs) => xs.map(toJson).mkString("[",", ","]")
case JsObject(m) => m.map{case (key, value) => toJson(key) + " : " + toJson(value)}.mkString("{",", ","}")
}
Pattern matching both lets us make sure we're dealing with every case, and also "unwraps" the underlying value from its JsType. It provides a type-safe way of knowing that we've handled every case.
Furthermore, if you know at compile-time the structure of the JSON data you're dealing with, you can do something really cool like n8han's extractors. Very powerful stuff, check it out.
Well, there are a couple ways to approach this. I would probably just use Map[String, Any], which should work just fine for your purposes (as long as the map is from collection.immutable rather than collection.mutable). However, if you really want to go through some pain, it is possible to give a type for this:
sealed trait InnerData[+A] {
val value: A
}
case class InnerString(value: String) extends InnerData[String]
case class InnerMap[A, +B](value: Map[A, B]) extends InnerData[Map[A, B]]
case class InnerBoolean(value: Boolean) extends InnerData[Boolean]
Now, assuming that you were reading the JSON data field into a Scala field named jsData, you would give that field the following type:
val jsData: Map[String, Either[Int, InnerData[_]]
Every time you pull a field out of jsData, you would need to pattern match, checking whether the value was of type Left[Int] or Right[InnerData[_]] (the two sub-types of Either[Int, InnerData[_]]). Once you have the inner data, you would then pattern match on that to determine whether it represents an InnerString, InnerMap or InnerBoolean.
Technically, you have to do this sort of pattern matching anyway in order to use the data once you pull it out of JSON. The advantage to the well-typed approach is the compiler will check you to ensure that you haven't missed any possibilities. The disadvantage is that you can't just skip impossibilities (like 'eggs' mapping to an Int). Also, there is some overhead imposed by all of these wrapper objects, so watch out for that.
Note that Scala does allow you to define a type alias which should cut down on the amount of LoC required for this:
type DataType[A] = Map[String, Either[Int, InnerData[A]]]
val jsData: DataType[_]
Add a few implicit conversions to make the API pretty, and you should be all nice and dandy.
JSON is used as an example in "Programming in Scala", in the chapter on combinator parsing.