Using Microsoft.FSharpLu to serialize JSON to a stream

I've been using the Newtonsoft.Json and Newtonsoft.Json.Fsharp libraries to create a new JSON serializer and stream to a file. I like the ability to stream to a file because I'm handling large files and, prior to streaming, often ran into memory issues.
I stream with a simple function:
open Newtonsoft.Json
open Newtonsoft.Json.FSharp
open System.IO

let writeToJson (path: string) (obj: 'a) : unit =
    let serialized = JsonConvert.SerializeObject(obj)
    let fileStream = new StreamWriter(path)
    let serializer = new JsonSerializer()
    serializer.Serialize(fileStream, obj)
    fileStream.Close()
This works great. My problem is that the JSON string is then absolutely cluttered with stuff I don't need. For example,
let m =
    [
        (1.0M, None)
        (2.0M, Some 3.0M)
        (4.0M, None)
    ]

let makeType (tup: decimal * decimal option) = { FieldA = fst tup; FieldB = snd tup }
let y = List.map makeType m

Default.serialize y
val it : string =
"[{"FieldA": 1.0},
{"FieldA": 2.0,
"FieldB": {
"Case": "Some",
"Fields": [3.0]
}},
{"FieldA": 4.0}]"
If this is written to a JSON file and read into R, there are nested dataframes, and any Fields associated with a Case end up as a list:
library(jsonlite)
library(dplyr)
q <- fromJSON("default.json")
x <-
  q %>%
  flatten()
x
> x
  FieldA FieldB.Case FieldB.Fields
1      1        <NA>          NULL
2      2        Some             3
3      4        <NA>          NULL
> sapply(x, class)
       FieldA   FieldB.Case FieldB.Fields
    "numeric"   "character"        "list"
I don't want to have to handle these things in R. I can do it, but it's annoying, and with files that have many columns it gets silly.
This morning, I started looking at the Microsoft.FSharpLu.Json documentation. This library has a Compact.serialize function. Quick tests suggest that this library will eliminate the need for nested dataframes and the lists associated with any Case and Field columns. For example:
Compact.serialize y
val it : string =
"[{
"FieldA": 1.0
},
{
"FieldA": 2.0,
"FieldB": 3.0
},
{
"FieldA": 4.0
}
]"
When this string is read into R,
q <- fromJSON("compact.json")
x <- q
x
> x
  FieldA FieldB
1      1     NA
2      2      3
3      4     NA
> sapply(x, class)
   FieldA    FieldB
"numeric" "numeric"
This is much simpler to handle in R, and I'd like to start using this library.
However, I don't know whether I can get the Compact serializer to serialize to a stream. I see .serializeToFile, .deserializeStream, and .tryDeserializeStream, but nothing that can serialize to a stream. Does anyone know whether Compact can handle writing to a stream? How can I make that work?

The helper to serialize to a stream is missing from the Compact module in FSharpLu.Json, but you should be able to do it by following the C# example from
http://www.newtonsoft.com/json/help/html/SerializingJSON.htm. Something along these lines:
let writeToJson (path: string) (obj: 'a) : unit =
    let serializer = new JsonSerializer()
    serializer.Converters.Add(new Microsoft.FSharpLu.Json.CompactUnionJsonConverter())
    use sw = new StreamWriter(path)
    use writer = new JsonTextWriter(sw)
    serializer.Serialize(writer, obj)

Related

Play JSON Parse and Extract Elements Without a Key Path

I have a JSON payload that looks like this (yes, the JSON is valid):
[2,
 "19223201",
 "BootNotification",
 {
   "reason": "PowerUp",
   "chargingStation": {
     "model": "SingleSocketCharger",
     "vendorName": "VendorX"
   }
 }
]
I'm using the Play framework's JSON library, and I would like to understand how I could parse the third element (the value on line 3) and extract the BootNotification value as a String.
If it had a key, I could use that key to traverse the JSON and get the corresponding value, but that is not the case here. I also cannot read the input line by line and rely on the value being on line 3, as in the example above.
Any suggestions on how I could do this?
I think I have found a way after trying this out in Ammonite. Here is what I could do:
# val input: JsValue = Json.parse("""[2,"12345678","BNR",{"reason":"PowerUp"}]""")
input: JsValue = JsArray(ArrayBuffer(JsNumber(2), JsString("12345678"), JsString("BNR"), JsObject(Map("reason" -> JsString("PowerUp")))))
Parsing the JSON, I get a nice array, and I know I always expect exactly 4 elements in it, so looking up an element by its array index is what I need. To get the text at the third position (index 2), I can do the following:
# (input \ 2)
res2: JsLookupResult = JsDefined(JsString("BNR"))
# (input \ 2).toOption
res3: Option[JsValue] = Some(JsString("BNR"))
# (input \ 2).toOption.isDefined
res4: Boolean = true

Scala - How to handle key not found in a Map when need to skip non-existing keys without defaults?

I have a set of Strings and I'm using them as keys to get JValues from a Map:
val keys: Set[String] = Set("Metric_1", "Metric_2", "Metric_3", "Metric_4")
val logData: Map[String, JValue] = Map("Metric_1" -> JInt(0), "Metric_2" -> JInt(1), "Metric_3" -> null)
In the method below I'm parsing the value for each metric: first getting all the values, then filtering out the undefined ones, and finally transforming the remaining values to Booleans.
val metricsMap: Map[String, Boolean] = keys
  .map(k => k -> logData(k).extractOpt[Int]).toMap
  .filter(_._2.isDefined)
  .collect {
    case (str, Some(0)) => str -> false
    case (str, Some(1)) => str -> true
  }
I've run into a problem when one of the keys is not found in the logData Map: I'm getting a java.util.NoSuchElementException: key not found: Metric_4.
Here I'm using extractOpt to extract a value from the JSON and don't need default values, so extractOrElse probably won't help: I only need to get values for existing keys and skip the non-existing ones.
What could be a correct approach to handle a case when a key is not present in the logData Map?
UPD: I've achieved the desired result with .map(k => k -> apiData.getOrElse(k, null).extractOpt[Int]).toMap. However, I'm still not sure it's the best approach.
The fact that the values are JSON is a red herring; it's the missing key that throws the exception. Map has a method called get which retrieves a value from the map wrapped in an Option. If we use plain Ints as the values, we have:
val logData = Map("Metric_1" -> 1, "Metric_2" -> 0, "Metric_3" -> null)
keys.flatMap(k => logData.get(k).map(k -> _)).toMap
> Map(Metric_1 -> 1, Metric_2 -> 0, Metric_3 -> null)
Using flatMap instead of map unwraps the Some results and drops the Nones. Now, going back to your actual example, there is another layer, and a second flatMap will also eliminate the Metric_3 -> null item:
keys.flatMap(k => logData.get(k).flatMap(_.extractOpt[Int]).map(k -> _)).toMap
You can also rewrite this using a for comprehension:
(for {
  k <- keys
  jv <- logData.get(k)
  v <- jv.extractOpt[Int]
} yield k -> v).toMap
I used Success and Failure in place of the JSON values to avoid having to set up a shell with json4s to make an example:
val logData = Map("Metric_1" -> Success(1), "Metric_2" -> Success(0), "Metric_3" -> Failure(new RuntimeException()))
scala> for {
     |   k <- keys
     |   v <- logData.get(k)
     |   r <- v.toOption
     | } yield k -> r
res2: scala.collection.immutable.Set[(String, Int)] = Set((Metric_1,1), (Metric_2,0))
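Putting this together with the Boolean mapping from the question, a minimal sketch (assuming the same implicit json4s Formats is in scope as in the original code) might look like:
val metricsMap: Map[String, Boolean] =
  keys.flatMap { k =>
    logData.get(k)                    // skip keys that are missing from the Map
      .flatMap(_.extractOpt[Int])     // skip values that don't extract to an Int
      .collect {                      // keep only 0/1 and turn them into Booleans
        case 0 => k -> false
        case 1 => k -> true
      }
  }.toMap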

Chain http request and merge json response in ELM

I've succeeded in triggering a simple HTTP request in Elm and decoding the JSON response into an Elm value (see https://stackoverflow.com/questions/43139316/decode-nested-variable-length-json-in-elm).
The problem I'm facing now: how do I chain (concurrency preferred) two HTTP requests and merge the JSON into my new (updated) model? Note: please see the updated Commands.elm below.
Package used to access remote data: krisajenkins/remotedata (http://package.elm-lang.org/packages/krisajenkins/remotedata/4.3.0/RemoteData)
GitHub repo of my code: https://github.com/areai51/my-india-elm
Previous working code:
Models.elm
type alias Model =
    { leaders : WebData (List Leader)
    }

initialModel : Model
initialModel =
    { leaders = RemoteData.Loading
    }
Main.elm
init : ( Model, Cmd Msg )
init =
    ( initialModel, fetchLeaders )
Commands.elm
fetchLeaders : Cmd Msg
fetchLeaders =
    Http.get fetchLeadersUrl leadersDecoder
        |> RemoteData.sendRequest
        |> Cmd.map Msgs.OnFetchLeaders

fetchLeadersUrl : String
fetchLeadersUrl =
    "https://data.gov.in/node/85987/datastore/export/json"
Msgs.elm
type Msg
    = OnFetchLeaders (WebData (List Leader))
Update.elm
update msg model =
    case msg of
        Msgs.OnFetchLeaders response ->
            ( { model | leaders = response }, Cmd.none )
Updated code (I need help with Commands.elm):
Models.elm
type alias Model =
    { lsLeaders : WebData (List Leader)
    , rsLeaders : WebData (List Leader) -- updated model
    }

initialModel : Model
initialModel =
    { lsLeaders = RemoteData.Loading
    , rsLeaders = RemoteData.Loading
    }
Main.elm
init : ( Model, Cmd Msg )
init =
    ( initialModel, fetchLeaders )
Commands.elm
fetchLeaders : Cmd Msg
fetchLeaders =
    -- How do I call both requests here, and fire separate msgs?
    Http.get fetchLSLeadersUrl lsLeadersDecoder -- there will be a different decoder named rsLeadersDecoder
        |> RemoteData.sendRequest
        |> Cmd.map Msgs.OnFetchLSLeaders

fetchLSLeadersUrl : String
fetchLSLeadersUrl =
    "https://data.gov.in/node/85987/datastore/export/json"

-- New data source
fetchRSLeadersUrl : String
fetchRSLeadersUrl =
    "https://data.gov.in/node/982241/datastore/export/json"
Msgs.elm
type Msg
    = OnFetchLSLeaders (WebData (List Leader))
    | OnFetchRSLeaders (WebData (List Leader)) -- new message
Update.elm
update msg model =
    case msg of
        Msgs.OnFetchLSLeaders response ->
            ( { model | lsLeaders = response }, Cmd.none )

        -- New handler
        Msgs.OnFetchRSLeaders response ->
            ( { model | rsLeaders = response }, Cmd.none )
The way to fire off two concurrent requests is to use Cmd.batch:
init : ( Model, Cmd Msg )
init =
    ( initialModel, Cmd.batch [ fetchLSLeaders, fetchRSLeaders ] )
There is no guarantee on which request will return first and there is no guarantee that they will both be successful. One could fail while the other succeeds, for example.
You mention that you want to merge the results, but you didn't say how the merge should work, so I'll assume you want to append the two lists of leaders into a single list, and that it would be useful for your application to deal with a single RemoteData value rather than several.
You can merge multiple RemoteData values together with a custom function using map and andMap.
mergeLeaders : WebData (List Leader) -> WebData (List Leader) -> WebData (List Leader)
mergeLeaders a b =
    RemoteData.map List.append a
        |> RemoteData.andMap b
Notice that I'm using List.append there. That can really be any function that takes two lists and merges them however you please.
If you prefer an applicative style of programming, the above could be translated to the following infix version:
import RemoteData.Infix exposing (..)

mergeLeaders2 : WebData (List Leader) -> WebData (List Leader) -> WebData (List Leader)
mergeLeaders2 a b =
    List.append <$> a <*> b
According to the documentation on andMap (which uses a result tuple rather than an appended list in its example):
The final tuple succeeds only if all its children succeeded. It is still Loading if any of its children are still Loading. And if any child fails, the error is the leftmost Failure value.

BSON structure created by Apache Spark and MongoDB Hadoop-Connector

I'm trying to save some JSON from Spark (Scala) to MongoDB using the MongoDB Hadoop-Connector. The problem I'm having is that this API always seems to save your data as "{_id: ..., value: {your JSON document}}".
In the code example below, my document gets saved like this:
{
    "_id" : ObjectId("55e80cfea9fbee30aa703261"),
    "value" : {
        "_id" : "55e6c65da9fbee285f2f9175",
        "year" : 2014,
        "month" : 5,
        "day" : 6,
        "hour" : 18,
        "user_id" : 246
    }
}
Is there any way to persuade the MongoDB Hadoop Connector to write the JSON/BSON in the structure you specify, instead of nesting it under these _id/value fields?
Here's my Scala Spark code:
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.{BSONObject, Document}
import com.mongodb.hadoop.MongoOutputFormat

val jsonstr = List("""{
  "_id" : "55e6c65da9fbee285f2f9175",
  "year" : 2014,
  "month" : 5,
  "day" : 6,
  "hour" : 18,
  "user_id" : 246}""")

val conf = new SparkConf().setAppName("Mongo Dummy").setMaster("local[*]")
val sc = new SparkContext(conf)

// DB params
val host = "127.0.0.1"
val port = "27017"
val database = "dummy"
val collection = "fubar"

// input is the collection we want to read (not doing so here)
val mongo_input = s"mongodb://$host/$database.$collection"
// output is the collection we want to write
val mongo_output = s"mongodb://$host/$database.$collection"

// Set up extra config for the Hadoop connector
val hadoopConfig = new Configuration()
//hadoopConfig.set("mongo.input.uri", mongo_input)
hadoopConfig.set("mongo.output.uri", mongo_output)

// convert JSON to RDD
val rdd = sc.parallelize(jsonstr)

// write JSON data to DB
val saveRDD = rdd.map { json =>
  (null, Document.parse(json))
}

saveRDD.saveAsNewAPIHadoopFile("file:///bogus",
  classOf[Object],
  classOf[BSONObject],
  classOf[MongoOutputFormat[Object, BSONObject]],
  hadoopConfig)

// Finished
sc.stop
And here's my SBT:
name := "my-mongo-test"
version := "1.0"
scalaVersion := "2.10.4"
// Spark needs to appear in SBT BEFORE Mongodb connector!
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"
// MongoDB-Hadoop connector
libraryDependencies += "org.mongodb.mongo-hadoop" % "mongo-hadoop-core" % "1.4.0"
To be honest, I'm kind of mystified at how hard it seems to be to save JSON --> BSON --> MongoDB from Spark. So any suggestions on how to save my JSON data more flexibly would be welcomed.
Well, I just found the solution. It turns out that MongoRecordWriter, which is used by MongoOutputFormat, inserts any value that does not inherit from BSONWritable, MongoOutput, or BSONObject under a value field.
The simplest solution, therefore, is to create an RDD whose values are BSONObjects rather than Documents.
I tried this solution in Java, but I'm sure it will work in Scala as well. Here is some sample code:
JavaPairRDD<Object, BSONObject> bsons = values.mapToPair(lineValues -> {
    BSONObject doc = new BasicBSONObject();
    doc.put("field1", lineValues.get(0));
    doc.put("field2", lineValues.get(1));
    return new Tuple2<Object, BSONObject>(UUID.randomUUID().toString(), doc);
});

Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.uri",
        "mongodb://localhost:27017/my_db.lines");

bsons.saveAsNewAPIHadoopFile("file:///this-is-completely-unused",
        Object.class,
        BSONObject.class,
        MongoOutputFormat.class,
        outputConfig);
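Translated back to the Scala code from the question, a minimal sketch of the same idea might look like the following. It reuses rdd and hadoopConfig from the question and swaps Document.parse for the driver's legacy com.mongodb.util.JSON parser, which returns a DBObject (a BSONObject); treat that parsing call as an assumption about the driver version on your classpath.
import com.mongodb.hadoop.MongoOutputFormat
import com.mongodb.util.JSON // legacy JSON parser from the mongo-java-driver (assumed available)
import org.bson.BSONObject

// Build (key, BSONObject) pairs so that MongoRecordWriter writes each document
// as-is instead of wrapping it under a "value" field.
val bsonRDD = rdd.map { json =>
  (null, JSON.parse(json).asInstanceOf[BSONObject])
}

bsonRDD.saveAsNewAPIHadoopFile("file:///bogus",
  classOf[Object],
  classOf[BSONObject],
  classOf[MongoOutputFormat[Object, BSONObject]],
  hadoopConfig)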

Converting epgsql results to JSON

I am a total beginner with Erlang and functional programming in general. For fun, to get me started, I am converting an existing Ruby Sinatra REST(ish) API that queries PostgreSQL and returns JSON.
On the Erlang side I am using Cowboy, Epgsql and Jiffy as the JSON library.
Epgsql returns results in the following format:
{ok, [{column,<<"column_name">>,int4,4,-1,0}], [{<<"value">>}]}
But Jiffy expects the following format when encoding to JSON:
{[{<<"column_name">>,<<"value">>}]}
The following code works to convert epgsql output into suitable input for jiffy:
Assuming Data is the Epgsql output and Key is the name of the JSON object being created:
{_, C, R} = Data,
Columns = [X || {_, X, _, _, _, _} <- C],
Rows = tuple_to_list(hd(R)),
Result = {[{atom_to_binary(Key, utf8), {lists:zip(Columns, Rows)}}]}.
However, I am wondering if this is efficient Erlang?
I've looked into the documentation for Epgsql and Jiffy and can't see any more obvious ways to perform the conversion.
Thank you.
Yes, you need to parse it. For example, here is a function that parses the epgsql result:
parse_result({error, #error{ code = <<"23505">>, extra = Extra }}) ->
    {match, [Column]} =
        re:run(proplists:get_value(detail, Extra),
               "Key \\(([^\\)]+)\\)", [{capture, all_but_first, binary}]),
    throw({error, {non_unique, Column}});
parse_result({error, #error{ message = Msg }}) ->
    throw({error, Msg});
parse_result({ok, Cols, Rows}) ->
    to_map(Cols, Rows);
parse_result({ok, Counts, Cols, Rows}) ->
    {ok, Counts, to_map(Cols, Rows)};
parse_result(Result) ->
    Result.
And a function that converts the result to a map:
to_map(Cols, Rows) ->
    [ maps:from_list(lists:zipwith(fun(#column{name = N}, V) -> {N, V} end,
                                   Cols, tuple_to_list(Row))) || Row <- Rows ].
Then encode the result to JSON. You can also change my code to produce a proplist as output instead of a map.