I'm writing some code to auto-generate JSON codecs for Elm data structures. There is a point in my code where a sub-structure/sub-type has already been encoded to a Json.Encode.Value, and I need to add another key-value pair to it. Is there any way to "destructure" a Json.Encode.Value in Elm? Or combine two values of type Json.Encode.Value?
Here's some sample code:
type alias Entity record =
{ entityKey: (Key record)
, entityVal: record
}
jsonEncEntity : (record -> Value) -> Entity record -> Value
jsonEncEntity localEncoder_record val =
let
encodedRecord = localEncoder_record val.entityVal
in
-- NOTE: the following line won't compile, but this is essentially
-- what I'm looking for
Json.combine encodedRecord (Json.Encode.object [ ( "id", jsonEncKey val.entityKey ) ] )
You can decode the value into a list of key value pairs using D.keyValuePairs D.value and then append the new field. Here's how you'd do that:
module Main exposing (..)
import Json.Decode as D
import Json.Encode as E exposing (Value)
addKeyValue : String -> Value -> Value -> Value
addKeyValue key value input =
case D.decodeValue (D.keyValuePairs D.value) input of
Ok ok ->
E.object <| ( key, value ) :: ok
Err _ ->
input
> import Main
> import Json.Encode as E
> value = E.object [("a", E.int 1)]
{ a = 1 } : Json.Encode.Value
> value2 = Main.addKeyValue "b" E.null value
{ b = null, a = 1 } : Json.Encode.Value
If the input is not an object, this will return the input unchanged:
> Main.addKeyValue "b" E.null (E.int 1)
1 : Json.Encode.Value
If you want to do this, you need to use a decoder to unwrap the values by one level into a Dict String Value, then combine the dictionaries, and finally re-encode as a JSON value. You can unwrap like so:
unwrapObject : Value -> Result String (Dict String Value)
unwrapObject value =
Json.Decode.decodeValue (Json.Decode.dict Json.Decode.value) value
Notice that you have to work with Results from this point on because there's the possibility, as far as Elm is concerned, that your JSON value wasn't really an object (maybe it was a number or a string instead, for instance) and you have to handle that case. For that reason, it's not really best practice to do too much with JSON Values directly; if you can, keep things as Dicts or some other more informative type until the end of processing and then convert the whole result into a Value as the last step.
Related
I have a value in a JsObject which I want to assign to a specific key in a Map, and I want to ask if there is a better way to extract that value without using a case matcher.
I have access to a request variable which is a case class that has a parameter myData
myData is an Option[JsValue] which contains multiple fields and I want to return the boolean value of one specific field in there called “myField” in a string format. The below works, but I want to find a more succinct way of getting the value of "myField" without case matching.
val newMap =
  Map(
    "myNewKey" -> request.myData.map(_ match {
      case JsObject(fields) => fields.getOrElse("myField", "unknown").toString
      case _ => "unknown"
    })
  )
The output would then be
"myField": "true"
or
"myField": "false"
or if it isn't true or false, i.e. the field doesn't exist
"myField": "unknown"
Rather:
myMap ++ (request.myData \ "myField").validateOpt[String].asOpt.
toSeq.collect { // to map entries
case Some(str) => "keyInMap" -> str
}
Do not use .toString to serialize or pretty-print a JSON value; use Json.stringify or Json.prettyPrint instead.
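For the boolean field in the question specifically, a minimal sketch along these lines should also work, assuming request.myData is an Option[JsValue] and Play JSON is in scope (myFieldValue and newMap are illustrative names):

import play.api.libs.json._

// Read "myField" as a Boolean if present, render it as a string,
// and fall back to "unknown" otherwise.
val myFieldValue: String =
  request.myData
    .flatMap(js => (js \ "myField").asOpt[Boolean])
    .map(_.toString) // "true" / "false"
    .getOrElse("unknown")

val newMap = myMap + ("myNewKey" -> myFieldValue)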
I'm trying to parse JSON-LD, and one of the possible constructs is
"John" : {
"type": "person",
"friend": [ "Bob", "Jane" ],
}
I would like to decode into records of type
type alias Triple =
{ subject: String, predicate: String, object: String }
so the example above becomes:
Triple "John" "type" "person"
Triple "John" "friend" "Bob"
Triple "John" "friend" "Jane"
But "friend" in the JSON object could also be just a string:
"friend": "Mary"
in which case the corresponding triple would be
Triple "John" "friend" "Mary"
Any idea?
First, you'll need a way to list all key/value pairs from a JSON object. Elm offers the Json.Decode.keyValuePairs function for this purpose. It gives you a list of key names which you'll use for the predicate field, but you'll also have to describe a decoder for it to use for the values.
Since your values are either a string or a list of strings, you can use Json.Decode.oneOf to help. In this example, we'll just convert a string to a singleton list (e.g. "foo" becomes ["foo"]), just because it makes it easier to map over later.
stringListOrSingletonDecoder : Decoder (List String)
stringListOrSingletonDecoder =
JD.oneOf
[ JD.string |> JD.map (\s -> [ s ])
, JD.list JD.string
]
Since the output of keyValuePairs will be a list of (String, List String) values, we'll need a way to flatten those into a List (String, String) value. We can define that function like this:
flattenSnd : ( a, List b ) -> List ( a, b )
flattenSnd ( key, vals ) =
List.map (\val -> ( key, val )) vals
Now you can use these two functions to split up an object into a triple. This accepts a string argument which is the key to look up in your calling function (e.g. we need to look up the wrapping "John" key).
itemDecoder : String -> Decoder (List Triple)
itemDecoder key =
JD.field key (JD.keyValuePairs stringListOrSingletonDecoder)
|> JD.map
(List.map flattenSnd
>> List.concat
>> List.map (\( a, b ) -> Triple key a b)
)
See a working example here on Ellie.
Note that the order of keys may not match how you listed them in the input JSON, but that is just how JSON works: it's a lookup table, not an ordered list.
I'm trying to convert a string of key-value pairs to a JSON string. The only thing I know about the string of KV pairs is the format of the string, i.e. space separated, comma separated, etc. For example, I don't have control over the number or type of the keys coming in as input.
Here is what I came up with and wanted to see if this approach looks OK / awesome / awkward. I would appreciate it if there is a better alternative.
INPUT : clientIp="1.1.1.1" identifier="a.b.c" key1=10 key2="v3"
final val KV_PATTERN = "(\"[^\"]*\"|[^,\\\"\\s]*)=(\"[^\"]*\"|[^,\\\"\\s]*)".r
val cMap = KV_PATTERN.findAllMatchIn(inputString).map(m => (m.group(1).trim(), m.group(2).trim())).toMap
val json = cMap.map { case (key, value) => if (!key.startsWith("\"")) s""""$key"""" + ":" + value else s"$key:$value" }.mkString("{", ",", "}")
OUTPUT: {"clientIp":"1.1.1.1","identifier":"a.b.c","key1":10,"key2":"v3"}
"{"+ inputString.split(" ").map{case i => val t = i.split("="); s""""${t(0).replaceAll("^\"|\"$", "")}": ${t(1)}"""}.mkString(",") + "}"
Maybe this is cleaner.
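If a JSON library such as Play JSON is available (the question doesn't say, so this is an assumption), you can also reuse the regex from the question and let the library handle quoting and escaping; toJson below is an illustrative name:

import play.api.libs.json._
import scala.util.Try

val KV_PATTERN = "(\"[^\"]*\"|[^,\\\"\\s]*)=(\"[^\"]*\"|[^,\\\"\\s]*)".r

// Build a JsObject instead of concatenating strings; unquoted numeric values
// become JsNumber, everything else becomes JsString.
def toJson(input: String): JsObject =
  JsObject(
    KV_PATTERN.findAllMatchIn(input).map { m =>
      val key = m.group(1).stripPrefix("\"").stripSuffix("\"")
      val raw = m.group(2)
      val value: JsValue =
        if (raw.startsWith("\"")) JsString(raw.stripPrefix("\"").stripSuffix("\""))
        else Try(BigDecimal(raw)).toOption.map(JsNumber(_)).getOrElse(JsString(raw))
      key -> value
    }.toSeq
  )

// Json.stringify(toJson("""clientIp="1.1.1.1" identifier="a.b.c" key1=10 key2="v3""""))
// yields {"clientIp":"1.1.1.1","identifier":"a.b.c","key1":10,"key2":"v3"}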
I have an RDD[Map[String,Int]] where the keys of the maps are the column names. Each map is incomplete, and to know all the column names I would need to union all the keys. Is there a way to avoid the collect operation needed to learn all the keys, and to call rdd.saveAsTextFile(..) just once to get the CSV?
For example, say I have an RDD with two elements (scala notation):
Map("a"->1, "b"->2)
Map("b"->1, "c"->3)
I would like to end up with this csv:
a,b,c
1,2,0
0,1,3
Scala solutions are better but any other Spark-compatible language would do.
EDIT:
I can try to solve my problem from another direction also. Let's say I somehow know all the columns in the beginning, but I want to get rid of columns that have 0 value in all maps. So the problem becomes, I know that the keys are ("a", "b", "c") and from this:
Map("a"->1, "b"->2, "c"->0)
Map("a"->3, "b"->1, "c"->0)
I need to write the csv:
a,b
1,2
3,1
Would it be possible to do this with only one collect?
If your statement is: "every new element in my RDD may add a new column name I have not seen so far", the answer is that you obviously can't avoid a full scan. But you don't need to collect all the elements on the driver.
You could use aggregate to only collect the column names. This method takes two functions: one to insert a single element into the resulting collection, and another to merge results from two different partitions.
rdd.aggregate(Set.empty[String])( {(s, m) => s union m.keySet }, { (s1, s2) => s1 union s2 })
You will get back a set of all column names in the RDD. In a second scan you can print the CSV file.
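For the second scan, a minimal sketch could look like this, assuming the rdd of Map[String, Int] and the SparkContext sc from the question (colNames, header and lines are illustrative names; the header RDD's single partition comes before the data partitions in the union):

// Collect the column names with aggregate, fix an order,
// then map every row to a CSV line in a single pass.
val colNames = rdd
  .aggregate(Set.empty[String])((s, m) => s union m.keySet, _ union _)
  .toSeq
  .sorted

val header = colNames.mkString(",")
val lines = rdd.map(row => colNames.map(col => row.getOrElse(col, 0)).mkString(","))

sc.parallelize(Seq(header), 1).union(lines).saveAsTextFile("mycsv")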
Scala and any other supported language
You can use spark-csv
First let's find all the columns present:
val cols = sc.broadcast(rdd.flatMap(_.keys).distinct().collect())
Create RDD[Row]:
val rows = rdd.map {
row => { Row.fromSeq(cols.value.map { row.getOrElse(_, 0) })}
}
Prepare schema:
import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
val schema = StructType(
cols.value.map(field => StructField(field, IntegerType, true)))
Convert RDD[Row] to DataFrame:
val df = sqlContext.createDataFrame(rows, schema)
Write results:
// Spark 1.4+, for other versions see spark-csv docs
df.write.format("com.databricks.spark.csv").save("mycsv.csv")
You can do pretty much the same thing using other supported languages.
Python
If you use Python and the final data fits in driver memory you can use Pandas through the toPandas() method:
rdd = sc.parallelize([{'a': 1, 'b': 2}, {'b': 1, 'c': 3}])
cols = sc.broadcast(rdd.flatMap(lambda row: row.keys()).distinct().collect())
df = sqlContext.createDataFrame(
rdd.map(lambda row: {k: row.get(k, 0) for k in cols.value}))
df.toPandas().to_csv('mycsv.csv', index=False)
or directly:
import pandas as pd
pd.DataFrame(rdd.collect()).fillna(0).to_csv('mycsv.csv', index=False)
Edit
One possible way to avoid the second collect is to use accumulators, either to build a set of all column names or to count the zeros you found per column, and then use this information to map over the rows and remove unnecessary columns or add zeros.
It is possible but inefficient and feels like cheating. The only situation when it makes some sense is when the number of zeros is very low, but I guess that is not the case here.
import org.apache.spark.AccumulatorParam

object ColsSetParam extends AccumulatorParam[Set[String]] {
def zero(initialValue: Set[String]): Set[String] = {
Set.empty[String]
}
def addInPlace(s1: Set[String], s2: Set[String]): Set[String] = {
s1 ++ s2
}
}
val colSetAccum = sc.accumulator(Set.empty[String])(ColsSetParam)
rdd.foreach { m => colSetAccum += m.keys.toSet }
or
// We assume you know this upfront
val allColnames = sc.broadcast(Set("a", "b", "c"))
object ZeroColsParam extends AccumulatorParam[Map[String, Int]] {
def zero(initialValue: Map[String, Int]): Map[String, Int] = {
Map.empty[String, Int]
}
def addInPlace(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] = {
val keys = m1.keys ++ m2.keys
keys.map(
(k: String) => (k -> (m1.getOrElse(k, 0) + m2.getOrElse(k, 0)))).toMap
}
}
val accum = sc.accumulator(Map.empty[String, Int])(ZeroColsParam)
rdd.foreach { row =>
// If allColnames.value -- row.keys.toSet is empty we can avoid this part
accum += (allColnames.value -- row.keys.toSet).map(x => (x -> 1)).toMap
}
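To complete the picture, a hedged sketch of how the second accumulator could then be used (keepCols is an illustrative name; the write step mirrors the earlier sketch):

// A column counts as "all zero" here when its key was missing in every row,
// matching the foreach above; explicitly stored zeros would need a different
// accumulator. rowCount costs one extra pass over the data.
val rowCount = rdd.count()
val keepCols = allColnames.value.toSeq.sorted
  .filterNot(col => accum.value.getOrElse(col, 0).toLong == rowCount)

val header = keepCols.mkString(",")
val lines = rdd.map(row => keepCols.map(col => row.getOrElse(col, 0)).mkString(","))
sc.parallelize(Seq(header), 1).union(lines).saveAsTextFile("mycsv_pruned")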
It may be a simple question, but I am new to Scala and have not been able to find a proper solution.
I am trying to create a JSON object from Option values. I want to check that a value is not empty and only then add the field to the JSON object; if the value is None I don't want to include it. Without an else branch, the default else is Unit, which fails to build the Json object.
Json.obj(if(position.nonEmpty) ("position" -> position.get),
if(place.nonEmpty) ("place" -> place.get),
if(country.nonEmpty) ("country" -> country.get))
I need to put the if condition in so that the final JSON string looks like
{
"position": "M2",
"place": "place",
"country": "country"
}
val obj = for {
p <- position
o <- otherOption
...
} yield Json.obj(
"position" -> p,
"other" -> o)
This will only yield a Some of a JSON object if all the options are defined; otherwise None.
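If the goal is instead to include each field only when its Option is defined and drop it otherwise (which is what the desired output above suggests), a minimal sketch could look like this, assuming position, place and country are Option[String]:

import play.api.libs.json._

// Keep only the (key, Some(value)) pairs and turn them into JSON fields.
val obj: JsObject = JsObject(
  Seq(
    "position" -> position,
    "place" -> place,
    "country" -> country
  ).collect { case (key, Some(value)) => key -> JsString(value) }
)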
Option is a monad and there are a few convenient ways of using it.
First, if you want to extract the value, you can use map or flatMap together with getOrElse:
val res = position.map(value => Json.obj("position" -> value)).getOrElse(null)
Another way is to keep an Option of another type and use it later:
val jsonOption = position.map(value => Json.obj("position" -> value))
Afterwards you can use it in a for comprehension with other options, or perform other transformations without extracting the value:
for (positionJson <- jsonOption; xJson <- xJsonOption) yield positionJson.toString + xJson.toString
jsonOption.map(_.toString).foreach(print(_))
And always try to avoid pattern matching on monads.