I'm trying to serialize a Scala case class to a JSON string using Jerkson, like this:
import scala.collection.mutable
import com.codahale.jerkson.Json

case class Page(title: String, id: String, ls: List[(String, String, Int)])
val pageList = new mutable.ArrayBuffer[Page]()
val jsonString = Json.generate(pageList)
pageList is extremely large with several million Page objects.
The call fails with this exception:
Caused by: org.codehaus.jackson.map.JsonMappingException:
[no message for java.lang.ArrayIndexOutOfBoundsException]
You may want to consider using a streaming solution. You can use one of the Jackson Streaming APIs:
JsonGenerator jg = jsonFactory.createJsonGenerator(file, JsonEncoding.UTF8); // or Stream, Reader
or, you can use a TokenBuffer (which is considered best practice for some situations):
TokenBuffer buffer = new TokenBuffer(objectMapper); // TokenBuffer takes an ObjectCodec
// serialize object as JSON tokens (but don't serialize as JSON text!)
objectMapper.writeValue(buffer, myBean);
Details: Jackson Streaming Documentation
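For instance, here is a minimal sketch of the streaming approach for the question's Page list, using the Jackson 1.x (org.codehaus.jackson) classes from the stack trace; the file name is illustrative, and writing the ls field is elided for brevity:

import java.io.File
import org.codehaus.jackson.{JsonEncoding, JsonFactory}

val jg = new JsonFactory().createJsonGenerator(new File("pages.json"), JsonEncoding.UTF8)
jg.writeStartArray()
pageList.foreach { p =>
  jg.writeStartObject()
  jg.writeStringField("title", p.title)
  jg.writeStringField("id", p.id)
  // the ls tuples would be written here as a nested array
  jg.writeEndObject()
}
jg.writeEndArray()
jg.close()

This way the document is written out incrementally and no single String ever has to hold all of it.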
Given that you've got "several million" objects, I'm guessing you might be hitting the length limit of String. Try generating to an OutputStream, i.e. Json.generate(pageList, out).
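A minimal sketch of that suggestion, assuming the generate(obj, output) overload referred to above (the file name is illustrative):

import java.io.{BufferedOutputStream, FileOutputStream}
import com.codahale.jerkson.Json

val out = new BufferedOutputStream(new FileOutputStream("pages.json"))
try Json.generate(pageList, out)
finally out.close()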
I have a tuple consisting of a String and a Uuid that I serialize using serde_json:
let log_and_id = (String::from("Test string"), test_id);
let log_and_id_serialized = serde_json::to_string(&log_and_id)
.expect("Serialization failed");
//After serialization (debug print): "[\"Test string\",\"32a8e12d-69d2-421d-a52e-1ee76cc03ed5\"]"
Then I transfer this serialized value over the network and receive a BytesMut (serialized_tuple) on the other end, which I try to deserialize:
//Bytesmut value (debug print): b"\"[\\\"Test string\\\",\\\"32a8e12d-69d2-421d-a52e-1ee76cc03ed5\\\"]\""
let (log, operation_state_id) = serde_json::from_slice::<(String, Uuid)>(&serialized_tuple)?;
But I get the following error:
ERROR actix_http::response] Internal Server Error: SerdeError(Error("invalid type: string \"[\\\"Test string\\\",\\\"32a8e12d-69d2-421d-a52e-1ee76cc03ed5\\\"]\", expected a tuple of size 2", line: 1, column: 68))
(De)serializing single objects this way used to work in other parts of this code, so what could cause it to fail when used with tuples?
You don't have a serialized tuple; you have a doubly serialized tuple. That is, the serialization of the tuple, which was already a JSON string, was serialized again.
You can check this with this code (playground):
let serialized_tuple = b"\"[\\\"Test string\\\",\\\"32a8e12d-69d2-421d-a52e-1ee76cc03ed5\\\"]\"";
let serialized_tuple: String = serde_json::from_slice(serialized_tuple).unwrap();
let (log, operation_state_id) = serde_json::from_slice::<(String, String)>(serialized_tuple.as_bytes()).unwrap();
which produces the desired tuple.
Of course, rather than deserializing twice, you should remove the unnecessary serialization from your application (it's not in the code you've shown).
I'm using Flink to process data coming from a data source (such as Kafka, Pravega, etc.).
In my case, the data source is Pravega, which provides a Flink connector.
My data source is sending me some JSON data as below:
{"key": "value"}
{"key": "value2"}
{"key": "value3"}
...
...
Here is my piece of code:
PravegaDeserializationSchema<ObjectNode> adapter = new PravegaDeserializationSchema<>(ObjectNode.class, new JavaSerializer<>());
FlinkPravegaReader<ObjectNode> source = FlinkPravegaReader.<ObjectNode>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(adapter)
.build();
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> dataStream = env.addSource(source).name("Pravega Stream");
dataStream.map(new MapFunction<ObjectNode, String>() {
        @Override
        public String map(ObjectNode node) throws Exception {
            return node.toString();
        }
    })
    .keyBy("word") // ERROR
    .timeWindow(Time.seconds(10))
    .sum("count");
As you can see, I used the FlinkPravegaReader and a proper deserializer to get the JSON stream coming from Pravega.
Then I try to transform the JSON data into Strings, key them, and count them.
However, I get an error:
The program finished with the following exception:
Field expression must be equal to '*' or '_' for non-composite types.
org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:342)
org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:340)
myflink.StreamingJob.main(StreamingJob.java:114)
It seems that KeyBy threw this exception.
Well, I'm not a Flink expert, so I don't know why. I've read the source code of the official WordCount example. In that example, there is a custom splitter, which is used to split the String data into words.
So I'm wondering whether I need to use some kind of splitter in this case too. If so, what kind of splitter should I use? Can you show me an example? If not, why did I get this error, and how do I solve it?
I guess you have read the documentation on how to specify keys:
Specify keys
The example uses keyBy("word") because word is a field of the POJO type WC.
// some ordinary POJO (Plain old Java Object)
public class WC {
public String word;
public int count;
}
DataStream<WC> words = // [...]
DataStream<WC> wordCounts = words.keyBy("word").window(/*window specification*/);
In your case, you put a map operator before keyBy, and the output of that map operator is a String, so there is obviously no word field. If you actually want to group this String stream, you need to write it like this: .keyBy(String::toString)
Or you can even implement a customized KeySelector to generate your own key, as sketched below:
Customized Key Selector
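A rough sketch of such a selector (written here in Scala against Flink's Java DataStream API, like the rest of this thread's Scala snippets; the helper name and key logic are illustrative, not from the question):

import org.apache.flink.api.java.functions.KeySelector
import org.apache.flink.streaming.api.datastream.DataStream

// hypothetical helper: `strings` stands for the DataStream<String> produced
// by the map operator above; getKey returns whatever value the stream
// should be grouped by (here, simply the whole string)
def keyByWholeString(strings: DataStream[String]) =
  strings.keyBy(new KeySelector[String, String] {
    override def getKey(value: String): String = value
  })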
I am using Play Framework and I am trying to convert a Scala object to a JSON string.
Here is my code where I get my object:
val profile: Future[List[Profile]] = profiledao.getprofile(profileId);
The object is now in the profile value.
Now I want to convert that profile object, which is a Future[List[Profile]], to JSON, render the JSON as a string, and write it to a file.
Here is the code that I wrote so far:
val jsondata = Json.toJson(profile)
Jackson.toJsonString(jsondata)
This is how I am trying to convert to JSON, but it gives me the following output:
{"empty":false,"traversableAgain":true}
I am using the Jackson library to do the conversion.
Can someone help me with this?
Why bother with Jackson? If you're using Play, you have play-json available to you, which uses Jackson under the hood FWIW.
First, you need an implicit Writes to let play-json know how to serialize Profile. If Profile is a case class, you can do this:
import play.api.libs.json._
implicit val profileFormat = Json.format[Profile]
If not, define your own Writes by hand.
Then, since getprofile (which should follow convention and be getProfile) returns Future[List[Profile]], you can do this to get a Future[JsValue]:
val profilesJson = profiledao.getprofile(profileId).map(Json.toJson(_))
(profiledao should also be profileDao.)
In the end, you can wrap this in a Result like Ok and return that from your controller.
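Putting the pieces together, a minimal sketch of such a controller action (profileJson is a hypothetical name; profiledao and profileId come from the question):

import play.api.libs.json.Json
import play.api.mvc._
import scala.concurrent.ExecutionContext.Implicits.global

// hypothetical action: serialize the profiles once the Future completes
def profileJson(profileId: String) = Action.async {
  profiledao.getprofile(profileId).map { profiles =>
    Ok(Json.toJson(profiles)) // relies on the implicit Format[Profile] above
  }
}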
What is the fastest way to convert this
{"a":"ab","b":"cd","c":"cd","d":"de","e":"ef","f":"fg"}
into a mutable map in Scala? I read this input string from a ~500 MB file, which is why I'm concerned about speed.
If your JSON is as simple as in your example, i.e. a sequence of key/value pairs where each value is a string, you can do it in plain Scala:
myString.substring(1, myString.length - 1)
.split(",")
.map(_.split(":"))
.map { case Array(k, v) => (k.substring(1, k.length-1), v.substring(1, v.length-1))}
.toMap
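Since the question asks for a mutable map, here is the same approach end to end; myString is the example input from the question, and the final step copies the pairs into a mutable.Map instead of an immutable one:

import scala.collection.mutable

val myString = """{"a":"ab","b":"cd","c":"cd","d":"de","e":"ef","f":"fg"}"""
val pairs = myString.substring(1, myString.length - 1)
  .split(",")
  .map(_.split(":"))
  .map { case Array(k, v) => (k.substring(1, k.length - 1), v.substring(1, v.length - 1)) }
val result = mutable.Map(pairs: _*)
// result: Map("a" -> "ab", "b" -> "cd", ...)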
That looks like a JSON file, as Andrey says. You should consider this answer. It gives some example Scala code. Also, this answer gives some different JSON libraries and their relative merits.
The fastest way to read tree data structures in XML or JSON is by applying a streaming API: Jackson Streaming API To Read And Write JSON.
Streaming splits your input into tokens like 'beginning of an object' or 'beginning of an array', and you need to build a parser for these tokens, which in some cases is not a trivial task.
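For the flat {"key":"value"} shape in the question, though, the token loop stays small. A minimal sketch with the Jackson 2.x streaming API (the file path is a placeholder):

import java.io.File
import com.fasterxml.jackson.core.{JsonFactory, JsonToken}
import scala.collection.mutable

val parser = new JsonFactory().createParser(new File("input.json")) // placeholder path
val result = mutable.Map.empty[String, String]
parser.nextToken() // consume START_OBJECT
while (parser.nextToken() == JsonToken.FIELD_NAME) {
  val key = parser.getCurrentName
  parser.nextToken() // advance to the value token
  result += key -> parser.getText
}
parser.close()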
Keeping it simple: if you are reading a JSON string from a file and converting it to a Scala map:
import scala.io.Source
import spray.json._
import DefaultJsonProtocol._

val jsonStr = Source.fromFile(jsonFilePath).mkString
val jsonDoc = jsonStr.parseJson
val map_doc = jsonDoc.convertTo[Map[String, JsValue]]
// Get a map value by key
val key_value = map_doc("key").convertTo[String]
// If the JSON is nested, re-map it.
val key_map = map_doc("nested_key").convertTo[Map[String, JsValue]]
println("Nested Value " + key_map("key"))
I'm using playframework 2.1.0 with Anorm to query a db.
I want to serialize the result to JSON without going through any interim objects/case classes.
This is what the flow looks like.
Using Anorm:
DB.withConnection { implicit c =>
val q = SQL(""" long query goes here """)
q().toList
}
Then I take this result and transform it from a List[SqlRow] to a List[Map[String, Any]],
where the String is the column name and the Any is the value (Object/Any):
val asMap = info.toList.map(row => scala.collection.immutable.Map(row.asMap.toSeq: _*))
Then I'd like to turn this into JSON.
I tried some JSON libraries: GSON, spray-json, the Play Framework JSON library.
But none of them seems to work with Any out of the box.
I tried writing an implicit writer for the Any type with some pattern matching, but the problem is that this writer always overtakes all the other writers, so the JSON is not produced correctly.
Any advice? How would you suggest transforming a result from Anorm to JSON, without any interim domain models?
I found a solution, not the best one, using FlexJson.
The annoying thing is that FlexJson is not very Scala-oriented, so Scala collections and some Scala types need to be converted to their Java equivalents.
import java.util
import scala.collection.JavaConversions
import flexjson.JSONSerializer

// load the rows from the DB using Anorm, as described above
val info: List[SqlRow] = ???
// transform the Scala maps to Java maps
val asMap: List[util.Map[String, Any]] = info.map(row => JavaConversions.mapAsJavaMap(row.asMap))
// create the basic FlexJson serializer
val flexJson: JSONSerializer = new flexjson.JSONSerializer()
// register an Option transformer so it can serialize Options correctly
flexJson.transform(new flexjson.transformer.AbstractTransformer {
  def transform(`object`: Any) {
    `object`.asInstanceOf[Option[_]] match {
      case None => getContext.write("null")
      case Some(b: Any) => getContext.transform(b)
    }
  }
}, classOf[Option[_]])
// finally, convert the Scala List to a Java List and run the serializer on it
val infoJsn: String = flexJson.deepSerialize(JavaConversions.seqAsJavaList(asMap))