I have an example case class Child and I want to serialize it to JSON and print to the screen. This works fine but I don't want to serialize all of the fields, I want to ignore the name field but I can't figure out how. I have tried the following but no luck:
import org.json4s.{DefaultFormats, FieldSerializer, Formats}
import org.json4s.FieldSerializer.ignore
import org.json4s.native.Serialization.write

object test extends App {
  case class Child(id: Int, name: String, address: String)
  val simon = Child(1, "Simon", "Fake street")
  implicit val formats: Formats = DefaultFormats + FieldSerializer[Child](ignore("name"))
  val output = write(simon)
  println(output)
}
The above prints out the whole object, {"id":1,"name":"Simon","address":"Fake street"}, but I want everything except the name field:
{"id":1,"address":"Fake street"}
I need to get only the id field from source. My data class:
data class Article(
    val sourceId: String,
    val author: String,
    val title: String,
    ...
)
The converter factory is GsonConverterFactory.
In the JSON you provided, source is a complex object, so you can't define it as a string unless you create a custom deserialiser. A quick way to get this working, though, is to create another set of classes to mimic the JSON structure, like this:
data class Source(
    val id: String
)

data class Article(
    val source: Source,
    val author: String,
    val title: String
)
Then you can use it this way:
import com.google.gson.GsonBuilder

fun main() {
    val json = """
        {
          "source": {
            "id": "bbc-news",
            "name": "BBC News"
          },
          "author": "BBC News",
          "title": "Afrobeat pioneer Tony Allen dies aged 79"
        }
    """.trimIndent()
    val gson = GsonBuilder().create()
    val article = gson.fromJson(json, Article::class.java)
    println(article)
}
This prints: Article(source=Source(id=bbc-news), author=BBC News, title=Afrobeat pioneer Tony Allen dies aged 79).
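From there, the single value you asked for is just article.source.id, which gives "bbc-news".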
I'm trying to use circe to decode a JSON object into a list of objects. I only want to use some of the fields of the JSON response to create the object, so I feel like I have to create a custom decoder.
The class I want to make a sequence of is defined as follows:
case class Review(Id: String, ProductId: String, Rating: Int)
I tried creating a custom decoder like this:
implicit val reviewDecoder: Decoder[Review] = Decoder.instance { c =>
  val resultsC = c.downField("Results")
  for {
    id        <- resultsC.downArray.get[String]("Id")
    productId <- resultsC.downArray.get[String]("ProductId")
    rating    <- resultsC.downArray.get[Int]("Rating")
  } yield Review(id, productId, rating)
}
reviewDecoder.decodeJson(json) seems to decode only the first result, not all of them.
I have a JSON response like this:
{
  "Limit": 2,
  "Offset": 0,
  "TotalResults": 31,
  "Locale": "en_US",
  "Results": [
    {
      "Id": "14518388",
      "CID": "21a9436b",
      "ProductId": "Product11",
      "AuthorId": "jcknwekjcnwjk",
      "Rating": 3
    },
    {
      "Id": "14518035",
      "CID": "8d67b6f5",
      "ProductId": "Product11",
      "AuthorId": "fnkjwernfk",
      "Rating": 3
    }
  ],
  "Includes": {},
  "HasErrors": false,
  "Errors": []
}
I want to be able to parse this JSON object using circe to create a Seq[Review], but I'm stumped how.
**Edit:** Luis' answer does answer this question, but suppose I have a more complicated class I want to create a sequence of:
case class User(id: Int)
case class Review(user: User, ProductId: String, Rating: Int)
How would I be able to create a sequence of Reviews in this case?
I would just use the cursor to get to the array and then use the generic decoder.
The following code was tested in Ammonite, where json is a string containing your sample input.
import $ivy.`io.circe::circe-core:0.11.1`
import $ivy.`io.circe::circe-generic:0.11.1`
import $ivy.`io.circe::circe-parser:0.11.1`
import io.circe.{Decoder, Json}
import io.circe.parser.parse
final case class Review(Id: String, ProductId: String, Rating: Int)
implicit val reviewDecoder: Decoder[Review] = io.circe.generic.semiauto.deriveDecoder
parse(json).getOrElse(Json.Null).hcursor.downField("Results").as[List[Review]]
// res: io.circe.Decoder.Result[List[Review]] = Right(List(Review("14518388", "Product11", 3), Review("14518035", "Product11", 3)))
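For the more complicated class from the edit, the same cursor walk still works; only the per-element decoder changes. A sketch, assuming the user id arrives in a numeric field I'll call "UserId" (hypothetical; the sample payload only shows string AuthorIds, so adjust the field name and type to the real response):

final case class User(id: Int)
final case class Review(user: User, ProductId: String, Rating: Int)

implicit val reviewDecoder: Decoder[Review] = Decoder.instance { c =>
  for {
    userId    <- c.get[Int]("UserId") // hypothetical field name
    productId <- c.get[String]("ProductId")
    rating    <- c.get[Int]("Rating")
  } yield Review(User(userId), productId, rating)
}

// The List decoder applies reviewDecoder to every element of "Results":
parse(json).getOrElse(Json.Null).hcursor.downField("Results").as[List[Review]]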
I am trying to programmatically enforce a schema on a textFile that looks like JSON. I tried jsonFile, but the issue is that to create a DataFrame from a list of JSON files, Spark has to do one pass through the data to infer the schema. So it needs to parse all the data, which takes a long time (4 hours, since my data is gzipped and terabytes in size). Instead, I want to read it as a textFile and enforce a schema covering only the fields I'm interested in, to query the resulting DataFrame later. But I am not sure how to map that onto the input. Can someone give me a reference on how to map a schema onto JSON-like input?
This is the full schema:

records: org.apache.spark.sql.DataFrame = [country: string, countryFeatures: string, customerId: string, homeCountry: string, homeCountryFeatures: string, places: array<struct<freeTrial:boolean,placeId:string,placeRating:bigint>>, siteName: string, siteId: string, siteTypeId: string, Timestamp: bigint, Timezone: string, countryId: string, pageId: string, homeId: string, pageType: string, model: string, requestId: string, sessionId: string, inputs: array<struct<inputName:string,inputType:string,inputId:string,offerType:string,originalRating:bigint,processed:boolean,rating:bigint,score:double,methodId:string>>]

But I am only interested in a few fields. A sample input record looks like this:

res45: Array[String] = Array({"requestId":"bnjinmm","siteName":"bueller","pageType":"ad","model":"prepare","inputs":[{"methodId":"436136582","inputType":"US","processed":true,"rating":0,"originalRating":1},{"methodId":"23232322","inputType":"UK","processed":false,"rating":0,"originalRating":1}]})
val records = sc.textFile("s3://testData/sample.json.gz")

val schema = StructType(Array(
  StructField("requestId", StringType, true),
  StructField("siteName", StringType, true),
  StructField("model", StringType, true),
  StructField("pageType", StringType, true),
  StructField("inputs", ArrayType(
    StructType(
      StructField("inputType", StringType, true),
      StructField("originalRating", LongType, true),
      StructField("processed", BooleanType, true),
      StructField("rating", LongType, true),
      StructField("methodId", StringType, true)
    ), true), true)))

val rowRDD = ??
val inputRDD = sqlContext.applySchema(rowRDD, schema)
inputRDD.registerTempTable("input")
sql("select * from input").foreach(println)
Is there any way to map this? Or do I need to use a JSON parser or something? I want to use textFile only because of the constraints.
Tried with:
val records = sqlContext.read.schema(schema).json("s3://testData/test2.gz")
But I keep getting this error:
<console>:37: error: overloaded method value apply with alternatives:
(fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
(fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
(fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
StructField("inputs",ArrayType(StructType(StructField("inputType",StringType,true), StructField("originalRating",LongType,true), StructField("processed",BooleanType,true), StructField("rating",LongType,true), StructField("score",DoubleType,true), StructField("methodId",StringType,true)),true),true)))
^
It can be loaded with the following code using a predefined schema; Spark doesn't need to go through the whole gzipped file to infer one. The code in the question is ambiguous: as the error says, StructType's apply expects an Array (or Seq) of StructFields, not the fields passed directly.
import org.apache.spark.sql.types._
val input = StructType(
Array(
StructField("inputType",StringType,true),
StructField("originalRating",LongType,true),
StructField("processed",BooleanType,true),
StructField("rating",LongType,true),
StructField("score",DoubleType,true),
StructField("methodId",StringType,true)
)
)
val schema = StructType(Array(
StructField("requestId",StringType,true),
StructField("siteName",StringType,true),
StructField("model",StringType,true),
StructField("inputs",
ArrayType(input,true),
true)
)
)
val records = sqlContext.read.schema(schema).json("s3://testData/test2.gz")
Not all of the fields need to be provided, though it's good to provide all of them if possible. Spark tries its best to parse every row; if a row is not valid, it adds a _corrupt_record column containing the whole raw row. The same applies if it's a plain (uncompressed) JSON file.
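If you really must start from lines of text, a newer Spark (2.x) can apply the same predefined schema per line with from_json. A sketch, assuming Spark 2.x with a SparkSession named spark; the sqlContext-era API in the question has no direct equivalent:

import org.apache.spark.sql.functions.from_json
import spark.implicits._

// Read the gzipped file line by line as strings, then parse each line
// against the schema above; no inference pass over the data.
val raw = spark.read.textFile("s3://testData/test2.gz")
val records = raw
  .select(from_json($"value", schema).as("record"))
  .select("record.*")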
I have been having major problems trying to deserialize a JSON array to Scala objects:
[{"name":"Cool","city":"College Park","address":"806","id":1},{"name":"Mars ","city":"Durham","address":"12","id":2},{"name":"Something","city":"Raleigh
","address":"","id":3},{"name":"test","city":"","address":"","id":5}]
I have tried gson, jerkson (a Jackson Scala wrapper), sjson, and flexjson. None of them has worked. What I want here is a list of customers: List[Customer].
This is the closest I've got:
val array = new JsonParser().parse(customers).getAsJsonArray()
This gave me a JsonArray with four elements, but obviously no Customer objects. Then I tried Jerkson:
val array = parse[List[Customer]](customers)
But I get this:
GenericSignatureFormatError occured : null
I'm just trying to find a simple way like I would in Java.
Here is my Scala class.
case class Customer(
  id: Pk[Int],
  name: String,
  address: Option[String],
  city: Option[String],
  state: Option[String],
  user_id: Int)

object Customer extends Magic[Customer](Option("Customer")) {

  def apply(name: String, address: String, city: String, state: String, user_id: Int) = {
    new Customer(NotAssigned, name, Some(address), Some(city), Some(state), user_id)
  }

  def delete(id: Int) = {
    SQL("DELETE from Customer where id = {id}").onParams(id).executeUpdate()
  }
}
Thanks for any help.
With gson, you could write your own json reader:
import java.io.Reader
import scala.collection.mutable.ListBuffer
import com.google.gson.stream.JsonReader

case class Customer(id: Int, name: String, address: Option[String],
                    city: Option[String], state: Option[String], user_id: Int)

object CustomerJsonReader {

  def read(in: Reader) = readCustomers(new JsonReader(in))

  def readCustomers(reader: JsonReader) = {
    val customers = new ListBuffer[Customer]
    reader.beginArray()
    while (reader.hasNext()) {
      customers += readCustomer(reader)
    }
    reader.endArray()
    customers.toList
  }

  def readCustomer(reader: JsonReader): Customer = {
    var id = 0
    var customerName = ""
    var address: Option[String] = None
    var city: Option[String] = None
    var state: Option[String] = None
    var userId = 0
    reader.beginObject()
    while (reader.hasNext()) {
      val name = reader.nextName()
      name match {
        case "id"      => id = reader.nextInt()
        case "address" => address = Some(reader.nextString())
        case "state"   => state = Some(reader.nextString())
        case "user_id" => userId = reader.nextInt()
        case "name"    => customerName = reader.nextString()
        case "city"    => city = Some(reader.nextString())
        case _         => reader.skipValue()
      }
    }
    reader.endObject()
    Customer(id, customerName, address, city, state, userId)
  }
}
import java.io.StringReader

val json =
  """
  [{"name":"Cool","city":"College Park","address":"806","id":1},
   {"name":"Mars ","city":"Durham","address":"12","id":2},
   {"name":"Something","city":"Raleigh ","address":"","id":3},
   {"name":"test","city":"","address":"","id":5}]
  """

val customers: List[Customer] =
  CustomerJsonReader.read(new StringReader(json))
I know that with gson you would need an Array instead of a scala.List; I would suggest giving that a shot with gson.fromJson, I think.
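A minimal sketch of that idea, assuming Gson is on the classpath (note that the Option fields would still need a custom TypeAdapter; this only shows the array side):

import com.google.gson.Gson

// Gson understands plain JVM arrays natively, unlike scala.List.
val gson = new Gson()
val customers: Array[Customer] = gson.fromJson(json, classOf[Array[Customer]])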
You can also try Jerkson (= Jackson + Scala). It's quite easy to use, even though I had problems with special JSON field names containing "-".
A small tutorial I saw on Twitter recently: http://logician.free.fr/index.php/2011/09/16/play-scala-and-json/
I've been driven insane by this too and went through GSON, Lift-Json, Sjson and finally Jerkson, and found peace with that one.
Here's how I use it in combination with Play:
http://logician.eu/index.php/2011/09/16/play-scala-and-json/
http://logician.eu/index.php/2011/11/01/writing-custom-deserializers-for-jerkson/
I use Lift's json library for this purpose, it easily lets you parse JSON and extract values into case classes. It's packaged as a separate jar so you don't need the whole lift framework to use it.
import net.liftweb.json._
import net.liftweb.json.JsonDSL._
implicit val formats = DefaultFormats
val json: String = "[{...."
val parsed: JValue = parse(json)
val customers: List[Customer] = parsed.extract[List[Customer]]
Just make sure any optional fields are declared in the case class as Option. I noticed in your JSON the objects are missing the user_id field, which would cause a parse error if user_id is declared as Int instead of Option[Int].
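One more wrinkle: lift-json won't know how to extract Anorm's Pk[Int] from your Customer class. A sketch of a workaround, assuming it's acceptable to parse into a plain intermediate case class (a hypothetical CustomerJson) and map it to the real Customer afterwards:

// Intermediate class used for parsing only; map to the real Customer
// (with Pk and NotAssigned) after extraction.
case class CustomerJson(
  id: Int,
  name: String,
  address: Option[String],
  city: Option[String],
  state: Option[String],
  user_id: Option[Int])

val parsedCustomers: List[CustomerJson] = parse(json).extract[List[CustomerJson]]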
Aside from trying to make Jerkson work (which is a great library to use, from what I have heard), you could also try Jackson's Scala module. Modules are the official way Jackson is extended to deal with third-party datatypes as well as the native datatypes and constructs of other JVM languages.
(This is not to say the Scala module is more official than Jerkson, just that there are many useful Jackson extension modules that many developers are not familiar with.)
Issues with the Scala module are discussed on the main Jackson mailing list (user@jackson.codehaus.org); you may have found an edge case that could be fixed.
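A quick sketch of the Scala module in action, assuming current jackson-module-scala coordinates (the API may differ in older versions):

import com.fasterxml.jackson.core.`type`.TypeReference
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Register the Scala module so Jackson understands Scala collections,
// Options, and case classes.
val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

// TypeReference preserves the List's element type despite erasure.
val customers: List[Customer] =
  mapper.readValue(json, new TypeReference[List[Customer]] {})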
I have written a parser/validator DSL which lets you explicitly resolve any type-erasure issues. Out of the box it handles case classes, tuples, Option, Either, List, Map, Joda DateTime, piping to functions, multiple key mapping, and key name remapping.
It uses the Jackson parser:
https://github.com/HigherState/jameson
It's pretty simple with Scala and the Play JSON library:

import play.api.libs.json.{Format, Json}

case class Customer(name: String, city: String, address: String, id: Int)

object Customer {
  implicit val jsonFormat: Format[Customer] = Json.format[Customer]
}

val jsonDef = Json.parse(<json-string>)
val customers = jsonDef.as[List[Customer]]

customers is a list of Customer objects.