I have code like so:
def query(buckets: List[String]): Future[Seq[(List[Option[String]], Option[Double])]] = {
  database.run {
    groupBy(row => buckets.map(bucket => customBucketer(row.metadata, bucket)))
      .map { grouping =>
        val bucket = grouping._1
        val group = grouping._2
        (bucket, group.map(_.value).avg)
      }
      .result
  }
}

private def customBucketer(metadata: Rep[Option[String]], bucket: String): Rep[Option[String]] = {
  ...
}
I want to be able to create queries in Slick that group by and aggregate over a given list of columns.
The error I get upon compilation is:
[error] Slick does not know how to map the given types.
[error] Possible causes: T in Table[T] does not match your * projection,
[error] you use an unsupported type in a Query (e.g. scala List),
[error] or you forgot to import a driver api into scope.
[error] Required level: slick.lifted.FlatShapeLevel
[error] Source type: List[slick.lifted.Rep[Option[String]]]
[error] Unpacked type: T
[error] Packed type: G
[error] groupBy(row => buckets.map(bucket => customBucketer(row.metadata, bucket)))
[error] ^
Here's a workaround for Slick 3.2.3 (and some background on my approach):
You may have noticed that dynamically selecting columns is easy as long as you can assume a fixed type, e.g.:
val columnNames = List("col1", "col2")
tableQuery.map(r => columnNames.map(name => r.column[String](name)))
But if you try a similar approach with a groupBy operation, Slick will complain that it "does not know how to map the given types".
So, while this is hardly an elegant solution, you can at least satisfy Slick's type safety by statically defining both:
the groupBy column type
an upper/lower bound on the number of groupBy columns
A simple way of implementing these two constraints is to again assume a fixed type and to branch the code for each possible number of groupBy columns.
Here's the full working Scala REPL session to give you an idea:
import java.io.File
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
import slick.jdbc.H2Profile.api._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
val confPath = getClass.getResource("/application.conf")
val config = ConfigFactory.parseFile(new File(confPath.getPath)).resolve()
val db = Database.forConfig("slick.db", config)
implicit val system = ActorSystem("testSystem")
implicit val executionContext = system.dispatcher
case class AnyData(a: String, b: String)
case class GroupByFields(a: Option[String], b: Option[String])
class AnyTable(tag: Tag) extends Table[AnyData](tag, "macro") {
  def a = column[String]("a")
  def b = column[String]("b")
  def * = (a, b) <> ((AnyData.apply _).tupled, AnyData.unapply)
}
val table = TableQuery[AnyTable]
def groupByDynamically(groupBys: Seq[String]): DBIO[Seq[GroupByFields]] = {
  // ensures columns are returned in the right order
  def selectGroups(g: Map[String, Rep[Option[String]]]) = {
    (g.getOrElse("a", Rep.None[String]), g.getOrElse("b", Rep.None[String])).mapTo[GroupByFields]
  }

  val grouped = if (groupBys.lengthCompare(2) == 0) {
    table
      .groupBy(cols => (cols.column[String](groupBys(0)), cols.column[String](groupBys(1))))
      .map { case (groups, _) => selectGroups(Map(groupBys(0) -> Rep.Some(groups._1), groupBys(1) -> Rep.Some(groups._2))) }
  } else {
    // there should always be at least one groupBy column specified
    table
      .groupBy(cols => cols.column[String](groupBys.head))
      .map { case (groups, _) => selectGroups(Map(groupBys.head -> Rep.Some(groups))) }
  }

  grouped.result
}
val actions = for {
  _ <- table.schema.create
  _ <- table.map(a => (a.column[String]("a"), a.column[String]("b"))) += ("a1", "b1")
  _ <- table.map(a => (a.column[String]("a"), a.column[String]("b"))) += ("a2", "b2")
  _ <- table.map(a => (a.column[String]("a"), a.column[String]("b"))) += ("a2", "b3")
  queryResult <- groupByDynamically(Seq("b", "a"))
} yield queryResult

val result: Future[Seq[GroupByFields]] = db.run(actions.transactionally)
result.foreach(println)
Await.ready(result, Duration.Inf)
Where this gets ugly is when you can have upwards of a few groupBy columns (a separate if branch for 10+ cases would get monotonous). Hopefully someone will swoop in and edit this answer with a way to hide that boilerplate behind some syntactic sugar or abstraction layer; one possible direction is sketched below.
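One untested sketch of such an abstraction (it assumes the selectGroups helper above is hoisted out of groupByDynamically): pad the requested columns up to the fixed tuple arity by repeating the last one, since duplicated GROUP BY columns do not change the grouping.
def groupByDynamically2(groupBys: Seq[String]): DBIO[Seq[GroupByFields]] = {
  // Pad to the fixed arity; GROUP BY col, col is equivalent to GROUP BY col.
  val padded = groupBys.padTo(2, groupBys.last)
  table
    .groupBy(cols => (cols.column[String](padded(0)), cols.column[String](padded(1))))
    .map { case (groups, _) =>
      // zip truncates to the columns actually requested
      selectGroups(groupBys.zip(Seq(Rep.Some(groups._1), Rep.Some(groups._2))).toMap)
    }
    .result
}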
Related
I have a DataFrame df4 with three columns:
id, annotating the entity
data, holding JSON array data
executor_id, a string value
The code to create it is as follows:
val df1 = Seq((1, "n1", "d1")).toDF("id", "number", "data")
val df2 = df1.withColumn("data", to_json(struct($"number", $"data"))).groupBy("id").agg(collect_list($"data").alias("data")).withColumn("executor_id", lit("e1"))
val df3 = df1.withColumn("data", to_json(struct($"number", $"data"))).groupBy("id").agg(collect_list($"data").alias("data")).withColumn("executor_id", lit("e2"))
val df4 = df2.union(df3)
The content of df4 looks like:
scala> df4.show(false)
+---+-----------------------------+-----------+
|id |data |executor_id|
+---+-----------------------------+-----------+
|1 |[{"number":"n1","data":"d1"}]|e1 |
|1 |[{"number":"n1","data":"d1"}]|e2 |
+---+-----------------------------+-----------+
I have to create new JSON data with executor_id as the key and data as the JSON value, grouped by id. The resulting DataFrame should look like:
+---+------------------------------------------------------------------------+
|id |new_data |
+---+------------------------------------------------------------------------+
|1 |{"e1":[{"number":"n1","data":"d1"}], "e2":[{"number":"n1","data":"d1"}]}|
+---+------------------------------------------------------------------------+
Versions:
Spark: 2.2
Scala: 2.11
I have been struggling with this problem for the past three days and was finally able to work around it using a UserDefinedAggregateFunction. Here is sample code:
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

class CustomAggregator extends UserDefinedAggregateFunction {
  override def inputSchema: org.apache.spark.sql.types.StructType =
    StructType(Array(StructField("key", StringType), StructField("value", ArrayType(StringType))))

  // These are the internal fields kept for computing the aggregate
  override def bufferSchema: StructType = StructType(
    Array(StructField("mapData", MapType(StringType, ArrayType(StringType))))
  )

  // This is the output type of the aggregation function
  override def dataType = StringType

  override def deterministic: Boolean = true

  // This is the initial value for the buffer
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = scala.collection.mutable.Map[String, String]()
  }

  // This is how to update the buffer given an input row
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getMap(0) + (input.getAs[String](0) -> input.getAs[String](1))
  }

  // This is how to merge two buffers of the bufferSchema type
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1.update(0, buffer1.getAs[Map[String, Any]](0) ++ buffer2.getAs[Map[String, Any]](0))
  }

  // This is where the final value is produced from the buffer
  override def evaluate(buffer: Row): Any = {
    val map = buffer(0).asInstanceOf[Map[Any, Any]]
    val buff: ListBuffer[String] = ListBuffer()
    for ((k, v) <- map) {
      val valArray = v.asInstanceOf[mutable.WrappedArray[Any]].array
      val tmp = {
        for {
          valString <- valArray
        } yield valString.toString
      }.toList.mkString(",")
      buff += "\"" + k.toString + "\":[" + tmp + "]"
    }
    "{" + buff.toList.mkString(",") + "}"
  }
}
Now, using the custom aggregator:
val ca = new CustomAggregator
val df5 = df4.groupBy("id").agg(ca(col("executor_id"), col("data")).as("jsonData"))
The resulting DataFrame is:
scala> df5.show(false)
+---+-----------------------------------------------------------------------+
|id |jsonData |
+---+-----------------------------------------------------------------------+
|1 |{"e1":[{"number":"n1","data":"d1"}],"e2":[{"number":"n1","data":"d1"}]}|
+---+-----------------------------------------------------------------------+
Even though I have solved this problem, I am not sure whether this is the right way or not. My reasons for doubt are:
In places I have used Any, which I don't feel is correct.
For each evaluation I am creating a ListBuffer and many other objects; I am not sure about the performance of the code.
I still have to test the code with many data types (double, date, nested JSON, etc.) as data.
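For comparison, here is an untested sketch of a UDAF-free alternative that stays on Spark 2.2 by assembling the JSON string with built-in functions (the column names are those of df4 above):
import org.apache.spark.sql.functions._

// A sketch only: build "executor_id":[data...] per row, then join the
// fragments per id and wrap them in braces.
val df5 = df4.groupBy("id").agg(
  concat(
    lit("{"),
    concat_ws(",", collect_list(
      concat(lit("\""), $"executor_id", lit("\":["), concat_ws(",", $"data"), lit("]"))
    )),
    lit("}")
  ).as("jsonData")
)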
I'm trying to create some formats for this:
case class Work[T](todo: Seq[T], failed: Seq[T], success: Seq[T])

object Work {
  implicit def format[T](implicit r: Reads[T], w: Writes[T]): Format[Work[T]] = Json.format[Work[T]]
}

object InternalMessage {
  implicit def format[D, R](implicit
    rD: Reads[D],
    wD: Writes[D],
    rR: Reads[R],
    wR: Writes[R]
  ): Format[InternalMessage[D, R]] = Json.format[InternalMessage[D, R]]
}

case class InternalMessage[D, R](
  download: List[Work[D]],
  refine: List[Work[R]],
  numberOfTries: Int
)
This doesn't work and I don't understand why. The error is:
[error] /home/oleber/develop/data-platform/modules/importerTemplate/src/main/scala/template/TemplateModel.scala:46: No apply function found matching unapply parameters
[error] implicit def format[T](implicit r: Reads[T], w: Writes[T]): Format[Work[T]] = Json.format[Work[T]]
[error] ^
[error] /home/oleber/develop/data-platform/modules/importerTemplate/src/main/scala/template/TemplateModel.scala:55: No apply function found matching unapply parameters
[error] ): Format[InternalMessage[D, R]] = Json.format[InternalMessage[D, R]]
Thanks for any help
AFAIU you can't generate a JSON Format for such data types using Play 2.5 macros. What is limiting you is that you use Seq[T]: you want JSON serialization for a generic class whose fields use a complex type built from the generic type, rather than just the raw generic T. The relevant code seems to be in JsMacroImpl, lines 135-141. In those lines, inside the paramsMatch inner method of the maybeApply calculation, the macro checks whether there are apply and unapply methods with matching (complementary) signatures:
val maybeApply = applies.collectFirst {
  case (apply: MethodSymbol) if hasVarArgs && {
    // Option[List[c.universe.Type]]
    val someApplyTypes = apply.paramLists.headOption.map(_.map(_.asTerm.typeSignature))
    val someInitApply = someApplyTypes.map(_.init)
    val someApplyLast = someApplyTypes.map(_.last)
    val someInitUnapply = unapplyReturnTypes.map(_.init)
    val someUnapplyLast = unapplyReturnTypes.map(_.last)
    val initsMatch = someInitApply == someInitUnapply
    val lastMatch = (for {
      lastApply <- someApplyLast
      lastUnapply <- someUnapplyLast
    } yield lastApply <:< lastUnapply).getOrElse(false)
    initsMatch && lastMatch
  } => apply
  case (apply: MethodSymbol) if {
    val applyParams = apply.paramLists.headOption.
      toList.flatten.map(_.typeSignature)
    val unapplyParams = unapplyReturnTypes.toList.flatten
    def paramsMatch = (applyParams, unapplyParams).zipped.forall {
      case (TypeRef(NoPrefix, applyParam, _),
        TypeRef(NoPrefix, unapplyParam, _)) => // for generic parameter
        applyParam.fullName == unapplyParam.fullName
      case (applyParam, unapplyParam) => applyParam =:= unapplyParam
    }
    (applyParams.size == unapplyParams.size && paramsMatch)
  } => apply
}
As you can see in the relevant (non-hasVarArgs) branch, this code only handles two cases: when the types in apply and unapply match exactly, or when the same raw generic type is used. Your case of Seq[T] (as well as other complex generic types) is not handled here, and the error you get is generated just a few lines down when maybeApply is empty:
val (tparams, params) = maybeApply match {
  case Some(apply) => {
    apply.typeParams -> apply.paramLists.head // assume there is a single parameter group
  }
  case None => c.abort(c.enclosingPosition, "No apply function found matching unapply parameters")
}
In Play-JSON 2.6 this code was significantly reworked, and it now seems to support your case as well (see the conforms inner method).
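If that holds, on 2.6 your original declaration should compile essentially as written. A minimal sketch (assuming an implicit Format[T] in scope; not verified against a released 2.6 build, since 2.6 is not out yet):
object Work {
  // The reworked macro is expected to cope with the Seq[T] fields directly.
  implicit def format[T: Format]: Format[Work[T]] = Json.format[Work[T]]
}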
If upgrading to the (not yet released) Play-JSON 2.6 is not acceptable, you can create the Format object(s) yourself using the JSON Reads/Writes/Format Combinators documentation. Something like this should work for you:
import play.api.libs.json._
import play.api.libs.functional.syntax._

implicit def format[T](implicit r: Reads[T], w: Writes[T]): Format[Work[T]] = {
  val workReads = (
    (JsPath \ "todo").read[Seq[T]] and
    (JsPath \ "failed").read[Seq[T]] and
    (JsPath \ "success").read[Seq[T]]
  )(Work.apply[T] _)

  val workWrites = (
    (JsPath \ "todo").write[Seq[T]] and
    (JsPath \ "failed").write[Seq[T]] and
    (JsPath \ "success").write[Seq[T]]
  )(unlift(Work.unapply[T]))

  new Format[Work[T]] {
    override def reads(json: JsValue): JsResult[Work[T]] = workReads.reads(json)
    override def writes(o: Work[T]): JsValue = workWrites.writes(o)
  }
}
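A quick illustration of the hand-written format in use (Int stands in for any T that has a Format):
val w = Work(todo = Seq(1, 2), failed = Seq.empty[Int], success = Seq(3))
val js = Json.toJson(w)            // {"todo":[1,2],"failed":[],"success":[3]}
val back = js.validate[Work[Int]]  // roughly JsSuccess(Work(List(1, 2),List(),List(3)))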
So I have the following code:
Base:
import play.api.libs.json.{JsNull, Json, JsValue, Writes}
case class Cost(cost: Option[Double])
This compiles:
case object Cost {
  def writes = new Writes[Cost] {
    override def writes(r: Cost): JsValue = {
      val cost = r.cost.map(Json.toJson(_)).getOrElse(JsNull)
      Json.obj(
        "cost" -> cost
      )
    }
  }
}
But this doesn't compile:
case object Cost {
  def writes = new Writes[Cost] {
    override def writes(r: Cost): JsValue = {
      Json.obj(
        "cost" -> r.cost.map(Json.toJson(_)).getOrElse(JsNull)
      )
    }
  }
}
The compiler error is the following:
type mismatch;
[error] found : Object
[error] required: play.api.libs.json.Json.JsValueWrapper
[error] "cost" -> r.cost.map(Json.toJson(_)).getOrElse(JsNull)
In the latter case, if I use .asInstanceOf[JsValue] it works, but IntelliJ grays it out, saying it's unnecessary as the value can't be anything other than a JsValue anyway. What might be the reason that the Scala compiler (2.11.7) doesn't infer the type properly?
I am pretty sure the issue comes from .getOrElse(JsNull).
I have successfully compiled this code:
import play.api.libs.json.{JsNull, Json, JsValue, Writes}

case class Cost(cost: Option[Double])

case object Cost {
  def writes = new Writes[Cost] {
    override def writes(r: Cost): JsValue = {
      Json.obj(
        "cost" -> r.cost.map(Json.toJson(_))
      )
    }
  }
}
and parsed the output:
scala> Cost(Some(5))
res2: Cost = Cost(Some(5.0))
scala> Json.toJson(res2)(Cost.writes)
res5: play.api.libs.json.JsValue = {"cost":5}
Looking for the source of the issue, you can check a couple of additional solutions, assuming the writes function contains:
val cost = r.cost.map(t => Json.toJson(t))
Json.obj(
  "cost" -> cost
)
If you want to resolve the Option of the cost value with getOrElse, you can either cast (as you have tried) or provide the type explicitly:
cost.getOrElse[JsValue](JsNull)
cost.getOrElse(JsNull).asInstanceOf[JsValue]
Without the type specification, the compiler always gives an error saying:
[error] (...) type mismatch;
[error] found : Object
[error] required: play.api.libs.json.Json.JsValueWrapper
which comes from the way type inference widens the result of getOrElse under the expected JsValueWrapper type (there is no Writes for Object to drive the implicit conversion), rather than from an sbt bug.
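Putting it together, a version of the original non-compiling writes fixed with the explicit type parameter (just assembling the pieces above):
import play.api.libs.json.{JsNull, JsValue, Json, Writes}

case class Cost(cost: Option[Double])

case object Cost {
  def writes: Writes[Cost] = new Writes[Cost] {
    override def writes(r: Cost): JsValue =
      Json.obj(
        // Pinning the type parameter to JsValue avoids the widening to Object,
        // so the JsValueWrapper conversion can find the Writes for JsValue.
        "cost" -> r.cost.map(Json.toJson(_)).getOrElse[JsValue](JsNull)
      )
  }
}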
I am trying to follow the tutorial https://www.jamesward.com/2012/02/21/play-framework-2-with-scala-anorm-json-coffeescript-jquery-heroku but of course play-scala has changed since the tutorial was written (as seems to be the case with every tutorial I find). I am using 2.4.3. This requires that I actually learn how things work, which is not necessarily a bad thing.
One thing that is giving me trouble is the getOrElse method.
Here is my Bar.scala model
package models

import play.api.db._
import play.api.Play.current
import anorm._
import anorm.SqlParser._

case class Bar(id: Option[Long], name: String)

object Bar {
  val simple = {
    get[Option[Long]]("id") ~
    get[String]("name") map {
      case id ~ name => Bar(id, name)
    }
  }

  def findAll(): Seq[Bar] = {
    DB.withConnection { implicit connection =>
      SQL("select * from bar").as(Bar.simple *)
    }
  }

  def create(bar: Bar): Unit = {
    DB.withConnection { implicit connection =>
      SQL("insert into bar(name) values ({name})").on(
        'name -> bar.name
      ).executeUpdate()
    }
  }
}
and my BarFormat.scala JSON formatter:
package models

import play.api.libs.json._
import anorm._

package object Implicits {
  implicit object BarFormat extends Format[Bar] {
    def reads(json: JsValue): JsResult[Bar] = JsSuccess(Bar(
      Option((json \ "id").as[Long]),
      (json \ "name").as[String]
    ))

    def writes(bar: Bar) = JsObject(Seq(
      "id" -> JsNumber(bar.id.getOrElse(0L)),
      "name" -> JsString(bar.name)
    ))
  }
}
and for completeness my Application.scala controller:
package controllers

import play.api.mvc._
import play.api.data._
import play.api.data.Forms._
import javax.inject.Inject
import javax.inject._
import play.api.i18n.{ I18nSupport, MessagesApi, Messages, Lang }
import play.api.libs.json._
import views._
import models.Bar
import models.Implicits._

class Application @Inject()(val messagesApi: MessagesApi) extends Controller with I18nSupport {

  val barForm = Form(
    single("name" -> nonEmptyText)
  )

  def index = Action {
    Ok(views.html.index(barForm))
  }

  def addBar() = Action { implicit request =>
    barForm.bindFromRequest.fold(
      errors => BadRequest,
      {
        case (name) =>
          Bar.create(Bar(None, name))
          Redirect(routes.Application.index())
      }
    )
  }

  def listBars() = Action { implicit request =>
    val bars = Bar.findAll()
    val json = Json.toJson(bars)
    Ok(json).as("application/json")
  }
}
and my routes file:
# Routes
# This file defines all application routes (Higher priority routes first)
# ~~~~
# Home page
POST /addBar controllers.Application.addBar
GET / controllers.Application.index
GET /listBars controllers.Application.listBars
# Map static resources from the /public folder to the /assets URL path
GET /assets/*file controllers.Assets.versioned(path="/public", file: Asset)
When I try to run my project I get a type mismatch error on the JsNumber(bar.id.getOrElse(0L)) line: found Any, required BigDecimal.
Now bar.id is defined as an Option[Long], so bar.id.getOrElse(0L) should return a Long as far as I can tell, but it is clearly returning an Any. Can anyone help me understand why?
Thank You!
That's the way type inference works in Scala...
First of all there is an implicit conversion from Int to BigDecimal:
scala> (1 : Int) : BigDecimal
res0: BigDecimal = 1
That conversion allows for Int to be converted before the option is constructed:
scala> Some(1) : Option[BigDecimal]
res1: Option[BigDecimal] = Some(1)
If we try getOrElse on its own, where the type can be fixed, we get the expected type Int:
scala> Some(1).getOrElse(2)
res2: Int = 1
However, this does not work (the problem you have):
scala> Some(1).getOrElse(2) : BigDecimal
<console>:11: error: type mismatch;
found : Any
required: BigDecimal
Some(1).getOrElse(2) : BigDecimal
^
Scala's implicit conversions kick in last, after type inference is performed. That makes sense, because if you don't know the type, how would you know which conversions need to be applied? Scala can see that BigDecimal is expected, but it has an Int result based on the type of the Option it has. So it tries to widen the type, can't find anything that matches BigDecimal in Int's type hierarchy, and fails with the error.
This works, however because the type is fixed in the variable declaration:
scala> val v = Some(1).getOrElse(2)
v: Int = 1
scala> v: BigDecimal
res4: BigDecimal = 1
So we need to help the compiler somehow: any type annotation or explicit conversion will work. Pick whichever you like:
scala> (Some(1).getOrElse(2) : Int) : BigDecimal
res5: BigDecimal = 1
scala> Some(1).getOrElse[Int](2) : BigDecimal
res6: BigDecimal = 1
scala> BigDecimal(Some(1).getOrElse(2))
res7: scala.math.BigDecimal = 1
Here is the signature of the Option.getOrElse method:
getOrElse[B >: A](default: ⇒ B): B
The term B >: A expresses that the type parameter B or the abstract type B refers to a supertype of type A; in this case, Any is a supertype of Long:
val l: Long = 10
val a: Any = l
So we can do something very similar here with getOrElse:
val some: Option[Long] = Some(1)
val value: Any = some.getOrElse("potatos")

val none: Option[Long] = None
val elseValue: Any = none.getOrElse("potatos")
Which brings us to your scenario: the returned type from getOrElse will be an Any and not a BigDecimal, so you will need another way to handle this situation, for instance using fold:
def writes(bar: Bar) = {
  val defaultValue = BigDecimal(0)
  JsObject(Seq(
    "id" -> JsNumber(bar.id.fold(defaultValue)(BigDecimal(_))),
    "name" -> JsString(bar.name)
  ))
}
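A quick illustrative check of the fold-based writes (values made up):
writes(Bar(Some(42L), "beer"))
// {"id":42,"name":"beer"}

writes(Bar(None, "beer"))
// {"id":0,"name":"beer"} (the fold default kicks in)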
Some other discussions that can help you:
Why is Some(1).getOrElse(Some(1)) not of type Option[Int]?
Option getOrElse type mismatch error
Consider this code:
val myList : List[Map[String, AnyRef]] = ...
//during spray routing
import spray.json._
complete(myList.toJson)
And I get an error:
Cannot find JsonWriter or JsonFormat type class for List[Map[String,AnyRef]]
First of all, why? The spray.json.CollectionFormats source contains:
implicit def listFormat[T: JsonFormat] = new RootJsonFormat[List[T]] {
  def write(list: List[T]) = JsArray(list.map(_.toJson).toVector)
  def read(value: JsValue): List[T] = value match {
    case JsArray(elements) => elements.map(_.convertTo[T])(collection.breakOut)
    case x => deserializationError("Expected List as JsArray, but got " + x)
  }
}
So in theory every list item can be converted with a _.toJson call. Here _ is a Map[String, AnyRef], and Map types are also supported. Why doesn't my example compile? How can I get it to work?
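For context on where the implicit chain breaks: listFormat[T] needs an implicit JsonFormat[T] for T = Map[String, AnyRef], and spray-json's mapFormat in turn needs a JsonFormat for the value type, which does not (and cannot sensibly) exist for AnyRef. A minimal sketch of the usual workaround, assuming the values can be given a concrete type such as String:
import spray.json._
import DefaultJsonProtocol._

// Compiles: JsonFormat[String] exists, so mapFormat and listFormat resolve.
val myList: List[Map[String, String]] = List(Map("key" -> "value"))
val js: JsValue = myList.toJson // [{"key":"value"}]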