Summarizing/aggregating a Scala Slick object into another - mysql

I'm essentially trying to recreate the following SQL query using Scala Slick:
select labelOne, labelTwo, sum(countA), sum(countB) from things where date > 'blah' group by labelOne, labelTwo;
As you can see, it takes a table of labeled things and aggregates them, summing various counts. Given a table with the following info:
ID | date | labelOne | labelTwo | countA | countB
-------------------------------------------------
0  | 0    | foo      | cheese   | 1      | 2
1  | 0    | bar      | wine     | 0      | 3
2  | 1    | foo      | cheese   | 3      | 4
3  | 1    | bar      | wine     | 2      | 1
4  | 2    | foo      | beer     | 1      | 1
Should yield the following result if queried across all dates:
labelOne | labelTwo | countA | countB
-------------------------------------
foo      | cheese   | 4      | 6
bar      | wine     | 2      | 4
foo      | beer     | 1      | 1
This is what my Scala code looks like:
import scala.slick.driver.MySQLDriver.simple._
import scala.slick.jdbc.StaticQuery
import StaticQuery.interpolation
import org.joda.time.LocalDate
import com.github.tototoshi.slick.JodaSupport._
case class Thing(
  id: Option[Long],
  date: LocalDate,
  labelOne: String,
  labelTwo: String,
  countA: Long,
  countB: Long)
// summarized version of "Thing": note there's no date in this object
// each distinct grouping of Thing.labelOne + Thing.labelTwo should become a "SummarizedThing", with summed counts
case class SummarizedThing(
  labelOne: String,
  labelTwo: String,
  countASum: Long,
  countBSum: Long)
trait ThingsComponent {
  val Things: Things

  class Things extends Table[Thing]("things") {
    def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
    def date = column[LocalDate]("date", O.NotNull)
    def labelOne = column[String]("labelOne", O.NotNull)
    def labelTwo = column[String]("labelTwo", O.NotNull)
    def countA = column[Long]("countA", O.NotNull)
    def countB = column[Long]("countB", O.NotNull)
    def * = id.? ~ date ~ labelOne ~ labelTwo ~ countA ~ countB <> (Thing.apply _, Thing.unapply _)
    val byId = createFinderBy(_.id)
  }
}
object Things extends DAO {
  def insert(thing: Thing)(implicit s: Session) { Things.insert(thing) }

  def findById(id: Long)(implicit s: Session): Option[Thing] = Things.byId(id).firstOption

  // ???
  def summarizeSince(date: LocalDate)(implicit s: Session): Set[SummarizedThing] = {
    Query(Things).where(_.date > date).groupBy(x => (x.labelOne, x.labelTwo)).map {
      case (thing: Thing) => {
        // obviously this line below is wrong, but you can get an idea of what I'm trying to accomplish:
        // create a new SummarizedThing for each unique labelOne + labelTwo combo, summing the count columns
        new SummarizedThing(thing.labelOne, thing.labelTwo, thing.countA.sum, thing.countB.sum)
      }
    } // presumably need to run the query and map to SummarizedThing here, perhaps?
  }
}
The summarizeSince function is where I'm having trouble. I seem to be able to query Things just fine, filtering by date, and grouping by my fields... however, I'm having trouble summing countA and countB. With the summed results, I'd then like to create a SummarizedThing for each unique labelOne + labelTwo combination. Hopefully that makes sense. Any help would be greatly appreciated.

presumably need to run the query and map to SummarizedThing here, perhaps?
Exactly.
Query(Things).filter(_.date > date).groupBy(x => (x.labelOne, x.labelTwo)).map {
  // match on (key, group)
  case ((labelOne, labelTwo), things) =>
    // prepare results as a tuple (note .sum returns an Option)
    (labelOne, labelTwo, things.map(_.countA).sum.get, things.map(_.countB).sum.get)
}.run.map(SummarizedThing.tupled) // run, then map each tuple into the case class

Same as the other answer, but expressed as a for comprehension. Note that .get throws on an empty Option, so you probably want getOrElse.
val q = for {
  ((l1, l2), ts) <- Things.where(_.date > date).groupBy(t => (t.labelOne, t.labelTwo))
} yield (l1, l2, ts.map(_.countA).sum.getOrElse(0L), ts.map(_.countB).sum.getOrElse(0L))

// see the SQL this generates
println(q.selectStatement)
// select x2.`labelOne`, x2.`labelTwo`, sum(x2.`countA`), sum(x2.`countB`)
// from `things` x2 where x2.`date` > '2013' group by x2.`labelOne`, x2.`labelTwo`

// run the query, then map the resulting tuples into your case class
q.list.map(SummarizedThing.tupled)

Related

SQLAlchemy - filtering rows before today with autoloaded DATETIME column?

I have a MariaDB database radio_progs with a table FUTUREEPISODE. I'm using SQLAlchemy and trying to add a function that selects all entries in the table that are before today.
I'm having problems with the datetime field, though. Is this because I'm autoloading the fields? In my real-world example I have a number of columns, so I would prefer to autoload rather than specify each individually.
The error is:
eps = self.query.filter_by(IN_LIST=1, EP_ENDTIME < todays_datetime).all()
^
SyntaxError: positional argument follows keyword argument
The table has the following columns:
| Column | Type |
| ---------- | ---------- |
| ID | int(11) |
| EP_ENDTIME | datetime |
| IN_LIST | tinyint(1) |
from datetime import datetime
from sqlalchemy import and_, func
from .dbmgr import db

class FutureEpisode(db.Model):
    __bind_key__ = 'radio_progs'
    __tablename__ = 'FUTUREEPISODE'
    __table_args__ = {
        'autoload': True,
        'autoload_with': db.engine
    }

    def get_expired(self):
        todays_datetime = datetime(datetime.today().year, datetime.today().month, datetime.today().day)
        eps = self.query.filter_by(IN_LIST=1, EP_ENDTIME < todays_datetime).all()
        return eps
filter_by only accepts keyword arguments (simple field=value equality tests), so an expression like EP_ENDTIME < todays_datetime isn't valid there; Python rejects it as a syntax error before SQLAlchemy even runs. Using filter rather than filter_by works, i.e. changing the query to:
eps = self.query.filter(FutureEpisode.IN_LIST==1, FutureEpisode.EP_ENDTIME < todays_datetime).all()
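For reference, a minimal sketch of the full method with that fix applied; the equality test can stay in filter_by and be chained with filter for the comparison:

from datetime import datetime

def get_expired(self):
    today = datetime(datetime.today().year, datetime.today().month, datetime.today().day)
    # equality tests are fine as filter_by keywords; comparisons go in filter
    return (self.query
            .filter_by(IN_LIST=1)
            .filter(FutureEpisode.EP_ENDTIME < today)
            .all())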

Output Error for Matrix in Python?

I am trying to create a function that outputs a matrix that contains each item in a list on a separate line with lines in between. The only output I'm getting is an empty string (''). I do not understand why. I think I set it all up correctly to output what is needed, but there has to be something missing?
I included examples below my code.
def show_table(table):
    table = []
    s = [[str(e) for e in row] for row in table]
    lens = [max(map(len, col)) for col in zip(*s)]
    fmt = '\t'.join('{{:{}}}'.format(x) for x in lens)
    table = [fmt.format(*row) for row in s]
    return '\n'.join(table)

show_table([['A','BB'],['C','DD']])
expected output:
'| A | BB |\n| C | DD |\n'
print(show_table([['A','BB'],['C','DD']]))
expected output:
| A | BB |
| C | DD |
The issue is on the second line of your function, where you reassign table to an empty list. Everything after that operates on an empty list, which is why you get back an empty string. Instead try:
if table is None:
    table = []
Perhaps a better way to accomplish this could be:
def show_table(table):
    if table is None:
        table = []
    data = ""
    for row in table:
        for val in row:
            data += "| " + val + " "
        data += "|\n"
    return data.strip("\n")

print(show_table([['a','bb'],['c','dd']]))
Output:
| a | bb |
| c | dd |
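If you also want the per-column alignment the original '\t'.join version was aiming for, here is a minimal sketch that pads each cell to its column's maximum width:

def show_table(table):
    if not table:
        return ""
    s = [[str(e) for e in row] for row in table]
    # column width = longest cell in that column
    widths = [max(map(len, col)) for col in zip(*s)]
    return "\n".join(
        "| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |"
        for row in s
    )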

django query return primary key related column values

I'm new to using databases and making Django queries to get information.
If I have a table with id as the primary key, and ages and heights as other columns, what query would bring me back a dictionary of all the ids and the related ages?
For instance if my table looks like below:
special_id | ages | heights
1          | 5    | x1
2          | 10   | x2
3          | 15   | x3
I'd like to have a key-value pair like {special_id: ages} where special_id is also the primary key.
Is this possible?
Try this:
from django.http import JsonResponse

def get_json(request):
    result = MyModel.objects.all().values('id', 'ages')  # or simply .values() to get all fields
    result_list = list(result)  # important: convert the QuerySet to a list object
    return JsonResponse(result_list, safe=False)
You will get the classic:
{field_name: field_value}
And if you want {field_value: field_value} you can do:
from django.http import JsonResponse

def get_json(request):
    result = MyModel.objects.all()
    a = {}
    for item in result:
        a[item.id] = item.age
    return JsonResponse(a)
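If all you need is the dictionary itself rather than a JSON response, a more compact sketch (using the same hypothetical MyModel with id and ages fields as above) lets values_list emit the pairs directly:

# values_list yields (id, ages) tuples, which dict() consumes directly
result = dict(MyModel.objects.values_list('id', 'ages'))
# e.g. {1: 5, 2: 10, 3: 15}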

Spark Row to JSON

I would like to create JSON from a Spark v1.6 dataframe (using Scala). I know that there is the simple solution of doing df.toJSON.
However, my problem looks a bit different. Consider for instance a dataframe with the following columns:
| A | B      | C1 | C2 | C3    |
--------------------------------
| 1 | test   | ab | 22 | TRUE  |
| 2 | mytest | gh | 17 | FALSE |
I would like to have at the end a dataframe with
| A | B      | C                                        |
----------------------------------------------------------
| 1 | test   | { "c1" : "ab", "c2" : 22, "c3" : TRUE }  |
| 2 | mytest | { "c1" : "gh", "c2" : 17, "c3" : FALSE } |
where C is a JSON string containing C1, C2, C3. Unfortunately, at compile time I do not know what the dataframe looks like (except for the columns A and B, which are always "fixed").
As for the reason why I need this: I am using Protobuf for sending around the results. Unfortunately, my dataframe sometimes has more columns than expected and I would still send those via Protobuf, but I do not want to specify all columns in the definition.
How can I achieve this?
Spark 2.1 should have native support for this use case (see #15354).
import org.apache.spark.sql.functions.to_json
df.select(to_json(struct($"c1", $"c2", $"c3")))
I use this command to solve the to_json problem:
from pyspark.sql.functions import col, struct, to_json

output_df = df.select(to_json(struct(col("*"))).alias("content"))
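Since the asker's A and B columns are fixed and only the rest vary, a hedged PySpark variant of the same idea (assuming Spark 2.1+, where to_json is available, and columns literally named "A" and "B") packs just the non-fixed columns:

from pyspark.sql.functions import col, struct, to_json

# keep A and B as-is; serialize whatever other columns exist into one JSON string
rest = [c for c in df.columns if c not in ("A", "B")]
output_df = df.select(col("A"), col("B"),
                      to_json(struct(*[col(c) for c in rest])).alias("C"))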
Here is an approach with no JSON parser that adapts to your schema:
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

df.select(
  col(df.columns(0)),
  col(df.columns(1)),
  concat(
    lit("{"),
    concat_ws(",", df.dtypes.slice(2, df.dtypes.length).map(dt => {
      val c = dt._1
      val t = dt._2
      concat(
        lit("\"" + c + "\":" + (if (t == "StringType") "\"" else "")),
        col(c),
        lit(if (t == "StringType") "\"" else "")
      )
    }): _*),
    lit("}")
  ) as "C"
).collect()
First let's convert the C columns to a struct:
val dfStruct = df.select($"A", $"B", struct($"C1", $"C2", $"C3").alias("C"))
This structure can be converted to JSONL using toJSON as before:
dfStruct.toJSON.collect
// Array[String] = Array(
// {"A":1,"B":"test","C":{"C1":"ab","C2":22,"C3":true}},
// {"A":2,"B":"mytest","C":{"C1":"gh","C2":17,"C3":false}})
I am not aware of any built-in method that can convert a single column, but you can either convert it individually and join, or use your favorite JSON parser in a UDF.
case class C(C1: String, C2: Int, C3: Boolean)

object CJsonizer {
  import org.json4s._
  import org.json4s.JsonDSL._
  import org.json4s.jackson.Serialization
  import org.json4s.jackson.Serialization.write

  implicit val formats = Serialization.formats(org.json4s.NoTypeHints)

  def toJSON(c1: String, c2: Int, c3: Boolean) = write(C(c1, c2, c3))
}

import org.apache.spark.sql.functions.udf

val cToJSON = udf((c1: String, c2: Int, c3: Boolean) =>
  CJsonizer.toJSON(c1, c2, c3))

df.withColumn("c_json", cToJSON($"C1", $"C2", $"C3"))

mysql recursive self join

create table test(
  container varchar(1),
  contained varchar(1)
);
insert into test values('X','A');
insert into test values('X','B');
insert into test values('X','C');
insert into test values('Y','D');
insert into test values('Y','E');
insert into test values('Y','F');
insert into test values('A','P');
insert into test values('P','Q');
insert into test values('Q','R');
insert into test values('R','Y');
insert into test values('Y','X');
select * from test;
mysql> select * from test;
+-----------+-----------+
| container | contained |
+-----------+-----------+
| X         | A         |
| X         | B         |
| X         | C         |
| Y         | D         |
| Y         | E         |
| Y         | F         |
| A         | P         |
| P         | Q         |
| Q         | R         |
| R         | Y         |
| Y         | X         |
+-----------+-----------+
11 rows in set (0.00 sec)
Can I find out all the distinct values contained under 'X' using a single self join?
EDIT
For example, here:
X contains A, B and C
A contains P
P contains Q
Q contains R
R contains Y
Y contains D, E and F (and X again, forming a cycle)...
So I want to display A, B, C, D, E, F, P, Q, R and Y when I query for X.
EDIT
Got it working by doing the traversal in application code.
package com.catgen.helper;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.catgen.factories.Nm2NmFactory;

public class Nm2NmHelper {
    private List<String> fetched;
    private List<String> fresh;

    public List<String> findAllContainedNMByMarketId(Connection conn, String marketId) throws SQLException {
        fetched = new ArrayList<String>();
        fresh = new ArrayList<String>();
        fresh.add(marketId.toLowerCase());
        while (fresh.size() > 0) {
            fetched.add(fresh.get(0).toLowerCase());
            fresh.remove(0);
            List<String> tempList = Nm2NmFactory.getContainedNmByContainerNm(conn, fetched.get(fetched.size() - 1));
            if (tempList != null) {
                for (int i = 0; i < tempList.size(); i++) {
                    String current = tempList.get(i).toLowerCase();
                    if (!fetched.contains(current) && !fresh.contains(current)) {
                        fresh.add(current);
                    }
                }
            }
        }
        return fetched;
    }
}
Not the same table and fields though. But I hope you get the concept.
Thanks guys.
You can't get all the contained objects recursively using a single join with that data structure. You would need a recursive query but MySQL doesn't yet support that.
You could, however, construct a closure table; then you can do it with a simple query. See Bill Karwin's slideshow Models for Hierarchical Data for more details and other approaches (for example, nested sets). Slide 69 compares the different designs for ease of implementing 'Query subtree'. Your chosen design (adjacency list) is the most awkward of all four designs for this type of query.
What about reading the whole table into a PHP array and determining the children via a function that calls itself?
This is not a good solution if the table has more than 10,000 rows, though...
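That in-memory idea is language-agnostic; here is a minimal Python sketch of it (the rows literal is a hypothetical stand-in for the result of select container, contained from test):

from collections import defaultdict

# hypothetical stand-in for the query result
rows = [('X','A'), ('X','B'), ('X','C'), ('Y','D'), ('Y','E'), ('Y','F'),
        ('A','P'), ('P','Q'), ('Q','R'), ('R','Y'), ('Y','X')]

children = defaultdict(list)
for container, contained in rows:
    children[container].append(contained)

def all_contained(node, seen=None):
    # walk the adjacency map depth-first, tolerating cycles like R -> Y -> X
    if seen is None:
        seen = set()
    for child in children[node]:
        if child not in seen:
            seen.add(child)
            all_contained(child, seen)
    return seen

print(sorted(all_contained('X')))
# ['A', 'B', 'C', 'D', 'E', 'F', 'P', 'Q', 'R', 'X', 'Y']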