Sum over multiple count field - mysql

I am trying to sum multiple count fields with jOOQ and a MySQL database.
At the moment my code looks like this:
int userId = 1;
Field<Object> newField = DSL.select(DSL.count()).from(
DSL.select(DSL.count())
.from(REQUIREMENT)
.where(REQUIREMENT.CREATOR_ID.equal(userId))
.unionAll(DSL.select(DSL.count())
.from(REQUIREMENT)
.where(REQUIREMENT.LEAD_DEVELOPER_ID.equal(userId)))
which always returns 2 as newField. But I want to know how many times a user is the creator of a requirement PLUS how many times they are the lead developer of a requirement.

You say "sum over multiple count", but that's not what you're doing. You "count the number of counts": the UNION ALL subquery always yields exactly two rows (one per count), so the outer count() is always 2. The solution is something like this:
// Assuming this to avoid referencing DSL all the time:
import static org.jooq.impl.DSL.*;
select(sum(field(name("c"), Integer.class)))
.from(
select(count().as("c"))
.from(REQUIREMENT)
.where(REQUIREMENT.CREATOR_ID.equal(userId))
.unionAll(
select(count().as("c"))
.from(REQUIREMENT)
.where(REQUIREMENT.LEAD_DEVELOPER_ID.equal(userId)))
);
Alternatively, if you plan to add many more of these counts to the sum, this might be a faster option:
select(sum(choose()
.when(REQUIREMENT.CREATOR_ID.eq(userId)
.and(REQUIREMENT.LEAD_DEVELOPER_ID.eq(userId)), inline(2))
.when(REQUIREMENT.CREATOR_ID.eq(userId), inline(1))
.when(REQUIREMENT.LEAD_DEVELOPER_ID.eq(userId), inline(1))
.otherwise(inline(0))
))
.from(REQUIREMENT);
More details about the second technique in this blog post
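To see why the original query returns 2 and why the two rewrites agree, here is a minimal sketch in Python using the standard-library sqlite3 module instead of jOOQ and MySQL. The table layout and the sample rows are assumptions made up for illustration, and the SQL only approximates what the jOOQ expressions above would render to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requirement ("
    "id INTEGER PRIMARY KEY, creator_id INTEGER, lead_developer_id INTEGER)"
)
# Hypothetical data: user 1 created 3 requirements and leads 2 (one row is both).
conn.executemany(
    "INSERT INTO requirement (creator_id, lead_developer_id) VALUES (?, ?)",
    [(1, 2), (1, 1), (1, 3), (2, 1), (3, 3)],
)
params = {"uid": 1}

# The two per-role counts, combined with UNION ALL (always exactly two rows).
union_sql = (
    "SELECT COUNT(*) AS c FROM requirement WHERE creator_id = :uid "
    "UNION ALL "
    "SELECT COUNT(*) AS c FROM requirement WHERE lead_developer_id = :uid"
)

# Original attempt: counts the rows of the union, not their values.
count_of_counts = conn.execute(
    f"SELECT COUNT(*) FROM ({union_sql}) AS t", params
).fetchone()[0]

# First fix: sum the aliased count column.
sum_of_counts = conn.execute(
    f"SELECT SUM(c) FROM ({union_sql}) AS t", params
).fetchone()[0]

# Second fix: a single scan with a CASE expression.
case_sum = conn.execute(
    "SELECT SUM(CASE "
    "WHEN creator_id = :uid AND lead_developer_id = :uid THEN 2 "
    "WHEN creator_id = :uid THEN 1 "
    "WHEN lead_developer_id = :uid THEN 1 "
    "ELSE 0 END) FROM requirement",
    params,
).fetchone()[0]

print(count_of_counts, sum_of_counts, case_sum)
```

With this sample data the first query reports the number of union rows, while the other two report the same combined total.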

Related

Rails, MySql, JSON column which stores array of UUIDs - Need to do exact match

I have a model called lists, which has a column called item_ids. item_ids is a JSON column (MySQL) and the column contains array of UUIDs, each referring to one item.
Now when someone creates a new list, I need to check whether there is an existing list with the same set of UUIDs, and I want to do this search in the query itself for a faster response. I would also like to use ActiveRecord querying as much as possible.
How do I achieve this?
item_ids = ["11E85378-CFE8-39F8-89DC-7086913CFD4B", "11E85354-304C-0664-9E81-0A281BE2CA42"]
v = List.new(item_ids: item_ids)
v.save!
Now, how do I check whether a list exists whose item ids exactly match those given in the query? The following won't work.
list_count = List.where(item_ids: item_ids).count
Edit 1
List.where("JSON_CONTAINS(item_ids, ?) ", item_ids.to_json).count
This statement works, but it counts a list even if only one of the items matches. I'm looking for an exact match on all items.
Edit 2
List.where("JSON_CONTAINS( item_ids, ?) and JSON_LENGTH(item_ids) = ?", item_ids.to_json, item_ids.size).count
Looks like this is working
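The check in Edit 2 works because containment plus equal length implies the same elements, as long as neither array holds duplicates. Below is a small Python model of that logic. The helper names are made up for illustration, and json_contains only approximates MySQL's JSON_CONTAINS for flat arrays of scalars.

```python
def json_contains(target, candidate):
    """Rough model of JSON_CONTAINS for flat arrays:
    every candidate element must appear somewhere in the target."""
    return all(item in target for item in candidate)


def exact_match(stored_ids, query_ids):
    """Model of: JSON_CONTAINS(item_ids, ?) AND JSON_LENGTH(item_ids) = ?"""
    return json_contains(stored_ids, query_ids) and len(stored_ids) == len(query_ids)


a = "11E85378-CFE8-39F8-89DC-7086913CFD4B"
b = "11E85354-304C-0664-9E81-0A281BE2CA42"
c = "00000000-0000-0000-0000-000000000000"  # hypothetical third UUID

same_set_other_order = exact_match([a, b], [b, a])
superset_stored = exact_match([a, b, c], [a, b])
duplicate_in_query = exact_match([a, b], [a, a])
```

The last case is the one to watch: a query array with a repeated UUID passes both checks without being the same set, so it is worth de-duplicating item_ids before saving and before querying.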
You can implement a has_many relation between lists and items and then query it like this:
List.includes(:items).where(items: { id: item_ids })
To implement has_many relation:
http://guides.rubyonrails.org/association_basics.html#the-has-many-through-association

Does Django (w/ MySQL) do only 1 table lookup or multiple when counting with CASE

Let's say we have a model of the form
class Person(models.Model):
is_gay = models.BooleanField()
is_tall = models.BooleanField()
is_nice = models.BooleanField()
...
Now let's say we want to know how many people meet different criteria. We can achieve this by counting them
num_gays = models.Person.objects.filter(is_gay=True).count()
num_tall_and_nice = models.Person.objects.filter(is_tall=True, is_nice=True).count()
Unfortunately, this would need 2 database queries. As you can imagine, as the number of types of people grows large, (e.g. an enumeration of 25/30) this can slow down quite a lot.
So now we can optimize by instead using aggregations
aggregations = {}
when = models.When(is_gay=True, then=1)
case = models.Case(when, output_field=models.IntegerField())
sum = models.Sum(case)
aggregations['num_gays'] = sum
when = models.When(is_tall=True, is_nice=True, then=1)
case = models.Case(when, output_field=models.IntegerField())
sum = models.Sum(case)
aggregations['num_tall_and_nice'] = sum
result = models.Person.objects.aggregate(**aggregations)
However, I am curious as to how Django (using MySQL) processes this query.
Does it look at the table only once and, for each row, add 1 to every CASE expression that applies? Or does it look at the table N times, where N is the number of CASE statements?
No, this will be just one query, because Django's Case/When translates more or less directly to MySQL's CASE/WHEN. When in doubt, you can always find out which queries Django has executed using this bit of code (connection.queries is only populated when DEBUG is True):
from django.db import connection
print(connection.queries)
Any query on an RDBMS without a WHERE clause examines the full table, so every single row in your table will be looked at. This query doesn't have a WHERE clause.
But as for the number of queries that are executed, it's exactly 1.
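As a conceptual model only (how MySQL actually executes the statement is up to its planner), the single query behaves like one loop over the rows in which each row may bump several counters. The sample rows and the aggregate_once helper below are hypothetical.

```python
# Hypothetical rows standing in for the Person table.
people = [
    {"is_gay": True, "is_tall": False, "is_nice": True},
    {"is_gay": False, "is_tall": True, "is_nice": True},
    {"is_gay": True, "is_tall": True, "is_nice": True},
    {"is_gay": False, "is_tall": True, "is_nice": False},
]


def aggregate_once(rows):
    """One scan of the rows; every matching condition adds 1 to its own counter."""
    totals = {"num_gays": 0, "num_tall_and_nice": 0}
    for row in rows:  # the table is read a single time
        if row["is_gay"]:
            totals["num_gays"] += 1
        if row["is_tall"] and row["is_nice"]:
            totals["num_tall_and_nice"] += 1
    return totals


result = aggregate_once(people)
```

One difference from the SQL version: SUM over a CASE with no matching rows yields NULL (None in Django) rather than 0, so the real aggregate may need a default.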

Rows count of Couchbase view subset

When I query some view in Couchbase I get the response that has following structure:
{
"total_rows":100,
"rows":[...]
}
'total_rows' is a very useful property that I can use for paging.
But let's say I select only a subset of the view using 'start_key' and 'end_key', and of course I don't know how big this subset will be. 'total_rows' is still the same number (as I understand it, it's just the total for the whole view). Is there any easy way to know how many rows were selected in the subset?
You can use the built-in reduce function _count to get the total count of your query.
Just add _count as reduce function for your view. After that, you will need to make two calls to couchbase:
In one call, you'll set the query param reduce=true (along with either group=true or group_level=n, depending upon how you're sending your key(s)). This will give you the total count of your filtered rows.
In the other call, you'll disable the reduce function with reduce=false because you now need the actual rows.
You can find more details about map and reduce at http://docs.couchbase.com/admin/admin/Views/views-writing.html
You can just use an array count/total/length in whatever language you are using.
For example in PHP:
$result = $cb->view("dev_beer", "beer_by_name", array('startkey' => 'O', 'endkey'=>'P'));
echo "total = >>".count($result["rows"]);
If you're actually wanting to paginate your data then you should use limit and skip:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing-querying-pagination.html
If you have to paginate the view efficiently, you don't actually need to specify both the start and the end.
Generally it is possible to use startkey/startkey_id and limit. In this case the limit guarantees that the page won't be bigger than a known size.
Both cases are described in the CouchDB book: http://guide.couchdb.org/draft/recipes.html#pagination
Here is how it works:
Request rows_per_page + 1 rows from the view
Display rows_per_page rows, store + 1 row as next_startkey and next_startkey_docid
As page information, keep startkey and next_startkey
Use the next_* values to create the next link, and use the others to create the previous link
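The steps above can be sketched in Python against an in-memory stand-in for a view. VIEW, query_view and fetch_page are hypothetical names, not Couchbase or CouchDB client calls; the point is only the "fetch one extra row" trick.

```python
from bisect import bisect_left

# Hypothetical view rows as (key, doc_id) pairs, kept sorted like a view index.
VIEW = sorted((f"beer_{i:02d}", f"doc_{i:02d}") for i in range(7))


def query_view(limit, startkey=None, startkey_docid=None):
    """Stand-in for a view query that supports startkey, startkey_docid and limit."""
    start = 0
    if startkey is not None:
        start = bisect_left(VIEW, (startkey, startkey_docid or ""))
    return VIEW[start:start + limit]


def fetch_page(rows_per_page, startkey=None, startkey_docid=None):
    """Request rows_per_page + 1 rows; the extra row marks where the next page starts."""
    rows = query_view(rows_per_page + 1, startkey, startkey_docid)
    page = rows[:rows_per_page]
    extra = rows[rows_per_page:]
    next_start = extra[0] if extra else None  # (next_startkey, next_startkey_docid)
    return page, next_start


page1, next1 = fetch_page(3)
page2, next2 = fetch_page(3, *next1)
page3, next3 = fetch_page(3, *next2)
```

When the extra row is missing, next_start is None, which tells the caller that the current page is the last one without needing a total count.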

What's a good pattern to implement running queries on individual entities from a obtained result set in Datanucleus/JPA

I am basically obtaining a decently sized result set (a few thousand) through datanucleus by running a JPQL query. On each of these, I also want to find the number of references from another table. The data is in a MySQL db.
For example:
List<Instrument> instruments = em().createQuery("SELECT i FROM Instrument AS i").getResultList();
for(Instrument i : instruments)
{
Query q = em().createQuery("SELECT COUNT(c) FROM Component AS c WHERE c.instrument.id = :id");
q.setParameter("id", i.getId());
long count = (Long) q.getSingleResult();
}
So, basically I want the list of instruments and also the list of components attached to the instrument as per the above example.
I've used similar code at a bunch of places and it performs pretty poorly. I understand that for 2000 instruments, I'll fire 2000 additional queries to count components and that will slow things down. I'm sure there's a better pattern to obtain the same result that I want. How can I get things to speed up?
That's right, this is not an optimal solution. But the good news is that all of this can be done with one or at most two queries.
For instance you don't have to execute the counting query once for each instrument.
You can use grouping and get all counts with one query:
List<Instrument> instruments = em().createQuery("SELECT i FROM Instrument AS i").getResultList();
Query q = em().createQuery("SELECT c.instrument.id, COUNT(c) FROM Component AS c GROUP BY c.instrument.id");
List<Object[]> counts = q.getResultList();
for (Object[] elem : counts) {
// do something
// elem[0] is instrument ID
// elem[1] is count
}
I haven't checked this, but you can probably also do everything with one query by putting the second query as a subquery in the first one:
SELECT i,
(SELECT COUNT(c) FROM Component AS c WHERE c.instrument.id = i.id)
FROM Instrument AS i
Similar to the first example, in the result list elem[0] would be an Instrument and elem[1] the count. It can be less efficient because the DB will have to execute the subquery for each instrument anyway, but it will still be quicker than your code, because it happens entirely on the DB side (no round trip to the DB for each counting query).
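Both approaches can be tried out in a small Python sketch with the standard-library sqlite3 module, using plain SQL in place of JPQL. The table and column names and the sample rows are assumptions for illustration. One detail worth noting: a GROUP BY over the component table returns no row for instruments without components, so the merge step has to default to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instrument (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE component (id INTEGER PRIMARY KEY, instrument_id INTEGER);
    INSERT INTO instrument (id, name) VALUES (1, 'scope'), (2, 'meter'), (3, 'probe');
    INSERT INTO component (instrument_id) VALUES (1), (1), (2);
""")

# Two queries: all instruments, then all counts at once via GROUP BY.
instruments = conn.execute("SELECT id, name FROM instrument ORDER BY id").fetchall()
counts = dict(
    conn.execute("SELECT instrument_id, COUNT(*) FROM component GROUP BY instrument_id")
)
# Instruments with no components are absent from counts, hence the default of 0.
components_per_instrument = {name: counts.get(inst_id, 0) for inst_id, name in instruments}

# One query: a correlated subquery evaluated on the database side.
single_query = conn.execute("""
    SELECT i.name,
           (SELECT COUNT(*) FROM component c WHERE c.instrument_id = i.id)
    FROM instrument i
    ORDER BY i.id
""").fetchall()
```

Either way the number of round trips stays constant instead of growing with the number of instruments.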

Logical Column in MySQL - How?

I have a datamodel, let's say: invoices (m:n) invoice_items
and currently I store the invoice total, calculated in PHP by totalling invoice_items, in a column in invoices. I don't like storing derived data as it paves the way for errors later.
How can I create a logical column in the invoices table in MySQL? Is this something I would be better off handling in PHP (in this case CakePHP)?
There's something called Virtual Fields in CakePHP which allows you to achieve the same result from within your Model instead of relying on support from MySQL. Virtual Fields allow you to "mashup" various data within your model and provide that as an additional column in your record. It's cleaner than the other approaches here...(no afterFind() hacking).
Read more here: http://book.cakephp.org/view/1608/Virtual-fields
Leo,
One thing you could do is to modify the afterFind() method in your model. This would recalculate the total any time you retrieve an invoice (costing runtime processing), but would mean you're not storing it in the invoices table, which is apparently what you want to avoid (correct me if I'm wrong).
Try this:
class Invoice extends AppModel {
// .. other stuff
function afterFind() {
parent::afterFind();
$total = 0;
foreach( $this->data['Invoice']['InvoiceItems'] as $item )
$total += ($item['cost'] * $item['quantity']);
$this->data['Invoice']['total'] = $total;
}
}
I may have messed up the arrays on the hasMany relationship (the foreach line), but I hope you get the gist of it. HTH,
Travis
You can either compute the derived value whenever you need it:
SELECT COUNT(1) AS total FROM invoice_items
Or, if a single row can stand for several items:
-- assuming that invoice_items.num is how many there are per row
SELECT SUM(num) AS total FROM invoice_items
Or you can use a VIEW, if you have a certain way you want it represented all the time.
http://forge.mysql.com/wiki/MySQL_virtual_columns_preview
It was not implemented at the time of writing and was expected in MySQL 6.0; generated (virtual) columns eventually shipped in MySQL 5.7.
Currently you could create a view.
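A view of this kind might look like the sketch below, run here through Python's standard-library sqlite3 module rather than MySQL. The invoice_id foreign key, the integer cost column and the invoice_totals view name are assumptions; cost and quantity follow the afterFind() example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_items (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER,   -- assumed foreign key to invoices.id
        cost INTEGER,         -- assumed to be stored in whole units
        quantity INTEGER
    );
    -- The total is derived on read, so nothing redundant is stored.
    CREATE VIEW invoice_totals AS
        SELECT i.id AS invoice_id,
               COALESCE(SUM(it.cost * it.quantity), 0) AS total
        FROM invoices i
        LEFT JOIN invoice_items it ON it.invoice_id = i.id
        GROUP BY i.id;
    INSERT INTO invoices (id) VALUES (1), (2);
    INSERT INTO invoice_items (invoice_id, cost, quantity) VALUES (1, 10, 2), (1, 5, 1);
""")

totals = dict(
    conn.execute("SELECT invoice_id, total FROM invoice_totals ORDER BY invoice_id")
)
```

The LEFT JOIN with COALESCE keeps invoices that have no items in the view with a total of 0 instead of dropping them.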