How to use the distinct method in Rails with Arel Table?

I am looking to run the following query in Rails (I have used the scuttle.io site to convert my SQL to rails-friendly syntax):
Here is the original query:
SELECT pools.name AS "Pool Name", COUNT(DISTINCT stakings.user_id) AS "Total Number of Users Per Pool" from stakings
INNER JOIN pools ON stakings.pool_id = pools.id
INNER JOIN users ON users.id = stakings.user_id
INNER JOIN countries ON countries.code = users.country
WHERE countries.kyc_flow = 1
GROUP BY (pools.name);
And here is the scuttle.io query:
<%Staking.select(
[
Pool.arel_table[:name].as('Pool_Name'), Staking.arel_table[:user_id].count.as('Total_Number_of_Users_Per_Pool')
]
).where(Country.arel_table[:kyc_flow].eq(1)).joins(
Staking.arel_table.join(Pool.arel_table).on(
Staking.arel_table[:pool_id].eq(Pool.arel_table[:id])
).join_sources
).joins(
Staking.arel_table.join(User.arel_table).on(
User.arel_table[:id].eq(Staking.arel_table[:user_id])
).join_sources
).joins(
Staking.arel_table.join(Country.arel_table).on(
Country.arel_table[:code].eq(User.arel_table[:country])
).join_sources
).group(Pool.arel_table[:name]).each do |x|%>
<p><%=x.Pool_Name%></p>
<p><%=x.Total_Number_of_Users_Per_Pool%></p>
<%end%>
Now, as you may notice, scuttle.io does not include the distinct parameter, which I need. How in the world can I use distinct here without getting errors such as "method distinct does not exist for Arel Node" or just syntax errors?
Is there any way to write the above query using rails ActiveRecord? I am sure there is, but I am really not sure how.

Answer
The Arel::Nodes::Count class (an Arel::Nodes::Function) accepts a boolean value for distinctness.
def initialize expr, distinct = false, aliaz = nil
super(expr, aliaz)
@distinct = distinct
end
The #count expression is a shortcut for the same and also accepts a single argument
def count distinct = false
Nodes::Count.new [self], distinct
end
So in your case you could use either of the below options
Arel::Nodes::Count.new([Staking.arel_table[:user_id]],true,'Total_Number_of_Users_Per_Pool')
# OR
Staking.arel_table[:user_id].count(true).as('Total_Number_of_Users_Per_Pool')
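To see why the distinct flag matters for this query, here is a minimal sketch using Python's built-in sqlite3 module rather than Rails. The table contents are made up for illustration, and only the pools/stakings part of the original query is reproduced. It compares COUNT(user_id) with COUNT(DISTINCT user_id) when one user stakes twice in the same pool.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pools (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stakings (id INTEGER PRIMARY KEY, pool_id INTEGER, user_id INTEGER);
    INSERT INTO pools (id, name) VALUES (1, 'alpha'), (2, 'beta');
    -- user 10 stakes twice in pool alpha, so a plain COUNT counts them twice
    INSERT INTO stakings (pool_id, user_id) VALUES (1, 10), (1, 10), (1, 11), (2, 12);
""")

rows = conn.execute("""
    SELECT pools.name,
           COUNT(stakings.user_id)          AS plain_count,
           COUNT(DISTINCT stakings.user_id) AS distinct_count
    FROM stakings
    INNER JOIN pools ON stakings.pool_id = pools.id
    GROUP BY pools.name
    ORDER BY pools.name
""").fetchall()

for name, plain_count, distinct_count in rows:
    print(name, plain_count, distinct_count)
```

Pool alpha has three staking rows but only two distinct users, which is the difference that count(true) is meant to express in the generated SQL.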
Suggestion 1:
The Arel you have seems a bit overkill. Given the natural relationships you should be able to simplify this a bit e.g.
country_table = Country.arel_table
# Assumes Staking declares belongs_to :pool and belongs_to :user
Staking
  .joins(:pool, :user)
  .joins(Arel::Nodes::InnerJoin.new(
    country_table,
    country_table.create_on(country_table[:code].eq(User.arel_table[:country]))))
  .select(
    Pool.arel_table[:name],
    Staking.arel_table[:user_id].count(true).as('Total_Number_of_Users_Per_Pool')
  )
  .where(countries: {kyc_flow: 1})
  .group(Pool.arel_table[:name])
Suggestion 2: Move this query to your controller. The view has no business making database calls.

Related

Query matching() causing duplicate rows and distinct() not working

I am trying to filter one table Payments by a field on the associated table Invoices.
Using the function matching() on the query object filters correctly but causes duplicate rows. It seemed like the solution was using distinct(), but calling distinct(Payments.id) results in an invalid query. I'm doing the following in a controller action.
$conditions = [
'Payments.is_deleted =' => false
];
$args = [
'conditions' => $conditions,
'contain' => ['Invoices', 'Invoices.Clients'],
];
$payments = $this->Payments->find('all', $args);
if($issuer) {
// This causes duplicate rows
$payments->matching('Invoices', function ($q) use ($issuer) {
return $q->where(['Invoices.issuer_id' => $issuer['id']]);
});
// $payments->distinct('Payments.id'); // Causes a mysql error
}
Am I correct in thinking that distinct() is what I need, and if so any idea what's missing to make it work?
I'm getting the following mysql error when uncommenting the line above:
Error: SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #8 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'InvoicesPayments.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Full query:
SELECT
PAYMENTS.ID AS `PAYMENTS__ID`,
PAYMENTS.CREATED AS `PAYMENTS__CREATED`,
PAYMENTS.MODIFIED AS `PAYMENTS__MODIFIED`,
PAYMENTS.DATE_REGISTERED AS `PAYMENTS__DATE_REGISTERED`,
PAYMENTS.USER_ID AS `PAYMENTS__USER_ID`,
PAYMENTS.AMOUNT AS `PAYMENTS__AMOUNT`,
PAYMENTS.IS_DELETED AS `PAYMENTS__IS_DELETED`,
INVOICESPAYMENTS.ID AS `INVOICESPAYMENTS__ID`,
INVOICESPAYMENTS.INVOICE_ID AS `INVOICESPAYMENTS__INVOICE_ID`,
INVOICESPAYMENTS.PAYMENT_ID AS `INVOICESPAYMENTS__PAYMENT_ID`,
INVOICESPAYMENTS.PART_AMOUNT AS `INVOICESPAYMENTS__PART_AMOUNT`,
INVOICES.ID AS `INVOICES__ID`,
INVOICES.CREATED AS `INVOICES__CREATED`,
INVOICES.MODIFIED AS `INVOICES__MODIFIED`,
INVOICES.IS_PAID AS `INVOICES__IS_PAID`,
INVOICES.IS_DELETED AS `INVOICES__IS_DELETED`,
INVOICES.CLIENT_ID AS `INVOICES__CLIENT_ID`,
INVOICES.ISSUER_ID AS `INVOICES__ISSUER_ID`,
INVOICES.NUMBER AS `INVOICES__NUMBER`,
INVOICES.SUBTOTAL AS `INVOICES__SUBTOTAL`,
INVOICES.TOTAL AS `INVOICES__TOTAL`,
INVOICES.DATE_REGISTERED AS `INVOICES__DATE_REGISTERED`,
INVOICES.CURRENCY AS `INVOICES__CURRENCY`,
INVOICES.RECEIVER_NAME AS `INVOICES__RECEIVER_NAME`,
INVOICES.RECEIVER_RFC AS `INVOICES__RECEIVER_RFC`,
INVOICES.EMAIL_SENDER AS `INVOICES__EMAIL_SENDER`,
INVOICES.PDF_PATH AS `INVOICES__PDF_PATH`
FROM
PAYMENTS PAYMENTS
INNER JOIN
INVOICES_PAYMENTS INVOICESPAYMENTS
ON PAYMENTS.ID = (
INVOICESPAYMENTS.PAYMENT_ID
)
INNER JOIN
INVOICES INVOICES
ON (
INVOICES.ISSUER_ID = :C0
AND INVOICES.ID = (
INVOICESPAYMENTS.INVOICE_ID
)
)
WHERE
(
PAYMENTS.IS_DELETED = :C1
AND PAYMENTS.DATE_REGISTERED >= :C2
AND PAYMENTS.DATE_REGISTERED <= :C3
)
GROUP BY
PAYMENT_ID
ORDER BY
PAYMENTS.DATE_REGISTERED ASC

That behavior is expected, as matching will use an INNER join, and yes, grouping is how you avoid duplicates:
As this function will create an INNER JOIN, you might want to consider calling distinct on the find query as you might get duplicate rows if your conditions don’t exclude them already. This might be the case, for example, when the same users comments more than once on a single article.
Cookbook > Database Access & ORM > Query Builder > Loading Associations > Filtering by Associated Data
As the error message states, your MySQL server is configured to use the strict only_full_group_by mode, where your query is invalid. You can either disable that strict mode as mentioned by Akash prajapati (which can come with its own problems, as MySQL is then allowed to pretty much pick values of a group at random), or you could change how you query things in order to conform to the strict mode.
In your case, where you need to group on the primary key, you could simply switch to using innerJoinWith() instead. Unlike matching(), this will not add any fields of that association to the SELECT list, so things should be fine in strict mode, as everything else is functionally dependent.
In cases where you would group on a key that would break functional dependency detection, one way to solve that could for example be to use a subquery for filtering, one that only selects that key, something along the lines of this:
$conditions = [
'Payments.is_deleted =' => false
];
$payments = $this->Payments
->find()
->contain(['Invoices.Clients']);
if($issuer) {
$matcherQuery = $this->Payments
->find()
->select(['Payments.some_other_field'])
->where($conditions)
->matching('Invoices', function ($q) use ($issuer) {
return $q->where(['Invoices.issuer_id' => $issuer['id']]);
})
->distinct('Payments.some_other_field');
$payments->where([
'Payments.some_other_field IN' => $matcherQuery
]);
} else {
$payments->where($conditions);
}
This will result in a query similar to this, where the outer query can then select all the fields you want:
SELECT
...
FROM
payments
WHERE
payments.some_other_field IN (
SELECT
payments.some_other_field
FROM
payments
INNER JOIN
invoices_payments ON
payments.id = invoices_payments.payment_id
INNER JOIN
invoices ON
invoices.issuer_id = ...
AND
invoices.id = invoices_payments.invoice_id
WHERE
payments.is_deleted = ...
GROUP BY
payments.some_other_field
)
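The difference between joining and filtering through a subquery can be checked on a small dataset. The following is a sketch using Python's built-in sqlite3 module, not CakePHP. The table layout follows the query above, the rows are invented for illustration, and the subquery selects the payment id rather than some_other_field.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, issuer_id INTEGER);
    CREATE TABLE invoices_payments (
        id INTEGER PRIMARY KEY, invoice_id INTEGER, payment_id INTEGER
    );
    INSERT INTO payments (id, amount) VALUES (1, 100), (2, 50);
    INSERT INTO invoices (id, issuer_id) VALUES (1, 7), (2, 7), (3, 8);
    -- payment 1 covers two invoices of issuer 7; payment 2 covers one of issuer 8
    INSERT INTO invoices_payments (invoice_id, payment_id) VALUES (1, 1), (2, 1), (3, 2);
""")

# INNER JOIN filtering: one result row per matching invoice, so payment 1 repeats.
joined = conn.execute("""
    SELECT payments.id FROM payments
    INNER JOIN invoices_payments ON payments.id = invoices_payments.payment_id
    INNER JOIN invoices ON invoices.id = invoices_payments.invoice_id
                       AND invoices.issuer_id = ?
    ORDER BY payments.id
""", (7,)).fetchall()

# Subquery filtering: the outer query only sees payments, so no duplicates
# and no GROUP BY on the outer SELECT list.
filtered = conn.execute("""
    SELECT payments.id, payments.amount FROM payments
    WHERE payments.id IN (
        SELECT invoices_payments.payment_id FROM invoices_payments
        INNER JOIN invoices ON invoices.id = invoices_payments.invoice_id
        WHERE invoices.issuer_id = ?
    )
""", (7,)).fetchall()

print(joined)
print(filtered)
```

Because the outer query never joins the association, every column it selects belongs to payments, which is why this shape does not run into the only_full_group_by restriction.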

The problem is with the sql_mode value in MySQL. You need to remove ONLY_FULL_GROUP_BY from sql_mode; after that the query should work fine for you.
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
Please let me know if you need anything else.

I had the same issue, but was too afraid to change the sql_mode as mentioned by @Akash and in too much of a hurry to restructure the query. So I decided to use the inherited Collection method indexBy()
https://book.cakephp.org/4/en/core-libraries/collections.html#Cake\Collection\Collection::indexBy
$resultSetFromYourPaymentsQuery = $resultSetFromYourPaymentsQuery->indexBy('id');
It worked like a charm and it is DB independent.
EDIT: After some more tinkering, this might not be practical for all use cases. Replacing matching with innerJoinWith as proposed in the accepted answer will probably solve it in a more general manner.

"NOT IN" for Active Record

I have a MySQL query that I am trying to chain a "NOT IN" at the end of it.
Here is what it looks like in ruby using Active Record:
not_in = find_by_sql("SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6;").map(&:parent_dimension_id)
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
.where(relation_type_id: model_relation_id,
parent_dimension_id: sub_type_ids,
child_dimension_id: model_type)
.where.not(parent_dimension_id: not_in)
So the SQL query I'm trying to do looks like this:
INNER JOIN dimensions ON child_dimension_id = dimensions.id
WHERE relations.relation_type_id = 5
AND relations.parent_dimension_id
NOT IN(SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6);
Can someone confirm what I should use for that query?
Do I chain on where.not?

If you really do want
SELECT parent_dimension_id
FROM relations
WHERE relation_type_id = 6
as a subquery, you just need to convert that SQL to an ActiveRecord relation:
Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
then use that as a value in a where call the same way you'd use an array:
not_parents = Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
Relation.joins('...')
.where(relation_type_id: model_relation_id, ...)
.where.not(parent_dimension_id: not_parents)
When you use an ActiveRecord relation as a value in a where and that relation selects a single column:
r = M1.select(:one_column).where(...)
M2.where(:column => r)
ActiveRecord is smart enough to inline r's SQL as an in (select one_column ...) rather than doing two queries.
You could probably replace your:
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
with a simpler joins(:some_relation) if your relations are set up too.
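The SQL that the relation-as-value approach is meant to produce can be tried out directly. This is a sketch using Python's built-in sqlite3 module with made-up rows in a relations table. It only shows the single-statement NOT IN subquery, not ActiveRecord itself.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relations (
        id INTEGER PRIMARY KEY,
        relation_type_id INTEGER,
        parent_dimension_id INTEGER,
        child_dimension_id INTEGER
    );
    -- parent 2 also has a type-6 relation, so it should be excluded
    INSERT INTO relations (relation_type_id, parent_dimension_id, child_dimension_id) VALUES
        (5, 1, 100),
        (5, 2, 101),
        (5, 3, 102),
        (6, 2, 200);
""")

# One statement: the exclusion list is computed by the database, not in application code.
rows = conn.execute("""
    SELECT parent_dimension_id FROM relations
    WHERE relation_type_id = 5
      AND parent_dimension_id NOT IN (
          SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6
      )
    ORDER BY parent_dimension_id
""").fetchall()

print(rows)
```

Compared with mapping the ids in application code first, this keeps everything in one round trip to the database.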

You can feed where clauses with values or arrays of values, in which case they will be translated into in (?) clauses.
Thus, the last part of your query could contain a mapping:
.where.not(parent_dimension_id:Relation.where(relation_type_id:6).map(&:parent_dimension_id))
Or you can prepare a statement
.where('parent_dimension_id not in (?)', Relation.where(relation_type_id:6).map(&:parent_dimension_id) )
which is essentially exactly the same thing

INNER JOIN Results from Select Statement using Doctrine QueryBuilder

Can you use Doctrine QueryBuilder to INNER JOIN a temporary table from a full SELECT statement that includes a GROUP BY?
The ultimate goal is to select the best version of a record. I have a viewVersion table that has multiple versions with the same viewId value but different timeMod. I want to find the version with the latest timeMod (and do a lot of other complex joins and filters on the query).
Initially people assume you can do a GROUP BY viewId and then ORDER BY timeMod, but ORDER BY has no effect on GROUP BY, and MySQL will return random results. There are a ton of answers out there (e.g. here) that explain the problem with using GROUP and offer a solution, but I am having trouble interpreting the Doctrine docs to find a way to implement the SQL with Doctrine QueryBuilder (if it's even possible). Why don't I just use DQL? I may have to, but I have a lot of dynamic filters and joins that are much easier to do with QueryBuilder, so I wanted to see if that's possible.
Sample MySQL to Reproduce in Doctrine QueryBuilder
SELECT vv.*
FROM view_version vv
#inner join only returns where the result sets overlap, i.e. one record
INNER JOIN (
SELECT MAX(timeMod) maxTimeMod, viewId
FROM view_version
GROUP BY viewId
) version ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
#join other tables for filter, etc
INNER JOIN view v ON v.id = vv.viewId
INNER JOIN content_type c ON c.id = v.contentTypeId
WHERE vv.siteId=1
AND v.contentTypeId IN (2)
ORDER BY vv.title ASC;
Theoretical Solution via Query Builder (not working)
I am thinking that the JOIN needs to inject a DQL statement, e.g.
$em = $this->getDoctrine()->getManager();
$viewVersionRepo = $em->getRepository('GutensiteCmsBundle:View\ViewVersion');
$queryMax = $viewVersionRepo->createQueryBuilder()
->addSelect('MAX(timeMod) AS timeModMax')
->addSelect('viewId')
->groupBy('viewId');
$queryBuilder = $viewVersionRepo->createQueryBuilder('vv')
// I tried putting the query in a parenthesis, to no avail
->join('('.$queryMax->getDQL().')', 'version', 'WITH', 'vv.viewId = version.viewId AND vv.timeMod = version.timeModMax')
// Join other Entities
->join('e.view', 'view')
->addSelect('view')
->join('view.contentType', 'contentType')
->addSelect('contentType')
// Perform random filters
->andWhere('vv.siteId = :siteId')->setParameter('siteId', 1)
->andWhere('view.contentTypeId IN(:contentTypeId)')->setParameter('contentTypeId', $contentTypeIds)
->addOrderBy('e.title', 'ASC');
$query = $queryBuilder->getQuery();
$results = $query->getResult();
My code (which may not match the above example perfectly) outputs:
SELECT e, view, contentType
FROM Gutensite\CmsBundle\Entity\View\ViewVersion e
INNER JOIN (
SELECT MAX(v.timeMod) AS timeModMax, v.viewId
FROM Gutensite\CmsBundle\Entity\View\ViewVersion v
GROUP BY v.viewId
) version WITH vv.viewId = version.viewId AND vv.timeMod = version.timeModMax
INNER JOIN e.view view
INNER JOIN view.contentType contentType
WHERE e.siteId = :siteId
AND view.contentTypeId IN (:contentTypeId)
ORDER BY e.title ASC
This Answer seems to indicate that it's possible in other contexts like IN statements, but when I try the above method in the JOIN, I get the error:
[Semantical Error] line 0, col 90 near '(SELECT MAX(v.timeMod)': Error: Class '(' is not defined.

A big thanks to @AdrienCarniero for his alternative query structure for sorting the highest version with a simple JOIN where the entity's timeMod is less than the joined table timeMod.
Alternative Query
SELECT view_version.*
FROM view_version
#left join to look for a newer version of the same view
LEFT JOIN view_version AS best_version ON best_version.viewId = view_version.viewId AND best_version.timeMod > view_version.timeMod
#join other tables for filter, etc
INNER JOIN view ON view.id = view_version.viewId
INNER JOIN content_type ON content_type.id = view.contentTypeId
WHERE view_version.siteId=1
# LIMIT Best Version
AND best_version.timeMod IS NULL
AND view.contentTypeId IN (2)
ORDER BY view_version.title ASC;
Using Doctrine QueryBuilder
$em = $this->getDoctrine()->getManager();
$viewVersionRepo = $em->getRepository('GutensiteCmsBundle:View\ViewVersion');
$queryBuilder = $viewVersionRepo->createQueryBuilder('vv')
// Join Best Version (the root alias is 'vv' throughout)
->leftJoin('GutensiteCmsBundle:View\ViewVersion', 'bestVersion', 'WITH', 'bestVersion.viewId = vv.viewId AND bestVersion.timeMod > vv.timeMod')
// Join other Entities
->join('vv.view', 'view')
->addSelect('view')
->join('view.contentType', 'contentType')
->addSelect('contentType')
// Perform random filters
->andWhere('vv.siteId = :siteId')->setParameter('siteId', 1)
// LIMIT Joined Best Version
->andWhere('bestVersion.timeMod IS NULL')
->andWhere('view.contentTypeId IN(:contentTypeId)')->setParameter('contentTypeId', $contentTypeIds)
->addOrderBy('vv.title', 'ASC');
$query = $queryBuilder->getQuery();
$results = $query->getResult();
In terms of performance, it really depends on the dataset. See this discussion for details.
TIP: The table should include indexes on both these values (viewId and timeMod) to speed up results. I don't know if it would also benefit from a single index on both fields.
A native SQL query using the original JOIN method may be better in some cases, but compiling the query over an extended range of code that dynamically creates it, and getting the mappings correct is a pain. So this is at least an alternative solution that I hope helps others.
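As a quick sanity check that the two query shapes agree, here is a sketch using Python's built-in sqlite3 module instead of Doctrine. It uses a simplified view_version table and invented rows, and runs both the MAX subquery join and the LEFT JOIN ... IS NULL variant.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE view_version (
        id INTEGER PRIMARY KEY, viewId INTEGER, timeMod INTEGER, title TEXT
    );
    INSERT INTO view_version (viewId, timeMod, title) VALUES
        (1, 10, 'a-old'), (1, 20, 'a-new'),
        (2, 5, 'b-only'),
        (3, 7, 'c-old'), (3, 9, 'c-new');
""")

# Original approach: join against a grouped subquery holding MAX(timeMod) per view.
max_join = conn.execute("""
    SELECT vv.title FROM view_version vv
    INNER JOIN (
        SELECT viewId, MAX(timeMod) AS maxTimeMod FROM view_version GROUP BY viewId
    ) version ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
    ORDER BY vv.title
""").fetchall()

# Alternative: self LEFT JOIN on "a newer version exists", keep rows with no match.
anti_join = conn.execute("""
    SELECT vv.title FROM view_version vv
    LEFT JOIN view_version AS best
           ON best.viewId = vv.viewId AND best.timeMod > vv.timeMod
    WHERE best.timeMod IS NULL
    ORDER BY vv.title
""").fetchall()

print(max_join)
print(anti_join)
```

Both forms return one row per viewId here. If two versions of a view shared the same latest timeMod, both forms would return both rows.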

Django ORM - Grouped aggregates with different select clauses

Imagine we have the Django ORM model Meetup with the following definition:
class Meetup(models.Model):
language = models.CharField()
speaker = models.CharField()
date = models.DateField(auto_now=True)
I'd like to use a single query to fetch the language, speaker and date for the
latest event for each language.
>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
{'speaker': u'mike', 'language': u'python', 'latest_date': ...},
{'speaker': u'ryan', 'language': u'python', 'latest_date': ...},
{'speaker': u'noah', 'language': u'node', 'latest_date': ...},
{'speaker': u'shawn', 'language': u'node', 'latest_date': ...},
]
D'oh! We're getting the latest event, but for the wrong grouping!
It seems like I need a way to GROUP BY the language but SELECT on a different
set of fields?
Update - this sort of query seems fairly easy to express in SQL:
SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;
I'd love a way to do this without using Django's raw() - is it possible?
Update 2 - after much searching, it seems there are similar questions on SO:
Django Query that gets the most recent objects
How can I do a greatest n per group query in Django
MySQL calls this sort of query a group-wise maximum of a certain column.
Update 3 - in the end, with @danihp's help, it seems the best you can do
is two queries. I've used the following approach:
# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
.annotate(latest_pk=Max("pk"))
.values_list("latest_pk", flat=True))
# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
This question is a follow up to my previous question:
Django ORM - Get latest record for group

This is the kind of query that is easy to explain but hard to write. If this were SQL, I would suggest a CTE-filtered query with row rank over partition by language ordered by date (desc).
But this is not SQL, this is the Django query API. The easy way is to run a query for each language:
languages = Meetup.objects.values_list("language", flat=True).distinct().order_by()
last_by_language = [ Meetup
.objects
.filter( language = l )
.latest( 'date' )
for l in languages
]
This crashes if some language doesn't have any meetups.
The other approach is to get the max date for each language:
last_dates = ( Meetup
.objects
.values("language")
.annotate(ldate=models.Max("date"))
.order_by() )
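# reduce is not a builtin in Python 3; Q is needed to combine the conditions
from functools import reduce
from django.db.models import Q
q = reduce(lambda q, meetup:
q | ( Q( language = meetup["language"] ) & Q( date = meetup["ldate"] ) ),
last_dates, Q())
your_query = Meetup.objects.filter(q)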
Perhaps someone can explain how to do it in a single query without raw SQL.
Edited due to OP's comment
You are looking for:
"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"
Not all RDBMSs support this expression, because every field in the select clause that is not wrapped in an aggregate function should also appear in the group by clause. In your case, speaker is in the select clause (without an aggregate function) but does not appear in the group by.
In MySQL there is no guarantee that the speaker shown is the one that matches the max date. Because of this, we are not facing an easy query.
Quoting MySQL docs:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause...However, this is useful primarily when all values
in each nonaggregated column not named in the GROUP BY are the same
for each group.
The closest query to match your requirements is:
results = ( Meetup
.objects
.values("language","speaker")
.annotate(ldate=models.Max("date"))
.order_by() )
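For reference, the CTE with a row rank per language mentioned at the start of this answer could look roughly like the sketch below. It uses Python's built-in sqlite3 module rather than Django, assumes an SQLite build with window function support (3.25 or later), and uses invented dates for the four meetups from the question.

```python
import sqlite3

# In-memory database; the dates are hypothetical, the names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_meetup (
        id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT
    );
    INSERT INTO app_meetup (language, speaker, date) VALUES
        ('python', 'mike', '2014-01-10'),
        ('python', 'ryan', '2014-02-15'),
        ('node', 'noah', '2014-01-20'),
        ('node', 'shawn', '2014-03-05');
""")

# Rank the rows inside each language by date (newest first), then keep rank 1.
latest = conn.execute("""
    WITH ranked AS (
        SELECT language, speaker, date,
               ROW_NUMBER() OVER (PARTITION BY language ORDER BY date DESC) AS rn
        FROM app_meetup
    )
    SELECT language, speaker, date FROM ranked WHERE rn = 1 ORDER BY language
""").fetchall()

for language, speaker, date in latest:
    print(language, speaker, date)
```

Unlike the GROUP BY language query, the speaker returned here is guaranteed to come from the same row as the latest date.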

Why is this ActiveRecord Query NOT ambiguous?

With Rails 3, I am using the following kind of code to query a MySQL database:
MyData.joins('JOIN (SELECT id, name FROM sellers) AS Q
ON seller_id = Q.id').
select('*').
joins('JOIN (SELECT id, name FROM users) AS T
ON user_id = T.id').
select("*").each do |record|
#..........
Then, a bit further down, I try to access a "name" with this code: (note that both sellers and users have a name column).
str = record.name
This line is giving me a "user name" instead of a "seller name", but shouldn't it give nothing? Since I joined multiple tables with a name column, shouldn't I get an error like "column 'name' is ambiguous"? Why isn't this happening?
And by the way, the code behaves the same way whether I include that first "select('*')" line or not.
Thank you.

Firstly, there's no reason to call select twice - only the last call will actually be used. Secondly, you should not be using select("*"), because the SQL database (and Rails) will not rename the ambiguous columns for you. Instead, use explicit naming for the extra columns that you need:
MyData.joins('JOIN (SELECT..) AS Q ON ...', 'JOIN (SELECT...) AS T ON ...').
select('my_datas.*, T.name as t_name, Q.name as q_name').
each do |record|
# do something
end
Because of this, there's no reason to make a subquery in your JOIN statements:
MyData.joins('JOIN sellers AS Q ON ...', 'JOIN users AS T ON ...').
And finally, you should already have belongs_to associations set up for seller and user. That would mean that you can just do this:
MyData.joins(:seller, :user).
select("my_datas.*, sellers.name as seller_name, users.name as user_name").
each do |record|
# do something
end
Now you can call record.seller_name and record.user_name without any ambiguity.
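The underlying behaviour can be reproduced outside Rails. This is a sketch using Python's built-in sqlite3 module with invented rows and plain table joins instead of the subqueries. A result set may legally contain two columns called name, and the database only complains when an expression refers to an ambiguous column. Turning such a row into a hash keyed by column name keeps the last value. That this mirrors what happens to the record's attributes in ActiveRecord is an assumption here.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sellers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE my_datas (id INTEGER PRIMARY KEY, seller_id INTEGER, user_id INTEGER);
    INSERT INTO sellers (id, name) VALUES (1, 'seller name');
    INSERT INTO users (id, name) VALUES (1, 'user name');
    INSERT INTO my_datas (id, seller_id, user_id) VALUES (1, 1, 1);
""")

# SELECT * across the joins: no error, the result simply has duplicate column names.
cursor = conn.execute("""
    SELECT * FROM my_datas
    JOIN sellers AS Q ON seller_id = Q.id
    JOIN users AS T ON user_id = T.id
""")
column_names = [col[0] for col in cursor.description]
row = cursor.fetchone()

# Building a hash from (name, value) pairs: later duplicates overwrite earlier ones.
as_hash = dict(zip(column_names, row))
print(column_names)
print(as_hash["name"])

# Explicit aliases remove the collision.
aliased = conn.execute("""
    SELECT my_datas.*, Q.name AS seller_name, T.name AS user_name FROM my_datas
    JOIN sellers AS Q ON seller_id = Q.id
    JOIN users AS T ON user_id = T.id
""").fetchone()
print(aliased)
```

The users table is joined last, so its name column is the one that survives in the hash, which matches the "user name" result described in the question.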
