Query matching() causing duplicate rows and distinct() not working - mysql

I am trying to filter one table Payments by a field on the associated table Invoices.
Using the function matching() on the query object filters correctly but causes duplicate rows. It seemed like the solution was using distinct(), but calling distinct('Payments.id') results in an invalid query. I'm doing the following in a controller action.
$conditions = [
'Payments.is_deleted =' => false
];
$args = [
'conditions' => $conditions,
'contain' => ['Invoices', 'Invoices.Clients'],
];
$payments = $this->Payments->find('all', $args);
if($issuer) {
// This causes duplicate rows
$payments->matching('Invoices', function ($q) use ($issuer) {
return $q->where(['Invoices.issuer_id' => $issuer['id']]);
});
// $payments->distinct('Payments.id'); // Causes a mysql error
}
Am I correct in thinking that distinct() is what I need, and if so any idea what's missing to make it work?
I'm getting the following mysql error when uncommenting the line above:
Error: SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #8 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'InvoicesPayments.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Full query:
SELECT
PAYMENTS.ID AS `PAYMENTS__ID`,
PAYMENTS.CREATED AS `PAYMENTS__CREATED`,
PAYMENTS.MODIFIED AS `PAYMENTS__MODIFIED`,
PAYMENTS.DATE_REGISTERED AS `PAYMENTS__DATE_REGISTERED`,
PAYMENTS.USER_ID AS `PAYMENTS__USER_ID`,
PAYMENTS.AMOUNT AS `PAYMENTS__AMOUNT`,
PAYMENTS.IS_DELETED AS `PAYMENTS__IS_DELETED`,
INVOICESPAYMENTS.ID AS `INVOICESPAYMENTS__ID`,
INVOICESPAYMENTS.INVOICE_ID AS `INVOICESPAYMENTS__INVOICE_ID`,
INVOICESPAYMENTS.PAYMENT_ID AS `INVOICESPAYMENTS__PAYMENT_ID`,
INVOICESPAYMENTS.PART_AMOUNT AS `INVOICESPAYMENTS__PART_AMOUNT`,
INVOICES.ID AS `INVOICES__ID`,
INVOICES.CREATED AS `INVOICES__CREATED`,
INVOICES.MODIFIED AS `INVOICES__MODIFIED`,
INVOICES.IS_PAID AS `INVOICES__IS_PAID`,
INVOICES.IS_DELETED AS `INVOICES__IS_DELETED`,
INVOICES.CLIENT_ID AS `INVOICES__CLIENT_ID`,
INVOICES.ISSUER_ID AS `INVOICES__ISSUER_ID`,
INVOICES.NUMBER AS `INVOICES__NUMBER`,
INVOICES.SUBTOTAL AS `INVOICES__SUBTOTAL`,
INVOICES.TOTAL AS `INVOICES__TOTAL`,
INVOICES.DATE_REGISTERED AS `INVOICES__DATE_REGISTERED`,
INVOICES.CURRENCY AS `INVOICES__CURRENCY`,
INVOICES.RECEIVER_NAME AS `INVOICES__RECEIVER_NAME`,
INVOICES.RECEIVER_RFC AS `INVOICES__RECEIVER_RFC`,
INVOICES.EMAIL_SENDER AS `INVOICES__EMAIL_SENDER`,
INVOICES.PDF_PATH AS `INVOICES__PDF_PATH`
FROM
PAYMENTS PAYMENTS
INNER JOIN
INVOICES_PAYMENTS INVOICESPAYMENTS
ON PAYMENTS.ID = (
INVOICESPAYMENTS.PAYMENT_ID
)
INNER JOIN
INVOICES INVOICES
ON (
INVOICES.ISSUER_ID = :C0
AND INVOICES.ID = (
INVOICESPAYMENTS.INVOICE_ID
)
)
WHERE
(
PAYMENTS.IS_DELETED = :C1
AND PAYMENTS.DATE_REGISTERED >= :C2
AND PAYMENTS.DATE_REGISTERED <= :C3
)
GROUP BY
PAYMENT_ID
ORDER BY
PAYMENTS.DATE_REGISTERED ASC

That behavior is expected, as matching will use an INNER join, and yes, grouping is how you avoid duplicates:
As this function will create an INNER JOIN, you might want to consider calling distinct on the find query as you might get duplicate rows if your conditions don’t exclude them already. This might be the case, for example, when the same users comments more than once on a single article.
Cookbook > Database Access & ORM > Query Builder > Loading Associations > Filtering by Associated Data
As the error message states, your MySQL server is configured to use the strict only_full_group_by mode, where your query is invalid. You can either disable that strict mode as mentioned by Akash prajapati (which can come with its own problems, as MySQL is then allowed to pretty much pick values of a group at random), or you could change how you query things in order to conform to the strict mode.
In your case, where you need to group on the primary key, you could simply switch to innerJoinWith() instead. Unlike matching(), it does not add any fields of that association to the SELECT list, so if you pass it the same callback and keep grouping on Payments.id, things should be fine in strict mode, as everything else is functionally dependent on the primary key.
In cases where you would group on a key that would break functional dependency detection, one way to solve that could for example be to use a subquery for filtering, one that only selects that key, something along the lines of this:
$conditions = [
'Payments.is_deleted =' => false
];
$payments = $this->Payments
->find()
->contain(['Invoices.Clients']);
if($issuer) {
$matcherQuery = $this->Payments
->find()
->select(['Payments.some_other_field'])
->where($conditions)
->matching('Invoices', function ($q) use ($issuer) {
return $q->where(['Invoices.issuer_id' => $issuer['id']]);
})
->distinct('Payments.some_other_field');
$payments->where([
'Payments.some_other_field IN' => $matcherQuery
]);
} else {
$payments->where($conditions);
}
This will result in a query similar to this, where the outer query can then select all the fields you want:
SELECT
...
FROM
payments
WHERE
payments.some_other_field IN (
SELECT
payments.some_other_field
FROM
payments
INNER JOIN
invoices_payments ON
payments.id = invoices_payments.payment_id
INNER JOIN
invoices ON
invoices.issuer_id = ...
AND
invoices.id = invoices_payments.invoice_id
WHERE
payments.is_deleted = ...
GROUP BY
payments.some_other_field
)
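The SQL-level behaviour behind both suggestions can be reproduced outside CakePHP. Below is a minimal sketch using Python's sqlite3 module with a hypothetical, cut-down version of the three tables (the sample rows and the `ids` helper are assumptions, not part of the original schema). It shows the plain join returning a payment twice, and both grouping on the primary key and filtering through an IN subquery returning it once. SQLite does not enforce only_full_group_by, so this only illustrates the duplicate rows, not the MySQL error.

```python
import sqlite3

# Hypothetical miniature of the payments / invoices_payments / invoices schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL, is_deleted INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, issuer_id INTEGER);
    CREATE TABLE invoices_payments (
        id INTEGER PRIMARY KEY, invoice_id INTEGER, payment_id INTEGER
    );
    -- Payment 1 is split across two invoices of issuer 7.
    INSERT INTO payments VALUES (1, 100.0, 0), (2, 50.0, 0);
    INSERT INTO invoices VALUES (10, 7), (11, 7), (12, 8);
    INSERT INTO invoices_payments VALUES (1, 10, 1), (2, 11, 1), (3, 12, 2);
""")

# Shared FROM/JOIN/WHERE part, equivalent to what matching()/innerJoinWith() build.
JOIN = """
    FROM payments p
    INNER JOIN invoices_payments ip ON ip.payment_id = p.id
    INNER JOIN invoices i ON i.id = ip.invoice_id AND i.issuer_id = :issuer
    WHERE p.is_deleted = 0
"""

def ids(sql, issuer):
    """Run the query and return the payment ids it yields."""
    return [row[0] for row in db.execute(sql, {"issuer": issuer})]

# Plain join: one row per matching invoice, so payment 1 appears twice.
joined = ids("SELECT p.id" + JOIN + "ORDER BY p.id", 7)

# Only payments columns selected, grouped on the primary key.
grouped = ids("SELECT p.id, p.amount" + JOIN + "GROUP BY p.id ORDER BY p.id", 7)

# Filtering through a subquery that selects only the key.
filtered = ids("""
    SELECT p.id FROM payments p
    WHERE p.is_deleted = 0 AND p.id IN (
        SELECT ip.payment_id FROM invoices_payments ip
        INNER JOIN invoices i ON i.id = ip.invoice_id
        WHERE i.issuer_id = :issuer
    )
    ORDER BY p.id
""", 7)

print(joined)    # [1, 1]
print(grouped)   # [1]
print(filtered)  # [1]
```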

The problem is the sql_mode value in MySQL. Remove ONLY_FULL_GROUP_BY from sql_mode and the query should work:
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
Let me know if you need anything else.

I had the same issue, but was too afraid to change the sql_mode as mentioned by @Akash and in too much of a hurry to restructure the query. So I decided to use the inherited Collection method indexBy():
https://book.cakephp.org/4/en/core-libraries/collections.html#Cake\Collection\Collection::indexBy
$resultSetFromYourPaymentsQuery = $resultSetFromYourPaymentsQuery->indexBy('id');
It worked like a charm and it is DB independent.
EDIT: After some more tinkering, this might not be practical for all use cases. Replacing matching() with innerJoinWith() as proposed in the accepted answer will probably solve it in a more general manner.
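As a rough illustration of why re-keying removes the duplicates: when rows are stored under their id, a later row with the same id simply replaces the earlier one. The sketch below is a Python analogue with made-up rows and a hypothetical `index_by` helper; it is not CakePHP's implementation. Note that the deduplication happens after the rows have been fetched, so the database still returns the duplicates.

```python
def index_by(rows, key):
    """Re-key a list of row dicts by one field; later duplicates overwrite earlier ones."""
    indexed = {}
    for row in rows:
        indexed[row[key]] = row
    return indexed

# Made-up result set in which the join returned payment 1 twice.
rows = [
    {"id": 1, "amount": 100.0},
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": 50.0},
]

unique = index_by(rows, "id")
print(list(unique))  # [1, 2]
```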

Related

How to use the distinct method in Rails with Arel Table?

I am looking to run the following query in Rails (I have used the scuttle.io site to convert my SQL to rails-friendly syntax):
Here is the original query:
SELECT pools.name AS "Pool Name", COUNT(DISTINCT stakings.user_id) AS "Total Number of Users Per Pool" from stakings
INNER JOIN pools ON stakings.pool_id = pools.id
INNER JOIN users ON users.id = stakings.user_id
INNER JOIN countries ON countries.code = users.country
WHERE countries.kyc_flow = 1
GROUP BY (pools.name);
And here is the scuttle.io query:
<%Staking.select(
[
Pool.arel_table[:name].as('Pool_Name'), Staking.arel_table[:user_id].count.as('Total_Number_of_Users_Per_Pool')
]
).where(Country.arel_table[:kyc_flow].eq(1)).joins(
Staking.arel_table.join(Pool.arel_table).on(
Staking.arel_table[:pool_id].eq(Pool.arel_table[:id])
).join_sources
).joins(
Staking.arel_table.join(User.arel_table).on(
User.arel_table[:id].eq(Staking.arel_table[:user_id])
).join_sources
).joins(
Staking.arel_table.join(Country.arel_table).on(
Country.arel_table[:code].eq(User.arel_table[:country])
).join_sources
).group(Pool.arel_table[:name]).each do |x|%>
<p><%= x.Pool_Name %></p>
<p><%= x.Total_Number_of_Users_Per_Pool %></p>
<%end%>
Now, as you may notice, scuttle.io does not include the DISTINCT that I need. How can I use distinct here without getting errors such as "method distinct does not exist for Arel Node" or plain syntax errors?
Is there any way to write the above query using rails ActiveRecord? I am sure there is, but I am really not sure how.
Answer
The Arel::Nodes::Count class (an Arel::Nodes::Function) accepts a boolean value for distinctness.
def initialize expr, distinct = false, aliaz = nil
super(expr, aliaz)
@distinct = distinct
end
The #count expression is a shortcut for the same and also accepts a single argument
def count distinct = false
Nodes::Count.new [self], distinct
end
So in your case you could use either of the below options
Arel::Nodes::Count.new([Staking.arel_table[:user_id]],true,'Total_Number_of_Users_Per_Pool')
# OR
Staking.arel_table[:user_id].count(true).as('Total_Number_of_Users_Per_Pool')
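To see what the distinct flag changes in the generated SQL, here is a small sketch using Python's sqlite3 module. The two-table schema, the sample rows and the `users_per_pool` helper are assumptions made for illustration; only the COUNT expressions mirror the query in the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE pools (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stakings (id INTEGER PRIMARY KEY, pool_id INTEGER, user_id INTEGER);
    INSERT INTO pools VALUES (1, 'Alpha'), (2, 'Beta');
    -- User 1 stakes twice in pool Alpha.
    INSERT INTO stakings (pool_id, user_id) VALUES (1, 1), (1, 1), (1, 2), (2, 3);
""")

def users_per_pool(count_expr):
    """Group stakings by pool name using the given COUNT expression."""
    sql = f"""
        SELECT pools.name, {count_expr}
        FROM stakings
        INNER JOIN pools ON stakings.pool_id = pools.id
        GROUP BY pools.name
        ORDER BY pools.name
    """
    return db.execute(sql).fetchall()

plain = users_per_pool("COUNT(stakings.user_id)")
distinct = users_per_pool("COUNT(DISTINCT stakings.user_id)")

print(plain)     # [('Alpha', 3), ('Beta', 1)]
print(distinct)  # [('Alpha', 2), ('Beta', 1)]
```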
Suggestion 1:
The Arel you have seems a bit overkill. Given the natural relationships you should be able to simplify this a bit e.g.
country_table = Country.arel_table
Staking
  .joins(:pools, :users)
  .joins(
    Arel::Nodes::InnerJoin.new(
      country_table,
      country_table.create_on(country_table[:code].eq(User.arel_table[:country]))
    )
  )
  .select(
    Pool.arel_table[:name],
    Staking.arel_table[:user_id].count(true).as('Total_Number_of_Users_Per_Pool')
  )
  .where(countries: {kyc_flow: 1})
  .group(Pool.arel_table[:name])
Suggestion 2: Move this query to your controller. The view has no business making database calls.

Zend 2 subquery columns

I want to create a SQL(MySQL) query in Zend Framework 2 like:
SELECT a.id,
a.name,
a.age,
(SELECT MAX(score)
FROM scores AS s
WHERE s.user_id = a.id) AS max_score,
(SELECT SUM(time)
FROM games_played_time AS gpt
WHERE gpt.user_id = a.id) AS time_played
FROM users AS a
ORDER BY last_visited DESC
LIMIT 0, 100
Note that this is an artificial example of an existing query.
I tried creating the sub-queries and then a main select query that uses them like this:
$select->columns(
    array(
        'id',
        'name',
        'age',
        'max_score' => new Expression('?', array($sub1)),
        'time_played' => new Expression('?', array($sub2)),
    )
);
I also tried using:
$subquery = new \Zend\Db\Sql\Expression("({$sub->getSqlString()})");
And even lambda functions like suggested here: http://circlical.com/blog/2014/1/27/zend-framework-2-subqueries-subselect-and-table-gateway
Still no luck because all the time I keep getting errors like:
No data supplied for parameters in prepared statement
And when I do get the query to run, the column contains the text of the sub-query instead of its result. It is starting to look as if using multiple expressions in the columns method is not possible. Any ideas?
SOLVED:
I rewrote the query piece by piece as @Tim Klever proposed. Everything worked except one subquery. It turns out there is some kind of issue when using limit in a subquery and in the main query. In my case one of the subqueries returns multiple rows, so I used limit(1) to force a single value, but that produced the error:
No data supplied for parameters in prepared statement
I changed the query to use MAX instead of limit and now it works. I will try to debug why this happens later. Thank you!
The following worked for me to produce the query you listed
use Zend\Db\Sql\Expression;
use Zend\Db\Sql\Select;
$maxScoreSelect = new Select();
$maxScoreSelect->from(array('s' => 'scores'));
$maxScoreSelect->columns(array(new Expression('MAX(score)')));
$maxScoreSelect->where->addPredicates('s.user_id = a.id');
$sumTimeSelect = new Select();
$sumTimeSelect->from(array('gpt' => 'games_played_time'));
$sumTimeSelect->columns(array(new Expression('SUM(time)')));
$sumTimeSelect->where->addPredicates('gpt.user_id = a.id');
$select = new Select();
$select->from(array('a' => 'users'));
$select->columns(array(
'id',
'name',
'age',
'max_score' => new Expression('?', array($maxScoreSelect)),
'time_played' => new Expression('?', array($sumTimeSelect))
));
$select->order('last_visited DESC');
$select->limit(100);
$select->offset(0);
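For reference, the target SQL itself can be checked independently of Zend. The following is a minimal sketch with Python's sqlite3 module, not the Zend API; the sample rows are invented and the `time` column is renamed to a hypothetical `seconds` column. It shows each correlated subquery yielding one scalar per user, with NULL when a user has no matching rows, and uses an aggregate rather than a LIMIT inside the subqueries, as in the fix described in the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, last_visited TEXT);
    CREATE TABLE scores (user_id INTEGER, score INTEGER);
    CREATE TABLE games_played_time (user_id INTEGER, seconds INTEGER);
    INSERT INTO users VALUES (1, 'ann', 30, '2014-02-01'), (2, 'bob', 25, '2014-01-01');
    INSERT INTO scores VALUES (1, 10), (1, 30), (2, 5);
    -- Only ann has any play time recorded.
    INSERT INTO games_played_time VALUES (1, 60), (1, 40);
""")

# Each parenthesised SELECT is evaluated once per row of the outer query.
rows = db.execute("""
    SELECT a.id, a.name,
        (SELECT MAX(score) FROM scores AS s WHERE s.user_id = a.id) AS max_score,
        (SELECT SUM(seconds) FROM games_played_time AS gpt
          WHERE gpt.user_id = a.id) AS time_played
    FROM users AS a
    ORDER BY last_visited DESC
    LIMIT 100 OFFSET 0
""").fetchall()

print(rows)  # [(1, 'ann', 30, 100), (2, 'bob', 5, None)]
```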

INNER JOIN Results from Select Statement using Doctrine QueryBuilder

Can you use Doctrine QueryBuilder to INNER JOIN a temporary table from a full SELECT statement that includes a GROUP BY?
The ultimate goal is to select the best version of a record. I have a viewVersion table that has multiple versions with the same viewId value but different timeMod. I want to find the version with the latest timeMod (and do a lot of other complex joins and filters on the query).
Initially people assume you can do a GROUP BY viewId and then ORDER BY timeMod, but ORDER BY has no effect on GROUP BY, and MySQL will return random results. There are a ton of answers out there (e.g. here) that explain the problem with using GROUP and offer a solution, but I am having trouble interpreting the Doctrine docs to find a way to implement the SQL with Doctrine QueryBuilder (if it's even possible). Why don't I just use DQL? I may have to, but I have a lot of dynamic filters and joins that are much easier to do with QueryBuilder, so I wanted to see if that's possible.
Sample MySQL to Reproduce in Doctrine QueryBuilder
SELECT vv.*
FROM view_version vv
#inner join only returns where the result sets overlap, i.e. one record
INNER JOIN (
SELECT MAX(timeMod) maxTimeMod, viewId
FROM view_version
GROUP BY viewId
) version ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
#join other tables for filter, etc
INNER JOIN view v ON v.id = vv.viewId
INNER JOIN content_type c ON c.id = v.contentTypeId
WHERE vv.siteId=1
AND v.contentTypeId IN (2)
ORDER BY vv.title ASC;
Theoretical Solution via Query Builder (not working)
I am thinking that the JOIN needs to inject a DQL statement, e.g.
$em = $this->getDoctrine()->getManager();
$viewVersionRepo = $em->getRepository('GutensiteCmsBundle:View\ViewVersion');
$queryMax = $viewVersionRepo->createQueryBuilder('v')
    ->select('MAX(v.timeMod) AS timeModMax')
    ->addSelect('v.viewId')
    ->groupBy('v.viewId');
$queryBuilder = $viewVersionRepo->createQueryBuilder('vv')
    // I tried putting the query in a parenthesis, to no avail
    ->join('('.$queryMax->getDQL().')', 'version', 'WITH', 'vv.viewId = version.viewId AND vv.timeMod = version.timeModMax')
    // Join other Entities
    ->join('vv.view', 'view')
    ->addSelect('view')
    ->join('view.contentType', 'contentType')
    ->addSelect('contentType')
    // Perform random filters
    ->andWhere('vv.siteId = :siteId')->setParameter('siteId', 1)
    ->andWhere('view.contentTypeId IN(:contentTypeId)')->setParameter('contentTypeId', $contentTypeIds)
    ->addOrderBy('vv.title', 'ASC');
$query = $queryBuilder->getQuery();
$results = $query->getResult();
My code (which may not match the above example perfectly) outputs:
SELECT e, view, contentType
FROM Gutensite\CmsBundle\Entity\View\ViewVersion e
INNER JOIN (
SELECT MAX(v.timeMod) AS timeModMax, v.viewId
FROM Gutensite\CmsBundle\Entity\View\ViewVersion v
GROUP BY v.viewId
) version WITH vv.viewId = version.viewId AND vv.timeMod = version.timeModMax
INNER JOIN e.view view
INNER JOIN view.contentType contentType
WHERE e.siteId = :siteId
AND view.contentTypeId IN (:contentTypeId)
ORDER BY e.title ASC
This Answer seems to indicate that it's possible in other contexts like IN statements, but when I try the above method in the JOIN, I get the error:
[Semantical Error] line 0, col 90 near '(SELECT MAX(v.timeMod)': Error: Class '(' is not defined.
A big thanks to @AdrienCarniero for his alternative query structure for sorting the highest version with a simple JOIN where the entity's timeMod is less than the joined table timeMod.
Alternative Query
SELECT view_version.*
FROM view_version
#left join any newer version of the same view
LEFT JOIN view_version AS best_version ON best_version.viewId = view_version.viewId AND best_version.timeMod > view_version.timeMod
#join other tables for filter, etc
INNER JOIN view ON view.id = view_version.viewId
INNER JOIN content_type ON content_type.id = view.contentTypeId
WHERE view_version.siteId=1
# keep only rows for which no newer version exists
AND best_version.timeMod IS NULL
AND view.contentTypeId IN (2)
ORDER BY view_version.title ASC;
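To check that the self-join variant selects the same rows as the original MAX/GROUP BY join, here is a small sketch using Python's sqlite3 module. The table is reduced to the columns that matter, the rows are made up, and the other joins and filters are left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE view_version (
        id INTEGER PRIMARY KEY, viewId INTEGER, timeMod INTEGER, title TEXT
    );
    -- View 1 has two versions, view 2 has one.
    INSERT INTO view_version VALUES
        (1, 1, 100, 'home v1'), (2, 1, 200, 'home v2'), (3, 2, 150, 'about v1');
""")

# Original approach: join against the per-view maximum timeMod.
max_join = db.execute("""
    SELECT vv.id FROM view_version vv
    INNER JOIN (
        SELECT MAX(timeMod) AS maxTimeMod, viewId FROM view_version GROUP BY viewId
    ) version ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
    ORDER BY vv.id
""").fetchall()

# Alternative: left join any newer version and keep rows that found none.
anti_join = db.execute("""
    SELECT vv.id FROM view_version vv
    LEFT JOIN view_version AS best
        ON best.viewId = vv.viewId AND best.timeMod > vv.timeMod
    WHERE best.timeMod IS NULL
    ORDER BY vv.id
""").fetchall()

print(max_join)   # [(2,), (3,)]
print(anti_join)  # [(2,), (3,)]
```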
Using Doctrine QueryBuilder
$em = $this->getDoctrine()->getManager();
$viewVersionRepo = $em->getRepository('GutensiteCmsBundle:View\ViewVersion');
$queryBuilder = $viewVersionRepo->createQueryBuilder('vv')
    // Join Best Version
    ->leftJoin('GutensiteCmsBundle:View\ViewVersion', 'bestVersion', 'WITH', 'bestVersion.viewId = vv.viewId AND bestVersion.timeMod > vv.timeMod')
    // Join other Entities
    ->join('vv.view', 'view')
    ->addSelect('view')
    ->join('view.contentType', 'contentType')
    ->addSelect('contentType')
    // Perform random filters
    ->andWhere('vv.siteId = :siteId')->setParameter('siteId', 1)
    // Keep only rows for which no newer version was joined
    ->andWhere('bestVersion.timeMod IS NULL')
    ->andWhere('view.contentTypeId IN(:contentTypeId)')->setParameter('contentTypeId', $contentTypeIds)
    ->addOrderBy('vv.title', 'ASC');
$query = $queryBuilder->getQuery();
$results = $query->getResult();
In terms of performance, it really depends on the dataset. See this discussion for details.
TIP: The table should include indexes on both these values (viewId and timeMod) to speed up results. I don't know if it would also benefit from a single index on both fields.
A native SQL query using the original JOIN method may be better in some cases, but compiling the query over an extended range of code that dynamically creates it, and getting the mappings correct is a pain. So this is at least an alternative solution that I hope helps others.

NHibernate INNER JOIN on a SubQuery

I would like to do a subquery and then inner join the result of that to produce a query. I want to do this as I have tested an inner join query and it seems to be far more performant on MySql when compared to a straight IN subquery.
Below is a very basic example of the type of sql I am trying to reproduce.
Tables
ITEM
ItemId
Name
ITEMRELATIONS
ItemId
RelationId
Example Sql I would Like to create
Give me the COUNT of RELATIONs for ITEMs having a name of 'bob':
select ir.itemId, count(ir.relationId)
from ItemRelations ir
inner join (select itemId from Items where name = 'bob') sq
on ir.itemId = sq.itemId
group by ir.itemId
The base Nhibernate QueryOver
var bobItems = QueryOver.Of<Item>(() => itemAlias)
.Where(() => itemAlias.Name == "bob")
.Select(Projections.Id());
var bobRelationCount = session.QueryOver<ItemRelation>(() => itemRelationAlias)
.Inner.Join(/* Somehow join the detached criteria here on the itemId */)
.SelectList(
list =>
list.SelectGroup(() => itemRelationAlias.ItemId)
.WithAlias(() => itemRelationCountAlias.ItemId)
.SelectCount(() => itemRelationAlias.ItemRelationId)
.WithAlias(() => itemRelationCountAlias.Count))
.TransformUsing(Transformers.AliasToBean<ItemRelationCount>())
.List<ItemRelationCount>();
I know it may be possible to refactor this into a single query, but the above is merely a simple example. I cannot change the detached QueryOver, as it is handed to my bit of code and is used in other parts of the system.
Does anyone know if it is possible to do an inner join on a detached criteria?
MySql 5.6.5 has addressed the performance issue related to the query structure.
See here: http://bugs.mysql.com/bug.php?id=42259
No need for me to change the output format of my NHibernate queries anymore. :)
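For completeness, the two query shapes being compared return the same rows; only the execution plan differed on older MySQL versions. A minimal sketch with Python's sqlite3 module, using invented rows and snake_case names for the two tables from the question, illustrates the equivalence (it says nothing about performance):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_relations (item_id INTEGER, relation_id INTEGER);
    INSERT INTO items VALUES (1, 'bob'), (2, 'bob'), (3, 'al');
    INSERT INTO item_relations VALUES (1, 100), (1, 101), (2, 102), (3, 103);
""")

# Inner join on a derived table, as in the question.
join_counts = db.execute("""
    SELECT ir.item_id, COUNT(ir.relation_id)
    FROM item_relations ir
    INNER JOIN (SELECT item_id FROM items WHERE name = 'bob') sq
        ON ir.item_id = sq.item_id
    GROUP BY ir.item_id
    ORDER BY ir.item_id
""").fetchall()

# The straight IN subquery form.
in_counts = db.execute("""
    SELECT ir.item_id, COUNT(ir.relation_id)
    FROM item_relations ir
    WHERE ir.item_id IN (SELECT item_id FROM items WHERE name = 'bob')
    GROUP BY ir.item_id
    ORDER BY ir.item_id
""").fetchall()

print(join_counts)  # [(1, 2), (2, 1)]
print(in_counts)    # [(1, 2), (2, 1)]
```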

Performing Join with Multiple Criteria in Propel 1.5

This question follows on from the questions here and here.
I have recently upgraded to Propel 1.5, and have started using its Query features over Criteria. I have a query I cannot translate, however: a left join with multiple criteria.
SELECT * FROM person
LEFT JOIN group_membership ON
person.id = group_membership.person_id
AND group_id = 1
WHERE group_membership.person_id is null;
Its aim is to find all people not in the specified group. Previously I was using the following code to accomplish this:
$criteria->addJoin(array(
self::ID,
GroupMembershipPeer::GROUP_ID,
), array(
GroupMembershipPeer::PERSON_ID,
$group_id,
),
Criteria::LEFT_JOIN);
$criteria->add(GroupMembershipPeer::PERSON_ID, null, Criteria::EQUAL);
I considered performing a query for all people in that group, getting the primary keys and adding a NOT IN on the array, but there didn't seem a particularly easy way to get the primary keys from a find, and it didn't seem very elegant.
An article on codenugget.org details how to add extra criteria to a join, which I attempted:
$result = $this->leftJoin('GroupMembership');
$result->getJoin('GroupMembership')
->addCondition(GroupMembershipPeer::GROUP_ID, $group->getId());
return $result
->useGroupMembershipQuery()
->filterByPersonId(null)
->endUse();
Unfortunately, the 'useGroupMembershipQuery' overrides the left join. To solve this, I tried the following code:
$result = $this
->useGroupMembershipQuery('GroupMembership', Criteria::LEFT_JOIN)
->filterByPersonId(null)
->endUse();
$result->getJoin('GroupMembership')
->addCondition(GroupMembershipPeer::GROUP_ID, $group->getId());
return $result;
For some reason this results in a cross join being performed:
SELECT * FROM `person`
CROSS JOIN `group_membership`
LEFT JOIN group_membership GroupMembership ON
(person.ID=GroupMembership.PERSON_ID
AND group_membership.GROUP_ID=3)
WHERE group_membership.PERSON_ID IS NULL
Does anyone know why this might be doing this, or how one might perform this join successfully in Propel 1.5, without having to resort to Criteria, again?
Propel 1.6 supports multiple criteria on joins with addJoinCondition(). If you update the Symfony plugin, or move to sfPropelORMPlugin, you can take advantage of that. The query can then be written like this:
return $this
->leftJoin('GroupMembership')
->addJoinCondition('GroupMembership', 'GroupMembership.GroupId = ?', $group->getId())
->where('GroupMembership.PersonId IS NULL');
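The reason the group condition has to live in the join rather than in the WHERE clause can be seen from the SQL alone. The sketch below uses Python's sqlite3 module with made-up rows and a hypothetical `names` helper; it is not Propel code. With the condition in ON, unmatched people keep a NULL membership row and are returned; with the condition in WHERE, no row can satisfy both tests.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE group_membership (person_id INTEGER, group_id INTEGER);
    INSERT INTO person VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    -- ann is in group 1, bob is in group 2, cy is in no group.
    INSERT INTO group_membership VALUES (1, 1), (2, 2);
""")

def names(sql, group_id):
    """Run the query for one group id and return the person names."""
    return [row[0] for row in db.execute(sql, (group_id,))]

# Group condition inside the join: finds everyone NOT in the group.
in_on = names("""
    SELECT person.name FROM person
    LEFT JOIN group_membership gm
        ON person.id = gm.person_id AND gm.group_id = ?
    WHERE gm.person_id IS NULL
    ORDER BY person.id
""", 1)

# Group condition moved to WHERE: a joined row cannot have group_id = 1
# and a NULL person_id at the same time, so nothing is returned.
in_where = names("""
    SELECT person.name FROM person
    LEFT JOIN group_membership gm ON person.id = gm.person_id
    WHERE gm.group_id = ? AND gm.person_id IS NULL
    ORDER BY person.id
""", 1)

print(in_on)     # ['bob', 'cy']
print(in_where)  # []
```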