How to query distinct on a joined column? - sqlalchemy

This is my code. I would like the UserCheckpoint.checkpoint values to be distinct, meaning that every UserCheckpoint returned by the query should have a different Checkpoint object.
friends_ucp = (db.session.query(UserCheckpoint).
               join(UserCheckpoint.checkpoint).
               filter(radius_cond).
               filter(Checkpoint.demo == False).
               filter(UserCheckpoint.user_id.in_(friends))
               )
How can I do it? Thanks.

I solved this using group_by().
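For anyone wondering what grouping on the joined column looks like in practice, here is a minimal sketch using Python's built-in sqlite3 rather than SQLAlchemy, so it runs without a database server. The table layout, the checkpoint_id column name and the sample rows are assumptions made for illustration, and radius_cond is left out. In SQLAlchemy the equivalent would presumably be a .group_by() on the checkpoint foreign key column.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the two models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE checkpoint (id INTEGER PRIMARY KEY, demo INTEGER NOT NULL);
    CREATE TABLE user_checkpoint (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        checkpoint_id INTEGER NOT NULL REFERENCES checkpoint(id)
    );
    INSERT INTO checkpoint VALUES (1, 0), (2, 0), (3, 1);
    INSERT INTO user_checkpoint VALUES
        (10, 100, 1), (11, 101, 1), (12, 100, 2), (13, 102, 3);
""")

friends = [100, 101, 102]
placeholders = ", ".join("?" for _ in friends)

# GROUP BY the joined column so each checkpoint appears once;
# MIN(uc.id) picks one representative user_checkpoint per group.
rows = conn.execute(
    f"""
    SELECT MIN(uc.id) AS user_checkpoint_id, uc.checkpoint_id
    FROM user_checkpoint AS uc
    JOIN checkpoint AS c ON c.id = uc.checkpoint_id
    WHERE c.demo = 0 AND uc.user_id IN ({placeholders})
    GROUP BY uc.checkpoint_id
    ORDER BY uc.checkpoint_id
    """,
    friends,
).fetchall()

print(rows)
```

Two friends share checkpoint 1, but only one row comes back for it. Which row represents each group is a choice you have to make explicitly (here the lowest id).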

Related

SELECT statement inside a CASE statement in SNOWFLAKE

I have a query where "TEST"."TABLE" is LEFT JOINed to PUBLIC."SchemaKey". In my final SELECT I have a CASE expression: when c."Type" = 'FOREIGN', I want to grab a value from another table, but the name of that table comes from a column of the left-joined table. I've tried multiple ways to get this to work but keep getting an error, although it works if I hard-code the table name. I need the table name to come from c."FullParentTableName". Is this possible in Snowflake, and is there a way to make it work? Any help would be appreciated!
SELECT
c."ParentColumn",
c."FullParentTableName",
a."new_value",
a."column_name",
CASE WHEN c."Type" = 'FOREIGN' THEN (SELECT "Name" FROM TABLE(c."FullParentTableName") WHERE "Id" = 'SOME_ID') ELSE null END "TestColumn" -- Need assistance on this line...
FROM "TEST"."TABLE" a
LEFT JOIN (
select s."Type", s."ParentSchema", s."ParentTable", s."ParentColumn", concat(s."ParentSchema",'.','"',s."ParentTable",'"') "FullParentTableName",s."ChildSchema", s."ChildTable", trim(s."ChildColumn",'"') "ChildColumn"
from PUBLIC."SchemaKey" as s
where s."Type" = 'FOREIGN'
and s."ChildTable" = 'SOMETABLENAME'
and "ChildSchema" = 'SOMESCHEMANAME'
) c
on a."column_name" = c."ChildColumn"
Thanks!
In Snowflake you cannot use a value computed per row as a table name.
You can bind a single value to a table name with IDENTIFIER(), but that gives you one table name for the whole statement, not one per row.
You could write a Snowflake Scripting block, but it would still need to join the N tables explicitly. So if N is fixed, you should just join those tables directly.
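To illustrate the "just join the N tables" idea, here is a rough sketch with N = 2, using Python's sqlite3 as a stand-in for Snowflake. The table names (account, contact, change_log, schema_key), their columns and the sample data are all hypothetical, and the lookup uses new_value as the parent Id, which is an assumption rather than something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical parent tables: the fixed set of N candidates.
    CREATE TABLE account (Id TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE contact (Id TEXT PRIMARY KEY, Name TEXT);
    -- Stand-ins for "TEST"."TABLE" and PUBLIC."SchemaKey".
    CREATE TABLE change_log (column_name TEXT, new_value TEXT);
    CREATE TABLE schema_key (ChildColumn TEXT, ParentTable TEXT);

    INSERT INTO account VALUES ('A1', 'Acme');
    INSERT INTO contact VALUES ('C1', 'Jane');
    INSERT INTO change_log VALUES
        ('AccountId', 'A1'), ('ContactId', 'C1'), ('Notes', 'hello');
    INSERT INTO schema_key VALUES
        ('AccountId', 'account'), ('ContactId', 'contact');
""")

# Each candidate table gets its own LEFT JOIN, guarded by the table name
# stored in the key row; CASE then picks the value from the matching join.
rows = conn.execute("""
    SELECT a.column_name,
           CASE k.ParentTable
               WHEN 'account' THEN acc.Name
               WHEN 'contact' THEN con.Name
           END AS TestColumn
    FROM change_log AS a
    LEFT JOIN schema_key AS k ON k.ChildColumn = a.column_name
    LEFT JOIN account AS acc
           ON k.ParentTable = 'account' AND acc.Id = a.new_value
    LEFT JOIN contact AS con
           ON k.ParentTable = 'contact' AND con.Id = a.new_value
    ORDER BY a.column_name
""").fetchall()

print(rows)
```

The table names end up as string literals in the query text, which is the price of not being able to resolve them per row. Rows without a foreign key entry simply get NULL.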

NodeJS + MySQL, UPDATE with a JOIN, ambiguous column name

First of all, I understand why I'm getting this error message, and I know of a way to solve it, but I'm hoping for something more efficient than what I have in mind. Here is basically what I have:
UPDATE customer c
JOIN customer d ON c.customer_id = d.parent_customer_id
SET ?
WHERE d.customer_type = "Big Cheese";
So, the data being fed into the "?" parameter looks like this:
{"customer_id": 10, "customer_name": "Cheese-It", ... }
The problem is, since I'm joining on a table that is basically itself, all of the columns have the same name. The only way I know how to fix this is to edit the JSON and prefix all of the fields with the alias they need:
{"c.customer_id": 10, "c.customer_name": "Cheese-It", ... }
I was hoping for a more elegant way of going about this. Is there a way to refactor my SQL so that it knows which table alias I want to update? Any ideas?
A subquery will do what you want, but it's actually less efficient, as subqueries inside the WHERE clause are generally performance killers. I assume you are already turning the JSON into SQL somewhere, so I would simply add the alias at that point.
Anyway, for reference, here's how you can refactor the SQL to not need an alias:
UPDATE customer
SET ?
WHERE customer_id IN (
    SELECT c.customer_id
    FROM customer c
    JOIN customer d ON c.customer_id = d.parent_customer_id
    WHERE d.customer_type = 'Big Cheese'
);
NOTE: this is untested
EDIT:
On second thought, an EXISTS clause would be slightly better for performance:
UPDATE customer c
SET ?
WHERE EXISTS (
    SELECT 1
    FROM customer d
    WHERE d.parent_customer_id = c.customer_id
    AND d.customer_type = 'Big Cheese'
);
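Either approach avoids the ambiguity: as long as you don't have a JOIN in the UPDATE, there is only one table the SET columns can reference, so you will not get the ambiguous column name error. Be aware that MySQL may refuse a subquery that reads from the same table being updated (error 1093), in which case the subquery has to be wrapped in a derived table.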
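Since the queries above are untested, here is a small runnable sketch of the EXISTS variant using Python's sqlite3. SQLite is only a stand-in here and its rules differ from MySQL's, so treat this as a demonstration of the query shape, not as proof that MySQL accepts it. The sample rows are made up, and a named parameter replaces the "SET ?" object expansion, which is a feature of the Node mysql driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        parent_customer_id INTEGER,
        customer_name TEXT,
        customer_type TEXT
    );
    INSERT INTO customer VALUES
        (1, NULL, 'Parent A', 'Regular'),
        (2, 1,    'Child A',  'Big Cheese'),
        (3, NULL, 'Parent B', 'Regular'),
        (4, 3,    'Child B',  'Small Fry');
""")

# No JOIN in the UPDATE itself, so the SET column is unambiguous.
# The correlated subquery checks for a 'Big Cheese' child of each row.
conn.execute(
    """
    UPDATE customer
    SET customer_name = :customer_name
    WHERE EXISTS (
        SELECT 1
        FROM customer AS d
        WHERE d.parent_customer_id = customer.customer_id
          AND d.customer_type = 'Big Cheese'
    )
    """,
    {"customer_name": "Cheese-It"},
)

names = dict(conn.execute("SELECT customer_id, customer_name FROM customer"))
print(names)
```

Only customer 1, the parent of a 'Big Cheese' row, is renamed. The other rows keep their names.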
I know this is an older question, but I found a better solution that doesn't have the performance hit. You can add the alias to your property names in the object you're updating.
Here is a helper function that prefixes the allowed property names with an alias:
// Whitelist of columns that may be updated.
const allowUpdate = ['name'];

function addUpdateAlias(updated, alias) {
  const validUpdate = {};
  for (const p of Object.keys(updated)) {
    // Copy only whitelisted properties, prefixed with the table alias.
    if (allowUpdate.includes(p)) {
      validUpdate[`${alias}.${p}`] = updated[p];
    }
  }
  return validUpdate;
}
Now wrap the object you want to update with the function above and the alias is applied in the update.
Your parameters would then be [addUpdateAlias(customer, 'c')], passed to your original query.

Query table based on input parameter

I have an input parameter to a query I'm trying to write. Basically, if mostRecentSnapshot == true then I want to select only the most recent records from the process run (basically where max(creationDate)) and if mostRecentSnapshot == false then select creationDate and other columns normally.
To me, it makes sense to put this if statement in the FROM clause, but I don't think that's possible. Normally I would use a CTE, but those don't exist in MySQL.
What is the best way to achieve this?
It would be something along the lines of this:
SELECT
CASE mostRecentSnapshot WHEN FALSE THEN
(
processauditheader.creationDate as processCreationDate,
processauditheader.processName,
processauditheader.processType,
processauditheader.processHost,
processauditheader.processDatabase,
processauditheader.tableAudited,
processauditheader.processInvokedByName,
processauditheader.processInvokedByType,
processauditheader.processInvokedByDatabase,
processauditheader.processIntervalValue,
processauditheader.processIntervalField,
processauditheader.auditScenarios,
processaudititerationdetail.creationDate as iterationDate,
processaudititerationdetail.connectionId,
processaudititerationdetail.processDate,
processaudititerationdetail.tableRowCount,
processaudititerationdetail.tableRowCountLastDay,
processaudititerationdetail.previousProcessAuditIterationDetailID,
processauditmetricdetail.creationDate,
processauditmetricdetail.processAuditIterationDetailID,
processauditmetricdetail.auditMetric,
processauditmetricdetail.auditTotal,
processauditmetricdetail.auditExample
)
WHEN TRUE THEN
(
((SELECT MAX(processaudititerationdetail.creationDate) as maxSnapshot,
processaudititerationdetail.id
from reporting_audit.processaudititerationdetail
group by processaudititerationdetail.creationDate) mostRecent
JOIN reporting_audit.processaudititerationdetail ON mostRecent.id = processaudititerationdetail.id)
)
END
FROM reporting_audit.processauditheader
JOIN processaudititerationdetail ON processaudititerationdetail.processAuditHeaderID = processauditheader.id
LEFT JOIN processauditmetricdetail ON processauditmetricdetail.processAuditIterationDetailID = processaudititerationdetail.id
A query can only return a fixed set of columns. Perhaps the following does what you want:
select paid.*
from reporting_audit.processaudititerationdetail paid
where (not v_mostRecentSnapshot) or
      paid.creationDate = (select max(paid2.creationDate) from reporting_audit.processaudititerationdetail paid2);
It will either select all records from the table or only the record(s) that have the most recent creationDate.
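Here is a quick way to see both branches of that WHERE clause in action, sketched with Python's sqlite3 instead of MySQL. The table is cut down to two columns, the sample dates are invented, and the fetch_ids helper and the :most_recent parameter name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processaudititerationdetail (
        id INTEGER PRIMARY KEY,
        creationDate TEXT NOT NULL
    );
    INSERT INTO processaudititerationdetail VALUES
        (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-02');
""")

# One fixed query; the flag switches the date restriction on or off.
QUERY = """
    SELECT paid.id
    FROM processaudititerationdetail AS paid
    WHERE (NOT :most_recent)
       OR paid.creationDate = (
            SELECT MAX(paid2.creationDate)
            FROM processaudititerationdetail AS paid2
       )
    ORDER BY paid.id
"""


def fetch_ids(most_recent):
    """Return matching ids; most_recent=True keeps only the latest date."""
    rows = conn.execute(QUERY, {"most_recent": int(most_recent)})
    return [row[0] for row in rows]


print(fetch_ids(False))
print(fetch_ids(True))
```

With the flag off every row is returned. With it on, only the rows sharing the latest creationDate remain, which can be more than one row.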

count(*) in mysql returning only one record

I want to count the matching records from another table in the same SELECT statement. I used a LEFT JOIN and put count(ag.*) in the SELECT. See the example:
$q = Doctrine_Query::create()
    ->select("a.answer_id,a.date_added , count(ag.content_id) AS agree_count")
    ->from('Answer a')
    ->leftJoin("a.Agree ag ON a.answer_id = ag.content_id AND ag.content_type = 'answer' ")
    ->where('a.question_id= ? ', $questionId);
But it's only returning the first record. Can I fix that, or should I create another table used only for counting?
You are missing a GROUP BY in your query.
When you don't have a GROUP BY clause, it's normal to get only one row.
Count(*) will only return one record if you don't use Group By. You are asking it to count all the records, so there can be only one result.
The count() SQL function changes how results are returned from the database: without a GROUP BY the database will only return one record, regardless of the other columns in the SELECT.
If you add:
group by a.answer_id
to the end of your SQL query, that might do what you mean.
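To make the difference concrete, here is a small sketch using Python's sqlite3 in place of MySQL and Doctrine. The answer and agree tables are simplified guesses at the schema, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answer (answer_id INTEGER PRIMARY KEY, question_id INTEGER);
    CREATE TABLE agree (content_id INTEGER, content_type TEXT);
    INSERT INTO answer VALUES (1, 7), (2, 7), (3, 7);
    INSERT INTO agree VALUES (1, 'answer'), (1, 'answer'), (3, 'answer');
""")

BASE = """
    SELECT a.answer_id, COUNT(ag.content_id) AS agree_count
    FROM answer AS a
    LEFT JOIN agree AS ag
           ON ag.content_id = a.answer_id AND ag.content_type = 'answer'
    WHERE a.question_id = ?
"""

# Without GROUP BY the aggregate collapses everything into a single row.
without_group = conn.execute(BASE, (7,)).fetchall()

# With GROUP BY there is one row (and one count) per answer.
with_group = conn.execute(
    BASE + " GROUP BY a.answer_id ORDER BY a.answer_id", (7,)
).fetchall()

print(len(without_group), with_group)
```

The ungrouped query returns one row whose count covers all three agrees. The grouped one returns a row per answer, including a zero for the answer nobody agreed with, thanks to the LEFT JOIN.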

How to avoid filesort for that mysql query?

I'm running queries of this kind with different parameters:
EXPLAIN SELECT SQL_NO_CACHE `ilan_genel`.`id` , `ilan_genel`.`durum` , `ilan_genel`.`kategori` , `ilan_genel`.`tip` , `ilan_genel`.`ozellik` , `ilan_genel`.`m2` , `ilan_genel`.`fiyat` , `ilan_genel`.`baslik` , `ilan_genel`.`ilce` , `ilan_genel`.`parabirimi` , `ilan_genel`.`tarih` , `kgsim_mahalleler`.`isim` AS mahalle, `kgsim_ilceler`.`isim` AS ilce, (
SELECT `ilanresimler`.`resimlink`
FROM `ilanresimler`
WHERE `ilanresimler`.`ilanid` = `ilan_genel`.`id`
LIMIT 1
) AS resim
FROM (
`ilan_genel`
)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `kgsim_mahalleler` ON `kgsim_mahalleler`.`id` = `ilan_genel`.`mahalle`
WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
LIMIT 225 , 15
This is what I get in the EXPLAIN output:
These are the indexes I have already tried:
Any help will be deeply appreciated. What kind of index would be the best option, or should I use a different table structure?
You should first simplify your query to understand your problem better. As your problem appears to be confined to the ilan_genel table, the following query should show the same symptoms:
SELECT * from ilan_genel WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
So the first thing to do is check that this is the case. If so, the simpler question is why this query requires a file sort on 3661 rows. Now the 'hepsi' index sort order is:
ilce->mahalle->durum->kategori->tip->ozellik
I've written it that way to emphasise that it is first sorted on 'ilce', then 'mahalle', then 'durum', etc. Note that your query does not specify the 'mahalle' value, so the best the index can do is a lookup on 'ilce'. I don't know how your data is distributed, but the next logical step in debugging this would be:
SELECT * from ilan_genel WHERE `ilan_genel`.`ilce` = '703'
Does this return 3661 rows?
If so, you should be able to see what is happening. The database is using the hepsi index to the best of its ability, getting 3661 rows back, then sorting those rows in order to eliminate values according to the other criteria (i.e. 'durum', 'kategori', 'tip').
The key point here is that if data is sorted by A, B, C in that order and B is not specified, then the best logical thing that can be done is: first a look up on A then a filter on the remaining values against C. In this case, that filter is performed via a file sort.
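The A, B, C point can be tried out without a database. Below is a toy model in Python, assuming an index behaves like a sorted list of tuples; the index contents and the lookup helper are invented for illustration, and a real MySQL B-tree has more tricks than this.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries (A, B, C, row_id), kept sorted like an index.
index = sorted([
    (703, 1, 1, 101),
    (703, 1, 2, 102),
    (703, 2, 1, 103),
    (703, 3, 1, 104),
    (704, 1, 1, 105),
])


def lookup(a, c):
    """Find rows with A == a and C == c when B is not given."""
    # Only the leading column narrows the search: binary search on A.
    lo = bisect_left(index, (a,))
    hi = bisect_right(index, (a, float("inf")))
    candidates = index[lo:hi]
    # Matching C values are scattered across the B groups inside that
    # range, so every candidate has to be checked one by one.
    matches = [row_id for (_, _, c_val, row_id) in candidates if c_val == c]
    return len(candidates), matches


print(lookup(703, 1))
```

For A = 703 and C = 1 the lookup has to examine all four entries with A = 703 to find the three matches. That extra scanning is what an index ordered A->C (without B in between) would avoid.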
Possible solutions
Supply 'mahalle' (B) in your query.
Add a new index on 'ilan_genel' that doesn't require 'mahalle', i.e. A->C->D...
Another tip
In case I have misdiagnosed your problem (easy to do when I don't have your system to test against), the important thing here is the approach to solving the problem. In particular, how to break a complicated query into a simpler query that produces the same behaviour, until you get to a very simple SELECT statement that demonstrates the problem. At this point, the answer is usually much clearer.