In relation to the answer I accepted for this post, SQL Group By and Limit issue, I need to figure out how to create that query using SQLAlchemy. For reference, the query I need to run is:
SELECT t.id, t.creation_time, c.id, c.creation_time
FROM (SELECT id, creation_time
FROM thread
ORDER BY creation_time DESC
LIMIT 5
) t
LEFT OUTER JOIN comment c ON c.thread_id = t.id
WHERE 3 >= (SELECT COUNT(1)
FROM comment c2
WHERE c.thread_id = c2.thread_id
AND c.creation_time <= c2.creation_time
)
I have the first half of the query, but I am struggling with the syntax for the WHERE clause and how to combine it with the JOIN. Does anyone have any suggestions?
Thanks!
EDIT: First attempt seems to mess up around the .filter() call:
c = aliased(Comment)
c2 = aliased(Comment)
subq = db.session.query(Thread.id).filter_by(topic_id=122098).order_by(Thread.creation_time.desc()).limit(2).offset(2).subquery('t')
subq2 = db.session.query(func.count(1).label("count")).filter(c.id==c2.id).subquery('z')
q = db.session.query(subq.c.id, c.id).outerjoin(c, c.thread_id==subq.c.id).filter(3 >= subq2.c.count)
This generates the following SQL:
SELECT t.id AS t_id, comment_1.id AS comment_1_id
FROM (SELECT count(1) AS count
FROM comment AS comment_1, comment AS comment_2
WHERE comment_1.id = comment_2.id) AS z, (SELECT thread.id AS id
FROM thread
WHERE thread.topic_id = :topic_id ORDER BY thread.creation_time DESC
LIMIT 2 OFFSET 2) AS t LEFT OUTER JOIN comment AS comment_1 ON comment_1.thread_id = t.id
WHERE z.count <= 3
Notice that the sub-query ordering is incorrect, and that subq2 somehow selects from comment twice. Manually fixing that gives the right results; I am just unsure how to get SQLAlchemy to generate it correctly.
Try this:
c = db.aliased(Comment, name='c')
c2 = db.aliased(Comment, name='c2')
sq = (db.session
.query(Thread.id, Thread.creation_time)
.order_by(Thread.creation_time.desc())
.limit(5)
).subquery(name='t')
sq2 = (
db.session.query(db.func.count(1))
.select_from(c2)
.filter(c.thread_id == c2.thread_id)
.filter(c.creation_time <= c2.creation_time)
.correlate(c)
.as_scalar()
)
q = (db.session
.query(
sq.c.id, sq.c.creation_time,
c.id, c.creation_time,
)
.outerjoin(c, c.thread_id == sq.c.id)
.filter(3 >= sq2)
)
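To see what the correlated COUNT filter in the target SQL actually selects, here is a minimal sketch that runs the same query shape against an in-memory SQLite database. The sample rows and the final ORDER BY are assumptions added for illustration; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thread (id INTEGER PRIMARY KEY, creation_time INTEGER);
CREATE TABLE comment (id INTEGER PRIMARY KEY, thread_id INTEGER, creation_time INTEGER);
""")
# Hypothetical sample data: two threads, five comments on thread 1, none on thread 2.
conn.executemany("INSERT INTO thread VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany(
    "INSERT INTO comment (thread_id, creation_time) VALUES (?, ?)",
    [(1, t) for t in range(1, 6)],
)

rows = conn.execute("""
SELECT t.id, c.creation_time
FROM (SELECT id, creation_time FROM thread
      ORDER BY creation_time DESC LIMIT 5) t
LEFT OUTER JOIN comment c ON c.thread_id = t.id
WHERE 3 >= (SELECT COUNT(1) FROM comment c2
            WHERE c.thread_id = c2.thread_id
              AND c.creation_time <= c2.creation_time)
ORDER BY t.id, c.creation_time DESC  -- added only to make the output order stable
""").fetchall()

# The three newest comments of thread 1 survive; thread 2 is kept with a NULL comment.
print(rows)
```

A thread without comments still appears because the correlated count evaluates to 0 for the NULL-extended row, so the 3 >= ... condition holds.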
I have the following tables:
Task (id,....)
TaskPlan (id, task_id,.......,end_at)
Note that end_at is a timestamp and that one Task has many TaskPlans. I need to query for the MAX end_at for each Task.
This query works fine, except when different TaskPlans have exactly the same timestamp. In that case, multiple TaskPlans with the MAX end_at are returned for the same Task.
I know this is an unlikely situation, but is there any way I can limit the number of results for each task_id to 1?
My current code is:
SELECT * FROM Task AS t
INNER JOIN (
SELECT * FROM TaskPlan WHERE end_at in (SELECT MAX(end_at) FROM TaskPlan GROUP BY task_id )
) AS pt
ON pt.task_id = t.id
WHERE status = 'plan';
This works, except in the situation described above. How can I handle that case?
Also, in the subquery, instead of SELECT MAX(end_at) FROM TaskPlan GROUP BY task_id, is it possible to do something like this, so that I can use TaskPlan.id in the WHERE ... IN clause?
SELECT id, MAX(end_at) FROM TaskPlan GROUP BY task_id
When I try, it gives the following error:
SQL Error [1055] [42000]: Expression #1 of SELECT list is not in GROUP
BY clause and contains nonaggregated column 'TaskPlan.id' which is not
functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
Any explanation or suggestion would be very welcome!
Note on duplicate label: (Now reopened)
I have already studied this question, but it does not answer my situation, where there are multiple max values in the result and they need to be filtered down to one result row per group.
Use the id rather than the timestamp:
SELECT *
FROM Task AS t INNER JOIN
(SELECT tp.*
FROM TaskPlan tp
WHERE tp.id = (SELECT tp2.id FROM TaskPlan tp2 WHERE tp2.task_id = tp.task_id ORDER BY tp2.end_at DESC LIMIT 1)
) tp
ON tp.task_id = t.id
WHERE status = 'plan';
Or use IN with tuples:
SELECT *
FROM Task AS t INNER JOIN
(SELECT tp.*
FROM TaskPlan tp
WHERE (tp.task_id, tp.end_at) in (SELECT tp2.task_id, MAX(tp2.end_at)
FROM TaskPlan tp2
GROUP BY tp2.task_id
)
) tp
ON tp.task_id = t.id
WHERE status = 'plan';
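The two forms above behave differently when end_at is tied, which a small experiment makes visible. This is a sketch against in-memory SQLite rather than MySQL, with made-up sample rows; the extra tp2.id DESC in the ORDER BY is an assumption added here so that the tie is broken deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Task (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE TaskPlan (id INTEGER PRIMARY KEY, task_id INTEGER, end_at TEXT);
INSERT INTO Task VALUES (1, 'plan'), (2, 'plan');
-- Task 1 has two plans tied on the latest end_at.
INSERT INTO TaskPlan VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-02-01'), (3, 1, '2020-02-01'),
  (4, 2, '2020-03-01');
""")

# Form 1: pick a single TaskPlan id per task with ORDER BY ... LIMIT 1.
by_id = conn.execute("""
SELECT t.id, tp.id, tp.end_at
FROM Task t
JOIN TaskPlan tp ON tp.task_id = t.id
WHERE tp.id = (SELECT tp2.id FROM TaskPlan tp2
               WHERE tp2.task_id = tp.task_id
               ORDER BY tp2.end_at DESC, tp2.id DESC LIMIT 1)
  AND t.status = 'plan'
ORDER BY t.id
""").fetchall()

# Form 2: match (task_id, end_at) tuples against the per-task maximum.
by_tuple = conn.execute("""
SELECT t.id, tp.id, tp.end_at
FROM Task t
JOIN TaskPlan tp ON tp.task_id = t.id
WHERE (tp.task_id, tp.end_at) IN (SELECT task_id, MAX(end_at)
                                  FROM TaskPlan GROUP BY task_id)
ORDER BY t.id, tp.id
""").fetchall()

print(by_id)     # one row per task
print(by_tuple)  # both tied plans of task 1 appear
```

On this data the id-based form returns exactly one row per task, while the tuple form still returns both tied rows, so only the first one addresses the duplicate-timestamp case from the question.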
If you want a list of task IDs with the MAX end_at for each, run the query below:
SELECT t.id, MAX(tp.end_at) FROM Task t JOIN TaskPlan tp on t.id = tp.task_id GROUP BY t.id;
EDIT:
Now I understand exactly what you are trying to do.
If the TaskPlan table is very large, you can avoid the GROUP BY and run the query below, which uses MySQL user variables to flag the first row of each task and should be more efficient:
SET @first_row := 0;
SET @task_id := 0;
SELECT * FROM Task t JOIN (
SELECT tp.*
, IF(@task_id = tp.`task_id`, @first_row := 0, @first_row := 1) AS temp
, @first_row AS latest_record
, @task_id := tp.`task_id`
FROM TaskPlan tp ORDER BY task_id, end_at DESC) a ON t.id = a.task_id AND a.latest_record = 1;
Try this query:
select t.ID, tp1.end_at
from TASK t
left join TASKPLAN tp1 on t.ID = tp1.task_id
left join TASKPLAN tp2 on t.ID = tp2.task_id and tp1.end_at < tp2.end_at
where tp2.end_at is null;
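The self-join idea is that a plan is the latest one when no other plan of the same task has a later end_at. Below is a minimal sketch in in-memory SQLite, assuming both joins are made on TaskPlan.task_id (the foreign key named in the question); the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Task (id INTEGER PRIMARY KEY);
CREATE TABLE TaskPlan (id INTEGER PRIMARY KEY, task_id INTEGER, end_at TEXT);
INSERT INTO Task VALUES (1), (2), (3);
INSERT INTO TaskPlan VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-02-01'), (3, 2, '2020-03-01');
""")

latest = conn.execute("""
SELECT t.id, tp1.end_at
FROM Task t
LEFT JOIN TaskPlan tp1 ON t.id = tp1.task_id
-- tp2 only matches when a strictly later plan exists for the same task
LEFT JOIN TaskPlan tp2 ON t.id = tp2.task_id AND tp1.end_at < tp2.end_at
WHERE tp2.end_at IS NULL
ORDER BY t.id
""").fetchall()

print(latest)
```

Because the comparison is strict, plans tied on the latest end_at would all be returned, so this pattern alone does not give one row per task when timestamps collide. A task with no plans at all (task 3 here) is still returned, with a NULL end_at.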
I want to speed up one of my slower queries.
The problem is that I can't access the outer column value within a subquery.
What I have:
SELECT r.id AS room_id, r.room_name, coalesce(d.score,0) AS total_messages, d.latest
FROM cf_rooms_time_frames tf
INNER JOIN cf_rooms r on r.id = tf.room_id
INNER JOIN(
SELECT cf.room_id, count(*) as score, max(cf.id) as latest
FROM cf_rooms_messages cf
WHERE EXISTS(
SELECT NULL FROM cf_rooms_time_frames tf
WHERE tf.start <= cf.id AND ( tf.end IS NULL OR tf.end >= cf.id )
AND tf.room_id = cf.room_id AND tf.uid = 8
)
GROUP BY cf.room_id
ORDER BY latest
DESC ) d on d.room_id = r.id
WHERE tf.uid = 8
ORDER BY coalesce(latest, score) DESC LIMIT 0, 20
What I want:
SELECT r.id AS room_id, r.room_name, coalesce(d.score,0) AS total_messages, d.latest
FROM cf_rooms_time_frames tf
INNER JOIN cf_rooms r on r.id = tf.room_id
INNER JOIN(
SELECT cf.room_id, count(*) as score, max(cf.id) as latest
FROM cf_rooms_messages cf
/* line added here */
WHERE cf.room_id = tf.room_id
/* */
AND EXISTS(
SELECT NULL FROM cf_rooms_time_frames tf
WHERE tf.start <= cf.id AND ( tf.end IS NULL OR tf.end >= cf.id )
AND tf.room_id = cf.room_id AND tf.uid = 8
)
GROUP BY cf.room_id
ORDER BY latest
DESC ) d on d.room_id = r.id
WHERE tf.uid = 8
ORDER BY coalesce(latest, score) DESC LIMIT 0, 20
I think the markup explains what the query does.
It finds the "chatrooms" for a given user, orders them by the last message, and gets both the total number of messages whose ids fall within a given range (the time frames) and the last message id.
I don't know why, but if I can trust EXPLAIN, the first query reads all rows of the chat message table (cf). It delivers the correct results but is rather slow on a huge table.
I tested the second one with a "hardcoded" room_id; it was very fast and didn't touch the whole table.
I have two queries.
UPDATE:
I need to do something like this:
SELECT poste_nom, ups_type_contrat,
(SELECT `entpro_date`
FROM ENT_PRO
WHERE entpro_user_id = 2
ORDER BY `entpro_id` DESC
LIMIT 1) ,
serv_nom,
serv_id_resp,
user_credit_cpf,
user_indice_salarial,
FLOOR( DATEDIFF( CURDATE( ) , user_dateentree ) /365 ) AS dateEntree
FROM USER
INNER JOIN USER_POSTE_SERVICE
ON USER.user_id= USER_POSTE_SERVICE.ups_poste_id
INNER JOIN POSTE
ON USER_POSTE_SERVICE.ups_poste_id = POSTE.poste_id
INNER JOIN SERVICE
ON USER_POSTE_SERVICE.ups_id_serv = SERVICE.serv_id
WHERE user_id = 2
ORDER BY user_nom ASC
Is it possible to combine the two queries into one?
From what I understand, you simply want to merge the result of your sub-query into your main SELECT. If so, you could try it this way:
SELECT poste_nom,
ups_type_contrat,
ENT_PRO_RESULT.entpro_date,
serv_nom,
serv_id_resp,
user_credit_cpf,
user_indice_salarial,
FLOOR( DATEDIFF( CURDATE( ) , user_dateentree ) /365 ) AS dateEntree
FROM USER
LEFT JOIN (SELECT entpro_date,
entpro_user_id
FROM ENT_PRO
ORDER BY entpro_id DESC
LIMIT 1) ENT_PRO_RESULT
ON USER.user_id = ENT_PRO_RESULT.entpro_user_id
INNER JOIN USER_POSTE_SERVICE
ON USER.user_id = USER_POSTE_SERVICE.ups_poste_id
INNER JOIN POSTE
ON USER_POSTE_SERVICE.ups_poste_id = POSTE.poste_id
INNER JOIN SERVICE
ON USER_POSTE_SERVICE.ups_id_serv = SERVICE.serv_id
WHERE user_id = 2
ORDER BY user_nom ASC
I've joined it on:
ON USER.user_id = ENT_PRO_RESULT.entpro_user_id
So you only need to specify the:
WHERE user_id = 2
And the sub-query will use the current row user id for the LEFT JOIN.
I have this QUERY:
select
a.*
from
mt_proyecto a,
mt_mockup b,
mt_diseno c,
mt_modulo d
where
a.estado = 'A' and
(
(b.encargado = '1' and b.idproyecto = a.idmtproyecto) or
(c.encargado = '1' and c.idproyecto = a.idmtproyecto) or
(d.encargado = '1' and d.idproyecto = a.idmtproyecto)
)
group by
a.idmtproyecto
order by a.finalizado asc, a.feccrea desc
Result:
Then, I run the same code on server with the same database:
Is there any problem with the query?
It seems that the query is running correctly on your server, and returning no rows. Please make sure you have the same table contents on your local machine and your server.
Other things:
Pro tip: never use SELECT * or SELECT table.* in software. Always enumerate the columns you want in your result set.
Unless you use GROUP BY with aggregate functions like SUM() or COUNT(), and name the correct columns from the result set, it returns unpredictable results. Read this: http://dev.mysql.com/doc/refman/5.1/en/group-by-extensions.html
I solved with this QUERY:
select
a.*
from
(
mt_proyecto a
left join mt_mockup b on
b.idproyecto = a.idmtproyecto
left join mt_diseno c on
c.idproyecto = a.idmtproyecto
left join mt_modulo d on
d.idproyecto = a.idmtproyecto
left join mt_integracion e on
e.idproyecto = a.idmtproyecto
left join mt_pruebas_internas f on
f.idproyecto = a.idmtproyecto
)
where
a.estado = 'A' and
(
(a.idmtproyecto = b.idproyecto and
b.encargado = '1' ) or
(a.idmtproyecto = c.idproyecto and
c.encargado = '1' ) or
(a.idmtproyecto = d.idproyecto and
d.encargado = '1' ) or
(a.idmtproyecto = e.idproyecto and
e.encargado = '1' ) or
(a.idmtproyecto = f.idproyecto and
f.encargado = '1' )
)
group by a.idmtproyecto
order by
a.finalizado asc, a.feccrea desc
Thanks, everyone, for your answers. I'll follow your suggestions.
I've got this query, but the result is wrong.
How can I use min() together with GROUP BY so that I get the lowest DiszOrder for each AthletenID?
Select
ar_Leistungen.`AthletenID`,
ar_Leistungen.`Leistung`,
ar_Leistungen.`Disziplin`,
ar_Leistungen.`Klasse`,
min(ar_Leistungen.`DiszOrder`),
ar_Athleten.`Vorname`,
ar_Athleten.`Jahrgang`,
ar_Wettkampf.`Wettkampfdatum`
from
ar_Leistungen,
ar_Athleten,
ar_Wettkampf
Where
ar_Athleten.ID = ar_Leistungen.AthletenID and
ar_Leistungen.WettkampfID = ar_Wettkampf.ID and
ar_Leistungen.`Disziplin` = '100' and
ar_Leistungen.`Leistung` > 0 and
(ar_Athleten.`Jahrgang` = '1995' or ar_Athleten.`Jahrgang` = '1994') and
ar_Wettkampf.`Wettkampfdatum` LIKE '%2013%'
Group By
AthletenID
Order by
DiszOrder Desc
Limit
0, 100
You can use a subquery that separately gets the lowest DiszOrder for each AthletenID and join it with the other tables, so that you can freely select the values of the other columns.
SELECT a.AthletenID,
a.Leistung,
a.Disziplin,
a.Klasse,
a.DiszOrder,
b.Vorname,
b.Jahrgang,
c.Wettkampfdatum
FROM ar_Leistungen a
INNER JOIN ar_Athleten b
ON b.ID = a.AthletenID
INNER JOIN ar_Wettkampf c
ON a.WettkampfID = c.ID
INNER JOIN
(
SELECT AthletenID, MIN(DiszOrder) DiszOrder
FROM ar_Leistungen
GROUP BY AthletenID
) d ON a.AthletenID = d.AthletenID AND
a.DiszOrder = d.DiszOrder
WHERE a.Disziplin = '100' AND
a.Leistung > 0 AND
(b.Jahrgang IN ('1995', '1994'))