Optimize subqueries with LEFT JOINs in MySQL

I hope you can help me! I currently have this query:
SELECT servicio.*,
c.num_cliente,c.nombre,c.operador,
mdl.modelo_marca,mdl.modelo_telcel,
col.colores,
prob.problema,
diag.notas,diag.solucion,diag.tipo_servicios AS tipo_servicios_diagnostico,diag.nuevo_imei AS nuevo_imei_diagnostico,
diag.nivel_repar,diag.notasqc AS diagnos_notasqc,diag.fecha AS diagnos_fecha,diag.notas AS diagnos_notas,
user_name.nombre AS name_user,user_name.apellido,
(SELECT revision.status2_ser FROM revision WHERE revision.status2_ser IN('ENTREGADO') AND servicio.id = revision.id_servicio ORDER BY revision.id DESC LIMIT 1) AS status2_ser,
(SELECT revision.fecha_status FROM revision WHERE revision.status2_ser IN('ENTREGADO') AND servicio.id = revision.id_servicio ORDER BY revision.id DESC LIMIT 1) AS fecha_status_revision,
(SELECT revision.status2_ser FROM revision WHERE revision.status2_ser IN('REPARADO') AND servicio.id = revision.id_servicio ORDER BY revision.id DESC LIMIT 1) AS status2_ser_repadado,
(SELECT revision.fecha_status FROM revision WHERE revision.status2_ser IN('REPARADO') AND servicio.id = revision.id_servicio ORDER BY revision.id DESC LIMIT 1) AS revi2_fecha_status_revision,
(SELECT env.guia_entrega FROM envios AS env WHERE JSON_CONTAINS(env.envio_grupal, JSON_ARRAY(CAST(servicio.id AS CHAR))) OR env.id_servicio = servicio.id ORDER BY env.id DESC LIMIT 1) as guia_entrega_envio,
(SELECT env2.fecha_envio FROM envios AS env2 WHERE JSON_CONTAINS(env2.envio_grupal, JSON_ARRAY(CAST(servicio.id AS CHAR))) OR env2.id_servicio = servicio.id ORDER BY env2.id DESC LIMIT 1) as guia_entrega_envio_fecha
FROM servicio
LEFT JOIN usuarios AS user_name ON servicio.id_user = user_name.id
LEFT JOIN clientes AS c ON servicio.id_cac = c.id
LEFT JOIN modelos AS mdl ON servicio.id_modelo = mdl.id
LEFT JOIN colores AS col ON servicio.id_color = col.id
LEFT JOIN problemas_genericos AS prob ON CAST(servicio.problema_generico AS UNSIGNED) = prob.id
LEFT JOIN diagnostico AS diag ON diag.id = (SELECT id FROM diagnostico AS diag WHERE diag.id_servicio = servicio.id AND diag.tipo_servicios <> '' ORDER BY diag.id DESC LIMIT 1)
WHERE
servicio.fecha_ingreso >= '2022-03-07 00:00:00' AND servicio.fecha_ingreso <= '2022-03-16 23:59:59' AND servicio.status_ser IN('ENTREGADO') AND servicio.id_marca = 1
ORDER BY servicio.id DESC
The query works, but performance is worse than expected: it sometimes takes up to 10 seconds to retrieve a little more than 1,000 records. The main table I query has approximately 210,000 records. Could someone help me optimize it?
This is my EXPLAIN output: [image not preserved]
Update: I rewrote my query as follows, but the performance did not change:
SELECT servicio.*,
c.num_cliente,c.nombre,c.operador,
mdl.modelo_marca,mdl.modelo_telcel,
col.colores,
prob.problema,
diag.notas,diag.solucion,diag.tipo_servicios AS tipo_servicios_diagnostico,diag.nuevo_imei AS nuevo_imei_diagnostico,
diag.nivel_repar,diag.notasqc AS diagnos_notasqc,diag.fecha AS diagnos_fecha,diag.notas AS diagnos_notas,
user_name.nombre AS name_user,user_name.apellido,
(SELECT env.guia_entrega FROM envios AS env WHERE JSON_CONTAINS(env.envio_grupal, JSON_ARRAY(CAST(servicio.id AS CHAR))) OR env.id_servicio = servicio.id ORDER BY env.id DESC LIMIT 1) as guia_entrega_envio,
(SELECT env2.fecha_envio FROM envios AS env2 WHERE JSON_CONTAINS(env2.envio_grupal, JSON_ARRAY(CAST(servicio.id AS CHAR))) OR env2.id_servicio = servicio.id ORDER BY env2.id DESC LIMIT 1) as guia_entrega_envio_fecha,
(CASE WHEN x.sta = 'ENTREGADO' THEN x.sta END) AS status2_ser,
(CASE WHEN x.sta = 'ENTREGADO' THEN x.g_fecha_status END) AS fecha_status_revision,
(CASE WHEN w.sta = 'REPARADO' THEN w.sta END) AS status2_ser_repadado,
(CASE WHEN w.sta = 'REPARADO' THEN w.g_fecha_status END) AS revi2_fecha_status_revision
FROM servicio
LEFT JOIN usuarios AS user_name ON servicio.id_user = user_name.id
LEFT JOIN clientes AS c ON servicio.id_cac = c.id
LEFT JOIN modelos AS mdl ON servicio.id_modelo = mdl.id
LEFT JOIN colores AS col ON servicio.id_color = col.id
LEFT JOIN problemas_genericos AS prob ON CAST(servicio.problema_generico AS UNSIGNED) = prob.id
LEFT JOIN diagnostico AS diag ON diag.id = (SELECT id FROM diagnostico AS diag WHERE diag.id_servicio = servicio.id AND diag.tipo_servicios <> '' ORDER BY diag.id DESC LIMIT 1)
LEFT JOIN
(SELECT revision.status2_ser AS sta, revision.id_servicio,max(revision.fecha_status) AS g_fecha_status
FROM revision
WHERE revision.status2_ser IN("ENTREGADO")
GROUP BY revision.id_servicio) x ON servicio.id = x.id_servicio
LEFT JOIN
(SELECT revision.status2_ser AS sta, revision.id_servicio,max(revision.fecha_status) AS g_fecha_status
FROM revision
WHERE revision.status2_ser IN("REPARADO")
GROUP BY revision.id_servicio) w ON servicio.id = w.id_servicio
WHERE
servicio.fecha_ingreso >= '2022-03-07 00:00:00' AND servicio.fecha_ingreso <= '2022-03-16 23:59:59' AND servicio.status_ser IN('ENTREGADO') AND servicio.id_marca = 1
ORDER BY servicio.id DESC

These indexes may help:
servicio: INDEX(id_marca, fecha_ingreso, status_ser)
servicio: INDEX(id_marca, status_ser, fecha_ingreso)
revision: INDEX(status2_ser, id_servicio, id, fecha_status)
envios: INDEX(envio_grupal, id_servicio, id, guia_entrega)
envios: INDEX(envio_grupal, id_servicio, id, fecha_envio)
diagnostico: INDEX(id_servicio, tipo_servicios, id)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
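As a rough illustration of the composite-index advice, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; MySQL's optimizer and EXPLAIN output differ). The index name idx_marca_status_fecha and the trimmed-down servicio table are hypothetical. It puts the two equality columns first and the range column last, then asks the planner which index it picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down, hypothetical version of the servicio table.
    CREATE TABLE servicio (
        id INTEGER PRIMARY KEY,
        id_marca INTEGER,
        status_ser TEXT,
        fecha_ingreso TEXT
    );
    -- Equality columns first, the range column last.
    CREATE INDEX idx_marca_status_fecha
        ON servicio (id_marca, status_ser, fecha_ingreso);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id FROM servicio
    WHERE id_marca = 1
      AND status_ser = 'ENTREGADO'
      AND fecha_ingreso >= '2022-03-07 00:00:00'
      AND fecha_ingreso <= '2022-03-16 23:59:59'
""").fetchall()

# The last field of each plan row is the human-readable detail text.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

In MySQL the equivalent check would be to run EXPLAIN on the real query and look at the key column for servicio.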
I see some cases of the same subquery being performed twice because you needed two columns. Recommend you use a LEFT JOIN to get both columns at the same time.
If you regularly need to test things inside a JSON column, especially if the values need CASTing, consider adding an extra column to the table so the test can be done without all the JSON overhead.
Replace ORDER BY servicio.id DESC with ORDER BY fecha_ingreso DESC, servicio.id DESC. (This may eliminate a sort without changing the effect.)
ON CAST(servicio.problema_generico AS UNSIGNED) = prob.id probably prevents use of an index. See if you can fix the datatypes to avoid the need for CAST.
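The advice above about fetching both columns with one LEFT JOIN, rather than running the same subquery twice, could look roughly like the following. This is a minimal sketch using Python's sqlite3 module in place of MySQL, with hypothetical sample rows and only the columns needed for the demonstration. The latest 'ENTREGADO' row in revision is located once per service, and both status2_ser and fecha_status are read from that single joined row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE servicio (id INTEGER PRIMARY KEY);
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        id_servicio INTEGER,
        status2_ser TEXT,
        fecha_status TEXT
    );
    -- Hypothetical sample data: service 2 has no revisions at all.
    INSERT INTO servicio (id) VALUES (1), (2);
    INSERT INTO revision (id, id_servicio, status2_ser, fecha_status) VALUES
        (10, 1, 'REPARADO',  '2022-03-08'),
        (11, 1, 'ENTREGADO', '2022-03-09'),
        (12, 1, 'ENTREGADO', '2022-03-10');
""")

rows = conn.execute("""
    SELECT s.id, ent.status2_ser, ent.fecha_status
    FROM servicio AS s
    -- One lookup of the newest matching revision row per service;
    -- every column of that row is then available through ent.
    LEFT JOIN revision AS ent
           ON ent.id = (SELECT MAX(r.id)
                        FROM revision AS r
                        WHERE r.id_servicio = s.id
                          AND r.status2_ser = 'ENTREGADO')
    ORDER BY s.id
""").fetchall()
print(rows)
```

A second join with a different alias would cover the 'REPARADO' columns in the same way. Whether this is actually faster on the real tables depends on the indexes available, so it is worth comparing EXPLAIN output before and after.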

Related

Optimizing MySQL query with multiple joins and Sub query

I am using the following query to get data from 10 tables. It works fine but is quite slow. Is there any way to optimize the query?
Query: SELECT emi.emi_due_date,users.usr_mobile,users.usr_id,concat_ws(" ",users.usr_fname,users.usr_mname,users.usr_lname) as borrower,users.usr_status,users.usr_curnt_city, users.usr_email,emi.loan_id,emi.emi_show_date,sum(emi.emi_amount)-sum(ifnull(emi.settled_amount,0)) as due_amount,cb.cb_type,blr.bloan_collection_executive_id,blr.pp_allow,blr.bloan_legal_team_id,blr.bloan_legal_team_status,concat_ws(" ",cp.cp_fname,cp.cp_lname) as cp_name,cp.cp_mobile,cp.cp_firm_name,cp.cp_type,bg.guarantor_name,bg.guarantor_contact,pl.ecs_date,pd.p2p_date,
(SELECT instrument FROM borrower_payment_master WHERE loan_id = emi.loan_id order by id desc limit 0,1) as last_pmode,
(SELECT IFNULL(DATE_FORMAT(emi_show_date - INTERVAL 1 MONTH,"%m-%Y"),"") FROM emi AS e WHERE e.loan_id=emi.loan_id and e.emi_status < 2 ORDER by e.emi_show_date ASC limit 1) as paid_till,
(select payment_date from borrower_payment_master as bp where bp.loan_id=emi.loan_id order by bp.id desc limit 1) as last_emi_paid FROM emi AS emi
INNER JOIN borrower_loan_reg_requests AS blr ON emi.loan_id=blr.bloan_id
INNER JOIN users AS users ON users.usr_id=blr.bloan_user_id
INNER JOIN borrower_loan_disbursed_funds AS blf ON blf.df_bloan_id=emi.loan_id
LEFT JOIN channel_partners AS cp ON cp.cp_id=users.usr_cp_referral_id
LEFT JOIN borrower_posted_loans AS pl ON pl.pl_bloan_id=emi.loan_id
LEFT JOIN collection_bucket AS cb ON cb.cb_loan_id=emi.loan_id AND cb.cb_status = 1
LEFT JOIN borrower_guarantors AS bg ON bg.guarantor_borrower_id=users.usr_id
LEFT JOIN p2p_dates AS pd ON pd.p2p_loan_id=emi.loan_id AND pd.p2p_status = 1
WHERE emi.emi_status<2 AND emi.emi_amount != 0
AND (SELECT count(*) FROM borrower_payment_master as pm WHERE pm.loan_id = emi.loan_id
AND MONTH(pm.payment_date) = "'.date('m').'" AND YEAR(pm.payment_date) = "'.date('Y').'") = 0
AND (select s.settlement_date as sdate from settlement as s WHERE emi.loan_id=s.loan_id limit 1) !=""
group by emi.loan_id order by emi.loan_id desc

Odd behavior combining multiple tables and using COALESCE

I have a big query that I have been struggling with and tweaking for awhile.
SELECT
tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID,
tastingNotes.note, user.userName,
COALESCE(sum(tasteNoteRate.score),0) as `score`
FROM
tastingNotes
INNER JOIN `user` on tastingNotes.userID = `user`.userID
LEFT JOIN tasteNoteRate on tastingNotes.noteID = tasteNoteRate.noteID
WHERE tastingNotes.beerID = 'C5RJc0'
GROUP BY tastingNotes.noteID
ORDER BY score DESC
LIMIT 0,50;
I am using COALESCE(sum(tasteNoteRate.score),0) so that notes without a score yet are returned with a value of zero.
The odd behavior is that when I should have had two results, the query returned only one note, with a score of zero.
When I then gave one of them a score, both showed up: one with its score and the second with zero.
Try
SELECT q.noteID, q.userID, q.beerID, q.note, q.score, u.userName
FROM (
SELECT n.noteID, n.userID, n.beerID, n.note, COALESCE(SUM(r.score), 0) score
FROM tastingNotes n LEFT JOIN tasteNoteRate r
ON n.noteID = r.noteID
WHERE n.beerID = 'C5RJc0'
GROUP BY n.noteID, n.userID, n.beerID, n.note
) q JOIN `user` u ON q.userID = u.userID
ORDER BY score DESC
LIMIT 50
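The pattern in the answer (aggregate in a LEFT JOIN grouped by the note, and let COALESCE turn a missing sum into zero) can be tried out in isolation. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module rather than MySQL, keeps only a few columns, and the sample rows and the alias total_score are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tastingNotes (noteID INTEGER PRIMARY KEY, beerID TEXT, note TEXT);
    CREATE TABLE tasteNoteRate (noteID INTEGER, score INTEGER);
    -- Hypothetical data: note 1 has two ratings, note 2 has none.
    INSERT INTO tastingNotes VALUES (1, 'C5RJc0', 'hoppy'), (2, 'C5RJc0', 'malty');
    INSERT INTO tasteNoteRate VALUES (1, 1), (1, 1);
""")

rows = conn.execute("""
    SELECT n.noteID, COALESCE(SUM(r.score), 0) AS total_score
    FROM tastingNotes AS n
    LEFT JOIN tasteNoteRate AS r ON n.noteID = r.noteID
    WHERE n.beerID = 'C5RJc0'
    GROUP BY n.noteID
    ORDER BY total_score DESC, n.noteID
""").fetchall()
print(rows)
```

Both notes come back, the unrated one with a total of zero, because SUM over no matching rows yields NULL and COALESCE replaces it.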

Optimize Query with JOINS and Subqueries

I want to speed up one of my slower queries.
The problem is that I can't access the outer column value within a subquery.
What I have:
SELECT r.id AS room_id, r.room_name, coalesce(d.score,0) AS total_messages, d.latest
FROM cf_rooms_time_frames tf
INNER JOIN cf_rooms r on r.id = tf.room_id
INNER JOIN(
SELECT cf.room_id, count(*) as score, max(cf.id) as latest
FROM cf_rooms_messages cf
WHERE EXISTS(
SELECT NULL FROM cf_rooms_time_frames tf
WHERE tf.start <= cf.id AND ( tf.end IS NULL OR tf.end >= cf.id )
AND tf.room_id = cf.room_id AND tf.uid = 8
)
GROUP BY cf.room_id
ORDER BY latest
DESC ) d on d.room_id = r.id
WHERE tf.uid = 8
ORDER BY coalesce(latest, score) DESC LIMIT 0, 20
What I want:
SELECT r.id AS room_id, r.room_name, coalesce(d.score,0) AS total_messages, d.latest
FROM cf_rooms_time_frames tf
INNER JOIN cf_rooms r on r.id = tf.room_id
INNER JOIN(
SELECT cf.room_id, count(*) as score, max(cf.id) as latest
FROM cf_rooms_messages cf
/* line added here */
WHERE cf.room_id = tf.room_id
/* */
AND EXISTS(
SELECT NULL FROM cf_rooms_time_frames tf
WHERE tf.start <= cf.id AND ( tf.end IS NULL OR tf.end >= cf.id )
AND tf.room_id = cf.room_id AND tf.uid = 8
)
GROUP BY cf.room_id
ORDER BY latest
DESC ) d on d.room_id = r.id
WHERE tf.uid = 8
ORDER BY coalesce(latest, score) DESC LIMIT 0, 20
I think the markup explains what the query does.
It finds the chat rooms for a given user, orders them by last message, and returns for each room the number of messages whose ids fall within the given ranges (time frames), along with the last message id.
I don't know why, but according to EXPLAIN the first query reads every row in the chat message table (cf). It delivers the correct results but is slow on a huge table.
I tested the second one with a hardcoded room_id; it was very fast and did not touch the whole table.

Join between sub-queries in SQLAlchemy

In relation to the answer I accepted for this post, SQL Group By and Limit issue, I need to figure out how to create that query using SQLAlchemy. For reference, the query I need to run is:
SELECT t.id, t.creation_time, c.id, c.creation_time
FROM (SELECT id, creation_time
FROM thread
ORDER BY creation_time DESC
LIMIT 5
) t
LEFT OUTER JOIN comment c ON c.thread_id = t.id
WHERE 3 >= (SELECT COUNT(1)
FROM comment c2
WHERE c.thread_id = c2.thread_id
AND c.creation_time <= c2.creation_time
)
I have the first half of the query, but I am struggling with the syntax for the WHERE clause and how to combine it with the JOIN. Does anyone have any suggestions?
Thanks!
EDIT: First attempt seems to mess up around the .filter() call:
c = aliased(Comment)
c2 = aliased(Comment)
subq = db.session.query(Thread.id).filter_by(topic_id=122098).order_by(Thread.creation_time.desc()).limit(2).offset(2).subquery('t')
subq2 = db.session.query(func.count(1).label("count")).filter(c.id==c2.id).subquery('z')
q = db.session.query(subq.c.id, c.id).outerjoin(c, c.thread_id==subq.c.id).filter(3 >= subq2.c.count)
this generates the following SQL:
SELECT t.id AS t_id, comment_1.id AS comment_1_id
FROM (SELECT count(1) AS count
FROM comment AS comment_1, comment AS comment_2
WHERE comment_1.id = comment_2.id) AS z, (SELECT thread.id AS id
FROM thread
WHERE thread.topic_id = :topic_id ORDER BY thread.creation_time DESC
LIMIT 2 OFFSET 2) AS t LEFT OUTER JOIN comment AS comment_1 ON comment_1.thread_id = t.id
WHERE z.count <= 3
Notice the sub-query ordering is incorrect, and subq2 somehow is selecting from comment twice. Manually fixing that gives the right results, I am just unsure of how to get SQLAlchemy to get it right.
Try this:
c = db.aliased(Comment, name='c')
c2 = db.aliased(Comment, name='c2')
sq = (db.session
.query(Thread.id, Thread.creation_time)
.order_by(Thread.creation_time.desc())
.limit(5)
).subquery(name='t')
sq2 = (
db.session.query(db.func.count(1))
.select_from(c2)
.filter(c.thread_id == c2.thread_id)
.filter(c.creation_time <= c2.creation_time)
.correlate(c)
.as_scalar()
)
q = (db.session
.query(
sq.c.id, sq.c.creation_time,
c.id, c.creation_time,
)
.outerjoin(c, c.thread_id == sq.c.id)
.filter(3 >= sq2)
)
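Before comparing SQLAlchemy's generated SQL against the target, it can help to confirm what the target query itself returns. The following is only a sketch: it runs the SQL from the question through Python's sqlite3 module, uses integers for creation_time, and the sample threads and comments are invented. The correlated COUNT keeps a comment only when at most three comments in the same thread are at least as new, which leaves the three newest comments per thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY, creation_time INTEGER);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER,
        creation_time INTEGER
    );
    -- Hypothetical data: thread 1 has four comments, thread 2 has none.
    INSERT INTO thread VALUES (1, 10), (2, 20);
    INSERT INTO comment VALUES
        (101, 1, 1), (102, 1, 2), (103, 1, 3), (104, 1, 4);
""")

rows = conn.execute("""
    SELECT t.id, c.id
    FROM (SELECT id, creation_time
          FROM thread
          ORDER BY creation_time DESC
          LIMIT 5) AS t
    LEFT OUTER JOIN comment AS c ON c.thread_id = t.id
    WHERE 3 >= (SELECT COUNT(1)
                FROM comment AS c2
                WHERE c.thread_id = c2.thread_id
                  AND c.creation_time <= c2.creation_time)
    ORDER BY t.id, c.creation_time DESC
""").fetchall()
print(rows)
```

The oldest comment (101) is dropped, and the thread without comments still appears with a NULL comment id, since the count for a NULL thread_id is zero.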

SQL order by the result of one operation

I need to order the result of this query by the value of likes (puntuacion=1) minus dislikes (puntuacion=0).
This is the old query where I order by the sum of likes (puntuacion=1).
"SELECT entradas.* , SUM(puntuacion) AS total_likes
FROM entradas
LEFT JOIN valoraciones ON valoraciones.entradas_id = entradas.id
and valoraciones.puntuacion=1
WHERE fecha>=:fecha1 AND aceptada=1
GROUP BY entradas.id
ORDER BY `total_likes` DESC
limit 5";
I tried this, but total_likes and total_dislikes are temporary aliases, so I can't do arithmetic with them:
SELECT entradas.* , SUM(puntuacion=1) AS total_likes, SUM(puntuacion=0) AS total_dislikes, total_likes-total_dislikes AS TOTAL
FROM entradas
LEFT JOIN valoraciones ON valoraciones.entradas_id = entradas.id
WHERE aceptada=1
GROUP BY entradas.id
ORDER BY `total_likes` DESC
limit 5
SELECT entradas.* , (SUM(v1.puntuacion) - SUM(v0.puntuacion)) AS total_likes
FROM entradas
LEFT JOIN valoraciones v1 ON v1.entradas_id = entradas.id and v1.puntuacion=1
LEFT JOIN valoraciones v0 ON v0.entradas_id = entradas.id and v0.puntuacion=0
WHERE fecha >= :fecha1 AND aceptada=1
GROUP BY entradas.id
ORDER BY `total_likes` DESC
limit 5
[EDIT]
Sorry mate, the query above is not quite right. I think the right answer for what you are looking for is the one below:
SELECT entradas.* , SUM(IF(v.puntuacion = 1, 1, -1)) AS total_likes
FROM entradas
LEFT JOIN valoraciones v ON v.entradas_id = entradas.id
WHERE fecha >= :fecha1 AND aceptada=1
GROUP BY entradas.id
ORDER BY `total_likes` DESC
LIMIT 5
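The likes-minus-dislikes idea can be checked on a small data set. One detail worth testing is an entry with no ratings at all: with a LEFT JOIN its puntuacion is NULL, and an expression like IF(v.puntuacion = 1, 1, -1) would count that as -1. The sketch below is a variation that assumes such entries should score zero; it uses a CASE expression with COALESCE instead, runs on Python's sqlite3 module rather than MySQL, and the sample rows, the titulo column and the alias total are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entradas (id INTEGER PRIMARY KEY, titulo TEXT);
    CREATE TABLE valoraciones (entradas_id INTEGER, puntuacion INTEGER);
    -- Hypothetical data: entry 1 has 2 likes and 1 dislike,
    -- entry 2 has 2 dislikes, entry 3 has no ratings.
    INSERT INTO entradas VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO valoraciones VALUES (1, 1), (1, 1), (1, 0), (2, 0), (2, 0);
""")

rows = conn.execute("""
    SELECT e.id,
           -- +1 per like, -1 per dislike; NULL (no rating) adds nothing.
           COALESCE(SUM(CASE v.puntuacion
                            WHEN 1 THEN 1
                            WHEN 0 THEN -1
                        END), 0) AS total
    FROM entradas AS e
    LEFT JOIN valoraciones AS v ON v.entradas_id = e.id
    GROUP BY e.id
    ORDER BY total DESC, e.id
    LIMIT 5
""").fetchall()
print(rows)
```

The unrated entry lands between the net-positive and net-negative ones instead of being treated as disliked.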