Need Help Speeding up an Aggregate SQLite Query - mysql

I have a table defined like the following...
CREATE TABLE actions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
end BOOLEAN,
type VARCHAR(15) NOT NULL,
subtype_a VARCHAR(15),
subtype_b VARCHAR(15)
);
I'm trying to query for the last end action of some type to happen on each unique (subtype_a, subtype_b) pair, similar to a group by (except SQLite doesn't say what row is guaranteed to be returned by a group by).
On an SQLite database of about 1MB, the query I have now can take upwards of two seconds, but I need to speed it up to take under a second (since this will be called frequently).
example query:
SELECT * FROM actions a_out
WHERE id =
(SELECT MAX(a_in.id) FROM actions a_in
WHERE a_out.subtype_a = a_in.subtype_a
AND a_out.subtype_b = a_in.subtype_b
AND a_in.status IS NOT NULL
AND a_in.type = 'some_type');
If it helps, I know all the unique possibilities for a (subtype_a, subtype_b) pair, e.g.:
(a,1)
(a,2)
(b,3)
(b,4)
(b,5)
(b,6)

Beginning with version 3.7.11, SQLite guarantees which record is returned in a group:
Queries of the form: "SELECT max(x), y FROM table" returns the value of y on the same row that contains the maximum x value.
So greatest-n-per-group can be implemented in a much simpler way:
SELECT *, max(id)
FROM actions
WHERE type = 'some_type'
GROUP BY subtype_a, subtype_b
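To see the bare-column behaviour in action, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and the end column is left out for brevity; it assumes the SQLite library bundled with Python is 3.7.11 or newer.

```python
import sqlite3

# In-memory database with made-up sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE actions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        type VARCHAR(15) NOT NULL,
        subtype_a VARCHAR(15),
        subtype_b VARCHAR(15)
    )
""")
conn.executemany(
    "INSERT INTO actions (type, subtype_a, subtype_b) VALUES (?, ?, ?)",
    [
        ("some_type", "a", "1"),   # id 1
        ("some_type", "a", "1"),   # id 2
        ("other_type", "a", "1"),  # id 3, filtered out by the WHERE clause
        ("some_type", "b", "3"),   # id 4
    ],
)

# The bare columns (id, subtype_a, subtype_b) are taken from the row
# that holds max(id) within each group.
latest_by_group = conn.execute("""
    SELECT id, subtype_a, subtype_b, max(id)
    FROM actions
    WHERE type = 'some_type'
    GROUP BY subtype_a, subtype_b
    ORDER BY subtype_a, subtype_b
""").fetchall()
print(latest_by_group)
```

Each returned row carries the id of the newest some_type action for its (subtype_a, subtype_b) pair.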

Is this any faster?
select * from actions where id in (select max(id) from actions where type='some_type' group by subtype_a, subtype_b);

This is the greatest-n-per-group problem that comes up frequently on StackOverflow.
Here's how I solve it:
SELECT a_out.* FROM actions a_out
LEFT OUTER JOIN actions a_in ON a_out.type = a_in.type
AND a_out.subtype_a = a_in.subtype_a
AND a_out.subtype_b = a_in.subtype_b
AND a_out.id < a_in.id
WHERE a_out.type = 'some_type' AND a_in.id IS NULL
If you have an index on (type, subtype_a, subtype_b, id) this should run very fast.
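A rough way to try the anti-join idea against SQLite from Python is sketched below. The index name idx_actions_group and the sample rows are hypothetical, and the join here also matches on type so that later rows of another type do not hide the latest some_type row; treat that extra condition as an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE actions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        type VARCHAR(15) NOT NULL,
        subtype_a VARCHAR(15),
        subtype_b VARCHAR(15)
    )
""")
# Hypothetical index name; the column order follows the suggestion above.
conn.execute(
    "CREATE INDEX idx_actions_group ON actions (type, subtype_a, subtype_b, id)"
)
conn.executemany(
    "INSERT INTO actions (type, subtype_a, subtype_b) VALUES (?, ?, ?)",
    [
        ("some_type", "a", "1"),   # id 1
        ("some_type", "a", "1"),   # id 2
        ("other_type", "a", "1"),  # id 3
        ("some_type", "b", "3"),   # id 4
    ],
)

# Keep a row only when no later row exists for the same type and pair:
# the LEFT JOIN then finds no partner and a_in.id is NULL.
anti_join_rows = conn.execute("""
    SELECT a_out.id, a_out.subtype_a, a_out.subtype_b
    FROM actions a_out
    LEFT OUTER JOIN actions a_in
        ON a_out.type = a_in.type
        AND a_out.subtype_a = a_in.subtype_a
        AND a_out.subtype_b = a_in.subtype_b
        AND a_out.id < a_in.id
    WHERE a_out.type = 'some_type' AND a_in.id IS NULL
    ORDER BY a_out.id
""").fetchall()
print(anti_join_rows)
```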
See also my answers to similar SQL questions:
Fetch the row which has the Max value for a column
Retrieving the last record in each group
SQL join: selecting the last records in a one-to-many relationship
Or this brilliant article by Jan Kneschke: Groupwise Max.

Related

Get Conditionally Latest record from each group - without Aggregate functions or Partition

I have tried many of the suggested approaches without success.
Note: we cannot use aggregate functions or PARTITION, because this is only a small part of a larger stored procedure and the client requires that we avoid them. That also means this is not a duplicate of the existing questions and answers.
I have a messages table with from and to columns, each a foreign key to the user table; put simply, they record which user sent a message to whom. It also has isSnoozed and snoozedAt columns for messages that have been snoozed.
The ordering depends on the case: if a message is snoozed, order by its snoozedAt time; otherwise order by sendAt. (This condition can be ignored for now, but I mention it because we cannot simply take MAX(id).)
I need the most recent message for each sender, that is, grouped by the from user id.
messages table like :
id -- to -- from -- isSnoozed -- snoozedAt -- sendAt ...
What I tried:
select * from ( select * from messages order by sendAt DESC) as TEMP GROUP BY TEMP.from
I tried many similar approaches, but none worked.
I have spent many paid hours on this and still can't find an approach that meets my exact requirement.
NOTE: Please ignore any typos in the query. I can't post the exact query, table, and column names, so I typed it directly here.
I figured this out with something like the following, simplified here:
select * from message where message.id in (
    select
        ( select id from message
          where message.from = user.id
          order by CASE isSnoozed WHEN 0 THEN sendAt ELSE snoozeAt END DESC
          limit 1 ) as id
    from user
    where user.id in ( select friends.`whoIsAdded` from friends where friends.`whoAdded` = myId )
) order by CASE isSnoozed WHEN 0 THEN sendAt ELSE snoozeAt END DESC
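For anyone who wants to try the idea in isolation, here is a small sketch of the same pattern (a correlated subquery ordered by the CASE expression with LIMIT 1) run against SQLite from Python. The friends filter is left out, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        "from" INTEGER REFERENCES user(id),  -- quoted: FROM is a keyword
        isSnoozed INTEGER NOT NULL DEFAULT 0,
        snoozeAt TEXT,
        sendAt TEXT NOT NULL
    );
    INSERT INTO user (id) VALUES (1), (2);
    INSERT INTO message (id, "from", isSnoozed, snoozeAt, sendAt) VALUES
        (1, 1, 0, NULL,         '2020-01-01'),
        (2, 1, 1, '2020-03-01', '2020-01-02'),
        (3, 1, 0, NULL,         '2020-02-01'),
        (4, 2, 0, NULL,         '2020-01-15');
""")

# For each user, pick the id of the message with the latest effective time:
# snoozeAt when the message is snoozed, sendAt otherwise. No aggregates used.
latest_ids = [row[0] for row in conn.execute("""
    SELECT (SELECT m.id
            FROM message m
            WHERE m."from" = u.id
            ORDER BY CASE m.isSnoozed WHEN 0 THEN m.sendAt ELSE m.snoozeAt END DESC
            LIMIT 1) AS id
    FROM user u
    ORDER BY u.id
""")]
print(latest_ids)
```

Message 2 wins for user 1 because its snooze time is later than any send time, even though message 3 has the higher id.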
If I understand correctly, you just want the largest value in one of two columns. Assuming the values are never NULL, you can use greatest():
select m.*
from messages m
where greatest(m.sendAt, m.snoozedAt) =
(select max(greatest(m2.sendAt, m2.snoozedAt))
from messages m2
where m2.from = m.from
);
If the columns can be NULL, then you can use coalesce() to give them more reasonable values.
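A quick sketch of the NULL issue, run against SQLite from Python. SQLite has no greatest(), so this uses its multi-argument max() as a stand-in, which likewise returns NULL when any argument is NULL. The sender column name and the sample rows are assumptions made for the example, as is the choice to fall back to sendAt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender INTEGER,   -- hypothetical name standing in for "from"
        sendAt TEXT,
        snoozedAt TEXT
    );
    INSERT INTO messages VALUES
        (1, 1, '2020-01-01', NULL),
        (2, 1, '2020-01-02', '2020-03-01'),
        (3, 1, '2020-02-01', NULL);
""")

# Without coalesce(): a NULL snoozedAt makes the whole result NULL.
raw = conn.execute(
    "SELECT id, max(sendAt, snoozedAt) FROM messages ORDER BY id"
).fetchall()

# With coalesce(): a missing snoozedAt falls back to sendAt.
safe = conn.execute(
    "SELECT id, max(sendAt, coalesce(snoozedAt, sendAt)) FROM messages ORDER BY id"
).fetchall()
print(raw)
print(safe)
```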

Sql GROUP BY conflict with the arrays

I have these tables:
Projectos: Pid, value, value, value...
Sectores: ID, value, value, value...
Projectos-Sectores: ProjectID(Pid), SectorID(ID)
I am doing this query:
SELECT *
FROM `Projectos`
JOIN `Projectos-Sectores` ON `ProjectID` = `Pid`
JOIN `Sectores` ON `SectorID` = `Sectores`.`ID`
GROUP BY `Pid`
At first everything looked correct: the joins and the grouping work, and no Pid is duplicated.
The problem is that the result shows only the last match from the "Projectos-Sectores" table.
I want the query to return an array of all the matches between "Projectos" and "Sectores", without repeating Pid.
Any ideas?
The problem came when I fetched the result and wanted the "sectores" linked to "Projectos": only the last match from the "Projectos-Sectores" table is shown.
Yes, because you group by Pid.
I want the query to return an array of all the matches between "Projectos" and "Sectores", without repeating Pid.
You could, for example, order by Pid and then group the rows in the application code that processes them. You will receive every joined row, so a Pid can appear more than once. Transforming multiple rows into a different, non-relational data structure has to be done somewhere other than the database.
SELECT *
FROM `Projectos`
JOIN `Projectos-Sectores` ON `ProjectID` = `Pid`
JOIN `Sectores` ON `SectorID` = `Sectores`.`ID`
ORDER BY `Pid`
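As an illustration of doing the grouping in application code, here is a minimal Python sketch using the built-in sqlite3 module in place of MySQL. The name columns and the sample rows are invented; only Pid, ID, ProjectID and SectorID come from the question.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Projectos (Pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Sectores (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "Projectos-Sectores" (ProjectID INTEGER, SectorID INTEGER);
    INSERT INTO Projectos VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO Sectores VALUES (10, 'S10'), (20, 'S20');
    INSERT INTO "Projectos-Sectores" VALUES (1, 10), (1, 20), (2, 20);
""")

# One row per (project, sector) match; no GROUP BY, so nothing is dropped.
rows = conn.execute("""
    SELECT Projectos.Pid, Sectores.ID
    FROM Projectos
    JOIN "Projectos-Sectores" ON ProjectID = Pid
    JOIN Sectores ON SectorID = Sectores.ID
    ORDER BY Projectos.Pid, Sectores.ID
""").fetchall()

# Collapse the flat rows into one list of sector ids per project.
sectors_by_project = defaultdict(list)
for pid, sector_id in rows:
    sectors_by_project[pid].append(sector_id)
print(dict(sectors_by_project))
```

Each Pid then appears once as a key, with all of its sectors collected in the list.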

Too many Cartesian products, making the query run slower

Suppose I have two tables, DETAILS and RATE, with the following columns:
DETAILS table:
CREATE TABLE DETAILS(
ID BIGINT PRIMARY KEY AUTO_INCREMENT,
PRICE1 DOUBLE,
PRICE2 DOUBLE,
PRICE3 DOUBLE,
CURRENCY VARCHAR(25),
CREATED_DATE DATE,
COMPLETED VARCHAR(50)
-- ... a few more columns
);
RATE TABLE:
CREATE TABLE RATE(
ID BIGINT PRIMARY KEY AUTO_INCREMENT,
RATE DOUBLE,
CURRENCY VARCHAR(25),
CREATED_DATE DATE
-- ... a few more columns
);
And I have an update query for the DETAILS table, as shown below.
UPDATE DETAILS D, RATE R
SET D.PRICE1=D.PRICE1*R.RATE,
D.PRICE2=D.PRICE2*R.RATE,
D.PRICE3=D.PRICE3*R.RATE
WHERE
D.CURRENCY=R.CURRENCY AND
DATE(D.CREATED_DATE) = DATE(R.CREATED_DATE) AND
D.COMPLETED IS NULL OR D.COMPLETED='ABC' AND
D.CURRENCY!='RUPEE';
The query used to work fine, but as the tables have grown it has started taking much longer, and it now produces a Cartesian product with billions of rows.
Is there any way I can optimise this query?
Any help will be greatly appreciated.
Use explicit joins and fix the where clause using parentheses:
UPDATE DETAILS D JOIN
RATE R
ON D.CURRENCY=R.CURRENCY AND
DATE(D.CREATED_DATE) = DATE(R.CREATED_DATE)
SET D.PRICE1 = D.PRICE1*R.RATE,
D.PRICE2 = D.PRICE2*R.RATE,
D.PRICE3 = D.PRICE3*R.RATE
WHERE (D.COMPLETED IS NULL OR D.COMPLETED='ABC') AND
D.CURRENCY <> 'RUPEE';
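The problem is the missing parentheses in the where clause. AND binds more tightly than OR, so the original condition is evaluated as (join conditions AND D.COMPLETED IS NULL) OR (D.COMPLETED='ABC' AND D.CURRENCY!='RUPEE'). The second branch has no join condition, so each DETAILS row it matches is paired with every RATE row. In addition, you simply should not use commas to mean join.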
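The effect of the parentheses can be checked with a small sketch. It runs SELECT count(*) against SQLite from Python rather than a MySQL UPDATE, drops the date condition, and uses made-up rows, so it only illustrates the row pairing, not the original workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE details (id INTEGER PRIMARY KEY, currency TEXT, completed TEXT);
    CREATE TABLE rate (id INTEGER PRIMARY KEY, currency TEXT);
    INSERT INTO details (currency, completed) VALUES
        ('USD', NULL), ('USD', 'ABC'), ('EUR', 'ABC'), ('RUPEE', NULL);
    INSERT INTO rate (currency) VALUES ('USD'), ('EUR'), ('RUPEE');
""")

# Comma join, no parentheses: parsed as (A AND B) OR (C AND D), so the
# 'ABC' rows are paired with every rate row.
unparenthesized = conn.execute("""
    SELECT count(*) FROM details d, rate r
    WHERE d.currency = r.currency
      AND d.completed IS NULL OR d.completed = 'ABC'
      AND d.currency != 'RUPEE'
""").fetchone()[0]

# Explicit join with the OR grouped: each details row meets only its own rate.
parenthesized = conn.execute("""
    SELECT count(*) FROM details d JOIN rate r ON d.currency = r.currency
    WHERE (d.completed IS NULL OR d.completed = 'ABC')
      AND d.currency <> 'RUPEE'
""").fetchone()[0]
print(unparenthesized, parenthesized)
```

With these four details rows and three rate rows, the unparenthesized form pairs 8 rows while the corrected form pairs 3; on large tables that gap is what blows up.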
DATE(D.CREATED_DATE) = DATE(R.CREATED_DATE)
Since those fields are DATE datatype, there is no need to use the DATE() function. In fact, doing so prevents use of an index.
Add INDEX(currency, created_date) at least to RATE. This, plus the above change will greatly speed up the query.
Another improvement would be to make currency an ENUM or normalize it.
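The index point can be illustrated with SQLite's EXPLAIN QUERY PLAN from Python. This is only an analogy: MySQL's optimizer and EXPLAIN output differ, and the index name idx_rate_currency_date is hypothetical. The sketch checks whether the date column takes part in the index lookup with and without the function wrapper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rate (
        id INTEGER PRIMARY KEY,
        rate REAL,
        currency TEXT,
        created_date TEXT
    );
    CREATE INDEX idx_rate_currency_date ON rate (currency, created_date);
""")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Column compared directly: both index columns can be used for the lookup.
bare_plan = plan(
    "SELECT * FROM rate WHERE currency = 'USD' AND created_date = '2020-01-01'"
)
# Column wrapped in date(): the index cannot be searched on created_date.
wrapped_plan = plan(
    "SELECT * FROM rate WHERE currency = 'USD' AND date(created_date) = '2020-01-01'"
)
print(bare_plan)
print(wrapped_plan)
```

The exact plan text varies between SQLite versions, so compare the two printed plans rather than relying on a specific wording.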

MySQL query works only with some particular records, not with all of the data

The first image shows the student_detail table and the second shows the payment_detail table. When I run a query like this:
SELECT `student_detail`.`id`,
`student_detail`.`first_name`,
`student_detail`.`last_name`,
`student_detail`.`course`,
`payment_detail`.`id`,
`student_id`,
`inst_paid_date`,
`next_inst_date`,
`paid_installment`,
`next_installment_amount`
FROM `student_detail`,`payment_detail`
WHERE MONTH(`next_inst_date`)=MONTH(now())
AND `inst_paid_date`<`next_inst_date`
AND `student_detail`.`id`=`student_id`
AND `student_id`='10'
AND `inst_paid_date` in(select max(`inst_paid_date`) from `payment_detail`)
it returns no result when the records are as shown in the second table, but if I delete the rows for student ids 8 and 9 it does return a result. I can't see how the other records conflict when the where condition is specifically set to student_id=10. Thanks in advance.
The reason is that you limit your inst_paid_date to the maximum value across the entire payment_detail table. Since this maximum value is for student id 9 when it is present, this conflicts with your filter on student id 10.
Try to add the same filter to your subquery like this:
WHERE
...
AND `student_id`='10'
AND `inst_paid_date` in (select max(`inst_paid_date`)
from `payment_detail`
where `student_id` = '10')
A more generic solution would be to turn the subquery into a correlated subquery. This requires an alias on the outer reference to the payment_detail table:
...
FROM `student_detail`,`payment_detail` as `PD`
WHERE MONTH(`next_inst_date`)=MONTH(now())
AND `inst_paid_date`<`next_inst_date`
AND `student_detail`.`id`=`student_id`
AND `PD`.`student_id`='10'
AND `inst_paid_date` in(select max(`inst_paid_date`)
from `payment_detail`
where `student_id` = `PD`.`student_id`)
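To make the difference concrete, here is a small sketch against SQLite from Python with invented payment rows, trimmed to the columns that matter. It compares the table-wide maximum with the correlated version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment_detail (
        id INTEGER PRIMARY KEY,
        student_id INTEGER,
        inst_paid_date TEXT
    );
    INSERT INTO payment_detail (student_id, inst_paid_date) VALUES
        (9,  '2015-03-10'),   -- latest payment in the whole table
        (10, '2015-01-05'),
        (10, '2015-02-05');
""")

# Table-wide max: the latest date belongs to student 9, so nothing matches.
global_max = conn.execute("""
    SELECT id FROM payment_detail
    WHERE student_id = 10
      AND inst_paid_date IN (SELECT max(inst_paid_date) FROM payment_detail)
""").fetchall()

# Correlated max: the subquery only looks at the outer row's student.
correlated_max = conn.execute("""
    SELECT PD.id FROM payment_detail AS PD
    WHERE PD.student_id = 10
      AND PD.inst_paid_date IN (SELECT max(inst_paid_date)
                                FROM payment_detail
                                WHERE student_id = PD.student_id)
""").fetchall()
print(global_max, correlated_max)
```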

MySQL - How can I update a table with values from another table?

I have the task to repair some invalid data in a mysql-database. In one table there are people with a missing date, which should be filled from a second table, if there is a corresponding entry.
TablePeople: ID, MissingDate, ...
TableEvent: ID, people_id, replacementDate, ...
Update TablePeople
set missingdate = (select replacementDate
from TableEvent
where people_id = TablePeople.ID)
where missingdate is null
and (select count(*)
from TableEvent
where people_id = TablePeople.ID) > 0
Certainly doesn't work. Is there any other way with SQL? Or how can I process single rows in mysql to get it done?
We need details about what's not working, but I think you only need to use:
UPDATE TablePeople
SET missingdate = (SELECT MAX(te.replacementDate)
FROM TABLEEVENT te
WHERE te.people_id = TablePeople.id)
WHERE missingdate IS NULL
Notes
MAX is used to return the latest replacementDate, in case the subquery would otherwise return multiple values
If there's no supporting record in TABLEEVENT, it will return null so there's no change
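Both notes can be checked with a short sketch. It runs the same UPDATE against SQLite from Python with made-up rows; MySQL should behave the same way for this statement, but that is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TablePeople (ID INTEGER PRIMARY KEY, MissingDate TEXT);
    CREATE TABLE TableEvent (
        ID INTEGER PRIMARY KEY,
        people_id INTEGER,
        replacementDate TEXT
    );
    INSERT INTO TablePeople VALUES (1, NULL), (2, NULL), (3, '2001-01-01');
    INSERT INTO TableEvent (people_id, replacementDate) VALUES
        (1, '2010-05-01'), (1, '2011-06-01'), (3, '2012-07-01');

    -- Person 1 has two events, person 2 has none, person 3 already has a date.
    UPDATE TablePeople
    SET MissingDate = (SELECT MAX(te.replacementDate)
                       FROM TableEvent te
                       WHERE te.people_id = TablePeople.ID)
    WHERE MissingDate IS NULL;
""")

people = conn.execute(
    "SELECT ID, MissingDate FROM TablePeople ORDER BY ID"
).fetchall()
print(people)
```

Person 1 gets the later of the two dates, person 2 stays NULL, and person 3 is untouched because of the WHERE clause.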