I have a table whose data looks like this:
INSERT INTO `cm_case_notes` (`id`, `case_id`, `date`, `time`, `description`, `username`, `supervisor`, `datestamp`) VALUES
(45977, '1175', '2010-11-19 16:27:15', 600, 'Motion hearing...Denied.', 'bjones', 'jharvey,', '2010-11-19 21:27:15'),
(46860, '1175', '2010-12-11 16:11:19', 300, 'Semester Break Report', 'bjones', 'jharvey,', '2010-12-11 21:11:19'),
(48034, '1175', '2011-05-04 17:30:03', 300, 'test', 'bjones', 'jharvey,', '2011-05-04 22:30:03'),
(14201, '1175', '2009-02-06 00:00:00', 3600, 'In court to talk to prosecutor, re: the file', 'csmith', 'sandrews', '2009-02-07 14:33:34'),
(14484, '1175', '2009-02-13 00:00:00', 6300, 'Read transcript, note taking', 'csmith', 'sandrews', '2009-02-16 17:22:36');
I'm trying to select the most recent case note (by date) on each case by each user. The best I've come up with is:
SELECT * , MAX( `date` ) FROM cm_case_notes WHERE case_id = '1175' GROUP BY username
This, however, doesn't give the most recent entry, but the first one for each user. I've seen several similar posts here, but I just can't seem to get my brain around them. Would anybody take pity on the sql-deficient and help?
If you want only the dates of the most recent case note for every user and every case, you can use this:
--- Q ---
SELECT case_id
, username
, MAX( `date` ) AS recent_date
FROM cm_case_notes
GROUP BY case_id
, username
If you want all the columns from these rows (the ones with the most recent date), follow the Quassnoi link for various solutions (or the other provided links). The easiest to write would be to make the above query into a subquery and join it to cm_case_notes:
SELECT cn.*
FROM
cm_case_notes AS cn
JOIN
( Q ) AS q
ON ( q.case_id, q.username, q.recent_date )
= ( cn.case_id, cn.username, cn.`date` )
If you just want the latest case note but only for a particular case_id, then you could add the WHERE condition in both cn and Q (Q slightly modified):
SELECT cn.*
FROM
cm_case_notes AS cn
JOIN
( SELECT username
, MAX( `date` ) AS recent_date
FROM cm_case_notes
WHERE case_id = @particular_case_id
GROUP BY username
) AS q
ON ( q.username, q.recent_date )
= ( cn.username, cn.`date` )
WHERE cn.case_id = @particular_case_id
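To see the join-on-subquery approach end to end, here is a minimal sketch that loads the sample rows from the question and runs the per-case variant. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), keeps just the columns needed, and spells the tuple comparison out as two AND conditions.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cm_case_notes (
        id INTEGER PRIMARY KEY,
        case_id TEXT,
        date TEXT,
        description TEXT,
        username TEXT
    )
""")
sample_rows = [
    (45977, "1175", "2010-11-19 16:27:15", "Motion hearing...Denied.", "bjones"),
    (46860, "1175", "2010-12-11 16:11:19", "Semester Break Report", "bjones"),
    (48034, "1175", "2011-05-04 17:30:03", "test", "bjones"),
    (14201, "1175", "2009-02-06 00:00:00", "In court to talk to prosecutor, re: the file", "csmith"),
    (14484, "1175", "2009-02-13 00:00:00", "Read transcript, note taking", "csmith"),
]
conn.executemany("INSERT INTO cm_case_notes VALUES (?, ?, ?, ?, ?)", sample_rows)

# The subquery finds the newest date per user; the join pulls the full row.
LATEST_PER_USER = """
    SELECT cn.id, cn.username, cn.date, cn.description
    FROM cm_case_notes AS cn
    JOIN (
        SELECT username, MAX(date) AS recent_date
        FROM cm_case_notes
        WHERE case_id = :case_id
        GROUP BY username
    ) AS q
      ON q.username = cn.username
     AND q.recent_date = cn.date
    WHERE cn.case_id = :case_id
    ORDER BY cn.username
"""
latest = conn.execute(LATEST_PER_USER, {"case_id": "1175"}).fetchall()
for row in latest:
    print(row)
```

If two notes by the same user share the same maximum date, both rows come back; a tie-breaker (for example on id) would be needed to force exactly one row per user.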
The reason you don't get what you would like to fetch from the database is the use of SELECT * together with GROUP BY.
In fact, only the results of aggregate functions and/or the grouped field(s) themselves can be safely SELECTed. Selecting anything else leads to unexpected results (the exact result depends on row order, query optimization and such).
What you are trying to achieve is called fetching a "groupwise maximum". This is a common problem / common task in SQL, you can read a nice writeup here:
http://jan.kneschke.de/projects/mysql/groupwise-max/
or in the MySQL manual here:
http://dev.mysql.com/doc/refman/5.1/en/example-maximum-column-group-row.html
or a detailed long explanation by stackoverflow user Quassnoi here:
http://explainextended.com/2009/11/24/mysql-selecting-records-holding-group-wise-maximum-on-a-unique-column/
Have you considered a DESC ordering and simply limiting 1?
First off, take a look at the diagram (this is an application for testing students' knowledge).
I already have a working application which calculates the score (in percent), but to sort by score it has to select all the records of the current test, and that drastically slows down the app (~10 seconds of waiting). So I decided to move that logic into a single SQL query.
Now, my SQL query looks like this:
select test_results.*,
(
select test_result_total_score * 100 / test_result_total_max_score
from (
select (select sum(question_score)
from (
select question_total_right_answers = question_total_options as question_score
from (
select (
select count(*)
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
and answers.is_chosen = answer_options.is_right
) as question_total_right_answers,
(
select count(*)
from answers
left join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
) as question_total_options
from asked_questions
where asked_questions.test_result_id = test_results.id
) as rigt_per_question
) as questions_scores) as test_result_total_score,
(select count(*)
from asked_questions
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
) as right_per_test_result
) as result_in_percents
from test_results
where test_results.id between 1 and 200;
Here is what it should do: for each asked question collect how many answer options there are (question_total_options) and how many answers user selected right (question_total_right_answers) - the very nested subqueries.
Then for each of this results calculate score (this is basically 1 if user selected all right options and 0 if at least one option is selected not right).
After that, we sum scores of all that questions (test_result_total_score - how many questions user answered right). Also, we calculate how many questions there are in test result (test_result_total_max_score).
With that information we can calculate percentage of right answered questions (test_result_total_score * 100 / test_result_total_max_score)
And the error is on lines 23 and 28:
where asked_questions.test_result_id = test_results.id
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
It says: [42S22][1054] Unknown column 'test_results.id' in 'where clause'
I have tried using a variable @test_result_id like this:
select test_results.*,
@test_result_id := test_results.id,
( ... )
where asked_questions.test_result_id = @test_result_id
where asked_questions.test_result_id = @test_result_id) as test_result_total_max_score
And it evaluates, but in the wrong way (probably because the order in which the select values are evaluated is undefined). BTW, all result_in_percents values correspond to the very first result.
For those facing a similar problem, it seems that there is no simple solution.
First off, you can try to rewrite your subqueries with joins as I did (see below). But when you need to perform group operations on already grouped results, you are out of luck. A "dirty" solution is to create a function to get around the barrier of nested subqueries.
create function test_result_in_percents(test_result_id bigint unsigned)
returns float
begin
return (
select sum(tmp.question_right) * 100 / count(*)
from (select sum(answers.is_chosen = answer_options.is_right) = count(*) as question_right
, asked_questions.test_result_id as test_result_id
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
inner join asked_questions on asked_questions.id = answers.asked_question_id
where asked_questions.test_result_id = test_result_id
group by answers.asked_question_id
) as tmp
group by test_result_id
);
end;
And then, just use this function:
select (test_result_in_percents(test_results.id)) as `result_percents`
from `test_results`
where `test_results`.`test_id` = 181
and `test_results`.`test_id` is not null
order by `test_results`.`id` desc;
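As an alternative to the function, the same two-level aggregation can be sketched as a derived table that groups per question, wrapped in an outer query that groups per test result. Because the derived table is not correlated to test_results, the "Unknown column" problem does not arise. The sketch below runs on SQLite through Python's sqlite3 module purely for illustration; the table layouts and sample rows are assumptions inferred from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal schema inferred from the question (assumption).
    CREATE TABLE test_results    (id INTEGER PRIMARY KEY);
    CREATE TABLE asked_questions (id INTEGER PRIMARY KEY, test_result_id INTEGER);
    CREATE TABLE answer_options  (id INTEGER PRIMARY KEY, is_right INTEGER);
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        asked_question_id INTEGER,
        answer_option_id INTEGER,
        is_chosen INTEGER
    );

    INSERT INTO test_results VALUES (1), (2);
    INSERT INTO asked_questions VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO answer_options VALUES (100, 1), (101, 0), (102, 1), (103, 0);
    -- Question 10: every option marked correctly.
    -- Question 11: the right option (102) was not chosen.
    -- Question 20: every option marked correctly.
    INSERT INTO answers (asked_question_id, answer_option_id, is_chosen) VALUES
        (10, 100, 1), (10, 101, 0),
        (11, 102, 0), (11, 103, 0),
        (20, 100, 1), (20, 101, 0);
""")

SCORES = """
    SELECT per_question.test_result_id,
           SUM(per_question.question_right) * 100.0 / COUNT(*) AS result_in_percents
    FROM (
        -- Inner level: 1 if all options of a question were marked correctly, else 0.
        SELECT aq.test_result_id,
               aq.id AS asked_question_id,
               SUM(a.is_chosen = ao.is_right) = COUNT(*) AS question_right
        FROM asked_questions AS aq
        JOIN answers AS a ON a.asked_question_id = aq.id
        JOIN answer_options AS ao ON ao.id = a.answer_option_id
        GROUP BY aq.test_result_id, aq.id
    ) AS per_question
    GROUP BY per_question.test_result_id
    ORDER BY per_question.test_result_id
"""
scores = conn.execute(SCORES).fetchall()
print(scores)
```

As with the function, questions that have no rows in answers are dropped by the inner joins and do not count towards the percentage.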
I have two tables with a huge amount of data in them (~1.8 million rows in the main one, ~1.2 million in the secondary one), as follows:
subscriber_table (id, name, email, country, account_status, ...)
subscriber_payment_table (id, subscriber_id, payment_type, payment_credential)
My end goal is a table containing all the users and their payment details (null if non-existent), created up to yesterday, and with account_status = 1 (active).
Not all subscribers have a corresponding subscriber_payment, so an INNER JOIN isn't a viable option, and with a LEFT JOIN the query times out after 2 hours of processing.
SELECT
`subscribers`.`id` AS `id`,
`subscribers`.`email` AS `email`,
`subscribers`.`name` AS `name`,
`subscribers`.`geoloc_country` AS `country`,
`subscribers_payment`.`payment_type` AS `paymentType`,
`subscribers_payment`.`payment_credential` AS `paymentCredential`,
`subscribers`.`create_datetime` AS `createdAt`
FROM
`subscribers`
LEFT JOIN
`subscribers_payment` ON (`subscribers_payment`.`subscriberId` = `subscribers`.`id`)
WHERE
`subscribers`.`account_status` = 1
AND DATE_FORMAT(CAST(`subscribers`.`create_datetime` AS DATE), '%Y-%m-%d') < curdate()
As mentioned, this query takes too much time and ends up timing out and not working.
I've also considered having a UNION, between "All the Subscribers" and "Subscribers with Payment".
(
SELECT
`subscribers`.`id` AS `id`,
`subscribers`.`email` AS `email`,
`subscribers`.`name` AS `name`,
`subscribers`.`geoloc_country` AS `country`,
null AS `paymentType`,
null AS `paymentCredential`,
`subscribers`.`create_datetime` AS `createdAt`
FROM
`subscribers`
WHERE
`subscribers`.`account_status` = 1
AND DATE_FORMAT(CAST(`subscribers`.`create_datetime` AS DATE), '%Y-%m-%d') < curdate())
UNION
(
SELECT
`subscribers`.`id` AS `id`,
`subscribers`.`email` AS `email`,
`subscribers`.`name` AS `name`,
`subscribers`.`geoloc_country` AS `country`,
`subscribers_payment`.`payment_type` AS `paymentType`,
`subscribers_payment`.`payment_credential` AS `paymentCredential`,
`subscribers`.`create_datetime` AS `createdAt`
FROM
`subscribers`
INNER JOIN
`subscribers_payment` ON (`subscribers_payment`.`subscriberId` = `subscribers`.`id`)
WHERE
`subscribers`.`account_status` = 1
AND DATE_FORMAT(CAST(`subscribers`.`create_datetime` AS DATE), '%Y-%m-%d') < curdate())
The problem with this implementation is that I'm getting duplicate records. UNION only removes rows that are identical in every column, and a subscriber with a payment appears once with null and once with real values in the paymentType and paymentCredential columns.
This query runs in about 2 minutes, so it is more feasible for me. I just need to eliminate the duplicate records, unless there's a wiser option here.
Disclaimer: we're using MyISAM tables, so having foreign keys to speed up the queries is a no-go.
For this query:
SELECT . . .
FROM subscribers s LEFT JOIN
subscribers_payment sp
ON sp.subscriberId = s.id
WHERE s.account_status = 1 AND
s.create_datetime < curdate();
you want an index on subscribers(account_status, create_datetime, id) and on subscribers_payment(subscriberId).
I am guessing that the index on subscribers_payment is missing, which would explain the performance problems.
Notes:
Use table aliases -- they make the query easier to write and read.
There should be no need to convert a datetime to a string for comparison purposes.
There is no need to use backticks for all identifiers. They just make the query harder to write and read.
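A small sketch of the suggested shape: both indexes, table aliases, and a plain datetime comparison instead of the DATE_FORMAT/CAST round trip. It runs on SQLite via Python's sqlite3 module only so that it is self-contained; the index names, the sample rows and the fixed cutoff (standing in for curdate()) are illustrative assumptions, and it says nothing about MySQL timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers (
        id INTEGER PRIMARY KEY, email TEXT,
        account_status INTEGER, create_datetime TEXT);
    CREATE TABLE subscribers_payment (
        id INTEGER PRIMARY KEY, subscriberId INTEGER, payment_type TEXT);

    -- The two indexes suggested above (index names are made up).
    CREATE INDEX idx_sub_status_created
        ON subscribers (account_status, create_datetime, id);
    CREATE INDEX idx_pay_subscriber
        ON subscribers_payment (subscriberId);

    INSERT INTO subscribers VALUES
        (1, 'a@example.com', 1, '2020-01-01 10:00:00'),
        (2, 'b@example.com', 1, '2020-01-02 11:00:00'),
        (3, 'c@example.com', 0, '2020-01-03 12:00:00'),
        (4, 'd@example.com', 1, '2020-01-10 09:00:00');
    INSERT INTO subscribers_payment VALUES (1, 1, 'card');
""")

cutoff = "2020-01-10 00:00:00"  # plays the role of curdate() in this sketch

rows = conn.execute("""
    SELECT s.id, s.email, sp.payment_type
    FROM subscribers AS s
    LEFT JOIN subscribers_payment AS sp ON sp.subscriberId = s.id
    WHERE s.account_status = 1
      AND s.create_datetime < ?      -- raw column, so an index can be used
    ORDER BY s.id
""", (cutoff,)).fetchall()
print(rows)
```

Subscriber 2 has no payment row and still appears, with NULL in the payment column, which is what the UNION attempt was trying to emulate.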
This is my query with its performance (slow_query_log):
SELECT j.`offer_id`, o.`offer_name`, j.`success_rate`
FROM
(
SELECT
t.`offer_id`,
(
SUM(CASE WHEN `offer_id` = t.`offer_id` AND `sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*)
) AS `success_rate`
FROM `tblSales` AS t
WHERE DATE(t.`sales_time`) = CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
) AS j
LEFT JOIN `tblOffers` AS o
ON j.`offer_id` = o.`offer_id`
LIMIT 5;
# Time: 180113 18:51:19
# User@Host: root[root] @ localhost [127.0.0.1] Id: 71
# Query_time: 10.472599 Lock_time: 0.001000 Rows_sent: 0 Rows_examined: 1156134
Here, tblOffers has all the OFFERS listed, and tblSales contains all the sales. What I am trying to find out is the top selling offers, based on the success rate (i.e. those sales which are SUCCESS).
The query works fine and provides the output I need, but it appears to be a bit slow.
offer_id and sales_status are already indexed in tblSales. So do you have any suggestion for improving the inner query (where it calculates the success rate) so that performance can be improved? I have been playing with the math for more than 2 hours, but couldn't find a better way.
Btw, tblSales has lots of data. It contains those sales which are SUCCESSFUL, FAILED, PENDING, etc.
Thank you
EDIT
As requested, I am also including the table design (only the relevant fields are included):
tblSales
`sales_id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
`offer_id` bigint UNSIGNED NOT NULL DEFAULT '0',
`sales_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`sales_status` ENUM('WAITING', 'SUCCESS', 'FAILED', 'CANCELLED') NOT NULL DEFAULT 'WAITING',
PRIMARY KEY (`sales_id`),
KEY (`offer_id`),
KEY (`sales_status`)
There are some other fields in this table that hold other info (amount, user_id, etc.), which are not relevant to my question.
Numerous 'problems', none of which involve "math".
JOINs make things difficult. LEFT JOIN says "I don't care whether the row exists in the 'right' table." (I suspect you don't need LEFT?) But it also says "There may be multiple rows in the right table." Based on the column names, I will guess that there is only one offer_name for each offer_id. If this is correct, then here is my first recommendation. (This will convince the Optimizer that there is no issue with the JOIN.) Change from
SELECT ..., o.offer_name, ...
LEFT JOIN `tblOffers` AS o ON j.`offer_id` = o.`offer_id`
...
to
SELECT ...,
( SELECT offer_name FROM tblOffers WHERE offer_id = j.offer_id
) AS offer_name, ...
It also gets rid of a bug wherein you are assuming that the inner ORDER BY will be preserved for the LIMIT. This used to be the case, but in newer versions of MariaDB / MySQL, it is not. The ORDER BY in a "derived table" (your subquery) is now ignored.
2 down, a few more to go.
"Don't hide an indexed column in a function." I am referring to DATE(t.sales_time) = CURDATE(). Assuming you have no sales_time values for the 'future', then that test can be changed to t.sales_time >= CURDATE(). If you really need to restrict to just today, then do this:
AND sales_time >= CURDATE()
AND sales_time < CURDATE() + INTERVAL 1 DAY
The ORDER BY and the LIMIT should usually be put together. In your case, you may as well add the LIMIT to the "derived table", thereby leading to only 5 rows for the outer query to work with. But... There is still the question of getting them sorted correctly. So change from
SELECT ...
FROM ( SELECT ...
ORDER BY ... )
LIMIT ...
to
SELECT ...
FROM ( SELECT ...
ORDER BY ...
LIMIT 5 ) -- trim sooner
ORDER BY ... -- deal with the loss of ordering from derived table
Rolling it all together, I have
SELECT j.`offer_id`,
( SELECT offer_name
FROM tblOffers
WHERE offer_id = j.offer_id
) AS offer_name,
j.`success_rate`
FROM
( SELECT t.`offer_id`,
AVG(t.sales_status = 'SUCCESS') AS `success_rate`
FROM `tblSales` AS t
WHERE t.sales_time >= CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 5
) AS j
ORDER BY `success_rate` DESC;
(I took the liberty of shortening the SUM(...) in two ways.)
Now for the indexes...
tblSales needs at least (sales_time), but let's go for a "covering" (with sales_time specifically first):
INDEX(sales_time, sales_status, offer_id)
If tblOffers has PRIMARY KEY(offer_id), then no further index is worth adding. Else, add this covering index (in this order):
INDEX(offer_id, offer_name)
(Apologies to other Answerers; I stole some of your ideas.)
Here, tblOffers has all the OFFERS listed, and tblSales contains all the sales. What I am trying to find out is the top selling offers, based on the success rate (i.e. those sales which are SUCCESS).
Approach this with a simple JOIN and GROUP BY:
SELECT s.offer_id, o.offer_name,
AVG(s.sales_status = 'SUCCESS') as success_rate
FROM tblSales s JOIN
tblOffers o
ON o.offer_id = s.offer_id
WHERE s.sales_time >= CURDATE() AND
s.sales_time < CURDATE() + INTERVAL 1 DAY
GROUP BY s.offer_id, o.offer_name
ORDER BY success_rate DESC;
Notes:
The use of date arithmetic allows the query to make use of an index on tblSales(sales_time) -- or better yet tblSales(sales_time, offer_id, sales_status).
The arithmetic for success_rate has been simplified -- although this has minimal impact on performance.
I added offer_name to the GROUP BY. If you are learning SQL, you should always have all the unaggregated keys in the GROUP BY clause.
A LEFT JOIN is only needed if you have offers in tblSales which are not in tblOffers. I am guessing you have proper foreign key relationships defined, and this is not the case.
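The JOIN plus GROUP BY version can be tried out with a few rows. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL; since SQLite has no CURDATE(), the day boundaries are passed in as parameters. The sample data, the index name and the LIMIT 5 (taken from the question) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblOffers (offer_id INTEGER PRIMARY KEY, offer_name TEXT);
    CREATE TABLE tblSales (
        sales_id INTEGER PRIMARY KEY, offer_id INTEGER,
        sales_time TEXT, sales_status TEXT);
    -- Composite index with sales_time first, as suggested in the notes.
    CREATE INDEX idx_sales_time ON tblSales (sales_time, offer_id, sales_status);

    INSERT INTO tblOffers VALUES (1, 'Offer A'), (2, 'Offer B');
    INSERT INTO tblSales (offer_id, sales_time, sales_status) VALUES
        (1, '2018-01-13 09:00:00', 'SUCCESS'),
        (1, '2018-01-13 10:00:00', 'FAILED'),
        (2, '2018-01-13 11:00:00', 'SUCCESS'),
        (2, '2018-01-12 23:59:59', 'FAILED');   -- previous day, must be ignored
""")

# Half-open range [day_start, day_end) replaces DATE(sales_time) = CURDATE().
day_start, day_end = "2018-01-13 00:00:00", "2018-01-14 00:00:00"

top_offers = conn.execute("""
    SELECT s.offer_id, o.offer_name,
           AVG(s.sales_status = 'SUCCESS') AS success_rate
    FROM tblSales AS s
    JOIN tblOffers AS o ON o.offer_id = s.offer_id
    WHERE s.sales_time >= ? AND s.sales_time < ?
    GROUP BY s.offer_id, o.offer_name
    ORDER BY success_rate DESC
    LIMIT 5
""", (day_start, day_end)).fetchall()
print(top_offers)
```

The comparison sales_status = 'SUCCESS' evaluates to 1 or 0, so its average is the fraction of successful sales per offer.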
Based on the little information you have provided (I mean the table schema), you could try the following.
SELECT `o`.`offer_id`, `o`.`offer_name`, SUM(CASE WHEN `t`.`sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) AS `success_rate`
FROM `tblOffers` `o`
INNER JOIN `tblSales` `t`
ON `o`.`offer_id` = `t`.`offer_id`
WHERE DATE(`t`.`sales_time`) = CURDATE()
GROUP BY `o`.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 0,5;
You can find a sample of this query in this SQL Fiddle example
Without knowing your schema, the lowest hanging fruit I see is this part....
WHERE DATE(t.`sales_time`) = CURDATE()
Try changing that to something that looks like
WHERE t.sales_time >= @12-midnight-of-current-date AND t.sales_time <= @23:59:59-of-current-date
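If the two boundary values are computed in application code, a half-open range (start inclusive, next midnight exclusive) is a little safer than an upper bound of 23:59:59, because it cannot miss rows with fractional seconds. A minimal sketch in Python; the helper name day_bounds is made up for this example:

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Return (midnight of `day`, midnight of the following day)."""
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)


# Use as: WHERE sales_time >= start AND sales_time < end
start, end = day_bounds(date(2018, 1, 13))
print(start, end)
```

For the current day, pass date.today() and bind both values as query parameters.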
First question ever on Stack Overflow; I really couldn't find the answer elsewhere.
I am making a ticket reservation system and I would like one query to create a ticketsorder only if the total of ordered tickets after this call would be less than or equal to the maxtickets value of both the tickettype and the event.
I have 3 tables
events with columns id, maxtickets
tickettypes with columns id, event_id, maxtickets
ticketsorders with columns id, tickettype_id, amount (amount is number of tickets)
The reason I want to do it in one call is so that there can never be more tickets ordered than the specified maximum on the tickettype or the event. If I got the two sums in PHP and then wrote the order only when my PHP calculations were okay, there would be time between getting the sums and writing the new value, and more tickets could end up being ordered than allowed.
Maybe I'm looking for the wrong solution and maybe I should do more homework, any direction is appreciated, a fully working query would be most awesome :)
I mentioned laravel because it uses eloquent databases queries but I can use raw SQL when I want so no problem if the solution is raw SQL.
Thank you in advance!
Update: (Got it working with help from mentioned question)
INSERT into ticketsorders (order_id, tickettype_id, event_id, amount)
SELECT '7', '23', '1', '10'
FROM dual
WHERE ((SELECT COALESCE(SUM(amount), 0) FROM ticketsorders WHERE tickettype_id = 23) + '10' ) <= (SELECT maxtickets from tickettypes where id = '23')
AND ((SELECT COALESCE(SUM(amount), 0) FROM ticketsorders WHERE event_id = 1) + '10' ) <= (SELECT maxtickets from events where id = '1')
LIMIT 1
The only ugliness I had to add was event_id on ticketsorders. We learned in school that you shouldn't do this because it's available through the relation event->tickettype->ticketsorder, but for now this makes querying for event objects a lot easier.
Final update concerning laravel:
After adding in the vars and wrapping the SQL in a raw statement like this:
$sql = "INSERT into ticketsorders (order_id, tickettype_id, event_id, amount) SELECT $order->id, $tickettype->id, $eventid, $val FROM dual WHERE ((SELECT COALESCE(SUM(amount), 0) FROM ticketsorders WHERE tickettype_id = $tickettype->id) + $val ) <= (SELECT max from tickettypes where id = $tickettype->id) AND ((SELECT COALESCE(SUM(amount), 0) FROM ticketsorders WHERE event_id = $eventid) + $val ) <= (SELECT max from events where id = $eventid) LIMIT 1";
$testupdate = DB::insert(DB::raw("$sql"));
I found out that the query always returns 1 which means true, even if the insert doesn't go through because the query still ran successfully. Rather than breaking my head on this issue trying to get the actual result I decided to do a new select and see if it succeeded and base my return message to the user on the intended insert compared to the select.
Thank you for the comments they helped a lot
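One way to tell whether the conditional insert actually wrote a row, without a follow-up SELECT, is to look at the number of affected rows rather than at the success flag of the statement. The sketch below shows the idea with Python's sqlite3 module and bound parameters instead of string interpolation; it is not Laravel code, the sample limits (12 per ticket type, 15 per event) are assumptions, and SQLite lets the SELECT omit FROM dual. How your own database layer exposes the affected-row count is something to check in its documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, maxtickets INTEGER);
    CREATE TABLE tickettypes (
        id INTEGER PRIMARY KEY, event_id INTEGER, maxtickets INTEGER);
    CREATE TABLE ticketsorders (
        id INTEGER PRIMARY KEY, order_id INTEGER, tickettype_id INTEGER,
        event_id INTEGER, amount INTEGER);
    INSERT INTO events VALUES (1, 15);
    INSERT INTO tickettypes VALUES (23, 1, 12);
""")

# Same shape as the query above, with named parameters instead of literals.
CONDITIONAL_INSERT = """
    INSERT INTO ticketsorders (order_id, tickettype_id, event_id, amount)
    SELECT :order_id, :type_id, :event_id, :amount
    WHERE (SELECT COALESCE(SUM(amount), 0) FROM ticketsorders
           WHERE tickettype_id = :type_id) + :amount
          <= (SELECT maxtickets FROM tickettypes WHERE id = :type_id)
      AND (SELECT COALESCE(SUM(amount), 0) FROM ticketsorders
           WHERE event_id = :event_id) + :amount
          <= (SELECT maxtickets FROM events WHERE id = :event_id)
"""


def place_order(order_id, type_id, event_id, amount):
    """Return True only if the conditional INSERT actually added a row."""
    cur = conn.execute(CONDITIONAL_INSERT, {
        "order_id": order_id, "type_id": type_id,
        "event_id": event_id, "amount": amount,
    })
    conn.commit()
    return cur.rowcount == 1


first = place_order(7, 23, 1, 10)    # 0 + 10 fits under both limits
second = place_order(8, 23, 1, 10)   # 10 + 10 exceeds the ticket type limit of 12
total = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM ticketsorders").fetchone()[0]
print(first, second, total)
```

Binding the values as parameters also avoids building the SQL string from variables, which is worth doing whenever any of them can come from user input.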
I am running into a small problem,
This is a demo query
select
A.order_id,
if(
A.production_date != '0000-00-00',
A.production_date,
if(
SOME INNER QUERY != '0000-00-00',
SOME INNER QUERY ,
SOME OTHER INNER QUERY
)
) as production_start_date
from
orders A
So basically, suppose SOME INNER QUERY takes 10 seconds to do its calculations (fetching data from 8 different tables, checking past history for the same order type, etc.). If its result is a date, I return that date in the first branch. But now the whole thing takes 20 seconds: 10 seconds to evaluate the IF condition and another 10 seconds to re-execute the query to return the result.
Is there any way I can reduce this?
If anyone is interested in looking at the actual query: http://pastebin.com/zqzbpEei
Assuming your query looks like this (sorry, I gave up trying to locate the actual query):
IF(
(SELECT aField FROM aTable WHERE bigCondition) != '0000-00-00',
(SELECT aField FROM aTable WHERE bigCondition),
(SELECT anotherField FROM anotherTable)
)
You can rewrite it as follows:
SELECT IF(
aField != '0000-00-00',
aField,
(SELECT anotherField FROM anotherTable)
)
FROM aTable WHERE bigCondition
This way you compute bigCondition only once.
This query is quite ugly indeed.
Your major problem seems to be the misuse (and abuse, big time) of the IF() construct. It should be reserved to simple conditions and operations. The same applies to logical operators. Do not operate on entire queries. For instance, I see this one bit appears a few times in your query:
IF(
(SELECT v1.weekends FROM vendor v1 WHERE v1.vendor_id = A.vendor_id) IS NULL
OR (SELECT v1.weekends FROM vendor v1 WHERE v1.vendor_id = A.vendor_id) = '',
'6', -- by the way, why is this a string?! This is an integer, isn't it?
(SELECT v1.weekends FROM vendor v1 WHERE v1.vendor_id = A.vendor_id)
)
This is Bad. The condition should be moved into the SELECT directly. Rewrite it as below:
SELECT
IF (v1.weekends IS NULL OR v1.weekends = '', 6, v1.weekends)
FROM vendor v1 WHERE v1.vendor_id = A.vendor_id
That's two SELECTs saved. Do this for every IF() that contains a query, and I am ready to bet you are going to speed up your query by several orders of magnitude.
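The rewrite can be checked on a toy data set. The sketch below uses SQLite through Python's sqlite3 module, so the IF() is written as the equivalent CASE expression; the vendor and orders rows are made up, and the CAST to an integer is an addition of this sketch so that every row yields the same type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, weekends TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, vendor_id INTEGER);
    INSERT INTO vendor VALUES (1, '5'), (2, ''), (3, NULL);
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3);
""")

rows = conn.execute("""
    SELECT A.order_id,
           -- One lookup per order: the condition lives inside the subquery.
           (SELECT CASE WHEN v1.weekends IS NULL OR v1.weekends = ''
                        THEN 6
                        ELSE CAST(v1.weekends AS INTEGER)
                   END
            FROM vendor AS v1
            WHERE v1.vendor_id = A.vendor_id) AS weekends
    FROM orders AS A
    ORDER BY A.order_id
""").fetchall()
print(rows)
```

The vendor table is read once per order instead of three times, and the empty-string and NULL cases both fall back to 6.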
There is a lot more to say about your current code. Unfortunately, you will probably need to refactor some parts of your ORM. Add new, more specialised methods to some classes, and make them use new queries that you crafted manually. Then refactor your current operation so that it uses these new methods.