Need a simple solution to slow query - mysql

I have the following query:
SELECT avg(h.price)
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-01-01'
AND h.date <='2015-04-01'
AND h.dealer_id <> 2389
AND vd.prefix IN
(SELECT concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-03-01'
AND h.date <='2015-04-01'
AND h.dealer_id =2389)
It finds the average market value of cars sold within the last 3 months by every dealer other than 2389, but only for cars with the same make and model as those sold by dealer 2389.
Can the above query be optimized? It takes 2 minutes to run against 11 million records.
Thanks

How often will you use that particular "prefix"? If often, then I will direct you toward indexing a 'virtual' column.
Otherwise, you need
INDEX(date) -- for the outer query
INDEX(dealer_id, date) -- for what is now the subquery
Then do the EXISTS as suggested, or use a LEFT JOIN ... WHERE ... IS NULL.
Is date a DATE or a DATETIME? You may be including an extra day. I suggest this pattern:
WHERE date >= '2015-01-01'
AND date < '2015-01-01' + INTERVAL 3 MONTH
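To make the "extra day" point concrete, here is a minimal sketch that compares the closed range from the question with the half-open pattern. It runs against an in-memory SQLite database rather than MySQL, so date(x, '+3 months') stands in for x + INTERVAL 3 MONTH; the sales table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; sale_date holds ISO-8601 date strings.
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales (sale_date, price) VALUES (?, ?)",
    [
        ("2014-12-31", 100.0),  # before the window
        ("2015-01-01", 200.0),  # first day of the window
        ("2015-03-31", 300.0),  # last day of the three months
        ("2015-04-01", 400.0),  # first day after the three months
    ],
)

# Closed range, as in the question: both endpoints are included.
closed = conn.execute(
    "SELECT COUNT(*) FROM sales "
    "WHERE sale_date >= '2015-01-01' AND sale_date <= '2015-04-01'"
).fetchone()[0]

# Half-open range: start inclusive, end exclusive.
half_open = conn.execute(
    "SELECT COUNT(*) FROM sales "
    "WHERE sale_date >= '2015-01-01' "
    "AND sale_date < date('2015-01-01', '+3 months')"
).fetchone()[0]

print(closed, half_open)
```

The closed range picks up the 2015-04-01 row as well, which is the extra day; the half-open range stops at the end of March.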

If you want a simple solution, my initial thought is to figure out a way to not have function calls in your joins.
You negatively affect the chance that an index will be helpful.
(concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
Maybe a LIKE condition would be a better idea; however, either approach in a join clause is to be avoided.
Bottom line is that your table structure and relationships here leave room for improvement. If you need the concat because you are avoiding joining intermediate tables, don't: allow the indexes to be used and it should improve your query performance.
Also, make sure you have indexes.

I suggest 3 things
add a column and index it (avoid the functions in the join)
use an inner join
use EXISTS (...) instead of IN (...)
To "optimize" that query you need to add a column to the table car_history_optimized which contains the result of concat(substr(vin,1,8),'_',substr(vin,10,3)) and this column should be indexed.
Also, use INNER JOIN. In the current query the left outer join is wasted because you require every row of that table to be IN (the subquery) so NULL from that table isn't permitted hence you have the same effect as an inner join.
Use EXISTS instead of IN
SELECT
AVG(h.price)
FROM car_history_optimized h
INNER JOIN vin_data vd ON h.new_column = vd.prefix
WHERE h.`date` >= '2015-01-01'
AND h.`date` <= '2015-04-01'
AND h.dealer_id <> 2389
AND EXISTS (
SELECT
NULL
FROM car_history_optimized cho
WHERE cho.`date` >= '2015-03-01'
AND cho.`date` <= '2015-04-01'
AND cho.dealer_id = 2389
AND vd.prefix = cho.new_column
)
;
By the way:
I assume you already have some indexes and that those include date and dealer_id.
In future, avoid using "date" as a column name (it's an SQL keyword; MySQL accepts it unquoted, but it invites confusion).
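The three suggestions can be tried end to end on a toy dataset. The sketch below is only an illustration under assumptions: it uses SQLite through Python instead of MySQL, the column name vin_prefix, the index names, and all rows are invented, and the column is filled with a plain UPDATE (in MySQL 5.7 and later a generated column could serve the same purpose).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vin_data (prefix TEXT PRIMARY KEY);
CREATE TABLE car_history_optimized (
    id         INTEGER PRIMARY KEY,
    vin        TEXT,
    dealer_id  INTEGER,
    sale_date  TEXT,
    price      REAL,
    vin_prefix TEXT   -- hypothetical precomputed column
);
""")
conn.executemany("INSERT INTO vin_data VALUES (?)",
                 [("1HGCM826_3A0",), ("2T1BURHE_FC1",)])
conn.executemany(
    "INSERT INTO car_history_optimized (vin, dealer_id, sale_date, price) "
    "VALUES (?, ?, ?, ?)",
    [
        ("1HGCM82633A004352", 2389, "2015-03-10", 9000.0),   # the reference dealer
        ("1HGCM82673A011111", 10,   "2015-02-01", 10000.0),  # same model, other dealer
        ("1HGCM82653A022222", 11,   "2015-01-15", 12000.0),  # same model, other dealer
        ("2T1BURHE5FC123456", 12,   "2015-02-20", 30000.0),  # model 2389 never sold
        ("1HGCM82693A033333", 10,   "2014-11-01", 5000.0),   # outside the date window
    ],
)

# Compute the join key once, then index it, so the join compares plain columns.
conn.execute(
    "UPDATE car_history_optimized "
    "SET vin_prefix = substr(vin, 1, 8) || '_' || substr(vin, 10, 3)"
)
conn.execute("CREATE INDEX idx_prefix ON car_history_optimized (vin_prefix)")
conn.execute("CREATE INDEX idx_dealer_date ON car_history_optimized (dealer_id, sale_date)")

avg_price = conn.execute("""
SELECT AVG(h.price)
FROM car_history_optimized h
INNER JOIN vin_data vd ON h.vin_prefix = vd.prefix
WHERE h.sale_date >= '2015-01-01'
  AND h.sale_date <= '2015-04-01'
  AND h.dealer_id <> 2389
  AND EXISTS (
      SELECT 1
      FROM car_history_optimized cho
      WHERE cho.sale_date >= '2015-03-01'
        AND cho.sale_date <= '2015-04-01'
        AND cho.dealer_id = 2389
        AND cho.vin_prefix = vd.prefix
  )
""").fetchone()[0]

print(avg_price)
```

Only the two in-window sales of the model that dealer 2389 also sold contribute to the average. Whether the indexes help on 11 million rows is something to confirm with EXPLAIN on the real MySQL table.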

Related

MySQL - Where relation exists add a where, ignore where if not exists

I have a flights table and a bookings table.
Flights has a column, max_passengers
Flights have bookings by way of the bookings table referencing the flight_id.
I am using this in Laravel / PHP, but I am looking for help with the actual SQL because it's driving me a bit loopy. I can currently query a flight and work out the number of free places left by using sum(bookings.places_booked) and then subtracting this number from max_passengers, but how could I write SQL that does the following:
Randomly selects flight
If bookings are against this flight sum the amount of places booked and add a where < max_passengers
I can do this part, but how can I make it so that, if no bookings have been made against a flight, the join / where becomes optional and doesn't matter? In that case max_passengers would be the number of free places.
I had thought about adding a free_places column to the flights table but it would come with issues which the current setup avoids.
SELECT flights.*, bookings.*
FROM flights
LEFT JOIN bookings ON flights.id = bookings.flight_id
WHERE sum(bookings.places_booked) < flights.max_passengers
GROUP BY flights.id
ORDER BY flights.id ASC
This is the thing I am trying to achieve, but as I said, I don't know how to make the join / where optional so that where no relation exists the check doesn't matter.
EDIT (Final SQL):
SELECT flights.*, bookings.*
FROM flights
LEFT JOIN bookings ON flights.id = bookings.flight_id
GROUP BY flights.id
HAVING COALESCE(sum(bookings.places_booked),0) < flights.max_passengers
ORDER BY RAND()
LIMIT 1
Two things:
1) You should use the HAVING clause and not the WHERE clause for conditions on aggregate functions (your query should throw an error).
2) Use COALESCE() to replace NULL with an actual value. This is the second reason why your query is not working. When there is no match, bookings.places_booked is NULL, so SUM() returns NULL and the condition becomes NULL < flights.max_passengers, which evaluates to NULL and is never true.
GROUP BY flights.id
HAVING COALESCE(sum(bookings.places_booked),0) < MAX(flights.max_passengers)
ORDER BY flights.id
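A small runnable check of both points, as a sketch only: it uses SQLite via Python rather than MySQL, and the three flights and their bookings are invented. Flight 1 is full, flight 2 has one place booked, and flight 3 has no bookings at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flights  (id INTEGER PRIMARY KEY, max_passengers INTEGER);
CREATE TABLE bookings (id INTEGER PRIMARY KEY, flight_id INTEGER, places_booked INTEGER);
INSERT INTO flights VALUES (1, 4), (2, 4), (3, 4);
INSERT INTO bookings (flight_id, places_booked) VALUES (1, 2), (1, 2), (2, 1);
""")

# Without COALESCE: SUM() over no bookings is NULL, so flight 3 is dropped.
without_coalesce = conn.execute("""
SELECT f.id
FROM flights f
LEFT JOIN bookings b ON b.flight_id = f.id
GROUP BY f.id, f.max_passengers
HAVING SUM(b.places_booked) < f.max_passengers
ORDER BY f.id
""").fetchall()

# With COALESCE: a flight with no bookings counts as 0 places booked.
with_coalesce = conn.execute("""
SELECT f.id,
       f.max_passengers - COALESCE(SUM(b.places_booked), 0) AS free_places
FROM flights f
LEFT JOIN bookings b ON b.flight_id = f.id
GROUP BY f.id, f.max_passengers
HAVING COALESCE(SUM(b.places_booked), 0) < f.max_passengers
ORDER BY f.id
""").fetchall()

print(without_coalesce)
print(with_coalesce)
```

Grouping by f.max_passengers as well as f.id is a choice made here so that HAVING can reference the column directly instead of wrapping it in MAX().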

Indexing on JOIN and WHERE clauses

If my query looks like:
SELECT *
FROM member
LEFT JOIN group ON (member.group_id = group.id)
WHERE group.created > #date
ORDER BY member.last_updated
If I create two indexes for:
member.group_id, member.last_updated
group.id, group.created
Is that the best way to optimize the original query? Should I add a new field member.group_created and index that like so:
member.group_created, member.last_updated
SELECT *
FROM member AS m
LEFT JOIN group AS g ON (m.group_id = g.id)
WHERE g.created > #date
ORDER BY m.last_updated
If you don't need all (*) the columns of both tables, don't ask for them; it may impact performance.
Do you really need LEFT? That is, do you want NULL for any rows missing from the 'right' table?
If the Optimizer decides to start with member, it might benefit from INDEX(last_updated). Assuming that id is the PRIMARY KEY of group, no extra index is needed there.
If it decides to start with group, then INDEX(created) may be useful. Then, m needs INDEX(group_id).
So, add the 3 indexes I suggest, if they don't already exist.
If you have more issues, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
Don't put conditions on left-joined tables in the WHERE clause; instead, do something like this:
SELECT *
FROM member
LEFT JOIN group ON (member.group_id = group.id and group.created > #date)
ORDER BY member.last_updated
Also add an index on (id, created) to the group table.

correct way to write a Sum query

I'm trying to find out if the code below is in the right format to retrieve the yearly sum of payments
select sum(payment)
select mem_type.mtype, member.name, payment.payment_amt
from mem_type, member, payment
where mem_type.mtype = member.mtype
and member.mem_id = payment.mem_id
group by mem_id
having payment.date > '2014-1-1' <'2014-12-31';
There are a few problems with the statement.
The keyword SELECT appears twice, and that's not valid the way you have it. (A SELECT keyword is needed in a subquery or an inline view, but otherwise it's not valid to repeat the keyword SELECT.)
The predicate in the HAVING clause isn't quite right. (MySQL may accept that as valid syntax, but it's not doing what you want it to do.) To return rows that have a payment.date in a specific year, we'd typically specify that as predicates in the WHERE clause:
WHERE payment.date >= '2014-01-01'
AND payment.date < '2015-01-01'
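To see why the chained comparison does not act as a range test, here is a minimal sketch. It runs in SQLite via Python, not MySQL, and the payment rows and the pay_date column name are made up. The expression parses as (pay_date > '2014-1-1') < '2014-12-31', so a 0/1 result is compared with a string. In SQLite an integer always sorts below text; in MySQL, as far as I can tell, the string is converted to the number 2014, which gives the same always-true outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (mem_id INTEGER, payment_amt REAL, pay_date TEXT)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        (1, 10.0, "2013-06-15"),  # not in 2014
        (1, 20.0, "2014-03-01"),  # in 2014
        (2, 30.0, "2014-12-31"),  # in 2014
        (2, 40.0, "2015-01-01"),  # not in 2014
    ],
)

# Chained comparison as written in the question: true for every row.
chained = conn.execute(
    "SELECT COUNT(*) FROM payment WHERE pay_date > '2014-1-1' < '2014-12-31'"
).fetchone()[0]

# Two explicit predicates forming a half-open range for the year 2014.
ranged = conn.execute(
    "SELECT COUNT(*) FROM payment "
    "WHERE pay_date >= '2014-01-01' AND pay_date < '2015-01-01'"
).fetchone()[0]

print(chained, ranged)
```

The chained form lets all four rows through, while the explicit range keeps only the two payments dated in 2014.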
Also, I'd recommend you ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead, and relocate the join predicates from the WHERE clause to an ON clause. For example:
SELECT ...
FROM member
JOIN mem_type
ON mem_type.mtype = member.mtype
JOIN payment
ON payment.mem_id = member.mem_id
It's good to see that you've qualified all the column references.
Unfortunately, it's not possible to recommend the syntax that will return the resultset you are looking for. There are too many unknowns; we'd just be guessing. An example of the result you want returned, and of the data it should come from, would go a long way towards a specification.
If I had to take a blind "guess" at a query that would meet the ambiguous specification, without any knowledge of the tables, columns, datatypes, et al. my guess would be something like this:
SELECT m.mem_id
, t.mtype
, m.name
, IFNULL(SUM(p.payment_amt),0) AS total_payments_2014
FROM member m
LEFT
JOIN mem_type t
ON t.mtype = m.mtype
LEFT
JOIN payment p
ON p.mem_id = m.mem_id
AND p.date >= '2014-01-01' -- in the ON clause, so members without 2014 payments still get a row
AND p.date < '2014-01-01' + INTERVAL 1 YEAR
GROUP BY m.mem_id
This only serves as an example. It is premised on a whole lot of information that isn't provided. For example: what is the datatype of the date column in the payment table? Do we want to exclude payments with dates of 1/1 or 12/31? Is the mem_id column unique in the member table? Is the mtype column unique in the mem_type table? Can the mtype column in the member table be NULL? Do we want all rows from the member table returned, or only those that had a payment in 2014? Can the mem_id column in the payment table be NULL? Are there rows in payment that we want included but which aren't related to a member?

OUTER JOIN -> want to return something even if empty

I am trying to return a group_concat on 2 tables,
one being my list of schools and the other some numeric data.
For some dates, I have NO DATA at all in the table SimpleData, so my LEFT OUTER JOIN returns 10 results where I have 11 schools (I need 11 rows, in order too, for the JavaScript processing).
Here is my query (tell me if I need to give more details about the tables):
SELECT A.nomEcole,
A.Compteur,
IFNULL(SUM(B.rendementJour), '0') AS TOTAL,
B.jourUS,
B.rendementJour
FROM ecoles A LEFT OUTER JOIN SimpleData B ON A.Compteur = B.compteur
WHERE jourUS LIKE '2013-07-%'
GROUP BY ecole
In this example, I have no data in SimpleData for this month (no data was recorded at all).
I have to show either NULL or '0' for this missing school, and I'm starting to lose my head over something that is apparently easy :(
Thanks for any help !
olivier
One way was mentioned by @Abhik Chakraborty: the WHERE clause filters out the records that don't match the criteria. Another is to use a CASE expression:
SELECT A.nomEcole,
A.Compteur,
SUM(CASE WHEN jourUS LIKE '2013-07-%' THEN B.rendementJour ELSE 0 END) AS TOTAL,
B.jourUS,
B.rendementJour
FROM ecoles A
LEFT OUTER JOIN SimpleData B ON A.Compteur = B.compteur
GROUP BY ecole
I suspect you just need to move the where condition to the on clause:
SELECT A.nomEcole, A.Compteur, IFNULL(SUM(B.rendementJour), 0) AS TOTAL,
B.jourUS, B.rendementJour
FROM ecoles A LEFT OUTER JOIN
SimpleData B
ON A.Compteur = B.compteur and B.jourUS >= '2013-07-01' and B.jourUS < '2013-08-01'
GROUP BY A.ecole;
Some other changes:
Don't use single quotes for numeric constants. Single quotes should really only be used for date and string constants.
Don't use like for dates. like is an operation on strings, not dates, and the date has to be implicitly converted to a string. Instead, do direct comparisons on the date ranges you are interested in.
I would also recommend that the table aliases be abbreviations for the tables you are using. This makes the query easier to read. (So e instead of A for ecoles.)
Also note that the values that you are returning for JourUS and RendementJour are indeterminate. If there are multiple rows in the B table that match, then an arbitrary value will be returned. Perhaps you want max() or group_concat() for them.
Your WHERE clause turns the LEFT OUTER JOIN into an INNER JOIN, because the column values of outer-joined records are NULL and NULL is never LIKE '2013-07-%'.
This is why you must move jourUS LIKE '2013-07-%' to the ON clause: you only want to join records where jourUS LIKE '2013-07-%', and otherwise outer join a NULL record.
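The effect is easy to reproduce on a tiny dataset. This is a sketch only, run in SQLite via Python rather than MySQL, with three invented schools instead of eleven; school C has readings in June but none in July.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ecoles     (nomEcole TEXT, Compteur INTEGER);
CREATE TABLE SimpleData (compteur INTEGER, jourUS TEXT, rendementJour INTEGER);
INSERT INTO ecoles VALUES ('A', 1), ('B', 2), ('C', 3);
INSERT INTO SimpleData VALUES
    (1, '2013-07-01', 5),
    (1, '2013-07-02', 7),
    (2, '2013-07-01', 3),
    (3, '2013-06-30', 9);
""")

# Date filter in WHERE: the row for school C is removed after the join.
filter_in_where = conn.execute("""
SELECT e.nomEcole, IFNULL(SUM(d.rendementJour), 0) AS total
FROM ecoles e
LEFT OUTER JOIN SimpleData d ON e.Compteur = d.compteur
WHERE d.jourUS >= '2013-07-01' AND d.jourUS < '2013-08-01'
GROUP BY e.nomEcole
ORDER BY e.nomEcole
""").fetchall()

# Date filter in ON: school C keeps its row and gets a total of 0.
filter_in_on = conn.execute("""
SELECT e.nomEcole, IFNULL(SUM(d.rendementJour), 0) AS total
FROM ecoles e
LEFT OUTER JOIN SimpleData d
  ON e.Compteur = d.compteur
 AND d.jourUS >= '2013-07-01' AND d.jourUS < '2013-08-01'
GROUP BY e.nomEcole
ORDER BY e.nomEcole
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

With the filter in ON, every school appears exactly once, which is what the JavaScript side needs.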

INNER JOIN on 3 tables with SUM()

I am having a problem trying to JOIN across a total of three tables:
Table users: userid, cap (ADSL bandwidth)
Table accounting: userid, sessiondate, used bandwidth
Table adhoc: userid, date, amount purchased
I want to have 1 query that returns a set of all users, their cap, their used bandwidth for this month and their adhoc purchases for this month:
< TABLE 1   ><TABLE2><TABLE3>
User   | Cap | Adhoc | Used
marius | 3   | 1     | 3.34
bob    | 1   | 2     | 1.15
(simplified)
Here is the query I am working on:
SELECT
`msi_adsl`.`id`,
`msi_adsl`.`username`,
`msi_adsl`.`realm`,
`msi_adsl`.`cap_size` AS cap,
SUM(`adsl_adhoc`.`value`) AS adhoc,
SUM(`radacct`.`AcctInputOctets` + `radacct`.`AcctOutputOctets`) AS used
FROM
`msi_adsl`
INNER JOIN
(`radacct`, `adsl_adhoc`)
ON
(CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`)
= `radacct`.`UserName` AND `msi_adsl`.`id`=`adsl_adhoc`.`id`)
WHERE
`canceled` = '0000-00-00'
AND
`radacct`.`AcctStartTime`
BETWEEN
'2010-11-01'
AND
'2010-11-31'
AND
`adsl_adhoc`.`time`
BETWEEN
'2010-11-01 00:00:00'
AND
'2010-11-31 00:00:00'
GROUP BY
`radacct`.`UserName`, `adsl_adhoc`.`id` LIMIT 10
The query works, but it returns wrong values for both adhoc and used; my guess would be a logical error in my joins, but I can't see it. Any help is very much appreciated.
Your query layout is too spread out for my taste. In particular, the BETWEEN/AND conditions should be on 1 line each, not 5 lines each. I've also removed the backticks, though you might need them for the 'time' column.
Since your table layouts don't match your sample query, it makes life very difficult. However, the table layouts all include a UserID (which is sensible), so I've written the query to do the relevant joins using the UserID. As I noted in a comment, if your design makes it necessary to use a CONCAT operation to join two tables, then you have a recipe for a performance disaster. Update your actual schema so that the tables can be joined by UserID, as your table layouts suggest should be possible. Obviously, you can use function results in joins, but (unless your DBMS supports 'functional indexes' and you create appropriate indexes) the DBMS won't be able to use indexes on the table where the function is evaluated to speed the queries. For a one-off query, that may not matter; for production queries, it often does matter a lot.
There's a chance this will do the job you want. Since you are aggregating over two tables, you need the two sub-queries in the FROM clause.
SELECT u.UserID,
u.username,
u.realm,
u.cap_size AS cap,
h.AdHoc,
a.OctetsUsed
FROM msi_adsl AS u
JOIN (SELECT UserID, SUM(AcctInputOctets + AcctOutputOctets) AS OctetsUsed
FROM radacct
WHERE AcctStartTime BETWEEN '2010-11-01 00:00:00' AND '2010-11-30 23:59:59' -- November has 30 days
GROUP BY UserID
) AS a ON a.UserID = u.UserID
JOIN (SELECT UserID, SUM(Value) AS AdHoc
FROM adsl_adhoc
WHERE time BETWEEN '2010-11-01 00:00:00' AND '2010-11-30 23:59:59' -- November has 30 days
GROUP BY UserId
) AS h ON h.UserID = u.UserID
WHERE u.canceled = '0000-00-00'
LIMIT 10
Each sub-query computes the value of the aggregate for each user over the specified period, generating the UserID and the aggregate value as output columns; the main query then simply pulls the correct user data from the main user table and joins with the aggregate sub-queries.
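Here is a small sketch of why joining both detail tables directly inflates the sums, and how the pre-aggregated sub-queries avoid it. It uses SQLite via Python rather than MySQL, and the simplified users / accounting / adhoc tables and their rows are invented. User 1 has two accounting rows and three ad hoc purchases, so a direct join produces 2 x 3 = 6 rows for that user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users      (userid INTEGER PRIMARY KEY, cap INTEGER);
CREATE TABLE accounting (userid INTEGER, used INTEGER);
CREATE TABLE adhoc      (userid INTEGER, amount INTEGER);
INSERT INTO users VALUES (1, 3), (2, 1);
INSERT INTO accounting VALUES (1, 1), (1, 2), (2, 5);
INSERT INTO adhoc VALUES (1, 1), (1, 1), (1, 1), (2, 2);
""")

# Direct join of both detail tables: every accounting row pairs with every
# adhoc row of the same user, so each value is counted several times.
naive = conn.execute("""
SELECT u.userid, SUM(a.used), SUM(h.amount)
FROM users u
JOIN accounting a ON a.userid = u.userid
JOIN adhoc h      ON h.userid = u.userid
GROUP BY u.userid
ORDER BY u.userid
""").fetchall()

# Aggregate each detail table on its own first, then join one row per user.
pre_aggregated = conn.execute("""
SELECT u.userid, a.used, h.adhoc
FROM users u
JOIN (SELECT userid, SUM(used) AS used FROM accounting GROUP BY userid) AS a
  ON a.userid = u.userid
JOIN (SELECT userid, SUM(amount) AS adhoc FROM adhoc GROUP BY userid) AS h
  ON h.userid = u.userid
ORDER BY u.userid
""").fetchall()

print(naive)
print(pre_aggregated)
```

User 2 has a single row in each detail table, so both queries agree there; the discrepancy only shows up once a user has several rows on both sides.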
I think that the problem is here
FROM `msi_adsl`
INNER JOIN
(`radacct`, `adsl_adhoc`)
ON
(CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`)
= `radacct`.`UserName` AND `msi_adsl`.`id`=`adsl_adhoc`.`id`)
You are mixing joins with a Cartesian product, which is not a good idea because it is a lot harder to debug. Try this:
FROM `msi_adsl`
INNER JOIN
`radacct`
ON
CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`) = `radacct`.`UserName`
JOIN `adsl_adhoc` ON `msi_adsl`.`id`=`adsl_adhoc`.`id`