How to improve a MySQL query? - mysql

I made this query in MySQL. It works, but it takes 3 minutes or more to run.
I would like to know if it is possible to improve this query.
The query is this:
SELECT CODARTIOLO, NOMEARTICOLO, SUM(QUANTITA) AS QUANTITA,
(SUM(TOTRIGA)/SUM(QUANTITA)) AS TOTALE,
(SELECT (SUM(QUANTITA * PREZZOCAD))/SUM(QUANTITA)
FROM vistacaricomagazzino cm
WHERE cm.DATA <= '$dataStart' AND cm.codarticolo=CODARTIOLO) AS PREZZOMEDIO
FROM vistascontrini c
WHERE c.DATA >= '$dataStart' AND c.DATA <= '$dataEnd'
GROUP BY NOMEARTICOLO
The tables are:
VISTASCONTRINI
+--------------+--------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+------------+-------+
| CODARTIOLO | varchar(13) | YES | | NULL | |
| NOMEARTICOLO | varchar(60) | YES | | NULL | |
| QUANTITA | int(11) | YES | | NULL | |
| TOTRIGA | decimal(9,2) | YES | | NULL | |
| DATA | date | NO | | 0000-00-00 | |
+--------------+--------------+------+-----+------------+-------+
VISTACARICOMAGAZZINO
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| codordine | int(11) | NO | | 0 | |
| Quantita | int(11) | YES | | NULL | |
| PrezzoCad | decimal(10,3) | YES | | NULL | |
| codArticolo | varchar(13) | YES | | NULL | |
| Data | date | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+

If your tables don't have indexes (and it doesn't look like they do), then that's the first thing you need to fix. If you do that right (i.e. put the indexes on the fields that need to be indexed), it will probably solve the problem for you in one hit.
The fields you need to consider indexing are the ones being used by the WHERE and GROUP BY clauses.
Next, consider converting it from a nested SELECT query into a JOIN query. This will probably give you better performance too.
Finally, you haven't stated just how much data is being collated here, but if it's a large amount, consider storing the collated totals separately within the database so that you can query them directly rather than having to re-generate all those sums and groups every time. This obviously has its own considerations (additional storage, additional code when updating data to also update the totals, the possibility of things going out of sync, etc.), but if you're really suffering with performance on this, it is a valid solution.
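As a rough sketch of that last idea (the summary table and its refresh strategy below are hypothetical, not part of the original system), you could maintain a pre-aggregated table keyed by day and article, and have the report sum over it instead of the raw rows:
-- Hypothetical daily summary table:
CREATE TABLE riepilogo_giornaliero (
    DATA DATE NOT NULL,
    CODARTIOLO VARCHAR(13) NOT NULL,
    NOMEARTICOLO VARCHAR(60),
    QUANTITA BIGINT,
    TOTRIGA DECIMAL(15,2),
    PRIMARY KEY (DATA, CODARTIOLO)
);
-- Refresh periodically, or from the code that records new sales:
REPLACE INTO riepilogo_giornaliero
SELECT DATA, CODARTIOLO, NOMEARTICOLO, SUM(QUANTITA), SUM(TOTRIGA)
FROM vistascontrini
GROUP BY DATA, CODARTIOLO, NOMEARTICOLO;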

I think you could try removing the subquery from the SELECT clause. Modify the query to something like this (untested, just to give the idea):
SELECT c.CODARTIOLO, c.NOMEARTICOLO,
       SUM(c.QUANTITA) AS QUANTITA,
       (SUM(c.TOTRIGA)/SUM(c.QUANTITA)) AS TOTALE,
       (SUM(cm.QUANTITA * cm.PREZZOCAD))/SUM(cm.QUANTITA) AS PREZZOMEDIO
FROM vistascontrini c
JOIN vistacaricomagazzino cm
  ON cm.codarticolo = c.CODARTIOLO
WHERE cm.DATA <= '$dataStart'
  AND c.DATA >= '$dataStart' AND c.DATA <= '$dataEnd'
GROUP BY NOMEARTICOLO
Also add indexes on the DATA columns of the tables vistascontrini and vistacaricomagazzino.

Add indexes on vistascontrini.DATA, vistascontrini.NOMEARTICOLO, vistacaricomagazzino.DATA, and vistacaricomagazzino.codarticolo.
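For reference, the corresponding DDL might look like the following (index names are invented; this also assumes these are base tables, since the vista prefix suggests views, and in MySQL you would have to index the views' underlying tables instead):
ALTER TABLE vistascontrini
    ADD INDEX idx_scontrini_data (DATA),
    ADD INDEX idx_scontrini_nome (NOMEARTICOLO);
ALTER TABLE vistacaricomagazzino
    ADD INDEX idx_carico_data (Data),
    ADD INDEX idx_carico_articolo (codArticolo);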
It's not clear whether you want to group by CODARTIOLO or not, and which value you expect if you group only by NOMEARTICOLO.
Try the following query, equivalent to yours:
SELECT T.*,T1.PREZZOMEDIO FROM
(
SELECT CODARTIOLO,NOMEARTICOLO,
SUM(QUANTITA) AS QUANTITA,
(SUM(TOTRIGA)/SUM(QUANTITA)) AS TOTALE
FROM vistascontrini c
WHERE c.DATA >= '$dataStart' AND c.DATA <= '$dataEnd'
GROUP BY NOMEARTICOLO
) AS T
LEFT JOIN
( SELECT CODARTIOLO,(SUM(QUANTITA * PREZZOCAD))/SUM(QUANTITA) as PREZZOMEDIO
FROM vistacaricomagazzino cm
WHERE cm.DATA <= '$dataStart'
GROUP BY CODARTIOLO ) as T1
ON T.CODARTIOLO=T1.CODARTIOLO


MySql LEFT OUTER JOIN causing duplicate rows

I'm running a query to grab the first 10 profiles (think of them as articles that show when a shop opens and hold information about that shop). I'm using the OUTER JOIN to select all the images that belong to the profile PK.
I'm running the following query; the main part I'm trying to focus on is the JOIN. I won't post the whole query, as it's just a whole bunch of 'table'.'colname' = 'table.colname'.
But here is where the magic happens during my outer join.
LEFT JOIN `content_image` AS `image` ON `profile`.`content_ptr_id` = `image`.`content_id`
Full Query:
I've formatted it like this so everyone can see the query without scrolling endlessly to the right.
select `profile`.`content_ptr_id` AS `profile.content_ptr_id`,
`profile`.`body` AS `profile.body`,
`profile`.`web_site` AS `profile.web_site`,
`profile`.`email` AS `profile.email`,
`profile`.`hours` AS `profile.hours`,
`profile`.`price_range` AS `profile.price_range`,
`profile`.`price_range_high` AS `profile.price_range_high`,
`profile`.`primary_category_id` AS `profile.primary_category_id`,
`profile`.`business_contact_email` AS `profile.business_contact_email`,
`profile`.`business_contact_phone` AS `profile.business_contact_phone`,
`profile`.`show_in_directory` AS `profile.show_in_directory`,
`image`.`id` AS `image.id`,
`image`.`content_id` AS `image.content_id`,
`image`.`type` AS `image.type`,
`image`.`order` AS `image.order`,
`image`.`caption` AS `image.caption`,
`image`.`author_id` AS `image.author_id`,
`image`.`image` AS `image.image`,
`image`.`link_url` AS `image.link_url`
FROM content_profile AS profile
LEFT JOIN `content_image` AS `image` ON `profile`.`content_ptr_id` = `image`.`content_id`
GROUP BY profile.content_ptr_id
LIMIT 10, 12
Is there a way I can group my results per profile? E.g. all images will show in the one profile result? I can't use GROUP BY as I'm getting an error:
Error: ER_WRONG_FIELD_WITH_GROUP: Expression #12 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'broadsheet.image.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by]
code: 'ER_WRONG_FIELD_WITH_GROUP',
errno: 1055,
sqlState: '42000',
index: 0 }
Is there a possible way around this group by error or another query I could run?
Tables:
content_image
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| content_id | int(11) | NO | MUL | NULL | |
| type | varchar(255) | NO | | NULL | |
| order | int(11) | NO | | NULL | |
| caption | longtext | NO | | NULL | |
| author_id | int(11) | YES | MUL | NULL | |
| image | varchar(255) | YES | | NULL | |
| link_url | varchar(200) | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
content_profile
+------------------------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+----------------------+------+-----+---------+-------+
| content_ptr_id | int(11) | NO | PRI | NULL | |
| body | longtext | NO | | NULL | |
| web_site | varchar(200) | NO | | NULL | |
| email | varchar(75) | NO | | NULL | |
| menu | longtext | NO | | NULL | |
| hours | longtext | NO | | NULL | |
| price_range | smallint(5) unsigned | YES | MUL | NULL | |
| price_range_high | smallint(5) unsigned | YES | | NULL | |
| primary_category_id | int(11) | NO | | NULL | |
| business_contact_name | varchar(255) | NO | | NULL | |
| business_contact_email | varchar(75) | NO | | NULL | |
| business_contact_phone | varchar(20) | NO | | NULL | |
| show_in_directory | tinyint(1) | NO | | NULL | |
+------------------------+----------------------+------+-----+---------+-------+
From reading your question, I think you don't have a grasp of how the GROUP BY clause works.
So the short summary of my answer is: learn the fundamentals of the GROUP BY clause.
I will use only a small number of columns to make the explanation easier.
The first problem with your query is that you are not using the GROUP BY clause properly: when using a GROUP BY clause, all columns that are selected must either be in the GROUP BY clause or be selected with an aggregate function.
Let's suppose these are the only columns you are selecting:
profile.content_ptr_id
profile.body
profile.web_site
image.id
image.content_id
And the query looked like this:
SELECT `profile.content_ptr_id`, `profile.body`, `profile.web_site`, `image.id`, `image.content_id`
FROM ...
GROUP BY `profile.content_ptr_id`
This query will error out, as you did not specify how you want to consolidate multiple rows into one row for profile.body, profile.web_site, image.id, and image.content_id. The database does not know how you want to consolidate the other columns, since you could group by them or use aggregate functions such as MIN(), MAX(), COUNT(), etc.
So one solution to fix the error raised in the query above would be the following:
SELECT `profile.content_ptr_id`, `profile.body`, `profile.web_site`, `image.id`, `image.content_id`
FROM ...
GROUP BY `profile.content_ptr_id`, `profile.body`, `profile.web_site`, `image.id`, `image.content_id`
Here, I put all the columns in the group by clause which makes the query group and select all the unique combinations of profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id columns.
The following is an example query which does not have all the columns included in the GROUP BY clause.
Let's say you want to find out how many images there are for each of the profiles. You can use a query such as the following:
SELECT `profile.content_ptr_id`, `profile.body`, `profile.web_site`, COUNT(`image.id`)
FROM ...
GROUP BY `profile.content_ptr_id`, `profile.body`, `profile.web_site`
This query lets you find out how many images there are for every unique combination of profile.content_ptr_id, profile.body, profile.web_site columns.
Be aware that in my previous two examples, all the columns that are selected are either included in the group by clause or are selected with an aggregate function. This is a rule all queries need to follow when using the group by clause, otherwise an error will be raised by the database.
Now, let's get on to answering your question:
"Is there a way I can group my results per profile? E.g all images will show in the one profile result?"
I will use the following mock data to explain:
profile
+----------------+--------------+---------------+
| content_ptr_id | body | web_site |
+----------------+--------------+---------------+
| 100 | body1 | web1 |
+----------------+--------------+---------------+
image
+--------+-------------+
| id | content_id |
+--------+-------------+
| iid1 | 100 |
| iid2 | 100 |
+--------+-------------+
Here is what the result would look like if you join the two tables without a GROUP BY:
SELECT `profile.content_ptr_id`, `profile.body`, `profile.web_site`, `image.id`, `image.content_id`
FROM ...
+----------------+--------------+---------------+--------+-------------+
| content_ptr_id | body | web_site | id | content_id |
+----------------+--------------+---------------+--------+-------------+
| 100 | body1 | web1 | iid1 | 100 |
| 100 | body1 | web1 | iid2 | 100 |
+----------------+--------------+---------------+--------+-------------+
You can't achieve your objective of grouping your results per profile (combining to only show one line per profile) by grouping by all the columns as the result will be the same:
SELECT `profile.content_ptr_id`, `profile.body`, `profile.web_site`, `image.id`, `image.content_id`
FROM ...
GROUP BY `profile.content_ptr_id`, `profile.body`, `profile.web_site`, `image.id`, `image.content_id`
will return
+----------------+--------------+---------------+--------+-------------+
| content_ptr_id | body | web_site | id | content_id |
+----------------+--------------+---------------+--------+-------------+
| 100 | body1 | web1 | iid1 | 100 |
| 100 | body1 | web1 | iid2 | 100 |
+----------------+--------------+---------------+--------+-------------+
The question you need to answer is how you want to display the non-unique columns you want to combine, in this case image.id. You can use COUNT(), but this will only return a number. If you want to display all the text, you can use GROUP_CONCAT(), which concatenates all the values, delimited by commas by default. If you use GROUP_CONCAT(), the result will look like the following:
SELECT `profile.content_ptr_id`, `profile.body`, `profile.web_site`, GROUP_CONCAT(`image.id`), GROUP_CONCAT(`image.content_id`)
FROM ...
GROUP BY `profile.content_ptr_id`, `profile.body`, `profile.web_site`
This query will return:
+----------------+--------------+---------------+--------------------+-------------+
| content_ptr_id | body | web_site | GROUP_CONCAT(id) | content_id |
+----------------+--------------+---------------+--------------------+-------------+
| 100 | body1 | web1 | iid1,iid2 | 100 |
+----------------+--------------+---------------+--------------------+-------------+
If GROUP_CONCAT() is what you want to use for all the image columns, then go ahead, but doing this for many columns while consolidating many rows may make the table less readable. Either way, I would suggest you read some articles to familiarise yourself with how the GROUP BY clause works.
Remove the GROUP BY clause.
I suspect you didn't want to do a GROUP BY operation, given that the expression in the group by is the PRIMARY KEY of the content_profile table.
What is up with all the single quotes? Those are used to enclose string literals, not identifiers.
Thanks for sparing us from "scrolling endlessly to the right".
Are you aware that spaces and linebreaks can be included in the SQL text, without altering the meaning of the statement? The parser can easily deal with extra whitespace, and adding the extra whitespace to format the statement can make it much easier for a human reader to decipher.
It's not at all clear why the statement is skipping over the first ten rows, and then returning the next twelve. Very strange.
SELECT p.content_ptr_id AS `profile.content_ptr_id`
, p.body AS `profile.body`
, p.web_site AS `profile.web_site`
, p.email AS `profile.email`
, p.hours AS `profile.hours`
, p.price_range AS `profile.price_range`
, p.price_range_high AS `profile.price_range_high`
, p.primary_category_id AS `profile.primary_category_id`
, p.business_contact_email AS `profile.business_contact_email`
, p.business_contact_phone AS `profile.business_contact_phone`
, p.show_in_directory AS `profile.show_in_directory`
, i.id AS `image.id`
, i.content_id AS `image.content_id`
, i.type AS `image.type`
, i.order AS `image.order`
, i.caption AS `image.caption`
, i.author_id AS `image.author_id`
, i.image AS `image.image`
, i.link_url AS `image.link_url`
FROM `content_profile` p
LEFT
JOIN `content_image` i
ON i.content_id = p.content_ptr_id
ORDER
BY p.content_ptr_id
, i.id
Because content_id is not unique in the content_image table, duplicate rows from content_profile are the expected result.
If your code can't handle the "duplicate" rows, i.e. can't detect that the row just fetched has the same content_ptr_id value as the previous row, then your SQL shouldn't do a join operation that creates the duplicated values.
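If the fetching code really can't cope with the repeated profile rows, a concrete version of the GROUP_CONCAT approach from the earlier answer might look like this (an untested sketch against the schemas above; every image column you need would get the same GROUP_CONCAT treatment):
SELECT p.content_ptr_id AS `profile.content_ptr_id`
     , p.web_site AS `profile.web_site`
     , GROUP_CONCAT(i.id ORDER BY i.id) AS `image.ids`
     , GROUP_CONCAT(i.image ORDER BY i.id) AS `image.images`
FROM content_profile p
LEFT JOIN content_image i
  ON i.content_id = p.content_ptr_id
GROUP BY p.content_ptr_id, p.web_site
ORDER BY p.content_ptr_id
LIMIT 10, 12;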

Query on two tables for one report (Advanced)

I'm having some trouble with an advanced SQL query, and it's been a long time since I've worked with SQL databases. We use MySQL.
Background:
We will be working with two tables:
"Transactions Table"
table: expire_history
+---------------+-----------------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------------+------+-----+-------------------+-------+
| m_id | int(11) | NO | PRI | 0 | |
| m_a_ordinal | int(11) | NO | PRI | 0 | |
| a_expired_date| datetime | NO | PRI | | |
| a_state | enum('EXPIRED','UNEXPIRED') | YES | | NULL | |
| t_note | text | YES | | NULL | |
| t_updated_by | varchar(40) | NO | | | |
| t_last_update | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------------+-----------------------------+------+-----+-------------------+-------+
"Information Table"
table: information
+---------------------+---------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------------------+-------+
| m_id | int(11) | NO | PRI | 0 | |
| m_a_ordinal | int(11) | NO | PRI | 0 | |
| a_type | varchar(15) | YES | MUL | NULL | |
| a_class | varchar(15) | YES | MUL | NULL | |
| a_state | varchar(15) | YES | MUL | NULL | |
| a_publish_date | datetime | YES | | NULL | |
| a_expire_date | date | YES | | NULL | |
| a_updated_by | varchar(20) | NO | | | |
| a_last_update | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------------------+---------------+------+-----+---------------------+-------+
We have a set of fields in one table that describe the record. Each record is identified by an m_id (the person) and an ordinal (a person can have multiple records). So, for instance, my m_id could be 1, and I could have multiple ordinals (1, 2, 3, 4, etc.), each with its own individual set of data. The m_id and m_a_ordinal fields comprise a composite key in the "information" table, and the m_id, m_a_ordinal, and a_expired_date fields comprise a composite key in the "transactions" table as well.
Essentially when we expire a record, the a_state field in the information table is updated to expired. At the same time, a record is created in the transactions table with the m_id, m_a_ordinal, and a_expired_date. We've found in the past that people get impatient and can click a button twice, so through some previous help I've managed to narrow down the most recent transaction for each expired record using the following query:
SELECT e1.m_id, e1.m_a_ordinal, e1.a_expired_date, e1.t_note, e1.t_updated_by
FROM expire_history e1
INNER JOIN (SELECT m_id, m_a_ordinal, MAX(a_expired_date) AS a_expired_date
FROM expire_history GROUP BY m_id, m_a_ordinal) e2
ON (e2.m_id = e1.m_id AND e2.m_a_ordinal = e1.m_a_ordinal AND e2.a_expired_date = e1.a_expired_date)
WHERE e2.a_expired_date > '2008-05-15 00:00:00' ORDER BY e1.a_expired_date;
Seems simple enough, right?
Let's add some complexity. Each record in the "information" table has a "natural expiration date" as well. The original developer of our software, however, didn't code it to change the state of the record to "expired" once it has reached its natural expiration date. It also does not write a transaction to the transactions table once it's expired (which I understand, because that table only keeps records of the ones that were expired by a person, as opposed to automagically). Also, when a record is expired manually, the original expiration date does not change. This is why this is so complicated :P
Essentially I need to build a report that shows all aspects of expiration, whether it was expired manually, or naturally.
This report should take the data from the query above and combine it with another query on the "information" table that says: if a_expire_date <= CURDATE, show the record, except if the record exists in (query above from expire_history), in which case show the record from (query on expire_history).
A rough structure of the raw logic is as follows:
for x in record_total
    if (m_id, m_a_ordinal) exists in expire_history
        display (m_id, m_a_ordinal, a_expired_date, a_state)
    else if (m_id, m_a_ordinal) exists in information AND a_expire_date <= CURDATE
        display (m_id, m_a_ordinal, a_expire_date, a_state)
    end if
    x++
I hope that this is concise enough.
Thanks for any help you can provide!
SELECT i.m_id, i.m_a_ordinal,
    COALESCE(e1.a_expired_date, i.a_expire_date) AS Expire_DT,
    COALESCE(e1.t_note, 'insert related item column') AS Note,
    COALESCE(e1.t_updated_by, i.a_updated_by) AS Updated_By
FROM information i
LEFT JOIN expire_history e1
    ON e1.m_id = i.m_id
    AND e1.m_a_ordinal = i.m_a_ordinal
LEFT JOIN
    (SELECT m_id, m_a_ordinal, MAX(a_expired_date) AS a_expired_date
    FROM expire_history GROUP BY m_id, m_a_ordinal) e2
    ON (e2.m_id = e1.m_id
    AND e2.m_a_ordinal = e1.m_a_ordinal
    AND e2.a_expired_date = e1.a_expired_date)
WHERE COALESCE(e2.a_expired_date, i.a_expire_date) > '2008-05-15 00:00:00'
ORDER BY Expire_DT;
The syntax may be off a bit (I don't have time to test), but you can get the gist of it from this, I hope.
Again, what COALESCE does is simply return the first non-NULL value in a series of values. If you're only dealing with two values, IFNULL works as well.
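A quick illustration of both functions (the literal values are arbitrary):
SELECT COALESCE(NULL, NULL, 'third') AS via_coalesce, -- 'third': the first non-NULL argument
       IFNULL(NULL, 'fallback') AS via_ifnull;        -- 'fallback': IFNULL takes exactly two arguments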

JOIN query is far too slow. Won't use INDEX?

I have a transitional table that I temporarily fill with some values before querying it and destroying it.
CREATE TABLE SearchListA(
`pTime` int unsigned NOT NULL ,
`STD` double unsigned NOT NULL,
`STD_Pos` int unsigned NOT NULL,
`SearchEnd` int unsigned NOT NULL,
UNIQUE INDEX (`pTime`,`STD` ASC) USING BTREE
) ENGINE = MEMORY;
It looks as such:
+------------+------------+---------+------------+
| pTime | STD | STD_Pos | SearchEnd |
+------------+------------+---------+------------+
| 1105715400 | 1.58474499 | 0 | 1105723200 |
| 1106297700 | 2.5997839 | 0 | 1106544000 |
| 1107440400 | 2.04860375 | 0 | 1107440700 |
| 1107440700 | 1.58864998 | 0 | 1107467400 |
| 1107467400 | 1.55207218 | 0 | 1107790500 |
| 1107790500 | 2.04239417 | 0 | 1108022100 |
| 1108022100 | 1.61385678 | 0 | 1108128000 |
| 1108771500 | 1.58835083 | 0 | 1108771800 |
| 1108771800 | 1.65734727 | 0 | 1108772100 |
| 1108772100 | 2.09378189 | 0 | 1109027700 |
+------------+------------+---------+------------+
Only columns pTime and SearchEnd are relevant to my problem.
My intention is to use this table to speed up searching through a much larger, static table.
The first column, pTime, is where the search should start
The fourth column, SearchEnd, is where the search should end
The larger table is similar; it looks like this:
CREATE TABLE `b50d1_abs` (
`pTime` int(10) unsigned NOT NULL,
`Slope` double NOT NULL,
`STD` double NOT NULL,
`Slope_Pos` int(11) NOT NULL,
`STD_Pos` int(11) NOT NULL,
PRIMARY KEY (`pTime`),
KEY `Slope` (`Slope`) USING BTREE,
KEY `STD` (`STD`),
KEY `ID1` (`pTime`,`STD`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED;
+------------+-------------+------------+-----------+---------+
| pTime | Slope | STD | Slope_Pos | STD_Pos |
+------------+-------------+------------+-----------+---------+
| 1107309300 | 1.63257919 | 1.39241698 | 0 | 1 |
| 1107314400 | 6.8959276 | 0.22425643 | 1 | 1 |
| 1107323100 | 18.19909502 | 1.46854808 | 1 | 0 |
| 1107335400 | 2.50135747 | 0.4736305 | 0 | 0 |
| 1107362100 | 4.28778281 | 0.85576985 | 0 | 1 |
| 1107363300 | 6.96289593 | 1.41299044 | 0 | 0 |
| 1107363900 | 8.10316742 | 0.2859726 | 0 | 0 |
| 1107367500 | 16.62443439 | 0.61587645 | 0 | 0 |
| 1107368400 | 19.37918552 | 1.18746968 | 0 | 0 |
| 1107369300 | 21.94570136 | 0.94261744 | 0 | 0 |
| 1107371400 | 25.85701357 | 0.2741292 | 0 | 1 |
| 1107375300 | 21.98914027 | 1.59521158 | 0 | 1 |
| 1107375600 | 20.80542986 | 1.59231289 | 0 | 1 |
| 1107375900 | 19.62714932 | 1.50661679 | 0 | 1 |
| 1107381900 | 8.23167421 | 0.98048205 | 1 | 1 |
| 1107383400 | 10.68778281 | 1.41607579 | 1 | 0 |
+------------+-------------+------------+-----------+---------+
...etc (439340 rows)
Here, the columns pTime, STD, and STD_Pos are relevant to my problem.
For every element in the smaller table (SearchListA), I need to search the specified range within the larger table (b50d1_abs) and return the row with the lowest b50d1_abs.pTime that is higher than the current SearchListA.pTime and that also matches the following conditions:
SearchListA.STD < b50d1_abs.STD AND SearchListA.STD_Pos <> b50d1_abs.STD_Pos
AND
b50d1_abs.pTime < SearchListA.SearchEnd
The latter condition is simply to reduce the length of the search.
This seems to me like a pretty straightforward query that should be able to use indexes, especially since all values are unsigned numbers, but I cannot get it to execute nearly fast enough! I think it is because it rebuilds the entire table each time instead of just omitting values from it.
I would be extremely grateful if someone takes a look at my code and figures out a more efficient way to go about this:
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
mu.pTime AS CloseTime
FROM
SearchListA m
JOIN b50d1_abs mu ON mu.pTime =(
SELECT
md.pTime
FROM
b50d1_abs as md
WHERE
md.pTime > m.pTime
AND md.pTime <=m.SearchEnd
AND m.STD < md.STD AND m.STD_Pos <> md.STD_Pos
LIMIT 1
);
Here is my EXPLAIN EXTENDED statement:
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
| 1 | PRIMARY | m | ALL | NULL | NULL | NULL | NULL | 365 | 100.00 | |
| 1 | PRIMARY | mu | eq_ref | PRIMARY,ID1 | PRIMARY | 4 | func | 1 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | md | ALL | PRIMARY,STD,ID1 | NULL | NULL | NULL | 439340 | 100.00 | Using where |
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
It looks like the lengthiest query (#2) doesn't use indexes at all!
If I try FORCE INDEX then it will list it under possible_keys, but still list NULL under Key and still take an extremely long time (over 80 seconds).
I need to get this query under 10 seconds, and even 10 is too long.
Your subquery is a dependent subquery, so the best case is that it's going to be evaluated once for every row in table m. Since m contains few rows, that would be OK.
But if you put that subquery in a JOIN condition, it is going to be executed (rows in m)*(rows in mu) times, no matter what.
Note that your results may be incorrect, since you ask to "return the row with the lowest b50d1_abs.pTime" but you don't specify that anywhere: the subquery uses LIMIT 1 without an ORDER BY.
Try this query:
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
(
SELECT min( big.pTime )
FROM b50d1_abs as big
WHERE big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
) AS CloseTime
FROM SearchListA m
or this one:
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
min( big.pTime )
FROM
SearchListA m
JOIN b50d1_abs as big ON (
big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
)
GROUP BY m.pTime
(if you also want rows where the search was unsuccessful, make that a LEFT JOIN).
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
(
SELECT big.pTime
FROM b50d1_abs as big
WHERE big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
ORDER BY big.pTime LIMIT 1
) AS CloseTime
FROM SearchListA m
Try an index on b50d1_abs(pTime, STD, STD_Pos).
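In MySQL terms that would be something like the following (the EXPLAIN output in the next answer suggests it was eventually created under the name ID2):
ALTER TABLE b50d1_abs ADD INDEX ID2 (pTime, STD, STD_Pos);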
FYI here are some tests using Postgres on a test data set that should look like yours (maybe remotely, lol)
CREATE TABLE small (
pTime INT PRIMARY KEY,
STD FLOAT NOT NULL,
STD_POS BOOL NOT NULL,
SearchEnd INT NOT NULL
);
CREATE TABLE big(
pTime INTEGER PRIMARY KEY,
Slope FLOAT NOT NULL,
STD FLOAT NOT NULL,
Slope_Pos BOOL NOT NULL,
STD_POS BOOL NOT NULL
);
INSERT INTO small SELECT
n*100000,
random(),
random()<0.1,
n*100000+random()*50000
FROM generate_series( 1, 365 ) n;
INSERT INTO big SELECT
n*100,
random(),
random(),
random() > 0.5,
random() > 0.5
FROM generate_series( 1, 500000 ) n;
Query 1 : 6.90 ms (yes milliseconds)
Query 2 : 48.20 ms
Query 3 : 6.46 ms
I'll start a new answer because this is starting to look like a mess ;)
With your data, using MySQL 5.1.41, I get:
Query 1 : takes forever, Ctrl-C
Query 2 : 520 ms
Query 3 : takes forever, Ctrl-C
EXPLAIN for query 2 looks good:
+----+-------------+-------+------+---------------------+------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------------+------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | m | ALL | PRIMARY,STD,ID1,ID2 | NULL | NULL | NULL | 743 | Using temporary; Using filesort |
| 1 | SIMPLE | big | ALL | PRIMARY,ID1,ID2 | NULL | NULL | NULL | 439340 | Range checked for each record (index map: 0x7) |
+----+-------------+-------+------+---------------------+------+---------+------+--------+------------------------------------------------+
So, I loaded your data into postgres...
Query 1 : 14.8 ms
Query 2 : 100 ms
Query 3 : 14.8 ms (same plan as 1)
In fact rewriting 2 as query 1 (or 3) fixes a little optimizer shortcoming and finds the optimal query plan for this scenario.
Would you recommend using Postgres over MySql for this scenario?
Speed is extremely important to me.
Well, I don't know why MySQL barfs so much on queries 1 and 3 (which are pretty simple and easy); in fact it should even beat Postgres (using an index-only scan), but apparently not, eh. You should ask a MySQL specialist!
I'm more used to Postgres... I got fed up with MySQL a long time ago! If you need complex queries, Postgres usually wins big time (but you'll need to re-learn how to optimize and tune your new database)...

Fast complex query to select bookings

I'm trying to write a query to get a course's information and the number of bookings and attendees. Each course can have many bookings, and each booking can have many attendees.
We already have a working report, but it uses multiple queries to get the required information. One to get the courses, one to get the bookings, and one to get the number of attendees. This is very slow because of the size that the database has grown to.
There are a number of extra conditions for the reports:
Bookings must be made more than 5 minutes ago, or have been confirmed
The booking must not be canceled
The course must not be marked as deleted
The course's venue and location must be LIKE a search string
Courses with no bookings must appear in the results
This is the table structure: (I've omitted the unneeded information. All fields are not null and have no default)
mysql> DESCRIBE first_aid_courses;
+------------------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+------------------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| course_date | date | | |
| region_id | int(11) | | |
| location | varchar(255) | | |
| venue | varchar(255) | | |
| number_of_spaces | int(11) | | |
| deleted | tinyint(1) | | |
+------------------+--------------+-----+----------------+
mysql> DESCRIBE first_aid_bookings;
+-----------------------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+-----------------------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| first_aid_course_id | int(11) | | |
| placed | datetime | | |
| confirmed | tinyint(1) | | |
| cancelled | tinyint(1) | | |
+-----------------------+--------------+-----+----------------+
mysql> DESCRIBE first_aid_attendees;
+----------------------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+----------------------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| first_aid_booking_id | int(11) | | |
+----------------------+--------------+-----+----------------+
mysql> DESCRIBE regions;
+----------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+----------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| name | varchar(255) | | |
+----------+--------------+-----+----------------+
I need to select the following:
Course ID: first_aid_courses.id
Date: first_aid_courses.course_date
Region: regions.name
Location: first_aid_courses.location
Bookings: COUNT(first_aid_bookings)
Attendees: COUNT(first_aid_attendees)
Spaces Remaining: COUNT(first_aid_bookings) - COUNT(first_aid_attendees)
This is what I have so far:
SELECT `first_aid_courses`.*,
COUNT(`first_aid_bookings`.`id`) AS `bookings`,
COUNT(`first_aid_attendees`.`id`) AS `attendees`
FROM `first_aid_courses`
LEFT JOIN `first_aid_bookings`
ON `first_aid_courses`.`id` =
`first_aid_bookings`.`first_aid_course_id`
LEFT JOIN `first_aid_attendees`
ON `first_aid_bookings`.`id` =
`first_aid_attendees`.`first_aid_booking_id`
WHERE ( `first_aid_courses`.`location` LIKE '%$search_string%'
OR `first_aid_courses`.`venue` LIKE '%$search_string%' )
AND `first_aid_courses`.`deleted` = 0
AND ( `first_aid_bookings`.`placed` > '$five_minutes_ago'
AND `first_aid_bookings`.`cancelled` = 0
OR `first_aid_bookings`.`confirmed` = 1 )
GROUP BY `first_aid_courses`.`id`
ORDER BY `course_date` DESC
It's not quite working; can anyone help me with writing the correct query? Also, there are thousands of rows in this database, so any help with making it fast is appreciated (like which fields to index).
OK, I've answered my own question. Sometimes it helps to ask the question in order to figure out the answer yourself.
SELECT `first_aid_courses`.*,
`regions`.`name` AS `region_name`,
COUNT(DISTINCT `first_aid_bookings`.`id`) AS `bookings`,
COUNT(`first_aid_attendees`.`id`) AS `attendees`
FROM `first_aid_courses`
JOIN `regions`
ON `first_aid_courses`.`region_id` = `regions`.`id`
LEFT JOIN `first_aid_bookings`
ON `first_aid_courses`.`id` =
`first_aid_bookings`.`first_aid_course_id`
LEFT JOIN `first_aid_attendees`
ON `first_aid_bookings`.`id` =
`first_aid_attendees`.`first_aid_booking_id`
WHERE ( `first_aid_courses`.`location` LIKE '%$search_string%'
OR `first_aid_courses`.`venue` LIKE '%$search_string%' )
AND `first_aid_courses`.`deleted` = 0
AND ( `first_aid_bookings`.`cancelled` = 0
AND `first_aid_bookings`.`confirmed` = 1 )
GROUP BY `first_aid_courses`.`id`
ORDER BY `course_date` ASC
This is completely untested, but maybe try selecting a count of non-null rows for bookings and attendees, like this:
SUM(IF(`first_aid_bookings`.`id` IS NOT NULL, 1, 0)) AS `bookings`,
SUM(IF(`first_aid_attendees`.`id` IS NOT NULL, 1, 0)) AS `attendees`
Unless you have them but just don't show them, have a good look at indexes; without them you lose an order of magnitude of performance on any query that references anything but the primary key.
Another major performance hit is the LIKE '%nnn%' comparisons.
Would it be possible to do something about those?
But with some good indexes, this query should be fine if you have the hardware to back it up.
I have queries doing LIKE on tables with millions of rows; it's not a problem if the rest of the query can eliminate any unnecessary matching.
You could go for subqueries to lessen the scope of the LIKE comparisons.
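As a starting point, the join and filter columns are the natural candidates. A sketch of what that could look like (the index names are invented):
ALTER TABLE first_aid_bookings ADD INDEX idx_bookings_course (first_aid_course_id);
ALTER TABLE first_aid_attendees ADD INDEX idx_attendees_booking (first_aid_booking_id);
ALTER TABLE first_aid_courses ADD INDEX idx_courses_date (course_date);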

Big SQL SELECT performance difference when using <= against using < on a DATETIME column

Given the following table:
desc exchange_rates;
+------------------+----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| time | datetime | NO | MUL | NULL | |
| base_currency | varchar(3) | NO | MUL | NULL | |
| counter_currency | varchar(3) | NO | MUL | NULL | |
| rate | decimal(32,16) | NO | | NULL | |
+------------------+----------------+------+-----+---------+----------------+
I have added indexes on time, base_currency and counter_currency, as well as a composite index on (time, base_currency, counter_currency), but I'm seeing a big performance difference when I perform a SELECT using <= against using <.
The first SELECT is:
ExchangeRate Load (95.5ms)
SELECT * FROM `exchange_rates` WHERE (time <= '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1
As you can see this is taking 95ms.
If I change the query such that I compare time using < rather than <= I see this:
ExchangeRate Load (0.8ms)
SELECT * FROM `exchange_rates` WHERE (time < '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1
Now it takes less than 1 millisecond, which sounds right to me. Is there a rational explanation for this behaviour?
The output from EXPLAIN provides further details, but I'm not 100% sure how to interpret this:
-- Output from the first, slow, select
SIMPLE | 5,5 | exchange_rates | 1 | index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency | index_merge | Using intersect(index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency); Using where | 813 | | index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency
-- Output from the second, fast, select
SIMPLE | 5 | exchange_rates | 1 | index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency | ref | Using where | 4988 | const | index_exchange_rates_on_counter_currency
(Note: I'm producing these queries through ActiveRecord (in a Rails app) but these are ultimately the queries which are being executed)
In the first case, MySQL tries to combine results from all indexes. It fetches all records from both indexes and joins them on the value of the row pointer (table offset in MyISAM, PRIMARY KEY in InnoDB).
In the second case, it just uses a single index, which, considering LIMIT 1, is the best decision.
You need to create a composite index on (base_currency, counter_currency, time) (in this order) for this query to work as fast as possible.
The engine will use the index for filtering on the leading columns (base_currency, counter_currency) and for ordering on the trailing column (time).
It also seems you want to add something like ORDER BY time DESC to your query to get the last exchange rate.
In general, any LIMIT without an ORDER BY should ring alarm bells.
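Putting those two suggestions together, a sketch might look like this (the index name and the DESC direction are illustrative assumptions, not from the original post):
-- Equality columns first, then the range/order column:
ALTER TABLE exchange_rates
    ADD INDEX idx_currency_time (base_currency, counter_currency, time);
-- Deterministic "latest rate at or before a given time" lookup:
SELECT *
FROM exchange_rates
WHERE base_currency = 'GBP'
  AND counter_currency = 'USD'
  AND time <= '2009-12-30 14:42:02'
ORDER BY time DESC
LIMIT 1;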