I have an SQL query. Is it possible to change somehow this query, so it has better performance, but with the same result? This query is working, but it is very slow, and I don't have an idea on improving its performance.
SELECT keyword, query
FROM url_alias ua
JOIN product p
on (p.manufacturer_id =
CONVERT(SUBSTRING_INDEX(ua.query,'=',-1),UNSIGNED INTEGER))
JOIN oc_product_to_storebk ps
on (p.product_id = ps.product_id)
AND ua.query LIKE 'manufacturer_id=%'
AND ps.store_id= '9'
GROUP BY ua.keyword
Table structure:
URL_ALIAS
+-----------------------------------------------+
| url_alias_id | query | keyword |
+--------------+---------------------+----------+
| 1 | manufacturer_id=100 | test |
+--------------+---------------------+----------+
PRODUCT
+-----------------+------------+
| manufacturer_id | product_id |
+-----------------+------------+
| 100 | 1000 |
+-----------------+------------+
OC_PRODUCT_TO_STOREBK
+------------+----------+
| product_id | store_id |
+------------+----------+
| 1000 | 9 |
+------------+----------+
I want all the keywords from the url_alias keyword column, when the following condition is met: LIKE 'manufacturer_id=%' AND ps.store_id='9'
You should avoid the convert function as it will be expensive and provides no way you could profit from indexes on the url_alias table.
Extend your url_alias table so it has additional fields for the parts of the query. You will probably hesitate to go this way, but you will not regret it once you have done it. So your url_alias table should look like this:
create table url_alias (
url_alias_id int,
query varchar(200),
keyword varchar(100),
query_key varchar(200),
query_value_str varchar(200),
query_value_int int
);
If you don't want to recreate it, then add the fields as follows:
alter table url_alias add (
query_key varchar(200),
query_value_str varchar(200),
query_value_int int
);
Update these new columns for the existing records with this statement (only to execute once):
update url_alias
set query_key = substring_index(query, '=', 1),
query_value_str = substring_index(query, '=', -1),
query_value_int = nullif(
convert(substring_index(query,'=',-1),unsigned integer), 0);
Then create a trigger so that these 3 extra fields are updated automatically when you insert a new record:
create trigger ins_sum before insert on url_alias
for each row
set new.query_key = substring_index(new.query, '=', 1),
new.query_value_str = substring_index(new.query, '=', -1),
new.query_value_int = nullif(
convert(substring_index(new.query,'=',-1),unsigned integer), 0);
Note the additional nullif() which will make sure the last field is null when the value after the equal sign is not numerical.
If ever you also update such records, then also create a similar update trigger.
With this set-up, you can still insert records like before:
insert into url_alias (url_alias_id, query, keyword)
values (1, 'manufacturer_id=100', 'test');
When you then select this record, you will see this:
+--------------+---------------------+---------+-----------------+-----------------+-----------------+
| url_alias_id | query | keyword | query_key | query_value_str | query_value_int |
+--------------+---------------------+---------+-----------------+-----------------+-----------------+
| 1 | manufacturer_id=100 | test | manufacturer_id | 100 | 100 |
+--------------+---------------------+---------+-----------------+-----------------+-----------------+
Now the work of extraction and conversion has been done once, and does not have to be repeated any more when you select records. You can rewrite your original query like this:
select ua.keyword, ua.query
from url_alias ua
join product p
on p.manufacturer_id = ua.query_value_int
join oc_product_to_storebk ps
on p.product_id = ps.product_id
and ua.query_key = 'manufacturer_id'
and ps.store_id = 9
group by ua.keyword, ua.query
And now you can improve the performance by creating indexes on both elements of the query:
create index query_key on url_alias(query_key, query_value_int, keyword);
You might need to experiment a bit to get the order of fields right in the index before it gets used by the SQL plan.
See this SQL fiddle.
I asume that you use indexes on the store_id, product_id and keyword columns?
Focus on changing your datamodel to avoid the CONVERT and the LIKE operators. Both of them will cause that the query will not utilize indexes on relevant columns
Also, take a good look on the data that is stored in the ua.query colomn. Possibly you need to distribute data in that column to multiple columns so you can use indexes
Related
In a MySQL JOIN, what is the difference between ON and USING()? As far as I can tell, USING() is just more convenient syntax, whereas ON allows a little more flexibility when the column names are not identical. However, that difference is so minor, you'd think they'd just do away with USING().
Is there more to this than meets the eye? If yes, which should I use in a given situation?
It is mostly syntactic sugar, but a couple differences are noteworthy:
ON is the more general of the two. One can join tables ON a column, a set of columns and even a condition. For example:
SELECT * FROM world.City JOIN world.Country ON (City.CountryCode = Country.Code) WHERE ...
USING is useful when both tables share a column of the exact same name on which they join. In this case, one may say:
SELECT ... FROM film JOIN film_actor USING (film_id) WHERE ...
An additional nice treat is that one does not need to fully qualify the joining columns:
SELECT film.title, film_id -- film_id is not prefixed
FROM film
JOIN film_actor USING (film_id)
WHERE ...
To illustrate, to do the above with ON, we would have to write:
SELECT film.title, film.film_id -- film.film_id is required here
FROM film
JOIN film_actor ON (film.film_id = film_actor.film_id)
WHERE ...
Notice the film.film_id qualification in the SELECT clause. It would be invalid to just say film_id since that would make for an ambiguity:
ERROR 1052 (23000): Column 'film_id' in field list is ambiguous
As for select *, the joining column appears in the result set twice with ON while it appears only once with USING:
mysql> create table t(i int);insert t select 1;create table t2 select*from t;
Query OK, 0 rows affected (0.11 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
Query OK, 1 row affected (0.19 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> select*from t join t2 on t.i=t2.i;
+------+------+
| i | i |
+------+------+
| 1 | 1 |
+------+------+
1 row in set (0.00 sec)
mysql> select*from t join t2 using(i);
+------+
| i |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
mysql>
Thought I would chip in here with when I have found ON to be more useful than USING. It is when OUTER joins are introduced into queries.
ON benefits from allowing the results set of the table that a query is OUTER joining onto to be restricted while maintaining the OUTER join. Attempting to restrict the results set through specifying a WHERE clause will, effectively, change the OUTER join into an INNER join.
Granted this may be a relative corner case. Worth putting out there though.....
For example:
CREATE TABLE country (
countryId int(10) unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
country varchar(50) not null,
UNIQUE KEY countryUIdx1 (country)
) ENGINE=InnoDB;
insert into country(country) values ("France");
insert into country(country) values ("China");
insert into country(country) values ("USA");
insert into country(country) values ("Italy");
insert into country(country) values ("UK");
insert into country(country) values ("Monaco");
CREATE TABLE city (
cityId int(10) unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
countryId int(10) unsigned not null,
city varchar(50) not null,
hasAirport boolean not null default true,
UNIQUE KEY cityUIdx1 (countryId,city),
CONSTRAINT city_country_fk1 FOREIGN KEY (countryId) REFERENCES country (countryId)
) ENGINE=InnoDB;
insert into city (countryId,city,hasAirport) values (1,"Paris",true);
insert into city (countryId,city,hasAirport) values (2,"Bejing",true);
insert into city (countryId,city,hasAirport) values (3,"New York",true);
insert into city (countryId,city,hasAirport) values (4,"Napoli",true);
insert into city (countryId,city,hasAirport) values (5,"Manchester",true);
insert into city (countryId,city,hasAirport) values (5,"Birmingham",false);
insert into city (countryId,city,hasAirport) values (3,"Cincinatti",false);
insert into city (countryId,city,hasAirport) values (6,"Monaco",false);
-- Gah. Left outer join is now effectively an inner join
-- because of the where predicate
select *
from country left join city using (countryId)
where hasAirport
;
-- Hooray! I can see Monaco again thanks to
-- moving my predicate into the ON
select *
from country co left join city ci on (co.countryId=ci.countryId and ci.hasAirport)
;
Wikipedia has the following information about USING:
The USING construct is more than mere syntactic sugar, however, since
the result set differs from the result set of the version with the
explicit predicate. Specifically, any columns mentioned in the USING
list will appear only once, with an unqualified name, rather than once
for each table in the join. In the case above, there will be a single
DepartmentID column and no employee.DepartmentID or
department.DepartmentID.
Tables that it was talking about:
The Postgres documentation also defines them pretty well:
The ON clause is the most general kind of join condition: it takes a
Boolean value expression of the same kind as is used in a WHERE
clause. A pair of rows from T1 and T2 match if the ON expression
evaluates to true.
The USING clause is a shorthand that allows you to take advantage of
the specific situation where both sides of the join use the same name
for the joining column(s). It takes a comma-separated list of the
shared column names and forms a join condition that includes an
equality comparison for each one. For example, joining T1 and T2 with
USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b =
T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns:
there is no need to print both of the matched columns, since they must
have equal values. While JOIN ON produces all columns from T1 followed
by all columns from T2, JOIN USING produces one output column for each
of the listed column pairs (in the listed order), followed by any
remaining columns from T1, followed by any remaining columns from T2.
Database tables
To demonstrate how the USING and ON clauses work, let's assume we have the following post and post_comment database tables, which form a one-to-many table relationship via the post_id Foreign Key column in the post_comment table referencing the post_id Primary Key column in the post table:
The parent post table has 3 rows:
| post_id | title |
|---------|-----------|
| 1 | Java |
| 2 | Hibernate |
| 3 | JPA |
and the post_comment child table has the 3 records:
| post_comment_id | review | post_id |
|-----------------|-----------|---------|
| 1 | Good | 1 |
| 2 | Excellent | 1 |
| 3 | Awesome | 2 |
The JOIN ON clause using a custom projection
Traditionally, when writing an INNER JOIN or LEFT JOIN query, we happen to use the ON clause to define the join condition.
For example, to get the comments along with their associated post title and identifier, we can use the following SQL projection query:
SELECT
post.post_id,
title,
review
FROM post
INNER JOIN post_comment ON post.post_id = post_comment.post_id
ORDER BY post.post_id, post_comment_id
And, we get back the following result set:
| post_id | title | review |
|---------|-----------|-----------|
| 1 | Java | Good |
| 1 | Java | Excellent |
| 2 | Hibernate | Awesome |
The JOIN USING clause using a custom projection
When the Foreign Key column and the column it references have the same name, we can use the USING clause, like in the following example:
SELECT
post_id,
title,
review
FROM post
INNER JOIN post_comment USING(post_id)
ORDER BY post_id, post_comment_id
And, the result set for this particular query is identical to the previous SQL query that used the ON clause:
| post_id | title | review |
|---------|-----------|-----------|
| 1 | Java | Good |
| 1 | Java | Excellent |
| 2 | Hibernate | Awesome |
The USING clause works for Oracle, PostgreSQL, MySQL, and MariaDB. SQL Server doesn't support the USING clause, so you need to use the ON clause instead.
The USING clause can be used with INNER, LEFT, RIGHT, and FULL JOIN statements.
SQL JOIN ON clause with SELECT *
Now, if we change the previous ON clause query to select all columns using SELECT *:
SELECT *
FROM post
INNER JOIN post_comment ON post.post_id = post_comment.post_id
ORDER BY post.post_id, post_comment_id
We are going to get the following result set:
| post_id | title | post_comment_id | review | post_id |
|---------|-----------|-----------------|-----------|---------|
| 1 | Java | 1 | Good | 1 |
| 1 | Java | 2 | Excellent | 1 |
| 2 | Hibernate | 3 | Awesome | 2 |
As you can see, the post_id is duplicated because both the post and post_comment tables contain a post_id column.
SQL JOIN USING clause with SELECT *
On the other hand, if we run a SELECT * query that features the USING clause for the JOIN condition:
SELECT *
FROM post
INNER JOIN post_comment USING(post_id)
ORDER BY post_id, post_comment_id
We will get the following result set:
| post_id | title | post_comment_id | review |
|---------|-----------|-----------------|-----------|
| 1 | Java | 1 | Good |
| 1 | Java | 2 | Excellent |
| 2 | Hibernate | 3 | Awesome |
You can see that this time, the post_id column is deduplicated, so there is a single post_id column being included in the result set.
Conclusion
If the database schema is designed so that Foreign Key column names match the columns they reference, and the JOIN conditions only check if the Foreign Key column value is equal to the value of its mirroring column in the other table, then you can employ the USING clause.
Otherwise, if the Foreign Key column name differs from the referencing column or you want to include a more complex join condition, then you should use the ON clause instead.
For those experimenting with this in phpMyAdmin, just a word:
phpMyAdmin appears to have a few problems with USING. For the record this is phpMyAdmin run on Linux Mint, version: "4.5.4.1deb2ubuntu2", Database server: "10.2.14-MariaDB-10.2.14+maria~xenial - mariadb.org binary distribution".
I have run SELECT commands using JOIN and USING in both phpMyAdmin and in Terminal (command line), and the ones in phpMyAdmin produce some baffling responses:
1) a LIMIT clause at the end appears to be ignored.
2) the supposed number of rows as reported at the top of the page with the results is sometimes wrong: for example 4 are returned, but at the top it says "Showing rows 0 - 24 (2503 total, Query took 0.0018 seconds.)"
Logging on to mysql normally and running the same queries does not produce these errors. Nor do these errors occur when running the same query in phpMyAdmin using JOIN ... ON .... Presumably a phpMyAdmin bug.
Short answer:
USING: when clause is ambiguous
ON: when clause has different comparison params
Briefly: database imported from foreign source, so I cannot prevent duplicates, I can only prune and clean the database.
Foreign db changes daily, so, I want to automate the pruning process.
It resides on:
MariaDB v10.4.6 managed predominantly by phpMyadmin GUI v4.9.0.1 (both pretty much up to date as of this writing).
This is a radio browsing database.
It has multiple columns, but for me there are only few important:
StationID (it is unique entry number, thus db does not consider new entries as duplicates, all of them are unique because of this primary key)
There are no row numbers.
Name, url, home-page, country, etc
I do want to remove multiple url duplicated entries base on:
duplicate url has country to it, but some country values are NULL (=empty)
so I do want remove all duplicates except one containing country name, if there is one entry with it, if there is none, just one url, regardless of name (names are multilingual, so some duplicated urls have also various names, which I do not care for.
StationID (unique number, but not consecutive, also this is primary db key)
Name (variable, least important)
url (variable, but I do want to remove the duplicates)
country (variable, frequently NULL/empty, I want to eliminate those with empty entries as much as possible, if possible)
One url has to stay by any means (not to be deleted)
I have tried multitude of queries, some work for SELECT, but do NOT for DELETE, some hang my machine when executed. Here are some queries I tried (remember I use MariaDB, not oracle, or ms-sql)
SELECT * from `radio`.`Station`
WHERE (`radio`.`Station`.`Url`, `radio`.`Station`.`Name`) IN (
SELECT `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
FROM `radio`.`Station`
GROUP BY `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
HAVING COUNT(*) > 1)
This one should show all entries (not only one grouped), but this query hangs my machine
This query gets me as close as possible:
SELECT *
FROM `radio`.`Station`
WHERE `radio`.`Station`.`StationID` NOT IN (
SELECT MAX(`radio`.`Station`.`StationID`)
FROM `radio`.`Station`
GROUP BY `radio`.`Station`.`Url`,`radio`.`Station`.`Name`,`radio`.`Station`.`Country`)
However this query lists more entries:
SELECT *, COUNT(`radio`.`Station`.`Url`) FROM `radio`.`Station` GROUP BY `radio`.`Station`.`Name`,`radio`.`Station`.`Url` HAVING (COUNT(`radio`.`Station`.`Url`) > 1);
But all of these queries group them and display only one row.
I also tried UNION, INNER JOIN, but failed.
WITH cte AS..., but phpMyadmin does NOT like this query, and mariadb cli also did not like it.
I also tried something of this kind, published at oracle blog, which did not work, and I really had no clue what was what in this function:
select *
from (
select f.*,
count(*) over (
partition by `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
) ct
from `radio`.`Station` f
)
where ct > 1
I did not know what f.* was, query did not like ct.
Given
drop table if exists radio;
create table radio
(stationid int,name varchar(3),country varchar(3),url varchar(3));
insert into radio values
(1,'aaa','uk','a/b'),
(2,'bbb','can','a/b'),
(3,'bbb',null,'a/b'),
(4,'bbb',null,'b/b'),
(5,'bbb',null,'b/b');
You could give the null countries a unique value (using coalesce), fortunately stationid is unique so:
select t.stationid,t.name,t.country,t.url
from radio t
join
(select url,max(coalesce(country,stationid)) cntry from radio t group by url) s
on s.url = t.url and s.cntry= coalesce(t.country,t.stationid);
Yields
+-----------+------+---------+------+
| stationid | name | country | url |
+-----------+------+---------+------+
| 1 | aaa | uk | a/b |
| 5 | bbb | NULL | b/b |
+-----------+------+---------+------+
2 rows in set (0.00 sec)
Translated to a delete
delete t from radio t
join
(select url,max(coalesce(country,stationid)) cntry from radio t group by url) s
on s.url = t.url and s.cntry <> coalesce(t.country,t.stationid);
MariaDB [sandbox]> select * from radio;
+-----------+------+---------+------+
| stationid | name | country | url |
+-----------+------+---------+------+
| 1 | aaa | uk | a/b |
| 5 | bbb | NULL | b/b |
+-----------+------+---------+------+
2 rows in set (0.00 sec)
Fix 2 problems at once:
Dup rows already in table
Dup rows can still be put in table
Do this fore each table:
CREATE TABLE new LIKE real;
ALTER TABLE new ADD UNIQUE(x,y); -- will prevent future dups
INSERT IGNORE INTO new -- IGNORE dups
SELECT * FROM real;
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
If I compare
explain select * from Foo where find_in_set(id,'2,3');
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | User | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
with this one
explain select * from Foo where id in (2,3);
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | User | range | PRIMARY | PRIMARY | 8 | NULL | 2 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
It is apparent that FIND_IN_SET does not exploit the primary key.
I want to put a query such as the above into a stored procedure, with the comma-separated string as an argument.
Is there any way to make the query behave like the second version, in which the index is used, but without knowing the content of the id set at the time the query is written?
In reference to your comment:
#MarcB the database is normalized, the CSV string comes from the UI.
"Get me data for the following people: 101,202,303"
This answer has a narrow focus on just those numbers separated by a comma. Because, as it turns out, you were not even talking about FIND_IN_SET afterall.
Yes, you can achieve what you want. You create a prepared statement that accepts a string as a parameter like in this Recent Answer of mine. In that answer, look at the second block that shows the CREATE PROCEDURE and its 2nd parameter which accepts a string like (1,2,3). I will get back to this point in a moment.
Not that you need to see it #spraff but others might. The mission is to get the type != ALL, and possible_keys and keys of Explain to not show null, as you showed in your second block. For a general reading on the topic, see the article Understanding EXPLAIN’s Output and the MySQL Manual Page entitled EXPLAIN Extra Information.
Now, back to the (1,2,3) reference above. We know from your comment, and your second Explain output in your question that it hits the following desired conditions:
type = range (and in particular not ALL) . See the docs above on this.
key is not null
These are precisely the conditions you have in your second Explain output, and the output that can be seen with the following query:
explain
select * from ratings where id in (2331425, 430364, 4557546, 2696638, 4510549, 362832, 2382514, 1424071, 4672814, 291859, 1540849, 2128670, 1320803, 218006, 1827619, 3784075, 4037520, 4135373, ... use your imagination ..., ..., 4369522, 3312835);
where I have 999 values in that in clause list. That is an sample from this answer of mine in Appendix D than generates such a random string of csv, surrounded by open and close parentheses.
And note the following Explain output for that 999 element in clause below:
Objective achieved. You achieve this with a stored proc similar to the one I mentioned before in this link using a PREPARED STATEMENT (and those things use concat() followed by an EXECUTE).
The index is used, a Tablescan (meaning bad) is not experienced. Further readings are The range Join Type, any reference you can find on MySQL's Cost-Based Optimizer (CBO), this answer from vladr though dated, with a eye on the ANALYZE TABLE part, in particular after significant data changes. Note that ANALYZE can take a significant amount of time to run on ultra-huge datasets. Sometimes many many hours.
Sql Injection Attacks:
Use of strings passed to Stored Procedures are an attack vector for SQL Injection attacks. Precautions must be in place to prevent them when using user-supplied data. If your routine is applied against your own id's generated by your system, then you are safe. Note, however, that 2nd level SQL Injection attacks occur when data was put in place by routines that did not sanitize that data in a prior insert or update. Attacks put in place prior via data and used later (a sort of time bomb).
So this answer is Finished for the most part.
The below is a view of the same table with a minor modification to it to show what a dreaded Tablescan would look like in the prior query (but against a non-indexed column called thing).
Take a look at our current table definition:
CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;
select min(id), max(id),count(*) as theCount from ratings;
+---------+---------+----------+
| min(id) | max(id) | theCount |
+---------+---------+----------+
| 1 | 5046213 | 4718592 |
+---------+---------+----------+
Note that the column thing was a nullable int column before.
update ratings set thing=id where id<1000000;
update ratings set thing=id where id>=1000000 and id<2000000;
update ratings set thing=id where id>=2000000 and id<3000000;
update ratings set thing=id where id>=3000000 and id<4000000;
update ratings set thing=id where id>=4000000 and id<5100000;
select count(*) from ratings where thing!=id;
-- 0 rows
ALTER TABLE ratings MODIFY COLUMN thing int not null;
-- current table definition (after above ALTER):
CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;
And then the Explain that is a Tablescan (against column thing):
You can use following technique to use primary index.
Prerequisities:
You know the maximum amount of items in comma separated string and it is not large
Description:
we convert comma separated string into temporary table
inner join to the temporary table
select #ids:='1,2,3,5,11,4', #maxCnt:=15;
SELECT *
FROM foo
INNER JOIN (
SELECT * FROM (SELECT #n:=#n+1 AS n FROM foo INNER JOIN (SELECT #n:=0) AS _a) AS _a WHERE _a.n <= #maxCnt
) AS k ON k.n <= LENGTH(#ids) - LENGTH(replace(#ids, ',','')) + 1
AND id = SUBSTRING_INDEX(SUBSTRING_INDEX(#ids, ',', k.n), ',', -1)
This is a trick to extract nth value in comma separated list:
SUBSTRING_INDEX(SUBSTRING_INDEX(#ids, ',', k.n), ',', -1)
Notes: #ids can be anything including other column from other or the same table.
I am trying rewrite this subquery into a join. I have read the other questions on SO but cant get this one working.
create table job (
emplid int,
effdt date,
title varchar(100),
primary key (emplid, effdt)
);
insert into job set emplid=1, effdt='2010-01-01', title='Programmer';
insert into job set emplid=1, effdt='2011-01-01', title='Programmer I';
insert into job set emplid=1, effdt='2012-01-01', title='Programmer II';
insert into job set emplid=2, effdt='2010-01-01', title='Analyst';
insert into job set emplid=2, effdt='2011-01-01', title='Analyst I';
insert into job set emplid=2, effdt='2012-01-01', title='Analyst II';
#Get each employees current job:
select *
from job a
where a.effdt=
(select max(b.effdt)
from job b
where b.emplid=a.emplid);
Results:
+--------+------------+---------------+
| emplid | effdt | title |
+--------+------------+---------------+
| 1 | 2012-01-01 | Programmer II |
| 2 | 2012-01-01 | Analyst II |
+--------+------------+---------------+
I would like to rewrite the query into a join, without a subquery. Is this possible?
Writing this as a join is perhaps a bit counterintuitive. The idea is to use a left outer join and include in the condition that b.effdt > a.effdt. This condition will match rows except when a.effdt takes on the maximum value. The query can then filter for these using a where:
select a.*
from job a left outer join
job b
on b.emplid = a.emplid and
b.effdt > a.effdt
where b.effdt is NULL;
Have you considered rewriting your schema?
If you are able to, it might be better to have a history or log table that has entries for when the effective date was changed, for which employee ID and what the previous title was. That way you would just query the actual table and get the results that you want.
This can be achieved by using triggers for whenever a row in the database is changed, then everything is handled at the database level.
I'm looking for something like
SELECT
`foo`.*,
(SELECT MAX(`foo`.`bar`) FROM `foo`)
FROM
(SELECT * FROM `fuz`) AS `foo`;
but it seem that foo does not get recognized in nested query as there is error like
[Err] 1146 - Table 'foo' doesn't exist
I try the query above because I think its faster than something like
SELECT
`fuz`.*,
(SELECT MAX(`bar`) FROM `fuz`) as max_bar_from_fuz
FROM `fuz`
Please give me some suggestions.
EDIT: I am looking for solutions with better performance than the second query. Please assume that my table fuz is a very, very big one, thus running an additional query getting max_bar cost me a lot.
What you want, for the first query (with some modification) to work, is called Common Table Expressions and MySQL has not that feature.
If your second query does not perform well, you can use this:
SELECT
fuz.*,
fuz_grp.max_bar
FROM
fuz
CROSS JOIN
( SELECT MAX(bar) AS max_bar
FROM fuz
) AS fuz_grp
Alias created into a SELECT clause only can be used to access scalar values, they are not synonyms to the tables. If you want to return the max value of a column for all returned rows you can do it by running a query before to calculate the max value into a variable and then use this variable as a scalar value into your query, like:
-- create and populate a table to demonstrate concept
CREATE TABLE fuz (bar INT, col0 VARCHAR(20), col1 VARCHAR(20) );
INSERT INTO fuz(bar, col0, col1) VALUES (1, 'A', 'Airplane');
INSERT INTO fuz(bar, col0, col1) VALUES (2, 'B', 'Boat');
INSERT INTO fuz(bar, col0, col1) VALUES (3, 'C', 'Car');
-- create the scalar variable with the value of MAX(bar)
SELECT #max_foo:=MAX(bar) FROM fuz;
-- use the scalar variable into the query
SELECT *, #max_foo AS `MAX_FOO`
FROM fuz;
-- result:
-- | BAR | COL0 | COL1 | MAX_FOO |
-- |-----|------|----------|---------|
-- | 1 | A | Airplane | 3 |
-- | 2 | B | Boat | 3 |
-- | 3 | C | Car | 3 |
Just simple use MAX function:
SELECT
`fuz`.*,
MAX(`fuz`.`bar`)
FROM
`fuz`
or if you use
SELECT
`foo`.*,
MAX(`foo`.`bar`)
FROM
(SELECT * FROM `fuz` JOIN `loolse' ON (`fuz`.`field` = `loolse`.`smile`)) AS `foo`;