There are 2 tables and their structure as below:
mysql> desc product;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| brand | varchar(20) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.02 sec)
mysql> desc sales;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| yearofsales | varchar(10) | YES | | NULL | |
| price | int(11) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
Here id is the foreign key.
And Queries are as follows:
1.
mysql> select brand,sum(price),yearofsales
from product p, sales s
where p.id=s.id
group by s.id,yearofsales;
+-------+------------+-------------+
| brand | sum(price) | yearofsales |
+-------+------------+-------------+
| Nike | 917504000 | 2012 |
| FF | 328990720 | 2010 |
| FF | 328990720 | 2011 |
| FF | 723517440 | 2012 |
+-------+------------+-------------+
4 rows in set (1.91 sec)
2.
mysql> select brand,tmp.yearofsales,tmp.sum
from product p
join (
select id,yearofsales,sum(price) as sum
from sales
group by yearofsales,id
) tmp on p.id=tmp.id ;
+-------+-------------+-----------+
| brand | yearofsales | sum |
+-------+-------------+-----------+
| Nike | 2012 | 917504000 |
| FF | 2011 | 328990720 |
| FF | 2012 | 723517440 |
| FF | 2010 | 328990720 |
+-------+-------------+-----------+
4 rows in set (1.59 sec)
Question is: Why the second query takes less time than the first one? I have executed it multiple times in different order as well.
You can check the execution plan for the two queries and the indexes on the two tables to see why one query takes more than the other. Also, you cannot run one simple test and trust the results, there are many factors that can impact the execution of queries, like the server being busy with something else when executing one query, so it runs slower. You'll have to run both queries a big number of times and then compare the averages.
However, it is highly recommended to use explicit joins instead of implicit joins:
SELECT brand, SUM(price), yearofsales
FROM product p
INNER JOIN sales s ON p.id = s.id
GROUP BY s.id, yearofsales;
Related
I have an application that stores stock quotes into my MySQL database.
I have a table called stock_history:
mysql> desc stock_history;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | MUL | NULL | |
| close | decimal(12,5) | NO | MUL | NULL | |
| dmal_3 | decimal(12,5) | YES | MUL | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)
These are all the values in this table:
mysql> select date, close, dmal_3 from stock_history order by date asc;
+------------+----------+----------+
| date | close | dmal_3 |
+------------+----------+----------+-
| 2000-01-03 | 2.00000 | NULL |
| 2000-01-04 | 4.00000 | NULL |
| 2000-01-05 | 6.00000 | NULL |
| 2000-01-06 | 8.00000 | NULL |
| 2000-01-07 | 10.00000 | NULL |
| 2000-01-10 | 12.00000 | NULL |
| 2000-01-11 | 14.00000 | NULL |
| 2000-01-12 | 16.00000 | NULL |
| 2000-01-13 | 18.00000 | NULL |
| 2000-01-14 | 20.00000 | NULL |
+------------+----------+----------+-
10 rows in set (0.01 sec)
I am guaranteed that there will be 0 or 1 record for each date.
Can I write a single query that will insert the three-day moving average (ie: the average closing prices of that day and the two previous trading days before it) into the dmal_3 field? How?
When the query is done, I want the table to look like this:
mysql> select date, close, dmal_3 from stock_history order by date asc;
+------------+----------+----------+
| date | close | dmal_3 |
+------------+----------+----------+
| 2000-01-03 | 2.00000 | NULL |
| 2000-01-04 | 4.00000 | NULL |
| 2000-01-05 | 6.00000 | 4.00000 |
| 2000-01-06 | 8.00000 | 6.00000 |
| 2000-01-07 | 10.00000 | 8.00000 |
| 2000-01-10 | 12.00000 | 10.00000 |
| 2000-01-11 | 14.00000 | 12.00000 |
| 2000-01-12 | 16.00000 | 14.00000 |
| 2000-01-13 | 18.00000 | 16.00000 |
| 2000-01-14 | 20.00000 | 18.00000 |
+------------+----------+----------+
10 rows in set (0.01 sec)
That is what I call a good challenge. My solution first creates a counter for the values and uses it as a table. From it I select everything and join with the same query as a subquery checking the position of the counter on both. Once the query works it just need an inner join with the actual table to do the update. Here it is my solution:
update stock_history tb1
inner join
(
select a.id,
case when a.step < 3 then null
else
(select avg(b.close)
from (
select hh.*,
#stp:=#stp+1 stp
from stock_history hh,
(select #sum:=0, #stp:=0) x
order by hh.dt
limit 17823232
) b
where b.stp >= a.step-2 and b.stp <= a.step
)
end dmal_3
from (select h1.*,
#step:=#step+1 step
from stock_history h1,
(select #sum:=0, #step:=0) x
order by h1.dt
limit 17823232
) a
) x on tb1.id = x.id
set tb1.dmal_3 = x.dmal_3;
I changed some columns names for easiness of my test. Here it is the working SQLFiddle: http://sqlfiddle.com/#!9/e7dc00/1
If you have any doubt, let me know so I can clarify!
Edit
The limit 17823232 clause was added there in the subqueries because I don't know which version of MySql you are in. Depending on it (>= 5.7, not sure exactly) the database optimizer will ignore the internal order by clauses making it not work the way it should. I just chose a random big number usually you can use the maximum allowed.
The only column with different colunm name between your table and mine is the date one which I named dt because date is a reserved word and you should use backticks ( ` ) to use such columns, therefore I will left it as dt in above query.
In the database 'college2' there are 3 TABLES:'student, course & enrolment', and one(1) VIEW:'enrolment_status', which is created using the following command:
CREATE VIEW enrolment_status AS
SELECT code, COUNT(id)
FROM enrolment
GROUP BY code;
Explain command for 'course,enrolment and enrolment_status' results in:
mysql> EXPLAIN course;
+---------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+-------+
| code | char(8) | NO | PRI | NULL | |
| name | varchar(90) | YES | MUL | NULL | |
| max_enrolment | char(2) | YES | | NULL | |
+---------------+-------------+------+-----+---------+-------+
3 rows in set (0.09 sec)
mysql> explain enrolment;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | char(6) | YES | MUL | NULL | |
| code | char(8) | YES | MUL | NULL | |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.02 sec)
mysql> explain enrolment_status;
+-----------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+-------+
| code | char(8) | YES | | NULL | |
| COUNT(id) | bigint(21) | NO | | 0 | |
+-----------+------------+------+-----+---------+-------+
2 rows in set (0.18 sec)
'max_enrolment' column in 'course' TABLE is the maximum allowed # of student for each course, say 10 or 20.
'count(id)' column in 'enrolment_status' VIEW (not table) is actual # of students enrolled in each course.
'id' column in 'enrolment' TABLE is the student id enrolled in a course.
HERE'S MY QUESTION:
I want to have the '# of seats left' which is the difference between 'max_enrolment' column and 'count(id)' column.
'#of seats left' can be a stand alone table or view or a column added to any of the above tables. How can i do this:
I tried many commands including the following,
CREATE VIEW seats_left AS (
SELECT course.code, course.max_enrolment - enrolment_status.count
FROM course, enrolment_status
WHERE course.code = enrolment_status.code);
...which gives me the following error message:
ERROR 1054 (42S22): Unknown column 'enrolment_status.count' in 'field list'
mysql> SELECT*FROM enrolment_status;
+----------+-----------+
| code | COUNT(id) |
+----------+-----------+
| COMP9583 | 7 |
| COMP9585 | 9 |
| COMP9586 | 7 |
| COMP9653 | 7 |
| COMP9654 | 7 |
| COMP9655 | 8 |
| COMP9658 | 7 |
+----------+-----------+
7 rows in set (0.00 sec)
mysql> SELECT code, max_enrolment FROM course;
+----------+---------------+
| code | max_enrolment |
+----------+---------------+
| COMP9583 | 10 |
| COMP9585 | 15 |
| COMP9586 | 15 |
| COMP9653 | 12 |
| COMP9654 | 10 |
| COMP9655 | 12 |
| COMP9658 | 12 |
+----------+---------------+
7 rows in set (0.00 sec)
+----------+---------------------+
| code | max_enrolment - cnt |
+----------+---------------------+
| COMP9583 | 9 |
| COMP9585 | 14 |
| COMP9586 | 14 |
| COMP9653 | 11 |
| COMP9654 | 9 |
| COMP9655 | 11 |
| COMP9658 | 11 |
+----------+---------------------+
7 rows in set (0.09 sec)
Try to use an acronym for in the view.
CREATE VIEW enrolment_status AS
SELECT code, COUNT(id) count
FROM enrolment
GROUP BY code;
Then you should be able to do this:
CREATE VIEW seats_left AS (
SELECT course.code, course.max_enrolment - enrolment_status.count
FROM course, enrolment_status
WHERE course.code = enrolment_status.code);
If you cannot change the view, then you must use the exact same name in the query:
CREATE VIEW seats_left AS (
SELECT course.code, course.max_enrolment - enrolment_status.'count(id)'
FROM course, enrolment_status
WHERE course.code = enrolment_status.code);
Try this:
SELECT b.`code`,max_enrolment - cnt from
(select `code`, cnt from
(select count(1) as cnt,`code` from enrolment_status
GROUP BY `code`) as a) as a
LEFT JOIN
(SELECT code,max_enrolment from course) as b
on a.`code` = b.`code`
You can change left join to right join
I ran a somewhat nonsense query on MySQL, but because its output is the same each time, I'm wondering if someone can help me understand the underlying algorithm.
Here's the table Orders on which we'll execute the query (database taken from here, just in case someone's interested):
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| orderNumber | int(11) | NO | PRI | NULL | |
| orderDate | date | NO | | NULL | |
| requiredDate | date | NO | | NULL | |
| shippedDate | date | YES | | NULL | |
| status | varchar(15) | NO | | NULL | |
| comments | text | YES | | NULL | |
| customerNumber | int(11) | NO | MUL | NULL | |
+----------------+-------------+------+-----+---------+-------+
There are 326 records for now, with the largest orderNumber being 10425.
Now here's the query I ran (basically removed GROUP BY from a sensible query):
mysql> select count(1), orderNumber, status from orders;
+----------+-------------+---------+
| count(1) | orderNumber | status |
+----------+-------------+---------+
| 326 | 10100 | Shipped |
+----------+-------------+---------+
1 row in set (0.00 sec)
So I'm asking for the total number of rows, along with status and orderNumber, which can be just about anything under the given circumstances. But the query always returns orderNumber 10100, even if I log out and run it again.
Is there a predictable answer for this?
There's no predictable answer for which you should use in your design. In general, the DB will return the values of the first row that matches the query. If you want predictability, you should apply an aggregate to every column (e.g. using MIN or MAX to always get smallest/largest value)
I have two tables in mysql server. I use these tables for studing JOIN multiple tables but something appears to be incorrect:
mysql> select * from category;
+-------------+-----------+
| category_id | name |
+-------------+-----------+
| 1 | fruit |
| 2 | vegetable |
+-------------+-----------+
2 rows in set (0.00 sec)
mysql> desc category;
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| category_id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
And:
mysql> select * from goods;
+---------+--------+-------------+------+
| good_id | name | category_id | cost |
+---------+--------+-------------+------+
| 1 | banan | 1 | 1.00 |
| 2 | potato | 2 | 1.00 |
| 3 | peach | 1 | 1.00 |
+---------+--------+-------------+------+
3 rows in set (0.00 sec)
mysql> desc goods;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| good_id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| category_id | int(11) | NO | MUL | NULL | |
| cost | decimal(6,2) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
The second table has foreign key (category_id) and I can join them using INNER JOIN:
mysql> select c.name category, g.name, g.cost from category as c INNER JOIN goods g ON c.category_id = g.category_id;
+-----------+--------+------+
| category | name | cost |
+-----------+--------+------+
| fruit | banan | 1.00 |
| vegetable | potato | 1.00 |
| fruit | peach | 1.00 |
+-----------+--------+------+
3 rows in set (0.00 sec)
I tried to use NATURAL JOIN but it didnt work and it seems I dont know why(((
mysql> select c.name, g.name, g.cost from category as c NATURAL JOIN goods g;
Empty set (0.00 sec)
Could somebody explain why NATURAL JOIN does not work?
I was having the exact same thing happen to me, and my Googling led me to this question. I eventually figured it out, so I figured I'd post my answer here.
This was the culprit:
Instead of specifying a join condition through ON, USING or a WHERE clause, the NATURAL keyword tells the server to match up any column names between the two tables, and automatically use those columns to resolve the join.
Your fruit and category tables both have a column called "name". When SQL tries to join the two, it tries to join all like columns. So thus, category_id==category_id, but name!=name.
Rename your columns tablename_column instead.
I'm trying to calculate maximum simultaneous calls. My query, which I believe to be accurate, takes way too long given ~250,000 rows. The cdrs table looks like this:
+---------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| CallType | varchar(32) | NO | | NULL | |
| StartTime | datetime | NO | MUL | NULL | |
| StopTime | datetime | NO | | NULL | |
| CallDuration | float(10,5) | NO | | NULL | |
| BillDuration | mediumint(8) unsigned | NO | | NULL | |
| CallMinimum | tinyint(3) unsigned | NO | | NULL | |
| CallIncrement | tinyint(3) unsigned | NO | | NULL | |
| BasePrice | float(12,9) | NO | | NULL | |
| CallPrice | float(12,9) | NO | | NULL | |
| TransactionId | varchar(20) | NO | | NULL | |
| CustomerIP | varchar(15) | NO | | NULL | |
| ANI | varchar(20) | NO | | NULL | |
| ANIState | varchar(10) | NO | | NULL | |
| DNIS | varchar(20) | NO | | NULL | |
| LRN | varchar(20) | NO | | NULL | |
| DNISState | varchar(10) | NO | | NULL | |
| DNISLATA | varchar(10) | NO | | NULL | |
| DNISOCN | varchar(10) | NO | | NULL | |
| OrigTier | varchar(10) | NO | | NULL | |
| TermRateDeck | varchar(20) | NO | | NULL | |
+---------------+-----------------------+------+-----+---------+----------------+
I have the following indexes:
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs | 0 | PRIMARY | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | id | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 1 | StartTime | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 2 | StopTime | A | 269622 | NULL | NULL | | BTREE | | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The query I am running is this:
SELECT MAX(cnt) AS max_channels FROM
(SELECT cl1.StartTime, COUNT(*) AS cnt
FROM cdrs cl1
INNER JOIN cdrs cl2
ON cl1.StartTime
BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id)
AS counts;
It seems like I might have to chunk this data for each day and store the results in a separate table like simultaneous_calls.
I'm sure you want to know not only the maximum simultaneous calls, but when that happened.
I would create a table containing the timestamp of every individual minute
CREATE TABLE times (ts DATETIME UNSIGNED AUTO_INCREMENT PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute . . .
Then join that to the calls.
SELECT ts, COUNT(*) AS count FROM times
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime
GROUP BY ts ORDER BY count DESC LIMIT 1;
Here's the result in my test (MySQL 5.6.17 on a Linux VM running on a Macbook Pro):
+---------------------+----------+
| ts | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 | 1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)
This achieves several goals:
Reduces the number of rows examined by two orders of magnitude.
Reduces the execution time from 3 hours+ to about 1 minute.
Also returns the actual timestamp when the highest count was found.
Here's the EXPLAIN for my query:
explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | times | index | PRIMARY | PRIMARY | 5 | NULL | 1440 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cdrs | ALL | starttime | NULL | NULL | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
Notice the figures in the rows column, and compare to the EXPLAIN of your original query. You can estimate the total number of rows examined by multiplying these together (but that gets more complicated if your query is anything other than SIMPLE).
The inline view isn't strictly necessary. (You're right about a lot of time to run the EXPLAIN on the query with the inline view, the EXPLAIN will materialize the inline view (i.e. run the inline view query and populate the derived table), and then give an EXPLAIN on the outer query.
Note that this query will return an equivalent result:
SELECT COUNT(*) AS max_channels
FROM cdrs cl1
JOIN cdrs cl2
ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id
ORDER BY max_channels DESC
LIMIT 1
Though it still has to do all the work, and probably doesn't perform any better; the EXPLAIN should run a lot faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)
The number of rows in the resultset is going to be the number of rows in the table (~250,000 rows), and those are going to need to be sorted, so that's going to be some time there. The bigger issue (my gut is telling me) is that join operation.
I'm wondering if the EXPLAIN (or performance) would be any different if you swapped the cl1 and cl2 in the predicate, i.e.
ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
I'm thinking that, just because I'd be tempted to try a correlated subquery. That's ~250,000 executions, and that's not likely going to be any faster...
SELECT ( SELECT COUNT(*)
FROM cdrs cl2
WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
) AS max_channels
, cl1.StartTime
FROM cdrs cl1
ORDER BY max_channels DESC
LIMIT 11
You could run an EXPLAIN on that, we're still going to see a "Using temporary; Using filesort", and it will also show the "dependent subquery"...
Obviously, adding a predicate on the cl1 table to cut down the number of rows to be returned (for example, checking only the past 15 days); that should speed things up, but it doesn't get you the answer you want.
WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
(None of my musings here are sure-fire answers to your question, or solutions to the performance issue; they're just musings.)