Can someone please explain to me why this:
SELECT
A.id,
A.name,
B.id AS title_id
FROM title_information AS A
JOIN titles B ON B.title_id = A.id
WHERE
A.name LIKE '%testing%'
is considerably slower (6-7 times) than this:
SELECT
A.id,
A.name,
B.id AS title_id
FROM (SELECT id, name FROM title_information) AS A
JOIN titles B ON B.title_id = A.id
WHERE
A.name LIKE '%testing%'
I know it's probably hard to answer this question without knowing full details about the schema and MySQL configuration, but I'm looking for any generic reasons why the first example could be so significantly slower than the second?
Running EXPLAIN gives this:
|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
|| 1 || SIMPLE || B || index || || id || 12 || || 80407 || Using index ||
|| 1 || SIMPLE || A || eq_ref || PRIMARY,id_UNIQUE,Index 4 || PRIMARY || 4 || newsql.B.title_id || 1 || Using where ||
and
|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
|| 1 || PRIMARY || B || index || || id || 12 || || 80407 || Using index ||
|| 1 || PRIMARY || <derived2> || ALL || || || || || 71038 || Using where; Using join buffer ||
|| 2 || DERIVED || title_information || index || || Index 4 || 206 || || 71038 || Using index ||
UPDATE:
A.id and B.id are both PRIMARY KEYs, while A.name is indexed. Both tables have around 50,000 rows (~15MB). The MySQL configuration is pretty much the default one.
Not sure if this helps (or just adds to the confusion, as it does for me), but using a more generic LIKE pattern that is likely to match more rows (e.g. "LIKE '%x%'") makes the first query run considerably faster. On the other hand, using "LIKE '%there are no records matching this%'" makes the second query a lot faster (while the first one struggles).
Can anyone shed some light on what's going on here?
Thank you!
This is speculation (my powers of reading MySQL explain output are weaker than they should be, because I want to see data flow diagrams).
But here is what I think is happening. The first query is saying "Let's go through B and look up the appropriate value in A". It then looks up the appropriate value using the id index, then it needs to fetch the page and compare to name. These accesses are inefficient, because they are not sequential.
The second version appears to recognize the condition on name as being important. It is going through the name index on A and only fetching the matching rows as needed. This is faster, because the data is in the index and few pages are needed for the matching names. The match to B is then pretty simple, with only one row to match.
I am surprised at the performance difference. Usually, derived tables are bad performance-wise, but this is clearly an exception.
Related
Given the table below, I am trying to select groups of records and sum the last column, Class, in each group. The grouping rules are slightly complicated, and rows need to be compared to each other.
|| Seq || Time || Spec || Class ||
|| 1 || 8:05 || 0 || 5 ||
|| 2 || 8:06 || 1 || 5 ||
|| 3 || 8:07 || 2 || 10 ||
|| 4 || 8:08 || 4 || 10 ||
|| 5 || 8:09 || 3 || 5 ||
|| 6 || 8:10 || 2 || 5 ||
|| 7 || 8:11 || 6 || 5 ||
|| 8 || 8:12 || 6 || 15 ||
I need to group records based on the change in value (increase or decrease) in the Spec column. The required change in value is 2. So starting with row 1, the Spec is 0. It doesn’t increase by at least 2 until row 3. This is a valid group and I need to sum the Class field. The expected output is StartTime, StartSpec, EndTime, EndSpec, and TotalClass.
To determine the next group, I need to measure the change in value with the last row used in the previous group. As you can see, row 4 has immediately increased by 2 and so this one row is a valid group.
Expected Output:
|| StartTime || StartSpec || EndTime || EndSpec || TotalClass ||
|| 8:05 || 0 || 8:07 || 2 || 20 ||
|| 8:08 || 4 || 8:08 || 4 || 10 ||
|| 8:09 || 3 || 8:10 || 2 || 10 ||
|| 8:11 || 6 || 8:11 || 6 || 5 ||
This can be done with a few intermediate variables that detect the first and the last row of each group, as illustrated below.
Note that this will "auto-close" the last group if it is not closed yet.
Also note that for use cases like this, an application-level solution might be a more elegant option (as already noted in the comments).
Another option is to compute an explicit group discriminator (i.e. "gid") at data insertion time and store it in the table itself, so that you can then query the data in a standard way, without relying on any variables.
SELECT
    MAX(startTime) as startTime,
    MAX(startSpec) as startSpec,
    MAX(endTime) as endTime,
    MAX(endSpec) as endSpec,
    SUM(class) as totalClass
FROM (
    SELECT
        /* Detect first and last rows in a group (when ordered by "seq") */
        @first as isFirst,
        @last:=(ABS(@prev-spec)>1 OR seq=(SELECT MAX(seq) FROM Groups)) as isLast,
        /* If this is a first row, set "startTime" and "startSpec" */
        IF(@first,time,NULL) as startTime,
        IF(@first,spec,NULL) as startSpec,
        /* If this is a last row, set "endTime" and "endSpec" */
        IF(@last,time,NULL) as endTime,
        IF(@last,spec,NULL) as endSpec,
        /* Start the next group */
        IF(@last,@prev:=spec,NULL) as nextPrev,
        IF(@last,(@gid:=@gid+1)-1,@gid) as gid,
        /* Flip "first" */
        @first:=@last as nextIsFirst,
        /* Row "class" */
        class
    FROM
        /* Declare some variables */
        (SELECT @first:=TRUE,@last:=FALSE,@prev:=0,@gid:=0) init
        CROSS JOIN Groups ORDER BY seq
) labeled GROUP BY gid;
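For comparison, the application-level option mentioned above could look roughly like the following Python sketch. The names `Row` and `group_by_spec_change` are made up for illustration, and the starting baseline of 0 simply mirrors the `@prev:=0` initialization in the query; treat it as an assumption about the intended rules rather than a definitive implementation.

```python
from typing import NamedTuple


class Row(NamedTuple):
    seq: int
    time: str
    spec: int
    klass: int  # the "Class" column ("class" is a Python keyword)


ROWS = [
    Row(1, "8:05", 0, 5), Row(2, "8:06", 1, 5), Row(3, "8:07", 2, 10),
    Row(4, "8:08", 4, 10), Row(5, "8:09", 3, 5), Row(6, "8:10", 2, 5),
    Row(7, "8:11", 6, 5), Row(8, "8:12", 6, 15),
]


def group_by_spec_change(rows, threshold=2, initial_prev=0, auto_close=True):
    """Return (start_time, start_spec, end_time, end_spec, total_class) tuples."""
    groups = []
    prev = initial_prev   # spec of the row that closed the previous group
    start = None          # first row of the group currently being built
    last = None           # most recent row seen
    total = 0
    for row in sorted(rows, key=lambda r: r.seq):
        if start is None:
            start = row
        last = row
        total += row.klass
        # A group ends when spec has moved by at least `threshold`
        # relative to the row that ended the previous group.
        if abs(row.spec - prev) >= threshold:
            groups.append((start.time, start.spec, row.time, row.spec, total))
            prev, start, total = row.spec, None, 0
    # Optionally emit the trailing, still-open group (the "auto-close" case).
    if start is not None and auto_close:
        groups.append((start.time, start.spec, last.time, last.spec, total))
    return groups


for group in group_by_spec_change(ROWS):
    print(group)
```

With `auto_close=False` this returns exactly the four rows of the expected output; with the default it also emits the trailing group for row 8, which is what the query does.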
I have two tables like this
Table1: manager
=======================================
|| Id || MgrName || department ||
=========================================
|| 1 || mgr1 ||human resource ||
|| 2 || mgr2 ||marketing ||
|| 3 || mgr3 ||customer management ||
=========================================
Table2: employee
====================================
|| empid || empname || empmanager||
====================================
|| 1 || abc || mgr1 ||
|| 2 || xyz || mgr1 ||
|| 3 || def || mgr3 ||
=====================================
The thing is, when I delete mgr1 from table 1 (manager), I also want to set empmanager to NULL in the employee table wherever it is mgr1. I don't want to use a trigger.
Please tell me the proper way to design the database so that I avoid this problem.
You should follow the normalization rules to avoid anomalies while performing CRUD operations:
First Normal Form
Second Normal Form
Third Normal Form
Please refer to these links to apply normalization to your tables: link 1, link 2
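One concrete way to get the "set to NULL on delete" behaviour without a trigger is to store the manager's Id in the employee table and declare it as a foreign key with ON DELETE SET NULL. The sketch below uses SQLite through Python's `sqlite3` module only so that it is runnable as-is; the column name `manager_id` is an assumption, and in MySQL the same clause would require a storage engine that enforces foreign keys, such as InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE manager (
        id         INTEGER PRIMARY KEY,
        mgrname    TEXT NOT NULL,
        department TEXT NOT NULL
    );
    CREATE TABLE employee (
        empid      INTEGER PRIMARY KEY,
        empname    TEXT NOT NULL,
        -- reference the manager by key, not by name (hypothetical column)
        manager_id INTEGER REFERENCES manager(id) ON DELETE SET NULL
    );
    INSERT INTO manager VALUES
        (1, 'mgr1', 'human resource'),
        (2, 'mgr2', 'marketing'),
        (3, 'mgr3', 'customer management');
    INSERT INTO employee VALUES
        (1, 'abc', 1),
        (2, 'xyz', 1),
        (3, 'def', 3);
""")

# Deleting the manager clears the reference in every dependent employee row.
conn.execute("DELETE FROM manager WHERE mgrname = 'mgr1'")
rows = conn.execute(
    "SELECT empid, empname, manager_id FROM employee ORDER BY empid"
).fetchall()
print(rows)  # [(1, 'abc', None), (2, 'xyz', None), (3, 'def', 3)]
```

Referencing the manager by Id rather than by name also means a manager can be renamed without touching the employee table.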
I have 2 tables:
table 1:
|| *handtool_id* || *maintenance_interval_value* || *unit_unit_id* || *handtool_last_date_of_maintenance* || *handtool_next_date_of_maintenance* ||
|| 1 || 1 || 5 || 2014-11-07 || ||
|| 2 || 1 || 6 || 2014-11-07 || ||
|| 3 || 4 || 4 || 2014-11-07 || ||
table 2:
|| *unit_id* || *unit_name* || *unit_value* || *unit_parent_id* ||
|| 1 || Minute || 1 || 1 ||
|| 2 || Hour || 60 || 1 ||
|| 3 || Day || 1440 || 1 ||
|| 4 || Week || 10080 || 1 ||
|| 5 || Month || 32767 || 1 ||
|| 6 || Year || 525949 || 1 ||
What is the right syntax for calculating the handtool_next_date_of_maintenance from maintenance_interval_value and from unit_unit_id? Thank you
I have to say, it's wrong, and very confusing, to change your question like this. You should roll back this question to the one Andrew Jones answered on Nov 3, accept and upvote his answer, and then ask a new question.
That said, this would appear to get you something like what you're after (although how you arrived at the figures of 32767 and 525949 is beyond me!):
SELECT *
, h.handtool_last_date_of_maintenance
+ INTERVAL h.maintenance_interval_value
* u.unit_value MINUTE x
FROM handtools h
JOIN units u
ON u.unit_id = h.unit_unit_id;
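To see what that minute-based arithmetic produces for the sample rows, here is a small Python sketch of the same calculation. The helper name `next_maintenance` and the `UNIT_MINUTES` mapping are made up for illustration; the values are copied from table 2.

```python
from datetime import datetime, timedelta

# unit_id -> unit_value (minutes), copied from table 2
UNIT_MINUTES = {1: 1, 2: 60, 3: 1440, 4: 10080, 5: 32767, 6: 525949}


def next_maintenance(last_date, interval_value, unit_id):
    """last_date + interval_value * unit_value minutes, as in the query."""
    last = datetime.strptime(last_date, "%Y-%m-%d")
    return last + timedelta(minutes=interval_value * UNIT_MINUTES[unit_id])


print(next_maintenance("2014-11-07", 1, 5))  # 2014-11-29 18:07:00
print(next_maintenance("2014-11-07", 1, 6))  # 2015-11-07 05:49:00
print(next_maintenance("2014-11-07", 4, 4))  # 2014-12-05 00:00:00
```

Note that a "Month" of 32767 minutes is only about 22.75 days. 32767 is also the largest value a signed 16-bit integer can hold, so the stored figure may have been clipped by the column type; that is only a guess.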
Whenever you insert to B, you want to insert into A. This is a good use of a MySQL trigger. I'm assuming an auto increment for web_content_id.
DELIMITER //
CREATE TRIGGER new_language_id
AFTER INSERT ON B
FOR EACH ROW
BEGIN
    INSERT INTO A (web_content_const, i18n_language_codes_i18n_language_codes_id)
    VALUES ('SERVICES_HEADING', @i18n_language_codes_id),
           ('SERVICES_MAIN_TEXT', @i18n_language_codes_id),
           ('SERVICES_1_HEADING', @i18n_language_codes_id),
           ('SERVICES_1_TEXT', @i18n_language_codes_id);
END;//
DELIMITER ;
|| Name || day || ActivityDate || TimeIn || TimeOut ||
|| Ade || 20 || 2013-08-20 || 10:06:09 || 18:21:03 ||
|| Ade || 21 || 2013-08-27 || 11:00:34 || 18:06:56 ||
|| Ade || 22 || 2013-08-28 || 09:56:29 || 17:59:56 ||
This is my query :
select
tot=sum(DATEDIFF(hh ,TimeIn ,TimeOut )) as TotalHourAndMinute
from report
And error :
1582 - Incorrect parameter count in the call to native function
'DATEDIFF'
This is my table in the database.
I don't know how to get the total hours, i.e. TimeOut - TimeIn.
FYI, I have a lot of data in this table, not only these 3 rows.
I hope that is clear.
|| Name || day || ActivityDate || TimeIn || TimeOut || TotalHourAndMinute ||
|| Ade || 20 || 2013-08-20 || 10:00:00 || 18:30:00 || 8.5 ||
|| Ade || 21 || 2013-08-27 || 11:00:34 || 18:06:56 || 7.something ||
|| Ade || 22 || 2013-08-28 || 09:56:29 || 17:59:56 || 7.something ||
I want the result to look like this.
UPDATE
Well, if you're using MySQL the correct format for the function is this:
DATEDIFF(expr1,expr2)
More information about DATEDIFF, here. Also, all available time functions for MySQL are here.
But if you want to see the difference in hours between two times, use TIMEDIFF.
TIMEDIFF(expr1,expr2)
And documentation about TIMEDIFF here.
But strictly for your case, you should write your query like this:
SELECT SUM(HOUR(TIMEDIFF(TimeOut, TimeIn))) AS TotalHourAndMinute
FROM report
UPDATE:
Now, after updating your question I understand what you want.
The query you need to use is this:
SELECT
NAME,
DAY,
ActivityDate,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(TimeOut, TimeIn)))) as TotalHourAndMinute
FROM REPORT
WHERE (TimeOut IS NOT NULL) AND (TimeIn IS NOT NULL)
GROUP BY NAME, DAY, ActivityDate
use:
SELECT HOUR(TIMEDIFF(TimeOut,TimeIn)) AS hour from report;
For more information visit this site.
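SUM() does not add TIME values as durations; it treats each one as a plain number (08:14:54 becomes 81454). Convert each difference to seconds with TIME_TO_SEC, sum the seconds, and convert the total back to a time with SEC_TO_TIME. This gives you:
SELECT
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(TimeOut, TimeIn)))) AS TotalHourAndMinute
FROM
report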
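As a sanity check on the arithmetic, the same total can be worked out outside the database. This is a minimal Python sketch using the three sample rows from the question; the helper name `worked` is made up for illustration, and the decimal-hours figure corresponds to the "8.5" style of output the question asks for.

```python
from datetime import datetime, timedelta

# (TimeIn, TimeOut) pairs from the three sample rows in the question
rows = [
    ("10:06:09", "18:21:03"),
    ("11:00:34", "18:06:56"),
    ("09:56:29", "17:59:56"),
]


def worked(time_in, time_out):
    """Duration between two same-day clock times."""
    fmt = "%H:%M:%S"
    return datetime.strptime(time_out, fmt) - datetime.strptime(time_in, fmt)


total = sum((worked(i, o) for i, o in rows), timedelta())
total_seconds = int(total.total_seconds())

hours, rem = divmod(total_seconds, 3600)
minutes, seconds = divmod(rem, 60)
formatted = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
decimal_hours = round(total_seconds / 3600, 2)

print(total_seconds)   # 84283
print(formatted)       # 23:24:43
print(decimal_hours)   # 23.41
```

Dividing the seconds by 3600 is also how the decimal form would be obtained per row: 10:00:00 to 18:30:00 is 30600 seconds, i.e. 8.5 hours.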
I'm fairly new to queries that involve variable declarations in MySQL. I have seen various styles and I'm not fully clear on what they actually do, so I have some questions.
1)
set @row:=0;
SELECT name, @row:=@row + 1 AS rownum
FROM animal
2)
SELECT name, @row:=@row + 1 AS rownum
FROM (SELECT @row:= 0) c, animal
Both return the same:
name rownum
|| cat || 1 ||
|| cat || 2 ||
|| dog || 3 ||
|| dog || 4 ||
|| dog || 5 ||
|| ant || 6 ||
What are the differences in the above two queries and which of the two to adopt as to their scope, efficiency, coding habit, use-cases?
3) Now if I do this:
set @row:=0;
SELECT name, @row:=@row + 1 AS rownum
FROM (SELECT @row:= 123) c, animal
I get
name rownum
|| cat || 124 ||
|| cat || 125 ||
|| dog || 126 ||
|| dog || 127 ||
|| dog || 128 ||
|| ant || 129 ||
So doesn't that mean that the inner variable initialization overrides the outer initialization and leaves the latter redundant (and hence it's always better practice to initialize in a SELECT)?
4) If I merely do:
SELECT name, @row:=@row + 1 AS rownum
FROM animal
I get
name rownum
|| cat || NULL ||
|| cat || NULL ||
|| dog || NULL ||
|| dog || NULL ||
|| dog || NULL ||
|| ant || NULL ||
I can understand that, since row isn't initialized. But if I run any of the other queries first (maybe the variable row is getting initialized?), I see that the row variable is incremented every time I run the above query. That is, it gives me this result on the first run:
name rownum
|| cat || 1 ||
|| cat || 2 ||
|| dog || 3 ||
|| dog || 4 ||
|| dog || 5 ||
|| ant || 6 ||
and then, when re-run, it yields
name rownum
|| cat || 7 ||
|| cat || 8 ||
|| dog || 9 ||
|| dog || 10 ||
|| dog || 11 ||
|| ant || 12 ||
So is row being stored somewhere? And what is its scope and lifespan?
5) If I have query like this:
SELECT (CASE WHEN @name <> name THEN @row:=1 ELSE @row:=@row + 1 END) AS rownum,
@name:=name AS name
FROM animal
This always yields the right result:
rownum name
|| 1 || cat ||
|| 2 || cat ||
|| 1 || dog ||
|| 2 || dog ||
|| 3 || dog ||
|| 1 || ant ||
So doesn't that mean it's not always necessary to initialize a variable at the top or in a SELECT, depending on the query?
Make sure to read the manual section on user variables.
What are the differences in the above two queries and which of the two to adopt as to their scope, efficiency, coding habit, use-cases?
Query 1) uses multiple statements. It can therefore rely on the order in which these statements are executed, ensuring that the variable is set before it gets incremented.
Query 2) on the other hand does the initialization in a nested subquery. This turns the whole thing into a single query. You don't risk forgetting the initialization. But the code relies more heavily on the internal workings of the mysql server, particularly the fact that it will execute the subquery before it starts computing results for the outer query.
So doesn't that mean that the inner variable initialization overrides the outer initialization and leaves the latter redundant (and hence it's always better practice to initialize in a SELECT)?
This is not about inner and outer, but about sequential order: the subquery is executed after the SET, so it will simply overwrite the old value.
So is row being stored somewhere? And what is its scope and lifespan?
User variables are local to the server connection. So any other process will be unaffected by the setting. Even the same process might maintain multiple connections, with independent settings of user variables. Once a connection is closed, all variable settings are lost.
So doesn't that mean it's not always necessary to initialize a variable at the top or in a SELECT, depending on the query?
Quoting from the manual:
If you refer to a variable that has not been initialized, it has a value of NULL and a type of string.
So you can use a variable before it is initialized, but you have to be careful that you can actually deal with the resulting NULL value in a reasonable way. Note however that your query 5) suffers from another problem explicitly stated in the manual:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server. In SELECT @a, @a:=@a+1, ..., you might think that MySQL will evaluate @a first and then do an assignment second. However, changing the statement (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may cause MySQL to select an execution plan with a different order of evaluation.
So in your case, the @name:=name part could get executed before the @name <> name check, causing all your rownum values to be the same. So even if it does work for now, there are no guarantees that it will work in the future.
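To make the two evaluation orders concrete, here is a rough Python simulation of query 5). It is only a model of the logic, not of what any particular MySQL version does: `sql_ne`, `sql_add` and `number_rows` are made-up helpers, and `None` stands in for NULL (a comparison or addition involving NULL yields NULL, and CASE WHEN treats NULL as not true).

```python
def sql_ne(a, b):
    """SQL-style <>: the result is NULL (None) if either operand is NULL."""
    return None if a is None or b is None else a != b


def sql_add(a, b):
    """SQL-style +: the result is NULL (None) if either operand is NULL."""
    return None if a is None or b is None else a + b


def number_rows(names, assign_first, name_var=None, row_var=None):
    """Model of query 5 with the assignment evaluated before or after the CASE."""
    out = []
    for name in names:
        if assign_first:
            name_var = name                # @name:=name evaluated first
        if sql_ne(name_var, name):         # NULL counts as "not true"
            row_var = 1
        else:
            row_var = sql_add(row_var, 1)
        name_var = name                    # @name:=name evaluated second
        out.append(row_var)
    return out


names = ["cat", "cat", "dog", "dog", "dog", "ant"]

# Fresh connection, comparison evaluated first: the first group is NULL.
print(number_rows(names, assign_first=False))
# -> [None, None, 1, 2, 3, 1]

# Fresh connection, assignment evaluated first: every rownum is NULL.
print(number_rows(names, assign_first=True))
# -> [None, None, None, None, None, None]

# Variables initialized beforehand, comparison evaluated first.
print(number_rows(names, assign_first=False, name_var="", row_var=0))
# -> [1, 2, 1, 2, 3, 1]
```

In this model the "right" result only appears when both variables already hold non-NULL values and the comparison happens to run before the assignment, which would be consistent with the query seeming to work in a session where earlier statements had already set them.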
Note that I've been very sceptical about using user variables in this fashion. I've already quoted the above warning from the manual in comments on several answers. I've also asked questions like the one about Guarantees when using user variables to number rows. Other users are more pragmatic, and therefore more willing to use code that appears to work without express guarantees that things will continue to work as intended.