We assume that there is no primary key defined for a table T. In that case, how does one count all the rows in T quickly/efficiently for these databases - Oracle 11g, MySQL, MS SQL Server?
It seems that count(*) can be slow and count(column_name) can be inaccurate. The following seems to be the fastest and most reliable way to do it:
select count(rowid) from MySchema.TableInMySchema;
Can you tell me if the above statement also has any shortcomings? If it is good, then do we have similar statements for MySQL and MS SQL Server?
Thanks in advance.
Source -
http://www.thewellroundedgeek.com/2007/09/most-people-use-oracle-count-function.html
count(column_name) is not inaccurate, it's simply something completely different from count(*).
The SQL standard defines count(column_name) as equivalent to count(*) where column_name IS NOT NULL. So the result is bound to be different if column_name is nullable.
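A minimal illustration of that difference, using a throwaway one-column table:

create table t (x integer);      -- x is nullable
insert into t values (1);
insert into t values (null);

select count(*) from t;          -- returns 2: counts rows
select count(x) from t;          -- returns 1: the NULL is not counted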
In Oracle (and possibly other DBMS as well), count(*) will use an available index on a not null column to count the rows (e.g. a PK index). So it will be just as fast as count(rowid).
Additionally there is nothing similar to the rowid in SQL Server or MySQL (in PostgreSQL it would be ctid).
Do use count(*). It's the best option to get the row count. Let the DBMS do any optimization in the background if adequate indexes are available.
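As an aside on the MS SQL Server part of the question: if an exact, transactionally consistent count is not required, SQL Server can return an approximate row count from table metadata without scanning anything. A sketch (the count can lag behind in-flight changes):

SELECT SUM(p.rows) AS approx_row_count
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('MySchema.TableInMySchema')
  AND p.index_id IN (0, 1);   -- 0 = heap, 1 = clustered index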
Edit
A quick demo on how Oracle automatically uses an index if available and how that reduces the amount of work done by the database:
The setup of the test table:
create table foo (id integer not null, c1 varchar(2000), c2 varchar(2000));
insert into foo (id, c1, c2)
select lvl, c1, c1 from
(
select level as lvl, dbms_random.string('A', 2000) as c1
from dual
connect by level < 10000
);
That generates 9,999 rows (level < 10000), with each row filling up some space in order to make sure the table has a realistic size.
Now in SQL*Plus I run the following:
SQL> set autotrace traceonly explain statistics;
SQL> select count(*) from foo;
Execution Plan
----------------------------------------------------------
Plan hash value: 1342139204
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2740 (1)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| FOO | 9999 | 2740 (1)| 00:00:33 |
-------------------------------------------------------------------
Statistics
----------------------------------------------------------
181 recursive calls
0 db block gets
10130 consistent gets
0 physical reads
0 redo size
430 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
As you can see, a full table scan is done on the table, which requires 10130 "IO Operations" (I know that that is not the right term, but for the sake of the demo it should be a good enough explanation for someone who has never seen this before).
Now I create an index on that column and run the count(*) again:
SQL> create index i1 on foo (id);
Index created.
SQL> select count(*) from foo;
Execution Plan
----------------------------------------------------------
Plan hash value: 129980005
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| I1 | 9999 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
27 consistent gets
21 physical reads
0 redo size
430 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
As you can see, Oracle did use the index on the (not null!) column and the amount of IO went drastically down (from 10130 to 27 - not something I'd call "grossly inefficient").
The "physical reads" stem from the fact that the index was just created and was not yet in the cache.
I would expect other DBMS to apply the same optimizations.
In Oracle, COUNT(*) is the most efficient. Realistically, COUNT(rowid), COUNT(1), or COUNT('fuzzy bunny') are likely to be equally efficient. But if there is a difference, COUNT(*) will be more efficient.
I always use SELECT COUNT(1) FROM anything; instead of the asterisk...
Some people are of the opinion that MySQL uses the asterisk to invoke the query optimizer, and skips any optimization when "1" is used as a static scalar...
IMHO this is straightforward, because you don't use any variable and it's clear that you only count all rows.
Related
I have a set of conditions in my where clause like
WHERE
d.attribute3 = 'abcd*'
AND x.STATUS != 'P'
AND x.STATUS != 'J'
AND x.STATUS != 'X'
AND x.STATUS != 'S'
AND x.STATUS != 'D'
AND CURRENT_TIMESTAMP - 1 < x.CREATION_TIMESTAMP
Which of these conditions will be executed first? I am using Oracle.
Will I get these details in my execution plan?
(I do not have the authority to do that in the db here, else I would have tried)
Are you sure you "don't have the authority" to see an execution plan? What about using AUTOTRACE?
SQL> set autotrace on
SQL> select * from emp
2 join dept on dept.deptno = emp.deptno
3 where emp.ename like 'K%'
4 and dept.loc like 'l%'
5 /
no rows selected
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 4 (0)|
| 1 | NESTED LOOPS | | 1 | 62 | 4 (0)|
|* 2 | TABLE ACCESS FULL | EMP | 1 | 42 | 3 (0)|
|* 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)|
|* 4 | INDEX UNIQUE SCAN | SYS_C0042912 | 1 | | 0 (0)|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("EMP"."ENAME" LIKE 'K%' AND "EMP"."DEPTNO" IS NOT NULL)
3 - filter("DEPT"."LOC" LIKE 'l%')
4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
As you can see, that gives quite a lot of detail about how the query will be executed. It tells me that:
the condition "emp.ename like 'K%'" will be applied first, on the full scan of EMP
then the matching DEPT records will be selected via the index on dept.deptno (via the NESTED LOOPS method)
finally the filter "dept.loc like 'l%'" will be applied.
This order of application has nothing to do with the way the predicates are ordered in the WHERE clause, as we can show with this re-ordered query:
SQL> select * from emp
2 join dept on dept.deptno = emp.deptno
3 where dept.loc like 'l%'
4 and emp.ename like 'K%';
no rows selected
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 62 | 4 (0)|
| 1 | NESTED LOOPS | | 1 | 62 | 4 (0)|
|* 2 | TABLE ACCESS FULL | EMP | 1 | 42 | 3 (0)|
|* 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)|
|* 4 | INDEX UNIQUE SCAN | SYS_C0042912 | 1 | | 0 (0)|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("EMP"."ENAME" LIKE 'K%' AND "EMP"."DEPTNO" IS NOT NULL)
3 - filter("DEPT"."LOC" LIKE 'l%')
4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
The database will decide what order to execute the conditions in.
Normally (but not always) it will use an index first where possible.
As has been said, looking at the execution plan will give you some information. However, unless you use the plan stability feature, you can't rely on the execution plan always remaining the same.
In the case of the query you posted, it doesn't look like the order of evaluation will change the logic in any way, so I guess what you are thinking about is efficiency. It's fairly likely that the Oracle optimizer will choose a plan that is efficient.
There are tricks you can do to encourage a particular ordering if you want to compare the performance with the base query. Say for instance that you wanted the timestamp condition to be executed first. You could do this:
WITH subset AS
( SELECT /*+ materialize */ x.*
  FROM my_table x
  WHERE CURRENT_TIMESTAMP - 1 < x.CREATION_TIMESTAMP
)
SELECT *
FROM subset
WHERE
subset.attribute3 = 'abcd*'
AND subset.STATUS != 'P'
AND subset.STATUS != 'J'
AND subset.STATUS != 'X'
AND subset.STATUS != 'S'
AND subset.STATUS != 'D'
The "materialize" hint should cause the optimizer to execute the inline query first, then scan that result set for the other conditions.
I'm not advising you do this as a general habit. In most cases just writing the simple query will lead to the best execution plans.
To add to the other comments on execution plans: under the CPU-based costing model introduced in 9i and used by default in 10g+, Oracle will also assess which predicate evaluation order results in lower computational cost, even if that does not affect the table access order and method. If executing one predicate before another results in fewer predicate calculations being executed, then that optimisation can be applied.
See this article for more details: http://www.oracle.com/technology/pub/articles/lewis_cbo.html
Furthermore, Oracle doesn't even have to execute predicates where comparison with a check constraint or partition definitions indicates that no rows would be returned anyway.
Complex stuff.
Finally, relational database theory says that you can never depend on the order of execution of the query clauses, so best not to try. As others have said, the cost-based optimizer tries to choose what it thinks is best, but even viewing explain plan won't guarantee the actual order that's used. Explain plan just tells you what the CBO recommends, but that's still not 100%.
Maybe if you explain why you're trying to do this, someone could suggest a plan?
Tricky question. I just faced the same dilemma. I need to call a function within a query. The function itself runs another query, so you understand how that affects performance in general. But in most of our cases, the function wouldn't be called so often if the rest of the conditions were evaluated first.
Well, I thought it would be useful to post another article on the topic here.
The following quote is copied from Donald Burleson's site (http://www.dba-oracle.com/t_where_clause.htm) .
The ordered_predicates hint is specified in the Oracle WHERE clause of
a query and is used to specify the order in which Boolean predicates
should be evaluated.
In the absence of ordered_predicates, Oracle uses
the following steps to evaluate the order of SQL predicates:
Subqueries are evaluated before the outer Boolean conditions in the WHERE clause.
All Boolean conditions without built-in functions or subqueries are evaluated in reverse from the order they are found in the WHERE clause, with the last predicate being evaluated first.
Boolean predicates with built-in functions of each predicate are evaluated in increasing order of their estimated evaluation costs.
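For illustration, a hedged sketch of how that hint is placed (unusually, it goes inside the WHERE clause rather than after SELECT; table and column names follow the earlier question, and note that Oracle deprecated this hint in 10g):

SELECT *
FROM my_table x
WHERE /*+ ordered_predicates */
      CURRENT_TIMESTAMP - 1 < x.CREATION_TIMESTAMP
  AND x.STATUS != 'P';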
I have a table which contains a task list for persons. The columns are as follows:
+---------+-----------+-------------------+------------+---------------------+
| task_id | person_id | task_name | status | due_date_time |
+---------+-----------+-------------------+------------+---------------------+
| 1 | 111 | walk 20 min daily | INCOMPLETE | 2017-04-13 17:20:23 |
| 2 | 111 | brisk walk 30 min | COMPLETE | 2017-03-14 20:20:54 |
| 3 | 111 | take medication | COMPLETE | 2017-04-20 15:15:23 |
| 4 | 222 | sport | COMPLETE | 2017-03-18 14:45:10 |
+---------+-----------+-------------------+------------+---------------------+
I want to find out the monthly compliance as a percentage (completed tasks / total tasks * 100) of each person, like:
+---------------+-----------+------------+------------+
| compliance_id | person_id | compliance | month |
+---------------+-----------+------------+------------+
| 1 | 111 | 100 | 2017-03-01 |
| 2 | 111 | 50 | 2017-04-01 |
| 3 | 222 | 100 | 2017-03-01 |
+---------------+-----------+------------+------------+
Here person_id 111 has 1 task in month 2017-03, whose status is COMPLETE; as 1 out of 1 tasks was completed in March, the compliance is 100%.
Currently, I am using a separate table which stores this compliance, but I have to recalculate the compliance and update that table every time a task's status changes.
I have also tried creating a view, but it's taking too much time to execute: almost 0.5 seconds for 1 million records.
CREATE VIEW `person_compliance_view` AS
SELECT
`t`.`person_id`,
CAST((`t`.`due_date_time` - INTERVAL (DAYOFMONTH(`t`.`due_date_time`) - 1) DAY)
AS DATE) AS `month`,
COUNT(`t`.`status`) AS `total_count`,
COUNT((CASE
WHEN (`t`.`status` = 'COMPLETE') THEN 1
END)) AS `completed_count`,
CAST(((COUNT((CASE
WHEN (`t`.`status` = 'COMPLETE') THEN 1
END)) / COUNT(`t`.`status`)) * 100)
AS DECIMAL (10 , 2 )) AS `compliance`
FROM
`task` `t`
WHERE
((`t`.`isDeleted` = 0)
AND (`t`.`due_date_time` < NOW()))
GROUP BY `t`.`person_id` , EXTRACT(YEAR_MONTH FROM `t`.`due_date_time`);
Is there any optimized way to do it?
The first question to consider is whether the view can be optimized to give the required performance. This may mean making some changes to the underlying tables and data structure. For example, you might want indexes and you should check query plans to see where they would be most effective.
Other possible changes which would improve efficiency include adding an extra column "year_month" to the base table, which you could populate via a trigger. Another possibility would be to move all the deleted tasks to an 'archive' table to give the view less data to search through.
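As an alternative to the trigger, if you're on MySQL 5.7+, a stored generated column computes the extra column automatically and can be indexed. A sketch (the column and index names are mine, and YEAR_MONTH needs backticks because it is a reserved word):

ALTER TABLE `task`
    ADD COLUMN `year_month` CHAR(7)
        GENERATED ALWAYS AS (DATE_FORMAT(`due_date_time`, '%Y-%m')) STORED;

CREATE INDEX `idx_task_person_ym` ON `task` (`person_id`, `year_month`);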
Whatever you do, a view will always perform worse than a table (assuming the table has relevant indexes). So depending on your needs you may find you need to use a table. That doesn't mean you should junk your view entirely. For example, if a daily refresh of your table is sufficient, you could use your view to help:
truncate table compliance;
insert into compliance select * from compliance_view;
Truncate is more efficient than delete, but you can't use a rollback, so you might prefer to use delete and top-and-tail with START TRANSACTION; ... COMMIT;. I've never created scheduled jobs in MySQL, but if you need help, this looks like a good starting point: here
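A sketch of such a scheduled job using MySQL's event scheduler (assumes the scheduler is enabled via SET GLOBAL event_scheduler = ON; the event name is mine):

DELIMITER //
CREATE EVENT refresh_compliance
    ON SCHEDULE EVERY 1 DAY
    DO
    BEGIN
        DELETE FROM compliance;
        INSERT INTO compliance SELECT * FROM person_compliance_view;
    END//
DELIMITER ;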
If daily isn't often enough, you could schedule this to run more often than daily, but better options would be triggers and/or "partial refreshes" (my term, I've no idea if there is a technical term for the idea).
A perfectly written trigger would spot any relevant insert/update/delete and then insert/update/delete the related records in the compliance table. The logic is a little daunting, and I won't attempt it here. An easier option would be a "partial refresh" called within a trigger. The trigger would spot the user targeted by the change, delete only the records from compliance which are related to that user, and then insert from your compliance_view the records relating to that user. You should be able to put that into a stored procedure which is called by the trigger.
Update expanding on the options (if a view just won't do):
Option 1: Daily full (or more frequent) refresh via a schedule
You'd want code like this executed (at least) daily.
truncate table compliance;
insert into compliance select * from compliance_view;
Option 2: Partial refresh via trigger
I don't work with triggers often, so can't recall syntax, but the logic should be as follows (not actual code, just pseudo-code)
AFTER INSERT -- you may need one for each of INSERT / UPDATE / DELETE
FOR EACH ROW -- or if there are multiple rows and you can trigger only on the last one to be changed, that would be better
DELETE FROM compliance
WHERE person_id = INSERTED.person_id
INSERT INTO compliance select * from compliance_view where person_id = INSERTED.person_id
END
Option 3: Smart update via trigger
This would be similar to option 2, but instead of deleting all the rows from compliance that relate to the relevant person_id and creating them from scratch, you'd work out which ones to update, update them, and decide whether any should be added or deleted. The logic is a little involved, and I'm not going to attempt it here.
Personally, I'd be most tempted by Option 2, but you'd need to combine it with option 1, since the data goes stale due to the use of now().
Here's a similar way of writing the same thing...
Views are of very limited benefit in MySQL, and I think should generally be avoided.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(task_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,person_id INT NOT NULL
,task_name VARCHAR(30) NOT NULL
,status ENUM('INCOMPLETE','COMPLETE') NOT NULL
,due_date_time DATETIME NOT NULL
);
INSERT INTO my_table VALUES
(1,111,'walk 20 min daily','INCOMPLETE','2017-04-13 17:20:23'),
(2,111,'brisk walk 30 min','COMPLETE','2017-03-14 20:20:54'),
(3,111,'take medication','COMPLETE','2017-04-20 15:15:23'),
(4,222,'sport','COMPLETE','2017-03-18 14:45:10');
SELECT person_id
, DATE_FORMAT(due_date_time,'%Y-%m') yearmonth
, SUM(status = 'complete')/COUNT(*) x
FROM my_table
GROUP
BY person_id
, yearmonth;
person_id yearmonth x
111 2017-03 1.0
111 2017-04 0.5
222 2017-03 1.0
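If the percentage form from the question is needed, the same aggregate can simply be scaled by 100 (a trivial variant of the query above):

SELECT person_id
     , DATE_FORMAT(due_date_time,'%Y-%m') yearmonth
     , SUM(status = 'COMPLETE') / COUNT(*) * 100 AS compliance
  FROM my_table
 GROUP
    BY person_id
     , yearmonth;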
I have noticed a particular performance issue that I am unsure on how to deal with.
I am in the process of migrating a web application from one server to another with very similar specifications. The new server typically outperforms the old server to be clear.
The old server is running MySQL 5.6.35
The new server is running MySQL 5.7.17
Both the new and old server have virtually identical MySQL configurations.
Both the new and old server are running the exact same database perfectly duplicated.
The web application in question is Magento 1.9.3.2.
In Magento, the following function
Mage_Catalog_Model_Category::getChildrenCategories()
is intended to list all the immediate children categories given a certain category.
In my case, this function bubbles down eventually to this query:
SELECT `main_table`.`entity_id`
, main_table.`name`
, main_table.`path`
, `main_table`.`is_active`
, `main_table`.`is_anchor`
, `url_rewrite`.`request_path`
FROM `catalog_category_flat_store_1` AS `main_table`
LEFT JOIN `core_url_rewrite` AS `url_rewrite`
ON url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system=1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path LIKE '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC;
While the structure of this query is the same for any Magento installation, there will obviously be slight discrepancies in values from one Magento installation to another, and in which category the function is looking at.
My catalog_category_flat_store_1 table has 214 rows.
My url_rewrite table has 1,734,316 rows.
This query, when executed on its own directly into MySQL performs very differently between MySQL versions.
I am using SQLyog to profile this query.
In MySQL 5.6, the above query performs in 0.04 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/JNKEpy/
In MySQL 5.7, the above query performs in 1.952 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/gWMgKZ/
As you can see, the same query on almost the exact same setup is virtually 2 seconds slower, and I am unsure as to why.
For some reason, MySQL 5.7 does not want to use the table index to help produce the result set.
Anyone out there with more experience/knowledge can explain what is going on here and how to go about fixing it?
I believe the issue has something to do with the way the MySQL 5.7 optimizer works. For some reason, it appears to think that a full table scan is the way to go. I can drastically improve the query performance by setting max_seeks_for_key very low (like 100) or dropping range_optimizer_max_mem_size really low to force it to throw a warning.
Doing either of these increases the query speed by almost 10x, down to 0.2 sec; however, this is still magnitudes slower than MySQL 5.6, which executes in 0.04 seconds, and I don't think either of these is a good idea as I'm not sure if there would be other implications.
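For reference, the tweaks mentioned above look like this (session scope shown; these are blunt instruments affecting every query in the session, so treat them as diagnostics rather than a fix):

SET SESSION max_seeks_for_key = 100;
SET SESSION range_optimizer_max_mem_size = 100;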
It is also very difficult to modify the query as it is generated by the Magento framework and would require customisation of the Magento codebase, which I'd like to avoid. I'm also not even sure if it is the only query that is affected.
I have included the minor versions for my MySQL installations. I am now attempting to update MySQL 5.7.17 to 5.7.18 (the latest build) to see if there is any update to the performance.
After upgrading to MySQL 5.7.18 I saw no improvement. In order to bring the system back to a stable high performing state, we decided to downgrade back to MySQL 5.6.30. After doing the downgrade we saw an instant improvement.
The above query executed in MySQL 5.6.30 on the NEW server executed in 0.036 seconds.
Wow! This is the first time I have seen something useful from profiling. Dynamically creating an index is a new optimization feature from Oracle. But it looks like that was not the best plan for this case.
First, I will recommend that you file a bug at http://bugs.mysql.com -- they don't like to have regressions, especially this egregious. If possible, provide EXPLAIN FORMAT=JSON SELECT... and "Optimizer trace". (I do not accept tweaking obscure tunables as an acceptable answer, but thanks for discovering them.)
Back to helping you...
If you don't need LEFT, don't use it. It returns NULLs when there are no matching rows in the 'right' table; will that happen in your case?
Please provide SHOW CREATE TABLE. Meanwhile, I will guess that you don't have INDEX(include_in_menu, is_active, path). The first two can be in either order; path needs to be last.
And INDEX(category_id, is_system, store_id, id_path) with id_path last.
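A hedged sketch of those suggestions as DDL (the index names are mine; confirm column types and existing indexes against SHOW CREATE TABLE before applying):

ALTER TABLE catalog_category_flat_store_1
    ADD INDEX idx_menu_active_path (include_in_menu, is_active, path);

ALTER TABLE core_url_rewrite
    ADD INDEX idx_cat_sys_store_path (category_id, is_system, store_id, id_path);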
Your query seems to have a pattern that works well for turning into a subquery:
(Note: this even preserves the semantics of LEFT.)
SELECT `main_table`.`entity_id` , main_table.`name` , main_table.`path` ,
`main_table`.`is_active` , `main_table`.`is_anchor` ,
( SELECT `request_path`
FROM url_rewrite
WHERE url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system = 1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
) as request_path
FROM `catalog_category_flat_store_1` AS `main_table`
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path like '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC
LIMIT 0, 1000
(The suggested indexes apply here, too.)
This is not an answer, only a comment for @Nigel Ren.
Here you can see that LIKE also uses an index:
mysql> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,00 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | range | vals | vals | 515 | NULL | 3 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>
A sample with LEFT() for comparison:
mysql> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,01 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | index | NULL | vals | 515 | NULL | 5 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>
I am running a query to retrieve some game levels from a MySQL database. The query itself takes around 0.00025 seconds to execute on a database that contains 40 level strings. I thought that was satisfactory, until I got a message from the website host telling me to optimise the below-mentioned query, or the script will be removed since it is putting a lot of strain on their servers.
I tried optimising by using EXPLAIN and EXPLAIN EXTENDED and adjusting the columns accordingly (adding indexes), but I am always getting the same performance. What I also noticed is that MySQL didn't use indexes where they were available but instead did a full-table scan.
Results from EXPLAIN EXTENDED:
table    id  select_type  type  possible_keys   key      key_len  ref            rows  Extra
users    1   SIMPLE       ALL   PRIMARY,id      NULL     NULL     NULL           7     Using temporary; Using filesort
AllTime  1   SIMPLE       ref   PRIMARY,userid  PRIMARY  4        Test.users.id  1
query:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname, AllTime.levelstr
FROM AllTime
INNER JOIN users
ON AllTime.userid=users.id
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
The tables:
users
| id(int) | nickname(varchar) |
| (Primary, Auto_increment) | |
|---------------------------|-------------------|
| 1 | username1 |
| 2 | username2 |
| 3 | username3 |
| ... | ... |
and AllTime
| id(int) | userid(int) | levelname(varchar) | levelstr(text) |
| (Primary, Auto_increment) | (index) | | |
|---------------------------|-------------|--------------------|----------------|
| 1 | 2 | levelname1 | levelstr1 |
| 2 | 2 | levelname2 | levelstr2 |
| 3 | 3 | levelname3 | levelstr3 |
| 4 | 1 | levelname4 | levelstr4 |
| 5 | 1 | levelname5 | levelstr5 |
| 6 | 1 | levelname6 | levelstr6 |
| 7 | 2 | levelname7 | levelstr7 |
Is there a way to optimize this query or would I be better off by calling two consecutive queries from php just to avoid the warning?
I am just learning MySQL, so please take that information into account when replying, thank you :)
I'm assuming you're using InnoDB.
For an INNER JOIN, MySQL typically starts with the table with the fewest rows, in this case users. However, since you just want the latest 20 AllTime records joined with the corresponding user records, you actually should start with AllTime since with the LIMIT, it will be the smaller data set.
Use STRAIGHT_JOIN to force the join order:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname,
AllTime.levelstr
FROM AllTime
STRAIGHT_JOIN users
ON users.id = AllTime.userid
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
It should be able to use the primary key on the AllTime table and follow it in descending order. It'll grab all the data on the same pages as it goes.
It should also use the primary key on the users table to grab the id and nickname. If there are more than just two columns, you might add a multi-column covering index on (id, nickname) to improve the speed.
If you can, convert the levelstr column to VARCHAR so that the data is stored on the same page as the rest of the data; otherwise, it has to go fetch the text columns separately. This assumes that your rows are under the ~8000 byte row limit for InnoDB. There is no way to avoid the Using temporary unless you get rid of the text column.
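Hedged sketches of both suggestions (the index name and VARCHAR length are mine; check your longest levelstr and total row size first):

ALTER TABLE users ADD INDEX idx_id_nickname (id, nickname);   -- covering index: id and nickname served from the index alone
ALTER TABLE AllTime MODIFY levelstr VARCHAR(2000);            -- pick a length that actually fits your level strings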
Most likely, your host has identified this query by using the slow query log, which can identify all queries that don't use an index, or they may have red flagged it because of the Using temporary.
It doesn't look like the query has a problem.
Review the application code. Most likely the issue is in the code.
Check the MySQL query execution plan; possibly you are missing an index.
Make sure you cache the data in the application and in the database (FYI, sometimes you can load the whole database into application memory).
Make sure you use a connection pool.
Create a view (a very small chance of improvement).
Try to remove the ORDER BY clause (again, a very small chance it will improve the performance).
The query itself takes around 0.00025 seconds ... I got a message from the website host telling me to optimise the below-mentioned query, or the script will be removed since it is pushing a lot of strain onto their servers.
Ask the website host for more details about why this query has been flagged for attention. A query that trivial is not going to cause strain on anything unless it is being called very frequently.
Find out how many times that query is being run. I will bet you a nickel that your site is getting hammered by a bot and being executed hundreds or thousands of times per minute. If so, then that's your real problem.
LIMIT ($value_from_php),20; -- if $value_from_php is huge, then the query is slow. This is because all the 'old' pages need to be scanned before getting to the 20 you need.
By "remembering where you left off" you can make every page equally fast. See this for further details: http://mysql.rjweb.org/doc.php/pagination
Edit:
Actually, aside from SELECT DISTINCT (which I haven't verified yet), the main performance bottleneck might be the network speed. When the server and client are both on localhost,
selecting all 2 million records took 36 seconds. However, on a (supposedly high speed) network with the client sitting on another box, the query is not yet done after 10 minutes. This is supposedly a 100 Mbps network, but when I checked the client (Java JDBC), it's receiving data at a rate of 3 KB/second. The MySQL server, however, is sending at a speed of 100 KB/sec (including other client connections, though).
Why is the Java JDBC client receiving data at such a low rate?
select distinct(indexed_column) from mytable
is very slow on mytable, which has only 1 million rows; indexed_column has a non-unique index.
Is there a way to optimize it?
an explain gives this:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | mytable | range | NULL | my_index | 50 | NULL | 1759002 | Using index for group-by |
Does type=range mean it's not using the index? Is this why it's slow?
I would build a unique index on the table, on the column you want the DISTINCT values of...
Since you're looking for DISTINCT on a given column: if you build a UNIQUE INDEX on the column (or columns) whose distinct combinations you are looking for, the indexed pages will only hold a pointer to the first record that qualifies for each combination.
Ex: if you have
Category Count
1 587
2 321
3 172
4 229
5 837
Your UNIQUE INDEX on category will only have 5 records... in this case, even though there are over 2,000 entries across 5 categories, the DISTINCT category count is 5, the index has 5, and you're done. Apply this concept to your table of 1 million+ records.