I recently learned that I can search in a MySQL table across multiple columns by using the following select statement with OR:
SELECT * data WHERE TEMP = "3000" OR X ="3000" OR Y="3000";
Which returns the results needed, but it does take approximately 1.7 s to return the results in the table that has only ~260k rows. I also have already added indexes for each of the columns that are searched.
Is there a way to optimize this query? Or is there another one which is faster but returns the same results?
Another option is to use UNION...
SELECT * FROM data WHERE TEMP = "3000"
UNION
SELECT * FROM data WHERE X ="3000"
UNION
SELECT * FROM data WHERE Y="3000";
...however the real key to improving the performance is firstly indexes and second the query analyser. Often the data determines which is faster as TEMP may be a hundred times less likely that Y to be "3000" - so that should be first in you original OR statement for example.
Related
By simple logic Id think yeah, is faster because the DBMS brings less info and needs less memory...however, I dont have a valid argument why could be faster.
If for example, I want to have a select from 2 related tables, with index and everything.
But I want to know why select tableA.field, tableA.field2, tableA.field3, tableBfield1, tableB,field2 from tableA, tableB
is actually faster than
select * from tableA,tableB
Both tables have about 3 million records and table A has about 14 fields and tableB got 18.
Any idea?
Thanks.
Reducing the number of fields selected means that less data has to be transmitted from the server to the client. It also reduces the amount of memory that the server and client have to use to hold the data selected. So these should improve performance once the server determines which rows should be in the result set.
It's not likely to have any significant impact on the speed of processing the query itself within the database server. That's dominated by the cost of joining the tables, filtering the rows based on the WHERE clause, and performing any calculations specified in the SELECT clause. These are all independent of the columns being selected. If you use EXPLAIN on the two queries, you won't see any difference.
you are joining two tables with 3 million rows each with no filter. that will make 9x10^12 rows. generating and transmitting to the client a resultset of a few fields, against all 32 fields will make a difference.
If you select all fields in the first query it's the same thing because you request the same amount of data. Check this http://sqlfiddle.com/#!9/27987/2
Maybe the difference of perfomance has another reason...like...other selects in running.
Essentially select * from tableA,tableB is the equivalent of the Cartesian product of the two tables, for a total of 3million x 3 million of rows.
Therefore:
select * from tableA,tableB
With the wildcards * you retrieve a table of 9million x 28 columns, while
select tableA.field, tableA.field2, tableA.field3, tableB.field1, tableB.field2 from tableA, tableB
with the explicit form you have a table of 9million x 5 columns...so less data!
Why following query takes so much time in mysql but not in oracle
select * from (select * from employee) as a limit 1000
I test this query in oracle and MySQL database with 50,00,000 records in this table.
I know that these query should be write like
select * from employee limit 1000
But for displaying data & total no of rows to our custom dynamic grid we have only one query we use simple select * from employee query and then add limit or other conditions.
we short this problem.
But my question is "Why such query in mysql takes too much time?"
Well, because to perform this query MySQL has to go over all 50K rows in the table, copy them to a temp table (in memory or on disk, depending on size) and then take the first 1000.
In MySQL, if you want to optimize queries with LIMIT, you need to follow some practices that prevent scanning the full data set (mainly indexes on the column you sort by).
See here: http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html
My problem is this:
select * from
(
select * from barcodesA
UNION ALL
select * from barcodesB
)
as barcodesTOTAL, boxes
where barcodesTotal.code=boxes.code;
Table barcodesA has 4000 entries
Table barcodesB has 4000 entries
Table boxes has like 180.000 entries
It takes 30 seconds to proccess the query.
Another problematic query:
select * from
viewBarcodesTotal, boxes
where barcodesTotal.code=boxes.code;
viewBarcodesTotal contains the UNION ALL from both barcodes tables. It also takes forever.
Meanwhile,
select * from barcodesA , boxes where barcodesA.code=boxes.code
UNION ALL
select * from barcodesB , boxes where barcodesB.code=boxes.code
This one takes <1second.
The question is obviously WHY?, is my code bugged? is mysql bugged?
I have to migrate from access to mysql, and i would have to rewrite all my code if the first option in bugged.
Add an index on boxes.code if you don't already have one. Joining 8000 records (4K+4K) to the 180,000 will benefit from an index on the 180K side of the equation.
Also, be explicit and specify the fields you need back in your SELECT statements. Using * in a production-use query is bad form as it encourages not having to think about what fields (and how big they might be), not to mention the fact that you have 2 different tables in your example, barcodesa and barcodesb with potentially different data types and column orders that you're UNIONing....
The REASON for the performance difference...
The first query says... First, do a complete union of EVERY record in A UNIONed with EVERY record in B, THEN Join it to boxes on the code. The union does not have an index to be optimized against.
By explicitly applying your SECOND query instance, each table individually IS optimized on the join (apparently there IS an index per performance of second, but I would ensure both tables have index on "code" column).
I need to query the MYSQL with some condition, and get five random different rows from the result.
Say, I have a table named 'user', and a field named 'cash'. I can compose a SQL like:
SELECT * FROM user where cash < 1000 order by RAND() LIMIT 5.
The result is good, totally random, unsorted and different from each other, exact what I want.
But I got from google that the efficiency is bad when the table get large because MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
Then I go on searching and got a solution like:
SELECT * FROM `user` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `user`)- (SELECT MIN(id) FROM `user`))+(SELECT MIN(id) FROM `user`)) AS id) AS t2 WHERE t1.id >= t2.id AND cash < 1000 ORDER BY t1.id LIMIT 5;
This method uses JOIN and MAX(id). And the efficiency is better than the first one according to my testing. However, there is a problem. Since I also needs a condition "cash<1000" and if the the RAND() is so big that no row behind it has the cash<1000, then no result will return.
Anyone has good idea of how to compose the SQL that has have the same effect as the first one but has better efficiency?
Or, shall I just do simple query in MYSQL and let PHP randomly pick 5 different rows from the query result?
Your help is appreciated.
To make first query faster, just SELECT id - that will make the temporary table rather small (it will contain only IDs and not all fields of each row) and maybe it will fit in memory (temp table with text/blob are always created on-disk for example). Then when you get a result, run another query SELECT * FROM xy WHERE id IN (a,b,c,d,...). As you mentioned this approach is not very efficient, but as a quick fix this modification will make it several times faster.
One of the best approaches seems to be getting the total number of rows, choosing random numbers and for each result run a new query SELECT * FROM xy WHERE abc LIMIT $random,1. It should be quite efficient for random 3-5, but not good if you want 100 random rows each time :)
Also consider caching your results. Often you don't need different random rows to be displayed on each page load. Generate your random rows only once per minute. If you will generate the data for example via cron, you can live also with query which takes several seconds, as users will see the old data while new data are being generated.
Here are some of my bookmarks for this problem for reference:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
Avoid using IN(...) when selecting on indexed fields, It will kill the performance of SELECT query.
I found this here: https://wikis.oracle.com/pages/viewpage.action?pageId=27263381
Can you explain it? Why that will kill performance? And what should I use instead of IN. "OR" statement maybe?
To tell the truth, that statement contradicts to many hints that I have read in books and articles on MySQL.
Here is an example: http://www.mysqlperformanceblog.com/2010/01/09/getting-around-optimizer-limitations-with-an-in-list/
Moreover, expr IN(value, ...) itself has additional enhancements for dealing with large value lists, since it is supposed to be used as a useful alternative to certain range queries:
If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
Still overusing INs may result in slow queries. Some cases are noted in the article.
Because MySQL can't optimize it.
Here is an example:
explain select * from keywordmaster where id in (1, 567899);
plan (sorry for external link. Doesn't show correctly here)
here is another query:
explain
select * from table where id = 1
union
select * from keywordmaster where id = 567899
plan
As you can see in the second query we get ref as const and type is const instead of range. MySQL can't optimize range scans.
Prior to MySQL 5.0 it seems that mySQL would only use a single index for a table. So, if you had a SELECT * FROM tbl WHERE (a = 6 OR b = 33) it could chooose to use either the a index or the b index, but not both. Note that it says fields, plural. I suspect the advice comes from that time and the work-around was to union the OR results, like so:
SELECT * FROM tbl WHERE (a = 6)
UNION
SELECT * FROM tbl WHERE (b = 33)
I believe IN is treated the same as a group of ORs, so using ORs won't help.
An alternative is to create a temporary table to hold the values of your IN-clause and then join with that temporary table in your SELECT.
For example:
CREATE TEMPORARY TABLE temp_table (v VARCHAR)
INSERT INTO temp_table VALUES ('foo')
INSERT INTO temp_table VALUES ('bar')
SELECT * FROM temp_table tmp, orig_table orig
WHERE temp_table.v = orig.value
DROP TEMPORARY TABLE temp_table