My problem is this:
select * from
(
select * from barcodesA
UNION ALL
select * from barcodesB
)
as barcodesTotal, boxes
where barcodesTotal.code=boxes.code;
Table barcodesA has 4000 entries
Table barcodesB has 4000 entries
Table boxes has about 180,000 entries
It takes 30 seconds to process the query.
Another problematic query:
select * from
viewBarcodesTotal, boxes
where viewBarcodesTotal.code=boxes.code;
viewBarcodesTotal contains the UNION ALL from both barcodes tables. It also takes forever.
Meanwhile,
select * from barcodesA , boxes where barcodesA.code=boxes.code
UNION ALL
select * from barcodesB , boxes where barcodesB.code=boxes.code
This one takes less than 1 second.
The question is obviously: WHY? Is my code bugged? Is MySQL bugged?
I have to migrate from Access to MySQL, and I would have to rewrite all my code if the first option is bugged.
Add an index on boxes.code if you don't already have one. Joining 8000 records (4K+4K) to the 180,000 will benefit from an index on the 180K side of the equation.
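A minimal sketch of the index suggestion, assuming boxes.code is not indexed yet. The index name idx_boxes_code is made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index name; use whatever fits your naming scheme.
CREATE INDEX idx_boxes_code ON boxes (code);

-- List the indexes on boxes to confirm the new one is there.
SHOW INDEX FROM boxes;
```

Building the index on a 180,000-row table is a one-time cost; after that, each barcode lookup against boxes can go through the index rather than scanning the table.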
Also, be explicit and specify the fields you need back in your SELECT statements. Using * in a production query is bad form: it lets you avoid thinking about which fields you are fetching (and how big they might be). On top of that, your example has 2 different tables, barcodesA and barcodesB, with potentially different data types and column orders, and you're UNIONing them...
The REASON for the performance difference...
The first query says: first build the complete union of EVERY record in A and EVERY record in B, THEN join that result to boxes on the code. The union is a derived table with no index on it for the join to be optimized against.
With your SECOND form, each table is joined to boxes individually, so each join CAN be optimized (judging by its speed, an index is apparently already in use, but I would ensure both tables have an index on the "code" column).
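One way to check this explanation on your own server is to put EXPLAIN in front of both forms and compare the plans. This is only a sketch using the table names from the question; the exact output depends on the MySQL version and on which indexes exist.

```sql
-- Slow form: the UNION ALL is built as a derived table, then joined.
EXPLAIN
SELECT *
FROM (
    SELECT * FROM barcodesA
    UNION ALL
    SELECT * FROM barcodesB
) AS barcodesTotal
JOIN boxes ON barcodesTotal.code = boxes.code;

-- Fast form: each barcode table is joined to boxes on its own.
EXPLAIN
SELECT * FROM barcodesA JOIN boxes ON barcodesA.code = boxes.code
UNION ALL
SELECT * FROM barcodesB JOIN boxes ON barcodesB.code = boxes.code;
```

In the output, look at the `type` and `key` columns of the rows for boxes: a value of ALL with no key means a full scan of boxes, while ref with a key name means the index on code is being used.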
Related
It takes around 5 seconds to get the result of a query from a table consisting of 1.5 million rows. The query is "select * from table where code=x".
Is there a setting to increase the speed? Or should I move from MySQL to another database?
You could index the code column. Note that the trade off is that inserting new rows or updating the code column on existing rows will be slowed down a bit since the index also needs to be updated. In any event, you should benchmark the improvement to make sure it's worth it.
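A sketch of what that could look like. The table name `items` is a placeholder (the question only calls it "table"), and the index name is invented for the example.

```sql
-- `items` stands in for the real table name; idx_items_code is a made-up name.
ALTER TABLE items ADD INDEX idx_items_code (code);

-- Check the plan: the `key` column should name the new index if it is used.
EXPLAIN SELECT * FROM items WHERE code = 'x';
```

Timing the query before and after the ALTER TABLE is the simplest form of the benchmark mentioned above.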
WHERE code=x -- needs INDEX(code)
SELECT * when many of the columns are bulky: Large columns are stored "off-record". Hence they take longer to fetch. So, explicitly list the columns you really need, hoping to leave out some of the bulky columns.
When a GROUP BY or LIMIT is involved, it is sometimes best to do
SELECT lots of columns
FROM ( SELECT id FROM t WHERE ... group-by or limit ) AS x
JOIN t AS y USING(id)
etc.
That is, start by finding just the ids as simply as possible, then JOIN back to the original table and other table(s). (This is not the case you presented, but I worry that you over-simplified it.)
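To make the pattern concrete, here is a sketch against a hypothetical table `orders` with primary key `id` and an index on `created_at`. None of these names come from the question.

```sql
-- Inner query: touches only the index to find the 20 newest ids.
-- Outer join: fetches the wide rows for just those 20 ids.
SELECT o.*
FROM (
    SELECT id
    FROM orders
    ORDER BY created_at DESC
    LIMIT 20
) AS x
JOIN orders AS o USING (id)
ORDER BY o.created_at DESC;
```

The idea is that the bulky columns are read for 20 rows only, instead of for every row that the sort or grouping has to look at.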
I have 1 query that returns over 180k rows. I need to make a slight change, so that it returns only about 10 less.
How do I show only the 10 rows as a result?
I've tried EXCEPT but it seems to return a lot more than just the 10.
You can use LIMIT. This will show first n rows. Example:
SELECT * FROM Orders LIMIT 10
If you are building pagination, add OFFSET. The example below skips the first 20 rows and returns the next 10 (add an ORDER BY if the pages need a stable order):
SELECT * FROM Orders LIMIT 10 OFFSET 20
MySQL did not support EXCEPT before version 8.0.31, so on older servers it is not an option.
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
-- Becomes (rows matched by Query 1 but not by Query 2)
SELECT * FROM table WHERE `someDate` > '2016-01-01' AND `someDate` <= '2016-01-10'
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT
*
FROM (
SELECT * FROM table WHERE `someDate`>'2016-01-01'
UNION ALL
SELECT * FROM table WHERE `someDate`>'2016-01-10'
) AS sub
GROUP BY
`primaryKey`
HAVING
COUNT(1) = 1;
It's ugly though. And not efficient.
Assuming that the only difference is only that one side (I'll call it the "right hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subs) and filter to right-side-is-null. But that'd be dependent on all those caveats being true.
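A rough sketch of that LEFT JOIN idea, reusing the placeholder names from the example queries (`table`, `someDate`, `primaryKey`). Like the UNION version, treat it as untested.

```sql
-- `table` is a placeholder name; it is backticked because TABLE is a reserved word.
SELECT l.*
FROM (
    SELECT * FROM `table` WHERE `someDate` > '2016-01-01'
) AS l
LEFT JOIN (
    SELECT * FROM `table` WHERE `someDate` > '2016-01-10'
) AS r ON r.`primaryKey` = l.`primaryKey`
-- Keep only the left-hand rows that found no partner on the right.
WHERE r.`primaryKey` IS NULL;
```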
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;
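For instance, the smaller result could be parked in an indexed temporary table and then used in an anti-join. This is a sketch only: tmp_right and the index name are hypothetical, and the other names are the placeholders from the queries above.

```sql
-- Store just the keys of the second query's result.
CREATE TEMPORARY TABLE tmp_right AS
SELECT `primaryKey` FROM `table` WHERE `someDate` > '2016-01-10';

-- Temporary tables can be indexed like ordinary ones.
ALTER TABLE tmp_right ADD INDEX idx_tmp_right_pk (`primaryKey`);

-- Rows from the first query that have no match in tmp_right.
SELECT t.*
FROM `table` AS t
LEFT JOIN tmp_right AS r ON r.`primaryKey` = t.`primaryKey`
WHERE t.`someDate` > '2016-01-01'
  AND r.`primaryKey` IS NULL;
```

The temporary table disappears when the session ends, so nothing needs cleaning up after a one-off investigation.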
I recently learned that I can search in a MySQL table across multiple columns by using the following select statement with OR:
SELECT * FROM data WHERE TEMP = "3000" OR X ="3000" OR Y="3000";
Which returns the results needed, but it does take approximately 1.7 s to return the results in the table that has only ~260k rows. I also have already added indexes for each of the columns that are searched.
Is there a way to optimize this query? Or is there another one which is faster but returns the same results?
Another option is to use UNION...
SELECT * FROM data WHERE TEMP = "3000"
UNION
SELECT * FROM data WHERE X ="3000"
UNION
SELECT * FROM data WHERE Y="3000";
...however, the real keys to improving the performance are first the indexes and second the query analyser. Often the data determines which form is faster: TEMP may be a hundred times less likely than Y to be "3000", so that condition should come first in your original OR statement, for example.
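To see what the query analyser actually does with the OR version, EXPLAIN can be run on it directly. This sketch uses the table and column names from the question; what the plan shows will depend on the data and the MySQL version.

```sql
EXPLAIN
SELECT * FROM data
WHERE TEMP = '3000' OR X = '3000' OR Y = '3000';
```

If the `type` column shows index_merge, MySQL is combining the three single-column indexes. If it shows ALL, the table is being scanned in full, and the UNION form is worth timing as an alternative. Note that UNION (without ALL) also removes duplicate rows, which matches the OR query when one row satisfies more than one condition.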
By simple logic I'd think yes, it is faster, because the DBMS returns less data and needs less memory. However, I don't have a solid argument for why it would be faster.
Say, for example, that I want to select from 2 related tables, with indexes and everything.
I want to know why select tableA.field, tableA.field2, tableA.field3, tableB.field1, tableB.field2 from tableA, tableB
is actually faster than
select * from tableA,tableB
Both tables have about 3 million records and table A has about 14 fields and tableB got 18.
Any idea?
Thanks.
Reducing the number of fields selected means that less data has to be transmitted from the server to the client. It also reduces the amount of memory that the server and client have to use to hold the data selected. So these should improve performance once the server determines which rows should be in the result set.
It's not likely to have any significant impact on the speed of processing the query itself within the database server. That's dominated by the cost of joining the tables, filtering the rows based on the WHERE clause, and performing any calculations specified in the SELECT clause. These are all independent of the columns being selected. If you use EXPLAIN on the two queries, you won't see any difference.
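A quick way to check this is to run EXPLAIN on both forms from the question and compare the output side by side. The column names below are the ones used in the question.

```sql
EXPLAIN
SELECT * FROM tableA, tableB;

EXPLAIN
SELECT tableA.field, tableA.field2, tableA.field3,
       tableB.field1, tableB.field2
FROM tableA, tableB;
```

One case where the plans can differ is when every selected column of a table is contained in a single index: the Extra column may then show "Using index", meaning the rows are read from the index alone.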
You are joining two tables of 3 million rows each with no filter. That produces 9x10^12 rows. Generating and transmitting to the client a result set of a few fields, as opposed to all 32 fields, will make a difference.
If you select all fields in the first query it's the same thing because you request the same amount of data. Check this http://sqlfiddle.com/#!9/27987/2
Maybe the difference in performance has another cause, such as other selects running at the same time.
Essentially, select * from tableA,tableB is the Cartesian product of the two tables, for a total of 3 million x 3 million rows.
Therefore:
select * from tableA,tableB
With the wildcard * you retrieve a table of 9x10^12 rows x 32 columns (14 + 18), while
select tableA.field, tableA.field2, tableA.field3, tableB.field1, tableB.field2 from tableA, tableB
with the explicit form you have a table of 9x10^12 rows x 5 columns...so less data!
SELECT DISTINCT
viewA.TRID,
viewA.hits,
viewA.department,
viewA.admin,
viewA.publisher,
viewA.employee,
viewA.logincount,
viewA.registrationdate,
viewA.firstlogin,
viewA.lastlogin,
viewA.`month`,
viewA.`year`,
viewA.businesscategory,
viewA.mail,
viewA.givenname,
viewA.sn,
viewA.departmentnumber,
viewA.sa_title,
viewA.title,
viewA.supemail,
viewA.regionname
FROM
viewA
LEFT JOIN viewB ON viewA.TRID = viewB.TRID
WHERE viewB.TRID IS NULL
I have two views with about 10K and 5K records in them. They each come back very quickly, in a fraction of a second. When I try to get all of the records from viewA that are not in viewB, it works but it is very slow. All of the underlying TRID fields use the same character set, are all varchar(10) and indexed, and the tables are all InnoDB. Right now the query takes 16 seconds. Is there anything I can do?
Normally, with JOIN, MySQL has to do a lookup for each joined record. Lookups are fast when using keys, but in your case, there aren't really any keys because the joined table is a view.
To keep MySQL from running the query behind the second view once per record in the first view, we can use a subquery.
SELECT *
FROM viewA
WHERE TRID NOT IN (SELECT TRID FROM viewB);
This should allow MySQL to get all the TRID values for viewB in the subquery (in a temp table) then do a search over them for each record in viewA.
From MySQL docs:
MySQL executes uncorrelated subqueries only once. Use EXPLAIN to make
sure that a given subquery really is uncorrelated.
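Following the docs' advice, the check could look like this. It is a sketch using the view names from the question; the plan you get depends on the MySQL version and on how the views are defined.

```sql
EXPLAIN
SELECT *
FROM viewA
WHERE TRID NOT IN (SELECT TRID FROM viewB);
```

In the `select_type` column, SUBQUERY means the inner query is treated as uncorrelated, while DEPENDENT SUBQUERY means it is re-evaluated per outer row. Also keep in mind that NOT IN returns no rows at all if the subquery yields a NULL TRID, so this form assumes viewB.TRID is never NULL.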
It is hard to optimize queries with views in MySQL. My first suggestion is to get rid of distinct unless you absolutely know that it is needed.
Then you might compare the performance with this query:
select viewA.*
from viewA
where not exists (select 1 from viewB where viewB.TRID = viewA.TRID);
It is hard to say whether one will be better than the other, but it is worth trying to see if this is better.