I have a talbe with ~110k rows and 20 columns and no indexes. I wrote a query to update 9 columns of this table JOIN with another table which has many indexes. And the query took forever to run. I really dont know why. Here is my query:
UPDATE tonghop a JOIN testdone b
ON a.stt = b.stt
SET a.source = b.source, a.pid=b.pid, a.tenbenhnhan = b.fullname,
a.orderdoctor=b.orderdoctor, a.specialty = b.specialty, a.rdate = b.rdate,
a.icd_code = b.icd_code, a.servicegroup = b.servicegroup;
Really appreciate if someone could help
The Query you are executing is without a WHERE Clause which means that it is going to be executed on all the 110K records, and your Join Column "stt" must be indexed on both the tables in order to achieve better performance.
You should add an index on the column "stt".
Without indexes on both of the columns, JOINS are going to be slow.
You are most likely forcing MySql to read every single one of the 110k records to check whether they match.
With an index, MySql knows where these records are, and can quickly find them.
Try adding an index on tonghop.stt.
You could also try and run an EXPLAIN on the query, to see if it indeed does a so-called "full table scan".
https://dev.mysql.com/doc/refman/5.6/en/using-explain.html
Related
It takes around 5 seconds to get result of query from a table consisting 1.5 million row. Query is "select * from table where code=x"
Is there a setting to increase speed ? Or should I jump to another database apart from MySQL ?
You could index the code column. Note that the trade off is that inserting new rows or updating the code column on existing rows will be slowed down a bit since the index also needs to be updated. In any event, you should benchmark the improvement to make sure it's worth it.
WHERE code=x -- needs INDEX(code)
SELECT * when many of the columns are bulky: Large columns are stored "off-record". Hence they take longer to fetch. So, explicitly list the columns you really need, hoping to leave out some of the bulky columns.
When a GROUP BY or LIMIT is involved, it is sometimes best to do
SELECT lots of columns
FROM ( SELECT id FROM t WHERE ... group-by or limit ) AS x
JOIN t AS y USING(id)
etc.
That is, start by finding just the ids as simply as possible, then JOIN back to the original table and other table(s). (This is not the case you presented, but I worry that you over-simplified it.)
I have 3 tables. All 3 tables have approximately 2 million rows. Everyday 10,000-100,000 new entries are entered. It takes approximately 10 seconds to finish the sql statement below. Is there a way to make this sql statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but its slow. Is there a better way to write this code?
All database servers have a form of an optimization engine that is going to determine how best to grab the data you want. With a simple query such as the select you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said sub-queries won't helps as that will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a mysql expert but found this article interesting and worth a skim. https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery don't help but a proper index can improve the performance so be sure you have proper index
create index idx1 on customers(gender , cus_id,book_id, name )
create index idex2 on hotels(cus_id)
create index idex3 on hotels(book_id)
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
I have a table of 15.1 million records. I'm running the following query on it to process the records for duplicate checking.
select id, name, state, external_id
from companies
where dup_checked=0
order by name
limit 500;
When I use explain extended on the query it tells me it's using the index_companies_on_name index which is just an index on the company name. I'm assuming this is due to the ordering. I tried creating other indexes based on the name and dup_checked fields hoping it would use this one as it may be faster, but it still uses the index_companies_on_name index.
Initially it was fast enough, but now we're down to 3.3 million records left to check and this query is taking up to 90 seconds to execute. I'm not quite sure what else to do to make this run faster. Is a different index the answer or something else I'm not thinking of? Thanks.
Generally the trick here is to create an index that filters first, reducing the number of rows ("Cardinality"), and has the ordering applied secondarily:
CREATE INDEX `index_companies_on_dup_checked_name`
ON `companies` (`dup_checked`,`name`)
That should give you the scope you need.
SELECT DISTINCT
viewA.TRID,
viewA.hits,
viewA.department,
viewA.admin,
viewA.publisher,
viewA.employee,
viewA.logincount,
viewA.registrationdate,
viewA.firstlogin,
viewA.lastlogin,
viewA.`month`,
viewA.`year`,
viewA.businesscategory,
viewA.mail,
viewA.givenname,
viewA.sn,
viewA.departmentnumber,
viewA.sa_title,
viewA.title,
viewA.supemail,
viewA.regionname
FROM
viewA
LEFT JOIN viewB ON viewA.TRID = viewB.TRID
WHERE viewB.TRID IS NULL
I have two views with a about 10K and 5K records in them. They each come in very quickly - fraction of a second. When I try to get all of the records that are not in ViewB from ViewA, it works but it is very slow. All of the underlying TRID fields are same char set and all set to varchar (10) and indexed and tables are all Innodb. Right now the query is taking 16 seconds. Anything that I can do?
Normally, with JOIN, MySQL has to do a lookup for each joined record. Lookups are fast when using keys, but in your case, there aren't really any keys because the joined table is a view.
To try to get MySQL from running the query behind the second view once per record in the first view, we can use a subquery.
SELECT *
FROM viewA
WHERE TRID NOT IN (SELECT TRID FROM viewB);
This should allow MySQL to get all the TRID values for viewB in the subquery (in a temp table) then do a search over them for each record in viewA.
From MySQL docs:
MySQL executes uncorrelated subqueries only once. Use EXPLAIN to make
sure that a given subquery really is uncorrelated.
It is hard to optimize queries with views in MySQL. My first suggestion is to get rid of distinct unless you absolutely know that it is needed.
Then you might compare the performance with this query:
select viewA.*
from viewA
where not exists (select 1 from viewB where viewB.TRID = viewA.TRID);
It is hard to say whether one will be better than the other, but it is worth trying to see if this is better.
I have about 1 million rows on users table and have columns A AA B BB C CC D DD E EE F FF by example to count int values 0 & 1
SELECT
CityCode,SUM(A),SUM(B),SUM(C),SUM(D),SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode
Result 8 rows in set (24.49 sec).
How to make my statement more faster?
Use explain to to know the excution plan of your query.
Create atleast one or more Index. If possible make CityCode primary key.
Try this one
SELECT CityCode,SUM(A),SUM(B),SUM(C),SUM(D), SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode,A,B,C,D,E,F,AA,BB,CC,DD,EE,FF
Create an index on the CityCode column.
I believe it is not because of SUM(), try to say select CityCode from users group by CityCode; it should take neary the same time...
Use better hardware
increase caching size - if you use InnoDB engine, then increase the innodb_buffer_pool_size value
refactor your query to limit the number of users (if business logic permits that, of course)
You have no WHERE clause, which means the query has to scan the whole table. This will make it slow on a large table.
You should consider how often you need to do this and what the impact of it being slow is. Some suggestions are:
Don't change anything - if it doesn't really matter
Have a table which contains the same data as "users", but without any other columns that you aren't interested in querying. It will still be slow, but not as slow, especially if there are bigger ones
(InnoDB) use CityCode as the first part of the primary key for table "users", that way it can do a PK scan and avoid any sorting (may still be too slow)
Create and maintain some kind of summary table, but you'll need to update it each time a user changes (or tolerate stale data)
But be sure that this optimisation is absolutely necessary.