Slow MySQL query on update statement - mysql

I am trying to move some data from one database to another. I currently have over a million entries in my database, and I was expecting this to take long, but over 50 minutes have already passed with no result :) .
Here is my query:
UPDATE xxx.product AS p
LEFT JOIN xx.tof_art_lookup AS l ON p.model_view = l.ARL_SEARCH_NUMBER
SET p.model = l.ARL_DISPLAY_NR
WHERE p.model_view = l.ARL_SEARCH_NUMBER;
Any help on how to improve this query would be welcome. Thanks in advance!

Add indexes on p.model_view and l.ARL_SEARCH_NUMBER if you are not going to get rid of the JOINs.
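A minimal sketch of those indexes (assuming neither column is indexed yet; the index names are made up):
-- index the join columns on both sides
ALTER TABLE xxx.product ADD INDEX idx_model_view (model_view);
ALTER TABLE xx.tof_art_lookup ADD INDEX idx_arl_search (ARL_SEARCH_NUMBER);
The index on tof_art_lookup is the one the join lookup actually uses here; the one on product mainly helps if you later split the update into ranges.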
Actually, it might be optimized further, depending on the actual data volumes and their values (presence of NULLs), by:
1. Monitoring the query execution plan and, if it is not good, adding query hints for the optimizer or exchanging JOINs for subqueries so the optimizer uses another type of join internally (merge/nested loops/hash/whatever)
2. Making a stored procedure with more complicated but faster logic
3. Doing the updates in small portions (see the sketch after this list)
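For item 3, a minimal sketch of chunked updates, assuming xxx.product has an auto-increment primary key id (a hypothetical column):
-- update one slice at a time so each transaction stays small;
-- repeat with the next id range until the whole table is covered
UPDATE xxx.product AS p
JOIN xx.tof_art_lookup AS l ON p.model_view = l.ARL_SEARCH_NUMBER
SET p.model = l.ARL_DISPLAY_NR
WHERE p.id BETWEEN 1 AND 100000;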

Identify what makes it slow.
Check that the JOIN is optimized.
Run the SELECT only:
SELECT COUNT(*)
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l
ON p.model_view = l.ARL_SEARCH_NUMBER;
How long does it take? Then run EXPLAIN SELECT ... and check that a proper INDEX is used for the JOIN.
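For example:
EXPLAIN SELECT COUNT(*)
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l
ON p.model_view = l.ARL_SEARCH_NUMBER;
Look at the key and rows columns of the EXPLAIN output to see which index, if any, is being used.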
If everything is fine for the JOIN, then the UPDATE of each row is what is slow, and that situation is hard to make faster.
UPDATE = DELETE and INSERT
I haven't tried this, but sometimes this strategy is faster: an UPDATE is a DELETE of the old row plus an INSERT of a new row using the new value, so do that explicitly into a new table.
-- CREATE the new table and INSERT the joined rows
CREATE TABLE xxx.new_product
SELECT p.model_view, l.ARL_DISPLAY_NR, ...
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l
ON p.model_view = l.ARL_SEARCH_NUMBER;
-- drop xxx.product
DROP TABLE xxx.product;
-- rename xxx.new_product to xxx.product
RENAME TABLE xxx.new_product TO xxx.product;
Divide the table into small chunks, and run the updates concurrently
I think your job is CPU-bound, and since your UPDATE query uses just one CPU it cannot benefit from many cores. The xxx.product table has no restricting condition on the join, therefore the 1M rows are updated sequentially.
My suggestion is the following:
Add conditions to xxx.product so that it is divided into 20 groups (I don't know which column would be best for you, as I have no information about xxx.product),
then run those 20 queries at once, concurrently.
for example:
-- for the 1st chunk
UPDATE xxx.product AS p
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
AND p.column BETWEEN val1 AND val2;  -- this condition splits xxx.product

-- for the 2nd chunk
UPDATE xxx.product AS p
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
AND p.column BETWEEN val2 AND val3;

...

-- for the 20th chunk
UPDATE xxx.product AS p
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
AND p.column BETWEEN val19 AND val20;
It is important to choose the BETWEEN values so that they distribute the table evenly; a histogram of the column may help you (see "Getting data for histogram plot").
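If you are on MySQL 8+, NTILE can compute 20 evenly sized buckets for you; `column` stands for whatever splitting column you choose:
-- boundaries of 20 equal-sized buckets over the chosen column
SELECT bucket, MIN(c) AS lower_bound, MAX(c) AS upper_bound
FROM (SELECT `column` AS c, NTILE(20) OVER (ORDER BY `column`) AS bucket
      FROM xxx.product) t
GROUP BY bucket;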

Related

updating multiple tables mysql optimization

I am trying to update multiple tables that all use the same column, called "Team". I created an update statement, but it is very slow and takes way too long. Can I get some tips to optimize it and make it run faster?
update QB, RB, WR, passing, rushing, receiving
set qb.team='GB',
rb.team='GB',
wr.team='GB',
passing.team='GB',
rushing.team='GB',
receiving.team='GB'
where qb.team=('GNB') or
(rb.team='GNB') or
(wr.team='GNB') or
(passing.team='GNB') or
(rushing.team='GNB') or
(receiving.team='GNB');
You're doing a huge cross join on all six of your tables. This means that the criteria in your WHERE clause are scanning through a very large number of joined rows. Specifically you're scanning the product of the number of rows in all six tables.
Instead, you should write your query like this:
update QB
join RB ON QB.something = RB.something
join WR ON QB.something = WR.something ... etc
SET QB.team = 'GB', RB.team='GB' ... etc
WHERE something
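For instance, assuming the six tables share a hypothetical player_id key, the rewrite might look like this:
UPDATE QB
JOIN RB        ON RB.player_id = QB.player_id   -- player_id is a made-up shared key
JOIN WR        ON WR.player_id = QB.player_id
JOIN passing   ON passing.player_id = QB.player_id
JOIN rushing   ON rushing.player_id = QB.player_id
JOIN receiving ON receiving.player_id = QB.player_id
SET QB.team = 'GB', RB.team = 'GB', WR.team = 'GB',
    passing.team = 'GB', rushing.team = 'GB', receiving.team = 'GB'
WHERE QB.team = 'GNB';
If the tables are not actually related row-by-row, six separate single-table UPDATE ... SET team = 'GB' WHERE team = 'GNB' statements avoid the join entirely.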

Optimize query mysql

I have a problem with a query for a web site. This is the situation:
I have 3 tables:
articoli = where all the articles are
clasart = all the matches between article code and class code - 32,314 rows
classificazioni = all the matches between class code and class name - 2,401 rows
and this is the query:
SELECT a.clar_classi , b.CLA_DESCRI
FROM clasart a JOIN (
SELECT art.AI_CAPOCODI, art.ai_codirest
FROM (select * from clasart where clar_azienda = 'SRL') a
JOIN (
SELECT AI_CAPOCODI, AI_CODIREST,AI_DT_CREAZ,
AI_DESCRIZI, AI_CATEMERC, concat(AI_CAPOCODI, AI_CODIREST) as codice, aI_grupscon
FROM articoli
WHERE AI_AZIENDA = 'SRL' AND AI_CATEMERC LIKE '0101______' AND AI_FLAG_NOW = 0 AND AI_CAPOCODI <> 'zzz'
) art ON trim(a.CLAR_ARTICO) = art.AI_CODIREST
JOIN classificazioni b ON a.CLAR_CLASSI = b.CLA_CODICE
WHERE b.CLA_CODICE LIKE 'AA51__'
group by CLAR_ARTICO) art ON trim(CLAR_ARTICO) = concat(art.AI_CAPOCODI, art.ai_codirest)
JOIN classificazioni b ON a.CLAR_CLASSI = b.CLA_CODICE
WHERE CLAR_AZIENDA = 'SRL' AND CLAR_CLASSI like 'CO____'
The query takes 16 seconds to run; the time rises to 16 seconds once the join with classificazioni is added.
Can you help me? Thanks!
Introduce the following indexes using the queries below; after that, the query should start running within a second or two:
ALTER TABLE articoli ADD INDEX idx_artc_az_cat_flg_cap (AI_AZIENDA, AI_FLAG_NOW, AI_CAPOCODI, AI_CATEMERC);
The query above introduces a multi-column index on the articoli table. Indexes work much like hash tables or array keys: they identify the row(s) matching the target value(s) directly instead of scanning. Using a multi-column index means fewer rows have to be compared.
Do not use trim(a.CLAR_ARTICO): make sure the values are trimmed before insertion, not at join time. Wrapping the column in a function bypasses the index, which makes the join comparison expensive.
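A one-time cleanup along these lines (a sketch) would let the join drop the TRIM():
-- trim once, at rest, so the join can compare the indexed raw values
UPDATE clasart
SET CLAR_ARTICO = TRIM(CLAR_ARTICO)
WHERE CLAR_ARTICO <> TRIM(CLAR_ARTICO);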
Let's move to the next steps.
Introduce an index on clar_azienda using the following query:
ALTER TABLE clasart ADD INDEX idx_cls_az (clar_azienda);
If art.AI_CODIREST is not a primary/foreign key, you'll need to introduce an index there the same way. The final join also needs an index on CLA_CODICE:
ALTER TABLE classificazioni ADD INDEX idx_clsi_cd (CLA_CODICE);
We are almost done: you'll just need to index CLAR_CLASSI as well (clar_azienda is already covered above), the same way I indexed the columns before. Let me also spell out what is what in the index query so you can write your own:
ALTER TABLE <tableName> ADD INDEX <indexName> (<column to be indexed>);
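For instance, assuming the remaining join column CLAR_CLASSI still needs one (the index name is made up):
ALTER TABLE clasart ADD INDEX idx_cls_classi (CLAR_CLASSI);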
Let me know if you still have issues. Remember, you can run these queries after selecting your database in phpMyAdmin (SQL tab) or on the MySQL console.

Optimize "JOIN" query

This is my query, from my source code:
SELECT `truyen`.*, MAX(chapter.chapter) AS last_chapter
FROM (`truyen`)
LEFT JOIN `chapter` ON `chapter`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE '%%'
GROUP BY `truyen`.`Id`
LIMIT 250
When I installed it on an iFastnet host, it caused over 500,000 rows to be examined due to the join, and the query was blocked (it would use over 100% of a CPU, which would ultimately cause server instability).
I also tried adding this line before the query; it fixed the problem above but led to another issue where some functions could not run correctly:
mysql_query("SET SQL_BIG_SELECTS=1");
How can I fix this problem without buying another hosting plan?
Thanks.
You might be looking for an INNER JOIN. That would remove results that do not match. I find INNER JOINs to be faster than LEFT JOINs.
However, I'm not sure what results you are actually looking for. But because you are using the GROUP BY, it looks like the INNER JOIN might work for you.
One thing I would recommend is to copy and paste the query that it generates and run it with DESCRIBE in front of it.
So if the query ended up being:
SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
You would type:
DESCRIBE SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
This will tell you whether you could possibly add an index to your table to make the JOIN faster.
I hope this at least points you in the right direction.
Michael Berkowski seems to agree with the indexing, which you will be able to see from the DESCRIBE.
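If the DESCRIBE shows no usable key on chapter, a composite index along these lines (a sketch; the name is made up) would let MySQL resolve both the join and MAX(chapter) from the index alone:
ALTER TABLE chapter ADD INDEX idx_truyen_chapter (truyen, chapter);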
Please check whether you have indexes on chapter.chapter and chapter.truyen. If not, create them and try again. If that is not successful, try these suggestions:
Do you have the possibility to permanently flag the last chapter in a column of your chapter table on insert/update? Then you could use that flag to reduce the joined rows and drop the GROUP BY. Maybe like this:
SELECT `truyen`.*, `chapter`.`chapter` as `last_chapter`
FROM `truyen`, `chapter`
WHERE `chapter`.`truyen` = `truyen`.`Id`
AND `chapter`.`flag_last_chapter` = 1
AND `truyen`.`title` LIKE '%queryString%'
LIMIT 250
Or create a new table for that instead:
INSERT INTO new_table (truyen, last_chapter)
SELECT truyen, MAX(chapter) FROM chapter GROUP BY truyen;
SELECT `truyen`.*, `new_table`.`last_chapter`
FROM (`truyen`)
LEFT JOIN `new_table` ON `new_table`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE '%queryString%'
GROUP BY `truyen`.`Id`
LIMIT 250
Otherwise you could just fetch the 250 rows of truyen, collect the truyen ids in an array, and build another SQL statement to select the matching rows from the chapter table. I have seen in your original question that you are using PHP, so you could merge the results after that:
SELECT * FROM truyen
WHERE title LIKE '%queryString%'
LIMIT 250
SELECT truyen, MAX(chapter) AS last_chapter
FROM chapter
WHERE truyen IN (comma_separated_ids_from_first_select)
GROUP BY truyen

Select taking too long. Need advice for a better performance

Ok, here we go. There's this messy SELECT crossing other tables and ordering to get the one desired row. Basically I do the "math" inside the ORDER BY.
1 base table.
7 JOINs pointing to local tables.
WHERE with 2 clauses and a NOT IN crossing another table.
You'll see in the code that the ORDER BY is pretty damn big/ugly; it sums the results of 5 different calculations. I need to order by those calculations to get the worst-case row.
The problem is that once I execute the stored procedure it takes up to 8 seconds to run. That's kind of unacceptable. So, I'm starting to check indexes.
So, I'm looking for advice on how to make this query run faster.
I'm indexing the WHERE-clause columns and the field LINEA. Should I index something else, like the columns I'm joining on? Or should I approach the query differently?
Query:
SET @LINEA = (
SELECT TOP 1
BOA.LIN
FROM
BAND_BA BOA
LEFT JOIN
TEL PAR
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(PAR.Te,2,10)
LEFT JOIN
TELP CLP
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(CLP.Numtel,2,10)
LEFT JOIN
CA C
ON REPLACE(BOA.Lin,'-','') = C.An
LEFT JOIN
RE R
ON REPLACE(BOA.Lin,'-','') = R.Lin
LEFT JOIN
PRODUCTOS2 P2
ON BOA.PRODUCTO = P2.codigo
LEFT JOIN
EN
ON REPLACE(BOA.Lin,'-','') = EN.G
LEFT JOIN
TIP ID
ON TIPID = ID.ID
WHERE
BOA.EST = 'C' AND
ID.SE = 'boA' AND
BOA.LIN NOT IN (
SELECT
LIN
FROM
BAN
)
ORDER BY (EN.VALUE + ANT.VALUE + REIT.VAL + C.VALUE + TEL.VALUE
) DESC,
I'll be frank, this is some pretty terrible SQL. Without seeing all your table structures, advice here will be incomplete. That being said, please don't post all your table structures because you are already very close to "hire a consultant" territory with this.
All the REPLACE logic should be done away with. If you need to JOIN on these fields, then add comparable fields to the tables so you don't need to manipulate the data. Every single JOIN that uses a REPLACE or SUBSTRING is a table or index scan - those are non-SARGable and a definite anti-pattern.
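One common fix (a sketch, assuming SQL Server, which the TOP 1 and variable syntax suggest; the column and index names are made up) is to persist the stripped value once and index it, so the joins become SARGable:
-- computed, persisted copy of Lin without dashes
ALTER TABLE BAND_BA ADD Lin_clean AS REPLACE(Lin, '-', '') PERSISTED;
CREATE INDEX idx_boa_lin_clean ON BAND_BA (Lin_clean);
The joins can then compare BOA.Lin_clean directly instead of calling REPLACE on every row.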
The ORDER BY is probably the most convoluted ORDER BY I have ever seen. Some major issues there:
Subqueries should all be eliminated and materialized either in the outer query or as variables
String manipulation should be eliminated (see item 1 above)
The entire query is basically a code smell. If you need to write code like this to meet business requirements then you either have a terribly inappropriate design or some other much larger issue in the organization or data.
One thing that can kill performance is using a lot of LEFT JOINs. To improve LEFT JOIN performance, make sure the column(s) you join on have an index - that can have a huge impact.

how can I speed up my queries?

So I have a 560 MB database, with the largest table at 500 MB (over 10 million rows).
My query has to join 5 tables and takes about 10 seconds to finish...
SELECT DISTINCT trips.tripid AS tripid,
stops.stopdescrption AS "perron",
Date_format(segments.segmentstart, "%H:%i") AS "time",
Date_format(trips.tripend, "%H:%i") AS "arrival",
Upper(routes.routepublicidentifier) AS "lijn",
plcend.placedescrption AS "destination"
FROM calendar
JOIN trips
ON calendar.vsid = trips.vsid
JOIN routes
ON routes.routeid = trips.routeid
JOIN places plcstart
ON plcstart.placeid = trips.placeidstart
JOIN places plcend
ON plcend.placeid = trips.placeidend
JOIN segments
ON segments.tripid = trips.tripid
JOIN stops
ON segments.stopid = stops.stopid
WHERE stops.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
AND calendar.vscdate = Str_to_date('25-10-2011', "%e-%c-%Y")
AND segments.segmentstart >= Str_to_date('15:56', "%H:%i")
AND routes.routeservicetype = 0
AND segments.segmentstart > "00:00:00"
ORDER BY segments.segmentstart
What can I do to speed this up? Any tips are welcome; I'm pretty new to SQL...
But I can't change the structure of the database, because it's not mine...
Use EXPLAIN to find the bottlenecks: http://dev.mysql.com/doc/refman/5.0/en/explain.html
Then perhaps, add indexes.
If you don't need to select ALL rows, use LIMIT to limit returned result count.
Just looking at the query, I would say that you should make sure that you have indexes on trips.vsid, calendar.vscdate, segments.segmentstart and routes.routeservicetype. I assume that there is already indexes on all the primary keys in the tables.
Using explain as Briedis suggested would show you how well the indexes work.
You might want to add covering indexes for some tables, like for example an index on trips.vsid where tripid and routeid are included. That way the database can use only the index for the data that is needed from the table, and not read from the actual table.
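A sketch of such a covering index (MySQL has no INCLUDE clause, so the extra columns simply become trailing parts of the key; the name is made up):
ALTER TABLE trips ADD INDEX idx_trips_cover (vsid, tripid, routeid);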
Edit:
The execution plan tells you that it successfully uses indexes for everything except the segments table, where it does a table scan and filters by the where condition. You should try to make a covering index for segments.segmentstart by including tripid and stopid.
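Something along these lines might work (hypothetical name; compare it with the (stopid, tripid, segmentstart) variant suggested further down):
ALTER TABLE segments ADD INDEX idx_seg_cover (segmentstart, tripid, stopid);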
Try adding a clustered index to the routes table on both routeservicetype and routeid.
Depending on the frequency of the data within the routeservicetype field, you may get an improvement by shrinking the amount of data being compared in the join to the trips table.
Looking at the explain plan, you may also want to force the sequence of the table usage by using STRAIGHT_JOIN instead of JOIN (or INNER JOIN), as I've had real improvements with this technique.
Essentially, put the table with the smallest row count of extracted data at the beginning of the query, and the table with the largest row count at the end (in this case possibly the segments table?), with the exception of simple lookups (e.g. for descriptions).
You may also consider altering the WHERE clause to filter the segments table on stopid instead of the stops table, and creating a clustered index on the segments table on (stopid, tripid and segmentstart) - this index will be effectively able to satisfy two joins and two where clauses from a single index...
To build the index...
ALTER TABLE segments ADD INDEX idx_qry_helper ( stopid, tripid, segmentstart );
And the altered WHERE clause...
WHERE segments.stopid IN ( 43914, 23899, 23925, 23908,
23913, 19899, 23871, 43902,
23876, 25563, 18956, 19912,
23889, 23861, 23879, 23884,
23856, 19920, 19898, 23916,
23894, 20985, 23930, 20932,
20986, 22434, 20021, 19893,
19903, 19707, 19935 )
... (rest of the query unchanged)
At the end of the day, a 10-second response for what appears to be a complex query on a fairly large dataset isn't all that bad!