MySQL delete join fails for no reason?

I am trying to do a join delete and it times out. I am not sure if it is a cross join issue or a MySQL issue on my box: Windows 7 64-bit, with MySQL 5.7.17 Community.
Last time this issue appeared it resolved itself; at the time I had an issue with Vipre AV competing with Trend WFB 9 (I was testing AV products, and I know you should not have two installed). I am planning to move my small data mart to MySQL, but I can't figure out what is causing this issue, and I can't base everything on MySQL if I can't figure out what is going on.
So the SQL that keeps timing out is:
DELETE prod
FROM cmdata.vauto_feed_options AS prod
INNER JOIN cmdata.vauto_feed_options_cache AS stage
  USING (Stock__)
or
DELETE prod
FROM cmdata.vauto_feed_options AS prod
INNER JOIN cmdata.vauto_feed_options_cache AS stage
  ON prod.Stock__ = stage.Stock__
 AND prod.VIN = stage.VIN
or many other combinations of those, with no luck.
The table does not have a natural primary key, and the only unique composite key would be all the fields. This is part of a data ELT process: I just want to remove all traces of the old data and load the new data for the records that arrived, while keeping old records that were not in the latest text file (so no TRUNCATE TABLE).
I have flipped the engine from InnoDB to MyISAM and back, dropped and recreated the table, excluded the path from antivirus scans, disabled the antivirus, and removed it entirely. I added va_option_id as an auto-increment primary key just to have one.
New data comes in with 50 to 100 records per VIN and stock combination, each with different features text (floor mat, DVD player, etc.); it is normalized from a single field in a CSV file into rows for my storage.
CREATE TABLE `vauto_feed_options` (
`va_option_id` int(11) NOT NULL AUTO_INCREMENT,
`Stock__` varchar(10) NOT NULL,
`VIN` varchar(17) NOT NULL,
`Features` varchar(200) NOT NULL,
PRIMARY KEY (`va_option_id`)
) ENGINE=InnoDB AUTO_INCREMENT=297269 DEFAULT CHARSET=utf8;
296303 records
CREATE TABLE `vauto_feed_options_cache` (
`Stock__` varchar(10) DEFAULT NULL,
`VIN` varchar(17) DEFAULT NULL,
`Features` varchar(200) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
58151 records
I just did a similar delete join on a much larger dataset with no issue, but those other tables have nice primary key consistency (1 to 1):
DELETE prod
FROM tblname_cache
INNER JOIN tblname AS prod
  ON prod.company_number = tblname_cache.company_number
 AND prod.record_key = tblname_cache.record_key

You need to add an index on the Stock__ column in both tables.
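A minimal sketch of that, using the table and column names from the question (the index names are made up). Since the second DELETE joins on both Stock__ and VIN, a composite index covers either form:

ALTER TABLE cmdata.vauto_feed_options
  ADD KEY idx_stock_vin (Stock__, VIN);
ALTER TABLE cmdata.vauto_feed_options_cache
  ADD KEY idx_stock_vin (Stock__, VIN);

Without an index on the join columns, each of the ~296k rows in vauto_feed_options has to be compared against the ~58k cache rows, on the order of 17 billion row comparisons, which easily looks like a hang.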

Related

Update a table with longblobs is too slow

I have a table filled with data (about 20,000 records). I am trying to update it with data from another table, but I hit a timeout (30 seconds).
At first I tried a naive solution:
UPDATE TableWhithBlobs a
JOIN AnotherTable b on a.AnotherTableId = b.Id
SET a.SomeText= b.Description;
This script runs for much longer than 30 seconds, so I tried to reduce the join:
UPDATE TableWhithBlobs a
SET a.SomeText = (select b.Description from AnotherTable b where a.AnotherTableId = b.Id);
but this one is still very slow. Is there any way to make it fast?
Edit:
A bit of explanation about what I'm doing. Previously, I had two tables, which in my script are called TableWhithBlobs and AnotherTable. TableWhithBlobs stored a link to AnotherTable, but this link was not a real foreign key; it was just a GUID from AnotherTable, with a unique key constraint on it in TableWhithBlobs. I decided to fix this: remove the old field from TableWhithBlobs and add a normal foreign key (using the primary ID from AnotherTable). The script from the question just fills this new field with the correct data. After that, I delete the old GUID reference and add the new foreign key constraint. Everything works fine with a small amount of data in TableWhithBlobs, but on the QA database with 20,000 rows it's extremely slow.
Update
SHOW CREATE TABLE TableWhithBlobs;
CREATE TABLE `TableWhithBlobs` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`AnotherTableId` char(36) CHARACTER SET ascii NOT NULL,
`ChunkNumber` bigint(20) NOT NULL,
`Content` longblob NOT NULL,
`SomeText` bigint(20) NOT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `AnotherTableId` (`AnotherTableId`,`ChunkNumber`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
SHOW CREATE TABLE AnotherTable ;
CREATE TABLE `AnotherTable` (
`Description` bigint(20) NOT NULL AUTO_INCREMENT,
`Id` char(36) CHARACTER SET ascii NOT NULL,
`Length` bigint(20) NOT NULL,
`ContentDigest` char(68) CHARACTER SET ascii NOT NULL,
`ContentAndMetadataDigest` char(68) CHARACTER SET ascii NOT NULL,
`Status` smallint(6) NOT NULL,
`ChunkStartNumber` bigint(20) NOT NULL DEFAULT '0',
`IsTestData` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`Description`),
UNIQUE KEY `Id` (`Id`),
UNIQUE KEY `ContentAndMetadataDigest` (`ContentAndMetadataDigest`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
PS. Column names may look weird because i want to hide the actual production scheme names.
innodb_buffer_pool_size is 134217728 (128 MB); RAM is 4 GB.
Result of
EXPLAIN UPDATE TableWhithBlobs a
JOIN AnotherTable b ON a.AnotherTableId = b.Id
SET a.SomeText = b.Description;
Version: mysql Ver 14.14 Distrib 5.7.21-20, for debian-linux-gnu (x86_64) using 6.3
Some thoughts, none of which jump out as "the answer":
Increase innodb_buffer_pool_size to 1500M, assuming this does not lead to swapping (see the sketch after this list).
Step back and look at "why" the BIGINT needs to be copied over so often. And whether "all" rows need updating.
Put the LONGBLOB into another table in parallel with the current one. That will add a JOIN for the cases when you need to fetch the blob, but may keep it out of the way for the current query. (I would not expect the blob to be "in the way", but apparently it is.)
What is in the blob? In some situations it is better to keep the blob in a file. A prime example is an image for a web site: it can be referenced via an HTML <img> tag.
Increase the timeout -- but this just "sweeps the problem under the rug" and probably leads to 30+ second delays in other things that are waiting for it. I don't recognize 30 seconds as a default timeout amount; look through SHOW VARIABLES LIKE '%timeout%'; and try increasing any that are 30.
Do the update piecemeal -- but would this have other implications? (Anyway, Luuk should carry this option forward.)
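A minimal sketch for the first suggestion (the server here is 5.7.21, and innodb_buffer_pool_size has been resizable at runtime since MySQL 5.7.5; 1572864000 bytes is 1500M):

-- resize the buffer pool online; the value is in bytes and is
-- rounded to a multiple of innodb_buffer_pool_chunk_size
SET GLOBAL innodb_buffer_pool_size = 1572864000;
-- to keep the setting across restarts, also put it in my.cnf:
-- [mysqld]
-- innodb_buffer_pool_size = 1500M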
What about doing smaller updates?
UPDATE TableWhithBlobs a
JOIN AnotherTable b on a.AnotherTableId = b.Id
SET a.SomeText= b.Description
WHERE a.SomeText <> b.Description;
or even in batches. (MySQL does not allow LIMIT in a multiple-table UPDATE, so the batched version has to fall back to the subquery form:)
UPDATE TableWhithBlobs a
SET a.SomeText = (SELECT b.Description FROM AnotherTable b WHERE b.Id = a.AnotherTableId)
WHERE a.SomeText <> (SELECT b.Description FROM AnotherTable b WHERE b.Id = a.AnotherTableId)
LIMIT 100;
Your timeout problem should be solved 😉, but I do not know how many times you will have to run this before you finally get 0 rows affected...
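If rerunning that by hand gets tedious, here is a minimal stored-procedure sketch that loops until nothing is left to update (the procedure name is made up; the single-table form is used because a multiple-table UPDATE cannot take LIMIT):

DELIMITER //
CREATE PROCEDURE sync_sometext_in_batches()
BEGIN
  DECLARE affected INT DEFAULT 1;
  WHILE affected > 0 DO
    UPDATE TableWhithBlobs a
    SET a.SomeText = (SELECT b.Description FROM AnotherTable b
                      WHERE b.Id = a.AnotherTableId)
    WHERE a.SomeText <> (SELECT b.Description FROM AnotherTable b
                         WHERE b.Id = a.AnotherTableId)
    LIMIT 100;
    SET affected = ROW_COUNT(); -- rows changed by the UPDATE above
  END WHILE;
END //
DELIMITER ;

CALL sync_sometext_in_batches();

With autocommit on, each batch commits on its own, so the long row locks and large undo log of one big UPDATE are avoided.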

Timeout with inner join on two MySQL tables

Here are two tables, with only 50K rows in each:
CREATE TABLE `ps_product_access` (
`id_order` int(10) UNSIGNED NOT NULL DEFAULT '0',
`id_product_access` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `ps_product_access`
ADD KEY `id_order` (`id_order`);
CREATE TABLE `ps_orders` (
`id_order` int(10) UNSIGNED NOT NULL,
`id_order_renew` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `ps_orders`
ADD PRIMARY KEY (`id_order`),
ADD KEY `ps_orders__id_order_renew__index` (`id_order_renew`);
The tables are overly simplified with only the relevant fields. There is no foreign key, but I can't add one right now (data is inconsistent in this database).
This query does not work (by which I mean it loads forever):
SELECT pa.`id_product_access`
FROM `ps_product_access` pa
INNER JOIN `ps_orders` o ON pa.id_order = o.id_order_renew;
I can't understand why. It seems pretty simple, just an inner join. I know I could optimize the query with WHERE EXISTS, but that is not the main question. This query should not load forever, since there is almost no data (50K rows). Did I miss something?
Side note: I ran this query on a fresh install of MySQL 8 (installed via brew on macOS). I saw the same problem with the same data on another computer with a totally different configuration (Ubuntu VM on Windows, MySQL 5).
The column id_order in ps_product_access defaults to 0; maybe you need to check how many rows you have with id_order = 0.
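A minimal check along those lines; if both counts are large, the join produces (count A × count B) result rows for the 0 values alone, which would explain the endless run:

SELECT COUNT(*) FROM ps_product_access WHERE id_order = 0;
SELECT COUNT(*) FROM ps_orders WHERE id_order_renew = 0;

-- if so, exclude the placeholder value from the join:
SELECT pa.`id_product_access`
FROM `ps_product_access` pa
INNER JOIN `ps_orders` o ON pa.id_order = o.id_order_renew
WHERE pa.id_order <> 0;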

'Impossible WHERE noticed...' with MySQL INNER JOIN on LONGTEXT cols. Options?

I am investigating a slow-running SQL query on a production MySQL database and looking for options to improve its performance. I did not design or implement this, but I do need to fix it.
The intended purpose of the SQL is to check whether the same datapacket has previously been inserted, and if so to return the IDs of those previously inserted rows so the data is not duplicated. It attempts to do this with an INNER JOIN of the table on itself via the LONGTEXT 'datapacket' column (which can contain up to 60,000 characters of JSON data). There are currently close to 1 million records in this table, the SQL takes approximately 30-60 s to run each time, and this query runs hundreds to thousands of times each day.
CREATE TABLE `T_Upload` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(30) NOT NULL,
`datapacket` longtext,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_uploadtype` (`type`),
KEY `idx_timestamp` (`timestamp`)
) ENGINE=InnoDB CHARSET=ascii;
EXPLAIN
SELECT priorDuplicate.id
FROM T_Upload u
INNER JOIN T_Upload priorDuplicate
  ON priorDuplicate.datapacket = u.datapacket
 AND u.id > priorDuplicate.id
WHERE u.id = 3277515
  AND u.type = 'mobile'
When I run EXPLAIN on this SQL, I get "Impossible WHERE noticed after reading const tables".
So, my questions are:
Is this SQL always returning an empty recordset, as the EXPLAIN suggests, and therefore a complete waste of system time and resources?
Is converting the LONGTEXT to a VARCHAR(65000) with an INDEX on the first 20 characters (which contain a unique datapacket ID) a viable alternative?
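As a sketch of that second option: MySQL allows a prefix index directly on a LONGTEXT column, so the VARCHAR conversion is not strictly required, and on MySQL 5.7+ a generated hash column is another common way to make full-packet comparisons cheap (the column and index names below are made up):

-- index only the first 20 characters, which hold the unique datapacket ID
ALTER TABLE T_Upload ADD KEY idx_dp_prefix (datapacket(20));

-- alternative: a stored hash of the whole packet, with an ordinary index
ALTER TABLE T_Upload
  ADD COLUMN dp_hash CHAR(32) AS (MD5(datapacket)) STORED,
  ADD KEY idx_dp_hash (dp_hash);

The duplicate check can then join on dp_hash (or on the 20-character prefix) instead of comparing two LONGTEXT values row by row.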

Selecting rows where column value changed from previous row, user variables, innodb

I have a problem similar to
SQL: selecting rows where column value changed from previous row
The accepted answer by ypercube, which I adapted to:
CREATE TABLE `schange` (
`PersonID` int(11) NOT NULL,
`StateID` int(11) NOT NULL,
`TStamp` datetime NOT NULL,
KEY `tstamp` (`TStamp`),
KEY `personstate` (`PersonID`, `StateID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `states` (
`StateID` int(11) NOT NULL AUTO_INCREMENT,
`State` varchar(100) NOT NULL,
`Available` tinyint(1) NOT NULL,
`Otherstatuseshere` tinyint(1) NOT NULL,
PRIMARY KEY (`StateID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SELECT
COALESCE((@statusPre <> s.Available), 1) AS statusChanged,
c.PersonID,
c.TStamp,
s.*,
@statusPre := s.Available
FROM schange c
INNER JOIN states s USING (StateID),
(SELECT @statusPre := NULL) AS d
WHERE PersonID = 1 AND TStamp > '2012-01-01' AND TStamp < '2013-01-01'
ORDER BY TStamp;
The query itself worked just fine in testing, and with the right mix of temporary tables I was able to generate reports with daily availability sums from a huge pile of data in virtually no time at all.
The real problem came when I discovered that the tables were using the MyISAM engine, which we have completely abandoned. I recreated the tables with InnoDB and noticed the query no longer works as expected.
After some bashing of my head into the wall, I discovered that MyISAM seems to go over the columns of each row in order (selecting statusChanged before updating @statusPre), while InnoDB seems to do all the variable assignments first and only then populate the result rows, regardless of whether the assignment happens in the SELECT or WHERE clause, in functions (COALESCE, greater-than, etc.), in subqueries, or otherwise.
Trying to accomplish this in a query without variables always seems to end the same way: a subquery that takes exponentially more time the more rows are in the set, resulting in an excruciating minutes- or hours-long wait to get the beginning and ending events for a single status, while a finished report should include daily sums for several.
Can this type of query work on the InnoDB engine, and if so, how should one go about it?
Or is the only feasible option to move to a database product that supports WITH statements?
Removing
KEY personstate (PersonID, StateID)
fixes the problem.
No idea why, though. It was not really required anyway; the timestamp key is the more important one and speeds the query up nicely.
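For what it's worth, on MySQL 8.0+ the user variables can be dropped entirely in favor of a window function, which sidesteps the engine-dependent evaluation order; a sketch adapted from the query above:

SELECT
  COALESCE(s.Available <> LAG(s.Available) OVER (ORDER BY c.TStamp), 1) AS statusChanged,
  c.PersonID,
  c.TStamp,
  s.*
FROM schange c
INNER JOIN states s USING (StateID)
WHERE PersonID = 1 AND TStamp > '2012-01-01' AND TStamp < '2013-01-01'
ORDER BY TStamp;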

Optimizing MySQL table structure. Advice needed

I have these table structures and, while they work, using EXPLAIN on certain SQL queries gives 'Using temporary; Using filesort' on one of the tables. This might hamper performance once the table is populated with thousands of rows. Below are the table structures and an explanation of the system.
CREATE TABLE IF NOT EXISTS `jobapp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fullname` varchar(50) NOT NULL,
`icno` varchar(14) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `icno` (`icno`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `jobapplied` (
`appid` int(11) NOT NULL,
`jid` int(11) NOT NULL,
`jobstatus` tinyint(1) NOT NULL,
`timestamp` int(10) NOT NULL,
KEY `jid` (`jid`),
KEY `appid` (`appid`)
) ENGINE=MyISAM;
The query I tried which gives the aforementioned statement:
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC
This system is for recruiting new staff. Once registration opens, hundreds of applicants will register in a short time. They are allowed to select 5 different jobs. Later, at the end of the registration session, the admin goes through each job one by one. I have used a single table (jobapplied) to store 2 items (applicant id, job id) to record who applied for what, and this is the table that causes the aforementioned statement. I realize this table is without a PRIMARY KEY, but I just can't figure out any other way for the admin to later search specifically for who applied to which job.
Any advice on how I can optimize the tables?
Apart from the missing indexes and primary keys others have mentioned . . .
"This might hamper performance once the table is populated with thousands of data."
You seem to be assuming that the query optimizer will use the same execution plan on a table with thousands of rows as it will on a table with just a few rows. Optimizers don't work like that.
The only reliable way to tell how a particular vendor's optimizer will execute a query on a table with thousands of rows--which is still a small table, and will probably easily fit in memory--is to
load a scratch version of the database with thousands of rows
"explain" the query you're interested in
FWIW, the last test I ran like this involved close to a billion rows--about 50 million in each of about 20 tables. The execution plan for that query--which included about 20 left outer joins--was a lot different than it was for the sample data (just a few thousand rows).
You are ordering by jobapp.timestamp, but there is no index on timestamp, so the filesort (and probably the temporary) will be necessary. Try adding an index on timestamp to jobapp, something like KEY timid (timestamp, id).
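A minimal sketch of both fixes (the timestamp index suggested above, plus a primary key for jobapplied as the other answers mention; the primary key assumes each applicant applies to a given job at most once):

ALTER TABLE jobapp ADD KEY timid (`timestamp`, id);
ALTER TABLE jobapplied ADD PRIMARY KEY (appid, jid);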