I'm trying to delete (and update, but if I can delete then I'll be able to update) product data from a MySQL website database using SSIS, when those products have been marked as discontinued in our ERP (and in the SQL Server database used for reporting). I've tried the following:
First attempt: saving the rows to be deleted to a recordset and using a Foreach Loop container with an Execute SQL Task to delete them, as described here.
Result: this partially works, but it is extremely slow and fails after about 500 deletes each time. It makes me wonder whether the MySQL database has some kind of hacker-protection feature.
Second attempt: converting the primary keys of all rows to be deleted into a comma-separated string variable using FOR XML PATH, as described here (or rather, a series of such variables because of the 4000-character limit).
SQL SELECT code (works fine):
WITH CTE (Product_sku,rownumber) AS
(
SELECT product_sku
, row_number() over(order by product_sku)
FROM product_updates
WHERE action = 'delete'
)
SELECT
Delete1= cast(
(SELECT TOP 1
STUFF(
(SELECT ',''' + product_sku+'''' FROM CTE
WHERE cte.RowNumber BETWEEN 1 and 700
FOR XML PATH (''))
, 1, 1, '') )
AS varchar(8000))
... and nine more of these select statements into additional variables to allow for larger delete operations.
I then use these results to delete records from MySQL with an Execute SQL Task containing the following code:
DELETE FROM datarepo.product
WHERE product_sku in (?)
Result: the package executed but failed to delete anything. The MySQL query log shows the following, which explains why nothing was deleted:
DELETE FROM datarepo.product
WHERE product_sku in ('\'')
Note that the same SSIS Execute SQL statement deletes just fine when it uses hardcoded values like the following:
DELETE FROM datarepo.product
WHERE product_sku in ('1234','5678','abcd', etc...)
I haven't been able to find anything else online. As Reza Rad said in the first linked post, it's hard to find material about using SSIS to perform operations on MySQL.
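A likely explanation for the log line: a parameter marker stands for exactly one value, so binding the whole comma-separated string to the single `?` makes MySQL compare product_sku against one long string literal, and the `\'` in the log suggests the quotes inside that string were escaped as data. The usual workaround, whatever the client, is to generate one placeholder per key and send the keys in moderate batches. The sketch below shows that idea in Python rather than SSIS, assuming a DB-API driver with `%s` placeholders such as mysql.connector; `plan_deletes` and `run_deletes` are hypothetical helper names, and the batch size of 500 is only an example.

```python
from typing import List, Sequence, Tuple


def plan_deletes(skus: Sequence[str], batch_size: int = 500) -> List[Tuple[str, Tuple[str, ...]]]:
    """Split skus into batches and build one parameterized DELETE per batch.

    Each SKU gets its own %s placeholder, so the driver quotes and escapes
    every value separately instead of treating the list as one string.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    plans = []
    for start in range(0, len(skus), batch_size):
        batch = tuple(skus[start:start + batch_size])
        placeholders = ", ".join(["%s"] * len(batch))
        sql = f"DELETE FROM datarepo.product WHERE product_sku IN ({placeholders})"
        plans.append((sql, batch))
    return plans


def run_deletes(connection, skus: Sequence[str], batch_size: int = 500) -> int:
    """Execute the planned DELETEs on a DB-API connection; return rows deleted."""
    cursor = connection.cursor()
    total = 0
    try:
        for sql, params in plan_deletes(skus, batch_size):
            cursor.execute(sql, params)
            total += cursor.rowcount
        connection.commit()  # single commit at the end; commit per batch if preferred
    finally:
        cursor.close()
    return total
```

Keeping each batch to a few hundred keys keeps the statement text short while still avoiding one round trip per row.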
I'm writing a script that locates all branches of a specific repo that haven't received any commits for more than 6 months and deletes them (after notifying committers).
This script will run from Jenkins every week, will store all these branches in some MySQL database and then in the next run (after 1 week), will pull the relevant branch names from the database and will delete them.
I want to make sure that if for some reason the script is run twice on the same day, relevant branches will not get added again to the database, so I check it using a SQL query:
def insert_data(branch_name):
    try:
        connection = mysql.connector.connect(user=db_user,
                                             host=db_host,
                                             database=db_name,
                                             passwd=db_pass)
        cursor = connection.cursor(buffered=True)
        insert_query = """insert into {0}
            (
                branch_name
            )
            VALUES
            (
                \"{1}\"
            ) where not exists (select 1 from {0} where branch_name = \"{1}\" and deletion_date is NULL) ;""".format(
            db_table,
            branch_name
        )
        cursor.execute(insert_query, multi=True)
        connection.commit()
    except Exception as ex:
        print(ex)
    finally:
        cursor.close()
        connection.close()
When I run the script, the branch_name value is, for some reason, cut in the middle, and the query that checks whether the branch name already exists in the database fails:
1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'where not exists (select 1 from branches_to_delete where branch_name = `AUT-1868' at line 8
So instead of checking for 'AUT-18681_designer_create_new_name_if_illegal_char_exist' it checks for 'AUT-1868' which doesn't exist in the database.
I've tried the following:
'{1}'
"{1}"
{1}
But to no avail.
What am I doing wrong?
Using a WHERE clause in an INSERT INTO ... VALUES query is invalid:
INSERT INTO `some_table`(`some_column`)
VALUES ('some_value') WHERE [some_condition]
So the above example is not a valid MySQL query. To prevent duplicate branch_name values, you should add a unique index to your table, like this:
ALTER TABLE `table` ADD UNIQUE INDEX `unique_branch_name` (`branch_name`);
After that, you can use the following query:
INSERT INTO `table` (`branch_name`) VALUES ('branch_name_1')
ON DUPLICATE KEY UPDATE `branch_name` = `branch_name`;
Note: if your table has an auto-increment id, it will be incremented on each insert attempt.
Since MySQL 8.0, you can use the JSON_TABLE function to generate a pseudo-table from your values, filter out the values that already exist, and insert the rest. Look here for an example.
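For reference, here is how the unique-index approach might look from the question's Python script, with the branch name passed as a bound parameter instead of being formatted into the SQL text. This is a sketch assuming mysql.connector (or another driver using `%s` placeholders) and that the unique index above already exists; `upsert_query` and `add_branch` are hypothetical names. The table name cannot be bound as a parameter, so it must come from trusted configuration.

```python
def upsert_query(table: str) -> str:
    """INSERT that keeps the existing row when branch_name is a duplicate.

    `table` is interpolated as an identifier, so it must be trusted input;
    the branch name itself is left as a %s placeholder for the driver.
    """
    return (
        f"INSERT INTO `{table}` (`branch_name`) VALUES (%s) "
        "ON DUPLICATE KEY UPDATE `branch_name` = `branch_name`"
    )


def add_branch(connection, table: str, branch_name: str) -> None:
    """Insert branch_name once; relies on the unique index on branch_name."""
    cursor = connection.cursor()
    try:
        cursor.execute(upsert_query(table), (branch_name,))
        connection.commit()
    finally:
        cursor.close()
```

One difference worth checking: a plain unique index on branch_name also blocks re-adding a branch whose earlier row already has a deletion_date, which the question's `deletion_date is NULL` condition would allow.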
I don't see anything wrong, assuming the source of branch_name is safe (that is, you are not open to SQL injection attacks), but as an experiment you might try:
insert_query = f"""insert into {db_table}(branch_name) VALUES(%s) where not exists
(select 1 from {db_table} where branch_name = %s and deletion_date is NULL)"""
cursor.execute(insert_query, (branch_name, branch_name))
I am using a prepared statement (which is also safe against SQL injection attacks), passing branch_name as parameters to the execute method, and I have also removed the multi=True parameter.
Update
I feel like a bit of a dummy for missing what is clearly an illegal WHERE clause. Nevertheless, the rest of the answer suggesting the use of a prepared statement is advice worth following, so I will keep this posted.
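The WHERE NOT EXISTS check itself can be kept by switching to the INSERT ... SELECT form, which does accept a WHERE clause, combined with bound parameters. A sketch assuming mysql.connector-style `%s` placeholders; `insert_if_absent_query` and `insert_branch` are hypothetical names, and `FROM DUAL` is MySQL's dummy table name for a SELECT that has a WHERE clause but reads no real table.

```python
def insert_if_absent_query(table: str) -> str:
    """INSERT ... SELECT that adds a row only if no active row exists.

    INSERT ... VALUES cannot take a WHERE clause, but INSERT ... SELECT can.
    `table` must be trusted input; both %s markers receive the branch name.
    """
    return (
        f"INSERT INTO {table} (branch_name) "
        "SELECT %s FROM DUAL "
        f"WHERE NOT EXISTS (SELECT 1 FROM {table} "
        "WHERE branch_name = %s AND deletion_date IS NULL)"
    )


def insert_branch(connection, table: str, branch_name: str) -> int:
    """Run the insert; return the affected-row count (expected 1 or 0)."""
    cursor = connection.cursor()
    try:
        cursor.execute(insert_if_absent_query(table), (branch_name, branch_name))
        connection.commit()
        return cursor.rowcount
    finally:
        cursor.close()
```

Without a unique index, two runs at exactly the same moment could in principle both insert; for a weekly Jenkins job that is unlikely, but the unique-index approach closes that gap.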
I am running the query below to find duplicate records in an existing table. This is the process I follow:
1. Upload a CSV.
2. Load the data into a temporary table.
3. Run a query that joins the temporary table with the current table to check whether records with the same phone number already exist.
My query works fine with up to 100,000 records in the current table, but on the live system the current table holds more than 10,000,000 records, so the query times out.
My query is:
select
tempTbl.id + 1 as SrNo,
`tempTbl`.`phone` as `phone`,
( CASE WHEN count(panelists.id) > 1 THEN
CONCAT( CONCAT('Phone Already Exist with Panelist ID ', panelists.id
,' and Duplicate counts is ',count(panelists.id))
ELSE
CONCAT('Phone Already Exist with Panelist ID ',' ',panelists.id)
END
) AS reason from `panelists` as `panelists`
inner join `temp` as `tempTbl` on `panelists`.`phone` = `tempTbl`.`phone`
where `panelists`.`panel_id` = ? group by `tempTbl`.`phone`
having tempTbl.phone != ''
I would be grateful for any suggestions to optimize my query. Thanks in advance.
Have you checked and tried increasing the default value in the MySQL configuration file (the connect_timeout option in the [mysqld] section)?
[mysqld]
connect_timeout=100
Please check and compare the configuration parameters on both database servers. If the query executes successfully on one server, comparing the configuration parameters of the two can show the difference.
I have 10 tables with the same structure; only the table names differ.
I have a stored procedure (SP) defined as follows:
select * from table1 where (@param1 IS NULL OR col1=@param1)
UNION ALL
select * from table2 where (@param1 IS NULL OR col1=@param1)
UNION ALL
...
...
UNION ALL
select * from table10 where (@param1 IS NULL OR col1=@param1)
I call the SP with the following line:
call mySP('test') -- executes in 6.836 s
Then I opened a new standard query window, copied the query above, and replaced @param1 with 'test'.
This executed in 0.321 s, about 20 times faster than the stored procedure.
I changed the parameter value repeatedly to prevent the result from being cached, but this did not change anything: the SP is still about 20 times slower than the equivalent standard query.
Can you help me figure out why this is happening?
Has anybody encountered similar issues?
I am using MySQL 5.0.51 on Windows Server 2008 R2 64-bit.
Edit: I am using Navicat for testing.
Any idea will be helpful for me.
EDIT1:
I have just done some tests based on Barmar's answer.
Finally, I changed the SP to a single statement:
SELECT * FROM table1 WHERE col1=@param1 AND col2=@param2
First, I executed the standard query:
SELECT * FROM table1 WHERE col1='test' AND col2='test' -- executed in 0.020 s
Then I called my SP:
CALL MySp('test','test') -- executed in 0.466 s
So I changed the WHERE clause entirely, but nothing changed. I also called the SP from the mysql command-line client instead of Navicat and got the same result. I am still stuck on it.
My SP DDL:
CREATE DEFINER = `myDbName`@`%`
PROCEDURE `MySP` (param1 VARCHAR(100), param2 VARCHAR(100))
BEGIN
SELECT * FROM table1 WHERE col1=param1 AND col2=param2;
END
col1 and col2 are covered by a combined index.
You might ask why I don't just use the standard query. My software design doesn't allow that; I must use a stored procedure, so this problem is very important to me.
EDIT2:
I have collected query profile information. The big difference comes from the "Sending data" row in the SP profile: the Sending data stage takes 99% of the execution time. I am testing on a local database server, not connecting from a remote computer.
SP profile information (screenshot)
Query profile information (screenshot)
I tried a FORCE INDEX hint in my SP, as below, but got the same result.
SELECT * FROM table1 FORCE INDEX (col1_col2_combined_index) WHERE col1=@param1 AND col2=@param2
Then I changed the SP as follows:
EXPLAIN SELECT * FROM table1 FORCE INDEX (col1_col2_combined_index) WHERE col1=param1 AND col2=param2
This gave the following result:
id: 1
select_type: SIMPLE
table: table1
type: ref
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 292004
Extra: Using where
Then I executed the query below:
EXPLAIN SELECT * FROM table1 WHERE col1='test' AND col2='test'
The result is:
id: 1
select_type: SIMPLE
table: table1
type: ref
possible_keys: col1_co2_combined_index
key: col1_co2_combined_index
key_len: 76
ref: const,const
rows: 292004
Extra: Using where
I am using a FORCE INDEX hint in the SP, but MySQL still refuses to use the index. Any idea? I think I am close to the end :)
Just a guess:
When you run the query by hand, the expression WHERE ('test' IS NULL or COL1 = 'test') can be optimized when the query is being parsed. The parser can see that the string 'test' is not null, so it converts the test to WHERE COL1 = 'test'. And if there's an index on COL1 this will be used.
However, when you create a stored procedure, parsing occurs when the procedure is created. At that time, it doesn't know what @param will be, and has to implement the query as a sequential scan of the table.
Try changing your procedure to:
IF @param IS NULL
THEN BEGIN
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
...
END;
ELSE BEGIN
SELECT * FROM table1 WHERE col1 = @param
UNION ALL
SELECT * FROM table2 WHERE col1 = @param
...
END;
END IF;
I don't have much experience with MySQL stored procedures, so I'm not sure that's all the right syntax.
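The same branching idea can be illustrated client-side: emit no WHERE clause when the parameter is NULL, and a plain `col1 = %s` otherwise, so every query the server receives is one an index on col1 can serve. A sketch in Python, assuming identically structured tables and a driver with `%s` placeholders such as mysql.connector; `build_union` is a hypothetical helper, not part of any library.

```python
from typing import Optional, Sequence, Tuple


def build_union(tables: Sequence[str], param: Optional[str]) -> Tuple[str, Tuple[str, ...]]:
    """Build a UNION ALL over identical tables, branching on whether param is NULL.

    Instead of `(param IS NULL OR col1 = param)`, each branch gets either no
    filter at all or a simple equality. Table names are interpolated into the
    SQL text, so they must come from trusted code, never from user input.
    """
    if param is None:
        parts = [f"SELECT * FROM {t}" for t in tables]
        params: Tuple[str, ...] = ()
    else:
        parts = [f"SELECT * FROM {t} WHERE col1 = %s" for t in tables]
        params = (param,) * len(tables)
    return "\nUNION ALL\n".join(parts), params
```

The caller would pass the returned pair straight to `cursor.execute(sql, params)`.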
Possible character set issue? If your table character set is different from your database character set, this may be causing a problem.
See this bug report: http://bugs.mysql.com/bug.php?id=26224
[12 Nov 2007 21:32] Mark Kubacki: Still no luck with 5.1.22_rc - keys are ignored, query takes within a procedure 36 seconds and outside 0.12s.
[12 Nov 2007 22:30] Mark Kubacki: After having changed charsets to UTF-8 (especially for the two used), which is used for the connection anyways, keys are taken into account within the stored procedure!
The question I cannot answer is: Why does the optimizer treat charset conversions an other way within and outside stored procedures? (Indeed, I might be wrong asking this.)
Interesting question, because I am fond of using stored procedures, for maintainability and encapsulation.
This is what I found:
http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html
It states that the query cache is not used for queries that:
1. are a subquery of an outer query, or
2. are executed within the body of a stored procedure, trigger, or event.
This implies that it works as designed.
I had seen this behavior, but it wasn't related to the character set.
I had a table that held self-referencing hierarchical data (a parent with children, some of which had children of their own, and so on). Since parent_id had to reference the primary ids (and the column had a constraint to that effect), I couldn't set the parent id to NULL or 0 (zero) to disassociate a child from a parent, so I simply pointed it at itself.
When I ran a stored procedure that performs the recursive query to find all children (at all levels) of a particular parent, the query took between 30 and 40 times as long to run. I found that altering the query used by the stored procedure to exclude the top-level parent record (by specifying WHERE parent_id != id) restored the performance of the query.
The stored procedure I'm using is based on the one shown in:
https://stackoverflow.com/questions/27013093/recursive-query-emulation-in-mysql.
I'm new to SSIS and need help with this one. I found an article that describes how to detect rows that already exist and rows that have changed. The part I'm missing is how to update the rows that changed. Some articles say that it's also a good solution to delete the changed records and insert them again as new records. The thing is, I don't know how to do that deleting step (red box).
Any suggestions?
If you have to delete the rows within the Data Flow Task, then you need to use the OLE DB Command transformation and write a DELETE statement like DELETE FROM dbo.Table WHERE ColumnName = ?. Then, in the column mappings of the OLE DB Command transformation, map the parameter represented by the question mark to the data that comes from the previous transformation; in your case, that is the data coming from Union All 2.
However, I wouldn't recommend that option because OLE DB Command executes for every row and it might slow down your package if there are too many rows.
I would recommend something like this:
Redirect the output from Union All 2 to a temporary staging table (say dbo.Staging) using an OLE DB Destination.
Let's assume that your final destination table is dbo.Destination. Now your Staging table holds all the records that should be deleted from the Destination table.
On the Control Flow tab, place an Execute SQL Task after the Data Flow Task. In the Execute SQL Task, write an SQL statement (or call a stored procedure) that joins the records in Staging and Destination and deletes all matching rows from the Destination table.
Also, place another Execute SQL Task before the Data Flow Task, and use it to delete/truncate the rows in the Staging table.
Something like this might work to delete the rows:
DELETE D
FROM dbo.Destination D
INNER JOIN dbo.Staging S
ON D.DestinationId = S.StagingId
Hope that helps.
In addition to user756519's answer: if you have millions of records to delete, the last step (4), the Execute SQL Task with the DELETE statement, can be done in batches with something like this:
WHILE (1=1)
BEGIN
DELETE D
FROM dbo.Destination D
INNER JOIN
(
-- select a batch of ids that should be removed from the table
SELECT TOP(10000) D1.DestinationId
FROM dbo.Destination AS D1
INNER JOIN dbo.Staging AS S
ON D1.DestinationId = S.StagingId
) AS R
ON D.DestinationId = R.DestinationId;
IF @@ROWCOUNT < 1 BREAK
-- info message
DECLARE @timestamp VARCHAR(50)
SELECT @timestamp = CAST(GETDATE() AS VARCHAR(50))
RAISERROR ('Chunk deleted %s', 10, 1, @timestamp) WITH NOWAIT
END
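The batching loop can be tried out on a small scale with Python's built-in sqlite3, again standing in for SQL Server. This sketch follows the first answer's convention that staging holds the keys to delete; TOP(n) becomes LIMIT inside a subquery (SQLite does not allow LIMIT on DELETE unless compiled with a special option), and the loop stops when a batch deletes nothing, playing the role of the `@@ROWCOUNT < 1` check. Table and function names are illustrative.

```python
import sqlite3
from typing import List


def make_demo(dest_count: int, staged_count: int) -> sqlite3.Connection:
    """destination holds ids 1..dest_count; staging holds ids 1..staged_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE destination (destination_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE staging (staging_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO destination VALUES (?)",
                     [(i,) for i in range(1, dest_count + 1)])
    conn.executemany("INSERT INTO staging VALUES (?)",
                     [(i,) for i in range(1, staged_count + 1)])
    conn.commit()
    return conn


def delete_in_batches(conn: sqlite3.Connection, batch_size: int = 10000) -> List[int]:
    """Delete staged keys from destination in chunks; return per-chunk counts."""
    chunks = []
    while True:
        cur = conn.execute(
            "DELETE FROM destination WHERE destination_id IN ("
            " SELECT d.destination_id FROM destination AS d"
            " JOIN staging AS s ON d.destination_id = s.staging_id"
            " LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per chunk
        if cur.rowcount < 1:  # same role as IF @@ROWCOUNT < 1 BREAK
            break
        chunks.append(cur.rowcount)
    return chunks
```

With 30 destination rows, 25 staged keys and a batch size of 10, the loop deletes 10, 10 and then 5 rows before stopping.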
I have a MySQL table of tasks to perform, each row having parameters for a single task.
There are many worker apps (possibly on different machines), performing tasks in a loop.
The apps access the database using MySQL's native C APIs.
In order to own a task, an app does something like this:
Generate a globally-unique id (for simplicity, let's say it is a number)
UPDATE tasks
SET guid = %d
WHERE guid = 0 LIMIT 1
SELECT params
FROM tasks
WHERE guid = %d
If the last query returns a row, we own that task and have the parameters to run it.
Is there a way to achieve the same effect (i.e. 'own' a row and get its parameters) in a single call to the server?
Try it like this:
UPDATE `lastid` SET `idnum` = (SELECT `id` FROM `history` ORDER BY `id` DESC LIMIT 1);
The code above worked for me.
You can create a procedure that does it:
CREATE PROCEDURE prc_get_task (IN in_guid BINARY(16), OUT out_params VARCHAR(200))
BEGIN
DECLARE task_id INT;
SELECT id, params
INTO task_id, out_params
FROM tasks
WHERE guid = 0
LIMIT 1
FOR UPDATE;
UPDATE tasks
SET guid = in_guid
WHERE id = task_id;
END;
START TRANSACTION;
CALL prc_get_task(@guid, @params);
COMMIT;
If you are looking for a single query, it can't be done. An UPDATE statement returns only the number of rows that were affected. Similarly, a SELECT statement doesn't alter a table; it only returns values.
Using a procedure will indeed turn it into a single call, and it can be handy if locking is a concern for you. If your biggest concern is network traffic (i.e. sending too many queries), then use the procedure. If your concern is server overload (i.e. the DB is working too hard), then the extra overhead of a procedure could make things worse.
I have exactly the same issue. We ended up using PostgreSQL instead, with UPDATE ... RETURNING:
The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. The new (post-update) values of the table's columns are used. The syntax of the RETURNING list is identical to that of the output list of SELECT.
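Example: UPDATE my_table SET status = 1 WHERE id = (SELECT id FROM my_table WHERE status = 0 LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING *;
Or, in your case: UPDATE tasks SET guid = %d WHERE id = (SELECT id FROM tasks WHERE guid = 0 LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING params;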
Sorry, I know this doesn't answer the question with MySQL, and it might not be easy to just switch to PostgreSQL, but it's the best way we've found to do it. Even 6 years later, MySQL still doesn't support UPDATE ... RETURNING. It might be added at some point in the future, but for now MariaDB only has it for DELETE statements.
Edit: There is a task (low priority) to add UPDATE ... RETURNING support to MariaDB.
I don't know about the single call part, but what you're describing is a lock. Locks are an essential element of relational databases.
I don't know the specifics of locking a row, reading it, and then updating it in MySQL, but with a bit of reading of the MySQL locking documentation you could do all kinds of lock-based manipulations.
The PostgreSQL documentation on locking has a great example describing exactly what you want to do: lock the table, read the table, modify the table.
UPDATE tasks
SET guid = %d, params = @params := params
WHERE guid = 0 LIMIT 1;
This returns the number of affected rows, 1 or 0, depending on whether a row was actually changed.
SELECT @params AS params;
This one just reads the variable from the current connection.
From: here
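From a client, the two statements above have to run on the same connection, because @params is a session variable. A sketch assuming mysql.connector-style `%s` placeholders; `claim_statements` and `claim_task` are hypothetical names. It is still two round trips, but the claim itself is one atomic UPDATE, and the affected-row count is checked first because @params keeps its previous value when no row is claimed. Recent MySQL 8.0 releases may emit a deprecation warning for assigning user variables inside expressions.

```python
from typing import Optional, Tuple


def claim_statements() -> Tuple[str, str]:
    """The UPDATE that claims one task and captures params, plus the read-back."""
    update = (
        "UPDATE tasks SET guid = %s, params = (@params := params) "
        "WHERE guid = 0 LIMIT 1"
    )
    read_back = "SELECT @params AS params"
    return update, read_back


def claim_task(connection, guid: int) -> Optional[str]:
    """Try to claim one unowned task; return its params, or None if none is left.

    Both statements run on one connection, since @params is session-scoped.
    The caller commits (mysql.connector starts with autocommit off).
    """
    update, read_back = claim_statements()
    cursor = connection.cursor()
    try:
        cursor.execute(update, (guid,))
        if cursor.rowcount != 1:
            # Nothing was claimed; @params may still hold an older value.
            return None
        cursor.execute(read_back)
        rows = cursor.fetchall()
        return rows[0][0] if rows else None
    finally:
        cursor.close()
```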