Merge databases with duplicate object IDs - MySQL

I have two databases that I would like to merge. The problem is that there are around 20 relevant tables, linked to each other through unique object IDs. For example, the table name:

name        object_id
FirstName   500

then it has tables like Items:

item_name   object_id   item_id
itemNr1     500         400

and the third table would be Items_specialty:

specialty   item_id   specialty_id
power1      400       600
As you can see, they are all tied together: name's object_id is attached to Items, and Items' item_id is attached to Items_specialty.
However, in the two databases the object_id, item_id and specialty_id values are duplicated, and when talking about nearly 100,000 rows it gets complicated; the risk of losing object IDs is high, and if that happened, names would end up with the wrong items, etc. So what would be the best way to merge the databases while keeping every object ID tied to its specific name, following the trail through the tables and updating them all together?
An ideal solution would check whether object_id+1 is unused and, if so, apply it and then do the same in all the dependent tables, repeating the process for item_id and specialty_id, so that at the end the same name still holds the same item and specialty.
I'd really appreciate any tips or possible solutions to explore. I've searched the internet far and wide, but short of paying thousands for a tool I can't seem to find a solution that fits my issue, as usually people only need to merge a couple of tables instead of many like mine.
Thank you in advance.

(The queries used in this method are for SQL Server; you will need the equivalent statements to run this in MySQL.)
Merging databases requires a thorough understanding of the data and the design of the database.
For example, the following approach can be used to merge these two databases:
1- With the following command, you can drop all the constraints of the second database in SQL Server:
use db2;
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';
SELECT @sql = @sql + N'
ALTER TABLE ' + QUOTENAME(s.name) + N'.'
+ QUOTENAME(t.name) + N' DROP CONSTRAINT '
+ QUOTENAME(c.name) + ';'
FROM sys.objects AS c
INNER JOIN sys.tables AS t
ON c.parent_object_id = t.[object_id]
INNER JOIN sys.schemas AS s
ON t.[schema_id] = s.[schema_id]
WHERE c.[type] IN ('D','C','F','PK','UQ')
ORDER BY c.[type];
--PRINT @sql;
EXEC sys.sp_executesql @sql;
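In MySQL there is no single command that drops every constraint, but for the duration of the merge you can disable foreign key enforcement for your session instead. A minimal sketch (FOREIGN_KEY_CHECKS is a standard MySQL session variable):
-- MySQL: skip FK validation for this session while merging
SET FOREIGN_KEY_CHECKS = 0;
-- ... run the merge statements here ...
SET FOREIGN_KEY_CHECKS = 1;  -- re-enable afterwards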
2- In the first step, we merge the tables that have no foreign keys into the first database. As we insert each row into the first database's table with its newly generated key (I assume your tables use identity columns), we update the related tables in the second database at the same time. For example, look at the following command:
declare @I int = 0
declare @count_t1 int = (select count(*) from table1)
DECLARE @LAST_object_id INT = 0;
DECLARE @OLD_object_id INT = 0;
-- copy db2's table1 one row at a time, remapping object_id as we go
WHILE @I < @count_t1
BEGIN
SET @LAST_object_id = 0;
SET @OLD_object_id = 0;
INSERT INTO db1.dbo.table1
(
name
)
SELECT
name
FROM db2.dbo.table1
ORDER BY db2.dbo.table1.object_id
OFFSET @I ROWS FETCH FIRST 1 ROWS ONLY
-- the identity value just generated in db1
SET @LAST_object_id = (SELECT TOP(1) object_id FROM db1.dbo.table1 ORDER BY db1.dbo.table1.object_id DESC)
-- the original key of that same row in db2
SET @OLD_object_id = (SELECT object_id FROM db2.dbo.table1 ORDER BY db2.dbo.table1.object_id OFFSET @I ROWS FETCH FIRST 1 ROWS ONLY)
-- rewrite the foreign key in db2's dependent table to the new value
UPDATE db2.dbo.table2
SET object_id = @LAST_object_id
WHERE object_id = @OLD_object_id
SET @I = @I + 1
END
3- In this step, we merge the tables that have foreign keys into the first database, knowing that their foreign key values were already updated in step 2 (a sketch follows after step 4).
4- Repeat step 3 down through the depth of the database until you reach tables whose primary key is not a foreign key of any other table, then merge those tables into the first database.
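As an illustration of step 3 (a sketch only: table2 stands for the Items table with item_id as its identity column, table3 for Items_specialty, and object_id is assumed to be already remapped by step 2):
declare @I int = 0
declare @count_t2 int = (select count(*) from db2.dbo.table2)
DECLARE @NEW_item_id INT;
DECLARE @OLD_item_id INT;
WHILE @I < @count_t2
BEGIN
-- object_id was already rewritten by step 2, so it is copied as-is
INSERT INTO db1.dbo.table2 (item_name, object_id)
SELECT item_name, object_id
FROM db2.dbo.table2
ORDER BY db2.dbo.table2.item_id
OFFSET @I ROWS FETCH FIRST 1 ROWS ONLY
SET @NEW_item_id = SCOPE_IDENTITY()
SET @OLD_item_id = (SELECT item_id FROM db2.dbo.table2 ORDER BY db2.dbo.table2.item_id OFFSET @I ROWS FETCH FIRST 1 ROWS ONLY)
-- carry the freshly generated item_id down to the child table in db2
UPDATE db2.dbo.table3
SET item_id = @NEW_item_id
WHERE item_id = @OLD_item_id
SET @I = @I + 1
END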
Remember: if the identity values in the first database's tables are lower than those in the second database's tables, the probability of errors in this method increases. So you need to control how the identity grows in SQL Server with the following command:
DECLARE @MAX_0_object_id INT;
DECLARE @MAX_1_object_id INT;
SET @MAX_0_object_id = IDENT_CURRENT('db1.dbo.table1')
SET @MAX_1_object_id = IDENT_CURRENT('db2.dbo.table1')
IF @MAX_1_object_id > @MAX_0_object_id
BEGIN
DBCC CHECKIDENT ('db1.dbo.table1', RESEED, @MAX_1_object_id)
END
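For MySQL specifically, a different technique avoids the row-by-row loop entirely: shift every id in the second database by a fixed offset larger than any id in the first database, so the two id ranges become disjoint and whole tables can be bulk-inserted with their links intact. A sketch only, assuming the three tables from the question, plain integer id columns, and 100000 as a safely large offset:
SET FOREIGN_KEY_CHECKS = 0;  -- links are temporarily inconsistent while shifting

-- assumption: 100000 exceeds every existing id in db1
SET @offset = 100000;

UPDATE db2.name SET object_id = object_id + @offset;
UPDATE db2.Items SET object_id = object_id + @offset,
                     item_id   = item_id   + @offset;
UPDATE db2.Items_specialty SET item_id      = item_id      + @offset,
                               specialty_id = specialty_id + @offset;

-- the ranges no longer overlap, so plain bulk inserts preserve all relationships
INSERT INTO db1.name (name, object_id)
SELECT name, object_id FROM db2.name;

INSERT INTO db1.Items (item_name, object_id, item_id)
SELECT item_name, object_id, item_id FROM db2.Items;

INSERT INTO db1.Items_specialty (specialty, item_id, specialty_id)
SELECT specialty, item_id, specialty_id FROM db2.Items_specialty;

SET FOREIGN_KEY_CHECKS = 1;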

Related

Matlab Bulk Update MySQL Table

I want to update a MySQL table from matlab in bulk. The current logic that I use iterates over the array and inserts it one-by-one which takes way too long.
Here is my current implementation-
function update_table(customer_id_list, cluster_id_list, write_conn)
num_customers = size(customer_id_list, 1);
for idx = 1:num_customers
customer_id = customer_id_list(idx);
cluster_id = cluster_id_list(idx);
sql = sprintf('UPDATE table SET cluster_id = %d WHERE customer_id = %d', cluster_id, customer_id);
exec(write_conn, sql);
end
end
I tried to look for documentation on doing a bulk update/insert, but haven't found anything yet.
Do an "upjoin" using a temporary table.
Build your update specification as a Matlab table array with all the cluster_id and customer_id pairs that specify the new values.
Create a SQL temporary table that contains columns for the key columns you'll be matching on and the columns to update.
CREATE TEMPORARY TABLE my_temp_table SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0
Batch-insert your update specification data from Matlab into the temporary table using Matlab Database Toolbox's datainsert or sqlwrite.
Update the target table en masse by joining it to the temp table; in MySQL the join goes before SET: UPDATE `table` targ INNER JOIN my_temp_table upd ON targ.customer_id = upd.customer_id SET targ.cluster_id = upd.cluster_id.
Drop the temp table.
Boom. If you're going to do this a lot, wrap it up in a generic upjoin() function.
See the Matlab documentation for datainsert and sqlwrite. Do not use fastinsert; despite its name, it is much slower than datainsert and sqlwrite.
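Putting the steps together, a sketch of the whole sequence in MySQL (table and column names come from the question; `table` is backticked because TABLE is a reserved word; the batch insert in the middle is done from Matlab with datainsert or sqlwrite):
-- 1. Empty shell holding just the key column and the column to update
CREATE TEMPORARY TABLE my_temp_table
SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0;

-- 2. Batch-insert the update specification into my_temp_table from Matlab here.

-- 3. One join-update applies every customer_id/cluster_id pair at once
UPDATE `table` targ
INNER JOIN my_temp_table upd ON targ.customer_id = upd.customer_id
SET targ.cluster_id = upd.cluster_id;

-- 4. Clean up
DROP TEMPORARY TABLE my_temp_table;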

Can I set multiple columns to NULL in MySQL in bulk?

I have a very large database and for testing, I want to set a certain amount of data to NULL.
As an example, I have 57 columns across 3 tables, all of which need to be nullified. I can't delete the rows, I just need to know that if the row exists and there's no data in those fields, that everything still works.
To clarify, all the data in those fields has been moved to another table, and the old data was not wiped in the migration. To test my reports I need to know that they are pulling from the new location, not the old, since as new data is added, it will only go to the new location. Our plan is to generate each report from the old database, migrate, and then generate them again and compare. But to ensure that they are pulling from the right place, we want to wipe the old data so it doesn't produce a false positive.
Is there a way for me to do this in bulk or should I resign myself to writing one comma separated SET statement after another?
You can create the statements using the data from the internal information_schema.COLUMNS table.
Assuming you have this table:
CREATE TABLE my_table (
keep1 INT,
keep2 INT,
set_null1 INT,
set_null2 INT,
set_null3 INT
);
and you want to set all columns to NULL except keep1 and keep2. Execute the following script:
set @db_name = 'test';
set @table_name = 'my_table';
set @exclude_columns = 'keep1,keep2';
select concat(
'UPDATE `', @table_name, '` SET\n',
group_concat('`', COLUMN_NAME, '` = NULL' separator ',\n'),
';'
)
from information_schema.COLUMNS c
where c.TABLE_SCHEMA = @db_name
and c.TABLE_NAME = @table_name
and find_in_set(c.COLUMN_NAME, @exclude_columns) = 0;
This will generate the following statement:
UPDATE `my_table` SET
`set_null1` = NULL,
`set_null2` = NULL,
`set_null3` = NULL;
Copy the result and paste it into your UPDATE script. Do the same for each of your tables, adjusting the variables @db_name, @table_name and @exclude_columns.
See demo on db-fiddle.
This is a very unusual task for an SQL database, so it's not surprising that it's a bit awkward.
As you know, to set multiple columns to NULL in an UPDATE statement, you'd have to set each column individually.
UPDATE mytable
SET col1 = NULL, col2 = NULL, ... col57 = NULL
WHERE id = ?;
That could be quite a bit of typing. Or you could write code that loops over the column names in your table and concatenates the terms of the UPDATE statement. Up to you.
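If you'd rather not copy and paste at all, the generated statement can also be executed directly with MySQL prepared statements. A sketch along the lines of the script above (my_table and the keep1,keep2 exclusion list are placeholders):
-- build the UPDATE statement into a user variable...
SELECT CONCAT('UPDATE `my_table` SET ',
GROUP_CONCAT('`', COLUMN_NAME, '` = NULL'))
INTO @sql
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'my_table'
AND FIND_IN_SET(COLUMN_NAME, 'keep1,keep2') = 0;

-- ...then run it in place
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;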
An alternative that might be easier is to delete the row and then re-insert it with no values specified except the primary key.
DELETE FROM mytable WHERE id = ?;
INSERT INTO mytable SET id = ?;
By omitting the other columns, they'll be NULL or else take a DEFAULT value defined in your table. If you want those columns with defaults to be NULL too, you'll have to specify that.
INSERT INTO mytable SET id = ?, col23 = NULL;

Is it possible to name column by index

I am wondering if it is possible to use SQL to create a table whose columns are named by index (number). Say I would like to create a table with 10 million or so columns; I definitely don't want to name every column...
I know that I can write a script to generate a long string as a SQL command. However, I would like to know if there is a more elegant way to do so.
Something like this made-up syntax:
CREATE TABLE table_name
(
number_columns 10000000,
data_type INT
)
I guess saying 10 million columns caused a lot of confusion; sorry about that. I looked up the manuals of several major commercial DBMSs, and it seems it is not possible. Thank you for pointing this out.
But another question, which is most important: does SQL support numerical naming of columns? Say all the columns have the same type and there are 50 of them. When referring to them, something like:
SELECT COL.INDEX(3), COL.INDEX(2) FROM MYTABLE
Does the language support that?
Couldn't resist looking into this, and found that the MySQL docs say "no" to this:
There is a hard limit of 4096 columns per table, but the effective
maximum may be less for a given table
You can easily do that in Postgres with dynamic SQL. Consider the demo:
DO LANGUAGE plpgsql
$$
BEGIN
EXECUTE '
CREATE TEMP TABLE t ('
|| (
SELECT string_agg('col' || g || ' int', ', ')
FROM generate_series(1, 10) g -- or 1600?
)
|| ')';
END;
$$;
But why would you even want to give life to such a monstrosity?
As @A.H. commented, there is a hard limit on the number of columns in PostgreSQL:
There is a limit on how many columns a table can contain. Depending on
the column types, it is between 250 and 1600. However, defining a
table with anywhere near this many columns is highly unusual and often
a questionable design.
Emphasis mine.
More about table limitations in the Postgres Wiki.
Access columns by index number
As to your additional question: with a schema like the above you can simply write:
SELECT col3, col2 FROM t;
I don't know of a built-in way to reference columns by index. You can use dynamic SQL again. Or, for a table that consists of integer columns exclusively, this will work, too:
SELECT c[3] AS col3, c[2] AS col2
FROM (
SELECT translate(t::text, '()', '{}')::int[] AS c -- transform row to ARRAY
FROM t
) x
Generally, when working with databases, your schema should be more or less fixed, so dynamically adding columns isn't built-in functionality.
You can, however, run a loop and repeatedly ALTER TABLE to add columns, like so (wrapped in a stored procedure, since LOOP is only valid inside a stored program):
DELIMITER $$
CREATE PROCEDURE add_columns(IN num_columns INT)
BEGIN
SET @col_index = 0;
start_loop: LOOP
SET @col_index = @col_index + 1;
IF @col_index <= num_columns THEN
SET @alter_query = (SELECT CONCAT('ALTER TABLE table_name ADD COLUMN added_column_', @col_index, ' VARCHAR(50)'));
PREPARE stmt FROM @alter_query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
ITERATE start_loop;
END IF;
LEAVE start_loop;
END LOOP start_loop;
END$$
DELIMITER ;
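A quick usage example for the sketch above (add_columns and table_name are the placeholder names used there):
CALL add_columns(50);        -- adds added_column_1 ... added_column_50 to table_name
DROP PROCEDURE add_columns;  -- optional cleanup once done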
But again, like most of the advice you have been given, if you think you need that many columns, you probably need to take a look at your database design, I have personally never heard of a case that would need that.
Note: As mentioned by @GDP, you can have only 4096 columns, and the idea is definitely not appreciated; as @GDP also said, better database design options should be explored before handling this requirement this way.
However, I was just wondering: apart from the absurd requirement, if I ever needed to do this, how could I do it? I thought, why not create a custom / user-defined MySQL routine, e.g. create_table(), that receives the parameters you intend to send and in turn generates and executes the required CREATE TABLE command, as sketched below.
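A minimal sketch of that idea, assuming a hypothetical create_table(tbl_name, num_columns) procedure that builds the column list in a loop and runs the generated DDL through a prepared statement:
DELIMITER $$
CREATE PROCEDURE create_table(IN tbl_name VARCHAR(64), IN num_columns INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE cols TEXT DEFAULT '';
  -- build "col1 INT, col2 INT, ..." numerically, no manual naming
  WHILE i <= num_columns DO
    SET cols = CONCAT(cols, IF(i > 1, ', ', ''), 'col', i, ' INT');
    SET i = i + 1;
  END WHILE;
  SET @ddl = CONCAT('CREATE TABLE ', tbl_name, ' (', cols, ')');
  PREPARE stmt FROM @ddl;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

-- e.g. CALL create_table('my_indexed_cols', 50);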
This is an option for finding columns using ordinal values (SQL Server). It might not be the most elegant or efficient, but it works. I am using it to create a new table for faster mappings between data where I need to parse through all the columns / rows.
DECLARE @sqlCommand varchar(1000)
DECLARE @columnNames TABLE (colName varchar(64), colIndex int)
DECLARE @TableName varchar(64) = 'YOURTABLE' --Table Name
DECLARE @rowNumber int = 2 -- y axis
DECLARE @colNumber int = 24 -- x axis
DECLARE @myColumnToOrderBy varchar(64) = 'ID' --use primary key
--Store column names in a table variable
INSERT INTO @columnNames (colName, colIndex)
SELECT COL.name AS ColumnName, ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM sys.tables AS TAB
INNER JOIN sys.columns AS COL ON COL.object_id = TAB.object_id
WHERE TAB.name = @TableName
ORDER BY COL.column_id;
DECLARE @colName varchar(64)
SELECT @colName = colName FROM @columnNames WHERE colIndex = @colNumber
--Create Dynamic Query to retrieve the x,y coordinates from table
SET @sqlCommand = 'SELECT ' + @colName + ' FROM (SELECT ' + @colName + ', ROW_NUMBER() OVER (ORDER BY ' + @myColumnToOrderBy + ') AS RowNum FROM ' + @TableName + ') t2 WHERE RowNum = ' + CAST(@rowNumber AS varchar(5))
EXEC(@sqlCommand)

How to compare multiple parameters of a row column value?

How do I write a query for the following request?
My table:
id designation
1 developer,tester,projectlead
1 developer
1 techlead
If id = 1 and designation = 'developer', I need to get the first and second records, because both of those rows contain developer.
If id = 1 and designation = 'developer','techlead', I need to get all 3 records as the result.
I wrote a service for inserting records into that table, so I am maintaining one table that stores all designations in a single column, comma-separated.
Through the service, if the user passes id = 1 and designation = 'developer','techlead', it needs to pull the above 3 records; that way I only maintain one table to store all designations.
SP:
ALTER PROCEDURE [dbo].[usp_GetDevices]
@id INT,
@designation NVARCHAR (MAX)
AS
BEGIN
declare @idsplat varchar(MAX)
set @idsplat = @designation
create table #u1 (id1 varchar(MAX))
set @idsplat = 'insert #u1 select ' + replace(@idsplat, ',', ' union select ')
exec(@idsplat)
Select
id FROM dbo.DevicesList WHERE id=@id AND designation IN (select id1 from #u1)
END
You need to use the boolean operators AND and OR in conjunction with LIKE:
IF empid = 1 AND (empname LIKE '%venkat%' OR empname LIKE '%vasu%')
The above example will return all rows with empid equals 1 and empname containing venkat or vasu.
Apparently you need to build that query based on the input from the user; this is just an example of how the final query should look.
Edit: Trying to do this within SQL Server can be quite hard, so you should really change your approach in how you call the stored procedure. If you can't do that, then you could try to split your designation parameter on , (the answers to this question show several ways of doing this) and insert the values into a temporary table. Then you can JOIN on this temporary table with LIKE as described in this article.
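On SQL Server 2016 or later (an assumption about the version in use), the built-in STRING_SPLIT function makes the split-and-match approach compact. A sketch against the question's table:
DECLARE @id INT = 1;
DECLARE @designation NVARCHAR(MAX) = 'developer,techlead';

-- match rows whose comma-separated designation column contains
-- any of the values passed in @designation
SELECT d.id, d.designation
FROM dbo.DevicesList AS d
WHERE d.id = @id
AND EXISTS (
SELECT 1
FROM STRING_SPLIT(@designation, ',') AS s
-- wrapping both sides in commas prevents partial-word matches
WHERE ',' + d.designation + ',' LIKE '%,' + s.value + ',%'
);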

trouble copying data from one table to another with auto increment field

I have the following SQL statement, which was working perfectly until I moved it to another server. The middle query (between the asterisk comment lines) does not seem to work: I get an error saying that 'AUTO' is an incorrect integer type, and if I remove it altogether, it says I have an incorrect number of fields. I am trying to copy data from one table to another and let the destination table auto-increment its ID number.
SET sql_safe_updates=0;
START TRANSACTION;
DELETE FROM shares
WHERE asset_id = '$asset_ID';
/*************************************************************/
INSERT INTO shares
SELECT 'AUTO', asset_ID, member_ID, percent_owner, is_approved
FROM pending_share_changes
WHERE asset_ID = '$asset_ID';
/*************************************************************/
DELETE FROM pending_share_changes
WHERE asset_ID = '$asset_ID';
DELETE FROM shares
WHERE asset_ID = '$asset_ID' AND percent_owner = '0';
COMMIT;
Based on this page of the mysql docs, you have to do:
INSERT INTO shares
(column_name1, column_name2, column_name3, column_name4) -- changed!
SELECT asset_ID, member_ID, percent_owner, is_approved
FROM pending_share_changes
WHERE asset_ID = '$asset_ID';
The difference is that the column names of the "receiving" table are explicitly listed after the name of the receiving table.
The docs say
AUTO_INCREMENT columns work as usual.
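With the question's own columns, the fix would look like this (assuming the auto-increment ID is the only column of shares not listed):
INSERT INTO shares
(asset_ID, member_ID, percent_owner, is_approved)
SELECT asset_ID, member_ID, percent_owner, is_approved
FROM pending_share_changes
WHERE asset_ID = '$asset_ID';
-- the omitted auto-increment column is populated automatically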