Is there a way to extend/merge one's MySQL table structure into another, while keeping the table data intact?
For example, I have developed something on a local copy of the database, and to transfer all database changes into production I have to copy every new column etc. into production afterwards.
It would be nice to see all the differences between the databases and have a dump generated based on those differences.
Thanks.
local_table
loc_id, loc_desc, loc_price
production_table
pro_id, pro_desc, pro_price
I want to insert data from production_table into local_table, and if loc_id is the same as pro_id I want to ignore it. That way I'm only inserting the new rows, without replacing/changing the data in local_table:
-- assumes loc_id is the primary (or a unique) key, so INSERT IGNORE
-- silently skips any pro_id that already exists in local_table
insert ignore into local_table (loc_id, loc_desc, loc_price)
select pro_id, pro_desc, pro_price
from production_table;
We offer a SaaS-based product, and for the database we are using MySQL-compatible AWS Aurora 5.7.
To cope with the large number of rows in one table, we have created multiple groups of tables (g1_, g2_, g3_, etc.). Our application has around 350 tables, so there are 350 tables with the g1_ prefix, 350 tables with the g2_ prefix, and so on.
Each group holds data for several of our clients; for example, the g1_customer table contains the customers of five of our clients.
Now the number of rows in each table keeps growing, and we want to move all of one specific client's data from one group to another group.
Solution 1 in our mind:
We can keep the client id in each table (master and child), fetch all data from each source table by client id, and insert it into the corresponding table in the target group.
Issue: mapping the child tables' rows. The target group's tables may already contain rows, so the source group's master rows will get new auto-increment ids, and the corresponding child rows can no longer be mapped.
Solution 2 in our mind:
Write a script that copies a single master row into the target table, then fetches the related rows from the child table, inserts them into the target child table mapped to the new auto-increment id, and so on.
Issue: this process will be very slow with a large dataset (2.1 million rows).
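For what it's worth, the row-by-row approach of Solution 2 can also be done set-based, which is usually much faster. A rough sketch, with entirely hypothetical table names (g1_orders/g2_orders as master, g1_order_items/g2_order_items as child) and a spare old_id helper column added to the target master to carry the mapping:

```sql
-- 1. Copy the master rows in bulk, remembering each row's old id
INSERT INTO g2_orders (old_id, client_id, total)
SELECT id, client_id, total
FROM g1_orders
WHERE client_id = 42;

-- 2. Copy the child rows, translating the foreign key through old_id
INSERT INTO g2_order_items (order_id, sku, qty)
SELECT o.id, c.sku, c.qty
FROM g1_order_items c
JOIN g2_orders o ON o.old_id = c.order_id;

-- 3. Optionally clear the helper column afterwards
UPDATE g2_orders SET old_id = NULL WHERE client_id = 42;
```

This trades a temporary helper column for three set-based statements instead of millions of single-row round trips.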
Please share your best idea or any tool to achieve it.
Let's back up and look at whether the proposed solution is the best.
In general, splitting one table (or one set of tables) into a set of identical tables is counter-productive. It requires changing the client code to first pick which table to use, then proceed to use it. Performance often suffers rather than benefits.
Regardless of the approach, we cannot really help you without having
SHOW CREATE TABLE
and the various queries that would be impacted by the change.
Why don't you use MySQL partitioned tables for your purpose (assuming I have understood what you are proposing)? I have been using PARTITIONed tables for many purposes, and we have some tables with almost 100 million records.
Here are some examples.
Creating a customers table partitioned by customer group name:
CREATE TABLE customers (
id INT NOT NULL,
name VARCHAR(30),
customer_group CHAR(10),
settings JSON, # very useful when you are working with unstructured data
created_at DATETIME
)
# plain LIST only accepts integers; LIST COLUMNS allows string values
PARTITION BY LIST COLUMNS(customer_group) (
PARTITION customer1 VALUES IN ('customer1'),
PARTITION customer2 VALUES IN ('customer2'),
PARTITION customer3 VALUES IN ('customer3')
);
Now you could insert some data:
INSERT INTO `customers` VALUES(1, 'Customer 1', 'customer1', '{}', NOW());
INSERT INTO `customers` VALUES(2, 'Customer 2', 'customer2', '{}', NOW());
INSERT INTO `customers` VALUES(3, 'Customer 3', 'customer3', '{}', NOW());
Of course, if you don't have a large table with lots of rows, this example may not help you much. But imagine you want to add data for another customer and you don't want that data to make a mess of your other customers' data. If you try to insert a "customer4" row into the customers table, MySQL will block you; to allow it you need to add another partition, as follows:
ALTER TABLE `customers` ADD PARTITION (PARTITION `customer4` VALUES IN ('customer4'));
So, if you now need to delete some data from a large table filtering only by customer group, that would normally take some time; but since you are using partitioned tables, you can just do this.
If you want to delete all customer4 data:
ALTER TABLE `customers` TRUNCATE PARTITION `customer4`;
Or, if you want to DROP some partitions, such as customer1 and customer3, without affecting the whole customers table:
ALTER TABLE `customers` DROP PARTITION `customer1`,`customer3`;
If you want to make your system a bit stricter about customer references in your queries, you can name the PARTITION inside the query:
SELECT * FROM `customers` PARTITION(`customer2`);
The result of this query will be only the rows from the customer2 partition.
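If you'd rather filter by the column value than hard-code the partition name, MySQL should still prune the scan to a single partition; EXPLAIN can confirm this via its partitions column (a sketch, assuming the customers table above):

```sql
EXPLAIN SELECT * FROM customers WHERE customer_group = 'customer2';
-- the `partitions` column of the EXPLAIN output should list only customer2,
-- showing that the other partitions are not scanned at all
```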
These are some simple examples of what you can do with partitioned tables. I don't know whether you have already read about them; if not, I think they could be an option for you. Otherwise, I probably haven't understood your problem very well, and I'm sorry for that. Hope that helps!
I created a huge table in MySQL,
say
table1
I will be querying this table for my results, continuously.
Once a week I will flush the values in table1 and insert new values
(this process takes 3 hours).
So my issue is that my querying will be stopped for 3 hours (while the new table1 is being generated), and my querying should be continuous.
I was thinking of creating a "like" table:
CREATE TABLE temp_table1 LIKE table1;
and, while the new table1 is being populated, I would use temp_table1 for querying.
But for that I would also need an automated trigger to switch between the tables.
Is there a better way to achieve this?
Also, would creating a "like" table for a huge table take a lot of time?
You can actually do it the other way around:
Create temp_table1 with the same structure as table1.
Do your processing in temp_table1 instead of table1.
Once the processing completes, insert the data into the main table using an INSERT INTO ... SELECT FROM construct.
That way your main table stays free for querying and won't be blocked. The final insert can be fast enough, depending on the SELECT's performance.
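The steps above can be sketched as follows; the commented-out alternative in step 3 is an atomic RENAME TABLE swap, which avoids copying the rows back at all:

```sql
-- 1. empty copy of table1, keeping its structure and indexes
CREATE TABLE temp_table1 LIKE table1;

-- 2. ... run the 3-hour rebuild against temp_table1 here ...

-- 3. load the fresh data into the main table
TRUNCATE TABLE table1;
INSERT INTO table1 SELECT * FROM temp_table1;

-- Alternative step 3: swap the tables in one atomic statement,
-- so readers never see an empty or half-loaded table1
-- RENAME TABLE table1 TO table1_old, temp_table1 TO table1;
```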
I can "copy" a table using:
CREATE TABLE copy LIKE original_table
and
CREATE TABLE copy as select * from original_table
In the latter case only the data is copied, but not e.g. primary keys etc.
So I was wondering: when would I prefer using CREATE TABLE ... AS SELECT?
These do different things. CREATE TABLE LIKE creates an empty table with the same structure as the original table.
CREATE TABLE AS SELECT inserts the data into the new table. The resulting table is not empty. In addition, CREATE TABLE AS SELECT is often used with more complicated queries, to generate temporary tables. There is no "original" table in this case. The results of the query are just captured as a table.
EDIT:
The "standard" way to do backup is to use . . . . backup at the database level. This backs up all objects in the database. Backing up multiple tables is important, for instance, to maintain relational integrity among the objects.
If you just want a real copy of a table, first do a CREATE TABLE ... LIKE and then an INSERT INTO ... SELECT. However, this can pose a challenge with auto_increment fields; depending on the database, you may need to drop the auto_increment property on the column before populating it with explicit values.
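A minimal sketch of that "real copy" recipe (table names are just examples):

```sql
-- structure first: column attributes, keys, and indexes come along
CREATE TABLE copy LIKE original_table;

-- then the data; SELECT * works because the column lists match exactly
INSERT INTO copy SELECT * FROM original_table;
```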
The second form is often used when the new table is not an exact copy of the old table, but contains only selected columns or columns that result from a join.
"Create Table as Select..." are most likely used when you have complex select
e.g:
create table t2 as select * from t1 where x1=7 and y1 <>2 from t1;
Now, apparently you should use Create Like if you don't need such complex selects. You can change the PI in this syntax also.
I am trying to insert a large dump of a custom CMS's news section into WordPress. Unfortunately, the columns don't match. Some of them do, sure, like title, date or content. But WordPress requires a lot of columns which this dump doesn't have. Is there a way to either omit those columns on insert or fill them with dummy (preferably blank) data? Search and replace (even with regular expressions) won't do here, since it is a really huge file and even a simple 'find' takes a lot of time.
You stated that you were given a "table." If it included a schema, create the table and insert the data. Otherwise, create the table based on the data columns and insert your data. This is considered your staging table. You can now write a SELECT statement to select the data from your staging table that will be inserted into your destination table. You will finally prepend an INSERT statement to insert your selected data. It should look something like this:
INSERT INTO destinationTable (fruits, animals, numbers, plants)
SELECT fruits, animals, numbers, '' FROM stagingTable
If you did not have plants in your staging table, you would simply SELECT '' or SELECT NULL for that column. You can then simply drop your staging table.
Assuming the answer to my clarification is yes, you can insert multiple rows by delimiting them with a comma.
INSERT INTO table (col1, col2)
VALUES (row1val1, row1val2),
(row2val1,row2val2)
I've got a table with a normal setup of auto-increment ids. Some of the rows have been deleted, so the ID list could look something like this:
(1, 2, 3, 5, 8, ...)
Then, from another source (Edit: Another source = NOT in a database) I have this array:
(1, 3, 4, 5, 7, 8)
I'm looking for a query I can run on the database to get the list of IDs from my array that are NOT in the table. That would be:
(4, 7)
Does such a query exist? My solution right now is either creating a temporary table so that "WHERE table.id IS NULL" works, or, probably worse, using the PHP function array_diff to see what's missing after retrieving all the ids from the table.
Since the list of ids is closing in on millions of rows, I'm eager to find the best solution.
Thank you!
/Thomas
Edit 2:
My main application is a rather simple table populated with a lot of rows. The application is administered through a browser, and I'm using PHP as the interpreter for the code.
Everything in this table is to be exported to another system (a 3rd-party product), and there is as yet no way of doing this besides manually using the import function in that program. It is also possible to insert new rows in the other system, although the agreed routine is to never, ever do this.
The problem is that my system cannot be 100% sure that the user did everything correctly after pressing the "export" key, or that no rows have ever been created in the other system.
From the other system I can get a CSV file containing all the rows that system has. So, by comparing the CSV file and my table I can see if:
* There are any rows missing in the other system that should have been imported
* If someone has created rows in the other system
The problem isn't solving it; it's finding the best solution, since there is so much data in the rows.
Thanks again!
/Thomas
We can use MySQL's NOT IN option.
SELECT id
FROM table_one
WHERE id NOT IN ( SELECT id FROM table_two )
Edited
If you are getting the source from a CSV file, then you can simply put those values directly into the query, like this.
I am assuming that the CSV is like 1,2,3,...,n:
SELECT id
FROM table_one
WHERE id NOT IN ( 1,2,3,...,n );
EDIT 2
Or, if you want to select the other way around, you can use mysqlimport to import the data into a temporary table in the MySQL database, retrieve the result, and then delete the table.
Like:
Create table
CREATE TABLE my_temp_table(
ids INT
);
load .csv file
LOAD DATA LOCAL INFILE 'yourIDs.csv' INTO TABLE my_temp_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(ids);
Selecting records
SELECT ids FROM my_temp_table
WHERE ids NOT IN ( SELECT id FROM table_one )
dropping table
DROP TABLE IF EXISTS my_temp_table
What about using a LEFT JOIN; something like this:
select second_table.id
from second_table
left join first_table on first_table.id = second_table.id
where first_table.id is null
You could also go with a sub-query ; depending on the situation, it might, or might not, be faster, though :
select second_table.id
from second_table
where second_table.id not in (
select first_table.id
from first_table
)
Or with a not exists :
select second_table.id
from second_table
where not exists (
select 1
from first_table
where first_table.id = second_table.id
)
The function you are looking for is NOT IN (an alias for <> ALL)
The MySQL documentation:
http://dev.mysql.com/doc/refman/5.0/en/all-subqueries.html
An Example of its use:
http://www.roseindia.net/sql/mysql-example/not-in.shtml
Enjoy!
The problem is that T1 could have a million rows or ten million rows, and that number could change, so you don't know how many rows your comparison table, T2, the one that has no gaps, should have, for doing a WHERE NOT EXISTS or a LEFT JOIN testing for NULL.
But the question is, why do you care if there are missing values? I submit that, when an application is properly architected, it should not matter if there are gaps in an autoincrementing key sequence. Even an application where gaps do matter, such as a check register, should not be using an autoincrementing primary key as a synonym for the check number.
Care to elaborate on your application requirement?
OK, I've read your edits/elaboration. Synchronizing two databases where the second is not supposed to insert any new rows, but might do so, sounds like a problem waiting to happen.
Neither approach suggested above (WHERE NOT EXISTS or LEFT JOIN) is air-tight and neither is a way to guarantee logical integrity between the two systems. They will not let you know which system created a row in situations where both tables contain a row with the same id. You're focusing on gaps now, but another problem is duplicate ids.
For example, if both tables have a row with id 13887, you cannot assume that database1 created the row. It could have been inserted into database2, and then database1 could insert a new row using that same id. You would have to compare all column values to ascertain that the rows are the same or not.
I'd suggest therefore that you also explore GUID as a replacement for autoincrementing integers. You cannot prevent database2 from inserting rows, but at least with GUIDs you won't run into a problem where the second database has inserted a row and assigned it a primary key value that your first database might also use, resulting in two different rows with the same id. CreationDateTime and LastUpdateDateTime columns would also be useful.
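A sketch of what the GUID approach could look like in MySQL (table and column names are hypothetical):

```sql
-- UUID() generates a 36-character GUID; collisions between ids minted
-- independently in the two databases are practically impossible,
-- unlike overlapping auto-increment sequences
CREATE TABLE orders (
    id CHAR(36) NOT NULL PRIMARY KEY,
    description VARCHAR(100),
    creation_datetime DATETIME,
    last_update_datetime DATETIME
);

INSERT INTO orders (id, description, creation_datetime, last_update_datetime)
VALUES (UUID(), 'example row', NOW(), NOW());
```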
However, a proper solution, if it is available to you, is to maintain just one database and give users remote access to it, for example, via a web interface. That would eliminate the mess and complication of replication/synchronization issues.
If a remote-access web-interface is not feasible, perhaps you could make one of the databases read-only? Or does database2 have to make updates to the rows? Perhaps you could deny insert privilege? What database engine are you using?
I have the same problem: I have a list of values from the user, and I want to find the subset that does not exist in another table. I solved it in Oracle by building a pseudo-table in the SELECT statement. Here's a way to do it in Oracle; try it in MySQL without the "from dual":
-- find ids from user (1,2,3) that *don't* exist in my person table
-- build a pseudo table and join it with my person table
select pseudo.id from (
select '1' as id from dual
union select '2' as id from dual
union select '3' as id from dual
) pseudo
left join person
on person.person_id = pseudo.id
where person.person_id is null