I have a main database and am moving data from that database to a second data warehouse on a periodic schedule.
Instead of migrating an entire table each time, I want to migrate only the rows that have changed since the process last ran. This is easy enough to do with a WHERE clause. However, suppose some rows have been deleted in the main database. I don't have a good way to detect which rows no longer exist, so that I can delete them in the data warehouse too. Is there a good way to do this (as opposed to reloading the entire table each time, since the table is huge)?
It can be done in the following steps. In this example I am using a customers table:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
LAST_UPDATED DATETIME,
PRIMARY KEY (ID)
);
Create a CDC (change data capture) table:
CREATE TABLE CUSTOMERS_CDC(
ID INT NOT NULL,
LAST_UPDATED DATETIME,
PRIMARY KEY (ID)
);
Create a trigger on the source table for the delete event, like the one below:
CREATE TRIGGER TRG_CUSTOMERS_DEL
ON CUSTOMERS
FOR DELETE
AS
INSERT INTO CUSTOMERS_CDC (ID, LAST_UPDATED)
SELECT ID, getdate()
FROM DELETED
In your ETL process, where you query the source for changes, add the deleted records' information through a UNION, or create a separate process, like below:
SELECT ID, NAME, AGE, ADDRESS, LAST_UPDATED, 'I/U' STATUS
FROM CUSTOMERS
WHERE LAST_UPDATED > @lastpulldate
UNION
SELECT ID, null, null, null, LAST_UPDATED, 'D' STATUS
FROM CUSTOMERS_CDC
WHERE LAST_UPDATED > @lastpulldate
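A minimal, runnable sketch of the same trigger-plus-UNION pattern, using Python's built-in sqlite3 module with two in-memory databases standing in for the main database and the warehouse. SQLite has no statement-level FOR DELETE trigger or DELETED table, so a row-level AFTER DELETE trigger using OLD.id and CURRENT_TIMESTAMP takes their place, and INSERT OR REPLACE keeps a re-deleted id from violating the CDC primary key. The helpers pull_changes and apply_changes, the sample rows, and the last-pull timestamp are hypothetical. On the warehouse side, 'I/U' rows are applied as upserts and 'D' rows as deletes.

```python
import sqlite3

CUSTOMERS_DDL = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age INTEGER NOT NULL,
    address TEXT,
    last_updated TEXT
);
"""

CDC_DDL = """
CREATE TABLE customers_cdc (id INTEGER PRIMARY KEY, last_updated TEXT);
-- Record each deleted id with the deletion time (row-level trigger).
CREATE TRIGGER trg_customers_del AFTER DELETE ON customers
BEGIN
    INSERT OR REPLACE INTO customers_cdc (id, last_updated)
    VALUES (OLD.id, CURRENT_TIMESTAMP);
END;
"""

# Inserted/updated rows plus deleted ids, both filtered by the last pull time.
DELTA_SQL = """
SELECT id, name, age, address, last_updated, 'I/U' AS status
FROM customers WHERE last_updated > :last_pull
UNION ALL
SELECT id, NULL, NULL, NULL, last_updated, 'D' AS status
FROM customers_cdc WHERE last_updated > :last_pull
"""


def pull_changes(source, last_pull):
    return source.execute(DELTA_SQL, {"last_pull": last_pull}).fetchall()


def apply_changes(warehouse, changes):
    for row_id, name, age, address, ts, status in changes:
        if status == "D":
            warehouse.execute("DELETE FROM customers WHERE id = ?", (row_id,))
        else:
            # Upsert: insert new rows, overwrite changed ones.
            warehouse.execute(
                "INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?, ?)",
                (row_id, name, age, address, ts),
            )
    warehouse.commit()


source = sqlite3.connect(":memory:")
source.executescript(CUSTOMERS_DDL + CDC_DDL)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", 30, "A St", "2024-01-01 00:00:00"),
        (2, "Bob", 40, "B St", "2024-01-01 00:00:00"),
        (3, "Cy", 50, "C St", "2024-01-01 00:00:00"),
    ],
)
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(CUSTOMERS_DDL)
apply_changes(warehouse, pull_changes(source, "1970-01-01 00:00:00"))  # first full load

last_pull = "2024-06-01 00:00:00"  # hypothetical time of the previous run
source.execute("UPDATE customers SET age = 41, last_updated = '2024-07-01 00:00:00' WHERE id = 2")
source.execute("DELETE FROM customers WHERE id = 3")  # the trigger logs id 3
changes = pull_changes(source, last_pull)
apply_changes(warehouse, changes)
```

After the second pull the warehouse holds only customers 1 and 2, with the updated age for customer 2, without reloading the whole table.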
If you just fire an UPDATE query, then it won't update the rows.
The way I see it: let's say you use your approach with a WHERE clause. You'd have that as part of an UPDATE query, unless you are doing a CSV export. If you do a mysqldump of the rows you wish to update and create a new tempTable in the main database, then:
UPDATE mainTable SET ... WHERE id IN (SELECT id FROM tempTable WHERE id > 0 AND id < 1000)
If there is no corresponding match, no rows are updated and no error occurs; the id limits act as parameters that bound each batch.
I recently started working with SQL in the Visual Studio environment. I have created the following two tables and populated them with values; these are the commands for creating the users and photos tables:
CREATE TABLE users(
id INTEGER AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE photos(
id INTEGER AUTO_INCREMENT PRIMARY KEY,
image_url VARCHAR(255) NOT NULL,
user_id INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY(user_id) REFERENCES users(id)
);
Now these are the statements that I ran to populate the tables:
INSERT INTO users(username) VALUES
('Colton'),('Ruben');
INSERT INTO photos(image_url,user_id) VALUES
('/alskjd76',1),
('/lkajsd98',2);
Now if I run the statement
SELECT *
FROM photos
JOIN users;
I get the tables:
Now if I run the command:
SELECT *
FROM users
JOIN photos;
I get the table
Here are the tables of users and the tables for photos.
Now my question is: why is the "id" column in the second result changed to 4, 4, 5, 5 when the actual "id" column of the users table only contains the values 1 and 2? The first query seems to respect this, so why doesn't the second?
EDIT: It seems to be displaying the following now when running the commands
SELECT *
FROM photos
JOIN users;
and when I run:
SELECT *
FROM users
JOIN photos;
Edit: This seems to be correct now. Is this right? It seems to have been solved by deleting and recreating the tables entirely. I think Visual Studio might have mistakenly taken the table to contain more photos, with ids 1-3.
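For reference: in MySQL, JOIN without an ON clause behaves like CROSS JOIN, so both queries above pair every user with every photo (2 x 2 = 4 rows), and SELECT * returns two columns named id, which makes the output easy to misread. The sketch below reproduces this with Python's sqlite3 module as a stand-in for MySQL (SQLite also treats a bare JOIN as a cross join) and contrasts it with a join on photos.user_id = users.id, aliasing the two id columns so they cannot be confused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL);
CREATE TABLE photos (
    id INTEGER PRIMARY KEY,
    image_url TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
INSERT INTO users (username) VALUES ('Colton'), ('Ruben');
INSERT INTO photos (image_url, user_id) VALUES ('/alskjd76', 1), ('/lkajsd98', 2);
""")

# No join condition: every photo is paired with every user (a cross join).
cross_rows = conn.execute("SELECT * FROM photos JOIN users").fetchall()

# With ON, each photo is matched only with the user who owns it.
matched_rows = conn.execute(
    "SELECT photos.id AS photo_id, photos.image_url, users.id AS user_id, users.username "
    "FROM photos JOIN users ON photos.user_id = users.id ORDER BY photos.id"
).fetchall()
```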
Let's say I have a predefined table called cities, with almost all the cities in my country.
When a user registers (user table), the cities_id column in the user table stores the city id from the cities table (a foreign key referencing cities), something like:
CREATE TABLE `cities` (
`id` int,
`city_name` varchar(100)
)
CREATE TABLE `user` (
`id` int,
`name` varchar(60),
`****`
`cities_id` FK
)
The user table stores the city id.
But what if I missed a few cities? How does the user then save his city name in the user table, which accepts only city IDs, not city names?
Can I have one more column, city_name, right after cities_id in the user table, something like:
CREATE TABLE `user` (
`id` int,
`name` varchar(60),
`****`
`cities_id` FK,
`city_name` varchar(100)
)
to record the data entered by the user at the time of registration? Can this be done?
You can add a type column to the cities table as a tag. If a user can't find their city, allow them to type its name; then create a corresponding record in the cities table with its type marked as a special status (so that operations staff can easily check and correct it), and at the same time save that record's id in the user record.
CREATE TABLE `cities` (
`id` int,
`city_name` varchar(100),
`type` int
)
CREATE TABLE `user` (
`id` int,
`name` varchar(60),
`****`
`cities_id` FK
)
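A minimal sketch of the type-flag idea, using Python's sqlite3 module as a stand-in for MySQL. The type codes (0 for curated cities, 1 for user-entered cities awaiting review), the register_user helper, and the sample city names are assumptions for illustration; the answer does not specify them.

```python
import sqlite3

# Assumed type codes (not from the answer).
CURATED, PENDING_REVIEW = 0, 1

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    type INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    cities_id INTEGER NOT NULL REFERENCES cities(id)
);
""")
conn.executemany("INSERT INTO cities (city_name, type) VALUES (?, ?)",
                 [("Springfield", CURATED), ("Shelbyville", CURATED)])


def register_user(conn, name, city_name):
    row = conn.execute("SELECT id FROM cities WHERE city_name = ?", (city_name,)).fetchone()
    if row is None:
        # Unknown city: store it, flagged so that staff can review it later.
        city_id = conn.execute(
            "INSERT INTO cities (city_name, type) VALUES (?, ?)",
            (city_name, PENDING_REVIEW),
        ).lastrowid
    else:
        city_id = row[0]
    # The foreign key is always satisfied because the city row now exists.
    conn.execute("INSERT INTO user (name, cities_id) VALUES (?, ?)", (name, city_id))
    conn.commit()
    return city_id


register_user(conn, "alice", "Springfield")
register_user(conn, "bob", "Ogdenville")
pending = conn.execute(
    "SELECT city_name FROM cities WHERE type = ?", (PENDING_REVIEW,)
).fetchall()
```

The review queue is then simply every city with the pending type.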
As @Joakim mentioned in the comment, from a DB perspective, since cities_id is a foreign key referencing the cities table, inserting a record into the user table will fail if the city in question is not already in the cities table.
From a programming perspective, if you want a city that is not in the table to be inserted automatically whenever a user registers, that is possible. Assuming you are using Java and Hibernate, and the User entity contains a City entity, calling saveOrUpdate() on the user entity will cause the city record to be inserted if it is not already there (provided the association is mapped with a cascade setting that covers save/update), and the user record will then be inserted into the User table.
Here's how I would quickly solve this.
Create an additional table to store the missing cities that users introduce:
CREATE TABLE `cities_users` (
`id` int,
`city_name` varchar(100),
`added_by` varchar(100),
`added_TS` DATETIME DEFAULT CURRENT_TIMESTAMP
);
Create a VIEW that UNIONs the two cities tables:
CREATE VIEW all_cities AS
SELECT id, city_name FROM `cities`
UNION ALL
SELECT id, city_name FROM `cities_users`;
Whenever a user registers, query the VIEW to check whether the user's city exists. That way you'll know whether the city exists in either your original table or the cities introduced by users.
If not, you INSERT the new city in the cities_users table (along with the user that created it for logging purposes).
You should generate the unique ID carefully, i.e. one that can never exist in the cities table. You can do this in various ways; here's a quick example: offset the cities_users IDs by one million (assuming the cities table will never reach that many rows), so each new ID is the last cities_users ID plus one, starting at 1000001. Your cities_users IDs will look like 1000001, 1000002, 1000003.
Finally, insert the generated cities_users ID into the user table.
Having a separate table for user inputs should help you keep the database clean:
Your original cities table remains totally unchanged
You will know easily at all times the new cities added by whom and when, and you can create a small interface to review and manage that.
Your users are working for you to complete your database.
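The steps above can be sketched as follows, with Python's sqlite3 module standing in for MySQL. The city_id_for and register helpers are hypothetical, and taking MAX(id) + 1 assumes a single writer; concurrent registrations would need a lock or a sequence. Because user-added ids never appear in cities, the users table here has no foreign key on cities_id.

```python
import sqlite3

# Assumption from the answer: curated city ids stay below one million,
# so user-added ids start at 1000001 and can never clash with them.
USER_CITY_ID_OFFSET = 1_000_000

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city_name TEXT NOT NULL);
CREATE TABLE cities_users (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    added_by TEXT,
    added_TS TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE VIEW all_cities AS
    SELECT id, city_name FROM cities
    UNION ALL
    SELECT id, city_name FROM cities_users;
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, cities_id INTEGER);
INSERT INTO cities (id, city_name) VALUES (1, 'Springfield'), (2, 'Shelbyville');
""")


def city_id_for(conn, city_name, user_name):
    # Look in both tables at once through the view.
    row = conn.execute("SELECT id FROM all_cities WHERE city_name = ?", (city_name,)).fetchone()
    if row is not None:
        return row[0]
    # Next id: highest existing user-city id (or the offset itself) plus one.
    (last_id,) = conn.execute(
        "SELECT COALESCE(MAX(id), ?) FROM cities_users", (USER_CITY_ID_OFFSET,)
    ).fetchone()
    new_id = last_id + 1
    conn.execute(
        "INSERT INTO cities_users (id, city_name, added_by) VALUES (?, ?, ?)",
        (new_id, city_name, user_name),  # added_by logs who created the city
    )
    return new_id


def register(conn, user_name, city_name):
    city_id = city_id_for(conn, city_name, user_name)
    conn.execute("INSERT INTO users (name, cities_id) VALUES (?, ?)", (user_name, city_id))
    conn.commit()


register(conn, "alice", "Springfield")
register(conn, "bob", "Ogdenville")
register(conn, "carol", "North Haverbrook")
register(conn, "dave", "Ogdenville")  # reuses bob's city through the view
```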
If a user suggests a new city, you should create a new record in the cities table and store its city_id in the user table. This is the best way to store the records.
I feel it should be pointed out, despite answers to the contrary, that your original suggestion of adding a city_name column to the table will work fairly well.
If you allow both cities_id and city_name to be nullable, then you can validate in the application logic that one and only one of them is set.
The benefit of this approach is that it keeps your cities table 'pure' and lets you easily count duplicates among, and analyse, the user-supplied cities.
It would, however, add a very sparse nullable city_name column to your table.
I guess it depends on how you want to get the city from the user (a drop-down plus a text box for others, a text box with suggestions, or just a text box) and what you plan to do with the cities you have gathered.
You could even change the label to 'city (or nearest city)' with a hard-coded or searchable drop-down, and not allow user-supplied cities at all.
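A sketch of the nullable-columns approach, with Python's sqlite3 module standing in for MySQL. Instead of validating only in application logic, it swaps in a CHECK constraint that accepts a row only when exactly one of cities_id and city_name is set (MySQL enforces CHECK constraints from 8.0.16 onwards; earlier versions parse and ignore them). The try_insert helper and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city_name TEXT NOT NULL);
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    cities_id INTEGER REFERENCES cities(id),  -- set when the city is in the list
    city_name TEXT,                           -- set only for user-typed cities
    -- exactly one of the two must be non-NULL
    CHECK ((cities_id IS NULL) <> (city_name IS NULL))
);
INSERT INTO cities (id, city_name) VALUES (1, 'Springfield');
""")


def try_insert(name, cities_id, city_name):
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user (name, cities_id, city_name) VALUES (?, ?, ?)",
                (name, cities_id, city_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "listed city": try_insert("alice", 1, None),
    "typed city": try_insert("bob", None, "Ogdenville"),
    "both set": try_insert("carol", 1, "Ogdenville"),
    "neither set": try_insert("dave", None, None),
}

# Counting duplicates among user-supplied cities stays a simple GROUP BY.
typed = conn.execute(
    "SELECT city_name, COUNT(*) FROM user WHERE city_name IS NOT NULL GROUP BY city_name"
).fetchall()
```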
If you have a buffer table where the raw data is put, i.e. the relationship between city_name and user_name:
CREATE TABLE `buffer_city_user` (
`buffer_id` int,
`city_name` varchar(100),
`user_name` varchar(100)
);
You can first process the buffer table for new city names: if any are found, insert them into the cities table.
Then insert the user info. Any new city names should already be in the cities table, so no foreign key issues will occur.
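A sketch of the buffer-table workflow, using Python's sqlite3 module as a stand-in for MySQL, with foreign keys switched on so that a missing city would make the second step fail. The UNIQUE constraint on city_name and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city_name TEXT NOT NULL UNIQUE);
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    cities_id INTEGER NOT NULL REFERENCES cities(id)
);
CREATE TABLE buffer_city_user (
    buffer_id INTEGER PRIMARY KEY,
    city_name TEXT,
    user_name TEXT
);
INSERT INTO cities (city_name) VALUES ('Springfield');
INSERT INTO buffer_city_user (city_name, user_name) VALUES
    ('Springfield', 'alice'), ('Ogdenville', 'bob'), ('Ogdenville', 'carol');
""")

with conn:
    # Step 1: add every buffered city name that cities does not have yet.
    conn.execute("""
        INSERT INTO cities (city_name)
        SELECT DISTINCT b.city_name
        FROM buffer_city_user AS b
        WHERE NOT EXISTS (SELECT 1 FROM cities AS c WHERE c.city_name = b.city_name)
    """)
    # Step 2: every buffered name now has a city id, so the FK cannot fail.
    conn.execute("""
        INSERT INTO user (name, cities_id)
        SELECT b.user_name, c.id
        FROM buffer_city_user AS b
        JOIN cities AS c ON c.city_name = b.city_name
        ORDER BY b.buffer_id
    """)

users = conn.execute(
    "SELECT u.name, c.city_name FROM user AS u JOIN cities AS c ON c.id = u.cities_id ORDER BY u.id"
).fetchall()
```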
Note: Apologies if this is a duplicate but I can't find a solution.
I have two databases (one dev and one live) which have exactly the same schema.
To make things easier to explain, assume I have a 'customer' table and a 'quote' table. Both tables have auto increment ids and the quote table has a 'customerid' column that serves as a foreign key to the customer table.
My problem is that I have some rows in my dev database that I want to copy to the live database. When I copy the customer rows I can easily get a new id, but how can I get the new id assigned to the 'child' quote table rows?
I know I can manually script out INSERTS to overcome the problem but is there an easier way to do this?
EDIT:
This is a simplified example; I have about 15 tables, all of which form a hierarchy using auto-increments and foreign keys. There is considerably more data in the live database, so the new ids will be bigger (e.g. dev.customer.id = 4, live.customer.id = 54).
The easiest way, without changing any IDs:
Make sure you are currently using the source database (the one that contains the record you want to copy).
Run the following command:
INSERT INTO to_database.to_table
SELECT * FROM from_table WHERE some_id = 123;
There is no need to specify columns if nothing needs to be remapped (the two tables must have the same columns in the same order).
Hope that helps!
I eventually managed to do this (as per my comment), but I had to write some code to do so. In the end I created some dummy tables that kept track of each old id against its new id. When copying over records with FK constraints, I just looked up the new id based on the old one. A bit long-winded, but it worked.
This post is getting on a bit now, so I've marked this as the answer. If anyone out there has better ideas/solutions that work, I'll happily 'unmark' it as the accepted answer.
EDIT: As requested here is some pseudo-code that I hope explains how I did it.
I have the two related tables as follows:
CREATE TABLE tblCustomers (
Id int NOT NULL AUTO_INCREMENT,
Name varchar(50) DEFAULT NULL,
Address varchar(255) DEFAULT NULL,
PRIMARY KEY (Id)
)
ENGINE = MYISAM
ROW_FORMAT = fixed;
CREATE TABLE tblQuotes (
Id int NOT NULL AUTO_INCREMENT,
CustomerId int(11) DEFAULT NULL,
QuoteReference varchar(50) DEFAULT NULL,
PRIMARY KEY (Id)
)
ENGINE = MYISAM
ROW_FORMAT = fixed;
I create an extra table that I will use to track old ids against new ids
CREATE TABLE tblLookupId (
Id int NOT NULL AUTO_INCREMENT,
TableName varchar(50) DEFAULT NULL,
OldId int DEFAULT NULL,
NewId int DEFAULT NULL,
PRIMARY KEY (Id)
)
ENGINE = MYISAM
ROW_FORMAT = fixed;
The idea is that I copy the tblCustomers rows one at a time and track the ids as I go, like this:
// copy each customer row from dev to live and track each old and new id
//
foreach (customer in tblCustomers)
{
// track the old id
var oldid = customer.id; // e.g. 1
// insert the new record into the target database
INSERT INTO newdb.tblCustomers (...) VALUES (...);
// get the new id
var newid = SELECT LAST_INSERT_ID(); // e.g. 245
// insert the old id and the new id into the id lookup table
INSERT INTO tblLookupId (TableName, OldId, NewId) VALUES ('tblCustomers', oldid, newid); // this maps 1->245 for tblCustomers
}
When I come to copy the table with the foreign key (tblQuotes), I first have to look up the new id based on the old one.
// copy each quote row from dev to live, looking up the foreign key (customer) in the lookup table
//
foreach (quote in tblQuotes)
{
// get the old foreign key value
var oldcustomerid = quote.CustomerId; // e.g. 1
// look up the new value
var newcustomerid = SELECT NewId FROM tblLookupId WHERE TableName = 'tblCustomers' AND OldId = oldcustomerid; // returns 245
// insert the quote record
INSERT INTO newdb.tblQuotes (CustomerId, ...) VALUES (newcustomerid, ...);
}
I've tried to keep this short and to the point (and language-agnostic) so the technique can be seen. In my real scenario I had around 15 'cascading' tables, so I had to track the new ids of every table, not just tblCustomers.
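A runnable version of the pseudo-code above, using Python's sqlite3 module with two in-memory databases as stand-ins for dev and live. cursor.lastrowid plays the role of MySQL's LAST_INSERT_ID(), the tables are simplified versions of the ones above, and the sample rows are hypothetical. The same pattern extends to deeper hierarchies by recording every table's old/new ids in tblLookupId and copying parents before children.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tblCustomers (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Address TEXT);
CREATE TABLE tblQuotes (Id INTEGER PRIMARY KEY AUTOINCREMENT, CustomerId INTEGER, QuoteReference TEXT);
"""

dev = sqlite3.connect(":memory:")
live = sqlite3.connect(":memory:")
dev.executescript(SCHEMA)
live.executescript(SCHEMA + """
CREATE TABLE tblLookupId (Id INTEGER PRIMARY KEY AUTOINCREMENT,
                          TableName TEXT, OldId INTEGER, NewId INTEGER);
""")

# Hypothetical sample data: live already has rows, so new ids will differ.
live.executemany("INSERT INTO tblCustomers (Name, Address) VALUES (?, ?)",
                 [("Existing", "1 Main St")] * 3)
dev.executemany("INSERT INTO tblCustomers (Name, Address) VALUES (?, ?)",
                [("Acme", "2 High St"), ("Globex", "3 Low Rd")])
dev.executemany("INSERT INTO tblQuotes (CustomerId, QuoteReference) VALUES (?, ?)",
                [(1, "Q-ACME-1"), (2, "Q-GLOBEX-1"), (1, "Q-ACME-2")])

with live:  # one transaction on the target
    # Parent table: copy each row, let live assign a new id, record old -> new.
    for old_id, name, address in dev.execute("SELECT Id, Name, Address FROM tblCustomers"):
        new_id = live.execute(
            "INSERT INTO tblCustomers (Name, Address) VALUES (?, ?)", (name, address)
        ).lastrowid
        live.execute(
            "INSERT INTO tblLookupId (TableName, OldId, NewId) VALUES (?, ?, ?)",
            ("tblCustomers", old_id, new_id),
        )

    # Child table: translate each foreign key through the lookup table.
    for old_customer_id, reference in dev.execute(
        "SELECT CustomerId, QuoteReference FROM tblQuotes ORDER BY Id"
    ):
        (new_customer_id,) = live.execute(
            "SELECT NewId FROM tblLookupId WHERE TableName = ? AND OldId = ?",
            ("tblCustomers", old_customer_id),
        ).fetchone()
        live.execute(
            "INSERT INTO tblQuotes (CustomerId, QuoteReference) VALUES (?, ?)",
            (new_customer_id, reference),
        )
```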
Use INSERT ... SELECT:
insert into your_table (c1, c2, ...)
select c1, c2, ...
from your_table
where c1, c2, ... are all the columns except id.
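A small sketch of INSERT ... SELECT without the id column, using Python's sqlite3 module with a second in-memory database attached as dev (in MySQL the source would simply be written as dev.your_table). Leaving id out lets the target assign fresh auto-increment values; on its own this does not re-point the foreign keys of child rows. The table and sample rows are hypothetical.

```python
import sqlite3

# "main" stands in for the live database, the attached "dev" for the source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dev")
conn.executescript("""
CREATE TABLE main.tblCustomers (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Address TEXT);
CREATE TABLE dev.tblCustomers (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Address TEXT);
INSERT INTO main.tblCustomers (Name, Address) VALUES ('Existing', '1 Main St');
INSERT INTO dev.tblCustomers (Name, Address) VALUES ('Acme', '2 High St');
""")

# List every column except Id, so the target assigns a new auto-increment value.
conn.execute("""
    INSERT INTO main.tblCustomers (Name, Address)
    SELECT Name, Address FROM dev.tblCustomers
""")
rows = conn.execute("SELECT Id, Name FROM main.tblCustomers ORDER BY Id").fetchall()
```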
I have a mysql table that stores a mapping from an ID to a set of values:
CREATE TABLE `mapping` (
`ID` bigint(20) unsigned NOT NULL,
`Value` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This table is a list of values, and the ID of a row selects the set that the value belongs to.
So the column ID is unique per set, but not unique per row.
I insert data into the table using the following statement:
INSERT INTO `mapping`
SELECT 5, `value` FROM `set1`;
In this example I calculated and set the ID manually to 5.
It would be great if MySQL could set this ID automatically. I know about the auto-increment feature, but using it will not work, because all rows inserted by the same INSERT statement should get the same ID.
So each insert statement should generate a new ID and then use it for all inserted rows.
Is there a way to accomplish this?
I'm not fully convinced by this (I'm not sure whether locking the table is a good idea; I think it's not), but it might help:
lock tables `mapping` write, `mapping` as m1 read, `set1` read;
insert into `mapping`
select (select coalesce(max(id), 0) + 1 from `mapping` as m1), `value` from `set1`;
unlock tables;
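A sketch of the same MAX(ID) + 1 idea, using Python's sqlite3 module as a stand-in. SQLite has no LOCK TABLES, so BEGIN IMMEDIATE (which takes the database write lock up front) plays that role, and COALESCE covers the empty-table case. The insert_set helper and the set2 table are hypothetical.

```python
import sqlite3

# isolation_level=None: transactions are controlled explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE mapping (ID INTEGER NOT NULL, Value INTEGER NOT NULL);
CREATE TABLE set1 (value INTEGER);
CREATE TABLE set2 (value INTEGER);
INSERT INTO set1 VALUES (10), (11), (12);
INSERT INTO set2 VALUES (20), (21);
""")


def insert_set(conn, source_table):
    # source_table is trusted here: it is interpolated, not bound.
    # BEGIN IMMEDIATE takes the write lock, so no other writer can read
    # the same MAX(ID) before this insert commits.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # The uncorrelated subquery yields one value, shared by every inserted row.
        conn.execute(
            "INSERT INTO mapping (ID, Value) "
            f"SELECT (SELECT COALESCE(MAX(ID), 0) + 1 FROM mapping), value FROM {source_table}"
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


insert_set(conn, "set1")
insert_set(conn, "set2")
rows = conn.execute("SELECT ID, Value FROM mapping ORDER BY Value").fetchall()
```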
One option is to have an additional table with an auto-generated key on single rows. Insert a row into that table (with or without whatever other data is necessary or appropriate), thus generating the new ID, and then use the generated key to insert into the mapping table.
This moves you to a world where the non-unique id is a foreign key reference to a truly unique key. Much more in keeping with typical relational database thinking.
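A sketch of the extra-table approach, with Python's sqlite3 module standing in for MySQL. The table name mapping_set, its created_at column, and the insert_set helper are hypothetical. In MySQL, the second statement could read the new key with LAST_INSERT_ID(), e.g. INSERT INTO mapping SELECT LAST_INSERT_ID(), value FROM set1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per set; its auto-generated id identifies the set.
CREATE TABLE mapping_set (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- The non-unique ID becomes a foreign key to a truly unique key.
CREATE TABLE mapping (
    ID INTEGER NOT NULL REFERENCES mapping_set(ID),
    Value INTEGER NOT NULL
);
CREATE TABLE set1 (value INTEGER);
INSERT INTO set1 VALUES (10), (11), (12);
""")


def insert_set(conn, source_table):
    with conn:  # the set row and its values commit together
        set_id = conn.execute("INSERT INTO mapping_set DEFAULT VALUES").lastrowid
        # source_table is trusted here: it is interpolated, not bound.
        conn.execute(
            f"INSERT INTO mapping (ID, Value) SELECT ?, value FROM {source_table}",
            (set_id,),
        )
    return set_id


first = insert_set(conn, "set1")
second = insert_set(conn, "set1")
```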
How to have only 3 rows in the table and only update them?
I have a settings table that is empty on the first run, so I want to insert 3 records like so:
id | label | Value | desc
--------------------------
1 start 10 0
2 middle 24 0
3 end 76 0
After this, I need to update these settings from a PHP script with a single query.
I have researched REPLACE INTO but I end up with duplicate rows in DB.
Here is my current query:
$query_insert=" REPLACE INTO setari (`eticheta`, `valoare`, `disabled`)
VALUES ('mentenanta', '".$mentenanta."', '0'),
('nr_incercari_login', '".$nr_incercari_login."', '0'),
('timp_restrictie_login', '".$timp_restrictie_login."', '0')
";
Any ideas?
Here is the create table statement. Just so you can see in case I'm missing something.
CREATE TABLE `setari` (
`id` int(10) unsigned NOT NULL auto_increment,
`eticheta` varchar(200) NOT NULL,
`valoare` varchar(250) NOT NULL,
`disabled` tinyint(1) unsigned NOT NULL default '0',
`data` datetime default NULL,
`cod` varchar(50) default NULL,
PRIMARY KEY (`eticheta`,`id`,`valoare`),
UNIQUE KEY `id` (`eticheta`,`id`,`valoare`)
) ENGINE=MyISAM
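As explained in the manual, REPLACE INTO decides whether a row already exists by looking at the table's PRIMARY KEY and UNIQUE indexes. Your keys include valoare (and the auto-increment id), so a changed value always looks like a new row. You need a UNIQUE index on the label (eticheta) alone so that REPLACE INTO can detect the existing setting.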
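A sketch of the unique-label fix, using Python's sqlite3 module (which also supports REPLACE INTO) as a stand-in for MySQL, with bound parameters instead of the PHP string concatenation. The table is trimmed to the columns the query uses, and the save_settings helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE setari (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    eticheta TEXT NOT NULL,
    valoare TEXT NOT NULL,
    disabled INTEGER NOT NULL DEFAULT 0,
    UNIQUE (eticheta)  -- the label alone decides what counts as the same setting
);
""")


def save_settings(conn, settings):
    with conn:
        conn.executemany(
            "REPLACE INTO setari (eticheta, valoare, disabled) VALUES (?, ?, 0)",
            settings.items(),
        )


save_settings(conn, {"mentenanta": "0", "nr_incercari_login": "3", "timp_restrictie_login": "15"})
save_settings(conn, {"mentenanta": "1", "nr_incercari_login": "5", "timp_restrictie_login": "30"})
rows = conn.execute("SELECT eticheta, valoare FROM setari ORDER BY eticheta").fetchall()
```

Note that REPLACE deletes the old row and inserts a new one, so each save assigns new id values.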
What you want is the 'ON DUPLICATE KEY UPDATE' syntax. Read through it for the full details, but essentially you need a unique or primary key on one of your fields; then write a normal INSERT query and add that clause (along with what you actually want to update) to the end. When the database engine tries to add the information and comes across a duplicate key that is already present, it instead updates the fields you specified with the new information.
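A sketch of the upsert, using Python's sqlite3 module as a stand-in. SQLite spells it ON CONFLICT (eticheta) DO UPDATE (available since SQLite 3.24); the MySQL equivalent, given a UNIQUE index on eticheta, would end with ON DUPLICATE KEY UPDATE valoare = VALUES(valoare). Unlike REPLACE, the existing row is updated in place and keeps its id. The save_settings helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE setari (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    eticheta TEXT NOT NULL UNIQUE,
    valoare TEXT NOT NULL,
    disabled INTEGER NOT NULL DEFAULT 0
);
""")

# Insert new labels; for an existing label, update only its value.
UPSERT = """
INSERT INTO setari (eticheta, valoare, disabled) VALUES (?, ?, 0)
ON CONFLICT (eticheta) DO UPDATE SET valoare = excluded.valoare
"""


def save_settings(conn, settings):
    with conn:
        conn.executemany(UPSERT, settings.items())


save_settings(conn, {"mentenanta": "0", "nr_incercari_login": "3", "timp_restrictie_login": "15"})
ids_before = [r[0] for r in conn.execute("SELECT id FROM setari ORDER BY id")]
save_settings(conn, {"mentenanta": "1", "nr_incercari_login": "5", "timp_restrictie_login": "30"})
ids_after = [r[0] for r in conn.execute("SELECT id FROM setari ORDER BY id")]
rows = conn.execute("SELECT eticheta, valoare FROM setari ORDER BY eticheta").fetchall()
```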
I simply skip the headache and use a temporary table. Quick and clean.
SQL Server allows you to SELECT INTO a non-existent temp table, creating it for you. MySQL, however, requires you to create the temp table first and then insert into it.
1. Create an empty temp table.
CREATE TEMPORARY TABLE IF NOT EXISTS insertsetari
SELECT eticheta, valoare, disabled
FROM setari
WHERE 1=0
2. Insert data into the temp table.
INSERT INTO insertsetari
VALUES
('mentenanta', '".$mentenanta."', '0'),
('nr_incercari_login', '".$nr_incercari_login."', '0'),
('timp_restrictie_login', '".$timp_restrictie_login."', '0')
3. Remove rows from the temp table that are already found in the target table.
DELETE a FROM insertsetari AS a INNER JOIN setari AS b
WHERE a.eticheta = b.eticheta
AND a.valoare = b.valoare
AND a.disabled = b.disabled
4. Insert the temp table's residual rows into the target table.
INSERT INTO setari (eticheta, valoare, disabled)
SELECT eticheta, valoare, disabled FROM insertsetari
5. Clean up the temp table.
DROP TEMPORARY TABLE insertsetari
Comments:
You should avoid replacing when the new data and the old data are the same. Replacing should only be for situations where there is a high probability of detecting key values that are the same while the non-key values differ.
Placing data into a temp table allows the data to be massaged, transformed and modified easily before inserting it into the target table.
Deleting rows from a temp table is faster.
If anything goes wrong, the temp table gives you an additional debugging stage to find out what went wrong.
You should consider doing it all in a single transaction (note that setari is a MyISAM table, and MyISAM does not support transactions).
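The five temp-table steps can be sketched with Python's sqlite3 module as a stand-in for MySQL. SQLite has no DELETE ... JOIN, so step 3 uses a correlated EXISTS instead, and step 4 names the three columns explicitly. The stage_and_insert helper and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE setari (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    eticheta TEXT NOT NULL,
    valoare TEXT NOT NULL,
    disabled INTEGER NOT NULL DEFAULT 0
);
""")


def stage_and_insert(conn, rows):
    with conn:
        # 1. Empty temp table with the same three columns (WHERE 0 copies no rows).
        conn.execute("""CREATE TEMP TABLE IF NOT EXISTS insertsetari AS
                        SELECT eticheta, valoare, disabled FROM setari WHERE 0""")
        # 2. Load the incoming data.
        conn.executemany("INSERT INTO insertsetari VALUES (?, ?, ?)", rows)
        # 3. Drop staged rows that already exist unchanged in the target.
        conn.execute("""
            DELETE FROM insertsetari
            WHERE EXISTS (SELECT 1 FROM setari AS b
                          WHERE b.eticheta = insertsetari.eticheta
                            AND b.valoare = insertsetari.valoare
                            AND b.disabled = insertsetari.disabled)
        """)
        # 4. Insert whatever is left.
        conn.execute("""INSERT INTO setari (eticheta, valoare, disabled)
                        SELECT eticheta, valoare, disabled FROM insertsetari""")
    # 5. Clean up the temp table.
    conn.execute("DROP TABLE temp.insertsetari")


data = [("mentenanta", "0", 0), ("nr_incercari_login", "3", 0), ("timp_restrictie_login", "15", 0)]
stage_and_insert(conn, data)
stage_and_insert(conn, data)  # second run: every row matches, nothing is inserted
count = conn.execute("SELECT COUNT(*) FROM setari").fetchone()[0]
```

As in the steps above, only exact duplicates are filtered out: a setting whose valoare changed is inserted as an additional row rather than replacing the old one.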