I'm working on a database right now and I have a pretty specific problem that I'm trying to figure out:
I have a large master table in my database with all the info we're gathering. We're updating records in this master table based on Excel files returned to us by various team members across the company - all of the records have unique ID numbers so we know what fields in the master table to update. We are tracking who responds by updating the file name into the master table as well. I want to update this with the file name; however, if two sources give me the same data, I want to append the second file to the first file rather than replace it with an update.
The problem is, I need the query to "know" when to update and when to append. Is there some IF statement I can use - maybe Update when Null, Append when Not Null?
You can refer to an Excel sheet or range in a query:
INSERT INTO Table1 ( ADate )
SELECT SomeDate FROM [Excel 8.0;HDR=YES;DATABASE=Z:\Docs\Test.xls].[Sheet1$a1:a4]
WHERE SomeDate Is Not Null
This means that you can run queries based on the presence or absence of data in the Excel file.
Related
I'm trying to delete records in my target table based on whether records exist in source table. I tried using a 'Delete' step, but I noticed that this step is based on a conditional clause.
My condition is quite simple "if the record/row does NOT exist in table A [source], delete the record/row from table B [destination]".
I also read about using a 'Merge Rows (diff)' step, but it seems to check/compare the entire set of tables for differences.
The table has several million records with many hundreds of columns in a MySQL server, I need to do this in the most efficient way.
I'm doing a search of table A with the Table input object and sql command:
'' ' SELECT I went , user , password , attribute , op FROM viewuserradiusunisulma
Any help would be appreciated.
print - image screen pentaho transformation
Transformation
Delete Pentaho
if your source and target table are in the same database, you can use a SQL query to delete all records in tableB that don't have a corresponding entry in tableA:
delete tableB where not exists (select id from tableA where id = tableB.id)
if source and destination tables are not in the same database, you would have to go through all rows in tableB and check whether the record exists in tableA. If your source tableA has a limited number of rows, loading the key values in memory and then performing a stream lookup instead of a database lookup would be much faster. I'd probably try that even with higher number of rows because of the significant performance impact.
note: I hope I haven't messed up the sql syntax, I'm thinking almost exclusively in abap at the moment and that messes with my memory a bit. So please test this on some backup before firing away.
I found the solution. In this case, I check the records, then report, update and enter the new data
Trasnsformation
Hi,
I am trying to copy unique records from a database table to another table of the same name but different database. The source database contains some records that are already present in the destination database, so those I dont need, only the other ones. Database destination is called "test" and the source database is "forums". The table name is store for both cases. I am using this query:
INSERT INTO test.store (cs_key, cs_value, cs_array, cs_updated,cs_rebuild)
SELECT DISTINCT cs_key, cs_value, cs_array, cs_updated,cs_rebuild
FROM forums.store
But I am getting many errors as I try to run this query. Why?
Thank you.
Is this the right syntax:
INSERT INTO stock (Image)
SELECT Image,
FROM productimages
WHERE stock.Name_of_item = productimages.number;
SQL Server Management Studio's "Import Data" task (right-click on the DB name, then tasks) will do most of this for you. Run it from the database you want to copy the data into.
If the tables don't exist it will create them for you, but you'll probably have to recreate any indexes and such. If the tables do exist, it will append the new data by default but you can adjust that (edit mappings) so it will delete all existing data.
I use this all the time and it works fairly well.
INSERT INTO bar..tblFoobar( *fieldlist* )
SELECT *fieldlist* FROM foo..tblFoobar
This just moves the data. If you want to move the table definition (and other attributes such as permissions and indexes), you'll have to do something else.
The query logic that you are trying appears to be not correct (the query itself is buggy).
Assuming you have the correct query for the above logic and what you are trying is to insert new rows into the table stock by selecting a column from productimages table with a matching record as stock.Name_of_item = productimages.number
The above logic will add redundant data in to the table.
You perhaps looking to update instead of insert, something as -
update stock s
join productimages p on p.number = s.Name_of_item
set s.Image = p.Image
I'm having data flow from source tables to destination table. To simplify the question, I'll say there are two merge joined source tables and one destination table. Also, there are primary keys helping me identify each record
The package is running everyday, and if one record is deleted from source table, how could I know which one is deleted so that I could delete that in destination table?
(FYI~~ I've dong checking to see if a record exists in destination table and if so update else insert, but don't know how to find deleted data)
Another possible approach:
Assuming you receive all records from source, not just imports and updates:
Amend package to stamp records that have been inserted or updated using a unique id or run datetime
Following the package run, process the destination table where records weren't inserted or updated in the last package run. By a process of elimination, any records that weren't provided in the source file should be deleted.
Again, assuming that all records are sent, not just imports and updates. But then again, if you don't receive all records, it's going to be physically impossible to detect if a record has been deleted.
The problem with comparing source to destination is that you have to compare every source row to the destination in every load, and as the number of rows increases that takes up more and more time.
As a result, the best way to handle this is probably on the source side. Two common approaches are a 'soft delete' where you set a flag column to mark the row as deleted; or a trigger that records the PK of the deleted row in a log table (or moves the entire row to an archive log table). Your ETL process then looks at the flags or the log/archive table to determine which rows were deleted since the last load.
Another possibility is that the source platform offers some built-in feature you can use to track deleted rows, e.g. CDC in SQL Server. But if you have no control at all over the source database (if it even is a database) then there may be no alternative to comparing the full data set.
One possible approach:
Prior to running package, delete the destination table records (using a stored procedure)
Just import all records in to destination table
Pros:
Your destination table will always mirror the incoming data, no need to check for deletions
Cons:
You won't have any historical information (if that is required)
I had the same problem, as in how to mark my old/archive records as being "deleted" because they no longer exist in the original data source.
Basically, I built two tables, where one is the main table containing all the records that came in from the original data source, and a temporary table I kept to store the original data source every time I ran my scripts.
MAIN TABLE
ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT, etc
plus a STATUS column (1 for Active, 0 for Deleted)
TEMP TABLE same as the original, but without STATUS column
ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT, etc
The key was to update the MAIN TABLE with STATUS = 0 if the ID of the MAIN table was no longer in the Temp table. ie: The source records have been deleted.
I did it like this:
UPDATE m
SET m.Status = 0
FROM tblMAIN AS m
LEFT JOIN tblTEMP AS t
ON t.ID = m.ID
WHERE t.ID IS NULL
I have two SQL Server databases, and I need to write a script to migrate data from database A to database B. Both databases have the same schema.
I must loop through the tables and for each table I must follow those rules:
If the item I'm migrating does not exist in the target table (for example, the comparison is made on a column Name) then I insert it directly.
If the item I'm migrating exists in the target table then I need to only update certain columns (for example, only update Age and Address but do not touch other columns)
Can anyone help me with that script? Any example would suffice. Thanks a lot
EDIT:
I just need an example for one table. No need to loop, I can handle each table separately (because each table has its own comparison column and update columns)
The MERGE statement looks like it can help you here. An Example:
MERGE StudentTotalMarks AS stm
USING (SELECT StudentID,StudentName FROM StudentDetails) AS sd
ON stm.StudentID = sd.StudentID
WHEN MATCHED AND stm.StudentMarks > 250 THEN DELETE
WHEN MATCHED THEN UPDATE SET stm.StudentMarks = stm.StudentMarks + 25
WHEN NOT MATCHED THEN
INSERT(StudentID,StudentMarks)
VALUES(sd.StudentID,25);
The merge statement is available as of SQL Server 2008 so you are in luck
Instead of creating a script why don't you copy the source table under a different name into the target server (update needs to take place).
Then just do a simple insert where name does not exist.
Here is the SQL for step 1 only.
INSERT INTO [TableA]
SELECT Name,
XX,
XXXX
FROM TableB
WHERE NOT NAME IN(SELECT NAME
FROM TableA)