So I have a script:
select *
into my_archive_05302013
from archive_A
where my_Date = '05/18/2013'
and:
insert into archive_B (ID,my_date,field_n )
select ID, my_Date,field_n from my_archive_05302013
where the n in field_n is about 100 or so. So, in other words, there are more than 100 columns in the table that I am loading.
Both run pretty fast; the query inserts about 200,000 records. There is a non-clustered index on my_date in table archive_A.
Now when I create a dataflow using SSIS 2008 it takes HOURS to complete
I have the following in my OLE DB source:
SELECT * FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) from Archive_A)
and for OLE DB Destination:
Data access mode of: "Table or view - fast load"
Name of the table: archive_B
Table lock and Check constraints are checked.
Anyone know what the problem could be?
Thanks in advance
The problem is that, because you are using a data source and a data destination, you are pulling all of the data out of the database only to put it all back in again, whereas your INSERT statement keeps everything contained within the database. Use an Execute SQL Task with your INSERT statement instead.
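For illustration, a minimal sketch of the statement that Execute SQL Task could run, reusing the tables and the MAX(My_Date) filter from the question (the column list is abbreviated to the columns shown above and would need to be extended to all ~100 columns):
-- Set-based INSERT that never leaves the database engine
INSERT INTO archive_B (ID, my_date, field_n)
SELECT ID, My_Date, field_n
FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) FROM Archive_A);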
Related
I have a data flow task in which I am reading data from SQL Server table A (id, order_no, amount).
I want to join this result to table B (order_no, amount), located on another SQL Server, ON tableA.order_no = tableB.order_no, add the two amounts together, and store the result back into table B.
I have connection managers set up for both SQL Server databases.
Which transform can I use to perform this operation?
This dtsx design may help:
Create a temp table in server B's tempdb via an Execute SQL Task.
Load table A's data into that temp table via a Data Flow Task.
Create another Execute SQL Task to join table B to the newly created temp table on order_no and compute tableB.amount + temptable.amount as an [Added Amounts] column (see the sketch below).
You did not specify whether the new column is needed in table B or whether table B's amounts should be updated; the code would be adjusted accordingly.
Overall, this process keeps the database overhead to a minimum.
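A rough sketch of the two Execute SQL Task statements, assuming a global temp table (with RetainSameConnection enabled on the connection manager so it stays visible across tasks) and assuming the update variant; adjust per the note above:
-- Execute SQL Task 1: staging temp table on server B, loaded by the Data Flow Task
CREATE TABLE ##tableA_staging (order_no INT, amount DECIMAL(18, 2));
-- Execute SQL Task 2: add the staged amounts onto table B
UPDATE b
SET b.amount = b.amount + s.amount
FROM tableB AS b
JOIN ##tableA_staging AS s
  ON s.order_no = b.order_no;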
I need to create an SSIS package in which I am reading a flat file (provided monthly, with many defined columns) and writing the data to an already defined SQL Server table (which already contains a lot of data). In the SQL table design view, the data types include float, datetime, bigint, and varchar (these are already defined and CANNOT be changed).
I need to prevent the insert of any data rows from the flat file that already exist in the SQL Server table. How can I achieve this?
I tried to achieve this using a Lookup transformation, but in Edit Mappings I get an error while creating the relationships: "Cannot map the lookup column because the column is set to a floating point data type". I am able to create the relationships for all other data types, but there are some rows in the source file which differ from the data in the SQL table in their floating point values only, and the expectation is that those rows will be inserted.
Is there any other simple way to achieve this?
Thanks.
Please try to convert the columns that have a mapping problem by using a Data Conversion transformation.
Thanks
Neither SSIS nor SQL bulk load (the SQL feature behind the SSIS load task) permits this out of the box.
You can use the method described by #sasi and, in your lookup, define the SQL query yourself with a SQL cast (the CONVERT keyword). But even if you solve your cast issue this way, you will surely face a performance problem if you load a large amount of data.
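For illustration only, a lookup reference query along those lines might cast the float column to a fixed-precision type on the SQL side (the table and column names here are placeholders, not from the question):
-- Lookup query with an explicit cast so the column can be mapped
SELECT business_key,
       CONVERT(DECIMAL(18, 6), float_value_column) AS float_value_column
FROM dbo.target_table;
You would then convert the flat-file column to the matching DT_NUMERIC type with a Data Conversion transformation before the lookup.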
There are two ways to deal with it:
The first (the easiest, but quite slow compared to the other option, maybe even slower than your current solution in some conditions) is to run an INSERT statement as a SQL command for each row, like the following:
INSERT target_table (val1, val2, id)
SELECT $myVal1, $myVal2, $myCandidateKey
WHERE NOT EXISTS (SELECT 1 FROM target_table as t WHERE t.id = $myCandidateKey);
The second implies the creation of a staging table in the target database. This table has the same structure as your target table and is created once and for all. You must also create an index on whatever key determines whether a record has already been loaded. Your process empties the staging table before every execution, for obvious reasons. Instead of loading the target table with SSIS, you load this staging table. Once the staging table is loaded, you run the following command just once:
INSERT target_table (val1, val2, id)
SELECT stg.val1, stg.val2, stg.id
FROM staging_target_table as stg
WHERE NOT EXISTS (SELECT 1 FROM target_table as t WHERE t.id = stg.id);
This is extremely fast, compared to the first solution.
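A minimal sketch of that one-time setup and the per-run cleanup, reusing the names and the narrow column list from the example queries (column types are assumptions; your real table would carry all the columns of the target):
-- One-time: staging table with the same structure as the target, plus an index on the key
CREATE TABLE staging_target_table (val1 VARCHAR(50), val2 VARCHAR(50), id INT);
CREATE INDEX IX_staging_target_table_id ON staging_target_table (id);
-- Before each SSIS run: empty the staging table
TRUNCATE TABLE staging_target_table;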
In this case, I assumed that what allows you to recognize a row is a key (the "id" column), but if you actually want to compare the full row, you will have to extend the comparison like this for the first solution:
INSERT target_table (val1, val2)
SELECT $myVal1, $myVal2
WHERE NOT EXISTS (SELECT 1 FROM target_table as t WHERE t.val1 = $myVal1 and t.val2 = $myVal2);
or like this for the second solution:
INSERT target_table (val1, val2, id)
SELECT stg.val1, stg.val2, stg.id
FROM staging_target_table as stg
WHERE NOT EXISTS (SELECT 1 FROM target_table as t WHERE t.val1 = stg.val1 and t.val2 = stg.val2);
I used to run this command to insert some rows in a counter table:
insert into `monthly_aggregated_table`
select year(r.created_at), month(r.created_at), count(r.id) from
raw_items r
group by 1,2;
This query is very heavy and takes some time to run (millions of rows), and the raw_items table is MyISAM, so it was causing table locking and writes to it had to wait for the insert to finish.
Now I created a slave server to do the SELECT.
What I would like to do is to execute the SELECT in the slave, but get the results and insert into the master database. Is it possible? How? What is the most efficient way to do this? (The insert used to have 1.3 million rows)
I am running MariaDB 10.0.17
You will have to split the action into 2 parts, with a programming language like Java or PHP in between.
First the SELECT, then load the result set into your application, and then insert the data.
Another optimization you could do to speed up the SELECT is to add a new column "ym_created_at" to your table, containing a concatenation of year(created_at) and month(created_at). Place an index on that column and then run the updated statement:
insert into `monthly_aggregated_table`
select ym_created_at, count(r.id) from
raw_items r
group by 1;
Easier, and it might be a lot quicker, since no functions are acting on the columns you are grouping by.
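A hedged sketch of that setup, assuming the concatenated value is stored as a 6-character year+month string (the column type, backfill, and index name are assumptions):
-- Add, backfill, and index the pre-computed year+month column
ALTER TABLE raw_items ADD COLUMN ym_created_at CHAR(6);
UPDATE raw_items SET ym_created_at = DATE_FORMAT(created_at, '%Y%m');
ALTER TABLE raw_items ADD INDEX idx_ym_created_at (ym_created_at);
New rows would need ym_created_at populated on insert (via the application or a trigger) for the index to stay useful.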
I am trying to generate some data as follows (everything is done in MySQL Workbench 6.0.8.11354 build 833):
What I have:
users table (~9.000.000 entries):
SUBSCRIPTION, NUMBER, STATUS, CODE, TEST1, TEST2, TEST3, MANUFACTURER, TYPE, PROFILE
text (3 options), number, text (3 options), number, yes/no, yes/no, yes/no, number (6 options), text (50 options), text (30 options)
What I need:
stats1 table (data that I want and I need to create):
PROFILE,TYPE,SUBSCRIPTION,STATUS,MANUFACTURER,TEST1YES,TEST1NO,TEST2YES,TEST2NO,TEST3YES,TEST3NO
profile1,type1,subscription1,status1,man1,count,count,count,count,count,count
profile1,type2,subscription2,status2,man2,count,count,count,count,count,count
Each PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER combination is unique.
What I did so far:
Created the stats1 table
Executed the following query in order to populate the table (I ended up with ~500 distinct entries):
insert into stats1 (PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER)
select DISTINCT users.PROFILE, users.TYPE, users.SUBSCRIPTION, users.STATUS, users.MANUFACTURER
from users;
Executed the following script to count the values for TEST1YES, for each of the ~500 entries:
update stats1
SET TEST1YES = (
  select count(*) from users
  where users.TEST1 = 'yes'
    and users.PROFILE = stats1.PROFILE
    and users.TYPE = stats1.TYPE
    and users.SUBSCRIPTION = stats1.SUBSCRIPTION
    and users.STATUS = stats1.STATUS
    and users.MANUFACTURER = stats1.MANUFACTURER
);
I receive the following error in Workbench:
Error Code: 2013. Lost connection to MySQL server during query 600.573 sec
This is a known bug in Workbench and, even so, the server continues to execute the query.
However, the query runs for more than 70 minutes in the background (as I saw in client connections / management) and I need to run 5 more queries like this, for the rest of the columns.
Is there a better / faster / more efficient way for performing the count for those 6 columns in stats1 table?
I have to migrate records from a table in Oracle to SQL Server 2008 R2. I already designed the solution that allows me to move the data and save a copy of the migrated IDs into a stage table.
Thanks to a Lookup component and the stage table, I can avoid duplicates, since the migration is done at several different moments and the migrated objects do not follow a specific sequential order.
Below is my SSIS schema:
I use an expression and two variables to gather data from Oracle in slots:
SELECT *
FROM ORDERS
WHERE OrderID > [#User::Start] AND OrderID <= [#User::End]
AND STATUS <> 'Open'
When all the orders that are not in status "Open" have been migrated, we will migrate the remaining delta. To do this I need to look up the already migrated data in Stage. So the query for the data source will be:
SELECT *
FROM ORDERS
WHERE OrderID NOT IN ([#User::AlreadyMigratedIDs])
What I need is to be able to store in the variable "AlreadyMigratedIDs" all the IDs present in table Stage.
How would it be possible to use the information in the stage table (on SQL Server) as a condition for the query used in the ADO.NET component to gather the source data from Oracle? Can I use any other SSIS component, like a Lookup, before the ADO.NET object?
Use an Execute SQL Task before your Data Flow to store the values of User::Start and User::End from the SQL Server staging table, using two variables:
Name Data Type
StartID int
EndID int
Set the ResultSet to Single Row.
As a sample, I have taken the following query from AdventureWorks 2008 R2:
Select max([BusinessEntityID]) as StartID
,max([DepartmentID]) as EndID
FROM [AdventureWorks2008R2].[HumanResources].[EmployeeDepartmentHistory]
Change the above query to match your needs and get the ID values from your staging table.
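For instance, against the staging table of already migrated orders it might look something like this (the column name and the slot size are assumptions):
-- Derive the next slot from the IDs already copied to Stage
SELECT MAX(MigratedOrderID) AS StartID,
       MAX(MigratedOrderID) + 50000 AS EndID  -- slot size is an assumption
FROM Stage;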
In the Result Set tab, map the results to the variables created.
Now use the 2 variables in your Oracle query:
SELECT *
FROM ORDERS
WHERE OrderID > [#User::StartID] AND OrderID <= [#User::EndID]
AND STATUS <> 'Open'