SSIS Package - From One Table to Infinite Tables Depending on Data

I have a simple requirement. I have a table with product names and their counts. I want to create an SSIS package to extract data from one table into an arbitrary number of tables based on Product Name.
If the table has 10 products, then the SSIS package should create 10 tables dynamically, with one product in each table.
Table Name : Products
ProductName , QuantitySold
ABC 10
xyz 15
Testing 25
Table Name : ABC
ProductName , QuantitySold
ABC 10
Table Name : XYZ
ProductName , QuantitySold
xyz 15
Table Name : Testing
ProductName , QuantitySold
Testing 25

Conceptually, you're looking at something like this:
The concept is that you will identify all the product names in the table and perform 2 tasks on each row: create the target table, if needed, then run a query against your source for that one product and load the results into the table.
Variables
I have 6 variables declared
Query_TableCreateBase is a big string that, formatted, looks like
IF NOT EXISTS
(
SELECT
*
FROM
sys.tables AS T
WHERE
T.name = '<Table/>'
)
BEGIN
CREATE TABLE dbo.<Table/>
(
ProductName varchar(30) NOT NULL
, QuantitySold int NOT NULL
);
END
I have expressions on Query_Source, Query_TableCreate and TargetTable
Query_Source expression
"SELECT ProductName, QuantitySold FROM (
VALUES
('ABC', 10)
, ('xyz', 15)
, ('Testing', 25)
) Products(ProductName, QuantitySold) WHERE ProductName = '" + @[User::ProductName] + "'"
Query_TableCreate expression
REPLACE(@[User::Query_TableCreateBase], "<Table/>", @[User::ProductName])
TargetTable expression
"[dbo].[" +#[User::ProductName] + "]"
SQL Get Rows
I simulate your Products table with a query. I load those results into a variable named RS_Product.
SELECT
ProductName
FROM
(
VALUES
('ABC', 10)
, ('xyz', 15)
, ('Testing', 25)
) Products(ProductName, QuantitySold);
FELC Shred Results
I use a Foreach Loop Container, set to process an ADO result set, and parse out the 0th column into our ProductName variable.
SQL Create Table if needed
This is a query that gets evaluated out to something like
IF NOT EXISTS
(
SELECT
*
FROM
sys.tables AS T
WHERE
T.name = 'Foo'
)
BEGIN
CREATE TABLE dbo.Foo
(
ProductName varchar(30) NOT NULL
, QuantitySold int NOT NULL
);
END
DFT Load Table
I have this set with DelayValidation = true, as the table may not exist right up until the task gets the signal to start.
Again, simulating your Products table, my query looks like
SELECT ProductName, QuantitySold FROM (
VALUES
('ABC', 10)
, ('xyz', 15)
, ('Testing', 25)
) Products(ProductName, QuantitySold) WHERE ProductName = 'Foo'
Wrapup
Strictly speaking, the data flow is not required. It could all be done through your Execute SQL Task if we pulled back all the columns in our source query.
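To sketch that alternative (a sketch only: it assumes your real source table is dbo.Products as in the question, and @ProductName stands in for the User::ProductName variable), each loop iteration could run a single dynamic statement that creates the table if needed and copies the matching row:
DECLARE @ProductName sysname = N'ABC';
DECLARE @sql nvarchar(max) = N'
IF NOT EXISTS (SELECT * FROM sys.tables AS T WHERE T.name = ' + QUOTENAME(@ProductName, N'''') + N')
BEGIN
    CREATE TABLE dbo.' + QUOTENAME(@ProductName) + N'
    (
        ProductName varchar(30) NOT NULL
        , QuantitySold int NOT NULL
    );
END
INSERT INTO dbo.' + QUOTENAME(@ProductName) + N'
(ProductName, QuantitySold)
SELECT P.ProductName, P.QuantitySold
FROM dbo.Products AS P
WHERE P.ProductName = ' + QUOTENAME(@ProductName, N'''') + N';';
-- QUOTENAME guards the dynamically built object and literal names
EXECUTE sys.sp_executesql @sql;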
Biml implementation
Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. Here, we're going to use it to describe the ETL. BIDS Helper is a free add-on for Visual Studio/BIDS/SSDT that addresses a host of shortcomings in those tools. Specifically, we're going to use its ability to transform a Biml file describing ETL into an SSIS package. This has the added benefit of providing you a mechanism for generating exactly the solution I'm describing, versus clicking through many tedious dialog boxes.
The following code assumes you have a default instance on your local machine and that within tempdb, you have a table called Foo.
use tempdb;
GO
CREATE TABLE dbo.Foo
(
ProductName varchar(30) NOT NULL
, QuantitySold int NOT NULL
);
Save the following script into a .biml file; when you add it to your SSIS project, it will show up under the Miscellaneous virtual folder. Right-click it, choose Generate SSIS Package, and it should create a package called so_27320726.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="tempdb" ConnectionString="Data Source=localhost;Initial Catalog=tempdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;" />
</Connections>
<Packages>
<Package Name="so_27320726" ConstraintMode="Parallel" >
<Variables>
<Variable Name="ProductName" DataType="String">Foo</Variable>
<Variable Name="Query_Source" DataType="String" EvaluateAsExpression="true">"SELECT ProductName, QuantitySold FROM (
VALUES
('ABC', 10)
, ('xyz', 15)
, ('Testing', 25)
) Products(ProductName, QuantitySold) WHERE ProductName = '" + @[User::ProductName] + "'"</Variable>
<Variable Name="Query_TableCreate" DataType="String" EvaluateAsExpression="true"><![CDATA[replace(#[User::Query_TableCreateBase], "<Table/>", #[User::ProductName])]]></Variable>
<Variable Name="Query_TableCreateBase" DataType="String" ><![CDATA[IF NOT EXISTS
(
SELECT
*
FROM
sys.tables AS T
WHERE
T.name = '<Table/>'
)
BEGIN
CREATE TABLE dbo.<Table/>
(
ProductName varchar(30) NOT NULL
, QuantitySold int NOT NULL
);
END]]></Variable>
<Variable Name="RS_Product" DataType="Object" />
<Variable Name="TargetTable" DataType="String" EvaluateAsExpression="true">"[dbo].[" +#[User::ProductName] + "]"</Variable>
</Variables>
<Tasks>
<ExecuteSQL Name="SQL Get Rows" ConnectionName="tempdb" ResultSet="Full">
<Variables>
<Variable Name="Variable" DataType="Int32" IncludeInDebugDump="Include">0</Variable>
</Variables>
<Results>
<Result Name="0" VariableName="User.RS_Product" />
</Results>
<DirectInput>SELECT
*
FROM
(
VALUES
('ABC', 10)
, ('xyz', 15)
, ('Testing', 25)
) Products(ProductName, QuantitySold);</DirectInput>
</ExecuteSQL>
<ForEachAdoLoop Name="FELC Shred Results" ConstraintMode="Linear" SourceVariableName="User.RS_Product">
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="SQL Get Rows.Output" SsisName="Constraint" />
</Inputs>
</PrecedenceConstraints>
<Tasks>
<ExecuteSQL Name="SQL Create Table if needed" ConnectionName="tempdb">
<VariableInput VariableName="User.Query_TableCreate" />
</ExecuteSQL>
<Dataflow Name="DFT Load Table" DelayValidation="true">
<Transformations>
<OleDbSource Name="OLE_SRC Get Data" DefaultCodePage="1252" ConnectionName="tempdb">
<VariableInput VariableName="User.Query_Source" />
</OleDbSource>
<OleDbDestination Name="OLE_DST Save data" ConnectionName="tempdb" >
<TableFromVariableOutput VariableName="User.TargetTable" />
<Columns>
<Column SourceColumn="ProductName" TargetColumn="ProductName" />
<Column SourceColumn="QuantitySold" TargetColumn="QuantitySold" />
</Columns>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
<VariableMappings>
<VariableMapping Name="0" VariableName="User.ProductName" />
</VariableMappings>
</ForEachAdoLoop>
</Tasks>
<Connections>
<Connection ConnectionName="tempdb" />
</Connections>
</Package>
</Packages>
</Biml>

Related

Query cf queryObject and insert into table

I'm passing a query object into a CFC. I can writeDump(myQryObject) and see the query object's contents, and all is good up to this point. I can write a select statement and dump a row (or rows) depending on my query - again, all good here. I now need to insert the data into a table, but I'm not getting the syntax right.
The CFC is written in CFScript.
local.blkLoadQry = new Query(); // new query object
local.blkLoadQry.setDBType("query");
local.blkLoadQry.setAttributes(sourceQuery=arguments.blkdata);
local.blkLoadQry.addParam(name="batchid",value=arguments.batchID,cfsqltype="cf_sql_varchar",maxlength="36");
local.blkLoadQry.setSQL("
INSERT INTO bulkloadtemptable (
uuid
, gradyear
, firstName
, lastName
, email
)
SELECT
:batchid
, `Graduation Year`
, `Jersey`
, `First Name`
, `Last Name`
, `Email`
FROM
bulkloadtemptable_copy
WHERE uuid = :batchid
");
Lexical error at line 10, column 17. Encountered: "`" (96), after : ""
This is the error I'm getting, but the line numbers in the error don't line up with my expectations, which is what brings me here. :batchid would be line 10.
What am I missing?
You are attempting something impossible. Your query of queries select statement runs in ColdFusion only. There is no database connection in play.
If you want to insert data from a ColdFusion query into a database, you have to loop through the rows somehow. You can have an insert query inside a loop or a loop inside an insert query. Here is sample syntax for both.
Query inside loop.
<cfoutput query="cfQueryObject">
<cfquery datasource = "aRealDatabase">
insert into table
(field1
, field2
, etc)
values
(<cfqueryparam value = "#cfQueryObject.field1#">
, <cfqueryparam value = "#cfQueryObject.field2#">
, etc
)
</cfquery>
</cfoutput>
Loop inside query
<cfquery datasource = "aRealDatabase">
insert into table
(field1
, field2
, etc)
select null
, null
, etc
from someSmallTable
where 1 = 2
<cfoutput query="cfQueryObject">
union
select <cfqueryparam value = "#cfQueryObject.field1#">
, <cfqueryparam value = "#cfQueryObject.field2#">
, etc
from someSmallTable
</cfoutput>
</cfquery>
You can experiment to see what works better in your situation.

Execution Error, return code 1 while executing query in hive for twitter sentiment analysis

I am doing Twitter sentiment analysis using Hadoop, Flume and Hive.
I have created the tables using
hive -f tweets.sql
tweets.sql
--create the tweets_raw table containing the records as received from Twitter
SET hive.support.sql11.reserved.keywords=false;
CREATE EXTERNAL TABLE Mytweets_raw (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/flume/tweets';
-- create sentiment dictionary
CREATE EXTERNAL TABLE dictionary (
type string,
length int,
word string,
pos string,
stemmed string,
polarity string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/dictionary';
-- loading data to the table dictionary
load data inpath 'data/dictionary/dictionary.tsv' INTO TABLE dictionary;
CREATE EXTERNAL TABLE time_zone_map (
time_zone string,
country string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/time_zone_map';
-- loading data to the table time_zone_map
load data inpath 'data/time_zone_map/time_zone_map.tsv' INTO TABLE time_zone_map;
-- Clean up tweets
CREATE VIEW tweets_simple AS
SELECT
id,
cast ( from_unixtime( unix_timestamp(concat( '2014 ', substring(created_at,5,15)), 'yyyy MMM dd hh:mm:ss')) as timestamp) ts,
text,
user.time_zone
FROM Mytweets_raw
;
CREATE VIEW tweets_clean AS
SELECT
id,
ts,
text,
m.country
FROM tweets_simple t LEFT OUTER JOIN time_zone_map m ON t.time_zone = m.time_zone;
-- Compute sentiment
create view l1 as select id, words from Mytweets_raw lateral view explode(sentences(lower(text))) dummy as words;
create view l2 as select id, word from l1 lateral view explode( words ) dummy as word ;
create view l3 as select
id,
l2.word,
case d.polarity
when 'negative' then -1
when 'positive' then 1
else 0 end as polarity
from l2 left outer join dictionary d on l2.word = d.word;
create table tweets_sentiment as select
id,
case
when sum( polarity ) > 0 then 'positive'
when sum( polarity ) < 0 then 'negative'
else 'neutral' end as sentiment
from l3 group by id;
-- put everything back together and re-name sentiments...
CREATE TABLE tweetsbi
AS
SELECT
t.*,
s.sentiment
FROM tweets_clean t LEFT OUTER JOIN tweets_sentiment s on t.id = s.id;
-- data with tweet counts.....
CREATE TABLE tweetsbiaggr
AS
SELECT
country,sentiment, count(sentiment) as tweet_count
FROM tweetsbi
group by country,sentiment;
-- store data for analysis......
CREATE VIEW A as select country,tweet_count as positive_response from tweetsbiaggr where sentiment='positive';
CREATE VIEW B as select country,tweet_count as negative_response from tweetsbiaggr where sentiment='negative';
CREATE VIEW C as select country,tweet_count as neutral_response from tweetsbiaggr where sentiment='neutral';
CREATE TABLE tweetcompare as select A.*,B.negative_response as negative_response,C.neutral_response as neutral_response from A join B on A.country= B.country join C on B.country=C.country;
-- permission to show data in Excel sheet for analysis ....
grant SELECT ON TABLE tweetcompare to user hue;
grant SELECT ON TABLE tweetcompare to user root;
-- for Tableau or Excel
-- UDAF sentiscore = sum(sentiment)*50 / count(sentiment)
-- context n-gram made readable
While executing the query
SELECT t.retweeted_screen_name, sum(retweets) AS total_retweets, count(*) AS tweet_count FROM (SELECT retweeted_status.user.screen_name as retweeted_screen_name, retweeted_status.text, max(retweet_count) as retweets FROM mytweets GROUP BY retweeted_status.user.screen_name, retweeted_status.text) t GROUP BY t.retweeted_screen_name ORDER BY total_retweets DESC LIMIT 10;
this error appears:
Query ID = root_20161114140028_852cb526-011f-4a25-95c8-8c6587a88759
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/tmp/e70ec3c9-14c7-41e9-ad11-2d4528057e47_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:179)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:433)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/tmp/e70ec3c9-14c7-41e9-ad11-2d4528057e47_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. File does not exist: hdfs://localhost:9000/tmp/e70ec3c9-14c7-41e9-ad11-2d4528057e47_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/lib/warehouse</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/usr/lib/warehouse/metastore_db;create=true </value>
</property>
<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>256000000</value>
</property>
<property>
<name>hive.exec.reducers.max</name>
<value>1009</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
/etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
However, even though I have added the jar file to Hive, the same error appears:
ADD JAR file:///usr/lib/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
Please help me fix this.
The stack trace shows Hive resolving the jar against an hdfs:// path, so the jar has to exist in HDFS, not just on the local file system. Try:
hadoop fs -put /usr/lib/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar hdfs://localhost:9000/usr/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar
ADD JAR hdfs://localhost:9000/usr/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
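If you want to verify the fix took, check that the jar actually landed in HDFS and that it is registered in the Hive session (paths are the same assumed ones as above; LIST JARS shows the resources added to the current session):
hadoop fs -ls hdfs://localhost:9000/usr/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar
and, in Hive after the ADD JAR:
LIST JARS;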

Quantity in stock in stock locations in Exact Online

Using the following query, I found that for items that have a stock location, there are multiple rows returned from the REST API StockLocations of Exact Online:
select spn.item_code_attr || '-' || spn.warehouse_code_attr || '-' || stn.code key
, itm.itemgroupcode
, itm.itemgroupdescription
, spn.item_code_attr
, spn.item_description
, spn.currentquantity
, spn.planning_in
, spn.planning_out
, spn.currentquantity + spn.planning_in - spn.planning_out plannedquantity
, -1 bestelniveau /* out of scope */
, itm.costpricestandard costprijs
, itm.costpricestandard * spn.currentquantity stockvalue
, spn.warehouse_code_attr
, stn.code locatie
, itm.unitcode UOM
, itm.id
, whe.id
, sln.stock
, sln.itemid
, sln.warehouse
, stn.id
from exactonlinexml..StockPositions spn
join exactonlinerest..items itm
on itm.code = spn.item_code_attr
and itm.code = 'LE-10242'
and itm.isstockitem = 1
join exactonlinerest..warehouses whe
on whe.code = spn.warehouse_code_attr
left
outer
join exactonlinerest..stocklocations sln
on sln.itemid = itm.id
and sln.stock != 0
and sln.warehouse = whe.id
left
outer
join storagelocations stn
on stn.id = sln.storagelocation
and stn.warehouse = sln.warehouse
--
-- Filter out no stock nor planned.
--
where ( spn.currentquantity !=0
or
spn.planning_in != 0
or
spn.planning_out != 0
)
and spn.item_code_attr = 'LE-10242'
order
by key
For example, for this item, there are 10 StockLocations. When I sum the field Stock, it returns the stock quantity found in StockPositions. However, it seems that every transaction creates an additional StockLocation entry.
I would expect StockLocations to contain, per stock location, the total quantity to be found there.
EDIT
The StockLocations API is described in https://start.exactonline.nl/api/v1/{division}/logistics/$metadata as:
<EntityType Name="StockLocation">
<Key>
<PropertyRef Name="ItemID"/>
</Key>
<Property Name="ItemID" Type="Edm.Guid" Nullable="false"/>
<Property Name="Warehouse" Type="Edm.Guid" Nullable="true"/>
<Property Name="WarehouseCode" Type="Edm.String" Nullable="true"/>
<Property Name="WarehouseDescription" Type="Edm.String" Nullable="true"/>
<Property Name="Stock" Type="Edm.Double" Nullable="true"/>
<Property Name="StorageLocation" Type="Edm.Guid" Nullable="true"/>
<Property Name="StorageLocationCode" Type="Edm.String" Nullable="true"/>
<Property Name="StorageLocationDescription" Type="Edm.String" Nullable="true"/>
</EntityType>
Somehow it is not documented at https://start.exactonline.nl/docs/HlpRestAPIResources.aspx
What am I doing wrong?
I discussed this question at a Hackathon with an engineer. This is how the StockLocations API works; the naming does not optimally reflect the contents, but it is intended behaviour.
With a select field, sum(stock) from stocklocations group by field, you can get the right information.
To improve join performance, it is recommended to put this aggregation in an inline view, such as select ... from table1 join table2 ... join ( select field, sum(stock) from stocklocations group by field ).
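A sketch of that inline-view approach, reusing the table and alias names from the query in the question (the grouping columns here are an assumption; group by whatever level you need):
select itm.code
, stn.code locatie
, sln.stock
from exactonlinerest..items itm
join ( select sl.itemid
       , sl.warehouse
       , sl.storagelocation
       , sum(sl.stock) stock
       from exactonlinerest..stocklocations sl
       group by sl.itemid
       , sl.warehouse
       , sl.storagelocation
     ) sln
on sln.itemid = itm.id
left outer join storagelocations stn
on stn.id = sln.storagelocation
and stn.warehouse = sln.warehouse
where itm.code = 'LE-10242'
This way each stock location appears once, carrying the summed quantity, instead of once per transaction.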

SSIS 2008 R2 OnProgress logging for Execute SQL tasks truncates SQL statements

My Execute SQL Tasks are configured to log to the automatically created dbo.sysssislog table. I'd really like to log the full text of any SQL statements in the 2K message column, but SSIS is only recording the first 40-ish characters. Is there a way to override that truncation? Or am I trying to solve the wrong problem, in that there's a better way to accomplish my goal?
I created a new (simple) package to demonstrate my issue ...
Expression in Execute SQL task
Logging OnProgress to the default SSIS generated table via SSIS generated proc
The truncated data in the table (note ellipsis)
What you are seeing is an artifact of how the default OnProgress logger is built (and something I never noticed). Why it is structured as it is would be a question for Microsoft.
If you explicitly invoke sp_ssis_addlogentry then it will correctly log text up to nvarchar(2048). However, if you log the OnProgress event, the text that is passed to the stored proc has been truncated by the inner workings of the event logger.
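For reference, a minimal direct call looks something like this (the parameter list mirrors the stored procedure's signature as used later in this answer; all values are placeholders):
EXECUTE dbo.sp_ssis_addlogentry
@event = N'UserDefined'
, @computer = N'MyMachine'
, @operator = N'MyDomain\MyUser'
, @source = N'My Task'
, @sourceid = '00000000-0000-0000-0000-000000000000'
, @executionid = '00000000-0000-0000-0000-000000000000'
, @starttime = '2015-01-23 10:28:38'
, @endtime = '2015-01-23 10:28:38'
, @datacode = 0
, @databytes = NULL
, @message = N'The full text of your SQL statement survives here, up to 2048 characters.';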
Two notes of caution on your supplied screenshot.
You're building a string within a component's Expression Builder. I strongly caution against that, for the simple fact that if you have an incorrect expression in there, you have no capacity to debug it. Set a breakpoint on the object, great; you still can't inspect the value of what you've built, as it is internal to the object. Instead, I find I have better success doing all the string building within a Variable, so that my Task's expression is simply @[User::MyVariable]
Prior to the 2012 release, there is a 4k character limit on expressions. You can initialize and assign strings greater than that length but if you attempt to use an expression to do so, it doesn't work. I can't recall now whether it truncates or flat out fails.
On with the tl;dr.
Round 1, Script approach
I created two SSIS Variables: Query and QueryLength.
The Value of Query is the result of the following query, which I ran in SSMS
SELECT REPLICATE('SELECT 100 AS col UNION ALL ', 70) + ' SELECT 0'
QueryLength is an Integer value using the following expression: LEN(@[User::Query]). It should have a value of 1968.
I added a Script Task to my package and used the following code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
namespace ST_b3643f349be14c7a9d004a35aaa0422e
{
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
public void Main()
{
// User::Query,User::QueryLength
string query = this.Dts.Variables["User::Query"].Value.ToString();
string queryLength = this.Dts.Variables["User::QueryLength"].Value.ToString();
bool fireAgain = false;
string message = string.Format("Query => {0}", query);
this.Dts.Events.FireInformation(0, "Query", message, string.Empty, 0, ref fireAgain);
this.Dts.Events.FireProgress(message, 50, 0, 1, "query", ref fireAgain);
message = string.Format("QueryLength => {0}", queryLength);
this.Dts.Events.FireInformation(0, "QueryLength", message, string.Empty, 0, ref fireAgain);
this.Dts.Events.FireProgress(message, 50, 0, 1, "queryLength", ref fireAgain);
Dts.TaskResult = (int)ScriptResults.Success;
}
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
}
}
Finally, I turned on Logging to SQL Server and logged OnInformation & OnProgress events
I executed the package and then ran the following query
USE msdb
SELECT
S.source
, S.event
, LEN(S.message) AS lenMsg
, S.message
FROM
dbo.sysssislog AS S
WHERE
S.event = 'OnInformation'
OR S.event = 'OnProgress';
As you can see, it records more than 40ish characters.
Round 2, Execute SQL Task approach
Based on the comments below, I reworked the package to use an Execute SQL Task that invokes sp_ssis_addlogentry. I'm still getting the full message logged to the message column of dbo.sysssislog.
The following Biml represents my package. Install BIDS Helper, add an empty Biml file, and substitute this. Correct the Data Source value in line 3, click generate, and whoosh, a working package.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=msdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;" />
</Connections>
<Packages>
<Package ConstraintMode="Linear" Name="so_28101525">
<Variables>
<Variable DataType="String" Name="Query">SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 100 AS col UNION ALL SELECT 0</Variable>
<Variable DataType="Int32" Name="QueryLength" EvaluateAsExpression="true">LEN(#[User::Query])</Variable>
<Variable DataType="String" Name="EventType" Namespace="Log">OnProgress</Variable>
<Variable DataType="String" Name="Source" Namespace="Log">OnProgress</Variable>
</Variables>
<Tasks>
<ExecuteSQL ConnectionName="CM_OLE" Name="Log it">
<DirectInput>EXECUTE dbo.sp_ssis_addlogentry
@event = ?
, @computer = ?
, @operator = ?
, @source = ?
, @sourceid = ?
, @executionid = ?
, @starttime = ?
, @endtime = ?
, @datacode = 0
, @databytes = NULL
, @message = ?;
</DirectInput>
</DirectInput>
<Parameters>
<Parameter DataType="String" VariableName="Log.EventType" Name="0" />
<Parameter DataType="String" VariableName="System.MachineName" Name="1" />
<Parameter DataType="String" VariableName="System.UserName" Name="2" />
<Parameter DataType="String" VariableName="Log.Source" Name="3" />
<Parameter DataType="String" VariableName="System.ExecutionInstanceGUID" Name="4" />
<Parameter DataType="String" VariableName="System.ExecutionInstanceGUID" Name="5" />
<Parameter DataType="String" VariableName="System.ContainerStartTime" Name="6" />
<Parameter DataType="String" VariableName="System.ContainerStartTime" Name="7" />
<Parameter DataType="String" VariableName="User.Query" Name="8" />
</Parameters>
</ExecuteSQL>
<ExecuteSQL ConnectionName="CM_OLE" Name="Run Query">
<VariableInput VariableName="User.Query" />
</ExecuteSQL>
</Tasks>
<LogProviders>
<SqlServerLogProvider ConnectionName="CM_OLE" Name="SQL Log Provider" />
</LogProviders>
<LogEvents>
<LogEvent EventName="OnProgress"></LogEvent>
</LogEvents>
</Package>
</Packages>
</Biml>
That results in this package being generated.
The results of running a profiler trace while the package runs
exec sp_executesql
N'exec sp_ssis_addlogentry @P1, @P2, @P3, @P4, @P5, @P6, @P7, @P8, @P9, @P10, @P11'
, N'@P1 nvarchar(4000),@P2 nvarchar(4000),@P3 nvarchar(4000),@P4 nvarchar(4000),@P5 uniqueidentifier,@P6 uniqueidentifier,@P7 datetime2(7),@P8 datetime2(7),@P9 int,@P10 varbinary(8000),@P11 nvarchar(4000)'
, N'OnProgress'
, N'Rohan'
, N'home\billinkc'
, N'Log it'
, '795A8317-110B-423E-BFD3-2E90AB021D53'
, 'DE260D62-A2BC-4C19-A48D-CA6526F1B3EC'
, '2015-01-23 10:28:38'
, '2015-01-23 10:28:38'
, 100
, 0x
, N'Executing query "EXECUTE dbo.sp_ssis_addlogentry
@event = ?
, ...".'
This stands in contrast to the manually generated call to sp_ssis_addlogentry, which receives the full query text.

How to Add the Result Set from a T-SQL Statement to a Data Flow?

I have a simple SSIS package, and I'd like to complicate it a little.
Right now, it executes a stored procedure in an OLE DB Source, and adds the rows returned from the stored procedure to the data flow. Then, for each row returned, it executes an OLE DB Command transform, executing a second stored procedure (in a second database), passing the columns from the source as parameters.
The second stored procedure performs a synchronization function, and I would like to log the grand total number of adds, deletes and updates. The "sync" stored procedure uses the OUTPUT clause of a MERGE statement to get this data and return it as a resultset.
I don't see a way to get this resultset out of the OLE DB Command transform. It does not allow me to add output columns.
Short of adding a Script Transform, is there a way for me to log the grand total of the add, delete and update columns?
This is not as straightforward as it ought to be. That, or I need to go back to SSIS class.
The OLE DB Command component can't add new rows to the dataflow, as it's a synchronous component.
It also cannot add new columns to the data flow. That's the first thing that was non-intuitive. So you'll see in my source, I have added an ActionName column of type nvarchar(10)/string length of 10. You could add the column in a Derived Column Transformation prior to the OLE DB Command component if you so wish.
Since I can't add rows to the data flow, that means I'm only able to use an OUTPUT parameter for my proc instead of using the recordset it could generate. Perhaps your stored procedure only allows one row to be altered at a time and this is fine, but it has a general code smell to me.
Table definition and set up
CREATE TABLE dbo.so_27932430
(
SourceId int NOT NULL
, SourceValue varchar(20) NOT NULL
);
GO
INSERT INTO
dbo.so_27932430
(SourceId, SourceValue)
VALUES
(1, 'No change')
, (3,'Changed');
Stored Proc
CREATE PROCEDURE
dbo.merge_27932430
(
@SourceId int
, @SourceValue varchar(20)
, @ActionName nvarchar(10) OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE
@BloodyHack table
(
ActionName nvarchar(10) NOT NULL
, SourceId int NOT NULL
);
MERGE
dbo.so_27932430 AS T
USING
(
SELECT
D.SourceId
, D.SourceValue
FROM
(
SELECT @SourceId, @SourceValue
) D(SourceId, SourceValue)
) AS S
ON
(
T.SourceId = S.SourceId
)
WHEN
MATCHED
AND T.SourceValue <> S.SourceValue
THEN
UPDATE
SET
T.SourceValue = S.SourceValue
WHEN
NOT MATCHED THEN
INSERT
(
SourceId
, SourceValue
)
VALUES
(
SourceId
, SourceValue
)
OUTPUT
$action, S.SourceId
INTO
@BloodyHack;
/* Pick one, any one */
SELECT
@ActionName = BH.ActionName
FROM
@BloodyHack AS BH
END
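As an aside, if what you ultimately need is the grand total of inserts and updates rather than a per-row action, one variation on the proc above (the @InsertCount and @UpdateCount names are my invention, not part of the original) is to total up the OUTPUT rows per action:
-- @BloodyHack is the table variable populated by the MERGE ... OUTPUT above
DECLARE @InsertCount int, @UpdateCount int;
SELECT
@InsertCount = SUM(CASE WHEN BH.ActionName = N'INSERT' THEN 1 ELSE 0 END)
, @UpdateCount = SUM(CASE WHEN BH.ActionName = N'UPDATE' THEN 1 ELSE 0 END)
FROM
@BloodyHack AS BH;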
Source Query
SELECT
D.SourceId
, D.SourceValue
, CAST(NULL AS nvarchar(10)) AS ActionName
FROM
(
VALUES
(1, 'No change')
, (2, 'I am new')
, (3,'I Changed')
) D(SourceId, SourceValue);
OLE DB Command setup
EXECUTE dbo.merge_27932430 @SourceId = ?, @SourceValue = ?, @ActionName = ? OUTPUT;
Results
References
OUTPUT clause
Biml
Assuming you have the free BIDS Helper add-in, the following Biml was used to generate this package.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;" />
</Connections>
<Packages>
<Package ConstraintMode="Linear" Name="so_27932430">
<Variables>
<Variable DataType="String" Name="QuerySource">
<![CDATA[SELECT
D.SourceId
, D.SourceValue
, CAST(NULL AS nvarchar(10)) AS ActionName
FROM
(
VALUES
(1, 'No change')
, (2, 'I am new')
, (3,'I Changed')
) D(SourceId, SourceValue);
]]></Variable>
<Variable DataType="String" Name="QueryCommand">EXECUTE dbo.merge_27932430 #SourceId = ?, #SourceValue = ?, #ActionName = ? OUTPUT;</Variable>
</Variables>
<Tasks>
<Dataflow Name="DFT OLEDB Test">
<Transformations>
<OleDbSource ConnectionName="CM_OLE" Name="OLESRC GenData">
<VariableInput VariableName="User.QuerySource" />
</OleDbSource>
<OleDbCommand ConnectionName="CM_OLE" Name="OLECMD Test">
<DirectInput>EXECUTE dbo.merge_27932430 @SourceId = ?, @SourceValue = ?, @ActionName = ? OUTPUT;</DirectInput>
<Parameters>
<Parameter SourceColumn="SourceId" DataType="Int32" TargetColumn="#SourceId"></Parameter>
<Parameter SourceColumn="SourceValue" DataType="AnsiString" Length="20" TargetColumn="#SourceValue"></Parameter>
<Parameter SourceColumn="ActionName" DataType="String" Length="10" TargetColumn="#ActionName"></Parameter>
</Parameters>
</OleDbCommand>
<DerivedColumns Name="DER PlaceHolder" />
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>