Disclaimer: new to SSIS and Active Directory
I have a need to extract all users within a particular Active Directory (AD) domain and import them into Excel. I have followed this: https://www.itnota.com/query-ldap-in-visual-studio-ssis/ in order to create my SSIS package. My SQL is:
LDAP://DC=JOHN,DC=JANE,DC=DOE;(&(objectCategory=person)(objectClass=user)(name=a*));Name,sAMAccountName
As you know there is a 1,000 row limit when pulling from the AD. In my SQL I currently have (name=a*) to test the process and it works. I need to know how to setup a loop with variables to pull all records and import into Excel (or whatever you experts recommend). Also, how do I know what the other field names are that are available to pull?
Thanks in advance.
How do I see what's in Active Directory
Tool recommendations are off topic for the site but a tool that you can download, no install required, is AD Explorer It's a MS tool that allows you to view your domain. Highly recommend people that need to see what's in AD use something like this as it shows you your basic structure.
What's my domain controller?
Start -> Command Prompt
Type set | find /i "userdnsdomain" and look for USERDNSDOMAIN and put that value in the connect dialog and I save it because I don't want to enter this every time.
Search/Find and then look yourself up. Here I'm going to find my account by using my sAMAccountName
The search results show only one user but there could have been multiples since I did a contains relationship.
Double clicking the value in the bottom results section causes the under pane window to update with the details of the search result.
This is nice because while the right side shows all the properties associated to my account, it's also updated the left pane to navigate to the CN. In my case it's CN=Users but again, it could be something else in your specific environment.
You might discover an interesting categorization for your particular domain. At a very large client, I discovered that my target users were all under a CN
(Canonical Name, I think) so I could use that in my AD query.
There are things you'll see here that you sure would like to bring into a data flow but you won't be able to. Like the memberOf that's a complex type and there's no equivalent in the data flow data types for it. I think Integer8 is also something that didn't work.
Loop the loop
The "trick" here is that we'll need to take advantage of the
The name of the AD provider has changed since I last looked at this. In VS 2017, I see the OLE DB Provider name as "OLE DB Provider for Microsoft Directory Service"
Put in your query and you should get results back. Let that happen so the metadata is set.
An ADO.NET source does not support parameterization as the OLE DB does. However, you can apply an Expression on the Data Flow which surfaces the component and that's what we'll do.
Click out of the Data Flow and back into the Control Flow and right click on the Data Flow and select Properties. In that properties window, find Expressions and click the ellipses ... Up pops the Property Expressions Editor
Find the ADO.NET source under Property and in the Expressions section, click the Ellipses.
Here, we'll use your same source query just to prove we're doing the right things
"LDAP://DC=JOHN,DC=JANE,DC=DOE;(&(objectCategory=person)(objectClass=user)(name=" + "a" + "*));Name,sAMAccountName"
We're doing string building here so the problem we're left to solve is how we can substitute something for the "a" in the above query.
The laziest route would be to
Create an SSIS variable of type String called CurrentLetter and initialize it to a
Update the expression we just created to be "LDAP://DC=JOHN,DC=JANE,DC=DOE;(&(objectCategory=person)(objectClass=user)(name=" + #[USer::CurrentLetter] + "*));Name,sAMAccountName"
Add a Foreach Loop Container (FELC) to your Control Flow.
Configure the FELC with an enumerator of "Foreach Item Enumerator"
Click the Columns...
Click Add (this results in Column 0 with data type String) so click OK
Fill the collection with each letter of the alphabet
In the Variable Mappings tab, assign Variable User::CurrentLetter to Index 0
Click OK
Old blog posts on the matter because I like clicks
https://billfellows.blogspot.com/2011/04/active-directory-ssis-data-source.html
http://billfellows.blogspot.com/2013/11/biml-active-directory-ssis-data-source.html
Related
TL;DR: A data flow task "works" when the package runs, but I am unable to view the query when clicking "preview" in the Data flow Source Assistant.
My working theory is that somehow the Source Assistant is unable to get the value of the parameter when previewing. I tried to resolve this problem by use of (iterations) "Parameter" and "Variable" without success. The example shown below use "Parameter".
In an SSIS Data Flow task, I have a Source Assistant with "SQL Command" as its data access mode.
SELECT h.Campus,
h.StudentName,
h.StudentId,
h.EntryDate
FROM dbo.RwsBaseEnrollmentHistory h
WHERE h.Campus = ?;
The question mark indicates a parameter, which I have defined in "Parameters".
In the OLE DB Source Editor, I made sure to map the variable to the parameter.
When I click "Preview" I get an error:
The data in question cannot be shared, but trust me when I say that when I run the package, the query works as intended. It's the inability to preview the query that I'd like to be able to fix.
What do I need to do so that I can preview the query without having to hard-code the parameter value?
I stumbled into a possible answer. It appears to work so far and I haven't been able to break it. If there is a better way of doing this I would love feedback.
This is my resolution:
Create a variable to hold the desired query (like shown in the example above). In this example, I name it EnrollmentHistory.
In the "Expression", click the [...] button and paste the query. Be sure that double-quotes surround the query.
"SELECT
h.Campus,
h.StudentName,
h.StudentId,
h.EntryDate
FROM dbo.RwsBaseEnrollmentHistory h
WHERE h.Campus = '" + # [User::Campus] + "'"
Press the "Evaluate Expression" button and press OK.
Go back to your Source Assistant and select "SQL Command from Variable" as the Data Access Mode and the Variable you just defined (in this case EnrollmentHistory) as the Variable name. You will see that the parameter is correctly resolved in the Variable Value.
There is an existing Access program that I need to learn more about. There is a button in the program that, when pressed, outputs an Excel file. It queries some tables within it for the data.
How would I view the raw SQL code that this button utilizes to generate this Excel file? From the button's properties, I can see that its On Click event is "[Embedded Macro]". Also, it's object type is "Query".
Thank you in advance.
In the On Click row (under the Event tab in Properties), when I click on the ellipse next to "[Embedded Macro]", I am presented with an expandable section containing these rows:
Object Type - Query
Object Name - Inv File Query
Output Format - Excel Workbook (*.xlsx)
Output File - (blank)
Auto Start - No
Template File - (blank)
Encoding - (blank)
Output Quality - Print
I don't see any logic (SQL, VBA, etc.) listed anywhere, however, as to how Access will construct this Excel file that it exports. How would I view this logic? I figure this logic has to be stored somewhere because the button does actually return an Excel file populated with data.
You have the name of the query used here.
The SQL thus used is in a query called Inv File Query.
So you need to display the query objects in the nav pane (assuming 2007 onwards).
So choose this:
And then all of the queries used in the application will display. If the query Inv File Query does not show, it may be hidden. (to be fair, I would choose All Access objects - but for this we choose query). And hit F11 if the nav pane does not show.
If for some reason you STILL do not see the query, then you want to display hidden objects.
To display all hidden objects then right click on the top part of the nav pane (on the query) like this:
Then:
Turn all of the options, and now you should be able to see/view the SQL used for the query called Inv File Query
Open the form in Design view
Open the Properties sheet by right-clicking on the button and choosing "Properties".
In the On Click event, where you see "[Embedded Macro]", click on the ellipse on the far right
As a pointer, once you've figured out what's going on, I'd do my best to convert it all to VBA code. This will eliminate confusion in the future, and macros are really the poor man's way out of writing code anyway. I never, ever use macros in my Access apps.
Problem.
I regularly receive a feed files from different suppliers. Although the column names are consistent the problem comes when some suppliers send text files with more or less columns in there feed file.
Furthermore the arrangement of these files are inconsistent.
Other than the Dynamic data flow task provided by Cozy Roc is there another way I could import these files. I am not a C# guru but i am driven torwards using a "Script Task" control flow or "Script Component" Data flow task.
Any suggestion, samples or direction will greatly be appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about meta data so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent meta data makes use of the Flat File Source troublesome.
Query based solution
If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table-the order of columns does not matter nor does the presence of extra columns matter.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + #[User::CurrentFilename]
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionManager of "Data Source=" + #[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File enumerator is configured to operate on files. I put an expression on the Directory for #[User::InputFolder] Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension"
In the Variable Mappings tab, assign the value to our #[User::CurrentFileName] variable
At this point, each iteration of the loop will change the value of the #[User::Query to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the #[User::Query] variable in there, click OK and you're ready to work.
Sample data
I created two sample files input5columns.csv and input7columns.csv All of the columns of 5 are in 7 but 7 has them in a different order (col2 is ordinal position 2 and 6). I negated all the values in 7 to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screen shots
What's missing
I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
Our solution. We use parent child packages. In the parent pacakge we take the individual client files and transform them to our standard format files then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send though, if they try to change their format from what they agreed to send us, we return the file.
I am working with SSIS 2008. I have a select query name sqlquery1 that returns some rows:
aq
dr
tb
This query is not implemented on the SSIS at the moment.
I am calling a stored procedure from an OLE DB Source within a Data Flow Task. I would like to pass the data obtained from the query to the stored procedure parameter.
Example:
I would like to call the stored procedure by passing the first value aq
storedProdecure1 'aq'
then pass the second value dr
storedProdecure1 'dr'
I guess it would be something like a cycle. I need this because the data generated by the OLE DB Source through the stored procedure needs to be sent to another destination and this must be done for each record of the sqlquery1.
I would like to know how to call the query sqlquery1 and pass its output to call another stored procedure.
How do I need to do this in SSIS?
Conceptually, what your solution will look like is an execute your source query to generate your result set. Store that into a variable and then you'll need to do iterate through those results and for each row, you'll want to call your stored procedure with that row's value and send the results into a new Excel file.
I'd envision your package looking something like this
An Execute SQL Task, named "SQL Load Recordset", attached to a Foreach Loop Container, named "FELC Shred Recordset". Nested inside there I have a File System Task, named "FST Copy Template" which is a precedence for a Data Flow Task, named "DFT Generate Output".
Set up
As you're a beginner, I'm going to try and explain in detail. To save yourself some hassle, grab a copy of BIDSHelper. It's a free, open source tool that improves the design experience in BIDS/SSDT.
Variables
Click on the background of your Control Flow. With nothing selected, right-click and select Variables. In the new window that pops up, click the button that creates a New Variable 4 times. The reason for clicking on nothing is that until SQL Server 2012, the default behaviour of variable creation is to create them at the scope of the current object. This has resulted in many lost hairs for new and experienced developers alike. Variable names are case sensitive so be aware of that as well.
Rename Variable to RecordSet. Change the Data type from Int32 to Object
Rename Variable1 to ParameterValue. Change the data type from Int32 to String
Rename Variable2 to TemplateFile. Change the data type from Int32 to String. Set the value to the path of your output Excel File. I used C:\ssisdata\ShredRecordset.xlsx
Rename Variable 4 to OutputFileName. Change the data type from Int32 to String. Here we're going to do something slightly advanced. Click on the variable and hit F4 to bring up the Properties window. Change the value of EvaluateAsExpression to True. In Expression, set it to "C:\\ssisdata\\ShredRecordset." + #[User::ParameterValue] + ".xlsx" (or whatever your file and path are). What this does, is configures a variable to change as the value of ParameterValue changes. This helps ensure we get a unique file name. You're welcome to change naming convention as needed. Note that you need to escape the \ any time you are in an expression.
Connection Managers
I have made the assumption you are using an OLE DB connection manager. Mine is named FOO. If you are using ADO.NET the concepts will be similar but there will be nuances pertaining to parameters and such.
You will also need a second Connection Manager to handle Excel. If SSIS is temperamental about data types, Excel is flat out psychotic-stab-you-in-the-back-with-a-fork-while-you're-sleeping about data types. We're going to wait and let the data flow actually create this Connection Manager to ensure our types are good.
Source Query to Result Set
The SQL Load Recordset is an instance of the Execute SQL Task. Here I have a simple query to mimic your source.
SELECT 'aq' AS parameterValue
UNION ALL SELECT 'dr'
UNION ALL SELECT 'tb'
What's important to note on the General tab is that I have switched my ResultSet from None to Full result set. Doing this makes the Result Set tab go from being greyed out to usable.
You can observe that I have assigned the Variable Name to the variable we created above (User::RecordSet) and I the Result Name is 0. That is important as the default value, NewResultName doesn't work.
FELC Shred Recordset
Grab a Foreach Loop Container and we will use that to "shred" the results that were generated in the preceding step.
Configure the enumerator as a Foreach ADO Enumerator Use User::RecordSet as your ADO object source variable. Select rows in the first table as your Enumeration mode
On the Variable Mappings tab, you will need to select your variable User::ParameterValue and assign it the Index of 0. This will result in the zerotth element in your recordset object being assigned to the variable ParameterValue. It is important that you have data type agreement as SSIS won't do implicit conversions here.
FST Copy Template
This a File System Task. We are going to copy our template Excel File so that we have a well named output file (has the parameter name in it). Configure it as
IsDestinationPathVariable: True
DestinationVarible: User::OutputFileName
OverwriteDestination: True
Operation: Copy File
IsSourcePathVariable: True
SourceVariable: User::TemplateFile
DFT Generate Output
This is a Data Flow Task. I'm assuming you're just dumping results straight to a file so we'll just need an OLE DB Source and an Excel Destination
OLEDB dbo_storedProcedure1
This is where your data is pulled from your source system with the parameter we shredded in the Control Flow. I am going to write my query in here and use the ? to indicate it has a parameter.
Change your Data access mode to "SQL Command" and in the SQL command text that is available, put your query
EXECUTE dbo.storedProcedure1 ?
I click the Parameters... button and fill it out as shown
Parameters: #parameterValue
Variables: User::ParameterValue
Param direction: Input
Connect an Excel Destination to the OLE DB Source. Double click and in the Excel Connection Manager section, click New... Determine if you're needing 2003 or 2007 format (.xls vs .xlsx) and whether you want your file to have header rows. For you File Path, put in the same value you used for your #User::TemplatePath variable and click OK.
We now need to populate the name of the Excel Sheet. Click that New... button and it may bark that there is not sufficient information about mapping data types. Don't worry, that's semi-standard. It will then pop up a table definition something like
CREATE TABLE `Excel Destination` (
`name` NVARCHAR(35),
`number` INT,
`type` NVARCHAR(3),
`low` INT,
`high` INT,
`status` INT
)
The "table" name is going to be the worksheet name, or precisely, the named data set in the worksheet. I made mine Sheet1 and clicked OK. Now that the sheet exists, select it in the drop down. I went with the Sheet1$ as the target sheet name. Not sure if it makes a difference.
Click the Mappings tab and things should auto-map just fine so click OK.
Finally
At this point, if we ran the package it would overwrite the template file every time. The secret is we need to tell that Excel Connection Manager we just made that it needs to not have a hard coded name.
Click once on the Excel Connection Manager in the Connection Managers tab. In the Properties window, find the Expressions section and click the ellipses ... Here we will configure the Property ExcelFilePath and the Expression we will use is
#[User::OutputFileName]
If your icons and such look different, that's to be expected. This was documented using SSIS 2012. Your work flow will be the same in 2005 and 2008/2008R2 just the skin is different.
If you run this package and it doesn't even start and there is an error about the ACE 12 or Jet 4.0 something not available, then you are on a 64bit machine and need to tell BIDS/SSDT that you want to run in 32 bit mode.
Ensure the Run64BitRuntime value is False. This project setting can be found by right clicking on the project, expand the Configuration Properties and it will be an option under Debugging.
Further reading
A different example of shredding a recordset object can be found on How to automate the execution of a stored procedure with an SSIS package?
I have a task that I am working on that has me stumped. Hoping you can help me. I am using a data flow task which is basically inserting a row into a sqlite table. I was doing this using a "SQL Task" but unfortunately the only way to successfully insert a guid into the sqlite table is to convert it as a byte stream using the data flow task. I do not want to use a source database because my data is not flowing from one table to another. I really just want to take my populated variables and convert them to a byte stream which i can then insert successfully into a sqlite database. The issue is, i cannot use a dataflow task without a source database.
My work-around so far has been to declare a source database/table and only one column (but never use it in the data flow). This works fine and I am unable to insert the row into sqlite using my pre-set variables, but i am left with a somewhat annoying message in my Output log every time i do this:
Warning: 0x80047076 at , SSIS.Pipeline: The output column "" (117) on output "OLE DB Source Output" (11) and component "OLE DB Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
Anyone know of a good way to get this warning not to show up?
In your dataflow choose a Script Component.
When prompted to choose Source, Destination, or Transformation, choose Source.
Add your pre populated variables to the CustomProperties.ReadOnlyVariables section of the script tab.
Go to the Inputs and Outputs section.
Add a column to the default output for each of your variables.
In your script (if using C#) put something similar to the following in the CreateNewOutputRows() section
Output0Buffer.AddRow();
Output0Buffer.ContainerName = Variables.ContainerName;
Output0Buffer.TaskName = Variables.TaskName;
Output0Buffer.TaskStartDate = Variables.ContainerStartTime;
Save your script.
Connect your script component to your destination object.
If this is causing your package execution to get failed, you got an option of ignoring these warnings/errors..
Just double click the Source block in Dataflow and navigate to the last tab("Error OUtput") in left side pane and you need to select the option to ignore the errors. (I dont know eactly what phrase in that option will do it )