Since the Excel Source has constant problems truncating either numbers or text (I can't get it to work properly with mixed data in one column), I figured the Power Query Source would be the answer.
I managed to import one file.
Now I'm trying to iterate over all the files in the folder.
The problem is the Description in the connection manager: can I somehow use wildcards to cover all files? Otherwise it fails with an incorrect-credentials error.
As for the connection manager itself, there's no problem, since I can set it with expressions that use variables.
As far as I know, the Power Query Source is still in preview and very limited compared with all the functions in, for example, Power BI Desktop.
In your case, build the query in Power BI Desktop: choose New Source > Folder, do the transformations, and then copy and paste the generated code into the SSIS Power Query Source. That way you don't have to resort to wildcards in the SSIS flow to iterate over the files in the same folder.
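Conceptually, a From Folder query lists the files in a folder, applies the same steps to each one and appends the results. The Python sketch below only illustrates that pattern; it is not Power Query M code, CSV files stand in for the Excel workbooks, and `combine_folder` and the `SourceFile` column are hypothetical names.

```python
import csv
import tempfile
from pathlib import Path


def combine_folder(folder, pattern="*.csv"):
    """Read every file in `folder` matching `pattern` and append all rows,
    tagging each row with the name of the file it came from."""
    combined = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                row["SourceFile"] = path.name  # keep track of the origin file
                combined.append(row)
    return combined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "jan.csv").write_text("id,amount\n1,10\n2,20\n")
        Path(d, "feb.csv").write_text("id,amount\n3,30\n")
        for row in combine_folder(d):
            print(row)
```

Because the folder is enumerated inside the query, the package never needs a wildcard in the connection itself.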
I'm trying to open the following XLS file in SSIS:
https://drive.google.com/file/d/1E_fNSlRTMuoYnH7VERFB8hXbcxssKSGr/view?usp=sharing
I can open it in Excel, without any error or warning from Excel.
But when I try to open it in SSIS, or even in Power BI, I get the following message: "External table is not in the expected format". If I open it in Excel and save it again in the same XLS format, I can open it in SSIS.
I've installed the following OLE DB Drivers:
AccessDatabaseEngine_X64 (x64)
AccessDatabaseEngine (x86)
And I've tried with the following providers:
Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=Excel 12.0;
Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=Excel 8.0;
Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=Excel 8.0;
Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=Excel 5.0;
Any idea why the file is not opening in SSIS?
I don't want to open and re-save every file by hand every day, because there are many files I need to load each day.
I'm using Visual Studio 2019 with project compatibility set to SSIS 2017.
Thanks!
The first issue the Excel reader runs into is the image sitting in the sheet, which throws the tooling off. As soon as I deleted the image and saved the file, the tooling started to work.
The next problem you're going to run into is that you need to skip the first N rows before your data begins. Since there's no functionality in the JET driver to do that, you're going to need to do some magic to work with the data set.
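One form that "magic" can take is to read the sheet as raw rows without a header and search for the header row instead of hard-coding N. A minimal Python sketch, assuming the rows have already been read into lists (for example inside a Script Component) and that the text of the first header cell is known; `rows_after_header` is a hypothetical helper:

```python
def rows_after_header(rows, header_first_cell):
    """Skip everything above the header row (titles, empty cells left by
    images, blank lines) and return (header, data_rows).

    `rows` is a list of lists read from the sheet without a header; the
    header is the first row whose first cell equals `header_first_cell`.
    """
    for i, row in enumerate(rows):
        if row and str(row[0]).strip() == header_first_cell:
            header = [str(cell).strip() for cell in row]
            # Drop completely empty rows below the header as well.
            data = [r for r in rows[i + 1:] if any(c not in (None, "") for c in r)]
            return header, data
    raise ValueError(f"no row starts with {header_first_cell!r}")


if __name__ == "__main__":
    sheet = [
        ["Monthly Sales Report", None],
        [None, None],
        ["Region", "Amount"],
        ["North", 10],
        ["South", 20],
    ]
    print(rows_after_header(sheet, "Region"))
```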
Search for the terms Excel, IMEX and registry keys and you'll get into the voodoo of Excel type inference (based on the first 8 rows by default); it's ugly.
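As a rough illustration of why mixed columns lose values: the driver samples the first rows (8 by default, set by the TypeGuessRows registry value), picks one type per column, and returns NULL for cells that don't fit, while with IMEX=1 a mixed sample is read as text. The Python below is a simplified model for illustration only; the function names are hypothetical, and details such as sampling only non-empty values and the tie-break are assumptions, not the driver's documented algorithm.

```python
from collections import Counter


def guess_column_type(values, guess_rows=8, imex=False):
    """Simplified model of Excel driver type guessing for one column."""
    sample = [v for v in values if v is not None][:guess_rows]
    kinds = Counter("number" if isinstance(v, (int, float)) else "text" for v in sample)
    if not kinds:
        return "text"
    if imex and len(kinds) > 1:
        return "text"  # mixed sample with IMEX=1: read the whole column as text
    # Majority type wins; the tie-break toward "number" is an assumption.
    return "number" if kinds["number"] >= kinds["text"] else "text"


def read_column(values, guess_rows=8, imex=False):
    """Apply the guessed type: values that don't fit become None (NULL)."""
    kind = guess_column_type(values, guess_rows, imex)
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif kind == "text":
            out.append(str(v))
        elif isinstance(v, (int, float)):
            out.append(v)
        else:
            out.append(None)  # text in a numeric column silently disappears
    return out


if __name__ == "__main__":
    column = [1, 2, 3, 4, 5, 6, 7, 8, "A-100", "B-200"]
    print(read_column(column))
```

The model shows the ugly part: whether a cell survives depends on what happens to sit in the first few rows, not on the cell itself.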
At this point in my career, I either push back and ask the provider for a cleaner data extract, or I increase the estimate and write a custom Script Component source that uses the JET/ACE drivers to extract the data and then shapes and types it for my data flow.
I have around 500 SSIS packages, and I want to get the list of packages that use a linked server, because we are now removing that linked server. I don't want to open every package and check every task to see whether the linked server is used.
Is there any way we can do this?
If you don't want to use PowerShell, I use a tool called FnR.exe, which you can search for and download. Again, you just search through the package XML, which is plain text. If you know the names of your linked servers, that's good. If you don't, you'll have to search for something of the form %.%.%.% (a four-part name). It is much more reliable to know all the linked server names, and working those out should be quicker than going through all of your packages.
You also need to consider whether your package uses a view that in turn references a linked server. In that case the linked server name won't appear in the package at all.
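Once you have the names (for example from sys.servers where is_linked = 1), the search itself is simple. A minimal Python sketch, assuming the packages are .dtsx files somewhere under one root folder; `find_linked_server_usage` is a hypothetical helper. It matches whole names case-insensitively, so LNK_SALESX is not reported for LNK_SALES, and like any text search it cannot see references hidden inside views.

```python
import re
import tempfile
from pathlib import Path


def find_linked_server_usage(root, server_names):
    """Return {package_path: [names found]} for every .dtsx file under root."""
    patterns = {
        # (?<!\w) / (?!\w) stop a name from matching inside a longer identifier.
        name: re.compile(rf"(?<!\w){re.escape(name)}(?!\w)", re.IGNORECASE)
        for name in server_names
    }
    hits = {}
    for path in sorted(Path(root).rglob("*.dtsx")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        found = sorted(name for name, pat in patterns.items() if pat.search(text))
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "load.dtsx").write_text("SELECT * FROM [LNK_SALES].SalesDb.dbo.Orders")
        print(find_linked_server_usage(d, ["LNK_SALES"]))
```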
This isn't really an answer, but it's too long for a comment. You could simply search for the given text in the SSIS packages; .dtsx files are nothing more than XML files.
You could use, for example, PowerShell:
Get-ChildItem -Recurse -Filter *.dtsx | Select-String -Pattern "YOUR_LINKED_SERVER" -SimpleMatch | Group-Object Path | Select-Object Name
This will at least give you the list of packages that use the linked server. Then, depending on where the linked server is referenced, you might want to:
If it's in SQL strings, just replace its name (and the dot that follows it) with an empty string, using PowerShell or something else.
If it's in other components, look into the Microsoft.SqlServer.Dts.Runtime namespace and write either a PowerShell script or a .NET app that alters the packages from code.
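For the SQL-string case, removing the server name together with its trailing dot turns a four-part name into a valid three-part name. A minimal Python sketch, assuming the referenced database now lives on the server the package already connects to; `strip_linked_server` is a hypothetical helper that handles both bracketed and plain names and leaves objects that merely share the name (such as dbo.LNK_SALES) alone.

```python
import re


def strip_linked_server(sql, server_name):
    """Rewrite SERVER.db.schema.obj (or [SERVER].db.schema.obj) as db.schema.obj."""
    pattern = re.compile(
        # Not preceded by a word char, dot or bracket, so only the first part
        # of a multi-part name is removed.
        rf"(?<![\w.\[\]])\[?{re.escape(server_name)}\]?\.",
        re.IGNORECASE,
    )
    return pattern.sub("", sql)


if __name__ == "__main__":
    print(strip_linked_server("SELECT * FROM [LNK_SALES].SalesDb.dbo.Orders", "LNK_SALES"))
```

Review the rewritten statements before saving them back into the packages; a text replacement cannot tell whether the local database really has the same name.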
Using Microsoft Visual Studio Community 2015.
Goal of the project:
- Create a "*\temp\email" directory.
- Start a program that extracts the xls attachments from all emails that have them into the previously created folder.
- Use a Foreach Loop to cycle through each file in the folder, process it, and load it into a SQL table.
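The extraction step is handled by an external program in this setup. As a hedged stand-in, Python's standard `email` package can pull .xls/.xlsx attachments out of raw messages into the target folder. How the raw messages are obtained (a mailbox export, IMAP, etc.) is left out, and `save_excel_attachments` is a hypothetical helper; the numeric prefix keeps two attachments called report.xls from overwriting each other.

```python
import email
import tempfile
from email import policy
from email.message import EmailMessage
from pathlib import Path


def save_excel_attachments(raw_messages, target_dir):
    """Save every .xls/.xlsx attachment in the given raw messages (bytes)
    into target_dir, creating it if needed. Returns the saved paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, raw in enumerate(raw_messages):
        msg = email.message_from_bytes(raw, policy=policy.default)
        for part in msg.iter_attachments():
            name = part.get_filename()
            if name and name.lower().endswith((".xls", ".xlsx")):
                # Prefix with the message index; keep only the base file name.
                path = target / f"{i:04d}_{Path(name).name}"
                path.write_bytes(part.get_payload(decode=True))
                saved.append(path)
    return saved


if __name__ == "__main__":
    demo = EmailMessage()
    demo["Subject"] = "daily report"
    demo.set_content("see attached")
    demo.add_attachment(b"xls-bytes", maintype="application",
                        subtype="vnd.ms-excel", filename="report.xls")
    with tempfile.TemporaryDirectory() as d:
        print(save_excel_attachments([demo.as_bytes()], Path(d) / "temp" / "email"))
```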
The problem I'm running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the older xls reports, which contain only 5 columns instead of the 6 I have mapped now. Is there any way to separate the files that have the correct columns from those that don't?
As long as neither of these problems occurs, I can run the SSIS package and everything runs without issue.
Control flow:
File System Task (creates directory) --> Execute Process Task (xls extraction) --> Foreach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (ExcelFilePath set by an expression to @[User::filepath]; DelayValidation = True)
(Columns are initially F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in include only a, b, c, d, e.) This is where I want to be able to separate the xls files.
Conditional Split transformation (the column names are not in row 1; this helps remove NULL values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but since this is my first post I tried to include anything I thought might be relevant.
There are some tools out there that would allow you to open the Excel document and read it. However, I think the simplest thing is to use SSIS out of the box:
1 - Add a File System Task after the Data Flow Task that reads the file.
2 - Set the precedence constraint from the Data Flow Task to the File System Task to "Failure". This makes the File System Task fire only when the Data Flow Task fails.
3 - Set the File System Task to move the "bad" files to another folder.
This lets you loop through all the files and move the failed ones. Ultimately, the package will still end in failure. If you don't want that behavior, you can set the ForceExecutionResult property to Success. However, it may be good to know that some files had problems so that they can be addressed.
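The control flow described above amounts to a loop in which each file either loads or is moved aside. A minimal Python sketch of that logic only, not SSIS code: CSV files stand in for the Excel workbooks, and `load_file` is a hypothetical stand-in for the data flow that fails on blank files or on rows without the 6 mapped columns.

```python
import csv
import shutil
import tempfile
from pathlib import Path

EXPECTED_COLUMNS = 6  # the a..f layout the data flow is mapped to


def load_file(path):
    """Hypothetical stand-in for the Data Flow Task: fail on blank files
    or rows that don't have the expected number of columns."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        raise ValueError(f"{path.name}: file is empty")
    if any(len(row) != EXPECTED_COLUMNS for row in rows):
        raise ValueError(f"{path.name}: unexpected column count")
    return len(rows)


def process_folder(folder, bad_folder):
    """Like the Foreach Loop: load each file; on failure (the "Failure"
    precedence constraint) move it to bad_folder and carry on."""
    bad = Path(bad_folder)
    bad.mkdir(parents=True, exist_ok=True)
    loaded, moved = [], []
    for path in sorted(Path(folder).glob("*.csv")):
        try:
            load_file(path)
            loaded.append(path.name)
        except ValueError:
            shutil.move(str(path), str(bad / path.name))
            moved.append(path.name)
    return loaded, moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        inbox = Path(d, "inbox")
        inbox.mkdir()
        (inbox / "new.csv").write_text("1,2,3,4,5,6\n")
        (inbox / "old.csv").write_text("1,2,3,4,5\n")
        print(process_folder(inbox, Path(d, "bad")))
```

Returning both lists keeps a record of which files were set aside, in the same spirit as letting the package report the problem.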
I'm posting this here because I couldn't find any such scenario on the web so far. I have a web page that contains a set of reports in both XLS and PDF formats. I need to download the Excel files from the page and load them into my database. I wish I could use the URL of the XLS file directly, but the naming convention may change every time (Sales_Quarter1.xlsx could be Sales_Q1.xlsx the next year). The only thing that stays constant in the following example is the text "Sales for Calendar Year". I need to look up the file that corresponds to this text and download it before loading it into a database table.
I would like to know from the experts whether this is possible.
<li>
<sub>Sales for Calendar Year 2015--All Countries </sub>
<a href="/Data/Downloads/Documents/Sales/Sales_Quarter1.xlsx">
<sub>[XLS]</sub></a><sub> , <sub>[PDF]</sub><sub></sub></sub>
</li>
PS: I am using SQL Server 2014.
Thanks!
Have a look at Integration Services. Create a package that pulls the web page using a Script Task, with variables holding the local file names for the downloaded HTML file and Excel file (you will also have to parse the link out of the HTML file). Then use an Excel Source next in your package.
The variable for the Excel file name used in the Script Task will also need to be set as ReadWrite (listed in the task's ReadWriteVariables).
If you plan to run this on a recurring basis, you can schedule the package execution via a SQL Server Agent job, placing any additional logic in the script or in the package's execution paths.
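The link-parsing step could look like the following Python sketch. The answer suggests doing this inside a Script Task (C# or VB.NET); Python is used here only to illustrate the logic. It assumes the page structure shown in the question, where the label text and the [XLS] link share one <li>; `find_xls_url` and the base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XlsLinkFinder(HTMLParser):
    """Remember the first [XLS] link inside an <li> whose text contains `label`."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.li_text = []        # all text seen in the current <li>
        self.li_xls_hrefs = []   # hrefs of [XLS] links in the current <li>
        self.in_link = False
        self.href = None
        self.link_text = []
        self.found = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_text, self.li_xls_hrefs = [], []
        elif tag == "a":
            self.in_link = True
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_data(self, data):
        self.li_text.append(data)
        if self.in_link:
            self.link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            if "".join(self.link_text).strip() == "[XLS]" and self.href:
                self.li_xls_hrefs.append(self.href)
            self.in_link = False
        elif tag == "li" and self.found is None:
            if self.label in "".join(self.li_text) and self.li_xls_hrefs:
                self.found = self.li_xls_hrefs[0]


def find_xls_url(html, label, base_url):
    """Return the absolute URL of the [XLS] link next to `label`, or None."""
    finder = XlsLinkFinder(label)
    finder.feed(html)
    finder.close()
    return urljoin(base_url, finder.found) if finder.found else None
```

The resolved URL can then be downloaded (for example with urllib.request.urlretrieve) to the local path held in the package variable, so the Excel Source never depends on the file's changing name.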
I'm working with legacy tsql code that outputs to txt files.
For security reasons, I'm replacing these outputs with SSIS packages.
I've gotten most of them to work, but one particular one gives me the following error:
TITLE: Microsoft Visual Studio
Cannot create connector.
The destination component does not have any available inputs for use in creating a path.
The data flow itself is very simple: an OLE DB Source runs a SQL command, then outputs to a Flat File Source that points to an existing text file that was created by the T-SQL.
Does anyone know what the error means with regard to the available inputs?
Your SSIS toolbox is divided into 3 general groupings (pre 2012/2014)
Sources
Transformations
Destinations
A source has 1 to N output paths. Nothing can feed into a source. Things can only consume what a Source emits.
A Transformation does not generate* rows; it accepts rows from an upstream provider (either a Source or another Transformation). A Transformation has 1 to N output paths.
A Destination is the terminus for data. I'm not aware of any destinations that accept more than one input. It has one optional output path, Error.
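The three rules above boil down to one check when the designer tries to draw a path: the target must have an input. A toy Python model of that check reproduces the error from the question; the class and function are made up for illustration and are not the SSIS object model.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    kind: str  # "source", "transformation" or "destination"

    def accepts_input(self):
        # Nothing can feed into a Source; Transformations and Destinations
        # consume rows from upstream.
        return self.kind != "source"


def connect(upstream, downstream):
    """Create a path from upstream to downstream, refusing targets without inputs."""
    if not downstream.accepts_input():
        raise ValueError(
            f"Cannot create connector. {downstream.name} does not have any "
            "available inputs for use in creating a path."
        )
    return (upstream.name, downstream.name)


if __name__ == "__main__":
    src = Component("OLE DB Source", "source")
    dst = Component("Flat File Destination", "destination")
    print(connect(src, dst))
```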
Your problem, therefore, is that you are trying to route data into a Source. Change it to a Flat File Destination.
Replace the Flat File Source with a Flat File Destination component. Click the Row Count transformation and drag its green arrow to the new destination.