Load big one line flat json file in ssis - json

I am trying to load a big file which basically is a json format flat file from my local drive to SQL Server by using SSIS. It's a one line file and I don't need to specify columns and rows as I am going to parse it as soon as it's in SQL Server by OPENJSON.
but when I tried to create Flat File Source in Visual Studio SSIS, I was not able to do that as even I used 'fixed width' format according to the solution here: import large flat file with very long string as SSIS package, as the max width seems to be 32000, while the json file could be much bigger.
here are my settings:
There are other options of loading the data by t-sql like OPENROWSET but we have SQL Server instance installed on another server rather than the same one we are doing our dev work. So there are some security limits between them.
So just wondering if this is the limitation of Flat File Source in SSIS or I didn't do it right?

You're likely looking for the Import Column transformation. https://learn.microsoft.com/en-us/sql/integration-services/data-flow/transformations/import-column-transformation?view=sql-server-ver15
Define a Data Flow as OLE Source -> Import Column -> OLE Destination.
OLE Source
Really, any source but this is the easiest to reproduce
SELECT 'C:\curl\output\source_data.txt' AS SourceFilePath;
That will add a column named SourceFilePath with a single row.
Import Column
Reference the article on Import Column Transformation but the summary is
Check the column that will provide the path
Add a column to the Import Column Collection to hold the file content. Change the data type to DT_TEXT/DT_NTEXT depending on your unicode-ness and note the LineageID value
Click back to Import Column Input and find the column name. Scroll down to the Custom Properties and use the LineageID above for FileDataColumnID where it says 0. Otherwise, you have an error of
The "Import Column.Outputs[Import Column Output].Columns[FileContent]" is not referenced by any input column. Each output column must be referenced by exactly one input column.
OLE DB Destination
Any data sink will do but the important thing will be to map our column from the previous step to a n/varchar(max) in the database.

Related

Using a variable as the input for a Conditional Split control

Might be going about this completely the wrong way - happy to be shown the error of my ways.
In a nutshell, I've got 50-odd files of mixed types (csv and excel) that I want to import (each file to its own table) to an SQL database.
In the control flow I've got an sql task that returns:
The source data filename
The source data filetype (csv / xlsx)
What I want to name the table to import to.
This object gets passed to a Foreach loop that loops through this object and puts these 3 fields into variables.
I want to then say "if the filetype variable is csv, go and do a flat file import. If it's .xlsx, go and do an excel import"
So inside my for each container I've got a dataflow task.
I want the first thing the dataflow task does to check the filetype variable, and then do the appropriate import.
I think it's got to be in the dataflow, because there isn't an "If" style control I can see in the control flow?
But I'm at a loss as to how I pass a variable into the conditional split.
Any thoughts welcome.
OR! - just had a thought. Is the best way to do this to get a list of all the csv file types, process them in a dataflow, then get a list of all the .xlsx ones and process them - so I'd have:
Get csv filenames & tablenames
for each to loop through these
dataflow to import data from csv
get xlsx filenames and tablenames
for each through these
dataflow to import data from xlsx.
Just doesn't seem as elegant?
Cheers

Source File Connection (Flat File) - Not reading column metadata

When I create the SSIS package it requires a file to be referenced to pick up the files metadata. For example the column headers will be ColumnA, ColumnB.
I have always assumed that these column names need to be present in the file for it to be loaded. Recently business, for whatever reason, changed one of the column names in the file to something else so the file contains ColumnA, NotColumnB. When the SSIS package runs it ignores this and loads the file. I assumed that it would fail. Is my assumption correct and there is something weird going on or is my assumption incorrect, if so please let me know why.
I have changed the column names in a few other packages that load data from a file and they also dont care what the column names are
Click on the flat file source, and press F4 to show the properties tab. There are a property called ValidateExternalMetadata change it to True.
For more information check the following answer:
Detect new column in source not mapped to destination and fail in SSIS
Update 1
It looks like that flat file connection manager has no validation engine and the metadata defined is used at configuration time to configure the mappings between the data file and the database.
Why Does't SSIS Flat File Data Check If Columns Names or Order Have Changed? What is best way to check?
Flat file destination columns data types validation

How to Write the File and File Path to table

I have a SSIS package - which within a FOR LOOP CONTAINER I look in a particular location, for a particular file format and import it into a database.
This is working fine - when I have two files the contents of both files are being imported.
So I have a Variable Mapping under my ForLoop which records the fully qualified name. What I want to do is when I import the file is I am also recording the file path of where it has come from.
I'm unsure in my dataflow task where I would put that ? Under the data flow I have my source file and a destination.
I tried to have a sql task after the data flow that updated the field in the database with the variable (via Parameter Mapping), but that set the field to the same value for everything (the last file path found) which is not what I'm after.
Any advice would be welcome
In your dataflow task, in between your source and destination add a Derived Column transformation. This will add columns to your dataset with a name and value that you specify. If you reference variables in which you are storing the file name for your loop container, the name of the file being accessed will be appended to an additional column in your dataset. Obviously you need to make sure that this column is present in your destination table.

SSIS Data conversion Package Error

This is what happens every time I try to run the package:
It appears this error is coming from a data flow task where you are trying to apply a text or Excel file source and import to a database destination. The initial errors, which are likely causing the later ones, are due to an inconsistency in the data types. Some of the source fields are defined as a Unicode where a non-Unicode is expected. The message shows this is taking place with columns VILLE, HABITATION, and PROFESSION.
This can be corrected by inserting between the source and destination data flow tasks a Data Conversion. Here you can convert the data types creating new fields that can be applied in the destination task mapping.
Hope this helps.

How to import a fixed width flat file into database using SSIS?

Does any one have a tutorial on how to import a fixed width flat file into a database using an SSIS package?
I have a flat file containing columns with varying lengths.
Column name Width
----------- -----
First name 25
Last name 25
Id 9
Date 8
How do I convert a flat file into columns?
Here is a sample package created using SSIS 2008 R2 that explains how to import a flat file into a database table.
Create a fixed-width flat file named Fixed_Width_File.txt with data as shown in the screenshot. The screenshot uses Notepad++ to display the file contents. It has the capability to show the special characters like carriage return and line feed. CR LF denotes the row delimiters Carriage return and Line feed.
In the SQL server database, create a table named dbo.FlatFile using the create script provided under SQL Scripts section.
Create a new SSIS package and add a new OLE DB Connection manager that would connect to the SQL Server database. Let's assume that the OLE DB Connection manager is named as SQLServer.
On the package's control flow tab, place a Data Flow Task.
Double-click on the data flow task and you will be taken to the data flow tab. On the data flow tab, place a Flat File Source. Double-click on the flat file source and the Flat File Source Editor will appear. Click the New button to open the Flat File Connection Manager Editor.
On the General section of the Flat File Source Editor, enter a value in Connection manager name (say Source) and browse to the flat file location and select the file. This example uses the sample file in the path C:\temp\Fixed_Width_File.txt If you have header rows in your file, you can enter a value 1 in the Header rows to skip textbox to skip the header row.
Click on the Columns section. Change the font according to your choice I chose Courier New so I could see more data with less scrolling. Enter the value 69 in the Row width text box. This value is the sum of width of all your columns + 2 for the row delimiter. Once you have set the correct row width, you should see the fixed width file data correctly on the Source data columns section. Now, you have to click at the appropriate locations to determine the column limits. Note the sections 4, 5, 6 and in the below screenshot.
Click on the Advanced section. You will notice 5 columns created for you automatically based on the column limits that we set on the Columns section in the previous step. The fifth column is for row delimiter.
Rename the column names as FirstName, LastName, Id, Date and RowDelimiter
By default, the columns will be set with DataType string [DT_STR]. If we are fairly certain, that a certain column will be of different data type, we can configure it in the Advanced section. We will change Id column to be of data type four-byte signed integer [DT_I4] and Date column to be of data type date [DT_DATE]
Click on the Preview section. The data will be shown as per the column configuration.
Click OK on the Flat file connection manager editor and the flat file connection will be assigned to the Flat File Source in the data flow task.
On the Flat File Source Editor, click on the Columns section. You will notice the columns that were configured in the flat file connection manager. Uncheck the RowDelimiter because we won't need that.
On the data flow task, place an OLE DB Destination. Connect the output from the Flat file source to the OLE DB Destination.
On the OLE DB Destination Editor, select the OLE DB Connection manager named SQLServer and set the Name of the table or the view drop down to [dbo].[FlatFile]
On the OLE DB Destination Editor, click on the Mappings section. Since the column names in the flat file connection manager are same as the columns in the database, the mapping will take place automatically. If the names are different, you have to manually map the columns. Click OK.
Now the package is ready. Execute the package to load the fixed-width flat file data into the database.
If you query the table dbo.FlatFile in the database, you will notice the flat file data imported into the database.
This sample should give you an idea about how to import fixed-width flat file into database. It doesn't explain how to handle error logging but this should get you started and help you discover other SSIS related features when you play with packages.
Hope that helps.
SQL Scripts:
CREATE TABLE [dbo].[FlatFile](
[Id] [int] NOT NULL,
[FirstName] [varchar](25) NOT NULL,
[LastName] [varchar](25) NOT NULL,
[Date] [datetime] NOT NULL
)
In the derived column transformation you can use SUBSTRING() function for each of the column.
Example:
Columns DerivedColumn
FirstName SUBSTRING(Data, startFrom, Length);
Here the FirstName has width 25 so if we consider that from the 0th position then in the derived column you should specify it by giving SUBSTRING(Data, 0, 25);
Similarly for other columns.
Very well explained, Siva! Your tutorial and excellent illustrations point out what Microsoft should have made clear
that the width for a fixed length row has to include the Carriage Return and Line Feed (CR & LF) characters (which I figured out because the preview showed the rows were not lining up correctly)
the all important step of defining an extra column to contain those CR & LF characters, even though they won't be imported. I figured this out, too. I would have benefited by finding your answer before I began.
Without those two things, an attempt to run the import will give this error message:
The data conversion for column "Column x" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
I have added in this error text in hopes someone will find this page while searching for the cause of their error. Your turorial is worth finding, even if after the fact!