I do have a source team data sent as below:
Name
ID
StateCode
Abc
1
CA,GE
Xyz
2
IL,MA
As you can see the StateCode column has again comma separated values. Can anyone suggest how can the above records read in Ab Initio?
Name and ID are fixed length.
I've handled this in the past making a character that is not part of the data as the delimiter. The one I have used frequently is '\x01'. Another way is to use quotes around the data field. You can use read csv component to specify that option. Let me know if you have more questions. If you're looking to read a column with multiple values separated by commas, you could read them into a vector.
I would like to transpose a table like the one below:
into this:
I wanted to mention that the files are CSV files.
Thanks,
There is a solution to this, but it's inelegant and inefficient and may not work incase of a huge dataset (may run out of memory).
You can denormalise the whole input, by defining all the schema columns in the tDenormalize component, pass it across to a tMap to concatenate all the columns using a special character in between. The special character is just an identifier for the next component we are going to use.
Connect the tMaps output to a tNormalize and use the special character as the Item Seperator while the column to normalise should be the only column (which you concatenated to in the previous tMap) available.
This should do what you're looking for. If you wish to process the data after this instead of just transposing, you can use tExtractDelimitedFields component and use the "," as your field Seperator since its a csv.
Today I did something stupid: I had a list of card numbers in an excel file that I had to import to a DB table somehow. So i exported the numbers to CSV file, but without any quotes (don't ask me why). The file looked like:
123456
234567
345678
...
Then I created a table with a single VARCHAR(22) column and did a
LOAD DATA LOCAL INFILE 'numbers.csv' INTO TABLE cards
This worked fine, apart from many warnings, which I ignored (the other stupid thing I did).
After that I tried to query with this SQL:
SELECT * FROM cards WHERE number='123456'
which gave me an empty result. Whereas this works:
SELECT * FROM cards WHERE number=123456
Notice the missing quotes! So it seems, that I managed to populate my VARCHAR table with INTEGER data. I have no idea how that is possible at all.
I already tried to fix this with an UPDATE like this
UPDATE cards SET number = CAST(number AS CHAR(22))
But that didn't work.
So is there a way to fix this and how could this even happen?
This is the result of some implicit conversion in order to do a numerical comparison:
SELECT * FROM cards WHERE number='123'
This will only match against text fields that are literally "123" and will miss on " 123" and "123\r" if you have those. For some reason, "123 " and "123" are considered "equivalent" presumably do to trailing space removal on both sides.
When doing your import, don't forget LINES TERMINATED BY '\r\n'. If you're ever confused about what's in a field, including hidden characters, try:
SELECT HEX(number) FROM cards
This will show the hex-dumped output of each string. Things like 20 indicate space, just as %20 in a URL is a space.
You can also fix this by:
UPDATE cards SET number=REPLACE(number, '\r', '')
I ran into a problem with SQL Server Integration Services 2012's new string function in the Expression Editor called TOKEN().
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
The TOKEN() fails to count the adjacent column delimiters, which throws off the column count. Now it only sees five columns instead of six columns. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like"5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE statements(), an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result in VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
TOKEN method in SSIS uses the implementation of strtok function in C++. I gathered this information while reading the book Microsoft® SQL Server® 2012 Integration Services. It is mentioned as note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of strtok function and I found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
Original Post - Above section is an update:
I created a simple package in SSIS 2012 based on your data inputs. As you had described in your question, the TOKEN function does not behave as intended. I agree with you that the function doesn't seem to work. This post is not an answer to your original issue.
Here is an alternative way to write the expression in a relatively simpler fashion. This will only work if the last segment in your input record will always have a value (say A1, B2, C3 etc.).
Expression can be rewritten as:
This statement will take the input record as the parameter, the delimiter caret (^) as the second parameter. The third parameter calculates the total number segments in the records when split by the delimiter. If you have data in the last segment, you are guaranteed to have two segments. You can then subtract 1 to fetch the penultimate segment.
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
I created a simple package with data flow task. OLE DB source retrieves the data and the derived transformation parses and splits the data as per the screenshot below. The output is then inserted into the destination table. You can see the source and destination tables in the last screenshot. Destination table has two columns. The first column stores the penultimate segment data and the segments count based on the delimiter (which again isn't correct). You can notice that the last record didn't fetch the correct results. If the last record didn't have the value 8, then the above expression will fail because the expression will evaluate to zero index.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
Derived column transformation inside data flow task:
Data in source and destination tables:
Not only does TOKEN skip adjacent delimiters, it also skips leading and trailing delimiters as well. So, using your example, if you had a field "good" field that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would only find two delimiters and you'd end up with 0004 assigned to Token1, 6/15/2010 for Token2, and Duck for Token3.
I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or training, I used replace to surround the delimiters with characters I absolutely wouldn't find in my text. The following Expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
Working with a MS Access database, using one particular table, and scattered throughout the table at varying positions in date columns (which themselves can be in varying orders as a result of the data import) is the text "Not known". I want to replace occurrences of that text string across the whole data table.
The only way I can think of doing it is export to a csv format, and do a REReplace then import the data again, but I would like to know if there is a 'slicker' way?
The columns contain data which is a data import from a csv file so all the columns are text, they can contain a mix of "date string", text, numbers (as string) and null.
You can use replace, it follows basic TSQL implementation :
http://msdn.microsoft.com/en-us/library/ms186862.aspx
Here is an example I did updating the customers table of the Northwind sample database:
update customers set Customers.[Job Title] = replace( Customers.[Job Title], 'Purchasing', 'Manufacturing');
So to distill it into a generic example :
update TABLENAME set FIELD =
replace( FIELD, 'STRING_TO_REPLACE', 'STRING_TO_REPLACE_WITH' )
That updates the entire table in one statement. Be careful ;)
You can do this using Access, running edit-replace command. If you need to do this in code - you can open recordset, loop through records and for each field run:
rst.fields(i)=replace(rst.fields(i),"Not known","Something")
this is how it works in VBA, beleive you can do something similar in coldfusion
Why not just open the CSV file in Notepad++ (or similar) and do a Find/Replace?