Is there a way to handle comma-separated data in a particular column in Ab Initio?

My source team sends data like this:
Name  ID  StateCode
Abc   1   CA,GE
Xyz   2   IL,MA
As you can see, the StateCode column itself contains comma-separated values. Can anyone suggest how to read these records in Ab Initio?
Name and ID are fixed length.

I've handled this in the past by using a character that is not part of the data as the delimiter. The one I have used most often is '\x01'. Another way is to put quotes around the data field; the Read CSV component lets you specify that option. Let me know if you have more questions.
If you're looking to read a column with multiple values separated by commas, you could read them into a vector.
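To illustrate the "read them into a vector" idea outside Ab Initio, here is a minimal Python sketch. The field widths (3 characters for Name, 1 for ID) and the function name parse_record are assumptions chosen to fit the sample rows; the question only says that Name and ID are fixed length.

```python
# Hypothetical widths: the question only states that Name and ID are fixed length.
NAME_WIDTH = 3
ID_WIDTH = 1


def parse_record(line):
    """Split one record into Name, ID and a list (vector) of state codes."""
    name = line[:NAME_WIDTH].strip()
    rec_id = line[NAME_WIDTH:NAME_WIDTH + ID_WIDTH].strip()
    # Everything after the fixed-length fields is the comma-separated StateCode.
    rest = line[NAME_WIDTH + ID_WIDTH:]
    state_codes = [code.strip() for code in rest.split(",") if code.strip()]
    return {"Name": name, "ID": int(rec_id), "StateCode": state_codes}


for sample in ["Abc1CA,GE", "Xyz2IL,MA"]:
    print(parse_record(sample))
```

Because the first two fields are cut by position, the commas inside StateCode never interfere with the field boundaries.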

Related

Preserving decimal values in SSIS

I have a column in my .csv file with values such as 1754625.24, but we have to save it as an integer in our database. So I am trying to split the number at the '.' and divide the second part by 1000 (24/1000), as I want a 3-digit number.
That gives me 0.024, but I am having trouble storing/preserving that value as a decimal.
I tried a (DT_DECIMAL,3) conversion, but the result is '0'.
My idea is then to append the '024' part to the original first part, so my final result should look like 1754625024.
Please help.
I am not sure why you would store 1754625.24 as 1754625024 when storing it as an int.
Still, for your case, you can use a Derived Column transformation and
apply REPLACE to the source column from the CSV, replacing the decimal point with the string "0". E.g.
REPLACE("1754625.24", ".", "0")
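As a quick check of what that replacement produces, here is a small Python sketch (not SSIS; the function name append_decimals is made up for illustration). It assumes the value always has exactly two decimal digits, as in the question's sample.

```python
def append_decimals(value):
    """Swap the decimal point for '0', mirroring the REPLACE idea above."""
    return int(value.replace(".", "0"))


print(append_decimals("1754625.24"))  # 1754625024
```

Note that the trick depends on the two-digit assumption: a value such as "1754625.2" would come out as 175462502, one digit shorter than intended.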

SQL RegEx to handle comma separated IDs

I have a string that denotes which users are allowed to access something. For instance, if user 1, user 2, and user 3 could access it, the accessibility column would contain 1,2,3. If only user 1 could access it, it would only be 1 and so forth.
I know I can't use a simple CONTAINS clause, because searching for 1 could return true for 14,2,3. How would I get a regex to handle the cases where there is a comma on both sides of the ID number, on one side, or on neither?
Here is a sample of what I'm trying to do
DataID: 1
Accessibility: "1,2,3,4,5"
Data: "secret stuff"
DataID: 2
Accessibility: "5,6,7,8,9"
Data: "more secret stuff"
I need to tell the regex to search for a number and to make sure that it is at both the beginning and the end of the string if it has no commas around it; at the beginning of the string if it only has a comma after it; at the end of the string if it only has a comma before it; or, if it has commas on both sides, that's fine because it's in the middle of the string.
I know what I need to do, but don't know how to achieve it. Thanks.
First, you have a really bad data structure for several reasons:
The proper way to store lists in SQL is using tables, not strings.
The proper way to store integers in SQL is as integers, not strings.
Ids should be defined with a proper foreign key relationship, which you cannot do when the id is stored in a string.
Sometimes, we are stuck with other people's bad design decisions. That is, we are unable to create a proper junction table, with one column for the DataId and each user who has access to it.
In that situation, you can use the find_in_set() functionality in MySQL. This does not require a regular expression. You can just write:
where find_in_set($user, accessibility) > 0
Since A-Z, 0-9, and underscore are considered word characters, and a comma is not, there is a word boundary (\b) on either side of each ID. You could generalize like this:
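-- word-bound the user ID, e.g. 1 becomes \b1\b
-- (MySQL 8+ syntax shown; YourTable is a placeholder, and other dialects spell the function and \b differently)
SELECT DataID, Data FROM YourTable
WHERE REGEXP_LIKE(Accessibility, CONCAT('\\b', $user, '\\b'))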
That way it doesn't matter whether there is a leading comma, a trailing comma, or the ID is the sole occupant of the search subject. But it definitely cannot match 14 or 21, etc. \b1\b will only match a solo 1, \b14\b will only match the whole word 14, etc.
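The word-boundary behaviour is easy to check outside the database. The sketch below uses Python's re module rather than SQL; the function names has_access and in_set are hypothetical, and in_set is only a rough stand-in for what find_in_set() does.

```python
import re


def has_access(user_id, accessibility):
    """True if user_id appears as a whole number in the comma-separated list."""
    pattern = rf"\b{user_id}\b"
    return re.search(pattern, accessibility) is not None


def in_set(user_id, accessibility):
    """Regex-free alternative: split on commas and compare whole items."""
    return str(user_id) in accessibility.split(",")


print(has_access(1, "1,2,3,4,5"))  # True
print(has_access(1, "14,2,3"))     # False
print(in_set(5, "5,6,7,8,9"))      # True
```

Both checks treat the ID as a whole item, so 1 never matches inside 14 or 21.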

What should be the Table structure for following situation

I have a situation where I need to store an "Error Type", which has the following options
I want to know the best way to create my table "Error".
Either I use a "VARCHAR" column and store values like "1,3,4" (comma-separated) if "Take Off, Details and Legend" is selected, and parse it when displaying it in the view
OR
I use a separate column for each field in the table, with data type "TINYINT", like "IsTakeOff", "IsSpecifications", "Details", etc.
Please advise
Thanks
If the user with id 1 selects 1, 3 and 4, then you can use the following
Don't use multiple columns unless you are very confident the number won't grow.
Otherwise use a many-to-many association table - one with columns for the id of the item and the id of the error
One way to go might be to store it as an int and then do something like this in your code. This is PHP; I don't know which language you are using, but most languages have a switch statement, so it shouldn't be too hard to translate.
// $row is the row fetched from the database.
switch ($row['Error_Type'])
{
    case '0':
        // Do something.
        break;
    // ...
}
You can use a comma-separated VARCHAR column here, because you know that the number of options will not grow; it is limited to only 5 values. When searching the field you can use MySQL's FIND_IN_SET, which is very effective in this kind of situation. FIND_IN_SET takes two parameters: your search keyword and the comma-separated string.

Why does SSIS TOKEN function fail to count adjacent column delimiters?

I ran into a problem with SQL Server Integration Services 2012's new string function in the Expression Editor called TOKEN().
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
The TOKEN() fails to count the adjacent column delimiters, which throws off the column count. Now it only sees five columns instead of six columns. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like "5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE() calls, an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result in VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
TOKEN method in SSIS uses the implementation of strtok function in C++. I gathered this information while reading the book Microsoft® SQL Server® 2012 Integration Services. It is mentioned as note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of strtok function and I found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
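The effect of strtok-style tokenizing can be reproduced in a few lines. This is a Python approximation, not the SSIS or C implementation: dropping empty fields after a split gives the same tokens strtok would return for a single delimiter character. The function names are made up for illustration.

```python
def strtok_tokens(record, delimiter):
    """Tokens as strtok would yield them: empty fields are silently dropped."""
    return [field for field in record.split(delimiter) if field]


def all_fields(record, delimiter):
    """Plain split: empty fields are kept, so positions stay stable."""
    return record.split(delimiter)


record = "4^^0004^6/15/2010^Duck^D4"
print(strtok_tokens(record, "^"))  # five tokens, the fifth is 'D4'
print(all_fields(record, "^"))     # six fields, the fifth is 'Duck'
```

With the empty second field dropped, every later token shifts one position to the left, which is exactly the symptom described in the question.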
Original Post - Above section is an update:
I created a simple package in SSIS 2012 based on your data inputs. As you had described in your question, the TOKEN function does not behave as intended. I agree with you that the function doesn't seem to work. This post is not an answer to your original issue.
Here is an alternative way to write the expression in a relatively simpler fashion. This will only work if the last segment in your input record will always have a value (say A1, B2, C3 etc.).
The expression can be rewritten as:
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
This statement takes the input record as the first parameter and the caret (^) delimiter as the second. The third parameter calculates the total number of segments in the record when split by the delimiter. If you have data in the last segment, you are guaranteed to have at least two segments, so you can subtract 1 to fetch the penultimate segment.
I created a simple package with a data flow task. An OLE DB source retrieves the data, and the derived column transformation parses and splits it as per the screenshot below. The output is then inserted into the destination table. You can see the source and destination tables in the last screenshot. The destination table has two columns: the first stores the penultimate segment, and the second stores the segment count based on the delimiter (which again isn't correct). Notice that the last record didn't fetch the correct result. If the last record didn't have the value 8, the above expression would fail because the index would evaluate to zero.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
Derived column transformation inside data flow task:
Data in source and destination tables:
Not only does TOKEN skip adjacent delimiters, it also skips leading and trailing delimiters. So, using your example, if you had a "good" record that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would find only three tokens (two effective delimiters), and you'd end up with 0004 assigned to Token1, 6/15/2010 to Token2, and Duck to Token3.
I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or trailing ones, I used REPLACE to surround the delimiters with a character I absolutely wouldn't find in my text. The following expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
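To see why surrounding each delimiter with a filler character keeps the positions intact, here is a Python sketch of the same idea. The token() helper only approximates TOKEN(): it skips empty fields the way strtok does, and returning an empty string for an out-of-range index is an assumption of this sketch, not documented SSIS behaviour. It also assumes "~" never occurs in the data.

```python
def token(record, delimiter, index):
    """Rough stand-in for TOKEN(): 1-based, empty fields skipped."""
    tokens = [field for field in record.split(delimiter) if field]
    # Assumption of this sketch: out-of-range indexes give an empty string.
    return tokens[index - 1] if 1 <= index <= len(tokens) else ""


def token_keep_empty(record, index):
    """Pad every '^' with '~' so no field is empty, then strip the padding."""
    padded = record.replace("^", "~^~")
    return token(padded, "^", index).replace("~", "")


record = "^^^0004^6/15/2010^Duck^"
print(repr(token(record, "^", 1)))          # '0004' (positions have shifted)
print(repr(token_keep_empty(record, 1)))    # '' (the empty first field)
print(repr(token_keep_empty(record, 4)))    # '0004'
```

After padding, even an empty field contains at least one "~", so the strtok-style tokenizer no longer collapses it, and the final replace removes the filler.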

Convert datatypes in Access Insert

OK, here is my problem. I have a CSV file, created outside my control, that contains data for several different groupings in the same file. The first seven lines are the table headers for each group, and they differ from group to group. First I import this file into Access as a single table, and I have created queries to access the individual groups for data analysis. The problem is that I need to use an expression on one of the fields, but the field has to be text in order to import from the spreadsheet: each column contains both numbers and characters because of the headers at the top, and sometimes the data is not in the correct column and needs to be massaged. So what I want to do is insert each group into its own table, converting some of the columns to numbers so that my expression will work. Here is the expression I am having problems with. Thanks.
Sum(IIf([2000 Query].[Field19]=1,IIf([5000 Query].[Field21]>0,-[5000 Query].[Field21],[5000 Query].[Field21]),[5000 Query].[Field21])) AS [ADJ Invoice Total]
Use CDec:
IIf(CDec([2000 Query].[Field19])=1 ...
It works like so:
?cdec(" 20,121.34 ")
20121.34
So commas and leading and trailing spaces should be okay.
CDec is available in VBA but not in MS Access queries. In queries, Val will work:
IIf(Val([2000 Query].[Field19])=1 ...
Or CDbl, which will accept comma thousand separators and leading and trailing spaces.
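For comparison, the same kind of tolerant conversion can be sketched in Python (this is not Access or VBA; the function name to_decimal is made up for illustration). It strips surrounding spaces and comma thousand separators before converting, which is roughly what the CDec example above shows.

```python
from decimal import Decimal


def to_decimal(text):
    """Convert text such as ' 20,121.34 ' to a Decimal, ignoring spaces and commas."""
    return Decimal(text.strip().replace(",", ""))


print(to_decimal(" 20,121.34 "))  # 20121.34
```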