Transpose a table using Talend - CSV

I would like to transpose a table, so that the values currently laid out in rows become columns (the sample input and desired output were posted as screenshots).
I wanted to mention that the files are CSV files.
Thanks,

There is a solution to this, but it's inelegant and inefficient, and it may not work on a huge dataset (it may run out of memory).
You can denormalize the whole input by defining all the schema columns in the tDenormalize component, then pass it to a tMap that concatenates all the columns with a special character in between. The special character is just a marker for the next component we are going to use.
Connect the tMap's output to a tNormalize and use the special character as the Item Separator; the column to normalize should be the single concatenated column from the previous tMap.
This should do what you're looking for. If you want to process the data after transposing rather than just transpose it, you can use the tExtractDelimitedFields component with "," as the Field Separator, since it's a CSV.
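To make the flow concrete, here is a hedged sketch with made-up data and "#" as the special character (the original sample tables were not reproduced). Suppose the denormalized, concatenated output of the tMap for one record is:
sales#100#200
tNormalize with "#" as the Item Separator then turns that single column into one row per value:
sales
100
200
So the values that arrived side by side as columns leave the job stacked as rows, ready for tExtractDelimitedFields with "," if each value is itself a comma-separated list.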

Related

BCP CHAR value to Snowflake

I am trying to create a BCP file with a | delimiter and then load it into a Snowflake table.
Issue:
In SQL Server there are columns defined as CHAR(4) that hold values like "sss".
When I BCP the data out, the value gets padded to a length of 4 ("sss ") and is loaded into Snowflake that way.
Because of this our reports are failing: they filter with something like WHERE column = 'SSS', and the trailing space in Snowflake keeps the correct rows from showing up.
We do not want to change our reports. So, is there a way that BCP can handle the padding or trimming of these columns?
Note that there are 24 tables, each with around 130+ columns, so I can't go and put Trim functions on each CHAR column.
If your BCP file is maintaining the trailing space, then Snowflake will retain it too, as long as the field is enclosed in a " or ' and your file format uses FIELD_OPTIONALLY_ENCLOSED_BY accordingly. You may also want to make sure the TRIM_SPACE option is set correctly on the file format definition for your COPY INTO command.
If your BCP file isn't maintaining the space and you can't figure out how to get that to work, you could force the space back in during the COPY INTO command with some string functions in your SELECT, or you could create a view for your report that applies the same string functions, and have the report work from the view.
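For reference, a minimal sketch of the COPY INTO side (the stage, table, and file names are made up; the options are standard Snowflake CSV file-format options):
COPY INTO my_table
FROM @my_stage/bcp_export.dat
FILE_FORMAT = (
  TYPE = CSV
  FIELD_DELIMITER = '|'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  TRIM_SPACE = TRUE  -- strip leading/trailing spaces from each field
);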
So, is there a way that BCP can handle the padding or trimming of these columns?
Yes, but not via some switch or option. The correct way to handle this is to set your datatypes up front. As someone mentioned in the comments on your question, the query that creates the BCP output should use VARCHAR(4) instead of CHAR(4). BCP is giving you exactly what you asked of it; the way to avoid the whitespace is to use VARCHAR.
A fairly quick "find and replace" against the scripted-out query objects seems like it would work fine, but you know your situation best.
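As a hedged illustration (table and column names are invented), the query feeding BCP would go from selecting the raw CHAR column to something like:
SELECT RTRIM(MyCharCol) AS MyCharCol  -- RTRIM yields VARCHAR without the pad spaces
FROM dbo.MyTable;
RTRIM returns VARCHAR, so BCP writes 'sss' rather than the padded 'sss '.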
Additionally, "trim" wont work - FYI. Even if the value of the field was only "SSS" (as in your example); if the result/column is defined as CHAR(4) you will get 4 bytes of data and a blank in the 4th place since you only had 3 bytes of data. Trim will work during the query... the padded " " you are getting is placed there by the copy out. The way to correct this is to set your data types as you need up front.
Unless someone knows of a better way in Snowflake (I'm not familiar with it), the only other option is to manipulate the file in between SQL Server and Snowflake, replacing " |" with "|"... but... blech.
This is a known "issue" with BCP. The "solution" is to use the queryout option, which means you must include a query with every export. But the data are the way they are.
Eg: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/88c258fe-d1a6-4f3a-9dac-40388d04e9c7/remove-space-in-columns-on-bcp-out?forum=transactsql
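A hedged sketch of that queryout form (server, database, and column names are placeholders):
bcp "SELECT RTRIM(Col1), RTRIM(Col2) FROM MyDb.dbo.MyTable" queryout out.dat -c -t"|" -S myserver -T
-c exports character data, -t sets the pipe delimiter, and the RTRIMs in the query strip the CHAR padding before BCP writes the file.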
But this is really a Snowflake problem, because Snowflake has its own default CHAR semantics.
The String & Binary Data Types documentation carries a warning about this, but it doesn't tell the whole story.
The following executed on Oracle (and apparently MSSQL? MySQL?) will select the aaa line:
CREATE TABLE C AS SELECT CAST('aaa ' AS CHAR(4)) t FROM DUAL;
SELECT * FROM C WHERE t = 'aaa';
but won't on Snowflake, unless you create the column with COLLATION:
CREATE OR REPLACE TABLE C (t CHAR(4) COLLATE 'en_US-rtrim');
INSERT INTO C VALUES('aaa ');
SELECT * FROM C WHERE t = 'aaa';
Unfortunately, you can't ALTER the collation after creation, which would have been convenient after a COPY INTO <table>.
PS: Mike Walton's answer is better, TRIM_SPACE is much cleaner than COLLATE.
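For completeness, if touching the queries were acceptable (the OP wanted to avoid that), trimming in the predicate also sidesteps the collation issue:
SELECT * FROM C WHERE RTRIM(t) = 'aaa';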

How to remove unwanted columns and fields in Notepad++

I have a feed with the following columns:
product_name,description,aw_product_id,store_price,merchant_image_url,merchant_deep_link,merchant_category,merchant_product_id
Each subsequent line has all the information in this order. I only require the product_name from each line, not everything that comes after it.
So my question is, how do I remove everything and only keep the product_name?
You could use a regex to replace the comma and everything after it with nothing:
Search: ,.*
Replace: (nothing)
As you want the first column, you can just use a regex to extract the data; things would be a lot trickier if you wanted a column from the middle.
If that's the case, importing into a spreadsheet program such as Excel as a CSV file will extract all the data into columns which then allows you to highlight that column (or columns) and extract the data as necessary.
You could use Column mode (Alt + mouse selection) to select only the part (column) you want.
This can be tricky if the product name lengths are unequal.
Another way is Find & Replace with a clever regex; that's what I would do in your case.
As the product name is the first column, deleting everything after the first comma should do the trick. So use this regex and replace with an empty string:
Find: ,.*
Replace: (leave empty)
To remove the 6th column from a CSV file:
Find: (.*?)(,.*?)(,.*?)(,.*?)(,.*?)(?:,.*?)(,.*)
Replace: ${1}${2}${3}${4}${5}${6}
Search Mode: Regular Expression
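For instance, running that find/replace against the line a,b,c,d,e,f,g yields a,b,c,d,e,g: groups 1 to 5 capture the first five columns, the non-capturing group swallows the sixth, and group 6 keeps the remainder.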

How can I correctly save a CSV with value lists from NeoOffice/OpenOffice?

I need something like this:
Attivato;Nome;Categorie;Prezzo tasse escluse;Descrizione;Immagini
1;"Bracciale rock";11,12,13;130;"This is a long description.";http://s20.postimg.org/r08w8i4i5/perle.jpg,http://s20.postimg.org/tmjtbp6bx/bracciale.jpg
But if I open it with NeoOffice Calc (or any other spreadsheet program), it then exports like this at best:
Attivato;Nome;Categorie;Prezzo tasse escluse;Descrizione;Immagini
1;"Bracciale rock";"11,12,13";130;"This is a long description.";"http://s20.postimg.org/r08w8i4i5/perle.jpg,http://s20.postimg.org/tmjtbp6bx/bracciale.jpg"
It won't retain things like 11,12,13 without converting them to quoted strings.
How can I fix this? I have tried really everything, every kind of import/export option, different programs, etc., but no way; I cannot do it.
I finally found a couple of ways.
(1) In NeoOffice/OpenOffice:
leave the Text delimiter field empty
check the 'Detect special numbers' option
This writes the file without the added quotes; see the sample below.
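With those settings, the row from the question should come out without the extra quoting, something like this (a hedged rendering; the exact output depends on the Calc version):
1;Bracciale rock;11,12,13;130;This is a long description.;http://s20.postimg.org/r08w8i4i5/perle.jpg,http://s20.postimg.org/tmjtbp6bx/bracciale.jpg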
(2) As an alternative, I used Google Docs. That is also handy for clients who usually use Excel; Google Docs/Drive saves CSV files in UTF-8 encoding by default.
Also, for the "string conversion" problem with multiple values like 5,7,9,6, I found that using semicolons (;) instead of commas works (meaning it doesn't add "" when you save as CSV and doesn't read the values as dates or other wrong data types). In PrestaShop you can set the field and text separators accordingly.
Hope it helps other people.

Why does SSIS TOKEN function fail to count adjacent column delimiters?

I ran into a problem with TOKEN(), the new string function in the Expression Editor of SQL Server Integration Services 2012.
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
TOKEN() fails to count adjacent column delimiters, which throws off the column count: it now sees only five columns instead of six. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like "5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE() calls, an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result as VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
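To see why the REPLACE has to be nested, trace the worst case "5^^^^Emu^E5": the first REPLACE produces "5^ ^^ ^Emu^E5" (it cannot reuse a "^" it has already consumed), the second produces "5^ ^ ^ ^Emu^E5", and TOKEN(...,"^",5) now correctly returns "Emu"; the RTRIM cleans up any token that ended up as a bare space.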
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
The TOKEN method in SSIS uses the implementation of the strtok function from C/C++. I gathered this while reading the book Microsoft® SQL Server® 2012 Integration Services; it is mentioned in a note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of strtok function and I found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that the strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
Original Post - Above section is an update:
I created a simple package in SSIS 2012 based on your data inputs. As you described in your question, the TOKEN function does not behave as intended, so this post is not an answer to your original issue but an alternative approach.
Here is a relatively simpler way to write the expression. It will only work if the last segment in your input record always has a value (say A1, B2, C3, etc.).
The expression can be rewritten as:
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
This takes the input record as the first parameter and the delimiter caret (^) as the second. The third parameter computes the total number of segments in the record when split by the delimiter; if the last segment always has data, you are guaranteed at least two segments, so subtracting 1 fetches the penultimate one.
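Tracing the Duck record shows why this works: TOKEN treats the adjacent delimiters in "4^^0004^6/15/2010^Duck^D4" as one, so TOKENCOUNT returns 5 and TOKEN(...,"^",5 - 1) fetches "Duck". For "8^^^^Sparrow^" the count drops to 2, so the expression fetches token 1 ("8") instead, which is the failing last record described below.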
I created a simple package with a data flow task. An OLE DB source retrieves the data, a derived column transformation parses and splits it, and the output is inserted into the destination table. The destination table has two columns: the first stores the penultimate segment, the second the segment count based on the delimiter (which, again, isn't correct). Notice that the last record doesn't fetch the right result, and if it didn't have the value 8 at the start, the expression above would fail outright because it would evaluate to a zero index.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
Not only does TOKEN skip adjacent delimiters, it skips leading and trailing delimiters as well. So, using your example, if you had a "good" field that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would only find two delimiters and you'd end up with 0004 assigned to Token1, 6/15/2010 for Token2, and Duck for Token3.
I used a different kind of REPLACE. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or trailing ones, I used REPLACE to surround each delimiter with a character I absolutely wouldn't find in my text. The following expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
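To trace it on the leading-delimiter record: REPLACE(OldImportRecord,"^","~^~") turns "^^^0004^6/15/2010^Duck^" into "~^~~^~~^~0004~^~6/15/2010~^~Duck~^~", so no two "^" characters are ever adjacent and even an empty field yields a token ("~", "~~", "~~", "~0004~", ...). TOKEN(...,"^",4) then returns "~0004~", and the outer REPLACE strips the tildes, restoring "0004" in its correct position.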

Convert datatypes in Access Insert

OK, here is my problem. I have a CSV file, created outside my control, that holds data for several different groupings in the same file. The first seven lines of each group are table headers, and they differ from group to group. First I import the file into Access as a single table, and I have created queries to pull out the individual groups for analysis. Every column has to be imported as text, because each column mixes numbers and characters (the group headers at the top, plus data that sometimes lands in the wrong column and needs to be massaged). The problem is that I need to use an expression on one of those fields. So what I want to do is insert each group into its own table and convert some of the columns to numbers so that my expression will work. Here is the expression I am having problems with. Thanks.
Sum(IIf([2000 Query].[Field19]=1,IIf([5000 Query].[Field21]>0,-[5000 Query].[Field21],[5000 Query].[Field21]),[5000 Query].[Field21])) AS [ADJ Invoice Total]
Use CDec:
IIf(CDec([2000 Query].[Field19])=1 ...
It works like so (in the VBA Immediate window):
?cdec(" 20,121.34 ")
20121.34
So commas and leading and trailing spaces should be okay.
CDec is available in VBA but not in MS Access queries. In queries, Val will work:
IIf(Val([2000 Query].[Field19])=1 ...
Or CDbl, which will accept comma thousand separators and leading and trailing spaces.
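Applied to the expression from the question, a hedged rewrite using Val would be:
Sum(IIf(Val([2000 Query].[Field19])=1,IIf(Val([5000 Query].[Field21])>0,-Val([5000 Query].[Field21]),Val([5000 Query].[Field21])),Val([5000 Query].[Field21]))) AS [ADJ Invoice Total]
Keep in mind that Val stops reading at the first character it can't treat as part of a number (a comma included), so prefer CDbl if the text carries thousand separators.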