Mule - how to pad rows in a CSV file with extra delimiters

I have a CSV file coming into my Mule application that looks as follows:
1,Item,Item,Item
2,Field,Field,Field,Field
2,Field,Field,Field,Field
3,Text
Is there a way I can transform this file in Mule to something like the below:
1,Item,Item,Item,,,,,,
2,Field,Field,Field,Field,,,,,
2,Field,Field,Field,Field,,,,,
3,Text,,,,,,,,
Essentially, what I need to do here is append a string (containing x occurrences of a delimiter) to the end of each row. The number of delimiters I need to append to each row can be determined by the first character of that row, e.g. if row[0]='1' then (append ",,,,,,"), else if row[0]='2' then (append ",,,,,"), etc.
The reason I have this rather annoying problem is that the system providing the input to my Mule application produces a file where the number of columns in each row may vary. I'm trying to pad this file so that the number of columns in each row is equal, so that I can pass it on to a Java transformer like the one explained here, which uses FlatPack (it expects x columns and won't accept a file with a varying number of columns per row).
Does anyone have any ideas on how I could approach this? Thanks in advance.
UPDATE
Based on @Learner's recommendation and @EdC's answer - I've achieved this in Mule using the below:
<flow name="testFlow1" doc:name="testFlow1">
    <file:inbound-endpoint .../>
    <file:file-to-string-transformer doc:name="File to String"/>
    <component doc:name="Java" class="package.etc.etc.MalformData"/>
</flow>

Try this, based on @Learner's answer.
import org.mule.api.MuleEventContext;
import org.mule.api.MuleMessage;

public class MalformData implements org.mule.api.lifecycle.Callable {

    private String[] strTempArray;

    public String findMalformRow(String strInput) {
        String strOutput = "";
        strTempArray = strInput.split("\\n");

        for (int i = 0; i < strTempArray.length; i++) {
            char charFirst = strTempArray[i].charAt(0);
            String strFix = strTempArray[i];
            String missingDelimiter;

            if (charFirst == '1') {
                missingDelimiter = ",,,,,,";
                strFix += missingDelimiter;
            } else if (charFirst == '2') {
                missingDelimiter = ",,,,,";
                strFix += missingDelimiter;
            } else if (charFirst == '3') {
                missingDelimiter = ",,,,,,,,";
                strFix += missingDelimiter;
            } else {
                strFix = "Good";
            }

            // Only overwrite the row if it actually needed padding.
            // (Note: use equals() for string comparison, not ==/!=.)
            if (!strFix.equals("Good")) {
                strTempArray[i] = strFix;
            }
        }

        for (int i = 0; i < strTempArray.length; i++) {
            strOutput += strTempArray[i] + "\n";
        }
        return strOutput;
    }

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        MuleMessage message = eventContext.getMessage();
        String newPayload = this.findMalformRow(message.getPayloadAsString());
        message.setPayload(newPayload);
        return message;
    }
}

You could write a custom Java component that reads your CSV file line by line, writes to another file, and appends commas to the end of each line based on your logic.
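A minimal sketch of that idea in plain Java (independent of Mule): instead of hardcoding the number of delimiters per row type, it pads every row to the column count of the widest row. CsvPadder and pad are illustrative names, not part of any Mule or FlatPack API.

```java
public class CsvPadder {

    // Pads every row with trailing commas so all rows have as many
    // columns as the widest row. split with limit -1 keeps trailing
    // empty fields, so already-padded rows are not padded twice.
    public static String pad(String csv) {
        String[] rows = csv.split("\n");
        int maxCols = 0;
        for (String row : rows) {
            maxCols = Math.max(maxCols, row.split(",", -1).length);
        }
        StringBuilder out = new StringBuilder();
        for (String row : rows) {
            int missing = maxCols - row.split(",", -1).length;
            out.append(row);
            for (int i = 0; i < missing; i++) {
                out.append(',');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints:
        // 1,Item,Item,Item,
        // 2,Field,Field,Field,Field
        // 3,Text,,,
        System.out.print(pad("1,Item,Item,Item\n2,Field,Field,Field,Field\n3,Text"));
    }
}
```

If FlatPack expects a fixed column count larger than any row actually has, replace maxCols with that constant.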

Related

How to open a CSV file which contains special characters in one of its fields?

Hi, I am working on a Xamarin.Forms app. While trying to open one of the CSV files, the following exception is displayed: "input string is not in a correct format". The CSV file contains a field called item name, which includes names such as ET Door, E459-2, H 91 Ft and Key, Door. Some of these items contain commas, so I am not able to open a CSV file containing them, as they include special characters like commas and underscores. Here is my code to read and open the CSV file; please check it and let me know what changes I need to make so that a file with items containing special characters also opens.
public async void OnProcess(object o, EventArgs args)
{
    if (!string.IsNullOrWhiteSpace(csv_file.Text))
    {
        // _database.AddFiles();
        if (App.Current.MainPage is NavigationPage)
        {
            try
            {
                List<ItemsCSV> items = new List<ItemsCSV>();
                string[] lines = File.ReadAllLines(string.Format(@"{0}", this.file.FilePath));
                if (lines != null)
                {
                    for (int x = 1; x < lines.Length; x++)
                    {
                        string data = lines[x];
                        string[] item = data.Split(',');
                        // ItemsCSV itemsCSV = new ItemsCSV();
                        _itemsCSV = new ItemsCSV();
                        {
                            _itemsCSV.Cycle_Count = string.IsNullOrEmpty(item.ElementAtOrDefault(0)) ? 0 : Convert.ToInt32(item[0]);
                            _itemsCSV.Line_Number = string.IsNullOrEmpty(item.ElementAtOrDefault(1)) ? 0 : Convert.ToInt32(item[1]);
                            _itemsCSV.Item_Number = item.ElementAtOrDefault(2);
                            _itemsCSV.Name = item.ElementAtOrDefault(3);
                            _itemsCSV.Warehouse = item.ElementAtOrDefault(4);
                            _itemsCSV.Aisle = item.ElementAtOrDefault(5);
                            _itemsCSV.Bin = item.ElementAtOrDefault(6);
                            _itemsCSV.Level = item.ElementAtOrDefault(7);
                            _itemsCSV.Order_Qty = string.IsNullOrEmpty(item.ElementAtOrDefault(8)) ? 0 : Convert.ToInt32(item[8]);
                            _itemsCSV.Order_UOM = item.ElementAtOrDefault(9);
                            _itemsCSV.Consumption_Qty = string.IsNullOrEmpty(item.ElementAtOrDefault(10)) ? 0 : Convert.ToInt32(item[10]);
                            _itemsCSV.Consumption_UOM = item.ElementAtOrDefault(11);
                            _itemsCSV.Status = "";
                        };
                        items.Add(_itemsCSV);
                        _database.AddItems(_itemsCSV);
                    }
                    var result = await DisplayAlert("", "CSV has been processed, please do cycle count", "OK", "Cancel");
                    if (result == true)
                    {
                        var cyclecountPage = new CycleCountPage(items, 0, "MainPage", this.file.FilePath);
                        await (App.Current.MainPage as NavigationPage).PushAsync(cyclecountPage);
                    }
                    else
                    {
                    }
                }
                else
                {
                    await DisplayAlert("Alert", "File is empty", "OK");
                }
            }
            catch (Exception e)
            {
                await DisplayAlert("Exception", e.Message, "OK");
            }
        }
    }
    else
    {
        await DisplayAlert("Alert", "File name is mandatory", "OK");
    }
}

Can we search or filter "data-tag='to-do'" in the OneNote API? If yes, how can we do this?

How can we use OneNote tags (like data-tag='to-do') with search or filter in the OneNote API? I tried using the provided operators but found no success.
I tried it this way:
$url = "https://www.onenote.com/api/v1.0/me/notes";
//$url .= "/pages?search=hello";
$url .= "/pages?filter=data-tag eq 'to-do'";
I want to search for the data-tag and then extract the data from the OneNote pages which contain data-tag='to-do'.
Any help is appreciated, and thanks in advance.
You'll have to run through all your pages.
For each page, you can retrieve its content with a GET call to https://www.onenote.com/api/v1.0/me/notes/pages/%s/content?includeIds=true
From there you get a string that you can parse.
I'd advise you to use jsoup.
With jsoup you can then write (assuming content contains your page's content):
Document doc = Jsoup.parse(content);
Elements todos = doc.select("[data-tag^=\"to-do\"]");
for (Element todo : todos) {
    System.out.println(todo.ownText());
}
Sadly, the OneNote API doesn't support it yet, so I've written a custom parser which extracts notes with data-tags from the page content. Here it is:
public class OneNoteParser
{
    static public List<Note> ExtractTaggedNotes(string pageContent, string tag = "*")
    {
        List<Note> allNotes = new List<Note>();
        string[] dataTagString = { "data-tag=\"" };
        string[] dirtyNotes = pageContent.Split(dataTagString, StringSplitOptions.RemoveEmptyEntries);

        // First one in this array can be dropped as it doesn't contain a todo
        for (int i = 1; i < dirtyNotes.Length; i++)
        {
            string curStr = dirtyNotes[i];
            Note curNote = new Note();

            // Firstly we need to extract all the tags from it (sample html: data-tag="to-do:completed,important" ....)
            string allTags = curStr.Substring(0, curStr.IndexOf("\""));
            curNote.Tags = new List<string>(allTags.Split(','));

            // Now we have to jump to the next ">" symbol and start collecting the text after it
            curStr = curStr.Substring(curStr.IndexOf(">"));
            int depth = 1;
            bool addAllowed = false;
            for (int j = 0; j < curStr.Length - 1; j++)
            {
                // Finding next tag opener "<" symbol
                if (curStr[j] == '<')
                {
                    addAllowed = false;
                    // Checking if it is not a "</" closer
                    if (curStr[j + 1] == '/')
                    {
                        // Means this is a tag closer. Decreasing depth
                        depth--;
                    }
                    else
                    {
                        // Means this is a tag opener. Increasing depth
                        depth++;
                    }
                }
                else if (curStr[j] == '>')
                {
                    addAllowed = true;
                    if (j > 0 && curStr[j - 1] == '/')
                    {
                        // Means this is a tag closer. Decreasing depth
                        depth--;
                    }
                }
                else
                {
                    if (depth < 1)
                    {
                        // Found end of the tag. Exiting the loop
                        break;
                    }
                    if (addAllowed)
                        curNote.Text += curStr[j]; // Appending letter to string
                }
            }

            // Filtering by tag and adding to the final list
            if (tag == "*" || curNote.Tags.Any(str => str.Contains(tag))) //curNote.Tags.Contains(tag, StringComparer.CurrentCultureIgnoreCase))
                allNotes.Add(curNote);
        }
        return allNotes;
    }
}
And here is the class Note
public class Note
{
    public string Text;
    public List<string> Tags;

    public Note()
    {
        Tags = new List<string>();
    }
}
To extract to-dos, simply call this function:
OneNoteParser.ExtractTaggedNotes(pageContent, "to-do");
Also you can extract other tags like this:
OneNoteParser.ExtractTaggedNotes(pageContent, "important");
OneNoteParser.ExtractTaggedNotes(pageContent, "highlight");
//...

Handling Inconsistent Delimiters in Flat File Source on ForeachLoop Container

I'm trying to handle inconsistent delimiters in a Flat File Source contained in a Data Flow Task running in a Foreach Loop container in SSIS.
I have several files in a folder with varying names but with one consistent identifier e.g.
File23998723.txt
File39872397.txt
File29387234.txt etc., etc.
These files should, as a standard, be tab delimited, but every so often a user misses cleaning up a file and it will be delimited with a , or a ; etc., which causes the package import to fail.
Is there an easy approach for me to follow to dynamically change the delimiter or to test for the delimiter beforehand?
I managed to handle it with a script task, thanks!
Basically added a script task to the Foreach Loop Container that executes before my DataFlow task.
I send the file name through as a variable.
I added the following namespaces to the script:
using System.IO;
using RuntimeWrapper = Microsoft.SqlServer.Dts.Runtime.Wrapper;
And my script looks like this:
public void Main()
{
    if (!string.IsNullOrEmpty(Dts.Variables["sFileName"].Value.ToString()))
    {
        StreamReader file = new StreamReader(Dts.Variables["sFileName"].Value.ToString());
        if (file != null)
        {
            string HeadRowDelimiter = "";
            string ColDelimiter = "";
            string data = "";
            while (file.Peek() >= 0) // Peek() returns -1 at end of stream
            {
                char[] c = new char[500];
                file.Read(c, 0, c.Length);
                data = string.Join("", c);
                if (!string.IsNullOrEmpty(data))
                {
                    // set row delimiters
                    if (data.Contains("\r\n"))
                    {
                        HeadRowDelimiter = "\r\n";
                    }
                    else if (data.Contains("\r"))
                    {
                        HeadRowDelimiter = "\r";
                    }
                    else if (data.Contains("\n"))
                    {
                        HeadRowDelimiter = "\n";
                    }
                    else if (data.Contains("\0"))
                    {
                        HeadRowDelimiter = "\0";
                    }
                    // set column delimiters
                    if (data.Contains("\t"))
                    {
                        ColDelimiter = "\t";
                    }
                    else if (data.Contains(";"))
                    {
                        ColDelimiter = ";";
                    }
                    else if (data.Contains(","))
                    {
                        ColDelimiter = ",";
                    }
                    else if (data.Contains(":"))
                    {
                        ColDelimiter = ":";
                    }
                    else if (data.Contains("|"))
                    {
                        ColDelimiter = "|";
                    }
                    else if (data.Contains("\0"))
                    {
                        ColDelimiter = "\0";
                    }
                }
                break; // only the first chunk is needed to detect the delimiters
            }
            file.Close();

            RuntimeWrapper.IDTSConnectionManagerFlatFile100 flatFileConnection =
                Dts.Connections["FlatFileConnection"].InnerObject as RuntimeWrapper.IDTSConnectionManagerFlatFile100;
            if (flatFileConnection != null)
            {
                flatFileConnection.HeaderRowDelimiter = HeadRowDelimiter;
                flatFileConnection.RowDelimiter = HeadRowDelimiter;
                flatFileConnection.HeaderRowsToSkip = 0;
                flatFileConnection.Columns[0].ColumnDelimiter = ColDelimiter;
            }
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

Parse a string value using mysql

I have a value in a column in this manner:
"id=Clarizen,ou=GROUP,dc=opensso,dc=java,dc=net|id=devendrat,ou=USER,dc=opensso,dc=java,dc=net"
I want to extract the group name and the user name from this string and store them in separate columns of another table.
Desired result:
Clarizen as Groupname
devendrat as Username
Please help
You are looking for the CHARINDEX and SUBSTRING functions.
The following works for T-SQL; I am not sure about the syntax in MySQL.
SELECT REPLACE(SUBSTRING(ColumnName,1,CHARINDEX(',',ColumnName) - 1),'ID=','')
AS Groupname,
REPLACE(SUBSTRING(SUBSTRING(ColumnName,CHARINDEX('|',ColumnName),
LEN(ColumnName)),1,
CHARINDEX(',',ColumnName) - 1),'|ID=','') AS Username
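Since the question is about MySQL, here is a sketch of the same extraction using MySQL's SUBSTRING_INDEX function (ColumnName and TableName are placeholders for your actual column and table):

SELECT
  SUBSTRING_INDEX(SUBSTRING_INDEX(ColumnName, ',', 1), '=', -1) AS Groupname,
  SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(ColumnName, '|', -1), ',', 1), '=', -1) AS Username
FROM TableName;

SUBSTRING_INDEX(str, delim, 1) returns everything before the first delimiter, and a negative count takes from the right; so the inner calls isolate the "id=Clarizen" and "id=devendrat" chunks before the final call strips the "id=" prefix, yielding Clarizen and devendrat for the sample value.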
(Sorry, this is C#. I overlooked that you are using MySQL, so the answer is useless to you, but I'll leave it here unless someone removes it.)
Using string split can get the job done. Here is something that I whipped together; it won't be optimal, but it definitely works!
string parse_me = "id=Clarizen,ou=GROUP,dc=opensso,dc=java,dc=net|id=devendrat,ou=USER,dc=opensso,dc=java,dc=net";
string[] lines = parse_me.Split(',');
List<string> variables = new List<string>();
List<string> values = new List<string>();
foreach (string line in lines)
{
    string[] pair = line.Split('=');
    //Console.WriteLine(line);
    variables.Add(pair[0]);
    values.Add(pair[1]);
}
string group = "";
string user = "";
if (variables.Count == values.Count)
{
    for (int i = 0; i < variables.Count; ++i)
    {
        Console.Write(variables[i]);
        Console.Write(" : ");
        Console.WriteLine(values[i]);
        if (variables[i] == "ou")
        {
            if (group == "")
            {
                group = values[i];
            }
            else if (user == "")
            {
                user = values[i];
            }
        }
    }
}
Console.WriteLine("Group is: " + group);
Console.WriteLine("User is: " + user);
Console.ReadLine();

Groovy Split CSV

I have a csv file (details.csv) like
ID,NAME,ADDRESS
1,"{foo,bar}","{123,mainst,ny}"
2,"{abc,def}","{124,mainst,Va}"
3,"{pqr,xyz}","{125,mainst,IL}"
when I use the following (note: I have another closure above this which reads all CSV files from the directory)
if (file.getName().equalsIgnoreCase("details.csv")) {
    input = new FileInputStream(file)
    reader = new BufferedReader(new InputStreamReader(input))
    reader.eachLine { line ->
        def cols = line.split(",")
        println cols.size()
    }
}
Instead of getting size 3, I am getting 6, with these values:
1
"{foo
bar}"
"{123
mainst
ny}"
split(",") is splitting the data by comma (,) but I want my results as:
1
"{foo,bar}"
"{123,mainst,ny}"
How can I fix this closure? Please help! Thanks.
Writing a csv parser is a tricky business.
I would let someone else do the hard work and use something like GroovyCsv.
Here is how to parse it with GroovyCsv
// I'm using Grab instead of just adding the jar and its
// dependencies to the classpath
@Grab( 'com.xlson.groovycsv:groovycsv:1.0' )
import com.xlson.groovycsv.CsvParser

def csv = '''ID,NAME,ADDRESS
1,"{foo,bar}","{123,mainst,ny}"
2,"{abc,def}","{124,mainst,Va}"
3,"{pqr,xyz}","{125,mainst,IL}"'''

def csva = CsvParser.parseCsv( csv )
csva.each {
    println it
}
Which prints:
ID: 1, NAME: {foo,bar}, ADDRESS: {123,mainst,ny}
ID: 2, NAME: {abc,def}, ADDRESS: {124,mainst,Va}
ID: 3, NAME: {pqr,xyz}, ADDRESS: {125,mainst,IL}
So, to get the NAME field of the second row, you could do:
def csvb = CsvParser.parseCsv( csv )
println csvb[ 1 ].NAME
Which prints
{abc,def}
Of course, if the CSV is a File, you can do:
def csvc = new File( 'path/to/csv' ).withReader {
CsvParser.parseCsv( it )
}
Then use it as above
There are two ways of doing this.
One is using collect:
def processCsvData(Map csvDataMap, File file)
{
    InputStream inputFile = new FileInputStream(file);
    String[] lines = inputFile.text.split('\n')
    List<String[]> rows = lines.collect { it.split(',') }
    // Add processing logic
}
The problem here is that it also splits on the commas between braces ({}), i.e. "{foo,bar}" is broken into "{foo" and "bar}".
Another way is using Java, and this works just fine:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CSVParser {

    /*
     * This Pattern will match on either quoted text or text between commas, including
     * whitespace, and accounting for beginning and end of line.
     */
    private final Pattern csvPattern = Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?:,|$)");
    private ArrayList<String> allMatches = null;
    private Matcher matcher = null;
    private int size;

    public CSVParser() {
        allMatches = new ArrayList<String>();
        matcher = null;
    }

    public String[] parse(String csvLine) {
        matcher = csvPattern.matcher(csvLine);
        allMatches.clear();
        String match;
        while (matcher.find()) {
            match = matcher.group(1);
            if (match != null) {
                allMatches.add(match);
            } else {
                allMatches.add(matcher.group(2));
            }
        }
        size = allMatches.size();
        if (size > 0) {
            return allMatches.toArray(new String[size]);
        } else {
            return new String[0];
        }
    }
}
Hope this helps!