Adding hyphen in string - ssis

I have a formatting issue in SSIS. I have sets of telephone numbers from text files, and hyphens need to be added in this format:
ex. 1234567890
formatted: 123-456-7890
I'm thinking of using SUBSTRING in an expression in a Derived Column task. Hope you can help. Thanks!

public static String setHyphen(String str) {
StringBuilder stringBuilder = new StringBuilder();
char ssnArr[] = str.toCharArray();
for(int i=0;i<ssnArr.length;i++){
if(i == 2 || i == 5){
stringBuilder.append(ssnArr[i] + "-");
}else{
stringBuilder.append(ssnArr[i]);
}
}
return stringBuilder.toString();
}
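The loop above can also be written with substring, which is closer to what the question has in mind for a Derived Column expression. The following is a minimal sketch, not the poster's method: the class name PhoneFormatter and the choice to return invalid input unchanged are assumptions made for illustration. SSIS has its own SUBSTRING function, but its argument conventions differ from Java's and should be checked against the SSIS documentation.

```java
class PhoneFormatter {
    // Formats a 10-digit string as 123-456-7890.
    // Assumption: anything that is not exactly ten digits is returned unchanged.
    static String format(String digits) {
        if (digits == null || !digits.matches("\\d{10}")) {
            return digits;
        }
        return digits.substring(0, 3) + "-"
                + digits.substring(3, 6) + "-"
                + digits.substring(6);
    }

    public static void main(String[] args) {
        System.out.println(format("1234567890"));
    }
}
```

Checking the input first avoids inserting hyphens into numbers that are too short, too long, or already formatted.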

Related

Best way to clean csv that uses comma delimiter & double quote (") text qualifier using excel. Data errors with extra" makes data go to wrong fields

I have been having issues where my data ends up in the wrong fields. I have a few large CSV files that I have to update manually before loading them into QLIK. The CSVs use a comma (,) delimiter and a double quote (") text qualifier. Some data has extra quote characters that throw the parsing off, which results in numeric values in text fields and vice versa. Can someone please advise the best/fastest way to combat this? I want to remove the unwanted " characters and avoid manually deleting quotes and pasting values into the correct fields for hundreds of records. I have created dummy data below.
Please note that I am a bit limited in the tools I have available to clean the CSV. Alternatively, could you advise which tools/applications are best suited for this? I am just unsure where to start.
IN NOTEPAD:
ID,T_No,T_Type,T_Date,T_Name,T_TNo,
2,256,House,30/05/2021,Airport,75.1,
3,268,Hotel,31/05/2021,Hotel Antel""",76.1
4,269,House,31/05/2021,Bank of USA,"LA Branch""""",77.1
IN EXCEL:
[enter image description here][1]
Any assistance is greatly appreciated.
Thank you
[1]: https://i.stack.imgur.com/vyYAT.png
If the issue is just with the T_Name column, you could use CsvHelper with the mode set to CsvMode.NoEscape, use a ClassMap to read the fields you know you can get without issue, and then use some logic to figure out where the T_Name column ends and the T_TNo column starts. There is a lot that could break in this code, depending on what the rest of the data looks like, but it should at least give you some ideas.
void Main()
{
var text = new StringBuilder();
text.AppendLine("ID,T_No,T_Type,T_Date,T_Name,T_TNo,");
text.AppendLine("2,256,House,30/05/2021,Airport,75.1,");
text.AppendLine("3,268,Hotel,31/05/2021,Hotel Antel\"\"\",76.1");
text.AppendLine("4,269,House,31/05/2021,Bank of USA,\"LA Branch\"\"\"\"\",77.1");
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
Mode = CsvMode.NoEscape
};
using (var reader = new StringReader(text.ToString()))
using (var csv = new CsvReader(reader, config))
{
var options = new TypeConverterOptions { Formats = new[] { "dd/MM/yyyy" } };
csv.Context.TypeConverterOptionsCache.AddOptions<DateTime>(options);
csv.Context.RegisterClassMap<MyClassMap>();
var records = new List<MyClass>();
csv.Read();
csv.ReadHeader();
while (csv.Read())
{
var record = csv.GetRecord<MyClass>();
var name = string.Empty;
int i = 4;
var finished = false;
while (!finished)
{
var field = csv.GetField(i);
if (i == 4)
{
record.Name = field.Replace("\"", "");
i++;
continue;
}
var isNumber = float.TryParse(field, out var number);
if (!isNumber)
{
record.Name += ", " + field.Replace("\"", "");
i++;
continue;
}
record.TNumber = number;
finished = true;
}
records.Add(record);
}
records.Dump();
}
}
public class MyClassMap : ClassMap<MyClass>
{
public MyClassMap()
{
Map(x => x.Id).Name("ID");
Map(x => x.Number).Name("T_No");
Map(x => x.Type).Name("T_Type");
Map(x => x.Date).Name("T_Date");
}
}
public class MyClass
{
public int Id { get; set; }
public int Number { get; set; }
public string Type { get; set; }
public DateTime Date { get; set; }
public string Name { get; set; }
public float TNumber { get; set; }
}
If you have access to C# (there is a free version), you could process the file and fix the bad records. I would do that by first checking whether a line has an issue and, if it does, working out where the name field starts and ends and then fixing the quotes.
This would be a good starting point:
private void UpdateCsv()
{
var lines = System.IO.File.ReadAllLines("your file");
var updatedLines = new List<string>();
foreach (var line in lines)
{
//fixes issue with your first example
var newLine = line.TrimEnd(',');
var fixedString = "";
if (newLine.Split(',').Length == 6 && !newLine.Contains('"')) //six fields and no stray quotes indicates there are no issues
{
fixedString = newLine;
}
else
{
//get the start of the name field
var startName = IndexOfOccurence(newLine, ",", 4) + 1;
//get the end of the name field
var endName = newLine.LastIndexOf(',') + 1;
//populate a new string to hold the fixed data
fixedString = newLine.Substring(0, startName);
//parse the name field based on start and end
var name = newLine.Substring(startName, endName - startName - 1);
//get rid of starting and ending quotes
name = name.TrimStart('"').TrimEnd('"');
//what to do with quote in middle of string? escape or remove your choice uncomment your choice
//name = name.Replace('"', ' '); //to remove
//name = name.Replace("\"", "\"\""); //to escape
//if name contains comma or quote then add quote, else not needed
if (name.Contains(',') || name.Contains('"'))
{
fixedString += "\"" + name + "\"" + newLine.Substring(endName - 1);
}
else
{
fixedString += name + newLine.Substring(endName - 1);
}
}
updatedLines.Add(fixedString);
}
//write out the updated data
System.IO.File.WriteAllLines("your file", updatedLines);
}
private int IndexOfOccurence(string s, string match, int occurence)
{
int i = 1;
int index = 0;
while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
{
if (i == occurence)
return index;
i++;
}
return -1;
}
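For readers without a C# toolchain, the same repair idea can be sketched in Java. This is an illustration under stated assumptions, not a general CSV fixer: it assumes the first four fields never contain commas or quotes, that the last field is the numeric T_TNo value, and that stray quotes inside the name can simply be dropped. The class name CsvLineRepair is hypothetical.

```java
import java.util.Arrays;

class CsvLineRepair {
    // Rebuilds one line so that everything between the fourth field and the
    // last field is treated as the name. Assumptions: fields 1-4 are clean,
    // the final field is T_TNo, and quotes inside the name are not wanted.
    static String repair(String line) {
        String trimmed = line.replaceAll(",+$", "");   // drop trailing commas
        String[] parts = trimmed.split(",", -1);
        if (parts.length < 6) {
            return trimmed;                            // too short to repair safely
        }
        // Join the middle pieces back together; they all belong to the name.
        String name = String.join(",", Arrays.copyOfRange(parts, 4, parts.length - 1));
        name = name.replace("\"", "");                 // remove every stray quote
        if (name.contains(",")) {
            name = "\"" + name + "\"";                 // qualify names that hold a comma
        }
        return String.join(",", parts[0], parts[1], parts[2], parts[3],
                name, parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        System.out.println(repair("3,268,Hotel,31/05/2021,Hotel Antel\"\"\",76.1"));
        System.out.println(repair("4,269,House,31/05/2021,Bank of USA,\"LA Branch\"\"\"\"\",77.1"));
    }
}
```

Removing the quotes outright is the simplest choice; if a name can legitimately contain a quote, it would need to be escaped as "" instead.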

JSONObject and Streams/Lambda

I'm trying to get more familiar with Java lambda, can do some streams and such but still a lot to learn.
Got this simple code using JSONObject and JSONArray (org.json.simple with this exact library and no other because Gson is too easy :P) is there a way to simplify the code with java lambda/streams? (I tried with no luck)
JSONArray jsonArray = (JSONArray) jsonObject.get("someData");
Iterator<JSONObject> iterator = jsonArray.iterator();
double total = 0;
while(iterator.hasNext()) {
JSONObject iteratedJson = iterator.next();
// iteratedJson.get("ip") = "101.99.99.101" example values
String ip = (String) iteratedJson.get("ip");
// Need only first octet
ip = ip.substring(0, ip.indexOf("."));
if (Integer.valueOf(ip) >= 1 && Integer.valueOf(ip) <= 100) {
// Another object inside the array object
JSONObject locationObject = (JSONObject) iteratedJson.get("location");
// Id is int but JSONObject don't let me parse int...
long locationId = (Long) locationObject.get("id");
if (locationId == 8) {
// iteratedJson.get("amount") = "$1,999.10" example values
Number number = NumberFormat.getCurrencyInstance(Locale.US).parse((String)iteratedJson.get("amount"));
// Don't need a lot of precision
total = total + number.doubleValue();
}
}
}
You can do it like this:
First, to extract the data from the JSONObject, I've created a class. It takes a JSONObject as a constructor argument and extracts its values as shown below.
class ExtractData {
Integer ip;
long id;
double amount;
public ExtractData(JSONObject jsonObject) {
this.ip = Integer.valueOf(jsonObject.get("ip").toString().split("\\.")[0]);
this.id = Long.parseLong(((JSONObject) jsonObject.get("location")).get("id").toString());
try {
this.amount = NumberFormat.getCurrencyInstance(Locale.US)
.parse((String) jsonObject.get("amount")).doubleValue();
} catch (ParseException e) {
this.amount = 0d;
}
}
// getter&setter
}
Then you can use the Stream API to calculate the sum of the amount property.
double total = jsonArray.stream()
.map(obj -> new ExtractData((JSONObject) obj))
.filter(predicate)
.mapToDouble(value -> ((ExtractData) value).getAmount())
.sum();
To keep the pipeline simple, I've extracted the filter into a predicate.
Predicate<ExtractData> predicate = extractData ->
extractData.getIp()>=1 && extractData.getIp()<=100 && extractData.getId() == 8;
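The same filtering and summing can also be done without a helper class, by putting each step in a small method and chaining them in the stream. The sketch below is one possible shape, not the answer's code: json-simple's JSONObject is a java.util.Map and JSONArray is a java.util.List, so plain maps and lists stand in for them here to keep the example runnable without the library. The class name AmountTotal and its method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AmountTotal {
    // First octet of the "ip" value, e.g. 101 for "101.99.99.101".
    static int firstOctet(Map<String, Object> entry) {
        String ip = (String) entry.get("ip");
        return Integer.parseInt(ip.substring(0, ip.indexOf('.')));
    }

    // The nested location id; going through Number avoids caring about Long vs Integer.
    static long locationId(Map<String, Object> entry) {
        Map<?, ?> location = (Map<?, ?>) entry.get("location");
        return ((Number) location.get("id")).longValue();
    }

    // Parses values such as "$1,999.10"; unparsable amounts count as 0,
    // the same fallback the ExtractData constructor uses.
    static double amount(Map<String, Object> entry) {
        try {
            return NumberFormat.getCurrencyInstance(Locale.US)
                    .parse((String) entry.get("amount")).doubleValue();
        } catch (ParseException e) {
            return 0d;
        }
    }

    static double total(List<Map<String, Object>> entries) {
        return entries.stream()
                .filter(e -> {
                    int octet = firstOctet(e);
                    return octet >= 1 && octet <= 100;
                })
                .filter(e -> locationId(e) == 8)
                .mapToDouble(AmountTotal::amount)
                .sum();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> data = List.of(
                Map.of("ip", "99.1.2.3", "location", Map.of("id", 8L), "amount", "$1,999.10"),
                Map.of("ip", "101.99.99.101", "location", Map.of("id", 8L), "amount", "$5.00"));
        System.out.println(total(data));
    }
}
```

With the real library, each element of the JSONArray would need a cast to JSONObject before these methods are applied, as in the answer above.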

Monospaced html formatting jlabel

I am creating an output window for a program dealing with matrices. It is supposed to print out the performed command along with a formatted version of the matrix, but I am having problems with the alignment. I know the String.format call works because I have a toString() method that works correctly.
Notice how the second and third rows are not correctly spaced. This is because the 100.00 has completely filled the formatted string, whereas the 0.00s need extra spaces to fill the string (see toHtml()). I believe this has something to do with the way the HTML is being displayed, but I'm not sure. My guess is that the spaces behind the zeros are not being displayed properly or are being combined.
Here are the methods involved.
public String toHtml(int dec)
{
String[] colors = {"#C0C0C0","#FFFFFF"};
String f = "%-"+(getLongestValue()+dec+1)+"."+dec+"f";
String res = "";
for(int r = 0;r<rows;r++)
{
for(int c = 0;c<columns;c++)
{
res += "<span style=\"background-color:"+colors[(r+c)%2]+";\">"+String.format(f, contents[r][c])+"</span>";
}
res += "<p>";
}
return res;
}
which creates the HTML text to be displayed. The method getLongestValue() returns the largest length of any number before its decimal place in the array 'contents'.
and
newOutput("New Matrix ["+name+"]<br>"+m.toHtml());
public void newOutput(String s)
{
JLabel l = new JLabel("<html>"+s+"<br></html>");
l.setFont(new Font("Monospaced",1,18));
jPanel1.add(l);
}
which adds the label to the output window
Also, here is the toString() method for reference
public String toString()
{
String f = "%-"+(getLongestValue()+3)+".2f ";
String res = "";
for(int r = 0;r<rows;r++)
{
for(int c = 0;c<columns;c++)
{
res += String.format(f, contents[r][c]);
}
res += "\n";
}
return res;
}
output of the Matrix through toString()
toString Output
A more extreme version
In this case the program should have found that the longest values were -15 or -20 and set the format width to 6 (3 for the length, 2 for the decimal places and 1 for the decimal point), but instead it doesn't appear that any of the values, besides the two I mentioned, are following the format.
Here is the output of toString() for the previous example
toString() output
This fixes it. The regular spaces aren't being rendered as monospaced characters in the HTML label, so each one is replaced with a non-breaking space (character 160):
public String toHtml(int dec)
{
String[] colors = {"#C0C0C0","#FFFFFF"};
String f = "%-"+(getLongestValue()+dec+2)+"."+dec+"f";
String res = "";
for(int r = 0;r<rows;r++)
{
for(int c = 0;c<columns;c++)
{
res += "<span style=\"background-color:"+colors[(r+c)%2]+";\">"+
String.format(f, contents[r][c]).replaceAll("\\s", ((char)160)+"")+"</span>";
}
res += "<p>";
}
return res+"";
}
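The key step in that fix, padding with String.format and then swapping ordinary spaces for non-breaking ones, can be isolated in a small helper. This is a minimal sketch; the class name NbspPadding and the method cell are hypothetical, and Locale.ROOT is used only so that the decimal separator is always a dot. HTML rendering generally collapses runs of ordinary spaces, which is why the replacement matters.

```java
import java.util.Locale;

class NbspPadding {
    // Left-justifies the value in a field of the given width, then replaces
    // each regular space with U+00A0 so the padding survives HTML rendering.
    static String cell(double value, int width, int decimals) {
        String format = "%-" + width + "." + decimals + "f";
        return String.format(Locale.ROOT, format, value).replace(' ', '\u00A0');
    }

    public static void main(String[] args) {
        // Both cells have the same length, so the columns line up in a monospaced font.
        System.out.println(cell(100.0, 7, 2).length());
        System.out.println(cell(0.0, 7, 2).length());
    }
}
```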

Lazy loading with JPA Criteria

I implemented a generic solution for lazy-loading PrimeFaces datatables using the JPA Criteria API.
However, I still have some doubts about this solution when several joins are involved (for example, an entity User that has relations with other entities such as Account, Address and Department, in addition to simple properties such as String username, Date birthdate, etc.).
I tested this solution, but I am seeing delays when loading a large amount of data (even though the solution is supposed to load only the limited number of rows specified by the PageSize coming from the datatable), so:
How can I improve the performance of this solution?
How can I be sure that the number of rows loaded is the one specified by PageSize?
Can you check the count() method and tell me whether it counts the result rows without loading all the data?
And most importantly, how can I use this solution generically with filters coming from search forms (that is, how can I use this same generic method and pass search criteria from a search form with multiple search fields)?
I would appreciate answers to the questions above, especially the last one.
Here is the code:
public <T extends Object> List<T> search(Class<T> type, int first, int pageSize, String sortField, SortOrder sortOrder, Map<String, String> filters){
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<T> q = cb.createQuery(type);
Root<T> root=q.from(type);
q.select(root);
//Sorting
if (sortField != null && !sortField.isEmpty()) {
String[] sortingField = sortField.split("\\.", 2);
Path path = sortingField.length == 1 ? root.get(sortingField[0]): root.join(sortingField[0]).get(sortingField[1]);
if (sortOrder.equals(SortOrder.ASCENDING)) {
q.orderBy(cb.asc(path));
} else if (sortOrder.equals(SortOrder.DESCENDING)) {
q.orderBy(cb.desc(path));
}
}
// Filtering
Predicate filterCondition = cb.conjunction();
String wildCard = "%";
for (Map.Entry<String, String> filter : filters.entrySet()) {
String[] filterField = filter.getKey().split("\\.", 2);
Path path = filterField.length == 1 ? root.get(filterField[0]): root.join(filterField[0]).get(filterField[1]);
filterCondition = cb.and(filterCondition, filter.getValue().matches("[0-9]+")
? cb.equal(path, Long.valueOf(filter.getValue()))
: cb.like(path, wildCard + filter.getValue() + wildCard));
}
q.where(filterCondition);
//Pagination
TypedQuery<T> s = entityManager.createQuery(q);
if (pageSize >= 0){
s.setMaxResults(pageSize);
}
if (first >= 0){
s.setFirstResult(first);
}
log.info("\n\n\n");
log.info("XXXXXXXXXxX");
log.info("=> CommonRepository - Total number of rows returned: ");
log.info("XXXXXXXXXXX");
log.info("\n\n\n");
return s.getResultList();
}
public <T extends Object> int count(Class<T> type, Map<String, String> filters){
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Long> cq = cb.createQuery(Long.class);
Root<T> root=cq.from(type);
// Filtering
Predicate filterCondition = cb.conjunction();
String wildCard = "%";
for (Map.Entry<String, String> filter : filters.entrySet()) {
String[] filterField = filter.getKey().split("\\.", 2);
Path path = filterField.length == 1 ? root.get(filterField[0]): root.join(filterField[0]).get(filterField[1]);
filterCondition = cb.and(filterCondition, filter.getValue().matches("[0-9]+")
? cb.equal(path, Long.valueOf(filter.getValue()))
: cb.like(path, wildCard + filter.getValue() + wildCard));
}
cq.where(filterCondition);
cq.select(cb.count(root));
return entityManager.createQuery(cq).getSingleResult().intValue();
}

xml to csv using hadoop

I am trying to convert my XML file to CSV using Hadoop, so I am using the following code in the mapper class:
protected void map(LongWritable key, Text value,
@SuppressWarnings("rawtypes") Mapper.Context context)
throws IOException, InterruptedException {
String document = value.toString();
System.out.println("‘" + document + "‘");
try {
XMLStreamReader reader =
XMLInputFactory.newInstance().createXMLStreamReader(new
ByteArrayInputStream(document.getBytes()));
String propertyName = "";
String propertyValue = "";
String currentElement = "";
while (reader.hasNext()) {
int code = reader.next();
switch (code) {
case XMLStreamConstants.START_ELEMENT: //START_ELEMENT:
currentElement = reader.getLocalName();
break;
case XMLStreamConstants.CHARACTERS: //CHARACTERS:
if (currentElement.equalsIgnoreCase("author")) {
propertyName += reader.getText();
} else if (currentElement.equalsIgnoreCase("price"))
{
String name=reader.getText();
name.trim();
propertyName += name;
propertyName.trim();
}
}
console.write(null,new Text(propertyName));
}
}
But the output I am getting is in this form:
Gambardella, Matthew
XML Developer's Guide
44.95
2000-10-01
Ralls, Kim
Midnight Rain
5.95
2000-12-16
Can you help me with this?
The output of the program depends on how you are collecting/writing from the mapper.
In this case you should be using TextOutputFormat, with NullWritable as KeyOut and Text as ValueOut. The output value should be a concatenation of the values you extracted from the XML.
From your code, it looks like you are writing output after reading each value from the XML, so each value ends up on its own line.
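The concatenation step can be sketched outside Hadoop as a plain method that turns one XML record into one CSV line. This is an illustration under assumptions, not the asker's mapper: it assumes each input string holds exactly one record whose child elements contain only text, it skips empty elements, and the class name XmlRecordToCsv is hypothetical. Inside the mapper, the returned line would be written once per record, for example with context.write(NullWritable.get(), new Text(line)), placed after the parsing loop rather than inside it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class XmlRecordToCsv {
    // Wraps a value in quotes when it contains a comma or a quote.
    static String quote(String value) {
        if (value.contains(",") || value.contains("\"")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Collects the text of each element in the record and joins the values
    // with commas. Assumption: empty elements are skipped, not kept as blanks.
    static String toCsvLine(String recordXml) {
        List<String> fields = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(recordXml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0);            // start collecting a new value
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String value = text.toString().trim();
                        if (!value.isEmpty()) {
                            fields.add(quote(value));
                        }
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML record", e);
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(
                "<book><author>Ralls, Kim</author><title>Midnight Rain</title>"
                + "<price>5.95</price></book>"));
    }
}
```

Quoting matters here because author names such as "Gambardella, Matthew" contain a comma and would otherwise split into two CSV columns.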