How can I continuously read a CSV file in Flink and remove the header

I am working with the Flink streaming API and I want to continuously read CSV files from a folder, ignore the header, and convert each row of the CSV file into a Java object (POJO). After all this processing, I should obtain a stream of Java objects (POJOs).
So far, I have done the following to partially achieve this behavior (code below):
read the CSV files as regular text files, continuously
get a stream of strings from the CSV files
convert the stream of strings to a stream of Java objects
String path = "/home/cosmin/Projects/flink_projects/flink-java-project/data/";
TextInputFormat format = new TextInputFormat(new org.apache.flink.core.fs.Path(path));
DataStream<String> inputStream = streamEnv.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
DataStream<MyEvent> parsedStream = inputStream
        .map((line) -> {
            String[] cells = line.split(",");
            MyEvent event = new MyEvent(cells[1], cells[2], cells[3]);
            return event;
        });
However, with this I don't manage to remove the header line in each CSV file.
I have read that I can build a custom connector for reading CSV files by using the createInput() or addSource() methods on the StreamExecutionEnvironment class.
Can you help with some guidance on how to achieve this, as I haven't found any examples beyond the Javadoc?

You could chain a filter function before your map function to filter out the header lines:
inputStream.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String line) {
        // keep only lines that do not look like a header
        return !line.contains("some header identifier");
    }
}).map(...); // your map function as before
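Putting the pieces together, the full pipeline could look like the sketch below; the filter condition and the MyEvent constructor are copied from the question, and the exact header string is an assumption you would replace with your real header:
String path = "/home/cosmin/Projects/flink_projects/flink-java-project/data/";
TextInputFormat format = new TextInputFormat(new org.apache.flink.core.fs.Path(path));
DataStream<MyEvent> parsedStream = streamEnv
        .readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 100)
        // drop any line that looks like a header (assumed identifier)
        .filter(line -> !line.contains("some header identifier"))
        .map(line -> {
            String[] cells = line.split(",");
            return new MyEvent(cells[1], cells[2], cells[3]);
        });
Note that when you use Java lambdas, Flink's type extraction can lose the target type, so you may need to append .returns(MyEvent.class) after the map call.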

Related

Why does one form of file iteration work but the other throws a % exception? (working with JSON parse in Google Apps Script)

I was trying to use the method found here (see the most up-voted answer):
Google Apps Script Fastest way to find a row?
I currently use the code below and, while it does work, I wanted to try the method linked above. Yet when I replace the code below
function AutoPopulate (evalue)
{
  // uses a Google Drive file iterator to read in the JSON file and parse it to a JavaScript object that we can work with
  var iter = DriveApp.getFilesByName("units.json");
  // iterate through all the files named units.json
  while (iter.hasNext()) {
    // define a File object variable and get the contents as a string
    var file = iter.next();
    var jsonFile = file.getBlob().getDataAsString();
    // log the contents of the file
    //Logger.log(jsonFile);
  }
  var UnitDatabase = JSON.parse(jsonFile);
  //Logger.log(UnitDatabase);
  //Logger.log(UnitDatabase[1027]);
  return UnitDatabase[evalue];
}
WITH THIS CODE:
function AutoPopulate (evalue)
{
  // this method did not work for me, but should have according to the Stack Overflow answer linked above; I am trying to understand why, or how I can find out why it threw an error
  var jsonFile = DriveApp.getFilesByName("units.json").next(),
      UnitDatabase = UnitDatabase.getBlob().getDataAsString();
  return UnitDatabase[evalue];
}
I get an error in the execution indicating that there is a % at position 0 in the JSON. Between the methods I don't alter the JSON file in any way, so I don't understand why the top method works but the bottom one does not.
For further information: the idea behind the code is that I have a list of unit numbers and model numbers in a spreadsheet. I then convert this to a JSON file; however, this is only done when a new unit is added to the fleet. As I learned, one can parse a whole JSON file into a JavaScript object, which makes working with the data set much faster. This JavaScript object is used so that when a user enters a UNIT#, the MODEL# is auto-populated based on the JSON file.
I cannot share the JSON file as it contains client information.
Your code does not work for two reasons:
You have a typo in the line UnitDatabase = UnitDatabase.getBlob()... - it should be UnitDatabase = jsonFile.getBlob()...
If you want to retrieve a nested object from a JSON file, you need to parse the JSON - otherwise it is treated as a string and you cannot access the nested structure.
Modified working code:
function AutoPopulate2 (evalue)
{
  var jsonFile = DriveApp.getFilesByName("units.json").next();
  var UnitDatabase = JSON.parse(jsonFile.getBlob().getDataAsString());
  return UnitDatabase[evalue];
}
Mind that this code will only work if you have a "units.json" file on your Drive and if evalue is a valid first-level key of this JSON.

Dart: How to Access File Map Data

So I have successfully stored a map of keys and values in a JSON file. I am now trying to read the JSON file in order to convert it into objects, but am having trouble getting those keys and values back.
Json File:
{"frc1":"1","frc4":"4","frc5":"5","frc6":"6","frc7":"7","frc8":"8","frc9":"9","frc10":"10","frc11":"11","frc13":"13","frc14":"14","frc15":"15","frc16":"16","frc17":"17","frc18":"18","frc19":"19","frc20":"20","frc21":"21","frc22":"22","frc23":"23","frc24":"24","frc25":"25"}
I understand that I need to access the file as below, but don't know where to go from there.
Future<List<Team>> readTeams(String query) async {
  try {
    final file = await _localFile;
    ....
}

Parsing CSV data format in Apache Camel

I followed an example from the book Camel in Action on how to marshal and unmarshal the CSV data format. However, I want to unmarshal a CSV file (comma-separated delimiter) and split the body. Then I will use content-based .choice() to distribute messages according to the required tasks.
In fact, even the first, simple example didn't work for me. I am using Camel 2.15.6 (camel-core, camel-context, camel-csv, commons-csv) and Java 7.
public void configure()
{
    CsvDataFormat csv = new CsvDataFormat();
    csv.setDelimiter(",");
    from("file:test?noop=true")
        .unmarshal().csv()
        .split(body())
        .to("file:out");
}
Please find below the stack trace.
Can you try removing noop=true? If noop is true, the file is not moved or deleted in any way. This option is good for read-only data, or for ETL-type requirements.
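For example, without noop=true the file component moves consumed files into a .camel subfolder by default; you can also point it somewhere explicitly with the move option (a sketch based on the question's route):
from("file:test?move=done")
    .unmarshal().csv()
    .split(body())
    .to("file:out");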
Pass csv as a parameter like this:
public void configure() throws Exception
{
    CsvDataFormat csv = new CsvDataFormat();
    csv.setDelimiter(",");
    from("file:test?noop=true")
        .unmarshal(csv)
        .split(body())
        .to("file:out");
}
Or it may help you to set up content-based routing; here I filter according to the header of the CSV:
//Route 1: filter CSV based on header
from("file:/home/r2/Desktop/csvFile?noop=true")
    .choice().when(body().contains("partyName"))
        .to("direct:partyNameCSV")
    .when(body().contains("\"stuffName\""))
        .to("direct:stuffNameCSV")
    .otherwise().endChoice();
//Route 2: partyNameCSV
from("direct:partyNameCSV")
    .unmarshal(csv)
    .process(new PartyNameCSVProcessor())
    .end();
//Route 3: stuffNameCSV
from("direct:stuffNameCSV")
    .unmarshal(csv)
    .process(new StuffCSVProcessor())
    .end();

Append data to existing file in Windows Store 8 using JSON

I have created an application in which I insert data into a file. It is working fine. Following is my code:
private async void btnSearch_Click(object sender, RoutedEventArgs e)
{
    UserDetails details = new UserDetails
    {
        Name = TxtName.Text,
        Course = TxtCouse.Text,
        City = TxtCity.Text
    };
    string jsonContents = JsonConvert.SerializeObject(details);
    StorageFolder localFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync("Storage", CreationCollisionOption.ReplaceExisting);
    StorageFile textFile = await localFolder.CreateFileAsync("UserDetails.txt", CreationCollisionOption.ReplaceExisting);
    using (IRandomAccessStream textStream = await textFile.OpenAsync(FileAccessMode.ReadWrite))
    {
        // write the JSON string!
        using (DataWriter textWriter = new DataWriter(textStream))
        {
            textWriter.WriteString(jsonContents);
            await textWriter.StoreAsync();
        }
    }
    this.Frame.Navigate(typeof(BlankPage1));
}
Now I want that when a user enters new data, the data is appended to the same existing file.
Appending data to a JSON text file would mean doing some parsing of the file to find the correct location to insert the text. That is, because JSON is structured with {} delimiters, it's not a simple matter of just appending text to the end of the file.
Given that your data doesn't look that large, the easiest thing to do is to deserialize the JSON from the existing file into memory, add your additional properties to that data structure, and then serialize back to JSON. In that case you probably just want to maintain the structure in memory during the app session, and just overwrite the file with new data whenever you need to. But of course you could also reopen the file, read/parse the JSON into memory, and then rewrite the contents.
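As an outline of that read-modify-rewrite pattern (sketched here in plain Java with Gson rather than the WinRT APIs, so the file path, list type, and newDetails variable are all assumptions):
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.nio.file.*;
import java.util.*;

Gson gson = new Gson();
Path file = Paths.get("UserDetails.txt");
// read and parse the existing records, or start fresh if the file is missing
List<UserDetails> records = Files.exists(file)
        ? gson.fromJson(Files.readString(file),
                        new TypeToken<List<UserDetails>>() {}.getType())
        : new ArrayList<>();
// append the new entry, then overwrite the file with the full serialized list
records.add(newDetails);
Files.writeString(file, gson.toJson(records));
In the question's C# code the same idea amounts to serializing a List<UserDetails> instead of a single object and overwriting the file each time, which CreationCollisionOption.ReplaceExisting already does.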

How to insert Excel data in a database with Java

I want to insert data from an Excel file into a local database on a UNIX server with Java, without any manipulation of the data.
1- Someone told me that I have to convert the Excel file into .csv to work with UNIX. I created a CSV file for each sheet (I have 12) with a macro. The problem is that it changed the date format from DD-MM-YYYY to MM-DD-YYYY. How do I avoid this?
2- I used the LOAD DATA command to insert data from the CSV files into my database. There is a date column that is optionally specified in the Excel file, so in the CSV it becomes ,, and the LOAD DATA fails (an argument is needed). How can I fix this?
Thanks for your help.
It should be quite easy to read out the values from Excel with Apache POI. Then you save yourself the extra step of converting to another format, and the possible problems when your data contains commas and you convert to CSV.
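For illustration, reading the cells with POI might look like the sketch below (assuming a reasonably recent POI with poi-ooxml on the classpath; the file name and sheet layout are assumptions). DataFormatter returns each cell as it is displayed in Excel, which also sidesteps the DD-MM-YYYY reformatting problem from the question:
import java.io.File;
import org.apache.poi.ss.usermodel.*;

try (Workbook workbook = WorkbookFactory.create(new File("units.xlsx"))) {
    DataFormatter formatter = new DataFormatter();
    Sheet sheet = workbook.getSheetAt(0);
    for (Row row : sheet) {
        for (Cell cell : row) {
            // formatCellValue returns the cell as displayed in Excel,
            // so a DD-MM-YYYY date stays DD-MM-YYYY
            String value = formatter.formatCellValue(cell);
            // hand value to your database layer here
        }
    }
}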
Save the Excel file in CSV (comma-separated values) format. It will make it easy to read and parse with fairly simple use of StringTokenizer.
Use MySQL (or SQLite depending on your needs) and JDBC to load data into the database.
Here is a CSVEnumeration class I developed:
package com.aepryus.util;
import java.util.*;

public class CSVEnumeration implements Enumeration {
    private List<String> tokens = new Vector<String>();
    private int index = 0;

    public CSVEnumeration (String line) {
        for (int i = 0; i < line.length(); i++) {
            StringBuffer sb = new StringBuffer();
            if (line.charAt(i) != '"') {
                // unquoted field: read up to the next comma
                while (i < line.length() && line.charAt(i) != ',') {
                    sb.append(line.charAt(i));
                    i++;
                }
                tokens.add(sb.toString());
            } else {
                // quoted field: skip the opening quote, read up to the closing quote
                i++;
                while (line.charAt(i) != '"') {
                    sb.append(line.charAt(i));
                    i++;
                }
                i++; // skip the closing quote; the for-loop increment skips the comma
                tokens.add(sb.toString());
            }
        }
    }

    // Enumeration =================================================================
    public boolean hasMoreElements () {
        return index < tokens.size();
    }
    public Object nextElement () {
        return tokens.get(index++);
    }
}
If you break the lines of the CSV file up using split and then feed them one by one into the CSVEnumeration class, you can then step through the fields. Or here is some code I have lying around that uses StringTokenizer to parse the lines. csv is a string that contains the entire contents of the file.
StringTokenizer lines = new StringTokenizer(csv, "\n\r");
lines.nextToken(); // skip the header line
while (lines.hasMoreElements()) {
    String line = lines.nextToken();
    Enumeration e = new CSVEnumeration(line);
    for (int i = 0; e.hasMoreElements(); i++) {
        String token = (String)e.nextElement();
        switch (i) {
            case 0: /* do stuff */ break;
        }
    }
}
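From there, getting the parsed fields into the database is a plain JDBC insert, sketched below; the JDBC URL, table, and column names are assumptions, and unitNo/modelNo stand for values collected in the switch above:
import java.sql.*;

try (Connection conn = DriverManager.getConnection(
         "jdbc:mysql://localhost/mydb", "user", "password");
     PreparedStatement ps = conn.prepareStatement(
         "INSERT INTO mytable (unit_no, model_no) VALUES (?, ?)")) {
    // one parameter per CSV field; execute once per parsed line
    ps.setString(1, unitNo);
    ps.setString(2, modelNo);
    ps.executeUpdate();
}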
I suggest MySQL for its performance, and it is obviously open source.
There are two situations:
If you just want to store the Excel cell values in the database, you can convert the Excel file to CSV format so that you can simply use the LOAD DATA command in MySQL (see the sketch after this list).
If you have to do some manipulation before the values get into the tables, I suggest Apache POI. I've used it and it works fine; whatever your format of Excel, you just have to use the correct implementation.
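A sketch of that LOAD DATA route, issued from Java over JDBC (file, table, and column names are assumptions; the SET clause shows one way to handle the empty date fields and the date format from the question):
import java.sql.*;

try (Connection conn = DriverManager.getConnection(
         // allowLoadLocalInfile is required by Connector/J for LOAD DATA LOCAL
         "jdbc:mysql://localhost/mydb?allowLoadLocalInfile=true", "user", "password");
     Statement stmt = conn.createStatement()) {
    stmt.execute(
        "LOAD DATA LOCAL INFILE 'sheet1.csv' INTO TABLE mytable " +
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' IGNORE 1 LINES " +
        "(unit_no, model_no, @date_text) " +
        // empty fields load as NULL; adjust the pattern to the CSV's actual date layout
        "SET date_col = IF(@date_text = '', NULL, STR_TO_DATE(@date_text, '%d-%m-%Y'))");
}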
We are using SQLite in our Java application. It's serverless, really simple to use, and very efficient.
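For example, with the xerial sqlite-jdbc driver on the classpath, opening (and lazily creating) a database is one call; the file and table names here are assumptions:
import java.sql.*;

try (Connection conn = DriverManager.getConnection("jdbc:sqlite:mydb.db");
     Statement stmt = conn.createStatement()) {
    // the database file is created on first use; no server process is needed
    stmt.execute("CREATE TABLE IF NOT EXISTS units (unit_no TEXT, model_no TEXT)");
}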