Weka - issue with line X ... converting csv to ARFF

I am currently trying to convert a csv file of information to an ARFF file in Weka.
Weka reports that there is a problem with line 3384, but I cannot see anything wrong with that line.
[Image of the Excel file]
Can someone please help?
Thanks.

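This problem often comes up when the file to be converted contains characters the CSV loader cannot handle, for example an unmatched quote or apostrophe inside a field. Double check line 3384 and the lines just before it for such characters. You can also do the conversion from CSV to ARFF in Java with the code below.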
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import java.io.File;

public class CsvtoArff {
    public static void main(String[] args) throws Exception {
        // Input CSV and output ARFF paths; adjust these to your own files
        String args0 = "/Users/Kehinde/Documents/trainingtest.csv";
        String args1 = "/Users/Kehinde/Documents/theoutput.arff";
        // This is used to load CSV
        CSVLoader myloader = new CSVLoader();
        myloader.setSource(new File(args0));
        Instances mydata = myloader.getDataSet();
        System.out.println(mydata);
        // This is used to save ARFF
        ArffSaver mysaver = new ArffSaver();
        mysaver.setInstances(mydata);
        mysaver.setFile(new File(args1));
        mysaver.setDestination(new File(args1));
        mysaver.writeBatch();
    }
}
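To narrow down which lines might be upsetting the loader, a small stand-alone check can help. The sketch below is only a rough heuristic and not part of Weka: the class name CsvLineCheck and its methods are made up for illustration. It assumes a comma-separated file with one record per line, and flags lines whose field count differs from the header or that contain an odd number of double quotes or apostrophes.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Weka class): flags CSV lines that look malformed.
public class CsvLineCheck {

    // Counts fields on one line, treating commas inside double quotes as data.
    public static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // True if the line contains an odd number of the given quote character.
    public static boolean hasUnbalanced(String line, char quote) {
        long count = line.chars().filter(ch -> ch == quote).count();
        return count % 2 != 0;
    }

    // Returns the 1-based numbers of lines that differ from the header's
    // field count or contain an unmatched double quote or apostrophe.
    public static List<Integer> suspiciousLines(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        if (lines.isEmpty()) {
            return result;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (countFields(line) != expected
                    || hasUnbalanced(line, '"')
                    || hasUnbalanced(line, '\'')) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Usage: java CsvLineCheck <file.csv>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Suspicious lines: " + suspiciousLines(lines));
    }
}
```

A flagged line is not necessarily wrong (an apostrophe in ordinary text is legitimate data), but it is a reasonable place to start looking.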

Related

USACO Code Submission Problem - Output File Missing

I'm practicing on some past USACO problems, but whenever I submit my code for grading, I receive this error:
Your output file (FILENAME.out):
[File missing!]
I tested every problem using this simple code, but still receive the same error:
import java.util.*;
import java.io.*;

public class Test
{
    public static void main (String [] args) throws IOException
    {
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(FILENAME)));
        out.println("Hello world.");
        out.close();
        System.exit(0);
    }
}
Why would this code not create an output file?

The USACO grading system apparently has the output file already made in the same directory as your Java solution, so all you need to do is write to it.
In your line
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(FILENAME)));
the file name must be the full name of the output file, including the .out extension, passed as a string:
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("FILENAME.out")));
since this is the name of the file the grader checks. On the grading system this writes to the existing file. Note that FileWriter also creates the file if it does not exist yet.
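As a minimal sketch of the corrected program, assuming "FILENAME.out" stands in for whatever output name the problem statement specifies (the class name WriteOutput and the helper writeResult are made up for illustration):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class WriteOutput {

    // Writes one line of text to the named file. try-with-resources
    // flushes and closes the writer even if an exception is thrown.
    public static void writeResult(String fileName, String text) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(fileName)))) {
            out.println(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // "FILENAME.out" is the placeholder from the question; replace it
        // with the exact output file name given in the problem statement.
        writeResult("FILENAME.out", "Hello world.");
    }
}
```

Closing the writer matters here: with a BufferedWriter, output that is never flushed can leave the file empty even though it exists.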

Flink Data Stream CSV Writer not writing data to CSV file

I am new to Apache Flink and trying to learn data streams. I am reading student data with 3 columns (Name, Subject and Marks) from a csv file. I have applied a filter on marks and am selecting only those records where marks >40.
I am trying to write this data to a csv file. The program runs successfully, but the csv file remains empty: no data gets written to it.
I tried different syntax for writing the csv file, but none of it worked for me. I am running this locally through Eclipse. Writing to a text file works fine.
DataStream<String> text = env.readFile(format, params.get("input"),
        FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
DataStream<String> filtered = text.filter(new FilterFunction<String>() {
    public boolean filter(String value) {
        String[] tokens = value.split(",");
        return Integer.parseInt(tokens[2]) >= 40;
    }
});
filtered.writeAsText("testFilter", WriteMode.OVERWRITE);
DataStream<Tuple2<String, Integer>> tokenized = filtered
        .map(new MapFunction<String, Tuple2<String, Integer>>() {
            public Tuple2<String, Integer> map(String value) throws Exception {
                return new Tuple2("Test", Integer.valueOf(1));
            }
        });
tokenized.print();
tokenized.writeAsCsv("file:///home/Test/Desktop/output.csv",
        WriteMode.OVERWRITE, "/n", ",");
try {
    env.execute();
} catch (Exception e1) {
    e1.printStackTrace();
}
}
}
Below is my input CSV format:
Name1,Subj1,30
Name1,Subj2,40
Name1,Subj3,40
Name1,Subj4,40
tokenized.print() prints all the correct records.

I did a little experimenting, and found that this job works just fine:
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WriteCSV {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.fromElements(new Tuple2<>("abc", 1), new Tuple2<>("def", 2))
                .writeAsCsv("file:///tmp/test.csv", FileSystem.WriteMode.OVERWRITE, "\n", ",");
        env.execute();
    }
}
If I don't set the parallelism to 1, then the results are different. In that case, test.csv is a directory containing four files, each written by one of the four parallel subtasks.
I'm not sure what's wrong in your case, but maybe you can work backwards from this example (assuming it works for you).

Try removing the tokenized.print(); call that comes before tokenized.writeAsCsv();.
As far as I can tell, print() consumes the data.

JSON to CSV conversion on HDFS

I am trying to convert a JSON file into CSV.
I have Java code that does this perfectly on the UNIX file system and on the local file system.
I have written the main class below to perform this conversion on HDFS.
public class ClassMain {
    public static void main(String[] args) throws IOException {
        String uri = args[1];
        String uri1 = args[2];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        FSDataOutputStream out = fs.create(new Path(uri1));
        try {
            in = fs.open(new Path(uri));
            JsonToCSV toCSV = new JsonToCSV(uri);
            toCSV.json2Sheet().write2csv(uri1);
            IOUtils.copyBytes(in, out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
json2Sheet and write2csv are the methods which perform the conversion and the write operation.
I am running this jar using the command below:
hadoop jar json-csv-hdfs.jar com.nishant.ClassMain /nishant/large.json /nishant/output
The problem is that it does not write anything to /nishant/output. It creates a zero-sized /nishant/output file.
Maybe using copyBytes is not a good idea here.
How can I achieve this on HDFS, given that it works fine on the UNIX FS and the local FS?
Note that I am trying to convert a JSON file to CSV, not to map JSON objects to their values.

FileSystem needs only one configuration key to connect to HDFS successfully: the URI of the default file system.
conf.set(key, "hdfs://host:port"); // where key is "fs.defaultFS" (current name) or "fs.default.name" (older, deprecated name)

How to read a CSV file from HDFS?

I have my data in a CSV file. I want to read the CSV file, which is in HDFS.
Can anyone help me with the code?
I'm new to Hadoop. Thanks in advance.

The classes required for this are FileSystem, FSDataInputStream and Path. The client should look something like this:
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Load the cluster settings; adjust these paths to your installation
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"));
    System.out.println(inputStream.readChar());
}
FSDataInputStream has several read methods. Choose the one which suits your needs.
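For a CSV file, reading line by line is usually more convenient than the primitive read methods. Since FSDataInputStream is a java.io.InputStream, it can be wrapped in a BufferedReader like any other stream. The sketch below is an illustration under assumptions, not Hadoop API: the class CsvStreamReader and the method readRows are made-up names, the split on "," ignores quoted fields, and a ByteArrayInputStream stands in for the stream returned by fs.open(...) so the example runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads simple comma-separated rows from any InputStream.
public class CsvStreamReader {

    // Reads the stream line by line and splits each non-empty line on commas.
    // The -1 limit keeps trailing empty fields. Quoted fields are not handled.
    public static List<String[]> readRows(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(new Path("/path/to/input/file")).
        InputStream in = new ByteArrayInputStream(
                "Name1,Subj1,30\nName1,Subj2,40\n".getBytes(StandardCharsets.UTF_8));
        for (String[] row : readRows(in)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

With HDFS, the call would become readRows(inputStream) using the stream opened in the client above. For files with quoted or escaped fields, a dedicated CSV parser is the safer choice.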
If it is MR, it's even easier:
public static class YourMapper extends
        Mapper<LongWritable, Text, Your_Wish, Your_Wish> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Framework does the reading for you...
        String line = value.toString(); // line contains one line of your csv file.
        // do your processing here
        ....................
        ....................
        context.write(Your_Wish, Your_Wish);
    }
}
}

If you want to use MapReduce, you can use TextInputFormat to read the file line by line and parse each line in the mapper's map function.
Another option is to develop (or find an existing) CSV input format for reading data from the file.
There is an old tutorial here: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html. The logic is the same in newer versions.
If you are using a single process to read the data, it is the same as reading a file from any other file system. There is a nice example here: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
HTH

JavaFX 2. Loading external CSV into a TableView

I'm pretty new to Java and I'm searching the Internet for a simple way to load an external csv into a JavaFX TableView.
I was able to parse the CSV into an array, but I don't know how to handle it from there. Then I tried the DataFX library, but again I wasn't able to pass the parsed csv into my table.
I think I don't really understand ObservableLists, which I believe are necessary here. Do you know a good tutorial, or could you explain what the next steps would be after parsing the file?
Thanks.
Edit: This is what I did:
import javafx.application.Application;
import javafx.scene.SceneBuilder;
import javafx.scene.control.TableColumn;
import javafx.scene.control.TableView;
import javafx.stage.Stage;
import org.javafxdata.datasources.reader.FileSource;
import org.javafxdata.datasources.provider.CSVDataSource;

public class CSVTableSample extends Application {
    @SuppressWarnings("unchecked")
    @Override
    public void start(Stage stage) throws Exception {
        stage.setTitle("Test App");
        // Just loading the file...
        FileSource fs = new FileSource("test.csv");
        // Now creating my datasource
        CSVDataSource dataSource = new CSVDataSource(
                fs, "order-id", "order-item-id");
        @SuppressWarnings("rawtypes")
        TableView table1 = new TableView();
        TableColumn<?, ?> orderCol = dataSource.getNamedColumn("order-id");
        TableColumn<?, ?> itemCol = dataSource.getNamedColumn("order-item-id");
        table1.getColumns().addAll(orderCol, itemCol);
        table1.setItems(dataSource);
        stage.setScene(SceneBuilder.create().root(table1).build());
        stage.show();
    }

    public static void main(String[] args) {
        Application.launch(args);
    }
}
For the line table1.setItems(dataSource); Eclipse reports:
The method setItems(ObservableList) in the type TableView is not applicable for the arguments (CSVDataSource)

There is a sample solution here for a tab-delimited file.
A csv file could be handled similarly.
The sample works by declaring the type of the TableView as TableView<ObservableList<StringProperty>>, so that each row in the TableView is an ObservableList of string properties, where each property represents a field in the csv file. The TableView's items list is a list of such lists. The cellValueFactory set for each column extracts the correct cell value for that column from the ObservableList<StringProperty> backing that cell's row.

The method setItems(ObservableList) in the type TableView is not applicable for the arguments (CSVDataSource)
To fix this error, change your line
table1.setItems(dataSource);
to
table1.setItems(dataSource.getData());
Example code using DataFX:
DataSourceReader dsr1 = new FileSource("your csv file path");
// Array of the column names you want to display (names taken from the question)
String[] columnsArray = {"order-id", "order-item-id"};
CSVDataSource ds1 = new CSVDataSource(dsr1, columnsArray);
TableView tableView = new TableView();
tableView.setItems(ds1.getData());
tableView.getColumns().addAll(ds1.getColumns());
If you want to do it the standard JavaFX way: look here.
