Can OpenCSV ignore trailing commas on records?

A CSV with trailing commas like this:
name, phone
joe, 123-456-7890,
bob, 333-555-6666,
processed like this:
CSVReaderHeaderAware r = new CSVReaderHeaderAware(reader);
Map<String, String> values = r.readMap();
will throw this exception:
java.io.IOException: Error on record number 2: The number of data elements is not the same as the number of header elements
For now I'm stripping commas from input files using sed:
find . -type f -exec sed -i 's/,*\r*$//' {} \;
Is there some easy way to tell OpenCSV to ignore trailing commas?

OpenCSV maintainers commented here. As of OpenCSV v5.1 there is no simple way to accomplish this, and pre-processing the file using sed etc. is best for now.
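If shelling out to sed is not an option, the same cleanup can be done in plain Java before the file ever reaches OpenCSV. A minimal sketch, assuming a UTF-8 file rewritten in place (the class and method names are mine, not part of OpenCSV):
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class TrailingCommaStripper {

    // Mirrors the sed one-liner: removes trailing commas from every line
    // (line terminators, including \r, are already stripped by readAllLines).
    public static void stripInPlace(Path csvFile) throws IOException {
        List<String> cleaned = Files.readAllLines(csvFile, StandardCharsets.UTF_8).stream()
                .map(line -> line.replaceAll(",+$", ""))
                .collect(Collectors.toList());
        Files.write(csvFile, cleaned, StandardCharsets.UTF_8);
    }
}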

According to the link provided in @Andrew's answer, this is malformed CSV input.
But as the maintainer himself suggests (here):
If you know you will always have single-line records, you could derive a class from CSVReader, override getNextLine() to call super.getNextLine(), then cut off the trailing comma, and of course, pass your new reader into opencsv to use in parsing.
In other words, create your own CustomCSVReader and remove the last comma.
Here's an example:
import com.opencsv.CSVReader;

import java.io.IOException;
import java.io.Reader;

public class CustomCSVReader extends CSVReader {

    public CustomCSVReader(Reader reader) {
        super(reader);
    }

    @Override
    protected String getNextLine() throws IOException {
        String line = super.getNextLine();
        if (line == null) {
            return null;
        }
        if (line.endsWith(",")) {
            return line.substring(0, line.length() - 1);
        }
        return line;
    }
}
The Model Converter using CustomCSVReader
import com.opencsv.bean.CsvToBeanBuilder;

import java.io.StringReader;
import java.util.List;

public class CustomCSVParser {

    public List<User> convert(String data) {
        return new CsvToBeanBuilder<User>(new CustomCSVReader(new StringReader(data)))
                .withType(User.class)
                .build()
                .parse();
    }
}
The Model class
import com.opencsv.bean.CsvBindByName;

public class User {

    @CsvBindByName(column = "name")
    private String userName;

    @CsvBindByName(column = "phone")
    private String phoneNumber;

    // Constructor, getters/setters and equals/hashCode omitted
}
Test Class
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import java.util.Arrays;
import java.util.List;

class CustomCSVParserTest {

    private CustomCSVParser instance;

    @BeforeEach
    void setUp() {
        instance = new CustomCSVParser();
    }

    @Test
    void csvInput_withCommaInLastLine_mustBeParsed() {
        String data = "name, phone\n"
                + "joe, 123-456-7890,\n"
                + "bob, 333-555-6666,";
        List<User> result = instance.convert(data);
        List<User> expectedResult = Arrays.asList(
                new User("joe", "123-456-7890"),
                new User("bob", "333-555-6666"));
        Assertions.assertArrayEquals(expectedResult.toArray(), result.toArray());
    }
}
That's it.

Related

SpringBatch Read from unstructured csv

I would like to read from an unstructured CSV file, meaning it will have different column types every time. Please help.
I finally found the solution myself and would like to share it. You can write a LineMapper and map the unstructured header (dynamic columns) to each line with the following code. Note that I read the header while scheduling the job and pass it as a JobParameter.
@Bean
@StepScope
public FlatFileItemReader<Customer> csvReader(@Value("#{jobParameters[filepath]}") String filepath,
        @Value("#{jobParameters[header]}") String header,
        @Value("#{jobParameters[campaignId]}") String campaignId,
        @Value("#{jobParameters[_id]}") String _id) {
    FlatFileItemReader<Customer> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setResource(new FileSystemResource(filepath));
    flatFileItemReader.setName("customer-csv-file-reader");
    flatFileItemReader.setLinesToSkip(1);
    flatFileItemReader.setLineMapper(lineMapper(header, campaignId, _id));
    return flatFileItemReader;
}

@Bean
@StepScope
public LineMapper<Customer> lineMapper(@Value("#{jobParameters[header]}") String header,
        @Value("#{jobParameters[campaignId]}") String campaignId,
        @Value("#{jobParameters[_id]}") String _id) {
    return new LineMapper<Customer>() {
        private final String[] headers = header.split(",");

        @Override
        public Customer mapLine(String line, int linenumber) throws Exception {
            Customer item = new Customer();
            String[] p = line.split(",");
            Map<String, String> properties = IntStream.range(0, headers.length).boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> p[i]));
            item.setCampaignId(new ObjectId(campaignId));
            item.setInviteId(new ObjectId(_id));
            item.setProperties(properties);
            return item;
        }
    };
}

How to read csv data one by one and pass it in multiple testNG tests

I need to insert data multiple times into a web application. I am using Selenium with TestNG along with a data-driven framework.
I am using a CSV file for reading the input values.
Please find the sample code below.
public class TestData {

    private static String firstName;
    public static String lastName;

    @BeforeClass
    public void beforeClass() throws IOException {
        reader = new CSVReader(new FileReader(fileName));
        while ((record = reader.readNext()) != null) {
            firstName = record[0];
            lastName = record[1];
        }
    }

    @Test
    public void test1() {
        driver.findElement(By.id(id)).sendKeys(firstName);
        driver.findElement(By.id(id)).click();
        // and so on...
    }

    @Test
    public void test2() {
        driver.findElement(By.id(id)).sendKeys(lastName);
        driver.findElement(By.id(id)).click();
        // and so on...
    }
}
Here, I need to insert 3 records, but when I use the above code, only the 3rd record gets inserted.
Kindly help me to fix this issue.
Sample Input File
What you need here is a Factory powered by a DataProvider. The Factory produces test class instances (a test class here is just a regular class that contains one or more @Test methods). The DataProvider feeds the factory method with the data required to instantiate the test class.
Your @Test methods then work with the data members of each instance to run their logic.
Here's a simple sample that shows this in action.
import org.assertj.core.api.Assertions;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Factory;
import org.testng.annotations.Test;

public class TestClassSample {

    private String firstName;
    private String lastName;

    @Factory(dataProvider = "dp")
    public TestClassSample(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @DataProvider(name = "dp")
    public static Object[][] getData() {
        // feel free to replace this with logic that reads up a csv file (using CSVReader)
        // and then translates it to a 2D array.
        return new Object[][]{
                {"Mohan", "Kumar"},
                {"Kane", "Williams"},
                {"Mark", "Henry"}
        };
    }

    @Test
    public void test1() {
        Assertions.assertThat(this.firstName).isNotEmpty();
    }

    @Test
    public void test2() {
        Assertions.assertThat(this.lastName).isNotEmpty();
    }
}
As per the data given by you, the while loop ends at the third record of the CSV file. On each iteration your variables "firstName" and "lastName" are overwritten; when the loop exits, they hold only the values written last. So use a better data structure to store all the values; I recommend a map.
You can further club all the test cases into a single method and use the invocationCount attribute of the @Test annotation to repeat the execution for each entry of the map, plus one more method, annotated so it runs before each invocation, to advance to the next key in the map (see the sketch below).
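A minimal sketch of that idea, assuming a file named testdata.csv with one record per row (note that @BeforeMethod fits the per-invocation increment, since it runs before every invocation, whereas @BeforeTest runs only once):
import com.opencsv.CSVReader;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

import java.io.FileReader;
import java.util.List;

public class DataDrivenTest {

    private List<String[]> records;  // all rows read from the CSV
    private int current = -1;        // index of the row for the current invocation

    @BeforeClass
    public void loadCsv() throws Exception {
        try (CSVReader reader = new CSVReader(new FileReader("testdata.csv"))) {
            records = reader.readAll();  // keep every record instead of overwriting
        }
    }

    @BeforeMethod
    public void nextRecord() {
        current++;  // advance to the next CSV row before each invocation
    }

    @Test(invocationCount = 3)  // one invocation per record in the sample file
    public void insertRecord() {
        String firstName = records.get(current)[0];
        String lastName = records.get(current)[1];
        // driver.findElement(By.id(...)).sendKeys(firstName); and so on
    }
}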

Handle JSON which sends array of items but sometimes empty string in case of 0 elements

I have a JSON API which sends an array of items in normal cases, but sends an empty string "" (without the array [] brackets) in the case of 0 elements.
How can I handle this with Gson? I want to ignore the error and not get a parsing exception.
eg.
"types": [
"Environment",
"Management",
"Computers"
],
sometimes it returns:
"types" : ""
Getting the following exception: Expected BEGIN ARRAY but was string
Since you don't have control over the input JSON string, you can test the content and decide what to do with it.
Here is an example of a working Java class:
import com.google.gson.Gson;
import java.util.ArrayList;

public class Test {

    // static, so Gson can instantiate it without an enclosing Test instance
    static class Types {
        Object types;
    }

    public void test(String input) {
        Gson gson = new Gson();
        Types types = gson.fromJson(input, Types.class);
        if (types.types instanceof ArrayList) {
            System.out.println("types is an ArrayList");
        } else if (types.types instanceof String) {
            System.out.println("types is an empty String");
        }
    }

    public static void main(String[] args) {
        String input = "{\"types\": [\n" +
                "    \"Environment\",\n" +
                "    \"Management\",\n" +
                "    \"Computers\"\n" +
                "  ]}";
        String input2 = "{\"types\" : \"\"}";
        Test testing = new Test();
        testing.test(input2); // change input2 to input to test the array case
    }
}
If a bad JSON schema is not under your control, you can implement a specific type adapter that tries to determine whether the given JSON document is fine for you and, if possible, makes some transformations. I would recommend using @JsonAdapter to mark the improperly designed types (at least I hope the entire API is not improperly designed).
For example,
import com.google.gson.annotations.JsonAdapter;
import java.util.List;

final class Wrapper {

    @JsonAdapter(LenientListTypeAdapterFactory.class)
    final List<String> types = null;

}
where LenientListTypeAdapterFactory can be implemented as follows:
import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import com.google.gson.stream.MalformedJsonException;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class LenientListTypeAdapterFactory
        implements TypeAdapterFactory {

    // Gson can instantiate it itself, let it just do it
    private LenientListTypeAdapterFactory() {
    }

    @Override
    public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
        // Obtain the original list type adapter
        @SuppressWarnings("unchecked")
        final TypeAdapter<List<?>> realListTypeAdapter = (TypeAdapter<List<?>>) gson.getAdapter(typeToken);
        // And wrap it up in the lenient JSON type adapter
        @SuppressWarnings("unchecked")
        final TypeAdapter<T> castTypeAdapter = (TypeAdapter<T>) new LenientListTypeAdapter(realListTypeAdapter);
        return castTypeAdapter;
    }

    private static final class LenientListTypeAdapter
            extends TypeAdapter<List<?>> {

        private final TypeAdapter<List<?>> realListTypeAdapter;

        private LenientListTypeAdapter(final TypeAdapter<List<?>> realListTypeAdapter) {
            this.realListTypeAdapter = realListTypeAdapter;
        }

        @Override
        public void write(final JsonWriter out, final List<?> value)
                throws IOException {
            realListTypeAdapter.write(out, value);
        }

        @Override
        public List<?> read(final JsonReader in)
                throws IOException {
            // Check the next (effectively current) JSON token
            switch ( in.peek() ) {
            // If it's either `[...` or `null` -- we're supposing it's a "normal" list
            case BEGIN_ARRAY:
            case NULL:
                return realListTypeAdapter.read(in);
            // Is it a string?
            case STRING:
                // Skip the value entirely
                in.skipValue();
                // And return a new array list.
                // Note that you might return emptyList(), but Gson itself uses mutable lists, so we do too
                return new ArrayList<>();
            // Anything else known?
            case END_ARRAY:
            case BEGIN_OBJECT:
            case END_OBJECT:
            case NAME:
            case NUMBER:
            case BOOLEAN:
            case END_DOCUMENT:
                // Something definitely unexpected
                throw new MalformedJsonException("Cannot parse " + in);
            default:
                // This can never happen unless Gson adds a new token type
                throw new AssertionError();
            }
        }
    }
}
Here is how it can be tested:
for ( final String name : ImmutableList.of("3-elements.json", "0-elements.json") ) {
    try ( final Reader reader = getPackageResourceReader(Q43562427.class, name) ) {
        final Wrapper wrapper = gson.fromJson(reader, Wrapper.class);
        System.out.println(wrapper.types);
    }
}
Output:
[Environment, Management, Computers]
[]
If the entire API uses "" for empty arrays, then you can drop the #JsonAdapter annotation and register the LenientListTypeAdapterFactory via GsonBuilder, but add the following lines to the create method in order not to break other type adapters:
if ( !List.class.isAssignableFrom(typeToken.getRawType()) ) {
    // This tells Gson to try to pick up the next best-match type adapter
    return null;
}
...
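If you go that route, registration might look like this (a sketch: it assumes the factory's private constructor is opened up, since the posted version leaves instantiation to Gson):
final Gson gson = new GsonBuilder()
        .registerTypeAdapterFactory(new LenientListTypeAdapterFactory())
        .create();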
There are a lot of weirdly designed JSON responses out there, but representing nulls or empty arrays with "" is a top contender. Good luck!
Thanks for all your answers.
The recommended way, as mentioned in the answers above, would be to use TypeAdapters and an ExclusionStrategy for Gson.
Here is a good example: Custom GSON deserialization

Apache Camel CSV with Header

I have written a simple test app that reads records from a DB and puts the result in a csv file. So far it works fine, but the column names (i.e. the headers) are not put in the csv file. According to the doc they should be. I have also tried it with and without streaming and split, but the situation is the same.
In the camel unit-tests in line 182 the headers are put there explicitly: https://github.com/apache/camel/blob/master/components/camel-csv/src/test/java/org/apache/camel/dataformat/csv/CsvDataFormatTest.java
How can this very simple problem be solved without having to iterate over the headers myself? I have also experimented with different settings, but the result is the same: the delimiters I set are applied, but the headers are not. Thanks in advance for the responses.
I used Camel 2.16.1 like this:
final CsvDataFormat csvDataFormat = new CsvDataFormat();
csvDataFormat.setHeaderDisabled(false);
[...]
from("direct:TEST").routeId("TEST")
.setBody(constant("SELECT * FROM MYTABLE"))
.to("jdbc:myDataSource?readSize=100") // max 100 records
// .split(simple("${body}")) // split the list
// .streaming() // not to keep all messages in memory
.marshal(csvDataFormat)
.to("file:extract?fileName=TEST.csv");
[...]
EDIT 1
I have also tried to add the headers from the exchange.in. They are available there under the name "CamelJdbcColumnNames" as a HashSet. I added it to the csvDataFormat like this:
final CsvDataFormat csvDataFormat = new CsvDataFormat();
csvDataFormat.setHeaderDisabled(false);
[...]
from("direct:TEST").routeId("TEST")
.setBody(constant("SELECT * FROM MYTABLE"))
.to("jdbc:myDataSource?readSize=100") // max 100 records
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
headerNames = (HashSet)exchange.getIn().getHeader("CamelJdbcColumnNames");
System.out.println("#### Process headernames = " + new ArrayList<String>(headerNames).toString());
csvDataFormat.setHeader(new ArrayList<String>(headerNames));
}
})
.marshal(csvDataFormat)//.tracing()
.to("file:extract?fileName=TEST.csv");
The println() prints the column names, but the generated csv file does not contain them.
EDIT 2
I added the header names to the body as proposed in comment 1 like this:
.process(new Processor() {
    public void process(Exchange exchange) throws Exception {
        Set<String> headerNames = (HashSet) exchange.getIn().getHeader("CamelJdbcColumnNames");
        Map<String, String> nameMap = new LinkedHashMap<String, String>();
        for (String name : headerNames) {
            nameMap.put(name, name);
        }
        List<Map> listWithHeaders = new ArrayList<Map>();
        listWithHeaders.add(nameMap);
        List<Map> records = exchange.getIn().getBody(List.class);
        listWithHeaders.addAll(records);
        exchange.getIn().setBody(listWithHeaders, List.class);
        System.out.println("#### Process headernames = " + new ArrayList<String>(headerNames).toString());
        csvDataFormat.setHeader(new ArrayList<String>(headerNames));
    }
})
The proposal solved the problem, thank you, but it suggests that CsvDataFormat is not really convenient here. The exchange body after the JDBC query contains an ArrayList of HashMaps, each holding one record of the table, with the column name as key and the column value as value. So setting the header option on CsvDataFormat should be more than enough to get the headers generated. Do you know a simpler solution, or did I miss something in the configuration?
You take the data from a database with JDBC, so you need to add the headers to the message body yourself so that they form the first row. The result set from JDBC is just the data, not including headers.
I have done it by overriding BindyCsvDataFormat and BindyCsvFactory:
public class BindySplittedCsvDataFormat extends BindyCsvDataFormat {

    private boolean marshallingFirstLot = false;

    public BindySplittedCsvDataFormat() {
        super();
    }

    public BindySplittedCsvDataFormat(Class<?> type) {
        super(type);
    }

    @Override
    public void marshal(Exchange exchange, Object body, OutputStream outputStream) throws Exception {
        marshallingFirstLot = Integer.valueOf(0).equals(exchange.getProperty("CamelSplitIndex"));
        super.marshal(exchange, body, outputStream);
    }

    @Override
    protected BindyAbstractFactory createModelFactory(FormatFactory formatFactory) throws Exception {
        BindySplittedCsvFactory bindyCsvFactory = new BindySplittedCsvFactory(getClassType(), this);
        bindyCsvFactory.setFormatFactory(formatFactory);
        return bindyCsvFactory;
    }

    protected boolean isMarshallingFirstLot() {
        return marshallingFirstLot;
    }
}
public class BindySplittedCsvFactory extends BindyCsvFactory {

    private BindySplittedCsvDataFormat bindySplittedCsvDataFormat;

    public BindySplittedCsvFactory(Class<?> type, BindySplittedCsvDataFormat bindySplittedCsvDataFormat) throws Exception {
        super(type);
        this.bindySplittedCsvDataFormat = bindySplittedCsvDataFormat;
    }

    @Override
    public boolean getGenerateHeaderColumnNames() {
        return super.getGenerateHeaderColumnNames() && bindySplittedCsvDataFormat.isMarshallingFirstLot();
    }
}
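For illustration, a hypothetical route putting the overridden format to work (this is a sketch, not the poster's code: MyRecord is an assumed Bindy model class annotated with @CsvRecord(generateHeaderColumns = true), and the endpoint URIs are placeholders):
// Inside a RouteBuilder.configure()
BindySplittedCsvDataFormat bindy = new BindySplittedCsvDataFormat(MyRecord.class);

from("direct:export")
    .split(body()).streaming()   // CamelSplitIndex 0 marks the first chunk
    .marshal(bindy)              // header row is generated for the first chunk only
    .to("file:output?fileExist=Append&fileName=export.csv");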
My solution with Spring XML (though I'd still like a built-in option for extracting the header on top):
<multicast stopOnException="true">
<pipeline>
<log message="saving table ${headers.tablename} header to ${headers.CamelFileName}..."/>
<setBody>
<groovy>request.headers.get('CamelJdbcColumnNames').join(";") + "\n"</groovy>
</setBody>
<to uri="file:output"/>
</pipeline>
<pipeline>
<log message="saving table ${headers.tablename} rows to ${headers.CamelFileName}..."/>
<marshal>
<csv delimiter=";" headerDisabled="false" useMaps="true"/>
</marshal>
<to uri="file:output?fileExist=Append"/>
</pipeline>
</multicast>
http://www.redaelli.org/matteo-blog/2019/05/24/exporting-database-tables-to-csv-files-with-apache-camel/

Remove namespace prefix while JAXB marshalling

I have JAXB objects created from a schema. While marshalling, the xml elements are getting prefixed with ns2. I have tried all the options suggested around the net for this problem, but none of them works. I cannot modify my schema or change package-info.java. Please help.
After much research and tinkering I have finally managed to achieve a solution to this problem. Please accept my apologies for not posting links to the original references - there are many and I wasn't taking notes - but this one was certainly useful.
My solution uses a filtering XMLStreamWriter which applies an empty namespace context.
import javax.xml.namespace.NamespaceContext;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.Writer;
import java.util.Iterator;

public class NoNamesWriter extends DelegatingXMLStreamWriter {

    private static final NamespaceContext emptyNamespaceContext = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "";
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return "";
        }

        @Override
        public Iterator getPrefixes(String namespaceURI) {
            return null;
        }
    };

    public static XMLStreamWriter filter(Writer writer) throws XMLStreamException {
        return new NoNamesWriter(XMLOutputFactory.newInstance().createXMLStreamWriter(writer));
    }

    public NoNamesWriter(XMLStreamWriter writer) {
        super(writer);
    }

    @Override
    public NamespaceContext getNamespaceContext() {
        return emptyNamespaceContext;
    }
}
You can find a DelegatingXMLStreamWriter here.
You can then filter the marshalling xml with:
// Filter the output to remove namespaces.
m.marshal(it, NoNamesWriter.filter(writer));
I am sure there are more efficient mechanisms but I know this one works.
For me, only changing the package-info.java class worked like a charm, exactly as zatziky stated:
package-info.java
@javax.xml.bind.annotation.XmlSchema(
        namespace = "http://example.com",
        xmlns = {@XmlNs(prefix = "", namespaceURI = "http://example.com")},
        elementFormDefault = javax.xml.bind.annotation.XmlNsForm.QUALIFIED)
package my.package;

import javax.xml.bind.annotation.XmlNs;
You can let the namespaces be written only once. You will need a proxy class of the XMLStreamWriter and a package-info.java. Then you will do in your code:
StringWriter stringWriter = new StringWriter();
XMLStreamWriter writer = new WrapperXMLStreamWriter(
        XMLOutputFactory.newInstance().createXMLStreamWriter(stringWriter));
JAXBContext jaxbContext = JAXBContext.newInstance(Collection.class);
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
jaxbMarshaller.marshal(books, writer);
System.out.println(stringWriter.toString());
Proxy class (the important method is "writeNamespace"):
class WrapperXMLStreamWriter implements XMLStreamWriter {

    private final XMLStreamWriter writer;

    public WrapperXMLStreamWriter(XMLStreamWriter writer) {
        this.writer = writer;
    }

    // keeps track of which namespaces were used so that they are not
    // written more than once
    private List<String> namespaces = new ArrayList<String>();

    public void init() {
        namespaces.clear();
    }

    public void writeStartElement(String localName) throws XMLStreamException {
        init();
        writer.writeStartElement(localName);
    }

    public void writeStartElement(String namespaceURI, String localName) throws XMLStreamException {
        init();
        writer.writeStartElement(namespaceURI, localName);
    }

    public void writeStartElement(String prefix, String localName, String namespaceURI) throws XMLStreamException {
        init();
        writer.writeStartElement(prefix, localName, namespaceURI);
    }

    public void writeNamespace(String prefix, String namespaceURI) throws XMLStreamException {
        if (namespaces.contains(namespaceURI)) {
            return;
        }
        namespaces.add(namespaceURI);
        writer.writeNamespace(prefix, namespaceURI);
    }

    // ... other delegating methods, always the same pattern: writer.method() ...
}
package-info.java:
@XmlSchema(elementFormDefault = XmlNsForm.QUALIFIED, attributeFormDefault = XmlNsForm.UNQUALIFIED,
        xmlns = {
                @XmlNs(namespaceURI = "http://www.w3.org/2001/XMLSchema-instance", prefix = "xsi")})
package your.package;

import javax.xml.bind.annotation.XmlNs;
import javax.xml.bind.annotation.XmlNsForm;
import javax.xml.bind.annotation.XmlSchema;
You can use the NamespacePrefixMapper extension to control the namespace prefixes for your use case. The same extension is supported by both the JAXB reference implementation and EclipseLink JAXB (MOXy).
http://wiki.eclipse.org/EclipseLink/Release/2.4.0/JAXB_RI_Extensions/Namespace_Prefix_Mapper
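A minimal sketch of such a mapper with the JAXB reference implementation (MyRoot and the namespace URI are placeholders; the property key is the RI-specific one):
import com.sun.xml.bind.marshaller.NamespacePrefixMapper;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;

Marshaller m = JAXBContext.newInstance(MyRoot.class).createMarshaller();
m.setProperty("com.sun.xml.bind.namespacePrefixMapper", new NamespacePrefixMapper() {
    @Override
    public String getPreferredPrefix(String namespaceUri, String suggestion, boolean requirePrefix) {
        // Returning "" asks for the default namespace, i.e. no ns2-style prefix.
        return "";
    }
});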
Every solution requires complex overriding or annotations, which seem not to work with recent versions. I use a simpler approach: just replace the annoying namespaces. I wish Google & Co would use JSON and get rid of XML.
kml.marshal(file);
String kmlContent = FileUtils.readFileToString(file, "UTF-8");
kmlContent = kmlContent.replaceAll("ns2:","").replace("<kml xmlns:ns2=\"http://www.opengis.net/kml/2.2\" xmlns:ns3=\"http://www.w3.org/2005/Atom\" xmlns:ns4=\"urn:oasis:names:tc:ciq:xsdschema:xAL:2.0\" xmlns:ns5=\"http://www.google.com/kml/ext/2.2\">", "<kml>");
FileUtils.write(file, kmlContent, "UTF-8");