Spring Batch: read from an unstructured CSV

I would like to read from an unstructured CSV file, meaning it will have different column types every time. Please help.

I finally found the solution myself and would like to share it. You can write a LineMapper that maps the unstructured header (dynamic columns) to each line, as in the code below. Note that I read the header while scheduling the job and pass it in as a JobParameter.
@Bean
@StepScope
public FlatFileItemReader<Customer> csvReader(@Value("#{jobParameters[filepath]}") String filepath,
        @Value("#{jobParameters[header]}") String header,
        @Value("#{jobParameters[campaignId]}") String campaignId,
        @Value("#{jobParameters[_id]}") String _id) {
    FlatFileItemReader<Customer> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setResource(new FileSystemResource(filepath));
    flatFileItemReader.setName("customer-csv-file-reader");
    flatFileItemReader.setLinesToSkip(1); // the header row is passed in separately
    flatFileItemReader.setLineMapper(lineMapper(header, campaignId, _id));
    return flatFileItemReader;
}
@Bean
@StepScope
public LineMapper<Customer> lineMapper(@Value("#{jobParameters[header]}") String header,
        @Value("#{jobParameters[campaignId]}") String campaignId,
        @Value("#{jobParameters[_id]}") String _id) {
    return new LineMapper<Customer>() {
        private final String[] headers = header.split(",");

        @Override
        public Customer mapLine(String line, int lineNumber) throws Exception {
            Customer item = new Customer();
            String[] p = line.split(",");
            // Zip the dynamic header names with the values of this line
            Map<String, String> properties = IntStream.range(0, headers.length).boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> p[i]));
            item.setCampaignId(new ObjectId(campaignId));
            item.setInviteId(new ObjectId(_id));
            item.setProperties(properties);
            return item;
        }
    };
}
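For context, here is a minimal sketch of the scheduling side this answer mentions, where the header line is read from the file and passed in as a JobParameter. The jobLauncher, csvImportJob, campaignId, and _id variables are assumed to be wired elsewhere, and the file path is only illustrative:
// Read the first line of the CSV so the dynamic columns travel with the job
String filepath = "/tmp/customers.csv"; // illustrative path
String header;
try (BufferedReader br = new BufferedReader(new FileReader(filepath))) {
    header = br.readLine();
}
JobParameters params = new JobParametersBuilder()
        .addString("filepath", filepath)
        .addString("header", header)
        .addString("campaignId", campaignId)
        .addString("_id", _id)
        .addDate("runDate", new Date()) // keeps each run's parameters unique
        .toJobParameters();
jobLauncher.run(csvImportJob, params);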

Related

Can OpenCSV ignore trailing commas on records?

A CSV with trailing commas like this:
name, phone
joe, 123-456-7890,
bob, 333-555-6666,
processed like this:
CSVReaderHeaderAware r = new CSVReaderHeaderAware(reader);
Map<String, String> values = r.readMap();
will throw this exception:
java.io.IOException: Error on record number 2: The number of data elements is not the same as the number of header elements
For now I'm stripping commas from input files using sed:
find . -type f -exec sed -i 's/,*\r*$//' {} \;
Is there some easy way to tell OpenCSV to ignore trailing commas?
OpenCSV maintainers commented here. As of OpenCSV v5.1 there is no simple way to accomplish this, and pre-processing the file with sed, etc., is best for now.
According to the link provided in @Andrew's answer, it's malformed CSV input.
But as the maintainer himself suggests (here):
If you know you will always have single-line records, you could
derive a class from CSVReader, override getNextLine() to call
super.getNextLine(), then cut off the trailing comma, and of course,
pass your new reader into opencsv to use in parsing.
In other words, create your own CustomCSVReader and remove the last comma.
Here's an example:
import com.opencsv.CSVReader;

import java.io.IOException;
import java.io.Reader;

public class CustomCSVReader extends CSVReader {

    public CustomCSVReader(Reader reader) {
        super(reader);
    }

    @Override
    protected String getNextLine() throws IOException {
        String line = super.getNextLine();
        if (line == null) {
            return null;
        }
        // Cut off a single trailing comma before the line is parsed
        if (line.endsWith(",")) {
            return line.substring(0, line.length() - 1);
        }
        return line;
    }
}
The model converter using CustomCSVReader:
import com.opencsv.bean.CsvToBeanBuilder;

import java.io.StringReader;
import java.util.List;

public class CustomCSVParser {

    public List<User> convert(String data) {
        return new CsvToBeanBuilder<User>(new CustomCSVReader(new StringReader(data)))
                .withType(User.class)
                .build()
                .parse();
    }
}
The model class:
import com.opencsv.bean.CsvBindByName;

public class User {

    @CsvBindByName(column = "name")
    private String userName;

    @CsvBindByName(column = "phone")
    private String phoneNumber;

    // Constructor, getters and setters omitted
}
Test class:
class CustomCSVParserTest {

    private CustomCSVParser instance;

    @BeforeEach
    void setUp() {
        instance = new CustomCSVParser();
    }

    @Test
    void csvInput_withCommaInLastLine_mustBeParsed() {
        String data = "name, phone\n"
                + "joe, 123-456-7890,\n"
                + "bob, 333-555-6666,";
        List<User> result = instance.convert(data);
        List<User> expectedResult = Arrays.asList(
                new User("joe", "123-456-7890"),
                new User("bob", "333-555-6666"));
        Assertions.assertArrayEquals(expectedResult.toArray(), result.toArray());
    }
}
That's it.

Iterate items in ResponseBody and put them in a HashMap Spring Boot

In a REST controller in Spring Boot, I am trying to iterate over the values in a @RequestBody and put some of them in a HashMap in a POST endpoint.
The JSON I am sending is of this structure:
{"name":"yogurt","vitaminA":6,"vitaminb12":5}
The endpoint looks like this so far:
@RequestMapping("/create")
public NutrientList createNutrientList(@RequestBody NutrientList nutrientList) {
    Map<String, Double> nutrientMap = new HashMap<String, Double>();
    // get nutrient values, need help with this part
    for ()
        // add values to map
    NutrientList nl = new NutrientList(nutrientList.getName(), nutrientMap);
    // will save to repository
    return nl;
}
The NutrientList class looks like this:
public class NutrientList {

    @Id
    private ObjectId id;

    @JsonProperty("name")
    private String name;

    @JsonProperty("nutrientMap")
    Map<String, Double> nutrientMap = new HashMap<String, Double>();

    public NutrientList() {}

    public NutrientList(String name, Map<String, Double> nutrientMap) {
        this.id = new ObjectId();
        this.name = name;
        this.nutrientMap = nutrientMap;
    }

    // setters and getters
}
The data is stored as separate nutrients in the database; it is not a map. I see that the NutrientList class does not share the same structure, but is there any way I can get around this and use a map without changing how the data is stored in the database?
I need to use a map because there are many nutrients and I don't want separate variables for each of them. Thank you so much. Let me know if something is not clear.
EDIT:
Alternatively, I could turn the CSV the database data came from into JSON format with the map, but I have not found a tool online that gives me this flexibility.
If you have a list of valid keys, you could use the following:
private static final List<String> validKeys = Arrays.asList("vitaminA", "vitaminB" /* ... */);

@RequestMapping("/create")
public NutrientList createNutrientList(@RequestBody Map<String, Object> requestBody) {
    Map<String, Double> nutrientMap = new HashMap<>();
    for (String nutrient : requestBody.keySet()) {
        if (validKeys.contains(nutrient) && requestBody.get(nutrient) instanceof Number) {
            Number number = (Number) requestBody.get(nutrient);
            nutrientMap.put(nutrient, number.doubleValue());
        }
    }
    String name = (String) requestBody.get("name"); // maybe check if name exists and is really a string
    return new NutrientList(name, nutrientMap);
}
If you want to use the Java 8 Stream API, you can try:
private static final List<String> validKeys = Arrays.asList("vitaminA", "vitaminB" /* ... */);

@RequestMapping("/create")
public NutrientList createNutrientList(@RequestBody Map<String, Object> requestBody) {
    Map<String, Double> nutrientMap = requestBody.entrySet().stream()
            .filter(e -> validKeys.contains(e.getKey()))
            .filter(e -> e.getValue() instanceof Number)
            .collect(Collectors.toMap(Map.Entry::getKey, e -> ((Number) e.getValue()).doubleValue()));
    String name = Optional.ofNullable(requestBody.get("name"))
            .filter(n -> n instanceof String)
            .map(n -> (String) n)
            .orElseThrow(IllegalArgumentException::new);
    return new NutrientList(name, nutrientMap);
}
Hope that helps.

Display Message to User instead of empty JSON on HTML when records are empty in the database

I have an application with an HTML page that takes user input through a textbox. It is a REST Spring Framework application, divided into Controller, Entity, Service, Repository, View, and the main application class.
I take an input value and search the MongoDB database. If the value is present, I return the entity object from the Service to the Controller. The Controller returns the same entity view object (PersonView in this case), and I get JSON data.
The above scenario works well as long as there are records in the database. If the record is not present, it returns empty JSON. My Controller returns a PersonView object, and I do not wish to change the signature and make the return type String, since in that case it returns the address on my HTML page.
Considering this, how should I handle the case when there are no records in the database and I wish to display a message on this same HTML page saying there are no records available?
I tried throwing an exception, but in that case too, how do I display a message on my HTML page, considering that my Controller returns a JSON object and I do not wish to change its signature?
Controller Class is as below:
public PersonView searchPerson(@PathVariable String pname) {
    List<Person> pList = personService.searchPerson(pname);
    PersonView personView = new PersonView();
    personView.setPersonView(pList);
    return personView;
}
EDIT:
Here is the setter from the PersonView class that I call in the Controller:
public void setPersonView(List<Person> personView) {
    this.personView = personView;
}
Here is the service implementation class:
public List<Person> searchPerson(String name) throws Exception {
    List<Person> personlist = personRepository.findByName(name);
    if (personlist.isEmpty()) {
        throw new Exception("Records not found in the database");
    }
    return personlist;
}
Create a custom Exception class:
public class EntityNotFoundException extends RuntimeException {

    public EntityNotFoundException(String message) {
        super(message);
    }
}
Now, in your controller code:
public List<Person> searchPerson(String name) {
    List<Person> personlist = personRepository.findByName(name);
    if (personlist.isEmpty()) {
        throw new EntityNotFoundException("Records not found in the database");
    }
    return personlist;
}
After that you can try something like this in your controller class:
private static final MappingJacksonJsonView JSON_VIEW = new MappingJacksonJsonView();

@ExceptionHandler(EntityNotFoundException.class)
public ModelAndView handleNotFoundException(Exception ex) {
    return new ModelAndView(JSON_VIEW, "error", new ErrorMessage("No Record in Db"));
}
Your ErrorMessage class can be a simple POJO:
public class ErrorMessage {

    private String message;

    ErrorMessage(String message) {
        this.message = message;
    }

    public String getMessage() {
        return message;
    }
}
Although already answered, I will add some points here.
Note that at some point you will need to send headers and a response body (with different objects), so consider using ResponseEntity, which acts as a wrapper around your List. Here is the sample code.
public ResponseEntity<List<Person>> searchPerson(String name) {
    List<Person> personlist = personRepository.findByName(name);
    if (personlist.isEmpty()) {
        return new ResponseEntity(new EntityNotFoundException("Records not found in the database"), HttpStatus.BAD_REQUEST);
    }
    return new ResponseEntity(personlist, HttpStatus.OK);
}
ResponseEntity gives you a great deal of flexibility. Read the documentation here:
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/http/ResponseEntity.html
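As a small illustration of that flexibility, here is a hedged sketch of returning extra response headers alongside the body; the X-Total-Count header name is only an example, not part of the original answer:
// Sketch: attach a custom header (the name is illustrative) to the response
HttpHeaders headers = new HttpHeaders();
headers.add("X-Total-Count", String.valueOf(personlist.size()));
return new ResponseEntity<>(personlist, headers, HttpStatus.OK);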

How to read csv data one by one and pass it in multiple testNG tests

I need to insert data multiple times into a web application. I am using Selenium with TestNG along with a data-driven framework.
I am using a CSV file for reading the input values.
Please find the sample code below.
public class TestData {

    private static String firstName;
    public static String lastName;

    @BeforeClass
    public void beforeClass() throws IOException {
        CSVReader reader = new CSVReader(new FileReader(fileName));
        String[] record;
        while ((record = reader.readNext()) != null) {
            firstName = record[0];
            lastName = record[1];
        }
    }

    @Test
    public void test1() {
        driver.findElement(By.id(id)).sendKeys(firstName);
        driver.findElement(By.id(id)).click();
        // and so on...
    }

    @Test
    public void test2() {
        driver.findElement(By.id(id)).sendKeys(lastName);
        driver.findElement(By.id(id)).click();
        // and so on...
    }
}
Here, I need to insert 3 records, but when I use the above code, only the 3rd record gets inserted.
Kindly help me to fix this issue.
Sample Input File
What you need here is a Factory powered by a DataProvider. The Factory produces test class instances (a test class here is just a regular class that houses one or more @Test methods). The DataProvider feeds the factory method with the data required to instantiate the test class.
Your @Test methods then work with the data members of each instance to run their logic.
Here's a simple sample that shows this in action.
import org.assertj.core.api.Assertions;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Factory;
import org.testng.annotations.Test;

public class TestClassSample {

    private String firstName;
    private String lastName;

    @Factory(dataProvider = "dp")
    public TestClassSample(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @DataProvider(name = "dp")
    public static Object[][] getData() {
        // Feel free to replace this with logic that reads a CSV file (using CSVReader)
        // and translates it to a 2D array -- see the sketch below this class.
        return new Object[][]{
                {"Mohan", "Kumar"},
                {"Kane", "Williams"},
                {"Mark", "Henry"}
        };
    }

    @Test
    public void test1() {
        Assertions.assertThat(this.firstName).isNotEmpty();
    }

    @Test
    public void test2() {
        Assertions.assertThat(this.lastName).isNotEmpty();
    }
}
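For reference, here is a sketch of what that CSV-backed data provider could look like. The file name data.csv is hypothetical, and CSVReader comes from OpenCSV:
@DataProvider(name = "dp")
public static Object[][] getData() throws Exception {
    // Read every record and convert the List<String[]> into the
    // Object[][] that TestNG expects (the file name is hypothetical)
    try (CSVReader reader = new CSVReader(new FileReader("data.csv"))) {
        List<String[]> records = reader.readAll();
        return records.toArray(new Object[0][]);
    }
}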
As per the data given by you, the while loop ends at the third record of the CSV file. In each iteration your variables "firstName" and "lastName" are overwritten.
When the loop breaks, the variables hold only the last values written. So use a better data structure to store all the values; I recommend a map.
You can further club all the test cases into a single method and use the invocationCount attribute of the @Test annotation to repeat the execution for each entry in the map. Add one more setup method to advance to the next key in the map, as sketched below.
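Here is a minimal sketch of that suggestion. The CSV file name and element ids are hypothetical, the WebDriver is assumed to be initialized elsewhere, and note that with invocationCount it is @BeforeMethod (which runs before every invocation), not @BeforeTest, that actually advances the counter:
public class TestData {

    // Store every record instead of overwriting two variables
    private static final Map<Integer, String[]> records = new LinkedHashMap<>();
    private static int current = -1;
    private WebDriver driver; // assumed initialized elsewhere

    @BeforeClass
    public void readCsv() throws Exception {
        try (CSVReader reader = new CSVReader(new FileReader("data.csv"))) {
            String[] record;
            int row = 0;
            while ((record = reader.readNext()) != null) {
                records.put(row++, record);
            }
        }
    }

    @BeforeMethod
    public void nextRecord() {
        current++; // move to the next row before each invocation
    }

    @Test(invocationCount = 3) // repeats once per record in the sample file
    public void insertRecord() {
        String[] record = records.get(current);
        driver.findElement(By.id("firstName")).sendKeys(record[0]);
        driver.findElement(By.id("lastName")).sendKeys(record[1]);
        driver.findElement(By.id("submit")).click();
    }
}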

Apache Camel CSV with Header

I have written a simple test app that reads records from a DB and puts the result in a CSV file. So far it works fine, but the column names, i.e. the headers, are not put in the CSV file. According to the docs they should be. I have also tried it with and without streaming and splitting, but the situation is the same.
In the Camel unit tests, at line 182, the headers are set explicitly: https://github.com/apache/camel/blob/master/components/camel-csv/src/test/java/org/apache/camel/dataformat/csv/CsvDataFormatTest.java
How could this very simple problem be solved without having to iterate over the headers? I experimented with different settings, but the result is always the same: the delimiters I set are honored, but the headers are not. Thanks in advance for the responses.
I used Camel 2.16.1 like this:
final CsvDataFormat csvDataFormat = new CsvDataFormat();
csvDataFormat.setHeaderDisabled(false);
[...]
from("direct:TEST").routeId("TEST")
    .setBody(constant("SELECT * FROM MYTABLE"))
    .to("jdbc:myDataSource?readSize=100") // max 100 records
    // .split(simple("${body}")) // split the list
    // .streaming() // not to keep all messages in memory
    .marshal(csvDataFormat)
    .to("file:extract?fileName=TEST.csv");
[...]
EDIT 1
I have also tried to add the headers from the exchange.in. They are available there under the name "CamelJdbcColumnNames" as a HashSet. I added them to the csvDataFormat like this:
final CsvDataFormat csvDataFormat = new CsvDataFormat();
csvDataFormat.setHeaderDisabled(false);
[...]
from("direct:TEST").routeId("TEST")
    .setBody(constant("SELECT * FROM MYTABLE"))
    .to("jdbc:myDataSource?readSize=100") // max 100 records
    .process(new Processor() {
        public void process(Exchange exchange) throws Exception {
            headerNames = (HashSet) exchange.getIn().getHeader("CamelJdbcColumnNames");
            System.out.println("#### Process headernames = " + new ArrayList<String>(headerNames).toString());
            csvDataFormat.setHeader(new ArrayList<String>(headerNames));
        }
    })
    .marshal(csvDataFormat) //.tracing()
    .to("file:extract?fileName=TEST.csv");
The println() prints the column names, but the generated CSV file does not contain them.
EDIT 2
I added the header names to the body as proposed in comment 1, like this:
.process(new Processor() {
    public void process(Exchange exchange) throws Exception {
        Set<String> headerNames = (HashSet) exchange.getIn().getHeader("CamelJdbcColumnNames");
        Map<String, String> nameMap = new LinkedHashMap<String, String>();
        for (String name : headerNames) {
            nameMap.put(name, name);
        }
        // Prepend a map of header names so they become the first CSV row
        List<Map> listWithHeaders = new ArrayList<Map>();
        listWithHeaders.add(nameMap);
        List<Map> records = exchange.getIn().getBody(List.class);
        listWithHeaders.addAll(records);
        exchange.getIn().setBody(listWithHeaders, List.class);
        System.out.println("#### Process headernames = " + new ArrayList<String>(headerNames).toString());
        csvDataFormat.setHeader(new ArrayList<String>(headerNames));
    }
})
The proposal solved the problem, thank you for that, but it means that CsvDataFormat is not really usable on its own. The exchange body after the JDBC query contains an ArrayList of HashMaps, each holding one record of the table; the key of each HashMap entry is the column name and the value is the value. So setting the header-output config value on CsvDataFormat should be more than enough to get the headers generated. Do you know a simpler solution, or did I miss something in the configuration?
You take the data from a database with JDBC, so you need to add the headers to the message body yourself so that they form the first row. The result set from the JDBC component is just the data, without headers.
I have done it by overriding BindyCsvDataFormat and BindyCsvFactory:
public class BindySplittedCsvDataFormat extends BindyCsvDataFormat {

    private boolean marshallingFirstLot = false;

    public BindySplittedCsvDataFormat() {
        super();
    }

    public BindySplittedCsvDataFormat(Class<?> type) {
        super(type);
    }

    @Override
    public void marshal(Exchange exchange, Object body, OutputStream outputStream) throws Exception {
        // Only the first split chunk should carry the header row
        marshallingFirstLot = Integer.valueOf(0).equals(exchange.getProperty("CamelSplitIndex"));
        super.marshal(exchange, body, outputStream);
    }

    @Override
    protected BindyAbstractFactory createModelFactory(FormatFactory formatFactory) throws Exception {
        BindySplittedCsvFactory bindyCsvFactory = new BindySplittedCsvFactory(getClassType(), this);
        bindyCsvFactory.setFormatFactory(formatFactory);
        return bindyCsvFactory;
    }

    protected boolean isMarshallingFirstLot() {
        return marshallingFirstLot;
    }
}
public class BindySplittedCsvFactory extends BindyCsvFactory {

    private BindySplittedCsvDataFormat bindySplittedCsvDataFormat;

    public BindySplittedCsvFactory(Class<?> type, BindySplittedCsvDataFormat bindySplittedCsvDataFormat) throws Exception {
        super(type);
        this.bindySplittedCsvDataFormat = bindySplittedCsvDataFormat;
    }

    @Override
    public boolean getGenerateHeaderColumnNames() {
        // Generate the header only while the first chunk is being marshalled
        return super.getGenerateHeaderColumnNames() && bindySplittedCsvDataFormat.isMarshallingFirstLot();
    }
}
My solution uses Spring XML (though I'd prefer a built-in option for also extracting the header on top):
<multicast stopOnException="true">
    <pipeline>
        <log message="saving table ${headers.tablename} header to ${headers.CamelFileName}..."/>
        <setBody>
            <groovy>request.headers.get('CamelJdbcColumnNames').join(";") + "\n"</groovy>
        </setBody>
        <to uri="file:output"/>
    </pipeline>
    <pipeline>
        <log message="saving table ${headers.tablename} rows to ${headers.CamelFileName}..."/>
        <marshal>
            <csv delimiter=";" headerDisabled="false" useMaps="true"/>
        </marshal>
        <to uri="file:output?fileExist=Append"/>
    </pipeline>
</multicast>
http://www.redaelli.org/matteo-blog/2019/05/24/exporting-database-tables-to-csv-files-with-apache-camel/