How to read a huge CSV file in Mule

I am using Mule Studio 3.4.0 Community Edition.
I have a big problem parsing a large CSV file that arrives through a File endpoint. The scenario is that I have 3 CSV files and I want to put the files' content into a database.
But when I try to load a huge file (about 144MB) I get an "OutOfMemory" exception. As a solution I thought of splitting the large CSV into smaller CSVs (I don't know if this is the best solution), or of finding a way to process the CSV without throwing an exception.
<file:connector name="File" autoDelete="true" streaming="true" validateConnections="true" doc:name="File"/>
<flow name="CsvToFile" doc:name="CsvToFile">
<file:inbound-endpoint path="src/main/resources/inbox" moveToDirectory="src/main/resources/processed" responseTimeout="10000" doc:name="CSV" connector-ref="File">
<file:filename-wildcard-filter pattern="*.csv" caseSensitive="true"/>
</file:inbound-endpoint>
<component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property"/>
<choice doc:name="Choice">
<when expression="INVOCATION:nome_file=azienda" evaluator="header">
<jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/companies-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Azienda"/>
<jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertAziende" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Azienda">
<jdbc-ee:query key="InsertAziende" value="INSERT INTO aw006_azienda VALUES (#[map-payload:AW006_ID], #[map-payload:AW006_ID_CLIENTE], #[map-payload:AW006_RAGIONE_SOCIALE])"/>
</jdbc-ee:outbound-endpoint>
</when>
<when expression="INVOCATION:nome_file=servizi" evaluator="header">
<jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/services-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Servizi"/>
<jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertServizi" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Servizi">
<jdbc-ee:query key="InsertServizi" value="INSERT INTO ctrl_aemd_unb_servizi VALUES (#[map-payload:CTRL_ID_TIPO_OPERAZIONE], #[map-payload:CTRL_DESCRIZIONE], #[map-payload:CTRL_COD_SERVIZIO])"/>
</jdbc-ee:outbound-endpoint>
</when>
<when expression="INVOCATION:nome_file=richiesta" evaluator="header">
<jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/requests-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Richiesta"/>
<jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertRichieste" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Richiesta">
<jdbc-ee:query key="InsertRichieste" value="INSERT INTO ctrl_aemd_unb_richiesta VALUES (#[map-payload:CTRL_ID_CONTROLLER], #[map-payload:CTRL_NUM_RICH_VENDITORE], #[map-payload:CTRL_VENDITORE], #[map-payload:CTRL_CANALE_VENDITORE], #[map-payload:CTRL_CODICE_SERVIZIO], #[map-payload:CTRL_STATO_AVANZ_SERVIZIO], #[map-payload:CTRL_DATA_INSERIMENTO])"/>
</jdbc-ee:outbound-endpoint>
</when>
</choice>
</flow>
Please, I do not know how to fix this problem.
Thanks in advance for any kind of help

As SteveS said, the csv-to-maps-transformer might try to load the entire file into memory before processing it. What you can try is to split the CSV file into smaller parts and send those parts to VM to be processed individually.
First, create a component to handle the splitting:
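import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.mule.api.MuleEventContext;
import org.mule.api.client.MuleClient;
import org.mule.api.lifecycle.Callable;

public class CSVReader implements Callable {
    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        // With streaming enabled on the file connector, the payload is an InputStream
        InputStream fileStream = (InputStream) eventContext.getMessage().getPayload();
        BufferedReader br = new BufferedReader(new InputStreamReader(fileStream));
        MuleClient muleClient = eventContext.getMuleContext().getClient();
        try {
            String line;
            // Dispatch each line to the VM queue as soon as it is read
            while ((line = br.readLine()) != null) {
                muleClient.dispatch("vm://in", line, null);
            }
        } finally {
            // Closing the reader also closes the underlying file stream
            br.close();
        }
        return null;
    }
}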
Then, split your main flow in two:
<file:connector name="File"
workDirectory="yourWorkDirPath" autoDelete="false" streaming="true"/>
<flow name="CsvToFile" doc:name="Split and dispatch">
<file:inbound-endpoint path="inboxPath"
moveToDirectory="processedPath" pollingFrequency="60000"
doc:name="CSV" connector-ref="File">
<file:filename-wildcard-filter pattern="*.csv"
caseSensitive="true" />
</file:inbound-endpoint>
<component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property" />
<component class="com.dgonza.CSVReader" doc:name="Split the file and dispatch every line to VM" />
</flow>
<flow name="storeInDatabase" doc:name="receive lines and store in database">
<vm:inbound-endpoint exchange-pattern="one-way"
path="in" doc:name="VM" />
<choice>
<!-- Your JDBC stuff -->
</choice>
</flow>
Maintain your current file-connector configuration to enable streaming. With this solution the csv data can be processed without the need to load the entire file to memory first.
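Dispatching one VM message per line can mean a very large number of messages for a 144MB file. As a variation on the component above, here is a minimal sketch in plain Java (no Mule classes) that groups lines into chunks before handing them over. The class name CsvChunker, the chunk size and the UTF-8 charset are assumptions for illustration; in the component, the sink is where a call like muleClient.dispatch("vm://in", chunk, null) would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: reads a stream line by line and emits fixed-size chunks.
final class CsvChunker {

    // Returns the number of chunks handed to the sink.
    static int forEachChunk(InputStream in, int chunkSize, Consumer<List<String>> sink)
            throws IOException {
        int chunks = 0;
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    sink.accept(chunk);
                    chunks++;
                    // Start a fresh list so the sink can keep the one it received.
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            // Hand over the last, possibly smaller, chunk.
            if (!chunk.isEmpty()) {
                sink.accept(chunk);
                chunks++;
            }
        }
        return chunks;
    }
}
```

Only one chunk is held in memory at a time, so the memory footprint depends on the chunk size rather than on the file size.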
HTH

I believe that the csv-to-maps-transformer is going to force the whole file into memory. Since you are dealing with one large file, personally, I would tend to just write a Java class to handle it. The File endpoint will pass a filestream to your custom transformer. You can then make a JDBC connection and pick off the information a row at a time without having to load the whole file. I have used OpenCSV to parse the CSV for me. So your java class would contain something like the following:
protected Object doTransform(Object src, String enc) throws TransformerException {
    // Make a JDBC connection here
    try {
        // With streaming enabled, the File endpoint hands over an InputStream
        InputStream csvFileData = (InputStream) src;
        BufferedReader br = new BufferedReader(new InputStreamReader(csvFileData));
        CSVReader reader = new CSVReader(br);
        // Read the CSV file one row at a time
        String[] nextLine;
        while ((nextLine = reader.readNext()) != null) {
            // Push your data into the database through your JDBC connection
        }
        // Close connection.
        // Placeholder: return whatever the rest of the flow needs
        return null;
    } catch (Exception e) {
        // Surface the failure instead of silently swallowing it
        throw new TransformerException(this, e);
    }
}
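For the "push your data into the database" step, here is a rough sketch of how rows could be sent in JDBC batches so that neither the file nor the full set of inserts is held in memory. It uses only java.sql from the JDK. The class name CsvBatchInserter and the batch size are assumptions, and the naive split on commas is a stand-in for a real parser such as OpenCSV (it does not handle quoted fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: streams CSV rows into a prepared INSERT in batches.
final class CsvBatchInserter {

    // Returns the number of rows read from the CSV.
    static int insertAll(BufferedReader csv, PreparedStatement insert, int batchSize)
            throws IOException, SQLException {
        int rows = 0;
        String line;
        while ((line = csv.readLine()) != null) {
            // Assumption: plain comma-separated values without quoting.
            String[] fields = line.split(",", -1);
            for (int i = 0; i < fields.length; i++) {
                insert.setString(i + 1, fields[i]); // JDBC parameters are 1-based
            }
            insert.addBatch();
            rows++;
            // Flush a full batch to the database.
            if (rows % batchSize == 0) {
                insert.executeBatch();
            }
        }
        // Flush whatever is left in the final partial batch.
        if (rows % batchSize != 0) {
            insert.executeBatch();
        }
        return rows;
    }
}
```

The PreparedStatement would be created from your own connection with an INSERT that has one placeholder per CSV column.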

Related

Caused by: org.apache.camel.InvalidPayloadException:

I have put together a file uploader with HTTP4 to push data to HTTP server environments. There are 2 systems, 1 for CSV data files and 1 for JSON delivery. Sending a CSV to the CSV system works fine. However, converting the data to JSON and sending it to the JSON system fails with the exception below. The error is intuitive, but I don't really know what to do about it.
Does JSON not use multipart form data, and should it specifically be an IO stream? This being my first HTTP service, I'm at a loss about which protocol is accepted. Apparently the JSON system does not expect multipart data, and I'm not sure what a JSON HTTP send should look like. Thank you for your help!
Caused by: org.apache.camel.InvalidPayloadException: No body available of type: java.io.InputStream but has value: org.apache.http.entity.mime.MultipartFormEntity
Caused by: No type converter available to convert from type: org.apache.http.entity.mime.MultipartFormEntity to the required type: java.io.InputStream with value org.apache.http.entity.mime.MultipartFormEntity
Here is the uploader class that works for CSV data files but not for JSON:
LOG.info("Uploading File for CustKey: " + custKey + " and Tenant: " + tenant);
StringBuilder authHeader = new StringBuilder("Bearer ");
authHeader.append(token);
LOG.info("Authorization: " + authHeader.toString());
exchange.setProperty("CamelCharsetName", "UTF-8"); //"CamelCharsetName" Exchange.CHARSET_NAME
exchange.getIn().setHeader("CamelHttpCharacterEncoding", "UTF-8"); //"CamelHttpCharacterEncoding" Exchange.HTTP_CHARACTER_ENCODING
exchange.getIn().setHeader("CamelAcceptContentType", "application/json"); //"CamelAcceptContentType" Exchange.ACCEPT_CONTENT_TYPE
exchange.getIn().setHeader("CamelHttpUri", uploadUrl); //"CamelHttpUri" Exchange.HTTP_URI
exchange.getIn().setHeader("CamelHttpMethod", "POST"); //"CamelHttpMethod" Exchange.HTTP_METHOD
exchange.getIn().setHeader("x-ge-csvformat", "ODB");
exchange.getIn().setHeader("Tenant", tenant);
// exchange.getIn().setHeader("Content-Type", "multipart/form-data"); //"Content-Type" ; boundary=
exchange.getIn().setHeader("Authorization", authHeader.toString());
// Process the file in the exchange body
File file = exchange.getIn().getBody(File.class);
String fileName = (String) exchange.getIn().getHeader(Exchange.FILE_NAME);
LOG.info("fileName: " + fileName);
MultipartEntityBuilder entity = MultipartEntityBuilder.create();
entity.addBinaryBody("file", file);
entity.addTextBody("name", fileName);
exchange.getIn().setBody(entity.build()); //*** use for CSV uploads
Here is the route. The only difference in the route is the jsonMapper process:
<route
id="core.predix.upload.route"
autoStartup="false" >
<from uri="{{uploadEntranceEndpoint}}" />
<process ref="customerEntitesProcessor" /> <!-- sets up the message with the customer environment entities to upload data -->
<process ref="customerTokenProcessor" /> <!-- sets up the message with the customer's token -->
<process ref="jsonMapper" />
<to uri="{{jsonEndpoint}}" />
<process ref="uploadProcessor" /> <!-- conditions the message with the HTTP header info per customer env -->
<setHeader headerName="CamelHttpUri">
<simple>${header.UPLOADURL}?throwExceptionOnFailure=false</simple>
</setHeader>
<setHeader headerName="CamelHttpMethod">
<constant>POST</constant>
</setHeader>
<to uri="http4://apm-timeseries-query-svc-prod.app-api.aws-usw02-pr.predix.io:443/v2/time_series/upload?throwExceptionOnFailure=false" />
<log message="After POSTING JSON: ${body}" loggingLevel="INFO"/>
<to uri="{{afteruploadLocation}}" />
<!-- <log message="JSON Route: ${body}" loggingLevel="INFO"/> -->
<!-- <to uri="{{jsonEndpoint}}" /> -->
</route>
Apparently JSON is sent as an IO stream. The multipart form data Content-Type is for file transfers only, which is what the CSV system expects.
To get this to work I simply send the data as an IO stream and changed the Content-Type to application/json.
The code changed to the following to send JSON:
StringBuilder authHeader = new StringBuilder("Bearer ");
authHeader.append(token);
LOG.info("Authorization: " + authHeader.toString());
exchange.setProperty("CamelCharsetName", "UTF-8"); //"CamelCharsetName" Exchange.CHARSET_NAME
exchange.getIn().setHeader("CamelHttpCharacterEncoding", "UTF-8"); //"CamelHttpCharacterEncoding" Exchange.HTTP_CHARACTER_ENCODING
exchange.getIn().setHeader("CamelAcceptContentType", "application/json"); //"CamelAcceptContentType" Exchange.ACCEPT_CONTENT_TYPE
exchange.getIn().setHeader("CamelHttpUri", uploadUrl); //"CamelHttpUri" Exchange.HTTP_URI
exchange.getIn().setHeader("CamelHttpMethod", "POST"); //"CamelHttpMethod" Exchange.HTTP_METHOD
exchange.getIn().setHeader("x-ge-csvformat", "ODB");
exchange.getIn().setHeader("Tenant", tenant);
exchange.getIn().setHeader("Content-Type", "application/json");
exchange.getIn().setHeader("Authorization", authHeader.toString());
//*** use for CSV uploads - uncomment all commented lines for CSV file upload
// Process the file in the exchange body
// File file = exchange.getIn().getBody(File.class);
// String fileName = (String) exchange.getIn().getHeader(Exchange.FILE_NAME);
// LOG.info("fileName: " + fileName);
// MultipartEntityBuilder entity = MultipartEntityBuilder.create();
// entity.addBinaryBody("file", file);
// entity.addTextBody("name", fileName);
// exchange.getIn().setBody(entity.build());
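To make the difference from the multipart upload concrete outside of Camel, here is a small sketch that swaps in the JDK's own java.net.http.HttpRequest (Java 11+) instead of the http4 component. It only builds the request and does not send it. The class name JsonUpload and its parameters are assumptions for illustration; the point is that the body is the raw JSON text and the Content-Type is application/json, with no multipart entity involved.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds (but does not send) a JSON POST request.
final class JsonUpload {

    static HttpRequest build(String uploadUrl, String token, String tenant, String json) {
        return HttpRequest.newBuilder(URI.create(uploadUrl))
                // JSON goes out as a plain body, not as multipart/form-data
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + token)
                .header("Tenant", tenant)
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }
}
```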

API Management unable to cast response body as string

In Azure API Management, when the response going back to the client is a 500, I wish to check the body of the response to see if it matches "Some text". I need to do this so that I may change the body of the response to contain some more helpful text in this particular scenario.
The following <outbound> section of my policy is accepted by the API Management console, but when I test and get a 500, API Management generates an error -
Expression evaluation failed. Unable to cast object of type 'Microsoft.WindowsAzure.ApiManagement.Proxy.Gateway.MessageBody' to type 'System.String'.
I'm guessing this is my fault, but does anybody know how I can amend the policy so that it does not generate an error? To clarify, the error is being generated by this line - ((string)(object)context.Response.Body == "Some text").
<outbound>
<choose>
<when condition="@((context.Response.StatusCode == 500) && ((string)(object)context.Response.Body == "Some text"))">
<set-status code="500" reason="Internal Server Error" />
<set-header name="Content-Type" exists-action="override">
<value>application/json</value>
</set-header>
<set-body>
{
"statusCode": "500",
"Message": "Some different, more helpful text."
}
</set-body>
</when>
</choose>
</outbound>
Update
I've discovered that context.Response.Body is of type IMessageBody. There seems to be woefully little documentation around this type, and the only reference I can find comes under <set-body> in the Transformation Policies API Management documentation.
The trouble is, the example that MS have documented produces an exception when I try to save my policy -
<set-body>
@{
JObject inBody = context.Request.Body.As<JObject>();
if (inBody.attribute == <tag>) {
inBody[0] = 'm';
}
return inBody.ToString();
}
</set-body>
Property or indexer 'string.this[int]' cannot be assigned to -- it is read only
Try context.Request.Body.As<string>(). The As method currently supports the following types as its generic argument:
byte[]
string
JToken
JObject
JArray
XNode
XElement
XDocument
Mind that if you call .As<JObject>() on a response that does not contain valid JSON, you will get an exception. The same applies to the other types.

Extract fields from JSON response in Mule

I have a JSON response which is like {"id":10,"name":"ABCD","deptId":0,"address":null}
I need to split this JSON and extract the id to pass on to another service.
My mule xml is as below
<jersey:resources doc:name="REST">
<component class="com.employee.service.EmployeeService"/>
</jersey:resources>
<object-to-string-transformer doc:name="Object to String"/>
<logger message="Employee Response #[payload]" level="INFO" doc:name="Logger"/>
<set-payload value="#[payload]" doc:name="Set Payload" />
<json:object-to-json-transformer doc:name="Convert String to JSON" />
<logger message="JSON Response #[payload]" level="INFO" doc:name="Logger"/>
<json:json-to-object-transformer returnClass="java.util.Map" />
<expression-transformer expression="#[payload]" />
<collection-splitter />
When I run this I get the error
Object "java.util.LinkedHashMap" not of correct type. It must be of type "{interface java.lang.Iterable,interface java.util.Iterator,interface org.mule.routing.MessageSequence,interface java.util.Collection}" (java.lang.IllegalArgumentException). Message payload is of type: LinkedHashMap
How can I fix this error?
Thanks
I was able to get this done by writing a custom converter
Remove the last four lines of your code. Set #[payload.id] in a flow variable (or log it) and access it from there.
I believe the error you are getting is already on this part, <collection-splitter />. Have you debugged this already?
Not sure what the splitter is for but you can simply do #[payload.id] to get id once you have a HashMap type of payload.
The JSON module also supports jsonpath in expressions, such as:
#[json:/id]
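To illustrate why the splitter complains while #[payload.id] works, here is a minimal plain-Java sketch (no Mule classes; the class and method names are hypothetical). The error message lists Iterable, Iterator, MessageSequence and Collection as accepted types, and a Map is none of those, whereas reading a single key from it is a direct lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper mirroring what the expression #[payload.id] does on a Map payload.
final class PayloadFields {

    // Direct key lookup, no splitting needed.
    static Object idOf(Map<String, Object> payload) {
        return payload.get("id");
    }

    // A Map itself is not Iterable, which is why a collection splitter rejects it.
    static boolean looksSplittable(Object payload) {
        return payload instanceof Iterable;
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("id", 10);
        payload.put("name", "ABCD");
        System.out.println(idOf(payload));
        System.out.println(looksSplittable(payload));
        System.out.println(looksSplittable(payload.values()));
    }
}
```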

Insert json object in mysql DB with Mule ESB

Good evening!
I'm trying to insert an entire JSON object into a MySQL table. I'm using the JSON to Object transformer to convert the JSON into a HashMap. The JSON is this:
{
"content": {
"fill": "none",
"stroke": "#fff",
"path": [
["M", 422, 115],
["L", 472, 167.5]
],
"stroke-width": 4,
"stroke-linecap": "round",
"stroke-linejoin": "round",
"transform": [],
"type": "path",
"note": {
"id": 47,
"page":0,
"ref": 3,
"txt": "teste do serviço",
"addedAt": 1418133743604,
"addedBy": "valter.gomes"
}
}
}
I need to insert the "content" object, but when I try to access it with #[payload.content], it throws an exception:
Root Exception stack trace:
java.sql.SQLException: Incorrect string value: '\xAC\xED\x00\x05sr...' for column 'content' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
+ 3 more (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
We found what I think is a workaround. Before converting into a HashMap, I put the "content" object into a variable with #[json:content] and record it in the DB as #[flowVars.rawContent]. When I retrieve it from the DB, I convert the ResultSet into a String using the Object to String transformer.
But I'm not comfortable with this solution. Is this the right way to do it, or is there another, better one?
Thanks a lot for your help.
When you receive the JSON you can transform it to a Map (by default json:json-to-object-transformer returns JsonData, which is why I have specified the Map class). After that you can read content from the payload using #[payload.content].
Here is my flow:
<flow name="demoFlow1" doc:name="demoFlow1">
<http:inbound-endpoint exchange-pattern="request-response"
host="localhost" port="8081" path="demo" doc:name="HTTP" />
<scripting:component doc:name="Groovy">
<scripting:script engine="Groovy"><![CDATA[
Map<String, Object> map1 = new HashMap<String, Object>();
map1.put("fill","none");
map1.put("stroke","#fff");
Map<String, Object> map = new HashMap<String, Object>();
map.put("content", map1);
return map;]]></scripting:script>
</scripting:component>
<json:object-to-json-transformer doc:name="Object to JSON"/>
<logger level="INFO" message=">>1 #[payload]" doc:name="Logger" />
<json:json-to-object-transformer returnClass="java.util.Map" doc:name="JSON to Object"/>
<set-payload value="#[payload.content]" doc:name="Set Payload"/>
<json:object-to-json-transformer doc:name="Object to JSON"/>
<logger level="INFO" message=">>3 #[payload]" doc:name="Logger" />
</flow>
Eddú is right, but the example he gives is really too complex.
As he said, all you need is:
<json:json-to-object-transformer returnClass="java.util.Map" />
After that transformer, you can retrieve any field/sub-field in the Map. I suggest using message.payload instead of payload by the way, the latter has shown some odd behaviours in the past.
So use: #[message.payload.content]
Also, this will give you an object of type java.util.Map. Not sure how you're going to insert the object in the DB but since you are not showing this part in your question, I imagine you'll figure it out...
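On the "Incorrect string value: '\xAC\xED\x00\x05sr...'" error from the question: those bytes look like the header of a Java serialization stream, which would suggest the Map was bound to the column as a serialized Java object rather than as JSON text. A small plain-Java sketch (the class name is hypothetical) that shows the same prefix when a HashMap is serialized:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical helper: shows the first bytes of a Java-serialized object.
final class SerializedMapHeader {

    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Upper-case hex of the first n bytes.
    static String hexPrefix(byte[] data, int n) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < n && i < data.length; i++) {
            hex.append(String.format("%02X", data[i] & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new HashMap<String, Object>());
        // The stream starts with AC ED 00 05, followed by 0x73 ('s') and 0x72 ('r').
        System.out.println(hexPrefix(data, 6));
    }
}
```

If that is what happens here, converting the Map back to a JSON string before the insert (as the Object to JSON transformer does in the flow above) would store readable text instead.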

MULE ENRICH with incoming pojo

I want to add some extra information to an incoming POJO. I have used the message enricher in Mule to do that; my full flow is below. I use a sub-flow to take the payload, select some values from the DB, set those values on the same POJO and return it. As the target I set the payload, but I get this error: "An Expression Enricher for "payload" is not registered with Mule."
Here is my flow:
<enricher doc:name="Message Enricher">
<core:flow-ref name="flows1Flow1" doc:name="Flow Reference"/>
<enrich source="#[groovy:payload]" target="#[payload]"/>
</enricher>
<logger message="AFTER Enrich: #[payload]" level="INFO" doc:name="Logger"/>
<component class="com.enrich.AfterEnricher" doc:name="Java"/>
<sub-flow name="flows1Flow1" doc:name="flows1Flow1">
<component class="com.enrich.MessageEnrichPattern" doc:name="Java"/>
<jdbc:outbound-endpoint exchange-pattern="request-response" queryKey="selectData" connector-ref="jdbcConnector" doc:name="Database (JDBC)">
<jdbc:query key="selectData" value="SELECT Username, Password, ModuleId from Credentials where ModuleId=#[map-payload:moduleId]"/>
</jdbc:outbound-endpoint>
<logger message="#[payload]" level="INFO" doc:name="Logger"/>
<component class="com.enrich.ReceiveMessageEnrichPattern" doc:name="Java"/>
</sub-flow>
Here ReceiveMessageEnrichPattern returns:
Credential credential = new Credential();
credential.setUname(hashMap.get("USERNAME").toString());
credential.setPwd(hashMap.get("PPPP").toString());
credential.setMid(hashMap.get("MODULEID").toString());
return credential;
But in the component after the enricher I get an exception. Please help me: how can I enrich my incoming POJO with extra information?
According to the docs, Mule currently only supports two targets for enrichment:
flow variables,
message headers.
To achieve your goal you need to:
store the enricher result (Credential object) in a flow variable,
use a custom transformer to copy the values from the Credential object found in the flow variable to the POJO payload in your main flow.
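The copy step in the second point could look roughly like this in plain Java. The setters on Credential come from the question; the getters, the IncomingRequest class and its fields are hypothetical stand-ins for your own POJO. Inside a custom transformer, the Credential would be read from the flow variable and the payload passed in as the target.

```java
// Setter names follow the question; the getters are assumed to exist.
final class Credential {
    private String uname;
    private String pwd;
    private String mid;

    void setUname(String uname) { this.uname = uname; }
    void setPwd(String pwd) { this.pwd = pwd; }
    void setMid(String mid) { this.mid = mid; }
    String getUname() { return uname; }
    String getPwd() { return pwd; }
    String getMid() { return mid; }
}

// Hypothetical stand-in for the incoming POJO payload.
final class IncomingRequest {
    String moduleId;
    String username;
    String password;
}

// Hypothetical helper: copies the looked-up values onto the original payload.
final class CredentialMerger {

    static IncomingRequest merge(IncomingRequest payload, Credential credential) {
        payload.username = credential.getUname();
        payload.password = credential.getPwd();
        // The same object is returned, so the rest of the flow keeps its payload type.
        return payload;
    }
}
```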