Append to CSV file with header in Apache Camel

I need to build a csv file based on incoming messages. I do this by appending to the file with:
.toD("file://" + OUTPUT_PATH + "?FileName=${exchangeProperty.OUTPUT_FILENAME}" + "&FileExist=Append")
While this works fine, I've run into one problem: I also need to include a header row in the CSV file. Right now I'm marshalling into CSV format with .marshal().csv(), but that omits the header.
While I can create a CSV format with header with:
CsvDataFormat csvFormatWithHeader = new CsvDataFormat();
csvFormatWithHeader.setHeader(Arrays.asList("A", "B", "C", "D"));
.marshal(csvFormatWithHeader)
That will add the header row for every row I append, though.
So what I want to achieve is to add the header only when a new file is created.
I've been trying two approaches but haven't gotten either to work:
Check if the file exists in the route and apply the csv format accordingly
Set the marshall dataformat with a bean or method
As a final option I could add the header when the file is closed but that feels inefficient as I don't know how big that file might become.
How can I achieve either of these approaches with Apache Camel 2.23.2?

OK, ultimately I was able to get option 1 to work by using a message header to carry the result of the file check. This is what my route looks like now:
.setProperty("OUTPUT_FILENAME",method(this, "determineOutputFilename()"))
.setHeader("fileExists", method(this, "outputFileExists"))
.choice()
.when(header("fileExists"))
.marshal().csv()
.endChoice()
.otherwise()
.marshal(csvFormatWithHeader)
.endChoice()
.end()
The file check logic is (implemented within the route class):
public boolean outputFileExists(@ExchangeProperty("OUTPUT_FILENAME") String fileName) {
    boolean fileExists = new File(PROCESSING_PATH + "/" + fileName).exists();
    return fileExists;
}
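For reference, here is a minimal sketch of how these pieces could fit together in one RouteBuilder. The class name, paths, input endpoint, and the date-based determineOutputFilename body are illustrative placeholders, not taken from the question:

import java.io.File;
import java.time.LocalDate;
import java.util.Arrays;

import org.apache.camel.ExchangeProperty;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.csv.CsvDataFormat;

public class CsvOutputRoute extends RouteBuilder {

    // placeholder paths; in a real route these would come from configuration
    private static final String PROCESSING_PATH = "/data/processing";
    private static final String OUTPUT_PATH = "/data/output";

    private final CsvDataFormat csvFormatWithHeader = new CsvDataFormat();

    @Override
    public void configure() throws Exception {
        csvFormatWithHeader.setHeader(Arrays.asList("A", "B", "C", "D"));

        from("direct:toCsv")
            .setProperty("OUTPUT_FILENAME", method(this, "determineOutputFilename()"))
            .setHeader("fileExists", method(this, "outputFileExists"))
            .choice()
                .when(header("fileExists"))
                    .marshal().csv()              // file already exists: no header row
                .otherwise()
                    .marshal(csvFormatWithHeader) // new file: include the header row
            .end()
            .toD("file://" + OUTPUT_PATH + "?fileName=${exchangeProperty.OUTPUT_FILENAME}&fileExist=Append");
    }

    public String determineOutputFilename() {
        // placeholder: derive the target file name, e.g. from the current date
        return "output-" + LocalDate.now() + ".csv";
    }

    public boolean outputFileExists(@ExchangeProperty("OUTPUT_FILENAME") String fileName) {
        return new File(PROCESSING_PATH + "/" + fileName).exists();
    }
}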

Related

How to simulate a JSON file (body data) in JMeter using the HTTP POST method

My test objective:
Using the POST method I have to send JSON data, but there are, for example, A, B, and C properties in that JSON file that need to be updated so that each request is treated as a different request at the application end.
Preparing lots of .json files and providing them as input to JMeter is not a feasible solution. How can I simulate the data in the .json file?
I have tried the CSV config approach, but my packet is not formed by appending the CSV file values. Kindly help me: how can I use the variable method to achieve this?
Note: this is not the full .json file, just part of the body:
"Properties": {
    "AccessToken": "111111111-11111-11111-1111111111",
    "InstallationId": "E1",
    "AgentType": "xxx",
    "AgentId": "Vxxx",
    "SentDateTime": "2018-07-19-13-50-24-5916045",
    "SourceDateTimeOfEvent": "2018-07-19T13:50:24.5916045+05:30Z",
    "DateTimeOfEvent": "2018-07-19T08:06:24.5786045Z",
    "MachineName": "AS-72"
}
These three parameters need to be updated with every new POST request: SentDateTime, SourceDateTimeOfEvent, DateTimeOfEvent. The rest of the body should remain the same.
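One way to approach this (a sketch of a common technique, not taken from the question) is to keep the static parts of the body literal in the HTTP Request sampler and replace only the three timestamp fields with JMeter's built-in __time function, which formats the current time using a SimpleDateFormat pattern, so every POST gets fresh values without preparing many .json files. The format strings below are only illustrative and would need to be adapted to the exact formats the application expects:

"Properties": {
    "AccessToken": "111111111-11111-11111-1111111111",
    "InstallationId": "E1",
    "AgentType": "xxx",
    "AgentId": "Vxxx",
    "SentDateTime": "${__time(yyyy-MM-dd-HH-mm-ss-SSS)}",
    "SourceDateTimeOfEvent": "${__time(yyyy-MM-dd'T'HH:mm:ss.SSSZ)}",
    "DateTimeOfEvent": "${__time(yyyy-MM-dd'T'HH:mm:ss.SSS'Z')}",
    "MachineName": "AS-72"
}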

Postman - use CSV as input

I am using Postman to test a microservice and I was wondering if you can do something like this.
have a collection with 2 GET requests (request1, request2) that have userId as one of the headers
have a CSV file with two values for userId: test1, test2
run the collection using the CSV file like this: request1 uses userId=test1 and request2 uses userId=test2
I know you can run the collection so that it iterates through each request for every value in the CSV file, but I would like to map each request to a value in the CSV file. Is this possible? If yes, how can you do that?
CSV files are only accepted as iteration data files, so to
"is it possible to map each request to a value in the CSV file?"
the answer is: no, you can't.
You should give us a more detailed question, but I feel that this link will be helpful.
Alternatively, you can use JSON as the data file instead of CSV and build a construction like this:
First, set the environment variable "count" to 0.
JSON:
[
    {
        "UserIdHeaderValue": [
            "firstValue",
            "secondValue",
            "thirdValue"
        ]
    }
]
And in Scripts:
pre-request:
var count = parseInt(pm.variables.get("count"));
pm.variables.set("headerValue", data.UserIdHeaderValue[count]);
// the request can now use that header value via {{headerValue}}
pm.environment.set("count", count + 1);

Create a CSV file from MarkLogic using the Java Client API (DMSDK)

I want to create a CSV file for 1.3M records from my MarkLogic DB. I tried using CORB for that, but it took more time than I expected.
My data is like this:
{
    "One": {
        "Name": "One",
        "Country": "US"
    },
    "Two": {
        "State": "kentucky"
    },
    "Three": {
        "Element1": "value1",
        "Element2": "value2",
        "Element3": "value3",
        "Element4": "value4",
        so on ...
    }
}
Below are my CORB modules.
selector.sjs
var total = cts.uris("", null, cts.collectionQuery("data"));
fn.insertBefore(total,0,fn.count(total))
transform.sjs (where I am keeping all the elements in an array)
var name = fn.tokenize(URI, ";");
const node = cts.doc(name);
var a= node.xpath("/One/*");
var b= node.xpath("/Two/*");
var c= node.xpath("/Three/*");
fn.stringJoin([a, b, c,name], " , ")
my properties file
THREAD-COUNT=16
BATCH-SIZE=1000
URIS-MODULE=selector.sjs|ADHOC
PROCESS-MODULE=transform.sjs|ADHOC
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
EXPORT-FILE-NAME=Report.csv
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask
EXPORT-FILE-TOP-CONTENT=Col1,col2,....col16 -- i have 16 columns
It took more than 1 hour to create the CSV file. Also, to try this on a cluster I would need to configure a load balancer first, whereas the Java Client API distributes the work among all nodes without any load balancer.
How can I implement the same in the Java Client API? I know I can trigger the transform module using ServerTransform and ApplyTransformListener.
public static void main(String[] args) {
    DatabaseClient client = DatabaseClientFactory.newClient(
            "localhost", pwd, "x", "x", DatabaseClientFactory.Authentication.DIGEST);
    // here I am implementing the same logic as the transform module above
    ServerTransform txform = new ServerTransform("tsm");
    QueryManager qm = client.newQueryManager();
    StructuredQueryBuilder query = qm.newStructuredQueryBuilder();
    query.collection();
    DataMovementManager dmm = client.newDataMovementManager();
    QueryBatcher batcher = dmm.newQueryBatcher(query.collections("data"));
    batcher.withBatchSize(2000)
           .withThreadCount(16)
           .withConsistentSnapshot()
           .onUrisReady(
               new ApplyTransformListener().withTransform(txform))
           .onBatchSuccess(batch -> {
               System.out.println(
                   batch.getTimestamp().getTime() +
                   " documents written: " +
                   batch.getJobWritesSoFar());
           })
           .onBatchFailure((batch, throwable) -> {
               throwable.printStackTrace();
           });
    // start the job and feed input to the batcher
    dmm.startJob(batcher);
    batcher.awaitCompletion();
    dmm.stopJob(batcher);
    client.release();
}
But how can I send the CSV file header, like the one in CORB (i.e. EXPORT-FILE-TOP-CONTENT)? Is there any documentation for implementing a CSV file? Which class implements that?
Any help is appreciated.
Thanks
Probably the easiest option is ml-gradle's Exporting data to CSV, which uses the Java Client API and DMSDK under the hood.
Note that you'll probably want to install a server-side REST transform to extract only the data you want in the CSV output, rather than download the entire doc contents then extract on the Java side.
For a working example of the code required to use DMSDK and create an aggregate CSV (one CSV for all records), see ExportToWriterListenerTest.testMassExportToWriter. For the sake of SO, here's the key code snippet (with a couple of minor simplifications, including writing column headers (untested code)):
try (FileWriter writer = new FileWriter(outputFile)) {
  // write the column headers once, before any records
  writer.write("uri,collection,contents\n");
  writer.flush();
  ExportToWriterListener exportListener = new ExportToWriterListener(writer)
    .withRecordSuffix("\n")
    .withMetadataCategory(DocumentManager.Metadata.COLLECTIONS)
    .onGenerateOutput(
      record -> {
        String uri = record.getUri();
        String collection = record.getMetadata(new DocumentMetadataHandle()).getCollections().iterator().next();
        String contents = record.getContentAs(String.class);
        return uri + "," + collection + "," + contents;
      }
    );

  QueryBatcher queryJob =
    moveMgr.newQueryBatcher(query)
      .withThreadCount(5)
      .withBatchSize(10)
      .onUrisReady(exportListener)
      .onQueryFailure( throwable -> throwable.printStackTrace() );
  moveMgr.startJob( queryJob );
  queryJob.awaitCompletion();
  moveMgr.stopJob( queryJob );
}
However, unless you know your content has no double quotes, newlines, or non-ascii characters, a CSV library is recommended to make sure your output is properly escaped. To use a CSV library, you can of course use any tutorial out there for your library. You don't need to worry about thread safety because ExportToWriterListener runs your listeners in a synchronized block to prevent overlapping writes to the writer. Here's an example of using one CSV library, Jackson CsvMapper.
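A minimal, untested sketch of that approach (not taken from the test above): it assumes the jackson-dataformat-csv dependency is on the classpath and reuses the writer, ExportToWriterListener, and DocumentMetadataHandle setup from the previous snippet; only the onGenerateOutput body changes, so that CsvMapper takes care of the escaping.

import java.util.LinkedHashMap;
import java.util.Map;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// build the mapper and schema once and reuse them for every record
CsvMapper csvMapper = new CsvMapper();
CsvSchema schema = CsvSchema.builder()
  .addColumn("uri")
  .addColumn("collection")
  .addColumn("contents")
  .build(); // no header row here; it is written manually before the job starts

ExportToWriterListener exportListener = new ExportToWriterListener(writer)
  .withRecordSuffix("\n")
  .withMetadataCategory(DocumentManager.Metadata.COLLECTIONS)
  .onGenerateOutput(record -> {
    String uri = record.getUri();
    String collection = record.getMetadata(new DocumentMetadataHandle()).getCollections().iterator().next();
    String contents = record.getContentAs(String.class);
    Map<String, String> row = new LinkedHashMap<>();
    row.put("uri", uri);
    row.put("collection", collection);
    row.put("contents", contents);
    try {
      // CsvMapper escapes commas, quotes and newlines inside the values;
      // trim the trailing line separator since withRecordSuffix already adds one
      return csvMapper.writer(schema).writeValueAsString(row).trim();
    } catch (JsonProcessingException e) {
      throw new RuntimeException(e);
    }
  });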
Please note that you don't have to use ExportToWriterListener... you can use it as a starting point to write your own listener. In particular, since your major concern is performance, you may want to have your listeners write to one file per thread, then post-process to combine things together. It's up to you.

Unable to print complete JSON using console.log

I am trying to import a JSON file into the Sample variable, but only the first few characters of Sample are displayed.
The sample.json file is 2,000,000 characters. When I print the Sample variable to the console, only the first 3,756 characters are printed. Is there any limitation on the number of characters that can be printed through console.log?
The complete data is present in the Sample variable; I verified this by searching for strings that occur at the end of the sample.json file.
var Sample = require('./sample.json');

export default class proj extends Component {
  constructor(props) {
    super(props);
    this.state = {
      locations: [],
    };
  }

  loadOnEvent() {
    console.log(Sample);
    //this.state = { locations: Sample };
  }
}
Is there any other way to print the data in the Sample variable?
You have to convert the JSON to a string using JSON.stringify before logging it.
/* ... */
loadOnEvent() {
  console.log(JSON.stringify(Sample));
  //this.state = { locations: Sample };
}
/* ... */
Try another way to load the file: use fetch if the file is remote, or fs if it is local.
If it is the memory problem that @Shota suggested, consider processing requests to the JSON file on the server side. A good solution is to set up a microservice that loads the JSON file at startup and handles requests against the data structure parsed from it.
Answer for webpack use case:
Configure webpack to use file-loader or copy-webpack-plugin specifically for this file, because it is quite big. Consider loading it in parallel with the webpack bundle. If your application has big parts that are not needed in every case, they should be moved into separate bundles.

Store and update JSON Data on a Server

My web application should be able to store, update, and load JSON data on a server.
However, the data may contain some big arrays where, each time they are saved, only a new entry has been appended.
My solution:
send updates to the server together with a key path into the JSON data.
Currently I'm sending the data with an XMLHttpRequest via jQuery, like this:
/**
 * Asynchronously writes a file on the server (via PHP script).
 * @param {String} file complete filename (path/to/file.ext)
 * @param content content that should be written; may be a JS object.
 * @param {Array} updatePath (optional), JSON only. Not the entire file is written,
 *     but the given path within the object is updated. By default the path is supposed to contain an array and the
 *     content is appended to it.
 * @param {String} key (optional) in combination with updatePath. If a key is provided, then the content is written
 *     to a field named after this parameter's content at the data located at the updatePath of the old content.
 *
 * @returns {Promise}
 */
io.write = function (file, content, updatePath, key) {
    if (utils.isObject(content)) content = JSON.stringify(content, null, "\t");
    file = io.parsePath(file);
    var data = {f: file, t: content};
    if (typeof updatePath !== "undefined") {
        if (Array.isArray(updatePath)) updatePath = updatePath.join('.');
        data.a = updatePath;
        if (typeof key !== "undefined") data.k = key;
    }
    return new Promise(function (resolve, reject) {
        $.ajax({
            type: 'POST',
            url: io.url.write,
            data: data,
            success: function (data) {
                data = data.split("\n");
                if (data[0] == "ok") resolve(data[1]);
                else reject(new Error((data[0] == "error" ? "PHP error:\n" : "") + data.slice(1).join("\n")));
            },
            cache: false,
            error: function (j, t, e) {
                reject(e);
                //throw new Error("Error writing file '" + file + "'\n" + JSON.stringify(j) + " " + e);
            }
        });
    });
};
On the server, a PHP script manages the rest like this:
    receives the data and checks if it is valid
    checks if the given file path is writable
    if the file exists and is .json
        reads it and decodes the JSON
        returns an error on invalid JSON
    if there is no update path given
        just writes the data
    if there is an update path given
        returns an error if the update path in the JSON data can't be traversed (or the file didn't exist)
        updates the data at the update path
    writes the pretty-printed JSON to the file
However, I'm not perfectly happy with it, and problems have kept coming up over the last weeks.
My questions
Generally: how would you approach this problem? Alternative suggestions, databases, or any libraries that could help?
Note: I would prefer solutions that just use PHP or some standard Apache stuff.
One problem was that sometimes multiple writes to the same file were triggered. To avoid this I used Promises on the client side (wrapped, because I read that jQuery's deferred objects aren't Promises/A+ compliant), but I don't feel 100% sure it is working. Is there a (file) lock in PHP that works across multiple requests?
Every now and then the JSON files break, and it's not clear to me how to reproduce the problem. When a file breaks, I don't have a history of what happened. Are there any general debugging strategies for a client/server saving/loading process like this?
I wrote a comet-enabled web server that does diffs on updates of JSON data structures, for exactly the same reason. The server keeps a few versions of a JSON document and serves clients with different versions of the document, along with the updates they need to get to the most recent version of the JSON data.
Maybe you could reuse some of my code, written in C++ and CoffeeScript: https://github.com/TorstenRobitzki/Sioux
If you have concurrent write accesses to your data structure, are you sure that whoever writes to the file had the right version of the file in mind when reading it?