ArangoDB: How to export collection to CSV? - csv

I have noticed there is a feature in the web interface of ArangoDB which allows users to download or upload data as a JSON file. However, I find nothing similar for CSV export. How can an existing ArangoDB collection be exported to a .csv file?

If you want to export data from ArangoDB to CSV, then you should use Arangoexport. It is included in the full packages as well as the client-only packages. You can find it next to the arangod server executable.
Basic usage:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-csv
Also see the CSV example with AQL query:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-via-aql-query
Using an AQL query for a CSV export allows you to transform the data if desired, e.g. to concatenate an array to a string or unpack nested objects. If you don't do that, then the JSON serialization of arrays/objects will be exported (which may or may not be what you want).
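To illustrate the kind of transformation meant here, the following is a minimal Python sketch (it is neither Arangoexport nor AQL) that flattens documents before writing CSV: arrays are joined into one string and nested objects are unpacked into dotted column names. The sample documents, the field names, the ";" separator and the flatten/to_csv helpers are all hypothetical.

```python
import csv
import io


def flatten(doc, prefix=""):
    """Flatten one document: unpack nested dicts, join lists into a string."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and prefix the child keys.
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Array: concatenate the items into a single cell.
            flat[name] = ";".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


def to_csv(docs):
    """Render a list of documents as CSV text with a header row."""
    rows = [flatten(d) for d in docs]
    # Column order follows first appearance across all rows.
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample documents.
docs = [
    {"_key": "1", "name": "Alice", "tags": ["a", "b"], "address": {"city": "Cologne"}},
    {"_key": "2", "name": "Bob", "tags": [], "address": {"city": "Berlin"}},
]
csv_text = to_csv(docs)
print(csv_text)
```

Without such a step, the tags and address columns would each hold a serialized JSON value instead of plain text.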

The default Arango install includes the following file:
/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js
It includes this comment:
// This is a generic CSV exporter for collections.
//
// Usage: Run with arangosh like this:
// arangosh --javascript.execute <CollName> [ <Field1> <Field2> ... ]
Unfortunately, at least in my experience, that usage tip is incorrect. Arango team, if you are reading this, please correct the file or correct my understanding.
Here's how I got it to work:
arangosh --javascript.execute "/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js" "<CollectionName>"
Please specify a password:
Then it sends the CSV data to stdout. (If you wish to send it to a file, you have to deal with the password prompt in some way.)

Related

Best data processing software to parse CSV file and make API call per row

I'm looking for ideas for an Open Source ETL or Data Processing software that can monitor a folder for CSV files, then open and parse the CSV.
For each CSV row the software will transform the CSV into a JSON format and make an API call to start a Camunda BPM process, passing the cell data as variables into the process.
Looking for ideas,
Thanks
You can use a Java WatchService or Spring FileSystemWatcher as discussed here with examples:
How to monitor folder/directory in spring?
referencing also:
https://www.baeldung.com/java-nio2-watchservice
Once you have picked up the CSV you can use my example here as inspiration or extend it: https://github.com/rob2universe/csv-process-starter specifically
https://github.com/rob2universe/csv-process-starter/blob/main/src/main/java/com/camunda/example/service/CsvConverter.java#L48
The example starts a configurable process for every row in the CSV and includes the content of the row as a JSON process data.
I wanted to limit the dependencies of this example, so the CSV parsing logic applied is very simple. Commas inside field values may break the example, and special characters may not be handled correctly. A more robust implementation could replace the simple Java String .split(",") with an existing CSV parser library such as OpenCSV.
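As a rough illustration of why a plain split on commas is fragile, here is a small Python sketch. It uses Python's standard csv module as a stand-in for a parser library such as OpenCSV, and it is not code from the linked repository; the sample row and column names are made up.

```python
import csv
import io
import json

# Hypothetical CSV with a quoted comma and escaped quotes in the second row.
sample = 'id,name,comment\n1,"Smith, John","said ""hi"""\n'

# Naive splitting cuts the quoted name in two, so the field count is wrong.
naive = sample.splitlines()[1].split(",")

# A real CSV parser honours the quoting; each row becomes one JSON payload.
payloads = [json.dumps(row) for row in csv.DictReader(io.StringIO(sample))]

print(len(naive))
print(payloads[0])
```

Each entry in payloads could then be sent as the variables of one process start request.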
The file watcher would actually be a nice extension to the example. I may add it when I get around to it, but would also accept a pull request in case you fork my project.

How can I use any json or array or any js file in .testcaferc.json?

I have created one file, .testcaferc.json, that contains all configuration information such as browser name, specs, timeouts etc. I want to fetch the configuration data from another file so that I have to change the information in only one place.
I want to store all this information in a single file. I tried js, json and array formats, but I cannot import any of these files into my .testcaferc.json. When I press Alt+F8 I see the error "Expected a JSON object, array or literal".
Is there any way I can import json, array or js data in .testcaferc.json?
Thanks in advance!!
The JSON format doesn't support any import directives. The TestCafe configuration file (.testcaferc.json) is a simple JSON file. So, the TestCafe configuration file doesn't support such functionality.
To achieve your goal, you can transform the existing .testcaferc.json file before test running: load data from various sources and add/replace values for the appropriate data fields.
Also, there is a suggestion in the TestCafe GitHub repository, which will make your scenario easier to implement. Track it to be notified about its progress.
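The pre-processing step suggested above (building .testcaferc.json from other sources before the tests run) could look roughly like the following Python sketch. The helper names are hypothetical, the keys and values are only illustrative, and the temporary directory stands in for the project root.

```python
import json
import tempfile
from pathlib import Path


def build_config(base, *overrides):
    """Return a new dict: base settings updated by each override in order."""
    merged = dict(base)
    for extra in overrides:
        merged.update(extra)
    return merged


def write_config(config, path):
    """Write the merged settings as plain JSON, which is all the file may contain."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


# Hypothetical shared settings; in practice these might be loaded from other files.
base = {"browsers": ["chrome"], "src": "tests/"}
timeouts = {"selectorTimeout": 5000}

config = build_config(base, timeouts)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".testcaferc.json"
    write_config(config, target)
    written = json.loads(target.read_text(encoding="utf-8"))
print(written)
```

Running such a script as a step before the test command keeps the shared values in one place while the generated file stays ordinary JSON.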

ADF Merge-Copying JSON files in Copy Data Activity creates error for Mapping Data Flow

I am trying to do some optimization in ADF. The setup is that a third-party tool copies one JSON file per object to a BLOB storage container. These files feed a Mapping Data Flow (MDF). The individual files written by the third-party tool work great. If I copy these files to a different BLOB folder using an Azure Copy Data activity, the MDF can no longer parse the files and gives an error: "JSON parsing error, unsupported encoding or multiline." I started this with a Merge Files, but the outcome is the same regardless of the copy behavior I choose.
2ND EDIT: After another day's work, I have found that the Copy Activity Merge File from JSON to JSON definitely adds an EOL character to each single JSON object as it gets imported to the Merge file. I have also found that the MDF fails definitely with those EOL characters in the Merge file. If I remove all EOL characters from the Merge file, the same MDF will work. For me, this is a bug. The copy activity is adding a character that breaks the MDF. There seems to be a second issue in some of my data that doesn't fail as an individual file but does when concatenated that breaks the MDF when I try to pull all the files together, but I have tested the basic behavior on 1-5000 files and been able to repeat the fail/success tests.
I took the original file and the copied file and ran them through all sorts of tests. This is what I eventually found when I dumped them into Notepad++:
Copied file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\r\n
Original file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\n
If I change the copied file from ending with \r\n to \n, the MDF can read the file again. What is going on here? And how do I change the file write behavior or the MDF settings so that I can concatenate or copy files without the CRLF?
EDIT: NEW INFORMATION -- It seems on further review like maybe the minification/whitespace removal is the culprit. If I download the file created by the ADF copy and format it using a JSON formatter, it works. Maybe the CRLF -> LF masked something else. I'm not sure what to do at this point, but it's super frustrating.
Other possibly relevant information:
Both the source and sink JSON datasets are set to use UTF-8 (not default(UTF-8), although I tried that). Would a different encoding fix this?
I have tried remapping schemas, creating new data sets, creating new Mapping Data Flows, still get the same error.
EDITED for clarity based on comments:
In the case of a single JSON element in a file, I can get this to work -- data preview returns same success or failure as pipeline when run
In the case of multiple documents merged by ADF I get the below instead.
Repro: Create any valid JSON as a single file, put it in blob storage, use it as a source in a mapping data flow, to do any sink operation. Create a second file with same schema, get them both to run in same flow using wildcard paths. Use a Copy Activity with Merge Files as the Sink Copy Activity and Array of Objects as the File pattern. Try to make your MDF use this new file. If it fails, download the file created by ADF, run it through a formatter (I have used both VS Code -> "Format Document" from standard VS Code JSON extension, and VS 2019 "Unminify" command) and reupload... It should work now.
I don't know if you have already solved the problem, but I came across the exact same problem 3 days ago and after several tries I found a solution:
In the copy data activity under sink settings, use "set of objects" (instead of "array of objects") under File Pattern, so that the merged big JSON has the value of each original small JSON file written on its own line.
In the MDF, after setting up the wildcard paths with the *.json pattern, under JSON Settings select "Document per line" as the Document form.
After that you should be good to go; at least it solved my problem. The automatically written CRLF in the "array of objects" setting in the copy data activity should be a default setting and MSFT should provide the option to omit it in the settings in the future.
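The "one document per line" layout described in the two steps above can be sketched in Python as follows. This only illustrates the file format, not what ADF does internally; the second sample document and the read_documents helper are hypothetical.

```python
import json

# Two small documents with the same schema (the second one is made up).
docs = [
    {"CustomerMasterData": {"Customer": [{"ID": "123456", "name": "Customer Name"}]}},
    {"CustomerMasterData": {"Customer": [{"ID": "654321", "name": "Other Customer"}]}},
]

# Merge: one compact JSON document per line, no enclosing array.
merged = "".join(json.dumps(d, separators=(",", ":")) + "\n" for d in docs)


def read_documents(text):
    """Parse a 'document per line' file; splitlines() accepts LF and CRLF."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


parsed_lf = read_documents(merged)
parsed_crlf = read_documents(merged.replace("\n", "\r\n"))
print(len(parsed_lf), len(parsed_crlf))
```

Because every line is a complete document, a reader of this layout does not depend on how the documents would be separated inside an array.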
According to my test:
1. The copy data activity can't change Unix (LF) to Windows (CRLF) line endings.
2. The MDF can parse both Unix (LF) and Windows (CRLF) files.
Maybe there is something else wrong.
By the way, I see there is a comma after "name":"Customer Name" in your original file; I deleted it before my test.
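The trailing comma mentioned above is worth checking on its own, because strict JSON does not allow it. A quick Python check, using the sample line from the question, is sketched below. It shows the behavior of Python's json module only; whether the ADF parser is equally strict is not established here, and is_valid_json is a hypothetical helper.

```python
import json

# Sample from the question, including the trailing comma after "Customer Name".
with_comma = '{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}'
# Same document with the trailing comma removed.
without_comma = with_comma.replace('",}', '"}')


def is_valid_json(text):
    """Return True if the text parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(with_comma), is_valid_json(without_comma))
```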

Loading JSON data with CSV Data Set Config

I'm new to JMeter so I hope this question is not too off the wall. I am trying to test an HTTP endpoint that accepts a large JSON payload and processes it. I have collected a few hundred JSON blobs in a file and want to use those as my input for testing. The only way that I have come across for loading the data is using the CSV config. I have a single line of the file for each request. I have attempted to use \n as a delimiter and have also tried adding a tab character \t to the end of each line. My requests all show an input of <EOF>.
Is there a way to read a file of JSON objects, line at a time, and pass them to my endpoint as the body in a POST?
You need to provide more information, to wit: example JSON file (first 2 lines), your CSV Data Set Configuration, jmeter.log file, etc. so we could help.
For the time being I can state that:
Given CSV file looking like:
{"foo":"bar"}
{"baz":"qux"}
And pretty much default CSV Data Set Config setup
JMeter normally reads the CSV data
Be aware that there are alternatives to the CSV Data Set Config, for example:
__CSVRead() function. The equivalent syntax would be ${__CSVRead(test.csv,0)}
__StringFromFile() function. The equivalent syntax would be ${__StringFromFile(test.csv,,,)}
See Apache JMeter Functions - An Introduction to get familiarized with the JMeter Functions concept.

Export from Couchbase to CSV file

I have a Couchbase Cluster with only one node (let's call it localhost) and I need to export all the data from a very big bucket (let's call it XXX) into a CSV file.
Now this seems to be a pretty easy task but I can't find the way to make it work.
According to the (really bad) documentation on the cbtransfer tool from Couchbase http://docs.couchbase.com/admin/admin/CLI/cbtransfer_tool.html they say this is possible but they don't explain it clearly. They just add a flag if you want the transfer to occur in csv format (?) but it is not working. Maybe someone who already did this can give me a hand?
Using the documentation I've been able to get close to the result I want (a clean CSV file with all the documents in the XXX bucket) using this command:
/opt/couchbase/bin/cbtransfer http://localhost:8091 /path/to/export/output.csv -b XXX
But what I get is that /path/to/export/output.csv is actually a folder with a lot of folders inside, and it stores some kind of JSON metadata that can be used to restore the XXX bucket in another instance of Couchbase.
Has anyone been able to export data from a Couchbase bucket (JSON documents) into a CSV file?
From looking at the documentation, you have to use a slightly different syntax to export to a CSV. http://docs.couchbase.com/admin/admin/CLI/cbtransfer_tool.html
It needs to look like so:
cbtransfer http://[localhost]:8091 csv:./data.csv -b default -u Administrator -p password
Notice the "csv:" before the name of the csv file.
I tested this and it does export a CSV. Just be forewarned that you need a relatively flat document structure for this to work really well, as JSON can obviously represent far more complex data structures than CSV, e.g. arrays, sub-documents, etc. cbtransfer will not unravel those. For example, if there is a subdocument, cbtransfer will represent it as a JSON doc within each CSV line.
So depending on what your document structure is, CSV may not be an ideal export format. It is a step backwards.