Export from Couchbase to CSV file - csv

I have a Couchbase Cluster with only one node (let's call it localhost) and I need to export all the data from a very big bucket (let's call it XXX) into a CSV file.
Now this seems to be a pretty easy task but I can't find the way to make it work.
According to the (really bad) documentation on the cbtransfer toold from Couchbase http://docs.couchbase.com/admin/admin/CLI/cbtransfer_tool.html they say this is possible but they don't explain it clearly. They just add a flag if you want the transfer to occur in csv format (?) but it is not working. Maybe someone who already did this can give me a hand?
Using the documentation I've been able to make an approach to the result I want to obtain (a clean CSV file with all the documents in the XXX bucket) using this command:
/opt/couchbase/bin/cbtransfer http://localhost:8091 /path/to/export/output.csv -b XXX
But what I get is that /path/to/export/output.csv is actually a folder with a lot of folders inside and it is storing some kind of json metadata that can be used to restore the XXX bucket in another instance of Couchbase.
Has anyone been able to export data from a Couchbase bucket (Json documents) into a CSV file?

From looking at the documentation, you have to put a slightly different syntax to export to a CSV. http://docs.couchbase.com/admin/admin/CLI/cbtransfer_tool.html
It needs to look like so:
cbtransfer http://[localhost]:8091 csv:./data.csv -b default -u Administrator -p password
Notice the "csv:" before the name of the csv file.
I tested this and it does export a CSV. Just be forwarned that you need a relatively flat document structure for this to work really well, as JSON can represent far more complex data structures than CSV obviously, e.g. arrays, sub-documents, etc. cbtransfer will not unravel those. For example, if there is a subdocument, cbtransfer will represent it as a JSON doc in the line of each CSV.
So depending on what your document structure is, exporting to CSV is not an ideal format. It is a step backwards.

Related

Read a flat file in Pentaho Spoon and then export it's metadata into a CSV

I am wondering if it is possible to extract the metadata of a flat file in a CSV using Pentaho Spoon. What I mean by that is for example get a CSV file input step, choose the file you want to read and then somehow get access to the metadata of that file and export it into a CSV.
I found on the documentation a step called Metadata Structured that was introduced in 3.1.0 but I can't find it in the latest version of Spoon, maybe it got removed by now.
Update: I found the "Metadata structure of stream" that almost does what I need to be done. Right now my transformation looks like this: csv file input -> metadata structure of stream -> text file ouput. The problem is that it doesnt extract all the metadata. It doesn't extract Format, Decimal and Group. It also gets me an Origin column that I don't really need and I have to get rid of it.
Update2: I keep trying to get to those columns that are missing but the problem is that the Metadata structure of stream step only outputs these columns "Position,Fieldname,Comments,Type,Length,Precision,Origin" so I cannot really access the format column for example that is an input for the step :( I can't really find a work-around for this

ArangoDB: How to export collection to CSV?

I have noticed there is a feature in web interface of ArangoDB which allows users to Download or Upload data as JSON file. However, I find nothing similar for CSV exporting. How can an existing Arango DB collection be exported to a .csv file?
If you want to export data from ArangoDB to CSV, then you should use Arangoexport. It is included in the full packages as well as the client-only packages. You find it next to the arangod server executable.
Basic usage:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-csv
Also see the CSV example with AQL query:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-via-aql-query
Using an AQL query for a CSV export allows you to transform the data if desired, e.g. to concatenate an array to a string or unpack nested objects. If you don't do that, then the JSON serialization of arrays/objects will be exported (which may or may not be what you want).
The default Arango install includes the following file:
/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js
It includes this comment:
// This is a generic CSV exporter for collections.
//
// Usage: Run with arangosh like this:
// arangosh --javascript.execute <CollName> [ <Field1> <Field2> ... ]
Unfortunately, at least in my experience, that usage tip is incorrect. Arango team, if you are reading this, please correct the file or correct my understanding.
Here's how I got it to work:
arangosh --javascript.execute "/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js" "<CollectionName>"
Please specify a password:
Then it sends the CSV data to stdout. (If you with to send it to a file, you have to deal with the password prompt in some way.)

AWS Glue Crawler Classifies json file as UNKNOWN

I'm working on an ETL job that will ingest JSON files into a RDS staging table. The crawler I've configured classifies JSON files without issue as long as they are under 1MB in size. If I minify a file (instead of pretty print) it will classify the file without issue if the result is under 1MB.
I'm having trouble coming up with a workaround. I tried converting the JSON to BSON or GZIPing the JSON file but it is still classified as UNKNOWN.
Has anyone else run into this issue? Is there a better way to do this?
I have two json files which are 42mb and 16mb, partitioned on S3 as path:
s3://bucket/stg/year/month/_0.json
s3://bucket/stg/year/month/_1.json
I had the same problem as you, crawler classification as UNKNOWN.
I were able to solved it:
You must create custom classifier with jsonPath as "$[*]" then create new crawler with the classifier.
Run your new crawler with the data on S3 and proper schema will be created.
DO NOT update your current crawler with the classifier as it won't apply the change, I don't know why, maybe because of classifier versioning AWS mentioned in their documents. Create new crawler make them work
As mentioned in
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json
When you run a crawler using the built-in JSON classifier, the entire file is used to define the schema. Because you don’t specify a JSON path, the crawler treats the data as one object, that is, just an array.
That is something which Dung also pointed out in his answer.
Please also note that file encoding can lead to JSON being classified as UNKNOWN. Please try and re-encode the file as UTF-8.

Tool for export to JSON from ArangoDB

To create a native backup and restore it, one has to use arangodump and arangorestore.
To import from JSON (and CSV, TSV), one has to use arangoimp.
What can I use to export to JSON from ArangoDB?
One possibility is to use the arangodump tool that is shipped with ArangoDB.
It can be used to dump an entire database or individual collections. It stores dumped data in JSON format on disk.
Maybe arangodump's output already is in a format that you can work with.

Migrating from Lighthouse to Jira - Problems Importing Data

I am trying to find the best way to import all of our Lighthouse data (which I exported as JSON) into JIRA, which wants a CSV file.
I have a main folder containing many subdirectories, JSON files and attachments. The total size is around 50MB. JIRA allows importing CSV data so I was thinking of trying to convert the JSON data to CSV, but all convertors I have seen online will only do a file, rather than parsing recursively through an entire folder structure, nicely creating the CSV equivalent which can then be imported into JIRA.
Does anybody have any experience of doing this, or any recommendations?
Thanks, Jon
The JIRA CSV importer assumes a denormalized view of each issue, with all the fields available in one line per issue. I think the quickest way would be to write a small Python script to read the JSON and emit the minimum CSV. That should get you issues and comments. Keep track of which Lighthouse ID corresponds to each new issue key. Then write another script to add things like attachments using the JIRA SOAP API. For JIRA 5.0 the REST API is a better choice.
We just went through a Lighthouse to JIRA migration and ran into this. The best thing to do is in your script, start at the top-level export directory and loop through each ticket.json file. You can then build a master CSV or JSON file to import into JIRA that contains all tickets.
In Ruby (which is what we used), it would look something like this:
Dir.glob("path/to/lighthouse_export/tickets/*/ticket.json") do |ticket|
JSON.parse(File.open(ticket).read).each do |data|
# access ticket data and add it to a CSV
end
end