CSVDE export file-column order wrong?

I'm using CSVDE to export data from our active directory into a CSV file, which then gets imported into a database. I'm using the -l switch to specify the columns that I'd like to export, but they don't come out in the same order consistently. Is there a workaround for this that doesn't involve opening the file in Excel? This is a nightly batch process and we'd like it to run unattended.
Thanks!

If you simply want a command-line utility that can re-order the CSV (and do much else as well), take a look at my FOSS CSV stream editor, CSVfix.

Per the docs:
LDAP can return attributes in any order, and csvde does not attempt to impose any order on the columns.
How about writing a Python script to read and reorder the CSV file? You may find the Python csv module useful for this.
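A minimal sketch along those lines (the column names and file names below are placeholders; replace them with the attributes you pass to -l):
import csv

# Desired column order; replace with the attributes you export via -l
COLUMN_ORDER = ["DN", "sAMAccountName", "mail", "telephoneNumber"]

with open("csvde_export.csv", newline="") as src, \
     open("reordered.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)                       # the header row gives the current order
    writer = csv.DictWriter(dst, fieldnames=COLUMN_ORDER, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)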

ArangoDB: How to export collection to CSV?

I have noticed there is a feature in the web interface of ArangoDB which allows users to Download or Upload data as a JSON file. However, I find nothing similar for CSV export. How can an existing ArangoDB collection be exported to a .csv file?
If you want to export data from ArangoDB to CSV, then you should use Arangoexport. It is included in the full packages as well as the client-only packages. You find it next to the arangod server executable.
Basic usage:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-csv
Also see the CSV example with AQL query:
https://docs.arangodb.com/3.4/Manual/Programs/Arangoexport/Examples.html#export-via-aql-query
Using an AQL query for a CSV export allows you to transform the data if desired, e.g. to concatenate an array to a string or unpack nested objects. If you don't do that, then the JSON serialization of arrays/objects will be exported (which may or may not be what you want).
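For example (the collection name, field list, and output directory below are placeholders; see the linked docs for the authoritative syntax):
arangoexport --type csv --collection mycollection --fields "_key,name,email" --output-directory "export"
arangoexport --type csv --fields "name,email" --output-directory "export" --query "FOR doc IN mycollection RETURN { name: doc.name, email: doc.email }"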
The default Arango install includes the following file:
/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js
It includes this comment:
// This is a generic CSV exporter for collections.
//
// Usage: Run with arangosh like this:
// arangosh --javascript.execute <CollName> [ <Field1> <Field2> ... ]
Unfortunately, at least in my experience, that usage tip is incorrect. Arango team, if you are reading this, please correct the file or correct my understanding.
Here's how I got it to work:
arangosh --javascript.execute "/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js" "<CollectionName>"
Please specify a password:
Then it sends the CSV data to stdout. (If you wish to send it to a file, you have to deal with the password prompt in some way.)
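One way to deal with that (assuming your arangosh build supports the --server.password and --quiet options, as recent 3.x versions do) is to supply the password on the command line and redirect stdout, roughly:
arangosh --server.password "yourpassword" --quiet --javascript.execute "/usr/share/arangodb3/js/contrib/CSV_export/CSVexport.js" "<CollectionName>" > export.csv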

Puppet - CSV file header

I'm writing a Puppet (3.6.2) module that reads data fields from a CSV file via the extlookup function, and I cannot figure out how to tell extlookup that the first line is the header row. Does extlookup support this? If not, can anyone recommend an external function I could import and use?
thanks,
Brandon
PS - Yes, I know about Hiera and having the data in YAML or JSON files, but my requirement is CSV files only.
The behavior of extlookup() is pretty well documented. It makes no special provision for column headers, which are by no means an inherent feature of CSV format. Indeed, if your header line is not readable as a data line, then your file is not CSV at all.
Supposing that your file is indeed valid CSV, the absolute simplest solution would be to ignore the issue. It presents a problem only if the first column heading duplicates an actual or potential data name. If it does not, then you will never look up or use the pseudo-value represented by the first row.
If your file in fact is not CSV on account of its first line, or if the first column name conflicts with a real data name, then it seems the next best alternative would be to just remove that line, or to avoid creating it in the first place. I don't see any reason why one of these should not be possible.
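If stripping the header as a pre-processing step is acceptable, something as simple as this would do (file names are placeholders):
tail -n +2 data_with_header.csv > data.csv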
I know about hiera, and having the data in YAML or JSON files but my requirement is CSV files only.
How sad. Do be aware that extlookup() has long been deprecated, and it was removed from Puppet 4.
I'm inclined to suggest you implement a translator from CSV to Hiera-friendly YAML, and use Hiera in your module. Alternatively, Hiera supports custom backends, and it's not too hard to write one. I am unaware of an existing CSV backend for Hiera, but you could write one. Ignoring a header line would then be under your control, and you would simultaneously achieve a measure of future-proofing.
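As a rough sketch of such a translator (assuming PyYAML is available; the file names and the choice of key column are illustrative only):
import csv
import yaml   # PyYAML, assumed to be installed

def csv_to_hiera_yaml(csv_path, yaml_path, key_column):
    data = {}
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)      # treats the first line as the header
        for row in reader:
            key = row.pop(key_column)
            data[key] = row             # remaining columns become nested keys
    with open(yaml_path, "w") as f:
        yaml.safe_dump(data, f, default_flow_style=False)

csv_to_hiera_yaml("data.csv", "common.yaml", "name")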

Export from Couchbase to CSV file

I have a Couchbase Cluster with only one node (let's call it localhost) and I need to export all the data from a very big bucket (let's call it XXX) into a CSV file.
Now this seems to be a pretty easy task but I can't find the way to make it work.
According to the (really bad) documentation on the cbtransfer tool from Couchbase http://docs.couchbase.com/admin/admin/CLI/cbtransfer_tool.html, this is supposed to be possible, but it is not explained clearly. There is just a flag to add if you want the transfer to occur in CSV format (?), but it is not working. Maybe someone who already did this can give me a hand?
Using the documentation I've been able to make an approach to the result I want to obtain (a clean CSV file with all the documents in the XXX bucket) using this command:
/opt/couchbase/bin/cbtransfer http://localhost:8091 /path/to/export/output.csv -b XXX
But what I get is that /path/to/export/output.csv is actually a folder with a lot of folders inside, storing some kind of JSON metadata that can be used to restore the XXX bucket in another instance of Couchbase.
Has anyone been able to export data from a Couchbase bucket (Json documents) into a CSV file?
From looking at the documentation, you have to use a slightly different syntax to export to a CSV. http://docs.couchbase.com/admin/admin/CLI/cbtransfer_tool.html
It needs to look like so:
cbtransfer http://[localhost]:8091 csv:./data.csv -b default -u Administrator -p password
Notice the "csv:" before the name of the csv file.
I tested this and it does export a CSV. Just be forewarned that you need a relatively flat document structure for this to work really well, as JSON can obviously represent far more complex data structures than CSV, e.g. arrays, sub-documents, etc. cbtransfer will not unravel those. For example, if there is a sub-document, cbtransfer will represent it as a JSON doc within that row of the CSV.
So depending on what your document structure is, CSV may not be an ideal export format; it is a step backwards in expressiveness.

2 Hdfs files comparison

I have 6000+ .csv files in /hadoop/hdfs/location1 and 6100+ .csv files in /hadoop/hdfs/location2.
I want to compare these two HDFS directories and find the diff of the files. The differing (non-similar) .csv files should be placed in a third HDFS directory (/hadoop/hdfs/location3). I am not sure whether we can use the diff command, as in Unix, on the HDFS file system.
Any idea on how to resolve this would be appreciated.
Anshul
You could use a Python (Perl, etc.) script to check this. Depending on your needs and on speed, you could check the file size first. Are the filenames identical? Are the creation dates identical, etc.?
If you want to use python, check out the filecmp module.
>>> import filecmp
>>> filecmp.cmp('undoc.rst', 'undoc.rst')
True
>>> filecmp.cmp('undoc.rst', 'index.rst')
False
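Note that filecmp compares local paths; for files that live in HDFS, one option in the same spirit is to compare the directory listings (names and sizes) through the hdfs CLI. A rough sketch, assuming the hdfs command is on the PATH and using the paths from the question:
import subprocess

def hdfs_listing(path):
    out = subprocess.check_output(["hdfs", "dfs", "-ls", path], text=True)
    files = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 8:                         # skip the "Found N items" header line
            size, name = parts[4], parts[-1].rsplit("/", 1)[-1]
            files[name] = size
    return files

loc1 = hdfs_listing("/hadoop/hdfs/location1")
loc2 = hdfs_listing("/hadoop/hdfs/location2")
# files that are new, or whose size changed, in location2
diff_files = [name for name in loc2 if loc1.get(name) != loc2[name]]
print(diff_files)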
Look at the below post which provides an answer on how to compare 2 HDFS files. You will need to extend this for 2 folders.
HDFS File Comparison
You could easily do this with the Java API and create a small app:
FileSystem fs = FileSystem.get(conf);
FileChecksum chksum1 = fs.getFileChecksum(new Path("/path/to/file"));
FileChecksum chksum2 = fs.getFileChecksum(new Path("/path/to/file2"));
// FileChecksum overrides equals(), so compare with equals() rather than ==
return chksum1.equals(chksum2);
There are no built-in HDFS commands to compare files.
Check the post below; this can be achieved by writing a Pig program or a MapReduce program.
Equivalent of linux 'diff' in Apache Pig
I think the steps below will solve your problem (a sketch of them follows after the list):
Get the list of file names in the first location into one file
Get the list of file names in the second location into another file
Find the diff between the two files using Unix commands
Copy whichever diff files you found to the third location.
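As a rough sketch of those four steps (it treats "diff" as files present in location2 but not in location1; adjust the comm flags if you need the other direction):
hdfs dfs -ls /hadoop/hdfs/location1 | awk '{print $NF}' | grep '\.csv$' | xargs -n1 basename | sort > location1_files.txt
hdfs dfs -ls /hadoop/hdfs/location2 | awk '{print $NF}' | grep '\.csv$' | xargs -n1 basename | sort > location2_files.txt
comm -13 location1_files.txt location2_files.txt > diff_files.txt
while read f; do hdfs dfs -cp "/hadoop/hdfs/location2/$f" /hadoop/hdfs/location3/; done < diff_files.txt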
I hope this helps you; otherwise, let me know.

iSeries Export to CSV

Is there an iSeries command to export the data in a table to CSV format?
I know about the Windows utilities, but since this needs to be run automatically I need to run this from a CL program.
You can use CPYTOIMPF and specify the TOSTMF option to place a CSV file on the IFS.
Example:
CPYTOIMPF FROMFILE(DBFILE) TOSTMF('/outputfile.csv') STMFCODPAG(*PCASCII) RCDDLM(*CRLF)
If you want the data to be downloaded directly to a PC, you can use the "Data Transfer from iSeries" function of IBM iSeries Client Access to create a .CSV file. In the file output details dialog, set the file type to Comma Separated Variable (CSV).
You can save the transfer description to be reused later.
You could use a trigger. The iSeries Client Access software won't do, since that is a Windows application; what I understand is that you need the data to be exported each time the file is written. Check this link to learn more about triggers.
You are going to need FTP to perform that action.
If your iSeries shop uses ZMOD/FTP, your shortest solution is a few lines of code away (three lines, to be exact): Start FTP, Put DBF, and finally End FTP.
If you don't use ZMOD/FTP:
- You could use native FTP/400 to accomplish what you need to do, but it is quite involved!
- You will probably need an RPGLE program to parse, format, and move the data into a "flat file", then use native FTP/400 to send the file out
- And yes, a CL program will be needed as a wrapper!
You can do it all in one very simple CL program (a rough sketch follows below):
CPYTOIMPF the file TOSTMF -> the CSV file will be in the IFS
FTP the file elsewhere (to a server or a PC)
It works like a charm.
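A rough sketch of such a CL program (library, file, path, and host names are placeholders; the FTP step uses the common technique of overriding the FTP INPUT/OUTPUT files to members that hold the FTP subcommands, which you would have to prepare):
PGM
  CPYTOIMPF  FROMFILE(MYLIB/DBFILE) TOSTMF('/export/output.csv') +
               MBROPT(*REPLACE) STMFCODPAG(*PCASCII) RCDDLM(*CRLF)
  OVRDBF     FILE(INPUT) TOFILE(MYLIB/QFTPSRC) MBR(FTPCMDS)  /* user, password, PUT, QUIT */
  OVRDBF     FILE(OUTPUT) TOFILE(MYLIB/QFTPSRC) MBR(FTPLOG)
  FTP        RMTSYS('target.example.com')
  DLTOVR     FILE(INPUT OUTPUT)
ENDPGM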