Mule - remove first row from CSV file

I'm using Mule CE and need to remove the first row (containing headers) of an incoming CSV file so that I can pass it to a transformer to be converted to maps. How would I achieve this? My guess would be some kind of splitter, but I've not had any luck so far.

The jdbc:csv-to-maps-transformer has a property ignoreFirstRecord, but if your CSV transformations are done with a custom transformer, you may have to do this in Java (or one of the scripting languages). I don't think there is any standard Mule component that does this out-of-the-box.
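If you do end up handling it in a custom transformer, stripping the header is just a matter of cutting everything up to the first newline. A minimal plain-Java sketch (class and method names are hypothetical), assuming the payload has already been read into a String:

```java
// Hypothetical helper for a custom transformer: drops the first (header)
// line of a CSV payload before it is converted to maps.
public class CsvHeaderStripper {
    public static String stripHeader(String csv) {
        int firstNewline = csv.indexOf('\n');
        // A payload with no newline is header-only (or empty), so nothing remains.
        return firstNewline >= 0 ? csv.substring(firstNewline + 1) : "";
    }
}
```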

Related

Best data processing software to parse CSV file and make API call per row

I'm looking for ideas for open source ETL or data processing software that can monitor a folder for CSV files, then open and parse the CSV.
For each CSV row, the software will transform the row into JSON and make an API call to start a Camunda BPM process, passing the cell data as variables into the process.
Looking for ideas,
Thanks
You can use a Java WatchService or Spring FileSystemWatcher as discussed here with examples:
How to monitor folder/directory in spring?
referencing also:
https://www.baeldung.com/java-nio2-watchservice
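For the folder monitoring itself, a minimal plain-Java WatchService sketch (the inbox path is a placeholder) could look like this:

```java
import java.nio.file.*;

// Minimal WatchService sketch: blocks until a new file appears in the
// watched folder, then hands CSV files off for processing.
public class CsvFolderWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/data/csv-inbox");           // assumed inbox folder
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take();                 // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = dir.resolve((Path) event.context());
                if (created.toString().endsWith(".csv")) {
                    System.out.println("New CSV: " + created);
                    // parse the file and start one process instance per row here
                }
            }
            key.reset();                                   // re-arm the key for further events
        }
    }
}
```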
Once you have picked up the CSV you can use my example here as inspiration or extend it: https://github.com/rob2universe/csv-process-starter specifically
https://github.com/rob2universe/csv-process-starter/blob/main/src/main/java/com/camunda/example/service/CsvConverter.java#L48
The example starts a configurable process for every row in the CSV and includes the content of the row as JSON process data.
I wanted to limit the dependencies of this example, so the CSV parsing logic is very simple. Commas inside field values may break the example, and special characters may not be handled correctly. A more robust implementation could replace the simple Java String.split(",") with an existing CSV parser library such as OpenCSV; see the sketch below.
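For illustration, a rough sketch of what the OpenCSV variant could look like (it assumes the com.opencsv:opencsv dependency is on the classpath; the file name is a placeholder):

```java
import com.opencsv.CSVReader;
import java.io.FileReader;

// Sketch: replacing String.split(",") with OpenCSV so quoted fields
// containing commas and escaped characters parse correctly.
public class RobustCsvParse {
    public static void main(String[] args) throws Exception {
        try (CSVReader reader = new CSVReader(new FileReader("input.csv"))) {
            String[] row;
            while ((row = reader.readNext()) != null) {
                // each 'row' is one record with quoting already resolved
                System.out.println(String.join(" | ", row));
            }
        }
    }
}
```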
The file watcher would actually be a nice extension to the example. I may add it when I get around to it, but would also accept a pull request in case you fork my project.

Trying to parse a JSON file but it seems the format is different or something is wrong with the JSON file

Hi, I'm trying to parse any of the files from the link underneath. I've tried reaching out to the owner of the data dumps, but nothing we do manages to parse the files as proper JSON. None of the programs we use (Power BI, Jupyter, Excel, anything really) will recognise the files as JSON, and we can't figure out why. I was wondering if anyone could help figure out what the issue is, as this dataset is very interesting to me and my fellow students. I hope I'm using the word 'parsing' correctly.
The data dumps are linked underneath:
https://files.pushshift.io/reddit/comments/
The file I downloaded (I just tried one at random) was handled just fine by jq, my preferred command-line tool for processing JSON files.
jq accepts an input consisting of a sequence of JSON objects, which is what I found when I decompressed the test file. This format is commonly known as JSON lines, and many tools can handle it. The Wikipedia article on JSON streaming contains more information and a (possibly outdated) list of tools.
If your tools can't handle more than one JSON object per input, you could convert the files into something you can handle by adding a comma to the end of every line except the last one (since each JSON object is a single line) and then wrapping the whole input in a pair of brackets to turn the sequence into a JSON list. Since JSON does not actually care about newlines, it would be sufficient to add a line containing [ at the beginning and a line containing ] at the end. I don't know which command-line tools you have available and are comfortable with, but the task shouldn't be too difficult.
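If you would rather do the wrapping in code than with a command-line tool, here is a rough Java sketch of exactly that transformation on a decompressed file (file names are placeholders, and it assumes every line is a complete JSON object):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

// Sketch of the wrapping described above: turn a file with one JSON object
// per line into a single JSON array by joining the lines with commas and
// bracketing the result. Fine for small files; the larger dumps would need
// a streaming approach instead of readAllLines.
public class JsonLinesToArray {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("comments.jsonl")); // placeholder name
        lines.removeIf(line -> line.trim().isEmpty());     // drop any blank lines
        String array = "[\n" + String.join(",\n", lines) + "\n]";
        Files.write(Paths.get("comments.json"), array.getBytes());
    }
}
```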

What does a JSON file do?

I went through previous posts on SO and some of the answers say that a JSON file is used to send data from server to client.
Well, that seems to be okay, but then we can create package.json, Apidoc.json, and manifest.json files, which do not involve any interaction between client and server.
So can someone tell me what actually is a JSON file?
JSON stands for JavaScript Object Notation. It is used to describe a data structure in a simple format. It can be a plain text file, which may be used to pass data from the server to a client, but it could equally be used to hold and consume that data at the same layer; e.g. you could have a configuration file at the client side which is read and interpreted by your application.
Note also that JSON does not need to be held in a file; you could create a string variable with JSON data in it and pass this from one method to another without ever storing it in a file.
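To make that concrete, here is a trivial Java sketch (names are hypothetical) of JSON living only in a string variable that is passed between methods, never touching a file:

```java
// Sketch: JSON does not have to live in a file. A JSON string is built in
// one method and consumed in another, all in memory.
public class JsonInMemory {
    static String buildConfig() {
        return "{\"theme\": \"dark\", \"fontSize\": 14}";  // hypothetical config data
    }

    static void applyConfig(String json) {
        // a real application would parse this with a library such as Jackson or Gson
        System.out.println("Applying config: " + json);
    }

    public static void main(String[] args) {
        applyConfig(buildConfig());
    }
}
```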
The tag definition in Stack Overflow can be found here https://stackoverflow.com/tags/json/info and further information can be found here https://www.json.org/.
JSON is a file format, just like CSV. Just because CSV is used with Microsoft Excel does not mean that is all it is used for, and the same goes for JSON: just because it is common to get info from a server in JSON format does not mean that is all JSON is used for. Do some googling before asking a question like this on Stack Overflow.
Here is an intro to JSON. JSON Intro W3Schools

Convert JSON to CSV using Microsoft Flow

I am trying to parse JSON data from an API into Flow, convert it into a CSV and then output the CSV to my Google Drive.
The API I am trying to work with is located here:
https://www.binance.com/api/v1/klines?symbol=BNBBTC&interval=1h&limit=24
Is this possible using Microsoft Flow? I have tried various things without much success.
Thanks in advance.
I'd say it is possible. What have you tried so far?
First you have to get the response body. Then extract the "meat" from each element, which has to be done with flow expressions like "body(response_body)[0]", depending on the format. Then feed all these data parts into a newly created Excel file.
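For reference, the transformation itself is small: that klines endpoint returns a JSON array of arrays, so each inner array maps naturally to one CSV row. If Flow proves too limiting, a rough Java sketch of the same steps (assuming the Jackson library is available) would be:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URL;

// Sketch: fetch the klines response (an array of arrays) and print each
// inner array as one comma-separated CSV line.
public class KlinesToCsv {
    public static void main(String[] args) throws Exception {
        String url = "https://www.binance.com/api/v1/klines?symbol=BNBBTC&interval=1h&limit=24";
        JsonNode rows = new ObjectMapper().readTree(new URL(url));
        for (JsonNode row : rows) {
            StringBuilder csvLine = new StringBuilder();
            for (JsonNode cell : row) {
                if (csvLine.length() > 0) csvLine.append(',');
                csvLine.append(cell.asText());
            }
            System.out.println(csvLine);
        }
    }
}
```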

Puppet - CSV file header

I'm writing a Puppet (3.6.2) module that reads data fields from a CSV file via the extlookup function, and I cannot figure out how to tell extlookup that the first line is the header field. Does extlookup support this? If not, can anyone recommend an external function I could import and use?
thanks,
PS - Yes I know about hiera, and having the data in YAML or JSON files but my requirement is CSV files only.
Brandon
The behavior of extlookup() is pretty well documented. It makes no special provision for column headers, which are by no means an inherent feature of CSV format. Indeed, if your header line is not readable as a data line, then your file is not CSV at all.
Supposing that your file is indeed valid CSV, the absolute simplest solution would be to ignore the issue. It presents a problem only if the first column heading duplicates an actual or potential data name. If it does not, then you will never look up or use the pseudo-value represented by the first row.
If your file in fact is not CSV on account of its first line, or if the first column name conflicts with a real data name, then it seems the next best alternative would be to just remove that line, or to avoid creating it in the first place. I don't see any reason why one of these should not be possible.
"I know about hiera, and having the data in YAML or JSON files but my requirement is CSV files only."
How sad. Do be aware that extlookup() has long been deprecated, and it was removed in Puppet 4.
I'm inclined to suggest you implement a translator from CSV to Hiera-friendly YAML, and use Hiera in your module. Alternatively, Hiera supports custom backends; I am unaware of an existing CSV backend, but it's not too hard to write one. Ignoring a header line would then be under your control, and you would simultaneously achieve a measure of future-proofing.
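To illustrate the translator idea, here is a minimal Java sketch (file names hypothetical, naive comma splitting; a real translator should use a proper CSV parser) that skips the header row and emits Hiera-friendly key/value YAML:

```java
import java.nio.file.*;
import java.util.List;

// Sketch of a CSV-to-Hiera translator: the first CSV line is treated as the
// header and skipped, and each remaining line becomes a "key: value" pair.
// Assumes a simple two-column file with no commas or quotes inside values.
public class CsvToHieraYaml {
    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get("data.csv"));
        StringBuilder yaml = new StringBuilder("---\n");
        for (int i = 1; i < lines.size(); i++) {           // i = 1 skips the header row
            String[] parts = lines.get(i).split(",", 2);
            yaml.append(parts[0].trim()).append(": \"")
                .append(parts[1].trim()).append("\"\n");
        }
        Files.write(Paths.get("common.yaml"), yaml.toString().getBytes());
    }
}
```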