How to reorder CSV columns using an Apache NiFi processor?

In my scenario, users have the option to upload a CSV file and can map the columns of that CSV file to a predefined schema. I need to reorder the columns of that CSV file based on the user's mapping and upload it to HDFS. Is there any way to achieve this via a NiFi processor?

You can accomplish this with a ConvertRecord processor. Register an Avro schema describing the expected format in a Schema Registry (controller service), and create a CSVReader implementation to convert this incoming data to the generic Apache NiFi internal record format. Similarly, use a CSVRecordSetWriter with your output schema to write the data back to CSV in whatever columnar order you like.
For more information on the record processing philosophy and some examples, see Record-oriented data with NiFi and Apache NiFi Records and Schema Registries.
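The reader/writer idea can be sketched outside NiFi in plain Python, which may help when checking a user's mapping before building the flow. This is only an illustration of the concept, not NiFi's implementation; the function name `reorder_csv`, the `mapping` dictionary, and the sample column names are all hypothetical.

```python
import csv
import io


def reorder_csv(text, mapping, target_order):
    """Rename source columns via `mapping`, then emit them in `target_order`.

    mapping: source column name -> schema field name (the user's choice).
    target_order: schema field names in the order the output should use.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=target_order, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only mapped columns, keyed by their schema field name.
        renamed = {mapping[src]: value for src, value in row.items() if src in mapping}
        writer.writerow(renamed)
    return out.getvalue()


# Hypothetical upload and user mapping.
sample = "last,first,years\nDoe,Jane,34\n"
user_mapping = {"first": "first_name", "last": "last_name", "years": "age"}
result = reorder_csv(sample, user_mapping, ["first_name", "last_name", "age"])
print(result)
```

In the NiFi flow, the reader schema plays the role of the incoming header and the writer schema plays the role of `target_order`.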

Related

What is an alternative to CSV data set config in JMeter?

We want to use 100 credentials from a .csv file, but I would like to know whether JMeter offers any alternative to this.
If you have the credentials in the CSV file there are no better ways of "feeding" them to JMeter than CSV Data Set Config.
In case you are still looking for alternatives:
__CSVRead() function. The disadvantage is that the function reads the whole file into memory, which might be a problem for large CSV files. The advantage is that you can choose or change the name of the CSV file dynamically at runtime, whereas with the CSV Data Set Config the file name is fixed once the element is initialized.
JDBC Test Elements: allow fetching data (e.g. credentials) from a database rather than from a file.
Redis Data Set: allows fetching data from a Redis data store.
HTTP Simple Table Server: exposes a simple HTTP API for fetching data from a CSV file. This is useful in a distributed setup when you want to ensure that different JMeter slaves use different data, because you don't have to copy the .csv file to the slave machines and split it.
There are a few alternatives:
JMeter plugin for reading random CSV data: Random CSV Data Set Config
JMeter function: __CSVRead
Reading CSV file data from a JSR223 PreProcessor
CSV Data Set Config is simple, easy to use, and available out of the box.

How to create a BQ-schema from XSD

I need some guidance on how to proceed with a problem.
Our integration team receives xml files which are converted to json and sent to pub/sub. We then ingest the json files (or are supposed to) into bigquery.
The problem is that the XML files do not always include all possible objects or values, so I can't create a correct schema in BigQuery to receive the JSON files. I got the XSD file, with an extension file, which gives me all possible objects, but I don't know how to convert this to a correct BigQuery schema.
Do you have any suggestions on how to create a bq schema from xsd files? I was thinking that if I create an xml file with dummy data (including all objects and more than one object when creating repeated objects) with help of the xsd maybe that xml file may be converted to json and then use the auto-schema detection of bq.
Any suggestions?
Thanks,
Cris
If you have the XSD schema files, you can convert these to a valid JSON schema. There are a few tools that can help you to accomplish this.
Keep in mind that the tools are for general purposes and not for the particular case of BigQuery, so you'll have to tune the result to get a valid JSON schema. For this check the components of a BigQuery schema, and for quick reference the sample provided in the documentation.
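As a rough illustration of the tuning step, the sketch below walks a small XSD with Python's standard library and emits BigQuery-style field definitions (`name`, `type`, `mode`, nested `fields`). It makes several assumptions that should be checked against the real files: only inline `xs:complexType`/`xs:sequence` nesting is handled (named types, `xs:choice`, attributes, and imports are not), the `TYPE_MAP` mapping is a guess at sensible defaults, and `SAMPLE_XSD` is invented for the example. The field names produced by the XML-to-JSON conversion may also differ from the XSD element names.

```python
import json
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from XSD built-in types to BigQuery types; extend as needed.
TYPE_MAP = {
    "string": "STRING",
    "int": "INTEGER",
    "integer": "INTEGER",
    "long": "INTEGER",
    "decimal": "NUMERIC",
    "double": "FLOAT",
    "float": "FLOAT",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "dateTime": "TIMESTAMP",
}


def element_to_field(elem):
    """Convert one xs:element into a BigQuery field definition."""
    if elem.get("maxOccurs", "1") != "1":
        mode = "REPEATED"
    elif elem.get("minOccurs", "1") == "0":
        mode = "NULLABLE"
    else:
        mode = "REQUIRED"
    field = {"name": elem.get("name"), "mode": mode}
    children = elem.findall(f"{XS}complexType/{XS}sequence/{XS}element")
    if children:
        # Inline complex type: becomes a nested RECORD.
        field["type"] = "RECORD"
        field["fields"] = [element_to_field(c) for c in children]
    else:
        # Strip the namespace prefix; unknown types fall back to STRING.
        local_type = elem.get("type", "string").split(":")[-1]
        field["type"] = TYPE_MAP.get(local_type, "STRING")
    return field


# Hypothetical XSD used only to exercise the function.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:long"/>
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="sku" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()

root = ET.fromstring(SAMPLE_XSD)
top = root.find(f"{XS}element")
# The fields of the top-level element form the table schema.
bq_schema = element_to_field(top)["fields"]
print(json.dumps(bq_schema, indent=2))
```

Because the incoming files often omit elements, it may be safer to relax `REQUIRED` to `NULLABLE` in the generated schema before using it.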

Convert JSON to CSV in nifi

I want to convert JSON files to CSV in NiFi. This can be done in Python and other programming languages, and there are multiple articles on it. I have multiple JSON files, and each file has a different schema (any one file has only one schema). I can see there are templates to convert CSV to JSON and for other conversions, but I didn't see any template to convert JSON data to CSV. I have gone through the article https://community.hortonworks.com/articles/64069/converting-a-large-json-file-into-csv.html , however there the schema is hard-coded. As I have multiple files and each file has a different schema, I can't hard-code the schema. Any suggestions please.
Conversion between formats is typically done through ConvertRecord by plugging in the appropriate record reader and record writer, in this case a JSON reader and CSV writer.
To make use of the record processors you need to define Avro schemas for your data and put them in a schema registry; NiFi provides a local one.
There are lots of examples and posts out there about the record stuff. This slide deck shows an example of CSV to JSON, but it would be easy to reverse the situation for your scenario:
https://www.slideshare.net/BryanBende/apache-nifi-record-processing
This post has some other info:
https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries
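Since each file has its own schema, one option is to generate the Avro schema from a sample record instead of writing it by hand, and then register the result. The sketch below is a minimal example of that idea in plain Python, assuming flat JSON objects whose keys are already valid Avro names; the helper names `avro_type` and `infer_avro_schema` and the sample record are hypothetical, and nested objects or arrays are not handled.

```python
import json


def avro_type(value):
    """Map a JSON scalar to an Avro type (nested values are not handled)."""
    if value is None:
        # No type information available; assume a nullable string.
        return ["null", "string"]
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    # Strings, and anything unhandled in this sketch, fall back to string.
    return "string"


def infer_avro_schema(record, name):
    """Build an Avro record schema from one flat JSON object."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": k, "type": avro_type(v)} for k, v in record.items()],
    }


# Hypothetical sample record taken from one of the JSON files.
sample_record = json.loads('{"id": 7, "city": "Oslo", "temp": 3.5, "ok": true}')
schema = infer_avro_schema(sample_record, "weather")
print(json.dumps(schema, indent=2))
```

The field order in the generated schema follows the key order of the sample record, which is also the column order a CSV writer using that schema would produce.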

Indexing JSON in GeoMesa

Assume I want to persist JSON files in GeoMesa (on Accumulo). These JSON files have geometries and time. Can I use an XZ3 index? If yes, then how?
NB: By JSON I am not referring to GeoJSON.
You can write a GeoMesa converter (a configuration file) to extract the values you want out of your JSON and into a GeoTools SimpleFeature, and ingest those into GeoMesa. Download the Accumulo distribution from github and look at the example under examples/ingest/json/.
Full documentation for converters is available here.
You also have the option of storing JSON strings as attributes, and querying them using JSON-Path. There is more information on that here.
The indices created for your data will depend on the attributes present. If you have a non-point geometry and a date defined, then you will automatically build an XZ3 index. More information on indices is available here and here

Can the Avro JSON be extended with additional information?

The avro format is used in hadoop as a header to describe the contents of the binary file that follows. My question is whether the json part of the avro file can be extended to include information that is not necessary for hadoop? The typical use case would be to attach meta-data like the originator of the file and a date to the file without it needing to be data and part of the file.
Yes. Avro files can be annotated with additional information in the JSON schema or with specific additional name:value pairs. Additionally, we have been able to read these Avro files with Pentaho and Google BigQuery. One caveat is that the schema and name:value pairs are discarded during the import process, so if you think you will need them later, you should extract and store local copies of them.
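To illustrate the first option: the Avro specification permits attributes it does not define to appear in a schema as metadata, as long as they do not affect the serialized data. The sketch below builds such a schema with Python's standard `json` module only; the attribute names `originator`, `created`, and `unit` and their values are hypothetical. For the second option, the file header's metadata map can hold extra name:value pairs, but keys starting with `avro.` are reserved.

```python
import json

# Standard Avro record schema plus custom attributes.
# "originator", "created" and "unit" are hypothetical names for illustration.
schema = {
    "type": "record",
    "name": "Measurement",
    "originator": "sensor-team",
    "created": "2017-06-20",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "value", "type": "double", "unit": "celsius"},
    ],
}

schema_json = json.dumps(schema)

# A consumer can read the annotations back from the schema JSON.
parsed = json.loads(schema_json)
standard_keys = {"type", "name", "namespace", "doc", "aliases", "fields"}
extras = {k: v for k, v in parsed.items() if k not in standard_keys}
print(extras)
```

Keeping a copy of `schema_json` alongside the data is one way to preserve the annotations when a downstream import discards them.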