Configure Apache Drill to read XML files in a MapR distribution

I have a project where I need to read XML files with Apache Drill and process them. Can someone tell me how I can configure it?
NB: I use the MapR distribution.
I tried to add the configuration in the configuration UI, but I get an error (see image).
Thanks in advance

You'll need to use a Drill distribution based on Apache Drill >= 1.19 for the XML format plugin.
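For reference, a minimal sketch of the format-plugin entry, assuming Drill 1.19+; you would add it under "formats" in your dfs (or equivalent) storage plugin configuration in the web UI (the dataLevel value is just an example):

    "xml": {
      "type": "xml",
      "extensions": ["xml"],
      "dataLevel": 2
    }

If you paste an "xml" format entry into a Drill build older than 1.19, the UI will refuse to save the plugin configuration, which is one common cause of the error you describe.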

So this is more of a Drill question than a MapR question.
There are two key steps here:
1. Make sure that Drill can access whatever you use to store your data (it sounds like your data is XML files in MapR, which is now called HPE Ezmeral Data Fabric).
2. Make sure that Drill can understand the data you have. I am not current on Drill, but reading many kinds of XML should be doable.
For getting access, there are two major paths to accessing files on Ezmeral Data Fabric. One path is to mount the data fabric as a conventional file system on all the nodes running Drillbits. This is often done using NFS mounts, but it can also be done with the FUSE driver provided with the data fabric.
The other major approach to getting data access is to use the HDFS API framework to access data via maprfs://... path names. This requires installing the data fabric client on all of the nodes running Drillbits.
It sounds like you are running the version of Drill that is packaged with the old MapR or current HPE Ezmeral system. This is the easiest approach since the packaged version is integrated with the client libraries needed to use the HDFS API with maprfs:// resources (it also provides access to the tables and streams in the data fabric).
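Concretely, the access choice shows up in the "connection" field of Drill's dfs storage plugin configuration. A rough sketch of the two options (cluster name and paths are placeholders):

    "connection": "file:///"      (data fabric mounted via NFS/FUSE; query paths like /mapr/<cluster>/data)
    "connection": "maprfs:///"    (HDFS API through the installed data fabric client)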

Related

Connecting a database with Thingsboard

Will you please help me with one more important thing? I need to store my dashboard's data in a database. According to my study, ThingsBoard supports 3 databases at the moment: NoSQL, MySQL, and hybrid (PostgreSQL + Cassandra). I have researched a lot but could not find any way to send my telemetry data to any database. I know it is possible because the ThingsBoard docs themselves say so, but how do I do that?
I checked the PostgreSQL database named thingsboard that I created during installation, but only the relations that were made by default are present. I need to store my project's data in a database, just like in AWS we store IoT Core's data in DynamoDB or in IoT Analytics. ThingsBoard does not provide any database-related node in its rule engine, so how do I make a rule chain to transfer my project's data to a database server? I installed pgAdmin 4 to view the database graphically but found nothing useful.
The documentation and Stack Overflow users say to configure the thingsboard.yml file (located at /etc/thingsboard/conf/thingsboard.conf in a monolithic installation on Linux), which has Cassandra, MySQL, and PostgreSQL configuration, but how do I configure it properly? When I list the contents of the default thingsboard database, it only shows the default relations; if I create a device in ThingsBoard, it does not show up in the database. Why? I really could use your help. Please show me a way to connect my ThingsBoard to a database.
See my attached images: everything there is default, nothing that I created in ThingsBoard.
That's wrong; ThingsBoard currently supports 3 database setups: PostgreSQL only, hybrid PostgreSQL + Cassandra (telemetry only), and PostgreSQL + TimescaleDB. So there is no MySQL database used anywhere.
https://thingsboard.io/docs/user-guide/install/ubuntu/#step-3-configure-thingsboard-database
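For the database step specifically, the linked guide comes down to setting environment variables in /etc/thingsboard/conf/thingsboard.conf. A sketch for the plain PostgreSQL setup (credentials and database name are placeholders):

    export DATABASE_TS_TYPE=sql
    export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/thingsboard
    export SPRING_DATASOURCE_USERNAME=postgres
    export SPRING_DATASOURCE_PASSWORD=postgres

For the hybrid setup, DATABASE_TS_TYPE is set to cassandra instead, with additional Cassandra connection settings as described in the guide.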
Find guides for connecting your devices to ThingsBoard here, e.g. via MQTT:
https://thingsboard.io/docs/reference/mqtt-api/
If you would like to forward the stored telemetry of ThingsBoard to different databases, this is not possible directly with rule chains (there is only one node for storing data, and it writes to a Cassandra table).
One way to achieve this would be to fetch the data with an external microservice/program via the HTTP API and persist it in the database of your choice. You could use a Python script, for example, as sketched below:
https://thingsboard.io/docs/reference/rest-api/
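A minimal sketch of that approach in Python, assuming the standard ThingsBoard REST endpoints; the device UUID, telemetry key, credentials, and the target table are all placeholders you would replace:

    import requests
    import psycopg2

    TB = "http://localhost:8080"  # assumed ThingsBoard URL
    DEVICE_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical device UUID

    # Log in to obtain a JWT token
    token = requests.post(f"{TB}/api/auth/login",
                          json={"username": "tenant@thingsboard.org",
                                "password": "tenant"}).json()["token"]
    headers = {"X-Authorization": f"Bearer {token}"}

    # Fetch a time range of telemetry for the device
    resp = requests.get(
        f"{TB}/api/plugins/telemetry/DEVICE/{DEVICE_ID}/values/timeseries",
        headers=headers,
        params={"keys": "temperature", "startTs": 0, "endTs": 1893456000000})
    telemetry = resp.json()  # {"temperature": [{"ts": ..., "value": ...}, ...]}

    # Persist into your own Postgres table, created beforehand with e.g.:
    #   CREATE TABLE device_telemetry (device_id text, key text, ts bigint, value text);
    conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
    with conn, conn.cursor() as cur:
        for key, points in telemetry.items():
            for p in points:
                cur.execute(
                    "INSERT INTO device_telemetry (device_id, key, ts, value) "
                    "VALUES (%s, %s, %s, %s)",
                    (DEVICE_ID, key, p["ts"], p["value"]))
    conn.close()

Run it on a schedule (cron, for example) and it will mirror the selected telemetry into your own table.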
Alternatively, you can send the data to a message queue like Kafka instead of fetching it via the HTTP API. But it would still require additional tooling to store the data in an external database outside ThingsBoard.
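A rough sketch of such a consumer with the kafka-python package; the topic name must match whatever you configure in the rule chain's Kafka node, and the target table is again an assumption:

    import json
    import psycopg2
    from kafka import KafkaConsumer

    # Subscribe to the topic the ThingsBoard Kafka rule node publishes to
    consumer = KafkaConsumer(
        "tb-telemetry",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")))

    conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
    for msg in consumer:  # blocks forever, handling messages as they arrive
        with conn, conn.cursor() as cur:
            cur.execute("INSERT INTO raw_telemetry (payload) VALUES (%s)",
                        (json.dumps(msg.value),))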

How to transfer JSON files (containing test execution results) from my local system to an Azure VM (Linux) or to a storage disk attached to a web server

It would be a great help if someone could recommend an easier approach to this using Java.
Current Framework Implementation
Expected Framework Implementation
You haven't really shared any code or deep technical details, so I can only answer on high level.
On your first screenshot, you have an arrow where you copy the test execution results back to your localhost. You have to modify this logic: instead of copying back to localhost, upload the test execution files to an Azure storage account. You'll need Blob storage to store your files. If you want to do it with Java, here is some sample code (SDK v7) showing how to upload files to Blob storage: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-java
After you uploaded the files to azure storage, you can access the JSON files from your web server. Or, if you make your container public, you won't even need the web server to access these files.
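The linked quickstart shows the Java SDK; for illustration, the same upload flow with the Python azure-storage-blob package looks roughly like this (connection string, container, and blob names are placeholders):

    from azure.storage.blob import BlobServiceClient

    # Connection string from the storage account's "Access keys" blade
    conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container="test-results",
                                   blob="run-123/results.json")

    # Upload the local JSON result file, overwriting any previous run
    with open("results.json", "rb") as f:
        blob.upload_blob(f, overwrite=True)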

Fastest way to download/access files in Amazon S3

I'm trying to access files in my Amazon S3 and do some operations on them. Currently evaluating the options.
Since I will be doing some operations on the S3 files, I would prefer using some language to access the files in S3 (I have already tried the copy command).
My S3 contains JSON files ranging between 2 MB and 4 MB, and I need to parse these JSON files and load them into a database (thinking about using jQuery here, but any other suggestions are welcome).
Given these requirements, which is the most efficient language/platform to use here?
Your options are pretty broad here. AWS has a list of SDKs for you to choose from: https://aws.amazon.com/tools/#sdk
So your comfort level with a particular language should be your largest influencer. Given that you mentioned JSON and jQuery, perhaps you should look at the Node.js SDK and AWS Lambda.
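Whichever SDK you choose, the flow is the same: list the objects, download each one, parse the JSON, and load it into your database. For illustration, with Python's boto3 (bucket name and key prefix are hypothetical; the Node.js SDK calls are analogous):

    import json
    import boto3

    s3 = boto3.client("s3")  # credentials come from the environment or an IAM role

    # Iterate over the 2-4 MB JSON objects under a prefix
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket", Prefix="data/"):
        for item in page.get("Contents", []):
            obj = s3.get_object(Bucket="my-bucket", Key=item["Key"])
            doc = json.loads(obj["Body"].read())
            # ... insert `doc` into your database here ...
            print(item["Key"], len(doc))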

Logstash INPUT MySQL

Can't find any input plugin for relational databases in the Logstash documentation.
What is the best approach to import data from a relational database table with Logstash? Is it to connect Elasticsearch directly to the database using JDBC?
You'll need to use the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc) for loading JDBC data into Elasticsearch (or write your own code to do it).
It looks like there are several JIRAs open requesting JDBC loading in Logstash, but they haven't been worked on yet: https://logstash.jira.com/browse/LOGSTASH-1764
There's this
WIP: Under Development, NOT FOR PRODUCTION
This is a plugin for Logstash.
It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one central location.
So far, there is no Logstash input for reading from a SQL database.
My recommendation: write a program/script in Java or Python to read the logs from the SQL database and write them to a file, then use the Logstash file input to read from that file. The Logstash website has a getting-started tutorial. It is easy to learn.
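A minimal sketch of that script in Python; the table, columns, and output path are assumptions, and the output file is the one your Logstash file input would tail:

    import json
    import pymysql  # assumed MySQL driver; any DB-API driver works the same way

    conn = pymysql.connect(host="localhost", user="logs", password="secret",
                           database="appdb",
                           cursorclass=pymysql.cursors.DictCursor)

    # Append each row as one JSON line for the Logstash file input to pick up
    with conn.cursor() as cur, open("/var/log/app/db-rows.log", "a") as out:
        cur.execute("SELECT id, created_at, level, message FROM app_log")
        for row in cur:
            out.write(json.dumps(row, default=str) + "\n")

    conn.close()

In a real setup you would remember the last id or timestamp exported and only select newer rows on each run.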
Good Luck

NetSuite Migrations

Has anyone had much experience with data migration into and out of NetSuite? I have to export DB2 tables into MySQL, manipulate the data, and then export it in a CSV file. Then take a CSV file of accounts and manipulate the data again so the accounts from our old system match up with the new one. Has anyone tried to do this in MySQL?
A couple of options:
Invest in a data transformation tool that connects to NetSuite and DB2 or MySQL. Look at Dell Boomi, IBM Cast Iron, etc. These tools allow you to connect to both systems, define the data to be extracted, perform data transformation functions and mappings and do all the inserts/updates or whatever you need to do.
For MySQL to NetSuite, PHP scripts can be written to access MySQL and NetSuite. On the NetSuite side, you can either use SOAP web services or write custom REST APIs within NetSuite. SOAP is probably a bit slower than REST, but with REST you have to write the API yourself (in server-side JavaScript; it's not hard, but there's a learning curve).
Hope this helps.
I'm an IBM i programmer; try CPYTOIMPF to create a pretty generic CSV file. It will go to a stream file; if you have NetServer running, you can map a network drive to the IFS directory, or you can use FTP to get the CSV file from the IFS to another machine in your network.
Try Adeptia's NetSuite integration tool to perform ETL. You can also try Pentaho ETL for this (as far as I know, Celigo's NetSuite connector is built upon Pentaho). Jitterbit also has an extension for NetSuite.
We primarily have 2 options to pump data into NS:
i) SuiteTalk ---> with which we can have SOAP-based transformations. There are 2 versions of SuiteTalk: synchronous and asynchronous. Typical tools like Boomi/Mule/Jitterbit use synchronous SuiteTalk to pump data into NS. They also have decent editors to help you do the mapping.
ii) RESTlets ---> typical REST-based architectures provided by NS can also be used, but you may have to write external brokers to communicate with them.
Depending on your need, you can use whichever fits. In most cases you will be using SuiteTalk to bring data into NetSuite.
Hope this helps ...
We just got done doing this. We used an iPaaS platform called Jitterbit (similar to Dell Boomi). It can connect to MySQL and to NetSuite, and you can do transformations in the tool. I have been really impressed with the platform overall so far.
There are different approaches; I like the following to process a batch job:
To import data to NetSuite:
1. Export a CSV from the old system and place it in a NetSuite File Cabinet folder (use a RESTlet or web services for this).
2. Run a scheduled script to load the files in the folder and update the records.
3. Don't forget to handle errors. Ways to handle errors: send an email, create a custom record, log to a file, or write to the record.
4. Once a file has been processed, move it to another folder or delete it.
To export data out of NetSuite:
1. Gather the data and export it to a CSV (you can use a saved search or similar).
2. Place the CSV in a File Cabinet folder.
3. From an external server, call web services or a RESTlet to grab the new CSV files in the folder (see the sketch after these steps).
4. Process the file.
5. Handle errors.
6. Call web services or a RESTlet to move or delete the CSV file.
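For the "call web services or a RESTlet" steps, a rough external-side sketch in Python; the RESTlet URL, script/deploy IDs, credentials, and the response shape are all hypothetical, and the NLAuth shown here is NetSuite's legacy authentication scheme (token-based auth is also available):

    import requests

    # Hypothetical RESTlet deployment that returns the new CSV files in a folder
    URL = "https://rest.netsuite.com/app/site/hosting/restlet.nl?script=123&deploy=1"

    headers = {
        "Content-Type": "application/json",
        # Legacy NLAuth header; fill in your own account, credentials, and role
        "Authorization": ("NLAuth nlauth_account=ACCT123, "
                          "nlauth_email=user@example.com, "
                          "nlauth_signature=PASSWORD, nlauth_role=3"),
    }

    resp = requests.get(URL, headers=headers)
    for f in resp.json():  # assumed shape: [{"name": ..., "contents": ...}, ...]
        with open(f["name"], "w") as out:
            out.write(f["contents"])
        # ... load into MySQL / transform here, then call the RESTlet again
        # to move or delete the processed file ...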
You can also use Pentaho Data Integration; it's free and the learning curve is not that difficult. I took this course and was able to play around with the tool within a couple of hours.