Load CSV data as RDF using Ontorefine CLI

I'm trying to programmatically add a CSV file that's generated every day to a GraphDB repository. I have already created the CSV-to-RDF mapping using Ontorefine. How does one use the CSV and the mapping now to add RDF triples programmatically?

Use the open source CLI https://github.com/Ontotext-AD/ontorefine-client (that's probably what #aksanoble refers to).
Please note that the CLI is not yet available in Ontotext Refine 1.0 (which was split off from GraphDB), and will be available in September. In the meantime, you could use GraphDB 9.11.
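For the "programmatically" part of the question, here is a minimal sketch in Python. The CLI invocation is only a placeholder (check the ontorefine-client documentation for the exact command and flags for applying a saved mapping); the upload step uses GraphDB's standard RDF4J-style REST endpoint /repositories/{repo}/statements, which accepts Turtle.

```python
# Minimal sketch: convert the daily CSV to RDF with the ontorefine-client CLI,
# then POST the resulting Turtle file to a GraphDB repository.
# NOTE: the CLI command and flags below are placeholders -- check the
# ontorefine-client docs for the exact subcommand that applies a saved mapping.
import subprocess
import requests

GRAPHDB_URL = "http://localhost:7200"   # assumption: local GraphDB instance
REPOSITORY = "my-repo"                  # assumption: target repository id

# 1) Hypothetical CLI call that applies the saved mapping to today's CSV
#    and writes Turtle to daily.ttl.
subprocess.run(
    ["ontorefine-cli", "transform", "daily.csv",
     "--mapping", "mapping.json", "--output", "daily.ttl"],
    check=True,
)

# 2) Upload the Turtle file; POST to /repositories/{id}/statements adds triples
#    without clearing the repository (use PUT instead to replace the contents).
with open("daily.ttl", "rb") as f:
    resp = requests.post(
        f"{GRAPHDB_URL}/repositories/{REPOSITORY}/statements",
        data=f,
        headers={"Content-Type": "text/turtle"},
    )
resp.raise_for_status()
print("Upload status:", resp.status_code)
```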
We are working on extended ETL pipeline scenarios, including:
Reuse of cleaning and transformation scripts between projects
Running all cleaning, transformation, and RDF data update or download steps on a new dataset automatically
BTW, is your file stored locally or accessed through a URL? We have an idea to handle the latter case specially.

Related

Configure Apache Drill to read XML files in the MapR distribution

I have a project where I need to read XML files with Apache Drill and process them. Can someone tell me how I can configure it?
NB: I use the MapR distribution.
I tried to add the configuration in the configuration UI, but I get an error (see image).
Thanks in advance
You'll need to use a Drill distribution based on Apache Drill >= 1.19 for the XML format plugin.
So this is more of a Drill question than a MapR question.
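As an illustration, on Drill 1.19+ the XML reader is enabled by adding an "xml" entry to the formats of a file-based storage plugin. Below is a hedged Python sketch that pushes such a config through Drill's REST API on the default web port 8047; the plugin name, connection, workspace path, and dataLevel value are assumptions to adapt for your cluster.

```python
# Sketch: register/update a file storage plugin that includes the XML format
# via Drill's REST API (default web UI port 8047). Plugin name, connection,
# workspace path and dataLevel are assumptions -- adjust for your environment.
import requests

DRILL_URL = "http://localhost:8047"     # assumption: local Drillbit web UI
plugin = {
    "name": "dfs",
    "config": {
        "type": "file",
        "connection": "maprfs:///",     # or "file:///" for a local/NFS mount
        "workspaces": {
            "xmldata": {
                "location": "/data/xml",
                "writable": False,
                "defaultInputFormat": "xml",
            }
        },
        "formats": {
            # dataLevel controls how many outer wrapper levels to skip;
            # adjust it for the structure of your XML files.
            "xml": {"type": "xml", "extensions": ["xml"], "dataLevel": 1}
        },
        "enabled": True,
    },
}

resp = requests.post(f"{DRILL_URL}/storage/dfs.json", json=plugin)
resp.raise_for_status()
print(resp.json())
```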
There are two key steps here:
make sure that Drill can access whatever you use to store your data (it sounds like your data is XML files in MapR, which is now called HPE Ezmeral Data Fabric)
make sure that Drill can understand the data you have. I am not current on Drill, but reading many kinds of XML should be doable.
For getting access, there are two major paths to accessing files on Ezmeral Data Fabric. One path is to mount the data fabric as a conventional file system on all the nodes running Drillbits. This is often done using NFS mounts, but can also be done with the FUSE driver provided with the data fabric.
The other major approach to getting data access is to use the HDFS API framework to access data via maprfs://... path names. This requires installing the data fabric client on all of the nodes running Drillbits.
It sounds like you are running the version of Drill that is packaged with the old MapR or current HPE Ezmeral system. This is the easiest approach since the packaged version is integrated with the client libraries needed to use the HDFS API with maprfs:// resources (it also provides access to the tables and streams in the data fabric).
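To tie the two parts together, here is a hedged sketch of running a query over XML files on the data fabric through Drill's REST query endpoint. The workspace name and file path are assumptions; with the MapR/Ezmeral-packaged Drill they would resolve through maprfs://.

```python
# Sketch: run a SQL query over XML files through Drill's REST API.
# The dfs.xmldata workspace and the file name are placeholders.
import requests

DRILL_URL = "http://localhost:8047"     # assumption: local Drillbit web UI
query = {
    "queryType": "SQL",
    "query": "SELECT * FROM dfs.xmldata.`orders.xml` LIMIT 10",
}

resp = requests.post(f"{DRILL_URL}/query.json", json=query)
resp.raise_for_status()
for row in resp.json().get("rows", []):
    print(row)
```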

Backup Core Data, one entity only

My application requires some kind of data backup and some kind of data exchange between users, so what I want to achieve is the ability to export an entity but not the entire database.
I have found some help but for the full database, like this post:
Backup core data locally, and restore from backup - Swift
This applies to the entire database.
I tried to export a JSON file; this might work, except that the entity I'm trying to export contains images as binary data.
So I'm stuck.
Any help with exporting just one entity rather than the full database, or with writing a JSON file that includes binary data, would be appreciated.
Take a look at protobuf. Apple has an official Swift library for it:
https://github.com/apple/swift-protobuf
Protobuf is an alternate encoding to JSON that has direct support for serializing binary data. There are client libraries for any language you might need to read the data in, or command-line tools if you want to examine the files manually.

Can I export all of my JSON documents of a collection to a CSV in MarkLogic?

I have millions of documents in different collections in my database. I need to export them to a CSV on my local storage when I specify the collection name.
I tried mlcp export but it didn't work. We cannot use CORB for this because of some issues.
I want the CSV to be in such a format that if I run an mlcp import, I can restore all the docs just the way they were.
My first thought would be to use the MLCP archive feature, and not export to a CSV at all.
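A hedged sketch of what that could look like, driving mlcp from Python: the host, port, credentials, paths, and collection name are placeholders, and the flag names should be double-checked against your mlcp version.

```python
# Sketch: export a collection as an MLCP archive and re-import it later.
# An archive preserves document content plus metadata (collections, permissions,
# properties), so a round trip restores docs "just the way they were".
# Host/port/credentials/paths/collection below are placeholders.
import subprocess

COMMON = ["-host", "localhost", "-port", "8000",
          "-username", "admin", "-password", "admin"]

# Export every document in the collection to a compressed archive.
subprocess.run(
    ["mlcp.sh", "export", *COMMON,
     "-output_type", "archive",
     "-output_file_path", "/backups/my-collection-archive",
     "-collection_filter", "my-collection",
     "-compress", "true"],
    check=True,
)

# Later: restore the archive exactly as it was exported.
subprocess.run(
    ["mlcp.sh", "import", *COMMON,
     "-input_file_type", "archive",
     "-input_file_path", "/backups/my-collection-archive"],
    check=True,
)
```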
If you really want CSV, Corb2 would be my first thought. It provides CSV export functionality out of the box. It might be worth digging into why that didn't work for you.
DMSDK might work too, but involves writing code that handles the writing of CSV, which sounds cumbersome to me.
Last option that comes to mind would be Apache NiFi for which there are various MarkLogic Processors. It allows orchestration of data flow very generically. It could be rather overkill for your purpose though.
HTH!
ml-gradle has support for exporting documents and referencing a transform, which can convert each document to CSV - https://github.com/marklogic-community/ml-gradle/wiki/Exporting-data#exporting-data-to-csv .
Unless all of your documents are flat, you likely need some custom code to determine how to map a hierarchical document into a flat row. So a REST transform is a reasonable solution there.
You can also use a TDE template to project your documents into rows, and the /v1/rows endpoint can return results as CSV. That of course requires creating and loading a TDE template, and then waiting for the matching documents to be re-indexed.
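For that last option, here is a hedged Python sketch of pulling CSV from the /v1/rows endpoint. The schema/view names are placeholders for whatever your TDE template defines, and the port, credentials, and digest auth are assumptions about a typical REST app server.

```python
# Sketch: ask /v1/rows to evaluate an Optic query over a TDE view and return CSV.
# Schema/view names, port and credentials are placeholders; the TDE template
# must already be loaded and the matching documents re-indexed.
import requests
from requests.auth import HTTPDigestAuth

ML_URL = "http://localhost:8000"        # assumption: REST app server port
optic_query = "op.fromView('mySchema', 'myView')"   # Optic Query DSL

resp = requests.post(
    f"{ML_URL}/v1/rows",
    data=optic_query,
    headers={
        "Content-Type": "application/vnd.marklogic.querydsl+javascript",
        "Accept": "text/csv",
    },
    auth=HTTPDigestAuth("admin", "admin"),
)
resp.raise_for_status()

# Write the returned rows to a local CSV file.
with open("export.csv", "w", encoding="utf-8") as f:
    f.write(resp.text)
```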

QlikView and Qlik Sense vs MSBI

This question may seem very basic, but I'm actually struggling to get it clear in my head.
I have some academic experience with SSIS, SSAS and SSRS.
In simple terms:
SSIS - Integration of data from a data source to a data destination;
SSAS - Building a cube of data, which allows you to analyze and explore the data;
SSRS - Allows you to create dashboards and reports with charts, etc. from the data sources.
Now, doing a comparison with Qlikview and Qliksense...
Can the Qlik products do exactly the same as SSIS, SSAS, and SSRS? That is, can Qlik products do the extraction (SSIS), data processing (SSAS), and data visualization (SSRS)? Or do they just work more on the SSRS side (creating dashboards from the data sources)? Do the Qlik tools do the ETL stages (extract, transform, and load)?
I'm really struggling here, even after reading tons of information about it, so any clarification helps a lot!
Thanks,
Anna
Yes. Qlik (View and Sense) can be used as an ETL tool and presentation layer. Each single file (qvw for QlikView, qvf for Qlik Sense) contains the script used for ETL (loading all the required data from all data sources and transforming it if needed), the actual data, and the visuals.
Depending on the complexity, a single file can be used for everything, but the process can also be organised across multiple files (if needed). For example:
Extract - contains the script for data extraction (eventually with incremental load implemented if the data volumes are big) and stores the data in qvd files
Transform - loads the qvd files from the extraction process (qvd load is quite fast) and performs the required transformations
Load - loads the data model from the transformation file (binary load) and creates the visualisations
Another example of multiple files: I had a project which required multiple extractors and multiple transformation files. Because the data was extracted from multiple data sources, to speed up the process we ran all the extractor files at the same time, then all the transformation files at the same time, and then the main transform, which combined all the qvd files into a single data model.
In addition to the previous answer, have a look at the layered Qlik architecture.
It describes quite well how you should structure your files.
However, I would not recommend using Qlik for a full-blown data warehouse (which you could do with SSIS easily), as it lacks some useful features (e.g. helpers for slowly changing dimensions).

NetSuite Migrations

Has anyone had much experience with data migration into and out of NetSuite? I have to export DB2 tables into MySQL, manipulate the data, and then export it in a CSV file. Then I take a CSV file of accounts and manipulate the data again so that accounts from our old system match up with the new one. Has anyone tried to do this in MySQL?
A couple of options:
Invest in a data transformation tool that connects to NetSuite and DB2 or MySQL. Look at Dell Boomi, IBM Cast Iron, etc. These tools allow you to connect to both systems, define the data to be extracted, perform data transformation functions and mappings and do all the inserts/updates or whatever you need to do.
For MySQL to NetSuite, PHP scripts can be written to access MySQL and NetSuite. On the NetSuite side, you can either use SOAP web services or write custom REST APIs (RESTlets) within NetSuite. SOAP is probably a bit slower than REST, but with REST you have to write the API yourself (server-side JavaScript; it's not hard, but there's a learning curve). A sketch of calling such a RESTlet from an external script is shown below this answer.
Hope this helps.
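For the RESTlet option, here is a hedged sketch of calling a custom RESTlet from an external script (Python here rather than PHP, but the pattern is the same). NetSuite RESTlets use token-based authentication (OAuth 1.0a with HMAC-SHA256); the account ID, script/deployment IDs, credentials, and payload fields below are all placeholders, and the RESTlet itself must already be written and deployed in NetSuite.

```python
# Sketch: push a row from MySQL into NetSuite through a custom RESTlet.
# Account ID, script/deploy IDs and all credentials are placeholders; the
# server-side RESTlet (SuiteScript) must be deployed first.
# Requires: requests, requests_oauthlib (with a recent oauthlib for HMAC-SHA256).
import requests
from requests_oauthlib import OAuth1

ACCOUNT = "1234567"                     # NetSuite account id (placeholder)
RESTLET_URL = (
    f"https://{ACCOUNT}.restlets.api.netsuite.com"
    "/app/site/hosting/restlet.nl?script=123&deploy=1"   # placeholder ids
)

auth = OAuth1(
    client_key="CONSUMER_KEY",
    client_secret="CONSUMER_SECRET",
    resource_owner_key="TOKEN_ID",
    resource_owner_secret="TOKEN_SECRET",
    realm=ACCOUNT,
    signature_method="HMAC-SHA256",
)

# Example payload; the fields depend entirely on what your RESTlet expects.
payload = {"externalId": "CUST-001", "companyName": "Acme Ltd"}
resp = requests.post(RESTLET_URL, json=payload, auth=auth)
resp.raise_for_status()
print(resp.json())
```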
I'm an IBM i programmer; try CPYTOIMPF to create a pretty generic CSV file. It will go to a stream file; if you have NetServer running you can map a network drive to the IFS directory, or you can use FTP to get the CSV file from the IFS to another machine in your network.
Try Adeptia's NetSuite integration tool to perform ETL. You can also try Pentaho ETL for this (as far as I know, Celigo's NetSuite connector is built upon Pentaho). Jitterbit also has an extension for NetSuite.
We primarily have 2 options to pump data into NetSuite:
i) SuiteTalk ---> SOAP-based web services; there are 2 versions of SuiteTalk, synchronous and asynchronous.
Typical tools like Boomi/Mule/Jitterbit use synchronous SuiteTalk to pump data into NetSuite. They also have decent editors to help you do the mapping.
ii) RESTlets ---> REST-based endpoints provided by NetSuite can also be used, but you may have to write external brokers to communicate with them.
Depending on your need, you can use either. In most cases you will be using SuiteTalk to bring data into NetSuite.
Hope this helps ...
We just got done doing this. We used an iPaaS platform called Jitterbit (similar to Dell Boomi). It can connect to MySQL and to NetSuite, and you can do transformations in the tool. I have been really impressed with the platform overall so far.
There are different approaches; I like the following for processing a batch job:
To import data into NetSuite:
Export a CSV from the old system and place it in a NetSuite File Cabinet folder (use a RESTlet or web services for this).
Run a scheduled script to load the files in the folder and update the records.
Don't forget to handle errors. Ways to handle errors: send an email, create a custom record, log to a file, or write to a record.
Once a file has been processed, move it to another folder or delete it.
To export data out of NetSuite:
Gather the data and export it to a CSV (you can use a saved search or similar).
Place the CSV in a File Cabinet folder.
From an external server, call web services or a RESTlet to grab the new CSV files in the folder.
Process each file.
Handle errors.
Call web services or the RESTlet to move or delete the CSV file.
You can also use Pentaho Data Integration; it's free and the learning curve is not that steep. I took this course and was able to play around with the tool within a couple of hours.