How to create a CSV with a gcloud command? - JSON

I am currently trying to search a group of ebooks to learn more about C#. The aim is to ask a question and get back a page from one or more of the ebooks to read. I went to the G Suite chat team and they kindly directed me to the Vision commands, which were easy enough to follow to produce multiple JSON files.
https://cloud.google.com/vision/docs/pdf
I want to feed these files into AutoML Natural Language. To do so, a CSV file is required.
I do not know how to create a CSV file that would get me past this point and I am currently stuck.
How do I create a CSV file using a gcloud command, and shouldn't the JSON files be JSONL files to be accepted?
Thanks in advance for your answers.

The output from the Vision API (service) is a JSON file written to Cloud Storage.
The input dataset to Auto ML expects the data to be in CSV format and stored in Cloud Storage.
This isn't a gcloud issue but a general data-transformation problem: transforming JSON to CSV.
Google Cloud includes services that could help you with this, but I suggest you start by writing a script that converts the data, i.e. loads and parses the JSON files and writes a CSV file in the format AutoML requires; a minimal sketch is below.
You may want to search to see whether others have done something similar and use their code as a starting point.
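A minimal sketch of such a script, assuming the Vision API output JSON files have been downloaded locally and that the AutoML dataset expects rows of the form "gs://path-to-text-file,label". The bucket path, label, and directory names are placeholders, and the exact CSV columns depend on the AutoML Natural Language dataset type, so check its docs:

```python
# Minimal sketch: turn downloaded Vision API output JSON into plain-text files
# plus an AutoML-style CSV index. The bucket path, label, and directory names
# are placeholders, and the exact CSV columns depend on the dataset type.
import csv
import json
from pathlib import Path

LABEL = "csharp_ebook"                        # placeholder label
OUTPUT_BUCKET = "gs://my-bucket/automl/"      # placeholder Cloud Storage path

Path("automl_texts").mkdir(exist_ok=True)
rows = []

for json_path in Path("vision_output").glob("*.json"):
    data = json.loads(json_path.read_text(encoding="utf-8"))
    # Async PDF annotation output keeps one entry per page under "responses".
    for i, page in enumerate(data.get("responses", [])):
        text = page.get("fullTextAnnotation", {}).get("text", "")
        if not text.strip():
            continue
        txt_name = f"{json_path.stem}_page{i}.txt"
        Path("automl_texts", txt_name).write_text(text, encoding="utf-8")
        # One CSV row per text file: its (future) Cloud Storage URI and a label.
        rows.append([OUTPUT_BUCKET + txt_name, LABEL])

with open("automl_dataset.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```

You would then upload the text files and the CSV to the bucket (e.g. with gsutil cp) and point the AutoML dataset import at the CSV.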
NOTE: if I understand correctly, your solution, while an interesting use of these technologies, may be overkill. If you're looking to learn the Vision API and AutoML, great. If not, most of this content is available more directly as searchable HTML and text on the web, and indeed Stack Overflow exists to answer developer questions on a myriad of topics, including C#.

Related

How to automate data extraction from Elasticsearch Dev Tools?

I have to do the following steps two or three times a day:
Log in to Elasticsearch.
Go to Dev Tools.
Run a specific query by selecting it and pressing Ctrl + Enter.
Select the results returned in the "buckets" and copy them.
Go to https://www.convertcsv.com/json-to-csv.htm and paste the results so the site converts them to CSV.
Download the CSV and import it into Google Sheets so I can view the results in a Looker dashboard.
This takes me some time every day, and I would like to know if there is any way I could automate this routine.
Maybe some ETL tool could perform at least part of the process, or maybe there is some more specific way to do it with Python.
Thanks in advance.
I don't have much experience with this, and I tried to search online for similar issues but couldn't really find anything useful.
I don't know if you have tried it, but there is a reporting tool in Elasticsearch under "Stack Management > Reporting". Apart from that, there are other tools you can run from a server with crontab. Here are some of them:
ES2CSV. A little bit old, but I think it can work for you. There are examples inside the docs folder; you can send queries via a file and export the report to CSV.
Another option, and my preference: Python's pandas library. You can write a script along the lines of this article and get a CSV export (see the sketch after this list). The article I mentioned explains it really well.
Another alternative is a library written in Java, but its documentation is a little weak.
Another Python alternative is elasticsearch-tocsv. It has been updated more recently than the first alternative, but its query samples are a little weak. There is a detailed article you can check, though.
You can use elasticdump, which is written in Node.js and is a great tool for exporting data from Elasticsearch. It has a CSV export option; you can see examples on its GitHub page.
I will try to find more and will update this answer from time to time. Thanks!
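For the pandas option, here is a minimal sketch; the host, index name, aggregation, and field are all placeholders you would replace with your own query and credentials:

```python
# Minimal sketch: run an aggregation against Elasticsearch and write the
# resulting buckets to CSV. Host, index, aggregation, and field are placeholders.
import pandas as pd
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder host; add auth as needed

query = {
    "size": 0,
    "aggs": {
        "my_buckets": {                                   # placeholder aggregation
            "terms": {"field": "status.keyword", "size": 1000}
        }
    },
}

# body= works with the 7.x client; newer clients prefer keyword arguments.
response = es.search(index="my-index", body=query)        # placeholder index
buckets = response["aggregations"]["my_buckets"]["buckets"]

# Each bucket looks like {"key": ..., "doc_count": ...}.
df = pd.json_normalize(buckets)
df.to_csv("report.csv", index=False)
```

Run it from cron (or any scheduler) and import report.csv into Google Sheets, or push it there directly with a library such as gspread.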

Parsing a JSON file with Excel, but only certain rows

I have a hospital pricing JSON file that management wants me to parse, but the file is over 4 million rows, and as all of you know, Excel can only handle 1 million lines. Fortunately, they only want pricing from a certain hospital group. I know how to do a basic parse of JSON files using Excel but don't know how to restrict the parse so it only pulls down data matching certain criteria.
I don't see a specific question here, so I'll give a broad answer. I don't think Excel is the right tool for the job. You're better off using a scripting or programming tool to filter out the rows you need from the JSON file. You can then also reuse the script you wrote when another one of these questions comes in. A simple, easy-to-use contender here is Python and its json module.
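A minimal sketch of that approach, assuming the file is one big JSON array of records and that the hospital group is identified by a field named "hospital_group" (both assumptions; adjust to the file's real structure):

```python
# Minimal sketch: filter a large pricing JSON file down to one hospital group
# and write the matching records to CSV. The field name and value are assumptions.
import csv
import json

TARGET_GROUP = "Example Hospital Group"       # placeholder value

with open("hospital_pricing.json", encoding="utf-8") as f:
    records = json.load(f)                    # assumes the file is one JSON array

matching = [r for r in records if r.get("hospital_group") == TARGET_GROUP]

if matching:
    with open("filtered_pricing.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(matching[0].keys()))
        writer.writeheader()
        writer.writerows(matching)
```

If the file is too large to hold in memory, or is newline-delimited JSON, switch to reading it line by line or to a streaming parser such as ijson; the filtered result should then fit comfortably in Excel.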

How to send data from Spark to my Angular8 project

The technologies I am using to fetch data from my MySQL database are Spark 2.4.4 and Scala. I want to display that data in my Angular 8 project. Any help on how to do it? I could not find any documentation regarding this.
I am not sure this is a Scala/Spark question; it sounds more like a question about the system design of your project.
One solution is to have your Angular 8 app read directly from MySQL. There are tons of tutorials online.
Another solution is to use Spark/Scala to read the data and dump it to a CSV/JSON file somewhere, then have Angular 8 read that file (a sketch follows below). The pro is that you can do some transformation before displaying your data; the con is that there is latency between transformation and display. After reading the flat file as JSON, it is up to you how to render that data on the user's screen.
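A rough sketch of that second option, written in PySpark rather than Scala purely for illustration (the Scala API is essentially the same calls); the JDBC URL, table, columns, and output path are placeholders:

```python
# Minimal sketch: read a MySQL table with Spark and write JSON that a frontend
# (e.g. Angular) can fetch over HTTP. Connection details are placeholders, and
# the MySQL JDBC driver must be on the Spark classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-to-json").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/mydb")   # placeholder URL
    .option("dbtable", "my_table")                       # placeholder table
    .option("user", "user")
    .option("password", "password")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .load()
)

# Optional transformation step before exposing the data.
result = df.select("id", "name", "price")                # placeholder columns

# coalesce(1) keeps a single part file inside the output directory; serve that
# file (or copy it to your web server / object storage) for Angular to fetch.
result.coalesce(1).write.mode("overwrite").json("/var/www/data/report_json")
```

Note that Spark writes a directory of part files rather than a single file, so either serve the part file inside it or merge/rename it as a post-processing step.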

How can I publish CSV data as Linked Data on the web?

My work is mainly focused on converting CSV data to the RDF data format. After getting the RDF data, I need to publish it as Linked Data on the web. I want to do the CSV-to-RDF conversion myself in Java, and then publish the resulting RDF as Linked Data on the web using any suitable tools. Can anyone help me find a way to do this, or give me any suggestions or references? Which tools should I use for this work? Thanks.
You can publish your RDF in a variety of ways. Here is a common reference where they explain the steps, software tools and examples: http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
In a nutshell, once you have your RDF data, you should think about the following:
1) Which tool/set of tools do I want to use to store my RDF data? For instance, I commonly use Virtuoso because I can use it for free and it facilitates the creation of the endpoint. But you can use Jena TDB, Allegro Graph, or many other triple stores.
2) Which tool do I use to make my data dereferenceable? For example, I use Pubby because I can configure it easily, but you can use Jena TDB (from the previous step) + Fuseki + Snorql for the same purpose. See the reference above for more information on the links and features of each tool.
3) Which datasets should I link to? (i.e., which data from other datasets do I reference, in order to make my dataset part of the Linked Data cloud?)
4) How should I link to these datasets? For example, the SILK framework can be used to analyze which of the URIs of your dataset are owl:sameAs other URIs in the target dataset of your choice.
Many people just publish their RDF in their endpoints, without linking it to other datasets. Although this follows the Linked Data principles (http://www.w3.org/DesignIssues/LinkedData.html), it is always better to link to other existing URIs when possible.
This is a short summary, assuming you already have the RDF data created. I hope it helps.
You can use Tarql (https://tarql.github.io/) or if you want to do more advanced mapping you can use SparqlMap (http://aksw.org/Projects/SparqlMap).
In both cases you will end up having a SPARQL endpoint which you can make available on-line and people can query your data.
Making each data item available under its own URL is a very good idea, following the Linked Data principles mentioned by @daniel-garijo in the other answer: http://www.w3.org/DesignIssues/LinkedData.html.
So you can also publish each data item, with all of its properties, in an individual file.
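If you do want to write the CSV-to-RDF step yourself, here is a rough sketch of the idea using Python's rdflib (you asked about Java; Apache Jena offers the equivalent API, and the mapping logic is the same). The column names, namespace, and vocabulary are placeholders:

```python
# Minimal sketch: turn CSV rows into RDF triples and serialize them as Turtle.
# The column names, base namespace, and vocabulary are placeholders.
import csv

from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/resource/")   # placeholder namespace
SCHEMA = Namespace("http://schema.org/")

g = Graph()
g.bind("ex", EX)
g.bind("schema", SCHEMA)

with open("people.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):                # assumes columns: id, name, city
        subject = EX[row["id"]]                  # one URI per CSV row
        g.add((subject, RDF.type, SCHEMA.Person))
        g.add((subject, SCHEMA.name, Literal(row["name"])))
        g.add((subject, SCHEMA.addressLocality, Literal(row["city"])))

g.serialize(destination="people.ttl", format="turtle")
```

The resulting Turtle file can then be loaded into whichever triple store you choose and exposed through its SPARQL endpoint, as described in the steps above.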

NetSuite Migrations

Has anyone had much experience with data migration into and out of NetSuite? I have to export DB2 tables into MySQL, manipulate the data, and then export it in a CSV file. Then I take a CSV file of accounts and manipulate the data again so that accounts match up from our old system to the new one. Has anyone tried to do this in MySQL?
A couple of options:
Invest in a data transformation tool that connects to NetSuite and DB2 or MySQL. Look at Dell Boomi, IBM Cast Iron, etc. These tools allow you to connect to both systems, define the data to be extracted, perform data transformation functions and mappings and do all the inserts/updates or whatever you need to do.
For MySQL to NetSuite, php scripts can be written to access MySQL and NetSuite. On the NetSuite side, you can either do SOAP web services, or you can write custom REST APIs within NetSuite. SOAP is probably a bit slower than REST, but with REST, you have to write the API yourself (server side JavaScript - it's not hard, but there's a learning curve).
Hope this helps.
I'm an IBM i programmer; try CPYTOIMPF to create a pretty generic CSV file. It will go to a stream file; if you have NetServer running you can map a network drive to the IFS directory, or you can use FTP to get the CSV file from the IFS to another machine on your network.
Try Adeptia's NetSuite integration tool to perform ETL. You can also try Pentaho ETL for this (as far as I know, Celigo's NetSuite connector is built upon Pentaho). Jitterbit also has an extension for NetSuite.
We primarily have two options to pump data into NetSuite:
i) SuiteTalk: SOAP-based web services. There are two versions of SuiteTalk, synchronous and asynchronous.
Typical tools like Boomi/Mule/Jitterbit use synchronous SuiteTalk to pump data into NetSuite. They also have decent editors to help you do the mapping.
ii) RESTlets: the REST-based architecture offered by NetSuite can also be used, but you may have to write external brokers to communicate with them.
Depending on your needs, you can use whichever fits; in most cases you will be using SuiteTalk to bring data into NetSuite.
Hope this helps ...
We just got done doing this. We used an iPaaS platform called Jitterbit (similar to Dell Boomi). It can connect to MySQL and to NetSuite, and you can do transformations in the tool. I have been really impressed with the platform overall so far.
There are different approaches; I like the following for processing a batch job:
To import data into NetSuite:
Export a CSV from the old system and place it in a NetSuite File Cabinet folder (use a RESTlet or web services for this).
Run a scheduled script to load the files in the folder and update the records.
Don't forget to handle errors. Ways to handle errors: send an email, create a custom record, log to a file, or write to the record.
Once a file has been processed, move it to another folder or delete it.
To export data out of NetSuite:
Gather the data and export it to a CSV (you can use a saved search or similar).
Place the CSV in a File Cabinet folder.
From an external server, call web services or a RESTlet to grab new CSV files from the folder (see the sketch below).
Process the file.
Handle errors.
Call web services or a RESTlet to move or delete the CSV file.
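As a rough illustration of the "external server calls a RESTlet" step, here is a minimal Python sketch. The URL, script/deploy IDs, and Authorization header are placeholders; a real call needs a valid Token-Based Authentication (OAuth 1.0) signature or whatever auth your account uses, and the RESTlet itself has to be written to return the CSV:

```python
# Minimal sketch: call a (hypothetical) NetSuite RESTlet from an external
# server to fetch a CSV that a saved search placed in the File Cabinet.
# The URL, script/deploy IDs, and Authorization header are placeholders;
# real calls need a proper OAuth 1.0 (Token-Based Authentication) signature.
import requests

RESTLET_URL = (
    "https://ACCOUNT_ID.restlets.api.netsuite.com/app/site/hosting/"
    "restlet.nl?script=123&deploy=1"                      # placeholder IDs
)
HEADERS = {
    "Authorization": "OAuth ...signed header goes here...",  # placeholder
    "Content-Type": "application/json",
}

response = requests.get(RESTLET_URL, headers=HEADERS, timeout=60)
response.raise_for_status()

# Assumes the RESTlet returns the CSV contents in its response body.
with open("netsuite_export.csv", "w", encoding="utf-8") as f:
    f.write(response.text)
```

From there, the "process file" and "handle errors" steps are ordinary scripting on your side.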
You can also use Pentaho Data Integration; it's free and the learning curve is not that steep. I took this course and was able to play around with the tool within a couple of hours.