I am developing a prototype on Google Cloud Platform for which I am using Cloud Storage, App Engine and BigQuery.
One of the tasks is to load a file daily from Cloud Storage into BigQuery, for which I am using a cron task on App Engine.
The problem is that BigQuery expects the data to be in NDJSON (newline-delimited JSON) format, whereas my source file is in normal JSON format.
Currently I download the file to my laptop, convert it to NDJSON and then upload it to BigQuery, but how do I do this programmatically on Google Cloud Platform? I am hoping there is something available that I can use, as I do not want to write it from scratch.
This might be useful to others. Here is how I did it, but let me know if there is a better or easier way.
You need to download the Cloud Storage Java API and its dependencies (the HTTP client API and OAuth API):
https://developers.google.com/api-client-library/java/apis/
You also need a JSON parser such as Jackson.
Steps:
1> Read the JSON file as an InputStream using the Cloud Storage Java API
Storage.Objects.Get getObject = client.objects().get("shiladityabucket", "abc.json");
InputStream input = getObject.executeMediaAsInputStream();
2> Convert it into an array of Java objects (the JSON file in my case has multiple records). If it's a single record, the array is not needed.
ObjectMapper mapper = new ObjectMapper();
BillingInfo[] infoArr = mapper.readValue(input, BillingInfo[].class);
3> Create a StorageObject to upload to Cloud Storage
StorageObject objectMetadata = new StorageObject()
    // Set the destination object name
    .setName("abc.json")
    // Set the access control list to publicly read-only
    .setAcl(Arrays.asList(
        new ObjectAccessControl().setEntity("allUsers").setRole("READER")));
4> Iterate over the objects in the array and convert each one to a JSON string, appending a newline to produce NDJSON.
StringBuilder jsonBuilder = new StringBuilder();
for (BillingInfo info : infoArr) {
    jsonBuilder.append(mapper.writeValueAsString(info)).append("\n");
}
5> Create an InputStream to insert using the Cloud Storage Java API
InputStream is = new ByteArrayInputStream(jsonBuilder.toString().getBytes());
InputStreamContent contentStream = new InputStreamContent(null, is);
6> Upload the file
Storage.Objects.Insert insertRequest = client.objects().insert(
"shiladitya001", objectMetadata, contentStream);
insertRequest.execute();
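With the NDJSON object back in Cloud Storage, the daily cron handler still has to load it into BigQuery. Below is a minimal, untested sketch of that step using the same old-style Google API client as above; it assumes an already-authorized Bigquery client and uses placeholder project, dataset and table names.
import java.util.Collections;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.TableReference;

public class LoadNdjsonSketch {
    // "bigquery" is assumed to be an already-authorized Bigquery client,
    // built the same way as the Cloud Storage "client" used above.
    static void loadNdjsonIntoBigQuery(Bigquery bigquery) throws Exception {
        JobConfigurationLoad load = new JobConfigurationLoad()
            // the NDJSON object written in step 6 (bucket/object names as used above)
            .setSourceUris(Collections.singletonList("gs://shiladitya001/abc.json"))
            .setSourceFormat("NEWLINE_DELIMITED_JSON")
            .setWriteDisposition("WRITE_APPEND") // append each daily load; adjust as needed
            .setDestinationTable(new TableReference()
                .setProjectId("your-project-id") // placeholder
                .setDatasetId("your_dataset")    // placeholder
                .setTableId("billing_info"));    // placeholder

        Job job = new Job().setConfiguration(new JobConfiguration().setLoad(load));
        bigquery.jobs().insert("your-project-id", job).execute();
    }
}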
I need to convert a JSON message to Avro so that I can send it to Kafka.
I have a sample JSON request and the Avro schema from an .avsc file.
Any ideas how I can do this please?
Thanks
You will need:
To download the Avro Java libraries and put them (along with their dependencies) into the JMeter classpath
To restart JMeter to pick up the libraries
To add a suitable JSR223 Test Element with the relevant Groovy code to perform the conversion; a piece of example code is below:
String schemaJson = 'your schema here'
String genericRecordStr = 'your json payload here'
// parse the Avro schema
def schemaParser = new org.apache.avro.Schema.Parser()
def schema = schemaParser.parse(schemaJson)
// decode the JSON payload against the schema
def decoderFactory = org.apache.avro.io.DecoderFactory.get()
def decoder = decoderFactory.jsonDecoder(schema, genericRecordStr)
// read it into a GenericRecord
def reader = new org.apache.avro.generic.GenericDatumReader(schema)
def record = reader.read(null, decoder)
To send the message to Kafka: JMeter doesn't support Kafka out of the box, so a couple of options are described in the Apache Kafka - How to Load Test with JMeter article
Assuming by "Avro", you are using the Schema Registry. If you're able to install the Kafka REST Proxy, you could send JSON events to it, then the proxy can be configured to integrate with the Registry to send Avro objects to the brokers
Or you can write a custom sampler, as linked in the other answer, but you'd swap the StringSerializer for an Avro Serializer implementation (you'd need to add that to the Jmeter classpath)
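As a rough illustration of that second option (not something from either answer), here is a minimal Java sketch of a producer that swaps StringSerializer for Confluent's KafkaAvroSerializer. The broker address, Schema Registry URL, topic name, schema file and field name are placeholders/assumptions.
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumption: local broker
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // Avro serializer from Confluent's kafka-avro-serializer artifact
        props.put("value.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");   // assumption: local Schema Registry

        // the same .avsc schema used for the JSON-to-Avro conversion (placeholder path)
        Schema schema = new Schema.Parser().parse(new java.io.File("your-schema.avsc"));

        // in practice this would be the GenericRecord decoded from the JSON payload
        GenericRecord record = new GenericData.Record(schema);
        // record.put("someField", someValue);  // populate the fields the schema requires (hypothetical field name)

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("your-topic", record));  // "your-topic" is a placeholder
        }
    }
}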
I am new to Kafka.
Here are the steps I would like to perform in Kafka:
Connect to Kafka and input data from a JSON file (I am familiar with this part)
Publish to a Kafka topic
Extract a subset of the data from the topic and publish it to a new topic
Extract data from that new topic and output to a JSON file
Note: my coding preference is Python.
If you want to use Python, you can use Faust to map data between topics.
By default, it uses JSON serialization, and if you want to write the output to a new file, you consume the topic through a stream.
I created a project from this tutorial.
How can I send a huge JSON file provided by the user to be read in Command.cs in Design Automation for Revit on the cloud? I receive the file in DesignAutomationController.cs using a form, but I am unable to pass it on to Command.cs because with the approach below the URL becomes far too large.
XrefTreeArgument inputJsonArgument = new XrefTreeArgument()
{
    Url = "data:application/json, " + ((JObject)inputJson).ToString(Formatting.None).Replace("\"", "'")
};
How huge is the JSON file? The workitem payload limit is only 16 KB.
We recommend embedded JSON only for small content. For anything big, you may upload the JSON content to cloud storage and pass a signed URL to the file as the input argument URL.
Design Automation API limits are defined here:
https://forge.autodesk.com/en/docs/design-automation/v3/developers_guide/quotas/
I'm new to this and am trying to understand Azure Logic Apps.
I would like to create a LogicApp that:
Looks for new XML-Files
and for each file:
Read the XML
Check if Node "attachment" is present
and for each Attachment:
Read the Filename
Get the File from FTP and do BASE64-encoding
End for each Attachment.
Write JSON File (I have a schema)
Do an HTTP POST to an API with the JSON file as "application/json"
Is this possible with Logic Apps?
Yes, you can.
Check if a node is present with an xpath expression (e.g. xpath(xml(item()),'string(//Part/@ref)'))
For Get File from FTP, use the action FTP - Get File Content
For Write JSON File, use the action Data Operations - Compose. If you need transformations, you have to use an Integration Account and Maps.
For Do HTTP Post to API, use the action HTTP
I have deployed an ASP.NET MVC app where I am trying to display CSV files, stored in Azure Blob storage, as tables.
I am having problems reading the files in a blob container. I couldn't find a solution in the Microsoft documentation.
My blob containers are public and maybe I could access them through their URL, but I don't know how to read the CSV files. Any ideas?
To read a CSV file stored in Azure Blob storage, you could refer to the following sample code.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connection string");
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
CloudBlockBlob blockBlobReference = container.GetBlockBlobReference("testdata.csv");
using (var reader = new StreamReader(blockBlobReference.OpenRead()))
{
    string row = "";
    while (!reader.EndOfStream)
    {
        // read a row of data from the csv file
        row = reader.ReadLine();
    }
}
My aim is to visualize real time data that comes into the blob storage.
It seems that you'd like to display CSV data as tables in clients' web pages in real time. ASP.NET SignalR can help you add real-time web functionality easily: in your WebJob function you can detect a CSV file under a specified blob container and call a hub method that reads the data from the CSV file and pushes it to the connected clients, and on the SignalR client side you update the UI based on the pushed CSV data.
Call the hub method inside your WebJob function:
var hub = new HubConnection("http://xxx/signalr/hubs");
var proxy = hub.CreateHubProxy("HubName");
hub.Start().Wait();
//invoke hub method
proxy.Invoke("PushData", "filename");
Hub method to push data to connected clients:
public void PushData(string filename)
{
    // read data from csv file (blob)
    // call javascript side function to populate (or update) tables with csv data
    Clients.All.UpdateTables(data);
}