Azure Functions Blob Storage date/time output path

I have an Azure Function with Blob Storage as an output.
My question is how to specify a {date}/{time} output path pattern from the Azure Function; I don't want to store all blobs flat in the container.
I tried mycontainername/{date}/{time}, but it complains, saying "No binding parameter exists for 'date'".
Thanks

You can use the datetime parameter resolver with the appropriate format string.
For example:
{datetime:yyyy} will resolve to 2017 (when run in 2017).
{datetime:hhmmss} will resolve to the hours, minutes, and seconds, with no separators.
The format strings are the ones supported by the .NET Framework, and you can learn more about them here (standard format strings are also supported).

You can now use this in Java Azure Functions as well:
{DateTime}
For example:
@BlobOutput(name = "blob", connection = "AzureWebJobsStorage", path = "samples-java/new-{DateTime}.zip") OutputBinding<byte[]> blob
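
Putting the two answers together, here is a fuller Java sketch. It is only a sketch: it assumes the {datetime:...} format-string syntax from the first answer also resolves in the Java worker (which I have not verified), and the function name, container, and file name are placeholders:

import com.microsoft.azure.functions.HttpMethod;
import com.microsoft.azure.functions.OutputBinding;
import com.microsoft.azure.functions.annotation.AuthorizationLevel;
import com.microsoft.azure.functions.annotation.BlobOutput;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.HttpTrigger;

public class DatedBlobFunction {
    // Writes the request body to mycontainername/<year>/<hhmmss>/output.txt,
    // so blobs are grouped into date/time folders rather than stored flat.
    @FunctionName("WriteDatedBlob")
    public void run(
            @HttpTrigger(name = "req", methods = {HttpMethod.POST},
                         authLevel = AuthorizationLevel.FUNCTION) String body,
            @BlobOutput(name = "blob",
                        connection = "AzureWebJobsStorage",
                        path = "mycontainername/{datetime:yyyy}/{datetime:hhmmss}/output.txt")
            OutputBinding<String> blob) {
        blob.setValue(body);
    }
}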

Related

How to store a CSV into a PCollection in Apache Beam

I have CSV data stored at some location. I want to read this data and store it in a PCollection, but I am unable to specify the type of the PCollection or the way the data should be stored in it.
As a quite straightforward way, you can read it as plain strings (e.g., TextIO for the Java SDK will return a PCollection<String>) and then parse it with your own PTransform into the required format, returning a PCollection<YourPOJO> (see an example).
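
For instance, here is a minimal Java sketch of that approach. It assumes a simple two-column (id, name) CSV with no quoted fields; the bucket path and POJO fields are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class CsvToPojo {
    // Hypothetical POJO; the two-column layout is an assumption.
    public static class YourPOJO implements java.io.Serializable {
        final int id;
        final String name;
        YourPOJO(int id, String name) { this.id = id; this.name = name; }
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        // Read each line of the CSV file as a String ...
        PCollection<String> lines =
                p.apply(TextIO.read().from("gs://my-bucket/data.csv"));
        // ... then parse the lines into POJOs with a DoFn
        // (naive split: no quoted-field or header handling here).
        PCollection<YourPOJO> rows = lines.apply(ParDo.of(new DoFn<String, YourPOJO>() {
            @ProcessElement
            public void processElement(@Element String line, OutputReceiver<YourPOJO> out) {
                String[] cols = line.split(",");
                out.output(new YourPOJO(Integer.parseInt(cols[0].trim()), cols[1].trim()));
            }
        }));
        p.run().waitUntilFinish();
    }
}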

Azure Logic Apps - Convert JSON Epoch Timestamp to DateTime String

I am working on an Azure Logic App that is triggered via an HTTP call and returns a JSON response. Internally, the logic app retrieves JSON data from a web API and then converts the response JSON data to a format that is acceptable to the calling client of the logic app.
The problem I'm having is that the web API returns dates in the format "/Date(1616371200000)/" and I need the date format to look like "2021-03-22T19:00:00Z". I don't see any built-in Logic App function that can work with or convert the Epoch timestamp as-is (unless I'm missing something).
To clarify...
Source Data:
{
"some_silly_date": "/Date(1616371200000)/"
}
Desired Data:
{
"some_silly_date": "2021-03-32T19:00:00Z"
}
The following solution would theoretically work if the source date wasn't wrapped with "/Date(...)/":
"#addToTime('1970-01-01T00:00:00Z', 1616371200000, 'Second')"
Parsing that text off the timestamp before converting it would lead to a really ugly expression. Is there a better way to convert the timestamp that I'm not aware of?
Note that using the Liquid JSON-to-JSON templates is not an option. I was using that approach and found the action apparently has to JIT-compile before use, which caused my logic app to time out when called after a long period of inactivity.
Can you get the value "/Date(1616371200000)/" from the JSON into a variable? If so, a little string manipulation would do the trick:
int(replace(replace(variables('data_in'),'/Date(',''),')/',''))
Then use the variable in the addToTime function.
The following expression seems to be working and returns a timestamp in UTC. Note that the substring() function uses a length of 10 instead of 13: I'm intentionally trimming off the milliseconds from the Epoch timestamp because the addToTime() function only handles seconds.
{
"some_silly_date": "#addToTime('1970-01-01T00:00:00Z', int(substring(item()?['some_silly_date'],6,10)), 'Second')"
}
As an added bonus, if the timestamp value is null in the source JSON, do the following:
{
"some_silly_date": "#if(equals(item()?['some_silly_date'], null), null, addToTime('1970-01-01T00:00:00Z', int(substring(item()?['some_silly_date'],6,10)), 'Second'))"
}
Quite possibly the ugliest code I've ever written.
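
For comparison, here is the same conversion written as plain Java, which may help when verifying the expression logic; the wrapper format and sample value come from the question:

import java.time.Instant;

public class EpochDateParser {
    // Strips the "/Date(...)/" wrapper and converts the epoch milliseconds
    // to an ISO-8601 UTC timestamp, mirroring the expressions above. Unlike
    // the Logic App expression, Instant.ofEpochMilli can keep the milliseconds.
    static String parseWcfDate(String wrapped) {
        String millis = wrapped.replace("/Date(", "").replace(")/", "");
        return Instant.ofEpochMilli(Long.parseLong(millis)).toString();
    }

    public static void main(String[] args) {
        System.out.println(parseWcfDate("/Date(1616371200000)/"));
        // prints 2021-03-22T00:00:00Z
    }
}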

Export as JSON using BigQueryToCloudStorageOperator

When I use the BigQuery console manually, I can see that the 3 options when exporting a table to GCS are CSV, JSON (Newline delimited), and Avro.
With Airflow, when using the BigQueryToCloudStorageOperator operator, what is the correct value to pass to export_format in order to transfer the data to GCS as JSON (newline delimited)? Is it simply JSON? All the examples I've seen online for BigQueryToCloudStorageOperator use export_format='CSV', never JSON, so I'm not sure what the correct value is. Our use case needs JSON, since the second task in our DAG (after transferring the data to GCS) is to load it from GCS into our MongoDB cluster with mongoimport.
I found that export_format='NEWLINE_DELIMITED_JSON' is required, after finding the documentation at https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationextract and referring to the allowed values for destinationFormat.
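
Not an Airflow snippet, but the same destinationFormat value can be seen in action with the BigQuery Java client; a minimal sketch, where the dataset, table, and GCS path are placeholders:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;

public class ExportTableAsJson {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        Table table = bigquery.getTable(TableId.of("my_dataset", "my_table"));
        // "NEWLINE_DELIMITED_JSON" is the destinationFormat value the Airflow
        // operator passes through to the underlying extract job.
        Job job = table.extract("NEWLINE_DELIMITED_JSON", "gs://my-bucket/export.json");
        job.waitFor(); // block until the extract job completes
    }
}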
According to the BigQuery documentation the three possible formats to which you can export BigQuery query results are: CSV, JSON, and Avro (and this is compatible with the UI drop-down menu).
I would try with export_format='JSON' as you already proposed.

Amazon Data Pipeline: List of acceptable types for Custom Format

I am using Data Pipeline to pipe a CSV from S3 into RDS.
As part of this process, I'm using a DataFormat which is CSV.
According to the documentation, I can have STRING, DATETIME, and INT.
Are there other types I can use, such as DATE or floating-point numbers?
Thanks!
You can use any data type that is supported by the database you are using. Data Pipeline does not impose any restrictions on the data types you can use.

Gemfire pdxInstance datatype

I am writing pdxInstances to GemFire using the sequence: rabbitmq => springxd => gemfire.
If I put the JSON {'ID':11,'value':5} into rabbitmq, value appears as a byte in GemFire. If I put {'ID':11,'value':500}, value appears as a word, and if I put {'ID':11,'value':50000} it appears as an integer.
A problem arises when I query the data from GemFire and order it. For example, a query such as select * from /my_region order by value fails, saying it cannot compare a byte with a word (or a byte with an integer).
Is there any way to declare the data type in JSON? Or any other method to avoid this problem?
To add a bit of insight into this problem... in reviewing GemFire/Geode source code, it would seem it is not possible to configure the desired value type and override GemFire/Geode's default behavior, which can be seen in JSONFormatter.setNumberField(..).
I will not explain how GemFire/Geode involves the JSONFormatter during a Region.put(key, value) operation as it is rather involved and beyond the scope of this discussion.
However, one could argue that the problem is not necessarily with the JSONFormatter class, since storing a numeric value in a byte is more efficient than storing the value in an integer, especially when the value would indeed fit into a byte. Therefore, the problem is really that the Comparator used in the Query processor should be able to compare numeric values in the same type family (byte, short, int, long), upcasting where appropriate.
If you feel so inclined, feel free to file a JIRA ticket in the Apache Geode JIRA repository at https://issues.apache.org/jira/browse/GEODE-72?jql=project%20%3D%20GEODE
Note, Apache Geode is the open source "core" of Pivotal GemFire now. See the Apache Geode website for more details.
Cheers!
Your best bet would be to take care of this with a custom module or a Groovy script. You can either write a custom module in Java to do the conversion and then upload it into Spring XD, where you can reference it like any other processor, or you can write a script in Groovy and pass the incoming data through a transform processor.
http://docs.spring.io/spring-xd/docs/current/reference/html/#processors
The actual conversion probably won't be too tricky, but will vary depending on which method you use. The stream creation would look something like this when you're done.
stream create --name myRabbitStream --definition "rabbit | my-custom-module | gemfire-json-server etc....."
stream create --name myRabbitStream --definition "rabbit | transform --script=file:/transform.groovy | gemfire-json-server etc...."
It seems like you have your source and sink modules set up just fine, so all you need to do is set up your processor module to do the conversion, and you should be all set.
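
As an illustration of the kind of conversion such a processor could do, here is a Java sketch. It assumes Jackson on the classpath and the {'ID':..,'value':..} payload from the question. Re-emitting value with a decimal point would make every record store the field as a double, so the query comparator sees one consistent type; note this normalization trick is an assumption on my part, not something verified against GemFire's JSONFormatter.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class NumericNormalizer {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Per-message transform a custom processor module could invoke:
    // rewrites integral "value" fields as doubles (5 -> 5.0).
    public static String normalize(String json) throws Exception {
        ObjectNode node = (ObjectNode) MAPPER.readTree(json);
        JsonNode value = node.get("value");
        if (value != null && value.isIntegralNumber()) {
            node.put("value", value.asDouble());
        }
        return MAPPER.writeValueAsString(node);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(normalize("{\"ID\":11,\"value\":5}"));
        // prints {"ID":11,"value":5.0}
    }
}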