How to stop a streaming pipeline in Google Cloud Dataflow

I have a streaming Dataflow pipeline running that reads from a Pub/Sub subscription.
After a period of time, or maybe after processing a certain amount of data, I want the pipeline to stop by itself. I don't want my Compute Engine instances to keep running indefinitely.
When I cancel the job through the Dataflow console, it is shown as a failed job.
Is there a way to achieve this? Am I missing something, or is that feature missing from the API?

Could you do something like this?
Pipeline pipeline = ...;
// ... (construct the streaming pipeline) ...
final DataflowPipelineJob job =
    DataflowPipelineRunner.fromOptions(pipelineOptions)
        .run(pipeline);
// Let the pipeline run for as long as you need, then cancel it.
Thread.sleep(yourTimeoutMillis);
job.cancel();

I was able to drain (cancel a job without losing data) a running streaming job on Dataflow with the REST API.
Use the REST update method with this body:
{ "requestedState": "JOB_STATE_DRAINING" }


Execution ID on Google Cloud Run

I am wondering whether there is an execution ID in Cloud Run like the one in Google Cloud Functions?
An ID that identifies each invocation separately; it's very useful with "Show matching entries" in Cloud Logging to get all logs related to one execution.
I understand the execution model is different, since Cloud Run allows concurrency, but is there a workaround to assign each log entry to a specific request?
My final need is to group the request and the response on the same line, because for now I am printing them separately, and if a few requests arrive at the same time I can't see which response corresponds to which request...
Thank you for your attention!
OpenTelemetry looks like a great solution, but the learning and setup time isn't negligible, so I'm going with a custom ID created in before_request, stored in Flask's g, and included in every print():
import uuid
from flask import Flask, g

app = Flask(__name__)

@app.before_request
def before_request_func():
    g.execution_id = uuid.uuid4()  # include this id in every print() to correlate log lines

Couchbase Java SDK times out with BUCKET_NOT_AVAILABLE

I am doing a lookup operation with Couchbase Java SDK 3.0.9 which looks like this:
// Set up
bucket = cluster.bucket("my_bucket")
collection = bucket.defaultCollection()

// Lookup operation
val specs = listOf(LookupInSpecStandard.get("hash"))
collection.lookupIn(id, specs)
The error I get is BUCKET_NOT_AVAILABLE. Here is the full message:
com.couchbase.client.core.error.UnambiguousTimeoutException: SubdocGetRequest, Reason: TIMEOUT {"cancelled":true,"completed":true,"coreId":"0xdb7f8e4800000003","idempotent":true,"reason":"TIMEOUT","requestId":608806,"requestType":"SubdocGetRequest","retried":39,"retryReasons":["BUCKET_NOT_AVAILABLE"],"service":{"bucket":"export","collection":"_default","documentId":"export:main","opaque":"0xcfefb","scope":"_default","type":"kv"},"timeoutMs":15000,"timings":{"totalMicros":15008977}}
The strange part is that this code hasn't been touched for months and the lookup broke all of a sudden. The CB cluster is working fine; its version is Enterprise Edition 6.5.1 build 6299.
Do you have any ideas what might have gone wrong?
Note that in Couchbase Java SDK 3.x, the Cluster::bucket method returns instantly and continues opening the bucket in the background. So the first operation you perform (a lookupIn here) needs to wait for that resource opening to complete before it can proceed. It looks like it took a little longer than usual to access the Couchbase bucket, and you got a timeout.
I recommend using the Bucket::waitUntilReady method after opening a bucket, to block until the resource opening is complete:
bucket = cluster.bucket("my_bucket")
bucket.waitUntilReady(Duration.ofMinutes(1))
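For completeness, here is a minimal, self-contained sketch of the same flow in plain Java; the connection string, credentials, bucket name, and document ID are placeholders:
import java.time.Duration;
import java.util.Collections;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.kv.LookupInResult;
import com.couchbase.client.java.kv.LookupInSpec;

public class LookupExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "user", "password");
        Bucket bucket = cluster.bucket("my_bucket");
        // Block until the bucket is actually open before issuing KV operations
        bucket.waitUntilReady(Duration.ofMinutes(1));
        Collection collection = bucket.defaultCollection();

        LookupInResult result = collection.lookupIn(
                "some-document-id",
                Collections.singletonList(LookupInSpec.get("hash")));
        System.out.println(result.contentAs(0, String.class));

        cluster.disconnect();
    }
}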
This problem can also occur because of a firewall. You need to allow these ports:
Client-to-node
Unencrypted: 8091-8097, 9140, 11210
Encrypted: 11207, 18091-18097
You can check the full list here:
https://docs.couchbase.com/server/current/install/install-ports.html#_footnotedef_2
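If the cluster nodes sit behind iptables, for example, a rule along these lines (a sketch; adjust chains and sources to your environment) would open the unencrypted client-to-node ports:
iptables -A INPUT -p tcp -m multiport --dports 8091:8097,9140,11210 -j ACCEPT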

SnappyData Job API

We have created a job by extending SnappySQLJob and overriding runSnappyJob and isValidJob.
In runSnappyJob we create a connection to Kafka to poll messages from Kafka every second.
After terminating the job using:
bin/snappy-job.sh stop --lead lead:8090 --job-id <job_id>
we can see in the logs that Kafka is still polling for data.
Is there any API to check the status of the running job so that we can stop the Kafka consumer?
I'm not sure about the behavior you noticed, but you should use a SnappyStreamingJob instead of a SnappySQLJob for managing streaming jobs.
See https://snappydatainc.github.io/snappydata/programming_guide/snappydata_jobs/ ... there is a link to the examples from there.
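As for checking the job's status from the command line, the snappy-job.sh script also has a status subcommand, mirroring the stop invocation above (verify the exact form against your SnappyData version):
bin/snappy-job.sh status --lead lead:8090 --job-id <job_id>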

Autodesk Forge register job conflict

When POSTing to https://developer.api.autodesk.com/viewingservice/v1/register I sometimes receive the following error:
{
  "Diagnostic": "The request is rejected as it conflicts with a previous request that is in-progress.",
  "registerKeys": {},
  "Result": "Conflict"
}
How can I find out which job is already in progress so that I can track its progress and get its result?
First, this is the old API; you should consider using the Model Derivative API instead (i.e. https://developer.autodesk.com/en/docs/model-derivative/v2).
Like Xiaodond said, there is no API to collect all jobs currently processing on your account. You need to request each URN's manifest to determine how many jobs are running on that model, since you can translate to SVF but also export to other formats such as OBJ, STL, ... where possible. The manifest endpoint and its documentation are here: https://developer.autodesk.com/en/docs/model-derivative/v2/reference/http/urn-manifest-GET/
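For example, a bare-bones Java sketch of that manifest request; the Base64-encoded URN and the OAuth bearer token are assumed to come from your existing Forge workflow:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ManifestCheck {
    public static void main(String[] args) throws Exception {
        String urn = args[0];    // Base64-encoded design URN
        String token = args[1];  // OAuth2 bearer token

        URL url = new URL(
            "https://developer.api.autodesk.com/modelderivative/v2/designdata/"
            + urn + "/manifest");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + token);

        // The manifest JSON lists the status and progress of each derivative job
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}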
Last, we are working on a webhook solution, which will be a better fit, as a webhook calls you back when a job starts and completes. Webhooks aren't available at the time of this post, but you should be notified via the developer newsletter when they reach production.
Hope that helps,

How can SQL Server 2012 (or SSIS) notify NServiceBus upon completion of a task?

We have some very long-running ETL packages (some run for hours) that need to be kicked off by NServiceBus endpoints. We do not need to keep a single transaction alive for the entire process, and we can break it up into smaller transactions. Since an NServiceBus handler wraps itself in a transaction for its entirety, we do not want to handle this in a single transaction because it would time out, to say nothing of the locking issues it would create in the DBMS.
My current thought is that we could spawn another process asynchronously, return from the handler immediately, and publish an event upon completion (success or failure). I have not found much documentation on integrating the new NServiceBus 4.0 SQL Server broker support with the traditional MSMQ transport. Is that even possible?
What is the preferred way to have a long-running process in SQL Server 2012 (or an SSIS package) notify NServiceBus subscribers asynchronously when it completes?
It looks like it is possible to make an HTTP request from SSIS; see How to make an HTTP request from SSIS?
With that in mind, you can send a message to your publisher via the NServiceBus Gateway (the Gateway is just an HttpListener), telling it to publish a message that informs all the subscribers that the long-running ETL package has completed.
To send a message to the gateway you need to do something like:
var webRequest = (HttpWebRequest)WebRequest.Create("http://localhost:25898/Headquarters/");
webRequest.Method = "POST";
webRequest.ContentType = "text/xml; charset=utf-8";
webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
webRequest.Headers.Add("Content-Encoding", "utf-8");
webRequest.Headers.Add("NServiceBus.CallType", "Submit");
webRequest.Headers.Add("NServiceBus.AutoAck", "true");
webRequest.Headers.Add("NServiceBus.Id", Guid.NewGuid().ToString("N"));

const string message = "<?xml version=\"1.0\" ?><Messages xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns=\"http://tempuri.net/NServiceBus.AcceptanceTests.Gateway\"><MyRequest></MyRequest></Messages>";

using (var messagePayload = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(message)))
{
    // Need to specify the MD5 hash of the payload
    webRequest.Headers.Add(HttpRequestHeader.ContentMd5, HttpUtility.UrlEncode(Hasher.Hash(messagePayload)));
    webRequest.ContentLength = messagePayload.Length;
    using (var requestStream = webRequest.GetRequestStream())
    {
        messagePayload.CopyTo(requestStream);
    }
}

using (var myWebResponse = (HttpWebResponse)webRequest.GetResponse())
{
    if (myWebResponse.StatusCode == HttpStatusCode.OK)
    {
        // success
    }
}
Hope this helps!
There is actually a task in SSIS 2012 for placing messages in an MSMQ queue, the Message Queue Task. You just point it at your MSMQ connection and can use an Expression to customize your message with the package name, success/failure, row counts, etc.
Depending on how many packages we're talking about and how customized you want the messages to be, your best bet is to write a standalone utility to create messages in whatever format you desire, and then use an Execute Process Task to invoke that utility with whatever parameters from the package you want to pass in to be formatted into the message.
You could also use that same codebase and just create a custom SSIS task (a lot easier than it sounds.)
One thought I had to help adhere to the DRY principle would be to use a master SSIS package.
In my mind, it would look something like an Execute Package Task with an X connected to it. Configure the package to take a package name as a parameter, and configure the Execute Package Task to use that parameter to determine which package to call.
The X would probably be a Script Task, but perhaps, as @Kyle Hale points out, it might be the Message Queue Task. I leave that decision to those more versed in NServiceBus.
The important thing, in my mind, is to not add this logic to every package, as that would be a maintenance nightmare.