Sending data to CloudWatch using the AWS-SDK

I want to write data to CloudWatch using the AWS-SDK (or whatever may work).
The only method that looks remotely like publishing data to CloudWatch is the putMetricData method, but it's hard to find an example of using it.
Does anyone know how to publish data to CloudWatch?
When I call this:
cw.putMetricData({
  Namespace: 'ec2-memory-usage',
  MetricData: [{
    MetricName: 'first',
    Timestamp: new Date()
  }]
}, (err, result) => {
  console.log({ err, result });
});
I get this error:
{ err:
   { InvalidParameterCombination: At least one of the parameters must be specified.
       at Request.extractError (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/protocol/query.js:50:29)
       at Request.callListeners (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
       at Request.emit (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
       at Request.emit (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/request.js:683:14)
       at Request.transition (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/request.js:22:10)
       at AcceptorStateMachine.runTo (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/state_machine.js:14:12)
       at /Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/state_machine.js:26:10
       at Request.<anonymous> (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/request.js:38:9)
       at Request.<anonymous> (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/request.js:685:12)
       at Request.callListeners (/Users/alex/codes/interos/jenkins-jobs/jobs/check-memory-ec2-instances/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
     message: 'At least one of the parameters must be specified.',
     code: 'InvalidParameterCombination',
     time: 2019-07-08T19:41:41.191Z,
     requestId: '688a4ff3-a1b8-11e9-967e-431915ff0070',
     statusCode: 400,
     retryable: false,
     retryDelay: 7.89360948163893 },
  result: null }

You're getting this error because you're not specifying any metric data. You're only setting the metric name and the timestamp. You also need to send some values for the metric.
Let's say your application is measuring the latency of requests and you observed 5 requests, with latencies 100ms, 500ms, 200ms, 200ms and 400ms. You have a few options for getting this data into CloudWatch (hence the At least one of the parameters must be specified. error).
You can publish these 5 values one at a time by setting Value within the metric data object. This is the simplest way to do it: CloudWatch does all the aggregation for you, and you get percentiles on your metrics. I would not recommend this approach if you need to publish many observations, as it results in the most requests made to CloudWatch, which may mean a big bill or throttling on the CloudWatch side if you start publishing too many observations.
For example:
MetricData: [{
  MetricName: 'first',
  Timestamp: new Date(),
  Value: 100
}]
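As a rough sketch of option 1 in full, assuming cw is the CloudWatch client from the question:
// Option 1: one putMetricData request per observed value.
for (const latency of [100, 500, 200, 200, 400]) {
  cw.putMetricData({
    Namespace: 'ec2-memory-usage',
    MetricData: [{
      MetricName: 'first',
      Timestamp: new Date(),
      Value: latency
    }]
  }, (err) => {
    if (err) console.error(err);
  });
}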
You can aggregate the data yourself and construct and publish the StatisticValues. This is more complex on your end, but results in the fewest requests to CloudWatch. You can aggregate for a minute, for example, and execute 1 put per metric every minute. This will not give you percentiles (since you're aggregating the data on your end, CloudWatch doesn't know the exact values you observed). I would recommend this if you do not need percentiles.
For example:
MetricData: [{
  MetricName: 'first',
  Timestamp: new Date(),
  StatisticValues: {
    Maximum: 500,
    Minimum: 100,
    SampleCount: 5,
    Sum: 1400
  }
}]
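The statistics themselves are easy to compute from the raw observations; a minimal sketch, using the example data above:
// Aggregate client-side into the four fields CloudWatch expects.
const latencies = [100, 500, 200, 200, 400];

const StatisticValues = {
  Maximum: Math.max(...latencies),
  Minimum: Math.min(...latencies),
  SampleCount: latencies.length,
  Sum: latencies.reduce((total, value) => total + value, 0)
};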
You can count the observations and publish Values and Counts. This is kinda the best of both worlds. There is some complexity on your end, but counting is arguably easier than aggregating into StatisticValues. You're still sending every observation, so CloudWatch does the aggregation for you and you get percentiles. The format also allows more data to be sent than option 1. I would recommend this if you need percentiles.
For example:
MetricData: [{
  MetricName: 'first',
  Timestamp: new Date(),
  Values: [100, 200, 400, 500],
  Counts: [1, 2, 1, 1]
}]
See here for more details for each option: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatch.html#putMetricData-property
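For completeness, here is a minimal end-to-end sketch of option 3, assuming the AWS SDK for JavaScript v2 with region and credentials configured in the environment (namespace and metric name are taken from the question; the Unit is an assumption):
const AWS = require('aws-sdk');
const cw = new AWS.CloudWatch();

// Tally duplicate observations so each distinct value is sent once.
const latencies = [100, 500, 200, 200, 400];
const counts = new Map();
for (const v of latencies) {
  counts.set(v, (counts.get(v) || 0) + 1);
}

cw.putMetricData({
  Namespace: 'ec2-memory-usage',
  MetricData: [{
    MetricName: 'first',
    Timestamp: new Date(),
    Unit: 'Milliseconds',
    Values: [...counts.keys()],
    Counts: [...counts.values()]
  }]
}, (err, result) => {
  console.log({ err, result });
});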


How to aggregate an ObjectSet after SearchAround

I have 2 Phonograph objects, each with millions of rows, which I have linked using the Search Around methods.
In the example below, I filter to an Object Set of Flights based on the departure code, then I Search Around to the Passengers on those flights, and then I filter again based on an attribute of the Passengers object.
const passengersDepartingFromAirport = Objects.search()
  .flights()
  .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
  .searchAroundPassengers()
  .filter(passenger => passenger.passengerAttribute.exactMatch(value));
The result of the above code is:
LOG [2022-04-19T14:25:58.182Z] { osp: {},
  objectSet:
   { objectSetProvider: '[Circular]',
     objectSet: { type: 'FILTERED', filter: [Object], objectSet: [Object] } },
  objectTypeIds: [ 'passengers' ],
  emptyOrderByStep:
   { objectSet: '[Circular]',
     orderableProperties:
      { attributeA: [Object],
        attributeB: [Object],
        attributeC: [Object],
        ...
Now, when I try to use take() or takeAsync(), or to aggregate the result using groupBy(), I receive the below error:
RemoteError: INVALID_ARGUMENT ObjectSet:ObjectSetTooLargeForSearchAround with instance ID xxx.
Error Parameters: {
"RemoteError.type": "STATUS",
"objectSetSize": "2160870",
"maxAllowedSize": "100000",
"relationSide": "TARGET",
"relationId": "flights-passengers"
}
SafeError: RemoteError: INVALID_ARGUMENT ObjectSet:ObjectSetTooLargeForSearchAround with instance ID xxx
What could be the way to aggregate or to reduce the result of the above ObjectSet?
The current object storage infrastructure limits the size of the "left side", or "starting object set", of a search around to 100,000 objects.
You can define an object set that uses a search around, which is what you're seeing as the result when you execute the Function before attempting any further manipulations.
Using take() or groupBy() "forces" the resolution of the object set definition. I.e. you no longer need just the pointer to the objects; you need to actually materialize some data from each individual object to do that operation.
It's in this materialization step that the limit comes into play: the object sets are resolved and, if the object set at the search around step is larger than 100,000 objects, the request fails with the above message.
There is ongoing work on Object Storage v2, which will eventually support much larger search-around requests, but for now it's necessary to create a query pattern that results in fewer than 100,000 objects before making a search around.
In some cases it's possible to create an "intermediate" object type that represents a different level of granularity in your data, or to invert the direction of your search around (see the sketch below), to work within these limits.
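For instance, a hedged sketch of the inverted direction, which works if the attribute filter matches fewer than 100,000 passengers (the searchAroundFlights() method name is an assumption based on the API shape in the question):
// Start from the smaller side so the "left side" of the search around
// stays under the 100,000-object limit; the result here is flights.
const flightsForMatchingPassengers = Objects.search()
  .passengers()
  .filter(passenger => passenger.passengerAttribute.exactMatch(value))
  .searchAroundFlights()
  .filter(flight => flight.departureAirportCode.exactMatch(airportCode));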

REST API status as integer or as string?

My colleague and I are working on a REST API. We've been arguing quite a lot about whether the status of a resource/item should be a string or an integer; we both need to read, understand and modify this resource (using separate applications). As this is a very general subject, Google did not help to settle this argument. I wonder what your experience is and which way is better.
For example, let's say we have a Job resource, which is accessible through the URI http://example.com/api/jobs/someid and has the following JSON representation, which is stored in a NoSQL DB:
JOB A:
{
  "id": "someid",
  "name": "somename",
  "status": "finished" // or "created", "failed", "compile_error"
}
So my question is: maybe it should be more like the following?
JOB B:
{
  "id": "someid",
  "name": "somename",
  "status": 0 // or 1, 2, 3, ...
}
In both cases each of us would have to create a map that we use to make sense of the status in our application logic. But I myself am leaning towards the first one, as it is far more readable... You can also easily mix up '0' (string) and 0 (number).
However, as the API is consumed by machines, readability is not that important. Using numbers also has some other advantages: it is a widely accepted convention when working with console applications, and it can be beneficial when you want to include arbitrary new failure statuses, say:
status == 50 - means you have a problem with network component X,
status > 100 - means some multiple special cases.
When you have numbers, you don't need to make up all those string names for them. So which way is best in your opinion? Maybe we need multiple fields (though this could make matters a bit confusing):
JOB C:
{
  "id": "someid",
  "name": "somename",
  "status": 0, // or 1, 2, 3...
  "error_type": "compile_error",
  "error_message": "Your coding skill has failed. Please go away"
}
Personally I would look at handling this situation with a combination of both approaches you have mentioned. I would store the statuses as integers within a database, but would create an enumeration or class of constants to map status names to numeric status values.
For example (in C#):
public enum StatusType
{
    Created = 0,
    Failed = 1,
    Compile_Error = 2,
    // Add any further statuses here.
}
You could then convert the numeric status stored in the database to an instance of this enumeration, and use this for decision making throughout your code.
For example (in C#):
StatusType status = (StatusType) storedStatus;

if (status == StatusType.Created)
{
    // Status is created.
}
else
{
    // Handle any other statuses here.
}
If you're being pedantic, you could also store these mappings in your DB.
For access via an API, you could go either way depending on your requirements. You could even return a result with both the status number and status text:
var result = new
{
    status_code = 1,
    status = "Failed"
};
You could also create an API to retrieve the status name from a code. However, returning both the status code and name in the API response would be best from a performance standpoint, as it saves the consumer an extra lookup call.
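As a rough sketch of the combined approach on the consuming side, in JavaScript (the map mirrors the enum above; the field names are illustrative):
// Shared code-to-name map, mirroring the StatusType enum above.
const STATUS_NAMES = Object.freeze({
  0: 'created',
  1: 'failed',
  2: 'compile_error'
});

// Build the API representation from a stored row: a numeric code for
// machines plus a readable name for humans.
function toApiJob(row) {
  return {
    id: row.id,
    name: row.name,
    status_code: row.status,
    status: STATUS_NAMES[row.status]
  };
}

console.log(toApiJob({ id: 'someid', name: 'somename', status: 1 }));
// { id: 'someid', name: 'somename', status_code: 1, status: 'failed' }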

Right structure for a series of dates: values

I'm having a hard time figuring out the right JSON structure for the following set of data. I've got a sensor that logs the humidity of a given room on a daily basis. The logs look like:
...
2015-01-19 8%
2015-01-20 13%
...
I'd like to convert it to JSON. My first bet was:
{
  "2015-01-19": 8,
  "2015-01-20": 13
}
But, is it correct? Shouldn't it be:
[
  { "2015-01-19", 8 },
  { "2015-01-20", 13 }
]
Or:
[
  {
    "date": "2015-01-19",
    "value": 8
  },
  {
    "date": "2015-01-20",
    "value": 13
  }
]
And, at the end of the day, is there a series of best practices I could refer to in order to help me determine what's the best structure on my own?
Your first example is simple and easy, though perhaps not extensible if you decide to add more attributes later. If that's unlikely, you should use that method.
Your second example is not valid JSON.
Your third example makes some sense, though it is not a very compact encoding (wastes space).
A fourth method you should consider is to use separate arrays. This is not necessarily intuitive at first, but it does work well, is compact yet extensible, and is directly compatible with some tools such as HighCharts. That is:
{
  "dates": ["2015-01-19", "2015-01-20"],
  "humidity": [8, 13]
}
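A small sketch of producing that structure from the raw log lines (the log format is taken from the question; everything else is illustrative):
// Parse "YYYY-MM-DD N%" lines into the parallel-array structure above.
const log = `2015-01-19 8%
2015-01-20 13%`;

const result = { dates: [], humidity: [] };
for (const line of log.trim().split('\n')) {
  const [date, value] = line.trim().split(/\s+/);
  result.dates.push(date);
  result.humidity.push(parseInt(value, 10)); // parseInt drops the trailing '%'
}

console.log(JSON.stringify(result));
// {"dates":["2015-01-19","2015-01-20"],"humidity":[8,13]}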

Graph for data updated every 30 minutes

I'm building a weather station which uploads data to my MySQL database every 30-60 minutes. How can I make, for example, a temperature plot for a week on my website? I've looked for such an option in Highcharts but I don't know whether it is possible. The date and time are saved in the database as a timestamp.
They have an example specifically for time data with irregular intervals: http://www.highcharts.com/demo/spline-irregular-time
Get your data from the database for the last week, then preprocess it in the backend to fit the Highcharts data format; as a result you should have something like this:
var myData = [
  [1388534400000, 12],
  [next_timestamp, next_value],
  [another_timestamp, another_value],
  ...
];
Now you can use that data to generate chart:
$("#container").highcharts({
series: [{
data: myData
}]
})
Note: timestamps are in milliseconds.
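Since MySQL timestamps are usually in seconds, a tiny sketch of the conversion (the row shape and column names are assumptions):
// Map rows fetched from MySQL into Highcharts' [milliseconds, value] pairs.
// Assumes each row has a UNIX timestamp in seconds (e.g. selected with
// UNIX_TIMESTAMP(recorded_at)) and a temperature column.
var myData = rows.map(function(row) {
  return [row.unix_ts * 1000, row.temperature];
});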
Now, to update the chart every 30 minutes, just set up an AJAX call to get new data from the server:
setInterval(function() {
  $.getJSON('path/to/data', function(myData) {
    $("#container").highcharts().series[0].setData(myData);
  });
}, 30 * 60 * 1000); // 30 minutes

dgrid and DateTextBox

I am trying to get a dgrid OnDemandGrid to work with dijit.form.DateTextBox as an editor. The data is fed to the OnDemandGrid via JSON. Initially I tried feeding dates in the raw format from the MySQL database (e.g. YYYY-MM-DD HH:MM:SS), but DateTextBox seemed incapable of parsing that string, so I tried feeding it just the date (e.g. 2012-11-20). However, this too failed to work.
So, my primary issue is getting DateTextBox to process the date information. A secondary issue is how to deal with the time information, since DateTextBox cannot edit times. My current approach is that when I split the SQL date string, I am feeding dgrid the time as a separate column for a dijit.form.TimeTextBox. This seems like a messy solution, so I'm open to suggestions.
Here's my grid code:
var Grid = declare([OnDemandGrid, Editor, Keyboard, Selection]);
var grid = new Grid({
  store: store,
  query: { aid: "1900", action: "objectListGenerator2" },
  bufferRows: 40,
  loadingMessage: "Loading...",
  columns: [
    { field: "oid", label: "Object ID" },
    Editor({ field: "startDate", name: "Start Date", editorArgs: { selector: 'date', datePattern: 'yyyy-mm-dd', locale: 'en-us' } }, DateTextBox, "click"),
    Editor({ field: "startTime", name: "Start Time" }, TimeTextBox, "click"),
    Editor({ field: "endDate", name: "End Date" }, DateTextBox, "click"),
    Editor({ field: "endTime", name: "End Time" }, TimeTextBox, "click"),
    { field: "endDateOid", label: "End OID" }
  ]
}, "grid");
Here's a sample string of my JSON source:
[{"content":"2012-11-20 18:12:00","oid":"2112","author":"","endDateOid":"2113","group":"","endTime":"17:59:00","poid":"0","id":null,"startTime":"18:12:00","gmt":"2012-11-22 00:12:43","name":"The Windows 8 Disaster Rolls On","paid":"1900","endDate":"2012-11-21","type":"startDate","startDate":"2012-11-20","cache":"","cachedate":"0000-00-00 00:00:00"},
{"content":"2013-01-01 17:59:00","oid":"2114","author":"","endDateOid":"2115","group":"","endTime":"16:59:00","poid":"0","id":1,"startTime":"17:59:00","gmt":"2012-11-22 00:14:49","name":"The Windows 8 Disaster Rolls On","paid":"1900","endDate":"2013-01-02","type":"startDate","startDate":"2013-01-01","cache":"","cachedate":"0000-00-00 00:00:00"}]
As I noted in the comments, if I remove "click" from the column definition and thus allow the DateTextBox to be created immediately, the correct date shows up. I'm not sure why the data is not parsed properly if the DateTextBox is added later, but at least creating it immediately yields a workable result.
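One workaround worth noting: DateTextBox edits Date objects, so converting the date strings before they reach the store sidesteps the string-parsing question entirely. A hedged sketch (field names follow the JSON sample above; data is assumed to be the parsed JSON array):
// Convert ISO-style date strings into Date objects before handing rows
// to the grid's store, since DateTextBox works with Date values.
require(["dojo/date/stamp"], function(stamp) {
  data.forEach(function(row) {
    row.startDate = stamp.fromISOString(row.startDate); // "2012-11-20" -> Date
    row.endDate = stamp.fromISOString(row.endDate);
  });
});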