java.lang.IllegalStateException: Must specify a valid bucketing strategy while requesting aggregation - google-fit

I get this error while creating a read request with the DataReadRequest class. I tried looking at the documentation but it is unclear. Here is my code:
DataReadRequest readRequest = new DataReadRequest.Builder()
        .read(DataType.TYPE_LOCATION_SAMPLE)
        .setTimeRange(startTime, endTime, TimeUnit.MILLISECONDS)
        .bucketByTime(1, TimeUnit.HOURS)
        .build();
The error points to the bucketByTime method and I don't know how to proceed.

I had this error before. The short answer is to remove the line
.bucketByTime(1, TimeUnit.HOURS)
The reason this does not work with your request is that the bucketByTime method aggregates data over the period of time you're asking for, but the data you're requesting can't be aggregated (what would it mean to add location samples together?). In fact, all of the bucketing methods expect an aggregate data type, because bucketing implies representing a number of data points as a single data point.
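If it helps to see the distinction, the REST API makes the same constraint explicit: its aggregate endpoint takes a data type that can be aggregated plus a bucketByTime duration. A rough Python sketch against the REST endpoint (step count is used here only because it is aggregatable, unlike location samples; access_token and the millisecond timestamps are assumed to come from elsewhere):
import requests

# 1-hour buckets over an aggregatable data type; a location data type
# would be rejected here for the same reason as in the Android API.
body = {
    "aggregateBy": [{"dataTypeName": "com.google.step_count.delta"}],
    "bucketByTime": {"durationMillis": 3600000},
    "startTimeMillis": start_time_millis,   # assumed to be defined
    "endTimeMillis": end_time_millis,       # assumed to be defined
}
resp = requests.post(
    "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate",
    headers={"Authorization": "Bearer " + access_token},  # assumed token
    json=body,
)
buckets = resp.json().get("bucket", [])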

Pagination yields no results in Google Fit

I am using the REST API of Google Fit. I want to list sessions with the fitness.users.sessions.list method. This gives me a few dozen results.
Now I would like to get more results, so I set the pageToken to the value I got from the previous response. But the new response does not contain any sessions, just yet another pageToken:
{
  "session": [],
  "deletedSession": [],
  "nextPageToken": "1541027616563"
}
The same happens when I use the pagination function of the Google Python API Client: I iterate on results but never get any new data.
request = self.service.users().sessions().list(userId='me')
while request is not None:
    response = request.execute()
    for ds in response['session']:
        yield ds
    request = self.service.users().sessions().list_next(request, response)
I am sure there is much(!) more session data in Google Fit for my account. Am I missing something regarding pagination?
Thanks
I think that the description of the pageToken parameter is actually rather confusing in the documentation (this answer was written prior to the documentation being updated).
The continuation token, which is used to page through large result sets. To get the next page of results, set this parameter to the value of nextPageToken from the previous response.
This is conflating two concepts: continuation, and paging. There isn't actually any paging in the implementation of Users.sessions.
Sessions are indexed by their modification timestamp. There are two (or three, depending on how you count) ways to interact with the API:
Pass a start and/or end time. Omitted start and end times are taken to be the start and end of time respectively. In this case, you will get back all sessions falling between those times.
Pass neither start nor end times. In this case, you will receive all sessions between some time in the past and now. That time is:
pageToken, if provided
Otherwise, it's 7 days ago (this doesn't actually appear in the documentation, but it is the behavior)
In any of these cases, you receive a nextPageToken back whose timestamp is just after the most recent session in the results. In that sense, nextPageToken is really a continuation token: it says you have been told about all sessions modified up to that point, and passing it back returns anything modified between nextPageToken and the current time.
As such, if you issue a request that fetches all sessions for the last 7 days (no start/end time, no page token) and get a nextPageToken, you will only get something back in a request using that nextPageToken if any sessions have been modified in between the first and second requests.
So, if you're making these requests in quick succession, it is expected that you won't see anything in the second response.
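To make that concrete, here is a minimal sketch of both styles of request using a google-api-python-client service object like the question's self.service (the time window values are just examples):
# Style 1: explicit window; returns all sessions between those times.
windowed = service.users().sessions().list(
    userId='me',
    startTime='2018-10-01T00:00:00.0+00:00',
    endTime='2018-11-01T00:00:00.0+00:00',
).execute()
for session in windowed.get('session', []):
    print(session.get('name'))

# Style 2: continuation. With no start/end time the first call covers roughly
# the last 7 days; passing nextPageToken back later returns only sessions
# modified since that first call, which may well be an empty list.
token = windowed.get('nextPageToken')
updates = service.users().sessions().list(userId='me', pageToken=token).execute()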
Regarding the startTime you were passing in your comment: rejecting it is a bug, since RFC3339 defines fractional seconds as optional.
I'll see about getting that fixed; but in the interim, just make sure you pass a fractional number of seconds (even if it is just .0, e.g. 2018-10-18T00:00:00.0+00:00).
It may be because the format of the URL you're using is different from the example in the documentation.
You are using:
startTime=2018-10-18T00:00:00+00:00
whereas the one in the documentation has it as:
startTime=2014-04-01T00:00:00.00Z
The documentation also states that both the startTime and endTime query parameters are required.
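If you build those timestamps in Python, one way to guarantee a fractional part is present (following the workaround above, not a documented requirement) is:
from datetime import datetime, timezone

# isoformat(timespec='milliseconds') always emits a fractional part,
# e.g. '2018-10-18T00:00:00.000+00:00'
start_time = datetime(2018, 10, 18, tzinfo=timezone.utc).isoformat(timespec='milliseconds')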

Setting a completion size or predicate for an unknown number of split elements in a Camel route

I have a Camel route that consumes an HTTP service which returns a JSON payload with several elements that I need to correlate through an Id. I don't know how many elements with the same Id are coming in the response, so how can I set the completion on the aggregation in order to correlate all of them?
These are my routes:
from("direct:getInfo")
.id("getInfo")
.setHeader("accept", constant("application/json"))
.setHeader("authorization", constant("xyz"))
.setHeader("Cache-Control", constant("no-cache"))
.setHeader("content-Type", constant("application/json"))
.setHeader(Exchange.HTTP_METHOD, constant("GET"))
.removeHeader(Exchange.HTTP_PATH)
.removeHeader("CamelHttp*")
.setBody(simple("${null}"))
.streamCaching()
.to("http4:someURL") //responses an array of n json elements
.split().jsonpath("$").streaming()
.to("direct:splitInfo");
from("direct:splitInfo")
.id("splitInfo")
.aggregate(jsonpath("CustomerId"), new ArrayListAggregationStrategy())
.completionSize(???) //How must I set the completion in order to correlate all items
.to("direct:process");
Thanks so much.
Complete rewrite of the answer due to the example in the comments.
Since you want to split and re-aggregate complete JSON payloads, you only need the Splitter EIP with an aggregation strategy.
If you provide the splitter with the expression to split the payload as well as an aggregation strategy, you don't need any completion criteria at all. Every JSON payload is processed as a "batch".
.from(endpoint)
    // body = complete JSON payload
    .split([split-expression], new MyAggregationStrategy())
        // each element is sent to this bean
        .to("bean:elementProcessorBean")
    // you must end the splitter
    .end()
    // here you get the complete re-aggregated JSON payload
    // how it is re-aggregated is up to MyAggregationStrategy
Check out the linked Splitter documentation for an example.

requests.post arguments in Python

Is there a way to determine the max value of an argument in a requests.post call to a website if we don't know the amount of data in the website's dataset? I'm trying to execute the following code to get specific information on all daycares from this website, but I don't know the value of the last argument (length). Currently I'm assuming this value is 20, but it is subject to change from time to time. How do I keep it open ended so I don't have to guess the max value for length? Code as follows:
data_requested = requests.post("https://data.nj.gov/views/INLINE/rows.json?"
                               "accessType=WEBSITE&method=getByIds&asHashes=true&start=0&length=20",
                               json=data)
njcc_data = data_requested.json()
Notice that this has nothing to do with requests.post - the range of values length can take is determined by the creator of that API and is an unknown quantity both to you and to requests.
You can try to reason about what possible values it could take: is it the length of a person? If so, it's probably not going to be more than 250 cm.
You can also use trial and error and see how high you can make it before the API endpoint gives back an error, but I guess this is what you were trying to avoid.
If length is the number of items returned (the length of the returned json array) then you could just try setting it to a high number like 1000 and see if you can get away with it.
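Building on that, a common pattern when the total is unknown is to page in a loop and stop as soon as a request returns fewer rows than asked for. A rough sketch (the start/length parameters come from your URL; the "data" key in the response is an assumption about this endpoint's payload, so adjust it to what you actually get back):
import requests

BASE_URL = ("https://data.nj.gov/views/INLINE/rows.json?"
            "accessType=WEBSITE&method=getByIds&asHashes=true")
PAGE_SIZE = 100  # any page size the endpoint tolerates

all_rows = []
start = 0
while True:
    url = "{}&start={}&length={}".format(BASE_URL, start, PAGE_SIZE)
    response = requests.post(url, json=data)  # `data` as in your snippet
    response.raise_for_status()
    rows = response.json().get("data", [])    # assumed response key
    all_rows.extend(rows)
    if len(rows) < PAGE_SIZE:  # a short page means there is nothing left
        break
    start += PAGE_SIZE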

kafka-python 1.3.3: KafkaProducer.send with explicit key fails to send message to broker

(Possibly a duplicate of Can't send a keyedMessage to brokers with partitioner.class=kafka.producer.DefaultPartitioner, although the OP of that question didn't mention kafka-python. And anyway, it never got an answer.)
I have a Python program that has been successfully (for many months) sending messages to the Kafka broker, using essentially the following logic:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               retries=3)
...
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg)
Recently, I tried to upgrade it to send messages to different partitions based on a definite key value extracted from the message:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               key_serializer=str.encode,
                               retries=3)
...
try:
    key = some_message[0]
except:
    key = None
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg, key=key)
However, with this code, no messages ever make it out of the program to the broker. I've verified that the key value extracted from some_message is always a valid string. Presumably I don't need to define my own partitioner, since, according to the documentation:
The default partitioner implementation hashes each non-None key using the same murmur2 algorithm as the java client so that messages with the same key are assigned to the same partition.
Furthermore, with the new code, when I try to determine what happened to my send by calling res.get (to obtain a kafka.FutureRecordMetadata), that call throws a TypeError exception with the message descriptor 'encode' requires a 'str' object but received a 'unicode'.
(As a side question, I'm not exactly sure what I'd do with the FutureRecordMetadata if I were actually able to get it. Based on the kafka-python source code, I assume I'd want to call either its succeeded or its failed method, but the documentation is silent on the point. The documentation does say that the return value of send "resolves to" RecordMetadata, but I haven't been able to figure out, from either the documentation or the code, what "resolves to" means in this context.)
Anyway: I can't be the only person using kafka-python 1.3.3 who's ever tried to send messages with a partitioning key, and I have not seen anything on teh Intertubes describing a similar problem (except for the SO question I referenced at the top of this post).
I'm certainly willing to believe that I'm doing something wrong, but I have no idea what that might be. Is there some additional parameter I need to supply to the KafkaProducer constructor?
The fundamental problem turned out to be that my key value was a unicode, even though I was quite convinced that it was a str. Hence the selection of str.encode for my key_serializer was inappropriate, and was what led to the exception from res.get. Omitting the key_serializer and calling key.encode('utf-8') was enough to get my messages published, and partitioned as expected.
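For reference, a minimal sketch of the working combination just described, mirroring the Python 2 setup and placeholder names from the question:
import json
import kafka

producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               retries=3)  # no key_serializer needed

msg = json.dumps(some_message)
key = some_message[0]
# Encode the key explicitly; str.encode as a key_serializer chokes on
# Python 2 unicode values, which is what triggered the TypeError above.
res = producer.send(some_topic, value=msg, key=key.encode('utf-8'))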
A large contributor to the obscurity of this problem (for me) was that the kafka-python 1.3.3 documentation does not go into any detail on what a FutureRecordMetadata really is, nor what one should expect in the way of exceptions its get method can raise. The sole usage example in the documentation:
# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')

# Block for 'synchronous' sends
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    # Decide what to do if produce request failed...
    log.exception()
    pass
suggests that the only kind of exception it will raise is KafkaError, which is not true. In fact, get can and will (re-)raise any exception that the asynchronous publishing mechanism encountered in trying to get the message out the door.
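So, while debugging, it can help to catch more broadly than KafkaError around get; a small sketch reusing the names from the documentation snippet above:
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    log.exception("broker rejected or timed out the produce request")
except Exception:
    # get() re-raises whatever the asynchronous send machinery hit on the
    # way out, e.g. the TypeError from a mismatched key_serializer
    log.exception("produce failed before reaching the broker")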
I also faced the same error. Once I added json.dumps while sending the key, it worked.
producer.send(
    topic="first_topic",
    key=json.dumps(key).encode('utf-8'),
    value=json.dumps(msg).encode('utf-8'),
).add_callback(on_send_success).add_errback(on_send_error)

Simperium Data Dictionary or Decoder Ring for Return Value on "all" call?

I've looked through all of the Simperium API docs for all of the different programming languages and can't seem to find this. Is there any documentation for the data returned from an ".all" call (e.g. api.todo.all(:cv=>nil, :data=>false, :username=>false, :most_recent=>false, :timeout=>nil) )?
For example, this is some data returned:
{"ccid"=>"10101010101010101010101010110101010",
"o"=>"M",
"cv"=>"232323232323232323232323232",
"clientid"=>"ab-123123123123123123123123",
"v"=>{
"date"=>{"o"=>"+", "v"=>"2015-08-20T00:00:00-07:00"},
"calendar"=>{"o"=>"+", "v"=>false},
"desc"=>{"o"=>"+", "v"=>"<p>test</p>\r\n"},
"location"=>{"o"=>"+", "v"=>"Los Angeles"},
"id"=>{"o"=>"+", "v"=>43}
},
"ev"=>1,
"id"=>"abababababababababababababab/10101010101010101010101010110101010"}
I can figure out some of it just from context or from the name of the key but a lot of it is guesswork and trial and error. The one that concerns me is the value returned for the "o" key. I assume that a value of "M" is modify and a value of "+" is add. I've also run into "-" for delete and just recently discovered that there is also a "! '-'" which is also a delete but don't know what else it signifies. What other values can be returned in the "o" key? Are there other keys/values that can be returned but are rare? Is there documentation that details what can be returned (that would be the most helpful)?
If it matters, I am using the Ruby API but I think this is a question that, if answered, can be helpful for all APIs.
The response you are seeing is a list of all of the changes which have occurred in the given bucket since some point in its history. In the case where cv is blank, it tries to get the full history.
You can find some of the details in the protocol documentation though it's incomplete and focused on the WebSocket message syntax (the operations are the same however as with the HTTP API).
The information provided by the v parameter is the result of applying the JSON-diff algorithm to the data between changes. With this diff information you can reconstruct the data at any given version as the changes stream in.
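As a rough illustration (in Python rather than Ruby, and based only on the operation codes you have already observed, so treat it as a sketch rather than a reference), applying a stream of these changes to a local cache might look like:
def apply_change(entities, change):
    # change is one element of the list returned by the .all call
    entity_id = change["id"]
    if change["o"] == "-":
        # a top-level "-" removes the whole entity
        entities.pop(entity_id, None)
        return entities
    # "M" (modify) carries a field-by-field diff in "v"; each entry has its
    # own "o", where "+" means the field was added with value "v"
    current = entities.setdefault(entity_id, {})
    for field, field_change in change.get("v", {}).items():
        if field_change["o"] == "+":
            current[field] = field_change["v"]
        else:
            # other field-level operations (string diffs, removals, ...) are
            # described in the protocol docs and are not handled in this sketch
            pass
    return entities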