Google Fit estimated steps through REST API decrease over time for some users - google-fit

We are using the Google Fit REST API in a process with thousands of users to get daily steps. For most users the process works fine, but we are seeing some users with a specific behaviour: their steps increase during the day, but at some point they decrease significantly.
We are finding a few issues related to this mainly with Huawei Health apps (and some Xiaomi health apps).
We use this dataSourceId to get daily steps: derived:com.google.step_count.delta:com.google.android.gms:estimated_steps
An example of one of our requests to get data for 15th March (Spanish time):
POST https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate
Accept: application/json
Content-Type: application/json;encoding=utf-8
Authorization: Bearer XXXXXXX
{
"aggregateBy": [{
"dataTypeName": "com.google.step_count.delta",
"dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
}],
"bucketByTime": { "durationMillis": 86400000 },
"startTimeMillis": 1615244400000,
"endTimeMillis": 1615330800000
}
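For reference, a minimal Python sketch of the same aggregate call using the requests library (the access token and timestamps are placeholders; in our real process the token comes from the usual OAuth2 flow):

import requests

ACCESS_TOKEN = "XXXXXXX"  # placeholder; obtained through the normal OAuth2 flow

body = {
    "aggregateBy": [{
        "dataTypeName": "com.google.step_count.delta",
        "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps",
    }],
    "bucketByTime": {"durationMillis": 86400000},  # one bucket for the whole day
    "startTimeMillis": 1615244400000,
    "endTimeMillis": 1615330800000,
}

resp = requests.post(
    "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate",
    headers={"Authorization": "Bearer " + ACCESS_TOKEN},
    json=body,
)
resp.raise_for_status()

# Sum the step deltas of every point in every bucket to get the daily total.
daily_steps = sum(
    point["value"][0]["intVal"]
    for bucket in resp.json()["bucket"]
    for dataset in bucket["dataset"]
    for point in dataset["point"]
)
print(daily_steps)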
For most users this works well (it returns the same data that is shown to the user in the Google Fit app), but for some users, as described, the numbers increase during the day at first and decrease later. For some users, the data in the Google Fit app is significantly greater than what we get through the REST API.
We have even traced this with a specific user during the day. Using buckets of "durationMillis": 3600000, we have plotted a histogram of hourly steps in one day (with a custom-made process).
For the same day, at different moments in time (a couple of hours apart in this case), we get this for the EXACT SAME USER:
20210315-07 | ########################################################## | 1568
20210315-08 | ############################################################ | 1628
20210315-09 | ########################################################## | 1574
20210315-10 | ####################### | 636
20210315-11 | ################################################### | 1383
20210315-12 | ###################################################### | 1477
20210315-13 | ############################################### | 1284
20210315-14 | #################### | 552
vs. this, retrieved A COUPLE OF HOURS LATER:
20210315-08 | ################# | 430
20210315-09 | ######### | 229
20210315-10 | ################# | 410
20210315-11 | ###################################################### | 1337
20210315-12 | ############################################################ | 1477
20210315-13 | #################################################### | 1284
20210315-14 | ###################### | 552
("20210315-14" means 14.00 at 15th March of 2021)
This is the JSON returned in the first case:
[{"startTimeNanos":"1615763400000000000","endTimeNanos":"1615763460000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":6,"mapVal":[]}]},
{"startTimeNanos":"1615788060000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1568,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615795080000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1628,"mapVal":[]}]},
{"startTimeNanos":"1615795200000000000","endTimeNanos":"1615798500000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1574,"mapVal":[]}]},
{"startTimeNanos":"1615798860000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":636,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1383,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
This is the JSON returned in the second case:
[{"startTimeNanos":"1615788300000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":517,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615794540000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":430,"mapVal":[]}]},
{"startTimeNanos":"1615796400000000000","endTimeNanos":"1615798200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":229,"mapVal":[]}]},
{"startTimeNanos":"1615798980000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":410,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1337,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
As you can see, all points always come from originDataSourceId "raw:com.google.step_count.delta:com.huawei.health".
It looks like some Google Fit process is doing some kind of adjustment, removing steps or data points, but we cannot find a way to detect what is removed or why, and we cannot explain to the user what is happening or what he or we can do to make his app data match ours exactly (or the other way around). His Google Fit app shows a number that is not the same as the one the REST API returns.
The user has already disabled the Google Fit app's activity-tracking option.
I would love to know, or at least get some hints on:
What can I do to debug this further?
Any hint about why this is happening?
Is there any way, from a configuration point of view (for the user), to prevent this from happening?
Is there any way, from a development point of view, to prevent this from happening?
Thanks and regards.
UPDATE AFTER Andy Turner's question (thanks for the comment!).
We were able to "catch" this over several hours: 18.58 (around 6K steps), 21.58 (around 25K steps), 22.58 (around 17K steps), 23.58 (around 26K steps). We exported the datasets for those times, and here is the result.
Another important piece of information: data is coming only from "raw:com.google.step_count.delta:com.huawei.health". We went through other datasets that might look suspicious, and all were empty (apart from the derived ones and so on).
If we interpret this correctly, it is probably Huawei that sometimes sends one value and, the next time, another; so it is probably some misconfiguration on the Huawei side.
Here are the datasets exported:
https://gist.github.com/jmarti-theinit/8d98996873a9c499a14899a9b62162f3
The result of the gist is:
Length of 18.58 points 165
Length of 21.58 points 503
Length of 22.58 points 294
Length of 23.58 points 537
How many points in 21.58 that exist in 18.58 => 165
How many points in 22.58 that exist in 18.58 => 57
How many points in 22.58 that exist in 21.58 => 294
How many points in 23.58 that exist in 18.58 => 165
How many points in 23.58 that exist in 21.58 => 503
How many points in 23.58 that exist in 22.58 => 294
So our bet is that points are removed and added by the devices behind Huawei (for example, only 57 points are common between 18.58 and 22.58), and we cannot control anything more from Google Fit's side. Is that correct? Is there anything else we could look at?
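For reference, this is roughly the comparison we ran on the exported datasets (the file names are illustrative; a point is identified by its start/end nanos and step value):

import json

def load_points(path):
    # Key each exported point by (startTimeNanos, endTimeNanos, step value).
    with open(path) as f:
        return {(p["startTimeNanos"], p["endTimeNanos"], p["value"][0]["intVal"])
                for p in json.load(f)}

snapshots = {name: load_points(name + ".json")
             for name in ("18.58", "21.58", "22.58", "23.58")}

for name, points in snapshots.items():
    print("Length of", name, "points", len(points))

pairs = [("21.58", "18.58"), ("22.58", "18.58"), ("22.58", "21.58"),
         ("23.58", "18.58"), ("23.58", "21.58"), ("23.58", "22.58")]
for later, earlier in pairs:
    print("How many points in", later, "that exist in", earlier, "=>",
          len(snapshots[later] & snapshots[earlier]))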

We're having similar issues using the REST API.
Here is what coincides with Jordi's case:
we are also from Spain (and our users too), although we use servers in Spain and the US
we get the same daily steps value as the Google Fit app for some users, but not for others
daily steps increase during the current day, but on following days, when we make the request again, the daily steps sometimes decrease
we are making the same request, from the start of the day to the end of the day, with 86400000 as the bucket duration and the same data type and data source ID (a sketch of how the day boundaries can be computed follows below)
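For reference, a sketch of how the start and end of a local calendar day can be turned into startTimeMillis/endTimeMillis (the time zone and function name are just illustrative):

from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo

def day_bounds_millis(day, tz="Europe/Madrid"):
    # Whole local calendar day expressed as epoch milliseconds.
    start = datetime(day.year, day.month, day.day, tzinfo=ZoneInfo(tz))
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)

print(day_bounds_millis(date(2021, 3, 15)))  # (1615762800000, 1615849200000)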
We are in the final development phase, so we're testing with only a few users. Our users have Xiaomi Mi Band devices.
We think the problem could be a desynchronization of the servers we're hitting, because if we test with other apps like this one, they show the correct values. We've created new Google Cloud Console OAuth client credentials and new email accounts to test with brand-new users and OAuth clients, but the results are the same.
This is the recommended way to get the daily steps, and we are using exactly the same request:
https://developers.google.com/fit/scenarios/read-daily-step-total
Even with the "Try it" option in the documentation, the results are wrong.
What else can we do to help you resolve the issue?
Thank you very much!

Related

GoogleFit - HistoryClient daily step count data

I am using the HistoryClient of Google Fit. I am looking for steps, calories and distance aggregated per day for the last 15 to 30 days.
However, I am getting inconsistent results every time I make a call to the HistoryClient, and they usually do not match the Google Fit dashboard. Is there any code snippet to get reliable and consistent data?
Also, how reliable is the HistoryClient compared to the newly launched Health Connect APIs?
Thanks,
Nitin
Implementing the HistoryClient with aggregate_step_Delta gets me the steps, but they are always inconsistent with the Google Fit dashboard.

Is there a way to combine these variables in a way that makes sense?

Hello stack overflow community!
I am a sociology student working on a thesis project comparing home value appreciation and neighborhood racial composition over time.
I'm currently using two separate data sources and trying to combine them in a way that makes sense without aggregating anything.
The first data source is GIS data, which has information on home sales in each year by home. The second is census data, which has yearly estimates of racial composition by census tract. Both are in .csv format.
My goal is to create a set of variables for each home row in the GIS data representing the racial composition of the tract the home is in at the year it was sold (e.g. home 1 | 2010 | $500,000 | Census tract 10 | 10% white).
I began doing this by going into Stata and using the following strategy:
For example, if I'm looking at a home sold in 2010 in Census tract 10 and I find that this tract was 10% white in 2010, I use something like
replace percentwhite = 10 if censustract==10 & year==2010
However, this seemed incredibly time consuming, as I'm using data that go back decades and cover a couple dozen Census tracts.
Does anyone have any suggestions on how I might do this smarter, not harder? The first thought I had was to aggregate the data by census tract and year, but was hoping to avoid that if possible. Thank you so much in advance for your help and have a terrific day and start to the new year!
It sounds like you can simply merge census data onto your GIS data. That will be much less painful than using -replace-. Here's an example:
*GIS data: information on home sales in each year by home
clear
input censustract house_id year house_value_k
10 100 2010 200
11 101 2020 500
11 102 1980 100
end
tempfile GIS_data
sa `GIS_data'
*census data: yearly estimates of racial composition by census tract
clear
input censustract year percentwhite
10 2010 20
10 2000 10
11 2010 25
11 2000 5
end
tempfile census_data
sa `census_data'
*easy method: merge the census data onto your GIS data
use `GIS_data', clear
mer m:1 censustract year using `census_data'
drop if _merge==2
list
*hard method: use -replace-
use `GIS_data', clear
gen percentwhite=.
replace percentwhite=20 if censustract==10 & year==2010
replace percentwhite=10 if censustract==10 & year==2000
replace percentwhite=25 if censustract==11 & year==2010
replace percentwhite=5 if censustract==11 & year==2000
list
Both methods "work", but using -merge- is much easier and less prone to errors.
Note: I intentionally created the data sets so that the merge wouldn't be perfect. You will likely want to drop some of the observations in that case; in the code above I dropped the observations where _merge==2.
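If you end up working in Python rather than Stata, the same left-merge logic is available in pandas (a sketch using the toy data above; the column names are just the ones from the example):

import pandas as pd

gis = pd.DataFrame({
    "censustract": [10, 11, 11],
    "house_id": [100, 101, 102],
    "year": [2010, 2020, 1980],
    "house_value_k": [200, 500, 100],
})
census = pd.DataFrame({
    "censustract": [10, 10, 11, 11],
    "year": [2010, 2000, 2010, 2000],
    "percentwhite": [20, 10, 25, 5],
})

# A left merge keeps every home sale and attaches the tract/year composition
# where it exists; unmatched sales get NaN (the analogue of dropping _merge==2).
merged = gis.merge(census, on=["censustract", "year"], how="left")
print(merged)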

Determine Issuer of EMV card

What would be the best way to determine the issuer of a contactless EMV card? I am trying to determine whether a card was issued by Amex, Visa or Mastercard. Is that information available via a USB EMV reader? I don't need to pull any other information from the card.
I'm assuming that it could be done by some Python or C++ code interacting with the card. I'm looking for a good jumping-off point.
You should be able to get this info from the successful response of SELECT. Store the list of RIDs (AID = RID + PIX), and do SELECT one by one. On success, it will return status bytes 90 00, otherwise 6A 82 (file not found).
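A rough sketch of that loop in Python with the pyscard library (the AID list below is illustrative, not exhaustive, and a PC/SC-compatible contactless reader is assumed):

from smartcard.System import readers

# Illustrative AIDs only; not an exhaustive list.
AIDS = {
    "Visa": "A0000000031010",
    "Mastercard": "A0000000041010",
    "American Express": "A00000002501",
}

def select_aid(connection, aid_hex):
    # SELECT by DF name (00 A4 04 00); success is status word 90 00.
    aid = list(bytes.fromhex(aid_hex))
    apdu = [0x00, 0xA4, 0x04, 0x00, len(aid)] + aid + [0x00]
    _, sw1, sw2 = connection.transmit(apdu)
    return (sw1, sw2) == (0x90, 0x00)

connection = readers()[0].createConnection()
connection.connect()

for issuer, aid in AIDS.items():
    if select_aid(connection, aid):
        print("Found application:", issuer)
        break
else:
    print("No known application selected")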
The easiest option would be through the SELECT command as mentioned before. The list of AIDs:
https://www.eftlab.com/knowledge-base/211-emv-aid-rid-pix/
The other option would be getting it from the PAN. You can determine the issuer based on the first 6 or 8 digits of the PAN, which represent the Issuer Identification Number (IIN), also known as the Bank Identification Number (BIN):
34, 37 - American Express
4 - Visa
51-55, 2221-2720 - MasterCard
https://en.wikipedia.org/wiki/Payment_card_number#Issuer_identification_number_(IIN)
You would have to send the commands:
SELECT
GET PROCESSING OPTIONS
READ RECORD
You would look for tag 5A (PAN) and extract the first digits.
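Once you have the PAN, the prefix check itself is simple; a small sketch covering just the three brands asked about:

def issuer_from_pan(pan):
    # Classify by IIN/BIN prefix: 34/37 Amex, 4 Visa, 51-55 and 2221-2720 Mastercard.
    if pan[:2] in ("34", "37"):
        return "American Express"
    if pan.startswith("4"):
        return "Visa"
    if 51 <= int(pan[:2]) <= 55 or 2221 <= int(pan[:4]) <= 2720:
        return "Mastercard"
    return "Unknown"

print(issuer_from_pan("374500000000000"))   # American Express
print(issuer_from_pan("5105105105105100"))  # Mastercard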
A good tool that you can use to read data from a contactless EMV card is:
https://www.javacardos.com/tools/pyresman
You can create your own scripts or just proceed with some basic commands like the SELECT command.

How can I filter out fictional locations (ex. "under a rock", "hiding") from Google Maps API geocode results?

Google Maps API does a great job trying to locate a match for nearly every query. But if I'm only interested in real locations, how can I filter out Google's guesses?
For example, according to Google, "under a rock" is located at "The Rock, Shifnal, Shropshire TF11, UK". But a person who answers the question, "Where are you?" with "Under a rock" does not mean to indicate that they are in Shropshire, UK. Instead they just don't want to tell you — well, either that or they are in real trouble, thankfully with web access, stuck under some rock.
I have several million user generated location strings that I'm attempting to find coordinates for. If someone writes "under a rock" I'd rather just leave the coordinates null instead of putting an obviously wrong point in Shropshire, UK.
Here are some other examples:
under a rock => Shropshire, UK
planet earth => Cheshire, UK
nowhere => Scituate, RI, USA
travelling => Madrid, Spain
hiding => Anderson, CA, USA
global => Midland, TX, USA
on the web => North Part, ON, Canada
internet => Frisco, TX, USA
worldwide => Mie Prefecture, Japan
Ultimately I'm after a solid way to return coordinates from a string but return false if the location is like the above.
I need to build a function that returns the following:
Twin Cities => Return the colloquial coordinates of Minneapolis-St. Paul
right behind you => false [Google gets this one "right" -- at least for my purposes]
under a rock => false
nowhere => false
Canada => Return coordinates
Mission District San Francisco => Return coordinates
Chicago => Return coordinates
a galaxy far far away => false [Google also gets this "right" — zero results]
What do you recommend?
Here's a comma-delimited array for you to play at home:
'twin cities','right behind you','under a rock','nowhere','canada','mission district san francisco','chicago','a galaxy far far away','london, england','1600 pennsylvania ave, washington, d.c.','california','41.87194,12.56738','global','worldwide','on the internet','mars'
And here's the url format:
'http://maps.googleapis.com/maps/api/geocode/json?address=' + query + '&sensor=false'
ex: http://maps.googleapis.com/maps/api/geocode/json?address=twin+cities&sensor=false
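For instance, a small script that runs that list through the URL format above and prints what comes back (note that current versions of the Geocoding API also require a key parameter):

import json
import urllib.parse
import urllib.request

queries = ['twin cities', 'right behind you', 'under a rock', 'nowhere', 'canada',
           'mission district san francisco', 'chicago', 'a galaxy far far away',
           'london, england', '1600 pennsylvania ave, washington, d.c.', 'california',
           '41.87194,12.56738', 'global', 'worldwide', 'on the internet', 'mars']

def geocode(query):
    # Same URL format as above; returns the parsed JSON response.
    url = ('http://maps.googleapis.com/maps/api/geocode/json?address='
           + urllib.parse.quote(query) + '&sensor=false')
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

for q in queries:
    results = geocode(q).get('results', [])
    print(q, '=>', results[0]['formatted_address'] if results else False)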
It seems most of your incorrect results have a "partial_match" attribute set to "true".
e.g.
Twin Cities, no partial match:
http://maps.googleapis.com/maps/api/geocode/json?address=Twin%20Cities&sensor=false
under a rock, 10+ results, all with partial match:
http://maps.googleapis.com/maps/api/geocode/json?address=under%20a%20rock&sensor=false
Though the original purpose of this attribute is not to tell whether a locality is correct or not, it's still pretty accurate on the dataset you provided.
From Google Maps API documentation:
partial_match indicates that the geocoder did not return an exact match for the original request, though it was able to match part of the requested address. You may wish to examine the original request for misspellings and/or an incomplete address.
Partial matches most often occur for street addresses that do not exist within the locality you pass in the request. Partial matches may also be returned when a request matches two or more locations in the same locality. For example, "21 Henr St, Bristol, UK" will return a partial match for both Henry Street and Henrietta Street. Note that if a request includes a misspelled address component, the geocoding service may suggest an alternate address. Suggestions triggered in this way will not be marked as a partial match.
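Building on that, a sketch of the filtering rule: treat a query as fictional when it returns no results or only partial matches. Here geocode() is assumed to be a helper, like the one sketched under the question, that returns the parsed JSON response; only testing on your own data will show whether this heuristic is strict enough:

def real_location(query, geocode):
    # Returns (lat, lng) for plausible locations, False for fictional ones.
    results = geocode(query).get('results', [])
    if not results or all(r.get('partial_match') for r in results):
        return False  # zero results or only partial matches => not a real place
    location = results[0]['geometry']['location']
    return location['lat'], location['lng']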
This might not be a direct answer to your question.
If you are currently going through thousands of user inputs saved in a database and filtering out the invalid ones, I think it is too late and not feasible. The output can only be as good as the input.
The better way is to make the input as valid as possible, since end users don't always know what they want.
I would suggest that users enter their address through autocomplete, so that you always have a valid address:
The user enters text and selects one of the suggestions
A marker and info window are shown
When the user confirms the input, you save the info window text as the user input, not the raw text input
This way, you don't need to validate or filter user input.
I know there are Bayes classifier implementations in JavaScript. I've never tried them, though; I currently use a Ruby implementation which works correctly.
You could have two classes (Real and Unreal), training each of them with as many samples as you want (30-50 samples each?). The better trained your classifier is, the more accurate it will be.
Then you'd test the location string with the classifier before calling the Google Maps API, filtering out Unreal locations.
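In Python, the same idea is only a few lines with scikit-learn (an illustrative sketch, not the Ruby implementation mentioned above; the tiny made-up training set here is far smaller than you would want in practice, so the predictions are only indicative):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

samples = ['chicago', 'mission district san francisco', 'london, england',
           'canada', 'twin cities', 'california',
           'under a rock', 'nowhere', 'right behind you', 'planet earth',
           'on the web', 'worldwide', 'hiding', 'a galaxy far far away']
labels = ['real'] * 6 + ['unreal'] * 8

classifier = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
classifier.fit(samples, labels)

for query in ('london', 'mission district', 'hiding somewhere', 'planet mars'):
    print(query, '=>', classifier.predict([query])[0])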
To truly succeed here you are going to have to build a database-driven system that facilitates both positive and negative lookups, with AI that gets smarter over time, just like Google did. I don't believe there is a single algorithm that will filter out results based on cosmetics alone.
I looked around and found a site that contains every city in the world. Unfortunately, it doesn't give them as a single list, so you'd have to spend a bit of time harvesting data. The site is http://www.fallingrain.com/world/index.html.
They seem to be using individual directories for organizing countries, states, and cities, broken down further by alphabet. It is, however, the only comprehensive source that I could find.
If you manage to get all of these locations into a database, then you will have the beginnings of a positive lookup system for your queries. You'll also need to start building separate lists of bi-, tri-, and quad-city areas, as well as popular destinations and landmarks.
You should also store a negative lookup table for all known mismatches. People have a tendency to generate similar false data and typos across large populations, so the most popular "nowhere" and "planet earth" answers will be repeated over and over again, in every language you can think of.
One of the benefits of this strategy is that you can run relational queries against your data to get matches in bulk as well as one at a time. Since some false negatives will occur at the beginning, your main decision is what to do with unmatched items. You may want to adopt a strategy where you can both reject non-matches and substitute partial matches with the nearest actual match.
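A bare-bones sketch of that lookup flow (the table structures are placeholders; geocode is whatever fallback you end up using):

def resolve_location(raw, positive, negative, geocode):
    # positive: dict mapping normalized strings to coordinates
    # negative: set of strings already known to be junk ("nowhere", "planet earth", ...)
    key = raw.strip().lower()
    if key in negative:
        return False
    if key in positive:
        return positive[key]
    coords = geocode(key)          # fall back to the API (or manual review)
    if coords:
        positive[key] = coords     # the positive table grows over time
    else:
        negative.add(key)          # so does the negative one
    return coords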
Anyhow, I hope this helps. It is a bit of effort but if it's important it will be worth it. Who knows, you may end up with a database that's actually worth something. Maybe even a Google maps gateway service for companies/developers who need the same functionality. (:
Take care.

pyephem, libnova, stellarium, JPL Horizons disagree on moon RA/DEC?

MINOR EDIT: I say below that JPL's Horizons library is not open source. Actually, it is, and it's available here: http://naif.jpl.nasa.gov/naif/tutorials.html
At 2013-01-01 00:00:00 UTC, at 0 degrees north latitude, 0 degrees east longitude, sea level elevation, what is the J2000 epoch right ascension and declination of the moon?
Sadly, different libraries give slightly different answers. Converted to degrees, the summarized results (RA first):
Stellarium: 141.9408333000, 9.8899166666 [precision: .0004166640, .0000277777]
Pyephem: 142.1278749990, 9.8274722221 [precision .0000416655, .0000277777]
Libnova: 141.320712606865, 9.76909442356909 [precision unknown]
Horizons: 141.9455833320, 9.8878888888 [precision: .0000416655, .0000277777]
My question: why? Notes:
I realize these differences are small, but I use pyephem and libnova to calculate sun/moon rise/set, and these times can be very sensitive to position at higher latitudes (e.g., midnight sun).
I can understand JPL's Horizons library not being open source, but the other three are. Shouldn't someone work out the differences in these libraries and merge them? This is my main complaint. Do the stellarium/pyephem/libnova library authors have a fundamental difference in how to make these calculations, or do they just need to merge their code?
I also realize there might be other reasons the calculations are different, and would appreciate any help in rectifying these possible errors:
Pyephem and Libnova may be using the epoch of the date instead of J2000.
The moon is close enough that observer location can affect its RA/DEC (parallax effect).
I'm using Perl's Astro::Nova and Python's pyephem, not the original C implementations of these libraries. However, if these differences are caused by using Perl/Python, that is important in my opinion.
My code (w/ raw results):
First, Perl and Astro::Nova:
#!/bin/perl
# RA/DEC of moon at 0N 0E at 0000 UTC 01 Jan 2013
use strict;
use warnings;
use Astro::Nova;
# 1356998400 == 01 Jan 2013 0000 UTC
my $jd     = Astro::Nova::get_julian_from_timet(1356998400);
my $coords = Astro::Nova::get_lunar_equ_coords($jd);
print join(",", ($coords->get_ra(), $coords->get_dec())), "\n";
RESULT: 141.320712606865,9.76909442356909
- Second, Python and pyephem:
#!/usr/local/bin/python
# RA/DEC of moon at 0N 0E at 0000 UTC 01 Jan 2013
import ephem

e = ephem.Observer()
e.date = '2013/01/01 00:00:00'
moon = ephem.Moon()
moon.compute(e)
print moon.ra, moon.dec
RESULT: 9:28:30.69 9:49:38.9
- The stellarium result (snapshot):
- The JPL Horizons result (snapshot):
[JPL Horizons requires POST data (not really, but pretend), so I couldn't post a URL].
I haven't linked them (lazy), but I believe there are many unanswered questions on Stack Overflow that effectively reduce to this question (inconsistency of precision astronomical libraries), including some of my own questions.
I'm playing with this stuff at: https://github.com/barrycarter/bcapps/tree/master/ASTRO
I have no idea what Stellarium is doing, but I think I know about the other three. You are correct that only Horizons is using J2000 instead of the epoch-of-date for this apparent, locale-specific observation. You can bring it into close agreement with PyEphem by clicking "change" next to the "Table Settings" and switching from "1. Astrometric RA & DEC" to "2. Apparent RA & DEC."
The difference with Libnova is a bit trickier, but my late-night guess is that Libnova uses UT instead of Ephemeris Time, and so to make PyEphem give the same answer you have to convert from one time to the other:
import ephem
moon, e = ephem.Moon(), ephem.Observer()
e.date = '2013/01/01 00:00:00'
e.date -= ephem.delta_t() * ephem.second
moon.compute(e)
print moon.a_ra / ephem.degree, moon.a_dec / ephem.degree
This outputs:
141.320681918 9.77023197401
Which is, at least, much closer than before. Note that you might also want to do this in your PyEphem code if you want it to ignore refraction like you have asked Horizons to; though for this particular observation I am not seeing it make any difference:
e.pressure = 0
Any residual difference is probably (but not definitely; there could be other sources of error that are not occurring to me right now) due to the different programs using different formulae to predict where the planets will be. PyEphem uses the old but popular VSOP87. Horizons uses the much more recent — and exact — DE405 and DE406, as stated in its output. I do not know what models of the solar system the other products use.