I'm looking to put the data from this webpage: http://live.nhl.com/GameData/20162017/2016020725/PlayByPlay.json into a usable R data frame.
I've tried what I've seen so far by using:
library(jsonlite)
json <- "http://live.nhl.com/GameData/20162017/2016020725/PlayByPlay.json"
doc <- fromJSON(json, simplifyDataFrame = TRUE)
That reads the file into a list of length 1, and to be honest, working with lists in R is not yet a skill of mine (I'm more comfortable with data frames).
I'd like to scrape that webpage into a usable data frame.
I've tried
PBP <- rbindlist(lapply(doc, as.data.table), fill = TRUE)
but that did not work.
Any ideas? Happy to provide any more info if needed.
Perhaps the first course of action should be to understand lists down to the bone. What you have there is a list of length 1. If you do names(doc), you will notice that this list element is named data. To fully reveal the structure of the object, try str(doc). That's a lot of output! The tree below should give you a sense of what is going on.
Working with lists can be done using [[ and $, and also [, but see this tweet for details. You can access the first element with doc$data, doc[[1]] or doc[["data"]]. All are equivalent, but some may be handier for certain tasks. To "climb" down the list tree, just append extra arguments. Note that you can mix all of these; see the inline comments below for a sneak preview. From your question it's not clear what part of the JSON file you're after. Try expanding the question, or even better, tinker around with doc.
doc:
data # doc[[1]] or doc[["data"]] or doc$data
|___ refreshInterval # doc[[1]][[1]] or doc[[1]][["refreshInterval"]] or doc[["data"]][["refreshInterval"]] or doc$data$refreshInterval
|___ game # doc[[1]][[2]] or doc[[1]][["game"]] or... you get the idea
     |___ awayteamid # doc$data$game$awayteamid
     |___ awayteamname
     |___ hometeamname
     |___ plays
     |___ awayteamnick
     |___ hometeamnick
     |___ hometeamid
You can access game stats through
xy <- doc$data$game$plays$play
xy[1:6, c("desc", "type", "p2name", "teamid", "ycoord", "xcoord")]
desc type p2name teamid ycoord xcoord
1 Radko Gudas hit Chris Kreider Hit Chris Kreider 4 -12 -96
2 Pavel Buchnevich Wrist Shot saved by Steve Mason Shot Steve Mason 3 26 -42
3 Brandon Pirri hit Brandon Manning Hit Brandon Manning 3 42 -68
4 Nick Cousins hit Adam Clendening Hit Adam Clendening 4 35 92
5 Nick Cousins Wrist Shot saved by Henrik Lundqvist Shot Henrik Lundqvist 4 19 86
6 Michael Grabner Wrist Shot saved by Steve Mason Shot Steve Mason 3 5 -63
We are using the Google Fit REST API in a process with thousands of users to get daily steps. For most users the process works fine, but we are finding some users with this specific behaviour: their steps increase during the day, but at some point they decrease significantly.
We are seeing this mainly with Huawei Health apps (and some Xiaomi health apps).
We use this dataSourceId to get daily steps: derived:com.google.step_count.delta:com.google.android.gms:estimated_steps
An example of one of our requests to get data for 15th March (Spanish time):
POST https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate
Accept: application/json
Content-Type: application/json;encoding=utf-8
Authorization: Bearer XXXXXXX
{
"aggregateBy": [{
"dataTypeName": "com.google.step_count.delta",
"dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
}],
"bucketByTime": { "durationMillis": 86400000 },
"startTimeMillis": 1615244400000,
"endTimeMillis": 1615330800000
}
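For what it's worth, here is a minimal Python sketch of the same call (the token is a placeholder, the requests library is assumed, and the endpoint is the one from the read-daily-step-total scenario linked below):

import requests

ACCESS_TOKEN = "XXXXXXX"  # placeholder; obtained via your OAuth flow

body = {
    "aggregateBy": [{
        "dataTypeName": "com.google.step_count.delta",
        "dataSourceId": "derived:com.google.step_count.delta:"
                        "com.google.android.gms:estimated_steps",
    }],
    "bucketByTime": {"durationMillis": 86400000},  # one-day bucket
    "startTimeMillis": 1615762800000,  # 2021-03-15 00:00 Spanish time
    "endTimeMillis": 1615849200000,    # 2021-03-16 00:00 Spanish time
}

resp = requests.post(
    "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
)
resp.raise_for_status()
print(resp.json()["bucket"][0]["dataset"][0]["point"])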
For most users this works well (we get the same data that the Google Fit app shows to the user), but for some users, as described, the numbers increase at first during the day and decrease later. For those users, the data in the Google Fit app is significantly greater than what we get through the REST API.
We have even traced this with a specific user during one day. Using buckets of 'durationMillis': 3600000, we plotted a histogram of hourly steps in one day (with a custom-made process).
For the same day, at different moments in time (a couple of hours apart in this case), we get this for the EXACT SAME USER:
20210315-07 | ########################################################## | 1568
20210315-08 | ############################################################ | 1628
20210315-09 | ########################################################## | 1574
20210315-10 | ####################### | 636
20210315-11 | ################################################### | 1383
20210315-12 | ###################################################### | 1477
20210315-13 | ############################################### | 1284
20210315-14 | #################### | 552
versus this, which was retrieved A COUPLE OF HOURS LATER:
20210315-08 | ################# | 430
20210315-09 | ######### | 229
20210315-10 | ################# | 410
20210315-11 | ###################################################### | 1337
20210315-12 | ############################################################ | 1477
20210315-13 | #################################################### | 1284
20210315-14 | ###################### | 552
("20210315-14" means 14.00 at 15th March of 2021)
This is the JSON returned in the first case:
[{"startTimeNanos":"1615763400000000000","endTimeNanos":"1615763460000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":6,"mapVal":[]}]},
{"startTimeNanos":"1615788060000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1568,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615795080000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1628,"mapVal":[]}]},
{"startTimeNanos":"1615795200000000000","endTimeNanos":"1615798500000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1574,"mapVal":[]}]},
{"startTimeNanos":"1615798860000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":636,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1383,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
This is the JSON returned in the latter case:
[{"startTimeNanos":"1615788300000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":517,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615794540000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":430,"mapVal":[]}]},
{"startTimeNanos":"1615796400000000000","endTimeNanos":"1615798200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":229,"mapVal":[]}]},
{"startTimeNanos":"1615798980000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":410,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1337,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
As you can see, all points always come from originDataSourceId: "raw:com.google.step_count.delta:com.huawei.health".
It looks like some Google Fit process is making adjustments, removing some steps or data points, but we cannot find a way to detect what is removed or why, so we cannot explain to the user what is happening, or what he or we can do to make his app data match ours (or the other way around). His Google Fit app shows a number that is not the same as the one the REST API returns.
The user has already disabled the "Google Fit app tracking activities" option.
I would love to know, or at least get some hints about:
What more can I do to debug this?
Any hint about why this is happening?
Is there any way, from a configuration point of view (for the user), to prevent this from happening?
Is there any way, from a development point of view, to prevent this from happening?
Thanks and regards.
UPDATE AFTER Andy Turner's question (thanks for the comment!).
We were able to "catch" this over several hours: 18.58 (around 6K steps), 21.58 (around 25K steps), 22.58 (around 17K steps), 23.58 (around 26K steps). We exported the datasets at each of those times, and here is the result.
Another important piece of information: the data comes only from "raw:com.google.step_count.delta:com.huawei.health". We went through other datasets that might look suspicious, and all were empty (apart from the derived ones and so on).
If we interpret this correctly, it is probably Huawei that sends one value at one moment and another value the next time, so it is probably some misconfiguration on the Huawei side.
Here are the datasets exported:
https://gist.github.com/jmarti-theinit/8d98996873a9c499a14899a9b62162f3
The result of the gist is:
Length of 18.58 points 165
Length of 21.58 points 503
Length of 22.58 points 294
Length of 23.58 points 537
How many points in 21.58 that exist in 18.58 => 165
How many points in 22.58 that exist in 18.58 => 57
How many points in 22.58 that exist in 21.58 => 294
How many points in 23.58 that exist in 18.58 => 165
How many points in 23.58 that exist in 21.58 => 503
How many points in 23.58 that exist in 22.58 => 294
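For reference, here is a minimal Python sketch of the kind of overlap count reported above (the export file names and the choice of what identifies a "point" are assumptions):

import json

def load_points(path):
    # Key each point by its time span and step value.
    with open(path) as f:
        points = json.load(f)
    return {(p["startTimeNanos"], p["endTimeNanos"], p["value"][0]["intVal"])
            for p in points}

a = load_points("points_1858.json")  # hypothetical export file names
b = load_points("points_2258.json")
print("How many points in 22.58 that exist in 18.58 =>", len(a & b))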
So our bet is that points are removed and added by the devices behind Huawei Health (for example, only 57 points are shared between 18.58 and 22.58), and there is nothing more we can control from Google Fit's side. Is that correct? Is there anything else we could check?
We're having similar issues using the REST API.
Here is what coincides with Jordi's case:
we are also from Spain (and our users too), although we use servers in Spain and the US
we get the same daily steps value as the Google Fit app for some users, but not for others
daily steps increase during the current day, but when we repeat the request on the following days, the daily total sometimes decreases
we are making the same request, from the start of the day to the end of the day, with 86400000 as the bucket time and the same data type and data source id
We are in the final development phase, so we're testing with only a few users. Our users have Xiaomi Mi Band devices.
We think the problem could be a desynchronization of the servers we are hitting, because when we test with other apps like this one, they show the correct values. We have created new Google Cloud Console OAuth client credentials and new email accounts to test with brand-new users and OAuth clients, but the results are the same.
This is the recommended way to get the daily steps, and we are using exactly the same request:
https://developers.google.com/fit/scenarios/read-daily-step-total
and even with the "Try it" option in the documentation the results are wrong.
What else can we do to help you resolve the issue?
Thank you very much!
Hello stack overflow community!
I am a sociology student working on a thesis project comparing home value appreciation and neighborhood racial composition over time.
I'm currently using two separate data sources and trying to combine them in a way that makes sense without aggregating anything.
The first data source is GIS data, which has information on home sales by home in each year. The second is census data, which has yearly estimates of racial composition by census tract. Both are in .csv format.
My goal is to create a set of variables for each home row in the GIS data representing the racial composition of the tract the home is in at the year it was sold (e.g. home 1 | 2010 | $500,000 | Census tract 10 | 10% white).
I began doing this by going into Stata and using the following strategy:
For example, if I'm looking at a home sold in 2010 in Census tract 10 and I find that this tract was 10% white in 2010, I'd use something like
replace percentwhite = 10 if censustract==10 & year==2010
However, this seemed incredibly time consuming, as I'm using data that go back decades and cover a couple dozen Census tracts.
Does anyone have any suggestions on how I might do this smarter, not harder? The first thought I had was to aggregate the data by census tract and year, but was hoping to avoid that if possible. Thank you so much in advance for your help and have a terrific day and start to the new year!
It sounds like you can simply merge the census data onto your GIS data. That will be much less painful than using -replace-. Here's an example:
*GIS data: information on home sales in each year by home
clear
input censustract house_id year house_value_k
10 100 2010 200
11 101 2020 500
11 102 1980 100
end
tempfile GIS_data
save `GIS_data'
*census data: yearly estimates of racial composition by census tract
clear
input censustract year percentwhite
10 2010 20
10 2000 10
11 2010 25
11 2000 5
end
tempfile census_data
save `census_data'
*easy method: merge the census data onto your GIS data
use `GIS_data', clear
merge m:1 censustract year using `census_data'
drop if _merge==2
list
*hard method: use -replace-
use `GIS_data', clear
gen percentwhite=.
replace percentwhite=20 if censustract==10 & year==2010
replace percentwhite=10 if censustract==10 & year==2000
replace percentwhite=25 if censustract==11 & year==2010
replace percentwhite=5 if censustract==11 & year==2000
list
Both methods "work", but using -merge- is much easier and less prone to errors.
Note: I intentionally created the data sets so that the merge wouldn't be perfect. You will likely want to drop some of the observations in that case; in the code above, I dropped observations with _merge==2.
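Since both sources are .csv files, the same m:1 join can also be done in Python with pandas; here is a minimal sketch, where the file names and column layout are assumptions mirroring the example above:

import pandas as pd

gis = pd.read_csv("gis_home_sales.csv")    # hypothetical: censustract, house_id, year, house_value_k
census = pd.read_csv("census_tracts.csv")  # hypothetical: censustract, year, percentwhite

# A left join keeps every home sale and attaches the tract's composition
# for the year it was sold (the m:1 merge in Stata terms).
merged = gis.merge(census, on=["censustract", "year"], how="left")
print(merged.head())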
I'm developing an interface to a VeriFone VX terminal, although this is really a general EMV question. Our processor has a zero floor limit, so every transaction will be sent online. However, in case that ever changes: how do you know (from what tags) whether the transaction was approved or declined offline? Or, in other words, how do you know whether to go online or not?
how do you know (what tags) if the transaction was approved or declined offline? Or, in other words, how do you know to go online or not?
The terminal decides whether to approve the transaction offline, to go online, or to reject the transaction. The terminal sends a GENERATE AC command to the card, and the response to this command tells the terminal which action to take next.
The decision depends on three fields:
1) Issuer Action Code (IAC)
2) Terminal Action Code (TAC)
3) Terminal Verification Results (TVR)
IAC, TAC and TVR have the same structure. To learn more about these data elements, see EMV Book 3.
IAC usage example:
Suppose IAC-Online (tag 9F0F) = 08 00 00 00 00.
Here byte 1, bit 4 is set, i.e. "offline DDA failed";
the issuer wants to go online if offline DDA failed.
When the terminal performs DDA and it fails, it sets the corresponding bit in the TVR;
that means the TVR says: offline DDA failed for this card.
The terminal then checks IAC-Online, finds the DDA-failed bit set both there and in the TVR, and its decision is to go online: it sends a GENERATE AC command to the card with P1 = 80 (ARQC, online authorisation requested).
P1 is coded as follows: 00 = AAC (decline), 40 = TC (offline approval), 80 = ARQC (online authorisation request).
Example Gen AC exchange:
C: 80 AE 80 00 other data
R: SW1/SW2=9000 (Normal processing: No error) Lr=32
77 1E 9F 27 01 80 9F 36 02 02 13 9F 26 08 2D F3
83 3C 61 85 5B EA 9F 10 07 06 84 23 00 31 02 08
Now the decision is made by the card; the terminal gets the card's decision in the response to the GENERATE AC command. The card returns tag 9F27 (Cryptogram Information Data); here the card returned 80, i.e. the card wants the transaction to go online.
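To make the two checks concrete, here is an illustrative Python sketch; the byte layouts follow EMV Book 3, but treat this as a toy illustration, not a certified kernel:

def wants_online(tvr, iac_online, tac_online):
    # Terminal action analysis: go online if any TVR bit is also set
    # in IAC-Online or TAC-Online (all three are 5-byte bit masks).
    return any(t & (i | a) for t, i, a in zip(tvr, iac_online, tac_online))

def cryptogram_type(cid):
    # Decode tag 9F27 (Cryptogram Information Data), bits 8-7.
    return {0x00: "AAC (declined)",
            0x40: "TC (approved offline)",
            0x80: "ARQC (go online)"}.get(cid & 0xC0, "RFU")

tvr = bytes.fromhex("0800000000")         # offline DDA failed (byte 1, bit 4)
iac_online = bytes.fromhex("0800000000")  # issuer: go online if DDA failed
tac_online = bytes.fromhex("0000000000")
print(wants_online(tvr, iac_online, tac_online))  # True
print(cryptogram_type(0x80))                      # ARQC (go online)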
Your question really is an important one, and you will need to read more of the specification for clarity on this topic. Please check the EMV books; the sections on Terminal Action Analysis and Card Action Analysis are the ones to read.
Assuming you're using VeriFone's VIPA API, the first 'Continue Transaction' command (GenAC1) returns tags wrapped in a TLV template (a 'constructed' TLV tag). The tag of this template determines the result:
E3: Locally authorized
E4: Requires online authorization
AFAIK, in vanilla EMV the tag Cryptogram Information Data ('9F27') returned during the 1st GENERATE AC should serve this purpose.
See EMV Book 3, Table 14.
Beware that this tag contains the decision of the card, so you won't see the cryptogram type the kernel requested.
Ideally I could specify something like 10 as my input (in ounces) and get back a string like this: "1 & 1/4 cups". Is there a library that can do something like this? (note: I am totally fine with the rounding implicit in something like this).
Note: I would prefer a C library, but I am OK with solutions for nearly any language as I can probably find appropriate bindings.
It is really two things: 1) the data encompassing the conversion, 2) the presentation of the conversion.
The second is user choice: If you want fractions, you need to write or get a fractions library. There are many.
The first is fairly easy. The vast majority of conversions are just a factor. Usually you will organize the known factors as conversions into the appropriate SI unit for that category of conversion (volume, length, area, density, etc.).
Your data then looks something like this:
A acres 4.046870000000000E+03 6
A ares 1.000000000000000E+02 15
A barns 1.000000000000000E-28 15
A centiares 1.000000000000000E+00 15
A darcys 9.869230000000000E-13 6
A doors 9.290340000000000E+24 6
A ferrados 7.168458781362010E-01 6
A hectares 1.000000000000000E+04 15
A labors 7.168625518000000E+05 6
A Rhode Island 3.144260000000000E+09 4
A sections 2.590000000000000E+06 6
A sheds 1.000000000000000E-48 15
A square centimeters 1.000000000000000E-04 15
A square chains (Gunter's or surveyor's) 4.046860000000000E+02 6
A square chains (Ramsden's) 9.290304000000000E+02 5
A square feet 9.290340000000000E-02 6
A square inches 6.451600000000000E-04 15
A square kilometers 1.000000000000000E+06 15
A square links (Gunter's or surveyor's) 4.046900000000000E-02 5
A square meters (SI) 1.000000000000000E+00 15
A square miles (statute) 2.590000000000000E+06 7
A square millimeter 1.000000000000000E-06 15
A square mils 6.451610000000000E-10 5
A square perches 2.529300000000000E+01 5
A square poles 2.529300000000000E+01 5
A square rods 2.529300000000000E+01 5
A square yards 8.361270000000000E-01 6
A townships 9.324009324009320E+07 5
In each case, these are area conversions into the SI unit for area -- square meters. Then make a second conversion into the desired unit. The third number is the number of significant digits.
Keep a file of these factors and you can then convert from any area to any area you have data on. Repeat for other categories of conversion (volume, power, length, weight, etc.).
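As a concrete illustration (in Python rather than C, purely for brevity), conversion through the SI base unit looks like this; the factors are copied from the table above:

# Factors into the SI unit for area (square meters), per the table above.
TO_SQUARE_METERS = {
    "acres": 4.04687e3,
    "hectares": 1.0e4,
    "square feet": 9.29034e-2,
    "square miles (statute)": 2.59e6,
}

def convert_area(value, from_unit, to_unit):
    # Convert via the SI unit: from_unit -> m^2 -> to_unit.
    si = value * TO_SQUARE_METERS[from_unit]
    return si / TO_SQUARE_METERS[to_unit]

print(convert_area(640, "acres", "square miles (statute)"))  # ~1.0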
My thought was to use Google Calculator for this task, if you want generic conversions...
Example: http://www.google.com/ig/calculator?q=10%20ounces%20to%20cups -- returns JSON, but I believe you can specify the format.
Here's a Java example for currency conversion:
http://blog.caplin.com/2011/01/06/simple-currency-conversion-using-google-calculator-and-java/
Well, for a quick and dirty solution you could always run GNU Units as an external program. If your software is GPL-compatible, you can even lift the code from Units and use it in your program.
Please check out JSR 363, the Units of Measurement Standard for Java: http://unitsofmeasurement.github.io/
At least in C++ you get basic support via "value types" already, but you still have to implement those conversions yourself or find a suitable library similar to what JSR 363 offers for Java.
95 bytes currently in Python 2 (reduce is a builtin there):
I,V,X,L,C,D,M,R,r=1,5,10,50,100,500,1000,vars(),lambda x:reduce(lambda T,x:T+R[x]-T%R[x]*2,x,0)
Here are a few test results; it should work for 1 to 3999 (assuming the input contains only valid characters):
>>> r("I")
1
>>> r("MCXI")
1111
>>> r("MMCCXXII")
2222
>>> r("MMMCCCXXXIII")
3333
>>> r("MMMDCCCLXXXVIII")
3888
>>> r("MMMCMXCIX")
3999
And this is not a duplicate of this question; it is the reverse one.
So, is it possible to make that shorter in Python? Or could other languages like Ruby do it in fewer bytes?
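For anyone decoding the one-liner, here is an ungolfed Python 3 sketch of the same running-total trick:

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s):
    total = 0
    for ch in s:
        v = VALUES[ch]
        # If the running total is not a multiple of v, a smaller numeral
        # preceded this one (e.g. the I in IV) and was already added;
        # subtracting it twice turns that addition into a subtraction.
        total += v - total % v * 2
    return total

assert roman_to_int("MCMXCIX") == 1999
assert roman_to_int("MMMCMXCIX") == 3999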
Shortest solutions from codegolf.com
There was a "Roman to decimal" competition over at Code Golf some time ago. (Well, actually it's still running because they never end.) A Perl golfer by the name of eyepopslikeamosquito decided to win all four languages (Perl, PHP, Python, and Ruby), and so he did. He wrote a fascinating four-part series "The golf course looks great, my swing feels good, I like my chances" (part II, part III, part IV) describing his approaches over at Perl Monks.
Here are his solutions:
Ruby, 53 strokes
n=1;$.+=n/2-n%n=10**(494254%C/9)%4999while C=getc;p$.
Perl, 58 strokes
$\+=$z-2*$z%($z=10**(19&654115/ord)%1645)for<>=~/./g;print
He also has a 53-stroke solution, but it probably doesn't work right now: (it uses the $^T variable during a few second period in 2011!)
$\+=$z-2*$z%($z=10**(7&$^T/ord)%1999)for<>=~/./g;print
PHP, 70 strokes
<?while(A<$c=fgetc(STDIN))$t+=$n-2*$n%$n=md5(o²Ûö¬Ñ.$c)%1858+1?><?=$t;
The six weird characters in the md5(..) are chr(111).chr(178).chr(219).chr(246).chr(172).chr(209) in Perl notation.
Python, 78 strokes
t=p=0
for r in raw_input():n=10**(205558%ord(r)%7)%9995;t+=n-2*p%n;p=n
print t
Python - 94 chars
cheap shot :)
I,V,X,L,C,D=1,5,10,50,100,500
M,R,r=D+D,vars(),lambda x:reduce(lambda T,x:T+R[x]-T%R[x]*2,x,0)
Haskell gets close.
import Data.Maybe --18
r=foldl(\t c->t+y c-t`mod`y c*2)0 --34
y x=fromJust$lookup x$zip"IVXLCDM"[1,5,10,50,100,500,1000] --59
total bytes = 111
Would be 93 if I didn't need the import for fromJust.
Actually, defining my own fromJust is smaller, a total of 98:
r=foldl(\t c->t+y c-t`mod`y c*2)0 --34
y x=f$lookup x$zip"IVXLCDM"[1,5,10,50,100,500,1000] --52
f(Just x)=x --12
-- assumes correct input
Adopting a response from Jon Skeet to a previously asked similar question:
In my custom programming language "CPL1839079", it's 3 bytes:
r=f