How to import PBP Data from NFL.com into R - json

I am trying to import data from past NFL games in the form of Play-by-play tables and am mostly working in R to collect the data and create a data set.
An example of the data I am after is on this page: http://www.nfl.com/gamecenter/2012020500/2011/POST22/giants#patriots#menu=gameinfo&tab=analyze&analyze=playbyplay
I know that NFL.com uses JSON and much of the necessary data are in JSON files attached to the site. My efforts at extracting data from these files using the JSON package in R have been pretty feeble. Any advice y'all have is appreciated.
Would I just be better off using PHP to farm the data?

I don't know if you have already succeeded loading the JSON files into R, but here is an example of that:
library(rjson)
# fromJSON(file = ...) fetches the game-center feed from the URL and parses it into a nested list
json <- fromJSON(file = 'http://www.nfl.com/liveupdate/game-center/2012020500/2012020500_gtd.json')
# The top-level element is keyed by the game id; the home team's stats sit under $home$stats
json$`2012020500`$home$stats
If you are having trouble finding the URL of the JSON file, use Firebug (a Firefox extension) to watch the page request it.
The JSON file is, of course, huge and complicated, but it is complicated data, and whatever you are looking for should be in there. If you just want a straight dump of the play-by-play text, you can use this URL:
http://www.nfl.com/widget/gc/2011/tabs/cat-post-playbyplay?gameId=2012020500
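If the play-by-play text inside the JSON feed is what you are after, the sketch below pulls the description of every play. The drives/plays/desc field names reflect how the feed looked when I checked and may have changed, so treat this as a rough starting point rather than a guaranteed recipe:
library(rjson)

game_id <- "2012020500"
url <- sprintf("http://www.nfl.com/liveupdate/game-center/%s/%s_gtd.json", game_id, game_id)
json <- fromJSON(file = url)

# Keep only the elements of 'drives' that actually contain a list of plays
# (the feed also carries bookkeeping fields alongside the drives)
drives <- Filter(function(d) is.list(d) && !is.null(d$plays), json[[game_id]]$drives)

# Pull the text description of every play in every drive
plays <- unlist(lapply(drives, function(d) sapply(d$plays, function(p) p$desc)))
head(plays)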

I extracted all the data for one team for one season more-or-less manually. If you want data for a lot of games consider emailing the league and asking for the files you mentioned. They publish the data, so maybe they will give you the files. The NFL spokesman is Greg Aiello. I suspect you could find his email address with Google.
Sorry this is not a suggested programming solution. If this answer is not appropriate for the forum please delete it. It is my first posted answer.

Related

Postman: How to make multiple GET requests at the same time and get the result into a .csv file

I want to make multiple GET requests and save the resulting JSON responses to a .csv file.
I want to make 1000 requests, each with different request parameters.
PS: the order of the requests is not important.
Is it possible to do this with Postman?
If yes, can anyone explain how it can be achieved?
Welcome to SO.
Looks like this isn't a native feature included in Postman.
I did find a workaround for it, but I personally think there are better solutions, such as using Node.js and an HTTP client. That will give you the flexibility to do whatever you want.
Since you are new here, I'd like to point your attention to this nice SO article about how to ask questions. In this case, your question doesn't show that you did much research by yourself. You'll notice that quality questions will get you more quality answers ;-).
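As an illustration of that scripted route, here is a minimal sketch in R rather than Node.js; the endpoint URL, the id query parameter, and the flat response shape are all placeholder assumptions, not a real API:
# Minimal sketch of the scripted alternative: loop over parameters, collect JSON, write one CSV.
library(httr)      # HTTP client for the GET requests
library(jsonlite)  # JSON parser

ids <- 1:1000  # the 1000 different request parameters

rows <- lapply(ids, function(id) {
  resp <- GET("https://api.example.com/items", query = list(id = id))  # placeholder endpoint
  stop_for_status(resp)
  # Parse the JSON body; flatten = TRUE spreads nested fields into columns
  as.data.frame(fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE))
})

# The order the rows end up in does not matter, so a simple rbind is enough
write.csv(do.call(rbind, rows), "responses.csv", row.names = FALSE)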
I think you can create a collection and add as many requests to it as you need. To save the responses to a CSV file, click the Runner button in the top-left corner, choose your collection, supply your data file, and click Run; it will save all of your responses to a CSV file.

trying to load data from url in R

so I want to load all the formatted data from this URL: https://data.mo.gov/Government-Administration/2011-State-Expenditures/nyk8-k9ti
into R so I can filter some of it out. I know how to filter it properly once I have it, but I can't get it "injected" into R properly.
I've seen many ways to pull the data if the URL ends in ".txt" or ".csv", but since this URL doesn't end in a file type, the only way I know to get it is to pull the HTML, and then I get... all the HTML.
There are several options to download the file as a .csv and inject it that way, but if I ever get good enough to do real work, I feel like I should know how to get this directly from the source.
The closest I've gotten is calling an XML-parsing function on the URL, but I get an error that says
XML content does not seem to be XML: 'https://data.mo.gov/Government-Administration/2011-State-Expenditures/nyk8-k9ti'
so that doesn't work either :(.
If anyone could help me out or at least point me in the right direction, I'd appreciate it greatly.
It is quite complicated to scrape the data from the table, but this website provides a convenient .json link which you can access quite easily from R. The link https://data.mo.gov/resource/nyk8-k9ti.json can be found under Export -> SODA API.
library(rjson)
# Pass the URL via the 'file' argument so rjson reads and parses the feed directly
data <- fromJSON(file = 'https://data.mo.gov/resource/nyk8-k9ti.json')
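fromJSON() gives you a list with one element per record; a rough sketch along these lines turns it into a data frame you can filter (the column you filter on depends on the dataset's actual field names):
# Records may not all carry the same fields, so line every record up against the full field set
fields <- unique(unlist(lapply(data, names)))
rows <- lapply(data, function(rec) {
  rec <- setNames(rec[fields], fields)           # align this record to the full field set
  rec[sapply(rec, is.null)] <- NA                # fill any missing fields with NA
  as.data.frame(rec, stringsAsFactors = FALSE)
})
df <- do.call(rbind, rows)
# e.g. subset(df, some_column == "some value")   # column names depend on the dataset
Alternatively, the jsonlite package's fromJSON() will fetch the same URL and return a data frame directly, which skips the reshaping step.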
I believe your question would be more precisely described as "how to scrape data from a website" rather than simply loading data from a URL in R. Web scraping is a different technique altogether. If you know some Python, I recommend taking this free course on how to access data on websites with Python. Or you can try this website to get what you want, though some of the advanced tools are not free. Hope it helps.

"POST" form data to XLS/CSV

So I'm trying to essentially "POST" data from a form in an offline webpage, to an Excel spreadsheet, or CSV, or even just a TXT file. Now I have seen this to be possible using ActiveX in Internet Explorer, however, the methods I saw were pretty particular to the user's code, so I got a bit lost in translation being a beginner. Also some recommended using an offline database using JS, but I'm not sure where to begin with that.
Can anyone offer some insight on this? Is it possible? What would be the best route to take?
There are many ways to accomplish this, and the best solution will be the one that suits your specific requirements. Obviously, creating a text/CSV file is easier than creating an XLS. That said, the basic pseudocode is as follows:
Collect form data
Create (in-memory or temporary) file from collected form data.
Return file as download to client, or just save to some location, OR (best option) insert a row into a database.

Using a JSON file instead of a database - feasible?

Imagine I've created a new javascript framework, and want to showcase some examples that utilise it, and let other people add examples if they want. Crucially I want this to all be on github.
I imagine I would need to provide a template HTML document which includes the framework, and sorts out all the header and footer correctly. People would then add examples into the examples folder.
However, doing it this way, I would just end up with a long list of HTML files. What would I need to do if I wanted to add some sort of metadata about each example, like tags/author/date etc, which I could then provide search functionality on? If it was just me working on this, I think I would probably set up a database. But because it's a collaboration, this is a bit tricky.
Would it work if each HTML file had a corresponding entry in a JSON file listing all the examples, where I could put this metadata? Would I be able to build some basic search functionality on top of it? Would it be a case of: step 1, create the new example file; step 2, add a reference to the file, plus its metadata, to the JSON file?
A good example of something similar to what I want is wbond's package manager http://wbond.net/sublime_packages/community
(There is not going to be a lot of create/update/destroy going on; mainly just reading.)
Check out this JavaScript database: http://www.taffydb.com/
There are other JavaScript databases that let you load JSON data and then run database operations over it; Taffy lets you search for documents.
It sounds like a good idea to me, though: HTML example files plus an associated JSON document holding metadata about them.
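As a sketch, that metadata file might look something like this, with one entry per example; the file name and field names here are only suggestions, not anything your framework would require:
[
  {
    "file": "examples/drag-and-drop.html",
    "title": "Drag and drop demo",
    "author": "jane-doe",
    "date": "2013-04-01",
    "tags": ["events", "ui"]
  },
  {
    "file": "examples/infinite-scroll.html",
    "title": "Infinite scrolling list",
    "author": "john-doe",
    "date": "2013-04-15",
    "tags": ["scrolling", "performance"]
  }
]
Basic search is then just a client-side filter over this array (by tag, author, or date), and a contribution touches exactly two things: the new HTML file and one new entry here.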

How might I write a program to extract my data from Google Code?

I'm about to start writing a program which will attempt to extract data from a Google Code site so that it may be imported in to another project management site. Specifically, I need to extract the full issue detail from the site (description, comments, and so on).
Unfortunately Google doesn't provide an API for this, nor an export feature, so the only option looks to be extracting the data from the actual HTML (yuck). Does anyone have any suggestions on "best practice" for parsing data out of HTML? I'm aware that this is less than ideal, but I don't think I have much choice. Can anyone think of a better way, or has someone already done this?
Also, I'm aware of the CSV export feature on the issues page; however, it does not give complete data about issues (but it could be a useful starting point).
I just finished a program called google-code-export (hosted on GitHub). It allows you to export your Google Code project to an XML file, for example:
>main.py -p synergy-plus -s 1 -c 1
parse: http://code.google.com/p/synergy-plus/issues/detail?id=1
wrote: synergy-plus_google-code-export.xml
... will create a file named synergy-plus_google-code-export.xml.