trying to load data from url in R - html

So I want to load all the formatted data from this URL: https://data.mo.gov/Government-Administration/2011-State-Expenditures/nyk8-k9ti
into R so I can filter some of it out. I know how to filter it properly once I get it, but I can't get it "injected" into R properly.
I've seen many ways to pull the data if the URL ends in ".txt" or ".csv", but this URL doesn't end in a file type, so the only way I know to get it is to pull the HTML, and then I get... all the HTML.
There are several options to download the file as a .csv and inject it that way, but if I ever get good enough to do real work, I feel like I should know how to get this directly from the source.
The closest I've gotten is using an XML-parsing function on the URL, but I get an error that says
XML content does not seem to be XML: 'https://data.mo.gov/Government-Administration/2011-State-Expenditures/nyk8-k9ti'
so that doesn't work either :(.
If anyone could help me out or at least point me in the right direction, I'd appreciate it greatly.

It is quite complicated to scrape the data from the table, but this website provides a convenient JSON endpoint which you can access quite easily from R. The link https://data.mo.gov/resource/nyk8-k9ti.json can be found under Export -> SODA API.
library(rjson)
data <- fromJSON(file = 'https://data.mo.gov/resource/nyk8-k9ti.json')
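If you want the result as a data frame so you can filter it with the usual tools, here is a minimal sketch of one way to do that with the jsonlite package instead of rjson (the package choice is mine, and the commented-out filter is only a placeholder for whatever columns the endpoint actually returns):
library(jsonlite)
# jsonlite's fromJSON() accepts a URL and simplifies an array of records into a data frame
expenditures <- fromJSON('https://data.mo.gov/resource/nyk8-k9ti.json')
str(expenditures)  # inspect the columns the endpoint returns
# subset(expenditures, some_column == "some value")  # then filter on the columns you need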

I believe your question could be more precisely defined as "How to scrape data from a website" rather than simply loading data from a URL in R. Web scraping is a different technique altogether. If you know some Python, I recommend taking this free course, which teaches you how to access data on websites via Python. Or you can try this website to get what you want, though some of the advanced tools are not free. Hope it helps.

Related

Postman: How to make multiple GET requests at the same time and get the result into a .csv file

I want to make multiple GET requests and save the JSON responses to a .csv file.
I want to make 1000 requests with different request parameters
PS : the order of the requests is not important
Is it possible to do such with Postman?
If yes, can anyone explain to me how can this be achieved?
Welcome to SO.
Looks like this isn't a native feature included in Postman.
I did find a workaround for this, but I personally think there are better solutions, like using Node.js and an HTTP client. That will give you the flexibility to do whatever you want.
Since you are new here, I'd like to point your attention to this nice SO article about how to ask questions. In this case, your question doesn't show you did much research by yourself. You'll notice that quality questions will get you more quality answers ;-).
I think you can create a collection and run the requests in it through the Collection Runner. To save the responses to a CSV file, click the Runner button in the top-left corner, choose your collection, provide your data file, and click the Run button; it will save all your responses to a CSV file.

GetOrgChart JSON format

A little startup I am doing work for is searching for a JavaScript Org Chart, and we believe we'd like to use "GetOrgChart" from getorgchart.com.
We already have a working back-end that provides JSON data to the front-end via RESTful services.
We know that GetOrgChart can be loaded with data from various sources, and in this case we'd like to know what format the JSON has to be in.
Are there any examples out there of what the JSON should look like?
We'd definitely like to download and register this product, but that is one of the questions we'd like to get answered.
Thanks!
On their demos page, you can click the 'Get HTML Code' link (upper right, below the site header), which opens the JavaScript used to render the demo, including the format of the data.

"POST" form data to XLS/CSV

So I'm trying to essentially "POST" data from a form in an offline webpage to an Excel spreadsheet, or CSV, or even just a TXT file. I have seen this done using ActiveX in Internet Explorer; however, the methods I saw were pretty particular to the user's code, so, being a beginner, I got a bit lost in translation. Some also recommended using an offline database with JS, but I'm not sure where to begin with that.
Can anyone offer some insight on this? Is it possible? What would be the best route to take?
There are many ways to accomplish this. The best solution will be the one that suits your specific requirements. Obviously, creating a text/CSV file is easier than creating an XLS. That said, the basic pseudo-code is as follows:
Collect form data
Create (in-memory or temporary) file from collected form data.
Return file as download to client, or just save to some location, OR (best option) insert a row into a database.

How to import PBP Data from NFL.com into R

I am trying to import data from past NFL games in the form of Play-by-play tables and am mostly working in R to collect the data and create a data set.
An example of the data I am after is on this page: http://www.nfl.com/gamecenter/2012020500/2011/POST22/giants#patriots#menu=gameinfo&tab=analyze&analyze=playbyplay
I know that NFL.com uses JSON and much of the necessary data are in JSON files attached to the site. My efforts at extracting data from these files using the JSON package in R have been pretty feeble. Any advice y'all have is appreciated.
Would I just be better off using PHP to farm the data?
I don't know if you have already succeeded in loading the JSON files into R, but here is an example of that:
library(rjson)
json <- fromJSON(file = 'http://www.nfl.com/liveupdate/game-center/2012020500/2012020500_gtd.json')
json$`2012020500`$home$stats
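Building on that, here is a rough sketch of pulling every play description out of the same file (the key names drives, plays, and desc are assumptions about this particular JSON's layout, so verify them by inspecting the parsed list first):
library(rjson)
gid  <- '2012020500'
json <- fromJSON(file = sprintf('http://www.nfl.com/liveupdate/game-center/%s/%s_gtd.json', gid, gid))
# each element of drives should be a drive containing a list of plays;
# non-drive bookkeeping entries (e.g. a current-drive counter) are skipped
plays <- unlist(lapply(json[[gid]]$drives, function(drive) {
  if (is.list(drive) && !is.null(drive$plays)) {
    sapply(drive$plays, function(p) p$desc)
  }
}))
head(plays)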
If you are having trouble finding the URL of the JSON file, use Firebug (an extension for Firefox) and you can watch the webpage requesting the JSON file.
The JSON file is, of course, huge and complicated, but it is complicated data. Whatever you are looking for should be in there. If you are just looking for a straight dump of the play-by-play text, then you can use this URL:
http://www.nfl.com/widget/gc/2011/tabs/cat-post-playbyplay?gameId=2012020500
I extracted all the data for one team for one season more-or-less manually. If you want data for a lot of games consider emailing the league and asking for the files you mentioned. They publish the data, so maybe they will give you the files. The NFL spokesman is Greg Aiello. I suspect you could find his email address with Google.
Sorry this is not a suggested programming solution. If this answer is not appropriate for the forum please delete it. It is my first posted answer.

How might I write a program to extract my data from Google Code?

I'm about to start writing a program which will attempt to extract data from a Google Code site so that it may be imported in to another project management site. Specifically, I need to extract the full issue detail from the site (description, comments, and so on).
Unfortunately Google doesn't provide an API for this, nor do they have an export feature, so to me the only option looks to be extracting the data from the actual HTML (yuck). Does anyone have any suggestions on "best practice" for attempting to parse data out of HTML? I'm aware that this is less than ideal, but I don't think I have much choice. Can anyone else think of a better way, or maybe someone else has already done this?
Also, I'm aware of the CSV export feature on the issue page, however this does not give complete data about issues (but could be a useful starting point).
I just finished a program called google-code-export (hosted on GitHub). It allows you to export your Google Code project to an XML file, for example:
>main.py -p synergy-plus -s 1 -c 1
parse: http://code.google.com/p/synergy-plus/issues/detail?id=1
wrote: synergy-plus_google-code-export.xml
... will create a file named synergy-plus_google-code-export.xml.
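If the export tool doesn't cover everything and you do end up parsing the HTML yourself, here is a minimal R sketch using the XML package as a starting point (the project name is hypothetical, and the assumption that the issue list renders as an HTML table is something you would need to verify against the actual page):
library(XML)
# hypothetical project; substitute your own Google Code project name
url    <- 'http://code.google.com/p/your-project/issues/list'
tables <- readHTMLTable(url, stringsAsFactors = FALSE)
issues <- tables[[1]]  # pick whichever table actually holds the issue rows
head(issues)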