I've never worked with web pages before and I'd like to know how best to automate the following through programming/scripting:
1. go to http://financials.morningstar.com/ratios/r.html?t=GMCR&region=USA&culture=en_US
2. invoke the 'Export to CSV' button near the top right
3. save this file into a local directory
4. parse the file
Step 4 doesn't need to use the same language as steps 1-3, but ideally I would like to do everything in one shot using a single language.
I noticed that if I hover my mouse over the button it says javascript:exportKeyStat2CSV(). Is this a Java function I could call somehow?
Any suggestions are appreciated.
It's a JavaScript function, which is not Java!
At first glance it may seem like you need to execute JavaScript to get this done, but if you look at the source of the document, you can see the function is simply implemented like this:
function exportKeyStat2CSV() {
    var orderby = SRT_keyStuts.getOrderFromCookie("order");
    var urlstr = "//financials.morningstar.com/ajax/exportKR2CSV.html?&callback=?&t=XNAS:GMCR&region=usa&culture=en-US&cur=&order=" + orderby;
    document.location = urlstr;
}
So, it builds a URL which is completely fixed, except for the order-by part, which is taken from a cookie. Then it simply navigates to that URL by setting document.location. A small test shows you even get a CSV file if you leave the order-by part empty, so you can probably just download the CSV from the base URL that is in the code.
Downloading can be done with various tools, for instance Wget for Windows; see Super User for more possibilities. Either way, 'steps 1 to 3' boil down to a single command that fetches that URL.
After that, you just need to parse the file. Parsing CSV files can be done in a batch script, and there are several examples available. I won't get into details, since you didn't provide any in your question.
PS. I'd check their terms of use before you actually implement this.
The button directs me to this link:
http://financials.morningstar.com/ajax/exportKR2CSV.html?&callback=?&t=XNAS:GMCR&region=usa&culture=en-US&cur=&order=asc
You could use the Python 3 module urllib to fetch and save the file, then parse it using one of the many CSV-parsing modules, or by writing your own.
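For instance, here is a minimal sketch of all four steps in one standard-library script; the output file name is a placeholder, and it assumes the endpoint above still responds as described:

import csv
import urllib.request

# URL taken from the exportKeyStat2CSV() function quoted above,
# with the order-by part left empty (the site appears to accept that).
url = ("http://financials.morningstar.com/ajax/exportKR2CSV.html"
       "?&callback=?&t=XNAS:GMCR&region=usa&culture=en-US&cur=&order=")

# Steps 1-3: fetch the CSV and save it locally (placeholder file name).
urllib.request.urlretrieve(url, "gmcr_key_ratios.csv")

# Step 4: parse it with the built-in csv module.
with open("gmcr_key_ratios.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)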
How can I edit a large JSON manually?
I have a large JSON file, about 100 MB. I'd like to manually inspect some attributes, and then add more attributes to some of the objects.
I'd start off by looking at a subset of the file, say the first 100 objects, and gradually scale up to maybe 250, then a thousand, and so on.
Can someone suggest a language or software (I'm running Windows) that excels at this task?
Some previous suggestions that aren't working or can't work:
Sublime - Could never load the file. Loading bar forever. Had to kill.
NotePad++ - Could never load. Froze. Had to kill.
Anything online - The data is confidential.
More Python and Jupyter information.
import json

with open(path, 'r') as f:  # path points at the 100 MB JSON file
    data = json.load(f)

for i, (k, v) in enumerate(data.items()):
    print(i, k, v)
    if i == 2:
        break
Running this causes an error. I think it has to do with Jupyter, but I'm not sure:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
That makes me wonder if going about it this way is just dumb.
Possible Solutions
Build a custom app using Tkinter
Just don't use a Jupyter Notebook
What you can do is write a simple GUI program: use Tkinter to create a window with a text area inside it to show the JSON, a text box where you input how many objects you want to see, a button named Next (or similar) to see the next batch, and one more button to save. The functionality for each item would be as follows (a rough sketch comes after this list).
First, you read the complete JSON in Python and make it a dict.
Next Button - This keeps iterating based on the value in the text box. You could write a custom generator that yields the required number of values.
Save Button - This saves the current JSON into a new file, or, if you can, you could write a function to update the current JSON directly.
Text Area - This takes the dictionary entries produced by the Next button's generator, converts them back to JSON, and shows the output.
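A minimal sketch of that idea, assuming the top level of the file is a single JSON object; the input path and the output name edited.json are placeholders:

import json
import tkinter as tk

PATH = "data.json"  # placeholder: path to the large JSON file

with open(PATH, "r") as f:
    data = json.load(f)  # assumes the top level is a JSON object

items = iter(data.items())  # walk the dict entries lazily

root = tk.Tk()
root.title("JSON browser")

text = tk.Text(root, width=100, height=30)
text.pack()

count_box = tk.Entry(root)  # how many objects to show per click
count_box.insert(0, "100")
count_box.pack()

def show_next():
    text.delete("1.0", tk.END)
    for _ in range(int(count_box.get())):
        try:
            k, v = next(items)
        except StopIteration:
            break  # end of the data reached
        text.insert(tk.END, json.dumps({k: v}, indent=2) + "\n")

def save():
    with open("edited.json", "w") as f:  # placeholder output name
        json.dump(data, f, indent=2)

tk.Button(root, text="Next", command=show_next).pack()
tk.Button(root, text="Save", command=save).pack()

root.mainloop()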
If you are using Linux (or have an opportunity to transfer the file to *nix) you might wish to check the number of lines in the file via
wc -l myfile.json
Let's say, for the purpose of simplicity, that your file has 2,530,000 lines and you wish to split it into chunks of 100k lines each. You can use the standard split command (e.g. split -l 100000 myfile.json) to break the file into the desired chunks and then edit them one by one; a Python equivalent is sketched below.
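If you would rather stay in Python, a quick equivalent of split -l 100000 might look like this; the chunk file names are placeholders:

CHUNK = 100_000  # lines per output file

with open("myfile.json") as src:
    out, n = None, 0
    for i, line in enumerate(src):
        if i % CHUNK == 0:  # start a new chunk every CHUNK lines
            if out:
                out.close()
            out = open(f"chunk_{n:03d}.json", "w")  # placeholder names
            n += 1
        out.write(line)
    if out:
        out.close()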
If you are comfortable with going the "Linux way", check out some of the hints given in other topics, e.g.
edit multi-GB file when vi editor doesn't work
I hope it helps!
The only viewer I have used that works on large files (I had files up to 250 MB) is Dadroit. It is fast to view and comes with search.
Now, to edit, I use vi: I search for the location and make local edits. Vim or another simple editor should also work on Windows. Have you tried VS Code? 100 MB shouldn't be too large for it.
The other awesome terminal tool for viewing and editing data is VisiData. I have had mixed luck with it on JSON files.
Not the best answer, but the problem with reading the JSON seems limited to Jupyter Notebooks (or even the limitations of my laptop).
Working in Spyder or running from the command line circumvents the Jupyter error mentioned in the original question.
It'd be great if someone knew how to tweak Jupyter to avoid this problem (sorry, I'm not sure how yet).
For an editor, try Notepad++.
For a language, try Python.
Since you haven't given your data structure, I can't give a more specific answer.
I have a question that will help me understand how things work and investigate the feasibility of a bigger plan I have in mind.
Simply put - let's assume that everything runs locally - I am wondering if it is possible to:
create an HTML page with a form that will prompt the user to enter
the local path of an input file inputFile.dat
this input file will be fed to a C++ exe program that expects it as input
the C++ exe will run (it depends on libraries etc., but let's assume everything is local here)
... and will output the result on screen
It sounds simple, but is it?
Many thanks folks!
Yes, this is definitely possible, if you want to use PHP or some other scripting language.
Create a form on your web page
Add the directory of the program
Add the directory of the data
When submitted, use PHP's exec function (or the equivalent in another language) to execute the program, with the supplied data as an argument
The exec function returns the output from the program.
Display the output as you wish on your page; a rough sketch of this flow follows.
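The answer describes PHP, but since the rest of this page leans on Python, here is a rough sketch of the same flow in Python, where subprocess.run plays the role of PHP's exec; the form field, the executable ./myprogram, and the port are all placeholders:

import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

FORM = b"""<form method="post">
Input file: <input name="inputFile">
<input type="submit" value="Run">
</form>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the form that asks for the local path of inputFile.dat.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(FORM)

    def do_POST(self):
        # Read the submitted path and feed it to the C++ executable.
        length = int(self.headers["Content-Length"])
        fields = parse_qs(self.rfile.read(length).decode())
        input_path = fields["inputFile"][0]
        result = subprocess.run(["./myprogram", input_path],  # placeholder exe
                                capture_output=True, text=True)
        # Show the program's output on screen.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(result.stdout.encode())

HTTPServer(("localhost", 8000), Handler).serve_forever()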
Good luck!
I am pretty new to HTML, and as part of a project we have to create a game where the player has to select different tags and then gets rewarded according to a value which is stored in a .csv file.
I created the layout so far with different buttons; now I want to know how I can search inside the CSV file and return the corresponding value. I am using HTML and JavaScript so far.
There are about 6000 entries in this file. Is it wise to load them all into an array?
And how can I share variables between functions without calling them one after the other?
For example, to find out how often a button was pressed, I obviously could not use a var in a script, since it would be lost after the script was executed, so I had to create an HTML input text field to store the variable persistently.
Your help is appreciated :)
Regards,
Marcurion
It would be wise to use a DBMS; however, if the project requirement is to use CSV, you could use a server-side language like PHP to read and write the CSV file.
I wouldn't recommend loading all entries into an array; instead, you could load only what you need, though that is most easily achieved with a DBMS.
To share variables between functions in JavaScript, you will need to declare them as global variables; you could google "javascript variable scope" or "javascript global variables" and read a little more about scopes.
If you want to know how often a button was pressed, you can make AJAX requests to a server-side script that handles whatever needs to be submitted or executed.
You could check out the jQuery library for events, AJAX, and countless other things.
My advice is to read a little more about JavaScript, a library like jQuery, and a bit about server-side languages like PHP; with these tools you can easily develop what you need. A sketch of the server-side lookup idea follows.
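For reference, the server-side lookup could be sketched like this; it is written in Python rather than PHP to match the other examples on this page, and the file name, column layout, and tag are assumptions:

import csv

CSV_PATH = "rewards.csv"  # placeholder; assumed layout: tag,value

def lookup(tag):
    # Stream the file row by row instead of holding all ~6000 entries in memory.
    with open(CSV_PATH, newline="") as f:
        for row in csv.reader(f):
            if row and row[0] == tag:
                return row[1]
    return None

print(lookup("button_7"))  # hypothetical tag name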
I'm trying to find a way to automatically download all links from a web page, but I also want to rename them. For example:
<a href = fileName.txt> Name I want to have </a>
I want to be able to get a file named 'Name I want to have' (I'm not worried about the extension).
I am aware that I could get the page source, then parse all the links, and download them all manually, but I'm wondering if there are any built-in tools for that.
lynx --dump <page-url> | grep http:// | cut -d ' ' -f 4
will print all the links, which can then be batch-fetched with wget - but is there a way to rename them on the fly?
I doubt anything does this out of the box. I suggest you write a script in Python or similar to download the page and load the source (try the Beautiful Soup library for tolerant parsing). Then it's a simple matter of traversing the source to capture the links with their attributes and text, and downloading the files with the names you want. With the exception of Beautiful Soup (which you only need for parsing sloppy HTML), everything you need is built into Python.
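A minimal sketch of that approach, assuming Python 3 with the third-party beautifulsoup4 package installed; the page URL is a placeholder, and the link text is used as the file name without any sanitizing:

import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

page_url = "http://example.com/downloads.html"  # placeholder
with urllib.request.urlopen(page_url) as resp:
    html = resp.read()

soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a", href=True):
    name = a.get_text(strip=True)  # the link text becomes the file name
    if not name:
        continue
    file_url = urljoin(page_url, a["href"])  # resolve relative hrefs
    urllib.request.urlretrieve(file_url, name)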
I solved the problem by first converting the web page entirely to Unicode (using Notepad++'s built-in conversion).
Then I wrote a small shell script that used cat, awk and wget to fetch all the data.
Unfortunately, I couldn't fully automate the process, since I didn't find any tool for Linux that would convert an entire page from KOI8-R to Unicode.
I'm about to start writing a program which will attempt to extract data from a Google Code site so that it may be imported in to another project management site. Specifically, I need to extract the full issue detail from the site (description, comments, and so on).
Unfortunately, Google doesn't provide an API for this, nor an export feature, so the only option looks to be extracting the data from the actual HTML (yuck). Does anyone have suggestions on "best practice" for parsing data out of HTML? I'm aware that this is less than ideal, but I don't think I have much choice. Can anyone think of a better way, or has someone already done this?
Also, I'm aware of the CSV export feature on the issues page; however, this does not give complete data about issues (but it could be a useful starting point).
I just finished a program called google-code-export (hosted on GitHub). It allows you to export your Google Code project to an XML file, for example:
>main.py -p synergy-plus -s 1 -c 1
parse: http://code.google.com/p/synergy-plus/issues/detail?id=1
wrote: synergy-plus_google-code-export.xml
... will create a file named synergy-plus_google-code-export.xml.