Input: wget -qO- http://runescape.com/community | grep -i playerCount
Output: <li class="header-top__right-option"><strong id="playerCount">0</strong> Online</li>
In browser: the same element shows a completely different, non-zero number (screenshot).
Using Cygwin, I am trying to use wget to pull a number out of a webpage. As shown in the example above, the playerCount is 0. If you actually load the webpage and look at the same code, it is a completely different number. How can I get the real number? I was told it may be something to do with cookies or a user agent. This just stopped working a few weeks ago.
That value appears to be filled in via JavaScript (though I can't find the request at a quick glance). If that's the case, then you cannot get it with something like wget or curl this way. You would need to find the specific request and send that.
Given the URL indicated by aadarshs (which I saw but mistested when I looked at it the first time), something like this should work.
curl -s 'http://www.runescape.com/player_count.js?varname=iPlayerCount&callback=jQuery000000000000000000000_0000000000000' | awk -F '[()]' '{print $2}'
This worked for me
curl http://runescape.com/community | grep -i playercount
EDIT: Adding the player count link
curl http://www.runescape.com/player_count.js\?varname\=iPlayerCount\&callback\=jQuery111004241600367240608_1434074587842\&_\=1434074587843
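That request returns the count wrapped in a JSONP callback; to pull out just the number, the same awk trick from the answer above can be applied to the exact URL used here (a minimal sketch, nothing new is assumed beyond that URL):
curl -s 'http://www.runescape.com/player_count.js?varname=iPlayerCount&callback=jQuery111004241600367240608_1434074587842&_=1434074587843' | awk -F '[()]' '{print $2}'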
Firstly, I'm after formatting a curl JSON API response in Ubuntu. As you can see in my picture, the JSON is formatted correctly on the website, but in Ubuntu it's just a bunch of word-wrapped code.
I tried adding | jq at the end, but that didn't work, like so:
curl https://www.abuseipdb.com/check/51.38.41.14/json?key=my_key_here&days=7&verbose | jq
(not including my API key; 51.38.41.14 is a spammer IP)
Once this is figured out, I would then want to script it so I can run an alias called IPDB that asks me for the IP and displays the result of the curl API request.
Any guidance would be appreciated.
And again I figured it out for myself; no idea why I joined today, seeing as I answered all my own questions :)
It might be helpful to someone in the future.
Make a text file called abuse.sh, copy the text below into it, then run the script and it will ask for an IP:
#!/bin/bash
echo "Please enter IP to search"
read -p 'IP address: ' IP
curl -s "https://www.abuseipdb.com/check/$IP/json?key=your_key_here&days=7&verbose" | jq
I have taken my API key out, but you can get a free one from their website.
This script checks whether an IP is an abuser (spammer, hacker, etc.). I work for an ISP and wanted to automate checking IPs besides using good ole MXToolbox.
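For the IPDB alias mentioned in the question, a minimal sketch is to wrap the same call in a shell function in ~/.bashrc (the endpoint and the your_key_here placeholder are the ones from the script above; the function name is just an example):
# Add to ~/.bashrc, then run: ipdb 51.38.41.14
ipdb() {
  # Require an IP argument, then fetch and pretty-print the JSON report.
  local ip="${1:?usage: ipdb <ip-address>}"
  curl -s "https://www.abuseipdb.com/check/$ip/json?key=your_key_here&days=7&verbose" | jq
}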
OK, I know this is one odd question: how can I run an HTML URL from the terminal? Let me explain...
I have a shell script that is using an API to update a record in a database. It looks something like this:
http://{account}.cartodb.com/api/v2/sql?q=UPDATE test_table SET column_name = 'my new string value' WHERE cartodb_id = 1 &api_key={Your API key}
How can I run the above from my shell script so that it will have the same effect as when it is run in a browser?
Try this:
wget "http://{account}.cartodb.com/your/api?call= etc." -qO-
If the returned page generates a lot of output, use less:
wget "http://{account}.cartodb.com/your/api?call= etc." -qO- | less
Or if you don't care about the output result:
wget "http://{account}.cartodb.com/your/api?call= etc." -q -O /dev/null
If you are asking about text-based browsers, there are quite a few.
However, running it from a script means you'll want it to be non-interactive and possibly to throw away the output.
e.g. lynx -dump {some_url} 2>/dev/null.
Other command-line browsers include w3m, links, and elinks.
You might also want to use wget or curl for some operations.
I need to find all places in a bunch of HTML files that match the following structure (CSS):
div.a ul.b
or XPath:
//div[@class="a"]//ul[@class="b"]
grep doesn't help me here. Is there a command-line tool that returns all files (and optionally all places therein) that match this criterion? I.e., one that returns file names if the file matches a certain HTML or XML structure.
Try this:
Install http://www.w3.org/Tools/HTML-XML-utils/.
Ubuntu: aptitude install html-xml-utils
MacOS: brew install html-xml-utils
Save a web page (call it filename.html).
Run: hxnormalize -l 240 -x filename.html | hxselect -s '\n' -c "label.black"
Where "label.black" is the CSS selector that uniquely identifies the name of the HTML element. Write a helper script named cssgrep:
#!/bin/bash
# Ignore errors, write the results to standard output.
hxnormalize -l 240 -x "$1" 2>/dev/null | hxselect -s '\n' -c "$2"
You can then run:
cssgrep filename.html "label.black"
This will generate the content for all HTML label elements of the class black.
The -l 240 argument is important to avoid having to parse line breaks in the output. For example, if <label class="black">Text to \nextract</label> is the input, then -l 240 will reformat the HTML to <label class="black">Text to extract</label>, inserting newlines only at column 240, which simplifies parsing. Extending out to 1024 or beyond is also possible.
See also:
https://superuser.com/a/529024/9067 - similar question
https://gist.github.com/Boldewyn/4473790 - wrapper script
I have built a command-line tool with Node.js which does just this. You enter a CSS selector and it will search through all of the HTML files in the directory and tell you which files have matches for that selector.
You will need to install Element Finder, cd into the directory you want to search, and then run:
elfinder -s "div.a ul.b"
For more info please see http://keegan.st/2012/06/03/find-in-files-with-css-selectors/
There are two tools:
pup - Inspired by jq, pup aims to be a fast and flexible way of exploring HTML from the terminal.
htmlq - Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.
Examples:
$ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html
$ pup --color 'title' < robots.html
<title>
Robots exclusion standard - Wikipedia
</title>
$ htmlq --text 'title' < robots.html
Robots exclusion standard - Wikipedia
Per Nat's answer here:
How to parse XML in Bash?
Command-line tools that can be called from shell scripts include:
4xpath - command-line wrapper around Python's 4Suite package
XMLStarlet
xpath - command-line wrapper around Perl's XPath library
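For example, here is a hedged sketch using XMLStarlet on the div.a ul.b structure from the question, assuming the input parses as XML/XHTML (for tag-soup HTML you could first normalize it with hxnormalize -x, as shown in an earlier answer):
# Print every ul.b nested under a div.a; the file must be well-formed
# XML/XHTML for xmlstarlet to parse it.
xmlstarlet sel -t -c '//div[@class="a"]//ul[@class="b"]' -n filename.xhtml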
I'm trying to download an HTML file with curl in bash, like this site:
http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S&subareasel=PHYSICS&idxcrs=0001B+++
When I download it manually, it works fine. However, when I try to run my script through crontab, the output HTML file is very small and just says "Object moved to here." with a broken link. Does this have something to do with the sparse environment that crontab commands run in? I found this question:
php ssl curl : object moved error
but I'm using bash, not PHP. What are the equivalent command-line options or variables to set to fix this problem in bash?
(I want to do this with curl, not wget)
Edit: well, sometimes downloading the file manually (via an interactive shell) works, but sometimes it doesn't (I still get the "Object moved to here." message). So it may not specifically be a problem with cron's environment, but with curl itself.
the cron entry:
* * * * * ~/.class/test.sh >> ~/.class/test_out 2>&1
test.sh:
#! /bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/sbin
cd ~/.class
course="physics 1b"
url="http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S<URL>subareasel=PHYSICS<URL>idxcrs=0001B+++"
curl "$url" -sLo "$course".html --max-redirs 5
Edit: Problem solved. The issue was the stray tags in the URL. It happened because I was generating the scripts with sed s,"<URL>",\""$url"\", template.txt > test.sh, and sed replaced every & in the replacement URL with the matched pattern <URL>, since & is special in a sed replacement. After fixing the URL, curl works fine.
You want the -L or --location option, which follows 3xx redirects. --max-redirs [n] will limit curl to n redirects.
It's curious that this works from an interactive shell. Are you fetching the same URL? You could always try sourcing your environment scripts in your cron entry:
* * * * * . /home/you/.bashrc ; curl -L --max-redirs 5 ...
EDIT: the example URL is somewhat different from the one in the script. $url in the script has an additional pair of <URL> tags. Replacing them with &, the conventional argument separator for GET requests, works for me.
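Since the root cause turned out to be & being special in sed replacements, here is a hedged sketch of how the generation step could escape it (template.txt and the <URL> placeholder are taken from the question's edit; this escaping approach is one of several that work):
# Escape characters that are special in a sed replacement (&, backslash,
# and the , delimiter) so the URL is inserted literally into test.sh.
url='http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=10S&subareasel=PHYSICS&idxcrs=0001B+++'
escaped_url=$(printf '%s' "$url" | sed 's/[&\\,]/\\&/g')
sed "s,<URL>,\"$escaped_url\"," template.txt > test.sh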
Without seeing your script it's hard to guess what exactly is going on, but it's likely that it's an environment problem as you surmise.
One thing that often helps is to specify the full path to executables and files in your script.
If you show your script and crontab entry, we can be of more help.
Using Solaris
I have a monitoring script that uses other scripts as plugins.
These plugins are also scripts that work in different ways, like:
1. Sending an alert on high memory utilization
2. High CPU usage
3. Full disk space
4. Checking the core file dump
Now all of this is displayed on my terminal, and I want to put it into an HTML file/format and send it as the body of the mail, not as an attachment.
Thanks.
You can use an ANSI-to-HTML converter, like so:
top -b -n 1 | /tmp/ansi2html.sh | mail -s "Server load" -a "Content-Type: text/html" myboss@example.com
Works even with colours. See Coloured Git diff to HTML.
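A hedged sketch of wrapping your plugin output the same way (the plugin script names, the /tmp/ansi2html.sh path, and the recipient are assumptions, and -a is assumed to append a mail header as in the command above; check your mail(1) variant on Solaris):
#!/bin/bash
# check_memory.sh, check_cpu.sh and check_disk.sh stand in for the real
# monitoring plugins; their combined (possibly coloured) output is converted
# to HTML and sent as the message body, not as an attachment.
{
  ./check_memory.sh
  ./check_cpu.sh
  ./check_disk.sh
} | /tmp/ansi2html.sh \
  | mail -s "Monitoring report" -a "Content-Type: text/html" admin@example.com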