How might I read (parse) html directly from a file using Watir?

I can already do this with Nokogiri, of course:
doc = Nokogiri::HTML(src)
where src is a text column in my database.
But I much prefer Watir's search interface to Nokogiri's.
My searches so far have turned up very little on how to do this for unhosted HTML, i.e. HTML that is not served from a URL.

You can access local html files by adding a "file://" to the start of the path to the file (see my blog post on the topic).
For example, let's say you have an HTML file on your computer at "C:\users\testuser\desktop\test_file.html".
If you want to open this file and interact with it using Watir, you can do:
browser = Watir::Browser.new
browser.goto('file:///C:/users/testuser/desktop/test_file.html')
Then you can interact with the browser/page/html as you normally would with Watir.
Note: If you get a NoMethodError: unknown property or method: 'document' exception when trying to interact with the browser, make sure that your browser is being opened by a user with administrative privileges.
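If the HTML lives in a string (such as the database column in the question) rather than in a file, one option is to write it to a temporary file first and hand Watir the resulting file:// URL. A minimal sketch, assuming a hypothetical helper named html_to_file_url; only the Ruby standard library is used, and the Watir calls are shown as comments:

```ruby
require 'tempfile'

# Hypothetical helper (not part of Watir): writes an HTML string to a
# temporary .html file and returns the Tempfile plus a file:// URL for it.
def html_to_file_url(src)
  file = Tempfile.new(['page', '.html'])
  file.write(src)
  file.flush # make sure the content is on disk before the browser reads it

  path = File.expand_path(file.path)             # absolute path with forward slashes
  path = "/#{path}" unless path.start_with?('/') # Windows: C:/... becomes /C:/...
  [file, "file://#{path.gsub(' ', '%20')}"]
end

file, url = html_to_file_url('<html><body><p id="greeting">Hello</p></body></html>')
puts url

# With Watir installed, the page could then be opened like any other:
#   browser = Watir::Browser.new
#   browser.goto(url)
#   browser.p(id: 'greeting').text
```

Keep a reference to the returned Tempfile for as long as the browser needs the page, since Ruby deletes the file once the object is garbage collected.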

If the above does not work for you, you can try navigating with the driver directly like so:
browser = Watir::Browser.new
browser.driver.navigate.to('file:///Users/path/to/file.html')
PS: I am on a Mac, but this should work irrespective of your OS.

Related

download html attribute does not rename the file using external URL

I am trying to rename a file when downloading it from an <a> tag.
Here is a simple example:
<a href="https://i.stack.imgur.com/440u9.png" download="stackoverflow.png">Download Stackoverflow Logo</a>
As you can see, it never downloads the file with the name stackoverflow.png; it uses the default name instead.
However, if I download the image and do the same with a local route, it renames the file properly.
Another example:
Download Stackoverflow Logo
The example above works properly.
Why does the download HTML attribute only work with local routes?
Thanks in advance!

The download attribute works only for same-origin URLs.
By the way, you really should learn to use proper terminology, or else people won't understand you:
<a href="https://i.stack.imgur.com/440u9.png" download="stackoverflow.png"> is a tag, specifically, an opening tag;
download is an attribute;
stackoverflow.png is the value of the attribute;
https://i.stack.imgur.com/440u9.png is a URL, sometimes called a URI or an address.
The entire construction <a href="https://i.stack.imgur.com/440u9.png" download="stackoverflow.png">Download Stackoverflow Logo</a> is an element.
A "route" is something else entirely, and has no relationship with HTML.
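To make "same origin" concrete: two URLs share an origin when their scheme, host, and port all match. A small sketch in Ruby using the standard uri library; the page URL is a hypothetical stand-in for wherever the link is displayed, and the helper name is my own:

```ruby
require 'uri'

# Two URLs share an origin when scheme, host, and port all match.
def same_origin?(url_a, url_b)
  a = URI.parse(url_a)
  b = URI.parse(url_b)
  [a.scheme, a.host, a.port] == [b.scheme, b.host, b.port]
end

page  = 'https://stackoverflow.com/questions/ask' # hypothetical page URL
image = 'https://i.stack.imgur.com/440u9.png'     # URL from the question

puts same_origin?(page, image)                                # prints false
puts same_origin?(page, 'https://stackoverflow.com/logo.png') # prints true
```

The image in the question lives on a different host than the page, so the browser treats the link as cross-origin and ignores the suggested file name.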

I couldn't find any information on it, but it seems that renaming isn't allowed for external resources.
Have a look here: there's an example linking to a Google image, and that doesn't work either. It seems the specs have changed along the way.

This is a security measure applied to cross-origin download requests where the server hosting the download does not use HTTP headers to explicitly mark the file as being for download.
From the HTML specification:
If the algorithm reaches this step, then a download was begun from a
different origin than the resource being downloaded, and the origin
did not mark the file as suitable for downloading, and the download
was not initiated by the user. This could be because a download
attribute was used to trigger the download, or because the resource in
question is not of a type that the user agent supports.
This could be dangerous, because, for instance, a hostile server could
be trying to get a user to unknowingly download private information
and then re-upload it to the hostile server, by tricking the user into
thinking the data is from the hostile server.
Thus, it is in the user's interests that the user be somehow notified
that the resource in question comes from quite a different source, and
to prevent confusion, any suggested file name from the potentially
hostile interface origin should be ignored.
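On a server you control, marking the file as a download means sending a Content-Disposition: attachment header with the response. A minimal sketch in the shape of a Rack application (a lambda that returns status, headers, and body); the app name and the file bytes are placeholders, and no Rack gem is needed to call the lambda directly:

```ruby
# Placeholder bytes standing in for the real image data.
png_bytes = "\x89PNG\r\n".b

# Rack-style app: takes the request environment, returns [status, headers, body].
download_app = lambda do |_env|
  headers = {
    'content-type'        => 'image/png',
    # Marks the response as a download and suggests a file name.
    'content-disposition' => 'attachment; filename="stackoverflow.png"'
  }
  [200, headers, [png_bytes]]
end

status, headers, _body = download_app.call({})
puts status
puts headers['content-disposition']
```

This only helps when the file's host is yours to configure; for a third-party host such as the one in the question, the header is out of your hands.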

Yii2 html link to open document on external filesystem

In Yii2 I want a simple
<a href="C:/Vo/AGO/2015.pdf">2015</a>
on one of my forms.
I don't want to upload the file, because the PDF (help) file is updated by an external organisation (the real path is on a server rather than C:, but for testing I use C:), and I have to display a lot of files managed by that organisation.
So I use:
Html::a("2015", "C:/Vo/Ago/2015.pdf")
When I run the application and inspect the page via view source, I see
<a href="C:/Vo/Ago/2015.pdf">2015</a>
But if I click the link on my form, nothing happens!
(When I do the same thing in a simple HTML document, not Yii2, the PDF opens.)
If I right-click and copy the link, I get:
file:///C:/Vo/Ago/2015.pdf
So, what am I missing?
Yes, I am new to Yii2, and I searched a lot on the internet to find a solution.
If this has already been asked, excuse me; a reference to the solution would then be welcome...
Thanks,
Chris G.M. Logghe

This happens because you are trying to link to a "local" file from the browser.
Some browsers, like modern versions of Chrome, will even refuse to
cross from the http protocol to the file protocol, so you'd better
make sure you open this locally using the file protocol if you want to
do this stuff at all.
See here for more details.
The best option for you is to create an action on the controller and serve the file download from there.
In your view:
$data = 'C:/data/mydata.log';
echo Html::a('Download', ['sample-download', 'filename' => $data], ['target' => '_blank']);
In your controller:
public function actionSampleDownload($filename)
{
    ob_clean();
    \Yii::$app->response->sendFile($filename)->send();
}
Of course, you must restrict this to a specific directory rather than giving the user access to any file name they supply.
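The directory restriction can be sketched as follows. This is written in Ruby to show the idea rather than as Yii2 code; the same check carries over to PHP with realpath(). The helper name and the base directory are hypothetical:

```ruby
# Resolves a user-supplied file name against base_dir and returns the
# absolute path, or nil if the result would fall outside base_dir.
def safe_download_path(base_dir, filename)
  base = File.expand_path(base_dir)
  full = File.expand_path(filename, base) # collapses "..", keeps absolute inputs absolute
  full.start_with?(base + File::SEPARATOR) ? full : nil
end

base = '/srv/docs' # hypothetical directory holding the PDFs

p safe_download_path(base, '2015.pdf')         # inside the base directory
p safe_download_path(base, '../../etc/passwd') # escapes it, so nil
p safe_download_path(base, '/etc/passwd')      # absolute path elsewhere, so nil
```

File.expand_path does not resolve symbolic links, so a stricter version would also compare File.realpath results for files that exist.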

Get .html filename of a website with Firebug

How do I find the filename of a website I am inspecting with Firebug? For example, when I look at http://example.org/ I can inspect the elements and see the whole HTML structure, but I can't find the filename. I am looking for index.html or something like that. Maybe this is a similar question, but I am not sure, because he/she is working with PHP. LINK
I know there are some solutions with Dreamweaver or other tools, but I am looking for an easy way to figure this out with Firebug or a free browser add-on. I hope you have a solution for that.

The URL you entered is the one that usually returns the main HTML contents. Though on most pages nowadays the HTML is altered using JavaScript. Also, pages are very often dynamically generated on the server.
So, in most cases there is no static .html file.
For what it's worth, you can see all network requests and their responses within Firebug's Net panel.
Note that the URL path doesn't necessarily reflect a file path on the server's file system. Where a specific URL maps to in the file system depends on the server configuration. The simplest example is the index file that is served automatically when a domain is accessed. In the case of http://example.org, for example, the server automatically loads a file named index.html from the file system.
So, in order to get the file name on the file system, you need to either check the server configuration or the related access logs.
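As an illustration of the simplest case, here is a sketch of how a static file server might map a URL to a file. The document root and index file name are hypothetical defaults of my own; a real server takes them from its configuration and may apply rewrites that this sketch ignores:

```ruby
require 'uri'

# Maps a URL to a file path the way a simple static server might.
# doc_root and index_file are assumed values, not read from any real server.
def file_for_url(url, doc_root: '/var/www/html', index_file: 'index.html')
  path = URI.parse(url).path
  path = '/' if path.nil? || path.empty?
  path += index_file if path.end_with?('/') # directory request: serve the index file
  File.join(doc_root, path)
end

puts file_for_url('http://example.org/')
puts file_for_url('http://example.org/about/team.html')
```

None of this is visible from the browser side, which is why Firebug can show the request URL but not the file behind it.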

Automatically copy text from a web page

There is a vpn that keeps changing their password. I have an autologin, but obviously the vpn connection drops every time that they change the password, and I have to manually copy and paste the new password into the credentials file.
http://www.vpnbook.com/freevpn
This is annoying. I realise that the vpn probably wants people not to be able to do this, but it's not against the ToS and not illegal, so work with me here!
I need a way to automatically generate a file which has nothing in it except
username
password
on separate lines, just like the one above. Downloading the entire page as a text file automatically (I can do that) will therefore not work. OpenVPN will not understand the credentials file unless it is purely and simply
username
password
and nothing more.
So, any ideas?

This kind of thing is done ideally via an API that vpnbook provides. Then a script can much more easily access the information and store it in a text file.
Barring that (and it looks like vpnbook doesn't have an API), you'll have to use a technique called web scraping.
To automate this via "Web Scraping", you'll need to write a script that does the following:
First, login to vpnbook.com with your credentials
Then navigate to the page that has the credentials
Then traverse the structure of the page (called the DOM) to find the info you want
Finally, save out this info to a text file.
I typically do web scraping with Ruby and the mechanize library. The first example on the Mechanize examples page shows how to visit the Google homepage, perform a search for "Hello World", and then grab the links in the results one at a time, printing each one out. This is similar to what you are trying to do, except that instead of printing the result you would want to write it to a text file. (Google for writing a text file with Ruby.)
require 'rubygems'
require 'mechanize'
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}
a.get('http://google.com/') do |page|
  search_result = page.form_with(:id => 'gbqf') do |search|
    search.q = 'Hello world'
  end.submit
  search_result.links.each do |link|
    puts link.text
  end
end
To run this on your computer you would need to:
a. Install ruby
b. Save this in a file called scrape.rb
c. call it by using the command line "ruby scrape.rb"
OSX comes with an older ruby that would work for this. Check out the ruby site for instructions on how to install it or get it working for your OS.
Before using a gem like mechanize you need to install it:
gem install mechanize
(this depends on Rubygems being installed, which I think typically comes with Ruby).
If you're new to programming this might sound like a big project, but you'll have an amazing tool in your toolbox for the future, where you'll feel like you can pretty much "do anything" you need to, and not rely on other developers to have happened to have built the software you need.
Note: for sites that rely on JavaScript, mechanize won't work. You can use Capybara+PhantomJS to run an actual browser that can run JavaScript from Ruby.
Note 2: It's possible that you don't actually have to go through the motions of (1) going to the login page, (2) filling in your info, (3) clicking on "Login", etc. Depending on how their authentication works, you may be able to go directly to the page that displays the info you need and provide your credentials directly to that page using either basic auth or other means. You'll have to look at how their auth system works and do some trial and error for this. The most straightforward approach, and the one most likely to work, is to do what a real user would do: log in through the login page.
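If the site did accept basic auth, attaching the credentials to a request could look roughly like the sketch below, which uses Ruby's standard net/http library. The URL, user name, and password are made up for illustration, so the request is built but not sent:

```ruby
require 'net/http'
require 'uri'

# Hypothetical URL and credentials, for illustration only.
uri = URI('https://example.org/members/credentials')

request = Net::HTTP::Get.new(uri)
request.basic_auth('my_user', 'my_password') # adds the Authorization header

puts request['authorization']

# Sending it would look like this (left commented out: the URL is made up):
#   response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
#     http.request(request)
#   end
#   puts response.body
```

Whether this works at all depends on the site's auth system, as noted above; many sites only accept a form login with cookies.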
Update
After writing all this, I came across the vpnbook-utils library (during a search for "vpnbook api") which I think does what you need:
...With this little tool you can generate OpenVPN config files for the free VPN provider vpnbook.com...
...it also extracts the ever changing credentials from the vpnbook.com website...
looks like with one command line:
vpnbook config
you can automatically grab the credentials and write them into a config file.
Good luck! I still recommend you learn ruby :)

You don't even need to parse the content. Just string search for the second occurrence of Username:, cut everything before that, use sed to find the content between the next two occurrences of <strong> and </strong>. You can use curl or wget -qO- to get the website's content.
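The same string-search idea can be sketched in Ruby instead of sed, including the step of writing the two-line credentials file. The HTML excerpt below is invented for the example; the real page's markup may differ and can change, and the output file name is a placeholder:

```ruby
require 'tmpdir'

# Hypothetical excerpt; not copied from the real page.
html = <<~HTML
  <li>Username: <strong>old_user</strong></li>
  <li>Password: <strong>old_pass</strong></li>
  <li>Username: <strong>vpnbook</strong></li>
  <li>Password: <strong>s3cret</strong></li>
HTML

# Finds the second "Username:" and returns the contents of the next two
# <strong> elements, or nil if the expected pattern is not present.
def extract_credentials(html)
  first  = html.index('Username:')
  second = first && html.index('Username:', first + 1)
  return nil unless second

  values = html[second..].scan(%r{<strong>(.*?)</strong>}m).flatten.first(2)
  values.size == 2 ? values.map(&:strip) : nil
end

username, password = extract_credentials(html)

# Write exactly two lines: user name, then password.
path = File.join(Dir.tmpdir, 'credentials.txt')
File.write(path, "#{username}\n#{password}\n")
puts File.read(path)

# The live page could be fetched with Net::HTTP.get(URI(...)) from net/http
# in place of the hard-coded excerpt.
```

Because this depends on the exact wording and markup of the page, it will break silently if the site's layout changes, so checking for a nil result before writing the file is worthwhile.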

How to open html link to local file in its default program, NOT browser?

Basically, I'm creating a webpage filled with images of movie posters that link to video files, as a means of making a more visually-appealing form of my local video library.
I'm using
<a href="C:\blah\movie.mkv"><img src="poster.jpg"></a>
It works exactly how I want, HOWEVER, it opens the file in the browser rather than opening it in its default program, as I would like. I would like each link to open the file in the program titled "VLC Media Player", as specified in Windows for each of their filetypes.
Let me know how I can do this (in the simplest form--I'm not too smart :P)
Thanks!

If you are creating web pages on your local system for your own use, then you may want to consider looking into a WAMP server setup. This uses PHP and should allow you to call VLC using the exec command. It would take some learning, however.

There is very little you can do to control how a client will handle a resource.
You can use the Content-Disposition HTTP response header to state that the resource is an attachment (and thus recommend that it be downloaded instead of opened).
Content-Disposition: attachment;filename="movie.mkv"
You can't, however, stop browser native support or a plug-in from handling something instead of having it open in a separate application (let alone cause it to be opened in a specific application).
If the browser is configured to open video files internally, then nothing the author of a website can do will make it switch to using a separate application instead.
