I am new to HTML, web browsers, and Python. I installed the Web Developer extension in Firefox and noticed that, in addition to the "View Source" option, there are two more: "View Generated Source" and "View Frame Source". What are these? Why should they be different?
I have no idea what a generated source is.
Aren't frames part of the page? If so why do I need a separate "View Frame Source" option? Does it mean that the regular "View Page Source" will not show source for all the elements in the page?
If I want to see the code that is executed/used to show me a page which option should I look at and why?
If I want to get this code in Python using the requests module, how do I get these various sources?
HTML can be modified dynamically by JavaScript. "View Generated Source" shows the HTML in its current state, which may have been modified by JavaScript and may therefore differ from the HTML delivered by the server. This makes it useful for debugging JavaScript applications.
"View Frame Source" is for websites that use HTML framesets. Such sites are a composite of several separate HTML documents that are displayed together on one page. This is an older approach to web design but is still widely deployed. Such a site can look like a single page with a menu on the left and the content beside it; with framesets there would be a menu.html and a content.html. Each of these documents can be displayed separately in the Web Developer Toolbar by right-clicking the frame and selecting "Show frame source".
That should answer questions 1 and 2. On to question 3:
If I want to see the code that is executed/used to show me a page which option should I look at and why?
Answer: use "View Generated Source", as this gives you the HTML you are actually seeing displayed in the browser, regardless of whether it was generated by JavaScript or not.
Unfortunately I'm not a Python expert, so question 4 remains open.
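For question 4, here is a minimal sketch of how the "View Source" and "View Frame Source" views might be reproduced in Python. It uses only the standard library so that it runs without extra installs; with the requests module, requests.get(url).text returns the same kind of server-delivered text. The function names, the User-Agent value and the example.com URL are assumptions made for illustration. The generated source cannot be obtained this way, because nothing here executes JavaScript; that requires driving a real browser.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def fetch_source(url, timeout=10):
    """Return the HTML of `url` as delivered by the server; no JavaScript is run."""
    # The User-Agent value is an arbitrary placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class FrameCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Frame URLs are often relative to the outer page.
                self.frame_urls.append(urljoin(self.base_url, src))


def find_frame_urls(html, base_url):
    """Return the absolute URLs of the frames referenced directly by `html`."""
    collector = FrameCollector(base_url)
    collector.feed(html)
    return collector.frame_urls


def fetch_page_and_frames(url):
    """Return {url: source} for the page and each frame it references directly."""
    sources = {url: fetch_source(url)}
    for frame_url in find_frame_urls(sources[url], url):
        sources[frame_url] = fetch_source(frame_url)
    return sources


# Hypothetical frameset page, modelled on the menu/content example above.
FRAMESET_PAGE = """
<html>
  <frameset cols="20%,80%">
    <frame src="menu.html">
    <frame src="content.html">
  </frameset>
</html>
"""

print(find_frame_urls(FRAMESET_PAGE, "http://www.example.com/index.html"))
```

The outer page only names its frames by URL, which is why each frame has its own source: fetch_source on the outer URL corresponds to "View Source", and calling it on each URL returned by find_frame_urls corresponds to "View Frame Source". Nested frames are not followed in this sketch.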
The generated source is what results after the browser fetches the frame source and then executes the JavaScript that modifies the page.
To understand more about how a browser gets an HTML page compared with a program, check my answer here:
https://stackoverflow.com/a/15775702/707949
Then, to get the source HTML page, check this answer:
https://stackoverflow.com/a/15799102/707949
And to get the generated HTML source, check the end of the first answer.
I have a problem with this page!
When entering it, you can right-click and view the source code in, say, Chrome and see the articles with their links, etc. However, after pressing "المزيد" ("More") and viewing the source code again, the source code of the new articles does not appear; only that of the previous articles does.
What would you recommend to solve this problem?
I used "View page source" in Google Chrome, but nothing appeared regarding the new articles.
The View source option only shows the source code of a page as it was delivered from the server. It does not take modifications performed using JavaScript into account.
The button mentioned in your question loads more content and inserts it into the page programmatically using JavaScript.
You need to use the Elements tab of Chrome Developer Tools to see programmatically inserted HTML code. Right-click anywhere on the page and choose "Inspect", or press Ctrl+Shift+I or F12 on Windows. (Shortcuts on other platforms may vary.)
What is the easiest way to get the generated web page of a website programmatically in any programming language?
The generated web page that is required is the one you get if you go to a web page in Firefox, press Ctrl+A, then right-click and choose "View Selection Source".
The one way that comes to mind is to understand the Chromium open-source browser code, get the rendered page from it, and use that in our service.
But I believe that there may be another solution out there that I am not aware of.
In JavaScript, you can get the full document content with
var html = document.documentElement.innerHTML;
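If you want to do this server-side, you can use PHP's file_get_contents(). Note that this returns the HTML as delivered by the server, without running any JavaScript.
Example:
$html = file_get_contents($path_to_webpage);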
For reference:
http://php.net/manual/en/function.file-get-contents.php
https://www.w3schools.com/php/func_filesystem_file_get_contents.asp
I want to get the HTML code of a particular site. It asks me to register myself first so that I can be redirected to their home page. Now, my question is: is it possible to retrieve the HTML code of the desired page just by choosing option ‘View Page Source’ which appears on right click? Is there any other way to fetch the HTML code?
There are multiple ways of getting the HTML source code of a page.
One way, as you already know, is by viewing the page's source code.
If you right-click and choose View Page Source, or just press Ctrl + U, you will see the source code in your browser.
If you are using Linux, you can use wget to get the source code.
Just open a console and type wget www.somewebsite.com and you will get the HTML source code, including any links to CSS and JS files.
However, you cannot get the PHP code using any method unless you have FTP access to the server.
Yes, it is possible to view the HTML via 'View page source', or you could use PHP, as mentioned in the comments:
'using php yes php.net/manual/en/function.file-get-contents.php' (comment by Vitorino fernandes)
You could also let a website or program do it for you, but its trustworthiness depends on the site or program.
Do note that it is NOT possible to view the PHP source, since that is server-side.
Using any browser, the "View Page Source" option will show you the source of the page as received by the browser (which may be different from the source currently displayed). You also have the option of using the File > Save Page As (or similar) menu option to save a copy of the HTML code of the page from the browser.
It is also possible to use command line tools like curl and wget to download the page to your local machine. Those tools provide options to send data (such as cookies or headers to identify yourself) along with the request.
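The same idea can be sketched in Python: attach the cookies and headers from a logged-in browser session to the request so that the server returns the page behind the registration step. This is only an illustration under stated assumptions; the build_request helper, the sessionid cookie name, its placeholder value and the example.com URL are all hypothetical, and the real cookie names depend on the site.

```python
import urllib.request


def build_request(url, cookies=None, extra_headers=None):
    """Build a GET request carrying cookies and custom headers."""
    headers = dict(extra_headers or {})
    if cookies:
        # A Cookie header is a "; "-separated list of name=value pairs.
        headers["Cookie"] = "; ".join(
            f"{name}={value}" for name, value in cookies.items()
        )
    return urllib.request.Request(url, headers=headers)


# Hypothetical values: copy the real cookie from your logged-in browser session.
request = build_request(
    "http://www.example.com/home",
    cookies={"sessionid": "PLACEHOLDER"},
    extra_headers={"User-Agent": "Mozilla/5.0"},
)
print(request.header_items())

# Sending it needs network access, so it is left commented out here:
# with urllib.request.urlopen(request, timeout=10) as response:
#     html = response.read().decode("utf-8", errors="replace")
```

As with curl and wget, the result is the HTML as delivered by the server for that session; any content that the page adds later with JavaScript is not included.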
I've got a problem getting the "real" source code from a website:
http://sirius.searates.com/explorer
Trying it the normal way (view-source:) in Chrome, I get a different result than when using the Inspect Element function. The code I can see using that function is the one I would like to have. How can I get this code?
This usually happens because the UI is actually generated by a client-side JavaScript utility.
In this case, most of the screen is generated by HighCharts, and a few elements are generated/modified by Bootstrap.
The DOM inspector will always give you the "current" view of the HTML, while view source gives you the "initial" view. Since view source does not run the JavaScript utilities, much of the UI is never generated there.
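To illustrate the difference, here is a small Python sketch of what a client that does not execute JavaScript sees. The page below is hypothetical (it is not the actual markup of the site in the question): the chart markup exists only as text inside a script, so a parser working on the "initial" view finds the empty container but none of the generated elements.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags in markup; like view source, it never runs scripts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a {tag name: occurrences} mapping for the static markup."""
    counter = TagCounter()
    counter.feed(html)
    return counter.counts


# Hypothetical page: the chart markup only exists after the script has run.
DELIVERED_SOURCE = """
<html><body>
  <div id="chart"></div>
  <script>
    document.getElementById("chart").innerHTML = "<svg><path d='M0 0'/></svg>";
  </script>
</body></html>
"""

counts = count_tags(DELIVERED_SOURCE)
# The container is present, but no svg element exists in the initial view.
print(counts.get("div", 0), counts.get("svg", 0))
```

The parser treats the contents of the script element as plain text, so the svg element is never counted. A browser would run the script, and the DOM inspector would then show the svg inside the container.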
To get the most up-to-date (HTML) source, you can use the DOM inspector to find the root html node, right-click and select "Edit as HTML". Then select-all and copy/paste into your favorite text editor.
Note, though, that this will only give you a snapshot of the page. Most modern web pages are really browser applications and the HTML is just one part of the whole. Copy/pasting the HTML will not give you a fully functional page.
You can get the real-time HTML with this URL; save it as a bookmark:
javascript:document.write('<textarea width="400">'+document.body.innerHTML+'</textarea>');
I know this question is weird, but I want to know anyway.
In web browsers, we can generally view the page source of a URL, but I have some page source (HTML code) and I don't know the URL it came from. Can we generate a URL from that page source, or is there any way to get the URL from the page source?
When I searched, I only found how to get the page source from a URL, so I am asking here.
If @Wex is headed in the right direction with his answer, then based on your comment I'll answer with this.
You can get the "Web Developer Toolbar" add-on for Firefox, which has an option to "View Generated Source".
This is the same as selecting the whole page and using "View Selection Source" in Firefox. This will give you the DOM of that page, including the content rendered by JavaScript.
If you're asking if there is a way to get the original url from source code the answer is no. Why? Because Google doesn't search the source code, it searches content. There also could be a thousand different websites that use pieces of that code.
In Chrome, you can view the source of any webpage by preceding the URL with view-source:. Firefox accepts the same prefix; some other browsers, such as Safari, show the source in a separate window or panel instead.