Whether we can generate a url from a page source? - html

I know this question is weird, but anyway I want to know it,
In web browsers or generally we can know the page source of a url, but I had some page source (HTML Code) and now I don't know the url of that page source. Can we generate a url from that page source or is there a way or anything that we can do to get a url from the page source?
When I searched I am getting page source from a url, so I am asking here.

If #Wex is headed in the right direction with his answer and based on your comment then I'll answer with this.
You can get the "Web Developers Toolbar" add-on for FireFox which has an option to "View Generated Source"
this is the same a selecting the whole page and using "View Selected Source" in FF. This will give you the DOM of that page including javascript render code.
If you're asking if there is a way to get the original url from source code the answer is no. Why? Because Google doesn't search the source code, it searches content. There also could be a thousand different websites that use pieces of that code.

In Chrome, you can view the source of any webpage by preceding the url with view-source:. As far as I know, this is the only browser that allows you to do this; Safari and Firefox for instance, shows the source in a popup window which can't be accessed in a regular window.

Related

Get generated source of an HTML page programmatically

What is the easiest way to get the generated web page of a website programatically in any programming language?
The generated web page that is required is the one you get if you go to a web page in firefox and press Ctrl-a and then right click and press "View Selection Source".
The one way that comes to mind is to understand the chromium open source web browser code and get the rendered page and use it in our service.
But I believe that there may be another solution out there that I am not aware of.
In javascript, you can get the full document content with
var html = document.documentElement.innerHTML;
If you want to do this server side you can use file_get_contents()
Ex:
file_get_contents(path_to_webpage);
For reference:
http://php.net/manual/en/function.file-get-contents.php
https://www.w3schools.com/php/func_filesystem_file_get_contents.asp

Disable decoding of HTML entities in Chrome developer tools

I am working in Chrome developer tools and looking for a way to show HTML entities by default.
The view I see is this:
Whereas, the view I need is this:
Does anybody know how I can enable the view I need?
I know there is a theme engine for Chrome but is there an existing feature that fits my needs?
Thanks guys ;)
I had this same question and discovered that with chrome, right-click > View Source code, shows the pure un-decoded(un-evaluated) html entities while the right-click > inspect option seems to evaluate them... may help someone:)
In Google Chrome the correct way to see the actual source of the document that is currently loaded is to click the Sources tab and choose the file from the tree on the left. If you have a unique string on the line that you're searching for, you can press Ctrl-Shift-F to search all files for that string.
RightClick -> View Source is incorrect because it reloads the document, which may cause unexpected behaviour.
Inspect Element is incorrect because it displays the current DOM, not the HTML source.
You can't: The DOM Inspector shows a view of the DOM, not the source code.
The HTML has already been parsed and the entities converted to characters in text nodes.
The Inspector shows an HTML-like view because it is easy to understand. It doesn't reflect the original source code.
Browsers have an explicit "View Source" feature if you want to see the source code.

Google Chrome Extensions: Get Current Page HTML (incl. Ajax or updated HTML)

In my Google Chrome extension, I need to be able to get the current page HTML, including any updated Ajax HTML (unlike the browser's View Source command, which doesn't update it).
Is there a way to get it as a string in my Extension?
Suppose my extension is a right-click context menu called "View Actual HTML Source" which should print the current HTML to the console, or maybe count the number of certain tags there. I wasn't able to find an easy answer to this.
You can get the current state of the DOM as HTML programmatically using document.documentElement.innerHTML
Or just use Developer Tools
I followed the exact solution here, and this gave me the Page Source HTML:
Getting the source HTML of the current page from chrome extension
The solution is to inject the HTML into the Popup.

What is DOM generated code?

I am obviously new to HTML and Web Browsers and python too. I installed the Web Developer extension in Firefox and noticed that in addition to the "View Source" option there are two additional "View Generated Source" and "View Frame Source" options. What are these? Why should they be different?
I have no idea what a generated source is.
Aren't frames part of the page? If so why do I need a separate "View Frame Source" option? Does it mean that the regular "View Page Source" will not show source for all the elements in the page?
If I want to see the code that is executed/used to show me a page which option should I look at and why?
If I want to get this code in python using the requests module how do I get these various sources?
HTML code can be modified dynamically be javascript. "View Generated Source" will show you the HTML as in it is current state that might have been modified by javascript and differs from the html delivered by the server. So this is interesting for the debugging javascript applications.
"View Frame Source" is for websites that are using HTML framesets. Such such sites are a composite of multiple single html sites that are displayed together at one page. Is an older attempt of web design but still widely deployed. So such sites can look like a simple page with the menu on the left side and the content beside it. Using framesets there would be a menu.html and a content.html. Both html sites can be displayed separately in 'Web Developer Toolbar' while clicking with the right mouse button on it and select "Show frame source"
Question 1 and 2 should being answered. Question 3.
If I want to see the code that is executed/used to show me a page which option should I look at and why?
Answer use "View Generated Source..." as this will give you the html you are actually seeing diplayed in browser regardless if it is generated by javascript or not.
Unfortunately I'm not a python expert so question 4 keeps open
The generated source is the result of the frame source that is fetched by the browser then the execution of the javascript on the browser to modify this page.
To understand more how browsers get an html page compared to a program check my answer here:
https://stackoverflow.com/a/15775702/707949
Then to get the sourge html page check this answer:
https://stackoverflow.com/a/15799102/707949
And to get the generated html source, check the end of the first answer

Sourcecode in the url

normally you go on a website and by right click you can choose to see the source code. Or you just use firebug and select an element you want to analyse. Is it possible to write the source code in the URL so that it wouldn't be shown by right click + choosing or selecting an element?
I'm asking because I've already seen this phenomenon once by using an iphone simulator in safari.
Any ideas or hints what I'm exactly looking for? Your help would be great.
Edit: Based on wrong information. You can see the sourcecode by rightclicking. But the url still contains all information about the site. I'll get back to you as soon as I got more information to write them down clearly. Sorry for all the confusion.
Edit: This is the code in the url containing information about the site.
data:text/html;charset=utf-8;base64,PCFET0NUWVBFIGh0bWw%2BDQo8aHRtbCBtYW5pZmVzdD0naHR0cDovL25vdm93ZWIubWZ1c2UuY29tL3dlYmFwcC9TcG9ydGluZ2JldC9wb3J0YWwvc3BvcnRpbmdiZXRQb3J0YWwubWFuaWZlc3QnPg0KPGhlYWQ%2BPHRpdGxlPlNwb3J0aW5nYmV0PC90aXRsZT4NCiAgICA8bWV0YSBodHRwLWVxdWl2PSdjb250ZW50LXR5cGUnIGNvbnRlbnQ9J3RleHQvaHRtbDsgY2hhcnNldD11dGYtOCc%2BDQoJPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J21heGltdW0tc2NhbGU9MSwgd2lkdGg9ZGV2aWNlLXdpZHRoLCBoZWlnaHQ9ZGV2aWNlLWhlaWdodCwgdXNlci1zY2FsYWJsZT1ubywgbWluaW11bS1zY2FsZT0xLjAnPg0KICAgIDxtZXRhIG5hbWU9J2FwcGxlLW1vYmlsZS13ZWItYXBwLWNhcGFibGUnIGNvbnRlbnQ9J1lFUyc%2BDQogICAgPG1ldGEgbmFtZT0nYXBwbGUtbW9iaWxlLXdlYi1hcHAtc3RhdHVzLWJhci1zdHlsZScgY29udGVudD0nYmxhY2snPg0KICAgIDxzY3JpcHQgdHlwZT0ndGV4dC9qYXZhc2NyaXB0JyBsYW5ndWFnZT0namF2YXNjcmlwdCc%2BDQogIGlmIChkb2N1bWVudC5yZWZlcnJlciA9PSAnJykNCiAgew0KICAgd2luZG93LmxvY2F0aW9uPSdkYXRhOnRleHQvaHRtbDtjaGFyc2V0PXV0Zi04O2Jhc2U2NCxQR2gwYld3JTJCUEdobFlXUSUyQlBHMWxkR0VnYm1GdFpUMG5kbWxsZDNCdmNuUW5JR052Ym5SbGJuUTlKMjFoZUdsdGRXMHRjMk5oYkdVOU1Td2dkMmxrZEdnOVpHVjJhV05sTFhkcFpIUm9MQ0IxYzJWeUxYTmpZV3hoWW14bFBXNXZMQ0J0YVc1cGJYVnRMWE5qWVd4bFBURXVNQ2MlMkJQRzFsZEdFZ2JtRnRaVDBuWVhCd2JHVXRiVzlpYVd4bExYZGxZaTFoY0hBdFkyRndZV0pzWlNjZ1kyOXVkR1Z1ZEQwbldVVlRKejQ4YldWMFlTQnVZVzFsUFNkaGNIQnNaUzF0YjJKcGJHVXRkMlZpTFdGd2NDMXpkR0YwZFhNdFltRnlMWE4wZVd4bEp5QmpiMjUwWlc1MFBTZGliR0ZqYXljJTJCUEUxRlZFRWdhSFIwY0MxbGNYVnBkajBuY21WbWNtVnphQ2NnWTI5dWRHVnVkRDBuTVR0VlVrdzlhSFIwY0hNNkx5OTNaV0poY0hBdWJXWjFjMlV1WTI5dEwxTndiM0owYVc1blltVjBMMmx3YUc5dVpTOXBibVJsZUMxbGJsOUhRaTVvZEcxc1AybGtQVFUzTkRVME1qa3lNRUUwTURRMk1UVXdNVE01TUVaRFFUTTRNREJFTkRnNEpteHZZMkZzWlQxbGJsOUhRaVpoWm1acGJHbGhkR1ZKUkQwblBqd3ZhR1ZoWkQ0OGMzUjViR1UlMkJZbTlrZVh0aVlXTnJaM0p2ZFc1a0xXTnZiRzl5T2lNd01EQTdkR1Y0ZEMxaGJHbG5ianBqWlc1MFpYSTdZMjlzYjNJNkkwWkdSanRtYjI1MExXWmhiV2xzZVRwQmNtbGhiQ3dnU0dWc2RtVjBhV05oTENCellXNXpMWE5sY21sbU8yWnZiblF0YzJsNlpUb3lNSEI0TzMwOEwzTjBlV3hsUGp4aWIyUjVQanh3UG14dllXUnBibWN1TGk0OEwzQSUyQlBDOWliMlI1UGp3dmFIUnRiRDQ9Jw0KCSB9DQogICAgPC9zY3JpcHQ%2BDQogICAgPGxpbmsgcmVsPSdhcHBsZS10b3VjaC1pY29uLXByZWNvbXBvc2VkJyBocmVmPSdodHRwOi8vbm92b3dlYi5tZnVzZS5jb20vd2ViYXBwL1Nwb3J0aW5nYmV0L3BvcnRhbC9JbWFnZXMvV2ViQ2xpcEljb24tZW5fR0IucG5nJz4NCiAgICA8bGluayByZWw9J3N0eWxlc2hlZXQnIHR5cGU9J3RleHQvY3NzJyBocmVmPSdodHRwOi8vbm92b3dlYi5tZnVzZS5jb20vd2ViYXBwL1Nwb3J0aW5nYmV0L3BvcnRhbC9jc3MvbWFpbi1lbl9HQi5jc3MnPg0KICAgIDxsaW5rIHJlbD0nc3R5bGVzaGVldCcgdHlwZT0ndGV4dC9jc3MnIGhyZWY9J2h0dHA6Ly9ub3Zvd2ViLm1mdXNlLmNvbS93ZWJhcHAvU3BvcnRpbmdiZXQvcG9ydGFsL2Nzcy9UcmFuc2l0aW9ucy5jc3MnPg0KCTxzY3JpcHQgdHlwZT0ndGV4dC9qYXZhc2NyaXB0JyBzcmM9J2h0dHA6Ly9ub3Zvd2ViLm1mdXNlLmNvbS93ZWJhcHAvU3BvcnRpbmdiZXQvcG9ydGFsL1BhcnRzL3V0aWxpdGllcy5qcycgY2hhcnNldD0ndXRmLTgnPjwvc2NyaXB0Pg0KICAgIDxzY3JpcHQgdHlwZT0ndGV4dC9qYXZhc2NyaXB0JyBzcmM9J2h0dHA6Ly9ub3Zvd2ViLm1mdXNlLmNvbS93ZWJhcHAvU3BvcnRpbmdiZXQvcG9ydGFsL1BhcnRzL3NldHVwLWVuX0dCLmpzJyBjaGFyc2V0PSd1dGYtOCc%2BPC9zY3JpcHQ%2BDQogICAgPHNjcmlwdCB0eXBlPSd0ZXh0L2phdmFzY3JpcHQnIHNyYz0naHR0cDovL25vdm93ZWIubWZ1c2UuY29tL3dlYmFwcC9TcG9ydGluZ2JldC9wb3J0YWwvbWFpbi5qcycgY2hhcnNldD0ndXRmLTgnPjwvc2NyaXB0Pg0KICAgIDxzY3JpcHQgdHlwZT0ndGV4dC9qYXZhc2NyaXB0JyBzcmM9J2h0dHA6Ly9ub3Zvd2ViLm1mdXNlLmNvbS93ZWJhcHAvU3BvcnRpbmdiZXQvcG9ydGFsL1BhcnRzL1RleHQuanMnIGNoYXJzZXQ9J3V0Zi04Jz48L3NjcmlwdD4NCiAgICA8c2NyaXB0IHR5cGU9J3RleHQvamF2YXNjcmlwdCcgc3JjPSdodHRwOi8vbm92b3dlYi5tZnVzZS5jb20vd2ViYXBwL1Nwb3J0aW5nYmV0L3BvcnRhbC9QYXJ0cy9QdXNoQnV0dG9uLmpzJyBjaGFyc2V0PSd1dGYtOCc%2BPC9zY3JpcHQ%2BDQogICAgPHNjcmlwdCB0eXBlPSd0ZXh0L2phdmFzY3JpcHQnIHNyYz0naHR0cDovL25vdm93ZWIubWZ1c2UuY29tL3dlYmFwcC9TcG9ydGluZ2JldC9wb3J0YWwvUGFydHMvQnV0dG9uSGFuZGxlci5qcycgY2hhcnNldD0ndXRmLTgnPjwvc2NyaXB0Pg0KICAgIDxzY3JpcHQgdHlwZT0ndGV4dC9qYXZhc2NyaXB0JyBzcmM9J2h0dHA6Ly9ub3Zvd2ViLm1mdXNlLmNvbS93ZWJhcHAvU3BvcnRpbmdiZXQvcG9ydGFsL1BhcnRzL1RyYW5zaXRpb25zLmpzJyBjaGFyc2V0PSd1dGYtOCc%2BPC9zY3JpcHQ%2BDQogICAgPHNjcmlwdCB0eXBlPSd0ZXh0L2phdmFzY3JpcHQnIHNyYz0naHR0cDovL25vdm93ZWIubWZ1c2UuY29tL3dlYmFwcC9TcG9ydGluZ2JldC9wb3J0YWwvUGFydHMvU3RhY2tMYXlvdXQuanMnIGNoYXJzZXQ9J3V0Zi04Jz48L3NjcmlwdD4NCjwvaGVhZD4NCjxib2R5IG9uTG9hZD0nbG9hZCgpOyc%2BDQogICAgPGRpdiBpZD0nc3RhY2tMYXlvdXQnPjxkaXYgaWQ9J3NlbGVjdGlvbi1wYWdlJz4NCiAgICAgICAgICAgIDxkaXYgaWQ9J2xhbmRpbmdwYWdlJz4NCiAgICAgICAgICAgICAgICA8ZGl2IGlkPSdjZW50cmVUb3BCRyc%2BPC9kaXY%2BPGRpdiBpZD0nY2VudHJlQm90dG9tQkcnPjwvZGl2Pg0KICAgICAgICAgICAgICAgIDxkaXYgaWQ9J2xvZ28nPjwvZGl2Pg0KICAgICAgICAgICAgICAgIDxkaXYgaWQ9J2ljb24nPjwvZGl2Pg0KICAgICAgICAgICAgICAgIDxkaXYgaWQ9J2Rpc3BhbHlib3gnPg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDEnPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDInPjwvZGl2Pg0KICAgICAgICAgICAgICAgIDwvZGl2Pg0KICAgICAgICAgICAgICAgIDxkaXYgY2xhc3M9J3ZpZXcyJyBpZD0naXBob25lJz48L2Rpdj4NCiAgICAgICAgICAgICAgICA8ZGl2IGNsYXNzPSd2aWV3MicgaWQ9J2Nhc2lubyc%2BPC9kaXY%2BDQogICAgICAgICAgICAgICAgPGRpdiBpZD0nZGlzcGFseWJveDMnPg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDUnPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDYnPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDcnPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDgnPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDknPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDEwJz48L2Rpdj4NCiAgICAgICAgICAgICAgICAgICAgPGRpdiBpZD0ndGV4dHAxMSc%2BPC9kaXY%2BDQogICAgICAgICAgICAgICAgICAgIDxkaXYgaWQ9J3RleHRwMTInPjwvZGl2Pg0KICAgICAgICAgICAgICAgICAgICA8ZGl2IGlkPSd0ZXh0cDEzJz48L2Rpdj4NCiAgICAgICAgICAgICAgICA8L2Rpdj4NCiAgICAgICAgICAgIDwvZGl2Pg0KICAgICAgICA8L2Rpdj48ZGl2IGlkPSdpbnN0YWxsLWFwcC1wYWdlJz4NCiAgICAgICAgICAgIDxkaXYgaWQ9J2luc3RhbGwnPg0KICAgICAgICAgICAgICAgIDxkaXYgaWQ9J2NlbnRyZVRvcEJHMSc%2BPC9kaXY%2BPGRpdiBpZD0nY2VudHJlQm90dG9tQkcxJz48L2Rpdj4NCiAgICAgICAgICAgICAgICA8ZGl2IGlkPSdkaXNwYWx5Ym94MSc%2BDQogICAgICAgICAgICAgICAgICAgIDxkaXYgaWQ9J3RleHRwMyc%2BPC9kaXY%2BDQogICAgICAgICAgICAgICAgICAgIDxkaXYgaWQ9J3RleHRwNCc%2BPC9kaXY%2BDQogICAgICAgICAgICAgICAgPC9kaXY%2BDQogICAgICAgICAgICAgICAgPGRpdiBpZD0naWNvbjEnPjwvZGl2Pg0KICAgICAgICAgICAgICAgIDxkaXYgaWQ9J2xvZ28xJz48L2Rpdj4NCiAgICAgICAgICAgICAgICA8ZGl2IGlkPSdidXR0b24yJz48L2Rpdj4NCiAgICAgICAgICAgIDwvZGl2Pg0KICAgICAgICA8L2Rpdj48L2Rpdj4NCjwvYm9keT4NCjwvaHRtbD4=
No, it's not possible to hide a website's source code. The reason for that is simply that the browser needs that code to display the website, so whenever you see a website, you'll always be able to see as much code as is needed to make the website look like that.
You can mangle the code a bit, but as you have said yourself, things like Firebug are able to display the current state of a website, so you'll also be able to see the correct code.
edit
Just a note: Just because Safari with an iPhone user agent isn't able to display the source code, it doesn't mean that the code is not there or somehow encrypted into the URL. If you can see the website, the code is there.
I guess it's a bug (or a feature?) that Safari isn't able to display it in iPhone mode (maybe because the iPhone itself isn't able to display the code either).
edit 2
Okay, it indeed set the URL to the following for me:
data:text/html;charset=utf-8;base64,PGh0bWw%2BPGhlYWQ%2BPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J21heGltdW0tc2NhbGU9MSwgd2lkdGg9ZGV2aWNlLXdpZHRoLCB1c2VyLXNjYWxhYmxlPW5vLCBtaW5pbXVtLXNjYWxlPTEuMCc%2BPG1ldGEgbmFtZT0nYXBwbGUtbW9iaWxlLXdlYi1hcHAtY2FwYWJsZScgY29udGVudD0nWUVTJz48bWV0YSBuYW1lPSdhcHBsZS1tb2JpbGUtd2ViLWFwcC1zdGF0dXMtYmFyLXN0eWxlJyBjb250ZW50PSdibGFjayc%2BPE1FVEEgaHR0cC1lcXVpdj0ncmVmcmVzaCcgY29udGVudD0nMTtVUkw9aHR0cHM6Ly93ZWJhcHAubWZ1c2UuY29tL1Nwb3J0aW5nYmV0L2lwaG9uZS9pbmRleC1lbl9HQi5odG1sP2lkPTU4NjIwNEE2MEE0MDQ2MTUwMTM5MEZDQTFBQTdGNDFBJmxvY2FsZT1lbl9HQiZhZmZpbGlhdGVJRD0nPjwvaGVhZD48c3R5bGU%2BYm9keXtiYWNrZ3JvdW5kLWNvbG9yOiMwMDA7dGV4dC1hbGlnbjpjZW50ZXI7Y29sb3I6I0ZGRjtmb250LWZhbWlseTpBcmlhbCwgSGVsdmV0aWNhLCBzYW5zLXNlcmlmO2ZvbnQtc2l6ZToyMHB4O308L3N0eWxlPjxib2R5PjxwPmxvYWRpbmcuLi48L3A%2BPC9ib2R5PjwvaHRtbD4=
This however just encodes to a loading & redirect page that itself redirects to a different webpage with a special session-like parameter. I guess they didn't want to create real server side sessions for this and just put the parameter into the redirect page and encoded the whole junk using the data: URI to not create a custom page for it. This however does neither help the browser (in terms of speed or anything else) nor does it hide the source code, as you can just decode it again to see the original source code.
What you're referring to is the data URI scheme, which allows base64 encoded data to be included locally (within a request), where normally http/etc URLs are used to initiate new requests.
The data URI scheme is a URI scheme
that provides a way to include data
in-line in web pages as if they were
external resources. It tends to be
simpler than other inclusion methods,
such as MIME with cid or mid URIs.
Read the Wikipedia page for more details: http://en.wikipedia.org/wiki/Data_URI_scheme
i don't know what you're trying to achive, but if you want to hide the source code because of "anybody can steal my code": that isn't possible. the sourcecode has to get to the browser in any way, so the browser can display it - and if the code is on the client-machine (in the browser) there will always be a possibility to grab it.
Even if you restrict right clicking, or viewing the source, it is impossible to hide it from everybody. Also, placing it in the URL would be bad, very bad (I can't even imagine it).
the html is needed for the browser to render the UI. You can't hide it.
You could compress and obfuscate the javascript though, to make it difficult to read and understand. But that's evil :)
Internet Explorer has a character limit of 2048 characters, so you would have to compress the content and pray it will fit in the url after it's been base64 encoded. Then you can use javascript to decode it. It will also be extremely difficult to update your pages or allow for bookmarking. It could also result in users exploiting the system.
Chances are nobody will want your sauce code anyway, and if they did, it wouldn't affect you one little bit. Facebook shows it's sauce, I don't see it's popularity dropping. So just stick with serving your pages the normal way.
1. The length of an URL is limited, so that you couldn't write a whole page into it even if it were possible.
2. Once a thing has been displayed at a client machine the code cannot be protected.
(well, using javascript right-click disabling could repell a few noobs, but it is still fairly easy to grab the code)