Crawl a website with iframe - html

I have a test project using a library that supports crawling (openbuilding spiderling).
The problem arises when I crawl the URL "https://examlple.com". This page contains an iframe from "https://iframe.com".
I want to get the p elements inside the iframe, but right now I can only get them by visiting iframe.com. Is there any way to get the p elements without visiting iframe.com, for example by waiting for the iframe to load?
Thank you!

No, you cannot spider an iframe's contents from the parent page. The closest you can do is note the URL of the iframe and then go off and independently spider it.
Think of an iframe as a sandboxed, protective container that only lets you view its contents and nothing more: no spidering or talking to it (unless you own the page and are working with JavaScript's Window.postMessage() etc.).
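As a rough sketch of the "note the URL and spider it separately" approach, the snippet below pulls iframe src values out of the parent page's HTML and resolves them against the page URL. The helper name extractIframeUrls is made up for illustration and is not part of any crawling library, and the regular expression is a simplification; a real crawler would more likely use a proper HTML parser.

```javascript
// Hypothetical helper: collect absolute iframe URLs from a page's HTML.
function extractIframeUrls(html, baseUrl) {
  const urls = [];
  // Simplified pattern (assumption): matches quoted src attributes only.
  const pattern = /<iframe\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const raw = match[1] !== undefined ? match[1] : match[2];
    try {
      // Resolve relative paths against the parent page's URL.
      urls.push(new URL(raw, baseUrl).href);
    } catch (err) {
      // Ignore src values that cannot be parsed as URLs.
    }
  }
  return urls;
}
```

Each returned URL can then be queued as its own crawl target, and the p elements are read from that second response.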

Related

How to set iframe correctly in web page

Here is my code. I want to embed the iframe, but it is not showing anything. I don't understand what the problem is.
<iframe src="https://twitter.com/money2020"></iframe>
"Twitter's site blocks direct iframe display through the secure hypertext transfer protocol, which keeps someone from making a direct iframe to any given page on Twitter. Instead, Twitter widgets need to be created through the corresponding Twitter account to be embedded on a site within an iframe."
http://www.wikihow.com/Create-a-Twitter-Hashtag-Widget
https://twitter.com/settings/widgets
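The quote does not name the mechanism. Sites commonly refuse to be framed by sending an X-Frame-Options header or a Content-Security-Policy frame-ancestors directive, and whether Twitter uses exactly these is an assumption here. A minimal sketch of checking a response's headers for either one (blocksCrossOriginFraming is a made-up helper name, and the CSP handling is deliberately simplified):

```javascript
// Hypothetical helper: do these response headers forbid embedding the
// page in an iframe on a different origin?
function blocksCrossOriginFraming(headers) {
  // Header names are case-insensitive, so normalize them first.
  const h = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value);
  }

  const xfo = (h['x-frame-options'] || '').trim().toUpperCase();
  if (xfo === 'DENY' || xfo === 'SAMEORIGIN') return true;

  const csp = h['content-security-policy'] || '';
  for (const directive of csp.split(';')) {
    const parts = directive.trim().split(/\s+/);
    if (parts[0].toLowerCase() === 'frame-ancestors') {
      // Simplification: only 'none' and 'self' are treated as blocking.
      return parts.slice(1).every(s => s === "'none'" || s === "'self'");
    }
  }
  return false;
}
```

If this returns true for the page you are trying to embed, no change to your own iframe markup will make it display.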

Direct preloaded HTML content in iframe rather than src

I have HTML content (mostly e-mails) that I would like to display in an archive. Seeing as some of these records contain their own styles, images, and headers, they need to be displayed independently and confined to their container so as not to interfere with the page displaying them. I immediately thought of an iframe.
I have two ways I can do this, both are somewhat indirect. 1) I can draw an iframe that points to about:blank and use Javascript to draw the content into the iframe after the page loads. 2) I can create a secondary PHP page that returns only the content of the e-mail and point the iframe to it as the src attribute. These solutions are simple enough, but I was wondering if there is a more direct way.
I found solutions like these, but they suggest using options 1 or 2 above. The point of this question is: "Is there a more direct way to preload HTML content directly into an iframe than to rely on Javascript or a secondary page?"
Html code as IFRAME source rather than a URL
Specifying content of an iframe instead of the src to a page
I am not sure how much more "direct" you can get than to specify a page in the src attribute of the iframe.
You already link, in your question, to the only answer that actually works without using a src page or using ECMAScript to draw the iframe content. Remember, though, that data URLs are still limited in the number of bytes they can display in most browsers, because there are limits to the length of the data URL itself.
I would really suggest that you use the src attribute with a separate backend script, as that decouples your code and makes it more maintainable: you can develop the scripts responsible for the page itself separately from those that show the iframe content.
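For reference, the data URL approach mentioned above can be sketched as follows. The function names htmlToDataUrl and iframeTagFor are made up for illustration, and the length limit noted above still applies, so treat this as something for small messages only.

```javascript
// Hypothetical helper: encode an HTML string as a data URL.
function htmlToDataUrl(html) {
  // encodeURIComponent escapes <, >, ", # and other characters that
  // would otherwise break the URL or the surrounding attribute.
  return 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
}

// Hypothetical helper: build the iframe markup for a stored e-mail body.
function iframeTagFor(html) {
  return '<iframe src="' + htmlToDataUrl(html) + '"></iframe>';
}
```

For example, iframeTagFor('<p>Hi</p>') yields an iframe whose src carries the encoded markup itself, so no second page or post-load script is needed.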

Page url links to pages internal frame

I have a personal website, which I have made (to the best of my ability) without a template. I am not very experienced in HTML, so I am not entirely sure if this is bad practice or not, but here is my issue.
My website consists of a frameset, which has 3 frames. Two do not change (banner and nav panel), and the other is content. The way I display my content in the main frame is through an iframe. Here's where the trouble comes. I have suggested my website to the crawler, and it crawls all the pages for content, of course. When I click on one of my links suggested by google (say, a project), the browser loads that individual .html file, without any of the rest of my frames. In other words, it does not link to the page through my index.html which sets up the formatting and page frames, but simply loads the html as a stand-alone page.
Is there a way I can avoid this, so that if a link for my website is clicked from an external link (not from my domain), the page first loads my index.html, and then the page of interest, so that it appears as if it were accessed normally from my index? I am not sure whether I should find a new way of displaying my content in the main frame so that it avoids iframes, or just need a simple script to redirect the user.
Not sure if it's useful but I've attached a photo of my page just to better explain what the frame layout is that I am working with.
Many thanks!!!
iFrames are definitely not the route to take when you are displaying consistent content, which is what your Navigation, Header, and of course Content appear to be. There will be an issue when a search engine spider crawls your site. From my understanding, since you are pulling the content in from another page, the spider will crawl that page on its own rather than as part of the index.html page we are currently viewing. When a spider crawls a page it looks for static HTML tags, content, keywords, etc., and since all of your content comes from other pages, the spider will treat that content as belonging to those pages.
You want my recommendation? Avoid using an iFrame at all times. The point of an iFrame is to display content from another (external) location, and/or to display static content on a page without having to scroll the page the iFrame sits on.
It is bad practice to use an iFrame; I would suggest using DIVs. Within these DIVs you may place content, images, links, virtually anything you want, with all of the benefits of having both people and search engine spiders view your website.
I hope this helps!
Thanks,
Aaron
iFrames are a bad choice. AJAX is VERY simple these days. Just replace the big iFrame with a div and load a page via AJAX, putting the contents into that div.
Replace your anchors with div tags, and replace href with name, like so:
<div name='main.html' class='link'>Main</div>
You need a div with the id 'loadHere':
<div id='loadHere'></div>
Then include jQuery (it's pretty easy, google it) and at the end of your HTML put this:
$('.link').click(function () {
  // A div has no .name property, so read the attribute instead.
  $.get($(this).attr('name'), function (dat) {
    $('#loadHere').html(dat);
  });
});
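If you keep the frames for now, the "simple script to redirect the user" that the question asks about could look roughly like this. It is only a sketch: framesetRedirect and the page query parameter are names invented here, and index.html would still need its own code to read that parameter and point the content frame at it.

```javascript
// Hypothetical helper: returns the URL to send the visitor to, or null
// when the page is already being shown inside the frame layout.
function framesetRedirect(isTopLevel, pagePath, indexPath = '/index.html') {
  if (!isTopLevel) return null;            // already inside a frame
  if (pagePath === indexPath) return null; // the index page itself
  return indexPath + '?page=' + encodeURIComponent(pagePath);
}

// In a browser, each content page could run something like:
//   const target = framesetRedirect(window.top === window.self, location.pathname);
//   if (target) location.replace(target);
```

When index.html reads the page parameter it should only accept paths on your own site, so that the parameter cannot be used to load arbitrary pages into the frame.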

Stopping iframe from reloading when I change pages

Is there any way to keep an iframe nested in a div on my page from reloading when I change pages in the nav? When I change pages, the new page's code is loaded and the iframe from the previous page is reloaded along with it. Can I single it out so that it won't reload when I change pages?
If you reload the entire page, the IFRAME element is getting reloaded with it. Unless you used AJAX or a second IFRAME, there is no way to have the whole page except one element reload.
My initial reaction is: "Why the hell would you want to do that, it sounds awful?"
The only way for this to work is to change the page content dynamically, with the exception of the iframe, rather than loading a new page.
But to answer your question, yes you can do it.
If you have all the page content except the iframe inside a div, let's call it #page, and the iframe is at the same level in the DOM as #page or higher, you could use something like jQuery's load() function to load new content for everything inside the #page div.
However, if SEO or Accessibility matter to you at all, you shouldn't do this.
A user's browser will cache a lot of the content in the iframe anyway, so it shouldn't be too demanding to reload it.
If the contents of the IFRAME are simple enough it might be a simple case of using some light query string parameters to indicate the state the IFRAME is in to persist it across pages.
Your options also depend on any development frameworks you might be using (.NET, Ruby, etc.).
Otherwise, additional IFRAMEs seem to be the only other solution.
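The query string idea can be sketched like this, assuming the iframe's state is small enough to fit in a few parameters. The helper names and the example parameters (track, t) are invented for illustration; what actually needs persisting depends on what the iframe shows.

```javascript
// Hypothetical helper: append the iframe's state to a navigation link.
function withIframeState(pageUrl, state) {
  const url = new URL(pageUrl);
  for (const [key, value] of Object.entries(state)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hypothetical helper: read the state back on the next page.
// Values come back as strings because query strings carry no types.
function readIframeState(pageUrl, keys) {
  const params = new URL(pageUrl).searchParams;
  const state = {};
  for (const key of keys) {
    if (params.has(key)) state[key] = params.get(key);
  }
  return state;
}
```

The nav links would be rewritten with withIframeState before navigation, and the next page would use readIframeState to build the iframe's src so that it comes back up in the same state.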

Apply parent style to page in frame

Is there any way to apply the CSS of a parent page to a page within a frame without adding another http request in the page in the frame? Is this possible or would I have to add the CSS via http request in every page loaded in the frame? In the case that it wouldn't work, would it be more convenient to use style tags or link rel if each page were to have a unique CSS? I ask this because they're pages from my site which are only made to contribute to the parent page which has them in frames. The reason for frames being that there is more going on in other areas of the page and everything acts in unison; it'd be convenient not to reload everything for one section.
Set up your cache control headers right and using a <link> will fetch the CSS from the browser cache and not from the server.
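As a sketch of what "cache control headers set up right" might mean, the helper below builds response headers for a stylesheet so the browser can reuse its cached copy. stylesheetHeaders is a made-up name, and the one-day lifetime in the usage comment is an arbitrary assumption; pick whatever suits how often the CSS changes.

```javascript
// Hypothetical helper: response headers that let the browser cache a
// stylesheet for maxAgeSeconds before asking the server again.
function stylesheetHeaders(maxAgeSeconds) {
  return {
    'Content-Type': 'text/css; charset=utf-8',
    'Cache-Control': 'public, max-age=' + maxAgeSeconds,
  };
}

// With Node's built-in http module this could be used as:
//   res.writeHead(200, stylesheetHeaders(86400)); // 86400 s = 1 day
//   res.end(cssText);
```

With headers like these, each framed page can use the same <link> and only the first one costs a round trip to the server.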
No, you would have to put a link element in the iframe's source, which:
1) would trigger a new HTTP request
2) wouldn't work on cross-domain websites, because
a) the browser's same-origin policy, which guards against attacks such as XSS (en.wikipedia.org/wiki/Cross-site_scripting), blocks it
b) you would likely not have access to the source to edit it, because it's on a different server.
lists like these are fun. you should try making some of them :)