I am affiliated with Expedia and I am using their API. One of their requirements for launching the site is adding their terms and agreements to my page, and they give us this page: http://travel.ian.com/index.jsp?pageName=userAgreement&locale=en_US&cid=xxx. I do not want to send visitors to a different site, and I cannot copy and paste the information because of updates. I would also prefer not to use an iframe. Does anyone have any ideas on how to do this? Here is a webpage using this on their site with their domain: http://www.helloweekends.com/terms.htm. Does anyone know how they did this? Any help would be greatly appreciated!
Since it originates from another domain, it wouldn't be possible to use JavaScript, due to the same-origin policy. Also, relying on JavaScript for the update would be trouble for users who have JavaScript disabled, as they wouldn't see the terms. Since you don't want to use an iframe or copy the content, I guess your best shot would be to scrape their page with a server-side language of your choice and then display it on your page.
Scraping can be a bit tricky, though, if you rely on their markup. If they change their markup, there is a chance that your script will break and stop updating the terms.
There are various tutorials available on how to scrape sites. Here are a few PHP examples:
Web scrape with PHP
PHP Screen Scraping Tutorial
Note: Make sure that they allow you to scrape the page before implementing this, so that you don't violate their rules.
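For what it's worth, here is a minimal sketch of the scraping idea in Node.js, assuming Node 18+ (for the built-in fetch) and the cheerio package; the '#terms' selector is made up and would have to be replaced with whatever element actually wraps the agreement on their page:

const cheerio = require('cheerio');

async function fetchTerms() {
  // The URL is the agreement page from the question; '#terms' is a placeholder selector.
  const res = await fetch('http://travel.ian.com/index.jsp?pageName=userAgreement&locale=en_US&cid=xxx');
  const html = await res.text();
  const $ = cheerio.load(html);
  return $('#terms').html() || '';
}

fetchTerms()
  .then(terms => console.log(terms))      // in practice: cache this and embed it in your page
  .catch(err => console.error('Scrape failed:', err));

Whatever language you use, cache the result on your server and refresh it on a schedule, so you aren't hitting their page on every visitor request.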
Do you know if their API can serve the content as JSON? A JSONP call could get the values to you, but it would make your page rely on JavaScript for users to see the updated content.
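Purely as an illustration, if they did expose a JSONP endpoint (which I'm only assuming here), the call could look something like this with jQuery; the URL and the data.termsHtml field are placeholders:

$.ajax({
  url: 'http://api.example.com/terms',   // placeholder endpoint, not a real Expedia URL
  dataType: 'jsonp',                     // jQuery adds the callback=? parameter for you
  timeout: 5000,                         // needed so the error handler fires for failed JSONP calls
  success: function (data) {
    $('#terms').html(data.termsHtml);    // assumes the response carries the agreement markup
  },
  error: function () {
    $('#terms').text('Unable to load the terms right now.');
  }
});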
Another option is to use PHP or any other server-side language to get the contents of the URL, process it, and return the block you require.
I would suggest the load() function offered by jQuery. It makes a simple AJAX call to retrieve a file, and you can even use a selector to grab only part of the page. For example, to load the contents of an HTML page into a div:
$('#div_id').load('my_file.html');
Or just load a part of the page:
$('#div_id').load('my_file.html #main_text_id');
Related
We have a web app that we want to integrate into the websites of several clients via a subdomain, since in most cases we cannot modify their sites. Besides, our web app is built in a different language and we want to keep it on our servers.
At the moment, they are adding links to our subdomain in their sites' menus; however, they want to keep the same header and footer so that the user feels they are still on the same website.
For now, we are copying the HTML and inserting it into our template, but this is not a good solution for the future, and we are having several problems due to JavaScript conflicts.
How can we solve this? An iframe does not allow us to modify its content, I think. Thanks in advance.
I don't know any good ways to do this client side.
The first thought is to have all the pages link to your JavaScript to create the header/footer, but it's not good to require JavaScript to display content.
HTML Imports would really be perfect for this, but they are not well supported. You could consider a polyfill, like Google's webcomponents.
I feel like the best approach here would be to do this somewhere other than the client side: either use a server that lets you use a template engine, or some static site generator that supports templating.
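If you did accept the JavaScript dependency, a minimal sketch of the client-side injection idea could look like this; header.html and footer.html are placeholder fragments that would have to be served from the same origin (or with CORS headers):

// Fetch a shared fragment and inject it into the current page.
async function injectFragment(url, position) {
  const res = await fetch(url);
  const html = await res.text();
  document.body.insertAdjacentHTML(position, html);
}

document.addEventListener('DOMContentLoaded', () => {
  injectFragment('header.html', 'afterbegin');  // shared header at the top of <body>
  injectFragment('footer.html', 'beforeend');   // shared footer at the bottom of <body>
});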
Let's say I'm looking at a webpage that has a title including the year, such as "StackOverflow 2016". Is there a way, by inspecting the page source, to find out whether this string is a variable (a function automatically updates it every year) or a hardcoded string?
The HTML is what the browser receives, and it is the result of a PHP (or Python, etc.) script, so no, you can't. (But you might be able to if it is generated by JavaScript.)
There is no way, unless the web site has been specifically coded to make that possible.
I know of one website that does enable marking the variables in its output, but even then, this functionality is turned off for most page requests – it doesn't work unless you explicitly turn it on for that request.
Certainly, there is no standard way in html to notice this.
If it is a variable, it is inserted when the page gets created, so to tell whether it's hardcoded or not you would have to have access to the file which constructs the page - usually a template or a PHP file, etc. So no, you can't tell whether it was a variable or plain text just from inspecting the source on the client's side.
In one word: no, you can't...
Different scenarios:
By looking at the HTML alone... no, there is no way. Unless, in your specific case, you refresh the page at NYE or something like that... which is silly.
If the HTML is processed on the server, there is still no way you can know whether it is a hardcoded string or a variable.
There might be a chance to see it by looking at the front-end source code, if the HTML is processed on the client side of the app.
You have to understand that the web page you see is often generated by code that resides on a server, potentially miles away from you. When you ask for a web page, you simply get an HTML page, no more.
So, generally, all the methods that generate the data you see on the page cannot be seen client side. Try to imagine what could happen if, let's say, StackOverflow gave you the "power" to see the logic that exists behind the web pages of the entire app. You could use that information to do a lot of damage or to steal information or complex algorithms.
I've said "generally" because data on a web page could also be generated by JavaScript, a client-side language that can be used to modify the DOM.
In that case you could see whether your string is updated by a function.
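For example, if the page source contained something like the following (the element id is made up here), you would know the year is computed on the client rather than typed in by hand:

// Hypothetical snippet: the year is generated at load time, not hardcoded.
document.getElementById('site-title').textContent =
  'StackOverflow ' + new Date().getFullYear();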
As far as I'm aware there is no way to know this, as the source you can see is simply what was rendered. So there is no way to know whether it was added with PHP or JS, etc.
I am a relatively new programmer. Talking with a partner, he told me that before AJAX he used an iframe to send data and change the content (obviously with the help of JavaScript).
I understood that both are similar techniques, but I didn't find an article describing their characteristics.
What are the advantages of AJAX over an iframe?
EDIT
I didn't find any explanation of the technique, but my partner told me he posted the data through a hidden iframe by submitting a form targeted at it, so it sounds like only the iframe has to be refreshed, but I have never done that.
One advantage AJAX has is being able to read the state/status of the request. You also have access to page headers, which you don't with iframes.
Ajax can handle multiple asynchronous requests. It's a little trickier with iframes, as you need to create an iframe per request (and keep track of all of them to delete them later) instead of recycling the same one.
Existing libraries are full of AJAX goodness and there is a larger community support base.
iframe
is a way to show two (or more) webpages separately in one page
ajax
is a way to merge two (or more) webpages (or new data) into one
Key advantages to Ajax I find are:
CSS will flow into the content called into the page.
A way to retrieve data and show new information to visitors without a page refresh.
A fab mention to this site for its clever use of Ajax.
'Google Instant' and suggestive searching are achieved via Ajax.
Just my two cents:
I agree with Kris above that I wouldn't say they are comparable.
There's one use case where I find iframes easier to work with than AJAX, and that is when you need to submit a complicated form to another page but don't need any response - the iframe route is by far the easiest to code.
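A rough sketch of that route, with a placeholder endpoint and field name, looks something like this:

// Create a hidden iframe and point the form at it, so the response
// loads into the iframe and the page itself never reloads.
// '/submit.php' and the 'payload' field are placeholders.
var iframe = document.createElement('iframe');
iframe.name = 'hidden-target';
iframe.style.display = 'none';
document.body.appendChild(iframe);

var form = document.createElement('form');
form.method = 'POST';
form.action = '/submit.php';
form.target = 'hidden-target';      // the response goes into the hidden iframe

var field = document.createElement('input');
field.type = 'hidden';
field.name = 'payload';
field.value = JSON.stringify({ example: true });
form.appendChild(field);

document.body.appendChild(form);
form.submit();                      // the visible page never refreshes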
Beyond that, AJAX, using a metaphor, acts as a very knowledgeable go-between. It will handle multiple requests, the status of those requests, and hand back the data in the format you need.
I just wanted to add this because I didn't see in any of the answers.
The reasons to use Ajax are mostly about control, which you get a lot of. These reasons have been mentioned above.
One serious downside of Ajax, though, is that it is a JS fix. JavaScript is a great language, but people have been throwing it at every problem for a while now, and things which could be optimized if they were built into the browsers are now instead being done slowly (compared to compiled languages) with JS.
iFrames are a great example of this. They represent an incredibly common use case, wanting to include some html in some other html. Unfortunately, they aren't very amazing at it, often creating more headache than anything else.
If you want to include something and not have it mess with your site, nor your site to mess with it, iFrames are great. For the more common use case of including some random html in some other html, Ajax is better.
And here is the point I'm trying to make: this is dumb. There is no reason there shouldn't be something like an iFrame that acts more like Ajax. But, by jumping on board (as all of us did) with Ajax, we are now left with no choice.
The biggest reason this is a problem is that JS was never meant to be the absolute building blocks of the internet. Further, it's being used by pretty much every site around to violate user privacy. So, if you're looking for a good reason to use iFrames, this is mine:
It feels good to not need JS. If you can make your site improved by JS rather than dependent on it, that's a hard earned accomplishment, and the site will feel less "hacky" overall.
Anyways, that's just my input.
In my experience, data loaded via AJAX is easier to manipulate than data inside an iFrame. Also, AJAX is really good for creating a better user experience. However, I am not sure I would necessarily put iFrames and AJAX in the same category, because AJAX is asynchronous content and an iFrame is really just another page being loaded from outside of your site.
Also, I could see iFraming creating SEO barriers and a bad user experience. Honestly, though, if I had access to the content I would prefer AJAX.
If I update the code in my CSS stylesheet, all pages that pull from that sheet will be updated with the adjustments made. Is there a way to do this with actual content that can be viewed on a web page (or pages)? I want to make changes in one place and have all the desired pages adjusted.
Can anyone push me in the right direction or point me to which tags I would use?
Thank you
There are several ways to do this, although none quite like CSS.
Server Side Code
This includes languages like ASP.NET, PHP, Ruby, and many others. Using server side code, you can create content areas that are usually controlled by a database (MySQL is a free database). When you store your content in a database, you can then pull that content out via server side code and place it on the page.
AJAX
AJAX is a relatively new method that also usually leverages the use of a database. Basically, when you need content, you send a call to your server (or database) via Javascript and it responds with the content you requested. You can then format the content how you wish. There are literally thousands of questions on StackOverflow about how to use AJAX. Most of them will reference jQuery.
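For instance, a rough sketch of that flow with jQuery; the /get-content.php endpoint and the { title, body } response shape are assumptions, not a real API:

// Ask the server for a piece of content and format it when it arrives.
$.getJSON('/get-content.php', { id: 'about-us' }, function (data) {
  $('#content').html('<h2>' + data.title + '</h2><p>' + data.body + '</p>');
});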
Content Management Systems (CMS)
While this is similar to the first two methods I listed (in that they usually leverage one or both methods) CMSs are different because they abstract the need to actually do any of the work yourself. They are usually pre-built systems where you just plug in your content and make some tweaks and you're good to go. Some examples of CMSs are Wordpress, Joomla, and Drupal.
jQuery.load()
If you get into jQuery at all, there is an easy method you could use to kind of replicate what you're trying to accomplish (one file that controls all your content). While it is definitely not the most highly-recommended method, so long as your site is not too big, it could work nicely. Basically you would put all your content into an .html file and separate them into divs with ids. Then to pull content from that file, you would use jQuery.load() plus the page fragments option (scroll down a bit on the jQuery.load() page) to pull in the desired content. Again, this is not really how I would go about doing it, but it is an option for a small bit of content you want to quickly change on the fly without incurring the overhead of setting up and maintaining a database.
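As a rough sketch of that single-file idea (the file name and ids are made up): content.html would hold all the shared blocks, e.g. a <div id="opening-hours">...</div>, and each page pulls in just the fragment it needs:

$(function () {
  // Page-fragment loading: "url selector" grabs only the matching element's contents.
  $('#hours').load('content.html #opening-hours');
  $('#contact').load('content.html #contact-details');
});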
If I understand you correctly, you want to apply the ideas of CSS (provide some handy definitions, use them everywhere) to "the rest of the HTML code".
If you are on a web server, you can do that using one of these technologies:
Server Side Includes
PHP
JSP
and probably many more that allow external file inclusion.
Sounds like you need either server-side includes or JavaScript AJAX loading.
If it's just the tag you want to know, then there are only two available tags (pieces of markup) for calling JavaScript code. It's either:
Inline: <script> ... code goes here ... </script>
External: <script src="filepath.js"></script>.
But if you are dealing with XHTML, then you have to include a CDATA between the <script> and </script>, e.g. <script><![CDATA[ ... inline code goes here ... ]]></script>.
However, if that doesn't answer your question, then a tag is not what you need, but JavaScript code.
I'm reading this article today. To be honest, I'm really interested in point "2. Much of your content is created by a server-side technology such as PHP or ASP.NET".
I want to check whether I have understood it :)
I create that PHP script (gethtmlsnapshot.php) where I include the server-side AJAX page (getdata.php) and I escape (for security) the parameters. Then I add it at the end of the static HTML page (index-movies.html). Right? Now...
1 - Where do I put that gethtmlsnapshot.php? In other words, I (or rather, the crawler) need to call that page. But if I don't have a link on the main page, the crawler can't call it :O How can the crawler call the page with the _escaped_fragment_ parameters? It can't know them if I don't specify them somewhere :)
2 - How can the crawler call that page with the parameters? As before, I need links to that script with the parameters, so the crawler can browse each page and save the content of the dynamic result.
Can you help me? And what do you think about this technique? Wouldn't it be better if the developers of crawlers made their own bots some other way? :)
Let me know what you think. Cheers
I think you got something wrong, so I'll try to explain what's going on here, including the background and alternatives, as this is indeed a very important topic that most of us have stumbled upon (or at least something similar) from time to time.
Using AJAX, or rather asynchronous incremental page updating (because most pages actually don't use XML but JSON), has enriched the web and provided a great user experience.
It has, however, also come at a price.
The main problem was clients that didn't support the XMLHttpRequest object or JavaScript at all.
In the beginning you had to provide backwards compatibility.
This was usually done by providing links, capturing the onclick event, and firing an AJAX call instead of reloading the page (if the client supported it).
Today almost every client supports the necessary functions.
So the problem today is search engines. Because they don't. Well, that's not entirely true, because they partly do (especially Google), but for other purposes.
Google evaluates certain JavaScript code to prevent blackhat SEO (for example, a link pointing somewhere but with JavaScript opening some completely different webpage, or HTML keyword code that is invisible to the client because it is removed by JavaScript, or the other way round).
But keeping it simple, it's best to think of a search engine crawler as a very basic browser with no CSS or JS support (it's the same with CSS: it's partly parsed, for special reasons).
So if you have "AJAX links" on your website, and the web crawler doesn't support following them using JavaScript, they just don't get crawled. Or do they?
Well, the answer is that JavaScript links (like document.location assignments) do get followed. Google is often intelligent enough to guess the target.
But AJAX calls are not made, simply because they return partial content, and no sensible whole page can be constructed from it, as the context is unknown and the unique URI doesn't represent the location of the content.
So there are basically 3 strategies to work around that.
have an onclick event on the links with a normal href attribute as a fallback (imo the best option, as it solves the problem for clients as well as search engines)
submit the content pages via your sitemap so they get indexed, but completely apart from your site links (usually pages provide a permalink to these URLs so that external pages link to them for the pagerank)
ajax crawling scheme
The idea is to have your JavaScript XMLHttpRequest calls entangled with corresponding href attributes that look like this:
www.example.com/ajax.php#!key=value
so the link looks like:
<a href="www.example.com/ajax.php#!page=imprint">go to my imprint</a>
The function handleajax could evaluate the document.location variable to fire the incremental asynchronous page update. It's also possible to pass an id or URL or whatever.
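A rough sketch of what such a handler could look like (the '#content' container and the parameter handling are illustrative, not the original poster's actual code):

// React to #! hash changes and load only the partial content.
function handleajax() {
  var hash = document.location.hash;              // e.g. "#!page=imprint"
  if (hash.indexOf('#!page=') !== 0) return;
  var page = hash.replace('#!page=', '');
  $.get('ajax.php', { page: page }, function (partialHtml) {
    $('#content').html(partialHtml);              // incremental update, no full reload
  });
}

window.addEventListener('hashchange', handleajax);
window.addEventListener('load', handleajax);      // also handle arriving directly on a #! URL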
The crawler, however, recognises the AJAX crawling scheme format and automatically fetches http://www.example.com/ajax.php?_escaped_fragment_=page=imprint instead of http://www.example.com/ajax.php#!page=imprint
So the query string then contains the escaped fragment, from which you can tell which partial content has been requested.
So you just have to make sure that http://www.example.com/ajax.php?_escaped_fragment_=page=imprint returns a full website that looks just the way the site should look to the user after the XMLHttpRequest update has been made.
A very elegant solution is also to pass the a element itself to the handler function, which then fetches the same URL as the crawler would have fetched, using AJAX but with additional parameters. Your server-side script then decides whether to deliver the whole page or just the partial content.
It's a very creative approach indeed, and here comes my personal pro/con analysis:
pro:
partially updated pages receive a unique identifier, at which point they are fully qualified resources in the semantic web
partially updated websites receive a unique identifier that can be presented by search engines
con:
it's just a fallback solution for search engines, not for clients without JavaScript
it provides opportunities for black hat SEO, so Google for sure won't adopt it fully, or won't rank pages using this technique highly without proper verification of the content
conclusion:
just plain links with fallback (legacy, working) href attributes plus an onclick handler are a better approach, because they also provide functionality for old browsers.
The main advantage of the AJAX crawling scheme is that partially updated websites get a unique URI, and you don't have to create duplicate content that somehow serves as the indexable and linkable counterpart.
You could argue that an AJAX crawling scheme implementation is more consistent and easier to implement. I think this is a question of your application design.