Integrate pages - html

I need to integrate a page from another site into my own. What are the ways to do it, apart from <object> and <iframe>? What if the site does not provide an RSS feed?

If the site doesn't provide an API to access its content and you don't want to use iframes, your only option is site scraping: a server-side script sends an HTTP request to the remote site, fetches the HTML, parses it, and passes the result to your views. Obviously this can raise copyright concerns, so make sure you have read the remote site's policy on it. It is also extremely fragile: because you are relying on HTML parsing, it can break the moment the remote site changes its structure.
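A minimal sketch of that approach in Python, assuming the remote page is plain server-rendered HTML; the URL and CSS selector below are placeholders you would replace with the real page and the element you want, and the requests and BeautifulSoup libraries handle the fetching and parsing:

    import requests
    from bs4 import BeautifulSoup

    def fetch_remote_fragment(url: str, selector: str) -> str:
        # Fetch the remote page and pull out just the element we want to embed.
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        fragment = soup.select_one(selector)      # e.g. "div#main-article"
        return str(fragment) if fragment else ""

    # Pass the returned markup on to your own view/template layer.
    print(fetch_remote_fragment("https://example.com/page", "div#content"))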

Related

Looking for a script or program to convert all HTTP links to HTTPS

I am trying to find a script or program to convert my HTML website's links from HTTP to HTTPS.
I have looked through hundreds of search results and web articles, and I used the WordPress SSL plugin, but it missed numerous pages with http links.
Below is one of thousands of my links I need to convert:
http://www.robert-b-ritter-jr.com/2015/11/30/blog-121-we-dont-need-the-required-minimum-distributions-rmds
I am looking for a way to do this quickly instead of one at a time.
The HTTPS Everywhere extension will automatically rewrite insecure HTTP requests to HTTPS. Keep in mind that not all websites offer a secure, encrypted connection.
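If your pages live in the WordPress database rather than in static files, a database search-and-replace (for example WP-CLI's wp search-replace command) is the usual route. For a folder of exported static .html files, a rough sketch in Python might look like the following; it assumes every linked host actually serves HTTPS, so you may want to restrict the substitution to your own domain:

    import re
    from pathlib import Path

    SITE_ROOT = Path("public_html")   # assumed location of the static site files
    PATTERN = re.compile(r'(href|src)="http://', re.IGNORECASE)

    for page in SITE_ROOT.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        updated = PATTERN.sub(r'\1="https://', text)
        if updated != text:
            page.write_text(updated, encoding="utf-8")
            print(f"updated {page}")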

Most Streamlined Way to use Basic Authentication with Web Application and CDN

I have a site whose pre-production environments use HTTP basic authentication to prevent unauthorized access. Recently, we've added a CDN (AWS CloudFront) and we intend to use basic authentication (FWIW, using Lambda@Edge) for those pre-production CDN environments as well.
While we've already implemented basic authentication on the web application (we're able to access the site after authentication), and have rudimentarily implemented basic authentication on the CDN (we're able to, say, access an image directly, after authentication), we're having trouble combining the two.
The web application includes images in the normal ways (e.g., via HTML and CSS includes). For instance, my site, https://www.example.com, has the following in its HTML:
<img src="https://cdn-files.example.com/foob.png" />
Using Chrome, when hitting the web application, I get a double-challenge (one for the app's domain and one for the CDN, each in turn), and the image loads.
Using Firefox, I get a single challenge, and the page loads, but the image fails to load (that request's response is 401).
Question 1: (Most streamlined option.) Is it possible, through the right configuration settings, to get the browser to pass through the credentials from the app's domain to the CDN domain? If so, what configurations are needed?
If not:
Question 2: (Less streamlined: Double-challenge.) What's the right combination of configurations (presumably, headers, etc.) to get the images, etc., to load on the web app?
I would prefer not to embed the credentials in the URLs, if at all possible.
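For context, here is a minimal sketch of the kind of Lambda@Edge viewer-request handler described above, written in Python (one of the supported Lambda@Edge runtimes). The credentials and realm are placeholders, and this alone does not make the browser forward credentials from the app's domain to the CDN domain, which is the open question:

    import base64

    # Placeholder credentials; in practice read them from a secure store.
    EXPECTED = "Basic " + base64.b64encode(b"preprod:s3cret").decode()

    def handler(event, context):
        request = event["Records"][0]["cf"]["request"]
        headers = request.get("headers", {})
        auth = headers.get("authorization", [])

        if auth and auth[0].get("value") == EXPECTED:
            # Credentials match: let CloudFront serve the object as usual.
            return request

        # Otherwise challenge the browser for credentials.
        return {
            "status": "401",
            "statusDescription": "Unauthorized",
            "headers": {
                "www-authenticate": [
                    {"key": "WWW-Authenticate",
                     "value": 'Basic realm="Pre-production CDN"'}
                ]
            },
        }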

React SPA app - serving separate and different prerendered static HTML for SEO, benefits and drawbacks

Are there any benefits or drawbacks to serving a light version of a page, optimized for SEO, when bots crawl the site, while regular visitors get the React SPA, which is entirely a JavaScript application?
Basically the question is: is it an accepted practice to serve a short HTML version that contains only the SEO-relevant content and strips out everything else for bots, and the full page for users?
Is there any use case or example where somebody has used this technique?
This would be seen as cloaking by the crawlers and could get your site penalized in the search results. If you are serving a prerendered page, you will want to make sure it is exactly the page your users will see after the JavaScript has executed, in order to prevent any cloaking issues.
You could also use Prerender:
The Prerender.io middleware that you install on your server will check each request to see if it's a request from a crawler. If it is a request from a crawler, the middleware will send a request to Prerender.io for the static HTML of that page. If not, the request will continue on to your normal server routes. The crawler never knows that you are using Prerender.io since the response always goes through your server.
See also: Quora SEO post, taekwondomonfils.com SEO.
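For what it's worth, the same user-agent dispatch can be sketched in a few lines of Python with Flask. This is not the official Prerender.io middleware, and the service URL, token, and bot list below are illustrative assumptions:

    import requests
    from flask import Flask, request

    app = Flask(__name__)

    PRERENDER_SERVICE = "https://service.prerender.io/"   # assumed service URL
    BOT_AGENTS = ("googlebot", "bingbot", "yandex", "baiduspider",
                  "twitterbot", "facebookexternalhit")

    def is_crawler(user_agent):
        ua = (user_agent or "").lower()
        return any(bot in ua for bot in BOT_AGENTS)

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def serve(path):
        if is_crawler(request.headers.get("User-Agent", "")):
            # Crawlers get the prerendered, fully-executed HTML for this URL.
            resp = requests.get(PRERENDER_SERVICE + request.url,
                                headers={"X-Prerender-Token": "YOUR_TOKEN"})
            return resp.text, resp.status_code
        # Normal visitors get the SPA shell and run the JavaScript themselves.
        return app.send_static_file("index.html")   # assumes index.html is in static/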

Getting information from a website in Processing?

I am currently making a Processing program, where part of it needs to access some information from a website. The website is an HTML file where some information is stored, which I need to access and parse. I know how to open an HTML file, but my problem is that the data I need is in a list that is only generated after a login on the website. How do I do that?
This is the website, right after loading the HTML file:
http://i.imgur.com/kGIkyle.png
After a login, the website will begin to spit out data every two seconds.
I want to access the data in the ordered list, and I want to access it every two seconds in my Processing program. How do I do that?
This is the website after a login, a moment later:
http://i.imgur.com/O743fNJ.png
When you use a web browser to submit a login, you're really interacting with the server. Usually the web browser submits a POST request containing the login information (like a username and password), and the server responds with the next webpage to load.
The details of this are going to depend on the website you're interacting with. Some websites might use AJAX to submit the data and then trigger some JavaScript to run.
The point is, you're going to have to understand exactly how the underlying web server and webpage works. Then you're going to have to use the rules of those interactions to issue the appropriate requests from your Processing code.
It might be as simple as submitting the login credentials in the URL itself and then just scraping the information from the webpage.
More likely, you're going to have to interact with some kind of web API and do the requests yourself. Google "Java post request" for more info.
Of course, all of this assumes that the website is open to people using it. If this website isn't yours, it could also be locked down and unavailable to you.
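As a rough sketch of that flow (written in Python with the requests library; in your Processing sketch the same steps would be done with Java's HTTP classes), assuming a plain form POST for the login. The URL, form-field names, and CSS selector are placeholders for whatever the real site uses:

    import time
    import requests
    from bs4 import BeautifulSoup

    session = requests.Session()   # keeps the login cookie between requests

    # Step 1: submit the login form the same way the browser's POST does.
    session.post("https://example.com/login",
                 data={"username": "me", "password": "secret"})

    # Step 2: poll the page every two seconds and read the ordered list.
    while True:
        html = session.get("https://example.com/data").text
        for item in BeautifulSoup(html, "html.parser").select("ol li"):
            print(item.get_text(strip=True))
        time.sleep(2)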

Generating a web site from .xsn files

As we all know, the InfoPath Forms Services component on a SharePoint server generates a web site each time we publish an InfoPath form template to the SharePoint server.
Here is the question: how does SharePoint do that? Is there any way for us to do it programmatically via some kind of API provided by Microsoft?
In fact, what I need to do is get all the HTML, JS, CSS, etc. files and apply some operations, like deleting some divs or inserting some HTML into a particular web page. I have come up with two ways to do this:
1. Generate the web page via the SharePoint API and apply those operations at the same time.
2. Extract the web page files from the IIS server and apply those operations.
I am totally new to this kind of work. All I have in mind is that when we right-click a web page in the browser and choose to save it, the browser fetches the files needed to render the page and makes it possible for us to browse it offline.
HTTrack, WinWSD, and tools like that seem to work fine for extracting HTML files from online web pages, but not that well for the JS and CSS files.
Now I am trying to dig into the Chromium project for some kind of inspiration, although whether that will help is unpredictable.
Any kind of advice will be appreciated.
Best Regards,
Jordan
InfoPath .xsn files are just ZIP files with a different extension. You can rename the extension to .zip and extract the files. You will find a number of files that make up the form; the two main ones are the .xml and .xsl files. The .xsl contains the HTML to generate when applied to the XML.
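As a quick sketch of that in Python, treating the .xsn as a ZIP and then applying one of its .xsl views to the sample .xml data with lxml; the member names view1.xsl and template.xml are assumptions, so inspect the extracted files to see what your particular form template contains:

    import zipfile
    from lxml import etree

    # The .xsn is a ZIP archive in disguise; unpack it first.
    with zipfile.ZipFile("MyForm.xsn") as xsn:
        xsn.extractall("myform_extracted")

    # Apply a view stylesheet to the sample data to get the generated markup.
    xml_doc = etree.parse("myform_extracted/template.xml")
    transform = etree.XSLT(etree.parse("myform_extracted/view1.xsl"))
    html = transform(xml_doc)

    with open("form.html", "wb") as out:
        out.write(etree.tostring(html, pretty_print=True))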