What is an HTML snapshot (for the Google crawler)?

I've been searching for an example of an "HTML snapshot" for the Googlebot crawler, but I still don't have a clue what the "snapshot" looks like. From the way I understand it, I figure it's my page's HTML put together into one large string?
Thanks a lot!

You are right, that's exactly what it is. An HTML snapshot is the static HTML code that you want Google to find when crawling your website. Ten years ago, it was pretty much the same thing as the HTML source code. Today, especially in SPAs (single-page applications), the HTML changes without reloading the page. That means there is not always a proper URL associated with every possible state of the HTML. The snapshot is the generated HTML that you want to present to Google.
That's why you can find products such as https://ajaxsnapshots.com/
It's JavaScript code that takes "pictures" of your HTML pages as they are generated, to make sure that the code fetched by Googlebot is meaningful.
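To make the "one large string" idea concrete, here is a minimal sketch in JavaScript. It is not how any particular snapshot product works; the `renderSnapshot` helper and the shape of the state object (`title`, `items`) are assumptions made up for illustration.

```javascript
// Escape text so it is safe to place inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: turn the current state of a single-page application
// into one static HTML string (the "snapshot").
function renderSnapshot(state) {
  const items = state.items
    .map((item) => '<li>' + escapeHtml(item) + '</li>')
    .join('');
  return (
    '<!DOCTYPE html><html><head><title>' + escapeHtml(state.title) +
    '</title></head><body><h1>' + escapeHtml(state.title) +
    '</h1><ul>' + items + '</ul></body></html>'
  );
}

// Example state; in a real application this would come from the app itself.
console.log(renderSnapshot({ title: 'Products', items: ['Tea', 'Coffee & Cake'] }));
```

The string that comes out is what a crawler would receive instead of the empty shell page that the JavaScript normally fills in.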

Related

Add html to a site in a site (proxy)

I imported a web proxy from GitHub known as Rhodium onto Replit and, after some editing, was satisfied with the results, but I can't seem to add HTML to a site that is proxied. Example: you use Rhodium to navigate to www.discord.com, and you want HTML added to the page served at "yourdomain.example/service/https://discord.com/". I looked at the files and online, but I wasn't able to find a way to edit the index.html of that specific page. Frankly, I am extremely new to HTML (and to a lot of things in web development).
https://github.com/LudicrousDevelopment/Rhodium
Any help available?
Based on what I know, you can't, because of browser security restrictions: you can't attach to or modify a website that isn't on the same directory/server.
You can, however, redirect to that site, inside or outside, freely.

How do websites keep the header on every page if the header is in html?

So, I'm trying to make a website, but the problem is I can't find the most effective way to keep the header on every single page. My header is HTML code, and it is the most important source of navigation on the website. The tabs navigate using links to other HTML files (all located locally on my computer) and so every single new page is another separate HTML file. Here are the many different methods I used that all fell short in one way or another:
The most basic way: Copying the header code to EVERY HTML page on the website. I am currently using this method, and it is probably the most ineffective and stupid method ever. The downside (which is pretty obvious) is that not only is it tedious but every time I make a change to the header (like maybe add different menus, add another tab, change the image, etc.) I have to copy the new header code to everything else. That is ridiculous!!
I tried using the w3schools method of implementing a separate HTML file (with only the HTML code) onto the page HTML files. So, I have this 1 HTML file for the header that every page uses so I make a change in that one file and it automatically applies to everything else. However, it didn't let me organize the numerous HTML files effectively because unlike referencing a stylesheet like some file named 'style.css', it doesn't let me put the HTML sheet in a folder that doesn't share the same parent folder as the referencing HTML page files. Hopefully that made sense, but basically, I couldn't get a folder that separated the HTML menu tab files ("pages") and the HTML content files ("posts") without the w3school code failing. Here's the link: https://www.w3schools.com/howto/howto_html_include.asp
I've seen other options on Stack Overflow, like getting around the "can't implement HTML files" problem by using JS files with HTML code in a document.write(), but this seems very hard to adopt given all my progress so far. Also, I am very uncomfortable with the idea of using document.write because it is probably still very different from a true HTML file. Seriously, why is there no way to include HTML the way stylesheets and scripts can be included??? (<script src="b.js"></script> and <link rel="stylesheet" href="css/style.css" type="text/css">)
Using jQuery. I understand this the least (being an amateur programmer) but I've heard it isn't consistent either. It doesn't seem to work on a local file, and that sounds like a nightmare. Though, if there are good suggestions, having a jquery file tag along seems not the best solution but still a plausible solution.
So, I'm in great trouble. How do other websites do this? Do they use different files??? Do they use PHP files?? Am I going to have to scrap all my hard header HTML work and styling because PHP is another language?? Do I have to use Angular.js??? This is so complicated!
Hopefully, this question made some sense. Please ask if you have questions. Thanks in advance.
UPDATE
After checking numerous other posts on Stackoverflow suggesting PHP, I got my HTML files and then renamed it from "index.html" to "index.php", and holy macro it actually still behaved like an HTML file even if it wasn't!! Now I need to find a way to put:
include("header.php");
into my page PHP files that are actually in HTML code to reference a separate PHP file that has my header. How do I do that? Does it belong in like script tags or something? How do I add PHP code in a PHP file written in HTML code? Thanks for the answers to my previous question, I'm so sorry I should've read the answers on Stackoverflow more thoroughly first.
So I know it's been awhile since I asked the question and probably nobody cares anymore, but I just want to post an update after finding a solution to my question about using php code and how it all works.
First, I learned that in order for this to work, all my files had to be in php format. So I pulled up my folder of my local HTML files and literally just renamed it from something like "index.html" to "index.php". Then, without changing the HTML code, I opened it up in my browser and it was like nothing happened, except it was better! Now it can not only read HTML and style and script codes, but also php codes as well! I added:
<?php
include("header.php");
?>
to the top of my index.php file, for example, and then converted the rest of the files into php format like I did for this. I copied over my header html and css code and saved it in a separate php file in the same folder, and - there was no header. I was confused. What?? Why is it not doing anything? The header.php itself is working, why is the include function not??
Then I learned that the PHP include can't run when the file is opened straight from my local drive: PHP code is executed by a web server with PHP installed, so it does nothing locally but works on a real web hosting service. I then installed XAMPP, a free and commonly used PHP development environment built on Apache. It runs a local web server that executes the PHP code the way I intended. I'm sorry I'm not good at explaining how this works, as I just found it and use it. Anyway, XAMPP made the PHP code above actually do its job, and I finally got the header system I always wanted. Happily ever after, right?
Nope. Now that fundamental stuff is gone, I have to face other problems like formatting (a real pain in the a** considering how I have to find css problems in tons and tons of overlapping code), creating an entire personal search system (having to figure out how to make a php file actually use my brand new MySQL database, which is also run by XAMPP), and lots of other things. But, that sounds like a great adventure that I am willing and definitely eager to go through. Now, finally I am done blabbing for the day...I wonder how many hours of other people's time I just wasted.
Oh yeah..I forgot to mention, happy Fourth of July! (and happy birthday to the beloved Captain America)
Using JavaScript and jQuery is a very easy way to accomplish this. First, just build a sample JavaScript file. Inside, make functions that are run on page load. For example,
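function buildPage() {
  var html = '';
  // Build the HTML string here...
  // ...then replace the contents of the element with id "html-id".
  // Note the "#": without it jQuery looks for a <html-id> tag instead of an id.
  $('#html-id').empty().append(html);
}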
This way, each time the HTML is built you can just empty the element (clear whatever is in the id 'html-id') and then add your specific HTML. For example,
<html>
<head><title>Page title</title></head>
<body>
<header>Put header here!</header>
<div>Put tabs with onclick events here</div>
<div id="html-id"></div>
</body>
</html>
Each time a different tab is clicked, the buildPage() function should be called in order to build the page accordingly. No multiple html headers needed!
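As a rough sketch of this approach, the content for each tab can be kept in a lookup and rendered by one function. The tab names, the markup, and the `getPageHtml` helper are made up for illustration; `buildPage` uses the same jQuery `empty().append()` pattern and needs jQuery loaded on the page.

```javascript
// Hypothetical content for each tab; replace with your own markup.
const pages = {
  home: '<h1>Home</h1><p>Welcome!</p>',
  posts: '<h1>Posts</h1><p>Nothing here yet.</p>',
};

// Return the HTML for a tab, falling back to a "not found" message.
function getPageHtml(tab) {
  return Object.prototype.hasOwnProperty.call(pages, tab)
    ? pages[tab]
    : '<h1>Not found</h1>';
}

// Replace the contents of #html-id with the selected tab's HTML.
// Only call this in a browser where jQuery ($) is available.
function buildPage(tab) {
  $('#html-id').empty().append(getPageHtml(tab));
}

// In the page, wire this up from a tab's click handler, for example:
// <a href="#" onclick="buildPage('posts'); return false;">Posts</a>
```

Keeping the lookup separate from the jQuery call makes it easy to check the generated HTML without opening a browser.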
Write something like this:
<html>
<head>
<title>First page</title>
</head>
<body>
<?php include("header.php"); ?>
<!-- rest of your code -->
<?php include("footer.php"); ?>
</body>
</html>
This is the recommended way to do it. WordPress works like this too: it includes files into the main PHP file.
Note: the pages that contain the include statements have to be .php files.
Maybe this can help:
Include another HTML file in a HTML file
You can make one header.html and include it in all other html files of your website.

an html tag for displaying html received from a specific url

I created an API of sorts, that when you navigate to it, returns information in html.
On my website, I would like to have the web page reach out to the API and display the information as part of the web page (sort of like a webpage reaches out for an img). What HTML tag would be best suited to achieving this result? I came across the and tags but not really sure which would be best.
I am building this myself thus have full control over how the content is delivered back to the page. Is there specific pattern that is used for such "modular" sourcing of information? I could rewrite my website to - prior to serving the web page - reach out to the api and pull the info itself and then include the results in html but a) this would be more complex and require changes in several places b) will become really complex as the number of such api call results I would want to include increases.
You can use an iframe for this purpose. When you receive the HTML that you want to display, you can simply write it into that iframe's document:
document.getElementById('myIframe').contentWindow.document.write("<html><body>Here is your html</body></html>");
Hope this helps.
As far as I know, using iframes for this is rather discouraged (though the iframe element itself is not deprecated). I always use div tags for such tasks.
document.getElementById("targetdiv").innerHTML = "New HTML-Content";
More info on divs: http://www.w3schools.com/tags/tag_div.asp
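A minimal sketch of the div approach, assuming the API is reachable from the page (same origin, or with CORS enabled) and that its HTML is trusted, since `innerHTML` inserts it as-is. The `/api/info` URL is a placeholder, and `loadHtmlInto` is a made-up helper name; the fetch function is a parameter only so the snippet can be tried outside a browser.

```javascript
// Fetch HTML from `url` and place it inside `element`.
// `fetchImpl` defaults to the browser's fetch.
async function loadHtmlInto(element, url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  element.innerHTML = await response.text();
  return element;
}

// Browser usage (placeholder URL):
// loadHtmlInto(document.getElementById('targetdiv'), '/api/info')
//   .catch((err) => console.error(err));
```

Each additional API result only needs another target div and another call, so the page itself does not have to change when more fragments are added.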

live content from html to html

I'm using UIWebView to display data from my organization (publicity and legal). However, I only want to pull specific data from the HTML file rather than loading the whole URL. For example, I want to pull the "News" section of the HTML and have the user stay on that page only, without being able to go to other parts of the website (e.g. home page, contact us), while still allowing them to view the PDF articles linked from the HTML file.
I've asked around and read up on the DOM and screen scraping, but it seems that the data pulled that way is stored in a database instead.
Is there any way I can pull just the HTML "News" section with the PDF URLs into my customized HTML file and have it update live? Maybe every 30 seconds it would refresh and pull information from the website so that the content and the list of PDFs are up to date (e.g. if 3 new articles are added to the main website, my customized HTML file would also refresh, pull the information, and update my article list).
If anyone can point me to a specific method that allows live HTML-to-HTML data passing, that would be great, and I can go do more research on it. I'm currently very lost and confused, as it is my first time doing this. Any help/feedback will be very much appreciated :)
EDIT: For example, Google Maps or Google Search. I don't want to use the whole Google web page, just the important thing that I want, like the search results or the map display.
This will involve quite a lot of learning on your part: you'll have to learn HTML, the DOM, JavaScript, and iOS/UIWebView.
Let's leave the live refresh part for now; I'll post another answer or edit this one later on.
That's not going to be easy either (check out my earlier posting today, "iOS Run Code Once a Day", on background execution issues that will affect you, unless the update is only to take place in the foreground).
You will have to do something like this. Note that I've never tried this, nor seen postings from people on here who have, but in theory it should work. There will be a lot of learning, as I've said, and lots of trial and error. It's a big task when you're not familiar with these things.
1) Download the HTML page and load it in a UIWebView, but keep that UIWebView hidden so the user can't see it.
2) When the page has loaded, its DOM will be accessible.
3) You can use JavaScript to access the DOM and look for the parts you want.
How you inject and run the JavaScript in UIWebView can be answered in a separate question (this answer would get too long if all the exact details were included).
4) Remove the parts of the DOM you are not interested in, or use events to make only the parts you are interested in appear. jQuery can probably help here.
5) Display the UIWebView.
Alternatively, the HTML could be saved to a file and string parsing could be used to search for the bits you are looking for and create a new HTML text file from them. I think this would get very messy; it is better to take advantage of the fact that UIWebView will parse the HTML page and create the DOM for you.
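Steps 3 and 4 could look roughly like the following JavaScript, which would be injected into the web view once the page has loaded. Like the answer itself, this is untested against the real site: the `#news` selector is an assumption, and the actual page markup determines what to use. The document is passed in as a parameter so the function is easy to try in isolation.

```javascript
// Keep only the element matching `selector` and drop everything else
// from the page body. Returns false if the section was not found.
function keepOnlySection(doc, selector) {
  const section = doc.querySelector(selector);
  if (!section) {
    return false;
  }
  // Remove every existing child of <body>...
  while (doc.body.firstChild) {
    doc.body.removeChild(doc.body.firstChild);
  }
  // ...and put back just the section we want to show.
  doc.body.appendChild(section);
  return true;
}

// Inside the web view this would be called as:
// keepOnlySection(document, '#news');
```

Because the navigation links are removed along with the rest of the page, the user has nothing to tap that would lead to the home or contact pages, while links inside the kept section (such as the PDFs) remain.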

How should I handle autolinking in wiki page content?

What I mean by autolinking is the process by which wiki links inlined in page content are generated into either a hyperlink to the page (if it does exist) or a create link (if the page doesn't exist).
With the parser I am using, this is a two step process - first, the page content is parsed and all of the links to wiki pages from the source markup are extracted. Then, I feed an array of the existing pages back to the parser, before the final HTML markup is generated.
What is the best way to handle this process? It seems as if I need to keep a cached list of every single page on the site, rather than having to extract the index of page titles each time. Or is it better to check each link separately to see if it exists? This might result in a lot of database lookups if the list wasn't cached. Would this still be viable for a larger wiki site with thousands of pages?
In my own wiki I check all the links (without caching), but my wiki is only used by a few people internally. You should benchmark stuff like this.
In my own wiki system my caching is pretty simple: when a page is updated, it checks the links to make sure they are valid and applies the correct formatting/location to those that aren't. The cached page is saved as an HTML page in my cache root.
Pages that are marked as 'not created' during the page update are inserted into a database table that holds the missing page's name along with a CSV of the pages that link to it.
When someone creates that page it initiates a scan to look through each linking page and re-caches the linking page with the correct link and formatting.
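The bookkeeping described in this answer can be sketched in a few lines. This is not the answerer's code: it keeps the 'not created' table in memory as a Map where a real wiki would use the database table, and the names `recordMissingLink` and `pageCreated` are made up.

```javascript
// Maps the title of a page that does not exist yet
// to the set of pages that link to it.
const missingPages = new Map();

// Called while caching `fromPage` when it links to a page that is missing.
function recordMissingLink(missingTitle, fromPage) {
  if (!missingPages.has(missingTitle)) {
    missingPages.set(missingTitle, new Set());
  }
  missingPages.get(missingTitle).add(fromPage);
}

// Called when `title` is created. Returns the pages whose cached HTML
// must be regenerated, and forgets the entry.
function pageCreated(title) {
  const linkingPages = missingPages.get(title) || new Set();
  missingPages.delete(title);
  return Array.from(linkingPages);
}

recordMissingLink('Sandbox', 'Home');
recordMissingLink('Sandbox', 'Help');
console.log(pageCreated('Sandbox')); // the pages to re-cache: Home and Help
```

Only the pages returned by `pageCreated` need to be re-cached, so creating a page never triggers a scan of the whole wiki.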
If you weren't interested in highlighting non-created pages however you could just have a checker to see if the page is created when you attempt to access it - and if not redirect to the creation page. Then just link to pages as normal in other articles.
I tried to do this once and it was a nightmare! My solution was a nasty loop in a SQL procedure, and I don't recommend it.
One thing that gave me trouble was deciding what link to use on a multi-word phrase. Say you had some text saying "I am using Stack Overflow" and your wiki had 3 pages called "stack", "overflow" and "stack overflow"....which part of your phrase gets linked to where? It will happen!
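One common way to settle this is to prefer the longest matching title. A sketch, assuming the text is plain (not yet HTML), that titles are matched case-insensitively on word boundaries, and that pages live under a hypothetical `/wiki/` path:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Link every occurrence of a known title, preferring longer titles.
function linkPhrases(text, titles) {
  if (titles.length === 0) {
    return text;
  }
  // Longest titles first, so "stack overflow" is tried before "stack".
  const alternatives = titles
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  const pattern = new RegExp('\\b(' + alternatives + ')\\b', 'gi');
  return text.replace(pattern, (match) =>
    '<a href="/wiki/' + encodeURIComponent(match.toLowerCase()) + '">' + match + '</a>'
  );
}

console.log(linkPhrases('I am using Stack Overflow', ['stack', 'overflow', 'stack overflow']));
```

With the three titles from the example, the whole phrase "Stack Overflow" becomes a single link to the "stack overflow" page, and the shorter titles are only used where the longer one does not match.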
My idea would be to query the titles like SELECT title FROM articles and simply check if each wikilink is in that array of strings. If it is you link to the page, if not, you link to the create page.
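A small sketch of that idea in JavaScript. It assumes links are written as `[[Title]]` with plain-text titles and that the result of `SELECT title FROM articles` has already been loaded into an array; the `/wiki/` and `/create/` paths are placeholders. Putting the titles in a Set keeps each lookup cheap even with thousands of pages.

```javascript
// Replace [[Title]] markers with a view link or a create link,
// depending on whether the title is in the list of existing pages.
function autolink(content, existingTitles) {
  const known = new Set(existingTitles.map((t) => t.toLowerCase()));
  return content.replace(/\[\[([^\[\]]+)\]\]/g, (whole, title) => {
    const name = title.trim();
    const slug = encodeURIComponent(name);
    return known.has(name.toLowerCase())
      ? '<a href="/wiki/' + slug + '">' + name + '</a>'
      : '<a class="new" href="/create/' + slug + '">' + name + '</a>';
  });
}

const titles = ['Home', 'Stack Overflow']; // e.g. rows from the query above
console.log(autolink('See [[Home]] and [[Sandbox]].', titles));
```

The title list is read once per rendered page rather than once per link, which avoids the many separate database lookups the question worries about.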
In a personal project I made with Sinatra, after I run the content through Markdown, I do a gsub to replace wiki words and other things (like [[Here is my link]] and whatnot) with proper links, checking for each one whether the page exists and linking to create or view accordingly.
It's not the best, but I didn't build this app with caching/speed in mind. It's a simple, low-resource wiki.
If speed were more important, you could wrap the app in something to cache it. For example, Sinatra can be wrapped with Rack caching.
Based on my experience developing Juli, an offline personal wiki with autolinking, generating static HTML may fix your issue.
As you suspect, it takes a long time to generate autolinked wiki pages. However, with static HTML generation, autolinked pages only need to be regenerated when a wiki page is added or deleted (in other words, not when a page is merely updated), and the regeneration can be done in the background, so usually I don't mind how long it takes. The user only ever sees the generated static HTML.