Sitemap for site with few dynamic HTML files and many possible URLs - html

I'm nearing the end of my first web development project and I'm looking to build a sitemap for our website as part of Search Engine Optimalisation. If I understand correctly a sitemap, when done correctly, is a file that shows a content tree (similar to paths in windows explorer) to all the public pages of my website.
For the purpose of my question you're going to need some background information on the site and how it works. The site is about bird migration, a user enters the site on a homepage that holds a searchbox, he or she is able to search for a species of birds and if we have data on it the user is able to go to a seperate page with information on this bird. From there the user can access statistical data about this species. The page will look something like below, filled with content that we get from a database.
The URL will look something like http://domain.com/searchbird.html?bird=Sedge%20Warbler?lang=1 for the informational page, and http://domain.com/statistics.html?bird=Sedge%20Warbler?lang=1 for the statistical page.
Every bird species uses the same base HTML file (searchbird.html) that is filled with data based on the ?bird="" parameter. I have about four HTML files in my webroot (lets call them: index.html, searchbird.html, statistics.html, about.html).
So when I go to create a sitemap using some sort of sitemap generation tool, I get a sitemap that contains those 4 .html files, which is great! Yet I'm missing the 500 bird species that users are going to be able to find.
Is there a way for me to include every possible URL in the sitemap automatically, and how would I go about doing such a thing? I've used HTML, CSS and Javascript in the past. but I'm only a beginner. If an executable tool exists for this that'd be great, but my Google searches haven't been successful yet.

You have to generate the list of URLs for your existing pages.
So dig into your data source (database or whatever you use), find all existing bird species, and generate the two URLs per species.
Directory for users/bots
It would probably be a good idea (for visitors as well as for bots) to output these links on your website, too. Visitors would have two ways to find a species (search for it or browse the directory), and as most bots don’t use search functions, they wouldn’t be able to find the links on your site otherwise (they would have to use your sitemap, which not all bots do, or they would have to hope to find the links from some other external website).
(If you do this, you could also use a sitemap generator service; but it’s usually better do generate it yourself.)
URL design
By the way, you might want to consider changing your URL design to a more human-friendly one. Instead of
http://example.com/searchbird.html?bird=Sedge%20Warbler?lang=1
http://example.com/statistics.html?bird=Sedge%20Warbler?lang=1
you could use something like
http://example.com/en/birds/sedge-warbler
http://example.com/en/birds/sedge-warbler/statistics
where en is the language code for "English" (these are standardized, and users have a chance to understand them, contrary to lang=1), and where http://example.com/en/birds could lead to the page listing all species. For other languages, you would of course ideally translate "birds" and "statistics".
Changing the URL design is possible with URL rewriting.

U can use sitemap generator. U can use https://www.xml-sitemaps.com/. U only need put url index. That website will search all link and generate sitemap automatically.
If u use wordpress u can use plugin wordpress like https://wordpress.org/plugins/google-sitemap-generator/.
Hope that help

Related

How to convert Javascript, CSS, and HTML content into a interactive-pdf or .h5p page

I have a webapp that let users place dots on sitemap and link them to images.
The web app uses Javascript, CSS, and HTML.
phase1
While the user is subscribed he uses a rich set of functionalities to:
add dots on the sitemap and link them to images
edit the dots: move, delete, link momultiple images etc ..
etc..
This is done via the website that hosts the webapp.
phase2
When the user ends the subscription, he gets a .zip file with the information that he created (sitemap, images, links between the sitemap and the images, etc..).
The user can then connect to the website that hosts the webapp, without signing in and get a subset of the functionalities (e.g. he can only click on the dots and see the linked images, but he can no longer edit the dots or add images).
I want to change phase2.
Instead of interacting with the webapp on the website, I want to "freeze" the webapp into a interactive-pdf, or h5p page that can be played independently without the webapp.
There are multiple reasons that motivate to do this:
the webapp is complex, so engaging with the webapp is prone to more errors.
If the small subset functionality of the final data, which boils down to showing the image when clicking on the hyperlink, can be done via h5p browsing, then the risks for runtime errors are greatly reduced.
the interactive-pdf or .h5p file can be browsed by variety of tools potentially even when being offline.
the end product can be re-designed to appear more simple.
My questions:
is it possible to programatically convert the Javascript, CSS, and HTML content into a interactive-pdf or .h5p page?
Every end-product will be different (e.g. by the number of dots, and their location in the sitemap) so having to manually create the .h5p page every time is not practical.
are there mobile apps (e.g. on Apple Store, or Google Play) that can read .h5p content locally, e.g. when the device is offline?
Thanks
EDIT:
Oliver Tacke, thank you for replying.
Up to few days ago, looking for a solution to my problem, I did not hear about h5p at all.
When looking into h5p, I see that
many comments rlated to h5p that is a bit old - from ~5/6 years ago.
h5p is frequently talked in context of education (e.g. Moodle)
when I filed the question I could not even find a tag for 'h5p'
I could not find forums for h5p in mainstream channels like Discourse or Slack
So I want to know if I'm in the right direction at all.
Is h5p a new thing that just takes time to pick up, or is it something that started a while ago and dwindlled down,
or maybe I'm wrong and it is currently more active than I think (I'm aware of h5p.org and I do see activity there).
Basically, I want to create interactive content that can work
ideally offline, or
online but with a mainstream browser/tool/website (i.e. without needing my special website)
In the design industry, I know there are interactive catalogues.
But I don't know if the user can download them and somehow (e.g. with an epub reader) read them.
Thanks
I don't know anything about creating PDFs programmatically, so I can only offer a partial answer for the H5P related part. Given the broad scope of your question, this may be acceptable as a comment.
H5P content follows a specification that is documented at https://h5p.org/documentation/developers/h5p-specification.
You would basically have to implement an H5P content type library (file) from the files that you are given by the service. I assume that the JavaScript and CSS files are always the same, then those could be reused directly (but potentially not legally). You would also have to add some more JavaScript that takes parameters and generates the HTML output that you get from the service. You would then have to model semantics.json to suit the parameters, and then you essentially have an H5P content type. You don't have to use the then available form based editor (which probably wouldn't make sense), but you could create the content.json file programmatically and put it into the H5P content file archive. To create that file programmatically, you'd have to create a converter that identities the parameters in the HTML file generated by that service and transform them into the H5P semantics/content format. Not sure if it made more sense to rather create an editor widget for H5P, so you wouldn't have to depend on the other service at all.
There are currently no known mobile apps that allow you to load and run H5P content. They are on the roadmap of the H5P core team, but I wouldn't expect them to work on those any time soon. There's the moodle app for the moodle LMS that allows to use H5P content offline, but it needs to be fetched from a moodle instance. There's Lumi that allows to run H5P content locally on Windows, MacOS and Linux, but not on Android or iOS. However, Lumi also allows to create single standalone HTML files from H5P content containing all the content and logic ready to play, so that would allow offline use on Android and iOS.

Sitemap generator finds only one page

I've tried to generate an XML sitemap from different sites (XML-sitemaps, Seoutility, MrWebMaster) but they all create a one-page sitemap.
In other words, they can't find subdirectories into my website.
I've tried all www.mydomain.online, mydomain.online, https://mydomain.online, and https://www.mydomain.online URLs, but nothing changed.
I didn't implement URL rewriting, so my website structure is:
folder
|-----index.php
|-----something.inc.php
|-----subfolder1
|-----index.php
|-----somethingelse.inc.hp
|subfolder2
|-----index.php
|-----foo.inc.php
|-----bar.inc.php
|-----subfolder4
|-----index.php
|-----baz.inc.php
|subfolder3
|-----index.php
and so on...
The problem is that even Gooogle Search Console indexing system does only find one page into my website (and that's why I decided to build a sitemap).
Do I have to change something in my configuration or tell it to the generators in some ways to make them work?
My guesses:
your site's rendering is javascript-based and you use no technic to emulate urls,
your pages aren't linked with each other, beside of the startpage.
Without to see your site in natura it isn't possible to say more.

Using an HTML template across multiple subdirectories

I run a website for my photography where I have a stories page (http://www.traumantic.com/stories.htm) that is a long list of choices that lead to a sub folder and a gallery of images for that session.
I have an index.htm file in each of those folders that displays the gallery chosen.
I am trying to develop a new format for my pages, and putting it in place means replacing dozens of index.htm files and editing each one for that new format. A boatload of work.
I have noted that a lot of news sites seems to have a method of using a single template for the main body of the page and the elements of the news story are pulled in from another source.
I figured I could do this with XML like I did with my galleries, but I am lost.
I tried creating an XML file in a couple of text folders and then reading that form an HTM file two levels up. Didn't work.
Currently when you click on a link on my stories page, it opens the index.htm file in a sub-folder.
What I want to happen is this.
Clicking on a choice on my stories page launches an html template that reads the details from the folder.
The one html template would be used for all of the different story folders below. Making it far easier to modify the look of my web site quickly.
I'd rather put a ton a of work into designing this system that doing a mass replace and edit project on hundreds of files.
I hope this makes sense to some of you and that you can guide me to some study topics that will help me learn how to do this.
I am seeking advice on places where I can see example of this process.
The simplest option is to use an iframe
https://www.w3schools.com/tags/tag_iframe.asp
<iframe src="/path/to/file.html"></iframe>
Searching "html include" will yield a few guides that have various JavaScript implementations. (e.g., https://www.w3schools.com/howto/howto_html_include.asp)
If you're able to run php, you could use include
https://www.w3schools.com/php/php_includes.asp
But at that point, you might want to consider installing some sort of template engine like twig https://twig.symfony.com/doc/2.x/intro.html

Mediawiki - Automatic two-way links between page sections

I want my MediaWiki install to have two classes of pages. (In the users' eyes - the wiki won't have to know the difference.)
I want some pages to be on topics, and others on sources (name of book, video, etc.)
I want to have a topic page "FAA Licenses" like:
==Medical Certificates==
===3rd Class===
Required for student license, and before student solo flights. {{{link/reference/whatever generally around here to Jeppesen Book#pg27-28}}}
And a source page "Jeppesen Book" like:
==pg27-28==
{{{link to FAA Licenses#3rd Class}}}
These source pages will track the source's (book or video) content. I imagine a source page for a book to have page numbers, and for a video to have start and stop times, or section numbers. (The book or video itself won't be on the source pages.)
So, the source pages will really serve two purposes. First, it will be fairly easy to see which parts of the sources have had notes taken and put into the topic pages. (So non-linear note-taking of sources will be easy -- skipping from source to source on topics, rather than digesting an entire source at once.) Second, it will be easy from a topic page to see where to go back to for a more in-depth review.
There's two issues I'm writing about.
(1) I want the workflow to be the user edits the topic page, putting in links to source pages and sections. I want this one user-addition to automatically make the source page link back to this spot. I want the system to handle the two-way-linking, assuming the user won't be perfect.
(2) I want the user to be able to put links in the topic page to source pages and sections that might not exist yet. I'd need those links to show up as red, to indicate they need to be created. But, still, once created, I want the system to handle the two-way-linking, even if there were multiple red links to the same area. (I could see building up quite a few red links, then having an unorganized "purge" of them by creating the missing pages and sections, and don't want to have to search for all the links to the new areas.) Ideally, I'd love for these source pages to be auto-generated -- so pages and sections were made as links were made to them, and automatically deleted (or at least the backlinks removed) as links were removed to them.
I don't think the MediaWiki what links here functionality does the job. I want this to work on a per-section rather than per-page basis. And, I don't want the user to have to add to each section a "what links here tag" -- I want it to be automatic.
The extension Semantic MediaWiki will allow you to get bidirectional linking in a semi-automatic fassion.
https://www.semantic-mediawiki.org/wiki/Help:Link_Template
shows a high level example.
If you dig deeper into SMW and SemanticForms you'll find how with e.g. SemanticForms you can get a user experience that is close to what you are asking for.
See e.g. http://smw.referata.com/wiki/Discourse_DB and http://www.discoursedb.org/wiki/Main_Page for an application of these principles.
I don't think there is an easy way to do that. You could write an extension that provides a parser function that your users can enter, save the source page + source section + target page + target section in a database at links update, then use the ParserSectionCreate hook to show links based on that. Or you can create two types of templates and write a bot that keeps them in sync.

How should I handle autolinking in wiki page content?

What I mean by autolinking is the process by which wiki links inlined in page content are generated into either a hyperlink to the page (if it does exist) or a create link (if the page doesn't exist).
With the parser I am using, this is a two step process - first, the page content is parsed and all of the links to wiki pages from the source markup are extracted. Then, I feed an array of the existing pages back to the parser, before the final HTML markup is generated.
What is the best way to handle this process? It seems as if I need to keep a cached list of every single page on the site, rather than having to extract the index of page titles each time. Or is it better to check each link separately to see if it exists? This might result in a lot of database lookups if the list wasn't cached. Would this still be viable for a larger wiki site with thousands of pages?
In my own wiki I check all the links (without caching), but my wiki is only used by a few people internally. You should benchmark stuff like this.
In my own wiki system my caching system is pretty simple - when the page is updated it checks links to make sure they are valid and applies the correct formatting/location for those that aren't. The cached page is saved as a HTML page in my cache root.
Pages that are marked as 'not created' during the page update are inserted into the a table of the database that holds the page and then a csv of pages that link to it.
When someone creates that page it initiates a scan to look through each linking page and re-caches the linking page with the correct link and formatting.
If you weren't interested in highlighting non-created pages however you could just have a checker to see if the page is created when you attempt to access it - and if not redirect to the creation page. Then just link to pages as normal in other articles.
I tried to do this once and it was a nightmare! My solution was a nasty loop in a SQL procedure, and I don't recommend it.
One thing that gave me trouble was deciding what link to use on a multi-word phrase. Say you had some text saying "I am using Stack Overflow" and your wiki had 3 pages called "stack", "overflow" and "stack overflow"....which part of your phrase gets linked to where? It will happen!
My idea would be to query the titles like SELECT title FROM articles and simply check if each wikilink is in that array of strings. If it is you link to the page, if not, you link to the create page.
In a personal project I made with Sinatra (link text) after I run the content through Markdown, I do a gsub to replace wiki words and other things (like [[Here is my link]] and whatnot) with proper links, on each checking if the page exists and linking to create or view depending.
It's not the best, but I didn't build this app with caching/speed in mind. It's a low resource simple wiki.
If speed was more important, you could wrap the app in something to cache it. For example, sinatra can be wrapped with the Rack caching.
Based on my experience developing Juli, which is an offline personal wiki with autolink, generating static HTML approach may fix your issue.
As you think, it takes long time to generate autolinked Wiki page. However, in generating static HTML situation, regenerating autolinked Wiki page happens only when a wikipage is newly added or deleted (in other words, it doesn't happen when updating wikipage) and the 'regenerating' can be done in background so that usually I don't matter how it take long time. User will see only the generated static HTML.