Website directory for a multilingual site - HTML

Hi, I'm building a website in Dutch, and on the index.html page there is an English flag for visitors to click on to get an English version of the site.
I'm wondering, specifically with SEO in mind, what good practice is for the directory structure.
The Dutch page is www.website.com/index.html
Would the English one be something along the lines of www.website.com/index-en.html?
Any light shed on this would be appreciated.

First of all, Google recommends country-specific domains (but those are expensive).
So I recommend subdomains (en.website.com), but if you cannot create subdomains you can also use directories like website.com/en/.
Take a look at https://support.google.com/webmasters/answer/182192?hl=en
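Whichever structure you pick, you can also tell Google explicitly which URL is which language with hreflang annotations (that is what the Google page above describes). A minimal sketch, assuming the Dutch version stays at the root and the English version lives under /en/:

    <!-- In the <head> of the Dutch page (www.website.com/) -->
    <link rel="alternate" hreflang="nl" href="https://www.website.com/" />
    <link rel="alternate" hreflang="en" href="https://www.website.com/en/" />

    <!-- The English page (www.website.com/en/) carries the same two links,
         so the two versions point at each other. -->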

Basically, Google will crawl the default language, which in this case is Dutch. If your website has separate English pages (not pages generated by a translator widget or the like), Google will index them as separate pages too. In the search results, the Dutch and English versions of index.html will then simply appear next to each other.
Have a look at this:
http://www.w3.org/International/questions/qa-html-language-declarations
A hyphen would be fine, but remember not to overuse it, and refrain from using other symbols; Google regards overuse of symbols in URLs as spammy.
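The in-page declaration the W3C article above covers comes down to a single attribute on each version, e.g.:

    <!-- Dutch page -->
    <html lang="nl">

    <!-- English page -->
    <html lang="en">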

Related

Is there a way of showing a specific HTML file for my webpage depending on the user's location?

I'm developing a webpage and it has some entries for which it would be interesting to force my web server to show an HTML file that is not only auto-translated (mainly between English and Spanish) but also loads the specific social media links created in those languages (there is an English Instagram account and a Spanish one, an English Twitter account and another for Spanish, and so on), depending on the physical location of the webpage's users.
Thanks in advance.
In principle you could detect the location, but I am guessing that what you really want is to detect the language of the user. There are multiple ways to do this; see e.g. JavaScript for detecting browser language preference.
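A minimal sketch of that idea, assuming the Spanish version lives under /es/ (the path and the redirect rule are just examples):

    <script>
      // navigator.language is the browser's preferred language, e.g. "es-ES" or "en-GB".
      var lang = (navigator.language || navigator.userLanguage || "en").toLowerCase();

      // Send Spanish-preferring visitors to the Spanish version; everyone else stays put.
      if (lang.indexOf("es") === 0 && window.location.pathname.indexOf("/es/") !== 0) {
        window.location.replace("/es/" + window.location.search);
      }
    </script>

The same check is where you would also swap in the Spanish Instagram/Twitter links instead of the English ones.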

How to make a link URL go through another page when clicked (HTML)

I'm sorry, I do not know how to word that title better. I have tried searching Google but my terminology isn't helping my results.
Let me explain the context. When you're on a news website or blog, on their homepage like www.homepage.co.uk/, and you click an article, it goes somewhere like www.homepage.co.uk/2017/article/. How do they make the 2017 appear? And if you remove the /article/ from the URL, it takes you to an archive of all the links in that year. I don't understand; is there a process to this?
When I click a link on my website it goes to www.website.co.uk/link
I want to be able to have that 2017/link/ in the URL so visitors can find the archive for that year, just like on those websites.
How do I do this?
I am sorry if I am not explaining this very well.
I understand changing my filenames to "2017/article.html" might work, but I don't believe that is the correct way of doing it?
Thanks a lot for your time and suggestions!
You're asking about a couple of things. One is the taxonomy of the site. Taxonomy, if you don't know, is the "shape" of your site, or how it is organized. News sites, for instance, are usually organized by date and perhaps topic (Health and Leisure, Politics, Entertainment, etc.).
The other aspect of your question is what you might call RESTful "hacking" of URLs. One of the tenets of REST is that URLs (URIs, to be accurate) are supposed to be hackable. A news site might have /2017/10/10 to display all articles for Oct 10; remove the last "10" and you get all the articles for October so far.
If you are not using a site platform that does this for you, you will have to maintain that taxonomy yourself and manually write all the links. Systems such as Drupal and Joomla, among others, will translate your taxonomy into automatically maintained links. When editing a page on one of these platforms, you typically only refer to the system's internal name of the page (which could be a shortened version of the article's title in the above example), and the underlying engine takes care of reconstructing the URL for you (in case the page moves, or its tags/taxonomy change).
This is a big topic, and I encourage you to do some further reading:
http://searchcontentmanagement.techtarget.com/feature/Building-a-website-taxonomy-in-eight-steps
https://www.drupal.org/docs/7/organizing-content-with-taxonomies/organizing-content-with-taxonomies
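If you are not on a CMS, URLs like /2017/article/ are usually produced by server rewrite rules rather than by real folders. A minimal sketch for Apache, assuming a made-up file layout (articles stored as articles/<year>/<name>.html and a per-year archive page):

    # .htaccess in the site root (requires mod_rewrite)
    RewriteEngine On

    # /2017/some-article/  ->  /articles/2017/some-article.html
    RewriteRule ^([0-9]{4})/([a-z0-9-]+)/?$ articles/$1/$2.html [L]

    # /2017/  ->  the archive page for that year
    RewriteRule ^([0-9]{4})/?$ archive-$1.html [L]

On other servers (nginx, IIS) the idea is the same but the syntax differs.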

"Mark as favourite" feature for a web page

(I'm quite new to programming so forgive me for any incorrect terms! HTML and CSS are my strengths.)
I'm currently working on a Joomla website for a music festival. One of its pages contains a schedule with a list of performing acts.
My ambitious goal is to build a feature that lets my website's users mark certain acts as their favourites. In practice, clicking an icon would give it a visual highlight or something like that. Ideally, the user shouldn't have to sign in to save their choices. I guess the solution would have something to do with the browser's local storage?
Here's one example of what I mean. (This is NOT my site, just an example of something I'm looking for.)
Can anyone help me to get started? Thanks in advance!
This extension, http://extensions.joomla.org/extension/my-shortlist, should help with little or no modification to the template.
If the above doesn't help, then you can search the JED (the Joomla Extensions Directory) for an extension that is better suited to your needs.
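If an extension turns out to be overkill, the localStorage idea from the question is enough for a basic version that works without signing in. A minimal sketch (the markup, IDs and class names are made up; you would adapt them to your template):

    <ul id="acts">
      <li data-act="band-a">Band A <button class="fav">★</button></li>
      <li data-act="band-b">Band B <button class="fav">★</button></li>
    </ul>

    <script>
      // Favourites survive page reloads because they are kept in the browser's localStorage.
      var favs = JSON.parse(localStorage.getItem("favourites") || "[]");

      document.querySelectorAll("#acts li").forEach(function (li) {
        var id = li.getAttribute("data-act");
        if (favs.indexOf(id) !== -1) li.classList.add("favourite");  // restore highlight

        li.querySelector(".fav").addEventListener("click", function () {
          var i = favs.indexOf(id);
          if (i === -1) favs.push(id); else favs.splice(i, 1);        // toggle
          li.classList.toggle("favourite");
          localStorage.setItem("favourites", JSON.stringify(favs));   // persist
        });
      });
    </script>

    <style>
      .favourite { background: #ffe9a8; }  /* the "visual highlight" */
    </style>

Note that localStorage is per browser and per device, so favourites won't follow a user from their phone to their laptop; that is the trade-off for not requiring sign-in.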

Finding number of pages of a website

I want to find the number of pages of a website. Usually what I look for is a sitemap, but I just encountered a site which does not have one, so I am out of ideas for how to find its total number of pages. I tried to Google the URL but that did not help much. Is there any other way to find out how many pages a website has?
Thanks in advance.
Ask Google "site:yourdomain.com"
This gives you all indexed pages.
Or use the free tool "Xenu". It crawls the whole site, but it won't find pages which have no internal links pointing to them. You can also export a sitemap with it.
I was about to suggest the same thing :) If this is a website you own, you can also add it to Google Webmaster Tools. It will show you lots of things about your site, including the number of links, pages, search terms, etc. It's very useful and free of charge.
I have found a better solution myself. You can go to Google Advanced Search and restrict the search results to your domain name. Leave everything else empty. It would give you the list of all pages cached by Google.
You could also try A1 Website Analyzer. But with any link-checker software, you will have to make sure you configure it correctly to obey or not obey (whatever your needs are) things like robots.txt, noindex and nofollow instructions. (A common source of confusion in my experience.)

Crawling data or using an API

How do these sites gather all the data - questionhub, bigresource, thedevsea, developerbay?
Is it legal to show data in a frame as bigresource does?
#amazed
EDITED: fixed some spelling issues, 2011-03-10
How do these sites gather all the data - questionhub, bigresource ...
Here's a very general sketch of what is probably happening in the background at a website like questionhub.com:
1. Spider program (google "spider program" to learn more)
   a. Configured to start reading web pages at stackoverflow.com (for example).
   b. Run the program so it goes to the home page of stackoverflow.com and starts visiting all links that it finds on those pages.
   c. Returns the HTML data from all of those pages.
2. Search index program
   Reads the HTML data returned by the spider and creates a search index, storing the words that it found AND what URL those words were found at.
3. User interface web page
   Provides a feature-rich user interface so you can search the sites that have been spidered.
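A very rough sketch of steps 1 and 2 in plain JavaScript (Node 18+, no external libraries; the start URL and page limit are arbitrary examples, and a real spider is far more careful about parsing, politeness and robots.txt):

    // spider-sketch.js - run with: node spider-sketch.js
    const START = "https://example.com/";          // example start page
    const MAX_PAGES = 20;                          // stop after this many pages
    const index = new Map();                       // word -> Set of URLs (the "search index")

    async function crawl() {
      const queue = [START];
      const seen = new Set();

      while (queue.length > 0 && seen.size < MAX_PAGES) {
        const url = queue.shift();
        if (seen.has(url)) continue;
        seen.add(url);

        let html;
        try {
          html = await (await fetch(url)).text();  // step 1: the spider downloads the page
        } catch {
          continue;                                // unreachable page: skip it
        }

        // step 2: record which words appear at which URL
        const text = html.replace(/<[^>]+>/g, " ").toLowerCase();
        for (const word of text.match(/[a-z]{3,}/g) || []) {
          if (!index.has(word)) index.set(word, new Set());
          index.get(word).add(url);
        }

        // step 1 again: queue every absolute link on the same host
        for (const m of html.matchAll(/href="(https?:\/\/[^"]+)"/g)) {
          try {
            if (new URL(m[1]).host === new URL(START).host) queue.push(m[1]);
          } catch { /* ignore malformed URLs */ }
        }
      }
    }

    // Step 3 would be a proper search UI; here we just look one word up on the console.
    crawl().then(() => console.log(index.get("example") || "no hits"));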
Is it legal to show data in a frame as bigresource does?
To be technical, "it all depends" ;-)
Normally, websites want to be visible in Google, so why not in other search engines too? Just as Google displays part of the text that was found when a site was spidered, questionhub.com (or others) has chosen to show more of the text found on the original page, possibly keeping the formatting that was in the original HTML or changing the formatting to fit their standard visual styling.
A remote site can "request" that spiders do NOT go through some or all of its web pages by adding rules to a well-known file called robots.txt. Spiders do not have to honor robots.txt, but a vigilant website will track the IP addresses of spiders that ignore its robots.txt file and then block those addresses from looking at anything on the website. You can find plenty of information about robots.txt here on Stack Overflow or by running a query on Google.
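For reference, robots.txt is just a few plain-text lines at the root of the site; a tiny made-up example (the path and bot name are invented):

    User-agent: *
    Disallow: /private/

    User-agent: BadBot
    Disallow: /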
There are several industries (besides Google) built around what you are asking. There are tags on Stack Overflow for search-engine and search; read some of those questions and answers. Lucene/Solr are open-source search engine components. There is a companion open-source spider, but the name eludes me right now. Good luck.
I hope this helps.
P.S. As you appear to be a new user: if you get an answer that helps you, please remember to mark it as accepted, or give it a + (or -) as a useful answer. This goes for your other posts here too ;-)