How can I get Facebook's open graph scraper to successfully parse my single-page website hosted via GitHub Pages?

One of my friends set up a simple one-page website and asked me to help integrate Open Graph metadata so that sharing on Facebook provides a better user experience.
Unfortunately, Facebook doesn't recognize some values, and Facebook's URL Debugger doesn't really help, because it shows data from the registrar by default and fails with the error message Error parsing input URL, no data was cached, or no data was scraped. when I click on the Fetch new scrape information button. Also, when I click on See exactly what our scraper sees for your URL, I get the following error: Document returned no data.
The URL is: http://know-your-limits.com/. The registrar is Gandi and the site is hosted on GitHub. The DNS configuration is as follows:
dig know-your-limits.com +nostats +nocomments +nocmd
; <<>> DiG 9.8.3-P1 <<>> know-your-limits.com +nostats +nocomments +nocmd
;; global options: +cmd
;know-your-limits.com. IN A
know-your-limits.com. 10771 IN A 192.30.252.153
Is there something I could do to fix this on something I have control over (i.e. registrar configuration, GitHub repository update, HTML update) as opposed to stuff I don't have control over (i.e. the GitHub web server)?
Do you think it is a bug with GitHub hosting?
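For reference, this is roughly how I have been checking what a crawler gets back from the page: a quick Python sketch that fetches the URL with Facebook's documented crawler user agent and lists any og: meta tags (the tag scan is deliberately naive, so treat it only as a rough diagnostic).

import urllib.request

# Rough diagnostic: fetch the page the way Facebook's crawler identifies itself
# and print the status, content type and any og: meta tags found.
url = "http://know-your-limits.com/"
req = urllib.request.Request(url, headers={"User-Agent": "facebookexternalhit/1.1"})
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.getheader("Content-Type"))
    html = resp.read().decode("utf-8", errors="replace")

for line in html.splitlines():
    if 'property="og:' in line:   # naive scan, good enough for a quick check
        print(line.strip())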

Related

How to create a W3C-validation-link for local files?

I have some local (self-generated) HTML files. I view them in a browser via file:// without a local web server.
I would like to have a link in the footer of each of these files to "Validate" them with the W3C Validator. On some websites I see links like this in the source:
<p class="validation"><a href="https://validator.w3.org/check?uri=referer">Validate</a></p>
But of course this doesn't work, because there is no Referer header for local files.
I asked this on Pro Webmasters, but the question was out of scope there.
EDIT: Uploading the file to the validator website via its web form is not an option. I would like to send the whole HTML source to the validator without any external tools.
You can't.
The way the referer link works is to tell the validation service to look at the Referer header to get the URL of the previous page, then to request that page and validate it.
Even if there were a Referer header for your local file, there is no way for the validation service to access it. It would be a serious security problem if every website you visited could read files from your hard disk freely!
Use the file upload feature to validate files without a public URL.
Alternatively, consider using a local validation tool and possibly looking for an extension to your IDE that makes it more convenient to access (such as this extension for VS Code).
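If you really do want to push the raw HTML source to W3C from a script rather than uploading the file by hand, the Nu HTML Checker at validator.w3.org/nu/ accepts the document in the body of a POST request. A minimal Python sketch, assuming the out=json parameter and text/html content type described in the checker's documentation (double-check both there; the file path is just an example):

import urllib.request

# Sketch: POST a local file's HTML source to the W3C Nu HTML Checker and print the JSON report.
with open("page.html", "rb") as f:   # example path to one of your local files
    body = f.read()

req = urllib.request.Request(
    "https://validator.w3.org/nu/?out=json",
    data=body,
    headers={
        "Content-Type": "text/html; charset=utf-8",
        "User-Agent": "local-validation-script",   # a descriptive UA is a courtesy to the service
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))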

Get a JSON file that a webpage is made of

I'm not familiar with web development, but I believe the text content of this web page
https://almath123.github.io/semstyle_examples/
is made of two JSON files mentioned in it (semstyle_results.json and semstyle_results.json), and that the JSON files are completely present in RAM (if that is the correct term), because when I disconnect from the internet I can still browse the page and see the text content.
I want to download the semstyle_results.json file. Is that possible? How can I do that?
Technically, if you visit a website you are already "downloading" the content: your browser sends a request for information and a server responds by sending you that information, which you then view locally. Dynamic sites poll or make further requests as you browse to keep the data updated and relevant, but it is all sent to you.
If you want to download any of the content from the website, a simple way is to open the developer tools (Ctrl + Shift + I on Windows for Firefox and Chrome), go to a source file and click Save As. The Network tab shows you the requests that were made, which includes not just files such as JSON but also the details of each request.
Here is a screenshot locating one of the json files in a Chrome-based browser (Brave)
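Once the Network tab shows you the request for the JSON file, you can also fetch that exact URL from a script instead of saving it by hand. A rough Python sketch; the URL below is only my guess at the path, so copy the real one from the Network tab:

import json
import urllib.request

# Guessed URL: take the actual address from the browser's Network tab.
url = "https://almath123.github.io/semstyle_examples/semstyle_results.json"

with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read().decode("utf-8"))

# Save a local copy and report what was parsed.
with open("semstyle_results.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

print("Top-level type:", type(data).__name__)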
Webpages may not always advertise that they can return data as JSON or XML. For example, if you inspect this SEC EDGAR database page using the method described above, it shows no JSON link, but if you append index.json to the end of the URL it will return the same data in JSON format (or XML format, if you prefer).
i.e. the same page, but via its JSON endpoint
So it is always a good idea to see if the website publishes developer information. For example, SEC EDGAR provides developer tools that mention that the directory structure can be accessed via HTML, XML or JSON.
SEC developer information
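As a concrete (and hedged) illustration of the index.json trick, here is a short Python sketch that requests an EDGAR directory listing as JSON. The directory path is only an example, SEC asks for a descriptive User-Agent (the one below is a placeholder), and the "directory"/"item" layout is how the listing looked when I checked, so adjust it to whatever the response actually contains:

import json
import urllib.request

# Example only: an EDGAR directory listing fetched as JSON by appending index.json.
url = "https://www.sec.gov/Archives/edgar/data/320193/index.json"
req = urllib.request.Request(url, headers={"User-Agent": "example-script contact@example.com"})

with urllib.request.urlopen(req) as resp:
    listing = json.loads(resp.read().decode("utf-8"))

# Assumed layout: a "directory" object containing an "item" list.
for item in listing.get("directory", {}).get("item", []):
    print(item.get("name"), item.get("last-modified"))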

Custom 404 Not Found page, how to make it?

While making updates to my website, I found a lot of leftover pages that I no longer use, so I deleted them.
Unfortunately, search engines had already indexed some of them, so when you type my website's name, pages that no longer exist still show up in the results.
I need to create a custom 404 Not Found page that appears every time people visit a page that doesn't exist, respecting Google's SEO guidelines and W3C standards.
Unfortunately I don't know how to do it.
Could someone teach me, please?
Make an .html document in your web server's or website's htdocs directory, then make a new folder called "err" and upload all your error pages (such as 404 "The page cannot be found" or 403 "Forbidden") into it, renaming them to their error codes. If you are using cPanel, search (using the search bar) or find (by browsing the tools listed) "Error Pages" and go from there with the instructions given in cPanel.
Hope this helps you,
Jay Salway (13 year old developer)
Create a static page containing your custom message and anything else you want (e.g. site layout) and save it somewhere appropriate within your site (e.g. from the root: /errors/404.asp). Within that page, make sure you write a 404 response header (e.g. Response.Status = "404 Page Not Found").
In IIS (a similar option will be available somewhere under Apache if you are running that), open up the settings for your website and choose 'Error Pages', then look for the status code 404 (by default there should be one, but you may need to create it). Open that up, choose the option 'Execute a URL on this site' and enter the URL chosen above (/errors/404.asp).
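If it helps to see the principle outside of IIS or cPanel, here is a minimal Python sketch using the standard library's http.server: serve your custom HTML body but still send a genuine 404 status code, which is what search engines care about. This is only an illustration of the idea, not how you would configure a production site:

from http.server import BaseHTTPRequestHandler, HTTPServer

# Pretend these are the only pages that still exist on the site.
PAGES = {"/": "<h1>Home</h1>"}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in PAGES:
            body, status = PAGES[self.path], 200
        else:
            # Custom error page, but crucially with a real 404 status code.
            body, status = "<h1>404 - Page not found</h1><p>Sorry, that page no longer exists.</p>", 404
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("localhost", 8000), Handler).serve_forever()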

Executing Hudson jobs remotely

I am trying to automate Hudson by hitting the appropriate URLs remotely. I am using Python's urllib2 for this.
First of all, I am trying to build an existing job and get the build status.
A sample url for the build would look like this:
http://tomcaturl:8080/hudson/job/.NET%20Build/build
However, this returns HTML data.
The Hudson docs say that I can get data in Python/JSON/XML format, so I try to hit
http://tomcaturl:8080/hudson/job/.NET%20Build/build/api/json
But I get no data at all, although the build happens successfully.
Is there a way to find out which build was started by my remote build request, so that I can maintain a one-to-one mapping?
Please note that I am doing this from a remote Python program and I DO NOT have access to the Hudson GUI.
First of all, if you have any security/login enabled, you have to be logged in to the remote Hudson server for the /job/JobName/build request. If you allow starting builds without being logged in, this is not a problem.
The /job/JobName/build request will return HTML data. If you are not logged in you will get a response redirecting to the login page and the build will not be started. If the request is successful you will not get a redirect to the login page, and you can assume the build was queued. You can also check the build queue using the API URL of the project (see below). Note that there may be a delay before the build is started, which you can control by calling /job/JobName/build?delay=0sec
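As a quick sketch of that first step with urllib2 (the host and job name are taken from the question, and you would need to add whatever authentication your instance requires):

import urllib2

# Placeholders: adjust the host, job name and credentials to your Hudson instance.
base = "http://tomcaturl:8080/hudson"
job = ".NET%20Build"

# Queue the build and ask Hudson to start it without the usual delay.
resp = urllib2.urlopen(base + "/job/" + job + "/build?delay=0sec")
print(resp.getcode())   # 200 after the redirect usually means the request was accepted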
The API is not available under the /job/JobName/build URL, but you can see API information here:
http://tomcaturl:8080/hudson/job/.NET%20Build/api
Most pages in Hudson that show information (about a project, a specific build and so on) have an API page if you append /api/xml or /api/json to the end of the URL.
The reason /job/JobName/build doesn't have an API page is simply that it isn't a URL to an information page.
Example api requests:
XML call for information about the project:
http://tomcaturl:8080/hudson/job/.NET%20Build/api/xml
JSON call for information about the last successful build of the project:
http://tomcaturl:8080/hudson/job/.NET%20Build/lastSuccessfulBuild/api/json
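Putting it together, here is a rough urllib2 sketch that records the job's next build number, queues a build, and then polls that build's api/json until it finishes. The nextBuildNumber, building and result fields are from the standard Hudson/Jenkins JSON API as I remember it, so verify them against your version, and note that the mapping can be off if someone else queues a build at the same moment:

import json
import time
import urllib2

base = "http://tomcaturl:8080/hudson/job/.NET%20Build"   # placeholder URL

# Read the job info first so we know which build number our queued build should get.
job_info = json.load(urllib2.urlopen(base + "/api/json"))
next_number = job_info["nextBuildNumber"]

urllib2.urlopen(base + "/build?delay=0sec")              # queue the build

# Poll until that build exists and has finished, then print its result.
while True:
    try:
        build = json.load(urllib2.urlopen(base + "/%d/api/json" % next_number))
        if not build.get("building"):
            print(build.get("result"))                   # e.g. SUCCESS or FAILURE
            break
    except urllib2.HTTPError:                            # the build has not started yet
        pass
    time.sleep(5)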

Why don't Wikipedia URLs have file extensions?

Looking at a random Wikipedia article like http://en.wikipedia.org/wiki/Impostor_syndrome, I see that there's no .html attached to the end of the address. In fact, if I do try to put a .html after it, Wikipedia tells me "Wikipedia does not have an article with this exact name." How come it doesn't need any file extension?
More of a Super User question?
There is no law saying that an HTML file has to end in .html or .htm, and since the wiki generates pages from a database there is really no file there anyway (except in a cache).
Not having .htm or .php is more sensible - why should you care what technology they use when you ask for a URL? It would be like having to put the recipient's operating system at the end of their email address.
If you make a call to a website, it probably looks like
www.example.com/siteA/index.html
This request just tells the web server that you want to see a resource called index.html in siteA.
The website that runs on this server has to determine what you want to see and how the data is loaded.
index.html could be a file in the siteA directory,
or
it could be a row with the key "index.html" in the siteA table in your database.
So the part siteA/index.html is just a resource identifier. The grammar of this resource identifier is completely free and is determined per website.
URL rewriting is also common, to make URLs easier to read and remember.
For example, there could be a rewrite rule to accomplish the following: if the user enters something like
www.example.com/download/demo.zip
rewrite it so your website sees it as
www.example.com/download.php?file=demo.zip
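In other words, a rewrite rule is just a pattern-to-target mapping applied before the request is handled. Here is a toy Python sketch of the demo.zip rule above (the pattern and target are invented for illustration; this is not Apache's mod_rewrite syntax):

import re

# Toy rewrite rule: /download/<file> is rewritten to /download.php?file=<file>
RULE = (re.compile(r"^/download/(?P<file>[^/]+)$"), "/download.php?file={file}")

def rewrite(path):
    pattern, target = RULE
    match = pattern.match(path)
    if match:
        return target.format(**match.groupdict())
    return path  # no rule matched: pass the path through unchanged

print(rewrite("/download/demo.zip"))   # -> /download.php?file=demo.zip
print(rewrite("/siteA/index.html"))    # -> /siteA/index.html (unchanged)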
Wikipedia's servers map the URL to the page you want. .html is just a naming convention that today is mostly historical, from the period of static pages when URLs actually were names of files on the server. In fact, there may be no file at all: the server queries the database and a web framework sends out the HTML on the fly.
Wikipedia is most likely using the Apache module mod_rewrite so that URLs do not have to map directly to file system paths.
See: http://en.wikipedia.org/wiki/Rewrite_engine#Web_frameworks
However, programming languages can also take control of incoming URLs and return data depending on the structure of the link, according to some set of rules; for example, the Django web framework employs a URL dispatcher.
That's because Wikipedia uses MediaWiki's short URL feature.
Actually, when you search for something it really loads a PHP file. Try searching for a word that doesn't exist, for example "Pazaz". The URL is http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz . Notice index.php in the URL.
To tell the truth, it's not so much a MediaWiki feature as an Apache one. For further info see http://www.mediawiki.org/wiki/Manual:Short_URL .
URL routing is your answer. For example, in ASP.NET, read the quoted description below:
The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL mapping rules within your applications. The routing system has two main purposes:
Map incoming URLs to the application and route them so that the right Controller and Action method executes to process them
Construct outgoing URLs that can be used to call back to Controllers/Actions (for example: form posts, links, and AJAX calls)
I would suggest that sites like this use some sort of Model-View-Controller framework, similar to Ruby on Rails, where the URL 'directories' form part of a request/URL route...
In MVC-based frameworks, the URL 'directories' can dictate which view/controller to use as well as what action should be taken with the data.
e.g. shop.com/product/carrots
where product is a view/controller and carrots is the data. The framework then works out which action/route to take; the default could be viewing the product information and the price of the carrots.
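To make the shop.com/product/carrots example concrete, here is a small hedged Python sketch of such a dispatcher: the first path segment picks a controller and the rest is treated as data. The controller names and behaviour are invented purely for illustration:

# Minimal routing sketch: the first path segment picks the controller, the rest is the data.
def product_controller(item):
    return "Showing information and price for: %s" % item

def not_found(item):
    return "No such page: %s" % item

CONTROLLERS = {"product": product_controller}

def dispatch(path):
    segments = [s for s in path.strip("/").split("/") if s]
    controller = CONTROLLERS.get(segments[0]) if segments else None
    data = segments[1] if len(segments) > 1 else ""
    return (controller or not_found)(data)

print(dispatch("/product/carrots"))   # -> Showing information and price for: carrots
print(dispatch("/about/team"))        # -> No such page: team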