I have a list of links for our site that point to my local dev environment, and I need to make a valid sitemap according to the protocol at http://www.sitemaps.org/protocol.php.
I have created an initial version by hand that validates as XML; however, when I feed it into the Dust-Me Selectors Firefox extension, I am told that it is invalid.
I can't seem to find any online tool that will validate my sitemap via direct entry; they all ask for a URL to point to, and they would all fail anyway because the URLs are not publicly accessible.
Does anyone know of a tool (online or not) that will accept direct input and either generate a sitemap from a list of URLs or at least validate the XML document that I have created?
You can just validate it against the XML schemas, as described at http://www.sitemaps.org/protocol.php#validating.
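If you want an offline check, here's a minimal sketch using Python's lxml, assuming you've downloaded the official schema (http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd) next to your sitemap:

# Validate sitemap.xml against the locally saved sitemap.xsd schema.
from lxml import etree

schema = etree.XMLSchema(etree.parse("sitemap.xsd"))
doc = etree.parse("sitemap.xml")

if schema.validate(doc):
    print("sitemap.xml is valid")
else:
    for error in schema.error_log:
        print(error.message)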
You can use http://www.timestampgenerator.com/tools/xml-sitemap-from-list/ to generate a sitemap from your list of URLs, and then make any remaining changes in a text editor.
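Alternatively, generating one yourself is only a few lines. A minimal Python sketch, assuming urls.txt holds one URL per line (the file names are placeholders):

# Build a protocol-conformant sitemap from a plain list of URLs.
from xml.sax.saxutils import escape

with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

with open("sitemap.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        out.write("  <url><loc>%s</loc></url>\n" % escape(url))
    out.write("</urlset>\n")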
I am trying to rename a file when downloading it from an <a> tag.
Here's a simple example:

<a href="https://i.stack.imgur.com/440u9.png" download="stackoverflow.png">Download Stackoverflow Logo</a>

As you can see, it never downloads the file under the stackoverflow.png name, though it does download it under the default name.
Nevertheless, if I download the image and try to do the same with a local route, it renames the file properly.
Another example:

<a href="440u9.png" download="stackoverflow.png">Download Stackoverflow Logo</a>

The example above works properly.
Why does the download HTML attribute only work with local routes?
Thanks in advance!
The download attribute works only for same-origin URLs.
By the way, you really should learn to use proper terminology, or else people won't understand you:
<a href="https://i.stack.imgur.com/440u9.png" download="stackoverflow.png"> is a tag, specifically, an opening tag;
download is an attribute;
stackoverflow.png is the value of the attribute;
https://i.stack.imgur.com/440u9.png is a URL, sometimes called a URI or an address.
The entire construction <a href="https://i.stack.imgur.com/440u9.png" download="stackoverflow.png">Download Stackoverflow Logo</a> is an element.
A "route" is something else entirely, and has no relationship with HTML.
I couldn't find any official information on it, but it seems that externally hosted resources aren't allowed to be renamed.
Have a look here: there's an example linking to a Google image, and that doesn't work either, so it seems the specs have changed along the way.
This is a security measure applied to cross-origin download requests where the server hosting the download does not use HTTP headers to explicitly mark the file as being for download.
From the HTML specification:
If the algorithm reaches this step, then a download was begun from a different origin than the resource being downloaded, and the origin did not mark the file as suitable for downloading, and the download was not initiated by the user. This could be because a download attribute was used to trigger the download, or because the resource in question is not of a type that the user agent supports.

This could be dangerous, because, for instance, a hostile server could be trying to get a user to unknowingly download private information and then re-upload it to the hostile server, by tricking the user into thinking the data is from the hostile server.

Thus, it is in the user's interests that the user be somehow notified that the resource in question comes from quite a different source, and to prevent confusion, any suggested file name from the potentially hostile interface origin should be ignored.
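The practical workaround is on the server side: if the host explicitly marks the file as a download with a Content-Disposition header, browsers will honour the renamed download. A minimal sketch using Flask (the route and file names are illustrative, not from the question):

# Hypothetical endpoint that serves an image as an explicit download.
from flask import Flask, send_file

app = Flask(__name__)

@app.route("/logo")
def logo():
    # as_attachment=True sends "Content-Disposition: attachment", telling
    # the browser this resource is meant to be downloaded and renamed.
    # (download_name requires Flask >= 2.0; older versions call the
    # parameter attachment_filename.)
    return send_file("stackoverflow.png", as_attachment=True,
                     download_name="stackoverflow.png")

if __name__ == "__main__":
    app.run()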
I'm new to the JSON world and trying to find out how to view the JSON object of a webpage. Will every webpage have a JSON object, and if so, how do I find it in order to get the data and display it on my site? I vaguely remember something about using Firebug.
Thanks,
B
Will every webpage have a JSON object
No.
Many web sites will not use any JSON; many will be completely static (HTML and CSS only).
It may only apply if there is a "Web API" (for programmatic access to content), but there are non-JSON ways to do APIs (the X in AJAX is for XML).
To determine how to access a site programmatically, look at the site's developer documentation. If there isn't any documentation, then whatever AJAX a web debugger (like Firebug) shows may well be internal-only, intended solely for the site's own implementation; other uses may not be welcome (you could be violating the site's intellectual property).
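If the site does document a JSON endpoint, fetching it is straightforward. A minimal sketch with Python's requests library (the URL is a placeholder, not a real API):

import requests

# Substitute a documented API URL for the site you are interested in.
response = requests.get("https://example.com/api/items")
response.raise_for_status()

data = response.json()  # parses the JSON body into Python dicts/lists
print(data)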
Adding sensitive JSON to your final HTML page can become a vulnerability. JSON should be loaded like an ingredient into the soup, for example via Ajax on an authenticated page. If the JSON is not sensitive, then you should load it, for performance reasons, only once it is required; it really depends on your choice. I have built a library to handle this kind of request for the web; check it out: https://github.com/alexmano/jsMan
I am at this website -
http://www.zoominfo.com/s/#!search/company/1.64.eyJjb21wYW55TmFtZSI6xIB2YWx1xIw6ImEiLCJpc1VzZWTEjXRyxJN9fQ%3D%3D
If you look for the company name, Agilent Technologies Inc., it's neither in the page source nor in any JSON format.
But it does show up in the DOM in Chrome Developer Tools.
I have looked at and analysed almost every request the page sends, but still couldn't find where this data is saved.
By "where the data is saved" I mean: where can I scrape that data from, e.g. using python-requests and BeautifulSoup?
I do see an XMLHttpRequest being made; I'm not sure what that means, or whether it is the clue to my answer.
I am still learning Python, and it would be very useful if someone could help me with this.
Thanks in advance.
After the HTML is loaded, JavaScript requests the data through an XMLHttpRequest, and the response is inserted into the page as soon as it reaches your client. That's why you see the DOM element right there using the element inspector.
You didn't mention what goal you want to achieve or what tool you are using, so please be specific in your question. If you have no idea about this kind of pattern, google AngularJS and look at some examples.
"I do see an XMLHttpRequest made, not sure what that means, or if that is the clue to my answer."
It means that JavaScript embedded in the page is sending an extra HTTP request to the web server. It is likely that the "Agilent Technologies Inc." text is being returned in the server's response to that request, and the JavaScript in the page then injects the text into the DOM in the appropriate place.
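For scraping, that means you should replicate the XMLHttpRequest rather than parse the initial HTML. A minimal sketch with python-requests (the endpoint and JSON keys are hypothetical; copy the real URL from the Network tab of Chrome Developer Tools):

import requests

# Placeholder URL: replace it with the XHR endpoint shown in the Network tab.
resp = requests.get(
    "https://www.zoominfo.com/some/search/endpoint",
    headers={"User-Agent": "Mozilla/5.0"},  # some sites reject bare clients
)
resp.raise_for_status()

# XHR endpoints usually return JSON, so BeautifulSoup is not even needed;
# the key names below are illustrative only.
for record in resp.json().get("companies", []):
    print(record.get("companyName"))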
"Where is the data stored on the website?"
That is a completely different question ...
(You have already noted that the data (e.g. the company name) gets injected into the page displayed by your browser.)
On the server side, the data could be stored in the web server (or its back-end systems) in a variety of ways. Or it might not be stored at all. There is no way of knowing ... without looking at the server-side code and configurations.
So, I've been trying to get a web page to display links to the videos I have (served over a symbolic link) dynamically, i.e., without hardcoding an <a></a> tag for each one, and I think I may have found a solution, albeit a hacky one:
Video
Ignoring that this is a horrible way to do it, does anyone know how to format the resulting directory listing?
I'm guessing there is an Apache config file somewhere, but it is extremely hard to search for, as I do not know what this kind of bare file listing is called.
I'm basically looking to resize the widths of the columns, and maybe even do some pretty-fication.
This is all running on my web/file server and is being accessed from my local machine.
This is what you're looking for:
http://perishablepress.com/better-default-directory-views-with-htaccess/
That tutorial details how Apache's directory listing can be modified to suit your taste using an .htaccess file.
Using the Apache HeaderName and ReadmeName directives and the mod_autoindex module, you can add custom markup to your directory listing pages.
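For instance, a few illustrative lines in .htaccess (the paths, description, and file name below are placeholders, not from the tutorial):

# Illustrative .htaccess for mod_autoindex directory listings; requires
# AllowOverride to permit Options and Indexes overrides.
# HeaderName/ReadmeName inject custom markup above and below the listing,
# and NameWidth=* resizes the file-name column.
Options +Indexes
IndexOptions FancyIndexing HTMLTable NameWidth=* DescriptionWidth=*
HeaderName /autoindex/header.html
ReadmeName /autoindex/footer.html
AddDescription "Holiday video, 2012" holiday-2012.mp4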
For displaying links to A/V and other files, look at my website: https://wrcraig.com/ApacheDirectoryDescriptions.
It goes beyond the default directory description, providing a spreadsheet to assist in creating detailed descriptions and exporting them in FancyIndex/AddDescription format for inclusion in .htaccess.
It also provides a menu-driven, BASH-scripted alternative that uses the FancyIndex descriptive data above (automatically adding A/V durations) to recursively populate a custom index.html while retaining the security features of .htaccess.
The site has examples of the input spreadsheet and both the FancyIndex output and the optional BASH scripted output.
Look at a random Wikipedia article like http://en.wikipedia.org/wiki/Impostor_syndrome: there's no .html attached to the end of the address. In fact, if I do try to put .html after it, Wikipedia tells me "Wikipedia does not have an article with this exact name." How come it doesn't need any file extension?
More a superuser question?
There is no law saying that an HTML file has to end in .html or .htm, and since the wiki generates pages from a database, there is really no file there anyway (except in a cache).
Not having .htm or .php is more sensible anyway: why should you care what technology a site uses when you ask for a URL? It would be like having to put the recipient's operating system at the end of their email address.
If you make a call to a website, the request probably looks like
www.example.com/siteA/index.html
This request just tells the web server that you want to see a resource called index.html in siteA.
The website that runs on this server has to determine what you want to see and how the data is loaded.
index.html could be a file in the siteA directory,
or
it could be a row with the key "index.html" in the siteA table in your database.
So the part siteA/index.html is just a resource identifier. The grammar of this resource identifier is completely free and is determined per website.
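A toy Python sketch of that idea (everything here is illustrative):

import os

# Fake "siteA table": a row keyed by resource name.
PAGES_TABLE = {"index.html": "<h1>Hello from the database</h1>"}

def resolve(resource):
    path = os.path.join("siteA", resource)
    if os.path.exists(path):  # option 1: a real file in the siteA directory
        with open(path) as f:
            return f.read()
    return PAGES_TABLE.get(resource)  # option 2: a database row with that key

print(resolve("index.html"))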
URL rewriting is also common, to make URLs easier to read and remember.
For example, there could be a rewrite rule that accomplishes the following:
if the user enters something like
www.example.com/download/demo.zip
the URL is rewritten so your website sees it as:
www.example.com/download.php?file=demo.zip
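With Apache's mod_rewrite, that rule could look roughly like this in .htaccess (a sketch of the example above, not a tested configuration):

# Internally rewrite /download/demo.zip to /download.php?file=demo.zip.
RewriteEngine On
RewriteRule ^download/(.+)$ /download.php?file=$1 [L]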
Wikipedia's servers map the URL to the page you want. .html is just a naming convention that today is mostly historical, from the period of static pages when URLs actually were the names of files on the server. In fact, there may be no file at all: the server queries the database and a web framework sends out the HTML on the fly.
Wikipedia is most likely using the Apache module mod_rewrite so that URLs do not have to map directly to file-system paths.
See: http://en.wikipedia.org/wiki/Rewrite_engine#Web_frameworks
However, programming languages can also take control of incoming URLs and return data depending on the structure of the link, according to some set of rules; for example, the Django web framework employs a URL dispatcher.
That's because Wikipedia uses MediaWiki's short URL feature.
Actually, when you search, it really loads a PHP file. Try searching for a word that doesn't exist, for example "Pazaz". The URL is http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz . Notice index.php in the URL.
To tell the truth, it's not so much a MediaWiki feature as an Apache one. For further info, see http://www.mediawiki.org/wiki/Manual:Short_URL .
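The rewrite that manual recommends is roughly this (simplified):

# Requests for /wiki/Some_Article are internally handed to the real
# index.php script; the browser never sees a file extension.
RewriteEngine On
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/w/index.php [L]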
URL routing is your answer. For example, in ASP.NET (the excerpt below is quoted from the framework's documentation):
The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL-mapping rules within your applications. The routing system has two main purposes:
map incoming URLs to the application and route them so that the right Controller and Action method executes to process them;
construct outgoing URLs that can be used to call back to Controllers/Actions (for example: form posts, links, and AJAX calls).
I would suggest that sites like this use some sort of Model-View-Controller framework, similar to Ruby on Rails, where the URL 'directories' form part of a request/URL route.
In MVC-based frameworks, the URL 'directories' can dictate which View/Controller to use, as well as what action should be taken with the data.
E.g.: shop.com/product/carrots
Here, product names a view/controller and carrots is the data. The framework then decides which action/route to take; the default could be viewing the product information and the price of the carrots.
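A minimal sketch of the same idea in Python's Flask (the route and response are illustrative):

from flask import Flask

app = Flask(__name__)

# The path segment after /product/ is captured and passed to the handler,
# so /product/carrots renders the product page for "carrots".
@app.route("/product/<name>")
def product(name):
    return "Product page for %s" % name

if __name__ == "__main__":
    app.run()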