How to cryptographically verify web page requisites? - html

How to cryptographically verify web page requisites in HTML?
For example, if I have some external resource like an image, a style sheet or (most importantly) a script on a (potentially untrusted) content delivery network, is it possible to force the client browser to cryptographically verify the hash of the downloaded resource before usage? Is there some HTML attribute or URL scheme for this or does one manually have to write some JavaScript to do it?
The rationale is that providing the hashes in HTML served over HTTPS gives an extra defence against compromised (or faulty) CDNs.
Related questions on SO:
How secure are CDNs for delivering jQuery?

As of 23 June 2016 Subresource Integrity is a W3C Recommendation which allows you to do just that (draft version here). According to the Implementation Report it is already implemented in Firefox 43 and Chrome 45.
A simple example using subresource integrity would be something like:
<script src="https://example.com/example.js"
integrity="sha256-8OTC92xYkW7CWPJGhRvqCR0U1CR6L8PhhpRGGxgW4Ts="
crossorigin="anonymous"></script>
It is also possible to specify multiple algorithm-hash pairs (called metadata) in the integrity attribute, separated by whitespace; invalid data is ignored (§3.3.3). The client is expected to select the strongest metadata values (§3.3.4), and compare the hash of the actual data to the hash values in the set of the strongest metadata values (§3.3.5) to determine whether the resource is valid. For example:
<script src="https://example.com/example.js"
integrity="
md5-kS7IA7LOSeSlQQaNSVq1cA==
md5-pfZdWPRbfElkn7w8rizxpw==
sha256-8OTC92xYkW7CWPJGhRvqCR0U1CR6L8PhhpRGGxgW4Ts=
sha256-gx3NQgFlBqcbJoC6a/OLM/CHTcqDC7zTuJx3lGLzc38=
sha384-pp598wskwELsVAzLvb+xViyFeHA4yIV0nB5Aji1i+jZkLNAHX6NR6CLiuKWROc2d
sha384-BnYJFwkG74mEUWH4elpCm8d+RFIMDgjWWbAyaXAb8Oo//cHPOeYturyDHF/UcnUB"
crossorigin="anonymous"></script>
If the client understands SHA256 and SHA384, but not MD5, then it tokenizes the value of the integrity attribute by whitespace and throws away the md5- metadata tokens as garbage. The client then determines that the strongest hashes in the metadata are SHA384 and compares their values to the SHA384 hash of the actual data received.
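The selection logic described above can be sketched in Python. This is only an illustration under stated assumptions, not the browser's implementation: the names `SUPPORTED`, `digest_token`, `parse_metadata`, `strongest` and `matches` are made up for this example, and the optional `?options` syntax of the metadata is ignored.

```python
import base64
import hashlib

# Algorithms this sketch "understands", ordered weakest to strongest (assumption).
SUPPORTED = ["sha256", "sha384", "sha512"]


def digest_token(algorithm, data):
    """Build an 'algorithm-base64digest' token for the given bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return algorithm + "-" + base64.b64encode(digest).decode("ascii")


def parse_metadata(integrity):
    """Split on whitespace and drop tokens whose algorithm is not supported."""
    pairs = []
    for token in integrity.split():
        algorithm, sep, digest = token.partition("-")
        if sep and algorithm in SUPPORTED:
            pairs.append((algorithm, digest))
    return pairs


def strongest(pairs):
    """Keep only the pairs that use the strongest algorithm present."""
    if not pairs:
        return []
    best = max(pairs, key=lambda pair: SUPPORTED.index(pair[0]))[0]
    return [pair for pair in pairs if pair[0] == best]


def matches(data, integrity):
    """Return True if data satisfies the integrity metadata."""
    candidates = strongest(parse_metadata(integrity))
    if not candidates:
        # No usable metadata: treated here as "no integrity requirement".
        return True
    algorithm = candidates[0][0]
    actual = digest_token(algorithm, data).split("-", 1)[1]
    return any(actual == expected for _, expected in candidates)
```

Note that a correct SHA256 value does not help if a SHA384 value is present and wrong, because only the strongest set is compared.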

Related

Subresource Integrity and Nonce Values

Currently, for our web application, we are generating nonce values to attach to script tags. I have recently found out about Subresource Integrity and considering that we're using a CDN (as most examples reference), I was curious if this was something my web app should use.
Is there ever a case for both nonce and integrity attributes to be used? Is one better than the other? Or, do they support multiple use cases entirely?
Thanks
They support different use cases and you can use both.
Nonce instructs the browser to execute only <script> elements which have the same nonce value set in the CSP header.
nonce-*
A cryptographic nonce (only used once) to whitelist scripts. The server must generate a unique nonce value each time it transmits a
policy. It is critical to provide a nonce that cannot be guessed as
bypassing a resource's policy is otherwise trivial. This is used in
conjunction with the script tag nonce attribute. e.g.
nonce-DhcnhD3khTMePgXwdayK9BsMqXjhguVV
So let's say your application sets a Content-Security-Policy header like script-src 'nonce-r4nd0m'; then the script at good.com/good.js will be executed because the nonce value is the same.
<script nonce="r4nd0m" src="//good.com/good.js"></script>
What happens if an attacker compromises good.com and adds a malicious script to good.js? Your web application still allows the execution of that script, because the check is made on the nonce value, not on the script content. So you also need to be sure that the content of good.js remains the same.
This is where the integrity attribute comes in. It implements Subresource Integrity and tells the browser to run a resource only if its computed hash matches the one stored in the integrity attribute.
Subresource Integrity (SRI) is a security feature that enables
browsers to verify that resources they fetch (for example, from a CDN)
are delivered without unexpected manipulation. It works by allowing
you to provide a cryptographic hash that a fetched resource must
match.
So let's suppose the first time you included the script in the web app, the content of the script was safe and the integrity value was X. Then you added integrity="sha384-X" to the script element as follows.
<script src="//good.com/good.js"
integrity="sha384-X"
crossorigin="anonymous"></script>
The attacker modifies good.js so the resulting hash of the modified script becomes Y. The browser doesn't run the script because the computed hash (Y) and the required hash (X) don't match.
I think you can combine both like this.
<script nonce="r4nd0m" integrity="sha384-X" crossorigin="anonymous" src="//good.com/good.js"></script>
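To make the combination concrete, here is a minimal Python sketch of the server side: it generates a fresh nonce per response, builds the matching CSP header value, and emits a script tag that carries both the nonce and a SHA384 integrity value. The function names and the way the script body is obtained are assumptions for illustration; in practice the integrity value is usually computed once when the dependency is pinned.

```python
import base64
import hashlib
import html
import secrets


def make_nonce():
    # 16 bytes from the OS CSPRNG, base64-encoded; generate a new one per response.
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def csp_header(nonce):
    # Value for the Content-Security-Policy response header.
    return "script-src 'nonce-" + nonce + "'"


def script_tag(src, nonce, body):
    # body is the exact bytes of the script that was reviewed and trusted.
    digest = base64.b64encode(hashlib.sha384(body).digest()).decode("ascii")
    return (
        '<script nonce="' + html.escape(nonce) + '" '
        'integrity="sha384-' + digest + '" crossorigin="anonymous" '
        'src="' + html.escape(src) + '"></script>'
    )
```

The nonce changes on every response, while the integrity value only changes when the script content changes.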

ETags for server-side rendered pages that contain CSP nonce

I have a server-side-rendered React app, and Node/Express has so far been able to generate correct, stable ETags, which lets me take advantage of client-side caching.
Additionally, generated HTML contains fragments of render-blocking (above-the-fold) CSS and JS inlined as <script> and <style> tags for faster client-side first renders (as promoted by Google and its PageSpeed and Lighthouse tools).
Now I want to enable Content Security Policy (CSP), so I provide a nonce as an attribute on those <script> and <style> tags on every page request, to avoid unsafe-inline violations. However, the ever-changing nonce makes the ETag change on every request as well. HTML is never cached and every request hits my Express server.
Is there a way to combine simultaneously:
inlined CSS and JS
CSP features (that is nonce, or similar)
ETags or alternatives
?
So far I see a contradiction between current performance vs security guidelines.
Are there equivalents to CSP nonce or can CSP nonce be provided while keeping HTML intact? Is there a way to otherwise cache pages that contain CSP nonce?
Ideally, I would like a solution to be contained within the Express server, without resorting to tinkering with my reverse proxy config, but any options are welcome.
One solution is to leave the whole content generation and caching to the web application (Node in your case) and the CSP nonce generation to the front-end web server (e.g. Nginx). I have implemented it with Django, which does page caching with ETags, handles all the Vary header logic and so on, and the HTML it produces contains a static CSP nonce placeholder like this:
<script nonce="+++CSP_NONCE+++"> ... </script>
This placeholder is then filled in by Nginx using ngx_http_sub_module:
sub_filter_once off;
sub_filter +++CSP_NONCE+++ $ssl_session_id;
add_header Content-Security-Policy "script-src 'nonce-$ssl_session_id'";
I have seen solutions that use an additional Nginx module to generate a truly unique random nonce for each request, but I believe that is overkill, so I'm just using the TLS session identifier, which is unique to each connecting client and may be cached for some time (e.g. 10 minutes) depending on your Nginx configuration.
Just make sure the web application returns uncompressed HTML, as otherwise Nginx won't be able to do the string substitution.
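The split described above (stable ETag over placeholder markup, nonce filled in afterwards) can be sketched in a few lines of Python. This is not the Nginx setup from the answer; it only illustrates the idea, and the names `PLACEHOLDER`, `etag_for` and `finalize` are hypothetical.

```python
import hashlib
import secrets

PLACEHOLDER = "+++CSP_NONCE+++"


def etag_for(cached_html):
    # Hash the cached markup with the placeholder still in it,
    # so the tag stays the same across requests.
    return '"' + hashlib.sha256(cached_html.encode("utf-8")).hexdigest() + '"'


def finalize(cached_html):
    # Fill in a fresh nonce at the last moment, after caching decisions are made.
    nonce = secrets.token_urlsafe(16)
    body = cached_html.replace(PLACEHOLDER, nonce)
    headers = {
        "ETag": etag_for(cached_html),
        "Content-Security-Policy": "script-src 'nonce-" + nonce + "'",
    }
    return headers, body
```

Because the ETag is derived from the placeholder version, two requests for the same page get the same ETag even though the delivered bodies carry different nonces.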

What are the integrity and crossorigin attributes?

Bootstrapcdn recently changed their links. It now looks like this:
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css"
rel="stylesheet"
integrity="sha256-MfvZlkHCEqatNoGiOXveE8FIwMzZg4W85qfrfIFBfYc= sha512-dTfge/zgoMYpP7QbHy4gWMEGsbsdZeCXz7irItjcC3sPUFtf0kuFbDz/ixG7ArTxmDjLXDmezHubeNikyKGVyQ=="
crossorigin="anonymous">
What do the integrity and crossorigin attributes mean? How do they affect the loading of the stylesheet?
Both attributes have been added to Bootstrap CDN to implement Subresource Integrity.
Subresource Integrity defines a mechanism by which user agents may verify that a fetched resource has been delivered without unexpected manipulation.
The integrity attribute lets the browser check the fetched file, so that the code is never loaded if it has been manipulated.
The crossorigin attribute is present because the request is made using CORS, which SRI checking requires when the resource is not loaded from the same origin.
More info on crossorigin
More detail on Bootstrap CDNs implementation
integrity - defines the hash value of a resource (like a checksum) that has to be matched for the browser to execute it. The hash ensures that the file is unmodified and contains the expected data. This way the browser will not load different (e.g. malicious) resources. Imagine a situation in which your JavaScript files were hacked on the CDN, and there was no way of knowing it. The integrity attribute prevents loading content that does not match.
A resource whose SRI check fails is blocked (the error shows up in Chrome developer tools), whether or not the request is cross-origin.
Integrity can be calculated using: https://www.srihash.org/
Or by running this in a terminal:
openssl dgst -sha384 -binary FILENAME.js | openssl base64 -A
crossorigin - defines options used when the resource is loaded from a server on a different origin. (See CORS (Cross-Origin Resource Sharing) here: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS). It effectively changes the HTTP requests sent by the browser. If the crossorigin attribute is added, the browser adds an Origin: <ORIGIN> header to the HTTP request.
crossorigin can be set to either "anonymous" or "use-credentials". Both result in an Origin: header being added to the request. The latter, however, also sends credentials with the request and requires the server to allow them. With no crossorigin attribute in the tag, the request is sent without an Origin: header.
Here is an example that requests "use-credentials" from a CDN:
<script
src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.6/js/bootstrap.min.js"
integrity="sha384-vBWWzlZJ8ea9aCX4pEW3rVHjgjt7zpkNpZk+02D9phzyeVkE+jo0ieGizqPLForn"
crossorigin="use-credentials"></script>
A browser can cancel the request if crossorigin is set incorrectly.
Links
https://www.w3.org/TR/cors/
https://www.rfc-editor.org/rfc/rfc6454
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link
Blogs
https://frederik-braun.com/using-subresource-integrity.html
https://web-security.guru/en/web-security/subresource-integrity
Technically, the integrity attribute does just that: it enables verification of the data source. That is, it lets the browser compare the hash of the file it actually received from the CDN server with the hash declared in the page.
Going a bit deeper: the browser computes the cryptographic hash of the fetched file and checks it against the predefined value. If they match, the code executes and the user request is processed successfully.
The crossorigin attribute helps developers benefit from CDN performance while protecting the website code from malicious scripts.
In particular, crossorigin="anonymous" downloads the site's code in anonymous mode, without sending cookies or performing authentication. This prevents user data from leaking when the site is first loaded from a particular CDN server, whose address attackers could replace.
Source: https://yon.fun/what-is-link-integrity-and-crossorigin/

HTML Hyperlinks using url query strings

Some sites use hyperlinks like:
www.example.com/index.php?x=test.php
www.example.com/index.php?test.php
www.example.com/index.php?/test.php
instead of simply:
www.example.com/test.php
Are there any advantages to linking to other pages using query strings instead of simple hyperlinks?
When to use them
If the order is irrelevant or if they can be combined in different ways. In many cases, such as pagination, not all query strings are needed to get the desired result, so they are usually optional.
Advantages
They do not have the encoding issues of named params and are pretty much the way all other web apps also use such volatile params. So it is the de facto standard for web apps.
With query strings the short-routed action names (usually all index actions) don't pop up anymore as long as you don't use passed params.
URLs are passed in Referer headers: if a secure page uses resources such as JavaScript, images or analytics services, the URL is passed in the Referer request header of each embedded request. Sometimes the query string parameters may be delivered to and stored by third-party sites.
It can be useful while posting data to a web service, or you can even pass a session ID along with the URL in query parameters, provided they are encoded.
The type of notation that you mentioned is aided by a concept called URL rewriting.
Many PHP frameworks these days use an MVC architecture to organise code into three tiers. This enhances code scalability and the web application's security.
All the requests to the server are directed to index.php where they are resolved to load a particular action, thus hiding the background layout of the code.
Here test.php can be present either at the root or inside some other folder depending upon the location specified to the routing algorithms that are called in index.php
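The front-controller idea can be illustrated with a small Python sketch: every request goes to one entry point, which inspects the query string and dispatches to a handler. The handler functions and the `ROUTES` table are hypothetical; a real PHP framework has its own routing rules.

```python
from urllib.parse import parse_qs, urlsplit


def show_home():
    return "home page"


def show_test():
    return "test page"


# Only pages listed here can be reached; the on-disk layout stays hidden.
ROUTES = {"": show_home, "test.php": show_test}


def dispatch(url):
    query = urlsplit(url).query
    params = parse_qs(query, keep_blank_values=True)
    if "x" in params:
        # index.php?x=test.php
        page = params["x"][0]
    else:
        # index.php?test.php or index.php?/test.php
        page = query.lstrip("/")
    handler = ROUTES.get(page)
    return handler() if handler else "404 not found"
```

Looking the page name up in a fixed table, rather than including whatever file the query string names, is what keeps the entry point from exposing arbitrary files.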

What are REST resources?

What are REST resources and how do they relate to resource names and resource representations?
I read a few articles on the subject, but they were too abstract and they left me more confused than I was before.
Is the following URL a resource? If it is, what is the name of that resource and what is its representation?
http://api.example.com/users.json?length=2&offset=5
The GET response of the URL should look something like:
[
  {
    "id": 6,
    "name": "John"
  },
  {
    "id": 7,
    "name": "Jane"
  }
]
The reason articles on REST resources are abstract is that the concept of a REST resource is abstract. It's basically "whatever thing is accessed by the URL you supply". So, in your example, the resource would be the list of two users starting at offset 5 in some bigger list. Note that how the resource is implemented is a detail you don't care about unless you are the one writing the implementation.
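As a rough illustration of that distinction, the Python sketch below treats the URL as an identifier, resolves it to a resource (a slice of a user list), and then serialises that resource into a JSON representation. The backing list and all names except users 6 and 7 are invented for the example.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical backing store; only users 6 and 7 come from the question.
NAMES = ["Ann", "Bob", "Cy", "Di", "Ed", "John", "Jane", "Kim"]
USERS = [{"id": i, "name": name} for i, name in enumerate(NAMES, start=1)]


def resolve(url):
    """Return the resource the URL identifies: a slice of the user list."""
    params = parse_qs(urlsplit(url).query)
    offset = int(params.get("offset", ["0"])[0])
    length = int(params.get("length", [str(len(USERS))])[0])
    return USERS[offset:offset + length]


def represent(resource):
    """Produce one possible representation (JSON) of the resource."""
    return json.dumps(resource)
```

How `resolve` finds the users (a list here, a database elsewhere) is exactly the implementation detail the client does not need to know.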
Is the following URL a resource?
The URL is not a resource; it is a label that identifies the resource. It is, if you like, the name of the resource.
The JSON is a representation of the resource.
What’s a Resource?
A resource is anything that’s important enough to be referenced as a
thing in itself. If your users might “want to create a hypertext link
to it, make or refute assertions about it, retrieve or cache a
representation of it, include all or part of it by reference into
another representation, annotate it, or perform other operations on
it”, then you should make it a resource.
Usually, a resource is something that can be stored on a computer and
represented as a stream of bits: a document, a row in a database, or
the result of running an algorithm. A resource may be a physical
object like an apple, or an abstract concept like courage, but (as
we’ll see later) the representations of such resources are bound to be
disappointing. Here are some possible resources:
Version 1.0.3 of the software release
The latest version of the software release
The first weblog entry for October 24, 2006
A road map of Little Rock, Arkansas
Some information about jellyfish
A directory of resources pertaining to jellyfish
The next prime number after 1024
The next five prime numbers after 1024
The sales numbers for Q42004
The relationship between two acquaintances, Alice and Bob
A list of the open bugs in the bug database
The text is from the O'Reilly book "RESTful Web Services".
The URL is never a resource or its name or its representation.
The URL just tells where the resource is located, and you can invoke GET, POST, PUT, DELETE etc. on this URL to invoke the resource.
The data sent back are the resource, while the form of the data is its representation.
Let's say your URL with the given GET parameters can output a JSON resource: this is the JSON representation of this resource. With another flag in the GET it could respond with the same data in XML, which would be another representation of the very same resource.
EDIT: Due to the comments on the OP and on my answer, I'm adding further explanation.
Also, the resource name is considered to be the 'script name', e.g. in this case it is users.json, and this resource name itself describes the resource representation: when calling this resource we expect it to be in JSON, while when calling e.g. users.xml we would expect the data in XML.
When I change the offset parameter in GET the response contains
different data set - is it a new resource or its representation?
When I define which columns are returned in response in GET, is it a different resource or different representation, or?
Well, here the problem and answer are clear: we still call the same URL, the server responds with the data in the same form (it is still JSON), and the data still contains information about users. Only the information itself has changed due to the new offset parameter. So it is obvious that it is still the same resource with the same representation and the same resource name as before.
The second problem could be a little confusing. Though we are calling the same resource, though the resource contains the same data (just with only a predefined column set) and though the data is in the same representation, it could seem to us like a different resource. But due to the points in the paragraph above it is neither a different resource nor a different representation. Though the data set contains less information, the requesting side (filtering this data set) should take this into account and behave accordingly. So again: it is the same resource with the same resource name and the same resource representation.
REST
This architectural style was defined in the chapter 5 of Roy T. Fielding's dissertation.
REST is about resource state manipulation through their representations on the top of stateless communication between client and server. It's a protocol independent architectural style but, in practice, it's commonly implemented on the top of the HTTP protocol.
Resources
The resource itself is an abstraction and, in the words of the author, a resource can be any information that can be named. The domain entities of an application (e.g. a person, a user, an invoice, a collection of invoices, etc.) can be resources. See the following quote from Fielding's dissertation:
5.2.1.1 Resources and Resource Identifiers
The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. [...]
Resource representations
A JSON document is a resource representation that allows you to represent the state of a resource. A server can provide different representations for the same resource, for example XML and JSON documents. A client can use content negotiation to request different representations of the same resource.
Quoting Fielding's dissertation:
5.2.1.2 Representations
REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.
A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation. [...]
Over HTTP, request and response headers can be used to exchange metadata about the representation.
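A minimal Python sketch of the idea, one resource with two representations chosen from the request's Accept header, might look like this. The `USER` record and the function names are assumptions, and the Accept handling is deliberately naive (no q-values or wildcards).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical state of one resource.
USER = {"id": 6, "name": "John"}


def to_json(user):
    return json.dumps(user)


def to_xml(user):
    root = ET.Element("user")
    for key, value in user.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def respond(accept):
    """Return (Content-Type, body) for the same resource, depending on Accept."""
    if "application/xml" in accept:
        return "application/xml", to_xml(USER)
    return "application/json", to_json(USER)
```

The Content-Type value returned alongside the body is an example of the representation metadata that travels in HTTP headers.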
Resource identifiers
A URL is a resource identifier that identifies/locates a resource on the server.
This answer may also be insightful.
What are REST resources and how do they relate to resource names and resource representations?
REST doesn't mean a great deal more than using HTTP verbs (GET, POST, PUT, DELETE, etc.) properly.
Is the following URL a resource?
All URLs are strings that tell computers where a resource can be located. (Hence the name: Uniform Resource Locator).
A resource is:
a noun
that is unique
and can be represented as data
and has at least one URI
I go into more detail on my blog post, What, Exactly, Is a RESTful Resource?
Representational State Transfer (REST) is a style of software architecture for distributed systems such as the World Wide Web. REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. Resources are a set of addressable objects, basically files and documents, linked using URLs. As correctly pointed out above by Quentin, REST architecture simply implies that you'd use the HTTP verbs GET/POST/PUT/DELETE...
Conceptually you can think of a resource as everything which is accessible on the web using a URL.
If you stick to this rule http://api.example.com/users.json?length=2&offset=5 can be considered a resource
You've only provided what appear to be relative parameters rather than "ID" which is (or should be) concrete. Remember, get operations should be idempotent (i.e. repeatable with the same outcome).
What is REST?
REST is an architectural style whose name stands for Representational (RE) State (S) Transfer (T).
What is a REST resource?
A REST resource is data on which we want to perform operations. This data can be present in a database as records of tables, or in any other form. Each record has a unique identifier by which it can be identified, like an id for an Employee.
Now when this data is requested through a unique URL like http://www.example.com/employees/123, the data or record present in the database is converted to JSON/XML/plain text format by the REST service and sent to the consumer.
So basically, what is happening here is REPRESENTATIONAL STATE TRANSFER, in the sense that the state of the data present in the database is transferred in another format, which can be JSON/XML or plain text.
So in this case 1 employee represents 1 resource, which can be accessed through a unique URL like http://www.example.com/employees/123
If we want to get a list of all resources (employees), we request:
http://www.example.com/employees
Hope this will help.
REST stands for REpresentational State Transfer. It's a method of transferring variable information from one location to another. A common method of doing this is using JSON - a way of formatting your variables so that they can be transferred without loss of information.
PHP, for example, has built in JSON support. Passing a PHP array to json_encode($array) will output a string in the format you posted (which by the way, is indeed a REST resource, because it gives you variables and information).
In PHP, the array you posted would turn out as:
$users = array(
    array(
        'id' => 6,
        'name' => 'John',
    ),
    array(
        'id' => 7,
        'name' => 'Jane',
    ),
);