I have a newbie question about IPFS content. I can request content with a hash address. Is the content returned by the hash address always encrypted? Or can the content returned be either encrypted or plain, and if it is encrypted, is a private key required to decode and view the content?
Content is never encrypted on IPFS by default; it is only encrypted if you explicitly encrypt it before adding it to an IPFS node. So whether the content returned by the "hash address", also known as a CID (Content Identifier), is encrypted depends entirely on whether it was encrypted before being added to an IPFS node. There is also no way to tell whether the content is encrypted just by looking at the CID.
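If you do want confidentiality, you encrypt the data yourself before adding it. Here is a minimal sketch of that workflow, assuming the openssl and ipfs command-line tools are installed; the file names and passphrase are invented for illustration.

#!/usr/bin/perl
# Illustrative only: encrypt a file *before* adding it to IPFS.
use strict;
use warnings;

# Encrypt secret.txt to secret.enc with AES-256-CBC via the openssl CLI.
system('openssl', 'enc', '-aes-256-cbc', '-pbkdf2',
       '-in', 'secret.txt', '-out', 'secret.enc',
       '-pass', 'pass:my-strong-passphrase') == 0
    or die "encryption failed: $?";

# Add only the ciphertext. The resulting CID references encrypted bytes,
# so anyone who fetches it still needs the passphrase to read the content.
my $cid = `ipfs add -Q secret.enc`;   # -Q prints only the final hash
chomp $cid;
print "encrypted content stored under CID $cid\n";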
In order to properly examine the content referenced by a CID, you will need to know the format of the content and how it was generated. Most content on IPFS today is likely to be of type UnixFS, although it is entirely possible to store arbitrary types using IPLD. If you attempt to examine content referenced by a CID without knowing its format, you will almost certainly receive errors.
EDIT:
One thing I should clarify: you may very well be using an encrypted transport to receive the data, but the data itself won't be encrypted. IPFS uses a few different transports, such as WebSockets, TLS, secio, and plaintext. So if you're talking to a node that hosts QmA and that supports the TLS transport, the node could send you the data over TLS.
How to cryptographically verify web page requisites in HTML?
For example, if I have some external resource like an image, a style sheet or (most importantly) a script on a (potentially untrusted) content delivery network, is it possible to force the client browser to cryptographically verify the hash of the downloaded resource before usage? Is there some HTML attribute or URL scheme for this or does one manually have to write some JavaScript to do it?
The rationale is that providing the hashes in HTML served over HTTPS provides an extra defence against compromised (or faulty) CDNs.
Related questions on SO:
How secure are CDNs for delivering jQuery?
As of 23 June 2016, Subresource Integrity is a W3C Recommendation which allows you to do just that (draft version here). According to the Implementation Report, it is already implemented in Firefox 43 and Chrome 45.
A simple example using subresource integrity would be something like:
<script src="https://example.com/example.js"
integrity="sha256-8OTC92xYkW7CWPJGhRvqCR0U1CR6L8PhhpRGGxgW4Ts="
crossorigin="anonymous"></script>
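The value after the sha256- prefix is simply the base64-encoded SHA-256 digest of the file. Here is a hedged sketch of computing it yourself; the file name is illustrative, and MIME::Base64 is used because Digest::SHA's own *_base64 helpers omit the trailing = padding that integrity values normally carry.

#!/usr/bin/perl
# Compute the "sha256-..." integrity value for a local copy of a script.
use strict;
use warnings;
use Digest::SHA;
use MIME::Base64 qw(encode_base64);

my $sha = Digest::SHA->new(256);
$sha->addfile('example.js', 'b');            # 'b' = binary-mode read
my $b64 = encode_base64($sha->digest, '');   # '' suppresses the newline
print qq{integrity="sha256-$b64"\n};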
It is also possible to specify multiple algorithm-hash pairs (called metadata) in the integrity field, separated by whitespace, with invalid data being ignored (§3.3.3). The client is expected to filter out the strongest metadata values (§3.3.4) and compare the hash of the actual data to the hash values in the set of the strongest metadata values (§3.3.5) to determine whether the resource is valid. For example:
<script src="https://example.com/example.js"
integrity="
md5-kS7IA7LOSeSlQQaNSVq1cA==
md5-pfZdWPRbfElkn7w8rizxpw==
sha256-8OTC92xYkW7CWPJGhRvqCR0U1CR6L8PhhpRGGxgW4Ts=
sha256-gx3NQgFlBqcbJoC6a/OLM/CHTcqDC7zTuJx3lGLzc38=
sha384-pp598wskwELsVAzLvb+xViyFeHA4yIV0nB5Aji1i+jZkLNAHX6NR6CLiuKWROc2d
sha384-BnYJFwkG74mEUWH4elpCm8d+RFIMDgjWWbAyaXAb8Oo//cHPOeYturyDHF/UcnUB"
crossorigin="anonymous"></script>
If the client understands SHA256 and SHA384, but not MD5, then it tokenizes the value of the integrity attribute by whitespace and throws away the md5- metadata tokens as garbage. The client then determines that the strongest hashes in the metadata are SHA384 and compares their values to the SHA384 hash of the actual data received.
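As a rough sketch of that filtering step (my own simplification of §3.3.3-§3.3.5, not the spec's exact algorithm; the integrity string is taken from the example above):

#!/usr/bin/perl
# Tokenize an integrity value, drop unknown algorithms, keep the strongest.
use strict;
use warnings;

my %strength = (sha256 => 1, sha384 => 2, sha512 => 3);  # supported algorithms

my $integrity = 'md5-kS7IA7LOSeSlQQaNSVq1cA== '
              . 'sha256-8OTC92xYkW7CWPJGhRvqCR0U1CR6L8PhhpRGGxgW4Ts= '
              . 'sha384-pp598wskwELsVAzLvb+xViyFeHA4yIV0nB5Aji1i+jZkLNAHX6NR6CLiuKWROc2d';

# Split on whitespace; discard tokens whose algorithm we don't recognize.
my @known = grep { my ($alg) = /^([a-z0-9]+)-/; $alg && $strength{$alg} }
            split /\s+/, $integrity;

# Find the strongest recognized algorithm and keep only its tokens.
my ($best) = sort { $strength{$b} <=> $strength{$a} }
             map  { /^([a-z0-9]+)-/ ? $1 : () } @known;
my @strongest = grep { /^\Q$best\E-/ } @known;

print "compare the resource's $best digest against:\n";
print "  $_\n" for @strongest;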
When an image is uploaded from the client's machine to the client (browser), it requires the FileReader() API in HTML; thereafter a base64-encoded URL (say) of the image is sent in chunks to the server, where it needs to be re-assembled. All of this is taken care of by the developer.
However, when an image is sent from the server to the client, only the directory path of the image on the server machine suffices; no chunking or encoding is required.
My questions are:
1. Does the server send the image in chunks to the HTML file? If it does not, how does sending full images not bottleneck the server's network? What would happen in the case of large video files?
2. In what form of binary data is the image sent to the client - base64url / ArrayBuffer / binary string / text / etc.?
3. If the server does send the image in chunks, who is doing the chunking and the re-assembly on the client thereafter?
Thanks.
HTML isn't really important here. What you care about are the transport protocols used - HTTP and TCP, most likely.
HTTP isn't chunked by default, though there are headers (such as Range) that do allow that - those are mostly used to support seeking in large files (e.g. PDF, video). Technically, this isn't really chunking - it's just the infrastructure for partial downloads (i.e. "Give me data from byte 1024 to byte 2048.").
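You can see a partial download in action by sending a Range header yourself; a server that supports it answers with 206 Partial Content. A small sketch with LWP::UserAgent (the URL is made up):

#!/usr/bin/perl
# Request only bytes 1024-2048 of a resource via an HTTP Range header.
use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('https://example.com/video.mp4',
                   'Range' => 'bytes=1024-2048');

print $res->status_line, "\n";              # expect "206 Partial Content"
print 'received ', length($res->content), " bytes\n";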
TCP is a stream-based protocol. From the programmer's point of view, that's all there is to it - no chunking. Technically, though, it will process your input data and send it as distinct packets that are reassembled in order on the other side. This is a matter of practicality - it allows you to manage data latency, streaming, packet retransmission etc. TCP handles the specifics during connection negotiation - flow control, window sizing, congestion control...
That's not the end of it, though. All the other layers add their own bits - their own ways to package the payload and split it as necessary, their own ways to handle routing and filtering, their own ways to handle congestion...
Finally, just like HTTP natively supports downloads, it supports uploading data as well. Just send an HTTP request (usually POST or PUT) and write data in a format the server understands - anything from text through base-64 to raw binary data. The limiting factor in your case isn't the server, the browser or HTTP - it's JavaScript. The basic mechanism is still the same - a request followed by a response.
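For example, a file upload is just another POST request; here is a short sketch with LWP::UserAgent (the endpoint and field name are invented):

#!/usr/bin/perl
# Upload an image as an ordinary multipart/form-data POST request.
use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->post(
    'https://example.com/upload',
    Content_Type => 'form-data',
    Content      => [ image => ['photo.png'] ],   # file part read from disk
);
print $res->status_line, "\n";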
Now, to address your questions:
1. The server doesn't send images to the HTML file. The HTML only contains a URL for the image[1], and when the browser sees a URL in an img tag, it initiates a new, separate request just for the image data. It isn't fundamentally different from downloading a file from a link. As for the transfer itself, it works pretty much exactly like the original HTML document - HTTP headers, then the payload.
2. Usually, raw binary data. HTTP is a text-based protocol, but its payload can be arbitrary. There's little reason to use Base64 to transfer image data (though in the past, there have been HTTP and FTP servers that didn't support binary at all, so you had to use something like Base64).
3. The HTTP server doesn't care (with the exception of the "partial downloads" mentioned above). The underlying network protocols handle this.
[1] Nowadays there are methods to embed images directly in the HTML text, but their practicality varies with image size, caching requirements etc.
For our last week in school (finals next week) our teacher decided to give us a crash course in Perl. We talked about all the differences we would encounter if we used Perl and then we started talking about "spoofing".
We were given an HTML example where a user could input their first and last names. Of course our example already had Mickey as the first name and Mouse as the last name.
<form action="action_page.php">
First name:<br>
<input type="text" name="firstname" value="Mickey">
<br>
Last name:<br>
<input type="text" name="lastname" value="Mouse">
<br><br>
<input type="submit" value="Submit">
</form>
At the end when you hit submit you were redirected to a new screen that said your first name is Mickey and your last name is Mouse.
Our teacher said "spoofing" is when you change the GET parameters in the URL, so instead of having
firstname=Mickey&lastname=Mouse
you would enter something like
firstname=baseball&lastname=bat
That would instantly alter the intended command, and you would end up with baseball as the first name and bat as the last name.
This all sounds pretty straightforward, until he said he wanted us to write a program to prevent spoofing without using a POST method.
Instead when a user attempts to spoof the system we're supposed to print out some anti-spoofing comment.
Unfortunately, we never really talked about spoofing aside from the examples. I've attempted to Google spoofing to see some example code, or at least understand this concept, but I haven't had much luck, or I haven't looked in the right places.
So I thought I would ask here. Can someone who is decent at Perl direct me towards basic anti-spoofing programs and content, or at least explain and show how spoofing is supposed to work?
What you need to do is to authenticate the data in the query string, and validate it when you receive it. There is a standard tool(set) for this: a cryptographic Message Authentication Code (MAC).
Basically, a MAC is a function that takes in a message (any arbitrary string) and a secret key, and outputs a random-looking token that depends, in a complicated way, on both the message and the key. Importantly, it is effectively impossible to compute a valid MAC token for a modified message without knowing the key.
To validate a query string (or some other data) with a MAC, you'd basically follow these steps (a Perl sketch follows the list):
Encode the data into a "canonical" form as a string. For an HTTP URL, you could just use the query string (and/or the entire URL) as it is, although you may wish to normalize it e.g. by %-decoding any characters that don't have to be encoded, and normalizing the case of any %-encoded values (e.g. %3f → %3F).
Alternatively, you could decode the query string into, say, an associative array, and serialize this array in a format of your choice. This can make it easier to combine parameters from multiple sources (e.g. hidden form fields), to add extra data fields (see below) and to choose which fields you want to validate.
Optionally, combine the data with any additional information you wish to associate it with, such as a user ID and/or a timestamp. (You can either transmit the timestamp explicitly, or just round it down to, say, the last hour, and check both the current and the previous timestamp when validating it.) Changing any of these values will change the MAC output, thus preventing attackers from e.g. trying to submit one user's data under another user's account.
Store a secret key (preferably, a securely generated random value of, say, 128 bits) on the server. Obviously, this secret key must be stored so that users cannot access it (e.g. by guessing the path to the config file).
Feed the canonically encoded data and the secret key into the MAC algorithm. Take the result and (if your MAC library doesn't do this for you) encode it in some convenient manner (e.g. using the URL-safe Base64 variant).
Append the encoded MAC token as an extra parameter in the URL.
When you receive the data back, remove the MAC token, feed the rest of the data back into the MAC generation code as described above, and check that the resulting MAC matches the one you received.
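Here is a minimal Perl sketch of those steps using HMAC-SHA256 from Digest::SHA. The parameter names and secret are illustrative; a real application should load the key from protected configuration and use a constant-time comparison.

#!/usr/bin/perl
# Authenticate a set of query parameters with an HMAC token.
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_base64);

# Server-side secret; never send this to the client.
my $secret = 'replace-with-a-long-random-key';

# Step 1: canonical form - sort the parameters so the same data
# always produces the same string.
sub canonicalize {
    my %params = @_;
    return join '&', map { "$_=$params{$_}" } sort keys %params;
}

# Compute the token to append to the URL.
my %data  = ( firstname => 'Mickey', lastname => 'Mouse' );
my $token = hmac_sha256_base64( canonicalize(%data), $secret );
print "token to append: $token\n";

# On the way back in: recompute from the received parameters and compare.
my $received = $token;   # would come from the request in real use
print $received eq hmac_sha256_base64( canonicalize(%data), $secret )
    ? "valid\n" : "tampered\n";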
MAC algorithms can be constructed from cryptographic hash functions like MD5 or SHA-1/2/3. In fact, a basic MAC can be obtained simply by concatenating the secret and the message, hashing them, and using the result as the token.
For some hash functions, like SHA-3, the simple MAC construction described above is actually believed to be secure; for older hash functions, which were not explicitly designed with this use in mind, however, it's safer to use the (slightly) more complicated HMAC construction, which hashes the input twice.
Alternatively, there are also MAC algorithms, such as CMAC, which are based on block ciphers (like AES) instead of hash functions. In some cases (e.g. on embedded platforms, where a fast hash function may not be available) these may be more efficient than HMAC; for a web application, however, the choice is essentially a matter of taste.
One difference between GET and POST is that the information for the former is passed in the URL itself. That means you can type what you like in the browser's address bar -- it doesn't have to have come from an HTML form. I think that's what is meant by spoofing here.
The most obvious protection is to calculate a CRC of all the protected fields -- in this case MickeyMouse -- and put that value in a hidden field of the HTML form sent out by the server. Then, when the request comes back, calculate the CRC of the same fields and check that it matches the value of the returned hidden field.
Of course that can be circumvented if the user works out how the protection functions and adds his own calculation of the CRC of his spoofed data as well. But this should be sufficient for a proof of concept.
If you want to detect whether a user has changed a parameter in the query string of a URL after a form has performed a GET action, then generate a client-side hash before the form is submitted. The hash would be based on the values of the form fields, and would then be compared to a recalculated hash based on the current parameter values on the response page. If the hashes don't match, the query string has been tampered with.
Here's a client-side crypto library to calculate the hashes: https://code.google.com/p/crypto-js/
Note this is only for educational use, and wouldn't provide enough security in the real world, as a person could also discover the hashing key by inspecting the page source and use that to generate their own hashes.
A POST method wouldn't prevent spoofing anyway. POST and GET do almost exactly the same thing - they send plain text encoded variables to a web server.
They're insanely easy to "spoof" - the point isn't the spoofing, it's that you shouldn't trust "user input" like that, ever.
I would suggest in the case of the names, it doesn't matter. So what if I fudge your web page to "pretend" I am called "baseball bat" instead?
If it's important - like, for example, ensuring I can only see my test results - then you need to handle the data processing server side. One method of doing this is via session tracking - so rather than including a field in a web form, I instead use a "session token".
You would 'send' me a username and password - ideally using a hash to make it impossible to 'see' as you're sending it, or in your browser history. And then I would check it against my server to see if that hash is 'valid', by performing the same operation on the server and comparing the two.
So perlishly:
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw ( sha1_base64 );

my ( $firstname, $lastname ) = qw ( Mickey Mouse );

# Round time down to a 5-minute window so the token expires quickly.
my $timewindow = int( time / 300 );

# The token depends on the time window and both protected fields.
my $token = sha1_base64( $timewindow . $firstname . $lastname );
print $token;
This produces a token that doesn't last long - it changes every 5 minutes - but it's extremely difficult to tamper with.
The reason for including the time is to avoid replay attacks, whereby if I look in your browser history, I can find "your" token and reuse it. (That's probably the next question after the "spoofing" one though :))
If you sent the parameters with the token, bear in mind that it's actually quite easy for a malicious actor to perform the same calculation themselves, and send some completely faked credentials and tokens.
This is something of a simplistic example though - because really, faked parameters shouldn't matter, because you shouldn't trust them in the first place. If 'Mickey Mouse' is valid, and 'baseball bat' isn't, then your server should detect that when processing the form, and discard the latter, which makes the whole 'form spoofing' thing irrelevant.
The question is rather narrowly phrased, so this answer might not quite address what you're asking. But as a matter of policy, if you don't want your users to tamper with your data you should not give them custody of it. Why are you relying on the query string for the user name if the server already knows it? Rely on the client for authentication and for new information, and rely on your records for any information that should stay beyond the user's control.
POST requests can be crafted almost as easily as GET requests, and cryptographic protection, even when it is secure, is only useful to the extent that the client cannot access the encrypted data; so why transmit it back and forth?
I grabbed the basic idea about DHTs from the wiki:
Store Data:
In a DHT network, every node is responsible for a specific range of the key space. To store a file in the DHT: first, hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT. The message will be relayed to the node responsible for that key, and that node will store the pair (key, file-content).
Get Data:
When getting a file from the DHT: first, hash the file's name to get the key; second, send a message get(key) to any node, and relay the message until...
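As a toy, single-process illustration of the put/get idea (a plain Perl hash stands in for the distributed key space; this is not a real DHT):

#!/usr/bin/perl
# Content-addressable put/get: the key is a hash of the *content*.
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

my %store;   # stands in for the whole distributed key space

sub put_content {
    my ($content) = @_;
    my $key = sha1_hex($content);   # key derived from content, not name
    $store{$key} = $content;
    return $key;                    # publish this key out of band
}

sub get_content {
    my ($key) = @_;
    return $store{$key};
}

my $key = put_content("hello, dht\n");
print "key:   $key\n";
print "value: ", get_content($key);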
Questions:
To store a file, we can hash the file's name to get its key, but the wiki says:

In the real world the key k could be a hash of a file's content rather than a hash of a file's name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.
Hash the file's content? How am I supposed to know the file's content? If I already knew the file's content, then WHY would I search for it in the DHT?
According to the wiki, every participating node will spare some space to store files. So does it mean that, if I participate in a DHT, I have to spare, say, 10 GB of disk space to store the files whose keys fall into the specific key space I'm responsible for?
If indeed I should spare some disk space to store those files, then how should I store those (key, file-content) pairs on disk? I mean, should the files be arranged into a B-tree or something on my disk?
When a query happens, how does my computer respond? I assume it first checks the queried key; if it is in my key space, it then finds the corresponding file on my disk. Right?
A DHT is just an algorithm. At its base it provides distributed key-value PUT and GET operations. Similar to a normal Map or associative array found in many programming languages.
Due to real-world limitations such as untrustworthy nodes, failure rates and so on, actual DHT implementations don't provide an arbitrary-length PUT(<uint8[]>, <uint8[]>) operation.
Example:
The Kademlia implementation for BitTorrent, for example, provides the following interface:
PUT(uint8[20], uint16)
GET(uint8[20]) -> List<Pair<IP, uint16>> where the list only represents a randomly sampled subset of the actual data
As you can see, it is actually a specialized, asymmetric interface compared to more generic associative arrays.
The IP address is always derived from the PUT sender's source address, i.e. cannot be explicitly set.
And the GET returns a list instead of a single value, so it implements a MultiMap or Map<List>, if you want to see it like that.
In BitTorrent's case a hash is used as the content descriptor: peers which have the content announce themselves on the DHT. If someone wants the file(s), they look up IP/port pairs on the DHT, then contact the peers via a separate protocol and download the data.
But other uses for a DHT are also possible, i.e. they could store signed, structured data, tweet-like text snippets or whatever. It always depends on your applications' needs.
It's just a basic building block.
On a form of my web app, I've got a hidden field that I need to protect from tampering for security reasons. I'm trying to come up with a solution whereby I can detect if the value of the hidden field has been changed, and react appropriately (i.e. with a generic "Something went wrong, please try again" error message). The solution should be secure enough that brute-force attacks are infeasible. I've got a basic solution that I think will work, but I'm no security expert and I may be totally missing something here.
My idea is to render two hidden inputs: one named "important_value", containing the value I need to protect, and one named "important_value_hash" containing the SHA hash of the important value concatenated with a constant long random string (i.e. the same string will be used every time). When the form is submitted, the server will re-compute the SHA hash, and compare against the submitted value of important_value_hash. If they are not the same, the important_value has been tampered with.
I could also concatenate additional values with the SHA's input string (maybe the user's IP address?), but I don't know if that really gains me anything.
Will this be secure? Anyone have any insight into how it might be broken, and what could/should be done to improve it?
Thanks!
It would be better to store the hash on the server side. It is conceivable that the attacker could change the value, generate his/her own SHA-1 hash, and add the random string (they can easily figure this out by accessing the page repeatedly). If the hash is on the server side (maybe in some sort of cache), you can recalculate the hash and check it to make sure that the value wasn't tampered with in any way.
EDIT
I read the question wrong regarding the random string (constant salt). But I guess the original point still stands. The attacker can build up a list of hash values that correspond to the hidden value.
Digital Signature
It's probably overkill, but this sounds no different from digitally signing an outgoing email so the recipient can verify that its origin and contents are authentic. The tamper-sensitive field's signature can be released into the wild along with the tamper-sensitive field itself with little fear of undetectable tampering, as long as you protect the private key and verify the data against the signature with the public key on return.
This scheme even has the nifty property that you can limit "signing" to very protected set of servers/processes with access to the private key, but use a larger set of servers/processes provided with the public key to process form submissions.
If you have a really sensitive "do-not-tamper" field and can't maintain the hash signature of it on the server, then this is the method I would consider.
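A hedged sketch of that idea using Crypt::PK::RSA from the CPAN CryptX distribution; the field value is invented, and in practice the key pair would be generated once, with the private key kept only on the signing servers.

#!/usr/bin/perl
# Sign a tamper-sensitive field; verify it with the public key on return.
use strict;
use warnings;
use Crypt::PK::RSA;
use MIME::Base64 qw(encode_base64 decode_base64);

my $pk = Crypt::PK::RSA->new();
$pk->generate_key(256, 65537);   # 256 bytes = a 2048-bit modulus

my $field = 'important_value=42';

# When rendering the form: sign, and ship the signature in a hidden input.
my $sig_b64 = encode_base64( $pk->sign_message($field), '' );

# On submission: verify; any server holding only the public key can do this.
print $pk->verify_message( decode_base64($sig_b64), $field )
    ? "untampered\n" : "tampered\n";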
Although I suspect most are familiar with digital signing, here's some Wikipedia for any of the uninitiated:
Public Key Cryptography - Security
... Another type of application in public-key cryptography is that of digital signature schemes. Digital signature schemes can be used for sender authentication and non-repudiation. In such a scheme a user who wants to send a message computes a digital signature of this message and then sends this digital signature together with the message to the intended receiver. Digital signature schemes have the property that signatures can only be computed with the knowledge of a private key. To verify that a message has been signed by a user and has not been modified the receiver only needs to know the corresponding public key. In some cases (e.g. RSA) there exist digital signature schemes with many similarities to encryption schemes. In other cases (e.g. DSA) the algorithm does not resemble any encryption scheme. ...
If you can't handle the session on the server, consider encrypting the data with your private key and generating an HMAC for it, send the results as the hidden field(s). You can then verify that what is returned matches what was sent because, since no-one else knows your private key, no-one else can generate the valid information. But it would be much better to handle the 'must not be changed' data on the server side.
You have to recognize that anyone sufficiently determined can send an HTTP request to you (your form) that contains the information they want, which may or may not bear any relation to what you last sent them.
If you can't/won't store the hash server side, you need to be able to re-generate it server-side in order to verify it.
For what it's worth, you should also salt your hashes. This might be what you meant when you said:
concatenated with a constant long random string (i.e. the same string will be used every time)
Know that if that value is not different per user/login/session, it's not actually a salt value.
As long as you guard the "constant long random string" with your life then your method is relatively strong.
To go further, you can generate one-time / unique "constant long random strings".
What you are describing is similar to part of the implementation required for what are termed canaries, which are used to mitigate Cross-Site Request Forgery (CSRF) attacks.
Generally speaking, a hidden input inside an HTML form contains an encrypted value that is posted back with an HTTP request. A browser cookie or a string held in session contains the same encrypted value, such that when the hidden input value is decrypted and the cookie/session value is decrypted, the unencrypted values are compared to each other - if they are not identical, the HTTP request cannot be trusted.
The encrypted value might be an object containing properties. For example, in ASP.NET MVC, the canary implementation uses a class that contains properties for the authenticated username, a cryptographically pseudo-random value generated using the RNGCryptoServiceProvider class, the DateTime (UTC format) at which the object was created, and an optional salt string. The object is then encrypted using the AES encryption algorithm with a 256-bit key and decrypted with the same key when the request comes in.
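A rough Perl sketch of that compare-two-copies idea using AES-CBC from the CPAN CryptX distribution. Key and IV handling are simplified here; real code would use a fresh random IV per message plus an authenticity check such as an HMAC.

#!/usr/bin/perl
# Canary check: the same encrypted value travels in a hidden input and in
# the session; on postback, decrypt both and compare.
use strict;
use warnings;
use Crypt::Mode::CBC;
use Crypt::PRNG qw(random_bytes);

my $cbc = Crypt::Mode::CBC->new('AES');
my $key = random_bytes(32);   # 256-bit key, held server-side only
my $iv  = random_bytes(16);

my $canary = 'user=alice|created=2016-06-23T12:00:00Z';

# One encrypted copy goes into the hidden input, one into the session.
my $hidden_field  = $cbc->encrypt($canary, $key, $iv);
my $session_value = $cbc->encrypt($canary, $key, $iv);

# On postback: decrypt both; mismatch means the request cannot be trusted.
my $ok = $cbc->decrypt($hidden_field, $key, $iv)
      eq $cbc->decrypt($session_value, $key, $iv);
print $ok ? "request trusted\n" : "request rejected\n";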