How does DHT work? - language-agnostic

How does DHT work? - language-agnostic

I grabbed the basic idea about DHT from wiki:
Store Data:
In a DHT-network, every node is responsible for a specific range of key-space. To store a file in the DHT, first, hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT, the message will be relayed to the node which is responsible for key and that node will store the pair (key, file-content).
Get Data:
When getting a file from DHT, first, hash the file's name to get the key; second send a message get(key) to any node, relay the message until...
Questions:
To store a file, we can hash the file's name to get its key, but wiki says:
In the real world the key k could be a hash of a file's content rather
than a hash of a file's name to provide content-addressable storage,
so that renaming of the file does not prevent users from finding it.
Hash file's content? How am I supposed to know the file's content? If I've already know the file's content, then WHY would I search it in the DHT?
According to the wiki, every participating node will spare some space to store files. So does it mean that, if I participate in a DHT, I have to spare 10G disk space to store those files whose key falls into the specific key-space I'm responsible for?
If indeed I should spare some disk space to store those files, then how should I store those (key, file-content) on the disk? I mean, should the file be arranged into a B-tree or something on my disk?
When a query happens, how does my computer respond? I assume, first, check the queried key, if in my key-space, then find the corresponding file on my disk. right?

A DHT is just an algorithm. At its base it provides distributed key-value PUT and GET operations. Similar to a normal Map or associative array found in many programming languages.
Due to the real-world limitations such as untrustworthy nodes, failure rates and so on actual DHT implementations don't provide an arbitrary-length PUT(<uint8[]>, <uint8[]>) operation.
Example:
The kademlia implementation for bittorrent for example provides the following interfaces:
PUT(uint8[20], uint16)
GET(uint8[20]) -> List<Pair<IP, uint16>> where the list only represents a randomly sampled subset of the actual data
As you can see it actually is a specialized asymmetric interface when compared to more generic associative arrays.
The IP address is always derived from the PUT sender's source address, i.e. cannot be explicitly set.
And the GET returns a list instead of a single value, so it implements a MultiMap or Map<List>, if you want to see it like that.
In bittorrent's case a hash is used as content descriptor, where peers which have the content announce themselves on the DHT. If someone wants the file(s) they look up IP/Port pairs on the DHT, then contact the peers via a separate protocol and then download the data.
But other uses for a DHT are also possible, i.e. they could store signed, structured data, tweet-like text snippets or whatever. It always depends on your applications' needs.
It's just a basic building block.

Related

Storing NFT metadata on IPFS with BaseURI, can I update the JSON dynamically for tokens when they are minted?

I'm trying to create a collection of 10,000 (ERC-721) tokens, whose metadata are stored on IPFS.
Each image associated to a token would be uploaded on IPFS beforehand with its unique CDI.
Since tokens will not be minted all at once, at first I want each metadata json to be empty and link to a placeholder image.
My question is: without setting specific TokenURI in my contract (which I want to avoid), how can I change the json file associated with a token when it's minted, without changing the BaseURI which must be common for all tokens?
This is how it should work:
ipfs://Qx000000000000000000/1.json // json file points to nothing
// token 1 is minted
ipfs://Qx000000000000000000/1.json // json file is updated but keeps the same ipfs base URI
I guess it should involve IPNS, but I can't find a specific guide on the best practice for this. Although I see this method is used all the time, for example even by the Bored Ape Yacht Club collection.

What is "representation", "state" and "transfer" in Representational State Transfer (REST)?

I came across a few resources regarding REST but I'm not able to understand things clearly. It would help me if someone could explain things with respect to my example below.
I have a table named User
User table Content
id name
1 xxx
The URL I will be calling is /test/1
The result will be in JSON format, eg: { 1: "xxx" }
My understanding so far regarding REST:
Resource - User table content
Representation - table/JSON
State transfer - data in the form of table to JSON.
Please do let me know if my understanding is correct.
Else, please answer the below questions:
What is a Resource in my example?
What is a Representation in my example?
What is a state transfer or when does this happen in my example?

REST is about resource state manipulation through their representations on the top of stateless communication between client and server. It's a protocol independent architectural style but, in practice, it's commonly implemented on the top of the HTTP protocol.
When designing REST over HTTP, URLs are used to locate the resources, HTTP methods are used to express the operations over the resources and representations such as JSON and/or XML documents are used to represent the state of the resource. HTTP headers can be used to exchange some metadata about the request and response while HTTP status code are used to inform the client regarding the status of the operation.
What is a resource in my example?
Understand resource as the concept of a user. Don't think about the table in your database, think about an abstraction of a user with their set of attributes.
What is a representation in my example?
A JSON document can be used to represent the state of a particular resource. A resource can have many representations, such as JSON and/or XML documents, and the client can use content negotiation to request different representations of the same resource.
What is a state transfer or when does this happens in my example?
The state of a given resource can be retrieved and manipulated using representations.
A GET request, for example, allows you to retrieve a representation of the state of a resource, sent in the response payload. A PUT request, for example, allows you to replace the state of a resource with the state defined by the representation enclosed in the request payload.
Example
Consider a user resource with attributes such as id and name stored somehow in your server:
ID: 1
Name: John Doe
These details make the state of the resource.
A URL such as /users/1 can be used to locate the resource in your server.
Requests such as GET, PUT and DELETE can be performed against this URL to retrieve/manipulate the state of the resource using representations, such as JSON and/or XML documents (other representations can be supported according to your needs):
{
"id": 1,
"name": "John Doe"
}
<user>
<id>1</id>
<name>John Doe</name>
</user>
The above shown documents are not the resource itself. They are just a way to represent the resource. which is stored somehow in your server.

If you want to understand REST, you should really start from the source: Fielding's thesis.
What is a Resource in my example?
OK, review of the term:
The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
In other words, the "resource" is the concept you are talking about. In this case, the user with name xxx. But it could be anything - the table that holds the data about the user with name xxx is also a "resource".
What is a Representation in my example?
Representations are fundamentally byte arrays
A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.
So your json document -- more precisely, the utf-8 encoded byte array, is a representation. A given resource might have many representations at any given time.
What is a state transfer or when does this happens in my example?
When the client and server exchange messages; the client server architectural style is the first of the architectural constraints in the REST architectural style.

TimeStampToken Storage in MySQL or Oracle?

I'm getting a TimeStampToken (RFC3161) by using a java based client.
I need to store all the information included in TSTInfo in a database, MySql or Oracle.Is there any specific format to store it?

There is no specified format1 for this kind of thing.
But some obvious alternatives spring to mind:
Store the DER-encoded form as a BLOB.
Take the DER-encoded form, base-64 encoded it and store it in a CHAR(n) column.
Create a table with columns to represent each of the fields of the TSSInfo structure ... assuming that you are already decoding it.
Serialize the Java object representation using the Java serialization protocol, XML, JSON, etcetera.
and so on.
1 - Actually, according to Wikipedia, there is an encoding for ASN.1 called XER that is represented using XML.

Note that if you only store the TSTInfo, you lose the signature, which is the whole point of having an RFC3161 token. The TSTInfo without the signature proves nothing!
To preserve its evidentiary property, you really should store the entire timestamp token (which is defined as the signed CMS ContentInfo that wraps the TSTInfo).
In terms of what format to use, probably chapter 3.2 of the RFC3161 specification (https://www.rfc-editor.org/rfc/rfc3161) can be helpful (which is only a suggestion though):
3. Transports
There is no mandatory transport mechanism for TSA messages in this
document. The mechanisms described below are optional; additional
optional mechanisms may be defined in the future.
[...]
3.2. File Based Protocol
A file containing a time-stamp message MUST contain only the DER
encoding of one TSA message, i.e., there MUST be no extraneous header
or trailer information in the file. Such files can be used to
transport time stamp messages using for example, FTP.
So, I would store the DER encoded CMS ContentInfo (not of the TSTInfo) as a BLOB

What are REST resources?

What are REST resources and how do they relate to resource names and resource representations?
I read a few articles on the subject, but they were too abstract and they left me more confused than I was before.
Is the following URL a resource? If it is, what is the name of that resource and what is its representation?
http://api.example.com/users.json?length=2&offset=5
The GET response of the URL should look something like:
[
{
id: 6,
name: "John"
},
{
id: 7,
name: "Jane"
}
]

The reason why articles on REST resources are abstract is because the concept of a REST resource is abstract. It's basically "whatever thing is accessed by the URL you supply". So, in your example, the resource would be the list of two users starting at offset 5 in some bigger list. Note that, how the resource is implemented is a detail you don't care about unless you are the one writing the implementation.
Is the following URL a resource?
The URL is not a resource, it is a label that identifies the resource, it is, if you like, the name of the resource.
The JSON is a representation of the resource.

What’s a Resource?
A resource is anything that’s important enough to be referenced as a
thing in itself. If your users might “want to create a hypertext link
to it, make or refute assertions about it, retrieve or cache a
representation of it, include all or part of it by reference into
another representation, annotate it, or perform other operations on
it”, then you should make it a resource.
Usually, a resource is something that can be stored on a computer and
represented as a stream of bits: a document, a row in a database, or
the result of running an algorithm. A resource may be a physical
object like an apple, or an abstract concept like courage, but (as
we’ll see later) the representations of such resources are bound to be
disappointing. Here are some possible resources:
Version 1.0.3 of the software release
The latest version of the software release
The first weblog entry for October 24, 2006
A road map of Little Rock, Arkansas
Some information about jellyfish
A directory of resources pertaining to jellyfish
The next prime number after 1024
The next five prime numbers after 1024
The sales numbers for Q42004
The relationship between two acquaintances, Alice and Bob
A list of the open bugs in the bug database
The text is from the O'Reilly book "RESTful Web Services".

The URL is never a resource or its name or its representation.
URL just tells where the resource is located and You can invoke GET,POST,PUT,DELETE etc on this URL to invoke the resource.
Data responded back are the resources while the form of the data is its representation.
Let's say Your URL with given GET parameters can output a JSON resource - this is the JSON representation of this resource. While with other flag in the GET it could respond with the same data in XML - that will be another representation of the very same resource.
EDIT: Due to the comments to the OP and to my answer I'm adding another explanations.
Also the resource name is considered to be the 'script name', e.g. in this case it is users.json while this resource name is self describing the resource representation itself - when calling this resource we expect the resource is in JSON, while when calling e.g. users.xml we would expect the data in XML.
When I change the offset parameter in GET the response contains
different data set - is it a new resource or its representation?
When I define which columns are returned in response in GET, is it a different resource or different representation, or?
Well, here the problem and answer are clear - we still call the same URL, the server responses with the data in the same form (still it is JSON), data still contains information about users - just the information itself has changed due to the new offset parameter. So it is obvious that it is still the same resource with the same representation and the same resource name as before.
Second problem could be a little confusing. Though we are calling the same resource, though the resource contains the same data (just with only predefined column set) and though the data is in the same representation it could seem to us as a different resource. But due to the points in the paragraph above it is nor the different resource or different representation. Though the data set contains less information the requesting side (filtering this data set) should be considering this and behave accordingly. So again: it is the same resource with the same resource name and the same resource representation.

REST
This architectural style was defined in the chapter 5 of Roy T. Fielding's dissertation.
REST is about resource state manipulation through their representations on the top of stateless communication between client and server. It's a protocol independent architectural style but, in practice, it's commonly implemented on the top of the HTTP protocol.
Resources
The resource itself is an abstraction and, in the words of the author, a resource can be any information that can be named. The domain entities of an application (e.g. a person, a user, a invoice, a collection of invoices, etc) can be resources. See the following quote from Fielding's dissertation:
5.2.1.1 Resources and Resource Identifiers
The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. [...]
Resource representations
A JSON document is resource representation that allows you to represent the state of a resource. A server can provide different representations for the same resource. For example, using XML and JSON documents. A client can use content negotiation to request different representations of the same resource.
Quoting Fielding's dissertation:
5.2.1.2 Representations
REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.
A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation. [...]
Over HTTP, request and response headers can be used to exchange metadata about the representation.
Resource identifiers
A URL a resource identifier that identifies/locates a resource in the server.
This answer may also be insightful.

What are REST resources and how do they relate to resource names and resource representations?
REST doesn't mean a great deal more then you use HTTP verbs (GET, POST, PUT, DELETE, etc) properly.
Is the following URL a resource?
All URLs are strings that tell computers where a resource can be located. (Hence the name: Uniform Resource Locator).

A resource is:
a noun
that is unique
and can be represented as data
and has at least one URI
I go into more detail on my blog post, What, Exactly, Is a RESTful Resource?

Representational State Transfer (REST) is a style of software architecture for distributed systems such as the World Wide Web. REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. Resources are a set of addressable objects, basically files and documents, linked using URLs. As correctly pointed out above by Quentin, REST archiecture simply implies that you'd use the HTTP verbs GET/POST/PUT/DELETE...

Conceptually you can think about a resource as everything which is accessible on the web using an URL.
If you stick to this rule http://api.example.com/users.json?length=2&offset=5 can be considered a resource

You've only provided what appear to be relative parameters rather than "ID" which is (or should be) concrete. Remember, get operations should be idempotent (i.e. repeatable with the same outcome).

What is REST?
REST is an architecture Style which stands for Representational(RE) State(S) transfer(T).
What is REST Resource ?
Rest Resource is data on which we want to perform operation(s).So this data can be present in database as record(s) of table(s) or in any other form.This record has unique identifier with which it can be identified like id for Employee.
Now when this data is requested by unique url like http://www.example.com/employees/123,so ultimately data or record which is present in database will be converted to JSON/XML/Plain text format by Rest Service and will be sent to Consumer.
So basically,what is happening here is REPRESENTATIONAL STATE TRANSFER, in a way that state of the data present in database is transferred to another format which can be JSON/XML or plain text.
So in this case 1 employee represents 1 resource which can be accessed by unique url like http://www.example.com/employees/123
In case we want to get list of all resources(employees),we will do:
http://www.example.com/employees
Hope this will help.

REST stands for REpresentational State Transfer. It's a method of transferring variable information from one location to another. A common method of doing this is using JSON - a way of formatting your variables so that they can be transferred without loss of information.
PHP, for example, has built in JSON support. Passing a PHP array to json_encode($array) will output a string in the format you posted (which by the way, is indeed a REST resource, because it gives you variables and information).
In PHP, the array you posted would turn out as:
Array (
[0]=>Array (
['id']=>6;
['name']=>'John';
)
[1]=>Array (
['id']=>7;
['name']=>'Jane';
)
)

How to detect hidden field tampering?

On a form of my web app, I've got a hidden field that I need to protect from tampering for security reasons. I'm trying to come up with a solution whereby I can detect if the value of the hidden field has been changed, and react appropriately (i.e. with a generic "Something went wrong, please try again" error message). The solution should be secure enough that brute force attacks are infeasible. I've got a basic solution that I think will work, but I'm not security expert and I may be totally missing something here.
My idea is to render two hidden inputs: one named "important_value", containing the value I need to protect, and one named "important_value_hash" containing the SHA hash of the important value concatenated with a constant long random string (i.e. the same string will be used every time). When the form is submitted, the server will re-compute the SHA hash, and compare against the submitted value of important_value_hash. If they are not the same, the important_value has been tampered with.
I could also concatenate additional values with the SHA's input string (maybe the user's IP address?), but I don't know if that really gains me anything.
Will this be secure? Anyone have any insight into how it might be broken, and what could/should be done to improve it?
Thanks!

It would be better to store the hash on the server-side. It is conceivable that the attacker can change the value and generate his/her own SHA-1 hash and add the random string (they can easily figure this out from accessing the page repeatedly). If the hash is on the server-side (maybe in some sort of cache), you can recalculate the hash and check it to make sure that the value wasn't tampered with in any way.
EDIT
I read the question wrong regarding the random string (constant salt). But I guess the original point still stands. The attacker can build up a list of hash values that correspond to the hidden value.

Digital Signature
Its probably overkill, but this sounds no different than when you digitally sign an outgoing email so the recipient can verify its origin and contents are authentic. The tamper-sensitive field's signature can be released into the wild with your tamper-sensitive field with little fear of undetectable tampering, as long as you protect the private key and verify the data and the signature with the public key on return.
This scheme even has the nifty property that you can limit "signing" to very protected set of servers/processes with access to the private key, but use a larger set of servers/processes provided with the public key to process form submissions.
If you have a really sensitive "do-not-tamper" field and can't maintain the hash signature of it on the server, then this is the method I would consider.
Although I suspect most are familiar with digital signing, here's some Wikipedia for any of the uninitiated:
Public Key Cryptography - Security
... Another type of application in
public-key cryptography is that of
digital signature schemes. Digital
signature schemes can be used for
sender authentication and
non-repudiation. In such a scheme a
user who wants to send a message
computes a digital signature of this
message and then sends this digital
signature together with the message to
the intended receiver. Digital
signature schemes have the property
that signatures can only be computed
with the knowledge of a private key.
To verify that a message has been
signed by a user and has not been
modified the receiver only needs to
know the corresponding public key. In
some cases (e.g. RSA) there exist
digital signature schemes with many
similarities to encryption schemes. In
other cases (e.g. DSA) the algorithm
does not resemble any encryption
scheme. ...

If you can't handle the session on the server, consider encrypting the data with your private key and generating an HMAC for it, send the results as the hidden field(s). You can then verify that what is returned matches what was sent because, since no-one else knows your private key, no-one else can generate the valid information. But it would be much better to handle the 'must not be changed' data on the server side.
You have to recognize that anyone sufficiently determined can send an HTTP request to you (your form) that contains the information they want, which may or may not bear any relation to what you last sent them.

If you can't/won't store the hash server side, you need to be able to re-generate it server-side in order to verify it.
For what it's worth, you should also salt your hashes. This might be what you meant when you said:
concatenated with a constant long
random string (i.e. the same string
will be used every time)
Know that if that value is not different per user/login/sesison, it's not actually a salt value.

As long as you guard the "constant long random string" with your life then your method is relatively strong.
To go further, you can generate one time / unique "constant long random string"'s.

What you are describing is similar to part of the implementation required for what are termed canaries and are used to mitigate Cross Site Request forgery attacks.
Generally speaking, a hidden input inside a HTML form contains an encrypted value that is posted back with a HTTP request. A browser cookie or a string held in session contains the same encrypted value such that when the hidden input value is decrypted and the cookie/session value is decrypted, the unencrypted values are compared to each other - if they are not identical, the HTTP request cannot be trusted.
The encrypted value might be an object containing properties. For example, in ASP.NET MVC, the canary implementation uses a class that contains properties for the authenticated username, a cryptographically pseudo random value generated using the RNGCryptoServiceProvider class, the DateTime (UTC format) at which the object was created and an optional salt string. The object is then encrypted using the AES Encryption algorithm with a 256 bit key and decrypted with the same key when the request comes in.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008