Ethereum receipts, block ID, and hashes

Preface: The question concerns the relationship between the content of Ethereum's transaction receipts and the hash of the block header.
Problem description: In Ethereum, the block ID is the block's hash. The header contains the root hash of the receipts trie. The receipt (as printed by clients) contains the block's ID.
But before putting the block's ID into a receipt, one would need to know the hash of the block itself (which in turn commits to the receipts trie) - in other words, we've got a circular dependency.
Now, I see 4 possibilities:
1) Receipts are not stored in the block of the transactions they belong to, but in subsequent blocks. This would break the cycle, but it would complicate things: there would need to be an incentive for other nodes to include external receipts, to distribute them, etc.
2) The receipts-trie hash does not take into account the block-ID field within receipts. This would leave some of the data malleable (not protected by the PoW).
3) The block ID is not based on the hash of the entire header, and thus does not take the receipts-trie root into account (again leaving some of the data malleable).
4) There is no block ID inside a receipt; but then I've seen these fields included in some of the JSON print-outs available here. Are these values appended implicitly by the command-processing interface?
Which one, or maybe another possibility, is it?

From the yellow paper, section 4.3.1:
The transaction receipt, R, is a tuple of four items comprising: the cumulative gas used in the block containing the transaction receipt as of immediately after the transaction has happened, R_u; the set of logs created through execution of the transaction, R_l; the Bloom filter composed from information in those logs, R_b; and the status code of the transaction, R_z:
R ≡ (R_u, R_b, R_l, R_z)    (20)
So the transaction receipt itself does not specify any details of the block whose receipt trie it's a part of.
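In other words, the circularity dissolves because only those four fields are committed to the receipts trie; contextual fields such as blockHash that appear in JSON output are attached by the node when it answers a query (the question's possibility 4). A toy Python sketch of the ordering - sha256 and repr() here are my stand-ins for Keccak-256 and RLP, purely for illustration:

import hashlib

def toy_hash(data):
    # Stand-in for Keccak-256 (Ethereum's actual hash); sha256 is illustrative only.
    return hashlib.sha256(data).hexdigest()

# Consensus receipt per the yellow paper: (status, cumulative gas, bloom, logs).
# Note there is no block hash anywhere in it.
receipt = (1, 21000, b"\x00" * 256, [])

# The receipts-trie root is derived from receipts alone, so it can be
# computed before the header - and hence the block hash - exists.
receipts_root = toy_hash(repr(receipt).encode())

# The header commits to that root; the block ID is the header's hash.
header = {"parentHash": "0x...", "receiptsRoot": receipts_root}
block_hash = toy_hash(repr(sorted(header.items())).encode())

# What a client prints for a transaction receipt: the stored consensus
# receipt decorated with contextual fields looked up from its database.
rpc_receipt = {"status": receipt[0], "cumulativeGasUsed": receipt[1],
               "blockHash": block_hash}  # added by the node, not part of the trie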

Related

How to count the number of blocks that need to be read

I have an examination paper on Operating System Concepts that contains a rather hard problem, and I have to review it before taking the exam. I don't quite understand the problem and don't know how to count the number of blocks of the file referred to in the problem. Please help me.
An operating system uses 32-bit pointers and allocates files with the linked-list method. Assume a data block has a size of 4 KB; the first 32 bits of each block contain the pointer and the rest contain the data. We also assume that the following function has been successfully called in an application program:
fd = open('myfile.bin',O_RDONLY);
and myfile.bin is a file of size 20480 bytes.
a) Count the number of blocks that need to be read (including the block that contains the pointer to the first block of the list) when we perform the following system calls:
lseek(fd,16385,SEEK_SET); read(fd,&c,1);
b) Count the number of blocks that need to be read when we perform the following system calls immediately after those in part a.
lseek(fd,2048,SEEK_CUR); read(fd,&c,1);
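The core of the problem is mapping a byte offset to a position in the block chain. A hedged Python sketch of that arithmetic (helper names are mine; whether the head-pointer block and any already-buffered block count as extra "reads" is a convention the exam fixes):

BLOCK_SIZE = 4096            # 4 KB per block
POINTER_SIZE = 4             # the 32-bit next-block pointer
DATA_PER_BLOCK = BLOCK_SIZE - POINTER_SIZE   # 4092 data bytes per block

def block_holding(offset):
    # 0-based position in the chain of the block holding byte `offset`
    return offset // DATA_PER_BLOCK

print(block_holding(16385))             # 4 -> the 5th block of the chain
print(block_holding(16385 + 1 + 2048))  # 4 -> still the 5th block after part b's lseek

# Reaching the 5th block means traversing blocks 1..5 in order (each block
# holds the pointer to the next), plus the block holding the head pointer.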

Hadoop - Splitting of files

I have just started learning Hadoop and I have a question about how the splitting works.
For example, I have a file like the one below, with key-value pairs:
2 1121291290r5405454
1 2192949495959454454
2 121334883484585
So my question is: when the splitting is done, is it based on the block size or on the record boundaries? If it is based on block size, it is possible that while splitting, a key-value record gets separated and put in different blocks, which would give incorrect data.
Taking my file as an example - if the file is split into 2 blocks -
block 1 -------
2 1121291290r5405454
1 21929494959594
block 2--------
54454
2 121334883484585
So here the key-value relationship is gone, resulting in incorrect data. As far as I know, the split happens when the input file size exceeds the block size. So how do we handle this situation?
By default, the number of input splits depends on the number of blocks. In your case, if there are two blocks for a single file, then two mappers will run. When writing the data into blocks, Hadoop keeps a kind of pointer that indicates the location of the next block; using this pointer, the first mapper identifies the complete key-value pair and processes it accordingly, and the second mapper starts its computation by skipping the data that will be processed by the first mapper. Simply put, a map task is not limited to its own block: it can also process data from other blocks.
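A toy Python sketch of that behaviour (my own illustration of how a line-oriented record reader works, not Hadoop's actual code): every reader except the first skips ahead to the next newline, and every reader may read past its split boundary to finish its last record, so no record is lost or cut in half.

data = b"2 1121291290r5405454\n1 2192949495959454454\n2 121334883484585\n"

def read_records(data, split_start, split_end):
    records = []
    pos = split_start
    if split_start != 0:
        # Skip the partial record; the previous reader will finish it.
        pos = data.index(b"\n", split_start) + 1
    while pos < split_end and pos < len(data):
        nl = data.find(b"\n", pos)
        end = nl if nl != -1 else len(data)
        records.append(data[pos:end])   # may read past split_end
        pos = end + 1
    return records

mid = 35  # a "block boundary" falling in the middle of record 2
print(read_records(data, 0, mid))          # both of the first two records
print(read_records(data, mid, len(data)))  # only the third record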

CUDA: Scatter communication pattern

I am learning CUDA from Udacity's course on parallel programming. In a quiz, they give a problem of sorting a pre-ranked variable (players' heights). Since there is a one-to-one correspondence between the input and output arrays, should it not be a Map communication pattern instead of a Scatter?
CUDA has no canonical definition of these terms, as far as I know. Therefore my answer is merely a suggestion of how it might be or might have been interpreted.
"Since there is a one-to-one correspondence between the input and output arrays"
This statement doesn't appear to be supported by the diagram, which shows gaps in the output array, which have no corresponding input point associated with them.
If a smaller set of values is distributed into a larger array (leaving gaps in the output array where no input value corresponds to the gap locations), then scatter might be used to describe that operation. Both scatters and maps have an index mapping that describes where the input values go, but it may be that the instructor has defined scatter and map so as to differentiate between these two cases, with plausible definitions such as the following:
Scatter: one-to-one relationship from input to output (i.e. a unidirectional relationship). Every input location has a corresponding output location, but not every output location has a corresponding input location.
Map: one-to-one relationship between input and output (i.e. a bidirectional relationship). Every input location has a corresponding output location, and every output location has a corresponding input location.
Gather: one-to-one relationship from output to input (i.e. a unidirectional relationship). Every output location has a corresponding input location, but not every input location has a corresponding output location.
The definition of each communication pattern (map, scatter, gather, etc.) varies slightly from one language/environment/context to another, but since I have followed that same Udacity course I'll try to explain these terms as I understand them in the context of the course:
The Map operation calculates each output element as a function of its corresponding input element, i.e.:
output[tid] = foo(input[tid]);
The Gather pattern calculates each output element as a function of one or more (usually more) input elements, not necessarily the corresponding one (typically these are elements from a neighborhood). For example:
output[tid] = (input[tid-1] + input[tid+1]) / 2;
Lastly, the Scatter operation has each input element contribute to one or more (again, usually more) output elements. For instance,
atomicAdd( &(output[tid-1]), input[tid]);
atomicAdd( &(output[tid]), input[tid]);
atomicAdd( &(output[tid+1]), input[tid]);
The example given in the question is clearly not a Map, because each output is calculated from an input at a different location.
It is also hard at first to see how the same example can be a scatter, because each input element causes only one write to the output; but it is indeed a scatter, because each input causes a write to an output location determined by the input.
In other words, each CUDA thread processes the input element at the location associated with its tid (thread ID number) and calculates where to write the result. Usually a scatter would write to several places instead of only one, so this is a particular case that might as well be named differently.
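To make the three patterns concrete, here is a serial Python emulation (Python rather than CUDA purely for brevity; the loops stand in for the thread grid, and the index arithmetic is what matters):

n = 8
inp = list(range(n))

# Map: output[i] depends only on input[i].
mapped = [x * 2 for x in inp]

# Gather: each output reads one or more inputs, typically a neighborhood.
gathered = [(inp[max(i - 1, 0)] + inp[min(i + 1, n - 1)]) // 2 for i in range(n)]

# Scatter: each input decides where it writes; the += is a serial
# stand-in for atomicAdd, since neighboring "threads" hit the same slot.
scattered = [0] * n
for i, x in enumerate(inp):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            scattered[j] += x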
Each player has three properties (name, height, rank). So I think Scatter is correct, because we have to consider all three of these to produce the output. If a player had only one property, like rank, then I think Map would be correct.
reference: Parallel Communication Patterns Recap in this lecture
reference: map/reduce/gather/scatter with image

Will an MD5 hash keep changing as its input grows?

Does the value returned by MySQL's MD5 hash function continue to change indefinitely as the string given to it grows indefinitely?
E.g., will these continue to return different values:
MD5("A"+"B"+"C")
MD5("A"+"B"+"C"+"D")
MD5("A"+"B"+"C"+"D"+"E")
MD5("A"+"B"+"C"+"D"+"E"+"D")
... and so on until a very long list of values ....
At some point, when we are giving the function very long input strings, will the results stop changing, as if the input were being truncated?
I'm asking because I want to use the MD5 function to compare two records with a large set of fields by storing the MD5 hash of these fields.
======== MADE-UP EXAMPLE (YOU DON'T NEED THIS TO ANSWER THE QUESTION, BUT IT MIGHT INTEREST YOU) ========
I have a database application that periodically grabs data from an external source and uses it to update a MySQL table.
Let's imagine that in month #1, I do my first download:
downloaded data, where the first field is an ID, a key:
1,"A","B","C"
2,"A","D","E"
3,"B","D","E"
I store this
1,"A","B","C"
2,"A","D","E"
3,"B","D","E"
Month #2, I get
1,"A","B","C"
2,"A","D","X"
3,"B","D","E"
4,"B","F","E"
Notice that the record with ID 2 has changed. Record with ID 4 is new. So I store two new records:
1,"A","B","C"
2,"A","D","E"
3,"B","D","E"
2,"A","D","X"
4,"B","F","E"
This way I have a history of *changes* to the data.
I don't want to have to compare each field of the incoming data with each field of each of the stored records.
E.g., if I'm comparing incoming record x with existing record a, I don't want to have to say:
Add record x to the stored data if there is no record a such that x.ID == a.ID AND x.F1 == a.F1 AND x.F2 == a.F2 AND x.F3 == a.F3 [4 comparisons]
What I want to do is to compute an MD5 hash and store it:
1,"A","B","C",MD5("A"+"B"+"C")
Let's suppose that it is month #3, and I get a record:
1,"A","G","C"
What I want to do is compute the MD5 hash of the new fields: MD5("A"+"G"+"C") and compare the resulting hash with the hashes in the stored data.
If it doesn't match, then I add it as a new record.
I.e., Add record x to the stored data if there is no record a such that x.ID == a.ID AND MD5(x.F1 + x.F2 + x.F3) == a.stored_MD5_value [2 comparisons]
My question is "Can I compare the MD5 hash of, say, 50 fields without increasing the likelihood of clashes?"
Yes, practically, it will keep changing; MD5 accepts inputs of essentially arbitrary length (the padding scheme encodes lengths up to 2^64 - 1 bits), so nothing gets truncated. By the pigeonhole principle, if you continue long enough you must eventually get a collision, but in practice you will never reach that point by accident.
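As a quick sanity check, and a sketch of the row-fingerprint idea (Python's hashlib.md5 yields the same hex digest as MySQL's MD5() for the same bytes; the separator byte is my addition, since bare concatenation makes ("AB","C") and ("A","BC") collide):

import hashlib

def row_fingerprint(*fields):
    # Join with a separator so field boundaries are part of the input.
    return hashlib.md5("\x1f".join(fields).encode()).hexdigest()

print(row_fingerprint("A", "B", "C"))
print(row_fingerprint("A", "G", "C"))   # differs, so store the new row

# Growing inputs keep producing fresh digests - no truncation effect:
s, digests = "", set()
for ch in "ABCDE" * 2000:
    s += ch
    digests.add(hashlib.md5(s.encode()).hexdigest())
print(len(digests))   # 10000 distinct digests in practice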
The security of the MD5 hash function is severely compromised. A collision attack exists that can find collisions within seconds on a computer with a 2.6 GHz Pentium 4 processor (complexity 2^24).
Further, there is also a chosen-prefix collision attack that can produce a collision for two chosen, arbitrarily different inputs within hours, using off-the-shelf computing hardware (complexity 2^39).
The ability to find collisions has been greatly aided by the use of off-the-shelf GPUs. On an NVIDIA GeForce 8400GS graphics processor, 16-18 million hashes per second can be computed. An NVIDIA GeForce 8800 Ultra can calculate more than 200 million hashes per second.
These hash and collision attacks have been demonstrated publicly in various situations, including colliding document files and digital certificates.
See http://www.win.tue.nl/hashclash/On%20Collisions%20for%20MD5%20-%20M.M.J.%20Stevens.pdf
A number of projects have published MD5 rainbow tables online, that can be used to reverse many MD5 hashes into strings that collide with the original input, usually for the purposes of password cracking.

Stream filter in CUDA

I have an array of values and a linked list of indexes. Now I only want to keep those values from the array that correspond to the indexes in the linked list. Is there a standard algorithm to do this? Please give an example if possible.
So, suppose I have the array 1,2,5,6,7,9
and I have the linked list 2->3.
So I want to keep the values at indexes 2 and 3, that is, keep 5 and 6.
Thus my function should return 5 and 6.
In general, a linked list is inherently serial. Having a parallel machine will not speed up the traversal of your list; hence the number of steps of your problem cannot go below O(n), where n is the size of the list.
However, if you have some additional way to access the list you can do something with it.
For example, all elements of the list could be stored in a fixed-size array (although not necessarily in consecutive cells). A list member could be represented in the array using the following struct:
template <typename T>
struct ListNode {
    bool isValid;  // does this cell hold a live list member?
    T data;        // the payload
    int next;      // array index of the next node (e.g. -1 marks the end)
};
The flag isValid says whether a given cell of the array is occupied by a valid list member or is just an empty cell.
Now a parallel algorithm can read all cells at once, check whether each holds valid data, and if so, do something with it.
Second part: each thread holding a valid index idx into your input array A would mark A[idx] as not to be deleted. Once we know which elements of A should be kept and which removed, a parallel compaction algorithm can be applied, as sketched below.
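A serial Python sketch of that mark-then-compact approach (the loops stand in for the thread-parallel phases; the exclusive prefix sum over the flags is exactly what a GPU compaction would compute in parallel to assign output positions):

values = [1, 2, 5, 6, 7, 9]
keep_indexes = [2, 3]            # the contents of the linked list

# Phase 1: mark which array slots survive (one "thread" per list node).
flags = [0] * len(values)
for idx in keep_indexes:
    flags[idx] = 1

# Phase 2: an exclusive prefix sum of the flags gives each survivor its
# output position; kept elements then scatter to those positions.
positions, total = [0] * len(values), 0
for i, f in enumerate(flags):
    positions[i] = total
    total += f

out = [0] * total
for i, f in enumerate(flags):
    if f:
        out[positions[i]] = values[i]
print(out)   # -> [5, 6]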