Let's say I have a smart contract with a branch, where each path has a different number of operations.
if (someCondition) {
    // do operations costing 10 gas
} else {
    // do operations costing 100 gas
}
When a user goes to call this function from their client, say MetaMask, how do they know how much gas their transaction will cost? Do they just have to guess and include enough gas for the most expensive path?
The client app is almost always able to calculate the gas usage, either by running its own EVM emulator or by querying an external API that emulates the transaction and returns the result (standard Ethereum nodes expose this as the eth_estimateGas JSON-RPC method).
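For illustration, this is roughly what that query looks like at the RPC level; the addresses and calldata below are placeholders, and the node replies with the gas the emulated execution consumed:
{"jsonrpc": "2.0", "id": 1, "method": "eth_estimateGas",
 "params": [{"from": "0x...", "to": "0x...", "data": "0x..."}]}
For a plain ether transfer the reply would be {"jsonrpc": "2.0", "id": 1, "result": "0x5208"}, i.e. 21000 gas.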
All blockchain data is publicly readable (even the values of private variables: not through Solidity, but by taking a lower-level approach and querying the storage slots directly), and the gas cost of each operation is predetermined.
So the client knows that the transaction is going to:
read one slot from memory
write into a storage slot
declare another slot in memory and return it
And it also knows that one MLOAD costs 3 gas, one SSTORE costs 5,000 gas, etc.
It can use all this data to calculate the final cost.
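As a toy calculation using those numbers (every transaction also pays a flat base cost of 21,000 gas): 21,000 + 3 (one MLOAD) + 5,000 (one SSTORE) = 26,003 gas. At a gas price of 20 Gwei, that is 26,003 * 20 * 10^9 wei, or roughly 0.00052 ETH.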
The exception is when the decision tree depends on block data, such as block.timestamp, that is unknown beforehand. Then it depends on the client, but my guess is that most suggest the most expensive combination, to lower the risk of the transaction being reverted due to insufficient gas.
Example:
if (block.timestamp % 2 == 0) {
    // even second, do operations costing 10 gas
} else {
    // odd second, do operations costing 100 gas
}
You can find all the values corresponding to the relative costs, in gas, of the operations that a transaction may effect in the Ethereum yellow paper (page 27).
The "if" statment in a low level languaje, is consider a "JUMP" operation (alters de program counter). So in the gas cost table (page 27) says that a JUMPDEST operation cost 1 gas value.
It is said that zero-copy should be used in situations where the "read and/or write exactly once" constraint is met. That's fine.
I have understood this, but my question is: why is zero-copy fast in the first place? After all, whether we use an explicit transfer via cudaMemcpy or zero-copy, in both cases the data has to travel through the PCI Express bus. Or is there some other path (i.e., does the copy happen directly into GPU registers, bypassing device RAM)?
Considered purely from a data-transfer-rate perspective, I know of no reason why the data transfer rate for moving data between host and device via PCIE should be any different when comparing moving that data using a zero-copy method vs. moving it using cudaMemcpy.
However, both operations have overheads associated with them. The primary overhead I can think of for zero-copy comes with the pinning of the host memory. This has a noticeable time overhead (compared to allocating the same amount of data using malloc or new). The primary overhead that comes to mind with cudaMemcpy is a per-transfer overhead of at least a few microseconds that is associated with the setup costs of using the underlying DMA engine that does the transfer.
Another difference is in accessibility to the data. Pinned/zero-copy data is simultaneously accessible from host and device, and this can be useful for some kinds of communication patterns that would otherwise be more complicated with cudaMemcpyAsync, for example.
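For reference, here is a minimal sketch of how such a pinned/zero-copy (mapped) allocation is typically set up; the buffer size and names are placeholders:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // allow the device to map pinned host memory (set before other CUDA work)
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *h_buf;   // host pointer to pinned, device-mapped memory
    cudaHostAlloc(&h_buf, 1024 * sizeof(float), cudaHostAllocMapped);

    float *d_buf;   // device-side alias of the same allocation
    cudaHostGetDevicePointer(&d_buf, h_buf, 0);

    // on 64-bit platforms with unified virtual addressing the two pointers
    // are identical, and kernels can simply be handed h_buf directly
    printf("host %p, device %p\n", (void *)h_buf, (void *)d_buf);

    cudaFreeHost(h_buf);
    return 0;
}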
Here are two fairly simple design patterns where it may make sense to use zero-copy rather than cudaMemcpy.
When you have a large amount of data and you're not sure what will be needed. Suppose we have a large table of data, say 1GB, and the GPU kernel will need access to it. Suppose also that the kernel design is such that only one or a few locations in the table are needed for each kernel call, and we don't know a priori which locations those will be. We could use cudaMemcpy to transfer the entire 1GB to the GPU. This would certainly work, but it would take a possibly non-trivial amount of time (e.g. ~0.1s). Suppose also that we don't know which location was updated, and after the kernel call we need access to the modified data on the host. Another transfer would be needed. Using pinned/zero-copy methods here would mostly eliminate the costs associated with moving the data, and since our kernel is only accessing a few locations, the cost for the kernel to do so using zero-copy is far less than 0.1s.
When you need to check the status of a search or convergence algorithm. Suppose we have an algorithm that consists of a loop that calls a kernel in each iteration. The kernel is doing some kind of search or convergence type algorithm, and so we need a "stopping condition" test. This might be as simple as a boolean value that we communicate back to the host from the kernel activity, to indicate whether we have reached the stopping point or not. If the stopping point is reached, the loop terminates. Otherwise the loop continues with the next kernel launch. There may even be "two-way" communication here. For example, the host code might be setting the boolean value to false. The kernel might set it to true if iteration needs to continue, but the kernel never sets the flag to false. Therefore if continuation is needed, the host code sets the flag to false and calls the kernel again. We could realize this with cudaMemcpy:
bool *d_continue;
cudaMalloc(&d_continue, sizeof(bool));
bool h_continue = true;
while (h_continue){
    h_continue = false;
    // reset the device-side flag before each launch
    cudaMemcpy(d_continue, &h_continue, sizeof(bool), cudaMemcpyHostToDevice);
    my_search_kernel<<<...>>>(..., d_continue);
    // read the flag back; this copy also synchronizes with the kernel
    cudaMemcpy(&h_continue, d_continue, sizeof(bool), cudaMemcpyDeviceToHost);
}
The above pattern should be workable, but even though we are only transferring a small amount of data (1 byte), the cudaMemcpy operations will each take ~5 microseconds. If this were a performance concern, we could almost certainly reduce the time cost with:
bool *z_continue;
// cudaHostAllocMapped is one reasonable choice for the flags elided here:
// it makes the pinned allocation directly accessible to device code
cudaHostAlloc(&z_continue, sizeof(bool), cudaHostAllocMapped);
*z_continue = true;
while (*z_continue){
    *z_continue = false;
    // on a 64-bit/UVA platform the host pointer can be passed straight to the kernel
    my_search_kernel<<<...>>>(..., z_continue);
    // wait for the kernel to finish so its write to the flag is visible
    cudaDeviceSynchronize();
}
For example, assume that you wrote a CUDA-accelerated editor algorithm to fix spelling errors in books. If a 2MB text has only 5 bytes of errors, only those 5 bytes need to be edited, so there is no need to copy the whole array from GPU VRAM to system RAM. Here, the zero-copy version would access only the page that owns the 5-byte word. Without zero-copy, the whole 2MB text would need to be copied. Copying 2MB takes more time than copying 5 bytes (or just the page that owns those bytes), so it would reduce the books/second throughput.
Another example: there could be a sparse path-tracing algorithm that adds shiny surfaces to a few small objects in a game scene. The result may need to update just 10-100 pixels instead of 1920x1080 pixels. Zero-copy would work better again.
Maybe sparse matrix multiplication would work better with zero-copy too. If two 8192x8192 matrices are multiplied but only 3-5 elements are non-zero, then zero-copy could still make a difference when writing the results.
When I look over the tutorials of the Robot Operating System (ROS), I find that most example code sets the publisher's queue size to a large value such as 1000. I think this leads to losing the real-time response of the node.
For what purpose do people set it to such a large value?
From ROS docs (http://wiki.ros.org/ROS/Tutorials/WritingPublisherSubscriber):
Message publisher (producer):
"The second parameter to advertise() is the size of the message queue
used for publishing messages. If messages are published more quickly
than we can send them, the number here specifies how many messages to
buffer up before throwing some away."
Message subscriber:
"The second parameter to the subscribe() function is the size of the message queue. If messages are arriving faster than they are being processed, this is the number of messages that will be buffered up before beginning to throw away the oldest ones."
Possible explanation:
Think of the consumer-producer problem.
You can't guarantee that you will consume messages at the rate they arrive. So you create a queue that is filled as messages come in from the sender (some sensor, for instance).
Bad case: if your program is delayed in some other part and you can't read the messages at the rate they arrive, the queue grows.
Good case: as soon as your other processing load diminishes, you can read the queue faster and start to drain it. If you have available time you will end up reducing the queue size to zero.
So, as for your question: if you set the queue size to a large value, you may guarantee that you will not lose messages. In a simple example you have no memory constraints, so you can do anything you want, like using many gigabytes of RAM for a large queue, to ensure it will always work. And if you create a toy example to explain a concept, you don't want your program to crash for unrelated reasons.
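To make the parameter concrete, here is a condensed version of the tutorial's own roscpp setup; the second argument in both calls is the queue size in question:
#include "ros/ros.h"
#include "std_msgs/String.h"

void chatterCallback(const std_msgs::String::ConstPtr &msg) {
    ROS_INFO("I heard: [%s]", msg->data.c_str());
}

int main(int argc, char **argv) {
    ros::init(argc, argv, "example_node");
    ros::NodeHandle nh;
    // buffer up to 1000 outgoing messages before throwing some away
    ros::Publisher pub = nh.advertise<std_msgs::String>("chatter", 1000);
    // buffer up to 1000 incoming messages before dropping the oldest
    ros::Subscriber sub = nh.subscribe("chatter", 1000, chatterCallback);
    ros::spin();
    return 0;
}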
A real-life example is the scenario of a waiter and a kitchen where dishes are washed.
Suppose the customers finish their meals and the waiter takes their dirty dishes to the kitchen to be washed. He puts them on a table. Whenever the dishwasher can, he goes to the table, picks up dishes, and washes them. In normal operation the table never fills up. But if someone gives the dishwasher another task, the table starts to fill. At some point the waiter can't place dishes on it anymore and has to leave tables dirty (a problem in the system). But if the table is artificially large (let's say 1000 square units), the waiter will likely still be able to do his job even while the dishwasher is busy, considering that after some time the dishwasher will be able to return to cleaning dishes.
OK, long answer, but it may help in understanding queues.
I am analyzing an ICO; successful transactions start here: https://etherscan.io/txs?a=0x6267b5376c809445c9432bd9f14a3808b00eae2c&p=134
If you look at the last column, most successful transactions paid a very high fee (>0.1 ETH), but there are some in between that paid little (https://etherscan.io/tx/0x5b9145d94449fe01b7bcecee162e3adffd389997ba27a5c8724b632ca455b61c).
My questions are:
How are these transactions able to get in between the high-price transactions? Is it just chance?
Is there some kind of strategy to make sure your transaction gets picked up, for example if you are running a node?
Comparing the transactions for this contract to each other, yes, there is a big difference in tx costs. But if you look specifically at the gas price, the buyers are paying a very high gas price across the spectrum. The high-end transactions are paying 1k+ Gwei (some are even higher than 3k Gwei), but even the "cheapest" transactions you're looking at are still paying ~100 Gwei. Compared to other transactions on the blockchain, that's a high cost. The cost to have a transaction mined as fast as possible varies depending on congestion, but whenever I check ethgasstation.info, the high-end gas prices are usually around 20-40 Gwei. As you can imagine, anything higher than that, miners are going to be eager to pick up ASAP.
For your second question, this is exactly the best strategy to have your transaction picked up the fastest: pay a higher gas price.
How would I get the transaction cost inside of my contract? Would it just be tx.gasprice? And will this always be a value in gwei, or will it be wei?
The cost of a transaction isn't really known until execution completes. In an extreme example, perhaps your function which is computing this is being called by another function, and after you return, that function throws, consuming all remaining gas. There's no way for you to know in advance that this will happen.
To calculate the cost of the transaction, you'd need two pieces of information:
The gas price.
How much gas will be consumed.
If you knew both, you could multiply them together and get the total cost. tx.gasprice tells you (1), and it is denominated in wei, not gwei. But as explained above, you can't really know (2). The best you can do is probably to use msg.gas at the top and bottom of a function to tell you roughly how much gas that function consumes.
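A minimal sketch of that pattern (the function name and the work in the middle are placeholders; note that newer Solidity versions spell msg.gas as gasleft()):
function measuredWork() public {
    uint startGas = msg.gas;                // gas remaining at the top
    // ... the operations whose cost you want to measure ...
    uint gasUsed = startGas - msg.gas;      // roughly the gas consumed in between
    uint costInWei = gasUsed * tx.gasprice; // tx.gasprice is denominated in wei
}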
I know what gas, gaslimit, and gasprice are, but I still have some confusion even after searching and reading through the Internet.
There is a gas limit per block, but why do many blocks not reach it? In other words, can a miner send a block to the network without reaching the block gas limit?
Assume the block gas limit is 4 million and I send a transaction with a 4 million gas limit, but when the miner executes it, the gas used is 1 million. Can the miner add extra transactions to the block to fill the remaining 3 million or not? Put another way, does a transaction with a big gas limit (that uses only a fraction of that gas) prevent the miner from adding more transactions to the block?
Each opcode costs some amount of gas. How does Ethereum measure the cost of each EVM opcode? (Any reference for an explanation?)
Thanks
Q1 The block gas limit is an upper bound on the total cost of transactions that can be included in a block. Yes, the miner can and should send a solved block to the network, even if the gas cost is 0. Blocks are meant to arrive at a steady pace in any case. So "nothing happened during this period" is a valid solution.
Q2a The gas cost of a transaction is the total cost of executing the transaction; it is not subject to guesswork. If the actual cost exceeds the supplied gas, then the transaction fails with an out-of-gas exception. If there is surplus gas, it's returned to the sender.
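With the numbers from the question: supplying a 4,000,000 gas limit for a transaction that actually consumes 1,000,000 gas means the sender is charged 1,000,000 * gasprice, and the reserved-but-unused 3,000,000 * gasprice is refunded.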
Q2b Yes, a miner can and should include multiple transactions in a block. A block is a well-ordered set of transactions that were accepted by the network. It's a unit of disambiguation that clearly defines the accepted order of events. Have a look here for the exact meaning of this: https://ethereum.stackexchange.com/questions/13887/is-consensus-necessary-for-ethereum
Q3 I can't say for sure that this is an up-to-date list (possibly someone can confirm): https://docs.google.com/spreadsheets/d/1m89CVujrQe5LAFJ8-YAUCcNK950dUzMQPMJBxRtGCqs/edit#gid=0