Is this a correct way to calculate price for tokens with different decimals? - ethereum

I've tested this function on a few examples and it seems to return the right numbers, but I'm not sure whether it is correct and will work for all cases (different decimals for A and B, different values of amountOfB and priceForA, etc.).
Is this the right way to do the computation for a token A that has a fixed price denominated in B tokens?
function calculatePrice(uint8 decimalsA, uint256 priceForA, uint256 amountOfB) public pure returns (uint256) {
return amountOfB * 10 ** decimalsA / priceForA;
}
Decimals for both tokens A and B can be equal or one can be higher or lower than the other.
For example
decimalsA = 6;
decimalsB = 18;
// priceForA is price for 1 (human readable) A token which is 0.01 (human readable) B tokens
// so in other words for 1000000 A you need to pay 10000000000000000 B
priceForA = 10000000000000000;
// this is the amount of B tokens that will be used to buy A tokens
amountOfB = 50000000000000000;
In this example the function should return 5000000, i.e. 5 A tokens (human readable).
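A minimal Python sketch of the same integer arithmetic, assuming (as in the example) that priceForA is the cost of one whole A token expressed in B base units. Python's // truncates like Solidity's uint256 division, so the result is in A base units, rounded down. decimalsB never appears because the price already carries B's decimals. Multiplying before dividing keeps precision; in Solidity 0.8+ the multiplication reverts on overflow instead of wrapping.

```python
def calculate_price(decimals_a: int, price_for_a: int, amount_of_b: int) -> int:
    """Amount of A (in A base units) bought with amount_of_b (in B base units).

    Assumption: price_for_a is the price of one whole A token in B base units.
    Integer division truncates, like Solidity's uint256 division.
    """
    return amount_of_b * 10**decimals_a // price_for_a


# Example from the question: A has 6 decimals, 1 A costs 0.01 B (B has 18 decimals).
print(calculate_price(6, 10**16, 5 * 10**16))  # 5000000, i.e. 5 A
```

Note that very small amounts of B round down to 0 A, which the caller may want to reject explicitly.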

Related

Need help understanding solidity contract function swapback()

I have recently been delving into the code of a few contracts and commenting them myself to try to understand how they work. From a development perspective the whole field seems, ironically, like a black box: most code is copied and pasted from other contracts, with no comments explaining what the functions do.
As such, I have seen a recurring pair of functions across multiple projects that I am struggling to get my head around, namely shouldSwapBack() and swapBack().
shouldSwapBack():
function shouldSwapBack() internal view returns (bool) {
return msg.sender != pair
&& !inSwap
&& swapEnabled
&& _balances[address(this)] >= swapThreshold;
}
My understanding is that it returns true only when the caller is not the LP pair address, we are not currently in a swap, swapping is enabled, and the contract's token balance is >= the predefined swap threshold.
Two questions about this:
1. In what context/circumstance is the msg.sender the LP contract?
2. Why is it important to only swap if the contract balance is greater than a certain percentage? What would be the effect if this was 0?
The actual swap function is longer:
function swapBack() internal swapping {
uint256 contractTokenBalance = balanceOf(address(this));
/* (contract balance * 3)/15/2 */
uint256 amountToLiquify = contractTokenBalance.mul(liquidityFee).div(totalFee).div(2);
uint256 amountToSwap = contractTokenBalance.sub(amountToLiquify);
/* set the address path for the BNB and Token addresses */
address[] memory path = new address[](2);
path[0] = address(this);
path[1] = WBNB;
/* note before balance of contract */
uint256 balanceBefore = address(this).balance;
/*swap tokens using PCS */
router.swapExactTokensForETHSupportingFeeOnTransferTokens(
amountToSwap, /*The amount of input tokens to send. */
0, /*The minimum amount of output tokens that must be received for the transaction not to revert. */
path, /*An array of token addresses. path.length must be >= 2. Pools for each consecutive pair of addresses must exist and have liquidity. */
address(this), /*Recipient of the BNB. */
block.timestamp /*Unix timestamp after which the transaction will revert. (Think set to one block?)*/
);
uint256 amountBNB = address(this).balance.sub(balanceBefore); /* get amount swapped by checking new contract balance vs beforeswap */
uint256 totalBNBFee = totalFee.sub(liquidityFee.div(2));
uint256 amountBNBLiquidity = amountBNB.mul(liquidityFee).div(totalBNBFee).div(2);
uint256 amountBNBMarketing = amountBNB.mul(marketingFee).div(totalBNBFee);
/* send marketing fee to marketing wallet */
(bool MarketingSuccess, /* bytes memory data */) = payable(marketingFeeReceiver).call{value: amountBNBMarketing, gas: 30000}("");
require(MarketingSuccess, "receiver rejected ETH transfer");
/* if we meet the liquidity threshold add to marketwallet */
if(amountToLiquify > 0){
router.addLiquidityETH{value: amountBNBLiquidity}(
address(this),
amountToLiquify,
0,
0,
marketingFeeReceiver,
block.timestamp
);
emit AutoLiquify(amountBNBLiquidity, amountToLiquify);
}
}
My understanding for this is that the contract tokens are swapped to BNB and then the fees are calculated and sent to the appropriate wallet. However with this I don't understand:
1. Why do the tokens go to the contract and not to the LP?
2. If all the tokens are swapped to ETH, do none get swapped back?
3. What is the significance of adding to the LP, and why would this be necessary? (If this wasn't done, what would the effect be on the token? I have seen projects fail in the past because they didn't buy back liquidity, but I'm struggling to understand what it would do apart from easing volatility.)
I understand that this is a lengthy question and that my issue is less about the code and more about the key concepts, but if anyone could help me it would be great. I have looked online for courses, but none really delve into DEX interactions; most just cover very basic token/NFT creation. I haven't been able to find a single tutorial on making a contract with a working tax implementation, so instead I've been cross-referencing a bunch of different contracts.
If anyone can point me in the right direction for a course or even better a tutor that would be great.
Thanks.
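To make the fee split in swapBack() concrete, here is a Python sketch that mirrors its integer arithmetic (Python's // truncates like SafeMath's div for non-negative values). The inputs are assumptions: liquidityFee = 3 and totalFee = 15 are read off the /* (contract balance * 3)/15/2 */ comment, marketingFee = 12 assumes the rest of the fee goes to marketing, and the token balance and the BNB returned by the swap are made-up numbers. It shows that not all tokens are swapped: half of the liquidity share stays as tokens and is later paired with BNB in addLiquidityETH.

```python
def swap_back_split(contract_token_balance, amount_bnb, liquidity_fee, marketing_fee, total_fee):
    """Mirror of swapBack()'s fee arithmetic using truncating integer division.

    amount_bnb is a hypothetical swap output; the contract measures it as the
    increase of its BNB balance after the router swap.
    """
    amount_to_liquify = contract_token_balance * liquidity_fee // total_fee // 2
    amount_to_swap = contract_token_balance - amount_to_liquify
    total_bnb_fee = total_fee - liquidity_fee // 2
    amount_bnb_liquidity = amount_bnb * liquidity_fee // total_bnb_fee // 2
    amount_bnb_marketing = amount_bnb * marketing_fee // total_bnb_fee
    return {
        "tokens_kept_for_liquidity": amount_to_liquify,
        "tokens_swapped": amount_to_swap,
        "bnb_to_liquidity": amount_bnb_liquidity,
        "bnb_to_marketing": amount_bnb_marketing,
        "bnb_left_in_contract": amount_bnb - amount_bnb_liquidity - amount_bnb_marketing,
    }


print(swap_back_split(1500, 1400, liquidity_fee=3, marketing_fee=12, total_fee=15))
```

With these assumed values, liquidityFee.div(2) truncates 3 / 2 to 1, so totalBNBFee becomes 14 instead of 13.5 and the liquidity and marketing shares add up to slightly less than the BNB received; the remainder (50 units here) simply stays in the contract.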

How to random mint a static number of NFT?

I'd like to mint the following numbers of tokens:
200 super
300 rare
500 common
The mint process needs to be random (any single mint can produce a super, rare, or common token), but by the end of the process exactly 200 super, 300 rare, and 500 common tokens must have been minted.
The following code does the random selection, but the final number of tokens of each type ends up different from these targets:
function safeMint(address to) public payable onlyOwner {
require(_tokenIdCounter.current() < totalSupply(), "There's no token to mint.");
require(mintCnt[msg.sender] < maxMintCntPerAddress, "One address can mint 1 tickets.");
if(mintPrice > 0) {
require(mintPrice == msg.value, "Mint price is not correct.");
address payable _to = payable(serviceAddress);
_to.transfer(mintPrice);
}
uint randomNumber = random(expectedTokenSupply - _tokenIdCounter.current());
for (uint256 i = 0; i < _tokenMetadata.length; i++) {
if(_tokenMetadata[i].amount <= randomNumber) {
_safeMint(to, _tokenIdCounter.current());
_setTokenURI(_tokenIdCounter.current(), _tokenMetadata[i].uri);
_tokenIdCounter.increment();
break;
}
}
}
function random(uint maxValue) internal view returns (uint) {
return uint(keccak256(abi.encodePacked(block.timestamp, msg.sender, _tokenIdCounter.current()))) % maxValue;
}
First, don't use block.timestamp or any other block or blockchain data as a source of randomness: it makes the "randomness" predictable and open to manipulation by miners. Use Chainlink as a source of randomness instead; their docs have good examples. If you want a fixed supply of each token type, keep three counters tracking how many of each type have been minted. Once you have the random number, apply some math to decide which type to mint: here you want 20% super, 30% rare, and 50% common. You also have to decide what should happen when the selected type has already reached its max supply.
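A sketch of the counter-based approach, in Python so it is easy to simulate. Instead of comparing against a fixed threshold, draw r uniformly below the number of tokens still unminted (which is what random(expectedTokenSupply - _tokenIdCounter.current()) already produces) and walk the tiers, subtracting each tier's remaining count. The choice is proportional to remaining supply, so a sold-out tier can never be picked and the final totals are exactly 200/300/500. pick_tier and the remaining dictionary are hypothetical names; on-chain, remaining would be three storage counters and r would come from a verifiable randomness source rather than Python's random.

```python
import random
from collections import Counter


def pick_tier(remaining, r):
    """Map r in [0, sum(remaining)) to a tier and decrement that tier's count.

    Each tier owns a slice of [0, total) as wide as its remaining supply, so a
    sold-out tier (width 0) is never chosen.
    """
    for tier, left in remaining.items():
        if r < left:
            remaining[tier] -= 1
            return tier
        r -= left
    raise ValueError("r is out of range: nothing left to mint")


# Simulation only: Python's random stands in for a secure source such as Chainlink VRF.
rng = random.Random(42)
remaining = {"super": 200, "rare": 300, "common": 500}
minted = [pick_tier(remaining, rng.randrange(sum(remaining.values()))) for _ in range(1000)]
print(Counter(minted))  # always 200 super, 300 rare, 500 common, in random order
```

This also settles what happens when a type reaches its max supply: its slice shrinks to zero width and it simply stops being selectable.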

ERC20 Token: What is address(0)? And best practices for initial token distribution?

I have a pretty boilerplate test token that I'm going to use to support a DApp project. Key functions I have questions regarding are as follows:
constructor() {
name = "Test Token";
symbol = "TTKN";
decimals = 18;
_totalSupply = 1000000000000000000000000000000;
//WITHOUT DECIMALS = 1,000,000,000,000; should be 1 trillion
balances[msg.sender] = _totalSupply;
emit Transfer(address(0), msg.sender, _totalSupply);
}
function totalSupply() public override view returns (uint256) {
return _totalSupply - balances[address(0)];
}
First, a quick question about decimals and supply: did I set this up correctly to create 1 trillion of the TTKN token? And do I really need so many decimal places?
Second, what exactly is address(0)? My understanding of the constructor is that address(0) first transfers all the tokens to msg.sender, which is me, the person who deploys this contract.
And finally, what are the best practices for initially distributing the tokens? What I want is basically as follows:
a) Myself and a few other devs each get 1% of the initial supply
b) Our DApp, a separate smart contract, will get 50% of the initial supply, and will use this to reward users for interacting with our website/project
c) To accomplish a) and b), should I, the contract deployer, manually transfer these tokens as planned?
d) The rest of the coins... available to go on an exchange somehow (maybe out of scope of question)
So now that I've deployed this test token on remix and am getting a feel for how to transfer around the tokens, I want to understand the above points in relation to our project. Is my plan generally acceptable and feasible, and is it the case that as the initial owner I'm just making a bunch of transfer calls on the ETH mainnet eventually when I deploy?
did I set this up correctly to create 1 trillion of the TTKN token?
This is one of the correct ways. More readable alternatives are:
_totalSupply = 1000000000000 * 1e18;
or
// 10 to the power of
_totalSupply = 1000000000000 * (10 ** decimals);
^^ mind that this snippet performs a storage read (of the decimals variable) so it's more expensive gas-wise
as well as
_totalSupply = 1000000000000 ether;
^^ using the ether unit, an alias for * 1e18
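A quick arithmetic check, as a Python sketch (Python integers are arbitrary precision, so nothing here can overflow), that the question's literal and the alternatives above denote the same number, and that it is far below the uint256 limit of 2^256 - 1 (about 1.16 × 10^77):

```python
decimals = 18
literal = 1000000000000000000000000000000  # value from the question's constructor

# 1000000000000 * 1e18, 1000000000000 * (10 ** decimals) and 1000000000000 ether
# all denote 10**12 whole tokens times 10**18 base units per token.
assert literal == 1000000000000 * 10**decimals == 10**30
assert literal < 2**256  # fits comfortably in a uint256

print(literal // 10**decimals)  # 1000000000000, i.e. 1 trillion whole tokens
```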
what exactly is address(0)
If it's in the first param of the Transfer event, it means the tokens are minted. If it's in the second param, it means a burn of the tokens.
A token contract which creates new tokens SHOULD trigger a Transfer event with the _from address set to 0x0 when tokens are created.
Source: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-20.md#transfer-1
initially distributing the tokens
You can perform the distribution in the constructor. For the sake of simplicity, my example shows the "exchange" as a regular address managed by your team that will send the tokens to the exchange manually. But it's possible to list a token on a DEX automatically as well.
// inside the constructor, in place of balances[msg.sender] = _totalSupply and its Transfer event
_totalSupply = 1000000000000 * 1e18;
address[3] memory devs = [address(0x123), address(0x456), address(0x789)];
address dapp = address(0xabc);
address exchange = address(0xdef);
// helper variable to calculate the remaining balance for the exchange
uint256 totalSupplyRemaining = _totalSupply;
// 1% for each of the devs
uint256 devBalance = _totalSupply / 100;
for (uint i = 0; i < 3; i++) {
balances[devs[i]] = devBalance;
emit Transfer(address(0x0), devs[i], devBalance);
totalSupplyRemaining -= devBalance;
}
// 50% for the DApp
uint256 dappBalance = _totalSupply / 2;
balances[dapp] = dappBalance;
emit Transfer(address(0x0), dapp, dappBalance);
totalSupplyRemaining -= dappBalance;
// the rest for the exchange
balances[exchange] = totalSupplyRemaining;
emit Transfer(address(0x0), exchange, totalSupplyRemaining);
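A Python sketch of the same bookkeeping, handy for checking the percentages before deploying. It assumes three devs as in the answer; distribute is a hypothetical helper. Because the exchange receives whatever is left, the minted balances always add up to the total supply, even when a supply is not evenly divisible.

```python
def distribute(total_supply, n_devs=3):
    """Mirror of the constructor's split: 1% per dev, 50% to the DApp, rest to the exchange."""
    dev_balance = total_supply // 100   # like _totalSupply / 100
    dapp_balance = total_supply // 2    # like _totalSupply / 2
    # The exchange gets the remainder, like totalSupplyRemaining,
    # so truncation in the divisions above can never lose tokens.
    exchange_balance = total_supply - n_devs * dev_balance - dapp_balance
    return dev_balance, dapp_balance, exchange_balance


total = 1000000000000 * 10**18
dev, dapp, exchange = distribute(total)
print(exchange * 100 // total)  # 47 (percent of the supply)
```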

Implement exponential decay function in Solidity

I am working on an ERC-721 token in which there is a built-in royalty feature, i.e., when a user issues a token, he chooses the royalty fee in ETH and he is entitled to that fee on every secondary market trade. Also, the more tokens a user holds from a specific issuer, the lower the royalty fee, so I need to implement an exponential decay function to calculate the discount.
M = e ^ (-x / 50)
where, e is Euler's number
so the discountedFee = M * standardFee
For instance, the royalty fee of a token is 0.01 ETH and in the contract, it will be represented as 10000000000000000 WEI. Now the buyer holds 10 tokens of that issuer and now he is about to buy his 11th token, so the royalty fee will be calculated as
M = e ^ (-10 / 50) = 0.818
discountedFee = M * standardFee = 0.00818 ETH
Can this function be implemented in Solidity? Thanks in advance.
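One common workaround for the missing floating point is fixed-point arithmetic: store fractional values as integers scaled by 10**18. The Python sketch below uses only integer operations that Solidity also has (multiplication, truncating division, a loop), so it shows the shape such an implementation could take. exp_wad and discounted_fee are hypothetical names, not functions from any library, and a real contract would also need to bound the loop, because the number of series terms (and therefore gas) grows with x.

```python
import math

WAD = 10**18  # fixed-point scale: 1.0 is represented as 10**18


def exp_wad(y):
    """e**(y / WAD), scaled by WAD, for an integer y >= 0 (Taylor series)."""
    term = WAD  # k = 0 term of the series: 1.0
    total = WAD
    k = 1
    while term:
        term = term * y // (k * WAD)  # next term: previous * (y / WAD) / k
        total += term
        k += 1
    return total


def discounted_fee(standard_fee, tokens_held, scale=50):
    """standard_fee * e**(-tokens_held / scale), using integers only."""
    y = tokens_held * WAD // scale  # x / 50 in fixed point
    # Divide by e**(+y) instead of summing the alternating series for e**(-y),
    # so every intermediate value stays non-negative, as with Solidity's uint256.
    return standard_fee * WAD // exp_wad(y)


fee = discounted_fee(10**16, 10)  # 0.01 ETH in wei, buyer already holds 10 tokens
print(fee, fee / 10**18)          # roughly 8.187e15 wei, about 0.00819 ETH
print(10**16 * math.exp(-10 / 50))  # floating-point reference value
```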
Implementing a true exponential decay in Solidity is pretty difficult, and expensive too, since the language doesn't support floating-point arithmetic. This link might be helpful.
Personally, I approximate an exponential decay with a series of linear functions targeting a half-life in a certain time-period.
function getPrice(uint256 value, uint256 t, uint256 halfLife) public pure returns (uint256 price) {
value >>= (t / halfLife);
t %= halfLife;
price = value - value * t / halfLife / 2;
}
Half-life in my code was the number of seconds in two days.
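For comparison, here is a direct Python transliteration of getPrice (integer >> and // behave like Solidity's on uint256 values), printed next to the exact halving curve value * 2**(-t / halfLife). Between halvings the linear segment lies above the true curve; the gap is largest a little past the middle of each half-life, at about 6%.

```python
def get_price(value, t, half_life):
    """Piecewise-linear approximation of value * 2**(-t / half_life)."""
    value >>= t // half_life  # halve once per full half-life elapsed
    t %= half_life            # time elapsed within the current half-life
    # Straight line from value (t = 0) down to value / 2 (t = half_life).
    return value - value * t // half_life // 2


for t in (0, 50, 100, 150):
    exact = 1000 * 2 ** (-t / 100)
    print(t, get_price(1000, t, 100), round(exact, 1))
```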

How to reduce calculation of average to sub-sets in a general way?

Edit: Since it appears nobody is reading the original question this links to, let me bring in a synopsis of it here.
The original problem, as asked by someone else, was that, given a large number of values, where the sum would exceed what a data type of Double would hold, how can one calculate the average of those values.
There were several answers that said to calculate in sets, like taking 50 numbers at a time, calculating the average inside each set, and then finally taking the average of all those sets to get the final average value.
My position was that unless you can guarantee that all those values can be split into a number of equally sized sets, you cannot use this approach. Someone dared me to ask the question here, in order to provide the answer, so here it is.
Basically, given an arbitrary number of values, where:
I know the number of values beforehand (but again, how would your answer change if you didn't?)
I cannot gather up all the numbers, nor can I sum them (the sum will be too big for a normal data type in your programming language)
how can I calculate the average?
The rest of the question here outlines how, and the problems with, the approach to split into equally sized sets, but I'd really just like to know how you can do it.
Note that I know enough math to know that, in theory, calculating the sum of A[1..N]/N will give me the average. Let's assume there are reasons it isn't that simple: I need to split up the workload, and the number of values isn't necessarily going to be divisible by 3, 7, 50, 1000 or whatever.
In other words, the solution I'm after will have to be general.
From this question:
What is a good solution for calculating an average where the sum of all values exceeds a double’s limits?
my position was that splitting the workload up into sets is no good, unless you can ensure that the size of those sets are equal.
Edit: The original question was about the upper limit that a particular data type could hold, and since he was summing up a lot of numbers (count that was given as example was 10^9), the data type could not hold the sum. Since this was a problem in the original solution, I'm assuming (and this is a prerequisite for my question, sorry for missing that) that the numbers are too big to give any meaningful answers.
So, dividing by the total number of values directly is out. The original reason for why a normal SUM/COUNT solution was out was that SUM would overflow, but let's assume, for this question that SET-SET/SET-SIZE will underflow, or whatever.
The important part is that I cannot simply sum, I cannot simply divide by the number of total values. If I cannot do that, will my approach work, or not, and what can I do to fix it?
Let me outline the problem.
Let's assume you're going to calculate the average of the numbers 1 through 6, but you cannot (for whatever reason) do so by summing the numbers, counting the numbers, and then dividing the sum by the count. In other words, you cannot simply do (1+2+3+4+5+6)/6.
In other words, SUM(1..6)/COUNT(1..6) is out. We're not considering NULL's (as in database NULL's) here.
Several of the answers to that question alluded to being able to split the numbers being averaged into sets, say 3 or 50 or 1000 numbers, then calculating some number for that, and then finally combining those values to get the final average.
My position is that this is not possible in the general case, since this will make some numbers, the ones appearing in the final set, more or less valuable than all the ones in the previous sets, unless you can split all the numbers into equally sized sets.
For instance, to calculate the average of 1-6, you can split it up into sets of 3 numbers like this:
$$\frac{\frac{1}{3} + \frac{2}{3} + \frac{3}{3}}{2} + \frac{\frac{4}{3} + \frac{5}{3} + \frac{6}{3}}{2}$$
(each value is divided by 3 because there are 3 numbers in each set, and each set is divided by 2 because there are 2 equally sized groups)
Which gives you this:
$$\frac{2}{2} + \frac{5}{2} = 3.5$$
(note: (1+2+3+4+5+6)/6 = 3.5, so this is correct here)
However, my point is that once the number of values cannot be split into a number of equally sized sets, this method falls apart. For instance, what about the sequence 1-7, which contains a prime number of values.
Can a similar approach, that won't sum all the values, and count all the values, in one go, work?
So, is there such an approach? How do I calculate the average of an arbitrary number of values in which the following holds true:
I cannot do a normal sum/count approach, for whatever reason
I know the number of values beforehand (what if I don't, will that change the answer?)
Well, suppose you added three numbers and divided by three, and then added two numbers and divided by two. Can you get the average from these?
x = (a + b + c) / 3
y = (d + e) / 2
z = (f + g) / 2
And you want
r = (a + b + c + d + e + f + g) / 7
That is equal to
r = (3 * (a + b + c) / 3 + 2 * (d + e) / 2 + 2 * (f + g) / 2) / 7
r = (3 * x + 2 * y + 2 * z) / 7
Both lines above would overflow, of course, but since division distributes over addition, we can do
r = (3.0 / 7.0) * x + (2.0 / 7.0) * y + (2.0 / 7.0) * z
Which guarantees that you won't overflow, as I'm multiplying x, y and z by fractions less than one.
This is the fundamental point here: I'm neither dividing all the numbers by the total count beforehand, nor ever exceeding the representable range.
So if you keep adding to an accumulator, keep track of how many numbers you have added, and always test whether the next number will cause an overflow, you can get partial averages and then compute the final average from them.
And no, if you don't know the number of values beforehand, it doesn't change anything (provided that you count them as you sum them).
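A Python sketch of the combination step, using deliberately unequal groups of 1 to 7 (sizes 3, 2 and 2, matching x, y and z above). Each partial average is multiplied by its group's share of the total count, a fraction no greater than one, before anything is added, so equal group sizes are not needed. The grouping is hand-picked here; in practice it would come from wherever the accumulator had to be flushed. combine is an illustrative name.

```python
def combine(partials, total_count):
    """partials: (group_average, group_size) pairs; returns the overall mean."""
    result = 0.0
    for group_avg, group_size in partials:
        # Weight each partial average by its share of all values (<= 1).
        result += group_avg * (group_size / total_count)
    return result


groups = [[1, 2, 3], [4, 5], [6, 7]]  # x, y and z in the notation above
partials = [(sum(g) / len(g), len(g)) for g in groups]
print(combine(partials, 7))  # the mean of 1..7 is 4, up to rounding
```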
Here is a Scala function that does it. It's not idiomatic Scala, so that it can be more easily understood:
def avg(input: List[Double]): Double = {
var partialAverages: List[(Double, Int)] = Nil
var inputLength = 0
var currentSum = 0.0
var currentCount = 0
var numbers = input
while (numbers.nonEmpty) {
val number = numbers.head
val rest = numbers.tail
if (number > 0 && currentSum > 0 && Double.MaxValue - currentSum < number) {
partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
currentSum = 0
currentCount = 0
} else if (number < 0 && currentSum < 0 && Double.MinValue - currentSum > number) {
partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
currentSum = 0
currentCount = 0
}
currentSum += number
currentCount += 1
inputLength += 1
numbers = rest
}
partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
var result = 0.0
for ((partialAvg, partialCount) <- partialAverages) {
// weight each partial average by its share of the total count
result += partialAvg * (partialCount.toDouble / inputLength)
}
result
}
EDIT:
Won't multiplying by 2 and 3 get me back into the range "not supported by the data type"?
No. If you were dividing by 7 only at the end, absolutely. But here you are dividing at each step of the sum. Even in your real case the weights (like 2/7 and 3/7) would be manageable numbers (e.g. between 1/10 and 1/10000), and since they are at most 1, multiplying by them can only shrink a value, never push it past the limit.
PS: I wonder why I'm working on this answer instead of writing mine where I can earn my rep :-)
If you know the number of values beforehand (say it's N), you just add 1/N + 2/N + 3/N etc, supposing that you had values 1, 2, 3. You can split this into as many calculations as you like, and just add up your results. It may lead to a slight loss of precision, but this shouldn't be an issue unless you also need a super-accurate result.
If you don't know the number of items ahead of time, you might have to be more creative. But you can, again, do it progressively. Say the list is 1, 2, 3, 4. Start with mean = 1. Then mean = mean*(1/2) + 2*(1/2). Then mean = mean*(2/3) + 3*(1/3). Then mean = mean*(3/4) + 4*(1/4) etc. It's easy to generalize, and you just have to make sure the bracketed quantities are calculated in advance, to prevent overflow.
Of course, if you want extreme accuracy (say, more than 0.001% accuracy), you may need to be a bit more careful than this, but otherwise you should be fine.
Let X be your sample set. Partition it into two sets A and B in any way that you like. Define delta = m_B - m_A where m_S denotes the mean of a set S. Then
m_X = m_A + delta * |B| / |X|
where |S| denotes the cardinality of a set S. Now you can repeatedly apply this to partition and calculate the mean.
Why is this true? Let s = 1 / |A| and t = 1 / |B| and u = 1 / |X| (for convenience of notation) and let aSigma and bSigma denote the sum of the elements in A and B respectively so that:
m_A + delta * |B| / |X|
= s * aSigma + u * |B| * (t * bSigma - s * aSigma)
= s * aSigma + u * (bSigma - |B| * s * aSigma)
= s * aSigma + u * bSigma - u * |B| * s * aSigma
= s * aSigma * (1 - u * |B|) + u * bSigma
= s * aSigma * (u * |X| - u * |B|) + u * bSigma
= s * u * aSigma * (|X| - |B|) + u * bSigma
= s * u * aSigma * |A| + u * bSigma
= u * aSigma + u * bSigma
= u * (aSigma + bSigma)
= u * (xSigma)
= xSigma / |X|
= m_X
The proof is complete.
From here it is obvious how to use this to either recursively compute a mean (say by repeatedly splitting a set in half) or how to use this to parallelize the computation of the mean of a set.
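A Python sketch of the recursive use: split the list in half, compute each half's mean, and merge the two with m_X = m_A + delta * |B| / |X|. merge_means and recursive_mean are illustrative names, and the input must be non-empty. Calling merge_means(m, n, x, 1) is exactly the online update m + (x - m) / (n + 1).

```python
def merge_means(mean_a, size_a, mean_b, size_b):
    """Mean of the union of A and B: m_A + (m_B - m_A) * |B| / |X|."""
    size_x = size_a + size_b
    return mean_a + (mean_b - mean_a) * size_b / size_x


def recursive_mean(values):
    """Split in half, recurse, merge. Returns (mean, count); values must be non-empty."""
    if len(values) == 1:
        return float(values[0]), 1
    mid = len(values) // 2
    m_a, n_a = recursive_mean(values[:mid])
    m_b, n_b = recursive_mean(values[mid:])
    return merge_means(m_a, n_a, m_b, n_b), n_a + n_b


print(recursive_mean([1, 2, 3, 4, 5, 6, 7]))  # mean close to 4.0, count 7
```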
The well-known online algorithm for calculating the mean is just a special case of this. That algorithm says that if m is the mean of {x_1, x_2, ..., x_n}, then the mean of {x_1, x_2, ..., x_n, x_(n+1)} is m + (x_(n+1) - m) / (n + 1). So with X = {x_1, x_2, ..., x_(n+1)}, A = {x_1, x_2, ..., x_n}, and B = {x_(n+1)}, we have m_A = m, delta = x_(n+1) - m, and |B| / |X| = 1 / (n + 1), and we recover the online algorithm.
Thinking outside the box: Use the median instead. It's much easier to calculate - there are tons of algorithms out there (e.g. using queues), you can often construct good arguments as to why it's more meaningful for data sets (less swayed by extreme values; etc) and you will have zero problems with numerical accuracy. It will be fast and efficient. Plus, for large data sets (which it sounds like you have), unless the distributions are truly weird, the values for the mean and median will be similar.
When you split the numbers into sets you're just dividing by the total number or am I missing something?
You have written it as
$$\frac{\frac{1}{3} + \frac{2}{3} + \frac{3}{3}}{2} + \frac{\frac{4}{3} + \frac{5}{3} + \frac{6}{3}}{2}$$
but that's just
$$\left(\frac{1}{6} + \frac{2}{6} + \frac{3}{6}\right) + \left(\frac{4}{6} + \frac{5}{6} + \frac{6}{6}\right)$$
so for the numbers from 1 to 7 one possible grouping is just
$$\left(\frac{1}{7} + \frac{2}{7} + \frac{3}{7}\right) + \left(\frac{4}{7} + \frac{5}{7} + \frac{6}{7}\right) + \left(\frac{7}{7}\right)$$
Average of x_1 .. x_N
= (Sum(i=1,N,x_i)) / N
= (Sum(i=1,M,x_i) + Sum(i=M+1,N,x_i)) / N
= (Sum(i=1,M,x_i)) / N + (Sum(i=M+1,N,x_i)) / N
This can be repeatedly applied, and is true regardless of whether the summations are of equal size. So:
1. Keep adding terms until both:
   - adding another one will overflow (or otherwise lose precision)
   - dividing by N will not underflow
2. Divide the sum by N.
3. Add the result to the average-so-far.
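A Python sketch of these steps. Python floats only overflow near 1.8e308, so a hypothetical limit parameter stands in for "adding another one will overflow"; the underflow condition is left out to keep it short, and each value is assumed to be no larger than limit in magnitude. chunked_mean is an illustrative name.

```python
def chunked_mean(values, limit):
    """Mean of values; the partial sum is flushed before it would exceed limit."""
    n = len(values)  # N is known beforehand in this version
    average = 0.0
    chunk_sum = 0.0
    for v in values:
        if abs(chunk_sum + v) > limit:  # adding v would "overflow"
            average += chunk_sum / n    # divide the sum by N, add to average-so-far
            chunk_sum = 0.0
        chunk_sum += v
    return average + chunk_sum / n      # flush the final, possibly smaller chunk


print(chunked_mean([1, 2, 3, 4, 5, 6, 7], limit=10))  # about 4.0
```

The chunks here have unequal sizes (10, 5, 6 and 7 as partial sums), which is fine because every chunk is divided by the same N.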
There's one obvious awkward case, which is that there are some very small terms at the end of the sequence, such that you run out of values before you satisfy the condition "dividing by N will not underflow". In which case just discard those values - if their contribution to the average cannot be represented in your floating type, then it is in particular smaller than the precision of your average. So it doesn't make any difference to the result whether you include those terms or not.
There are also some less obvious awkward cases to do with loss of precision on individual summations. For example, what's the average of the values:
10^100, 1, -10^100
Mathematics says it's 1, but floating-point arithmetic says it depends on the order in which you add up the terms: in 4 of the 6 possible orders it's 0, because (10^100) + 1 = 10^100. But I think the non-associativity of floating-point addition is a different and more general problem than this question. If sorting the input is out of the question, I think there are things you can do where you maintain lots of accumulators of different magnitudes and add each new value to whichever one of them gives the best precision. But I don't really know.
Here's another approach. You're 'receiving' numbers one-by-one from some source, but you can keep track of the mean at each step.
First, I will write out the formula for mean at step n+1:
mean[n+1] = mean[n] - (mean[n] - x[n+1]) / (n+1)
with the initial condition:
mean[1] = x[1]
(the index starts at one, so mean[n] is the mean of the first n values).
The first equation can be simplified to:
mean[n+1] = n * mean[n] / (n+1) + x[n+1]/(n+1)
The idea is that you keep track of the mean, and when you 'receive' the next value in your sequence, you figure out its offset from the current mean, and divide it equally between the n+1 samples seen so far, and adjust your mean accordingly. If your numbers don't have a lot of variance, your running mean will need to be adjusted very slightly with the new numbers as n becomes large.
Obviously, this method works even if you don't know the total number of values when you start. It has the additional advantage that you know the value of the current mean at all times. One disadvantage I can think of is that it probably gives more 'weight' to the numbers seen at the beginning (not in a strict mathematical sense, but because of floating-point representation).
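A Python sketch of this running mean, using the one-based indexing of the formulas above (mean[n] is the mean of the first n values); running_mean is an illustrative name. Because it never forms the full sum, two values near the largest double average correctly where a plain sum overflows to infinity. The subtraction mean - x can still overflow if both are huge with opposite signs.

```python
def running_mean(values):
    """Incremental mean via mean[n+1] = mean[n] - (mean[n] - x[n+1]) / (n + 1)."""
    mean = 0.0
    for n, x in enumerate(values):    # n = how many values have been seen so far
        mean -= (mean - x) / (n + 1)  # n = 0 gives mean = x, i.e. mean[1] = x[1]
    return mean


big = [1.7e308, 1.7e308]
print(sum(big))           # inf: the plain sum overflows
print(running_mean(big))  # 1.7e+308
```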
Finally, all such calculations are bound to run into floating-point 'errors' if one is not careful enough. See my answer to another question for some of the problems with floating point calculations and how to test for potential problems.
As a test, I generated N=100000 normally distributed random numbers with mean zero and variance 1. Then I calculated their mean by three methods.
sum(numbers) / N, call it m1,
my method above, call it m2,
sort the numbers, and then use my method above, call it m3.
Here's what I found: m1 − m2 ∼ −4.6×10^−17, m1 − m3 ∼ −3×10^−15, m2 − m3 ∼ −3×10^−15. So, if your numbers are sorted, the error might not be small enough for you. (Note, however, that even the worst error is 10^−15 parts in 1 for 100000 numbers, so it might be good enough anyway.)
Some of the mathematical solutions here are very good. Here's a simple technical solution.
Use a larger data type. This breaks down into two possibilities:
Use a high-precision floating point library. One who encounters a need to average a billion numbers probably has the resources to purchase, or the brain power to write, a 128-bit (or longer) floating point library.
I understand the drawbacks here. It would certainly be slower than using intrinsic types. You still might over/underflow if the number of values grows too high. Yada yada.
If your values are integers or can be easily scaled to integers, keep your sum in a list of integers. When you overflow, simply add another integer. This is essentially a simplified implementation of the first option. A simple (untested) example in C# follows
using System;
using System.Collections.Generic;
class BigMeanSet{
List<uint> list = new List<uint>(); // base-2^32 "digits" of the running sum, least significant first
public double GetAverage(IEnumerable<uint> values){
list.Clear();
list.Add(0);
uint count = 0;
foreach(uint value in values){
Add(0, value);
count++;
}
return DivideBy(count);
}
void Add(int listIndex, uint value){
if((list[listIndex] += value) < value){ // then overflow has occurred: carry 1 into the next digit
if(list.Count == listIndex + 1)
list.Add(0);
Add(listIndex + 1, 1);
}
}
double DivideBy(uint count){
const double shift = 4.0 * 1024 * 1024 * 1024; // 2^32, the base of each list element
double rtn = 0;
ulong remainder = 0; // ulong, because remainder << 32 can exceed long.MaxValue once count > 2^31
for(int i = list.Count - 1; i >= 0; i--){
rtn *= shift;
ulong current = (remainder << 32) + list[i]; // long division, one 32-bit digit at a time
rtn += current / count;
remainder = current % count;
}
rtn += remainder / (double)count;
return rtn;
}
}
Like I said, this is untested—I don't have a billion values I really want to average—so I've probably made a mistake or two, especially in the DivideBy function, but it should demonstrate the general idea.
This should provide as much accuracy as a double can represent and should work for any number of 32-bit elements, up to 2^32 - 1. If more elements are needed, then the count variable will need to be expanded and the DivideBy function will increase in complexity, but I'll leave that as an exercise for the reader.
In terms of efficiency, it should be as fast or faster than any other technique here, as it only requires iterating through the list once, only performs one division operation (well, one set of them), and does most of its work with integers. I didn't optimize it, though, and I'm pretty certain it could be made slightly faster still if necessary. Ditching the recursive function call and list indexing would be a good start. Again, an exercise for the reader. The code is intended to be easy to understand.
If anybody more motivated than I am at the moment feels like verifying the correctness of the code, and fixing whatever problems there might be, please be my guest.
I've now tested this code, and made a couple of small corrections (a missing pair of parentheses in the List<uint> constructor call, and an incorrect divisor in the final division of the DivideBy function).
I tested it by first running it through 1000 sets of random length (ranging between 1 and 1000) filled with random integers (ranging between 0 and 2^31 - 1, the range of rnd.Next()). These were sets for which I could easily and quickly verify accuracy by also running a canonical mean on them.
I then tested with 100* large series, with random length between 10^5 and 10^9. The lower and upper bounds of these series were also chosen at random, constrained so that the series would fit within the range of a 32-bit integer. For any series, the results are easily verifiable as (lowerbound + upperbound) / 2.
*Okay, that's a little white lie. I aborted the large-series test after about 20 or 30 successful runs. A series of length 10^9 takes just under a minute and a half to run on my machine, so half an hour or so of testing this routine was enough for my tastes.
For those interested, my test code is below:
static IEnumerable<uint> GetSeries(uint lowerbound, uint upperbound){
for(uint i = lowerbound; i <= upperbound; i++)
yield return i;
}
static void Test(){
Console.BufferHeight = 1200;
Random rnd = new Random();
for(int i = 0; i < 1000; i++){
uint[] numbers = new uint[rnd.Next(1, 1000)];
for(int j = 0; j < numbers.Length; j++)
numbers[j] = (uint)rnd.Next();
double sum = 0;
foreach(uint n in numbers)
sum += n;
double avg = sum / numbers.Length;
double ans = new BigMeanSet().GetAverage(numbers);
Console.WriteLine("{0}: {1} - {2} = {3}", numbers.Length, avg, ans, avg - ans);
if(avg != ans)
Debugger.Break();
}
for(int i = 0; i < 100; i++){
uint length = (uint)rnd.Next(100000, 1000000001);
uint lowerbound = (uint)rnd.Next(int.MaxValue - (int)length);
uint upperbound = lowerbound + length;
double avg = ((double)lowerbound + upperbound) / 2;
double ans = new BigMeanSet().GetAverage(GetSeries(lowerbound, upperbound));
Console.WriteLine("{0}: {1} - {2} = {3}", length, avg, ans, avg - ans);
if(avg != ans)
Debugger.Break();
}
}
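The "keep your sum in a list of integers" idea needs no extra code in a language with arbitrary-precision integers. A Python sketch for the same kind of input as BigMeanSet (non-negative integers): Python's int grows as needed, playing the role of the List<uint>, and int / int returns a float without first turning the huge sum into a float, playing the role of DivideBy. big_mean is an illustrative name, and an empty input raises ZeroDivisionError.

```python
def big_mean(values):
    """Mean of an iterable of non-negative integers, like BigMeanSet.GetAverage."""
    total = 0  # Python int: grows as needed, so the sum never overflows
    count = 0
    for v in values:
        total += v
        count += 1
    # int / int gives the float result directly, without first converting
    # the (possibly huge) sum to a float.
    return total / count


print(big_mean(range(1, 8)))              # 4.0
print(big_mean([2**32 - 1] * 1_000_000))  # 4294967295.0
```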