I'd like to ask about my program because it doesn't work correctly. I want to refer to the same set of variables from two different sequence arrays. Here is my code.
// Array of Arrays
var SequenceGo:Array =
[
{dt:dt1, P:P1, s0:s01, s:s1},
{dt:dt2, P:P2, s0:s02, s:s2},
{dt:dt3, P:P3, s0:s03, s:s3},
{dt:dt4, P:P4, s0:s04, s:s4},
{dt:dt5, P:P5, s0:s05, s:s5},
{dt:dt6, P:P6, s0:s06, s:s6},
{dt:dt7, P:P7, s0:s07, s:s7},
{dt:dt8, P:P8, s0:s08, s:s8},
{dt:dt9, P:P9, s0:s09, s:s9},
{dt:dt10, P:P10, s0:s010, s:s10},
];
var SequenceBack:Array =
[
{dtback:dt10back, P:P10, s0:s010, sback:s10back},
{dtback:dt9back, P:P9, s0:s09, sback:s9back},
{dtback:dt8back, P:P8, s0:s08, sback:s8back},
{dtback:dt7back, P:P7, s0:s07, sback:s7back},
{dtback:dt6back, P:P6, s0:s06, sback:s6back},
{dtback:dt5back, P:P5, s0:s05, sback:s5back},
{dtback:dt4back, P:P4, s0:s04, sback:s4back},
{dtback:dt3back, P:P3, s0:s03, sback:s3back},
{dtback:dt2back, P:P2, s0:s02, sback:s2back},
{dtback:dt1back, P:P1, s0:s01, sback:s1back}
];
function onNext(index:int = 0):void
{
if (index >= SequenceGo.length)
{
return;
}
var aDataGo:Object = SequenceGo[index];
var aDataBack:Object = SequenceBack[index];
//variables
F = s_teganganst.value;
m = s_masjenst.value/10000;
v = Math.sqrt(F/m);
tp = 5000/v;
f = s_frekuensist.value;
w = 2*Math.PI*f;
aDataGo.dt += t;
aDataGo.s = aDataGo.s0 - A * Math.sin(w * aDataGo.dt);
aDataGo.P.y = aDataGo.s;
if(P10.y < 607){
aDataBack.dtback += t;
aDataBack.sback = - A * Math.sin(w * aDataBack.dtback);
aDataBack.P.y = aDataGo.s + aDataBack.sback;
}
setTimeout(onNext, tp, index + 1);
}
Actually, the line
aDataBack.P.y = aDataGo.s + aDataBack.sback;
is not suitable for the animation, because aDataBack is ordered inversely to aDataGo (we have to keep this inverse order for the animation in my program to work properly). I want to recall the variables based on its number, so each variable will match with another variable. For example,
P1.y = s1 + s1back;
P2.y = s2 + s2back;
P3.y = s3 + s3back;
P4.y = s4 + s4back;
//and so on
I've tried the code above, but it doesn't work either. Is there another way to reference pairs of variables like in the code above? Thanks!
I want to recall the variables based on its number, so each variable will match with another variable
OK, there are two options.
Option one, simple and straightforward: write a method that finds the corresponding back object on the spot:
function findBack(P:Object):Object
{
for each (var aDataBack:Object in SequenceBack)
{
if (aDataBack.P == P)
{
return aDataBack;
}
}
// no back object refers to this P
return null;
}
So, that piece of code would be
var aDataGo:Object = SequenceGo[index];
var aDataBack:Object = findBack(aDataGo.P);
The possible problem here is performance. It is fine on the scale of 10 or 100 objects, but since (I suppose) you are devising a particle system, the object count can easily scale to thousands, and all that loop-searching might become expensive.
So I advise preparing a pre-indexed hash so that you don't need to search every single time.
var SequenceBack:Array =
[
// ...
];
// Dictionary is a storage of key:value data, just like Object,
// but Dictionary allows Object keys.
// (Dictionary lives in flash.utils, so it needs: import flash.utils.Dictionary;)
var HashBack:Dictionary = new Dictionary();
for each (var aDataBack:Object in SequenceBack)
{
HashBack[aDataBack.P] = aDataBack;
}
I encourage you to read more about the Dictionary class.
And so that piece of code would be
var aDataGo:Object = SequenceGo[index];
var aDataBack:Object = HashBack[aDataGo.P];
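For readers more at home in JavaScript than ActionScript, the same pre-indexing idea can be sketched with a Map, which (like Dictionary) accepts objects as keys. This is only an illustration: the P1..P3 objects and the numeric values are made-up stand-ins for the display objects and variables in the question.

```javascript
// Hypothetical stand-ins for the display objects P1..P3.
const P1 = { name: 'P1' }, P2 = { name: 'P2' }, P3 = { name: 'P3' };

const sequenceGo = [{ P: P1, s: 1 }, { P: P2, s: 2 }, { P: P3, s: 3 }];
// Deliberately stored in the opposite order, as in the question.
const sequenceBack = [{ P: P3, sback: 30 }, { P: P2, sback: 20 }, { P: P1, sback: 10 }];

// Build the index once: key is the shared P object, value is the back record.
const hashBack = new Map();
for (const dataBack of sequenceBack) {
  hashBack.set(dataBack.P, dataBack);
}

// Pair each go record with its back record, regardless of array order.
function pairedSum(index) {
  const dataGo = sequenceGo[index];
  const dataBack = hashBack.get(dataGo.P);
  return dataGo.s + dataBack.sback;
}

console.log(pairedSum(0)); // 11 (s of P1 plus sback of P1)
```

The lookup is keyed on object identity, so it only works if both arrays hold references to the very same P objects, which is the case in the question's arrays.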
Introduction
I am trying to obtain the share difficulty sent from my mining rig to the pool; I have captured the following stratum data:
Work from pool sent to miner:
{"id":0,"jsonrpc":"2.0","result":["0xbafb9b219bb51e0b47abf28e6044322a0f181926496528bbf137eef718a6623b","0x78255e703ddc0c08a70aba14dafca75ff40401240c1622d2f80b398919451e14","0x7e00000007e00000007e00000007e00000007e00000007e00000007e","0xe65d91"]}
Miner with share solution, sent to pool:
{"id":24140800,"method":"eth_submitWork","params":["0x605f42c3367bd64a","0xbafb9b219bb51e0b47abf28e6044322a0f181926496528bbf137eef718a6623b","0x6d8320a2471edc0ef2c8a41fa1d6d9e5e50fbeff8771ed1e58960765b0e7131f"],"worker":"rig"}
My mining software [T-Rex miner] has reported that this share difficulty was 18.33 G, not enough for an actual block solution, but valid as a share for the pool.
My approach
After many hours of reading and investigating, I know that to calculate the share difficulty (18.33 G) I need the block hash and the nonce, and then have to run Ethash over them, but I am failing to obtain the correct share value.
This is what I have come up with using Node.js and ethereumjs:
const Ethash = require('@ethereumjs/ethash').default;
const { MemoryLevel } = require('memory-level');
const cacheDB = new MemoryLevel();
const ethash = new Ethash(cacheDB);
var seed = '0x78255e703ddc0c08a70aba14dafca75ff40401240c1622d2f80b398919451e14';
var blockheader = '0xbafb9b219bb51e0b47abf28e6044322a0f181926496528bbf137eef718a6623b';
var nonce = '0x605f42c3367bd64a';
var seedBuffer = Buffer.from(seed, 'hex');
var nonceBuffer = Buffer.from(nonce, 'hex');
var blockheaderBuffer = Buffer.from(blockheader, 'hex');
ethash.mkcache(1024, seedBuffer);
var result = ethash.run(blockheaderBuffer, nonceBuffer, 1024 * 32);
var dHexHash = '0x' + result.hash.toString('hex');
console.log(dHexHash / 1000000 + ' M');
var dHexMix = '0x' + result.mix.toString('hex');
console.log(dHexMix);
The problem(s)
• Firstly, ethash.run(val, nonce, fullSize?), as stated here, returns a hash and a mix. I do not know which one I should use to compute the difficulty, so I tried both; nevertheless, neither of them gave me a correct value.
The third argument, fullSize, I also don't know what it corresponds to. I used 503 since that is the epoch when the share was computed, but I might be wrong. Also, when I change this to another random number, the hashes change completely, so it has to be exactly the right value.
• Secondly and lastly, mkcache(cacheSize, seed) needs a cacheSize, and I don't know what that corresponds to either. I assumed it to be the epoch when the share was computed, but that may be wrong too.
Thanks for your help!
After 2 days of reading and investigating, I made the code work.
The main issue is that there is no clear documentation on how to use these two ethereumjs calls:
ethash.run(val, nonce, fullSize?)
mkcache(cacheSize, seed)
cacheSize corresponds to get_cachesize and fullSize corresponds to get_datasize; references for both are here.
This is my final working test code:
import { get_datasize, get_cachesize } from './dag.js';
import Ethash from '@ethereumjs/ethash';
import { MemoryLevel } from 'memory-level';
const ethash = new Ethash.default(new MemoryLevel());
const block_number = 0xe68326
const nonce = '922349db156ce3e0'
const header = 'dc4db678c8a6bf1b521320ed0b03c77d5d460b3eca5730b6d8f058647dccb158'
const seed = '78255e703ddc0c08a70aba14dafca75ff40401240c1622d2f80b398919451e14'
const mix_digest = '0x045f0cb0561619fecfe21893a17ebe409fee608211630a09ce085ff899910570'
var seedBuffer = Buffer.from(seed, 'hex');
var nonceBuffer = Buffer.from(nonce, 'hex');
var headerBuffer = Buffer.from(header, 'hex');
var cache = ethash.mkcache(get_cachesize(block_number), seedBuffer);
var result = ethash.run(headerBuffer, nonceBuffer, get_datasize(block_number));
var hash = '0x' + result.hash.toString('hex');
var mix = '0x' + result.mix.toString('hex');
console.log(diff2(hash));
console.log(mix);
if(mix == mix_digest)
console.log('Valid PoW');
else
console.log('Invalid PoW, try again');
function diff2(d) {
return Math.ceil((Math.pow(2, 256) / parseInt(d, 16)));
}
// prints 'Valid PoW'
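As a side note, diff2() goes through floating point, which cannot represent 256-bit values exactly. A minimal sketch of the same 2^256 / hash calculation with BigInt might look like the following. The function names shareDifficulty and formatGiga are hypothetical helpers, and the example hash is made up (it is not the hash of the share above); note that this version rounds down where diff2() rounds up.

```javascript
// Difficulty of a hash, using the same 2^256 / hash definition as diff2(),
// but with BigInt so no precision is lost on 256-bit values.
function shareDifficulty(hashHex) {
  const hex = hashHex.startsWith('0x') ? hashHex : '0x' + hashHex;
  const hash = BigInt(hex);
  if (hash === 0n) {
    throw new RangeError('hash must be non-zero');
  }
  return (2n ** 256n) / hash; // BigInt division rounds down
}

// Format in giga units, similar to how the miner reported "18.33 G".
function formatGiga(difficulty) {
  return (Number(difficulty) / 1e9).toFixed(2) + ' G';
}

// Made-up example hash equal to 2^224 (64 hex digits).
const exampleHash = '0x00000001' + '0'.repeat(56);

console.log(shareDifficulty(exampleHash));             // 4294967296n, i.e. 2^32
console.log(formatGiga(shareDifficulty(exampleHash))); // 4.29 G
```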
Am I running an unnecessary risk of creating an id that is not unique? I'm trying to generate a unique, random id of alphanumeric characters. This ID will be used in the primary key for the database record.
const idSeed: string =
crypto.randomBytes(16).toString('base64') +
'' +
Date.now();
const orderId: string = Buffer.from(idSeed)
.toString('base64')
.replace(/[\/\+\=]/g, '');
First off, I recommend that you get rid of the .replace(/[\/\+\=]/g, '') as that loses randomness and, in fact, maps distinct orderIds that differ only in those characters to the same value.
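To make that concrete, here is a tiny sketch of how the stripping step can merge two different ids. The two input strings are made up for illustration (real ids would be longer), and stripSymbols is a hypothetical wrapper around the same regular expression.

```javascript
// The same sanitising step as in the question.
function stripSymbols(id) {
  return id.replace(/[\/\+\=]/g, '');
}

// Two made-up ids that differ only in where the stripped characters sit.
const a = 'ab+cd/ef';
const b = 'abcd+e/f';

console.log(a === b);                             // false
console.log(stripSymbols(a) === stripSymbols(b)); // true, both become "abcdef"
```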
My recommendation would be to use a base58 encoder such as base-x, which will directly encode to what you want. This encoder library lets you pass in the exact character set you want to use for encoding, and it just uses that.
Here's my suggested code you can insert:
const base58Encode = require('base-x')('123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz').encode;
And, then where you create the orderID, change to this:
const idSeed = crypto.randomBytes(16)
const orderId = base58Encode(idSeed);
I don't know about the probability of a dup (you'd need a crypto/statistician for that), but I ran 10,000,000 orderId values without a dup and I repeated that 10 times and still didn't get a dup. Obviously, that doesn't mean it can't happen, but I'm doing this rapid fire too where Date.now() might not even be much different. I couldn't run it more than 10,000,000 times because I run out of memory trying to store all the prior orderId values in a Set object to check for dups. You could increase memory for nodejs and run it with even higher values or put it in a shell script and run it over and over and over again.
Here's my dup checker program if you want to run it yourself over and over:
const crypto = require('crypto');
function addCommas(str) {
var parts = (str + "").split("."),
main = parts[0],
len = main.length,
output = "",
i = len - 1;
while(i >= 0) {
output = main.charAt(i) + output;
if ((len - i) % 3 === 0 && i > 0) {
output = "," + output;
}
--i;
}
// put decimal part back
if (parts.length > 1) {
output += "." + parts[1];
}
return output;
}
let set = new Set();
const numToTry = 10_000_000;
const debugMultiple = 100_000;
for (let i = 0; i < numToTry; i++) {
if (i !== 0 && i % debugMultiple === 0) {
console.log(`Attempt #${addCommas(i)}`);
}
const idSeed = crypto.randomBytes(16).toString('base64') + '' + Date.now();
const orderId = Buffer.from(idSeed).toString('base64').replace(/[\/\+\=]/g, '');
//console.log(orderId);
if (set.has(orderId)) {
console.log(`Found conflict after ${addCommas(i)} attempts`);
console.log(`Conflicting orderId = ${orderId}`);
process.exit(1);
}
set.add(orderId);
}
console.log(`No dups found after ${addCommas(numToTry)} attempts`);
Before spending a lot of time on this, I would investigate your database to see if it will generate a unique key for you that could work as the orderId. This is a common database problem.
Here's a newer version that I was able to run up to 1,000,000,000 ids through. Still no conflicts. Because there's no way I could have a giant Set object of 1,000,000,000 ids in memory, I brainstormed about a number of ways to do it. I thought about using a redis server and storing the ids in there since it can use a lot more memory. But, then I came up with a disk-based solution that can scale as high as you want. Here's the basic idea:
One of your orderId values looks like this:
zz6h6q6oRELJXmh4By4NUw1587006335064
When I generate a new orderId, if I can separate it out into a disk-based "bucket" that contains only ids with the same beginning characters, then I can split all the ids among many different files.
The idea is that if each id that starts with the same two characters is stored in the same file, then no other id in any other file could possibly match the ids in that file.
You can then do your work in two passes. The first pass generates 1,000,000,000 ids and as they are generated, they are written out to an appropriate bucket file based on the characters the id starts with.
After all the ids are generated and written to their appropriate bucket files, the second pass is to iterate through each of the bucket files one at a time, load all the ids into a Set object and see if any conflict. If none match, clear that Set and go onto the next file. This lets you do the memory constrained part (dealing with a Set object) in pieces to use less memory for big numbers of ids.
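The two-pass idea can be sketched in memory before bringing files into it. In this simplified illustration, plain arrays stand in for the bucket files, and the function names partitionByPrefix and findDuplicates as well as the sample ids are hypothetical.

```javascript
// Pass 1: partition ids into buckets keyed by their first two characters.
function partitionByPrefix(ids, prefixLength = 2) {
  const buckets = new Map();
  for (const id of ids) {
    const key = id.slice(0, prefixLength);
    if (!buckets.has(key)) {
      buckets.set(key, []);
    }
    buckets.get(key).push(id);
  }
  return buckets;
}

// Pass 2: check each bucket on its own, so only one Set is alive at a time.
function findDuplicates(buckets) {
  const dups = [];
  for (const bucketIds of buckets.values()) {
    const seen = new Set();
    for (const id of bucketIds) {
      if (seen.has(id)) {
        dups.push(id);
      } else {
        seen.add(id);
      }
    }
  }
  return dups;
}

const sample = ['zzA1', 'zzB2', 'abC3', 'zzA1', 'abD4'];
const buckets = partitionByPrefix(sample);
console.log(buckets.size);            // 2
console.log(findDuplicates(buckets)); // [ 'zzA1' ]
```

Because two equal ids necessarily share the same prefix, checking each bucket in isolation finds every duplicate that a single global Set would find.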
So, then the question is how to divide the ids up into bucket files? Since each character in the base64 id value represents up to 64 possible values, if you use just the first two characters of the id to determine the bucket, you will get up to 64*64=4096 buckets. In the actual orderId values I only found ~3800 buckets. That is what you would expect once +, / and = have been stripped out: only 62 characters remain, and 62*62=3844.
But, if you split 1,000,000,000 values into 3800 buckets, you get about 263,000 ids per bucket. We already showed that we could easily process 10,000,000 ids in memory before, so this should be more than enough buckets to be able to process each bucket in memory one at a time. In fact, if I were patient enough, we could probably go to 10,000,000,000 with buckets based on just the first two characters.
If you wanted more buckets, they could be based on the first three characters, though then you start getting too many files for a single directory and you have to start splitting files among directories which can be done, but complicates things.
So, I need to create a bucket filename that's based on the first two characters of the id. The ids are case sensitive (base64 uses upper and lower case to represent different values). My Windows file system is case insensitive so I can't just directly use the first two letters as the filename. So, I created a simple algorithm that takes a two-character mixed-case prefix and makes it into a four-character lowercase name. It maps a lowercase "a" to "a_" and a non-lowercase character like "B" to "bb". So, a lowercase value is followed by a _ and an uppercase value is followed by a second copy of itself. So, you'd have id mappings like this:
"ab" => "a_b_"
"AB" => "aabb"
"aB" => "a_bb"
"Ab" => "aab_"
Non-alpha characters (like numbers) just map to a doubling of themselves just like any non-lowercase characters. So, with this, I can get an id value, grab the first two characters, see what filename it belongs to and append it to that file.
For performance reasons, I created a Bucket class which maintains a cache of ids waiting to be written in memory. When the cache inside a particular bucket gets to a certain length (which I now have set to 3000), I append them all out to the file at once and clear the bucket cache. When I'm done generating all the ids, I iterate through all the buckets and flush out any remaining ids. With this kind of write caching, the generation of ids is mostly CPU bound, not disk bound. Disk utilization runs around 30%. One core of the CPU is pegged during id generation. This could probably be sped up with some WorkerThreads.
So, once all the ids are written to bucket files and nothing is in memory at all, it's time to read through each of the bucket files one at a time, load all their ids into a Set and see if there are any conflicts. Each bucket file is a line separated list of ids that all start with same prefix like this:
zzoexm2FE8DIrHnXpp8qw1587003338798
zzuP6LpusKIMeYrfl0WJnQ1587003338885
zz1itmTqA3yaFNo1KFUhg1587003338897
zz3TEFeqH965OTFCrFTjJQ1587003338904
zz8XQKvq11fCqn9kB4O2A1587003338904
zzaKMTFPct5ls7WW3YmcQ1587003338927
zzyX3htzIqi4zOq4Cxdg1587003338928
zzoHu6vIHMEgNMVY46Qw1587003338962
So, I just read a given bucket file, line by line, and check each id against a Set for that bucket file. If it's already in the Set, there's a conflict. Output that conflict and abort. If it's not in the Set, add it to the Set and continue with the rest of the ids in that bucket file. Since this bucket file contains all the ids that start with the same two characters, no other id in any other bucket file can conflict with these, so you can just compare all these ids against each other.
The reading of the bucket files is heavily disk bound. When running 1,000,000,000 ids into the 3844 bucket files, each bucket file is about 5MB which is 22GB of data. Each file has to be read and parsed into lines and then each id added to the Set.
I tried a couple of different mechanisms for reading the files line by line and found them quite slow. I started with the readline interface, which lets you iterate through line by line via a readStream. It was sloooow. Then, I just read the whole file into memory with fs.readFile() into a giant string and then called .split("\n") on it to break it into lines. This was actually better than readline, but still slow. I theorized that there were just too many copies of the data, which meant the garbage collector was having to work a lot.
So, finally I wrote my own version of readFile that reads the entire file into a reusable Buffer and splits it into lines by parsing the binary buffer directly. This saved at least a couple copies of the data along the way and saved a lot of GC work. It wasn't fast, but it was faster. Reusing the buffer also saved me a lot of separate 5MB allocations.
The first pass (generating the ids) is CPU bound. I've theorized I could speed that up quite a bit by starting up a number of Worker Threads (probably like 6 since I have an 8-core CPU) and letting them crunch on generating the ids. I would dole out 1/6 of the quantity to each Worker Thread and when they accumulated 1000 or so, they'd message those 1000 back to the main thread which would insert them in the right buckets. But, before I adventure into using WorkerThreads, I need to do some benchmarking to see how much of the total time of the first pass is in the crypto.randomBytes() function vs. elsewhere to make sure it would be worth it.
The second pass is totally disk bound, but the actual disk throughput is horrible (like 60MB/s). Either my disk really sucks, nodejs isn't very good at this type of file I/O or there's just a lot of overhead in handling 3800 large files (read directory entry, seek to disk for first sector, read as many sequential sectors as you can, seek again, etc...). I could try it on my fastest SSD, but I don't really want to go writing 20GB to my SSD every time I play with this.
I played with increasing the UV_THREADPOOL_SIZE thinking that maybe nodejs was queuing too many reads/writes. But, performance actually got worse when I increased the thread pool size. I guess its default of 4 is more than enough to keep one disk controller plenty busy. Anything more than that and you're just asking the disk head to jump around between different files when it would be more efficient to read all of one file, then go to the next file and so on.
While the second pass is mostly disk bound, there's still about 30% of the time spent in non-disk related stuff (based on some high-res timers I inserted). So, if it didn't cause too much harm with disk contention, it's possible you could spread the processing of the different bucket files out among a group of WorkerThreads. You would at least get parallelism on the CPU part of that process. You would likely get more disk contention though so I'm not sure if it would help.
Lastly, bucket files could be split among drives and, ideally, even among separate SATA controllers. I have plenty of drives and a couple of SATA controllers to try that, but then it gets pretty specific to my system.
Here's the code for the bucket system.
// unique-test.js
const crypto = require('crypto');
const readline = require('readline');
const fs = require('fs');
const fsp = fs.promises;
const path = require('path');
const {fastReadFileLines} = require('./fast-read-file.js');
function delay(t, v) {
return new Promise(resolve => {
setTimeout(resolve, t, v);
})
}
function addCommas(str) {
var parts = (str + "").split("."),
main = parts[0],
len = main.length,
output = "",
i = len - 1;
while(i >= 0) {
output = main.charAt(i) + output;
if ((len - i) % 3 === 0 && i > 0) {
output = "," + output;
}
--i;
}
// put decimal part back
if (parts.length > 1) {
output += "." + parts[1];
}
return output;
}
// make a unique filename using first several letters of
// the string. Strings are case sensitive, bucket filenames
// cannot be so it has to be case neutralized while retaining
// uniqueness
function makeBucketKey(str) {
let piece = str.substr(0,2);
let filename = [];
// double up each character, but follow a lowercase letter with "_" instead
for (let ch of piece) {
filename.push(ch);
if (ch >= 'a' && ch <= 'z') {
filename.push("_")
} else {
filename.push(ch);
}
}
return filename.join("").toLowerCase();
}
// this value times the number of total buckets has to fit in memory
const bucketCacheMax = 3000;
class Bucket {
constructor(filename, writeToDisk = true) {
this.items = [];
this.filename = filename;
this.cnt = 0;
this.writeToDisk = writeToDisk;
// We dither the bucketCacheMax so that buckets aren't all trying to write at the same time
// After they write once (and are thus spread out in time), then they will reset to full cache size
let dither = Math.floor(Math.random() * bucketCacheMax) + 10;
if (Math.random() > 0.5) {
dither = -dither;
}
this.bucketCacheMax = bucketCacheMax + dither;
}
// add an item to cache, flush to disk if necessary
async add(item) {
++this.cnt;
this.items.push(item);
if (this.items.length > this.bucketCacheMax) {
// the dithered cache size is only used on the first write
// to spread out the writes. After that, we want a full cache size
let priorBucketCacheMax = this.bucketCacheMax;
this.bucketCacheMax = bucketCacheMax;
await this.flush();
}
}
// write any cached items to disk
async flush() {
if (this.writeToDisk && this.items.length) {
let data = this.items.join("\n") + "\n";
this.items.length = 0;
if (this.flushPending) {
throw new Error("Can't call flush() when flush is already in progress");
}
function flushNow() {
this.flushPending = true;
return fsp.appendFile(this.filename, data).finally(() => {
this.flushPending = false;
});
}
// we write to disk with retry because we once got EBUSY (perhaps from a backup program)
let retryCntr = 0;
const retryMax = 10;
const retryDelay = 200;
const retryBackoff = 200;
let lastErr;
function flushRetry() {
if (retryCntr > retryMax) {
throw lastErr;
}
return flushNow.call(this).catch(err => {
lastErr = err;
console.log("flushNow error, retrying...", err);
return delay(retryDelay + (retryCntr++ * retryBackoff)).then(() => {
return flushRetry.call(this);
});
});
}
return flushRetry.call(this);
}
this.items.length = 0;
}
delete() {
return fsp.unlink(this.filename);
}
get size() {
return this.cnt;
}
}
class BucketCollection {
constructor(dir, writeToDisk = true) {
// map key is bucketID, value is bucket object for that key
this.buckets = new Map();
this.dir = dir;
// remember the setting so add() doesn't depend on a global variable
this.writeToDisk = writeToDisk;
}
add(key, data) {
let bucket = this.buckets.get(key);
if (!bucket) {
let filename = path.join(this.dir, key);
bucket = new Bucket(filename, this.writeToDisk);
this.buckets.set(key, bucket);
}
return bucket.add(data);
}
async flush() {
// this could perhaps be sped up by doing 4 at a time instead of serially
for (let bucket of this.buckets.values()) {
await bucket.flush();
}
}
async delete() {
// delete all the files associated with the buckets
for (let bucket of this.buckets.values()) {
await bucket.delete();
}
}
get size() {
return this.buckets.size;
}
getMaxBucketSize() {
let max = 0;
for (let bucket of this.buckets.values()) {
max = Math.max(max, bucket.size);
}
return max;
}
}
// program options
let numToTry = 100_000;
let writeToDisk = true;
let cleanupBucketFiles = true;
let skipAnalyze = false;
let analyzeOnly = false;
// -nodisk don't write to disk
// -nocleanup don't erase bucket files when done
// -skipanalyze just generate bucket files, don't analyze them
// -analyzeonly analyze files in bucket directory only
if (process.argv.length > 2) {
let args = process.argv.slice(2);
for (let arg of args) {
arg = arg.toLowerCase();
switch(arg) {
case "-nodisk":
writeToDisk = false;
break;
case "-nocleanup":
cleanupBucketFiles = false;
break;
case "-skipanalyze":
skipAnalyze = true;
break;
case "-analyzeonly":
analyzeOnly = true;
break;
default:
if (/[^\d,]/.test(arg)) {
console.log(`Unknown argument ${arg}`);
process.exit(1);
} else {
numToTry = parseInt(arg.replace(/,/g, ""), 10);
}
}
}
}
let bucketDir = path.join(__dirname, "buckets");
let collection = new BucketCollection(bucketDir, writeToDisk);
console.log(`Running ${addCommas(numToTry)} random ids`);
const debugMultiple = 100_000;
async function analyze() {
let cntr = 0;
const cntrProgress = 10;
const cntrProgressN = 10n;
let buffer = null;
let times = [];
async function processFile(file) {
if (cntr !== 0 && cntr % cntrProgress === 0) {
let sum = 0n;
for (let i = 0; i < cntrProgress; i++) {
sum += times[i];
}
console.log(`Checking bucket #${cntr}, Average readFileTime = ${sum / cntrProgressN}`);
times.length = 0;
}
++cntr;
let set = new Set();
let startT = process.hrtime.bigint();
// use the outer buffer here; redeclaring it locally would shadow it and defeat the reuse
let result = await fastReadFileLines(file, buffer);
let data = result.lines;
// keep reusing buffer which may have been made larger since last time
buffer = result.buffer;
//let data = (await fsp.readFile(file, "utf8")).split("\n");
let afterReadFileT = process.hrtime.bigint();
for (const lineData of data) {
let line = lineData.trim();
if (line) {
if (set.has(line)) {
console.log(`Found conflict on ${line}`);
} else {
set.add(line);
}
}
}
let loopT = process.hrtime.bigint();
let divisor = 1000n;
let readFileTime = (afterReadFileT - startT) / divisor;
times.push(readFileTime);
// console.log(`readFileTime = ${readFileTime}, loopTime = ${(loopT - afterReadFileT) / divisor}`);
/*
let rl = readline.createInterface({input:fs.createReadStream(file), crlfDelay: Infinity});
for await (const line of rl) {
let data = line.trim();
if (data) {
if (set.has(data)) {
console.log(`Found conflict on ${data}`);
} else {
set.add(data);
}
}
}
*/
}
if (analyzeOnly) {
let files = await fsp.readdir(bucketDir);
for (let file of files) {
let fullPath = path.join(bucketDir, file)
await processFile(fullPath);
}
} else {
for (let bucket of collection.buckets.values()) {
await processFile(bucket.filename);
}
}
}
async function makeRandoms() {
let start = Date.now();
if (analyzeOnly) {
return analyze();
}
for (let i = 0; i < numToTry; i++) {
if (i !== 0 && i % debugMultiple === 0) {
console.log(`Attempt #${addCommas(i)}`);
}
const idSeed = crypto.randomBytes(16).toString('base64') + '' + Date.now();
// idSeed is already a string, so toString('base64') would not re-encode it; just strip +, / and =
const orderId = idSeed.replace(/[\/\+\=]/g, '');
//console.log(orderId);
let bucketKey = makeBucketKey(orderId);
await collection.add(bucketKey, orderId);
}
console.log(`Total buckets: ${collection.size}, Max bucket size: ${collection.getMaxBucketSize()}`);
//console.log(`No dups found after ${addCommas(numToTry)} attempts`);
await collection.flush();
let delta = Date.now() - start;
console.log(`Run time for creating buckets: ${addCommas(delta)}ms, ${addCommas((delta / numToTry) * 1000)}ms per thousand`);
if (!skipAnalyze) {
console.log("Analyzing buckets...")
await analyze();
}
if (cleanupBucketFiles) {
console.log("Cleaning up buckets...")
await collection.delete();
}
}
makeRandoms();
And, here's a dependent file (goes in the same directory) for my faster readfile function:
// fast-read-file.js
const fsp = require('fs').promises;
async function fastReadFile(filename, buffer = null) {
let handle = await fsp.open(filename, "r");
let bytesRead;
try {
let stats = await handle.stat();
if (!buffer || buffer.length < stats.size) {
buffer = Buffer.allocUnsafe(stats.size);
}
// clear any extra part of the buffer so there's no data leakage
// from a previous file via the shared buffer
if (buffer.length > stats.size) {
buffer.fill(0, stats.size);
}
let ret = await handle.read(buffer, 0, stats.size, 0);
bytesRead = ret.bytesRead;
if (bytesRead !== stats.size) {
// no data leaking out
buffer.fill(0);
throw new Error("bytesRead not full file size")
}
} finally {
handle.close().catch(err => {
console.log(err);
});
}
return {buffer, bytesRead};
}
async function fastReadFileLines(filename, buf = null) {
const {bytesRead, buffer} = await fastReadFile(filename, buf);
let index = 0, targetIndex;
let lines = [];
while (index < bytesRead && (targetIndex = buffer.indexOf(10, index)) !== -1) {
// the buffer may be larger than the actual file data
// so we have to limit our extraction of data to only what was in the actual file
let nextIndex = targetIndex + 1;
// look for CR before LF
if (buffer[targetIndex - 1] === 13) {
--targetIndex;
}
lines.push(buffer.toString('utf8', index, targetIndex));
index = nextIndex;
}
// check for data at end of file that doesn't end in LF
if (index < bytesRead) {
lines.push(buffer.toString('utf8', index, bytesRead));
}
return {buffer, lines};
}
module.exports = {fastReadFile, fastReadFileLines};
// if called directly from command line, run this test function
// A file of ids named "zzzz" must exist in this directory
if (require.main === module) {
let buffer = Buffer.alloc(1024 * 1024 * 10, "abc\n", "utf8");
fastReadFileLines("zzzz", buffer).then(result => {
let lines = result.lines;
console.log(lines[0]);
console.log(lines[1]);
console.log(lines[2]);
console.log("...");
console.log(lines[lines.length - 3]);
console.log(lines[lines.length - 2]);
console.log(lines[lines.length - 1]);
}).catch(err => {
console.log(err);
});
}
You first create a sub-directory named "buckets" under where you are running this. Then, you run this from the command line:
node unique-test.js 1,000,000,000
There are some supported command line options (mostly used during debugging):
-nodisk Don't write to disk
-nocleanup Don't cleanup generated disk files when done
-skipAnalyze Just generate bucket files, don't analyze them
-analyzeOnly Use previously generated bucket files and analyze them
The number you pass on the command line is how many ids to generate. If you pass nothing, it defaults to 100,000. For readability, it handles commas.
That's a really superb answer by @jfriend00. I'd just like to add that you can also calculate the result analytically, or rather an approximation of it. I believe using both approaches is the best route to go.
This is an example of the Birthday Problem.
The TLDR on this is that the approximate probability of collision can be determined using the formula:
1 − exp(−n²/(2x))
Where x is the number of possible values and n is the number of generated values, as long as n is small compared to x (It will be!)
Now, you have approximately 16 bytes of entropy in the generated ids, which gives 2^128 or 3.4 x 10^38 possible ids. Since two characters are being dropped (+ and /), the number of possible values is more like 62^21 = 4.37 x 10^37.
As @jfriend00 has pointed out, the addition of the date means you'd have to generate the number of ids in the table below every millisecond to have the corresponding probability of collision.
This table should give an approximation of the collision probabilities.
|----------------------------|----------------------------|
| Number of Ids              | Collision Probability      |
|----------------------------|----------------------------|
| 10^6 (1 million)           | 1.14 × 10^-26              |
|----------------------------|----------------------------|
| 10^9 (1 billion)           | 1.14 × 10^-20              |
|----------------------------|----------------------------|
| 10^12 (1 trillion)         | 1.14 × 10^-14              |
|----------------------------|----------------------------|
| 10^15 (1 quadrillion)      | 1.14 × 10^-8               |
|----------------------------|----------------------------|
I've used the very handy Wolfram Alpha to calculate these results.
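If you'd rather evaluate the formula yourself, a minimal sketch in Node.js could look like this. The function name collisionProbability is a hypothetical helper, and x = 62^21 is the estimate from above. For probabilities this small the result is effectively n²/(2x).

```javascript
// Birthday-problem approximation: p ≈ 1 - exp(-n^2 / (2x)).
// Math.expm1 is used because 1 - Math.exp(-tiny) would round to 0 in doubles.
function collisionProbability(n, x) {
  return -Math.expm1(-(n * n) / (2 * x));
}

const possibleIds = Math.pow(62, 21); // about 4.37e37, the estimate used above

for (const n of [1e6, 1e9, 1e12, 1e15]) {
  const p = collisionProbability(n, possibleIds);
  console.log(`n = ${n.toExponential(0)}: p ≈ ${p.toExponential(2)}`);
}
```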
I am getting the error "Padding is invalid and cannot be removed."
when trying to decrypt file contents in chunks (using a buffer). I am able to decrypt the whole file at once, but not in blocks. I found many links regarding this problem, and most of them suggested setting the Padding of the AesManaged object,
like aesManaged.Padding = PaddingMode.None
But this property is not available in a Windows Phone application.
Below is the method:
internal static byte[] DecryptBytes(byte[] cipherText, string password)
{
// Check arguments.
if (cipherText == null || cipherText.Length <= 0)
throw new ArgumentNullException("cipherText");
byte[] decryptedBytes= new byte[cipherText.Length];
using (var rijAlg = new AesManaged { KeySize = 256, BlockSize = 128 })
{
var key = new Rfc2898DeriveBytes(password, Encoding.UTF8.GetBytes(Salt));
rijAlg.Key = key.GetBytes(rijAlg.KeySize / 8);
rijAlg.IV = key.GetBytes(rijAlg.BlockSize / 8);
// Create a decryptor to perform the stream transform.
ICryptoTransform decryptor = rijAlg.CreateDecryptor(rijAlg.Key, rijAlg.IV);
// Create the streams used for decryption.
using (var msDecrypt = new MemoryStream())
{
using (var csDecrypt = new CryptoStream(msDecrypt, decryptor, CryptoStreamMode.Write))
{
csDecrypt.Write(cipherText, 0, cipherText.Length);
csDecrypt.FlushFinalBlock();
}
decryptedBytes = msDecrypt.ToArray();
}
}
return decryptedBytes;
}
Please point out the issue in the above code or suggest any other workaround.
Don't use aesManaged.Padding = PaddingMode.None; it will only hide the error, not solve it. You would get this error for incorrectly derived keys, incorrect ciphertext or, for smaller ciphertexts, an incorrect IV.
Print out the values of all the inputs in hexadecimal right before you perform the decryption, then compare them with the ones used for encryption.