EthereumJ EthereumFactory.createEthereum() taking long hours - ethereum

We are very new to Ethereum. We are trying to create a sample application as a POC using ethereumJ.
Ethereum ethereum = EthereumFactory.createEthereum();
The above line executes but keeps running for a very long time; it has been almost 6 hours now.
Is this normal? Are we missing something? How can we minimise it? (We are OK with using a small test network.)
Any help will be greatly appreciated.

Any blockchain-related processing is inherently compute and resource hungry. In particular, createEthereum() starts peer discovery and, with the default configuration, a full sync of the main Ethereum chain, so a first run taking many hours on a laptop is not unusual; pointing the node at a test network instead is done through ethereumJ's ethereumj.conf overrides.
To start with, it would help if we understood your operating environment and specifications,
e.g. hardware (CPU, RAM, etc.), software (OS, runtime, etc.) and versions (OS, runtime, ethereumJ, etc.).

Model Name: MacBook Pro
Model Identifier: MacBookPro10,2
Processor Name: Intel Core i5
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB

Related

Too much time to train one epoch

I use an RTX 3060 12GB GPU-enabled workstation with 16GB of DDR4 RAM and an Intel Core i5 10400F CPU. I also mounted an external HDD for storage and ran the script p2ch11.prepcache from the repository referenced below in order to cache… I used from zero to 8 workers and various batch sizes ranging from 32 to 1024. Still, it takes approximately 13.5 hours to train for one epoch (with batch size 1024 and 4 workers). I still haven’t figured out what’s wrong… It looks like I cannot utilize the GPU for some reason…
Code pulled from the repository: https://github.com/deep-learning-with-pytorch/dlwpt-code
-> p2ch11.training.py (https://github.com/deep-learning-with-pytorch/dlwpt-code/blob/master/p2ch11/training.py)
The images in this dataset are large, so you need to do some preprocessing first. I think this will help.
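Since the poster suspects the GPU is not being utilized, it is also worth confirming that training actually runs on it. A minimal sanity check (a sketch, not taken from the book's code; the commented lines are placeholders for wherever the model and batches are built in p2ch11):

import torch

# Confirm that PyTorch can see the GPU at all.
print(torch.cuda.is_available())        # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

# Training only uses the GPU if the model and every batch are moved there;
# otherwise it silently falls back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = model.to(device)                                 # wherever the model is built
# inputs, labels = inputs.to(device), labels.to(device)    # inside the training loop

If is_available() prints False, the long epochs come from the CPU doing all the work rather than from the data loading.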

How many Uvicorn workers do I need in production?

My Environment
FastAPI
Gunicorn & Uvicorn Worker
AWS EC2 c5.2xlarge (8 vCPU)
Document
https://fastapi.tiangolo.com/deployment/server-workers/
Question
Currently I'm using 24 Uvicorn workers on the production server (c5.2xlarge):
gunicorn main:app --workers 24 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:80
I've learned that one process runs on one core.
Therefore, if I have 8 processes, I can make use of all the cores (a c5.2xlarge has 8 vCPUs).
I'm curious whether, in this situation, there is any performance benefit to running more than 8 processes.
The recommended number of workers is 2 x number_of_cores + 1.
You can read more about it at
https://docs.gunicorn.org/en/stable/design.html#:~:text=Gunicorn%20should%20only%20need%204,workers%20to%20start%20off%20with.
In your case, with 8 CPU cores, that works out to 17 workers.
Additional thoughts on async systems:
The "two times the cores" figure is not scientific, as the documentation itself says. The idea is that one worker can be doing I/O while another is doing CPU processing at the same time, which makes maximum use of the available threads. Even with async systems this conceptually holds and should give you maximum efficiency.
In general the best practice is:
number_of_workers = number_of_cores x 2 + 1
or more precisely:
number_of_workers = number_of_cores x num_of_threads_per_core + 1
The reason for it is CPU hyperthreading, which allows each core to run multiple concurrent threads. The number of concurrent threads is decided by the chip designers.
Two concurrent threads per CPU core are common, but some processors can support more than two.
The vCPU count quoted for an AWS EC2 instance is already the hyperthreaded number of processing units on the machine (num_of_cores x num_of_threads_per_core); it is not to be confused with the number_of_cores available on that machine.
So, in your case, a c5.2xlarge has 8 vCPUs, meaning you have 8 available concurrent workers.
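To make the arithmetic concrete, a small sketch (assuming Python's multiprocessing.cpu_count(), which on EC2 reports vCPUs, i.e. hyperthreads, not physical cores):

import multiprocessing

# On a c5.2xlarge this reports 8 (vCPUs = cores x threads per core).
vcpus = multiprocessing.cpu_count()

# Gunicorn's rule of thumb, treating each reported CPU as a core:
print("2 x cpus + 1 =", 2 * vcpus + 1)   # 17 on a c5.2xlarge

# Reading the formula as cores x threads_per_core + 1, where the vCPU
# count already includes hyperthreading:
print("vcpus + 1 =", vcpus + 1)          # 9 on a c5.2xlarge

By either reading, 24 workers is well above what the formula recommends for 8 vCPUs.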

Why is training on the GPU slower than on the CPU?

I have:
GPU : GeForce RTX 2070 8GB.
CPU : AMD Ryzen 7 1700 Eight-Core Processor.
RAM : 32GB.
Driver Version: 418.43.
CUDA Version: 10.1.
On my own project the GPU is also slower than the CPU, but here I will use the documentation example.
from catboost import CatBoostClassifier
import time

start_time = time.time()
train_data = [[0, 3],
              [4, 1],
              [8, 1],
              [9, 1]]
train_labels = [0, 0, 1, 1]
# task_type is either "CPU" or "GPU"; the two timings below come from
# running the same script once with each setting.
model = CatBoostClassifier(iterations=1000, task_type="CPU")
model.fit(train_data, train_labels, verbose=False)
print(time.time() - start_time)
Training time on GPU: 4.838074445724487
Training time on CPU: 0.43390488624572754
Why is the training time on the GPU more than on the CPU?
Be careful: I have no experience with CatBoost, so the following is from a CUDA point of view.
Data transfer: launching a kernel (a function called by the host, i.e. the CPU, and executed by the device, i.e. the GPU) requires data to be transferred from host to device, and the transfer time grows with the data size. By default, host memory is non-pinned (a plain allocation rather than cudaMallocHost()), which makes transfers slower. See https://www.cs.virginia.edu/~mwb7w/cuda_support/pinned_tradeoff.html to find out more.
Kernel launch overhead: each time the host calls a kernel, it enqueues the kernel onto the device's work queue, i.e. for each iteration the host instantiates a kernel and adds it to the queue. Before the introduction of CUDA Graphs (whose documentation also points out that kernel launch overhead can be significant when a kernel has a short execution time), this per-launch overhead could not be avoided. I don't know how CatBoost handles iterations, but given the difference between the execution times, it does not seem to have eliminated the launch overhead (IMHO).
CatBoost uses some different techniques with small datasets (rows < 50k or columns < 10) that reduce overfitting but take more time. Try training with a gigantic dataset, for instance the Epsilon dataset. See this GitHub issue: https://github.com/catboost/catboost/issues/505
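To see the effect described above, one could repeat the timing with a dataset large enough to amortize the transfer and launch overhead. A rough sketch (the 100,000 x 50 synthetic dataset is arbitrary, not an official benchmark, and the "GPU" run assumes a CUDA-capable device is present):

import time
import numpy as np
from catboost import CatBoostClassifier

# A synthetic dataset big enough that the fixed per-launch GPU overhead
# no longer dominates the total training time.
X = np.random.rand(100_000, 50)
y = np.random.randint(0, 2, size=100_000)

for task_type in ("CPU", "GPU"):
    model = CatBoostClassifier(iterations=200, task_type=task_type)
    start = time.time()
    model.fit(X, y, verbose=False)
    print(task_type, time.time() - start)

On the 4-row toy dataset, those fixed per-iteration costs dwarf the actual training work, which is why the GPU run looks roughly ten times slower there.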

How to make sense of OpenShift pods CPU usage metrics

Can someone please help explain how to understand the CPU usage metrics reported in the OpenShift web console?
Below is an example for my application. The cursor on the graph points to 0.008 cores, and the value differs at different times. What does 0.008 cores mean? How should this value be understood if my project on OpenShift doesn't have resource limits or quotas set? Thanks!
Compute Resources
Each container running on a node uses compute resources like CPU and memory. You can specify how much CPU and memory a container needs to improve scheduling and performance.
CPU
CPU is often measured in units called millicores. Each millicore is equivalent to 1/1000 of a CPU core.
1000 millicores = 1 core
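Applied to the example above: 0.008 cores x 1000 = 8 millicores, i.e. the pod was using roughly 0.8% of one CPU core at that moment. With no limits or quotas set on the project, the graph is simply showing measured usage, not a share of some allocation.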

Running text classification - CNN on GPU

Based on this GitHub repository, https://github.com/dennybritz/cnn-text-classification-tf , I want to classify my datasets on Ubuntu 16.04 using the GPU.
To run on the GPU, I changed line 23 of text_cnn.py to: with tf.device('/gpu:0'), tf.name_scope("embedding"):
My first dataset for the training phase has 9000 documents and is about 120 MB in size,
and the second one has 1300 documents and is about 1 MB.
After running on my Titan X server with the GPU, I got errors.
Please guide me: how can I solve this issue?
Thanks.
You are getting an out-of-memory error, so the first thing to try is a smaller batch size
(the default is 64). I would start with:
./train.py --batch_size 32
Most of the memory is used to hold the embedding parameters and convolution parameters. I would suggest reducing:
EMBEDDING_DIM
NUM_FILTERS
BATCH_SIZE
Try embedding_dim=16, batch_size=16 and num_filters=32; if that works, increase them 2x at a time, as in the example command below.
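For example, assuming the repository's train.py still exposes these hyperparameters as command-line flags (check the flag names defined at the top of the script):
./train.py --embedding_dim 16 --num_filters 32 --batch_size 16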
Also, if you are using a Docker virtual machine to run TensorFlow, you might be limited to only 1 GB of memory by default even though your machine has 16 GB; see here for more details.