CUDA cuDNN: cudnnGetConvolutionForwardWorkspaceSize fails with bad parameter

I am currently trying to implement a very basic 2D convolution using CUDA cuDNN between an "image" of size 3x3 and a kernel of size 2x2, resulting in a 2x2 output.
This is my code:
// Create a cuDNN handle:
cudnnHandle_t handle;
cudnnCreate(&handle);
// Create your tensor descriptors:
cudnnTensorDescriptor_t cudnnIdesc;
cudnnFilterDescriptor_t cudnnFdesc;
cudnnTensorDescriptor_t cudnnOdesc;
cudnnConvolutionDescriptor_t cudnnConvDesc;
cudnnCreateTensorDescriptor( &cudnnIdesc );
cudnnCreateFilterDescriptor( &cudnnFdesc );
cudnnCreateTensorDescriptor( &cudnnOdesc );
cudnnCreateConvolutionDescriptor( &cudnnConvDesc );
// Set tensor dimensions as multiples of eight (only the input tensor is shown here):
// W, H, D, C, N
const int dimI[] = { I_M, I_N, 1, 1 };
// Wstride, Hstride, Dstride, Cstride, Nstride
const int strideI[] = { 1, 1, 1, 1 };
checkCUDAError( "SetImgDescriptor failed", cudnnSetTensorNdDescriptor(cudnnIdesc, CUDNN_DATA_HALF, 4, dimI, strideI) );
const int dimF[] = { K_M, K_N, 1, 1 };
checkCUDAError( "SetFilterDescriptor failed", cudnnSetFilterNdDescriptor(cudnnFdesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4, dimF) );
const int dimO[] = { I_M - K_M + 1, I_N - K_N + 1, 1, 1 };
const int strideO[] = { 1, 1, 1, 1 };
checkCUDAError( "SetOutDescriptor failed", cudnnSetTensorNdDescriptor(cudnnOdesc, CUDNN_DATA_HALF, 4, dimO, strideO) );
checkCUDAError( "SetConvDescriptor failed", cudnnSetConvolution2dDescriptor(cudnnConvDesc, 0, 0, 1, 1, 1, 1, CUDNN_CONVOLUTION, CUDNN_DATA_HALF) );
// Set the math type to allow cuDNN to use Tensor Cores:
checkCUDAError( "SetConvMathType failed", cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
// Choose a supported algorithm:
int algoCount = 0;
cudnnConvolutionFwdAlgoPerf_t algoPerf;
checkCUDAError( "GetConvForwardAlgo failed", cudnnFindConvolutionForwardAlgorithm(handle, cudnnIdesc, cudnnFdesc, cudnnConvDesc, cudnnOdesc, 1, &algoCount, &algoPerf) );
// Allocate your workspace:
void *workSpace;
size_t workSpaceSize = 0;
checkCUDAError( "WorkspaceSize failed", cudnnGetConvolutionForwardWorkspaceSize(handle, cudnnIdesc, cudnnFdesc, cudnnConvDesc, cudnnOdesc, algoPerf.algo, &workSpaceSize) );
if (workSpaceSize > 0) {
    cudaMalloc(&workSpace, workSpaceSize);
}
However, cudnnGetConvolutionForwardWorkspaceSize fails with CUDNN_STATUS_BAD_PARAM.
According to https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetConvolutionForwardWorkspaceSize
this can only be for one of the following reasons:
CUDNN_STATUS_BAD_PARAM:
At least one of the following conditions are met:
(1) One of the parameters handle, xDesc, wDesc, convDesc, yDesc is NULL.
(2) The tensor yDesc or wDesc are not of the same dimension as xDesc.
(3) The tensor xDesc, yDesc or wDesc are not of the same data type.
(4) The numbers of feature maps of the tensor xDesc and wDesc differ.
(5) The tensor xDesc has a dimension smaller than 3.
I don't see how any of them apply.
(1) is obviously not the case. Because yDesc, wDesc and xDesc all have 4 dimensions, (2) is not the case either.
Every tensor has the data type CUDNN_DATA_HALF, which is why (3) is also not true.
I don't know exactly what (4) refers to, but I think the number of feature maps for both image and kernel is 1 in my case.
And (5) is also not true.
Any idea why the function fails nevertheless?

I solved the error by switching to the 4d descriptor setters, which take the dimensions explicitly as N, C, H, W. As far as I can tell, cudnnSetTensorNdDescriptor also expects the dimensions in N, C, H, W order, with strides describing the actual memory layout, whereas I had passed them as W, H, D, C with unit strides:
checkCUDAError( "SetImgDescriptor failed", cudnnSetTensor4dDescriptor(cudnnIdesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF, 1, 1, I_M, I_N) );
checkCUDAError( "SetFilterDescriptor failed", cudnnSetFilter4dDescriptor(cudnnFdesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 1, 1, K_M, K_N) );
checkCUDAError( "SetOutDescriptor failed", cudnnSetTensor4dDescriptor(cudnnOdesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF, 1, 1, I_M - K_M + 1, I_N - K_N + 1) );

Related

Thrust: selectively copy based on another vector

I'd like to use a set of Thrust operations to selectively copy the elements of one vector A into a new vector C based on a predicate on the elements of a stencil vector B.
Here's an example case: I want to copy an element (in order) from A to C when the corresponding element in B is 1, and skip it when it is 0. So |C| < |A| whenever B contains 0s; we can pre-determine the size of C by a reduction on B. E.g.:
A = [2, 3, 6, 0, 11]
B = [1, 0, 1, 1, 0]
C = [2, 6, 0]
Any help is greatly appreciated
This algorithm is known as stream compaction. It is implemented in thrust::copy_if.
The following example is taken from the Thrust documentation.
#include <thrust/copy.h>
...
struct is_even
{
    __host__ __device__
    bool operator()(const int x)
    {
        return (x % 2) == 0;
    }
};
...
const int N = 6;
int data[N] = { 0, 1, 2, 3, 4, 5};
int stencil[N] = {-2, 0, -1, 0, 1, 2};
int result[4];
thrust::copy_if(data, data + N, stencil, result, is_even());
// data remains = { 0, 1, 2, 3, 4, 5};
// stencil remains = {-2, 0, -1, 0, 1, 2};
// result is now { 0, 1, 3, 5}
Although Abator has already given the right function to use, let me add a complete example.
//~~~START:Wed, 06-Oct-2021, 21:41:22 IST
//~~~Author:Rajesh Pandian M | mrprajesh.co.in
//~~CompiledWith: nvcc a.cu -std=c++14 --expt-extended-lambda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/count.h>
#include <thrust/for_each.h>
#include <stdio.h>

int main(void) {
    const int N = 5;
    int A[] = {2, 3, 6, 0, 11}; // Data
    int B[] = {1, 0, 1, 1, 0 }; // Stencil
    thrust::device_vector<int> dA(A, A + N);
    thrust::device_vector<int> dB(B, B + N);
    // Allocate memory based on the number of 1s in the stencil
    thrust::device_vector<int> dC(thrust::count(B, B + N, 1));
    // Condition on the stencil elements: if a 1 is seen, copy; else do not!
    thrust::copy_if( dA.begin()
                   , dA.end()
                   , dB.begin()
                   , dC.begin()
                   , [] __host__ __device__ (const int& x){
                         return 1 == x;
                     });
    // Print the result
    thrust::for_each(dC.begin(), dC.end(),
        [] __host__ __device__ (const int& x){ printf("%d ", x); });
    return 0;
}

function IBuilder::buildEngineWithConfig() returns null

I am using TensorRT to build a small model as below:
#include "NvInfer.h"
#include "cuda_runtime_api.h"
#include <fstream>
#include <map>
#include <chrono>
#include <iostream>
#include "include/Utils.h"
#include <memory>
#include <vector>
#include <cassert>
#include "src/InferDeleter.cpp"
using namespace std;
using namespace nvinfer1;
class MyLogger : public ILogger {
    void log(Severity severity, const char *msg) override {
        if (severity != Severity::kINFO) {
            cout << msg << endl;
        }
    }
} gLogger;
int main() {
//load weights
map<string, Weights> mWeightMap = Utils::getInstance().loadWeights("Weights/mnistapi.wts");
//a few configuration parameters
const char *INPUT_BLOB_NAME = "input";
const char *OUTPUT_BLOB_NAME = "output";
DataType dataType = nvinfer1::DataType::kFLOAT;
int INPUT_H = 28, INPUT_W = 28;
int batchSize = 1;
//define the network
IBuilder *builder = createInferBuilder(gLogger);
INetworkDefinition *network = builder->createNetworkV2(0U);
// Create input tensor of shape { 1, 1, 28, 28 }
ITensor *data = network->addInput(
INPUT_BLOB_NAME, DataType::kFLOAT, Dims3{1, INPUT_H, INPUT_W});
// Create scale layer with default power/shift and specified scale parameter.
const float scaleParam = 0.0125f;
const Weights power{DataType::kFLOAT, nullptr, 0};
const Weights shift{DataType::kFLOAT, nullptr, 0};
const Weights scale{DataType::kFLOAT, &scaleParam, 1};
IScaleLayer *scale_1 = network->addScale(*data, ScaleMode::kUNIFORM, shift, scale, power);
// Add convolution layer with 20 outputs and a 5x5 filter.
IConvolutionLayer *conv1 = network->addConvolutionNd(
*scale_1->getOutput(0), 20, Dims{2, {5, 5}, {}}, mWeightMap["conv1filter"], mWeightMap["conv1bias"]);
conv1->setStride(DimsHW{1, 1});
// Add max pooling layer with stride of 2x2 and kernel size of 2x2.
IPoolingLayer *pool1 = network->addPoolingNd(*conv1->getOutput(0), PoolingType::kMAX, Dims{2, {2, 2}, {}});
pool1->setStride(DimsHW{2, 2});
// Add second convolution layer with 50 outputs and a 5x5 filter.
IConvolutionLayer *conv2 = network->addConvolutionNd(
*pool1->getOutput(0), 50, Dims{2, {5, 5}, {}}, mWeightMap["conv2filter"], mWeightMap["conv2bias"]);
conv2->setStride(DimsHW{1, 1});
// Add second max pooling layer with stride of 2x2 and kernel size of 2x2.
IPoolingLayer *pool2 = network->addPoolingNd(*conv2->getOutput(0), PoolingType::kMAX, Dims{2, {2, 2}, {}});
pool2->setStride(DimsHW{2, 2});
// Add fully connected layer with 500 outputs.
IFullyConnectedLayer *ip1
= network->addFullyConnected(*pool2->getOutput(0), 500, mWeightMap["ip1filter"], mWeightMap["ip1bias"]);
// Add activation layer using the ReLU algorithm.
IActivationLayer *relu1 = network->addActivation(*ip1->getOutput(0), ActivationType::kRELU);
// Add second fully connected layer with 10 outputs.
IFullyConnectedLayer *ip2 = network->addFullyConnected(
*relu1->getOutput(0), 10, mWeightMap["ip2filter"], mWeightMap["ip2bias"]);
// Add softmax layer to determine the probability.
ISoftMaxLayer *prob = network->addSoftMax(*ip2->getOutput(0));
prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*prob->getOutput(0));
//build engine
IBuilderConfig *builderConfig = builder->createBuilderConfig();
builder->setMaxBatchSize(batchSize);
builderConfig->setMaxWorkspaceSize(1<<24);
//engine null
ICudaEngine *engine = builder->buildEngineWithConfig(*network, *builderConfig);
//later uses of engine.
return 0;
}
However, the function builder->buildEngineWithConfig(*network, *builderConfig) returns nullptr. I tried changing the max workspace size to other values, but it still does not work. I also visited this post, but nothing helped. Can anyone point out the cause of the problem? Thank you!
After a few days of mulling over this problem, I found that if the layers in the model do not match the weights passed in, no error appears, but you cannot create a TensorRT engine to do the later tasks. Therefore, the best thing to do in this situation is to carefully check the network, layer by layer, against the .wts file.

get a key-value pair from an object to a new object jq

I get this object
{
"138.68.226.120:26969": 1,
"178.128.50.37:26969": 1,
"207.180.218.133:26969": 1,
"66.42.67.157:26969": 1,
"140.82.14.193:26969": 1,
"51.15.39.62:26969": 1,
"144.217.91.232:26969": 1,
"144.217.81.95:26969": 1,
"68.183.105.143:26969": 1,
"192.99.246.177:26969": 1,
"167.99.98.151:26969": 1,
"59.79.71.205:26969": 1
}
When I use jq '."59.79.71.205:26969"' it gives me only the value. Is there a way to get the key-value pair from the object into a new object, like this:
{
"59.79.71.205:26969": 1
}
The answer is in the Object Construction section of the manual.
jq '{"59.79.71.205:26969"}'
const splittedObject = Object.keys( // get all the keys
    yourObject
).map((key) => { // then for each key, turn the key into an object with the key-value pair
    return {
        [key]: yourObject[key] // assign the value to the key and voila
    };
});
Now splittedObject is an array of those single-key objects; it's easier to demonstrate with this snippet:
const yourObject = { "138.68.226.120:26969": 1, "178.128.50.37:26969": 1, "207.180.218.133:26969": 1, "66.42.67.157:26969": 1, "140.82.14.193:26969": 1, "51.15.39.62:26969": 1, "144.217.91.232:26969": 1, "144.217.81.95:26969": 1, "68.183.105.143:26969": 1, "192.99.246.177:26969": 1, "167.99.98.151:26969": 1, "59.79.71.205:26969": 1 };
const splittedObject = Object.keys( // get all the keys
    yourObject
).map((key) => { // then for each key, turn the key into an object with the key-value pair
    return {
        [key]: yourObject[key] // assign the value to the key and voila
    };
});
console.log(splittedObject);
By the way, can I ask why you need to do this?
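For completeness, when only one pair is wanted (as in the question), plain JavaScript can build the single-key object directly with a computed property name. A minimal sketch; the pickKey helper name is mine, not part of any library:

```javascript
// Pick a single key-value pair out of an object into a new object.
// pickKey is a hypothetical helper name, not part of any library.
function pickKey(obj, key) {
    // A computed property name keeps the original key in the new object
    return { [key]: obj[key] };
}

const yourObject = { "138.68.226.120:26969": 1, "59.79.71.205:26969": 1 };
console.log(pickKey(yourObject, "59.79.71.205:26969")); // { '59.79.71.205:26969': 1 }
```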

Signing and Verifying Messages in Ethereum

I am trying to follow this tutorial here:
But in the tutorial he doesn't specify how to implement the contract, so I tried to do it using Truffle and ganache-cli. In a Truffle test I have tried the following code:
const amount = web3.toWei(5, 'ether');
const Contract = await GmsPay.new({from : Sender, value : web3.toWei(10, 'ether')});
const hash = Web3Beta.utils.soliditySha3(
{t : 'address', v : Recipient},
{t : 'uint256', v : amount},
{t : 'uint256', v : 1},
{t : 'address', v : Contract.address}
);
const sig = await Web3Beta.eth.sign(hash, Sender);
const res = await Contract.claimPayment(amount, 1, sig, {from : Recipient});
But I just keep getting "Error: VM Exception while processing transaction: revert". Using the debugger, I see that my code executes down to:
require(recoverSigner(message, sig) == owner);
Even if I take that line out, the last line still doesn't work. What am I doing wrong? Any help would be greatly appreciated.
I ran into similar challenges in my Truffle tests with the recoverSigner(message, sig) == owner check. After comparing the R, S, V values produced by the Solidity recoverSigner() function with the same values generated on the test side using ethereumjs-util's fromRpcSig() function, I realized that recoverSigner was returning V as 0, whereas fromRpcSig had this value at 27. This line provided the working answer.
Final splitSignature() function included below if you run into a similar issue.
function splitSignature(bytes memory sig)
    internal
    pure
    returns (uint8, bytes32, bytes32)
{
    require(sig.length == 65);
    bytes32 r;
    bytes32 s;
    uint8 v;
    assembly {
        // first 32 bytes, after the length prefix
        r := mload(add(sig, 32))
        // second 32 bytes
        s := mload(add(sig, 64))
        // final byte (first byte of the next 32 bytes)
        v := byte(0, mload(add(sig, 96)))
    }
    // support both versions of `eth_sign` responses
    if (v < 27)
        v += 27;
    return (v, r, s);
}
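For anyone debugging this from the JavaScript test side, the same split and V normalization can be sketched without any library. This is a minimal illustration assuming a 65-byte signature hex string as returned by eth_sign; the function name mirrors the Solidity one but is otherwise my own:

```javascript
// Split a 65-byte signature hex string into r, s, v, normalizing v the
// same way the Solidity splitSignature() above does.
function splitSignatureJs(sigHex) {
    const sig = sigHex.startsWith("0x") ? sigHex.slice(2) : sigHex;
    if (sig.length !== 130) throw new Error("expected a 65-byte signature");
    const r = "0x" + sig.slice(0, 64);         // first 32 bytes
    const s = "0x" + sig.slice(64, 128);       // second 32 bytes
    let v = parseInt(sig.slice(128, 130), 16); // final byte
    if (v < 27) v += 27; // support both 0/1 and 27/28 style eth_sign responses
    return { r, s, v };
}

// A synthetic signature with v = 0, which normalizes to 27:
const sig = "0x" + "11".repeat(32) + "22".repeat(32) + "00";
console.log(splitSignatureJs(sig).v); // 27
```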

Json object with random key value

I am developing an application where I drag and drop images onto a 144-square grid; after dropping, I create a JSON object and add the dropped images to it.
My json object gets created with entries:
27: Object
51: Object
54: Object
75: Object
99: Object
123: Object
125: Object
How can I loop through exactly these key numbers in jQuery to match them against another JSON object?
Sample Example:
var arr = [ "one", "two", "three", "four", "five" ];
var obj = { one: 1, two: 2, three: 3, four: 4, five: 5 };
jQuery.each( arr, function( i, val ) {
    $( "#" + val ).text( "Mine is " + val + "." );
    // Will stop running after "three"
    return ( val !== "three" );
});
jQuery.each( obj, function( i, val ) {
    $( "#" + i ).append( document.createTextNode( " - " + val ) );
});
Output:
Mine is one. - 1
Mine is two. - 2
Mine is three. - 3
- 4
- 5
You can loop over an object's properties like this:
var obj = { 1: 'val1', 2: 'val2', 3: 'val3' };
for (var param in obj) {
    if (obj.hasOwnProperty(param))
        console.log(obj[param]);
}
And this will print:
val1
val2
val3
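To close the loop on the original question — matching the key numbers against another JSON object — one way is to intersect the two key sets. A minimal sketch; the object names and commonKeys helper are made up for illustration:

```javascript
// Collect the keys two objects have in common, e.g. to match dropped
// grid-cell numbers against another JSON object.
function commonKeys(a, b) {
    return Object.keys(a).filter(function (key) {
        return b.hasOwnProperty(key);
    });
}

var dropped = { 27: {}, 51: {}, 75: {}, 99: {} };
var other = { 51: {}, 75: {}, 123: {} };
console.log(commonKeys(dropped, other)); // [ '51', '75' ]
```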