Assume the following df:
    ib    c  d1  d2
0   1.14  1  1   0
1   1.0   1  1   0
2   0.71  1  1   0
3   0.6   1  1   0
4   0.66  1  1   0
5   1.0   1  1   0
6   1.26  1  1   0
7   1.29  1  1   0
8   1.52  1  1   0
9   1.31  1  1   0
10  0.89  1  0   1
d1 and d2 are perfectly collinear. Now I estimate the following regression model:
import statsmodels.api as sm
reg = sm.OLS(df['ib'], df[['c', 'd1', 'd2']]).fit().summary()
reg
This gives me the following output:
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: ib R-squared: 0.087
Model: OLS Adj. R-squared: -0.028
Method: Least Squares F-statistic: 0.7590
Date: Thu, 17 Nov 2022 Prob (F-statistic): 0.409
Time: 12:19:34 Log-Likelihood: -1.5470
No. Observations: 10 AIC: 7.094
Df Residuals: 8 BIC: 7.699
Df Model: 1
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
c 0.7767 0.111 7.000 0.000 0.521 1.033
d1 0.2433 0.127 1.923 0.091 -0.048 0.535
d2 0.5333 0.213 2.499 0.037 0.041 1.026
==============================================================================
Omnibus: 0.257 Durbin-Watson: 0.760
Prob(Omnibus): 0.879 Jarque-Bera (JB): 0.404
Skew: 0.043 Prob(JB): 0.817
Kurtosis: 2.019 Cond. No. 8.91e+15
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 2.34e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
"""
However, including c, d1 and d2 represents the well-known dummy variable trap, which, from my understanding, should make it impossible to estimate the model. Why is this not the case here?
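For reference, a quick rank check on the same design matrix (using numpy's matrix_rank and cond on the columns from the df above) should confirm that it really is singular:
import numpy as np

X = df[['c', 'd1', 'd2']].values
print(np.linalg.matrix_rank(X))  # 2 rather than 3, since c == d1 + d2
print(np.linalg.cond(X))         # effectively infinite, consistent with the Cond. No. in the summary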
An out of bound error occurred. This is Octave code:
for ii=1:1:10
  m(ii)=ii*8
  q=m(ii)
  if (ii>=2)
    q(ii).xdot=(q(ii).x-q(ii-1).x)/Ts;
  end
end
But the error says:
q(2): out of bound 1
How can I fix it?
For this type of assignment you do not need a loop, and in any case you need to define Ts.
To calculate the difference between successive elements you can use diff:
x=(1:1:10)*8
x =
8 16 24 32 40 48 56 64 72 80
octave:5> Ts=2
Ts = 2
octave:6> xdot=diff(x)/Ts
xdot =
4 4 4 4 4 4 4 4 4
octave:7> size(x)
ans =
1 10
octave:8> size(xdot)
ans =
1 9
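That said, if you really want to keep the loop and the struct-style indexing from the question, a minimal sketch of a working version (assuming Ts is defined beforehand and that each q(ii) is meant to hold a value x and its backward difference xdot) could be:
Ts = 2;                                        % sampling interval -- must be defined before use
for ii = 1:10
  q(ii).x = ii*8;                              % store the value in a struct field x, not in a plain scalar
  if ii >= 2
    q(ii).xdot = (q(ii).x - q(ii-1).x) / Ts;   % backward difference
  end
end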
Hello, I am running Caffe RC-4 and keep getting this output from create_imagenet.sh:
Creating train lmdb...
I0221 14:48:57.925828 8185 convert_imageset.cpp:86] Shuffling data
I0221 14:48:58.085487 8185 convert_imageset.cpp:89] A total of 28 images.
I0221 14:48:58.085688 8185 db_lmdb.cpp:35] Opened lmdb /home/caffe/images/ilsvrc12_train_lmdb
E0221 14:48:58.102860 8185 io.cpp:80] Could not open or find file /home/caffe/data/motd/train/
I0221 14:48:58.308913 8185 convert_imageset.cpp:153] Processed 27 files.
Creating val lmdb...
I0221 14:48:58.352946 8189 convert_imageset.cpp:86] Shuffling data
I0221 14:48:58.481848 8189 convert_imageset.cpp:89] A total of 0 images.
I0221 14:48:58.482054 8189 db_lmdb.cpp:35] Opened lmdb /home/caffe/images/ilsvrc12_val_lmdb
Done.
I have only 27 images in train and 8 in val.
Why does it say it could not open or find a file, and then say it has processed 27 files?
And for val it says there are 0 images when there are 8.
train.txt
n10000001_2955.JPEG 0
n10000001_7050.JPEG 0
n10000002_2179.JPEG 1
n10000002_2313.JPEG 1
n10000002_5631.JPEG 1
n10000002_6673.JPEG 1
n10000002_7309.JPEG 1
n10000002_8927.JPEG 1
n10000003_2926.JPEG 2
n10000003_3895.JPEG 2
n10000003_4498.JPEG 2
n10000003_6069.JPEG 2
n10000003_6442.JPEG 2
n10000004_1557.JPEG 3
n10000004_1759.JPEG 3
n10000004_2455.JPEG 3
n10000004_3370.JPEG 3
n10000004_3529.JPEG 3
n10000004_6396.JPEG 3
n10000005_1465.JPEG 4
n10000005_4328.JPEG 4
n10000005_4561.JPEG 4
n10000005_4640.JPEG 4
n10000005_5958.JPEG 4
n10000005_7610.JPEG 4
n10000005_7650.JPEG 4
n10000005_9843.JPEG 4
test.txt
n10000002_5602.JPEG 1
n10000002_7860.JPEG 1
n10000003_3676.JPEG 2
n10000003_4381.JPEG 2
n10000003_4578.JPEG 2
n10000003_7249.JPEG 2
n10000004_6958.JPEG 3
n10000005_7996.JPEG 4
My Dir settings in create_imagenet.sh:
EXAMPLE=$ROOTDIR/images
DATA=$ROOTDIR/data/motd
TOOLS=/usr/local/Caffe/caffe-rc4/build/tools
TRAIN_DATA_ROOT=$ROOTDIR/data/motd/train/
VAL_DATA_ROOT=$ROOTDIR/data/motd/val/
Thanks.
I discovered that it needs val.txt instead of test.txt! Oops.
What is the best method of detecting and dropping duplicate rows from an array in Julia?
x = Integer.(round.(10 .* rand(1000,4)))
# In R I would apply the duplicated function.
x = x[duplicated(x),:]
unique is what you are looking for (this does not answer the detection part of the question):
julia> x = Integer.(round.(10 .* rand(1000,4)))
1000×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
julia> unique(x, 1)
973×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
As for the detection part, a dirty fix would be editing this line:
@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))
to:
(@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))), uniquerows
Alternatively, you could define your own unique2 with the abovementioned changes:
using Base.Cartesian
import Base.Prehashed
@generated function unique2(A::AbstractArray{T,N}, dim::Int) where {T,N}
......
end
julia> y, idx = unique2(x, 1)
julia> y
960×4 Array{Int64,2}:
8 3 1 5
8 3 1 6
1 1 0 1
8 10 1 10
9 1 8 7
⋮
julia> setdiff(1:1000, idx)
40-element Array{Int64,1}:
99
120
132
140
216
227
⋮
The benchmark on my machine is:
using BenchmarkTools

x = rand(1:10,1000,4) # 48 dups
@btime unique2($x, 1);
  124.342 μs (31 allocations: 145.97 KiB)
@btime duplicated($x);
  407.809 μs (9325 allocations: 394.78 KiB)

x = rand(1:4,1000,4) # 751 dups
@btime unique2($x, 1);
  66.062 μs (25 allocations: 50.30 KiB)
@btime duplicated($x);
  222.337 μs (4851 allocations: 237.88 KiB)
The result shows that the convoluted-metaprogramming-hashtable way in Base benefits a lot from lower memory allocation.
In Julia v1.4 and above you would type unique(a, dims=1), where a is your N-by-2 Array:
julia> a=[2 2 ; 2 2; 1 2; 3 1]
4×2 Array{Int64,2}:
2 2
2 2
1 2
3 1
julia> unique(a,dims=1)
3×2 Array{Int64,2}:
2 2
1 2
3 1
You can also go with:
duplicated(x) = foldl(
(d,y)->(x[y,:] in d[1] ? (d[1],push!(d[2],y)) : (push!(d[1],x[y,:]),d[2])),
(Set(), Vector{Int}()),
1:size(x,1))[2]
This collects a set of seen rows, and outputs the indices of those already seen. This is essentially the minimal effort needed to get the result, so it should be fast.
julia> x = rand(1:2,5,2)
5×2 Array{Int64,2}:
2 1
1 2
1 2
1 1
1 1
julia> duplicated(x)
2-element Array{Int64,1}:
3
5
julia> x[duplicated(x),:]
2×2 Array{Int64,2}:
1 2
1 1
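To drop the duplicated rows rather than select them, the same indices can be inverted with setdiff (continuing from the x above, so the result should keep rows 1, 2 and 4):
julia> x[setdiff(1:size(x, 1), duplicated(x)), :]
3×2 Array{Int64,2}:
 2  1
 1  2
 1  1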
I have the following table called DETAILS.
if flag is 1 --> 00000001  // bit 1 (the LSB) is set
if flag is 2 --> 00000010  // bit 2 is set
if flag is 3 --> 00000011  // bits 1 and 2 are set
if flag is 5 --> 00000101  // bits 1 and 3 are set
Sample data:
ID  NAME     FLAG(int)        IS_LAST
--------------------------------------
1   sports    5 (0000 0101)   0        // bits 1 and 3 are set
2   News     11 (0000 1011)   0        // bits 1, 2 and 4 are set
3   Weather  24 (0001 1000)   1        // bits 4 and 5 are set
4   IPL      32 (0010 0000)   0        // bit 6 is set
If bit 2 or bit 6 of the FLAG column is set, or IS_LAST = 1, then I want to OR the FLAG with 64 (0100 0000) and store the result back into the same FLAG column using an UPDATE query.
I want the output to look like this:
ID  NAME     FLAG(int)                      IS_LAST
----------------------------------------------------
1   sports    5 (0000 0101) (not updated)   0
2   News     75 (0100 1011) (updated)       0
3   Weather  88 (0101 1000) (updated)       1
4   IPL      96 (0110 0000) (updated)       0
MySQL has bitwise operators (| for OR, & for AND), so this is just a straightforward update statement; use & in the WHERE clause to test whether a bit is set:
UPDATE details
SET flag = flag | 64
WHERE (flag & 2) > 0 OR
      (flag & 32) > 0 OR
      (is_last = 1)
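If you want to preview which rows the update will touch (and which bits triggered it) before running it, a select with the same bit tests, using the table and column names from the question, could look like:
SELECT id, name, flag, is_last,
       flag & 2  AS bit2_set,
       flag & 32 AS bit6_set
FROM details
WHERE (flag & 2) > 0 OR
      (flag & 32) > 0 OR
      (is_last = 1);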