Here is the pseudocode for Lempel-Ziv-Welch (LZW) compression:
pattern = get input character
while ( not end-of-file ) {
    K = get input character
    if ( <<pattern, K>> is NOT in the string table ) {
        output the code for pattern
        add <<pattern, K>> to the string table
        pattern = K
    }
    else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
    local dict_size = 256
    local dictionary = {}
    w = ""
    result = {}
    for c in uncompressed do
        -- while c is in the function compress
        local wc = w + c
        if dictionary[wc] == true then
            w = wc
        else
            dictionary[w] = ""
            -- Add wc to the dictionary.
            dictionary[wc] = dict_size
            dict_size = dict_size + 1
            w = c
        end
        -- Output the code for w.
        if w then
            dictionary[w] = ""
        end
    end
    return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
    local c = string.sub(uncompressed, i, i)
    -- etc
end
There's another issue in the line local wc = w + c: Lua uses .. for string concatenation, not +, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...
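Putting the two fixes above together (indexing the string with string.sub and concatenating with ..), a minimal sketch of a working compressor might look like the following. Seeding the dictionary with the 256 single-byte strings and returning a table of integer codes are assumptions modeled on the pseudocode in the question. This sketch does not emit an EOF_CODE.

```lua
-- Minimal LZW compressor sketch: returns a table of integer codes.
local function compress(uncompressed)
    -- Assumption: start with one dictionary entry per byte value (codes 0..255).
    local dictionary = {}
    for i = 0, 255 do
        dictionary[string.char(i)] = i
    end
    local dict_size = 256

    local w = ""
    local result = {}
    for i = 1, #uncompressed do
        local c = uncompressed:sub(i, i)
        local wc = w .. c
        if dictionary[wc] then
            -- Known sequence: keep extending it.
            w = wc
        else
            -- Output the code for w, then register the new sequence wc.
            result[#result + 1] = dictionary[w]
            dictionary[wc] = dict_size
            dict_size = dict_size + 1
            w = c
        end
    end
    -- Output the code for whatever is still pending.
    if w ~= "" then
        result[#result + 1] = dictionary[w]
    end
    return result
end

local codes = compress("TOBEORNOTTOBEORTOBEORNOT")
print(table.concat(codes, ","))
--> 84,79,66,69,79,82,78,79,84,256,258,260,265,259,261,263
```

Collecting the codes in a table and joining them once with table.concat avoids building the output through repeated string concatenation.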
I am working on a MATLAB script that calculates a 1x1854 matrix called N2. The routine has to run 1000 times, because the input data files are different on each iteration. I would like to store the matrix N2 from each iteration under a sequential name, such as N2_1, N2_2, etc. How should I implement that?
for ii=1:1000
    file1 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35X35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    file2 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35K35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    %%calculations...
    [N,bind] = elecdensity(omega_new,closestapproach);
    %
    N2_num2str(ii) = N./1e6;
end
To generate those variables, change the code line
N2_num2str(ii) = N./1e6;
to
eval(['N2_' num2str(ii) '= ' 'N./1e6']);
This might be too computationally expensive. Another approach, which avoids the "eval" command, is to save the matrices in a structure, with each field holding one matrix (named N2_NUMBER). The code then becomes
% Generate the struct object
myValues = struct;
% Start the for loop
for ii=1:1000
    file1 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35X35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    file2 = load(['/Users/gianmarcobroilo/Desktop/1000shifts/delays/GRV_JUGR_2021158_1648X35K35001KV03.NEWFES_delay_' num2str(ii) '.TXT']);
    %%calculations...
    [N,bind] = elecdensity(omega_new,closestapproach);
    %
    fieldName = ['N2_' num2str(ii)];
    myValues.(fieldName) = N./1e6;
end
% Print matrix number 54
myValues.N2_54
I have a small MATLAB script that I am trying to understand. It doesn't do much: it reads text from a file, then encodes and decodes it with the Huffman functions.
But it throws an error while decoding:
"error: out of memory or dimension too large for Octave's index type
error: called from huffmandeco>dict2tree at line 95 column 19"
I don't know why: I stepped through it in the debugger and did not see any unusually large index.
I added the part which calculates p from the input text.
%text is a random input text file in ASCII
%calculate the relative frequency of every Symbol
for i=0:127
    nlet=length(find(text==i));
    p(i+1)=nlet/length(text);
end
symb = 0:127;
dict = huffmandict(symb,p); % Create dictionary
compdata = huffmanenco(fdata,dict); % Encode the data
dsig = huffmandeco(compdata,dict); % Decode the Huffman code
I can only use Octave, not MATLAB, so I don't know whether this is an unexpected error. I am using Octave 6.2.0 on Windows 10. I also tried the build for large data, but it didn't change anything.
Does anyone recognize this error in this context?
EDIT:
I debugged the code again. In the function huffmandeco I found the following function:
function tree = dict2tree (dict)
  L = length (dict);
  lengths = zeros (1, L);
  ## the depth of the tree is limited by the maximum word length.
  for i = 1:L
    lengths(i) = length (dict{i});
  endfor
  m = max (lengths);
  tree = zeros (1, 2^(m+1)-1)-1;
  for i = 1:L
    pointer = 1;
    word = dict{i};
    for bit = word
      pointer = 2 * pointer + bit;
    endfor
    tree(pointer) = i;
  endfor
endfunction
The maximum length m in this case is 82. So the function calculates:
tree = zeros (1, 2^(82+1)-1)-1.
So it's obvious why the error complains about a dimension too large for the index type.
But there must be a solution, or some other error, because this code has been tested before.
I haven't weeded through the code enough to know why yet, but huffmandict is not ignoring zero-probability symbols the way it claims to. Nor have I been able to find a bug report on Savannah, but again I haven't searched thoroughly.
A workaround is to limit the symbol list and their probabilities to only the symbols that actually occur. Using containers.Map would be ideal, but in Octave you can do that with a couple of the outputs from unique:
% Create a symbol table of the unique characters in the input string
% and the indices into the table for each character in the string.
[symbols, ~, inds] = unique(textstr);
inds = inds.'; % just make it easier to read
For the string
textstr = 'Random String Input.';
the result is:
>> symbols
symbols = .IRSadgimnoprtu
>> inds
inds =
Columns 1 through 19:
4 6 11 7 12 10 1 5 15 14 9 11 8 1 3 11 13 16 15
Column 20:
2
So the first symbol in the input string is symbols(4), the second is symbols(6), and so on.
From there, you just use symbols and inds to create the dictionary and encode/decode the signal. Here's a quick demo script:
textstr = 'Random String Input.';
fprintf("Starting string: %s\n", textstr);
% Create a symbol table of the unique characters in the input string
% and the indices into the table for each character in the string.
[symbols, ~, inds] = unique(textstr);
inds = inds.'; % just make it easier to read
% Calculate the frequency of each symbol in table
% max(inds) == numel(symbols)
p = histc(inds, 1:max(inds))/numel(inds);
dict = huffmandict(symbols, p);
compdata = huffmanenco(inds, dict);
dsig = huffmandeco(compdata, dict);
fprintf("Decoded string: %s\n", symbols(dsig));
And the output:
Starting string: Random String Input.
Decoded string: Random String Input.
To encode strings other than the original input string, you would have to map the characters to symbol indices (ensuring that all symbols in the string are actually present in the symbol table, obviously):
>> [~, s_idx] = ismember('trogdor', symbols)
s_idx =
15 14 12 8 7 12 14
>> compdata = huffmanenco(s_idx, dict);
>> dsig = huffmandeco(compdata, dict);
>> fprintf("Decoded string: %s\n", symbols(dsig));
Decoded string: trogdor
The following is a string of arithmetic operations in Lua.
local str ='x+abc*def+y^z+10'
Can this string be split so that the individual variables and numbers are extracted? For example, if str is split into a table s, the output would be
s[1] = x
s[2] = abc
s[3] = def
s[4] = y
s[5] = z
s[6] = 10
The splitting is to be done on the operators +, -, *, \, ^ and %.
Try also this simpler pattern:
local str ='x+(abc*def)+y^z+10'
for w in str:gmatch("%w+") do
    print(w)
end
You can use string.gmatch to iterate over your string.
Feel free to add other operators to the pattern.
Refer to https://www.lua.org/manual/5.3/manual.html#6.4.1
local str ='x+abc*def+y^z+10'
local s = {}
for operand in str:gmatch('[^%+%*%^]+') do
    table.insert(s, operand)
end
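You can use string.gmatch to do what you're looking for. You would use the character set [%+%-%*%^/] to match the operators:
local str ='x+abc*def+y^z+10'
local s = {}
-- %w+ (rather than %w*) avoids capturing an empty string at the end of the input
for value in str:gmatch("[%+%-%*%^/]*(%w+)[%+%-%*%^/]*") do
    s[#s + 1] = value
end
-- unpack is a global in Lua 5.1 and lives in table.unpack from 5.2 on
print((table.unpack or unpack)(s))
Also note that if you need \ as an operator, as shown in your question, it has to be escaped with an additional \ in the string literal.
Resource for learning more about Lua patterns: understanding_lua_patterns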
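As a small sketch covering the full operator list from the question (+, -, *, \, ^, %), the negated-set approach can be wrapped in a helper. The name split_operands is hypothetical. Inside a Lua string literal the backslash is written as \\, and a literal percent sign in a pattern is written as %%. Note that this version splits only on those six operators, so parentheses or / would remain part of an operand.

```lua
-- Hypothetical helper: split an expression on the operators + - * \ ^ %
local function split_operands(expr)
    local operands = {}
    -- [^...]+ matches a maximal run of characters that are NOT operators.
    for operand in expr:gmatch("[^%+%-%*\\%^%%]+") do
        operands[#operands + 1] = operand
    end
    return operands
end

local s = split_operands("x+abc*def+y^z+10")
for i, v in ipairs(s) do
    print(("s[%d] = %s"):format(i, v))
end
```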
I encountered this bug in Lua cjson while using a script in Redis 3.2 to set a particular value in a JSON object.
Currently, the Lua in Redis does not differentiate between an empty JSON array and an empty JSON object, which causes serious problems when serialising JSON objects that have arrays within them.
eval "local json_str = '{\"items\":[],\"properties\":{}}' return cjson.encode(cjson.decode(json_str))" 0
Result:
"{\"items\":{},\"properties\":{}}"
I found this solution, https://github.com/mpx/lua-cjson/issues/11, but I wasn't able to implement it in a Redis script.
This is an unsuccessful attempt:
eval
"function cjson.mark_as_array(t)
local mt = getmetatable(t) or {}
mt.__is_cjson_array = true
return setmetatable(t, mt)
end
function cjson.is_marked_as_array(t)
local mt = getmetatable(t)
return mt and mt.__is_cjson_array end
local json_str = '{\"items\":[],\"properties\":{}}'
return cjson.encode(cjson.decode(json_str))"
0
Any help or pointer appreciated.
There are two options.
Plan 1: modify the lua-cjson source code and recompile Redis (click here for details).
Plan 2: work around it in the script itself:
local now = redis.call("time")
-- local timestamp = tonumber(now[1]) * 1000 + math.floor(now[2]/1000)
math.randomseed(now[2])
local emptyFlag = "empty_" .. now[1] .. "_" .. now[2] .. "_" .. math.random(10000)
local emptyArrays = {}
local function emptyArray()
    if cjson.as_array then
        -- cjson fixed: https://github.com/xiyuan-fengyu/redis-lua-cjson-empty-table-fix
        local arr = {}
        setmetatable(arr, cjson.as_array)
        return arr
    else
        -- plan 2
        local arr = {}
        table.insert(emptyArrays, arr)
        return arr
    end
end
local function toJsonStr(obj)
    if #emptyArrays > 0 then
        -- plan 2
        for i, item in ipairs(emptyArrays) do
            if #item == 0 then
                -- empty array, insert a special mark
                table.insert(item, 1, emptyFlag)
            end
        end
        local jsonStr = cjson.encode(obj)
        -- replace empty array
        jsonStr = (string.gsub(jsonStr, '%["' .. emptyFlag .. '"]', "[]"))
        for i, item in ipairs(emptyArrays) do
            if item[1] == emptyFlag then
                table.remove(item, 1)
            end
        end
        return jsonStr
    else
        return cjson.encode(obj)
    end
end
-- example
local arr = emptyArray()
local str = toJsonStr(arr)
print(str) -- "[]"
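The sentinel-replacement step of plan 2 can be tried in isolation in plain Lua, outside Redis. In the sketch below, the fixed empty_flag value and the hand-written JSON string are assumptions standing in for the time-based emptyFlag and for the output of cjson.encode.

```lua
-- Hypothetical sentinel; the script above derives a unique one from redis.call("time").
local empty_flag = "empty_1700000000_123456_42"

-- Stand-in for what cjson.encode would produce once the marker has been
-- inserted into the otherwise empty "items" array.
local encoded = '{"items":["' .. empty_flag .. '"],"properties":{}}'

-- %[ and %] match literal brackets; the flag contains only letters,
-- digits and underscores, so it needs no pattern escaping.
-- The outer parentheses drop gsub's second return value (the match count).
local fixed = (encoded:gsub('%["' .. empty_flag .. '"%]', "[]"))
print(fixed)  --> {"items":[],"properties":{}}
```

Because the replacement only touches arrays whose sole element is the sentinel, the empty object under "properties" is left as {}.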
Here is my code for compressing data in Lua using the LZW compression method. My problem is that the function is returning the character 'T', instead of returning the full compressed string 'TOBEORNOTTOBEORNOT'. Thanks!
function compress(uncompressed)
    local dict_size = 256
    local dictionary = {}
    w = ""
    result = {}
    for i = 1, #uncompressed do
        local c = string.sub(uncompressed, i, i)
        local wc = w .. c
        if dictionary[wc] == true then
            w = wc
        else
            dictionary[w] = ""
            dictionary[wc] = dict_size
            dict_size = dict_size + 1
            w = c
        end
        if w then
            dictionary[w] = ""
        end
        return w
    end
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print(compressed)
Just a hint: you return w inside the for loop.
EDIT: some explanation
If you return your result inside the loop, the loop only runs one iteration: the function finishes at the end of the first pass, which is not what you want. Your return statement should come after the for loop.
Also, it is suspicious that you declare a variable result = {} and then never use it.
So I suggest you move the return statement after the loop and print the values of your variables at the end of each iteration (put the print statements where the return is now), so you can see what is really happening.
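The effect of the misplaced return can be reproduced with a much smaller example. The two function names below are hypothetical and only illustrate the difference between returning inside the loop and returning after it.

```lua
-- Returns inside the loop: the function exits during the first iteration.
local function first_only(s)
    for i = 1, #s do
        local c = s:sub(i, i)
        return c
    end
end

-- Returns after the loop: every character is visited first.
local function all_chars(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = s:sub(i, i)
    end
    return table.concat(out)
end

print(first_only("TOBEORNOT"))  --> T
print(all_chars("TOBEORNOT"))   --> TOBEORNOT
```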