Is there a way to use a fixed-length index in logback's rolling filename pattern? - logback

Is there a way to use a fixed-length index in logback's rolling filename pattern?
I mean that rolled log files would be indexed like:
application-2021-07-14_01.log.gz
application-2021-07-14_02.log.gz
application-2021-07-14_03.log.gz
...
application-2021-07-14_10.log.gz
application-2021-07-14_11.log.gz
...
instead of:
application-2021-07-14_1.log.gz
application-2021-07-14_2.log.gz
application-2021-07-14_3.log.gz
...
application-2021-07-14_10.log.gz
application-2021-07-14_11.log.gz
...

Try the fileNamePattern like this:
// left justified, missing zeroes
<fileNamePattern>logFile.%d{yyyy-MM-dd}_%-2i.log</fileNamePattern>
// zero left-pad with 2 places
<fileNamePattern>logFile.%d{yyyy-MM-dd}_%02i.log</fileNamePattern>
or
<fileNamePattern>logFile.%d{yyyy-MM-dd}_%i{2}.log</fileNamePattern>
As mentioned in the docs, but extending it with the percent/formatter syntax.

Related

MySQL : Import Number Enclosed by Bracket from CSV

I am trying to import reports in CSV format into MySQL for further analysis. However, I have found several negative numbers enclosed in brackets, e.g. ($184,919.02), ($182,246.50). If I use the DOUBLE type they become 0, but with VARCHAR or TEXT they appear as-is.
I need them stored as DOUBLE to automate some calculations in further analysis. Is there any way to solve this problem? Also, how can I remove the $ (dollar) sign?
Thanks in advance.
Load into a VARCHAR column. Then update the column with REPLACE(col, '$', '') to get rid of the $.
Repeat to get rid of ',', '(', ')' and any other garbage that is in the way. Since the brackets mark negative amounts, replace the '(' with '-' rather than just dropping it if you need to keep the sign.
Better yet, use a real programming language (not SQL) to cleanse the data. Many languages let you remove -$,() in a single pass.
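As a rough illustration of cleansing the values outside SQL, here is a minimal Python sketch. The function name parse_amount is hypothetical, and it assumes every amount looks like the examples in the question: an optional pair of brackets for negatives, a $ sign, and commas as thousands separators.

```python
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Convert strings such as '($184,919.02)' or '$1,234.50' to a Decimal."""
    text = raw.strip()
    # Assumption: accounting-style brackets around the value mean "negative".
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Drop the currency sign and the thousands separators.
    text = text.replace("$", "").replace(",", "")
    value = Decimal(text)
    return -value if negative else value


if __name__ == "__main__":
    for sample in ["($184,919.02)", "($182,246.50)", "$1,234.50"]:
        print(sample, "->", parse_amount(sample))
```

The cleaned values could then be written to a new CSV and loaded into a DOUBLE or DECIMAL column.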

MySQL RegEx to match two consecutive digits that are the same

I am using the following RegEx in MySQL to match two consecutive digits that are the same anywhere in a string:
^.*([[:digit:]])\1+.*$
It matches correctly the following strings:
8831
5011
9931
but it also matches
9318
and it doesn't match
3449
Is the problem around .* or is it something else?
There's no way to check for the same thing twice directly; instead, you would need to check for all possibilities. Luckily, since you are only looking at 10 digits, it's relatively easy:
(11|22|33|44|55|66|77|88|99|00)
I don't think MySQL regular expressions have back references. You can do the more verbose:
where col regexp '00|11|22|33|44|55|66|77|88|99'
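To sanity-check the alternation against the strings from the question, here is a small Python sketch. Python's re module is only a stand-in for MySQL's regex engine here, and the helper name has_double_digit is made up for the example.

```python
import re

# Build '00|11|22|...|99' instead of relying on a back reference.
DOUBLE_DIGIT = re.compile("|".join(d * 2 for d in "0123456789"))


def has_double_digit(text: str) -> bool:
    """Return True if any digit appears twice in a row anywhere in text."""
    return DOUBLE_DIGIT.search(text) is not None


if __name__ == "__main__":
    for sample in ["8831", "5011", "9931", "9318", "3449"]:
        print(sample, has_double_digit(sample))
```

With this pattern 9318 is rejected and 3449 is accepted, which is the behaviour the question asks for.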

Regex for start with three alpha and four digits

I have written an SQL statement to retrieve data from a MySQL db, and I want to select rows where myId starts with three letters followed by four digits, for example: ABC1234K1D2
myId REGEXP '^[A-Z]{3}/d{4}'
but it gives me an empty result (the data is available in the DB). Could someone point me to the correct way?
In most regex variants the answer would be: /d matches a / followed by a d; I think you want \d which matches a digit.
However MySQL has a somewhat limited regex implementation (see documentation).
There is no shortcut to character sets like \d for any digit.
You need to either use a named character set ([[:digit:]]), or just use [0-9].
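The difference between /d and a digit class can be seen in a quick Python sketch. Python's re module is not MySQL's engine, so this only illustrates how the two patterns are read; the variable names are invented for the example.

```python
import re

sample_id = "ABC1234K1D2"

# '/d{4}' means a literal slash followed by four 'd' characters.
typo_pattern = re.compile(r"^[A-Z]{3}/d{4}")
# '[0-9]{4}' means four digits and needs no shorthand class.
digit_pattern = re.compile(r"^[A-Z]{3}[0-9]{4}")

if __name__ == "__main__":
    print(bool(typo_pattern.search(sample_id)))    # the typo pattern does not match the ID
    print(bool(digit_pattern.search(sample_id)))   # the bracket class matches it
    print(bool(typo_pattern.search("ABC/dddd")))   # what the typo pattern really matches
```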
Try this out :
[A-Z]{3}[0-9]{4}
If you want characters to be case insensitive. Try this :
[a-zA-Z]{3}[0-9]{4}
First, in ordinary regular expressions, to match a digit you have to use \d instead of /d (which matches a / followed by a d).
Then, I had never noticed, but \d (and the others like \w, etc.) don't seem to be available in MySQL. The doc lists the accepted special chars, and those generic classes don't appear. You could use [[:digit:]] instead, even if [0-9] is quite a bit shorter ;)
You are doing fine, just replace /d with \d. Final regex: ^[A-Z]{3}\d{4}
You could use the following pattern :
^[a-zA-Z]{3}\d{4}

liblinear's train.exe: "Wrong input format at line 1"

I'm trying to run liblinear's train.exe on Windows:
>train ex1_train.txt
Wrong input format at line 1
Here's the beginning of the file. What's wrong?
17.592 1:6.1101
9.1302 1:5.5277
13.662 1:8.5186
11.854 1:7.0032
6.8233 1:5.8598
11.886 1:8.3829
4.3483 1:7.4764
12 1:8.5781
6.5987 1:6.4862
3.8166 1:5.0546
3.2522 1:5.7107
15.505 1:14.164
3.1551 1:5.734
7.2258 1:8.4084
0.71618 1:5.6407
3.5129 1:5.3794
5.3048 1:6.3654
0.56077 1:5.1301
3.6518 1:6.4296
5.3893 1:7.0708
Liblinear requires the same input format as LibSVM. And, from their README file,
The format of training and testing data file is:
<label> <index1>:<value1> <index2>:<value2> ...
Each line contains an instance and is ended by a '\n' character. For
classification, <label> is an integer indicating the class label
(multi-class is supported). For regression, <label> is the target
value which can be any real number. For one-class SVM, it's not used
so can be any number. The pair <index>:<value> gives a feature
(attribute) value: <index> is an integer starting from 1 and <value>
is a real number. The only exception is the precomputed kernel, where
<index> starts from 0; see the section of precomputed kernels. Indices
must be in ASCENDING order.
Since we don't have the entire file, the best answer we can provide is to make sure all these instructions are followed, e.g. that there is no TAB instead of a space, no '\r\n' instead of '\n', etc. A good way to debug is to take the first few lines and keep adding more until you get the error.
head -10 <yourfile> > tmp10
head -20 <yourfile> > tmp20
etc. And see where the error pops up.
My problems were that you can't use zero as a feature id, and that the features need to be sorted by index.
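The checks mentioned in these answers (line endings, index starting at 1, ascending indices) can be automated with a short script. This is only a sketch based on the README excerpt quoted above, not on liblinear's actual parser, and find_format_error is a hypothetical helper name.

```python
import sys
from typing import Optional


def find_format_error(line: str) -> Optional[str]:
    """Return a description of the first problem in one data line, or None."""
    if "\r" in line:
        return "carriage return found (file may use '\\r\\n' line endings)"
    fields = line.split()
    if not fields:
        return "empty line"
    try:
        float(fields[0])
    except ValueError:
        return f"label {fields[0]!r} is not a number"
    previous = 0
    for field in fields[1:]:
        index_text, sep, value_text = field.partition(":")
        if not sep:
            return f"feature {field!r} is missing ':'"
        try:
            index = int(index_text)
            float(value_text)
        except ValueError:
            return f"feature {field!r} is not <integer>:<number>"
        if index < 1:
            return f"index {index} is below 1"
        if index <= previous:
            return f"index {index} is not in ascending order"
        previous = index
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps '\r\n' visible instead of translating it to '\n'.
    with open(sys.argv[1], newline="") as handle:
        for number, line in enumerate(handle, start=1):
            problem = find_format_error(line)
            if problem:
                print(f"line {number}: {problem}")
```

Run it with the data file as the only argument; it prints nothing if every line passes these checks.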

Implementing run-length encoding

I've written a program to perform run length encoding.
In typical scenario if the text is
AAAAAABBCDEEEEGGHJ
run length encoding will make it
A6B2C1D1E4G2H1J1
but it was adding an extra 1 for each non-repeating character. Since I'm compressing BMP files with it, I went with the idea of placing a marker "$" to signify the occurrence of a repeating character (assuming that image files have a huge amount of repeating text).
So it'd look like
$A6$B2CD$E4$G2HJ
For the current example its length is the same, but there's a noticeable difference for BMP files. Now my problem is in decoding. It so happens that some BMP files have the pattern $<char><num>, e.g. $I9, in the original file, so the compressed file also contains the same text, $I9. However, upon decoding it is treated as a repeating I which repeats 9 times, so it produces wrong output. What I want to know is which symbol I can use to mark the start of a repeating character (run) so that it doesn't conflict with the original source.
Why don't you encode each $ in the original file as $$ in the compressed file?
And/or use some other character instead of $ - one that is not used much in bmp files.
Also note that the BMP format has RLE compression 'built-in' - look here, near the bottom of the page - under "Image Data and Compression".
I don't know what you're using your program for, or if it's just for learning, but if you used the "official" bmp method, your compressed images wouldn't need decompression before viewing.
AAAAAABBCDEEEEGGHJ$IIIIIIIII ==> $A6$B2CD$E4$G2HJ$$$I9
If the repeat character occurs in the data, try inserting an extra repeat character in the encoded data. Then if the decoder sees a double repeat character it can insert the actual repeat character. Here the literal $ becomes $$, and the run of nine I's that follows is still written as $I9:
$A6$B2CD$E4$G2HJ$$$I9 ==> AAAAAABBCDEEEEGGHJ$IIIIIIIII
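A minimal Python sketch of the doubled-marker idea follows. It makes two assumptions that are not in the question: runs are capped at 9 so the count stays a single digit, and the marker itself is never run-encoded (each literal $ is simply written as $$), which keeps decoding unambiguous. The function names are hypothetical.

```python
MARKER = "$"


def rle_encode(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Measure the run starting at i, capped at 9 (one-digit count).
        run = 1
        while i + run < len(text) and text[i + run] == ch and run < 9:
            run += 1
        if ch == MARKER:
            # Literal markers are doubled, one pair per occurrence.
            out.append(MARKER * 2 * run)
        elif run >= 2:
            out.append(f"{MARKER}{ch}{run}")
        else:
            out.append(ch)
        i += run
    return "".join(out)


def rle_decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != MARKER:
            out.append(ch)               # plain literal
            i += 1
        elif encoded[i + 1] == MARKER:
            out.append(MARKER)           # '$$' stands for one literal '$'
            i += 2
        else:
            # '$<char><digit>' stands for <char> repeated <digit> times.
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
    return "".join(out)


if __name__ == "__main__":
    print(rle_encode("AAAAAABBCDEEEEGGHJ"))
    print(rle_encode("$I9"), "->", rle_decode(rle_encode("$I9")))
```

With this scheme a literal $I9 in the source is stored as $$I9 and decodes back to $I9 instead of nine I's.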
What most programs do to signify that some character needs to be treated literally is that they have a defined escape sequence.
For example, in regular expressions, the following are specially defined characters that usually have a meaning:
^[].*+{}()$
Yes, your fun dollar sign character is in there, and it usually means end of line.
So what a programmer using regular expressions has to do to have these characters interpreted literally is that they need to express those characters as an escape sequence. For example, to interpret $ as $, and not end of line, the programmer uses \$, which is the escape sequence.(1)
In your case, you can store literal dollar signs into your compressed file as \$.(2)
NB: grep inverts this logic.
The above solutions that store $ as $$ become confusing when you have runs of $ in the BMP file.
If you have the luxury of being able to scan the entire input before starting to compress it, you could choose the least frequent value in the input as your escape value.
For example, given this input:
AAAABBCCCCDDEEEEEEEFFG
You could choose "G" as your escape value (or even "H" if it's part of your symbol set) and adopt a convention whereby the first character of the encoded stream is the escape value. So the string above might encode to:
GGA4BBGC4DDGE7FFGG
or even better:
HHA4BBHC4DDHE7FFG
Please note that there's no point in encoding a "run" of two identical values because the "compressed" version (e.g. HD2) is longer than the uncompressed version (DD).
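The least-frequent-escape convention can be sketched in Python roughly as follows. The assumptions here go beyond the answer: runs are capped at 9, only runs of 3 or more are encoded, and a literal occurrence of the escape value is written doubled. All function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def choose_escape(text, alphabet=None):
    """Pick the rarest symbol; pass alphabet to allow symbols absent from text."""
    counts = Counter(text)
    candidates = alphabet if alphabet is not None else sorted(counts)
    return min(candidates, key=lambda symbol: counts[symbol])


def encode(text, escape):
    out = [escape]  # header: the first character announces the escape value
    for symbol, group in groupby(text):
        remaining = len(list(group))
        while remaining > 0:
            count = min(remaining, 9)
            if symbol == escape:
                out.append(escape * 2 * count)   # literal escape, doubled
            elif count >= 3:
                out.append(f"{escape}{symbol}{count}")
            else:
                out.append(symbol * count)       # short runs stay as-is
            remaining -= count
    return "".join(out)


def decode(encoded):
    escape, body = encoded[0], encoded[1:]
    out = []
    i = 0
    while i < len(body):
        if body[i] != escape:
            out.append(body[i])
            i += 1
        elif body[i + 1] == escape:
            out.append(escape)
            i += 2
        else:
            out.append(body[i + 1] * int(body[i + 2]))
            i += 3
    return "".join(out)


if __name__ == "__main__":
    data = "AAAABBCCCCDDEEEEEEEFFG"
    print(encode(data, choose_escape(data)))
    print(encode(data, choose_escape(data, alphabet="ABCDEFGH")))
```

Choosing a symbol that never occurs in the input (H here) avoids paying the doubling cost at all.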
Hope that helps!
If I understand correctly, the problem is that $ is both a symbol for marking a repeat, and also can be a 'BMP' value as well?
If so, what you could do is use a double $ ('$$') to denote that the '$' character should be treated not as a repeat marker, but as a single literal '$'. This would of course mean that the '$' is expensive to encode (it takes two symbols instead of one), but it would solve your problem.
If you wanted to have a run of the '$' character, you would need to encode it as:
$$$5 - meaning: the first '$' starts a run, '$$' is the escaped '$' being repeated, and '5' is the count (5 times).
I'm honestly not sure what would possess someone to use a text-based RLE if they want to compress binary data with it. A BMP is not text.
Right now, since only a single byte is read after the $, and it is interpreted as an ASCII digit from 0 to 9, this process has a run-length range of 0 to 9, meaning you can only compress up to 9 repetitions before a new run-length flag needs to be written. After all, you can't tell the difference between $I34 for a run length of 34, and $I3 followed by a literal 4.
If this same byte is instead interpreted as binary value, it can contain values from 0 to 255, giving a massive difference in efficiency.
As for escaping the $ signs themselves, I'd advise either always treating a $ as a repeat of at least 1 ($$1), or, better yet, encoding the entire thing differently, with the order of the run length and the data swapped, so a code becomes $<length><data>; then you can use $0 as a special symbol meaning 'just $'. When decompressing and encountering the 0 after a $, simply don't read on for a third byte. A run length of 0 should never appear in the compressed data anyway, so it can be given a special meaning. This only pays off with the length first, though: if the data byte is put first, the escape would still be the same length as a normal repeat.
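A byte-oriented version of the $<length><data> layout might look like the Python sketch below. The minimum run of 4 is an assumption (a run code costs 3 bytes, so shorter runs gain nothing), and the function names are invented for the example.

```python
MARKER = 0x24  # the byte value of '$'


def encode(data: bytes, min_run: int = 4) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        # Measure the run, capped at 255 so the length fits in one byte.
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        if run >= min_run:
            out += bytes([MARKER, run, value])   # $<length><data>
        elif value == MARKER:
            out += bytes([MARKER, 0]) * run      # $0 means one literal '$'
        else:
            out += bytes([value]) * run          # short runs stay literal
        i += run
    return bytes(out)


def decode(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        value = blob[i]
        if value != MARKER:
            out.append(value)
            i += 1
        elif blob[i + 1] == 0:
            out.append(MARKER)                   # escape: no third byte to read
            i += 2
        else:
            out += bytes([blob[i + 2]]) * blob[i + 1]
            i += 3
    return bytes(out)


if __name__ == "__main__":
    sample = b"$I9" + b"I" * 300 + b"$$"
    packed = encode(sample)
    print(len(sample), "->", len(packed), decode(packed) == sample)
```

Because the length is a binary byte, a run of 34 costs the same three bytes as a run of 4, and a literal $I9 in the source can no longer be mistaken for a run.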