Subsetting from file names - csv

I have some files named:
Mycase_xxx_x_xxx.csv
Mycase_xxx_x_xxx_xx_x.csv
Myanalysis_x_xx_xx_xxx_x_x.csv
Myattempt_xx_x_xxxx.csv
I would like the files named in the following way:
xxx_x_xxx.csv
xxx_x_xxx_xx_x.csv
x_xx_xx_xxx_x_x.csv
xx_x_xxxx.csv
In other words I would like (by Unix) to preserve all characters that are present in filenames after the first word.
Can anyone help me please?
Thank you in advance

This can be achieved using Perl's rename utility, which depending on your system, may be called rename, prename or perl-rename. On Debian and Ubuntu, it can be installed as follows:
sudo apt install rename
Note that on some systems, the command rename may be Perl's rename, while on others it may be util-linux's rename and on others it may be GNU rename. These tools are not compatible with each other.
Perl's rename tool on Debian and Ubuntu can be used as follows:
prename 's/expression/substitution/' filenames...
Or to apply to all files in the current directory:
prename 's/expression/substitution/' *
A useful feature of Perl's rename is the ability to use regular expressions (note how the supported syntax is similar to sed's syntax).
In your case, a regular expression to match the prefix of your filenames up to and including the first underscore is as follows:
^[^_]*_
This can then be substituted with an empty string to remove this part of the filename, resulting in the following command:
prename 's/^[^_]*_//' *
Before running this command, if you are unsure and would like to test that the files will be renamed how you want them, you can add the flags -vn as follows:
prename -vn 's/^[^_]*_//' *
This will not rename any files and will instead print out a list of the files which will be renamed, like so:
$ prename -vn 's/^[^_]*_//' *
Myanalysis_x_xx_xx_xxx_x_x.csv -> x_xx_xx_xxx_x_x.csv
Myattempt_xx_x_xxxx.csv -> xx_x_xxxx.csv
Mycase_xxx_x_xxx.csv -> xxx_x_xxx.csv
Mycase_xxx_x_xxx_xx_x.csv -> xxx_x_xxx_xx_x.csv

Related

Recursively Replace One Windows Path w/ Another in Text Files

I have a large amount of text files stored on a Red Hat server that contain explicit Windows paths. Today, that path has changed and I would like to change the text files to reflect the new path. As they are Windows paths, they all contain single backslashes. I would like to maintain the single backslashes if possible.
I wanted to ask what the best method to perform this string replacement would be. I have made backups of folders so that I may test on a smaller scale before applying to the larger scale that will affect my group members.
Example:
Change $oldPath to $newPath in all *.py files recursively contained in current directory.
i.e. $oldPath\common\file_referenced should become $newPath\common\file_referenced
Robustly using any awk in any shell on every Unix box and regardless of which characters your old or new directory paths contain and whether or not the final directory in either old or new could be a substring of another existing directory name:
$ cat file
\old\fashioned\common\file_referenced
$ oldPath='\old\fashioned'
$ newPath='\new\fangled\etc'
$ awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
1' "$oldPath" "$newPath" file
\new\fangled\etc\common\file_referenced
To update all .py files in a directory you could use GNU awk for -i inplace, or you could do for i in *.py; do awk '...' old new "$i" > tmp && mv tmp "$i"; done, or you could use find and/or xargs, etc. - any of the common Unix ways to process multiple files with any command.

Cannot connect to MySQL with Weka

I am trying to connect a database to Weka 3.6.13 in Linux Elementary OS.
First, I had a problem with JDBC connection, solved by this answer changing the /usr/bin/weka file.
Now, when I load the database, this error comes:
Unknown data type: INT. Add entry in weka/experiment/DatabaseUtils.props.
However, I am trying to use explorer only, this file doesn't even exists in my installation.
I installed via sudo apt install weka.
What should I do?
Look inside the directory where your weka.jar file resides, and check if there exists a file called DatabaseUtils.props.
The Weka wiki says:
Weka only looks for the DatabaseUtils.props file. If you take one of
the example files listed above, you need to rename it first.
My file is different I think the actual name does not really matter, it's the filename extension that matters.
In my version of this file there is a section that looks like this:
... (snip...
# mysql-conversion / type-mappings
CHAR=0
TEXT=0
VARCHAR=0
STRING=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BOOL=1
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
BIGINT=6
LONG=6
REAL=7
DATE=8
TIME=10
TIMESTAMP=11
#mappings for table creation
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DOUBLE=DOUBLE
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
#database flags
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true
setAutoCommit=true
createIndex=false
# All the reserved keywords for this database
Keywords=\
AND,\
ASC,\
BY,\
DESC,\
FROM,\
GROUP,\
INSERT,\
ORDER,\
SELECT,\
UPDATE,\
WHERE
# The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_
#flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
If you do a google search for this file, another guy has posted his on github. The weka Wiki or SVN/Git-Repo might also list an offfical version somewhere (cannot find it right now), or you can open your weka.jar file as a zip file and extract the .props file (/src/main/java/weka/experiment/DatabaseUtils.props.mysql).
In any case, Mysql exists in many different versions, and I think you can even switch the query engine inside mysql. So I cannot express any guarantees that any of these 2 .props files shown here really work for you. You should experiment a bit.

Fastest way to search a string in a mysql database

When searching a string in an mysql database without knowing in what table or column I might find it, I usually resort to
mysqldump --extended-insert=FALSE --complete-insert=TRUE dbname|grep SomeString
That dumps a lot of unneeded data, ignores indeces invented for that exact purpose, and makes it rather hard to localize the results. Is there a more convenient and more performant way? (Except installing 3rd party software.)
About duplicate questions: As a comment pointed out, I am not restricted to SQL queries, but will accept any solution - may it be a bash script, some CLI function I might have not seen, or grepping the DB file as suggested.
I found a find command that lists the file which contains a specific string:
find [path of data folder of database]/*.ibd -type f -exec grep -i '[search string]' {} +
Outputs when found:
Binary file [path]/[table].ibd matches
Doesn't outputs anything when not found.
Means:
find : linux find command
path : the path pointing to the database folder under mysql/data
directory which contains the table's ibd files
-type f : only search for files
-exec : execute this while listing
grep -i : case insensitive text search
string : string to search for
For others, read this:
https://serverfault.com/a/343177
Same usage here:
http://www.valleyprogramming.com/blog/linux-find-grep-commands-exec-combine-search

.hgignore regex syntax to ignore a specific file (e.g. "core") anywhere

Suppose I have a working directory like this:
t.c
core
multicore
test1/core
I want to ignore all "core" files.
If I use "/core$" (4) will get ignored but not (2).
If I use "^core$" (2) will get ignored but not (4)
If I use "core$" (2) and (4) will get ignored but so will (3) which is not what I want.
How do you do this?
planetmaker's answer, "use glob syntax", is simpler and is what I would usually recommend. There is, however, a regexp answer, and a minor flaw in the glob syntax version.
Mercurial uses Python regular expressions, so we have the (alt1|alt2|...) syntax available. Note that these are grouped.1 We can and should use (?:...) to avoid grouping when required, but for .hgignore, the grouping is irrelevant, so it is simpler (and much more readable) to just use the parentheses, and I do so where possible below.
We could just write:
^core$
/core$
to ignore the file core with nothing coming before it (first pattern) and to ignore a file with a name like test1/core (second pattern). This is a fine, but we can compress it a bit more using the alternation syntax. The leading ^ works even in an alternate within a group, as long as it is still, in effect, leading, so:
(^|/)core$
means the same thing and accomplish the job using regexp syntax.
Annoyingly, all of these patterns ignore all files in any directory named core (whether or not we use regexp vs glob syntax):
$ rm core
$ mkdir core
$ touch core/keepme
$ cat .hgignore
syntax: glob
core
$ hg status -A
? .hgignore
? multicore
? t.c
I core/keepme
I test1/core
The problem is that as soon as we say ignore (some pattern that matches a directory named core), if there are files in that directory that are currently untracked, Mercurial ignores them too. You can forcibly add the file—as with Git, once a file is tracked, any ignore-file pattern that matches it becomes irrelevant—but this does not help with additional files we stick into the directory:
$ hg add core/keepme
$ touch core/keep-me-too
$ hg status -A
A core/keepme
? .hgignore
? multicore
? t.c
I core/keep-me-too
I test1/core
Here, regular expressions can prove to be the answer. Python (and Perl) regexps allow "negative lookbehind", i.e., you can say "as long as some pattern does not appear". Hence we can replace the existing .hgignore contents with:
$ cat .hgignore
(?<!^core/).*/core$
and now we have this status:
$ hg status -A
A core/keepme
? .hgignore
? core/keep-me-too
? multicore
? t.c
I test1/core
This particular regular expression depends on the wanted core directory being named core at the top level (^core). If we wanted to keep core directories named core (top level) and a/subsys/core, we would write:
(?<!(^core|^a/subsys/core)/).*/core$
as our regular expression.
Constructing these regexps is something of an art form, and rarely worth a lot of effort. Glob syntax is almost always simpler, and as long as it suffices, I prefer it. It was once significantly slower than regexp syntax but this was fixed back around Mercurial 3.1.
1Grouped, here, means that in Python code, we may use the .groups() method to obtain the parts of the string matched by these parts of the regular expressions. Non-grouped (?:...) expressions do not affect the way .groups() gathers the parts of the strings. As in the paragraph to which this is a footnote, this is more a concern when writing Python (or Perl, or whatever) code, not when using these patterns in .hgignore or other parts of Mercurial.
Try to give the filename using glob syntax:
syntax: glob
core
It gives:
~/hg-test$ hg st -A
M .hgignore
? multicore
I core
I dir1/core

CVS -- Need command line to change status of file file from Binary to allow keyword substitution

I am coming into an existing project after several years of use. I have been attempting to add the nice keywords $Header$ and $Id$ so that I can identify the file versions in use.
I have come across several text files where these keywords did not expand at all. Investigation has determined that CVS thinks these files are BINARY and will not expand the keywords.
Is there anyway from a Linux Command Line invocation to permanently change the status of these files in the repository to cause keyword expansion? I'd be appreciative if you could tell me. Several attempts that I have tried have not succeeded.
cvs admin -kkv filename
will restore the file to the default text mode so keywords are expanded.
If you type
cvs log -h filename
(to show just the header and not the entire history), a binary file will show
keyword substitution: b
which indicates that keyword substitution is never done, while a text file will show
keyword substitution: kv
The CVSROOT/cvswrappers file can be used to specify the default new files you add, based on their names.