Cannot connect to MySQL with Weka - mysql

I am trying to connect a database to Weka 3.6.13 in Linux Elementary OS.
First, I had a problem with JDBC connection, solved by this answer changing the /usr/bin/weka file.
Now, when I load the database, this error comes:
Unknown data type: INT. Add entry in weka/experiment/DatabaseUtils.props.
However, I am trying to use explorer only, this file doesn't even exists in my installation.
I installed via sudo apt install weka.
What should I do?

Look inside the directory where your weka.jar file resides, and check if there exists a file called DatabaseUtils.props.
The Weka wiki says:
Weka only looks for the DatabaseUtils.props file. If you take one of
the example files listed above, you need to rename it first.
My file is different I think the actual name does not really matter, it's the filename extension that matters.
In my version of this file there is a section that looks like this:
... (snip...
# mysql-conversion / type-mappings
CHAR=0
TEXT=0
VARCHAR=0
STRING=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BOOL=1
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
BIGINT=6
LONG=6
REAL=7
DATE=8
TIME=10
TIMESTAMP=11
#mappings for table creation
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DOUBLE=DOUBLE
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
#database flags
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true
setAutoCommit=true
createIndex=false
# All the reserved keywords for this database
Keywords=\
AND,\
ASC,\
BY,\
DESC,\
FROM,\
GROUP,\
INSERT,\
ORDER,\
SELECT,\
UPDATE,\
WHERE
# The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_
#flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
If you do a google search for this file, another guy has posted his on github. The weka Wiki or SVN/Git-Repo might also list an offfical version somewhere (cannot find it right now), or you can open your weka.jar file as a zip file and extract the .props file (/src/main/java/weka/experiment/DatabaseUtils.props.mysql).
In any case, Mysql exists in many different versions, and I think you can even switch the query engine inside mysql. So I cannot express any guarantees that any of these 2 .props files shown here really work for you. You should experiment a bit.

Related

Admin import - Group not found

I am trying to load multiple csv files into a new db using the neo4j-admin import tool on a machine running Debian 11. To try to ensure there's no collisions in the ID fields, I've given every one of my node and relationship files.
However, I'm getting this error:
org.neo4j.internal.batchimport.input.HeaderException: Group 'INVS' not found. Available groups are: [CUST]
This is super frustrating, as I know that the INV group definitely exists. I've checked every file that uses that ID Space and they all include it.Another strange thing is that there are more ID spaces than just the CUST and INV ones. It feels like it's trying to load in relationships before it finishes loading in all of the nodes for some reason.
Here is what I'm seeing when I search through my input files
$ grep -r -h "(INV" ./import | sort | uniq
:ID(INVS),total,:LABEL
:START_ID(INVS),:END_ID(CUST),:TYPE
:START_ID(INVS),:END_ID(ITEM),:TYPE
The top one is from my $NEO4J_HOME/import/nodes folder, the other two are in my $NEO4J_HOME/import/relationships folder.
Is there a nice solution to this? Or have I just stumbled upon a bug here?
Edit: here's the command I've been using from within my $NEO4J_HOME directory:
neo4j-admin import --force=true --high-io=true --skip-duplicate-nodes --nodes=import/nodes/\.* --relationships=import/relationships/\.*
Indeed, such a thing would be great, but i don't think it's possible at the moment.
Anyway it doesn't seems a bug.
I suppose it may be a wanted behavior and / or a feature not yet foreseen.
In fact, on the documentation regarding the regular expression it says:
Assume that you want to include a header and then multiple files that matches a pattern, e.g. containing numbers.
In this case a regular expression can be used
while on the description of --nodes command:
Node CSV header and data. Multiple files will be
logically seen as one big file from the
perspective of the importer. The first line must
contain the header. Multiple data sources like
these can be specified in one import, where each
data source has its own header.
So, it appears that the neo4j-admin import considers the --nodes=import/nodes/\.* as a single .csv with the first header found, hence the error.
Contrariwise with more --nodes there are no problems.

file "(...).csv" not Stata file error in using merge command

I use Stata 12.
I want to add some country code identifiers from file df_all_cities.csv onto my working data.
However, this line of code:
merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3)
Gives me the error:
. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);
This is an attempted solution to my previous problem of the file being a dta file not working on this version of Stata, so I used R to convert it to .csv, but that also doesn't work. I assume it's because the command itself "using" doesn't work with csv files, but how would I write it instead?
Your intuition is right. The command merge cannot read a .csv file directly. (using is technically not a command here, it is a common syntax tag indicating a file path follows.)
You need to read the .csv file with the command insheet. You can use it like this.
* Preserve saves a snapshot of your data which is brought back at "restore"
preserve
* Read the csv file. clear can safely be used as data is preserved
insheet using "df_all_cities.csv", clear
* Create a tempfile where the data can be saved in .dta format
tempfile country_codes
save `country_codes'
* Bring back into working memory the snapshot saved at "preserve"
restore
* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3)
See how insheet is also using using and this command accepts .csv files.

CVS -- Need command line to change status of file file from Binary to allow keyword substitution

I am coming into an existing project after several years of use. I have been attempting to add the nice keywords $Header$ and $Id$ so that I can identify the file versions in use.
I have come across several text files where these keywords did not expand at all. Investigation has determined that CVS thinks these files are BINARY and will not expand the keywords.
Is there anyway from a Linux Command Line invocation to permanently change the status of these files in the repository to cause keyword expansion? I'd be appreciative if you could tell me. Several attempts that I have tried have not succeeded.
cvs admin -kkv filename
will restore the file to the default text mode so keywords are expanded.
If you type
cvs log -h filename
(to show just the header and not the entire history), a binary file will show
keyword substitution: b
which indicates that keyword substitution is never done, while a text file will show
keyword substitution: kv
The CVSROOT/cvswrappers file can be used to specify the default new files you add, based on their names.

mysql - set ~/.my.cnf location?

Is it possible to specify which .my.cnf file mysql client should use? I have 2 mysql instances running on different ports and want to only need to specify a filename with credentials.
As documented under Command-Line Options that Affect Option-File Handling:
When specifying file names, you should avoid the use of the “~” shell metacharacter because it might not be interpreted as you expect.
--defaults-extra-file=file_name
Read this option file after the global option file but (on Unix) before the user option file. If the file does not exist or is otherwise inaccessible, the program exits with an error. file_name is interpreted relative to the current directory if given as a relative path name rather than a full path name.
--defaults-file=file_name
Use only the given option file. If the file does not exist or is otherwise inaccessible, the program exits with an error. file_name is interpreted relative to the current directory if given as a relative path name rather than a full path name.
--defaults-group-suffix=str
If this option is given, the program reads not only its usual option groups, but also groups with the usual names and a suffix of str. For example, the mysql client normally reads the [client] and [mysql] groups. If the --defaults-group-suffix=_other option is given, mysql also reads the [client_other] and [mysql_other] groups.
Note that "to work properly, each of these options must be given before other options".

Natural ordering files in directory into a cell array using Octave

I have files being generated by another program/user that have names such as "jh-1.txt, jh-2.txt, ..., jh-100.txt, ..., jh-1024.txt". I'm extracting a column from these files, manipulating the data, and outputting to a new matrix. The only problem is that Octave is using ASCII ordering and not natural ordering when reading in the files. Thus, the output matrix is not ordered in a natural way. My question is, can Octave sort file names in a natural order? I'm getting file names in the standard method:
fileDirectory = '/path/to/directory';
filePattern = fullfile(fileDirectory, '*.txt'); % Selects only the txt files.
dataFiles = dir(filePattern); % Gets the info from the txt files in the directory.
baseFileName = {dataFiles.name}'; % Gets all the txt file names.
I can't rename the files because this is a script for another user. They are on a Windows machine and already have Octave installed with Cygwin and I don't want to make them use the command line more than they have to because they are unfamiliar with it. Alternatively, it would be nice to have the output with the file names in a column but, I haven't figured that one out either (bit of a noob with Octave myself). That way the user could use Excel (which they are familiar with) to sort the columns.
I don't think there's a built in natural sort in Octave. However, there is a natural sort submission on Mathwork's File Exchange. I've not used it, but the comments imply it works in Octave too.