I am trying to load multiple CSV files into a new database using the neo4j-admin import tool on a machine running Debian 11. To try to ensure there are no collisions in the ID fields, I've given every one of my node and relationship files its own ID space.
However, I'm getting this error:
org.neo4j.internal.batchimport.input.HeaderException: Group 'INVS' not found. Available groups are: [CUST]
This is super frustrating, as I know that the INVS group definitely exists. I've checked every file that uses that ID space and they all include it. Another strange thing is that there are more ID spaces than just the CUST and INVS ones. It feels like it's trying to load relationships before it finishes loading all of the nodes for some reason.
Here is what I'm seeing when I search through my input files:
$ grep -r -h "(INV" ./import | sort | uniq
:ID(INVS),total,:LABEL
:START_ID(INVS),:END_ID(CUST),:TYPE
:START_ID(INVS),:END_ID(ITEM),:TYPE
The top one is from my $NEO4J_HOME/import/nodes folder, the other two are in my $NEO4J_HOME/import/relationships folder.
Is there a nice solution to this? Or have I just stumbled upon a bug here?
Edit: here's the command I've been using from within my $NEO4J_HOME directory:
neo4j-admin import --force=true --high-io=true --skip-duplicate-nodes --nodes=import/nodes/\.* --relationships=import/relationships/\.*
Indeed, such a thing would be great, but I don't think it's possible at the moment.
Anyway, it doesn't seem to be a bug.
I suppose it may be intended behavior and/or a feature that hasn't been implemented yet.
In fact, the documentation on regular expressions says:
Assume that you want to include a header and then multiple files that matches a pattern, e.g. containing numbers.
In this case a regular expression can be used
while the description of the --nodes option says:
Node CSV header and data. Multiple files will be
logically seen as one big file from the
perspective of the importer. The first line must
contain the header. Multiple data sources like
these can be specified in one import, where each
data source has its own header.
So, it appears that neo4j-admin import treats --nodes=import/nodes/\.* as a single big CSV file with the first header found, hence the error.
Conversely, with multiple --nodes arguments there are no problems.
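A working approach, then, is to give each header/file group its own --nodes or --relationships argument, for example (the file names here are invented for illustration):
neo4j-admin import --force=true --high-io=true --skip-duplicate-nodes \
    --nodes=import/nodes/invoices.csv \
    --nodes=import/nodes/customers.csv \
    --relationships=import/relationships/invoice_customer.csv \
    --relationships=import/relationships/invoice_item.csv
That way each data source keeps its own header, and all of the ID spaces defined in the node files are known to the importer before the relationships that reference them are parsed.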
I am running some simulations using R which, at the very end, produce a matrix with the outcome. Since I run them under different conditions, I use append in the write.table command, which writes the matrix into a CSV file.
For a while, everything seemed to work fine. But then yesterday, something in the output CSV file seemed wrong: the order of the simulations was upside down, and one column looked unrealistic. Additionally, the column title written underneath the troublesome column got a "3" instead of the column name.
Since this was the result of the last simulation, and hence was still saved in the last matrix created, I could check it. The first columns of the output were fine, but then there was a lack of correspondence between the last two columns in the file and the real output.
Writing the matrix to a file is the very last command in my program. Here is the command I use:
write.table(gen.dist, "FST2.csv", col.names = TRUE, row.names = FALSE, sep = ",", append = TRUE)
gen.dist is a numeric matrix.
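For reference, write.table() writes the header row again on every call when col.names = TRUE and append = TRUE (R warns "appending column names to file"), so repeated runs put extra header lines in the middle of the file. A minimal sketch of a guarded version, using the same file name as above:
first <- !file.exists("FST2.csv")  # header only on the very first write
write.table(gen.dist, "FST2.csv", col.names = first, row.names = FALSE,
            sep = ",", append = !first)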
Has anyone encountered a similar problem?
Thank you
I have a functional LMDB that, for test purposes, currently contains only 21 key / value records. I've successfully tested inserting and reading records, and I'm comfortable with the database working as intended.
However, when I use the mdb_stat and mdb_dump utilities, I see the following output, respectively:
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 1
VERSION=3
format=bytevalue
type=btree
mapsize=1073741824
maxreaders=126
db_pagesize=4096
HEADER=END
4d65737361676573
000000000000010000000000000000000100000000000000d81e0000000000001500000000000000ba1d000000000000
DATA=END
In particular, why would mdb_stat indicate only one entry when I have 21? Moreover, each entry comprises 1024 x 300 values at five bytes per value. mdb_dump obviously doesn't show anywhere near the 1,536,000 bytes I'd expect to see, yet the values I mdb_put() and mdb_get() on the fly are correct. Does anyone know what's going on?
The relationship between an operating system's directory and an LMDB environment's data.mdb and lock.mdb files is one-to-one.
If the LMDB environment (in the OS directory) has more than one database, then the environment also contains a separate LMDB database containing all of its named databases.
The mdb_stat and mdb_dump utilities contain minimal logic: when they are fed a given directory on the command line, they produce results only for the main database, which stores the database names, and not for the database(s) storing the actual data of interest.
4d65737361676573 is the ASCII for "Messages", which is the name of the table ("sub-db" in LMDB terminology) storing the actual data in your case.
The mdb_dump command only dumps the main db by default. You can use the -s option to dump that sub-db, i.e.
mdb_dump -s Messages
or you can use the -a option to dump all the sub-dbs.
Since you are using a sub-database, the number of entries in the main database corresponds to the number of sub-databases you've created (i.e. just 1).
Try using mdb_stat -a. This will show you a break-down of all the sub-databases (as well as the main DB). In this breakdown it will list the number of entries for each sub-database. Here you should see your 21 entries.
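If you want the same numbers programmatically, here is a minimal C sketch that opens the "Messages" sub-database and reads its entry count with mdb_stat() (return codes are unchecked for brevity, and the environment path "./testdb" is an assumption):
#include <stdio.h>
#include <lmdb.h>

int main(void) {
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_stat st;

    mdb_env_create(&env);
    mdb_env_set_maxdbs(env, 4);              /* allow named sub-databases */
    mdb_env_open(env, "./testdb", MDB_RDONLY, 0664);

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    mdb_dbi_open(txn, "Messages", 0, &dbi);  /* the sub-db from the dump */
    mdb_stat(txn, dbi, &st);
    printf("Entries in Messages: %zu\n", st.ms_entries);

    mdb_txn_abort(txn);
    mdb_env_close(env);
    return 0;
}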
I am trying to connect a database to Weka 3.6.13 on Linux (elementary OS).
First, I had a problem with the JDBC connection, which I solved by changing the /usr/bin/weka file as described in this answer.
Now, when I load the database, this error comes:
Unknown data type: INT. Add entry in weka/experiment/DatabaseUtils.props.
However, I am only using the Explorer, and this file doesn't even exist in my installation.
I installed it via sudo apt install weka.
What should I do?
Look inside the directory where your weka.jar file resides, and check if there exists a file called DatabaseUtils.props.
The Weka wiki says:
Weka only looks for the DatabaseUtils.props file. If you take one of
the example files listed above, you need to rename it first.
My file is different. I think the actual name does not really matter; it's the filename extension that matters.
In my version of this file there is a section that looks like this:
(snip)
# mysql-conversion / type-mappings
CHAR=0
TEXT=0
VARCHAR=0
STRING=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BOOL=1
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
BIGINT=6
LONG=6
REAL=7
DATE=8
TIME=10
TIMESTAMP=11
#mappings for table creation
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DOUBLE=DOUBLE
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
#database flags
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true
setAutoCommit=true
createIndex=false
# All the reserved keywords for this database
Keywords=\
AND,\
ASC,\
BY,\
DESC,\
FROM,\
GROUP,\
INSERT,\
ORDER,\
SELECT,\
UPDATE,\
WHERE
# The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_
#flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
If you do a Google search for this file, another person has posted theirs on GitHub. The Weka wiki or the SVN/Git repo might also list an official version somewhere (I cannot find it right now), or you can open your weka.jar file as a zip file and extract the .props file (/src/main/java/weka/experiment/DatabaseUtils.props.mysql).
In any case, MySQL exists in many different versions, and I think you can even switch the query engine inside MySQL. So I cannot guarantee that either of these two .props files will really work for you. You should experiment a bit.
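For example, one way to pull the MySQL template out of the jar and rename it to the name Weka looks for (a sketch: the jar location and the path inside the jar vary by packaging, so treat both as assumptions):
# extract the MySQL template and save it under the required name
unzip -p /usr/share/java/weka.jar weka/experiment/DatabaseUtils.props.mysql \
    > ~/DatabaseUtils.props
As far as I know, Weka checks the current directory and your home directory for DatabaseUtils.props, so keeping the renamed copy in $HOME should be enough.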
I'm new to neo4j.
I'm trying to load CSV files using import.bat with the shell (on Windows).
I have 500,000 nodes and 37 million relationships.
The import.bat is not working.
The command I run in the shell:
../neo4j-community-3.0.4/bin/neo4j-import \
--into ../neo4j-community-3.0.4/data/databases/graph.db \
--nodes:Chain import\entity.csv \
--relationships import\roles.csv
but I don't know where to keep the CSV files or how to use import.bat with the shell.
I'm not sure I'm in the right place:
neo4j-sh(?)$
(I've looked at a lot of examples; for me it just does not work.)
I tried to start the server from the command line and it's not working. This is what I did:
neo4j-community-3.0.4/bin/neo4j.bat start
I want to work with indexes. I set up the index, but when I try to use it, it's not working:
start n= node:Chain(entity_id='1') return n;
I set the properties:
node_keys_indexable=entity_id
and also:
node_auto_indexing=true
Without indexes, this query:
match p = (a:Chain)-[:tsuma*1..3]->(b:Chain)
where a.entity_id= 1
return p;
which tries to get one node with 3 levels of relationships, returned 49 relationships in 5 minutes. That's a lot of time!
Your import command looks correct. You point to the CSV files where they are, just like you point to the --into directory. If you're unsure, use fully qualified names like /home/me/some-directory/entities.csv.
What's the error? It's really hard to help you without knowing it.
Legacy indexes don't go well with the importer, so enabling legacy indexes afterwards doesn't index your data. Could you instead use a schema index (CREATE INDEX ...)?
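For example (a sketch; it assumes entity_id is stored as a number, matching the MATCH query in the question):
CREATE INDEX ON :Chain(entity_id);

MATCH p = (a:Chain)-[:tsuma*1..3]->(b:Chain)
WHERE a.entity_id = 1
RETURN p;
With the schema index in place, the planner can look up the starting :Chain node directly instead of scanning every node.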
I have files being generated by another program/user that have names such as "jh-1.txt, jh-2.txt, ..., jh-100.txt, ..., jh-1024.txt". I'm extracting a column from these files, manipulating the data, and outputting to a new matrix. The only problem is that Octave is using ASCII ordering and not natural ordering when reading in the files. Thus, the output matrix is not ordered in a natural way. My question is, can Octave sort file names in a natural order? I'm getting the file names in the standard way:
fileDirectory = '/path/to/directory';
filePattern = fullfile(fileDirectory, '*.txt'); % Selects only the txt files.
dataFiles = dir(filePattern); % Gets the info from the txt files in the directory.
baseFileName = {dataFiles.name}'; % Gets all the txt file names.
I can't rename the files because this is a script for another user. They are on a Windows machine and already have Octave installed with Cygwin, and I don't want to make them use the command line more than they have to, because they are unfamiliar with it. Alternatively, it would be nice to have the output with the file names in a column, but I haven't figured that one out either (bit of a noob with Octave myself). That way the user could use Excel (which they are familiar with) to sort the columns.
I don't think there's a built-in natural sort in Octave. However, there is a natural sort submission on MathWorks' File Exchange. I've not used it, but the comments imply it works in Octave too.
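If the file names always look like "jh-<number>.txt", as in the question, a dependency-free workaround is to sort on the extracted number. A minimal sketch, assuming exactly one numeric field per name:
fileDirectory = '/path/to/directory';
dataFiles = dir(fullfile(fileDirectory, '*.txt'));
names = {dataFiles.name}';
% pull the run of digits out of each name and sort on it numerically
nums = cellfun(@(s) str2double(regexp(s, '\d+', 'match', 'once')), names);
[~, order] = sort(nums);
sortedNames = names(order);
sortedNames could then be written out as its own column alongside the output matrix, so the user can cross-check the ordering in Excel.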