Importing MySQL database to Neo4j - mysql

I have a MySQL database on a remote server which I am trying to migrate into a Neo4j database. For this I dumped the individual tables into CSV files and am now planning to use the LOAD CSV functionality to create graphs from the tables.
How does loading each table preserve the relationship between tables?
In other words, how can I generate a graph for the entire database and not just a single table?

Load each table as a CSV
Create indexes on your relationship field (Neo4j only does single property indexes)
Use MATCH() to locate related records between the tables
Use MERGE(a)-[:RELATIONSHIP]->(b) to create the relationship between the tables.
If you run this all at once, it will create one large transaction that will not run to completion and will most likely crash with a heap error. To get around that, load the CSV first, then create the relationships in batches of 10K-100K per transaction.
One way to accomplish that goal is:
MATCH (a:LabelA)
MATCH (b:LabelB {id: a.id}) WHERE NOT (a)-[:RELATIONSHIP]->(b)
WITH a, b LIMIT 50000
MERGE (a)-[:RELATIONSHIP]->(b)
This finds matching :LabelA and :LabelB records that are not yet related and creates the relationship for the first 50,000 pairs it finds. Running it repeatedly will eventually create all the relationships you want.
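The "run it repeatedly" step can be automated with a small loop. The following is only a sketch: `run_batch` is a hypothetical callable (not part of any Neo4j driver) that you would implement to execute the statement with your driver of choice and return the `created` count. The `RETURN count(*)` line and the `$batch_size` parameter are additions to the query above so the loop knows when to stop.

```python
from typing import Callable

# Same statement as above, with the batch size as a parameter and a count
# returned so the caller can tell when nothing is left to link.
BATCH_QUERY = """
MATCH (a:LabelA)
MATCH (b:LabelB {id: a.id}) WHERE NOT (a)-[:RELATIONSHIP]->(b)
WITH a, b LIMIT $batch_size
MERGE (a)-[:RELATIONSHIP]->(b)
RETURN count(*) AS created
"""


def link_in_batches(
    run_batch: Callable[[str, int], int],
    batch_size: int = 50000,
    max_rounds: int = 10000,
) -> int:
    """Run the batch statement until a round processes zero pairs.

    run_batch(query, batch_size) is a hypothetical hook: it should run the
    query in its own transaction and return the 'created' value.
    """
    total = 0
    for _ in range(max_rounds):  # hard cap so a bug cannot loop forever
        created = run_batch(BATCH_QUERY, batch_size)
        if created == 0:
            break
        total += created
    return total
```

Keeping each round in its own transaction is what keeps the heap usage bounded; the loop itself only counts.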

Related

SSIS Script component - Reference data validation

I am in the process of extending an SSIS package, which takes in data from a text file, 600,000 lines of data or so, modifies some of the values in each line based on a set of business rules and persists the data to a database, database B. I am adding in some reference data validation, which needs to be performed on each row before writing the data to database B. The reference data is stored in another database, database A.
The reference data in database A is stored in seven different tables; each table has only 4 or 5 columns of type varchar. Six of the tables contain < 1 million records and the seventh has 10+ million rows. I don't want to keep hammering the database for each line in the file and I just want to get some feedback on my proposed approach and ideas on how best to manage the largest table.
The reference data checks will need to be performed in the script component, which acts as a source in the data flow. It has an ADO.NET connection. On pre-execute, I am going to retrieve the reference data from database A (the tables which have < 1 million rows) using the ADO.NET connection, loop through the rows using a SqlDataReader, convert them to .NET objects, and add them to dictionaries, one for each table.
As I process each line in the file, I can use the dictionaries to perform the reference data validation. Is this a good approach? Anybody got any ideas on how best to manage the largest table?

Way to create and load multiple tables of same schema

I have 200 tab-delimited files which I want to load into a MySQL database. Is there any way to automate the CREATE TABLE command for creating the schema for 200 tables, and to load those 200 tables automatically?
Otherwise I would have to run the CREATE TABLE query and the load 200 times each, so I am looking for a way to automate it.
The create table command creates one table. You can run 200 create tables in one sql script, but the create table schema would have to be there for each table.
The only way you could do multiple create tables is if all your tables were exactly the same. You could use a FOR LOOP to run the create table sql as many times as you want. The only thing is, if you have more than one table that is exactly the same, you have other problems. So the answer is no.
There are various software tools that can import your tab-delimited files and create the tables for you, but you will still have to import 200 times.
On the plus side, you only have to import them once. At that point you can easily export all the tables to a single sql file. You will now be at a single import of your tables for the future.
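If the files really do share one schema, as the question title suggests, the "200 create tables in one sql script" idea can be generated rather than typed. This is a rough sketch under assumptions: the column list in `COLUMNS` is a made-up placeholder, the table name is taken from the file name, and the paths are assumed to contain no quotes or backslashes.

```python
from pathlib import Path

# Hypothetical column list shared by every file; replace with the real schema.
COLUMNS = "id INT, name VARCHAR(255), value DOUBLE"


def table_name_for(path: str) -> str:
    """Derive a table name from the file name; anything that is not a
    letter, digit or underscore becomes an underscore."""
    stem = Path(path).stem
    return "".join(ch if ch.isalnum() or ch == "_" else "_" for ch in stem)


def build_script(paths):
    """Return one SQL script with a CREATE TABLE and a LOAD DATA per file."""
    statements = []
    for path in paths:
        table = table_name_for(path)
        statements.append(f"CREATE TABLE `{table}` ({COLUMNS});")
        statements.append(
            f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE `{table}` "
            "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n';"
        )
    return "\n".join(statements)
```

Something like `build_script(sorted(str(p) for p in Path("data").glob("*.tsv")))` would produce the script text, which can then be saved and run with the mysql client. Whether 200 identically shaped tables is a good design is a separate question, as noted above.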

Performing a Bulk insert into two tables

I have some statistical studies that I would like to import into a MySQL database. The studies have numerous variables, each of which is used to create a column in my database. I have a CSV file of all the data in my studies that I would like to import into my database as well.
Some of the studies have greater than 1000 variables in them. This means there will be more than 1000 columns in my table, which I know is the limit in MySql. Because of this I have to create multiple tables for my study and combine them using a view to see all the variables at once.
Does this mean that I will have to have multiple CSV files as well (one for each 1000 column table) or is there some way to perform a bulk insert from a CSV file into two tables?
You will certainly need multiple tables. I would write a script to read the CSV files and then write the data to the database.
However, before just dumping the data, I would look for opportunities to normalize/rationalize the dataset. You may discover that normalisation may actually help in your analysis.
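A script along those lines might split each wide row into narrower groups of columns, one group per table. The sketch below makes an assumption the question does not state: that the CSV has a key column (called `subject_id` in the usage example, a hypothetical name) which is repeated in every group so the tables can be joined back together in the view.

```python
import csv
import io


def column_groups(header, key, max_columns):
    """Split the header into groups of at most max_columns names,
    each group starting with the key column."""
    if max_columns < 2:
        raise ValueError("max_columns must leave room for the key and one variable")
    others = [name for name in header if name != key]
    step = max_columns - 1  # one slot per group is reserved for the key
    return [[key] + others[i:i + step] for i in range(0, len(others), step)]


def split_rows(csv_text, key, max_columns):
    """Return (groups, tables): tables[i] holds the rows for groups[i]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = column_groups(reader.fieldnames, key, max_columns)
    tables = [[] for _ in groups]
    for row in reader:
        for table, group in zip(tables, groups):
            table.append([row[name] for name in group])
    return groups, tables
```

For example, `split_rows(text, "subject_id", 1000)` would give one list of rows per 1000-column table; each list can then be written to its own CSV file or inserted directly.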

java and mysql load data infile misunderstanding

Thanks for viewing this. I need a little bit of help for this project that I am working on with MySql.
For part of the project I need to load a few things into a MySql database which I have up and running.
The info that I need, for each column in the table Documentation, is stored into text files on my hard drive.
For example, one column in the documentation table is "ports" so I have a ports.txt file on my computer with a bunch of port numbers and so on.
I tried to run this mysql script through phpMyAdmin which was
LOAD DATA INFILE 'C:\\ports.txt' INTO TABLE `Documentation` (`ports`).
It ran successfully, so I went on to the other load I needed, which was
LOAD DATA INFILE 'C:\\vlan.txt' INTO TABLE `Documentation` (`vlans`)
This also completed successfully, but it added all the rows to the vlan column AFTER the last entry to the port column.
Why did this happen? Is there anything I can do to fix this? Thanks
Why did this happen?
LOAD DATA inserts new rows into the specified table; it doesn't update existing rows.
Is there anything I can do to fix this?
It's important to understand that MySQL doesn't guarantee that tables will be kept in any particular order. So, after your first LOAD, the order in which the data were inserted may be lost & forgotten - therefore, one would typically relate such data prior to importing it (e.g. as columns of the same record within a single CSV file).
You could LOAD your data into temporary tables that each have an AUTO_INCREMENT column and hope that such auto-incremented identifiers remain aligned between the two tables (MySQL makes absolutely no guarantee of this, but in your case you should find that each record is numbered sequentially from 1); once there, you could perform a query along the following lines:
INSERT INTO Documentation SELECT port, vlan FROM t_Ports JOIN t_Vlan USING (id);
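The other route mentioned above, relating the data before importing it, can be done with a few lines of script. This is a minimal sketch that assumes line N of the ports file belongs with line N of the vlan file, which is exactly the assumption the question makes implicitly; the function name is made up for illustration.

```python
import csv
import io
from itertools import zip_longest


def pair_lines(ports_text, vlans_text):
    """Combine two one-value-per-line texts into CSV text with one
    'port,vlan' record per line, pairing values by line position."""
    ports = ports_text.splitlines()
    vlans = vlans_text.splitlines()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # If one file is shorter, the missing side is left as an empty field.
    for port, vlan in zip_longest(ports, vlans, fillvalue=""):
        writer.writerow([port, vlan])
    return out.getvalue()
```

The combined text can be saved to a single file and loaded with one LOAD DATA statement that names both columns and uses FIELDS TERMINATED BY ',', so each port and its vlan land in the same row.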

How to use load data Infile to insert into multiple tables?

I use a Python program which inserts many new entries into a database; these new entries are spread across multiple tables.
I'm using LOAD DATA INFILE to load the file, but this solution works for only one table, and I would rather not do this multiple times.
I found http://forge.mysql.com/worklog/task.php?id=875 but I'm not quite sure whether it is already implemented.
I am doing exactly what you are trying to do as follows:
Step 1: Create a temp table (holding all the fields of the import file)
Step 2: LOAD DATA LOCAL INFILE -> into the temp table
Step 3: INSERT INTO Table1 ( fieldlist ) SELECT ( matching fieldlist ) FROM TempTable ... include JOINS, WHERE, and ON DUPLICATE KEY UPDATE as necessary
Step 4: Repeat step 3 with the second table insert query and so on.
Using this method I am currently importing each of my 22MB data files, and parsing them out to multiple tables (6 tables, including 2 audit/changes tables)
Without knowing your table structure and data file structure it is difficult to give you a more detailed explanation, but I hope this helps get you started
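To make the staging pattern concrete, here is a self-contained sketch. It deliberately swaps in SQLite (Python's built-in `sqlite3`) for MySQL so it runs without a server: `executemany` stands in for LOAD DATA LOCAL INFILE, and `INSERT OR IGNORE` stands in for MySQL's duplicate-key handling. The table and column names are invented for the example.

```python
import sqlite3


def import_via_staging(rows):
    """Load rows into a staging table, then fan them out to two tables."""
    conn = sqlite3.connect(":memory:")
    # Step 1: a staging table holding every field of the import file,
    # plus the (hypothetical) destination tables.
    conn.executescript("""
        CREATE TABLE staging (customer TEXT, product TEXT, quantity INTEGER);
        CREATE TABLE customers (name TEXT PRIMARY KEY);
        CREATE TABLE orders (customer TEXT, product TEXT, quantity INTEGER);
    """)
    # Step 2: bulk load into staging (stand-in for LOAD DATA LOCAL INFILE).
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    # Step 3: first destination table, skipping names already present.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name) "
        "SELECT DISTINCT customer FROM staging"
    )
    # Step 4: second destination table, from the same staging data.
    conn.execute(
        "INSERT INTO orders (customer, product, quantity) "
        "SELECT customer, product, quantity FROM staging"
    )
    conn.execute("DROP TABLE staging")
    conn.commit()
    return conn
```

The file is read only once; every additional destination table costs one more INSERT ... SELECT rather than another pass over the file.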
Loading data from a local file to insert new data across multiple tables isn't yet supported (v 5.1).
I don't think LOAD DATA can do that, but why not duplicate the table after importing?
See
Duplicating table in MYSQL without copying one row at a time
Or, if you can go outside mySQL, Easiest way to copy a MySQL database?