I have tried the Cypher export to CSV, but the query itself takes too long, and it didn't seem practical since I need to export a whole database that ranges up to a few gigabytes in size.
Is there a tool that lets me export a huge database in fairly good time?
You should take a look at the APOC procedures. They are straightforward to install in your Neo4j environment and provide many additional functions, e.g. for exporting your results or data to a .csv file.
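As a rough sketch, assuming APOC is installed and file export is enabled (apoc.export.file.enabled=true), a whole-database export to CSV could be driven like this from the official Python driver (connection details and the output file name are placeholders):

```python
from neo4j import GraphDatabase

# Placeholder connection details -- adjust to your environment.
URI = "bolt://localhost:7687"
AUTH = ("neo4j", "password")

def export_all_to_csv():
    driver = GraphDatabase.driver(URI, auth=AUTH)
    with driver.session() as session:
        # apoc.export.csv.all streams every node and relationship to a CSV
        # file on the server; requires apoc.export.file.enabled=true.
        result = session.run(
            "CALL apoc.export.csv.all('full-export.csv', {useTypes: true})"
        )
        print(result.single())
    driver.close()

if __name__ == "__main__":
    export_all_to_csv()
```

If writing to the server filesystem is not allowed in your environment, APOC can also stream the export back to the client instead of writing a file, which may suit a locked-down setup better.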
We are trying to import a CSV file into an Access database via PowerShell. My input file is 1 GB in size, and it is getting difficult to iterate through each row and use an INSERT command. Any quick suggestions here are highly appreciated.
Thanks!!
As expressed by @AlbertD.Kallal: what is the reason to use PowerShell at all? ... I simply made an assumption that you wanted something that would run automatically, daily, unattended, as that is a typical reason.
If that is the case, then it really breaks down into two parts:
Make the import work manually in Access, and then set up that import to fire automatically upon start/open of the Access file (AutoExec).
Just use PowerShell to start/open the Access file daily (or whenever...).
Access is not designed to be open full time and run unattended, so this is the typical approach for using it in that mode.
OK, now, having stated there is no need for PowerShell: there are cases in which IT folks are using PowerShell to automate processes. So it is not "bad" to consider PowerShell, especially if it is already being used.
I only wanted to point out that PowerShell will not help performance-wise, and will probably be slower.
If you have (or had) to, say, schedule an import to occur every 15 minutes or whatever?
Then I suggest setting up a VBA routine in a standard code module in Access to do the import. You then use PowerShell, or a Windows script, to launch Access and call that import routine. So, the first step is to set up that routine in Access, even if using some kind of batch system to schedule that import routine to run.
So, you use the Windows scheduler.
It would: launch Access, run the VBA sub, shut down Access.
And using the Windows scheduler is quite robust. So, we don't need (or want) to keep Access running; we only launch it, run the import, and then shut down Access.
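To give a feel for that automation step: the sketch below uses Python with pywin32 for the COM calls (the PowerShell version would use New-Object -ComObject "Access.Application" in exactly the same way). The database path and the routine name ImportCsv are just placeholders for whatever you create in Access.

```python
import win32com.client  # pip install pywin32

ACCESS_DB = r"C:\Data\ImportJob.accdb"   # placeholder path to your database
IMPORT_ROUTINE = "ImportCsv"             # placeholder VBA sub in a standard module

def run_import():
    # Launch a hidden Access instance via COM automation.
    access = win32com.client.Dispatch("Access.Application")
    access.Visible = False
    try:
        access.OpenCurrentDatabase(ACCESS_DB)
        # Run the VBA routine that does the actual CSV import.
        access.Run(IMPORT_ROUTINE)
    finally:
        # Always shut Access down again -- we never leave it running.
        access.Quit()

if __name__ == "__main__":
    run_import()
```

The Windows Task Scheduler then simply runs this script (or the equivalent PowerShell/VBScript) on whatever interval you need.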
Next up, if the import process is "huge" or rather large, then on startup a temp accDB file can be created, and we import into that. We can then take the import table and send its data into the production data table (often column names are different, etc.). It is also of course much safer to import into that temp table, and better yet, we can delete that temp file afterwards, so we never suffer bloating or file-size problems (no need to compact + repair).
So, the first thing to do is manually import the CSV file using the Access UI. This ALSO allows you to create + set up an import spec. That import spec can thus remember the data types (currency, or often date/time columns).
Once we have the import working and the import spec created?
Then we can write code to do the same steps as above, and THEN take the imported table and put that data into the production data table.
It is not clear whether you "stage" the imported CSV into that temp table and then process that table into the real production data table, but I do suggest doing this
(it is too dangerous to try to import directly into the production data).
You also don't share right now what kind of pre-processing, or what additional code, is required after you do the import of that CSV (but, still, we assume now that such imports will go into a new temp table).
So, I would assume the steps are:
we import the CSV file using the built-in import ability of Access
we then send this data to the production table, perhaps with some code processing each row before we send that temp table to the production table.
Once we have done the import, we dump + delete the temp accDB file we used for the import, and thus we eliminate the huge data-bloat issue.
For the next run, we create that temp file again for a fresh import, and thus each time we start out with a nice empty database file.
So, the first question (and you can create a blank new database for this test): can you import the CSV file using Access? You want to do this, since such imports are VERY fast and VERY high speed. Even if the imported format is not 100% as you want, you do need to confirm whether the Access UI can import the CSV file. If it can, then we can adapt VBA commands to do the same thing; there is no use writing code if a simple CSV import via the Access UI can't be used.
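Once that manual import and its import spec exist, the import routine is essentially one TransferText call. Here is a minimal sketch of the automated version, again driving Access over COM from Python as in the earlier snippet (in the real VBA routine it would just be DoCmd.TransferText in the module); the spec name, table name, and file path are all placeholders.

```python
import win32com.client  # pip install pywin32

ACCESS_DB = r"C:\Data\ImportJob.accdb"   # placeholder temp/staging database
CSV_FILE = r"C:\Data\daily_feed.csv"     # placeholder CSV path
IMPORT_SPEC = "DailyFeedSpec"            # import spec created via the Access UI
STAGING_TABLE = "tblStaging"             # placeholder staging table name

acImportDelim = 0  # Access constant for a delimited text import

def import_csv():
    access = win32com.client.Dispatch("Access.Application")
    access.Visible = False
    try:
        access.OpenCurrentDatabase(ACCESS_DB)
        # Same call the VBA routine would make: DoCmd.TransferText.
        # The saved import spec carries the column data types.
        access.DoCmd.TransferText(
            acImportDelim, IMPORT_SPEC, STAGING_TABLE, CSV_FILE, True
        )
        # From here, an append query (or row-by-row code) would move the
        # staged rows into the production table.
    finally:
        access.Quit()

if __name__ == "__main__":
    import_csv()
```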
I have millions of documents in different collections in my database. I need to export them to a csv onto my local storage when I specify the collection name.
I tried mlcp export, but it didn't work. We cannot use CoRB for this because of some issues.
I want the CSV to be in such a format that if I run an mlcp import, I can restore all the documents just the way they were.
My first thought would be to use the MLCP archive feature, and not to export to CSV at all.
If you really want CSV, CoRB2 would be my first thought. It provides CSV export functionality out of the box, so it might be worth digging into why that didn't work for you.
DMSDK might work too, but it involves writing code to handle the CSV writing, which sounds cumbersome to me.
Last option that comes to mind would be Apache NiFi for which there are various MarkLogic Processors. It allows orchestration of data flow very generically. It could be rather overkill for your purpose though.
HTH!
ml-gradle has support for exporting documents and referencing a transform, which can convert each document to CSV - https://github.com/marklogic-community/ml-gradle/wiki/Exporting-data#exporting-data-to-csv .
Unless all of your documents are flat, you likely need some custom code to determine how to map a hierarchical document into a flat row. So a REST transform is a reasonable solution there.
You can also use a TDE template to project your documents into rows, and the /v1/rows endpoint can return results as CSV. That of course requires creating and loading a TDE template, and then waiting for the matching documents to be re-indexed.
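To illustrate the rows approach: the sketch below assumes a view named docs in schema main already exists from your TDE template, and that your MarkLogic version's /v1/rows endpoint accepts the Optic Query DSL body and a text/csv Accept header (the host, port, and credentials are placeholders).

```python
import requests
from requests.auth import HTTPDigestAuth

# Placeholder connection details and view/schema names.
HOST = "localhost"
PORT = 8000
AUTH = HTTPDigestAuth("admin", "admin")

# Optic Query DSL projecting the TDE view into rows.
QUERY = "op.fromView('main', 'docs')"

def export_rows_to_csv(path="export.csv"):
    response = requests.post(
        f"http://{HOST}:{PORT}/v1/rows",
        data=QUERY,
        auth=AUTH,
        headers={
            "Content-Type": "application/vnd.marklogic.querydsl+javascript",
            "Accept": "text/csv",
        },
        stream=True,
    )
    response.raise_for_status()
    # Stream the CSV response straight to disk.
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=65536):
            out.write(chunk)

if __name__ == "__main__":
    export_rows_to_csv()
```

Keep in mind this only exports what the TDE template projects into rows, so it is not a round-trippable backup in the way the MLCP archive format is.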
We are finally moving from Excel and .csv files to databases. Currently, most of my Tableau files are connected to large .csv files (.twbx).
Is there any performance differences between PostgreSQL and MySQL in Tableau? Which would you choose if you were starting from scratch?
Right now, I am using pandas to join files together and create a new .csv file based on the join. (For example, I take a 10-million-row file, drop duplicates and create a primary key, then join it on the same key with a 5-million-row file, then export the new 'consolidated' file to .csv and connect Tableau to it. Sometimes the joins are complicated, involving dates or times and several columns.)
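To make that concrete, the current pandas step looks roughly like this (file and column names here are simplified placeholders):

```python
import pandas as pd

# Simplified placeholders for the real files and key columns.
big = pd.read_csv("big_file.csv")        # ~10 million rows
small = pd.read_csv("small_file.csv")    # ~5 million rows

# Drop duplicates so the key can act as a primary key, then join.
big = big.drop_duplicates(subset=["key"])
consolidated = big.merge(small, on="key", how="left")

# Export the joined result for Tableau to read.
consolidated.to_csv("consolidated.csv", index=False)
```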
I assume I can create a view in a database and then connect to that view rather than creating a separate file, correct? Each of my files could instead be a separate table which should save space and allow me to query dates rather than reading the whole file into memory with pandas.
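In other words, something like the following is what I imagine on the database side (table, view, and connection names are just placeholders), using pandas and SQLAlchemy to load the same files and define the view Tableau would connect to:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; MySQL would use a mysql+pymysql:// URL the same way.
engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/analytics")

# One-off load of each file into its own table (later refreshes could
# TRUNCATE + append instead of replacing, so the view stays intact).
pd.read_csv("big_file.csv").drop_duplicates(subset=["key"]).to_sql(
    "big_table", engine, if_exists="replace", index=False
)
pd.read_csv("small_file.csv").to_sql(
    "small_table", engine, if_exists="replace", index=False
)

# The join lives in the database as a view; Tableau connects to the view
# instead of a consolidated .csv file. "extra_col" is a placeholder column.
with engine.begin() as conn:
    conn.execute(text("""
        CREATE VIEW consolidated AS
        SELECT b.*, s.extra_col
        FROM big_table b
        LEFT JOIN small_table s ON s.key = b.key
    """))
```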
Some of the people using the RDBMS would be completely new to databases in general (dashboards here are just Excel files, no normalization, formulas in the raw data sheet, etc. It's a mess), so hopefully either choice has good documentation to lessen the learning curve (mainly inserting new data and selecting data, not the actual database design).
Both will work fine with Tableau. In fact, Tableau's own internal repository database runs on Postgres.
Between the two, I think Postgres is more suitable for a central data warehouse. MySQL (before version 8.0) doesn't support certain SQL features such as Common Table Expressions and window functions.
Also, if you’re already using Pandas, Postgres has a built-in Python extension called PL/Python.
However, if you're looking to store a small amount of data and get to it really fast without using advanced SQL, MySQL would be a fine choice, but Postgres will give you a few more options moving forward.
As stated, either database will work and Tableau is basically agnostic to the type of database that you use. Check out https://www.tableau.com/products/techspecs for a full list of all native (inbuilt & optimized) connections that Tableau Server and Desktop offer. But, if your database isn't on that list you can always connect over ODBC.
Personally, I prefer Postgres over MySQL (I find it really easy to use psycopg2 to write to Postgres from Python), but mileage will vary.
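For example, bulk-loading a CSV from Python into Postgres with psycopg2's COPY support is just a few lines (connection details, table, and file names below are made up, and the table must already exist with matching columns):

```python
import psycopg2

# Placeholder connection details and object names.
conn = psycopg2.connect(
    host="localhost", dbname="analytics", user="user", password="pass"
)

with conn, conn.cursor() as cur, open("consolidated.csv") as f:
    # COPY ... FROM STDIN is far faster than row-by-row INSERTs.
    cur.copy_expert(
        "COPY consolidated_table FROM STDIN WITH (FORMAT csv, HEADER true)",
        f,
    )

conn.close()
```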
I have a large CSV file (5.4 GB) of data. It's a table with 6 columns and a lot of rows. I want to import it into MySQL across several tables. Additionally, I have to do some transformations to the data before import (e.g. parse a cell and insert the parts into several table values, etc.). I could write a script that does a transformation and inserts one row at a time, but that would take weeks to import the data. I know there is LOAD DATA INFILE for MySQL, but I am not sure how, or if, I can do the needed transformations in SQL.
Any advice on how to proceed?
In my limited experience you won't want to use the Django ORM for something like this. It will be far too slow. I would write a Python script to operate on the CSV file, using Python's csv library. And then use the native MySQL facility LOAD DATA INFILE to load the data.
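As a sketch of that two-step approach (all table, column, and file names here are invented, and split_cell stands in for whatever parsing you actually need):

```python
import csv
import mysql.connector  # pip install mysql-connector-python

SOURCE = "big_input.csv"

def split_cell(value):
    # Stand-in for the real parsing logic, e.g. splitting a packed field
    # into the parts destined for different tables.
    return value.split("|")

# Step 1: stream the big CSV once and write one intermediate CSV per target table.
with open(SOURCE, newline="") as src, \
     open("table_a.csv", "w", newline="") as out_a, \
     open("table_b.csv", "w", newline="") as out_b:
    reader = csv.reader(src)
    writer_a = csv.writer(out_a)
    writer_b = csv.writer(out_b)
    next(reader)  # skip the header row (assumes the source has one)
    for row in reader:
        part1, part2 = split_cell(row[2])[:2]
        writer_a.writerow([row[0], row[1], part1])
        writer_b.writerow([row[0], part2, row[3]])

# Step 2: bulk-load each intermediate file with LOAD DATA, which is far
# faster than per-row INSERTs.
cnx = mysql.connector.connect(
    host="localhost", user="user", password="pass",
    database="mydb", allow_local_infile=True,
)
cur = cnx.cursor()
for table, path in [("table_a", "table_a.csv"), ("table_b", "table_b.csv")]:
    cur.execute(
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n'"
    )
cnx.commit()
cnx.close()
```

LOAD DATA can also do simple transformations itself via user variables and a SET clause, but anything much beyond that is easier to handle in the Python step.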
If the Python script to massage the CSV file is too slow you may consider writing that part in C or C++, assuming you can find a decent CSV library for those languages.
I am building my first database driven website with Drupal and I have a few questions.
I am currently populating a Google Docs spreadsheet with all of the data I want to eventually be able to query from the website (after it's imported). Is this the best way to start?
If this is not the best way to start what would you recommend?
My plan is to populate the spreadsheet, then import it as a CSV into the MySQL DB via the CCK Node.
I've seen two ways to do this.
http://drupal.org/node/133705 (importing data into CCK nodes)
http://drupal.org/node/237574 (Inserting data using spreadsheet/csv instead of SQL insert statements)
Basically, my question is: what is the best way to gather and then import data into Drupal?
Thanks in advance for any help, suggestions.
There's a comparison of the available modules at http://groups.drupal.org/node/21338
In the past, when I've done this, I simply wrote code to do it on cron runs (see http://drupal.org/project/phorum for an example framework that you could strip down and build back up to do what you need).
If I were to do this now I would probably use the http://drupal.org/project/migrate module where the philosophy is "get it into MySQL, View the data, Import via GUI."
There is a very good module for this: Node Import. It allows you to take your Google Docs spreadsheet and import it as a .csv file.
It's really easy to use: the module lets you map your .csv columns to the node fields you want them to go to, so you don't have to worry about putting your columns in a particular order. Also, if some records have errors, it will spit out a .csv listing those records and what caused each error, but it will still import all the good records.
I have imported up to 3000 nodes with this method.