Large CSV file impossible to handle

I am trying to access a 2.2 GB CSV file. Excel and R are useless for this. SAS could have worked, but it seems the file is corrupted and SAS cannot handle that. I am trying to do something with Python, but no luck so far. Any advice would be welcome, thanks.

Just for accessing the file: vim and gvim have large file plugins, depending on your OS.
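If you want to actually process it in Python rather than just view it, here is a minimal sketch that streams the file row by row, so the 2.2 GB never has to fit in memory. The file names, expected column count, and "skip malformed rows" policy are assumptions you would adapt to your data:

import csv

# Stream the CSV one row at a time so the whole file never has to fit in memory.
# "data.csv", "clean.csv" and EXPECTED_COLUMNS are placeholders for your data.
EXPECTED_COLUMNS = 10

good, bad = 0, 0
with open("data.csv", newline="", encoding="utf-8", errors="replace") as src, \
     open("clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Skip rows that look corrupted (wrong number of fields).
        if len(row) != EXPECTED_COLUMNS:
            bad += 1
            continue
        writer.writerow(row)
        good += 1

print("kept", good, "rows, skipped", bad, "malformed rows")

If you need analysis rather than just cleaning, pandas.read_csv with the chunksize argument processes the file in pieces in much the same way.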

Related

Import very large XML file (60 GB) into MySQL

I have an XML file of almost 60 GB that I would like to import into a MySQL database. I have root access to my server, but I don't know how to handle files of this size.
Do you guys have any ideas?
Normally I use Navicat, but it gave up...
Thanks
This is a little out of my area of knowledge, but would this work?
LOAD XML LOCAL INFILE '/pathtofile/file.xml'
INTO TABLE my_tablename(name, date, etc);
I know this sort of thing works with <1 GB files, but I've yet to work with large files.
Hope this helps!
EDIT
If that doesn't work for you, take a look at the LOAD DATA documentation: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
You could also use a command-line XML splitter to split it into manageably sized files first; Google to find one.
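If no ready-made splitter turns up, here is a rough Python sketch of the same idea using xml.etree.iterparse, which streams the file instead of loading it. The "record" tag name, file names, and chunk size are assumptions about your data:

import xml.etree.ElementTree as ET

# Stream-parse the huge file and write records out in fixed-size chunks,
# so memory use stays flat. "file.xml", the "record" tag and CHUNK_SIZE are
# placeholders for your actual data.
CHUNK_SIZE = 100000      # records per output file
chunk, count = 0, 0
out = open("chunk_0.xml", "w", encoding="utf-8")
out.write("<records>\n")

context = ET.iterparse("file.xml", events=("start", "end"))
_, root = next(context)  # keep a handle on the root so processed elements can be freed

for event, elem in context:
    if event == "end" and elem.tag == "record":
        out.write(ET.tostring(elem, encoding="unicode"))
        root.clear()     # free everything parsed so far
        count += 1
        if count % CHUNK_SIZE == 0:
            out.write("</records>\n")
            out.close()
            chunk += 1
            out = open("chunk_%d.xml" % chunk, "w", encoding="utf-8")
            out.write("<records>\n")

out.write("</records>\n")
out.close()

Each chunk could then be loaded with LOAD XML LOCAL INFILE as above.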

Access data created using Btrieve, stored in .DBK files

I asked a question here a while back and, using the answers, made some headway in figuring out how my DOS-based legacy software works.
My problem: the software uses Btrieve to read/store data in .dbk files. I know this because the DDF files reference these .dbk files. I found a number of ways to open Btrieve data, but only if it is stored in .btr files.
Does anyone have any hints? I've spent a considerable amount of time digging through resources but to no avail. All I need right now is to see the data stored in the .dbk files in a readable format.
If your DDFs reference the .DBK files, you should be able to use ODBC to read the data, provided you use a version of Btrieve/Pervasive that supports it.
Create the ODBC DSN pointing to your DDFs and Data Files.
Once created, use your favorite export tool to export the data to your favorite format.
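For example, here is a small Python sketch using the third-party pyodbc package to dump one table to CSV. The DSN name, table name, and output file are placeholders you would substitute:

import csv
import pyodbc

# Connect through the ODBC DSN created for the Pervasive/Btrieve DDFs.
# "BTRIEVE_DSN" and "MYTABLE" are placeholders for your DSN and table names.
conn = pyodbc.connect("DSN=BTRIEVE_DSN")
cursor = conn.cursor()
cursor.execute("SELECT * FROM MYTABLE")

with open("mytable.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row
    for row in cursor:
        writer.writerow(row)

conn.close()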

Conversion of GRIB and NetCDF to my database

I have downloaded "High Resolution Initial Conditions" climate forecast data for one day, it was in extension .tar.gz so I extracted it in my local directory and I get the files like in the attached image. I think, that the files without extension are GRIB data (because first word in them is "GRIB"). So I want to get data from the big files (GRIB and NetCDF formats containing climate data like temerature & pressure in grid) to my database, but they are binary. Can you recommend me some easy way for getting data from these files? I can't get any information about handling their datasets on their website.
Converting these files to .csv would be nice, but I can't find a program to convert the GRIB files.
Using Python and some available modules, it is simple...
The Enthought Python Distribution includes several packages, including netCDF4, to deal with NetCDF files!
I've never worked with GRIB files, but Google says another Python package exists, pygrib2.
Or you can use PyNIO, a Python package that lets you read and write netCDF3 and netCDF4 classic format files, and read GRIB1 and GRIB2 files.
I don't know the amount of data you have, but usually it is crazy to convert it to *.csv! Python is easy to learn and well suited to working with this kind of data (with the matplotlib package you can even plot it). Or, if you really need it in a *.csv, you can use Python to select a smaller domain, for example, or just the variables you need...
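For the NetCDF files, here is a minimal sketch with the netCDF4 package. The file name and variable names ("latitude", "longitude", "temperature") are assumptions, so list ds.variables first to see what your files actually contain:

import csv
from netCDF4 import Dataset

# Open the NetCDF file and dump one gridded variable to CSV.
# "forecast.nc" and the variable names are placeholders; inspect
# ds.variables to see what the file really holds.
ds = Dataset("forecast.nc")
print(ds.variables.keys())          # inspect available variables first

lats = ds.variables["latitude"][:]
lons = ds.variables["longitude"][:]
temp = ds.variables["temperature"][0, :, :]   # first time step

with open("temperature.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["lat", "lon", "temperature"])
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            writer.writerow([float(lat), float(lon), float(temp[i, j])])

ds.close()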
For conversion into text, look into http://www.cpc.ncep.noaa.gov/products/wesley/wgrib.html or http://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/
Both are C programs from one of the big names in GRIB.
I'm currently dealing with a similar issue.
In my case I'm trying to rely on the GrADS software, which can "easily" transform GRIB data into other formats.
If your dataset is not huge, then you can export it to csv using this tutorial.
My dataset is 80 GB of GRIB binary files, so I'm very restricted in what software I can use to handle it (no R unless I find a computer with more than 80 GB of RAM).

Is there any free tool to convert a file with more than 65,000 records from DBF format to CSV?

I need to convert a very large file from DBF format to CSV format. I have tried Microsoft Excel for the job, but the problem is that I cannot see more than 65,500 records when I open and export the file.
Microsoft Access couldn't open the file either.
I have found some shareware tools on Google by searching for "DBF to CSV". Have you tried any of these with very large files?
Also, any solution that could export to MySQL or PostgreSQL database formats would be welcome.
Thanks in advance for your responses, best regards,
https://github.com/SocialExplorer/FastDBF
"Also included here is a small utility that reads DBF files and outputs CSV files! "
Go to http://www.the-oasis.net/ftpmaster.php3?content=ftputils.htm
and look for this file: dbx130.zip
Bytes: 125,478 Date: 1993-03-22
dbMAX is an xBASE utility that will allow complete multi-user access to any xBASE databases and indexes. The program uses a CUA-type menu system with Brief(R)-style hot keys and can browse databases in up to 250 moveable, sizable windows. Almost every Clipper(R)/dBASE(R) command is available, allowing dbMAX to replace the dBASE Assist/Control Center or Computer Associates' DBU utility. dbMAX also has a partially open architecture, allowing programmers to create their own menus and operate on dbMAX internal data structures.
This utility has a DOS UI, but via the Copy function on its menu it allows you to export entire DBF tables in SDF or CSV format. I personally know it can handle a file with 3.8 million rows, so it should be able to handle your table.
Use OpenOffice - it's free and can handle a lot of rows. With that many rows, you might need to split the file, convert the pieces, and then reassemble them.
OpenOffice 3.0 Calc maxes out at 65K rows. I tried importing a large DBF into OpenOffice 3.0 Base, but it handed the job off to Calc :-(
Alternative: if you have Python 2.4 to 2.6, I can send you a copy of my soon-to-go-public DBF-reading module plus a DBF-to-CSV script. To get my e-mail address, search for "John Machin xlrd" [xlrd is my Excel XLS-reading package].
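Another Python option, if that module isn't available to you: the third-party dbfread package reads DBF records one at a time, so the 65K ceiling never applies. A minimal sketch, where "input.dbf", "output.csv" and the encoding are assumptions about your data:

import csv
from dbfread import DBF

# Stream records out of the DBF one at a time and write them to CSV,
# so table size is only limited by disk space. "input.dbf" is a placeholder.
table = DBF("input.dbf", encoding="latin-1")   # adjust the encoding to your data

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(table.field_names)          # header row
    for record in table:
        writer.writerow(list(record.values()))

print("wrote", len(table), "records")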

PST to CSV File Conversion

Does anyone know of a good tool that converts .pst to .csv files through command line?
Can you assume Outlook is installed on the computer? If so, I believe it can be background scripted using OLE or something similar. I've done file conversions through Excel using Ruby that way.
And here's a Perl example
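For the same OLE idea in Python, here is a rough sketch with the pywin32 package. It assumes Outlook is installed, that the PST is (or can be) attached to the profile, and that dumping received time, sender, and subject from the Inbox is enough - all assumptions to adapt:

import csv
import win32com.client

# Drive the installed Outlook via COM/OLE and dump basic message fields to CSV.
# If the .pst is not already in the profile, namespace.AddStore(r"C:\path\to\file.pst")
# can attach it (verify this for your Outlook version).
# 6 is olFolderInbox in the Outlook object model; change it for other folders.
outlook = win32com.client.Dispatch("Outlook.Application")
namespace = outlook.GetNamespace("MAPI")
inbox = namespace.GetDefaultFolder(6)

with open("inbox.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["received", "sender", "subject"])
    for item in inbox.Items:
        # Not every item is a mail message, so guard attribute access.
        try:
            writer.writerow([str(item.ReceivedTime), item.SenderName, item.Subject])
        except AttributeError:
            continue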
A solution I just stumbled across is:
libpst
It obviously doesn't convert straight to CSV, but it converts into a more manageable format.
Importing into Outlook and then exporting as CSV is still probably the quickest solution, but libpst would certainly be useful if all you have is the PST file and no Outlook.
One time only, or programmatically?
If one time only, import it into a mail program that handles mbox (e.g. Thunderbird), at which point you just have text files; manipulate as desired (see the sketch below).
Otherwise, no idea; best of luck.
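If you go the mbox route, "manipulate as desired" can be as little as this Python sketch with the standard-library mailbox module. The mbox path and the chosen headers are assumptions:

import csv
import mailbox

# Read the mbox that Thunderbird produced and write basic headers to CSV.
# "Inbox.mbox" is a placeholder for the actual mbox file path.
mbox = mailbox.mbox("Inbox.mbox")

with open("mail.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "from", "subject"])
    for message in mbox:
        writer.writerow([message["date"], message["from"], message["subject"]])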
You can always write a .NET application using CDO, MAPI, OOM or Redemption that does what you need.
I've written a complete Outlook exporter tool for my company, which you can view at http://www.tzunami.com