I have a very large (4 GB) CSV file that I cannot open in Excel or in other editors. The number of lines (rows) is nearly 3,000 and the number of columns is nearly 320,000.
One solution is to split the original file into smaller ones and open those in Excel or another editor.
The second solution is to transpose the original data and then open it in Excel.
I could not find a tool or script for transposing. I've found some scripts and free tools for splitting, but each of them splits the CSV by row count, which does not help here since the file only has about 3,000 rows.
Is there a way to split the original file into smaller ones of at most 15,000 columns each?
I tried to use:

import pandas as pd
pd.read_csv(file_path).T.to_csv(new_file_path, header=False)

But it takes ages to complete.
In the meantime I tried some Python scripts of my own, but all of them failed because of memory issues.
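For reference, the kind of streaming, column-wise split I have in mind looks like this; it never loads the whole file into memory (a rough sketch only, where the 15,000-column chunk size and the file names are placeholders):

import csv

CHUNK = 15000                        # columns per output file (placeholder)
src = 'big.csv'                      # placeholder path to the 4 GB file

with open(src, newline='') as fin:
    reader = csv.reader(fin)
    header = next(reader)
    # one output file per slice of at most CHUNK columns
    ranges = [(i, min(i + CHUNK, len(header))) for i in range(0, len(header), CHUNK)]
    outs = [open(f'part_{n}.csv', 'w', newline='') for n in range(len(ranges))]
    writers = [csv.writer(o) for o in outs]
    for w, (a, b) in zip(writers, ranges):
        w.writerow(header[a:b])
    for row in reader:               # stream row by row
        for w, (a, b) in zip(writers, ranges):
            w.writerow(row[a:b])
    for o in outs:
        o.close()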
The trial version of Delimit (http://www.delimitware.com/) handled the data perfectly.
I have what I am hoping is an easy thing ;)
I load a large dataset into a pandas dataframe from an Excel file.
I then load that dataframe into a MySQL table which has basically the same structure.
I have found that many of the cells in Excel have lots of trailing whitespace, which translates to whitespace in the columns of the table.
I am hoping there is a simple way to remove this whitespace before or while it is written to the database, without massive loops. (I know there is the 'strip' method, but I think I'd have to loop through all the fields/rows.)
Is there some simple switch that can be passed to read_excel or to_sql which would remove the whitespace automatically? (Likely daydreaming...)
I can also clean it up after it is loaded into the DB, but I'm looking for options before that.
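For reference, the kind of vectorized clean-up I am imagining between read_excel and to_sql looks something like this (the file name, table name and connection string are placeholders, and it assumes the object columns really do contain strings):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://user:password@localhost/mydb')  # placeholder connection string

df = pd.read_excel('source.xlsx')  # placeholder file name

# strip leading/trailing whitespace from every string column without explicit loops
str_cols = df.select_dtypes(include='object').columns
df[str_cols] = df[str_cols].apply(lambda s: s.str.strip())  # non-string values in these columns would become NaN

df.to_sql('my_table', engine, if_exists='append', index=False)  # placeholder table name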
Thanks!
I have 50 GB of sample-gene expression data that I want to store in MySQL. The data is divided into three txt files: one for samples, a second for genes, and a third with the sample-gene matrix that stores their expression values.
I tried three tables: one for samples, a second for genes, and a third with two foreign keys (sample_id, gene_id) and a field exp_value. But the problem is how I can store that matrix in this table.
Please read
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
You have data in text files; hopefully it is already formatted with separators. If it is, it's easy to import.
If you are using Linux, use a terminal such as Konsole; if you are using Windows, use CMD. It will take a while to import files of that size, so you just have to wait. Expect a lot of trial and error at first.
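To get the matrix into the (sample_id, gene_id, exp_value) table, one option is to first rewrite it as one long line per cell and then import that file with LOAD DATA. A rough sketch, assuming the matrix file is tab-separated with gene IDs in the first column and sample IDs in the header row (adjust to your actual layout):

import csv

with open('matrix.txt', newline='') as fin, \
        open('matrix_long.txt', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter='\t')
    writer = csv.writer(fout, delimiter='\t')
    samples = next(reader)[1:]                # header row holds the sample IDs
    for row in reader:                        # one gene per line, streamed, never loaded whole
        gene = row[0]
        for sample, value in zip(samples, row[1:]):
            writer.writerow([sample, gene, value])

The resulting matrix_long.txt has one (sample, gene, value) triple per line and can be loaded straight into the third table with LOAD DATA INFILE.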
I have a particular dataset in TSV format (tab-separated) that is one big txt file of around 100 GB (somewhere around 255 million rows). I have to filter and extract the relevant rows so I can work on them more easily. So far I know that Excel can't handle that many rows, and the familiar text editors either can't open the file or are very painful to use for tables. I've tried LogParser; a 36-minute query gave me a CSV output, but unfortunately the number of exported rows is well below what I guess is present in the data. I also get some parsing errors, and some columns in the exported sets are shifted. Do you have any other alternatives? Maybe I can somehow turn the data into an SQL database? Is that possible?
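One route I have been considering is a streaming pass in Python that filters the rows without ever holding the file in memory; the output could then also be loaded into an SQL database. A rough sketch (the keep() predicate, the column index and the file names are placeholders for my actual criteria):

import csv

def keep(row):
    # placeholder filter: keep rows whose third column equals 'XYZ'
    return row[2] == 'XYZ'

with open('data.tsv', newline='', encoding='utf-8') as fin, \
        open('filtered.tsv', 'w', newline='', encoding='utf-8') as fout:
    reader = csv.reader(fin, delimiter='\t')
    writer = csv.writer(fout, delimiter='\t')
    writer.writerow(next(reader))             # copy the header row
    for row in reader:                        # stream line by line
        if keep(row):
            writer.writerow(row)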
I have 50 CSV files, with up to 2 million records in each.
Every day I need to get 10,000 random records from each of the 50 files and make a new CSV file with all of them (10,000 × 50).
I cannot do it manually because it would take a lot of time. I've also tried Access, but because the database would be larger than 2 GB, I cannot use it.
I've tried CSVed as well, a good tool, but it still did not help me.
Could someone please suggest an idea or a tool to get random records from the files and build a new CSV file?
There are many languages you could use; I would use C# and do this:
1) Get the number of lines in the file (see: Lines in text file).
2) Generate 10,000 random numbers (unique, if you need that), with the maximum being the count from step 1 (see: Random without duplicates).
3) Pull the records chosen in step 2 from the file and write them to the new file.
4) Repeat for each file.
Other options, if you want to consider a database other than Access, are MySQL or SQL Server Express, to name a couple.
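For illustration, here is a rough sketch of those four steps (written in Python rather than C#, but the logic is the same; the file names and sample size are placeholders). It loads each file into memory, which is manageable for around 2 million rows per file:

import csv
import random

SAMPLE = 10000
files = [f'file_{i}.csv' for i in range(1, 51)]        # placeholder file names

with open('combined.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    header_written = False
    for path in files:                                  # step 4: repeat for each file
        with open(path, newline='') as fin:
            rows = list(csv.reader(fin))                # step 1: line count is len(rows)
        header, body = rows[0], rows[1:]
        if not header_written:
            writer.writerow(header)                     # keep one header in the combined file
            header_written = True
        # step 2: unique random indices, capped at the file's actual row count
        picks = random.sample(range(len(body)), min(SAMPLE, len(body)))
        for i in picks:                                 # step 3: pull the chosen records and write them out
            writer.writerow(body[i])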
I need to regularly merge data from multiple CSV files into a single spreadsheet by appending the rows from each source file. Only OpenOffice/LibreOffice is able to read these UTF-8 CSV files, which have quote-delimited fields containing newline characters.
Now, each CSV file has column headings, but the order of the columns varies from file to file. Some files also have missing columns, and some have extra columns.
I have my master list of column names, and the order in which I would like them all to go. What is the best way to tackle this? LibreOffice gets the CSV parsing right (Excel certainly does not). Ultimately the files will all go into a single merged spreadsheet. Every row from each source file must be kept intact, apart from the column ordering.
The steps also need to be handed over to a non-technical third party eventually, so I am looking for an approach that will not present too many technical hurdles to a non-expert.
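For reference, the approach I had in mind is a small pandas script: it reads each UTF-8 file (pandas copes with quoted fields containing newlines), reorders the columns to the master list, and appends everything into one output. A rough sketch, with the master column list and the file locations as placeholders:

import glob
import pandas as pd

master_columns = ['sku', 'name', 'price']              # placeholder master list, in the desired order

frames = []
for path in glob.glob('exports/*.csv'):                # placeholder location of the source files
    df = pd.read_csv(path, encoding='utf-8', dtype=str)
    # reindex drops the extra columns and fills the missing ones with blanks
    frames.append(df.reindex(columns=master_columns))

pd.concat(frames, ignore_index=True).to_csv('merged.csv', index=False, encoding='utf-8')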
Okay, I'm approaching this problem a different way. I have instead gone back to the source application (WooCommerce) to fix the export, so the spreadsheets list all the same columns, all in the same order, on every export. This does have other consequences that I need to follow up on, such as managing patches and trying to get the changes accepted by the source project. But it does avoid having to append CSV files with mismatched columns, which seems to be a common issue that no one has any real solution for (yes, I have searched, a lot).