I am working with a 6.0 MB JSON file that is used by about 100 other scripts on a server that will soon be set up. I want to compress the file by deleting all of the extra spaces, tabs, line breaks, and so on, but every tool I've found for minifying it can't handle the file's size (it's around 108,000 lines). I need to break the file up in a way that will be easy to reassemble once each chunk has been compressed. Does anyone know how to break it up efficiently? Help would be much appreciated!
Because Python could already handle the large file, I ended up using IPython and writing a .py script that dumps the JSON without the extra whitespace. To use the script, one would type:
$ ipython -i compression_script.py
This is the code within compression_script.py:
import json

filename = raw_input('Enter the file you wish to compress: ')  # name of the JSON file to minify (Python 2's raw_input)
newname = 'compressed_' + filename  # by default the new file is named 'compressed_' + filename

with open(filename) as fp:
    jload = json.load(fp)

# re-serialize with no indentation and no spaces after the separators
newfile = json.dumps(jload, indent=None, separators=(',', ':'))

with open(newname, 'w') as f:
    f.write(newfile)

print('Compression complete! Type quit to exit IPython')
This can also be done in PHP, along these lines:
<?php
// Stream the file in chunks and strip line breaks and tabs.
$myfile = fopen("newfile.txt", "w") or die("Unable to open file!");
$handle = fopen("somehugefile.json", "r");
if ($handle) {
    $i = 0;
    while (!feof($handle)) {
        $buffer = fgets($handle, 5096);
        $buffer = str_replace(array("\r", "\n", "\t"), "", $buffer);
        fwrite($myfile, $buffer);
        $i++;
        //var_dump($buffer);
        /*
        if ($i == 1000) {
            die('stop');
        }
        */
    }
    fclose($handle);
    fclose($myfile);
}
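If the whole document fits in memory (it is only about 6 MB here), a simpler alternative, sketched below with placeholder filenames, is to let PHP's JSON functions do the minifying; decoding and re-encoding also guarantees the output is still valid JSON:
<?php
// Minimal sketch: decode the file and re-encode it without any pretty-printing.
// Assumes the whole JSON document fits comfortably in memory.
$data = json_decode(file_get_contents("somehugefile.json"));
file_put_contents("compressed.json", json_encode($data));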
I'm trying to import CSV files of about 3 GB into phpMyAdmin. Some of them contain extra terminating characters, and the import then stops because the fields come out wrong.
I have two columns that I want to fill. I'm using : as the terminating character, but when a line contains more than one of them, the import just stops. I can't edit the CSV files by hand because they are too big. I want to skip the bad lines or find some other solution. How can I do this?
The CSV files look like this:
ahoj123:dublin
cat:::dog
pes::lolko
As a solution to your problem, I have written a simple PHP script that will "fix" your file for you.
It will open "test.csv" with contents of:
ahoj123:dublin
cat:::dog
pes::lolko
and convert it to the following, saving the result to "fixed_test.csv":
ahoj123:dublin
cat:dog
pes:lolko
Bear in mind that I am basing this on your example, so I am letting $last keep its EOL character, since there is no reason to remove or edit it.
PHP file:
<?php
$filename = "test.csv";
$handle = fopen($filename, "r+") or die("Could not open $filename" . PHP_EOL);
$keep = '';
while (!feof($handle)) {
    $line = fgets($handle);
    $elements = explode(':', $line);
    $first = $elements[0];               // everything before the first ':'
    $key = (count($elements) - 1);
    $last = $elements[$key];             // everything after the last ':' (keeps its EOL)
    $keep .= "$first:$last";
}
fclose($handle);

$new_filename = "fixed_test.csv";
$new_handle = fopen($new_filename, "w") or die("Could not open $new_filename" . PHP_EOL);
fwrite($new_handle, $keep);
fclose($new_handle);
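One caveat: with CSV files of around 3 GB, building the whole result in $keep may hit PHP's memory limit. Below is a hedged variant of the same idea that writes each fixed line straight to the output file instead of accumulating a string (same placeholder filenames as above):
<?php
$handle = fopen("test.csv", "r") or die("Could not open test.csv" . PHP_EOL);
$new_handle = fopen("fixed_test.csv", "w") or die("Could not open fixed_test.csv" . PHP_EOL);
while (!feof($handle)) {
    $line = fgets($handle);
    if ($line === false) {
        break;                      // end of file or read error
    }
    $elements = explode(':', $line);
    $first = $elements[0];          // keep only the first field
    $last = end($elements);         // ... and the last field (it keeps its EOL)
    fwrite($new_handle, "$first:$last");
}
fclose($handle);
fclose($new_handle);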
I have extracted a load of files from a web app, and all it gives me is a CSV file containing the blob of each file along with its filename.
What is the best way to convert these back into the actual files? I was thinking of using a PowerShell script.
I created a PowerShell script that extracts the blobs and filenames from the CSV and saves them in a given directory as raw files.
$path = 'D:\TEMP\Attachments\extract.csv'
$exportPath = 'D:\TEMP\Files'

Import-Csv $path | Foreach-Object {
    $b64 = $_.BODY                              # base64-encoded blob column
    $bytes = [Convert]::FromBase64String($b64)  # decode back to raw bytes
    $filename = $exportPath + "\" + $_.NAME     # filename column
    [IO.File]::WriteAllBytes($filename, $bytes) # write the original file to disk
}
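Note that this assumes the CSV export has a column named BODY holding the base64-encoded blob and a column named NAME holding the original filename; if your export uses different headers, adjust $_.BODY and $_.NAME accordingly.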
I use the following code in order to save a file:
$file = UploadedFile::getInstance($model, 'uploadedFile');//Get the uploaded file
$fp = fopen($file->tempName, 'r');
//$content = fread($fp, filesize($file->tempName));
$content = file_get_contents($file->tempName);
fclose($fp);
$model->content = $content;
$model->save();
With the code above I can save files up to approximately 1 MB, but larger files throw an error after $model->save():
PDOStatement::execute(): MySQL server has gone away
The column type is MEDIUMBLOB. What could the problem be?
The problem was max_allowed_packet = 1M inside my.ini.
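As an illustration, the change goes in the [mysqld] section of my.ini (the 64M below is only an example value; it just has to be larger than the biggest file you expect to store):

[mysqld]
max_allowed_packet = 64M

The MySQL server needs to be restarted afterwards, and the active value can be checked with SHOW VARIABLES LIKE 'max_allowed_packet';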
What we have: a single CSV file with the field names as a header row.
What we need:
Based on the size of the file, we need to split it into multiple smaller CSV files with the extension _00*.
Condition: if the file size is < 5 GB, then take no action.
If the file size is > 5 GB, split it into multiple files whose sizes range between 1 GB and just under 5 GB.
While splitting the file by size we must take care not to split a single record across files.
We need to preserve the header record of the source file and replicate it into each new file.
Along with each small file, a blank file with the same name but with the extension .ok needs to be created. It is just a notification that the file was created.
In the end, delete the source file, keep only the new files, and create one final file with the same name as the source file but with the extension .ok.
Example: source file: file_name_20160316.csv, size: 8.8 GB
Output :
file_name_20160316_001.csv ( size : 4 GB)
file_name_20160316_001.ok
file_name_20160316_002.csv ( size : 4.8 GB)
file_name_20160316_002.ok
file_name_20160316.ok
Please help us write Unix code for this.
#!/usr/bin/perl -p
BEGIN
{
    $dim = 5e9;
    $header = <>;                    # we need to preserve the header record
    exit if -s ARGV < $dim;          # if file size < 5 GB then take no action
    $headsize = $told = tell;
    # keep chunk sizes in the 1 GB to < 5 GB range
    $dim = ($dim + (-s _) / int(1 + (-s _) / $dim)) / 2 if (-s _) % $dim <= 1e9;
    ($base = $ARGV) =~ s/\.csv/_/;
    $extent = "000";
}
if (tell > $lim)                     # need a new file?
{
    $lim = $told + $dim - $headsize;
    open OK, ">$base$extent.ok" and close OK if $output;
    $output = $base . ++$extent . '.csv';
    open STDOUT, ">$output" or die "$output: $!\n";
    print $header;                   # replicate the header into each new file
}
$told = tell;
END
{
    open OK, ">$base$extent.ok" and close OK if $output;
    chop $base;
    unlink $ARGV and open OK, ">$base.ok" and close OK;
}
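For reference, a usage sketch, assuming the script above is saved as split_csv.pl (the name is arbitrary) in the same directory as the source file:

perl split_csv.pl file_name_20160316.csv

Because the -p switch is on the shebang line, Perl applies it even when the script is invoked this way: each data record is read and printed through the implicit loop, the BEGIN block works out the chunk size, and the END block writes the final .ok files and deletes the source file.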
So I have a folder called DATA, and it includes the following files: part1.html, part2.html, part3.html, HTML.htm, plain.html, and jojo.jsp.
Now I use the following command to open the DATA folder and pick out the files whose names contain .htm:
opendir(DIR, 'DATA');
my @dir = grep /\.htm/, readdir(DIR);
closedir(DIR);
It successfully prints out the names of the files with the .html extension. Now I wish to take the filtered HTML files and print their contents to the Cygwin terminal. I tried storing the files in a variable and using a foreach loop to open the first HTML file with a filehandle and print out the data in it; the loop would then repeat and do the same for all the other HTML files. But I seem to run into an error! Please help!
my $value = join(@dir);
print "$value\n";
foreach (@dir) {
    my $movies = my $value;
    open (FHD, $movies) || die " could not open $movies\n";
    my @movies = <FHD>;
    my $value2 = join(', ', @movies);
    print "$value2\n";
}
What's with this line?
my $movies = my $value;
You're making this a lot harder than it needs to be.
Just use glob to read the directory, since it will automatically include the path information for the files it finds.
use strict;
use warnings;
use autodie;

for my $html (glob('DATA/*.htm*')) {
    print "File: $html\n";
    open my $fh, '<', $html;
    print <$fh>;
}