I have a Spark job that, once done, uploads its output to S3 in JSON format.
dataframe.write.mode(SaveMode.Overwrite).json(file_path)
Yesterday, though, one JSON file it uploaded was incomplete; the rest looked fine as usual. Below is a snippet of the logs about that one file, extracted from two of the many log files generated for that job run.
Log file 1:
20/04/07 13:12:41 INFO MultipartUploadOutputStream: uploadPart: partNum 1 of 's3://bucket/part-00072-50d3246e-e18c-4058-9e1c-ad714305c92f-c000.json' from local file '/mnt/s3/emrfs-5360609960688228490/0000000000', 134217728 bytes in 5577 ms, md5: qg7f22UwVchHRejYe+41GQ== md5hex: aa0edfdb653055c84745e8d87bee3519
20/04/07 13:12:43 INFO MultipartUploadOutputStream: uploadPart: partNum 2 of 's3://bucket/part-00072-50d3246e-e18c-4058-9e1c-ad714305c92f-c000.json' from local file '/mnt/s3/emrfs-5360609960688228490/0000000001', 69266128 bytes in 1322 ms, md5: hOJmAWIoAMs2EtyBCuUw2g== md5hex: 84e26601622800cb3612dc810ae530da
20/04/07 13:12:44 INFO Executor: Finished task 72.0 in stage 549.0 (TID 413542). 212642 bytes result sent to driver
20/04/07 13:12:44 INFO DefaultMultipartUploadDispatcher: Completed multipart upload of 2 parts 203483856 bytes
20/04/07 13:12:44 INFO SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_20200407131212_0549_m_000072_0
Log file 2:
20/04/07 13:12:37 INFO Executor: Running task 72.1 in stage 549.0 (TID 413637)
20/04/07 13:12:44 INFO Executor: Executor is trying to kill task 72.1 in stage 549.0 (TID 413637), reason: another attempt succeeded
20/04/07 13:12:44 INFO MultipartUploadOutputStream: uploadPart: partNum 2 of 's3://bucket/part-00072-50d3246e-e18c-4058-9e1c-ad714305c92f-c000.json' from local file '/mnt/s3/emrfs-3565489585808272492/0000000001', 1 bytes in 43 ms, md5: y7GE3Y4FyXCeXcrtqgSVzw== md5hex: cbb184dd8e05c9709e5dcaedaa0495cf
20/04/07 13:12:45 INFO MultipartUploadOutputStream: uploadPart: partNum 1 of 's3://bucket/part-00072-50d3246e-e18c-4058-9e1c-ad714305c92f-c000.json' from local file '/mnt/s3/emrfs-3565489585808272492/0000000000', 134217728 bytes in 1395 ms, md5: qg7f22UwVchHRejYe+41GQ== md5hex: aa0edfdb653055c84745e8d87bee3519
20/04/07 13:12:46 INFO DefaultMultipartUploadDispatcher: Completed multipart upload of 2 parts 134217729 bytes
20/04/07 13:12:46 INFO Executor: Executor interrupted and killed task 72.1 in stage 549.0 (TID 413637), reason: another attempt succeeded
As can be seen from the logs, the second task attempt tried to stop itself once it learned that the first attempt had already uploaded the file. But it wasn't able to roll back entirely and ended up overwriting the output with partial data (its part 2 was only 1 byte). That JSON now looks like this:
{"key": "v}
instead of
{"key": "value"}
and this causes the JSON reader to throw an exception. I have searched for this issue but cannot find anyone ever posting about it. Has anyone faced this here? Is this a bug in Spark? How can I overcome it?
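One workaround I am considering (not a confirmed fix) is making sure only a single attempt of each task ever runs, by turning off speculative execution for this job, so a second attempt can never race the first one's S3 upload. A minimal sketch, assuming a Scala SparkSession application; dataframe and file_path are the same as in the snippet above, and the app name is illustrative:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("json-export")                    // illustrative name
  .config("spark.speculation", "false")      // do not launch speculative duplicate task attempts
  .getOrCreate()

// same write as before; file_path is the S3 output prefix
dataframe.write.mode(SaveMode.Overwrite).json(file_path)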
I'm using the MariaDB C connector with prepare, bind, and execute. It usually works, but one case ends up in "corrupted unsorted chunks" and a core dump when freeing the bind buffer. I suspect the whole malloc organisation is messed up after calling mysql_stmt_execute(). My test MysqlDynamic.c shows:
the problem is connected only to the x509cert variable bound by bnd[9]
freeing memory fails only if bnd[9].is_null = 0; if it is null, execute ends normally
freeing memory (using FreeStmt()) after bind and before execute ends normally
printing bnd[9].buffer before execute shows the (void*) points to the correct string buffer
the behavior is the same whether bnd[9].buffer_length is set to STMT_INDICATOR_NTS or to strlen()
other similar bindings (picture, bnd[10]) do not lead to corrupted memory or a core dump.
I defined a C structure, test, for the test data in my test program MysqlDynamic.c, which is bound into the MYSQL_BIND structure.
For the bindings for x509cert (string buffer), see bindInsTest():
bnd[9].buffer_type = MYSQL_TYPE_STRING;
bnd[9].buffer_length = STMT_INDICATOR_NTS;
bnd[9].is_null = &para->x509certI;
bnd[9].buffer = (void*) para->x509cert;
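For reference, this is roughly how I understand a conventional string bind with an explicit length variable would look (just a sketch of the documented MYSQL_BIND fields, not a confirmed fix, since strlen() in buffer_length showed the same behavior; cert_len is a hypothetical variable):

/* sketch only: explicit-length bind for x509cert (strlen needs <string.h>) */
unsigned long cert_len = strlen(para->x509cert);

bnd[9].buffer_type   = MYSQL_TYPE_STRING;
bnd[9].buffer        = (void *) para->x509cert;
bnd[9].buffer_length = cert_len;          /* bytes available in the buffer */
bnd[9].length        = &cert_len;         /* actual data length for this row */
bnd[9].is_null       = &para->x509certI;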
Please get the details from the source file MysqlDynamic.c. Please adapt the defines in the source to your environment, verify the content, and run it. You will find compile info in the source code. MysqlDynamic -c will create the table, MysqlDynamic -i will insert 3 records each run, and MysqlDynamic -d will drop the table again.
MysqlDynamic -vc shows:
session set autocommit to <0>
connection id: 175
mariadb server ver:<100408>, client ver:<100408>
connected on localhost to db test by testA
>> if program get stuck - table is locked
table t_test created
mysql connection closed
pgm ended normaly
MysqlDynamic -i shows:
ins2: BufPara <92> name<master> stamp<> epoch<1651313806000>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<0x5596a0f0c220> buf<0x5596a0f0c220> null<0> length<172>
ins1: BufPara <91> name<> stamp<2020-04-30> epoch<1650707701123>
cert is cert<0x5596a0f181d0> buf<0x5596a0f181d0> null<0>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
ins0: BufPara <90> name<gugus> stamp<1988-10-12T18:43:36> epoch<922337203685477580>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
free(): corrupted unsorted chunks
Aborted (core dumped)
Checking the t_test table content shows that all records are inserted as expected.
You can disable loading of x509cert and/or picture by commenting out the defines on lines 57/58; the program then ends normally. You can also comment out line 208; the buffers are then indicated as NULL.
Questions:
Is there a generic coding mistake in the program causing this behavior?
Can you run the program in your environment without a core dump? I'm currently using version 10.04.08.
Any improvement in the code will be welcome.
I'm using an AMD Ryzen 7 2700X CPU, 64 GB memory, a 1 TB SSD, and a 100 Mb Internet down-speed connection to create an Ethereum node. I'm running geth with syncmode "fast" to build the node, but it seems to never catch up! What numbers should I be looking at to see whether it ever will? I've read https://github.com/ethereum/go-ethereum/issues/20962, so I know that block count isn't really what I should look for. Is there something else whose rate of change I should monitor to see if I'm catching up? E.g. pulledStates in the output of eth.syncing, or the pending number in the log entries detailing "Imported new state entries"?
My geth command:
geth --syncmode "fast" --cache=4096 --allow-insecure-unlock --http --datadir /crypto/ethereum/mainnet --keystore /crypto/ethereum/keystore --signer=/crypto/ethereum/clef.ipc --maxpeers 25 2>/crypto/ethereum/mainnet_sync_r5.log
Output from eth.syncing:
eth.syncing
{ currentBlock: 12168062,
highestBlock: 12168171,
knownStates: 183392940,
pulledStates: 183343841,
startingBlock: 12160600 }
Last few lines from my command log:
INFO [04-03|11:16:04.511] Imported new state entries count=1350 elapsed=7.456ms processed=183847128 pending=53248 trieretry=93 coderetry=0 duplicate=2969 unexpected=4186274
INFO [04-03|11:16:04.674] Imported new state entries count=0 elapsed=4.487ms processed=183847128 pending=52401 trieretry=93 coderetry=0 duplicate=2969 unexpected=4186367
INFO [04-03|11:16:04.681] Imported new state entries count=1059 elapsed=6.784ms processed=183848187 pending=52401 trieretry=93 coderetry=0 duplicate=2969 unexpected=4186367
INFO [04-03|11:16:04.880] Imported new state entries count=1152 elapsed=5.161ms processed=183849339 pending=53150 trieretry=0 coderetry=0 duplicate=2969 unexpected=4186367
INFO [04-03|11:16:05.003] Imported new state entries count=1152 elapsed=5.906ms processed=183850491 pending=52394 trieretry=0 coderetry=0 duplicate=2969 unexpected=4186367
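To make the question more concrete, this is the kind of thing I could run in the geth JavaScript console to watch the rate of change of pulledStates (a rough sketch of my own; the wait between the two blocks is arbitrary, and knownStates keeps growing during fast sync, so the gap is not a strict ETA):

// first sample (run this, wait roughly a minute, then run the rest)
var s1 = eth.syncing, t1 = Date.now();

// second sample and rate estimate
var s2 = eth.syncing, t2 = Date.now();
var pulledPerSec = (s2.pulledStates - s1.pulledStates) / ((t2 - t1) / 1000);
var gap = s2.knownStates - s2.pulledStates;   // states still to pull right now
console.log("pulled/s: " + pulledPerSec + ", remaining known-pulled gap: " + gap);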
The SystemVerilog code below is a single-file testbench which reads a binary file into a memory using $fread and then prints the memory contents. The binary file is 16 bytes, and a view of it is included below (this is what I expect the SystemVerilog code to print).
The printed output matches what I expect for the first 6 bytes (0-5). At that point the expected output is 0x80, but the printed output is a sequence of 3 bytes starting with 0xef which are not in the stimulus file. After those 3 bytes the output matches the stimulus again. The error seems to occur whenever bit 7 of the byte read is 1, almost as if the data were being treated as signed, but it is not: it's binary data printed as hex. The memory is defined as type logic, which is unsigned.
This is similar to a question/answer in this post:
Read binary file data in Verilog into 2D Array.
However my code does not have the same issue (I use "rb" in the $fopen statement), so that solution does not apply here.
The SystemVerilog spec (1800-2012) states in section 21.3.4.4, Reading binary data, that $fread can be used to read a binary file, and goes on to say how. I believe this example complies with what is stated in that section.
The code is posted on EDA Playground so that users can see it and run it:
https://www.edaplayground.com/x/5wzA
You need a login to run it and download; the login is free. It provides access to full cloud-based versions of the industry-standard tools for HDL simulation.
I also tried running 3 different simulators on EDA Playground; they all produce the same result.
I have tried re-arranging the stim.bin file so that the 0x80 value occurs at the beginning of the file rather than in the middle. In that case the error also occurs at the beginning of the printed output.
Maybe the SystemVerilog code is fine and the problem is the binary file? I have provided a screenshot of what Emacs hexl mode shows for its contents. I also viewed it in another viewer and it looked the same. You can download it when running on EDA Playground to examine it in another editor. The binary file was generated by GNU Octave.
I would prefer a solution which uses SystemVerilog $fread rather than something else, in order to debug the original rather than work around it (learning). This will be developed into a SystemVerilog testbench which applies stimulus read from a binary file generated in Octave/Matlab to a SystemVerilog DUT. Binary file I/O is preferred because of the file access speed.
Why does the SystemVerilog testbench print 0xef rather than 0x80 for mem[6]?
module tb();
// file descriptors
int read_file_descriptor;
// memory
logic [7:0] mem [15:0];
// ---------------------------------------------------------------------------
// Open the file
// ---------------------------------------------------------------------------
task open_file();
$display("Opening file");
read_file_descriptor=$fopen("stim.bin","rb");
endtask
// ---------------------------------------------------------------------------
// Read the contents of file descriptor
// ---------------------------------------------------------------------------
task readBinFile2Mem ();
int n_Temp;
n_Temp = $fread(mem, read_file_descriptor);
$display("n_Temp = %0d",n_Temp);
endtask
// ---------------------------------------------------------------------------
// Close the file
// ---------------------------------------------------------------------------
task close_file();
$display("Closing the file");
$fclose(read_file_descriptor);
endtask
// ---------------------------------------------------------------------------
// Shut down testbench
// ---------------------------------------------------------------------------
task shut_down();
$stop;
endtask
// ---------------------------------------------------------------------------
// Print memory contents
// ---------------------------------------------------------------------------
task printMem();
foreach(mem[i])
$display("mem[%0d] = %h",i,mem[i]);
endtask
// ---------------------------------------------------------------------------
// Main execution loop
// ---------------------------------------------------------------------------
initial
begin :initial_block
open_file;
readBinFile2Mem;
close_file;
printMem;
shut_down;
end :initial_block
endmodule
Binary Stimulus File:
Actual output:
Opening file
n_Temp = 16
Closing the file
mem[15] = 01
mem[14] = 00
mem[13] = 50
mem[12] = 60
mem[11] = 71
mem[10] = 72
mem[9] = 73
mem[8] = bd
mem[7] = bf
mem[6] = ef
mem[5] = 73
mem[4] = 72
mem[3] = 71
mem[2] = 60
mem[1] = 50
mem[0] = 00
Update:
An experiment was run to test whether the binary file may be getting modified during the process of uploading to EDA Playground. There is no SystemVerilog code involved in these steps; it's just a file upload/download.
Steps:
(Used https://hexed.it/ to create and view the binary file)
Create/save binary file with the hex pattern 80 00 80 00 80 00 80 00
Create new playground
Upload new created binary file to the new playground
Check the 'download files after run' box on the playground
Save playground
Run playground
Save/unzip the results from the playground run
View the binary file. In my case it has been modified during the process of upload/download. A screenshot of the result is shown below:
This experiment was conducted on two different Windows workstations.
Based on these results and the comments, I am going to close this issue with the disposition that this is not a SystemVerilog issue but is related to the upload/download of binary files to EDA Playground. Thanks to those who commented.
The unexpected output produced by the testbench is due to modifications that occur to the binary stimulus file during/after upload to EDA Playground. The SystemVerilog testbench performs as intended to print the contents of the binary file.
This conclusion is based on community comments and experimental results which are provided at the end of the updated question. A detailed procedure is given so that others can repeat the experiment.
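As a side note (a sketch of my own, not part of the original post): the ef bf bd sequence in the output looks like a UTF-8 replacement character substituted for the 0x80 byte, which would also make the uploaded file longer than the expected 16 bytes. A simple on-disk size check with $fseek/$ftell before the $fread, dropped into the posted module, would flag that kind of corruption early (assuming the simulator supports these standard file I/O tasks):

// Sketch only: verify the on-disk byte count before calling $fread.
task check_file_size(input int expected_bytes);
  int status, size_bytes;
  status     = $fseek(read_file_descriptor, 0, 2); // 2 = seek relative to end of file
  size_bytes = $ftell(read_file_descriptor);       // position at end = file size in bytes
  status     = $fseek(read_file_descriptor, 0, 0); // 0 = rewind for the real read
  if (size_bytes != expected_bytes)
    $display("WARNING: stim.bin is %0d bytes, expected %0d", size_bytes, expected_bytes);
endtask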
I read https://docs.ipfs.io/reference/api/cli/#ipfs-object-links to try to figure out my problem, i.e. how to calculate a file's size with:
ipfs object stat
ipfs object links
Let's look at an example: QmdHY9TnXaXez2bSsQqanm2WaAineyiaFydwqqkEk6kyzs is a directory with a single JPG file.
$ ipfs object links /ipfs/QmdHY9TnXaXez2bSsQqanm2WaAineyiaFydwqqkEk6kyzs
QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR 119776 guardian.jpg
$ ipfs object stat /ipfs/QmdHY9TnXaXez2bSsQqanm2WaAineyiaFydwqqkEk6kyzs
NumLinks: 1
BlockSize: 60
LinksSize: 58
DataSize: 2
CumulativeSize: 119836
$ ipfs object stat /ipfs/QmdHY9TnXaXez2bSsQqanm2WaAineyiaFydwqqkEk6kyzs/guardian.jpg
NumLinks: 0
BlockSize: 119776
LinksSize: 4
DataSize: 119772
CumulativeSize: 119776
119772 is the size of guardian.jpg (blocks with file data alone)
119776 is the size of guardian.jpg DAG (file data + DAG metadata)
119836 is the size of guardian.jpg DAG + metadata of wrapping directory under CID QmdHY9TnXaXez2bSsQqanm2WaAineyiaFydwqqkEk6kyzs
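As an aside, if all you need is the logical file size from a script, a possibly simpler route (assuming a reasonably recent go-ipfs with the Files API) is ipfs files stat, which reports Size (file data alone) and CumulativeSize (data plus DAG metadata) directly:

$ ipfs files stat /ipfs/QmdHY9TnXaXez2bSsQqanm2WaAineyiaFydwqqkEk6kyzs/guardian.jpg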
If you want to understand what "DAG" and "DAG metadata" are about, check:
tutorials:
Decentralized data structures
Files API
video course: How IPFS Deals With Files
I'm trying to get some query results from my Neo4j graph DB into a CSV.
I have Neo4j version 2.2.6 and I am hitting a java.lang.OutOfMemoryError: Java heap space error while trying to export all of my nodes with some properties (~1M nodes, ~4M relationships) to a CSV in neo4j-shell via import-cypher/export-cypher. When I change the wrapper settings (wrapper.java.maxmemory, wrapper.java.minmemory to 4g), as I've seen suggested in another post, the error remains, and when I change dbms.pagecache.memory to 3g it crashes before I even start the server.
To test if it's a heap space problem you do not have to change the page cache, only the heap.
According to http://neo4j.com/docs/stable/server-performance.html the initmemory and maxmemory properties take the heap size in MB.
# neo4j-wrapper.conf
wrapper.java.initmemory=4000
wrapper.java.maxmemory=4000
In addition to the heap settings you can add a PERIODIC COMMIT to your LOAD CSV query. This will commit regularly to avoid memory errors: http://neo4j.com/docs/stable/query-periodic-commit.html
USING PERIODIC COMMIT 1000
LOAD CSV FROM ...
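For example, a complete periodic-commit import could look like the following; the file name, label, and properties here are purely illustrative (toInt() is the Cypher 2.x integer conversion):

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
CREATE (:Person { name: row.name, age: toInt(row.age) });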