insecure string pickle error when uploading and downloading to MKS Integrity - pickle

I am getting the exception "ValueError: insecure string pickle" when attempting to run my program after creating a sandbox from MKS.
If you're still reading and interested in helping, here's the full story.
I created an application in Python that analyzes data. When saving specific data from my program, I pickle it to a file. I read and write the file in binary mode, and everything works correctly on my computer.
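For reference, the save/load code follows the usual binary-mode pattern (simplified here; the file name and the data are placeholders):

import pickle

data = {"example": [1, 2, 3]}  # placeholder for the real analysis results

# Write in binary mode ("wb"), which pickle requires.
with open("results.pkl", "wb") as f:
    pickle.dump(data, f)

# Read it back, again in binary mode ("rb").
with open("results.pkl", "rb") as f:
    data = pickle.load(f)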
I then used py2exe to wrap everything into an .exe. However, in order for the pickled files to keep working, I have to physically copy them into the folder that py2exe creates. So my pickle files sit inside the .exe folder, and everything works correctly when I run the .exe.
Next, I upload everything to MKS (an ALM tool; here is the Wikipedia page: http://en.wikipedia.org/wiki/MKS_Integrity).
When I proceed to create a sandbox of my files and run the program, I get the dreaded "insecure string pickle" error. I am wondering whether MKS mangled something or added end-of-line characters to my pickle files; however, when I compare the contents of the MKS pickle file with the one I created before uploading, I can't find any differences.
I hope this is enough detail to describe my problem.
Please help!
Thanks

Have you tried adding your pickled files to your Integrity sandbox as binaries and not text?
When adding the file, on the Create Archive interface, click the Options button and change the data type from "Auto" to "Binary". This will preserve any non-text formatting within the file.
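If you want to double-check whether line-ending conversion is what changed the file (a plain text diff can silently hide a CR/LF difference), a quick byte-level comparison of the two copies would show it. This is just a sketch; the file names are placeholders:

# Compare the pre-upload pickle with the copy in the MKS sandbox at the byte
# level and count Windows line endings in each.
def summarize(path):
    with open(path, "rb") as f:
        raw = f.read()
    return len(raw), raw.count(b"\r\n")

for name in ("original_results.pkl", "sandbox_results.pkl"):
    size, crlf = summarize(name)
    print(name, size, "bytes,", crlf, "CRLF sequences")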

Related

ADF Merge-Copying JSON files in Copy Data Activity creates error for Mapping Data Flow

I am trying to do some optimization in ADF. The setup: a third-party tool copies one JSON file per object to a BLOB storage container, and these files feed a Mapping Data Flow. The individual files written by the third-party tool work great. If I copy these files to a different BLOB folder using an Azure Copy Data activity, the MDF can no longer parse the files and gives an error: "JSON parsing error, unsupported encoding or multiline." I started with a Merge Files copy, but the outcome is the same regardless of the copy behavior I choose.
2ND EDIT: After another day's work, I have found that the Copy Activity's Merge Files from JSON to JSON definitely adds an EOL character to each single JSON object as it is imported into the merged file, and that the MDF definitely fails with those EOL characters in the merged file. If I remove all EOL characters from the merged file, the same MDF works. For me, this is a bug: the copy activity is adding a character that breaks the MDF. There also seems to be a second issue, where some data that doesn't fail as an individual file does break the MDF once concatenated, but I have tested the basic behavior on 1-5000 files and been able to repeat the fail/success results.
I took the original file and the copied file, ran them through all sorts of tests, and here is what I eventually found when I dumped them into Notepad++:
Copied file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\r\n
Original file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\n
If I change the copied file from ending with \r\n to \n, the MDF can read the file again. What is going on here? And how do I change the file write behavior or the MDF settings so that I can concatenate or copy files without the CRLF?
EDIT: NEW INFORMATION -- It seems on further review like maybe the minification/whitespace removal is the culprit. If I download the file created by the ADF copy and format it using a JSON formatter, it works. Maybe the CRLF -> LF masked something else. I'm not sure what to do at this point, but it's super frustrating.
Other possibly relevant information:
Both the source and sink JSON datasets are set to use UTF-8 (not default(UTF-8), although I tried that). Would a different encoding fix this?
I have tried remapping schemas, creating new data sets, creating new Mapping Data Flows, still get the same error.
EDITED for clarity based on comments:
In the case of a single JSON element in a file, I can get this to work -- the data preview returns the same success or failure as the pipeline when run.
In the case of multiple documents merged by ADF, it fails as described in the edits above.
Repro: Create any valid JSON as a single file, put it in blob storage, and use it as the source in a mapping data flow with any sink operation. Create a second file with the same schema and get both to run in the same flow using wildcard paths. Then use a Copy Activity with Merge Files as the sink copy behavior and Array of Objects as the file pattern, and try to make your MDF use the new merged file. If it fails, download the file created by ADF, run it through a formatter (I have used both VS Code -> "Format Document" from the standard VS Code JSON extension, and the VS 2019 "Unminify" command) and re-upload it... It should work now.
I don't know if you've already solved the problem: I came across the exact same problem three days ago, and after several tries I found a solution:
In the copy data activity, under sink settings, use "Set of objects" (instead of "Array of objects") for the File pattern, so that the merged big JSON has the value of each original small JSON file written on its own line.
In the MDF, after setting up the wildcard paths with the *.json pattern, under JSON settings select "Document per line" as the document form.
After that you should be good to go; at least it solved my problem. The CRLF that the copy data activity automatically writes in the "Array of objects" setting is currently the default behaviour, and MSFT should provide an option to omit it in the future.
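To illustrate the difference, this is roughly what converting an existing "Array of objects" merge file into the per-line ("Set of objects") form looks like outside ADF. Just a sketch with placeholder file names, not part of the pipeline itself:

import json

# Read the merged file, which is a single JSON array of objects.
with open("merged_array.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Write one JSON document per line, LF only (no CRLF), which is what the
# "Document per line" setting in the MDF expects.
with open("merged_per_line.json", "w", encoding="utf-8", newline="\n") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")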
According to my test:
1. The copy data activity can't change unix (LF) line endings to windows (CRLF).
2. The MDF can parse both unix (LF) and windows (CRLF) files.
Maybe there is something else wrong.
By the way, I see there is a comma after "name":"Customer Name" in your original file; I deleted it before my test.
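For anyone debugging a similar case, a quick local check of the downloaded merge file shows both the trailing bytes (CRLF vs LF) and whether the content is valid JSON at all, which also catches things like that stray comma. A sketch; "merged.json" is a placeholder for the downloaded copy:

import json

with open("merged.json", "rb") as f:
    raw = f.read()

# repr() makes \r\n vs \n visible explicitly.
print("last bytes:", repr(raw[-10:]))

try:
    json.loads(raw.decode("utf-8"))
    print("parses as valid JSON")
except json.JSONDecodeError as err:
    print("JSON error:", err)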

Managing a large SPSS (*.sav) file (4.2 GB)

I have received an SPSS file from a survey fielded by another company that allegedly contains only ~1500 respondents, yet the file size has somehow ballooned to 4.2 GB. My hunch is that the file was from a global survey and the ~1500 records that have been selected are from the US only, so the file still contains a series of blank variables plus the metadata for those variables, possibly in multiple languages/alphabets.
I only need a subset of this data, and can likely work with it if I removed the metadata but my issue has been that I can't get the damn thing open to cut down on the number of variables. I have been using the tools at my disposal to try the following workarounds, though I'm sure there are better options:
Opening the file using PSPP (freeware SPSS) - this causes the PSPP to stop responding
Using the R command read.spss (from the foreign package) to write a .csv - this claims that the file has a duplicate variable name and won't proceed further
Using the R command spss.system.file to write a .csv - when I tried this, R spent a long time thinking as it attempted to run, and it had been going for a couple of hours with no apparent success.
Using the PSPP text conversion tool (https://pspp.benpfaff.org/) to create either a dictionary or a .csv file - both of these options crash after the file has completed uploading.
I've gone back to the other company to try to have them reduce the file size; however, I wasn't sure whether anyone else had any ideas for doing either of the following:
Open the file using another program/converter that could turn it into a .csv or other similarly skinny file format
Use another program to at least read only the variable names included in the file so that I can provide the other company with the specific variables I need
The following command from PSPP should do what you need:
$ pspp-convert originalFile.sav output.csv
In case it doesn't, please provide the terminal error message.
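If pspp-convert also struggles with a file this size, another option worth trying (assuming you can install Python packages) is pyreadstat, which can read just the variable metadata without loading any data rows. A sketch; the path is a placeholder:

import pyreadstat  # pip install pyreadstat

# metadataonly=True skips the data, so this is fast even for a multi-GB .sav.
df, meta = pyreadstat.read_sav("survey.sav", metadataonly=True)

print(len(meta.column_names), "variables")
print(meta.column_names[:20])

# Once you know which variables you need, pull only those columns:
# df, meta = pyreadstat.read_sav("survey.sav", usecols=["Q1", "Q2"])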

MediaWiki filepath Magic Word doesn't work for some files types

I'm trying to use the MediaWiki filepath magic word so that I can create some template links that pass the path of a specific MediaWiki file. Unfortunately, with certain file types, filepath just returns nothing.
The file I'm trying to get the path for, the one that's failing, is a text file in this case. I have confirmed that I am using the correct filename, as I can create a regular file link using [[File:Name.txt]], and {{filepath:Image.png}} works properly.
Example of what I'm trying to accomplish:
[http://server/processfile.php?path={{filepath:<filename>}} Process A File]
Is this a known issue? Is there an easy way that I can debug what's happening here?
After digging around a bunch more I was able to resolve the issue. It turns out that even though MediaWiki would accept the file, it was being assigned a random MIME type because it was a .yaml file.
After updating mime.types and mime.info in MediaWiki and adding the mime type (text/yaml) to my IIS configuration, I was able to get the downloads working and the file links showing up.
Full disclosure: I may have been using an incorrectly cased file name even though I said that I was using the correct file name. :P
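For anyone debugging something similar, the MediaWiki API will report the URL and MIME type it has recorded for a file, which is a quick way to spot a bad MIME assignment or a wrong file name. A sketch; the wiki URL and file name are placeholders:

import requests

API = "http://server/api.php"  # adjust to your wiki's api.php
params = {
    "action": "query",
    "titles": "File:Name.txt",
    "prop": "imageinfo",
    "iiprop": "url|mime",
    "format": "json",
}
pages = requests.get(API, params=params).json()["query"]["pages"]
for page in pages.values():
    for info in page.get("imageinfo", []):
        print(info["mime"], info["url"])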

SevenZipArchiveException: Invalid archive. open/read error

I get the following error when I try to extract a zip file:
"SevenZip.SevenZipArchiveException: Invalid archive: open/read error! Is it encrypted and a wrong password was provided?
If your archive is an exotic one, it is possible that SevenZipSharp has no signature for its format and thus decided it is TAR by mistake."
Nothing works with zip files, but everything works fine with 7z files. Is it possible to extract zip files with the SevenZipExtractor?
string sourcePath = @"c:/temp/yyy.zip";   // verbatim string literal; the "#" in the original was a typo
using (var file = new SevenZipExtractor(sourcePath))
{
    file.ExtractArchive(outputPath);      // outputPath is the destination folder, defined elsewhere
}
What I found when I encountered this error was that it occurred when I tried to decompress a particular set of files. For example, if the SevenZipCompressor stopped halfway through, the compressed files would be corrupted, so the error would occur when you later tried to decompress them.
The fix for me was to recompress the set of files and to be sure it ran completely, and then the error went away, allowing the extraction to work.
So the moral of the issue at hand is to look at the source in this case and make sure the files or the archive aren't corrupt.
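Independent of SevenZipSharp, a quick way to confirm whether the zip itself is intact is Python's zipfile module. A sketch using the path from the question:

import zipfile

path = "c:/temp/yyy.zip"

if not zipfile.is_zipfile(path):
    print("Not recognized as a zip file at all")
else:
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        print("first corrupt member:", zf.testzip())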
I've run into the same issue recently with version 18.5.0.
Downgrading the package to 9.38.3 solved the problem for me.
For people still running into this problem: this can also happen when trying to uncompress rar5 files that have filename encryption turned on.

The connection "C:\\<path>\\*.txt" is not found. This error is thrown by Connections collection when the specific conn element is not found

I developed a SSIS package that creates several .txt files. These files are zipped and then the .txt files need to be removed. Using a foreach file enumerator, I loop through all the .txt files for a specific folder. The folder is retrieved from a variable in configuration and looks something like: C:\Folder\
The foreach loop uses: *.txt to gather all .txt files, does not traverse subfolder and uses the full qualified name.
In the Variable Mappings the "FileName" variable gets filled with the 0 index.
Within the foreachloop I use a File system task.
This task removes the .txt files which are generated before, using the FileName variable that is filled in the loop.
On the development machine this runs like a charm. All greens, no problem at all. Now I copy the package and the configuration file to the test environment. A basic version without the file removing was running perfectly fine here. I replaced the package. Nothing big.
Now I run the SQL Server Agent job and it starts running. I can see all the text files appearing, and disappearing after the zip files are created. However, once all the files are removed, the package ends with errors, namely the error shown in the title above.
I tried looking for a connection manager that might have been removed.
I looked for connection managers named in the config that don't exist in the package.
No such thing was found. The annoying part is that the package is fully functional, but it still ends with the error.
EDIT: I noticed that if I run the package using the execute package utility with the dev. config it gives the same errors.
Hopefully someone is able to help me out.
Thanks in advance!
I managed to "fix" the issue: remove the File System Task component responsible for deleting the files, then add it back and configure it again.
I think this happens if you accidentally change the General parameters before changing the Operation parameter. It holds on to metadata for irrelevant parameters and, upon execution, says: "Wait, you defined this parameter but I don't need it, but I'm checking for it anyway, and it's not there!"
It's a bug for sure