How to read special characters in Pyspark - csv

The following special characters are present in a CSV file:
" – " , " ’ " , " ‘ "
I am already using option("encoding","ISO-8859-1") in the read statement to handle a few other scenarios (without this option, a few spaces are read as – ).
But with this option,
" Samsung – 22 " is read as " Samsung ? 22 "
" ‘World’ " is read as " ?World? "

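One possible explanation, offered as an assumption because the file's real encoding is not stated: the en dash and the curly quotes have no code points in ISO-8859-1, but Windows-1252 assigns them to the bytes 0x96, 0x91 and 0x92. If the file was saved as Windows-1252, decoding it as ISO-8859-1 turns those bytes into invisible control characters, which many tools display as ?. The plain-Python sketch below shows the difference; the sample bytes are made up for illustration.

```python
# Hypothetical sample: the bytes a Windows-1252 file would contain for
# "Samsung – 22, ‘World’" (0x96 = en dash, 0x91/0x92 = curly single quotes).
raw = b"Samsung \x96 22, \x91World\x92"

# ISO-8859-1 maps 0x80-0x9F to C1 control characters, not to punctuation.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 maps the same bytes to the intended typographic characters.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(as_cp1252)

# In PySpark the equivalent change might look like this (untested against the
# asker's data; the path is hypothetical):
# df = (spark.read.option("header", "true")
#       .option("encoding", "windows-1252")
#       .csv("path/to/file.csv"))
```

If the characters still come out wrong with Windows-1252, the file may be UTF-8 or something else entirely, so it is worth checking the raw bytes first.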
Related

How to convert text to CSV file?

I have text like the following and I am trying to convert it into a CSV file. How can I do it?
[
[“Date”, “Description”, “Deposits”, “Withdrawals”, “Balance”, " "]
,
[“4 Mar '22”, “DIRECT CREDIT SUPPORTPYMTD230576800#”, “$4,800.00”, " ", “$27,727.50”, " "]
,
[“22 Feb '22”, “DIRECT CREDIT 31/03/2021 D569582240# INC”, “$11.13”, " ", “$22,927.50”, " "]
]
Thank you.
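A minimal sketch of one way to do this in Python, assuming the text always has the shape shown above (a bracketed list of bracketed rows) and that the curly quotes can be treated as ordinary double quotes. Once the quotes are normalized the text is valid JSON, so json.loads can parse it and the csv module can write it out. The function name text_to_csv and the choice to strip whitespace-only cells are illustrative, not part of the question.

```python
import csv
import io
import json

# Sample input copied from the question (shortened to two rows).
raw = """[
[“Date”, “Description”, “Deposits”, “Withdrawals”, “Balance”, " "]
,
[“4 Mar '22”, “DIRECT CREDIT SUPPORTPYMTD230576800#”, “$4,800.00”, " ", “$27,727.50”, " "]
]"""


def text_to_csv(text):
    """Return CSV text for a JSON-like list of rows that uses curly quotes."""
    # Replace left/right curly double quotes with straight ones so the
    # text parses as JSON.
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    rows = json.loads(normalized)

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Assumption: cells holding only a space are meant to be empty.
        writer.writerow(cell.strip() for cell in row)
    return buffer.getvalue()


csv_text = text_to_csv(raw)
print(csv_text)
```

The csv writer quotes values such as $4,800.00 automatically because they contain the delimiter. To write a file instead of a string, pass a file opened with newline="" to csv.writer.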

Export to CSV using SAVE TRANSLATE but empty values are exported as a single space

I have a dataset in SPSS; see the example dataset below. This is just an example; the real one is provided by a separate external process and has more columns and rows. The empty values are set as " " in the example, but this is also how empty values are provided in SPSS, where they are treated internally as null/empty/missing values.
data list list/FieldNam(a20) FormName(a20) FieldType(a20) Choices(a50) Required(F1) Identifier(a1) Minimum(f8) Maximum(f8).
begin data
"Field 1" "Form abc" "text" " " 1 "y" " " " "
"Field 2" "Form abc" "datetime" " " 1 "y" " " " "
"Field 3" "Form xyz" "radio" "0=never | 1=sometimes | 2=often | 3=always" " " " " " " " "
"Field 4" "Form xyz" "text" " " " " " " "1" "100"
"Field 5" "Form xyz" "radio" "0=no | 1=yes" " " " " " " " "
end data.
Then I use the following syntax to save it as a CSV text file.
SAVE TRANSLATE
/TYPE = CSV
/FIELDNAMES
/TEXTOPTIONS DELIMITER=',' QUALIFIER='"'
/OUTFILE = 'C:\Temp\my_csv_file.csv'
/ENCODING='Windows-1252'
/REPLACE.
The resulting CSV file contains the following, with single spaces for the empty values:
FieldNam,FormName,FieldType,Choices,Required,Identifier,Minimum,Maximum
Field 1,Form abc,text, ,1,y, ,
Field 2,Form abc,datetime, ,1,y, ,
Field 3,Form xyz,radio,0=never | 1=sometimes | 2=often | 3=always, , , ,
Field 4,Form xyz,text, , , ,1,100
Field 5,Form xyz,radio,0=no | 1=yes, , , ,
However, I would like the empty values to just be empty, like so:
FieldNam,FormName,FieldType,Choices,Required,Identifier,Minimum,Maximum
Field 1,Form abc,text,,1,y,,
Field 2,Form abc,datetime,,1,y,,
Field 3,Form xyz,radio,0=never | 1=sometimes | 2=often | 3=always,,,,
Field 4,Form xyz,text,,,,1,100
Field 5,Form xyz,radio,0=no | 1=yes,,,,
So my question is, is it possible to export the SPSS dataset like this?
The exported CSV file will be used as input for another system, which cannot handle the single-space ( , , ) empty values. I know I can open it in Notepad and do a search-and-replace after the fact, but I want to automate this as much as possible: the export will be run often, so this would save a lot of work.
Information from this page suggests one can invoke a script: https://www.ibm.com/docs/en/spss-statistics/23.0.0?topic=reference-script
SCRIPT
SCRIPT runs a script to customize the program or automate
regularly performed tasks. You can run a Basic script or a Python
script.
SCRIPT 'filename' [(quoted string)]
This command takes effect immediately. It does not read the active
dataset or execute pending transformations. See the topic Command
Order for more information.
Release History
Release 16.0
Scripts run from the SCRIPT command now run synchronously with the
command syntax stream.
Release 17.0
Ability to run Python scripts introduced.
Example Python script to invoke after each export for release 17.0 or higher:
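import fileinput
import os

# Raw string so the backslashes in the Windows path are not treated as escapes.
filename = r'C:\Temp\my_csv_file.csv'
postfix = '.bak'

# Rewrite the file in place; print() output is redirected into the file.
with fileinput.FileInput(filename, inplace=True, backup=postfix) as file:
    for line in file:
        print(line.replace(', ', ',').replace(' ,', ','), end='')

# Remove the backup file if fileinput left it behind.
try:
    os.remove(filename + postfix)
except FileNotFoundError:
    pass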
The script performs a simple search and replace. Note that it also removes spaces next to commas inside text values. I've included code to remove the temporary backup file explicitly, even though the Python manual states that the file is removed automatically. For me it consistently is not removed at the moment, hence the manual removal. You may drop that code if it works without it for you.
Of course you could also use Python's csv module and iterate the rows and write it back to another csv, etc. See the documentation for that one here: https://docs.python.org/3/library/csv.html
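A minimal sketch of that csv-module approach, assuming a cell should be blanked only when it consists entirely of whitespace. Unlike the search and replace, this leaves spaces inside real values untouched. The function name blank_out_space_cells is hypothetical, and the sample uses in-memory text rather than the real export.

```python
import csv
import io


def blank_out_space_cells(source, target):
    """Copy CSV rows from source to target, emptying whitespace-only cells."""
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    for row in reader:
        writer.writerow("" if cell.strip() == "" else cell for cell in row)


# In-memory demonstration using two lines from the export shown above.
sample = io.StringIO(
    "FieldNam,FormName,FieldType,Choices,Required,Identifier,Minimum,Maximum\n"
    "Field 1,Form abc,text, ,1,y, ,\n"
)
cleaned = io.StringIO()
blank_out_space_cells(sample, cleaned)
print(cleaned.getvalue())

# With real files, open both with newline="" and the export's encoding, e.g.
# open(path, newline="", encoding="cp1252"); the paths are up to you.
```

The lineterminator is set to "\n" here only to keep the demonstration simple; the csv module's default of "\r\n" may suit a Windows target system better.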

Google Apps Script - replaceText - Can't Replace Period, Comma and Question Mark

I am trying to write a Google Apps Script that will find specific words in Google Docs and replace them with others.
I would like " hello " (space, hello, space) to be replaced by " R1 " (space, R1, space)
And if there is any punctuation mark after hello like a period, comma or question mark it should be the same logic:
" hello " to be replaced by " R1 "
" hello. " to be replaced by " R1. "
" hello, " to be replaced by " R1, "
" hello? " to be replaced by " R1? "
So I used the following:
function docReplace() {
  var body = DocumentApp.getActiveDocument().getBody();
  body.replaceText(" hello ", " R1 ");
  body.replaceText(" hello. ", " R1. ");
  body.replaceText(" hello, ", " R1, ");
  body.replaceText(" hello? ", " R1? ");
}
Unfortunately this doesn't work, as "." , "," and "?" are regex symbols.
Then, I tried this:
function docReplace() {
  var body = DocumentApp.getActiveDocument().getBody();
  body.replaceText(" hello ", " R1 ");
  body.replaceText(" hello\. ", " R1. ");
  body.replaceText(" hello\, ", " R1, ");
  body.replaceText(" hello\? ", " R1? ");
}
But this still doesn't work: commas and question marks come back as periods.
I would appreciate it if anyone could help with the correct code.
You want to achieve the following replacement using Google Apps Script. In this sample, ## was used as the separator of the values.
From
## hello ##
## hello. ##
## hello, ##
## hello? ##
To
## R1 ##
## R1. ##
## R1, ##
## R1? ##
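If my understanding is correct, how about this modification? Because replaceText takes the search pattern as a string, a single backslash is consumed by the JavaScript string literal before the regex engine sees it. So in this modification, \., \, and \? are changed to \\., \\, and \\?, respectively.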
Modified script:
function docReplace() {
  var body = DocumentApp.getActiveDocument().getBody();
  body.replaceText(" hello ", " R1 ");
  body.replaceText(" hello\\. ", " R1. "); // Modified
  body.replaceText(" hello\\, ", " R1, "); // Modified
  body.replaceText(" hello\\? ", " R1? "); // Modified
}
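To illustrate why the commas and question marks came back as periods, here is a small sketch using Python's re module rather than Apps Script, so it shows only the regex behaviour and not the replaceText API. An unescaped "." matches any single character, so the period pattern also matches the comma and question-mark cases. The single capture-group pattern at the end is an alternative I am suggesting, not something from the answer above.

```python
import re

samples = [" hello ", " hello. ", " hello, ", " hello? "]

# Unescaped dot: "." matches any one character, so ",", "?" and "." are all
# replaced by the period version.
unescaped = [re.sub(" hello. ", " R1. ", s) for s in samples]

# Escaped punctuation in a character class, captured and reused in the
# replacement so each punctuation mark is preserved.
escaped = [re.sub(r" hello([.,?]?) ", r" R1\1 ", s) for s in samples]

print(unescaped)
print(escaped)
```

In a JavaScript string literal the backslash has to be doubled to reach the regex engine, which is what the modified script above does.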
Reference:
replaceText(searchPattern, replacement)
If I misunderstood your question and this was not the result you want, I apologize.

SSIS: Change row delimiter from {LF} to {CR}{LF} massively in flat file connection manager

I am new to SSIS. I have created a data flow with an OLE DB source and a flat file destination.
Initially the destination files had {LF} as the row delimiter,
but I now have to change it to {CR}{LF}. I have more than 100 flat file destinations like this.
I tried the following approaches; the second one works but is time-consuming.
I tried opening each flat file connection manager and changing the row delimiter, but Visual Studio stops responding. I tried several times with no luck.
I deleted the flat file connection manager and re-created it with the right row delimiter. That works, but I would have to do it more than 100 times.
I opened the .dtsx file in a text editor; I can find the header row delimiter but not the row delimiter.
I tried to change the row delimiter in an expression, but it does not take effect.
Is there a better way to do this?
I used this to remove CRLF:
"$text = [IO.File]::ReadAllText(" + #ic + #FullFilePath + #ic + ") -replace " +
#ic2 +"`r`n" + #ic2 + "," + #ic2 +" " + #ic2 + "; [IO.File]::WriteAllText(" +
#ic+ #FullFilePath + #ic + ", $text)"
where
#ic = '
#ic2 = """
#FullFilePath is the path returned from the For..Loop container.
Note: I copy the original file to a new folder and update the copy rather than modify the original.
I expect this would work for you if you change this code:
-replace " + #ic2 +"`r`n" + #ic2 + "," + #ic2 +" " + #ic2 + "
to
-replace " + #ic2 +"`n" + #ic2 + "," + #ic2 +"`r`n" + #ic2 + "
I developed this in VS 2008.
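The same post-processing idea can be sketched in Python, assuming it is acceptable to rewrite the generated files after the data flow runs instead of editing the connection managers. One caveat worth handling: replacing every LF with CRLF turns an existing CRLF into CR CR LF, so the pattern below matches an optional CR before the LF. The function names are hypothetical, and the copy-then-convert step mirrors the note above about not modifying the original.

```python
import re
from pathlib import Path


def lf_to_crlf(data: bytes) -> bytes:
    """Normalize LF and CRLF line endings to CRLF without doubling CRs."""
    return re.sub(rb"\r?\n", b"\r\n", data)


def convert_copy(source: Path, target: Path) -> None:
    """Write a CRLF-normalized copy of source to target."""
    target.write_bytes(lf_to_crlf(source.read_bytes()))


print(lf_to_crlf(b"a\nb\r\nc\n"))
```

Working on bytes avoids having to know each file's text encoding, as long as the encoding represents LF as the single byte 0x0A (true for ASCII-compatible encodings, not for UTF-16).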

Importing XPM graphics into an HTML5 canvas

Is this possible?
I am trying to port an old professor's demo game into a web-playable format for fun, and he had set up all the graphics in the XPM format.
Is there some way to load XPM files directly into an HTML5 canvas? I could probably get by with loading them into an image editor and converting...but I'd rather stay as true to the original source as possible.
You could probably write a parser for XPM in JavaScript and render the canvas pixels using an approach similar to this question. However, I think it would be more efficient to use something like ImageMagick and do a one-off conversion:
mogrify -format png *.xpm
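For a sense of what the parser route involves, here is a rough sketch in Python rather than JavaScript, assuming XPM data given as a list of strings: a values line, then the color table, then the pixel rows. It handles only "c #rrggbb" and "c none" color entries; named colors and other color keys are left out. The function names are hypothetical.

```python
def parse_color(spec):
    """Convert an XPM color spec to an (r, g, b, a) tuple."""
    if spec.lower() == "none":
        return (0, 0, 0, 0)  # fully transparent
    if spec.startswith("#") and len(spec) == 7:
        r, g, b = (int(spec[i:i + 2], 16) for i in (1, 3, 5))
        return (r, g, b, 255)
    # Named X11 colors and other forms are out of scope for this sketch.
    raise ValueError(f"unsupported color spec: {spec!r}")


def parse_xpm(lines):
    """Parse XPM string data into (width, height, rows of RGBA tuples)."""
    # Values line: width, height, number of colors, characters per pixel.
    width, height, ncolors, cpp = (int(v) for v in lines[0].split()[:4])

    # Color table: the first cpp characters are the key, then "c <color>".
    palette = {}
    for entry in lines[1:1 + ncolors]:
        key = entry[:cpp]
        fields = entry[cpp:].split()
        palette[key] = parse_color(fields[fields.index("c") + 1])

    # Pixel rows: look up each cpp-wide chunk in the palette.
    pixels = []
    for row in lines[1 + ncolors:1 + ncolors + height]:
        pixels.append([palette[row[i:i + cpp]]
                       for i in range(0, width * cpp, cpp)])
    return width, height, pixels


# Tiny made-up 2x2 image: transparent and white pixels on a diagonal.
tiny = ["2 2 2 1", "  c none", ". c #ffffff", " .", ". "]
width, height, pixels = parse_xpm(tiny)
print(width, height, pixels)
```

A JavaScript port would follow the same steps and then copy the RGBA values into an ImageData object for the canvas.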
I made a little plugin to do this. There's a lot to improve, but maybe it can help you. You can see the demo here: http://cortezcristian.com.ar/xpm2canvas/
You can also play with the demo in this fiddle: http://jsfiddle.net/crisboot/aXt3G/
<script src="./js/libs/jquery-1.7.1.min.js"></script>
<script src="./js/jquery.xpm2canvas.js"></script>
<script>
var pseudoXMP = [
/* <Values> */
/* <width/cols> <height/rows> <colors> <char on pixel>*/
"40 40 6 1",
/* <Colors> */
" c none",
". c #ffffff",
"X c #dadab6",
"o c #6c91b6",
"O c #476c6c",
"+ c #000000",
/* <Pixels> */
" ",
" ",
" ",
" . .X..XX.XX X ",
" .. .....X.XXXXXX XX ",
" ... ....X..XX.XXXXX XXX ",
" .. ..........X.XXXXXXXXXXX XX ",
" .... ........X..XX.XXXXXXXXX XXXX ",
" .... ..........X.XXXXXXXXXXX XXXX ",
" ooOOO..ooooooOooOOoOOOOOOOXX+++OO++ ",
" ooOOO..ooooooooOoOOOOOOOOOXX+++OO++ ",
" ....O..ooooooOooOOoOOOOOOOXX+XXXX++ ",
" ....O..ooooooooOoOOOOOOOOOXX+XXXX++ ",
" ..OOO..ooooooOooOOoOOOOOOOXX+++XX++ ",
" ++++..ooooooooOoOOOOOOOOOXX+++ +++ ",
" +++..ooooooOooOOoOOOOOOOXX+++ + ",
" ++..ooooooooOoOOOOOOOOOXX+++ ",
" ..ooooooOooOOoOOOOOOOXX+++ ",
" ..ooooooooOoOOOOOOOOOXX+++ ",
" ..ooooooOooOOoOOOOOOOXX+++ ",
" ..ooooooooOoOOOOOOOOOXX+++ ",
" ..oooooOooOOoOOOOOOXX+++ ",
" ..oooooooOoOOOOOOOOXX+++ ",
" ..ooooOooOOoOOOOOXX+++ ",
" ..ooooooOoOOOOOOOXX++++ ",
" ..o..oooOooOOoOOOOXX+XX+++ ",
" ...o..oooooOoOOOOOXX++XXX++ ",
" ....OO..ooOooOOoOOXX+++XXXX++ ",
" ...oo..+..oooOoOOOXX++XXooXXX++ ",
" ...ooo..++..OooOOoXX+++XXooOXXX+ ",
" ..oooOOXX+++....XXXX++++XXOOoOOXX+ ",
" ..oooOOXX+++ ...XXX+++++XXOOooOXX++ ",
" ..oooOXXX+++ ..XX+++ +XXOOooOXX++ ",
" .....XXX++++ XXXXXXX++ ",
" ....XX++++ XXXXXXX+ ",
" ...XX+++ XXXXX++ ",
" ",
" ",
" ",
" "];
$(document).ready(function(){
$('#xmp2canvas').xpm2canvas({xpm:pseudoXMP});
});
</script>
IIRC, the rendering context for a canvas element in this situation relies on the src attribute of an embedded img tag. As such, XPM files presumably only stand a chance of working if the browser in question supports them.
The best way to check this would be to test it. The accepted answer for this question contains some code that should help:
importing image on canvas html5