Mathematica - Import CSV and process columns? - csv

I have a CSV file that is formatted like:
0.0023709,8.5752e-007,4.847e-008
and I would like to import it into Mathematica and then have each column separated into a list so I can do some math on the selected column.
I know I can import the data with:
Import["data.csv"]
then I can separate the columns with this:
StringSplit[data[[1, 1]], ","]
which gives:
{"0.0023709", "8.5752e-007", "4.847e-008"}
The problem now is that I don't know how to get the data into individual lists and also Mathematica does not accept scientific notation in the form 8.5e-007.
Any help in how to break the data into columns and format the scientific notation would be great.
Thanks in advance.

KennyTM is correct.
data = Import["data.csv", "CSV"];
column1 = data[[All,1]]
column2 = data[[All,2]]
...

Davorak's answer is the correct one if you need to import a whole CSV file as an array. However, if you have a single string that you need to convert from the C/Fortran-style exponential notation, you can use ImportString with different arguments for the format. As an example, there's
In[1]:= ImportString["1.0e6", "List"]
Out[1]= {1.*^6}
The *^ operator is Mathematica's equivalent of the e in C-style scientific notation. Note this is also a good way to split apart strings that are in CSV form:
In[2]:= ImportString["1.0e6,3.2,foo", "CSV"]
Out[2]= {{1.*10^6,3.2,foo}}
In both cases, you'll get your answer wrapped up in an extra level of list structure, which is pretty easy to deal with. However, if you're really sure you only have or want a single number, you can turn the string into a stream and use Read. It's cumbersome enough that I'd stick to ImportString, however:
In[3]:= Module[{stream = StringToStream["1.0e6"], number},
number = Read[stream, "Number"];
Close[stream];
number]
Out[3]= 1.*10^6

You can fix the notation by using StringReplace[].
In[1]:= aa = {"0.0023709", "8.5752e-007", "4.847e-008"};
In[2]:= ToExpression[
          StringReplace[
            #,
            RegularExpression["(^\\d+\\.\\d+)e([+-]\\d+)"] -> "$1*10^$2"
          ]
        ] & /@ aa
Out[2]= {0.0023709, 8.5752*10^-7, 4.847*10^-8}
You can put the entire data array in place of aa to process it all at once with a one-liner:
{col1, col2, col3} = ToExpression[...] & /@ Transpose[Import["data.csv", "CSV"]];
with ToExpression[...] as above.
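Spelled out, a minimal sketch of that one-liner (assuming the file yields exactly three columns of strings):
{col1, col2, col3} = ToExpression[
    StringReplace[#,
      RegularExpression["(^\\d+\\.\\d+)e([+-]\\d+)"] -> "$1*10^$2"]
  ] & /@ Transpose[Import["data.csv", "CSV"]];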

In MMA7, I use the "elements" argument. In fact, I can't Import even a .csv file without specifying the element:
aa=Import["data.csv","Data"]
When you do this, all strings are automatically converted to expressions: Head /@ Flatten@aa is {Real, Real, ....}. Also, "8.5752e-007" becomes 8.5752*10^-7, a legal MMA expression.
The result of the Import is a 1xn list {{ ... }}.
So, Transpose@aa gives the nx1 list {{.},{.}, .... }.
I think this is the format you wanted.
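Put in the same column-splitting form as the other answers (a sketch, assuming three columns):
aa = Import["data.csv", "Data"];
{col1, col2, col3} = Transpose[aa];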

Related

How do I change the decimal separator in r-exams question to comma?

I'm trying to display a comma as the decimal separator in the questions that I'm creating with the "r-exams/sweave". However, I can't load the siunitx package or do any other configuration that allows this change. I intend to export the question to moodle using the function exams2moodle.
Via LaTeX packages
LaTeX packages like siunitx or icomma can only be used if the output is produced via LaTeX (e.g., exams2pdf etc.). However, when the LaTeX code is converted to HTML (e.g., as in exams2moodle) I'm not aware of a general solution of converting numbers with decimal point to decimal comma.
Via options(OutDec = ",") in R
However, what is relatively simple is to set options(OutDec = ",") within R. This works provided that:
all relevant numbers are either hard-coded in the text with a comma or produced dynamically with `r ...` (Rmd) or \Sexpr{...} (Rnw), respectively,
it is assured that the exsolution in num exercises still uses a decimal point.
As an example, consider the schoice exercise deriv2. It fulfills both items above: 1. all numbers are inserted dynamically from R, 2. it is not a num exercise.
library("exams")
options(OutDec = ",")
exams2html("deriv2.Rmd")
The same also works for the Rnw version of the exercise.
If the exercise should always produce the numbers with a comma, you can also include the options(OutDec = ",") in the first code chunk at the beginning of the exercise file and revert to options(OutDec = ".") in a code chunk at the end.
Using the num exercise deriv is also possible. But to assure item 2, you would either need to write fmt(res, decimal.mark = ".") instead of just fmt(res) in the exsolution, or alternatively revert to options(OutDec = ".") before adding the meta-information.
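For instance (a hedged sketch; fmt() comes from the exams package and respects options(OutDec) for the displayed text):
library("exams")
options(OutDec = ",")
res <- 12.3456
fmt(res, 4)                      ## "12,3456" -- fine for the question text
fmt(res, 4, decimal.mark = ".")  ## "12.3456" -- use this for the exsolution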
Via fmt(..., decimal.mark = "{,}") in R
One small disadvantage of the approach above is that viewers with an attention to detail might notice that in LaTeX math mode $...$ a small space is inserted after the comma.
If you want to avoid this, then {,} needs to be used as the decimal separator. Unfortunately, options(OutDec) does not support this as it needs to be a string of length 1. Also, OutDec might not be enough because numbers in math mode need {,} while numbers in plain text need just ,.
In this case the easiest solution is to leave options(OutDec) at the system default. Instead use fmt(..., decimal.mark = "{,}") for numbers within math mode and fmt(..., decimal.mark = ",") in plain text. To reduce typing you could also add two convenience functions, say:
cfmt <- function(x, ...) fmt(x, ..., decimal.mark = ",")
mfmt <- function(x, ...) fmt(x, ..., decimal.mark = "{,}")
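Usage would then look like this (hypothetical values):
cfmt(pi, 4)   ## "3,1416" for plain text
mfmt(pi, 4)   ## "3{,}1416" for LaTeX math mode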
Instead of building on the exams::fmt() function you could also use the function base::format(..., decimal.mark = ...) in case that you want to handle any rounding yourself.
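For example, a minimal base::format() sketch, handling the rounding yourself:
format(round(12.3456, 2), nsmall = 2, decimal.mark = ",")   ## "12,35"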
Requirements
Note that passing decimal.mark to fmt() requires at least version 2.4-0 of R/exams.

Apache Nifi: Replacing values in a column using Update Record Processor

I have a csv, which looks like this:
name,code,age
Himsara,9877,12
John,9437721,16
Razor,232,45
I have to replace the column code according to some regular expressions. My logic is shown in a Scala code below.
if(str.trim.length == 9 && str.startsWith("369")){"PROB"}
else if(str.trim.length < 8){"SHORT"}
else if(str.trim.startsWith("94")){"LOCAL"}
else{"INT"}
I used a UpdateRecord Processor to replace the data in the code column. I added a property called /code which contains the value.
${field.value:replaceFirst('^[0-9]{1,8}$','SHORT'):replaceFirst('[94]\w+','OFF_NET')}
This works when replacing code's with
length less than 8 with "SHORT"
starting with 94 with "LOCAL"
I am unable to find a way to replace data in the column code when it's equal to 9 digits AND when it starts with 369. Also, how can I replace the data if it doesn't fall into any of the conditions mentioned above? (The situation in which the data should be replaced with INT.)
Hope you can suggest a workflow or value to be added to the property in Update record to make the above two replacements happen.
There are length and startsWith functions.
${field.value:length():lt(8):ifElse(
'SHORT', ${field.value:startsWith('94'):ifElse(
'LOCAL', ${field.value:length():equals(9):and(${field.value:startsWith('369')}):ifElse(
'PROB', 'INT'
)})})}
I have put in the line breaks to make the functions easier to recognize, but they should be removed.
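Collapsed onto a single line for the /code property value, the same expression reads:
${field.value:length():lt(8):ifElse('SHORT', ${field.value:startsWith('94'):ifElse('LOCAL', ${field.value:length():equals(9):and(${field.value:startsWith('369')}):ifElse('PROB', 'INT')})})}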
By the way, does the INT mean there is some string value to replace? Sorry for the confusion.
Well, if you want to use regular expressions only, you can try the code below.
${field.value
:replaceFirst('[0-9]{1,8}', 'SHORT')
:replaceFirst('[94]\w+', 'OFF_NET')
:replaceFirst('369[0-9]{6}', 'PROB')
:replace(${field.value}, 'INT')
}

Web2Py - generic view for csv?

For web2py there are generic views e.g. for JSON.
I could not find a sample.
When looking at the web2py manual, sections 10.1.2 and 10.1.6, it's written:
'.. define a "generic.csv" file, but one would have to specify the name of the object to be serialized ("animals" in the example)'
Looking at the generic pdf view
{{
import os
from gluon.contrib.generics import pdf_from_html
filename = '%s/%s.html' % (request.controller,request.function)
if os.path.exists(os.path.join(request.folder,'views',filename)):
html=response.render(filename)
else:
html=BODY(BEAUTIFY(response._vars))
pass
=pdf_from_html(html)
}}
and also the specified csv (manual chapter 10.1.6):
{{
import cStringIO
stream=cStringIO.StringIO()
animals.export_to_csv_file(stream)
response.headers['Content-Type']='application/vnd.ms-excel'
response.write(stream.getvalue(), escape=False)
}}
Massimo writes: 'web2py does not provide a "generic.csv";'
He is not fully against it, but...
So let's try to create one and deactivate it when necessary.
The generic view should look something like the following (non-working code, better called pseudocode):
{{
import os
from gluon.contrib.generics export export_to_csv_file(stream)
filename = '%s/%s' % (request.controller,request.function)
if os.path.exists(os.path.join(request.folder,'views',filename)):
csv=response.render(filename)
else:
csv=BODY(BEAUTIFY(response._vars))
pass
= export_to_csv_file(stream)
}}
What's wrong?
Or is there a sample?
Is there a reason not to have a generic CSV?
Adapting the generic.pdf code so literally as above would not work for CSV output, as the generic.pdf code is first executing the standard HTML template and then simply converting the generated HTML to a PDF. This approach does not make sense for CSV, as CSV requires data of a particular structure.
As stated in the documentation:
Notice that one could also define a "generic.csv" file, but one would
have to specify the name of the object to be serialized ("animals" in
the example). This is why we do not provide a "generic.csv" file.
The execution of a view is triggered by a controller action returning a dictionary. The keys of the dictionary become available as variables in the view execution environment (the entire dictionary is also available as response._vars). If you want to create a generic.csv view, you therefore need to establish some conventions about what variables are in the returned dictionary as well as the possible structure(s) of the returned data.
For example, the controller could return something like dict(data=mydata). The code in generic.csv would then access the data variable and could convert it to CSV. In that case, there would have to be some convention about the structure of data -- perhaps it could be required to be a list of dictionaries or a DAL Rows object (or optionally either one).
Another possible convention is for the controller to return something like dict(columns=mycolumns, rows=myrows), where columns is a list of column names and rows is a list of lists containing the data for each row.
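As a hedged sketch of that second convention (assuming columns and rows are plain Python lists), generic.csv could look like:
{{
import cStringIO, csv
stream = cStringIO.StringIO()
writer = csv.writer(stream)
writer.writerow(columns)
writer.writerows(rows)
response.headers['Content-Type'] = 'application/vnd.ms-excel'
response.write(stream.getvalue(), escape=False)
}}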
The point is, there is no universal convention for what the controller might return and how that can be converted into CSV, so you first need to decide on some conventions and then write generic.csv accordingly.
For example, here is a very simple generic.csv that would work only if the controller returns dict(rows=myrows), where myrows is a DAL Rows object:
{{
import cStringIO
stream=cStringIO.StringIO()
rows.export_to_csv_file(stream)
response.headers['Content-Type']='application/vnd.ms-excel'
response.write(stream.getvalue(), escape=False)
}}
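On the controller side, a matching action could then be as simple as this sketch (the table name animal is hypothetical; requesting the action with a .csv extension, e.g. /default/animals.csv, selects the view when generic views are enabled):
def animals():
    rows = db(db.animal).select()
    return dict(rows=rows)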
I tried:
# Sample from Web2Py manual 10.1.1, page 464
def count():
    session.counter = (session.counter or 0) + 1
    return dict(counter=session.counter, now=request.now)

# and my own creation from a SQL table (if possible, used for both json and csv):
def csv_rt_bat_c_x():
    battdat = db().select(db.csv_rt_bat_c.rec_time, db.csv_rt_bat_c.cellnr,
                          db.csv_rt_bat_c.volt_act, db.csv_rt_bat_c.id).as_list()
    return dict(battdat=battdat)

Both times I get an error when trying csv. It works for /default/count.json but not for /default/count.csv.
I suppose the requirement:
dict(rows=myrows)
"where myrows is a DAL Rows object" is not met.

SSIS Processing money fields with what looks like signs over the last digit

I have a fixed length flat file input file. The records look like this
40000003858172870114823 0010087192017092762756014202METFORMIN HCL ER 500 MG 0000001200000300900000093E00000009E00000000{0000001{00000104{JOHN DOE 196907161423171289 2174558M2A2 000 xxxx YYYYY 100000000000 000020170915001 00010000300 000003zzzzzz 000{000000000{000000894{ aaaaaaaaaaaaaaa P2017092700000000{00000000{00000000{00000000{ 0000000{00000{ F89863 682004R0900001011B2017101109656 500 MG 2017010100000000{88044828665760
If you look just before the JOHN DOE you will see a field that represents a money field. It looks like 00000104{.
This looks like the type of field I used to process from a mainframe many years ago. How do I handle this in SSIS. If the { on the end is in fact a 0, then I want the field to be a string that reads 0000010.40.
I have other money fields that are, e.g. 00000159E. If my memory serves me correctly, that would be 00000015.95.
I can't find anything on how to do this transform.
Thanks,
Dick Rosenberg
Import the values as strings:
00000159E
00000104{
In a Derived Column, do your transforms with REPLACE:
REPLACE(REPLACE(col,"E","5"),"{","0")
In another Derived Column, cast to money and divide by 100:
(DT_CY)(drvCol) / 100
I think you will need to either use a Script Component source in the data flow, or use a Derived Column transformation or Script Component transformation. I'd recommend a Script Component either way as it sounds like your custom logic will be fairly complex.
I have written a few detailed answers about how to implement a Script component source:
SSIS import a Flat File to SQL with the first row as header and last row as a total
How can I load in a pipe (|) delimited text file that has columns that sometimes contain line breaks?
Essentially, you need to locate the string, "00000104{", for example, and then convert it into decimal/money form before adding it into the data flow (or during it if you're using a Derived Column Transformation).
This could also be done in a Script Component transformation, which would function in a similar way to the Derived Column transformation, only you'd perhaps have a bit more scope for complex logic. Also in a Script Component transformation (as opposed to a source), you'd already have all of your other fields in place from the Flat File Source.
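For reference, the trailing character is a standard EBCDIC signed-overpunch digit ({ = +0, A through I = +1 to +9, } = -0, J through R = -1 to -9). A hedged C# sketch of the decoding logic you might put inside such a Script Component (the method name is hypothetical):
// Decode a signed-overpunch money field:
// DecodeOverpunch("00000104{") == 10.40m, DecodeOverpunch("00000159E") == 15.95m
static decimal DecodeOverpunch(string raw)
{
    char last = raw[raw.Length - 1];
    bool negative = false;
    int lastDigit;
    if (last == '{') lastDigit = 0;                                  // '{' encodes +0
    else if (last == '}') { lastDigit = 0; negative = true; }        // '}' encodes -0
    else if (last >= 'A' && last <= 'I') lastDigit = last - 'A' + 1; // A-I encode +1..+9
    else if (last >= 'J' && last <= 'R') { lastDigit = last - 'J' + 1; negative = true; } // J-R encode -1..-9
    else lastDigit = last - '0';                                     // plain unsigned digit
    // rebuild the digit string, then divide by 100 to place the implied decimal point
    decimal value = decimal.Parse(raw.Substring(0, raw.Length - 1) + lastDigit) / 100m;
    return negative ? -value : value;
}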

Readtimearray function in Julia TimeSeries package

I would like to read a csv file of the following form with readtimearray:
"","ES1 Index","VG1 Index","TY1 Comdty","RX1 Comdty","GC1 Comdty"
"1999-01-04",1391.12,3034.53,66.515625,86.2,441.39
"1999-01-05",1404.86,3072.41,66.3125,86.17,440.63
"1999-01-06",1435.12,3156.59,66.4375,86.32,441.7
"1999-01-07",1432.32,3106.08,66.25,86.22,447.67
"1999-01-08",1443.81,3093.46,65.859375,86.36,447.06
"1999-01-11",1427.84,3005.07,65.71875,85.74,449.5
"1999-01-12",1402.33,2968.04,65.953125,86.31,442.92
"1999-01-13",1388.88,2871.23,66.21875,86.52,439.4
"1999-01-14",1366.46,2836.72,66.546875,86.73,440.01
However, here's what I get when I evaluate readtimearray("myfile.csv")
ERROR: `convert` has no method matching convert(::Type{UTF8String}, ::Float64)
in push! at array.jl:460
in readtimearray at /home/juser/.julia/v0.3/TimeSeries/src/readwrite.jl:25
What is it that I am not seeing?
That looks like a bug in readtimearray.
Empty lines are removed but, to identify them,
the code only looks at the first column.
Since the header has an empty string in the first column, it is removed...
Changing the header of your file to
"date","ES1 Index","VG1 Index","TY1 Comdty","RX1 Comdty","GC1 Comdty"
addresses the problem.
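After that header fix, the original call should work as-is:
using TimeSeries
ta = readtimearray("myfile.csv")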
You're using convert, which is meant for use with Julia types (see the docs for more info).
You can parse the strings using Date:
d=Date("1999-04-01","yyyy-mm-dd")
#...
array_of_dates = map(x->Date(x,"yyyy-mm-dd"),array_of_strings)