How do I change the decimal separator in R/exams questions to a comma?

I'm trying to display a comma as the decimal separator in the questions that I'm creating with R/exams (Sweave format). However, I can't load the siunitx package or do any other configuration that allows this change. I intend to export the question to Moodle using the function exams2moodle().

Via LaTeX packages
LaTeX packages like siunitx or icomma can only be used if the output is produced via LaTeX (e.g., exams2pdf etc.). However, when the LaTeX code is converted to HTML (e.g., as in exams2moodle), I'm not aware of a general solution for converting numbers with a decimal point to a decimal comma.
Via options(OutDec = ",") in R
However, what is relatively simple is to set options(OutDec = ",") within R. This works provided that:
all relevant numbers are either hard-coded in the text with a comma or produced dynamically with `r ...` (Rmd) or \Sexpr{...} (Rnw), respectively,
it is assured that the exsolution in num exercises still uses a decimal point.
As an example, consider the schoice exercise deriv2. It fulfills both items above: (1) all numbers are inserted dynamically from R, (2) it is not a num exercise.
library("exams")
options(OutDec = ",")
exams2html("deriv2.Rmd")
The same also works for the Rnw version of the exercise.
If the exercise should always produce the numbers with a comma, you can also include the options(OutDec = ",") in the first code chunk at the beginning of the exercise file and revert to options(OutDec = ".") in a code chunk at the end.
Using the num exercise deriv is also possible. But to assure item (2), you would either need to write fmt(res, decimal.mark = ".") instead of just fmt(res) in the exsolution, or alternatively revert to options(OutDec = ".") before adding the meta-information.
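As a rough sketch of that layout in an Rmd num exercise (the variable res and its computation are purely illustrative, not taken from the actual deriv exercise):
First code chunk:
options(OutDec = ",")
res <- round(runif(1, 1, 10), 3)
Meta-information (forcing the decimal point for num exercises):
exsolution: `r fmt(res, decimal.mark = ".")`
Last code chunk:
options(OutDec = ".")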
Via fmt(..., decimal.mark = "{,}") in R
One small disadvantage of the approach above is that readers with an eye for detail might notice that in LaTeX math mode $...$ a small space is inserted after the comma.
If you want to avoid this, then {,} needs to be used as the decimal separator. Unfortunately, options(OutDec) does not support this as it needs to be a string of length 1. Also, OutDec might not be enough because numbers in math mode need {,} while numbers in plain text need just ,.
In this case the easiest solution is to leave options(OutDec) at the system default. Instead use fmt(..., decimal.mark = "{,}") for numbers within math mode and fmt(..., decimal.mark = ",") in plain text. To reduce typing you could also add two convenience functions, say:
cfmt <- function(x, ...) fmt(x, ..., decimal.mark = ",")
mfmt <- function(x, ...) fmt(x, ..., decimal.mark = "{,}")
Instead of building on the exams::fmt() function you could also use base::format(..., decimal.mark = ...) in case you want to handle any rounding yourself.
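For example, with the two helpers above (res is again just an illustrative variable), a sentence in an Rmd exercise could read:
The result is $x = `r mfmt(res, 3)`$, i.e. about `r cfmt(res, 3)`.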
Requirements
Note that passing decimal.mark to fmt() requires at least version 2.4-0 of R/exams.


SSIS Derived Column - Parse text between line breaks

I have a text field from a SQL Server Source. It is a phone number field that typically has this format:
Home: 555-555-1212
Work: 555-555-1212
Cell: 555-555-1212
Emergency: 555-555-1212
I'm trying to split this among fields so that only 555-555-1212 is displayed.
I am then taking this field and converting it to a string. There are literal line breaks (\r\n) between the labels here. The goal is to have this data split among multiple fields (home, work, cell, emergency, etc.). I was researching how to split text among fields and I made some progress. In the case of home numbers, I used this logic:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"Home:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"Home:",1) - 5)
This works great as it parses up to the line break and I get 555-555-1212.
Now I experience an issue when searching for text between line breaks. I tried the same logic for Work numbers:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"Work:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"Work:",1) - 5)
But that won't work and results in rows being written to my error redirection file. I then tried to include a leading line break to find the text at the beginning:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"\nWork:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"\nWork:",1) - 5)
No luck there either. Any ideas on how I can address this? Also, I would appreciate an idea of how to handle the Emergency label at the end. There won't be a line break after it, but I still want to parse the text.
I look at your data and I see
Home:|555-555-1212|Work:|555-555-1212|Cell:|555-555-1212|Emergency:|555-555-1212
I'm using the pipe character, |, as a placeholder for where I would segment that string, which is basically wherever you have whitespace (space, tab, newline, etc.).
There are two approaches to this. I'll start with the easy one.
Script Component
String.Split is your friend here: called with no arguments, it splits on every whitespace character, which tokenizes that source data neatly.
I added a new Script Component, acting as a transformation, and created 4 output columns, all strings of length 12, codepage 1252: Home, Work, Cell, and Emergency. I populate them like so:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Split() with no arguments splits on every whitespace character.
    // Each \r\n pair yields an empty entry between the blocks, so the
    // phone numbers land at indices 1, 4, 7 and 10.
    string[] split = Row.PhoneData.Split();
    Row.Home = split[1];
    Row.Work = split[4];
    Row.Cell = split[7];
    Row.Emergency = split[10];
}
Derived Column
I'm not going to build out a full-blown implementation of this. The script approach above is much simpler, but I run into situations where ETL devs say they aren't allowed to use Script tasks/components, and that's usually because people reached for them first instead of last.
The approach here is to have several Derived Column components on your Data Flow. It won't hurt performance, and in fact it can make things easier. It will definitely make your debugging easier, as you'll have plenty of that to do.
DER Find Colons
This would add 4 columns into the data flow: HomeColonPosition, WorkColonPosition, etc. You've already started down this path, but build it out in the actual data flow, as you'll need to reference these positions; again, it's easier to fix the calculation that populates one column than a calculation that's wrong and used everywhere. You're likely to find that 4 derived columns are useful here, as you'd want to use the previous colon's position as the starting point for the third argument to FINDSTRING.
Thus, instead of Work being
FINDSTRING(PhoneData, ":", FINDSTRING(PhoneData, ":", 1) + 1)
it would just be
FINDSTRING(PhoneData, ":", HomeColonPosition + 1)
Just from knowing the positions of the 4 colons in that string, I can figure out where the phone numbers are (maybe). The position of the colon + 2 (skipping the colon and the space) is the starting point; then go out 12 characters.
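For example, extracting the work number would then be a short expression along these lines (assuming the column names above and numbers that are always exactly 12 characters):
SUBSTRING(PhoneData, WorkColonPosition + 2, 12)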
Where this approach gets ugly, much as it did with the script approach is when that data isn't consistent.

Can I read the rest of the line after a positive value of IOSTAT?

I have a file with 13 columns and 41 lines consisting of the coefficients for the Joback Method for 41 different groups. Some of the values are non-existent, though, and the table lists them as "X". I saved the table as a .csv and in my code read the file into an array. An excerpt of two lines from the .csv (the second one contains non-existent coefficients) looks like this:
48.84,11.74,0.0169,0.0074,9.0,123.34,163.16,453.0,1124.0,-31.1,0.227,-0.00032,0.000000146
X,74.6,0.0255,-0.0099,X,23.61,X,797.0,X,X,X,X,X
What I've tried doing was to read and define an array to hold each IOSTAT value so I can know if an "X" was read (that is, IOSTAT would be positive):
DO I = 1, 41
  READ(25, *, IOSTAT=ReadStatus(I,J)) (JobackCoeff(I,J), J = 1, 13)
END DO
The problem, I've found, is that if the first value of the line to be read is "X", producing a positive value of ReadStatus, then the rest of the values on that line are not read correctly.
My intent was to use the ReadStatus array to produce an error message if JobackCoeff(I,J) caused a read error, therefore pinpointing the "X"s.
Can I force the program to keep reading a line after there is a reading error? Or is there a better way of doing this?
As soon as an error occurs during input execution, processing of the input list terminates. Further, all variables specified in the input list become undefined. The short answer to your first question is: no, there is no way to keep reading a line after a reading error.
We come, then, to the usual answer when more complicated input processing is required: read the line into a character variable and process that. I won't write complete code for you (mostly because it isn't clear exactly what is required), but when you have a character variable you may find the index intrinsic useful. With this you can locate Xs (with repeated calls on substrings to find all of them on a line).
Alternatively, if you provide an explicit format (rather than relying on list-directed (fmt=*) input) you may be able to do something with non-advancing input (advance='no' in the read statement). However, as soon as an error condition comes about then the position of the file becomes indeterminate: you'll also have to handle this. It's probably much simpler to process the line-as-a-character-variable.
An outline of the concept (without declarations, robustness) is given below.
read(iunit, '(A)') line
idx = 1
do i = 1, 13
  read(line(idx:), *, iostat=iostat) x(i)
  if (iostat.gt.0) then
    print '("Column ",I0," has an X")', i
    x(i) = -HUGE(0.)  ! Recall x(i) was left undefined
  end if
  idx = idx + INDEX(line(idx:), ',')
end do
An alternative, long used by many Fortran programmers (and programmers in other languages), would be to use an editor of some sort (I like sed) and modify the file by changing all the Xs to NaN. Your compiler has to provide support for IEEE NaNs for this to work (most of the current crop in widespread use do), and list-directed input will then correctly interpret NaN in the input file as a real number with value NaN.
This approach has the benefit, compared with the already accepted (and perfectly good) answer, of not requiring clever programming in Fortran to parse input lines containing mixed entries. Use an editor for string processing, use Fortran for reading numbers.
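A minimal sketch of such an edit, assuming GNU sed and an illustrative file name (the \b word boundaries keep Xs inside other text intact; write to a new file rather than editing in place):
sed 's/\bX\b/NaN/g' joback.csv > joback_clean.csv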

COBOL code to replace characters by html entities

I want to replace the characters '<' and '>' by &lt; and &gt; with COBOL. I was wondering about the INSPECT statement, but it looks like that statement can only be used to translate one char to another. My intention is to replace all HTML characters by their HTML entities.
Can anyone figure out some way to do it? Maybe looping over the string and testing each char is the only way?
GnuCOBOL or IBM COBOL examples are welcome.
My best code is something like this (http://ideone.com/MKiAc6):
IDENTIFICATION DIVISION.
PROGRAM-ID. HTMLSECURE.
ENVIRONMENT DIVISION.
DATA DIVISION.
WORKING-STORAGE SECTION.
77 INPTXT PIC X(50).
77 OUTTXT PIC X(500).
77 I PIC 9(4) COMP VALUE 1.
77 P PIC 9(4) COMP VALUE 1.
PROCEDURE DIVISION.
MOVE 1 TO P
MOVE '<SCRIPT> TEST TEST </SCRIPT>' TO INPTXT
PERFORM VARYING I FROM 1 BY 1
UNTIL I EQUAL LENGTH OF INPTXT
EVALUATE INPTXT(I:1)
WHEN '<'
MOVE "<" TO OUTTXT(P:4)
ADD 4 TO P
WHEN '>'
MOVE ">" TO OUTTXT(P:4)
ADD 4 TO P
WHEN OTHER
MOVE INPTXT(I:1) TO OUTTXT(P:1)
ADD 1 TO P
END-EVALUATE
END-PERFORM
DISPLAY OUTTXT
STOP RUN
.
GnuCOBOL (yes, another name branding change) has an intrinsic function extension, FUNCTION SUBSTITUTE.
move function substitute(inptxt, ">", "&gt;", "<", "&lt;") to where-ever-including-inptxt
It takes a subject string, and pairs of patterns and replacements. (These are not regex patterns; it's straight-up text matching.) See http://opencobol.add1tocobol.com/gnucobol/#function-substitute for some more details. The patterns and replacements can all be different lengths.
As intrinsic functions return anonymous COBOL fields, the result of the function can be used to overwrite the subject field, without worry of sliding overlap or other "change while reading" problems.
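Applied to the fields from the program above, a minimal sketch (field names taken from the question; GnuCOBOL assumed) would be:
MOVE FUNCTION SUBSTITUTE(INPTXT, "<", "&lt;", ">", "&gt;") TO OUTTXT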
COBOL is a language of fixed-length fields. So no, INSPECT is not going to be able to do what you want.
If you need this for an IBM Mainframe, your SORT product (assuming sufficiently up-to-date) can do this using FINDREP.
If you look at the XML processing possibilities in Enterprise COBOL, you will see that they do exactly what you want (I'd guess). GnuCOBOL can also readily interface with lots of other things. If you are writing GnuCOBOL for running on a non-Mainframe, I'd suggest you ask on the GnuCOBOL part of SourceForge.
Otherwise, yes, it would come down to looping through the data. Once you clarify what you want a bit more, you may get examples of that if you still need them.

how do i decode/encode the url parameters for the new google maps?

I'm trying to figure out how to extract the lat/long of the start/end in a Google Maps directions link that looks like this:
https://www.google.com/maps/preview#!data=!1m4!1m3!1d189334!2d-96.03687!3d36.1250439!4m21!3m20!1m4!3m2!3d36.0748342!4d-95.8040972!6e2!1m5!1s1331-1399+E+14th+St%2C+Tulsa%2C+OK+74120!2s0x87b6ec9a1679f9e5%3A0x6e70df70feebbb5e!3m2!3d36.1424613!4d-95.9736986!3m8!1m3!1d189334!2d-96.03687!3d36.1250439!3m2!1i1366!2i705!4f13.1&fid=0
I'm guessing the "!" is a separator between variables, each followed by XY where X is a number and Y is a lowercase letter, but I cannot quite figure out how to reliably extract the coordinates, as the number/order of variables changes as well as their XY prefixes.
Ideas? Thanks
Well, this is old, but hey. I've been working on this a bit myself, so here's what I've figured out:
The data is an encoded javascript array, so the trick when trying to generate your own data string is to ensure that your formatting keeps the structure of the array intact. To do this, let's look at what each step represents.
As you've correctly figured out, each exclamation point defines the start of a value definition. The first character, an int value, is an inner count and (I believe) acts as an identifier, although I'm not 100% certain of this. It seems to be pretty flexible in terms of what you can have here, as long as it's an int. The second character, however, is much more important: it defines the data type of the value. I don't know if I've found all the data types yet, but the ones I have figured out are:
m: matrix
f: float
d: double
i: integer
b: boolean
e: enum (as integer)
s: string
u: unsigned int
x: hexadecimal value?
The remaining characters hold the value itself, so a string will just hold the string, a boolean will be '1' or '0', and so on. However, there's an important gotcha: the matrix data type.
The value of the matrix will be an integer. This is the length of the matrix, measured in the number of values. That is, for a matrix !1mx, the next x value definitions will belong to the matrix. This includes nested matrix definitions, so a matrix of the form [[1,2]] would look like !1m3!1m2!1i1!2i2 (the outer matrix has three children, the inner matrix has two). This also means that, in order to remove a value from the list, you must also check it for matrix ancestors and, if they exist, update their values to reflect the now-missing member.
The x data type is another anomaly. I'm going to guess it's hexadecimal-encoded for most purposes, but in my particular situation (making a call for attribution info), they appear to also use the x data type to store lat/long information, and this is NOT encoded in hex but is an unsigned long with the value set as
value = coordinate<0 ? (430+coordinate)*1e7 : coordinate*1e7
An example (pulled directly from google maps) of the x data type being used in this way:
https://www.google.com/maps/vt?pb=!1m8!4m7!2u7!5m2!1x405712614!2x3250870890!6m2!1x485303036!2x3461808386!2m1!1e0!2m20!1e2!2spsm!4m2!1sgid!2sznfCVopRY49wPV6IT72Cvw!4m2!1ssp!2s1!8m11!13m9!2sa!15b1!18m5!2b1!3b0!4b1!5b0!6b0!19b1!19u12!3m1!5e1105!4e5!18m1!1b1
For the context of the question asked, it's important to note that there are no reliable identifiers in the structure. Google reads the values in a specific order, so always keep in mind when building your own encoded data that order matters; you'll need to do some research/testing to determine that order. As for reading, your best hope is to rebuild the matrix structure, then scan it for something that looks like lat/long values (i.e., a matrix containing exactly two children of type double (or x?)); a tokenizer sketch follows below.
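As a rough starting point for that rebuild, here is a hedged JavaScript sketch (all names are mine; it only tokenizes the string, and the nesting/lat-long detection described above is left to the caller):
function parseBangTokens(data) {
  // Each token looks like <id><type><value>, e.g. "3d36.1250439".
  var tokens = [];
  data.split('!').forEach(function (tok) {
    var m = /^(\d+)([a-z])(.*)$/.exec(tok);
    if (m) {
      tokens.push({ id: parseInt(m[1], 10), type: m[2], value: m[3] });
    }
  });
  return tokens;
}
// Example: parseBangTokens('1m3!1d189334!2d-96.03687!3d36.1250439')
// yields one m token (child count 3) followed by three d (double) tokens.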
Looks like the developer tools from current browsers (I am using Chrome for that) can give you a lot of info.
Try the following:
Go to Google Maps with Chrome (or adapt the instructions for another browser);
Open Developer Tools (Ctrl + Shift + I);
Go to the Network tab and clear the currently displayed values;
Drag the map until some url with encoded data appears;
Click on that url, and then go to the Preview sub-tab;
Try this.
function URLtoLatLng(url) {
this.lat = url.replace(/^.+!3d(.+)!4d.+$/, '$1');
this.lng = url.replace(/^.+!4d(.+)!6e.+$/, '$1');
return this;
}
var url = new URLtoLatLng('https://www.google.com/maps/preview#!data=!1m4!1m3!1d189334!2d-96.03687!3d36.1250439!4m21!3m20!1m4!3m2!3d36.0748342!4d-95.8040972!6e2!1m5!1s1331-1399+E+14th+St%2C+Tulsa%2C+OK+74120!2s0x87b6ec9a1679f9e5%3A0x6e70df70feebbb5e!3m2!3d36.1424613!4d-95.9736986!3m8!1m3!1d189334!2d-96.03687!3d36.1250439!3m2!1i1366!2i705!4f13.1&fid=0');
console.log(url.lat + ' ' + url.lng);

Mathematica - Import CSV and process columns?

I have a CSV file that is formatted like:
0.0023709,8.5752e-007,4.847e-008
and I would like to import it into Mathematica and then have each column separated into a list so I can do some math on the selected column.
I know I can import the data with:
Import["data.csv"]
then I can separate the columns with this:
StringSplit[data[[1, 1]], ","]
which gives:
{"0.0023709", "8.5752e-007", "4.847e-008"}
The problem now is that I don't know how to get the data into individual lists, and also Mathematica does not accept scientific notation in the form 8.5e-007.
Any help in how to break the data into columns and format the scientific notation would be great.
Thanks in advance.
KennyTM is correct.
data = Import["data.csv", "CSV"];
column1 = data[[All,1]]
column2 = data[[All,2]]
...
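Once the columns are separated you can do your math directly on them, e.g. (using column2 from above):
Mean[column2]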
Davorak's answer is the correct one if you need to import a whole CSV file as an array. However, if you have a single string that you need to convert from the C/Fortran-style exponential notation, you can use ImportString with different arguments for the format. As an example, there's
In[1]:= ImportString["1.0e6", "List"]
Out[1]= {1.*^6}
The *^ operator is Mathematica's equivalent of the e. Note this is also a good way to split apart strings that are in CSV form:
In[2]:= ImportString["1.0e6,3.2,foo", "CSV"]
Out[2]= {{1.*10^6,3.2,foo}}
In both cases, you'll get your answer wrapped up in an extra level of list structure, which is pretty easy to deal with. However, if you're really sure you only have or want a single number, you can turn the string into a stream and use Read. It's cumbersome enough that I'd stick to ImportString:
In[3]:= Module[{stream = StringToStream["1.0e6"], number},
number = Read[stream, "Number"];
Close[stream];
number]
Out[3]= 1.*10^6
You can fix the notation by using StringReplace[].
In[1]:= aa = {"0.0023709", "8.5752e-007", "4.847e-008"};
In[2]:= ToExpression[
          StringReplace[
            #,
            RegularExpression@"(^\\d+\\.\\d+)e([+-]\\d+)" -> "$1*10^$2"
          ]
        ] & /@ aa
Out[2]= {0.0023709, 8.5752*10^-7, 4.847*10^-8}
You can put the entire data array in place of aa to process it all at once with a one-liner:
{col1, col2, col3} = ToExpression[...] & /@ Transpose[Import["data.csv", "CSV"]];
with ToExpression[...] as above.
In MMA7, I use the "elements" argument. In fact, I can't Import even a .csv file without specifying the element:
aa=Import["data.csv","Data"]
When you do this, all strings are automatically converted to expressions: Head /@ Flatten@aa is {Real, Real, ....}. Also, "8.5752e-007" becomes 8.5752*10^-7, a legal MMA expression.
The result of the Import is a 1xn list {{ ... }}.
So, Transpose@aa gives the nx1 list {{.},{.}, .... }.
I think this is the format you wanted.