How to read "select all that apply" ACASI data into SAS? - binary

I'm using Audio Computer-Assisted Self-Interview (ACASI) data (http://www.novaresearch.com/QDS/) that includes some multiple choice, "select all that apply" values coded as binary (0100010) depending on what the participants chose:
Raw binary data for "What kind of insurance do you have? Please select all that apply."
What is the easiest way to read this data into SAS so that it understands that multiple values were selected per participant?
Note: I looked at the answer here, How to clean and re-code check-all-that-apply responses in R survey data?, but wonder if--since my data is already binary--I can just read it into SAS as is? I'm also not sure what the syntax is for SAS, since it obviously varies from R.
Thanks!

To split the TextDisplay variable into 12 separate variables, declare an array for the new variables and loop over it, assigning one character of the string to each element. For example:
data result;
    TextDisplay = "001001001000";
    /* q1-q12 each hold one character ('0' or '1') per answer option */
    array tab(*) $1 q1-q12;
    do i = 1 to dim(tab);
        tab(i) = substr(TextDisplay, i, 1);
    end;
    drop i;
run;

Read every nth batch in pyarrow.dataset.Dataset

In Pyarrow now you can do:
import pyarrow.dataset as ds
a = ds.dataset("blah.parquet")
b = a.to_batches()
first_batch = next(b)
What if I want the iterator to return every Nth batch instead of every batch? It seems like this could be something in FragmentScanOptions, but that's not documented at all.
No, there is no way to do that today. I'm not sure what you're after but if you are trying to sample your data there are a few choices but none that achieve quite this effect.
To load only a fraction of your data from disk (the first num_rows rows), you can use pyarrow.dataset.Dataset.head.
There is a request in place for randomly sampling a dataset although the proposed implementation would still load all of the data into memory (and just drop rows according to some random probability).
Update: If your dataset is only parquet files then there are some rather custom parts and pieces that you can cobble together to achieve what you want.
import pyarrow.dataset as ds
a = ds.dataset("blah.parquet")
# Split every file into one fragment per row group
all_fragments = []
for fragment in a.get_fragments():
    for row_group_fragment in fragment.split_by_row_group():
        all_fragments.append(row_group_fragment)
# Keep every second row group; use [::N] to keep every Nth
sampled_fragments = all_fragments[::2]
# Have to construct the sample dataset manually
sampled_dataset = ds.FileSystemDataset(sampled_fragments, schema=a.schema, format=a.format)
# Iterator which will only return some of the batches
# of the source dataset
sampled_dataset.to_batches()
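If the goal is only to process every Nth batch, and it is acceptable that the skipped batches are still read, a plain iterator filter also works. Here is a minimal sketch, assuming a helper named every_nth (my own name, not a pyarrow API). It is shown on a plain range so that it runs without a dataset, but any iterator can be passed in, including the one returned by to_batches():

```python
from itertools import islice

def every_nth(batches, n, start=0):
    """Yield every n-th item of an iterator, beginning at index `start`.

    Nothing is skipped at the I/O level: each item is still produced by
    the source iterator and then discarded here.
    """
    return islice(batches, start, None, n)

# Stand-in for dataset.to_batches(); with pyarrow this would be
# every_nth(a.to_batches(), 3)
sampled = list(every_nth(range(10), 3))
print(sampled)
```

Unlike the row-group approach above, this saves no disk reads; it only saves whatever work you would have done on the dropped batches.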

PsychoPy: how to avoid storing variables in the CSV file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like NOT to be included. Some of them I decided to include in the CSV myself, but many others ended up in it automatically.
Is there a way to manually force (from the code block) the exclusion of some variables from the CSV?
Is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important, and I know I could just create an output file myself instead of using PsychoPy's, or easily clean it up afterwards, but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
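As a rough illustration of handling this at the analysis stage, here is a minimal sketch using only Python's standard csv module. The function name select_columns and the column names (trial, debug_flag, rt, response) are made up for the example and are not what PsychoPy actually writes:

```python
import csv
import io

def select_columns(src, dst, columns):
    """Copy a CSV from src to dst, keeping only `columns`, in that order."""
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not listed
    writer = csv.DictWriter(dst, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Hypothetical output file with made-up column names
raw = io.StringIO("trial,debug_flag,rt,response\n1,0,0.512,left\n2,0,0.430,right\n")
cleaned = io.StringIO()
select_columns(raw, cleaned, ["response", "rt", "trial"])
print(cleaned.getvalue())
```

With real files, pass objects opened with open(path, newline="") instead of the StringIO stand-ins.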
There is a special block called session_variable_order: [var1, var2, var3] in the experiment_config.yaml file, which you should probably be using. You should also consider these methods:
from psychopy import data
# These are instance methods: call them on your own handler objects
# (exp_handler and trial_handler are placeholder names for those objects).
exp_handler.saveAsWideText(fileName='exp_handler.csv', delim='\t', sortColumns=False, encoding='utf-8')
trial_handler.saveAsText(fileName='trial_handler.txt', delim=',', encoding='utf-8', dataOut=('n', 'all_mean', 'all_raw'), summarised=False)
Note the sortColumns and dataOut parameters.

Octave: Saving Files In a loop

I am currently working on a project that involves long CSV files. I have a for loop that separates different values in the time column, then finds the max in each section of time (there are many data points for each point in time). I want to save the data as either a .csv or a .dat, but I can only seem to save either the first or the last value. How can I get Octave to save data in a new row on every pass through the loop?
If you are not too keen on writing to the file on every iteration, which is generally slow, you can accumulate the data in a variable and write it out in one go.
X = [];
for i = 1:100
    X = [X; i]; % instead of i you can use row vectors
end
save("myfile.dat", "X");
And if you do want to write inside the loop, use the '-append' option:
for i = 1:10
    save("-append", "myfile.dat", "i");
end

F# Read File, Split string list, summarize data, Nonfloat decimal numbers

I'm new to F# and got this assignment to create a very simple bank representation.
I do not want any code answers directly related to the problem, but preferably links or tips on where to find solutions or how to work them out.
The issues are the following:
Reading lines of a file (a line looks like this: "126,145001,1500.00" and it's sequence_number, account_number, amount)
Split the line to use the data from the line
summarize the data (to return the bank account balance)
Not using floating point numbers representing the amount, due to rounding errors(?)
Doing all of these in one function.
I know how to read a file, in a function.
I also know how to split a string.
I know how to recursively add values from a list.
I do not know how to add values that are decimal without floating-point variables.
I do not know how to retrieve the string from a list in a function and split it.
I do not know how to do all of these things in one function taking in the file name, account number, and account currency.
The function should return the balance after the transactions in the file have been processed.
My idea to solve this is to create a datatype that has the three variables sequence_number, account_number and amount, and then do the following:
Read the file,
Split the data and create an object of my custom type for each line in the file
Add and remove the values from the types and return the final balance.
If anyone could point me in the right direction for each or any problem I would be really thankful!
.NET contains a type called System.Decimal that is indeed more appropriate for storing financial figures than the typical floating point types. In F#, you can use the decimal function to convert a value of a different type (say a string) to a System.Decimal (which F# abbreviates as a type also named decimal): let d = decimal "1.23" You can also create these values directly by using the M suffix: let d' = 1.23M, but in your case that doesn't seem relevant.
Regarding your other questions, if you use System.IO.File.ReadLines, then you can get the individual lines of your file as a sequence. Then you can string together a bunch of operations on that sequence to achieve your desired result. For instance, you can take the sequence and use Seq.map <your splitting code here> to split each line (and convert to instances of your specific data type, if desired), and then use Seq.groupBy to group the transactions by account number, and then Seq.map again to apply your summarization logic to each group. Ask follow-up questions if any of this is unclear.

Where do I get "junk" data to help test my code?

For my C class I've written a simple statistics program -- it calculates max, min, mean, etc. Anyway, I've gotten the program successfully compiled, so all I need to do now is actually test it; the only problem is that I don't have anything to test with.
In my case, I need a list of doubles -- my program needs to accept between 2 and 1,000,000; Is there some resource online that can produce lists of otherwise meaningless data? I know Lorem Ipsum gets used for typesetting, and I'm wondering if there's something similar for various types of numerical data.
Or am I out of luck, and I'll have to just create my own junk data?
The problem with testing software is not the source of the data, but the test set. I mean, can you test an int sum(int a, int b) method by just inputting random numbers to it? No, you need to know what to expect. This is a test set: inputs and expected outputs.
What do you say when you discover that 548888876+99814465=643503341? How can you tell this is the real result?
More than finding random numbers to give your program, you must somehow know the results of your computation in advance in order to compare them.
There are a few ways to do it. What I suggest is to pick a random number generator (amphetamachine +1) and run the data through both your code and a program that you already know is good, e.g. Matlab for your purposes. After computing your statistics with both, compare the results to see whether your code is correct or needs some debugging.
By the way, I deliberately altered the result of the above sum...
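To make the idea of a test set concrete, here is a minimal sketch in Python. The names summarize, TEST_SET and run_tests are my own, and summarize merely stands in for the program under test; the expected values are small enough to check by hand:

```python
import math

def summarize(values):
    """Stand-in for the program under test: return (minimum, maximum, mean)."""
    return min(values), max(values), math.fsum(values) / len(values)

# Each case pairs an input with outputs worked out by hand.
TEST_SET = [
    ([1.0, 2.0, 3.0, 4.0], (1.0, 4.0, 2.5)),
    ([-5.0, 5.0], (-5.0, 5.0, 0.0)),
    ([2.5, 2.5, 2.5], (2.5, 2.5, 2.5)),
]

def run_tests(func):
    """Return the cases where func's output differs from the expected one."""
    failures = []
    for values, expected in TEST_SET:
        actual = func(values)
        same = all(math.isclose(a, e, abs_tol=1e-12)
                   for a, e in zip(actual, expected))
        if not same:
            failures.append((values, expected, actual))
    return failures

print(run_tests(summarize))
```

An empty list means every case matched; a buggy implementation shows up as a list of (input, expected, actual) tuples.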
What about just generating a random double?
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
    double number = r.NextDouble();
    //do something with the value
}
Since the data you need will depend on the program, there is no source of generic data that I know of.
If you are able to write that program, you should be able to write a script to generate dummy data for yourself.
Just use a loop to print out random numbers within the range your program can accept.
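Such a script could look roughly like the sketch below, written in Python for brevity. The function name make_test_data, the default range of -1000 to 1000 and the seed value are all arbitrary choices of mine; adjust them to whatever your program accepts:

```python
import random

def make_test_data(count, low=-1000.0, high=1000.0, seed=0):
    """Return `count` pseudo-random doubles between low and high.

    A fixed seed makes the list reproducible, so a failing run can be repeated.
    """
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(count)]

# Print one value per line; redirect the output to a file to feed another program.
for value in make_test_data(10):
    print(value)
```

Raising the count to 1,000,000 exercises the upper limit, and changing the seed gives a different but equally repeatable data set.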
Generate a file with random bytes:
$ dd \
of=random-bytes \
if=/dev/urandom \
bs=1024 \
count=1024
http://www.generatedata.com/#generator
I've used that data generator before with some success. To be fair, it will usually involve copy/pasting the data it generates into some other format that you'll be able to read in.
You can generate your own data for this specific case quite easily though. Loop a random number of times, up to a maximum of 1,000,000, generating random doubles within the range you expect. Feed that in and away you go.
Generating your own test data in this case is probably the best option.
You could take the first million digits of pi and chop them up into however many doubles you want.
The first few could be 3.14159, 2.65358, 9.79323, 8.46264, 3.38327, 9.50288, 4.19716, and 9.39937, for example.
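A minimal sketch of that chopping, in Python, with only the first 48 digits hard-coded to keep it short (the names PI_DIGITS and chop_digits are mine; a longer digit string would have to come from a file):

```python
# First 48 decimal digits of pi, written without the decimal point.
PI_DIGITS = "314159265358979323846264338327950288419716939937"

def chop_digits(digits, width=6):
    """Split a digit string into floats of the form d.ddddd, `width` digits each."""
    values = []
    # Stop early enough that a trailing partial chunk is dropped
    for i in range(0, len(digits) - width + 1, width):
        chunk = digits[i:i + width]
        values.append(float(chunk[0] + "." + chunk[1:]))
    return values

print(chop_digits(PI_DIGITS))
```

With width=6 this reproduces the eight values listed above.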