How to copy variable values within an SPSS file? - data-analysis

I have three separate SPSS files with information about roughly 7500 hemicolectomy patients. One file contains the information about the hemicolectomies, the second covers other surgeries the patients have had during their lifetime, and the last contains information about their sick leaves during their lifetime.
I have merged the files into a single SPSS document (idnumber is the common variable), but I ran into a problem filtering out the surgeries and sick leaves that have nothing to do with the hemicolectomy. I'm quite new to SPSS, so the simplest way I could think of doing this is to somehow copy the hemicolectomy info to every case and then use the date/time calculator to choose which sick leaves and surgeries to discard. Switching to wide format is impractical due to the large number of unrelated surgeries and sick leaves: I'd have thousands of variables.
So basically I'd like to do the following:
IF idnumber = idnumber THEN variable1=variable1 AND variable2=variable2 etc
How would I go about doing this?
All help will be appreciated!

The IF command can only be used with one transformation:
IF [condition] [transformation].
Assuming both of your files are sorted by idnumber:
UPDATE file=[master_file_reference]
/file=[secondary_file_reference]
/BY idnumber.
EXECUTE.
Each file reference can be either a dataset name or a full path.
More on the UPDATE command:
https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/syn_update_examples.html

I can't comment yet, so I'm sorry if I misunderstand the problem. I would've asked for clarification in the comments to the question... here goes...
So you have three sources of data: dates (?) of hemicolectomies, one for each case; dates (?) of other surgeries, multiple for each case; and sick leaves, even more for each case. Is that right?
I'd try solving the problem before matching all three files: first match the file that contains one observation per patient (presumably hemicolectomies) to the one with the second most observations per patient (presumably other surgeries), using the /table keyword:
MATCH FILES /FILE= 'surgeries.sav' /table = 'hemicolectomies.sav'
/by idnumber.
EXECUTE.
This will "fill up" the blank cells for each patient with the hemicolectomy data.
Now use the datetime to check which surgeries "belong" to the hemicolectomies, reduce your data accordingly, and then match it to the sick leave data using the /table keyword again.
Seems like the easiest solution to me.
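For readers who want to see the logic outside SPSS, here is a minimal Python sketch of the same idea: copy the one-per-patient hemicolectomy record onto every surgery row (what /table does), then keep only the surgeries inside a date window. The field names hemi_date and surgery_date, the sample values, and the 30-day window are all assumptions made up for illustration; only idnumber comes from the question.

```python
from datetime import date, timedelta

# Hypothetical sample data; every field except idnumber is an assumption.
hemicolectomies = {  # one record per patient, keyed by idnumber
    1: {"hemi_date": date(2015, 3, 10)},
    2: {"hemi_date": date(2016, 7, 1)},
}
surgeries = [  # many records per patient
    {"idnumber": 1, "surgery_date": date(2015, 3, 20)},
    {"idnumber": 1, "surgery_date": date(2010, 1, 5)},
    {"idnumber": 2, "surgery_date": date(2016, 7, 15)},
]


def attach_lookup(rows, table, key="idnumber"):
    """Copy the single lookup record onto every row with the same key."""
    return [{**row, **table.get(row[key], {})} for row in rows]


def within_window(row, days=30):
    """Keep a surgery only if it falls 0..days after the hemicolectomy."""
    hemi = row.get("hemi_date")
    if hemi is None:
        return False
    gap = row["surgery_date"] - hemi
    return timedelta(0) <= gap <= timedelta(days=days)


merged = attach_lookup(surgeries, hemicolectomies)
related = [row for row in merged if within_window(row)]
```

The window rule is the part you would have to define yourself; the merge step stays the same whatever rule you pick.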

Related

Create variable counting unique IDs in long format table

I would like to structure my long format SPSS file so I can clean it and get a better overview. However, I run into some problems.
Patients appear several times in the database (column patientID). How can I make a new variable that contains each patient ID only once, preferably on the line with the baseline data (the first moment that questionnaires are completed)?
I have consulted with my colleagues, but without concrete solutions/answers.
This can be done using the lag function - after sorting the file:
sort cases by PatientID_Pseudo OpenInvulMomenten.
if $casenum=1 or ($casenum>1 and PatientID_Pseudo<>lag(PatientID_Pseudo)) newvar=PatientID_Pseudo.
exe.
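The same sort-then-compare-with-previous-row logic can be sketched in Python, in case it helps to see what the lag comparison does. The sample rows are hypothetical; the tuple fields simply mirror PatientID_Pseudo and OpenInvulMomenten from the syntax above.

```python
# Hypothetical rows: (PatientID_Pseudo, OpenInvulMomenten)
rows = [("B", 2), ("A", 1), ("B", 1), ("A", 2), ("A", 3)]


def flag_first_per_id(rows):
    """Sort by ID and moment, then fill newvar only on each ID's first row."""
    flagged = []
    previous_id = object()  # sentinel that never equals a real ID
    for patient_id, moment in sorted(rows):
        # Equivalent of comparing the ID with lag(ID) in the sorted file.
        newvar = patient_id if patient_id != previous_id else None
        flagged.append((patient_id, moment, newvar))
        previous_id = patient_id
    return flagged


result = flag_first_per_id(rows)
```

Counting the non-empty values of the new variable then gives the number of unique patients.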

LabVIEW - writing data from multiple DAQ Assistants in the same .csv-file

I have the following problem with my VI, which I could not solve by myself or research:
When running the VI, the data should be stored in a .csv-File. In the pictures, you can see the block diagram. When running, it produces the following file:
Test Steady State
T_saug_1/T_saug_2/Unbelegt/Unbelegt/T_ND/T_HD/T_Wasser_ein/T_Wasser_aus/T_front/T_back/T-right/T-left
18,320 18,491 20,873 20,838 20,463 20,969 20,353 20,543 20,480 20,618
20,618 20,238
As you can see, the data gets stored only in the first column (in the preview of the post it looks like it is a row, but it is really a column; T steady state is the header). But these temperatures are not the temperatures of the first sensor, it somehow stored the value for every sensor in the respective row. When the first row was filled, it stopped storing data entirely. I did not figure out how I could insert a file here, otherwise I would have done so... I want to store the data for each sensor in the associated column.
Another problem I have: the waveform-chart, which shows all the temperatures, only updates every 4-6 seconds. Not only is the interval between every update not always the same, but from my understanding it should update every second since the while-loop has a wait-timer set to 1000ms. I don't know what my mistake here is...
Please let me know if you have any ideas on how to solve the problems I have or suggestions where I could find answers to my questions. I am very new to LabVIEW, I am sorry if this question is silly.
With best regards and thank you for your patient help,
lempy.
[Images: csv-file; block diagram; DAQ Assistant for PT100; DAQ Assistant for TC]
The Write Delimited Spreadsheet VI has two boolean inputs: Append to file? and transpose?
Append to file? is not wired for the first write, so it defaults to FALSE. That means the file is overwritten on each write. For the second and third calls it is set to TRUE, so that data is appended.
The simplest solution is to put the first two write functions outside the main loop. This overwrites the file with the headers when the VI starts, and the values are then appended as desired.
transpose? swaps rows and columns. Wire TRUE to it and check whether it works.
About your second question:
A loop runs as fast as the slowest process inside. If the graph is updated every 6s only, something takes 6s to complete. My guess is that those temperature readings take so long...
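Since a block diagram can't be pasted as text, here is the same write pattern as a rough Python sketch: the header is written once before the loop (overwrite), and each iteration appends one row with one value per channel. read_temperatures is a hypothetical stand-in for the DAQ Assistants, and only three of the channel names from the question are used.

```python
import csv
import os
import tempfile

HEADER = ["T_saug_1", "T_saug_2", "T_ND"]  # subset of the channel names


def read_temperatures(step):
    """Hypothetical stand-in for the DAQ read: one value per channel."""
    return [20.0 + step, 21.0 + step, 22.0 + step]


def log_run(path, steps):
    # Before the loop: mode "w" overwrites any old file and writes the header.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(HEADER)
    # Inside the loop: mode "a" appends, so earlier rows are kept.
    for step in range(steps):
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow(read_temperatures(step))


path = os.path.join(tempfile.mkdtemp(), "steady_state.csv")
log_run(path, 3)
with open(path, newline="") as f:
    table = list(csv.reader(f))
```

Each reading ends up as one row with one column per sensor, which is the layout the question asks for.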

select from table where two parameters satisfy in mysql

I am totally clueless about how to get the following kind of result from a single table in MySQL.
Required Result:
The raw data is shown in the image below.
mc_id and op_id can differ. For example, if mc_id is 4 and op_id is 10, the query has to loop through each vouid and extract done_on_date; then it has to loop through again for the same mc_id 4 and op_id 10 and extract the done_on_date that comes after the first extracted done_on_date. We refer to this second extracted done_on_date as next_done_on_date, just to distinguish it. It continues this way until the end of the table. I hope I am clear enough now.
The idea is basically to see when a particular operation_id was carried out for the machine with the given mc_id. The first time the operation was done is referred to as done_on_date, and when the same operation is carried out for the same machine the next time, we refer to it as next_done_on_date, although inside the database table it is still done_on_date.
Let me know if anything still needs to be clarified.
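To make the requested pairing concrete, here is a small Python sketch of one possible reading of it: group the rows by (mc_id, op_id), sort each group's dates, and pair every done_on_date with the one that follows it. The sample rows are invented, and leaving next_done_on_date empty for the last occurrence is an assumption; this illustrates the logic and is not a MySQL query.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows; only the column names come from the question.
rows = [
    {"vouid": 1, "mc_id": 4, "op_id": 10, "done_on_date": date(2017, 1, 5)},
    {"vouid": 2, "mc_id": 4, "op_id": 11, "done_on_date": date(2017, 1, 9)},
    {"vouid": 3, "mc_id": 4, "op_id": 10, "done_on_date": date(2017, 2, 1)},
    {"vouid": 4, "mc_id": 4, "op_id": 10, "done_on_date": date(2017, 3, 7)},
]


def pair_with_next(rows):
    """Return (mc_id, op_id, done_on_date, next_done_on_date) tuples."""
    dates_by_key = defaultdict(list)
    for row in rows:
        dates_by_key[(row["mc_id"], row["op_id"])].append(row["done_on_date"])

    pairs = []
    for (mc_id, op_id), dates in sorted(dates_by_key.items()):
        dates.sort()
        following = dates[1:] + [None]  # the last occurrence has no successor
        for done, nxt in zip(dates, following):
            pairs.append((mc_id, op_id, done, nxt))
    return pairs


pairs = pair_with_next(rows)
```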

Reshape the dataset into more relational format (Transpose SOME rows and assign them to a data subset)

I have a spreadsheet/csv:
Code:,101,Course Description:,"Introduction to Rocket Science",
Student Name,Lecture Hours,Labs Hours,Test Score,Status
John Galt,48,120,4.7,Passed
James Taggart,50,120,4.9,Passed
...
I need to reshape it to the following view:
Code:,Course Description:,Students,Lecture Hours,Labs Hours,Average Test Score,Teaching Staff
101,"Introduction to Rocket Science",John Galt,48,120,4.7,Passed
101,"Introduction to Rocket Science",James Taggart,50,120,4.9,Passed
...
Believe it or not, I cannot work out how to do this, even though it seems to be a very primitive transformation. Is there any silver bullet for this?
The original records (csv) have a somewhat JSON-like structure, so my first approach was to represent the original data as a vector and then transpose it (but in this case my resulting table looks like a sparse matrix: the rows I have transposed are blank in the rest of their values).
Another way I'm considering is to serialize it into JSON and then de-serialize it into a new spreadsheet (jsonize()); in this case, I'm having problems merging the pieces properly.
Either way, I have it "half-working".
Can anyone suggest a simple and reliable algorithm for this?
Any language, RegEx, tools, or code snippets are very much appreciated.
Assuming that the pattern you've described here is consistent throughout, there are quite a few different approaches you could take, but in all cases you can use the fact that the 'Course' rows start with "Code:", which is never going to be a student name.
You can take advantage of this either by a regular expression find/replace, or within OpenRefine.
Example:
Open the file in a text editor that supports regular expressions in find/replace
Search for lines starting with 'Code:' and add additional commas to the start of the row to shift the course data columns to the right, e.g. search for: ^Code: replace with: ,,,,,Code:
If you now import the file into OpenRefine then you'll have a project with 10 columns (the 10th column is caused by the trailing comma at the end of the course data row)
You can now use Transpose (or just rename) on the right-most columns, which contain the course data, while leaving the left-most columns, which contain the student details
Isolate the rows that contain the phrase 'Student Name' in the first column and remove them (via a filter or facet)
Move the Course Code/Description columns to the beginning of the project, and use the 'Edit Cells->Fill Down' option on each column to get the values repeated on all the relevant lines
Finally, rename the columns as you want and remove any extraneous columns
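If a script is an option, the same "detect the course row, then fill down" idea can be sketched in a few lines of Python. This is a minimal sketch that assumes every file follows the layout in the question exactly: a row starting with "Code:", then a "Student Name" header row, then student rows.

```python
import csv
import io

RAW = '''Code:,101,Course Description:,"Introduction to Rocket Science",
Student Name,Lecture Hours,Labs Hours,Test Score,Status
John Galt,48,120,4.7,Passed
James Taggart,50,120,4.9,Passed
'''


def reshape(text):
    """Prefix every student row with the most recent course code and title."""
    reshaped = []
    code = description = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if row[0] == "Code:":
            # Course row: remember its values for the rows that follow.
            code, description = row[1], row[3]
        elif row[0] == "Student Name":
            continue  # repeated column header, not data
        else:
            reshaped.append([code, description] + row)
    return reshaped


rows = reshape(RAW)
```

Because the course values are remembered until the next "Code:" row, a file with several courses one after another is handled the same way.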

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendars of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one Json file : https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess that it involves a series of nested forEach calls, but I can't find the right way to do this with OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries, which confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution in R (or Python) would also be fine for me.
Rather than just creating your project as text and working with GREL to parse it out...
The best way is to select the part of the JSON record that you want to work with using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example). (A video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the JSON part that contains the records that you want to parse and work with (this can be any repeating part; just select one of them and OpenRefine will extract all the rest)
Limit the number of data rows that you want to load during creation, or leave the default of all rows.
Click Create Project and now you're in Rows mode. However, if you think that Records mode might be better suited to the context, just import the project again as JSON and then select the next outer area of the content, perhaps a larger array that contains a key field, etc. In the example, the key field would probably be the Date, which is why I highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for all, and add it to our Wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OR array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression. Now for each ID you have 12 rows in OR
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from example URL above - this gives me 441 rows for the single ID - each contains the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
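Since the question also allows Python, the two nested loops (months, then days) can be sketched directly. The sample payload below is a hypothetical, trimmed-down stand-in for the real response: calendar_months, days and available appear in the expressions above, while the date and price fields and their shape are assumptions and need to be checked against the actual JSON.

```python
import json

# Hypothetical sample; "date" and "price" are assumed field names.
RAW = '''{"calendar_months": [
  {"days": [{"date": "2016-11-01", "available": true, "price": 50},
            {"date": "2016-11-02", "available": false, "price": 55}]},
  {"days": [{"date": "2016-12-01", "available": true, "price": 60}]}
]}'''


def flatten_calendar(listing_id, text):
    """Return one (listing_id, date, available, price) tuple per day."""
    payload = json.loads(text)
    rows = []
    for month in payload["calendar_months"]:  # outer forEach: months
        for day in month["days"]:             # inner forEach: days
            rows.append(
                (listing_id, day.get("date"), day["available"], day.get("price"))
            )
    return rows


rows = flatten_calendar(4212133, RAW)
```

Running this once per downloaded file and concatenating the results would give one table with a row per listing and day.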