Copying fits-file data and/or header into a new fits-file - astronomy

A similar question was asked before, but it was asked in an ambiguous way and used different code.
My problem: I want to make an exact copy of a .fits file's header into a new file. (I need to process a FITS file in such a way that I change the data, keep the header the same, and save the result in a new file.) Here is a short example, just demonstrating the tools I use and the discrepancy I arrive at:
from astropy.io import fits

data_old, header_old = fits.getdata("input_file.fits", header=True)
fits.writeto('output_file.fits', data_old, header_old, overwrite=True)
I would now expect the two files to be exact copies (headers and data of both being the same). But if I check for differences, e.g. in this way -
fits.printdiff("input_file.fits", "output_file.fits")
I see that the two files are not exact copies of each other. The report says:
...
Files contain different numbers of HDUs:
a: 3
b: 2
Primary HDU:
Headers contain differences:
Headers have different number of cards:
a: 54
b: 4
...
Extension HDU 1:
Headers contain differences:
Keyword GCOUNT has different comments:
...
Why is there no exact copy? How can I make an exact copy of a header (and/or the data)? Am I forgetting a keyword? Is there an alternative, simple way of copy-pasting a FITS file header?

If you just want to update the data array in an existing file while preserving the rest of the structure, have you tried the update function?
The only issue with that is it doesn't appear to have an option to write to a new file rather than update the existing file (maybe it should have this option). However, you can still use it by first copying the existing file, and then updating the copy.
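For example, a minimal sketch of that approach (the file names, the extension index, and new_data are placeholders):
import shutil
from astropy.io import fits

# copy the original file first, then replace the data of one HDU in the copy
shutil.copyfile('input_file.fits', 'output_file.fits')
fits.update('output_file.fits', new_data, ext=0)  # new_data: your modified ndarray; ext: HDU to update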
Alternatively, you can do things more directly using the object-oriented API. Something like:
with fits.open(filename) as hdu_list:
    hdu = hdu_list[<name or index of the HDU to update>]
    hdu.data = <new ndarray>
    # or hdu.data[<some index>] = <some value>, i.e. just directly modify the existing array
    hdu.writeto('updated.fits')  # to write just that HDU to a new file, or
    # hdu_list.writeto('updated.fits')  # to write all HDUs, including the updated one, to a new file
There's nothing not "pythonic" about this :)
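Applied to the original example, a rough sketch could look like this (assuming the processed array is called new_data and belongs in the primary HDU):
from astropy.io import fits

with fits.open('input_file.fits') as hdu_list:
    hdu_list[0].data = new_data  # replace only the data of the primary HDU
    hdu_list.writeto('output_file.fits', overwrite=True)  # all HDUs and their headers are carried over

# printdiff should now report only the intentional data difference
fits.printdiff('input_file.fits', 'output_file.fits')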

Related

How to generate dynamic files using a config file in Palantir Foundry

I have two columns in a config file, col1 and col2.
Now I have to import this config file in my main Python transform and then extract the values of the columns in order to create a dynamic output path from these values by iterating over all the possible values.
For example
output_path1=Constant+value1+value2
output_path2=Constant+value3+value4
Please suggest a solution for generating the output files in Palantir Foundry (code repo).
What you probably want to use is a transform generator. In the "Python Transforms" chapter of the documentation, there's a section "Transform generation" which outlines the basics of this.
The most straightforward path is likely to generate multiple transforms, but if you want just one transform that outputs to multiple datasets, that would be possible too (if a little more complicated.)
For the former approach, you would add a .yaml file (or similar) to your repo, in which you define your values, and then you read the .yaml file and generate multiple transforms based on the values. The documentation gives an example that does pretty much exactly this.
For the latter approach, you would probably want to read the .yaml file in your pipeline definer, and then dynamically add outputs to a single transform. In your transforms code, you then need to be able to handle an arbitrary number of outputs in some way (which I presume you have a plan for.) I suspect you might need to fall back to manual transform registration for this, or you might need to construct a transforms object without using the decorator. If this is the solution you need, I can construct an example for you.
Before you proceed with this though, I want to note that the number of inputs and outputs is fixed at "CI-time" or "compile-time". When you press the "commit" button in Authoring (or you merge a PR), it is at this point that the code is run that generates the transforms/outputs. At a later time, when you build the actual dataset (i.e. you run the transforms) it is not possible to add/remove inputs, outputs and transforms anymore.
So to change the number of inputs/outputs/transforms, you will need to go to the repo, modify the .yaml file (or whatever you chose to use) and then press the commit button. This will cause the CI checks to run, and publish the new code, including any new transforms that might have been generated in the process.
If this doesn't work for you (i.e. you want to decide at dataset build-time which outputs to generate) you'll have to fundamentally re-think your approach. Otherwise you should be good with one of the two solutions I roughly outlined above.
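For the multiple-transforms route, a rough sketch might look like the following (the file name config.yaml, the dataset paths, and the pass-through logic are assumptions; PyYAML must be available in the repo, and the generator pattern mirrors the documentation example):
# config.yaml (checked into the repo), for example:
# datasets:
#   - input: /path/to/input/dataset1
#     output: /path/to/output/dataset1
#   - input: /path/to/input/dataset2
#     output: /path/to/output/dataset2

import os
import yaml
from transforms.api import transform_df, Input, Output

# read the config at CI time, relative to this module
with open(os.path.join(os.path.dirname(__file__), 'config.yaml')) as f:
    config = yaml.safe_load(f)

def transform_generator(pairs):
    transforms = []
    for pair in pairs:
        @transform_df(Output(pair['output']), source_df=Input(pair['input']))
        def compute(source_df):
            return source_df  # per-dataset logic goes here
        transforms.append(compute)
    return transforms

TRANSFORMS = transform_generator(config['datasets'])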
You cannot programmatically create transforms based on another dataset's content. The transforms are created at CI time.
You can, however, have a constants file inside your code repo, which can be read at CI time, and use that to generate transforms. I.e.:
myconfig.py:
dataset_pairs = [
    {
        "in": "/path/to/input/dataset",
        "out": "/path/to/output/dataset",
    },
    {
        "in": "/path/to/input/dataset2",
        "out": "/path/to/output/dataset2",
    },
    # ...
    {
        "in": "/path/to/input/datasetN",
        "out": "/path/to/output/datasetN",
    },
]
///////////////////////////
anotherfile.py
from transforms.api import transform_df, Input, Output
from myconfig import dataset_pairs

TRANSFORMS = []
for conf in dataset_pairs:
    @transform_df(Output(conf["out"]), my_input=Input(conf["in"]))
    def my_generated_transform(my_input):
        # ... your transformation logic here
        return my_input
    TRANSFORMS.append(my_generated_transform)
To reiterate, you cannot create the config.py programmatically based on a dataset's contents, because this code is run at CI time, so it doesn't have access to the datasets.

PsychoPy: how to avoid storing variables in the CSV file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like NOT to be included. There are some variables which I decided to include in the CSV, but many others automatically fell into it.
Is there a way to manually force (from a code block) the exclusion of some variables from the CSV?
Is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important, and I know I could just create an output file myself without using the one from PsychoPy, or I could easily clean it afterwards, but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
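For example, a quick post-processing sketch (pandas assumed; the file and column names below are just placeholders) can drop and reorder columns after the fact:
import pandas as pd

df = pd.read_csv('my_experiment.csv')
df = df.drop(columns=['frameRate', 'expName'])        # exclude variables you don't need
df = df[['participant', 'trial', 'response', 'rt']]   # choose and reorder the columns you keep
df.to_csv('my_experiment_clean.csv', index=False)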
As you note, you can of course write code to save custom output files of your own design.
There is a special block called session_variable_order: [var1, var2, var3] in the experiment_config.yaml file, which you probably should be using; also, you should consider these methods:
from psychopy import data

# exp_handler / trial_handler are instances of data.ExperimentHandler / data.TrialHandler
exp_handler.saveAsWideText(fileName='exp_handler.csv', delim='\t', sortColumns=False, encoding='utf-8')
trial_handler.saveAsText(fileName='trial_handler.txt', delim=',', encoding='utf-8',
                         dataOut=('n', 'all_mean', 'all_raw'), summarised=False)
notice the sortColumns and dataOut params

How to get SSIS to select specific files in a directory and assign their names to variables (File System Task)

I have the following scenario:
I have a remote server that gets loaded with 2 files every week; these files have the following name format:
"FINAL_NAME06Apr16.txt" and
"FINAL_NAME_F106Apr16.txt"
The "FINAL_NAME" part is fixed every time, but the date changes. Now I need to pick these files, copy them to another directory, and rename them, but I'm not sure how to get the file names into variables so I can operate on them, as I need to give a different name to each file.
How can I proceed? I'm pretty sure it has to be done by building a variable with an expression, but I don't know how to do that part.
I think I need some function to calculate the rest of the filename. Maybe one approach could be to first rename the "FINAL_NAME_F1" file and then the "FINAL_NAME" one, since some wildcards would pick up both if I don't do it that way?
Cheers.
You can calculate the date but why go through that complexity?
A Foreach (File) Loop Container, FELC, will handle this just fine. Add two of them to your control flow.
The first one will use a file mask of FINAL_NAME_F1*.txt. Inside that FELC, use a File System task to copy/move/rename the file to your new location.
That first FELC will run, find the target file and move it. It will then look for the next file, find none and go on to the next task.
Create a second FELC, but this one will operate on FINAL_NAME*.txt. It's crucial that the first FELC runs first, as this file mask will match both FINAL_NAME_f1-2019-01-01.txt and FINAL_NAME-2019-01-01.txt. By ordering our operations this way, we can reduce the complexity of the logic required.
Sample answer with a FELC to show where to plumb the various bits

Test and Training Set are Not Compatible

I have seen various articles about the same issue and tried a lot of solutions, but nothing is working. Kindly advise.
I am getting an error in WEKA:
"Problem Evaluating Classifier: Test and Training Set are Not
Compatible".
I am using J48 as my algorithm. These are my data sets:
Trainset:
https://www.dropbox.com/s/fm0n1vkwc4yj8yn/train.csv
Evalset:
https://www.dropbox.com/s/2j9jgxnoxr8xjdx/Eval.csv
(I am unable to copy and paste the data here due to its length.)
I have tried "Batch Filtering" in WEKA (for the training set), but it still does not work.
EDIT: I have even converted my .csv to .arff, but still the same issue.
EDIT2: I have made sure the headers in both CSVs match. Even then, same issue. Please help!
Please advise.
A common error when converting ".csv" files to ".arff" with Weka occurs when the values of nominal attributes appear in a different order, or not at all, from dataset to dataset.
Your evaluation ".arff" file probably looks like this (skipping irrelevant data):
@relation Eval
@attribute a321 {TRUE}
Your train ".arff" file probably looks like this (skipping irrelevant data):
@relation train
@attribute a321 {FALSE}
However, both should contain all possible values for that attribute, and in the same order:
@attribute a321 {TRUE, FALSE}
You can remedy this by post-processing your ".arff" files in a text editor and changing the header so that your nominal values appear in the same order (and quantity) from file to file.
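The same fix can also be scripted; here is a rough sketch (the file names and the attribute line are just examples matching the snippets above):
# rewrite the attribute declaration so both files share the same value list
for fname in ('train.arff', 'Eval.arff'):
    with open(fname) as f:
        text = f.read()
    for old in ('@attribute a321 {TRUE}', '@attribute a321 {FALSE}'):
        text = text.replace(old, '@attribute a321 {TRUE, FALSE}')
    with open(fname, 'w') as f:
        f.write(text)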
How do I divide a dataset into training and test set?
You can use the RemovePercentage filter (package weka.filters.unsupervised.instance).
In the Explorer just do the following:
training set:
Load the full dataset
select the RemovePercentage filter in the preprocess panel
set the correct percentage for the split
apply the filter
save the generated data as a new file
test set:
Load the full dataset (or just use undo to revert the changes to the dataset)
select the RemovePercentage filter if not yet selected
set the invertSelection property to true
apply the filter
save the generated data as a new file

How to remove the first and last rows without an identifier in a fixed-length flat file?

I have a fixed-length flat file with a header and footer, but those rows do not have any identifier like H for header, T for trailer or D for Data or something along those lines.
All my data starts at position 1, but the header and footer start at different positions in the row format. I tried to use a conditional split, but I could not get the result I wanted.
Please help me find some logic to get the header and footer out of the data file. I want to store the data in a SQL Server table, and the header/footer data in a flat file for future reference.
As you have not included the exact file format in your question, let us assume the file looks like this:
This is the header line with some spaces at the beginning.
This, is, real, line 1
This, is, real, line 2
End of file. It has some spaces too at the beginning.
If this is a correct representation of your situation, you can use a flat file source, read each record in its entirety into one column, say EntireRow, and then use a conditional split to identify the header/footer using the following condition:
LEN(EntireRow) > LEN(TRIM(EntireRow))
The remaining issue is how to split out the meaningful rows (the real lines 1 and 2 in the example above). You can dump them into another file; that file will then be clean, with the header and footer removed. This is a simple, but not the most efficient, solution.
Solution 2:
A better solution would be to use a Script Component. Learn about how to add a row dynamically. One such example: http://www.sqlis.com/post/The-Script-Component-as-a-Transformation.aspx
Now, to get you started, here is sample code that eliminates the header/footer using your condition that they do not start at position 1 ... with no error handling. The link above will tell you how to add the output and the columns within that output. Make sure the data types of your columns match what your source file has.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Keep only rows that start at position 1 (no leading whitespace),
    // i.e. skip the header and footer lines.
    if (Row.EntireRow.Length == Row.EntireRow.Trim().Length)
    {
        string[] sArrFields = Row.EntireRow.Split(',');
        MeaningfulBuffer.AddRow();
        MeaningfulBuffer.Col1 = sArrFields[0];
        MeaningfulBuffer.Col2 = sArrFields[1];
        MeaningfulBuffer.Col3 = sArrFields[2];
        MeaningfulBuffer.Col4 = sArrFields[3];
    }
}
Hope this much guidance is sufficient for you to resolve your issue and one of the two solutions I have offered works. Please respond back.