I am relatively new to Python. I need to calibrate a large number of FITS files. I have tried to get pixel counts per image using the following:
dirname = 'Users\Mark\OneDrive\Python\MWtes'
for filename in glob.glob(os.path.join(dirname,'*.fits')):
    with fits.open(filename) as hdulist:
        hdu = hdulist[0]
        image_file = get_pkg_data_filename('*.fits')
        image_data = fits.getdata(image_file, ext=0)
        image_data = np.array(image_data)
        image_data.astype(float)
        c = np.mean(image_data)
print(c)
But I get a message saying that c is undefined. Any assistance would be greatly appreciated.
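One plausible reading of the error, assuming print(c) sits outside the loop: the pattern built from the relative, backslash-separated path matches no files, so the loop body never runs and c is never assigned. The sketch below shows that mechanism with the standard library only. The helper names find_fits and last_loop_value are hypothetical, the placeholder files are empty, and no FITS data is read.

```python
import glob
import os
import tempfile


def find_fits(dirname):
    """Return every *.fits path in dirname (empty list if nothing matches)."""
    return sorted(glob.glob(os.path.join(dirname, "*.fits")))


def last_loop_value(paths):
    """Mimic the question's loop: the result is assigned only inside the body."""
    result = None  # defined up front, so an empty loop cannot leave it unset
    for path in paths:
        result = os.path.basename(path)  # stand-in for np.mean(image_data)
    return result


# A directory that does not exist yields no matches and raises no error.
missing = find_fits(os.path.join("no", "such", "directory"))

# A temporary directory with two empty placeholder files yields two matches.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.fits", "b.fits"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_fits(tmp)]

print(missing, last_loop_value(missing))
print(found, last_loop_value(found))
```

Checking that the list returned by glob.glob is not empty before looping (and using an absolute or raw-string path) would make this failure visible immediately.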
I've already trained several models for a binary classification problem, basing my selection on F-score and AUC. The code used is the following:
svm = StandardScaler()
svm.fit(feat_train)
feat_train_std = svm.transform(feat_train)
feat_test_std = svm.transform(feat_test)
model_10= BalancedBaggingClassifier(base_estimator=SVC(C=1.0, random_state=1, kernel='linear'),
sampling_strategy='auto',
replacement=False,
random_state=0)
model_10.fit(feat_train_std, target_train)
pred_target_10 = model_10.predict(feat_test)
mostrar_resultados(target_test, pred_target_10)
pred_target_10 = model_10.predict_proba(feat_test)[:, 1]
average_precision_10 = average_precision_score(target_test, pred_target_10)
precision_10, recall_10, thresholds = precision_recall_curve(target_test, pred_target_10)
auc_precision_recall_10 = auc(recall_10, precision_10)
disp_10 = plot_precision_recall_curve(model_10, feat_test, target_test)
disp_10.ax_.set_title('Binary class Precision-Recall curve: '
'AUC={0:0.2f}'.format(auc_precision_recall_10))
Afterwards, I save and reload the model as follows:
modelo_pickle = 'modelo_pickle.pkl'
joblib.dump(model_10,modelo_pickle)
loaded_model = joblib.load(modelo_pickle)
Then, the aim is to load a new dataset, whose columns are the same as the model's variables, and make a prediction for each row:
lista_x=x.to_numpy().tolist()
resultados=[]
for i in lista_x:
    pred = loaded_model.predict([i])
    resultados.append(pred)
print(resultados)
However, every single result is equal to 1, which does not make any sense. Could anyone tell me what I am missing, please?
Thank you in advance.
Regards,
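A likely cause, offered as an assumption since the data is not shown: the model is fitted on standardized features (feat_train_std), but predict and predict_proba are called on the unscaled feat_test, and the new rows in x are never scaled either. A linear model fed inputs on a different scale can easily land on the same side of the decision boundary for every row. The sketch below illustrates the principle in plain Python with made-up numbers. The Standardizer class and the toy_predict rule are hypothetical stand-ins for StandardScaler and the fitted classifier.

```python
from statistics import mean, pstdev


class Standardizer:
    """Toy stand-in for a fitted scaler: stores per-column mean and std."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [
            [(value - m) / s for value, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


def toy_predict(rows):
    """Toy linear rule for standardized data: class 1 if the feature sum is positive."""
    return [1 if sum(row) > 0 else 0 for row in rows]


# Made-up training data with large raw values.
train = [[100.0, 2000.0], [110.0, 2100.0], [90.0, 1900.0], [120.0, 2200.0]]
scaler = Standardizer().fit(train)

new_rows = [[95.0, 1950.0], [115.0, 2150.0]]

raw_predictions = toy_predict(new_rows)                       # scaling skipped
scaled_predictions = toy_predict(scaler.transform(new_rows))  # same scaler reused

print(raw_predictions)
print(scaled_predictions)
```

In scikit-learn terms, that would mean calling svm.transform(...) on feat_test and on the new data before predict, or saving the scaler and the model together (for example in a Pipeline) so the two cannot drift apart.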
I'm trying to figure out how sequence-to-sequence loss is calculated. I am using the huggingface transformers library in this case, but this might actually be relevant to other DL libraries.
So to get the required data we can do:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
import torch.nn.functional as F
torch.manual_seed(42)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
MAX_LEN = 128
tokenize = lambda x: tokenizer(x, max_length=MAX_LEN, truncation=True, padding=True, return_tensors="pt")
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert from pre-trained checkpoints
input_seq = ["Hello, my dog is cute", "my cat cute"]
output_seq = ["Yes it is", "ok"]
input_tokens = tokenize(input_seq)
output_tokens = tokenize(output_seq)
outputs = model(
input_ids=input_tokens["input_ids"],
attention_mask=input_tokens["attention_mask"],
decoder_input_ids=output_tokens["input_ids"],
decoder_attention_mask=output_tokens["attention_mask"],
labels=output_tokens["input_ids"],
return_dict=True)
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
Edit 1
Thanks to @cronoik I was able to replicate the loss calculated by huggingface as follows:
output_logits = logits[:,:-1,:]
output_mask = mask[:,:-1]
label_tokens = output_tokens["input_ids"][:, 1:].unsqueeze(-1)
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
huggingface_loss = -select_logits.mean()
However, since the last two tokens of the second input are just padding, shouldn't we calculate the loss as:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
seq_loss = -seq_loss.mean()
This takes into account the length of each output sequence and masks out the padding. I think this is especially useful when a batch contains outputs of varying length.
OK, I found out where I was making the mistakes. This is all thanks to this thread in the HuggingFace forum.
The output labels need to be set to -100 at the masked (padding) positions. The transformers library does not do it for you.
One silly mistake I made was with the mask. It should have been output_mask = mask[:, 1:] instead of mask[:, :-1].
1. Using Model
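We need to set the labels at the masked output positions to -100. It is important to use clone, as shown below, so that the in-place assignment does not also overwrite the decoder input IDs: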
labels = output_tokens["input_ids"].clone()
labels[output_tokens["attention_mask"]==0] = -100
outputs = model(
input_ids=input_tokens["input_ids"],
attention_mask=input_tokens["attention_mask"],
decoder_input_ids=output_tokens["input_ids"],
decoder_attention_mask=output_tokens["attention_mask"],
labels=labels,
return_dict=True)
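To make the masking step concrete, here is a small sketch in plain Python (no tensors) of what the clone plus assignment does to the labels, and of how a loss that skips the ignore value averages only over the remaining positions (PyTorch's cross-entropy uses ignore_index=-100 by default). The token IDs and per-token losses are made up for illustration, and mask_labels and mean_ignoring are hypothetical helper names.

```python
IGNORE_INDEX = -100


def mask_labels(input_ids, attention_mask):
    """Copy the IDs and replace every padding position (mask == 0) with -100."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


def mean_ignoring(per_token_loss, labels):
    """Average the per-token losses over positions whose label is not -100."""
    kept = [
        loss
        for losses, row in zip(per_token_loss, labels)
        for loss, label in zip(losses, row)
        if label != IGNORE_INDEX
    ]
    return sum(kept) / len(kept)


# Made-up batch: the second sequence ends with two padding tokens (ID 0).
input_ids = [[101, 7, 8, 9, 102], [101, 5, 102, 0, 0]]
attention_mask = [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
labels = mask_labels(input_ids, attention_mask)

# Made-up per-token losses, one per label position.
per_token_loss = [[1.0, 2.0, 3.0, 2.0, 2.0], [4.0, 2.0, 0.0, 9.0, 9.0]]

print(labels)
print(mean_ignoring(per_token_loss, labels))
```

Because mask_labels builds new lists, input_ids is left untouched, which mirrors the reason for calling clone before the in-place assignment.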
2. Calculating Loss
So the final way to replicate it is as follows:
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
# shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
# gather the logits and mask
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
# compare the replicated loss with the one returned by the model
print(-select_logits[output_mask==1].mean(), outputs["loss"])
The above, however, ignores the fact that the tokens come from two different sequences. So an alternate way of calculating the loss could be:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
-seq_loss.mean()
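The difference between the two averaging schemes can be seen with a tiny worked example. This sketch uses plain Python lists and made-up log-probabilities rather than model outputs; token_mean_loss and sequence_mean_loss are hypothetical names for the two formulas above.

```python
def token_mean_loss(log_probs, mask):
    """Negative mean log-probability over all unmasked tokens in the batch."""
    kept = [
        lp
        for row, m_row in zip(log_probs, mask)
        for lp, m in zip(row, m_row)
        if m == 1
    ]
    return -sum(kept) / len(kept)


def sequence_mean_loss(log_probs, mask):
    """Average each sequence over its own length first, then average the sequences."""
    per_sequence = [
        sum(lp * m for lp, m in zip(row, m_row)) / sum(m_row)
        for row, m_row in zip(log_probs, mask)
    ]
    return -sum(per_sequence) / len(per_sequence)


# Made-up log-probabilities of the target tokens; the second row has two padded positions.
log_probs = [[-1.0, -1.0, -1.0, -1.0], [-3.0, -3.0, -7.0, -7.0]]
mask = [[1, 1, 1, 1], [1, 1, 0, 0]]

print(token_mean_loss(log_probs, mask))     # 6 tokens: (4 * 1.0 + 2 * 3.0) / 6
print(sequence_mean_loss(log_probs, mask))  # 2 sequences: (1.0 + 3.0) / 2
```

With token-level averaging the longer sequence contributes more terms, so it dominates the result; with sequence-level averaging each sequence carries equal weight regardless of its length.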
Thanks for sharing. However, as of today, the new version of transformers no longer "shifts", so the following is not needed:
#shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
I am using a data set called sleep (found here: https://drive.google.com/file/d/15ZnsWtzbPpUBQN9qr-KZCnyX-0CYJHL5/view) to run a three-way within-subject ANOVA comparing Performance based on Stimulation, Deprivation, and Time. I have successfully done this before using anova_test from rstatix. I want to look at the sphericity results, but they don't appear in the output. I have gotten them to appear with other three-way within-subject datasets, so I'm not sure why this is happening. Here is my code:
anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
I also tried to save it to an object and use get_anova_table, but that didn't look any different.
sleep_aov <- anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
get_anova_table(sleep_aov, correction = "GG")
This is an ideal dataset I pulled from the internet, so I'm starting to think the data had a W of 1 (perfect sphericity) and so rstatix is skipping this output. Is this something anova_test does?
Here also is my code using a dataset that does return Mauchly's:
weight_loss_long <- pivot_longer(data = weightloss, cols = c(t1, t2, t3), names_to = "time", values_to = "loss")
weight_loss_long$time <- factor(weight_loss_long$time)
anova_test(data = weight_loss_long, dv = loss, wid = id, within = c(diet, exercises, time))
Not an expert at all, but it might be because your factors have only two levels.
From anova_summary() help:
"Value
return an object of class anova_test a data frame containing the ANOVA table for independent measures ANOVA. However, for repeated/mixed measures ANOVA, it is a list containing the following components are returned:
ANOVA: a data frame containing ANOVA results
Mauchly's Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly's test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values. "
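The reason two-level factors have no sphericity test can be illustrated numerically. Sphericity concerns whether the variances of the differences between every pair of levels are equal; with two levels there is only one pair, so there is nothing to compare. The sketch below computes those difference variances for made-up scores. It is not what anova_test runs internally, and difference_variances is a hypothetical helper.

```python
from itertools import combinations
from statistics import variance


def difference_variances(scores_by_level):
    """Variance of the per-subject difference scores for every pair of levels."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(scores_by_level.items(), 2):
        diffs = [x - y for x, y in zip(a, b)]
        result[(name_a, name_b)] = variance(diffs)
    return result


# Made-up scores for four subjects.
two_levels = {"t1": [5, 6, 7, 9], "t2": [4, 6, 5, 6]}
three_levels = {**two_levels, "t3": [3, 3, 4, 2]}

print(len(difference_variances(two_levels)))    # one pair: nothing to compare
print(len(difference_variances(three_levels)))  # three pairs: equality can be tested
```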
I need to get the coordinates of the matched image within the actual image so that I can perform operations on it. I tried the two approaches below, but neither seems to work:
Approach 1:
Using the code below, I'm able to find a match, but the coordinates returned are just the width and height of the image to be matched (which I already know). I want to get its position within the actual image.
BufferedImage actualImg = ImageIO.read(new File("C:/Images/SrcImg.PNG"));
ImageTarget actualTgt = new ImageTarget(actualImg);
BufferedImage searchImg = ImageIO.read(new File("C:/Images/TgtImg.PNG"));
ImageTarget searchTgt = new ImageTarget(searchImg);
ScreenRegion scrReg = new StaticImageScreenRegion(actualTgt.getImage());
ScreenRegion resReg = scrReg.find(searchTgt);
ScreenLocation center = resReg.getCenter();
System.out.println(":getElementFromImage: x_loc,y_loc =["+center.getX()+","+center.getY()+"]");
Approach 2:
In the code below I tried the SikuliX Finder. However, src.hasNext() returned true but src.next() threw a NullPointerException. I'm not sure what the problem is here:
Finder src = new Finder("C:/Images/SrcImg.PNG");
Pattern pat = new Pattern("C:/Images/TgtImg.PNG").similar(0.5);
src.find(pat);
Match m;
while( src.hasNext())
m = src.next();
src.destroy();
java.lang.NullPointerException
at org.sikuli.script.Finder.next(Finder.java:484)
at com.work.ImageFinder.main(ImageFinder.java:38)
I have already spent a good amount of time trying to make this work. Any help would be much appreciated.
Thanks!
It works fine after passing the Region to Finder like below:
Finder src = new Finder("C:/Images/SrcImg.PNG", new Region(0,0,<width>,<height>));
Pattern pat = new Pattern("C:/Images/TgtImg.PNG").similar(0.5);
src.find(pat);
Match m;
while( src.hasNext())
m = src.next();
src.destroy();
More details can be found at the link below:
Is it possible to use Sikuli to assert that images are the same in GUI-less mode?
I have a 3D array of uint16 in Matlab (basically, it is just a 1080x1920x3 image). I want to store it in MySQL. Here is what I'm doing:
MySQL:
create table imgtest(img longblob);
Matlab:
% image_data - is my image as described before
raw_im = reshape(image_data,1,[]);
conn = database('test','root','root','Vendor','MySQL','Server','localhost')
x = conn.Handle;
insertcommand = ['INSERT INTO imgtest (img) values (?)'];
StatementObject = x.prepareStatement(insertcommand);
StatementObject.setObject(1,raw_im)
StatementObject.execute
The problem is that I'm writing about 600k uint16 values into this blob field, but when I read the field back from the DB, I always get about 1.2 million uint8 elements (exactly twice as many).
So, is there a way to read this byte field as a set of uint16, but not uint8?
Thank you.
I have been doing something similar for one of my projects. There was one difference, but maybe it will clarify something for you.
I was loading the image directly into the DB from a file with this command:
INSERT INTO BaseImage(Image)
SELECT * FROM OPENROWSET(BULK N'C:\co.jpg', SINGLE_BLOB) as image
and getting it back into Matlab required typecasting (just as @sebastian mentioned):
SQL_query = 'select TOP 1 pk_BaseImage,Image from BaseImage order by pk_BaseImage desc';
[data] = SQL_query_exec(SQL_query);
pk_BaseImage = data.Data.pk_BaseImage;
out = typecast(data.Data.Image{1,1},'uint8');
But that was not enough: I had to use a trick to make 'out' usable as an image.
I was forced to write it to a temporary file and read it back into Matlab (I know it's strange, but it worked very well, and I could, for example, calculate the DWT, DFT and so on):
image_matrix = get_image_matrix( out );
The get_image_matrix function looks like this:
function [ out ] = get_image_matrix( input )
    targetfilename = 'temp.jpg';
    % write the raw bytes to a temporary file
    fid = fopen(targetfilename,'w');
    if fid ~= -1   % fopen returns -1 on failure
        fwrite(fid,input,'uint8');
        fclose(fid);
    end
    % read the file back as an image, then clean up
    out = imread(targetfilename);
    delete(targetfilename);
end
I hope it will help you :)
One important note: I used grayscale images (uint8 type).
You can most probably typecast the uint8's into uint16's to get back at your original image data:
uint16_result = typecast(uint8_result, 'uint16');
I'm not familiar with the database toolbox - so there might well be a way to tell Matlab to do this on its own.
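The byte-level reinterpretation that typecast performs can be sketched in Python with the standard array module. This is an illustration of why the element count doubles and how the round trip recovers the values, not Matlab or database code. The sample values are made up, and the sketch assumes that the platform's unsigned short is 2 bytes and that writer and reader share the same byte order.

```python
from array import array

# Made-up uint16 pixel values ('H' is an unsigned short, 2 bytes on common platforms).
pixels = array("H", [0, 1, 255, 256, 65535])
assert pixels.itemsize == 2

# What a BLOB column stores: the raw bytes, two per uint16 value.
blob = pixels.tobytes()

# Reading the BLOB as uint8 gives twice as many elements.
as_uint8 = array("B", blob)

# Reinterpreting the same bytes as uint16 recovers the original values.
restored = array("H")
restored.frombytes(blob)

print(len(pixels), len(as_uint8))
print(restored.tolist())
```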
OK, thank you both. I've summarized your answers, and this is what I've got:
Since a blob field is nothing more than a byte array, we should cast our data to uint8 in Matlab before writing it to the DB. After reading it back from the DB, we should cast it back to uint16.
A minimal working example:
MySQL
create table imgtest(img longblob);
Matlab
% image_data - is my image as described before
raw_im = typecast(reshape(image_data,1,[]),'uint8'); % the key line: reinterpret the uint16 data as bytes
conn = database('test','root','root','Vendor','MySQL','Server','localhost')
x = conn.Handle;
insertcommand = ['INSERT INTO imgtest (img) values (?)'];
StatementObject = x.prepareStatement(insertcommand);
StatementObject.setObject(1,raw_im)
StatementObject.execute
Afterwards, we can read it back:
res = exec(conn,'Select * from imgtest')
array_uint8 = fetch(res);
array_uint8 = array_uint8{1};
array_uint16 = typecast(array_uint8,'uint16');
image_data = reshape(array_uint16,1080,1920,3); % restore the dimensions given in the question
Hope this will help someone.