Why do calculations provided by Skyfield not match actual data?

I wanted to compare the semi-major axis that Skyfield calculates with the one from the TLE.
from skyfield.elementslib import osculating_elements_of as OE
from skyfield.api import EarthSatellite, load
ts = load.timescale()
sat = EarthSatellite(TLE1, TLE2)  # TLE1, TLE2: the two lines of the TLE (not shown)
date = sat.epoch.utc
test = sat.at(ts.utc(*date))  # geocentric position at the TLE's own epoch
print(OE(test).semi_major_axis.km)
print(sat_df['semimajor_axis_km'].values[0])  # sat_df: values parsed from the TLE (not shown)
output:
7087.916048058872
7080.642
How come they're not the same or at least closer?
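Not an answer from the docs, but a likely explanation worth sketching: the elements stored in a TLE are Brouwer mean elements, and the semi-major axis usually quoted for a TLE is derived from its mean motion via Kepler's third law, whereas osculating_elements_of returns instantaneous osculating elements, which differ from the mean elements by short-period perturbations (mostly J2) of several kilometres. A minimal sketch of reproducing the mean value, assuming TLE1 and TLE2 are the same two lines as above and that sat.model exposes the underlying SGP4 record (it does in recent Skyfield versions):
from skyfield.api import EarthSatellite, load
ts = load.timescale()
sat = EarthSatellite(TLE1, TLE2)  # TLE1, TLE2 as above (not shown)
mu = 398600.8  # km^3/s^2; the WGS-72 GM value that SGP4 itself uses
n = sat.model.no_kozai / 60.0  # TLE mean motion, converted from rad/min to rad/s
a_mean = (mu / n ** 2) ** (1.0 / 3.0)  # Kepler's third law: n^2 * a^3 = mu
print(a_mean)  # should land near the 7080.642 km mean value, not the osculating 7087.9
If so, a difference of ~7 km between 7087.9 (osculating) and 7080.6 (mean) is roughly what you would expect, not a bug.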

Related

How do I query for the last 30 days of data in Power Query using JSON?

I would like to request the last 30 days of CrewHu Import data from today's date in this query. At the moment it is just set to get everything after 25 September 2022, but I want to change this to be a dynamic value. Has anyone else had this problem or know of a workaround?
let
Source = Json.Document(Web.Contents("https://api.crewhu.com/api" & "/v1/survey?query={""_updated_at"":{""$gte"":""2022-09-25T00:00:00.000Z""}}", [Headers=[X_CREWHU_APITOKEN="xxxxxxxxxxx"]])),
I've tried:
OneMonthAgo = Text.Replace(Text.Start (Text.From(Date.AddDays(DateTime.LocalNow(),-30)),10),"/","-") & "T00:00:00.000Z",
and then calling this as a variable, but because the value is not spliced into the query string with surrounding quotation marks, it gives a syntax error when the variable is used in the 'Source =' line.
Well, first you want
= Date.ToText(Date.From(Date.AddDays(DateTime.LocalNow(),-30)), [Format="yyyy-MM-dd"])& "T00:00:00.000Z"
since that returns 2022-09-28T00:00:00.000Z, while yours returns 9-28-2022 T00:00:00.000Z, which does not match the original format.
Then try this, which I can't test:
let
    variable = Date.ToText(Date.From(Date.AddDays(DateTime.LocalNow(),-30)), [Format="yyyy-MM-dd"]) & "T00:00:00.000Z",
    Source = Json.Document(Web.Contents("https://api.crewhu.com/api" & "/v1/survey?query={""_updated_at"":{""$gte"":"""&variable&"""}}", [Headers=[X_CREWHU_APITOKEN="xxxxxxxxxxx"]]))
in
    Source
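If it helps to sanity-check the target string outside of M, here is the same "thirty days ago at midnight, ISO 8601" value as a quick Python sketch (purely illustrative; the let/in block above is what actually goes into Power Query):
from datetime import datetime, timedelta, timezone
# build the dynamic lower bound the API query expects, e.g. 2022-09-28T00:00:00.000Z
one_month_ago = (datetime.now(timezone.utc) - timedelta(days=30)).strftime("%Y-%m-%d") + "T00:00:00.000Z"
print(one_month_ago)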

Sequence to Sequence Loss

I'm trying to figure out how sequence-to-sequence loss is calculated. I am using the Hugging Face transformers library in this case, but this might actually be relevant to other DL libraries.
So to get the required data we can do:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
import torch.nn.functional as F
torch.manual_seed(42)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
MAX_LEN = 128
tokenize = lambda x: tokenizer(x, max_length=MAX_LEN, truncation=True, padding=True, return_tensors="pt")
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert from pre-trained checkpoints
input_seq = ["Hello, my dog is cute", "my cat cute"]
output_seq = ["Yes it is", "ok"]
input_tokens = tokenize(input_seq)
output_tokens = tokenize(output_seq)
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=output_tokens["input_ids"],
    return_dict=True)
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
Edit 1
Thanks to @cronoik, I was able to replicate the loss calculated by huggingface as:
output_logits = logits[:,:-1,:]
output_mask = mask[:,:-1]
label_tokens = output_tokens["input_ids"][:, 1:].unsqueeze(-1)
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
huggingface_loss = -select_logits.mean()
However, since the last two tokens of the second sequence are just padding, shouldn't we calculate the loss as:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
seq_loss = -seq_loss.mean()
This takes into account the length of each row's output sequence and masks out the padding. I think this is especially useful when we have batches of varying-length outputs.
OK, I found out where I was making the mistakes, all thanks to this thread in the Hugging Face forum.
The output labels need to have -100 at the masked (padding) positions; the transformers library does not do this for you.
One silly mistake I made was with the mask: it should have been output_mask = mask[:, 1:] instead of mask[:, :-1].
1. Using Model
We need to set the masked positions of the labels to -100, and it is important to use clone, as shown below:
labels = output_tokens["input_ids"].clone()
labels[output_tokens["attention_mask"]==0] = -100
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=labels,
    return_dict=True)
2. Calculating Loss
So the final way to replicate it is as follows:
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
# shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
# gather the logits and mask
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
-select_logits[output_mask==1].mean(), outputs["loss"]  # these two values should match
The above, however, averages over all tokens at once and ignores the fact that they come from two sequences of different lengths. So an alternative way of calculating the loss could be:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
-seq_loss.mean()
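As one more cross-check (a sketch, assuming the labels and outputs from step 1 and a transformers version that still shifts internally), the same number should fall out of PyTorch's built-in cross-entropy, which is what the library uses under the hood with ignore_index=-100:
import torch.nn.functional as F
# flatten to (batch*seq, vocab) and (batch*seq,), applying the same shift as above;
# positions labelled -100 are ignored by F.cross_entropy
shifted_logits = outputs["logits"][:, :-1, :]
shifted_labels = labels[:, 1:]
ce = F.cross_entropy(
    shifted_logits.reshape(-1, shifted_logits.size(-1)),
    shifted_labels.reshape(-1),
    ignore_index=-100)
print(ce, outputs["loss"])  # should agree up to floating-point tolerance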
Thanks for sharing. However, the new version of transformers as of today actually does not "shift" anymore, so the following is not needed:
# shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]

anova_test not returning Mauchly's for three-way within-subject ANOVA

I am using a data set called sleep (found here: https://drive.google.com/file/d/15ZnsWtzbPpUBQN9qr-KZCnyX-0CYJHL5/view) to run a three-way within-subject ANOVA comparing Performance based on Stimulation, Deprivation, and Time. I have successfully done this before using anova_test from rstatix. I want to look at the sphericity output, but it doesn't appear. I have gotten it to appear with other three-way within-subject datasets, so I'm not sure why this is happening. Here is my code:
anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
I also tried to save it to an object and use get_anova_table, but that didn't look any different.
sleep_aov <- anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
get_anova_table(sleep_aov, correction = "GG")
This is an ideal dataset I pulled from the internet, so I'm starting to think the data had a W of 1 (perfect sphericity) and so rstatix is skipping this output. Is this something anova_test does?
Here also is my code using a dataset that does return Mauchly's:
weight_loss_long <- pivot_longer(data = weightloss, cols = c(t1, t2, t3), names_to = "time", values_to = "loss")
weight_loss_long$time <- factor(weight_loss_long$time)
anova_test(data = weight_loss_long, dv = loss, wid = id, within = c(diet, exercises, time))
Not an expert at all, but it might be because your factors have only two levels: with two levels there is only one pairwise difference between them, so sphericity holds trivially and there is nothing to test.
From anova_summary() help:
"Value
return an object of class anova_test: a data frame containing the ANOVA table for independent measures ANOVA. However, for repeated/mixed measures ANOVA, a list containing the following components is returned:
ANOVA: a data frame containing ANOVA results
Mauchly's Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly's test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values. "

Mean_squared_error output in function includes dtype and '0'

I want to calculate test statistics for a Prophet forecast in a function, because I want to average the test stats over different forecasts and cutoff points after using fbprophet's cross_validation to get df_cv. I created a function that I apply to the dataframe after grouping by the cutoff points, in order to get one measure per cutoff point. Then I calculate the mean over all these values.
The problem is that my function returns not only the value I am looking for, but also a 0 and the dtype. I can still do calculations with the returned value, but it is very inconvenient when I want to plot it later. How can I strip these extra values from the output?
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error

def compute_avg_stats(df_cv, perf_measure):
    # map each label to its sklearn metric; rmse reuses mse and takes the root below
    measures = {'mse': mean_squared_error, 'mae': mean_absolute_error,
                'mape': mean_absolute_percentage_error, 'rmse': mean_squared_error}
    if perf_measure == 'rmse':
        measure = np.sqrt(measures[perf_measure](y_true=df_cv['y'], y_pred=df_cv['yhat']))
    else:
        measure = measures[perf_measure](y_true=df_cv['y'], y_pred=df_cv['yhat'])
    return measure

df_cv.groupby('cutoff').apply(compute_avg_stats, perf_measure='rmse').to_frame().mean()
I think .mean() returns a Series. Try .mean()[0] (or, more explicitly, .mean().iloc[0]) to get the bare scalar.
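To see why the 0 and the dtype show up at all, here is a minimal standalone sketch: .to_frame() produces a one-column DataFrame whose column is named 0, and .mean() on a DataFrame returns a Series, which pandas prints together with its index and dtype:
import pandas as pd
s = pd.Series([1.0, 2.0, 3.0])
print(s.mean())       # 2.0 -> a plain float
df = s.to_frame()     # one-column DataFrame; the column is named 0
print(df.mean())      # a Series: "0    2.0" plus "dtype: float64"
print(df.mean()[0])   # 2.0 -> indexing the Series recovers the bare scalar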

Highcharts with external CSV $.get - No xAxis date

I'm trying to create a spline chart using this CSV:
slave_id,date,time,rtc_temp,temp1,temp2,temp3
1,2017/12/26,16:42:59,21,11.50,13.13,5.88
2,2017/12/26,16:43:29,21,14.13,20.63,99.99
1,2017/12/26,16:44:00,21,11.50,13.13,5.88
2,2017/12/26,16:44:30,21,14.13,20.63,99.99
1,2017/12/26,16:45:01,21,11.50,13.13,5.88
2,2017/12/26,16:45:31,21,14.13,20.63,99.99
1,2017/12/26,16:46:02,21,11.50,13.13,5.88
2,2017/12/26,16:46:32,21,14.13,20.63,99.99
As you can see here [IMAGE], the graph is showing the date and time, but the x-axis is not accepting the date/time.
I've tried using Date.UTC, but that did not work either. Can someone point me in the right direction?
https://jsfiddle.net/asvoy6b9/ [not working due to CSV missing]
Full code [Hastebin]
I see that the date variable in your code is a string:
// all data lines start with a double quote
line = line.split(',');
date = line[1] + " " + line[2];
(...)
RTC.push([
    date,
    parseInt(line[3], 10)
]);
If you construct the point's options as an array of two values and the first value is a string, then it's treated as the point's name property (not its x).
Explanation: https://www.highcharts.com/docs/chart-concepts/series
In that case Highcharts assigns consecutive integers as x values for all points (that's why there are values like 00:00:00.000 (1 Jan 1970), 00:00:00.001, etc.).
You need to parse your date into a timestamp. You can use Date.UTC() (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/UTC) or some other function for this.
I've managed to get it working with Date.UTC using the following code:
var yyyymmdd = line[2].split("-"); // split the date into [yyyy, mm, dd] -- use "/" if your CSV dates look like 2017/12/26
var hhmmss = line[3].split(":"); // split the time into [hh, mm, ss]
var date = Date.UTC(yyyymmdd[0], yyyymmdd[1] - 1, yyyymmdd[2], hhmmss[0], hhmmss[1], hhmmss[2]); // stitch 'em together with Date.UTC (months are 0-based)