How to filter abnormal temperature and humidity sensor values - kalman-filter

The temperature and humidity sensors produce abnormal values, such as 80 degrees, from time to time.
How can I filter out these abnormal sensor values? Is Kalman filtering the right tool for rejecting them?

You don't describe enough about your application to say whether a Kalman filter is really called for. However you end up filtering your data, you will likely need some way of handling the outliers you describe. There is a large universe of robust techniques, such as the trimmed mean, Median Absolute Deviation (MAD), and Least Median of Squares (LMedS). This document provides a good summary of those methods.
Also, Learning an Outlier-Robust Kalman Filter provides a good example of robust techniques used within a Kalman filter, which seems more in line with your question. I have used adaptations of this technique quite successfully in applications.
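To make the robust-statistics idea concrete, here is a minimal sketch of MAD-based outlier rejection in Python. The `mad_outlier_mask` helper and the 3.5 cutoff are my own illustrative choices (3.5 is a common rule of thumb), not something from the paper:

```python
import numpy as np

def mad_outlier_mask(x, threshold=3.5):
    """Flag samples whose robust z-score (based on the median and the
    Median Absolute Deviation) exceeds `threshold`.
    Returns a boolean mask that is True for outliers."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.zeros_like(x, dtype=bool)
    # 0.6745 scales the MAD so it is comparable to a standard
    # deviation for Gaussian data.
    robust_z = 0.6745 * (x - med) / mad
    return np.abs(robust_z) > threshold

# Example: a spurious 80-degree reading among ~22-degree samples
readings = [21.8, 22.1, 21.9, 80.0, 22.0, 22.2]
mask = mad_outlier_mask(readings)
clean = [r for r, bad in zip(readings, mask) if not bad]
```

Samples flagged this way can be dropped or replaced before any smoothing filter (Kalman or otherwise) sees them.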

Related

How to deal with different subsampling time in economic datasets for deep learning?

I am building a deep learning model for macro-economic prediction. However, the indicators vary widely in their sampling frequency, ranging from minutes to annual.
[Image: dataframe example]
The picture shows 'Treasury Rates (DGS1-20)', which is sampled daily, and 'Inflation Rate (CPALT...)', which is sampled monthly. These features are essential for training the model, and dropping the NaN rows would leave too little data.
I've read some books and articles about how to deal with missing data, including down-sampling to a monthly time frame, swapping the NaNs for -1, filling with the average of the previous and next values, etc. But the methods I've read about mostly deal with datasets where about 10% of the values are missing, whereas in my case the monthly-sampled 'Inflation (CPI)' column is more than 90% missing once combined with the 'Treasury Rate' dataset.
I was wondering if there is any workaround for handling missing values, particularly for economic data where the sampling intervals differ so widely. Thank you
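For reference, here is a rough pandas sketch of the options the question mentions. The series are synthetic stand-ins for the Treasury-rate and CPI columns, and the column names are made up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2020-01-01", periods=365, freq="D")
treasury = pd.Series(rng.normal(1.5, 0.1, len(days)), index=days, name="DGS1")
months = pd.date_range("2020-01-31", periods=12, freq="M")
cpi = pd.Series(rng.normal(2.0, 0.3, len(months)), index=months, name="CPI")

df = pd.concat([treasury, cpi], axis=1)

# Option 1: down-sample everything to the coarsest frequency (monthly).
monthly_df = df.resample("M").last()

# Option 2: up-sample CPI to daily by carrying the last known value
# forward, which mirrors how the figure is "known" between releases.
df["CPI_ffill"] = df["CPI"].ffill()

# Option 3: linear interpolation between releases. Note this uses
# future values, so it introduces look-ahead for a predictive model.
df["CPI_interp"] = df["CPI"].interpolate(method="time")
```

Forward-filling (option 2) is often preferred for prediction tasks precisely because it only uses information available at each point in time.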

Designing a Kalman filter

I am working on a project to monitor a water tank for energy storage. I have a large water tank, to which 10 thermometers are attached. By measuring the temperature, I can estimate the energy stored in the tank. That is quite simple, but I want to add another feature.
By sampling the amount of energy over time, I want to determine the power that is currently flowing into or out of the tank. I am measuring the energy in the tank every minute and determining the current power by comparing it to the last measurement. The direct result is very noisy (jumping between 2 and 4 kW, for example), so I need some kind of filtering.
I started simply with averaging (the current power is the average of the last 10 measurements), which works fine, but I wanted to try something a bit fancier. So I wrote a simple Kalman filter, the one described in this video: https://www.youtube.com/watch?v=PZrFFg5_Sd0&list=PLX2gX-ftPVXU3oUFNATxGXY90AULiqnWT&index=5 The problem, I guess, is that this filter is designed for static measurements, and my measurement changes quite a bit. As an example, in this plot there are 60 measurements (one every minute) and the line is the output of the Kalman filter:
[Image: plot of the 60 measurements and the Kalman filter output]
Is it possible to modify the filter to follow the actual value more closely? Or is this kind of filter just not suitable for this system, and should I use something else?
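For reference, a minimal scalar Kalman filter of the kind described in the video, written as a sketch (the power values below are made up): raising the process noise q is the usual way to let such a filter follow a changing value rather than assume a static one.

```python
import numpy as np

def kalman_1d(zs, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk model:
    x_k = x_{k-1} + w (process noise variance q),
    z_k = x_k + v     (measurement noise variance r).
    Larger q lets the estimate track a changing power level more
    quickly, at the cost of less smoothing."""
    x, p = x0, p0
    out = []
    for z in zs:
        p = p + q          # predict: state assumed constant plus noise
        k = p / (p + r)    # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

# Hypothetical noisy power readings (kW), one per minute:
power = np.array([2.1, 3.8, 2.4, 3.6, 2.9, 3.3, 2.2, 3.9])
smoothed = kalman_1d(power, q=0.05, r=1.0)
```

With q = 0 this degenerates to the static-measurement filter from the video; a small positive q is the simplest modification that lets it follow a drifting value.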

GNU Radio dual tone detection

I am trying to come up with an efficient way to characterize two narrowband tones separated by about 900 kHz (one at around 100 kHz and one at around 1 MHz once translated to baseband). They don't move much in frequency over time, but they may have amplitude variations we want to monitor.
Each tone is roughly 100 Hz wide, and we are required to characterize these two beasts over long periods of time down to a resolution of about 0.1 Hz. The samples are coming in at over 2 MSamples/s (TBD) to adequately acquire the higher tone.
I'm trying to avoid (if possible) doing brute-force >2 MSample FFTs on the data once a second to extract frequency-domain data. Is there an efficient approach? Something akin to performing two (much) smaller FFTs around the bands of interest? I've looked at Goertzel and chirp-z methods, but I am not certain they save processing.
"Something akin to performing two (much) smaller FFTs around the bands of interest"
There is: it's called Goertzel, it's essentially the FFT for a single bin, and you have already looked at it. It will save you CPU time.
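As a quick illustration of that "FFT for a single bin" remark, a textbook Goertzel power computation might look like this in Python (this version assumes real-valued input; bin k of an n-point DFT sits at k·fs/n):

```python
import numpy as np

def goertzel_power(x, k, n):
    """Power of bin k of an n-point DFT of real input x,
    computed with the Goertzel recursion instead of a full FFT."""
    w = 2.0 * np.pi * k / n
    coeff = 2.0 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x[:n]:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2

fs = 2.0e6
n = 2000                        # 1 kHz bin width at 2 MS/s
k = int(round(100e3 * n / fs))  # bin index for the ~100 kHz tone
# p = goertzel_power(samples, k, n)
```

The cost is O(n) per bin, so for a handful of bins it is much cheaper than an n-point FFT.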
Anyway, there's no reason to do a 2M-point FFT. First of all, to separate the two tones you only need a resolution of about 1/20 of the sampling rate, so a 20-point FFT would totally do, and should be easy on your CPU; since you don't seem to care about the phase of your tones, follow the FFT with complex_to_mag.
However, there's one thing that you should always do: look at your signal of interest, and decimate down to the rate that fits exactly that. Since GNU Radio's filters are implemented cleverly, the filter itself will only run at the decimated rate, and you can spend the CPU cycles saved on a better filter.
Because a direct decimation from 2 MS/s down to 200 Hz (a decimation factor of 10,000) would require a really ugly filter length, you should do this in multiple rate stages:
I'd first decimate by 100, and then decimate by 100 again in a second step, leaving you with 200 Hz of observable spectrum. The xlating FIR filter blocks let you use a simple low-pass filter (use the "Low-Pass Filter Taps" block to define a variable containing such taps) as a band selector.
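A rough sketch of that chain as a GNU Radio Python flowgraph might look as follows. The file names, cutoff frequencies, and transition widths are placeholder choices; in GRC you would wire up the equivalent Frequency Xlating FIR Filter blocks with Low-Pass Filter Taps variables:

```python
from gnuradio import gr, blocks, filter
from gnuradio.filter import firdes

class tone_channelizer(gr.top_block):
    """Two-stage frequency-translating decimator for one tone:
    2 MS/s -> 20 kS/s -> 200 S/s, with the tone shifted to 0 Hz."""
    def __init__(self, samp_rate=2e6, tone_freq=100e3):
        gr.top_block.__init__(self)
        src = blocks.file_source(gr.sizeof_gr_complex, "capture.iq", False)

        # Stage 1: translate the tone to 0 Hz and decimate by 100.
        # Stopband edge 8 kHz + 2 kHz transition = 10 kHz (half of 20 kS/s).
        taps1 = firdes.low_pass(1.0, samp_rate, 8e3, 2e3)
        stage1 = filter.freq_xlating_fir_filter_ccf(
            100, taps1, tone_freq, samp_rate)

        # Stage 2: decimate by another 100, leaving 200 Hz of spectrum.
        rate2 = samp_rate / 100
        taps2 = firdes.low_pass(1.0, rate2, 80.0, 20.0)
        stage2 = filter.freq_xlating_fir_filter_ccf(100, taps2, 0.0, rate2)

        sink = blocks.file_sink(gr.sizeof_gr_complex, "tone_200hz.iq")
        self.connect(src, stage1, stage2, sink)
```

Because each filter runs at its decimated output rate, the two short stages cost far fewer multiplies than one monster filter decimating by 10,000 in a single step. A second copy of the chain, centered on 1 MHz, handles the other tone.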

Estimating error with a Kalman Filter

I'm working on adding a simple 1-D Kalman Filter to an application to process some noisy input data and output a cleaned result.
The example code I'm using comes from the Single-Variable example section of this tutorial and this python code.
This is working well for calculating the resulting value; however, when I first read about Kalman filters, I was under the impression that they could also give some measure of how much "error" is in the inputs.
As an example, say I'm measuring a value of 10 but my input has a large amount of error. My input data may look like 6, 11, 14, 5, 19, 5, etc. (some Gaussian distribution around 10).
But say I switch to a less-noisy measurement and the measurements are 9.7, 10.3, 10.1, 10.0, 9.8, 10.1.
In both cases, the Kalman filter will theoretically converge to a proper measurement of 10. What I want is for it to also give me some numerical value estimating how much error there was in each data stream.
I believe this should be quite possible with a Kalman filter; however, I'm having trouble finding a resource describing it. How can I do this?
In fact, the situation is quite the opposite: the KF's estimate of its own error is not affected by your data at all. If you look at the predict/update steps of the KF, you'll see that the P term is never influenced by your state or your measurements. It is computed purely from your estimate of the additive process noise Q and your estimate of the measurement noise R.
If you have a dataset and want to measure it, you can compute its mean and variance (which are what your state and its covariance represent). If you are talking about your input, then you are talking about measuring the variance of your samples in order to set R.
If your input measurements are actually less noisy than expected, you will get a less noisy state, but with greater latency than if you had set expectations correctly in R.
In a running filter, you can look at your innovation sequence (the differences between the predicted and actual measurements) and compare it to your predicted innovation covariance (usually called S, although sometimes rolled directly in as the denominator of K).
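As a one-dimensional sketch of that idea (the Q, R, and data values below are made up), the normalized innovation squared nu²/S should average about 1 when R matches the real measurement noise, since it is chi-square distributed with 1 degree of freedom:

```python
import numpy as np

def nis_sequence(zs, q, r, x0=0.0, p0=1.0):
    """Run a scalar Kalman filter and return the Normalized Innovation
    Squared (NIS) sequence nu^2 / S. A running mean well above 1
    suggests R was set too small for the data."""
    x, p = x0, p0
    nis = []
    for z in zs:
        p = p + q          # predict
        s = p + r          # predicted innovation covariance S
        nu = z - x         # innovation: actual minus predicted measurement
        nis.append(nu * nu / s)
        k = p / s          # update
        x = x + k * nu
        p = (1 - k) * p
    return np.array(nis)

noisy = np.array([6, 11, 14, 5, 19, 5], dtype=float)
quiet = np.array([9.7, 10.3, 10.1, 10.0, 9.8, 10.1])
print(nis_sequence(noisy, q=0.01, r=1.0).mean())  # >> 1: R too optimistic
print(nis_sequence(quiet, q=0.01, r=1.0).mean())  # around or below 1
```

Monitoring this statistic is the standard way to judge, from inside a running filter, whether your noise assumptions match the data.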
The Kalman filter won't give you a measure of how much "error" is in the inputs. You will only get the error of the estimated outputs.
Why don't you use an online algorithm to calculate the variance of the inputs?
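For example, Welford's online algorithm gives a single-pass, numerically stable running mean and variance of the raw measurements; a minimal sketch:

```python
class OnlineVariance:
    """Welford's online algorithm: running mean and (sample) variance
    of a stream of values, updated one sample at a time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

ov = OnlineVariance()
for z in [6, 11, 14, 5, 19, 5]:
    ov.update(z)
print(ov.mean, ov.variance)  # the noisy stream yields a large variance
```

Feeding each stream through this alongside the Kalman filter gives exactly the per-stream "how noisy was my input" number the question asks for, and can also be used to set R.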

Best practice for storing GPS data of a tracking app in a MySQL database

I have a data-model question for a GPS tracking app. When someone uses our app, it saves latitude, longitude, current speed, timestamp, and burned_calories every 5 seconds. When a workout is completed, the average speed, total time/distance, and burned calories of the workout are stored in a database. So far so good.
What we also want is to store the data that is saved every 5 seconds, so we can use it later to plot graphs/charts of a workout, for example.
How should we store this amount of data in a database? A single workout can contain 720 rows if someone runs for an hour. Perhaps a serialized/gzcompressed data array in a single row? I'm aware, though, that this is bad practice.
Would a relational one-to-many model be out of the question? I know MySQL can easily handle large amounts of data, but we are talking about 720 rows * 2 workouts a week * 7,000 users = over 10 million rows a week.
(Of course, we could store data only every 10 seconds to halve the number of rows, or every 20 seconds, etc., but it would still be a large amount of data over time, and the accuracy of the graphs would decrease.)
How would you do this?
Thanks in advance for your input!
Just some ideas:
1. Quantize your lat/lon data. For technical reasons, the data will most likely be quantized already, so if you can detect that quantization, you might use it. The idea here is to turn floating-point numbers into reasonably small integers. In the worst case, you quantize to the full precision doubles provide, which means using 64-bit integers, but I very much doubt your data comes anywhere near that resolution. Perhaps a simple grid with an edge length of about one meter is enough for you?
2. Compute differences between consecutive positions. The absolute values will be fairly large, but consecutive positions are very close together (unless your members run around half the world…), so the differences will be rather small numbers. Furthermore, as long as people run at a constant speed in a constant direction, you will quite often see exactly the same difference repeated. The coarser your spatial grid in step 1, the more likely you are to get identical differences here.
3. Compute a Huffman code for these differences. You might try encoding lat and lon movements separately, or computing a single code with 2D displacement vectors at its leaves. Try both and compare the results.
4. Store the result in a BLOB, together with the dictionary to decode your Huffman code and the initial position, so you can convert the data back to absolute coordinates.
The result should be a fairly small blob of data for each workout, which you can retrieve and decompress as a whole. Retrieving individual parts from the database won't be possible, but it sounds like you won't need that.
The benefit of Huffman coding over gzip is that you won't have to artificially introduce an intermediate byte stream; directly encoding the actual differences you encounter, with their individual properties, should work much better.
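To illustrate the quantize/delta/Huffman pipeline end to end, here is a rough Python sketch. The grid size, helper names, and sample coordinates are all placeholder choices:

```python
import heapq
from collections import Counter
from itertools import count

GRID = 1e-5  # roughly one meter in latitude; an assumed grid size

def quantize(coords, grid=GRID):
    """Snap floating-point coordinates onto an integer grid."""
    return [round(c / grid) for c in coords]

def deltas(q):
    """Differences between consecutive quantized positions."""
    return [b - a for a, b in zip(q, q[1:])]

def huffman_code(symbols):
    """Map each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    tiebreak = count()  # keeps heap entries comparable on equal counts
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

lats = [52.370216, 52.370301, 52.370385, 52.370470]
q = quantize(lats)
d = deltas(q)                  # e.g. [8, 8, 9] grid steps per sample
code = huffman_code(d)
bits = "".join(code[s] for s in d)
# Store `bits` (packed into bytes), `code`, and q[0] in a BLOB column;
# decoding reverses the steps: bits -> deltas -> cumulative sum + q[0].
```

Repeated deltas (constant pace, constant heading) get the shortest codes, which is exactly where the savings over a generic byte-oriented compressor like gzip come from.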