Can OpenMDAO unit conversion handle nonstandard-power scalings? - units-of-measurement

In tokamak plasma physics, scaling laws are often used to estimate plasma performance. These often require non-standard units, in particular, plasma density n in units of 10^19 or 10^20 particles per cubic meter, often called n19 and n20, respectively. (Nobody calls these 10 or 100 exa-particles-per-cubic-meter.) At the same time, many physics formulae call for these values in the "standard" m^-3.
I can imagine an OpenMDAO ScalingLaw component taking in an input with units=m**-3 (and probably ref=1e19 for numerical ease), and then manually dividing by 1e19 to get an n19.
Is there a 'better' way to handle this scale conversion automatically?

It sounds like the appropriate approach here might just be to add new units to OpenMDAO's unit library. This would make the conversion automatic as the data is passed around.
For instance, in astrodynamics we sometimes invent "canonical" units in which the distance unit is set to some specified value (like the Earth's radius), the gravitational parameter GM is assumed to be 1, and time units fall out of this. In OpenMDAO, the Distance Units (DU) can be set using the following code somewhere in your script before you start defining inputs and outputs:
import numpy as np
import openmdao.utils.units as units
# Add canonical units to OpenMDAO. One DU is Earth's equatorial radius,
# and TU is chosen so that the gravitational parameter GM = 1 DU**3/TU**2.
MU_earth = 3.986592936294783e14  # Earth's gravitational parameter, m**3/s**2
R_earth = 6378137.0              # Earth's equatorial radius, m
TU_seconds = np.sqrt(R_earth**3 / MU_earth)
units.add_unit('TU', f'{TU_seconds}*s')
units.add_unit('DU', f'{R_earth}*m')
You can verify the functionality by adding the units and then using the OpenMDAO convert_units function to test them:
import openmdao.utils.units as units
from openmdao.api import convert_units
units.add_unit('n19', '10**19/m**3')
units.add_unit('n20', '10**20/m**3')
print(convert_units(1, 'n19', 'm**-3'))
which outputs
1e+19
Once those units are added to the system, you can specify units='n19' or units='n20' when you add inputs or outputs.
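For example, here is a minimal sketch of the ScalingLaw component from the question using the new unit (the tau output and its 0.5/0.4 coefficients are placeholders; only the unit handling matters):
import openmdao.api as om
import openmdao.utils.units as units
units.add_unit('n19', '10**19/m**3')  # register once, before model setup
class ScalingLaw(om.ExplicitComponent):
    def setup(self):
        self.add_input('n', val=1.0, units='n19')
        self.add_output('tau', val=1.0, units='s')
    def compute(self, inputs, outputs):
        # Toy scaling law; the 0.5 and 0.4 are made-up placeholders.
        outputs['tau'] = 0.5 * inputs['n'] ** 0.4
prob = om.Problem()
prob.model.add_subsystem('scaling', ScalingLaw(), promotes=['*'])
prob.setup()
prob.set_val('n', 3e19, units='m**-3')  # supply the density in m**-3
prob.run_model()
print(prob.get_val('n'))  # [3.] -- converted to n19 automatically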

Related

Is it possible that the number of basic functions is more than the number of observations in spline regression?

I want to run a regression spline with B-spline basis functions. The data are structured such that the number of observations is less than the number of basis functions, and I get a good result.
But I'm not sure whether this is a valid fit.
Do I have to have more rows than columns, as in linear regression?
Thank you.
When the number of observations, N, is small, it's easy to fit a model with basis functions and get a low squared error. If you have more basis functions than observations, you can achieve zero residuals (a perfect fit to the data). But that fit is not to be trusted, because it may not be representative of new data points. So yes, you want to have more observations than columns. Mathematically, you cannot properly estimate more than N coefficients, because the extra columns are necessarily collinear. As a rule of thumb, 15-20 observations are usually needed for each additional variable / spline.
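To see the perfect-fit problem concretely, here is a small numpy sketch; a polynomial basis stands in for a spline basis, since the rank argument is identical:
import numpy as np
rng = np.random.default_rng(0)
N = 8                                  # observations
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, N)
X = np.vander(x, 10, increasing=True)  # 10 basis columns > 8 observations
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(np.max(np.abs(X @ coef - y)))    # ~0: a perfect but untrustworthy fit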
But this isn't always the case: in genetics, for example, we have hundreds of thousands of potential variables and a small sample size. In that case, we turn to tools that help with a small sample size, such as cross-validation and the bootstrap.
Bootstrap (i.e., resample with replacement) your data points and refit the splines many times (100 refits will probably do). Then average the splines and use the average as the final spline function. Alternatively, use cross-validation: train on a subset of the data (say 70%) and test on the remaining portion.
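A rough sketch of the bootstrap-and-average idea using scipy's UnivariateSpline (the data and smoothing parameter here are made up):
import numpy as np
from scipy.interpolate import UnivariateSpline
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 40))
y = np.sin(x) + rng.normal(0, 0.2, 40)
grid = np.linspace(0, 10, 200)
fits = []
for _ in range(100):                                 # 100 bootstrap refits
    idx = np.sort(rng.integers(0, len(x), len(x)))   # resample with replacement
    xb = x[idx] + np.arange(len(x)) * 1e-9           # jitter duplicates: x must be strictly increasing
    fits.append(UnivariateSpline(xb, y[idx], s=1.0)(grid))
mean_fit = np.mean(fits, axis=0)                     # the averaged spline on the grid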
In the functional data analysis framework, there are R packages that construct and fit spline bases (cubic, B-spline, etc.). These packages include refund, fda, and fda.usc.
For example, using mgcv's smooth constructors,
B <- smooth.construct.cc.smooth.spec(object = list(term = "day.t", bs.dim = 12, fixed = FALSE, dim = 1, p.order = NA, by = NA), data = list(day.t = 200:320), knots = list())
constructs a cyclic cubic spline basis of dimension 12 (over time, day.t), but you can also use these packages to help choose a basis dimension.

Negative binomial regression SPSS - Quantity vs Distance

I have quite a simple dataset of quantities of litter found in a national park located on an island. For each data point I have corresponding GPS coordinates, and I've derived the distance of each point to the shore. My aim is to see whether the quantity of litter increases or decreases with distance to shore. I'm expecting litter to increase as distance decreases, since litter is commonly found on beaches and the like.
Quantities of litter are counts, so they are not normally distributed. I tested whether the data follow a Poisson model and they do not (p < 0.05), and the variance is larger than the mean, so the counts appear overdispersed. I therefore ran a negative binomial regression, with output as follows:
The omnibus test is highly significant (p < 0.001). I'm slightly puzzled by the parameter estimates, and generally hoping that this approach makes sense. Any input much appreciated.
Interpreting the parameter estimates requires knowing the link function. It is a log link if you specified the model as negative binomial with log link on the Type of Model tab, but it could be something else if you specified a custom model using a negative binomial distribution with another link (identity, negative binomial, or power).
If it's a log link, then for a distance of 0 (at the shore) you predict exp(2.636) for the count, or about 13.96. For a given distance from the shore, multiply the distance by -0.042, add that to 2.636, and exponentiate the result. So for every unit you move away from the shore, the log of the prediction decreases by 0.042, and the prediction is multiplied by exp(-0.042), about 0.959. One unit away you predict about 13.38 for the count, two units away about 12.83, and so on. So the results are in general accord with your hypothesis. Different calculations would be required if you used a different link function.
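In code, that arithmetic is just a few lines (the 2.636 intercept and -0.042 slope are the estimates quoted above):
import math
intercept, slope = 2.636, -0.042
def predicted_count(distance):
    # log link: log(mu) = intercept + slope * distance
    return math.exp(intercept + slope * distance)
print(predicted_count(0))  # ~13.96 at the shore
print(predicted_count(1))  # ~13.38, i.e. multiplied by exp(-0.042) ~ 0.959
print(predicted_count(2))  # ~12.83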

Topojson: quantization VS simplification

What is the difference between quantization and simplification?
Is quantization another way of doing simplification?
Is it better to use quantization in certain situations?
Or should I be using a combination of both?
The total size of your geometry is controlled by two factors: the number of points and the number of digits (the precision) of each coordinate.
Say you have a large geometry with 1,000,000 points, where each two-dimensional point is represented as longitude in ±180° and latitude in ±90°:
[-90.07231180399987,29.501753271000098],[-90.06635619599979,29.499494248000133],…
Real numbers can have arbitrary precision in JSON (in JavaScript they are limited by the precision of IEEE 754 doubles) and thus an unbounded number of digits. But in practice the above is pretty typical, so say each coordinate has 18 digits. Including the extra symbols ([, ] and ,), each point takes at most 1 + 18 + 1 + 18 + 1 = 39 bytes to encode in JSON, and the entire geometry is about 39 * 1,000,000 ≈ 39MB.
Now say we convert these real numbers to integers: both longitude and latitude are reduced to integers x and y where 0 ≤ x ≤ 99 and 0 ≤ y ≤ 99. A simple mapping between real-number points ⟨λ,φ⟩ and integer coordinates ⟨x,y⟩ is:
x = floor((λ + 180) / 360 * 100);
y = floor((φ + 90) / 180 * 100);
Since each coordinate now takes at most 2 digits to encode, each point takes at most 1 + 2 + 1 + 2 + 1 = 7 bytes to encode in JSON, and the entire geometry is about 7MB; we reduced the total size by 82%.
Of course, nothing comes for free: if you remove too much information, you will no longer be able to display the geometry accurately. The rule of thumb is that the size of your grid should be at least twice as big as the largest expected display size for the entire map. For example, if you’re displaying a world map in a 960×500 pixel space, then the default 10,000×10,000 (-q 1e4) is a reasonable choice.
So, quantization removes information by reducing the precision of each coordinate, effectively snapping each point to a regular grid. This reduces the size of the generated TopoJSON file because each coordinate is represented as an integer (such as between 0 and 9,999) with fewer digits.
In contrast, simplification removes information by removing points, applying a heuristic that tries to measure the visual salience of each point and removing the least-noticeable points. There are many different methods of simplification, but the Visvalingam method used by the TopoJSON reference implementation is described in my Line Simplification article so I won’t repeat myself here.
While quantization and simplification address these two different types of information mostly independently, there’s an additional complication: quantization is applied before the topology is constructed, whereas simplification is necessarily applied after to preserve the topology. Since quantization frequently introduces coincident points ([24,62],[24,62],[24,62]…), and coincident points are removed, quantization can also remove points.
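As a small illustration of that interaction, here is the mapping above applied in Python to three nearby points (the coordinates are made up but typical):
import math
def quantize(lon, lat, n=100):
    # Snap a point to an n-by-n grid, as in the mapping above.
    return (math.floor((lon + 180) / 360 * n),
            math.floor((lat + 90) / 180 * n))
points = [(-90.0723, 29.5017), (-90.0663, 29.4994), (-90.0601, 29.4988)]
snapped = [quantize(lon, lat) for lon, lat in points]
print(snapped)  # [(24, 66), (24, 66), (24, 66)] -- all coincident
deduped = [p for i, p in enumerate(snapped) if i == 0 or p != snapped[i - 1]]
print(deduped)  # [(24, 66)] -- quantization alone removed two points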
The reason that quantization is applied before the topology is constructed is that geometric inputs are often not topologically valid. For example, if you take a shapefile of Nevada counties and combine it with a shapefile of Nevada's state border, the coordinates in one shapefile might not exactly match the coordinates in the other. By quantizing the coordinates before constructing the topology, you snap the coordinates to a regular grid and can get a cleaner topology with fewer arcs, hopefully correctly identifying all shared arcs. (Of course, if you over-quantize, then you can cause too many coincident points and get self-intersecting arcs, which causes other problems.)
In a future release, maybe 1.5.0, TopoJSON will allow you to control the quantization before the topology is constructed independently from the quantization of the output TopoJSON file. Thus, you could use a finer grid (or no grid at all!) to compute the topology, then simplify, then use a coarser grid appropriate for a low-resolution screen display. For now, these are tied together, so I recommend using a finer grid (e.g., -q 1e6) that produces a clean topology, at the expense of a slightly larger file. Since TopoJSON also uses delta-encoded coordinates, you rarely pay the full price for all the digits anyway!
The two are related, but have different purposes and results.
I believe quantization collapses nearby points based on the parameter (which you tune to the expected resolution of the view); there's no point in having a resolution higher than the pixels that will draw the map. But it doesn't go out of its way to analyze the path to determine the optimal number of points needed to represent the shape.
Simplification is an algorithm that analyzes the polygon and reduces the number of points in an optimal manner, so that the overall deformation of the polygon is minimized. Basically, it can be used to dramatically reduce the number of points (and thus the file size) without a noticeable impact on the quality of the path.
As a parallel case study, consider a straight line made up of 10 points. Quantization will reduce the number of points (collapsing nearby or coincident points) based on the value you use. Simplification will analyze the line, realize that eight of the ten points can be removed without significantly changing the line's overall shape, and reduce the line to its two endpoints (removing points along a straight line causes no deformation of the path).
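Here is a bare-bones sketch of that behavior: a naive Visvalingam pass, without the priority queue or topology bookkeeping of the real implementation:
def visvalingam(points, min_area):
    # Repeatedly drop the interior point whose triangle with its two
    # neighbors has the smallest area, until none is below min_area.
    pts = list(points)
    def area(a, b, c):
        return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2
    while len(pts) > 2:
        areas = [area(pts[i - 1], pts[i], pts[i + 1]) for i in range(1, len(pts) - 1)]
        i = min(range(len(areas)), key=areas.__getitem__)
        if areas[i] >= min_area:
            break
        del pts[i + 1]
    return pts
line = [(i, 0.0) for i in range(10)]  # 10 collinear points
print(visvalingam(line, 1e-9))        # [(0, 0.0), (9, 0.0)] -- just the endpoints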
See also:
Topojson reference: https://github.com/mbostock/topojson/wiki/Command-Line-Reference
M. Bostock's Simplification article: http://bost.ocks.org/mike/simplify/
Both should be used in combination: quantization to reduce the map to a right-sized grid, simplification to optimize the paths.

Temperature Scale in SA

First, this is not a question about temperature iteration counts or automatically optimized scheduling. It's about how the magnitude of the data relates to the scaling inside the exponentiation.
I'm using the classic formula:
if(delta < 0 || exp(-delta/tK) > random()) { // new state }
The input to the exp function is negative because delta/tK is positive, so the exp result is always less than 1. The random function also returns a value in the 0 to 1 range.
My test data is in the range 1 to 20, and the delta values are below 20. I pick a start temperature equal to the initial computed temperature of the system and linearly ramp down to 1.
In order to get SA to work, I have to scale tK. The working version uses:
exp(-delta/(tK * .001)) > random()
So how does the magnitude of tK relate to the magnitude of delta? I found the scaling factor by trial and error, and I don't understand why it's needed. To my understanding, as long as delta > tK and the step size and number of iterations are reasonable, it should work. In my test case, if I leave out the extra scale factor, the computed energy of the system does not decrease.
The various online sources I've looked at say nothing about working with real data. Sometimes they include the Boltzmann constant as a scale factor, but since I'm not simulating a physical particle system, that doesn't help. Examples (typically with pseudocode) use values like 100 or 1000000.
So what am I missing? Is scaling another value that I must set by trial and error? It's bugging me because I don't just want to get this test case running, I want to understand the algorithm, and magic constants mean I don't know what's going on.
Classical SA has two parameters: startingTemperature and cooldownSchedule (= what you call scaling).
Configuring two or more parameters is annoying, so in OptaPlanner's implementation I automatically calculate the cooldownSchedule based on the timeGradient (a double going from 0.0 to 1.0 over the solver time). This works well. As a guideline for the startingTemperature, I use the maximum score diff of a single move. For more information, see the docs.
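To see why the magnitudes must be commensurate, here is a quick sketch (plain Python, not OptaPlanner) of the acceptance probability exp(-delta/T) for a typical uphill delta of 10 over the question's temperature ramp:
import math
delta = 10.0  # a typical uphill move, matching the question's data range
for T in [20.0, 10.0, 5.0, 2.0, 1.0]:
    print(f"T={T:5.1f}  accept uphill with p={math.exp(-delta / T):.6f}")
# T=20.0 gives p~0.61 (uphill moves accepted often); T=1.0 gives p~0.000045
# (effectively greedy). Scaling tK by 0.001 pushes every acceptance
# probability toward 0 (near-greedy descent), which would explain why the
# system's energy only started decreasing once the extra scale was added.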

Java units of measurement libraries other than JSR-275 and Units of Measure API

Are there any Java libraries dealing with units of measurement except for JSR 275 (rejected and abandoned) and Units of Measure API (which doesn't seem to have any production-quality implementations)?
I have written a units library that does not use static typing (in many of the practical applications I encountered, static typing would have made the library more cumbersome than I would like).
It is designed to handle string-based units as well as more strictly defined units.
Some of the supported features include:
conversions of values, e.g.:
Units.convert(3, "m", "mm");
Units.convert(3, SiBaseUnit.METER, "mm");
would both return 3000.
simplification of string-based units, e.g.:
Units.simplify("kg^3 m^4 s^-6 A^-1");
would return "J^2 T".
finding the names of a unit in a specific context, e.g.:
Units.inContext("lx s", UnitContextMatch.COMPATIBLE, PhysicsContext.PHOTOMETRY)
would return a navigable set containing ("luminous exposure").
supports SI units, binary units, imperial units, US customary units, atomic units, Planck units, and many more. The user can also easily define their own units.
fully supports arbitrary logarithmic units, e.g.
LevelUnit.BEL.inReferenceTo(1, Unit.of("mV")); // automatically determines ref type -> root power
LevelUnit.BEL.inReferenceTo(1, Unit.of("W"), LevelUnitReferenceType.POWER); // specify type explicitly
Unit.of("ln(re 1 nA)") == LevelUnit.NEPER.inReferenceTo(1, Unit.of("nA")); // true
supports SI prefixes and binary prefixes, and allows the user to easily implement their own prefixes
can handle unknown units if they are not relevant, e.g.:
Units.convert(3, "m^2 this_is_not_a_unit", "mm^2 this_is_not_a_unit");
would return 3e6, as the unknown unit this_is_not_a_unit is the same on both sides of the conversion.
for performance-critical parts of the code, one can obtain the conversion factor (if the conversion is purely multiplicative), e.g.:
Units.factor("kg", "t");
will return 1e-3.
allows checking for equivalence, e.g.:
Units.equivalent(1, "s", "min");
will return false, as 1 min is not the same as 1 s. On the other hand, checking for convertibility
Units.convertible("s", "min");
will return true.
tightly integrated with the coordinates library (as of Java 16 this library still requires preview features, but as of Java 17 it will be production ready)
The constants are implemented via a Constant interface that supports e.g.:
definition of one's own constants, e.g.:
// (3 ± 0.2) mole
Constant.of(3, 0.2, "mole");
chaining commands, e.g.
// constant with the distance travelled by light in vacuum in (2 ± 0) seconds as value
PhysicsConstant.SPEED_OF_LIGHT_IN_VACUUM.mul(2, 0, SiBaseUnit.SECOND);
// constant of the elementary charge per (electron) mass
PhysicsConstant.ELEMENTARY_CHARGE.div(PhysicsConstant.ELECTRON_MASS);
Constant c = Constant.of(3, 0.2, "mole");
PhysicsConstant.SHIELDING_DIFFERENCE_OF_T_AND_P_IN_HT.mul(c);
(simple) uncertainty propagation
the Constant interface provides default implementations for the Texable interface from the jatex module, so that a constant can easily return proper LaTeX code.
properly documented implementations for most of the physics constants as defined by NIST, as well as some mathematical constants.
https://github.com/unitsofmeasurement/uom-se from JSR 363
https://mvnrepository.com/artifact/org.unitsofmeasurement/unit-api/0.6.2-RC1
Hopefully your problem got solved approx. 4 yrs ago!