If I run a prediction against a model with multiple categories, are the scores split amongst all categories? - amazon-machine-learning

Let's say I have created a model with ~30 items for each of 10 categories. I've taken all of the defaults that were provided to me.
The Average F1 Score for the model is 0.875 (I have 2 categories that are very closely related, so that's hurting accuracy a bit).
If I do a real-time prediction for a piece of text that should match positively for category 3 and 8, I get this result:
{
"Prediction": {
"details": {
"Algorithm": "SGD",
"PredictiveModelType": "MULTICLASS"
},
"predictedLabel": "8",
"predictedScores": {
"1": 0.002642059000208974,
"2": 0.010648942552506924,
"3": 0.41401588916778564,
"4": 0.02918998710811138,
"5": 0.008376320824027061,
"6": 0.009010250680148602,
"7": 0.006029266398400068,
"8": 0.4628857374191284,
"9": 0.04102163389325142,
"10": 0.01617990992963314
}
}
}
What I'm wondering is whether 3 & 8 both had effectively an ~80% certainty, but because they both matched the certainty was split between the two. If you sum all the predictedScores, you get .999999997, which has me questioning whether there's a total 1.0 score that gets split amongst each of the available categories...
If I instead set up 10 different models, and did binary matches against each of them independently, would I see that 3 & 8 would score higher (e.g. something closer to 0.8)?
I guess a related question, that I don't really need answered but might help clarify the overall question, is ... If I had a theoretical piece of text that definitely fit all 10 categories, could Amazon Machine Learning respond with a predictedScore value of 1.0 for each category? Or, because the maximum predictedScore is 1.0, would it return 0.1 for each category?

Amazon ML returns probabilities for each category known from the input set. Because they are true modeled probabilities, they must sum up to 1. In other words, you are correct when you say "there's a total 1.0 score that gets split amongst each of the available categories..."
Here is a reference page that answers this and some of your other questions:
http://docs.aws.amazon.com/machine-learning/latest/dg/reading-the-batchprediction-output-files.html#interpreting-the-contents-of-batch-prediction-files-for-a-multiclass-classification-ml-model
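To illustrate why the scores are "split", here is a minimal sketch, assuming the multiclass model normalizes its per-class raw scores with a softmax (as a multinomial logistic regression does) and that an independent binary model applies a sigmoid to its own score. The raw scores below are made up for illustration and are not taken from Amazon ML.

```python
import math

def softmax(raw_scores):
    """Turn per-class raw scores into probabilities that sum to 1."""
    top = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(raw_score):
    """Probability from one independent binary model."""
    return 1.0 / (1.0 + math.exp(-raw_score))

# Hypothetical raw scores: categories 3 and 8 both match strongly.
raw = [-3.0, -2.0, 2.0, -1.0, -2.5, -2.5, -3.0, 2.1, -0.5, -1.5]

multiclass = softmax(raw)              # one model, ten classes
independent = [sigmoid(s) for s in raw]  # ten separate yes/no models

# A text that fits every category equally well gets 0.1 per class.
uniform = softmax([5.0] * 10)
```

In this sketch the multiclass scores for categories 3 and 8 each land below 0.5 because they compete for the same total of 1.0, while the independent sigmoid scores for the same two categories are both above 0.8. Separately trained binary models would learn different raw scores, so treat the numbers as illustrative only.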

Related

Escaping json parsed from html using node-html-parser

I'm trying to parse the application/ld+json block of a page that I parsed with node-html-parser. I had it all working until I hit an unescaped JSON issue, where a literal newline (\n) inside string values breaks parsing.
The small bit of JSON causing the issue (rest of JSON has been removed):
{
"name": "3. Given the balanced equation: 2H2(g) +
O2(g) --> 2H2O(l)
How many grams of H2O are formed if 9.00 mol H2(g) reacts
completely with an excess of O2(g)?
The molar mass of H2O is 18.0g/mol.
"
}
I tried using this escape function solution (simply put, str.replace(/[\n]/g, '\\n')), but it broke the JSON.
How can I parse this string when some of its values contain stray newlines, and how can I fix it?
Full Context (just for reference):
Source: https://www.numerade.com/ask/question/3-given-the-balanced-equation-2h2g-o2g-2h2ol-how-many-grams-of-h2o-are-formed-if-900-mol-h2g-reacts-completely-with-an-excess-of-o2g-the-molar-mass-of-h2o-is-180gmol-57997/
<script type="application/ld+json">
{
"#context": "https://schema.org",
"#type": "QAPage",
"mainEntity": {
"#type": "Question",
"name": "3. Given the balanced equation: 2H2(g) +
O2(g) --> 2H2O(l)
How many grams of H2O are formed if 9.00 mol H2(g) reacts
completely with an excess of O2(g)?
The molar mass of H2O is 18.0g/mol.
",
"text": "3. Given the balanced equation: 2H2(g) +
O2(g) --> 2H2O(l)
How many grams of H2O are formed if 9.00 mol H2(g) reacts
completely with an excess of O2(g)?
The molar mass of H2O is 18.0g/mol.
",
"answerCount": 4,
"dateCreated": "Oct. 9, 2021, 6:08 p.m.",
"author": {
"#type": "Person",
"name": "Matthew J."
},
"acceptedAnswer": {
"#type": "Answer",
"upvoteCount": 3,
"text": "In this problem, we have to find a mass of H2 that are formed if nine more of age to relax with an excess of so from balanced equation, we can see that two moles of is to Produced two moles of water, Then nine moles of H two must produce nine moles of water. So now we have most of water. We can easily find the mess MS. Of what, which is equally good number of multi multiply by the molar mass, so it is 1 62 g. So we can say that 1 62 g of H 20 are formed with nine more of age to react with Electricly with an excessive or two.",
"dateCreated": "Oct. 13, 2021, 5:12 p.m.",
"url": "https://www.numerade.com/ask/question/3-given-the-balanced-equation-2h2g-o2g-2h2ol-how-many-grams-of-h2o-are-formed-if-900-mol-h2g-reacts-completely-with-an-excess-of-o2g-the-molar-mass-of-h2o-is-180gmol-57997/",
"author": {
"#type": "Person",
"name": "Taimoor Shabbir"
}
},
"suggestedAnswer": [
{
"#type": "Answer",
"text": "So we're keeping the reaction between hydrogen all section when both off the gas, they ah ah, in the container. So a spark, An initial initiated reaction to your form. Water. So here we have soup on serial fees there from five graham off hydrogen and also syrup on Syria. When a five month off oxygen, what would be the mass of water being produce? So first of all, we have to Bannister can call the Ashram. Um, as you can see, that we have to Ah, hydrogen on the on the left. So we have it for two under, right? And then we have four hydrogen. So on the love we were just put put to you from the hydrogen and then the whole reaction Spartans. Okay, so the next step before we do anything is convert anything. That is not your number. Motion on verbal. So we're still points several feet. That's 75 year bites You where? Sierra 750.1 it if there were 18 eight. Uh, both. Okay, so, um, we have the lumber. I'm also had to found the limiting We agent we had found an emitting region, and then we can just a limbo move in different dimension. We agent to find out the mass off our water. Okay, so then they assume we Ah, I would pick all hydrogen. Assume I have. Ah, them. You have that much hydrogen, You know that? The footy we act so we will. How many oxygen do I need? The militia is 2 to 1 for Ah, hydrogen oxygen. So I just take ah, hydrogen. Um, the more motive I bite you, so I will have Ah, sirrah, point. Seriously. 09 for most of oxygen required to fully re at with. Ah, Hodgins. Okay, So this is the require amounts over here, over here. And there's the actual mouth, so you can see that. Actually, we have a lot off oxygen. So the excess we aging essentially, it's all sitting over here. All right, so let's stab is we're going to Ah, use hydrogen number mostly found that ward number almost again because we have enough also, June. 
But, uh, I know that the food that we have with all the hydrogen Okay, so the limbo move off water, it would be ah, we can find the from the motorway show and also find from the number of most of hydrogen. So the mother wisher is 2 to 2. Essential is 1 to 1. So it's essentially the same as the number most over here for water. So sue syrup on Syria 18 Ah, for out to the moment of water. Okay, so when we know the number more water we have vowed a mass, we just the more plight, Um, the mass off the mill a massive waters with 18 by the limbo Moe's so so far 018 times 18 And that which you have several 180.338 gram for water. Okay, so, uh, we've already filed the mass of water. We know that excess we agent sausage in. So how about the oxygen? We meaning? All right. So we have that much of all sojourn. 0.185 and then we know that you know that hopefully we have imagined they will consume syrup on sale soon. 94 So we just took our agent Noh Ah, concentration off our region. Know them. Don't move for oxygen. Subtract Ah ah ah! Mom required if we were rehab before their hydrogen and that we should be able to find out is because the syrup on cereal night one most we meaning for oxygen remaining this we meaning. And then we can further, um, convert that back to our master. You corresponding to Syria 0.219 Grandma Oxygen. We may you know, we asked your picture.",
"dateCreated": "Aug. 11, 2021, 12:50 a.m.",
"upvoteCount": 3,
"url": "https://www.numerade.com/questions/a-mixture-of-00375-g-of-hydrogen-and-00185-mol-oxygen-in-a-closed-container-is-sparked-to-initiate-a/",
"author": {
"#type": "Person",
"name": "Stephen Ho"
}
},
{
"#type": "Answer",
"text": "The reaction equation in this question is based on the same reaction equation that we had in the previous question. Now, during this reaction, five moles of hydrogen gas reacts with 0.15 moles of oxygen gas in order to produce a certain amount of water. We need to identify the limiting reactant here and also calculate the number of moles of water that can form during this reaction. For this purpose. We will look at two different situations in order to identify the limiting reactant first and that is, um firstly, we will look at the number of moles of water that can be produced If we start off with five moles of hydrogen gas, and secondly, we will look at the number of moles of water that can be produced when starting off with a 0.15 zero moles of oxygen gas. We will then compare these two situations in order to identify the limiting reactant. So, firstly, In order to determine the number of moles of water that can form when starting off with five miles of hydrogen gas, we need to work with the more ratio of water to hydrogen gas. So for this purpose, we will have a look at the stock geometric coefficients here. For water, it is 24 hydrogen gas, it is too, so that more ratio of water um over hydrogen guests is to over two. We can therefore say That the number of moles of water that can form in this case will be one times the number of moles of hydrogen gas, And this is equal to one times five moles, Which is just equal to five moles. Right now, let's look at the second situation where we start off with 0.15 moles of oxygen. Once again, we need to make use of the mole ratio. So in this case it will be to over one. So the number of moles of water with a number of moles of oxygen will be to over one. This means that the number of moles of water that can form in this case is two times the number of moles off oxygen gas. So this is two times uh 1.50 moles, and this is equal to three moles. Right? 
So in the first situation, when we started off with five miles off hydrogen gas, We were able to form five moles of water. But in the case of oxygen, we start off with oxygen Um and specifically 1.50 moles of oxygen. Then we can only end up with 3.00 moles of water, which is the least amount produced in the two situations. So because we can only produce a maximum A number of moles of three moles of water, this indicates that oxygen is the limiting reactant here. Oxygen is the limiting reactant. And if we start off with 0.150 miles of oxygen, Then we can produce three moles of water. Right? So to recap in this reaction, we had to identify the limiting reactant first. For this purpose, we compare the number of moles of water that um could form, starting off with the different number of moles off either. Um First of all, we looked at hydrogen gas and then on the other hand, the number of moles of oxygen gas. So in this way we realized that the limiting reactant is oxygen gas because it can only form three moles of water compared to the five mills that can be formed when we start off with the hydrogen gas, is the reactant. Now, if we start off with oxygen 0.15 moles of oxygen gas, then um three miles of water was formed in the end",
"dateCreated": "Aug. 11, 2021, 12:50 a.m.",
"upvoteCount": 3,
"url": "https://www.numerade.com/questions/if-500-mathrmmol-of-hydrogen-gas-and-150-mathrmmol-of-oxygen-gas-react-what-is-the-limiting-reactant/",
"author": {
"#type": "Person",
"name": "Marietjie Lutz"
}
},
{
"#type": "Answer",
"text": "for this problem, we're gonna be working on understanding limiting reactions and using them to solve for products, were given that we have this chemical equation four, NH three plus 502 yields four N O plus six H 20 Were given that we have 2.35 moles of NH three and 2.75 moles of 02 To work with, we need to understand which of these reactions is the limiting reactant and then use that to figure out how much water we're will be produced. It's important to figure out which of these is the limiting reactant, because this reaction will only go so far as that reaction allows. So, to figure out which one is the limiting reactant, we can choose to use either the NH three or the 02 It doesn't matter. I'm gonna go with the NH three. So I'm going to lay out what I have, I have 2.35 moles of NH three. Next thing I'm gonna do is I'm going to look at our ratio by looking at the coefficients in our balanced equation and see that for every four moles of NH three, we're going to also use five moles of 02 So the way to work this out is I'm going to multiply 2.35 by five and then whatever I get from that, I will then divide by four. And when I do that I get 2.94 malls of 02 because our moles of NH three will cancel out. So then I'm going to go look at how much 02 were given. 2.75 Well, that is that is less than 2.94 So, what this means is that we do not have enough 02 to fully react with the NH three that were given. So that means that are limiting reactant is 02 The next thing we're gonna do is we're going to use that limiting reactant to solve for another ratio like this to find out how much water is going to be produced. So we have 2.75 moles of 02 to work with. We're going to set up our ratio again for every five moles of 02 we can create six moles of H 20 I'm going to multiply 2.75 by six and then divide that answer by five to get 3.30 moles of H 20 and that is how much water we can produce.",
"dateCreated": "Aug. 11, 2021, 12:50 a.m.",
"upvoteCount": 3,
"url": "https://www.numerade.com/questions/in-the-following-reaction-235-mathrmmol-of-mathrmnh_3-reacts-with-275-mathrmmol-of-mathrmo_2-how-man/",
"author": {
"#type": "Person",
"name": "Shaelyn Deal"
}
},
{
"#type": "Answer",
"text": "in this question, Methane gas reacts with oxygen in order to form carbon dioxide and water. So this is a combustion reaction and we start off with one mole of methane gas and five moles of oxygen gas. Now we need to determine the limiting reactant here so that we can determine the number of moles of water that perform in the end. In order to determine that limiting reactant, we will compare the number of moles of water that can be formed Firstly, if we start off with one mole of the fungus and secondly, then if we start off with five miles off oxygen gas. So when we compare these two situations, we will be able to identify the limiting reactive. Now, in order to determine the number of malls Can be formed from one mole of methane gas, we make use of the mole ratio. So we know that geometric coefficient of water six and has a documentary coefficient of two. So the mole ratio of water to six, the kids. Therefore the number of moles of water can be formed in this case 6/2", 3 times the number of Malzahn. Yes. Right. And we know we saw it off with um five starting with one mole of methane gas. And therefore this number of northern waters will be $3.1 which is equal to three months. Right. So let's have a look at the second situation where we choose to start off with the oxygen as our reacted. No, once again, you wanted to To calculate the number of moles of water that can be produced. We need to make use of the mole ratio more racial. In this case of water to oxygen is 6-7. So the number of moles of water Over the number of number of moles of oxygen will be 6, 7. So the number of moles of water that can be 46 or seven times oxygen. So it's 6/7 times. We started off with five levels of oxygen, six of the seven times 5 And that is equal to round off to two decimal places. 4.29. Uh huh. 
So now I need to compare the number of moles that can be formed in these two different situations to firstly, when I started off with one more of methane gas, The reaction um we're able to produce three moles of water. But if I started off with five months of oxygen gas, We actually were able to produce, 4.29 mi of water. So therefore the maximum number of moles that can be produced in this case. Yes, three. And that is from using um it's in gas, which is the limiting reactant. So we started off with a balanced equation and we had to identify the limiting reactant in this case by and we did that by comparing the number of moles of water that could form by starting off in the first place with one more of anything goes. In the second place, five moles of oxygen within. Saw that um The methane gas could not produce more than three miles. Um whereas the oxygen case of the oxygen, Um the reaction produced 4.29 moles. So because we could only produce three moles by using one move methane gas, this is the maximum number of moles that could be produced in this reaction. And therefore we also know that um the limiting reacting to here is a same gas.",
"dateCreated": "Aug. 11, 2021, 12:50 a.m.",
"upvoteCount": 3,
"url": "https://www.numerade.com/questions/if-100-mathrmmol-of-ethane-gas-and-500-mathrmmol-of-oxygen-gas-react-what-is-the-limiting-reactant-a/",
"author": {
"#type": "Person",
"name": "Marietjie Lutz"
}
}
]
}
}
</script>
Basically, I was trying to read broken JSON. As Felix mentioned, JSON strings cannot contain literal line breaks.
Solution: use the https://www.npmjs.com/package/jsonrepair module. It detected the bad lines and fixed them. This is likely what Google does (some sort of JSON repair).
PS: I tried https://www.npmjs.com/package/json-fixer without success
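The blanket replace breaks things because it also rewrites the newlines between tokens, where a literal \n escape is not valid JSON. A minimal sketch of the alternative, written in Python rather than Node for illustration: only escape control characters that occur inside string literals. The function name is hypothetical, and this is not how jsonrepair is implemented.

```python
import json

def escape_control_chars_in_strings(text: str) -> str:
    """Escape raw newlines, carriage returns and tabs, but only inside JSON strings."""
    replacements = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    in_string = False  # are we between an opening and closing quote?
    escaped = False    # was the previous character a backslash?
    for ch in text:
        if in_string:
            if escaped:
                out.append(ch)
                escaped = False
            elif ch == "\\":
                out.append(ch)
                escaped = True
            elif ch == '"':
                out.append(ch)
                in_string = False
            else:
                out.append(replacements.get(ch, ch))
        else:
            if ch == '"':
                in_string = True
            out.append(ch)  # structural whitespace is left untouched
    return "".join(out)

broken = '{\n"name": "3. Given the balanced equation: 2H2(g) +\nO2(g) --> 2H2O(l)\n"\n}'
data = json.loads(escape_control_chars_in_strings(broken))

# Python's own parser can also be told to tolerate control characters in strings.
lenient = json.loads(broken, strict=False)
```

The same state machine ports directly to JavaScript. It only handles control characters inside strings, so other kinds of damage (trailing commas, unquoted keys) still need a repair library.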

Relational database trouble in a custom e-commerce web application

First of all, I apologize for the bad English you are about to read...
I'm trying to develop a small e-commerce web application (from scratch, without using platforms like Magento, OpenCart, Shopify...) for a pizza delivery restaurant in the city where I live. The restaurant also sells some Italian food, like pasta, fish and meat.
I'm stuck on a relational database problem. I'll explain what I did in the database: below are the table structures, each followed by some example records.
Unlike the pastas, the pizza's price varies according to size (an attribute).
The data will be displayed in the following way (please see the picture below):
Showing a pizza example record in the front-end. When the user selects the size, the price is displayed below it, along with two controls to add to or subtract from the quantity of that product (with that size).
This is the case of a pizza with one attribute that affects the price. Some attributes do not affect the price, e.g. the cooking or doneness. Another case is a product that has more than one attribute that affects the price.
In summary:
A product without attributes, which has only one price.
A product with only one attribute that affects the price.
A product with two or more attributes that affect the price.
MySQL Tables:
Categories(ID, name):
1, Pizzas
2, Pastas
_
Products(ID, category_id, name, description)
1, 1, Margherita, Lorem ipsum
2, 1, 4 Stagioni, Lorem ipsum
3, 1, Capricciosa, Lorem ipsum
4, 2, Bologna, Lorem ipsum
5, 2, Pesto, Lorem ipsum
_
Attributes (ID, name)
1, Size
2, Cooking
_
Meta_attributes(ID, attribute_id, name)
1, 1, Small
2, 1, Medium
3, 1, Big
4, 2, Blue
5, 2, Medium well
6, 2, Well done
7, 2, Overcooked
_
meta_attributes_values(ID, product_id, meta_attrib_id, value)
1, 1, 1, 12
2, 1, 2, 16
3, 1, 3, 19
4, 2, 1, 14
5, 2, 2, 18
6, 2, 3, 20
_
In this schema, a product can have a value if and only if it has a meta-attribute, and in order to have a meta-attribute it must have an attribute. But pasta is "linear": it has no attributes, so each pasta product has only one price.
Questions:
How should the database be designed to handle all these cases?
What about special cases where one attribute influences another? For example, the price of a pizza varies according to its size (this is true), but suppose that it also has an attribute called "extra", and that the price of the extra varies according to the size of the pizza, because a larger pizza requires more of it. I know the example is not very clear, but I hope I have explained the case well enough.
Thanks for reading!
It's important to note that the schema you've described doesn't represent an actual order, it represents the abstract concept of a pizza. A graph of Products, Attributes, Meta-Attributes, and Values is far more complicated than an itemized order needs to be.
What's really in an order? There are Products, each of which has a base price; and there are, as you note, things which affect the price of a Product. These modifiers come in at least two types:
Additive modifiers tack on a flat sum to the base price. A "medium" pizza costs $4 above the base; a "large" costs $7 more.
Dependent modifiers change the base price after it's adjusted by additive modifiers. The simplest form of dependent modifier is a multiplier: whatever the adjusted price of a pizza is, one with "extra hot peppers" costs 0.10x more than that.
With any luck, that's all you have to deal with. If "extra peppers" instead costs $0.50 on a small, $0.60 on a medium, and $1.00 on a large, you have to track all three and correlate with the size modifier since the addition isn't a consistent function of adjusted price. Treating additive modifiers like size independently -- for example, by having base, medium, and large prices in Product -- may be more effective in that case.
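A minimal sketch of the harder case described above, where the price of an extra depends on the size and therefore has to be looked up per (extra, size) pair. The table names and the small-size price of $0 are assumptions for illustration; the other prices are the ones quoted in this answer.

```python
from decimal import Decimal

# Hypothetical price tables; in practice these would be database rows.
BASE_PRICE = {"Margherita": Decimal("12.00")}
SIZE_ADDS = {
    "Small": Decimal("0.00"),   # assumed: small is the base size
    "Medium": Decimal("4.00"),
    "Large": Decimal("7.00"),
}
# One row per (extra, size) pair, because the extra's price is not a
# consistent function of the adjusted pizza price.
EXTRA_PRICE = {
    ("Extra peppers", "Small"): Decimal("0.50"),
    ("Extra peppers", "Medium"): Decimal("0.60"),
    ("Extra peppers", "Large"): Decimal("1.00"),
}

def item_price(product, size, extras=()):
    """Base price plus the size modifier plus each size-dependent extra."""
    total = BASE_PRICE[product] + SIZE_ADDS[size]
    for extra in extras:
        total += EXTRA_PRICE[(extra, size)]
    return total

medium = item_price("Margherita", "Medium", ["Extra peppers"])
large = item_price("Margherita", "Large", ["Extra peppers"])
```

The composite key (extra, size) is what a table with foreign keys to both the extra and the size would model.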
It would be possible to achieve a simpler representation still by treating products and attributes identically and storing them in a single table with a foreign key to itself to represent the parent-child relationship. Effectively, you'd have no Products, only Attributes. And "Margherita" would be an Attribute that adds $12 to an item base price of $0.
But getting back to the concrete, if you need to track Orders with Order_Items too, even a one-row-per-attribute solution is unwieldy since you have a profusion of foreign keys in each line item of the order. In this case, it may be best to store your sub-items (or everything, if you roll it all into one table) in a JSON field, such that your Order_Items table looks like this:
id  order_id  subtotal  attributes
1   1         17.60     [{"name": "Margherita", "adds": 12.00}, {"name": "Medium", "adds": 4}, {"name": "Extra hot peppers", "multiplier": 0.10}]
2   1         12.00     [{"name": "Pesto", "adds": 12.00}]
This is a) denormalized and b) breaks referential integrity. Both of these, in this instance, are good things! If you ever adjust prices or even take something off the menu, you don't want to screw up your bookkeeping or trip a foreign key constraint error.
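As a rough check of how such a row could be priced, here is a sketch that recomputes the subtotal from the stored attributes. It assumes "adds" entries are summed first and "multiplier" entries are then applied to that sum (and that several multipliers would simply add up); the function name is hypothetical.

```python
import json
from decimal import Decimal

def subtotal(attributes_json: str) -> Decimal:
    """Recompute an order item's subtotal from its JSON attribute snapshot."""
    # Parse every number as Decimal to avoid binary floating point.
    attrs = json.loads(attributes_json, parse_float=Decimal, parse_int=Decimal)
    adjusted = sum((a["adds"] for a in attrs if "adds" in a), Decimal("0"))
    extra_rate = sum((a["multiplier"] for a in attrs if "multiplier" in a), Decimal("0"))
    return (adjusted * (Decimal("1") + extra_rate)).quantize(Decimal("0.01"))

row1 = ('[{"name": "Margherita", "adds": 12.00}, {"name": "Medium", "adds": 4}, '
        '{"name": "Extra hot peppers", "multiplier": 0.10}]')
row2 = '[{"name": "Pesto", "adds": 12.00}]'

first = subtotal(row1)   # (12.00 + 4) * 1.10
second = subtotal(row2)
```

Because the snapshot carries the prices that applied at order time, the subtotal can be reproduced later even if the menu has changed.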

What is the standard for formatting currency values in JSON?

Bearing in mind various quirks of the data types, and localization, what is the best way for a web service to communicate monetary values to and from applications? Is there a standard somewhere?
My first thought was to simply use the number type. For example
"amount": 1234.56
I have seen many arguments about issues with a lack of precision and rounding errors when using floating point data types for monetary calculations--however, we are just transmitting the value, not calculating, so that shouldn't matter.
EventBrite's JSON currency specifications specify something like this:
{
"currency": "USD",
"value": 432,
"display": "$4.32"
}
Bravo for avoiding floating point values, but now we run into another issue: what's the largest number we can hold?
One comment (I don’t know if it’s true, but seems reasonable) claims that, since number implementations vary in JSON, the best you can expect is a 32-bit signed integer. The largest value a 32-bit signed integer can hold is 2147483647. If we represent values in the minor unit, that’s $21,474,836.47. $21 million seems like a huge number, but it’s not inconceivable that some application may need to work with a value larger than that. The problem gets worse with currencies where 1,000 of the minor unit make a major unit, or where the currency is worth less than the US dollar. For example, a Tunisian Dinar is divided into 1,000 milim. 2147483647 milim, or 2147483.647 TND is $1,124,492.04. It's even more likely values over $1 million may be worked with in some cases. Another example: the subunits of the Vietnamese dong have been rendered useless by inflation, so let’s just use major units. 2147483647 VND is $98,526.55. I’m sure many use cases (bank balances, real estate values, etc.) are substantially higher than that. (EventBrite probably doesn’t have to worry about ticket prices being that high, though!)
If we avoid that problem by communicating the value as a string, how should the string be formatted? Different countries/locales have drastically different formats—different currency symbols, whether the symbol occurs before or after the amount, whether or not there is a space between the symbol and amount, if a comma or period is used to separate the decimal, if commas are used as a thousands separator, parentheses or a minus sign to indicate negative values, and possibly more that I’m not aware of.
Should the app know what locale/currency it's working with, communicate values like
"amount": "1234.56"
back and forth, and trust the app to correctly format the amount? (Also: should the decimal value be avoided, and the value specified in terms of the smallest monetary unit? Or should the major and minor unit be listed in different properties?)
Or should the server provide the raw value and the formatted value?
"amount": "1234.56"
"displayAmount": "$1,234.56"
Or should the server provide the raw value and the currency code, and let the app format it?
"amount": "1234.56"
"currencyCode": "USD"
I assume whichever method is used should be used in both directions, transmitting to and from the server.
I have been unable to find the standard--do you have an answer, or can point me to a resource that defines this? It seems like a common issue.
I don't know if it's the best solution, but what I'm trying now is to just pass values as strings unformatted except for a decimal point, like so:
"amount": "1234.56"
The app could easily parse that (and convert it to double, BigDecimal, int, or whatever method the app developer feels best for floating-point arithmetic). The app would be responsible for formatting the value for display according to locale and currency.
This format could accommodate other currency values, whether highly inflated large numbers, numbers with three digits after the decimal point, numbers with no fractional values at all, etc.
Of course, this would assume the app already knows the locale and currency used (from another call, an app setting, or local device values). If those need to be specified per call, another option would be:
"amount": "1234.56",
"currency": "USD",
"locale": "en_US"
I'm tempted to roll these into one JSON object, but a JSON feed may have multiple amounts for different purposes, and then would only need to specify currency settings once. Of course, if it could vary for each amount listed, then it would be best to encapsulate them together, like so:
{
"amount": "1234.56",
"currency": "USD",
"locale": "en_US"
}
Another debatable approach is for the server to provide the raw amount and the formatted amount. (If so, I would suggest encapsulating it as an object, instead of having multiple properties in a feed that all define the same concept):
{
"displayAmount":"$1,234.56",
"calculationAmount":"1234.56"
}
Here, more of the work is offloaded to the server. It also ensures consistency across different platforms and apps in how the numbers are displayed, while still providing an easily parseable value for conditional testing and the like.
However, it does leave a problem--what if the app needs to perform calculations and then show the results to the user? It will still need to format the number for display. Might as well go with the first example at the top of this answer and give the app control over the formatting.
Those are my thoughts, at least. I've been unable to find any solid best practices or research in this area, so I welcome better solutions or potential pitfalls I haven't pointed out.
AFAIK, there is no "currency" standard in JSON; it is a standard based on rudimentary types. One thing to consider is that some currencies do not have a decimal part (Guinean Franc, Indonesian Rupiah) and some can be divided into thousandths (Bahraini Dinar), hence you don't want to assume two decimal places. For the Iranian Rial, $2 million is not going to get you far, so I would expect you need to deal with doubles, not integers. If you are looking for a general international model then you will need a currency code, as countries with hyperinflation often change currencies every year or two to divide the value by 1,000,000 (or 100 million). Historically Brazil and Iran have both done this, I think.
If you need a reference for currency codes (and a bit of other good information) then take a look here: https://gist.github.com/Fluidbyte/2973986
An amount of money should be represented as a string.
The idea of using a string is that any client that consumes the JSON should parse it into a decimal type such as BigDecimal to avoid floating-point imprecision.
However, this is only meaningful if every part of the system avoids floating point too. Even if the backend is only passing data and not doing any calculation, using floating point would eventually mean that what you see (in the program) is not what you get (in the JSON).
And assuming that the source is a database, it is important to have the data stored with the right type. If the data is already stored as floating point, then any subsequent conversion or casting is meaningless, as it would technically just be passing imprecision around.
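A minimal sketch of that round trip, using Python's Decimal in place of BigDecimal. The payload and field names are assumptions for illustration.

```python
import json
from decimal import Decimal

payload = '{"amount": "1234.56", "currency": "USD"}'

data = json.loads(payload)
amount = Decimal(data["amount"])  # string -> exact decimal, no float in between

total = amount * 3                # exact decimal arithmetic

# Serialize back as a string so the next consumer can do the same.
response = json.dumps({"amount": str(total), "currency": data["currency"]})

# For contrast: binary floating point cannot represent 0.1 or 0.2 exactly.
float_is_exact = (0.1 + 0.2 == 0.3)
decimal_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

If any step converts the value to a float (for example a database column of a floating-point type), the exactness is already lost before the string is produced.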
The ON Dev Portal API Guidelines (Currencies section) have some interesting suggestions:
"price" : {
"amount": 40,
"currency": "EUR"
}
It's a bit harder to produce & format than just a string, but I feel this is the cleanest and most meaningful way to achieve it:
uncouple amount and currency
use the JSON number type
Here is the suggested JSON format:
https://pattern.yaas.io/v2/schema-monetary-amount.json
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"title": "Monetary Amount",
"description":"Schema defining monetary amount in given currency.",
"properties": {
"amount": {
"type": "number",
"description": "The amount in the specified currency"
},
"currency": {
"type": "string",
"pattern": "^[a-zA-Z]{3}$",
"description": "ISO 4217 currency code, e.g.: USD, EUR, CHF"
}
},
"required": [
"amount",
"currency"
]
}
Another question related to currency formats pointed out, rightly or wrongly, that common practice is closer to a string in base units:
{
"price": "40.0"
}
There probably isn't any official standard. We are using the following structure for our products:
"amount": {
"currency": "EUR",
"scale": 2,
"value": 875
}
The example above represents amount €8.75.
Currency is defined as a string (and values should correspond to ISO 4217); scale and value are integers. "scale" is the number of digits after the decimal point. This structure solves many of the problems with currencies not having fractions, having non-standard fractions etc.
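A small sketch of how a client might convert this structure to and from an exact decimal, assuming "scale" is the number of fractional digits. The helper names are hypothetical.

```python
from decimal import Decimal

def to_decimal(amount: dict) -> Decimal:
    """{"scale": 2, "value": 875} -> Decimal("8.75")"""
    return Decimal(amount["value"]).scaleb(-amount["scale"])

def from_decimal(value: Decimal, currency: str, scale: int) -> dict:
    """Inverse of to_decimal; refuses values that do not fit the scale."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return {"currency": currency, "scale": scale, "value": int(scaled)}

euros = to_decimal({"currency": "EUR", "scale": 2, "value": 875})
dinars = to_decimal({"currency": "BHD", "scale": 3, "value": 1500})  # thousandths
francs = to_decimal({"currency": "GNF", "scale": 0, "value": 5000})  # no fraction
```

Since both fields are integers, no floating-point value ever appears on the wire or in the conversion.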

Searching for "nearby" results

I have a dataset, currently just stored in a JSON file, which contains about 40k different geolocations. It looks something like this:
[
{"title": "Place 1", "loc": {"x": "00.000", "y": "00.00000"}},
{"title": "Place 2", "loc": {"x": "00.000", "y": "00.00000"}},
]
where a place's loc is just its coordinates.
I'd like to be able to run queries on this data so, for any given user-inputted loc I can get the n nearest Places.
Or in other words I'd like to write some function f so that this works:
def f(loc, n): ...
f({"x": "5", "y": "5"}, 3) #=> [{"title": "Place 1", "distance": 7.071}, {"title": "Place 2", "distance": 7.071}, {"title": "Place 3", "distance": 7.071}]
if Places 1, 2 and 3 are all at {x: 0, y: 0}.
I have no idea what the standard way of solving an issue like this is. Using an SQL DB with an index on precomputed distances doesn't work, because the supplied loc is arbitrary. Running through the entire database and calculating distances for everything is far too inefficient, and far too slow. (I need < 30ms response times.)
The only solution that makes sense would be to somehow make "buckets" of close locations (within some r of each other), and then to compute the distance between the user-given loc and the bucket's loc to narrow down the options first. But I feel like creating such a solution myself would be similar to not using databases at all; there must be a more efficient/industry standard approach. Is there one?
This is a generalized form of the nearest neighbor problem (more formally known as k-nearest neighbor). You're right, the solution that makes sense uses buckets. You could store the buckets in the database, which allows you to leverage SQL: just filter out all points not in the appropriate buckets. Depending on your database, this may already be implemented for you, which would be the "industry standard" approach you suggested.
Otherwise, writing it yourself is pretty efficient and can be done without deviating too much from the database.
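As a rough illustration of the bucket idea, here is a minimal in-memory sketch in Python. It assumes planar x/y coordinates with straight-line distance (as in the question's example), and the GridIndex class and the cell_size parameter are made up for this sketch. It searches the query's cell first, then expands ring by ring until the n-th best distance can no longer be beaten by a cell that has not been visited.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical helper: buckets points into square cells of side cell_size."""

    def __init__(self, places, cell_size=1.0):
        self.cell_size = cell_size
        self.count = 0
        self.cells = defaultdict(list)
        for place in places:
            x, y = float(place["loc"]["x"]), float(place["loc"]["y"])
            self.cells[self._cell(x, y)].append((x, y, place["title"]))
            self.count += 1

    def _cell(self, x, y):
        # Integer coordinates of the bucket containing (x, y)
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def nearest(self, loc, n):
        n = min(n, self.count)
        if n <= 0:
            return []
        qx, qy = float(loc["x"]), float(loc["y"])
        cx, cy = self._cell(qx, qy)
        found = []
        ring = 0
        while True:
            # Visit only the cells on the border of the current square ring
            for ix in range(cx - ring, cx + ring + 1):
                for iy in range(cy - ring, cy + ring + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != ring:
                        continue
                    for x, y, title in self.cells.get((ix, iy), ()):
                        found.append((math.hypot(x - qx, y - qy), title))
            found.sort()
            # Every unvisited point is farther away than ring * cell_size,
            # so once the n-th best is within that radius we can stop.
            if len(found) >= n and found[n - 1][0] <= ring * self.cell_size:
                return [{"title": t, "distance": round(d, 3)} for d, t in found[:n]]
            ring += 1
```

The index is built once; each query then only touches the buckets around the query point. A cell size close to the typical spacing between places keeps the number of visited cells small.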
Oracle provides spatial data capabilities. It has a built-in nearest neighbour function, SDO_NN, which will do the job for you. The only overhead will be putting all the data in the DB; the rest will be taken care of by the Oracle DB.
You can use a database with a point data type and a spatial index, such as MySQL. You can also use a quadkey or a quadtree: it subdivides the plane and reduces the dimension. You can download my PHP class Hilbert-curve at phpclasses.org. It uses a quadkey and can help to organize locations in buckets and build a proximity search. A quadkey can reduce overlapping searches because of a special database.

mysql best way to manage product sizes

I am developing a product database. For sizes, I created a separate table PRODUCT_SIZE(id,sizetext)
e.g. (1,'Small') ,(2,'Large'), (3,'Extra Large'),...
I provide this list of sizes as checkboxes; when a product is added, all possible sizes can be selected for the current product.
e.g. for a T-shirt, the SMALL and LARGE sizes are selected.
These 2 sizes are then available for each new stock purchase entry.
Now I have learned that there can be different size units: some items can be in inches, some in kg, and some in meters.
I have an altered solution in mind:
to alter the table to
PRODUCT_SIZE(id,sizetext, UnitType);
Now it can be: (1,'5','KG') ,(2,'10','KG'), (3,'2.5','Inches'),...
Is there any better approach or suggestion?
It seems like you're forcing 'clothing size', 'weight' and 'length' into one 'size' attribute.
Try these tables:
product (product_id, name)
"Nike t-shirt"
attribute_group (attribute_group_id, name)
"Shirt size", "Weight", "Length", etc.
attribute_value (attribute_value_id, attribute_group_id, name)
"Shirt size" would have rows for "Small", "Large", etc.
product_attribute (product_id, attribute_value_id)
"Nike t-shirt" is "Large"
Add a "display order" to attribute_value, too (so "Small" can be displayed before "Large").
Do this for your other attributes, too.
I've done this for a production site, and I think it worked well.
Good luck.
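For illustration, here is a minimal runnable sketch of the tables above. It uses Python's built-in sqlite3 module rather than MySQL so that it is self-contained, so the column types are assumptions. The display_order column and the sizes_for helper are named here for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE attribute_group (
    attribute_group_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE attribute_value (
    attribute_value_id INTEGER PRIMARY KEY,
    attribute_group_id INTEGER NOT NULL
        REFERENCES attribute_group(attribute_group_id),
    name TEXT NOT NULL,
    display_order INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE product_attribute (
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    attribute_value_id INTEGER NOT NULL
        REFERENCES attribute_value(attribute_value_id),
    PRIMARY KEY (product_id, attribute_value_id)
);
INSERT INTO product VALUES (1, 'Nike t-shirt');
INSERT INTO attribute_group VALUES (1, 'Shirt size'), (2, 'Weight');
INSERT INTO attribute_value VALUES
    (1, 1, 'Small', 1), (2, 1, 'Large', 2), (3, 2, '5 kg', 1);
INSERT INTO product_attribute VALUES (1, 2), (1, 1);
""")


def sizes_for(connection, product_id, group_name="Shirt size"):
    """Return a product's values in one attribute group, in display order."""
    rows = connection.execute(
        """
        SELECT av.name
        FROM product_attribute pa
        JOIN attribute_value av ON av.attribute_value_id = pa.attribute_value_id
        JOIN attribute_group ag ON ag.attribute_group_id = av.attribute_group_id
        WHERE pa.product_id = ? AND ag.name = ?
        ORDER BY av.display_order
        """,
        (product_id, group_name),
    )
    return [name for (name,) in rows]


print(sizes_for(conn, 1))
```

A new unit such as kilograms or inches then becomes one more attribute_group row instead of a schema change.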
Instead of making a separate table for this, why don't you just put all of your dropdown options in an application-scoped variable? Then you can just add that data right into a field in product as a string and deal with the different options/units programmatically.
I made a database storing clothes where the sizes came in a few types. One article has sizes like XS, S, M, L; another has
26 27 28 29 30 and so on.
I decided to do this:
# on one side, in the script, I define the size types and names:
$sizeTypes[1] = ['XS', 'S', 'M', 'L'];
$sizeTypes[2] = [29, 30, 31, 32];
# and so on
# and on the other side, in the database, there are just two columns:
size_type_id(int) | size_qty |
# so if I have one article with 3 pieces of size S, 2 pieces of size M and 5 pieces of size L, the database will store:
size_type_id| size_qty |
1 |0:0;1:3;2:2;3:5|
Then in the script I just translate it, so that 0 of type 1 is XS, 1 of type 1 is S, 2 of type 2 is 31, and so on.
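The translation step could look roughly like this. It is a sketch in Python rather than the PHP used above, and the names SIZE_TYPES and decode_size_qty are made up for the example. It assumes the stored string is always a list of index:quantity pairs separated by semicolons.

```python
# Size names per size type, mirroring the $sizeTypes arrays above
SIZE_TYPES = {
    1: ["XS", "S", "M", "L"],
    2: [29, 30, 31, 32],
}


def decode_size_qty(size_type_id, size_qty):
    """Map a stored string such as '0:0;1:3' to {size name: quantity}."""
    names = SIZE_TYPES[size_type_id]
    quantities = {}
    for pair in size_qty.split(";"):
        index, qty = pair.split(":")
        quantities[names[int(index)]] = int(qty)
    return quantities


print(decode_size_qty(1, "0:0;1:3;2:2;3:5"))
```

Note that with this layout the database cannot query or sum quantities per size on its own; every read has to go through the decoding step in the script.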