Entering imperial length measurements on a web form - language-agnostic

I live in Australia and have not dealt much with imperial length measurements.
How would you go about getting users to enter imperial length measurements on a web form, with a precision of 1/64th of an inch? I have thought of several ways to do it, but I don't know if there is a standard way of doing it that users in the building industry in the USA would be used to.
Option 1
One big text box into which users would type something like 5'11" 63/64 (5 feet, 11.984375 inches), which I would then parse and store in a DB in millimeters.
Option 2
One text box for feet, one text box for inches, and one drop down box with [1/64,1/32,...,63/64] for the fraction part of inches, which would then be stored in the DB as millimeters.
Option 3
Something else...
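To make Option 1 concrete, here is a minimal Python sketch of the parsing step. It assumes input shaped like 5'11" 63/64 (feet with an apostrophe, inches with an optional double quote, then an optional fraction); the regex and the names `parse_imperial` and `to_millimeters` are illustrative, and real users will type variants it does not cover. The value is kept as an exact `Fraction` of inches and converted to millimeters only at the end, using the exact definition 1 in = 25.4 mm.

```python
import re
from fractions import Fraction

# Illustrative pattern for the single-text-box format, e.g. 5'11" 63/64.
# Each part is optional, so 5', 11", 11 3/4 and 3/4 are accepted too.
_IMPERIAL_RE = re.compile(
    r"""^\s*
        (?:(?P<feet>\d+)\s*')?                 # feet, followed by an apostrophe
        \s*
        (?:(?P<inches>\d+)\s*"?)?              # whole inches, optional double quote
        \s*
        (?:(?P<num>\d+)\s*/\s*(?P<den>\d+))?   # optional fraction of an inch
        \s*$""",
    re.VERBOSE,
)

MM_PER_INCH = Fraction(127, 5)  # exactly 25.4 mm per inch


def parse_imperial(text: str) -> Fraction:
    """Return the length in inches as an exact Fraction; raise ValueError if unparseable."""
    m = _IMPERIAL_RE.match(text)
    if not m or not any(m.group(g) for g in ("feet", "inches", "num")):
        raise ValueError(f"cannot parse length: {text!r}")
    total = Fraction(int(m.group("feet") or 0) * 12 + int(m.group("inches") or 0))
    if m.group("num"):
        num, den = int(m.group("num")), int(m.group("den"))
        if den == 0 or num >= den:
            raise ValueError("the fraction must be a proper fraction of an inch")
        total += Fraction(num, den)
    return total


def to_millimeters(inches: Fraction) -> float:
    """Convert exact inches to millimeters (only the final step uses a float)."""
    return float(inches * MM_PER_INCH)
```

Ambiguous input such as "5 11" (no foot or inch marks) is rejected rather than guessed at, which is one way to limit bad data from a free-text box.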

I'd go for option 2 for the following reasons.
It simplifies your coding (and this is a big one for me, being intrinsically lazy).
It gives you greater control over what is entered.
It makes it clear to the user what they're allowed to enter.
How you store it in the database is a matter for you to decide. Millimeters can't describe 64ths of an inch perfectly (at least as an integer), but that may not be an issue for you.
My own thoughts are that you should store things as accurately as you ask the user to enter them. If, as you say, you really need accuracy to 1/64th of an inch, losing that accuracy when you put the values in the database is not going to be good when you have to explain it to your users.
You could do one of:
store them as 64ths of an inch (so that 5'11" 63/64 becomes 5x12x64 + 11x64 + 63, or 4607).
store them as a float (mm or inches, up to you) but convert them back to imperial when displaying them.
store them as three separate integer fields, feet, inches and 64ths of an inch.
My own preference there would be the first one. It results in a simple integer field, which can be indexed far better than separate fields; there is no loss of accuracy; and, provided you convert all application measurements to 64ths of an inch before comparing them with (and inserting them in, of course) the database, you should have no problems.
This conversion is a simple process (unlike the metric-imperial ones).
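The 64ths conversion really is just two `divmod` steps. Here is a small Python sketch, assuming non-negative lengths; the function names are illustrative. Millimeters are derived only for display or export (1/64 in = 0.396875 mm), never stored.

```python
INCHES_PER_FOOT = 12
SIXTY_FOURTHS_PER_INCH = 64


def to_64ths(feet: int, inches: int, sixty_fourths: int) -> int:
    """Pack a non-negative feet/inches/64ths length into a single integer."""
    if feet < 0 or not 0 <= inches < INCHES_PER_FOOT or not 0 <= sixty_fourths < SIXTY_FOURTHS_PER_INCH:
        raise ValueError("expected feet >= 0, inches 0-11 and 64ths 0-63")
    return (feet * INCHES_PER_FOOT + inches) * SIXTY_FOURTHS_PER_INCH + sixty_fourths


def from_64ths(total: int) -> tuple[int, int, int]:
    """Unpack an integer count of 64ths back into (feet, inches, 64ths)."""
    total_inches, sixty_fourths = divmod(total, SIXTY_FOURTHS_PER_INCH)
    feet, inches = divmod(total_inches, INCHES_PER_FOOT)
    return feet, inches, sixty_fourths


def sixty_fourths_to_mm(total: int) -> float:
    """Convert for display or export only; 1 in = 25.4 mm exactly."""
    return total * 25.4 / SIXTY_FOURTHS_PER_INCH
```

For the running example, `to_64ths(5, 11, 63)` gives (5 x 12 + 11) x 64 + 63 = 4607, and `from_64ths(4607)` returns `(5, 11, 63)`.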

I would go for a text box for feet and a text box for inches. Allow the user to enter fractional inches separated by a space. Be sure to provide an example of usage near the input area. For example:
[5___] feet, [11 3/4_] inches
Example: [6] feet, [1 1/4] inches
This seems like a good compromise between natural input and parseability, so you don't have to worry about all the possible methods that people use to delineate feet and inches (ft, ', f, -, etc.), but people can still easily enter whatever unit they please.

I'm upvoting Paul Fisher's answer but I would like to offer my own anyway, partly because I feel I have more than 600 characters' worth of commentary to add. ;)
I like that his answer seemed to put at least as much emphasis on making things easy for the user as it did on making things easy for the programmer. I believe we should put the users first to the greatest extent that we can. Ideally, you'd make a mock-up of various solutions and ask actual building industry users (who deal with imperial units) to try them out and give you feedback, but for now you're asking us, not actual users, so...
I am also leaning toward the two-field UI. Personally, I would make the parser handle both decimal and fractional input (allowing either '11.25' or '11 1/4'). You'll have to figure out what to do when they pick something that's not a binary fraction, like 5/7 (when they probably meant 5/8, for example).
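One possible answer to the 5/7 question is to reject anything that is not an exact multiple of 1/64 inch and ask the user to correct it, rather than silently rounding. A minimal Python sketch of an inches-field parser along those lines, accepting '11.25', '11 1/4' or '3/4' (the function name and the reject-don't-round policy are assumptions):

```python
from fractions import Fraction


def parse_inches_field(text: str) -> int:
    """Parse '11.25', '11 1/4' or '3/4' into a whole number of 64ths of an inch.

    Input that is not an exact multiple of 1/64 inch (e.g. '5/7' or '0.1')
    is rejected with ValueError rather than silently rounded.
    """
    parts = text.split()
    try:
        values = [Fraction(p) for p in parts]  # Fraction accepts '11', '11.25' and '1/4'
    except (ValueError, ZeroDivisionError):
        raise ValueError(f"cannot parse inches: {text!r}") from None
    if len(values) == 2:
        whole, frac = values
        if whole.denominator != 1 or not 0 <= frac < 1:
            raise ValueError(f"expected 'whole fraction', e.g. '11 1/4': {text!r}")
    elif len(values) != 1:
        raise ValueError(f"cannot parse inches: {text!r}")
    total = sum(values, Fraction(0))
    if total < 0:
        raise ValueError("length cannot be negative")
    sixty_fourths = total * 64
    if sixty_fourths.denominator != 1:
        raise ValueError(f"{text!r} is not a whole number of 64ths of an inch")
    return sixty_fourths.numerator
```

If rounding to the nearest 64th is preferred instead, `round(total * 64)` would replace the final check, but then the UI should echo the rounded value back so the user sees what was stored.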
As for storage, I guess you could go with millimeters, as long as it's a float. I'd be more comfortable storing inches, because if your values are all multiples of 64ths of an inch, you shouldn't have rounding issues. I agree with Pax that it's important to be able to give it back to the user in imperial units, complete with reduced fractions. (Maybe it's a big assumption I'm making about the building industry. For all I know, they normalize everything to 64ths, and are happier to see '11 16/64' than '11 1/4'!)
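Giving the value back with reduced fractions is straightforward if lengths are stored as 64ths. A short Python sketch (the function name and the `5' 11 1/4"` output style are assumptions); if the industry really does prefer '11 16/64', print `rem/64` without reducing.

```python
from fractions import Fraction


def format_64ths(total: int) -> str:
    """Render a non-negative count of 64ths as feet, inches and a reduced fraction."""
    total_inches, rem = divmod(total, 64)
    feet, inches = divmod(total_inches, 12)
    text = f"{feet}' {inches}"
    if rem:
        frac = Fraction(rem, 64)  # Fraction reduces automatically, e.g. 16/64 -> 1/4
        text += f" {frac.numerator}/{frac.denominator}"
    return text + '"'
```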

I don't know if this qualifies as an "answer", but if I were to do it, I would try to get by with just offering a single field in feet or inches, depending on the expected range.
But that might not be good enough, in which case one field for feet and another for inches should be sufficient. If someone needs to get precise, they can type in the decimals in the inches.

Mixing text and numeric features for text classification using deep learning

I have a problem classifying text into several categories (topics). Apart from the text, I have some numeric features that I believe may be useful (there are also missing values among those features). But the most important information is, of course, in the text. Therefore, I think a deep learning approach (with a common pipeline: embedding layer + CNN or RNN with dropout + Dense layer) would be the best choice. What is the best practice for combining the current model, which works only on text input, with numeric features? Are there any tricks, common best practices, or state-of-the-art research in this field? Are there any papers/experiments (on GitHub, maybe) on this topic?
It'd be great to think about the problem in general, but to give an idea of what sort of problem we may solve, I will give a specific example. Suppose we have reviews from users in which they describe a problem they faced while receiving a service or purchasing an item. The target is multi-label: the set of tags (categories/topics) associated with the user's complaint (we should choose the relevant ones among a few hundred possible topics).
Then, apart from the user's comment itself (which is the most important feature), we may also want to take into account some numerical features like price, waiting time, rating (customer satisfaction score), etc. These could be useful for predicting some particular categories.
The idea is to combine all these features in a deep learning model to produce the final model. I'm not sure what the best ways to do it are. What are the best practices / useful tricks for this kind of problem?
For each numeric feature, get a statistical summary (you can use pandas.DataFrame.describe); plotting the distribution also gives you a better visual understanding of it.
After computing the mean, std, max, min, etc., you should get rid of outliers, which can harm your model's training. For example, if 90% of a feature's values lie between 18 and 72 but there are also values like 1.1 or 1200, replace those with 18 or 72, depending on the side. You can use np.clip().
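A minimal NumPy sketch of the clipping step, assuming the bounds are taken from the 5th and 95th percentiles (which keeps the central 90% of values untouched, as in the 18-72 example); the percentile choice and the function name are assumptions to tune per feature. `np.nanpercentile` ignores missing values when computing the bounds, and `np.clip` leaves NaNs as NaN.

```python
import numpy as np


def clip_to_percentiles(values: np.ndarray, lower: float = 5.0, upper: float = 95.0) -> np.ndarray:
    """Clip values below/above the given percentiles to those percentiles.

    NaNs (missing values) are ignored when computing the bounds and stay NaN.
    """
    low, high = np.nanpercentile(values, [lower, upper])
    return np.clip(values, low, high)
```

Compute the bounds on the training set only and reuse them for validation and test data, so the model never sees statistics derived from held-out examples.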
Once you have a reasonable distribution, you should convert those numeric features to categorical features. For instance, values from 18 to 72 can be grouped into intervals at 18, 27, 36, ..., 72. You can increase or decrease the resolution, depending on your understanding of the data and the performance of the algorithm. You can use np.digitize() or do it manually with a simple function.
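A sketch of the binning step with `np.digitize`, using intervals of width 9 between 18 and 72 as in the example. Giving missing values (NaN) their own extra category id is an added assumption, since the question mentions missing values; the resulting integer ids can be fed to an embedding layer just like word indices.

```python
import numpy as np


def bucketize(values: np.ndarray, low: float = 18.0, high: float = 72.0, width: float = 9.0) -> np.ndarray:
    """Turn clipped numeric values into integer category ids.

    Interior edges are low+width, low+2*width, ... (27, 36, 45, 54, 63 by default),
    giving ids 0..len(edges). Missing values (NaN) get their own extra id.
    """
    edges = np.arange(low + width, high, width)
    ids = np.digitize(values, edges)  # id i means edges[i-1] <= x < edges[i]
    missing_id = len(edges) + 1
    return np.where(np.isnan(values), missing_id, ids)
```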
In the end you have a categorical feature, just like the text. A CNN or RNN can work fine with categorical representations of the numeric values, and you also gain the advantage of being able to use feature crosses to improve performance.
But if you are asking about something more complex, I might not have understood your question, or I may not know the answer. Still, if you want to ask more or differently, I will be happy to try to help.

Storing a percentage in Rails + MySQL

I need to use a percentage in my Rails app. In any view, including when it is entered by the user, the format will need to be the hundreds format, 100.000. When it's used in calculations, it needs to be represented in the hundredths format, 1.00000.
My migration (I'm adding the column to an existing table) has the following line:
add_column :worker, :cash_split, :decimal, :precision => 6, :scale => 5
So, as of right now, I'm storing it in the hundredths (1.00000) format. My basis for choosing this format is that I figure it will mean cleaner business logic (i.e. no worker.cash_split / 100.0.to_d code hanging around) when I need to do multiplication.
My only other thought was maybe abusing the composed_of method. I could store the data in the hundreds (100.000) format as cash_split and then make an attribute accessor cash_split_percentage that returns cash_split in its 1.0000 format counterpart.
Your first thought is the right one...don't overthink it.
You should definitely store percentage numbers in the database in hundredths format. And use that format in all of your Ruby calculations.
Percentage figures are a display convention. E.g., the number 0.45 is displayed as 45%. As such, use a view helper to convert your percentage figures from their internal format (decimal numbers) to your chosen display format: a string that includes the % sign.
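The thread is about Rails, but the "store hundredths, convert only at the edges" split can be sketched in a few lines. Below it is in Python, with `decimal.Decimal` standing in for Ruby's BigDecimal; both function names are hypothetical. In a Rails app the first would sit in a form or model setter and the second in a view helper, which also covers re-rendering a form with 45 rather than 0.45.

```python
from decimal import Decimal, ROUND_HALF_UP

STORED_SCALE = Decimal("0.00001")  # 5 decimal places, like :scale => 5 in the migration


def percent_to_fraction(user_input: str) -> Decimal:
    """Form input '45%' or '45' -> Decimal('0.45000'), the stored hundredths format."""
    text = user_input.strip().rstrip("%").strip()
    return (Decimal(text) / 100).quantize(STORED_SCALE, rounding=ROUND_HALF_UP)


def fraction_to_percent(stored: Decimal, places: int = 3) -> str:
    """Stored Decimal('0.45') -> '45.000%' for views, reports and re-rendered forms."""
    value = (stored * 100).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
    return f"{value}%"
```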
It depends.
First off, I don't think there is a right way or a wrong way. It's your app and your code, so you can do what you want, but you should do what makes the most sense for your circumstances.
As @BishmaStornelli commented on @LarryK's answer,
How would you handle percentages in forms? Users will want to enter it like 45% but it should be stored as 0.45. Nevertheless, if the user inputs another field wrong and the form is re-rendered, the percentage field should show 45 and not 0.45. With this I want to say that a callback may not be the final solution.
You're damned if you do and you're damned if you don't. You either clutter up your calculation code with divisions by 100, or you clutter up your Model and Views with converting from a decimal to a percentage and back again.
I suppose the answer depends on which is more heavy in your application.
If you are conducting lots of calculations based on this percentage then storing it as a decimal would seem like the best approach that will provide the least amount of code and the clearest, cleanest code to view and maintain.
However, if you are not conducting lots of calculations based on this percentage (maybe only a couple) then it may make more sense to not have to write a bunch of code in the Model and Views to display the decimal as a nice percentage, and just divide by 100 when you need to perform a calculation.
In our particular case, and the reason I ended up here, we want the user to enter the value as a nice percentage, like 75%, in the form. And we always want to display this value in Views, Reports, etc. as a nice clean percentage, like 75%. And we only need to perform a calculation with this value a couple of times.
So, it makes sense in our case to store the value as a percentage in the database. It makes saving and viewing much easier, and only incurs a "divide by 100" penalty in the couple of spots we perform a calculation on it.
Hopefully, that helps others and provides a different viewpoint to the already well-written and accepted answer.
Thanks @BishmaStornelli for the alternative perspective!

Algorithm for online approximation of a slowly-changing, real valued function

I'm tackling an interesting machine learning problem and would love to hear if anyone knows a good algorithm to deal with the following:
The algorithm must learn to approximate a function of N inputs and M outputs
N is quite large, e.g. 1,000-10,000
M is quite small, e.g. 5-10
All inputs and outputs are floating point values, could be positive or negative, likely to be relatively small in absolute value but no absolute guarantees on bounds
Each time period I get N inputs and need to predict the M outputs, at the end of the time period the actual values for the M outputs are provided (i.e. this is a supervised learning situation where learning needs to take place online)
The underlying function is non-linear, but not too nasty (e.g. I expect it will be smooth and continuous over most of the input space)
There will be a small amount of noise in the function, but signal/noise is likely to be good - I expect the N inputs will explain 95%+ of the output values
The underlying function is slowly changing over time - unlikely to change drastically in a single time period but likely to shift slightly over a range of 1000s of time periods
There is no hidden state to worry about (other than the changing function), i.e. all the information required is in the N inputs
I'm currently thinking some kind of back-propagation neural network with lots of hidden nodes might work - but is that really the best approach for this situation and will it handle the changing function?
With your number of inputs and outputs, I'd also go for a neural network; it should give a good approximation. The slow change suits a back-propagation technique: the network should not have to 'de-learn' stuff.
I think stochastic gradient descent (http://en.wikipedia.org/wiki/Stochastic_gradient_descent) would be a straightforward first step; it will probably work nicely given your operating conditions.
I'd also go for an ANN. A single layer might do fine since your input space is large. You might want to give it a shot before adding a lot of hidden layers.
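A minimal NumPy sketch of the back-propagation/SGD suggestions above: a one-hidden-layer tanh network updated once per time period, after the true outputs arrive. The learning rate is kept constant (no decay), which is what lets it keep tracking a slowly drifting function. The class name, sizes, learning rate and the synthetic drifting target are all illustrative assumptions, not tuned values; the question's N would be 1,000-10,000 rather than 50.

```python
import numpy as np


class OnlineMLP:
    """One-hidden-layer tanh network trained by per-step SGD (backpropagation).

    The learning rate stays constant so the weights keep adapting and can
    follow a target function that drifts slowly over time.
    """

    def __init__(self, n_in, n_hidden, n_out, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_out, n_hidden))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def predict(self, x):
        """Return (outputs, hidden activations) for a single input vector."""
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def update(self, x, y):
        """Predict for x, then take one SGD step once the true y is known.

        Returns the squared error of the prediction made *before* the update.
        """
        y_hat, h = self.predict(x)
        err = y_hat - y                                # gradient of 0.5*||err||^2 w.r.t. y_hat
        grad_h = (self.W2.T @ err) * (1.0 - h ** 2)    # backprop through tanh (uses old W2)
        self.W2 -= self.lr * np.outer(err, h)
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(grad_h, x)
        self.b1 -= self.lr * grad_h
        return float(0.5 * err @ err)


# Synthetic stand-in for the real data: a smooth non-linear target whose
# parameters drift slightly every time period.
rng = np.random.default_rng(1)
n_in, n_out = 50, 5
A = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)
net = OnlineMLP(n_in, n_hidden=64, n_out=n_out)
losses = []
for t in range(5000):
    A += 1e-4 * rng.normal(size=A.shape)   # slow drift of the underlying function
    x = rng.normal(size=n_in)
    y = np.tanh(A @ x)                     # actual outputs revealed at period end
    losses.append(net.update(x, y))
```

Dropping the hidden layer (predicting with a single weight matrix) gives the single-layer variant suggested above; it is worth comparing the two on real data before adding more hidden layers.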
@mikera What is it going to be used for? Is it an assignment in an ML course?

Best usability practice for accepting long-ish account numbers

A user recently inquired (OK, complained) as to why a 19-digit account number on our web site was broken up into 4 individual text boxes of length [5,5,5,4]. Not being the original designer, I couldn't answer the question, but I'd always assumed that it was done in order to preserve data quality and possibly also to provide a better user experience.
Other more generic examples include Phone with Area Code (10 consecutive digits versus [3,3,4]) and of course SSN (9 digits versus [3,2,4])
It got me wondering whether there are any known standards out there on the topic? When do you split up your ID#? Specifically with regards to user experience and minimizing data entry errors.
I know there was some research into this, the most I can find at the moment is the Wikipedia article on Short-term memory, specifically chunking. There's also The Magical Number Seven, Plus or Minus Two.
When I'm providing IDs to end users, I personally like to break them up into blocks of 5, which appears to be the same convention the original designer of your system used. I've got no logical reason that I can give you for having picked this number other than it "feels right". Short of being able to spend a lot of money on carrying out a study, "gut instinct" and following conventions from other systems is probably the way to go.
That said, if you can make the UI more usable to the user by:
Automatically moving from the end of one field to the start of another when it's complete
Automatically moving from the start of one field to the prior field and deleting the last character when the user presses delete in an empty field that isn't the first one
OR
Replacing it with one long field that has some form of "input mask" on it (not sure if this is doable in plain HTML, but it may be feasible using one of the UI frameworks) so it appears like "_____ - _____ - _____ - ____" and ends up looking like "12345 - 54321 - 12345 - 1234"
It would almost certainly make them happier!
Don't know about standards, but from a personal point of view:
If there are multiple fields, make sure the cursor moves to the next field once a field is full.
If there's only one field, allow spaces/dashes/whatever to be used in that field because you can filter them out. It's really annoying when sites/programs force you to enter dates in "dd/mm/yyyy" format, for example, meaning the day/month must be padded with zeroes. "23/8/2010" should be acceptable.
You need to consider the wider context of your particular application. There are always pros and cons of any design decision, but their impact changes depending on the situation, so you have to think every time.
Splitting the long number into several fields makes it easier to read, especially if you choose to divide the number the same way as most of your users. You can also often validate the input as soon as the user goes to the next field, so you indicate errors earlier.
On the other hand, users rarely type long numbers like that nowadays: most of the time they just copy-paste them from whatever note-keeping solution they have chosen, in whatever format they have them there. That means that a single field, without any limit on length or allowed characters, suddenly makes a lot of sense -- you can filter the characters out anyway (just make sure you display the final form of the number to the user at some point). There are also issues with moving the focus between fields, with browsers remembering previous values (you just have to select one number then, not 4 parts of the same number), etc.
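A server-side sketch of the single-field approach: strip the separators people typically paste (spaces, dashes, dots, slashes), insist on exactly 19 digits, and show the number back in 5-5-5-4 blocks. The function names and the set of tolerated separators are assumptions. Other characters are rejected rather than silently dropped, so a mistyped letter O is reported instead of shortening the number.

```python
import re

GROUPS = (5, 5, 5, 4)  # the [5,5,5,4] split from the question, 19 digits total


def normalize_account_number(raw: str) -> str:
    """Accept pasted input with spaces, dashes, dots or slashes; return bare digits."""
    digits = re.sub(r"[\s\-./]", "", raw)
    if not re.fullmatch(r"[0-9]{%d}" % sum(GROUPS), digits):
        raise ValueError(f"expected {sum(GROUPS)} digits, got {raw!r}")
    return digits


def display_account_number(digits: str) -> str:
    """Show the stored digits back to the user in 5-5-5-4 blocks."""
    parts, start = [], 0
    for size in GROUPS:
        parts.append(digits[start:start + size])
        start += size
    return " - ".join(parts)
```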
In general, I would say that as browsers slowly become more and more usable, you should take advantage of the mechanisms they provide by using the stock solutions, and not inventing complex solutions on your own. You may be a step before them today, but in two years the browsers will catch up and your site will suck.

Is it possible to do A/B testing by page rather than by individual?

Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in session, cookies etc).
Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)?
This approach would seem to have some advantages; I would not have to worry about keeping the user experience consistent - a given page (e.g. www.example.com/viewdesign?id=6) would always return the same html. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO - my suspicion is that Google would "prefer" that it always sees the same html when crawling a page.
Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?
Your intuition is correct. In theory, randomizing by page will work fine. Both treatment groups will have balanced characteristics in expectation.
However, the sample size is quite small so you need to be careful. Simple randomization may create imbalance by chance. The standard solution is to block on pre-treatment characteristics of the shirts. The most important characteristic is your pre-treatment outcome, which I assume is the conversion rate.
There are many ways to create "balanced" randomized designs. For instance, you could create pairs using optimal matching and randomize within pairs. A rougher match could be found by ranking pages by their conversion rate in the previous week/month and then creating pairs of neighbors. Or you could combine blocked randomization with Aaron's suggestion: randomize within pairs and then flip the treatment each week.
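The "rougher match" above can be sketched in a few lines of Python: rank pages by a pre-treatment conversion rate, pair neighbours, and flip a coin within each pair. The function name, the page ids and the rates in the usage example are hypothetical. To combine this with weekly flipping, swap A and B within every pair the following week.

```python
import random


def paired_assignment(prev_rates, seed=0):
    """Assign each page to button 'A' or 'B' using matched-pair randomization.

    prev_rates maps a page id to its conversion rate in a pre-treatment period.
    Pages are ranked by that rate, adjacent pages form pairs, and a coin flip
    within each pair decides which page gets A; the other gets B.
    """
    rng = random.Random(seed)
    ranked = sorted(prev_rates, key=prev_rates.get, reverse=True)
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first], assignment[second] = "A", "B"
    if len(ranked) % 2:  # odd number of pages: the leftover one is assigned at random
        assignment[ranked[-1]] = rng.choice("AB")
    return assignment
```

For example, `paired_assignment({"design-1": 0.031, "design-2": 0.029, "design-3": 0.012, "design-4": 0.010})` pairs design-1 with design-2 and design-3 with design-4, and each pair gets exactly one A.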
A second concern, somewhat unrelated, is interaction between treatments. This may be more problematic. It's possible that if a user sees one button on one page and then a different button on a different page, that new button will have a particularly large effect. That is, can you really view treatments as independent? Does the button on one page affect the likelihood of conversion on another? Unfortunately, it probably does, particularly because if you buy a t-shirt on one page, you're probably very unlikely to buy a t-shirt on the other page. I'd worry about this more than the randomization. The standard approach -- randomizing by unique user -- better mimics your final design.
You could always run an experiment to see if you get the same results using these two methods, and then proceed with the simpler one if you do.
You can't.
Let's say 50 t-shirts have button A and the remaining 50 have button B. After your test, you realize t-shirts with button A have a better conversion rate.
Now - was the conversion better because of button A, or was it better because the t-shirt designs were really cool and people liked them?
You can't answer that question objectively, so you can't do A/B testing in this manner.
The trouble with your approach is that you're testing two things at the same time.
Say, design x is using button a. Design y is using button b. Design y gets more sales, and more conversions.
Is that because button b gives a better conversion rate than button a, or is that because design y gives a better conversion rate than design x?
If your volume of designs is very high, your volume of users is very low, and your conversions are distributed evenly amongst your designs, I could see your approach being better than the normal fashion - because the risk that the "good" designs clump together and skew your result would be smaller than the risk that the "good" users do. However, in that case you won't have a particularly large sample size of conversions to draw conclusions from - you need a sufficiently high volume of users for AB testing to be worthwhile in the first place.
Instead of changing the sale button for some pages, run all pages with button A for a week and then change to button B for another week. That should give you enough data to see whether the number of sales change significantly between the two buttons.
A week should be short enough that seasonal/weather effects shouldn't apply.
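To judge whether the week-A and week-B numbers "change significantly", one common choice is a pooled two-proportion z-test, sketched below in Python. It assumes each visitor (or pageview) is an independent trial with a yes/no conversion, that the counts are large enough for the normal approximation, and that nothing else changed between the two weeks; the function name and the example counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conversion rate is 0% or 100% in both groups; test undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return z, erfc(abs(z) / sqrt(2))
```

With hypothetical counts of 100 sales from 1,000 visitors in the button-A week and 150 from 1,000 in the button-B week, z is about 3.4 and p is well below 0.01.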