Is it possible to do A/B testing by page rather than by individual? - language-agnostic

Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in session, cookies, etc.).
Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)?
This approach would seem to have some advantages; I would not have to worry about keeping the user experience consistent - a given page (e.g. www.example.com/viewdesign?id=6) would always return the same html. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO - my suspicion is that Google would "prefer" that it always sees the same html when crawling a page.
Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?

Your intuition is correct. In theory, randomizing by page will work fine. Both treatment groups will have balanced characteristics in expectation.
However, the sample size is quite small so you need to be careful. Simple randomization may create imbalance by chance. The standard solution is to block on pre-treatment characteristics of the shirts. The most important characteristic is your pre-treatment outcome, which I assume is the conversion rate.
There are many ways to create "balanced" randomized designs. For instance, you could create pairs using optimal matching, and randomize within pairs. A rougher match could be found by ranking pages by their conversion rate in the previous week/month and then creating pairs of neighbors. Or you could combine blocked randomization with Aaron's suggestion: randomize within pairs and then flip the treatment each week.
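To illustrate the rough pairing idea, here is a minimal Python sketch (the data layout and function name are hypothetical; it assumes you already have last month's conversion rate per design):

import random

def assign_buttons(pages):
    # pages: list of (design_id, prior_conversion_rate) tuples
    ranked = sorted(pages, key=lambda p: p[1], reverse=True)
    assignment = {}
    # Pair neighbours by prior conversion rate, then randomize within each pair
    for i in range(0, len(ranked) - 1, 2):
        a, b = ranked[i][0], ranked[i + 1][0]
        if random.random() < 0.5:
            assignment[a], assignment[b] = "A", "B"
        else:
            assignment[a], assignment[b] = "B", "A"
    if len(ranked) % 2:  # odd design out gets a plain coin flip
        assignment[ranked[-1][0]] = random.choice("AB")
    return assignment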
A second concern, somewhat unrelated, is interaction between treatments. This may be more problematic. It's possible that if a user sees one button on one page and then a different button on a different page, that new button will have a particularly large effect. That is, can you really view treatments as independent? Does the button on one page affect the likelihood of conversion on another? Unfortunately, it probably does, particularly because if you buy a t-shirt on one page, you're probably very unlikely to buy a t-shirt on the other page. I'd worry about this more than the randomization. The standard approach -- randomizing by unique user -- better mimics your final design.
You could always run an experiment to see if you get the same results using these two methods, and then proceed with the simpler one if you do.

You can't.
Let's say 50 t-shirts have button A and the remaining 50 have button B. After your test, you realize the t-shirts with button A have a better conversion rate.
Now - was the conversion better because of button A, or was it better because the t-shirt designs were really cool and people liked them?
You can't answer that question objectively, so you can't do A/B testing in this manner.

The trouble with your approach is that you're testing two things at the same time.
Say, design x is using button a. Design y is using button b. Design y gets more sales, and more conversions.
Is that because button b gives a better conversion rate than button a, or is that because design y gives a better conversion rate than design x?
If your volume of designs is very high, your volume of users is very low, and your conversions are distributed evenly amongst your designs, I could see your approach being better than the normal fashion - because the risk that the "good" designs clump together and skew your result would be smaller than the risk that the "good" users do. However, in that case you won't have a particularly large sample size of conversions to draw conclusions from - you need a sufficiently high volume of users for AB testing to be worthwhile in the first place.

Instead of changing the sale button for some pages, run all pages with button A for a week and then change to button B for another week. That should give you enough data to see whether the number of sales change significantly between the two buttons.
A week should be short enough that seasonal/weather effects shouldn't apply.
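If you go that route, a quick way to check whether the difference between the two weeks is bigger than noise is a two-proportion z-test. A rough Python sketch (the sales and pageview counts here are made-up placeholders):

from math import sqrt

def two_proportion_z(sales_a, views_a, sales_b, views_b):
    # Pooled two-proportion z-test for conversion rates
    p_a, p_b = sales_a / views_a, sales_b / views_b
    p_pool = (sales_a + sales_b) / (views_a + views_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    return (p_a - p_b) / se

# Example: week with button A vs week with button B
z = two_proportion_z(sales_a=120, views_a=5000, sales_b=150, views_b=5200)
print(z)  # |z| > ~1.96 suggests a real difference at the 5% level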


More efficient to have two tables or one table with tons of fields

Related but not quite the same thing: which is more efficient? (or at least reading through it didn't help me any)
So I am working on a new site (selling insurance policies). We already have several sites up (it's a Rails application) that do this, so I have a table in my SQL database called policies.
As you can imagine it has lots of columns to support all the different options available.
While working on this new site I realized I needed to keep track of 20+ more options.
My concern is that the policies table is already large, but the columns in it right now are almost all used by every application we have. Whereas if I add these they would only be used for the new site and would leave tons of null cells on all the rest of the policies.
So my question is do I add those to the existing table or create a new table just for the policies sold on that site? Also I believe that if I created a new table I could leave out some of the columns (but not very many) from the main policies table because they are not needed for this application.
"[A]lmost all used" suggests that you could, upon considering it, split it more naturally.
Now, much of the efficiency concern here goes down to three things:
A single table can be scanned through more quickly than joins across several.
Large rows have a memory and disk-space cost in themselves.
If a single table represents something that is really a 1-to-many, then it requires more work on insert, delete or update.
Point 2 only really comes into play if there are a lot of cases where you need one particular subset of the data, another batch where you need another subset, and maybe just a few where you need them all. If you're using most of the columns in most places, then it doesn't gain you anything. In that case, splitting tables is bad.
Point 1 and 3 argue for and against joining into one big table, respectively.
Before any of that though, let's get back to "almost all". If there are several rows with a batch of null fields, why? Often answering that "why?" reveals that really there's a natural split there, which should be broken off into another table as part of normal normalisation*. Repetition of fields is an even greater suggestion that this is the case.
Do this first.
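To make that concrete, a split along those lines might look something like this (a rough sketch with made-up table and column names, written against sqlite3 just so it runs anywhere):

import sqlite3

conn = sqlite3.connect(":memory:")
# Core columns shared by every application stay in policies
conn.execute("""
    CREATE TABLE policies (
        id INTEGER PRIMARY KEY,
        policy_number TEXT NOT NULL,
        premium NUMERIC
    )""")
# Options only the new site uses live in their own table, keyed back to policies
conn.execute("""
    CREATE TABLE new_site_policy_options (
        policy_id INTEGER PRIMARY KEY REFERENCES policies(id),
        option_a TEXT,
        option_b TEXT
    )""")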
To denormalise - whether by splitting what is naturally one table, or joining what is naturally several - is a very particular type of optimisation - it makes some things more efficient at the cost of making other things less efficient, and it introduces possibilities of bugs that don't exist otherwise. I would never say you should never denormalise - I do it myself - but you need to be able to say "I am denormalising table X & Y in this manner, because it will help case C which happens enough and I can live with the extra cost to case D". Then you need to check it actually did help case C significantly and case D insignificantly, along with looking for hidden costs.
One of the reasons for normalising in the first place is it gives good average performance over a wide range of cases. It's the balance you want most of the time. Denormalising from the get-go rather than with a normalised database as a starting point is almost always premature.
*Fun trivia fact: The name "normalization" was in part a take on Richard Nixon's "Vietnamisation" policy, meaning there was a running joke in some quarters of adding "-isation" onto just about anything. Were it not for the White House's reaction to the Tet Offensive, we could be using the gerund "normalising", or something completely different, instead.

Database design: Using hundred of fields for little values

I'm planning to develop a PHP web app; it will mainly be used by registered users (sessions).
While thinking about the DB design, I was contemplating that in order to give the best user experience possible there would be lots of options for the user to activate, deactivate, specify, etc.
For example:
- Options for each layout element, dialog boxes, dashboard, grid, etc.
- color, size, stay visible, invisible, don't ask again, show every time, advanced mode, simple mode, etc.
This would add up to hundreds of fields, ranging from simple Yes/No values to 1-to-N values, for each user.
So, is having a field for each of these options the way to go?
Or how do those CRMs, CMSes, or other web apps store lots of 1-2 char long values?
Do they group them on Text fields separated by a special char and then "explode" them as an array for runtime usage?
thank you
How about something like this:
CREATE TABLE settings (
    user_id INT,
    setting_name VARCHAR(255),
    setting_value CHAR(2)
);
That way, to store a configuration setting for a user, you can do:
INSERT INTO settings (user_id, setting_name, setting_value)
VALUES (1, 'timezone', '+8');
And when you need to query a setting for a particular user, you can do:
SELECT setting_value FROM settings
WHERE user_id = 1 AND setting_name = 'timezone';
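On the application side, reading those rows back into a simple key/value map is straightforward. A minimal sketch (shown in Python with sqlite3 purely for illustration; the table name matches the schema above):

import sqlite3

def load_settings(conn, user_id):
    # Collapse the per-user rows of the settings table into a dict
    rows = conn.execute(
        "SELECT setting_name, setting_value FROM settings WHERE user_id = ?",
        (user_id,),
    )
    return dict(rows)

# e.g. settings = load_settings(conn, 1); settings.get("timezone", "+0")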
I would absolutely be inclined to have individual fields for each option. My rule of thumb is that each column holds exactly one piece of data whenever possible. No more, no less. As was mentioned earlier, the ease of maintenance and the ability to add / drop options down the road far outweighs the pain in the arse of setting it up.
I would, however, put some thought into how you create the table(s). The idea mentioned earlier was to have a Settings table with 100 columns (one for each option) and one row for each user. That would work, to be sure. If it were me I would be inclined to break it down a bit further.
You start with a basic User table, of course. That would hold the basics of username, password, userid, etc. That way you can use the numeric userid as the key index for your Settings table(s). But after that I would try to break down the settings into smaller tables based on logical usage. For example, if you have 100 options, and 19 of those pertain to how a user views / is viewed / behaves in one specific part of the site, say something like a forum, then break those out into a separate table, i.e. ForumSettings. Maybe there are 12 more that pertain to email preferences, but would not be used in other areas of the site / app. Now you have an EmailSettings table.
Doing this would not only reduce the number of columns in your generic Settings table, but it would also make writing queries for specific tasks or areas of the app much easier, speed up the performance a tick, and make maintenance moving forward far less painful. Some may disagree, as from a strictly data modeling perspective I'm pretty sure that the one Settings table would be indicated. But from a real world perspective, I have never gone wrong using logical chunks such as this.
From a pure data-model perspective, that would be the clearest design (though awfully wide). Some might try to bitmask them into a single field for assumed space reasons, but the logic to encode/decode makes that not worthwhile, in my opinion. Also, you lose the ability to index on them.
Another option (I just saw posted) is to hold a separate table with an FK back to the user table. But then you have to iterate over the results to get the value you want to check for.

Best usability practice for accepting long-ish account numbers

A user recently inquired (OK, complained) as to why a 19-digit account number on our web site was broken up into 4 individual text boxes of length [5,5,5,4]. Not being the original designer, I couldn't answer the question, but I'd always assumed that it was done in order to preserve data quality and possibly to provide a better user experience as well.
Other more generic examples include Phone with Area Code (10 consecutive digits versus [3,3,4]) and of course SSN (9 digits versus [3,2,4])
It got me wondering whether there are any known standards out there on the topic? When do you split up your ID#? Specifically with regards to user experience and minimizing data entry errors.
I know there was some research into this, the most I can find at the moment is the Wikipedia article on Short-term memory, specifically chunking. There's also The Magical Number Seven, Plus or Minus Two.
When I'm providing IDs to end users I personally like to break them up into blocks of 5, which appears to be the same convention the original designer of your system used. I've got no logical reason that I can give you for having picked this number other than it "feels right". Short of being able to spend a lot of money on carrying out a study, "gut instinct" and following conventions from other systems is probably the way to go.
That said, if you can make the UI more usable to the user by:
Automatically moving from the end of one field to the start of another when it's complete
Automatically moving from the start of one field to the prior field and deleting the last character when the user presses delete in an empty field that isn't the first one
OR
Replacing it with one long field that has some form of "input mask" on it (not sure if this is doable in plain HTML, but it may be feasible using one of the UI frameworks) so it appears like "_____ - _____ - _____ - ____" and ends up looking like "12345 - 54321 - 12345 - 1234"
It would almost certainly make them happier!
Don't know about standards, but from a personal point of view:
If there are multiple fields, make sure the cursor moves to the next field once a field is full.
If there's only one field, allow spaces/dashes/whatever to be used in that field because you can filter them out. It's really annoying when sites/programs force you to enter dates in "dd/mm/yyyy" format, for example, meaning the day/month must be padded with zeroes. "23/8/2010" should be acceptable.
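As a small illustration of that "filter them out" idea, here's a hedged Python sketch (the 19-digit length comes from the question above; the function name is made up):

def normalize_account_number(raw, expected_digits=19):
    # Strip spaces, dashes, and anything else that isn't a digit
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) != expected_digits:
        raise ValueError(f"expected {expected_digits} digits, got {len(digits)}")
    return digits

print(normalize_account_number("12345-54321 12345 1234"))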
You need to consider the wider context of your particular application. There are always pros and cons of any design decision, but their impact changes depending on the situation, so you have to think every time.
Splitting the long number into several fields makes it easier to read, especially if you choose to divide the number the same way as most of your users. You can also often validate the input as soon as the user goes to the next field, so you indicate errors earlier.
On the other hand, users rarely type long numbers like that nowadays: most of the time they just copy-paste them from whatever note-keeping solution they have chosen, in whatever format they have it there. That means that a single field, without any limit on length or allowed characters, suddenly makes a lot of sense -- you can filter the characters out anyway (just make sure you display the final form of the number to the user at some point). There are also issues with moving the focus between fields, with browsers remembering previous values (you just have to select one number, not 4 parts of the same number), etc.
In general, I would say that as browsers slowly become more and more usable, you should take advantage of the mechanisms they provide by using the stock solutions, and not inventing complex solutions on your own. You may be a step ahead of them today, but in two years the browsers will catch up and your site will suck.

Top k problem - finding usage for my academic work

Top k problem - searching BEST k (3 or 1000) elements in DB
There is a fundamental problem with relational DBs: to find the top k elements, you need to process ALL the rows in the table, which makes this useless on big data.
I'm making an application (for university research; not really my invention - I'm implementing and trying to improve the original idea) that allows you to effectively find the top k elements by visiting only 3-5% of the stored data, which makes it really fast.
There are even user preferences: for a given domain, you can specify a value function that defines the best value for the user and an aggregation function that specifies the most significant attributes.
For example, a DB of cars with attributes (price, mileage, age of car, ccm, fuel/mile, type of car, ...). The user values, for example, 10*price + 5*fuel/mile + 4*mileage + age of car, and doesn't care about the type of car or the other attributes - this is the aggregation specification.
Then for each attribute (price, mileage, ...) there can be a totally different "value function" that specifies the best value for the user. So for example price: the lower the better, with the value dropping to 0 at $50k (the user doesn't want a car more expensive than $50k). Mileage: another function based on his/her criteria, and so on...
You can see that there is quite a lot of freedom to specify your preferences, and according to them the best k elements in the DB will be found quickly.
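To make the preference model concrete, here's a hedged Python sketch of the naive baseline (scan everything, score each row with per-attribute value functions combined by weights, keep the best k with a heap) - i.e. exactly the full scan the research approach tries to avoid. The attribute names, weights, and cut-offs are just the car example above:

import heapq

def price_value(price):
    # Lower is better; worth nothing above $50k
    return max(0.0, 1.0 - price / 50000.0)

def mileage_value(mileage):
    # Lower mileage is better, flattening out around 200k miles
    return max(0.0, 1.0 - mileage / 200000.0)

def score(car, weights):
    # Weighted aggregation of the per-attribute value functions
    return (weights["price"] * price_value(car["price"])
            + weights["mileage"] * mileage_value(car["mileage"]))

def naive_top_k(cars, weights, k=3):
    return heapq.nlargest(k, cars, key=lambda c: score(c, weights))

cars = [{"price": 12000, "mileage": 90000},
        {"price": 30000, "mileage": 20000},
        {"price": 55000, "mileage": 5000}]
print(naive_top_k(cars, {"price": 10, "mileage": 4}, k=2))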
I've spent many sleepless nights thinking about real-life usability. Who could benefit from such a query DB? But I've failed to come up with anything, and I'm stuck with a purely academic, write-only stance. :-( I hope there is some real usage for this, but I don't see any...
.... Do YOU have any idea how to use this in real life, on a real problem, etc.?
I'd love to hear from you.
Have a database of people's CVs and establish hiring criteria for different jobs, allowing for a dynamic display of the top k candidates.
Also, considering the fast nature of your solution, you can think of exploiting it in rendering near real-time graphs of highly dynamic data, like stock market quotes or even applications in molecular or DNA-related studies.
New idea: perhaps your research might have applications in clustering, where you could use it to implement fast k-Nearest Neighbor clustering by complex criteria without having to scan the whole data set each time. This would lead to faster clustering of larger data sets with respect to more complex criteria when picking the k-NN for each data node.
There are unlimited possible real-use scenarios. Getting the top-n values is used all the time.
But I highly doubt that it's possible to get top-n objects without having an index. An index can only be built if the properties that will be searched are known ahead of searching. And if that's the case, a simple index in a relational database is able to provide the same functionality.
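For reference, the indexed relational version that answer is alluding to looks something like this (a sketch in Python with sqlite3; the table and column names are made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, price NUMERIC, mileage NUMERIC)")
conn.execute("CREATE INDEX idx_cars_price ON cars (price)")
conn.executemany("INSERT INTO cars (price, mileage) VALUES (?, ?)",
                 [(12000, 90000), (30000, 20000), (55000, 5000)])

# With the index on price, the cheapest k rows come straight off the index
top_3_cheapest = conn.execute(
    "SELECT id, price FROM cars ORDER BY price LIMIT 3").fetchall()
print(top_3_cheapest)

The catch, as noted above, is that this only helps when the ranking expression is known (and indexable) ahead of the search.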
It's used in financial organizations all the time, you need to see the most profitable assets / least profitable, etc.

How should I start designing an AI algorithm for an artillery warfare game?

Here's the background... in my free time I'm designing an artillery warfare game called Staker (inspired by the old BASIC games Tank Wars and Scorched Earth) and I'm programming it in MATLAB. Your first thought might be "Why MATLAB? There are plenty of other languages/software packages that are better for game design." And you would be right. However, I'm a dork and I'm interested in learning the nuts and bolts of how you would design a game from the ground up, so I don't necessarily want to use anything with prefab modules. Also, I've used MATLAB for years and I like the challenge of doing things with it that others haven't really tried to do.
Now to the problem at hand: I want to incorporate AI so that the player can go up against the computer. I've only just started thinking about how to design the algorithm to choose an azimuth angle, elevation angle, and projectile velocity to hit a target, and then adjust them each turn. I feel like maybe I've been overthinking the problem and trying to make the AI too complex at the outset, so I thought I'd pause and ask the community here for ideas about how they would design an algorithm.
Some specific questions:
Are there specific references for AI design that you would suggest I check out?
Would you design the AI players to vary in difficulty in a continuous manner (a difficulty of 0 (easy) to 1 (hard), all still using the same general algorithm) or would you design specific algorithms for a discrete number of AI players (like an easy enemy that fires in random directions or a hard enemy that is able to account for the effects of wind)?
What sorts of mathematical algorithms (pseudocode description) would you start with?
Some additional info: the model I use to simulate projectile motion incorporates fluid drag and the effect of wind. The "fluid" can be air or water. In air, the air density (and thus effect of drag) varies with height above the ground based on some simple atmospheric models. In water, the drag is so great that the projectile usually requires additional thrust. In other words, the projectile can be affected by forces other than just gravity.
In a real artillery situation all these factors would be handled either with formulas or simply brute-force simulation: Fire an electronic shell, apply all relevant forces and see where it lands. Adjust and try again until the electronic shell hits the target. Now you have your numbers to send to the gun.
Given the complexity of the situation I doubt there is any answer better than the brute-force one. While you could precalculate a table of expected drag effects vs velocity I can't see it being worthwhile.
Of course a game where the AI dropped the first shell on your head every time wouldn't be interesting. Once you know the correct values you'll have to make the AI a lousy shot. Apply a random factor to the shot and then walk it towards the target--move it, say, 30+random(140)% of the way towards the true target each time it shoots.
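A hedged one-line version of that walk, treating the aim point as a scalar distance along the ground for simplicity (the names are made up):

import random

def next_aim_point(current_aim, true_target):
    # Step 30% to 170% of the way towards the real target, so overshoot is possible
    fraction = (30 + random.uniform(0, 140)) / 100.0
    return current_aim + fraction * (true_target - current_aim)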
Edit:
I do agree with BCS's notion of improving it as time goes on. I said that but then changed my mind on how to write a bunch of it and then ended up forgetting to put it back in. The tougher it's supposed to be the smaller the random component should be.
Loren's brute force solution is appealing because it would allow easy "intelligence adjustments" by adding more iterations. Also, the adjustment factors for the iteration could be part of the intelligence, as some values will make it converge faster.
Also, for the basic system (no drag, wind, etc.) there is a closed-form solution that can be derived from a basic physics text. I would make the first guess be that, and then do one or more iterations per turn. You might want to try to come up with an empirical correction to improve the first shot (something that will make the average of the first-shot distribution closer to correct).
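For reference, the drag-free closed form for a target at the same height reduces to R = v^2 * sin(2*theta) / g, so a first guess could look like this hedged sketch (a target above or below the gun needs the fuller textbook formula):

from math import asin, degrees, pi

G = 9.81  # m/s^2

def launch_angle(target_range, velocity):
    # Flat-ground, no-drag solution: R = v^2 * sin(2*theta) / g
    s = G * target_range / velocity ** 2
    if s > 1.0:
        return None            # target out of reach at this velocity
    low = 0.5 * asin(s)        # the flat, fast trajectory
    high = pi / 2 - low        # the lobbed trajectory that lands on the same spot
    return degrees(low), degrees(high)

print(launch_angle(target_range=1000.0, velocity=120.0))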
Thanks Loren and BCS, I think you've hit upon an idea I was considering (which prompted question #2 above). The pseudocode for an AI's turn would look something like this:
nSims;        % A variable storing the number of projectile simulations
              % done per turn for the AI (i.e. difficulty)
prevParams;   % A variable storing the previous shot parameters
prevResults;  % A variable storing some measure of accuracy of the last shot
newParams = get_new_guess(prevParams, prevResults);
for iSim = 1:nSims
    newResults = simulate_projectile_flight(newParams);
    newParams = get_new_guess(newParams, newResults);
end
fire_projectile(newParams);
In this case, the variable nSims is essentially a measure of "intelligence" for the AI. A "dumb" AI would have nSims=0, and would simply make a new guess each turn (based on results of the previous turn). A "smart" AI would refine its guess nSims times per turn by simulating the projectile flight.
Two more questions spring from this:
1) What goes into the function get_new_guess? How should I adjust the three shot parameters to minimize the distance to the target? For example, if a shot falls short of the target, you can try to get it closer by adjusting the elevation angle only, adjusting the projectile velocity only, or adjusting both of them together.
2) Should get_new_guess be the same for all AIs, with the nSims value being the only determiner of "intelligence"? Or should get_new_guess be dependent on another "intelligence" parameter (like guessAccuracy)?
A difference between artillery games and real artillery situations is that all sides have 100% information, and that there are typically more than 2 opponents.
As a result, your evaluation function should consider which opponent it would be more urgent to try and eliminate. For example, if I have a 90% chance of an easy kill, but only a 50% chance against someone who's trying to kill me and has just missed two shots near me, it's more important to deal with that threat.
I think you would need some way of evaluating the risk everyone poses to you in terms of ammunition, location, activity, past history, etc.
I'm now addressing the response you posted:
While you have the general idea I don't believe your approach will be workable--it's going to converge way too fast even for a low value of nSims. I doubt you want more than one iteration of get_new_guess between shells and it very well might need some randomizing beyond that.
Even if you can use multiple iterations they wouldn't be good at making a continuously increasing difficulty as they will be big steps. It seems to me that difficulty must be handled by randomness.
First, get_initial_guess:
To start out I would have a table that divides the world up into zones--the higher the difficulty, the more zones. The borders between these zones would have precalculated power for 45, 60 & 75 degrees. Do a test plot; if a shell smacks terrain, try again at a higher angle--if 75 degrees still hits terrain, use it anyway.
The initial shell should be fired at a random power between the values given for the low and high bounds.
Now, for get_new_guess:
Did the shell hit terrain? Increase the angle. I think there will be a constant ratio of how much power needs to be increased to maintain the same distance--you'll need to run tests on this.
Assuming it didn't smack a mountain, note if it's short or long. This gives you a bound. The new guess is somewhere between the two bounds (if you're missing a bound, use the value from the table in get_initial_guess in its place.)
Note what percentage of the way between the low and high bound impact points the target is and choose a power that far between the low and high bound power.
This is probably far too accurate and will likely require some randomizing. I've changed my mind about adding a simple random %. Rather, multiple random numbers should be used to get a bell curve.
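Pulling those pieces together, a hedged sketch of that get_new_guess step might look like this (power only, in 1-D; the bell-curve noise is the sum-of-uniforms trick just mentioned, spread controls how lousy a shot the AI is, and all the names are hypothetical):

import random

def bell_noise(spread):
    # Sum of several uniforms approximates a bell curve centred on zero
    return spread * (sum(random.uniform(-1.0, 1.0) for _ in range(4)) / 4.0)

def get_new_guess(low, high, target_x, spread):
    # low/high: (power, impact_x) for shots known to be short/long of the target.
    # Interpolate: pick a power the same fraction of the way between the bracketing
    # powers as the target sits between the two impact points, then blur it a little.
    (p_low, x_low), (p_high, x_high) = low, high
    frac = (target_x - x_low) / (x_high - x_low)
    power = p_low + frac * (p_high - p_low)
    return power + bell_noise(spread)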
Another thought: Are we dealing with a system where only one shell is active at once? Long ago I implemented an artillery game where you had 5 barrels, each with a fixed reload time that was above the maximum possible flight time.
With that I found myself using a strategy of firing shells spread across the range between my current low bound and high bound. It's possible that, being a mere human, I wasn't using an optimal strategy, though--this was realtime, and getting a round off as soon as the barrel was ready was more important than ensuring it was aimed as well as possible, since it would converge quite fast anyway. I would generally put a shell on target on the second salvo and the third would generally all be hits. (A kill required killing ALL pixels in the target.)
In an AI situation I would model both this and a strategy of holding back some of the barrels to fire more accurate rounds later. I would still fire a spread across the target range, the only question is whether I would use all barrels or not.
I have personally created such a system - for the web-game Zwok, using brute force. I fired lots of shots in random directions and recorded the best result. I wouldn't recommend doing it any other way as the difference between timesteps etc will give you unexpected results.