Tableau Log Function Incorrect - heatmap

Here is a sample dataviz:
Heatmap of linear order quantity (region vs quantity).
Created a calculated field logarithmic = int(log([Order Quantity])) and later on logarithmic = int(log([Order Quantity], 10)).
Heatmap where size is based on logarithmic.
The size doesn't change and the number is incorrect - please guide.

tl;dr Sum the order quantities before taking the logarithm.
int(log(SUM([Order Quantity])))
Otherwise you are taking the logarithm of each individual Order Item, and then adding the logarithms. The aggregation function, sum() in your case, is specified when you place the field on the shelf unless you make it explicit in the calculated field.
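To see why the two orderings disagree: summing per-row logarithms gives the log of the product, not the log of the total. A quick TypeScript check (the quantities are invented for illustration):

// Hypothetical per-row order quantities
const quantities = [120, 45, 300, 8];
// Sum of logs = log of the product of the rows
const sumOfLogs = quantities.reduce((acc, q) => acc + Math.log10(q), 0);
// Log of the sum = what the heatmap size should encode
const logOfSum = Math.log10(quantities.reduce((acc, q) => acc + q, 0));
console.log(sumOfLogs.toFixed(3)); // 7.113 - log10(120*45*300*8)
console.log(logOfSum.toFixed(3));  // 2.675 - log10(473)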
Here are a couple of ways to use the log field, dual or triple encoding the log by size, color and shape. A custom legend works better with multiple encoded symbols than the default legends.


Negative coefficient for dummy variable regression analysis

I am interpreting a multiple regression analysis with the dependent variable being the rate of return (ROR) on a stock. I have also included a dummy variable that represents the stock being included in or excluded from an index (1 = included, 0 = excluded).
The output showed a negative coefficient value (-0.03852), and I'm not quite sure what to make of this. Does this mean that being included in an index has a negative effect on the dependent variable?
Could someone please provide an "easy" explanation?
Thanks!
It means that when the stock is included (value of 1), you expect the rate of return to decrease by 0.03852. More or less, I guess you can call it a negative effect. What you should check is whether this coefficient is significant, i.e. look at its standard error; regression tools usually provide that.
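In model form (a sketch of the standard dummy-variable setup, with the other regressors collapsed into "..."):
ROR = b0 + b1*D + ..., with b1 = -0.03852
So the expected ROR when D = 1 (included) is 0.03852 lower than when D = 0 (excluded), holding the other regressors fixed - which is why the sign alone tells you little until you check the coefficient's standard error.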

SSRS Format to display as percent

I've gone through quite a few examples on here, and I apologize if I'm asking a repeat question; as far as I can tell, I am not.
I have an SSRS report that shows gross sales for certain aspects of our sales departments. They are broken down, by row, into "cost, gross profit, gross profit %, order count, total sales." The columns are the aspects of our sales: web sales, phone sales, etc.
In the tablix I can format a text box to display the results as numbers, but as you can see, I also have Percentage and Count in there. I don't know how to format those within the context of the original text box format. So I know everything that shows under there is a number already, but how do I handle getting the percentage to show as a percentage and the count to show as a count?
For example, all the percentages currently show as "$0.35" and various other numbers that follow that form. The counts currently appear as currency too.
I've used an example I found on here, =IIf(Me.Value = Floor(Me.Value), "0%", "0.00%"), but all that did was make everything that showed up in that column "0.00%". I am fairly new to SSRS and have been cramming consistently for the past two weeks, but I just cannot find help on this. Thank you in advance for anything you can offer.
Update: =IIF(Fields!LVS_Web.Value=0.00, "0%", format(Fields!LVS_Web.Value, "P"))
That worked... to a degree, but now everything is a percent. I'm thinking ELSE here, but I don't know how ELSE goes in; I've not once seen the word ELSE.
Update 2: The thing that I've noticed is that in the statement, where it says "=0.00, "0%"," that part doesn't even really apply. I just put that there because I'm new to this and needed an argument involved. I took the 0% and changed it to N on the condition that the number was < .99, hoping I would catch all of the decimals that fell below the value of 1, like "$.23", which later became 23.45%. So I COULD do that, but what I don't understand is that it made everything else "N" instead of a number. Why is that? Why doesn't it make everything else "P"?
I'm losing my damned mind.
There is also the fact that this information is being pulled from a stored procedure. I don't know too much about those quite yet; I get assigned simple tasks every so often as a stepping stone for learning. I don't know what the query was, and I couldn't edit it if I wanted to. This can be done with expression formatting, but my expression is too broad; I get mixed results using Greater or Less than, and it's probably not the wisest thing to use since these numbers are not set in stone. My day is almost done and I've made very little progress, but I had a good lunch. So, success.
So I provided my own answer for this problem, and it works. Thanks, me. Thanks also to all who tried to help me, and did help as well. I appreciate the effort strangers will put out for each other.
I've had a new problem develop: I need to display a time relative to the data being pulled. I can put NOW in there and get today's date, but if someone is pulling information from February, they may be a little put off by the current date. I'll probably get this figured out soon, but if anyone can help in the meantime, I'd appreciate it.
A standard principle is to separate data from display: use the Value property to store the data in its native data type, and use the Format property to display it how you want. So rather than formatting inside the Value property with an expression such as =Format(Fields!SomeField.Value, "0.00%"), leave the Value as =Fields!SomeField.Value and set the Format property to P2.
This is especially important when exporting your report to Excel, because if you have the right data type for your data it will export to Excel as the right data type. If you use the Format function it will export as text, making sorting and formulas not work properly.
The easiest way to control the formatting is to use the standard numeric formats. Click on the cell or range of cells that you want to have a certain format and set the Format property. These formats are a specifier letter followed by an optional digit for precision (the number of decimal places). Some useful ones are:
C Currency with 2 decimal places (by default)
N4 Number with 4 decimal places
P0 Percentage with no decimal places
Click on the link above for the full list. Format the number cells as numbers and the percents as percents - you don't need to try to make one format string fit every cell.
These standard numeric formats also respect regional settings. You should set your report's Language property to =User!Language to use the user's regional settings rather than the report server's.
If the number is already multiplied by 100, e.g. 9.5 should be shown as 9.5%, then use the format:
0.00\%
9.5 -> 9.5%
0.34 -> 0.34%
This way you can use the standard number formatting and just add the % to the end. The \ escapes the %, preventing the multiply-by-100 that % normally triggers in formatting (which would make 9.5 show as 950%).
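That multiply-by-100 convention isn't unique to SSRS; most formatting APIs treat percent values as fractions. As an analogy only (this is TypeScript, not an SSRS expression), Intl.NumberFormat behaves the same way:

// style: "percent" scales by 100 before appending "%",
// exactly the behavior the \% escape avoids in an SSRS format string
const pct = new Intl.NumberFormat("en-US", {
  style: "percent",
  minimumFractionDigits: 2,
});
console.log(pct.format(0.095)); // "9.50%"   - a fraction is scaled up
console.log(pct.format(9.5));   // "950.00%" - an already-scaled value is scaled again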
=IIF(Fields!Metric.Value = "Gross Profit %",
     Format(Fields!LVS_Web.Value, "P"),
     IIF(Fields!Metric.Value = "Order Count",
         Format(Fields!LVS_Web.Value, "G4"),
         Format(Fields!LVS_Web.Value, "C")))
This is what saved me and did what I wanted. There is another error, but it's my boss's fault, so now I get to laugh at him. Thanks, everyone.
Source:
https://technet.microsoft.com/en-us/library/bb630415(v=sql.100).aspx
This one is simple to use:
Percent of (the sum of line item totals for the current scope) / (the sum of line item totals for the "Sales" dataset).
The value is formatted using FormatPercent, specifying one decimal place.
="Percentage contributing to all sales: " & FormatPercent(Sum(Fields!LineTotal.Value)/Sum(Fields!LineTotal.Value,"Sales"),1)

understanding getByteTimeDomainData and getByteFrequencyData in web audio

The documentation for both of these methods is very generic wherever I look. I would like to know what exactly I'm looking at with the returned arrays I'm getting from each method.
For getByteTimeDomainData, what time period is covered with each pass? I believe most oscilloscopes cover a 32-millisecond span for each pass. Is that what is covered here as well? For the actual element values themselves, the range seems to be 0-255. Is this equivalent to -1 to +1 volts?
For getByteFrequencyData, the frequencies covered are based on the sampling rate, so each index is an actual frequency, but what about the actual element values themselves? Is there a dB range that is equivalent to the values returned in the returned array?
getByteTimeDomainData (and the newer getFloatTimeDomainData) return an array of the size you requested - that is, frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so if it's the default 2048 fftSize, frequencyBinCount will be 1024, and if your device is running at 44.1kHz, that will equate to around 23ms of data.
The byte values do range between 0-255, and yes, that maps to -1 to +1, so 128 is zero. (It's not volts, but full-range unitless values.)
If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the values are mapped based on minDecibels/maxDecibels (see the minDecibels/maxDecibels description).
Mozilla's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below. The Mozilla docs reference the Web Audio experiment voice-change-o-matic, which illustrates the conceptual difference to me (it only works in my Firefox browser; it does not work in my Chrome browser).
TimeDomain/getFloatTimeDomainData
TimeDomain functions are over some span of time.
We often visualize TimeDomain data using oscilloscopes.
In other words:
we visualize TimeDomain data with a line chart,
where the x-axis (aka the "original domain") is time
and the y axis is a measure of a signal (aka the "amplitude").
Change the voice-change-o-matic "visualizer setting" to Sinewave to
see getFloatTimeDomainData(...)
Frequency/getFloatFrequencyData
Frequency functions (getByteFrequencyData) are at a point in time, i.e. right now: "the current frequency data"
We sometimes see these in mp3 players/ "winamp bargraph style" music players (aka "equalizer" visualizations).
In other words:
we visualize Frequency data with a bar graph
where the x-axis (aka "domain") are frequencies or frequency bands
and the y-axis is the strength of each frequency band
Change the voice-change-o-matic "visualizer setting" to Frequency bars to see getFloatFrequencyData(...)
Fourier Transform (aka Fast Fourier Transform/FFT)
Another way to think about "time domain vs frequency" is shown in the diagram below, from the Fast Fourier Transform Wikipedia article:
getFloatTimeDomainData gives you the chart on the top (x-axis is Time)
getFloatFrequencyData gives you the chart on the bottom (x-axis is Frequency)
A Fast Fourier Transform (FFT) converts the time-domain data into frequency data; in other words, the FFT converts the first chart into the second chart.
cwilso has it backwards.
The time data array is the longer one (fftSize), and the frequency data array is the shorter one (half that, frequencyBinCount).
An fftSize of 2048 at the usual sample rate of 44.1kHz means each sample has a duration of 1/44100 of a second; you have 2048 samples at hand, and thus are covering a duration of 2048/44100 seconds, which is about 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, and the computation 1024/44100 is, in this context, about as meaningful as adding your birth date to the fftSize.
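A minimal TypeScript sketch wiring up an AnalyserNode with the array sizes described above (standard Web Audio API calls; the microphone is just one possible source):

// Analyser fed from a microphone stream
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
analyser.fftSize = 2048; // must be a power of two

// Time-domain buffer holds fftSize samples: 2048 / 44100 ≈ 46 ms at 44.1 kHz
const timeData = new Uint8Array(analyser.fftSize);
// Frequency buffer holds frequencyBinCount = fftSize / 2 = 1024 bins,
// spanning 0 Hz up to the Nyquist frequency (sampleRate / 2)
const freqData = new Uint8Array(analyser.frequencyBinCount);

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  ctx.createMediaStreamSource(stream).connect(analyser);
  const draw = () => {
    analyser.getByteTimeDomainData(timeData); // 0-255, 128 = zero amplitude
    analyser.getByteFrequencyData(freqData);  // 0-255, mapped from minDecibels/maxDecibels
    requestAnimationFrame(draw);
  };
  draw();
});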
A little math illustrating what's happening: the Fourier transform is a 'vector space isomorphism', that is, a mapping going bijectively (i.e., reversibly) between 2 vector spaces of the same dimension: the 'time domain' and the 'frequency domain'. The vector space dimension we have here (in both cases) is fftSize.
So where does the 'half' come from? The frequency domain coefficients 'count double'. Either because they 'actually are' complex numbers, or because you have the 'sin' and the 'cos' flavor. Or, because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are 3 ways to say the same in a different jargon, so to speak.)
I don't know why the API only gives us half of the relevant numbers when it comes to frequency - I can only guess. And my guess is that those are the 'magnitude' numbers, and the 'phase' numbers are thrown out. The reason that this is my guess is that in applications, magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if some expert who actually knows (and isn't guessing) can confirm that it's indeed the magnitude. Or - even better (I love to learn) - correct me.
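A supporting detail from the definition of the DFT: for a real-valued input of length N, the coefficients obey
X[N - k] = conj(X[k])
so the bins above N/2 mirror the ones below and carry no new information. That leaves N/2 independent complex bins, each with a magnitude and a phase; an array of N/2 real values can hold the magnitudes but not the phases, which is consistent with the guess above.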

roundings with Access

With Microsoft Access 2010, I have two Single fields:
A = 1.1
B = 2.1
I create a query where I have defined C = A * B.
Microsoft Access says that C = 2.30999994277954,
but, in reality, C = 2.31.
How can I get the right result (2.31)?
Slightly off results from operations performed on decimal values can happen if your numeric field size is single or double rather than decimal. Single and double (or floating point) numbers are very close approximations of the "true" numbers, but should not be relied upon if accuracy in operations is required. A related stackoverflow question has more information about this issue: Access comparing floating-point numbers "incorrectly"
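You can reproduce the exact artifact outside Access. A quick TypeScript check, using Math.fround to force single-precision arithmetic the way a Single field does:

// Math.fround rounds a number to the nearest 32-bit float
const a = Math.fround(1.1); // 1.100000023841858  - 1.1 has no exact binary form
const b = Math.fround(2.1); // 2.0999999046325684
const c = Math.fround(a * b);
console.log(c); // 2.309999942779541 - the same value Access reports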
If it's possible to modify the underlying table's design, you should change the field size property for the "A" and "B" fields from single to decimal. After changing the field size BUT BEFORE saving the table, you will also need to adjust the Scale property for "A" and "B" from 0 to whatever number of places to the right of the decimal point you might require. You will likely still have a notice about losing data, but if you adjust the field properties correctly before saving the table, this shouldn't be a problem. You should probably make a copy of the table before doing this so that you can verify that there was no data loss. After saving your table and verifying the changes did not result in data loss, your query should represent A * B accurately.

Function to dampen a value

I have a list of documents each having a relevance score for a search query. I need older documents to have their relevance score dampened, to try to introduce their date in the ranking process. I already tried fiddling with functions such as 1/(1+date_difference), but the reciprocal function is too discriminating for close recent dates.
I was thinking maybe a mathematical function with range (0..1) and domain (0..x) to scale their score, where the x-axis is the age of a document. It's best to explain what I further need from the function with an image:
Decaying behavior is often modeled well by an exponential function (many decaying processes in nature follow it). You would use 2 positive parameters A and B and get
y(x) = A exp(-B x)
Since you want a y-range of [0,1], set A = 1. Larger B gives a faster decay.
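As a small sketch in TypeScript (the half-life parameterization is just a convenient way to choose B; the names are mine):

// Exponential decay with A = 1: y(x) = exp(-B * x).
// Choosing B via a "half-life" makes tuning concrete:
// after halfLifeDays, the multiplier is exactly 0.5.
function decayWeight(ageDays: number, halfLifeDays: number): number {
  const B = Math.LN2 / halfLifeDays;
  return Math.exp(-B * ageDays);
}

const dampened = 0.87 * decayWeight(30, 90); // 30-day-old doc, 90-day half-life
// decayWeight(30, 90) ≈ 0.794, so a relevance score of 0.87 becomes ≈ 0.691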
If a simple 1/(1+x) decreases too quickly too soon, a sigmoid function like 1/(1+e^-x) or the error function might be better suited to your purpose. Let the current date be somewhere in the negative numbers for such a function, and you can get a value that is current for some configurable time and then decreases towards a base value.
log((x+1)-age_of_document)
Where the base of the logarithm is (x+1). Note the x is as per your diagram and is the "threshold". If the age of the document is greater than x the score goes negative. Multiply by the maximum possible score to introduce scaling.
E.g. domain = (0,10) with a maximum score of 10: 10*(log(11 - age_of_document))/log(11)
A bit late, but as thiton says, you might want to use a sigmoid function instead, since it has a "floor" value for your long tail data points. E.g.:
0.8/(1+5^(x-3)) + 0.2 - You can adjust the constants 5 and 3 to control the slope of the curve. The 0.2 is where the floor will be.
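That shape as a small TypeScript helper (the constants are generalized into parameters; the defaults mirror the numbers above):

// Sigmoid dampening with a floor:
// floor + (1 - floor) / (1 + base^(age - midpoint))
// With floor=0.2, base=5, midpoint=3 this is exactly 0.8/(1+5^(x-3)) + 0.2
function sigmoidWeight(age: number, floor = 0.2, base = 5, midpoint = 3): number {
  return floor + (1 - floor) / (1 + Math.pow(base, age - midpoint));
}

console.log(sigmoidWeight(0)); // ≈ 0.994 - recent docs keep almost all their score
console.log(sigmoidWeight(3)); // 0.6     - halfway between ceiling and floor
console.log(sigmoidWeight(8)); // ≈ 0.200 - old docs bottom out at the floor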