I think this is trivial, but I'm still an SSRS newbie...
Anyway, from my data set I'm able to Group and visualize data like this:
Place |Score
=================
place1|score1
place2|score2
etc. |etc.
Every score value is a sum aggregation of values. Now, I'd like to display all these values like this:
Evaluation | # of places
===============================
Bad        | # of places with less than 40% "good" scores
Good       | # of places with more than 40% but less than 70% "good" scores
Excellent  | # of places with more than 70% "good" scores
A good score is defined by two calculated attributes being not empty. I'm really stuck here. Any suggestion?
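In plain SQL terms, just to illustrate what I'm after (the table and column names here are hypothetical, and attr1/attr2 stand in for the two calculated attributes), it would be something like:
SELECT CASE
           WHEN good_pct < 0.40 THEN 'Bad'
           WHEN good_pct < 0.70 THEN 'Good'
           ELSE 'Excellent'
       END AS evaluation,
       COUNT(*) AS places
FROM (
    -- share of each place's scores where both calculated attributes are non-empty
    SELECT place,
           AVG(CASE WHEN attr1 IS NOT NULL AND attr2 IS NOT NULL
                    THEN 1.0 ELSE 0 END) AS good_pct
    FROM scores
    GROUP BY place
) per_place
GROUP BY evaluation;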
Thank you!
So this is the code for removing partial duplicates within the same column of a dataset. However, I guess that because each row is matched against every other row, the code takes a long time to run on a dataset of even 2000 rows. Is there any way to reduce the run time?
Here's the code:
from fuzzywuzzy import fuzz, process

rows = [
    "I have your Body Wash and I wonder if it contains animal ingredients. Also, which animal ingredients? I prefer not to use product with animal ingredients.",
    "This also doesn't have the ADA on there. Is this a fake toothpaste an imitation of yours?",
    "I have your Body Wash and I wonder if it contains animal ingredients. I prefer not to use product with animal ingredients.",
    "I didn't see the ADA stamp on this box. I just want to make sure it was still safe to use?",
    "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn't say on the packaging",
    "Hello, I was just wondering if the new toothpaste is ADA approved? It doesn't say on the box.",
]

clean = []
threshold = 80  # this is arbitrary

for row in rows:
    # score each sentence against every other sentence
    # -> [('string', score), ...], sorted by score, best match (itself) first
    scores = process.extract(row, rows, scorer=fuzz.token_set_ratio)
    # basic idea: if there is a close second match we want to evaluate
    # and keep the longer of the two
    if scores[1][1] > threshold:
        clean.append(max([x[0] for x in scores[:2]], key=len))
    else:
        clean.append(scores[0][0])

# remove exact dupes
clean = set(clean)
I have a table like below and I want to return the name of the item with the greatest effect of a particular type. For example, I want the name of the ring with the best 'Shield' enchantment, in this case 'Brusef Amelion's Ring'.
Description           | Apparel slot | Effect Type     | Effect Value
======================================================================
Apron of Adroitness   | Chest        | Fortify Agility | 5 pts
Brusef Amelion's Ring | Ring         | Shield          | 18%
Cuirass of the Herald | Chest        | Fortify Health  | 15 pts
Fortify Magicka Pants | Legs         | Fortify Magicka | 20 pts
Grand ring of Aegis   | Ring         | Shield          | 6%
I've tried using a MAXIFS statement:
=maxifs(Effect Value, Apparel Slot, "=Ring", Effect Type, "=Shield")
and that returns 0.18, as I'd expect. But I want it to return the name of the item, 'Brusef Amelion's Ring'. So I then tried using a VLOOKUP on this value, but there doesn't seem to be an option to look up a value only if ('Apparel Slot'='Ring' && 'Effect Type'='Shield'), for example.
I feel like there must be a way of nesting some specific functions here, but I can't quite figure it out.
Is there any way to do this while avoiding manually sorting and filtering my data before running each query?
Is this what you are looking for?
=DGET(A1:D6,"Description",{"Apparel slot","Effect Value";"Ring",MAXIFS(D2:D6,B2:B6,"Ring",C2:C6,"Shield")})
As you can see, the DGET function needs column headers to work with; here we want the value from the Description column. In its basic form, a DGET call looks like this:
DGET(A2:F20,"price",{"Ticker";"Google"})
This is saying: Find price where ticker = google.
We can enhance this a bit by entering two criteria; they are separated by , or \ (depending on your country settings), like this:
DGET(A2:F20,"price",{"Ticker","Year";"Google",2020})
DGET(A2:F20;"price";{"Ticker"\"Year";"Google"\2020})
Find price where ticker = google and year = 2020.
Of course you can replace "Google" with a cell reference. Hope this helps.
I have a 3D cylinder chart that I am having some problems with. I want to sort the cylinders so that the highest value is at the back and the lowest value at the front; otherwise the tallest cylinders cover the smallest ones.
I have tried sorting both A-Z and Z-A, but I really need the order to be dynamic based on the values. I have also tried sorting by the actual value field, both A-Z and Z-A, but this seems to return completely random results.
The data in the database looks like the example below. I use a parameter to separate by supplier.
Date       | category_Type | cost | supplier
01/01/2013 | apple         | $5   | abc
01/01/2013 | pear          | $10  | def
01/01/2013 | banana        | $15  | cgi
01/02/2013 | apple         | $7   | etc
01/02/2013 | pear          | $12  | etc
01/02/2013 | banana        | $18  | etc
I believe I need some form of expression that sorts the values based on cost, as both A-Z and Z-A in this instance produce cylinders that block other cylinders.
I have tried sorting the series group by =Sum(Fields!cost.Value, "DataSet1") and by =Fields!cost.Value, but this seems to return random results.
I would be happy even if I could achieve a custom sort such as "banana, pear, apple", although for some suppliers this would still cause me an issue.
Edit 1: Strangely enough, this works with a line chart but not a 3D cylinder chart.
Edit 2: Attached is an example. I want the tallest cylinders at the back, but the methods mentioned above do not work.
In Chart Area Properties -> 3D Options, enable series clustering:
Choose this option to cluster series groups. When multiple series for bar or column charts are clustered, they are displayed along two distinct rows in the chart area. If series are not clustered, their corresponding data points are displayed adjacent to each other in one row. This option is applicable only to bar and column charts.
Also try changing the rotation and inclination degrees to get a better look, and decrease the wall thickness as well.
I have a MySQL database I'm searching through. Let's say this is a database of people. When querying for a specific record, it is possible to match 100% on each attribute, but the strategy here is to query the database for the closest match by probability (closest matches across table attributes).
In this scenario, does it make sense to create a temporary table (much like a tally sheet) to indicate which attributes match and which attributes are present? What is the typical approach to doing advanced searches on a database like this?
Example (below) of a hypothetical stored procedure:
*Parameters are just to exemplify how I would search. I'm not concerned with how to perform my selects; the question is about approach, strategy, technique.*
call FindPerson ("Brown Eyes", "Brown hair", "Height:6'1", "white", "Name:Joe", "weight180", "Age 34", "sex m");
RESULT TABLE
NAME AGE HEIGHT WEIGHT HAIR SKIN sex RANK_MATCH
Joe 32 6'1 180 Brown white m 1
Mike 33 6'1 179 Brown white m 2
James 31 6'0 179 Brown black m 3
Just off the top of my head: you can create your own score and sort by it. Something like
SELECT `id`,
(IF(`age`=32,1,0)+IF(`height`="6'1",1,0)+...) as `score`
FROM `people`
HAVING `score` > 0
ORDER BY `score` DESC
LIMIT 10;
With this, you can handle every field with its own comparison, and also weight the individual attributes by adding not just 1 but 2 or more.
But I'm not quite sure how performant this is.
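For example, a weighted variant of the same idea might look like this (a sketch; the weights 3 and 2 and the hair column are arbitrary, chosen for illustration):
SELECT `id`,
       (IF(`age` = 32, 3, 0)         -- an age match counts triple
      + IF(`height` = "6'1", 2, 0)   -- a height match counts double
      + IF(`hair` = "Brown", 1, 0)) AS `score`
FROM `people`
HAVING `score` > 0
ORDER BY `score` DESC
LIMIT 10;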
The approach I would use would be to create a scoring function (your stored proc) that would evaluate the given input's standard distance from the mean.
In the proc, you would judge each criteria in a fashion similar to:
INPUT AGE: 32
calculate MEAN of AGE WHERE (sex = m): 34.5
calculate STANDARD DEVIATION of AGE WHERE (sex = m): 2.5
calculate how many STDEVs 32 is from the 34.5 (also known as z-score): 1
Repeat this process for all numeric datatypes, summing them and ORDER BY the sum.
In doing so, the following schema change would be required: height changed from foot/inch form to strictly inches.
Depending on your needs, you may also consider coming up with an arbitrary scale for sex and skin color/hair color. Of course, you may think that measures like these should NOT be factored in because of how drastically they would change the scoring function. If you choose to, you'd have to find some number that would be added to the above SUM...but it's hard, because nominative variables don't translate easily into these kinds of things.
If you find that hair color/skin color can be usefully mapped onto, say, the continuous color spectrum, your scoring tidbit would be the same...the color value of the input vs the color values' means and standard deviations.
The query that would find your matches would be something to the effect of (note that MySQL won't let you reuse a column alias such as age_z inside the same SELECT list, so the z-score expressions are written out again in the sum):
SELECT ABS(INPUT_AGE - AVG(AGE)) / STD(AGE) AS age_z,
       ABS(INPUT_WT - AVG(WT)) / STD(WT) AS wt_z,
       ...
       ABS(INPUT_AGE - AVG(AGE)) / STD(AGE)
     + ABS(INPUT_WT - AVG(WT)) / STD(WT)
     + ... AS score
FROM `table`
ORDER BY score ASC
This is for a new feature on http://cssfingerprint.com (see /about for general info).
The feature looks up the sites you've visited in a database of site demographics, and tries to guess what your demographic stats are based on that.
All my demographics are in 0..1 probability format, not ratios or absolute numbers or the like.
Essentially, you have a large number of data points, each of which nudges you toward its own demographics. However, just taking the average is poor, because it means that adding in a lot of generic data pulls the number down.
For example, suppose you've visited sites S0..S50. All except S0 are 48% female; S0 is 100% male. If I'm guessing your gender, I want to have a value close to 100%, not just the 49% that a straight average would give.
Also, consider that most demographics (i.e. everything other than gender) do not have their average at 50%. For example, the average probability of having kids 0-17 is ~37%. The more a given site's demographics differ from this average (e.g. maybe it's a site for parents, or for child-free people), the more it should count in my guess of your status.
What's the best way to calculate this?
For extra credit: what's the best way to calculate this, that is also cheap & easy to do in mysql?
ETA: I think that something approximating what I want is Φ(AVG(z-score ^ 2, sign preserved)). But I'm not sure if this is a good weighting function.
(Φ is the standard normal distribution function - http://en.wikipedia.org/wiki/Standard_normal_distribution#Definition)
A good framework for these kinds of calculations is Bayesian inference. You have a prior distribution of the demographics - e.g. 50% male, 37% childless, etc. Preferably, you would have it multivariately: 10% male childless 0-17 Caucasian ..., but you can start with one at a time.
Starting from this prior, each site contributes new information about the likelihood of a demographic category, and you get a posterior estimate which informs your final guess. Under some independence assumptions the updating formula is as follows:
posterior odds = (prior odds) * (site likelihood ratio),
where odds = p/(1-p) and the likelihood ratio is a multiplier that modifies the odds after visiting the site. There are various formulas for it, but in this case I would just compute it by applying the odds formula above to both the site's population and the general population.
For example, for a site that has 35% of its visitors in the "under 20" agegroup, which represents 20% of the population, the site likelihood ratio would be
LR = (0.35/0.65) / (0.2/0.8) = 2.154
so visiting this site would raise the odds of being "under 20" 2.154-fold.
A site that is 100% male would have an infinite LR, but you would probably want to limit it somewhat by, say, using only 99.9% male. A site that is 50% male would have an LR of 1, so it would not contribute any information on gender distribution.
Suppose you start knowing nothing about a person - his or her odds of being "under 20" are 0.2/0.8 = 0.25. Suppose the first site has an LR = 2.154 for this outcome - now the odds of being "under 20" become 0.25 * 2.154 = 0.538 (corresponding to a probability of 35%). If the second site has the same LR, the posterior odds become 1.16, which is already 54%, etc. (probability = odds/(1+odds)). At the end you would pick the category with the highest posterior probability.
There are loads of caveats with these calculations - for example, the assumption of independence likely being wrong, but it can provide a good start.
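As a rough sketch of how this could run directly in MySQL (assuming a hypothetical site_stats table with one row per visited site, an under20_ratio column giving the share of that site's visitors in the category, and a population prior of 20%; EXP(SUM(LN(...))) is just a product written with aggregates):
SELECT (0.2 / 0.8)
     * EXP(SUM(LN((under20_ratio / (1 - under20_ratio)) / (0.2 / 0.8)))) AS posterior_odds
FROM site_stats;
-- probability = posterior_odds / (1 + posterior_odds);
-- clamp ratios away from exactly 0 or 1 first (e.g. to 0.999), or LN() is undefined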
The naive Bayesian formula for your case looks like this:
SELECT probability
FROM (
SELECT @apriori := CAST(@apriori * ratio / (@apriori * ratio + (1 - @apriori) * (1 - ratio)) AS DECIMAL(30, 30)) AS probability,
@step := @step + 1 AS step
FROM (
SELECT @apriori := 0.5,
@step := 0
) vars,
(
SELECT 0.99 AS ratio
UNION ALL
SELECT 0.48
UNION ALL
SELECT 0.48
UNION ALL
SELECT 0.48
UNION ALL
SELECT 0.48
UNION ALL
SELECT 0.48
UNION ALL
SELECT 0.48
UNION ALL
SELECT 0.48
) q
) q2
ORDER BY
step DESC
LIMIT 1
Quick 'n' dirty: get a male score by multiplying the male probabilities, and a female score by multiplying the female probabilities. Predict the larger. (Actually, don't multiply; sum the log of each probability instead.) I think this is a maximum likelihood estimator if you make the right (highly unrealistic) assumptions.
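In MySQL that log-sum trick could look something like this (a sketch, assuming a hypothetical visited_sites table with one row per site the user visited and a male_ratio column giving the fraction of that site's visitors who are male):
SELECT SUM(LN(male_ratio)) AS male_score,
       SUM(LN(1 - male_ratio)) AS female_score
FROM visited_sites;
-- predict male if male_score > female_score, female otherwise;
-- clamp ratios away from exactly 0 or 1 first, since LN() is undefined there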
The standard formula for calculating the weighted mean is given in this question and this question
I think you could look into these approaches and then work out how you calculate your weights.
In your gender example above you could adopt something along the lines of a set of weights {1, ..., 0, ..., 1}: a linear decrease from 1 to 0 as the gender value goes from 0% male to 50%, and then a corresponding increase back up to 1 at 100%. If you want the effect to be skewed in favour of the outlying values, you can easily come up with an exponential or trigonometric function that provides a different set of weights. If you want to, a normal distribution curve will also do the trick.
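As a sketch of the linear variant in MySQL (assuming the same hypothetical visited_sites table with a male_ratio column; the weight ABS(2 * male_ratio - 1) is 0 at 50% and rises linearly to 1 at 0% or 100%):
SELECT SUM(ABS(2 * male_ratio - 1) * male_ratio)
     / SUM(ABS(2 * male_ratio - 1)) AS weighted_male_ratio
FROM visited_sites;
-- degenerate case: if every site is exactly 50/50 the weights sum to 0 and this returns NULL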