Formatting my csv file for creating boxplots with python - csv

I have a probably quite simple problem as I'm a beginner.
I want to create boxplots in python using seaborn, but I'm having difficulties setting up my csv file and my code to get the result I want.
I have samples that have been treated with 7 different treatments in triplicates (in total 21 columns)
For each sample I have measured the content of 43 different compounds (43 rows)
--> My goal is to create one graph for each of the compounds (43 graphs) showing seven boxplots (7 treatments) made up of the triplicate measurements, respectively.
This is what my data looks like for better understanding:
Compounds, Treatment A (Replicate 1), Treatment A (Replicate 2), Treatment A (Replicate 3), Treatment B (Replicate 1), ..., Treatment G (Replicate 3)
Compound 1
Compound 2
...
Compound 43
I would be very thankful for help! :-)
Yours,
Natalie

Related

LC-3 algorithm for converting ASCII strings to Binary Values

Figure 10.4 provides an algorithm for converting ASCII strings to binary values. Suppose the decimal number is arbitrarily long. Rather than store a table of 10 values for the thousands-place digit, another table for the 10 ten-thousands-place digit, and so on, design an algorithm to do the conversion without resorting to any tables whatsoever.
I have attached pictures of figure 10.4. I am not looking for an answer to the problem, but rather can someone please explain this problem and perhaps give some direction on how to go about creating the algorithm?
Figure 10.4
Figure 10.4 second image
I am unsure as to what it means by tables and do not know where to start really.
The tables are those global, initialized arrays: one called Lookup10 holding 10, 20, 30, 40, ..., and another called Lookup100 holding 100, 200, 300, 400...
You can ignore the tables: as per the assignment instructions you're supposed to find a different way to accomplish this anyway.  Or, you can run that code in simulator or mentally to understand how it works.
The bottom line is that LC-3, while it can do anything (it is turning complete), it can't do much in any one instruction.  For arithmetic & logic, it can do add, not, and.  That's pretty much it!  But that's enough — let's note that modern hardware does everything with only one logic gate, namely NAND, which is a binary operator (so NAND directly available; NOT by providing NAND with the same operand for both inputs; AND by doing NOT after NAND; OR using NOT on both inputs first and then NAND; etc..)
For example, LC-3 cannot multiply or divide or modulus or right shift directly — each of those operations is many instructions and in the general case, some looping construct.  Multiplication can be done by repetitive addition, and division/modulus by repetitive subtraction.  These are super inefficient for larger operands, and there are much more efficient algorithms that are also substantially more complex, so those greatly increase program complexity beyond that already with the repetitive operation approach.
That subroutine goes backwards through the use input string.  It takes a string length count in R1 as parameter supplied by caller (not shown).  It looks at the last character in the input and converts it from an ASCII character to a binary number.
(We would commonly do that conversion from ascii character to numeric value using subtraction: moving the character values from the ascii character range of 0x30..0x39 to numeric values in the range 0..9, but they do it with masking, which also works.  The subtraction approach integrates better with error detection (checking if not a valid digit character, which is not done here), whereas the masking approach is simpler for LC-3.)
The subroutine then obtains the 2nd last digit (moving backwards through the user's input string), converting that to binary using the mask approach.  That yields a number between 0 and 9, which is used as an index into the first table Lookup10.  The value obtained from the table at that index position is basically the index × 10.  So this table is a × 10 table.  The same approach is used for the third (and first or, last-going-backwards) digit, except it uses the 2nd table which is a × 100 table.
The standard approach for string to binary is called atoi (search it) standing for ascii to integer.  It moves forwards through the string, and for every new digit, it multiples the existing value, computed so far, by 10 before adding in the next digit's numeric value.
So, if the string is 456, the first it obtains 4, then because there is another digit, 4 × 10 = 40, then + 5 for 45, then × 10 for 450, then + 6 for 456, and so on.
The advantage of this approach is that it can handle any number of digits (up to overflow).  The disadvantage, of course, is that it requires multiplication, which is a complication for LC-3.
Multiplication where one operand is the constant 10 is fairly easy even in LC-3's limited capabilities, and can be done with simple addition without looping.  Basically:
n × 10 = n + n + n + n + n + n + n + n + n + n
and LC-3 can do those 9 additions in just 9 instructions.  Still, we can also observe that:
n × 10 = n × 8 + n × 2
and also that:
n × 10 = (n × 4 + n) × 2     (which is n × 5 × 2)
which can be done in just 4 instructions on LC-3 (and none of these needs looping)!
So, if you want to do this approach, you'll have to figure out how to go forwards through the string instead of backwards as the given table version does, and, how to multiply by 10 (use any one of the above suggestions).
There are other approaches as well if you study atoi.  You could keep the backwards approach, but now will have to multiply by 10, by 100, by 1000, a different factor for each successive digit .  That might be done by repetitive addition.  Or a count of how many times to multiply by 10 — e.g. n × 1000 = n × 10 × 10 × 10.

Non Scaled SSRS Line Chart with mulitple series

I am trying to present time series of multiple sensors on a single SSRS (v14) line chart
I need to plot N series, with each independently plotting the series data in the space provided by the chart (independent vertical axis)
More about the data
There can be anywhere from ~1-10 series
The challenge is that they are different orders of magnitude.
One might be degrees F (~0-212)
One might be Carbon ppm (~1-16)
One might be Ftlbs Thrust (~10k-100k)
the point is , they have no relation and can be very different
The exact value is not important. I can hide the vertical axis
More about what I am trying to do
The idea is to show the multiple time series, plotted together against time for the 4 hours before and after
'an event'. Its not the necessarily the exact value that is important. the subject matter expert would be looking for something odd (temperature falls, thrust spikes, etc).
Things I have tried
If there were just 2 series, i could easily use the 2nd axis available in the SSRS chart. Thats exactly the idea I am chasing. But in this case, I want N series to plot using its own axis.
I have tried stacking N transparent graphs on top of each other. This would be a really ugly solution, but SSRS even wont let you do it. It unstacks them for you.
I have experimented with the Allow Scale Breaks property on the Vert Axis. This would solve the problem but we don't like the 'double jagged line'
Turning on Logarithmic scale is a possibility. It does do a better job of displaying all the data. but its not really what we want. Its going to change the shape of data that ranges over a couple orders of magnitude.
I tried the sparkline component and am having the same problem.
This approach is essentially the same a Greg's answer above. I've had to do this same process in the past comparing trends of data even though the units were dissimilar.
I took a very simple approach of adding an additional column to the query that showed each value as a percentage of the maximum value in each series.
As an example (just 2 series here for clarity) I started with data like this in myTable
Series Month myValue
A Jan 4
A Feb 8
A Mar 16
B Jan 200
B Feb 300
B Mar 400
My Dataset query would be something like.
SELECT *, myValue / MAX(myValue) OVER(PARTITION BY Series) as myPlotValue FROM myTable
This gives us a final dataset which looks liek this.
Series Month myValue myPlotValue
A Jan 4 0.25
A Feb 8 0.5
A Mar 16 1
B Jan 200 0.5
B Feb 300 0.75
B Mar 400 1
As you can see all plot values are now between 0 and 1.
I created that charts using the myPlotValue field and had the option of using the original values from the myValue field as datapoint labels.
After talking to some math people, this is a standard problem and it is solved by a process called normalization of the data.
Essentially you are changing all the series to fit in a given range (usually 0-1)
You can scale and add an offset if that makes sense for your problem domain somehow.
https://www.statisticshowto.datasciencecentral.com/normalized/

Weka MLP multiple outputs

I'm using the multiplayer perceptron on Weka to classify data. Classification output should be a unique binary vector associated with certain input, e.g., 1, 1, -1, 1, -1, -1, 1. Output vector is 31-element long while input is 39-element vector of real numbers. That is, the output cannot be represented by one column in the CSV file. Rather, I should have 31 columns for output values (class) beside the 39 columns of the input. I know how to use Weka when I have one-column classes, but with such vector output I have a problem. I must have it like that because I need to compare it with MLP ANN in Matlab that hase 31 outputs in the output layer. Therefore, I cannot assign an abstract symbol for each unique combination in order to have one coloumn in my CSV. Your help is highly appreciated. Thanks in advance and have a nice day

Real world examples of a kind of "sorted" data

Consider a sorted list of numbers which is "cut," so that it is increasing except for one jump. For instance the order might be,
11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
What kinds of data naturally have this representation, with one or possibly many "cuts" obscuring the default ordering? The only one I can think of is a deck of cards, but I was asked to produce examples of data that might look like this in an interview. Weeks later, and I still can't think of any, but my curiosity prevails.
Is there a special name for this kind of data? I tried googling "cut data" but that obviously didn't work.
All insight is appreciated.
[Edit] From the discussions below this appears to have some interesting relationships with symmetry groups, and what sorts of rearrangements are possible with just the cut operation. I may have to ask my local mathematicians what I can do with this.
I can think of a few off the top of my head.
The first is the hour of the day as it rolls into a new day: ... 22 23 0 1 2 ....
The second is the alpha ordering on file names: pax1 pax10 pax11 ... pax19 pax2 pax20 ....
Yet another is the months of the financial year (in Australia, most companies close off their financial year at the end of June): 7 8 9 10 11 12 1 2 3 4 5 6.
After a quick analysis, it's obvious to see that any sequence of "cuts" results in a single cut with respect to a different index. In fact, it is only the most recent cut point that matters, as that value will end up at the front of the list, and it will be equivalent to a cut of this data from the original index of that element.
So not so interesting.

Binary to standard digit?

I'm going to make a computer in Minecraft. I understand how to build a computer where it can make binary operations but I want the outputs to be displayed as standard integer numbers. How you "convert" the binaries into standard digits? Is there any chart for that? And the digits will be shown like in old calculators; with 7 lines.
--
| |
--
| |
--
In electronics, what you need is called a "binary to binary coded decimal" converter. "Binary coded decimal" is the set of bits needed to produce a number on a 7 segment display. Here's a PDF describing how one of these chips works. Page 3 of the PDF shows the truth table needed to do the conversion as well as a picture of all of the NAND gates that implement it in hardware. You can use the truth table to build the set of boolean expressions needed in your program.
0 = 0
1 = 1
10 = 2
11 = 3
100 = 4
101 = 5
110 = 6
111 = 7
...
Do you see the pattern? Here's the formula:
number = 2^0 * (rightmost digit)
+ 2^1 * (rightmost-but-1 digit
+ 2^2 * (rightmost-but-2 digit) + ...
Maybe what you are looking for is called BCD or Binary Coded Decimal. There is a chart and a karnaugh map for it that has been used for decades. a quick Google search for it gave me this technical page
http://circuitscan.homestead.com/files/digelec/bcdto7seg.htm
How are you trying to build the computer?
Maybe that key word can at least help you find what you need. :)
Your problem has two parts:
Convert a binary number into digits, that is do a binary to BCD conversion.
Convert a digit into a set of segments to activate.
For the latter you can use a table that assigns the bitmap of active segments to each digit.
I think's that's two different questions.
There isn't a "binary string of 0/1" to integer conversion built in - you would normally just write your own to loop over the string and detect each power of 2.
YOu can also write your own 7segment LED display - it's a little tricky because it's on multiple lines, but would be an interesting excersize.
Alternatively most GUIs have an LCD font,Qt certainly does