UCUM representation of liter - units-of-measurement

I'm confused about how the UCUM defines the symbol for "liter". Yes I'm aware that historically the symbol l has been used, and that more recently L has been added by standards bodies (see e.g. BIPM SI 8th Edition) as an alternative to l. But I thought the UCUM was supposed to give us a single, unambiguous set of symbols for interchange.
But reading UCUM more closely, I see that it provides both case-sensitive and case-insensitive versions of symbols. Moreover I see that, for case-sensitivity, "liter" is defined twice, once with a case-sensitive symbol of l and another with a case-sensitive symbol of L. So the way I interpret this is that, if you're in a case-sensitive environment, there are actually two liter symbols, l and L, and they both mean the same thing (effectively making the symbol case-insensitive---sheesh!).
So if I'm interpreting this correctly, it means that even if a program supports UCUM in a case-sensitive manner, it must always interpret l and L as synonyms, including in derived units such as ml/mL. Is that a correct interpretation? Is UCUM forcing us to do equivalency lookups for certain symbols?
Interpreting the UCUM correctly has direct consequences for its implementations. Take for example JSR 363 (see UCUM UnitFormat for JSR 363), which superseded JSR 275 (which purported to support UCUM but never did); UCUM support has since been pulled out and moved to Eclipse UOMo (read the horrible story). So I'm stuck with the JSR 363 Reference Implementation, which serializes milliliter as ml. So when UOMo finally adds UCUM support for JSR 363, will it recognize the "ml" serialization from the JSR 363 RI and "mL" from UCUM as interchangeable?
Werner Keil tells me (in the previously mentioned thread) that the UCUM considers l and L to be "two different distinct units". But ucum.js considers them to be the same units.
So here is my specific question: I'm using the JSR 363 RI to serialize units, which produces l for liters. I would prefer a UCUM implementation, but that's all that's available right now. If I use this implementation, it will produce loads of data using l instead of L. When my code is finally upgraded to a UCUM implementation, will it consider the currently serialized l data to be equivalent to L, or will it consider the data to be distinct from data using L units? What does the UCUM specification say should happen?
Let me ask this another way: let's say I'm going to write my own UCUM implementation of JSR 363. If my UnitFormat.parse(CharSequence csq) parses l and L, should the resulting unitLowercaseL.equals(unitUppercaseL) return true or false according to the UCUM specification and JSR 363?


ECMAScript 2017, 5 Notational Conventions: What are productions, terminal and nonterminal symbols? [duplicate]

Can someone explain to me what a context free grammar is? After looking at the Wikipedia entry and then the Wikipedia entry on formal grammar, I am left utterly and totally befuddled. Would someone be so kind as to explain what these things are?
I am wondering this because I wish to investigate parsing, and also on the side, the limitation of a regex engine.
I'm not sure if these terms are directly programming related, or if they are related more to linguistics in general. If that is the case, I apologize, perhaps this could be moved if so?
A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.
A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols... very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.
Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:
S -> BBB
B -> 0
B -> 1
The way to interpret this is to say that S can be replaced by BBB, and B can be replaced by 0, and B can be replaced by 1. So to construct the string 010 we can do S -> BBB -> 0BB -> 01B -> 010.
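To make the derivation process concrete, here is a small C sketch (illustrative only, not part of the grammar formalism) that mechanically applies the two productions above, always rewriting the leftmost non-terminal until only terminals are left:

#include <stdio.h>
#include <string.h>

// Enumerate every string derivable from S -> BBB, B -> 0, B -> 1
// by repeatedly replacing the leftmost non-terminal ('S' or 'B').
static void derive(const char *form) {
    const char *nt = strpbrk(form, "SB");   // leftmost non-terminal, if any
    if (nt == NULL) {                       // only terminals left: a word of the language
        printf("%s\n", form);
        return;
    }
    int pos = (int)(nt - form);
    char next[16];
    if (*nt == 'S') {                       // apply S -> BBB
        snprintf(next, sizeof next, "%.*sBBB%s", pos, form, nt + 1);
        derive(next);
    } else {                                // apply B -> 0, then B -> 1
        snprintf(next, sizeof next, "%.*s0%s", pos, form, nt + 1);
        derive(next);
        snprintf(next, sizeof next, "%.*s1%s", pos, form, nt + 1);
        derive(next);
    }
}

int main(void) {
    derive("S");   // prints 000, 001, ..., 111
    return 0;
}

Running it prints exactly the eight strings of the language {000, 001, ..., 111} from the earlier example.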
A context-free grammar is simply a grammar where the thing that you're replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can't appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like
S -> AB
AB -> 1
A -> AA
B -> 0
are not context-free, since they contain rules like AB -> 1 that have more than one non-terminal symbol on the left.
Language theory is related to the theory of computation, which is the more philosophical side of computer science: deciding which programs are possible, which will ever be possible to write, and which problems are impossible to write an algorithm to solve.
A regular expression is a way of describing a regular language. A regular language is a language which can be decided by a deterministic finite automaton.
You should read the article on Finite State Machines: http://en.wikipedia.org/wiki/Finite_state_machine
And Regular languages:
http://en.wikipedia.org/wiki/Regular_language
All regular languages are context-free languages, but there are context-free languages that are not regular. A context-free language is the set of all strings accepted by a context-free grammar or by a pushdown automaton, which is a finite state machine with a single stack: http://en.wikipedia.org/wiki/Pushdown_automaton#PDA_and_Context-free_Languages
There are more complicated languages that require a Turing Machine (Any possible program you can write on your computer) to decide if a string is in the language or not.
Language theory is also very related to the P vs. NP problem, and some other interesting stuff.
My third-year Introduction to Computer Science textbook was pretty good at explaining this stuff: Introduction to the Theory of Computation by Michael Sipser. But it cost me something like $160 new and it's not very big, so maybe you can find a used copy or a copy at a library; it might help you.
EDIT:
The limitations of Regular Expressions and higher language classes have been researched a ton over the past 50 years or so. You might be interested in the pumping lemma for regular languages. It is a means of proving that a certain language is not regular:
http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages
If a language isn't regular, it may be context-free, which means it can be described by a context-free grammar, or it may be in an even higher language class. You can prove that a language is not context-free with the pumping lemma for context-free languages, which is similar to the one for regular languages.
A language can even be undecidable, which means that even a Turing machine (any program your computer can run) can't be programmed to decide whether a string should be accepted as part of the language or rejected.
I think the parts you're most interested in are finite state machines (both deterministic and nondeterministic), to see which languages a regular expression can decide, and the pumping lemma, to prove which languages are not regular.
Basically, a language isn't regular if recognizing it requires some sort of memory or the ability to count. The language of matching parentheses is not regular, for example, because the machine needs to remember how many parentheses it has opened in order to know whether each one gets closed.
The language of all strings using the letters a and b that contain at least three b's is a regular language; abababa is one example.
The language of all strings using the letters a and b that contain more b's than a's is not regular.
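Both non-regular examples above fail for the same reason: the recognizer has to count, and a finite automaton's fixed number of states cannot count arbitrarily high. A minimal C sketch of the matching-parentheses case (illustrative only, not a formal proof):

#include <stdbool.h>
#include <stdio.h>

// Recognizing balanced parentheses needs a counter that can grow without
// bound: exactly the kind of memory a finite automaton does not have.
static bool balanced(const char *s) {
    long depth = 0;                    // unbounded counter
    for (; *s != '\0'; ++s) {
        if (*s == '(') depth++;
        else if (*s == ')' && --depth < 0) return false;   // closed too early
    }
    return depth == 0;                 // every '(' was closed
}

int main(void) {
    printf("%d %d\n", balanced("(()())"), balanced("())("));   // prints 1 0
    return 0;
}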
Also, you should note that all finite languages are regular. For example:
The language of all strings less than 50 characters long using the letters a and b that contain more b's than a's is regular; since it is finite, we know it could be described as (b|abb|bab|bba|aabbb|ababb|...) and so on, until all the possible combinations are listed.

Is HTML Turing Complete?

After reading this question Is CSS Turing complete? -- which received a few thoughtful, succinct answers -- it made me wonder: Is HTML Turing Complete?
Although the short answer is a definitive Yes or No, please also provide a short description or counter-example to prove whether HTML is or is not Turing Complete (obviously it cannot be both). Information on other versions of HTML may be interesting, but the correct answer should answer this for HTML5.
By itself (without CSS or JS), HTML (5 or otherwise) cannot possibly be Turing-complete because it is not a machine. Asking whether it is or not is essentially equivalent to asking whether an apple or an orange is Turing complete, or to take a more relevant example, a book.
HTML is not something that "runs". It is a representation. It is a format. It is an information encoding. Not being a machine, it cannot compute anything on its own, at the level of Turing completeness or any other level.
It seems clear to me that states and transitions can be represented in HTML with pages and hyperlinks, respectively. With this, one can implement deterministic finite automata where clicking links transitions between states. For example, I implemented a few simple DFA which are accessible here.
DFA are much simpler than Turing machines, though. To implement something closer to a TM, an additional mechanism involving reading and writing memory would be necessary, besides the basic states/transitions functionality. However, HTML does not seem to have that kind of feature. So I would say HTML is not Turing-complete, but it is able to simulate DFA.
Edit1: I was reminded of the video On The Turing Completeness of PowerPoint when writing this answer.
Edit2: complementing this answer with the DFA definition and clarification.
Edit3: it might be worth mentioning that any machine in the real world is a finite-state machine due to reality's constraint of finite memory. So in a way, DFA can actually do anything that any real machine can do, as far as I know. See: https://en.wikipedia.org/wiki/Turing_machine#Comparison_with_real_machines
Definition
From https://en.wikipedia.org/wiki/Deterministic_finite_automaton#Formal_definition
In the theory of computation, a branch of theoretical computer
science, a deterministic finite automaton (DFA)—also known as
deterministic finite acceptor (DFA), deterministic finite-state
machine (DFSM), or deterministic finite-state automaton (DFSA)—is a
finite-state machine that accepts or rejects a given string of
symbols, by running through a state sequence uniquely determined by
the string.
A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F),
consisting of
a finite set of states Q
a finite set of input symbols called the alphabet Σ
a transition function δ : Q × Σ → Q
an initial or start state q0
a set of accept states F
The following example is of a DFA M, with a binary alphabet, which
requires that the input contains an even number of 0s.
M = (Q, Σ, δ, q0, F) where
Q = {S1, S2}
Σ = {0, 1}
q0 = S1
F = {S1} and
δ is defined by the following state transition table:
        0    1
  S1    S2   S1
  S2    S1   S2
State diagram for M: (diagram not reproduced here)
The state S1 represents that there has been an even number of 0s in
the input so far, while S2 signifies an odd number. A 1 in the input
does not change the state of the automaton. When the input ends, the
state will show whether the input contained an even number of 0s or
not. If the input did contain an even number of 0s, M will finish in
state S1, an accepting state, so the input string will be accepted.
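For readers who prefer code to tables, here is a minimal C sketch of the same automaton M (names are made up for illustration); the HTML implementation below encodes exactly these transitions as pages and links:

#include <stdbool.h>
#include <stdio.h>

// The DFA M: accept binary strings containing an even number of 0s.
// S1 = "even number of 0s so far", S2 = "odd number of 0s so far".
enum state { S1, S2 };

static bool accepts(const char *input) {
    enum state q = S1;                    // start state q0 = S1
    for (; *input != '\0'; ++input) {
        if (*input == '0')                // a 0 toggles between S1 and S2
            q = (q == S1) ? S2 : S1;
        // a 1 leaves the state unchanged
    }
    return q == S1;                       // accept states F = {S1}
}

int main(void) {
    printf("%d %d\n", accepts("1001"), accepts("1011"));   // prints 1 0
    return 0;
}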
HTML implementation
The DFA M exemplified above plus a few of the most basic DFA were implemented in Markdown and converted/hosted as HTML pages by Github, accessible here.
Following the definition of M, its HTML implementation is detailed as follows.
The set of states Q contains the pages s1.html and s2.html, and also the acceptance page acc.html and the rejection page rej.html. These two additional states are a "user-friendly" way to communicate the acceptance of a word and don't affect the semantics of the DFA.
The set of symbols Σ is defined as the symbols 0 and 1. The empty string symbol ε was also included to denote the end of the input, leading to either acc.html or rej.html state.
The initial state q0 is s1.html.
The set of accept states is {acc.html}.
The set of transitions is defined by hyperlinks such that page s1.html contains a link with text "0" leading to s2.html, a link with text "1" leading to s1.html, and a link with text "ε" leading to acc.html. Each page is analogous, following the transition table below. Note: acc.html and rej.html don't contain links.
             0          1          ε
  s1.html    s2.html    s1.html    acc.html
  s2.html    s1.html    s2.html    rej.html
Questions
In what ways are those HTML pages "machines"? Don't these machines include the browser and the person who clicks the links? In what way does a link perform computation?
DFA is an abstract machine, i.e. a mathematical object. By the definition shown above, it is a tuple that defines transition rules between states according to a set of symbols. A real-world implementation of these rules (i.e. who keeps track of the current state, looks up the transition table and updates the current state accordingly) is then outside the scope of the definition. And for that matter, a Turing machine is a similar tuple with a few more elements to it.
As described above, the HTML implementation represents the DFA M in full: every state and every transition is represented by a page and a link respectively. Browsers, clicks and CPUs are then irrelevant in the context of the DFA.
In other words, as written by @Not_Here in the comments:
Rules don't innately implement themselves, they're just rules an
implementation should follow. Consider it this way: Turing machines
aren't actual machines, Turing didn't build machines. They're purely
mathematical objects, they're tuples of sets (state, symbols) and a
transition function between states. Turing machines are purely
mathematical objects, they're sets of instructions for how to
implement a computation, and so is this example in HTML.
The Wikipedia article on abstract machines:
An abstract machine, also called an abstract computer, is a
theoretical computer used for defining a model of computation.
Abstraction of computing processes is used in both the computer
science and computer engineering disciplines and usually assumes a
discrete time paradigm.
In the theory of computation, abstract machines are often used in
thought experiments regarding computability or to analyze the
complexity of algorithms (see computational complexity theory). A
typical abstract machine consists of a definition in terms of input,
output, and the set of allowable operations used to turn the former
into the latter. The best-known example is the Turing machine.
Some have claimed to implement Rule 110, a cellular automaton, using pure HTML and CSS (no JavaScript). You can see a video here, or browse the source of one implementation.
Why is this relevant? It has been proven that Rule 110 is itself Turing complete, meaning that it can simulate any Turing machine. If we can implement Rule 110 using pure HTML and CSS, it follows that HTML (with CSS) can simulate any Turing machine via its simulation of that particular cellular automaton.
The critiques of this HTML "proof" focus on the fact that human input is required to drive the operation of the HTML machine. As seen in the video above, the human's input is constrained to a repeating pattern of Tab + Space (because the HTML machine consists of a series of checkboxes). Much as a Turing machine would require a clock signal and motive force to move its read/write head if it were to be implemented as a physical machine, the HTML machine needs energy input from the human -- but no information input, and crucially, no decision making.
In summary: HTML (with CSS) is probably Turing-complete, as this construction suggests.

Why are leading zeroes used to represent octal numbers?

I've always wondered why leading zeroes (0) are used to represent octal numbers, instead of — for example — 0o. The use of 0o would be just as helpful, but would not cause as many problems as leading 0s do (e.g. parseInt('08'); in JavaScript). What are the reason(s) behind this design choice?
All modern languages import this convention from C, which imported it from B, which imported it from BCPL.
Except BCPL used #1234 for octal and #x1234 for hexadecimal. B departed from this convention because # was a unary operator in B (integer to floating point conversion), so #1234 could not be used, and # as a base indicator was replaced with 0.
The designers of B tried to make the syntax very compact. I guess this is the reason they did not use a two-character prefix.
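For reference, this is the inherited C behaviour the question is asking about; the leading 0 silently changes the base of an integer literal:

#include <stdio.h>

int main(void) {
    int dec = 10;     // decimal ten
    int oct = 010;    // leading 0: octal, i.e. eight
    int hex = 0x10;   // 0x prefix: hexadecimal, i.e. sixteen
    printf("%d %d %d\n", dec, oct, hex);   // prints: 10 8 16
    return 0;
}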
Worth noting that in Python 3.0, they decided that octal literals must be prefixed with '0o' and the old '0' prefix became a SyntaxError, for the exact reasons you mention in your question:
https://www.python.org/dev/peps/pep-3127/#removal-of-old-octal-syntax
"0b" is often used for binary rather than for octal. The leading "0" is, I suspect for "O -ctal".
If you know the string is meant to be decimal, use parseInt('08', 10); the explicit radix makes it treat the number as base ten.

Boolean operations on CUDA

My application needs to perform bit-vector operations like OR and XOR on bit-vectors.
e.g. suppose array A = 000100101 (a.k.a. a bit vector)
B = 100101010
A . B = 100101111
Does CUDA support boolean variables, e.g. bool as in C? If yes, how is it stored and operated on? Does it also support bit-vector operations? I couldn't find the answer in the CUDA Programming Guide.
CUDA supports the standard C++ bool, but in C++ it is only a type guaranteed to support two states, so bit operations shouldn't be used on it. In CUDA, as in C++, you get the standard complement of bitwise operators for integral types (and, or, xor, complement and left and right shift). Ideally you should aim to use a 32 bit type (or a packed 32 bit CUDA vector type) for memory throughput reasons.
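As a sketch of the packed 32-bit word approach suggested above (shown here as plain host-side C; the same |, ^ and & operators are available inside CUDA device code, typically with one thread handling one 32-bit word):

#include <inttypes.h>
#include <stdio.h>

#define NWORDS 4   // 4 * 32 = 128 bits per bit-vector

// Combine two packed bit-vectors word by word.
static void bitvec_or(const uint32_t *a, const uint32_t *b, uint32_t *out) {
    for (int i = 0; i < NWORDS; ++i) out[i] = a[i] | b[i];
}

static void bitvec_xor(const uint32_t *a, const uint32_t *b, uint32_t *out) {
    for (int i = 0; i < NWORDS; ++i) out[i] = a[i] ^ b[i];
}

int main(void) {
    uint32_t a[NWORDS] = {0x025u};   // ...000100101 (the question's A)
    uint32_t b[NWORDS] = {0x12Au};   // ...100101010 (the question's B)
    uint32_t r[NWORDS];
    bitvec_or(a, b, r);
    printf("OR  = 0x%08" PRIX32 "\n", r[0]);   // 0x0000012F = ...100101111
    bitvec_xor(a, b, r);
    printf("XOR = 0x%08" PRIX32 "\n", r[0]);   // 0x0000010F
    return 0;
}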

Suggested reading for BITS/Bytes and sample code to perform operations etc

Need a refresher on bits/bytes, hex notation and how it relates to programming (C# preferred).
Looking for a good reading list (online preferably).
There are several layers to consider here:
Electronic
In the electronic paradigm, everything is a wire.
A single wire represents a single bit.
0 is the LOW voltage, 1 is the HIGH voltage. The voltages may be [0, 5], [-3.3, 3], [-5, 5], [0, 1.3], etc. The key thing is that there are only two voltage levels, which control the action of the transistors.
A byte is a collection of wires (to be precise, it's probably collected in a set of flip-flops called registers, but let's leave it as "wires" for now).
Programming
A bit is 0 or 1.
A byte is - in modern systems - 8 bits. Ancient systems might have had 10-bit bytes or other sizes; they don't exist today.
A nybble is 4 bits; half a byte.
Hexadecimal is an efficient representation of binary: one hex digit stands for 4 bits, so F maps to 1111, which is more compact than writing 15. It is also much clearer when you write several byte values in a row: FF is unambiguous, while 1515 can be read several different ways.
Historically, octal (base 8) has also been used. However, the only place where I have come across it is in Unix file permissions.
Since, at the electronic layer, it is most efficient to collect memory in groups of 2^n, hex is a natural notation for representing memory. Further, if you happen to work at the driver level, you may need to control a specific bit, which will require the use of bit-level operators. It is much clearer which bits are HIGH if you write F & outputByte than 15 & outputByte.
In general, much of modern programming does not need to concern itself with binary and hexadecimal. However, if you are in a place where you need to know it, there is no slipping by: you really need to know it then.
Particular areas that need the knowledge of binary include: embedded systems, driver writing, operating system writing, network protocols, and compression algorithms.
While you wanted C#, C# is really not the right language for bit-level manipulation. Traditionally, C and C++ are the languages used for bit work. Erlang works with bit manipulation, and Perl supports it as well. VHDL is completely bit-oriented, but it is fairly difficult to work with from the typical programming perspective.
Here is some sample C code for performing different logical operations:
unsigned char a = 0x0C, b = 0x0A, c;  // example operands: 0000 1100 and 0000 1010
c = a ^ b;      // XOR
c = a & b;      // AND
c = a | b;      // OR
c = ~(a & b);   // NAND (NOT AND)
c = ~a;         // NOT
c = a << 2;     // left shift 2 places
c = a >> 2;     // right shift 2 places
A bit is either 1 or 0.
A byte is 8 bits.
Each hex digit represents 4 bits, using the characters 0-F:
0000 is 0
0001 is 1
0010 is 2
0011 is 3
...
1110 is E
1111 is F
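If you want to produce that table yourself, a few shifts and masks in C will do it (illustrative only):

#include <stdio.h>

int main(void) {
    // Print each 4-bit pattern next to its hex digit, 0000..1111 -> 0..F.
    for (unsigned v = 0; v < 16; ++v)
        printf("%u%u%u%u is %X\n",
               (v >> 3) & 1u, (v >> 2) & 1u, (v >> 1) & 1u, v & 1u, v);
    return 0;
}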
There's a pretty good intro to C#'s bit-munching operations here
Here is some basic reading: http://www.learn-c.com/data_lines.htm
Bits and bytes hardly ever relate to C#, since the CLR handles memory by itself. There are classes and methods for handling hex notation and all those things in the framework too. But it is still a fun read.
Write Great Code is a good primer on this topic, among others; it brings you from the bare metal up to higher-level languages.