What is ST in PostGIS?

Almost all the functions in PostGIS start with ST, e.g. ST_Distance_Sphere, ST_GeomFromText, ST_Intersection, etc.
What does ST mean?
http://www.postgis.org/documentation/manual-svn/PostGIS_Special_Functions_Index.html

From the manual:
PostGIS has begun a transition from the existing naming convention to
an SQL-MM-centric convention. As a result, most of the functions that
you know and love have been renamed using the standard spatial type
(ST) prefix. Previous functions are still available, though are not
listed in this document where updated functions are equivalent. These
will be deprecated in a future release.

Originally, it was for spatial and temporal data. From http://doesen0.informatik.uni-leipzig.de/proceedings/paper/68.pdf:
The SQL/MM standard uses consistently the prefix ST for all tables, views, types, methods,
and function names. The prefix stood originally for Spatial and Temporal. It was intended
in the early stages of the standard development to define a combination of temporal
and spatial extension. A reason for that was that spatial information is very often tied with
temporal data... During the development of SQL/MM
Spatial, it was decided that temporal has a broader scope beyond the spatial application... The contributors to SQL/MM did not want to move forward with a Spatio-temporal support until
SQL/Temporal developed.
... Today, one might want to interpret it as Spatial Type.
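To see the convention in practice, here is a minimal sketch of calling two of the ST_-prefixed functions from Python. The psycopg2 driver and the connection string are my own assumptions for illustration; only the ST_GeomFromText and ST_Distance calls come from PostGIS itself.

# Minimal sketch: calling ST_-prefixed PostGIS functions from Python.
# psycopg2 and the connection details are assumptions, not part of the question.
import psycopg2

conn = psycopg2.connect("dbname=gis user=gis")  # hypothetical connection string
cur = conn.cursor()

# ST_GeomFromText builds geometries; ST_Distance measures between them.
# Both follow the SQL-MM "spatial type" (ST_) naming convention.
cur.execute("""
    SELECT ST_Distance(
        ST_GeomFromText('POINT(0 0)', 4326),
        ST_GeomFromText('POINT(3 4)', 4326)
    )
""")
print(cur.fetchone()[0])  # 5.0 (planar distance in degrees for geometry inputs)

cur.close()
conn.close()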

Related

ANSI aggregate functions for SQL

I'm looking at ANSI aggregate functions for SQL and I can't find anything for strings. However, each database seems to have its own, e.g. GROUP_CONCAT and LISTAGG for MySQL and Oracle respectively, making portability a little difficult. Is there something I am missing? Is there a reason for this?
ANSI has adopted listagg() as the standard. I would not hold my breath waiting for other databases to change their function, though.
String aggregation was either viewed as unimportant originally or the committee could not decide on an appropriate standard.
Here is an interesting perspective on the issue regarding Postgres. I would caution against reading too much into Oracle controlling the standards committee (unless the author has inside information). IBM has also been very active, and DB2 supports listagg().
An aggregate function for this is specified in the standard: LISTAGG, added in SQL:2016 (ISO/IEC 9075-2:2016), section 10.9 <aggregate function>. The fact that each database has its own is because it wasn't standardized in earlier versions of the standard.
Why it wasn't standardized earlier would be guessing at the reasons, deliberations and arguments of the standardization committee, which - as far as I know - are not publicly available. Either it wasn't considered important enough, or the committee couldn't come to an agreement on syntax and behaviour in earlier versions.
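To make the portability point concrete, here is a small sketch using Python's built-in sqlite3 module (my own choice of database; SQLite is not mentioned in the question). SQLite and MySQL spell the function GROUP_CONCAT, Oracle and DB2 use LISTAGG, and PostgreSQL uses string_agg.

# String aggregation sketch using SQLite's GROUP_CONCAT.
# The standardized SQL:2016 name is LISTAGG; PostgreSQL spells it string_agg.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (dept TEXT, name TEXT);
    INSERT INTO emp VALUES ('sales', 'Ann'), ('sales', 'Bob'), ('hr', 'Cid');
""")

# SQLite / MySQL syntax:
for dept, names in conn.execute(
    "SELECT dept, GROUP_CONCAT(name, ', ') FROM emp GROUP BY dept ORDER BY dept"
):
    print(dept, "->", names)   # e.g. hr -> Cid, then sales -> Ann, Bob

# Oracle / DB2 / SQL:2016 equivalent (not runnable in SQLite):
#   SELECT dept, LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) FROM emp GROUP BY dept;
# PostgreSQL equivalent:
#   SELECT dept, string_agg(name, ', ' ORDER BY name) FROM emp GROUP BY dept;

conn.close()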

What is differentiable programming?

Native support for differentiable programming has been added to Swift for the Swift for TensorFlow project. Julia has something similar with Zygote.
What exactly is differentiable programming?
what does it enable? Wikipedia says
the programs can be differentiated throughout
but what does that mean?
how would one use it (e.g. a simple example)?
and how does it relate to automatic differentiation (the two seem conflated a lot of the time)?
I like to think about this question in terms of user-facing features (differentiable programming) vs implementation details (automatic differentiation).
From a user's perspective:
"Differentiable programming" is APIs for differentiation. An example is a def gradient(f) higher-order function for computing the gradient of f. These APIs may be first-class language features, or implemented in and provided by libraries.
"Automatic differentiation" is an implementation detail for automatically computing derivative functions. There are many techniques (e.g. source code transformation, operator overloading) and multiple modes (e.g. forward-mode, reverse-mode).
Explained in code:
def f(x):
    return x * x * x

grad_f = gradient(f)   # ∇f
print(grad_f(4))       # 48.0

# Using the `gradient` API:
# ▶ differentiable programming.
# How `gradient` works to compute the gradient of `f`:
# ▶ automatic differentiation.
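For a runnable counterpart to the pseudocode above, here is a small sketch using JAX (my own choice; the answer itself is library-agnostic). jax.grad is exactly such a user-facing gradient API, backed by automatic differentiation under the hood.

# Runnable sketch of the same idea using JAX's gradient API.
# jax.grad returns a new function that computes df/dx via automatic differentiation.
import jax

def f(x):
    return x * x * x

grad_f = jax.grad(f)
print(grad_f(4.0))  # 48.0, since d/dx x^3 = 3x^2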
I had never heard the term "differentiable programming" before reading your question. However, having used the concepts noted in your references, both from the side of writing code to solve a derivative with symbolic differentiation and with automatic differentiation, and having written interpreters and compilers, to me this just means that they have made it easier to calculate the numeric value of the derivative of a function. I don't know if they made it a first-class citizen, but the new way doesn't require a function/method call; it is done with syntax, and the compiler/interpreter hides the translation into calls.
If you look at the Zygote example, it clearly shows the use of prime notation:
julia> f(10), f'(10)
Most seasoned programmers would guess what I just noted even if there were no research paper explaining it. In other words, it is just that obvious.
Another way to think about it is this: if you have ever tried to calculate a derivative in a programming language, you know how hard it can be at times, so ask yourself why they (the language designers and programmers) don't just add it to the language. In these cases, they did.
What surprises me is how long it took before derivatives became available via syntax instead of calls, but if you have ever worked with scientific code or coded neural networks at that level, then you will understand why this is a concept that is being touted as something of value.
Also I would not view this as another programming paradigm, but I am sure it will be added to the list.
How does it relate to automatic differentiation (the two seem conflated a lot of the time)?
In both cases that you referenced, they use automatic differentiation to calculate the derivative instead of using symbolic differentiation. I do not view differentiable programming and automatic differentiation as two distinct sets; rather, differentiable programming needs a means of being implemented, and the way they chose was automatic differentiation. They could have chosen symbolic differentiation or some other means.
It seems you are trying to read more into what differentiable programming is than it really is. It is not a new way of programming, but just a nice feature added for doing derivatives.
Perhaps if they had named it differentiable syntax it might have been clearer. The use of the word programming gives it more panache than I think it deserves.
EDIT
After skimming the Swift Differentiable Programming Mega-Proposal and trying to compare it with the Julia example using Zygote, I would have to split the answer into parts that talk about Zygote and then switch gears to talk about Swift. They each took a different path, but the commonality and bottom line is that the languages know something about differentiation, which makes the job of coding with it easier and hopefully produces fewer errors.
About the Wikipedia quote that
the programs can be differentiated throughout
At first reading it seems like nonsense, or at least lacks enough detail to understand it in context, which is why I am sure you asked.
After many years of digging into what others are trying to communicate, one learns to take a source with a grain of salt unless it has been peer reviewed, and, unless it is absolutely necessary to understand, to just ignore it. In this case, if you ignore that sentence, most of what your reference says makes sense. However, I take it that you want an answer, so let's try to figure out what it means.
The key word that has me perplexed is throughout. Since you note the statement came from Wikipedia, and Wikipedia gives three references for the statement, a search shows that the word throughout appears in only one of them:
∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing
Thus, since our ∂P system does not require primitives to handle new
types, this means that almost all functions and types defined
throughout the language are automatically supported by Zygote, and
users can easily accelerate specific functions as they deem necessary.
So my take on this is that by going back to the source, e.g. the paper, you can better understand how that percolated up into Wikipedia, but it seems that the meaning was lost along the way.
In this case, if you really want to know the meaning of that statement, you should ask on the Wikipedia talk page or ask the author of the statement directly.
Also note that the paper referenced is not peer reviewed, so the statements in there may not have any meaning amongst peers at present. As I said, I would just ignore it and get on with writing wonderful code.
You can guess its definition from the application of differentiability.
It's been used for optimization, i.e. to calculate a minimum or maximum value.
Many such problems can be solved by finding the appropriate function and then using calculus techniques to find the required maximum or minimum.
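As a small illustration of that point (my own sketch, not part of the original answer): once derivatives are available, a minimum can be found with plain gradient descent.

# Gradient-descent sketch: minimize f(x) = (x - 3)^2 using its derivative.
# With differentiable programming, f_prime would come from the language or library
# instead of being written by hand, as it is here.
def f(x):
    return (x - 3.0) ** 2

def f_prime(x):
    return 2.0 * (x - 3.0)

x = 0.0
for _ in range(100):
    x -= 0.1 * f_prime(x)    # step against the gradient

print(round(x, 4))  # ~3.0, the minimizer of f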

Triple Stores vs Relational Databases [closed]

I was wondering what are the advantages of using Triple Stores over a relational database?
The viewpoint of the CTO of a company that extensively uses RDF Triplestores commercially:
Schema flexibility - it's possible to do the equivalent of a schema change to an RDF store live, and without any downtime, or redesign - it's not a free lunch, you need to be careful with how your software works, but it's a pretty easy thing to do.
More modern - RDF stores are typically queried over HTTP, so it's very easy to fit them into service architectures without hacky bridging solutions or performance penalties. They also handle internationalised content better than typical SQL databases - e.g. you can have multiple values in different languages.
Standardisation - the level of standardisation of implementations using RDF and SPARQL is much higher than SQL. It's possible to swap out one triplestore for another, though you have to be careful you're not stepping outside the standards. Moving data between stores is easy, as they all speak the same language.
Expressivity - it's much easier to model complex data in RDF than in SQL, and the query language makes it easier to do things like LEFT JOINs (called OPTIONAL in SPARQL); see the SPARQL sketch below. Conversely though, if your data is very tabular, then SQL is much easier.
Provenance - SPARQL lets you track where each piece of information came from, and you can store metadata about it, letting you easily do sophisticated queries that only take into account data from certain sources, with a certain trust level, or from some date range, etc.
There are downsides though. SQL databases are generally much more mature and have more features than typical RDF databases. Things like transactions are often much more crude, or non-existent. Also, the cost per unit of information stored in RDF vs. SQL is noticeably higher. It's hard to generalise, but it can be significant if you have a lot of data - though at least in our case it's an overall benefit financially, given the flexibility and power.
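As a concrete illustration of the expressivity point above (my own sketch; the answer does not name a library), here is a SPARQL query with an OPTIONAL clause, run through Python's rdflib. The OPTIONAL block behaves like a SQL LEFT JOIN: rows without a match are kept, with the missing variable unbound.

# SPARQL OPTIONAL (the analogue of a SQL LEFT JOIN) via rdflib.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.alice, EX.name, Literal("Alice")))
g.add((EX.alice, EX.email, Literal("alice@example.org")))
g.add((EX.bob, EX.name, Literal("Bob")))          # Bob has no email

results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?name ?email WHERE {
        ?person ex:name ?name .
        OPTIONAL { ?person ex:email ?email }      # keep people without an email
    }
""")
for name, email in results:
    print(name, email)   # Alice alice@example.org, and Bob None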
Both commenters are correct, especially since the Semantic Web is not a database; it's a bit more general than that.
But I guess you might mean triple store, rather than Semantic Web in general, as triple store v. relational database is a somewhat more meaningful comparison. I'll preface the rest of my answer by noting that I'm not an expert in relational database systems, but I have a little bit of knowledge about triple stores.
Triple (or quad) stores are basically databases for data on the semantic web, particularly RDF. That's kind of where the similarity between triple stores and relational databases ends. Both store data, both have query languages, both can be used to build applications on top of; so I guess if you squint your eyes, they're pretty similar. But the type of data each stores is quite different, so the two technologies optimize for different use cases and data structures, and they're not really interchangeable.
A lot of people have done work on overlaying a triples view of the world on top of a relational database, and that can work, but it will also be slower than a system dedicated to storing and retrieving triples. Part of the problem is that SPARQL, the standard query language used by triple stores, can require a lot of self-joins, something relational databases are not optimized for. If you look at benchmarks such as SP2B, you can see that Oracle, which just overlays SPARQL support on its relational system, runs in the middle or at the back of the pack when compared with systems that more natively support RDF.
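To illustrate the self-join point (a sketch of my own, not from the answer): a naive relational layout keeps every triple in one table, so even a two-pattern SPARQL query turns into a self-join on that table.

# Why SPARQL over a naive relational layout means self-joins:
# one triples(s, p, o) table, joined to itself once per triple pattern.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
    INSERT INTO triples VALUES
        ('alice', 'knows', 'bob'),
        ('bob',   'name',  'Bob');
""")

# SPARQL:  SELECT ?n WHERE { :alice :knows ?x . ?x :name ?n }
# becomes a self-join, with one copy of the table per triple pattern:
rows = conn.execute("""
    SELECT t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.s = 'alice' AND t1.p = 'knows' AND t2.p = 'name'
""").fetchall()
print(rows)  # [('Bob',)]

conn.close()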
Of course, the RDF systems would probably get crushed by Oracle if they were doing SQL queries over relational data. But that's kind of the point, you pick the tool that's well suited for the application you want to build.
So if you're thinking about building a semantic web application, or just trying to get some familiarity in the area, I'd recommend ultimately going with a dedicated triple store.
I won't delve into reasoning and how that plays into query answering in triple stores, as that's yet another discussion, but it's another important distinction between relational systems and triple stores that do reasoning.
Some triplestores (Virtuoso, Jena SDB) are based on relational databases and simply provide an RDF/SPARQL interface. So, to rephrase the question slightly: are triplestores built from the ground up as triplestores more performant than those that aren't? #steve-harris definitely knows the answer to that ;) but I wager a yes.
Secondly, what features do triplestores have that RDBMS don't? The simple answer is support for SPARQL, RDF, OWL etc. (i.e. the Semantic Web technology stack), and to make it a fair fight, it's better to define the value of SPARQL based on SPARQL 1.1 (it has considerably more features than 1.0). This provides support for federation (so, so cool), property path expressions and entailment regimes, along with a standard set of update and graph management protocols (which SPARQL 1.0 didn't have and sorely lacked). Also, #steve-harris points out that transactions are not part of the standard (can of worms), although many vendors provide non-standardised mechanisms for transactions (Virtuoso supports JDBC- and Hibernate-compliant connection pooling and management, along with all the transactional features of Hibernate).
The big drawback in my mind is that not many triplestores support all of SPARQL 1.1 (since it is not yet a Recommendation), and this is where the real benefits lie.
Having said that, I am and always have been an advocate of substituting triplestores for RDBMS, and the platforms I deliver run entirely off triplestores (Volkswagen in my last role was an example of this), removing the need for an RDBMS. An additional advantage is that object-to-RDF mapping is more flexible and provides more options and flexibility than traditional ORM (also known as putting a square peg in a round hole).
Also, you can still use a relational database but use RDF as a data exchange format, which is very flexible.

Which FOSS RDBMS to use for geospatial data?

I'm developing an application using Google Maps API. The goal is to geocode certain locations and then allow users to search for these locations based on which ones are nearest to the user (e.g. "Thing x is 20 miles from you").
In MySQL, I can just store the geo-coordinates and use the haversine formula to do the distance calculations. Someone has suggested I consider Postgres because it has "better support for geographical data."
So, the question is: what are the pros and cons of using MySQL or Postgres?
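For reference, a minimal version of the haversine calculation mentioned in the question (my own sketch; in MySQL the same formula would typically live inside the SQL query itself):

# Haversine great-circle distance between two (lat, lon) points, in kilometres.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))   # 6371 km is roughly the Earth's mean radius

print(round(haversine_km(51.5074, -0.1278, 48.8566, 2.3522)))  # London to Paris, ~344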
PostgreSQL has support for what you are talking about on board. But for a lot more functionality (and for what the "someone" you mention was probably thinking about) turn to PostGIS.
See the home page, documentation, or start at good old Wikipedia for an overview.
Edit after question in comment:
In particular, see the function support matrix to get an impression of what PostGIS can do for you.
Computing the distance between two points is a standard feature. You can have that for a variety of data types. Which data type to use? See this question in the FAQ and further links there.
If it is just points, MySQL is fine. If you have more complex geometries, like delivery routes, or cell reception areas, or whatever, you want PostGIS, because it supports more sophisticated indexing of geometric data (r-trees). MyISAM is actually better than InnoDB for spatial data, BTW, because it also supports r-tree spatial indexes (but not queries as powerful as PostGIS's). If you just need points, though, InnoDB or MyISAM b-trees are adequate. If bounding boxes are enough (i.e., you need everything within a rectangular region around some point), then geohash-based indexes are OK. More background on all that here. It is well worth the trouble getting familiar with PostGIS and Postgres, as they are both remarkably good projects and by far my preferred relational DB, but just looking up points does not require them.
You could also check out MongoDB; it supports geospatial indexing, letting you query for nearest objects very effectively. That's what the guys at Foursquare use to find nearest venues...

Tools to help reverse engineer binary file formats

What tools are available to aid in decoding unknown binary data formats?
I know Hex Workshop and 010 Editor both support structures. These are okay to a limited extent for a known fixed format but get difficult to use with anything more complicated, especially for unknown formats. I guess I'm looking for a module for a scripting language or a scriptable GUI tool.
For example, I'd like to be able to find a structure within a block of data from limited known information, perhaps a magic number. Once I've found a structure, then follow known length and offset words to find other structures. Then repeat this recursively and iteratively where it makes sense.
In my dreams, perhaps even automatically identify possible offsets and lengths based on what I've already told the system!
Here are some tips that come to mind:
From my experience, interactive scripting languages (I use Python) can be a great help. You can write a simple framework to deal with binary streams and some simple algorithms. Then you can write scripts that will take your binary and check various things. For example:
Do some statistical analysis on various parts. Random data, for example, will tell you that a part is probably compressed or encrypted. Zeros may mean padding between parts. Scattered zeros may mean integer values or Unicode strings, and so on. Try to spot various offsets. Try to convert parts of the binary into 2- or 4-byte integers or into floats, print them and see if they make sense. Write some functions that will search for repeating or very similar parts in the data; this way you can easily spot headers.
Try to find as many strings as possible, and try different encodings (C strings, Pascal strings, UTF-8/16, etc.). There are some good tools for that (I think Hex Workshop has such a tool). Strings can tell you a lot.
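A tiny starting point along those lines (my own sketch, not tied to any particular tool): chunk the file, estimate the entropy of each chunk to spot compressed or encrypted regions, and pull out printable strings.

# Quick first-pass analysis of an unknown binary: per-chunk entropy and ASCII strings.
import math
import re
import sys

def entropy(chunk: bytes) -> float:
    # Shannon entropy in bits per byte: ~8 suggests compression/encryption, low values suggest padding.
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum(
        (chunk.count(b) / total) * math.log2(chunk.count(b) / total)
        for b in set(chunk)
    )

def strings(data: bytes, min_len: int = 4):
    # Yield (offset, text) for runs of printable ASCII, like the Unix strings tool.
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield m.start(), m.group().decode("ascii")

data = open(sys.argv[1], "rb").read()

for offset in range(0, len(data), 4096):
    print(f"0x{offset:08x}  entropy={entropy(data[offset:offset + 4096]):.2f}")

for offset, text in strings(data):
    print(f"0x{offset:08x}  {text}")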
Good luck!
For Mac OS X, there's a great tool that's even better than my iBored: Synalyze It!
(http://www.synalysis.net/)
Compared to iBored, it is better suited for non-blocked files, while also giving full control over structures, including scriptability (with Lua). And it visualizes structures better, too.
Tupni; to my knowledge not directly available out of Microsoft Research, but there is a paper about this tool which can be of interest to someone wanting to write a similar program (perhaps open source):
Tupni: Automatic Reverse Engineering of Input Formats (ACM Digital Library)
Abstract
Recent work has established the importance of automatic reverse
engineering of protocol or file format specifications. However, the
formats reverse engineered by previous tools have missed important
information that is critical for security applications. In this
paper, we present Tupni, a tool that can reverse engineer an input
format with a rich set of information, including record sequences,
record types, and input constraints. Tupni can generalize the format
specification over multiple inputs. We have implemented a
prototype of Tupni and evaluated it on 10 different formats: five
file formats (WMF, BMP, JPG, PNG and TIF) and five network
protocols (DNS, RPC, TFTP, HTTP and FTP). Tupni identified all
record sequences in the test inputs. We also show that, by aggregating
over multiple WMF files, Tupni can derive a more complete
format specification for WMF. Furthermore, we demonstrate the
utility of Tupni by using the rich information it provides for zero-day
vulnerability signature generation, which was not possible with
previous reverse engineering tools.
My own tool "iBored", which I released just recently, can do parts of this. I wrote the tool to visualize and debug file system formats (UDF, HFS, ISO9660, FAT, etc.), and implemented search, copy, and later even structure and template support. The structure support is pretty straightforward, and the templates are a way to identify structures dynamically.
The entire thing is programmable in a Visual BASIC dialect, allowing you to test values, read specific blocks, and all.
The tool is free and works on all platforms (Win, Mac, Linux), but as it's a personal tool which I just released to the public to share it, it's not much documented.
However, if you want to give it a try, and like to give feedback, I might add more useful features.
I'd even open source it, but as it's written in REALbasic, I doubt many people will join such a project.
Link: iBored home page
I still occasionally use an old hex editor called A.X.E., Advanced Hex Editor. It seems to have largely disappeared from the Internet now, though Google should still be able to find it for you. The last version I know of was version 3.4, but I've really only used the free-for-personal-use version 2.1.
Its most interesting feature, and the one I've had the most use for deciphering various game and graphics formats, is its graphical view mode. That basically just shows you the file with each byte turned into a color-coded pixel. And as simple as that sounds, it has made my reverse-engineering attempts a lot easier at times.
I suppose doing it by eye is quite the opposite of doing automatic analysis, though, and the graphical mode won't be much use for finding and following offsets...
The later version has some features that sound like they could fit your needs (scripts, regularity finder, grammar generator), but I have no idea how good they are.
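The graphical view described above is easy to approximate yourself. Here is a rough sketch (my own; the answer does not mention any library) using Pillow to render each byte of a file as one grayscale pixel:

# Render a binary file as a grayscale image, one byte per pixel,
# roughly approximating A.X.E.'s graphical view mode.
import sys
from PIL import Image

data = open(sys.argv[1], "rb").read()
width = 256                                   # arbitrary row width
height = (len(data) + width - 1) // width
data = data.ljust(width * height, b"\x00")    # pad the last row

img = Image.frombytes("L", (width, height), data)
img.save("bytes.png")                         # structure shows up as visual texture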
There is Hachoir, which is a Python library for parsing any binary format into fields which you can then browse. It has lots of parsers for common formats, but you can also write your own parsers for your files (e.g. when working with code that reads or writes binary files, I usually write a Hachoir parser first to have a debugging aid). It looks like the project is pretty much inactive by now, though.
Kaitai is an open-source language for describing binary structures in data streams. It comes with a translator that can output parsing code for many programming languages, for inclusion in your own program code.
My project icebuddha.com supports this, using Python to describe the format in the browser.
A cut'n'paste of my answer to a similar question:
One tool is WinOLS, which is designed for interpreting and editing vehicle engine management computer binary images (mostly the numeric data in their lookup tables). It has support for various endian formats (though not PDP, I think) and for viewing data at various widths and offsets, defining array areas (maps) and visualising them in 2D or 3D with all kinds of scaling and offset options. It also has a heuristic/statistical automatic map finder, which might work for you.
It's a commercial tool, but the free demo will let you do everything but save changes to the binary and use engine management features you don't need.