ANSI aggregate functions for SQL - mysql

Looking at ANSI aggregate functions for SQL, I can't find anything for strings. However, each database seems to have its own, e.g. GROUP_CONCAT and LISTAGG in MySQL and Oracle respectively, making portability a little difficult. Is there something I am missing? Is there a reason for this?

ANSI has adopted listagg() as the standard. I would not hold my breath waiting for other databases to change their function, though.
String aggregation was either viewed as unimportant originally or the committee could not decide on an appropriate standard.
Here is an interesting perspective on the issue regarding Postgres. I would caution against reading too much into Oracle controlling the standards committee (unless the author has inside information). IBM has also been very active, and DB2 supports listagg().

An aggregate function for this is specified in the standard, LISTAGG; it was specified in SQL:2016 (ISO/IEC 9075-2:2016) in section 10.9 <aggregate function>. The fact that each database has its own is because it wasn't standardized in earlier versions of the standard.
Why it wasn't standardized before would be guessing at the reasons, deliberations, and arguments of the standardization committee, which, as far as I know, are not publicly available. Either it wasn't considered important enough, or the committee couldn't come to an agreement on syntax and behaviour in earlier versions.
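To make the portability point concrete, here is a small sketch using Python's stdlib sqlite3 module: SQLite (like MySQL) spells string aggregation group_concat(), while Oracle and SQL:2016 spell it LISTAGG. The table and data are invented for illustration.

```python
import sqlite3

# SQLite/MySQL flavour of string aggregation: group_concat().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("sales", "alice"), ("sales", "bob"), ("hr", "carol")])

rows = conn.execute(
    "SELECT dept, group_concat(name, ', ') FROM emp "
    "GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)  # one concatenated string per department

# The SQL:2016 / Oracle spelling of the same query would be:
#   SELECT dept, LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name)
#   FROM emp GROUP BY dept;
# Neither spelling is accepted by the other engine, which is
# exactly the portability problem the question describes.
```

Note that SQLite does not guarantee the order of elements inside group_concat(), another subtle portability hazard.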

Related

NoSQL Query Language, UnQL? N1QL? CouchBase, C embedded library

I'm investigating DocumentDBs, and I'm checking out the query-side options. I know nothing has been firmly established yet, but I have a few questions I haven't yet seen fully answered.
Did Couchbase drop out of UnQL and then develop N1QL? Does this mean that they see N1QL as a more appropriate query language, or does it extend what was set out in UnQL? Was anything actually formally standardized?
Is anyone allowed to implement N1QL? Is it an open de facto standard, or something patented in some way?
Regarding your first question...
N1QL is based on SQL, which is an ISO standard. Some of the language extensions like NEST/UNNEST and array comprehensions have been used and/or proposed elsewhere.
N1QL is not based on UnQL, but addresses some of the same needs with the advantage of being SQL.

Triple Stores vs Relational Databases [closed]

I was wondering what are the advantages of using Triple Stores over a relational database?
The viewpoint of the CTO of a company that extensively uses RDF Triplestores commercially:
Schema flexibility - it's possible to do the equivalent of a schema change to an RDF store live, and without any downtime, or redesign - it's not a free lunch, you need to be careful with how your software works, but it's a pretty easy thing to do.
More modern - RDF stores are typically queried over HTTP, so it's very easy to fit them into service architectures without hacky bridging solutions or performance penalties. They also handle internationalised content better than typical SQL databases - e.g. you can have multiple values in different languages.
Standardisation - the level of standardisation of implementations using RDF and SPARQL is much higher than SQL. It's possible to swap out one triplestore for another, though you have to be careful you're not stepping outside the standards. Moving data between stores is easy, as they all speak the same language.
Expressivity - it's much easier to model complex data in RDF than in SQL, and the query language makes it easier to do things like LEFT JOINs (called OPTIONAL in SPARQL). Conversely though, if your data is very tabular, then SQL is much easier.
Provenance - SPARQL lets you track where each piece of information came from, and you can store metadata about it, letting you easily do sophisticated queries that only take into account data from certain sources, with a certain trust level, or from some date range, etc.
There are downsides though. SQL databases are generally much more mature and have more features than typical RDF databases. Things like transactions are often much more crude, or non-existent. Also, the cost per unit of information stored in RDF vs. SQL is noticeably higher. It's hard to generalise, but it can be significant if you have a lot of data - though at least in our case it's an overall benefit financially, given the flexibility and power.
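The OPTIONAL/LEFT JOIN correspondence mentioned above can be sketched with Python's stdlib sqlite3 module. The tables and data here are invented for illustration; the SPARQL in the comment is the analogous query shape, not runnable code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE email  (person_id INTEGER, address TEXT);
    INSERT INTO person VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO email  VALUES (1, 'ada@example.org');
""")

# SQL: a LEFT JOIN keeps every person, with NULL where no email exists.
rows = conn.execute("""
    SELECT p.name, e.address
    FROM person p LEFT JOIN email e ON e.person_id = p.id
    ORDER BY p.name
""").fetchall()
print(rows)  # [('ada', 'ada@example.org'), ('grace', None)]

# The analogous SPARQL keeps every ?name and leaves ?address
# unbound where the optional pattern has no match:
#   SELECT ?name ?address WHERE {
#     ?p foaf:name ?name .
#     OPTIONAL { ?p ex:email ?address }
#   }
```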
Both commenters are correct, especially since Semantic Web is not a database, it's a bit more general than that.
But I guess you might mean triple store, rather than Semantic Web in general, as triple store v. relational database is a somewhat more meaningful comparison. I'll preface the rest of my answer by noting that I'm not an expert in relational database systems, but I have a little bit of knowledge about triple stores.
Triple (or quad) stores are basically databases for data on the semantic web, particularly RDF. That's kind of where the similarity between triple stores and relational databases ends. Both store data, both have query languages, both can be used to build applications on top of; so I guess if you squint your eyes, they're pretty similar. But the type of data each stores is quite different, so the two technologies optimize for different use cases and data structures, and they're not really interchangeable.
A lot of people have done work on overlaying a triples view of the world on top of a relational database, and that can work, but it will generally be slower than a system dedicated to storing and retrieving triples. Part of the problem is that SPARQL, the standard query language used by triple stores, can require a lot of self-joins, something relational databases are not optimized for. If you look at benchmarks such as SP2B, you can see that Oracle, which just overlays SPARQL support on its relational system, runs in the middle or at the back of the pack when compared with systems that more natively support RDF.
Of course, the RDF systems would probably get crushed by Oracle if they were doing SQL queries over relational data. But that's kind of the point, you pick the tool that's well suited for the application you want to build.
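The self-join point can be made concrete: when triples live in a single relational table, every additional triple pattern in a query becomes another self-join on that table. A minimal sketch with Python's stdlib sqlite3 module (the schema and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
    ("alice", "knows", "bob"),
    ("alice", "age",   "30"),
    ("bob",   "age",   "25"),
])

# SPARQL-style query with two patterns:
#   SELECT ?who ?age WHERE { :alice :knows ?who . ?who :age ?age }
# Over a single triple table, each pattern is one alias of the
# table, joined to the others -- a self-join per extra pattern:
rows = conn.execute("""
    SELECT t1.o, t2.o
    FROM triple t1
    JOIN triple t2 ON t2.s = t1.o
    WHERE t1.s = 'alice' AND t1.p = 'knows' AND t2.p = 'age'
""").fetchall()
print(rows)  # [('bob', '25')]
```

A realistic SPARQL query with five or six patterns turns into five or six self-joins, which is the workload relational optimizers tend to handle poorly.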
So if you're thinking about building a semantic web application, or just trying to get some familiarity in the area, I'd recommend ultimately going with a dedicated triple store.
I won't delve into reasoning and how that plays into query answering in triple stores, as that's yet another discussion, but it's another important distinction between relational systems and triple stores that do reasoning.
Some triplestores (Virtuoso, Jena SDB) are based on relational databases and simply provide an RDF / SPARQL interface. So to rephrase the question slightly: are triplestores built from the ground up as triplestores more performant than those that aren't? #steve-harris definitely knows the answer to that ;) but I wager a yes.
Secondly, what features do triplestores have that RDBMS don't? The simple answer is support for SPARQL, RDF, OWL etc. (i.e. the Semantic Web technology stack), and to make it a fair fight, it's better to define the value of SPARQL based on SPARQL 1.1 (it has considerably more features than 1.0). This provides support for federation (so, so cool), property path expressions and entailment regimes, along with a standard set of update and graph management protocols (which SPARQL 1.0 didn't have and sorely lacked). Also, #steve-harris points out that transactions are not part of the standard (can of worms), although many vendors provide non-standardised mechanisms for transactions (Virtuoso supports JDBC and Hibernate-compliant connection pooling and management, along with all the transactional features of Hibernate).
The big drawback in my mind is that not many triplestores support all of SPARQL 1.1 (since it is still not in recommendation) and this is where the real benefits lie.
Having said that, I am and always have been an advocate of substituting RDBMS with triplestores and platforms I deliver run entirely off triplestores (Volkswagen in my last role was an example of this), deprecating the need for RDBMS. An additional advantage is that Object to RDF mapping is more flexible and provides more options and flexibility than traditional ORM (also known as putting a square peg in a round hole).
Also you can still use a database but use RDF as a data exchange format which is very flexible.

What is ST in PostGIS?

Almost all the functions in PostGIS start with ST. e.g. ST_Distance_Sphere, ST_GeomFromText, ST_Intersection, etc.
What does ST mean?
http://www.postgis.org/documentation/manual-svn/PostGIS_Special_Functions_Index.html
From the manual:
PostGIS has begun a transition from the existing naming convention to an SQL-MM-centric convention. As a result, most of the functions that you know and love have been renamed using the standard spatial type (ST) prefix. Previous functions are still available, though are not listed in this document where updated functions are equivalent. These will be deprecated in a future release.
Originally, it was for spatial and temporal data. From http://doesen0.informatik.uni-leipzig.de/proceedings/paper/68.pdf:
The SQL/MM standard uses consistently the prefix ST for all tables, views, types, methods, and function names. The prefix stood originally for Spatial and Temporal. It was intended in the early stages of the standard development to define a combination of temporal and spatial extension. A reason for that was that spatial information is very often tied with temporal data... During the development of SQL/MM Spatial, it was decided that temporal has a broader scope beyond the spatial application... The contributors to SQL/MM did not want to move forward with a Spatio-temporal support until SQL/Temporal developed.
... Today, one might want to interpret it as Spatial Type.

Is it wise to rely on default features of a programming language?

Should I frequently rely on default values?
For example, in PHP, if you have the following:
<?php
$var .= "Value";
?>
This is perfectly fine - it works. But what if assignment like this to a previously unused variable is later eliminated from the language? (I'm not referring to just general assignment to an unused variable.)
There are countless examples where the default value of something has changed, and much existing code was broken as a result.
On the other hand, without defaults, there is a lot of code redundancy.
What is the proper way of dealing with this?
In my opinion, the odds of a language changing drastically once it reaches a certain level of acceptance are pretty low.
To me, each language comes with a (sometimes more or less) unique set of features. Not using those because they just might disappear some day seems shortsighted. Naturally, don't use esoteric features just for the sake of doing so -- make sure you follow usual principles of readability and best practices for your language of choice, but otherwise I see no need to discriminate against particular features.
Default-value features of a programming language, if actually a documented part of the standard rather than just an accident of the implementation (which many past "default initializations" have been), are no different from any other features of a programming language. You might as well ask if it's wise to rely on anything else in the language, and the answer regardless of wisdom is that there's no choice -- you have to rely on something, and anything could hypothetically be changed in a future version.
Of course, if the thing that you're relying on is a commonly-used feature of the language, rather than an odd corner case, then there's a lot more chance that it will be retained in future versions. In addition, if you're concerned about such things, it's wise to choose a well-established language that has a history of maintaining backwards compatibility. Some languages take great pains to make sure that older code runs in the new version of the language, and some less so.
So, that's the general answer. The specific answer about default values is that it depends on the particular case, the language in question, and so forth. You can have absolute ironclad reliance on the fact that global static variables in C will be zero at program start. Some other cases are rather notably less reliable.
If something is a defined feature of a language, it's probably pretty safe to rely on it.
Where you should be wary of relying on functionality is when you get into areas where you're dealing with behavior that is either unspecified, or if you're "misusing" a feature by using it for something completely other than what it was intended for.
Since "default value of a string" is pretty much a mainline scenario, it's probably safe. What's more dangerous is that if that variable happens to have been assigned earlier, you can get hit by it carrying an unexpected value as an unintended side effect.
Basically, if you're not abusing a language feature, or relying on undefined behavior, you should probably worry more about unintended side effects in your code than you should worry about the language changing - especially if the language is mature.

Does a "thin data access layer" mainly imply writing SQL by hand?

When you say "thin data access layer", does this mainly mean you are talking about writing your SQL manually as opposed to relying on an ORM tool to generate it for you?
That probably depends on who says it, but for me a thin data access layer would imply that there is little to no additional logic in the layer (i.e. data storage abstractions), probably no support for targeting multiple RDBMS, no layer-specific caching, no advanced error handling (retry, failover), etc.
Since ORM tools tend to supply many of those things, a solution with an ORM would probably not be considered "thin". Many home-grown data access layers would also not be considered "thin" if they provide features such as the ones listed above.
Depends on how we define the word "thin". It's one of the most abused terms I hear, rivaled only by "lightweight".
That's one way to define it, but perhaps not the best. An ORM layer does a lot besides just generating SQL for you (e.g., caching, marking "dirty" fields, etc.). That "thin" layer written in lovingly crafted SQL can become pretty bloated by the time you implement all the features an ORM is providing.
I think "thin" in this context means:
It is lightweight;
It has a low performance overhead; and
You write minimal code.
Writing SQL certainly fits this bill, but there's no reason it couldn't be an ORM either, although most ORMs that spring to mind don't strike me as lightweight.
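One way to picture such a "thin" layer: a handful of functions that run hand-written SQL and return plain rows, with no caching, no change tracking, and no dialect abstraction. A hypothetical sketch using Python's stdlib sqlite3 module (the user table and function names are invented for illustration):

```python
import sqlite3
from typing import Optional

def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be indexed by column name
    return conn

# The whole "layer": hand-written, parameterized SQL and plain rows.
# No caching, no dirty tracking, no multi-RDBMS support.
def insert_user(conn: sqlite3.Connection, name: str, email: str) -> int:
    cur = conn.execute(
        "INSERT INTO user (name, email) VALUES (?, ?)", (name, email))
    conn.commit()
    return cur.lastrowid

def find_user(conn: sqlite3.Connection, user_id: int) -> Optional[sqlite3.Row]:
    return conn.execute(
        "SELECT id, name, email FROM user WHERE id = ?", (user_id,)
    ).fetchone()

conn = connect()
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
uid = insert_user(conn, "ada", "ada@example.org")
row = find_user(conn, uid)
print(row["name"], row["email"])  # ada ada@example.org
```

Everything an ORM would add on top of this (identity maps, lazy loading, dialect translation) is exactly what makes a layer stop being "thin".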
I think it depends on the context.
It could very well mean that, or it may simply mean that your business objects map directly onto a simple underlying relational table structure: one table per class, one column per class attribute, so that the translation of business object structure to database table structure is "thin" (i.e. not complex). This could still be handled by an ORM, of course.
It may mean that there is no or minimal logic employed on the database such as avoiding the use of stored procedures. As other people have mentioned it depends on the statement's context as to the most likely meaning.
I thought data access layers were always supposed to be thin... DALs aren't really the place to have logic.
Maybe the person you talked to is talking about a combination of a business layer and a data access layer; where the business layer is non-existent (e.g. a very simple app, or perhaps all of the business rules are in the database, etc).