Can a SPARQL query import or re-use a set of PREFIX definitions (namespace bindings)?

I log my explorations of RDF data by executing SPARQL queries in a Jupyter notebook (Web-based REPL environment).
Very often I create a query by copying the previous one and tweaking it, so the notebook fills up with SPARQL queries that all start with the same eight PREFIX definitions (e.g., PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>).
I keep the PREFIX list short to reduce clutter, but this means I often have to switch windows to look up some other prefix that needs to be added (e.g., PREFIX eurovoc: ...).
Is there a way to save PREFIX definitions in a file, then simply import those definitions into a query?
Alternatively, since I currently execute the queries in Jupyter Notebook using the Apache JENA command-line utility arq, I'd be happy for any command-line utility that allows a SPARQL query to be split into multiple files, or other such workarounds.
I searched for an answer to the question on stackoverflow.com and on the Web by trying many queries involving the words SPARQL, PREFIX, declaration, definition, redundant, import, re-use, reuse, namespace, binding, separate, file, and multiple, which I reproduce here in order to make this question more easily findable by others who may be asking the same question.

I've just stumbled upon this question while looking for something similar. AFAIK, SPARQL has no import directive, but there are tools like rdf.sh, which can look at all the prefixes you use in a file and build the required definitions (they're usually based on prefix.cc).
Moreover, I've seen endpoints (e.g., Fuseki) that have predefined prefix definitions and complete your SPARQL automatically, but relying on them makes queries less portable.
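One low-tech workaround, given that the queries are already run from a notebook: keep the shared PREFIX block in a file of its own and prepend it to every query before handing the result to arq. Below is a minimal sketch in Python (not a standard tool); the file names prefixes.sparql and data.ttl are example choices, and --data/--query are the usual arq options.
import subprocess
import tempfile
from pathlib import Path
# Shared PREFIX definitions, kept once in prefixes.sparql (example file name).
PREFIXES = Path("prefixes.sparql").read_text()
def run_query(body, data="data.ttl"):
    """Prepend the shared PREFIX definitions and run the query with Jena's arq."""
    with tempfile.NamedTemporaryFile("w", suffix=".rq", delete=False) as tmp:
        tmp.write(PREFIXES + "\n" + body)
        query_file = tmp.name
    result = subprocess.run(
        ["arq", f"--data={data}", f"--query={query_file}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
print(run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"))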

Related

Are there any configuration libraries that provide a native implementation for merging modifications found in split files (as seen in .d folders)?

I've been having trouble sorting through the noise while trying to find an answer to this. There are certainly a lot of libraries available for handling configuration files but what I'm looking for is an answer to whether there is a solution available for this specific kind of split configuration.
On Linux systems I've found that it's not uncommon to find a program which has split its default configuration away from user modifications by instructing the user to place a subset of the default configuration into a separate folder (commonly found with a .d suffix). These changes override what is found in the default configuration and provide a very easy way to track at a later date what has been modified.
There is a wide variety of incompatible syntaxes used across different configuration files, and I am not aware of a library that can parse them all. But if you are willing to restrict yourself to just one configuration syntax, then two answers to your question come to mind.
The first is to use a scripting language as your configuration-file syntax. Let's assume the application reads both default.cfg and override.cfg (in that order). If default.cfg contains name="john", then override.cfg might contain name="mary", which overrides the value of name. This works because the scripting-language interpreter provides a common place to store the global variables assigned within the script files. The following pseudocode shows how your application could interact with the interpreter:
interp = new Interpreter();
interp.executeFile("default.cfg");
interp.executeFile("override.cfg");
name = interp.getValueOfVariable("name");
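If the scripting language happened to be Python, a minimal sketch of the same idea looks like this (no sandboxing, and the file names are taken from the example above):
# Both config files are plain Python assignments, executed into one shared
# namespace, so later files override earlier ones.
settings = {}
for cfg_file in ("default.cfg", "override.cfg"):
    with open(cfg_file) as f:
        exec(f.read(), settings)   # e.g. default.cfg: name = "john"
                                   #      override.cfg: name = "mary"
name = settings.get("name")        # -> "mary"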
The second is to use Config4*, which is a configuration-file parser I wrote; it is available in C++ and Java. I recommend you read Chapter 2 (Overview of Config4* Syntax) in the Config4* Getting Started Guide. The "adaptive configuration" abilities of Config4* seem to fit the question I think you are asking. If, after reading Chapter 2, you agree, then read Chapter 3, which provides an overview of the C++ and Java APIs. That will probably be enough to get you started.

How to specify non-trivial metadata to drive a shake build system?

Given a non-homogeneously structured mixed-language code base, what is the recommended way to specify metadata to drive a Shake build system?
In particular, the metadata should describe source language (C++, C#, Fortran), source files, result type (static/dynamic lib, executable), compiler switches (potentially different for each artifact), etc.
Preferably, the metadata should be simply structured and stored in one separate file per artifact.
Is there a smart way to generalize the approach suggested in Defining your own build system with Shake?
The approach from the presentation scales quite far - I've used it for huge multilanguage projects. I've used three tweaks beyond the talk:
Add file extensions
Typically file extensions give you the source language and result type. For example:
mycsharp.dll = foo.cs bar.cs
myfortran.exe = main.f90 util.f90
docs.pdf = docs.tex references.bib
Now you can have entirely different rules to interpret Fortran executables, C# executables (or dlls) and PDF documents.
Add some 'leading' characters
Often you want data about flags, or other command-line relevant data. For example:
mycsharp.dll = foo.cs bar.cs -define:DEBUG -optimize +mono
I tend to use special leading characters. In the above example I've used - to denote flags (which are usually passed on verbatim), and + to denote a selection from an enumeration which contains useful special cases (e.g. use the mono compiler).
One word of caution, don't use too many weird leading special characters, or you end up inventing your own language - keep it simple.
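For what it's worth, here is a rough sketch of how a metadata line in that style might be split apart, shown in Python for brevity rather than in the Haskell the Shake rules themselves would use; the tokenisation is exactly the two conventions described above and nothing more.
# Split a metadata line such as
#   mycsharp.dll = foo.cs bar.cs -define:DEBUG -optimize +mono
# into target, sources, verbatim flags ("-") and enumerated options ("+").
def parse_rule(line):
    target, rhs = (part.strip() for part in line.split("=", 1))
    sources, flags, options = [], [], []
    for token in rhs.split():
        if token.startswith("-"):
            flags.append(token)        # passed through verbatim
        elif token.startswith("+"):
            options.append(token[1:])  # e.g. "mono" selects the mono compiler
        else:
            sources.append(token)
    return target, sources, flags, options
print(parse_rule("mycsharp.dll = foo.cs bar.cs -define:DEBUG -optimize +mono"))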
Use a C preprocessor (CPP)
The C preprocessor gives you #include, #define and #ifdef, all of which can be used for more complex structured metadata. You can use this with Shake by invoking cpphs on the metadata file first.
While the two previous tweaks are recommended, the use of CPP was originally just for #include. Now that the built-in Shake metadata has an include mechanism, I'm not sure I'd bother with CPP; skipping it keeps things simpler.

What issues could arise from using class hierarchy to structure the different parts of a configuration setting?

Here is the context of my question. It is typical that one organizes configuration values into different files. In my case, my criteria are easy editing and portability from one server to another. The package is for Internet payments and it is designed so that a single installation of the package can be used for different applications. Also, we expect that an application can have different stages (development, testing, staging and production) on different servers. I use different files for each of the following three categories: the config values that depend only on the application, those that depend only on the server, and those that depend on both. In this way, I can easily move the configuration values that depend only on the application from one server to another, say from development to production. They are edited often, so it is worth it. Similarly, I can edit the values that are specific to the server in a single file without having to maintain redundant copies for the different applications. The term "configuration value" includes anything that must be defined differently in different applications or servers, even functions. If the definition of a function depends on the application or on the server, then it is part of the configuration process. The term "configuration value" seemed natural to me, even if it includes functions.
Now, here is the question. I wanted the functions to be PHPUnit-testable. I use PHP, but perhaps the question makes sense in other languages as well. I decided to store the configuration values as properties and methods in classes and used a class hierarchy to organize the different categories. The base class is PaymentConfigServer (values that depend only on the server). The application-dependent values are in PaymentConfigApp extends PaymentConfigServer, and those that depend on both are in PaymentConfig extends PaymentConfigApp. The class PaymentConfigApp thus contains configuration values that depend either on the application or on the server, but the file itself contains only the values that depend on the application. Similarly, PaymentConfig contains all configuration values, but the file itself contains only the values that depend on both. Can this use of a class hierarchy lead to issues? I am not looking for discussions about the best approach. I just want to know, if you have met a similar situation, what issues I should keep in mind, what conflicts could arise, etc.?
Typically, subclasses are used to add or modify functionality rather than remove functionality. Thus, the single-inheritance approach you suggested suffers from a conceptual problem that is likely to result in confusion for anyone who has to maintain the application if/when you get hit by a bus: the base class provides support for server-specific configuration, but then you (pretend to) remove that functionality in the PaymentConfigApp subclass, and (pretend to) re-add the functionality in its PaymentConfig subclass.
I am not familiar with the PHP language, but if it supports multiple inheritance, then I think it would be more logical to have two base classes: PaymentConfigServer and PaymentConfigApp, and then have PaymentConfig inherit from both of those base classes.
Another approach might be to have just a single class in which the constructor takes an enum parameter (or a pair of boolean parameters) that is used to specify whether the class should deal with just server-specific configuration, just application-specific configuration, or both types of configuration.
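To make that single-class alternative concrete, here is an illustrative sketch, in Python rather than PHP, with hypothetical loader functions standing in for the code that would read the server-only and application-only files:
from enum import Flag, auto
class Scope(Flag):
    SERVER = auto()
    APP = auto()
    BOTH = SERVER | APP
def load_server_values():
    # hypothetical: parse the server-specific config file
    return {"gateway_url": "https://example.test/pay"}
def load_app_values():
    # hypothetical: parse the application-specific config file
    return {"merchant_id": "demo-app"}
class PaymentConfig:
    """One class; the constructor decides which configuration layers apply."""
    def __init__(self, scope=Scope.BOTH):
        self.values = {}
        if Scope.SERVER in scope:
            self.values.update(load_server_values())
        if Scope.APP in scope:
            self.values.update(load_app_values())
server_only = PaymentConfig(Scope.SERVER)   # each layer is testable in isolation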
By the way, the approach you are suggesting for maintaining configuration data is not one that I have used. If you are interested in reading about an alternative approach, then you can read an answer I gave to another question on StackOverflow.

Is there a good DSL for manipulating MySQL scripts independent of any particular web framework?

I have a simple MySQL script that I use in a web application to completely rebuild/reset my DB to a clean initial state. Thus, in this script I define the various tables, stored procs, etc. that I need.
This is a fairly good initial solution because it's simple and does the job without being overkill. However, there are some drawbacks. One example is typing: it would be nice to define stored procs with richer types so I don't need to repeat declarations like VARCHAR(64).
Thus, my question is: is there a good DSL for manipulating MySQL scripts (e.g., one that could ultimately generate valid MySQL scripts)? It should effectively be a nice layer over MySQL, without trying to do too much or having too many bells and whistles. It would be nice if the language itself had decent support for DSLs, but more importantly, it would be nice to find something that isn't heavily wedded to a particular web framework.
Some cursory searches did not yield anything immediately obvious.
I guess one practical alternative is to just use your favorite ORM to get at something that's effectively similar. So part of the motivation for this question is to see whether the DSL approach has been explored with any success.
I'm assuming you mean an Internal DSL (see http://martinfowler.com/bliki/DomainSpecificLanguage.html, and http://en.wikipedia.org/wiki/Domain-specific_language) because SQL is a DSL, i.e. an External DSL (by Martin Fowler's definition, which has gained fairly wide acceptance).
Given that assumption, and not knowing what language you prefer, I was able to find a few Internal DSLs for SQL code generation:
Ruby - sqldsl.rubyforge.org/
Java - code.google.com/p/sql-dsl/
Scala - github.com/p3t0r/scala-sql-dsl
If you google "SQL DSL" there are more; also try googling "SQL DSL [enter your favorite language here]" and you may find something more suitable.
Another approach which has a different set of advantages and disadvantages (than an internal DSL) would be generating the SQL code from a template. Either a template in the form of a string with variable escapes (or concatenation) or in a separate file using a template language.
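As a tiny illustration of the template route (plain Python string formatting, with made-up table and column names), the repeated type declarations can live in one place:
# Generate the CREATE TABLE statement from data, so VARCHAR(64) is declared once.
VARCHAR64 = "VARCHAR(64)"
TABLE_TEMPLATE = """CREATE TABLE {name} (
    id INT AUTO_INCREMENT PRIMARY KEY,
    {columns}
);"""
def make_table(name, columns):
    cols = ",\n    ".join(f"{col} {ctype}" for col, ctype in columns)
    return TABLE_TEMPLATE.format(name=name, columns=cols)
print(make_table("users", [("username", VARCHAR64), ("email", VARCHAR64)]))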

Parsing language for both binary and character files

The problem:
You have some data, and your program needs input in a specified format, for example strings that are numbers. You are searching for a way to transform the original data into the format you need. And the problem is: the source can be anything. It can be XML, property lists, or binary that contains the needed data deeply embedded in binary junk. And your output format may vary as well: it can be number strings, floats, doubles....
You don't want to program. You want routines that give you commands capable of transforming the data into the form you wish. Surely it would contain regular expressions, but it would be well designed and would offer capabilities that are sometimes much easier to use and more powerful.
ADDITION:
Many users have this problem and hope that their programs can convert, read and write data supplied by other sources. If they can't, the users are doomed, or they resort to business-intelligence suites. That is NOT the problem.
I am talking about a tool for a developer who knows what he is doing, but who is also dissatisfied with writing such routines in a regular language every time: a professional data-manipulation tool, something like a hex editor, regex, vi, grep and a parser melted together, accessible through routines or a REPL.
If you have the spec of the data format, you can access and transform the data at once. No need to debug or plan meticulously how to program the transformation. I am searching for a solution because I don't believe the problem is new.
It allows:
joining/grouping/merging of results
inserting/deleting/finding/replacing
writing macros that execute a command chain repeatedly
meta-grouping (lists->tables->n-dimensional tables)
Example (No, I am not looking for a solution to this, it is just an example):
You want to read XML strings embedded in a binary file with variable-length records. Your tool reads the record length and deletes the junk surrounding your text. Now it splits open the XML and extracts the strings. Because the strings use Indian number glyphs and contain decimal commas instead of decimal points, your tool transliterates them into ASCII and replaces the commas with points. Now the results must be stored into matrices of variable length... etc. etc.
I am searching for a good language / language design and, if possible, an implementation.
Which design do you like, or, even if it does not fulfill all the conditions, which one would you not want to be without?
EDIT: The question is whether a solution to the problem exists and, if yes, which implementations are available. You DO NOT implement your own sorting algorithm if Quicksort, Mergesort and Heapsort are available. You DO NOT invent your own text-parsing method if you have regular expressions. You DO NOT invent your own 3D language for graphics if OpenGL/Direct3D is available. There are existing solutions, or at least papers describing the problem and giving suggestions. And there are people who may have worked on and experienced such problems and who can give ideas and suggestions. The idea that this problem is totally new and that I should work it out and implement it myself without background knowledge seems to me, I must admit, totally off the mark.
UPDATE:
Unfortunately I had less time than anticipated to delve into the subject because our development team is currently in a hot phase. But I have contacted the author of TextTransformer and he kindly answered my questions.
I have investigated TextTransformer (http://www.texttransformer.de) in the meantime and so far I can see it offers a complete and efficient solution if you are going to parse character data.
For anyone who wants to give implementing a good parsing language a try, the smallest set of operators needed to transform any input data directly into any output data, if (!) they were powerful enough, seems to be the following (a rough sketch in code follows the list):
Insert/Remove: Self-explaining
Group/Ungroup: Split the input data into a set of tokens and organize them into groups and supergroups (data structures, lists, tables, etc.)
Transform
Substitution: Change the content of the tokens (special operation: replace)
Transposition: Change the order of tokens (swap,merge etc.)
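A rough sketch of those operators as plain functions over a flat token list (purely illustrative; a real design would need richer data structures than Python lists):
def insert(tokens, index, token):          # Insert
    return tokens[:index] + [token] + tokens[index:]
def remove(tokens, predicate):             # Remove
    return [t for t in tokens if not predicate(t)]
def group(tokens, size):                   # Group: organize tokens into groups
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]
def ungroup(groups):                       # Ungroup: flatten back to a token stream
    return [t for grp in groups for t in grp]
def substitute(tokens, fn):                # Substitution: change token content
    return [fn(t) for t in tokens]
def transpose(tokens, i, j):               # Transposition: change token order (swap)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out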
Have you investigated TextTransformer?
I have no experience with this, but it sounds pretty good and the author makes quite competent posts in the comp.compilers newsgroup.
You still have to do some programming work.
For a programmer, I would suggest:
Perl against a SQL backend.
For a non-programmer, what it sounds like you're looking for is some sort of business intelligence suite.
This suggestion may broaden the scope of your search too much... but here it is:
You could either reuse, as-is, or otherwise get "inspiration" from the [open source] code of the SnapLogic framework.
Edit (answering the comment on SnapLogic documentation etc.)
I agree, the SnapLogic documentation leaves a bit to be desired, in particular for people in your situation, i.e. people just trying to quickly get an overview of what SnapLogic can do and whether it would generally meet their needs, without investing much time in learning the system in earnest.
Also, I realize that the scope and typical uses of SnapLogic differ somewhat from the requirements expressed in the question, and I should have taken the time to better articulate the possible connection.
So here goes...
A salient and powerful feature of SnapLogic is its ability to [virtually] codelessly create "pipelines" i.e. processes made from pre-built components;
Components addressing the most common needs of Data Integration tasks at-large are supplied with the SnapLogic framework. For example, there are components to
read in and/or write to files in CSV or XML or fixed length format
connect to various SQL backends (for either input, output or both)
transform/format [readily parsed] data fields
sort records
join records for lookup and general "denormalized" record building (akin to SQL joins but applicable to any input [of reasonable size])
merge sources
filter records within a source (to select and, at a later step, work on, say, only records with attribute "State" equal to "NY")
see this list of available components for more details
A relatively weak area of SnapLogic's functionality (for the purpose described by the OP) is parsing. Standard components will only read generic file formats (XML, RSS, CSV, fixed-length, DBMSes...), so structured (or semi-structured?) files such as the one described in the question, with mixed binary and text and such, are unlikely to ever be covered by a standard component.
You'd therefore need to write your own parsing logic, in Python or Java, respecting the SnapLogic API of course so the module can later "play nice" with the other ones.
BTW, the task of parsing the files described could be done in one of two ways: with a "monolithic" reader component (i.e. one which takes in the whole file and produces an array of readily parsed records), or with a multi-component approach, whereby an input component reads in and parses the file at "record" level (or line level or block level, whatever that may be), and other standard or custom SnapLogic components are used to create a pipeline which effectively expresses the logic of parsing a record (or block or...) into its individual fields/attributes.
The second approach is of course more modular and may be applicable if the goal is to process many different file formats, whereby each new format requires piecing together components with little or no coding. Whatever the approach used for the input/parsing of the file(s), the SnapLogic framework remains available to create pipelines that then process the parsed input in various fashions.
My understanding of the question therefore prompted me to suggest SnapLogic as a possible framework for the problem at hand, because I understood the feature gap concerning the "codeless" parsing of odd-formatted files, but also saw some commonality of features with regard to creating various processing pipelines.
I also hedged my suggestion, with an expression like "get inspiration from", because of the possible feature gap, but also because of the relative lack of maturity of the SnapLogic offering and its apparent commercial/open-source ambivalence.
(Note: this statement is neither a critique of the technical maturity/value of the framework per se, nor a critique of business-oriented uses of open source, but rather a warning that business/commercial pressures may shape the offering in various directions.)
To summarize:
Depending on the specific details of the vision expressed in the question, SnapLogic may be worthy of consideration, provided one understands that "some-assembly-required" will apply, in particular in the area of file parsing, and that the specific shape and nature of the product may evolve (but then again it is open source so one can freeze it or bend it as needed).
A more general remark is that SnapLogic is based on Python, which is a very swell language for coding various connectors, conversion logic, etc.
In reply to Paul Nathan: you mentioned writing throwaway code as something rather unpleasant. I don't see why it should be so. After all, all of our code will be thrown away and replaced eventually, no matter how perfectly we wrote it. So my opinion is that writing throwaway code is pretty much OK, as long as you don't spend too much time writing it.
So, it seems that there are two approaches to solving your problem: either a) find some specific tool intended for the purpose (parse the data, perform some basic operations on it and store it in some specific structure), or b) use some general-purpose language with lots of libraries and code it yourself.
I don't think that approach a) is viable, because sooner or later you'll bump into an obstacle not covered by the tool and you'll spend your time and nerves hacking the tool, or mailing the authors and waiting for them to implement what you need. I may well be wrong, so if you do find a perfect tool, please drop a link here (I do lots of data processing in my day job myself, and I can't swear that I couldn't do it more efficiently).
Approach b) may at first seem "unpleasant", but given a nice, high-level, expressive language with a bunch of useful libraries (regexps, XML manipulation, parser construction...) it shouldn't be too hard, and it may gradually be turned into a DSL for the very purpose. Besides Perl, which was already mentioned, Python and Ruby sound like good candidates (I bet some Lisp derivatives would work too, but I have no experience there).
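To make approach b) concrete, here is a rough Python sketch of the example pipeline from the question. The record layout (a 4-byte big-endian length prefix), the XML element name "amount" and the reliance on Unicode decimal digits are all assumptions made purely for illustration.
import struct
import unicodedata
import xml.etree.ElementTree as ET
def records(blob):
    """Yield each record payload, assuming a 4-byte big-endian length prefix."""
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        yield blob[pos:pos + length]
        pos += length
def to_ascii_number(text):
    """Transliterate any Unicode decimal digits to ASCII and fix the decimal comma."""
    digits = "".join(str(unicodedata.digit(c)) if c.isdigit() else c for c in text)
    return digits.replace(",", ".")
def values(record):
    """Pull the numeric strings out of the embedded XML ("amount" is an assumed tag)."""
    root = ET.fromstring(record.decode("utf-8"))
    return [float(to_ascii_number(node.text)) for node in root.iter("amount")]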
You might find AntlrWorks useful if you go so far as defining formal grammars for what you're parsing.