RDF4J SPARQL query to JSON

I am trying to move data from a SPARQL endpoint to a JSONObject. Using RDF4J.
RDF4J documentation does not address this directly (some info about using endpoints, less about converting to JSON, and nothing where these two cases meet up).
So far I have:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
Map<String, String> headers = new HashMap<String, String>();
headers.put("Accept", "SPARQL/JSON");
repo.setAdditionalHttpHeaders(headers);
try (RepositoryConnection conn = repo.getConnection())
{
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    GraphQuery query = conn.prepareGraphQuery(queryString);
    debug("Mark 2");
    try (GraphQueryResult result = query.evaluate())
This fails because "Server responded with an unsupported file format: application/sparql-results+json".
I figured a SPARQLGraphQuery should take the place of GraphQuery, but RepositoryConnection does not have a relevant prepare statement.
If I exchange
try (RepositoryConnection conn = repo.getConnection())
with
try (SPARQLConnection conn = (SPARQLConnection)repo.getConnection())
I run into the problem that SPARQLConnection does not generate a SPARQLGraphQuery. The closest I can get is:
SPARQLGraphQuery query = (SPARQLGraphQuery)conn.prepareQuery(QueryLanguage.SPARQL, queryString);
which gives a runtime error as these types cannot be cast to each other.
I do not know how to proceed from here. Any help or advice is much appreciated. Thank you.

this fails because "Server responded with an unsupported file format: application/sparql-results+json"
In RDF4J, SPARQL SELECT queries are tuple queries, so named because each result is a set of bindings, which are tuples of the form (name, value). In contrast, CONSTRUCT (and DESCRIBE) queries are graph queries, so called because their result is a graph, that is, a collection of RDF statements.
Furthermore, setting additional headers for the response format as you have done here is not necessary (except in rare circumstances): the RDF4J client handles this for you automatically, based on the registered set of parsers.
So, in short, simplify your code as follows:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    debug("Mark 2");
    try (TupleQueryResult result = query.evaluate()) {
        ...
    }
}
If you want to write the result of the query in JSON format, you could use a TupleQueryResultHandler, for example the SPARQLResultsJSONWriter, as follows:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    query.evaluate(new SPARQLResultsJSONWriter(System.out));
}
This will write the result of the query (in this example to standard output) using the SPARQL Query Results JSON format. If you have a non-standard format in mind, you could of course also create your own TupleQueryResultHandler implementation.
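Since the original goal was a JSONObject rather than console output, here is a minimal sketch of one way to bridge the two, assuming the org.json library is on the classpath (the queryAsJson helper name is purely illustrative): capture the writer's output in memory and parse it.
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.resultio.sparqljson.SPARQLResultsJSONWriter;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;
import org.json.JSONObject;

static JSONObject queryAsJson(String endpoint, String queryString) throws Exception {
    SPARQLRepository repo = new SPARQLRepository(endpoint);
    try (RepositoryConnection conn = repo.getConnection()) {
        TupleQuery query = conn.prepareTupleQuery(queryString);
        // Serialize the SPARQL JSON result into memory instead of to System.out
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        query.evaluate(new SPARQLResultsJSONWriter(out));
        // Parse the standard {"head": ..., "results": ...} document into a JSONObject
        return new JSONObject(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
From there the individual rows are available under json.getJSONObject("results").getJSONArray("bindings"), per the SPARQL Query Results JSON format.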
For more details on the various ways in which you can process the result (including iterating, streaming, adding to a List, or just directly sending to a result handler), see the documentation on querying a repository. As an aside, the javadoc on the RDF4J APIs is pretty extensive too, so if your Java editing environment has support for displaying that, I'd advise you to make use of it.
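For instance, here is a rough sketch of the iterating and list-collecting styles, continuing from the TupleQuery prepared above (variable names are mine, not from the documentation):
import java.util.List;
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.QueryResults;
import org.eclipse.rdf4j.query.TupleQueryResult;

// Iterate over the result one BindingSet at a time
try (TupleQueryResult result = query.evaluate()) {
    while (result.hasNext()) {
        BindingSet bindings = result.next();
        System.out.println(bindings.getValue("s") + " " + bindings.getValue("p") + " " + bindings.getValue("o"));
    }
}

// Or collect everything into a List in one call (this runs the query again; convenient for small results)
List<BindingSet> allBindings = QueryResults.asList(query.evaluate());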

Related

Best way to connect to MySQL and execute a query? (probably with Dapper)

I will preface by saying that I simply could not get the SQL Type Provider to work - it threw a dozen different errors at various points and seemed to be a version conflict. So I want to avoid that. I've been following mostly C# examples and can't always get the syntax right in F#.
I am targeting .NET 6 (though I can drop to 5 if it's going to be an issue).
I have modelled the data as a type as well.
I like the look of Dapper the best but I generally don't need a full ORM and would just like to run raw SQL queries so am open to other solutions.
I have a MySQL server running and a connection string.
I would like to
Initialize an SQL connection with my connection string.
Execute a query (preferably in raw SQL). If a select query, map it to my data type.
Be able to neatly execute more queries from elsewhere in the code without reinitializing a connection.
It's really just a package and a syntax example of those three things that I need. Thanks.
This is an example where I've used Dapper to query an MS SQL Express database. I have quite a lot of helper methods that I've made through the years in order to make Dapper (and to a slight degree also SqlClient) easy and type safe in F#. Below you see just two of these helpers - queryMultipleAsSeq and queryMultipleToList.
I realize now that it's not that easy to get going with Dapper and F# unless these can be made available to others. I have created a repo on GitHub for this, which will be updated regularly with new helper functions and demos to show how they're used.
The address is https://github.com/BentTranberg/DemoDapperStuff
Ok, now this initial demo:
module DemoSql.Main
open System
open System.Data.SqlClient
open Dapper
open Dapper.Contrib
open Dapper.Contrib.Extensions
let queryMultipleAsSeq<'T> (conn: SqlConnection, sql: string, args: obj) : 'T seq =
    conn.Query<'T> (sql, args)

let queryMultipleToList<'T> (conn: SqlConnection, sql: string, args: obj) : 'T list =
    queryMultipleAsSeq (conn, sql, args)
    |> Seq.toList

let connectionString = @"Server=.\SqlExpress;Database=MyDb;User Id=sa;Password=password"

let [<Literal>] tableUser = "User"

[<Table (tableUser); CLIMutable>]
type EntUser =
    {
        Id: int
        UserName: string
        Role: string
        PasswordHash: string
    }

let getUsers () =
    use conn = new SqlConnection(connectionString)
    (conn, "SELECT * FROM " + tableUser, null)
    |> queryMultipleToList<EntUser>

[<EntryPoint>]
let main _ =
    getUsers ()
    |> List.iter (fun user -> printfn "Id=%d User=%s" user.Id user.UserName)
    Console.ReadKey() |> ignore
    0
The packages used for this demo:
<PackageReference Include="Dapper.Contrib" Version="2.0.78" />
<PackageReference Include="System.Data.SqlClient" Version="4.8.2" />
The Dapper.Contrib package will pull in Dapper itself.

NamedList with Deep Pagination

QueryRequest req=new QueryRequest(solrQuery);
NoOpResponseParser responseParser = new NoOpResponseParser();
responseParser.setWriterType("csv");
searcherServer.setParser(responseParser);
NamedList<Object> resp=searcherServer.request(req);
QueryResponse res = searcherServer.query(solrQuery);
responseString = (String)resp.get("response");
I use the above code to get the output in CSV format. The data I am trying to fetch is huge (in the billions), so I want to use Solr's deep pagination and get the CSV output in chunks. Is there a way to do this? Also, with the current version of Solr (I cannot upgrade) I have to use the above code to get CSV output.
I tried the below way to fetch the results.
searcherServer = new HttpSolrServer(url);
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(query);
solrQuery.set("fl","field1");
solrQuery.setParam("wt", "csv");
solrQuery.setStart(0);
solrQuery.setRows(1000);
solrQuery.setSort(SolrQuery.SortClause.asc("field2"));
The output from the above code has wt as javabin, so I cannot get the CSV output.
Any suggestions?
You have two options:
1. Use the Solr export request handler (or add it) with the wt=csv parameter. Just to be clear, this is an implicit request handler usually available even in older Solr versions and specifically designed to handle scenarios that involve exporting millions of records.
2. Implement deep paging correctly. I suggest Yonik's post on paging and deep paging; it is easier than you think. But after you have implemented it correctly, you also need to create the CSV file yourself (a sketch follows below).
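To illustrate option 2, here is a rough sketch of cursor-based deep paging with SolrJ; it assumes a Solr/SolrJ version that supports cursorMark (4.7 or later), and the field names and uniqueKey ("id") are only placeholders:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

HttpSolrServer server = new HttpSolrServer(url);
SolrQuery q = new SolrQuery(query);
q.set("fl", "field1");
q.setRows(1000);
// cursorMark requires a sort that ends on the uniqueKey field (assumed here to be "id")
q.setSort(SolrQuery.SortClause.asc("id"));

String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = server.query(q);
    // ... write rsp.getResults() out as one CSV chunk here ...
    String nextCursorMark = rsp.getNextCursorMark();
    done = cursorMark.equals(nextCursorMark);   // no more results once the mark stops moving
    cursorMark = nextCursorMark;
}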
The solution I found was:
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(query); //what you want to fetch
QueryResponse res = searcherServer.query(solrQuery);
int numFound = (int)res.getResults().getNumFound();
int rowsToBeFetched = (numFound > 1000 ? (int)(numFound/6) : numFound);
for (int i = 0; i < numFound; i = i + rowsToBeFetched) {
    solrQuery.set("fl", "fieldToBeFetched");
    solrQuery.setParam("wt", "csv");
    solrQuery.setStart(i);
    solrQuery.setRows(rowsToBeFetched);
    QueryRequest req = new QueryRequest(solrQuery);
    NoOpResponseParser responseParser = new NoOpResponseParser();
    responseParser.setWriterType("csv");
    searcherServer.setParser(responseParser);
    NamedList<Object> resp = searcherServer.request(req);
    responseString = (String) resp.get("response"); //This is in CSV format
}
Pros:
Since I don't fetch the entire result at once, it was faster.
The output was CSV.
Hitting Solr multiple times isn't costly.
Cons:
The result is not unique, meaning there can be repeated data based on what you are fetching.
To get unique results, you can use facets.
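For example, here is a rough sketch of that facet approach with SolrJ (the field name is a placeholder, and facet paging may still be needed for very high-cardinality fields):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery facetQuery = new SolrQuery(query);
facetQuery.setRows(0);                        // we only want the facet counts, not documents
facetQuery.setFacet(true);
facetQuery.setFacetLimit(-1);                 // -1 = return all distinct values
facetQuery.addFacetField("fieldToBeFetched");
QueryResponse rsp = searcherServer.query(facetQuery);
for (FacetField.Count value : rsp.getFacetField("fieldToBeFetched").getValues()) {
    // value.getName() is a distinct field value, value.getCount() its frequency
    System.out.println(value.getName() + "," + value.getCount());
}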
Thanks!

Spring Data JPA and Native Queries

I am working on a Spring application and am kind of stuck. I am running a query as shown below:
@Autowired
EntityManagerFactory entityManagerFactory;
public List countTransactionsGroupByProvider() {
    EntityManager em = entityManagerFactory.createEntityManager();
    String query = "SELECT t.order_name,count(t.order_name) as number_of_transactions from transactions_view t where "
            + "t.transaction_date between '2014-07-24' and '2014-10-27' group by t.order_name";
    List result = em.createNativeQuery(query).getResultList();
    return result;
}
Now, this is working fine. It returns the data below:
[["Airtel",148], ["Expresso",8], ["Glo",49],
["MTN",110],["Select network",1],["Surfline",88],
["Tigo",35],["Vodafone",136],["Vouchers",30]]
My problem is I want this to return in the below format:
[{"order_name":"Airtel","number_of_transactions":148},
{"order_name":"Expresso","number_of_transactions":8},
{"order_name":"MTN","number_of_transactions":110},etc]
Then I can feed this into morris.js to plot a graph.
Any suggestions as to how to go about this? Thanks much.
You should probably just write some supporting code to transform the data into the format you want. I'm not sure you're going to get much traction trying to get JPA to produce the data in that format, although it's arguably not out of the question.
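For example, here is a minimal sketch of such supporting code (the toChartData name and the List<Map<...>> shape are just one way to do it; a map like this serializes to the JSON objects shown above with most JSON libraries):
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Each row of the native query comes back as an Object[] of {order_name, count}.
public List<Map<String, Object>> toChartData(List<Object[]> rows) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Object[] row : rows) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("order_name", row[0]);
        entry.put("number_of_transactions", ((Number) row[1]).intValue());
        out.add(entry);
    }
    return out;
}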

Play + Slick: How to do partial model updates?

I am using Play 2.2.x with Slick 2.0 (with a MySQL backend) to write a REST API. I have a User model with a bunch of fields like age, name, gender etc. I want to create a route PATCH /users/:id which takes in a partial user object (i.e. a subset of the fields of a full user model) in the body and updates the user's info. I am confused about how I can achieve this:
How do I use PATCH verb in Play 2.2.x?
What is a generic way to parse the partial user object into an update query to execute in Slick 2.0? I am expecting to execute a single SQL statement, e.g. update users set age=?, dob=? where id=?
Disclaimer: I haven't used Slick, so am just going by their documentation about Plain SQL Queries for this.
To answer your first question:
PATCH is just another HTTP verb in your routes file, so for your example:
PATCH /users/:id controllers.UserController.patchById(id)
Your UserController could then be something like this:
val possibleUserFields = Seq("firstName", "middleName", "lastName", "age")

def patchById(id: String) = Action(parse.json) { request =>
  def addClause(fieldName: String) = {
    (request.body \ fieldName).asOpt[String].map { fieldValue =>
      s"$fieldName=$fieldValue"
    }
  }
  val clauses = possibleUserFields.flatMap(addClause)
  val updateStatement = "update users set " + clauses.mkString(",") + s" where id = $id"
  // TODO: Actually make the Slick call, possibly using the 'sqlu' interpolator (see docs)
  Ok(s"$updateStatement")
}
What this does:
Defines the list of JSON field names that might be present in the PATCH JSON
Defines an Action that will parse the incoming body as JSON
Iterates over all of the possible field names, testing whether they exist in the incoming JSON
If so, adds a clause of the form fieldname=<newValue> to a list
Builds an SQL update statement, comma-separating each of these clauses as required
I don't know if this is generic enough for you; there's probably a way to get the field names (i.e. the Slick column names) out of Slick, but like I said, I'm not even a Slick user, let alone an expert :-)

Excluding Content From SQL Bulk Insert

I want to import my IIS logs into SQL for reporting using Bulk Insert, but the comment lines - the ones that start with a # - cause a problem because those lines do not have the same number of fields as the data lines.
If I manually delete the comments, I can perform a bulk insert.
Is there a way to perform a bulk insert while excluding lines based on a match, such as: any line that begins with a "#"?
Thanks.
The approach I generally use with BULK INSERT and irregular data is to push the incoming data into a temporary staging table with a single VARCHAR(MAX) column.
Once it's in there, I can use more flexible decision-making tools like SQL queries and string functions to decide which rows I want to select out of the staging table and bring into my main tables. This is also helpful because BULK INSERT can be maddeningly cryptic about why and how it fails on a specific file.
The only other option I can think of is using pre-upload scripting to trim comments and other lines that don't fit your tabular criteria before you do your bulk insert.
I recommend using logparser.exe instead. LogParser has some pretty neat capabilities on its own, but it can also be used to format the IIS log to be properly imported by SQL Server.
Microsoft has a tool called "PrepWebLog" (http://support.microsoft.com/kb/296093) which strips out these hash/pound characters; however, I'm running it now (using a PowerShell script for multiple files) and am finding its performance intolerably slow.
I think it'd be faster if I wrote a C# program (or maybe even a macro).
Update: PrepWebLog just crashed on me. I'd avoid it.
Update #2: I looked at PowerShell's Get-Content and Set-Content commands but didn't like the syntax and possible performance, so I wrote this little C# console app:
if (args.Length == 2)
{
    string path = args[0];
    string outPath = args[1];
    Regex hashString = new Regex("^#.+\r\n", RegexOptions.Multiline | RegexOptions.Compiled);
    foreach (string file in Directory.GetFiles(path, "*.log"))
    {
        string data;
        using (StreamReader sr = new StreamReader(file))
        {
            data = sr.ReadToEnd();
        }
        string output = hashString.Replace(data, string.Empty);
        using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
        {
            sw.Write(output);
        }
    }
}
else
{
    Console.WriteLine("Source and Destination Log Path required or too many arguments");
}
It's pretty quick.
Following up on what PeterX wrote, I modified the application to handle large log files, since anything sufficiently large would create an out-of-memory exception. Also, since we're only interested in whether or not the first character of a line starts with a hash, we can just use the StartsWith() method on each line as it's read.
class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            string path = args[0];
            string outPath = args[1];
            string line;
            foreach (string file in Directory.GetFiles(path, "*.log"))
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
                    {
                        while ((line = sr.ReadLine()) != null)
                        {
                            if (!line.StartsWith("#"))
                            {
                                sw.WriteLine(line);
                            }
                        }
                    }
                }
            }
        }
        else
        {
            Console.WriteLine("Source and Destination Log Path required or too many arguments");
        }
    }
}