Is there a way to do spatial analysis (NOT just graphics) in SAS? What I really want is the ability to do geographic queries in PROC SQL, like one can do in PostGIS or SpatiaLite.
I asked this on the SAS-L list and got nothing.
Thanks!
I'm not sure your question is specific enough about what you want to do for someone to give you a good answer.
If you've got lat/long data, you could do detail and aggregate queries by choosing WHERE clause criteria based on the lat/long values.
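For example, a simple bounding-box query in PROC SQL might look like this (the dataset and variable names, work.locations, lat, and long, are made up for illustration):

    proc sql;
        select *
        from work.locations
        where lat  between 40.0 and 41.0
          and long between -75.0 and -73.0;
    quit;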
Incidentally, you might want to check out http://runsubmit.com, another Stack Overflow-style site with a more concentrated population of SAS users.
I'm not familiar with PostGIS or SpatiaLite, but SAS has some procedures dedicated to GIS-specific tasks. This link says:
"SAS/GIS software enables you to do more than simply view your data in its spatial context. It also enables you to interact with the data by selecting features and performing actions that are based on your selections."
I don't know if PROC SQL will be able to easily replicate those features, but once the data is in SAS data sets, I don't see why you couldn't at least do some basic queries.
SAS also has some example data and code for working with spatial data here.
The answer seems to be no: SAS doesn't support spatial datatypes and operators the way SpatiaLite or PostGIS do.
(I am answering my own question to close the discussion, but thanks to all!)
Actually, when you consider that the spatial SQL queries in SpatiaLite/PostGIS are just being translated into specific methods of calculation, it can be done.
So to answer the question "can it be done in SAS in an easy way, i.e. with simple queries like SpatiaLite?": no.
But you could write a function to do what you need using Base SAS; I find SAS to be one of the best languages for quick data analysis.
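As a rough sketch of that approach, here is a point-radius query built on the GEODIST function (available in SAS 9.2+). The dataset and variable names (work.stores, name, lat, long) are made up, and the coordinates are just an example point:

    proc sql;
        select name,
               geodist(lat, long, 40.7128, -74.0060, 'K') as dist_km
        from work.stores
        where calculated dist_km <= 50
        order by dist_km;
    quit;

That is the same kind of work a PostGIS ST_DWithin call does for you, just spelled out by hand.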
I am looking at taking unstructured data in the form of files, processing it and storing it in a database for retrieval.
The data will be in natural language and the queries to get information will also be in natural language.
Ex: the data could be "Roses are red" and the query could be "What is the color of a rose?"
I have looked at several NLP systems, focusing on open-source information extraction and relation extraction systems, and the following seems apt and easy to get started with quickly:
https://www.npmjs.com/package/mitie
This can give data in the form of (word, type) pairs. It also produces relations as a result of running the processing (see the example on the site).
I want to know if an SQL database is a good place to save this information. For retrieval, I will need to convert the natural-language query to (word, meaning) pairs as well, and to use SQL I will have to write a layer that converts natural language to SQL queries.
Please suggest any open-source databases that work well in this situation. I'm open to suggestions for databases that work with other open-source information extraction and relation extraction systems, not just MITIE.
SQL won't be an appropriate choice for your problem. You can use NLP or rules to extract relationships and then store them in a triple store or a graph database. There are good open-source graph databases such as Neo4j and Titan. For triple stores, I suppose Apache Jena should be a good choice. After storing your data, you can query your graphs using a graph query language such as Gremlin or Cypher (the graph-world analogues of SQL). Note that the heart of your system would be a knowledge graph.
You may also set up a Lucene/Solr-based search system on your unstructured data, which may help you answer your queries in conjunction with the graph database. All of these (NLP, IR, graph DBs/triple stores, etc.) would coexist to solve your problem.
It would be like an ensemble; there are no silver bullets :) To start with, though, look at graph DBs and triple stores.
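To make that concrete, here is a hedged sketch in Cypher (Neo4j) of how the rose example might be stored and queried. The Entity label, name property, and HAS_COLOR relationship are all my own invention, since the actual shapes depend on what your extraction step emits:

    // store the triple (rose, has_color, red) extracted from "Roses are red"
    MERGE (s:Entity {name: 'rose'})
    MERGE (o:Entity {name: 'red'})
    MERGE (s)-[:HAS_COLOR]->(o);

    // "What is the color of a rose?" then becomes a graph pattern match
    MATCH (:Entity {name: 'rose'})-[:HAS_COLOR]->(o:Entity)
    RETURN o.name;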
My problem is this: I have a dimensional-model DB for an NFL league. We have Players, Teams, and Leagues as the dimension tables and Match as the fact table that relates them. If I need to query the stats of a player in a particular match or a range of matches, it takes a painstaking SQL query with lots of joins to convert the machine-readable ID-based tables into a human-readable name-based version. In addition, analysis of that data is very painful. As a solution, I propose transforming the DB to an analysis-friendly version: for example, the Player table would include one player per row with the related stats, and the same for Teams.
The question is: is there any framework, method, or schema design that might guide me toward an analysis-friendly DB layout? Also, is SQL still favorable here, or would a non-SQL DB be better for this problem?
I know it sounds like a very general question, but I just want to hear some expertise on the topic, so any help or suggestion is very welcome.
I was on a team that faced a similar situation about 13 years ago. We used a tool called PowerPlay, a Business Intelligence tool from Cognos. It was very friendly to the data analysts, with drill-down capabilities and all sorts of name-based searching.
If I recall correctly (it's been a while), the BI tool stored the data in its own format (a data cube), but it had a tool for automatically discovering the structure of an SQL data source. That automatic tool really struggled with the OLTP database, which was SQL (Oracle) and a real mess... a terrible relational design.
So what I ended up doing was building a star schema to collect and organize the same data, one more compatible with a multidimensional view of the data. I then built the ETL stuff to load the star from the OLTP database. The BI tool cut through the star schema like a hot knife through butter.
And the analysts didn't have to mess with ID fields at all.
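For the NFL case, a star schema along these lines is one possibility (a sketch only; every table and column name below is illustrative):

    CREATE TABLE dim_player (
        player_key   INT PRIMARY KEY,
        player_name  VARCHAR(100),  -- human-readable, analyst-friendly
        position     VARCHAR(20)
    );

    CREATE TABLE dim_team (
        team_key   INT PRIMARY KEY,
        team_name  VARCHAR(100)
    );

    CREATE TABLE dim_match (
        match_key   INT PRIMARY KEY,
        match_date  DATE,
        season      INT
    );

    CREATE TABLE fact_player_match_stats (
        player_key  INT REFERENCES dim_player (player_key),
        team_key    INT REFERENCES dim_team (team_key),
        match_key   INT REFERENCES dim_match (match_key),
        yards       INT,
        touchdowns  INT
    );

    -- A player's stats over a range of matches now needs only simple
    -- joins to small dimension tables:
    SELECT p.player_name, SUM(f.yards) AS total_yards
    FROM fact_player_match_stats f
    JOIN dim_player p ON p.player_key = f.player_key
    JOIN dim_match  m ON m.match_key  = f.match_key
    WHERE m.match_date BETWEEN '2003-09-01' AND '2003-12-31'
    GROUP BY p.player_name;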
It sounds like your starting place is like the star schema I had to build. So I would suggest that there are BI tools out there that you can lay on top of your star and that will provide precisely the kind of analyst friendly environment you are looking for. Cognos is only one of many vendors of BI tools.
A few caveats: If you go this way, you have to make an effort to make sure your name fields "make sense" if they are going to provide meaningful guidance to the analysts trying to drill down or search. Sometimes original data sources treat name fields as more or less meaningless stuff, where errors don't matter much. The same goes for column names. Column names that DBAs like are often gibberish to data analysts. You may also have to flatten any hierarchical groupings in your dimension tables, but you may have already done this. It depends on what your BI tool needs.
Hope this helps, even if it's a little generic.
I don't know completely what I want, but surely someone has had the same need, and has solved it in a far better manner than I could:
I'm looking for some mechanism to extract the data definition of a MySQL table from the database and allow it to be queried for the list of columns and their definitions, as part of a routine to dynamically construct DML. It would also be good to have the table parameters (e.g. ENGINE, INDEX, etc.) available too.
Our databases aren't particularly advanced, and I certainly don't have an encyclopedic knowledge of SQL DDL, so what I came up with probably wouldn't be of much use to anyone else. Is there something already out there in Perl, preferably object-oriented, to do this, at least for MySQL?
Yes, there's a Perl package called SQL::Translator, part of a toolset called SQLFairy. It parses SQL DDL from an SQL script or from a live database instance, and it supports several RDBMSs, including MySQL.
Then it offers tools to do schema conversions, schema diffs, and a bunch of other cool stuff.
http://metacpan.org/pod/SQL::Translator
http://sqlfairy.sourceforge.net/
I found the docs better than those of most Perl projects, but I still had to read the code to understand how to use it the way I wanted.
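As a starting point, something like this follows the module's documented synopsis ('schema.sql' stands in for whatever MySQL dump you have):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use SQL::Translator;

    my $translator = SQL::Translator->new;

    # Parse a MySQL dump; the producer can be anything, YAML is easy to eyeball.
    $translator->translate(
        from     => 'MySQL',
        to       => 'YAML',
        filename => 'schema.sql',
    ) or die $translator->error;

    # The parsed schema is also available as objects you can query:
    for my $table ( $translator->schema->get_tables ) {
        print $table->name, "\n";
        printf "  %s %s\n", $_->name, $_->data_type
            for $table->get_fields;
    }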
The DBI interface has a set of "Catalog Methods": http://metacpan.org/pod/DBI#Catalog-Methods.
There is a similar StackOverflow question you can look at: How do I get schemas from Perl's DBI?
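If a live connection is acceptable, a minimal sketch with the catalog methods looks like this (the DSN, credentials, and table name are placeholders):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:database=mydb', 'user', 'pass',
                            { RaiseError => 1 } );

    # column_info($catalog, $schema, $table, $column) returns a statement
    # handle with one row per matching column.
    my $sth = $dbh->column_info( undef, undef, 'my_table', '%' );
    while ( my $col = $sth->fetchrow_hashref ) {
        printf "%-20s %s\n", $col->{COLUMN_NAME}, $col->{TYPE_NAME};
    }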
Does anyone know of a quick and easy way to test whether a query is properly formatted for both MySQL and MSSQL (Microsoft SQL Server), and perhaps for other database types as well? I only have access to MySQL at this time.
Info: I'm working on an Open Source project called JJWDesign Google Maps for SugarCRM. Some of the queries use the SugarCRM classes; others I have to write custom. For example, some are special distance calculations against the geocode information stored in the tables.
http://www.sugarforge.org/projects/jjwgooglemaps/
More importantly, while there is an accepted standard syntax, each flavour of database has its own specific functions, features, and things you can do.
The best you can do is stick to the most basic features. Oracle has different datetime functions than MySQL, which differ again from DB2's. While I would love to assist in a 'free as in beer' project, you really will need to check each function to see if it is the same across all major vendors. General functions most often are; abs(), for instance, will be fairly consistent, but others simply won't be.
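A small illustration of the problem, using a hypothetical orders table; the same "add seven days" expression has to be written differently per vendor:

    -- MySQL
    SELECT DATE_ADD(order_date, INTERVAL 7 DAY) FROM orders;

    -- Microsoft SQL Server (T-SQL)
    SELECT DATEADD(day, 7, order_date) FROM orders;

    -- abs() is one of the lucky ones that parses the same everywhere
    SELECT ABS(amount) FROM orders;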
You're talking about an SQL parser, so by definition it either isn't going to be quick and easy or it will do only the simplest checking.
Each RDBMS has its own flavour of SQL, too, so you'd really be limited to testing whether a query is ANSI SQL.
This is kind of an implementation question, maybe. Suppose I were to make a tool to convert some relational database to some other kind of database: what would the approach be?
If, for example, I want to convert the data and structure from a MySQL database to MSSQL, would I need to use regular expressions to parse the SQL file? Or could I convert it to XML or JSON first and parse from that structure into my target database?
Using existing tools to convert MySQL to MSSQL, or anything similar, is not in scope here, since I want to know how it is actually done.
Well, it's kind of a broad question, but generally speaking, having your own abstract representation of the structure and data would be a good thing, because you could extend your system "easily" by writing importers and exporters, and you would decouple your code a little by abstracting the relational DB concepts into your own format.
The importers would "reverse engineer" a given database by converting it to your own representation (as you say, XML/JSON, or even your own schema language, which would be better, I guess). Then the exporters would just convert from your format to the requested SQL dialect. No regular expressions, nothing hardcoded.
This will let you extend the system to support a bigger number of sources and targets, and also to handle cases where some SQL feature of a source is not supported by the selected target; a sketch of the idea follows.
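For instance (the JSON keys and both CREATE TABLE renderings below are invented for illustration), an importer could emit an intermediate description like:

    {
      "table": "users",
      "columns": [
        { "name": "id",    "type": "integer", "primary_key": true, "autoincrement": true },
        { "name": "email", "type": "string",  "length": 255, "nullable": false }
      ]
    }

and two exporters could then render it into their dialects:

    -- MySQL exporter output
    CREATE TABLE users (
      id INT AUTO_INCREMENT PRIMARY KEY,
      email VARCHAR(255) NOT NULL
    );

    -- MSSQL exporter output
    CREATE TABLE users (
      id INT IDENTITY(1,1) PRIMARY KEY,
      email NVARCHAR(255) NOT NULL
    );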
My 2 cents, hope it helps!