I'm doing a PhD in political science and plan to collect data on several organizations and do some social network analysis.
I'm new to database building and management, and I'd like to know which tool (LibreOffice Base, MySQL, etc.) would be best for me.
In this database, I would have a list of organizations (approximately 200) with various attributes and attached files, a list of individuals working in/with one or several of these organizations, a list of coalitions linking different organizations, a list of events in which some of these organizations/individuals take part, etc.
I'm starting from scratch. What do you think would be the best way to build this database? I tried LibreOffice Base and started to create different tables (organizations, individuals, etc.), but I'm not sure it will fit my needs.
Thanks in advance!
I think you should look at a graph database engine like Neo4j for this use case.
With a graph DB, you can store entities (such as organizations, in your case) and the relationships between them in the form of nodes and edges. With your data stored this way, you can easily interrogate it for insights that would not be easy to obtain from a standard relational database.
There is a free ebook describing the need for graph storage and the common problems such a datastore solves.
Say that you wish to store information about organizations, the individuals working for those organizations, their job roles, etc. You can do this by creating nodes and relationships like the following:
(:Person {name: "Joe Bloggs"})-[:WORKS_AT]->(:Organization {name: "Google"})
(:Person {name: "Joe Bloggs"})-[:WORKS_AS]->(:JobRole {name: "Software Developer"})
The above describes a person called Joe Bloggs who works for Google as a software developer. You can then use the Cypher query language (the official language for querying Neo4j) to get all software developers working for Google.
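As a sketch of what that query could look like, here it is run through the official Neo4j Python driver (the connection URI, the credentials, and the name properties are assumptions for illustration):

# A minimal sketch using the Neo4j Python driver; the URI, credentials,
# and property names are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (p:Person)-[:WORKS_AT]->(:Organization {name: $org}),
      (p)-[:WORKS_AS]->(:JobRole {name: $role})
RETURN p.name AS name
"""

with driver.session() as session:
    for record in session.run(query, org="Google", role="Software Developer"):
        print(record["name"])

driver.close()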
Please excuse me if this isn't the traditional format for questions, or if it is too broad.
I am currently looking for suggestions on how to design and build a customer database with some very simple fields. I work in the sales department of a company that retails aftermarket truck accessories both online and over the phone. I have some basic programming skills and a little technical ability. The goal of this project is to create a way of targeting sales leads we would like to follow up with.
We will be collecting the information from each salesman individually. I imagine creating a web portal that allows them to answer the following questions:
Customer email?
How did they find us?
Which website did they call in on?
Year of the vehicle?
Make of the vehicle?
Was the initial investment < $750.00?
Date of purchase?
I would like to query and target customers based on the answers to the questions listed above.
Any suggestions or insight would be very much appreciated.
-Luke
You are facing a three-phase process:
First phase: collect all the information you need to include in your database. You can collect this information from people or from documents.
Second phase: develop a relational database model that allows you to manage the information according to the first three normal forms. If the database will be used by external users, you should design queries and forms to ease their work.
Third phase: build your database. If you do not have software yet, look for something that supports the relational model. Have you already tried MS Access?
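To make that concrete, here is a sketch of what a minimally normalized schema for the fields above might look like (SQLite is used for brevity; all table and column names are assumptions):

# A sketch of a normalized lead-tracking schema; names are hypothetical.
import sqlite3

conn = sqlite3.connect("leads.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS customers (
    id              INTEGER PRIMARY KEY,
    email           TEXT UNIQUE NOT NULL,
    referral_source TEXT,   -- how did they find us?
    website         TEXT    -- which website did they call in on?
);
CREATE TABLE IF NOT EXISTS purchases (
    id            INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customers(id),
    vehicle_year  INTEGER,
    vehicle_make  TEXT,
    amount        NUMERIC, -- supports the "< $750.00" test
    purchase_date TEXT     -- ISO-8601 date string
);
""")

# The kind of targeting query the schema is meant to support:
for email, make in conn.execute("""
    SELECT c.email, p.vehicle_make
    FROM customers c JOIN purchases p ON p.customer_id = c.id
    WHERE p.amount < 750.00
"""):
    print(email, make)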
I am looking at taking unstructured data in the form of files, processing it and storing it in a database for retrieval.
The data will be in natural language and the queries to get information will also be in natural language.
Ex: the data could be "Roses are red" and the query could be "What is the color of a rose?"
I have looked at several NLP systems, focusing on open-source information extraction and relation extraction systems, and the following seems apt and easy for a quick start:
https://www.npmjs.com/package/mitie
This can give data in the form of (word, type) pairs. It also gives a relation as a result of running the processing (check the site's example).
I want to know if SQL is a good database for saving this information. To retrieve the information, I will also need to convert the natural language query into some kind of (word, meaning) pairs, and to use SQL I will have to write a layer that converts natural language into SQL queries.
Please suggest any open-source databases that work well in this situation. I'm open to suggestions for databases that work with other open-source information extraction and relation extraction systems, if not MITIE.
SQL won't be an appropriate choice for your problem. You can use NLP or rules to extract relationships and then store those relationships in a triple store or a graph database. There are many good open-source graph databases, like Neo4j and Titan. You can also search for triple stores; I suppose Apache Jena should be a good choice. After storing your data, you can query your graphs using any of the graph query languages, like Gremlin or Cypher (analogous to SQL). Note that the heart of your system would be a knowledge graph.
You may also set up a Lucene/Solr-based search system on your unstructured data, which may help with answering your queries in conjunction with the graph database. All of these (NLP, IR, graph DBs/triple stores, etc.) would coexist to solve your problem.
It would be like an ensemble. No silver bullets :) To start with, however, look at graph DBs or triple stores.
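For a feel of the triple-store approach, here is a small sketch using rdflib in Python (the example.org vocabulary is made up for illustration):

# Store an extracted relation as an RDF triple, then answer a
# question with SPARQL. The vocabulary URIs are made up.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# "Roses are red" -> (rose, hasColor, "red")
g.add((EX.rose, EX.hasColor, Literal("red")))

# "What is the color of a rose?" -> a query over the hasColor predicate
for row in g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?color WHERE { ex:rose ex:hasColor ?color }
"""):
    print(row.color)   # -> red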
I have planned a SaaS application for which we have selected Java for the back end (we have not yet selected which frameworks to use), and for the front end I have opted to use either Ember JS, Foundation, or Angular JS, each combined with Bootstrap.
I am confused about the persistence layer: should I go for a traditional relational database, or for a NoSQL database?
The idea is simple and straightforward: offering a highly configurable school management system in the SaaS model. The first module I will be working on is time and attendance tracking for schools/universities/colleges/coaching centers, etc. As you can see, the attendance policy (late, absent, present) varies from school to school, university to university, within a university from department to department, within a department from teacher to teacher, and from coaching center to coaching center. Our application will not be deployed on their respective servers; it will be hosted in the cloud, so one running application must accommodate each client's dynamic policies in isolation from the others.
My data is expected to grow quickly over time, since every school/coaching center/institute will contain data on, and the application will be used by, the following entities: parents, students, teachers, principals, admission applicants, peons, etc.
I have read answers to questions posted for the same kind of query, and I found that people have used relational databases for this kind of application. But they built them 5-10 years ago, when there was no concept of NoSQL databases; all we knew were relational and object-oriented databases, so it would not be wrong to say they chose the stack that was available at the time.
I think you should go for a relational database; I don't see any need for a NoSQL database here. I am sure the schema you will have is static, and you will need to maintain complex relationships as well.
Have a look at multi-tenant architecture. I would suggest you use one database per client rather than one database for all clients.
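To make the one-database-per-client idea concrete, here is a rough sketch of tenant routing (shown in Python for brevity even though your back end is Java; the tenant names, file names, and policy table are all assumptions):

# A sketch of database-per-tenant routing in a multi-tenant SaaS app.
# Tenant IDs and connection targets are hypothetical.
import sqlite3

TENANT_DATABASES = {
    "springfield-high": "springfield_high.db",
    "acme-coaching":    "acme_coaching.db",
}

def connection_for(tenant_id: str) -> sqlite3.Connection:
    # Resolve the tenant (e.g. from the request's subdomain) to its own DB.
    return sqlite3.connect(TENANT_DATABASES[tenant_id])

# Each tenant's attendance policy lives in its own database, so one
# application instance can serve everyone while policies stay isolated.
conn = connection_for("springfield-high")
conn.execute("""CREATE TABLE IF NOT EXISTS attendance_policy (
    rule  TEXT PRIMARY KEY,  -- e.g. 'late_after_minutes'
    value TEXT
)""")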
Let's see what others recommend.
My problem is this: I have a dimensional-model DB for an NFL league. We have Players, Teams, and Leagues as the dimension tables, and Match as the fact table relating them. If I need to query the stats of a player in a particular match, or over a range of matches, it takes a painstaking SQL query with lots of joins to convert the machine-readable, ID-based tables into a human-readable, name-based version. In addition, analysis of that data is also very painful. As a solution, I propose transforming the DB into an analysis-friendly version, in which, for example, the Player table would include one player per row with the related stats, and likewise for Teams.
The question is: is there any framework, method, or schema that might guide me in designing the analysis-friendly DB layout? Also, is SQL still favorable, or is a non-SQL DB better for this problem?
I know this sounds like a very general question, but I just want to hear some expertise on the topic. Any help or suggestions are very welcome.
I was on a team faced with a similar situation about 13 years ago. We used a tool called "PowerPlay", a business intelligence tool from Cognos. This tool was very friendly to the data analysts, with drill-down capabilities and all sorts of name-based searching.
If I recall correctly (it's been a while), the BI tool stored the data in its own format (a data cube), but it had its own tool for automatically discovering the structure of an SQL-based data source. That automatic tool really struggled with the OLTP database, which was SQL (Oracle) and which was a real mess... a terrible relational design.
So what I ended up doing was building a star schema to collect and organize the same data, but more compatible with a multidimensional view of the data. I then built the ETL stuff to load the star from the OLTP database. The BI tool cut through the star schema like a hot knife through butter.
And the analysts didn't have to mess with ID fields at all.
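For concreteness, here is a sketch of that kind of star schema, plus a view that hides the ID fields behind human-readable names (SQLite for brevity; all table and column names are hypothetical):

# A sketch of a star schema for per-player match stats, with a
# flattening view. Table and column names are made up.
import sqlite3

conn = sqlite3.connect("nfl_star.db")
conn.executescript("""
-- Dimension tables carry the human-readable names.
CREATE TABLE IF NOT EXISTS dim_player (player_id INTEGER PRIMARY KEY, player_name TEXT);
CREATE TABLE IF NOT EXISTS dim_team   (team_id   INTEGER PRIMARY KEY, team_name   TEXT);
CREATE TABLE IF NOT EXISTS dim_match  (match_id  INTEGER PRIMARY KEY, match_date  TEXT);

-- One fact row per player per match.
CREATE TABLE IF NOT EXISTS fact_player_match (
    player_id  INTEGER REFERENCES dim_player(player_id),
    team_id    INTEGER REFERENCES dim_team(team_id),
    match_id   INTEGER REFERENCES dim_match(match_id),
    yards      INTEGER,
    touchdowns INTEGER
);

-- The joins are written once, here; analysts then query by name.
CREATE VIEW IF NOT EXISTS player_match_stats AS
SELECT p.player_name, t.team_name, m.match_date, f.yards, f.touchdowns
FROM fact_player_match f
JOIN dim_player p ON p.player_id = f.player_id
JOIN dim_team   t ON t.team_id   = f.team_id
JOIN dim_match  m ON m.match_id  = f.match_id;
""")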
It sounds like your starting place is like the star schema I had to build. So I would suggest that there are BI tools out there that you can lay on top of your star and that will provide precisely the kind of analyst friendly environment you are looking for. Cognos is only one of many vendors of BI tools.
A few caveats: If you go this way, you have to make an effort to make sure your name fields "make sense" if they are going to provide meaningful guidance to the analysts trying to drill down or search. Sometimes original data sources treat name fields as more or less meaningless stuff, where errors don't matter much. The same goes for column names. Column names that DBAs like are often gibberish to data analysts. You may also have to flatten any hierarchical groupings in your dimension tables, but you may have already done this. It depends on what your BI tool needs.
Hope this helps, even if it's a little generic.
I am doing (well, I want to do) some experiments with Linked Open Datasets, particularly those put out by governments.
I have an RDBMS (more specifically, MySQL). I designed it with semantic web ideas in mind, i.e. I have information stored as objects, predicates, and classes that define objects. In turn, all objects are related to each other through statements of the form subject --> predicate --> object (where the subjects come from the objects table).
I want to be able to query other RDF triple stores from my application and let other triple stores query my data. Is it possible to "set something up" so that this is possible?
I have looked at Jena. Using Jena seems to mean I have to use it as the storage application rather than MySQL. The only problem with this is that I include a new concept, called a category, which I don't think is part of the semantic web languages. I will use categories to help with displaying information (they don't have any other meaning), but using Jena seems to mean that I can't organise predicates under categories for more convenient viewing.
I am using Java, so a Java API is preferred.
It's also possible I misunderstood the purpose of Jena; maybe it can be of use, but I am not sure how.
I am sure four days from now this question will seem rather silly, but at the moment I am somewhat confused about how to proceed.
I'm not sure what you mean by "a new concept called category", perhaps you can give an example?
If you mean that you want to add additional metadata, perhaps as a way of organizing information in the user interface, there is no need to extend the semantic web languages or storage systems - they can already do what you want.
Suppose you have data for a school from the UK Government schools dataset (using Turtle encoding for brevity):
@prefix sch-ont: <http://education.data.gov.uk/def/school/> .

<http://education.data.gov.uk/id/school/135412>
    a sch-ont:School;
    sch-ont:establishmentStatus
        <http://education.data.gov.uk/def/school/EstablishmentStatus_Open>;
    sch-ont:MSOA <http://statistics.data.gov.uk/id/msoa/E02000001>;
    sch-ont:establishmentName "Guildhall School of Music and Drama";
    ...
You can directly query that data from the SPARQL end-point, or you can download the data and store it locally in your own triple store. Either way, you're perfectly at liberty to add extra information that's useful to your users. For example:
@prefix ankurs-app: <http://ankur.org/example/app/vocab/display#> .

<http://education.data.gov.uk/id/school/135412>
    ankurs-app:category ankurs-app:wkdCool .
You can store this new triple in the same graph as the downloaded data, or you can store it in a separate named-graph to indicate that it's information that has a different provenance than the source data. Either way, it's then simple to query it either programmatically from Jena, or via a SPARQL query.
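For illustration, here is what that query could look like (the SPARQL is the point; it's shown via rdflib in Python for brevity, and the same query string can be run through Jena's Java API; the local file name is an assumption):

# Query for everything tagged with the app-specific category.
# "schools.ttl" is a hypothetical local copy of the downloaded data
# plus the extra ankurs-app:category triples.
from rdflib import Graph

g = Graph()
g.parse("schools.ttl", format="turtle")

for row in g.query("""
    PREFIX ankurs-app: <http://ankur.org/example/app/vocab/display#>
    SELECT ?school WHERE { ?school ankurs-app:category ankurs-app:wkdCool . }
"""):
    print(row.school)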
Doing a layout for efficiently querying schemaless triple-centric data is a well-studied, and hard, problem. Most of the RDF platforms, including Jena, have well-optimised code for querying and updating triples from their own database schemes. You would have to have very good reasons for embarking on your own relational table layout :)
If you really do need to take an existing relational table scheme and map it to a Jena RDF model, look at D2RQ.
Why didn't you just use a triple store for all of your data? If you used a triple store with SPARQL endpoint capability, you would have a SPARQL-accessible web API. Similarly, many other datasets on the web are exposed as SPARQL endpoints and are accessible via HTTP.
There are many triple stores available with persistent storage both in a db and otherwise (Jena + SDB, Mulgara, Virtuoso, Oracle, etc). You could certainly extend Mulgara through their resolvers to support queries against your custom db but I think that's probably a lot of work for not too much real value.
I'm sure you could use existing concepts to handle your notion of categories in RDF or perhaps by layering something over Jena.