I am building a project in MS Access 2010. I have previous experience in Oracle. I am reading about MS Access and keep seeing references to table relationships. It looks like a convenient way to assist the average person in data entry and validation and for query building, but I write queries exclusively in SQL mode and enforce data entry for users with forms that have their own validation rules.
Is it really necessary to enforce relationships? It doesn't seem like it really gains me anything at an advanced level, and might actually cause problems for me or someone else who eventually takes over maintenance from me later. I've never used them before and I'm not really seeing a benefit to starting now. Can anyone shed some light on that?
You say you have previous experience of Oracle. Did you never define Foreign Key constraints in Oracle? If you did, then that is what you are doing when you define relationships in Access. You can use it for enforcing referential integrity (not allowing you to delete a parent record if child records still exist) or, if you use the cascade delete option, for automatically deleting child records if you delete a parent record. It's a useful backup to cover coding errors where you might have forgotten about possible child records that would otherwise be orphaned if you did not have the relationship (FK) defined.
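To make the foreign key comparison concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in (it is not Access or Oracle), and the `parent`/`child` tables and the `is_rejected` helper are made up for illustration. It shows the two protections described above: an orphan child cannot be inserted, and a parent that still has children cannot be deleted.

```python
import sqlite3

# SQLite is only a stand-in here; table and column names are hypothetical.
# isolation_level=None means each statement is committed on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id))"
)
conn.execute("INSERT INTO parent (id, name) VALUES (1, 'first parent')")
conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")


def is_rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A child pointing at a parent that does not exist is refused.
orphan_rejected = is_rejected("INSERT INTO child (id, parent_id) VALUES (11, 999)")
# Deleting a parent that still has children is refused (no cascade defined).
delete_rejected = is_rejected("DELETE FROM parent WHERE id = 1")
```

Neither statement depends on any form or application code; the database itself says no.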
For a person who is just querying data, the relationships are not that important. From an application point of view, however, they are VERY helpful, if not outright essential.
For example, you might have a customers table and an orders table. The business rule is that you can't create an order unless you first have a customer. So if you freely write some SQL to add an order without a customer, your update/insert query will NOT work. And if you need to delete a customer, then all orders for that customer can be deleted automatically, without you having to write a complex delete SQL statement. You might, for example, want to delete all customers older than 5 or 10 years (that is, inactive ones). When you delete those customers, you want all of their orders deleted too. This is a VERY difficult query to write if you have to delete the child records for each customer yourself. With enforced relationships and the cascade delete option, all child records are deleted automatically.
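As a rough sketch of the cascade delete idea, again using Python's sqlite3 module as a stand-in for Access: the `customers`/`orders` layout, the `last_active` column and the cutoff date are all assumptions made up for this example. One DELETE on the parent table removes the matching child rows as well.

```python
import sqlite3

# Stand-in database; the schema and data below are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to honour the FK
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, last_active TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
    total INTEGER
);
INSERT INTO customers VALUES (1, 'Old Co', '2005-03-01'), (2, 'New Co', '2024-06-15');
INSERT INTO orders VALUES (100, 1, 50), (101, 1, 25), (102, 2, 80);
""")

# Delete inactive customers; their orders go with them because of the cascade.
cutoff = "2015-01-01"
conn.execute("DELETE FROM customers WHERE last_active < ?", (cutoff,))

remaining_customers = [r[0] for r in conn.execute("SELECT name FROM customers")]
remaining_orders = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY id")]
```

Without the cascade you would first need a second DELETE on `orders` with a subquery (or join) that picks out the same set of customers.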
It is also important from a reporting point of view. If you write a query to display all customers this month and their billing totals, you get one total. However, if you decide that you do NOT want to include customers, you might hit just the orders table and get a total that way. The problem is that without RI you might (by accident, or just by some user launching the orders form) have entered order information (with a total amount) but NO customer.
Now what happens is that when you run the two different reports/queries, you find the totals are DIFFERENT! In a complex application, figuring out "why" two reports on monthly sales do NOT agree with each other can take days, or with lots of data even a week. If you enforce the business rule that no orders can be entered into the system UNLESS they have a customer, then you eliminate such errors in reporting. You can "say" that you are a perfect user of SQL, but with lots of code and lots of forms for data entry, how can you EVER be sure that orders are NEVER entered without a customer? The user may forget to enter the customer in the order form during data entry. And even if you write code in that order form to ENSURE that a customer must be selected, maybe YOU, while writing some SQL, insert an order record into the system without a customer by accident. Meanwhile, your monthly customer total report query "assumes" that you have a customer record to which you THEN join the order totals data.
However, some reports need to run on just the orders data (a monthly summary total does not need to include customers). The problem now is that somewhere in the system you have an order record with total data that does not have a customer. The result is that different reports and queries on sales totals no longer agree. This is an outright nightmare.
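Here is a small sketch of how the two reports drift apart. It uses Python's sqlite3 module as a stand-in for Access, with made-up `customers`/`orders` tables and deliberately no relationship defined, so an order without a customer slips in.

```python
import sqlite3

# No foreign key is declared on purpose; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (1, 1, 100), (2, 1, 50), (3, NULL, 75);  -- order 3 is orphaned
""")

# Report A: sales total built by joining customers to their orders.
total_via_customers = conn.execute(
    "SELECT SUM(o.total) FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id"
).fetchone()[0]

# Report B: sales total taken straight from the orders table.
total_orders_only = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]

print(total_via_customers, total_orders_only)  # 150 225
```

Both queries are "correct" SQL; they disagree only because the data breaks the assumption that every order has a customer.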
So some bug or error in the application code might occur and leave what is supposed to be relational data with "orphaned" records. Perhaps your business rules allow entering orders without a customer assigned, but then your monthly sales report will have to show that fact, and any query that hits the orders table without including customers will have to "check" for the possibility that no customer record exists yet.
The above only scratches the surface of the GAZILLION issues that crop up. So while you might be just creating simple queries on the data, the question is whether that data is correctly related in the system. The old saying about garbage in = garbage out rings true here.
At the end of the day, when your SQL queries pull data from MULTIPLE tables, you HAVE to make assumptions about that data and its referential integrity (RI). So when you write that query to display customers and their order totals, you ASSUME: you drop in the customers table and then join in the orders table. However, if orders exist without a customer record, then your query is not going to produce the correct values. And worse, a report that hits only the orders table will now produce different results.
If you enforce RI then no matter what, you cannot enter an order by accident or force without FIRST having created a customer record. If you don’t enforce such rules, then your data will produce incorrect results.
And a typical complex application will have 40 or 70 related tables. For EVERY ONE of those tables, your queries are going to assume that the parent (or child) records were created correctly according to your business rules.
You might have a tour booking system. Customers might phone up and put down a deposit but NOT yet be booked on a particular tour. If you allow this setup, then your query on customers this month and their booked tours will have to take this into account. However, maybe the business rule is that any customer in the system who puts money down MUST ALSO be booked on a tour (and thus your query to grab that information will take this rule into account).
If no query you ever wrote included data from more than one table, then you likely would not benefit much from enforcing relationships. However, the instant you start building queries with multiple tables, you MUST know the assumptions being made about that data before you can write a query. So do you allow customers with a deposit in the system without a tour booking or not? This rule decides how you must write that query. If RI is enforced, then you can query on a customer "booking" and you KNOW that it will be attached to a tour event. The same goes if a deposit does not need a booking, but you HAVE to know the assumption made about that data.
So the ONLY practical way to write a query that pulls this data is to base it on the assumptions made about the data. And if you enforce RI, then you at least know the data MUST be related and set up according to those assumptions.
At the end of the day? Anyone creating a database that models a business application and its rules without enforcing RI is building a ship without a rudder and without a compass.
And exporting data from each table is a NON issue. However, if that data is a mess and has orphaned records, then you only wind up exporting an incorrect data model to another database, and all of the issues and problems remain.
If you are building queries purely in SQL mode, defining the relationships probably doesn't make any difference for you. The only thing that might be useful is that if you built something, then didn't look at it again for a few months, you would be able to quickly re-acquaint yourself with the relationships conceptually.
For anyone using the Access query builder, defining the relationships allows you to quickly add tables to the query while Access automatically builds the proper (GIGO) relationships for the query JOIN. Again, if you are writing in SQL, you probably already do this, so not much help for you in query building.
Bottom Line - it's more of a graphical tool to streamline the query process, at least until you try exporting the tables to a "real" RDBMS, as someone else already mentioned.
If it's not a requirement for your use case, then you don't necessarily have to use this. A use case where this could be "required" is an order-based scenario.
Let's say you have a database that creates and tracks Orders. Each Order can have multiple Lines that are tied to the same Order. For normalization purposes, most people would separate these into two tables, OrderHead and OrderDetail. You would want to enforce Referential Integrity here to ensure that there is never a child record in OrderDetail that doesn't link back to a parent Order.
I'm sure that you could prevent things like that without it, but the relationship is what actually enforces it.
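If the relationship has not been enforced so far, a query along these lines can show whether orphaned detail rows already exist. This is only a sketch: it runs on Python's sqlite3 module rather than Access, the table names OrderHead and OrderDetail come from the post above, and the column names (`OrderID`, `DetailID`) are assumptions.

```python
import sqlite3

# Stand-in database with no enforced relationship; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderHead (OrderID INTEGER PRIMARY KEY);
CREATE TABLE OrderDetail (DetailID INTEGER PRIMARY KEY, OrderID INTEGER);
INSERT INTO OrderHead VALUES (1);
INSERT INTO OrderDetail VALUES (1, 1), (2, 1), (3, 7);  -- order 7 has no header
""")

# Detail lines whose parent order is missing: LEFT JOIN and keep the non-matches.
orphans = [
    row[0]
    for row in conn.execute(
        "SELECT d.DetailID FROM OrderDetail d "
        "LEFT JOIN OrderHead h ON h.OrderID = d.OrderID "
        "WHERE h.OrderID IS NULL "
        "ORDER BY d.DetailID"
    )
]
print(orphans)  # [3]
```

With Referential Integrity enforced, this query should always come back empty, which is the whole point.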
Relationships help in preserving data integrity, and I agree with your point that if the user is entering data through an Access form, the probability of integrity errors is lower. But if the user later moves from MS Access to a pure RDBMS, the relationships will definitely be helpful.
Although the objective of relationships is not migration at a later point in time, for your case that is one valid reason I could think of.
Other than that, for MS Access with its own forms, relationships may not add specific value.
We had an MS Access guru at our company who left for another position. Before she left she gave me a quick introduction on how to create queries from a sql server. I am really struggling with this and as I have no one to turn to at our company I was hoping you guys could help.
Hope you can help!
Thanks!
Well, keep in mind that when you build a query, it DOES NOT necessarily mean that an enforced relationship exists here (it might).
Furthermore, if you imported the tables, then again it's doubtful that relationships are defined in Access unless you used the relationships window to "enforce" such relationships.
However, when building a query? We will often join on two fields. When you build a query in the query builder, you are free to "make up" any kind of join you want.
Say I was given two different spreadsheets. One had some people, and another had a list of hotels.
Ok, so say we want to generate a list of all people in the same city as the hotels.
You might join table "People" to, say, "Hotels" on City.
However, WHAT happens if there is more than one state with the same City name?
Well, then just join on City AND State!!!
So you get this:
So I do not have related tables here. I just want, and need, to join the two tables of data.
As such, we never cared about, set up, or "had" a relationship defined; all we care about is building a working query.
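For what it's worth, the People and Hotels join can be sketched like this. It uses Python's sqlite3 module as a stand-in for the Access query builder, and the column names and sample rows are invented for the example. No relationship is defined anywhere; the join exists only in the query.

```python
import sqlite3

# Two unrelated tables, as if imported from two spreadsheets (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (name TEXT, city TEXT, state TEXT);
CREATE TABLE Hotels (hotel TEXT, city TEXT, state TEXT);
INSERT INTO People VALUES ('Ann', 'Springfield', 'IL'),
                          ('Bob', 'Springfield', 'MO'),
                          ('Cy',  'Portland',    'OR');
INSERT INTO Hotels VALUES ('Grand',  'Springfield', 'IL'),
                          ('Harbor', 'Portland',    'ME');
""")

# Joining on City alone also matches same-named cities in other states.
city_only = conn.execute(
    "SELECT p.name, h.hotel FROM People p "
    "INNER JOIN Hotels h ON p.city = h.city ORDER BY p.name"
).fetchall()

# Joining on City AND State keeps only the real matches.
city_and_state = conn.execute(
    "SELECT p.name, h.hotel FROM People p "
    "INNER JOIN Hotels h ON p.city = h.city AND p.state = h.state ORDER BY p.name"
).fetchall()

print(len(city_only), city_and_state)  # 3 [('Ann', 'Grand')]
```

In the Access query builder, the second version is simply two join lines drawn between the tables, one for City and one for State.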
So, don't confuse the simple act of building a query with having set up correct relationships between tables.
For a working application? Yes, you most certainly will set up relationships.
So, if you set up relationships correctly, then you will not be able to, say, add a customer "invoice" record without FIRST having a customer record. You don't have to do this, but it is a very good idea for a working application.
However, when dealing with imported data? You often may not have any pre-defined relationships.
Now, of course, in nearly all cases a query that involves multiple tables will "follow" what you defined as relationships in the relationships window, but that is not a requirement at all.
As noted, when building a working application? Then yes, of course you want to set up the relationships BEFORE you start adding data.
But for general data processing, and creating queries against, say, different tables of data you are slicing and dicing and working with?
You are free to cook up and draw lines between the tables in the query builder, and often such queries will have zero to do with the relationships you defined, or will be built when you don't have any relationships defined at all.
The above People and hotels list is a great example. I mean, it's rather cool that I simply joined on both City and State, and did not have to write one line of data processing code for my desired results (a list of people who live in the same city as one of the hotels on my list).
So don't confuse query joins with what we call "referential integrity" and defined relationships. We define these relationships so it becomes impossible for you the developer to add a customer invoice without first having added the customer. And it also means that neither you, nor your code, nor even someone editing the tables directly can make this occur.
However, when dealing with just reporting, or importing data to work on? Well, then often we will not have any relationships defined, but that sure does not stop us from firing up the query builder and drawing join lines between tables.
Between two given Tables you can have one relationship involving two (or more) fields or two (or more) relationships each involving one field. Both cases are possible and have different implications.
The first case, as the first commenter pointed out, is typically used when you have a compound key in the master Table of the relationship.
The second case is typically used when you have two candidate keys in the master table, each of which is used as a master field in each of the two independent relationships.
In MS Access, the case of two independent relationships can be identified because it shows two table boxes for the same table in the relationships pane.
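The "different implications" can be seen in a small sketch. It uses Python's sqlite3 module rather than Access, and the `master`, `child_pair` and `child_split` tables are contrived for the example: the master table has a compound key on (k1, k2), and each field is also unique on its own, so both kinds of relationship can be declared against it.

```python
import sqlite3

# Stand-in database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE master (
    k1 INTEGER NOT NULL UNIQUE,
    k2 TEXT    NOT NULL UNIQUE,
    PRIMARY KEY (k1, k2)
);
-- Case 1: ONE relationship involving two fields (compound key).
CREATE TABLE child_pair (
    k1 INTEGER, k2 TEXT,
    FOREIGN KEY (k1, k2) REFERENCES master (k1, k2)
);
-- Case 2: TWO relationships, each involving one field.
CREATE TABLE child_split (
    k1 INTEGER REFERENCES master (k1),
    k2 TEXT    REFERENCES master (k2)
);
INSERT INTO master VALUES (1, 'A'), (2, 'B');
""")


def is_rejected(sql):
    """Return True if the insert violates a relationship."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# (1, 'B') mixes fields from two different master rows.
pair_rejected = is_rejected("INSERT INTO child_pair VALUES (1, 'B')")    # no such pair
split_rejected = is_rejected("INSERT INTO child_split VALUES (1, 'B')")  # each value exists
```

With one two-field relationship, the pair as a whole must exist in the master table. With two one-field relationships, each value only has to exist somewhere, so a child row can point at two different master rows.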
First post for me so please bear with me if I'm short on providing enough info.
I'm trying to put a query together that will be used as a subform on a project expense data entry form. As part of the query I want Access to pull the correct tax rates to calculate PST & GST correctly in calculated fields within the query. I have a query that consists of 5 tables:
tblProjectExpenseLineItems (PK: ExpenseLineItemID, FK: ProjectExpenseID)
tblProjectExpenses (PK: ProjectExpenseID, FK: ProjectNoID)
tblProjects (PK: ProjectNoID, FK: ClientID)
tblClientList (PK: ClientID)
tblTaxRate
Query design looks like this: [image: qryProjectExpensesLineItemsExtended query design]
The picture doesn't show it but the PSTTaxRate field is set to [tblTaxRate]![TaxRate] and the PST field is just the PST calculation on the expense line item.
I manually joined [tblClientList].[Province] to [tblTaxRate].[TaxJurisdiction]. These aren't related in the database relationships since neither field is a primary key and I get the "indeterminate" relationship type. I have checked and confirmed that the values in these fields are in fact the same, so results do show when I run the query.
The query fields are primarily from the tblProjectExpenseLineItems table only, since this is the table I want to update through the subform (I've tried adding the different PKs to see if that would change anything but no such luck). The only reason I have the other four tables is to get the [tblClientList].[Province] field so that I can pick up the location of the client and know which tax to charge. Where I live we've had our PST change a few times recently, so I further filter the query using the [ExpenseDate] field to find the tax rate that fits between the [tblTaxRate].[StartDate] and [tblTaxRate].[EndDate] fields.
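To pin down the lookup being described, here is a rough sketch of the date-range filter on its own. It runs on Python's sqlite3 module, not Access, and it says nothing about whether the full query is updateable. The field names come from the post; the `lookup_rate` helper, the jurisdiction code 'XX' and the rates are invented for illustration.

```python
import sqlite3

# Stand-in for tblTaxRate with made-up rows; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTaxRate (TaxJurisdiction TEXT, TaxRate REAL, StartDate TEXT, EndDate TEXT);
INSERT INTO tblTaxRate VALUES ('XX', 0.05, '2010-01-01', '2016-12-31'),
                              ('XX', 0.06, '2017-01-01', '2099-12-31');
""")


def lookup_rate(province, expense_date):
    """Return the rate whose date range contains expense_date, or None."""
    row = conn.execute(
        "SELECT TaxRate FROM tblTaxRate "
        "WHERE TaxJurisdiction = ? AND ? BETWEEN StartDate AND EndDate",
        (province, expense_date),
    ).fetchone()
    return row[0] if row else None


rate_2016 = lookup_rate("XX", "2016-06-01")
rate_2018 = lookup_rate("XX", "2018-01-01")
```

The lookup only gives one answer per expense if the date ranges for a jurisdiction never overlap.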
Things I've checked/read into:
I've checked that the table relationships have been set, are related to primary keys, and have "Enforce Referential Integrity" checked.
I've tried deleting tblTaxRate out of the query since it doesn't have an actual relationship. The query still isn't updateable and Access prompts me for the TaxRate, StartDate and EndDate fields when I run the query
I found a very detailed post here Why is my query not updateable? about reasons why queries aren't updateable. I'm pretty new to Access so I was able to rule out most of them, but some of them I don't quite understand (maybe something to do with the one-to-many and many-to-one relationships?)
Deleting all the tables except for the one I want to update. This of course makes the query updateable but Access prompts me for all the fields related to trying to find the tax rate.
I thought maybe an easy way out is to just manually enter tax rates but the database is being used for invoicing so I'm trying to eliminate potential for user input error.
I also thought this would be easier if I used form controls to do the heavy lifting but the tax calcs show up in many forms so I was hoping to keep the calcs at the query level so that I don't have to keep writing the calcs for every form and instead just reference the same query.
I'm at a total loss. I have the query responding properly but I can't do any data entry which is the sole purpose of the query! Any help is much appreciated!!
Scott
I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities, which the organisation wants to track, hence all registered clients go into one list and individual tables spin off from that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by 'Name')
Social Work Register (list of social work clients with full details of each case)
Social Work Follow-up Table (used when staff call social work clients to see how their issue is progressing; the register has too many columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of which workshops they attended and why they were absent if they missed a session)
Individual workshop tables x14 (each workshop includes a test and these tables record the clients' answers and their score for each individual test; there will be more than 20 of these when the database is finished; all joined to the 'Participants Register' by 'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client Register' to the 'AllList' for data entry so that when a new client is registered it creates a new record in both tables, with the records matched together)
Participant Query (not yet attempted; as above, intended to link the 'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
People change names; for instance, in many cultures women change their family name after they get married.
There could have been a typo when entering the name, and it can be hard to correct if that data is used as a Foreign Key in several different tables.
As your database grows, you are likely to end up with some people having the same name, creating conflicts or forcing the user to alter a name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not meant to be edited, changed or even displayed to the user. Its sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
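Here is a small sketch of why joining on a name multiplies records while joining on a surrogate key does not. It uses Python's sqlite3 module instead of Access, and the `clients`/`register` tables and their contents are made up, with two different clients who share the name 'Sophoan'.

```python
import sqlite3

# Hypothetical tables: register rows carry both the client's ID and their name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE register (case_id INTEGER PRIMARY KEY, client_id INTEGER, client_name TEXT);
INSERT INTO clients VALUES (1, 'Sophoan'), (2, 'Sophoan'), (3, 'Dara');
INSERT INTO register VALUES (10, 1, 'Sophoan'), (11, 2, 'Sophoan'), (12, 3, 'Dara');
""")

# Join on the name: each 'Sophoan' client matches BOTH 'Sophoan' register rows.
rows_by_name = conn.execute(
    "SELECT c.id, r.case_id FROM clients c "
    "INNER JOIN register r ON r.client_name = c.name"
).fetchall()

# Join on the surrogate key: each register row matches exactly one client.
rows_by_id = conn.execute(
    "SELECT c.id, r.case_id FROM clients c "
    "INNER JOIN register r ON r.client_id = c.id"
).fetchall()

print(len(rows_by_name), len(rows_by_id))  # 5 3
```

The two extra rows in the name join are the same kind of duplicates described in the question (two clients with one name producing four records).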
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList suggests that its purpose isn't well thought out; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClientCase.ClientID
GROUP BY Client.Name
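To see what the LEFT JOIN buys you, here is a runnable sketch of the same report on Python's sqlite3 module (not Access). The Client and ClientCase names follow the answer; the sample rows are invented, including two different clients both called John Smith and one client with no cases.

```python
import sqlite3

# Made-up data: clients 1 and 2 share a name, client 3 has no cases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ClientCase (ID INTEGER PRIMARY KEY, ClientID INTEGER REFERENCES Client(ID));
INSERT INTO Client VALUES (1, 'John Smith'), (2, 'John Smith'), (3, 'Mary Jones');
INSERT INTO ClientCase VALUES (1, 1), (2, 1), (3, 2);
""")

# Grouping by name only: the LEFT JOIN keeps Mary Jones with a count of 0,
# but the two John Smiths are merged into one line.
by_name = conn.execute(
    "SELECT Client.Name, COUNT(ClientCase.ID) AS CountOfCases "
    "FROM Client LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID "
    "GROUP BY Client.Name ORDER BY Client.Name"
).fetchall()

# Grouping by the surrogate key as well keeps same-named clients apart.
by_id = conn.execute(
    "SELECT Client.ID, Client.Name, COUNT(ClientCase.ID) AS CountOfCases "
    "FROM Client LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID "
    "GROUP BY Client.ID, Client.Name ORDER BY Client.ID"
).fetchall()
```

Here `by_name` comes out as two lines (John Smith with 3 cases, Mary Jones with 0), while `by_id` gives three lines with counts 2, 1 and 0. If several clients can share a name, grouping by the ID as well is probably what you want.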
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.
We presently use a pen/paper based roster to manage table games staff at the casino. Each row is an employee, each column is a 20 minute block of time, and each cell represents what table the employee is assigned to, or alternatively that they've been assigned to a break. The start and end times of shifts vary between employees, as do the games/skills they can deal. We need to keep a copy of the rosters for 7 years, which is fairly easy with paper. I want to develop a digital application and am having difficulty working out how to store the data in a database for archiving.
I'm fairly new to working with databases, I think I understand how to model the data for a graph database like neo4j, but I had difficulty when it came to working with time. I've tried to learn about RDBMS databases like MySQL, below is how I think the data should be modelled. Please point out if I'm going in the wrong direction or if a different database type would be more appropriate, it would be greatly appreciated!
Basic Data
Here is some basic data to work with before we factor in scheduling/time.
Employee
- ID Number
- Name
- Skills (Blackjack, Baccarat, Roulette, etc)
Table
- ID Number
- Skill/Type (Can only be one skill)
It may be better to store the roster data as a file like JSON instead? Time sensitive data wouldn't be so much of a problem then. The benefit of going digital with a database would be queries, these could help assist time consuming tasks where human error is common.
Possible Queries
Note: Staff that are on shift are either on a break or on the floor (assigned to a table), Skills have a major or minor type based on difficulty to learn.
What staff have been on the floor for 80 minutes or more? (They are due for a break)
What open tables can I assign this employee to based on their skillset?
I need an employee that has the Baccarat skill but has not already been assigned to a Baccarat table.
What employee(s) was on this table during this period of time?
Where was this employee at this point in time?
Who is on shift right now?
How many staff on shift can deal Blackjack?
How many staff have 3 major skills?
What staff have had the Baccarat skill for at least 3 months?
These queries could also be sorted by alphabetical order or time, skill etc.
I'm pretty sure I know how to perform these queries with cypher for neo4j provided I model the data right. I'm not as knowledgeable with SQL queries, I've read it can get a bit complicated depending on the query and structure.
----------------------------------------------------------------------------------------
MYSQL Specific
An employee table could contain properties such as their ID number and Name, but am I right that their skills and shifts would be separate tables that reference the employee by a unique integer (I think this is called a foreign key)?
Another table could store the gaming Tables, these would have their own ID and reference a skill/gametype with a foreign key.
To record data like the pen/paper roster, each day could have a table with columns starting from 0000 increasing by 20 in value going all the way to 2340? Prior to the time columns I could have one for staff where each employee is represented with their foreign key, the time columns would then have foreign keys to the assigned gaming Tables, the row data is bound to have many cells that aren't populated since the employee shift won't be 24/7. If I'm using foreign keys to reference gaming Tables I now have a problem when the employee is on break? Unless I treat say the first gaming Table entry as a break?
I may need to further complicate things though. Management will over time try different gaming Table layouts, and some of the gaming Tables can be converted from say Blackjack to Baccarat. This is bound to happen quite a bit over 7 years. Would I want to be creating new gaming Table entries, or add a column that uses a foreign key to refer to a new table storing the history of game types during periods of time? Employees will also learn to deal new games during their career; very rarely they may also have a skill removed.
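For comparison with the one-table-per-day idea, here is a rough sketch of one possible relational layout: a single `assignment` table with one row per stretch of time, where a NULL table means the employee is on a break. Everything in it (table names, columns, the sample roster) is an assumption for illustration, and it runs on Python's sqlite3 module rather than MySQL.

```python
import sqlite3

# Hypothetical schema: one row per assignment instead of one column per 20 minutes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gaming_table (id INTEGER PRIMARY KEY, game TEXT);
CREATE TABLE assignment (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    table_id    INTEGER REFERENCES gaming_table(id),  -- NULL means "on break"
    start_at    TEXT NOT NULL,                        -- 'YYYY-MM-DD HH:MM'
    end_at      TEXT NOT NULL                         -- exclusive end
);
INSERT INTO employee VALUES (1, 'Alex'), (2, 'Sam');
INSERT INTO gaming_table VALUES (5, 'Blackjack');
INSERT INTO assignment VALUES
    (1, 5,    '2024-01-01 18:00', '2024-01-01 19:20'),
    (1, NULL, '2024-01-01 19:20', '2024-01-01 19:40'),
    (2, 5,    '2024-01-01 19:20', '2024-01-01 20:40');
""")


def staff_on_table(table_id, at):
    """Who was on this table at this point in time?"""
    rows = conn.execute(
        "SELECT e.name FROM assignment a "
        "JOIN employee e ON e.id = a.employee_id "
        "WHERE a.table_id = ? AND a.start_at <= ? AND ? < a.end_at "
        "ORDER BY e.name",
        (table_id, at, at),
    )
    return [r[0] for r in rows]


def location_of(employee_id, at):
    """Where was this employee at this point in time?"""
    row = conn.execute(
        "SELECT table_id FROM assignment "
        "WHERE employee_id = ? AND start_at <= ? AND ? < end_at",
        (employee_id, at, at),
    ).fetchone()
    if row is None:
        return "off shift"
    return "break" if row[0] is None else row[0]
```

For example, `staff_on_table(5, '2024-01-01 19:30')` returns `['Sam']` and `location_of(1, '2024-01-01 19:30')` returns `'break'`. The same start/end pattern could be applied to a table's game type or an employee's skills to keep their history.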
----------------------------------------------------------------------------------------
Neo4j Specific
With this data would I have an Employee and a Table node that have "isA" relationship edges mapping to actual employees or tables?
I imagine with the skills for the two types I would be best with a Skill node and establish relationships like so?: Blackjack->isA->Skill, Employee->hasSkill->Blackjack, Table->typeIs->Blackjack?
TIME
I find difficulty when I want this database to now work with a timeline. I've come across the following suggestions for connecting nodes with time:
Unix Epoch seems to be a common recommendation?
Connecting nodes to a year/month/day graph?
Lucene timeline? (I don't know much about this or how to work with it, have seen some mention it)
And some cases with how time and data relate:
Staff have varied days and start/end times from week to week, this could be shift node with properties {shiftStart,shiftEnd,actualStart,actualEnd}, staff may arrive late or get sick during shift. Would this be the right way to link each shift to an employee? Employee(node)->Shifts(groupNode)->Shift(node)
Tables and Staff may have skill data modified, with archived data this could be an issue, I think the solution is to have time property on the relationship to the skill?
We open and close tables throughout the day, each table has open/close times for each day, this could change in a month depending on what management wants, in addition the times are not strict, for various reasons a manager may open or close tables during the shift. The open/closed status of a table node may only be relevant for queries during the shift, which confuses me as I'd want this for queries but for archiving with time it might not make sense?
It's with queries that I have trouble deciding when to use a node or add a property to a node. For an Employee they have a name and ID number, if I wanted to find an employee by their ID number would it be better to have that as a node of it's own? It would be more direct right, instead of going through all employees for that unique ID number.
I've also come across labels just recently. I can understand that those would be useful for typing employee and table nodes rather than grouping them under a node. I think the shifts for an employee should continue to be grouped under a shifts node. If I were to run Cypher queries for employees working shifts during a time period, a label might be appropriate, but should it be applied to the individual shift nodes or to the shifts group node that links back to the employee? I might need to add a property to the individual shift nodes or to the relationship to the shifts group node? I'm not sure whether there should be a shifts group node at all; I'm assuming that reducing the number of edges connected to the employee node would be optimal for queries.
----------------------------------------------------------------------------------------
If there are any great resources for learning about database development, that'd be great. There is so much information and so many options out there that it's difficult to know what to begin with. Thanks for your time :)
Thanks for spending the time to put a quality question together. Your requirements are great and your specifications of your system are very detailed. I was able to translate your specs into a graph data model for Neo4j. See below.
Above you'll see a fairly self-explanatory graph data model. In case you are unfamiliar with this, I suggest reading Graph Databases: http://graphdatabases.com/. On this website you can get a free digital PDF copy of the book, but if you want a hard copy you can find it on Amazon.
Let's break down the graph model in the image. At the top you'll see a time indexing structure that is (Year)->(Month)->(Day)->(Hour), which I have abbreviated as Y M D H. The ellipses indicate that the graph continues, but to save space on the screen I've only shown a sub-graph.
This time index gives you a way to generate time series or ask certain questions on your data model that are time specific. Very useful.
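As a rough illustration of how such a Year -> Month -> Day -> Hour index behaves, here is a minimal in-memory sketch in Python. It is not Neo4j code: nested dictionaries stand in for the Y, M, D and H nodes, and the function names (make_time_tree, attach, events_on_day) and event labels are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def make_time_tree():
    """Nested mapping year -> month -> day -> hour -> list of attached events."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))

def attach(tree, when, event):
    """Hang an event off the hour it begins in, creating the path as needed."""
    tree[when.year][when.month][when.day][when.hour].append(event)

def events_on_day(tree, year, month, day):
    """Walk Year -> Month -> Day and collect events from every Hour below it, in hour order."""
    # .get() is used so that asking about a missing date does not create empty branches.
    hours = tree.get(year, {}).get(month, {}).get(day, {})
    return [event for hour in sorted(hours) for event in hours[hour]]

tree = make_time_tree()
attach(tree, datetime(2014, 1, 17, 20), "game-2")
attach(tree, datetime(2014, 1, 17, 9), "game-1")
attach(tree, datetime(2014, 1, 18, 9), "game-3")
day_events = events_on_day(tree, 2014, 1, 17)
```

Narrowing to a day only touches the branch for that day, which is the property that makes the time index useful for time series questions.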
The bottom portion of the image contains your enterprise data model for your casino. The nodes represent your business objects:
Game
Table
Employee
Skill
What's great about graph databases is that you can look at this image and semantically understand the language of your question by jumping from one node to another by their relationships.
Here is a Cypher query you can use to ask your questions about the data model. You can just tweak it slightly to match your questions.
MATCH (employee:Employee)-[:HAS_SKILL]->(skill:Skill),
(employee)<-[:DEALS]-(game:Game)-[:LOCATION]->(table:Table),
(game)-[:BEGINS]->(hour:H)<-[*]-(day:D)<-[*]-(month:M)<-[*]-(year:Y)
WHERE skill.type = "Blackjack" AND
day.day = 17 AND
month.month = 1 AND
year.year = 2014
RETURN employee, skill, game, table
The above query finds the sub-graph for all employees who have the skill Blackjack and their table and location on a specific date (1/17/14).
Doing this in SQL would be very difficult. The next thing you need to think about is importing your data into a Neo4j database. If you're curious about how to do that, please look at other questions here on SO, and if you need more help, feel free to post another question or reach out to me on Twitter #kennybastani.
Cheers,
Kenny
We don't have an existing data warehouse, but we have customers (in OLTP) who have been with us for many years and made purchases. How can I populate a customer dimension and then "replay" all the age updates that have occurred over the years, so that the type 2 dimension will have all the updates for those customers?
I want to populate the fact table with sales and refer to the DimCustomerFK, but when our clients query the data I want those customers to have the correct age. If I don't make any changes, a customer will have the same age now as 10 years back when he placed his first order.
Any ideas how this can be done?
Interesting problem Patrik.
Some options:
1) Design SQL to parse through your customer / transaction OLTP data and create a daily flat file of customer updates. You will end up with many thousands of fairly small files (obviously depending on the number of customers you have and the date range). Name them Customeryyyymmdd.csv. Then create an ETL suite to read in the flat files in forward date order and apply the type 2 changes in order to the DWH.
2) Build a very complex SQL query (I'm waving my hands around here, as I don't know your data structures and so couldn't suggest how complex this would be) that creates an ordered customer change list that you can pass through an ETL SCD component record by record.
Either seems logically feasible given what you have said. Hopefully that gives you some ideas to consider on the way to a more concrete solution.
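For the age attribute specifically, the ordered change list can be derived rather than recovered, because every change falls on a birthday. Below is a minimal Python sketch of that idea, assuming the OLTP data holds a birth date and the date the customer was first seen. All function names, column names (customer_id, age, valid_from, valid_to) and the 1 March fallback for 29 February birthdays are assumptions for illustration, not part of the original schema.

```python
from datetime import date

def birthday_in(year, birth_date):
    """Birthday in the given year; 29 Feb falls back to 1 Mar in non-leap years (an assumption)."""
    try:
        return birth_date.replace(year=year)
    except ValueError:
        return date(year, 3, 1)

def age_on(birth_date, day):
    """Whole years of age on the given day."""
    years = day.year - birth_date.year
    if day < birthday_in(day.year, birth_date):
        years -= 1
    return years

def replay_age_versions(customer_id, birth_date, first_seen, as_of):
    """Build type 2 rows, one per age, covering first_seen up to as_of."""
    rows = []
    valid_from = first_seen
    while valid_from <= as_of:
        age = age_on(birth_date, valid_from)
        next_birthday = birthday_in(birth_date.year + age + 1, birth_date)
        is_current = next_birthday > as_of
        rows.append({
            "customer_id": customer_id,
            "age": age,
            "valid_from": valid_from,
            # Exclusive end date; None marks the current version.
            "valid_to": None if is_current else next_birthday,
        })
        if is_current:
            break
        valid_from = next_birthday
    return rows

def version_for(rows, order_date):
    """Pick the dimension row that was valid on the order date, or None if there is none."""
    for row in rows:
        ends = row["valid_to"]
        if row["valid_from"] <= order_date and (ends is None or order_date < ends):
            return row
    return None

rows = replay_age_versions(42, date(1980, 6, 15), date(2010, 3, 1), date(2012, 12, 31))
order_row = version_for(rows, date(2011, 1, 10))
```

When loading the fact table, each sale would then be matched to the row returned by version_for for its order date, so an order placed years ago points at the age the customer had at that time.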
g/l
Mark.