Why An Address Data Is An Entity? - relational-database

I always think that an address data is a value object since it is immutable and its equality is defined by the same data in all fields. For example, a billing address in a part of a payment and a shipping address is a part of an order or a fulfillment. When someone changes her/his address, a new address data is needed. But, every single sample code/application, I have run into, has an address data as an entity, which its DB table has its own ID. It would make a sense if a system wants to keep track of all addresses where all business activities/events occur. I, however, don't see such intention in those sample code/application. Do I miss something in the regard?

You can't generalize.
Examples are one thing, real world problems are another. You can't say that for all projects one solution fits it all.
I'll give you an example I had in a project conserning aggregate roots.
Logically and legally a subsidiary is an extension of its company, eg. Walmart has its HQ with tax number and everything and subsidiaries without tax number where the actual stuff is sold. Logically, for applying to a goverment funding or something similar, the HQ sends a request for its subsidiary. Here, Walmart HQ is an aggregate root and its subsidiary is a part of an aggregate in funding procedures.
This is a logical example.
What I had is that a subsidiary can legally apply for state funding without the knowledge of HQ! Therefor, HQ is not an aggregate root anymore, but a subsidiary is. It was extremely illogical, but those were the business requirements.
The point is the same with your value object question. Although you can use Address as an example that it is an entity or a value object, it is the requirements of the business that dictate what an address is, and not what is logical.

Pre-note: there are domains where an address should be an entity, like a mail service; we do not talk about those domains
From my experience, people tend to implement an address as an entity because of the persistence: it is easier to persist an address as a sub-entity to a relational database than to persist a value object because of the entities ID that act as primary keys in the storage table.
However, there are tactics that permit storing a value object as an database entity but still using it just as a value object, as it should be. Vaughn Vernon shows how to do this in his book, Chapter 6, sub-chapter Persisting Value Objects.

Related

Database design & normalization

I'm creating a messaging system for a e-learning platform and there are some design concerns that I'd like some feedback on.
First of all, it is important for me and my system to be highly modifiable in the future. As such, maintaining a fairly high normalization across my tables is important.
On to how my system will work:
All members (students or teachers) are part of a virtual classroom.
Teachers can create tasks and exercises in these classrooms and assign them to one or multiple students (member_task table not illustrated).
A student can request help for a specific task or exercise by sending a message to the teachers of the classroom.
Messages sent by students are sent to all the teachers. They cannot address a message to a specific teacher.
Messages sent by teachers can be addressed to one or more students.
Students cannot send messages to other students.
Messages behave like chat, meaning that a private conversation starts between a student and all teachers when they send a message.
Here's the ER diagram I made:
So my question is, is this table normalized properly for my purpose? Is there anything that can be done to reduce redundancy of data across my tables? And out of curiosity, is it in BCNF?
Another question: I don't intend to ever implement delete features anywhere in my system. Only "archiving" where said classroom/task/member/message/whatever is simply hidden/deactivated. So is there any reason to actually use FK?
EDIT: Also, a friend brought to my attention that the Conversations table might be redundant, and it kinda feels so. Thoughts?
Thanks.
In response to your emphasis on "modifiability" which I'm taking to mean with respect to application and schema evolution I'm actually going to suggest a fairly extreme solution. Before that some notes some aspects you've mentioned. First, foreign keys represent meaningful constraints in your data. They should always be defined and enforced. Foreign keys are not there just for cascading delete. Second, the Conversations table is arguably redundant. It would make sense if you had a notion of "session" of chat which would correspond to a Conversation. Otherwise, you just have a bunch of messages throughout time. The Conversation table could also enable a many-to-many relation between messages and tasks/exercises if you wanted to have chats that simultaneously covered multiple exercises, for example.
Now for the extreme suggestion. You could use 6NF. In particular, you might look at its incarnation in anchor modeling. The most notable difference in this approach is each attribute is modeled as a different table. 6NF supports temporal databases (supported in anchor modeling via "historized" attributes/ties). This means handling situations like a student being associated to a task now but not later won't cause all their messages to disappear. Most relevant to you, all schema modifications are non-destructive and additive, so no old code breaks when you make a change.
There are downsides. First, it's a bit weird, and in particular anchor modeling (somewhat gratuitously?) introduces a bunch of new terms. Second, it produces weird queries for most relational databases which they may not optimize well. This can sometimes be resolved with materialized views. Third, at the physical level, every attribute is effectively nullable. Finally, the tooling and support, while present, is pretty young. In particular, for MySQL, you may only be "inspired by" what's provided on the anchor modeling site.
As far as the actual database model would go, it would look roughly similar. Anchor modeling uses the term "anchor" for roughly the same thing as an entity, and "tie" for roughly the same thing as a relation. For simplicity, dropping the Conversation relation (and thus directly connecting Message to Task), the image would be similar: you'd have an anchor for Classroom, Member, Message, and Task, and a tie replacing Recipient that you might called ReceivedMessage representing the relation of "member received message message". The attributes on your entities would be attribute nodes. Making the message attribute on the Message anchor historized would allow messages to be edited if desired and support a history of revisions.
One concern I have is that I don't see a Users table which will hold all the students and teachers info (login, email, system id, role, etc) but I assume there is something similar in our system?
Now, looking into the Members table: usually students change classes every semester or so and you don't want last semesters' students to receive new messages. I would suggest the following:
Members
=============
PK member_id
FK class_id
FK user_id
--------------
join_date
leave_date
active
role
The last two fields might be redundant:
active: is an alternative solution if you want to avoid using dates. This will become false when a user stops being member of this class. Since there is not delete feature, the Members entry has to be preserved for archive purposes (and historical log).
role: Depends on how you setup Users table and roles in your system. If a user entry has role field(s) then this is not needed. However, this field allows for the same user to assume different roles in different classes. Example: a 3rd year student, who was a member of this class 2 years ago, is now working as TA/LA (teaching/lab assistant) for the same class. This depends on how the institution works... in my BSc we had the "rule": anyone with grade > 8.5/10 in Java could volunteer to do workshops to other students (using uni's labs). Finally, this field if used as a mask or a constant, allows for roles to be extended (future-proof)
As for FKs I will always suggest using them for data consistency. Things can get really ugly really fast without FKs. The limitations they impose can be worked around and they are usually needed: What is the purpose of archiving a message with sender_id if the sender has been deleted by accident? Also, note that in most systems FKs are indexed which improves the performance of queries/joins.
Hope the above helps and not confuse things :)

Aggregates that require sharing of an entity

Consider my scenario of a model consisting of two aggregate roots, Customer and Order as well as a "shared" entity Address.
Also note that Address is abstract has the following subclasses: PhysicalAddress, PostOfficeBoxAddress and PrivateBagAddress.
A Customer can have many addresses organized into some sort of address book.
Upon making an order a customer would select an Address from their address book to be used as a delivery address.
My initial thoughts were to share an address between the two entities, but I have since opted out as it will cause trouble with managing the respective invariants.
Another option I could go for is to create two hierarchies of Address, each for their purposes as a customer address or delivery address. This again doesn't seem right as there is a lot of repeated code.
How would I model this situation properly?
An entity is something that should be able to exist by ifself such as a customer or order. However an Address is not an entity, an Address is a value type and cannot exist on its own hence:
An Address can only exist as a value type that is aggregated within an entity
You can define your address hierarchy as a value type in your domain but many entities may use this.
We find that we come across these types of entities all the time such as Address, MoneyType etc.
Solution would be to create 1 Address hierarchy value type in your domain. Then any entity can have an Address as a property where applicable

Best strategy for storing order's addresses

I have a 'strategy' question.
Thing is, we have a table of customers' addresses and customer orders. Structure is something like (just an example, ignore filed types etc.):
Address
id INT
line1 TEXT
line2 TEXT
state TEXT
zip TEXT
countryid INT
To preserve historical validity of the data we are storing those addresses in a text field with orders (previously it was done by reference, but this is wrong because if address changes all old orders change delivery address too, which is wrong). E.g:
Orders
id INT
productid INT
quantity INT
delivery_address TEXT
delivery address is something akin to CONCAT_WS("\n",line1,line2,state,zip,country_name)
Everything is nice and dandy, however it seems that customers need an access to historical data and be able to export those in XML format and they want to have those lines split up properly again. Because sometimes there is no line2 or state or zip or whatever, how can we store this information in a way that we can then decipher the 'label' of each line?
Storing as JSON encoded array was suggested but is this a best way? I thought about storing it as XML... or maybe create those 6-10 extra columns and store address data with every order? Perhaps some of you guys have more experience in dealing with this kind of stuff and be able to point me in the right direction.
Thanks in advance!
Personally I would model the addresses as a single table, every update to the address would generate a new row, this would be marked as the current address.
I guess you could allow deletes if there are no related orders, however it would be simpiler to mark the old record as inactive.
This will allow you to preserve the relationship between orders & addresses,
and to easily query the historic data at a later date.
see the wikipedia entry for slowly changing dimensions
The best way IMHO is to add history to the address-table. This will cause extra elements to be added for its key (say address_id and {start_of_validity, end_of_validity}) The customer id than becomes a foreign key into the customer table. The orders table references only the address_id field (which is "stable" in time). New orders would reference the "current" row in address.
NB: I dont know json.
You should store those as 6-10 extra fields, just like you do in the current address. You see, that way you have every piece of information at hand, without having to parse anything.
Any other approach (concatenation, JSON, XML) will make you have to do parsing when you need to access the info.
when you say "previously it was done by reference, but this is wrong because if address changes all old orders change delivery address too, which is wrong", it was not that wrong ...
Funny, isn't it?
So, as proposed by others, adresses should (must?) be stored in an independant table. You'll then have different address types (invoicing, delivery), address status (active, inactive) and a de facto address history log ...
In order to be able to utilize the address data for future uses you will definitely want to retain as much metadata (meaning, fields such as Address, City, State, and ZIP). Losing this data by pulling it all into a single line looks simpler and may conserve a small amount of space but in the end is not the best method. In fact, breaking it apart is very difficult--much like separating out first and last names from a generic, one-size-fits-all "name" column. Having the data stored in complete entries, utilizing 6-10 new fields (as mentioned) is the best way to go.
Even better would be standardizing the addresses (at least the US addresses) when they are first entered. That would ensure that the address is real and deliverable and eliminate shipping issues in the future. My thoughts, always retain as much of the data as possible because storage is cheap and data is valuable.
In the interest of full disclosure, I am the founder SmartyStreets. We provide street address verification.

MySQL database model for signups with and without addresses

I've been thinking about this all evening (GMT) but I can't seem to figure out a good solution for this one. Here's the case...
I have to create a signup system which distinguishes 4 kinds of "users":
Individual sign ups (require address info)
Group sign ups (don't require address info)
Group contact (require address info)
Application users (don't require address info)
I really cannot come up with a decent way of modeling this into something that makes sense. I'd greatly appreciate it if you could share your ideas.
Thanks in advance!
Sounds like good case for single table inheritance
Requiring certain data is more a function of your application logic than your database. You can definitely define database columns that don't allow NULL values, but they can be set to "" (empty string) without any errors.
As far as how to structure your database, have two separate tables:
User
UserAddress
When you have a new signup that requires contact info, your application will create records in both tables. When a new signup doesn't require address info, your application will only create a record in the User table.
There are a couple considerations here: first, I like to look at User/Group as a case of a Composite pattern. It clearly meets the requirement: you often have to treat the aggregate and individual versions of the entity interchangeably (as you note). Implementing a composite in a database is not that hard. If you are using an ORM, it is pretty simple (inheritance).
On the other part of the question, you always have the ability to create data structures that are mostly empty. Generally, that's a bad idea. So you can say 'well, in the beginning, we don't have any information about the User so we will just leave all the other fields blank.' A better approach is to try and model the phases as if they were part of an FSM. One of the clearest ways to do this in this particular case is to distinguish between Users, Accounts and some other more domain-specific entity, e.g. Subscriber or Customer. Then, I can come and browse using User, sign up and make an Account, then later when you want address and other personal information, become a subscriber. This would also imply inheritance, and you have the added benefit of being able to have a true representation of the population at any time that doesn't require stupid shenanigans like 'SELECT COUNT(*) WHERE _ not null,' etc.
Here's a suggestion from my end after weighing pro's and con's on this model. As I think the ideal setup is to have all users be a user entity that belong to a group without differentiating groups from individuals (except of course flag a group contact person and creating a link with a groups table) we came up with the alternative to copy the group contact user details to the group members when they group is created.
This way all entities that actually are a person will get their own table.
Could this be a good idea? Awaiting your comments :)
I've decided to go with a construction where group members are separated from the user pool anyway. The group members eventually have no relation with a user since they don't require access to mutating their personal data, that's what a group contact person is for. Eventually I could add a possibility for groups to have multiple contact persons, even distinguishing persons that are or are not allowed to edit any member data.
That's my answer on this one.

Object relationships

Suppose you have two objects, Person and Address and they have a relationship, that is a Person has an address. Obviously the person object has a reference to the address object but does the address object really have to know about the Person object? What gives that it has to be a two way relationship?
Thanks!
In this case it looks like your Address object doesn't need to have a relation to the Person object.
As a rule, you can think whether an object "needs to know" the other in order to work. A Person needs to know its Address, while an Address doesn't need to know the Person it belongs to.
"need to know" here reflects the need to interact with the object in a method.
It all depends on the use of the objects. If you have a situation where you must take an address and show (or use) its person (or persons) it will be a two-way relationship. But if you never need to access a person given an address then you will not need a two way relationship.
Other example: if those objects are associated with database tables and those table have a relationship (say Person.id == Address.IdPerson) then it will be useful to have a two way relationship in the classes for inserts and updates.
The only type of situation I can think of that it would be beneficial is if you had an external reference to an Address and you wanted to know who lived there. Even then, I think a separate associative data structure that maps Address -> Person would be a better design.
Other than that, in the relationship you describe there is no reason it has to be a two-way relationship. There's no rule that says an Address needs a reference to a Person.
They way to think about it is does my object depend on a property to exist. So in this particular scenrio, Person depends on an Address therefore it needs to know about the address details. The Address on the other hand does not depend on the Person hence can exist without knowledge of the Person.
If you need to access the person via the Address I would suggest implementing a function in the Address object that you can call which will retrieve the Person(s) relating to that particular address.
James.
In my opinion the Address object doesn't know about the Person object at all.
If it does then it interduces thight coupling to the system.
If you have to find a Person by it's Address you create a Pepole contianer that has a find method that searches by Address.
An Address doesn't have to know of the existance of Person, I think this is not logic in a real world: an address is just a group of information (say street, town, etc), it can be used by a letter or by a person, but in each case it still remain what it is, without knowing its surroundings.
It should only care of information strictly related to him.
What gives that it has to be a two way
relationship?
The only thing I can come up with that would be a motivation for an Address to know about other objects would be if you are in dire need of caching. Let's say you have some form of application where searches are made all the time. But that's pretty much it.
The complexity of this problem depends on the business domain and what you mean by address. In the majority of systems this is simplified to be a person object who has a relationship to an address object representing where they stay and if two people stay at the same address they address will be duplicated.
If you want to have a more complex real world model then consider changing to terminology to locations (which may even be represented spatially).
Considerations
A person may be related to zero or more locations (consider summer and winter homes)
A location exists regardless of having people related to it
If a person moves home consider changing the relation ( used to live at ) rather than the location. This provides a great deal of flexibility it the future
Multiple people can be related to the same location. This makes it very simple to find everyone who currently (or previously) stayed at a location
Storing locations and relationships provides great flexibility but it comes with an overhead, so you need to decide if your design needs this level of flexibility.