I'm trying to figure out how Salesforce's metadata architecture works behind the scenes. There's a video they've released ( https://www.youtube.com/watch?v=jrKA3cJmoms ) where the presenter goes through many of the important tables that drive it (about 18m in).
I've figured out the structure for the basic representation / storage / retrieval of simple stuff, but where I'm hazy is how the relationship pivot table works. I'll be happy when:
a) I know exactly how the pivot table relates to things (the RelationId column he mentions is not clear to me)
b) I can construct a query for it.
Screenshot from the video
I've had no luck finding resources that describe it at this level of detail, nor any packages that emulate it that I could learn from.
Does anyone have any low-level experience with this part of Salesforce that could help?
EDIT: Thank you, David Reed, for the further details in your edit. So presumably you agree that things aren't exactly as explained?
In the 'value' column, the GUID of the related record is stored
This allows easy fetching of to-one related records and, with a little bit of simple SQL switching, resolving a group of records in the reverse direction.
I believe Salesforce doesn't have true many-to-many relationships (a 'junction' object is used instead), so the above is still relevant
I guess now, though, I wonder what the point of the pivot table is at all, as there's a very simple relationship going on here. Unless the lack of an index on the value columns dictates the need for one...
Or, could it be more likely/useful if:
The record's value column stores a GUID to the relationship record and not directly to the related record?
This relationship record holds all necessary information required to put together a decent query and ALSO includes the GUID of the related record?
Neither option clears up the ambiguity for me, unless I'm missing something.
You cannot see, query, or otherwise access the internal tables that underlie Salesforce's on-platform schema. When you build an application on the platform, you query relationships using SOQL relationship queries; there are no pivot tables involved in the work you can see and do on the platform.
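For illustration, this is the level you work at on the platform: SOQL relationship queries against standard or custom objects, with any pivot machinery invisible to you. Two standard-object examples (Account/Contact):

Child-to-parent, via dot notation:

SELECT Name, Account.Name FROM Contact

Parent-to-child, via a nested subquery:

SELECT Name, (SELECT LastName FROM Contacts) FROM Account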
While some presentations and documentation discuss at some level the underlying implementation, the precise details of the SQL tables, schemas, query optimizers, and so on are not public.
As a Salesforce developer, or a developer who interacts with Salesforce via the API, you almost never need to worry about the underlying SQL implementation used on Salesforce's servers. The main point at which that knowledge becomes helpful is when you are working with massive data volumes (many millions of records). The most helpful documentation for that use case is Best Practices for Deployments with Large Data Volumes. The underlying schema is briefly discussed under Underlying Concepts. But bear in mind:
As a customer, you also cannot optimize the SQL underlying many application operations because it is generated by the system, not written by each tenant.
The implementation details are also subject to change.
Metadata Tables and Data Tables
When an organisation declares an object’s field with a relationship type, Force.com maps the field to a Value field in MT_Data, and then uses this field to store the ObjID of a related object.
I believe the documentation you mentioned is using the identifier ObjId ambiguously; here it actually means what the document earlier refers to as GUID - the Salesforce Id. Another paragraph states:
The MT_Name_Denorm table is a lean data table that stores the ObjID and Name of each record in MT_Data. When an application needs to provide a list of records involved in a parent/child relationship, Force.com uses the MT_Name_Denorm table to execute a relatively simple query that retrieves the Name of each referenced record for display in the app, say, as part of a hyperlink.
This also doesn't make sense unless ObjId is being used to mean what is called GUID in the visual depiction of the table above in the document - the Salesforce Id of the record.
Related
I need to store an organisation ownership hierarchy in a Laravel backend. Each node in the hierarchy can be one of a number of types, and each relationship needs to carry the amount of ownership (and potentially more metadata relating to the relationship between nodes). The structure can be arbitrarily deep, and it must be possible to attach a subtree an arbitrary number of times (see C1 below, which appears twice). Below is a sketch of the kind of hierarchy I need....
I am using MySQL 8, so I have access to CTEs for recursion. I have looked into the adjacency-list package (staudenmeir/laravel-adjacency-list), which uses CTEs and looks good, but it uses self-referencing tables. I think this means that I cannot store relationship data, and I don't think I can get the repeated subtree structure you see above.
I am currently exploring many-to-many relationships, with a custom pivot table to store the "relationship weighting" - a sketch of what I mean is below. But I am unsure if this is a sensible approach, and perhaps I'm missing some useful design pattern for this.
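Concretely, the pivot I'm exploring looks something like this (just a sketch - the table and column names are made up, and weight stands in for the ownership percentage):

CREATE TABLE nodes (
    id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(50)  NOT NULL,  -- company, trust, person, ...
    name VARCHAR(255) NOT NULL
);

CREATE TABLE node_edges (
    parent_id BIGINT UNSIGNED NOT NULL,
    child_id  BIGINT UNSIGNED NOT NULL,
    weight    DECIMAL(5,2)    NOT NULL,  -- % ownership carried by this edge
    PRIMARY KEY (parent_id, child_id),
    FOREIGN KEY (parent_id) REFERENCES nodes(id),
    FOREIGN KEY (child_id)  REFERENCES nodes(id)
);

-- Walking down from a root node with a recursive CTE (MySQL 8):
WITH RECURSIVE tree AS (
    SELECT child_id, weight, 1 AS depth
    FROM node_edges
    WHERE parent_id = 1
    UNION ALL
    SELECT e.child_id, e.weight, t.depth + 1
    FROM node_edges e
    JOIN tree t ON e.parent_id = t.child_id
)
SELECT * FROM tree;

Because a node can hang under several parents, this models a DAG rather than a strict tree, which is what would let C1 appear twice.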
I am aware that this is a nebulous question, but while I'm trying to crack this myself using Eloquent relationships, I thought I might get a discussion going about design patterns for this type of work.
I have three tables in my SQL schema: Clients (with address and so on), Orders (with order details), and Files (which stores uploaded files). Both the Files table and the Orders table contain foreign keys referencing the Clients table.
How would I do that in IndexedDB? I'm new to this whole key-index thinking and would just like to understand how the same thing would be done with IndexedDB.
Now I know there is a shim.js file, but I'm trying to understand the concept itself.
Help and tips highly appreciated!
EDIT:
So I would really have to think about which queries I want to allow, and then optimize my IndexedDB implementation for those queries - is that the main point here? Basically, I want to store a customer once and then many orders for that customer, and then be able to upload small files (preferably PDFs) for that customer, not even necessarily for each order (although if that's easy to implement, I may do it)... I see every customer as a separate entity; I won't have things like "give me all customers who ordered xy" - I only need to have each customer once and then store all the orders and all the files for the customer. I want to be able to go: search for the customer with the name XY - which then gives me a list of all orders and their dates, and a list of the files uploaded for that customer (maybe associated to the order).
This question is a bit too broad to answer correctly. Nevertheless, the major concept to learn when transitioning from SQL to No-SQL (indexedDB) is the concept of object stores. Most SQL databases are relational and perform much of the work of optimizing queries for you. indexedDB does not. So the concepts of normalization and denormalization work a bit differently. The focal point is to explicitly plan your own queries. Unlike the design of an app/system that allows simple ad-hoc SQL queries that are designed at a later point in time, and possibly even easily added/changed at a later time, you really need to do a lot of the planning up front for indexedDB.
So it is not quite safe to say that the transition is simply a matter of creating three object stores to correspond to your three relational tables. For one, there is no concept of joining in indexedDB so you cannot join on foreign keys.
It is not clear from your question but your 3 tables are clients, orders, and files. I will go out on a limb here and make some guesses. I would bet you could use a single object store, clients. Then, for each client object, store the normal client properties, store an orders array property, and store a files array property. In the orders array, store order objects.
If your files are binary, this won't work; you will need to use Blobs, and you may even encounter issues with Blob support across the various browsers' IndexedDB implementations (Chrome sort of supports it, but it is unclear from version to version).
This assumes your typical query plan is that you need to do something like list the orders for a client, and that is the most frequently used type of query.
If you needed to do something across orders, independent of which client an order belongs to, this would not work so well and you would have to iterate over the entire store.
If the clients-orders relation is many-to-many, then this also would not work so well, because of the need to store the order info redundantly per client. However, one note here: this redundant storage is quite common in NoSQL-style databases like IndexedDB. The goal is not to perfectly model the data, but to store the data in such a way that your most frequently occurring queries complete quickly (while still maintaining correctness).
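To make the single-store idea above concrete, a client object might look like this (a sketch; all property names are my assumptions):

// one object in the 'clients' store, with orders and files nested inside
var client = {
  id: 123,
  name: 'Example Client',
  orders: [
    { orderId: 1, date: '2024-01-05', total: 99.50 },
    { orderId: 2, date: '2024-02-11', total: 12.00 }
  ],
  files: [
    { fileName: 'invoice.pdf', data: someBlob } // Blob - see the caveat above
  ]
};
clientsStore.put(client); // clientsStore being an IDBObjectStore from a transaction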
Edit:
Based on your edit, I would suggest a simple prototype that uses three object stores. In your client view page, where you display client details, simply run three separate queries (a code sketch follows below):
1. Get the one entity from the clients object store based on client id.
2. Open a cursor over the orders store and get all orders for the client. In the orders store, use a client-id property, create an index on it, and open the cursor over the index for a specific client id.
3. Open a cursor over the files store using a similar tactic as #2.
In your bizlogic layer, enforce your data constraints. For example, when deleting a client, first delete all the files from the files store, then delete all the orders from the orders store, and then delete the single client entity from the client store.
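A minimal sketch of that prototype (store, index, and field names are my assumptions, not from your schema):

var request = indexedDB.open('appdb', 1);

request.onupgradeneeded = function (event) {
  var db = event.target.result;
  db.createObjectStore('clients', { keyPath: 'id' });
  // orders and files each carry a clientId property, indexed for lookups
  db.createObjectStore('orders', { keyPath: 'id' }).createIndex('clientId', 'clientId');
  db.createObjectStore('files',  { keyPath: 'id' }).createIndex('clientId', 'clientId');
};

request.onsuccess = function (event) {
  var db = event.target.result;
  var tx = db.transaction(['clients', 'orders'], 'readonly');

  // Query 1: the single client entity by its id
  tx.objectStore('clients').get(123).onsuccess = function (e) {
    console.log('client:', e.target.result);
  };

  // Query 2: cursor over the clientId index for that client's orders
  // (query 3, over 'files', is identical in shape)
  tx.objectStore('orders').index('clientId')
    .openCursor(IDBKeyRange.only(123)).onsuccess = function (e) {
      var cursor = e.target.result;
      if (cursor) {
        console.log('order:', cursor.value);
        cursor.continue();
      }
    };
};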
What I am suggesting is to not overthink it. It is not that complicated. So far you have not described something that sounds like it will have performance issues so there is no need for something more elegant.
I will go with Josh's answer, but if you are still finding it hard to use IndexedDB and want to continue using SQL, you can use sqlweb - it lets you run operations inside IndexedDB using SQL queries.
e.g.:
// open a connection through the JsStore worker (path elided in the original)
var connection = new JsStore.Instance('jsstore worker path');

// run plain SQL; sqlweb translates it into IndexedDB operations
connection.runSql("select * from Customers").then(function(result) {
    console.log(result);
});
Here is the link - http://jsstore.net/tutorial/sqlweb/
Below, I explain a basic design for a database I am working on. As I am not a DBA, I am concerned whether I am on a good track or a bad one, so I wanted to float this on Stack for some advice. I was not able to find a similar discussion that fits my design.
In my database, every table is considered an entity. An entity could be a customer account, a person, a user, a set of employee information, contractor information, a truck, a plane, a product, a support ticket, etc. Here are my current entities (tables)...
People
Users
Accounts
AccountUsers
Addresses
Employee Information
Contractor Information
And to store information about these Entities I have two tables:
Entity Tables
-EntityType
-> EntityTypeID (INT)
-Entities
-> EntityID (BIGINT)
-> EntityType (INT) : foreign key
Every table I have made has an auto-generated primary key, and a foreign key on an EntityID column referencing the Entities table.
In the Entities table I have some shared fields like:
DateCreated
DateModified
User_Created
User_Modified
IsDeleted
CanUIDelete
I use triggers on all of the tables to automatically create their entity entry, with the correct entity type, on insert; update triggers maintain the last-modified date.
From an application-layer point of view, all the code has to worry about is the individual entities (except for the User_Created/User_Modified fields, which it updates by joining on the EntityID).
Now the reason for the entities table, is down the line I plan on having an EAV model, so every entity type can be extended with custom fields. It also serves as a decent place to store metadata about the entities (like the created/modified fields).
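A minimal sketch of the shape I mean (names as above; exact types and auto-generation syntax would depend on the RDBMS):

CREATE TABLE EntityType (
    EntityTypeID INT PRIMARY KEY,
    TypeName     VARCHAR(100) NOT NULL
);

CREATE TABLE Entities (
    EntityID      BIGINT PRIMARY KEY,   -- auto-generated in practice
    EntityType    INT NOT NULL,
    DateCreated   DATETIME NOT NULL,
    DateModified  DATETIME NOT NULL,
    User_Created  BIGINT,
    User_Modified BIGINT,
    IsDeleted     SMALLINT NOT NULL DEFAULT 0,
    CanUIDelete   SMALLINT NOT NULL DEFAULT 1,
    FOREIGN KEY (EntityType) REFERENCES EntityType(EntityTypeID)
);

-- every concrete table points back at Entities, e.g.:
CREATE TABLE People (
    PersonID  BIGINT PRIMARY KEY,       -- auto-generated in practice
    EntityID  BIGINT NOT NULL,
    FirstName VARCHAR(100),
    LastName  VARCHAR(100),
    FOREIGN KEY (EntityID) REFERENCES Entities(EntityID)
);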
I'm just new to DB design, and want a 2nd opinion.
I plan on having an EAV model, so every entity type can be extended with custom fields.
Why? Do all your entities need to be extensible in this way? Probably not - in most applications there are at most one or two entities that would benefit from this level of flexibility. The other entities actually benefit from the stability and clarity of not changing all the time.
EAV is an example of the Inner-Platform Effect:
The Inner-Platform Effect is a result of designing a system to be so customizable that it ends up becoming a poor replica of the platform it was designed with.
In other words, now it's your responsibility to write application code to do all the things that a proper RDBMS already provides, like constraints and data types. Even something as simple as making a column mandatory like NOT NULL doesn't work in EAV.
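To see why, here is a generic EAV table; every attribute of every entity funnels through one nullable, untyped value column (names are illustrative):

CREATE TABLE entity_attributes (
    entity_id  BIGINT       NOT NULL,
    attr_name  VARCHAR(100) NOT NULL,
    attr_value VARCHAR(255),            -- everything is a string
    PRIMARY KEY (entity_id, attr_name)
);

-- There is no declarative way to say "attr_value must be NOT NULL and
-- NUMERIC when attr_name = 'price', but optional when attr_name = 'color'".
-- All of those rules move into application code.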
It's true sometimes a project requires a lot of tables. But you're fooling yourself if you think you have simplified the project by making just two tables. You will still have just as many distinct Entities as you would have had tables, but now it's up to you to keep them from turning into a pile of rubbish.
Before you invest too much time into EAV, read this story about a company that nearly ceased to function because someone tried to make their data repository arbitrarily flexible: Bad CaRMa.
I also wrote more about EAV in a blog post, EAV FAIL, and in a chapter of my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You haven't really given a design. If you had given a description of the tables, the application-oriented criterion for when a row goes in each of them, and the consequent constraints (keys, FKs, etc.) for the part of your application involving your entities, then you would have given part of a design - that part's straightforward relational design. (Just because you're not implementing it that way doesn't mean you don't need to design properly.) Notice that this must include the application-level state and functionality for "extending with custom fields". But then you would also have to give a description of the tables, the criterion for when a row goes in each of them, and the consequent constraints (keys, FKs, etc.) for the part of your implementation that encodes the previous part via EAV, plus the operators for manipulating them - that part's straightforward relational design. That is the part of your design that implements a DBMS. Then you would really have given a design.
The notion that one needs to use EAV "so every entity type can be extended with custom fields" is mistaken. Just implement this via calls that sometimes update metadata tables, instead of only updating regular tables: DDL instead of DML.
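For example (a sketch - custom_fields here is an assumed metadata catalog of your own, not a standard table):

-- record the custom field in your own metadata table (DML)...
INSERT INTO custom_fields (entity_table, column_name, column_type)
VALUES ('Contractors', 'LicenseNumber', 'VARCHAR(50)');

-- ...and extend the regular table itself (DDL)
ALTER TABLE Contractors ADD COLUMN LicenseNumber VARCHAR(50);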
I am working on a project which involves building a social network-style application allowing users to share inventory/product information within their network (for sourcing).
I am a decent programmer, but I am admittedly not an expert with databases; even more so when it comes to database design. Currently, user/company information is stored via a relational database scheme in MySQL which is working perfectly.
My problem is that while my relational scheme works brilliantly for user/company information, I am confused about how to implement inventory information. The issue is that each "inventory list" will definitely contain attributes specific to its product type, but identical to the attributes of every other product in the same list. My first thought was to create a table for each "inventory list". However, I feel like this would be very messy and would complicate future attempts at KDD. I also (briefly) considered using a 'master inventory' table and storing the information (e.g. the variable categories and data) as a JSON string. But I figured JSON strings in MySQL would just become a larger pain in the ass.
My question is essentially how would someone else solve this problem? Or, more generally, sticking with principles of relational database management, what is the "correct" way to associate unique, large data sets of similar type with a parent user? The thing is, I know I could easily jerry-build something that would work, but I am genuinely interested in what the consensus is on how to solve this problem.
Thanks!
I would check out this post: Entity Attribute Value Database vs. strict Relational Model Ecommerce
The way I've always seen this done is to make a base table for inventory that stores universally common fields. A product id, a product name, etc.
Then you have another table that holds dynamic attributes. A very popular example of this is WordPress: if you look at its data model, it uses this idea heavily.
One of the good things about this approach is that it's flexible. One of the major negatives is that it's slow and can produce complex code.
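A sketch of that two-table pattern (names are mine; WordPress's wp_posts/wp_postmeta pair has the same shape):

CREATE TABLE products (
    product_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id    BIGINT NOT NULL,        -- the owning user/company
    name       VARCHAR(255) NOT NULL
);

CREATE TABLE product_attributes (
    product_id BIGINT       NOT NULL,
    attr_key   VARCHAR(100) NOT NULL,  -- e.g. 'voltage', 'fabric', ...
    attr_value TEXT,
    PRIMARY KEY (product_id, attr_key),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- one product with all of its dynamic attributes:
SELECT p.name, a.attr_key, a.attr_value
FROM products p
LEFT JOIN product_attributes a ON a.product_id = p.product_id
WHERE p.product_id = 42;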
I'll throw out an alternative of using a document database. In that case, each document can have a different schema/structure and you can still run queries against them.
I'm building a PHP/MySQL website and I'm currently working on my database design. I do have some database and MySQL experience, but I've never structured a database from scratch for a real-world application which hopefully is going to get some good traffic, so I'd love to hear advice from people who've already done it, in order to avoid common mistakes. I hope my explanations are not too confusing.
What I need
In my application, the user should be able to write a post (title + text), then create an "object" (which can be anything, like a video, or a song, etc.) and attach it to the post. The site has a list of predefined object types the user can create, and I should be able to add new types in the future. The user should also have the ability to see the object's details in a dedicated page and add a comment to it - the same applies to posts.
What I tried
I created an objects table with these fields: oid, type, name and date. This table contains records for anything the user should be able to add comments to (i.e. posts and objects). Then I created a postmeta table which contains additional post data (such as text, author, last edit date, etc.), a videometa table for data about the "video" object (URL, description, etc.), and so on. A postobject table (pid,oid) links objects to posts. Additionally, there's a comments table which contains the comment text, the author and the ID of the object it refers to.
Since the list of object types is predefined and is probably not going to change (though I still need the ability to add a type easily at any time without changing the app's code structure or the database design), and it is relatively small, it's not a problem to create a "meta" table for each type and make a corresponding PHP class in my application to handle it.
Finally, a page on the site needs to show a list of all the posts including the objects attached to it, sorted by date. So I get all the records from the objects table with type "post" and join it with postmeta to get the post metadata. Then I query postobject to get all the objects attached to this post, and comments to get all the comments.
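Schematically, the tables and the listing query I described look like this (column details are approximate):

CREATE TABLE objects (
    oid    BIGINT AUTO_INCREMENT PRIMARY KEY,
    type   VARCHAR(50)  NOT NULL,   -- 'post', 'video', ...
    name   VARCHAR(255) NOT NULL,
    `date` DATETIME     NOT NULL    -- needs quoting; see the naming note below
);

CREATE TABLE postmeta (
    oid       BIGINT PRIMARY KEY,   -- the oid of a 'post' row in objects
    text      MEDIUMTEXT,
    author_id BIGINT,
    FOREIGN KEY (oid) REFERENCES objects(oid)
);

CREATE TABLE postobject (
    pid BIGINT NOT NULL,            -- the post's oid
    oid BIGINT NOT NULL,            -- the attached object's oid
    PRIMARY KEY (pid, oid),
    FOREIGN KEY (pid) REFERENCES objects(oid),
    FOREIGN KEY (oid) REFERENCES objects(oid)
);

-- all posts with their metadata, newest first:
SELECT o.oid, o.name, o.`date`, m.text, m.author_id
FROM objects o
JOIN postmeta m ON m.oid = o.oid
WHERE o.type = 'post'
ORDER BY o.`date` DESC;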
The questions
Does this make any sense? Is it any good to design a database in this way for a real world site? I need to join quite a few tables to get all the data I need, and the objects table is going to become huge since it contains almost every item (only the type, name and creation date, though) - this is to keep the database and the app code flexible, but does it work in the real world, or is it too expensive in the long term? Am I thinking about it in the wrong way with this kind of OOP approach?
More specifically: suppose I need to list all the posts, including their attached objects and metadata. I would need to join at least these tables: posts, postmeta, postobject and {$objecttype}meta (not to mention a users table to get all posts by a specific user, for example). Would I get poor performance doing this, even if I'm using only numeric indexes?
Also, I considered using a NoSQL database (MongoDB) for this project (thanks to Stuart Ellis's advice). It seems much more suitable, since I need some flexibility here. But my doubt is: the metadata for my objects includes a lot of references to other records in the database. So how would I avoid data duplication if I can't use JOINs? Should I use DBRefs and the techniques described here? How do they compare to the MySQL JOINs used in the structure described above, in terms of performance?
I hope these questions make sense. This is my first project of this kind, and I just want to avoid making huge mistakes, launching, and then finding out I need to rework the design completely.
I'm not a NoSQL person, but I wonder whether this particular case might actually be handled best with a document database (MongoDB or CouchDB). Various types of objects with metadata attached sounds like the kind of scenario that MongoDB is designed for.
FWIW, you've got a couple of issues with your table and field naming that might bite you later. For example, type and date are rather generic, and also reserved words. You've also mixed singular and plural table names, which will throw off any automatic object mapping.
Whichever database you use, it's a good idea to find an existing set of database naming conventions and apply it from the start - this will help you avoid subtle issues and ensure that your naming stays consistent. I tend to use the Rails naming conventions ATM, because they are well-known and fairly sensible.
Or you could store the object contents as a file, outside of the database, if you're concerned about the database space.
If you store anything in the database, you already have the object type in objects; so you could just add an object_contents table with a long binary field to store the object (sketched below). You don't need to create a new table for each new type.
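Something like this (a sketch; the column names are illustrative):

CREATE TABLE object_contents (
    oid      BIGINT PRIMARY KEY,
    contents LONGBLOB,              -- the raw file/object data
    FOREIGN KEY (oid) REFERENCES objects(oid)
);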
I've seen a lot of JOINs in real-world web applications (5 to 10). The objects table may get large, but that's what indices are for. So far, I don't see anything wrong with your database. BTW, here's what felt strange to me: one post, one object, and separate comments for each? No ability to mix pictures with text?