How to allow flexible HTML form fields yet keep SQL reports easy to run?

I am building a website that allows applicants to submit application forms. The fields for the application form need to be flexible to allow changes.
The traditional method is to map every form field to a database column, but this limits growth as new fields are introduced while the system evolves. Each time a new column is added, existing rows end up with null values or some kind of "default" value because the data is missing.
However, if I take a key/value-driven approach to the fields, reporting becomes very hard later on.
So I am looking for some suggestions/recommendations if someone has done similar implementations. Thanks.
Example 1 (field -> column):
The app form may have the following fields:
first name
last name
and the related database table would look like this:
first_name nvarchar(255)
last_name nvarchar(255)
Example 2 (key/value pairs):
first_name (key column), john (value column), textbox (type)
last_name (key column), smith (value column), textbox (type)
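For illustration, a minimal SQL sketch of the key/value layout in Example 2 (the table and column names are just placeholders):

CREATE TABLE form_field_value (
    application_id INT           NOT NULL,   -- which submitted application the row belongs to
    field_key      NVARCHAR(100) NOT NULL,   -- e.g. 'first_name', 'last_name'
    field_value    NVARCHAR(255) NULL,       -- e.g. 'john', 'smith'
    field_type     NVARCHAR(50)  NOT NULL,   -- e.g. 'textbox'
    PRIMARY KEY (application_id, field_key)
);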
I found some examples like polldaddy.com and wufoo.com, which allow dynamic web/HTML form generation, but I think in my case they are of little use because of the reporting requirements. And I think their implementation would be similar to my "example 2".
Updated:
I found this project (mvc dynamic forms) and I believe the concepts are similar to what I need to achieve. I will take a deep look at the project.

For the day-to-day running of the application (the OLTP side) you'll want to use the key/value pair approach you mentioned; it's the only sensible way to achieve the flexibility you need and still have a system that is maintainable.
A good approach to get around the reporting problem is to have separate database schemas for the transactional (OLTP) and reporting (OLAP) parts. A different schema doesn't mean a different physical database, although it might make sense to separate them at some point.
You'd then have some sort of ETL process that migrated data between the two (from the OLTP source to the destination OLAP tables).
If you keep the OLTP, OLAP and ETL logic all in the same place it will be easier to manage while still preserving a nice clean separation. Alternatively you could build the ETL logic into your application - it really just depends how you've architected the rest of the solution (have you abstracted out the data access completely or not?) and what your drivers are (is this an in-house tool, cloud-based, or a system people deploy onto their own kit?).
The beauty of the separate OLTP / OLAP set-up is that both are geared towards doing their respective jobs well - without impacting on the other.
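As a rough sketch of what that might look like (assuming a key/value OLTP table like the one sketched in the question, plus a flat OLAP table; all names are illustrative):

-- Flat reporting table in the OLAP schema: one column per field you report on
CREATE TABLE report.application_flat (
    application_id INT PRIMARY KEY,
    first_name     NVARCHAR(255) NULL,
    last_name      NVARCHAR(255) NULL
);

-- ETL step: pivot the key/value rows into one flat row per application
INSERT INTO report.application_flat (application_id, first_name, last_name)
SELECT application_id,
       MAX(CASE WHEN field_key = 'first_name' THEN field_value END) AS first_name,
       MAX(CASE WHEN field_key = 'last_name'  THEN field_value END) AS last_name
FROM   form_field_value
GROUP BY application_id;

When a new field is added to the form, only the reporting table and this ETL query need to change - the OLTP side keeps working untouched.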

Related

How does Salesforce query using the relationships table behind the scenes?

I'm trying to figure out how Salesforce's metadata architecture works behind the scenes. There's a video they've released ( https://www.youtube.com/watch?v=jrKA3cJmoms ) where the presenter goes through many of the important tables that drive it (about 18 minutes in).
I've figured out the structure for the basic representation / storage / retrieval of simple stuff, but where I'm hazy is how the relationship pivot table works. I'll be happy when:
a) I know exactly how the pivot table relates to things (RelationId column he mentions is not clear to me)
b) I can construct a query for it.
Screenshot from the video
I've not had any luck finding any resources describing it at this level in the detail I need, or managed to find any packages that emulate it that I can learn from.
Does anyone have any low-level experience with this part of Salesforce that could help?
EDIT: Thank you, David Reed for further details in your edit. So presumably you agree that things aren't exactly as explained?
In the 'value' column, the GUID of the related record is stored
This makes it easy to fetch to-one related records and, with a little bit of simple SQL switching, to resolve a group of records in the reverse direction.
I believe Salesforce doesn't have many-to-many relationships other than via a 'junction' object, so the above is still relevant.
I guess now though I wonder what the point of the pivot table is at all, as there's a very simple relationship going on here now. Unless the lack of index on the value columns dictates the need for one...
Or, could it be more likely/useful if:
The record's value column stores a GUID to the relationship record and not directly to the related record?
This relationship record holds all necessary information required to put together a decent query and ALSO includes the GUID of the related record?
Neither option clears up the ambiguity for me, unless I'm missing something.
You cannot see, query, or otherwise access the internal tables that underlie Salesforce's on-platform schema. When you build an application on the platform, you query relationships using SOQL relationship queries; there are no pivot tables involved in the work you can see and do on the platform.
While some presentations and documentation discuss at some level the underlying implementation, the precise details of the SQL tables, schemas, query optimizers, and so on are not public.
As a Salesforce developer, or a developer who interacts with Salesforce via the API, you almost never need to worry about the underlying SQL implementation used on Salesforce's servers. The main point at which that knowledge becomes helpful is when you are working with massive data volumes (multiple millions of records). The most helpful documentation for that use case is Best Practices for Deployments with Large Data Volumes. The underlying schema is briefly discussed under Underlying Concepts. But bear in mind:
As a customer, you also cannot optimize the SQL underlying many application operations because it is generated by the system, not written by each tenant.
The implementation details are also subject to change.
Metadata Tables and Data Tables
When an organisation declares an object’s field with a relationship type, Force.com maps the field to a Value field in MT_Data, and then uses this field to store the ObjID of a related object.
I believe the documentation you mentioned is using the identifier ObjId ambiguously, and here actually means what it refers to earlier in the document as GUID - the Salesforce Id. Another paragraph states
The MT_Name_Denorm table is a lean data table that stores the ObjID and Name of each record in MT_Data. When an application needs to provide a list of records involved in a parent/child relationship, Force.com uses the MT_Name_Denorm table to execute a relatively simple query that retrieves the Name of each referenced record for display in the app, say, as part of a hyperlink.
This also doesn't make sense unless ObjId is being used to mean what is called GUID in the visual depiction of the table above in the document - the Salesforce Id of the record.
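Purely as an illustration of the shape the whitepaper seems to describe (you cannot query these tables, and the real column names and types are not public - this is a guess based on the document's own figures):

-- Hypothetical sketch of the tables discussed in the whitepaper
CREATE TABLE MT_Data (
    OrgId  CHAR(15),        -- tenant
    ObjId  CHAR(15),        -- which object (entity) the row belongs to
    GUID   CHAR(18),        -- the Salesforce Id of the record
    Name   VARCHAR(255),
    Value0 VARCHAR(4000),   -- flex columns; a relationship field's slot holds the related record's Id
    Value1 VARCHAR(4000)
    -- ...more ValueN columns
);

CREATE TABLE MT_Name_Denorm (
    OrgId CHAR(15),
    ObjId CHAR(15),
    GUID  CHAR(18),
    Name  VARCHAR(255)      -- lets the app resolve record names for links with a simple query
);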

Complex database design for many checkboxes

I'm currently creating a website for a local hospital. The problem I am currently facing: The website has too many checkboxes and fields that are enabled/disabled depending on the checkbox.
This is url to the website: http://ali.ezyro.com/adan/myForm.php
Since I have little experience with databases, what is the best way to design the database to hold all the data of this document?
This is a case where a relational database may not be your best option - it all depends on how the data is used within the system.
The straightforward option is to design one (very wide) table with a row for each patient. Each attribute is modelled as a column; multi-valued attributes (check boxes) get one column for each valid option, and single-valued attributes that require a lookup from a list of valid options use a foreign key to a table holding the valid values (e.g. the patient table has a column called cervical_collar_id, and a separate table called cervical_collar_values holds 1 - prehospital, 2 - on arrival, 3 - not required).
This allows you to store the data, and query it efficiently using standard SQL (find all patients who arrived with a prehospital cervical collar, for instance).
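A minimal sketch of that layout (the checkbox columns are invented for illustration; cervical_collar is the example mentioned above):

CREATE TABLE cervical_collar_values (
    cervical_collar_id INT PRIMARY KEY,
    description        VARCHAR(50) NOT NULL   -- 'prehospital', 'on arrival', 'not required'
);

CREATE TABLE patient (
    patient_id         INT PRIMARY KEY,
    airway_clear       BOOLEAN,               -- one column per checkbox
    oxygen_given       BOOLEAN,
    cervical_collar_id INT,
    -- ...one column per remaining attribute on the form
    FOREIGN KEY (cervical_collar_id) REFERENCES cervical_collar_values (cervical_collar_id)
);

-- "Find all patients who arrived with a prehospital cervical collar"
SELECT p.*
FROM   patient p
JOIN   cervical_collar_values c ON c.cervical_collar_id = p.cervical_collar_id
WHERE  c.description = 'prehospital';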
The "if you select box x, then box y becomes mandatory" logic should probably live in the application, not your schema.
But this is a difficult design to work with - adding attributes to the patient record is non-trivial. Wide tables are usually a bad sign.
You might decide that's a bad thing, and go for "entity/attribute/value" design. Lots of Stack Overflow answers will tell you the benefits and drawbacks of this - Google is your friend. TL;DR: even moderately complex queries become almost impossible.
You might instead decide to store the data as a document - most database engines store JSON and XML, and allow you to query this data efficiently. It has the benefit of being easier to develop, and easier to change - but you lose the built-in validation that the relational model gives you.
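For example, MySQL (5.7+) can store the whole form as a JSON document and still query individual fields; the key names here are just placeholders:

CREATE TABLE patient_form (
    patient_id INT PRIMARY KEY,
    form_data  JSON NOT NULL                  -- the entire submitted form as one document
);

-- The same report against the document version
SELECT patient_id
FROM   patient_form
WHERE  form_data->>'$.cervical_collar' = 'prehospital';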

MySQL Relational Database with Large Data Sets Unique to Each User

I am working on a project which involves building a social network-style application allowing users to share inventory/product information within their network (for sourcing).
I am a decent programmer, but I am admittedly not an expert with databases, even more so when it comes to database design. Currently, user/company information is stored via a relational schema in MySQL, which is working perfectly.
My problem is that while my relational schema works brilliantly for user/company information, I'm not sure how to implement inventory information. The issue is that each "inventory list" will contain attributes specific to its product type, but identical to the attributes of every other product in the same list. My first thought was to create a table for each "inventory list". However, I feel like this would be very messy and would complicate future attempts at KDD. I also (briefly) considered using a 'master inventory' and storing the variable categories and data as a JSON string, but I figured JSON strings in MySQL would just become a larger pain in the ass.
My question is essentially how would someone else solve this problem? Or, more generally, sticking with principles of relational database management, what is the "correct" way to associate unique, large data sets of similar type with a parent user? The thing is, I know I could easily jerry-build something that would work, but I am genuinely interested in what the consensus is on how to solve this problem.
Thanks!
I would check out this post: Entity Attribute Value Database vs. strict Relational Model Ecommerce
The way I've always seen this done is to make a base table for inventory that stores universally common fields. A product id, a product name, etc.
Then you have another table that has dynamic attributes. A very popular example of this is Wordpress. If you look at their data model, they use this idea heavily.
One of the good things about this approach is that it's flexible. One of the major negatives is that it's slow and can produce complex code.
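A hedged sketch of that shape (loosely modelled on WordPress's posts/postmeta idea; the names are made up):

-- Base table: fields every inventory item has
CREATE TABLE inventory_item (
    item_id INT PRIMARY KEY,
    user_id INT NOT NULL,               -- the owning user/company
    name    VARCHAR(255) NOT NULL
);

-- Attribute table: one row per item per product-type-specific attribute
CREATE TABLE inventory_item_meta (
    item_id    INT NOT NULL,
    meta_key   VARCHAR(100) NOT NULL,   -- e.g. 'colour', 'voltage', 'shelf_life'
    meta_value VARCHAR(255),
    PRIMARY KEY (item_id, meta_key),
    FOREIGN KEY (item_id) REFERENCES inventory_item (item_id)
);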
I'll throw out an alternative of using a document database. In that case, each document can have a different schema/structure and you can still run queries against them.

Creating a MySQL Database Schema for large data set

I'm struggling to find the best way to build out a structure that will work for my project. The answer may be simple but I'm struggling due to the massive number of columns or tables, depending on how it's set up.
We have several tools, each that can be run for many customers. Each tool has a series of questions that populate a database of answers. After the tool is run, we populate another series of data that is the output of the tool. We have roughly 10 tools, all populating a spreadsheet of 1500 data points. Here's where I struggle... each tool can be run multiple times, and many tools share the same data point. My next project is to build an application that can begin data entry for a tool, but allow import of data that shares the same datapoint for a tool that has already been run.
A simple example:
Tool 1 - company, numberofusers, numberoflocations, cost
Tool 2 - company, numberofusers, totalstorage, employeepayrate
So if the same company completed tool 1, I need to be able to populate "numberofusers" (or offer to populate) when they complete tool 2 since it already exists.
I think what it boils down to is, would it be better to create a structure that has 1500 tables, 1 for each data element with additional data around each data element, or to create a single massive table - something like...
customerID(FK), EventID(fk), ToolID(fk), numberofusers, numberoflocations, cost, total storage, employee pay,.....(1500)
If I go this route and have one large table I'm not sure how that will impact performance. Likewise - how difficult it will be to maintain 1500 tables.
Another dimension is that it would be nice to have a description of each field:
numberofusers,title,description,active(bool). I assume this is only possible if each element is in its own table?
Thoughts? Suggestions? Sorry for the lengthy question, new here.
Build a main table with all the common data: company, # users, .. other stuff. Give each row a unique id.
Build a table for each unique tool, with the company id from above and any data unique to that tool. Give each table a primary (unique) key across 'tool use' and 'company'.
This covers the common data in one place, identifies each 'customer' and provides for multiple uses of a given tool for each customer. Every use and customer is trackable and distinct.
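A rough sketch of that structure (the names are invented, and the per-tool table is repeated for each of the ~10 tools):

CREATE TABLE company (
    company_id      INT PRIMARY KEY,
    company_name    VARCHAR(255) NOT NULL,
    number_of_users INT                       -- common data points shared across tools
);

CREATE TABLE tool1_run (
    run_id              INT PRIMARY KEY,      -- one row per use of the tool
    company_id          INT NOT NULL,
    number_of_locations INT,
    cost                DECIMAL(12,2),
    FOREIGN KEY (company_id) REFERENCES company (company_id)
);

-- Optional data dictionary, if you want a title/description/active flag per data element
CREATE TABLE data_element (
    element_name VARCHAR(100) PRIMARY KEY,    -- e.g. 'numberofusers'
    title        VARCHAR(255),
    description  TEXT,
    active       BOOLEAN
);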
More about normalization here.
I agree with etherbubunny on normalization, but with larger datasets there are performance considerations that quickly become important. Joins, which are often required in normalized databases to display human-readable information, can be performance killers on even medium-sized tables, which is why a lot of data warehouse models use de-normalized datasets for reporting. This essentially means pre-building the joined reporting data into new tables, with heavy use of indexing, archiving and partitioning.
In many cases smart use of partitioning on its own can also effectively help reduce the size of the datasets being queried. This usually takes quite a bit of maintenance unless certain parameters remain fixed though.
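As a purely illustrative example, a de-normalised reporting table partitioned by year might look like this in MySQL:

CREATE TABLE tool_run_report (
    company_name        VARCHAR(255),
    tool_name           VARCHAR(100),
    run_date            DATE NOT NULL,
    number_of_users     INT,
    number_of_locations INT,
    cost                DECIMAL(12,2)
)
PARTITION BY RANGE (YEAR(run_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Queries that filter on run_date only touch the relevant partitions, which keeps the scanned dataset small without any extra application logic.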
Ultimately in your case (and most others) I highly recommend building it the way you are able to maintain and understand what is going on and then performing regular performance checks via slow query logs, explain, and performance monitoring tools like percona's tool set. This will give you insight into what is really happening and give you some data to come back here or the MySQL forums with. We can always speculate here but ultimately the real data and your setup will be the driving force behind what is right for you.

"Metadata driven" means what? I keep hearing this phrase in ETL context but could never figure it out

Apologies if I am asking an inappropriate question, but I have been hearing the phrase "metadata driven" for years and could never understand it.
Metadata, as per my understanding, is data (information) about data! That much I understand, more or less!!
But when I hear "metadata driven" (especially in the ETL world) I cannot figure out exactly what it means.
I have good experience with one ETL tool, SSIS, so an example in its context will be easy to understand.
Assume you are moving 5 rows from table A to table B and you would like to make sure that only the rows matching a particular criteria are affected. In this case your process depends on data and is, therefore, an example of a data-driven design.
Now, let's imagine you have a few source and/or target table schemas which are similar in the way you would like to process them, but different in their exact implementation (table name, column names, column data types, or even the DB type: Oracle, MS SQL, Sybase, even a flat file or an XML document). What you would like is to "plug in" sources and targets, DB connections, etc. for a particular ETL during its actual run.
What you need is a clear separation of the "logical" ETL process from its "physical" implementation. In other words, you would like the ETL to be described in generic logical units/terms which are substituted by actual physical ones during its run.
What you get then is a description of an ETL process that is generic enough for any situation and gets the proper customization to run against specific source/target systems based on the metadata of those sources and targets - a metadata-driven design, which allows you to have a generic "logical" representation of your ETL process that becomes a "physical" instantiation at run time.
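To make that concrete in SSIS-ish terms, the metadata might live in a control table that a single generic package loops over (an illustrative sketch; the table and columns are invented):

CREATE TABLE etl_mapping (
    mapping_id        INT PRIMARY KEY,
    source_connection VARCHAR(255),   -- e.g. an Oracle, MS SQL, Sybase or flat-file connection string
    source_object     VARCHAR(255),   -- table name, file path, XML path, ...
    target_connection VARCHAR(255),
    target_object     VARCHAR(255),
    column_map        VARCHAR(4000),  -- e.g. a list of source-column -> target-column pairs
    enabled           BIT
);

-- The generic "logical" process reads the rows it should run...
SELECT source_connection, source_object, target_connection, target_object, column_map
FROM   etl_mapping
WHERE  enabled = 1;
-- ...and for each row builds and executes the concrete ("physical") load at run time.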