How to implement multiple JSON schemas per JSON data column in MySQL

I'm using JavaScript-based CAD configurators to load products (i.e. materials) to be configured into new products (stored in a separate table, i.e. widgets). My JSON data columns need to adapt, and remain valid, so that materials can be used in client-side configurators to create different kinds of new widgets.
My thought is to add a "data_type" column where each "data_type" is associated with a JSON Schema definition. This column could be a foreign key or a string, since JSON Schemas could be stored either in a table (a json "schema" column in a data_types table) or in a directory (tablename-datatype.schema.json).
At this stage, I'm thinking a data_types table would be more flexible and easier to maintain and serve schemas from. Additionally, this would enable schemas to be shared between tables and I could implement a versioning system to facilitate configurator evolution.
What are the options to implement multiple JSON schemas per column in MySQL? Are there better ways of accomplishing what I'm intending to?

In this use case (described above), to support flexible JSON schemas that can be validated against individual JSON data columns in multiple tables, the structure I'll implement is:
JSON data is only used by client-side processes. It is attached to MySQL entries whose regular columns and relations are queried at a higher level, so the JSON itself needs a flexible structure for client-side functionality.
Create a datum_schema table. Columns might include a key (id), a string (name), an integer (version), and a json (schema) column. Schemas can then be shared between tables, versioned for backwards compatibility, and served to multiple client-side technologies.
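A minimal sketch of what that table might look like (assuming MySQL 5.7.8+ for the native JSON type; column names are illustrative, not prescriptive):

CREATE TABLE datum_schema (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(100) NOT NULL,
  version     INT UNSIGNED NOT NULL DEFAULT 1,
  schema_json JSON NOT NULL,                      -- the JSON Schema document itself
  UNIQUE KEY uq_name_version (name, version)      -- allow several versions of the same schema
);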
Tables where entries require a single JSON data record. Create a json column and a reference to a datum_schema record. In the use case, this would be configurators and widgets:
Creating configurators: you create a configurator entry and its datum_schema records. In the use case, I'll create two schemas: one for settings and one for inputs. Specific settings for loading a configurator instance are stored in a json column within the configurators table.
Configurator table entries store references to their setting and input schemas. User rights will enable some users to create only configurator entries and others to create entries and define schemas.
Creating widgets: since you're using a configurator to create a widget, the widget's json data holds the input values needed to recreate it, along with a reference to its configurator record.
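A rough sketch of the configurators and widgets tables under this design (the column names here are assumptions, not part of the original design):

CREATE TABLE configurators (
  id                 INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name               VARCHAR(100) NOT NULL,
  settings           JSON NOT NULL,               -- instance settings for loading this configurator
  settings_schema_id INT UNSIGNED NOT NULL,       -- datum_schema the settings JSON conforms to
  inputs_schema_id   INT UNSIGNED NOT NULL,       -- datum_schema for widget input JSON
  FOREIGN KEY (settings_schema_id) REFERENCES datum_schema (id),
  FOREIGN KEY (inputs_schema_id) REFERENCES datum_schema (id)
);

CREATE TABLE widgets (
  id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  configurator_id INT UNSIGNED NOT NULL,
  input_values    JSON NOT NULL,                  -- the inputs needed to recreate the widget
  FOREIGN KEY (configurator_id) REFERENCES configurators (id)
);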
Tables where a single entry may need multiple JSON data records.
Create a separate table to store json data with references to the first table. In the use case, this would be materials.
Creating materials: in the materials table, an entry is made to store any higher-level, queryable information (i.e. category). In a separate material_data table, entries include a reference to the material, json data, and a reference to the datum_schema. If the material is used in a configurator, the json data will have been structured according to a datum_schema record created by the configurator.
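A sketch of that split, again with illustrative column names (assumptions):

CREATE TABLE materials (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  category VARCHAR(100) NOT NULL                  -- higher-level, queryable information
);

CREATE TABLE material_data (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  material_id INT UNSIGNED NOT NULL,
  schema_id   INT UNSIGNED NOT NULL,              -- datum_schema the json conforms to
  data        JSON NOT NULL,
  FOREIGN KEY (material_id) REFERENCES materials (id),
  FOREIGN KEY (schema_id) REFERENCES datum_schema (id)
);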
Want to create new kinds of widgets? Define a configurator, the necessary schemas, and relevant material categories.
I believe this to be a reasonable solution but, as I'm new to MySQL and database design in general, I'd appreciate feedback.

Related

mySQL: multiple sources for table: split or not?

If a table has multiple sources (each source contains a different type of data), is it best practice to split that table into multiple tables (with the necessary foreign keys), or do you nevertheless fit it all into one table?
simplified example:
want to make a table containing client info
2 sources of data, both CSV files:
static data, almost never changes (e.g. start of the relationship, headquarters, etc)
revenue data which changes monthly
In this case do you go for one table (e.g. t_client) containing both static and revenue data, which you then update monthly? Or do you make multiple tables: one with the static data (e.g. t_client_info) and one with the variable info (t_client_revenue, update monthly) and link them?
Just want to know what is best practice.
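For illustration only, a hedged sketch of the split option (column names are assumptions based on the example):

CREATE TABLE t_client_info (
  client_id          INT UNSIGNED NOT NULL PRIMARY KEY,
  relationship_start DATE,                        -- static data, rarely changes
  headquarters       VARCHAR(100)
);

CREATE TABLE t_client_revenue (
  client_id     INT UNSIGNED NOT NULL,
  revenue_month DATE NOT NULL,                    -- updated monthly
  revenue       DECIMAL(15,2),
  PRIMARY KEY (client_id, revenue_month),
  FOREIGN KEY (client_id) REFERENCES t_client_info (client_id)
);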

BigQuery - Flexible Schema in Record Field

I have a schema for BigQuery in which the Record field is JSON-like, however, the keys in the JSON are dynamic i.e. new keys might emerge with new data and it is hard to know how many keys in total there are. According to my understanding, it is not possible to use BigQuery for such a table since the schema of the record field type needs to be explicitly defined or else it will throw an error.
The only other alternative is to use JSON_EXTRACT function while querying the data which would parse through the JSON (text) field. Is there any other way we can have dynamic nested schemas in a table in BigQuery?
A fixed schema can be created for the common fields, and you can set them as nullable. A string column can then be used to store the rest of the JSON, and the JSON functions can be used to query that data.
We routinely have a meta column in our tables, which holds additional raw unstructured data as a JSON object.
Please note that currently you can store up to 2 Megabytes in a string column, which is decent for a JSON document.
To make it easier to deal with the data, you can create views from your queries that use JSON_EXTRACT, and reference the view in other, simpler queries.
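For example, a view along these lines (BigQuery Standard SQL; the dataset, table, and JSON key names are assumptions):

CREATE VIEW mydataset.events_view AS
SELECT
  event_id,
  JSON_EXTRACT_SCALAR(meta, '$.user_id') AS user_id,   -- pull a scalar value out of the JSON string
  JSON_EXTRACT(meta, '$.payload') AS payload            -- keep nested JSON as a JSON string
FROM mydataset.events;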
Also at the streaming insert phase, your app could denormalize the JSON into proper tables.

How to synchronise Core Data relationships?

I'm creating an app that pulls data from a web server (MySQL), parses it and stores it in a SQLite database using Core Data.
The MySQL database has a 'words' table. Each word can be in a 'category'. So the words table has a field for 'category_id' to join the tables.
I'm having some trouble getting my head around how to replicate this locally in my app. I currently have entities matching the structure of the MySQL database, but no relationships. It seems like in my 'words' entity I shouldn't need the 'category_id' field (I should instead have a one-to-one 'category' relation set-up).
I'm confused as to how to keep this Core Data relationship in sync with the web server?
Assuming you have an Entity for Word and Category, you will need to make a relationship (naming may be a bit hazy). Also assuming a Category can have many Words and a Word belongs to one Category:
// Word Entity
Relationship   Destination   Inverse
category       Category      words

// Category Entity
Relationship   Destination   Inverse
words          Word          category   // To-Many relationship
You are correct you would not need the category_id field as all relationships are managed through the object graph that Core Data maintains. You will still need a primary key like server_id (or similar) in each entity or you will have trouble updating/finding already saved objects.
This is how I deal with syncing data from an external database (I use RESTful interfaces with JSON but that does not really matter)
Grab the feed sorted by server_id
Get the primary keys (server_id) of all the objects in the feed
Perform a fetch using a predicate like ... @"(serverId IN %@)", primaryKeys
which is sorted by the primary key.
Step through each array. If the fetch result has my record then I update it. If it does not then I insert a new one.
You would need to do this for both Word and Category
Next fetch all objects that form part of a relationship
Use the appropriate methods generated by Core Data for adding objects, e.g. something like [myArticle addWords:[NSSet setWithObjects:word1, word2, word3, nil]];
It's hard for me to test but this should give you a starting point?
Good to see a fellow Shiny course attendee using stack overflow - it's not just me

Is the concept of abstraction relevant to tables in MySQL? If so, how can I do it?

I want to store data on various engines in a MySQL database, which includes both piston and rotary engines.
In OO languages, I can create and extend an Engine superclass to obtain PistonEngine and RotaryEngine subclasses.
The PistonEngine subclass would contain properties such as CylinderNo, PistonBore and PistonStroke.
The RotaryEngine subclass would contain properties like RotorThickness and RotorDiameter.
In MySQL, while I can create two separate tables for piston and rotary engines respectively, I would prefer to maintain an EngineType field as part of the engine data, and store all data common to both engine types in a single table.
How can I design my database to avoid data redundancy as much as possible?
I would create four tables in this situation.
One called EngineTypes which would just have the id/value pairs for your engine types
1 Rotary
2 Piston
One called Engines that contains the information that is relevant to both piston and rotary engines. It would have a column in it that contains the engine type id.
One table called RotaryEngineDetails that contains all your rotary engine specific data. It would have a foreign key to your engines table.
One table called PistonEngineDetails that contains all your piston engine specific data.
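A minimal sketch of those four tables (the exact columns and types are assumptions based on the properties listed in the question):

CREATE TABLE EngineTypes (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(50) NOT NULL                       -- e.g. 'Rotary', 'Piston'
);

CREATE TABLE Engines (
  id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  engine_type_id INT UNSIGNED NOT NULL,
  -- columns common to both engine types go here
  FOREIGN KEY (engine_type_id) REFERENCES EngineTypes (id)
);

CREATE TABLE RotaryEngineDetails (
  engine_id       INT UNSIGNED NOT NULL PRIMARY KEY,
  rotor_thickness DECIMAL(10,2),
  rotor_diameter  DECIMAL(10,2),
  FOREIGN KEY (engine_id) REFERENCES Engines (id)
);

CREATE TABLE PistonEngineDetails (
  engine_id     INT UNSIGNED NOT NULL PRIMARY KEY,
  cylinder_no   INT,
  piston_bore   DECIMAL(10,2),
  piston_stroke DECIMAL(10,2),
  FOREIGN KEY (engine_id) REFERENCES Engines (id)
);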
My suggestion would be to create a single table with all of the information needed for both engine types, including a key for the engine type itself. After that you could create a view for each "class" of engine so it appears to be its own object. Although, depending on all of the data that you are storing, it may not make sense, as your structure would not be normalized.
Update
Based on comment, I have expanded my answer.
By definition, a view is a virtual or logical table composed of the result set of a SELECT query. Because a view is like a table consisting of rows and columns, you can retrieve and update data through it in the same way as a table. A view is dynamic, so it is not tied to the physical schema and is only stored as the view definition. When the tables that are the source data of a view change, the data in the view changes as well. (http://www.mysqltutorial.org/introduction-sql-views.aspx)
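A hedged sketch of the single-table-plus-views idea (assuming one wide engines table with nullable type-specific columns and the EngineTypes ids shown above):

CREATE VIEW rotary_engines AS
SELECT id, rotor_thickness, rotor_diameter
FROM engines
WHERE engine_type_id = 1;     -- 1 = Rotary

CREATE VIEW piston_engines AS
SELECT id, cylinder_no, piston_bore, piston_stroke
FROM engines
WHERE engine_type_id = 2;     -- 2 = Piston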

Implementing custom fields with ALTER TABLE

We are currently thinking about different ways to implement custom fields for our web application. Users should be able to define custom fields for certain entities and fill in/view this data (and possibly query the data later on).
I understand that there are different ways to implement custom fields (e.g. using a name/value table, using ALTER TABLE, etc.) and we are currently favoring ALTER TABLE to dynamically add new user fields to the database.
After browsing through other related SO topics, I couldn't find any big drawbacks of this solution. In contrast, having the option to query the data in a fast way (e.g. directly in an SQL WHERE clause) is a big advantage for us.
Are there any drawbacks you can think of with implementing custom fields this way? We are talking about a web application that is used by up to 100 users at the same time (not concurrent requests...) and can use both MySQL and MS SQL Server databases.
Just as an update, we decided to add new columns via ALTER TABLE to the existing database table to implement custom fields. After some research and tests, this looks like the best solution for most database engines. A separate table with meta information about the custom fields provides the needed information to manage, query and work with the custom fields.
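As a hedged illustration of that approach (the table and column names are assumptions, not the original schema):

-- add the user-defined column directly to the entity table
ALTER TABLE customers ADD COLUMN custom_contract_no VARCHAR(50) NULL;

-- keep metadata about custom fields in a separate table
CREATE TABLE custom_field_meta (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  table_name  VARCHAR(64)  NOT NULL,
  column_name VARCHAR(64)  NOT NULL,
  label       VARCHAR(100) NOT NULL,
  data_type   VARCHAR(20)  NOT NULL,
  UNIQUE KEY uq_table_column (table_name, column_name)
);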
The first drawback I see is that you need to grant your application service with ALTER rights.
This implies that your security model needs careful attention as the application will be able to not only add fields but to drop and rename them as well and create some tables (at least for MySQL).
Secondly, how would you distinguish fields that are required per user? Or can the fields created by user A be accessed by user B?
Note that the cardinality of the columns may also significantly grow. If every user adds 2 fields, we are already talking about 200 fields.
Personally, I would use one of the two approaches or a mix of them:
Using a serialized field
I would add one text field to the table in which I would store a serialized dictionary or dictionaries:
{
  user_1: {key1: val1, key2: val2, ...},
  user_2: {key1: val1, key2: val2, ...},
  ...
}
The drawback is that the values are not easily searchable.
Using a multi-type name/value table
fields table:
user_id: int
field_name: varchar(100)
type: enum('INT', 'REAL', 'STRING')
values table:
field_id: int
row_id: int # the main table row id
int_value: int
float_value: float
text_value: text
Of course, it requires a join and is a bit more complicated to implement but far more generic and, if indexed properly, quite efficient.
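A hedged sketch of that name/value layout and a query against it (the table names follow the answer; index choices and the example field are assumptions):

CREATE TABLE fields (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id    INT UNSIGNED NOT NULL,
  field_name VARCHAR(100) NOT NULL,
  type       ENUM('INT', 'REAL', 'STRING') NOT NULL
);

CREATE TABLE `values` (
  field_id    INT UNSIGNED NOT NULL,
  row_id      INT UNSIGNED NOT NULL,              -- the main table row id
  int_value   INT,
  float_value FLOAT,
  text_value  TEXT,
  PRIMARY KEY (field_id, row_id)
);

-- e.g. find rows whose custom "budget" field is greater than 1000
SELECT v.row_id
FROM fields f
JOIN `values` v ON v.field_id = f.id
WHERE f.field_name = 'budget' AND v.int_value > 1000;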
I see nothing wrong with adding new custom fields to the database table.
With this approach, the specific/most appropriate type can be used, i.e. need an int field? Define it as int. Whereas with a name/value type table, you'd be storing multiple data types as one type (probably nvarchar) - unless you give that name/value table multiple columns of different types and populate the appropriate one, but that is a bit horrible.
Also, adding new columns makes it easier to query, with no need for a join to a new name/value table.
It may not feel as generic, but I feel that's better than having a "one-size fits all" name/value table.
From an SQL Server point of view (2005 onwards)....
An alternative would be to create one "custom data" field of type XML - this would be truly generic and require no field creation or a separate name/value table. It also has the benefit that not all records have to have the same custom data (i.e. the one field is common, but what it contains doesn't have to be). Not 100% sure on the performance impact, but XML data can be indexed.
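A hedged sketch of that idea in SQL Server (T-SQL; the table, column, and XML path names are assumptions):

-- one generic XML column holds whatever custom data each record needs
ALTER TABLE dbo.Customers ADD CustomData XML NULL;

-- XML columns can be indexed (requires a clustered primary key on the table)
CREATE PRIMARY XML INDEX IX_Customers_CustomData ON dbo.Customers (CustomData);

-- query into the XML with XQuery
SELECT CustomerId,
       CustomData.value('(/fields/contractNo)[1]', 'varchar(50)') AS ContractNo
FROM dbo.Customers;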