BigQuery - Flexible Schema in Record Field - json

I have a BigQuery schema in which a RECORD field is JSON-like; however, the keys in the JSON are dynamic, i.e. new keys might emerge with new data, and it is hard to know how many keys there are in total. As I understand it, BigQuery cannot handle such a table, since the schema of a RECORD field needs to be explicitly defined or else it throws an error.
The only alternative seems to be to use the JSON_EXTRACT function while querying the data, which would parse the JSON (text) field. Is there any other way to have dynamic nested schemas in a BigQuery table?

You can create a fixed schema for the common fields and set those columns as NULLABLE, and add a STRING column to hold the rest of the JSON, which you can then query with the JSON functions.
We always keep a meta column in our tables, which holds additional raw, unstructured data as a JSON object.
Please note that currently you can store up to 2 MB in a STRING column, which is decent for a JSON document.
To make the data easier to work with, you can create views from queries that use JSON_EXTRACT, and reference the view in other, simpler queries.
Also, at streaming-insert time, your app could denormalize the JSON into proper tables.
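A minimal sketch of this pattern in BigQuery Standard SQL; the mydataset.events table and the JSON keys (source, campaign.id) are made up for illustration:

-- Fixed columns for the common fields, plus a STRING column for the raw JSON
CREATE TABLE mydataset.events (
  event_id   STRING,
  created_at TIMESTAMP,
  meta       STRING  -- holds the remaining, dynamic JSON as text
);

-- A view that extracts specific keys so downstream queries stay simple
CREATE VIEW mydataset.events_extracted AS
SELECT
  event_id,
  created_at,
  JSON_EXTRACT_SCALAR(meta, '$.source')      AS source,
  JSON_EXTRACT_SCALAR(meta, '$.campaign.id') AS campaign_id
FROM mydataset.events;

Downstream queries can then select from mydataset.events_extracted without repeating the JSON_EXTRACT calls.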

Related

How to implement multiple JSON schemas per JSON data column in MySQL

I'm using javascript-based CAD configurators to load products (i.e. materials) to be configured into new products (separate table - i.e. widgets). My JSON data columns need to adapt and be valid for materials to be used in client-side configurators for the creation of different kinds of new widgets.
My thought is to add a "data_type" column where each "data_type" is associated with a JSON Schema schematic. This column could be a foreign key or a string as JSON Schemas could be stored in a table (a json "schema" column in a data_types table) or directory (tablename-datatype.schema.json).
At this stage, I'm thinking a data_types table would be more flexible and easier to maintain and serve schemas from. Additionally, this would enable schemas to be shared between tables and I could implement a versioning system to facilitate configurator evolution.
What are the options to implement multiple JSON schemas per column in MySQL? Are there better ways of accomplishing what I'm intending to?
To support flexible JSON schemas that can be validated and related to individual JSON data columns across multiple tables, the structure I'll implement is:
JSON data is only for client-side processes. It relates to MySQL entries whose columns and relations are queried at a higher level; the JSON data itself needs a flexible structure for client-side functionality.
Create a datum_schema table. Columns might include an id key, a name string, a version integer, and a schema json column. Schemas can be shared, kept backwards compatible, and served to multiple client-side technologies (a minimal DDL sketch follows after this list).
Tables where entries require a single JSON data record: create a json column and a reference to a datum_schema record. In the use case, this would be configurators and widgets.
Creating configurators: you create a configurator entry and datum_schema records. In the use case, I'll create two schemas: one for settings and one for inputs. Specific settings for loading a configurator instance are stored in a json column within the configurators table.
Configurator table entries store references to their setting and input schemas. User rights will let some users create only configurator entries and others create entries and define schemas.
Creating widgets: as you're using a configurator to create a widget, the widget's json data will be the input values needed to recreate it, plus a reference to its configurator record.
Tables where a single entry may need multiple JSON data records.
Create a separate table to store json data with references to the first table. In the use case, this would be materials.
Creating materials: in the materials table, an entry stores any higher-level queryable information (e.g. category). In a separate material_data table, entries include a reference to the material, the json data, and a reference to the datum_schema. If the material is used in a configurator, the json data will have been structured according to a datum_schema record created by the configurator.
Want to create new kinds of widgets? Define a configurator, the necessary schemas, and relevant material categories.
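A minimal DDL sketch of the datum_schema and material_data tables described above (MySQL 8+ for the JSON type; the exact table and column names are my own guesses, and a materials table with an integer id is assumed to exist):

CREATE TABLE datum_schema (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  name     VARCHAR(100) NOT NULL,
  version  INT NOT NULL DEFAULT 1,
  `schema` JSON NOT NULL,              -- the JSON Schema document itself
  UNIQUE KEY uq_name_version (name, version)
);

CREATE TABLE material_data (
  id              INT AUTO_INCREMENT PRIMARY KEY,
  material_id     INT NOT NULL,
  datum_schema_id INT NOT NULL,
  data            JSON NOT NULL,       -- validated client-side against the referenced schema
  FOREIGN KEY (material_id) REFERENCES materials(id),
  FOREIGN KEY (datum_schema_id) REFERENCES datum_schema(id)
);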
I believe this to be a reasonable solution but, as I'm new to MySQL and database design in general, I'd appreciate feedback.

Can MySQL dynamically create indexes for a JSON column?

Given you have a JSON column called slug structured like this:
{
"en_US": "slug",
"nl_NL": "nl/different-slug"
}
This could be indexed by adding generated columns to the table that point to the values of en_US and nl_NL. This works fine but adding a third locale would require a table schema update.
Would it be possible to let MySQL automagically index all the key/value pairs in the JSON without explicitly defining them in the schema?
As the MySQL manual on the JSON data type says:
JSON columns, like columns of other binary types, are not indexed directly; instead, you can create an index on a generated column that extracts a scalar value from the JSON column. See Indexing a Generated Column to Provide a JSON Column Index, for a detailed example.
So the answer is no: MySQL cannot index the contents of a JSON column automatically. You need to define and index generated columns yourself.
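For reference, a minimal sketch of the generated-column approach for one locale (the pages table name is hypothetical; MySQL 5.7+):

-- Expose slug's en_US value as a real column and index it
ALTER TABLE pages
  ADD COLUMN slug_en_us VARCHAR(255)
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(slug, '$.en_US'))) STORED,
  ADD INDEX idx_slug_en_us (slug_en_us);

Each additional locale needs its own generated column and index, which is exactly the schema update the question hopes to avoid.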

How to define a data type in a MySQL stored procedure to accept a list of objects as an IN parameter?

I have a one-to-many relationship between two tables. I want to write a stored procedure in MySQL that can accept a list of child table objects and update the tables. The challenge I am facing is what data type the IN parameter for the list of objects should be.
You can try to use VARCHAR(65535) in MySQL.
There is no list data type in MySQL.
Given that you are coming from Oracle DB, you might want to know that MySQL does not have a strict concept of objects. And, as answered here, unfortunately, you cannot create a custom data type of your own.
The way to work around it is to imagine a table as a class. Thus, your objects will become records of the said table.
You have to settle for one of the following approaches:
Concatenated IDs: store the concatenated IDs you want to operate on in a string-equivalent datatype, like VARCHAR(5000) or TEXT. This way you can either split and loop over the string or compose a prepared statement dynamically and execute it (see the sketch after this list).
Use a temporary table: fetch the child table rows, on the fly, into a temporary table and process them there. Once you have created the temporary table with the fields and constraints you like, you can populate it with INSERT INTO temporary_table_name SELECT ..., where the SELECT statement fetches the properties you need.
Depending on the size of the data, you might want to choose the temp table approach for larger data sets.
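A minimal sketch of the concatenated-IDs approach, assuming a hypothetical child_table with id and status columns:

DELIMITER //

CREATE PROCEDURE mark_children_done(IN id_list TEXT)
BEGIN
  -- id_list is a comma-separated string of child ids, e.g. '3,7,19'
  UPDATE child_table
  SET status = 'done'
  WHERE FIND_IN_SET(id, id_list) > 0;
END //

DELIMITER ;

-- Usage
CALL mark_children_done('3,7,19');

FIND_IN_SET avoids manual string splitting, but it cannot use an index on id, so the temporary-table approach tends to scale better for large lists.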
You can use the TEXT data type to store a large amount of data in a single variable.
You can define it in the stored procedure as:
IN variable_name TEXT

Storing structured user data in members table column/s

I wanted to ask for some advice in structuring the SQL database I am creating.
UPDATE: Storing Data in MySQL as JSON seems to clearly indicate that storing JSON in MySQL is not a smart choice.
My initial idea was to create a table for each user, named '{user_id}', with the following columns:
Datetime
Entry (one-digit int)
Comment (reasonably short string)
Activity (one word)
However, I have read that creating a table for each user is not advisable because it's unmanageable in the long run.
Specifically, I wanted to know how I could put all the information that I would have put in the '{user_id}' table in the user's row of my 'members' table.
I had a few ideas, but don't know how good they are:
Storing the user data as a JSON object (converted to a string) in an additional column 'data' of the 'members' table. Would that become unmanageable in the long run too (due to JSON object string becoming too long)?
Storing the user data in various additional columns of the 'members' table, maybe one for each of the parameters listed above (each of them being an array)
Storing the user data in various additional columns of the 'members' table, maybe one for each of the parameters listed above (each of them being a dictionary or some other data structure)
Are there any other better ways, or better data storage types than JSON objects?
What would be a good way of storing this information? Isn't handling the arrays/dictionaries going to become unmanageable over time when they become very big?
(one thing to keep in mind is that the 'data' entries would have to be daily modified and easily accessed)
I think you may simply want a single additional table, maybe called "activities" with a foreign key "user" to the "members" table.
Then, for each row in each of the per-user tables you were originally thinking of, you have a row in the activities table with the value of "user" being the user in question. Since each row is of relatively small, bounded size, one would expect the database to handle it well, and efficiency issues can be addressed by indexing. Basically I am agreeing with @MikeNakis.
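A minimal sketch of such an activities table, assuming members has an integer id primary key (column names adapted from the question):

CREATE TABLE activities (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  user_id   INT NOT NULL,
  logged_at DATETIME NOT NULL,
  entry     TINYINT NOT NULL,           -- the one-digit int
  comment   VARCHAR(255),               -- reasonably short string
  activity  VARCHAR(50),                -- one word
  FOREIGN KEY (user_id) REFERENCES members(id),
  INDEX idx_user_logged (user_id, logged_at)
);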

Implementing custom fields with ALTER TABLE

We are currently thinking about different ways to implement custom fields for our web application. Users should be able to define custom fields for certain entities and fill in/view this data (and possibly query the data later on).
I understand that there are different ways to implement custom fields (e.g. using a name/value table or using alter table etc.) and we are currently favoring using ALTER TABLE to dynamically add new user fields to the database.
After browsing through other related SO topics, I couldn't find any big drawbacks of this solution. In contrast, being able to query the data quickly (e.g. directly in an SQL WHERE clause) is a big advantage for us.
Are there any drawbacks you could think of by implementing custom fields this way? We are talking about a web application that is used by up to 100 users at the same time (not concurrent requests..) and can use both MySQL and MS SQL Server databases.
Just as an update, we decided to add new columns via ALTER TABLE to the existing database table to implement custom fields. After some research and tests, this looks like the best solution for most database engines. A separate table with meta information about the custom fields provides the needed information to manage, query and work with the custom fields.
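A minimal sketch of that setup, with hypothetical names (MySQL syntax; SQL Server differs slightly, e.g. ALTER TABLE ... ADD without the COLUMN keyword):

-- Meta table describing each user-defined field
CREATE TABLE custom_field_defs (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  table_name  VARCHAR(64)  NOT NULL,
  column_name VARCHAR(64)  NOT NULL,
  label       VARCHAR(100) NOT NULL,
  data_type   VARCHAR(30)  NOT NULL    -- e.g. 'INT', 'VARCHAR(255)', 'DATE'
);

-- The application issues an ALTER TABLE when a user defines a new field
ALTER TABLE orders ADD COLUMN cf_delivery_window VARCHAR(255) NULL;

-- Queries can then filter on the new column directly
SELECT * FROM orders WHERE cf_delivery_window = 'morning';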
The first drawback I see is that you need to grant your application's database account ALTER rights.
This implies that your security model needs careful attention, as the application will be able not only to add fields but also to drop and rename them, and to create tables (at least in MySQL).
Secondly, how would you distinguish fields that are required per user? Or can fields created by user A be accessed by user B?
Note that the number of columns may also grow significantly. If every user adds 2 fields, we are already talking about 200 fields.
Personally, I would use one of the two approaches or a mix of them:
Using a serialized field
I would add one text field to the table in which I would store a serialized dictionary or dictionaries:
{
user_1: {key1: val1, key2: val2, ...},
user_2: {key1: val1, key2: val2, ...},
...
}
The drawback is that the values are not easily searchable.
Using a multi-type name/value table
fields table:
user_id: int
field_name: varchar(100)
type: enum('INT', 'REAL', 'STRING')
values table:
field_id: int
row_id: int # the main table row id
int_value: int
float_value: float
text_value: text
Of course, it requires a join and is a bit more complicated to implement, but it is far more generic and, if indexed properly, quite efficient.
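A hedged DDL sketch of this name/value layout (I've named the tables custom_fields and custom_values; the answer leaves them unnamed):

CREATE TABLE custom_fields (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  user_id    INT NOT NULL,
  field_name VARCHAR(100) NOT NULL,
  type       ENUM('INT', 'REAL', 'STRING') NOT NULL
);

CREATE TABLE custom_values (
  field_id    INT NOT NULL,
  row_id      INT NOT NULL,             -- id of the row in the main table
  int_value   INT NULL,
  float_value FLOAT NULL,
  text_value  TEXT NULL,
  PRIMARY KEY (field_id, row_id),
  FOREIGN KEY (field_id) REFERENCES custom_fields(id)
);

-- Reading all custom values for main-table row 42
-- (COALESCE picks whichever typed column is populated; MySQL coerces to a common type)
SELECT f.field_name,
       COALESCE(v.int_value, v.float_value, v.text_value) AS value
FROM custom_fields f
JOIN custom_values v ON v.field_id = f.id
WHERE v.row_id = 42;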
I see nothing wrong with adding new custom fields to the database table.
With this approach, the specific/most appropriate type can be used, i.e. need an int field? Define it as int. With a name/value table, by contrast, you'd be storing multiple data types as one type (probably nvarchar), unless you extend the name/value table with multiple columns of different types and populate the appropriate one, which is a bit horrible.
Also, adding new columns makes querying easier, with no need to join to a separate name/value table.
It may not feel as generic, but I feel that's better than having a "one-size-fits-all" name/value table.
From an SQL Server point of view (2005 onwards)...
An alternative would be to create one "custom data" field of type XML. This would be truly generic, requiring no field creation and no separate name/value table. It also has the benefit that not all records have to have the same custom data (i.e. the one field is common, but what it contains doesn't have to be). I'm not 100% sure of the performance impact, but XML data can be indexed.
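A hedged T-SQL sketch of the XML-column alternative (dbo.Entities and the element names are made up; a primary XML index also requires a clustered primary key on the table):

-- One generic XML column for all custom data
ALTER TABLE dbo.Entities ADD CustomData XML NULL;

-- Optional: an XML index to help queries that dig into the XML
CREATE PRIMARY XML INDEX IX_Entities_CustomData
  ON dbo.Entities (CustomData);

-- Reading one custom value back out
SELECT CustomData.value('(/fields/field[@name="colour"])[1]', 'nvarchar(100)') AS colour
FROM dbo.Entities;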