Analyzing JSON columns in MySQL using Drill or Presto

I have a sharded table with one PK column and a text column. The text column holds an object in JSON format. I want to enable ad hoc business analytics using Drill or Presto.
I have experimented with both, but I am unable to figure out how to parse the JSON and access its fields in a query.
For Drill I tried convert_from(features,'JSON') and for Presto I tried json_parse(features). Both seem to convert the column text to JSON in a simple select, but I cannot access the object's fields in the same query.
Performance is important, so I need to avoid I/O; I am open to options that require development effort or hardware scaling.

I was able to analyze the JSON column in Presto by applying json_extract_scalar to the output of json_parse, e.g. json_extract_scalar(json_parse(features),'$.r_id'). This returns a string, which I can cast to the required data type.
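To make the "extract a string, then cast" step concrete, here is a minimal Python sketch of the same idea outside Presto. The helper name extract_scalar is hypothetical and it only handles a top-level key, not a full JSON path; returning None for nested objects and arrays is a choice made for this sketch, not a statement about Presto.

```python
import json


def extract_scalar(json_text, key):
    """Return the top-level scalar stored under `key` as a string, else None.

    Illustrative stand-in for the extract-then-cast pattern; not Presto itself.
    """
    doc = json.loads(json_text)
    if not isinstance(doc, dict):
        return None
    value = doc.get(key)
    # Missing keys, JSON null, objects and arrays are treated as "no scalar".
    if value is None or isinstance(value, (dict, list)):
        return None
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


features = '{"r_id": 42, "tags": ["a", "b"]}'
r_id = int(extract_scalar(features, "r_id"))  # cast the string to the needed type
```

The caller does the cast, just as the query casts the string result to the required data type.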

Related

How to order by a DC2Type:json_array subfield

I'm working on an existing application and have been asked to order by a child field of a DC2Type:json_array field in Symfony. Normally I would add the field as a column in the table, but in this case that is not possible.
We have a JsonSerializable invoice entity with a normal date attribute, but also a data attribute which contains the due_date. I would like to order by data[due_date] in a Symfony query. Is this at all possible?
tl;dr: No, not really.
According to Doctrine's type mapping matrix, json_array gets mapped to MySQL's MEDIUMTEXT column type, which does not index its contents as JSON and hence provides little to no performance advantage. (Also, AFAICT, Doctrine doesn't provide any JSON functionality besides converting DB JSON from and to PHP arrays/nulls.)
Maybe you could do some string-search magic to extract a value and sort by it, but you still wouldn't get the performance boost a proper index provides. Depending on your data this could get noticeably slow (and eat memory).
The JSON data type is fairly "new" to the relational database world, and mappers like Doctrine have not yet fully adopted it either. Extending Doctrine to handle this data type will probably take lots of work. Instead, you could rethink your table schema so that every field you want to order by is a column of its own, which lets you use all the benefits a relational database provides (like indexing).

Can MySQL JSON columns be indexed in version 8.0.19?

According to this webpage, MySQL JSON columns cannot be indexed.
MySQL Server Blog
"JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column."
Can someone please tell me whether this has changed in the latest MySQL community version, 8.0.19?
What will give me the best performance: an index on a generated column, or a duplicate column (a non-JSON column with the exact same text as in the JSON column) with normal fulltext search?
This is still the case, from the documentation:
JSON columns, like columns of other binary types, are not indexed
directly; instead, you can create an index on a generated column that
extracts a scalar value from the JSON column. See Indexing a Generated
Column to Provide a JSON Column Index, for a detailed example.
and here also:
Indexing a Generated Column to Provide a JSON Column Index
As noted elsewhere, JSON columns cannot be indexed directly.
JSON is a handy way to store somewhat arbitrary information in a database table. If you will regularly use components of the JSON in RDBMS operations, such as indexing, then you should pull them out of the JSON (or copy them out).
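As a rough illustration of "copying out" a field, the sketch below keeps the JSON text as-is and writes one extracted value into an ordinary indexed column at insert time. It uses Python's built-in sqlite3 module as a stand-in database, and the table name records, the columns doc and score, and the helper build_records_table are all hypothetical; in MySQL a generated column, as described in the quoted documentation, would play the role of the copied column.

```python
import json
import sqlite3


def build_records_table(rows):
    """Store each JSON document plus a copy of its "score" field in a plain column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, doc TEXT NOT NULL, score INTEGER)"
    )
    # The index is on the plain column, not on the JSON text.
    conn.execute("CREATE INDEX idx_records_score ON records (score)")
    for row_id, doc in rows:
        score = json.loads(doc).get("score")  # None if the key is absent
        conn.execute(
            "INSERT INTO records (id, doc, score) VALUES (?, ?, ?)",
            (row_id, doc, score),
        )
    conn.commit()
    return conn


conn = build_records_table([
    (1, '{"score": 30, "name": "a"}'),
    (2, '{"score": 10, "name": "b"}'),
    (3, '{"name": "c"}'),
])
ordered_ids = [row[0] for row in conn.execute(
    "SELECT id FROM records WHERE score IS NOT NULL ORDER BY score"
)]
```

The trade-off is that the copy has to be kept in sync whenever the JSON document changes, which a generated column handles automatically.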

Google-BigQuery - schema parsing of CSV file

We are using the Java API to load a CSV file into Google BigQuery. Is there a way to detect the columns on load and automatically select the appropriate schema type?
For example, if a specific column contains only floats, BigQuery would assign the column type float; if it contains non-numeric values, it would assign string. Is there a method to do this?
The roundabout way is to assign each column as string by default when loading the CSV.
Then do a query on each column -
SELECT count(columnname)- count(float(columnname)) FROM dataset.table
(assuming I am only interested in isolating columns that have "float values" that I can use for math functions from my application)
Any other method to solve this problem?
Right now, BigQuery does not support schema inference, so as you suggest, your options are:
Provide the schema explicitly when loading data.
Load all data using the string type, and cast/convert at query time.
Note that you can use the allowLargeResults feature to clean up and rewrite your imported data (but note that you'll be charged for the query, which will increase your data ingestion costs).
For the record, schema auto-detect is now supported: https://cloud.google.com/bigquery/federated-data-sources#auto-detect
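Where auto-detect is not an option, the check described in the question (count the values in a column that do not parse as float) can also be run on the CSV before loading, so that the schema can be provided explicitly. The following is a minimal Python sketch of that idea, not part of any BigQuery API; the function name infer_column_types and the FLOAT/STRING labels are assumptions made for illustration, and empty cells are simply skipped.

```python
import csv
import io


def infer_column_types(csv_text):
    """Label each column FLOAT if every non-empty value parses as float, else STRING."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    non_float = {name: 0 for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            value = row.get(name)
            if value is None or value == "":
                continue  # treat missing or empty cells as nulls, not as evidence
            try:
                float(value)
            except ValueError:
                non_float[name] += 1
    return {
        name: "FLOAT" if count == 0 else "STRING"
        for name, count in non_float.items()
    }


sample = "price,name\n1.5,apple\n2,pear\n"
types = infer_column_types(sample)
```

Note that Python's float() also accepts strings such as "nan" and "inf", so a stricter check may be needed for real data.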

Best column data type for JSON data in PostgreSQL < 9.2?

I'm going to store records with arbitrary fields, and the custom ones will automatically go into a separate serialized field. I don't care that they're not searchable nor sortable.
I've chosen the JSON serialization format. What is the best column data type, provided I don't have the new json type?
The underlying data type in 9.2 is TEXT, so you should be able to use that - see http://michael.otacoo.com/postgresql-2/postgres-9-2-highlight-json-data-type/

Sort custom encoded data in MySQL

I need to sort my data by one of the columns in my table, vendor_params. The catch is that it holds custom-encoded data; below is how the data is saved in the DB:
vendor_min_pov="200"|vendor_min_poq=1
At first I thought of sorting it in PHP, but that increased the page load time: the query sometimes returns a large result set as an object with different keys of the same array, and other filters are applied to that array too. So it would be better to sort it in the SQL query.
I searched for how to order by encoded data, but the solutions I found are mostly for serialized data.
Please help: can someone guide me on how to order the results of this table by the value of vendor_min_pov in the vendor_params column?
In the end I took a different route: decoding the data would have required some tweaks on the PHP side and increased the load time, so I sort the data with jQuery on the front end.
However, what I would have preferred is @mike's suggestion of using MID(), which lets you sort this sort of data in the query.
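The substring approach can be sketched as follows. This is only an illustration run against SQLite through Python's sqlite3 module, where substr() and instr() stand in for MID() and a position lookup; the table name vendors and the helper order_by_min_pov are hypothetical, and an equivalent MySQL query would need its own function names and has not been tested here.

```python
import sqlite3

# Text that precedes the value we want to sort by.
PREFIX = 'vendor_min_pov="'


def order_by_min_pov(rows):
    """Return row ids ordered by the numeric vendor_min_pov inside vendor_params."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vendors (id INTEGER PRIMARY KEY, vendor_params TEXT)")
    conn.executemany("INSERT INTO vendors (id, vendor_params) VALUES (?, ?)", rows)
    # Cut the string just after the prefix; SQLite's CAST then reads the
    # leading digits and ignores the rest ('200"|vendor_min_poq=1' becomes 200).
    query = """
        SELECT id FROM vendors
        ORDER BY CAST(
            substr(vendor_params, instr(vendor_params, :prefix) + :offset)
            AS INTEGER)
    """
    params = {"prefix": PREFIX, "offset": len(PREFIX)}
    ids = [row[0] for row in conn.execute(query, params)]
    conn.close()
    return ids


rows = [
    (1, 'vendor_min_pov="200"|vendor_min_poq=1'),
    (2, 'vendor_min_pov="50"|vendor_min_poq=3'),
    (3, 'vendor_min_pov="1000"|vendor_min_poq=2'),
]
sorted_ids = order_by_min_pov(rows)
```

Because the sort key is computed per row from a string, no index can help; rows that lack the prefix would need separate handling.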