Google BigQuery - schema parsing of CSV file

We are using the Java API to load a CSV file into Google BigQuery. Is there a way to detect the columns on load and automatically select the appropriate schema type?
For example, if a specific column contains only float values, BigQuery assigns the column the float type; if it contains non-numeric values, it assigns string. Is there a method to do this?
The roundabout way is to assign each column as string by default when loading the CSV.
Then do a query on each column -
SELECT count(columnname) - count(float(columnname)) FROM dataset.table
(assuming I am only interested in isolating columns that have "float values" that I can use for math functions from my application)
Any other method to solve this problem?
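The "detect on load" idea can also be approximated client-side before submitting the load job. A minimal sketch (the function name and the FLOAT/STRING-only mapping are my own simplification; a real BigQuery schema supports more types such as INTEGER and BOOLEAN):

```python
import csv
import io

def infer_column_types(csv_text, sample_rows=1000):
    """Guess FLOAT vs STRING for each column from a sample of rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    is_float = [True] * len(header)
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, value in enumerate(row):
            try:
                float(value)
            except ValueError:
                # One non-numeric value demotes the whole column to STRING.
                is_float[col] = False
    return {name: ("FLOAT" if f else "STRING")
            for name, f in zip(header, is_float)}

data = "price,label\n1.5,foo\n2.75,bar\n"
print(infer_column_types(data))  # {'price': 'FLOAT', 'label': 'STRING'}
```

The resulting mapping could then be handed to the load job as an explicit schema, which avoids the per-column cleanup queries afterwards.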

Right now, BigQuery does not support schema inference, so as you suggest, your options are:
Provide the schema explicitly when loading data.
Load all data using the string type, and cast/convert at query time.
Note that you can use the allowLargeResults feature to clean up and rewrite your imported data (but note that you'll be charged for the query, which will increase your data ingestion costs).

For the record, schema auto-detect is now supported: https://cloud.google.com/bigquery/federated-data-sources#auto-detect

Related

How to order by a DC2Type:json_array subfield

I'm working in an existing application, and I'm asked to order by a child field of a DC2Type:json_array field in Symfony. Normally I would add the field as a column in the table; in this case that is not possible.
We have a JsonSerializable invoice entity with a normal date attribute, but also a data attribute which contains the due_date. I would like to order by data[due_date] in a Symfony query. Is this at all possible?
tl;dr: No, not really.
According to Doctrine's type mapping matrix, json_array gets mapped to MySQL's MEDIUMTEXT column type, which by default does not index its contents as JSON and hence provides little to no performance advantage. (Also, AFAICT, Doctrine doesn't provide any JSON functionality besides converting DB JSON to and from PHP arrays/nulls.)
You could do some string-search magic to extract a value and sort by it, but you still wouldn't get the performance boost a proper index provides. Depending on your data this could get noticeably slow (and eat memory).
The JSON data type is fairly "new" to the relational database world, and mappers like Doctrine have not yet fully adopted it either. Extending Doctrine to handle this type will probably take a lot of work. Instead, you could rethink your table schema to include, as columns, all the fields you want to order by, so you get all the benefits a relational database provides (like indexing).
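If neither the schema nor the mapper can be changed, one fallback is to fetch the rows and sort on the decoded attribute in application code instead of in the query. Shown here as a short Python sketch only to keep it compact; the PHP/Symfony version would json-decode the json_array field the same way (the sample rows are hypothetical):

```python
import json

# Rows as they might come back from the ORM: 'data' is the serialized column.
invoices = [
    {"id": 1, "data": '{"due_date": "2024-03-01"}'},
    {"id": 2, "data": '{"due_date": "2024-01-15"}'},
]

# Decode the serialized column and sort on the nested field.
invoices.sort(key=lambda row: json.loads(row["data"])["due_date"])
print([row["id"] for row in invoices])  # [2, 1]
```

This pushes the cost to the application and only works for result sets small enough to hold in memory, which is exactly why the proper-column approach above is preferable.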

Analyzing json columns in mysql using Drill or Presto

I have a sharded table with one pk column and a text column. The text column holds an object in json format. I want to enable ad hoc business analytics by using drill or presto.
Just experimented with both, but I am unable to figure out how to parse the JSON and access its fields in a query.
For Drill I tried convert_from(features,'JSON') and for Presto I tried json_parse(features). Both convert the column text to JSON in a simple select, but I cannot access object fields in the same query.
Performance is important, so I need to avoid extra I/O; I'm open to options requiring development effort or hardware scaling.
I was able to analyze the JSON column in Presto by using json_extract_scalar on the output of json_parse, e.g. json_extract_scalar(json_parse(features),'$.r_id'). This returns a string which I can cast to the required data type.
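Per row, json_extract_scalar(json_parse(features), '$.r_id') does roughly what this plain-Python sketch does (the sample features value and the score field are invented for illustration; only r_id comes from the example above):

```python
import json

features = '{"r_id": "42", "score": 0.9}'

# json_parse -> parse the text; json_extract_scalar('$.r_id') -> pull one scalar.
r_id = json.loads(features)["r_id"]

# json_extract_scalar always yields a string, hence the explicit cast.
print(r_id, int(r_id))  # 42 42
```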

sort custom encoded data in mysql

I need to sort my data by one of the columns in my table, vendor_params; the thing is, it holds custom-encoded data. Below is how the data is saved in the DB:
vendor_min_pov="200"|vendor_min_poq=1
Firstly I was thinking of sorting it through PHP, but that was increasing the page load time, since the query sometimes returns a large object keyed several different ways, and there is other filtering applied to that array too; so it's better to sort it via the SQL query.
I tried to search for how to order encoded data, but the solutions I found are mostly for serialized data.
Please help if someone can guide me on how to order the result of this table by the vendor_min_pov values inside the vendor_params column.
In the end I used another option to sort this type of data: decoding it needs a bit of tweaking in PHP, which increases the load time, so I sort the data with jQuery on the front end.
However, what I would have preferred is the suggestion from #mike of using MID(), with which we can sort this sort of thing.
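The substring idea boils down to extracting the quoted number after vendor_min_pov= and comparing it numerically, which in MySQL would be a MID()/SUBSTRING_INDEX expression plus a CAST in the ORDER BY. A hedged Python sketch of the same extraction (sample rows invented to match the stated format):

```python
import re

rows = [
    'vendor_min_pov="200"|vendor_min_poq=1',
    'vendor_min_pov="50"|vendor_min_poq=3',
    'vendor_min_pov="1000"|vendor_min_poq=2',
]

def min_pov(params):
    # Pull the quoted number after vendor_min_pov= and compare numerically;
    # a plain string sort would wrongly put "1000" before "50".
    match = re.search(r'vendor_min_pov="(\d+)"', params)
    return int(match.group(1)) if match else 0

rows.sort(key=min_pov)
print(rows[0])  # vendor_min_pov="50"|vendor_min_poq=3
```

Note the numeric cast is the important part: without it, values of different digit lengths sort in the wrong order.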

Best way to store an array in MySQL database?

Part of my app includes volume automation for songs.
The volume automation is described in the following format:
[[0,50],[20,62],[48,92]]
I consider each item in this array a 'data point' with the first value containing the position in the song and the second value containing the volume on a scale of 0-100.
I then take these values and perform a function client-side to interpolate this data with 'virtual' data points in order to create a bezier curve allowing smooth volume transition as an audio file is playing.
However, the need has arisen to allow a user to save this automation into the database for recall at a later date.
The datapoints can be unlimited (though in reality they should never exceed around 40-50, with most being fewer than 10).
Also, how should I handle the data? Should it be stored as-is in a text field, or should I process it in some way beforehand for optimum results?
What data type would be best to use in MySQL to store an array?
Definitely not a text field, but a varchar -- perhaps. I wouldn't recommend parsing the results and storing them in individual columns unless you want to take advantage of that data in database sense -- statistics etc.
If you never see yourself asking "What is the average volume that users use?" then don't bother parsing it.
To figure out how to store this data, ask yourself "How will I use it later?" If you will fetch the array and need to use it with PHP, you can use the serialize function. If you will use the values in JavaScript, then JSON encoding will probably be best for you (plus many languages know how to decode it).
Good luck!
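For the JavaScript route mentioned above, the JSON round trip is only a couple of calls; a minimal sketch using the automation array from the question:

```python
import json

automation = [[0, 50], [20, 62], [48, 92]]

# Encode once before the INSERT, decode after the SELECT.
stored = json.dumps(automation)
print(stored)  # [[0, 50], [20, 62], [48, 92]]

restored = json.loads(stored)
assert restored == automation  # lossless round trip
```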
I suggest you take a look at the JSON data type. This way you can store your array more efficiently than with text or varchar, and you can access your data directly from MySQL without having to parse the whole thing.
Take a look at this link : https://dev.mysql.com/doc/refman/5.7/en/json.html
If speed is the most important when retrieving the rows then make a new table and make it dedicated to holding the indices of your array. Use the data type of integer and have each row represent an index of the array. You'll have to create another numeric column which binds these together so you can re-assemble the array with an SQL query.
This way you help MySQL help you speed up access. If you only want certain parts of the array, you just change the range in the SQL query and you can reassemble the array however you want.
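That dedicated-table idea can be sketched as follows, using SQLite's in-memory database purely to keep the example self-contained (the table and column names are my own; the MySQL DDL would be analogous):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE automation_point (
        song_id INTEGER NOT NULL,  -- binds the points of one array together
        idx     INTEGER NOT NULL,  -- index of the point within the array
        pos     INTEGER NOT NULL,  -- position in the song
        volume  INTEGER NOT NULL,  -- 0-100
        PRIMARY KEY (song_id, idx)
    )
""")

points = [[0, 50], [20, 62], [48, 92]]
conn.executemany(
    "INSERT INTO automation_point VALUES (1, ?, ?, ?)",
    [(i, p, v) for i, (p, v) in enumerate(points)],
)

# Reassemble the array (or any slice of it) with an ordered query.
rows = conn.execute(
    "SELECT pos, volume FROM automation_point "
    "WHERE song_id = 1 ORDER BY idx"
).fetchall()
print([list(r) for r in rows])  # [[0, 50], [20, 62], [48, 92]]
```

Adding a WHERE range on idx (or pos) is what gives you the "only certain parts of the array" access this answer describes.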
The best way to store an array is the JSON data type:
CREATE TABLE example (
`id` int NOT NULL AUTO_INCREMENT,
`docs` JSON,
PRIMARY KEY (`id`)
);
INSERT INTO example (docs)
VALUES ('["hot", "cold"]');
Read more - https://sebhastian.com/mysql-array/

Order By varbinary column that holds docx files

I'm using MS SQL 2008 server, and I have a column that stores a word document ".docx".
Within the word document is a definition (ie: a term). I need to sort the definitions upon returning a dataset.
so basically...
SELECT * FROM DocumentsTable
Order By DefinitionsColumn ASC.
So my problem is: how can this be accomplished? The binary column only sorts on the binary value, not on the Word document's content.
I was wondering if fulltext search/index would work. I already have that working, just not sure if I can use it with ORDER BY.
-Thanking all in advance.
I think you'd need to add another column, and populate this with the term from inside the docx. If it's possible at all to get SQL to read the docx (maybe with a custom .net function?) then it's going to be pretty slow.
Better to populate and maintain another column.
You have a couple of options that may or may not be acceptable.
Store the string definition contents of the file in a field alongside the binary file column in the record.
Only store the string definition in the record, and build the .docx file at runtime for use within your application.
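Populating that side column can be done outside SQL Server when the row is written: a .docx file is a zip archive whose main text lives in word/document.xml, so the definition text can be pulled out with stdlib tools. A hedged sketch (the in-memory minimal XML here stands in for a real document, which has more namespaces and parts):

```python
import io
import re
import zipfile

def docx_text(docx_bytes):
    """Extract the plain text from a .docx (a zip of XML parts)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    # Word keeps visible text inside <w:t> elements.
    return "".join(re.findall(r"<w:t[^>]*>([^<]*)</w:t>", xml))

# Build a minimal stand-in .docx in memory for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml",
                '<w:document><w:body><w:p><w:r>'
                '<w:t>aardvark: a burrowing mammal</w:t>'
                '</w:r></w:p></w:body></w:document>')

print(docx_text(buf.getvalue()))  # aardvark: a burrowing mammal
```

The extracted string would go into the extra definition column, which can then carry a normal index and be used directly in ORDER BY.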