How to read data from a GeoMesa HBase table?

I am curious to know what values GeoMesa stores in HBase. What processing/conversion does GeoMesa do before storing data in HBase?
For example, if I do a get for an ID directly against HBase, I get response 1, but if I run geomesa-hbase export for the same ID, I get response 2. What conversion do I need to do if I want to read response 1 directly? Is it a Kryo-serialized byte[]?
Response-1:
COLUMN CELL
d: timestamp=1567139694958, value=\x02\x00\x00\x00`OSM-node-541692281\xB9\x01\x08\x03\xC0^\x87\x9D\x0A\xBC\x01L#G\xD9\xA0\x01\x92\xA77\x7F\xF8\x00\x00\x00\x00\x00\x00\x01\x00\x00\x01a\x9F\xA1\x80h\x821-\xB1testUse\xF2OS\xCD\x01\x00\x00\x00\x01B\xDF\xAE\xC3\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x05\x183<>#HKTU^
1 row(s) in 0.8870 seconds
Response-2:
id,key:String,*geom:Point:srid=4326,timestamp:Timestamp,version:String,uid:String,user:String,featureSource:String,nodeId:Long,"tags:Map[String,String]",changeset:Long,visible:Boolean
OSM-node-5416922819,OSM-node-5416922819,POINT (-122.1189600788104 47.7001955),2018-02-16T17:20:17.000Z,1,-1,testUser,OSM,5416922819,,0,false

HBase always stores values as byte[]. In this case, the value is a Kryo-serialized SimpleFeature. You can deserialize it using a GeoMesa serializer instance.
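For reference, here is a minimal Scala sketch of what that deserialization might look like, assuming a recent GeoMesa 2.x release; the type spec is taken from the export header above, the raw bytes are left as a placeholder, and the exact serializer options can vary between GeoMesa versions, so treat this as a starting point rather than the definitive API:

import org.locationtech.geomesa.features.SerializationOption.SerializationOptions
import org.locationtech.geomesa.features.kryo.KryoFeatureSerializer
import org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes

// The schema must match what the data was written with (taken from the export header)
val sft = SimpleFeatureTypes.createType("osm",
  "key:String,*geom:Point:srid=4326,timestamp:Timestamp,version:String,uid:String," +
    "user:String,featureSource:String,nodeId:Long,tags:Map[String,String]," +
    "changeset:Long,visible:Boolean")

// Raw cell value read from the 'd' column family (placeholder here)
val bytes: Array[Byte] = ???

val serializer = KryoFeatureSerializer(sft, SerializationOptions.withoutId)
val feature = serializer.deserialize(bytes)
println(feature.getAttribute("geom"))

Depending on the GeoMesa version and which index the row came from, the serialization options may need adjusting (for example, whether the feature ID is included); the feature ID itself is encoded in the HBase row key rather than in the value.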

Related

JSON format in wordpress database

I need to import JSON data into a WP database. I found the correct table in the database, but the example data present in the table is not in normal JSON format.
I need to import:
{"nome": "Pippo","cognome": "Paperino"}
but the example data in the table is:
a:2:{s:4:"nome";s:5:"Pippo";s:7:"cognome";s:8:"Paperino";}
How can I convert my JSON to this "WP JSON" format?
The data is serialized with PHP's serialize() format, which is why it looks odd; it is not JSON. You can use maybe_unserialize() in WordPress; this function will unserialize the data if it was serialized.
https://developer.wordpress.org/reference/functions/maybe_unserialize/
Some functions serialize data before saving it in WordPress, and some will also unserialize it when pulling it from the DB. So depending on how you save the data and how you later extract it, you may end up with serialized data.
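For completeness, a minimal Scala sketch that produces the serialized form shown above from a flat string map, in case you need to generate it outside of PHP. Inside WordPress itself, functions like update_option() and update_post_meta() will serialize arrays for you via maybe_serialize(); this sketch ignores nested arrays and non-string keys:

// Hypothetical helper: PHP-serialize a flat Map[String, String], matching the
// a:2:{s:4:"nome";...} layout above. PHP counts string lengths in bytes (UTF-8 here).
def phpSerialize(m: Map[String, String]): String = {
  def str(s: String) = s"""s:${s.getBytes("UTF-8").length}:"$s";"""
  m.map { case (k, v) => str(k) + str(v) }.mkString(s"a:${m.size}:{", "", "}")
}

// phpSerialize(Map("nome" -> "Pippo", "cognome" -> "Paperino"))
// => a:2:{s:4:"nome";s:5:"Pippo";s:7:"cognome";s:8:"Paperino";}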

Need help creating schema for loading CSV into BigQuery

I am trying to load some CSV files into BigQuery from Google Cloud Storage and wrestling with schema generation. There is an auto-generate option, but it is poorly documented. The problem is that if I let BigQuery generate the schema, it does a decent job of guessing data types, but only sometimes does it recognize the first row of the data as a header row; sometimes it does not (it treats the first row as data and generates column names like string_field_N). The first rows of my data are always header rows. Some of the tables have many columns (over 30), and I do not want to mess around with schema syntax, because BigQuery always bombs with an uninformative error message when something (I have no idea what) is wrong with the schema.
So: How can I force it to recognize the first row as a header row? If that isn't possible, how do I get it to spit out the schema it generated in the proper syntax so that I can edit it (for appropriate column names) and use that as the schema on import?
I would recommend doing 2 things here:
Preprocess your file and store the final layout of the file without the first row, i.e. the header row.
BQ load accepts an additional parameter in the form of a JSON schema file; use this to explicitly define the table schema and pass the file as a parameter. This gives you the flexibility to alter the schema at any point in time, if required.
Allowing BQ to autodetect schema is not advised.
Schema auto-detection in BigQuery should be able to detect the first row of your CSV file as column names in most cases. One case in which column name detection fails is when every field has the same data type throughout the CSV file. For instance, BigQuery schema auto-detect would not be able to detect header names for the following file, since every field is a string.
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
The "Header rows to skip" option in the UI would not help fixing this shortcoming of schema auto detection in BigQuery.
If you are following the GCP documentation for Loading CSV Data from Google Cloud Storage, you have the option to skip the first n rows:
(Optional) An integer indicating the number of header rows in the source data.
The option is called "Header rows to skip" in the web UI, but it is also available as a CLI flag (--skip_leading_rows) and as a BigQuery API property (skipLeadingRows).
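For illustration, here is a hedged sketch of the same option set programmatically via the BigQuery Java client (used here from Scala); the dataset, table, and GCS URI are made-up placeholders:

import com.google.cloud.bigquery.{BigQueryOptions, CsvOptions, JobInfo, LoadJobConfiguration, TableId}

val bigquery = BigQueryOptions.getDefaultInstance.getService

// Skip the single header row -- the programmatic equivalent of --skip_leading_rows=1
val csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build()

val loadConfig = LoadJobConfiguration
  .newBuilder(TableId.of("my_dataset", "my_table"), "gs://my-bucket/data.csv", csvOptions)
  .setAutodetect(true) // or .setSchema(...) to pass an explicit schema instead
  .build()

bigquery.create(JobInfo.of(loadConfig)).waitFor()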
Yes, you can get the existing schema (aka the DDL) so you can modify it, using bq show:
bq show --schema --format=prettyjson project_id:dataset.table > myschema.json
Note that this will result in you creating a new BQ table altogether.
I have a workaround for the schema when loading CSV into BigQuery: you just need to edit the values in the first data row. For example:
weight|total|summary
2|4|just string
2.3|89.5|just string
If you use BigQuery's schema auto-generation, the weight and total fields will be defined as INT64, but inserting the second row will then fail. So just edit the first data row like this:
weight|total|summary
'2'|'4'|just string
2.3|89.5|just string
This forces the weight and total fields to be detected as STRING; when you want to aggregate them, you just cast them back to a numeric type in BigQuery.
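As a sketch of that cast-at-query-time step, run through the BigQuery Java client from Scala (the dataset and table names are placeholders; SAFE_CAST returns NULL instead of failing for rows that are not numeric, such as the quoted first row):

import com.google.cloud.bigquery.{BigQueryOptions, QueryJobConfiguration}

val bigquery = BigQueryOptions.getDefaultInstance.getService

// weight was loaded as STRING, so cast it back to a number for aggregation
val sql =
  """SELECT SUM(SAFE_CAST(weight AS FLOAT64)) AS total_weight
    |FROM `my_dataset.my_table`""".stripMargin

val result = bigquery.query(QueryJobConfiguration.newBuilder(sql).build())
result.iterateAll().forEach(row => println(row.get("total_weight").getDoubleValue))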
cheers
If the 'column name' row and the data rows have the same datatype all over the CSV file, then BigQuery misinterprets the column names as data and adds a self-generated name for each column. I couldn't find any technical way to solve this, so I took another approach.
If the data is not sensitive, add another column whose 'column name' is a string but whose values are all numbers, e.g. a column named 'Test' with every value set to 0. The type mismatch makes BigQuery treat the first row as a header. Upload the file to BigQuery and use this query to drop the extra column:
ALTER TABLE <table name> DROP COLUMN <Test>
Change <table name> and <Test> according to your table.

MySQL to GeoMesa through .csv

I have a MySQL table whose data I have to export to .csv and then ingest that .csv into GeoMesa.
My MySQL table structure includes a the_geom attribute whose data type is point; in the database it is stored as a blob.
Now I have two problems:
When I export the MySQL data into a .csv file, the csv shows (...) for the the_geom attribute instead of any binary representation or anything that would allow it to be ingested into GeoMesa. How do I overcome this?
The csv file also shows # for any attribute with a datetime datatype, though if you expand the column the date/time is visible. Will this cause a problem in GeoMesa?
For #1, MySQL's export is not automatically converting the Point datatype into text for you. You might need to call a conversion function such as AsWKT (ST_AsText in MySQL 5.7+) to output the geometry as Well-Known Text. GeoMesa can read the Point data from the WKT representation.
For #2, I think you'll need to do the same for the date field. Check out the date and time functions.
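To make both points concrete, here is a rough Scala/JDBC sketch of the export, converting the geometry to WKT and formatting the datetime on the way out; the connection string, table, and column names are placeholders, and ST_AsText assumes MySQL 5.7+ (use AsText/AsWKT on older versions):

import java.io.PrintWriter
import java.sql.DriverManager

object ExportForGeoMesa {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password")
    val out  = new PrintWriter("points.csv")
    try {
      // ST_AsText renders the POINT blob as WKT; DATE_FORMAT renders the datetime as text
      val rs = conn.createStatement().executeQuery(
        """SELECT id,
          |       ST_AsText(the_geom) AS wkt,
          |       DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%s') AS created_at
          |FROM my_table""".stripMargin)
      out.println("id,geom,created_at")
      while (rs.next()) {
        // POINT WKT contains no commas; other geometry types or free text would need CSV quoting
        out.println(Seq(rs.getString("id"), rs.getString("wkt"), rs.getString("created_at")).mkString(","))
      }
    } finally {
      out.close()
      conn.close()
    }
  }
}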

Apache Drill Query PostgreSQL Json

I am trying to query a jsonb field from PostgreSQL in Drill and read it as if it were coming from a JSON storage type, but I am running into trouble. I can convert from text to JSON, but I cannot seem to query the resulting JSON object. At least I think I can convert to JSON. My goal is to avoid reading through millions of uneven JSON objects from PostgreSQL, and to perform joins and the like with text files such as CSV and XML files. Is there a way to query the text field as if it were coming from a JSON storage type without writing large files to disk?
The goal is to generate results implicitly, which neither PostgreSQL nor Pentaho does, and to integrate these data sets with others of any format.
Attempt:
SELECT * FROM (SELECT convert_to(json_field,'JSON') as data FROM postgres.mytable) as q1
Sample Result:
[B#7106dd
Attempt to select an existing field that should be in any of the JSON objects:
SELECT data[field] FROM (SELECT convert_to(json_field,'JSON') as data FROM postgres.mytable) as q1
Result:
null
Attempting to do anything with jsonb results in a Null Pointer Error.

Converting Hive Map to Redshift JSON

Can anyone recommend a code snippet, script, or tool for converting a Hive Map field to a Redshift JSON field?
I have a Hive table that has two Map fields and I need to move the data to Redshift. I can easily move the data in string format, but then I lose some functionality. I would prefer to have the Map ported to JSON to maintain the key/value pairs.
Thanks.
You might want to try the Brickhouse to_json UDF:
http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
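If the data is being moved with Spark rather than through Hive itself, one alternative sketch (assuming a reasonably recent Spark, 2.4 or later as I recall, where to_json accepts map columns) is to render the map column as a JSON string before unloading it for the Redshift COPY; the table, column, and bucket names here are made up:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_json}

val spark = SparkSession.builder()
  .appName("hive-map-to-redshift-json")
  .enableHiveSupport()
  .getOrCreate()

// Render the map<string,string> column as a JSON string so key/value pairs survive
spark.table("my_hive_table")
  .withColumn("tags_json", to_json(col("tags")))
  .drop("tags")
  .write
  .option("header", "true")
  .csv("s3://my-bucket/redshift-staging/")

The resulting JSON string can then be loaded into a VARCHAR (or SUPER) column in Redshift and queried with its JSON functions.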