Unknown data format returned by firebase - json

While obtaining a record from Firebase in App Inventor, the data is returned in the following format.
{ a=1 , b=2, c=3}
What is this format? Is there any way it can be converted into a standard format like JSON?
P.S. Replacing '=' with ':' does not work either.
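Outside of App Inventor, a minimal sketch of one way to coerce that string into real JSON, assuming the keys are simple identifiers and the values never contain ',' or '=' themselves (the variable names and the regex are just illustrative):

import json
import re

raw = "{ a=1 , b=2, c=3}"

# Pull out key=value pairs and rebuild them as a dict; values are kept as strings.
pairs = re.findall(r"(\w+)\s*=\s*([^,}]+)", raw)
parsed = {key: value.strip() for key, value in pairs}

print(json.dumps(parsed))  # {"a": "1", "b": "2", "c": "3"}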

Related

How to access the data inside a JSON object. Some of my columns in a PySpark dataframe are in JSON format

I was trying to access the data inside a JSON object. The table is a PySpark dataframe, but some of the column values are in JSON format. I need to access the date from it and convert it into a meaningful format. Any help or idea would be a great relief.
This is what I'm working with:
I was able to extract the fields that are in array format using:
data_df=df_deidentifieddocuments_tst.select("_id",explode("annotationId").alias("annotationId")).select("_id","annotationId.*")
The above code doesn't work for the date field, as it throws a type mismatch error:
AnalysisException: cannot resolve 'explode(createdAt)' due to data type mismatch: input to function explode should be array or map type, not struct<$date:bigint>;
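The question has no answer here, but based on that error a minimal sketch of reading the field could look like this, assuming createdAt is a struct whose $date member holds epoch milliseconds (as in MongoDB exports); the dataframe name is the one from the snippet above:

from pyspark.sql.functions import col

# createdAt is a struct, not an array, so select its $date field instead of exploding it,
# then convert the epoch-millisecond value into a timestamp.
date_df = df_deidentifieddocuments_tst.select(
    "_id",
    (col("createdAt").getField("$date") / 1000).cast("timestamp").alias("createdAt"),
)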

Athena (Trino SQL) parsing JSON document using fields (dot notation)

If the underlying JSON (in the table column called document 1 in Athena) is in the form of {a={b ...
I can parse it in Athena (Trino SQL) using
document1.a.b
However, if the JSON contains {a={"text": value1 ...
the quote marks will not parse correctly.
Is there a way to do JSON parsing of a 'field' with quotes?
If not, is there an elegant way of parsing the "text" field and obtaining the string in value1? [Please see my comment below].
I cannot change the quotes in the JSON or its Athena "table", so I need something that works in Trino SQL syntax.
The error message is in the form of: SQL Error [100071] [HY000]: [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. SYNTAX_ERROR: Expression [redacted] is not of type ROW
NOTE: This is not a duplicate of Oracle Dot Notation Question
Dot notation works only for columns typed as struct<…>. You can do that for JSON data, but judging from the error and your description this does not seem to be the case. I assume your column is of type string.
If you have JSON data in a string column, you can use JSON functions to parse it and extract parts of it with JSONPath expressions.
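For example (a sketch only: my_table stands in for your table, document1 for your string column, and the path assumes the column actually holds valid JSON text):

SELECT json_extract_scalar(document1, '$.a.text') AS a_text
FROM my_table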

googleapis / python-bigquery: BadRequest: Could not parse as DATE with message 'Unable to parse'

Given the following code:
with io.StringIO() as buf:
    buf.write(df_data.to_csv(header=True, index=False, quoting=csv.QUOTE_NONNUMERIC))
    buf.seek(0)
    try:
        job = self.client.load_table_from_file(buf, dest_table)
        job.result()
    except:
        buf.seek(0)
        LOG.error("Failed to upload dataframe as csv: \n\n%s\n", buf.read())
        raise
I am trying to load a pandas DataFrame into a BigQuery table by converting it to a CSV first. The problem I am facing is that the BigQuery API fails with
google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Could not parse 'date_key' as DATE for field date_key (position 3) starting at location 0 with message 'Unable to parse'
I looked at this other issue, and there seems to be a limitation on the accepted formats for DATEs when loading a CSV file.
That being said, the output printed by the above except block is the following:
ERROR utils.database._bigquery:_bigquery.py:255 Failed to upload dataframe as csv:
"clinic_key","schedule_template_time_interval_key","schedule_template_key","date_key","schedule_owner_key","schedule_template_schedule_track_key","schedule_content_label_key","start_time_key","end_time_key","priority"
"clitest11111111111111111111111","1","1","2021-01-01","1","1","1","19:00:00","21:00:00",1
"clitest11111111111111111111111","1","1","2021-01-01","1","1","2","20:00:00","20:30:00",2
"clitest11111111111111111111111","1","1","2021-01-01","1","1","3","20:20:00","20:30:00",3
To me, this seems to be a clearly well-formatted CSV file.
So my question is: How can I make BigQuery accept my CSV? What do I have to change?
N.B.: I know there's a load_table_from_dataframe method on the bigquery.client.Client object, but I faced another issue that forced me to attempt the CSV method instead. See link to the other issue here.
You need to drop the header row.
The "starting at location 0" in the error suggests BigQuery choked on the first row, i.e. it tried to parse the header value 'date_key' as a DATE.
Your actual date values look correct (YYYY-MM-DD).
Because column ordering is what matters for a CSV load, BigQuery maps the columns to the table positionally and does not need the header row.
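If keeping the header row is preferable, here is a sketch of telling the load job to skip it instead, using the standard google-cloud-bigquery load options (client, buf and dest_table are the names from the question):

from google.cloud import bigquery

# Declare the source as CSV and skip the header row so BigQuery only parses data rows.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)
job = self.client.load_table_from_file(buf, dest_table, job_config=job_config)
job.result()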

Athena - DATE column correct values from JSON

I have an S3 bucket with many JSON files.
JSON file example:
{"id":"x109pri", "import_date":"2017-11-06"}
The "import_date" field is DATE type in standard format YYYY-MM-DD.
I am creating a Database connection in Athena to link all these JSON files.
However, when I create a new table in Athena and specify this field's format as DATE, I get: "Internal error" with no other explanation provided. To clarify, the table gets created just fine, but if I try to preview or query it, I get this error.
However, when I specify this field as STRING, it works fine.
So the question is: is this a bug, or what is the correct way to use the DATE format in Athena?
The date column type does not work with certain combinations of SerDe and/or data source.
For example using a DATE column with org.openx.data.jsonserde.JsonSerDe fails, while org.apache.hive.hcatalog.data.JsonSerDe works.
So with the following table definition, querying your JSON will work.
create external table datetest(
  id string,
  import_date date
)
ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://bucket/datetest'

How to serialise a spark sql row when it contains a comma

I am using Spark Jobserver https://github.com/spark-jobserver/spark-jobserver and Apache Spark for some analytic processing.
I am receiving the following structure back from the jobserver when a job finishes:
"status": "OK",
"result": [
"[17799.91015625,null,hello there how areyou?]",
"[50000.0,null,Hi, im fine]",
"[0.0,null,All good]"
]
The result doesn't contain valid JSON, as explained here:
https://github.com/spark-jobserver/spark-jobserver/issues/176
So I'm trying to convert the returned structure into a JSON structure; however, I can't simply insert quotes into the result string based on the comma delimiter, as the values themselves sometimes contain commas.
How can I convert a Spark SQL row into a JSON object in the above situation?
I actually found a better way in the end:
From Spark 1.3.0 onwards you can use .toJSON on a DataFrame to convert it to JSON:
df.toJSON.collect()
To output a DataFrame's schema as JSON you can use:
df.schema.json
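For reference, a small self-contained PySpark equivalent (not part of the original answer; the sample row reuses one of the values from the question):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DoubleType, StringType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("amount", DoubleType()),
    StructField("label", StringType()),
    StructField("comment", StringType()),
])
df = spark.createDataFrame([(50000.0, None, "Hi, im fine")], schema)

# Each row becomes one valid JSON string; the comma inside the comment survives intact
# (null fields are simply omitted from the output).
print(df.toJSON().collect())
# ['{"amount":50000.0,"comment":"Hi, im fine"}']

# The schema itself can also be exported as JSON.
print(df.schema.json())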