I try to parse H265 seq_parameter_data from an RBSP data... I use the reference from http://www-ee.uta.edu/Dip/Courses/EE5359/H.264%20Standard2007.pdf .
It shows how to parse at page 40.
But the pseudo-algorithms use data structures such as ScalingList4x4[ ], ScalingMatrix4x4Flag[ ] etc... What are they? Are they constant tables or we fill them? Anybody has knowlegde about it?
If you are trying to parse parameter sets (I assume this from your previous questions), you don't need these structures. You just need to skip certain amount of bits in order to skip the tables in full and move on to the values of your interest.
Related
I am testing Delta Lake for a simple use case that is very easy in Snowflake, but I'm having a heck of a time understanding if it can be done, much less actually doing it.
I want to be able to load a JSON file "raw," without specifying a schema, and I want to be able to query and flatten it later. In Snowflake, I can create a column of type VARIANT and load the JSON text there, and later I can ask for the different parts by using :: and lateral flatten, etc.
The examples I've seen so far about Delta Lake have had "schema inference" or "autoloading" stipulations, and with those it seems that even if I don't specify a schema, one is created for me and then I still have to guess (or look up) what columns Delta Lake created for me so I can query those parts of the JSON. It seems a little too complicated.
This page has the following comment:
When ingesting data, you may need to keep it in a JSON string, and some data may not be in the correct data type.
... but it provides no example of how to do that. To me this suggests that you can somehow store the raw JSON and query it later, but I don't know how. Just make a STRING column and insert the JSON as string? Can someone post an example?
Am I trialing the wrong tool for what I need, or am I missing something? Thank you for your help.
As far as I'm aware, there is no direct equivalent to the VARIANT column in Snowflake. What that page is suggesting is storing the data as a string, and then using the semi-structured access operators to parse it as JSON on the fly.
e.g. given a table named devices with a column named specifications of type string with value
"""{
"device": "potato phone",
"sku": "POTATO0001",
}"""
Then you can query it like this:
SELECT specifications:device, specifications:sku from devices
edit: to address some of your other questions
This doesn't do schema enforcement. It's possible to create a Struct column in delta lake that can store structured data, but all the data in that column need to be compatible with the Struct schema. If you are querying a JSON string column, you are on your own for schema management.
Following a previous question I started working on a Data Flow, with the purpose of flattening a JSON file, created as a result of an Application Insights REST query. You can find an anonymised version here.
My goal is to extract the data in the "rows" array of arrays, but I end up with the data duplicated in a cartezian manner (I got an original number of 18 rows and I end up with 324, 18*18).
I cannot understand what I am doing wrong or if it is an issue with the JSON "rows" array of arrays.
Here is my Data Flow - Source has the "Document per line" JSON option, "Single documents" raises a [unexpected character "] error, probably due to the strange formatting in the JSON:
This is the Data Preview in the Source - as you can see, it is only one "tables" node, with 18 elements in the "rows" array:
rows:
I tried to Flatten it, but I cannot map "rows" data to a column, I cannot use something like table.rows[0]:
Also, the rows data gets duplicated - 18 rows for each of the 18 rows outputted:
I am not sure how to get to the bottom of this, if it's the JSON format or if I am doing something wrong. From my experience it's probably the latter.
I think this is caused by your special format.
Please try this:
add a rowdata property
flatten your data
#Steve Zhao thank you! But that solution duplicated the data similar to the original situation:
I did not manage to treat these data as JSON so I ended up thinking about it as text that can be manipulated into an array.
So I split the text by "rows:" and retrieve the second part of the split (arrays in Expression Builder start from 1):
Then I split that text as an array:
Which then I can flatten (at last):
From here on I keep splitting these values to get the data I need - I was interested in the first two and fourth column.
I have this solution that helps me creating a Wizard to fill some data and turn into JSON, the problem now is that I have to receive a xlsx and turn specific data from it into JSON, not all the data but only the ones I want which are documented in the last link.
In this link: https://stackblitz.com/edit/xlsx-to-json I can access the excel data and turn into object (when I print document.getElementById('output').innerHTML = JSON.parse(dataString); it shows [object Object])
I want to implement this solution and automatically get the specified fields in the config.ts but can't get to work. For now, I have these in my HTML and app-component.ts
https://stackblitz.com/edit/angular-xbsxd9 (It's probably not compiling but it's to show the code only)
It wasn't quite clear what you were asking, but based on the assumption that what you are trying to do is:
Given the data in the spreadsheet that is uploaded
Use a config that holds the list of column names you want returned in the JSON when the user clicks to download
based on this, I've created a fork of your sample here -> Forked Stackbliz
what I've done is:
use the map operator on the array returned from the sheet_to_json method
Within the map, the process is looping through each key of the record (each key being a column in this case).
If a column in the row is defined in the propertymap file (config), then return it.
This approach strips out all columns you don't care about up front. so that by the time the user clicks to download the file, only the columns you want are returned. If you need to maintain the original columns, then you can move this logic somewhere more convenient for you.
I also augmented the property map a little to give you more granular control over how to format the data in the returned JSON. i.e. don't treat numbers as strings in the final output. you can use this as a template if it suites your needs for any additional formatting.
hope it helps.
What's the purpose (not what it becomes) of doing json_encode on this before I am putting into the database
rating: {cleanliness: 3, publicFacility: 1, roomFacility: 2, security: 2}
to become this
rating: "{"cleanliness":3,"publicFacility":1,"roomFacility":2,"security":2}"
I see no point of doing this cause I need to json_decode it again before serving it back... can anybody clear me out?
Do not store json encoded data in the database. You mitigate the whole point of a relational database this way and make searching for values an expensive task. I see in your sample the attributes cleanliness, publicFacility, roomFacility and security. Those should be columns in your database so you can search for something like "all entries with a cleanliness higher than 3".
It works with the JSON column type but it is more expensive than using normal columns.
Edit: Check the use-case for your database entry. If you are sure you never need to search in or order by the encoded attributes you can store data encoded as json string. However, if your database supports the JSON column type, you should use that one because it allows searching in the stored JSON (but is more expensive than searching in normal columns). </Edit>
Second point: The second code snipped (with the quotation marks) looks like invalid syntax for json.
When converting CSV to AVRO I would like to output all the rejections to a file (let's say error.csv).
A rejection is usually caused by a wrong data type - e.g. when a "string" value appears in a "long" field.
I am trying to do it using incompatible output, however instead of saving the rows that failed to convert (2 in the example below), it saves the whole CSV file. Is it possible to filter out somehow only these records that failed to convert? (Does NiFi add some markers to these records etc?)
Both processors: RouteOnAttribute and RouteOnContent route the whole files. Does the "incompatible" leg of the flow somehow mark single records with something like "error" attribute that is available after splitting the file into rows? I cannot find this in any doc.
I recommend using a SplitText processor upstream of ConvertCSVToAvro, if you can, so you are only converting one record at a time. You will also have a clear context for what the errors attribute refers to on any flowfiles sent to the incompatible output.
Sending the entire failed file to the incompatible relationship appears to be a purposeful choice. I assume it may be necessary if the CSV file is not well formed, especially with respect to records being neatly contained on one line (or properly escaped). If your data violates this assumption, SplitText might make things worse by creating a fragmented set of failed lines.