I have zip files in my container, and one or more new files arrive every day. As they come in, I want to process them. I have some questions.
Can I use the Databricks Auto Loader feature to process zip files? Are zip files supported by Auto Loader?
What settings need to be enabled to use Auto Loader? I have my container and SAS token.
Once a zip file is processed (unzipped, with each file inside read), I should not read that zip again. How can I do this when using Auto Loader? Is there any specific setting?
Are there any samples available? I'm new to this area and trying to learn more.
Unfortunately, processing zip files with Azure Databricks Auto Loader is not possible; zip is not one of the input formats Auto Loader supports.
Auto Loader supports two modes for detecting new files: directory listing and file notification.
Auto Loader provides a Structured Streaming source called cloudFiles.
Given an input directory path on the cloud file storage, the cloudFiles source automatically processes new files as they arrive, with the option of also processing existing files in that directory. Auto Loader can scale to loading data from storage accounts that contain billions of files that need to be backfilled, to pipelines where millions of files are loaded in an hour.
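As a minimal illustration, here is a PySpark sketch of the cloudFiles source, assuming a Databricks notebook (where spark is predefined), a JSON input format, and placeholder container and path names. The checkpoint location is also what keeps already-ingested files from being read twice.

# Minimal Auto Loader sketch; container/account names, the JSON
# format, and all paths are placeholder assumptions.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")  # zip is not an accepted value here
    .load("abfss://<container>@<account>.dfs.core.windows.net/input/")
)

(
    df.writeStream
    .format("delta")
    # The checkpoint tracks which files were already processed,
    # so each input file is ingested exactly once.
    .option("checkpointLocation", "/tmp/_checkpoints/zip_question")
    .start("/tmp/output/zip_question")
)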
For more information, you can refer to this Microsoft document.
Related
I am in the process of uploading a huge number of tests for my school (I am a computer science teacher). These come in the form of .h5p files. I need to parse information from .txt documents into the .h5p files, ready for uploading to Moodle courses. To do this, I have built an app that pushes the data from the .txt files into the .json files inside the .h5p file.
The problem is that my app converts the .h5p to a .zip, unzips it, parses the information in, rezips it, and then changes the extension back to .h5p. Would you mind watching this video https://youtu.be/FTyQddAcWa8 and letting me know how I might edit the .json files and then rezip, ready for uploading to the Moodle courses? The files throw errors once unzipped and then zipped again.
I think the unzipping process is altering the relative links.
The bottom line is that these tests are critical for my school of 1,274 children in mitigating the impact of the COVID-19 lockdown.
The unzipping process is not the problem, but the zipping is.
When you upload the file, H5P is complaining because it expects some flags to be set when zipping:
-D do not add directory entries
-X eXclude eXtra file attributes
I assume that at some point your script is calling zip. That call would need to pass the correct flags. On a command line, you'd use
zip -rDX myNewFile.h5p *
to pack all files in the current directory into a valid H5P content file named myNewFile.h5p. Just "translate" that into your script.
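If your script happens to be in Python, one way to "translate" the command is simply to shell out to zip with exactly those flags. A sketch, where the directory and output names are placeholders and out_file should be an absolute path (otherwise it is resolved relative to content_dir):

import os
import subprocess

def pack_h5p(content_dir: str, out_file: str) -> None:
    # Mirror "zip -rDX myNewFile.h5p *" run inside content_dir:
    # -r recurse into directories, -D no directory entries,
    # -X no extra file attributes.
    entries = [e for e in os.listdir(content_dir)
               if not e.startswith(".")]  # what the shell's * expands to
    subprocess.run(["zip", "-rDX", out_file, *entries],
                   cwd=content_dir, check=True)

pack_h5p("my_content", "/tmp/myNewFile.h5p")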
I was downloading Revit models from the BIM360 team hub via the Forge API, using the following URI.
https://developer.api.autodesk.com/oss/v2/buckets/:bucketKey/objects/:objectName
All my objectName values ended with .rvt, so I downloaded and saved them as .rvt files.
However, I noticed that some of the files cannot be opened by Revit. They are actually not .rvt files but zip files, so I have to change the extension to .zip and unzip them to get the real .rvt files.
My problem is that not all the files are zip files, and I cannot tell from the API, because the URI I request always ends with .rvt.
Every Unix OS provides the file command, a standard utility program for recognising the type of data contained in a computer file:
https://en.wikipedia.org/wiki/File_(command)
A zip file is directly recognised and reported like this:
$ file test_dataset.zip
test_dataset.zip: Zip archive data, at least v2.0 to extract
A Revit RVT model is a Windows compound document file, so it generates the following output:
$ file little_house_2021.rvt
little_house_2021.rvt: Composite Document File V2 Document, Cannot read section info
Hence you can use the same algorithm as file does to distinguish between RVT and ZIP files.
As far as I know, file just looks at the first couple of bytes (the "magic number") in the given file.
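If you would rather not shell out to file, you can check those leading bytes yourself. A Python sketch using the well-known ZIP and OLE2 compound-document signatures:

def detect_kind(path: str) -> str:
    # ZIP archives start with "PK\x03\x04"; OLE2 compound documents
    # (the container format RVT uses) start with D0 CF 11 E0 A1 B1 1A E1.
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    if head == b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1":
        return "rvt"
    return "unknown"

print(detect_kind("little_house_2021.rvt"))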
The Python programming language offers similar utilities; try an Internet search for "distinguish file type python". The first hits explain How to check type of files without extensions in Python and point to the filetype Python project.
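With that package installed (pip install filetype), the check reduces to a few lines; a sketch with a placeholder file name:

import filetype

kind = filetype.guess("downloaded_object.rvt")  # returns None if unrecognised
if kind is not None and kind.extension == "zip":
    print("Actually a zip archive: rename to .zip and extract the .rvt")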
Other programming languages can provide similar functionality.
Automatic JSON file upload to Blob Storage.
Description:
We have an SSIS job that generates JSON files with data at a server path. We are manually copying the JSON files and dropping them into Blob Storage in order to trigger our logic app.
Could anyone provide information on how we can automate the process of copying the JSON files to Blob Storage? (For example, is there an approach or code to pick up the JSON files at a specific time and copy them to Blob Storage?)
One solution is to listen for file-system changes at your server path, and then use the Azure Storage SDK to upload the files, triggered by the file-changed event.
For reference, here are some resources (API docs and SO threads) on file-change listeners in different languages, since I don't know which language you want to use.
C# FileSystemWatcher Class
Python How do I watch a file for changes?
Node.js Observe file changes with node.js
For other languages, I think you can easily find a solution by searching. And to upload files to Azure Storage, you just need to refer to the official Azure get-started tutorials in the different languages to write your code.
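To make this concrete in Python, here is a sketch combining the watchdog package (one popular suggestion in the Python thread above) with the azure-storage-blob SDK; the connection string, container name, and watch directory are placeholders:

import time
from pathlib import Path

from azure.storage.blob import BlobServiceClient
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

CONNECTION_STRING = "<storage-connection-string>"
CONTAINER = "<container-name>"
WATCH_DIR = r"C:\ssis\json_output"  # the server path the SSIS job writes to

class JsonUploadHandler(FileSystemEventHandler):
    def __init__(self, container_client):
        self.container_client = container_client

    def on_created(self, event):
        # Upload every newly created .json file; the blob landing in the
        # container is what triggers the logic app. (A production version
        # would wait until the writer has finished the file.)
        if event.is_directory or not event.src_path.endswith(".json"):
            return
        path = Path(event.src_path)
        with path.open("rb") as data:
            self.container_client.upload_blob(path.name, data, overwrite=True)

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
observer = Observer()
observer.schedule(JsonUploadHandler(service.get_container_client(CONTAINER)),
                  WATCH_DIR)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()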
I have an Excel file (or CSV) that holds a list of documents, with their properties and absolute paths on the local hard drive.
Now that we are going to use Alfresco (v5.0.d) as our DMS, I have already created a custom aspect that reflects the CSV fields, and I'm looking for the best approach to import all the documents from the CSV file into the Alfresco repository.
You could simply write a Java application that parses your CSV and uploads the files one by one using the RESTful API. Do not forget to replicate the folder tree in your Alfresco repo (it is not recommended to have more than 1,000 folders/documents at the same level of the hierarchy, since that would require some tweaking in a few non-trivial use cases).
To create the folder, refer to this answer.
To actually upload the files, refer to my answer here.
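The answer above suggests Java; purely for illustration, here is the same flow sketched in Python with the requests library, assuming Alfresco's stock /service/api/upload web script, a CSV with hypothetical "name" and "path" columns, and placeholder host, credentials, and destination nodeRef:

import csv

import requests

UPLOAD_URL = "http://localhost:8080/alfresco/service/api/upload"
AUTH = ("admin", "admin")  # placeholder credentials
DESTINATION = "workspace://SpacesStore/<folder-node-ref>"

with open("documents.csv", newline="") as f:
    for row in csv.DictReader(f):  # assumes "name" and "path" columns
        with open(row["path"], "rb") as doc:
            response = requests.post(
                UPLOAD_URL,
                auth=AUTH,
                files={"filedata": (row["name"], doc)},
                data={"destination": DESTINATION},
            )
        response.raise_for_status()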
I need to load a large number of pictures (around 30) in sequence, as a short movie; each .png is 960x540.
I don't want the loader to depend on the name of each picture, as I will make changes frequently.
Are there any suggestions?
Are you trying to load images from a local file system, or a remote web server?
If you want to load images from a local file system folder, you can use AIR's File/getDirectoryListing().
If you want to load images from a remote server, and you do not want to rely on a pre-defined file naming pattern, the server will need to be able to provide directory information, for example a PHP script that reads the directory contents and outputs XML or JSON. There's no general way for a client to probe a web server for files in a directory. Some web servers do have a default web directory listing script that shows when there is no "default" file in a folder (index.html, etc), but that probably won't be quite good enough for what you're trying to do.
As a final note, if you don't mind manually updating a file on the server that lists all the files as XML or JSON, you could create a simple AIR app to process a local file directory and generate the necessary XML or JSON and upload that to your server.
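As a sketch of that last idea (shown here in Python rather than in the AIR app itself, purely for illustration): list the .png frames in a local folder and write the JSON manifest you would then upload alongside them. The folder and file names are placeholders.

import json
from pathlib import Path

frames_dir = Path("frames")  # placeholder local folder of .png frames
manifest = {"frames": sorted(p.name for p in frames_dir.glob("*.png"))}
Path("manifest.json").write_text(json.dumps(manifest, indent=2))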