How can I use drill-embed with file extensions different from the default ones? - apache-drill

I'm trying to test Apache Drill via drill-embed, but all my JSON files are JSON Lines files with the .jl.gz file extension.
If I rename them to .json.gz it works, but this is undesirable in my case.
How can I tell drill that jl.gz files are actually json?
PS: I tried adding a bootstrap-storage-plugins.json to $CP but drill-embed doesn't seem to read it.

Do not use bootstrapping; that is only for distributed environments, and using the Web Console or REST API is recommended instead. It probably goes without saying that the gz file must actually be gzip-compressed, not an uncompressed JSON file renamed with a .gz extension. Create a new storage plugin configuration, for example myplugin, based on the default dfs storage plugin.
Start the Drill shell and go to http://<IP address or host name>:8047. Select Storage in the toolbar. The dfs storage plugin configuration appears in the list of default configurations.
On the Storage tab, under Enabled Storage Plugins, click UPDATE next to dfs. The plugin's configuration appears.
Copy the configuration text, then go back (just cancel out of the configuration view).
On the Storage tab, enter a name in New Storage Plugin. For example, enter myplugin. Each configuration registered with Drill must have a distinct name. Names are case-sensitive.
Click CREATE.
In Configuration, in the "formats" section, change the json format to list "gz" in its extensions:
"json": {
  "type": "json",
  "extensions": [
    "gz"
  ]
},
Click CREATE.
Now, in the Drill shell, you can query the JSON file named something.gz:
use myplugin;
select * from `/Users/me/donuts.gz` limit 2;
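For reference, the resulting myplugin configuration might look like the following sketch, based on the default dfs plugin (the connection and workspace location are illustrative and will differ on your system):

```json
{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "root": {
      "location": "/Users/me",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "json": {
      "type": "json",
      "extensions": ["gz"]
    }
  }
}
```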

Automatic JSON Files upload to Blob Storage.
Description:
We have an SSIS job that generates JSON files at a server path. We are manually copying the JSON files and dropping them into Blob storage in order to trigger our logic app.
Could anyone provide information on how we can automate copying the JSON files to Blob storage? (For example, is there an approach or code to copy the JSON files at a specific time and drop them into Blob storage?)
The solution is to listen for file-system changes at your server path, then use the Azure Storage SDK to upload the files, triggered by the file-change event.
For reference, here are some API docs and SO threads about file-change listeners in different languages, since I don't know which language you want to use.
C# FileSystemWatcher Class
Python How do I watch a file for changes?
Node.js Observe file changes with node.js
For other languages, I think you can easily find a solution by searching. And to upload files to Azure Storage, you just need to refer to the official Azure getting-started tutorials in the different languages to write your code.
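As a minimal sketch of the watch-then-upload idea in Python (using simple polling rather than OS-level file events; the `upload` callback is a stub, since the real call would come from the azure-storage-blob SDK, e.g. `BlobClient.upload_blob`):

```python
import os
import time

def snapshot(folder):
    """Map each .json file in folder to its last-modified time."""
    return {name: os.path.getmtime(os.path.join(folder, name))
            for name in os.listdir(folder) if name.endswith(".json")}

def changed_files(before, after):
    """Files that are new, or whose mtime changed, since the last snapshot."""
    return [name for name, mtime in after.items()
            if before.get(name) != mtime]

def watch(folder, upload, interval=5.0, cycles=None):
    """Poll `folder` and call `upload(path)` for each new or changed .json file.

    In practice `upload` would wrap the Azure SDK (azure-storage-blob);
    `cycles` limits the number of polling iterations (None = run forever).
    """
    before = snapshot(folder)
    n = 0
    while cycles is None or n < cycles:
        time.sleep(interval)
        after = snapshot(folder)
        for name in changed_files(before, after):
            upload(os.path.join(folder, name))
        before = after
        n += 1
```

For production use, an event-driven listener (FileSystemWatcher, watchdog, fs.watch, per the links above) is preferable to polling.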

Is it possible to have JSON schema autocomplete and documentation with eclipse?

I have a massive JSON setup file that I use with one of my projects. I would like to write some documentation and validation rules via a JSON schema so that editing this file is easier for someone who's not familiar with it.
I want to be able to open the json setup file in Eclipse and have autocomplete for the json properties via intellisense. I also want to see the documentation comments for any json setup property when the mouse is over it.
I can't find any kind of documentation about this feature. Is this possible with eclipse?
I will answer my own question, because I found a way to do it that is pretty easy and works quite well.
So,
· Having a test.json file which contains the setup of your project that you want to get autocomplete features on
· Having a c:\test.schema.json file which contains the schema for your setup file
The steps to activate this feature on eclipse are:
1- Open Eclipse
2- Go to "Window > Preferences: JSON > JSON Catalog"
3- Add a new entry with "test.json" as the file and "file:/c:/test.schema.json" as the URL
4- Enable syntax and schema validation under "Window > Preferences: JSON > JSON Files > Validation"
5- Apply the changes
6- Open the test.json file with Eclipse and check that Ctrl + Space performs autocomplete
The only tedious part of this process I've found is that every time the JSON schema file is modified, you have to go back to "Window > Preferences: JSON > JSON Catalog" and apply the changes again to pick up the new values.
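As a concrete illustration, a minimal schema for such a setup file might look like this (the serverUrl and retryCount properties are made-up examples); the description fields are what drive the hover documentation:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "serverUrl": {
      "type": "string",
      "description": "Base URL of the backend the project talks to."
    },
    "retryCount": {
      "type": "integer",
      "description": "How many times a failed request is retried.",
      "minimum": 0
    }
  }
}
```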

In what form does Chrome save local storage?

I am trying to figure out where and how Chrome saves local storage.
I found the following folder (in my home folder) that seems to contain the local storage:
\AppData\Local\Google\Chrome\User Data\Default\Local Storage
In this folder I see files that correspond to different URLs (the URLs appear in the file names). For each URL I see two types of files:
LOCALSTORAGE file
LOCALSTORAGE-JOURNAL file
I am interested in the local storage of one particular website. For this website the LOCALSTORAGE file contains only 6 KB and the LOCALSTORAGE-JOURNAL file is empty (0 KB).
On the other hand, when I open the website in Chrome and press F12, I see six different URLs under Local Storage, and clicking on one of them shows key-value pairs.
So what I see in the folder and in the Chrome developer tools is not consistent. Why is that? How can one find the content of local storage in these directories? Or is it impossible?
The file is in SQLite format. Install SQLite, then run the following commands:
cd %LocalAppData%\Google\Chrome\User Data\Default\Local Storage
sqlite3 <filename>
select * from ItemTable;
.quit
The ItemTable table contains key-value pairs, the semantics of which depend on the individual website.
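If you prefer a script over the sqlite3 shell, the same table can be read with Python's built-in sqlite3 module; note that the UTF-16-LE decoding of values below is an assumption that may vary by Chrome version:

```python
import sqlite3

def read_local_storage(path):
    """Return the key/value pairs stored in a Chrome .localstorage file.

    Chrome appears to store values as UTF-16-LE encoded BLOBs (an
    assumption worth verifying for your version); plain strings are
    passed through unchanged.
    """
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("SELECT key, value FROM ItemTable").fetchall()
    finally:
        conn.close()
    result = {}
    for key, value in rows:
        if isinstance(value, (bytes, bytearray)):
            value = bytes(value).decode("utf-16-le")
        result[key] = value
    return result
```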
See the description of the localstorage file format here.
It says: the LOCALSTORAGE extension indicates an application support file created by web browsers using WebKit, such as Google Chrome and Apple Safari. These files store browser settings or local data for a browser extension, and enable extensions to store a local cache of user data saved in SQLite database format.
You can browse localstorage files with a SQLite browser, such as the open source program SQLite Database Browser.

How is HTML5 WebStorage data physically stored?

In using the HTML5 WebStorage functionality, I know that certain browsers, like Chrome, have developer tools that enable users to browse thru the contents of their WebStorage for debugging and trouble-shooting purposes.
I was wondering if it is possible to view the contents of web storage in the file system. Is this content stored in text files on the file system that are in some standard location? Or is this data stored in some proprietary binary format by the various browsers and is not designed to be accessible or viewable by browsing the file system?
My motivation for asking this question is to see if you can view the content of WebStorage on the file system as an aid to development and debugging, and also just out of curiosity to see how this data is actually stored.
Thanks.
Chrome uses SQLite for LocalStorage.
I confirmed this by going to AppData\Local\Google\Chrome\User Data\Default\Local Storage on my local PC and viewing the contents of a file. The files start with "SQLite format 3" when viewed via a text editor. You will need a SQLite database viewer to view the data.
On Mac OS X, this was at ~/Library/Application Support/Google/Chrome/Default/Local Storage
I used the Command Line Shell For SQLite to look around. Assuming www.example.com was a real site, you can run these commands:
$ sqlite3 http_www.example.com_0.localstorage
sqlite> .tables
ItemTable
sqlite> .schema
CREATE TABLE ItemTable (key TEXT UNIQUE ON CONFLICT REPLACE, value BLOB NOT NULL ON CONFLICT FAIL);
sqlite> select * from ItemTable;
stringkey|value
jsonkey|{"key","value"}
sqlite> .exit
See Where does firefox store javascript/HTML localStorage? for the Firefox storage location.  Chrome uses individual sqlite files per hostname and protocol, where Firefox uses a single webappsstore.sqlite file with the reversed hostname and protocol in a scope column.
See Where the sessionStorage and localStorage stored? for the Opera storage location. Opera uses an XML index file and individual XML files for the Base64 encoded data.
Just wanted to contribute for IE 11.
The local storage is stored in: C:\Users\[YOUR USER ACCOUNT]\AppData\LocalLow\Microsoft\Internet Explorer\DOMStore
However, it is hidden by default. To show this folder you have to go to Folder Options and uncheck "Hide protected operating system files".
Back in the folder, you will see some subfolders. Go into each one to find XML files corresponding to websites.

Hadoop: map/reduce from HDFS

I may be wrong, but all(?) the examples I've seen with Apache Hadoop take as input a file stored on the local file system (e.g. org.apache.hadoop.examples.Grep).
Is there a way to load and save the data on the Hadoop file system (HDFS)? For example I put a tab delimited file named 'stored.xls' on HDFS using hadoop-0.19.1/bin/hadoop dfs -put ~/local.xls stored.xls. How should I configure the JobConf to read it ?
Thanks.
JobConf conf = new JobConf(getConf(), ...);
...
FileInputFormat.setInputPaths(conf, new Path("stored.xls"));
...
JobClient.runJob(conf);
...
setInputPaths will do it.
Pierre, the default configuration for Hadoop is to run in local mode, rather than in distributed mode. You likely need to just modify some configuration in your hadoop-site.xml. It looks like your default filesystem is still localhost, when it should be hdfs://youraddress:yourport. Look at your setting for fs.default.name, and also see the setup help at Michael Noll's blog for more details.
FileInputFormat.setInputPaths(conf, new Path("hdfs://hostname:port/user/me/stored.xls"));
This will do it.