How to read files with headers in Apache Drill - apache-drill

When I set up Apache Drill like so:
"csv": {
"type": "text",
"extensions": [
"csv2"
],
"skipFirstLine": false,
"extractHeader": true,
"delimiter": ","
},
The skipFirstLine setting disappears when I save the configuration file. Why?

The default value of the boolean properties skipFirstLine and extractHeader is false. A boolean property only shows up in the plugin JSON when it is set to true.
If you change "extractHeader" to false and update the plugin configuration, that entry will disappear as well.

Related

Serilog doesn't parse file name template

I'm using Serilog with this configuration:
{
  "Serilog": {
    "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
    "MinimumLevel": "Debug",
    "Enrich": [ "FromLogContext", "WithMachineName", "WithThreadId" ],
    "WriteTo": [
      { "Name": "Console" },
      {
        "Name": "File",
        "Args": {
          "path": "./logs/performance-{Date}.log",
          "rollingInterval": "Day",
          "fileSizeLimitBytes": 1000,
          "rollOnFileSizeLimit": true,
          "retainedFileCountLimit": null,
          "shared": true
        }
      }
    ]
  }
}
The output file should look like 20210613-performance.log, but it actually looks like {Date}-performance20210613.log.
What am I doing wrong?
The {Date} placeholder is not a feature of the Serilog.Sinks.File sink that you're using. You're probably confusing it with the (deprecated) Serilog.Sinks.RollingFile sink, which does have this feature.
With Serilog.Sinks.File, at this time, you cannot define where the date will appear. It is always appended to the end of the file name you choose (and before the sequence number if you are also rolling by file size).
There have been attempts to implement this feature, but as of this writing it is not there yet.
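If you do want a date in the file name with Serilog.Sinks.File, one common workaround (a sketch, assuming the same sink options as in the question) is to end the base file name right before the extension and let the rolling interval append the date there:
"WriteTo": [
  { "Name": "Console" },
  {
    "Name": "File",
    "Args": {
      "path": "./logs/performance-.log",
      "rollingInterval": "Day",
      "fileSizeLimitBytes": 1000,
      "rollOnFileSizeLimit": true,
      "retainedFileCountLimit": null,
      "shared": true
    }
  }
]
With rollingInterval set to Day, the sink then produces files such as performance-20210613.log (and performance-20210613_001.log once the size limit forces a roll).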

postman duplicate collection / export + re-import

I'm relatively new to Postman and having a problem with the following simple scenario: I have a collection of Postman requests that all point to a local IP where I am developing my application. Let's suppose I finished my local development, deployed the application on some other server, and now want to repeat the requests I previously created against THAT server. I know that one way to do this would be to use variables.
Instead of that, though, I exported the collection and manually edited the exported JSON file, replacing all the old local IPs with the new server IP. I also changed the collection name and ID to something arbitrary. While the import back into Postman works and I see the requests, they all still have the old IP, as if my replace didn't work, or as if Postman somehow caches the requests and thinks the new collection is the same as the old one. I also tried "Duplicating" the collection and exporting the duplicate / replacing / importing again, but the behavior seems to be the same.
Did I miss something, or should I approach this differently?
Thank you.
Duh, I was dumb enough to substitute only the "raw" URL, while right below it were the old values for "host" and "port", which are what Postman actually constructs the URL from:
{
  "info": {
    "_postman_id": "1499274a-07bc-4ed2-87d4-b10d0cef8f8f",
    "name": "some-collection-DEVSERVER",
    "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
  },
  "item": [
    {
      "name": "login (success - bad locale)",
      "request": {
        "method": "POST",
        "header": [
          {
            "key": "Content-Type",
            "name": "Content-Type",
            "value": "application/json",
            "type": "text"
          }
        ],
        "body": {
          "mode": "raw",
          "raw": "{\n\t\"username\" : \"TEST\",\n\t\"password\" : \"123456\",\n\t\"locale\" : \"asd\"\n}"
        },
        "url": {
          "raw": "http://SERVER-IP:SERVER-PORT/new-path/login",
          "protocol": "http",
          "host": [
            "127",
            "0",
            "0",
            "1"
          ],
          "port": "8081",
          "path": [
            "old-path",
            "login"
          ]
        }
      },
      "response": []
    },
    ...
  ]
}
So, after the suggestion to use variables, I ended up creating two collection variables, "base-URL-LOCAL" and "base-URL-SERVER", which play the role of constants, and a third variable "base-url" which, for example, could have the value {{base-URL-LOCAL}} (both the initial and current values have to be updated). In my exported JSON collection, I substituted all "url" elements with something like the following:
"url": {
"raw": "{{base-url}}/login",
"host": [
"{{base-url}}"
],
"path": [
"login"
]
}
That way, somebody who gets my collection won't need pre-defined environments; they will only have to edit the collection variables, setting e.g. base-url to {{base-URL-SERVER}}.
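For completeness, a rough sketch of how those collection variables could look in the exported collection JSON (the values here are placeholders for whatever your local and server base URLs actually are):
"variable": [
  { "key": "base-URL-LOCAL", "value": "http://127.0.0.1:8081" },
  { "key": "base-URL-SERVER", "value": "http://SERVER-IP:SERVER-PORT" },
  { "key": "base-url", "value": "{{base-URL-LOCAL}}" }
]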

How to configure LumberJack (LogStash-forwarder) on Windows

I've installed ELK on my Ubuntu server using this manual.
Now I want to index some log files from a Windows server, so I installed a logstash forwarder (LumberJack), but I can't get it to run.
This is the logstash-forwarder.conf file:
{
  "network": {
    "servers": [ "http://XX.XX.XX.XX:5000" ],
    "ssl key": "D:/lumberjack/pki/tls/certs/logstash-forwarder.crt",
    "ssl ca": "D:/lumberjack/pki/tls/certs/logstash-forwarder.crt",
    "timeout": 15,
  },
  "files": [
    {
      "paths": [
        #single paths are fine
        "D:/bea12/Oracle/Middleware/domains/Google/servers/RT1/logs/AppLogs/RT1_APP_9_0.log",
        #globs are fine too, they will be periodically evaluated
        #to see if any new files match the wildcard.
        "/var/logauth.log"
      ],
    ]
  }
}
And this is the error I get when I try to run lumberjack.exe, which I built with go build:
2015/04/30 18:17:39.052033 Failed unmarshalling json: invalid character '}' looking for beginning of object key string
2015/04/30 18:17:39.052033 Could not load config file d:\lumberjack\logstash-forwarder.conf: invalid character '}' looking for beginning of object key string
Can anyone please tell me what I am doing wrong?
By the way, this is the command I'm using to run the forwarder:
lumberjack.exe -config="d:\lumberjack\logstash-forwarder.conf"
OK, so the problem was in the configuration file: there were two unnecessary commas, and there is no need for the http:// at the start of the server address:
{
  "network": {
    "servers": [ "XX.XX.XX.XX:5000" ],
    "ssl key": "D:/lumberjack/pki/tls/certs/logstash-forwarder.key",
    "ssl ca": "D:/lumberjack/pki/tls/certs/logstash-forwarder.crt",
    "timeout": 15
  },
  "files": [
    {
      "paths": [
        #single paths are fine
        "D:/bea12/Oracle/Middleware/domains/google/servers/RT1/logs/AppLogs/RT1_APP_9_0.log",
        #globs are fine too, they will be periodically evaluated
        #to see if any new files match the wildcard.
        "/var/logauth.log"
      ]
    }
  ]
}
This is my suggested configuration file for LumberJack on Windows.

Apache Drill - Query HDFS and SQL

I'm trying to explore Apache Drill. I'm not a data analyst, just an infra support guy, and the documentation on Apache Drill seems too limited.
I need some details about the custom data storage that can be used with Apache Drill:
Is it possible to query HDFS without Hive, using Apache Drill just like the dfs plugin does?
Is it possible to query traditional RDBMSs like MySQL and Microsoft SQL Server?
Thanks in advance.
Update:
My HDFS storage definition gives an error (Invalid JSON mapping):
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs:///",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "storageformat": "null"
    }
  }
}
If I replace hdfs:/// with file:///, it seems to accept it.
I copied all the library files from <drill-path>/jars/3rdparty to <drill-path>/jars/.
I still cannot make it work. Please help. I'm not a dev at all, I'm an infra guy.
Thanks in advance.
Yes.
Drill directly recognizes the schema of the file based on its metadata. Refer to this link for more info:
https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources
Not yet.
There is a MapR driver that lets you achieve the same, but it is not inherently supported in Drill right now. There have been several discussions around this, and it might be there soon.
YES, it is possible for Drill to communicate with both the Hadoop system and RDBMS systems together. In fact, you can have queries joining both systems.
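For the RDBMS side, newer Drill versions ship a JDBC storage plugin; a minimal sketch of such a plugin configuration for MySQL might look like the following (the driver class, host and credentials below are placeholders to adapt to your environment):
{
  "type": "jdbc",
  "driver": "com.mysql.jdbc.Driver",
  "url": "jdbc:mysql://xxx.xxx.xxx.xxx:3306",
  "username": "drill_user",
  "password": "drill_password",
  "enabled": true
}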
The HDFS storage plugin can be configured as:
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://xxx.xxx.xxx.xxx:8020/",
  "workspaces": {
    "root": {
      "location": "/user/cloudera",
      "writable": true,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "json": {
      "type": "json"
    }
  }
}
The connection URL will be your MapR/Cloudera URL with port number 8020 by default. You should be able to spot it in your Hadoop configuration (core-site.xml) under the key "fs.defaultFS".
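For example (with a hypothetical host name), if fs.defaultFS is set to hdfs://namenode.example.com:8020, the matching plugin setting would be:
"connection": "hdfs://namenode.example.com:8020/"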

Creating Ext.data.Model using JSON config

In the app we're developing, we create all the JSON on the server side using dynamically generated configs (JSON objects). We use that for stores (and other stuff, like GUIs), with a dynamically generated list of data fields.
With a JSON like this:
{
  "proxy": {
    "type": "rest",
    "url": "/feature/163",
    "timeout": 600000
  },
  "baseParams": {
    "node": "163"
  },
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "iconCls", "type": "auto" },
    { "name": "text", "type": "string" },
    { "name": "name", "type": "auto" }
  ],
  "xtype": "jsonstore",
  "autoLoad": true,
  "autoDestroy": true
}, ...
Ext will gently create an "implicit model" that I can work with, load into forms, save, delete, etc.
What I want is to specify through a JSON config not just the fields, but the model itself. Is this possible?
Something like:
{
  model: {
    name: 'MiClass',
    extends: 'Ext.data.Model',
    "proxy": {
      "type": "rest",
      "url": "/feature/163",
      "timeout": 600000
    },
    etc...
  },
  "autoLoad": true,
  "autoDestroy": true
}, ...
That way I would be able to create the whole JSON on the server without having to glue things together with JS statements on the client side.
Best regards,
I don't see why not. The syntax for creating a model class is similar to that of stores and components:
Ext.define('MyApp.model.MyClass', {
    extend: 'Ext.data.Model',
    fields: [..]
});
So if you take this apart, you could call Ext.define(className, config), where className is a string and config is a JSON object, both generated on the server.
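For example, the server could emit a JSON payload like the sketch below (the wrapper property names className and config are just an illustrative convention, not part of the Ext API), and the client would pass its two parts straight to Ext.define:
{
  "className": "MyApp.model.MyClass",
  "config": {
    "extend": "Ext.data.Model",
    "fields": [
      { "name": "id", "type": "int" },
      { "name": "name", "type": "string" }
    ],
    "proxy": {
      "type": "rest",
      "url": "/feature/163",
      "timeout": 600000
    }
  }
}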
There's no way to achieve exactly what I want.
The only way to do it is to define the fields of the Ext.data.Store and let it generate the implicit model from the fields configuration.