Can we receive webhooks from Foundry when a dataset updates? - palantir-foundry

Can we receive webhook calls from Foundry to an arbitrary external endpoint when a dataset updates (e.g. a new transaction is committed, or a jobspec is updated)?

I checked around and couldn't find any hooks; maybe there is something at the Actions level, but that is downstream from transactions.
Alternatively, you can work around it by having a downstream transform that exists only to build once the upstream has new transactions, and use it to make said HTTP requests from your transform via the driver. I understand it's not the same as you requested, but it's the best workaround I can think of.
You'll have to ask via your support channels to have whatever endpoint you want whitelisted within the Foundry configuration. This will likely have to be applied by Palantir, and it will raise security questions, so you should reach out internally via your support mechanisms.
Once you do that, something like this should work.
from transforms.api import transform_df, Input, Output
import requests

@transform_df(
    Output("output dataset"),
    _noop_df=Input("your upstream"),
)
def transform(_noop_df):
    # Fire the outbound call; the endpoint must already be whitelisted.
    requests.post('http://stackoverflow.com',
                  headers={'X-Custom': 'Test'}, data='a=1&b=2')
    # Write the result of the request into the output if you want.
    return _noop_df

Related

Pass JSON POST Data to Ignite 2.7 ComputeTask

I have a complicated API request that needs to pass JSON data to an Ignite ComputeTask, but I can only seem to pass data in through the URL query string, which seems awkward and potentially limiting. I have two questions:
Does the Ignite REST API have a max GET request limit, and if so, is there a way to increase it?
Is there any way to pass in JSON data through a POST request? I've experimented with ConnectorMessageInterceptor, but the args parameter is just the value of p1 from the query string.
If you're OK with passing the JSON data in as a GET parameter, you can raise the maximum GET size in the Jetty configuration referenced by your connector configuration using <Set name="requestHeaderSize">BYTES</Set>, though this is clearly not an optimal solution.
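For reference, here is a rough sketch (not a definitive recipe) of pointing the Ignite REST connector at a custom Jetty XML file so that the requestHeaderSize setting above can take effect; the file path is illustrative:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.ConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RestWithCustomJetty {
    public static void main(String[] args) {
        // Point the REST HTTP server at a custom Jetty XML config
        // (which would contain the <Set name="requestHeaderSize">...</Set> line).
        ConnectorConfiguration connectorCfg = new ConnectorConfiguration();
        connectorCfg.setJettyPath("config/rest-jetty.xml");   // illustrative path

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setConnectorConfiguration(connectorCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Node is up; the REST API on Jetty now uses the custom settings.
        }
    }
}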
The short answer is no, there's no built-in way to intercept JSON POST body data in Ignite's built-in REST API. Although the Ignite documentation suggests that you configure Jetty's handlers, Ignite 2.7's implementation (see GridJettyRestProtocol) actually overrides the configured handler with its own GridJettyRestHandler, which only accepts requests in the form /ignite?cmd=cmdName&p1=params&name=taskName. To work around this, you can drop the ignite-rest-http lib and roll your own Jetty implementation. If that seems like too much work and you don't mind a somewhat hacky solution, you can piggyback on Ignite's optional-lib structure and copy in just the file org.apache.ignite.internal.processors.rest.protocols.http.jetty.GridJettyRestProtocol from the ignite-rest-http lib, which Ignite will automatically pick up at startup. In GridJettyRestProtocol, swap out GridJettyRestHandler for your own custom AbstractHandler that accepts POST data. Remember to import Jetty as a project dependency.
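To make that concrete, below is a minimal, hypothetical sketch of the kind of Jetty AbstractHandler you might swap in for GridJettyRestHandler; the class name, response payload, and the task-dispatch step are all placeholders:

import java.io.IOException;
import java.util.stream.Collectors;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.handler.AbstractHandler;

// Hypothetical handler that accepts a raw JSON POST body instead of ?p1=... query parameters.
public class JsonPostHandler extends AbstractHandler {
    @Override
    public void handle(String target, Request baseRequest,
                       HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        if ("POST".equalsIgnoreCase(request.getMethod())) {
            // Read the whole request body as a string.
            String json = request.getReader().lines().collect(Collectors.joining("\n"));

            // ... deserialize 'json' and dispatch it to your ComputeTask here ...

            response.setContentType("application/json");
            response.setStatus(HttpServletResponse.SC_OK);
            response.getWriter().write("{}");   // placeholder response
            baseRequest.setHandled(true);
        }
    }
}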
You can supply your own Jetty config, which should let you raise the GET request size limit.
Have you actually tried doing a POST with an application/x-www-form-urlencoded body?

Caching API response from a simple Ruby script

I'm working on a simple Ruby script with a CLI that will allow me to browse certain statistics inside the terminal.
I'm using API from the following website: https://worldcup.sfg.io/matches
require 'httparty'
url = "https://worldcup.sfg.io/matches"
response = HTTParty.get(url)
I have two goals in mind. First is to somehow save the JSON response (I'm not using a database) so I can avoid unnecessary requests. Second is to check if new data is available and, if it is, to overwrite the previously saved response.
What's the best way to go about this?
... with cli ...
Since the script is invoked from the CLI, nothing stays resident between runs, so caching in memory is likely not available to you. In this case you can save the response to a file on disk.
Second is to check if the new data is available, and if it is, to override the previously saved response.
The thing is, how can you check if new data is available without making a request for the data? That's not possible (given the information you provided). So you can simply keep fetching the data every 5 minutes or so and updating your local file.

How to dump HTTP request body in RESTEasy & WildFly 8.2

I am looking for a way to dump the HTTP request & response bodies (JSON format) in RESTEasy on WildFly 8.2.
I've checked this answer Dump HTTP requests in WildFly 8 but it just dumps headers.
I want to see the incoming json message and outgoing one as well.
Can this be done through configuration alone, without a filter or any coding?
Logging HTTP bodies is not something frequently done; that's probably the primary reason the RequestDumpingHandler in Undertow only logs the header values. Also keep in mind that the request body is not always very interesting to log: think, for example, of using WebSockets or transmitting big files. You can write your own MessageBodyReader/Writer for JAX-RS that writes to a ByteArrayOutputStream first and logs the captured content before passing it on. However, given that this is rarely practical in production, I think you're mostly interested in how to do this during development.
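As a rough, development-only sketch of that buffering idea, here is a hypothetical JAX-RS 2.0 ReaderInterceptor/WriterInterceptor (a lighter-weight relative of a full MessageBodyReader/Writer) that captures the entity into a ByteArrayOutputStream, logs it, and passes it on; the class name is made up:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Logger;

import javax.ws.rs.WebApplicationException;
import javax.ws.rs.ext.Provider;
import javax.ws.rs.ext.ReaderInterceptor;
import javax.ws.rs.ext.ReaderInterceptorContext;
import javax.ws.rs.ext.WriterInterceptor;
import javax.ws.rs.ext.WriterInterceptorContext;

// Logs incoming and outgoing entity bodies; intended for development use only.
@Provider
public class BodyLoggingInterceptor implements ReaderInterceptor, WriterInterceptor {

    private static final Logger LOG = Logger.getLogger(BodyLoggingInterceptor.class.getName());

    @Override
    public Object aroundReadFrom(ReaderInterceptorContext ctx)
            throws IOException, WebApplicationException {
        // Copy the incoming entity so it can be logged and still be consumed downstream.
        InputStream in = ctx.getInputStream();
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) {
            copy.write(chunk, 0, n);
        }
        LOG.info("Request body: " + copy.toString("UTF-8"));
        ctx.setInputStream(new ByteArrayInputStream(copy.toByteArray()));
        return ctx.proceed();
    }

    @Override
    public void aroundWriteTo(WriterInterceptorContext ctx)
            throws IOException, WebApplicationException {
        // Let the real writer serialize into a buffer, log it, then forward to the client.
        OutputStream original = ctx.getOutputStream();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ctx.setOutputStream(buffer);
        ctx.proceed();
        LOG.info("Response body: " + buffer.toString("UTF-8"));
        original.write(buffer.toByteArray());
        ctx.setOutputStream(original);
    }
}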
You can capture HTTP traffic (and in fact any network traffic) during development using tcpflow or Wireshark; sometimes people use tools such as netcat to quickly write traffic to a file. You can also use, for example, the Chrome debugger to read HTTP requests/responses (with their contents).

Preventing access to JSON data in an Angular app

I've got a (Flask) backend powering an API that serves JSON to an Angular app.
I love the fact that my backend (algorithms, database) is totally disconnected from my frontend (design, UI), as it could literally run from two distinct servers. However, since the view is generated entirely client-side, anyone can obviously access the JSON data. Say the application is a simple list of things (the things are stored in a JSON file).
In order to prevent direct access to my database through JSON in the browser console, I found these options:
Encrypting the data (weak, since the decrypting function will be freely visible in the JavaScript, though less obvious when dealing with minified files)
Instead of fetching the whole database with a single $http.get and filtering with Angular, issuing many smaller $http.get calls (as the user scrolls a list, for example) so that it is programmatically harder to crawl
I believe my options are still weak. How could I make it harder for a hacker to crawl the whole database? Any ideas?
As I understand this question, the user should be permitted to access all of the data via your UI, but you do not want them to access the API directly. As you have figured out, any data accessed by the client cannot be secured, but we can make accessing it a little more of a PITA.
One common way of doing this is to check the HTTP referer. When you make a call from the UI, the server will be given the page the request is coming from. This is typically used to prevent people from creating mashups that use your data without permission. As with all the HTTP request headers, you are relying on the caller to be truthful; this will not protect you from console hacking or someone writing a scraper in another language. (See also: CSRF.)
Another idea is to embed a variable token in the HTML source that bootstraps your app. You can specify this as an Angular constant or a global variable and include it in all of your $http requests. The token itself could be unique for each session or be an encrypted expiration date that only the server can process. However, this method is flawed as well, since someone could parse the HTML source, extract the token, and then make a request.
So really, you can make it harder for someone, but it is hardly foolproof.
If users should only be able to access some of the data, you can try something like Firebase. It allows you to define rules for who can access what.
Security Considerations

When designing web applications, consider security threats from: JSON vulnerability and XSRF. Both server and the client must cooperate in order to eliminate these threats. Angular comes pre-configured with strategies that address these issues, but for this to work backend server cooperation is required.

JSON Vulnerability Protection

A JSON vulnerability allows a third party website to turn your JSON resource URL into a JSONP request under some conditions. To counter this your server can prefix all JSON requests with the following string ")]}',\n". Angular will automatically strip the prefix before processing it as JSON.

For example, if your server needs to return ['one','two'], which is vulnerable to attack, your server can return )]}', ['one','two'] and Angular will strip the prefix before processing the JSON.

Cross Site Request Forgery (XSRF) Protection

XSRF is a technique by which an unauthorized site can gain your user's private data. Angular provides a mechanism to counter XSRF. When performing XHR requests, the $http service reads a token from a cookie (by default, XSRF-TOKEN) and sets it as an HTTP header (X-XSRF-TOKEN). Since only JavaScript that runs on your domain could read the cookie, your server can be assured that the XHR came from JavaScript running on your domain. The header will not be set for cross-domain requests.

To take advantage of this, your server needs to set a token in a JavaScript readable session cookie called XSRF-TOKEN on the first HTTP GET request. On subsequent XHR requests the server can verify that the cookie matches the X-XSRF-TOKEN HTTP header, and therefore be sure that only JavaScript running on your domain could have sent the request. The token must be unique for each user and must be verifiable by the server (to prevent the JavaScript from making up its own tokens). We recommend that the token is a digest of your site's authentication cookie with a salt for added security.

The name of the headers can be specified using the xsrfHeaderName and xsrfCookieName properties of either $httpProvider.defaults at config-time, $http.defaults at run-time, or the per-request config object.
Please refer to the link below:
https://docs.angularjs.org/api/ng/service/$http
From the AngularJS docs:
JSON Vulnerability Protection
A JSON vulnerability allows third party website to turn your JSON resource URL into JSONP request under some conditions. To counter this your server can prefix all JSON requests with following string ")]}',\n". Angular will automatically strip the prefix before processing it as JSON.
There are other techniques, like XSRF protection and transformations, which will further add security to your JSON communications. More on this can be found in the AngularJS docs: https://docs.angularjs.org/api/ng/service/$http
You might want to consider using JSON Web Tokens for this. I'm not sure how to implement this in Flask, but here is a decent example of how it can be done with a Node.js backend. This example at least shows how you can implement it in AngularJS.
http://www.kdelemme.com/2014/03/09/authentication-with-angularjs-and-a-node-js-rest-api/
Update: JWT for Flask:
https://github.com/mattupstate/flask-jwt

WSO2 API Manager and BAM - How to control API invocation?

How can I retrieve the number of API invocations? I know the data has to be somewhere, because WSO2 BAM shows pie charts with similar data...
I would like to get that number in a mediation sequence; is that possible? Might this be achieved via a DB lookup?
The way API usage monitoring works in WSO2 API Manager is as follows. There is an API handler (org.wso2.carbon.apimgt.usage.publisher.APIUsageHandler) that gets invoked for each request and response passing through the API gateway. In this handler, all pertinent information with regard to API usage is published to the WSO2 BAM server. The WSO2 BAM server persists this data in the Cassandra database that is shipped with it. Then there is a BAM toolbox, packaged with the required analytic scripts written in Apache Hive, that can be installed on the BAM server. These scripts summarize the data periodically and persist the summarized data to a SQL database. So the graphs and charts shown in the API Publisher web application are created from the summarized data in the SQL database.
Now, if what you require can be extracted from these summarized SQL tables, then the process is very straightforward: you could use the DBLookup mediator for this. But if some dimension of the data you need has been lost due to the summarizing, then you will have a little more work to do.
You have two options.
The easiest approach, which involves no coding at all, would be to write a custom Hive script that suits your requirement and summarizes the data to a SQL table. Then, as before, use a DBLookup mediator to read the data. You can look at the existing Hive scripts that are shipped with the product to get an idea of how they are written.
If you don't want BAM in the picture, you can still do it with minimal coding as follows. The implementation class which performs the publishing is org.wso2.carbon.apimgt.usage.publisher.APIMgtUsageDataBridgeDataPublisher. This class implements the interface org.wso2.carbon.apimgt.usage.publisher.APIMgtUsageDataPublisher. The interface has three instance methods, as follows.
public void init()
public void publishEvent(RequestPublisherDTO requestPublisherDTO)
public void publishEvent(ResponsePublisherDTO responsePublisherDTO)
The init() method runs just once during server startup. Here is where you can add all your logic which is needed to bootstrap the class.
The publishEvent(RequestPublisherDTO) is where you publish request events and publishEvent(ResponsePublisherDTO) is where you publish response events. The DTO objects are encapsulated representations of the request and response data respectively.
What you will have to do is write a new implementation of this interface and configure it as the value of the DataPublisherImpl property in api-manager.xml. To make things easier, you can simply extend the existing org.wso2.carbon.apimgt.usage.publisher.APIMgtUsageDataBridgeDataPublisher, add the logic needed to persist usage data to a SQL database within init(), publishEvent(RequestPublisherDTO) and publishEvent(ResponsePublisherDTO), and at the end of each method just call the respective superclass method.
E.g. the overriding init() would call super.init(). This way you are only adding the code necessary for your requirement, and leaving the BAM stat collection to the superclass.
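For illustration, here is a bare-bones sketch of such a subclass; the package name and the persistence logic are placeholders, and the DTO import paths are assumed from the method signatures above:

package com.example.apim.usage;

import org.wso2.carbon.apimgt.usage.publisher.APIMgtUsageDataBridgeDataPublisher;
import org.wso2.carbon.apimgt.usage.publisher.dto.RequestPublisherDTO;
import org.wso2.carbon.apimgt.usage.publisher.dto.ResponsePublisherDTO;

// Persists usage data to a SQL table while keeping the default BAM publishing intact.
public class SqlBackedUsageDataPublisher extends APIMgtUsageDataBridgeDataPublisher {

    @Override
    public void init() {
        // e.g. set up a datasource / make sure the invocation-count table exists
        super.init();                              // keep the default BAM bootstrap
    }

    @Override
    public void publishEvent(RequestPublisherDTO requestPublisherDTO) {
        // e.g. insert or increment an invocation-count row for this request here
        super.publishEvent(requestPublisherDTO);   // keep BAM stat collection
    }

    @Override
    public void publishEvent(ResponsePublisherDTO responsePublisherDTO) {
        // e.g. record response counts / latencies here
        super.publishEvent(responsePublisherDTO);  // keep BAM stat collection
    }
}

You would then set this class as the value of the DataPublisherImpl property in api-manager.xml, and a DBLookup mediator in your mediation sequence could read the counts back from the SQL table.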