CSRF implementation in core CKAN code

This extension to CKAN could work to provide security features in CKAN: it provides cookie-based CSRF protection for requests. But according to this discussion, it is not implemented as part of core CKAN because the method the extension uses for CSRF protection conflicts with the future plan for CKAN's own implementation of CSRF protection.
So, my questions are:
1. Is there any implementation of CSRF protection in core CKAN code?
2. What are the different methods we can use to implement CSRF protection, and which one would be best for core CKAN?

Not at the present time. It really needs someone to do it.
The PR discussion you linked contains several suggestions for how it should be done. I imagine it just needs a bit of work to pull it all together. I suggest creating a new GitHub issue with the proposal before doing a PR. It would benefit lots of projects.
The CKAN team and I are always pleased to see PRs in CKAN, and that PR's author made a great effort to generalize their solution and put it forward as a PR. It's just a shame that the author and the CKAN team didn't reach agreement on that one. It happens in open source. Shepherding a project means tough choices sometimes, and they're not always right. It usually helps, though, to discuss before implementing. Plenty of PRs do get merged, and then everyone benefits.

The answer to (1) is sadly still, "No." However, there are a couple of options for (2):
Applying the SameSite attribute to all relevant cookies (especially the auth_tkt cookie) should cover 90%+ of cases in browsers that support the attribute. This could be achieved with something like Apache's mod_headers.
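For example, with mod_headers enabled, a single directive can append the attribute to every cookie the server sets. This is just a sketch, not an official CKAN recipe, and it assumes no cookie already carries a SameSite attribute:

    # Append SameSite=Lax to every Set-Cookie header, so browsers stop
    # sending these cookies on cross-site POSTs (requires mod_headers).
    Header edit Set-Cookie ^(.*)$ "$1; SameSite=Lax"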
The discussion that you saw has since produced a stand-alone extension, https://github.com/qld-gov-au/ckanext-csrf-filter, which should provide effective CSRF protection on CKAN 2.8 and 2.9 (possibly 2.7, but I haven't specifically tested that).
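For the curious, a token filter of that general kind can be sketched in a few lines of Python. This is only an illustration of the idea (an HMAC-signed token tied to the user name), not the extension's actual code or API:

    # Sketch of a stateless CSRF token: an HMAC over the username, a nonce
    # and a timestamp, regenerated for each form and verified on submission.
    import hashlib
    import hmac
    import secrets
    import time

    SECRET_KEY = b"replace-with-a-real-server-side-secret"

    def generate_token(username):
        nonce = secrets.token_hex(8)
        timestamp = str(int(time.time()))
        message = f"{username}/{nonce}/{timestamp}".encode()
        digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
        return f"{digest}/{nonce}/{timestamp}"

    def validate_token(token, username, max_age=3600):
        try:
            digest, nonce, timestamp = token.split("/")
            age = int(time.time()) - int(timestamp)
        except ValueError:
            return False
        if age > max_age:
            return False
        message = f"{username}/{nonce}/{timestamp}".encode()
        expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
        return hmac.compare_digest(digest, expected)

The server embeds the token in every form it renders and rejects any state-changing request whose submitted token doesn't validate; an attacker's page can't read your pages, so it can't forge one.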

How to trigger dependabot scan on developer pull requests

I'm not sure whether my use case is one Dependabot is suited for, so I'm hoping someone can tell me whether it is, and if so, point me to some documentation on how to do what I'm describing:
I want to create a workflow that:
runs a Dependabot scan on each developer pull request
reports only on newly introduced or updated dependencies
blocks the pull request on any new dependencies with vulnerabilities of medium severity or higher
does not create a Dependabot PR as a result of the scan
Is this possible?
This is possible with the dependency review action: https://github.com/actions/dependency-review-action
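A minimal workflow using that action might look like the sketch below; the fail-on-severity input is what blocks the PR at your chosen threshold (inputs may change, so check the action's README):

    name: Dependency Review
    on: pull_request
    permissions:
      contents: read
    jobs:
      dependency-review:
        runs-on: ubuntu-latest
        steps:
          # Check out the PR so the action can diff the dependency manifests.
          - uses: actions/checkout@v4
          # Fail the check when the PR introduces dependencies with known
          # vulnerabilities of moderate severity or higher.
          - uses: actions/dependency-review-action@v4
            with:
              fail-on-severity: moderate

Mark the check as required in branch protection so a failing review actually blocks the merge.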
It seems the answer is no: Dependabot itself cannot do this.
A colleague found this information:
"Dependabot alerts will find vulnerabilities that are already in your dependencies, but it's much better to avoid introducing potential problems than to fix problems at a later date."
At this location:
https://docs.github.com/en/code-security/supply-chain-security/understanding-your-software-supply-chain/about-dependency-review
And GitHub has, on its roadmap, the ability to block pull requests that introduce vulnerable dependencies:
https://github.com/github/roadmap/issues/149

Making contexts explicit in the directory structure

I am looking for feedback on a certain directory structure for an application. I realize that this does not follow the classical Stack Overflow format, where there is such a thing as "a correct answer", though I think it is interesting nonetheless. To provide meaningful feedback, some context first needs to be understood, so please bear with me.
--
Two colleagues of mine and I have created an application that uses the Clean Architecture. HTTP requests to routes get turned into request models, which get handed to use cases, which then spit out response models that get handed to a presenter.
The code is fully open source and can be found on GitHub. We also have some docs describing what the main directories are about.
We are thinking about reorganizing our code and would like to get feedback on what we've come up with so far. The primary reasons for this reorganization are:
Right now we do not have a nice place to put things that are not part of our domain, yet somehow bind to it. For instance, authorization code, which knows about donation IDs (with authorization not being part of the core domain, while donation IDs are).
It's nice to group cohesive things together. Our Donation code is cohesive and our Membership Application code is cohesive, while neither depends on the other. This is closely related to the notion of Bounded Contexts in Domain-Driven Design. Right now these contexts are not explicitly visible in our code, so it is easy to make them dependent on each other, especially when you are not familiar with the domain.
These are the contexts we have identified so far. This is a preliminary list, just to give you an idea, and not the part I want feedback on.
Donation
Membership
Form support stuff (validation of email, generation of IBAN, etc.)
The part I want feedback on is the directory structure we are thinking of switching to:
src/
    Context_1/
        DataAccess/
        Domain/
            Model/
            Repositories/
        UseCases/
        Validation/
        Presentation/
        Authorization/
    Context_2/
    Factories/
    Infrastructure/
tests/
    Context_1/
        Unit/
        Integration/
        EdgeToEdge/
        System/
        TestDoubles/
    Context_2/
The Authorization/ folder directly inside of the context would provide a home for our currently oddly placed authorization code in Infrastructure. Other code not part of our domain, yet binding to it, can go directly into the context folder, and gets its own folder if there is a cohesive/related bunch of stuff amongst it, such as authorization.
I'm happy to provide any additional information you need to give useful feedback.
"Right now we do not have a nice place to put things that are not part of our domain, yet somehow bind to it."
"Right now these contexts are not explicitly visible in our code, so it is easy to make them dependent on each other, especially when you are not familiar with the domain."
There are both technical and non-technical ways to address this issue:
You can enforce stricter separation through class libraries. It is more obvious that you are taking a dependency on something if you have to import a DLL / reference another project. It will also prevent circular dependencies.
Code reviews / discipline is a non-technical way to handle it.
I've been using Hexagonal Architecture with DDD, where the domain is in the middle. Other concerns, such as repositories, are represented by interfaces. Your adapters then take a reference to the domain, but never in the other direction. So you might have an IRepository in your domain, but your WhateverDatabaseRepository sits in its own project. It is then the responsibility of the application services / command handlers to coordinate your use cases and load the adapters. This is also where you would apply cross-cutting concerns such as authorization.
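In code terms, that looks something like the following Python-flavoured sketch (all names are made up for illustration; the point is the direction of the dependencies):

    # Domain layer: a model plus a repository *port* (interface). It knows
    # nothing about databases, HTTP, or authorization.
    from abc import ABC, abstractmethod

    class Donation:
        def __init__(self, donation_id, amount):
            self.donation_id = donation_id
            self.amount = amount

    class DonationRepository(ABC):
        @abstractmethod
        def get(self, donation_id):
            ...

    # Adapter: lives outside the domain and depends on it, never the
    # other way around.
    class InMemoryDonationRepository(DonationRepository):
        def __init__(self):
            self._store = {}

        def get(self, donation_id):
            return self._store[donation_id]

    # Application service: coordinates the use case, loads the adapter, and
    # applies cross-cutting concerns such as authorization (the authorizer
    # here is any object with a can_view method).
    class ShowDonationUseCase:
        def __init__(self, repository, authorizer):
            self._repository = repository
            self._authorizer = authorizer

        def show(self, donation_id):
            if not self._authorizer.can_view(donation_id):
                raise PermissionError("not allowed to view this donation")
            return self._repository.get(donation_id)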
I'd recommend watching Greg Young's videos (try this one) and reading Vaughn Vernon's IDDD, as they go into how to structure applications and deal with questions like yours. (Sorry that my answer is basically "watch a 6-hour video and read a 600+ page book", but they both really helped clarify some of the more "woolly" aspects of DDD for me.)
As an example, see https://github.com/gregoryyoung/m-r/blob/master/SimpleCQRS/CommandHandlers.cs

What's the point of oEmbed API endpoints and URL schemes vs. link tags?

The oEmbed specification mentions two different ways of finding the oEmbed content of a URL:
Knowing the API endpoint of the website and passing it, through a GET parameter, the URL you want info about, if it matches the URL pattern it declared.
Discovering the URL of the oEmbed version thanks to a <link rel="alternate" type="application/json+oembed" ... /> (or text/xml+oembed) tag in the HTML <head>.
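For concreteness, consumer-side discovery (the second way) boils down to something like the following standard-library sketch; the tag handling is simplified for illustration:

    # Fetch a page, find its <link ... type="application/json+oembed"> tag,
    # then fetch the JSON document that the tag points to.
    import json
    import urllib.request
    from html.parser import HTMLParser

    class OEmbedLinkFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.href = None

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if (tag == "link" and a.get("rel") == "alternate"
                    and a.get("type") == "application/json+oembed"):
                self.href = a.get("href")

    def discover_oembed(page_url):
        with urllib.request.urlopen(page_url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        finder = OEmbedLinkFinder()
        finder.feed(html)
        if finder.href is None:
            raise LookupError("no oEmbed link tag found")
        with urllib.request.urlopen(finder.href) as resp:
            return json.load(resp)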
The second way seems more generic, as you don't have to store and maintain a whole list of providers. Moreover, lists of providers are a sign of a centralized internet where only a few actors exist. That approach is hardly scalable.
I can see a use for the first approach, though, for websites that can parse resources made available by someone else. For example, I can provide an oEmbed version of video pages from website Foo. However, for several reasons, mainly security-related, I wouldn't trust someone who says "I can parse resource X for you" unless X's author is OK with that, which brings us back to approach 2.
So my question is: what did I miss here? What's the use of the first method of dealing with oEmbed? For instance, why store (and maintain up-to-date) a whole list of endpoints and patterns, like oohEmbed does, if you have a generic way of discovering it on the fly for virtually any resource on the internet?
As a very closely related question, which I think may be asked at the same time (please correct me if I'm wrong): what happens if one doesn't provide a central endpoint for oEmbed contents but rather, say, expects a '?version=oembed' parameter on each URL, which returns the oEmbed version instead of the standard one?
If I recall correctly, supporting both mechanisms was a compromise that we figured would help drive adoption. It's much easier to persuade large web properties to add a single endpoint vs. adding markup (that's irrelevant to most clients) to every response body. It was a pragmatic choice.
Longer term we planned to leverage some of the work Eran Hammer-Lahav was doing around discovery rather than re-inventing it (poorly, again). Unfortunately, his ideas still haven't gotten much traction and the web still lacks a good, standardized way to do this sort of thing.
I was hoping to find an answer here, but it looks like everyone else is as confused as we are. The advantage of using option 1, in my opinion, is that it takes only one JSON request instead of a potentially expensive HTML request followed by the JSON request. You can always use option 2 as a fallback in case you can't match a pattern in your pre-baked list of oEmbed providers.
oEmbed discovery is a major security concern. WordPress, for example, has a whitelist of supported oEmbed providers.
Suppose that every random URL on the internet could trigger an oEmbed fetch. That would mean everyone could hack your site.
Steps:
Create a new site, and add oEmbed discovery to it.
Post the URL to a form at your site. Now your site performs the oEmbed on my behalf.
Exploit:
by denial of service (DoS): e.g. redirect the URL to a tarpit, or feed it a 1 GB JSON response.
by cross-site scripting (XSS): inject arbitrary HTML into pages that other people can see.
by stealing the admin's session cookie via XSS: now the attacker can log in to your CMS to upload files, and exploit even more.
It's XSS to the max, with little to stop it. The only sane thing to do is whitelist proper endpoints. That's why oEmbed endpoints are explicitly listed.
If you want something scalable, you might like www.noembed.com and www.embedly.com. They provide oEmbed support for various sites which don't do oEmbed themselves.
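For example, hitting Noembed's single endpoint (the first, endpoint-style mechanism) can be as simple as this; the exact fields returned depend on the provider:

    # Ask Noembed's endpoint for the oEmbed data of an arbitrary URL.
    import json
    import urllib.parse
    import urllib.request

    video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    endpoint = "https://noembed.com/embed?" + urllib.parse.urlencode({"url": video_url})
    with urllib.request.urlopen(endpoint) as resp:
        data = json.load(resp)
    print(data.get("title"), data.get("provider_name"))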

Best Practices for Setup and Management of an Open Source Project

Later this year I want to release a PHP framework that I've been working on as open source. I do use source control (SVN), but on an extremely limited basis. I'm self-taught, I develop by myself, and I don't have the experience of working with large teams. I have some ideas about what can help make a project successful, but I'm fuzzy on some of the details. Since it's not yet released, I want to do everything I can to set up the right infrastructure from the beginning. What do I need to know in order to set up and manage a successful project?
Some ideas that I have to make it successful (beyond marketing it):
Good documentation and tutorials
Automated unit tests and builds to push updates to the website
A clear roadmap
Bug tracking integrated with the source control
A style guide to keep the code consistent
A forum for the community to get support, share ideas, etc.
A good example application built with the framework
A blog to keep the community informed
Maintaining backwards compatibility wherever possible
Some of my questions:
How do I set up and automate a one-step submit-test-commit-generate API docs-push update to website process? Edit: Would Ant or Maven be good candidates for this? If so, do you know of any resources for setting up a PHP project using them?
How do I handle (technically) submissions from other users? How can I ensure that those submissions must be approved before being integrated?
What are some of the pitfalls that can be avoided in terms of the project community? I'd prefer to have it be as friendly and helpful as possible without a lot of drama.
I'd love to learn from your experience on any of these points. If you think I'm missing anything big, please share that as well. Any resources (preferably geared toward a beginner) that you could point me towards would also be greatly appreciated.
I'm just getting started in community projects, but I'll give you some advice on what I know.
How do I set up and automate a one-step submit-test-commit-generate API docs-push update to website process?
I've never implemented it as one process. You could just have a checklist, and possibly even create some scripts to do certain tasks. I've never worked with any source control that automates uploading and such via a script; most of the time, there is some web interaction involved.
You don't want to push API changes until it's an official release.
EDIT: Working Environment
For PHP, most of the time I either edit directly on the server and test it there, using beta.example.com or similar, before pushing to example.com. You could also set up a web environment on your home PC (using XAMPP for Windows, or the standard LAMP installation on Linux). You would probably just use a mirror of your repository here, so you'd do svn commit, or whichever command is appropriate for the VCS or DVCS you choose.
The fun part is testing this with different PHP versions. I've not done this myself, but you could probably use a .htaccess file to run a different PHP binary in order to test it out. I'm not really sure what the best option for this is.
I've not done much with API documentation, as I've never created a library, but just doing a quick search I found http://www.phpdoc.org/. It looks like a mature project, so that might be a starting point.
As far as creating releases goes, I generally create a script that includes only the files that are part of the distribution (it will filter out any VCS files, and anything else that you don't want in the distributed file). You could write a script around find on Linux (which is what I do most of the time), or there may be other, better options.
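As an illustration of that kind of release script, here is a small Python sketch (paths and names invented) that builds a tarball while filtering out VCS metadata:

    # Package a source tree into a release tarball, skipping VCS directories.
    import tarfile
    from pathlib import Path

    EXCLUDED_DIRS = {".svn", ".git", ".hg"}

    def make_release(source_dir, archive_name):
        def keep(tarinfo):
            # Returning None from the filter drops the entry from the archive.
            parts = Path(tarinfo.name).parts
            return None if EXCLUDED_DIRS.intersection(parts) else tarinfo
        with tarfile.open(archive_name, "w:gz") as tar:
            tar.add(source_dir, arcname=Path(source_dir).name, filter=keep)

    make_release("myframework", "myframework-1.0.tar.gz")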
How do I handle (technically) submissions from other users? How can I ensure that those submissions must be approved before being integrated?
This is mostly handled by the bug tracker and limited access to the version control system. Usually you, and the people you allow, can commit to the VCS. Other users can submit patches, but then you might have someone review the patch, test the patch, and commit it. You could split these tasks up as a team, or assign a patch to one person and have them do it all.
What are some of the pitfalls that can be avoided in terms of the project community? I'd prefer to have it be as friendly and helpful as possible without a lot of drama.
I would just make sure to keep things as positive as possible with the project members and community. There are going to be some disagreements, and they will drive a few people away, but as long as you have a stable product that meets the needs of most people, I think that's all anyone can expect.
One minor suggestion that's worked well for me: start using first-person plural pronouns rather than singular ones. That is, talk about "we" and "us" rather than "I" and "me". It encourages other people to participate when they feel like part of a team, rather than when they feel like they're contributing to your own self-aggrandizement.
The most important thing you have to do is attract users. Without users, you won't get any contributions or developers helping you out, because developers are users first; then they decide to extend/fix something they use, and might become contributors.
So to get users, you should consider
describe what your framework does in one or two sentences at the top of your project page
mention how your framework can be used, and what situations it is most useful for
add a lot of examples on how to use it
mention whether your framework is stable, beta, or alpha. That's important because users need to know that before they start using it
also mention whether you want to keep improving it and keep working on it - most users don't want to use a framework that's abandoned (also keep in mind that a lot of users check your commits to see whether you really are working on it - if your last commit to the repository was months ago, then you're not really working on it, so cheating isn't possible)
If you get all this, and people start submitting patches, you can use a patch tool to apply those to your source. Depending on your version control system, you can use GNU patch, a diff/patch tool that comes with your version control, or maybe even a GUI tool that helps you with this. SVN doesn't have a patch tool (yet), but 'svn diff' will create a patch file which you can then apply with the GNU patch tool; or, in case you're using TortoiseSVN, right-drag the patch file to your working copy and have TortoiseMerge apply it for you.
And on how to best deal with the community:
answer questions in time; don't wait more than two or three days
try to be nice, even with upset and angry people. Only if they keep bothering you should you tell them (still in a nice way if possible) to go elsewhere
always keep discussions about the project on a mailing list. You don't want to repeat the same discussions over and over again - if you have a mailing list, just point users to the archives before the discussion starts all over again
And you should watch the talk "How Open Source Projects Survive Poisonous People (And You Can Too)" - it's really good and tells you a lot about how to deal not just with 'poisonous people' but with all the people involved in your project.
I'd like to add that you should make it as easy as possible for your users to get the whole thing running and to modify the code - these 'power users' can be 'converted' into developers, or at least into people who send smaller patches.
Don't try to do it all yourself - for open source projects there are several hosting providers that solve most of these problems. I recommend CodePlex or Google Code.
Setting up build scripts will depend a certain amount on what platform you set up, but in general it's easy to add any tool you want into the script once you start using any sort of build script.
If you really need the one step process you describe, you need a build server. I use TeamCity, which I have set up to watch for any changes in svn and trigger build/test whenever something is checked in. The build server will generally be able to perform any steps that you put into the build script.
Read up on Git as an alternative to SVN:
free public repository / bug tracker / wiki / fork-enabled community on GitHub (which hosts Symfony and PHPUnit, amongst others)
"How do I handle (technically) submissions from other users? How can I ensure that those submissions must be approved before being integrated?" - with Git, pull whatever you/your closest team find most interesting into the master branch
Consistent API
be inspired by other public APIs
only change in major versions
guessable
Interesting for both users & developers
clear goal (your roadmap - excellent)
useful, contra everything else available
easy to use, but still not easy-enough-to-write/maintain-yourself
You could check out either Ant or Phing to build your project. Include CodeSniffer in the build and you'll save time checking for basic formatting errors/differences.
Those are all technical tips. As for the soft part: treat humans with respect, show a lot of interest, be overly excited about their contributions, and make them feel that they're not wasting their time. That would appeal to me.
Take a look at Karl Fogel's book on Producing Open Source Software. It probably covers everything that you asked about.
You should also plan for engaging the community. I'd recommend reading Jono Bacon's The Art of Community [http://www.artofcommunityonline.org/].
You have a great set of ideas to start. You might have to start by trimming them down! Ask yourself what's necessary for a first release.
For automating the builds and tests, the scripting can be done with Ant, Maven, or Phing for PHP projects.
You'll probably need a host so you can demo the product. For PHP that is pretty easy to find.
You need an open source hosting provider, especially GitHub (but also Google Code, SourceForge, etc.). GitHub provides bug tracking, default licenses, a blog, and great mechanisms for accepting changes from the community. Built on Git, it facilitates distributed projects quite well.
Although it's good to have a one-step build and install in place, automating the integration of others' changes probably isn't important (or desirable) right off the bat.
Good luck!

Are Mercurial's bundled extensions considered part of its core feature set and approach to version control?

I'm currently trying to evaluate Mercurial, to get a feel for the philosophy the system tries to promote - but one thing that's got me confused is the presence of the bundled 'extensions' and how they fit into the mix.
In the core package, Mercurial ships with a bunch of functionality that is implemented as extensions but is disabled by default. (See: https://www.mercurial-scm.org/wiki/UsingExtensions#Extensions_Bundled_with_Mercurial)
Here's the thing I'm confused about:
Are these extensions considered first class citizens by the Mercurial dev team and therefore part of the overall Mercurial approach to DVCS?
Why are they implemented outside of the default features and disabled by default?
I don't need info on how to activate extensions; that's pretty straightforward - it's the logic behind the separation that I'm curious about.
The reason I'm trying to get my head around this is because I don't really want to try and crowbar an opposing approach into Mercurial via extensions if it differs from the overall philosophy of the project.
Are these extensions considered first class citizens by the Mercurial dev team and therefore part of the overall Mercurial approach to DVCS?
Yes; although we won't generally advocate their use to new users, they are very useful for advanced usage. I guess everybody in the dev team has extensions enabled (at least mq, patchbomb, and sometimes record).
Extensions accepted into hgext/ are reviewed prior to inclusion, and we generally require them to provide tests. But they are often owned by outside contributors and aren't updated by the dev team except for API changes within core hg.
Why are they implemented outside of the default features and disabled by default?
We generally think that hg should stay simple and that adding more commands might confuse users (e.g. if you have a simple workflow, you don't need to learn about mq). But if a command is deemed useful for most users, it can migrate from an extension into core (that was the case for bisect, and it is the case for the subrepo functionality).
Almost immediately after posting, I learnt about the following hg command:
hg help extensions
This contained some information that I don't think is available in the Mercurial help docs:
Extensions are not loaded by default for a variety of reasons: they can increase startup overhead; they may be meant for advanced usage only; they may provide potentially dangerous abilities (such as letting you destroy or modify history); they might not be ready for prime time; or they may alter some usual behaviors of stock Mercurial. It is thus up to the user to activate extensions as needed.
This helps answer part of my question.