Obtain the source of a MediaWiki page programmatically - mediawiki

Is it possible to obtain the source of a MediaWiki page programmatically? I'd like to write a function that does the following (in Java-like pseudocode):
public static String getWikiText(articleURL) {
    // return the source of the page as wiki markup
}

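Send an HTTP request to the wiki's index.php with action=raw, e.g. index.php?title=Page_title&action=raw; the response body is the page's wikitext. (You could use the API as well, but that is more complicated.)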
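A minimal sketch of the action=raw approach, written in Python with only the standard library. The function names and the idea of passing the wiki's index.php URL separately from the page title are assumptions made for illustration; the script path differs between wikis.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_raw_url(index_php_url, title):
    """Return the URL that serves the wikitext of `title` via action=raw."""
    query = urlencode({"title": title, "action": "raw"})
    return f"{index_php_url}?{query}"


def get_wiki_text(index_php_url, title):
    """Fetch the page source as wiki markup (performs a network request)."""
    with urlopen(build_raw_url(index_php_url, title)) as response:
        return response.read().decode("utf-8")
```

Calling get_wiki_text("https://wiki.example.org/index.php", "Main Page") would request the wikitext of that page from the hypothetical wiki at wiki.example.org. Some wikis may reject requests that lack a descriptive User-Agent header, in which case a urllib.request.Request with custom headers is needed instead of a bare URL.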

Related

Generating PDF from json and json-schema

We're currently using Alpaca Forms to generate forms which we use to edit data stored in json for our application. We're now looking for a way to, server side, generate PDF documents, using json-schema and the json.
Ideally this would be using C#, but frankly, we really could use any language, as long as we can put it behind a web service.
Basically, it would be Alpaca, but the output would be a PDF report, which could contain a cover page and other document friendly features. It would also use the "title" and "description" fields from the json-schema.
Any ideas other than trying to roll our own? I'd rather not use a low-level PDF library, since most don't seem to be that document-oriented.
wkhtmltopdf is my personal favourite way of doing this. You should be able to convert your JSON schema into HTML, and then render it via that.
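The "schema to HTML" step could look roughly like the Python sketch below. It assumes a flat json-schema whose "properties" each carry optional "title" and "description" fields, as mentioned in the question; nested objects, arrays and Alpaca's own options are not handled, and the function name is made up for this example.

```python
import html


def schema_to_html(schema, data):
    """Render one heading per schema property, using its title and description."""
    parts = [f"<h1>{html.escape(schema.get('title', 'Report'))}</h1>"]
    for key, prop in schema.get("properties", {}).items():
        label = html.escape(prop.get("title", key))
        value = html.escape(str(data.get(key, "")))
        parts.append(f"<h2>{label}</h2>")
        if "description" in prop:
            parts.append(f"<p><em>{html.escape(prop['description'])}</em></p>")
        parts.append(f"<p>{value}</p>")
    return "<html><body>" + "".join(parts) + "</body></html>"
```

The resulting string can be written to a file such as report.html and converted with wkhtmltopdf report.html report.pdf. A cover page or other document features would have to be added to the generated HTML.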
If you are using Node, you can use Puppeteer, which drives Chromium (the open-source browser behind Chrome), to launch a headless browser on the server and generate a PDF.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  await page.setContent('<h1>My PDF!</h1>');
  // page.pdf() resolves to the PDF bytes; pass {path: '...'} to also write a file
  const myPdf = await page.pdf();
  await browser.close();
})();
See https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagepdfoptions for more PDF options when generating from Puppeteer
I haven't tested the code above, but that's the basics of what I'm using to generate PDFs.
npm i puppeteer will be needed :)
It may be difficult (impossible?) to convert the Alpaca Forms JSON automatically into something that lays out pages the way you want. For example, does the Alpaca JSON provide enough information to determine page size, orientation, headers, footers, numbering location and style etc?
Docmosis may be useful to you (please note I work for Docmosis). You would layout your PDF in a word processor document and call Docmosis to merge it with your JSON data to create the PDF. The fields in your template drive the data extraction so you can put <> into your template and the title will be extracted from the JSON. In the case of Alpaca it looks like the title might be keyed under "schema" in the JSON, so you would use <> in a Docmosis template. Page layout and styles come from the template. The quickest way to test this as a candidate solution is to create a cloud account, upload a template and call the REST API with your JSON to create the document. If that does what you need you have the cloud or on-premise options.
I hope that helps.
If C# is the ideal choice for you, then to generate a PDF from JSON you need to:
Parse JSON to some C# collection;
Loop through collection and write data into PDF-table;
Save file as PDF.
For this purpose you will need some PDF-creating library.
Here is an example of how to implement it in C# using the PDFFlow library:
Prepare a class for storing the data:
public sealed class ReportData
{
    public string name { get; set; }
    public string surname { get; set; }
    public string address { get; set; }
}
Read the data from JSON into a collection:
string jsonFilePath = Path.Combine(SubfolderName, JSONFileName);
// Deserialize from the path built above
List<ReportData> jsonData = JsonSerializer.Deserialize<List<ReportData>>(File.ReadAllText(jsonFilePath));
Create PDF document and add table to it:
using Gehtsoft.PDFFlow.Builder;
var section = DocumentBuilder.New().AddSection();
var table = section.AddTable();
table
.AddColumnToTable("Name", 150)
.AddColumnToTable("Surname", 250)
.AddColumn("Address", 300);
Loop through the collection and add the data to the table:
foreach (var rowData in jsonData)
{
    var row = table.AddRow();
    row
        .AddCellToRow(rowData.name)
        .AddCellToRow(rowData.surname)
        .AddCell(rowData.address);
}
Save document to PDF file:
section
.ToDocument()
.Build("Result.PDF");
To compile this code, you need to add a reference to the PDFFlow library to your project.
Hope this helps.
Here is a fully working example of PDF contract generation, demonstrating how to work with JSON as a data source: contract example

Can I access a blob URL in an external page? [duplicate]

I'm trying to write an extension that caches some large media files used on my website, so that those files are stored locally when the extension is installed:
I pass the URLs via chrome.runtime.sendMessage to the extension (works)
fetch the media file via XMLHttpRequest in the background page (works)
store the file using FileSystem API (works)
get a File object and convert it to a URL using URL.createObjectURL (works)
return the URL to the webpage (error)
Unfortunately the URL can not be used on the webpage. I get the following error:
Not allowed to load local resource: blob:chrome-extension%3A//hlcoamoijhlmhjjxxxbl/e66a4ebc-1787-47e9-aaaa-f4236b710bda
What is the best way to pass a large file object from an extension to the webpage?
You're almost there.
After creating the blob:-URL on the background page and passing it to the content script, don't forward it to the web page. Instead, retrieve the blob using XMLHttpRequest, create a new blob:-URL, then send it to the web page.
// assuming that you've got a valid blob:chrome-extension-URL...
var blobchromeextensionurlhere = 'blob:chrome-extension....';
var x = new XMLHttpRequest();
x.open('GET', blobchromeextensionurlhere);
x.responseType = 'blob';
x.onload = function() {
    var url = URL.createObjectURL(x.response);
    // Example: blob:http%3A//example.com/17e9d36c-f5cd-48e6-b6b9-589890de1d23
    // Now pass url to the page, e.g. using postMessage
};
x.send();
If your current setup does not use content scripts, but e.g. the webRequest API to redirect requests to the cached result, then another option is to use data-URIs (a File or Blob can be converted to a data-URI using <FileReader>.readAsDataURL). Data-URIs cannot be read using XMLHttpRequest, but this will be possible in future versions of Chrome (http://crbug.com/308768).
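For readers unfamiliar with the format, this is what a data-URI looks like. The sketch below builds one in Python purely to illustrate the encoding that readAsDataURL produces in the browser; the function name is made up, and this is not extension code.

```python
import base64


def to_data_uri(payload, mime_type="application/octet-stream"):
    """Encode raw bytes as a base64 data-URI with the given MIME type."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Base64 makes the encoded data roughly a third larger than the original bytes, which is worth keeping in mind for large media files.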
Two possibilities I can think of.
1) Employ externally_connectable.
This method is described in the docs here.
The essence of it: you can declare that such and such webpage can pass messages to your extension, and then chrome.runtime.connect and chrome.runtime.sendMessage will be exposed to the webpage.
You can then probably make the webpage open a port to your extension and use it for data. Note that only the webpage can initiate the connection.
2) Use window.postMessage.
The method is mentioned in the docs (note the obsolete mention of window.webkitPostMessage) and described in more detail here.
You can, as far as I can tell from documentation of the method (from various places), pass any object with it, including blobs.

file not downloading updated file

I am creating a file using Java and generating a link for download.
Let's say I create the file myFile.xls and have a link as below.
Please download data from here.
This will result as below.
Please download data from here.
Every time I create a new file and click on the above link, I always see the earlier file that I downloaded the first time.
Is it happening because JSF caches the files?
Note: When I download the file manually, I always see the updated file.
However, using the link, I always see the first file.
Any idea why this is happening?
I think this is because of caching. If yes, how can I ignore this for this excel file only?
JSF isn't caching those resources at all. In the context of this question, JSF is merely an HTTP traffic controller and HTML code generator. It's the web browser that is caching them. You can control this by setting the proper response headers as listed in this answer: How to control web page caching, across all browsers?.
The simplest way would be to create a servlet filter which is mapped on the URL pattern matching those downloads, e.g. /excel/* (your JSF source code and actual URL don't match each other, so this is somewhat of a guess), and set the headers in the doFilter() method:
@WebFilter("/excel/*")
public class NoCacheFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP 1.1.
        response.setHeader("Pragma", "no-cache"); // HTTP 1.0.
        response.setDateHeader("Expires", 0); // Proxies.
        chain.doFilter(req, res);
    }

    // ...
}
Or, if you're serving those files via a servlet, then you could also just set those headers over there.
An alternative is to fool the web browser into thinking it's a brand new resource by appending a query string with a timestamp.
Please download data from here.
where #{now} is just a java.util.Date registered as a request scoped bean in faces-config.xml (if you're using the JSF utility library OmniFaces, it already has one built in). The web browser will treat any resource with a different query string as a unique and independent resource and therefore will not reuse the cached version of the resource on the same URI (if the query string is different).
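The query-string trick itself is independent of JSF. As an illustration only, here is a small Python sketch that appends a millisecond timestamp to a download URL; the parameter name "v" and the function name are arbitrary choices for this example.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, param="v", stamp=None):
    """Append (or replace) a timestamp query parameter on `url`."""
    if stamp is None:
        stamp = int(time.time() * 1000)  # current time in milliseconds
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier cache-buster value
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because the stamp changes on every call, each generated link has a different URL, so the browser has no cached entry to reuse.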

How can I call the Alfresco REST API using a JSON string?

Please provide me with some references on calling web scripts in Alfresco remotely using JSON.
Alfresco has some default web scripts, and I need to invoke these web scripts remotely from a different application.
There is no documentation that I know of at the present time that documents all web scripts that expect JSON to be posted along with a schema that defines the expected JSON. Honestly, we haven't done a good job identifying which out-of-the-box URLs are actually public. Some are there just for the Share application's use and could change without warning.
With that said, you can go to http://localhost:8080/alfresco/s/index and see a list of web scripts. And if you drill down into the web script (click on the web script's ID), you can see the source code for the JavaScript controller or, if the web script is implemented in Java, you can see the full class name that implements it. You can then inspect the source to see what it is expecting.
Another way to do it is to use Firebug or your browser's developer console to watch the network calls that go from your browser to the repository tier. Many of these calls include JSON being posted to repository tier web scripts.
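Once you know what JSON a web script expects, calling it remotely is an ordinary HTTP POST. The Python sketch below is one way to do it, assuming the web script accepts HTTP Basic authentication and replies with JSON; the URL in the usage note, the payload and the function names are hypothetical.

```python
import base64
import json
from urllib.request import Request, urlopen


def build_webscript_request(url, payload, username, password):
    """Prepare a POST request carrying `payload` as JSON, with HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def call_webscript(url, payload, username, password):
    """Send the request and parse the JSON response (performs a network request)."""
    with urlopen(build_webscript_request(url, payload, username, password)) as response:
        return json.load(response)
```

For example, call_webscript("http://localhost:8080/alfresco/s/example/mywebscript", {"myparam": "value"}, "admin", "secret") would post to a made-up web script at that path. The real URL and the expected JSON have to come from inspecting the web script as described above.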
Assuming you're referring to getting a webscript to respond with a json, there are a few steps.
1. Create a webscript, and possibly set json as the default format: in the webscript definition file, i.e. mywebscript.get.desc.xml, add a tag
<format default="json">argument</format>
2. Create a JSON controller too, i.e. mywebscript.get.json.js. This script can do two things:
a) get a json parameter (if you sent json in): if (json.has('myparam')) myVar = json.get('myparam');
b) provide some data to the model, i.e. model.docs = companyhome.children
3. Your webscript also needs to format this data for the json response, i.e. mywebscript.get.json.ftl would look something like this:
{ "docs": [
<#list docs as doc> {
"name": "${doc.name}",
"prop": "${doc.properties["mymodel:myprop"]}"
} <#if doc_has_next>,</#if>
</#list>
]
}

Preventing default views with RESTful api in CakePHP

I am following the tutorial in the CakePHP book that explains the basics of setting up a RESTful web service.
So far, I've updated my routes file to the following:
Router::mapResources('stores');
Router::parseExtensions('json');
I have also set up a blank layout in app/layouts/json and the appropriate json views. I am receiving my json output successfully when I navigate to controller/action.json
However, without the .json extension it attempts to load the regular view. I am looking to build a pure API with only json output; is there any way to prevent the regular view from being rendered?
You could force rendering as JSON if you can recognise a JSON request another way. For example, if the Accept HTTP header contains application/json, you could put this in your controller:
public function beforeFilter() {
    if ($this->request->accepts('application/json')) {
        $this->RequestHandler->renderAs($this, 'json');
    }
    parent::beforeFilter();
}
It's CakePHP 2.0 notation, but something similar probably exists for CakePHP 1.2 and 1.3.
You could also detect the request Content-Type instead, or as well, especially if Accept is not used.
What are you seeing at the moment? If you've used bake, Cake may have generated the views for you.
Just delete the views in /app/views/layout and /app/views/controllername
If you are trying to prevent the request from hitting the controller at all, then I'm not so sure; you could update your .htaccess file to only send requests ending in .json to the app, or something similar.
Here is what I did.
Since I knew I was building a JSON-only API, I added the following to my AppController.php:
public function beforeFilter()
{
    if (empty($this->request->params['ext']) || $this->request->params['ext'] != "json")
    {
        $this->render(FALSE, 'maintenance'); //no view, only layout
        $this->response->send();
        $this->_stop();
    }
}
and in my /app/Layouts/maintenance.ctp
echo __('Invalid extension');
This way, all requests without the json extension end up on the "maintenance" page, where you can put any info you want. I'm planning to put a link to the API docs there.