Parsoid - parse wikitext locally - html

Is that even possible?
I am not sure, if I understand the project properly. I am trying to parse a big amount of wikitext into html using the Parsoid-JSAPI project.
Parsing works fine, but it is still calling the wikimedia API. I have run the server localy, but the library is still using the public internet API instead of my local server.
If i try to specify domain, calling Parsoid.parse("wikitext", {domain: 'localhost'}), it says No API URI available for prefix: null; domain: localhost
My config.yaml:
mwApis:
uri: 'http://localhost/w/api.php'
domain: 'localhost'

Parsing wikitext is possible, sure; that's what Parsoid does. Parsing Wikipedia content is not possible (without API calls) as 1) templates and other transcluded content needs to be resolved and 2) some of the markup is managed by extensions and Parsoid defers to them.
You can set up a local MediaWiki instance, set up all the required extensions, and import all the relevant pages (there is an "include templates" option when exporting content) but it's a lot of effort.

Related

Does code within NextJS API feature exposed to client side

It is well known that nextjs API routes provide a straightforward solution to build your API with Next.js. and that any file inside the folder pages/api is mapped to /api/* and will be treated as an API endpoint instead of a page.
I have just one doubt: is the code within the pages/api exposed to the world? I mean, can I build some logic there that has some key that must be hidden or maybe some MySQL connection?
Whether or not /api is in any way exposed to the world I do not know for sure, but according to Next documentation, "they are server-side only bundles."
In general though, for any key/sql connection that you want to run, I would put that into an .env.local file on your machine, a file that gets git ignored and never uploaded, and if you are hosting on Vercel, then use their environmental variables to store sensitive information.
You'd find environmental variables under:
{Your Account}/{Project}/Settings/Environmental Variables
p.s. Also from Next.js docs, I think you'd find this bit on getStaticProps useful.

Is there a way to upload html pages into AEM DAM

We have bunch of help web pages(Static). We are just uploading to siteadmin using a third party tool. Is there a way to manage them in DAM?
I remember that in older versions of AEM < 6.1 the uploaded static html pages also can be rendered as normal cq pages when accessed by the uri with the content paths. But from AEM 6.1 onwards because of the security reasons this feature has been disabled(which required some Felix configuration modification to re-enable it).
Security concerns:
1) There might be a chance of uploading a malicious files which can
damange the functionality of the website/system
2) Access these
uploaded files via content URL, might have a chance of files gets
executed in AEM (some sort of scripts execution) which can damage the
system/functionality. Etc.
Just to give you some idea how we can add the static html into AEM DAM
i have the below static html (simplestaticpage.html) which is uploaded into DAM path /content/dam/geometrixx-outdoors/simplebanner/ but when i access it via the content path url http://localhost:4502/content/dam/geometrixx-outdoors/simplebanner/simplestaticpage.html it will download as binary because of the default behaviour of the AEM DAM content Disposition
restrictions.
To enable the DAM static pages to render as normal cq:pages you need to remove the text/html mime types from Dam Safe Binary Filter(com.day.cq.dam.core.impl.servlet.DamContentDispositionFilter) as shown below.
After removal of this mime type from configuration, when i access the url http://localhost:4502/content/dam/geometrixx-outdoors/simplebanner/simplestaticpage.html the page renders fine.
Note: Also remember if this doesn't work you might require to add
Content Disposition Paths the in Apache Sling Content Disposition Filter
~ Hope it helps.
AEM Design importer does uploads html pages. You can create your own HTML pages independently and use it within your application.
https://docs.adobe.com/docs/en/aem/6-1/administer/personalization/campaigns/extending-the-design-importer-for-landingpages.html

HTML5 read files from path

Well, using HTML5 file handlining api we can read files with the collaboration of inpty type file. What about ready files with pat like
/images/myimage.png
etc??
Any kind of help is appreciated
Yes, if it is chrome! Play with the filesytem you will be able to do that.
The simple answer is; no. When your HTML/CSS/images/JavaScript is downloaded to the client's end you are breaking loose of the server.
Simplistic Flowchart
User requests URL in Browser (for example; www.mydomain.com/index.html)
Server reads and fetches the required file (www.mydomain.com/index.html)
index.html and it's linked resources will be downloaded to the user's browser
The user's Browser will render the HTML page
The user's Browser will only fetch the files that came with the request (images/someimages.png and stuff like scripts/jquery.js)
Explanation
The problem you are facing here is that when HTML is being rendered locally it has no link with the server anymore, thus requesting what /images/ contains file-wise is not logically comparable as it resides on the server.
Work-around
What you can do, but this will neglect the reason of the question, is to make a server-side script in JSP/PHP/ASP/etc. This script will then traverse through the directory you want. In PHP you can do this by using opendir() (http://php.net/opendir).
With a XHR/AJAX call you could request the PHP page to return the directory listing. Easiest way to do this is by using jQuery's $.post() function in combination with JSON.
Caution!
You need to keep in mind that if you use the work-around you will store a link to be visible for everyone to see what's in your online directory you request (for example http://www.mydomain.com/my_image_dirlist.php would then return a stringified list of everything (or less based on certain rules in the server-side script) inside http://www.mydomain.com/images/.
Notes
http://www.html5rocks.com/en/tutorials/file/filesystem/ (seems to work only in Chrome, but would still not be exactly what you want)
If you don't need all files from a folder, but only those files that have been downloaded to your browser's cache in the URL request; you could try to search online for accessing browser cache (downloaded files) of the currently loaded page. Or make something like a DOM-walker and CSS reader (regex?) to see where all file-relations are.

Why doesn't Wikipedia have extensions?

Look at a random wikipedia article like http://en.wikipedia.org/wiki/Impostor_syndrome, I see that there's no .html attached to the end of the address. In fact, if I do try to put a .html after it, Wikipedia tells me "Wikipedia does not have an article with this exact name." How come it doesn't need any file extensions?
More a superuser question?
There is no law saying that an html file has to end in .html or .htm and since wiki generates pages from a database there is really no file page there anyway (except in a cache).
Not having .htm or .php is moresensible - why do you care what technology they use when you ask for a url? It would be like having to put the operating system of the recipient at the end of their email address.
if you make a call to a website it probably looks like
www.example.com/siteA/index.html
this request just tells the webserver you want to see a resource that is called index.html in siteA.
the website that runs on this server has to determine what you want to see and how the data is loaded.
index.html could be a file in the siteA directory
or
it can be row with the key "index.html" in the siteA-table in your database.
so the part siteA/index.html is just a resource identifier. the grammar of this resource identifier is completely free and is determined per website.
url rewriting is also common to make url easier to read and remember.
for example there could be a rewrite rule to accomplish the following:
if the user enters something like
www.example.com/download/demo.zip
rewrite it so your website sees it like:
www.example.com/download.php?file=demo.zip
Wikipedia's servers map the url to the page you want. .html is just a naming convention that, today is mostly historical from the period of static pages when urls actually were names of files on the server. In fact, there may be no file at all, where the server queries the database and a web framework sends out the html on the fly.
Wikipedia is most likely using the Apache module mod_rewrite in order to not have to link paths directly to a file system path.
See: http://en.wikipedia.org/wiki/Rewrite_engine#Web_frameworks
However programming languages can also take control of the incoming URLs and return data depending on the structure of the link according to some set of rules, for example the Django web framework employees a URL dispatcher.
That's because Wikipedia uses MediaWiki's feature of URL shortening.
Actually when you search for a file it really loads a php file. Try searching for a word that doesn't exist, for example "Pazaz". The URL is http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=pazaz . Notice index.php in the URL.
To tell the truth it's not a MediaWiki feature, it's Apache. For further info http://www.mediawiki.org/wiki/Manual:Short_URL .
URL routing is your answer for example in ASP read below source from
The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL mapping rules within your applications. The routing system has two main purposes:
Map incoming URLs to the application and route them so that the right Controller and Action method executes to process them
Construct outgoing URLs that can be used to call back to Controllers/Actions (for example: form posts, links, and AJAX calls)
I would suggest that sites like this use some sort of Model View Controller framework similar to Ruby on Rails where the url 'directories' form a part of a request/url route...
In frameworks that are MVC based, the url 'directories' can dictate what View/Controller to utilise as well as what action should be taken with the data.
eg: shop.com/product/carrots
Where product is a view/controller and carrots is the data. The framework then analyses which action/route to take. Default could be viewing the product information and price of the carrot.

Restlet - serving up static content

Using Restlet I needed to serve some simple static content in the same context as my web service. I've configured the component with a Directory, but in testing, I've found it will only serve 'index.html', everything else results in a 404.
router.attach("/", new Directory(context, new Reference(baseRef, "./content"));
So... http://service and http://service/index.html both work,
but http://service/other.html gives me a 404
Can anyone shed some light on this? I want any file within the ./content directory to be available.
PS: I eventually plan to use a reverse proxy and serve all static content off another web server, but for now I need this to work as is.
Well, I figured out the issue. Actually Restlet appears to route requests based on prefix, but does not handle longest matching prefix correctly, it also seems to ignore file extensions.
So for example, if I had a resource attached to "/other"... and a directory on "/". And I request /other.html, what actually happens is I get the "/other" resource. (The extension is ignored?), and not the static file from the directory as I would expect.
If aynone knows why this is, or how to change it, I'd love to know. It's not a big deal, just for testing. I think we'll be putting apache or nginx in front of this in production anyway.
Routers in Restlet by default use the "Template.MODE_STARTS_WITH" matching mode. You could always set the Router default by doing router.setMatchingMode(Template.MODE_EQUALS). This will turn on strict matching by default for attach. You can choose to override individual routes with setMatchingMode.
Good documentation on Restlet Routing