How do I integrate Wikipedia data into my local MediaWiki?

I have set up a local MediaWiki instance, and it's running fine. Now I want the entire Wikipedia locally.
I found this dump: https://dumps.wikimedia.org/enwiki/latest/
Which files do I download?
Once downloaded, how do I push the data into MediaWiki?
I used https://github.com/rlewkowicz/docker-mediawiki-stack to set up MediaWiki on my AWS instance.
My end goal is to use the Wikipedia Search API from my AWS instance instead of the publicly available endpoints.

The full English Wikipedia dump is huge, and a default installation will probably buckle under it. If you want to try anyway, the file you typically want for article text is enwiki-latest-pages-articles.xml.bz2, and mwdumper is probably your best bet for importing the XML files, though it is not very well maintained. I don't know of any out-of-the-box solution for automatically pushing updates.
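A rough sketch of the usual import pipeline (assuming MySQL, a database named wikidb, and a user named wikiuser; all names here are placeholders for your own setup):

# Pipe the dump straight into MySQL using mwdumper's documented SQL output mode.
java -jar mwdumper.jar --format=sql:1.25 enwiki-latest-pages-articles.xml.bz2 | mysql -u wikiuser -p wikidb

# Alternative: MediaWiki's bundled maintenance script; simpler, but much slower on large dumps.
bunzip2 -k enwiki-latest-pages-articles.xml.bz2
php maintenance/importDump.php < enwiki-latest-pages-articles.xml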

Related

How to go about storing and accessing images inside a blog using the LAMP stack?

I want to create a technical blog using the LAMP stack (Laravel framework). I would like to know the best way of storing and accessing images inside blog content.
There is one way of doing this that I could think of:
(1) Storing the images as files and then accessing them via a path specified in the src attribute of the <img> tag, which could be part of the content fetched from the database.
The best approach is to store them via Laravel's filesystem abstraction. Laravel provides a powerful filesystem abstraction thanks to the wonderful Flysystem PHP package by Frank de Jonge. The Laravel Flysystem integration provides simple drivers for working with local filesystems, Amazon S3, and Rackspace Cloud Storage. Even better, it's amazingly simple to switch between these storage options because the API remains the same for each system.
That is, you can store the images locally on your LAMP server or use an external service. Both ways work; which is better depends on your needs.
Store the relative path in the database, e.g. /path/to/image.jpg.
Then you can serve the files easily through the Storage facade.
If you are using the local driver, this will typically just prepend /storage to the given path and return a relative URL to the file. If you are using the s3 or rackspace driver, the fully qualified remote URL will be returned:
use Illuminate\Support\Facades\Storage;

// Returns /storage/image.jpg with the local driver, or a full remote URL on s3/rackspace.
$url = Storage::url('image.jpg');

How do I host and edit a JSON file separate from my GitHub repository?

I'm building a Discord bot and am trying to host it with Heroku and GitHub. I intend to store user data in a JSON file, but I cannot figure out how to edit the JSON file while it lives in the repository. I am hoping there is a way to do it through Heroku, without using a separate website.
Note: I know how you would normally edit a JSON file, but because it is in a GitHub repository the normal way doesn't work.
Don't use a file as a database. Use a database as a database.
This is generally good advice, but especially important on Heroku where the ephemeral filesystem prevents changes to files from persisting long-term.
Heroku Postgres is a relatively easy way to get started. Its base plan is free.
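If it helps, provisioning it is only a couple of CLI commands (a sketch; plan names change over time, and hobby-dev was the free tier at the time of writing):

heroku addons:create heroku-postgresql:hobby-dev
heroku config:get DATABASE_URL   # the connection string your code reads at runtime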
I believe GitLab allows you to edit files in place, and it has a free tier like GitHub. As mentioned by Chris, this is generally not recommended, but it may work for your needs.
https://about.gitlab.com/

Heroku via GitHub: where are my JSON files updated?

This isn't exactly a question in need of help; I am just curious about which files get updated when I use Heroku via GitHub. Would it be the ones within my GitHub repository, or does Heroku save those files and update them somewhere else?
All I'm trying to accomplish is editing a JSON file so I can store an integer for each player (I'm using a worker, for a Discord bot). Also, yes, that seems like what I am trying to do. Anything that saves the information, doesn't require money, and isn't too complex.
EDIT:
This issue has been solved with the answer that Heroku simply cannot persist changes to JSON files. I have resolved it myself by moving my host onto a Raspberry Pi 3 Model B+. Thank you for all the answers.
When you use Heroku's GitHub Sync feature, a deployment will retrieve your code directly from GitHub.
Those files aren't saved anywhere else. A new deployment from master will take the code fresh from GitHub.
Heroku's filesystem is ephemeral. Any changes you save to the local filesystem will be lost when your dyno restarts, which happens frequently. If you scale your application to multiple dynos you'll also run into trouble since the ephemeral filesystems are dyno-local.
Your best bet is to use a proper client-server datastore, like PostgreSQL. Heroku provides its own Postgres service, which has a free tier. If Postgres isn't to your liking, feel free to choose something else.
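As a sketch of what that looks like from a Node.js bot in TypeScript (assuming the pg package and Heroku's DATABASE_URL config var; the scores table and its columns are made-up names for illustration):

import { Pool } from "pg";

// Heroku injects DATABASE_URL; its Postgres service generally requires SSL.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
});

// Upsert one integer per player.
// Assumes: CREATE TABLE scores (player_id text PRIMARY KEY, score integer)
export async function setScore(playerId: string, score: number): Promise<void> {
  await pool.query(
    "INSERT INTO scores (player_id, score) VALUES ($1, $2) " +
      "ON CONFLICT (player_id) DO UPDATE SET score = EXCLUDED.score",
    [playerId, score]
  );
}

export async function getScore(playerId: string): Promise<number | null> {
  const res = await pool.query("SELECT score FROM scores WHERE player_id = $1", [playerId]);
  return res.rows.length > 0 ? res.rows[0].score : null;
}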

indexedDB in a Chrome App

I'm building a Chrome App which requires a persistent, local database, which in this case could be either indexedDB or basic object storage. I have several questions before I begin developing the app:
Is it possible for indexedDB data to persist after uninstallation of the Chrome App or the Chrome browser?
If the indexedDB file/data persists, can I locate and view it?
If I can locate but can't view it, is it possible to change the location of the indexedDB file?
Can I store the indexedDB in a file located on the desktop or in any other custom location?
If I had these requirements, I see a few options you might pursue:
1. Write a simple database backed by the FileSystem API, and periodically lock the database and back up that file. This would be pretty cool because I don't know of anyone who has implemented a simple FileSystem-API-backed database, but I could see it being useful for other purposes.
2. Mirror every edit to the database to a copy stored on your backup server, and write functions that can import snapshots from your backup.
3. Simply write functions to export from your indexedDB to some format for backup, and to import from the backup (see the sketch below).
All options seem quite time consuming. It would be cool if, when you create an indexedDB, you could specify an HTML FileSystem API entry file to back it; that way you wouldn't have to do 1 or 2.
It seems like quite an oversight that indexedDB is so difficult to back up.
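A minimal sketch of option 3's export half in TypeScript (the database name "app-db" and object store "items" are made up, and the import side is the mirror image):

// Open (or create) the database; names here are illustrative.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("app-db", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("items");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Walk the store with a cursor and serialize every record to JSON.
async function exportStore(): Promise<string> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const records: Array<{ key: IDBValidKey; value: unknown }> = [];
    const cursorReq = db.transaction("items", "readonly").objectStore("items").openCursor();
    cursorReq.onsuccess = () => {
      const cursor = cursorReq.result;
      if (cursor) {
        records.push({ key: cursor.key, value: cursor.value });
        cursor.continue();
      } else {
        resolve(JSON.stringify(records)); // hand this to whatever backup channel you use
      }
    };
    cursorReq.onerror = () => reject(cursorReq.error);
  });
}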
I am writing a basic browser-only application (no back-end server code at this time), so I also have storage requirements, but I am not doing backup. I am looking at PouchDB as a solution: http://pouchdb.com/
Everything is looking good so far. They also mention that it would work well with Google Apps.
http://pouchdb.com/faq.html#native_support
The nice thing is you could sync your PouchDB data with a server-side CouchDB instance.
http://pouchdb.com/api.html#replication
http://pouchdb.com/api.html#sync
If you want to keep the application local to the browser with no server support, you could back up the entire database using a batch fetch.
http://pouchdb.com/api.html#batch_fetch
I would run the result through gzip before you put it on the filesystem.
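Both the sync and the batch-fetch backup are short calls (a sketch; the database name and CouchDB URL are made up):

import PouchDB from "pouchdb";

const db = new PouchDB("my-app-db");

// Continuous two-way replication with a server-side CouchDB; retries on network loss.
db.sync("http://localhost:5984/my-app-db", { live: true, retry: true });

// Browser-only alternative: fetch every document, contents included, for a backup.
async function backup(): Promise<string> {
  const result = await db.allDocs({ include_docs: true });
  return JSON.stringify(result.rows.map((row) => row.doc)); // gzip before writing to the filesystem
}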
I am currently attempting this very same thing. I am using the Chrome Sync File System API (http://goo.gl/5q8Z9M), but running into some instances where my file (or its contents) is deleted. With this approach I am writing out a JSON object. Hope this helps.

Dealing with masses of images, some small and some large, in a Spring/Java application using MySQL

I was wondering what the best pattern was to handle the management of images these days when using spring/java and mysql.
I have several options:
1. Some of the images are just small avatars for the users. Is it fine to put these directly into MySQL? Or use the file system?
2. For the larger images, is the file system pretty much the only option, and then use MySQL to store the location on the file system?
3. Where is a good spot to put them on a Linux server? /var/files/images?
4. Since the files are hidden from the war deployment directory, what is the best way to stream them? Use some kind of file output stream as the response body for an HTTP request?
5. Also, do I have to develop all of the file management stuff myself, like cleaning up unused files and the like?
What about image security? Some images should not be accessible to everyone. I think I'd need a separate URL, with Spring Security checking the current user, for those.
I'd appreciate advice on all of these questions. Thanks.
You could use MySQL, and that would have the advantage of centralization and easy cleanup, but IMHO it's a waste of the database's resources if you plan to scale.
For data like images where everything is public, consider something like Amazon S3, which lets you serve images directly from S3's web servers. If you plan to host everything yourself, just serve them from a directory. Just remember to turn directory listings off :)
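If Apache is fronting that directory, disabling listings is a one-directive change (a sketch, assuming the path suggested above):

# Serve files from the image directory but never show an index of it.
<Directory "/var/files/images">
    Options -Indexes
    Require all granted
</Directory>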