Proper way to create a new SQL table in a MediaWiki extension

I am creating a MediaWiki extension that needs its own SQL table. What's the proper way to create it in the extension, and do I need to create separate SQL code for each database I want to support, the way the OATHAuth extension does? That doesn't seem like the right approach.
(OATHAuth hooks onLoadExtensionSchemaUpdates)

For a schema change for an extension, the LoadExtensionSchemaUpdates hook page is the right place to look.
The LoadExtensionSchemaUpdates hook is fired when MediaWiki is updated (php maintenance/update.php) to allow extensions to update the database.
If you take a look at the ArticleFeedbackv5.sql file, that's basically what the SQL file should look like:
CREATE TABLE IF NOT EXISTS /*_*/aft_feedback (
  -- id is not an auto-increment, but a unique value generated in PHP
  aft_id binary(32) NOT NULL PRIMARY KEY,
  aft_page integer unsigned NOT NULL,
  ...
) /*$wgDBTableOptions*/;
-- sort indexes (central feedback page; lots of data - more detailed indexes for the most popular actions)
CREATE INDEX /*i*/relevance ON /*_*/aft_feedback (aft_relevance_score, aft_id, aft_has_comment, aft_oversight, aft_archive, aft_hide);
CREATE INDEX /*i*/age ON /*_*/aft_feedback (aft_timestamp, aft_id, aft_has_comment, aft_oversight, aft_archive, aft_hide);
CREATE INDEX /*i*/helpful ON /*_*/aft_feedback (aft_net_helpful, aft_id, aft_has_comment, aft_oversight, aft_archive, aft_hide);
-- page-specific
CREATE INDEX /*i*/relevance_page ON /*_*/aft_feedback (aft_page, aft_relevance_score);
CREATE INDEX /*i*/age_page ON /*_*/aft_feedback (aft_page, aft_timestamp);
CREATE INDEX /*i*/helpful_page ON /*_*/aft_feedback (aft_page, aft_net_helpful);
-- index for archive-job
CREATE INDEX /*i*/archive_queue ON /*_*/aft_feedback (aft_archive, aft_archive_date);
-- index for mycontribs data
CREATE INDEX /*i*/contribs ON /*_*/aft_feedback (aft_user, aft_timestamp);
CREATE INDEX /*i*/contribs_anon ON /*_*/aft_feedback (aft_user, aft_user_text, aft_timestamp);
Then you reference this file from the hook handler in your hook file:
public static function loadExtensionSchemaUpdates( $updater = null ) {
    // that's for adding a table
    $updater->addExtensionTable(
        'aft_feedback',
        dirname( __FILE__ ) . '/sql/ArticleFeedbackv5.sql' // that's the previous SQL file
    );
    ...
}
Don't forget to register the hook in the Hooks section of your extension.json file. The table will be created when you run update.php.
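For example, the registration in extension.json might look roughly like this (MyExtensionHooks is just a placeholder for whatever class holds your handler):
{
    "Hooks": {
        "LoadExtensionSchemaUpdates": "MyExtensionHooks::loadExtensionSchemaUpdates"
    }
}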
As for your question about creating a file for each database engine: you generally can't use the same file for all of them, because there can be differences in the SQL syntax each engine accepts. See what Tgr said below.
If you haven't yet, take a look at OATHAuth's hook file:
protected function execute() {
    switch ( $this->updater->getDB()->getType() ) {
        case 'mysql':
        case 'sqlite':
            $this->updater->addExtensionTable( 'oathauth_users', "{$this->base}/sql/mysql/tables.sql" );
            $this->updater->addExtensionUpdate( [ [ $this, 'schemaUpdateOldUsersFromInstaller' ] ] );
            $this->updater->dropExtensionField(
                'oathauth_users',
                'secret_reset',
                "{$this->base}/sql/mysql/patch-remove_reset.sql"
            );
            $this->updater->addExtensionField(
                'oathauth_users',
                'module',
                "{$this->base}/sql/mysql/patch-add_generic_fields.sql"
            );
            $this->updater->addExtensionUpdate(
                [ [ __CLASS__, 'schemaUpdateSubstituteForGenericFields' ] ]
            );
            $this->updater->dropExtensionField(
                'oathauth_users',
                'secret',
                "{$this->base}/sql/mysql/patch-remove_module_specific_fields.sql"
            );
            /*$this->updater->addExtensionUpdate(
                [ [ __CLASS__, 'schemaUpdateTOTPToMultipleKeys' ] ]
            );*/
            break;
        case 'oracle':
            $this->updater->addExtensionTable( 'oathauth_users', "{$this->base}/sql/oracle/tables.sql" );
            break;
        case 'postgres':
            $this->updater->addExtensionTable( 'oathauth_users', "{$this->base}/sql/postgres/tables.sql" );
            break;
    }
    return true;
}

Yes, you do need a patch file for each supported DB engine. In theory it could be the same patch file; in practice there are usually subtle syntax differences that necessitate separate files. There are some planned changes to replace the current schema updater system with abstract schema changes, so that extension developers do not have to deal with DB engine differences, but it will take a while for that to happen.
In practice, most extensions deal with this by only supporting MySQL and maybe Postgres.

Related

Couchbase unable to retrieve document: No index available (Code 4000)

I have a bucket in Couchbase called mybucket. When I select Documents and then choose my bucket, it has an option to retrieve the documents. When I choose the first one, the Couchbase web console shows me the content of that document:
{
    "type": "activity",
    "version": "1.0.0",
    ....
}
So, with that, I am sure that I can see some documents that have "type" = "activity" in my bucket. However, when I want to retrieve them using the Query editor and the following N1QL query:
select * from `mybucket` where `type` = "activity" limit 10;
I get the following response:
[
    {
        "code": 4000,
        "msg": "No index available on keyspace `default`:`mybucket` that matches your query. Use CREATE PRIMARY INDEX ON `default`:`mybucket` to create a primary index, or check that your expected index is online.",
        "query": "select * from `mybucket` where `type` = \"activity\" limit 10;"
    }
]
The bucket's "Retrieve documents" feature uses a DCP stream, whereas the Query Editor uses N1QL, which requires a secondary or primary index.
Option 1) CREATE INDEX ix1 ON mybucket(type);
OR
Option 2) CREATE PRIMARY INDEX ON mybucket;

Couchbase FTS Index update through REST API

From the latest Couchbase docs, I could see that an FTS index can be created/updated using the following:
PUT /api/index/{indexName}
Creates/updates an index definition.
I have created an index with the name fts-idx and it was created successfully.
But the update of the index seems to be failing with the REST API.
Response:
responseMessage : ,{"error":"rest_create_index: error creating index: fts-idx, err: manager_api: cannot create index because an index with the same name already exists: fts-idx"
Is there anything I have missed here?
I was able to replicate this issue, and I think I figured it out. It's not a bug, but it should really be documented better.
You need to pass in the index's UUID as part of the PUT (I think this is a concurrency check). You can get the index's current UUID via GET /api/index/fts-index (it's in indexDef->uuid).
And once you have that, make it part of your update PUT body:
{
    "name": "fts-index",
    "type": "fulltext-index",
    "params": {
        // ... etc ...
    },
    "sourceType": "couchbase",
    "sourceName": "travel-sample",
    "sourceUUID": "307a1042c094b7314697980312f4b66b",
    "sourceParams": {},
    "planParams": {
        // ... etc ...
    },
    "uuid": "89a125824b012319" // <--- right here
}
Once I did that, the update PUT went through just fine.
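To illustrate the flow, here is a rough PHP sketch using curl: it fetches the current definition, keeps the uuid in the definition, and PUTs it back. The host, port (8094 is the usual FTS port), and credentials are assumptions, not taken from the question; adjust them for your cluster.
<?php
// Rough sketch: update an FTS index definition by including its current uuid.
// Host/port/credentials below are assumptions.
function ftsRequest( $method, $url, $body = null ) {
    $ch = curl_init( $url );
    curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, $method );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_USERPWD, 'Administrator:password' );
    if ( $body !== null ) {
        curl_setopt( $ch, CURLOPT_POSTFIELDS, json_encode( $body ) );
        curl_setopt( $ch, CURLOPT_HTTPHEADER, [ 'Content-Type: application/json' ] );
    }
    $response = curl_exec( $ch );
    curl_close( $ch );
    return json_decode( $response, true );
}

$url = 'http://localhost:8094/api/index/fts-idx';

// 1. Read the current definition and grab it (the uuid is in indexDef->uuid).
$current = ftsRequest( 'GET', $url );
$definition = $current['indexDef'];

// 2. Modify the definition as needed, keeping the uuid so the PUT is
//    treated as an update instead of a create.
// $definition['params'] = ...;

// 3. PUT the updated definition back.
$result = ftsRequest( 'PUT', $url, $definition );
var_dump( $result );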

Parse complex JSON string contained in Hadoop

I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed by using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use a 'Loader'-based approach since I use HCatalog Loader to load data from Hive.)
But the problem is the structure of my data; each value of the Map structure contains the value part of a complex JSON document. For example:
1. My table looks like this (WARNING: the type of 'complex_data' is not STRING, but a MAP<STRING, STRING>!)
TABLE temp_table
(
    user_id BIGINT COMMENT 'user ID.',
    complex_data MAP <STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY(created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains (the values I want to get are marked with two *s, so basically #'d'#'f' from each PARSED_STRING(complex_data#'c')):
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried... (same approach for Elephant Bird)
REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
data = LOAD temp_table USING org.apache.hive.hcatalog.pig.HCatLoader();
values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows correct chararray data per each row
-- eg) ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...
attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSES AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- IT ALSO DOES NOT WORK
I guessed that "attempt1" failed because the value doesn't contain the full JSON. However, when I CONCAT as in "attempt2", I generate an additional \ mark (so each line starts with {\"key\": ). I'm not sure whether these additional marks break the parsing rule or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library with a jython UDF.
I know that I could solve it by using Java or other languages.
But I think that jython with jyson is the simplest answer to this issue.
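For anyone who lands here, a jython UDF along those lines might look roughly like the sketch below. This is not the poster's code: the jyson import path, the output schema, and the field names are assumptions, so adapt them to your data.
# json_udf.py -- Pig jython UDF (sketch). The jyson jar must be on the
# classpath (e.g. REGISTER '/path/to/jyson.jar';). The import path below
# is an assumption based on the jyson distribution.
from com.xhaus.jyson import JysonCodec as json

# outputSchema is provided by Pig's jython script engine, no import needed.
@outputSchema("values:bag{t:tuple(f:chararray)}")
def extract_d_f(json_str):
    """Parse the JSON array stored in complex_data#'c' and return a bag
    containing each element's d.f value."""
    if json_str is None:
        return None
    out = []
    for element in json.loads(json_str):
        d = element.get('d')
        if d is not None and 'f' in d:
            out.append((d['f'],))
    return out
You would then register the script in Pig with something like REGISTER 'json_udf.py' USING jython AS jsonudf; and call jsonudf.extract_d_f(complex_data#'c') in a FOREACH.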

Iterating through couchbase keys without a view

In Couchbase, I was wondering if there was a way - WITHOUT using a view - to iterate through database keys. The admin interface appears to do this, but maybe it's doing something special. What I'd like to do is make a call like this to retrieve an array of keys:
$result = $cb->get("KEY_ALBERT", "KEY_FRED");
having the result be an array [KEY_ALEX, KEY_BOB, KEY_DOGBERT]
Again, I don't want to use a view unless there's no alternative. It doesn't look like it's possible, but since the "view documents" feature in the admin appears to do this, I thought I'd double-check. I'm using the PHP interface, if that matters.
Based on your comments, the only way is to create a simple view that emits only the id as part of the key:
function(doc, meta) {
    emit( meta.id );
}
With this view you will be able to create queries with the various options you need: pagination, ranges, and so on.
Note: you mention the Administration Console; the console uses an "internal view" that is similar to what I have written above (but not optimized).
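As a rough illustration only (the design document/view names and the 1.x PHP SDK's view() call are assumptions, so check the SDK docs), querying such a view from PHP could look something like this:
<?php
// Sketch: query the id-emitting view with range/pagination options.
$cb = new Couchbase( "127.0.0.1:8091", "user", "pass", "mybucket" );

$result = $cb->view( "docs", "all_ids", array(
    "startkey" => "KEY_ALBERT",
    "endkey"   => "KEY_FRED",
    "limit"    => 100,
) );

$keys = array();
foreach ( $result["rows"] as $row ) {
    $keys[] = $row["id"];
}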
I don't know how the Couchbase admin works, but there are two options. The first option is to store your docs as a linked list: each doc has a property (key) that points to another doc.
docs = [
    {
        id: "doc_C",
        data: "somedata",
        prev: "doc_B",
        next: "doc_D"
    },
    {
        id: "doc_D",
        data: "somedata",
        prev: "doc_C",
        next: "doc_E"
    }
]
The second approach is to use sequential ids. You should have one doc that contains the sequence and increment it on each add. It would be something like this:
docs = [
    {
        id: "doc_1",
        data: "somedata"
    },
    {
        id: "doc_2",
        data: "somedata"
    }
    ...
]
In this way you can do "range requests". To do this you form an array of keys on the server side:
[doc_1, doc_2 .... doc_N]
and execute a multiget query. Here is also a link to another example.
The Couchbase PHP SDK does support multiget requests. For a list of keys it will return an array of documents.
getMulti(array $ids, array $cas, int $flags) : array
http://www.couchbase.com/autodocs/couchbase-php-client-1.1.5/classes/Couchbase.html#method_getMulti
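A minimal usage sketch (connection details and key names are placeholders, not from the original answer):
<?php
$cb = new Couchbase( "127.0.0.1:8091", "user", "pass", "mybucket" );

// Fetch several documents in one round trip; the result is an array
// keyed by document id.
$docs = $cb->getMulti( array( "doc_1", "doc_2", "doc_3" ) );

foreach ( $docs as $id => $value ) {
    echo "$id => $value\n";
}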

How does sqlmetal generate association names?

The association names sqlmetal generates have been the source of much frustration. Sometimes the association is simply the column name with "Id" taken off the end, sometimes it generates an association name based on the foreign key constraint name.
I simply cannot figure out what steps it uses to generate these names, and a recent schema change has drastically altered the association names once again, so I'd like to get a handle on this.
I have two tables which reference each other in a sort of chain. Something like this:
class In
{
    int Id;
    EntityRef<Out> Yields;     // YieldsId => FK_Out_Source
    EntitySet<Out> In_Source;  // FK_In_Source
}
class Out
{
    int Id;
    EntityRef<In> Yields;      // YieldsId => FK_In_Source
    EntitySet<In> Out_Source;  // FK_Out_Source
}
These were the classes prior to the schema change, where there was an extra FK field between In and Out tables. After deleting that field, sqlmetal now generates this:
class In
{
    int Id;
    EntityRef<Out> Yields;     // YieldsId => FK_Out_Source
    EntitySet<Out> Out;        // FK_In_Source
}
class Out
{
    int Id;
    EntityRef<In> In;          // YieldsId => FK_In_Source
    EntitySet<In> Out_Source;  // FK_Out_Source
}
The previous classes were perfectly symmetrical as they should be, but now the generated classes are completely asymmetrical. Can anyone explain this?
Since there seems to be no rhyme or reason to this, I created a command line tool that wraps sqlmetal and rewrites the association names. It's included in my open source Sasa utilities framework, and is called sasametal.