Puppeteer: Shoutbox/MessageBoard logging

I'm scraping a shoutbox that is limited to 10 messages; it updates asynchronously, and when the 11th item appears the first one is gone.
I set up Puppeteer, and it scrapes the structure correctly as an array, which I dump to MongoDB. The easiest way to automate this that I came up with is running the script with the watch command at a static interval.
The question is how to skip duplicate items in the log. Items don't need to be unique; I just don't want to dump the same item twice. There's also probably a better way to cycle this process.

You can use db.collection.distinct() in MongoDB to obtain the distinct messages from your database:
db.messages.distinct( 'message' );
Alternatively, you can use db.collection.createIndex() to create a unique index in your database so that the collection will not accept insertion or update of a document where the index key value matches an existing value in the index:
db.messages.createIndex( { 'message' : 1 }, { 'unique' : true } );
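In your Puppeteer script, you can use page.evaluate() in conjunction with the Set object to obtain distinct messages from the web page that you are scraping. Spread the Set back into an array before returning it, because page.evaluate() can only hand serializable values back to Node and a Set does not survive that:
const distinct_messages = await page.evaluate( () => [ ...new Set( Array.from( document.querySelectorAll( '.message' ), e => e.textContent ) ) ] );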
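For the polling case in the question, where consecutive scrapes overlap by up to nine messages, one possible approach is to remember a key for every message already written and pass on only the unseen ones. This is a minimal sketch, not a definitive implementation: `messageKey` and `filterNew` are hypothetical helpers, and the `author`, `time` and `text` fields are assumptions about what the scraped objects contain.

```javascript
// Hypothetical helper: builds a key that identifies one shoutbox post.
// Assumes each scraped item has author, time and text fields.
function messageKey(msg) {
  return JSON.stringify([msg.author, msg.time, msg.text]);
}

// Returns only the messages whose key is not yet in `seen`,
// and records their keys so later calls skip them.
function filterNew(seen, messages) {
  const fresh = [];
  for (const msg of messages) {
    const key = messageKey(msg);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(msg);
    }
  }
  return fresh;
}

// Two overlapping scrapes: the repeated post is dropped on the second pass,
// while a post with the same text but a different time is kept.
const seen = new Set();
const firstBatch = filterNew(seen, [
  { author: 'ann', time: '10:00', text: 'hi' },
  { author: 'bob', time: '10:01', text: 'hello' },
]);
const secondBatch = filterNew(seen, [
  { author: 'bob', time: '10:01', text: 'hello' },
  { author: 'ann', time: '10:02', text: 'hi' },
]);
```

If the scrape runs inside a setInterval callback in one long-lived Node process, `seen` persists between runs, so the external watch command is no longer needed and only the non-empty result of `filterNew` has to be inserted. The set is lost on restart, so it would need to be rebuilt from the most recent stored documents at startup.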

Related

How to query .tab pages from local wikidata instance using API

I am using Extension:JsonConfig on my Docker instance of Wikidata, which has some tables loaded onto it. The configuration for the extension in my LocalSettings.php is as follows:
$wgJsonConfigEnableLuaSupport = true;
$wgJsonConfigModels['Tabular.JsonConfig'] = 'JsonConfig\JCTabularContent';
$wgJsonConfigs['Tabular.JsonConfig'] = [
'namespace' => 486,
'nsName' => 'Data',
// page name must end in ".tab", and contain at least one symbol
'pattern' => '/.\.tab$/',
'license' => 'CC0-1.0',
'isLocal' => true,
'store' => true,
];
When I query the local instance using the following URL,
http://<DOMAIN_HERE>/w/api.php?action=query&list=search&srsearch=tab contentmodel:Tabular.JsonConfig &srnamespace=486&srlimit=10&format=json
I receive the following response:
{"batchcomplete":"","limits":{"search":10},"query":{"searchinfo":{"totalhits":0},"search":[]}}
which means that no matches were found, even though tables that match the query do exist.
The same query works against the Commons database:
https://commons.wikimedia.org/w/api.php?action=query&list=search&srsearch=tab%20contentmodel:Tabular.JsonConfig%20&srnamespace=486&srlimit=10&format=json
Can anyone point out what I am doing wrong here?

How to delete all in json server

I am using this json server in my Angular app, to create, fetch, and delete posts.
In the following method, I delete a post with a specified id:
deleteConsumer(post: Post): Observable<Post> {
const url = `${this.apiUrl}/${post.id}`;
return this.httpClient.delete<Post>(url);
}
I looked at the .delete code and searched for something like a .deleteall but could not find it. Is there really no such method that would delete everything?
If there really isn't, then my attempt at doing it myself is not paying off, because what I have done is not working:
deleteConsumers(): Observable<Post> {
let i: number = 0;
this.httpClient.get<Post[]>(this.apiUrl).forEach(
() => {
++i;
const url = `${this.apiUrl}/${i}`;
return this.httpClient.delete<Post>(url);
}
);
}
Obviously, this is wrong in terms of return type, but I cannot figure out what to do... How can I modify the first method, so it would go through all the json objects in my db.json file; meaning iterate through all the existing posts and delete them all?
I did encounter this when using json-server with Vue.js and I realized that there was no special function to delete all at once. I had to work around it.
So, for example in your case, I would first map the posts array to get a new array with only the post ids:
const postsIdsArray = this.posts.map((post) => post.id)
Then, assuming you already have a function to delete one post given the id, I would then execute the function for each of the ids in the array:
postsIdsArray.forEach((id) => this.deletePost(id))
Just combine the two lines in one JavaScript function (in this case I used Vue.js):
deleteAllPosts(){
const postsIdsArray = this.posts.map((post) => post.id)
postsIdsArray.forEach((id) => this.deletePost(id))
}
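Since json-server has no bulk-delete route, any "delete all" comes down to one list request followed by one DELETE per id. The sketch below shows that flow with plain fetch and Promise.all, which lets the caller wait until every delete has finished and uses the ids actually present instead of counting from 1. `deleteAll` is a hypothetical helper, the `fetchImpl` parameter exists only so the function can be exercised without a running server, and each record is assumed to have an `id` field.

```javascript
// Hypothetical helper: deletes every record of a json-server resource.
// apiUrl is the collection URL (for example the posts endpoint).
async function deleteAll(apiUrl, fetchImpl = fetch) {
  // 1. Fetch the current records so real ids are used.
  const response = await fetchImpl(apiUrl);
  const posts = await response.json();

  // 2. Issue one DELETE per id and wait for all of them to finish.
  await Promise.all(
    posts.map((post) => fetchImpl(`${apiUrl}/${post.id}`, { method: 'DELETE' }))
  );

  // Report which ids were deleted.
  return posts.map((post) => post.id);
}

// Stand-in for fetch that records each request and serves two fake posts.
const calls = [];
const fakeFetch = async (url, options = {}) => {
  calls.push(`${options.method || 'GET'} ${url}`);
  return { json: async () => [{ id: 3 }, { id: 7 }] };
};

const deletedIdsPromise = deleteAll('/posts', fakeFetch);
```

In the Angular service from the question, the same shape would be a GET piped through switchMap into a forkJoin of the individual delete observables. If the server struggles with many parallel writes, a for...of loop with await sends the deletes one at a time instead.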

Yii2 session saving multiple records on one request

I'm trying to use DbSession to track user activity. I have everything set up and running according to the Yii documentation, but when a user loads a page, multiple session records are saved in the database for a single request. The image below shows the data in the database. What is the cause of this, and is there a way to fix it?
In my config file I have this:
'session' => [
// this is the name of the session cookie used for login on the frontend
//'name' => 'advanced-frontend',
'class' => 'yii\web\DbSession',
'writeCallback' => function ($session) {
return [
'user_id' => \Yii::$app->user->id,
'ip' => \Yii::$app->clientip->get_ip_address(),
];
},
],
The first column (id) is the primary key and should be unique (it is declared that way in the migration). You have probably broken something in the table schema: you should not be able to save 3 records with the same ID. DbSession uses upsert() and relies on the uniqueness of the id column.
Make sure that the id column is the primary key, or at least has a UNIQUE constraint.

Checking 2 Columns for Duplicates

Currently, I have a system that holds two main pieces of data:
1) The email
2) The owner (user_id)
Every time someone uploads, I need to make sure that the record does not already exist in the system. The catch is that as I upload more and more, the time taken to check for duplicates grows steeply, as shown in the graph.
Question
1) How do I check for duplicates efficiently?
2) I indexed user_id and email. Should I use a FULLTEXT index instead? I won't be searching within the text, only matching it as a whole, so a regular index seems more logical?
3) I also read about creating a hash combining email and owner id, then indexing the hash. Would that make a big difference compared to the current method?
4) The last method I thought of was to create a composite primary key on email and user_id, but again I don't know how the performance would turn out.
Please advise.
Code
$exist = DB::table('contact')->where('email', $row['email'])->where('user_id', $user_id)->count();
if($exist < 1){
DB::table('contact')->insert(
['email' => $row['email'], 'name' => $row['name'], 'user_id' => $user_id]
);
}
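Use the Laravel Validator. The unique rule needs a table name, and here it should be scoped to the owner so the same email can still be stored for different users:
// at the top of the controller: use Illuminate\Validation\Rule;
public function store(Request $request)
{
$this->validate($request, [
'user_id' => 'required',
// reject the email only if this user already has a contact row with it
'email' => ['required', Rule::unique('contact')->where('user_id', $request->input('user_id'))],
]);
//some logic here
}
You should also add a unique constraint on the (user_id, email) pair in your database, so that duplicates are rejected even when validation is bypassed.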

Magento add a column to sales_flat_quote table and add data

I am coming from a previous environment where doing things like modifying queries and adding columns was just a matter of writing the SQL and executing it. However, now that I'm working in Magento I want to do things "the Magento way".
Scenario: we use PayPal Express, and before the controller redirects to PayPal, I would really like to add a field (if not there already) in sales_flat_quote, called paypal_status, and set the value to 1 (we'll call it "sent to PayPal").
On return I want to update that to either 2 or 3 (returned and pending transaction, or returned and captured transaction).
So there are two things I need to know how to do:
1) have something like $db->addColumn('paypal_status') where it will only add the column if it does not exist, and
2) write UPDATE sales_flat_quote SET paypal_status = 1 WHERE entity_id = {whatever}
This will be inside the ...Paypal_Express class.
Open the database and run this SQL: ALTER TABLE sales_flat_quote ADD paypal_status tinyint(1) NOT NULL DEFAULT 1;
Alternatively, you can write the following in the setup script of your custom module (located at CompanyName\MyModuleName\sql\companyname_modulename_setup). This file is executed only once, the first time the module is installed. At that time your custom column will not yet be in the database, so the script will create it.
$installer = $this;
$installer->startSetup();
$installer->run("ALTER TABLE `{$installer->getTable('sales/quote')}` ADD `paypal_status` tinyint(1) NOT NULL DEFAULT 1 COMMENT 'My Custom Paypal Status';");
$installer->endSetup();
Clear all caches.
To save data:
$myValue = 2;
Mage::getSingleton("checkout/cart")->getQuote()->setPaypalStatus($myValue)->save();
Mage::getSingleton("checkout/cart")->getQuote() will give you current quote.
In your sql file at CompanyName\MyModuleName\sql\companyname_modulename_setup copy the following code in order to create the column.
$installer = $this;
$installer->startSetup();
$installer->getConnection()
->addColumn($installer->getTable('sales/quote'),
'paypal_status',
array(
'type' => Varien_Db_Ddl_Table::TYPE_INTEGER,
'nullable' => true,
'comment' => 'Paypal Status',
)
);
$installer->endSetup();
Log out and log in again, and flush the Magento cache so that the column is added to the table.
The Express Checkout controller is in app/code/core/Mage/Paypal/Controller/Express/Abstract.php. If you want to set a field before the controller redirects to PayPal, you can modify the _initCheckout() method like this:
protected function _initCheckout()
{
$quote = $this->_getQuote();
if (!$quote->hasItems() || $quote->getHasError()) {
$this->getResponse()->setHeader('HTTP/1.1','403 Forbidden');
Mage::throwException(Mage::helper('paypal')->__('Unable to initialize Express Checkout.'));
}
$quote->setPaypalStatus(1); // Here is your change: the setter name maps to the paypal_status column
$this->_checkout = Mage::getSingleton($this->_checkoutType, array(
'config' => $this->_config,
'quote' => $quote,
));
return $this->_checkout;
}