Pulling all posts in vBulletin - mysql

I want to create a script that counts the words in each thread of a vBulletin forum. Basically, I want to pull that data from the MySQL database and play with it. I don't have experience working with vBulletin, so I'm thinking of two ways:
Does vBulletin provide an API to handle database stuff (one that lets me grab all the thread content and URLs)? I'm almost sure there is one; is there a link showing where to start?
Is there a way to do this without the interference of vBulletin? That is, grab the data manually from the MySQL database and process it the typical way.
I'd prefer the second method if the vBulletin learning curve is too steep. Thanks for the advice.

Is this for vBulletin 3 or 4?
I mostly work with vB3, and the quickest way to include all of the vBulletin resources is to create a new php file in your forums directory with the following code.
<?php
error_reporting(E_ALL & ~E_NOTICE & ~8192);
require_once('./global.php');
var_dump($vbulletin);
That $vbulletin variable is the registry object that contains just about everything you're ever going to need, including the database connection and its read and write functions, userinfo data, cleaned _POST and _REQUEST values, and a lot more (phrases, session data, caches, etc).
There are 4 database functions you'll use the most.
$vbulletin->db->query_read() // fetch more than one row
$vbulletin->db->fetch_array() // converts the query_read returned data into an array
$vbulletin->db->query_first() // fetches a single row
$vbulletin->db->query_write() // update, insert or delete rows, tables, etc.
query_read is what you would use when you expect more than one result that you intend to loop through. For example, if you wanted to count all the words in a single thread, you would need to query the post table with the threadid, loop through each post in that thread and count all the words in the message.
Note: "TABLE_PREFIX" is a constant set in config.php. It's important to always prepend the table name with that constant in case other forums decide to prefix their tables.
<?php
error_reporting(E_ALL & ~E_NOTICE & ~8192);
require_once('./global.php');
$threadid = 1;
// fetch all posts from a specific thread
$posts = $vbulletin->db->query_read("
SELECT pagetext
FROM " . TABLE_PREFIX . "post
WHERE threadid = $threadid
");
/**
* Loop through each post.
*
* Here we use the "fetch_array" method to convert the MySQL data into
* a useable array. 99% of the time you'll use "fetch_array" when you
* use "query_read".
*
* $post will contain the post data for each loop. In our case, we only
* have "pagetext" available to use since that's all we told MySQL we needed
* in our query. You can do SELECT * if you want ALL the table data.
*/
// Start the running total at zero so the first addition is well defined.
$totalWords = 0;
while ($post = $vbulletin->db->fetch_array($posts)) {
$totalWords += str_word_count($post['pagetext']);
}
/**
* Print the total number of words this thread contains.
*
* The "vb_number_format" is basically wrapper of php's "number_format", but
* with some vBulletin extras. You can visit the /includes/functions.php file
* for all the functions available to you. Many of them are just convenient
* functions so you don't have to repeat a lot of code. Like vBDate(), or
* is_valid_email().
*/
echo sprintf("Thread ID %d contains %s words", $threadid, vb_number_format($totalWords));
The query_first function is what you would use when you need to fetch a single row from the database. No looping is required. If, for instance, you wanted to fetch a single user's information from the database (which you don't need a query for, but we'll do it as an example), you would use something like this.
<?php
error_reporting(E_ALL & ~E_NOTICE & ~8192);
require_once('./global.php');
$userid = 1;
$user = $vbulletin->db->query_first("
SELECT *
FROM " . TABLE_PREFIX . "user
WHERE userid = $userid
");
echo sprintf("Hello, %s. Your email address is %s and you have %s posts",
$user['username'],
$user['email'],
vb_number_format($user['posts'])
);
Lastly, if you want to update something in the database, you would use "query_write". This function is pretty straightforward: it takes any MySQL UPDATE, INSERT or DELETE query. For example, if you needed to update a user's Yahoo ID, you would do this.
<?php
error_reporting(E_ALL & ~E_NOTICE & ~8192);
require_once('./global.php');
$userid = 1;
$update = $vbulletin->db->query_write("
UPDATE " . TABLE_PREFIX . "user
SET yahoo = 'myYahooID@yahoo.com'
WHERE userid = $userid
");
if ($update) {
$userinfo = fetch_userinfo($userid);
echo sprintf("Updated %s yahoo ID to %s", $userinfo['username'], $userinfo['yahoo']);
}
Hopefully this will help you get started. I would recommend using vBulletin 3 for a little while until you're comfortable with it. I think it'll be easier on you. Most of this will translate to vBulletin 4 with some tweaking, but that code base is not as friendly to newcomers.
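For the "without vBulletin" route from the question, the same count can be done for every thread in one pass directly against the database. The following is only a rough Python sketch under stated assumptions: it uses an in-memory SQLite table as a stand-in for MySQL, it assumes the post table and the threadid and pagetext columns shown above, and it splits on whitespace, which will not match PHP's str_word_count exactly. The helper names are made up for illustration.

```python
import sqlite3
from collections import Counter


def words_per_thread(conn, table_prefix=""):
    """Return {threadid: total words} summed over every post in the post table.

    table_prefix plays the role of vBulletin's TABLE_PREFIX constant; it is
    assumed to be a trusted configuration value, never user input.
    """
    totals = Counter()
    cursor = conn.execute(f"SELECT threadid, pagetext FROM {table_prefix}post")
    for threadid, pagetext in cursor:
        # Whitespace splitting is a simple approximation of a word count.
        totals[threadid] += len((pagetext or "").split())
    return dict(totals)


def make_demo_db():
    """Build a tiny stand-in for the vBulletin post table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER, pagetext TEXT)"
    )
    conn.executemany(
        "INSERT INTO post (threadid, pagetext) VALUES (?, ?)",
        [(1, "Hello world"), (1, "A second reply here"), (2, "Only one post")],
    )
    return conn


if __name__ == "__main__":
    print(words_per_thread(make_demo_db()))
```

Against a real forum you would swap the SQLite connection for a MySQL one and pass the prefix from config.php.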

Related

How to get data from big table row by row

I need to get all the data from a MySQL table. What I've tried so far is:
my $query = $connection->prepare("select * from table");
$query->execute();
while (my @row=$query->fetchrow_array)
{
print format_row(@row);
}
but there is always a but...
The table has about 600M rows, and apparently all results from the query are stored in memory after the execute() command. There is not enough memory for this :(
My question is:
Is there a way to use Perl DBI to get data from the table row by row? Something like this:
my $query = $connection->prepare("select * from table");
while (my @row=$query->fetchrow_array)
{
#....do stuff
}
btw, pagination is too slow :/
"apparently all results from query is store in memory after execute() command"
That is the default behaviour of the mysql client library. You can disable it by using the mysql_use_result attribute on the database or statement handle.
Note that the read lock you'll have on the table will be held much longer while all the rows are being streamed to the client code. If that might be a concern you may want to use SQL_BUFFER_RESULT.
The fetchall_arrayref method takes two parameters, the second of which lets you limit the number of rows fetched from the table at once.
The following code reads 1,000 rows from the table at a time and processes each one:
my $sth = $dbh->prepare("SELECT * FROM table");
$sth->execute;
while ( my $chunk = $sth->fetchall_arrayref( undef, 1000 ) ) {
last unless @$chunk; # Empty array returned at end of table
for my $row ( @$chunk ) {
print format_row(@$row);
}
}
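The same batching idea exists in other database interfaces. As a loose analogue only (not Perl DBI), here is a Python sketch that uses the DB-API fetchmany call on an in-memory SQLite table; the table name, the data and the helper names are invented for the example.

```python
import sqlite3


def iter_in_chunks(cursor, chunk_size=1000):
    """Yield lists of at most chunk_size rows until the cursor is exhausted."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:  # fetchmany returns an empty list at the end of the result
            break
        yield chunk


def make_demo_cursor(n_rows):
    """Create a hypothetical table with n_rows rows and return an open cursor on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO items (id, payload) VALUES (?, ?)",
        [(i, f"row {i}") for i in range(1, n_rows + 1)],
    )
    return conn.execute("SELECT id, payload FROM items ORDER BY id")


if __name__ == "__main__":
    for chunk in iter_in_chunks(make_demo_cursor(2500), chunk_size=1000):
        print(len(chunk))
```

Whether this actually bounds memory depends on the driver: if the client library buffers the whole result set on execute, as described above for the mysql client, chunked fetching alone does not help.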
When working with huge tables, I build data packages with dynamically built SQL statements like
$sql = "SELECT * FROM table WHERE id>" . $lastid . " ORDER BY id LIMIT " . $packagesize;
The application fills in $lastid dynamically, using the last id of the package it has just processed.
If the table has an id field with an index built on it, the performance is quite good.
You can also limit database load by pausing briefly between queries.
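To make the package approach concrete, here is a minimal Python sketch, assuming an integer primary key named id whose values are all greater than zero. It uses an in-memory SQLite table rather than MySQL, and the table, column and function names are hypothetical.

```python
import sqlite3


def iter_rows_in_packages(conn, package_size=1000):
    """Yield rows in id order, running one bounded query per package."""
    last_id = 0  # assumes every id is greater than zero
    while True:
        package = conn.execute(
            "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, package_size),
        ).fetchall()
        if not package:
            return
        yield from package
        # Resume after the highest id seen so far instead of using OFFSET.
        last_id = package[-1][0]


def make_demo_db(n_rows):
    """Create a hypothetical items table with ids 1..n_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO items (id, payload) VALUES (?, ?)",
        [(i, f"row {i}") for i in range(1, n_rows + 1)],
    )
    return conn


if __name__ == "__main__":
    for row in iter_rows_in_packages(make_demo_db(25), package_size=10):
        print(row)
```

Because each query seeks straight to id > last_id through the index, its cost does not grow with how far into the table you are, which is the usual reason this is faster than LIMIT with an increasing OFFSET.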

How should I select the next item to process when running processors in parallel?

I'm asking this question without database specifics because it feels like the answer may lie in a common design pattern, and I don't necessarily need a system specific solution ( my specific system setup is referenced at the end of the question ).
I've got a database of companies containing an id, a url, and a processing field, to indicate whether or not that company is currently being processed by one of my crawlers. I run many crawlers in parallel. Each one needs to select a company to process and set that company as processing before it starts so that each company is only being processed by a single crawler at any given time.
How should I structure my system to keep track of what companies are being processed?
The challenge here is that I cannot search my database for a company that is not being processed and then update that company to set it as processed, because another crawler may have chosen it in the meantime. This seems like something that must be a common problem when processing data in parallel so I'm looking for a theoretical best practice.
I used to use MySQL for this and used the following code to maintain consistency between my processors. I'm redesigning the system, however, and now ElasticSearch is going to be my main database and search server. The MySQL solution below always felt like a hack to me and not a proper solution to this parallelization problem.
public function select_next()
{
// set a temp variable that allows us to retrieve id of the row that is updated during next query
$sql = 'SET @update_id := 0';
$Result = $this->Mysqli->query( $sql );
if( ! $Result )
die( "\n\n " . $this->Mysqli->error . "\n" . $sql );
// selects next company to be crawled, marks as crawling in the db
$sql = "UPDATE companies
SET
crawling = 1,
id = ( SELECT @update_id := id )
WHERE crawling = 0
ORDER BY last_crawled ASC, id ASC
LIMIT 1";
$Result = $this->Mysqli->query( $sql );
if( ! $Result )
die( "\n\n " . $this->Mysqli->error . "\n" . $sql );
// this query returned at least one result and there are companies to be crawled
if( $this->Mysqli->affected_rows > 0 )
{
// gets the id of the row that was just updated in the previous query
$sql = 'SELECT @update_id AS id';
$Result = $this->Mysqli->query( $sql );
if( ! $Result )
die( "\n\n " . $this->Mysqli->error . "\n" . $sql );
// set company id
$this->id = $Result->fetch_object()->id;
}
}
One approach that is often used for such problems is sharding. You can define a deterministic function that assigns each row in the database to a crawler. In your case, such a function can simply be the company id modulo the number of crawlers. Each crawler can sequentially process the companies that belong to its shard, which guarantees that no company is ever processed by two crawlers simultaneously.
Such an approach is used, for example, by the Reduce part of MapReduce.
An advantage is that no transactions or locking are required; these are hard to implement and are often a bottleneck, especially in a distributed environment. A disadvantage is that the work may not be partitioned equally between crawlers, in which case some crawlers sit idle while others are still processing.
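As an illustration of the modulo assignment, here is a small Python sketch. It assumes integer company ids and a fixed, known number of crawlers; the function names are invented for the example.

```python
def shard_for(company_id, num_crawlers):
    """Deterministically map a company id to a crawler index in [0, num_crawlers)."""
    return company_id % num_crawlers


def companies_for_crawler(company_ids, crawler_index, num_crawlers):
    """Return the ids this crawler owns; no other crawler will ever pick them."""
    return [cid for cid in company_ids if shard_for(cid, num_crawlers) == crawler_index]


if __name__ == "__main__":
    ids = list(range(1, 11))
    for crawler in range(3):
        print(crawler, companies_for_crawler(ids, crawler, 3))
```

Note that changing the number of crawlers changes most assignments, so crawlers should finish or pause their current work before the count is changed.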

odd sql error, variable not being recognized correctly

I'm currently in hour two of this issue. I can't explain it, so I will simply show what is going on. I don't know if this matters at all, but I am using the LinkedIn API to retrieve a user's unique LinkedIn ID.
In English, what I'm doing:
User Signs in with LinkedIn
I read-in user's LinkedIn ID (returned from the API)
If ID exists in database, say "hello", if not, show them a form to register
The issue I am having:
The following line works and properly returns the 1 user I have in the database with a linkedIn ID of OtOgMaJ2NM
$company_data = "SELECT * FROM s_user WHERE `LI_id` = 'OtOgMaJ2NM'";
The following query returns no results - using the same database with the same record in the table s_user:
$linkedIn_id = "<?js= id ?>";
echo $linkedIn_id;
The code above outputs OtOgMaJ2NM with no trailing spaces.
So far so good ... except when I run the query this time using the variable, no records are returned!
$company_data = "SELECT * FROM s_user WHERE `LI_id` = '$linkedIn_id'";
Further notes:
When I echo $company_data, the query displayed with the variable looks identical to the plain-text version of the query.
Anyone have ANY ideas?
Thanks,
Evan
I can only assume that when echoing variables it strips the tags, so when you're using it with the query you're actually saying:
$company_data = "SELECT * FROM s_user WHERE `LI_id` = '<?js= OtOgMaJ2NM ?>'";
I could be wrong, but have you tried stripping the tags from the variable?
If you send the variable between the "", the MySQL engine will search for $linkedIn_id literally and not for its content.
Seems you are using php, but I'm not sure about the right syntax. Take a look in the docs.

MySQL Security Check

Evening all,
Before I make my site live, I obviously want to ensure it's secure (or as secure as possible).
I have a search form and an opportunity for a user to upload an entry to a database, and that's about it, I think.
So I just want to check what I should be doing to protect things. Firstly, my database is accessed by a dedicated user account (not admin or root), so I think I've got that part locked down.
Secondly, all my search queries have this sort of format:
$result = mysql_query(
"SELECT *
FROM table
WHERE fieldname = '" . mysql_real_escape_string($country) . "'
AND county = '" . mysql_real_escape_string($county) . "'
ORDER BY unique_id DESC");
Finally, for the $_POST fields from my submission form, I treat the variables with this BEFORE they are inserted into the database:
$variable = mysql_real_escape_string($variable);
$result = mysql_query(
"INSERT INTO table (columnone)
VALUES ($variable)");
Could anyone let me know what else I should be considering, or whether this is acceptable enough?
Thanks in advance, as always,
Dan
The code looks fine, though you should look into using PDO prepared statements if at all possible.
Beyond that, make sure that whatever account your PHP code is using to connect to MySQL has the absolute minimum in the way of permissions. Most web-facing scripts do NOT need alter/drop/create type privileges. Most can get away with only update/insert/select/delete, and maybe even less. This way, even if something goes horribly wrong with your code-level security, a malicious user can't send you a '; drop table students -- type query (re: bobby-tables.com)
Everything you show looks fine in terms of protection against SQL injection, except for
$variable = mysql_real_escape_string($variable);
$result = mysql_query(
"INSERT INTO table (columnone)
VALUES ($variable)");
this desperately needs quotes around $variable - or as @Dan points out, a check for whether it's a number - to be secure. mysql_real_escape_string sanitizes string data only; that is, it neutralizes any attempt to break out of a string delimited by single or double quotes. It provides no protection if the inserted value is not surrounded by quotes.
Have you considered using something like PDO with bound parameters in your SQL?
http://php.net/manual/en/pdostatement.bindparam.php
My understanding is that this is considerably more secure than using mysql_real_escape_string.
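To show what bound parameters buy you, here is a rough sketch in Python with an in-memory SQLite database standing in for PDO and MySQL. The entries table and the helper names are hypothetical; the point is only that the value travels separately from the SQL text, so no quoting or escaping is needed.

```python
import sqlite3


def make_demo_db():
    """Create a hypothetical table similar to the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (unique_id INTEGER PRIMARY KEY, columnone TEXT)")
    return conn


def insert_entry(conn, value):
    """Insert value through a placeholder; it is never spliced into the SQL string."""
    conn.execute("INSERT INTO entries (columnone) VALUES (?)", (value,))


def all_entries(conn):
    """Return every stored value in insertion order."""
    return [row[0] for row in conn.execute("SELECT columnone FROM entries ORDER BY unique_id")]


if __name__ == "__main__":
    conn = make_demo_db()
    # A hostile-looking input is stored as plain data, not executed as SQL.
    insert_entry(conn, "x'); DROP TABLE entries; --")
    print(all_entries(conn))
```

The PDO equivalent is a prepare call with placeholders followed by execute with the values, which removes the missing-quotes mistake discussed above.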

Using enum in drupal

I have a MySQL table with the columns id, name, gender, age, religion( enum('HIN','CHR','MUS') and category(enum('IND','AMR','SPA'), where the last two are enum types. My code in Drupal was:
$sql="SELECT * FROM {emp} WHERE age=".$age." and religion=".$rel." and category=".$categ;
$result=db_query_range($sql,0,10);
while($data=db_fetch_object($result))
{
print $data->id." ".$data->name."<br>";
}
I get no result and no error. I've tried a different query for each field, and all of them work except the ones using an enum,
for example: $sql='SELECT * FROM {emp} WHERE religion="'.$rel.'"';
Is there any problem with using the enum datatype in Drupal?
Enum is not something that I believe Drupal can create with the Schema API, which is what you want to use in most cases for modules and the like. Also, you are missing a closing ) in your reference to it, but I'm sure you did it right when you made the table.
Enum is only a constraint that the database applies when inserting values. If you try to insert an invalid value, an empty string is inserted instead. So it won't have any effect on Drupal querying to get data. It also won't have any effect when Drupal inserts values, other than converting invalid values to empty strings. You might want to check your data to see if it is as expected. You might just get no results because your query doesn't match anything.
Another thing is the way you construct your queries is a big NO NO, as it's very insecure. What you should do is this:
db_query("SELECT ... '%s' ...", $var);
Drupal will replace %s with your variable and make sure there is no SQL injection or other nasty things. %s indicates the variable is a string; use %d for ints, and there are a few others I can't remember just now. You can have several placeholders like this, and they will be inserted in order, much like with the t function.
Seconding googletorp's advice on using parameterized queries (+1). That would not only be more secure, but also make it easier to spot the errors ;)
Your original query misses some quotes around your (String) comparison values. The following should work (Note the added single quotes):
$sql = "SELECT * FROM {emp} WHERE age='" . $age . "' and religion='" . $rel . "' and category='" . $categ . "'";
The right way to do it would be something like this:
$sql = "SELECT * FROM {emp} WHERE age='%s' and religion='%s' and category='%s'";
$args = array($age, $rel, $categ);
$result = db_query_range($sql, $args ,0 , 10);
// ...