Stumbleupon type query - mysql

Wow, makes your head spin!
I am about to start a project, and although my MySQL is OK, I can't get my head around what is required for this:
I have a table of web addresses.
id,url
1,http://www.url1.com
2,http://www.url2.com
3,http://www.url3.com
4,http://www.url4.com
I have a table of users.
id,name
1,fred bloggs
2,john bloggs
3,amy bloggs
I have a table of categories.
id,name
1,science
2,tech
3,adult
4,stackoverflow
I have a table of the categories each user likes, stored as numeric references to the category's unique id. For example:
user,category
1,4
1,6
1,7
1,10
2,3
2,4
3,5
.
.
.
I have a table of scores relating to each website address. When a user visits one of these sites and says they like it, it's stored like so:
url_ref,category
4,2
4,3
4,6
4,2
4,3
5,2
5,3
.
.
.
So based on the above data, URL 4 would score (in its own right) as follows: 2=2 3=2 6=1
What I was hoping to do was pick out a random URL from over 2,000,000 records based on the current user's interests.
So if the logged-in user likes categories 1, 2 and 3, then I would like to ORDER BY a score generated from their interests.
If the logged-in user likes categories 2, 3 and 6, then the total score for URL 4 would be 5. However, if the logged-in user only likes categories 2 and 6, the URL score would be 3. So the ORDER BY would be in the context of the logged-in user's interests.
Think of stumbleupon.
I was thinking of using a set of VIEWS to help with sub queries.
I'm guessing that all 2,000,000 records will need to be looked at and based on the id of the url it will look to see what scores it has based on each selected category of the current user.
So we need to know the user ID and this gets passed into the query as a constant from the start.
Ain't got a clue!
Chris Denman

What I was hoping to do was pick out a random URL from over 2,000,000 records based on the current user's interests.
This screams for predictive modeling, something you probably wouldn't be able to pull off in the database. Basically, you'd want to precalculate your score for a given interest (or more likely set of interests) / URL combination, and then query based on the precalculated values. You'd most likely be best off doing this in application code somewhere.
Since you're trying to guess whether a user will like or dislike a link based on what you know about them, Bayes seems like a good starting point (sorry for the wikipedia link, but without knowing your programming language this is probably the best place to start): Naive Bayes Classifier
edit
The basic idea here is that you continually run your precalculation process, and once you have enough data you can try to distill it to a simple formula that you can use in your query. As you collect more data, you continue to run the precalculation process and use the expanded results to refine your formula. This gets really interesting if you have the means to suggest a link and then find out whether the user liked it or not, as you can use this feedback loop to really improve the prediction algorithm (have a read on machine learning, particularly genetic algorithms, for more on this).
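The scoring rule from the question (a URL's score is the number of likes it has received in the categories the current user follows) can be sketched as a join plus GROUP BY. This is only a minimal sketch using Python's sqlite3 module so it runs anywhere; the table names url_likes and user_categories and the sample users are assumptions, chosen to reproduce the worked example (score 5 for categories 2, 3, 6 and score 3 for categories 2, 6). The same SELECT could be run periodically to fill a precalculated table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table names; one row per "like", as in the question.
CREATE TABLE url_likes (url_ref INTEGER, category INTEGER);
CREATE TABLE user_categories (user INTEGER, category INTEGER);
INSERT INTO url_likes VALUES (4,2),(4,3),(4,6),(4,2),(4,3),(5,2),(5,3);
-- Made-up users: user 1 likes 2, 3, 6; user 2 likes 2, 6.
INSERT INTO user_categories VALUES (1,2),(1,3),(1,6),(2,2),(2,6);
""")

def scored_urls(user_id, limit=30):
    """Return (url_ref, score) pairs for one user, highest score first."""
    return conn.execute(
        """
        SELECT l.url_ref, COUNT(*) AS score
        FROM url_likes AS l
        JOIN user_categories AS uc ON uc.category = l.category
        WHERE uc.user = ?
        GROUP BY l.url_ref
        ORDER BY score DESC, l.url_ref
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()

print(scored_urls(1))
print(scored_urls(2))
```

The count is only correct if each (user, category) pair appears once in user_categories; a unique index on that pair would enforce it.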

I did this in the end:
$dbh = new NewSys::mySqlAccess("xxxxxxxxxx","xxxxxxxxxx","xxxxxxxxx","localhost");
$icat{1}='animals pets';
$icat{2}='gadget addict';
$icat{3}='games online play';
$icat{4}='painting art';
$icat{5}='graphic designer design';
$icat{6}='philosophy';
$icat{7}='strange unusual bizarre';
$icat{8}='health fitness';
$icat{9}='photography photographer';
$icat{10}='reading books';
$icat{11}='humour humor comedy comedian funny';
$icat{12}='psychology psychologist';
$icat{13}='cartoons cartoonist';
$icat{14}='internet technology';
$icat{15}='science scientist';
$icat{16}='clothing fashion';
$icat{17}='movies movie latest';
$icat{18}="\"self improvement\"";
$icat{19}='drawing art';
$icat{20}='latest band member';
$icat{21}='shop prices';
$icat{22}='recipe recipes food';
$icat{23}='mythology';
$icat{24}='holiday resorts destinations';
$icat{25}="(rude words)";
$icat{26}="www website";
$dbh->Sql("DELETE FROM precalc WHERE member = '$fdat{cred_id}'");
$dbh->Sql("SELECT * FROM prefs WHERE member = '$fdat{cred_id}'");
@chos=();
while($dbh->FetchRow()){
$cat=$dbh->Data('category');
$cats{$cat}='#';
}
foreach $cat (keys %cats){
push @chos,"\'$cat\'";
push @strings,$icat{$cat};
}
$sqll=join("\,",@chos);
$words=join(" ",@strings);
$dbh->Sql("select users.id,users.url,IFNULL((select sum(scoretot.scr) from scoretot where scoretot.id = users.id and scoretot.category IN \($sqll\)),0) as score from users WHERE MATCH (description,lasttweet) AGAINST ('$words' IN BOOLEAN MODE) AND IFNULL((SELECT ref FROM visited WHERE member = '$fdat{cred_id}' AND user = users.id LIMIT 1),0) = 0 ORDER BY score DESC limit 30");
$cnt=0;
while($dbh->FetchRow()){
$id=$dbh->Data('id');
$url=$dbh->Data('url');
$score=$dbh->Data('score');
$dbh2->Sql("INSERT INTO precalc (member,user,url,score) VALUES ('$fdat{cred_id}','$id','$url','$score')");
$cnt++;
}
I came up with this answer about three months ago, and I just cannot read it any more. So sorry, I can't explain how it finally worked, but it managed to query 2 million websites and choose one based on the history of a user's past votes on other sites.
Once I got it working, I moved on to another problem!
http://www.staggerupon.com is where it all happens!
Chris

Related

Joining 2 tables together and using the where function based on a separate mysql query

I am building a training platform for work. I have created the requirements for a user to be trained based on a role given to them. If that role is aligned to a document, it will sit against the user. I have managed to get most of the way but am struggling with the best way to finish the WHERE clause in mysqli.
tbldocfiles is a list of my files. I am looking at docid (could be multiple files associated to the document)
tbltrainingaccess sets the roles (driver, warehouseman, customer services) and shows which role (by id) is associated to the document in docfiles.
tblusertraining is the list of users and what role they have associated to them. (driver, warehouseman, customer services).
I am listing the documents associated to the user so have thought the following is the best way:
Look at the user and how many roles he/she is allocated
Look at the roles returned in point 1 (where function)
Identify and match the documents that have the same roles as the user (Join function)
create the list, then look at the unique values for docid. (distinct value)
Example User Bri has the driver and warehouseman role.
There are 5 documents in the db: 3 of them are associated with the driver role (docid 1,2,3), 2 of them are associated with the warehouseman role (docid 2,4), and the 5th document is associated with customerservice.
My query should do this:
List all documents associated to the roles, that are associated to the user Bri
1
2
3
2
4
Now select unique values (using docid) from the above list:
1,2,3,4.
So my answer will be used for a count at the end using mysql_fetch_rows
SELECT DISTINCT tbldocfiles.docid FROM tbldocfiles LEFT JOIN tbltrainingaccess ON (tbldocfiles.docid = tbltrainingaccess.docid) where groupid='1' or groupid='9'
The above code works, but I've got myself confused.
The where statement needs to be the result of a query similar to :
select * from tblusertrainingrole where userid='1' (1 will be a variable based on page selection)
the result in this would be 1, 9 which are the groupid results.
Basically any help would be appreciated! I am sure it will be simple but have burnt myself out on this for a while and most answers in here helped with joining but not the where statement (that I could find)
Thank you in advance everyone!
You can put a SELECT statement inside the WHERE clause. Since the conditions are combined with OR, you can use IN against the subquery's results. Please replace * with the name of the column that holds the value you need. It should look like:
where groupid in (select * from tblusertrainingrole where userid = '1')
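As a rough, self-contained check of the IN-subquery idea, here is a sketch using Python's sqlite3 module with the table names from the question. The column names fileid and groupid in the subquery, the numeric group ids, and the sample rows are assumptions made to mirror the "Bri" example (driver = 1, warehouseman = 9).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbldocfiles (fileid INTEGER, docid INTEGER);
CREATE TABLE tbltrainingaccess (docid INTEGER, groupid INTEGER);
CREATE TABLE tblusertrainingrole (userid INTEGER, groupid INTEGER);
-- Assumed ids: 1 = driver, 9 = warehouseman, 5 = customer services.
INSERT INTO tbldocfiles VALUES (1,1),(2,2),(3,2),(4,3),(5,4),(6,5);
INSERT INTO tbltrainingaccess VALUES (1,1),(2,1),(3,1),(2,9),(4,9),(5,5);
INSERT INTO tblusertrainingrole VALUES (1,1),(1,9);
""")

def docs_for_user(user_id):
    """Distinct docids whose role matches any role held by the user."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.docid
        FROM tbldocfiles AS f
        JOIN tbltrainingaccess AS a ON a.docid = f.docid
        WHERE a.groupid IN (
            SELECT groupid FROM tblusertrainingrole WHERE userid = ?
        )
        ORDER BY f.docid
        """,
        (user_id,),
    ).fetchall()
    return [docid for (docid,) in rows]

print(docs_for_user(1))       # docids for the example user
print(len(docs_for_user(1)))  # the count the question is after
```

Passing the user id as a bound parameter (the ? placeholder) rather than pasting it into the SQL string also avoids quoting problems when the value comes from a page selection.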

Looking for correct query for where.not.any in Rails ActiveRecord

Beginner Rails Question.
I have a table of Users and a table of Teams.
A User has many teams and Teams belong to User.
I want to query if a user does not have a team.
I'm using this query:
User.joins(:teams).where.not(teams: {team_name: 'coconuts'})
This works except if the user has more than one team.
For example User Bill is on the coconuts team and the breadfruit team.
The above query returns Bill when he should be excluded because he is on the coconuts team.
I see why this is happening but I'm having trouble thinking of another query that will work for this scenario.
What is the correct way to grab this data?
I'm using Rails 4.
Try the following, and weigh simple, clean code against performance:
team = Team.find_by(team_name: 'coconuts')
excluded_user_ids = team.user_ids
User.where.not(id: excluded_user_ids)
# Slightly more efficient, assuming you have the join model `Membership`
excluded_user_ids = team.memberships.pluck(:user_id)
# Or, more efficient still (just 1 query), assuming PostgreSQL and Rails 5+ (for left_outer_joins).
# Keep only users whose count of 'coconuts' teams is zero; HAVING cannot refer to SELECT aliases in PostgreSQL.
User
.left_outer_joins(:teams)
.group('users.id')
.having("count(teams.id) filter (where teams.team_name = 'coconuts') = 0")
Using plain Ruby instead of a single Active Record query, you can do the following (note that it loads every user and runs one query per user):
User.select { |user| user.teams.pluck(:team_name).exclude?('coconuts') }
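At the SQL level, what the question asks for is an anti-join: keep a user only if no row in teams links them to 'coconuts'. The sketch below uses Python's sqlite3 module to contrast that with the join-and-filter approach from the question. The users Ann and Cy, and the user_id foreign key on teams, are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE teams (id INTEGER PRIMARY KEY, user_id INTEGER, team_name TEXT);
-- Bill is on two teams, Ann on one, Cy on none (Ann and Cy are made up).
INSERT INTO users VALUES (1,'Bill'),(2,'Ann'),(3,'Cy');
INSERT INTO teams VALUES (1,1,'coconuts'),(2,1,'breadfruit'),(3,2,'breadfruit');
""")

def join_and_filter(team_name):
    """The approach from the question: filters team rows, not users."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.name
        FROM users AS u
        JOIN teams AS t ON t.user_id = u.id
        WHERE t.team_name != ?
        ORDER BY u.id
        """,
        (team_name,),
    ).fetchall()
    return [name for (name,) in rows]

def users_not_on(team_name):
    """Anti-join: users with no matching team row at all."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM users AS u
        WHERE NOT EXISTS (
            SELECT 1 FROM teams AS t
            WHERE t.user_id = u.id AND t.team_name = ?
        )
        ORDER BY u.id
        """,
        (team_name,),
    ).fetchall()
    return [name for (name,) in rows]

print(join_and_filter("coconuts"))  # Bill slips through via his other team
print(users_not_on("coconuts"))     # Bill excluded, team-less Cy included
```

The `User.where.not(id: excluded_user_ids)` form in the answer produces the same kind of exclusion with a NOT IN list instead of NOT EXISTS.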

MySQL finding data if any 4 of 5 columns are found in a row

I have an imported table of several thousand customers, the development I am working on runs on the basis of anonymity for purchase checkouts (customers do not need to log in to check out), but if enough of their details match the database record then do a soft match and email the (probably new) email address and eventually associate the anonymous checkout with the account record on file.
This is rolling out this way due to the age of the records, many people have the same postal address or names but not the same email address, likewise some people will have moved house and some people will have changed name (marriage etc).
What I think I am looking for is a MySQL CASE system, however the CASE questions on Stack Overflow I've found don't appear to cover what I'm trying to get from this query.
The query should work something like this:
$input[0] = postcode (zip code)
$input[1] = postal address
$input[2] = phone number
$input[3] = surname
$input[4] = forename
SELECT account_id FROM account WHERE <4 or more of the variables listed match the same row>
The only way I KNOW I can do this is with a massive bunch of OR statements but that's excessive and I'm sure there's a cleaner more concise method.
I also apologise in advance if this is relatively easy but I don't [think I] know the keyword to research constructing this. As I say, CASE is my best guess.
I'm having trouble working out how to manipulate CASE to fit what I'm trying to do. I do not need to return the values only the account_id from the valid row (only) that matches 4 or 5 of the given inputs.
I imagine that I could construct a layout that does this:
SELECT account_id CASE <if postcode_column=postcode_var> X=X+1
CASE <if surname_column=surname_var> X=X+1
...
...
WHERE X > 3
Is CASE the right idea?
If not, What is the process I need to use to achieve the desired results?
What is [another] MySQL keyword / syntax I need to research, if not CASE.
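Here is your pseudo query. In MySQL each comparison evaluates to 1 (true) or 0 (false), so you can add the comparisons together and test the total: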
SELECT account_id
FROM account
WHERE (postcode = 'pc')+
(postal_address = 'pa')+
(phone_number = '12345678901')+
(surname = 'sn')+
(forename= 'fn') > 3
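To try the summed-comparison trick without a MySQL server, here is a small sketch using Python's sqlite3 module (SQLite also evaluates comparisons to 1 or 0). The sample accounts are made up. One caveat worth handling: a comparison against a NULL column yields NULL, which would turn the whole sum into NULL, so the sketch wraps each term in COALESCE; that wrapper is an addition, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    postcode TEXT, postal_address TEXT, phone_number TEXT,
    surname TEXT, forename TEXT
);
-- Made-up rows; account 3 has no phone number on file.
INSERT INTO account VALUES
    (1,'AB1 2CD','1 High St','0123','Smith','Jo'),
    (2,'AB1 2CD','1 High St','0999','Smith','Al'),
    (3,'ZZ9 9ZZ','9 Low Rd',NULL,'Jones','Jo');
""")

def soft_match(postcode, address, phone, surname, forename, needed=4):
    """Return account_ids where at least `needed` of the 5 fields match."""
    rows = conn.execute(
        """
        SELECT account_id
        FROM account
        WHERE COALESCE(postcode = ?, 0)
            + COALESCE(postal_address = ?, 0)
            + COALESCE(phone_number = ?, 0)
            + COALESCE(surname = ?, 0)
            + COALESCE(forename = ?, 0) >= ?
        ORDER BY account_id
        """,
        (postcode, address, phone, surname, forename, needed),
    ).fetchall()
    return [account_id for (account_id,) in rows]

print(soft_match("AB1 2CD", "1 High St", "0123", "Smith", "Joe"))
print(soft_match("ZZ9 9ZZ", "9 Low Rd", "0555", "Jones", "Jo"))
```

Because the expression has to be evaluated for every row, ordinary indexes will not help much; with several thousand customers that is unlikely to matter.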

Get the number of created Wikipedia articles during a specific period of time

I want to get number of articles written/created in some language (say English) during a specific week (say last week). How can I run this query on Wikipedia?
I have no experience in wikipedia-api
You can run this query on the Wikipedia database:
SELECT COUNT(*)
FROM enwiki_p.recentchanges
WHERE rc_new = 1
AND rc_namespace = 0
AND rc_timestamp BETWEEN 20160417000000 AND 20160424000000
See the results here. If you want to count the number of new articles for another language, change the en prefix in enwiki_p to that language's code (for example, dewiki_p for German).
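The rc_timestamp bounds in the query are 14-digit YYYYMMDDHHMMSS values. A small helper, sketched here in Python, can build them for any week; the function name week_bounds is made up for illustration. Note that BETWEEN includes both ends, so a half-open range (>= start, < end) is a safer way to avoid counting the final second twice across adjacent weeks.

```python
from datetime import date, datetime, timedelta

def week_bounds(start_day, days=7):
    """Return (start, end) as 14-digit YYYYMMDDHHMMSS timestamp strings."""
    start = datetime(start_day.year, start_day.month, start_day.day)
    end = start + timedelta(days=days)
    fmt = "%Y%m%d%H%M%S"
    return start.strftime(fmt), end.strftime(fmt)

lo, hi = week_bounds(date(2016, 4, 17))
# Half-open range: includes the start instant, excludes the end instant.
print(f"AND rc_timestamp >= {lo} AND rc_timestamp < {hi}")
```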

MySQL: showing totals

I am trying to figure out how to have PHP check and print 2 different functions.
Both of these questions are referring to table called "remix".
The first, and more important problem at the minute, is I would like to know how to show how many DIFFERENT values are under "author", as to compile the amount of total authors registered. I need to know not only how to most efficiently use COUNT on returning UNIQUE names under "author", but how to show it inline with the total number of rows, which are currently numbered.
The second question would be asking how I would be able to set up a top 3 artists, based on how many times their name occurs in a list. This also would show on the same page as the above code.
Here is my current code:
require 'remix/archive/connect.php';
mysql_select_db($remix);
$recentsong = mysql_query("SELECT ID,song,author,filename FROM remix ORDER by ID desc limit 1;");
$row = mysql_fetch_array($recentsong);
echo'
<TABLE BORDER=1><TR><TD WIDTH=500>
Currently '.$row['ID'].' Remixes by **(want total artists here)** artists.<BR>
Most recent song: <A HREF=remix/archive/'.$row['filename'].'>'.$row['song'].'</A> by <FONT COLOR=white>'.$row['author'].'</FONT>
So as you can see, I currently have it set up to show the most recent song (not in the most efficient way), but I want the other things in there too, such as at least the top contributor. I don't know whether I can put it all in one PHP block, have to break it up, or can do it all within one query call with the right code.
Thanks for any help!
I'm not sure I really understood everything in your question but we'll work this through together :p
I've created an SQLFiddle to work on some test data: http://sqlfiddle.com/#!2/9b613/1/0.
Note the INDEX on the author field, it will assure good performance :)
In order to know how to show how many DIFFERENT values are under "author" you can use:
SELECT COUNT(DISTINCT author) as TOTAL_AUTHORS
FROM remix;
In order to know the total number of rows, which are currently numbered you can use:
SELECT COUNT(*) as TOTAL_SONGS
FROM remix;
And you can combine both in a single query:
SELECT
COUNT(DISTINCT author) as TOTAL_AUTHORS,
COUNT(*) as TOTAL_SONGS
FROM remix;
To the top 3 subject now. This query will give you the 3 authors with the greatest number of songs, first one on top:
SELECT
author,
COUNT(*) as AUTHOR_SONGS
FROM remix
GROUP BY author
ORDER BY AUTHOR_SONGS DESC
LIMIT 3;
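For a quick way to try both queries without the SQLFiddle, here is a sketch using Python's sqlite3 module. The sample rows are made up, and the extra `author` tie-breaker in ORDER BY is an addition so that authors with equal counts come back in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE remix (ID INTEGER PRIMARY KEY, song TEXT, author TEXT, filename TEXT);
CREATE INDEX idx_remix_author ON remix (author);
-- Made-up sample data: ann 3 songs, bob 2, cat 1, dan 1.
INSERT INTO remix (song, author, filename) VALUES
    ('s1','ann','s1.mp3'),('s2','ann','s2.mp3'),('s3','ann','s3.mp3'),
    ('s4','bob','s4.mp3'),('s5','bob','s5.mp3'),
    ('s6','cat','s6.mp3'),('s7','dan','s7.mp3');
""")

def totals():
    """Return (total_authors, total_songs) in a single query."""
    return conn.execute(
        "SELECT COUNT(DISTINCT author), COUNT(*) FROM remix"
    ).fetchone()

def top_authors(n=3):
    """Return the n authors with the most songs, most prolific first."""
    return conn.execute(
        """
        SELECT author, COUNT(*) AS author_songs
        FROM remix
        GROUP BY author
        ORDER BY author_songs DESC, author
        LIMIT ?
        """,
        (n,),
    ).fetchall()

authors, songs = totals()
print(f"Currently {songs} Remixes by {authors} artists.")
print(top_authors())
```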
Let me know if this answer is incomplete and have fun with SQL !
Edit1: Well, just rewrite your PHP code as:
(...)
$recentsong = mysql_query("SELECT COUNT(DISTINCT author) as TOTAL_AUTHORS, COUNT(*) as TOTAL_SONGS FROM remix;");
$row = mysql_fetch_array($recentsong);
(...)
Currently '.$row['TOTAL_SONGS'].' Remixes by '.$row['TOTAL_AUTHORS'].' artists.<BR>
(...)
For the top3 part, use another mysql_query and create your table on the fly :)