Rtweet search multiple accounts - rtweet

I have a project in which I need to get tweets from companies' Twitter accounts. I plan to use the rtweet package in R and run search_fullarchive. My question is: how can I link a list of accounts to the q argument? I'm trying with just 2 account names in the search_tweets function, but it is not returning any observations. However, when I use just 1 account name, it works.
queryaccounts <- "from:Amazon AND Microsoft"
ts11<- search_tweets(q = queryaccounts,n = 10,type = "recent")
Does anyone know how to solve this?

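Try combining the accounts with OR in a single query string. In a Twitter search query a space already acts as AND, so the original query is not read as "either account". If you would rather pass a vector of separate queries, search_tweets2 is the rtweet function that accepts one.
queryaccounts <- "from:Amazon OR from:Microsoft"
ts11 <- search_tweets(q = queryaccounts, n = 10, type = "recent")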
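For a longer list of accounts, the query string can be built programmatically. The following is a minimal, language-neutral sketch in Python rather than rtweet code; the helper name `build_account_queries` and the `max_len` default of 500 characters are assumptions for illustration, so check the length limit of the endpoint you actually use. Each returned string is what would be passed as q.

```python
def build_account_queries(accounts, max_len=500):
    """Join accounts into 'from:a OR from:b' strings of at most max_len chars.

    max_len is an assumed limit; adjust it to the search endpoint in use.
    """
    queries, current = [], ""
    for account in accounts:
        term = "from:" + account.lstrip("@")
        candidate = term if not current else current + " OR " + term
        if current and len(candidate) > max_len:
            # Start a new query once the current one would grow too long.
            queries.append(current)
            current = term
        else:
            current = candidate
    if current:
        queries.append(current)
    return queries


print(build_account_queries(["Amazon", "Microsoft"]))
```

If the list does not fit into one query, the helper returns several strings, and each one would be searched separately.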

Related

Parsing data requests from google flights using google flights package

I'm working with the Google Flights API (QPX). I am using the following experimental package to feed in information for a request:
https://github.com/rweyant/googleflights
Below is the code I have thus far for anyone interested in replicating my results:
#call library and data-------------------------------------------------------------------
library(googleflights)
library(MUCflights) #to access airport codes
data("airports")
#codes for countries i'm interested in------------------------------------------
code_list = airports
#later interface for updating codes
my_destinations = matrix(c("San Juan", "Amsterdam", "Berlin",
"San Diego", "Lima", "Cali", "Havana"))
my_home = matrix(c("LGA", "JFK"))
#loop extract
code_bucket = NULL
for (i in my_destinations) {
  print(i)
  drop = code_list[code_list$City == i, c("City", "IATA")]
  drop = as.data.frame(drop)
  print(drop)
  code_bucket = rbind(code_bucket, drop)
  code_bucket = as.data.frame(code_bucket)
}
#clean my code bucket---------------------------------------------------------------
code_bucket = na.omit(code_bucket)
code_bucket = code_bucket[code_bucket$IATA != "",]
code_bucket
#feed in codes into function---------------------------------------------------------
#each ping to QPX will combine NYC to x
#data i want
# pricing
# times
key = "(key is here)"
set_apikey(key)
result_flights = search(my_home[1], code_bucket[2,2], "2016-11-27", "2016-11-28")
I've been looking through the package details to understand the functionality and noticed that the response comes back as a list rather than as JSON, apparently so that it can be passed to a "summarise_segment" function, which isn't working for me. Here is the link to the function I'm referencing:
https://github.com/rweyant/googleflights/blob/master/R/unpack.R
Has anyone had any luck with, or ideas for, parsing the response that comes back? The resulting list is large and I'm reaching the limits of my knowledge on dealing with these structures. Any help in pointing me in the right direction would be appreciated!

VBA code to analyze a HTML table based off certain conditions

I need to screen scrape data off a website and return it to a spreadsheet when a charge amount matches and the date is the most recent in the table. If there is only one line in the table, the macro pulls that accordingly. Most of the code is good: I am connected to the website and pulling everything effectively. Where I am struggling is getting the logic to work where the two amounts match and the date is also the most recent in the HTML table.
My question is: how do I loop through Item(5), the date column of that table, and choose the most recent date, while only accepting a row whose amount equals the charge amount? I only want a one-to-one match. I am new to this, so I would greatly appreciate any help.
Set IHEC = iHTMLDoc.getElementsByTagName("TR")
If IHEC.Length > 2 Then
    For index = 0 To IHEC.Length - 1
        Set IHEC_TD = IHEC.Item(index).getElementsByTagName("TD")
        Do Until IHEC.Length < 2 Or index = IHEC.Length - 1
            ' Item(3) holds the charge amount
            If IHEC_TD.Item(3).innerText = myBilledAmount Then
                myItem1 = IHEC_TD.Item(0).innerText
                myItem2 = IHEC_TD.Item(1).innerText
                myItem3 = IHEC_TD.Item(2).innerText
                myItem4 = IHEC_TD.Item(3).innerText
                myItem5 = IHEC_TD.Item(4).innerText
                myItem6 = IHEC_TD.Item(5).innerText
                myItem7 = IHEC_TD.Item(6).innerText
                myItem8 = IHEC_TD.Item(7).innerText
                myItem9 = IHEC_TD.Item(8).innerText
            End If
        Loop
    Next index
End If
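The selection logic being asked for (match the amount, then keep only the most recent date) can be sketched independently of the scraping code. This is a minimal illustration in Python, not VBA. It assumes each table row has already been read into a list of cell strings, that the amount is in column 3 and the date in column 5 as in the question, and that dates look like MM/DD/YYYY; the function name and the date format are hypothetical.

```python
from datetime import datetime


def pick_latest_match(rows, billed_amount, amount_col=3, date_col=5,
                      date_format="%m/%d/%Y"):
    """Return the one row whose amount matches and whose date is latest, or None."""
    best_row, best_date = None, None
    for cells in rows:
        if len(cells) <= max(amount_col, date_col):
            continue  # header or spacer row with too few cells
        if cells[amount_col].strip() != billed_amount:
            continue  # amount does not match the charge
        try:
            row_date = datetime.strptime(cells[date_col].strip(), date_format)
        except ValueError:
            continue  # date cell is not a parseable date (assumed format)
        if best_date is None or row_date > best_date:
            best_row, best_date = cells, row_date
    return best_row


rows = [
    ["A1", "x", "y", "100.00", "z", "01/15/2016"],
    ["A2", "x", "y", "100.00", "z", "03/02/2016"],
    ["A3", "x", "y", "250.00", "z", "04/01/2016"],
]
print(pick_latest_match(rows, "100.00"))
```

The same idea carries over to the macro: loop over every row once, remember the best candidate so far, and copy the cell values out only after the loop has finished.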

Include nested entity details but don't group by then when grouping by other fields

I am working with database-first C# MVC, EF6, LINQ and JSON to pass data to both Highcharts and Google Maps for some of my reporting.
If I could add an image I would show you the relevant portion of my model, but sadly I need more reputation to do that...
The portion of the Entity Model I'm concentrating on right now is based on a central Docket that contains a BuildingCode as part of a one-to-many relationship to a building with an address, and a further relationship to the building's polygons (for mapping). Dockets are also classified by one or more DocketTypes, so there is a many-to-many relationship between Dockets and DocketTypes, which is not directly exposed through EF.
As an example, a Docket which represents an investigation could be related to the theft of a mobile phone in building A located on Campus X. Not only was the phone stolen, but the assailant also assaulted the victim in order to steal it. So there are 2 DocketTypes here: 1. theft of mobile phone and 2. assault. Note: this is fictitious and for illustration purposes only.
One of my fundamental reports requires that I count how many docketTypes affect each building and each campus in a given period. When I display this I also need to show what the DocketTypes are.
I have had no end of nightmares trying to get this right. I keep running into circular reference errors and needing explicit conversions when trying to shape the data with LINQ so that I can pass a single nested object through JSON to the client side, where it will be displayed.
In the code below I am told I need an explicit conversion:
Cannot implicitly convert type 'Campus_Investigator.ViewModels.DocketTypeViewModel' to 'System.Collections.Generic.IEnumerable<Campus_Investigator.ViewModels.DocketTypeViewModel>'. An explicit conversion exists (are you missing a cast?)
var currentDocketQuery = from d in db.Dockets
from dt in d.DocketTypes
from bp in d.BuildingDetail.BuildingPolygons
where d.OccurrenceStartDate >= datetime && d.BuildingDetail.CampusName == Campus
select new CampusBuildingDocketTypeViewModel()
{
BuildingCode = d.BuildingDetail.BuildingCode,
BuildingName = d.BuildingDetail.BuildingName,
//BuildingPolygons = d.BuildingDetail.BuildingPolygons,
DocketTypes = new DocketTypeViewModel()
{
Category = dt.Category,
SubCategory = dt.SubCategory,
ShortDescription = dt.ShortDescription
}
};
I would appreciate any ideas on how I can explicitly convert this, or is there a better method I can use that avoids the circular reference error?
Your query includes a redundant part that performs an inner join. The from bp in d.BuildingDetail.BuildingPolygons clause is joined in but never used in the result, so it serves no purpose and may produce duplicated elements. The from dt in d.DocketTypes clause is also joined in wrongly: you do need DocketTypes in the result, but they are output per d in db.Dockets, so the query can simply be written like this:
var currentDocketQuery = from d in db.Dockets
where d.OccurrenceStartDate >= datetime && d.BuildingDetail.CampusName == Campus
select new CampusBuildingDocketTypeViewModel()
{
BuildingCode = d.BuildingDetail.BuildingCode,
BuildingName = d.BuildingDetail.BuildingName,
//BuildingPolygons = d.BuildingDetail.BuildingPolygons,
DocketTypes = d.DocketTypes
};
I can also see the commented-out line //BuildingPolygons = d.BuildingDetail.BuildingPolygons, so if you want to include that, it should work too.
If the DocketTypes property has a different type from d.DocketTypes, then you need a simple projection like this:
var currentDocketQuery = from d in db.Dockets
where d.OccurrenceStartDate >= datetime && d.BuildingDetail.CampusName == Campus
select new CampusBuildingDocketTypeViewModel()
{
BuildingCode = d.BuildingDetail.BuildingCode,
BuildingName = d.BuildingDetail.BuildingName,
//BuildingPolygons = d.BuildingDetail.BuildingPolygons,
DocketTypes = d.DocketTypes.Select(e => new DocketTypeViewModel()
{
Category = e.Category,
SubCategory = e.SubCategory,
ShortDescription = e.ShortDescription
})
};
I managed to solve this one by using the code below. The major hassle is the circular referencing that exists in the model. When JSON serializes these entities, everything falls apart, so it takes a lot of transforming to make sure that I only extract what I need. In this case that is the grouped campus and building data (the code below includes the polygons, which were commented out above) plus the detail of the DocketTypes that occurred at each building.
var datetime = DateTime.Now.AddDays(-30);
var campusDocket = from d in db.Dockets
where d.OccurrenceStartDate >= datetime && d.BuildingDetail.CampusName == Campus
group d by new { d.BuildingDetail.CampusName, d.BuildingDetail.BuildingCode, d.BuildingDetail.BuildingName } into groupdata
select new CampusBuildingDocketTypeViewModel
{
BuildingCode = groupdata.Key.BuildingCode,
BuildingName = groupdata.Key.BuildingName,
CampusName = groupdata.Key.CampusName,
Count = groupdata.Count(),
BuildingPolygons = from bp in db.BuildingPolygons
where bp.BuildingCode == groupdata.Key.BuildingCode
select new BuildingPolygonViewModel
{
Accuracy = bp.Accuracy,
BuildingCode = bp.BuildingCode,
PolygonOrder = bp.PolygonOrder,
Latitude = bp.Latitude,
Longitude = bp.Longitude
},
DocketTypes = from doc in db.Dockets
from dt in doc.DocketTypes
where doc.OccurrenceStartDate >= datetime && doc.BuildingCode == groupdata.Key.BuildingCode
select new DocketTypeViewModel
{
Category = dt.Category,
SubCategory = dt.SubCategory,
ShortDescription = dt.ShortDescription
}
};
The answer, again, is ViewModels. I'm finding that ViewModels solve a lot of problems...
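The reason flat view models sidestep the circular reference error is that they hold only plain values and child lists, with no pointer back to the parent entity. Here is a small language-neutral sketch of that shape in Python rather than C#; the field names (`campus`, `building_code`, `types`) and the function name are hypothetical stand-ins for the entities described above.

```python
import json
from collections import defaultdict


def summarise_by_building(dockets):
    """Group flat docket records by building; count them and list their types."""
    grouped = defaultdict(lambda: {"count": 0, "docket_types": []})
    for docket in dockets:
        key = (docket["campus"], docket["building_code"])
        grouped[key]["count"] += 1
        # Copy only the plain values needed, never the parent docket itself.
        grouped[key]["docket_types"].extend(docket["types"])
    return [
        {"campus": campus, "building_code": code, **data}
        for (campus, code), data in grouped.items()
    ]


dockets = [
    {"campus": "X", "building_code": "A", "types": ["Theft", "Assault"]},
    {"campus": "X", "building_code": "A", "types": ["Theft"]},
    {"campus": "X", "building_code": "B", "types": ["Vandalism"]},
]
summary = summarise_by_building(dockets)
print(json.dumps(summary, indent=2))
```

Because the result contains only dictionaries, lists, strings and numbers, the serializer never has to follow a reference back to something it has already visited.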

How to obtain a list of titles of all Wikipedia articles

I'd like to obtain a list of all the titles of all Wikipedia articles. I know there are two possible ways to get content from a Wikimedia powered wiki. One would be the API and the other one would be a database dump.
I'd prefer not to download the wiki dump. First, it's huge, and second, I'm not really experienced with querying databases. The problem with the API, on the other hand, is that I couldn't figure out a way to retrieve only a list of the article titles, and even if there is one, it would need more than 4 million requests, which would probably get me blocked from any further requests anyway.
So my question is
Is there a way to obtain only the titles of Wikipedia articles via the API?
Is there a way to combine multiple request/queries into one? Or do I actually have to download a Wikipedia dump?
The allpages API module allows you to do just that. Its limit (when you set aplimit=max) is 500, so to query all 4.5M articles, you would need about 9000 requests.
But a dump is a better choice, because there are many different dumps, including all-titles-in-ns0 which, as its name suggests, contains exactly what you want (59 MB of gzipped text).
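If you do go the API route, the paging loop can be sketched as follows. This is a minimal illustration under stated assumptions: `fetch` is a hypothetical function you supply that sends the parameters to api.php and returns the decoded JSON, and the loop relies on the API returning a `continue` block whose values are merged into the next request. A fake `fetch` is used here so the sketch runs without network access.

```python
import math


def iter_titles(fetch, base_params):
    """Yield titles from every batch, following the response's 'continue' block."""
    params = dict(base_params)
    while True:
        response = fetch(params)
        for page in response["query"]["allpages"]:
            yield page["title"]
        if "continue" not in response:
            return  # last batch reached
        # Merge all continuation values into the next request.
        params = {**base_params, **response["continue"]}


def requests_needed(total_pages, batch_size=500):
    """Number of requests needed at a given batch size."""
    return math.ceil(total_pages / batch_size)


def fake_fetch(params):
    # Stand-in for a real HTTP call; returns two canned batches.
    batches = {
        None: {
            "query": {"allpages": [{"title": "A"}, {"title": "B"}]},
            "continue": {"apcontinue": "C", "continue": "-||"},
        },
        "C": {"query": {"allpages": [{"title": "C"}]}},
    }
    return batches[params.get("apcontinue")]


base = {"action": "query", "list": "allpages", "aplimit": "max", "apnamespace": "0"}
print(list(iter_titles(fake_fetch, base)))
print(requests_needed(4_500_000))
```

With 500 titles per request, the request count for 4.5M articles comes out at the roughly 9000 mentioned above.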
Right now, as per the current statistics the number of articles is around 5.8M.
To get the list of pages I used the AllPages API. However, the number of pages I get is around 14.5M, which is about 3 times what I was expecting, even though I restricted myself to namespace 0. Following is the sample code that I am using:
# get the list of all wikipedia pages (articles) -- English
import sys
from simplemediawiki import MediaWiki

listOfPagesFile = open("wikiListOfArticles_nonredirects.txt", "w")
wiki = MediaWiki('https://en.wikipedia.org/w/api.php')
continueParam = ''
requestObj = {}
requestObj['action'] = 'query'
requestObj['list'] = 'allpages'
requestObj['aplimit'] = 'max'
requestObj['apnamespace'] = '0'
pagelist = wiki.call(requestObj)
pagesInQuery = pagelist['query']['allpages']
for eachPage in pagesInQuery:
    pageId = eachPage['pageid']
    title = eachPage['title'].encode('utf-8')
    writestr = str(pageId) + "; " + title + "\n"
    listOfPagesFile.write(writestr)
numQueries = 1
while len(pagelist['query']['allpages']) > 0:
    requestObj['apcontinue'] = pagelist["continue"]["apcontinue"]
    pagelist = wiki.call(requestObj)
    pagesInQuery = pagelist['query']['allpages']
    for eachPage in pagesInQuery:
        pageId = eachPage['pageid']
        title = eachPage['title'].encode('utf-8')
        writestr = str(pageId) + "; " + title + "\n"
        listOfPagesFile.write(writestr)
        # print writestr
    numQueries += 1
    if numQueries % 100 == 0:
        print "Done with queries -- ", numQueries
print numQueries
listOfPagesFile.close()
The number of queries fired is around 28900, which results in approx. 14.5M names of the pages.
I also tried the all-titles link mentioned in the above answer. In that case as well I am getting around 14.5M pages.
I thought that this overestimate of the actual number of pages was caused by redirects, so I added the 'nonredirects' option to the request object:
requestObj['apfilterredir'] = 'nonredirects'
After doing that I get only 112340 pages, which is far too few compared to 5.8M.
With the above code I was expecting roughly 5.8M pages, but that doesn't seem to be the case.
Is there any other option that I should be trying to get the actual (~5.8M) set of page names?
Here is an asynchronous generator that yields MediaWiki page titles:
async def wikimedia_titles(http, wiki="https://en.wikipedia.org/"):
    log.debug('Started generating asynchronously wiki titles at {}', wiki)
    # XXX: https://www.mediawiki.org/wiki/API:Allpages#Python
    url = "{}/w/api.php".format(wiki)
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apfilterredir": "nonredirects",
        "apfrom": "",
    }
    while True:
        content = await get(http, url, params=params)
        if content is None:
            continue
        content = json.loads(content)
        for page in content["query"]["allpages"]:
            yield page["title"]
        try:
            apcontinue = content['continue']['apcontinue']
        except KeyError:
            return
        else:
            params["apfrom"] = apcontinue

Django ORM query field weight?

I'm doing the following query:
People.objects.filter(
Q(name__icontains='carolina'),
Q(state__icontains='carolina'),
Q(address__icontains='carolina'),
)[:9]
I want the first results of the query to be the people who are named "Carolina" (they may also match on other fields, but name matches should come first). The problem is that I don't think there is any way to give a field a "weight" or "priority".
Any ideas?
Thanks!
You'll need to do 3 queries for this to work:
names_match = People.objects.filter(name__icontains='carolina')[:9]
states_match = People.objects.filter(state__icontains='carolina')[:9]
addresses_match = People.objects.filter(address__icontains='carolina')[:9]
all_objects = list(names_match) + list(states_match) + list(addresses_match)
all_objects = all_objects[:9]
There are two problems with this approach, which are fairly easily worked round:
It does unnecessary queries (what if names_match contained enough items already).
It allows for duplicates (what if someone in North Carolina is called Carolina?)
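Both problems can be worked around with a small merge helper. The following is a minimal sketch using plain Python callables in place of querysets so that it runs on its own; `merge_by_priority` is a hypothetical helper, not part of Django. Each source is only called when earlier sources did not already fill the limit, and duplicates are skipped by a key.

```python
def merge_by_priority(sources, limit, key=lambda item: item):
    """Take items from each source in order, skipping duplicates, up to limit."""
    if limit <= 0:
        return []
    seen, merged = set(), []
    for source in sources:
        for item in source():  # the source is evaluated only when reached
            identity = key(item)
            if identity in seen:
                continue
            seen.add(identity)
            merged.append(item)
            if len(merged) == limit:
                return merged  # later sources are never evaluated
    return merged


calls = []


def names():
    calls.append("names")
    return ["carol", "ann"]


def states():
    calls.append("states")
    return ["ann", "bob"]


print(merge_by_priority([names, states], limit=9))
```

With the three querysets above, each source would presumably be a lambda wrapping one filter call and the key would be the object's primary key, for example `key=lambda p: p.pk`.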
This should work:
qs = People.objects.filter(name__icontains='carolina') | People.objects.filter( Q(state__icontains = 'carolina'), Q(address__icontains='carolina')).distinct()
qs = list(qs)[:9]
Or, if you want a duplicate-free list (note that set() does not preserve the original order):
qs = list(set(qs))[:9] #for a duplicate free list