get_decennial() returns error for 2020 data? - census

I've been working with census data from 2000, 2010, and 2020 for a local geography. My pulls for 2000 and 2010 work with no problem, but I get an error when I try to pull 2020 data.
vars = c("P001001",  # Total population
         "H013001",  # Total households
         "P016002",  # Total population under 18
         "P020002")  # Total!!Households with one or more people under 18 years

# import 2020 census data by block
phl_block_demos = get_decennial(geography = "block",
                                year = 2020,
                                variables = vars,
                                geometry = T,  # as an sf
                                sumfile = "sf1",
                                state = "PA",
                                county = "Philadelphia",
                                output = "wide") |>
  st_transform(crs = st_crs("EPSG:4326")) |>
  mutate(tract_num = substr(GEOID, 1, 11))
R returns:
Using the PL 94-171 Redistricting Data summary file
Error in UseMethod("select") :
no applicable method for 'select' applied to an object of class "character"
I get the same error message when I change the geography to "tract", when I set geometry = F, and when I run only the part of the code before the first pipe. The only thing that seems to work is picking a different census year, since 2000 and 2010 work fine.
Any ideas what the issue is? The data were just released earlier this month so I haven't been able to find anything about this online yet.

Understanding Mutual Information on Titanic Dataset

I was reading about Mutual Information on the Kaggle courses: https://www.kaggle.com/code/ryanholbrook/mutual-information
After that, I tried it out on the Titanic competition dataset and encountered some weird behaviour.
I will post the code further below.
I ranked all the features with mutual information and received the following output:
PassengerId 0.665912
Name 0.665912
Ticket 0.572496
Cabin 0.165236
Sex 0.150870
Fare 0.141621
Age 0.066269
Pclass 0.058107
SibSp 0.023197
Embarked 0.016668
Parch 0.016366
According to the documentation:
Mutual information (MI) [1] between two random variables is a non-negative value, which measures the dependency between the variables. It is equal to zero if and only if two random variables are independent, and higher values mean higher dependency.
From my point of view, at least PassengerId should be independent of the target, as should the name, because I used factorize() on all object columns, which leaves me with 100% unique values for both the Id and the names. There are 891 rows in total in the training dataset.
# number of unique values for top 2 MI
print(X_mi["PassengerId"].nunique())
print(X_mi["Name"].nunique())
891
891
My question is: how does this happen? And why do PassengerId and Name, with all unique values, score even higher than, let's say, Age or Sex?
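As a sanity check, here is a sketch of my own (not from the course, with made-up data standing in for Survived and PassengerId). Assuming scikit-learn uses the plug-in estimate for a discrete feature paired with a discrete target, a feature that is unique in every row scores exactly the empirical entropy of the target, which would explain why a meaningless ID can top the ranking:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Made-up target and ID column, standing in for Survived and PassengerId.
y = np.tile([0, 1, 1], 297)            # 891 rows, discrete binary target
ids = np.arange(891).reshape(-1, 1)    # a value that is unique in every row

# MI as computed for a discrete feature / discrete target pair
mi = mutual_info_classif(ids, y, discrete_features=[True], random_state=0)

# Empirical entropy of the target, in nats
p = np.bincount(y) / len(y)
h = -(p * np.log(p)).sum()

print(mi[0], h)  # the two values coincide: a unique ID "memorizes" the target
```

If that holds, the high scores for PassengerId and Name say nothing about real dependence; they are an artifact of every row being its own category.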
I followed the Kaggle course at the link above. The only difference should be that I used
from sklearn.feature_selection import mutual_info_classif
instead of
from sklearn.feature_selection import mutual_info_regression
because my target is discrete.
Here is the relevant code:
X_train_full = pd.read_csv("/kaggle/input/titanic/train.csv")
X_test_full = pd.read_csv("/kaggle/input/titanic/test.csv")

X_mi = X_train_full.copy()
y_mi = X_mi.pop("Survived")

# Label encoding for categoricals
for colname in X_mi.select_dtypes("object"):
    X_mi[colname], _ = X_mi[colname].factorize()

# fill all NaN values of Age with the mean and cast to int
X_mi["Age"] = X_mi["Age"].fillna(X_mi["Age"].mean()).astype("int")

# cast Fare to int
X_mi["Fare"] = X_mi["Fare"].astype("int")

# All discrete features should now have integer dtypes (double-check this before using MI!)
discrete_features = X_mi.dtypes == int

from sklearn.feature_selection import mutual_info_classif

def make_mi_scores(X, y, discrete_features):
    mi_scores = mutual_info_classif(X, y, discrete_features=discrete_features)
    mi_scores = pd.Series(mi_scores, name="MI Scores", index=X.columns)
    mi_scores = mi_scores.sort_values(ascending=False)
    return mi_scores

mi_scores = make_mi_scores(X_mi, y_mi, discrete_features)
print(mi_scores)  # show features with their MI scores
Any explanation or suggestions as to what I might have done wrong?
From a data-analytics point of view I might have made some mistakes, but how does a completely unrelated feature like PassengerId score so high, even higher than all the others?
Thank you :)

How would I be able to compare 2 sets of JSON data and check if one is in the other

So I am coding a Discord bot in Python that uses Roblox's API to check whether a user is in an allied group. I wrote a loop to find out whether the user is in one of the allied groups, but it does not work as intended. When I tested it, it said I was not in the group even though I was. I received no errors when running this code:
for x in data:
    num = data.index(x)
    if data[num]["Name"] == "ΒΕ • British Empire":
        print("its bigs squad")
        print(data[num]["Role"])
        bruh = str(data[num]["Role"])
        dat = requests.get("https://api.roblox.com/groups/8616457/allies")
        date = dat.json()['Groups']
        groups = list(date)
        for f in groups:
            e = groups.index(f)
            print(e)
            if data[num]['Name'] == groups[e]['Name']:
                print("in group")
                member = ctx.author
                role = discord.utils.get(member.guild.roles, name=bruh)
                await member.add_roles(role)
                await ctx.send(f"Gave the **{bruh}** role!")  # fix
            else:
                print("not in group")
JSON data I am comparing:
{"Groups":[{"Name":"ΒΕ • Administration","Id":8664914,"Owner":{"Name":"EA6Y","Id":100403322},"EmblemUrl":"http://www.roblox.com/asset/?id=6109292568","Description":"𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐦𝐞𝐧𝐭 & 𝐀𝐝𝐦𝐢𝐧𝐢𝐬𝐭𝐫𝐚𝐭𝐢𝐨𝐧\n▬▬▬▬▬▬▬▬▬▬▬\n\nThe Development & Administration team include developers, administrators, and moderators that keep the group running smoothly, in-game, and in the communication servers."
Also comparing this one:
[{"Name":"Roblox","Id":7,"EmblemUrl":"http://www.roblox.com/asset/?id=607941191","EmblemId":607941191,"Rank":255,"Role":"Owner","IsPrimary":false,"IsInClan":false},{"Name":"Roblox Wiki","Id":127081,"EmblemUrl":"http://www.roblox.com/asset/?id=607954721","EmblemId":607954721,"Rank":254,"Role":"Wiki System Operator","IsPrimary":false,"IsInClan":false}]
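As an aside, here is a sketch of my own (with made-up sample data shaped like the two payloads above) of a membership check that compares numeric group Ids through a set instead of comparing display names. Names like "ΒΕ • British Empire" contain lookalike Unicode characters, which makes string comparison fragile:

```python
# Hypothetical sample data shaped like the JSON payloads above.
user_groups = [
    {"Name": "Roblox", "Id": 7, "Role": "Owner"},
    {"Name": "ΒΕ • British Empire", "Id": 8616457, "Role": "Private"},
]
allies = {"Groups": [
    {"Name": "ΒΕ • Administration", "Id": 8664914},
    {"Name": "ΒΕ • British Empire", "Id": 8616457},
]}

# Compare by Id: names can differ by invisible whitespace or lookalike glyphs.
allied_ids = {g["Id"] for g in allies["Groups"]}
shared = [g for g in user_groups if g["Id"] in allied_ids]
print([g["Name"] for g in shared])
```

The same set lookup also avoids the repeated `list.index()` calls, which make the original loops quadratic.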

Get empty prediction with Facebook Prophet

Following the basic steps to create a Prophet model and forecast:
m = Prophet(daily_seasonality=True)
m.fit(data)
forecast = m.make_future_dataframe(periods=2)
forecast.tail().T
the result is as follows (no yhat value?):
The data passed in to fit the model has two columns (date and value).
Not sure what I have missed out here.
I managed to get it to work by creating a new dataframe:
df_p = pd.DataFrame({'ds': d.index, 'y': d.values})
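For what it's worth, a minimal sketch of that fix (my own, with made-up values): Prophet expects the history frame to have columns named exactly `ds` and `y`. Note also that `make_future_dataframe()` returns only a `ds` column; the `yhat` values appear only in the frame returned by `m.predict(future)`.

```python
import pandas as pd

# Made-up series standing in for the original two-column data.
d = pd.Series([10.0, 12.0, 11.0, 13.0],
              index=pd.date_range("2021-01-01", periods=4))

# Prophet requires the training frame to use exactly these column names.
df_p = pd.DataFrame({"ds": d.index, "y": d.values})
print(list(df_p.columns))  # ['ds', 'y']
```

With `df_p` in this shape, `m.fit(df_p)` followed by `m.predict(m.make_future_dataframe(periods=2))` should produce the `yhat` column.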

Reading content from REST web service in MATLAB

I'm using the webread() function to retrieve sunrise and sunset data from the sunrise-sunset.org API here.
Here is what my code looks like:
function [E_total] = solar_energy(lng, lat, yr, month, day)
    % Generate URL
    url = strcat('https://api.sunrise-sunset.org/json?lat=', num2str(lat), ...
        '&lng=', num2str(lng), '&date=', num2str(yr), '-', num2str(month), ...
        '-', num2str(day));
    % Retrieve data
    forecast = webread(url)
    if isempty(forecast) % Failed, use default estimates
        sunrise = 6;
        sunset = 18;
        noon = 12;
    elseif forecast.status == 'OK'
        % Success! Parse retrieved data...
        forecast.results
        dv = datevec(forecast.results.sunrise)
        sunrise = dv(6)/3600 + dv(5)/60 + dv(4)
        dv = datevec(forecast.results.sunset)
        sunset = dv(6)/3600 + dv(5)/60 + dv(4)
        dv = datevec(forecast.results.solar_noon)
        noon = dv(6)/3600 + dv(5)/60 + dv(4)
    end
Here is an example of what I get for sunrise and sunset data for 12/2/2017 at the Washington National Monument.
Input:
solar_energy( -77.0353, 38.8895, 2017, 12, 02)
Here is what I get:
forecast =
results: [1x1 struct]
status: 'OK'
ans =
sunrise: '12:00:01 AM'
sunset: '12:00:01 AM'
solar_noon: '9:38:38 AM'
day_length: '00:00:00'
civil_twilight_begin: '12:00:01 AM'
civil_twilight_end: '12:00:01 AM'
nautical_twilight_begin: '12:00:01 AM'
nautical_twilight_end: '12:00:01 AM'
astronomical_twilight_begin: '12:00:01 AM'
astronomical_twilight_end: '12:00:01 AM'
Is there something wrong with my method, or is there an issue with this API?
The data is retrieved successfully, but for all dates the sunrise and sunset times either read 12:00:00 AM or are at nine-something AM.
OK, I think I found your problem. I took a look at the API documentation and discovered that it supports an additional parameter called formatted, which is described as follows:
formatted (integer): 0 or 1 (1 is default). Time values in response
will be expressed following ISO 8601 and day_length will be expressed
in seconds. Optional.
I tried appending it to the request created in your function. In the meantime, I also fixed a small problem concerning the date parameter you were using in your call:
date (string): Date in YYYY-MM-DD format. Also accepts other date
formats and even relative date formats. If not present, date defaults
to current date. Optional.
The 'DD' format expresses the day value in two digits (for example, '21' if the day is 21 and '02' if the day is 2). Using num2str doesn't reproduce this behaviour, since num2str(2) = '2' and num2str(02) = '2'. A quick fix is to use datestr(day,'dd') instead.
Here is the final result:
solar_energy(-77.0353, 38.8895, 2017, 12, 02);
function [E_total] = solar_energy(lng, lat, yr, month, day)
    url = strcat('https://api.sunrise-sunset.org/json', ...
        '?lat=', num2str(lat), ...
        '&lng=', num2str(lng), ...
        '&date=', num2str(yr), '-', num2str(month), '-', datestr(day,'dd'), ...
        '&formatted=0');
    forecast = webread(url);
    if (isempty(forecast) || ~strcmp(forecast.status, 'OK'))
        sunrise = 6;
        noon = 12;
        sunset = 18;
    else
        forecast.results
    end
end
This is the result that the code above produces:
sunrise: '2017-12-02T12:09:28+00:00'
sunset: '2017-12-02T21:46:20+00:00'
solar_noon: '2017-12-02T16:57:54+00:00'
day_length: 34612
civil_twilight_begin: '2017-12-02T11:40:03+00:00'
civil_twilight_end: '2017-12-02T22:15:44+00:00'
nautical_twilight_begin: '2017-12-02T11:06:58+00:00'
nautical_twilight_end: '2017-12-02T22:48:50+00:00'
astronomical_twilight_begin: '2017-12-02T10:34:47+00:00'
astronomical_twilight_end: '2017-12-02T23:21:01+00:00'
As you can see, the values returned look correct. So the problem is caused by how the API handles the conversion of the dates it retrieves from the database in ISO 8601 to another format.
Of course, you have to change the way you are currently parsing the values returned by the API. This should do the job:
datevec(forecast.results.sunrise, 'yyyy-mm-ddTHH:MM:ss');

How can I make this query work in D7?

I'm trying to rewrite this database query from line 52 of template.php on my D6 site
$uid = db_query('SELECT pm.author FROM {pm_message} pm INNER JOIN {pm_index} pmi ON pmi.mid = pm.mid AND pmi.thread_id = %d WHERE pm.author <> %d ORDER BY pm.timestamp DESC LIMIT 1', $thread['thread_id'], $user->uid);
into D7 standards.
But it keeps giving me
Recoverable fatal error: Argument 2 passed to db_query() must be an
array, string given, called in
C:\wamp2\www\site-name\sites\all\themes\simpler\template.php on line
52 and defined in db_query() (line 2313 of
C:\wamp2\www\site-name\includes\database\database.inc).
This DB query is part of a template.php snippet that shows user pictures in the Private Messages module and makes it look like Facebook or another social networking site. You can see the full snippet here. Because Private Messages has a unified value $participants (i.e. the message thread), this DB query basically tries to isolate the most recent author other than the current user.
What is the correct syntax?
As the error message says: 'Argument 2 passed to db_query() must be an array ...'.
Drupal 7 switched the database layer to use PDO, so placeholder replacement in db_query() changed a bit - try:
$query = 'SELECT pm.author FROM {pm_message} pm'
  . ' INNER JOIN {pm_index} pmi ON pmi.mid = pm.mid AND pmi.thread_id = :thread_id'
  . ' WHERE pm.author <> :uid'
  . ' ORDER BY pm.timestamp DESC LIMIT 1';
$args = array(
  ':thread_id' => $thread['thread_id'],
  ':uid' => $user->uid,
);
$uid = db_query($query, $args)->fetchField();
Split up and reformatted for readability. Untested, so beware of typos.
Note the ->fetchField() at the end - this will only work for queries returning exactly one field (like this one). If you need to fetch more fields or records, look at the DatabaseStatementInterface documentation.
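The same named-placeholder idea appears in other database layers too. Purely as an illustration (Python's sqlite3, nothing to do with Drupal, with made-up rows mirroring the pm_message shape), parameters travel as a mapping instead of being interpolated into the SQL string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pm_message (author INTEGER, thread_id INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO pm_message VALUES (:author, :thread_id, :ts)",
    [{"author": 1, "thread_id": 7, "ts": 100},
     {"author": 2, "thread_id": 7, "ts": 200},
     {"author": 1, "thread_id": 7, "ts": 300}],
)

# Named placeholders keep user input out of the SQL string itself,
# mirroring the :thread_id / :uid placeholders in the D7 query above.
row = conn.execute(
    "SELECT author FROM pm_message WHERE thread_id = :thread_id AND author <> :uid "
    "ORDER BY timestamp DESC LIMIT 1",
    {"thread_id": 7, "uid": 1},
).fetchone()
print(row[0])  # 2
```

In both cases the driver, not string concatenation, binds the values, which is what protects against SQL injection.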