Techniques to analyze urbanization patterns using population and urban area data? - data-analysis

I have population data (counts) and urban area (sq. m.) for several cities at three census years (1991, 2001, and 2011). I want to identify the urbanization patterns of those cities.
Could anyone suggest how to do that using standard, in-depth methods?
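A minimal sketch in R (with made-up numbers and hypothetical column names) of some standard first-pass measures: decadal population growth, urban expansion rate, and population density over time. Comparing cities on these trajectories is a common starting point before anything more elaborate.

# Hypothetical data: one row per city per census year
df <- data.frame(
  city = c("A", "A", "A"),
  year = c(1991, 2001, 2011),
  pop  = c(120000, 180000, 260000),  # population count
  area = c(25e6, 40e6, 70e6)         # urban area in sq. m.
)

library(dplyr)
df %>%
  group_by(city) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    pop_growth  = pop / lag(pop) - 1,    # decadal population growth rate
    area_growth = area / lag(area) - 1,  # decadal urban expansion rate
    density     = pop / area             # persons per sq. m.
  )

If area expands faster than population (area_growth > pop_growth, i.e. falling density), that is often read as sprawl; the reverse suggests densification.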

Related

Analyse data by degree of effect

Hello everyone! I'm a newbie studying data analysis.
If you'd like to see how A, B, and C affect an outcome, you can use several models such as KNN, SVM, or logistic regression (as far as I know).
But all of those are essentially categorical classifiers, rather than measures of the degree of effect.
Let's say I'd like to show how fonts and colors contribute to the degree of attraction (as shown).
What models can I use?
A thousand thanks!
If your input consists only of categorical variables (each taking a few values), then there are only finitely many possible input combinations, and therefore the model can produce only finitely many distinct outputs. Just a warning.
If you use, say, KNN or a random forest in regression mode, you can use the L2 norm as your error metric. It emphasizes that 1 is closer to 2 than to 5 (please don't forget to normalize).
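To make that concrete, here is a minimal sketch (made-up data and variable names) of a random forest used in regression mode, so the prediction is a numeric degree rather than a class and an L2-style error applies:

# Random forest regression: categorical inputs, numeric outcome
library(randomForest)

set.seed(1)
d <- data.frame(
  font  = factor(sample(c("serif", "sans", "mono"), 200, replace = TRUE)),
  color = factor(sample(c("red", "blue", "green"), 200, replace = TRUE)),
  attraction = runif(200, 1, 5)  # hypothetical 1-5 attraction rating
)

fit <- randomForest(attraction ~ font + color, data = d)

# RMSE (the L2-type metric) using out-of-bag predictions
sqrt(mean((predict(fit) - d$attraction)^2))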

What is the term for the conceptual distance between ordered nodes?

What is the name for the ordered relation between nodes?
For example: a color ontology represented in a trie has ordered color objects such that the marginal node between yellow and blue is green, the node between blue and green is teal, and so on. I have been calling this 'indexical'.
I found that the term 'indexical' is already owned by linguistics (en.wikipedia.org/wiki/Indexicality). I have used it in academic presentations to a civil engineering audience with more computer science awareness than usual, and nobody questioned my definition.
Searching online, I found 'edit distance' and 'ordered'. Neither has the meaning I want.
In my presentation, I use the spork, sitting between the spoon and the fork, as an example of a marginal object that requires a new node between spoon and fork (www.youtube.com/watch?v=ruJ76-o5lxU).
A broad example: take every product in a grocery store and line the items up, arranging them so that closeness represents similarity. Oranges and apples will end up closer together than beef and fish, and both pairs will be closer together than either is to paper towels. (One way to compute such a line-up is sketched below.)
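Whatever the relation is called, the line-up itself can be computed by ordination. A minimal sketch in R, using classical multidimensional scaling on a hypothetical dissimilarity matrix (0 = identical, 1 = unrelated):

items <- c("orange", "apple", "beef", "fish", "paper towels")
d <- matrix(c(0.0, 0.1, 0.8, 0.8, 1.0,
              0.1, 0.0, 0.8, 0.8, 1.0,
              0.8, 0.8, 0.0, 0.3, 0.9,
              0.8, 0.8, 0.3, 0.0, 0.9,
              1.0, 1.0, 0.9, 0.9, 0.0),
            nrow = 5, dimnames = list(items, items))

# One-dimensional embedding: similar items land close together on the line
sort(cmdscale(as.dist(d), k = 1)[, 1])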
EDIT1: Revised the examples to allow any position between two points.
EDIT2: Simplified question.

Map City Boundary Data To Population Google Fusion Tables

I'm trying to find KML or polygon data for every city, and then combine it with population data to create a heatmap in Google Fusion Tables. I'm not sure where to get the polygon data for the cities, though.
I'm using this to compare population sizes against distance, to map out territories that are worthwhile to campaign in while keeping driving distances reasonable.
Any suggestions are welcome. Thanks!
Natural Earth has a large data set that might be useful.
From their Features page, under Cultural:
Urban polygons – derived from 2002-2003 MODIS satellite data.
They're available at the 1:10m and 1:50m scales.
http://www.naturalearthdata.com/downloads/ (Note: The urban polygons are in the Cultural data set!)
It depends on the city, but most large ones (e.g. Seattle, San Francisco, LA, New York, Chicago) have county or state GIS data warehouses that can be accessed for free for public data such as transportation networks, zip-code, or census-tract (block-group) polygons.
It'll take a bit of sifting, but a Google search combining the city name with keywords like kml, shapefile, zipcodes, or GIS data should give you a place to start.
One example would be to type "LA GIS data" into Google.
If you're struggling and want specific links, let me know which cities you're interested in.
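If you end up with shapefiles rather than KML, here is a minimal sketch (hypothetical file and column names) of joining polygons to population in R with the sf package; st_write() can then export KML for Fusion Tables:

library(sf)

cities <- st_read("ne_10m_urban_areas.shp")  # e.g. Natural Earth urban polygons
pop    <- read.csv("city_population.csv")    # hypothetical columns: name, population
joined <- merge(cities, pop, by = "name")

plot(joined["population"])                   # quick population choropleth
st_write(joined, "cities_population.kml", driver = "KML")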

Extract GLM coefficients for one factor only

I am running a series of GLMs for a number of species of the form:
glm.sp <- glm(number ~ site + as.factor(year) + offset(log(visits)),
              family = poisson, data = data.sp)
Note that the year term is deliberately a factor, as it is not reasonable to assume a linear relationship. The yearly coefficients produced by this model measure the number of each species per year, accounting for the amount of effort (visits). I then want to extract, exponentiate, and index (relative to the last year) the year coefficients, and run a GAM on them.
Currently I do this by eyeballing the coefficients and calling them directly:
data.sp.coef$coef <- exp(glm.sp$coefficients[60:77])
However, because the number of sites and the number of years recorded differ between species, this means eyeballing each species separately. For example, a different species might have its year coefficients at positions 51:64. I'd rather not do that, and feel there must be a better way of pulling out the year coefficients.
I've tried the below (which doesn't work!):
> coef(glm.sp)["year"]
<NA>
NA
I also tried saving all the coefficients as a data frame and using a fuzzy search to extract all the values whose names contain "year" (the coefficients are automatically named in the format yearXXXX-YY).
I'm certain I'm missing something simple, so I would very much appreciate being prodded in the right direction!
Thanks
Matt
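The fuzzy-search idea works directly on the coefficient names, with no eyeballing of positions. A minimal sketch, assuming the fitted model is named glm.sp as above:

# Pull out every coefficient whose name contains "year", however many there are
year_idx   <- grep("year", names(coef(glm.sp)), fixed = TRUE)
year_coefs <- exp(coef(glm.sp)[year_idx])

# Index relative to the last year
year_coefs / year_coefs[length(year_coefs)]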

Area of overlap of two circular Gaussian functions

I am trying to write code that finds the overlap between 3D shapes.
Each shape is defined by two intersecting normal distributions (one in the x direction, one in the y direction).
Do you have any suggestions of existing code that addresses this question or functions that I can utilize to build this code? Most of my programming experience has been in R, but I am open to solutions in other languages as well.
Thank you in advance for any suggestions and assistance!
The longer research context for this question: I am studying the use of acoustic space by insects. I want to know whether randomly assembled groups of insects would have calls that are more or less similar than those we observe in natural communities (a randomization test). To do so, I need to randomly select insect species and calculate the similarity between their calls.
For each species, I have a mean and variance for two call characteristics that are approximately normally distributed. I would like to use these two call characteristics to build a 3D probability distribution (a bivariate PDF surface) for each species. I would then like to calculate the amount by which the PDF for one species overlaps that of another.
Please accept my apologies if the question is not clear or appropriate for this forum.
I work in small molecule drug discovery, and I frequently use a program (ROCS, by OpenEye Scientific Software) based on algorithms that represent molecules as collections of spherical Gaussian functions and compute intersection volumes. You might look at the following references, as well as the ROCS documentation:
(1) Grant and Pickup, J. Phys. Chem. 1995, 99, 3503-3510
(2) Grant, Gallardo, and Pickup, J. Comp. Chem. 1996, 17, 1653-1666
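As a numerical complement to the analytic approaches in those references, here is a minimal sketch in R (hypothetical means and standard deviations, assumes the mvtnorm package) of the overlapping coefficient, i.e. the integral of the pointwise minimum of two circular bivariate Gaussian PDFs:

library(mvtnorm)

overlap_2d <- function(mu1, sd1, mu2, sd2, n = 200) {
  # Grid wide enough to cover both distributions (+/- 5 SD on each axis)
  s <- 5 * max(sd1, sd2)
  xs <- seq(min(mu1[1], mu2[1]) - s, max(mu1[1], mu2[1]) + s, length.out = n)
  ys <- seq(min(mu1[2], mu2[2]) - s, max(mu1[2], mu2[2]) + s, length.out = n)
  grid <- as.matrix(expand.grid(x = xs, y = ys))
  f <- dmvnorm(grid, mean = mu1, sigma = diag(sd1^2, 2))
  g <- dmvnorm(grid, mean = mu2, sigma = diag(sd2^2, 2))
  cell <- diff(xs)[1] * diff(ys)[1]  # area of one grid cell
  sum(pmin(f, g)) * cell             # overlap coefficient, in [0, 1]
}

# Two hypothetical species' calls
overlap_2d(mu1 = c(2, 5), sd1 = 1.0, mu2 = c(3, 6), sd2 = 1.5)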