Analyzing user driving pattern data - data-analysis

so I have a lot of GPXs of users driving data from a game project where object which are placed on the road and then the user collects it. I want to somehow analyze these data to find out how users tend to drive given different objects, which ones draw them the most, which ones draw least. I have not done any data analysis before, so how can I analyze these data to get this sort of information? This might sound very novice, but yeah any help is appreciated.

You would probably like to do this in Python if you are novice, and then you can use a library like this one (gpxpy) to explore your data.
That is a GPX parser, I believe it will provide you with the data you like to see.
In their documentation you can see that you can use it like that :
import gpxpy
import gpxpy.gpx
# Open a file
gpx_file = open('yourfile.gpx', 'r')
# Parse the file
gpx = gpxpy.parse(gpx_file)
# Iterate over the tracks
for track in gpx.tracks:
for segment in track.segments:
for pt in segment.points:
print(f'Point at ({pt.latitude},{pt.longitude}) -> {pt.elevation}')
for waypoint in gpx.waypoints:
print(f'waypoint {waypoint.name} -> ({waypoint.latitude},{waypoint.longitude})')
for route in gpx.routes:
print('Route:')
for pt in route.points:
print(f'Point at ({pt.latitude},{pt.longitude}) -> {pt.elevation}')
Once you have those points you can calculate the distances, speeds, etc. from the coordinates.

Related

Psychopy: how to avoid to store variables in the csv file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like to NOT be included. There are some variables which I decided to include in the CSV, but many others which automatically felt in it.
is there a way to manually force (from the code block) the exclusion of some variables in the CSV?
is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important and I know I could just create myself an output file without using the one of PsychoPy, or I can easily clean it afterwards but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
there is a special block called session_variable_order: [var1, var2, var3] in experiment_config.yaml file, which you probably should be using; also, you should consider these methods:
from psychopy import data
data.ExperimentHandler.saveAsWideText(fileName = 'exp_handler.csv', delim='\t', sortColumns = False, encoding = 'utf-8')
data.TrialHandler.saveAsText(fileName = 'trial_handler.txt', delim=',', encoding = 'utf-8', dataOut = ('n', 'all_mean', 'all_raw'), summarised = False)
notice the sortColumns and dataOut params

Understanding openaddresses data format

I have downloaded us-west geolocation data (postal addresses) from openaddresses.io. Some of the addresses in the datasets are not complete i.e., some of them doesn't have info like zip_code. Is there a way to retrieve it or is the data incomplete?
I have tried to search other files hoping to find any related info. The complete dataset doesn't contain any info relate to it. City of Mesa, AZ has multiple zip codes, so it is hard to assign one to the address. Is there any way to address this problem?
This is how data looks like (City of Mesa, AZ)
LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH
-111.8747353,33.456605,790,N DOBSON RD,,SRPMIC,,,,,dc0c53196298eb8d
-111.8886227,33.4295194,2630,W RIO SALADO PKWY,,MESA,,,,,c38b700309e1e9ce
-111.8867018,33.4290795,2401,E RIO SALADO PKWY,,TEMPE,,,,,9b912eb2b1300a27
-111.8832045,33.4232903,700,S EVERGREEN RD,,TEMPE,,,,,3435b99ab3f4f828
-111.8761202,33.4296416,2100,W RIO SALADO PKWY,,MESA,,,,,b74349c833f7ee18
-111.8775844,33.4347782,1102,N RIVERVIEW,,MESA,,,,,17d0cf1542c66083
Short Answer: The data incomplete.
The data in OpenAddresses.io is only as complete as the datasource it pulls from. OpenAddresses is just an aggregation of publicly available datasets. There's no real consistency between government agencies that make their data available. As a result, other sections of the OpenAddresses dataset might have city names or zip codes, but there's often something missing.
If you're looking to fill in the missing data, take a look at how projects like Pelias use multiple data sources to augment missing data.
Personally, I always end up going back to OpenStreetMaps (OSM). One could argue that OpenAddresses is better quality because it comes from official sources and doesn't try to fill in data using approximations, but the large gaps of missing data make it far less useful, at least on its own.

How to download IMF Data Through JSON in R

I recently took an interest in retrieving data in R through JSON. Specifically, I want to be able to access data through the IMF. I know virtually nothing about JSON so I will share what I [think I] know so far, and what I have accomplished.
I browsed their web page for JSON, which helped a little bit. It gave me the start point URL. Here is the web page; http://datahelp.imf.org/knowledgebase/articles/667681-using-json-restful-web-service
I managed to download (using the GET() and the fromJSON() functions) some lists, which are really bulky. I know enough about the lists that the "call" was successful, but I cannot for the life of me get actual data. So far, I have been trying to use the rawToChar() function on the "content" data but I am virtually stuck there.
If anything, I managed to create data frames that contain the codes, which I presume would be used somewhere in the JSON link. Here is what I have.
all.imf.data = fromJSON("http://dataservices.imf.org/REST/SDMX_JSON.svc/Dataflow/")
str(all.imf.data)
#all.imf.data$Structure$Dataflows$Dataflow$Name[[2]] #for the catalogue of sources
catalogue1 = cbind(all.imf.data$Structure$Dataflows$Dataflow$KeyFamilyRef,
all.imf.data$Structure$Dataflows$Dataflow$Name[[2]])
catalogue1 = catalogue1[,-2] # catalogue of all the countries
data.structure = fromJSON("http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/IFS")
info1 = data.frame(data.structure$Structure$Concepts$ConceptScheme$Concept[,c(1,4)])
View(data.structure$Structure$CodeLists$CodeList$Description)
str(data.structure$Structure$CodeLists$CodeList$Code)
#Units
units = data.structure$Structure$CodeLists$CodeList$Code[[1]]
#Countries
countries = data.frame(data.structure$Structure$CodeLists$CodeList$Code[[3]])
countries = countries[,-length(countries)]
#Series Codes
codes = data.frame(data.structure$Structure$CodeLists$CodeList$Code[[4]])
codes = codes[,-length(codes)]
# all.imf.data # JSON from the starting point, provided on the website
# catalogue1 # data frame of all the data bases, International Financial Statistics, Government Financial Statistics, etc.
# codes # codes for the specific data sets (GDP, Current Account, etc).
# countries # data frame of all the countries and their ISO codes
# data.structure # large list, with starting URL and endpoint "IFS". Ideally, I want to find some data set somewhere within this data base.
"info1" # looks like parameters for retrieving the data (for instance, dates, units, etc).
# units # data frame that indicates the options for units
I would just like some advice about how to go about retrieving any data, something as simple as GDP (PPP) for a constant year. I have been following an article in R blogs (which retrieved data in the EU's database) but I cannot replicate the procedure for the IMF. I feel like I am close to retrieving something useful but I cannot quite get there. Given that I have data frames that contain the names for the databases, the series and the codes for the series, I think it is just a matter of figuring out how to construct the appropriate URL for getting the data, but I could be wrong.
Provided in the data frame codes are the codes for the data sets I presume. Is there a way to make a call for the data for, let's say, the US for BK_DB_BP6_USD, which is "Balance of Payments, Capital Account, Total, Debit, etc"? How should I go about doing this in the context of R?

Latitude/Longitude Generation to be used as sample data

I am writing a demo web application that tracks multiple devices through my companies platform. I have the app working, but need a csv file that will simulate devices moving on a map as if they were a tracker attached to a car. The simulator works by reading 1 row of data every second (1 lat/lng point). Here is an example of the first few lines of a file that would work if the points weren't scattered across the US (the SclId is the device name).
SclId Latitude Longitude
HAT-0 44.968046 -94.420307
HAT-1 44.33328 -89.132008
HAT-2 33.755787 -116.359998
HAT-3 33.844843 -116.54911
HAT-4 44.92057 -93.44786
HAT-5 44.240309 -91.493619
HAT-0 44.968041 -94.419696
HAT-1 44.333304 -89.132027
HAT-2 33.755783 -116.360066
HAT-3 33.844847 -116.549069
HAT-4 44.920474 -93.447851
HAT-5 44.240304 -91.493768
If I had something that could create simulation data with mouse clicks it would save me a lot of time creating another program requiring me to drive around with a device and record the data to a CSV. Any help/recommendations would be greatly appreciated. If you don't understand the question please ask for clarification!
There is a website that I use to draw waypoint and download it as any format e.g., GPX, KML, JSON, CSV, etc.
https://www.alltrails.com/explore/map/map-e727fa5--12

How do you know what SRID to use for a shp file?

I am trying to put a SHP file into my PostGIS database, the the data is just a little off. I think this is because I am using the wrong SRID. The contents of the PRJ file are as follows:
GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]]
What SRID does this correlate to? And more generally, how can I look up the SRID based on the information found in the PRJ file? Is there a lookup table somewhere that lists all SRID's and their 'geogcs' equivalents?
The data imported using srid=4269 and 4326 were the exact same results.
Does this mean I'm using the wrong SRID, or is this just expected margin of error?
The shp file is from here.
To elaborate on synecdoche's answer, the SRID is sometimes called an "EPSG" code. The SRID/EPSG code is a defacto short-hand for the Well-Known-Text representations of projections.
You can do a quick search on the SRID table to see if you can find an exact or similar match:
SELECT srid, srtext, proj4text FROM spatial_ref_sys WHERE srtext ILIKE '%BLAH%'
Above was found at http://www.bostongis.com/?content_name=postgis_tut01.
You can also search on spatialreference.org for these kinds of things. The search tool is primitive so you may have to use a Google search and specify the site, but any results will show you the ESRI PRJ contents, the PostGIS SQL INSERT, and a bunch of other representations.
I think your PRJ is at: http://spatialreference.org/ref/sr-org/15/
Prj2EPSG is a small website aimed at exactly this problem; paste in the PRJ contents and it does its best to find a matching EPSG. They also have a web service API. It's not an exact science. They seem to use Lucene and the EPSG database to do text searches for matches.
The data seems to be NAD83, which has an SRID of 4269. Your PostGIS database has a spatial_ref_sys table which is the SRID lookup table.
If the data looks the same with an SRID of 4269 (NAD83) and 4326 (WGS84), then there's something wrong.
Go and download the GDAL utilities , the ogrinfo (which would spit the projection information) and ogr2ogr utilities are invaluable.
James gave already a link to spatialreference.org. That helps to find spatial reference information... I assume you did load the spatial_ref_sys.sql when you prepared your postgis instance.
And to be honest, I don't think the problem is in the PostGIS side of things.
I usually keep my data in different SRIDs in my PostGIS dbs. However, I always need to project to the output SRS. You are showing OpenStreetMap pre-rendered tiles, and I bet they have been drawn using SRID 900913 (the Google Map's modified mercator projection that now everyone uses to render).
My recommendation to you is:
1- Set the right projection in the OpenLayers code which matches whatever tiles you are reading from.
2.- Keep the data in the database in whatever SRID you want (as long as it is correct of course).
3.- Make sure the server you are using to generate the images from your data (ArcGIS Server, Mapserver, GeoServer or whatever it is) is reprojecting to that same SRS.
Everything will match.
Cheers
Use GDAL's OSR Python module to determine the code:
from osgeo import osr
srsWkt = '''GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]]'''
# Load in the projection WKT
sr = osr.SpatialReference(srsWkt)
# Try to determine the EPSG/SRID code
res = sr.AutoIdentifyEPSG()
if res == 0: # success
print('SRID=' + sr.GetAuthorityCode(None))
# SRID=4269
else:
print('Could not determine SRID')
Be sure to take a look at: http://www.epsg-registry.org/
Use the Query by Filter option and enter: North American Datum 1983.
This yields -> EPSG:6269.
Hope this works for you.