I'm looking for a SODA API function that would allow me to select/extract the day of week, hour, etc. from a floating timestamp field. In other words, I'm looking for a SODA equivalent to SQL's DATEPART.
The API has a few functions for manipulating timestamp fields, such as date_trunc_y(...), but this is not quite what I'm looking for.
Socrata has since added date extract functions that operate on timestamp columns.
See the "SoQL" Query functions listed here:
https://dev.socrata.com/docs/functions/
Socrata seems to use a PostgreSQL backend, so you might find the syntax familiar.
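As a rough illustration of those date extract functions, here is a minimal sketch that builds a SODA query URL grouping rows by day of week and hour. The domain, dataset ID and column name are hypothetical placeholders, and build_soda_url is just a helper name made up for this example; date_extract_dow and date_extract_hh are taken from the SoQL function list linked above.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute a real Socrata domain, dataset ID and column.
DOMAIN = "data.example.gov"
DATASET_ID = "abcd-1234"
TIMESTAMP_COLUMN = "created_date"


def build_soda_url(domain, dataset_id, column):
    """Build a SODA query URL that counts rows per day of week and hour."""
    dow = f"date_extract_dow({column})"
    hour = f"date_extract_hh({column})"
    params = {
        "$select": f"{dow} AS day_of_week, {hour} AS hour, count(*) AS n",
        # Group by the full expressions rather than relying on alias support.
        "$group": f"{dow}, {hour}",
    }
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


url = build_soda_url(DOMAIN, DATASET_ID, TIMESTAMP_COLUMN)
print(url)
```

The URL can then be fetched with any HTTP client; the sketch stops at building it so that it runs without network access.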
I am trying to understand the uses and limitations of outer joins in Tableau (Tableau Online in this case). I have found the behaviour of Tableau to be not what I expected.
I have described my problems in as much detail as I can below, to avoid any ambiguity and because I don't know where to start anymore. I hope I have not gone overboard (edits welcome).
Specifics of my use case
I am creating a join between two .csv files that have logged natural data at specific time intervals. One set is logged hourly; the other is logged at intervals of minutes (which vary due to various factors).
'Rain' data set(1):
Date and Time | Rain
01/01/2018 00:00 | 0
01/01/2018 01:00 | 0.4
01/01/2018 02:00 | 1.4
01/01/2018 03:00 | 0.4
'Fill' data set (2):
Date and Time | Fill
24/04/2018 06:04 | 78
24/04/2018 12:44 | 104
24/04/2018 18:51 | 96
25/04/2018 00:20 | 84
Unsurprisingly, I have many nulls in the data (which is not a problem to me) as:
'Rain' has a longer time series
In either data set, the majority of date times do not have an exact equivalent in the other
screenshot of data join here
What I am trying to achieve
I am trying to graph the two data sets in such a way that I can compare the full data sets against each other, in all of the following ways:
Monthly or Yearly aggregation (average)
Hourly aggregation (average)
Exact times
Problems (and my limited assumptions)
Once graphed in Tableau, some values had 'null' DateTime values*.
Once graphed in Tableau, it appears as if many points are simply missing**.
Graphing using 'Fill' time series
Graphing using 'Rain' time series
I had assumed (given the full outer join on the 'Date and Time' fields) that Tableau would join the data sets in chronological order with a common date-time series.
* I had assumed it impossible for the join conditions to have 'null' values without throwing an error. Also, the data is clean and uniform
** And this is when aggregating monthly, which I assumed would not be affected by any (if any) hourly/minute mismatches
So, finally, the question
In my reading of the online help documentation I am struggling to find functionality native to Tableau that can help me achieve these specific goals. I am reaching the worrying conclusion that Tableau was not built for this type of 'visual analytics'.
Is there functionality native to Tableau that will allow me to combine the data in the way I described above?
Approaches I have considered
Since I have two .csv files I could combine both sets so that I have the full, granular 'Date and Time' fields in one tall list.
However, I would like to find a method that is native to Tableau (Online) because in future at least some of the data will come from a database (Postgres) connection, while other data will likely have to remain uploaded as .csv or Excel files.
Again I ask
What am I overlooking with regard to how (and why) to use Tableau?
I am not looking for a complete solution, but what tools could I use to achieve this?
Many thanks for any help
Your data sources are at different levels of granularity: one is hourly (coarser) and the other is in minutes (finer). Your requirements, however, span several levels of aggregation:
Year/Month -- High aggregation
Hourly -- Medium aggregation
Exact -- Lower aggregation
When you join two data sources on dates and times that would never match, you will get these kinds of weird results.
Possible Solution:
There is a tool called Tableau Prep. Use it to bring both data sources to the same level of aggregation (in your case, aggregate data set 2 to the hour level) and then join the two tables. With this approach you need to check your last requirement (exact times), as I assume you want charts at the minute level.
The other solution is to use blending, where the primary data source is data set 1 and the secondary data source is data set 2. In this case you will get the required data, with Tableau managing the aggregation and granularity.
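To make the granularity point concrete outside Tableau, here is a minimal Python sketch of the first idea: roll the minute-level readings up to the hour, then do a full outer join on the hour. The sample values and the hourly_mean helper are made up for illustration (they are not the asker's data), and this is not how Tableau Prep does it internally.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"

# Hypothetical sample readings in the same layout as the two .csv files.
rain = {"24/04/2018 06:00": 0.4, "24/04/2018 07:00": 1.4, "24/04/2018 12:00": 0.0}
fill = {
    "24/04/2018 06:04": 78,
    "24/04/2018 06:40": 82,
    "24/04/2018 12:44": 104,
    "24/04/2018 18:51": 96,
}


def hourly_mean(readings):
    """Average all readings that fall within the same clock hour."""
    buckets = defaultdict(list)
    for stamp, value in readings.items():
        hour = datetime.strptime(stamp, FMT).replace(minute=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


rain_h = hourly_mean(rain)
fill_h = hourly_mean(fill)

# Full outer join on the hour: keep every hour present in either set,
# with None where one side has no reading.
joined = {
    hour: (rain_h.get(hour), fill_h.get(hour))
    for hour in sorted(rain_h.keys() | fill_h.keys())
}

for hour, (rain_value, fill_value) in joined.items():
    print(hour, rain_value, fill_value)
```

Once both sides share the hourly key, matching rows line up and only the genuinely missing hours are left as nulls.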
Let me know how it goes
So it appears as if various solutions are available.
I want to post this now but will re-edit when I get a bit more time
Option 1
One work-around/solution I found was to create a calculated field as mentioned here and then graph everything against this time series.
This worked well for me even after having created 20+ sheets and numerous dashboards.
As mentioned below, other uses may not provide this flexibility.
Calculation:
IFNULL([Date and Time (Fill.csv)],[Date and Time (Rain.csv)])
Option 2
As mentioned by matt_black, a join of the data performs the job quite well. It seems less hacky and is perfect when starting from a clean slate.
I had difficulty creating a join on data sources already in use (will do more poking around on this)
Option 3 ?
As in the answer provided by Siva, blending may be an option.
I have not confirmed this as of yet.
In order to analyze dates and times I am creating a MySQL table where I want to keep the time information. Some example analyses will be stuff like:
Items per day/week/month/year
Items per weekday
Items per hour
etc.
Now, with regard to performance, how should I store the date in my table:
date type: Unix timestamp?
date type: datetime?
or keep each part of the date in its own field, e.g. year, month, day in separate fields?
The last one, for example, would be handy if I'm analysing by weekday; I wouldn't have to perform WEEKDAY(item.date) on MySQL but could simply use WHERE item.weekday = :w.
Based on your usage, you want to use the native datetime format. Unix formats are most useful when the major operations are (1) ordering; (2) taking differences in seconds/minutes/hours/days; and (3) adding seconds/minutes/hours/days. They need to be converted to internal date time formats to get the month or week day, for instance.
You also have a potential indexing issue. If you want to select ranges of days, hours, months and so on for your results, then you want an index on the column. For this purpose an index on a datetime is probably sufficient.
If the summaries are by hour, you might find it helpful to store the date component in a date field and the hour in a separate column. That would be particularly helpful if you are combining hours from different days.
Whether you break out other components of the date, such as weekday and month, for indexing purposes would depend on the volume of data in the table, performance requirements, and the queries you are planning on running. I would not be inclined to do this, except as a later optimization.
The rule of thumb is: store things as they should be stored, and don't do performance tweaks until you hit a bottleneck. If you store your date as separate fields, you'll eventually stumble upon a situation where you need the date as a whole inside your database (e.g. an update query for a particular range of time), and that will be painful: a condition for 3 April 2015 to 15 May 2015 would be needlessly large.
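To illustrate how awkward that range condition gets, here is a small sketch comparing the two layouts. It uses Python's built-in sqlite3 purely so the example is runnable; the item table and its dt, y, m, d columns are hypothetical, and in MySQL you would use a real DATE or DATETIME column instead of ISO-formatted text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the same date both whole (dt) and split (y, m, d).
conn.execute("CREATE TABLE item (dt TEXT, y INTEGER, m INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?)",
    [
        ("2015-04-02", 2015, 4, 2),
        ("2015-04-03", 2015, 4, 3),
        ("2015-04-30", 2015, 4, 30),
        ("2015-05-15", 2015, 5, 15),
        ("2015-05-16", 2015, 5, 16),
    ],
)

# Range on the whole date: a single BETWEEN.
whole = """
SELECT dt FROM item
WHERE dt BETWEEN '2015-04-03' AND '2015-05-15'
ORDER BY dt
"""

# The same range on split fields: one branch per partial month, and it
# grows further as soon as the range spans more months or several years.
split = """
SELECT dt FROM item
WHERE y = 2015
  AND ((m = 4 AND d >= 3) OR (m = 5 AND d <= 15))
ORDER BY dt
"""

by_whole = [row[0] for row in conn.execute(whole)]
by_split = [row[0] for row in conn.execute(split)]
print(by_whole)
print(by_split)
```

Both queries return the same rows, but only the first stays readable when the range changes.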
You should keep your dates as a date type. This gives you maximum flexibility and (most probably) query readability, and keeps all your options for working with them open. The only thing I can really recommend is additionally storing the same date split into year/month/day in extra columns. Of course, this will bloat your database and require extreme caution in update scenarios, but it will allow you to use any variant of the source data in your queries.
I've got a dataset that I want to be able to slice up by date interval. It's a bunch of scraped web data and each item has a Unix-style millisecond timestamp as well as a standard UTC datetime.
I'd like to be able to query the dataset, picking out the rows that are closest to various time intervals:
e.g.: Every hour, once a day, once a week, etc.
There is no guarantee that the timestamps are going to fall evenly on the interval times, otherwise I'd just do a mod query on the timestamp.
Is there a way to do this with SQL commands that doesn't involve stored procs or some sort of pre-computed support tables?
I use the latest MariaDB.
EDIT:
The marked answer doesn't quite answer my specific question but it is a decent answer to the more generalized problem so I went ahead and marked it.
I was specifically looking for a way to query a set of data where the timestamp is highly variable and to grab out rows that are reasonably close to periodic time intervals. E.g.: get all the rows that are the closest to being on 24 hour intervals from right now.
I ended up using a modulus query to solve the problem: timestamp % interval < average spacing between data points. This occasionally grabs extra points and misses a few but was good enough for my graphing application.
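For anyone wanting to see the difference, here is a minimal Python sketch of both ideas on plain integer timestamps: the modulus filter described above, and a variant that keeps the single point nearest each interval boundary. The function names and sample numbers are made up for illustration; this is not the query I actually ran.

```python
def modulus_filter(timestamps, interval, spacing):
    """Keep points that fall within `spacing` after an interval boundary."""
    return [ts for ts in timestamps if ts % interval < spacing]


def closest_to_ticks(timestamps, interval):
    """Keep, for each interval boundary, the one timestamp nearest to it."""
    best = {}
    for ts in timestamps:
        # Round to the nearest multiple of `interval`.
        tick = ((ts + interval // 2) // interval) * interval
        if tick not in best or abs(ts - tick) < abs(best[tick] - tick):
            best[tick] = ts
    return [best[tick] for tick in sorted(best)]


# Hypothetical timestamps in seconds, with hourly (3600 s) boundaries.
stamps = [10, 3590, 3700, 7150, 7300, 9000]
print(modulus_filter(stamps, 3600, 200))
print(closest_to_ticks(stamps, 3600))
```

The modulus version only sees points just after a boundary, which is why it sometimes misses the nearest point or picks up extras; the second version looks on both sides but needs a pass over the data outside plain SQL.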
And then I got sick of the node-mysql library crashing all the time so I moved to MongoDB.
You say you want 'closest to various time intervals' but then say 'every hour/day/week', so the actual implementation will depend on what you really want, but you can use a host of standard date/time functions to group records, for example count by day:
SELECT DATE(your_DateTime) AS Dt, COUNT(something) AS CT
FROM yourTable
GROUP BY DATE(your_DateTime)
Count by Hour:
SELECT DATE(your_DateTime) AS Dt, HOUR(your_DateTime) AS Hr, COUNT(something) AS CT
FROM yourTable
GROUP BY DATE(your_DateTime), HOUR(your_DateTime)
See the full list of supported date and time functions here:
https://mariadb.com/kb/en/date-and-time-functions/
Is there any concept in MySQL for localizing the date and time functions? I need dates and times to be in Ethiopian format, or at least a MySQL conversion function that displays a datetime according to the Ethiopian calendar.
No, you cannot. MySQL uses the proleptic Gregorian calendar, and there are no other formats. That means there are no native ways to produce such conversions.
However, you can create your own output conversion. To do this, you will need to know both calendars. There are good online converters which you can use to get some examples. So you will store your dates in Gregorian (i.e. MySQL internal) format, but convert them in your application as needed.
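As an example of what such an application-side conversion might look like, here is a minimal Python sketch based on Julian Day Numbers. The epoch constant assumes the Amete Mihret era, and to_ethiopian is a name made up for this example; check the output against an online converter before relying on it.

```python
from datetime import date

# Assumed constants: Julian Day Number of the Ethiopian (Amete Mihret) epoch,
# and the offset from Python's proleptic Gregorian ordinal to a Julian Day Number.
ETHIOPIC_EPOCH_JDN = 1723856
ORDINAL_TO_JDN = 1721425


def to_ethiopian(gregorian):
    """Convert a Gregorian date to an Ethiopian (year, month, day) tuple."""
    jdn = gregorian.toordinal() + ORDINAL_TO_JDN
    # The calendar repeats in 4-year cycles of 1461 days.
    cycles, r = divmod(jdn - ETHIOPIC_EPOCH_JDN, 1461)
    # Zero-based day of the year; the 1461st day of a cycle is the leap day.
    n = r % 365 + 365 * (r // 1460)
    year = 4 * cycles + r // 365 - r // 1460
    # Twelve 30-day months followed by a short 13th month of 5 or 6 days.
    return year, n // 30 + 1, n % 30 + 1


print(to_ethiopian(date(2007, 9, 12)))
```

The date stored in MySQL stays Gregorian; only the value shown to the user goes through the conversion.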
MySQL does not have a conversion function for the Ethiopic calendar.
The most straightforward approach would be to create a date table that contains the Gregorian date in one column and the Ethiopian date in the next, allowing you to easily join it on a standard date to get your Ethiopian date.
MySQL doesn't support the Ethiopian calendar, but you can convert a Gregorian date to Ethiopic by creating a MySQL function, and you can also adapt it for other databases and programming languages.
This is the GitHub link click here
Distilling this project down to the simplest of terms:
Users click a button, a record is made with a timestamp of NOW().
NOW() of course equals the time on the server of record creation.
I need to show them stats based on their timezone, not mine.
What is the best method for dealing with time zone offsets in MySQL? Is there a specific field format that is designed to deal with an offset?
I will need to run things along the lines of:
SELECT DATE_FORMAT(b_stamp, '%W') AS week_day, count(*) AS b_total, b_stamp
FROM table
WHERE
(b_stamp >= DATE_SUB(NOW(), INTERVAL 7 DAY))
AND
(user_id = '$user_id') GROUP BY week_day ORDER BY b_stamp DESC
I would rather not ask the user what time zone they are in, I assume JS is the only way to pull this data out of the browser. Maybe if they are on a mobile device, and this is not a web based app, I could get it there, but that may not be the direction this goes in.
I am considering the best way may be to determine their offset, and set a variable to "server_time" +/- their_offset. This makes it appear as if the server is in a different location. I believe this would be best, as there would be no additional +/- logic I need to add to the code, muddying it and making it ugly.
On the other hand, that puts the data in the database with time stamps that are all over the board.
Suggestions?
You can use javascript to get timezone from client as follows:
var timeZone = (new Date().getTimezoneOffset() / 60) * (-1); // hours ahead of UTC (negative if behind)
Print the variable out and test it before using it. I think this will be your simplest bet.
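On the server side, applying that offset before bucketing by weekday might look like the following minimal Python sketch. It assumes the timestamps are stored in UTC and that the browser sends the raw getTimezoneOffset() value in minutes; to_user_local is a hypothetical helper name. A fixed offset ignores daylight saving changes, so treat it as an approximation.

```python
from datetime import datetime, timedelta, timezone


def to_user_local(utc_dt, js_offset_minutes):
    """Shift an aware UTC datetime into the user's fixed-offset zone.

    js_offset_minutes is the raw value of JavaScript's getTimezoneOffset(),
    which is positive for zones behind UTC, hence the sign flip.
    """
    user_tz = timezone(timedelta(minutes=-js_offset_minutes))
    return utc_dt.astimezone(user_tz)


# A click recorded at 02:00 UTC on Monday 1 January 2024 ...
stamp = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
# ... made by a user whose browser reports an offset of 300 minutes (UTC-5).
local = to_user_local(stamp, 300)
print(stamp.strftime("%A %H:%M"), "->", local.strftime("%A %H:%M"))
```

For that user the click belongs to Sunday evening, not Monday, which is exactly the kind of shift the per-weekday stats need to account for.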
Other than using JS, you could get the time zone of their IP address (using something like ip2location) then use MySQL's CONVERT_TZ() function.