R Date Formatting from RVEST to Short Date - html

I have a dataset which has differing types of date formats and I need to standardize them.
URL | Last_Updated | Reviewed | Date_Found | Crawl_Date
URL.html | January 21, 2016 | April 11, 2016 | 2019-02-11T03:50:01Z/ | 2021-03-04 01:27:08
Second, I need a "max" field, but it has to prioritize the Last_Updated date field. I'm assuming it'll end up being some kind of if/then statement, but I'm not sure how to proceed without a standardized date format for the 4 dates I have. Ultimately, I need to use this "max_date" to calculate the number of days between the crawl_date and the last_updated date.
So far, I have this:
URLtoDate$date_diff <- as.Date(as.character(URLtoDate$crawl_date), format="%Y/%m/%d")- as.Date(as.character(URLtoDate$max.date), format="%Y/%m/%d")
But as my dates aren't in the same format, I'm getting NAs across the dataset. Any help is greatly appreciated.
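One possible approach, sketched here in Python rather than R purely to illustrate the logic: try each known format in turn until one parses, then apply the "prefer Last_Updated" rule. The names `parse_date` and `pick_max_date` are hypothetical, and the fallback rule (use the latest of the other dates when Last_Updated is missing) is an assumption, since the question does not spell it out. The month-name format assumes an English locale.

```python
from datetime import date, datetime
from typing import Optional

# The three layouts seen in the sample row (assumed to be the only ones).
FORMATS = (
    "%B %d, %Y",            # January 21, 2016
    "%Y-%m-%dT%H:%M:%SZ",   # 2019-02-11T03:50:01Z
    "%Y-%m-%d %H:%M:%S",    # 2021-03-04 01:27:08
)

def parse_date(text: Optional[str]) -> Optional[date]:
    """Return a date for any known layout, or None if nothing matches."""
    if not text:
        return None
    cleaned = text.strip().rstrip("/")  # the sample Date_Found ends with a stray "/"
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None

def pick_max_date(last_updated, reviewed, date_found):
    """Prefer last_updated; otherwise fall back to the latest remaining date."""
    if last_updated is not None:
        return last_updated
    candidates = [d for d in (reviewed, date_found) if d is not None]
    return max(candidates) if candidates else None

last_updated = parse_date("January 21, 2016")
reviewed = parse_date("April 11, 2016")
date_found = parse_date("2019-02-11T03:50:01Z/")
crawl_date = parse_date("2021-03-04 01:27:08")

max_date = pick_max_date(last_updated, reviewed, date_found)
date_diff = (crawl_date - max_date).days
print(max_date, date_diff)
```

In R, the same "try several formats" idea is available through `lubridate::parse_date_time()`, which accepts a vector of candidate orders.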

Related

What data type to be used for storing dates like '01-05'?

I want to store the list of all public holidays in a year.
Employees can then take leave on these days. The leave details will be stored in another table.
My issue is that I need a table listing all the public holidays and the corresponding leave.
For example, 1st May must be listed for May Day.
However, I can't store a full date here (01-05-2014), because the same day and month (01-05) recur every year.
So how can I store these dates in a MySQL table?
My current table structure is:
mysql> desc table;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| leaveCode | text | NO | | NULL | |
| date | date | YES | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
Here, I use date as the data type for the date column, but it's not working.
When I try inserting values, the date field is not populated:
INSERT INTO table VALUES ('', 'May Day', 'MD', '01-05');
I need a way to store these dates in the table in DD-MM format (e.g. 01-05).
Can someone please help me?
I'd prefer to store the day and the month in separate numeric fields, because if you store a date you have to choose an arbitrary year, and it has to be one with 366 days so that February 29 stays valid.
A date represents a specific moment in time. You're not storing a date here, although you could use the DATE representation to hold your information.
But if you really want to use the DATE column, you can choose some arbitrary leap year and store your "date" as YYYY-MM-DD.
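A minimal sketch of the separate month/day idea, written in Python for illustration. The function names and the choice of 2000 as the reference leap year are assumptions; any leap year would do for validation.

```python
from datetime import date

REFERENCE_LEAP_YEAR = 2000  # arbitrary leap year, used only to validate (month, day)

def is_valid_month_day(month: int, day: int) -> bool:
    """True if (month, day) exists in a leap year, so 29 February is accepted."""
    try:
        date(REFERENCE_LEAP_YEAR, month, day)
    except ValueError:
        return False
    return True

def holiday_in_year(month: int, day: int, year: int) -> date:
    """Turn a stored (month, day) pair into a concrete date for one year.

    Raises ValueError for 29 February in a non-leap year.
    """
    return date(year, month, day)

may_day = (5, 1)  # what the two numeric columns would hold for May Day
print(is_valid_month_day(*may_day))
print(holiday_in_year(*may_day, 2014))
```

The stored pair stays year-independent, and a real date is only built when a specific year is needed.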
You can't store a holiday list as just month and day, because holidays vary from year to year on the calendars most commonly used for business. Religious holidays are notorious for wandering around the calendar. Consider:
Easter
Passover
Ramadan
Add to that holidays such as Chinese New Year, American Thanksgiving, and American Presidents' Day, and you might start to think that calendar days have nothing to do with holidays. Okay, a few holidays do fall on fixed dates: New Year's Day, May Day, the Fourth of July, and Bastille Day. Even so, in the United States there is a tendency to move a national holiday onto a Monday when it would otherwise fall on a weekend.
You need a holiday table or calendar table with holiday information. When I've needed such a beast in the past, I've just used the holidays list provided with gmacs. I'm pretty sure there are other sources on the web for this information.
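To illustrate why month and day alone are not enough, here is a small Python sketch that computes a rule-based holiday. It uses American Thanksgiving, which falls on the fourth Thursday of November; the helper name `nth_weekday` is hypothetical, and holidays such as Easter or Ramadan need far more involved rules than this.

```python
from datetime import date, timedelta

THURSDAY = 3  # datetime convention: Monday is 0, Sunday is 6

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th occurrence (1-based) of a weekday in the given month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first requested weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

for year in (2014, 2021):
    print(year, nth_weekday(year, 11, THURSDAY, 4))
```

The result lands on a different day of the month each year, which is why a per-year holiday or calendar table is the safer design.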

Store multiple ranges in mysql table to make queries faster/easier

I have data with multiple time ranges. For example, consider the following columns:
| from1 | to1 | from2 | to2 | from3 | to3 |
| 06:00 | 07:30 | 09:30 | 12:30 | 13:30 | 15:45 |
| 05:00 | 06:30 | 08:15 | 14:40 | 16:30 | 18:25 |
Now, if I want to search for a time, say 08:30, I have to add three clauses to the query to check whether the input falls within any of the three from-to pairs.
In the above case, it would return the second row, as 08:30 lies in that row's second from-to pair.
What would be the best practice for this? It is OK if I have to change my data model and stop storing the ranges in columns as shown above, as long as I can quickly and easily search through thousands of records.
I can't think of a better alternative, so please suggest one.
I found this while searching for the same problem.
It seems your data model is not normalized. You should consider morja's suggestion about creating an additional table.
Below is a really ugly query that checks whether a date is in any of the three ranges and then returns the matching rate.
select case
         when date '2010-12-05' between range1_from and range1_to then range1_rate
         when date '2010-12-05' between range2_from and range2_to then range2_rate
         when date '2010-12-05' between range3_from and range3_to then range3_rate
       end as rate
  from events
 where date '2010-12-05' between range1_from and range1_to
    or date '2010-12-05' between range2_from and range2_to
    or date '2010-12-05' between range3_from and range3_to;
Ref.: "SQL query for finding a value in multiple ranges"
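A sketch of what the normalized model could look like, with one row per range instead of three column pairs. It runs against an in-memory SQLite database from Python so that it is self-contained; the table and column names (`time_ranges`, `record_id`, `start_time`, `end_time`) are hypothetical, and times are stored as zero-padded `HH:MM` text here, whereas MySQL would more naturally use a TIME column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE time_ranges (
        record_id  INTEGER NOT NULL,  -- the row of the original wide table
        start_time TEXT    NOT NULL,  -- zero-padded HH:MM, so text order = time order
        end_time   TEXT    NOT NULL
    )
    """
)

# The two sample rows from the question, unpivoted into one row per range.
ranges = [
    (1, "06:00", "07:30"), (1, "09:30", "12:30"), (1, "13:30", "15:45"),
    (2, "05:00", "06:30"), (2, "08:15", "14:40"), (2, "16:30", "18:25"),
]
conn.executemany("INSERT INTO time_ranges VALUES (?, ?, ?)", ranges)

def records_containing(connection, time_text):
    """Return ids of records that have at least one range covering time_text."""
    cursor = connection.execute(
        "SELECT DISTINCT record_id FROM time_ranges "
        "WHERE ? BETWEEN start_time AND end_time ORDER BY record_id",
        (time_text,),
    )
    return [row[0] for row in cursor]

print(records_containing(conn, "08:30"))
```

With this layout a single BETWEEN clause covers any number of ranges per record, and an index on the two time columns can support the search.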
Store your times as DATETIME and use BETWEEN
So:
where myDate BETWEEN '2011-03-18 08:30' AND '2011-09-18 09:00'

manipulate string to extract date-time

I have a MeasureDateTime (nvarchar(50)) column in my SQL Server table.
I need to get
| measureFilePath | MeasureDateTime          | MeasureName             |
| 12Nc121         | Thu Jun 19 15:00:05 2011 | Annulus 4th RMS (Waves) |
| 12NB121         | Thu Jul 19 15:38:05 2012 | 3.0mm 4th RMS (Waves)   |
| 12NXc121        | Tue May 15 12:13:02 2012 | BC (mm)                 |
| 12NA121         | Tue May 15 12:13:02 2012 | CT (mm)                 |
| 12Nc111         | Tue May 15 12:13:02 2012 | Reference Angle (deg.)  |
| 12Nc231         | Wed May 15 12:03:02 2013 | Temperature (C)         |
I want to get, for example, the last 6 months of data using the MeasureDateTime column.
The problem is that MeasureDateTime is of type nvarchar.
Does anyone know how to do this? Is it even possible?
Try
CONVERT(datetime,SUBSTRING(MeasureDateTime,4,100),101)
to convert the nvarchar column into a datetime value.
SUBSTRING(MeasureDateTime,4,100) removes the weekday part of the date string, and the CONVERT() call with style 101 accepts the US-style date format.
The select could look like this:
SELECT * FROM table WHERE DATEADD(month, 6, CONVERT(datetime,SUBSTRING(MeasureDateTime,4,100),101)) > getdate()
My first attempt ignored the time information that sits between the day and the year. Here is another attempt at the date conversion:
convert(datetime,substring(stuff(dt,11,0,right(dt,5)),4,21),101)
This approach relies on all strings having the same length. I don't know for certain whether that is a given here.
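For comparison, the same steps (drop the weekday, parse the rest, keep rows from the last six months) can be sketched in Python. This is only an illustration of the logic, not a replacement for the T-SQL above; the function names are hypothetical, and the month abbreviations are assumed to be English.

```python
import calendar
from datetime import datetime

def parse_measure_datetime(text: str) -> datetime:
    """Parse strings such as 'Tue May 15 12:13:02 2012'."""
    # Skip the three-letter weekday and the space after it, as SUBSTRING does.
    return datetime.strptime(text[4:], "%b %d %H:%M:%S %Y")

def months_before(ref: datetime, months: int) -> datetime:
    """Return ref shifted back by whole months, clamping the day if needed."""
    month_index = ref.year * 12 + (ref.month - 1) - months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    day = min(ref.day, calendar.monthrange(year, month)[1])
    return ref.replace(year=year, month=month, day=day)

def in_last_months(text: str, now: datetime, months: int = 6) -> bool:
    """True if the measurement is no older than `months` months before `now`."""
    return parse_measure_datetime(text) >= months_before(now, months)

now = datetime(2012, 10, 1)  # fixed reference time so the example is repeatable
for value in ("Thu Jul 19 15:38:05 2012", "Thu Jun 19 15:00:05 2011"):
    print(value, in_last_months(value, now))
```

Unlike the fixed-position string surgery, this parse does not depend on every value having exactly the same length, only on the weekday prefix being three letters plus a space.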
This is a bit convoluted but may work:
WHERE DATEDIFF(Month, CAST(RIGHT(MeasureDateTime, LEN(MeasureDateTime) - 4) as datetime), GETDATE()) <= 6
Try using a trigger on this table for inserts. Use the solution provided by @cars10 to extract the date and store it in a datetime field. Always keep datetime values in datetime fields rather than varchar fields. Later on you may have to retrieve records based on date, and this datetime field will prove useful.

How to store very old dates in database?

It's not actually a problem I'm having, but imagine someone's building a website about the medieval times and wants to store dates, how would they go about it?
The spec for MySQL's DATE says it won't go below the year 1000, which makes sense when the format is YYYY-MM-DD. How can you store information about the death of Kenneth II of Scotland in 995? Of course you can store it as a string, but are there real date-type options?
Actually, you can store dates before the year 1000 in MySQL, despite what the documentation says:
mysql> describe test;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| birth | date | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
You still need to enter the year in four-digit YYYY format:
mysql> insert into test values (1, '0995-03-05');
Query OK, 1 row affected (0.02 sec)
mysql> select * from test;
+------+------------+
| id | birth |
+------+------------+
| 1 | 0995-03-05 |
+------+------------+
1 row in set (0.00 sec)
And you'll be able to operate on it as a date:
mysql> select birth + interval 5 day from test;
+------------------------+
| birth + interval 5 day |
+------------------------+
| 0995-03-10 |
+------------------------+
1 row in set (0.03 sec)
As for safety: I've never come across a case where this fails in MySQL 5.x (that, of course, does not mean it is guaranteed to work, but it appears reliable in practice).
About BC (before Christ) dates: I think that is simple. MySQL has no way to store negative dates either, so you would need to store the year separately as a signed integer field:
mysql> select '0001-05-04' - interval 1 year as above_bc, '0001-05-04' - interval 2 year as below_bc;
+------------+----------+
| above_bc | below_bc |
+------------+----------+
| 0000-05-04 | NULL |
+------------+----------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+--------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------+
| Warning | 1441 | Datetime function: datetime field overflow |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)
But I think that in either case (before or after year 0) it's better to store the date parts as integers, since that does not rely on an undocumented feature. However, you will then have to work with those three fields as plain numbers rather than as dates (so, in some sense, that is not a solution to your problem).
Choose a dbms that supports what you want to do. Among other free database management systems, PostgreSQL supports a timestamp range from 4713 BC to 294276 AD.
If you break up the date into separate columns for year, month, and day, you also need more tables and constraints to guarantee that values in those columns represent actual dates. If those columns let you store the value {2013, 2, 29}, your table is broken. A dbms that supports dates in your range entirely avoids this kind of problem.
Other problems you might run into
Incorrect date arithmetic on dates that are out of range.
Incorrect locale-specific formatting on dates that are out of range.
Surprising behavior from date and time functions on dates that are out of range.
Gregorian calendar weirdness.
Gregorian calendar weirdness? In Great Britain, the day after Sep 2, 1752 is Sep 14, 1752. PostgreSQL documents their rationale for ignoring that as follows.
PostgreSQL uses Julian dates for all date/time calculations. This has
the useful property of correctly calculating dates from 4713 BC to far
into the future, using the assumption that the length of the year is
365.2425 days.
Date conventions before the 19th century make for interesting reading,
but are not consistent enough to warrant coding into a date/time
handler.
Sadly, I think that currently the easiest option is to store year, month and day in separate fields with year as smallint.
To quote from http://dev.mysql.com/doc/refman/5.6/en/datetime.html
For the DATE and DATETIME range descriptions, “supported” means that although earlier values might work, there is no guarantee.
So there's a good chance that a wider range will work, given a suitably configured MySQL installation.
Make sure not to use TIMESTAMP, whose range does not reach back before 1970:
The TIMESTAMP data type is used for values that contain both date and time parts. TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC.
Here is a JavaScript example of how far before the UNIX epoch (1) you can reach with 2^36 seconds, multiplied by -1000 to get negative milliseconds for JavaScript.
d = new Date((Math.pow(2, 36) - 1) * -1000)
Sun May 13 -208 18:27:45 GMT+0200 (Westeuropäische Sommerzeit)
So I would suggest storing historical dates as a BIGINT relative to the epoch.
See http://dev.mysql.com/doc/refman/5.6/en/integer-types.html for MySQL 5.6.
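A rough Python sketch of the "signed integer relative to the epoch" idea. The helper names are hypothetical, and note one assumption in particular: Python's `date` uses the proleptic Gregorian calendar, so a medieval date converted this way will not line up with the Julian calendar actually in use at the time.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
SECONDS_PER_DAY = 86_400

def to_epoch_seconds(d: date) -> int:
    """Seconds from the UNIX epoch to midnight of d; negative before 1970."""
    return (d - EPOCH).days * SECONDS_PER_DAY

def from_epoch_seconds(seconds: int) -> date:
    """Inverse of to_epoch_seconds for whole-day values."""
    return EPOCH + timedelta(seconds=seconds)

stored = to_epoch_seconds(date(995, 3, 5))  # the value a BIGINT column would hold
print(stored < 0, from_epoch_seconds(stored))
```

The value fits comfortably in a signed 64-bit integer, but all date arithmetic and formatting then has to happen in application code rather than in the database.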
(1)
epoch = new Date(0)
Thu Jan 01 1970 01:00:00 GMT+0100 (Westeuropäische Normalzeit)
epoch.toUTCString()
"Thu, 01 Jan 1970 00:00:00 GMT"

I need a template using SQL to have the first column be each date in a date range including those without data for that date

I am not even sure this is possible but here goes. I am developing a report that shows units into the dock and units received into Oracle. I need to show each day (every day) how many were received at the dock and how many were received into Oracle. The trick is including the days where nothing was received at all.
Date | Dock Count | Oracle Count
------------------------------------------
Monday 11/1 | 12 | 10
Tuesday 11/2 | 5 | 7
Wednesday 11/3| 0 | 0
Thursday 11/4 | 22 | 10
Friday 11/5 | 0 | 12
So, is there a way to do this? I thought about taking a list of dates from another table that I know has data for each day but that just seems like an inefficient way of doing this.
Note: I am using MySQL
Thanks to Lieven's comment, I realize this is not currently possible directly. Referencing another table that has all of the required dates is the only way. I therefore searched around, as I was sure I wasn't the only one looking to create a table like this.
My solution was to adapt the procedure provided in an answer to the question "Get a list of dates between two dates".
I am only posting this as an answer in case someone else searching the site has the same problem I did.
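The underlying idea (generate every date in the range first, then attach counts and default to zero) can be sketched in Python as follows. The year 2010 is an assumption, chosen only because 1 November falls on a Monday in that year as in the sample table; the dictionaries stand in for the results of the two count queries.

```python
from datetime import date, timedelta

def each_day(start: date, end: date):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

# Stand-ins for the per-day counts returned by the dock and Oracle queries.
dock_counts = {date(2010, 11, 1): 12, date(2010, 11, 2): 5, date(2010, 11, 4): 22}
oracle_counts = {
    date(2010, 11, 1): 10, date(2010, 11, 2): 7,
    date(2010, 11, 4): 10, date(2010, 11, 5): 12,
}

# One row per calendar day; days with no data get 0 instead of disappearing.
report = [
    (day, dock_counts.get(day, 0), oracle_counts.get(day, 0))
    for day in each_day(date(2010, 11, 1), date(2010, 11, 5))
]

for day, dock, oracle in report:
    print(f"{day:%A} {day.month}/{day.day} | {dock} | {oracle}")
```

In SQL the generated list plays the role of the calendar table, and the dictionary lookups correspond to LEFT JOINs with a default of zero for missing rows.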