HTML import to WordPress: set publish date to match filename

I have a collection of 16,000 HTML files that I'm uploading to WordPress with the HTML Import 2 plugin. The date for each article appears in two places, neither of which the plugin can detect:
1.) Each file is named mmddyyxxxxxxx.htm.
2.) The date also appears in a paragraph at the bottom of the page, surrounded by varying text, always in the format (yyyy, mm, dd).
Any ideas?

The easiest solution here would be to use the "set timestamps to last time the file was modified" option while importing. Since the filenames contain the dates, you can write a simple script to make the file timestamps match. This can be done in bash, or in PHP with the touch() function.
You may need to break your files up into manageable groups, since glob() has a limit, but here's a simple example:
<?php
# change modification + access times based on filenames
$files = glob("myfiles/*.htm");
foreach ( $files as $file ) {
    $temp = pathinfo( $file );     // may have a relative path in it
    $name = $temp['filename'];     // just "mmddyyxxxxxxx" at this point
    // assuming the date in the filenames is fixed-length (mmddyy),
    // rebuild it in yyyy-mm-dd format:
    $date = sprintf( "20%s-%s-%s", // cheap trick to start years with 20
        substr( $name, 4, 2 ),     // yy
        substr( $name, 0, 2 ),     // mm
        substr( $name, 2, 2 )      // dd
    );
    $stamp = strtotime( $date );   // timestamp (parse $date, not the raw $name)
    touch( $file, $stamp, $stamp ); // sets both modification + access time
}
?>
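A slightly more defensive variant, sketched under the same assumptions (every filename starts with mmddyy and every year is 20yy): it rejects impossible dates with checkdate() instead of letting them roll over, and skips files whose names don't match. The helper name filename_to_timestamp() is hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: "0131120000001.htm" (mmddyy + other characters) -> Unix timestamp.
// Assumes all years are 20yy, as in the answer above. Returns null for bad names.
function filename_to_timestamp(string $path): ?int
{
    $name = pathinfo($path, PATHINFO_FILENAME);

    // First six digits: mm, dd, yy
    if (!preg_match('/^(\d{2})(\d{2})(\d{2})/', $name, $m)) {
        return null; // name does not start with six digits
    }
    $month = (int) $m[1];
    $day   = (int) $m[2];
    $year  = 2000 + (int) $m[3];

    // Reject dates such as 13/45 instead of letting mktime() roll them over
    if (!checkdate($month, $day, $year)) {
        return null;
    }
    // Midnight in the server's default timezone
    return mktime(0, 0, 0, $month, $day, $year);
}

foreach (glob('myfiles/*.htm') ?: [] as $file) {
    $stamp = filename_to_timestamp($file);
    if ($stamp === null) {
        echo "Skipping $file: no valid mmddyy prefix\n";
        continue;
    }
    touch($file, $stamp, $stamp); // modification + access time
}
```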
If the date in your filenames isn't fixed-length, you may need to get more creative.
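Another option is the second source mentioned in the question: the (yyyy, mm, dd) date in each page's closing paragraph. This is a rough sketch, not a tested import script: footer_date_to_timestamp() is a hypothetical helper, the regular expression assumes the date is written exactly as "(yyyy, mm, dd)" with optional spaces, and if a page contains several such matches the last one wins, since the date is said to be at the bottom.

```php
<?php
// Hypothetical helper: find "(yyyy, mm, dd)" in a page and return a Unix
// timestamp for midnight (server default timezone), or null if none is found.
function footer_date_to_timestamp(string $html): ?int
{
    $pattern = '/\(\s*(\d{4}),\s*(\d{1,2}),\s*(\d{1,2})\s*\)/';
    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        return null; // no date in the expected format
    }
    $m = end($matches); // last match on the page
    [$year, $month, $day] = [(int) $m[1], (int) $m[2], (int) $m[3]];
    if (!checkdate($month, $day, $year)) {
        return null;
    }
    return mktime(0, 0, 0, $month, $day, $year);
}

foreach (glob('myfiles/*.htm') ?: [] as $file) {
    $stamp = footer_date_to_timestamp(file_get_contents($file));
    if ($stamp !== null) {
        touch($file, $stamp, $stamp);
    }
}
```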

Related

Undesirable rounding of numbers in a PHPExcel-generated file

I am generating a downloadable Excel file from a MySQL database using PHPExcel. One of the fields holds an 18-digit number, which is defined as BIGINT in MySQL. The number is written as a hyperlink in the code, and there is the following problem:
The last 4 digits of the number are displayed as 0000, although clicking the hyperlink opens the correct URL. For example, 860814069447613475 is shown as 860814069447610000 in the generated Excel file.
Here is my code:
$n = 2;
$objPHPExcel->getActiveSheet()
    ->getStyle('A'.(string)$n)
    ->getNumberFormat()
    ->setFormatCode(
        PHPExcel_Style_NumberFormat::FORMAT_NUMBER
    );
while ($row = mysqli_fetch_array($result)) {
    $objPHPExcel->getActiveSheet()->setCellValue('A'.(string)$n, $row['t_id']);
    $objPHPExcel->getActiveSheet()->setCellValue('B'.(string)$n, $row['t_text']);
    $objPHPExcel->getActiveSheet()->setCellValue('C'.(string)$n, $row['user_name']);
    $objPHPExcel->getActiveSheet()->setCellValue('D'.(string)$n, $row['description']);
    $objPHPExcel->getActiveSheet()->setCellValue('E'.(string)$n, $row['time']);
    $objPHPExcel->getActiveSheet()->setCellValue('F'.(string)$n, $row['place']);
    $objPHPExcel->getActiveSheet()->getStyle("A$n:F$n")->getAlignment()->setWrapText(true);
    $objPHPExcel->getActiveSheet()
        ->getCell('A'.(string)$n)
        ->getHyperlink()
        ->setUrl('http://t.com/'.$row['user_name'].'/status/' . $row['t_id']);
    // Config
    $link_style_array = [
        'font' => [
            'color' => ['rgb' => '0000FF'],
            'underline' => 'single'
        ]
    ];
    // Set it!
    $objPHPExcel->getActiveSheet()->getStyle('A'.(string)$n)->applyFromArray($link_style_array);
    $n++;
}
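Found code that worked for me: write the ID explicitly as a string, so the cell stores the text instead of a number. (Excel keeps only 15 significant digits of a number, which is why the trailing digits turned into zeros.)
$objPHPExcel->getActiveSheet()->setCellValueExplicit('A'.(string)$n, $row['t_id'],
    PHPExcel_Cell_DataType::TYPE_STRING);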
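To see why the digits disappear, it helps to look at the value as a floating-point number. This plain-PHP sketch (no PHPExcel needed) casts the ID to float, roughly what a numeric cell holds, and adds a hypothetical must_store_as_text() helper for deciding when to take the string path; the 15-digit cutoff mirrors Excel's precision limit.

```php
<?php
// The ID as mysqli returns it: a string of digits
$id = '860814069447613475';

// A numeric cell holds an IEEE-754 double (53-bit mantissa),
// so the value is rounded to the nearest representable number
$asFloat = (float) $id;
$roundTripped = sprintf('%.0f', $asFloat);

echo $id, "\n";           // the original 18 digits
echo $roundTripped, "\n"; // the nearest double: trailing digits differ

// Hypothetical helper: should this digit string be written as text?
// Excel keeps at most 15 significant digits, so longer values risk rounding.
function must_store_as_text(string $digits): bool
{
    return strlen(ltrim($digits, '-0')) > 15;
}

// With PHPExcel, the text path is setCellValueExplicit() with TYPE_STRING, e.g.
// if (must_store_as_text($row['t_id'])) {
//     $sheet->setCellValueExplicit('A'.$n, $row['t_id'], PHPExcel_Cell_DataType::TYPE_STRING);
// }
```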

How to remove duplicates in a CSV file?

I have a large file with a bunch of movie data, including a unique ID for each movie. Although the ID on each line is unique, some lines contain duplicate movie data.
For example:
ID,movie_title,year
1,toy story,1995
2,jumanji,1995
[...]
6676,toy story,1995
6677,jumanji,1995
In this case, I'd like to remove the 6676,toy story,1995 and 6677,jumanji,1995 lines completely. This happens with more than just one movie, so I can't do a simple find and replace. I've tried Sublime Text's Edit > Permute Lines > Unique feature, and it works, but I end up losing the first column of the data (the unique IDs).
Can anyone recommend a better way to get rid of these duplicate lines?
The following Perl script does the trick: every occurrence of a movie except the first is deleted from the list of entries. Don't forget to add the file paths. Run it from the command line with perl followed by the script's file name (macOS ships with Perl):
use strict;
use warnings;
use IO::File;

my $dict = {};   # movie identifiers seen so far
my $fh_in  = IO::File->new("<...") or die "Cannot open input: $!";   # add input file name
my $fh_out = IO::File->new(">...") or die "Cannot open output: $!";  # add output file name
while (<$fh_in>) {
    chomp;
    my $curline = $_;
    my @fields = split( /,/, $curline );
    # movie identifier: everything after the leading ID column
    my $key = join( ',', @fields[1 .. $#fields] );
    if (!exists($$dict{$key})) {
        $$dict{$key} = 1;
        $fh_out->print("$curline\n");
    }
}
$fh_in->close();
$fh_out->close();
exit(0);
Explanation
The code processes the input line by line
It maintains a hash of the movie identifiers seen so far.
Movie identifiers are defined as the line content without the ID number and the comma immediately following it.
A line is printed if and only if its movie identifier has not been seen yet.
Caveat
Evidently, this solution is not robust against spelling errors.
A certain degree of error tolerance can be added by normalizing keys. Example (case-insensitive matching):
my $key_norm; # move that out of the loop in production code
$key_norm = lc($key);
if (!exists($$dict{$key_norm})) {
$$dict{$key_norm} = 1;
$fh_out->print("$curline\n");
}
Neither elegance nor performance had a say in authoring this code ;-)
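For comparison, here's a sketch of the same keep-the-first-occurrence idea in PHP. It reads rows with fgetcsv(), so a quoted title containing a comma stays one field (split(/,/) in the Perl version would break it apart), and it lowercases the key as in the caveat above. dedupe_movies() and the sample rows are illustrative only; for real files you'd pass handles from fopen() on your own paths.

```php
<?php
// Copy rows from $in to $out, keeping only the first row for each movie.
// "Movie" = every column except the leading ID, compared case-insensitively.
// Returns the number of rows dropped.
function dedupe_movies($in, $out): int
{
    $seen = [];
    $dropped = 0;
    // Explicit delimiter/enclosure/escape arguments avoid version-specific defaults
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line
        }
        // Join the non-ID columns with a unit separator to build the key
        $key = strtolower(implode("\x1F", array_slice($row, 1)));
        if (isset($seen[$key])) {
            $dropped++;
            continue;
        }
        $seen[$key] = true;
        fputcsv($out, $row, ',', '"', '\\');
    }
    return $dropped;
}

// Demo with in-memory streams
$in = fopen('php://memory', 'w+');
fwrite($in, "ID,movie_title,year\n1,toy story,1995\n2,jumanji,1995\n6676,Toy Story,1995\n6677,jumanji,1995\n");
rewind($in);
$out = fopen('php://memory', 'w+');
$dropped = dedupe_movies($in, $out);
rewind($out);
echo stream_get_contents($out); // the header plus the rows with IDs 1 and 2
```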

Saving text box input to XML or txt file in HTML

I'm working on an HTML page project with 2 text boxes, and I want to save the data the user types into them. In my C# class we saved all input to an XML file, so I'm assuming there is a similar way here? Either an XML file or some other file that can store text?
Does anyone know a solution?
I recommend the following PHP script:
<?php
// check that the form was submitted
// (you'll need to change these indices to match your form field names)
if( !empty( $_POST['firstname'] ) && !empty( $_POST['lastname'] ) ){
    // remove HTML tags from the submission
    // (since you don't want them)
    $firstname = strip_tags( $_POST['firstname'] );
    $lastname = strip_tags( $_POST['lastname'] );
    // create the date
    // (you can change the format as desired)
    $date = date( 'Y-m-d' );
    // create an array that holds your info
    $record = array( $firstname, $lastname, $date );
    // save the record to your .txt file (I still recommend JSON)
    $json = json_encode( $record );
    $file = '/_server_/path/to/yourfile.txt';
    // the filename comes first; append one JSON record per line instead of overwriting
    file_put_contents( $file, $json . PHP_EOL, FILE_APPEND | LOCK_EX );
}
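To read the submissions back, a small companion function could look like this sketch. read_records() is a hypothetical name, and it assumes each submission is stored as one JSON array per line, e.g. written with file_put_contents($file, $json . PHP_EOL, FILE_APPEND).

```php
<?php
// Hypothetical reader: returns every saved [firstname, lastname, date] record.
// Assumes one JSON array per line; lines that don't decode to an array are skipped.
function read_records(string $file): array
{
    if (!is_file($file)) {
        return []; // nothing saved yet
    }
    $records = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $record = json_decode($line, true);
        if (is_array($record)) {
            $records[] = $record;
        }
    }
    return $records;
}
```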

Strange behaviour with Zend_Date and a cron job

We have a cron script that detects whether some users have been kicked out of our application.
We can detect this when a specific value is 1 but no new entries are being written to the stream.
The script runs every hour. Usually no users are detected. But since 2012-10-31 23:59:03, every user gets detected. If I run the script on my local machine, or even manually on the same machine where the cron job runs, everything is handled as it should be.
First things first, our script:
require_once ('cron_init.php');
ini_set('date.timezone', 'Europe/Berlin');
ini_set('max_execution_time', 30);
ini_set('memory_limit', -1);
error_reporting(E_ALL);
ini_set("display_errors", 1);
Zend_Date::setOptions(array('fix_dst' => true));
$userinfos = new Repricing_Dbservices_Userinfos();
$users = $userinfos->getUsersForRepricing();
$repricingstream = new Repricing_Dbservices_Repricingstream();
$error = new Repricing_Dbservices_Error();
if ($users !== false AND count($users) > 0) {
    $counter = 0;
    $errCounter = 0;
    $jetzt = new Zend_Date(); // "jetzt" = now
    $jetzt->setTimezone('Europe/Berlin');
    $jetzt = $jetzt->get(Zend_Date::TIMESTAMP);
    foreach ($users as $user) {
        $stream = $repricingstream->getStreamLimit($user);
        $last = new Zend_Date($stream);
        $last->setTimezone('Europe/Berlin');
        $last = $last->get(Zend_Date::TIMESTAMP);
        $diff = (($jetzt - $last) / 60); // minutes since the last stream entry
        $error->setError(1, 'DIED', $diff, $user);
        if ($diff > 50) {
            $errCounter++;
            $userinfos->setUserFree($user);
            $error->setError(1, 'DIED', 'ANSTOSSEN', $user);
        }
        $counter++;
    }
    $error->setError(1, $errCounter, 'ANSTOSSEN_ALL', 'ALL');
}
Usually $diff is between 0 and 4, but under cron we found that $diff is always around 381595. If we run the script outside cron, $diff is as expected.
We also found that $jetzt is correct (the current time); only $last is off, by about 381595 minutes. That shouldn't happen: the last stream date is completely normal. We can't understand this behaviour of Zend_Date under cron. Before 2012-10-31 23:59:03 the script ran correctly for two weeks. We can't explain it. Can you?
Consider this:
$right = new Zend_Date('2012-11-01 12:12:12', Zend_Date::ISO_8601);
var_dump( $right->getIso() ); // 2012-11-01T12:12:12+00:00
var_dump( $right->getTimestamp() ); // 1351771932
$wrong = new Zend_Date('2012-11-01 12:12:12', null, 'en_US');
var_dump( $wrong->getIso() ); // 2012-01-11T12:12:12+00:00
var_dump( $wrong->getTimestamp() ); // 1326283932
Now the really strange part: on my PC, the second behaviour is the default, i.e. when no additional parameters are given to the Zend_Date constructor.
The point is, Zend_Date is a bit... too helpful when trying to parse datetime strings. For example, it takes the locale into account, and that locale may come from the server or from the client! And if the string cannot be parsed under that locale's rules, it silently gives up and tries another rule.
That's why 2012-10-29 was parsed as October 29 (despite what the locale suggested, since there is no 29th month), but 2012-11-01 became January 11, which messed up your script big time. Passing an explicit format, as with Zend_Date::ISO_8601 in the first example, avoids the guessing.
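If you'd rather not depend on Zend_Date's parsing rules at all, here is a sketch of the same minutes-since-last-entry check with PHP's built-in DateTimeImmutable. This is a swapped-in technique, not what the answer above uses. The 'Y-m-d H:i:s' format and the Europe/Berlin timezone are assumptions about how the stream date is stored, and minutes_since() is a hypothetical helper.

```php
<?php
// Minutes between a stored stream date and $now, or null if the string does
// not match 'Y-m-d H:i:s'. With an explicit format, "2012-11-01" is always
// read as 1 November, whatever the locale or environment.
function minutes_since(string $streamDate, DateTimeImmutable $now): ?float
{
    $tz = new DateTimeZone('Europe/Berlin'); // assumed storage timezone
    $last = DateTimeImmutable::createFromFormat('!Y-m-d H:i:s', $streamDate, $tz);
    if ($last === false) {
        return null;
    }
    return ($now->getTimestamp() - $last->getTimestamp()) / 60;
}

$now  = new DateTimeImmutable('now', new DateTimeZone('Europe/Berlin'));
$diff = minutes_since($now->modify('-3 minutes')->format('Y-m-d H:i:s'), $now);
echo $diff, " minutes since the last entry\n";
```

The same $diff > 50 threshold from the cron script can then be applied to the result.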

Best way to find illegal characters in a bunch of ISO-8859-1 web pages?

I have a bunch of HTML files on a site that were created in 2000 and have been maintained to this day. We've recently begun an effort to replace illegal characters with their HTML entities. Going page by page looking for copyright and trademark symbols seems like quite a chore. Do any of you know of an app that will take a bunch of HTML files and tell me where I need to replace illegal characters with HTML entities?
You could write a PHP script (if you can't, I'd be happy to help), but I assume you've already converted some of the "special characters", which makes the task a little harder (although I still think it's possible).
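A PHP script along those lines could be as small as this sketch. It assumes the pages are ISO-8859-1 (one byte per character) and reports every byte above 0x7F with its line, column and the named entity htmlentities() suggests; find_non_ascii() is a hypothetical helper. Entities that were already converted (such as &copy;) are plain ASCII, so they are not reported again. Bytes with no named entity in ISO-8859-1 (for example a Windows-1252 trademark sign at 0x99) come back unchanged.

```php
<?php
// Hypothetical scanner: list each non-ASCII byte with line, column and suggested entity.
function find_non_ascii(string $html): array
{
    $hits = [];
    foreach (explode("\n", $html) as $i => $line) {
        // Byte-wise match (no /u modifier), so this works on ISO-8859-1 text
        if (preg_match_all('/[\x80-\xFF]/', $line, $m, PREG_OFFSET_CAPTURE)) {
            foreach ($m[0] as [$char, $offset]) {
                $hits[] = [
                    'line'   => $i + 1,
                    'column' => $offset + 1,
                    'entity' => htmlentities($char, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1'),
                ];
            }
        }
    }
    return $hits;
}

foreach (glob('*.html') ?: [] as $file) {
    foreach (find_non_ascii(file_get_contents($file)) as $hit) {
        printf("%s:%d:%d  replace with %s\n", $file, $hit['line'], $hit['column'], $hit['entity']);
    }
}
```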
Any good text editor will do a file contents search for you and return a list of matches.
I do this with EditPlus. There are several editors, such as Notepad++ and TextPad, that will easily help you do this.
You do not have to open the files. You just specify the path where the files are stored, the file mask (*.html) and the text to search for (e.g. "©"), and the editor comes back with a list of matches; when you double-click a match, it opens the file at the matching line.
I also have a website that regularly needs to convert large numbers of file names back and forth between character sets. While a text editor can do this, a portable two-step solution in PHP was preferable: first add the filenames to an array, then do the search and replace. An extra piece of code in the function excludes certain files from the array.
function listdir($start_dir = '.') {
    $nonFilesArray = array('index.php', 'index.html', 'help.html'); // files to exclude
    $filesArray = array();                                          // collected file paths
    if (!is_dir($start_dir)) {
        return false;                                               // no such folder
    }
    $fh = opendir($start_dir);
    while (($tmpFile = readdir($fh)) !== false) {                   // each filename, without its path
        if ($tmpFile === '.' || $tmpFile === '..') {
            continue;                                               // skip . and ..
        }
        $filepath = $start_dir . '/' . $tmpFile;                    // relative path/to/file
        if (is_dir($filepath)) {
            // a folder: recurse into it
            $filesArray = array_merge($filesArray, listdir($filepath));
        } elseif (!in_array($tmpFile, $nonFilesArray, true)
                  && pathinfo($tmpFile, PATHINFO_EXTENSION) === 'html') {
            // strip the site-specific first 17 characters of $filepath
            $filesArray[] = substr_replace($filepath, '', 0, 17);
        }
    }
    closedir($fh);
    return $filesArray;
}
$filesArray = listdir($targetdir);      // $targetdir is defined elsewhere
$numNewFiles = count($filesArray);      // number of records
for ($i = 0; $i < $numNewFiles; $i++) { // read the filenames and replace unwanted characters
    $tmplnk = $linkpath . $filesArray[$i];  // $linkpath is defined elsewhere
    $outname = basename($filesArray[$i], '.html');
    $outname = str_replace('-', ' ', $outname);
}
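The loop above only builds names; the search-and-replace on the file contents is left out. A minimal sketch of that step, assuming ISO-8859-1 content: entitize_non_ascii() (a hypothetical helper) rewrites each byte above 0x7F as its named entity, or as a numeric reference when htmlentities() has no name for it, and leaves markup and existing entities alone. The usage lines assume $tmplnk points at the file on disk; keep a backup before writing in place.

```php
<?php
// Hypothetical replacement step: non-ASCII bytes -> HTML entities, ASCII untouched.
function entitize_non_ascii(string $html): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        $entity = htmlentities($m[0], ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
        // No named entity: fall back to a numeric reference of the byte value
        return $entity !== $m[0] ? $entity : '&#' . ord($m[0]) . ';';
    }, $html);
}

// Inside the loop above, for example:
// file_put_contents($tmplnk, entitize_non_ascii(file_get_contents($tmplnk)));
```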