Ashley Sheridan​.co.uk

Replacing Non-Displaying MS Word Characters

I work on a lot of CMSs, and website content generally follows the same process:

  1. Original copy is written, and saved as an MS Word document.
  2. This file is sent on for review and final edits.
  3. Content from final version is copied and pasted directly from Word into the CMS.

Generally, this process is fine. However, if any of a number of particular characters were used in the original document, a feature of Word converts them to a more stylised version of that character. An example is the quotation mark: straight quotes are automatically changed to curly quote marks. While these look fine in the document, they are not suitable for online display as they stand.
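To see why the pasted characters break, it can help to look at the bytes. The sketch below assumes the text arrives as Windows-1252 (where a curly quote is a single byte such as 147 or 148) and that the page is served as UTF-8; both are assumptions, and `isValidUtf8` is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: true when $text is well-formed UTF-8.
function isValidUtf8(string $text): bool
{
    // With the /u modifier PCRE rejects invalid UTF-8, and preg_match() returns false.
    return preg_match('//u', $text) === 1;
}

$plain = '"Hello"';       // straight quotes, plain ASCII
$word  = "\x93Hello\x94"; // curly quotes as single Windows-1252 bytes (147 and 148)

var_dump(isValidUtf8($plain)); // bool(true)
var_dump(isValidUtf8($word));  // bool(false)
```

A browser that expects UTF-8 cannot decode those lone bytes, which is one likely source of the ? symbols and empty boxes.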

This function accepts a string of input and returns the clean string with those characters correctly converted for online display, removing the ? symbols and empty boxes that are otherwise left behind.

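function removeMSCrap($crap) {
    // Word's stylised characters as Windows-1252 bytes => HTML entities.
    // chr() only accepts 0-255, so Unicode code points such as 8217
    // cannot be passed to it; only the single-byte values are mapped.
    $map = array(
        chr(128) => '&euro;',   // euro sign
        chr(133) => '&hellip;', // ellipsis
        chr(145) => '&lsquo;',  // left single quote
        chr(146) => '&rsquo;',  // right single quote
        chr(147) => '&ldquo;',  // left double quote
        chr(148) => '&rdquo;',  // right double quote
        chr(149) => '&bull;',   // bullet
        chr(150) => '&ndash;',  // en dash
        chr(151) => '&mdash;',  // em dash
        chr(153) => '&trade;',  // trademark
        chr(169) => '&copy;',   // copyright
        chr(174) => '&reg;',    // registered
    );
    // Assumes single-byte (Windows-1252) input; UTF-8 input would be corrupted.
    $roses = strtr($crap, $map);
    return $roses;
}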
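The function above swaps the characters for replacements one by one. A different technique, not the one used in this post, is to re-encode the whole string as UTF-8 so that every Windows-1252 character survives, not only the listed ones. This is a minimal sketch that assumes the iconv extension is enabled and that the input really is Windows-1252; `wordTextToUtf8` is a hypothetical name.

```php
<?php
// Hypothetical helper: re-encode Windows-1252 text as UTF-8.
// Assumes the iconv extension is available and the input is Windows-1252.
function wordTextToUtf8(string $text): string
{
    $converted = iconv('Windows-1252', 'UTF-8', $text);

    // iconv() returns false on failure (for example, a byte that
    // Windows-1252 leaves undefined); fall back to the untouched input.
    return $converted === false ? $text : $converted;
}

// A left curly double quote is byte 0x93 in Windows-1252.
echo bin2hex(wordTextToUtf8("\x93")), "\n"; // e2809c, its three-byte UTF-8 form
```

This only helps if the page itself is declared as UTF-8, and it should not be run on text that is already UTF-8, as that would encode it twice.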