XML Cleansing
When writing XML, all text should be converted to UTF-8 unless it comes from a source whose encoding is already known. This blog post provides a Ruby function to do the conversion.
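The function from the referenced post is not reproduced here. As a rough illustration only, the following is a minimal sketch of how such a conversion might look. The name `to_utf8` and the `fallback` parameter are hypothetical, and it assumes that input which is not already valid UTF-8 is ISO-8859-1; pass a different fallback if your data comes from elsewhere.

```ruby
# Hypothetical helper (not the function from the referenced post).
# Returns a UTF-8 copy of +text+. If the bytes are already valid UTF-8 they
# are kept as-is; otherwise they are assumed to be in +fallback+ (an
# assumption, ISO-8859-1 by default) and transcoded to UTF-8.
def to_utf8(text, fallback: Encoding::ISO_8859_1)
  candidate = text.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Reinterpret the raw bytes in the fallback encoding, then transcode.
  # Bytes that cannot be converted are replaced with "?".
  text.dup
      .force_encoding(fallback)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end
```

For example, `to_utf8("caf\xE9".b)` returns `"café"`, while a string that is already valid UTF-8 comes back unchanged.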
So when you need a more accurate sorting order, use the MySQL collation utf8_unicode_ci, and when performance is your main concern, use utf8_general_ci.