I would like to use a piece of refactoring work that I did to demonstrate a few basic techniques.
In a project, I added Quoted-Printable support to a .NET vCard reader. As I did not want to reinvent the wheel, I looked around MSDN and the Internet newsgroups for code that met these requirements:
- .NET code
- Efficient
- Free
Surprisingly, there was not much around. I found two pieces of C# code, one of which was from Bill Gearhart.
According to the author, this QuotedPrintable class provides "robust and fast implementation of Quoted Printable Multipart Internet Mail Encoding (MIME) which encodes every character, not just special characters for transmission over SMTP."
Here is the source code of the QuotedPrintable class from Bill Gearhart, with most in-source comments removed.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Security;

namespace mimelib
{
    public class QuotedPrintable
    {
        private QuotedPrintable()
        {
        }

        public const int RFC_1521_MAX_CHARS_PER_LINE = 75;

        public static string Encode(string toencode)
        {
            return Encode(toencode, RFC_1521_MAX_CHARS_PER_LINE);
        }

        public static string Encode(string toencode, int charsperline)
        {
            if (toencode == null) throw new ArgumentNullException();
            if (charsperline <= 0) throw new ArgumentOutOfRangeException();
            string line, encodedHtml = "";
            StringReader sr = new StringReader(toencode);
            try
            {
                while((line=sr.ReadLine())!=null)
                    encodedHtml += EncodeSmallLine(line);
                return FormatEncodedString(encodedHtml, charsperline);
            }
            finally
            {
                sr.Close();
                sr = null;
            }
        }

        public static string EncodeFile(string filepath)
        {
            return EncodeFile(filepath, RFC_1521_MAX_CHARS_PER_LINE);
        }

        public static string EncodeFile(string filepath, int charsperline)
        {
            if (filepath == null) throw new ArgumentNullException();
            string encodedHtml = "", line;
            FileInfo f = new FileInfo(filepath);
            if (!f.Exists) throw new FileNotFoundException();
            StreamReader sr = f.OpenText();
            try
            {
                while((line=sr.ReadLine())!=null)
                    encodedHtml += EncodeSmallLine(line);
                return FormatEncodedString(encodedHtml, charsperline);
            }
            finally
            {
                sr.Close();
                sr = null;
                f = null;
            }
        }

        public unsafe static string EncodeSmall(string s)
        {
            if (s == null) throw new ArgumentNullException();
            string result = "";
            fixed (char* pChar = s)
            {
                char* pCurrent = pChar;
                do
                {
                    int code = (*pCurrent);
                    result += String.Format("={0}", code.ToString("X2"));
                    pCurrent++;
                } while (*pCurrent != 0);
            }
            return result;
        }

        public static string EncodeSmallLine(string s)
        {
            if (s == null) throw new ArgumentNullException();
            return EncodeSmall(s + "\r\n");
        }

        public unsafe static string FormatEncodedString(string qpstr, int maxcharlen)
        {
            if (qpstr == null) throw new ArgumentNullException();
            string strout = "";
            StringWriter qpsw = new StringWriter();
            try
            {
                fixed(char* pChr = qpstr)
                {
                    char* pCurrent = pChr;
                    int i = 0;
                    do
                    {
                        strout += pCurrent->ToString();
                        i++;
                        if (i==maxcharlen)
                        {
                            qpsw.WriteLine("{0}=", strout);
                            qpsw.Flush();
                            i=0;
                            strout = "";
                        }
                        pCurrent++;
                    } while(*pCurrent != 0);
                }
                qpsw.WriteLine(strout);
                qpsw.Flush();
                return qpsw.ToString();
            }
            finally
            {
                qpsw.Close();
                qpsw = null;
            }
        }

        static string HexDecoderEvaluator(Match m)
        {
            string hex = m.Groups[2].Value;
            int iHex = Convert.ToInt32(hex, 16);
            char c = (char) iHex;
            return c.ToString();
        }

        static string HexDecoder(string line)
        {
            if (line == null) throw new ArgumentNullException();
            //parse looking for =XX where XX is hexadecimal
            Regex re = new Regex( "(\\=([0-9A-F][0-9A-F]))", RegexOptions.IgnoreCase );
            return re.Replace(line, new MatchEvaluator(HexDecoderEvaluator));
        }

        public static string DecodeFile(string filepath)
        {
            if (filepath == null) throw new ArgumentNullException();
            string decodedHtml = "", line;
            FileInfo f = new FileInfo(filepath);
            if (!f.Exists) throw new FileNotFoundException();
            StreamReader sr = f.OpenText();
            try
            {
                while((line=sr.ReadLine())!=null)
                    decodedHtml += Decode(line);
                return decodedHtml;
            }
            finally
            {
                sr.Close();
                sr = null;
                f = null;
            }
        }

        public static string Decode(string encoded)
        {
            if (encoded == null) throw new ArgumentNullException();
            string line;
            StringWriter sw = new StringWriter();
            StringReader sr = new StringReader(encoded);
            try
            {
                while((line=sr.ReadLine())!=null)
                {
                    if (line.EndsWith("="))
                        sw.Write(HexDecoder(line.Substring(0, line.Length-1)));
                    else
                        sw.WriteLine(HexDecoder(line));
                    sw.Flush();
                }
                return sw.ToString();
            }
            finally
            {
                sw.Close();
                sr.Close();
                sw = null;
                sr = null;
            }
        }
    }
}
The code from Bill Gearhart looked elegant and efficient. However, there are 3 catches:
- The C# code was apparently translated from C code, and it requires "unsafe" and the System.Security namespace.
- The code cannot handle multi-byte characters.
- Every character is encoded into 3 bytes of Quoted-Printable text. While this may be desirable in email MIME encoding, the conventional practice in vCard is to encode only characters outside the printable ASCII range (33-126).
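The multi-byte catch can be illustrated with a small sketch. The class and method names below (MultiByteDemo, EncodeByCodeUnit, EncodeByUtf8Byte) are hypothetical and not part of either library; the first method only mimics the idea of formatting each char value as hex, while the second works on UTF-8 bytes.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not taken from the original code.
public static class MultiByteDemo
{
    // Mimics the original idea: format each UTF-16 code unit as hex.
    // "X2" is a minimum width, so values above 0xFF produce more than two digits.
    public static string EncodeByCodeUnit(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
            sb.Append('=').Append(((int)c).ToString("X2"));
        return sb.ToString();
    }

    // Byte-oriented alternative: every UTF-8 byte becomes exactly one =XX triplet.
    public static string EncodeByUtf8Byte(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
            sb.Append('=').Append(b.ToString("X2"));
        return sb.ToString();
    }
}
```

For the single character "中" (U+4E2D), EncodeByCodeUnit returns "=4E2D", which a decoder looking for =XX pairs cannot read back correctly, while EncodeByUtf8Byte returns "=E4=B8=AD".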
I wanted pure .NET C# code. Otherwise, I would just grab a piece of C code, compile it to a DLL and then use PInvoke to call the Quoted-Printable function.
The algorithm for producing Quoted-Printable is not rocket science, and I like the overall structure of Bill Gearhart's C# code, so I decided to refactor his code rather than start everything from scratch.
Handling Quoted-Printable encoding requires byte handling, and the .NET Framework has rich support for byte handling, so there is no need to use PChar-style pointers (char*), which sacrifice the safety net of the .NET Framework.
Here is the code that was refactored, or translated, from the above code.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace Fonlow.VCard
{
    /// <summary>
    /// Provide encoding and decoding of Quoted-Printable.
    /// </summary>
    public class QuotedPrintable
    {
        private QuotedPrintable()
        {
        }

        /// <summary>
        /// // so including the = connection, the length will be 76
        /// </summary>
        private const int RFC_1521_MAX_CHARS_PER_LINE = 75;

        /// <summary>
        /// Return quoted printable string with 76 characters per line.
        /// </summary>
        /// <param name="textToEncode"></param>
        /// <returns></returns>
        public static string Encode(string textToEncode)
        {
            if (textToEncode == null) throw new ArgumentNullException();
            return Encode(textToEncode, RFC_1521_MAX_CHARS_PER_LINE);
        }

        private static string Encode(string textToEncode, int charsPerLine)
        {
            if (textToEncode == null) throw new ArgumentNullException();
            if (charsPerLine <= 0) throw new ArgumentOutOfRangeException();
            return FormatEncodedString(EncodeString(textToEncode), charsPerLine);
        }

        /// <summary>
        /// Return quoted printable string, all in one line.
        /// </summary>
        /// <param name="textToEncode"></param>
        /// <returns></returns>
        public static string EncodeString(string textToEncode)
        {
            if (textToEncode == null) throw new ArgumentNullException();
            byte[] bytes = Encoding.UTF8.GetBytes(textToEncode);
            StringBuilder builder = new StringBuilder();
            foreach (byte b in bytes)
            {
                if (b != 0)
                    if ((b < 32) || (b > 126))
                        builder.Append(String.Format("={0}", b.ToString("X2")));
                    else
                    {
                        switch (b)
                        {
                            case 13:
                                builder.Append("=0D");
                                break;
                            case 10:
                                builder.Append("=0A");
                                break;
                            case 61:
                                builder.Append("=3D");
                                break;
                            default:
                                builder.Append(Convert.ToChar(b));
                                break;
                        }
                    }
            }
            return builder.ToString();
        }

        private static string FormatEncodedString(string qpstr, int maxcharlen)
        {
            if (qpstr == null) throw new ArgumentNullException();
            StringBuilder builder = new StringBuilder();
            char[] charArray = qpstr.ToCharArray();
            int i = 0;
            foreach (char c in charArray)
            {
                builder.Append(c);
                i++;
                if (i == maxcharlen)
                {
                    builder.AppendLine("=");
                    i = 0;
                }
            }
            return builder.ToString();
        }

        static string HexDecoderEvaluator(Match m)
        {
            if (String.IsNullOrEmpty(m.Value)) return null;
            CaptureCollection captures = m.Groups[3].Captures;
            byte[] bytes = new byte[captures.Count];
            for (int i = 0; i < captures.Count; i++)
            {
                bytes[i] = Convert.ToByte(captures[i].Value, 16);
            }
            return UTF8Encoding.UTF8.GetString(bytes);
        }

        static string HexDecoder(string line)
        {
            if (line == null) throw new ArgumentNullException();
            Regex re = new Regex("((\\=([0-9A-F][0-9A-F]))*)", RegexOptions.IgnoreCase);
            return re.Replace(line, new MatchEvaluator(HexDecoderEvaluator));
        }

        public static string Decode(string encodedText)
        {
            if (encodedText == null) throw new ArgumentNullException();
            using (StringReader sr = new StringReader(encodedText))
            {
                StringBuilder builder = new StringBuilder();
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.EndsWith("="))
                        builder.Append(line.Substring(0, line.Length - 1));
                    else
                        builder.Append(line);
                }
                return HexDecoder(builder.ToString());
            }
        }
    }
}
Comparing the two pieces of code, you will see that the refactoring followed a few guidelines:
- Wherever you see pointer-style code such as "char*", the .NET Framework's char array, byte array and Stream can be used instead. For example, you may use calls like:
Encoding.UTF8.GetBytes(textToEncode);
text.ToCharArray();
- To rebuild a string for encoding or decoding, StringBuilder is handy and efficient.
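As a hedged illustration of these guidelines on the decoding side, the sketch below walks the string by index and collects bytes in a MemoryStream, with no pointers and no regular expression. The QpDecodeSketch class and its DecodeString method are hypothetical names, not part of the article's code, and the sketch assumes the input is a single line of ASCII Quoted-Printable text with soft line breaks already removed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch, not the article's implementation.
public static class QpDecodeSketch
{
    public static string DecodeString(string encoded)
    {
        if (encoded == null) throw new ArgumentNullException("encoded");
        using (MemoryStream buffer = new MemoryStream())
        {
            int i = 0;
            while (i < encoded.Length)
            {
                char c = encoded[i];
                // An '=' followed by two hex digits is one escaped byte.
                if (c == '=' && i + 2 < encoded.Length
                    && Uri.IsHexDigit(encoded[i + 1]) && Uri.IsHexDigit(encoded[i + 2]))
                {
                    buffer.WriteByte(Convert.ToByte(encoded.Substring(i + 1, 2), 16));
                    i += 3;
                }
                else
                {
                    // Assumption: any other character is plain ASCII.
                    buffer.WriteByte((byte)c);
                    i++;
                }
            }
            // Reassemble multi-byte characters from the collected bytes.
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }
}
```

Calling QpDecodeSketch.DecodeString("a=3Db=E4=B8=AD") returns "a=b中", because the three escaped bytes E4 B8 AD are decoded together as one UTF-8 character.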
As you can see, following the .NET way of programming may result in shorter code and a simpler algorithm.
Here is the unit test code:
[Test]
public void TestQuotedPrintable()
{
    string text = "Quoted-printable, or QP encoding, is an encoding using printable characters (i.e. alphanumeric and the equals sign \" = \") to transmit 8-bit data over a 7-bit data path. It is defined as a MIME content transfer encoding for use in Internet e-mail." + Environment.NewLine +
        "Any 8-bit byte value may be encoded with 3 characters, an \" = \" followed by";
    string encodedText = QuotedPrintable.Encode(text);
    string decodedText = QuotedPrintable.Decode(encodedText);
    Assert.IsTrue(text == decodedText);
}

[Test]
public void TestQuotedPrintableUnicode()
{
    string text = "Quoted-printable, or QP encoding, is an encoding using printable characters (i.e. alphanumeric and the equals sign \" = \") to transmit 8-bit data over a 7-bit data path. It is defined as a MIME content transfer encoding for use in Internet e-mail." + Environment.NewLine +
        "Any 8-bit byte value may be encoded with 3 characters, an \" = \" followed by中文";
    string encodedText = QuotedPrintable.Encode(text);
    string decodedText = QuotedPrintable.Decode(encodedText);
    Assert.IsTrue(text == decodedText);
}

[Test]
public void TestQuotedPrintableEscape()
{
    string text = "Quoted-printable, or QP encoding,=0D=0A is an encoding using printable characters (i.e. alphanumeric and the equals sign \" = \") to transmit 8-bit data over a 7-bit data path. It is defined as a MIME content transfer encoding for use in Internet e-mail." + Environment.NewLine +
        "# Le PS doit discuter avec François Bayrou,=0A=0D=== selon Moscovici Les valeurs suivies à la Bourse de Paris à la mi-séance L'acidification des océans rend les îles plus vulnérables";
    string encodedText = QuotedPrintable.Encode(text);
    string decodedText = QuotedPrintable.Decode(encodedText);
    Assert.IsTrue(text == decodedText);
}
The full source code, including unit tests, is included in this link of a VS 2005 solution for the vCard parser.
Hint: The code may still be improved for better performance.
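One possible direction for such an improvement, offered only as a sketch: avoid String.Format for every escaped byte by using a hex lookup string, and pre-size the StringBuilder. The QuotedPrintableSketch class below is a hypothetical name, and the method is meant to produce the same output as EncodeString above, although no measurements were taken to confirm any speed gain.

```csharp
using System;
using System.Text;

// Hypothetical sketch of a cheaper EncodeString; not measured, not the author's code.
public static class QuotedPrintableSketch
{
    private const string HexDigits = "0123456789ABCDEF";

    public static string EncodeString(string textToEncode)
    {
        if (textToEncode == null) throw new ArgumentNullException("textToEncode");
        byte[] bytes = Encoding.UTF8.GetBytes(textToEncode);
        // Worst case: every byte becomes a three-character =XX triplet.
        StringBuilder builder = new StringBuilder(bytes.Length * 3);
        foreach (byte b in bytes)
        {
            if (b == 0) continue; // mirrors the article's code, which skips NUL bytes
            if (b < 32 || b > 126 || b == 61)
            {
                // Two table lookups replace String.Format and ToString("X2").
                builder.Append('=');
                builder.Append(HexDigits[b >> 4]);
                builder.Append(HexDigits[b & 0x0F]);
            }
            else
            {
                builder.Append((char)b);
            }
        }
        return builder.ToString();
    }
}
```

For example, QuotedPrintableSketch.EncodeString("a=b中") returns "a=3Db=E4=B8=AD". Whether this is actually faster in a given application would need to be checked with a profiler.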