Matthew Pavey's Blog

Matthew Pavey is a follower of Christ, devoted husband and father, avid reader, and software developer.

Thursday, April 9, 2015

ASP.Net C# Convert HTML to PDF using iTextSharp

An answer on Stack Overflow was a life-saver in a recent project where we needed to convert HTML to PDF.

We were using C# and needed to convert a well-formed string of HTML to a PDF file. We used iTextSharp (5.5.5) and itextsharp.xmlworker (5.5.5), both available through the NuGet Package Manager in Visual Studio 2013. Starting from the working example in the Stack Overflow answer, we ended up with the following:

    using System;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.tool.xml;

    public static ReturnValue ConvertHtmlToPdfAsBytes(string HtmlData)
    {
        // variables
        ReturnValue Result = new ReturnValue();

        // do some additional cleansing to handle some scenarios that are out of control with the html data
        // NOTE: the two string arguments did not survive in the published post; closing
        // unclosed <br> tags is an assumption about what they were. The post called a
        // ReplaceValue helper here; the standard string.Replace is used instead.
        HtmlData = HtmlData.Replace("<br>", "<br />");

        // convert html to pdf
        try
        {
            // create a stream that we can write to, in this case a MemoryStream
            using (var stream = new MemoryStream())
            {
                // create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
                using (var document = new Document())
                {
                    // create a writer that's bound to our PDF abstraction and our stream
                    using (var writer = PdfWriter.GetInstance(document, stream))
                    {
                        // open the document for writing
                        document.Open();

                        // read html data to StringReader
                        using (var html = new StringReader(HtmlData))
                        {
                            XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);
                        }

                        // close document
                        document.Close();
                    }
                }

                // get bytes from stream
                Result.Data = stream.ToArray();

                // success
                Result.Success = true;
            }
        }
        catch (Exception ex)
        {
            Result.Success = false;
            Result.Message = ex.Message;
        }

        // return
        return Result;
    }

The ReturnValue class was simply a helper class that looks like this:

    // return value class
    public class ReturnValue
    {
        // constructor
        public ReturnValue()
        {
            this.Success = false;
            this.Message = string.Empty;
        }

        // public fields
        public bool Success = false;
        public string Message = string.Empty;
        public byte[] Data = null;
    }


We also had a second method that writes the PDF to disk, for cases where you want a file rather than just the byte array:

    public static ReturnValue ConvertHtmlToPdfAsFile(string FilePath, string HtmlData)
    {
        // variables
        ReturnValue Result = new ReturnValue();

        try
        {
            // convert html to pdf and get bytes array
            Result = ConvertHtmlToPdfAsBytes(HtmlData: HtmlData);

            // check for errors
            if (!Result.Success)
            {
                return Result;
            }

            // create file
            File.WriteAllBytes(path: FilePath, bytes: Result.Data);

            // result
            Result.Success = true;
        }
        catch (Exception ex)
        {
            Result.Success = false;
            Result.Message = ex.Message;
        }

        // return
        return Result;
    }

It's important to remember that this only works with valid, well-formed HTML; otherwise you can expect iTextSharp to throw an exception. But if you have control over the HTML that you need to convert, this solution is great and produces very nice PDF files.
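
If you want to catch malformed markup before it ever reaches iTextSharp, one option is a quick pre-check with the XML parser that ships with .NET. The sketch below is not part of iTextSharp, and the HtmlPrecheck class and IsWellFormed method are hypothetical names made up for this example. It assumes the HTML is a complete document with a single root element, and it is stricter than XMLWorker: HTML-only named entities such as &nbsp; will be rejected here even if the converter would have coped with them.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of iTextSharp.
public static class HtmlPrecheck
{
    // Returns true when the markup parses as well-formed XML.
    // On failure, 'error' carries the parser's message (line and position included).
    public static bool IsWellFormed(string htmlData, out string error)
    {
        if (string.IsNullOrWhiteSpace(htmlData))
        {
            error = "No HTML was supplied.";
            return false;
        }

        try
        {
            // XDocument.Parse throws XmlException on unclosed or mismatched tags
            XDocument.Parse(htmlData);
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

Calling HtmlPrecheck.IsWellFormed(HtmlData, out message) before the conversion lets you report the parser's message, which usually points at the offending line, instead of a failure from deep inside the PDF code.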

It's worth noting that in our case we didn't need to pass the CSS in separately using the ParseXHtml overload that takes a stylesheet stream, ParseXHtml(PdfWriter writer, Document doc, Stream inp, Stream inCssFile). We included our CSS styles in the HTML data string instead, which for our solution was a bit cleaner.
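
For anyone who does keep the stylesheet separate, here is a rough sketch of how that overload could be called. We did not use this in our project, so treat it as untested against real-world CSS. The PdfWithCss class, the Convert method, and the choice of UTF-8 for both streams are assumptions made for the example.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical wrapper class; only the ParseXHtml overload comes from the library.
public static class PdfWithCss
{
    public static byte[] Convert(string htmlData, string cssData)
    {
        using (var stream = new MemoryStream())
        {
            using (var document = new Document())
            using (var writer = PdfWriter.GetInstance(document, stream))
            // the overload expects streams, so wrap both strings (UTF-8 is an assumption)
            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlData)))
            using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(cssData)))
            {
                document.Open();

                // same helper as before, with the stylesheet passed as a fourth argument
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlStream, cssStream);

                document.Close();
            }

            // MemoryStream.ToArray still works after the document has closed the stream
            return stream.ToArray();
        }
    }
}
```

This version lets exceptions propagate to the caller rather than wrapping them in a ReturnValue, purely to keep the sketch short.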

Matt Pavey is a Microsoft Certified software developer who specializes in ASP.Net, VB.Net, C#, AJAX, LINQ, XML, XSL, Web Services, SQL, jQuery, and more. Follow on Twitter @matthewpavey