How can I convert HTML content to PDF with images using iTextSharp in C#?

In this blog I will explain the following:

Conversion of HTML to PDF using iTextSharp.
Adding CSS files while generating the PDF.
Converting HTML images or a canvas to the PDF.
Conversion of a PDF base64 string to a Blob using JavaScript.

About iTextSharp

iTextSharp is a NuGet package used to generate PDFs.
It supports several ways of generating a PDF, such as converting HTML to PDF.
It also supports custom tag processing, which helps when converting HTML that contains images.

In this blog I will use the output of my previous blog as a reference; see that blog for more information.

Html Output

Conversion of HTML to PDF using iTextSharp

NuGet packages:

iTextSharp(v5.5.13.1)
itextsharp.xmlworker(v5.5.13.1)

Create a controller and add an "Index" action method with an "Index" view.

Index.cshtml file

@{
    ViewBag.Title = "Index";
    Layout = "~/Views/Shared/_Layout.cshtml";
}

<h2>Index</h2>

<h4 class="text-primary">Chart rendered from asp.net mvc</h4>
<div style="width: 900px; height: 800px">
    <canvas id="scatterChart" name="Img1"></canvas>
    <button id="downloadPdf">Generate pdf</button>
</div>

Index action method

// GET: Chart
public ActionResult Index()
{
    return View();
}

Note

iTextSharp has some limitations, so I created a second view, "pdf.cshtml", to use for the PDF. I will explain the limitations at the end.

pdf.cshtml file

@{
    ViewBag.Title = "pdf";
}

<h2>Index</h2>

<h4 class="text-primary">Chart rendered from asp.net mvc</h4>
<img id="Img1" src="" />

Explanation

"text-primary" is a Bootstrap class; it is used here to show how to generate a PDF with CSS applied.
I replaced the canvas tag with an img tag so that the chart can be rendered in the PDF.

In your controller, add the POST method below, which returns the PDF as a base64 string.

 [HttpPost]
        public JsonResult GeneratePdf(DownloadPdf downloadPdf)
        {
            PdfImages = downloadPdf.PdfImages;
            string htmlString = HtmlToStringConverter.RenderViewToString(this, "pdf", null);
            var tagProcessors = (DefaultTagProcessorFactory)Tags.GetHtmlTagProcessorFactory();
            tagProcessors.RemoveProcessor(HTML.Tag.IMG); // remove the default processor
            tagProcessors.AddProcessor(HTML.Tag.IMG, new CustomImageTagProcessor()); // use our new processor

            var output = new MemoryStream();
           // css files code resolves the css while generating pdf
            List<string> cssFiles = new List<string>();
            cssFiles.Add(@"/Content/bootstrap.css");
            cssFiles.Add(@"/Content/Site.css");

            // Pass the HTML through unchanged: string.Format would throw on the '{' and '}' characters common in HTML/CSS
            var input = new MemoryStream(Encoding.UTF8.GetBytes(htmlString));
            var document = new Document();
            var writer = PdfWriter.GetInstance(document, output);
            writer.CloseStream = false;
            document.Open();
            var htmlContext = new HtmlPipelineContext(null);

            // htmlContext.SetTagFactory(iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory());

            htmlContext.SetTagFactory(tagProcessors);
            //map the css files to apply the css styles in the pdf.
            ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
            cssFiles.ForEach(i => cssResolver.AddCssFile(System.Web.HttpContext.Current.Server.MapPath(i), true));

            var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
            var worker = new XMLWorker(pipeline, true);
            var p = new XMLParser(worker);
            p.Parse(input);
            document.Close();
            output.Position = 0;

            // Encode the PDF bytes as base64 so they can be returned in the JSON response
            string base64String = Convert.ToBase64String(output.ToArray());
            return new JsonResult { Data = new { success = true, pdfString = base64String } };
        }

In the above code, the following lines add the CSS files while generating the PDF:

// css files code resolves the css while generating pdf
List<string> cssFiles = new List<string>();
cssFiles.Add(@"/Content/bootstrap.css");
cssFiles.Add(@"/Content/Site.css");

//map the css files to apply the css styles in the pdf.
ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
cssFiles.ForEach(i => cssResolver.AddCssFile(System.Web.HttpContext.Current.Server.MapPath(i), true));

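Add the "PdfImages" property to your controller. Note that it is static, so it is shared across requests, and concurrent requests can overwrite each other's images.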

 public static List<Models.PdfImage> PdfImages { get; set; }

Add a model class "DownloadPdf" which accepts the URLs of the images along with their keys (ids).

DownloadPdf.cs file

public class DownloadPdf
{
    public List<PdfImage> PdfImages { get; set; }
}

public class PdfImage
{
    public string Key { get; set; }

    public string Base64ImgUrl { get; set; }
}
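
To make the shape of this model concrete, here is a minimal sketch of the JSON body that would bind to "DownloadPdf". The helper name buildPdfRequest and the fakeCanvas stand-in are hypothetical and exist only for illustration; the property names come from the model class above.

```javascript
// Hypothetical helper (not part of the original project): builds a request body
// whose property names match the DownloadPdf / PdfImage model classes.
function buildPdfRequest(sources) {
    return {
        PdfImages: sources.map(function (source) {
            return {
                // The key must match the id of the <img> tag in pdf.cshtml
                Key: source.getAttribute("name"),
                Base64ImgUrl: source.toDataURL("image/png")
            };
        })
    };
}

// Stand-in for a real <canvas> element so the sketch also runs outside a browser.
var fakeCanvas = {
    getAttribute: function (attributeName) {
        return attributeName === "name" ? "Img1" : null;
    },
    toDataURL: function () {
        return "data:image/png;base64,iVBORw0KGgo=";
    }
};

var requestBody = JSON.stringify(buildPdfRequest([fakeCanvas]));
console.log(requestBody);
```

In a browser, the array passed to buildPdfRequest would hold real canvas elements, one per chart on the page.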

Now use the static class HtmlToStringConverter to render your view to an HTML string.

public static class HtmlToStringConverter
    {
        public static string RenderViewToString(this Controller controller, string viewName, object model)
        {
            var context = controller.ControllerContext;
            if (string.IsNullOrEmpty(viewName))
            {
                viewName = context.RouteData.GetRequiredString("action");
            }

            var viewData = new ViewDataDictionary(model);

            using (var sw = new StringWriter())
            {
                var viewResult = ViewEngines.Engines.FindPartialView(context, viewName);
                var viewContext = new ViewContext(context, viewResult.View, viewData, new TempDataDictionary(), sw);
                viewResult.View.Render(viewContext, sw);

                return sw.GetStringBuilder().ToString();
            }
        }
    }

Note (recommended)

Place the HtmlToStringConverter static class in the same namespace as your controller, so that it can be used without an extra using directive.

Now create a "CustomImageTagProcessor.cs" file. This class plays the key role in converting HTML images or a canvas to the PDF.

 public class CustomImageTagProcessor : iTextSharp.tool.xml.html.Image
    {
        public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
        {
            IDictionary<string, string> attributes = tag.Attributes;
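            string id;

            // Without an id there is no key to look up, so render nothing for this tag
            if (!attributes.TryGetValue(HTML.Attribute.ID, out id))
                return new List<IElement>(1);

            // Find the image posted for this id; fall back to the default processor if there is none
            var pdfImage = ChartController.PdfImages == null
                ? null
                : ChartController.PdfImages.FirstOrDefault(e => e.Key == id);
            if (pdfImage == null || string.IsNullOrEmpty(pdfImage.Base64ImgUrl))
                return base.End(ctx, tag, currentContent);

            string src = pdfImage.Base64ImgUrl;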

            if (src.StartsWith("data:image/", StringComparison.InvariantCultureIgnoreCase))
            {
                // Everything after the first comma of the data URL is the base64 payload
                var base64Data = Regex.Match(src, @"data:image/(?<type>.+?),(?<data>.+)").Groups["data"].Value;

                int rem = base64Data.Length % 4;
                switch (rem) // Pad with trailing '='s
                {
                    case 0: break; // No pad chars in this case
                    case 2: base64Data += "=="; break; // Two pad chars
                    case 3: base64Data += "="; break; // One pad char
                    default:
                        throw new System.Exception("Illegal base64url string!");
                }

                var imagedata = Convert.FromBase64String(base64Data);
                var image = iTextSharp.text.Image.GetInstance(imagedata);

                var list = new List<IElement>();
                var htmlPipelineContext = GetHtmlPipelineContext(ctx);
                list.Add(GetCssAppliers().Apply(new Chunk((iTextSharp.text.Image)GetCssAppliers().Apply(image, tag, htmlPipelineContext), 0, 0, true), tag, htmlPipelineContext));
                return list;
            }
            else
            {
                return base.End(ctx, tag, currentContent);
            }
        }

    }

The CustomImageTagProcessor class above resolves the images so that they are embedded in the PDF.
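
The data URL handling in that class can be easier to follow in isolation. The sketch below mirrors the same steps (split at the first comma, then pad the payload to a multiple of four characters) in JavaScript. The function name parseImageDataUrl is hypothetical, and this is an illustration of the logic rather than the code the processor runs.

```javascript
// Hypothetical helper: splits an image data URL into its MIME type and base64 payload.
function parseImageDataUrl(dataUrl) {
    var match = /^data:(image\/[^;,]+)[^,]*,(.+)$/i.exec(dataUrl);
    if (!match) {
        return null; // not an inline image, so there is nothing to decode
    }

    var base64 = match[2];
    var remainder = base64.length % 4;
    if (remainder === 1) {
        // A valid base64 string can never leave a remainder of 1
        throw new Error("Illegal base64 string");
    }
    if (remainder > 0) {
        // Pad with trailing '=' characters up to a multiple of four
        base64 += "=".repeat(4 - remainder);
    }
    return { mimeType: match[1], base64: base64 };
}

var parsedImage = parseImageDataUrl("data:image/png;base64,iVBORw0KGgo");
console.log(parsedImage.mimeType); // image/png
console.log(parsedImage.base64);   // iVBORw0KGgo=
```

Data URLs produced by a canvas are normally already padded, so the padding step mostly guards against payloads that were trimmed in transit.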

The final JSON result contains "pdfString", which is the PDF in base64-encoded form. Use a JavaScript file to call the "GeneratePdf" POST method, capture the JSON result, convert "pdfString" to a Blob, and then render it in the browser or download it.

JavaScript file

$(document).ready(function () {
    $("#downloadPdf").click(function () {
        console.log("button clicked");
        var pdfImages = [];
        var pdfImage = new Object();
        console.log($("#scatterChart"));
        pdfImage["Key"] = $("#scatterChart").attr("name");
        // toDataURL takes the image MIME type as its first argument
        pdfImage["Base64ImgUrl"] = $("#scatterChart")[0].toDataURL("image/png");
        console.log(pdfImage["Base64ImgUrl"]);
        pdfImages[0] = pdfImage;
        GetPdfString(pdfImages);
    })
})

var GetPdfString = function (pdfImages) {
    var jsonObject = new Object;
    jsonObject["PdfImages"] = pdfImages;
    $.ajax({
        url: '/Chart/GeneratePdf',
        data: JSON.stringify(jsonObject),
        type: "post",
        dataType: "json",
        contentType: "application/json",
        success: function (response) {
            ConvertPdfStringToPdf(response.pdfString);
        }
    })
}

var ConvertPdfStringToPdf = function (pdfString) {
    // Json.pdfString is base64 encoded

    const binaryString = window.atob(pdfString);
    const len = binaryString.length;
    const bytes = new Uint8Array(len);
    for (let i = 0; i < len; ++i) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    var file = new Blob([bytes], { type: 'application/pdf' });

    if (window.navigator && window.navigator.msSaveOrOpenBlob) {
        // Legacy IE/Edge: hand the blob to the browser and stop here
        window.navigator.msSaveOrOpenBlob(file, "test.pdf");
        return;
    }

    var fileURL = URL.createObjectURL(file);
    let aEle = document.createElement('a');
    aEle.href = fileURL;
    aEle.setAttribute("target", "_blank");
    //aEle.download = "test.pdf";
    aEle.click();
}

ConvertPdfStringToPdf(pdfString) converts the base64 PDF string to a Blob using JavaScript.
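
The decoding step can be checked on its own with a tiny input. The sketch below assumes an environment that provides a global atob (browsers and recent Node.js versions); the function name base64ToBytes is hypothetical.

```javascript
// Hypothetical helper: decodes a base64 string into raw bytes.
function base64ToBytes(base64) {
    var binaryString = atob(base64);
    var bytes = new Uint8Array(binaryString.length);
    for (var i = 0; i < binaryString.length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes;
}

// "JVBERi0=" is the base64 form of "%PDF-", the marker at the start of a PDF file.
var pdfBytes = base64ToBytes("JVBERi0=");
var pdfHeader = String.fromCharCode.apply(null, pdfBytes);
console.log(pdfHeader); // %PDF-

// Wrapping the bytes in a Blob is what lets the browser open or download them.
if (typeof Blob !== "undefined") {
    var pdfBlob = new Blob([pdfBytes], { type: "application/pdf" });
    console.log(pdfBlob.size); // 5
}
```

Checking that the decoded bytes start with "%PDF-" is a quick way to confirm that the server returned a PDF and not an error page.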

Pdf Output:

Limitations of iTextSharp

I used two files: Index.cshtml to render the view, and pdf.cshtml to render the HTML that is converted to PDF.
iTextSharp doesn't understand scripts, so I removed the layout from the pdf.cshtml file. Your HTML must also be well formed. For example, iTextSharp won't accept <img href="#" src=""> without an end tag; it should be written as either <img /> or <img></img>.
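
As a rough illustration of the well-formedness rule, the sketch below self-closes a few common void tags with a regular expression. The function name closeVoidTags is hypothetical, the regular expression does not handle a ">" inside an attribute value, and in this project the HTML is rendered on the server, so the same idea would have to be applied in C# before parsing. Writing the view with closed tags in the first place remains the simpler fix.

```javascript
// Hypothetical helper: rewrites unclosed void tags such as <img ...> as <img ... />.
// Assumption: attribute values do not contain a '>' character.
function closeVoidTags(html) {
    return html.replace(/<(img|br|hr|input|meta|link)\b([^>]*?)\s*\/?>/gi, "<$1$2 />");
}

console.log(closeVoidTags('<img href="#" src="">')); // <img href="#" src="" />
console.log(closeVoidTags("<p>a<br>b</p>"));         // <p>a<br />b</p>
```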

Thanks & Regards,

PRADEEP KUMAR BAISETTI

Associate Trainee – Enterprise Application Development 

Digital Transformation

w. www.mouritech.com