iTextSharp is a .NET PDF library that lets you generate PDF (Portable Document Format) files. iTextSharp has many other features, but here we use it to generate a PDF from HTML content in ASP.NET MVC.

Is there a .dll I can use that takes a PDF file as input and produces an HTML file as output? I want to convert from PDF to HTML. My colleague says it's very difficult to go step by step, extracting text, fonts, images, margins, links, etc. from the PDF and then creating a new HTML file with the same content. He says it's nearly impossible. So I was wondering: is there a DLL I can add as a reference to do that?

3 Answers

Writing a program to do it is definitely not trivial. If you can't find a .NET library that does this (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get the HTML.

If you have the time to spare and/or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merging, creating, etc.).
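To give an idea of where a do-it-yourself converter with iText might start, here is a minimal sketch assuming iTextSharp 5.x (the `iTextSharp.text.pdf.parser` namespace). It only pulls the plain text of each page and wraps it in very crude HTML. Fonts, images, margins and links are not handled at all, which is exactly the hard part. The class name `PdfTextToHtml` and the page markup are made up for illustration.

```csharp
using System.Net;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper: plain-text extraction only, no layout fidelity.
public static class PdfTextToHtml
{
    // Wraps the text of one page in a <div>, HTML-encoding it first
    // so characters such as '<' and '&' do not break the markup.
    public static string WrapPage(int pageNumber, string pageText)
    {
        return "<div class=\"page\" id=\"page-" + pageNumber + "\"><pre>"
            + WebUtility.HtmlEncode(pageText)
            + "</pre></div>";
    }

    public static string Convert(string pdfPath)
    {
        var html = new StringBuilder("<html><body>");
        var reader = new PdfReader(pdfPath);
        try
        {
            // iText page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                html.Append(WrapPage(page, text));
            }
        }
        finally
        {
            reader.Close();
        }
        html.Append("</body></html>");
        return html.ToString();
    }
}
```

Something like `File.WriteAllText("out.html", PdfTextToHtml.Convert("in.pdf"))` would write the result to disk. Anything beyond text (positioning, styles, images) would need a custom render listener and considerably more work.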


As noted in the comment by Quandary, the PDFSharp library offers a more relaxed license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself and I don't know how the two compare in terms of functionality.


You can download this free tool: PDFToHTML

Then, in your program, just start a new process that runs the executable and pass it the PDF file. I just tested it and it seems to work OK.
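Launching the converter from .NET could look roughly like the sketch below, which uses `System.Diagnostics.Process`. The executable path, the class name `PdfToHtmlRunner`, and the assumption that the tool takes the input PDF path followed by the output HTML path are all illustrative; check the usage text of the build you downloaded before relying on them.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around an external PDF-to-HTML command-line tool.
public static class PdfToHtmlRunner
{
    // Assumption: where the downloaded executable was placed.
    private const string ConverterPath = @"C:\tools\pdftohtml.exe";

    // Quotes both paths so that spaces in file names survive argument parsing.
    public static string BuildArguments(string pdfPath, string htmlPath)
    {
        return "\"" + pdfPath + "\" \"" + htmlPath + "\"";
    }

    public static void Convert(string pdfPath, string htmlPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ConverterPath,
            Arguments = BuildArguments(pdfPath, htmlPath),
            UseShellExecute = false,       // required in order to redirect streams
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Drain stderr before waiting so the tool cannot block on a full pipe.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Converter exited with code " + process.ExitCode + ": " + errors);
            }
        }
    }
}
```

A call such as `PdfToHtmlRunner.Convert(@"C:\docs\in.pdf", @"C:\docs\out.html")` would then block until the tool finishes and throw if it reports a failure.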

If you don't mind paying, Aspose offers a very good solution; it's what we use at my company.

