A while back, I posted a call to share your case studies. Today, I am pleased to present the first one. One of the reasons I wanted to start this series is that I think it is helpful to the community to see how people are using Python to solve real problems and how these solutions can be helpful to a broad cross-section of people.
For this particular case study, we are fortunate enough to have a full open source Python solution for converting data into PDF. Without further fanfare, I would like to turn the discussion over to Vasudev Ram, the creator of xtopdf.
Interview

Before we get started, can you explain what xtopdf is?
xtopdf contains tools to convert various file formats (collectively called x) to PDF format. The PDF files created are viewable and printable by any PDF viewer/reader, such as Adobe Acrobat Reader, XPDF, GhostView, and KGhostView.
xtopdf currently supports the following input formats:
- Plain text (not including PostScript, which is also a kind of “text”)
- DBF files (XBase files)
- CSV files (Comma Separated Values files)
- TDV files (Tab Delimited Values files)

Can you give us a little background on the problem you were trying to solve when you started working on xtopdf?
For xtopdf, I did not start with any existing problem - although as it turns out, it seems to be of use to some organizations, so I guess you could say it solves some problems, post facto.
xtopdf got started after I had read an article about ReportLab in IBM developerWorks in 2002 or so. It was by Cameron Laird, of phaseit.net. Phaseit is a boutique consultancy, and he has published a lot of articles on software topics in well-known publications like Unix Review and other magazines. He also maintains some pages on PDF technologies on his site.
His article’s title was “PDF on the Server” and it showed how to use ReportLab with Python to dynamically generate a PDF on the server and send it to a client. This was some years ago. I liked the idea of programmatic PDF generation, so I downloaded the open source version of ReportLab and played around with it. As part of that, since I knew how to read data from a DBF (XBASE) file (having done it before in C), I wrote a tool called DBFtoPDF, which reads both the metadata and the data records of a DBF file, and publishes both to PDF, neatly formatted for human consumption, using ReportLab. That was done using raw ReportLab and Python. ReportLab is a powerful PDF library, but to use it in the way that I did initially, you have to use its low-level primitives, which is somewhat tedious.
But then, being a developer, I worked on refactoring the code and making it more general, and split the single DBFtoPDF program into a module for DBF reading and another one for PDF writing. Then I realized that each was usable independently.
Since the PDF writing part was now a separate module, it was easy to plug it together with other modules for reading / extracting data from various other sources, such as CSV, TSV, XLS, many databases, files on the Net, etc.
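The split described above can be sketched as follows. This is only an illustration of the idea, not xtopdf's actual code or API: the names read_delimited, ListWriter and convert are hypothetical, and ListWriter is a stand-in that collects lines in memory instead of writing a PDF.

```python
import csv
import io


def read_delimited(stream, delimiter=","):
    """Yield each record of a delimited text stream as one display line."""
    for row in csv.reader(stream, delimiter=delimiter):
        yield " | ".join(row)


class ListWriter:
    """Hypothetical stand-in for a PDF writer: it only collects lines."""

    def __init__(self):
        self.lines = []

    def write_line(self, line):
        self.lines.append(line)


def convert(lines, writer):
    """Feed lines from any reader into any writer; return the line count."""
    count = 0
    for line in lines:
        writer.write_line(line)
        count += 1
    return count


# Two different input formats go through the same writer.
csv_source = io.StringIO("name,qty\nwidget,3\n")
tsv_source = io.StringIO("name\tqty\ngadget\t5\n")

writer = ListWriter()
convert(read_delimited(csv_source), writer)
convert(read_delimited(tsv_source, delimiter="\t"), writer)
print(writer.lines)
```

Because the writer only needs a write_line method in this sketch, a reader for a new source can be added without touching the output side, which is the kind of independence the refactoring made possible.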
That was the start of xtopdf as a project and generic toolkit rather than just being a one-off tool for a single purpose. Then over a period of time, I worked on adding support for many other data formats / input sources to xtopdf. See this presentation for more information. The presentation also shows how to use xtopdf in command-line, desktop GUI and web environments. I also wrote proof of concept programs for all of these areas (and some of them are useful as tools in their own right, without further modification), added them one by one to the package, and published various blog posts about this over time.
Early on, I also released xtopdf as an open source project on SourceForge.net, under the New BSD License. (Later on, I moved it to Bitbucket.) I also posted announcements about xtopdf now and then in various relevant forums such as comp.lang.python, comp.lang.python.announce, and comp.text.pdf - all Usenet newsgroups - and on the ReportLab mailing list.
Also, in my regular browsing of various tech sites, forums and bulletin boards on topics like Python and PDF, and on various programmers’ blogs which I used to read, whenever there was a discussion that was relevant to PDF generation, I would sometimes mention xtopdf, say what it could do, and give a link to it.
I suppose, due to this publicizing that I did over time, people got to know about it, and some of them (like Packt, SFLC and ESRI) must have tried it out, and some found it useful.
How has your solution changed over time?
Over time and with enhancements, xtopdf has become a somewhat higher-level abstraction over ReportLab, but for only a subset of ReportLab’s functionality - writing text content (from any source) to PDF, in the form of a report, with headers, footers, page numbers, pagination, etc. That is, text-oriented reports. It is somewhat more convenient and productive to use than ReportLab for this intended subset of use, IMO.
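To make the "report chrome" concrete, here is a minimal sketch of paginating text with a header, a footer and page numbers. It is plain Python with no PDF output, and it is not how xtopdf or ReportLab implement this: the function name paginate and its parameters are assumptions made up for illustration.

```python
def paginate(lines, header, footer, lines_per_page=3):
    """Split lines into pages, each wrapped in a header and a numbered footer."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    lines = list(lines)
    # Always produce at least one page, even for empty input.
    chunks = [lines[i:i + lines_per_page]
              for i in range(0, len(lines), lines_per_page)] or [[]]
    pages = []
    for number, chunk in enumerate(chunks, start=1):
        page = [header] + chunk + [f"{footer} - page {number} of {len(chunks)}"]
        pages.append(page)
    return pages


report = paginate(["row 1", "row 2", "row 3", "row 4"],
                  header="Sales report", footer="Confidential")
for page in report:
    print("\n".join(page))
    print("-" * 20)
```

A higher-level layer that handles this bookkeeping once means the calling code only supplies the text lines, which is the convenience described above.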
How would you describe the process improvements you have been able to make by using xtopdf?
I would not say that there is anything very innovative or radically new in xtopdf or my use of Python for xtopdf. The concepts involved in it, such as conversion of data from one format to another and report generation with the chrome I mentioned, are all well-known techniques. The utility of xtopdf lies more in putting them all together. However, the users I mentioned above do find it useful. Also, I’ve discussed xtopdf with Cameron, who has lots of experience in many tech areas, and he has told me that he thinks xtopdf can be useful in (partially or wholly) automating many common reporting workflows that are done manually - common examples being where someone has to generate a report from various organizational data sources, such as a few different databases, spreadsheets and flat files of various formats. In such a case, xtopdf can be used fairly easily to programmatically pull all the needed data out of those sources (the xtopdf package has many PoCs with code for those common sources), combine it in any way you wish using Python’s powerful tools for data manipulation such as lists, dicts, slicing, and other language features, and put it all to PDF, with the kind of formatting that you want.
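As a rough illustration of the "combine it in any way you wish" step, the sketch below merges two made-up sources using a dict, a sort and a slice. The data, the column names and the variable names are all hypothetical, and the final list of report lines is what one would then hand to a PDF writer.

```python
import csv
import io

# Hypothetical inputs: a CSV export of orders and a lookup table of customers.
orders_csv = io.StringIO("customer_id,amount\nc1,120\nc2,75\nc1,30\nc3,200\n")
customers = {"c1": "Acme", "c2": "Bolt", "c3": "Corex"}

# Total the order amounts per customer with a dict.
totals = {}
for row in csv.DictReader(orders_csv):
    customer_id = row["customer_id"]
    totals[customer_id] = totals.get(customer_id, 0) + int(row["amount"])

# Sort by total, largest first, and keep the top two with a slice.
top = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:2]

# Build the text lines of the report from the combined data.
report_lines = [f"{customers[cid]}: {amount}" for cid, amount in top]
print(report_lines)
```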
Do you have any tips or lessons learned that could help others when they find them