PDF Rendering Service
Technologies Used
MongoDB, Redis, Chromium (via puppeteer), Handlebars.js, npm, socket.io
Goal
Generating and mailing letters is a common task for every business, and my current employer is no exception. Many letters are created using Mail Merge or a third-party application – processes that are inaccessible to the Node.js services we develop.
The first service to require PDF rendering was one that created monthly statements for mortgage borrowers. The original implementation used the now-deprecated PhantomJS headless browser and was baked into the single-threaded service. Additionally, there was only one template for every statement.
As more services were developed that also needed to generate PDFs, and as the business defined additional borrower statement templates, it became clear that an external, general-purpose service was needed. I identified several requirements for this improved service:
- Rendering must be handled by worker processes, allowing the service to scale
- PhantomJS must be replaced by Chromium (via puppeteer)
- Templates must be self-contained and follow a directory structure convention so they can be uniformly stored and used by the service
- Templates must continue to use Handlebars and support LESS compilation
- External assets, such as images and stylesheets, must be inlined with the Handlebars markup so rendered HTML contains all required data. This allows PDFs to be rendered immediately on page load.
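The inlining requirement can be illustrated with a base64 data URI, which lets an image travel inside the markup so the page needs no further requests before it is rendered. This is a minimal sketch, not the service's actual inlining code; the function names `toDataUri` and `inlineImageTag` are hypothetical.

```javascript
// Hypothetical helper: encode a binary asset as a data URI.
function toDataUri(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString('base64')}`;
}

// Hypothetical helper: build an <img> tag that carries its own image data.
function inlineImageTag(buffer, mimeType, altText) {
  return `<img alt="${altText}" src="${toDataUri(buffer, mimeType)}">`;
}

// Placeholder bytes stand in for a real image here; in a template build the
// buffer would come from reading a file in the assets directory.
const html = inlineImageTag(Buffer.from('hi'), 'image/png', 'logo');
console.log(html);
```

Stylesheets can be handled in the same spirit by placing the compiled CSS inside a `<style>` element rather than linking to it.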
Approach
Five repositories constitute the overall PDF Rendering ecosystem:
Repository | Description |
svc-pdfrender | Master process. Presents a REST API and a websocket interface for render status updates. Submits render jobs to workers via a Redis queue. |
svc-pdfrender-worker | Worker process. Receives jobs from the master via the render job queue, emits status updates, and saves final results to GridFS. |
common-pdfrender | Contains code shared between the master and worker processes, including methods for working with template and render files. |
cli-pdfrender | Development tool for creating and publishing templates. |
lib-pdfrender | Abstraction library for services to interact with svc-pdfrender. |
Template Development
A finished template is packaged as a tarball so it can be stored in MongoDB's GridFS and used by the worker processes. Templates are written using the Handlebars templating engine, which supports partials and custom helper functions. For the worker processes to handle partials and helpers properly, the template must conform to a specific directory structure.
Path | Description |
.env | Defines environment variables for setting the URI of the rendering service and the local dev port. |
assets | Contains non-HTML assets, such as images and stylesheets, which are referenced relatively in the template and inlined by cli-pdfrender. |
partials | Handlebars partials live in this directory, and partial names are registered using file names. |
data.json | Sample data to feed into the template for testing purposes. |
helpers.js | Node module that must export a function that receives hbs as its only argument. This function may require external node modules and is used to register helper functions. |
index.hbs | Entrypoint for the Handlebars template. |
package.json | npm-style manifest for the template. In addition to defining a template name and leveraging semantic versioning, this file defines a time-to-live (TTL) for the template so non-published templates are auto-purged, along with npm dependencies required by the helpers.js function. |
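To make the helpers.js contract concrete, here is a minimal sketch of a module that exports a single function receiving the Handlebars instance. The helper names `currency` and `upper` are hypothetical and chosen only for illustration; `registerHelper` is the standard Handlebars method for registering a helper.

```javascript
// helpers.js (sketch): export a function that receives the Handlebars
// instance and registers helpers on it.
function registerHelpers(hbs) {
  // Hypothetical helper: format a number with two decimals and a dollar sign,
  // used in a template as {{currency balance}}.
  hbs.registerHelper('currency', (value) => `$${Number(value).toFixed(2)}`);

  // Hypothetical helper: upper-case a string, used as {{upper lastName}}.
  hbs.registerHelper('upper', (value) => String(value).toUpperCase());
}

module.exports = registerHelpers;
```

A helper that needs a third-party package would `require` it at the top of this file and list it under the dependencies in the template's package.json.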
While developing a template, cli-pdfrender can be used to watch the template directory and render changes as they are applied. Once a template is uploaded and renders successfully, it is stored in GridFS and a websocket event is broadcast with the template_id and render_id. In development mode, cli-pdfrender launches a local Chrome instance and displays the rendered template in an iframe once a render event is received via websocket. The CLI can also publish a template by setting its TTL to 0.
Template Usage
Dependent services use lib-pdfrender to interact with svc-pdfrender. This library abstracts REST calls and websocket events so the user only needs to specify a renderer URI and template name when constructing a PDFRender object, listen for completed and error events, and call the render method with the data to be rendered.
    // PDFRender is provided by lib-pdfrender
    // instantiation
    const letter555Renderer = new PDFRender('https://renderer.example.com', 'letter_555');

    // listen for completed events
    letter555Renderer.on('completed', (renderJob) => {
      // renderJob contains the URIs where you can download the html, pdf and metadata
      console.dir(renderJob.uris);
      // do what you'd like with the uris to the files.
      // the renderJob class also contains a convenience method for downloading
      // the pdf to a custom directory (outputDir is supplied by the caller)
      return renderJob.download(outputDir)
        .then((outPath) => console.log(`pdf downloaded to ${outPath}`));
    });

    letter555Renderer.on('error', (err) => {
      // handle error
    });

    // initialize to have the templateID set up
    letter555Renderer.init().then(() => {
      // ready to render
    }).catch((err) => {
      // init errors will also be catchable here
    });

    const data = [{}, {}];
    data.forEach((d) => {
      // render jobs will be queued until the websocket connection to the renderer is established
      return letter555Renderer.render(d).then((job) => {
        // listen for job-specific events here
      });
    });
When initialized, the PDFRender instance uses the provided template name to look up the corresponding template_id. Optionally, a specific template version may be provided. If there is a conflicting version number for any reason, an incremental build number is used.
To submit a render job, the render method issues a POST request using the template_id and sets the request body to a JSON object which is used to populate the template. The service immediately returns a render_id, and the library listens via websocket for render and error events related to that render_id. Those events are relayed through JavaScript events for consumption by the dependent service, which is expected to download the rendered PDF and/or HTML immediately. Render files are auto-purged 10 minutes after creation.
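The relay step described above might look roughly like the following sketch, which filters socket events by render_id and re-emits them as local JavaScript events. It is not the library's actual code. The socket event names (`render`, `error`), the `render_id` payload field, and the function name `relayRenderEvents` are all assumptions made for illustration.

```javascript
const { EventEmitter } = require('events');

// Sketch: relay socket events for one render_id as local JavaScript events.
// Assumption: the socket emits 'render' and 'error' events whose payload
// carries a render_id field. These names are illustrative only.
function relayRenderEvents(socket, renderId) {
  const job = new EventEmitter();

  const onRender = (payload) => {
    if (payload.render_id !== renderId) return; // event belongs to another job
    cleanup();
    job.emit('completed', payload);
  };
  const onError = (payload) => {
    if (payload.render_id !== renderId) return;
    cleanup();
    // Callers must attach an 'error' listener, or Node will throw here.
    job.emit('error', payload);
  };

  // Stop listening once this job has reached a final state.
  function cleanup() {
    socket.off('render', onRender);
    socket.off('error', onError);
  }

  socket.on('render', onRender);
  socket.on('error', onError);
  return job;
}
```

Removing the listeners after the first matching event keeps a long-lived socket from accumulating one pair of handlers per render job.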
Worker Processes
While the CLI tool inlines assets into the Handlebars files, partials and helpers still need to be registered by the worker at render time. Registering partials is straightforward – read each partial and call the registration method, passing in the filename less its .hbs extension.
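Partial registration as described can be sketched in a few lines. This is an illustrative version rather than the worker's actual code: the function name `registerPartials` is hypothetical, and `hbs` is assumed to be a Handlebars instance, which exposes `registerPartial(name, source)`.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: register every .hbs file in a partials directory, using the file
// name without its extension as the partial name.
function registerPartials(hbs, partialsDir) {
  const registered = [];
  for (const file of fs.readdirSync(partialsDir)) {
    if (path.extname(file) !== '.hbs') continue; // skip non-template files
    const name = path.basename(file, '.hbs');
    const source = fs.readFileSync(path.join(partialsDir, file), 'utf8');
    hbs.registerPartial(name, source);
    registered.push(name);
  }
  return registered; // names that were registered, useful for logging
}
```

With this convention, a file named partials/header.hbs becomes available in templates as `{{> header}}`.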
Helpers are more complex because they are registered by the function exported from helpers.js. This module may require npm modules, which need to be installed prior to registration. Each worker uses npm programmatically to read dependencies from the template directory and install them to a working directory prior to registration.
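The dependency step could be approximated as below. Note the substitution: the workers described here call npm programmatically, whereas this sketch shells out to the npm CLI with `npm install --no-save --prefix <dir>`. The function names `dependencySpecs` and `installDependencies` are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Sketch: turn a package.json "dependencies" object into npm install specs,
// e.g. { moment: '^2.29.0' } becomes ['moment@^2.29.0'].
function dependencySpecs(manifest) {
  return Object.entries(manifest.dependencies || {})
    .map(([name, range]) => `${name}@${range}`);
}

// Sketch: install a template's dependencies into a working directory.
// Spawning the npm CLI is a stand-in for the programmatic npm usage
// described in the text.
function installDependencies(manifest, workDir) {
  const specs = dependencySpecs(manifest);
  if (specs.length === 0) return { status: 0 }; // nothing to install
  return spawnSync(
    'npm',
    ['install', '--no-save', '--prefix', workDir, ...specs],
    { stdio: 'inherit' }
  );
}
```

Only after the install succeeds (a zero `status`) would a worker load helpers.js and call its exported function with the Handlebars instance.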
Potential Improvements
This service was originally written in just one month, so there are a few points for improvement:
- cli-pdfrender enforces a specific directory structure but does not provide an init command to start a new template
- Helper registration runs arbitrary JavaScript. This is a significant security concern, one that was de-prioritized because the service lives on an internal network. Future versions should sandbox execution of helpers.js so malicious code cannot hijack the worker processes.
- Additional templating engines beyond Handlebars, such as Pug, should be supported, as well as other style languages like SCSS