Tuesday, November 19, 2024

Formatting HTML for PDF Documents Generated in Power Automate

Microsoft has a pre-release feature out which adds a PDF print function to PowerApps so you can print containers.  It is handy in certain scenarios, but when you generate documents for official policies and procedures, I believe it is worthwhile to retain the source documents used to create them whenever you use any function that translates what you see into what somebody else sees.

Therefore, in most situations where we create documents alongside official processes (HR, Legal, etc.), I believe creating and storing the original HTML is essential, rather than relying on a tool we have zero control over to serve as our record of what was created.

So this covers the older method of generating PDF documents from HTML using Power Automate.

First off, this brings up a few questions around HTML and how to write HTML for documents versus writing HTML for websites.  I even let Google's AI take a stab at giving me an answer on this and it seems this is definitely something that's still very TBD.

So here are a few guidelines of tested methods and formatting that absolutely work when you are creating documents for printing into PDFs from Microsoft's current Convert File -> PDF function in Power Automate.  That does not mean that they necessarily work in formatting within PowerApps, and the HTML Viewer or Rich Text editor might not necessarily work with these, but in most cases they do.  I'll note when they do not.

NOTE: PowerApps doesn't like double quotes inside text values, so you will generally find/replace double quotes with single ones and then everything will work.  Pasting HTML with double quotes directly into a control's Default value won't work from the Advanced tab, but it will from the Display tab.  The Advanced tab takes what you paste literally, while the Display tab auto-converts double quotes to single ones:

Pasting here won't work with double quotes

Pasting here will work fine
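
If you would rather do the find/replace outside of PowerApps before pasting, a minimal Python sketch looks like this.  The function name to_single_quotes is my own, and it assumes none of your attribute values contain an apostrophe (those would need escaping first):

```python
def to_single_quotes(html: str) -> str:
    """Swap every double quote for a single quote so the HTML can be pasted as a text value."""
    # Assumption: no attribute value contains an apostrophe of its own.
    return html.replace('"', "'")


sample = '<h1 style="color:blue;">A Blue Heading</h1>'
print(to_single_quotes(sample))
```

This is the same blunt replacement you would do by hand in a text editor, so check the result if your document text itself contains double quotes.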


Inline CSS

The term CSS is immediately going to put you on the wrong track.  "Cascading Style Sheets" is shortened to CSS and nowadays just refers to whatever additional markup we're kind of sticking on the back-end of HTML to have more options for formatting things.  However, in the case of documents we never really have CSS sheets like many web developers use.  We mainly mean CSS in terms of just the formatting, not literally having some super-document that explains how our other document needs to be formatted.

Because CSS comes from a world where HTML is assumed to be used for a website, every tutorial is going to show you how to put your CSS in some headers, then reference them throughout the document because it is a "best practice".  It isn't.  Never was.  Never will be.  It is a practice, but it is by no means a "best practice".  In particular, this is definitely not a best practice for documents.

For documents, inline CSS is the best practice.

That means inline CSS like this is correct:

<h1 style="color:blue;">A Blue Heading</h1>

And internal CSS like this is wrong:

<style>
h1   {color: blue;}
</style>
<h1>A Blue Heading</h1>

Again, for documents.

Does the second approach work?  Sometimes.  But not reliably.  

Finally, the other "best practice" of having actual dedicated style sheets doesn't work at all.  So this will never work:

<head>
  <link rel="stylesheet" href="styles.css">
</head>
<body>
<h1>Will never be properly formatted in a PDF</h1>

No matter what formatting is inside the styles.css document, it won't work.  

A main selling point of shared style sheets on the web is reuse and smaller pages: define a style once and apply it across many pages.  For a single standalone document, this argument is moot.

The issue with documents (particularly legal ones) is that we need the definitions exactly where the formatting occurs.  If you want your text to be blue, make it blue right there.  Don't make it blue at the top of the page and then have to hunt for where that happens later.

The automated conversion tools (either Microsoft's or even libraries for Python) have more accurate results with inline CSS.  

So if you are looking at examples online, they will mostly be internal CSS examples.  You'll need to alter them to be inline CSS.  

For example, if they show you this:

<style>
p {
  border: 2px solid;
}
</style>

That would normally give every paragraph tag (<p>) a two-pixel solid border.  Instead, we'd add the style individually to any paragraph we want to alter, like this:

<p style='border: 2px solid;'>

Not every tag works this way though, so you will need some trial and error in certain situations.
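
If you have a lot of internal CSS examples to convert, the alteration can be roughed out in a script.  This is a minimal sketch, not a real CSS inliner: the function name inline_simple_rules is hypothetical, it only understands bare tag selectors like p or h1, it leaves tags that already have a style attribute alone, and any other rules in the <style> block are dropped along with the block.

```python
import re


def inline_simple_rules(html: str) -> str:
    """Copy bare tag rules (p, h1, ...) from <style> blocks into style attributes."""
    rules = {}
    for block in re.findall(r"<style[^>]*>(.*?)</style>", html, flags=re.S | re.I):
        for chunk in block.split("}"):
            if "{" not in chunk:
                continue
            selector, body = chunk.split("{", 1)
            selector = selector.strip().lower()
            if selector.isalnum():  # skip classes, ids, and grouped selectors
                # Collapse whitespace so the declarations fit on one line.
                rules[selector] = " ".join(body.split())

    # Remove the <style> blocks now that the simple rules are captured.
    html = re.sub(r"<style[^>]*>.*?</style>\s*", "", html, flags=re.S | re.I)

    def add_style(match: re.Match) -> str:
        tag, attrs = match.group(1), match.group(2)
        css = rules.get(tag.lower())
        if css is None or "style=" in attrs.lower():
            return match.group(0)  # no rule, or the tag already has inline CSS
        return f"<{tag}{attrs} style='{css}'>"

    # Opening tags only; closing tags start with "</" and never match.
    return re.sub(r"<([a-zA-Z][a-zA-Z0-9]*)((?:\s[^<>]*)?)>", add_style, html)


source = """<style>
p {
  border: 2px solid;
}
</style>
<p>First paragraph</p>
<p style='color: blue;'>Already styled</p>"""

print(inline_simple_rules(source))
```

Treat the output as a starting point and still run it through the converter, since the trial and error mentioned above does not go away.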

Document Formatting with Fonts and Margins

As I mentioned above, some tags don't quite work the way you'd want in all scenarios.  Some of the larger document formatting items like spacing around the borders need to be at the top level of the document and don't work otherwise.

<!DOCTYPE html>
<html lang=en>
        <head></head>
        <body style="margin-left:5%; margin-right:5%;font-family: Calibri, sans-serif">

        ...whatever the rest of your document is here...

        </body>
</html>

If you want to adjust margins for the page, you need to include them in the style attribute of the <body> tag.  The PDF converter will use these to set up the page and the entire document with the correct sizing, margins, and font.

Now, you can still change fonts anywhere inside the document.  However, for most documents you want a common font throughout.  Only additional formatting like color, size, emphasis/italics, and bold wind up changing elsewhere within the document.

Setting these values at the top of the document gets you on target and no issues should occur.
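
If you assemble the HTML in a script before handing it to the flow, the wrapper can be a small helper.  This is only a sketch: wrap_document and its margin and font parameters are names I made up, with defaults copied from the example above.

```python
def wrap_document(body_html: str, margin: str = "5%", font: str = "Calibri, sans-serif") -> str:
    """Wrap body content in a skeleton that sets margins and font on <body>."""
    return (
        "<!DOCTYPE html>\n"
        "<html lang=en>\n"
        "<head></head>\n"
        f'<body style="margin-left:{margin}; margin-right:{margin}; font-family: {font}">\n'
        f"{body_html}\n"
        "</body>\n"
        "</html>"
    )


print(wrap_document("<h1 style='color:blue;'>A Blue Heading</h1>"))
```

Keeping the page-level settings in one place like this means every document you generate starts from the same margins and font.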

Table Formatting

Table formatting can be fairly complex, and it is important to remember that the PowerApps controls may not display tables properly even when the PDF does.  So be cautious about using the PowerApps controls to validate table formatting.  You might instead use various online HTML formatting tools to build out how you want it to look and then trust that the PDF converter will work properly.

The HTML Viewer control generally works just fine.  However, the Rich Text Editor does not.  Anything you create in the Rich Text Editor will display as it appears in that editor inside the HTML Viewer.  However, items that display fine inside the HTML Viewer will not necessarily display or work if altered in the Editor.

That being said, you should always test your tables and individual settings to ensure things work properly.  

Specifically, the following items don't work inside a Rich Text Editor within your app but will work fine when converted to PDF:

  • background-color or background
  • height
  • width

Take this as an example:

<table style='height: 45px;' width='650'>
   <tbody>
      <tr>
         <td style='width: 334.712px; background: #981e32;color: white;'>
            <center><strong>Details</strong></center>
         </td>
      </tr>
   </tbody>
</table>

This is perfectly valid HTML, but it may not display properly inside PowerApps.  If you put this inside the HTML Viewer control's default value you get exactly what you'd see in the PDF or any browser: a dark red cell with "Details" centered in bold white text.

But if you drop that same HTML into the default value of a Rich Text Editor, the background and sizing from the list above are not applied.

Plus, if you attempt to edit/save this inside that same editor and attempt to convert it to PDF, then it will strip out some of the values and your PDF will not look correct.
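
Before you let anybody edit a document in the Rich Text Editor, it can help to know whether it leans on any of the properties listed above.  Here is a rough sketch of a checker; the name find_rte_unreliable is hypothetical, the property set is just the list from this post, and it only looks inside inline style attributes (not plain width='650' style attributes).

```python
import re

# Inline style properties this post lists as unreliable in the Rich Text Editor.
RTE_UNRELIABLE = {"background-color", "background", "height", "width"}


def find_rte_unreliable(html: str) -> list:
    """Return the listed properties that appear in any inline style attribute."""
    found = set()
    pattern = r"""style\s*=\s*(['"])(.*?)\1"""
    for _quote, style in re.findall(pattern, html, flags=re.S | re.I):
        for declaration in style.split(";"):
            # The property name is everything before the first colon.
            name = declaration.split(":", 1)[0].strip().lower()
            if name in RTE_UNRELIABLE:
                found.add(name)
    return sorted(found)


table = """<table style='height: 45px;' width='650'>
   <tbody>
      <tr>
         <td style='width: 334.712px; background: #981e32;color: white;'>
            <center><strong>Details</strong></center>
         </td>
      </tr>
   </tbody>
</table>"""

print(find_rte_unreliable(table))
```

An empty list does not prove the Rich Text Editor will behave, so you still need to test, but a non-empty one is a good hint to keep that document out of the editor.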

Images

Web pages often reference standalone images that exist in their own files outside of the HTML itself.  However, you absolutely can embed images directly into your HTML.  In fact, this is one of those scenarios where the Rich Text Editor is your friend.

  • Drop two controls in your PowerApp:  Rich Text Editor, Text Input
  • Change the Text Input from single-line to multiline
  • Change the Text Input Default property to point at your Rich Text Editor's HTMLText property

Now, if you copy/paste an image into your Rich Text Editor you'll see the image converted into base64.  You can take this text from the Text Input now and paste it into your HTML to display the image anywhere on your page:

Except now your HTML is very annoying to read

You will use code similar to this to embed an image directly in your HTML:

<img src="...whatever your image data is...">
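
If you already have the image as a file, you can build the same kind of tag without going through PowerApps.  This is a sketch that assumes the standard data URI form (data:<mime type>;base64,<encoded bytes>); image_to_img_tag is a made-up name, and I have not compared its output character-for-character with what the Rich Text Editor produces.

```python
import base64


def image_to_img_tag(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as base64 and embed them in an <img> tag."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"<img src='data:{mime_type};base64,{encoded}'>"


# Placeholder bytes for illustration only; in practice read your real file,
# e.g. with pathlib.Path(...).read_bytes().
fake_image = b"abc"
print(image_to_img_tag(fake_image))
```

The tag uses single quotes so it can be pasted into PowerApps without another find/replace pass.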

NOTE: With legal and HR documents in particular, you might be including a digitized signature of an official.  The same process above applies for inserting that as well.

The Power Automate Flow

PowerApps now has the ability to print a container that doesn't require a Flow, but this method gives you the ability to keep the source HTML file and the generated PDF and retain them both.

For the Flow itself, it is fairly straightforward.  In this example, all you need to do is pass the filename and the content into your flow.


This generates two files:

  • filename.html
  • filename.pdf

You can put the .html somewhere to retain it for process adjustments and/or error-checking using some text-search tools and then use and possibly retain the .pdf for legal purposes.

At the end of this flow you can email the files, save the source somewhere, etc.  I would highly recommend that when doing this as a part of any legal process you retain these files and store them securely.

NOTE: I will include a more process-oriented flow to save versions of these documents in another blog post.  Sometimes it can help to retain the source files for generating PDFs for legal reasons.

Hyperlinks for Emails and URIs

I've gone back and forth w/ legal and HR teams over the years re: embedded links in documents.  As the years have gone by, it seems they're finally OK with putting these into documents regularly without also spelling out the literal URL somewhere afterward.  That means printed documents won't necessarily show the email/web link, but digital ones will.

Regardless, standard HTML anchor tags work just fine here in all scenarios:

  • <a href="mailto:dieInAFire@copilot.microsoft.com">Send email</a>
  • <a href="https://MicrosoftIsPaddingTheirAIRevenue.com">Click here</a>

Keep in mind, the same rule regarding double quotes vs. single quotes continues to apply here.  If you're going to put this into the Advanced tab or the formula bar at the top of the screen then you'll need to convert all double quotes in your HTML to single quotes:

  • <a href='mailto:dieInAFire@copilot.microsoft.com'>Send email</a>
  • <a href='https://MicrosoftIsPaddingTheirAIRevenue.com'>Click here</a>
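
If the links come from data (a list of contacts, for example), you can generate the single-quoted form directly.  A small sketch, where link is a hypothetical helper and the address is a placeholder; html.escape from the standard library handles any stray quotes or angle brackets in the values:

```python
from html import escape


def link(href: str, text: str) -> str:
    """Build an anchor tag that uses single quotes around the href."""
    # escape() turns quotes, <, >, and & into entities so they cannot break the tag.
    return f"<a href='{escape(href, quote=True)}'>{escape(text)}</a>"


print(link("mailto:someone@example.com", "Send email"))
```

Because the output never contains a double quote, it can go straight into the Advanced tab.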

JavaScript

No.  Just...no.

Final Thoughts

It is tempting to use Microsoft's new method, and I think there are use cases for it.  However, for official documents where we want to keep the source and perhaps send emails/PDFs or produce some future format, HTML source is widely consumable and easy to transfer.

Knowing the basics of HTML formatting for PDFs also translates easily to Outlook emails.  However, I would be careful with images, as different email clients sometimes handle them poorly.

Regardless, formatted HTML gives you a very flexible method for allowing departments to build/edit their own documents to get things 80% of the way there.  It also allows people with different technical capabilities to learn and practice their skills here and understand how these documents are built and designed under the hood.

I can't count the number of times I have been thankful that I generated a source document and a final document so I could easily review/re-send/edit items when certain things changed at the last minute.  Having to go through an elaborate series of integrations to pull documents from a myriad of systems to re-generate them all because your legal team changed their mind about one sentence is something none of us would be happy about when it is far easier to simply alter the source document before rebuilding the PDFs.
