I have been using XML ===XSLT===> LaTeX ===pdflatex/lualatex===> PDF for more than a decade now. The whole pipeline is driven by a batch file that takes all XML files in an input folder, uses a temp folder for the intermediate LaTeX, and writes the PDFs to an output folder.
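As a sketch, the driver could look like the following shell script (the original is a Windows batch file; the folder names follow the description above, while the stylesheet name `to-latex.xsl` is an assumption):

```shell
#!/bin/sh
# Hypothetical pipeline driver: XML -> (XSLT) -> LaTeX -> (lualatex) -> PDF.
# Folder layout as described above; stylesheet name is an assumption.
IN=input
TMP=temp
OUT=output
mkdir -p "$TMP" "$OUT"
for src in "$IN"/*.xml; do
  [ -e "$src" ] || continue                            # skip when input is empty
  base=$(basename "$src" .xml)
  xsltproc -o "$TMP/$base.tex" to-latex.xsl "$src"     # XML -> LaTeX
  lualatex --output-directory="$OUT" "$TMP/$base.tex"  # LaTeX -> PDF
done
```

Running it again after an XSLT change regenerates every PDF in one go, which is exactly the batch-reprocessing step described below.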
I can produce HTML files from the same XML sources directly: XML ===XSLT===> HTML.
For differences between PDF and HTML versions I have some special tags and attributes in my XML sources.
If I want to change something in the layout, I modify the XSLT script and run the old XML sources through the pipeline again in one go.
There was some up-front effort in designing the XML tag system and writing the XSLT scripts, but since my later layout changes were minor, the required tweaks were easy.
A quite simple homegrown DTD. Keeping it as simple as possible keeps the complexity of the XSLT scripts low. Most of it is similar to HTML (<paragraph>, <italics>, <bold>), with a few special attributes to add some semantics or processing hints.[1] Sometimes I use several intermediate steps to produce the final HTML. It can then be useful to version the DTDs with a fixed, REQUIRED version attribute in the root element, which must appear in the XML files, to avoid applying the wrong XSLT script to an outdated version of my XML sources.[2]
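Such a version guard can be enforced inside the stylesheet itself. A minimal sketch (the root element name `document` and the version string `1.2` are assumptions, not my actual DTD):

```xml
<!-- Hypothetical XSLT 1.0 guard: abort the transformation when the
     source's version attribute does not match what this stylesheet expects. -->
<xsl:template match="/document[not(@version = '1.2')]">
  <xsl:message terminate="yes">
    Expected source version 1.2, got: <xsl:value-of select="/document/@version"/>
  </xsl:message>
</xsl:template>
```

With `terminate="yes"`, an outdated source file stops the batch run with an error instead of silently producing wrong output.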
For a customer who needed to import large semi-structured legacy Word documents from another company into a database system, I once implemented the following process: The Word documents were converted to a relatively simple homegrown XML format based on the structural elements of the Word documents. The resulting XML documents were manually corrected where the structural elements were incorrect. Some special attributes were added inside the XML documents to associate text passages with already existing database keys. When this was finished, an XSLT script was applied that split the large XML files into smaller ones based on these database keys; a human-readable prefix, the key, and a date went into each file name. These files were converted in bulk to LaTeX and then to PDF. Afterwards, I used a little tool to bulk-upload only the fresh PDFs into the correct database entries based on the keys in their filenames.
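The split step could be sketched like this in XSLT 2.0 (illustrative only; the element name `section`, the attribute name `dbkey`, and the file-name prefix are assumptions):

```xml
<!-- Hypothetical XSLT 2.0 split: emit one result file per keyed section,
     with prefix, database key, and date in the file name. -->
<xsl:template match="section[@dbkey]">
  <xsl:result-document
      href="report_{@dbkey}_{format-date(current-date(), '[Y0001]-[M01]-[D01]')}.xml">
    <xsl:copy-of select="."/>
  </xsl:result-document>
</xsl:template>
```

Putting the key into the file name is what later lets the upload tool match each PDF back to its database entry without any extra bookkeeping.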
For one of my side projects, a C# application, I use a different, object-oriented approach: an abstract base class for reporting with two derived classes, one that outputs HTML and one that outputs LaTeX. The LaTeX output is then fed into lualatex to produce PDFs. You can check out the free Herodotus edition of my (closed-source) Factonaut project at https://www.factonaut.com/ to see it in action.
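A minimal sketch of that class hierarchy (the class and method names are assumptions, not Factonaut's actual API):

```csharp
using System;

// Hypothetical reporting hierarchy: one abstract writer, two output formats.
abstract class ReportWriter
{
    public abstract void Heading(string text);
    public abstract void Paragraph(string text);
}

class HtmlReportWriter : ReportWriter
{
    public override void Heading(string text) => Console.WriteLine($"<h1>{text}</h1>");
    public override void Paragraph(string text) => Console.WriteLine($"<p>{text}</p>");
}

class LatexReportWriter : ReportWriter
{
    public override void Heading(string text) => Console.WriteLine($"\\section{{{text}}}");
    public override void Paragraph(string text) => Console.WriteLine(text);
}
```

The report-generating code talks only to the abstract base class, so choosing HTML or PDF output comes down to instantiating one concrete class or the other.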
[1] Using parameter entities for re-usability, such as
<!ENTITY % output_attr SYSTEM "output_attr.ent">
<!ATTLIST foo %output_attr; >
<!ATTLIST bar %output_attr; >
in the DTD, referring to an `output_attr.ent` file that contains the shared attribute definition.
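The referenced `output_attr.ent` file could look like this (illustrative only; the attribute name and values are assumptions based on the PDF/HTML split mentioned above):

```dtd
<!-- output_attr.ent (hypothetical): a shared attribute restricting an
     element to one output target; "both" is the assumed default. -->
output (pdf | html | both) "both"
```

Expanding %output_attr; inside each <!ATTLIST ...> then gives every listed element the same output attribute without repeating its definition.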