Skip to content Skip to sidebar Skip to footer

How To Handle Special Characters When Converting From Html To Docx

I have a application that converts html files to DocX using DocX4J. I´m having problems with special characters like ç,á,é,í,ã,etc. My text font in the html files is Arial bu

Solution 1:

Following the tip given by JasonPlutext, I found an example of how to map a font to the XHTMLImporter at the DocX4J forum (http://www.docx4java.org/forums/docx-java-f6/docx-to-html-and-back-to-docx-t1913.html).

Now my code is working! See the final version below.


public WordprocessingMLPackage export(String xhtml) {

WordprocessingMLPackage wordMLPackage = null;
try {
    RFonts arialRFonts = Context.getWmlObjectFactory().createRFonts();
    arialRFonts.setAscii("Arial");
    arialRFonts.setHAnsi("Arial");
    XHTMLImporterImpl.addFontMapping("Arial", arialRFonts);

    wordMLPackage = WordprocessingMLPackage.createPackage();
    XHTMLImporter importer = new XHTMLImporterImpl(wordMLPackage);
    List<Object> content = importer.convert(xhtml,null);
    wordMLPackage.getMainDocumentPart().getContent().addAll(content);
}
catch (Docx4JException e) {
    // ...
}
return wordMLPackage;
}

Post a Comment for "How To Handle Special Characters When Converting From Html To Docx"