Thanks again for your reply,
Your explanation makes sense.
I went ahead and removed the ToUnicode CMap, just to see what would happen:
/* Remove the font's ToUnicode CMap, if the font dictionary has one */
if (CosDictKnown (cosFont, ASAtomFromString ("ToUnicode")))
{
    CosDictRemove (cosFont, ASAtomFromString ("ToUnicode"));
}
As you predicted this fixes some issues and introduces new ones.
The results differed from those of the refry method: in some cases the refried PDF did not contain extractable text, and in other cases the PDF without the ToUnicode CMap had no extractable text.
Maybe I could combine the information from the different text extraction methods to make an educated guess as to which one (or which combination) is best :S
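As a very rough sketch of that idea, assuming the output of each extraction method is already available as a plain byte string, one could score each candidate by the share of bytes that look like ordinary text and keep the highest-scoring one. `text_score` and `pick_best` are hypothetical helpers, not part of the Acrobat SDK, and the heuristic is ASCII-oriented (in the C locale, bytes above 0x7F count as "bad"), so it would under-rate legitimate non-Latin text.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical heuristic: fraction of bytes that are letters, digits or
   whitespace. Output from a broken or missing ToUnicode CMap often
   contains control bytes or other junk, which lowers the score.
   Returns 0.0 for an empty string. */
double text_score(const char *s)
{
    size_t total = 0, good = 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;  /* ctype needs unsigned char */
        ++total;
        if (isalnum(c) || isspace(c))
            ++good;
    }
    return total ? (double)good / (double)total : 0.0;
}

/* Hypothetical helper: index of the candidate with the highest score,
   or -1 if there are no candidates. On a tie the earlier one wins. */
int pick_best(const char *const candidates[], size_t n)
{
    int best = -1;
    double best_score = -1.0;
    for (size_t i = 0; i < n; ++i) {
        double score = text_score(candidates[i]);
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

For example, with the candidates `"\x01\x02\x03"` (say, from the refried PDF) and `"Hello world"` (from the PDF without the ToUnicode CMap), `pick_best` would return 1. A per-page or per-text-run version of the same score could be used to mix methods instead of choosing one for the whole document.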
I suppose looking at individual text runs (with all their complexity) would not help me either...
Kind regards,
Robert