Channel: Adobe Community: Message List

Re: Refrying PDF with subset embedded fonts fixes text extraction


Thanks again for your reply,


Your explanation makes sense.


I went ahead and removed the ToUnicode CMap just to see what would happen:


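       /* cosFont is the font's Cos dictionary: drop its /ToUnicode CMap (if
          present) so extractors have to fall back to the font's encoding */
       if (CosDictKnown (cosFont, ASAtomFromString ("ToUnicode")))
       {
           CosDictRemove (cosFont, ASAtomFromString ("ToUnicode"));
       }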


As you predicted, this fixes some issues and introduces new ones.


The results differed from the refry method: in some cases the refried PDF did not contain extractable text, while in other cases the PDF without the ToUnicode CMap had no extractable text.


Maybe I could combine the output of the different text extraction methods to make an educated guess about which one (or which combination) works best :S


I suppose looking at individual text runs (with all their complexity) would not help me either...


Kind regards,

Robert

