- I don't think you should have any problems with that. Only with really badly scanned text, where the software cannot decide whether something is a letter or a blob, might you have to correct the output manually.
- I have used a variety of software on Croatian text (our alphabet has some unusual characters) and it worked out fine.