Many languages written in non-Latin scripts have no upper-case/lower-case distinction, so for them this sort of issue cannot be settled by capitalization. It has to rely purely on the level-upon-level relationships between word strings that frequently occur near one another in text, and in any other representation of language (voice, symbols, signs, etc.).
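As a rough illustration of the point, here is a minimal Python sketch. It assumes the text has already been split into tokens, and it uses a made-up Japanese sample sentence; the helper name `cooccurrence_counts` and the `window` parameter are hypothetical, not taken from any particular library. It first checks that case conversion does nothing for a caseless script, then counts how often tokens appear close to one another, which is the only kind of signal left when case is unavailable.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered token pairs that occur within `window` positions of each other."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look only at the next `window` tokens so each pair is counted once per occurrence.
        for right in tokens[i + 1 : i + 1 + window]:
            counts[tuple(sorted((left, right)))] += 1
    return counts


# Case conversion is a no-op for a script with no case distinction.
sample = "東京"
caseless = sample.lower() == sample.upper() == sample

# Hypothetical pre-tokenized text (an assumption for illustration only).
tokens = ["東京", "は", "日本", "の", "首都", "東京", "は", "大きい"]
pairs = cooccurrence_counts(tokens, window=1)

# Pairs that co-occur most often are the strongest proximity relationships.
most_common_pair, frequency = pairs.most_common(1)[0]
```

A larger `window` captures looser relationships at the cost of more noise; real systems would typically build on counts like these rather than use them directly.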