Henry Hu
2018-06-15 13:44:14 UTC
Hello world!
I am trying to use unit irregex to match regular expressions in UTF-8
text. Is anyone familiar with a way to ask for the codepoint indices
rather than byte indices for the match?
For example:
(irregex-match-start-index (irregex-search (irregex "Ä" 'utf8) "ÄÄÄÄÄÄÄ"))
returns 6 when I want it to return 3, since there are 3 characters (6
bytes) before my match.
I tried (use utf8), but it is documented that it doesn't affect irregex and
it sure enough doesn't. I tried using the 'utf8 option while compiling my
regex, but it doesn't change the index returned by
irregex-match-start-index.
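One workaround that might do, as a rough sketch rather than anything irregex provides: convert the byte index afterwards by counting the bytes before it that are not UTF-8 continuation bytes (continuation bytes fall in the range #x80 to #xBF). The helper name byte-index->char-index is made up for illustration. It assumes the string is valid UTF-8, that string-ref indexes bytes (so the utf8 egg is not loaded in this module), and that the index falls on a character boundary.

```scheme
;; Hypothetical helper, not part of irregex.
;; Assumes string-ref returns individual bytes of a UTF-8 string,
;; i.e. the utf8 egg's string-ref is NOT in effect here.
(define (byte-index->char-index str byte-index)
  (let loop ((i 0) (count 0))
    (if (>= i byte-index)
        count
        (let ((b (char->integer (string-ref str i))))
          (loop (+ i 1)
                ;; Continuation bytes (#x80..#xBF) do not start a new
                ;; character, so only the other bytes are counted.
                (if (and (>= b #x80) (<= b #xBF))
                    count
                    (+ count 1)))))))
```

With a UTF-8 encoded source file, (byte-index->char-index "ÄÄÄÄÄÄÄ" 6) gives 3, so wrapping the result of irregex-match-start-index in this call should turn the byte offset into a codepoint offset. It walks the prefix on every call, so it costs time linear in the offset.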
Thank you for any ideas you might have!