Isn't bracket regular expression compatible with UTF8?

Fernando Cabral
I have been trying something like poder[^[:alpha:] so I could find the
word "poder " ("poder" followed by a space) but not "poderão" ("ã" being
an alpha character in Portuguese).

In English it could be like finding "power" but not "powerless".
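To make the goal concrete, here is a minimal sketch in Python rather than Gambas, using only the standard `re` module. The names `POWER`, `POWER_CONSUMING` and `finds_power` are hypothetical and exist only for illustration. The sketch matches "power" only when the next character is not an ASCII letter. It uses a negative lookahead, which checks that character without consuming it.

```python
import re

# "power" followed by something that is not an ASCII letter.
# The negative lookahead (?![A-Za-z]) inspects the next character without
# consuming it, so a match is also found when "power" ends the text.
POWER = re.compile(r"power(?![A-Za-z])")

# A consuming bracket expression, closer to the style used in the question,
# needs an actual character after "power" and so misses the end-of-text case.
POWER_CONSUMING = re.compile(r"power[^A-Za-z]")


def finds_power(text: str) -> bool:
    """Return True if "power" occurs in text and is not followed by a letter."""
    return POWER.search(text) is not None


print(finds_power("power plant"))   # True
print(finds_power("powerless"))     # False
print(finds_power("more power"))    # True
print(POWER_CONSUMING.search("more power") is not None)  # False
```

The lookahead form is a design choice, not something the thread requires. It matters only when the word can appear at the very end of the text.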

The problem is that [^[alpha]] seems to include accented characters such as
"á", "é", and "ã".

That is, accented characters are treated not as alpha but as non-alpha.

Please note that I have compiled it with the UTF8 flag:
    re.Compile("poder[^[:alpha]]", RegExp.utf8)

Any hints?

- fernando
--
Fernando Cabral


Blog: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: [hidden email]
Facebook: [hidden email]
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype:  fernandojosecabral
Landline: +55 (37) 3521-2183
Mobile: +55 (37) 99988-8868

As long as there is a single person in the world without a home or food,
no politician or scientist can boast of anything.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gambas-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gambas-user

Re: Isn't bracket regular expression compatible with UTF8?

Tobias Boege
On Tue, 04 Jul 2017, Fernando Cabral wrote:

> I have been trying something like poder[^[:alpha:] so I could find the
> word "poder " ("poder" followed by a space) but not "poderão" ("ã" being
> an alpha character in Portuguese).
>
> In English it could be like finding "power" but not "powerless".
>
> The problem is that [^[alpha]] seems to include accented characters such as
> "á", "é", and "ã".
>
> That is, accented characters are treated not as alpha but as non-alpha.
>
> Please note that I have compiled it with the UTF8 flag:
>     re.Compile("poder[^[:alpha]]", RegExp.utf8)
>
> Any hints?
>

In your mail I can see three distinct attempts at writing down a
negative character class: [^[:alpha:], [^[alpha]], and [^[:alpha]],
but the correct syntax is

  [[:^alpha:]]

You want to check this first.
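Here is a rough illustration in Python rather than Gambas. Treat it as an analogy, not a description of how gb.pcre behaves. Python's `re` module has no POSIX classes such as `[[:^alpha:]]`, so the sketch substitutes `[^\W\d_]`. For `str` patterns, that class approximates "any Unicode letter".

The sketch also shows how one of the malformed attempts, `[^[:alpha]]`, is actually read: as an ordinary list of characters followed by a literal `]`. PCRE very likely parses it the same way, but that is an assumption here. The variable names `broken`, `word` and `ascii_word` are hypothetical.

```python
import re

# 1) A malformed attempt from the thread: "[:alpha]" lacks the closing ":]",
#    so it is not a POSIX class. Inside [^...] its characters are literals,
#    the first "]" closes the set, and the second "]" must match literally.
#    The leading "[" is written as "\[" only because Python may warn about a
#    "possible nested set" otherwise; the meaning is unchanged.
broken = re.compile(r"poder[^\[:alpha]]")
print(broken.search("poder "))            # None: no literal "]" follows
print(broken.search("poderx]").group())   # poderx]

# 2) The intended rule: "poder" not followed by a letter.
#    [^\W\d_] means "word character, but not a digit or underscore";
#    for str patterns it is Unicode-aware, so "ã" counts as a letter.
word = re.compile(r"poder(?![^\W\d_])")
print(word.search("poder ") is not None)   # True
print(word.search("poderão") is not None)  # False

# 3) Restricting the same class to ASCII makes "ã" a non-letter,
#    so "poderão" is matched again.
ascii_word = re.compile(r"poder(?![^\W\d_])", re.ASCII)
print(ascii_word.search("poderão") is not None)  # True
```

Case 3 is worth keeping in mind. The PCRE documentation states that, by default, characters above 128 do not match any POSIX class even in UTF-8 mode unless Unicode property support (PCRE_UCP) is requested. The corrected bracket syntax alone may therefore not be enough. Whether gb.pcre sets that option is not covered in this thread.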

Regards,
Tobi

--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk


Re: Isn't bracket regular expression compatible with UTF8?

Fernando Cabral
Tobi wrote:

> In your mail I can see three distinct attempts at writing down a
> negative character class: [^[:alpha:], [^[alpha]], and [^[:alpha]],
> but the correct syntax is
>
>   [[:^alpha:]]
>
> You want to check this first.


Right again, Tobi. I can't understand how I missed this. Thank you.

- fernando

2017-07-05 6:37 GMT-03:00 Tobias Boege <[hidden email]>:

> On Tue, 04 Jul 2017, Fernando Cabral wrote:
> > I have been trying something like *poder[^[:alpha:]*  so I  could find
> the
> > word "poder " ("poder" followed by an space) but not "poderão" ("ã" being
> > an alpha character in Portuguese.)
> >
> > In English it could be like finding "power" but not "powerless".
> >
> > Problem is that it seems [^[alpha]] includes accented characters like
> "á",
> > "é", "ã".
> >
> > That is, accented characters are not understood as alpha, but not alpha.
> >
> > Please, note that I have compiled it with the UTF8 flag:
> > *   re.Compile(poder[^[:alpha]], RegExp.utf8)*
> >
> > Any hints?
> >
>
> In your mail I can see three distinct attempts at writing down a
> negative character class: [^[:alpha:], [^[alpha]], and [^[:alpha]],
> but the correct syntax is
>
>   [[:^alpha:]]
>
> You want to check this first.
>
> Regards,
> Tobi
>
> --
> "There's an old saying: Don't change anything... ever!" -- Mr. Monk
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Gambas-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gambas-user
>



--
Fernando Cabral
Blogue: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: [hidden email]
Facebook: [hidden email]
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype:  fernandojosecabral
Telefone fixo: +55 (37) 3521-2183
Telefone celular: +55 (37) 99988-8868

Enquanto houver no mundo uma só pessoa sem casa ou sem alimentos,
nenhum político ou cientista poderá se gabar de nada.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gambas-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gambas-user