Reg expression still beating me up

Reg expression still beating me up

Fernando Cabral
In the piece of code below, RegExp.Replace never returns.

Sentencas[i] = "Test string."
Print "Before replacing"
Sentencas[i] = RegExp.Replace(Sentencas[i], "[.:!?;]*[ ]*?\n*?", "", RegExp.UTF8)
Print "After replacing"

It beats me, because what it should do is very simple: optionally find one
of the punctuation marks (.:?!;), optionally followed by any number of
spaces, optionally followed by any number of "\n" (end of line). Replace
whatever is found with an empty string.

In the test string, it should find the dot (.) and replace it with nothing.
So, the returned string should be "Test string".

Alas! It will never come back. Same if I replace the test string with
"Test string. \n" or "Test string.\n"

Now, this works as expected, but it is not what I need:
"[.:!?;][ ]*?\n*?", ""
To my eyes, "[.:!?;]*[ ]*?\n*?" is a perfectly valid regular expression.

Any hints?


--
Fernando Cabral
Blog: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: [hidden email]
Facebook: [hidden email]
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype:  fernandojosecabral
Landline: +55 (37) 3521-2183
Mobile: +55 (37) 99988-8868

As long as there is a single person in the world without a home or without
food, no politician or scientist can boast of anything.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gambas-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gambas-user

Re: Reg expression still beating me up

Tobias Boege-2
On Sun, 28 May 2017, Fernando Cabral wrote:

> In the piece of code below, RegExp.Replace never returns.
>
> Sentencas[i] = "Test string."
> Print "Before replacing"
> Sentencas[i] = RegExp.Replace(Sentencas[i], "[.:!?;]*[ ]*?\n*?", "", RegExp.UTF8)
> Print "After replacing"
>
> It beats me, because what it should do is very simple: optionally find one
> of the punctuation marks (.:?!;), optionally followed by any number of
> spaces, optionally followed by any number of "\n" (end of line). Replace
> whatever is found with an empty string.
>
> In the test string, it should find the dot (.) and replace it with nothing.
> So, the returned string should be "Test string".
>
> Alas! It will never come back. Same if I replace the test string with
> "Test string. \n" or "Test string.\n"
>
> Now, this works as expected, but it is not what I need:
> "[.:!?;][ ]*?\n*?", ""
> To my eyes, "[.:!?;]*[ ]*?\n*?" is a perfectly valid regular expression.
>
> Any hints?
>

RegExp.Replace() wants to replace *all* occurrences of the expression.
It is basically a loop of RegExp.Exec() followed by a substitution, as
long as the RegExp.Exec() call finds something.

Now look at your expression. Since everything is optional, your expression
matches the empty string. RegExp.Exec() will always find the empty string
and replace it with itself, giving you an infinite loop.

I think that the behaviour of RegExp.Replace() in this case is sound and
you should use a better expression, that is guaranteed to match a string
of positive length or not match at all.
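
As a minimal sketch of such an expression (an illustration only, not
necessarily the exact pattern the task calls for): changing the "*" on the
punctuation class to "+" makes every match contain at least one character,
so the expression can no longer match the empty string. The variable name
sText and the test string are assumptions made up for this example; the
RegExp.Replace() call has the same form as in the original post.

```gambas
' Requires the gb.pcre component, as in the original snippet.
Public Sub Main()

  ' Hypothetical test input: punctuation, a space, then a newline.
  Dim sText As String = "Test string. \n"

  ' "[.:!?;]+" demands at least one punctuation mark, so a match always
  ' has positive length. The trailing "[ ]*" and "\n*" stay optional.
  sText = RegExp.Replace(sText, "[.:!?;]+[ ]*\n*", "", RegExp.UTF8)

  Print sText

End
```

If runs of spaces or newlines with no preceding punctuation have to go as
well, an alternation in which each branch demands at least one character,
for instance "[.:!?;]+[ ]*\n*|[ ]+\n*|\n+", keeps the same guarantee. Note
that this particular pattern would also strip the spaces between words, so
it probably needs tightening for real text.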

Regards,
Tobi

--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk
