Eliminating duplicates real fast... but adding text to textarea works at a snail's pace

Fernando Cabral
I want to share with you what seems to be my final solution for the problem
concerning breaking a text into words, counting them all and eliminating
duplicates. I was in for a surprise. Maybe you are too.

First, to eliminate duplicates and count occurrences. Here is the code.
Very simple, very time efficient: only 40 MILLISECONDS to sort 68,626
words, find and copy 8,984 unique words, prepending a count number and then
sorting again:

    MatchedWords.Sort(gb.Ascent + gb.Language + gb.IgnoreCase)
    For i = 0 To MatchedWords.Max
      n = 1
      For j = i + 1 To MatchedWords.Max
        If (Comp(MatchedWords[i], MatchedWords[j], gb.Language + gb.IgnoreCase) = 0) Then
          n += 1
        Else
          Break
        Endif
      Next
      UniqWords.Push(Format(n, "0###") & "#" & MatchedWords[i])
      i += (n - 1)
    Next
    UniqWords.Sort(gb.Descent + gb.Language + gb.IgnoreCase)

So, sorting, comparing, copying and sorting again was not the issue.
Preparing to display was. So much so that the following function took me
30+ seconds to add those 8,984 words to the TextArea to be displayed:

    Public Sub AppendText(Text As String)
      TextArea1.Text &= Text
    End

But, I was able to reduce that to 32 MILLISECONDS merely by concatenating
the words into a single string before calling AppendText() just once:

    str = ""
    For i = 0 To UniqWords.Max
      str &= UniqWords[i] & "\n"
    Next
    FMain.AppendText(str)

So, concatenating into a plain string here is close to three orders of
magnitude faster than appending to the TextArea word by word (32
milliseconds versus 30+ seconds), even though both are just string
concatenation.
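
The loop above can probably be shortened further, since the Gambas string
array class has a Join() method. The following is only a sketch, assuming
UniqWords is a String[]; the function name BuildReport is my own invention
for illustration and is not part of the code discussed in this thread:

```gambas
' Hypothetical helper (not from the original post): builds the whole
' report as one string so that the TextArea is touched only once.
Public Function BuildReport(UniqWords As String[]) As String

  ' Nothing to join: return an empty report.
  If UniqWords.Count = 0 Then Return ""

  ' Join() concatenates all elements with the given separator.
  ' A final newline is appended to match what the explicit loop produces.
  Return UniqWords.Join("\n") & "\n"

End
```

It would be called as FMain.AppendText(BuildReport(UniqWords)). Either way
the point is the same: assemble the text in an ordinary string and hand it
to the control in a single assignment.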

In the end, what was taking 30+ seconds to execute came down to 135
MILLISECONDS! That's a 222-fold reduction.

The lesson I have re-learned one more time: measure, don't guess. What
seems to be the culprit might not be. And an innocent-looking function
might be the killer.
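
For anyone who wants to take the same kind of measurement, here is a
minimal sketch using the built-in Timer function, which returns the number
of seconds elapsed since the program started. The names ElapsedMs and
TimeStep, and the loop used as a stand-in workload, are assumptions made
for illustration; they are not the code that produced the figures above.

```gambas
' Hypothetical helper: milliseconds elapsed since fStart,
' where fStart is a value previously read from Timer.
Public Function ElapsedMs(fStart As Float) As Float

  Return (Timer - fStart) * 1000

End

' Hypothetical example: time one step in isolation.
Public Sub TimeStep()

  Dim fStart As Float
  Dim sText As String
  Dim i As Integer

  fStart = Timer                 ' seconds since the program started

  For i = 1 To 10000             ' stand-in for the step being measured
    sText &= "word\n"
  Next

  Print "Step took "; Format(ElapsedMs(fStart), "0.0"); " ms"

End
```

Wrapping each stage (sorting, de-duplicating, displaying) this way shows
which one actually dominates the run time.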

Thank you guys for your help. I've learned a lot about Gambas as well as
about algorithms.

Regards

- fernando

--
Fernando Cabral
Blog: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: [hidden email]
Facebook: [hidden email]
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype:  fernandojosecabral
Landline: +55 (37) 3521-2183
Mobile: +55 (37) 99988-8868

As long as there is a single person in the world without a home or without
food, no politician or scientist can boast of anything.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gambas-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gambas-user

Re: Eliminating duplicates real fast... but adding text to textarea works at a snail's pace

Jussi Lahtinen
Gambas has a built-in profiler. You might want to get familiar with it.


Jussi

On Sat, Jul 1, 2017 at 3:26 PM, Fernando Cabral <[hidden email]> wrote:

> [...]