Discussion:
[rust-dev] Count of lines
Petr Novotnik
2014-09-15 06:34:36 UTC
Permalink
Hello folks,

recently I've been playing around with Rust and I'm really impressed. I
like the language a lot!

While writing a program to count the number of lines in a file, I
realized it ran twice as slow as an older program I wrote in Go some
time ago. After more experiments, I've figured out that Go's speed boost
comes from an optimized function in its standard library utilizing SSE
instructions, in particular the function bytes.IndexByte
(http://golang.org/pkg/bytes/#IndexByte).

I was wondering whether Rust would ever provide such optimized
implementations as well as part of its standard library or whether it's
up to developers to write their own versions of such functions.

Maybe I just missed something, so I'm attaching the rust program:

http://pastebin.com/NfFgMNGe

And for the sake of completeness, here's the Go version I compared with:

http://pastebin.com/4tiLsRpu

Pete.
Huon Wilson
2014-09-15 08:49:59 UTC
Permalink
Are you compiling with optimisations? (Pass the -O flag to rustc.) For
me, the Go and the optimised Rust are pretty similar, with the Go a few
percent faster on a 1.3GB MP4 file, and the two languages
indistinguishable on a 30MB XML file.


Also, Rust prefers iterators to manual indexing since it avoids
unnecessary bounds checks, that is, replace the inner loop with:

for b in buf.slice_to(n).iter() {
if *b == b'\n' {
lines += 1;
}
}

However, in this case, LLVM optimises that to a large chunk of SSE
instructions https://gist.github.com/huonw/37cdd3dea3518abdb1c4 which
seem to be reading only 2 bytes per memory access and only 4 bytes per
tick of the loop (i.e. it's rather unoptimal); this means that the
iterator version is only minutely faster than the naive range(0, n) loop
in this case.

I also wrote a 16-bytes-at-a-time SSE byte counter (it's unlikely to be
optimal), which is about 20% faster than the Go on that same 1.3GB MP4
file and 3-4 times faster on the XML, so everyone has room for
improvement. :) Code: https://gist.github.com/huonw/b6bfe4ad3623b6c37717


Btw, the Go is significantly slower when the file contains a lot of
newlines, e.g. this file is 75% \n's

yes 'foo' | head -100000000 > newlines.txt

The Rust built with -O takes ~0.4s for me, and the Go takes 2.6s. The
crossover point on my computer for very regular files like that is about
1 \n in every 30 bytes.


Huon
Post by Petr Novotnik
Hello folks,
recently I've been playing around with Rust and I'm really impressed.
I like the language a lot!
While writing a program to count the number of lines in a file, I
realized it ran twice as slow as an older program I wrote in Go some
time ago. After more experiments, I've figured out that Go's speed
boost comes from an optimized function in its standard library
utilizing SSE instructions, in particular the function bytes.IndexByte
(http://golang.org/pkg/bytes/#IndexByte).
I was wondering whether Rust would ever provide such optimized
implementations as well as part of its standard library or whether
it's up to developers to write their own versions of such functions.
http://pastebin.com/NfFgMNGe
http://pastebin.com/4tiLsRpu
Pete.
_______________________________________________
Rust-dev mailing list
Rust-dev at mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
Petr Novotnik
2014-09-15 16:52:26 UTC
Permalink
Huon,

your optimized version is blazing fast! :)

I tried again all versions of the program on different kinds of input.
It really seems that the performance difference I initially experienced
is due to the character of my log file (100+ chars per line) - here,
Go's optimized bytes.IndexByte
(http://golang.org/src/pkg/runtime/asm_amd64.s#L1283) simply dominates
the workload.

Is there any chance for such specialized/optimized implementations to
end up in the std. library?

Many thanks so far,
Pete.
Post by Huon Wilson
Are you compiling with optimisations? (Pass the -O flag to rustc.) For
me, the Go and the optimised Rust are pretty similar, with the Go a few
percent faster on a 1.3GB MP4 file, and the two languages
indistinguishable on a 30MB XML file.
Also, Rust prefers iterators to manual indexing since it avoids
for b in buf.slice_to(n).iter() {
if *b == b'\n' {
lines += 1;
}
}
However, in this case, LLVM optimises that to a large chunk of SSE
instructions https://gist.github.com/huonw/37cdd3dea3518abdb1c4 which
seem to be reading only 2 bytes per memory access and only 4 bytes per
tick of the loop (i.e. it's rather unoptimal); this means that the
iterator version is only minutely faster than the naive range(0, n) loop
in this case.
I also wrote a 16-bytes-at-a-time SSE byte counter (it's unlikely to be
optimal), which is about 20% faster than the Go on that same 1.3GB MP4
file and 3-4 times faster on the XML, so everyone has room for
improvement. :) Code: https://gist.github.com/huonw/b6bfe4ad3623b6c37717
Btw, the Go is significantly slower when the file contains a lot of
newlines, e.g. this file is 75% \n's
yes 'foo' | head -100000000 > newlines.txt
The Rust built with -O takes ~0.4s for me, and the Go takes 2.6s. The
crossover point on my computer for very regular files like that is about
1 \n in every 30 bytes.
Huon
Post by Petr Novotnik
Hello folks,
recently I've been playing around with Rust and I'm really impressed.
I like the language a lot!
While writing a program to count the number of lines in a file, I
realized it ran twice as slow as an older program I wrote in Go some
time ago. After more experiments, I've figured out that Go's speed
boost comes from an optimized function in its standard library
utilizing SSE instructions, in particular the function bytes.IndexByte
(http://golang.org/pkg/bytes/#IndexByte).
I was wondering whether Rust would ever provide such optimized
implementations as well as part of its standard library or whether
it's up to developers to write their own versions of such functions.
http://pastebin.com/NfFgMNGe
http://pastebin.com/4tiLsRpu
Pete.
_______________________________________________
Rust-dev mailing list
Rust-dev at mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
_______________________________________________
Rust-dev mailing list
Rust-dev at mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
Loading...