From 746a0e3e540428482758de0418bcb7931f43d712 Mon Sep 17 00:00:00 2001 From: Justin Ethier Date: Sat, 3 Jun 2017 18:21:37 -0400 Subject: [PATCH] WIP --- tail-blog-post-notes.txt | 173 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 173 insertions(+) create mode 100644 tail-blog-post-notes.txt diff --git a/tail-blog-post-notes.txt b/tail-blog-post-notes.txt new file mode 100644 index 00000000..6de5d7b0 --- /dev/null +++ b/tail-blog-post-notes.txt @@ -0,0 +1,173 @@ +Initial speed test for `tail` - Cyclone does very poorly compared to other Schemes: + + [justin@justin-pc r7rs-benchmarks]$ ./bench cyclone tail + + Testing tail under Cyclone + Including postlude /home/justin/Documents/r7rs-benchmarks/src/Cyclone-postlude.scm + Compiling... + Running... + Running tail:25 + Elapsed time: 32.261715 seconds (32) for tail:25 + +!CSVLINE!+cyclone-0.5.1,tail:25,32.261715 + + real 0m32.379s + user 0m31.783s + sys 0m0.513s + + +Change `Makefile.config` in cyclone-bootstrap to do profiling. The `-O2` option in the lines below are replaced with `-g -pg`: + + CFLAGS ?= -g -pg -fPIC -rdynamic -Wall -Iinclude -L. + COMP_CFLAGS ?= -g -pg -fPIC -rdynamic -Wall -I$(PREFIX)/include -L$(PREFIX)/lib + +Then Cyclone is rebuilt: + + [justin@justin-pc cyclone-bootstrap]$ sudo make clean ; ./install.sh + +Once this is done a `gmon.out` file will be generated each time Cyclone or a compiled Cyclone program is executed. + + +Set up for running the `tail` benchmark directly: + + [justin@justin-pc r7rs-benchmarks]$ cd /tmp/larcenous/Cyclone/ + [justin@justin-pc Cyclone]$ cp -r ~/Documents/r7rs-benchmarks/inputs/ . + [justin@justin-pc Cyclone]$ mkdir outputs + +Recompile `tail.scm` to get a version with profiling, and then run it to generate a `gmon.out` file: + + [justin@justin-pc Cyclone]$ cyclone tail.scm + [justin@justin-pc Cyclone]$ ./tail < inputs/tail.input + +[justin@justin-pc Cyclone]$ ls +gmon.out inputs outputs tail tail.c tail.o tail.scm +[justin@justin-pc Cyclone]$ gprof ./tail gmon.out > report.txt + +Let's examine the start of `report.txt` to see the functions that are taking up the most of the program's runtime: + + Flat profile: + + Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls s/call s/call name + 99.19 61.09 61.09 2331598 0.00 0.00 Cyc_length + 0.10 61.15 0.06 11221 0.00 0.00 gc_minor + 0.08 61.20 0.05 777200 0.00 0.00 __lambda_3 + 0.08 61.25 0.05 777097 0.00 0.00 __lambda_300 + 0.03 61.27 0.02 4664280 0.00 0.00 Cyc_st_add + 0.03 61.29 0.02 1562923 0.00 0.00 gc_thr_add_to_move_buffer + 0.03 61.31 0.02 777572 0.00 0.00 __lambda_278 + 0.03 61.33 0.02 777178 0.00 0.00 dispatch + 0.03 61.35 0.02 777147 0.00 0.00 __lambda_297 + 0.03 61.37 0.02 777135 0.00 0.00 __lambda_368 + 0.03 61.39 0.02 59 0.00 0.00 gc_empty_collector_stack + 0.02 61.41 0.02 5441402 0.00 0.00 Cyc_is_pair + 0.02 61.42 0.02 ck_pr_cas_char + 0.02 61.43 0.01 3109193 0.00 0.00 Cyc_is_null + 0.02 61.44 0.01 3109170 0.00 0.00 Cyc_car + +Well that's interesting, `tail` is spending all of its time computing `Cyc_length`. + + +Run gdb: + + [justin@justin-pc Cyclone]$ cat inputs/tail.input + 25 + "inputs/bib" + "outputs/tail.output" + ignored + +A compiled Cyclone program is just a regular C program so we can use `gdb` to debug it: + + [justin@justin-pc Cyclone]$ gdb ./tail + GNU gdb (GDB) 7.12.1 + Copyright (C) 2017 Free Software Foundation, Inc. + License GPLv3+: GNU GPL version 3 or later + This is free software: you are free to change and redistribute it. + There is NO WARRANTY, to the extent permitted by law. Type "show copying" + and "show warranty" for details. + This GDB was configured as "x86_64-pc-linux-gnu". + Type "show configuration" for configuration details. + For bug reporting instructions, please see: + . + Find the GDB manual and other documentation resources online at: + . + For help, type "help". + Type "apropos word" to search for commands related to "word"... + Reading symbols from ./tail...done. + (gdb) run + Starting program: /tmp/larcenous/Cyclone/tail + [Thread debugging using libthread_db enabled] + Using host libthread_db library "/usr/lib/libthread_db.so.1". + [New Thread 0x7ffff6031700 (LWP 1850)] + 25 + "inputs/bib" + "outputs/tail.output" + ^C + Thread 1 "tail" received signal SIGINT, Interrupt. + 0x00007ffff6fd44ed in read () from /usr/lib/libc.so.6 + (gdb) break Cyc_length + Breakpoint 1 at 0x53b145: file runtime.c, line 1713. + (gdb) c + Continuing. + ignored + +After continuing a few times, the code breaks here: + +#0 Cyc_length (data=0x7cd4e0, l=0x7ffffffbc660) at runtime.c:1713 +#1 0x00000000004af96b in __lambda_368 (data=0x7cd4e0, argc=3, _=0x7ffff6635340, k_734906=0x7ffff63331c0, f_731928=0x7ffffffb2dd0, lis1_731927=0x7ffffffbc660, + lists_731926_raw=0x7ca420 ) at scheme/base.c:22680 +#2 0x00000000004afde1 in __lambda_367 (data=0x7cd4e0, argc=1, self_736727=0x7ffffff9d4c0, r_734921=0x7ca420 ) at scheme/base.c:22703 +#3 0x00000000004b7aa7 in __lambda_299 (data=0x7cd4e0, argc=3, _=0x7ffff66350e0, k_735061=0x7ffffff9d4c0, char_731994=0x2a, port_731993_raw=0x7ffff6333200) at scheme/base.c:23706 +#4 0x000000000055cf5e in do_dispatch (data=0x7cd4e0, argc=3, func=0x4b7523 <__lambda_299>, clo=0x7ffff66350e0, b=0x7ffffff9ccf0) at dispatch.c:6 + +Opening the source code for `scheme/base.c` you can see this is in the `for-each` function: + + static void __lambda_368(void *data, int argc, closure _,object k_734906, object f_731928, object lis1_731927, object lists_731926_raw, ...) { + load_varargs(lists_731926, lists_731926_raw, argc - 3); + Cyc_st_add(data, "scheme/base.sld:for-each"); + if( (boolean_f != Cyc_is_null(lis1_731927)) ){ + +And in `scheme/base.sld` you can see where `length` is being called: + + (define (for-each f lis1 . lists) + (if (not (null? lis1)) + (if (pair? lists) + (let recur ((lists (cons lis1 lists))) + (receive (cars cdrs) (%cars+cdrs lists) + (if (pair? cars) + (begin + (apply f cars) + (recur cdrs))))) + ;; Fast path. + (if (eq? 1 (length lis1)) + (f (car lis1)) + (begin (f (car lis1)) + (for-each f (cdr lis1))))))) + +The code can be simplified to make it more obvious what is going on: + + (define (for-each f lis1 . lists) + (if (not (null? lis1)) + (if (eq? 1 (length lis1)) + (f (car lis1)) + (begin (f (car lis1)) + (for-each f (cdr lis1)))))) + +Basically on every iteration of `for-each` the code is calling `length` to see if `f` can be called directly. Well, that's not good - the main `for-each` loop itself has a time complexity of O(N). The runtime depends directly on the length of `lis1`. But each time `length` is called it must examine the entire contents of `lis1`, which is another O(N) operation. Combined with the outer loop this raises the overall time complexity to O(N^2) - which can really add up for large values of N. + + +This reminds me of an old article from Joel Spolsky that talks about the same issue with respect to strings: +http://global.joelonsoftware.com/English/Articles/Interviewing.html + +Is their function fast? Look at how many times they call strlen. I've seen O(n^2) algorithms for strrev when it should be O(n), because they are calling strlen again and again in a loop. + + +Anyway the solution is to check directly for null, instead of scanning the whole string: + + (if (null? (cdr lis1)) + (f (car lis1)) + (begin (f (car lis1)) + +After rebuilding with this fix we can re-run the tail benchmark. Remember the older code took over 32 seconds to finish: + +