Common Lisp Series package

Introduction

Richard C. Waters’s SERIES package for Common Lisp introduces the series data type, which represents lazy sequences, together with a set of operations on them. The interesting thing about this package is that it lets you replace many typical loop constructs with a more functional style without losing performance.

Restrictions

The idea is to use the CL macro system to process a composition of expressions into a call graph and then generate plain, efficient loops from it. Of course, this can only be done for a subset of expressions, namely those that satisfy the following restrictions on optimizable expressions:

Expressions must be statically analyzable
Expressions must be straight-line computations
Procedures called by expressions must be preorder
Intermediate values in computations must be sequences
Every non-directed data-flow in expression must be on-line

The good thing about these restrictions is that the SERIES package explicitly checks them during the code-walking phase and issues a warning if some expression cannot be optimized. In that case the developer has the opportunity to optimize it manually or to restructure it into an optimizable expression.

How to install and use

Just use

(ql:quickload "series")

to install.

To use the package, one can either call the functions directly from the series package or call the install function, which sets up the necessary redefinitions of defun, let, etc. so that code is analyzed automatically, and also sets up the helper reader macros:

(eval-when (:compile-toplevel :execute :load-toplevel)
  (series::install))
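The first option, calling the functions directly, can be sketched as follows. This is a minimal example that assumes Quicklisp is available, as in the installation step; the function name sum-of-list is made up for illustration.

```lisp
;; Assumption: Quicklisp is loaded in the image, as in the install step above.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "series"))

;; Package-qualified calls; no redefinition of DEFUN or LET is involved.
;; SUM-OF-LIST is a hypothetical name used only for this sketch.
(defun sum-of-list (lst)
  (series:collect-sum (series:scan lst)))

(sum-of-list '(1 2 3)) ; => 6
```

This style keeps the standard defun and let untouched, at the cost of writing the package prefix on every series operation.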

Performance

To test this, I decided to compare implementations of some extremely simple functions. All comparisons were made with LispWorks 7.0 32-bit for Mac OS X. In each case I compare the performance of a purely functional implementation of the algorithm with an implementation using SERIES, and also with a simple loop implementation.

Sum of the list

The first, trivial task is to calculate the sum of the squares of the numbers in a list. A purely functional implementation:

(defun sum-squares-cl (lst)
  (declare (optimize (safety 0) (speed 3)))
  (reduce #'+ (mapcar (lambda (x) (expt x 2)) lst)))

The purely functional implementation is a reduce over the list produced by mapping the square function over the input. A pure loop-based implementation looks like this:

(defun sum-squares-loop2 (lst)
  (declare (optimize (safety 0) (speed 3)))
  (loop for x in lst
        summing (expt x 2)))

Here we use the summing clause of the loop macro, which avoids introducing an accumulator variable.

The implementation using SERIES package:

(defun sum-squares-series2 (lst)
  (declare (optimize (safety 0) (speed 3)))
  (collect-sum (mapping ((x (scan lst))) (expt x 2))))

Here the list lst is converted into a series by the function scan. The mapping macro takes a list of let-like bindings as its first argument and the body as its second. collect-sum sums the elements of the resulting series. A more generic implementation is shown below:

(defun sum-squares-series1 (lst)
  (declare (optimize (safety 0) (speed 3)))
  (collect-fn 'integer (lambda () 0) #'+
              (mapping ((x (scan lst)))
                       (expt x 2))))

The more generic version uses the collect-fn function, the building block for accumulating functions over series expressions.
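To show how collect-fn generalizes beyond summation, here is a small sketch that computes a product instead. It follows the same argument pattern as sum-squares-series1: a result type, a function producing the initial value, and a function combining the accumulator with each element. The name product-of-list is hypothetical, and Quicklisp is assumed to be available.

```lisp
;; Assumption: Quicklisp is loaded in the image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "series"))

;; PRODUCT-OF-LIST is a hypothetical name used only for this sketch.
;; The accumulator starts at 1 and is multiplied by each element in turn.
(defun product-of-list (lst)
  (series:collect-fn 'integer (lambda () 1) #'*
                     (series:scan lst)))

(product-of-list '(1 2 3 4)) ; => 24
```

For an empty list the result is just the initial value, 1.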

Let’s generate test data using alexandria’s iota function, which generates a list of integers from 0 up to, but not including, its argument (thanks to Steve Losh for the suggestion):

(defparameter *data* (loop :for i :below 4000 :collect (iota i)))
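To make the shape of the test data concrete, the following sketch builds a much smaller data set in plain Common Lisp. The helper iota-sketch is a hypothetical stand-in for alexandria’s iota (called with a single argument), and *small-data* is likewise an illustrative name.

```lisp
;; IOTA-SKETCH is a hypothetical stand-in for ALEXANDRIA:IOTA with one
;; argument: it returns the integers 0, 1, ..., n-1 as a list.
(defun iota-sketch (n)
  (loop for i below n collect i))

;; Same construction as *data*, but with 4 lists instead of 4000.
(defparameter *small-data*
  (loop for i below 4 collect (iota-sketch i)))

*small-data* ; => (NIL (0) (0 1) (0 1 2))
```

With the bound of 4000 used in the article, the lists have lengths 0 through 3999, which adds up to 7,998,000 elements in total.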

A simple comparison gives the following results:

SERIES-TEST 4 > (time (loop for d in *data* do (sum-squares-cl d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-CL D))

User time    =        1.870
System time  =        0.002
Elapsed time =        1.858
Allocation   = 155486812 bytes
0 Page faults
Calls to %EVAL    76022
NIL


SERIES-TEST 5 > (time (loop for d in *data* do (sum-squares-loop2 d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-LOOP2 D))

User time    =        0.928
System time  =        0.001
Elapsed time =        0.908
Allocation   = 59525512 bytes
1 Page faults
Calls to %EVAL    76022
NIL


SERIES-TEST 6 > (time (loop for d in *data* do (sum-squares-series1 d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-SERIES1 D))

User time    =        0.861
System time  =        0.001
Elapsed time =        0.843
Allocation   = 59521180 bytes
0 Page faults
Calls to %EVAL    76022
NIL

SERIES-TEST 7 > (time (loop for d in *data* do (sum-squares-series2 d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-SERIES2 D))

User time    =        0.943
System time  =        0.002
Elapsed time =        0.927
Allocation   = 59518424 bytes
0 Page faults
Calls to %EVAL    76022
NIL

From these results one can see that the purely functional implementation is the slowest, while both versions of the series function perform about the same as the version using the loop macro.

Let’s try to macroexpand the series implementation of this function:

SERIES-TEST 8 > (pprint (macroexpand '(collect-sum (mapping ((x (scan lst))) (expt x 2)))))

(COMMON-LISP:LET* ((#7=#:OUT-86150 LST))
  (COMMON-LISP:LET (#8=#:ELEMENTS-86142
                    #1=#:LISTPTR-86144
                    #6=#:TEMP-86145
                    (#2=#:LIMIT-86146 0)
                    (#3=#:INDEX-86147 -1)
                    #4=#:LSTP-86148
                    #9=#:ITEMS-86152
                    (#5=#:SUM-86139 0))
    (DECLARE (TYPE LIST #1#)
             (TYPE SERIES::VECTOR-INDEX+ #2#)
             (TYPE SERIES::-VECTOR-INDEX #3#)
             (TYPE BOOLEAN #4#)
             (TYPE NUMBER #5#))
    (LOCALLY
      (DECLARE (TYPE ARRAY #6#))
      (IF (SETQ #4# (LISTP #7#))
          (SETQ #1# #7# #6# #())
        (LOCALLY
          (DECLARE (TYPE ARRAY #7#))
          (SETQ #6# #7#)
          (SETQ #2# (SERIES::ARRAY-FILL-POINTER-OR-TOTAL-SIZE #7#))))
      (TAGBODY
       #10=#:LL-86153 (IF #4#
                          (PROGN
                            (IF (ENDP #1#) (GO SERIES::END))
                            (SETQ #8# (CAR #1#))
                            (SETQ #1# (CDR #1#)))
                        (PROGN
                          (INCF #3#)
                          (LOCALLY
                            (DECLARE (TYPE ARRAY #7#) (TYPE SERIES::VECTOR-INDEX #3#))
                            (IF (>= #3# #2#) (GO SERIES::END))
                            (SETQ #8# (THE SERIES::*TYPE* (ROW-MAJOR-AREF #7# #3#))))))
               (SETQ #9# ((LAMBDA (X) (EXPT X 2)) #8#))
               (SETQ #5# (+ #5# #9#))
               (GO #10#)
       SERIES::END)
      #5#)))

Everything up to the TAGBODY form consists of variable bindings, declarations, and setup. As one can see, the actual algorithm starts at TAGBODY, with the iteration label #10# (#:LL-86153) and the end-of-loop label SERIES::END.
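Stripped of the gensyms and of the branch that handles vectors, the list case of this expansion boils down to something like the hand-written sketch below. It is a simplification for reading purposes, not the code SERIES generates; the function and tag names are made up.

```lisp
;; Hand-written simplification of the list branch of the expansion.
;; SUM-SQUARES-TAGBODY, NEXT and DONE are hypothetical names.
(defun sum-squares-tagbody (lst)
  (let ((ptr lst)   ; plays the role of LISTPTR
        (sum 0))    ; plays the role of SUM
    (tagbody
     next
       (when (endp ptr) (go done))            ; end of list: leave the loop
       (setq sum (+ sum (expt (car ptr) 2)))  ; map and accumulate in one step
       (setq ptr (cdr ptr))                   ; advance to the next cons
       (go next)
     done)
    sum))

(sum-squares-tagbody '(1 2 3)) ; => 14
```

No intermediate list is built: each element is squared and added to the accumulator in the same pass.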

Sum of the list with conditions

Now let’s evolve this example. The task is to calculate the sum of the squares of the first 5 numbers whose squares are above 50. For the purely functional implementation we need to add a take-first-n function, which takes the first N elements of a list (basically a wrapper around subseq):

(defun take-first-n (lst n)
  (declare (optimize (safety 0) (speed 3)))
  (let ((l (length lst)))
    (subseq lst 0 (min n l))))

Now the implementation is trivial:

(defun sum-squares-of-5-above-50-cl (lst)
  (declare (optimize (safety 0) (speed 3)))
  (reduce #'+
          (take-first-n 
           (remove-if (lambda (x) (< x 50))
                      (mapcar (lambda (x) (expt x 2)) lst))
           5)))

The obvious problem with this implementation is that it creates 3 intermediate lists: the first with mapcar, the next with remove-if, and the last with take-first-n.

The implementation with SERIES will look similar but will not need the wrapper function:

(defun sum-squares-of-5-above-50-series (lst)
  (declare (optimize (safety 0) (speed 3)))
  (collect-sum
   (subseries 
    (choose-if (lambda (x) (>= x 50))
               (mapping ((x (scan lst)))
                        (expt x 2)))
    0 5)))
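A small worked example may help to check the pipeline by hand. The sketch below repeats the same series expression with package-qualified names under a hypothetical function name, and assumes Quicklisp is available. For the integers 1 to 20, the first five squares that reach 50 are 64, 81, 100, 121 and 144, which sum to 510.

```lisp
;; Assumption: Quicklisp is loaded in the image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "series"))

;; FIRST-5-SQUARES-FROM-50 is a hypothetical name; the body mirrors the
;; series implementation above, with explicit package prefixes.
(defun first-5-squares-from-50 (lst)
  (series:collect-sum
   (series:subseries
    (series:choose-if (lambda (x) (>= x 50))
                      (series:mapping ((x (series:scan lst)))
                        (expt x 2)))
    0 5)))

;; 64 + 81 + 100 + 121 + 144
(first-5-squares-from-50 (loop for i from 1 to 20 collect i)) ; => 510
```

When fewer than five squares qualify, the sum simply covers the ones that do; for example, the list (1 2 8 9) gives 64 + 81 = 145.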

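The loop-based implementation is not as easily readable:

(defun sum-squares-of-5-above-50-naive (lst)
  (declare (optimize (safety 0) (speed 3)))
  (let ((sum 0)
        (counter 0))
    (loop for x in lst
          for y = (expt x 2)
          while (< counter 5)
          when (>= y 50)
            do (incf counter)
               (incf sum y))
    sum))

Here we need an intermediate variable, counter, to be able to stop the loop early. The counter has to be bound outside the loop: a clause such as for counter = 0 would reset it on every iteration, and thereis (< counter 5) would then end the loop on the very first element. The loop timing below appears to have been recorded with such an early-exiting form, so it likely understates the cost of the loop shown here. Let’s measure the performance: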

SERIES-TEST 9 > (time (loop for d in *data* do (sum-squares-of-5-above-50-cl d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-OF-5-ABOVE-50-CL D))

User time    =        1.233
System time  =        0.187
Elapsed time =        1.856
Allocation   = 194243052 bytes
3006 Page faults
Calls to %EVAL    76022
NIL

SERIES-TEST 10 > (time (loop for d in *data* do (sum-squares-of-5-above-50-series d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-OF-5-ABOVE-50-SERIES D))

User time    =        0.107
System time  =        0.003
Elapsed time =        0.088
Allocation   = 2419888 bytes
37 Page faults
Calls to %EVAL    76022
NIL

SERIES-TEST 11 > (time (loop for d in *data* do (sum-squares-of-5-above-50-naive d)))
Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-OF-5-ABOVE-50-NAIVE D))

User time    =        0.036
System time  =        0.000
Elapsed time =        0.019
Allocation   = 2399620 bytes
0 Page faults
Calls to %EVAL    76022
NIL

As can be seen, the functional implementation is more than 10 times slower than the series-based and loop-based implementations.

Conclusion

It is preferable to use the SERIES package instead of the purely functional approach when possible, since it keeps the code just as readable and maintainable without giving up performance. Obviously, not all algorithms can be implemented with SERIES; the authors of the SERIES package claim, however, that 80% of the loops they analyzed could be implemented with it.

Update

Updated test results after Steve Losh’s suggestion.