3 min read
kdb+: Iterators for Beginners’
We’ve had several requests for a blog on kdb+ iterators, so here it is. I was part of a team of developers who were given the opportunity to learn kdb+ and one of the challenges that we faced was learning to think in terms of vectors and lists, instead of loops.
Iterators in kdb+ (recently re-coined from “adverbs”) allow programmers to modify the behaviour of functions in order to gain the desired effect of looping. At first, this may be startling to a programmer who is used to traditional programming languages, but when kdb+ iterators are mastered they yield cleaner, more efficient and less error prone code.
As Jeff Borror says in “Q for Mortals” – “Proficiency in the use of iterators is one skill that separates q pretenders from q contenders”. Coming from a java background, here’s a simple cheat sheet that helped me get my head around this.
It should be noted that kdb+ does provide the “do” and “while” functions as part of the language which emulate a traditional “for” and “while” loop respectively but, once you gain a decent understanding of iterators, they should rarely (if ever) be needed. It is also worth noting that the “traditional” languages are moving more towards functional style programming, such as Java’s stream library introduced in Java 8.
Each
- apply a function to each element in a collection/list
- like “foreach” loop in other languages
- apply a function to multiple collections of arguments
In the example below we want to find out the average of each list of prices.
Each-both (‘)
The example below shows how to join a list of symbols in one list with a corresponding price in another list, to create a simple matrix of two lists.
q)tickerSyms:`APPL`AMZN`GOOG`MSFT
q)prices:116.85 3221.66 1556.0 208.64
q)tickerSyms,'prices
`APPL 116.85
`AMZN 3221.66
`GOOG 1556f
`MSFT 208.64
NOTE: You can also use the formal syntax below, but they do the same thing:
q),'[tickerSyms;prices]
`APPL 116.85
`AMZN 3221.66
`GOOG 1556f
`MSFT 208.64
The count of x and y must be equal, or you will get a ‘length error. Alternatively, x or y can be supplied as an atom (single-element) and this will be extended to match the length of the opposing argument.
Each-prior (‘:)
- allows the interpreter to look back at the previous element in the list
- used with functions that take two arguments, with x being the “current” value and y being the “previous”
The example below uses each prior to compute the percentage change between a list of prices.
q)prices:103.2 103.21 103.22 103.15 102.15 101.95
q)\P 2 // Set display precision to 2 s.f.
q){100*(x-y)%y}':[prices]
0n 0.0097 0.0097 -0.068 -0.97 -0.2
NOTE: x value will always be null on the first pass as there is no previous element in the list.
Over (/) and Scan (\)
Scan and Over modify functions to make them cumulative (i.e. the result of an iteration is passed forward to the next).
- Both work in the same way, but their output results differ:
- Scan: returns result of each iteration
- Over: returns final result only
- Both behave slightly differently depending on the inputs supplied
Over (/) and Scan (\) – Single-argument Functions
For/Do
In the Fibonacci example below the initial input is a list (0, 1). The function sums the last 2 elements of the list and appends it to the end of the list. The new list is used as the input in the next iteration, this is repeated 6 more times. The scan example below returns the result of each iteration, making it easier to see what the iterator is doing, this can be helpful when debugging.
q){x,sum -2#x}\[7;0 1]
0 1
0 1 1
0 1 1 2
0 1 1 2 3
0 1 1 2 3 5
0 1 1 2 3 5 8
0 1 1 2 3 5 8 13
0 1 1 2 3 5 8 13 21
When using over only the result of the final iteration is returned.
q){x,sum -2#x}/[7;0 1]
0 1 1 2 3 5 8 13 21
While
In the example below an initial value of 1 is passed into a function that triples the input. The function will repeat, passing the result onto the next iteration as long as the “while_condition” is true. The scan example below shows each intermediary.
q){x*3}\[{x<15};1]
1 3 9 27
Again, using over will only return the result of the final iteration.
q){x*3}/[{x<15};1]
27
Converge
If the number of iterations or a condition is not supplied, the function will repeat until the result converges. In other words, the function will run using the output of the previous iteration until the ‘x’ value repeats itself. Below example shows a multi-level nested list in kdb+, we can use over (/) in conjunction with raze to converge on a flat list.
q)nestedList:(1;(2;(3;4;5));(6;7);8;enlist 9)
q)nestedList
1
(2;3 4 5)
6 7
8
,9
q)raze nestedList
1
2
3 4 5
6
7
8
9
q)raze raze nestedList
1 2 3 4 5 6 7 8 9
q)raze/[nestedList]
1 2 3 4 5 6 7 8 9
q)raze/[nestedList]~raze raze nestedList
1b
But instead of having to use “raze” multiple times to flatten the nested list, we can instead use raze combined with /.
Raze will keep executing until it has flattened the entire nested structure. In this example it will be execute 3 times but with complex data it can execute many times more.
q)megaNestedList:(1;(2;(3;(4;(5;(6;(7;(8;(9;(10))))))))));
q)raze/[megaNestedList]
1 2 3 4 5 6 7 8 9 10
Over (/) and Scan (\) – Multi-parameter Functions
When using the Scan/Over iterators with functions that accept multiple arguments, the iterator behaves slightly different than with single-argument functions. I would describe this as similar to foreach in other languages, where an initial seed value is provided.
The initial value of 0 is passed to the function as x, and the first element of the list is used as y. The result is passed forward to the next iteration as the x value, and the next y value in the list is used and so on. See each intermediate result using the scan iterator below:
q){x+y}\[0;1 2 3 4]
1 3 6 10
For the same example using over instead, we get the final result only:
q){x+y}/[0;1 2 3 4]
10
Same idea is true for functions with more arguments:
In this example of scan/over we can see the atomic z argument provided being extrapolated throughout each iteration, similar to previous explained with each-both.
q){x+y+z}\[5 6;1 2 3;10]
16 17
28 29
41 42
q){x+y+z}/[5 6;1 2 3;10]
41 42
NOTE: z argument in this example is extended to the same depth of the largest collection (y) – this only happens when passing an atom/scalar.