How to implement efficient element-wise 'ax+b' function

Hi Alex!

I added the following to my timing script:

# try "warming up" jtwb
print('call jtwb() twice to "warm up"')
yjt1 = jtwb(w, b, x)
yjt2 = jtwb(w, b, x)
# release the warm-up results before the timing loop
yjt1 = None
yjt2 = None

(So there are now three calls to jtwb() before the actual timing loop.)

This changed the results for the jtwb timing only modestly:

jtwb time: 0.010964112281799316
(was:      0.011733412742614746)

And yjt (the result of the jtwb() call) still agrees exactly with ywb.
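
For anyone who wants to reproduce this, here is a minimal, generic sketch of the warm-up-then-time pattern described above. It is not my actual timing script: the helper name time_with_warmup and the plain-Python stand-in axb() are made up for illustration, and the stand-in works on lists rather than tensors so that the snippet runs without any extra packages.

```python
import time

def time_with_warmup(fn, *args, n_warmup=2, n_iters=10):
    """Call fn(*args) n_warmup times untimed, then time n_iters calls.

    Returns (average seconds per timed call, result of the last call).
    Assumes n_iters >= 1.
    """
    # Untimed calls first, so that one-time costs (for example JIT
    # compilation or optimization) are kept out of the measurement.
    for _ in range(n_warmup):
        fn(*args)

    start = time.perf_counter()
    for _ in range(n_iters):
        result = fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed / n_iters, result

# Hypothetical stand-in for jtwb(): element-wise w * x + b on lists.
def axb(w, b, x):
    return [wi * xi + bi for wi, bi, xi in zip(w, b, x)]

w = [2.0, 3.0]
b = [1.0, 1.0]
x = [4.0, 5.0]

avg, y = time_with_warmup(axb, w, b, x)
print("average time per call:", avg)
```

If the function being timed runs on a GPU, it would also be necessary to call torch.cuda.synchronize() before reading the clock at the start and at the end, since CUDA kernels are launched asynchronously.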

Best.

K. Frank