Translating DNN to BigQuery SQL

Hi,

Is there project translating a trained DNN into an SQL query to run classification inside RDBMS query without taking data out of DB?

I implemented a proof of concept algorithm, which takes network weights/biases trained in PyTorch and list of activation functions (ReLU, Sigmoid) and produces a pure SQL subquery, which plays a role of SQL predicate.

Performance in case of BigQuery must be lower than running DNN as usual at least because of missing float types smaller than 8 bytes and absence dedicated dot operator.

I took the model from a popular pima indian diabetes exercise as an example:

model = nn.Sequential(
    nn.Linear(8, l1outs),
    nn.ReLU(),
    nn.Linear(l1outs, l2outs),
    nn.ReLU(),
    nn.Linear(l2outs, 1),
    nn.Sigmoid()
)

Equivalent SQL query selecting observations with diabetes probability +50%:

SELECT *
FROM pima_observations AS pima_observations
WHERE ((SELECT
            ARRAY_AGG((1 / (1 + EXP((0 - ((("b" + ("w0" * "l1out"[(0 + 1)])) + ("w1" * "l1out"[(1 + 1)])) + ("w2" * "l1out"[(2 + 1)]))))))) AS "nnOut"
        FROM UNNEST(ARRAY [-1.4292612075805664], ARRAY [0.46713218092918396], ARRAY [0.31342944502830505], ARRAY [8.353928476572037e-2]) AS "w"("b", "w0", "w1", "w2")
        CROSS JOIN (SELECT
                        ARRAY_AGG(greatest(0, ((((("b" + ("w0" * "l0out"[(0 + 1)])) + ("w1" * "l0out"[(1 + 1)])) + ("w2" * "l0out"[(2 + 1)])) + ("w3" * "l0out"[(3 + 1)])) + ("w4" * "l0out"[(4 + 1)])))) AS "l1out"
                    FROM UNNEST(ARRAY [-0.41208043694496155, -0.8970054984092712, -7.326607406139374e-2], ARRAY [0.2061747908592224, -0.11880557984113693, 8.147265017032623e-2], ARRAY [-0.16841396689414978, 0.4053782522678375, -0.4211576282978058], ARRAY [0.2145361751317978, -1.4171559363603592e-2, 0.2435821145772934], ARRAY [0.3863857090473175, -8.417066186666489e-2, -0.4402327537536621], ARRAY [2.057105116546154e-2, -0.984534740447998, 2.4828349705785513e-3]) AS "w"("b", "w0", "w1", "w2", "w3", "w4")
                    CROSS JOIN (SELECT
                                    ARRAY_AGG(greatest(0, (((((((("b" + ("w0" * pregnant)) + ("w1" * glucose)) + ("w2" * pressure)) + ("w3" * skin)) + ("w4" * insulin)) + ("w5" * bmi)) + ("w6" * pedigree)) + ("w7" * age)))) AS "l0out"
                                FROM UNNEST(ARRAY [0.11652952432632446, -0.8223007917404175, -0.12624654173851013, 0.2801574766635895, 0.23825359344482422], ARRAY [-0.1514103263616562, 1.0374555587768555, -0.3030771315097809, -0.3144024610519409, 0.5980736613273621], ARRAY [-0.2902229130268097, 0.22740879654884338, -0.21591763198375702, -0.21931664645671844, -1.4002146199345589e-2], ARRAY [-0.25702202320098877, -0.34332337975502014, 0.26572340726852417, 8.408840000629425e-2, -0.255145400762558], ARRAY [-0.30213668942451477, 1.0697017423808575e-2, -0.23025500774383545, -3.6737680435180664e-2, -0.32139626145362854], ARRAY [0.15019038319587708, -8.385843830183148e-4, -0.15994291007518768, -0.14777924120426178, -1.51359923183918e-2], ARRAY [-8.555655926465988e-2, -1.5387434512376785e-2, -0.2668708860874176, -0.32633376121520996, -0.31143277883529663], ARRAY [0.2731674313545227, 0.23497509956359863, 3.350885212421417e-2, -0.26817217469215393, 5.861988756805658e-4], ARRAY [2.377433329820633e-2, 7.723002135753632e-2, -0.33851397037506104, 0.20330695807933807, 0.5181198120117188]) AS "w"("b", "w0", "w1", "w2", "w3", "w4", "w5", "w6", "w7")) AS "l1") AS "l2")[1] > 0.5)

I extended the topic above in the article with step-by-step instructions.

1 Like