First of all, thanks for this awesome explanation. I found it very easy to read, and I think I finally grasped the concept of gradient boosting. All the other explanations I've seen just hit you with the formulas without giving an intuition.
I just didn't get your comparison with random forests in "Draft 5". In random forests, the point of using only some samples per tree and only some features per node is that you'll have a lot of trees voting on the final decision, and you want diversity among those trees (correct me if I'm wrong here).
From what I understood of gradient boosting, you have a tree with only one node/feature voting for the direction of the gradient in each step, and you don't reuse the same feature in a future step. If you don't have several trees voting in parallel to decide one gradient direction, what's the point of limiting the features and samples that a tree has access to?

