Recently hyped ML content linked in one simple page
Sources: reddit/r/{MachineLearning,datasets}, arxivsanity, twitter, kaggle/kernels, hackernews, awesomedatasets, sota changes
Made by: Deep Phrase HK Limited


[1910.11141] Automatically Batching Control-Intensive Programs for Modern Accelerators
We demonstrated the efficacy of the method by mechanically batching a (recursive) implementation of the No-U-Turn Sampler, obtaining speedups (varying with batch size) of up to three orders of magnitude.
Abstract: We present a general approach to batching arbitrary computations for
accelerators such as GPUs. We show orders-of-magnitude speedups using our
method on the No-U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian
statistics. The central challenge of batching NUTS and other Markov chain Monte
Carlo algorithms is data-dependent control flow and recursion. We overcome this
by mechanically transforming a single-example implementation into a form that
explicitly tracks the current program point for each batch member, and only
steps forward those in the same place. We present two different batching
algorithms: a simpler, previously published one that inherits recursion from
the host Python, and a more complex, novel one that implements recursion
directly and can batch across it. We implement these batching methods as a
general program transformation on Python source. Both the batching system and
the NUTS implementation presented here are available as part of the popular
TensorFlow Probability software package.
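The core idea from the abstract — track a per-member program counter and step forward only the members sitting at the same program point — can be sketched as a tiny interpreter. This is a hypothetical illustration (names like `run_batched` and the block-function signature are invented here, not the TensorFlow Probability API):

```python
# Hypothetical sketch of program-counter batching: each batch member has
# its own program counter (pc); on every step we pick one block and
# advance only the members currently at that block.

def run_batched(blocks, halt_pc, pcs, env):
    """blocks[i]: function (env, member) -> next block index.
    halt_pc: sentinel pc meaning 'finished'.
    pcs: list with one program counter per batch member.
    env: mutable per-member state."""
    while any(pc != halt_pc for pc in pcs):
        live = [pc for pc in pcs if pc != halt_pc]
        # Choose the most common live pc to maximize batch width.
        target = max(set(live), key=live.count)
        for m, pc in enumerate(pcs):
            if pc == target:
                # A real system would run all members at `target` as one
                # vectorized accelerator op, not a Python loop.
                pcs[m] = blocks[target](env, m)
    return env

# Tiny example: members loop a data-dependent number of times -- the
# kind of control flow that defeats naive lock-step batching.
limits = [2, 5, 3]
counts = [0, 0, 0]

def body(env, m):
    env[m] += 1
    return 0 if env[m] < limits[m] else 1  # stay in block 0, or halt

run_batched([body], halt_pc=1, pcs=[0, 0, 0], env=counts)
print(counts)  # each member completed its own trajectory length
```

Because members fall out of lock-step independently (here, member 0 halts first), the interpreter keeps batching whatever subset still shares a program point, which is what lets the paper's transformation batch across recursion and data-dependent loops.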
Figure 2. Syntax of locally batchable programs. We use [·] to denote ordered lists. The symbols x, y range over variable names, and i, j index blocks within the same function. We present a unary syntax for succinctness; the n-ary generalization is standard. (Local Static Autobatching)

Figure 4. Syntax of program-counter batchable programs. We use [·] to denote ordered lists. The symbols x, y range over variable names, and i, j index blocks of the program. This syntax is also unary for succinctness. The difference from locally autobatched programs (Figure 2) is that all control flow graphs are merged, and Call operations are replaced with explicit stack manipulation operations (Push and Pop for data, and PushJump and Return for the program counter). (Program Counter Autobatching)

Figure 5. Performance of the autobatched No-U-Turn Sampler on the Bayesian logistic regression problem (100 latent dimensions, 10,000 data points). The batch size refers to the number of chains running in tandem. The reported gradients are the total across all chains, excluding waste due to synchronization. We compare the performance of program counter autobatching compiled with XLA to our local static autobatching executed in TensorFlow's Eager mode. We also include two baselines. One is the same program executed directly in Eager mode without autobatching (perforce running one batch member at a time). The other is the widely used and well-optimized Stan implementation of (a variant of) the same NUTS algorithm. Batching provides linear scaling on all tested platforms, until the underlying hardware saturates. See text for details of the experimental setup. (Program Counter Autobatching)

Figure 6. Utilization of batch gradient computation on the correlated Gaussian test problem. Utilization is less than 100% above 1 batch member because different batch members choose to use different numbers of gradients at each trajectory. We can see from the local-static line that on this problem, the longest trajectory that NUTS chooses at any iteration tends to be about four times longer than the average. Program counter autobatching recovers more utilization by batching gradients across 10 consecutive NUTS trajectories, instead of having to synchronize on trajectory boundaries. (Experiments)
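The utilization argument in Figure 6's caption reduces to simple arithmetic: if batched gradient steps must synchronize on trajectory boundaries, every member waits for the longest trajectory in the batch. A back-of-the-envelope sketch (the trajectory lengths below are made up for illustration):

```python
# Under trajectory-boundary synchronization, each batched gradient step
# costs as much as the longest member's trajectory, so:
#   utilization = mean(lengths) / max(lengths)
lengths = [3, 7, 5, 1, 4]  # hypothetical per-member gradient counts
mean_len = sum(lengths) / len(lengths)   # 4.0
utilization = mean_len / max(lengths)    # 4.0 / 7, about 57% here
print(f"{utilization:.0%}")
# With the ratio reported for this problem (longest trajectory about 4x
# the average), trajectory-synchronized utilization is roughly 1/4.
```

This is why batching across 10 consecutive trajectories helps: short trajectories from one NUTS iteration can overlap with long ones from another, raising the mean-to-max ratio.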



Related: TF-IDF
[1903.01855] TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning
[1603.04467] TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
[1709.08357] Generating Functionally Equivalent Programs Having Non-Isomorphic Control-Flow Graphs
