Publications

Origins of the Modern MOOC (xMOOC)

Online education has been around for decades, with many universities offering online courses to a small, limited audience. What changed in 2011 was scale and availability, when Stanford University offered three courses free to the public, each garnering signups of about 100,000 learners or more. The launch of these three courses, taught by Andrew Ng, Peter Norvig, Sebastian […]

Mechatronic design of an integrated robotic hand

Historically, robotic hand research has tended to focus on two areas: severely underactuated hands, and high-degree-of-freedom fully actuated hands. Comparatively little research has been done in between those spaces. Furthermore, despite the large number of robotic hand designs that have been proposed in the past few decades, very few robot hands are available for purchase […]

Deep Learning with COTS HPC Systems

Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details […]

Parsing with Compositional Vector Grammars

Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge […]

Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors

Knowledge bases provide applications with the benefit of easily accessible, systematic relational knowledge but often suffer in practice from their incompleteness and lack of knowledge of new entities and relations. Much work has focused on building or extending them by finding patterns in large unannotated text corpora. In contrast, here we mainly aim to complete […]

An Experimental and Theoretical Comparison of Model Selection Methods

In the model selection problem, we must balance the complexity of a statistical model with its goodness of fit to the training data. This problem arises repeatedly in statistical estimation, machine learning, and scientific inquiry in general. Instances of the model selection problem include choosing the best number of hidden nodes in a neural network, […]

An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering

Assignment methods are at the heart of many algorithms for unsupervised learning and clustering, in particular the well-known k-means and Expectation-Maximization (EM) algorithms. In this work, we study several different methods of assignment, including the “hard” assignments used by k-means and the “soft” assignments used by EM. While it is known that k-means minimizes […]

Preventing “Overfitting” of Cross-Validation Data

Suppose that, for a learning task, we have to select one hypothesis out of a set of hypotheses (that may, for example, have been generated by multiple applications of a randomized learning algorithm). A common approach is to evaluate each hypothesis in the set on some previously unseen cross-validation data, and then to select the […]

Improving Text Classification by Shrinkage in a Hierarchy of Classes

When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. This paper shows that the accuracy of a naive Bayes text classifier can be significantly improved by taking advantage of a hierarchy of classes. We adopt an […]

Applying Online-search to Reinforcement Learning

In reinforcement learning it is frequently necessary to resort to an approximation to the true optimal value function. Here we investigate the benefits of online search in such cases. We examine “local” searches, where the agent performs a finite-depth lookahead search, and “global” searches, where the agent performs a search for a trajectory all the […]

On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples

We consider feature selection in the “wrapper” model of feature selection. This typically involves an NP-hard optimization problem that is approximated by heuristic search for a “good” feature subset. First considering the idealization where this optimization is performed exactly, we give a rigorous bound for generalization error under feature selection. The search heuristics typically used […]

A sparse sampling algorithm for near-optimal planning in large Markov decision processes

An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales […]