An introduction to the theory, design, and implementation of text-based information systems. Topics include text analysis, retrieval models (e.g., Boolean, vector space, probabilistic), text categorization, text filtering, clustering, retrieval system design and implementation, and applications to web information management. The course objective is to provide a thorough background on the concepts, algorithms, and applications of text retrieval. Students also learn how to implement a retrieval system.
Introduction to the theory and methodology of the science of linguistics with special reference to phonology and syntax.
Introduces the field of natural language processing and computational linguistics. Topics include finite-state methods, parsing, probabilistic methods, machine learning in NLP, computational semantics and applications of NLP technology. The course is mostly about concepts rather than programming, though some programming assignments will be given.
What would it take for a computer to read, and understand, a newspaper? In order to construct a representation of the meaning of a text, it is necessary to parse its sentences, i.e., identify their grammatical structures. Although compilers contain parsers too, natural languages are both more complex and more ambiguous than programming languages. Computational linguists have developed a number of expressive grammar formalisms that are intended to capture these additional complexities, as well as parsing models that use machine-learning techniques to find the most likely structure of a sentence. In order to achieve wide coverage (i.e., be able to deal with actual newspaper text) and high accuracy, these models require significant amounts of labeled training data -- so-called treebanks, or corpora of sentences that have been annotated with the correct analysis. In order to create a parser for a specific formalism, it is thus often necessary to first translate a treebank into the desired formalism. This course will give an overview of the most commonly used formalisms in natural language processing and of current research on grammar extraction and wide-coverage parsing.
Algorithms and models for grammar induction, parsing and machine translation. The first part of the course will give an overview of the grammar formalisms, statistical models, and search algorithms used in natural language parsing and grammar induction. The second part of the course will show how many of these ideas can be extended and applied to machine translation. We will also look at ways to implement and train very large-scale NLP systems (using MapReduce/Hadoop, Bloom filters, etc.). The course will consist of a mixture of lectures and seminar-style presentations given by students.
Introduction to aspects of the tools and methods of studies in speech and natural language processing (NLP), with a focus on programming for NLP and speech applications, statistical methods for data analysis, and tools for displaying and manipulating speech data.
Introduces problems of document representation, information need specification, and query processing. Describes the theories, models, and current research aimed at solving those problems. Primary focus is on bibliographic, text, and multimedia records.
Mathematical models of linguistic structure and their implementation in computational algorithms used in automatic speech understanding and speech synthesis. Statistical and automata theoretic techniques are studied allowing a quantitative description of acoustic-phonetics, phonology, phonotactics, lexicons, syntax, and semantics. The methods are used to build components of a speech understanding system.
Introduction to mathematical probability; includes the calculus of probability, combinatorial analysis, random variables, expectation, distribution functions, moment-generating functions, and central limit theorem.
Rigorous introduction to a wide range of topics in optimization, including a thorough treatment of basic ideas of linear programming, with additional topics drawn from numerical considerations, linear complementarity, integer programming and networks, polyhedral methods. 4 hours of credit requires approval of the instructor and department with completion of additional work of substance.
Iterative and analytical solutions of constrained and unconstrained problems of optimization; gradient and conjugate gradient solution methods; Newton's method, Lagrange multipliers, duality and the Kuhn-Tucker theorem; and quadratic, convex, and geometric programming. 4 hours of credit requires approval of the instructor and department with completion of additional work of substance.
Mathematical models for channels and sources; entropy, information, data compression, channel capacity, Shannon's theorems, rate-distortion theory.
Various topics, such as ridge regression; robust regression; jackknife, bootstrap, cross-validation and resampling plans; E-M algorithm; projection pursuit; all with a strong computational flavor. May be repeated if topics vary.
Distributions, transformations, order statistics, exponential families, sufficiency, delta method, Edgeworth expansions; uniformly minimum variance unbiased estimators, Rao-Blackwell theorem, Cramér-Rao lower bound, information inequality; equivariance.
This is an introductory course in functional analysis and infinite dimensional optimization, with applications in least-squares estimation, nonlinear programming in Banach spaces, optimal and robust control of lumped and distributed parameter systems, and differential games.
Introductory description of the major subjects and directions of research in artificial intelligence; topics include AI languages (LISP and PROLOG), basic problem solving techniques, knowledge representation and computer inference, machine learning, natural language understanding, computer vision, robotics, and societal impacts.
Theory and basic techniques in machine learning. Presents the main theoretical paradigms and key ideas developed in machine learning in the context of applications such as natural language and text processing, computer vision, data mining, adaptive computer systems and others. Reviews several supervised and unsupervised learning approaches: methods for learning linear representations; on-line learning; Bayesian methods; decision trees; features and kernels; clustering and dimensionality reduction.
Introduction to the central learning frameworks and techniques that have emerged in the field of natural language processing and found applications in several areas of text and speech processing: from information retrieval and extraction, through speech recognition, to tasks related to syntax, semantics, and language understanding. Presents the theoretical paradigms (learning theoretic, probabilistic, and information theoretic) and the relations among them, as well as the main algorithmic techniques developed within these paradigms and in key natural language applications.
This course provides an introduction to modern techniques for statistical analysis of complex and massive data. Examples of these are regression and classification, nonparametric function estimation, model selection, regularization, dimensionality reduction, and clustering analysis. Applications are discussed as well as computation and theoretical foundations.
As the first introductory course for databases, this course studies the fundamentals of using and implementing relational database management systems. First, from the user perspective (i.e., how to use a database system), the course will discuss conceptual data modeling, the relational and other data models, database schema design, relational algebra, and the SQL query language. Further, from the system perspective (i.e., how to design and implement a database system), the course will study data representation, indexing, query optimization and processing, and transaction processing.
This course introduces the basic concepts, techniques, and systems of data warehousing and data mining, including data mining concepts, data preprocessing, data warehousing and data generalization, data cube and OLAP, mining frequent patterns, association and correlation, classification, clustering, and data mining applications.
Advanced course which introduces data mining concepts, principles and algorithms. Course will cover: introduction, data warehouse and OLAP technology for data mining, data preprocessing, primitives, languages, system architectures for data mining, concept description, association analysis, sequential pattern analysis, classification and prediction, cluster analysis, mining complex types of data, data mining applications and trends in data mining.
Examines information processing approaches to computer vision, and algorithms and architectures for artificial intelligence and robotics systems capable of vision: inference of three-dimensional properties of a scene from its images, such as distance, orientation, motion, size and shape, acquisition and representation of spatial information for navigation and manipulation in robotics.
Visual scene understanding is the ability to infer general principles and current situations from imagery in a way that helps achieve goals. Goals are highly varied, depending to a large extent on the agent. In biological systems, vision is usually just one piece of the puzzle, but is often a particularly important one, as it allows us to make detailed inferences about distant surfaces and objects.
Formal models and concepts in vision and language; detailed analysis of computer vision, language, and learning problems; relevant psychological results and linguistic systems; and survey of the state of the art in artificial intelligence.
Logic is fundamental to many research areas in Artificial Intelligence. It appears most prominently in Knowledge Representation and Reasoning, but is also key to Natural Language Processing, Machine Learning, and Robotics. This course will cover methods in artificial intelligence that are based on results in logic, including techniques, formulations, and problems in knowledge representation and logical AI. Among them, it will discuss representing knowledge about time, space, entities, and relationships; default knowledge and inference; beliefs and beliefs about others' beliefs; and semantic information. The tools that we will examine will mainly be logics: description logics, modal logics, first-order logic, set theory, and ad hoc formulations and languages. The class will take the form of paper presentations by students and individual or group research-level projects.
Fundamentals of robotics, rigid motions, homogeneous transformations, forward and inverse kinematics, velocity kinematics, motion planning, trajectory generation, sensing, vision, and control.