P, NP, and NP-Completeness: The Basics of Computational Complexity

The focus of this book is the P versus NP Question and the theory of NP-completeness. It also provides adequate preliminaries regarding computational problems and computational models. The P versus NP Question asks whether finding solutions is harder than checking the correctness of solutions. An alternative formulation asks whether discovering proofs is harder than verifying their correctness. It is widely believed that the answer to these equivalent formulations is positive, and this is captured by saying that P is different from NP. Although the P versus NP Question remains unresolved, the theory of NP-completeness offers evidence for the intractability of specific problems in NP by showing that they are universal for the entire class. Amazingly enough, NP-complete problems exist, and hundreds of natural computational problems arising in many different areas of mathematics and science are NP-complete.

oded goldreich is a Professor of Computer Science at the Weizmann Institute of Science and an Incumbent of the Meyer W. Weisgal Professorial Chair. He is an editor for the SIAM Journal on Computing, the Journal of Cryptology, and Computational Complexity, and previously authored the books Modern Cryptography, Probabilistic Proofs and Pseudorandomness, the two-volume work Foundations of Cryptography, and Computational Complexity: A Conceptual Perspective.
P, NP, and NP-Completeness: The Basics of Computational Complexity
ODED GOLDREICH
Weizmann Institute of Science
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521192484

© Oded Goldreich 2010

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2010

ISBN-13 9780511907937 eBook (EBL)
ISBN-13 9780521192484 Hardback
ISBN-13 9780521122542 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
to Dana
Contents
List of Figures
Preface
Overview
To the Teacher
Notations and Conventions
Main Definitions and Results

1 Computational Tasks and Models
   Teaching Notes
   1.1 Representation
   1.2 Computational Tasks
       1.2.1 Search Problems
       1.2.2 Decision Problems
       1.2.3 Promise Problems (an Advanced Comment)
   1.3 Uniform Models (Algorithms)
       1.3.1 Overview and General Principles
       1.3.2 A Concrete Model: Turing Machines
             1.3.2.1 The Actual Model
             1.3.2.2 The Church-Turing Thesis
       1.3.3 Uncomputable Functions
             1.3.3.1 On the Existence of Uncomputable Functions
             1.3.3.2 The Halting Problem
             1.3.3.3 A Few More Undecidability Results
       1.3.4 Universal Algorithms
             1.3.4.1 The Existence of Universal Algorithms
             1.3.4.2 A Detour: Kolmogorov Complexity
       1.3.5 Time (and Space) Complexity
       1.3.6 Oracle Machines and Turing-Reductions
       1.3.7 Restricted Models
   1.4 Non-Uniform Models (Circuits and Advice)
       1.4.1 Boolean Circuits
             1.4.1.1 The Basic Model
             1.4.1.2 Circuit Complexity
       1.4.2 Machines That Take Advice
       1.4.3 Restricted Models
             1.4.3.1 Boolean Formulae
             1.4.3.2 Other Restricted Classes of Circuits
   1.5 Complexity Classes
   Exercises

2 The P versus NP Question
   Teaching Notes
   2.1 Efficient Computation
   2.2 The Search Version: Finding versus Checking
       2.2.1 The Class P as a Natural Class of Search Problems
       2.2.2 The Class NP as Another Natural Class of Search Problems
       2.2.3 The P versus NP Question in Terms of Search Problems
   2.3 The Decision Version: Proving versus Verifying
       2.3.1 The Class P as a Natural Class of Decision Problems
       2.3.2 The Class NP and NP-Proof Systems
       2.3.3 The P versus NP Question in Terms of Decision Problems
   2.4 Equivalence of the Two Formulations
   2.5 Technical Comments Regarding NP
   2.6 The Traditional Definition of NP
   2.7 In Support of P Being Different from NP
   2.8 Philosophical Meditations
   Exercises

3 Polynomial-time Reductions
   Teaching Notes
   3.1 The General Notion of a Reduction
       3.1.1 The Actual Formulation
       3.1.2 Special Cases
       3.1.3 Terminology and a Brief Discussion
   3.2 Reducing Optimization Problems to Search Problems
   3.3 Self-Reducibility of Search Problems
       3.3.1 Examples
       3.3.2 Self-Reducibility of NP-Complete Problems
   3.4 Digest and General Perspective
   Exercises

4 NP-Completeness
   Teaching Notes
   4.1 Definitions
   4.2 The Existence of NP-Complete Problems
       Bounded Halting and Non-Halting
   4.3 Some Natural NP-Complete Problems
       4.3.1 Circuit and Formula Satisfiability: CSAT and SAT
             4.3.1.1 The NP-Completeness of CSAT
             4.3.1.2 The NP-Completeness of SAT
       4.3.2 Combinatorics and Graph Theory
       4.3.3 Additional Properties of the Standard Reductions
       4.3.4 On the Negative Application of NP-Completeness
       4.3.5 Positive Applications of NP-Completeness
   4.4 NP Sets That Are Neither in P nor NP-Complete
   4.5 Reflections on Complete Problems
   Exercises

5 Three Relatively Advanced Topics
   Teaching Notes
   5.1 Promise Problems
       5.1.1 Definitions
             5.1.1.1 Search Problems with a Promise
             5.1.1.2 Decision Problems with a Promise
             5.1.1.3 Reducibility Among Promise Problems
       5.1.2 Applications and Limitations
             5.1.2.1 Formulating Natural Computational Problems
             5.1.2.2 Restricting a Computational Problem
             5.1.2.3 Non-generic Applications
             5.1.2.4 Limitations
       5.1.3 The Standard Convention of Avoiding Promise Problems
   5.2 Optimal Search Algorithms for NP
   5.3 The Class coNP and Its Intersection with NP
   Exercises

Historical Notes

Epilogue: A Brief Overview of Complexity Theory

Appendix: Some Computational Problems
   A.1 Graphs
   A.2 Boolean Formulae

Bibliography
Index
List of Figures
0.1 Outline of the suggested course.
1.1 A single step by a Turing machine.
1.2 Multiple steps of the machine depicted in Figure 1.1.
1.3 A circuit computing f(x1, x2, x3, x4) = (x1 ⊕ x2, x1 ∧ ¬x2 ∧ x4).
1.4 Recursive construction of parity circuits and formulae.
1.5 A 3DNF formula computing x1 ⊕ x2 ⊕ x3.
2.1 Solving S by using a solver for R.
2.2 Solving R by using a solver for S_R.
3.1 The Cook-reduction that arises from a Karp-reduction.
3.2 The Cook-reduction that arises from a Levin-reduction.
3.3 The three proofs of Theorem 3.8.
4.1 Overview of the emulation of a computation by a circuit.
4.2 Consecutive computation steps of a Turing machine.
4.3 The idea underlying the reduction of CSAT to SAT.
4.4 The reduction to G3C – the clause gadget and its sub-gadget.
4.5 The reduction to G3C – connecting the gadgets.
4.6 The (non-generic) reductions presented in Section 4.3.
5.1 A schematic depiction of a promise problem.
5.2 The world view under P ≠ coNP ∩ NP ≠ NP.
Preface
The quest for efficiency is ancient and universal, as time and other resources are always in shortage. Thus, the question of which tasks can be performed efficiently is central to the human experience. A key step toward the systematic study of the aforementioned question is a rigorous definition of the notion of a task and of procedures for solving tasks. These definitions were provided by computability theory, which emerged in the 1930s. This theory focuses on computational tasks, considers automated procedures (i.e., computing devices and algorithms) that may solve such tasks, and studies the class of solvable tasks. In focusing attention on computational tasks and algorithms, computability theory has set the stage for the study of the computational resources (like time) that are required by such algorithms. When this study focuses on the resources that are necessary for any algorithm that solves a particular task (or a task of a particular type), it is viewed as belonging to the theory of Computational Complexity (also known as Complexity Theory). In contrast, when the focus is on the design and analysis of specific algorithms (rather than on the intrinsic complexity of the task), the study is viewed as belonging to a related area that may be called Algorithmic Design and Analysis. Furthermore, Algorithmic Design and Analysis tends to be subdivided according to the domain of mathematics, science, and engineering in which the computational tasks arise. In contrast, Complexity Theory typically maintains a unity of the study of computational tasks that are solvable within certain resources (regardless of the origins of these tasks). Complexity Theory is a central field of the theoretical foundations of computer science (CS). It is concerned with the study of the intrinsic complexity of computational tasks. 
That is, a typical Complexity-theoretic study refers to the computational resources required to solve a computational task (or a class of such tasks), rather than referring to a specific algorithm or an algorithmic
schema. Actually, research in Complexity Theory tends to start with and focus on the computational resources themselves, and addresses the effect of limiting these resources on the class of tasks that can be solved. Thus, Computational Complexity is the general study of what can be achieved within limited time (and/or other limitations on natural computational resources). The most famous question of Complexity Theory is the P-vs-NP Question. This question can be phrased as asking whether finding solutions to certain problems is harder than checking the correctness of solutions to these problems. Indeed, this phrasing refers to so-called search problems (i.e., problems of searching for solutions). An alternative phrasing, which refers to so-called decision problems, asks whether or not deciding the validity of assertions can be facilitated by the presentation of adequate proofs. Equivalently, the question is whether discovering proofs (of the validity of assertions) is harder than verifying their correctness; that is, is proving harder than verifying? The fundamental nature of the P-vs-NP Question is evident in each of the foregoing formulations, which are in fact equivalent. It is widely believed that the answer to these equivalent formulations is that finding (resp., proving) is harder than checking (resp., verifying); that is, it is believed that P is different from NP, where P corresponds to the class of efficiently solvable problems and NP corresponds to the seemingly wider class of problems allowing for efficient verification of potential solutions. Indeed, the P-vs-NP Question has been unresolved since the early 1970s, and it is the author's guess that the question will remain unresolved for centuries, waiting for the development of a deeper understanding of the nature of efficient computation.
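The gap between finding and checking can be made concrete with a toy example. The following sketch (ours, not the book's; Subset Sum is merely one convenient NP problem) contrasts a polynomial-time checker with an exhaustive-search solver:

```python
from itertools import combinations

def check(numbers, target, candidate):
    # Checking a proposed solution: a single pass over the input,
    # i.e., polynomial time.
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)
    return sum(candidate) == target

def find(numbers, target):
    # Finding a solution: no method essentially better than trying
    # all 2^n subsets is known, which takes exponential time.
    for r in range(len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)
    return None
```

Verifying a candidate takes time linear in the input length, whereas the exhaustive search may examine exponentially many subsets; the P-vs-NP Question asks whether this gap is inherent.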
However, life will continue in the meantime, and it will bring along a variety of NP-problems, where some of these problems will be placed in P (by presenting efficient algorithms solving them) and others will resist such attempts and will be conjectured to be too computationally hard to belong to P. Actually, the latter description is not a wild guess; this has been the state of affairs for several decades now. At present, when faced with a seemingly hard problem in NP, we can only hope to prove that it is not in P by assuming that NP is different from P. Thus, we seek ways of proving that if the problem at hand is in P, then NP equals P, which means that all problems in NP are in P. This is where the theory of NP-completeness comes into the picture. Intuitively, a problem in NP is called NP-complete if any efficient algorithm for it can be converted into an efficient algorithm for any other problem in NP. It follows that if some NP-complete problem is in P, then all problems in NP are in P. Hence, if NP is different from P, then no NP-complete problem can be in P. Consequently, the P-vs-NP
Question is captured by the question of whether or not an individual (NP-complete) problem can be solved efficiently. Amazingly enough, NP-complete problems exist, and furthermore, hundreds of natural computational problems arising in many different areas of mathematics and science are NP-complete. The aforementioned conversion of an efficient algorithm for one problem into efficient algorithms for other problems is actually performed by a translation of the latter problems' instances. Such a translation is called a reduction, and the theory of NP-completeness is based on the notion of efficient reductions. In general, one computational problem is (efficiently) reducible to another problem if it is possible to (efficiently) solve the former when provided access to an (efficient) algorithm for solving the latter. A problem (in NP) is NP-complete if any problem in NP is efficiently reducible to it, which means that each individual NP-complete problem “encodes” all problems in NP. The fact that NP-complete problems exist, let alone in such an abundance and variety, is indeed amazing. Since its discovery, NP-completeness has been used as the main tool by which the intrinsic complexity of certain problems is demonstrated. A vast number of NP-completeness results have been discovered since the early 1970s. These discoveries have been guiding theoretical research as well as technological development by indicating when one needs to relax computational problems in order to obtain efficient procedures. This impact is neither confined to computer science nor to the need to solve some computational problems. It typically occurs when researchers or engineers seek a simple characterization of objects that satisfy some property, whereas it turns out that deciding whether a given object has this property is an NP-complete problem. Needless to say, in such a case, no simple characterization is likely to exist, and so one had better abandon the search for it.
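A minimal illustration of such an instance translation (the example is ours, not the book's): a set of vertices is independent in a graph G exactly when it forms a clique in the complement of G, so the Independent Set problem reduces to the Clique problem by translating each instance to its complement graph:

```python
def complement(n, edges):
    # Instance translation: map the graph on vertices 0..n-1 to its
    # complement graph, computable in polynomial time.
    all_pairs = {(u, v) for u in range(n) for v in range(u + 1, n)}
    return all_pairs - {tuple(sorted(e)) for e in edges}

def is_clique(edges, vertices):
    # A checker for the target problem: every pair must be an edge.
    vs = sorted(vertices)
    return all((u, v) in edges for i, u in enumerate(vs) for v in vs[i + 1:])

def is_independent_set(n, edges, vertices):
    # Solving the original problem by using the procedure for the
    # other problem on the translated instance.
    return is_clique(complement(n, edges), vertices)
```

The translation itself is computable in polynomial time, which is exactly what the notion of an efficient reduction requires.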
Indeed, diverse scientific disciplines, which were unsuccessfully struggling with some of their internal questions, came to realize that these questions are inherently difficult, since they are closely related to computational problems that are NP-complete.

The Current Book. The main focus of the current book is on the P-vs-NP Question and on the theory of NP-completeness. Indeed, a large portion of the book is devoted to presenting and studying the various formulations of the P-vs-NP Question. This portion may be viewed as a mathematical articulation of the intuitive gap between searching for solutions and checking their validity (or between proving theorems and verifying the correctness of proofs). Another large portion of the book is devoted to the presentation of the theory of NP-completeness, while providing a treatment of the general notion of efficient
reductions between computational problems. This portion may be viewed as a mathematical articulation of the daily notion of a “reduction” (i.e., solving one problem by using a known procedure for another problem), augmented with the fundamental and surprising feature of “universality” (i.e., the existence of complete problems to which all problems can be reduced). The book, which includes adequate preliminaries regarding computational problems and computational models, aims to provide a wide perspective on the issues at its core. For example, the treatment of efficient reductions goes beyond the minimum that suffices for a presentation of the theory of NP-completeness, and this feature supports the study of the relative complexity of search and decision problems. In general, the book is believed to present the very basics of Complexity Theory, while bearing in mind that most readers do not intend to specialize in Complexity Theory (and yet hoping that some will be motivated to do so).

Relation to a Different Book by the Author. The current book is a significant revision of Chapter 2 (and Section 1.2) of the author's book Computational Complexity: A Conceptual Perspective [13]. The revision was aimed at making the book more friendly to the novice. In particular, numerous technical expositions were further detailed and many exercises were added.

Web Site for Notices Regarding This Book. The author intends to maintain a Web site listing corrections of various types. The location of the site is http://www.wisdom.weizmann.ac.il/~oded/bcbook.html
Acknowledgments. The author is grateful to Asilata Bapat and Michael Forbes for their careful reading of a draft of this book and for the numerous corrections and suggestions that they provided.
Overview
This book starts by providing the relevant background on computability theory, which is the setting in which Complexity-theoretic questions are studied. Most importantly, this preliminary chapter (i.e., Chapter 1) provides a treatment of central notions, such as search and decision problems, algorithms that solve such problems, and their complexity. Special attention is given to the notion of a universal algorithm. The main part of this book (i.e., Chapters 2–5) focuses on the P-vs-NP Question and on the theory of NP-completeness. Additional topics covered in this part include the general notion of an efficient reduction (with a special emphasis on reductions of search problems to corresponding decision problems), the existence of problems in NP that are neither NP-complete nor in P, the class coNP, optimal search algorithms, and promise problems. A brief overview of this main part follows.

The P-vs-NP Question. Loosely speaking, the P-vs-NP Question refers to search problems for which the correctness of solutions can be efficiently checked (i.e., there is an efficient algorithm that, given a solution to a given instance, determines whether or not the solution is correct). Such search problems correspond to the class NP, and the P-vs-NP Question corresponds to whether or not all these search problems can be solved efficiently (i.e., is there an efficient algorithm that, given an instance, finds a correct solution?). Thus, the P-vs-NP Question can be phrased as asking whether finding solutions is harder than checking the correctness of solutions. An alternative formulation, in terms of decision problems, refers to assertions that have efficiently verifiable proofs (of relatively short length). Such sets of assertions also correspond to the class NP, and the P-vs-NP Question corresponds to whether or not proofs for such assertions can be found efficiently (i.e., is there an efficient algorithm that, given an assertion, determines
its validity and/or finds a proof for its validity?). Thus, the P-vs-NP Question can also be phrased as asking whether discovering proofs is harder than verifying their correctness; that is, is proving harder than verifying (or are proofs valuable at all)? In these equivalent formulations of the P-vs-NP Question, P corresponds to the class of efficiently solvable problems, whereas NP corresponds to a natural class of problems for which it is reasonable to seek efficient solvability (i.e., NP corresponds to the seemingly wider class of problems allowing for efficient verification of potential solutions). We also note that in both cases, equality between P and NP contradicts our intuitions regarding the notions that underlie the formulation of NP (i.e., the notions of solving search problems and proving theorems). Indeed, it is widely believed that the answer to these two equivalent formulations of the P-vs-NP Question is that P is different from NP; that is, finding (resp., discovering) is harder than checking (resp., verifying). The fact that this natural conjecture is unsettled seems to be one of the big sources of frustration of Complexity Theory. The author's opinion, however, is that this feeling of frustration is unjustified and is rooted in unrealistic expectations (i.e., naive underestimations of the difficulty of relating complexity classes of such a nature). In any case, at present, when faced with a seemingly hard problem in NP, we cannot expect to prove that the problem is not in P unconditionally. The best we can expect is a conditional proof that the said problem is not in P, based on the assumption that NP is different from P. The contrapositive is proving that if the said problem is in P, then so is any problem in NP (i.e., NP equals P). The theory of NP-completeness captures this idea.

NP-Completeness. The theory of NP-completeness is based on the notion of an efficient reduction, which is a relation between computational problems.
Loosely speaking, one computational problem is efficiently reducible to another problem if it is possible to efficiently solve the former when provided with an (efficient) algorithm for solving the latter. Thus, the first problem is not harder to solve than the second one. A problem (in NP) is NP-complete if any problem in NP is efficiently reducible to it, which means that the first problem “encodes” all problems in NP (and so, in some sense, is the hardest among them). Indeed, the fate of the entire class NP (with respect to inclusion in P) rests with each individual NP-complete problem. In particular, showing that a problem is NP-complete implies that this problem is not in P unless NP equals P. The fact that NP-complete problems can be defined does not mean that they exist. Indeed, the ability of an individual problem to encode all problems in a class as diverse as NP is unfamiliar in daily life, and a layperson is likely to guess
that such a phenomenon is self-contradictory (especially when being told that the complete problem has to be in the same class). Nevertheless, NP-complete problems exist, and furthermore, hundreds of natural computational problems arising in many different areas of mathematics and science are NP-complete. The list of known NP-complete problems includes finding a satisfying assignment to a given Boolean formula (or deciding whether such an assignment exists), finding a 3-coloring of the vertices of a given graph (or deciding whether such a coloring exists), and so on. The core of establishing the NP-completeness of these problems is showing that each of them can encode any other problem in NP. Thus, these demonstrations provide a method of encoding instances of any NP problem as instances of the target NP-complete problem.

The Actual Organization. The foregoing paragraphs refer to material that is covered in Chapters 2–4. Specifically, Chapter 2 is devoted to the P-vs-NP Question per se, Chapter 3 is devoted to the notion of an efficient reduction, and Chapter 4 is devoted to the theory of NP-completeness. We mention that NP-complete problems are not the only seemingly hard problems in NP; that is, if P is different from NP, then NP contains problems that are neither NP-complete nor in P (see Section 4.4). Additional related topics are discussed in Chapter 5. In particular, in Section 5.2, it is shown that the P-vs-NP Question is not about inventing sophisticated algorithms or ruling out their existence, but rather boils down to the analysis of a single known algorithm; that is, we will present an optimal search algorithm for any problem in NP, while having no clue about its time complexity. Each of the main chapters (i.e., Chapters 1–4) starts with a short overview, which sets the stage for the entire chapter. These overviews provide the basic motivation for the notions defined, as well as a high-level summary of the main results, and hence should not be skipped.
The chapter’s overview is followed by teaching notes, which assume familiarity with the material and thus are better skipped by the novice. Each chapter ends with exercises, which are designed to help verify the basic understanding of the main text (and not to test or inspire creativity). In a few cases, exercises (augmented by adequate guidelines) are used for presenting related advanced material. The book also includes a short historical account (see Historical Notes), a brief overview of Complexity Theory at large (see Epilogue), and a laconic review of some popular computational problems (see Appendix).
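To make the verification task underlying NP concrete, here is a sketch (ours, with assumed conventions: vertices numbered 0 to n-1, colors 0 to 2) of an efficient verifier for the 3-coloring problem (G3C) mentioned above. Given a graph and a proposed coloring, checking the coloring is easy, even though no efficient way of finding one is known:

```python
def verify_3coloring(n, edges, coloring):
    # The certificate is a color (0, 1, or 2) for each of the n vertices.
    if len(coloring) != n or any(c not in (0, 1, 2) for c in coloring):
        return False
    # Accept iff no edge connects two vertices of the same color:
    # a single pass over the edges, i.e., time linear in the input size.
    return all(coloring[u] != coloring[v] for u, v in edges)
```

A graph is 3-colorable exactly when some certificate makes this verifier accept, which is the proof-system view of membership in NP.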
To the Teacher
According to a common opinion, the most important aspect of a scientific work is the technical result that it achieves, whereas explanations and motivations are merely redundancy introduced for the sake of “error correction” and/or comfort. It is further believed that, as with a work of art, the interpretation of the work should be left to the reader. The author strongly disagrees with the aforementioned opinions, and argues that there is a fundamental difference between art and science, and that this difference refers exactly to the meaning of a piece of work. Science is concerned with meaning (and not with form), and in its quest for truth and/or understanding, science follows philosophy (and not art). The author holds the opinion that the most important aspects of a scientific work are the intuitive question that it addresses, the reason that it addresses this question, the way it phrases the question, the approach that underlies its answer, and the ideas that are embedded in the answer. Following this view, it is important to communicate these aspects of the work. The foregoing issues are even more acute when it comes to Complexity Theory, firstly because conceptual considerations seem to play an even more central role in Complexity Theory than in other scientific fields. Secondly (and even more importantly), Complexity Theory is extremely rich in conceptual content. Thus, communicating this content is of primary importance, and failing to do so misses the most important aspects of Complexity Theory. Unfortunately, the conceptual content of Complexity Theory is rarely communicated (explicitly) in books and/or surveys of the area. The annoying (and quite amazing) consequences are students who have only a vague understanding of the meaning and general relevance of the fundamental notions and results that they were taught. The author’s view is that these consequences are easy to avoid by taking the time to explicitly discuss the meaning of definitions and results. 
A closely related issue is using the “right” definitions (i.e., those that
reflect better the fundamental nature of the notion being defined) and emphasizing the (conceptually) “right” results. The current book is written accordingly; two concrete and central examples follow. The first example refers to the presentation of the P-vs-NP Question, where we avoid using (polynomial-time) nondeterministic machines. We believe that these fictitious “machines” have a negative effect from both a conceptual and a technical point of view. The conceptual damage caused by defining NP in terms of (polynomial-time) nondeterministic machines is that it is unclear why one should care about what such machines can do. Needless to say, the reason to care is clear when noting that these fictitious “machines” offer a (convenient but rather slothful) way of phrasing fundamental issues. The technical damage caused by using nondeterministic machines is that they tend to confuse the students. In contrast to using a fictitious model as a pivot, we define NP in terms of proof systems such that the fundamental nature of this class and the P-vs-NP Question are apparent. We also push to the front a formulation of the P-vs-NP Question in terms of search problems. We believe that this formulation may appeal to non-experts even more than the formulation of the P-vs-NP Question in terms of decision problems. The aforementioned formulation refers to classes of search problems that are analogous to the decision problem classes P and NP. Specifically, we consider the classes PF and PC (see Definitions 2.2 and 2.3), where PF consists of search problems that are efficiently solvable and PC consists of search problems having efficiently checkable solutions.1 To summarize, we suggest presenting the P-vs-NP Question both in terms of search problems and in terms of decision problems.
Furthermore, when presenting the decision-problem version, we suggest introducing NP by explicitly referring to the terminology of proof systems (rather than using the more standard formulation, which is based on nondeterministic machines). We mention that the formulation of NP as proof systems is also a better starting point for the study of more advanced issues (e.g., counting classes, let alone probabilistic proof systems). Turning to the second example, which refers to the theory of NP-completeness, we highlight a central recommendation regarding the presentation of this theory. We believe that from a conceptual point of view, the mere existence of NP-complete problems is an amazing fact. We thus suggest emphasizing and discussing this fact per se. In particular, we recommend first proving the mere existence of NP-complete problems, and only later establishing the fact that certain natural problems such as SAT are NP-complete. Also, when establishing the NP-completeness of SAT, we recommend decoupling
1 Indeed, these classes are often denoted FP and FNP, respectively.
the emulation of Turing machines by circuits (used for establishing the NP-completeness of CSAT) from the emulation of circuits by formulae (used in the reduction of CSAT to SAT).

Organization. In Chapter 1, we present the basic framework of Computational Complexity, which serves as a stage for the rest of the book. In particular, we formalize the notions of search and decision problems (see Section 1.2), algorithms solving them (see Section 1.3), and their time complexity (see Section 1.3.5). In Chapter 2, we present the two formulations of the P-vs-NP Question. The general notion of a reduction is presented in Chapter 3, where we highlight its applicability outside the domain of NP-completeness. In particular, in Section 3.3 we treat reductions of search problems to corresponding decision problems. Chapter 4 is devoted to the theory of NP-completeness, whereas Chapter 5 treats three relatively advanced topics (i.e., the framework of promise problems, the existence of optimal search algorithms for NP, and the class coNP). The book ends with an Epilogue, which provides a brief overview of Complexity Theory, and an Appendix that reviews some popular computational problems (which are used as examples in the main text).

The Chapters' Overviews. Each of the main chapters (i.e., Chapters 1–4) starts with a short overview, which provides the basic motivation for the notions defined in that chapter as well as a high-level summary of the chapter's main results. We suggest using these overviews as a basis for motivational discussions preceding the actual technical presentation.

Additional Teaching Notes. Each chapter overview is followed by additional teaching notes. These notes articulate various choices made in the presentation of the material in the corresponding chapter.

Basing a Course on the Current Book. The book can serve as a basis for an undergraduate course, which may be called Basics of Computational Complexity.
The core material for this course is provided by Chapters 1–4. Specifically, Sections 1.1–1.3 provide the required elements of computability theory, and Chapters 2–4 provide the basic elements of Complexity Theory. In addition, §1.4.1.1 and §1.4.3.1 (or, alternatively, Appendix A.2) provide preliminaries regarding Boolean circuits and formulae that are required in Section 4.3 (which refers to CSAT and SAT). For a schematic outline of the course, see Figure 0.1.

On the Choice of Additional (Basic and Advanced) Topics. As depicted in Figure 0.1, depending on time constraints, we suggest augmenting the core material with a selection of additional basic and advanced topics. As for
topic                                                        sections
Elements of computability theory                             1.1–1.3
The P-vs-NP Question                                         2.1–2.4, 2.7
Optional: definitional variations                            2.5, 2.6
Polynomial-time reductions                                   3.1–3.3
The existence of NP-complete problems                        4.1–4.2
Natural NP-complete problems (e.g., CSAT, SAT, VC)           4.3
Preliminaries on Boolean circuits and formulae               1.4.1, 1.4.3, A.2
Add'l basic topics: NPI, promise problems, optimal search    4.4, 5.1, 5.2
Advanced topics, if time permits                             from [13, 1]
Figure 0.1. Outline of the suggested course.
the basic topics, we recommend at least mentioning the class NPI, promise problems, and the optimal search algorithms for NP. Regarding the choice of advanced topics, we recommend an introduction to probabilistic proof systems. In our opinion, this choice is most appropriate because it provides natural extensions of the notion of an NP-proof system and offers very appealing positive applications of NP-completeness. Section 4.3.5 provides a brief overview of probabilistic proof systems, while [13, Chap. 9] provides an extensive overview (which transcends the needs of a basic complexity course). Alternative advanced topics can be found in [13, 1].

A Revision of the CS Curriculum. The best integration of the aforementioned course in undergraduate CS education calls for a revision of the standard CS curriculum. Indeed, we believe that there is no real need for a semester-long course in Computability (i.e., a course that focuses on what can be computed rather than on what can be computed efficiently). Instead, CS undergraduates should take a course in Computational Complexity, which should contain the computability aspects that serve as a basis for the study of efficient computation (i.e., the rest of this course). Specifically, the computability aspects should occupy at most one-third of the course, and the focus should be on basic complexity issues (captured by P, NP, and NP-completeness), which may be augmented by a selection of some more advanced material. Indeed, such a course can be based on the current book (possibly augmented by a selection of some additional topics from, say, [13, 1]).
Notations and Conventions
Although we do try to avoid using various notations and conventions that may not be familiar to the reader, some exceptions exist – especially in advanced discussions. In order to be on the safe side, we list here some standard notations and conventions that are (lightly) used in the book.

Standard Asymptotic Notation. When referring to integral functions, we use the standard asymptotic notation; that is, for f, g : N → N, we write f = O(g) if there exists a constant c > 0 such that f(n) ≤ c · g(n) holds for all sufficiently large n ∈ N. We usually denote by "poly" an unspecified polynomial, and write f(n) = poly(n) instead of "there exists a polynomial p such that f(n) ≤ p(n) for all n ∈ N."

Standard Combinatorial and Graph Theory Terms and Notation. For a natural number n ∈ N, we denote [n] = {1, . . . , n}. Many of the computational problems that we mention refer to finite (undirected) graphs. Such a graph, denoted G = (V, E), consists of a set of vertices, denoted V, and a set of edges, denoted E, which are unordered pairs of vertices. By default, graphs are undirected, whereas directed graphs consist of vertices and directed edges, where a directed edge is an ordered pair of vertices. For further background on graphs and computational problems regarding graphs, the reader is referred to Appendix A.1.

Typographic Conventions. We denote formally defined complexity classes by calligraphic letters (e.g., NP), but we do so only after defining these classes. Furthermore, when we wish to maintain some ambiguity regarding the specific formulation of a class of problems, we use Roman font (e.g., NP may denote either a class of search problems or a class of decision problems). Likewise,
we denote formally defined computational problems by typewriter font (e.g., SAT). In contrast, generic problems and algorithms will be denoted by italic font.

Our Use of Footnotes. In trying to accommodate a diverse spectrum of readers, we use footnotes for the presentation of additional details that most readers may wish to skip but some readers may find useful. The most common usage of footnotes is for providing additional technical details that may seem obvious to most readers but be missed by some others. Occasionally, footnotes are also used for advanced comments.
Main Definitions and Results
Following is a list of the main definitions and results presented in the book. The list only provides a laconic description of each of the items, while a full description can be found in the actual text (under the provided reference). The list is ordered approximately according to the order of appearance of the corresponding topics in the main text.

Search and Decision Problems. The former refer to finding solutions to given instances, whereas the latter refer to determining whether the given instance has a predetermined property. See Definitions 1.1 and 1.2, respectively.

Turing Machines. The model of Turing machines offers a relatively simple formulation of the notion of an algorithm. See Section 1.3.2.

Theorem 1.4. The set of computable functions is countable, whereas the set of all functions (from strings to strings) is not countable.

Theorem 1.5. The Halting Problem is undecidable.

Universal Algorithms. A universal machine computes the partial function u that is defined on pairs (⟨M⟩, x) such that M halts on input x, in which case it holds that u(⟨M⟩, x) = M(x). See Section 1.3.4.

Efficient and Inefficient. Efficiency is associated with polynomial-time computations, whereas computations requiring more time are considered inefficient or intractable (or infeasible). See Section 2.1.

The Class PF (Polynomial-time Find). The class of efficiently solvable search problems. See Definition 2.2.
The Class PC (Polynomial-time Check). The class of search problems having efficiently checkable solutions. See Definition 2.3.

The Notations S_R and R(x) Associated with a Search Problem R. For any search problem, R, we denote the set of solutions to the instance x by R(x) (i.e., R(x) = {y : (x, y) ∈ R}), and denote the set of instances having solutions by S_R (i.e., S_R = {x : R(x) ≠ ∅}).

The Class P. The class of efficiently solvable decision problems. See Definition 2.4.

The Class NP. The class of decision problems having efficiently verifiable proof systems. See Definition 2.5.

Theorem 2.6. PC ⊆ PF if and only if P = NP.

The P-vs-NP Question. It is widely believed that P is different from NP. This belief is supported by both philosophical and empirical considerations. See Section 2.7.

The Traditional Definition of NP. Traditionally, NP is defined as the class of sets that can be decided by a fictitious device called a non-deterministic polynomial-time machine (which explains the source of the notation NP). See Section 2.6.

Cook-reductions. A problem Π is Cook-reducible to a problem Π′ if Π can be solved efficiently when given access to any procedure (or oracle) that solves the problem Π′. See Definition 3.1.

Karp-reductions. A decision problem S is Karp-reducible to a decision problem S′ if there exists a polynomial-time computable function f such that, for every x, it holds that x ∈ S if and only if f(x) ∈ S′. See Definition 3.3.

Levin-reductions. A search problem R is Levin-reducible to a search problem R′ if there exist polynomial-time computable functions f and g such that (1) f is a Karp-reduction of S_R to S_{R′}, and (2) for every x ∈ S_R and y′ ∈ R′(f(x)) it holds that (x, g(x, y′)) ∈ R. See Definition 3.4.

Theorem 3.2. Every search problem in PC is Cook-reducible to some decision problem in NP.
Self-reducibility of Search Problems. The decision problem implicit in a search problem R is deciding membership in the set S_R, and R is called self-reducible if it is Cook-reducible to S_R. See Section 3.3.

NP-Completeness (of Decision Problems). A decision problem S is NP-complete if (1) S is in NP, and (2) every decision problem in NP is Karp-reducible to S. See Definition 4.1.

NP-Completeness of Search Problems. A search problem R is PC-complete (or NP-complete) if (1) R is in PC, and (2) every search problem in PC is Levin-reducible to R. See Definition 4.2.

Theorem 4.3. There exist NP-complete search and decision problems.

Theorems 4.5 and 4.6 (Also Known as the Cook–Levin Theorem). Circuit satisfiability (CSAT) and formula satisfiability (SAT) are NP-complete.

Proposition 4.4. If an NP-complete decision problem S is Karp-reducible to a decision problem S′ ∈ NP (resp., a PC-complete search problem R is Levin-reducible to a search problem R′ ∈ PC), then S′ is NP-complete (resp., R′ is PC-complete).

Theorem 4.12. Assuming NP ≠ P, there exist decision problems in NP \ P that are not NP-complete (even when allowing Cook-reductions).

Promise Problems. Promise problems are natural generalizations of search and decision problems that are obtained by explicitly specifying a set of legitimate instances (rather than considering any string as a legitimate instance). See Section 5.1.

Theorem 5.5. There exists an optimal algorithm for any candid search problem in NP, where the candid search problem of the binary relation R consists of finding solutions whenever they exist (and behaving arbitrarily otherwise; see Definition 5.2).

Theorem 5.7. If every set in NP can be Cook-reduced to some set in NP ∩ coNP, then NP = coNP, where coNP = {{0, 1}∗ \ S : S ∈ NP}.
1 Computational Tasks and Models
Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to promote viewing computation as a general phenomenon, which may capture both artificial and natural processes. Loosely speaking, a computation is a process that modifies a relatively large environment via repeated applications of a simple and predetermined rule. Although each application of the rule has a very limited effect, the effect of many applications of the rule may be very complex. We are interested in the transformation of the environment effected by the computational process (or computation), where the computation rule is designed to achieve a desired effect.

Typically, the initial environment to which the computation is applied encodes an input string, and the end environment (i.e., at termination of the computation) encodes an output string. Thus, the computation defines a mapping from inputs to outputs, and such a mapping can be viewed as solving a search problem (i.e., given an instance x find a solution y that relates to x in some predetermined way) or a decision problem (i.e., given an instance x determine whether or not x has some predetermined property).

Indeed, our focus will be on solving computational tasks (mostly search and decision problems), where a computational task refers to an infinite set of instances such that each instance is associated with a set of valid solutions. In the case of a search problem this set may contain several different solutions (per instance), but in the case of a decision problem the set of solutions is a singleton that consists of a binary value (per instance).
In order to provide a basis for a rigorous study of the complexity of computational tasks, we need to define computation (and its complexity) rigorously. This, in turn, requires specifying a concrete model of computation, which corresponds to an abstraction of a real computer (be it a PC, mainframe, or network of computers) and yet is simpler (and thus facilitates further study). We will refer to the model of Turing machines, but any reasonable alternative model will do. We also discuss two fundamental features of any reasonable model of computation: the existence of problems that cannot be solved by any computing device (in this model) and the existence of universal computing devices (in this model).

Organization. We start by introducing the general framework for our discussion of computational tasks (or problems). This framework refers to the representation of instances as binary sequences (see Section 1.1) and focuses on two types of tasks: searching for solutions and making decisions (see Section 1.2). Once computational tasks are defined, we turn to methods for solving such tasks, which are described in terms of some model of computation. The description of such models is the main contents of this chapter. Specifically, we consider two types of models of computation: uniform models and non-uniform models (see Sections 1.3 and 1.4, respectively). The uniform models correspond to the intuitive notion of an algorithm, and will provide the stage for the rest of the book (which focuses on efficient algorithms). In contrast, non-uniform models (e.g., Boolean circuits) facilitate a closer look at the way a computation progresses, and will be used only sporadically in this book. Thus, whereas Sections 1.1–1.3 are absolute prerequisites for the rest of this book, Section 1.4 is not.
Teaching Notes

This chapter provides the necessary preliminaries for the rest of the book; that is, we discuss the notion of a computational task and present computational models for describing methods for solving such tasks. Sections 1.1–1.3 correspond to the contents of a traditional Computability course, except that our presentation emphasizes some aspects and de-emphasizes others. In particular, the presentation highlights the notion of a universal machine (see Section 1.3.4), explicitly discusses the complexity of computation
(Section 1.3.5), and provides a definition of oracle machines (Section 1.3.6). This material (with the exception of Kolmogorov Complexity) is taken for granted in the rest of the current book. In contrast, Section 1.4 presents basic preliminaries regarding non-uniform models of computation (e.g., various types of Boolean circuits), and these are used only lightly in the rest of the book.

We strongly recommend avoiding the standard practice of teaching the student to program with Turing machines. These exercises seem very painful and pointless. Instead, one should prove that the Turing machine model is exactly as powerful as a model that is closer to a real-life computer (see the "sanity check" in §1.3.2.2); that is, a function can be computed by a Turing machine if and only if it is computable by a machine of the latter model. For starters, one may prove that a function can be computed by a single-tape Turing machine if and only if it is computable by a multi-tape (e.g., two-tape) Turing machine.

As noted in Section 1.3.7, we reject the common coupling of computability theory with the theory of automata and formal languages. Although the historical links between these two theories (at least in the West) cannot be denied, this fact cannot justify coupling two fundamentally different theories (especially when such a coupling promotes a wrong perspective on computability theory). Thus, in our opinion, the study of any of the lower levels of Chomsky's Hierarchy [16, Chap. 9] should be decoupled from the study of computability theory (let alone the study of Complexity Theory). Indeed, this is related to the discussion of the "revision of the CS curriculum" in the preliminary section "To the Teacher."

The perspective on non-uniform models of computation provided by Section 1.4 is more than the very minimum that is required for the rest of this book. If pressed for time, then the teacher may want to skip all of Section 1.4.2 as well as some of the material in Section 1.4.1 and Section 1.4.3 (i.e., avoid §1.4.1.2 as well as §1.4.3.2). Furthermore, for a minimal presentation of Boolean formulae, one may use Appendix A.2 instead of §1.4.3.1.
1.1 Representation

In mathematics and most other sciences, it is customary to discuss objects without specifying their representation. This is not possible in the theory of computation, where the representation of objects plays a central role. In a sense, a computation merely transforms one representation of an object to another representation of the same object. In particular, a computation designed to solve some problem merely transforms the problem instance to its solution,
where the latter can be thought of as a (possibly partial) representation of the instance. Indeed, the answer to any fully specified question is implicit in the question itself, and computation is employed to make this answer explicit.

Computational tasks refer to objects that are represented in some canonical way, where such canonical representation provides an "explicit" and "full" (but not "overly redundant") description of the corresponding object. Furthermore, when we discuss natural computational problems, we always use a natural representation of the corresponding objects. We will only consider finite objects like numbers, sets, graphs, and functions (and keep distinguishing these types of objects although, actually, they are all equivalent). While the representation of numbers, sets, and functions is quite straightforward (see the following), we refer the reader to Appendix A.1 for a discussion of the representation of graphs.

In order to facilitate a study of methods for solving computational tasks, these tasks are defined with respect to infinitely many possible instances (each being a finite object). Indeed, the comparison of different methods seems to require the consideration of infinitely many possible instances; otherwise, the choice of the language in which the methods are described may totally dominate and even distort the discussion (cf., e.g., the discussion of Kolmogorov Complexity in §1.3.4.2).

Strings. We consider finite objects, each represented by a finite binary sequence called a string. For a natural number n, we denote by {0, 1}^n the set of all strings of length n, hereafter referred to as n-bit (long) strings. The set of all strings is denoted {0, 1}∗; that is, {0, 1}∗ = ∪_{n∈N} {0, 1}^n, where 0 ∈ N. For x ∈ {0, 1}∗, we denote by |x| the length of x (i.e., x ∈ {0, 1}^{|x|}), and often denote by x_i the i-th bit of x (i.e., x = x_1 x_2 · · · x_{|x|}). For x, y ∈ {0, 1}∗, we denote by xy the string resulting from concatenation of the strings x and y.
At times, we associate {0, 1}∗ × {0, 1}∗ with {0, 1}∗; the reader should merely consider an adequate encoding (e.g., the pair (x_1 · · · x_m, y_1 · · · y_n) ∈ {0, 1}∗ × {0, 1}∗ may be encoded by the string x_1 x_1 · · · x_m x_m 01 y_1 · · · y_n ∈ {0, 1}∗). Likewise, we may represent sequences of strings (of fixed or varying length) as single strings. When we wish to emphasize that such a sequence (or some other object) is to be considered as a single object, we use the notation ⟨·⟩ (e.g., "the pair (x, y) is encoded as the string ⟨x, y⟩").

Numbers. Unless stated differently, natural numbers will be encoded by their binary expansion; that is, the string b_{n−1} · · · b_1 b_0 ∈ {0, 1}^n encodes the number Σ_{i=0}^{n−1} b_i · 2^i, where typically we assume that this representation has no leading zeros (i.e., b_{n−1} = 1), except when the number itself is zero. Rational numbers
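The doubling-based pair encoding described above (double every bit of x, append the separator 01, then append y) can be sketched in a few lines of code. The following Python sketch is ours, not the book's; the function names are illustrative:

```python
def encode_pair(x: str, y: str) -> str:
    """Encode a pair of bit strings as one bit string:
    double every bit of x, append the separator 01, then append y."""
    assert set(x) <= {"0", "1"} and set(y) <= {"0", "1"}
    return "".join(b + b for b in x) + "01" + y

def decode_pair(z: str) -> tuple[str, str]:
    """Invert encode_pair: read doubled bits until the 01 separator."""
    i, x = 0, []
    while z[i] == z[i + 1]:      # doubled bits belong to x
        x.append(z[i])
        i += 2
    assert z[i : i + 2] == "01"  # the separator marks the end of x
    return "".join(x), z[i + 2 :]
```

Note that the encoding is prefix-unambiguous: since the bits of x appear doubled, the first position where two adjacent bits differ must be the 01 separator, so decoding never confuses x with y.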
will be represented as pairs of natural numbers. In the rare cases in which one considers real numbers as part of the input to a computational problem, one actually means rational approximations of these real numbers.

Sets are usually represented as lists, which means that the representation introduces an order that is not specified by the set itself. Indeed, in general, the representation may have features that are not present in the represented object.

Functions are usually represented as sets of argument–value pairs (i.e., functions are represented as binary relations, which in turn are sets of ordered pairs).

Special Symbols. We denote the empty string by λ (i.e., λ ∈ {0, 1}∗ and |λ| = 0), and the empty set by ∅. It will be convenient to use some special symbols that are not in {0, 1}∗. One such symbol is ⊥, which typically denotes an indication (e.g., produced by some algorithm) that something is wrong.
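The binary-expansion convention for natural numbers (most significant bit first, no leading zeros, with "0" for zero itself) is easy to make concrete. A small illustrative Python sketch (ours, not the book's):

```python
def to_binary(n: int) -> str:
    """Binary expansion with no leading zeros ("0" for zero itself)."""
    return format(n, "b")

def from_binary(s: str) -> int:
    """Recover the number sum of b_i * 2^i from the string b_{n-1}...b_1 b_0."""
    return sum(int(b) << i for i, b in enumerate(reversed(s)))
```

For example, to_binary(6) yields "110", and from_binary inverts it, so the two functions realize the bijection between N and the leading-zero-free strings.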
1.2 Computational Tasks

Two fundamental types of computational tasks are the so-called search problems and decision problems. In both cases, the key notions are the problem's instances and the problem's specification.
1.2.1 Search Problems

A search problem consists of a specification of a (possibly empty) set of valid solutions for each possible instance. Given an instance, one is required to find a corresponding solution (or to determine that no such solution exists). For example, consider the problem in which one is given a system of equations and is asked to find a valid solution. Needless to say, much of computer science is concerned with solving various search problems (e.g., finding shortest paths in a graph, finding an occurrence of a given pattern in a given string, finding the median value in a given list of numbers, etc.). Furthermore, search problems correspond to the daily notion of "solving a problem" (e.g., finding one's way between two locations), and thus a discussion of the possibility and complexity of solving search problems corresponds to the natural concerns of most people.

In the following definition of solving search problems, the potential solver is a function (which may be thought of as a solving strategy), and the sets of possible solutions associated with each of the various instances are "packed" into a single binary relation.
Definition 1.1 (solving a search problem): Let R ⊆ {0, 1}∗ × {0, 1}∗ and let R(x) = {y : (x, y) ∈ R} denote the set of solutions for the instance x. A function f : {0, 1}∗ → {0, 1}∗ ∪ {⊥} solves the search problem of R if for every x the following holds: if R(x) ≠ ∅ then f(x) ∈ R(x), and otherwise f(x) = ⊥.

Indeed, R = {(x, y) ∈ {0, 1}∗ × {0, 1}∗ : y ∈ R(x)}. The solver f is required to find a solution to the given instance x whenever such a solution exists; that is, given x, the solver is required to output some y ∈ R(x) whenever the set R(x) is not empty. It is also required that the solver f never outputs a wrong solution; that is, if R(x) ≠ ∅ then f(x) ∈ R(x), and if R(x) = ∅ then f(x) = ⊥. This means that f indicates whether or not x has any solution (since f(x) ∈ {0, 1}∗ if x has a solution, whereas f(x) = ⊥ ∉ {0, 1}∗ otherwise). Note that the solver is not necessarily determined by the search problem (i.e., the solver is uniquely determined if and only if |R(x)| ≤ 1 holds for every x).

Of special interest is the case of search problems having a unique solution (for each possible instance); that is, the case that |R(x)| = 1 for every x. In this case, R is essentially a (total) function, and solving the search problem of R means computing (or evaluating) the function R (or rather the function R′ defined by R′(x) = y if and only if R(x) = {y}). Popular examples include sorting a sequence of numbers, multiplying integers, finding the prime factorization of a composite number, and so on.¹
1.2.2 Decision Problems

A decision problem consists of a specification of a subset of the possible instances. Given an instance, one is required to determine whether the instance is in the specified set. For example, consider the problem where one is given a natural number and is asked to determine whether or not the number is a prime (i.e., whether or not the given number is in the set of prime numbers). Note that one typically presents decision problems in terms of deciding whether a given object has some predetermined property, but this can always be viewed as deciding membership in some predetermined set (i.e., the set of objects having this property). For example, when talking about determining whether or not a given graph is connected, we refer to deciding membership in the set of connected graphs.

¹ For example, sorting is represented as a binary relation that contains all pairs of sequences such that the second sequence is a sorted version of the first sequence. That is, the pair ((x_1, . . . , x_n), (y_1, . . . , y_n)) is in the relation if and only if there exists a permutation π over [n] such that y_i = x_{π(i)} for every i and y_i ≤ y_{i+1} for every relevant i.
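The sorting relation of the footnote admits a one-line membership test; the sketch below is our illustration, exploiting the fact that the permuted-and-nondecreasing condition holds exactly when y equals the sorted version of x:

```python
def in_sorting_relation(x: list[int], y: list[int]) -> bool:
    """Check whether (x, y) is in the sorting relation, i.e., whether
    y is a permutation of x arranged in nondecreasing order."""
    return y == sorted(x)
```

Since the sorted version of x is unique, each instance of this search problem has exactly one solution, matching the "unique solution" case discussed above.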
One important type of decision problem concerns those derived from search problems by considering the set of instances having a solution (with respect to some fixed search problem); that is, for any binary relation R ⊆ {0, 1}∗ × {0, 1}∗ we consider the set {x : R(x) ≠ ∅}. Indeed, being able to determine whether or not a solution exists is a prerequisite to being able to solve the corresponding search problem (as per Definition 1.1). In general, decision problems refer to the natural task of making binary decisions, a task that is not uncommon in daily life (e.g., determining whether a traffic light is red). In any case, in the following definition of solving decision problems, the potential solver is again a function; specifically, in this case the solver is a Boolean function, which is supposed to indicate membership in a predetermined set.

Definition 1.2 (solving a decision problem): Let S ⊆ {0, 1}∗. A function f : {0, 1}∗ → {0, 1} solves the decision problem of S (or decides membership in S) if for every x it holds that f(x) = 1 if and only if x ∈ S.

That is, the solver f is required to indicate whether or not the instance x resides in the predetermined set S. This indication is modeled by a binary value, where 1 corresponds to a positive answer and 0 corresponds to a negative answer. Thus, given x, the solver is required to output 1 if x ∈ S, and output 0 otherwise (i.e., if x ∉ S). Note that the function that solves a decision problem is uniquely determined by the decision problem; that is, if f solves (the decision problem of) S, then f equals the characteristic function of S (i.e., the function χ_S : {0, 1}∗ → {0, 1} defined such that χ_S(x) = 1 if and only if x ∈ S).

As hinted already in Section 1.2.1, the solver of a search problem implicitly determines membership in the set of instances that have solutions.
That is, if f solves the search problem of R, then the Boolean function f′ : {0, 1}∗ → {0, 1} defined by f′(x) = 1 if and only if f(x) ≠ ⊥ solves the decision problem of {x : R(x) ≠ ∅}.

Terminology. We often identify the decision problem of a set S with S itself, and also identify S with its characteristic function. Likewise, we often identify the search problem of a relation R with R itself.

Reflection. Most people would consider search problems to be more natural than decision problems: Typically, people seek solutions more often than they stop to wonder whether or not solutions exist. Definitely, search problems are not less important than decision problems; it is merely that their study tends
to require more cumbersome formulations. This is the main reason that most expositions choose to focus on decision problems. The current book attempts to devote at least a significant amount of attention to search problems, too.
1.2.3 Promise Problems (an Advanced Comment)

Many natural search and decision problems are captured more naturally by the terminology of promise problems, in which the domain of possible instances is a subset of {0, 1}∗ rather than {0, 1}∗ itself. In particular, note that the natural formulation of many search and decision problems refers to instances of a certain type (e.g., a system of equations, a pair of numbers, a graph), whereas the natural representation of these objects uses only a strict subset of {0, 1}∗. For the time being, we ignore this issue, but we shall revisit it in Section 5.1. Here we just note that in typical cases, the issue can be ignored by postulating that every string represents some legitimate object; for example, each string that is not used in the natural representation of these objects is postulated to be a representation of some fixed object (e.g., when representing graphs, we may postulate that each string that is not used in the natural representation of graphs is in fact a representation of the 1-vertex graph).
1.3 Uniform Models (Algorithms)

We finally reach the heart of the current chapter, which is the definition of (uniform) models of computation. Before presenting these models, let us briefly explain the need for their formal definitions. Indeed, we are all familiar with computers and with the ability of computer programs to manipulate data. But this familiarity is rooted in positive experience; that is, we have some experience regarding some things that computers can do. In contrast, Complexity Theory is focused on what computers cannot do, or rather on drawing the line between what can be done and what cannot be done. Drawing such a line requires a precise formulation of all possible computational processes; that is, we should have a clear definition of all possible computational processes (rather than some familiarity with some computational processes). We note that while our main motivation for defining formal models of computation is to capture the intuitive notion of an algorithm, such models also provide a useful perspective on a wide variety of processes that take place in the world.
Organization of Section 1.3. We start, in Section 1.3.1, with a general and abstract discussion of the notion of computation. Next, in Section 1.3.2, we provide a highlevel description of the model of Turing machines. This is done merely for the sake of providing a concrete model that supports the study of computation and its complexity, whereas the material in this book will not depend on the specifics of this model. In Section 1.3.3 and Section 1.3.4 we discuss two fundamental properties of any reasonable model of computation: the existence of uncomputable functions and the existence of universal computations. The time (and space) complexity of computation is defined in Section 1.3.5. We also discuss oracle machines and restricted models of computation (in Section 1.3.6 and Section 1.3.7, respectively).
1.3.1 Overview and General Principles

Before being formal, let us offer a general and abstract description of the notion of computation. This description applies both to artificial processes (taking place in computers) and to processes that are aimed at modeling the evolution of the natural reality (be it physical, biological, or even social).

A computation is a process that modifies an environment via repeated applications of a predetermined rule. The key restriction is that this rule is simple: In each application it depends on and affects only a (small) portion of the environment, called the active zone. We contrast the a priori bounded size of the active zone (and of the modification rule) with the a priori unbounded size of the entire environment. We note that although each application of the rule has a very limited effect, the effect of many applications of the rule may be very complex. Put differently, a computation may modify the relevant environment in a very complex way, although it is merely a process of repeatedly applying a simple rule.

As hinted, the notion of computation can be used to model the "mechanical" aspects of the natural reality, that is, the rules that determine the evolution of the reality (rather than the specific state of the reality at a specific time). In this case, the starting point of the study is the actual evolution process that takes place in the natural reality, and the goal of the study is finding the (computation) rule that underlies this natural process. In a sense, the goal of science at large can be phrased as finding (simple) rules that govern various aspects of reality (or rather one's abstraction of these aspects of reality).

Our focus, however, is on artificial computation rules designed by humans in order to achieve specific desired effects on a corresponding artificial environment. Thus, our starting point is a desired functionality, and our aim is to design
10
1 Computational Tasks and Models
computation rules that effect it. Such a computation rule is referred to as an algorithm. Loosely speaking, an algorithm corresponds to a computer program written in a high-level (abstract) programming language. Let us elaborate.

We are interested in the transformation of the environment as effected by the computational process (or the algorithm). Throughout (almost all of) this book, we will assume that, when invoked on any finite initial environment, the computation halts after a finite number of steps. Typically, the initial environment to which the computation is applied encodes an input string, and the end environment (i.e., at termination of the computation) encodes an output string. We consider the mapping from inputs to outputs induced by the computation; that is, for each possible input x, we consider the output y obtained at the end of a computation initiated with input x, and say that the computation maps input x to output y. Thus, a computation rule (or an algorithm) determines a function (computed by it): This function is exactly the aforementioned mapping of inputs to outputs.

In the rest of this book (i.e., outside the current chapter), we will also consider the number of steps (i.e., applications of the rule) taken by the computation on each possible input. The latter function is called the time complexity of the computational process (or algorithm). While time complexity is defined per input, we will often consider it per input length, taking the maximum over all inputs of the same length.

In order to define computation (and computation time) rigorously, one needs to specify some model of computation, that is, provide a concrete definition of environments and a class of rules that may be applied to them. Such a model corresponds to an abstraction of a real computer (be it a PC, mainframe, or network of computers). One simple abstract model that is commonly used is that of Turing machines (see Section 1.3.2).
Thus, specific algorithms are typically formalized by corresponding Turing machines (and their time complexity is represented by the time complexity of the corresponding Turing machines). We stress, however, that almost all results in the theory of computation hold regardless of the specific computational model used, as long as it is “reasonable” (i.e., satisfies the aforementioned simplicity condition and can perform some apparently simple computations). What is being Computed? The foregoing discussion has implicitly referred to algorithms (i.e., computational processes) as means of computing functions. Specifically, an algorithm A computes the function fA : {0, 1}∗ → {0, 1}∗ ∪ {⊥} defined by fA (x) = y if, when invoked on input x, algorithm A halts with output y. However, algorithms can also serve as means of “solving search problems” or “making decisions” (as in Definitions 1.1 and 1.2). Specifically, we will say
1.3 Uniform Models (Algorithms)
11
that algorithm A solves the search problem of R (resp., decides membership in S) if fA solves the search problem of R (resp., decides membership in S). In the rest of this exposition we associate the algorithm A with the function fA computed by it; that is, we write A(x) instead of fA (x). For the sake of future reference, we summarize the foregoing discussion in a definition. Definition 1.3 (algorithms as problem solvers): We denote by A(x) the output of algorithm A on input x. Algorithm A solves the search problem R (resp., the decision problem S) if A, viewed as a function, solves R (resp., S).
1.3.2 A Concrete Model: Turing Machines

The model of Turing machines offers a relatively simple formulation of the notion of an algorithm. The fact that the model is very simple complicates the design of machines that solve problems of interest, but makes the analysis of such machines simpler. Since the focus of Complexity Theory is on the analysis of machines and not on their design, the trade-off offered by this model is suitable for our purposes. We stress again that the model is merely used as a concrete formulation of the intuitive notion of an algorithm, whereas we actually care about the intuitive notion and not about its formulation. In particular, all results mentioned in this book hold for any other “reasonable” formulation of the notion of an algorithm.

The model of Turing machines provides only an extremely coarse description of real-life computers. Indeed, Turing machines are not meant to provide an accurate portrayal of real-life computers, but rather to capture their inherent limitations and abilities (i.e., a computational task can be solved by a real-life computer if and only if it can be solved by a Turing machine). In comparison to real-life computers, the model of Turing machines is extremely oversimplified and abstracts away many issues that are of great concern to computer practice. However, these issues are irrelevant to the higher-level questions addressed by Complexity Theory. Indeed, as usual, good practice requires more refined understanding than the one provided by a good theory, but one should first provide the latter.

Historically, the model of Turing machines was invented before modern computers were even built, and was meant to provide a concrete model of computation and a definition of computable functions.2 Indeed, this concrete
2. In contrast, the abstract definition of “recursive functions” yields a class of “computable” functions without referring to any model of computation (but rather based on the intuition that any such model should support recursive functional composition).
[Figure: a tape holding 5 3 2 2 2 3 1 0, followed by blanks, with the machine in state “b” reading the cell that holds 1; after the depicted step the tape holds 5 3 2 2 2 3 3 0 and the machine, still in state “b”, has moved one cell to the right.]

Figure 1.1. A single step by a Turing machine.
model clarified fundamental properties of computable functions and plays a key role in defining the complexity of computable functions.

The model of Turing machines was envisioned as an abstraction of the process of an algebraic computation carried out by a human using a sheet of paper. In such a process, at each time, the human looks at some location on the paper, and depending on what he/she sees and what he/she has in mind (which is little . . . ), he/she modifies the contents of this location and shifts his/her look to an adjacent location.

1.3.2.1 The Actual Model

Following is a high-level description of the model of Turing machines. While this description should suffice for our purposes, more detailed (low-level) descriptions can be found in numerous textbooks (e.g., [30]). Recall that, in order to describe a computational model, we need to specify the set of possible environments, the set of machines (or computation rules), and the effect of applying such a rule on an environment.

The Environment. The main component in the environment of a Turing machine is an infinite sequence of cells, each capable of holding a single symbol (i.e., a member of a finite set Σ ⊃ {0, 1}). This sequence is envisioned as starting at a leftmost cell, and extending infinitely to the right (cf. Figure 1.1). In addition, the environment contains the current location of the machine on this sequence, and the internal state of the machine (which is a member of a finite set Q). The aforementioned sequence of cells is called the tape, and its contents combined with the machine’s location and its internal state is called the instantaneous configuration of the machine.
The Machine Itself (i.e., the Computation Rule). The main component in the Turing machine itself is a finite rule (i.e., a finite function) called the transition function, which is defined over the set of all possible symbol-state pairs. Specifically, the transition function is a mapping from Σ × Q to Σ × Q × {−1, 0, +1}, where {−1, +1, 0} correspond to a movement instruction (which is either “left” or “right” or “stay,” respectively). In addition, the machine’s description specifies an initial state and a halting state, and the computation of the machine halts when the machine enters its halting state. (Envisioning the tape as in Figure 1.1, we use the convention by which if the machine ever tries to move left of the end of the tape, then it is considered to have halted.) We stress that in contrast to the finite description of the machine, the tape has an a priori unbounded length (and is considered, for simplicity, as being infinite).

A Single Application of the Computation Rule. A single computation step of such a Turing machine depends on its current location on the tape, on the contents of the corresponding cell, and on the internal state of the machine. Based on the latter two elements, the transition function determines a new symbol-state pair as well as a movement instruction (i.e., “left” or “right” or “stay”). The machine modifies the contents of the said cell and its internal state accordingly, and moves as directed. That is, suppose that the machine is in state q and resides in a cell containing the symbol σ, and suppose that the transition function maps (σ, q) to (σ′, q′, D). Then, the machine modifies the contents of the said cell to σ′, modifies its internal state to q′, and moves one cell in direction D.
Figure 1.1 shows a single step of a Turing machine that, when in state “b” and seeing a binary symbol σ ∈ {0, 1}, replaces σ with the symbol σ + 2, maintains its internal state, and moves one position to the right.3

Formally, we define the successive configuration function that maps each instantaneous configuration to the one resulting by letting the machine take a single step. This function modifies its argument in a very minor manner, as described in the foregoing paragraph; that is, the contents of at most one cell (i.e., the one at which the machine currently resides) is changed, and in addition the internal state of the machine and its location may change, too.
3. Figure 1.1 corresponds to a machine that, when in the initial state (i.e., “a”), replaces the symbol σ ∈ {0, 1} by σ + 4, modifies its internal state to “b,” and moves one position to the right. (See also Figure 1.2, which depicts multiple steps of this machine.) Indeed, “marking” the leftmost cell (in order to allow for recognizing it in the future) is a common practice in the design of Turing machines.
Providing a concrete representation of the successive configuration function requires providing a concrete representation of instantaneous configurations. For example, we may represent each instantaneous configuration of a machine with symbol set Σ and state set Q as a triple (α, q, i), where α ∈ Σ∗, q ∈ Q and i ∈ {1, 2, . . . , |α|}. Let T : Σ × Q → Σ × Q × {−1, 0, +1} be the transition function of the machine. Then, the successive configuration function maps (α, q, i) to (α′, q′, i + d) such that α′ differs from α only in the ith location, which is determined according to the first element in T(αi, q). The new state (i.e., q′) and the movement (i.e., d) are determined by the other two elements of T(αi, q). Specifically, except for some pathological cases, the successive configuration function maps (α, q, i) to (α′, q′, i + d) if and only if T(αi, q) = (α′i, q′, d) and α′j = αj for every j ≠ i, where αj (resp., α′j) denotes the jth symbol of α (resp., α′). The aforementioned pathological cases refer to cases in which the machine resides in one of the “boundary locations” and needs to move farther in that direction. One such case is the case that i = 1 and d = −1, which causes the machine to halt (rather than move left of the left boundary of the tape). The opposite case refers to i = |α| and d = +1, where the machine moves to the right of the rightmost non-blank symbol, which is represented by extending α′ with a blank symbol (i.e., |α′| = |α| + 1, and α′|α|+1 equals the blank symbol).
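To make the successive configuration function concrete, here is a minimal Python sketch of it. This is an illustration of ours, not part of the formal model; the names `successive` and `BLANK`, and the dictionary encoding of the transition function T, are our own choices.

```python
BLANK = "_"  # our stand-in for the special blank symbol

def successive(transition, config):
    """One application of the successive configuration function.

    `config` is a triple (alpha, q, i) as in the text: the tape contents
    alpha (a list over the symbol set), the internal state q, and the
    machine's location i (1-indexed). `transition` maps (symbol, state)
    pairs to (symbol, state, move) triples, where move is in {-1, 0, +1}.
    """
    alpha, q, i = config
    sym, q_new, d = transition[(alpha[i - 1], q)]
    alpha = alpha[:i - 1] + [sym] + alpha[i:]  # only cell i changes
    i += d
    if i > len(alpha):           # moved right of the rightmost symbol:
        alpha = alpha + [BLANK]  # extend the tape contents with a blank
    return (alpha, q_new, i)     # i == 0 encodes the "halt" boundary case

# The Figure 1.1 machine in state "b": replace a bit sigma by sigma + 2,
# keep the state, and move right.
T = {("0", "b"): ("2", "b", +1), ("1", "b"): ("3", "b", +1)}
print(successive(T, (["1", "0"], "b", 1)))  # (['3', '0'], 'b', 2)
```

Note how the two boundary conventions of the text show up: a resulting location of 0 signals halting at the left end, and moving past the rightmost cell appends a blank.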
Thus, the initial configuration of a Turing machine has a finite (explicit) description. Once the machine halts, the output is defined as the contents of the cells that are to the left of its location (at termination time).4 Note, however, that the machine need not halt at all (when invoked on some initial environment).5 Thus, each machine defines a (possibly partial) function mapping inputs to outputs, called the function computed by the machine. That is, the function computed by machine M maps x to y if, when invoked on input x, machine M halts with output y, and is undefined on x if machine M does halt on input x. 4
4. By an alternative convention, the machine must halt when residing in the leftmost cell, and the output is defined as the maximal prefix of the tape contents that contains only bit values. In such a case, the special non-Boolean output ⊥ is indicated by the machine’s state (and indeed in this case the set of states, Q, contains several halting states).
5. A simple example is a machine that “loops forever” (i.e., it remains in the same state and the same location regardless of what it reads). Recall, however, that we shall be mainly interested in machines that do halt after a finite number of steps (when invoked on any initial environment).
[Figure: starting from a tape holding 1 1 0 0 0 1 1 0 (followed by blanks) with the machine in state “a” at the leftmost cell, successive rows depict the tape after more and more steps; the final row shows the tape holding 5 3 2 2 2 3 3 0 with the machine in state “b”.]

Figure 1.2. Multiple steps of the machine depicted in Figure 1.1.
As stated up front, the Turing machine model is only meant to provide an extremely coarse portrayal of real-life computers. However, the model is intended to reflect the inherent limitations and abilities of real-life computers. Thus, it is important to verify that the Turing machine model is exactly as powerful as a model that provides a more faithful portrayal of real-life computers (see the “sanity check” in §1.3.2.2); that is, a function can be computed by a Turing machine if and only if it is computable by a machine of the latter model. For starters, one may prove that a function can be computed by a single-tape Turing machine if and only if it is computable by a multi-tape (e.g., two-tape) Turing machine (as defined next); see Exercise 1.3.
Multi-tape Turing Machines. We comment that in most expositions, one refers to the location of the “head of the machine” on the tape (rather than to the “location of the machine on the tape”). The standard terminology is more intuitive when extending the basic model, which refers to a single tape, to a
model that supports a constant number of tapes. In the corresponding model of so-called multi-tape machines, the machine maintains a single head on each such tape, and each step of the machine depends on and affects the cells that are at the machine’s head location on each tape. The input is given on one designated tape, and the output is required to appear on some other designated tape. As we shall see in Section 1.3.5, the extension of the model to multi-tape Turing machines is crucial to the definition of space complexity. A less fundamental advantage of the model of multi-tape Turing machines is that it is easier to design multi-tape Turing machines that compute functions of interest (see, e.g., Exercise 1.4).

1.3.2.2 The Church-Turing Thesis

The entire point of the model of Turing machines is its simplicity. That is, in comparison to more “realistic” models of computation, it is simpler to formulate the model of Turing machines and to analyze machines in this model. The Church-Turing Thesis asserts that nothing is lost by considering the Turing machine model: A function can be computed by some Turing machine if and only if it can be computed by some machine of any other “reasonable and general” model of computation.

This is a thesis, rather than a theorem, because it refers to an intuitive notion (i.e., the notion of a reasonable and general model of computation) that is left undefined on purpose. The model should be reasonable in the sense that it should allow only computation rules that are “simple” in some intuitive sense. For example, we should be able to envision a mechanical implementation of these computation rules. On the other hand, the model should allow the computation of “simple” functions that are definitely computable according to our intuition.
At the very least, the model should allow the emulation of Turing machines (i.e., the computation of the function that, given a description of a Turing machine and an instantaneous configuration, returns the successive configuration).

A Philosophical Comment. The fact that a thesis is used to link an intuitive concept to a formal definition is common practice in any science (or, more broadly, in any attempt to reason rigorously about intuitive concepts). Any attempt to rigorously define an intuitive concept yields a formal definition that necessarily differs from the original intuition, and the question of correspondence between these two objects arises. This question can never be rigorously treated because one of the objects that it relates to is undefined. That is, the question of correspondence between the intuition and the definition always
transcends a rigorous treatment (i.e., it always belongs to the domain of the intuition).

A Sanity Check: Turing Machines Can Emulate an Abstract RAM. To gain confidence in the Church-Turing Thesis, one may attempt to define an abstract Random-Access Machine (RAM), and verify that it can be emulated by a Turing machine. An abstract RAM consists of an infinite number of memory cells, each capable of holding an integer, a finite number of similar registers, one designated as program counter, and a program consisting of instructions selected from a finite set. The set of possible instructions includes the following instructions:

- reset(r), where r is an index of a register, results in setting the value of register r to zero.
- inc(r), where r is an index of a register, results in incrementing the content of register r. Similarly, dec(r) causes a decrement.
- load(r1, r2), where r1 and r2 are indices of registers, results in loading to register r1 the contents of the memory location m, where m is the current contents of register r2.
- store(r1, r2), stores the contents of register r1 in the memory, analogously to load.
- cond-goto(r, ℓ), where r is an index of a register and ℓ does not exceed the program length, results in setting the program counter to ℓ − 1 if the content of register r is nonnegative.

The program counter is incremented after the execution of each instruction, and the next instruction to be executed by the machine is the one to which the program counter points (and the machine halts if the program counter exceeds the program’s length). The input to the machine may be defined as the contents of the first n memory cells, where n is placed in a special input register, and all other memory cells are assumed to be empty (i.e., contain blanks). We note that the abstract RAM model (as defined) is as powerful as the Turing machine model (see the following details).
However, in order to make the RAM model closer to real-life computers, we may augment it with additional instructions that are available on real-life computers like the instruction add(r1, r2) (resp., mult(r1, r2)) that results in adding (resp., multiplying) the contents of registers r1 and r2 (and placing the result in register r1). Likewise, we may augment the model with explicit loop-constructs (although such constructs are easily implementable using the cond-goto instruction). We suggest proving that this abstract RAM can be emulated by a Turing machine; see Exercise 1.5. We emphasize this direction of the equivalence of
the two models, because the RAM model is introduced in order to convince the reader that Turing machines are not too weak (as a model of general computation). The fact that they are not too strong seems self-evident. Thus, it seems pointless to prove that the RAM model can emulate Turing machines. (Still, note that this is indeed the case, by using the RAM’s memory cells to store the contents of the cells of the Turing machine’s tape, and holding its head location in a special register.)

Reflections. Observe that the abstract RAM model is significantly more cumbersome than the Turing machine model. Furthermore, seeking a sound choice of the instruction set (i.e., the instructions to be allowed in the model) creates a vicious cycle (because the sound guideline for such a choice should have been allowing only instructions that correspond to “simple” operations, whereas the latter correspond to easily computable functions . . . ). This vicious cycle was avoided in the foregoing paragraph by trusting the reader to include only instructions that are available in some real-life computer. (We comment that this empirical consideration is justifiable in the current context because our current goal is merely linking the Turing machine model with the reader’s experience of real-life computers.)
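To make the foregoing instruction set concrete, here is a toy Python interpreter for the abstract RAM. This is an illustrative sketch of ours: the tuple encoding of programs, the dictionary-based sparse memory, and the step guard are implementation choices that are not part of the abstract model (in particular, a real RAM computation has no a priori step bound).

```python
def run_ram(program, memory, steps=10_000):
    """Interpret the abstract RAM described in the text.

    `program` is a list of instructions, each a tuple such as ("inc", r)
    or ("condgoto", r, l); `memory` maps addresses to integers; the
    registers are held in a dict and default to 0.
    """
    reg = {}
    pc = 1  # program counter, 1-indexed as in the text
    val = lambda r: reg.get(r, 0)
    while 1 <= pc <= len(program) and steps > 0:
        steps -= 1
        op, *args = program[pc - 1]
        if op == "reset":
            reg[args[0]] = 0
        elif op == "inc":
            reg[args[0]] = val(args[0]) + 1
        elif op == "dec":
            reg[args[0]] = val(args[0]) - 1
        elif op == "load":      # reg[r1] <- memory[reg[r2]]
            reg[args[0]] = memory.get(val(args[1]), 0)
        elif op == "store":     # memory[reg[r2]] <- reg[r1]
            memory[val(args[1])] = val(args[0])
        elif op == "condgoto":  # jump to instruction l if reg[r] >= 0
            if val(args[0]) >= 0:
                pc = args[1] - 1
        pc += 1                 # incremented after every instruction
    return reg, memory

# A five-instruction program that copies memory cell 0 to memory cell 1.
program = [
    ("reset", "r1"),        # r1 := 0 (address of the source cell)
    ("reset", "r2"),
    ("inc", "r2"),          # r2 := 1 (address of the target cell)
    ("load", "r3", "r1"),   # r3 := memory[r1]
    ("store", "r3", "r2"),  # memory[r2] := r3
]
reg, mem = run_ram(program, {0: 7})
print(mem[1])  # 7
```

Note how cond-goto sets the counter to ℓ − 1 precisely so that the unconditional increment at the end of the cycle lands it on instruction ℓ.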
1.3.3 Uncomputable Functions

Strictly speaking, the current subsection is not necessary for the rest of this book, but we feel that it provides a useful perspective.

1.3.3.1 On the Existence of Uncomputable Functions

In contrast to what every layman would think, not all functions are computable. Indeed, an important message to be communicated to the world is that not every well-defined task can be solved by applying a “reasonable” automated procedure (i.e., a procedure that has a simple description that can be applied to any instance of the problem at hand). Furthermore, not only is it the case that there exist uncomputable functions, but it is rather the case that most functions are uncomputable. In fact, only relatively few functions are computable.

Theorem 1.4 (on the scarcity of computable functions): The set of computable functions is countable, whereas the set of all functions (from strings to strings) is not countable. Furthermore, the latter set has the same cardinality as the power set of the natural numbers, which in turn has the same cardinality as the set of real numbers.
We stress that the theorem holds for any reasonable model of computation. In fact, it relies only on the postulate that each machine in the model has a finite description (i.e., can be described by a string).

Proof: Since each computable function is computable by a machine that has a finite description, there is an injection of the set of computable functions to the set of strings (whereas the set of all strings is in 1-1 correspondence to the natural numbers). On the other hand, there is a 1-1 correspondence between the set of Boolean functions (i.e., functions from strings to a single bit) and the power set of the natural numbers. This correspondence associates each subset S ⊆ N to the function f : N → {0, 1} such that f(i) = 1 if and only if i ∈ S. Establishing the remaining set-theoretic facts is not really in the scope of the current book. Specifically, we refer to the following facts:

1. The set of all Boolean functions has the same cardinality as the set of all functions (from strings to strings).
2. The power set of the natural numbers has the same cardinality as the set of real numbers.
3. Each of the foregoing sets (e.g., the real numbers) is not countable.6

The theorem follows.

1.3.3.2 The Halting Problem

In contrast to the discussion in Section 1.3.1, at this point we also consider machines that may not halt on some inputs. The functions computed by such machines are partial functions that are defined only on inputs on which the machine halts. Again, we rely on the postulate that each machine in the model has a finite description, and denote the description of machine M by
⟨M⟩ ∈ {0, 1}∗. The halting function, h : {0, 1}∗ × {0, 1}∗ → {0, 1}, is defined such that h(⟨M⟩, x) = 1 if and only if M halts on input x. The following result goes beyond Theorem 1.4 by pointing to an explicit function (of natural interest) that is not computable.

Theorem 1.5 (undecidability of the halting problem): The halting function is not computable.

The term undecidability means that the corresponding decision problem cannot be solved by an automated procedure. That is, Theorem 1.5 asserts that the
6. Advanced comment: This fact is usually established by a “diagonalization” argument, which is actually the core of the proof of Theorem 1.5. For further discussion, the interested reader is referred to [3, Chap. 2].
decision problem associated with the set h−1(1) = {(⟨M⟩, x) : h(⟨M⟩, x) = 1} is not solvable by an algorithm (i.e., there exists no algorithm that, given a pair (⟨M⟩, x), decides whether or not M halts on input x). Actually, the following proof shows that there exists no algorithm that, given ⟨M⟩, decides whether or not M halts on input ⟨M⟩. The conceptual significance of Theorem 1.5 is discussed in §1.3.3.3 (following Theorem 1.6).

Proof: We will show that even the restriction of h to its “diagonal” (i.e., the function d(⟨M⟩) = h(⟨M⟩, ⟨M⟩)) is not computable. Note that the value of d(⟨M⟩) refers to the question of what happens when we feed M with its own description, which is indeed a “nasty” (but legitimate) thing to do. We will actually do something “worse”: toward the contradiction, we will consider the value of d when evaluated at a (machine that is related to a) hypothetical machine that supposedly computes d.

We start by considering a related function, d′, and showing that this function is uncomputable. The function d′ is defined on purpose so as to foil any attempt to compute it; that is, for every machine M, the value d′(⟨M⟩) is defined to differ from M(⟨M⟩). Specifically, the function d′ : {0, 1}∗ → {0, 1} is defined such that d′(⟨M⟩) = 1 if and only if M halts on input ⟨M⟩ with output 0. That is, d′(⟨M⟩) = 0 if either M does not halt on input ⟨M⟩ or its output does not equal the value 0. Now, suppose, toward the contradiction, that d′ is computable by some machine, denoted Md′. Note that machine Md′ is supposed to halt on every input, and so Md′ halts on input ⟨Md′⟩. But, by definition of d′, it holds that d′(⟨Md′⟩) = 1 if and only if Md′ halts on input ⟨Md′⟩ with output 0 (i.e., if and only if Md′(⟨Md′⟩) = 0). Thus, Md′(⟨Md′⟩) ≠ d′(⟨Md′⟩), in contradiction to the hypothesis that Md′ computes d′.

We next prove that d is uncomputable, and thus h is uncomputable (because d(z) = h(z, z) for every z).
To prove that d is uncomputable, we show that if d is computable then so is d′ (which we already know not to be the case). Indeed, suppose toward the contradiction that A is an algorithm for computing d (i.e., A(⟨M⟩) = d(⟨M⟩) for every machine M). Then, we construct an algorithm for computing d′, which, given ⟨M⟩, invokes A on ⟨M′⟩, where M′ is defined to operate as follows:

1. On input x, machine M′ emulates M on input x.
2. If M halts on input x with output 0 then M′ halts.
3. If M halts on input x with an output different from 0 then M′ enters an infinite loop (and thus does not halt).

Otherwise (i.e., M does not halt on input x), machine M′ does not halt (because it just stays stuck in Step 1 forever).
Note that the mapping from ⟨M⟩ to ⟨M′⟩ is easily computable (by augmenting M with instructions to test its output and enter an infinite loop if necessary), and that d(⟨M′⟩) = d′(⟨M⟩), because M′ halts on x if and only if M halts on x with output 0. We thus derived an algorithm for computing d′ (i.e., transform the input ⟨M⟩ into ⟨M′⟩ and output A(⟨M′⟩)), which contradicts the already established fact by which d′ is uncomputable. Thus, our contradiction hypothesis that there exists an algorithm (i.e., A) that computes d is proved false, and the theorem follows (because if the restriction of h to its diagonal (i.e., d) is not computable, then h itself is surely not computable).

Digest. The core of the second part of the proof of Theorem 1.5 is an algorithm that solves one problem (i.e., computes d′) by using as a subroutine an algorithm that solves another problem (i.e., computes d (or h)).7 In fact, the first algorithm is actually an algorithmic scheme that refers to a “functionally specified” subroutine rather than to an actual (implementation of such a) subroutine, which may not exist. Such an algorithmic scheme is called a Turing-reduction (see formulation in Section 1.3.6). Hence, we have Turing-reduced the computation of d′ to the computation of d, which in turn Turing-reduces to h. The “natural” (“positive”) meaning of a Turing-reduction of f to f′ is that, when given an algorithm for computing f′, we obtain an algorithm for computing f. In contrast, the proof of Theorem 1.5 uses the “unnatural” (“negative”) contrapositive: If (as we know) there exists no algorithm for computing f = d′ then there exists no algorithm for computing f′ = d (which is what we wanted to prove). Jumping ahead, we mention that resource-bounded Turing-reductions (e.g., polynomial-time reductions) play a central role in Complexity Theory itself, and again they are used mostly in a “negative” way. We will define such reductions and extensively use them in subsequent chapters.
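The self-referential trick in the first part of the proof can be “experienced” in any programming language. The following Python sketch is ours (all names are invented, and Python callables stand in for machine descriptions): from any purported decider for the question “does p, fed its own description, halt with output 0?” it builds a contrarian program on which that decider must err.

```python
def make_contrarian(decider):
    """Given a purported decider -- decider(p) should return True iff
    p(p) halts with output 0 -- build a program that defeats it on its
    own description (here, the callable itself)."""
    def contrarian(p):
        if decider(p):
            return 1   # predicted "halts with output 0": output 1 instead
        return 0       # predicted otherwise: halt with output 0
    return contrarian

# Any concrete candidate decider errs on its own contrarian.
def optimist(p):       # a (necessarily wrong) candidate: always says yes
    return True

c = make_contrarian(optimist)
print(optimist(c), c(c))  # True 1 -- the predicted "output 0" is wrong
```

In the notation of the proof, `contrarian` does exactly what foils Md′: whatever the candidate predicts about the contrarian’s behavior on its own description, the contrarian does the opposite. (The genuine proof needs no concrete candidate at all: it shows that no machine whatsoever can compute d′.)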
1.3.3.3 A Few More Undecidability Results

We briefly review a few appealing results regarding undecidable problems.

Rice’s Theorem. The undecidability of the Halting Problem (or rather the fact that the function h is uncomputable) is a special case of a more general phenomenon: Every nontrivial decision problem regarding the function computed by a given Turing machine has no algorithmic solution. We state this fact next, clarifying the definition of the aforementioned class of problems. (Again, we refer to Turing machines that may not halt on all inputs.)
7. The same holds also with respect to the first part of the proof, which uses the fact that the ability to compute h yields the ability to compute d. However, in this case the underlying algorithmic scheme is so obvious that we chose not to state it explicitly.
Theorem 1.6 (Rice’s Theorem): Let F be any nontrivial subset8 of the set of all computable partial functions, and let SF be the set of strings that describe machines that compute functions in F. Then deciding membership in SF cannot be solved by an algorithm.

Theorem 1.6 can be proved by a Turing-reduction from d′. We do not provide a proof because this is too remote from the main subject matter of the book. (Still, the interested reader is referred to Exercise 1.6.) We stress that Theorems 1.5 and 1.6 hold for any reasonable model of computation (referring both to the potential solvers and to the machines the description of which is given as input to these solvers). Thus, Theorem 1.6 means that no algorithm can determine any nontrivial property of the function computed by a given computer program (written in any programming language). For example, no algorithm can determine whether or not a given computer program halts on each possible input. The relevance of this assertion to the project of program verification is obvious. See further discussion of this issue at the end of Section 4.2.

The Post Correspondence Problem. We mention that undecidability also arises outside of the domain of questions regarding computing devices (given as input). Specifically, we consider the Post Correspondence Problem, in which the input consists of two sequences of (nonempty) strings, (α1, . . . , αk) and (β1, . . . , βk), and the question is whether or not there exists a sequence of indices i1, . . . , iℓ ∈ {1, . . . , k} such that αi1 · · · αiℓ = βi1 · · · βiℓ. (We stress that the length ℓ of this sequence is not a priori bounded.)9

Theorem 1.7: The Post Correspondence Problem is undecidable.

Again, the omitted proof is by a Turing-reduction from d (or h), and the interested reader is referred to Exercise 1.8.
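By contrast, the bounded variant mentioned in footnote 9 is decidable by exhaustive search, as the following Python sketch illustrates (the function name, the tuple interface, and the sample instance are our own):

```python
from itertools import product

def pcp_bounded(alphas, betas, max_len):
    """Search for a Post correspondence using at most `max_len` indices.

    Returns a witness sequence of 1-indexed indices, or None. Note the
    exponential cost (k**ell sequences of length ell are tried); as the
    text stresses, no a priori bound on ell holds in general, which is
    why this brute force does not decide the unbounded problem.
    """
    k = len(alphas)
    for ell in range(1, max_len + 1):
        for seq in product(range(k), repeat=ell):
            if "".join(alphas[i] for i in seq) == "".join(betas[i] for i in seq):
                return [i + 1 for i in seq]  # report 1-indexed indices
    return None

# A small instance with alpha = (a, ab, bba) and beta = (baa, aa, bb):
# the sequence 3, 2, 3, 1 yields bba·ab·bba·a = bb·aa·bb·baa = bbaabbbaa.
print(pcp_bounded(("a", "ab", "bba"), ("baa", "aa", "bb"), 4))  # [3, 2, 3, 1]
```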
8 The set S is called a nontrivial subset of U if both S and U \ S are nonempty. Clearly, if F is a trivial set of computable functions, then the corresponding decision problem can be solved by a "trivial" algorithm that outputs the corresponding constant bit.
9 In contrast, the existence of an adequate sequence of a specified length can be determined in time that is exponential in this length.

1.3.4 Universal Algorithms

So far we have used the postulate that, in any reasonable model of computation, each machine (or computation rule) has a finite description. Furthermore, in the proof of Theorem 1.5, we also used the postulate that such a model allows for easy modification of a description of a machine that computes a function into a description of a machine that computes a closely related function. Here, we go one step further and postulate that the description of machines (in this model) is "effective" in the following natural sense: There exists an algorithm that, given a description of a machine (resp., computation rule) and a corresponding environment, determines the environment that results from performing a single step of this machine on this environment (resp., the effect of a single application of the computation rule).10 This algorithm can, in turn, be implemented in the said model of computation (assuming this model is general; see the Church-Turing Thesis). Successive applications of this algorithm lead to the notion of a universal machine, which (for concreteness) is formulated next in terms of Turing machines.

Definition 1.8 (universal machines): A universal Turing machine is a Turing machine that, when given a description of a machine M and a corresponding input x, returns the value of M(x) if M halts on x, and otherwise does not halt.

That is, a universal Turing machine computes the partial function u that is defined on pairs (⟨M⟩, x) such that M halts on input x, in which case it holds that u(⟨M⟩, x) = M(x). In other words, u(⟨M⟩, x) = M(x) if M halts on input x, and u is undefined on (⟨M⟩, x) otherwise. We note that if M halts on all possible inputs, then u(⟨M⟩, x) is defined for every x.

1.3.4.1 The Existence of Universal Algorithms

We stress that the mere fact that we have defined something (i.e., a universal Turing machine) does not mean that it exists. Yet, as hinted in the foregoing discussion and obvious to anyone who has written a computer program (and thought about what he/she was doing), universal Turing machines do exist.

Theorem 1.9: There exists a universal Turing machine.

Theorem 1.9 asserts that the partial function u is computable.
In contrast, it can be shown that any extension of u to a total function is uncomputable. That is, for any total function û that agrees with the partial function u on all the inputs on which the latter is defined, it holds that û is uncomputable (see Exercise 1.10).

Proof: Given a pair (⟨M⟩, x), we just emulate the computation of machine M on input x. This emulation is straightforward because (by the effectiveness of the description of M) we can iteratively determine the next instantaneous configuration of the computation of M on input x. If the said computation halts, then we obtain its output and can output it (and so, on input (⟨M⟩, x), our algorithm returns M(x)). Otherwise, we end up emulating an infinite computation, which means that our algorithm does not halt on input (⟨M⟩, x). Thus, the foregoing emulation procedure constitutes a universal machine (i.e., yields an algorithm for computing u).

10 For details, see Exercise 1.9.

As hinted already, the existence of universal machines is the fundamental fact underlying the paradigm of general-purpose computers. Indeed, a specific Turing machine (or algorithm) is a device that solves a specific problem. A priori, solving each problem would have required building a new physical device that allows for this problem to be solved in the physical world (rather than as a thought experiment). The existence of a universal machine asserts that it is enough to build one physical device, that is, a general-purpose computer. Any specific problem can then be solved by writing a corresponding program to be executed (or emulated) by the general-purpose computer. Thus, universal machines correspond to general-purpose computers, and provide the philosophical basis for separating hardware from software. Furthermore, the existence of universal machines says that software can be viewed as (part of the) input.

In addition to their practical importance, the existence of universal machines (and their variants) has important consequences in the theories of computing and Computational Complexity. To demonstrate the point, we note that Theorem 1.6 implies that many questions about the behavior of a fixed (universal) machine on certain input types are undecidable. For example, it follows that for some fixed machines (i.e., universal ones), there is no algorithm that determines whether or not the (fixed) machine halts on a given input (see Exercise 1.7).
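The proof's emulation strategy — iterate the single-step algorithm until (if ever) the emulated machine halts — can be sketched for a toy counter-machine model. The instruction set (inc/dec/jnz/halt) and the register-dictionary representation of configurations are assumptions of this sketch, not the book's Turing-machine formalism.

```python
def step(program, config):
    """One application of the computation rule: map a configuration
    (program counter, registers) to the next configuration."""
    pc, regs = config
    op = program[pc]
    if op[0] == "inc":
        regs = dict(regs); regs[op[1]] = regs.get(op[1], 0) + 1
        return (pc + 1, regs)
    if op[0] == "dec":
        regs = dict(regs); regs[op[1]] = max(0, regs.get(op[1], 0) - 1)
        return (pc + 1, regs)
    if op[0] == "jnz":  # jump to op[2] if register op[1] is nonzero
        return (op[2] if regs.get(op[1], 0) != 0 else pc + 1, regs)
    return config  # "halt": the configuration no longer changes

def universal(program, x):
    """Emulate `program` on input x (placed in register 'r0').  Like the
    function u, this does not halt if the emulated machine does not halt."""
    config = (0, {"r0": x})
    while program[config[0]][0] != "halt":
        config = step(program, config)
    return config[1]  # the registers at halting

# A toy program that moves the contents of r0 into r1.
MOVE = [("jnz", "r0", 2), ("halt",), ("dec", "r0"),
        ("inc", "r1"), ("jnz", "r0", 2), ("halt",)]
```

For example, universal(MOVE, 3) returns {"r0": 0, "r1": 3}; note that `universal` itself contains no knowledge of what MOVE does — it only iterates `step`, exactly as in the proof.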
Also, revisiting the proof of Theorem 1.7 (see Exercise 1.8), it follows that the Post Correspondence Problem remains undecidable even if the input sequences are restricted to having a specific length (i.e., k is fixed). A more important application of universal machines to the theory of computing is presented next (i.e., in §1.3.4.2).

1.3.4.2 A Detour: Kolmogorov Complexity

The existence of universal machines, which may be viewed as universal languages for writing effective and succinct descriptions of objects, plays a central role in Kolmogorov Complexity. Loosely speaking, the latter theory is concerned with the length of (effective) descriptions of objects, and views the minimum such length as the inherent "complexity" of the object; that is, "simple" objects (or phenomena) are those having a short description (resp., short explanation), whereas "complex" objects have no short description. Needless to say, these (effective) descriptions have to refer to some fixed "language" (i.e.,
to a fixed machine that, given a succinct description of an object, produces its explicit description). Fixing any machine M, a string x is called a description of s with respect to M if M(x) = s. The complexity of s with respect to M, denoted KM(s), is the length of the shortest description of s with respect to M. Certainly, we want to fix M such that every string has a description with respect to M, and furthermore such that this description is not "significantly" longer than the description with respect to a different machine M′. This desire is fulfilled by the following theorem, which makes it natural to use a universal machine as the "point of reference" (i.e., as the aforementioned M).

Theorem 1.10 (complexity wrt a universal machine): Let U be a universal machine. Then, for every machine M′, there exists a constant c such that KU(s) ≤ KM′(s) + c for every string s.

The theorem follows by (setting c = O(|⟨M′⟩|) and) observing that if x is a description of s with respect to M′, then (⟨M′⟩, x) is a description of s with respect to U. Here it is important to use an adequate encoding of pairs of strings (e.g., the pair (σ1 · · · σk, τ1 · · · τℓ) is encoded by the string σ1σ1 · · · σkσk01τ1 · · · τℓ). Fixing any universal machine U, we define the Kolmogorov Complexity of a string s as K(s) = KU(s). The reader may easily verify the following facts:

1. K(s) ≤ |s| + O(1), for every s.
(Hint: Apply Theorem 1.10 to a machine that computes the identity mapping.)
2. There exist infinitely many strings s such that K(s) ≪ |s|.
(Hint: Consider s = 1^n. Alternatively, consider any machine M such that |M(x)| ≫ |x| for every x.)
3. Some strings of length n have complexity at least n. Furthermore, for every n and i,
|{s ∈ {0, 1}^n : K(s) ≤ n − i}| < 2^(n−i+1).
(Hint: Different strings must have different descriptions with respect to U.)
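The pair encoding used in the proof of Theorem 1.10 — double each bit of the first string, then append the separator 01 — is easy to implement; the function names below are, of course, just for this illustration.

```python
def encode_pair(a, b):
    """Encode the pair (a, b) as in the proof of Theorem 1.10: each bit of a
    is doubled, then the separator '01' marks where b (sent verbatim) begins."""
    return "".join(bit + bit for bit in a) + "01" + b

def decode_pair(s):
    """Recover (a, b): doubled positions belong to a; the first position
    holding two distinct bits is the separator."""
    i, a = 0, []
    while s[i] == s[i + 1]:
        a.append(s[i])
        i += 2
    return "".join(a), s[i + 2:]
```

The encoding spends 2|a| + 2 bits on top of |b|, which is why the constant in Theorem 1.10 can be taken as c = 2|⟨M′⟩| + O(1).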
It can be shown that the function K is uncomputable; see Exercise 1.11. The proof is related to the paradox captured by the following "description" of a natural number: the smallest natural number that cannot be described by an English sentence of up to a thousand letters. (The paradox amounts to observing that if the foregoing number is well defined, then we reach a contradiction by noting that the foregoing sentence uses fewer than one thousand letters.) Needless to say, the foregoing sentence presupposes
that any English sentence is a legitimate description in some adequate sense (e.g., in the sense captured by Kolmogorov Complexity). Specifically, the foregoing sentence presupposes that we can determine the Kolmogorov Complexity of each natural number, and thus that we can effectively produce the smallest number that has Kolmogorov Complexity exceeding some threshold (by relying on the fact that natural numbers have arbitrarily large Kolmogorov Complexity). Indeed, the paradox suggests a proof of the fact that the latter task cannot be performed; that is, there exists no algorithm that, given t, produces the lexicographically first string s such that K(s) > t, because if such an algorithm A existed, then K(s) ≤ O(|⟨A⟩|) + log t, in contradiction to the definition of s.
1.3.5 Time (and Space) Complexity

Fixing a model of computation (e.g., Turing machines) and focusing on algorithms that halt on each input, we consider the number of steps (i.e., applications of the computation rule) taken by the algorithm on each possible input. The latter function is called the time complexity of the algorithm (or machine); that is, tA : {0, 1}∗ → N is called the time complexity of algorithm A if, for every x, on input x algorithm A halts after exactly tA(x) steps.

We will be mostly interested in the dependence of the time complexity on the input length, when taking the maximum over all inputs of the relevant length. That is, for tA as in the foregoing paragraph, we will consider TA : N → N defined by TA(n) = max_{x∈{0,1}^n}{tA(x)}. Abusing terminology, we sometimes refer to TA as the time complexity of A.

A Small Detour: Linear Speedup and the O-notation. Many models of computation allow for speeding up a computation by any constant factor; see Exercise 1.14, which refers to the Turing machine model. This motivates ignoring constant factors when stating (time) complexity upper bounds, and leads to an extensive usage of the corresponding O-notation in computer science. Recall that we say that f : N → N is O(g), where g : N → N, if there exists a (positive) constant c such that for every (sufficiently large) n ∈ N it holds that f(n) ≤ c · g(n). (The parenthetical augmentations are intended to overcome some pathological cases, where one wishes to use natural bounding functions that "misbehave" on finitely many inputs; e.g., g(n) = n evaluates to zero on 0, and g(n) = n log₂ n evaluates to zero on 1.)

The Time Complexity of a Problem. As stated in the Preface, typically Complexity Theory is not concerned with the (time) complexity of a specific
algorithm. It is rather concerned with the (time) complexity of a problem, assuming that this problem is solvable at all (by some algorithm). Intuitively, the time complexity of such a problem Π is defined as the time complexity of the fastest algorithm that solves this problem (assuming that the latter term is well defined).11 Actually, we shall be interested in upper and lower bounds on the (time) complexity of algorithms that solve the problem. Thus, when we say that a certain problem Π has complexity T, we actually mean that Π has complexity at most T. Likewise, when we say that Π requires time T, we actually mean that Π has time complexity at least T.

Recall that the foregoing discussion refers to some fixed model of computation. Indeed, the complexity of a problem Π may depend on the specific model of computation in which algorithms that solve Π are implemented. The following Cobham-Edmonds Thesis asserts that the variation (in the time complexity) is not too big, and in particular is irrelevant to the P-vs-NP Question (as well as to almost all of the current focus of Complexity Theory).

The Cobham-Edmonds Thesis. As just stated, the time complexity of a problem may depend on the model of computation. For example, deciding membership in the set {xx : x ∈ {0, 1}∗} can be done in linear time on a two-tape Turing machine, but requires quadratic time on a single-tape Turing machine (see Exercise 1.13). On the other hand, any problem that has time complexity t in the model of multi-tape Turing machines has complexity O(t²) in the model of single-tape Turing machines (see Exercise 1.12). The Cobham-Edmonds Thesis asserts that the time complexities in any two "reasonable and general" models of computation are polynomially related. That is, a problem has time complexity t in some "reasonable and general" model of computation if and only if it has time complexity poly(t) in the model of (single-tape) Turing machines.
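On a random-access machine (or a two-tape Turing machine), the set {xx : x ∈ {0,1}∗} mentioned above is decided by a straightforward linear-time comparison of the two halves; the quadratic lower bound applies only to single-tape machines. A minimal sketch (the function name is ours):

```python
def is_double(s):
    """Decide membership in {xx : x in {0,1}*} by comparing the two halves.
    Linear time given random access; a single-tape Turing machine provably
    needs quadratic time for this task (cf. Exercise 1.13)."""
    n = len(s)
    return n % 2 == 0 and s[:n // 2] == s[n // 2:]
```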
Indeed, the Cobham-Edmonds Thesis strengthens the Church-Turing Thesis. It asserts not only that the class of solvable problems is invariant as far as "reasonable and general" models of computation are concerned, but also that the time complexity (of the solvable problems) in such models is polynomially related.

We note that, when compared to the Church-Turing Thesis, the Cobham-Edmonds Thesis relies on a more refined perception of what constitutes a reasonable model of computation. Specifically, we should not allow unit-cost operations (i.e., computational steps) that affect an unbounded amount of data, or alternatively we should charge each operation proportionally to the amount of data affected by it. A typical example arises in the abstract RAM model, discussed next.

11 Advanced comment: We note that the naive assumption that a "fastest algorithm" (for solving a problem) exists is not always justified (even when ignoring constant factors; see [13, Sec. 4.2.2]). On the other hand, the assumption is essentially justified in some important cases (see, e.g., Theorem 5.5). But even in these cases the said algorithm is "fastest" (or "optimal") only up to a constant factor.

Referring to the abstract RAM model (as defined in §1.3.2.2), we note that a problem has time complexity t in the abstract RAM model if and only if it has time complexity poly(t) in the model of (single-tape) Turing machines. While this assertion requires no qualification when referring to the bare model (which only includes the operations reset(·), inc(·), dec(·), load(·, ·), store(·, ·), and cond-goto(·, ·)), we need to be careful with respect to augmenting this instruction set with additional (abstract) instructions that (correspond to instructions that) are available on real-life computers. Consider, for example, augmenting the instruction set with add(r1, r2) (resp., mult(r1, r2)) that represents adding (resp., multiplying) the contents of registers r1 and r2 (and placing the result in register r1). Note that using the addition instruction t times may increase the length (of the bit representation) of the numbers stored in these registers by at most t units,12 but t applications of the multiplication instruction may increase this length by a factor of 2^t (via repeated squaring). Thus, we should either restrict these operations to fixed-length integers (as done in real-life computers) or charge each of these operations in proportion to the length of the actual contents of the relevant (abstract) registers.

Efficient Algorithms. As hinted in the foregoing discussions, much of Complexity Theory is concerned with efficient algorithms. The latter are defined as polynomial-time algorithms (i.e., algorithms that have time complexity that is upper-bounded by a polynomial in the length of the input). By the Cobham-Edmonds Thesis, the definition of this class is invariant under the choice of a "reasonable and general" model of computation.
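The asymmetry between addition and multiplication noted above is easy to check empirically: starting from 2, each add(r, r) adds one bit to the register's length, while each mult(r, r) (i.e., squaring) doubles it. The helper below is a sketch of ours, not part of the RAM model.

```python
def grow(op, t, start=2):
    """Apply op(r, r) to a register r a total of t times and report the
    resulting bit length of r."""
    r = start
    for _ in range(t):
        r = op(r, r)
    return r.bit_length()

add_bits = grow(lambda a, b: a + b, 10)  # t doublings: length grows by ~t bits
mul_bits = grow(lambda a, b: a * b, 10)  # t squarings: length grows by a factor of ~2^t
```

After 10 additions the register holds 2^11 (12 bits), while after 10 multiplications it holds 2^1024 (1025 bits) — hence the need to charge such instructions by operand length.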
For further discussion of the association of efficient algorithms with polynomial-time computation, see Section 2.1.

Universal Machines, Revisited. The notion of time complexity gives rise to a time-bounded version of the universal function u (presented in Section 1.3.4). Specifically, we define u′(⟨M⟩, x, t) = y if on input x machine M halts within t steps and outputs the string y, and u′(⟨M⟩, x, t) = ⊥ if on input x machine M makes more than t steps. Unlike u, the function u′ is a total function. Furthermore, unlike any extension of u to a total function, the function u′ is computable. Moreover, u′ is computable by a machine U′ that, on input X = (⟨M⟩, x, t), halts after poly(|⟨M⟩| + |x| + t) steps. Indeed, machine U′ is a variant of a universal machine (i.e., on input X, machine U′ merely emulates M for t steps rather than emulating M until it halts (and potentially indefinitely)). Note that the number of steps taken by U′ depends on the specific model of computation (and that some overhead is unavoidable, because emulating each step of M requires reading the relevant portion of the description of M).

12 The same consideration applies also to the other basic instructions (e.g., inc(·)), which justifies our ignoring the issue when discussing the basic instruction set. In fact, using only the basic instructions yields an even slower increase in the length of the stored numbers.

Space Complexity. Another natural measure of the "complexity" of an algorithm (or a task) is the amount of memory consumed by the computation. We refer to the memory used for storing some intermediate results of the computation. Since computations that utilize memory that is sub-linear in their input length are of natural interest, it is important to use a model in which one can differentiate memory used for computation from memory used for storing the initial input or the final output. In the context of Turing machines, this is done by considering multi-tape Turing machines such that the input is presented on a special read-only tape (called the input tape), the output is written on a special write-only tape (called the output tape), and intermediate results are stored on a work-tape. Thus, the input and output tapes cannot be used for storing intermediate results. The space complexity of such a machine M is defined as a function sM such that sM(x) is the number of cells of the work-tape that are scanned by M on input x. As in the case of time complexity, we will usually refer to SA(n) = max_{x∈{0,1}^n}{sA(x)}. In this book we do not discuss space complexity any further, but rather refer the interested reader to [13, Chap. 5].
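The time-bounded universal function u′ differs from the emulation in the proof of Theorem 1.9 only in that it stops after t steps and outputs ⊥. The following generic sketch takes the emulated machine as a single-step function together with a halting predicate and an output map; this interface, and the use of None for ⊥, are assumptions of the illustration.

```python
def u_prime(step, is_halted, output, init_config, t):
    """Time-bounded universal computation: iterate the single-step function
    for at most t steps.  Returns the emulated machine's output if it halts
    within t steps, and None (standing for the symbol ⊥) otherwise."""
    config = init_config
    for _ in range(t):
        if is_halted(config):
            return output(config)
        config = step(config)
    return output(config) if is_halted(config) else None
```

For a toy machine that counts a register down to zero, u_prime(lambda n: n - 1, lambda n: n == 0, lambda n: "done", 5, 10) returns "done", whereas with the budget t = 3 it returns None — u′ is total even though u is not.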
1.3.6 Oracle Machines and Turing-Reductions

The notion of a Turing-reduction, which was discussed in Section 1.3.3, is captured by the following definition of so-called oracle machines. Loosely speaking, an oracle machine is a machine that is augmented such that it may pose questions to the outside. We consider the case in which these questions, called queries, are answered consistently by some function f : {0, 1}∗ → {0, 1}∗, called the oracle. That is, if the machine makes a query q, then the answer it obtains is f(q). In such a case, we say that the oracle machine is given access to the oracle f. For an oracle machine M, a string x and a function f, we denote by M^f(x) the output of M on input x when given access to the oracle f. (Reexamining the second part of the proof of Theorem 1.5, observe that we have actually described an oracle machine that computes d when given access to the oracle d.)
Oracle machines provide a formulation of procedures that use "functionally specified" subroutines. That is, the functionality of the subroutine is specified (by the aforementioned function f), but its operation remains unspecified. In contrast, the oracle machine (i.e., M) provides a full specification of how the subroutine (represented by f) is used. Such procedures (or rather such efficient procedures) are the subject of Chapter 3, and further discussion will appear there. Our aim in the current section is merely introducing the basic framework, which is analogous to our introducing the notion of algorithms in the current chapter, whereas the entire book focuses on efficient algorithms.

The notion of an oracle machine extends the notion of a standard computing device (machine), and thus a rigorous formulation of the former extends a formal model of the latter. Specifically, extending the model of Turing machines, we derive the following model of oracle Turing machines.

Definition 1.11 (using an oracle):

• An oracle machine is a Turing machine with a special additional tape, called the oracle tape, and two special states, called oracle invocation and oracle spoke.

• The computation of the oracle machine M on input x and access to the oracle f : {0, 1}∗ → {0, 1}∗ is defined based on the successive configuration function. For configurations with a state different from oracle invocation, the next configuration is defined as usual. Let γ be a configuration in which the machine's state is oracle invocation, and suppose that the actual contents of the oracle tape is q (i.e., q is the contents of the maximal prefix of the tape that holds bit values).13 Then, the configuration following γ is identical to γ, except that the state is oracle spoke, and the actual contents of the oracle tape is f(q). The string q is called M's query, and f(q) is called the oracle's reply.

• The output of the oracle machine M on input x when given oracle access to f is denoted M^f(x).
We stress that the running time of an oracle machine is the number of steps made during its (own) computation, and that the oracle's reply on each query is obtained in a single step. Combining Definition 1.11 with the notion of solving a problem (see Definitions 1.1 and 1.2), we obtain the definition of a Turing-reduction.
13 This fits the definition of the actual initial contents of a tape of a Turing machine (cf. Section 1.3.2). A common convention is that the oracle can be invoked only when the machine's head resides at the leftmost cell of the oracle tape.
Definition 1.12 (Turing-reduction): A problem Π is Turing-reducible to a problem Π′ if there exists an oracle machine M such that for every function f that solves Π′ it holds that M^f solves Π.

It follows that if there exists an algorithm for solving Π′, then there exists an algorithm for solving Π. Indeed, in the proof of Theorem 1.5 we used the contrapositive of the foregoing (i.e., if no algorithm can solve Π, then no algorithm can solve Π′). Recall that (efficient) reductions are the subject matter of Chapter 3, and so we shall return to them at greater length at that point.
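In programming terms, an oracle machine is a procedure that receives the oracle as a black-box subroutine, where each call counts as a single step. The sketch below Turing-reduces a search problem (finding the least value satisfying a monotone predicate) to the corresponding decision problem, supplied as an oracle; the interface is an assumption of this illustration.

```python
def min_satisfying(oracle, lo, hi):
    """An oracle machine M^f: binary search for the least v in [lo, hi] with
    oracle(v) == True.  Assumes the oracle is monotone (all False, then all
    True); any function f with this promise may be plugged in."""
    if not oracle(hi):
        return None  # no satisfying value in the range
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(mid):  # one query, answered in a single step
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Note the quantification of Definition 1.12: whichever f correctly solves the decision problem, M^f solves the search problem — the machine specifies only how the oracle is used, not how it operates.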
1.3.7 Restricted Models

We mention that restricted models of computation are often mentioned in the context of a course on computability, but they will play no role in the current book. One such model is the model of finite automata, which in some variant coincides with Turing machines that have space complexity zero (equiv., constant). In our opinion, the most important motivation for the study of these restricted models of computation is that they provide simple models for some natural (or artificial) phenomena. This motivation, however, seems only remotely related to the study of the complexity of various computational tasks, which calls for the consideration of general models of computation and the evaluation of the complexity of computation with respect to such models.
1.4 Non-Uniform Models (Circuits and Advice)

In the current book, we only use non-uniform models of computation as a source of some natural computational problems (cf. Section 4.3.1). Specifically, we will refer to the satisfiability of Boolean circuits (defined in §1.4.1.1) and formulae (defined in §1.4.3.1). We mention, however, that these models are typically considered for other purposes (see the brief discussion that follows).

By a non-uniform model of computation we mean a model in which for each possible input length a different computing device is considered, while there is no "uniformity" requirement relating devices that correspond to different input lengths. Furthermore, this collection of devices is infinite by nature, and (in the absence of a uniformity requirement) this collection may not even have a finite description. Nevertheless, each device in the collection has a finite description. In fact, the relationship between the size of the device (resp., the length of its description) and the length of the input that it handles will be of
major concern. Specifically, the size of these devices gives rise to a complexity measure that can be used to upper-bound the time complexity of corresponding algorithms.

Non-uniform models of computation are considered either toward the development of techniques for proving complexity lower bounds or as providing simplified upper bounds on the ability of efficient algorithms.14 In both cases, the uniformity condition is eliminated in the interest of simplicity and with the hope (and belief) that nothing substantial is lost as far as the issues at hand are concerned. In the context of developing lower bounds, the hope is that the finiteness of all parameters (i.e., the input length and the device's description) will allow for the application of combinatorial techniques to analyze the limitations of certain settings of parameters. We mention that this hope has materialized in some restricted cases (see Section 1.4.3).

We will focus on two related models of non-uniform computing devices: Boolean circuits (Section 1.4.1) and "machines that take advice" (Section 1.4.2). The former model is more adequate for the study of the evolution of computation (i.e., development of "lower bound techniques"), whereas the latter is more adequate for modeling purposes (e.g., limiting the ability of efficient algorithms).
1.4.1 Boolean Circuits

The most popular model of non-uniform computation is the one of Boolean circuits. Historically, this model was introduced for the purpose of describing the "logic operation" of real-life electronic circuits. Ironically, nowadays this model provides the stage for some of the most practically removed studies in Complexity Theory (which aim at developing methods that may eventually lead to an understanding of the inherent limitations of efficient algorithms).

1.4.1.1 The Basic Model

A Boolean circuit is a directed acyclic graph15 with labels on the vertices, to be discussed shortly. For the sake of simplicity, we disallow isolated vertices (i.e., vertices with no incoming or outgoing edges), and thus the graph's vertices are of three types: sources, sinks, and internal vertices.

14 Advanced comment: The second case refers mainly to efficient algorithms that are given a pair of inputs (of (polynomially) related length) such that these algorithms are analyzed with respect to fixing one input (arbitrarily) and varying the other input (typically, at random). Typical examples include the context of derandomization (cf. [13, Sec. 8.3]) and the setting of zero-knowledge (cf. [13, Sec. 9.2]).
15 See Appendix A.1.
1. Internal vertices are vertices having incoming and outgoing edges (i.e., they have in-degree and out-degree at least 1). In the context of Boolean circuits, internal vertices are called gates. Each gate is labeled by a Boolean operation, where the operations that are typically considered are ∧, ∨, and ¬ (corresponding to and, or, and neg). In addition, we require that gates labeled ¬ have in-degree 1. The in-degree of ∧-gates and ∨-gates may be any number greater than zero, and the same holds for the out-degree of any gate.

2. The graph's sources (i.e., vertices with no incoming edges) are called input terminals. Each input terminal is labeled by a natural number (which is to be thought of as the index of an input variable). (For the sake of defining formulae (see §1.4.3.1), we allow different input terminals to be labeled by the same number.)16

3. The graph's sinks (i.e., vertices with no outgoing edges) are called output terminals, and we require that they have in-degree 1. Each output terminal is labeled by a natural number such that if the circuit has m output terminals then they are labeled 1, 2, . . . , m. That is, we disallow different output terminals to be labeled by the same number, and insist that the labels of the output terminals are consecutive numbers. (Indeed, the labels of the output terminals will correspond to the indices of locations in the circuit's output.) See the example in Figure 1.3. For the sake of simplicity, we also mandate that the labels of the input terminals are consecutive numbers.17

A Boolean circuit with n different input labels and m output terminals induces (and indeed computes) a function from {0, 1}^n to {0, 1}^m defined as follows. For any fixed string x ∈ {0, 1}^n, we iteratively define the value of vertices in the circuit such that the input terminals are assigned the corresponding bits in x = x1 · · · xn and the values of other vertices are determined in the natural manner. That is:

• An input terminal with label i ∈ {1, . . . , n} is assigned the i-th bit of x (i.e., the value xi).

16 This is not needed in the case of general circuits, because we can just feed outgoing edges of the same input terminal to many gates. Note, however, that this is not allowed in the case of formulae, where all non-sinks are required to have out-degree exactly 1.
17 This convention slightly complicates the construction of circuits that ignore some of the input values. Specifically, we use artificial gadgets that have incoming edges from the corresponding input terminals, and compute an adequate constant. To avoid having this constant as an output terminal, we feed it into an auxiliary gate such that the value of the latter is determined by the other incoming edge (e.g., a constant 0 fed into an ∨-gate). See an example of dealing with x3 in Figure 1.3.
[Figure 1.3 appears here: a circuit with input terminals labeled 1–4, internal gates labeled neg, and, and or, and two output terminals labeled 1 and 2.]

Figure 1.3. A circuit computing f(x1, x2, x3, x4) = (x1 ⊕ x2, x1 ∧ ¬x2 ∧ x4).
• If the children of a gate (of in-degree d) that is labeled ∧ have values v1, v2, . . . , vd, then the gate is assigned the value v1 ∧ v2 ∧ · · · ∧ vd. The value of a gate labeled ∨ (or ¬) is determined analogously.

Indeed, the hypothesis that the circuit is acyclic implies that the following natural process of determining values for the circuit's vertices is well defined: As long as the value of some vertex is undetermined, there exists a vertex such that its value is undetermined but the values of all its children are determined. Thus, the process can make progress, and terminates when the values of all vertices (including the output terminals) are determined. The value of the circuit on input x (i.e., the output computed by the circuit on input x) is y = y1 · · · ym, where yi is the value assigned by the foregoing process to the output terminal labeled i.

We note that there exists a polynomial-time algorithm that, given a circuit C and a corresponding input x, outputs the value of C on input x. This algorithm determines the values of the circuit's vertices, going from the circuit's input terminals to its output terminals.

We say that a family of circuits (Cn)n∈N computes a function f : {0, 1}∗ → {0, 1}∗ if for every n the circuit Cn computes the restriction of f to strings of length n. In other words, for every x ∈ {0, 1}∗, it must hold that C|x|(x) = f(x).

Bounded and Unbounded Fan-in. It is often natural to consider circuits in which each gate has at most two incoming edges.18 In this case, the types of
18 Indeed, the term "bounded fan-in" suggests that the upper bound on the number of incoming edges may be any fixed constant, but such circuits can be emulated by circuits with two-argument operations while incurring only a constant factor blow-up in their size. The same reason justifies the assertion that the choice of a full basis is immaterial, because each such
(two-argument) Boolean operations that we allow is immaterial (as long as we consider a "full basis" of such operations, i.e., a set of operations that can implement any other two-argument Boolean operation). Such circuits are called circuits of bounded fan-in. In contrast, other studies are concerned with circuits of unbounded fan-in, where each gate may have an arbitrary number of incoming edges. Needless to say, in the case of circuits of unbounded fan-in, the choice of allowed Boolean operations is important, and one focuses on operations that are "uniform" (across the number of operands, e.g., ∧ and ∨). Unless specified differently, we shall refer to circuits of unbounded fan-in; however, in many of the cases that we consider, the choice is immaterial.

1.4.1.2 Circuit Complexity

As stated earlier, the Boolean circuit model is used in Complexity Theory mainly as a basis for defining a (non-uniform) complexity measure. Specifically, the complexity of circuits is defined as their size.

Circuit Size as a Complexity Measure. The size of a circuit is the number of its edges. When considering a family of circuits (Cn)n∈N that computes a function f : {0, 1}∗ → {0, 1}∗, we are interested in the size of Cn as a function of n. Specifically, we say that this family has size complexity s : N → N if for every n the size of Cn is s(n). The circuit complexity of a function f, denoted sf, is the infimum of the size complexity of all families of circuits that compute f. Alternatively, for each n we may consider the size of the smallest circuit that computes the restriction of f to n-bit strings (denoted fn), and set sf(n) accordingly. We stress that non-uniformity is implicit in this definition, because no conditions are made regarding the relation between the various circuits used to compute the function on different input lengths.19

On the Circuit Complexity of Functions. We highlight some simple facts regarding the circuit complexity of functions.
These facts are in clear correspondence to facts regarding Kolmogorov Complexity mentioned in §1.3.4.2, and establishing them is left as an exercise (see Exercise 1.15).

1. Most importantly, any Boolean function can be computed by some family of circuits, and thus the circuit complexity of any function is well defined. Furthermore, each function has at most exponential circuit complexity.

2. Some functions have polynomial circuit complexity. In particular, any function that has time complexity t (i.e., is computed by an algorithm of time complexity t) has circuit complexity at most poly(t). Furthermore, the corresponding circuit family is uniform (in a natural sense to be discussed in the next paragraph).

3. Almost all Boolean functions require exponential circuit complexity. Specifically, the number of functions mapping {0, 1}^n to {0, 1} that can be computed by some circuit of size s is smaller than s^{2s}.

Note that the first fact implies that families of circuits can compute functions that are uncomputable by algorithms. Furthermore, this phenomenon occurs also when restricting attention to families of polynomial-size circuits. See further discussion in Section 1.4.2 (and specifically Theorem 1.14).

18 (cont.) basis allows for emulating any two-argument operation by a constant-size circuit. Indeed, in both cases, we disregard constant factor changes in the circuit size.

19 Advanced comment: We also note that, in contrast to footnote 11, the circuit model and the corresponding (circuit-size) complexity measure support the notion of an optimal computing device: Each function f has a unique size complexity sf (and not merely upper and lower bounds on its complexity).

Uniform Families. A family of polynomial-size circuits (Cn)n∈N is called uniform if given n one can construct the circuit Cn in poly(n) time. Note that if a function is computable by a uniform family of polynomial-size circuits, then it is computable by a polynomial-time algorithm. This algorithm first constructs the adequate circuit (which can be done in polynomial time by the uniformity hypothesis), and then evaluates this circuit on the given input (which can be done in time that is polynomial in the size of the circuit). Note that limitations on the computing power of arbitrary families of polynomial-size circuits certainly hold for uniform families (of polynomial-size circuits), which in turn yield limitations on the computing power of polynomial-time algorithms. Thus, lower bounds on the circuit complexity of functions yield analogous lower bounds on their time complexity. Furthermore, as is often the case in mathematics and science, disposing of an auxiliary condition that is not well understood (i.e., uniformity) may turn out to be fruitful.
Indeed, this has occurred in the study of classes of restricted circuits, which is reviewed in Section 1.4.3.
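The vertex-by-vertex evaluation process described in §1.4.1.1 is straightforward to implement. The following is a minimal Python sketch (the dict-based circuit representation and the gate names are illustrative choices, not notation from the text); a memoized traversal evaluates each vertex only after all of its children, consistently with the topological process described above. The example encodes the circuit of Figure 1.3, with x1 ⊕ x2 expressed via ∧, ∨, and ¬.

```python
def eval_circuit(circuit, outputs, x):
    """Evaluate an acyclic Boolean circuit on the input bits x.

    circuit maps each vertex name to ('in', i) for an input terminal that
    reads bit x[i], or to (op, children) with op in {'and', 'or', 'not'}.
    """
    value = {}

    def get(v):
        # A vertex is evaluated only after all of its children (memoized DFS).
        if v not in value:
            op, arg = circuit[v]
            if op == 'in':
                value[v] = x[arg]
            elif op == 'not':
                value[v] = 1 - get(arg[0])
            elif op == 'and':
                value[v] = int(all(get(u) for u in arg))
            else:  # 'or'
                value[v] = int(any(get(u) for u in arg))
        return value[v]

    return [get(v) for v in outputs]


# The circuit of Figure 1.3: f(x1, x2, x3, x4) = (x1 XOR x2, x1 AND NOT(x2) AND x4);
# x3 is read but unused, as in the figure.
C = {
    'x1': ('in', 0), 'x2': ('in', 1), 'x3': ('in', 2), 'x4': ('in', 3),
    'n1': ('not', ['x1']), 'n2': ('not', ['x2']),
    'a1': ('and', ['x1', 'n2']), 'a2': ('and', ['n1', 'x2']),
    'y1': ('or', ['a1', 'a2']),          # y1 = x1 XOR x2
    'y2': ('and', ['x1', 'n2', 'x4']),   # y2 = x1 AND NOT(x2) AND x4
}
print(eval_circuit(C, ['y1', 'y2'], [1, 0, 1, 1]))  # [1, 1]
```

Since each vertex is visited once and each edge once, the running time is linear in the circuit's size, matching the polynomial-time claim in the text.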
1.4.2 Machines That Take Advice

General (non-uniform) circuit families and uniform circuit families are two extremes with respect to the "amounts of non-uniformity" in the computing device. Intuitively, in the former, non-uniformity is only bounded by the size of the device, whereas in the latter, the amount of non-uniformity is zero. Here we consider a model that allows for decoupling the size of the computing device
from the amount of non-uniformity, which may range from zero to the device's size. Specifically, we consider algorithms that "take a non-uniform advice" that depends only on the input length. The amount of non-uniformity will be defined to equal the length of the corresponding advice (as a function of the input length).

Definition 1.13 (taking advice): We say that algorithm A computes the function f using advice of length ℓ : N → N if there exists an infinite sequence (an)n∈N such that

1. For every x ∈ {0, 1}∗, it holds that A(a|x|, x) = f(x).
2. For every n ∈ N, it holds that |an| = ℓ(n).

The sequence (an)n∈N is called the advice sequence.

Note that any function having circuit complexity s can be computed using advice of length O(s log s), where the length upper bound is due to the fact that a graph with v vertices and e edges can be described by a string of length 2e log2 v. Note that the model of machines that use advice allows for some sharper bounds than the ones stated in §1.4.1.2: Every function can be computed using advice of length ℓ such that ℓ(n) = 2^n, and some uncomputable functions can be computed using advice of length 1.

Theorem 1.14 (the power of advice): There exist functions that can be computed using one-bit advice but cannot be computed without advice.

Proof: Starting with any uncomputable Boolean function f : N → {0, 1}, consider the function f′ defined as f′(x) = f(|x|); that is, the value of f′(x) only depends on the length of x (and, specifically, equals f(|x|)). Note that f is Turing-reducible to f′ (e.g., on input n make any n-bit query to f′, and return the answer).20 Thus, f′ cannot be computed without advice. On the other hand, f′ can be easily computed by using the advice sequence (an)n∈N such that an = f(n); that is, the algorithm merely outputs the advice bit (and indeed a|x| = f(|x|) = f′(x), for every x ∈ {0, 1}∗).
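The bound ℓ(n) = 2^n mentioned before Theorem 1.14 can be made concrete: let the advice a_n be the truth table of f restricted to {0, 1}^n, so that the advice-taking algorithm merely performs a look-up. A minimal sketch (the list representation of the advice is an illustrative choice; the concrete f below is just a computable stand-in, since any Boolean function would do):

```python
def advice_for(f, n):
    # The advice a_n is the truth table of f on {0, 1}^n, so |a_n| = 2^n.
    return [f(format(i, '0{}b'.format(n))) for i in range(2 ** n)]

def A(a, x):
    # The advice-taking algorithm: output the bit of the advice indexed by x.
    return a[int(x, 2)]

# Any Boolean function works here, computable or not; parity is a stand-in.
f = lambda x: int(x.count('1') % 2 == 1)
n = 3
a_n = advice_for(f, n)
assert all(A(a_n, format(i, '03b')) == f(format(i, '03b')) for i in range(8))
print(a_n)  # [0, 1, 1, 0, 1, 0, 0, 1]
```

Note that A itself is a fixed, trivial algorithm; all the "hardness" of f is pushed into the non-uniform advice sequence, which is exactly the decoupling that Definition 1.13 formalizes.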
1.4.3 Restricted Models

The model of Boolean circuits (cf. §1.4.1.1) allows for the introduction of many natural subclasses of computing devices. Following is a laconic review of a few of these subclasses. (For further detail regarding the study of these subclasses, the interested reader is referred to [1].)

20 Indeed, this Turing-reduction is not efficient (i.e., it runs in time that is exponential in the length of its input, |n| = log2 n), but this is immaterial in the current context.

1.4.3.1 Boolean Formulae

In (general) Boolean circuits the non-sink vertices are allowed arbitrary out-degree. This means that the same intermediate value can be reused without being recomputed (and while increasing the size complexity by only one unit). Such "free" reuse of intermediate values is disallowed in Boolean formulae, which are formally defined as Boolean circuits in which all non-sink vertices have out-degree 1. This means that the underlying graph of a Boolean formula is a tree (see Appendix A.2), and it can be written as a Boolean expression over Boolean variables by traversing this tree (and registering the vertices' labels in the order traversed). Indeed, we have allowed different input terminals to be assigned the same label in order to allow formulae in which the same variable occurs multiple times. As in the case of general circuits, one is interested in the size of these restricted circuits (i.e., the size of families of formulae computing various functions). We mention that quadratic lower bounds are known for the formula size of simple functions (e.g., parity), whereas these functions have linear circuit complexity. This discrepancy is depicted in Figure 1.4.

Figure 1.4. Recursive construction of parity circuits and formulae.

Formulae in CNF and DNF. A restricted type of Boolean formulae consists of formulae that are in conjunctive normal form (CNF). Such a formula consists of a conjunction of clauses, where each clause is a disjunction of literals, each being either a variable or its negation. That is, such formulae are represented by layered circuits of unbounded fan-in in which the first layer consists of
Figure 1.5. A 3DNF formula computing x1 ⊕ x2 ⊕ x3 as (x1 ∧ x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ ¬x3) ∨ (¬x1 ∧ x2 ∧ ¬x3) ∨ (¬x1 ∧ ¬x2 ∧ x3).
neg-gates that compute the negation of input variables, the second layer consists of or-gates that compute the logical-or of subsets of inputs and negated inputs, and the third layer consists of a single and-gate that computes the logical-and of the values computed in the second layer. Note that each Boolean function can be computed by a family of CNF formulae of exponential size (see Exercise 1.17), and that the size of CNF formulae may be exponentially larger than the size of ordinary formulae computing the same function (e.g., parity).21 For a constant k (e.g., k = 2, 3), a formula is said to be in kCNF if its CNF has disjunctions of size at most k. An analogous restricted type of Boolean formulae refers to formulae that are in disjunctive normal form (DNF). Such a formula consists of a disjunction of conjunctions of literals, and when each conjunction has at most k literals we say that the formula is in kDNF. (Figure 1.5 depicts a 3DNF formula that computes the parity of three variables.)

21 See Exercise 1.18.

1.4.3.2 Other Restricted Classes of Circuits

Two other restricted classes of circuits, which have received a lot of attention in Complexity Theory (but are not used in this book), are the classes of constant-depth circuits and monotone circuits.

Constant-depth Circuits. Circuits have a "natural structure" (i.e., their structure as graphs). One natural parameter regarding this structure is the depth of a circuit, which is defined as the length of the longest directed path from any source to any sink. Of special interest are constant-depth circuits of unbounded fan-in. We mention that subexponential lower bounds are known for the size of such circuits that compute a simple function (e.g., parity).
Monotone Circuits. The circuit model also allows for the consideration of monotone computing devices: A monotone circuit is one having only monotone gates (e.g., gates computing ∧ and ∨, but no negation gates (i.e., ¬-gates)). Needless to say, monotone circuits can only compute monotone functions, where a function f : {0, 1}^n → {0, 1} is called monotone if for any x ⪯ y it holds that f(x) ≤ f(y) (where x1 · · · xn ⪯ y1 · · · yn if and only if for every bit position i it holds that xi ≤ yi). One natural question is whether, as far as monotone functions are concerned, there is a substantial loss in using only monotone circuits. The answer is yes: There exist monotone functions that have polynomial circuit complexity but require subexponential-size monotone circuits.
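The definition of monotone functions can be checked mechanically on small input lengths. A small sketch (majority is used only as an example of a monotone function; the helper names are illustrative):

```python
from itertools import product

def majority(x):
    # A monotone function: 1 iff strictly more than half of the bits are 1.
    return int(2 * sum(x) > len(x))

def preceq(x, y):
    # The bit-wise partial order: x ⪯ y iff x_i <= y_i for every position i.
    return all(a <= b for a, b in zip(x, y))

def is_monotone(f, n):
    # Check f(x) <= f(y) for every comparable pair x ⪯ y in {0, 1}^n.
    points = list(product((0, 1), repeat=n))
    return all(f(x) <= f(y) for x in points for y in points if preceq(x, y))

print(is_monotone(majority, 4))            # True
print(is_monotone(lambda x: 1 - x[0], 4))  # False: negation is not monotone
```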
1.5 Complexity Classes

Complexity classes are sets of computational problems. Typically, such classes are defined by fixing three parameters:

1. A type of computational problems (see Section 1.2). Indeed, the most standard complexity classes refer to decision problems, but classes of search problems, promise problems, and other types of problems are also considered.

2. A model of computation, which may be either uniform (see Section 1.3) or non-uniform (see Section 1.4).

3. A complexity measure and a limiting function (or a set of functions), which when put together limit the class of computations of the previous item; that is, we refer to the class of computations that have complexity not exceeding the specified function (or set of functions). For example, in Section 1.3.5, we mentioned time complexity and space complexity, which apply to any uniform model of computation. We also mentioned polynomial-time computations, which are computations in which the time complexity (as a function) does not exceed some polynomial (i.e., is a member of the set of polynomial functions).

The most common complexity classes refer to decision problems, and are sometimes defined as classes of sets rather than classes of the corresponding
decision problems. That is, one often says that a set S ⊆ {0, 1}∗ is in the class C, rather than saying that the problem of deciding membership in S is in the class C. Likewise, one talks of classes of relations rather than classes of the corresponding search problems (i.e., saying that R ⊆ {0, 1}∗ × {0, 1}∗ is in the class C means that the search problem of R is in the class C).
Exercises

Exercise 1.1 (a quiz)

1. What is the default representation of integers (in Complexity Theory)?
2. What are search and decision problems?
3. What is the motivation for considering the model of Turing machines?
4. What does the Church-Turing Thesis assert?
5. What is a universal algorithm?
6. What does undecidability mean?
7. What is the time complexity of an algorithm?
8. What does the Cobham-Edmonds Thesis assert?
9. What are Boolean circuits and formulae?
Exercise 1.2 Prove that any function that can be computed by a Turing machine can be computed by a machine that never moves left of the end of the tape.

Guideline: Modify the original machine by "marking" the leftmost cell of the tape (by using special symbols such that the original contents is maintained). Needless to say, this marking corresponds to an extension of the tape's symbols.

Exercise 1.3 (single-tape versus multi-tape Turing machines) Prove that a function can be computed by a single-tape Turing machine if and only if it is computable by a multi-tape (e.g., two-tape) Turing machine.

Guideline: The emulation of the multi-tape Turing machine on a single-tape machine is based on storing all the original tapes on a single tape such that the ith cell of the single tape records the contents of the ith cell of each of the original tapes. In addition, the ith cell of the single tape records an indication as to which of the original heads reside in the ith cell of the corresponding original tapes. To emulate a single step of the original machine, the new machine scans its tape, finds all original head locations, and retrieves the corresponding cell contents. Based on this information, the emulating machine effects the corresponding step (according to the original transition function) by modifying its (single) tape's contents in an analogous manner.
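The cell-by-cell packing described in the foregoing guideline can be sketched as follows (the tuple-based encoding of the enlarged tape alphabet and the function names are illustrative choices):

```python
def pack_tapes(tapes, heads):
    """Store k tapes on one tape: cell i holds the i-th cell of every tape
    plus markers telling which of the original heads reside in cell i."""
    length = max(len(t) for t in tapes)
    packed = []
    for i in range(length):
        cell = tuple(t[i] if i < len(t) else '_' for t in tapes)
        markers = tuple(h == i for h in heads)
        packed.append((cell, markers))
    return packed

def locate_heads(packed, k):
    # One left-to-right scan recovers all head positions and the symbols
    # under them -- the information needed to emulate a single step.
    found = {}
    for i, (cell, markers) in enumerate(packed):
        for j in range(k):
            if markers[j]:
                found[j] = (i, cell[j])
    return found

packed = pack_tapes(['abba', '01'], heads=[2, 0])
print(locate_heads(packed, 2))  # {1: (0, '0'), 0: (2, 'b')}
```

The scan visits each packed cell once, which is why a single emulated step costs time proportional to the used portion of the tape (and why the overall emulation in Exercise 1.12 incurs a quadratic slowdown).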
Exercise 1.4 (computing the sum of natural numbers) Prove that a Turing machine can add natural numbers; that is, outline a (multi-tape) Turing machine that on input a pair of integers (in binary representation) outputs their sum. Specifically, show that the straightforward addition algorithm can be implemented in linear time by a multi-tape Turing machine.

Guideline: A straightforward implementation of addition on a two-tape Turing machine starts by copying the two (input) integers (from the input tape) to the second tape such that the ith least significant bits of both integers reside in the ith cell (of the second tape).

Exercise 1.5 (Turing machines vs. abstract RAM) Prove that an abstract RAM can be emulated by a Turing machine.

Guideline: Recall that by our conventions, the abstract RAM computation is initialized such that only a prefix of the memory cells contains meaningful data, and (the length of) this prefix is specified in a special register. Thus, during the emulation (of the abstract RAM), we only need to keep track of the contents of these memory cells as well as the contents of any other memory cells that were accessed during the computation (and the contents of all registers). Consequently, during the emulation, the Turing machine's tape will contain a list of the RAM's memory cells that were accessed so far as well as their current contents. When we emulate a RAM instruction that refers to some memory location (which is specified in the contents of a fixed register), we first check whether the relevant RAM cell appears on our list, and accordingly either augment the list by a corresponding entry or modify this entry as required.

Exercise 1.6 (Rice's Theorem (Theorem 1.6)) Let F and SF be as in Theorem 1.6. Present a Turing-reduction of d to SF.

Guideline: Let f⊥ denote the function that is undefined on all inputs. Assume, without loss of generality, that f⊥ ∉ F, let f1 denote an arbitrary function in F, and let M1 be an arbitrary fixed machine that computes f1. Then, the reduction maps an input ⟨M⟩ for d to the input ⟨M′⟩ for SF such that machine M′ operates as follows on input x:

1. First, machine M′ emulates M on input ⟨M⟩.
2. If M halts (in Step 1), then M′ emulates M1(x), and outputs whatever it does.

Note that the mapping from ⟨M⟩ to ⟨M′⟩ is easily computable (by augmenting M with the fixed machine M1). Now, if d(⟨M⟩) = 1, then machine M′ reaches Step 2, and thus M′(x) = f1(x) for every x, which in turn implies ⟨M′⟩ ∈ SF (because M′ computes f1 ∈ F). On the other hand, if d(⟨M⟩) = 0, then machine M′ remains stuck in Step 1, and thus M′ does not halt on any x, which in turn implies ⟨M′⟩ ∉ SF (because M′ computes f⊥ ∉ F).

Exercise 1.7 Prove that there exists a Turing machine M such that there is no algorithm that determines whether or not M halts on a given input.

Guideline: Let M be a universal machine, and present a Turing-reduction from h to hM, where hM(x) = h(⟨M⟩, x).
Exercise 1.8 (Post Correspondence Problem (Theorem 1.7)) The following exercise is significantly more difficult than the norm. Present a Turing-reduction of h to the Post Correspondence Problem, denoted PCP. Furthermore, use a reduction that maps an instance (⟨M⟩, x) of h to a pair of sequences ((α1, . . . , αk), (β1, . . . , βk)) such that only α1 and β1 depend on x, whereas k as well as the other strings depend only on M.

Guideline: Consider a modified version of the Post Correspondence Problem, denoted MPCP, in which the first index in the solution sequence must equal 1 (i.e., i1 = 1). Reduce h to MPCP, and next reduce MPCP to PCP. The main reduction (i.e., of h to MPCP) maps (⟨M⟩, x) to ((α1, . . . , αk), (β1, . . . , βk)) such that a solution sequence (i.e., i1, . . . , iℓ s.t. αi1 · · · αiℓ = βi1 · · · βiℓ) yields a full description of the computation of M on input x (i.e., the sequence of all instantaneous configurations in this computation). Specifically, α1 will describe the initial configuration of M on input x, whereas β1 will be essentially empty (except for a delimiter, denoted #, which is also used at the beginning and at the end of α1). Assuming that the set of tape-symbols Σ and the set of states Q of M are disjoint (i.e., Σ ∩ Q = ∅), configurations will be described as sequences over their union (i.e., sequences over Σ ∪ Q, where # ∉ Σ ∪ Q). Other pairs (αi, βi) include:

• For every tape-symbol σ, we shall have αi = βi = σ (for some i). We shall also have αi = βi = # (for some i). Such pairs reflect the preservation of the tape's contents (whenever the head location is not present at the current cell).

• For every non-halting state q and every transition regarding q, we shall have a pair reflecting this transition. For example, if the transition function maps (q, σ) to (q′, σ′, +1), then we have βi = qσ and αi = σ′q′ (for some i). For left movement (i.e., if the transition function maps (q, σ) to (q′, σ′, −1)) we have βi = τqσ and αi = q′τσ′ (for every tape-symbol τ). Assuming that blank symbols (i.e., ␣) are only written to the left of other blank symbols (and when moving left), if the transition function maps (q, σ) to (q′, ␣, −1), then we have βi = τqσ and αi = q′τ (rather than αi = q′τ␣).
• Assuming that the machine halts in state p only when it resides in the leftmost cell (and after writing blanks in all cells), we have βi = p## and αi = # (for some i).

Note that in a solution sequence i1, . . . , iℓ such that αi1 · · · αiℓ = βi1 · · · βiℓ, for every t < ℓ it holds that βi1 · · · βit is a prefix of αi1 · · · αit such that the latter contains exactly one configuration more than the former. The relations between the pairs (αi, βi) guarantee that these prefixes are prefixes of the sequence of all instantaneous configurations in the computation of M on input x, and a solution can be completed only if this computation halts. For details see [16, Sec. 8.5] or [30, Sec. 5.2].

Exercise 1.9 (total functions extending the universal function) Present an algorithm that, given a description of a Turing machine and a corresponding instantaneous configuration, determines the instantaneous configuration that results by performing a single step of the given machine on the given instantaneous configuration. Note that this exercise requires fixing a concrete representation of Turing machines and corresponding configurations.

Guideline: Use the representation of configurations provided in §1.3.2.1.

Exercise 1.10 (total functions extending the universal function) Let u be the function computed by any universal machine (for a fixed reasonable model of computation). Prove that any extension of u to a total function (i.e., any total function û that agrees with the partial function u on all the inputs on which the latter is defined) is uncomputable.

Guideline: The claim is easy to prove for the special case of the total function û that extends u such that the special symbol ⊥ is assigned to inputs on which u is undefined (i.e., û(⟨M⟩, x) =def ⊥ if u is not defined on (⟨M⟩, x) and û(⟨M⟩, x) =def u(⟨M⟩, x) otherwise). In this case h(⟨M⟩, x) = 1 if and only if û(⟨M⟩, x) ≠ ⊥, and so the halting function h is Turing-reducible to û. In the general case, we may adapt the proof of Theorem 1.5 by using the fact that for any machine M that halts on every input, it holds that û(⟨M⟩, x) = u(⟨M⟩, x) for every x (and in particular for x = ⟨M⟩).

Exercise 1.11 (uncomputability of Kolmogorov Complexity) Prove that the Kolmogorov Complexity function, denoted K, is uncomputable.

Guideline: Consider, for every integer t, the string st that is defined as the lexicographically first string of Kolmogorov Complexity exceeding t (i.e., st =def min{s ∈ {0, 1}∗ : K(s) > t}, where the minimum is according to the lexicographic order). Note that st is well defined and has length at most t (see Fact 3 in §1.3.4.2). Assuming that K is computable, we reach a contradiction by noting that st has description length O(1) + log2 t (because it may be described by combining a fixed machine that computes K with the integer t).

Exercise 1.12 (single-tape versus multi-tape Turing machines, refined) In continuation of Exercise 1.3, show that any function that can be computed by a multi-tape Turing machine in time complexity t can be computed by a single-tape Turing machine in time complexity O(t^2).

Exercise 1.13 (single-tape vs. two-tape Turing machines, a complexity gap) The following exercise is significantly more difficult than the norm. Show that the emulation upper bound stated in Exercise 1.12 is optimal. Specifically, prove that deciding membership in the set {xx : x ∈ {0, 1}∗} requires quadratic time on a single-tape Turing machine, and note that this decision problem can be solved in linear time on a two-tape Turing machine.

Guideline: Proving the quadratic time lower bound is quite nontrivial. One proof is by a "reduction" from a communication complexity problem [19, Sec. 12.2]. Intuitively, a single-tape Turing machine that decides membership in the aforementioned set can be viewed as a channel of communication between the two parts of the input. Specifically, focusing our attention on inputs of the form y0^n z0^n, for y, z ∈ {0, 1}^n, note that each time that the machine passes from the one part to the other part it carries O(1) bits of information (in its internal state) while making at least n steps. The proof is completed by invoking the linear lower bound on the communication complexity of the (two-argument) identity function (i.e., id(y, z) = 1 if y = z and id(y, z) = 0 otherwise); cf. [19, Chap. 1].

Exercise 1.14 (linear speedup of Turing machines) Prove that any problem that can be solved by a two-tape Turing machine that has time complexity t can be solved by another two-tape Turing machine having time complexity t′, where t′(n) = O(n) + (t(n)/2).
Prove an analogous result for one-tape Turing machines, where t′(n) = O(n^2) + (t(n)/2).

Guideline: Consider a machine that uses a larger alphabet, capable of encoding a constant (denoted c) number of symbols of the original machine, and thus capable of emulating c steps of the original machine in O(1) steps, where the constant in the O-notation is a universal constant (independent of c). Note that the O(n) term accounts for a preprocessing that converts the binary input to the work alphabet of the new machine (which encodes c input bits in one alphabet symbol). Thus, a similar result for one-tape Turing machines seems to require an additive O(n^2) term.
Exercise 1.15 (on the circuit complexity of functions) Prove the following facts:

1. The circuit complexity of any Boolean function is at most exponential.

Guideline: fn : {0, 1}^n → {0, 1} can be computed by a circuit of size O(n2^n) that implements a look-up table. See also Exercise 1.17.

2. Some functions have polynomial circuit complexity. In particular, any function that has time complexity t (i.e., is computed by an algorithm of time complexity t) has circuit complexity poly(t). Furthermore, the corresponding circuit family is uniform.

Guideline: Consider a Turing machine that computes the function, and consider its computation on a generic n-bit long input. The corresponding computation can be emulated by a circuit that consists of t(n) layers such that each layer represents an instantaneous configuration of the machine, and the relation between consecutive configurations is captured by ("uniform") local gadgets in the circuit. For further details see the proof of Theorem 4.5, which presents a similar emulation.

3. Almost all Boolean functions require exponential circuit complexity. Specifically, show that the number of functions mapping {0, 1}^n to {0, 1} that can be computed by some circuit of size s is smaller than s^{2s}, which is smaller than 2^{2^n} unless 2s log2 s ≥ 2^n. Note that the total number of functions mapping {0, 1}^n to {0, 1} is 2^{2^n}.

Guideline: Show that, without loss of generality, we may consider circuits of bounded fan-in. The number of such circuits having v vertices is at most (2 · v^2 + v)^v, where for each gate we either have a choice of a binary operation (i.e., ∧ or ∨) and two feeding vertices or a choice of a single feeding vertex (for a ¬-gate). Note that the input terminals each have a choice of an index of an input variable in [n], and by our conventions v ≥ n.
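The counting in item 3 can be checked numerically. The following sketch uses the (2 · v^2 + v)^v bound from the guideline (the function names and the sample parameters n = 8, v = 16 are illustrative choices):

```python
def max_circuits(v):
    # Upper bound from the guideline: each of the v vertices either picks a
    # binary operation (2 choices) and two feeding vertices, or a single
    # feeding vertex (for a NOT-gate), giving at most (2*v**2 + v)**v circuits.
    return (2 * v ** 2 + v) ** v

def num_functions(n):
    # The total number of Boolean functions on n-bit inputs is 2^(2^n).
    return 2 ** (2 ** n)

# There are too few 16-vertex circuits to compute all functions on 8 bits:
print(max_circuits(16) < num_functions(8))  # True
# ... whereas the same bound is not small enough for 6-bit inputs:
print(max_circuits(16) < num_functions(6))  # False
```

Since Python has arbitrary-precision integers, the comparison is exact; the first line certifies that some function on 8-bit inputs requires more than 16 vertices.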
Exercise 1.16 (the class P/poly) We denote by P/ℓ the class of decision problems that can be solved in polynomial time with advice of length ℓ, and by P/poly the union of P/p taken over all polynomials p. Prove that a decision problem is in P/poly if and only if it has polynomial circuit complexity.

Guideline: Suppose that a problem can be solved by a polynomial-time algorithm A using the polynomially bounded advice sequence (an)n∈N. We obtain a family of polynomial-size circuits that solves the same problem by observing that the computation of A(a|x|, x) can be emulated by a circuit of poly(|x|)-size, which incorporates a|x| and is given x as input. That is, we construct a circuit Cn such that Cn(x) = A(an, x) holds for every x ∈ {0, 1}^n (analogously to the way Cx is constructed in the proof of Theorem 4.5, where it holds that Cx(y) = MR(x, y) for every y of adequate length). On the other hand, given a family of polynomial-size circuits, we obtain a polynomial-time advice-taking machine that emulates this family when using advice that provides the description of the relevant circuits. (Indeed, we use the fact that a circuit of size s can be described by a string of length O(s log s).)

Exercise 1.17 (generic DNF and CNF formulae) Prove that every Boolean function can be computed by a family of DNF (resp., CNF) formulae of exponential size.

Guideline: For any a ∈ {0, 1}^n, consider the function δa : {0, 1}^n → {0, 1} such that δa(x) = 1 if x = a and δa(x) = 0 otherwise. Note that any function δa can be computed by a single conjunction of n literals, and that any Boolean function f : {0, 1}^n → {0, 1} can be written as the disjunction of δa over all a such that f(a) = 1 (i.e., ∨a:f(a)=1 δa). A corresponding CNF formula can be obtained by applying de Morgan's Law to the DNF obtained for ¬f.

Exercise 1.18 (on the size of general vs. DNF formulae) Prove that every DNF (resp., CNF) formula for computing parity must have exponential size. On the other hand, show that parity has quadratic-size formulae (and linear-size circuits).

Guideline: For the lower bound, observe that each conjunction in the candidate DNF must contain a literal for each variable. The upper bound follows by Figure 1.4.
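The generic construction in the guideline of Exercise 1.17 is easy to program. A minimal sketch that represents each conjunction δ_a as a tuple of literals (the string encoding of literals is an illustrative choice); applied to parity on three variables, it yields the four conjunctions depicted in Figure 1.5:

```python
from itertools import product

def generic_dnf(f, n):
    """Write f as the disjunction of delta_a over all a with f(a) = 1,
    where delta_a is a single conjunction of n literals (one per variable)."""
    terms = []
    for a in product((0, 1), repeat=n):
        if f(a):
            # delta_a asserts x_i = a_i for every i ('-' marks a negation).
            terms.append(tuple('x%d' % (i + 1) if b else '-x%d' % (i + 1)
                               for i, b in enumerate(a)))
    return terms

parity = lambda x: sum(x) % 2
dnf = generic_dnf(parity, 3)
print(len(dnf))  # 4 conjunctions, i.e., 2^(3-1); each contains all 3 literals
print(dnf[0])    # ('-x1', '-x2', 'x3')
```

In general the construction produces one conjunction per satisfying assignment, so for parity on n variables it yields 2^(n-1) conjunctions, which illustrates the exponential DNF lower bound asserted in Exercise 1.18.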
2 The P versus NP Question
Overview: Our daily experience is that it is harder to solve problems than it is to check the correctness of solutions to these problems. Is this experience merely a coincidence, or does it represent a fundamental fact of life (or a property of the world)? This is the essence of the P versus NP Question, where P represents search problems that are efficiently solvable and NP represents search problems for which solutions can be efficiently checked. Another natural question captured by the P versus NP Question is whether proving theorems is harder than verifying the validity of these proofs. In other words, the question is whether deciding membership in a set is harder than being convinced of this membership by an adequate proof. In this case, P represents decision problems that are efficiently solvable, whereas NP represents sets that have efficiently verifiable proofs of membership. These two formulations of the P versus NP Question are indeed equivalent, and the common belief is that P is different from NP. That is, we believe that solving search problems is harder than checking the correctness of solutions for them and that finding proofs is harder than verifying their validity.

Organization. The two formulations of the P versus NP Question are rigorously presented and discussed in Sections 2.2 and 2.3, respectively. The equivalence of these formulations is shown in Section 2.4, and the common belief that P is different from NP is further discussed in Section 2.7. We start by discussing the notion of efficient computation (see Section 2.1).
Teaching Notes

Most students have heard of P and NP before, but we suspect that many of them have not obtained a good explanation of what the P-vs-NP Question actually represents. This unfortunate situation is due to the use of the standard technical definition of NP (which refers to the fictitious and confusing device called a nondeterministic polynomial-time machine). Instead, we advocate the use of slightly more cumbersome definitions, sketched in the foregoing paragraphs (and elaborated in Sections 2.2 and 2.3), which clearly capture the fundamental nature of NP. Indeed, we advocate communicating the fundamental nature of the P-vs-NP Question by using two equivalent formulations, which refer to search problems (Section 2.2) and decision problems (Section 2.3), respectively.

On the Search Problems' Formulation. Complexity theorists are so accustomed to focusing on decision problems that they seem to forget that search problems are at least as natural as decision problems. Furthermore, to many non-experts, search problems may seem even more natural than decision problems: Typically, people seek solutions more often than they pause to wonder whether or not solutions exist. Thus, we recommend starting with a formulation of the P-vs-NP Question in terms of search problems. Admittedly, the cost is more cumbersome formulations, but it is more than worthwhile. In order to reflect the importance of the search version, as well as to facilitate less cumbersome formulations, we chose to introduce concise notations for the two classes of search problems that correspond to P and NP: These classes are denoted PF and PC (standing for Polynomial-time Find and Polynomial-time Check, respectively). The teacher may prefer using notations and terms that are more evocative of P and NP (such as P-search and NP-search), and actually we also do so in some motivational discussions.
(Still, in our opinion, in the long run, the students and the field may be served better by using standard-looking notations.)^1

On the Decision Problems' Formulation. When presenting the P-vs-NP Question in terms of decision problems, we define NP as a class of sets having efficiently verifiable proofs of membership (see Definition 2.5). This definition clarifies the fundamental nature of the class NP, but is admittedly more cumbersome than the more traditional definition of NP in terms of fictitious "non-deterministic machines" (see Definition 2.7). Although Definitions 2.5 and 2.7 are equivalent (see Theorem 2.8), we believe that it is important to present NP as in Definition 2.5. Conceptually, this is the right choice because Definition 2.5 clarifies the fundamental nature of the class NP, whereas Definition 2.7 fails to do so. Indeed, a fictitious model can provide a basis for a sound definition, but it typically fails to provide motivation for its study (which may be provided by an equivalence to a natural definition). Furthermore, not all sound definitions are equally accessible. Specifically, many students find Definition 2.7 quite confusing, because they assume that it represents some natural model of computation, and consequently they allow themselves to be fooled by their intuition regarding such models. (Needless to say, the students' intuition regarding computation is irrelevant when applied to a fictitious model.) Thus, Definition 2.5 is also preferable to Definition 2.7 from a technical point of view.

^1 Indeed, these classes are often denoted FP and FNP, respectively. (We mention that "F" stands for function(s), although the definitions actually refer to binary relations.) However, since these notations are not widely used (and since they are somewhat misleading), we preferred to introduce new notations (which we consider better).

2 The P versus NP Question
2.1 Efficient Computation

As hinted in the foregoing discussions, much of Complexity Theory is concerned with efficient algorithms. The latter are defined as polynomial-time algorithms (i.e., algorithms whose time complexity is upper-bounded by a polynomial in the length of the input). By the Cobham–Edmonds Thesis (see Section 1.3.5), the definition of this class is invariant under the choice of a "reasonable and general" model of computation. The association of efficient algorithms with polynomial-time computation is grounded in the following two considerations:

• Philosophical consideration: Intuitively, efficient algorithms are those that can be implemented within a number of steps that is a moderately growing function of the input length. To allow for reading the entire input, at least linear time should be allowed. On the other hand, apparently slow algorithms, and in particular "exhaustive search" algorithms, which take exponential time, must be avoided. Furthermore, a good definition of the class of efficient algorithms should be closed under natural composition of algorithms (as well as be robust with respect to reasonable models of computation and with respect to simple changes in the encoding of problems' instances). Choosing polynomials as the set of time bounds for efficient algorithms satisfies all the foregoing requirements: Polynomials constitute a "closed" set of moderately growing functions, where "closure" means closure under addition, multiplication, and functional composition. These closure properties guarantee the closure of the class of efficient algorithms under natural composition of algorithms (as well as its robustness with respect to any reasonable and general model of computation). Furthermore, polynomial-time algorithms can conduct computations that are apparently simple (although not necessarily trivial), and on the other hand they do not include algorithms that are apparently inefficient (like exhaustive search).

• Empirical consideration: It is clear that algorithms that are considered efficient in practice have running time that is bounded by a small polynomial (at least on the inputs that occur in practice). The question is whether any polynomial-time algorithm can be considered efficient in an intuitive sense. The belief, which is supported by past experience, is that every natural problem that can be solved in polynomial time also has a "reasonably efficient" algorithm.

Although the association of efficient algorithms with polynomial-time computation is central to our exposition, we wish to highlight the fact that this association is not the source of any of the phenomena discussed in this book. That is, the same phenomena also occur when using other reasonable interpretations of the concept of efficient algorithms. A related comment applies to the formulation of computational problems that refer only to instances of a certain predetermined type. Both issues are discussed further in the following advanced comments.

On Other Notions of Efficient Algorithms. We stress that the association of efficient algorithms with polynomial-time computation is not essential to most of the notions, results, and questions of Complexity Theory.
Any other class of algorithms that supports the aforementioned closure properties and allows for conducting some simple computations but not overly complex ones gives rise to a similar theory, albeit the formulation of such a theory may be more complicated. Specifically, all results and questions treated in this book are concerned with the relation among the complexities of different computational tasks (rather than with providing absolute assertions about the complexity of some computational tasks). These relations can be stated explicitly, by stating how any upper bound on the time complexity of one task gets translated to an upper bound on the time complexity of another task.^2 Such cumbersome statements will maintain the contents of the standard statements; they will merely be much more complicated. Thus, we follow the tradition of focusing on polynomial-time computations, while stressing that this focus both is natural and provides the simplest way of addressing the fundamental issues underlying the nature of efficient computation.

On the Representation of Problem Instances. As noted in Section 1.2.3, many natural (search and decision) problems are captured more naturally by the terminology of promise problems (cf. Section 5.1), where the domain of possible instances is a subset of {0, 1}∗ rather than {0, 1}∗ itself. For example, computational problems in graph theory presume some simple encoding of graphs as strings, but this encoding is typically not onto (i.e., not all strings encode graphs), and thus not all strings are legitimate instances. However, in these cases, the set of legitimate instances (e.g., encodings of graphs) is efficiently recognizable (i.e., membership in it can be decided in polynomial time). Thus, artificially extending the set of instances to the set of all possible strings (and allowing trivial solutions for the corresponding dummy instances) does not change the complexity of the original problem. We discuss this issue further in Section 5.1.

Summary. We associate efficient computation with polynomial-time algorithms.^3 Recall that this association is justified by the fact that polynomials are moderately growing functions and the set of polynomials is closed under operations that correspond to the natural composition of algorithms. Furthermore, the class of polynomial-time algorithms is independent of the specific model of computation, as long as the latter is "reasonable" (cf. the Cobham–Edmonds Thesis).

A Word About Inefficient Computations and Intractability. Computations requiring more than polynomial time are considered inefficient or intractable. We typically refer to these terms only in motivational discussions, when discussing tasks that cannot be performed by efficient algorithms. Our focus is on efficient computations, and the technical presentation refers only to them.

^2 For example, the NP-completeness of SAT (cf. Theorem 4.6) implies that any algorithm solving SAT in time T yields an algorithm that factors composite numbers in time T′ such that T′(n) = poly(n) · (1 + T(poly(n))). More generally, if the correctness of solutions for n-bit instances of some search problem R can be verified in time t(n), then the hypothesis regarding SAT implies that solutions (for n-bit instances of R) can be found in time T′ such that T′(n) = t(n) · (1 + T(O(t(n))²)).

^3 Advanced comment: In this book, we consider deterministic (polynomial-time) algorithms as the basic model of efficient computation. A more liberal view includes also probabilistic (polynomial-time) algorithms (see [25] or [13, Chap. 6]). We stress that the most important facts and questions that are addressed in the current book have parallels with respect to probabilistic polynomial-time algorithms.
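The closure of polynomial time bounds under composition, invoked in Section 2.1, can be checked concretely. The following sketch is our own illustration (the function names and the coefficient-list encoding of polynomials are ours, not the book's):

```python
# Sketch: composing two polynomial time bounds yields a polynomial bound,
# illustrating the closure property invoked in Section 2.1.
# A polynomial is encoded as a list of coefficients, lowest degree first.

def compose(p, q):
    """Return the coefficient list of p(q(n)), via Horner's rule over
    polynomial arithmetic."""
    def poly_mul(a, b):
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] += ai * bj
        return out

    def poly_add(a, b):
        n = max(len(a), len(b))
        return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                for i in range(n)]

    # Horner: p(q) = p0 + q*(p1 + q*(p2 + ...))
    result = [p[-1]]
    for coeff in reversed(p[:-1]):
        result = poly_add([coeff], poly_mul(result, q))
    return result

# An algorithm running in time 2n^3, applied to inputs produced by an
# algorithm running in time n^2, is still polynomial overall:
outer = [0, 0, 0, 2]   # 2n^3
inner = [0, 0, 1]      # n^2
print(compose(outer, inner))  # [0, 0, 0, 0, 0, 0, 2], i.e., 2n^6
```

For instance, composing the bound 2n³ with n² yields 2n⁶, still a polynomial; no analogous closure holds for, say, exponential time bounds, which is one reason polynomials make a robust choice.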
2.2 The Search Version: Finding versus Checking

Much of computer science is concerned with solving various search problems (as in Definition 1.1). A few examples, which will serve us throughout the book, are presented next.^4 In each of these examples, if no solution exists, then the solver should indicate that this is the case.

• Solving linear (or polynomial) systems of equations: Given a system of linear (or polynomial) equations, find an assignment to the variables that satisfies all equations. Formulae satisfiability is a related problem in which one is given a Boolean formula and is required to find an assignment that satisfies it. (When the formula is in CNF, this can be viewed as finding an assignment that satisfies a system of Boolean equations (which arise from the individual clauses).)

• Integer factorization: Given a natural number, find a nontrivial factor of this number.

• Finding a spanning tree: Given a (connected) graph, find a spanning tree in it (i.e., a connected subgraph that contains all vertices of the original graph but contains no simple cycles).

• Finding a Hamiltonian path (or cycle): Given a (connected) graph, find a simple path (cycle) that traverses all the vertices of the graph. Indeed, a Hamiltonian path is a spanning tree in which each intermediate vertex has degree 2.

• The traveling salesman problem (TSP): Given a matrix of distances between cities and a threshold, find a tour that passes through all cities and covers a total distance that does not exceed the threshold. Indeed, the Hamiltonian cycle problem is a special case of TSP, where the distances are in {0, 1} and represent the existence of the various edges in the graph.^5

• Job scheduling: This term actually refers to a variety of problems, in which one is given a set of scheduling constraints and is required to find a scheduling of jobs to machines such that the given constraints are all satisfied.
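To make the "solutions are easy to check" flavor of these examples concrete, here are two solution checkers. This is our own sketch, not the book's code; the input encodings (a graph as a dict of neighbor sets, distances as a list of lists) are our assumptions:

```python
# Hypothetical sketches of how candidate solutions to two of the problems
# above can be checked efficiently, once a candidate solution is given.

def is_hamiltonian_path(graph, path):
    """graph: dict mapping each vertex to the set of its neighbors.
    path: sequence of vertices. Accepts iff path visits every vertex
    exactly once and consecutive vertices are adjacent."""
    if set(path) != set(graph) or len(path) != len(graph):
        return False
    return all(v in graph[u] for u, v in zip(path, path[1:]))

def tsp_tour_within(dist, tour, threshold):
    """dist: square matrix of distances. tour: purported permutation of
    range(len(dist)). Accepts iff tour visits every city exactly once and
    its total length (returning to the start) does not exceed threshold."""
    n = len(dist)
    if sorted(tour) != list(range(n)):
        return False
    total = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return total <= threshold
```

Both checkers run in time polynomial in the size of their inputs, even though no efficient algorithm is known for *finding* the corresponding solutions.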
^4 See Appendix for further details.
^5 That is, in the TSP instance, the distance between i and j equals 1 if {i, j} is an edge in the graph, and equals 0 otherwise.

In addition to the dominant role of search problems in computer science, solving search problems corresponds to the daily notion of "solving problems." Thus, search problems are of natural general interest. In the current section, we will consider the question of which search problems can be solved efficiently. Indeed, efficiently solvable search problems are the subject matter of most basic courses on algorithmic design. Examples include sorting, finding patterns in strings, finding (rational) solutions to linear systems of (rational) equations,
finding shortest paths in graphs, and many other graph-theoretic search problems. In contrast to these courses, our focus will be on search problems that cannot be solved efficiently.

A Necessary Condition for Efficient Solvability. One type of search problems that cannot be solved efficiently consists of those for which the solutions are too long in terms of the length of the problem's instances. In such a case, merely typing the solution amounts to an activity that is deemed inefficient, and so this case is not really interesting (from a computational point of view). Thus, we consider only search problems in which the length of the solution is bounded by a polynomial in the length of the instance. Recalling that search problems are associated with binary relations (see Definition 1.1), we focus our attention on polynomially bounded relations.

Definition 2.1 (polynomially bounded relations): We say that R ⊆ {0, 1}∗ × {0, 1}∗ is polynomially bounded if there exists a polynomial p such that for every (x, y) ∈ R it holds that |y| ≤ p(|x|).

Recall that (x, y) ∈ R means that y is a solution to the problem instance x, where R represents the problem itself. For example, in the case of finding a prime factor of a given integer, we refer to a relation R such that (x, y) ∈ R if the integer y is a prime factor of the integer x. Likewise, in the case of finding a spanning tree in a given graph, we refer to a relation R such that (x, y) ∈ R if y is a spanning tree of the graph x. For a polynomially bounded relation R it makes sense to ask whether or not, given a problem instance x, one can efficiently find an adequate solution y (i.e., find y such that (x, y) ∈ R). The polynomial bound on the length of the solution (i.e., |y|) guarantees that a negative answer is not merely due to the length of the required solution.
2.2.1 The Class P as a Natural Class of Search Problems

Recall that we are interested in the class of search problems that can be solved efficiently, that is, problems for which solutions (whenever they exist) can be found efficiently. Restricting our attention to polynomially bounded relations, we identify the corresponding fundamental class of search problems (or binary relations), denoted PF (standing for "Polynomial-time Find"). (The relationship between PF and the standard definition of P will be discussed in Sections 2.4 and 3.3.) The following definition refers to the formulation of solving search problems provided in Definition 1.1.
Definition 2.2 (efficiently solvable search problems):

• The search problem of a polynomially bounded relation R ⊆ {0, 1}∗ × {0, 1}∗ is efficiently solvable if there exists a polynomial-time algorithm A such that, for every x, it holds that if R(x) = {y : (x, y) ∈ R} is not empty, then A(x) ∈ R(x), and otherwise A(x) = ⊥ (indicating that x has no solution).^6

• We denote by PF the class of (polynomially bounded) search problems that are efficiently solvable. That is, R ∈ PF if R is polynomially bounded and there exists a polynomial-time algorithm that solves R.

Note that R(x) denotes the set of valid solutions for the problem instance x. Thus, the solver A is required to find a valid solution (i.e., satisfy A(x) ∈ R(x)) whenever such a solution exists (i.e., R(x) is not empty). On the other hand, if the instance x has no solution (i.e., R(x) = ∅), then clearly A(x) ∉ R(x). The extra condition (also made in Definition 1.1) requires that in this case A(x) = ⊥. Thus, algorithm A always outputs a correct answer, which is a valid solution in the case that such a solution exists (and provides an indication that no solution exists otherwise).

We have defined a fundamental class of problems, and we do know of many natural problems in this class (e.g., solving linear equations over the rationals, finding shortest paths in graphs, finding patterns in strings, finding a perfect matching in a graph, and a variety of other search problems that are the focus of various courses on algorithms). However, these facts per se do not mean that we are able to characterize natural problems with respect to membership in this class. For example, we do not know whether or not the problem of finding the prime factors of a given integer is in this class (i.e., in PF). In fact, currently, we do not have a good understanding regarding the actual contents of the class PF; that is, we are unable to characterize many natural problems with respect to membership in this class. This situation is quite common in Complexity Theory, and seems to be a consequence of the fact that complexity classes are defined in terms of the "external behavior" (of potential algorithms), rather than in terms of the "internal structure" (of the problem). Turning back to PF, we note that while it contains many natural search problems, there are also many natural search problems that are not known to be in PF. A natural class containing a host of such problems is presented next.

^6 Recall that by Definition 1.1 this means that A solves R.
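In the spirit of Definition 2.2, a solver for the spanning-tree search problem (one of the natural problems in PF) can be sketched as follows. This is our own illustration: the graph encoding (a dict of neighbor sets) and the use of None to play the role of the symbol ⊥ are our choices, not the book's:

```python
# A minimal sketch of an efficient solver as in Definition 2.2: finding a
# spanning tree of a given graph, with None standing in for ⊥ (here, an
# instance has no solution exactly when the graph is disconnected).

from collections import deque

def spanning_tree(graph):
    """graph: dict mapping each vertex to the set of its neighbors.
    Returns a list of tree edges if the graph is connected, else None."""
    if not graph:
        return []
    start = next(iter(graph))
    seen = {start}
    tree = []
    queue = deque([start])
    while queue:                      # breadth-first search
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))   # edge discovered for the tree
                queue.append(v)
    # all vertices reached <=> a spanning tree exists
    return tree if len(seen) == len(graph) else None
```

The solver runs in time linear in the size of the graph, so this search problem is in PF; the point of the coming sections is that for many equally natural relations no such solver is known.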
2.2.2 The Class NP as Another Natural Class of Search Problems

Natural search problems have the property that valid solutions (for them) can be efficiently recognized. That is, given an instance x of the problem R and a candidate solution y, one can efficiently determine whether or not y is a valid solution for x (with respect to the problem R, i.e., whether or not y ∈ R(x)). For example, candidate solutions for a system of linear (or polynomial) equations can be easily verified for validity by instantiation and arithmetic manipulation. Likewise, it is easy to verify whether a given sequence of vertices constitutes a Hamiltonian path in a given graph.

The class of all search problems allowing for efficiently recognizable (valid) solutions is a natural class per se, because it is not clear why one should care about a solution unless one can recognize a valid solution once given one. Furthermore, this class is a natural domain of candidates for PF, because the ability to efficiently recognize a valid solution seems to be a natural (albeit not absolutely necessary) prerequisite for a discussion regarding the complexity of finding such solutions.

We restrict our attention again to polynomially bounded relations, and consider the class of relations for which membership of pairs in the relation can be decided efficiently. We stress that we consider deciding membership of given pairs of the form (x, y) in a fixed relation R, and not deciding membership of x in the set S_R = {x : R(x) ≠ ∅}. (The relationship between the following definition and the standard definition of NP will be discussed in Sections 2.4–2.6 and 3.3.)

Definition 2.3 (search problems with efficiently checkable solutions):

• The search problem of a polynomially bounded relation R ⊆ {0, 1}∗ × {0, 1}∗ has efficiently checkable solutions if there exists a polynomial-time algorithm A such that, for every x and y, it holds that A(x, y) = 1 if and only if (x, y) ∈ R.

• We denote by PC (standing for "Polynomial-time Check") the class of search problems that correspond to polynomially bounded binary relations that have efficiently checkable solutions. That is, R ∈ PC if the following two conditions hold:

1. For some polynomial p, if (x, y) ∈ R then |y| ≤ p(|x|).
2. There exists a polynomial-time algorithm that given (x, y) determines whether or not (x, y) ∈ R.

Note that the algorithm postulated in Item 2 must also handle inputs of the form (x, y) such that |y| > p(|x|). Such inputs, which are evidently not in R (by Item 1), are easy to handle by merely computing |x|, |y|, and p(|x|).
Thus, the crux of Item 2 is typically in the case that the input (x, y) satisfies |y| ≤ p(|x|). The class PC contains thousands of natural problems (e.g., finding a traveling salesman tour of length that does not exceed a given threshold, finding the prime factorization of a given composite, finding a truth assignment that satisfies a given Boolean formula, etc.). In each of these natural problems, the correctness of solutions can be checked efficiently (e.g., given a traveling salesman tour, it is easy to compute its length and check whether or not it exceeds the given threshold); see Exercise 2.4. The class PC is the natural domain for the study of which problems are in PF, because the ability to efficiently recognize a valid solution is a natural prerequisite for a discussion regarding the complexity of finding such solutions. We warn, however, that PF contains (unnatural) problems that are not in PC (see Exercise 2.2).
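A checker in the sense of Definition 2.3, for the satisfiability relation, can be sketched as follows. This is our own illustration; the DIMACS-style encoding of CNF formulae (a list of clauses, each a list of signed integers) is our assumption:

```python
# Sketch of a PC-checker for the relation
#   R = {(formula, assignment) : assignment satisfies formula},
# with the CNF formula encoded DIMACS-style: a list of clauses, each a list
# of nonzero integers, where -i denotes the negation of variable i.

def check_sat(cnf, assignment):
    """assignment: dict mapping variable index to True/False.
    Returns True iff every clause contains at least one satisfied literal."""
    def lit_true(lit):
        val = assignment.get(abs(lit), False)
        return val if lit > 0 else not val
    return all(any(lit_true(lit) for lit in clause) for clause in cnf)
```

The checker runs in time linear in the lengths of the formula and the assignment, and an assignment to the formula's variables has length polynomial (indeed, linear) in the length of the formula; so this relation is in PC, although no polynomial-time algorithm is known for finding a satisfying assignment.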
2.2.3 The P versus NP Question in Terms of Search Problems

Is it the case that every search problem in PC is in PF? That is, is it the case that the ability to efficiently check the correctness of solutions, with respect to some (polynomially bounded) relation R, implies the ability to find solutions with respect to R? In other words, if it is easy to check whether or not a given solution for a given instance is correct, then is it also easy to find a solution to a given instance?

If PC ⊆ PF, then this would mean that whenever solutions to given instances can be efficiently checked (for correctness), it is also the case that such solutions can be efficiently found (when given only the instance). This would mean that all reasonable search problems (i.e., all problems in PC) are easy to solve. Needless to say, such a situation would contradict the intuitive feeling (and the daily experience) that some reasonable search problems are hard to solve. Furthermore, in such a case, the notion of "solving a problem" would lose its meaning (because finding a solution will not be significantly more difficult than checking its validity).

On the other hand, if PC \ PF ≠ ∅, then there exist reasonable search problems (i.e., some problems in PC) that are hard to solve. This conforms with our basic intuition by which some reasonable problems are easy to solve whereas others are hard to solve. Furthermore, it reconfirms the intuitive gap between the notions of solving and checking (asserting that at least in some cases "solving" is significantly harder than "checking").

To illustrate the foregoing paragraph, consider various puzzles like jigsaw puzzles, mazes, crossword puzzles, Sudoku puzzles, and so on. In each of these puzzles, checking the correctness of a solution is very easy, whereas finding a solution is sometimes extremely hard.

As was mentioned in the various overviews, it is widely believed that finding solutions to search problems is, in general, harder than verifying the correctness of such solutions; that is, it is widely believed that PC \ PF ≠ ∅. However, as also mentioned before, this is only a belief, not a fact. For further discussion see Section 2.7.
2.3 The Decision Version: Proving versus Verifying

As we shall see in Section 2.4 (and further in Section 3.3), the study of search problems (e.g., the PC-vs-PF Question) can be "reduced" to the study of decision problems. Since the latter problems have a less cumbersome terminology, Complexity Theory tends to focus on them (and maintains its relevance to the study of search problems via the aforementioned reduction). Thus, the study of decision problems provides a convenient way for studying search problems. For example, the study of the complexity of deciding the satisfiability of Boolean formulae provides a convenient way for studying the complexity of finding satisfying assignments for such formulae.

We wish to stress, however, that decision problems are interesting and natural per se (i.e., beyond their role in the study of search problems). After all, some people do care about the truth, and so determining whether certain claims are true is a natural computational problem. Specifically, determining whether a given object (e.g., a Boolean formula) has some predetermined property (e.g., is satisfiable) constitutes an appealing computational problem. The P-vs-NP Question refers to the complexity of solving such problems for a wide and natural class of properties associated with the class NP. The latter class refers to properties that have "efficient proof systems" allowing for the verification of the claim that a given object has a predetermined property (i.e., is a member of a predetermined set). Jumping ahead, we mention that the P-vs-NP Question refers to the question of whether properties that have efficient proof systems can also be decided efficiently (without proofs). Let us clarify all of these notions.

Properties of objects are modeled as subsets of the set of all possible objects (i.e., a property is associated with the set of objects having this property). For example, the property of being a prime is associated with the set of prime numbers, and the property of being connected (resp., having a Hamiltonian path) is associated with the set of connected (resp., Hamiltonian) graphs. Thus, we focus on deciding membership in sets (as in Definition 1.2). The standard formulation of the P-vs-NP Question refers to the questionable equality of two natural classes of decision problems, denoted P and NP (and defined in Section 2.3.1 and Section 2.3.2, respectively).
2.3.1 The Class P as a Natural Class of Decision Problems

Needless to say, we are interested in the class of decision problems that are efficiently solvable. This class is traditionally denoted P (standing for Polynomial-time). The following definition refers to the formulation of solving decision problems (provided in Definition 1.2).

Definition 2.4 (efficiently solvable decision problems):

• A decision problem S ⊆ {0, 1}∗ is efficiently solvable if there exists a polynomial-time algorithm A such that, for every x, it holds that A(x) = 1 if and only if x ∈ S.

• We denote by P the class of decision problems that are efficiently solvable.

Without loss of generality, for an algorithm A as in the first item, it holds that A(x) = 0 whenever x ∉ S, because we can modify any output different from 1 to 0. (Thus, A solves the decision problem S as per Definition 1.2.)

As in the case of Definition 2.2, we have defined a fundamental class of problems, which contains many natural problems (e.g., determining whether or not a given graph is connected), but we do not have a good understanding regarding its actual contents (i.e., we are unable to characterize many natural problems with respect to membership in this class). In fact, there are many natural decision problems that are not known to reside in P, and a natural class containing a host of such problems is presented next. This class of decision problems is denoted NP (for reasons that will become evident in Section 2.6).
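The connectivity example just mentioned can be sketched as a decision procedure in the sense of Definition 2.4. This is our own illustration, under our usual (assumed) dict-of-neighbor-sets graph encoding:

```python
# Sketch of an efficiently solvable decision problem (Definition 2.4):
# deciding whether a given graph is connected, in time polynomial
# (in fact, linear) in the size of the graph, via depth-first search.

def is_connected(graph):
    """graph: dict mapping each vertex to the set of its neighbors.
    Returns True iff every vertex is reachable from every other vertex."""
    if not graph:
        return True
    seen = set()
    stack = [next(iter(graph))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(graph[u] - seen)
    return len(seen) == len(graph)
```

Thus, the set of connected graphs is in P; by contrast, no analogous polynomial-time decider is known for, say, the set of Hamiltonian graphs.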
2.3.2 The Class NP and NP-Proof Systems

Whenever deciding on our own seems hard, it is natural to seek help (e.g., advice) from others. In the context of verifying that an object has a predetermined property (or belongs to a predetermined set), help may take the form of a proof, where proofs should be thought of as advice that can be evaluated for correctness. Indeed, a natural class of decision problems that arises is the class, denoted NP, of all sets such that membership (of each instance) in each set can be verified efficiently with the help of an adequate proof. Thus, we define NP as the class of decision problems that have efficiently verifiable proof systems. This definitional path requires clarifying the notion of a proof system.
Loosely speaking, we say that a set S has a proof system if instances in S have valid proofs of membership (i.e., proofs accepted as valid by the system), whereas instances not in S have no valid proofs. Indeed, proofs are defined as strings that (when accompanying the instance) are accepted by the (efficient) verification procedure. That is, we say that V is a verification procedure for membership in S if it satisfies the following two conditions:

1. Completeness: True assertions have valid proofs (i.e., proofs accepted as valid by V). Bearing in mind that assertions refer to membership in S, this means that for every x ∈ S there exists a string y such that V(x, y) = 1; that is, V accepts y as a valid proof for the membership of x in S.

2. Soundness: False assertions have no valid proofs. That is, for every x ∉ S and every string y it holds that V(x, y) = 0, which means that V rejects y as a proof for the membership of x in S.

We note that the soundness condition captures the "security" of the verification procedure, that is, its ability not to be fooled (by anything) into accepting a wrong assertion. The completeness condition captures the "viability" of the verification procedure, that is, its ability to be convinced of any valid assertion (when presented with an adequate proof). We stress that, in general, proof systems are defined in terms of their verification procedures, which must satisfy adequate completeness and soundness conditions. Our focus here is on efficient verification procedures that utilize relatively short proofs (i.e., proofs whose length is polynomially bounded by the length of the corresponding assertion).^7

^7 Advanced comment: In continuation of footnote 3, we note that in this book we consider deterministic (polynomial-time) verification procedures, and consequently the completeness and soundness conditions that we state here are errorless. In contrast, we mention that various types of probabilistic (polynomial-time) verification procedures, as well as probabilistic completeness and soundness conditions, are also of interest (see Section 4.3.5 and [13, Chap. 9]). A common theme that underlies both treatments is that efficient verification is interpreted as meaning verification by a process that runs in time that is polynomial in the length of the assertion. In the current book, we use the equivalent formulation that considers the running time as a function of the total length of the assertion and the proof, but require that the latter has length that is polynomially bounded by the length of the assertion. (The latter issue is discussed in Section 2.5.)

Let us consider a couple of examples before turning to the actual definition (of efficiently verifiable proof systems). Starting with the set of Hamiltonian graphs, we note that this set has a verification procedure that, given a pair (G, π), accepts if and only if π is a Hamiltonian path in the graph G. In this case, π serves as a proof that G is Hamiltonian. Note that such proofs are relatively short (i.e., the path is actually shorter than the description of the graph) and are easy to verify. Needless to say, this proof system satisfies the aforementioned completeness and soundness conditions. Turning to the case of satisfiable Boolean formulae, given a formula φ and a truth assignment τ, the verification procedure instantiates φ (according to τ), and accepts if and only if simplifying the resulting Boolean expression yields the value true. In this case, τ serves as a proof that φ is satisfiable, and the alleged proofs are indeed relatively short and easy to verify.

Definition 2.5 (efficiently verifiable proof systems):

• A decision problem S ⊆ {0, 1}∗ has an efficiently verifiable proof system if there exists a polynomial p and a polynomial-time (verification) algorithm V such that the following two conditions hold:

1. Completeness: For every x ∈ S, there exists y of length at most p(|x|) such that V(x, y) = 1. (Such a string y is called an NP-witness for x ∈ S.)

2. Soundness: For every x ∉ S and every y, it holds that V(x, y) = 0.

Thus, x ∈ S if and only if there exists y of length at most p(|x|) such that V(x, y) = 1. In such a case, we say that S has an NP-proof system, and refer to V as its verification procedure (or as the proof system itself).

• We denote by NP the class of decision problems that have efficiently verifiable proof systems.

We note that the term NP-witness is commonly used.^8 In some cases, V (or the set of pairs accepted by V) is called a witness relation of S. We stress that the same set S may have many different NP-proof systems (see Exercise 2.5), and that in some cases the difference is quite fundamental (see Exercise 2.6). Typically, for natural decision problems in NP, it is easy to show that these problems are in NP by using Definition 2.5. This is done by designing adequate NP-proofs of membership, which are typically quite straightforward, because natural decision problems are typically phrased as asking about the existence of a structure (or an object) that can be easily verified as valid.
For example, SAT is defined as the set of satisfiable Boolean formulae, which means asking about the existence of satisfying assignments. Indeed, we can efficiently check whether a given assignment satisfies a given formula, which means that we have (a verification procedure for) an NP-proof system for SAT. Likewise, Hamiltonian graphs are defined as graphs containing simple paths that pass through all vertices.
8
In most cases, this is done without explicitly defining V , which is understood from the context and/or by common practice. In many texts, V is not called a proof system (nor a verification procedure of such a system), although this term is most adequate.
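As a concrete illustration of the foregoing SAT example, the verification procedure can be sketched as follows (a minimal sketch; the CNF representation of formulae is an assumption made for concreteness, whereas the text speaks of general Boolean formulae):

```python
def verify_sat(phi, tau):
    """Polynomial-time check that assignment tau satisfies CNF formula phi.

    phi is a list of clauses; each clause is a list of signed variable
    indices (a positive literal denotes the variable, a negative literal
    its negation).  tau maps variable indices to booleans.
    """
    for clause in phi:
        # a clause is satisfied iff at least one of its literals is true
        if not any(tau[abs(lit)] == (lit > 0) for lit in clause):
            return 0  # reject: this clause is falsified by tau
    return 1  # accept: tau satisfies every clause

# (x1 or not x2) and (x2 or x3)
phi = [[1, -2], [2, 3]]
good = {1: True, 2: True, 3: False}   # an NP-witness for phi's satisfiability
bad = {1: False, 2: True, 3: False}   # not a satisfying assignment
```

Note that the running time is linear in the length of the pair (φ, τ), in accordance with Definition 2.5.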
2 The P versus NP Question
Note that for any search problem R in PC, the set of instances that have a solution with respect to R (i.e., the set SR def= {x : R(x) ≠ ∅}) is in N P. Specifically, for any R ∈ PC, consider the verification procedure V such that V(x, y) def= 1 if and only if (x, y) ∈ R, and note that the latter condition can be decided in poly(|x|)-time. Thus, any search problem in PC can be viewed as a problem of searching for (efficiently verifiable) proofs (i.e., NP-witnesses for membership in the set of instances having solutions). On the other hand, any NP-proof system gives rise to a natural search problem in PC, that is, the problem of searching for a valid proof (i.e., an NP-witness) for the given instance. (Specifically, the verification procedure V yields the search problem that corresponds to R = {(x, y) : V(x, y) = 1}.) Thus, S ∈ N P if and only if there exists R ∈ PC such that S = {x : R(x) ≠ ∅}.

The last paragraph suggests another easy way of showing that natural decision problems are in N P: just thinking of the corresponding natural search problem. The point is that natural decision problems (in N P) are phrased as referring to whether a solution exists for the corresponding natural search problem. (For example, in the case of SAT, the question is whether there exists a satisfying assignment to a given Boolean formula, and the corresponding search problem is finding such an assignment.) In all these cases, it is easy to check the correctness of solutions; that is, the corresponding search problem is in PC, which implies that the decision problem is in N P.

Observe that P ⊆ N P holds: A verification procedure for claims of membership in a set S ∈ P may just ignore the alleged NP-witness and run the decision procedure that is guaranteed by the hypothesis S ∈ P; that is, we may let V(x, y) = A(x), where A is the aforementioned decision procedure.
Indeed, the latter verification procedure is quite an abuse of the term (because it makes no use of the proof); however, it is a legitimate one. As we shall shortly see, the P-vs-NP Question refers to the question of whether such proof-oblivious verification procedures can be used for every set that has some efficiently verifiable proof system. (Indeed, given that P ⊆ N P holds, the P-vs-NP Question is whether or not N P ⊆ P.)
2.3.3 The P versus NP Question in Terms of Decision Problems

Is it the case that NP-proofs are useless? That is, is it the case that for every efficiently verifiable proof system, one can easily determine the validity of assertions without looking at the proof? If that were the case, then proofs would be meaningless, because they would offer no fundamental advantage over directly determining the validity of the assertion. The conjecture P ≠ N P asserts that proofs are useful: There exist sets in N P that cannot be decided by
a polynomial-time algorithm, which means that for these sets, obtaining a proof of membership (for some instances) is useful (because we cannot efficiently determine membership in these sets by ourselves).

In the foregoing paragraph, we viewed P ≠ N P as asserting the advantage of obtaining proofs over deciding the truth by ourselves. That is, P ≠ N P asserts that (at least in some cases) verifying is easier than deciding. A slightly different perspective is that P ≠ N P asserts that finding proofs is harder than verifying their validity. This is the case because, for any set S that has an NP-proof system, the ability to efficiently find proofs of membership with respect to this system (i.e., finding an NP-witness of membership in S for any given x ∈ S) yields the ability to decide membership in S. Thus, for S ∈ N P \ P, it must be harder to find proofs of membership in S than to verify the validity of such proofs (which can be done in polynomial time). As was mentioned in the various overviews, it is widely believed that P ≠ N P. For further discussion see Section 2.7.
2.4 Equivalence of the Two Formulations

As hinted several times, the two formulations of the P-vs-NP Question are equivalent. That is, every search problem having efficiently checkable solutions is solvable in polynomial time (i.e., PC ⊆ PF) if and only if membership in any set that has an NP-proof system can be decided in polynomial time (i.e., N P ⊆ P). Recalling that P ⊆ N P (whereas PF is not contained in PC; see Exercise 2.2), we prove the following.

Theorem 2.6: PC ⊆ PF if and only if P = N P.

Proof: Suppose, on the one hand, that the inclusion holds for the search version (i.e., PC ⊆ PF). We will show that for any set in N P, this hypothesis implies the existence of an efficient algorithm for finding NP-witnesses for this set, which in turn implies that this set is in P. Specifically, let S be an arbitrary set in N P, and V be the corresponding verification procedure (i.e., satisfying the conditions in Definition 2.5). Without loss of generality, there exists a polynomial p such that V(x, y) = 1 holds only if |y| ≤ p(|x|). Considering the (polynomially bounded) relation
R def= {(x, y) : V(x, y) = 1},
(2.1)
note that R is in PC (since V decides membership in R). Using the hypothesis PC ⊆ PF, it follows that the search problem of R is solvable in polynomial
Input: x
Subroutine: a solver A for the search problem of R.
Alternative 1: Output 1 if A(x) ≠ ⊥ and 0 otherwise.
Alternative 2: Output V(x, A(x)).
Figure 2.1. Solving S by using a solver for R.
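Figure 2.1 can be sketched in code as follows (a minimal sketch; the witness relation used here — square roots witnessing perfect squares — is an assumption for illustration only, and None plays the role of ⊥):

```python
import math

# Toy witness relation R (an assumption for illustration):
# (n, m) is in R iff m * m == n, i.e., m witnesses that n is a perfect square.

def solver_A(n):
    """Solve the search problem of R: return a witness, or None for bottom."""
    m = math.isqrt(n)
    return m if m * m == n else None

def decide_S(n):
    """Alternative 1 of Figure 2.1: accept iff the solver finds a solution."""
    return 1 if solver_A(n) is not None else 0
```

For example, decide_S accepts 49 (witnessed by 7) and rejects 50.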
time. Denoting by A the polynomial-time algorithm solving the search problem of R, we decide membership in S in the obvious way: That is, on input x, we output 1 if and only if A(x) ≠ ⊥. Note that A(x) ≠ ⊥ holds if and only if A(x) ∈ R(x), which in turn occurs if and only if R(x) ≠ ∅ (equiv., x ∈ S).9 Thus, S ∈ P. Since we started with an arbitrary set in N P, it follows that N P ⊆ P (and N P = P).

Suppose, on the other hand, that N P = P. We will show that for any search problem in PC, this hypothesis implies an efficient algorithm for determining whether a given string y' is a prefix of some solution to a given instance x of this search problem, which in turn yields an efficient algorithm for finding solutions (for this search problem). Specifically, let R be an arbitrary search problem in PC. Considering the set

S'R def= {⟨x, y'⟩ : ∃y'' s.t. (x, y'y'') ∈ R},
(2.2)
note that S'R is in N P (because R ∈ PC). Using the hypothesis N P ⊆ P, it follows that S'R is in P. This yields a polynomial-time algorithm for solving the search problem of R, by extending a prefix of a potential solution bit by bit, while using the decision procedure to determine whether or not the current prefix is valid. That is, on input x, we first check whether or not ⟨x, λ⟩ ∈ S'R, and output ⊥ (indicating R(x) = ∅) in case ⟨x, λ⟩ ∉ S'R. Otherwise, ⟨x, λ⟩ ∈ S'R, and we set y' ← λ. Next, we proceed in iterations, maintaining the invariant that
⟨x, y'⟩ ∈ S'R. In each iteration, we set y' ← y'0 if ⟨x, y'0⟩ ∈ S'R and y' ← y'1 if ⟨x, y'1⟩ ∈ S'R. If none of these conditions hold (which happens after at most polynomially many iterations), then the current y' satisfies (x, y') ∈ R. (An alternative termination condition amounts to checking explicitly whether the current y' satisfies (x, y') ∈ R; see Figure 2.2.) Thus, for every x ∈ SR (i.e., x such that R(x) ≠ ∅), we output some string in R(x). It follows that for an arbitrary R ∈ PC, we have R ∈ PF, and hence PC ⊆ PF.

Reflection. The first part of the proof of Theorem 2.6 associates with each set S in N P a natural relation R (in PC). Specifically, R (as defined in Eq. (2.1))
Indeed, an alternative decision procedure outputs 1 if and only if (x, A(x)) ∈ R, which in turn holds if and only if V (x, A(x)) = 1. The latter alternative appears as Alternative 2 in Figure 2.1.
Input: x
(Checking whether solutions exist)
If ⟨x, λ⟩ ∉ S'R then halt with output ⊥.
(Comment: ⟨x, λ⟩ ∈ S'R if and only if R(x) ≠ ∅.)
(Finding a solution (i.e., a string in R(x) ≠ ∅))
Initialize y' ← λ.
While (x, y') ∉ R repeat
  If ⟨x, y'0⟩ ∈ S'R then y' ← y'0 else y' ← y'1.
  (Comment: Since ⟨x, y'⟩ ∈ S'R but (x, y') ∉ R, either ⟨x, y'0⟩ or ⟨x, y'1⟩ must be in S'R.)
Output y' (which is indeed in R(x)).
Figure 2.2. Solving R by using a solver for S'R.
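The procedure of Figure 2.2 can be sketched as follows (a minimal sketch; the decision procedure for S'R is passed as a function, the empty string plays the role of λ, None plays the role of ⊥, and the toy relation used in the demonstration is an assumption for illustration only):

```python
def solve_R(x, in_SprimeR, in_R):
    """Find some y in R(x), given a decision procedure for S'R (Figure 2.2).

    in_SprimeR(x, prefix) decides whether prefix can be extended to a
    solution for x; in_R(x, y) decides membership of (x, y) in R.
    """
    if not in_SprimeR(x, ""):   # checking whether solutions exist
        return None             # R(x) is empty: output bottom
    y = ""
    while not in_R(x, y):
        # the invariant <x, y> in S'R guarantees that one extension works
        y = y + "0" if in_SprimeR(x, y + "0") else y + "1"
    return y

# Toy relation (an assumption for illustration): R(x) = {x}, so that
# <x, prefix> is in S'R iff prefix is a prefix of x.
def in_R(x, y):
    return y == x

def in_SprimeR(x, prefix):
    return x.startswith(prefix)
```

With the toy relation, solve_R recovers x itself bit by bit, making one oracle query per extended bit, just as in the proof.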
consists of all pairs (x, y) such that y is an NP-witness for membership of x in S. Thus, the search problem of R consists of finding such an NP-witness, when given x as input. Indeed, R is called the witness relation of S, and solving the search problem of R allows for deciding membership in S. Thus, R ∈ PC ⊆ PF implies S ∈ P. In the second part of the proof, we associate with each R ∈ PC a set S'R (in N P), but S'R is more "expressive" than the set
SR def= {x : ∃y s.t. (x, y) ∈ R} (which is the natural NP-set arising from R). Specifically, S'R (as defined in Eq. (2.2)) consists of strings that encode pairs ⟨x, y'⟩ such that y' is a prefix of some string in R(x) = {y : (x, y) ∈ R}. The key observation is that deciding membership in S'R allows for solving the search problem of R; that is, S'R ∈ P implies R ∈ PF.

Conclusion. Theorem 2.6 justifies the traditional focus on the decision version of the P-vs-NP Question. Indeed, given that both formulations of the question are equivalent, we may just study the less cumbersome one.
2.5 Technical Comments Regarding NP

The following comments are rather technical, and only the first one is used in the rest of this book.

A Simplifying Convention. We shall often assume that the length of solutions for any search problem in PC (resp., NP-witnesses for a set in N P) is determined (rather than upper-bounded) by the length of the instance. That is, for any R ∈ PC (resp., verification procedure V for a set in N P), we shall
assume that for some fixed polynomial p, if (x, y) ∈ R (resp., V(x, y) = 1) then |y| = p(|x|) rather than |y| ≤ p(|x|). This assumption can be justified by a trivial modification of R (resp., V); see Exercise 2.7.

Solving Problems in NP via Exhaustive Search. Every problem in PC (resp., N P) can be solved in exponential time (i.e., time exp(poly(|x|)) for input x). This can be done by an exhaustive search among all possible candidate solutions (resp., all possible candidate NP-witnesses). Thus, N P ⊆ EX P, where EX P denotes the class of decision problems that can be solved in exponential time (i.e., time exp(poly(|x|)) for input x).

An Alternative Formulation. Recall that when defining PC (resp., N P), we have explicitly confined our attention to search problems of polynomially bounded relations (resp., NP-witnesses of polynomial length). In this case, a polynomial-time algorithm that decides membership of a given pair (x, y) in a relation R ∈ PC (resp., checks the validity of an NP-witness y for membership of x in S ∈ N P) runs in time that is polynomial in the length of x. This observation leads to an alternative formulation of the class PC (resp., N P), in which one allows solutions (resp., NP-witnesses) of arbitrary length but requires that the corresponding algorithms run in time that is polynomial in the length of x rather than polynomial in the length of (x, y). That is, by the alternative formulation, a binary relation R is in PC (resp., S ∈ N P) if membership of (x, y) in R can be decided in time that is polynomial in the length of x (resp., the verification of a candidate NP-witness y for membership of x in S is required to be performed in poly(|x|)-time). Although this alternative formulation does not upper-bound the length of the solutions (resp., NP-witnesses), such an upper bound effectively follows in the sense that it suffices to inspect a poly(|x|)-bit long prefix of the solution (resp., NP-witness) in order to determine its validity.
Indeed, such a prefix is as good as the full-length solution (resp., NP-witness) itself. Thus, the alternative formulation is essentially equivalent to the original one.
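The exhaustive-search observation made earlier in this section can be sketched as follows (a minimal sketch; the toy proof system is an assumption for illustration only):

```python
from itertools import product

def decide_by_exhaustive_search(x, V, p):
    """Decide membership by trying all candidate NP-witnesses of length
    p(|x|); this runs in time exponential in |x|, illustrating NP in EXP."""
    m = p(len(x))
    candidates = ("".join(bits) for bits in product("01", repeat=m))
    return 1 if any(V(x, y) == 1 for y in candidates) else 0

# Toy proof system (an assumption for illustration): y witnesses x iff
# y equals x and x ends with "0"; witness length equals instance length.
def V(x, y):
    return 1 if y == x and x.endswith("0") else 0

p = lambda n: n
```

The search space has 2^{p(|x|)} candidates, each checked in polynomial time, matching the exp(poly(|x|)) bound.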
2.6 The Traditional Definition of NP

Unfortunately, Definition 2.5 is not the most commonly used definition of N P. Instead, traditionally, N P is defined as the class of sets that can be decided by a fictitious device called a nondeterministic polynomial-time machine (which explains the source of the notation NP). The reason that this class of fictitious devices is interesting is due to the fact that it captures (indirectly) the definition of NP-proof systems (i.e., Definition 2.5). Since the reader may come across the
traditional definition of N P when studying different works, we feel obliged to provide the traditional definition as well as a proof of its equivalence to Definition 2.5.

Definition 2.7 (nondeterministic polynomial-time Turing machines):
• A nondeterministic Turing machine is defined as in Section 1.3.2, except that the transition function maps symbol-state pairs to subsets of triples (rather than to a single triple) in Σ × Q × {−1, 0, +1}. Accordingly, the configuration following a specific instantaneous configuration may be one of several possibilities, each determined by a different possible triple. Thus, the computations of a nondeterministic machine on a fixed input may result in different outputs. In the context of decision problems, one typically considers the question of whether or not there exists a computation that halts with output 1 after starting with a fixed input. This leads to the following notions:
– We say that the nondeterministic machine M accepts x if there exists a computation of M, on input x, that halts with output 1.
– The set accepted by a nondeterministic machine is the set of inputs that are accepted by the machine.
• A nondeterministic polynomial-time Turing machine is defined as one that halts after a number of steps that is no more than a fixed polynomial in the length of the input. Traditionally, N P is defined as the class of sets that are each accepted by some nondeterministic polynomial-time Turing machine.

We stress that Definition 2.7 refers to a fictitious model of computation. Specifically, Definition 2.7 makes no reference to the number (or fraction) of possible computations of the machine (on a specific input) that yield a specific output.10 Definition 2.7 only refers to whether or not computations leading to a certain output exist (for a specific input).
The question of what the mere existence of such possible computations means (in terms of real life) is not addressed, because the model of a nondeterministic machine is not meant to provide a reasonable model of a (real-life) computer. The model is meant to capture something completely different (i.e., it is meant to provide an "elegant" definition of the class N P, while relying on the fact that Definition 2.7 is equivalent to Definition 2.5).11
10 Advanced comment: In contrast, the definition of a probabilistic machine refers to this number (or, equivalently, to the probability that the machine produces a specific output, when the probability is taken (essentially) uniformly over all possible computations). Thus, a probabilistic machine refers to a natural model of computation that can be realized provided we can equip the machine with a source of randomness. For details, see [13, Sec. 6.1].
11 Whether or not Definition 2.7 is elegant is a matter of taste. For sure, many students find Definition 2.7 quite confusing; see further discussion in the teaching notes to this chapter.
Note that unlike other definitions in this book, Definition 2.7 makes explicit reference to a specific model of computation. Still, a similar (nondeterministic) extension can be applied to other models of computation by considering adequate nondeterministic computation rules. Also note that without loss of generality, we may assume that the transition function maps each possible symbol-state pair to exactly two triples (see Exercise 2.11).

Theorem 2.8: Definition 2.5 is equivalent to Definition 2.7. That is, a set S has an NP-proof system if and only if there exists a nondeterministic polynomial-time machine that accepts S.

Proof: Suppose, on the one hand, that the set S has an NP-proof system, and let us denote the corresponding verification procedure by V. Let p be a polynomial that determines the length of NP-witnesses with respect to V (i.e., V(x, y) = 1 implies |y| = p(|x|)).12 Consider the following nondeterministic polynomial-time machine, denoted M, that (on input x) first produces nondeterministically a potential NP-witness (i.e., y ∈ {0, 1}^{p(|x|)}) and then accepts if and only if this witness is indeed valid (i.e., V(x, y) = 1). That is, on input x, machine M proceeds as follows:
1. Makes m = p(|x|) nondeterministic steps, producing (nondeterministically) a string y ∈ {0, 1}^m.
2. Emulates V(x, y) and outputs whatever it does.
We stress that the nondeterministic steps (taken in Step 1) may result in producing any m-bit string y. Recall that x ∈ S if and only if there exists y ∈ {0, 1}^{p(|x|)} such that V(x, y) = 1. It follows that x ∈ S if and only if there exists a computation of M on input x that halts with output 1 (and thus x ∈ S if and only if M accepts x). This implies that the set accepted by M equals S. Since M is a nondeterministic polynomial-time machine, it follows that S is in N P according to Definition 2.7.
Suppose, on the other hand, that there exists a nondeterministic polynomial-time machine M that accepts the set S, and let p be a polynomial upper-bounding the time complexity of M. Consider the following deterministic polynomial-time machine, denoted M', that on input (x, y) views y as a description of the nondeterministic choices of machine M on input x, and emulates the corresponding computation. That is, on input (x, y), where y has length m = p(|x|), machine M' emulates a computation of M on input x while using the bits of y to determine the nondeterministic steps of M. Specifically, the i-th step of M on input x is determined by the i-th bit of y such that the i-th
12
See the simplifying convention in Section 2.5.
step of M follows the first possibility (in the transition function) if and only if the i-th bit of y equals 1. Note that x ∈ S if and only if there exists y of length p(|x|) such that M'(x, y) = 1. Thus, M' gives rise to an NP-proof system for S, and so S is in N P according to Definition 2.5.
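The second direction of the proof can be sketched as follows: a nondeterministic machine accepts x iff some choice string y leads the deterministic emulation M' to output 1 (a minimal sketch; the toy emulation below is an assumption for illustration, standing in for a real configuration-by-configuration emulation):

```python
from itertools import product

def accepts(x, emulate, p):
    """Does the nondeterministic machine accept x?  Following the proof of
    Theorem 2.8: M accepts x iff some choice string y of length p(|x|)
    makes the deterministic emulation M'(x, y) halt with output 1."""
    m = p(len(x))
    return any(emulate(x, "".join(c)) == 1 for c in product("01", repeat=m))

# Toy emulation M' (an assumption for illustration): the machine "guesses"
# |x| bits and outputs 1 iff at least two of the guessed bits equal 1.
def emulate(x, y):
    return 1 if y.count("1") >= 2 else 0
```

Note that the acceptance condition quantifies existentially over choice strings, exactly as Definition 2.5 quantifies over NP-witnesses.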
2.7 In Support of P Being Different from NP

Intuition and concepts constitute . . . the elements of all our knowledge, so that neither concepts without an intuition in some way corresponding to them, nor intuition without concepts, can yield knowledge.
Immanuel Kant (1724–1804)

Kant speaks of the importance of both philosophical considerations (referred to as "concepts") and empirical considerations (referred to as "intuition") to science (referred to as (sound) "knowledge"). We shall indeed follow his lead.

It is widely believed that P is different from NP; that is, that PC contains search problems that are not efficiently solvable, and that there are NP-proof systems for sets that cannot be decided efficiently. This belief is supported by both philosophical and empirical considerations.

Philosophical Considerations. Both formulations of the P-vs-NP Question refer to natural questions about which we have strong conceptions. The notion of solving a (search) problem seems to presume that, at least in some cases (or in general), finding a solution is significantly harder than checking whether a presented solution is correct. This translates to PC \ PF ≠ ∅. Likewise, the notion of a proof seems to presume that, at least in some cases (or in general), the proof is useful in determining the validity of the assertion; that is, verifying the validity of an assertion may be made significantly easier when provided with a proof. This translates to P ≠ N P, which also implies that it is significantly harder to find proofs than to verify their correctness, which again coincides with the daily experience of researchers and students.

Empirical Considerations. The class NP (or rather PC) contains thousands of different problems for which no efficient solving procedure is known. Many of these problems have arisen in vastly different disciplines, and were the subject of extensive research of numerous different communities of scientists and engineers.
These essentially independent studies have all failed to provide efficient algorithms for solving these problems, a failure that is extremely hard to attribute to sheer coincidence or to a streak of bad luck.
We mention that for many of the aforementioned problems, the best-known algorithms are not significantly faster than an exhaustive search (for a solution); that is, the complexity of the best-known algorithm is polynomially related to the complexity of an exhaustive search. Indeed, it is widely believed that for some problems in NP, no algorithm can be significantly faster than an exhaustive search.

The common belief (or conjecture) that P ≠ N P is indeed very appealing and intuitive. The fact that this natural conjecture is unsettled seems to be one of the sources of frustration of Complexity Theory. Our opinion, however, is that this feeling of frustration is out of place (and merely reflects a naive underestimation of the issues at hand). In contrast, the fact that Complexity Theory evolves around natural and simply stated questions that are so difficult to resolve makes its study very exciting.

Throughout the rest of this book, we will adopt the conjecture that P is different from NP. In a few places, we will explicitly use this conjecture, whereas in other places, we will present results that are interesting (if and) only if P ≠ N P (e.g., the entire theory of NP-completeness becomes uninteresting if P = N P).
2.8 Philosophical Meditations

Whoever does not value preoccupation with thoughts, can skip this chapter.
Robert Musil, The Man without Qualities, Chap. 28

The inherent limitations of our scientific knowledge were articulated by Kant, who argued that our knowledge cannot transcend our way of understanding. The "ways of understanding" are predetermined; they precede any knowledge acquisition and are the precondition to such acquisition. In a sense, Wittgenstein refined the analysis, arguing that knowledge must be formulated in a language, and the latter must be subject to a (sound) mechanism of assigning meaning. Thus, the inherent limitations of any possible "meaning-assigning mechanism" impose limitations on what can be (meaningfully) said. Both philosophers spoke of the relation between the world and our thoughts. They took for granted (or rather assumed) that in the domain of well-formulated thoughts (e.g., logic), every valid conclusion can be effectively reached (i.e., every valid assertion can be effectively proved). Indeed, this naive assumption was refuted by Gödel. In a similar vein, Turing's work asserts that there exist well-defined problems that cannot be solved by well-defined methods.
We stress that Turing's assertion transcends the philosophical considerations of the first paragraph: It asserts that the limitations of our ability are due not only to the gap between the "world as is" and our model of it. In contrast, Turing's assertion refers to inherent limitations on any rational process, even when this process is applied to well-formulated information and is aimed at a well-formulated goal. Indeed, in contrast to naive presumptions, not every well-formulated problem can be (effectively) solved.

The P ≠ N P conjecture goes even beyond Turing's assertion. It limits the domain of the discussion to "fair" problems, that is, to problems for which valid solutions can be efficiently recognized as such. Indeed, there is something feigned in problems for which one cannot efficiently recognize valid solutions. Avoiding such feigned and/or unfair problems, P ≠ N P means that (even with this limitation) there exist problems that are inherently unsolvable in the sense that they cannot be solved efficiently. That is, in contrast to naive presumptions, not every problem that refers to efficiently recognizable solutions can be solved efficiently. In fact, the gap between the complexity of recognizing solutions and the complexity of finding them vouches for the meaningfulness of the notion of a problem.
Exercises

Exercise 2.1 (a quiz)
1. What are the justifications for associating efficient computation with polynomial-time algorithms?
2. What are the classes PF and PC?
3. What are the classes P and N P?
4. List a few computational problems in PF (resp., P).
5. Going beyond the list of the previous question, list a few problems in PC (resp., N P).
6. What does PC ⊆ PF mean in intuitive terms?
7. What does P ≠ N P mean in intuitive terms?
8. Is it the case that PC ⊆ PF if and only if P = N P?
9. What are the justifications for believing that P ≠ N P?

Exercise 2.2 (PF contains problems that are not in PC) Show that PF contains some (unnatural) problems that are not in PC.
Guideline: Consider the relation R = {(x, 1) : x ∈ {0, 1}∗} ∪ {(x, 0) : x ∈ S}, where S is some undecidable set. Note that R is the disjoint union of two binary
relations, denoted R1 and R2, where R1 is in PF whereas R2 is not in PC. Furthermore, for every x it holds that R1(x) ≠ ∅.

Exercise 2.3 In contrast to Exercise 2.2, show that if R ∈ PF and each instance of R has at most one solution (i.e., |R(x)| ≤ 1 for every x), then R ∈ PC.

Exercise 2.4 Show that the following search problems are in PC.
1. Finding a traveling salesman tour of length that does not exceed a given threshold (when also given a matrix of distances between cities);
2. Finding the prime factorization of a given natural number;
3. Solving a given system of quadratic equations over a finite field;
4. Finding a truth assignment that satisfies a given Boolean formula.
(For Item 2, use the fact that primality can be tested in polynomial time.)

Exercise 2.5 Show that any S ∈ N P has many different NP-proof systems (i.e., verification procedures V1, V2, . . . such that Vi(x, y) = 1 does not imply Vj(x, y) = 1 for i ≠ j).
Guideline: For V and p as in Definition 2.5, define Vi(x, y) = 1 if |y| = p(|x|) + i and there exists a prefix y' of y such that V(x, y') = 1.

Exercise 2.6 Relying on the fact that primality is decidable in polynomial time and assuming that there is no polynomial-time factorization algorithm, present two "natural but fundamentally different" NP-proof systems for the set of composite numbers.
Guideline: Consider the following verification procedures V1 and V2 for the set of composite numbers. Let V1(n, y) = 1 if and only if y = n and n is not a prime, and V2(n, m) = 1 if and only if m is a nontrivial divisor of n. Show that valid proofs with respect to V1 are easy to find, whereas valid proofs with respect to V2 are hard to find.

Exercise 2.7 Show that for every R ∈ PC, there exists R' ∈ PC and a polynomial p such that for every x it holds that R'(x) ⊆ {0, 1}^{p(|x|)}, and R ∈ PF if and only if R' ∈ PF. Formulate and prove a similar fact for NP-proof systems.
Guideline: Note that for every R ∈ PC, there exists a polynomial p such that for every (x, y) ∈ R it holds that |y| < p(|x|). Define R' such that R'(x) def= {y01^{p(|x|)−(|y|+1)} : (x, y) ∈ R}, and prove that R' ∈ PF if and only if R ∈ PF.

Exercise 2.8 In continuation of Exercise 2.7, show that for every set S ∈ N P and every sufficiently large polynomial p, there exists an NP-proof system V
such that all NP-witnesses to x ∈ S are of length p(|x|) (i.e., if V(x, y) = 1 then |y| = p(|x|)).
Guideline: Start with an NP-proof system V0 for S and a polynomial p0 such that V0(x, y) = 1 implies |y| ≤ p0(|x|). For every polynomial p > p0 (i.e., p(n) > p0(n) for all n ∈ N), define V such that V(x, y'01^{p(|x|)−(|y'|+1)}) = 1 if V0(x, y') = 1 and V(x, y) = 0 otherwise.

Exercise 2.9 In continuation of Exercise 2.8, show that for every set S ∈ N P and every "nice" ℓ : N → N, there exists a set S' ∈ N P such that (1) S ∈ P if and only if S' ∈ P, and (2) there exists an NP-proof system V' such that all NP-witnesses to x ∈ S' are of length ℓ(|x|). Specifically, consider as nice any function ℓ : N → N such that ℓ is monotonically nondecreasing, computable in polynomial time,13 and satisfies ℓ(n) ≤ poly(n) and n ≤ poly(ℓ(n)) (for every n ∈ N). Note that the novelty here (wrt Exercise 2.8) is that ℓ may be a sublinear function (e.g., ℓ(n) = √n).
Guideline: For an adequate polynomial p', consider S' def= {x01^{p'(|x|)−|x|−1} : x ∈ S} and the NP-proof system V' such that V'(x01^{p'(|x|)−|x|−1}, y) = V(x, y) and V'(x', y) = 0 if |x'| ∉ {p'(n) : n ∈ N}. Now, use Exercise 2.8.

Exercise 2.10 Show that for every S ∈ N P, there exists an NP-proof system V such that the witness sets Wx def= {y : V(x, y) = 1} are disjoint.
Guideline: Starting with an NP-proof system V0 for S, consider V such that V(x, y) = 1 if y = ⟨x, y'⟩ and V0(x, y') = 1 (and V(x, y) = 0 otherwise).

Exercise 2.11 Regarding Definition 2.7, show that if S is accepted by some nondeterministic machine of time complexity t, then it is accepted by a nondeterministic machine of time complexity O(t) that has a transition function that maps each possible symbol-state pair to exactly two triples.
Guideline: First note that a k-way (nondeterministic) choice can be emulated by ⌈log2 k⌉ (nondeterministic) binary choices. (Indeed, this requires creating O(k) new states for each such k-way choice.) Also note that one can introduce fictitious (nondeterministic) choices by duplicating the set of states of the machine.
13
In fact, it suffices to require that the mapping n → ℓ(n) can be computed in time poly(n).
3 Polynomial-time Reductions
Overview: Reductions are procedures that use "functionally specified" subroutines. That is, the functionality of the subroutine is specified, but its operation remains unspecified and its running time is counted at unit cost. Thus, a reduction solves one computational problem by using oracle (or subroutine) calls to another computational problem. Analogously to our focus on efficient (i.e., polynomial-time) algorithms, here we focus on efficient (i.e., polynomial-time) reductions. We present a general notion of (polynomial-time) reductions among computational problems, and view the notion of a "Karp-reduction" (also known as "many-to-one reduction") as an important special case that suffices (and is more convenient) in many cases. Reductions play a key role in the theory of NP-completeness, which is the topic of Chapter 4. In the current chapter, we stress the fundamental nature of the notion of a reduction per se and highlight two specific applications: reducing search problems and optimization problems to decision problems. Furthermore, in these applications, it will be important to use the general notion of a reduction (i.e., "Cook-reduction" rather than "Karp-reduction"). We comment that the aforementioned reductions of search and optimization problems to decision problems further justify the common focus on the study of the decision problems.

Organization. We start by presenting the general notion of a polynomial-time reduction and important special cases of it (see Section 3.1). In Section 3.2, we present the notion of optimization problems and reduce such problems to corresponding search problems. In Section 3.3, we discuss the reduction of search problems to corresponding decision problems, while emphasizing the special case in which the search problem is
reduced to the decision problem that is implicit in it. (In such a case, we say that the search problem is self-reducible.)
Teaching Notes

We assume that many students have heard of reductions, but we fear that most have obtained a conceptually distorted view of their fundamental nature. In particular, we fear that reductions are identified with the theory of NP-completeness, whereas reductions have numerous other important applications that have little to do with NP-completeness (or completeness with respect to any other class). In particular, we believe that it is important to show that (natural) search and optimization problems can be reduced to (natural) decision problems.

On Our Terminology. We prefer the terms Cook-reductions and Karp-reductions over the terms "general (polynomial-time) reductions" and "many-to-one (polynomial-time) reductions." Also, we use the term self-reducibility in a non-traditional way; that is, we say that the search problem of R is self-reducible if it can be reduced to the decision problem of SR = {x : ∃y s.t. (x, y) ∈ R}, whereas traditionally, self-reducibility refers to decision problems and is closely related to our notion of downward self-reducible (presented in Exercise 3.16).

A Minor Warning. In Section 3.3.2, which is an advanced section, we assume that the students have heard of NP-completeness. Actually, we only need the students to know the definition of NP-completeness. Yet the teacher may prefer postponing the presentation of this material to Section 4.1 (or even to a later stage).
3.1 The General Notion of a Reduction

Reductions are procedures that use "functionally specified" subroutines. That is, the functionality of the subroutine is specified, but its operation remains unspecified and its running time is counted at unit cost. Analogously to algorithms, which are modeled by Turing machines, reductions can be modeled as oracle (Turing) machines. A reduction solves one computational problem
(which may be either a search problem or a decision problem) by using oracle (or subroutine) calls to another computational problem (which again may be either a search or a decision problem). Thus, such a reduction yields a (simple) transformation of algorithms that solve the latter problem into algorithms that solve the former problem.
3.1.1 The Actual Formulation

The notion of a general algorithmic reduction was discussed in Section 1.3.3 and formally defined in Section 1.3.6. These reductions, called Turing-reductions and modeled by oracle machines (cf. Section 1.3.6), made no reference to the time complexity of the main algorithm (i.e., the oracle machine). Here, we focus on efficient (i.e., polynomial-time) reductions, which are often called Cook-reductions. That is, we consider oracle machines (as in Definition 1.11) that run in time that is polynomial in the length of their input. We stress that the running time of an oracle machine is the number of steps made during its (own) computation, and that the oracle's reply on each query is obtained in a single step.

The key property of efficient reductions is that they allow for the transformation of efficient implementations of the subroutine (or the oracle) into efficient implementations of the task reduced to it. That is, as we shall see, if one problem is Cook-reducible to another problem and the latter is polynomial-time solvable, then so is the former.

The most popular case is that of reducing decision problems to decision problems, but we will also explicitly consider reducing search problems to search problems and reducing search problems to decision problems. Note that when reducing to a decision problem, the oracle is determined as the unique valid solver of the decision problem (since the function f : {0, 1}∗ → {0, 1} solves the decision problem of membership in S if, for every x, it holds that f(x) = 1 if x ∈ S and f(x) = 0 otherwise). In contrast, when reducing to a search problem, the oracle is not uniquely determined because there may be many different valid solvers (since the function f : {0, 1}∗ → {0, 1}∗ ∪ {⊥} solves the search problem of R if, for every x, it holds that f(x) ∈ R(x) = {y : (x, y) ∈ R} if R(x) ≠ ∅ and f(x) = ⊥ otherwise).1 We capture both cases in the following definition.
Definition 3.1 (Cook-reduction): A problem Π is Cook-reducible to a problem Π′ if there exists a polynomial-time oracle machine M such that for every
1 Indeed, the solver is unique only if for every x it holds that |R(x)| ≤ 1.
function f that solves Π′ it holds that M^f solves Π, where M^f(x) denotes the output of M on input x when given oracle access to f.

Note that Π (resp., Π′) may be either a search problem or a decision problem (or even a yet-undefined type of problem). At this point, the reader should verify that if Π is Cook-reducible to Π′ and Π′ is solvable in polynomial time, then so is Π; see Exercise 3.2 (which also asserts other properties of Cook-reductions). We highlight the fact that a Cook-reduction of Π to Π′ yields a simple transformation of efficient algorithms that solve the problem Π′ into efficient algorithms that solve the problem Π. The transformation consists of combining the code (or description) of any algorithm that solves Π′ with the code of the reduction, yielding a code of an algorithm that solves Π.

An Important Example. Observe that the second part of the proof of Theorem 2.6 is actually a Cook-reduction of the search problem of any R in PC to a decision problem regarding a related set S′R = {⟨x, y′⟩ : ∃y′′ s.t. (x, y′y′′) ∈ R}, which is in NP. Thus, that proof establishes the following result.

Theorem 3.2: Every search problem in PC is Cook-reducible to some decision problem in NP.

We shall see a tighter relation between search and decision problems in Section 3.3; that is, in some cases, R will be reduced to SR = {x : ∃y s.t. (x, y) ∈ R} rather than to S′R.
3.1.2 Special Cases

We shall consider two restricted types of Cook-reductions, where the first type applies only to decision problems and the second type applies only to search problems. In both cases, the reductions are restricted to making a single query.

Restricted Reductions Among Decision Problems. A Karp-reduction is a restricted type of a reduction (from one decision problem to another decision problem) that makes a single query, and furthermore replies with the very answer that it has received. Specifically, for decision problems S and S′, we say that S is Karp-reducible to S′ if there is a Cook-reduction of S to S′ that operates as follows: On input x (an instance for S), the reduction computes x′, makes query x′ to the oracle S′ (i.e., invokes the subroutine for S′ on input x′), and answers whatever the latter returns. This reduction is often represented by the polynomial-time computable mapping of x to x′; that is, the standard definition of a Karp-reduction is actually as follows.
78
3 Polynomialtime Reductions
Figure 3.1. The Cook-reduction that arises from a Karp-reduction: on input x, the machine computes f(x), queries the oracle for S′, and returns its answer.
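In code, the oracle machine that arises from a Karp-reduction is just a one-query wrapper. The following sketch is illustrative and not from the text; `f` stands for an arbitrary polynomial-time computable mapping and `oracle_S_prime` for an arbitrary decider of S′ (the toy instance at the end is hypothetical):

```python
def cook_from_karp(f, oracle_S_prime):
    """Turn a Karp-reduction f (of S to S') into the corresponding
    Cook-reduction: a decider for S that makes a single query to S'
    and answers with the very answer that it has received."""
    def decide_S(x):
        return oracle_S_prime(f(x))  # one query; answer returned verbatim
    return decide_S

# Toy instance: S = even numbers (given in decimal), S' = numbers whose
# decimal representation ends in 0, and f maps x to 5x (x is even iff
# 5x ends in 0).
decide_even = cook_from_karp(lambda x: str(5 * int(x)),
                             lambda x: x.endswith("0"))
```

Combining the wrapper with any polynomial-time decider for S′ yields a polynomial-time decider for S, which is exactly the "positive" application discussed above.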
Definition 3.3 (Karp-reduction): A polynomial-time computable function f is called a Karp-reduction of S to S′ if, for every x, it holds that x ∈ S if and only if f(x) ∈ S′.

Thus, syntactically speaking, a Karp-reduction is not a Cook-reduction, but it trivially gives rise to one (i.e., on input x, the oracle machine makes query f(x), and returns the oracle answer; see Figure 3.1). Being slightly inaccurate but essentially correct, we shall say that Karp-reductions are special cases of Cook-reductions. Needless to say, Karp-reductions constitute a very restricted case of Cook-reductions. Specifically, Karp-reductions refer only to reductions among decision problems, and are restricted to a single query (and to the way in which the answer is used). Still, Karp-reductions suffice for many applications (most importantly, for the theory of NP-completeness (when developed for decision problems)). On the other hand, due to purely technical (or syntactic) reasons, Karp-reductions are not adequate for reducing search problems to decision problems. Furthermore, Cook-reductions that make a single query are inadequate for reducing (hard) search problems to any decision problem (see Exercise 3.12).2 We note that even within the domain of reductions among decision problems, Karp-reductions are less powerful than Cook-reductions. Specifically, whereas each decision problem is Cook-reducible to its complement, some decision problems are not Karp-reducible to their complement (see Exercises 3.4 and 5.10).

Augmentation for Reductions Among Search Problems. Karp-reductions may (and should) be augmented in order to handle reductions among search problems. The augmentation should provide a way of obtaining a solution for
2 Cook-reductions that make a single query overcome the technical reason that makes Karp-reductions inadequate for reducing search problems to decision problems. (Recall that Karp-reductions are a special case of Cook-reductions that make a single query; cf. Exercise 3.11.)
the original instance from any solution for the reduced instance. Indeed, such a reduction of the search problem of R to the search problem of R′ operates as follows: On input x (an instance for R), the reduction computes x′, makes query x′ to the oracle R′ (i.e., invokes the subroutine for searching R′ on input x′) obtaining y′ such that (x′, y′) ∈ R′, and uses y′ to compute a solution y to x (i.e., y ∈ R(x)). Thus, such a reduction can be represented by two polynomial-time computable mappings, f and g, such that (x, g(x, y′)) ∈ R for any y′ that is a solution of f(x) (i.e., for y′ that satisfies (f(x), y′) ∈ R′). Indeed, f is a Karp-reduction (of SR = {x : R(x) ≠ ∅} to SR′ = {x′ : R′(x′) ≠ ∅}), but (unlike in the case of decision problems) the function g may be nontrivial (i.e., we may not always have g(x, y′) = y′). This type of reduction is called a Levin-reduction and, analogously to the case of a Karp-reduction, it is often identified with the two aforementioned mappings themselves (i.e., the (polynomial-time computable) mapping f of x to x′, and the (polynomial-time computable) mapping g of (x, y′) to y).

Definition 3.4 (Levin-reduction): A pair of polynomial-time computable functions, f and g, is called a Levin-reduction of R to R′ if f is a Karp-reduction of SR = {x : ∃y s.t. (x, y) ∈ R} to SR′ = {x′ : ∃y′ s.t. (x′, y′) ∈ R′} and for every x ∈ SR and y′ ∈ R′(f(x)) it holds that (x, g(x, y′)) ∈ R, where R′(x′) = {y′ : (x′, y′) ∈ R′}.

Indeed, the (first) function f preserves the existence of solutions; that is, for any x, it holds that R(x) ≠ ∅ if and only if R′(f(x)) ≠ ∅, since f is a Karp-reduction of SR to SR′. As for the second function (i.e., g), it maps any solution y′ for the reduced instance f(x) to a solution for the original instance x (where this mapping may also depend on x). We mention that it is natural also to consider a third function that maps solutions for R to solutions for R′ (see Exercise 4.20).
Again, syntactically speaking, a Levin-reduction is not a Cook-reduction, but it trivially gives rise to one (i.e., on input x, the oracle machine makes query f(x), and returns g(x, y′) if the oracle answers with y′ ≠ ⊥ (and returns ⊥ otherwise); see Figure 3.2).
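The oracle machine arising from a Levin-reduction can be sketched similarly (an illustration, not from the text; the toy pair below uses the classic fact that C is a vertex cover of a graph if and only if the complement of C is an independent set, with a brute-force searcher standing in for the R′-oracle):

```python
from itertools import combinations

def cook_from_levin(f, g, search_R_prime):
    """Turn a Levin-reduction (f, g) of R to R' into a Cook-reduction:
    on input x, make the single query f(x) and map the answer back via g."""
    def solve_R(x):
        y_prime = search_R_prime(f(x))  # oracle call on the reduced instance
        return None if y_prime is None else g(x, y_prime)  # None plays the role of the ⊥ answer
    return solve_R

# R : given (V, E, k), find a vertex cover of size at most k.
# R': given (V, E, k'), find an independent set of size at least k'.
def f(instance):                       # maps (V, E, k) to (V, E, |V| - k)
    V, E, k = instance
    return (V, E, len(V) - k)

def g(instance, ind_set):              # the complement of an independent set
    V, E, k = instance
    return set(V) - ind_set

def brute_force_is(instance):          # stand-in for any R'-solver
    V, E, k = instance
    # An independent set of size >= k exists iff one of size exactly k does.
    for S in combinations(V, max(k, 0)):
        if all(not (u in S and v in S) for (u, v) in E):
            return set(S)
    return None

solve_vc = cook_from_levin(f, g, brute_force_is)
```

Here f maps the instance and g maps the returned solution back, matching the two mappings of Definition 3.4.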
3.1.3 Terminology and a Brief Discussion

Cook-reductions are often called general (polynomial-time) reductions, whereas Karp-reductions are often called many-to-one (polynomial-time) reductions. Indeed, throughout the current chapter, whenever we neglect to mention the type of a reduction, we actually mean a Cook-reduction.
Figure 3.2. The Cook-reduction that arises from a Levin-reduction: on input x, the machine computes f(x), obtains y′ (in R′(f(x))) from the oracle for R′, and returns g(x, y′).
Two Compound Notions. The following terms, which refer to the existence of several reductions, are often used in advanced studies.

1. We say that two problems are computationally equivalent if they are reducible to each other. This means that the two problems are essentially as hard (or as easy) as one another. Note that computationally equivalent problems need not reside in the same complexity class. For example, as we shall see in Section 3.3, for many natural relations R ∈ PC, the search problem of R and the decision problem of SR = {x : ∃y s.t. (x, y) ∈ R} are computationally equivalent, although (even syntactically) the two problems do not belong to the same class (i.e., R ∈ PC whereas SR ∈ NP). Also, each decision problem is computationally equivalent to its complement, although the two problems may not belong to the same class (see, e.g., Section 5.3).

2. We say that a class of problems, C, is reducible to a problem Π′ if every problem in C is reducible to Π′. We say that the class C is reducible to the class C′ if for every Π ∈ C there exists Π′ ∈ C′ such that Π is reducible to Π′. For example, Theorem 3.2 asserts that PC is reducible to NP. Also note that NP is reducible to PC (see Exercise 3.9).

On the Greater Flexibility of Cook-reductions. The fact that we allow Cook-reductions (rather than confining ourselves to Karp-reductions) is essential to various important connections between decision problems and other computational problems. For example, as will be shown in Section 3.2, a natural class of optimization problems is reducible to NP. Also recall that PC is reducible to NP (cf. Theorem 3.2). Furthermore, as will be shown in Section 3.3, many natural search problems in PC are reducible to a corresponding natural decision
problem in NP (rather than merely to some problem in NP). In all of these results, the reductions in use are (and must be) Cook-reductions.

Recall that we motivated the definition of Cook-reductions by referring to their natural ("positive") application, which offers a transformation of efficient implementations of the oracle into efficient algorithms for the reduced problem. Note, however, that once defined, reductions have a life of their own. In fact, the actual definition of a reduction does not refer to the aforementioned natural application, and reductions may be (and are) also used toward other applications. For further discussion, see Section 3.4.
3.2 Reducing Optimization Problems to Search Problems

Many search problems refer to a set of potential solutions, associated with each problem instance, such that different solutions are naturally assigned different "values" (resp., "costs"). For example, in the context of finding a clique in a given graph, the size of the clique may be considered the value of the solution. Likewise, in the context of finding a 2-partition of a given graph, the number of edges with both endpoints in the same side of the partition may be considered the cost of the solution. In such cases, one may be interested in finding a solution that has value exceeding some threshold (resp., cost below some threshold). Alternatively, one may seek a solution of maximum value (resp., minimum cost). For simplicity, let us focus on the case of a value that we wish to maximize. Still, the two different aforementioned objectives (i.e., exceeding a threshold and optimization) give rise to two different (auxiliary) search problems related to the same relation R. Specifically, for a binary relation R and a value function f : {0, 1}∗ × {0, 1}∗ → ℝ, we consider two search problems.

1. Exceeding a threshold: Given a pair (x, v), the task is to find y ∈ R(x) such that f(x, y) ≥ v, where R(x) = {y : (x, y) ∈ R}. That is, we are actually referring to the search problem of the relation

   Rf = {(⟨x, v⟩, y) : (x, y) ∈ R ∧ f(x, y) ≥ v},    (3.1)

   where ⟨x, v⟩ denotes a string that encodes the pair (x, v).

2. Maximization: Given x, the task is to find y ∈ R(x) such that f(x, y) = vx, where vx is the maximum value of f(x, y′) over all y′ ∈ R(x). That is, we are actually referring to the search problem of the relation

   R′f = {(x, y) ∈ R : f(x, y) = max over y′ ∈ R(x) of {f(x, y′)}}.    (3.2)

   (If R(x) = ∅, then we define R′f(x) = ∅.)
Examples of value functions include the size of a clique in a graph, the amount of flow in a network (with link capacities), and so on. The task may be to find a clique of size exceeding a given threshold in a given graph or to find a maximum-size clique in a given graph. Note that in these examples, the "base" search problem (i.e., the relation R) is quite easy to solve, and the difficulty arises from the auxiliary condition on the value of a solution (presented in Rf and R′f). Indeed, one may trivialize R (i.e., let R(x) = {0, 1}^poly(|x|) for every x), and impose all necessary structure by the function f (see Exercise 3.6). We confine ourselves to the case that f is (rational-valued and) polynomial-time computable, which in particular means that f(x, y) can be represented by a rational number of length polynomial in |x| + |y|. We will show next that in this case, the two aforementioned search problems (i.e., of Rf and R′f) are computationally equivalent.

Theorem 3.5: For any polynomial-time computable f : {0, 1}∗ × {0, 1}∗ → Q and a polynomially bounded binary relation R, let Rf and R′f be as in Eq. (3.1) and Eq. (3.2), respectively. Then, the search problems of Rf and R′f are computationally equivalent.

Note that for R ∈ PC and polynomial-time computable f, it holds that Rf ∈ PC. Combining Theorems 3.2 and 3.5, it follows that in this case both Rf and R′f are reducible to NP. We note, however, that even in this case it does not necessarily hold that R′f ∈ PC (unless, of course, P = NP). See further discussion following the proof.

Proof: The search problem of Rf is reduced to the search problem of R′f by finding an optimal solution (for the given instance) and comparing its value to the given threshold value. That is, we construct an oracle machine that solves Rf by making a single query to R′f.
Specifically, on input ⟨x, v⟩, the machine issues the query x (to a solver for R′f), obtaining the optimal solution y (or an indication ⊥ that R(x) = ∅), computes f(x, y), and returns y if f(x, y) ≥ v. Otherwise (i.e., either y = ⊥ or f(x, y) < v), the machine returns an indication that Rf(⟨x, v⟩) = ∅.

Turning to the opposite direction, we reduce the search problem of R′f to the search problem of Rf by first finding the optimal value vx = max over y ∈ R(x) of {f(x, y)} (by binary search on its possible values), and next finding a solution of value vx. In both steps, we use oracle calls to Rf. For simplicity, we assume that f assigns positive integer values (see Exercise 3.7), and let ℓ = poly(|x|) be such that f(x, y) ≤ 2^ℓ − 1 for every y ∈ R(x). Then, on input x, we first find vx = max{f(x, y) : y ∈ R(x)}, by making oracle calls of
the form ⟨x, v⟩. The point is that vx < v if and only if Rf(⟨x, v⟩) = ∅, which in turn is indicated by the oracle answer ⊥ (to the query ⟨x, v⟩). Making ℓ queries, we determine vx (see Exercise 3.8). Note that in case R(x) = ∅, all the answers will indicate that Rf(⟨x, v⟩) = ∅, and we halt indicating that R′f(x) = ∅ (which is indeed due to R(x) = ∅). Thus, we continue only if vx > 0, which indicates that R′f(x) ≠ ∅. At this point, we make the query ⟨x, vx⟩, and halt returning the oracle's answer, which is a string y ∈ R(x) such that f(x, y) = vx.

Comments Regarding the Proof of Theorem 3.5. The first direction of the proof uses the hypothesis that f is polynomial-time computable, whereas the opposite direction only uses the fact that the optimal value lies in a finite space of exponential size that can be "efficiently searched." While the first direction is proved using a Levin-reduction, this seems impossible for the opposite direction (i.e., finding an optimal solution does not seem to be Levin-reducible to finding a solution that exceeds a threshold).

On the Complexity of Rf and R′f. Here, we focus on the natural case in which R ∈ PC and f is polynomial-time computable. In this case, Theorem 3.5 asserts that Rf and R′f are computationally equivalent. A closer look reveals, however, that Rf ∈ PC always holds, whereas R′f ∈ PC does not necessarily hold. That is, the problem of finding a solution (for a given instance) that exceeds a given threshold is in the class PC, whereas the problem of finding an optimal solution is not necessarily in the class PC. For example, the problem of finding a clique of a given size K in a given graph G is in PC, whereas the problem of finding a maximum-size clique in a given graph G is not known (and is quite unlikely)3 to be in PC (although it is Cook-reducible to PC). The foregoing discussion suggests that the class of problems that are reducible to PC, which seems different from PC itself, is a natural and interesting class.
Indeed, for every R ∈ PC and polynomial-time computable f, the former class contains R′f.
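The binary-search step in the second direction of the proof of Theorem 3.5 can be sketched as follows. This is an illustration, not from the text: `threshold_oracle(x, v)` stands in for any solver of Rf, returning some y ∈ R(x) with f(x, y) ≥ v or `None` (playing the role of ⊥), and f is assumed to take positive integer values below 2^ℓ. The toy oracle at the end is hypothetical (x is a list of positive integers, a "solution" is an index, and its value is the listed integer):

```python
def solve_max(x, ell, threshold_oracle):
    """Find a solution of optimal value by binary search on the value,
    using oracle calls to the threshold problem R_f."""
    if threshold_oracle(x, 1) is None:   # f is positive on R(x), so R(x) is empty
        return None
    lo, hi = 1, 2 ** ell - 1             # the optimal value v_x lies in [lo, hi]
    while lo < hi:                       # invariant: lo <= v_x <= hi
        mid = (lo + hi + 1) // 2
        if threshold_oracle(x, mid) is None:
            hi = mid - 1                 # v_x < mid
        else:
            lo = mid                     # v_x >= mid
    return threshold_oracle(x, lo)       # one last query: a solution of value v_x

def index_oracle(x, v):                  # toy R_f-solver for illustration
    for i, a in enumerate(x):
        if a >= v:
            return i
    return None
```

For instance, `solve_max([3, 9, 4], 4, index_oracle)` locates the position of the maximum entry using a logarithmic number of threshold queries.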
3.3 Self-Reducibility of Search Problems

The results to be presented in this section further justify the focus on decision problems. Loosely speaking, these results show that for many natural relations
3 See Exercise 5.14.
R, the question of whether or not the search problem of R is efficiently solvable (i.e., is in PF) is equivalent to the question of whether or not the "decision problem implicit in R" (i.e., SR = {x : ∃y s.t. (x, y) ∈ R}) is efficiently solvable (i.e., is in P). In fact, we will show that these two computational problems (i.e., R and SR) are computationally equivalent. Note that the decision problem of SR is easily reducible to the search problem of R, and so our focus is on the other direction. That is, we are interested in relations R for which the search problem of R is reducible to the decision problem of SR. In such a case, we say that R is self-reducible.4

Definition 3.6 (the decision implicit in a search and self-reducibility): The decision problem implicit in the search problem of R is deciding membership in the set SR = {x : R(x) ≠ ∅}, where R(x) = {y : (x, y) ∈ R}. The search problem of R is called self-reducible if it can be reduced to the decision problem of SR.

Note that the search problem of R and the problem of deciding membership in SR refer to the same instances: The search problem requires finding an adequate solution (i.e., given x find y ∈ R(x)), whereas the decision problem refers to the question of whether such solutions exist (i.e., given x determine whether or not R(x) is nonempty). Thus, SR corresponds to the intuitive notion of a "decision problem implicit in R," because SR is a decision problem that one implicitly solves when solving the search problem of R. Indeed, for any R, the decision problem of SR is easily reducible to the search problem of R (see Exercise 3.10). It follows that if a search problem R is self-reducible, then it is computationally equivalent to the decision problem SR.

Note that the general notion of a reduction (i.e., Cook-reduction) seems inherent to the notion of self-reducibility. This is the case not only due to syntactic considerations, but is also the case for the following inherent reason.
An oracle to any decision problem returns a single bit per invocation, while the intractability of a search problem in PC must be due to the lack of more than a "single bit of information" (see Exercise 3.12).

We shall see that self-reducibility is a property of many natural search problems (including all NP-complete search problems). This justifies the relevance of decision problems to search problems in a stronger sense than established
4 Our usage of the term self-reducibility differs from the traditional one. Traditionally, a decision problem is called (downward) self-reducible if it is Cook-reducible to itself via a reduction that on input x only makes queries that are smaller than x (according to some appropriate measure on the size of instances). Under some natural restrictions (i.e., the reduction takes the disjunction of the oracle answers), such reductions yield reductions of search to decision (as discussed in the main text). For further details, see Exercise 3.16.
in Section 2.4: Recall that in Section 2.4, we showed that the fate of the search problem class PC (w.r.t. PF) is determined by the fate of the decision problem class NP (w.r.t. P). Here, we show that for many natural search problems in PC (i.e., self-reducible ones), the fate of such an individual problem R (w.r.t. PF) is determined by the fate of the individual decision problem SR (w.r.t. P), where SR is the decision problem implicit in R. (Recall that R ∈ PC implies SR ∈ NP.) Thus, here we have "fate reductions" at the level of individual problems, rather than only at the level of classes of problems (as established in Section 2.4).
3.3.1 Examples

We now present a few search problems that are self-reducible. We start with SAT (see Appendix A.2), the set of satisfiable Boolean formulae (in CNF), and consider the search problem in which, given a formula, one should find a truth assignment that satisfies it. The corresponding relation is denoted RSAT; that is, (φ, τ) ∈ RSAT if τ is a satisfying assignment to the formula φ. Indeed, the decision problem implicit in RSAT is SAT. Note that RSAT is in PC (i.e., it is polynomially bounded, and membership of (φ, τ) in RSAT is easy to decide (by evaluating a Boolean expression)).

Proposition 3.7 (RSAT is self-reducible): The search problem of RSAT is reducible to SAT.

Thus, the search problem of RSAT is computationally equivalent to deciding membership in SAT. Hence, in studying the complexity of SAT, we also address the complexity of the search problem of RSAT.

Proof: We present an oracle machine that solves the search problem of RSAT by making oracle calls to SAT. Given a formula φ, we find a satisfying assignment to φ (in case such an assignment exists) as follows. First, we query SAT on φ itself, and return an indication that there is no solution if the oracle answer is 0 (indicating φ ∉ SAT). Otherwise, we let τ, initiated to the empty string, denote a prefix of a satisfying assignment of φ. We proceed in iterations, where in each iteration we extend τ by one bit (as long as τ does not set all variables of φ). This is done as follows: First we derive a formula, denoted φ′, by setting the first |τ| + 1 variables of φ according to the values τ0. We then query SAT on φ′ (which means that we ask whether or not τ0 is a prefix of a satisfying assignment of φ). If the answer is positive, then we set τ ← τ0; else we set τ ← τ1. This procedure relies on the fact that if τ is a prefix of a satisfying assignment of φ and τ0 is not a prefix of a satisfying assignment of φ, then τ1 must be a prefix of a satisfying assignment of φ.
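The procedure in the foregoing proof can be sketched in code (an illustration, not from the text). Here a CNF formula is a list of clauses, each a list of nonzero integers (positive for a variable, negative for its negation), a brute-force decider stands in for the SAT oracle, and the routine `substitute` performs the simplification of Boolean constants:

```python
from itertools import product

def sat_oracle(clauses, n):
    """Stand-in for a SAT decision oracle (brute force over all assignments;
    any correct decider could be plugged in here)."""
    return any(all(any((lit > 0) == a[abs(lit) - 1] for lit in cl) for cl in clauses)
               for a in product([False, True], repeat=n))

def substitute(clauses, var, value):
    """Set variable `var` to `value` and simplify away the Boolean constants:
    satisfied clauses are omitted, falsified literals are dropped, and a clause
    that becomes empty makes the whole formula simplify to false (None)."""
    out = []
    for cl in clauses:
        if (var in cl and value) or (-var in cl and not value):
            continue                      # clause already satisfied
        reduced = [lit for lit in cl if abs(lit) != var]
        if not reduced:                   # clause simplifies to the constant false
            return None
        out.append(reduced)
    return out

def find_assignment(clauses, n):
    """The search-to-decision reduction: extend a prefix of a satisfying
    assignment one bit at a time, deciding each bit with an oracle call."""
    if not sat_oracle(clauses, n):        # first query: is the formula satisfiable?
        return None
    tau = []
    for var in range(1, n + 1):
        reduced = substitute(clauses, var, False)
        if reduced is not None and sat_oracle(reduced, n):
            tau.append(False)             # tau·0 is a prefix of a satisfying assignment
            clauses = reduced
        else:
            tau.append(True)              # hence tau·1 must be such a prefix
            clauses = substitute(clauses, var, True)
    return tau
```

As in the proof, each iteration issues a single query on a simplified formula that contains no Boolean constants.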
We wish to highlight a key point that has been blurred in the foregoing description. Recall that the formula φ′ is obtained by replacing some variables by constants, which means that φ′ per se contains Boolean variables as well as Boolean constants. However, the standard definition of SAT disallows Boolean constants in its instances.5 Nevertheless, φ′ can be simplified such that the resulting formula contains no Boolean constants. This simplification is performed according to the straightforward Boolean rules: That is, the constant false can be omitted from any clause, but if a clause contains only occurrences of the constant false, then the entire formula simplifies to false. Likewise, if the constant true appears in a clause, then the entire clause can be omitted, and if all clauses are omitted, then the entire formula simplifies to true. Needless to say, if the simplification process yields a Boolean constant, then we may skip the query, and otherwise we just use the simplified form of φ′ as our query.

Other Examples. Reductions analogous to the one used in the proof of Proposition 3.7 can also be presented for other search problems (and not only for NP-complete ones). Two such examples are searching for a 3-coloring of a given graph and searching for an isomorphism between a given pair of graphs (where the first problem is known to be NP-complete and the second problem is believed not to be NP-complete). In both cases, the reduction of the search problem to the corresponding decision problem consists of iteratively extending a prefix of a valid solution, by making suitable queries in order to decide which extension to use. Note, however, that in these two cases, the process of getting rid of constants (representing partial solutions) is more involved.
Specifically, in the case of Graph 3-Colorability (resp., Graph Isomorphism), we need to enforce a partial coloring of a given graph (resp., a partial isomorphism between a given pair of graphs); see Exercises 3.13 and 3.14, respectively.

Reflection. The proof of Proposition 3.7 (as well as the proofs of similar results) consists of two observations.

1. For every relation R in PC, it holds that the search problem of R is reducible to the decision problem of S′R = {⟨x, y′⟩ : ∃y′′ s.t. (x, y′y′′) ∈ R}. Such a reduction is explicit in the proof of Theorem 2.6 and is implicit in the proof of Proposition 3.7.
5 While the problem seems rather technical in the current setting (since it merely amounts to whether or not the definition of SAT allows Boolean constants in its instances), the analogous problem is far from being so technical in other cases (see Exercises 3.13 and 3.14).
2. For specific R ∈ PC (e.g., RSAT), deciding membership in S′R is reducible to deciding membership in SR = {x : ∃y s.t. (x, y) ∈ R}. This is where the specific structure of SAT was used, allowing for a direct and natural transformation of instances of S′R to instances of SR. We comment that if SR is NP-complete, then S′R, which is always in NP, is reducible to SR by the mere hypothesis that SR is NP-complete; this comment is elaborated in the following Section 3.3.2.

For an arbitrary R ∈ PC, deciding membership in S′R is not necessarily reducible to deciding membership in SR. Furthermore, deciding membership in S′R is not necessarily reducible to the search problem of R. (See Exercises 3.18, 3.19, and 3.20.) In general, self-reducibility is a property of the search problem and not of the decision problem implicit in it. Furthermore, under plausible assumptions (e.g., the intractability of factoring), there exist relations R1, R2 ∈ PC having the same implicit decision problem (i.e., {x : R1(x) ≠ ∅} = {x : R2(x) ≠ ∅}) such that R1 is self-reducible but R2 is not (see Exercise 3.21). However, for many natural decision problems, this phenomenon does not arise; that is, for many natural NP-decision problems S, any NP-witness relation associated with S (i.e., R ∈ PC such that {x : R(x) ≠ ∅} = S) is self-reducible. For details, see the following Section 3.3.2.
3.3.2 Self-Reducibility of NP-Complete Problems

In this section, we assume that the reader has heard of NP-completeness. Actually, we only need the reader to know the definition of NP-completeness (i.e., a set S is NP-complete if S ∈ NP and every set in NP is reducible to S). Indeed, the reader may prefer to skip this section and return to it after reading Section 4.1 (or even later).

Recall that, in general, self-reducibility is a property of the search problem R and not of the decision problem implicit in it (i.e., SR = {x : R(x) ≠ ∅}). In contrast, in the special case of NP-complete problems, self-reducibility holds for any witness relation associated with the (NP-complete) decision problem. That is, all search problems that refer to finding NP-witnesses for any NP-complete decision problem are self-reducible.

Theorem 3.8: For every R in PC such that SR is NP-complete, the search problem of R is reducible to deciding membership in SR.

In many cases, as in the proof of Proposition 3.7, the reduction of the search problem to the corresponding decision problem is quite natural. The
following proof presents a generic reduction (which may be "unnatural" in some cases).

Proof: In order to reduce the search problem of R to deciding SR, we compose the following two reductions:

1. A reduction of the search problem of R to deciding membership in S′R = {⟨x, y′⟩ : ∃y′′ s.t. (x, y′y′′) ∈ R}. As stated in Section 3.3.1 (in the paragraph titled "Reflection"), such a reduction is implicit in the proof of Proposition 3.7 (as well as being explicit in the proof of Theorem 2.6).
2. A reduction of S′R to SR. This reduction exists by the hypothesis that SR is NP-complete and the fact that S′R ∈ NP. (Note that we need not assume that this reduction is a Karp-reduction, and furthermore it may be an "unnatural" reduction.)

The theorem follows.
3.4 Digest and General Perspective

Recall that we presented (polynomialtime) reductions as (efficient) algorithms that use functionally specified subroutines. That is, an efficient reduction of problem Π to problem Π' is an efficient algorithm that solves Π while making subroutine calls to any procedure that solves Π'. This presentation fits the “natural” (“positive”) application of such a reduction; that is, combining such a reduction with an efficient implementation of the subroutine (that solves Π'), we obtain an efficient algorithm for solving Π. We note that the existence of a polynomialtime reduction of Π to Π' actually means more than the latter implication. For example, a moderately inefficient algorithm for solving Π' also yields something for Π; that is, if Π' is solvable in time t', then Π is solvable in time t such that t(n) = poly(n) · t'(poly(n)); for example, if t'(n) = n^(log2 n), then t(n) = poly(n)^(1+log2 poly(n)) = n^O(log n). Thus, the existence of a polynomialtime reduction of Π to Π' yields a general upper bound on the time complexity of Π in terms of the time complexity of Π'. We note that tighter relations between the complexity of Π and Π' can be established whenever the reduction satisfies additional properties. For example, suppose that Π is polynomialtime reducible to Π' by a reduction that makes queries of linear length (i.e., on input x each query has length O(|x|)). Then, if Π' is solvable in time t', then Π is solvable in time t such that t(n) = poly(n) · t'(O(n)); for example, if t'(n) = 2^(√n), then t(n) = 2^(O(log n)+√O(n)) = 2^O(√n). We
further note that bounding other complexity measures of the reduction (e.g., its space complexity) allows for relating the corresponding complexities of the problems. In contrast to the foregoing “positive” applications of polynomialtime reductions, the theory of NPcompleteness (presented in Chapter 4) is famous for its “negative” application of such reductions. Let us elaborate. The fact that Π is polynomialtime reducible to Π' means that if solving Π' is feasible, then solving Π is feasible. The direct “positive” application starts with the hypothesis that Π' is feasibly solvable and infers that so is Π. In contrast, the “negative” application uses the contrapositive: It starts with the hypothesis that solving Π is infeasible and infers that the same holds for Π'.
Exercises

Exercise 3.1 (a quiz)
1. What are Cookreductions?
2. What are Karpreductions and Levinreductions?
3. What is the motivation for defining all of these types of reductions?
4. Can any problem in PC be reduced to some problem in N P?
5. What is selfreducibility and how does it relate to the previous question?
6. List five search problems that are selfreducible. (See Exercise 3.15.)
Exercise 3.2 Verify the following properties of Cookreductions:
1. Cookreductions preserve efficient solvability: If Π is Cookreducible to Π' and Π' is solvable in polynomial time, then so is Π.
2. Cookreductions are transitive: If Π is Cookreducible to Π' and Π' is Cookreducible to Π'', then Π is Cookreducible to Π''.
3. Cookreductions generalize efficient decision procedures: If Π is solvable in polynomial time, then it is Cookreducible to any problem Π'.
In continuation of the last item, show that a problem Π is solvable in polynomial time if and only if it is Cookreducible to a trivial problem (e.g., deciding membership in the empty set).

Exercise 3.3 Show that Karpreductions (and Levinreductions) are transitive.

Exercise 3.4 Show that some decision problems are not Karpreducible to their complement (e.g., the empty set is not Karpreducible to {0, 1}∗). A popular exercise of dubious nature is showing that any decision problem in P is Karpreducible to any nontrivial decision problem, where the decision
problem regarding a set S is called nontrivial if S ≠ ∅ and S ≠ {0, 1}∗. It follows that every nontrivial set in P is Karpreducible to its complement.

Exercise 3.5 (Exercise 2.7, reformulated) Show that for every search problem R ∈ PC there exists a polynomial p and a search problem R' ∈ PC that is computationally equivalent to R such that for every x it holds that R'(x) ⊆ {0, 1}^p(|x|). Formulate and prove a similar fact for NPproof systems. Similarly, revisit Exercise 2.9.

Exercise 3.6 (reducing search problems to optimization problems) For every polynomially bounded relation R (resp., R ∈ PC), present a function f (resp., a polynomialtime computable function f) such that the search problem of R is computationally equivalent to the search problem in which given (x, v) one has to find a y ∈ {0, 1}^poly(|x|) such that f(x, y) ≥ v.
Guideline: Let f(x, y) = 1 if (x, y) ∈ R and f(x, y) = 0 otherwise.

Exercise 3.7 In the proof of the second direction of Theorem 3.5, we made the simplifying assumption that f assigns values that are both integral and positive.
1. Justify the aforementioned assumption by showing that for any rationalvalued function f there exists a function g as in the assumption such that R_f (resp., R'_f) is computationally equivalent to R_g (resp., R'_g), where R_f, R'_f and R_g, R'_g are as in Theorem 3.5.
2. Extend the current proof of Theorem 3.5 so that it also applies to the general case in which f is rationalvalued.
Indeed, the two items provide alternative justifications for the simplifying assumption made in the said proof.

Exercise 3.8 (an application of binary search) Show that using ℓ binary queries of the form “is z < v” it is possible to determine the value of an integer z that is a priori known to reside in the interval [0, 2^ℓ − 1].
Guideline: Consider a process that iteratively halves the interval in which z is known to reside.

Exercise 3.9 Prove that N P is reducible to PC.
Guideline: Consider the search problem defined in Eq. (2.1).
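The halving process in the guideline to Exercise 3.8 can be rendered as a minimal Python sketch, where the predicate is_less_than plays the role of the allowed queries “is z < v”:

```python
def locate(ell, is_less_than):
    """Determine z in [0, 2**ell - 1] using ell queries of the form
    "is z < v", by iteratively halving the candidate interval."""
    lo, hi = 0, 2 ** ell - 1
    for _ in range(ell):
        mid = (lo + hi + 1) // 2       # split point of the current interval
        if is_less_than(mid):          # query: is z < mid?
            hi = mid - 1               # z lies in the lower half
        else:
            lo = mid                   # z lies in the upper half
    assert lo == hi                    # after ell halvings, one candidate remains
    return lo
```

Each query halves the interval, so ℓ queries shrink an interval of 2^ℓ candidates down to a single value.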
Exercise 3.10 Prove that for any R, the decision problem of SR is easily reducible to the search problem for R, and that if R is in PC then SR is in N P.
Guideline: Consider a reduction that invokes the search oracle and answers 1 if and only if the oracle returns some string (rather than the “no solution” symbol).

Exercise 3.11 (Cookreductions that make a single query) Let M be a polynomialtime oracle machine that makes at most one query. Show that the computation of M can be represented by two polynomialtime computable functions f and g such that M^F(x) = g(x, F(f(x))), where M^F(x) denotes the output of M on input x when given oracle access to the function F. Discuss the relationship between such Cookreductions and Karpreductions (resp., Levinreductions).

Exercise 3.12 Prove that if R ∈ PC is reducible to SR by a Cookreduction that makes a logarithmic number of queries, then R ∈ PF. Thus, selfreducibility for problems in PC \ PF requires making more than logarithmically many queries. More generally, prove that if R ∈ PC \ PF is Cookreducible to any decision problem, then this reduction makes more than a logarithmic number of queries.
Guideline: Note that the oracle answers can be emulated by trying all possibilities, and that (for R ∈ PC) the correctness of the output of the oracle machine can be efficiently tested.

Exercise 3.13 Show that the standard search problem of Graph 3Colorability6 is selfreducible, where this search problem consists of finding a 3coloring for a given input graph.
Guideline: Iteratively extend the current prefix of a 3coloring of the graph by making adequate oracle calls to the decision problem of Graph 3Colorability. Specifically, encode the question of whether or not (χ1, . . . , χt) ∈ {1, 2, 3}^t is a prefix of a 3coloring of the graph G as a query regarding the 3colorability of an auxiliary graph G'. Note that we merely need to check whether G' has a 3coloring in which the equalities and inequalities induced by the (prefix of the) coloring (χ1, . . . , χt) hold.
This can be done by adequate gadgets (e.g., inequality is enforced by an edge between the corresponding vertices, whereas equality is enforced by an adequate subgraph that includes the relevant vertices as well as auxiliary vertices).

Exercise 3.14 Show that the standard search problem of Graph Isomorphism7 is selfreducible, where this search problem consists of finding an isomorphism between a given pair of graphs.
6 7
See Appendix A.1. See Appendix A.1.
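The prefix-extension strategy of Exercise 3.13 can be sketched in Python. This is only a toy rendering: instead of constructing the auxiliary graph G' demanded by the exercise, the decision oracle is simulated here by brute force, so that the self-reduction itself can be seen in isolation.

```python
from itertools import product

def colorable_with_prefix(n, edges, prefix):
    """Brute-force stand-in for the decision oracle: does some 3-coloring
    of the n-vertex graph extend the given prefix?  (In the reduction of
    Exercise 3.13 this question would instead be encoded as plain
    3-colorability of an auxiliary graph G'.)"""
    k = len(prefix)
    for tail in product((1, 2, 3), repeat=n - k):
        col = prefix + list(tail)
        if all(col[u] != col[v] for (u, v) in edges):
            return True
    return False

def find_3coloring(n, edges, oracle=colorable_with_prefix):
    """Self-reduction: extend a coloring prefix one vertex at a time,
    using at most 3 oracle calls per vertex."""
    if not oracle(n, edges, []):       # graph is not 3-colorable at all
        return None
    coloring = []
    for _ in range(n):
        for c in (1, 2, 3):            # some color must keep the prefix extendable
            if oracle(n, edges, coloring + [c]):
                coloring.append(c)
                break
    return coloring
```

Replacing the brute-force oracle by actual queries to a decision procedure for 3-colorability (on the auxiliary graphs G') turns this sketch into the polynomial-time reduction of the exercise.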
Guideline: Iteratively extend the current prefix of an isomorphism between the two Nvertex graphs by making adequate oracle calls to the decision problem of Graph Isomorphism. Specifically, encode the question of whether or not (π1, . . . , πt) ∈ [N]^t is a prefix of an isomorphism between G1 = ([N], E1) and G2 = ([N], E2) as a query regarding isomorphism between two auxiliary graphs G'1 and G'2. This can be done by attaching adequate gadgets to pairs of vertices that we wish to be mapped to each other (by the isomorphism). For example, we may connect each of the vertices in the i-th pair to an auxiliary star consisting of (N + i) vertices.

Exercise 3.15 List five search problems that are selfreducible.
Guideline: Note that three such problems were mentioned in Section 3.3.1. Additional examples may include any NPcomplete search problem (see Section 3.3.2) as well as any problem in PF.

Exercise 3.16 (downward selfreducibility) We say that a set S is downward selfreducible if there exists a Cookreduction of S to itself that only makes queries that are each shorter than the reduction’s input (i.e., if on input x the reduction makes the query q then |q| < |x|).8
1. Show that SAT is downward selfreducible with respect to a natural encoding of CNF formulae. Note that this encoding should have the property that instantiating a variable in a formula results in a shorter formula. A harder exercise consists of showing that Graph 3Colorability is downward selfreducible with respect to some reasonable encoding of graphs. Note that this encoding has to be selected carefully.
Guideline: For the case of SAT use the fact that φ ∈ SAT if and only if either φ0 ∈ SAT or φ1 ∈ SAT, where φσ denotes the formula φ with the first variable instantiated to σ. For the case of Graph 3Colorability, partition all possible 3colorings according to whether or not they assign the first pair of unconnected vertices the same color.
Enforce an inequality constraint by connecting the two vertices, and enforce an equality constraint by combining the two vertices (rather than by connecting them via a gadget that contains auxiliary vertices as suggested in the guideline to Exercise 3.13). Use an encoding that guarantees that any (n + 1)vertex graph has a longer description than any nvertex graph, and that adding edges decreases the description length.9 8 9
Note that on some instances, the reduction may make no queries at all. (This option prevents a possible nonviability of the definition due to very short instances.)
For example, encode any nvertex graph that has m edges as an (n^3 − 2m log2 n)-bit long string that contains the (adequately padded) list of all pairs of unconnected vertices.
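The SAT part of the guideline to Exercise 3.16 can be sketched as a recursive procedure. This is a toy sketch under an assumed encoding: a CNF formula is a list of clauses, each a list of signed variable indices, so that instantiating a variable indeed yields a strictly shorter formula, as the exercise requires.

```python
def instantiate(cnf, var, val):
    """Set variable `var` to boolean `val` in a CNF formula
    (each clause is a list of signed variable indices)."""
    out = []
    for clause in cnf:
        if (var if val else -var) in clause:
            continue                          # clause satisfied: drop it
        out.append([l for l in clause if abs(l) != var])
    return out

def sat(cnf):
    """Downward self-reduction for SAT: phi is satisfiable iff
    phi with some variable set to 0, or to 1, is satisfiable;
    both recursive queries concern strictly shorter formulae."""
    if not cnf:
        return True                           # no clauses: trivially satisfiable
    if any(len(c) == 0 for c in cnf):
        return False                          # empty clause: unsatisfiable
    var = abs(cnf[0][0])                      # branch on some occurring variable
    return sat(instantiate(cnf, var, False)) or sat(instantiate(cnf, var, True))
```

Combined with the prefix-extension idea of Section 3.3.1, the same instantiation step also extracts a satisfying assignment, one variable at a time.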
2. Suppose that S is downward selfreducible by a reduction that outputs the disjunction of the oracle answers.10 Show that in this case, S is characterized by a witness relation R ∈ PC (i.e., S = {x : R(x) ≠ ∅}) that is selfreducible (i.e., the search problem of R is Cookreducible to S). Needless to say, it follows that S ∈ N P.
Guideline: Define R such that (x0, x1, . . . , xt) is in R if xt ∈ S ∩ {0, 1}^O(1) and, for every i ∈ {0, 1, . . . , t − 1}, on input xi the selfreduction makes a set of queries that contains the string xi+1. Prove that if x0 ∈ S then a sequence (x0, x1, . . . , xt) ∈ R exists (by forward induction, which selects for each xi ∈ S a query xi+1 in S). Next, prove that (x0, x1, . . . , xt) ∈ R implies x0 ∈ S (by backward induction from xt ∈ S, which infers from the hypothesis xi+1 ∈ S that xi is in S). Finally, prove that R ∈ PC (by noting that t ≤ |x0|).
Note that the notion of downward selfreducibility may be generalized in some natural ways. For example, we may also say that S is downward selfreducible in case it is computationally equivalent via Karpreductions to some set that is downward selfreducible (in the foregoing strict sense). Note that Part 2 still holds.

Exercise 3.17 (compressing Karpreductions) In continuation of Exercise 3.16, we consider downward selfreductions that make at most one query (i.e., Cookreductions of decision problems to themselves that make at most one query such that this query is shorter than the reduction’s input). Note that compressing Karpreductions are a special case, where the Karpreduction f is called compressing if |f(x)| < |x| holds for all but finitely many x’s. Prove that if S is downward selfreducible by a Cookreduction that makes at most one query, then S ∈ P.
Guideline: Consider first the special case of compressing Karpreductions.
Observe that for every x and i (which may depend on x), it holds that x ∈ S if and only if f^i(x) ∈ S, where f^i(x) denotes the Karpreduction f iterated i times. When extending the argument to the general case, use Exercise 3.11.

Exercise 3.18 (NPproblems that are not selfreducible)
1. Prove that if a search problem R is not selfreducible then (1) R ∉ PF and (2) the set S'R = {⟨x, y'⟩ : ∃y'' s.t. (x, y'y'') ∈ R} is not Cookreducible to SR = {x : ∃y s.t. (x, y) ∈ R}.
10
Note that this condition holds for both problems considered in the previous item.
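The special case in the guideline to Exercise 3.17 can be sketched in Python: a compressing Karp-reduction of S to itself is simply iterated until the instance is of constant size, at which point membership is decided directly. The toy reduction parity_step in the usage below is hypothetical, invented here for a trivially decidable set (strings with an odd number of 1's) just to exercise the iteration.

```python
def decide_via_compression(x, f, base_decide, base_len=1):
    """Decide membership in S, given a compressing Karp-reduction f of S
    to itself (|f(x)| < |x| for all long enough x): iterate f until the
    instance is short enough to decide directly via base_decide (which
    plays the role of a constant-size lookup table)."""
    while len(x) > base_len:
        y = f(x)
        assert len(y) < len(x), "f must be length-decreasing"
        x = y                      # membership is preserved: x in S iff f(x) in S
    return base_decide(x)
```

Since the instance length strictly decreases, the loop runs at most |x| times, each iteration taking polynomial time; hence S ∈ P, as the exercise asserts.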
2. Assuming that P ≠ N P ∩ coN P, where coN P =def {{0, 1}∗ \ S : S ∈ N P}, show that there exists a search problem that is in PC but is not selfreducible.
Guideline: Given S ∈ (N P ∩ coN P) \ P, present relations R1, R2 ∈ PC such that S = {x : R1(x) ≠ ∅} = {x : R2(x) = ∅}. Then, consider the relation R = {(x, 1y) : (x, y) ∈ R1} ∪ {(x, 0y) : (x, y) ∈ R2}, and prove that R ∈ PC \ PF. Noting that SR = {0, 1}∗, infer that R is not selfreducible. (Actually, R = R1 ∪ R2 will work, too.)

Exercise 3.19 (extending generic solutions’ prefixes versus PC and PF) In contrast to what one may guess, extending solutions’ prefixes (equiv., deciding membership in S'R = {⟨x, y'⟩ : ∃y'' s.t. (x, y'y'') ∈ R}) may not be easy even if finding solutions is easy (i.e., R ∈ PF). Specifically, assuming that P ≠ N P, present a search problem R in PC ∩ PF such that deciding S'R is not reducible to the search problem of R.
Guideline: Consider the relation R = {(x, 0x) : x ∈ {0, 1}∗} ∪ {(x, 1y) : (x, y) ∈ R'}, where R' is an arbitrary relation in PC \ PF, and note that R ∈ PC. Prove that R ∈ PF but S'R ∉ P.

Exercise 3.20 In continuation of Exercise 3.18, present a natural search problem R in PC such that if factoring integers is intractable, then the search problem R (and so also S'R) is not reducible to SR.
Guideline: As in Exercise 2.6, consider the relation R such that (n, q) ∈ R if the integer q is a nontrivial divisor of the integer n. Use the fact that the set of prime numbers is in P.

Exercise 3.21 In continuation of Exercises 3.18 and 3.20, show that under suitable assumptions there exist relations R1, R2 ∈ PC having the same implicit decision problem (i.e., {x : R1(x) ≠ ∅} = {x : R2(x) ≠ ∅}) such that R1 is selfreducible but R2 is not. Specifically:
1. Prove the existence of such relations assuming that P ≠ N P ∩ coN P;
2. Present natural relations assuming the intractability of factoring. Hint: see Exercise 2.6.
Exercise 3.22 Using Theorem 3.2, provide an alternative (presentation of the) proof of Theorem 3.8 without referring to the set S'R = {⟨x, y'⟩ : ∃y'' s.t. (x, y'y'') ∈ R}.11
11
Indeed, this is merely a matter of presentation, since the proof of Theorem 3.2 refers to S'R. Thus, when using Theorem 3.2, the decision problem (in N P) to which we reduce R is arbitrary only from the perspective of the theorem’s statement (but not from the perspective of its proof).
Figure 3.3. The three proofs of Theorem 3.8: The original proof of Theorem 3.8 is depicted on the left, the outline of Exercise 3.22 is in the middle, and the outline of Exercise 3.23 is on the right. The upper ellipses represent the class PC, and the lower ellipses represent N P.
Guideline: Theorem 3.2 implies that R is Cookreducible to some decision problem in N P, which in turn is reducible to SR (due to the N Pcompleteness of SR).

Exercise 3.23 (Theorem 3.8, revisited) In continuation of Exercise 3.22, using Proposition 3.7 and the fact that RSAT is PCcomplete (as per Definition 4.2), provide an alternative proof of Theorem 3.8 (again, without referring to the set S'R). See Figure 3.3.
Guideline: Reduce the search problem of R to deciding SR, by composing the following three reductions: (1) a reduction of the search problem of R to the search problem of RSAT, (2) a reduction of the search problem of RSAT to SAT, and (3) a reduction of SAT to SR.
4 NPCompleteness
Overview: In light of the difficulty of settling the PvsNP Question, when faced with a hard problem H in NP, we cannot expect to prove that H is not in P (unconditionally), because this would imply P ≠ N P. The best we can expect is a conditional proof that H is not in P, based on the assumption that NP is different from P. The contrapositive is proving that if H is in P, then so is any problem in NP (i.e., NP equals P). One possible way of proving such an assertion is showing that any problem in NP is polynomialtime reducible to H. This is the essence of the theory of NPcompleteness. In this chapter we prove the existence of NPcomplete problems, that is, the existence of individual problems that “effectively encode” a wide class of seemingly unrelated problems (i.e., all problems in NP). We also prove that deciding the satisfiability of a given Boolean formula is NPcomplete. Other NPcomplete problems include deciding whether a given graph is 3colorable and deciding whether a given graph contains a clique of a given size. The core of establishing the NPcompleteness of these problems is showing that each of them can encode any other problem in NP. Thus, these demonstrations provide a method of encoding instances of any NP problem as instances of the target NPcomplete problem. Organization. We start by defining NPcomplete problems (see Section 4.1) and demonstrating their existence (see Section 4.2). Next, in Section 4.3, we present several natural NPcomplete problems, including circuit and formula satisfiability (i.e., CSAT and SAT), set cover, and Graph 3Colorability. In Section 4.4, assuming that P ≠ N P, we prove the existence of NP problems that are neither in P nor NPcomplete.
Teaching Notes

We are sure that many students have heard of NPcompleteness before, but we suspect that most of them have missed some important conceptual points. Specifically, we fear that they have missed the point that the mere existence of NPcomplete problems is amazing (let alone that these problems include natural ones such as SAT). We believe that this situation is a consequence of presenting the detailed proof of Cook’s Theorem right after defining NPcompleteness. In contrast, we suggest starting with a proof that Bounded Halting is NPcomplete. We suggest establishing the NPcompleteness of SAT by a reduction from the circuit satisfaction problem (CSAT), after establishing the NPcompleteness of the latter. Doing so allows us to decouple two important parts of the proof of the NPcompleteness of SAT: the emulation of Turing machines by circuits and the emulation of circuits by formulae with auxiliary variables. In view of the importance that we attach to search problems, we also address the NPcompleteness of the corresponding search problems. While it could have been more elegant to derive the NPcompleteness of the various decision problems by an immediate corollary to the NPcompleteness of the corresponding search problems (see Exercise 4.2), we chose not to do so. Instead, we first derive the standard results regarding decision problems, and next augment this treatment in order to derive the corresponding results regarding search problems. We believe that our choice will better serve most students. The purpose of Section 4.3.2 is to expose the students to a sample of NPcompleteness results and proof techniques. We believe that this traditional material is insightful, but one may skip it if pressed for time. We mention that the reduction presented in the proof of Proposition 4.10 is not the “standard” one, but is rather adapted from the FGLSSreduction [10].
This is done in anticipation of the use of the FGLSSreduction in the context of the study of the complexity of approximation (cf., e.g., [15] or [13, Sec. 10.1.1]). Furthermore, although this reduction creates a larger graph, we find it clearer than the “standard” reduction. Section 4.3.5 provides a highlevel discussion of some positive applications of NPcompleteness. The core of this section is a brief description of three types of probabilistic proof systems and the role of NPcompleteness in establishing three fundamental results regarding them. For further details on probabilistic proof systems, we refer the interested reader to [13, Chap. 9]. Since probabilistic proof systems provide natural extensions of the notion of an NPproof system, which underlies our definition of N P, we recommend Section 4.3.5 (with a possible augmentation based on [13, Chap. 9]) as the most appropriate choice of advanced material that may accompany the basic material covered in this book.
This chapter contains some additional advanced material that is not intended for presentation in class. One such example is the assertion of the existence of problems in NP that are neither in P nor NPcomplete (i.e., Theorem 4.12). Indeed, we recommend either stating Theorem 4.12 without a proof or merely presenting the proof idea. Another example is Section 4.5, which seems unsuitable for most undergraduate students. Needless to say, Section 4.5 is definitely inappropriate for presentation in an undergraduate class, but it may be useful for guiding a discussion in a small group of interested students.
4.1 Definitions

Loosely speaking, a problem in NP is called NPcomplete if any efficient algorithm for it can be converted into an efficient algorithm for any other problem in NP. Hence, if NP is different from P, then no NPcomplete problem can be in P. The aforementioned conversion of an efficient algorithm for one NPproblem1 into efficient algorithms for other NPproblems is actually performed by a reduction. Thus, a problem (in NP) is NPcomplete if any problem in NP is efficiently reducible to it, which means that each individual NPcomplete problem “encodes” all problems in NP. The standard definition of NPcompleteness refers to decision problems, but we will also present a definition of NPcomplete (or rather PCcomplete) search problems. In both cases, NPcompleteness of a problem Π combines two conditions:
1. Π is in the class (i.e., Π is in N P or PC, depending on whether Π is a decision or a search problem).
2. Each problem in the class is reducible to Π. This condition is called NPhardness.
Although a perfectly good definition of NPhardness could have allowed arbitrary Cookreductions, it turns out that Karpreductions (resp., Levinreductions) suffice for establishing the NPhardness of all natural NPcomplete decision (resp., search) problems. Consequently, NPcompleteness is commonly defined using this restricted notion of a polynomialtime reduction.
Definition 4.1 (NPcompleteness of decision problems, restricted notion): A set S is N Pcomplete if it is in N P and every set in N P is Karpreducible to S.
1
I.e., a problem in NP.
A set is N Phard if every set in N P is Karpreducible to it (i.e., the class N P is Karpreducible to it). Indeed, there is no reason to insist on Karpreductions (rather than using arbitrary Cookreductions), except that the restricted notion suffices for all known demonstrations of NPcompleteness and is easier to work with. An analogous definition applies to search problems.
Definition 4.2 (NPcompleteness of search problems, restricted notion): A binary relation R is PCcomplete if it is in PC and every relation in PC is Levinreducible to R.
Throughout the book, we will sometimes abuse the terminology and refer to search problems as NPcomplete (rather than PCcomplete). Likewise, we will say that a search problem is NPhard (rather than PChard) if every relation in PC is Levinreducible to it. Note that if R is PCcomplete, then SR is N Pcomplete, where SR = {x : ∃y s.t. (x, y) ∈ R} (see Exercise 4.2). We stress that the mere fact that we have defined a property (i.e., NPcompleteness) does not mean that there exist objects that satisfy this property. It is indeed remarkable that NPcomplete problems do exist. Such problems are “universal” in the sense that efficiently solving them allows for efficiently solving any other (reasonable) problem (i.e., problems in NP).
4.2 The Existence of NPComplete Problems

We suggest not to confuse the mere existence of NPcomplete problems, which is remarkable by itself, with the even more remarkable existence of “natural” NPcomplete problems. The following proof delivers the first message and also focuses on the essence of NPcompleteness, rather than on more complicated technical details. The essence of NPcompleteness is that a single computational problem may “effectively encode” a wide class of seemingly unrelated problems.
Theorem 4.3: There exist NPcomplete relations and sets.
Proof: The proof (as well as any other NPcompleteness proof) is based on the observation that some decision problems in N P (resp., search problems in PC) are “rich enough” to encode all decision problems in N P (resp., all search problems in PC). This fact is most obvious for the “generic” decision and search problems, denoted Su and Ru (and defined next), which are used to derive the simplest proof of the current theorem. We consider the following relation Ru and the decision problem Su implicit in Ru (i.e., Su = {x : ∃y s.t. (x, y) ∈ Ru}). Both problems refer to the same
type of instances, which in turn have the form x̄ = ⟨M, x, 1^t⟩, where M is a description of a (standard deterministic) Turing machine, x is a string, and t is a natural number. The number t is given in unary (rather than in binary) in order to guarantee that bounds of the form poly(t) are polynomial (rather than exponential) in the instance’s length. (This implies that various complexity measures (e.g., time and length) that can be upperbounded by a polynomial in t yield upper bounds that are polynomial in the length of the instance (i.e., |⟨M, x, 1^t⟩|, which is linearly related to |M| + |x| + t).) A solution to the instance x̄ = ⟨M, x, 1^t⟩ (of Ru) is a string y (of length at most t)2 such that M accepts the input pair (x, y) within t steps.
Definition. The relation Ru consists of pairs (⟨M, x, 1^t⟩, y) such that M
accepts the input pair (x, y) within t steps, where |y| ≤ t.
The corresponding set Su =def {x : ∃y s.t. (x, y) ∈ Ru} consists of triples
⟨M, x, 1^t⟩ such that machine M accepts some input of the form (x, ·) within t steps. It is easy to see that Ru is in PC and that Su is in N P. Indeed, Ru is recognizable by a universal Turing machine, which on input (⟨M, x, 1^t⟩, y) emulates (t steps of) the computation of M on (x, y). Note that this emulation can be conducted in poly(|M| + |x| + t) = poly(|(⟨M, x, 1^t⟩, y)|) steps, and recall that Ru is polynomially bounded (by its very definition). (The fact that Su ∈ N P follows similarly.)3 We comment that u indeed stands for universal (i.e., universal machine), and the proof extends to any reasonable model of computation (which has adequate universal machines). We now turn to show that Ru and Su are NPhard in the adequate sense (i.e., Ru is PChard and Su is N Phard). We first show that any set in N P is Karpreducible to Su. Let S be a set in N P and let us denote its witness relation by R; that is, R is in PC and x ∈ S if and only if there exists y such that (x, y) ∈ R. Let pR be a polynomial bounding the length of solutions in R (i.e., |y| ≤ pR(|x|) for every (x, y) ∈ R), let MR be a polynomialtime machine deciding membership (of alleged (x, y) pairs) in R, and let tR be a polynomial bounding its running time. Then, the desired Karpreduction maps an instance
2
3
Instead of requiring that |y| ≤ t, one may require that M is “canonical” in the sense that it reads its entire input before halting. Thus, if |y| > t, then such a canonical machine M does not halt (let alone accept) within t steps when given the input pair (x, y). Alternatively, Su ∈ N P follows from Ru ∈ PC, because for every R ∈ PC it holds that SR = {x : ∃y s.t. (x, y) ∈ R} is in N P.
x (for S) to the instance ⟨MR, x, 1^tR(|x|+pR(|x|))⟩ (for Su); that is,

x ↦ f(x) =def ⟨MR, x, 1^tR(|x|+pR(|x|))⟩.    (4.1)
Note that this mapping can be computed in polynomial time, and that x ∈ S if and only if f(x) = ⟨MR, x, 1^tR(|x|+pR(|x|))⟩ ∈ Su. Details follow. First, note that the mapping f does depend (of course) on S, and so it may depend on the fixed objects MR, pR and tR (which depend on S). Thus, computing f on input x calls for printing the fixed string MR, copying x, and printing a number of 1’s that is a fixed polynomial in the length of x. Hence, f is polynomialtime computable. Second, recall that x ∈ S if and only if there exists y such that |y| ≤ pR(|x|) and (x, y) ∈ R. Since MR accepts (x, y) ∈ R within tR(|x| + |y|) steps, it follows that x ∈ S if and only if there exists y such that |y| ≤ pR(|x|) and MR accepts (x, y) within tR(|x| + |y|) steps.4 It follows that x ∈ S if and only if f(x) ∈ Su. We now turn to the search version. For reducing the search problem of any R ∈ PC to the search problem of Ru, we use essentially the same reduction. On input an instance x (for R), we make the query ⟨MR, x, 1^tR(|x|+pR(|x|))⟩ to the search problem of Ru and return whatever the latter returns. Note that if x ∉ S, then the answer will be “no solution,” whereas for every x and y it holds that (x, y) ∈ R if and only if (⟨MR, x, 1^tR(|x|+pR(|x|))⟩, y) ∈ Ru. Thus, a Levinreduction of R to Ru consists of the pair of functions (f, g), where f is the foregoing Karpreduction and g(x, y) = y. Note that, indeed, for every (f(x), y) ∈ Ru, it holds that (x, g(x, y)) = (x, y) ∈ R.
Digest: Generic Reductions. The reduction presented in the proof of Theorem 4.3 is called “generic” because it (explicitly) refers to any (generic) NPproblem. That is, we actually presented a scheme for the design of reductions from any set S in N P (resp., relation R in PC) to the set Su (resp., relation Ru). When plugging in a specific set S (resp., relation R), or rather by providing the corresponding machine MR and polynomials pR, tR, we obtain a specific Karpreduction f (as described in the proof).
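The reduction of Eq. (4.1) can be rendered as a toy Python sketch. The modeling here is an assumption made for illustration: a Turing machine M_R is represented by a Python predicate that takes a step budget, and the unary time bound 1^t is an explicit string of 1's.

```python
def karp_reduction_to_Su(x, M_R, p_R, t_R):
    """Map an instance x of S to the instance <M_R, x, 1^t> of S_u,
    where t = t_R(|x| + p_R(|x|)), as in Eq. (4.1)."""
    t = t_R(len(x) + p_R(len(x)))
    return (M_R, x, "1" * t)           # t is encoded in unary

def in_Ru(instance, y):
    """Check whether (instance, y) is in R_u: the solution y must have
    length at most t, and the machine M must accept (x, y) within t steps."""
    M, x, unary_t = instance
    t = len(unary_t)
    return len(y) <= t and M(x, y, budget=t)
```

Because t appears in unary, the length of the produced instance is polynomial in |x|, which is exactly what places R_u in PC (and S_u in N P).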
Note that in providing not only a Karpreduction of each S ∈ N P to Su but also a scheme for deriving such reductions, we have established more than is required by the definition of NPcompleteness.5
This presentation assumes that pR and tR are monotonically nondecreasing, which holds without loss of generality. Advanced comment: We comment that it is hard to conceive of a demonstration of NPcompleteness that does not yield a scheme for the design of reductions from any given
Digest: the Role of 1^t in the Definition of Ru. The role of including 1^t in the description of the problem instance is to allow placement of Ru in PC (resp., Su in N P). In contrast, consider the relation R'u that consists of pairs (⟨M, x, t⟩, y) such that M accepts (x, y) within t steps. Indeed, the difference between R'u and Ru is that in Ru the time bound t appears in unary notation, whereas in R'u it appears in binary. Note that although R'u is PChard (see Exercise 4.3), it is not in PC (because membership in R'u cannot be decided in polynomial time (see [13, §4.2.1.2])). Going even further, we note that omitting t altogether from the problem instance yields a search problem that is not solvable at all. That is, consider the relation RH =def {(⟨M, x⟩, y) : M(x, y) = 1} (which is related to the Halting Problem). Indeed, the search problem of any relation in PC is Karpreducible to the search problem of RH, but RH is not solvable at all (i.e., there exists no algorithm that halts on every input such that on input x̄ = ⟨M, x⟩ the algorithm outputs a string y in RH(x̄) if such a y exists).
Bounded Halting and Non-Halting. We note that the problem shown to be NP-complete in the proof of Theorem 4.3 is related to the following two problems, called Bounded Halting and Bounded Non-Halting. Fixing any programming language, the instance to each of these problems consists of a program π and a time bound t (presented in unary).

1. The decision version of Bounded Halting consists of determining whether or not there exists an input (of length at most t) on which the program π halts in t steps, whereas the search problem consists of finding such an input.
2. The decision version of Bounded Non-Halting consists of determining whether or not there exists an input (of length at most t) on which the program π does not halt in t steps, whereas the search problem consists of finding such an input.

It is easy to prove that both problems are NP-complete (see Exercise 4.4). Note that the two (decision) problems are not complementary (i.e., (π, 1^t) may be a yes-instance of both decision problems).6
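The key point in both problems is that a candidate input y can be checked in polynomial time by merely simulating π on y for t steps. The following Python sketch is an illustration only (an assumption, not the text's formalism): "programs" are modeled as generators that yield once per step, standing in for the fixed programming language mentioned above.

```python
# Toy sketch of the NP-verifiers for Bounded Halting / Bounded Non-Halting.
# "Programs" are modeled as Python generators that yield once per step;
# this stands in for the fixed programming language mentioned above.

def halts_within(program, x, t):
    """Does program(x) halt within t steps? (Decidable: just simulate.)"""
    steps = 0
    for _ in program(x):
        steps += 1
        if steps > t:
            return False  # still running after t steps
    return True

def toy_program(x):
    # runs for as many steps as the number of 1's in x, then halts
    for bit in x:
        if bit == "1":
            yield

# witness check for Bounded Halting: y is an input on which pi halts
assert halts_within(toy_program, "111", 5) is True
# witness check for Bounded Non-Halting: y is an input on which pi does not halt
assert halts_within(toy_program, "11111111", 5) is False
```

The simulation budget of t steps is what places both decision problems in NP; without the (unary) bound t, no such check is possible, as discussed next.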
5 (continued) NP-problem to the target NP-complete problem. On the other hand, our scheme requires knowledge of a machine MR and polynomials pR, tR that correspond to the given relation R, rather than only knowledge of the relation R itself. But, again, it is hard to conceive of an alternative (i.e., how is R to be represented to us otherwise?).
6 Indeed, (π, 1^t) cannot be a no-instance of both decision problems, but this does not make the problems complementary. In fact, the two decision problems yield a three-way partition of the
The decision version of Bounded Non-Halting refers to a fundamental computational problem in the area of program verification, specifically, to the problem of determining whether a given program halts within a given time bound on all inputs of a given length.7 We have mentioned Bounded Halting because it is often referred to in the literature, but we believe that Bounded Non-Halting is much more relevant to the project of program verification (because one seeks programs that halt on all inputs (i.e., no-instances of Bounded Non-Halting), rather than programs that halt on some input).

Reflection. The fact that Bounded Non-Halting is probably intractable (i.e., is intractable provided that P ≠ NP) is even more relevant to the project of program verification than the fact that the Halting Problem is undecidable. The reason is that the latter problem (as well as other related undecidable problems) refers to arbitrarily long computations, whereas the former problem refers to an explicitly bounded number of computational steps. Specifically, Bounded Non-Halting is concerned with the existence of an input that causes the program to violate a certain condition (i.e., halting) within a given time bound.

In light of the foregoing discussion, the common practice of "bashing" Bounded (Non-)Halting as an "unnatural" problem seems very odd at an age in which computer programs play such a central role. (Nevertheless, we will use the term "natural" in this traditional and odd sense in the next title, which actually refers to natural computational problems that seem unrelated to computation.)
4.3 Some Natural NP-Complete Problems

Having established the mere existence of NP-complete problems, we now turn to proving the existence of NP-complete problems that do not (explicitly) refer to computation in the problem's definition. We stress that thousands of such problems are known (and a list of several hundreds can be found in [11]).
6 (continued) instances (π, 1^t): (1) pairs (π, 1^t) such that for every input x (of length at most t) the computation of π(x) halts within t steps, (2) pairs (π, 1^t) for which such halting occurs on some inputs but not on all inputs, and (3) pairs (π, 1^t) such that there exists no input (of length at most t) on which π halts in t steps. Note that instances of type (1) are exactly the no-instances of Bounded Non-Halting, whereas instances of type (3) are exactly the no-instances of Bounded Halting.
7 The length parameter need not equal the time bound. Indeed, a more general version of the problem refers to two bounds, ℓ and t, and to whether the given program halts within t steps on each possible ℓ-bit input. It is easy to prove that the problem remains NP-complete also in the case that the instances are restricted to having parameters ℓ and t such that t = p(ℓ), for any fixed polynomial p (e.g., p(n) = n², rather than p(n) = n as used in the main text).
We will prove that deciding the satisfiability of Boolean formulae is NP-complete (i.e., Cook's Theorem), and also present some combinatorial problems that are NP-complete. This presentation is aimed at providing a (small) sample of natural NP-completeness results, as well as some tools toward proving NP-completeness of new problems of interest. We start by making a comment regarding the latter issue.

The reduction presented in the proof of Theorem 4.3 is called "generic" because it (explicitly) refers to any (generic) NP-problem. That is, we actually presented a scheme for the design of reductions from any desired NP-problem to the single problem proved to be NP-complete. Indeed, in doing so, we have followed the definition of NP-completeness. However, once we know some NP-complete problems, a different route is open to us. We may establish the NP-completeness of a new problem by reducing a known NP-complete problem to the new problem. This alternative route is indeed a common practice, and it is based on the following simple proposition.

Proposition 4.4: If an NP-complete problem Π is reducible to some problem Π′ in NP, then Π′ is NP-complete. Furthermore, reducibility via Karp-reductions (resp., Levin-reductions) is preserved.

That is, if an NP-complete decision problem S is Karp-reducible to a decision problem S′ ∈ NP, then S′ is NP-complete. Similarly, if a PC-complete search problem R is Levin-reducible to a search problem R′ ∈ PC, then R′ is PC-complete.

Proof: The proof boils down to asserting the transitivity of reductions. Specifically, the NP-hardness of Π means that every problem in NP is reducible to Π, which in turn is reducible to Π′ (by the hypothesis). Thus, by transitivity of reduction (see Exercise 3.3), every problem in NP is reducible to Π′, which means that Π′ is NP-hard and the proposition follows.
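The transitivity used in this proof is just function composition. The following Python sketch (an illustration, with toy numeric "instances" chosen by us) shows the composition of two Karp-reductions:

```python
# Transitivity of Karp-reductions, as used in the proof of Proposition 4.4:
# if f reduces S to S' and f_prime reduces S' to S'',
# then their composition reduces S to S''.

def compose(f, f_prime):
    """Compose two (polynomial-time) Karp-reductions."""
    return lambda x: f_prime(f(x))

# toy illustration: S = even numbers, S' = S'' = multiples of 4;
# x is even iff 2x is divisible by 4, and the second reduction is trivial
f = lambda x: 2 * x
f_prime = lambda x: x
h = compose(f, f_prime)
assert h(6) % 4 == 0       # 6 in S  ->  h(6) in S''
assert h(3) % 4 != 0       # 3 not in S  ->  h(3) not in S''
```

Since the composition of two polynomials is a polynomial, the composed mapping is again polynomial-time computable, which is the formal content of Exercise 3.3.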
4.3.1 Circuit and Formula Satisfiability: CSAT and SAT

We consider two related computational problems, CSAT and SAT, which refer (in the decision version) to the satisfiability of Boolean circuits and formulae, respectively. (We refer the reader to the definition of Boolean circuits, formulae, and CNF formulae (see §1.4.1.1 and §1.4.3.1).) We suggest establishing the NP-completeness of SAT by a reduction from the circuit satisfaction problem (CSAT), after establishing the NP-completeness of the latter. Doing so allows the decoupling of two important parts of the proof of the NP-completeness of SAT: the emulation of Turing machines by circuits and the emulation of circuits by formulae with auxiliary variables.
4.3.1.1 The NP-Completeness of CSAT

Recall that (bounded fan-in) Boolean circuits are directed acyclic graphs with internal vertices, called gates, labeled by Boolean operations (of arity either 2 or 1), and external vertices called terminals that are associated with either inputs or outputs. When setting the inputs of such a circuit, all internal nodes are assigned values in the natural way, and this yields a value to the output(s), called an evaluation of the circuit on the given input. The evaluation of circuit C on input z is denoted C(z). We focus on circuits with a single output, and let CSAT denote the set of satisfiable Boolean circuits; that is, a circuit C is in CSAT if there exists an input z such that C(z) = 1. We also consider the related relation RCSAT = {(C, z) : C(z) = 1}.

Theorem 4.5 (NP-completeness of CSAT): The set (resp., relation) CSAT (resp., RCSAT) is NP-complete (resp., PC-complete).

Proof: It is easy to see that CSAT ∈ NP (resp., RCSAT ∈ PC). Thus, we turn to showing that these problems are NP-hard. We will focus on the decision version (but also discuss the search version).

We will present (again, but for the last time in this book) a generic reduction, where here we reduce any NP-problem to CSAT. The reduction is based on the observation, mentioned in Section 1.4.1 (see also Exercise 1.15), that the computation of polynomial-time algorithms can be emulated by polynomial-size circuits. We start with a description of the basic idea. In the current context, we wish to emulate the computation of a fixed machine M on input (x, y), where x is fixed and y varies (but |y| = poly(|x|) and the total number of steps of M(x, y) is polynomial in |x| + |y|). Thus, x will be "hard-wired" into the circuit, whereas y will serve as the input to the circuit.
The circuit itself, denoted Cx, will consist of "layers" such that each layer will represent an instantaneous configuration of the machine M, and the relation between consecutive configurations in a computation of this machine will be captured by ("uniform") local gadgets in the circuit. The number of layers will depend on |x| as well as on the polynomial that upper-bounds the running time of M, and an additional gadget will be used to detect whether the last configuration is accepting. Thus, only the first layer of the circuit Cx (which will represent an initial configuration with input prefixed by x) will depend on x. (See Figure 4.1.) The punch line is that determining whether, for a given x, there exists a y ∈ {0, 1}^poly(|x|) such that M(x, y) = 1 (in a given number of steps) will be reduced to determining whether there exists a y such that Cx(y) = 1. Performing this reduction for any machine MR that corresponds to any R ∈ PC (as in the proof of Theorem 4.3), we establish the fact that CSAT is NP-complete. Details follow.
Figure 4.1. The schematic correspondence between the configurations in the computation of M(x, y) (on the left) and the evaluation of the circuit Cx on input y (on the right), where x is fixed and y varies. The value of x (as well as a sequence of blanks) is hardwired (marked gray) in the first layer of Cx , and directed edges connect consecutive layers.
Recall that we wish to reduce an arbitrary set S ∈ NP to CSAT. Let R, pR, MR, and tR be as in the proof of Theorem 4.3 (i.e., R is the witness relation of S, whereas pR bounds the length of the NP-witnesses, MR is the machine deciding membership in R, and tR is its polynomial time bound). Without loss of generality (and for simplicity), suppose that MR is a one-tape Turing machine.8 We will construct a Karp-reduction that maps an instance x (for S) to a circuit, denoted f(x) = Cx, such that Cx(y) = 1 if and only if MR accepts the input (x, y) within tR(|x| + pR(|x|)) steps. Thus, it will follow that x ∈ S if and only if there exists y ∈ {0, 1}^pR(|x|) such that Cx(y) = 1 (i.e., if and only if Cx ∈ CSAT). The circuit Cx will depend on x as well as on MR, pR, and tR. (We stress that MR, pR, and tR are fixed, whereas x varies and is thus explicit in our notation.)

Before describing the circuit Cx, let us consider a possible computation of MR on input (x, y), where x is fixed and y represents a generic string of length pR(|x|). Such a computation proceeds for (at most) t = tR(|x| + pR(|x|)) steps, and corresponds to a sequence of (at most) t + 1 instantaneous configurations, each of length t. Each such configuration can be encoded by t pairs of symbols, where the first symbol in each pair indicates the contents of a cell and the second symbol indicates either a state of the machine or the fact that the machine is not located in this cell. Thus, each pair is a member of Σ × (Q ∪ {⊥}), where Σ is the finite "work alphabet" of MR, and Q is its finite set of internal states, which does not contain the special symbol ⊥ (which is used as an indication that the machine is not present at a cell). The initial configuration consists of x, y
See Exercise 1.12.
Figure 4.2. An array representing ten consecutive computation steps on input 110y1y2. Blank characters are marked by a hyphen (-), whereas the indication that the machine is not present in the cell is marked by ⊥. The state of the machine in each configuration is represented in the cell in which it resides, where the set of states of this machine equals {a, b, c, d, e, f}. The three arrows represent the determination of an entry by the three entries that reside above it. The machine underlying this example accepts the input if and only if the input contains a zero.
as input, and is padded by blanks to a total length of t, whereas the decision of MR(x, y) can be read from (the leftmost cell of) the last configuration.9

We view these t + 1 possible configurations as rows in an array, where the i-th row describes the instantaneous configuration of M(x, y) after i − 1 steps (and repeats the previous row in the case that the computation of M(x, y) halts before making i − 1 steps). For every i > 1, the values of the entries in the i-th row are determined by the entries of the (i − 1)-st row (which resides just above the i-th row), where this determination reflects the transition function of MR. Furthermore, the value of each entry in the said row is determined by the values of (up to) three entries that reside in the row above it (see Exercise 4.5). Thus, the aforementioned computation is represented by a (t + 1) × t array, depicted in Figure 4.2, where each entry encodes one out of a constant
We refer to the output convention presented in Section 1.3.2, by which the output is written in the leftmost cells and the machine halts at the cell to its right.
number of possibilities, which in turn can be encoded by a constant-length bit string.

The actual description of Cx. The circuit Cx has a structure that corresponds to the aforementioned array (see, indeed, Figure 4.1). Specifically, each row in the array is represented by a corresponding layer in the circuit Cx such that each entry in the array is represented by a constant number of gates in Cx. When Cx is evaluated at y, these gates will be assigned values that encode the contents of the corresponding entry in the array that describes the computation of MR(x, y). In particular, the entries of the first row of the array are "encoded" (in the first layer of Cx) by hard-wiring the reduction's input (i.e., x) and feeding the circuit's input (i.e., y) to the adequate input terminals. That is, the circuit has pR(|x|) ("real") input terminals (corresponding to y), and the hard-wiring of constants to the other O(t) − pR(|x|) gates (of the first layer) that represent the first row is done by simple gadgets (as in Figure 1.3). Indeed, the additional hard-wiring in the first layer corresponds to the other fixed elements of the initial configuration (i.e., the blank symbols, and the encoding of the initial state and of the initial location; cf. Figure 4.2). The entries of subsequent rows will be "encoded" in corresponding layers of Cx (or rather computed at evaluation time). Specifically, the values that encode an entry in the array will be computed by using constant-size circuits that determine the value of an entry based on the three relevant entries that are encoded in the layer above it. Recall that each entry is encoded by a constant number of gates (in the corresponding layer), and thus these constant-size circuits merely compute the constant-size function described in Exercise 4.5.
In addition, the circuit Cx has a few extra gates that check the values of the entries of the last row in order to determine whether or not it encodes an accepting configuration.10

Advanced comment. We note that although the foregoing construction of Cx capitalizes on various specific details of the (one-tape) Turing machine model, it can be easily adapted to other natural models of efficient computation (by showing that in such models, the transformation from one configuration to the subsequent one can be emulated by a (polynomial-time constructible) circuit). Alternatively, we recall the Cobham-Edmonds Thesis asserting that any problem that is solvable in polynomial time (on some "reasonable" model) can be solved in polynomial time by a (one-tape) Turing machine.
In continuation of footnote 9, we note that it suffices to check the values of the two leftmost entries of the last row. We assumed here that the circuit propagates a halting configuration to the last row. Alternatively, we may check for the existence of an accepting/halting configuration in the entire array, since this condition is quite simple.
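The (t + 1) × t array of configurations can be made concrete in code. The following Python sketch builds such a tableau for a toy one-tape machine invented for illustration (it mimics the machine of Figure 4.2 in spirit: it scans right and accepts iff it sees a 0); the symbols and the halting convention are our assumptions, not the text's exact conventions.

```python
# A sketch of the (t+1) x t tableau described above, for a toy one-tape
# machine (an assumption for illustration). States: 'a' (scan), 'f' (accept).

BLANK, ABSENT = "-", "!"   # '!' stands in for the symbol "bottom"

def transition(state, symbol):
    """(state, scanned symbol) -> (new state, written symbol, head move)."""
    if state == "a":
        if symbol == "0":
            return ("f", symbol, 0)      # accept, stay put
        return ("a", symbol, +1)         # keep scanning right
    return (state, symbol, 0)            # halting state: row repeats itself

def tableau(x, t):
    """Rows of (cell symbol, state-or-ABSENT) pairs, as in Figure 4.2."""
    tape = list(x.ljust(t, BLANK))       # input padded by blanks to length t
    head, state = 0, "a"
    rows = []
    for _ in range(t + 1):
        rows.append([(tape[i], state if i == head else ABSENT)
                     for i in range(t)])
        state, tape[head], move = transition(state, tape[head])
        head = min(max(head + move, 0), t - 1)
    return rows

rows = tableau("110", 6)
# the machine reaches its accepting state 'f' on the cell holding '0'
assert any(cell == ("0", "f") for cell in rows[-1])
```

Each entry of a row here is computed from the row above it by a local rule (the transition function), which is exactly why a constant-size circuit per entry suffices in the construction of Cx.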
The complexity of the mapping of x to f(x) = Cx. Given x, the circuit Cx can be constructed in polynomial time, by encoding x in an appropriate manner (in the first layer) and generating a "highly uniform" grid-like circuit of size s, where s = O(tR(|x| + pR(|x|))²). Specifically, the gates of the first layer are determined by x such that each gate is determined by at most a single bit of x, whereas the constant-size circuits connecting consecutive layers only depend on the transition function of MR (which is fixed in the context of reducing S to CSAT). Finally, note that the total number of gates is quadratically related to tR(|x| + pR(|x|)), which is a fixed polynomial in |x| (again, because pR and tR are fixed (polynomials) in the context of reducing S to CSAT).

The validity of the mapping of x to f(x) = Cx. By its construction, the circuit Cx emulates tR(|x| + pR(|x|)) steps of computation of MR on input (x, ·). Thus, indeed, Cx(y) = 1 if and only if MR accepts the input (x, y) while making at most tR(|x| + pR(|x|)) steps. Recalling that S = {x : ∃y s.t. |y| = pR(|x|) ∧ (x, y) ∈ R} and that MR decides membership in R in time tR, we infer that x ∈ S if and only if f(x) = Cx ∈ CSAT. Furthermore, (x, y) ∈ R if and only if (f(x), y) ∈ RCSAT. It follows that f is a Karp-reduction of S to CSAT, and, for g(x, y) = y, it holds that (f, g) is a Levin-reduction of R to RCSAT. The theorem follows.
4.3.1.2 The NP-Completeness of SAT

Recall that Boolean formulae are special types of Boolean circuits (i.e., circuits having a tree structure).11 We further restrict our attention to formulae given in conjunctive normal form (CNF). We denote by SAT the set of satisfiable CNF formulae (i.e., a CNF formula φ is in SAT if there exists a truth assignment τ such that φ(τ) = 1). We also consider the related relation RSAT = {(φ, τ) : φ(τ) = 1}.

Theorem 4.6 (NP-completeness of SAT): The set (resp., relation) SAT (resp., RSAT) is NP-complete (resp., PC-complete).

Proof: Since the set of possible instances of SAT is a subset of the set of instances of CSAT, it is clear that SAT ∈ NP (resp., RSAT ∈ PC). To prove that SAT is NP-hard, we reduce CSAT to SAT (and use Proposition 4.4).

The reduction boils down to introducing auxiliary variables in order to "cut" the computation of an arbitrary ("deep") circuit into a conjunction of related computations of "shallow" circuits (i.e., depth-2 circuits) of unbounded fan-in, which in turn may be presented as a CNF formula. The aforementioned
For an alternative definition, see Appendix A.2.
auxiliary variables hold the possible values of the internal gates of the original circuit, and the clauses of the CNF formula enforce the consistency of these values with the corresponding gate operation. For example, if gate_i and gate_j feed into gate_k, which is an ∧-gate, then the corresponding auxiliary variables gi, gj, gk should satisfy the Boolean condition gk ≡ (gi ∧ gj), which can be written as a 3CNF formula with four clauses. Details follow.

We start by Karp-reducing CSAT to SAT. Given a Boolean circuit C, with n input terminals and m gates, we first construct m constant-size formulae on n + m variables, where the first n variables correspond to the input terminals of the circuit and the other m variables correspond to its gates. The i-th formula will depend on the variable that corresponds to the i-th gate and the 1 or 2 variables that correspond to the vertices that feed into this gate (i.e., 2 vertices in the case of an ∧-gate or ∨-gate, and a single vertex in the case of a ¬-gate, where these vertices may be either input terminals or other gates). This (constant-size) formula will be satisfied by a truth assignment if and only if this assignment matches the gate's functionality (i.e., feeding this gate with the corresponding values results in the corresponding output value). Note that these constant-size formulae can be written as constant-size CNF formulae (in fact, as 3CNF formulae).12 Taking the conjunction of these m formulae and the variable associated with the (gate that feeds into the) output terminal, we obtain a formula φ in CNF. An example, where n = 3 and m = 4, is presented in Figure 4.3.

To summarize, the reduction maps the circuit C to a CNF formula φ such that

φ(x1, . . . , xn, g1, . . . , gm) = [ ⋀_{i=1}^{m} φi(x1, . . . , xn, g1, . . . , gm) ] ∧ gm    (4.2)
where the Boolean variables x1, . . . , xn represent the possible values of the input terminals of C, the Boolean variables g1, . . . , gm represent the possible values of the corresponding gates of C, and φi is a constant-size CNF formula that depends only on 2 or 3 of the aforementioned variables (as explained in the previous paragraphs). Note that φ can be constructed in polynomial time from the circuit C; that is, the mapping of C to φ = f(C) is polynomial-time computable. We claim that C is in CSAT if and only if φ is in SAT. The two directions of this claim are proved next.
Recall that any Boolean function can be written as a CNF formula having size that is exponential in the length of its input (cf. Exercise 1.17), which in this case is a constant (i.e., either 2 or 3). Indeed, note that the Boolean functions that we refer to here depend on 2 or 3 Boolean variables (since they indicate whether or not the corresponding values respect the gate’s functionality).
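The gate-by-gate translation just described can be sketched in code. The following Python sketch is our own illustration (the circuit format and the clause encoding as signed integers are assumptions); for each gate type it emits a small set of clauses that is logically equivalent to the condition g ≡ op(inputs), here written with three clauses per binary gate rather than the four exact-3-literal clauses mentioned in the text.

```python
# A sketch of the Karp-reduction of CSAT to SAT described above: each gate
# gets an auxiliary variable, and constant-size clauses enforce consistency.
# Assumed circuit format: variables 1..n are inputs; each gate is a pair
# (op, inputs) and gets the next variable number; literals are signed ints.

def circuit_to_cnf(n, gates):
    clauses = []
    g = n
    for op, ins in gates:
        g += 1
        if op == "and":
            a, b = ins                    # g <-> (a AND b)
            clauses += [[-g, a], [-g, b], [g, -a, -b]]
        elif op == "or":
            a, b = ins                    # g <-> (a OR b)
            clauses += [[g, -a], [g, -b], [-g, a, b]]
        elif op == "neg":
            (a,) = ins                    # g <-> NOT a
            clauses += [[-g, -a], [g, a]]
    clauses.append([g])                   # the output gate must be true
    return clauses

def satisfied(clauses, assignment):       # assignment: variable -> bool
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

# C(x1, x2) = x1 AND (NOT x2); gate 3 = neg(x2), gate 4 = and(x1, gate 3)
cnf = circuit_to_cnf(2, [("neg", (2,)), ("and", (1, 3))])
assert satisfied(cnf, {1: True, 2: False, 3: True, 4: True})
```

Note how the satisfying assignment extends the circuit input (x1 = 1, x2 = 0) with the values of the two gates, exactly as in Item 1 of the claim proved next.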
Figure 4.3. Using auxiliary variables (i.e., the gi's) to "cut" a depth-5 circuit (into a CNF). The dashed regions will be replaced by equivalent CNF formulae. The (small) dashed circle, representing an unbounded fan-in and-gate, is the conjunction of all constant-size circuits (which enforce the functionalities of the original gates) and the variable that represents the (gate that feeds the) output terminal in the original circuit.
1. Suppose that for some string s it holds that C(s) = 1. Then, assigning to the i-th auxiliary variable (i.e., gi) the value that is assigned to the i-th gate of C when evaluated on s, we obtain (together with s) a truth assignment that satisfies φ. This is the case because such an assignment satisfies all m constant-size CNF formulae (i.e., all φi's), as well as the variable gm associated with the output of C.
2. On the other hand, if the truth assignment τ satisfies φ, then the first n bit values in τ correspond to an input on which C evaluates to 1. This is the case because the m constant-size CNF formulae (i.e., the φi's) guarantee that the variables of φ are assigned values that correspond to the evaluation of C on the first n bits of τ, while the fact that gm has value true guarantees that this evaluation of C yields the value 1. (Recall that gm must have value true in any assignment that satisfies φ, whereas the value of gm represents the value of the output of C on the foregoing input.)

Thus, we have established that f is a Karp-reduction of CSAT to SAT. Note that the mapping (of the truth assignment τ to its n-bit prefix) used in Item 2 is the second mapping required by the definition of a Levin-reduction. Thus, augmenting f with the aforementioned second mapping yields a Levin-reduction of RCSAT to RSAT.

Digest and Perspective. The fact that the second mapping required by the definition of a Levin-reduction is explicit in the proof of the validity of the corresponding Karp-reduction is a fairly common phenomenon. Actually (see
Exercise 4.20), typical presentations of Karp-reductions provide two auxiliary polynomial-time computable mappings (in addition to the main mapping of instances from one problem (e.g., CSAT) to instances of another problem (e.g., SAT)): The first auxiliary mapping is of solutions for the preimage instance (e.g., of CSAT) to solutions for the image instance of the reduction (e.g., of SAT), whereas the second mapping goes the other way around. For example, the proof of the validity of the Karp-reduction of CSAT to SAT, denoted f, specified two additional mappings h and g such that (C, s) ∈ RCSAT implies (f(C), h(C, s)) ∈ RSAT and (f(C), τ) ∈ RSAT implies (C, g(C, τ)) ∈ RCSAT. Specifically, in the proof of Theorem 4.6, we used h(C, s) = (s, a1, . . . , am), where ai is the value assigned to the i-th gate in the evaluation of C(s), and g(C, τ) being the n-bit prefix of τ. (Note that only the main mapping (i.e., f) and the second auxiliary mapping (i.e., g) are required in the definition of a Levin-reduction.)

3SAT. Observe that the formulae resulting from the Karp-reduction presented in the proof of Theorem 4.6 are actually 3CNF formulae; that is, each such formula is in conjunctive normal form (CNF) and each of its clauses contains at most three literals. Thus, the foregoing reduction actually establishes the NP-completeness of 3SAT (i.e., SAT restricted to CNF formulae with up to three literals per clause). Alternatively, one may Karp-reduce SAT (i.e., satisfiability of CNF formulae) to 3SAT (i.e., satisfiability of 3CNF formulae) by replacing long clauses with conjunctions of three-variable clauses (using auxiliary variables; see Exercise 4.6). Either way, we get the following result, where the "furthermore" part is proved by an additional reduction.

Proposition 4.7: 3SAT is NP-complete. Furthermore, the problem remains NP-complete also if we restrict the instances such that each variable appears in at most three clauses.
Proof: The "furthermore" part is proved by a reduction from 3SAT. We just replace each occurrence of a Boolean variable by a new copy of this variable, and add clauses to enforce that all these copies are assigned the same value. Specifically, if variable z occurs t times in the original 3CNF formula φ, then we introduce t new variables (i.e., its "copies"), denoted z(1), . . . , z(t), and replace the i-th occurrence of z in φ by z(i). In addition, we add the clauses z(i+1) ∨ ¬z(i) for i = 1, . . . , t (where t + 1 is understood as 1). Thus, each variable appears at most three times in the new formula. Note that the clause z(i+1) ∨ ¬z(i) is logically equivalent to z(i) ⇒ z(i+1), and thus the conjunction of
the aforementioned t clauses is logically equivalent to z(1) ⇔ z(2) ⇔ · · · ⇔ z(t). The validity of the reduction follows.

Related Problems. Note that instances of SAT can be viewed as systems of Boolean conditions over Boolean variables. Such systems can be emulated by various types of systems of arithmetic conditions, implying the NP-hardness of solving the latter types of systems. Examples include systems of integer linear inequalities (see Exercise 4.8) and systems of quadratic equalities (see Exercise 4.10). In contrast to the foregoing, we mention that SAT restricted to CNF formulae with up to two literals per clause is solvable in polynomial time (see Exercise 4.7). Thus, whereas deciding the satisfiability of 3CNF formulae (i.e., 3SAT) is NP-complete, the corresponding problem for 2CNF formulae, denoted 2SAT, is in P. The same phenomenon arises also with respect to other natural problems (e.g., 3-colorability versus 2-colorability), but we suggest not attributing too much significance to this fact.
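The copy-and-chain reduction used in the proof of Proposition 4.7 can be sketched in code. The following Python sketch uses a formula representation of our own choosing (clauses as lists of signed variable names); the fresh copy names of the form z^i are a notational assumption.

```python
# A sketch of the reduction in the proof of Proposition 4.7: each of the t
# occurrences of a variable z is replaced by a fresh copy z^(i), and the
# cycle of clauses (z^(i) => z^(i+1)) forces all copies to agree.
# Formulae are lists of clauses; a literal is (sign, variable-name).

from collections import defaultdict

def limit_occurrences(clauses):
    counts = defaultdict(int)
    new_clauses = []
    for clause in clauses:
        new_clause = []
        for sign, var in clause:
            counts[var] += 1                     # i-th occurrence of var
            new_clause.append((sign, f"{var}^{counts[var]}"))
        new_clauses.append(new_clause)
    for var, t in counts.items():
        for i in range(1, t + 1):
            nxt = i % t + 1                      # t+1 is understood as 1
            # clause  z^(i+1) OR NOT z^(i),  i.e.  z^(i) => z^(i+1)
            new_clauses.append([(+1, f"{var}^{nxt}"), (-1, f"{var}^{i}")])
    return new_clauses

phi = [[(+1, "z"), (-1, "w")], [(-1, "z"), (+1, "w")], [(+1, "z")]]
out = limit_occurrences(phi)
occ = defaultdict(int)
for c in out:
    for _, v in c:
        occ[v] += 1
assert max(occ.values()) <= 3   # each copy appears in at most three clauses
```

Each copy occurs once in a renamed original clause and twice in the cycle of implication clauses, which is where the bound of three occurrences comes from.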
4.3.2 Combinatorics and Graph Theory

The purpose of this section is to expose the reader to a sample of NP-completeness results and proof techniques (i.e., the design of reductions among computational problems). We present just a few of the many appealing combinatorial problems that are known to be NP-complete. As in §4.3.1.2, the NP-completeness of new problems is proved by showing that their instances can encode instances of problems that are already known to be NP-complete (e.g., SAT-instances can encode CSAT-instances). Typically, these encodings operate in a local manner, mapping small components of the original instance to local gadgets in the produced instance. Indeed, these problem-specific gadgets are the core of the encoding scheme.

Throughout this section, we focus on the decision versions of the various problems and adopt a more informal style. Specifically, we will present a typical decision problem as a problem of deciding whether a given instance, which belongs to a set of relevant instances, is a "yes-instance" or a "no-instance" (rather than referring to deciding membership of arbitrary strings in a set of yes-instances). For further discussion of this style and its rigorous formulation, see Section 5.1. We will also omit showing that these decision problems are in NP; indeed, for natural problems in NP, showing membership in NP is typically straightforward.
Set Cover. We start with the Set Cover problem, in which an instance consists of a collection of finite sets S1, . . . , Sm and an integer K, and the question (for decision) is whether or not there exist (at most)13 K sets that cover ∪_{i=1}^{m} Si (i.e., indices i1, . . . , iK such that ∪_{j=1}^{K} S_{ij} = ∪_{i=1}^{m} Si).

Proposition 4.8: Set Cover is NP-complete.

Proof: We present a Karp-reduction of SAT to Set Cover. For a CNF formula φ with m clauses and n variables, we consider the sets S1,t, S1,f, . . . , Sn,t, Sn,f ⊆ {1, . . . , m} such that Si,t (resp., Si,f) is the set of the indices of the clauses (of φ) that are satisfied by setting the i-th variable to true (resp., false). That is, if the i-th variable appears unnegated in the j-th clause, then j ∈ Si,t, whereas if the i-th variable appears negated in the j-th clause, then j ∈ Si,f. Indeed, Si,t ∪ Si,f equals the set of clauses containing an occurrence of the i-th variable, and the union of all these 2n sets equals [m] = {1, . . . , m}. In order to force any cover to contain either Si,t or Si,f, we augment the universe with n additional elements and add the i-th such element to both Si,t and Si,f. Thus, the reduction proceeds as follows.

1. On input a CNF formula φ (with n variables and m clauses), the reduction computes the sets S1,t, S1,f, . . . , Sn,t, Sn,f such that Si,t (resp., Si,f) is the set of the indices of the clauses in which the i-th variable appears unnegated (resp., negated).
2. The reduction outputs the instance f(φ) = ((S1, . . . , S2n), n), where for i = 1, . . . , n it holds that S2i−1 = Si,t ∪ {m + i} and S2i = Si,f ∪ {m + i}.

Note that f(φ) is a yes-instance of Set Cover if and only if the collection (S1, . . . , S2n) contains a sub-collection of n sets that covers [m + n]. Observing that f is computable in polynomial time, we complete the proof by showing that f is a valid Karp-reduction of SAT to Set Cover.

Assume, on the one hand, that φ is satisfied by τ1 · · · τn. Then, for every j ∈ [m] there exists an i ∈ [n] such that setting the i-th variable to τi satisfies the j-th clause, and so j ∈ S2i−τi. It follows that the collection {S2i−τi : i = 1, . . . , n} covers {1, . . . , m + n}, because {S2i−τi ∩ [m] : i = 1, . . . , n} covers {1, . . . , m} while {S2i−τi \ [m] : i = 1, . . . , n} covers {m + 1, . . . , m + n}. Thus, φ ∈ SAT implies that f(φ) is a yes-instance of Set Cover.

On the other hand, for every i ∈ [n], each cover of {m + 1, . . . , m + n} ⊂ {1, . . . , m + n} must include either S2i−1 or S2i, because these are the only sets that cover the element m + i. Thus, a cover of {1, . . . , m + n} using n of the Sj's
Clearly, in the case of Set Cover, the two formulations (i.e., asking for exactly K sets or at most K sets) are computationally equivalent; see Exercise 4.13.
4.3 Some Natural NP-Complete Problems
115
must contain, for every i, either S_{2i−1} or S_{2i} but not both. Setting τ_i accordingly (i.e., τ_i = 1 if and only if S_{2i−1} is in the cover) implies that {S_{2i−τ_i} : i = 1, . . . , n} (or rather {S_{2i−τ_i} ∩ [m] : i = 1, . . . , n}) covers {1, . . . , m}. It follows that τ_1 · · · τ_n satisfies φ, because for every j ∈ [m] there exists an i ∈ [n] such that j ∈ S_{2i−τ_i} (which implies that setting the i-th variable to τ_i satisfies the j-th clause). Thus, if f(φ) is a yes-instance of Set Cover (i.e., there is a cover of [m + n] that uses n of the S_j's), then φ ∈ SAT.

Exact Cover and 3XC. The Exact Cover problem is similar to the Set Cover problem, except that here the sets that are used in the cover are not allowed to intersect. That is, each element in the universe should be covered by exactly one set in the cover. Restricting the set of instances to sequences of 3-sets (i.e., sets of size three), we get the restricted problem called 3-Exact Cover (3XC), in which it is unnecessary to specify the number of sets to be used in the exact cover (since this number must equal the size of the universe divided by three). The problem 3XC is rather technical, but it is quite useful for demonstrating the NP-completeness of other problems (by reducing 3XC to them); see, for example, Exercises 4.17 and 4.18.

Proposition 4.9: 3-Exact Cover is NP-complete.

Indeed, it follows that Exact Cover (in which sets of arbitrary size are allowed) is NP-complete. This holds both for the case in which the number of sets in the desired cover is unspecified and for the various cases in which this number is upper-bounded and/or lower-bounded in terms of an integer that is part of the instance (as in Set Cover).

Proof: The reduction is obtained by composing four reductions, which involve three intermediate computational problems. The first of these problems is a restricted case of 3SAT, denoted r3SAT, in which each literal appears in at most two clauses.
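Before turning to these intermediate problems, note that the reduction of Proposition 4.8 is easily programmed. The following sketch (in Python, used here only for illustration) assumes a CNF formula is given as a list of clauses, each clause a list of non-zero integers, with +i standing for the literal x_i and −i for its negation:

```python
def sat_to_set_cover(num_vars, clauses):
    """Map a CNF formula to a Set Cover instance, as in Proposition 4.8.

    Returns (sets, K): the 2n sets S_1, ..., S_{2n} over the universe
    [1..m+n], and the target K = n.  Set S_{2i-1} (resp., S_{2i}) collects
    the clauses satisfied by setting x_i to true (resp., false), plus the
    auxiliary element m+i that forces every cover to pick one of the two.
    """
    m = len(clauses)
    sets = []
    for i in range(1, num_vars + 1):
        s_true = {j + 1 for j, c in enumerate(clauses) if i in c}
        s_false = {j + 1 for j, c in enumerate(clauses) if -i in c}
        sets.append(s_true | {m + i})   # S_{2i-1}
        sets.append(s_false | {m + i})  # S_{2i}
    return sets, num_vars
```

For instance, for φ = (x_1 ∨ ¬x_2) ∧ (x_2) the assignment x_1 = x_2 = true corresponds to the sets S_1 and S_3, which together cover the universe {1, 2, 3, 4}.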
Note that, by Proposition 4.7, 3SAT is NP-complete even when the instances are restricted such that each variable appears in at most three clauses. Actually, the reduction presented in the proof of Proposition 4.7 can be slightly modified in order to reduce 3SAT to r3SAT (see Exercise 4.11).¹⁴

The second intermediate problem that we consider is a restricted version of Set Cover, denoted 3SC, in which each set has at most three elements. (Indeed, as in the general case of Set Cover, an instance consists of a sequence of finite sets as well as an integer K, and the question is whether there exists a

14. Alternatively, a closer look at the reduction presented in the proof of Proposition 4.7 reveals that it always produces instances of r3SAT. This alternative presupposes that copies are created also when the original variable appears three times in the original formula.
4 NP-Completeness
cover with at most K sets.) We reduce r3SAT to 3SC by using the (very same) reduction presented in the proof of Proposition 4.8, while observing that the size of each set in the reduced instance is at most three (i.e., one more than the number of occurrences of the corresponding literal in clauses of the original formula).

Next, we reduce 3SC to the following restricted version of Exact Cover, denoted 3XC′, in which each set has at most three elements. An instance of 3XC′ consists of a sequence of finite sets as well as an integer K, and the question is whether there exists an exact cover with at most K sets. The reduction maps an instance ((S_1, . . . , S_m), K) of 3SC to the instance (C, K) such that C is the collection of all subsets of each of the sets S_1, . . . , S_m. Since each S_i has size at most three, we introduce at most seven non-empty subsets per each such set, and the reduction can be computed in polynomial time. The reader may easily verify the validity of this reduction (see Exercise 4.12).

Finally, we reduce 3XC′ to 3XC. Consider an instance ((S_1, . . . , S_m), K) of 3XC′, and suppose that ∪_{i=1}^{m} S_i = [n]. If n > 3K then this is definitely a no-instance, which can be mapped to a dummy no-instance of 3XC, and so we assume that x := 3K − n ≥ 0. Intuitively, x represents the "excess" covering ability of a hypothetical exact cover that consists of K sets, each having three elements. Thus, we augment the set system with x new elements, denoted n + 1, . . . , 3K, and replace each S_i such that |S_i| < 3 by a subcollection of 3-sets such that each 3-set contains S_i as well as an adequate number of elements from {n + 1, . . . , 3K}; that is, the subcollection associated with S_i contains a set for each possible (3 − |S_i|)-set of {n + 1, . . . , 3K}. Specifically, in case |S_i| = 2, the set S_i is replaced by the subcollection (S_i ∪ {n + 1}, . . . , S_i ∪ {3K}), whereas a singleton S_i is replaced by the sets S_i ∪ {j_1, j_2} for every j_1 < j_2 in {n + 1, . . . , 3K}.
In addition, we add all possible 3-subsets of {n + 1, . . . , 3K}. This completes the description of the last reduction, the validity of which is left as an exercise (see Exercise 4.12).

Let us conclude. We have introduced the intermediate problems r3SAT, 3SC, and 3XC′, and presented a sequence of Karp-reductions leading from 3SAT to 3XC via these intermediate problems. Specifically, we reduced 3SAT to r3SAT, then reduced r3SAT to 3SC, next reduced 3SC to 3XC′, and finally reduced 3XC′ to 3XC. Composing these four reductions, we obtain a Karp-reduction of 3SAT to 3XC, and the proposition follows.

Vertex Cover, Independent Set, and Clique. Turning to graph-theoretic problems (see Appendix A.1), we start with the Vertex Cover problem, which is a special case of the Set Cover problem. The instances consist of pairs (G, K), where G = (V, E) is a simple graph and K is an integer, and the
problem is whether or not there exists a set of (at most) K vertices that is incident to all graph edges (i.e., each edge in G has at least one endpoint in this set). Indeed, this instance of Vertex Cover can be viewed as an instance of Set Cover by considering the collection of sets (S_v)_{v∈V}, where S_v denotes the set of edges incident at vertex v (i.e., S_v = {e ∈ E : v ∈ e}). Thus, the NP-hardness of Set Cover follows from the NP-hardness of Vertex Cover (but this implication is unhelpful for us here, since we already know that Set Cover is NP-hard and we wish to prove that Vertex Cover is NP-hard). We also note that the Vertex Cover problem is computationally equivalent to the Independent Set and Clique problems (see Exercise 4.14), and thus it suffices to establish the NP-hardness of one of these problems.

Proposition 4.10: The problems Vertex Cover, Independent Set, and Clique are NP-complete.

Proof: We show a reduction from 3SAT to Independent Set.¹⁵ On input a 3CNF formula φ with m clauses and n variables, we construct a graph with 7m vertices, denoted G_φ, as follows:

• The vertices are grouped in m equal-size sets, each corresponding to one of the clauses, and edges are placed among all vertices that belong to each of these 7-sets (thus obtaining m disjoint 7-vertex cliques). The 7-set corresponding to a specific clause contains seven vertices that correspond to the seven truth assignments (to the three variables in the clause) that satisfy the clause. That is, the vertices in the graph correspond to partial assignments such that the seven vertices that belong to the i-th 7-set correspond to the seven partial assignments that instantiate the variables in the i-th clause in a way that satisfies this clause. For example, if the i-th clause equals x_{j_1} ∨ x_{j_2} ∨ ¬x_{j_3}, then the i-th 7-set consists of vertices that correspond to the seven Boolean functions τ that are defined on {j_1, j_2, j_3} ⊂ [n] and satisfy τ(j_1) ∨ τ(j_2) ∨ ¬τ(j_3).
• In addition to the edges that are internal to these m 7-sets (which form 7-vertex cliques), we add an edge between each pair of vertices that correspond to partial assignments that are mutually inconsistent. That is, if a specific (satisfying) assignment to the variables of the i-th clause is inconsistent with some (satisfying) assignment to the variables of the j-th clause, then we connect the corresponding vertices by an edge. In particular, no

15. Advanced comment: The following reduction is not the "standard" one (see Exercise 4.15), but is rather adapted from the FGLSS-reduction (see [10]). This is done in anticipation of the use of the FGLSS-reduction in the context of the study of the complexity of approximation (cf., e.g., [15] or [13, Sec. 10.1.1]).
edges are placed between 7-sets that represent clauses that share no common variable. (In contrast, the edges that are internal to the m 7-sets may be viewed as a special case of the edges connecting mutually inconsistent partial assignments.)

To summarize, on input φ, the reduction outputs the pair (G_φ, m), where G_φ is the aforementioned graph and m is the number of clauses in φ. We stress that each 7-set of the graph G_φ contains only vertices that correspond to partial assignments that satisfy the corresponding clause; that is, the single partial assignment that does not satisfy this clause is not represented as a vertex in G_φ. Recall that the edges placed among vertices represent partial assignments that are mutually inconsistent. Thus, each truth assignment τ to the entire formula φ yields an independent set in G_φ, which contains all the vertices that correspond to partial assignments that are consistent with τ and satisfy the corresponding clauses. Indeed, the size of this independent set equals the number of clauses that are satisfied by the assignment τ. These observations underlie the validity of the reduction, which is argued next.

Suppose, on the one hand, that φ is satisfiable by the truth assignment τ. Consider the partial assignments, to the m clauses, that are derived from τ. We claim that these partial assignments correspond to an independent set of size m in G_φ. The claim holds because these m partial assignments satisfy the corresponding m clauses (since τ satisfies φ) and are mutually consistent (because they are all derived from τ). It follows that these m partial assignments correspond to m vertices (residing in different 7-sets), and there are no edges between these vertices. Thus, φ ∈ SAT implies that G_φ has an independent set of size m.

On the other hand, any independent set of size m in G_φ must contain exactly one vertex in each of the m 7-sets, because no independent set may contain two vertices that reside in the same 7-set.
Furthermore, each independent set in G_φ induces a (possibly partial) truth assignment to φ, because the partial assignments "selected" in the various 7-sets must be consistent (or else an edge would have existed among the corresponding vertices). Recalling that an independent set that contains a vertex from a specific 7-set induces a partial truth assignment that satisfies the corresponding clause, it follows that an independent set that contains a vertex of each 7-set induces a truth assignment that satisfies φ. Thus, if G_φ has an independent set of size m then φ ∈ SAT.

Graph 3-Colorability (G3C). In this problem, the instances are graphs and the question is whether or not the graph's vertices can be colored using three colors such that neighboring vertices are not assigned the same color.
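Note that G3C is easily seen to be in NP: a legal 3-coloring serves as an NP-witness, and checking it takes a single pass over the edges. A minimal sketch of such a verifier (in Python, with the graph given as an edge list and the coloring as a vertex-to-color dictionary — representations chosen here only for illustration):

```python
def is_legal_3_coloring(edges, color):
    """Verify that `color` (mapping each vertex to one of {0, 1, 2})
    properly 3-colors the graph given by `edges` (a list of vertex pairs)."""
    uses_three_colors = all(c in {0, 1, 2} for c in color.values())
    no_monochromatic_edge = all(color[u] != color[v] for u, v in edges)
    return uses_three_colors and no_monochromatic_edge
```

For example, a triangle is accepted with three distinct colors, whereas any coloring that repeats a color on an edge is rejected.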
Figure 4.4. The clause gadget and its sub-gadget. The left-hand side depicts the sub-gadget and a generic legal 3-coloring of it. Note that if x = y in this 3-coloring, then x = y = 1. The clause gadget is shown on the right-hand side. For any legal 3-coloring of this gadget it holds that if the three terminals of the gadget are assigned the same color, χ, then M is also assigned the color χ.
Proposition 4.11: Graph 3-Colorability is NP-complete.

Proof: We reduce 3SAT to G3C by mapping a 3CNF formula φ to the graph G_φ that consists of two special ("designated") vertices, a gadget per each variable of φ, a gadget per each clause of φ, and edges connecting some of these components as follows:

• The two designated vertices are called ground and false, and are connected by an edge that ensures that they must be given different colors in any legal 3-coloring of G_φ. We will refer to the color assigned to the vertex ground (resp., false) by the name ground (resp., false). The third color will be called true.

• The gadget associated with variable x is a pair of vertices, associated with the two literals of x (i.e., x and ¬x). These vertices are connected by an edge, and each of them is also connected to the vertex ground. Thus, in any legal 3-coloring of G_φ one of the vertices associated with the variable is colored true and the other is colored false.

• The gadget associated with a clause C is depicted in Figure 4.4. It contains a master vertex, denoted M, and three terminal vertices, denoted T1, T2, and T3. The master vertex is connected by edges to the vertices ground and false, and thus in any legal 3-coloring of G_φ the master vertex must be colored true. The gadget has the property that it is possible to color the terminals with any combination of the colors true and false, except for coloring all terminals with false. That is, in any legal 3-coloring of G_φ, if no terminal of a clause gadget is colored ground, then at least one of these terminals is colored true.

The terminals of the gadget associated with clause C will be identified with the vertices (of variable gadgets) that are associated with the corresponding
Figure 4.5. A single clause gadget and the relevant variable gadgets.
literals appearing in C. This means that each clause gadget shares its terminals with the corresponding variable gadgets, and that the various clause gadgets are not vertex-disjoint but may rather share some terminals (i.e., those associated with literals that appear in several clauses).¹⁶ See Figure 4.5. The aforementioned association forces each terminal to be colored either true or false (in any legal 3-coloring of G_φ). By the foregoing discussion it follows that in any legal 3-coloring of G_φ, at least one terminal of each clause gadget must be colored true. Verifying the validity of the reduction is left as an exercise (see Exercise 4.16).

Digest. The reductions presented in the current section are depicted in Figure 4.6, where bold arrows indicate reductions presented explicitly in the proofs of the various propositions (indicated by their index). Note that r3SAT and 3SC are only mentioned inside the proof of Proposition 4.9.
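For concreteness, the graph construction used in the proof of Proposition 4.10 can also be sketched in code. The sketch below (in Python; the representation of clauses by signed integers is an assumption of this illustration) takes the vertices to be the satisfying partial assignments of the individual clauses, and connects every pair of mutually inconsistent vertices — which, as noted in that proof, subsumes the edges internal to the 7-sets:

```python
from itertools import product

def threesat_to_independent_set(clauses):
    """Sketch of the reduction of Proposition 4.10: build a graph that has
    an independent set of size m = len(clauses) iff the 3CNF is satisfiable.
    A clause is a list of non-zero ints: +i for x_i, -i for its negation.
    Each vertex is a pair (clause index, satisfying partial assignment)."""
    vertices = []
    for i, clause in enumerate(clauses):
        vs = sorted({abs(lit) for lit in clause})
        for vals in product([False, True], repeat=len(vs)):
            tau = dict(zip(vs, vals))
            if any(tau[abs(lit)] == (lit > 0) for lit in clause):
                vertices.append((i, tuple(tau.items())))

    def inconsistent(u, v):
        # Two vertices clash iff they assign different values to a shared
        # variable; distinct vertices of the same 7-set always clash.
        tu, tv = dict(u[1]), dict(v[1])
        return any(tu[x] != tv[x] for x in tu.keys() & tv.keys())

    edges = [(u, v) for k, u in enumerate(vertices)
             for v in vertices[k + 1:] if inconsistent(u, v)]
    return vertices, edges, len(clauses)
```

For a single clause over three distinct variables, the construction yields the expected 7-vertex clique.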
4.3.3 Additional Properties of the Standard Reductions

We mention that the standard reductions used to establish natural NP-completeness results have several additional properties or can be modified

16. Alternatively, we may use disjoint gadgets and "connect" each terminal with the corresponding literal (in the corresponding vertex gadget). Such a connection (i.e., an auxiliary gadget) should force the two endpoints to have the same color in any legal 3-coloring of the graph.
[Figure 4.6 shows the reductions: CSAT →(4.6) SAT →(4.7) 3SAT; SAT →(4.8) SC; 3SAT →(4.9) r3SAT →(4.9) 3SC →(4.9) 3XC; 3SAT →(4.10) VC, IS, Clique; 3SAT →(4.11) G3C.]

Figure 4.6. The (non-generic) reductions presented in Section 4.3.
to have such properties. These properties include an efficient transformation of solutions in the direction of the reduction (see Exercise 4.20), the preservation of the number of solutions (see Exercise 4.21), and being invertible in polynomial time (see Exercise 4.22 as well as Exercise 4.23). Furthermore, these reductions are relatively "simple" in the sense that they can be computed by restricted classes of polynomial-time algorithms (e.g., algorithms of logarithmic space complexity).

The foregoing assertions are easiest to verify for the generic reductions presented in the proofs of Theorems 4.3 and 4.5. These reductions satisfy all the additional properties (without any modification). Turning to the non-generic reductions (depicted in Figure 4.6), we note that they all satisfy the additional properties with the exception of the preservation of the number of solutions (see Exercise 4.21). However, in each of the cases in which our reduction does not satisfy the latter property, an alternative reduction that does satisfy it is known. We also mention the fact that all known NP-complete sets are (effectively) isomorphic in the sense that every two such sets are isomorphic via a polynomial-time computable and invertible mapping (see Exercise 4.24).
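As an illustration of the first of these properties, solutions for the reduction of Proposition 4.8 (SAT to Set Cover) are easily transformed in both directions; e.g., a cover of the reduced instance yields a satisfying assignment via the rule τ_i = 1 iff S_{2i−1} is in the cover. A small sketch (in Python; the 1-based indexing of the sets follows the convention of that proof):

```python
def cover_to_assignment(num_vars, cover_indices):
    """Given the (1-based) indices of n sets forming a cover in an instance
    produced by the reduction of Proposition 4.8, recover a satisfying
    assignment: variable i is set to true iff S_{2i-1} is in the cover."""
    tau = {}
    for idx in cover_indices:
        i = (idx + 1) // 2           # the variable this set belongs to
        tau[i] = (idx % 2 == 1)      # odd index -> the "true" set S_{2i-1}
    assert len(tau) == num_vars, "a valid cover selects one set per variable"
    return tau
```

Continuing the earlier example, a cover consisting of S_1 and S_4 corresponds to setting x_1 to true and x_2 to false.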
4.3.4 On the Negative Application of NP-Completeness

Since its discovery in the early 1970s, NP-completeness has been used as the main tool by which the intrinsic complexity of certain problems is demonstrated. Recall that if an NP-complete problem is in P, then all problems in NP are in P (i.e., P = NP). Hence, demonstrating the NP-completeness of a problem yields very strong evidence for its intractability.

We mention that NP-completeness means more than intractability in the strict computational sense (i.e., that no efficient algorithm may solve the
problem). It also means that the problem at hand (or the underlying question) has a very rich structure and that the underlying question has no simple answer. To see why this is the case, consider a question that refers to objects of a certain type (e.g., territorial maps) and a property that some of these objects have (e.g., being 3-colorable). The question at hand may call for a simple characterization of the objects that satisfy the property, but if the corresponding decision problem is NP-complete,¹⁷ then no such characterization is likely to exist. We stress that the quest for a "simple" characterization could have had nothing to do with computation, but "simple" characterizations yield efficient decision procedures, and so NP-completeness is relevant. Furthermore, the NP-completeness of a problem means that the objects underlying the desired characterization are complex enough to encode all NP-problems. Indeed, diverse scientific disciplines, which were unsuccessfully struggling with some of their internal questions, came to realize that these questions are inherently difficult, since they are closely related to computational problems that are NP-complete.

Lastly, let us note that demonstrating the NP-completeness of a problem is not the end of the story. Since the problem originates in reality, it does not go away once we realize that it is (probably) hard to solve. However, the problem we consider is never identical to the problem we need to solve in reality; the former is just a model (or abstraction) of the latter. Thus, the fact that our abstraction turns out to yield an NP-complete problem calls for a refinement of our modeling. A careful reconsideration may lead us to realize that we only care about a subset of all possible instances, or that we may relax the requirements on the desired solutions. Such relaxations lead to the notions of average-case complexity and approximation, which are indeed the subject of considerable study. The interested reader is referred to [13, Chap. 10].
4.3.5 Positive Applications of NP-Completeness

Throughout this chapter, we have referred to the negative implication of NP-completeness, that is, the fact that it provides evidence for the intractability of problems. Indeed, the definition of NP-complete problems was motivated by the intention to use it as a vehicle for proving the hardness of natural computational problems (which reside in NP). Furthermore, we really do not expect to use NP-completeness for the straightforward positive applications of reductions that were discussed in Section 3.4. So what can the current section title actually mean?

17. This is indeed the case with respect to determining whether a given territorial map is 3-colorable.
The answer is that we may use NP-complete problems as a vehicle to demonstrate properties of all problems in NP. For example, in Section 2.5, we proved that NP ⊆ EXP by referring to an exhaustive search among all possible NP-witnesses (for a given instance, with respect to any problem in NP). An alternative proof can first establish that SAT ∈ EXP and then use the fact that membership in EXP is preserved under Cook-reductions. The benefit of this approach is that it is more natural to consider an exhaustive search for SAT. However, this positive application is in line with the applications discussed in Section 3.4, although EXP is not considered a class of efficient problems.

Nevertheless, positive applications that are farther from the applications discussed in Section 3.4 have played an important role in the study of "probabilistic proof systems" (to be surveyed shortly). In three important cases, fundamental results regarding (all decision problems in) NP were derived by first establishing the result for SAT (or G3C), and then invoking the NP-completeness of SAT (resp., G3C) in order to derive the same result for each problem in NP. The benefit of this methodology is that the simple and natural structure of SAT (resp., G3C) facilitates establishing the said result for it. Following is a brief description of three types of probabilistic proof systems and the role of NP-completeness in establishing three fundamental results regarding them. The reader is warned that the rest of the current section is advanced material, and furthermore that following this text requires some familiarity with the notion of randomized algorithms. On the other hand, the interested reader is referred to [13, Chap. 9] for further details.

A General Introduction to Probabilistic Proof Systems. The glory attributed to the creativity involved in finding proofs causes us to forget that it is the less-glorified process of verification that gives proofs their value.
Conceptually speaking, proofs are secondary to the verification procedure; indeed, proof systems are defined in terms of their verification procedures. The notion of a verification procedure presupposes the notion of computation and, furthermore, the notion of efficient computation. Associating efficient computation with polynomial-time procedures, we obtain a fundamental class of proof systems, called NP-proof systems; see, indeed, Definition 2.5. We stress that NP-proofs provide a satisfactory formulation of (efficiently verifiable) proof systems, provided that one associates efficient procedures with deterministic polynomial-time procedures. However, we can gain a lot if we are willing to take a somewhat nontraditional step and allow probabilistic verification procedures. We shall consider three types of probabilistic proof systems. As in the case of NP-proof systems, in each of the following types of proof systems,
explicit bounds are imposed on the computational complexity of the verification procedure, which in turn is personified by the notion of a verifier. The real novelty, in the case of probabilistic proof systems, is that the verifier is allowed to toss coins and rule by statistical evidence. Thus, these probabilistic proof systems carry a probability of error; yet this probability is explicitly bounded and, furthermore, can be reduced by successive application of the proof system.
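The last point — that the error probability can be reduced by successive applications of the proof system — is worth a small illustration. Running a verifier whose soundness error is at most 1/2 independently k times, and accepting only if all runs accept, leaves true assertions always accepted while pushing the error on false assertions down to 2^−k. A generic sketch (in Python; `verify_once` stands for an arbitrary single-run verifier and is an assumption of this illustration):

```python
import random

def amplified_verify(verify_once, k, rng=random):
    """Accept iff k independent runs of `verify_once` all accept.
    If a false assertion is accepted with probability at most 1/2 per run,
    it is accepted here with probability at most 2**-k; a true assertion
    that is always accepted per run is still always accepted."""
    return all(verify_once(rng) for _ in range(k))

# A stand-in for a cheating strategy that fools one run with probability 1/2:
cheat = lambda rng: rng.random() < 0.5
```

With k = 10, the cheating strategy above succeeds only about once per thousand attempts.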
Interactive Proof Systems. As we shall see, randomized and interactive verification procedures, giving rise to interactive proof systems, seem much more powerful (i.e., "expressive") than their deterministic counterparts. Loosely speaking, an interactive proof system is a game between a computationally bounded verifier and a computationally unbounded prover whose goal is to convince the verifier of the validity of some assertion. Specifically, the verifier is probabilistic and its time complexity is polynomial in the length of the assertion. It is required that if the assertion holds, then the verifier must always accept (when interacting with an appropriate prover strategy). On the other hand, if the assertion is false, then the verifier must reject with probability at least 1/2, no matter what strategy is employed by the prover. Thus, a "proof" in this context is not a fixed and static object, but rather a randomized (and dynamic) process in which the verifier interacts with the prover. Intuitively, one may think of this interaction as consisting of "tricky" questions asked by the verifier, to which the prover has to reply "convincingly."

A fundamental result regarding interactive proof systems is their existence for any set in coNP := {{0, 1}* \ S : S ∈ NP}. This result should be contrasted with the common belief that some sets in coNP do not have NP-proof systems (i.e., NP ≠ coNP; cf. Section 5.3). Interestingly, the fact that any set in coNP has an interactive proof system is established by presenting such a proof system for the complement of SAT (and deriving a proof system for any S ∈ coNP by using the Karp-reduction of the complement of S to SAT, which is the very same reduction viewed as a reduction of S to the complement of SAT).¹⁸ The construction of this interactive proof system relies on an "arithmetization" of CNF formulae, and hence we clearly benefit from the fact that this specific and natural problem (i.e., SAT) is NP-complete.

Zero-Knowledge Proof Systems.
Interactive proof systems provide the stage for a meaningful introduction of zero-knowledge proofs, which are of great

18. Advanced comment: Actually, the result can be extended to show that a decision problem has an interactive proof system if and only if it is in PSPACE, where PSPACE denotes the class of problems that are solvable in polynomial space. We mention that this extension also relies on the use of a natural complete problem, which is also amenable to arithmetization.
theoretical and practical interest (especially in cryptography). Loosely speaking, zero-knowledge proofs are interactive proofs that yield nothing (to the verifier) beyond the fact that the assertion is indeed valid. For example, a zero-knowledge proof that a certain Boolean formula is satisfiable does not reveal a satisfying assignment to the formula, nor any partial information regarding such an assignment (e.g., whether the first variable can assume the value true). Whatever the verifier can efficiently compute after interacting with a zero-knowledge prover can be efficiently computed from the assertion itself (without interacting with anyone). Thus, zero-knowledge proofs exhibit an extreme contrast between being convinced of the validity of a statement and learning anything in addition (while receiving such a convincing proof).

A fundamental result regarding zero-knowledge proof systems is their existence, under reasonable complexity assumptions, for any set in NP. Interestingly, this result is established by presenting such a proof system for Graph 3-Colorability (i.e., G3C), and by deriving a proof system for any S ∈ NP by using the Karp-reduction of S to G3C. The construction of a zero-knowledge proof system for G3C is facilitated by the simple structure of the problem; specifically, the fact that verifying the (global) claim that a specific 3-partition is a valid 3-coloring amounts to verifying a polynomial number of local constraints (i.e., that the colors assigned to the endpoints of each edge are different).

Probabilistically Checkable Proof Systems. NP-proofs can be efficiently transformed into a (redundant) form that offers a trade-off between the number of locations examined in the NP-proof and the confidence in its validity. These redundant proofs are called probabilistically checkable proofs (abbreviated PCPs), and have played a key role in the study of approximation problems.
Loosely speaking, a PCP system consists of a probabilistic polynomial-time verifier having access to an oracle that represents a proof in redundant form. Typically, the verifier accesses only a few of the oracle bits, where these bit positions are determined by the outcome of the verifier's coin tosses. Again, it is required that if the assertion holds, then the verifier must always accept (when given access to an adequate oracle), whereas, if the assertion is false, then the verifier must reject with probability at least 1/2, no matter which oracle is used.

A fundamental result regarding PCP systems is that any set in NP has a PCP system in which the verifier issues only a constant number of (binary!) queries. Again, the fact that any set in NP has such a PCP system is established by presenting such a proof system for SAT (and deriving a similar proof system for any S ∈ NP by using the Karp-reduction of S to SAT). The construction
for SAT relies, again, on an arithmetization of CNF formulae, where this arithmetization is different from the one used in the construction of interactive proof systems for SAT.
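The idea of arithmetization can be made concrete as follows. One standard variant (shown here as an illustrative sketch — not necessarily the exact polynomial used in the constructions just mentioned) maps a literal to x or 1 − x, a clause ℓ1 ∨ ℓ2 ∨ ℓ3 to 1 − (1 − ℓ1)(1 − ℓ2)(1 − ℓ3), and the formula to the product of its clause polynomials. On 0-1 inputs, the resulting polynomial evaluates to 1 exactly on satisfying assignments, so summing it over {0, 1}^n counts them:

```python
from itertools import product

def clause_poly(clause, x):
    """Value of the arithmetized clause at the 0-1 point x (dict var -> 0/1).
    A clause is a list of non-zero ints: +i for x_i, -i for its negation."""
    prod = 1
    for lit in clause:
        lit_val = x[abs(lit)] if lit > 0 else 1 - x[abs(lit)]
        prod *= 1 - lit_val
    return 1 - prod  # 1 iff some literal is satisfied

def count_satisfying(num_vars, clauses):
    """Sum the arithmetized formula over {0,1}^n; the result equals the
    number of satisfying assignments of the CNF."""
    total = 0
    for bits in product([0, 1], repeat=num_vars):
        x = {i + 1: b for i, b in enumerate(bits)}
        val = 1
        for c in clauses:
            val *= clause_poly(c, x)
        total += val
    return total
```

For instance, x_1 ∨ x_2 ∨ x_3 has seven satisfying assignments, and this is exactly what the sum of the polynomial yields. In the actual proof systems, such polynomials are evaluated over a large finite field rather than merely at 0-1 points.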
4.4 NP Sets That Are Neither in P nor NP-Complete

As stated in Section 4.3, thousands of problems have been shown to be NP-complete (cf. [11, Apdx.], which contains a list of more than three hundred main entries). Things have reached a situation in which people seem to expect any NP-set to be either NP-complete or in P. This naive view is wrong: Assuming NP ≠ P, there exist sets in NP that are neither NP-complete nor in P, where here NP-hardness also allows Cook-reductions.

Theorem 4.12: Assuming NP ≠ P, there exists a set T in NP \ P such that some sets in NP are not Cook-reducible to T.

Theorem 4.12 asserts that if NP ≠ P, then NP is partitioned into three nonempty classes: the class P, the class of problems to which NP is Cook-reducible, and the rest, denoted NPI (where "I" stands for "intermediate"). We already know that the first two classes are not empty, and Theorem 4.12 establishes the nonemptiness of NPI under the condition that NP ≠ P, which is actually a necessary condition (because if NP = P then every set in NP is Cook-reducible to any other set in NP).

The following proof of Theorem 4.12 presents an unnatural decision problem in NPI. We mention that some natural decision problems (e.g., some that are computationally equivalent to factoring) are conjectured to be in NPI. We also mention that if NP ≠ coNP, where coNP = {{0, 1}* \ S : S ∈ NP}, then NP ∩ coNP ⊆ P ∪ NPI holds (as a corollary to Theorem 5.7). Thus, if NP ≠ coNP then (NP ∩ coNP) \ P is a (natural) subset of NPI, and the nonemptiness of NPI follows provided that NP ∩ coNP ≠ P. Recall that Theorem 4.12 establishes the nonemptiness of NPI under the seemingly weaker assumption that NP ≠ P.

Proof Sketch:¹⁹ The basic idea is to modify an arbitrary set in NP \ P so as to fail all possible reductions (from NP to the modified set), as well as all possible polynomial-time decision procedures (for the modified set).
Specifically, starting with S ∈ NP \ P, we derive S′ ⊂ S such that, on the one hand, there is no polynomial-time reduction of S to S′, while, on the other hand, S′ ∈ NP \ P.

19. For an alternative presentation, see [1, Sec. 3.3].
The process of modifying S into S′ proceeds in iterations, alternately failing a potential reduction (by dropping sufficiently many strings from the rest of S) and failing a potential decision procedure (by including sufficiently many strings from the rest of S). Specifically, each potential reduction of S to S′ can be failed by dropping finitely many elements from the current S′, whereas each potential decision procedure can be failed by keeping finitely many elements of the current S′. These two assertions are based on the following two corresponding facts:

1. Any polynomial-time reduction (of any set not in P) to any finite set (e.g., a finite subset of S) must fail, because only sets in P are Cook-reducible to a finite set. Thus, for any finite set F_1 and any potential reduction (i.e., a polynomial-time oracle machine), there exists an input x on which this reduction to F_1 fails.²⁰
2. For every finite set F_2, any polynomial-time decision procedure for S \ F_2 must fail, because S is Cook-reducible to S \ F_2. Thus, for any potential decision procedure (i.e., a polynomial-time algorithm), there exists an input x on which this procedure fails.²¹

As stated, the process of modifying S into S′ proceeds in iterations, alternately failing a potential reduction (by dropping finitely many strings from the rest of S) and failing a potential decision procedure (by including finitely many strings from the rest of S). This can be done efficiently because it is inessential to determine the first possible points of alternation (in which sufficiently many strings were dropped (resp., included) to fail the next potential reduction (resp., decision procedure)). It suffices to guarantee that adequate points of alternation (albeit highly non-optimal ones) can be efficiently determined. Thus, S′ is the intersection of S and some set in P, which implies that S′ ∈ NP. Following are some comments regarding the implementation of the foregoing idea.
The first issue is that the foregoing plan calls for an (“effective”) enumeration of all polynomialtime oracle machines (resp., polynomialtime algorithms). However, none of these sets can be enumerated (by an algorithm). Instead, we 20
20. We mention that the proof relies on additional observations regarding this failure. Specifically, the aforementioned reduction fails while the only queries that are answered positively are those residing in F1. Furthermore, the aforementioned failure is due to a finite set of queries (i.e., the set of all queries made by the reduction when invoked on an input that is smaller than or equal to x). Thus, for every finite set F1 ⊂ S′ ⊆ S, any reduction of S to S′ can be failed by dropping a finite number of elements from S′ and without dropping elements of F1.
21. Again, the proof relies on additional observations regarding this failure. Specifically, this failure is due to a finite "prefix" of S \ F2 (i.e., the set {z ∈ S \ F2 : z ≤ x}). Thus, for every finite set F2, any polynomial-time decision procedure for S \ F2 can be failed by keeping a finite subset of S \ F2.
enumerate all corresponding machines along with all possible polynomials, and for each pair (M, p) we consider executions of machine M with time bound specified by the polynomial p. That is, we use the machine M_p obtained from the pair (M, p) by suspending the execution of M on input x after p(|x|) steps. We stress that we do not know whether machine M runs in polynomial time, but the computation of any polynomial-time machine is "covered" by some pair (M, p).

Next, let us clarify the process in which reductions and decision procedures are ruled out. We present a construction of a "filter" set F in P such that the final set S′ will equal S ∩ F. Recall that we need to select F such that each polynomial-time reduction of S to S ∩ F fails, and each polynomial-time procedure for deciding S ∩ F fails. The key observation is that, for every finite set F′, each polynomial-time reduction of S to (S ∩ F) ∩ F′ fails, whereas, for every finite set F′, each polynomial-time procedure for deciding (S ∩ F) \ F′ fails. Furthermore, each of these failures occurs on some input, and such an input can be determined by finite portions of S and F. Thus, we alternate between failing possible reductions and decision procedures on some inputs, while not trying to determine the "optimal" points of alternation but, rather, determining points of alternation in an efficient manner (which in turn allows for efficiently deciding membership in F). Specifically, we let F = {x : f(|x|) ≡ 1 (mod 2)}, where f : N → {0} ∪ N will be defined such that (i) each of the first f(n) − 1 machines is failed by some input of length at most n, and (ii) the value f(n) can be computed in poly(n)-time.

The value of f(n) is defined by the following process that performs exactly n^3 computation steps (where cubic time is a rather arbitrary choice). The process proceeds in (an a priori unknown number of) iterations, where in the (i + 1)st iteration we try to find an input on which the (i + 1)st (modified) machine fails.
Specifically, in the (i + 1)st iteration we scan all inputs, in lexicographic order, until we find an input on which the (i + 1)st (modified) machine fails, where this machine is an oracle machine if i + 1 is odd and a standard machine otherwise. If we detect a failure of the (i + 1)st machine, then we increment i and proceed to the next iteration. When we reach the allowed number of steps (i.e., n^3 steps), we halt outputting the current value of i (i.e., the current i is output as the value of f(n)). Needless to say, this description is heavily based on determining whether or not the (i + 1)st machine fails on specific inputs. Intuitively, these inputs will be much shorter than n, and so performing these decisions in time n^3 (or so) is not out of the question – see next paragraph.

In order to determine whether or not a failure (of the (i + 1)st machine) occurs on a particular input x, we need to emulate the computation of this machine on input x, as well as determine whether x is in the relevant set (which is
either S or S′ = S ∩ F). Recall that if i + 1 is even, then we need to fail a standard machine (which attempts to decide S′), and otherwise we need to fail an oracle machine (which attempts to reduce S to S′). Thus, for even i + 1 we need to determine whether x is in S′ = S ∩ F, whereas for odd i + 1 we need to determine whether x is in S as well as whether some other strings (which appear as queries) are in S′. Deciding membership in S ∈ NP can be done in exponential time (by using the exhaustive search algorithm that tries all possible NP-witnesses). Indeed, this means that when computing f(n) we may only complete the treatment of inputs that are of logarithmic (in n) length, but anyhow in n^3 steps we cannot hope to reach (in our lexicographic scanning) strings of length greater than 3 log_2 n. As for deciding membership in F, this requires the ability to compute f on adequate integers. That is, we may need to compute the value of f(n′) for various integers n′, but as noted, n′ will be much smaller than n (since n′ ≤ poly(|x|) ≤ poly(log n)). Thus, the value of f(n′) is just computed recursively (while counting the recursive steps in our total number of steps).22 The point is that when considering an input x, we may need the values of f only on {1, . . . , p_{i+1}(|x|)}, where p_{i+1} is the polynomial bounding the running time of the (i + 1)st (modified) machine, and obtaining such a value takes at most p_{i+1}(|x|)^3 steps. We conclude that the number of steps performed toward determining whether or not a failure (of the (i + 1)st machine) occurs on the input x is upper-bounded by an (exponential) function of |x|. As hinted in the foregoing paragraph, the procedure will complete its n^3 steps well before examining inputs of length greater than 3 log_2 n, but this does not matter. What matters is that f is unbounded (see Exercise 4.25). Furthermore, by construction, f(n) is computed in poly(n)-time.

Comment.
The proof of Theorem 4.12 actually establishes that for every decidable set S ∉ P, there exists S′ ∉ P such that S′ is Karp-reducible to S but S is not Cook-reducible to S′.23 Thus, if P ≠ NP then there exists an infinite sequence of sets S1, S2, . . . in NP \ P such that S_{i+1} is Karp-reducible to S_i but S_i is not Cook-reducible to S_{i+1}. Furthermore, S1 may be NP-complete. That is, there exists an infinite sequence of problems (albeit unnatural ones), all in NP, such that each problem is "easier" than the previous ones (in the sense that it can be reduced to any of the previous problems while none of these problems can be reduced to it).
22. We do not bother to present a more efficient implementation of this process. That is, we may afford to recompute f(n′) every time we need it (rather than store it for later use).
23. The said Karp-reduction (of S′ to S) maps x to itself if x ∈ F and otherwise maps x to a fixed no-instance of S.
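The iterative structure of the function f may be easier to grasp in code. The following Python fragment is our own illustrative sketch, not part of the original text: the predicate fails_on is a hypothetical stub standing in for the expensive check of whether the (i + 1)st modified machine fails on a given input (which, in the actual proof, requires emulating the machine and deciding membership in S and S′ recursively).

```python
def fails_on(i, x):
    # Hypothetical stub: in the actual proof this would emulate the
    # (i+1)-st modified machine (an oracle machine for odd i+1, a
    # decision procedure for even i+1) and check whether it fails on
    # the x-th string.  Here, machine i is simply failed by input i.
    return x == i

def f(n):
    # Budgeted scan, as in the text: perform (about) n**3 steps,
    # advancing the machine index i whenever the current machine has
    # been failed, and output the current i when the budget runs out.
    limit = n ** 3
    steps, i, x = 0, 0, 0
    while True:
        steps += 1
        if steps >= limit:
            return i
        if fails_on(i, x):
            i, x = i + 1, 0   # machine i was failed; move to the next one
        else:
            x += 1            # keep scanning inputs in lexicographic order

def in_filter(n):
    # F = { x : f(|x|) is odd }, decided here via the input length n.
    return f(n) % 2 == 1
```

With the stub above, each machine is failed by some short input, so f is unbounded and is computed within the budget; these two properties are all that the construction relies on.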
4.5 Reflections on Complete Problems

This book will perhaps only be understood by those who have themselves already thought the thoughts which are expressed in it – or similar thoughts. It is therefore not a textbook. Its object would be attained if it afforded pleasure to one who read it with understanding.
Ludwig Wittgenstein, Tractatus Logico-Philosophicus

Indeed, this section should be viewed as an invitation to meditate together on questions of the type: What enables the existence of complete problems? Accordingly, the style is intentionally naive and imprecise; this entire section may be viewed as an open-ended exercise, asking the interested reader to consider substantiations of the vague text.24

We know that NP-complete problems exist. The question we ask here is what aspects in our modeling of problems enable the existence of complete problems. We should, of course, bear in mind that completeness refers to a class of problems; the complete problem should "encode" each problem in the class and be itself in the class. Since the first aspect, hereafter referred to as the encodability of a class, is amazing enough (at least to a layman), we start by asking what enables it. We identify two fundamental paradigms, regarding the modeling of problems, that seem essential to the encodability of any (infinite) class of problems:

1. Each problem refers to an infinite set of possible instances.
2. The specification of each problem uses a finite description (e.g., an algorithm that enumerates all the possible solutions for any given instance).25

These two paradigms seem somewhat conflicting, yet put together they suggest the definition of a universal problem. Specifically, this problem refers to instances of the form (D, x), where D is a description of a problem and x is an instance of that problem, and a solution to the instance (D, x) is a solution to x with respect to the problem (described by) D.
Intuitively, this universal problem can encode any other problem (provided that problems are modeled in a way that conforms with the foregoing paradigms): Solving the universal problem allows for solving any other problem.26
24. We warn that this exercise may be unsuitable for most undergraduate students.
25. This seems the most naive notion of a description of a problem. An alternative notion of a description refers to an algorithm that recognizes all valid instance–solution pairs (as in the definition of NP). However, at this point, we also allow "non-effective" descriptions (as giving rise to the Halting Problem).
26. Recall, however, that the universal problem is not (algorithmically) solvable. Thus, both parts of the implication are false (i.e., this problem is not solvable and, needless to say, there exist unsolvable problems). Indeed, the notion of a problem is rather vague at this stage; it certainly extends beyond the set of all solvable problems.
Note that the foregoing universal problem is actually complete with respect to the class of all problems, but it is not complete with respect to any class that contains only (algorithmically) solvable problems (because this universal problem is not solvable). Turning our attention to classes of solvable problems, we seek versions of the universal problem that are complete for these classes. One archetypical difficulty that arises is that, given a description D (as part of an instance of the universal problem), we cannot tell whether or not D is a description of a problem in a predetermined class C (because this decision problem is unsolvable).27 This fact is relevant because if the universal problem requires solving instances that refer to a problem not in C, then intuitively it cannot itself be in C.

Before turning to the resolution of the foregoing difficulty, we note that the aforementioned modeling paradigms are pivotal to the theory of computation at large. In particular, so far we have made no reference to any complexity consideration. Indeed, a complexity consideration is the key to resolving the foregoing difficulty: The idea is modifying any description D into a description D′ such that D′ is always in C, and D′ agrees with D in the case that D is in C (i.e., in this case they describe exactly the same problem). We stress that in the case that D is not in C, the corresponding problem D′ may be arbitrary (as long as it is in C). Such a modification is possible with respect to many complexity-theoretic classes. We consider two different types of classes, where in both cases the class is defined in terms of the time complexity of algorithms that do something related to the problem (e.g., recognize valid solutions, as in the definition of NP).

1. Classes defined by a single time-bound function t (e.g., t(n) = n^3). In this case, any algorithm D is modified to the algorithm D′ that, on input x, emulates (up to) t(|x|) steps of the execution of D(x).
The modified version of the universal problem treats the instance (D, x) as (D′, x). This version can encode any problem in the said class C (corresponding to time complexity t). But will this (version of the universal) problem itself be in C? The answer depends both on the efficiency of emulation in the corresponding computational model and on the growth rate of t. For example, for triple-exponential t, the answer will be definitely yes, because t(|x|) steps
27. Here we ignore the possibility of using promise problems, which do enable avoiding such instances without requiring anybody to recognize them. Indeed, using promise problems resolves this difficulty, but the issues discussed following the next paragraph remain valid.
can be emulated in poly(t(|x|))-time (in any reasonable model) while t(|(D, x)|) > t(|x| + 1) > poly(t(|x|)). On the other hand, in most reasonable models, the emulation of t(|x|) steps requires more than O(t(|x|)) time, whereas for any polynomial t it holds that t(n + O(1)) is smaller than 2 · t(n).

2. Classes defined by a family of infinitely many functions of different growth rates (e.g., the polynomials). We can, of course, select a function t that grows faster than any function in the family and proceed as in the prior case, but then the resulting universal problem will definitely not be in the class. Note that in the current case, a complete problem will indeed be striking because, in particular, it will be associated with one function t_0 that grows more moderately than some other functions in the family (e.g., a fixed polynomial grows more moderately than other polynomials). Seemingly this means that the algorithm describing the universal machine should be faster, in terms of the actual number of steps, than some algorithms that describe some other problems in the class. This impression presumes that the instances of both problems are (approximately) of the same length, and so we intentionally violate this presumption by artificially increasing the length of the description of the instances of the universal problem. For example, if D is associated with the time bound t_D, then the instance (D, x) to the universal problem is presented as, say, (D, x, 1^{t_0^{-1}(t_D(|x|)^2)}), where the square compensates for the overhead of the emulation (and in the case of NP we used t_0(n) = n).

We believe that the last item explains the existence of NP-complete problems. But what about the NP-completeness of SAT?
We first note that the NP-hardness of CSAT is an immediate consequence of the fact that Boolean circuits can emulate algorithms.28 This fundamental fact is rooted in the notion of an algorithm (which postulates the simplicity of a single computational step) and holds for any reasonable model of computation. Thus, for every D and x, the problem of finding a string y such that D(x, y) = 1 is "encoded" as finding a string y such that C_{D,x}(y) = 1, where C_{D,x} is a Boolean circuit that is easily derived from (D, x). In contrast to the fundamental fact underlying the NP-hardness of CSAT, the NP-hardness of SAT relies on a clever trick that allows for encoding instances of CSAT as instances of SAT.

As stated, the NP-completeness of SAT is proved by encoding instances of CSAT as instances of SAT. Similarly, the NP-completeness of other new problems is proved by encoding instances of problems that are already known to
28. The fact that CSAT is in NP is a consequence of the fact that the circuit evaluation problem is solvable in polynomial time.
be NP-complete. Typically, these encodings operate in a local manner, mapping small components of the original instance to local gadgets in the produced instance. Indeed, these problem-specific gadgets are the core of the encoding phenomenon. Presented with such a gadget, it is typically easy to verify that it works. Thus, one may not be surprised by most of these individual gadgets, but the fact that they exist for thousands of natural problems is definitely amazing.
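To make the gadget phenomenon concrete, here is a minimal Python sketch (our illustration, not taken from the text) of the standard local encoding of a single circuit gate as CNF clauses, as used in reducing CSAT to SAT: an auxiliary variable g is introduced for the gate's output, and a constant number of clauses forces g to equal the gate's value.

```python
def and_gate(g, a, b):
    # Clauses asserting g <-> (a AND b); integers are variable
    # indices, and -v denotes the negation of variable v.
    return [[-g, a], [-g, b], [g, -a, -b]]

def not_gate(g, a):
    # Clauses asserting g <-> (NOT a).
    return [[-g, -a], [g, a]]

def satisfied(clauses, assignment):
    # Evaluate a CNF (list of clauses) under a {var: bool} assignment.
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

# Sanity check: the AND gadget is satisfied exactly when g = a AND b.
for a in (False, True):
    for b in (False, True):
        for g in (False, True):
            assert satisfied(and_gate(3, 1, 2), {1: a, 2: b, 3: g}) == (g == (a and b))
```

Applying such a gadget gate by gate, and adding a unit clause asserting the variable of the output gate, yields a CNF formula that is satisfiable if and only if the given circuit is; as the text notes, verifying each individual gadget is easy.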
Exercises

Exercise 4.1 (a quiz)
1. What are NP-complete (search and decision) problems?
2. Is it likely that the problem of finding a perfect matching in a given graph is NP-complete?
3. Prove the existence of NP-complete problems.
4. How does the complexity of solving one NP-complete problem affect the complexity of solving any problem in NP (resp., PC)?
5. In continuation of the previous question, assuming that some NP-complete problem can be solved in time t, upper-bound the complexity of solving any problem in NP (resp., PC).
6. List five NP-complete problems.
7. Why does the fact that SAT is Karp-reducible to Set Cover imply that Set Cover is NP-complete?
8. Are there problems in NP \ P that are not NP-complete?

Exercise 4.2 (PC-completeness implies NP-completeness) Show that if the search problem R is PC-complete, then S_R is NP-complete, where S_R = {x : ∃y s.t. (x, y) ∈ R}.

Exercise 4.3 Prove that any R ∈ PC is Levin-reducible to R_u, where R_u consists of pairs (⟨M, x, t⟩, y) such that M accepts the input pair (x, y) within t steps (and |y| ≤ t). Recall that R_u ∈ PC (see [13, §4.2.1.2]).
Guideline: A minor modification of the reduction used in the proof of Theorem 4.3 will do.

Exercise 4.4 Prove that Bounded Halting and Bounded Non-Halting are NP-complete, where the problems are defined as follows. The instance consists of a pair (M, 1^t), where M is a Turing machine and t is an integer. The decision version of Bounded Halting (resp., Bounded Non-Halting) consists of determining whether or not there exists an input (of length at
most t) on which M halts (resp., does not halt) within t steps, whereas the search problem consists of finding such an input.
Guideline: Either modify the proof of Theorem 4.3 or present a reduction of (say) the search problem of R_u to the search problem of Bounded (Non-)Halting. (Indeed, the exercise is more straightforward in the case of Bounded Halting.)

Exercise 4.5 In the proof of Theorem 4.5, we claimed that the value of each entry in the "array of configurations" of a machine M is determined by the values of the three entries that reside in the row above it (as in Figure 4.2). Present a function f_M : Γ^3 → Γ, where Γ = Σ × (Q ∪ {⊥}), that substantiates this claim.
Guideline: For example, for every σ1, σ2, σ3 ∈ Σ, it holds that f_M((σ1, ⊥), (σ2, ⊥), (σ3, ⊥)) = (σ2, ⊥). More interestingly, if the transition function of M maps (σ, q) to (τ, p, +1) then, for every σ1, σ2, σ3 ∈ Σ, it holds that f_M((σ, q), (σ2, ⊥), (σ3, ⊥)) = (σ2, p) and f_M((σ1, ⊥), (σ, q), (σ3, ⊥)) = (τ, ⊥).

Exercise 4.6 Present and analyze a reduction of SAT to 3SAT.
Guideline: For a clause C, consider auxiliary variables such that the i-th variable indicates whether one of the first i literals is satisfied, and replace C by a 3CNF formula that uses the original variables of C as well as the auxiliary variables. For example, the clause ∨_{i=1}^t x_i is replaced by the conjunction of 3CNF formulae that are logically equivalent to the formulae (y_2 ≡ (x_1 ∨ x_2)), (y_i ≡ (y_{i−1} ∨ x_i)) for i = 3, . . . , t, and y_t. We comment that this is not the standard reduction, but we find it conceptually more appealing. (The standard reduction replaces the clause ∨_{i=1}^t x_i by the conjunction of the 3CNF formulae (x_1 ∨ x_2 ∨ y_2), ((¬y_{i−1}) ∨ x_i ∨ y_i) for i = 3, . . . , t, and ¬y_t.)

Exercise 4.7 (efficient solvability of 2SAT) In contrast to the NP-completeness of 3SAT, prove that 2SAT (i.e., the satisfiability of 2CNF formulae) is in P.
Guideline: Consider the following forcing process for CNF formulae.
If the formula contains a singleton clause (i.e., a clause having a single literal), then the corresponding variable is assigned the only value that satisfies the clause, and the formula is simplified accordingly (possibly yielding a constant formula, which is either true or false). The process is repeated until the formula is either a constant or contains only non-singleton clauses. Note that a formula φ is satisfiable if and only if the formula obtained from φ by the forcing process
is satisfiable. Now, consider the following algorithm for solving the search problem associated with 2SAT.
1. Choose an arbitrary variable in φ. For each σ ∈ {0, 1}, denote by φ_σ the formula obtained from φ by assigning this variable the value σ and applying the forcing process to the resulting formula. Note that φ_σ is either a Boolean constant or a 2CNF formula (which is a conjunction of some clauses of φ).
2. If, for some σ ∈ {0, 1}, the formula φ_σ equals the constant true, then we halt with a satisfying assignment for the original formula.
3. If both assignments yield the constant false (i.e., for every σ ∈ {0, 1} the formula φ_σ equals false), then we halt asserting that the original formula is unsatisfiable.
4. Otherwise (i.e., for each σ ∈ {0, 1}, the formula φ_σ is a (non-constant) 2CNF formula), we select σ ∈ {0, 1} arbitrarily, set φ ← φ_σ, and go to Step 1.
Proving the correctness of this algorithm boils down to observing that the arbitrary choice made in Step 4 is immaterial. Indeed, this observation relies on the fact that we refer to 2CNF formulae, which implies that the forcing process yields either a constant or a 2CNF formula (which is a conjunction of some clauses of the original φ).

Exercise 4.8 (Integer Linear Programming) Prove that the following problem is NP-hard.29 An instance of the problem is a system of linear inequalities (say, with integer constants), and the problem is to determine whether the system has an integer solution. A typical instance of this decision problem follows.
x + 2y − z ≥ 3
−3x − z ≥ −5
x ≥ 0
−x ≥ −1
Guideline: Reduce from SAT. Specifically, consider an arithmetization of the input CNF by replacing ∨ with addition and ¬x by 1 − x. Thus, each clause gives rise to an inequality (e.g., the clause x ∨ ¬y is replaced by the inequality
29. Proving that the problem is in NP requires showing that if a system of linear inequalities has an integer solution, then it has an integer solution in which all numbers are of length that is polynomial in the length of the description of the system. Such a proof is beyond the scope of the current textbook.
x + (1 − y) ≥ 1, which simplifies to x − y ≥ 0). Enforce a 0-1 solution by introducing inequalities of the form x ≥ 0 and −x ≥ −1, for every variable x.

Exercise 4.9 (Maximum Satisfiability of Linear Systems over GF(2)) Prove that the following problem is NP-complete. An instance of the problem consists of a system of linear equations over GF(2) and an integer k, and the problem is to determine whether there exists an assignment that satisfies at least k equations. (Note that the problem of determining whether there exists an assignment that satisfies all of the equations is in P.)
Guideline: Reduce from 3SAT, using the following arithmetization. Replace each clause that contains t ≤ 3 literals by 2^t − 1 linear GF(2) equations that correspond to the different non-empty subsets of these literals, and assert that their sum (modulo 2) equals one; for example, the clause x ∨ ¬y is replaced by the equations x + (1 − y) = 1, x = 1, and 1 − y = 1. Identifying {false, true} with {0, 1}, prove that if the original clause is satisfied by a Boolean assignment v then exactly 2^{t−1} of the corresponding equations are satisfied by v, whereas if the original clause is unsatisfied by v then none of the corresponding equations is satisfied by v.

Exercise 4.10 (Satisfiability of Quadratic Systems over GF(2)) Prove that the following problem is NP-complete. An instance of the problem consists of a system of quadratic equations over GF(2), and the problem is to determine whether there exists an assignment that satisfies all the equations. Note that the result also holds for systems of quadratic equations over the reals (by adding conditions that force values in {0, 1}).
Guideline: Start by showing that the corresponding problem for cubic equations is NP-complete, by a reduction from 3SAT that maps the clause x ∨ ¬y ∨ z to the equation (1 − x) · y · (1 − z) = 0.
Reduce the problem for cubic equations to the problem for quadratic equations by introducing auxiliary variables; that is, given an instance with variables x_1, . . . , x_n, introduce the auxiliary variables x_{i,j}'s and add equations of the form x_{i,j} = x_i · x_j.

Exercise 4.11 (restricted versions of 3SAT) Prove that the following restricted version of 3SAT, denoted r3SAT, is NP-complete. An instance of the problem consists of a 3CNF formula such that each literal appears in at most two clauses, and the problem is to determine whether this formula is satisfiable.
Guideline: Recall that Proposition 4.7 establishes the NP-completeness of a version of 3SAT in which the instances are restricted such that each variable appears in at most three clauses. So it suffices to reduce this restricted problem to r3SAT. This reduction is based on the fact that if all (three) occurrences of
a variable are of the same type (i.e., they are all negated or all non-negated), then this variable can be assigned a value that satisfies all clauses in which it appears (and so the variable and the clauses in which it appears can be omitted from the instance). Thus, the desired reduction consists of applying the foregoing simplification to all relevant variables. Alternatively, a closer look at the reduction used in the proof of Proposition 4.7 reveals the fact that this reduction maps any 3CNF formula to a 3CNF formula in which each literal appears in at most two clauses.

Exercise 4.12 Verify the validity of the three main reductions presented in the proof of Proposition 4.9; that is, we refer to the reduction of r3SAT to 3SC, the reduction of 3SC to 3XC′, and the reduction of 3XC′ to 3XC.

Exercise 4.13 Show that the following two variants of Set Cover are computationally equivalent. In both variants, an instance consists of a collection of finite sets S_1, . . . , S_m and an integer K. In the first variant we seek a cover of size at most K, whereas in the second variant we seek a cover of size exactly K. Consider both the decision and search versions of both variants, and note that K ≤ m may not hold.

Exercise 4.14 (Clique and Independent Set) An instance of the Independent Set problem consists of a pair (G, K), where G is a graph and K is an integer, and the question is whether or not the graph G contains an independent set (i.e., a set with no edges between its members) of size (at least) K. The Clique problem is analogous. Prove that both problems are computationally equivalent via Karp-reductions to the Vertex Cover problem.

Exercise 4.15 (an alternative proof of Proposition 4.10) Consider the following sketch of a reduction of 3SAT to Independent Set.
On input a 3CNF formula φ with m clauses and n variables, we construct a graph G_φ consisting of m triangles (corresponding to the (three literals in the) m clauses) augmented with edges that link conflicting literals. That is, if x appears as the i_1-th literal of the j_1-th clause and ¬x appears as the i_2-th literal of the j_2-th clause, then we draw an edge between the i_1-th vertex of the j_1-th triangle and the i_2-th vertex of the j_2-th triangle. Prove that φ ∈ 3SAT if and only if G_φ has an independent set of size m.

Exercise 4.16 Verify the validity of the reduction presented in the proof of Proposition 4.11.

Exercise 4.17 (Subset Sum) Prove that the following problem is NP-complete. The instance consists of a list of n + 1 integers, denoted a_1, . . . , a_n, b, and the question is whether or not a subset of the a_i's sums up to b (i.e., whether there exists I ⊆ [n]
such that Σ_{i∈I} a_i = b). Establish the NP-completeness of this problem, called Subset Sum, by a reduction from 3XC.
Guideline: Given an instance (S_1, . . . , S_m) of 3XC, where (without loss of generality) S_1, . . . , S_m ⊆ [3k], consider the following instance of Subset Sum that consists of a list of m + 1 integers such that b = Σ_{j=1}^{3k} (m + 1)^j and a_i = Σ_{j∈S_i} (m + 1)^j for every i ∈ [m]. (Some intuition may be gained by writing all integers in base m + 1.)

Exercise 4.18 Prove that the following problem is NP-complete. The instance consists of a list of permutations over [n], denoted π_1, . . . , π_m, a target permutation π (over [n]), and an integer t presented in unary (i.e., 1^t). The question is whether or not there exists a sequence, i_1, . . . , i_ℓ ∈ [m], such that ℓ ≤ t and π = π_{i_ℓ} ◦ · · · ◦ π_{i_2} ◦ π_{i_1}, where ◦ denotes the composition of permutations. Establish the NP-completeness of this problem by a reduction from 3XC.
Guideline: Given an instance (S_1, . . . , S_m) of 3XC, where (without loss of generality) S_1, . . . , S_m ⊆ [3k], consider the following instance ((π_1, . . . , π_m), π, 1^k) of the permutation problem (over [6k]). The target permutation π is the involution (over [6k]) that satisfies π(2i) = 2i − 1 for every i ∈ [3k]. For j = 1, . . . , m, the j-th permutation in the list (i.e., π_j) is the involution that satisfies π_j(2i) = 2i − 1 if i ∈ S_j, and π_j(2i) = 2i (as well as π_j(2i − 1) = 2i − 1) otherwise.

Exercise 4.19 The fact that SAT and CSAT are NP-complete implies that Graph 3-Colorability and Clique can be reduced to SAT and CSAT (via a generic reduction). In this exercise, however, we ask for simple and direct reductions.
1. Present a simple reduction of Graph 3-Colorability to 3SAT.
Guideline: Introduce three Boolean variables for each vertex such that x_{i,j} indicates whether vertex i is colored with the j-th color.
Construct clauses that enforce that each vertex is colored by a single color, and that no adjacent vertices are colored with the same color.
2. Present a simple reduction of Clique to CSAT.
Guideline: Introduce a Boolean input for each vertex such that this input indicates whether the vertex is in the clique. The circuit should check that all pairs of inputs that are set to 1 correspond to pairs of vertices that are adjacent in the graph, and check that the number of variables that are set to 1 exceeds the given threshold. This calls for constructing a circuit that counts. Constructing a corresponding Boolean formula is left as an advanced exercise.
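The clause construction outlined in Item 1 of Exercise 4.19 can be sketched as follows; this is an illustrative Python fragment of our own (the variable indexing var(i, j), standing for "vertex i is colored with color j", is one arbitrary choice), and clauses of width less than three can be padded with repeated literals if strict 3CNF is desired.

```python
def var(i, j):
    # Variable index for "vertex i gets color j" (i >= 0, j in {0,1,2});
    # negative literals denote negations.
    return 3 * i + j + 1

def coloring_clauses(n_vertices, edges):
    clauses = []
    for i in range(n_vertices):
        # vertex i is colored ...
        clauses.append([var(i, 0), var(i, 1), var(i, 2)])
        # ... with at most one color
        for j in range(3):
            for k in range(j + 1, 3):
                clauses.append([-var(i, j), -var(i, k)])
    # no two adjacent vertices share a color
    for (u, v) in edges:
        for j in range(3):
            clauses.append([-var(u, j), -var(v, j)])
    return clauses
```

The mapping is computable in time polynomial in the size of the graph, and a satisfying assignment of the resulting formula corresponds exactly to a proper 3-coloring, which is the content of the exercise.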
Exercise 4.20 (an augmented form of Levin-reductions) In continuation of the discussion in the main text, consider the following augmented form of Levin-reductions. Such a reduction of R to R′ consists of three polynomial-time mappings (f, h, g) such that (f is a Karp-reduction of S_R to S_{R′} and)30 the following two conditions hold:
1. For every (x, y) ∈ R it holds that (f(x), h(x, y)) ∈ R′.
2. For every (f(x), y′) ∈ R′ it holds that (x, g(x, y′)) ∈ R.
(We note that this definition is actually the one used by Levin in [21], except that he restricted h and g to depend only on their second argument.) Prove that such a reduction implies both a Karp-reduction and a Levin-reduction, and show that all reductions presented in this chapter satisfy this augmented requirement.

Exercise 4.21 (parsimonious reductions) Let R, R′ ∈ PC and let f be a Karp-reduction of S_R = {x : R(x) ≠ ∅} to S_{R′} = {x : R′(x) ≠ ∅}. We say that f is parsimonious (with respect to R and R′) if for every x it holds that |R(x)| = |R′(f(x))|. For each of the reductions presented in this chapter, determine whether or not it is parsimonious.31

Exercise 4.22 (polynomial-time invertible reductions) Show that, under a suitable (but natural) encoding of the problems' instances, all Karp-reductions presented in this chapter are one-to-one and polynomial-time invertible; that is, show that for every such reduction f there exists a polynomial-time algorithm that, on any input in the image of f, returns the unique preimage under f. Note that, without loss of generality, when given a string that is not in the image of f, the inverting algorithm returns a special symbol.

Exercise 4.23 (on polynomial-time invertible reductions (following [2])) In continuation of Exercise 4.22, we consider a general condition on sets that implies that any Karp-reduction to them can be modified into a one-to-one and polynomial-time invertible Karp-reduction.
Loosely speaking, a set is markable if it is feasible to "mark" any instance x by a label α such that the resulting instance M(x, α) preserves the "membership bit" of x (with respect to the set) and the label is easily recoverable from M(x, α). That is, we say that a set S is
30 The parenthetical condition is actually redundant, because it is implied by the following two conditions.
31 Advanced comment: In most cases, when the standard reductions are not parsimonious, it is possible to find alternative reductions that are parsimonious (cf. [11, Sec. 7.3]). In some cases (e.g., for 3-Colorability), finding such alternatives is quite challenging.
markable if there exists a polynomial-time (marking) algorithm M such that
1. For every x, α ∈ {0, 1}∗ it holds that
(a) M(x, α) ∈ S if and only if x ∈ S.
(b) |M(x, α)| > |x|.
2. There exists a polynomial-time (de-marking) algorithm D such that, for every x, α ∈ {0, 1}∗, it holds that D(M(x, α)) = α.
Note that all natural NP-sets (e.g., those considered in this chapter) are markable (e.g., for SAT, one may mark a formula by augmenting it with additional satisfiable clauses that use specially designated auxiliary variables). Prove that if S′ is Karp-reducible to S and S is markable, then S′ is Karp-reducible to S by a length-increasing, one-to-one, and polynomial-time invertible mapping. Infer that for any natural NP-complete problem S, every set in NP is Karp-reducible to S by a length-increasing, one-to-one, and polynomial-time invertible mapping.
Guideline: Let f be a Karp-reduction of S′ to S, and let M be the guaranteed marking algorithm. Consider the reduction that maps x to M(f(x), x).

Exercise 4.24 (on the isomorphism of NP-complete sets (following [2])) Suppose that S and T are Karp-reducible to each other by length-increasing, one-to-one, and polynomial-time invertible mappings, denoted f and g, respectively. Using the following guidelines, prove that S and T are "effectively" isomorphic; that is, present a polynomial-time computable and invertible one-to-one mapping φ such that T = φ(S) = {φ(x) : x ∈ S}.
1. Let F = {f(x) : x ∈ {0, 1}∗} and G = {g(x) : x ∈ {0, 1}∗}. Using the length-increasing condition of f (resp., g), prove that F (resp., G) is a proper subset of {0, 1}∗. Prove that for every y ∈ {0, 1}∗ there exists a unique triple (j, x, i) ∈ {1, 2} × {0, 1}∗ × ({0} ∪ N) that satisfies one of the following two conditions:
(a) j = 1, x ∉ G, and y = (g ◦ f)^i(x);
(b) j = 2, x ∉ F, and y = (g ◦ f)^i(g(x)).
(In both cases, h^0(z) = z, h^i(z) = h(h^{i−1}(z)), and (g ◦ f)(z) = g(f(z)). Hint: Consider the maximal sequence of inverse operations g^{−1}, f^{−1}, g^{−1}, . . . that can be applied to y, and note that each inverse operation shrinks the current string.)
2. Let U1 = {(g ◦ f)^i(x) : x ∉ G ∧ i ≥ 0} and U2 = {(g ◦ f)^i(g(x)) : x ∉ F ∧ i ≥ 0}. Prove that (U1, U2) is a partition of {0, 1}∗. Using the fact that f and g are length-increasing and polynomial-time invertible, present a polynomial-time procedure for deciding membership in the set U1.
Prove the same for the sets V1 = {(f ◦ g)^i(x) : x ∉ F ∧ i ≥ 0} and V2 = {(f ◦ g)^i(f(x)) : x ∉ G ∧ i ≥ 0}.
3. Note that U2 ⊆ G, and define φ(x) = f(x) if x ∈ U1 and φ(x) = g^{−1}(x) otherwise.
(a) Prove that φ is a Karp-reduction of S to T.
(b) Note that φ maps U1 to f(U1) = {f(x) : x ∈ U1} = V2 and U2 to g^{−1}(U2) = {g^{−1}(x) : x ∈ U2} = V1. Prove that φ is one-to-one and onto. Observe that φ^{−1}(x) = f^{−1}(x) if x ∈ f(U1) and φ^{−1}(x) = g(x) otherwise. Prove that φ^{−1} is a Karp-reduction of T to S. Infer that φ(S) = T.
Using Exercise 4.23, infer that all natural NP-complete sets are isomorphic.

Exercise 4.25 Referring to the proof of Theorem 4.12, prove that the function f is unbounded (i.e., for every i there exists an n such that n^3 steps of the process defined in the proof allow for failing the (i+1)-st machine).
Guideline: Note that f is monotonically non-decreasing (because more steps allow for the failing of at least as many machines). Assume, toward the contradiction, that f is bounded. Let i = max_{n∈N}{f(n)} and n′ be the smallest integer such that f(n′) = i. If i is odd, then the set F determined by f is co-finite (because F = {x : f(|x|) ≡ 1 (mod 2)} ⊇ {x : |x| ≥ n′}). In this case, the (i+1)-st machine tries to decide S ∩ F (which differs from S on finitely many strings), and must fail on some x. Derive a contradiction by showing that the number of steps taken till reaching and considering this x is at most exp(poly(|x|)), which is smaller than n^3 for some sufficiently large n. A similar argument applies to the case that i is even, where we use the fact that F ⊆ {x : |x| < n′} is finite, and so the relevant reduction of S to S ∩ F must fail on some input x.

Exercise 4.26 (universal verification procedures) A natural notion, which arises from viewing NP-complete problems as "encoding" all problems in NP, is the notion of a "universal" NP-proof system. We say that an NP-proof system is universal if verification in any other NP-proof system can be reduced to verification in it.
Specifically, following Definition 2.5, let V and V′ be verification procedures for S ∈ NP and S′ ∈ NP, respectively. We say that verification with respect to V′ is reduced to verification with respect to V if there exist two polynomial-time computable functions f, h : {0, 1}∗ → {0, 1}∗ such that for every x, y ∈ {0, 1}∗ it holds that V′(x, y) = V(f(x), h(x, y)). Prove the existence of universal NP-proof systems, and show that the natural NP-proof system for SAT is universal.
Guideline: See Exercise 4.20.
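To make the marking mechanism of Exercise 4.23 concrete, the following sketch implements a toy marking scheme for CNF formulae, represented as lists of clauses with nonzero-integer literals. The encoding choice is ours, not the book's: each label bit is recorded as a doubled-literal clause on a fresh variable, which is trivially satisfiable and hence preserves the membership bit; we assume the original formula contains no doubled-literal clauses, which makes de-marking unambiguous.

```python
def mark(clauses, label_bits):
    """M(x, alpha): append one doubled-literal clause per label bit.

    A clause [v, v] on a fresh variable v is satisfied by setting v true
    (and [-v, -v] by setting v false), so satisfiability is preserved,
    and the marked formula is strictly longer than the original
    (whenever the label is nonempty).
    """
    base = max((abs(lit) for c in clauses for lit in c), default=0) + 1
    marked = [list(c) for c in clauses]
    for i, bit in enumerate(label_bits):
        v = base + i
        marked.append([v, v] if bit == 1 else [-v, -v])
    return marked


def demark(marked_clauses):
    """D(M(x, alpha)) = alpha: read the label off the doubled-literal clauses."""
    tagged = [(abs(c[0]), 1 if c[0] > 0 else 0)
              for c in marked_clauses
              if len(c) == 2 and c[0] == c[1]]
    return [bit for _, bit in sorted(tagged)]
```

The invertible reduction of Exercise 4.23 would then map x to mark(f(x), encoding-of-x), so that the preimage x can be read off the marks by the de-marking algorithm.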
5 Three Relatively Advanced Topics
In this chapter we discuss three relatively advanced topics. The first topic, which was alluded to in previous chapters, is the notion of promise problems (Section 5.1). Next, we present an optimal algorithm for solving (“candid”) NPsearch problems (Section 5.2). Finally, in Section 5.3, we briefly discuss the class (denoted coNP) of sets that are complements of sets in NP.
Teaching Notes Typically, the foregoing topics are not mentioned in a basic course on complexity. Still, we believe that these topics deserve at least a mention in such a course. This holds especially with respect to the notion of promise problems. Furthermore, depending on time constraints, we recommend presenting all three topics in class (at least at an overview level). We comment that the notion of promise problems was originally introduced in the context of decision problems, and is typically used only in that context. However, given the importance that we attach to an explicit study of search problems, we extend the formulation of promise problems to search problems as well. In that context, it is also natural to introduce the notion of a “candid search problem” (see Definition 5.2).
5.1 Promise Problems

Promise problems are natural generalizations of search and decision problems. These generalizations are obtained by explicitly considering a set of legitimate instances (rather than considering any string as a legitimate instance). As noted previously, this generalization provides a more adequate formulation of natural computational problems (and, indeed, this formulation is used in all informal
discussions). For example, in Section 4.3.2 we presented such problems using phrases like “given a graph and an integer . . . ” (or “given a collection of sets . . . ”). In other words, we assumed that the input instance has a certain format (or rather we “promised the solver” that this is the case). Indeed, we claimed that in these cases, the assumption can be removed without affecting the complexity of the problem, but we avoided providing a formal treatment of this issue, which we do next.1
5.1.1 Definitions

Promise problems are defined by specifying a set of admissible instances. Candidate solvers of these problems are only required to handle these admissible instances. Intuitively, the designer of an algorithm solving such a problem is promised that the algorithm will never encounter an inadmissible instance (and so the designer need not care about how the algorithm performs on inadmissible inputs).

5.1.1.1 Search Problems with a Promise

In the context of search problems, a promise problem is a relaxation in which one is only required to find solutions to instances in a predetermined set, called the promise. The requirement regarding efficient checkability of solutions is adapted in an analogous manner.

Definition 5.1 (search problems with a promise): A search problem with a promise consists of a binary relation R ⊆ {0, 1}∗ × {0, 1}∗ and a promise set P. Such a problem is also referred to as the search problem R with promise P.
• The search problem R with promise P is solved by algorithm A if for every x ∈ P it holds that (x, A(x)) ∈ R if x ∈ S_R and A(x) = ⊥ otherwise, where S_R = {x : R(x) ≠ ∅} and R(x) = {y : (x, y) ∈ R}.
The time complexity of A on inputs in P is defined as T_A^P(n) = max_{x∈P∩{0,1}^n}{t_A(x)}, where t_A(x) is the running time of A(x) and T_A^P(n) = 0 if P ∩ {0, 1}^n = ∅.1
• The search problem R with promise P is in the promise problem extension of PF if there exists a polynomial-time algorithm that solves this problem.2
1 Advanced comment: The notion of promise problems was originally introduced in the context of decision problems, and is typically used only in that context. However, we believe that promise problems are as natural in the context of search problems.
2 In this case, it does not matter whether the time complexity of A is defined on inputs in P or on all possible strings. Suppose that A has (polynomial) time complexity T on inputs in P; then we can modify A to halt on any input x after at most T(|x|) steps. This modification may only affect the output of A on inputs not in P (which are inputs that do not matter anyhow). The
• The search problem R with promise P is in the promise problem extension of PC if there exists a polynomial T and an algorithm A such that, for every x ∈ P and y ∈ {0, 1}∗, algorithm A makes at most T(|x|) steps and it holds that A(x, y) = 1 if and only if (x, y) ∈ R.
We stress that nothing is required of the solver in the case that the input violates the promise (i.e., x ∉ P); in particular, in such a case the algorithm may halt with a wrong output. (Indeed, the standard formulations of PF and PC are obtained by considering the trivial promise P = {0, 1}∗.)3
In addition to the foregoing motivation for promise problems, we mention one natural class of search problems with a promise. These are search problems in which the promise is that the instance has a solution; that is, in terms of Definition 5.1, we consider a search problem R with the promise P = S_R. We refer to such search problems by the name candid search problems.

Definition 5.2 (candid search problems): An algorithm A solves the candid search problem of the binary relation R if for every x ∈ S_R (i.e., for every x such that (x, y) ∈ R for some y) it holds that (x, A(x)) ∈ R. The time complexity of such an algorithm is defined as T_A^{S_R}(n) = max_{x∈S_R∩{0,1}^n}{t_A(x)}, where t_A(x) is the running time of A(x) and T_A^{S_R}(n) = 0 if S_R ∩ {0, 1}^n = ∅.
Note that nothing is required when x ∉ S_R: In particular, algorithm A may either output a wrong solution (although no solutions exist) or run for more than T_A^{S_R}(|x|) steps. The first case can be essentially eliminated whenever R ∈ PC. Furthermore, for R ∈ PC, if we "know" the time complexity of algorithm A (e.g., if we can compute T_A^{S_R}(n) in poly(n)-time), then we may modify A into an algorithm A′ that solves the (general) search problem of R (i.e., halts with a correct output on each input) in time T_{A′} such that T_{A′}(n) essentially equals T_A^{S_R}(n) + poly(n); see Exercise 5.2. However, we do not necessarily know the running time of an algorithm that we consider (or analyze). Furthermore, as we shall see in Section 5.2, the naive assumption by which we always know the running time of an algorithm that we design is not valid either.

5.1.1.2 Decision Problems with a Promise

In the context of decision problems, a promise problem is a relaxation in which one is only required to determine the status of instances that belong to a predetermined set, called the promise. The requirement of efficient verification
modification can be implemented in polynomial time by computing t = T(|x|) and emulating the execution of A(x) for t steps. A similar comment applies to the definitions of PC, P, and NP.
3 Here we refer to the alternative formulation of PC outlined in Section 2.5.
[Figure: three disjoint regions of instances — yes-instances, no-instances, and instances that violate the promise.]
Figure 5.1. A schematic depiction of a promise problem.
is adapted in an analogous manner. In view of the standard usage of the term, we refer to decision problems with a promise by the name promise problems. Formally, promise problems refer to a three-way partition of the set of all strings into yes-instances, no-instances, and instances that violate the promise. (See the schematic depiction in Figure 5.1.) Standard decision problems are obtained as a special case by insisting that all inputs are allowed (i.e., the promise is trivial).

Definition 5.3 (promise problems): A promise problem consists of a pair of non-intersecting sets of strings, denoted (S_yes, S_no), and S_yes ∪ S_no is called the promise.
• The promise problem (S_yes, S_no) is solved by algorithm A if for every x ∈ S_yes it holds that A(x) = 1 and for every x ∈ S_no it holds that A(x) = 0. The promise problem is in the promise problem extension of P if there exists a polynomial-time algorithm that solves it.
• The promise problem (S_yes, S_no) is in the promise problem extension of NP if there exists a polynomial p and a polynomial-time algorithm V such that the following two conditions hold:
1. Completeness: For every x ∈ S_yes, there exists y of length at most p(|x|) such that V(x, y) = 1.
2. Soundness: For every x ∈ S_no and every y, it holds that V(x, y) = 0.
We stress that for algorithms of polynomial-time complexity, it does not matter whether the time complexity is defined only on inputs that satisfy the promise or on all strings (see footnote 2). Thus, the extended classes P and NP (like PF and PC) are invariant under this choice.

5.1.1.3 Reducibility Among Promise Problems

The notion of a Cook-reduction extends naturally to promise problems, when postulating that a query that violates the promise (of the problem at the target
of the reduction) may be answered arbitrarily.4 That is, the oracle machine should solve the original problem no matter how the oracle answers queries that violate the promise. The latter requirement is consistent with the conceptual meaning of reductions and promise problems. Recall that reductions capture procedures that make subroutine calls to an arbitrary procedure that solves the "target" problem. But in the case of promise problems, such a solver may behave arbitrarily on instances that violate the promise. We stress that the main property of a reduction is preserved (see Exercise 5.3): If the promise problem Π is Cook-reducible to a promise problem Π′ that is solvable in polynomial time, then Π is solvable in polynomial time.
Caveat. The extension of a complexity class to promise problems does not necessarily inherit the "structural" properties of the standard class. For example, in contrast to Theorem 5.7, there exist promise problems in NP ∩ coNP such that every set in NP can be Cook-reduced to them; see Exercise 5.4. Needless to say, NP = coNP does not seem to follow from Exercise 5.4. See further discussion in §5.1.2.4.
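The preserved property of Cook-reductions — composing the oracle machine with a polynomial-time solver for the target yields a polynomial-time solver for the original problem — can be illustrated by a toy reduction of our own (with trivial promises, sidestepping the subtlety about promise-violating queries): deciding whether a list is sorted in decreasing order by a single query to an oracle for "is the list sorted in increasing order?".

```python
def cook_reduce_desc_to_asc(x, oracle):
    """Oracle machine: one query to an 'is sorted ascending?' oracle."""
    return oracle(list(reversed(x)))

def asc_solver(x):
    """A polynomial-time solver for the target problem."""
    return all(a <= b for a, b in zip(x, x[1:]))

def desc_solver(x):
    """The composition: a polynomial-time solver for the original problem."""
    return cook_reduce_desc_to_asc(x, asc_solver)
```

Plugging any correct solver into the oracle slot yields a correct solver for the original problem, which is exactly the statement proved in Exercise 5.3.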
5.1.2 Applications and Limitations

The following discussion refers to both the decision and the search versions of promise problems. We start with two generic applications, and later consider some specific applications. (Other applications are surveyed in [12].) We also elaborate on the foregoing caveat.

5.1.2.1 Formulating Natural Computational Problems

Recall that promise problems offer the most direct way of formulating natural computational problems. Indeed, this is a major application of the notion of promise problems (although this application usually goes unnoticed). Specifically, the presentation of natural computational problems refers (usually implicitly) to some natural format, and this can be explicitly formulated by defining a (promise problem with a) promise that equals all strings in that format. Thus, the notion of a promise problem allows the discarding of inputs that do not adhere
4
It follows that Karp-reductions among promise problems are not allowed to make queries that violate the promise. Specifically, we say that the promise problem Π = (Π_yes, Π_no) is Karp-reducible to the promise problem Π′ = (Π′_yes, Π′_no) if there exists a polynomial-time mapping f such that for every x ∈ Π_yes (resp., x ∈ Π_no) it holds that f(x) ∈ Π′_yes (resp., f(x) ∈ Π′_no).
to this format (and a focus on inputs that do adhere to this format). For example, when referring to computational problems regarding graphs, the promise mandates that the input is a graph (or, rather, the standard representation of some graph). We mention that, typically, the format (or rather the promise) is easily recognizable, and so the complexity of the promise problem can be captured by a corresponding problem (with a trivial promise); see Section 5.1.3 for further discussion.

5.1.2.2 Restricting a Computational Problem

In addition to the foregoing application of promise problems, we mention their use in formulating the natural notion of a restriction of a computational problem to a subset of the instances. Specifically, such a restriction means that the promise set of the restricted problem is a subset of the promise set of the unrestricted problem.

Definition 5.4 (restriction of computational problems):
• For any P′ ⊆ P and binary relation R, we say that the search problem R with promise P′ is a restriction of the search problem R with promise P.
• We say that the promise problem (S′_yes, S′_no) is a restriction of the promise problem (S_yes, S_no) if both S′_yes ⊆ S_yes and S′_no ⊆ S_no hold.

For example, when we say that 3SAT is a restriction of SAT, we refer to the fact that the set of allowed instances is now restricted to 3CNF formulae (rather than to arbitrary CNF formulae). In both cases, the computational problem is to determine satisfiability (or to find a satisfying assignment), but the set of instances (i.e., the promise set) is further restricted in the case of 3SAT. The fact that a restricted problem is never harder than the original problem is captured by the fact that the restricted problem is Karp-reducible to the original one (via the identity mapping).
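The identity-mapping reduction of 3SAT to SAT can be spelled out as follows (a minimal sketch using our own encoding of CNF formulae as lists of integer-literal clauses): the promise of 3SAT restricts SAT's promise to clauses with at most three literals, and the reduction itself does nothing.

```python
def satisfies_3sat_promise(clauses):
    """Check the 3SAT promise: every clause has between 1 and 3 literals."""
    return all(1 <= len(clause) <= 3 for clause in clauses)

def reduce_3sat_to_sat(clauses):
    """Karp-reduction via the identity mapping:
    every 3CNF formula is, in particular, a CNF formula."""
    assert satisfies_3sat_promise(clauses), "input violates the 3SAT promise"
    return clauses
```

Note that the reduction respects the convention for Karp-reductions among promise problems: it is only applied to (and only asserted on) instances satisfying the 3SAT promise.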
5.1.2.3 Non-generic Applications

In addition to the two aforementioned generic uses of the notion of a promise problem, we mention that this notion provides adequate formulations for a variety of specific computational complexity notions and results. One example is the notion of a candid search problem (i.e., Definition 5.2). Two other examples follow:
1. Unique solutions: For a binary relation R, we refer to the set of instances that have (at most) a single solution; that is, the promise is P = {x : |R(x)| ≤ 1}, where R(x) = {y : (x, y) ∈ R}. Two natural problems that arise are the search
problem of R with promise P and the promise problem (P ∩ S_R, P \ S_R), where S_R = {x : R(x) ≠ ∅}. One fundamental question regarding these promise problems is how their complexity relates to the complexity of the original problem (e.g., the standard search problem of R). For details, see [13, Sec. 6.2.3].
2. Gap problems: The complexity of various approximation tasks can be captured by the complexity of appropriate "gap problems"; for details, see [13, Sec. 10.1]. For example, approximating the value of an optimal solution is computationally equivalent to the promise problem of distinguishing instances having solutions of high value from instances having only solutions of low value, where the promise rules out instances that have an optimal solution of intermediate value.
In all of these cases, promise problems allow discussion of natural computational problems and making statements about their inherent complexity. Thus, the complexity of promise problems (and classes of such problems) addresses natural questions and concerns. In particular, demonstrating the efficient solvability (resp., intractability) of such a promise problem (or of a class of such problems) carries the same conceptual message as demonstrating the efficient solvability (resp., intractability) of a standard problem (or of a class of corresponding standard problems). For example, saying that some promise problem cannot be solved by a polynomial-time algorithm carries the same conceptual message as saying that some standard (search or decision) problem cannot be solved by a polynomial-time algorithm.

5.1.2.4 Limitations

Although the promise problem classes that correspond to P and PF preserve the intuitive meaning of the corresponding standard classes of (search or decision) problems, the situation is less clear with respect to NP and PC. Things become even worse when we consider the promise problem classes that correspond to NP ∩ coNP, where coNP = {{0, 1}∗ \ S : S ∈ NP}.
Specifically, for S ∈ NP ∩ coNP it holds that every instance x has an NP-witness for membership in the corresponding set (i.e., either S or its complement {0, 1}∗ \ S); however, for a promise problem (S_yes, S_no) in the corresponding "extension of NP ∩ coNP" it does not necessarily hold that every x has an NP-witness for membership in the corresponding set (i.e., either S_yes or S_no or {0, 1}∗ \ (S_yes ∪ S_no)). The effect of this discrepancy is demonstrated in the discrepancy between Theorem 5.7 and Exercise 5.4. In general, structural properties of classes of promise problems do not necessarily reflect the properties of the corresponding decision problems. This
follows from the fact that the answer of an oracle for a promise problem is not necessarily determined by the problem. Furthermore, the (standard) definitions of classes of promise problems do not refer to the complexity of the promise, which may vary from being trivial to being efficiently recognizable to being intractable or even undecidable.
5.1.3 The Standard Convention of Avoiding Promise Problems

Recall that although promise problems provide a good framework for presenting natural computational problems, we managed to avoid this framework in previous chapters. This was done by relying on the fact that, for all of the (natural) problems considered in the previous chapters, it is easy to decide whether or not a given instance satisfies the promise, which in turn refers to a standard encoding of objects as strings. Details follow.
Let us first recall some natural computational problems. For example, SAT (resp., 3SAT) refers to CNF (resp., 3CNF) formulae, which means that we implicitly consider the promise that the input is in CNF (resp., in 3CNF). Indeed, this promise is efficiently recognizable (i.e., given a formula, it is easy to decide whether or not it is in CNF (resp., in 3CNF)). Actually, the issue arises already when talking about formulae, because we are actually given a string that is supposed to encode a formula (under some predetermined encoding scheme). Thus, even for a problem concerning arbitrary formulae, we use a promise (i.e., that the input string is a valid encoding of some formula), which is easy to decide for natural encoding schemes. The same applies to all combinatorial problems we considered, because these problems (in their natural formulations) refer to objects like sets and graphs, which are encoded as strings (using some encoding scheme). Thus, in all of these cases, the natural computational problem refers to objects of some type, and this natural problem is formulated by considering a promise problem in which the promise is the set of all strings that encode such objects. Furthermore, in all of these cases, the promise (i.e., the set of legal encodings) is efficiently recognizable (i.e., membership in it can be decided in polynomial time). In these cases, we may avoid mentioning the promise by using one of the following two "nasty" conventions:
1.
Fictitiously extending the set of instances to the set of all possible strings (and allowing trivial solutions for the corresponding dummy instances). For example, in the case of a search problem, we may either define all instances that violate the promise to have no solution or define them to have a trivial solution (e.g., be a solution for themselves); that is, for a search problem R
with promise P, we may consider the (standard) search problem of R′, where R′ agrees with R on P and R′(x) = ∅ for every x ∉ P (or, say, R′(x) = {x} for every x ∉ P). In the case of a promise (decision) problem (S_yes, S_no), we may consider the problem of deciding membership in S_yes, which means that instances that violate the promise are considered as no-instances.
2. Considering every string as a valid encoding of some object (i.e., efficiently identifying strings that violate the promise with strings that satisfy the promise).5 For example, fixing any string x0 that satisfies the promise, we consider every string that violates the promise as if it were x0. In the case of a search problem R with promise P, this means considering the (standard) search problem of R′, where R′ agrees with R on P and R′(x) = R(x0) for every x ∉ P. Similarly, in the case of a promise (decision) problem (S_yes, S_no), we consider the problem of deciding membership in S_yes (provided x0 ∈ S_no, and otherwise we consider the problem of deciding membership in {0, 1}∗ \ S_no).
We stress that in the case that the promise is efficiently recognizable, the aforementioned conventions (or modifications) do not affect the complexity of the relevant (search or decision) problem. That is, rather than considering the original promise problem, we consider a (search or decision) problem (without a promise) that is computationally equivalent to the original one. Thus, in some sense we lose nothing by studying the latter problem rather than the original one (i.e., the original promise problem). However, to get to this situation we need the notion of a promise problem, which allows a formulation of the original natural problem.
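The first convention admits a generic sketch: given an efficient recognizer for the promise and a decider that is only guaranteed to be correct on the promise, every promise-violating instance is declared a no-instance. (The function names and the toy example below are ours.)

```python
def standardize(promise, decider):
    """Turn a decider for a promise problem into a standard decider on all
    strings: instances violating the promise become no-instances."""
    def standard_decider(x):
        # short-circuiting mirrors the fact that the decider's behavior
        # off the promise is irrelevant: it is never consulted there
        return promise(x) and decider(x)
    return standard_decider

# toy instance: the promise is "x encodes a natural number",
# and the decision problem is "the encoded number is even"
decide_even = standardize(lambda s: s.isdigit(), lambda s: int(s) % 2 == 0)
```

Note that the wrapper's running time is that of the decider plus the (assumed efficient) promise check, which is why the convention does not affect the complexity of the problem when the promise is efficiently recognizable.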
Indeed, even in the case that the original natural (promise) problem and the problem (without a promise) that was derived from it are computationally equivalent, it is useful to have a formulation that allows for distinguishing between them (just as we distinguish between the different NP-complete problems although they are all computationally equivalent). This conceptual concern becomes of crucial importance in the case (to be discussed next) that the promise (referred to in the promise problem) is not efficiently recognizable.
In the case that the promise is not efficiently recognizable, the foregoing transformations of promise problems into standard (decision and search) problems do not necessarily preserve the complexity of the problem. In this case, the terminology of promise problems is unavoidable. Consider, for example, the problem of deciding whether a Hamiltonian graph is 3-colorable. On the face of it, such a problem may have fundamentally different complexity than the problem of deciding whether a given graph is both Hamiltonian and 3-colorable.
5
Unlike in the first convention, this means that the dummy instances inherit the solutions to some real instances.
In spite of the foregoing issues, we have adopted the convention of focusing on standard decision and search problems. That is, by default, all computational problems and complexity classes discussed in other sections of this book refer to standard decision and search problems, and the only exception in which we refer to promise problems (outside of the current section) is explicitly stated as such (see Section 5.2). This is justified by our focus on natural computational problems, which can be stated as standard (decision and search) problems by using the foregoing conventions.
5.2 Optimal Search Algorithms for NP

This section refers to solving the candid search problem of any relation in PC. Recall that PC is the class of search problems that allow for efficient checking of the correctness of candidate solutions (see Definition 2.3), and that the candid search problem is a search problem in which the solver is promised that the given instance has a solution (see Definition 5.2).
We claim the existence of an optimal algorithm for solving the candid search problem of any relation in PC. Furthermore, we will explicitly present such an algorithm and prove that it is optimal in a very strong sense: For any algorithm solving the candid search problem of R ∈ PC, our algorithm solves the same problem in time that is at most a constant factor slower (ignoring a fixed additive polynomial term, which may be disregarded in the case that the problem is not solvable in polynomial time). Needless to say, we do not know the time complexity of the aforementioned optimal algorithm (indeed, if we knew it, then we would have resolved the P-vs-NP Question). In fact, the P-vs-NP Question boils down to determining the time complexity of a single explicitly presented algorithm (i.e., the optimal algorithm claimed in Theorem 5.5).6

Theorem 5.5: For every binary relation R ∈ PC there exists an algorithm A that satisfies the following:
1. Algorithm A solves the candid search problem of R.
2. There exists a polynomial p such that for every algorithm A′ that solves the candid search problem of R, it holds that t_A(x) = O(t_{A′}(x) + p(|x|)) (for any x ∈ S_R), where t_A(x) (resp., t_{A′}(x)) denotes the number of steps taken by A (resp., A′) on input x.

Interestingly, we establish the optimality of A without knowing what its (optimal) running time is. Furthermore, the optimality claim is "instance-based"
6
That is, P = NP if and only if the optimal algorithm of Theorem 5.5 has polynomial-time complexity.
(i.e., it refers to any input) rather than "global" (i.e., referring to the (worst-case) time complexity as a function of the input length). We stress that the hidden constant in the O-notation depends only on A′, but in the following proof this dependence is exponential in the length of the description of algorithm A′ (and it is not known whether a better dependence can be achieved). Indeed, this dependence, as well as the idea underlying it, constitutes one negative aspect of this otherwise amazing result. Another negative aspect is that the optimality of algorithm A refers only to inputs that have a solution (i.e., inputs in S_R).7 Finally, we note that the theorem as stated refers only to models of computation that have machines that can emulate a given number of steps of other machines with a constant overhead. We mention that in most natural models, the overhead of such an emulation is at most poly-logarithmic in the number of steps, in which case it holds that t_A(x) = Õ(t_{A′}(x) + p(|x|)), where Õ(t) = poly(log t) · t.

Proof Sketch: Fixing R, we let M be a polynomial-time algorithm that decides membership in R, and let p be a polynomial bounding the running time of M (as a function of the length of the first element in the input pair). Using M, we present an algorithm A that solves the candid search problem of R as follows. On input x, algorithm A emulates ("in parallel") the executions of all possible search algorithms (on input x), checks the result provided by each of them (using M), and halts whenever it recognizes a correct solution. Indeed, most of the emulated algorithms are totally irrelevant to the search, but using M we can screen out the bad solutions offered by them and output a good solution once one is obtained.
Since there are infinitely many possible algorithms, it may not be clear what we mean by the expression "emulating all possible algorithms in parallel." What we mean is emulating them at different "rates" such that the infinite sum of these rates converges to 1 (or to any other constant). Specifically, we will emulate the i-th possible algorithm at rate 1/(i + 1)^2, which means emulating a single step of this algorithm per (i + 1)^2 emulation steps (performed for all algorithms).8 Note that a straightforward implementation of this idea may create a significant overhead, which is involved in switching frequently from the emulation of one machine to the emulation of another. Instead, we present an alternative implementation that proceeds in iterations.
Since there are infinitely many possible algorithms, it may not be clear what we mean by the expression “emulating all possible algorithms in parallel.” What we mean is emulating them at different “rates” such that the infinite sum of these rates converges to 1 (or to any other constant). Specifically, we will emulate the i-th possible algorithm at rate 1/(i+1)^2, which means emulating a single step of this algorithm per (i+1)^2 emulation steps (performed for all algorithms).[8] Note that a straightforward implementation of this idea may create a significant overhead, which is involved in switching frequently from the emulation of one machine to the emulation of another. Instead, we present an alternative implementation that proceeds in iterations.
[7] We stress that Exercise 5.2 is not applicable here, because we do not know T_A^{S_R}(·) (let alone that we do not have a poly(n)-time algorithm for computing the mapping n → T_A^{S_R}(n)).
[8] Indeed, our choice of using the rate function ρ(i) = 1/(i+1)^2 is rather arbitrary and was adopted for the sake of simplicity (i.e., being the reciprocal of a small polynomial). See further discussion in Exercise 5.6.
In the j-th iteration, for i = 1, . . . , 2^{j/2} − 1, algorithm A emulates 2^j/(i+1)^2 steps of the i-th machine (where the machines are ordered according to the lexicographic order of their descriptions). Each of these emulations is conducted in one chunk, and thus the overhead of switching between the various emulations is insignificant (in comparison to the total number of steps being emulated).[9] In the case that one of these emulations (on input x) halts with output y, algorithm A invokes M on input (x, y), and outputs y if and only if M(x, y) = 1. Furthermore, the verification of a solution provided by a candidate algorithm is also emulated at the expense of its step count. (Put in other words, we augment each algorithm with a canonical procedure (i.e., M) that checks the validity of the solution offered by the algorithm.) By its construction, whenever A(x) outputs a string y (i.e., y ≠ ⊥), it must hold that (x, y) ∈ R.

To show the optimality of A, we consider an arbitrary algorithm A′ that solves the candid search problem of R. Our aim is to show that A is not much slower than A′. Intuitively, this is the case because the overhead of A results from emulating other algorithms (in addition to A′), but the total number of emulation steps wasted (due to these algorithms) is inversely proportional to the rate of algorithm A′, which in turn is exponentially related to the length of the description of A′. The punch line is that since A′ is fixed, the length of its description is a constant. Details follow.

For every x, let us denote by t′(x) the number of steps taken by A′ on input x, where t′(x) also accounts for the running time of M(x, ·); that is, t′(x) = t_{A′}(x) + p(|x|), where t_{A′}(x) is the number of steps taken by A′(x) itself. Then, the emulation of t′(x) steps of A′ on input x is “covered” by the j-th iteration of A, provided that 2^j/(2^{|A′|+1})^2 ≥ t′(x), where |A′| denotes the length of the description of A′.
(Indeed, we use the fact that the algorithms are emulated in lexicographic order, and note that there are at most 2^{|A′|+1} − 2 algorithms that precede A′ in lexicographic order.) Thus, on input x, algorithm A halts after at most j_{A′}(x) iterations, where j_{A′}(x) = 2(|A′| + 1) + log_2(t_{A′}(x) + p(|x|)), after emulating a total number of steps that is at most

    t(x) def= Σ_{j=1}^{j_{A′}(x)} Σ_{i=1}^{2^{j/2}−1} 2^j/(i+1)^2    (5.1)
         < 2^{j_{A′}(x)+1} = 2^{2|A′|+3} · (t_{A′}(x) + p(|x|)),    (5.2)
[9] For simplicity, we start each emulation from scratch; that is, in the j-th iteration, algorithm A emulates the first 2^j/(i+1)^2 steps of the i-th machine. Alternatively, we may maintain a record of the configuration in which we stopped in the (j−1)-st iteration and resume the computation from that configuration for another 2^j/(i+1)^2 steps, but this saving (of Σ_{k<j} 2^k/(i+1)^2 steps) is clearly insignificant.
where the inequality uses Σ_{i=1}^{2^{j/2}−1} 1/(i+1)^2 < Σ_{i≥1} 1/((i+1)·i) = Σ_{i≥1} (1/i − 1/(i+1)) = 1 and Σ_{j=1}^{j_{A′}(x)} 2^j < 2^{j_{A′}(x)+1}. The question of how much time is required for emulating these many steps depends on the specific model of computation. In many models of computation (e.g., a two-tape Turing machine), emulation is possible within poly-logarithmic overhead (i.e., t steps of an arbitrary machine can be emulated by Õ(t) steps of the emulating machine), and in some models this emulation can even be performed with constant overhead. The theorem follows.

Comment. By construction, the foregoing algorithm A does not halt on input x ∉ S_R. This can be easily rectified by letting A emulate a straightforward exhaustive search for a solution, and halt with output ⊥ if and only if this exhaustive search indicates that there is no solution to the current input. This extra emulation can be performed in parallel to all other emulations (e.g., at a rate of one step for the extra emulation per each step of everything else).
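The interleaved emulation described in this proof sketch can be illustrated with a short program. The following Python sketch is only a toy model (nothing like it appears in the text): a small hard-coded list of candidate solvers stands in for the enumeration of all algorithms, each "machine" is a generator that yields None on a working step and yields its output just before halting, and `check` plays the role of the verifier M. All names are our own.

```python
# Toy rendition of the optimal search algorithm A of Theorem 5.5.
# Assumption: a finite list of "machines" stands in for the enumeration
# of all possible algorithms.

def universal_search(machines, check, x, max_iter=30):
    """In iteration j, machine i receives 2**j // (i+1)**2 steps, run in
    one chunk and restarted from scratch (as in footnote 9). Any output
    produced by a machine is screened with `check` (the role of M)."""
    for j in range(1, max_iter + 1):
        bound = int(2 ** (j / 2)) - 1              # machines 1 .. 2^{j/2}-1
        for i in range(1, min(bound, len(machines)) + 1):
            gen = machines[i - 1](x)               # fresh emulation
            y = None
            for _ in range(2 ** j // (i + 1) ** 2):
                try:
                    out = next(gen)                # one emulated step
                except StopIteration:
                    break
                if out is not None:                # machine halts with output
                    y = out
                    break
            if y is not None and check(x, y):      # screen bad solutions via M
                return y
    return None                                    # not found within max_iter

# A toy candid search problem: find a nontrivial factor of a composite x.
def looper(x):                  # never outputs anything
    while True:
        yield None

def bogus(x):                   # always proposes a wrong answer
    while True:
        yield 1

def trial_division(x):          # a correct (slow) solver
    for d in range(2, x):
        if x % d == 0:
            yield d
            return
        yield None

is_factor = lambda x, y: 1 < y < x and x % y == 0
print(universal_search([looper, bogus, trial_division], is_factor, 91))  # 7
```

As in footnote 9, each emulation is restarted from scratch in every iteration; for any fixed correct solver in the list, the allotted budget 2^j/(i+1)^2 eventually exceeds its running time, at a cost that depends only on its (fixed) index.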
5.3 The Class coNP and Its Intersection with NP

By prepending the name of a complexity class (of decision problems) with the prefix “co” we mean the class of complement sets; that is,

    coC def= {{0,1}* \ S : S ∈ C}.    (5.3)
Specifically, coNP = {{0,1}* \ S : S ∈ NP} is the class of sets that are complements of sets in NP. Recalling that each set in NP is characterized by its witness relation such that x ∈ S if and only if there exists an adequate NP-witness, it follows that this set's complement consists of all instances for which there are no NP-witnesses (i.e., x ∈ {0,1}* \ S if there is no NP-witness for x being in S). For example, SAT ∈ NP implies that the set of unsatisfiable CNF formulae is in coNP. Likewise, the set of graphs that are not 3-colorable is in coNP. (Jumping ahead, we mention that it is widely believed that these sets are not in NP.)

Another perspective on coNP is obtained by considering the search problems in PC. Recall that for such R ∈ PC, the set of instances having a solution (i.e., S_R = {x : ∃y s.t. (x, y) ∈ R}) is in NP. It follows that the set of instances having no solution (i.e., {0,1}* \ S_R = {x : ∀y (x, y) ∉ R}) is in coNP.

It is widely believed that NP ≠ coNP (which means that NP is not closed under complementation). Indeed, this conjecture implies P ≠ NP (because P is closed under complementation). The conjecture NP ≠ coNP means that
some sets in coNP do not have NP-proof systems (because NP is the class of sets having NP-proof systems). As we will show next, under this conjecture, the complements of NP-complete sets do not have NP-proof systems; for example, there exists no NP-proof system for proving that a given CNF formula is not satisfiable. We first establish this fact for NP-completeness in the standard sense (i.e., under Karp-reductions, as in Definition 4.1).

Proposition 5.6: Suppose that NP ≠ coNP and let S ∈ NP such that every set in NP is Karp-reducible to S. Then S̄ def= {0,1}* \ S is not in NP. In other words, if S is NP-complete (under Karp-reductions) and S̄ ∈ NP, then NP = coNP.

Proof Sketch: We first observe that the fact that every set in NP is Karp-reducible to S implies that every set in coNP is Karp-reducible to S̄ (see Exercise 5.8). We next claim (and prove later) that if S′ is in NP, then every set that is Karp-reducible to S′ is also in NP. Applying the claim to S′ = S̄, we conclude that S̄ ∈ NP implies coNP ⊆ NP, which in turn implies NP = coNP (see Exercise 5.7), in contradiction to the main hypothesis.

We now turn to prove the foregoing claim; that is, we prove that if S′ has an NP-proof system and S′′ is Karp-reducible to S′, then S′′ has an NP-proof system. Let V′ be the verification procedure associated with S′, and let f be a Karp-reduction of S′′ to S′. Then, we define the verification procedure V′′ (for membership in S′′) by V′′(x, y) = V′(f(x), y). That is, any NP-witness for f(x) ∈ S′ serves as an NP-witness for x ∈ S′′ (and these are the only NP-witnesses for x ∈ S′′). This may not be a “natural” proof system (for S′′), but it is definitely an NP-proof system for S′′.

Assuming that NP ≠ coNP, Proposition 5.6 implies that sets in NP ∩ coNP cannot be NP-complete with respect to Karp-reductions.

In light of other limitations of Karp-reductions (see, e.g., Exercise 3.4), one may wonder whether or not the exclusion of NP-complete sets from the class NP ∩ coNP is due to the use of a restricted notion of reductions (i.e., Karp-reductions). The following theorem asserts that this is not the case: Some sets in NP cannot be reduced to sets in the intersection NP ∩ coNP even under general reductions (i.e., Cook-reductions).
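The composition V′′(x, y) = V′(f(x), y) used in the proof of Proposition 5.6 can be made concrete with a standard textbook reduction (not taken from this chapter): PARTITION Karp-reduces to SUBSET-SUM, and any NP-witness for the reduced instance serves as an NP-witness for the original one. A minimal Python sketch, with all function names our own and positive integer inputs assumed:

```python
# NP-verifier for SUBSET-SUM: witness = indices of a subset summing to t.
def v_subset_sum(instance, witness):
    nums, t = instance
    idx = set(witness)
    return (all(0 <= i < len(nums) for i in idx)
            and len(idx) == len(witness)           # no repeated indices
            and sum(nums[i] for i in idx) == t)

# Karp-reduction f: PARTITION -> SUBSET-SUM (assumes positive integers).
def f_partition(nums):
    total = sum(nums)
    # An odd total is a no-instance; map it to an unachievable target.
    return (nums, total // 2) if total % 2 == 0 else (nums, total + 1)

# Composed verifier V''(x, y) = V'(f(x), y), as in the proof sketch.
def v_partition(nums, witness):
    return v_subset_sum(f_partition(nums), witness)

print(v_partition([3, 1, 1, 2, 2, 1], [0, 3]))  # True: 3 + 2 = 5 = 10/2
print(v_partition([3, 1, 1, 2, 2, 1], [0, 1]))  # False: 3 + 1 = 4
```

Note that `v_partition` is not a "natural" proof system for PARTITION, but it is an NP-proof system for it, exactly as the proof observes.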
In light of other limitations of Karpreductions (see, e.g., Exercise 3.4), one may wonder whether or not the exclusion of NPcomplete sets from the class N P ∩ coN P is due to the use of a restricted notion of reductions (i.e., Karpreductions). The following theorem asserts that this is not the case: Some sets in N P cannot be reduced to sets in the intersection N P ∩ coN P even under general reductions (i.e., Cookreductions). Theorem 5.7: If every set in N P can be Cookreduced to some set in N P ∩ coN P, then N P = coN P.
In particular, assuming NP ≠ coNP, no set in NP ∩ coNP can be NP-complete, even when NP-completeness is defined with respect to Cook-reductions. Since NP ∩ coNP is conjectured to be a proper superset of P, it follows (assuming NP ≠ coNP) that there are decision problems in NP that are neither in P nor NP-hard (i.e., specifically, the decision problems in (NP ∩ coNP) \ P). We stress that Theorem 5.7 refers to standard decision problems and not to promise problems (see Section 5.1 and Exercise 5.4).

Proof: Analogously to the proof of Proposition 5.6, the current proof boils down to proving that if S is Cook-reducible to a set in NP ∩ coNP, then S ∈ NP ∩ coNP. Using this claim, the theorem's hypothesis implies that NP ⊆ NP ∩ coNP, which in turn implies NP ⊆ coNP and NP = coNP (see Exercise 5.7).

Fixing any S and S′ ∈ NP ∩ coNP such that S is Cook-reducible to S′, we prove that S ∈ NP (and the proof that S ∈ coNP is similar).[10] Let us denote by M the oracle machine reducing S to S′. That is, on input x, machine M makes queries and decides whether or not to accept x, and its decision is correct provided that all queries are answered according to S′. To show that S ∈ NP, we will present an NP-proof system for S. This proof system, denoted V, accepts an alleged (instance, witness) pair of the form (x, ((z1, σ1, w1), . . . , (zt, σt, wt))) if the following two conditions hold:

1. On input x, machine M accepts after making the queries z1, . . . , zt, and obtaining the corresponding answers σ1, . . . , σt. That is, V checks that, on input x, after obtaining the answers σ1, . . . , σ(i−1) to the first i − 1 queries, the i-th query made by M equals zi. In addition, V checks that, on input x and after receiving the answers σ1, . . . , σt, machine M halts with output 1 (indicating acceptance). Note that V does not have oracle access to S′.
The procedure V, rather, emulates the computation of M(x) by answering, for each i, the i-th query of M(x) by using the bit σi (provided to V as part of its input). The correctness of these answers will be verified (by V) separately (i.e., see the next item).

2. For every i, it holds that if σi = 1 then wi is an NP-witness for zi ∈ S′, whereas if σi = 0 then wi is an NP-witness for zi ∈ {0,1}* \ S′. Thus, if this condition holds, then it is the case that each σi indicates the correct status of zi with respect to S′ (i.e., σi = 1 if and only if zi ∈ S′).
[10] Alternatively, we show that S ∈ coNP by applying the following argument to S̄ def= {0,1}* \ S, noting that S̄ is Cook-reducible to S′ (via S), or alternatively that S̄ is Cook-reducible to {0,1}* \ S′ ∈ NP ∩ coNP.
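The two conditions checked by V can be sketched in code. This is a toy model under assumptions not made in the text: the oracle machine M is modeled as a function that receives an oracle callable, and V_yes and V_no stand for the assumed NP-verifiers for S′ and its complement. The instantiation at the end is deliberately trivial, meant only to exercise the mechanics.

```python
# Sketch of the NP-verifier V for S, built from a Cook-reduction M of S
# to S' and NP-verifiers V_yes (for S') and V_no (for its complement).
# The witness is a list of (z_i, sigma_i, w_i) triples.

def V(x, witness, M, V_yes, V_no):
    # Condition 2: each claimed answer sigma_i is backed by an NP-witness.
    for z, sigma, w in witness:
        ok = V_yes(z, w) if sigma == 1 else V_no(z, w)
        if not ok:
            return False
    # Condition 1: emulate M(x), answering the i-th query with sigma_i
    # and insisting that the i-th query indeed equals z_i.
    triples = iter(witness)
    state = {"consistent": True}
    def oracle(z):
        try:
            z_claimed, sigma, _ = next(triples)
        except StopIteration:
            state["consistent"] = False
            return 0
        if z != z_claimed:
            state["consistent"] = False
            return 0
        return sigma
    accepted = M(x, oracle)
    return state["consistent"] and accepted == 1

# Toy instantiation (purely illustrative): S' = strings of even length,
# which is trivially in NP ∩ coNP (empty witnesses suffice), and M
# decides S = S' via a single oracle query.
V_yes = lambda z, w: len(z) % 2 == 0
V_no  = lambda z, w: len(z) % 2 == 1
M     = lambda x, oracle: oracle(x)

print(V("ab", [("ab", 1, "")], M, V_yes, V_no))  # True
print(V("a",  [("a", 1, "")],  M, V_yes, V_no))  # False (V_yes rejects)
print(V("a",  [("a", 0, "")],  M, V_yes, V_no))  # False (M outputs 0)
```

The point mirrored here is that V never queries S′ itself: it only replays the claimed answers and checks them against the supplied NP-witnesses.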
We stress that we have used the fact that both S′ and S̄′ def= {0,1}* \ S′ have NP-proof systems, and we have referred to the corresponding NP-witnesses.

Note that V is indeed an NP-proof system for S. Firstly, the length of the corresponding witnesses is bounded by the running time of the reduction (and the length of the NP-witnesses supplied for the various queries). Next note that V runs in polynomial time (i.e., verifying the first condition requires an emulation of the polynomial-time execution of M on input x when using the σi's to emulate the oracle, whereas verifying the second condition is done by invoking the relevant NP-proof systems). Finally, observe that x ∈ S if and only if there exists a sequence y def= ((z1, σ1, w1), . . . , (zt, σt, wt)) such that V(x, y) = 1. In particular, V(x, y) = 1 holds only if y contains a valid sequence of queries and answers as made in a computation of M on input x with oracle access to S′, and M accepts based on that sequence.

The World View – a Digest. Recall that on top of the P ≠ NP conjecture, we mentioned two other conjectures (which clearly imply P ≠ NP):

1. The conjecture that NP ≠ coNP (equivalently, NP ∩ coNP ≠ NP). This conjecture is equivalent to the conjecture that CNF formulae have no short proofs of unsatisfiability (i.e., the set {0,1}* \ SAT has no NP-proof system).

2. The conjecture that NP ∩ coNP ≠ P. Notable candidates for the class (NP ∩ coNP) \ P include decision problems that are computationally equivalent to the integer factorization problem (i.e., the search problem (in PC) in which, given a composite number, the task is to find its prime factors).

Combining these conjectures, we get the world view depicted in Figure 5.2, which also shows the class of coNP-complete sets (defined next).

Definition 5.8: A set S is called coNP-hard if every set in coNP is Karp-reducible to S. A set is called coNP-complete if it is both in coNP and coNP-hard.
Indeed, insisting on Karp-reductions is essential for a distinction between NP-hardness and coNP-hardness. Furthermore, the class of problems that are Karp-reducible to NP equals NP (see Exercise 5.9), whereas the class of problems that are Karp-reducible to coNP equals coNP (because S is Karp-reducible to S′ if and only if {0,1}* \ S is Karp-reducible to {0,1}* \ S′). In contrast, recall that the class of problems that are Cook-reducible to NP
[Figure 5.2: a diagram of the conjectured world view, with P strictly contained in NP ∩ coNP, the NP-complete sets (NPC) lying in NP outside coNP, and the coNP-complete sets (coNPC) lying in coNP outside NP.]

Figure 5.2. The world view under P ≠ coNP ∩ NP ≠ NP.
(resp., to coNP) contains NP ∪ coNP. This class, commonly denoted P^NP, is further discussed in Exercise 5.13.
Exercises

Exercise 5.1 (a quiz)
1. What are promise problems?
2. What is the justification for ignoring the promise (in a promise problem) whenever it is polynomial-time recognizable?
3. What is a candid search problem?
4. Could the P-vs-NP Question boil down to determining the time complexity of a single (known) algorithm?
5. What is the class coNP?
6. How does NP relate to the class of decision problems that are Cook-reducible to NP?
7. How does NP relate to the class of decision problems that are Karp-reducible to NP?

Exercise 5.2 Let R ∈ PC and suppose that A solves the candid search problem of R in time complexity T_A^{S_R}. Prove that if the mapping n → T_A^{S_R}(n) can be computed in poly(n)-time, then the standard search problem of R (as well as the decision problem S_R) can be solved in time T_{A′}(n) = Õ(T_A^{S_R}(n)) + poly(n), where Õ(t) = poly(log t) · t.

Guideline: Consider an algorithm A′ that on input x first computes t ← T_A^{S_R}(|x|), and then emulates the execution of A(x) for at most t steps. (The poly-logarithmic factor is due to the overhead of this emulation.) If A(x) halts
with output y, then A′ checks whether (x, y) ∈ R, and outputs y if the answer is positive (and ⊥ otherwise).

Exercise 5.3 (Cook-reductions preserve efficient solvability of promise problems) Prove that if the promise problem Π is Cook-reducible to a promise problem Π′ that is solvable in polynomial time, then Π is solvable in polynomial time. Note that the solver of Π′ may not halt on inputs that violate the promise.

Guideline: Use the fact that any polynomial-time algorithm that solves any promise problem can be modified such that it halts on all inputs (in polynomial time).

Exercise 5.4 (NP-complete promise problems in coNP (following [9])) Consider the promise problem xSAT, having instances that are pairs of CNF formulae. The yes-instances consist of pairs (φ1, φ2) such that φ1 is satisfiable and φ2 is unsatisfiable, whereas the no-instances consist of pairs such that φ1 is unsatisfiable and φ2 is satisfiable.

1. Show that xSAT is in the intersection of (the promise problem classes that are analogous to) NP and coNP.
2. Prove that any promise problem in NP is Cook-reducible to xSAT. In designing the reduction, recall that queries that violate the promise may be answered arbitrarily.

Guideline: Note that the promise problem version of NP is reducible to SAT, and show a reduction of SAT to xSAT. Specifically, show that the search problem associated with SAT is Cook-reducible to xSAT, by adapting the ideas of the proof of Proposition 3.7. That is, suppose that we know (or assume) that τ is a prefix of a satisfying assignment to φ, and we wish to extend τ by one bit. Then, for each σ ∈ {0, 1}, we construct a formula, denoted φσ, by setting the first |τ| + 1 variables of φ according to the values τσ. We query the oracle about the pair (φ1, φ0), and extend τ accordingly (i.e., we extend τ by the value 1 if and only if the answer is positive).
Note that if both φ1 and φ0 are satisfiable then it does not matter which bit we use in the extension, whereas if exactly one formula is satisfiable then the oracle answer is reliable.

3. Pinpoint the source of failure of the proof of Theorem 5.7 when applied to the reduction provided in the previous item.

Exercise 5.5 Note that Theorem 5.5 holds for any search problem in NP, and not only for NP-complete search problems. Compare the result of Theorem 5.5 to what would have followed from a corresponding result that only asserts optimal algorithms for all NP-complete search problems. Ditto with respect
to a corresponding result that only asserts an optimal algorithm for some NP-complete search problem.

Guideline: Note that we refer to a strong notion of optimality, which may not be preserved by Levin-reductions.

Exercise 5.6 Generalizing the proof of Theorem 5.5, consider the possibility of running the i-th machine at rate ρ(i) rather than at rate 1/(i+1)^2, where ρ satisfies Σ_{i≥1} ρ(i) ≤ 1. Prove that, for any “reasonable” choice of ρ (e.g., ρ(i) = 1/O(i · (log_2(i+1))^2) or ρ(i) = 2^{−i}), the result of Theorem 5.5 remains intact, although the constant hidden in the O-notation is affected. What should be required of a “reasonable” choice of ρ?

Guideline: Note that our choice of ρ(i) = 1/(i+1)^2 was quite good, although ρ(i) = 1/O(i · (log_2(i+1))^2) is better.

Exercise 5.7 For any class C, prove that C ⊆ coC if and only if C = coC.

Exercise 5.8 Prove that S1 is Karp-reducible to S2 if and only if {0,1}* \ S1 is Karp-reducible to {0,1}* \ S2.

Exercise 5.9 Prove that a set S is Karp-reducible to some set in NP if and only if S is in NP.

Guideline: For the non-trivial direction, see the proof of Proposition 5.6.

Exercise 5.10 Recall that the empty set is not Karp-reducible to {0,1}*, whereas any set is Cook-reducible to its complement. Thus, our focus here is on the Karp-reducibility of non-trivial sets to their complements, where a set is non-trivial if it is neither empty nor contains all strings. Furthermore, since any non-trivial set in P is Karp-reducible to its complement (see Exercise 3.4), we assume that P ≠ NP and focus on sets in NP \ P.

1. Prove that NP = coNP implies that some sets in NP \ P are Karp-reducible to their complements.
2. Prove that NP ≠ coNP implies that some sets in NP \ P are not Karp-reducible to their complements.

Guideline: Use NP-complete sets in both parts, and Exercise 5.9 in the second part.

Exercise 5.11 (TAUT is coNP-complete) Prove that the following problem, denoted TAUT, is coNP-complete (even when the formulae are restricted
to 3DNF). An instance of the problem consists of a DNF formula, and the problem is to determine whether this formula is a tautology (i.e., a formula that evaluates to true under every possible truth assignment).

Guideline: Reduce from the complement of SAT, using the fact that φ is unsatisfiable if and only if ¬φ is a tautology.[11]

Exercise 5.12 (the class NP ∩ coNP) Prove that a set S is in NP ∩ coNP if and only if the set S′ def= {(x, χ_S(x)) : x ∈ {0,1}*} is in NP, where χ_S : {0,1}* → {0,1} is the characteristic function of S (i.e., χ_S(x) = 1 if and only if x ∈ S).

Guideline: An NP-proof system for S′ can be obtained by combining NP-proof systems for S and S̄, whereas NP-proof systems for S and S̄ can be derived from any NP-proof system for S′.

Exercise 5.13 (the class P^NP) Recall that P^NP denotes the class of problems that are Cook-reducible to NP. Prove the following (simple) facts.
1. For every class C, the class of problems that are Cook-reducible to C equals the class of problems that are Cook-reducible to coC. In particular, P^NP equals the class of problems that are Cook-reducible to coNP.
2. The class P^NP is closed under complementation (i.e., P^NP = co(P^NP)).
Note that each of the foregoing items implies that P^NP contains NP ∪ coNP.

Exercise 5.14 Assuming that NP ≠ coNP, prove that the problem of finding a maximum clique (resp., independent set) in a given graph is not in PC. Prove the same for the following problems:
- Finding a minimum vertex cover in a given graph.
- Finding an assignment that satisfies the maximum number of equations in a given system of linear equations over GF(2) (cf. Exercise 4.9).
We stress that maximum and minimum refer to the optimum taken over all legitimate solutions, whereas the terms maximal and minimal refer to a “local optimum” (i.e., optimal with respect to augmenting the current solution or omitting elements from it, respectively).
Guideline: Note that the set of pairs (G, K) such that the graph G contains no clique of size K is coNP-complete.
[11] Note that, given a CNF formula φ, we can easily obtain a DNF formula for ¬φ (by applying De Morgan's Law).
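The De Morgan step in footnote 11 is purely mechanical when formulae are represented as clause lists with signed-integer literals (a DIMACS-style convention we adopt here for illustration): negating a CNF yields a DNF whose terms are the original clauses with every literal flipped. A small sketch, with a brute-force tautology test that is exponential in n and meant only for tiny examples:

```python
from itertools import product

# CNF: list of clauses; DNF: list of terms. A literal is a nonzero int,
# negative meaning negated. By De Morgan's Law,
# not(AND of ORs) = OR of ANDs with every literal flipped.
def negate_cnf_to_dnf(cnf):
    return [[-lit for lit in clause] for clause in cnf]

def dnf_is_tautology(dnf, n):
    """Brute force over all 2^n assignments (for illustration only)."""
    def term_true(term, a):
        return all((lit > 0) == a[abs(lit) - 1] for lit in term)
    return all(any(term_true(t, a) for t in dnf)
               for a in product([False, True], repeat=n))

# phi = (x1) and (not x1) is unsatisfiable, so its negation is a tautology:
print(dnf_is_tautology(negate_cnf_to_dnf([[1], [-1]]), 1))          # True
# phi = (x1 or x2) and (not x1 or not x2) is satisfiable:
print(dnf_is_tautology(negate_cnf_to_dnf([[1, 2], [-1, -2]]), 2))   # False
```

Note that the negation itself is linear-time; only the tautology test is expensive, which is exactly the point of Exercise 5.11.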
Exercise 5.15 (the class P/poly, revisited) In continuation of Exercise 1.16, prove that P/poly equals the class of sets that are Cook-reducible to a sparse set, where a set S is called sparse if there exists a polynomial p such that for every n it holds that |S ∩ {0,1}^n| ≤ p(n).

Guideline: For any set in P/poly, encode the advice sequence (a_n)_{n∈N} as a sparse set {(1^n, i, σ_{n,i}) : n ∈ N, i ≤ |a_n|}, where σ_{n,i} is the i-th bit of a_n. For the opposite direction, note that the emulation of a Cook-reduction to a set S, on input x, only requires knowledge of S ∩ ∪_{i=1}^{poly(|x|)} {0,1}^i.

Exercise 5.16 In continuation of Exercise 5.15, we consider the class of sets that are Karp-reducible to a sparse set. It can be proved that this class contains SAT if and only if P = NP (see [23]).[12] Here, we only consider the special case in which the sparse set is contained in a polynomial-time decidable set that is itself sparse (e.g., the latter set may be {1}*, in which case the former set may be an arbitrary unary set). Actually, the aim of this exercise is to establish the following (seemingly stronger) claim:[13] If SAT is Karp-reducible to a set S ⊆ G such that G ∈ P and G \ S is sparse, then SAT ∈ P.

Using the hypothesis, we outline a polynomial-time procedure for solving the search problem of SAT, and leave the task of providing the details as an exercise. The procedure (looking for a satisfying assignment) conducts a DFS on the tree of all possible partial truth assignments to the input formula,[14] while truncating the search at nodes that correspond to partial truth assignments that were already demonstrated to be useless (i.e., correspond to a partial truth assignment that cannot be completed to a satisfying assignment).
Guideline: The key observation is that each internal node (which yields a formula derived from the initial formula by instantiating the corresponding partial truth assignment) is mapped by the Karp-reduction either to a string not in G (in which case we conclude that the subtree contains no satisfying assignments and backtrack from this node) or to a string in G. In the latter case, unless we already know that this string is not in S, we start a scan of the subtree rooted at this node. However, once we backtrack from this internal node, we know that the corresponding member of G is not in S, and we will never
[12] An alternative presentation is available from the book's Web site.
[13] This claim is seemingly stronger because G itself is not assumed to be sparse.
[14] For an n-variable formula, the leaves of the tree correspond to all possible n-bit-long strings, and an internal node corresponding to τ is the parent of the nodes corresponding to τ0 and τ1.
scan again a subtree rooted at a node that is mapped to this string (which was detected to be in G \ S). Also note that once we reach a leaf, we can check by ourselves whether or not it corresponds to a satisfying assignment to the initial formula. When analyzing the foregoing procedure, prove that when given an n-variable formula φ as input, the number of times we start to scan a subtree is at most n · |∪_{i=1}^{poly(|φ|)} {0,1}^i ∩ (G \ S)|.
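The outlined DFS can be sketched as follows. This is a toy rendition under assumptions not in the text: formulae are clause lists with signed-integer literals, and the Karp-reduction f and the decider for G are parameters. Since no such sparse set is actually known for SAT, the demonstration at the end fakes them (f is a canonical encoding of the residual formula and in_G always accepts), which degrades the procedure to a plain DFS that memoizes residual formulae demonstrated to be useless.

```python
def simplify(cnf, tau):
    """Instantiate the first len(tau) variables; return the residual CNF,
    or None if some clause is already falsified ([] means all satisfied)."""
    residual = []
    for clause in cnf:
        new_clause, satisfied = [], False
        for lit in clause:
            v = abs(lit)
            if v <= len(tau):
                if (lit > 0) == tau[v - 1]:
                    satisfied = True
                    break
            else:
                new_clause.append(lit)
        if satisfied:
            continue
        if not new_clause:
            return None
        residual.append(new_clause)
    return residual

def sat_search(cnf, n, f, in_G):
    """DFS over partial assignments, pruned via the (hypothetical)
    Karp-reduction f of SAT to S (a subset of G) and the decider in_G."""
    useless = set()                       # strings detected to be in G \ S
    def dfs(tau):
        if len(tau) == n:
            return tau                    # validity was checked on the way down
        for b in (False, True):
            child = tau + (b,)
            residual = simplify(cnf, child)
            if residual is None:
                continue                  # a clause is falsified
            key = f(residual)
            if not in_G(key) or key in useless:
                continue                  # subtree provably has no solution
            result = dfs(child)
            if result is not None:
                return result
            useless.add(key)              # backtracked: key is in G \ S
        return None
    return dfs(())

# Fake f and in_G for a demonstration only:
f = lambda residual: str(sorted(map(tuple, residual)))
in_G = lambda key: True
print(sat_search([[1, 2], [-1, 2], [-2, 3]], 3, f, in_G))  # (False, True, True)
```

The `useless` set implements the exercise's bookkeeping: each backtrack records one member of G \ S, and sparseness of G \ S is what bounds the number of subtree scans.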
Historical Notes
The following brief account decouples the development of the theory of computation (which was the focus of Chapter 1) from the emergence of the P-vs-NP Question and the theory of NP-completeness (studied in Chapters 2–5).
On Computation and Efficient Computation

The interested reader may find numerous historical accounts of the developments that led to the emergence of the theory of computation. The following brief account is different from most of these historical accounts in that its perspective is the one of the current research in computer science.

The theory of uniform computational devices emerged in the work of Turing [32]. This work put forward a natural model of computation, based on concrete machines (indeed, Turing machines), which has been instrumental for subsequent studies. In particular, this model provides a convenient stage for the introduction of natural complexity measures referring to computational tasks.

The notion of a Turing machine was put forward by Turing with the explicit intention of providing a general formulation of the notion of computability [32]. The original motivation was to provide a formalization of Hilbert's challenge (posed in 1900 and known as Hilbert's Tenth Problem), which called for designing a method for determining the solvability of Diophantine equations. Indeed, this challenge referred to a specific decision problem (later called the Entscheidungsproblem (German for the Decision Problem)), but Hilbert did not provide a formulation of the notion of “(a method for) solving a decision problem.” (We mention that in 1970, the Entscheidungsproblem was proved to be undecidable (see [24]).)

In addition to introducing the Turing machine model and arguing that it corresponds to the intuitive notion of computability, Turing's paper [32] introduces
universal machines, and contains proofs of undecidability (e.g., of the Halting Problem). (Rice's Theorem (Theorem 1.6) is proven in [27], and the undecidability of the Post Correspondence Problem (Theorem 1.7) is proven in [26].) The Church–Turing Thesis is attributed to the works of Church [4] and Turing [32]. In both works, this thesis is invoked for claiming that the fact that some problem cannot be solved in a specific model of computation implies that this problem cannot be solved in any “reasonable” model of computation. The RAM model is attributed to von Neumann's report [33]. The association of efficient computation with polynomial-time algorithms is attributed to the papers of Cobham [5] and Edmonds [7]. It is interesting to note that Cobham's starting point was his desire to present a philosophically sound concept of efficient algorithms, whereas Edmonds's starting point was his desire to articulate why certain algorithms are “good” in practice.

The theory of non-uniform computational devices emerged in the work of Shannon [29], which introduced and initiated the study of Boolean circuits. The formulation of machines that take advice (as well as the equivalence to the circuit model) originates in [18].
On NP and NP-Completeness

Many sources provide historical accounts of the developments that led to the formulation of the P-vs-NP Problem and to the discovery of the theory of NP-completeness (see, e.g., [11, Sec. 1.5] and [31]). Still, we feel that we should not refrain from offering our own impressions, which are based on the texts of the original papers.

Nowadays, the theory of NP-completeness is commonly attributed to Cook [6], Karp [17], and Levin [21]. It seems that Cook's starting point was his interest in theorem-proving procedures for propositional calculus [6, p. 151]. Trying to provide evidence for the difficulty of deciding whether or not a given formula is a tautology, he identified NP as a class containing “many apparently difficult problems” (cf., e.g., [6, p. 151]), and showed that any problem in NP is reducible to deciding membership in the set of 3DNF tautologies. In particular, Cook emphasized the importance of the concept of polynomial-time reductions and the complexity class NP (both explicitly defined for the first time in his paper). He also showed that CLIQUE is computationally equivalent to SAT, and envisioned a class of problems of the same nature.

Karp's paper [17] can be viewed as fulfilling Cook's prophecy: Stimulated by Cook's work, Karp demonstrated that a “large number of classic difficult
computational problems, arising in fields such as mathematical programming, graph theory, combinatorics, computational logic and switching theory, are [NP-]complete (and thus equivalent)” [17, p. 86]. Specifically, his list of twenty-one NP-complete problems includes Integer Linear Programming, Hamilton Circuit, Chromatic Number, Exact Set Cover, Steiner Tree, Knapsack, Job Scheduling, and Max Cut. Interestingly, Karp defined NP in terms of verification procedures (i.e., Definition 2.5), pointed to its relation to “backtrack search of polynomial bounded depth” [17, p. 86], and viewed NP as the residence of a “wide range of important computational problems” (which seem not to be in P).

Independently of these developments, while being in the USSR, Levin proved the existence of “universal search problems” (where universality meant NP-completeness). The starting point of Levin's work [21] was his interest in the “perebor” conjecture, asserting the inherent need for brute force in some search problems that have efficiently checkable solutions (i.e., problems in PC). Levin emphasized the implication of polynomial-time reductions on the relation between the time complexity of the related problems (for any growth rate of the time complexity), asserted the NP-completeness of six “classical search problems,” and claimed that the underlying method “provides a means for readily obtaining” similar results for “many other important search problems.”

It is interesting to note that although the works of Cook [6], Karp [17], and Levin [21] were received with different levels of enthusiasm, none of their contemporaries realized the depth of the discovery and the difficulty of the question posed (i.e., the P-vs-NP Question).
This fact is evident in every account from the early 1970s, and may explain the frustration of the corresponding generation of researchers over the failure to resolve the P-vs-NP Question, which they expected to be resolved in their lifetime (if not in a matter of a few years). Needless to say, the author's opinion is that there was absolutely no justification for these expectations, and that one should have actually expected quite the opposite.

We mention that the three “founding papers” of the theory of NP-completeness (i.e., Cook [6], Karp [17], and Levin [21]) use the three different types of reductions used in this book. Specifically, Cook uses the general notion of polynomial-time reduction [6], often referred to as Cook-reductions (Definition 3.1). The notion of Karp-reductions (Definition 3.3) originates from Karp's paper [17], whereas its augmentation to search problems (i.e., Definition 3.4) originates from Levin's paper [21]. It is worth stressing that Levin's work is stated in terms of search problems, unlike Cook's and Karp's works, which treat decision problems.
Historical Notes
The reductions presented in Section 4.3.2 are not necessarily the original ones. Most notably, the reduction establishing the NP-hardness of the Independent Set problem (i.e., Proposition 4.10) is adapted from [10]. In contrast, the reductions presented in Section 4.3.1 are merely a reinterpretation of the original reduction as presented in [6]. The equivalence of the two definitions of NP (i.e., Theorem 2.8) was proven in [17]. The existence of NP-sets that are neither in P nor NP-complete (i.e., Theorem 4.12) was proven by Ladner [20], Theorem 5.7 was proven by Selman [28], and the existence of optimal search algorithms for NP-relations (i.e., Theorem 5.5) was proven by Levin [21]. (Interestingly, the latter result was proven in the same paper in which Levin presented the discovery of NP-completeness, independently of Cook and Karp.) Promise problems were explicitly introduced by Even, Selman, and Yacobi [9]; see [12] for a survey of their numerous applications. A more detailed description of probabilistic proof systems, including proper credits for the results mentioned in Section 4.3.5, can be found in [13, Chap. 9].
Epilogue: A Brief Overview of Complexity Theory
Out of the tough came forth sweetness.1
Judges, 14:14

The following brief overview is intended to give a flavor of the questions addressed by Complexity Theory. It includes a brief review of the contents of the current book, as well as a brief overview of several more advanced topics. The latter overview is quite vague, and is merely meant as a teaser toward further study (cf., e.g., [13]).
Absolute Goals and Relative Results. Complexity Theory is concerned with the study of the intrinsic complexity of computational tasks. Its “final” goals include the determination of the complexity of any well-defined task. Additional goals include obtaining an understanding of the relations between various computational phenomena (e.g., relating one fact regarding Computational Complexity to another). Indeed, we may say that the former type of goals is concerned with absolute answers regarding specific computational phenomena, whereas the latter type is concerned with questions regarding the relation between computational phenomena. Interestingly, so far Complexity Theory has been more successful in coping with goals of the latter (“relative”) type. In fact, the failure to resolve questions of the “absolute” type led to the flourishing of methods for coping with questions of the “relative” type. Musing for a moment, let us say that, in general, the difficulty of obtaining absolute answers may naturally lead to a search for conditional answers, which may in turn reveal interesting relations between
1. The quotation is commonly interpreted as meaning that benefit arose out of misfortune.
phenomena. Furthermore, the lack of absolute understanding of individual phenomena seems to facilitate the development of methods for relating different phenomena. Anyhow, this is what happened in Complexity Theory. Putting aside for a moment the frustration caused by the failure to obtain absolute answers, we must admit that there is something fascinating in the success of relating different phenomena: In some sense, relations between phenomena are more revealing than absolute statements about individual phenomena. Indeed, the first example that comes to mind is the theory of NP-completeness. Let us consider this theory for a moment, from the perspective of these two types of goals.
P, NP, and NP-completeness. Complexity Theory has failed to determine the intrinsic complexity of tasks such as finding a satisfying assignment to a given (satisfiable) propositional formula or finding a 3-coloring of a given (3-colorable) graph. But it has succeeded in establishing that these two seemingly different computational tasks are in some sense the same (or, more precisely, are computationally equivalent). We find this success amazing and exciting, and hope that the reader shares these feelings. The same feeling of wonder and excitement is generated by many of the other discoveries of Complexity Theory. Indeed, the reader is invited to join a fast tour of some of the other questions and answers that make up the field of Complexity Theory. We will start with the P versus NP Question (and, indeed, briefly review the contents of Chapter 2). Our daily experience is that it is harder to solve a problem than it is to check the correctness of a solution (e.g., think of either a puzzle or a research problem). Is this experience merely a coincidence, or does it represent a fundamental fact of life (i.e., a property of the world)? Could you imagine a world in which solving any problem is not significantly harder than checking a solution to it? Would the term “solving a problem” not lose its meaning in such a hypothetical (and impossible, in our opinion) world? The denial of the plausibility of such a hypothetical world (in which “solving” is not harder than “checking”) is what “P different from NP” actually means, where P represents tasks that are efficiently solvable and NP represents tasks for which solutions can be efficiently checked. The mathematically (or theoretically) inclined reader may also consider the task of proving theorems versus the task of verifying the validity of proofs. Indeed, finding proofs is a special type of the aforementioned task of “solving a problem” (and verifying the validity of proofs is a corresponding case of
checking correctness). Again, “P different from NP” means that there are theorems that are harder to prove than to be convinced of their correctness when presented with a proof. This means that the notion of a “proof” is meaningful; that is, proofs do help when seeking to be convinced of the correctness of assertions. Here, NP represents sets of assertions that can be efficiently verified with the help of adequate proofs, and P represents sets of assertions that can be efficiently verified from scratch (i.e., without proofs). In light of the foregoing discussion, it is clear that the P versus NP Question is a fundamental scientific question of far-reaching consequences. The fact that this question seems beyond our current reach led to the development of the theory of NP-completeness. Loosely speaking, this theory (presented in Chapter 4) identifies a set of computational problems that are as hard as NP. That is, the fate of the P versus NP Question lies with each of these problems: If any of these problems is easy to solve, then so are all problems in NP. Thus, showing that a problem is NP-complete provides evidence of its intractability (assuming, of course, “P different from NP”). Indeed, demonstrating the NP-completeness of computational tasks is a central tool in indicating the hardness of natural computational problems, and it has been used extensively both in computer science and in other disciplines. We note that NP-completeness indicates not only the conjectured intractability of a problem but also its “richness,” in the sense that the problem is rich enough to “encode” any other problem in NP. The use of the term “encoding” is justified by the exact meaning of NP-completeness, which in turn establishes relations between different computational problems (without referring to their “absolute” complexity).
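To make the contrast between solving and checking concrete, consider the 3-coloring task mentioned above: checking a proposed coloring takes time linear in the size of the graph, whereas no efficient method is known for finding one. The following sketch is our own illustration (the edge-list encoding and function name are hypothetical choices, not from the text); it is exactly the kind of efficient verifier whose existence places a problem in NP.

```python
def is_valid_3_coloring(edges, coloring):
    """Return True iff `coloring` maps every vertex to one of three colors
    and no edge joins two vertices of the same color."""
    return all(coloring[u] in (0, 1, 2) and coloring[v] in (0, 1, 2)
               and coloring[u] != coloring[v]
               for u, v in edges)

# A 5-cycle is 3-colorable (but not 2-colorable).
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(is_valid_3_coloring(cycle5, {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}))  # True
print(is_valid_3_coloring(cycle5, {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}))  # False: edge (4, 0)
```

The check runs over each edge once; finding a valid coloring, in contrast, is not known to be possible without essentially searching the exponentially many candidate colorings.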
Some Advanced Topics. The foregoing discussion of NP-completeness hints at the importance of representation, since it referred to different problems that encode one another. Indeed, the importance of representation is a central aspect of Complexity Theory. In general, Complexity Theory is concerned with problems for which the solutions are implicit in the problem’s statement (or rather in the instance). That is, the problem (or rather its instance) contains all necessary information, and one merely needs to process this information in order to supply the answer.2 Thus, Complexity Theory is concerned with manipulation of information, and
2. In contrast, in other disciplines, solving a problem may require gathering information that is not available in the problem’s statement. This information may either be available from auxiliary (past) records or be obtained by conducting new experiments.
with its transformation from one representation (in which the information is given) to another representation (which is the one desired). Indeed, a solution to a computational problem is merely a different representation of the information given, that is, a representation in which the answer is explicit rather than implicit. For example, the answer to the question of whether or not a given Boolean formula is satisfiable is implicit in the formula itself (but the task is to make the answer explicit). Thus, Complexity Theory clarifies a central issue regarding representation, that is, the distinction between what is explicit and what is implicit in a representation. Furthermore, it even suggests a quantification of the level of nonexplicitness. In general, Complexity Theory provides new viewpoints on various phenomena that were also considered by past thinkers. Examples include the aforementioned concepts of solutions, proofs, and representation, as well as concepts like randomness, knowledge, interaction, secrecy, and learning. We next discuss the latter concepts and the perspective offered by Complexity Theory. The concept of randomness has puzzled thinkers for ages. Their perspective can be described as ontological: They asked “what is randomness” and wondered whether it exists at all (or whether the world is deterministic). The perspective of Complexity Theory is behavioristic: It is based on defining objects as equivalent if they cannot be told apart by any efficient procedure. That is, a coin toss is (defined to be) “random” (even if one believes that the universe is deterministic) if it is infeasible to predict the coin’s outcome. Likewise, a string (or a distribution on strings) is “random” if it is infeasible to distinguish it from the uniform distribution (regardless of whether or not one can generate the latter). 
Interestingly, randomness (or rather, pseudorandomness) defined this way is efficiently expandable; that is, under a reasonable complexity assumption (to be discussed next), short pseudorandom strings can be deterministically expanded into long pseudorandom strings. Indeed, it turns out that randomness is intimately related to intractability. Firstly, note that the very definition of pseudorandomness refers to intractability (i.e., the infeasibility of distinguishing a pseudorandom object from a uniformly distributed object). Secondly, as stated, a complexity assumption, which refers to the existence of functions that are easy to evaluate but hard to invert (called one-way functions), implies the existence of deterministic programs (called pseudorandom generators) that stretch short random seeds into long pseudorandom sequences. In fact, it turns out that the existence of pseudorandom generators is equivalent to the existence of one-way functions. Complexity Theory offers its own perspective on the concept of knowledge (and distinguishes it from information). Specifically, Complexity Theory views
knowledge as the result of a hard computation. Thus, whatever can be efficiently done by anyone is not considered knowledge. In particular, the result of an easy computation applied to publicly available information is not considered knowledge. In contrast, the value of a hard-to-compute function applied to publicly available information is knowledge, and if somebody provides you with such a value, then that person has provided you with knowledge. This discussion is related to the notion of zero-knowledge interactions, which are interactions in which no knowledge is gained. Such interactions may still be useful, because they may convince a party of the correctness of specific data that was provided beforehand. For example, a zero-knowledge interactive proof may convince a party that a given graph is 3-colorable without yielding any 3-coloring. The foregoing paragraph has explicitly referred to interaction, viewing it as a vehicle for gaining knowledge and/or gaining confidence. Let us highlight the latter application by noting that it may be easier to verify an assertion when one is allowed to interact with a prover rather than when reading a proof. Put differently, interaction with a good teacher may be more beneficial than reading any book. We comment that the added power of such interactive proofs is rooted in their being randomized (i.e., the verification procedure is randomized), because if the verifier’s questions can be determined beforehand, then the prover may just provide the transcript of the interaction as a traditional written proof. Another concept related to knowledge is that of secrecy: Knowledge is something that one party may have but another party does not have (and cannot feasibly obtain by itself) – thus, in some sense knowledge is a secret. In general, Complexity Theory is related to cryptography, where the latter is broadly defined as the study of systems that are easy to use but hard to abuse.
Typically, such systems involve secrets, randomness, and interaction, as well as a complexity gap between the ease of proper usage and the infeasibility of causing the system to deviate from its prescribed behavior. Thus, much of cryptography is based on complexity-theoretic assumptions, and its results are typically transformations of relatively simple computational primitives (e.g., one-way functions) into more complex cryptographic applications (e.g., secure encryption schemes). We have already mentioned the concept of learning when referring to learning from a teacher versus learning from a book. Recall that Complexity Theory provides evidence of the advantage of the former. This is in the context of gaining knowledge about publicly available information. In contrast, computational learning theory is concerned with learning objects that are only partially available to the learner (i.e., reconstructing a function based on its value at a few
random locations or even at locations chosen by the learner). Still, Complexity Theory sheds light on the intrinsic limitations of learning (in this sense). Complexity Theory deals with a variety of computational tasks. We have already mentioned two fundamental types of tasks: searching for solutions (or, rather, “finding solutions”) and making decisions (e.g., regarding the validity of assertions). We have also hinted that in some cases these two types of tasks can be related. Now we consider two additional types of tasks: counting the number of solutions and generating random solutions. Clearly, both of the latter tasks are at least as hard as finding arbitrary solutions to the corresponding problem, but it turns out that for some natural problems they are not significantly harder. Specifically, under some natural conditions on the problem, approximately counting the number of solutions and generating an approximately random solution are not significantly harder than finding an arbitrary solution. Having mentioned the notion of approximation, we note that the study of the complexity of finding “approximate solutions” is also of natural importance. One type of approximation problem refers to an objective function defined on the set of potential solutions: Rather than finding a solution that attains the optimal value, the approximation task consists of finding a solution that attains an “almost optimal” value, where the notion of “almost optimal” may be understood in different ways, giving rise to different levels of approximation. Interestingly, in many cases, even a very relaxed level of approximation is as difficult to obtain as solving the original (exact) search problem (i.e., finding an approximate solution is as hard as finding an optimal solution). Surprisingly, these hardness-of-approximation results are related to the study of probabilistically checkable proofs, which are proofs that allow for ultra-fast probabilistic verification.
Amazingly, every proof can be efficiently transformed into one that allows for probabilistic verification based on probing a constant number of bits (in the alleged proof). Turning back to approximation problems, we mention that in other cases, a reasonable level of approximation is easier to achieve than solving the original (exact) search problem. Approximation is a natural relaxation of various computational problems. Another natural relaxation is the study of average-case complexity, where the “average” is taken over some “simple” distributions (representing a model of the problem’s instances that may occur in practice). We stress that although it was not stated explicitly, the entire discussion so far has referred to “worst-case” analysis of algorithms. We mention that worst-case complexity is a more robust notion than average-case complexity. For starters, one avoids the controversial question of characterizing the instances that are “important in practice” and, correspondingly, the selection of the class of distributions for which average-case analysis is to be conducted. Nevertheless, a relatively robust theory of
average-case complexity has been suggested, although it is less developed than the theory of worst-case complexity. In view of the central role of randomness in Complexity Theory (as evident, say, in the study of pseudorandomness, probabilistic proof systems, and cryptography), one may wonder whether the randomness needed for the various applications can be obtained in real life. One specific question, which has received a lot of attention, is the possibility of “purifying” randomness (or “extracting good randomness from bad sources”). That is, can we use “defective” sources of randomness in order to implement almost perfect sources of randomness? The answer depends, of course, on the model of such defective sources. This study turned out to be related to Complexity Theory, where the tightest connection is between some types of randomness extractors and some types of pseudorandom generators. So far we have focused on the time complexity of computational tasks, while relying on the natural association of efficiency with time. However, time is not the only resource one should care about. Another important resource is space: the amount of (temporary) memory consumed by the computation. The study of space complexity has uncovered several fascinating phenomena, which seem to indicate a fundamental difference between space complexity and time complexity. For example, in the context of space complexity, verifying proofs of validity of assertions (of any specific type) has the same complexity as verifying proofs of invalidity for the same type of assertions. In case the reader feels dizzy, it is no wonder. We took an ultra-fast air tour of some mountaintops, and dizziness is to be expected. For a totally different touring experience, we refer the interested reader to the author’s book [13], which offers a climb of the aforementioned mountains on foot, while stopping often for appreciation of the view and reflection. Absolute Results (also Known as Lower Bounds).
As stated in the beginning of this epilogue, absolute results are not known for many of the “big questions” of Complexity Theory (most notably the P versus NP Question). However, several highly nontrivial absolute results have been proved. For example, it was shown that using negation can speed up the computation of monotone functions (which do not require negation for their mere computation). In addition, many promising techniques were introduced and employed with the aim of providing a low-level analysis of the progress of computation. However, as stated up front, the focus of this epilogue was elsewhere.
Appendix: Some Computational Problems
Although we view specific (natural) computational problems as secondary to (natural) complexity classes, we do use the former for clarification and illustration of the latter. This appendix provides definitions of such computational problems, grouped according to the type of objects to which they refer (i.e., graphs and Boolean formulae). We start by addressing the central issue of the representation of the various objects that are referred to in the aforementioned computational problems. The general principle is that elements of all sets are “compactly” represented as binary strings (without much redundancy). For example, the elements of a finite set S (e.g., the set of vertices in a graph or the set of variables appearing in a Boolean formula) will be represented as binary strings of length log2 |S|.
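As a toy illustration of this compactness principle (our own sketch, with hypothetical helper names), the following computes how many bits such a representation needs for a set of a given size, and assigns each element its binary name:

```python
import math

def bits_needed(set_size):
    """Number of bits needed to give each of `set_size` elements a
    distinct binary name (at least one bit, even for a singleton)."""
    return max(1, math.ceil(math.log2(set_size)))

def encode(index, set_size):
    """The fixed-length binary name of the element with the given index."""
    return format(index, '0{}b'.format(bits_needed(set_size)))

print(bits_needed(1000))  # 10 bits suffice for a set of 1000 elements
print(encode(5, 8))       # '101'
```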
A.1 Graphs

Graph theory has long become recognized as one of the more useful mathematical subjects for the computer science student to master. The approach which is natural in computer science is the algorithmic one; our interest is not so much in existence proofs or enumeration techniques, as it is in finding efficient algorithms for solving relevant problems, or alternatively showing evidence that no such algorithms exist. Although algorithmic graph theory was started by Euler, if not earlier, its development in the last ten years has been dramatic and revolutionary.
Shimon Even, Graph Algorithms [8]

A simple graph G = (V, E) consists of a finite set of vertices V and a finite set of edges E, where each edge is an unordered pair of vertices; that is, E ⊆ {{u, v} : u, v ∈ V ∧ u ≠ v}. This formalism does not allow self-loops and parallel edges, which are allowed in general (i.e., non-simple) graphs, where E is a multi-set that may contain (in addition to two-element subsets of V also) singletons (i.e., self-loops). The vertex u is called an end point of the edge {u, v}, and the edge {u, v} is said to be incident at v. In such a case, we say that u and v are adjacent in the graph, and that u is a neighbor of v. The degree of a vertex in G is defined as the number of edges that are incident at this vertex.
We will consider various substructures of graphs, the simplest one being paths. A path in a graph G = (V, E) is a sequence of vertices (v_0, . . . , v_ℓ) such that for every i ∈ [ℓ] = {1, . . . , ℓ} it holds that v_{i−1} and v_i are adjacent in G. Such a path is said to have length ℓ. A simple path is a path in which each vertex appears at most once, which implies that the longest possible simple path in G has length |V| − 1. The graph is called connected if there exists a path between each pair of vertices in it. A cycle is a path in which the last vertex equals the first one (i.e., v_ℓ = v_0). The cycle (v_0, . . . , v_ℓ) is called simple if ℓ > 2 and |{v_0, . . . , v_ℓ}| = ℓ (i.e., if v_i = v_j then i ≡ j (mod ℓ), and the cycle (u, v, u) is not considered simple). A graph is called acyclic (or a forest) if it has no simple cycles, and if it is also connected, then it is called a tree. Note that G = (V, E) is a tree if and only if it is connected and |E| = |V| − 1, and that there is a unique simple path between each pair of vertices in a tree. A subgraph of the graph G = (V, E) is any graph G′ = (V′, E′) satisfying V′ ⊆ V and E′ ⊆ E. Note that a simple cycle in G is a connected subgraph of G in which each vertex has degree exactly two. An induced subgraph of the graph G = (V, E) is any subgraph G′ = (V′, E′) that contains all the edges of E that are contained in V′ (i.e., E′ = {{u, v} ∈ E : u, v ∈ V′}). In such a case, we say that G′ is the subgraph induced by V′.
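The characterization of trees stated above (connected with |E| = |V| − 1) translates directly into an efficient test. The following sketch is our own helper (the vertex-set/edge-list encoding is an assumption, not from the text); it checks the edge count and then uses breadth-first search for connectivity:

```python
from collections import deque

def is_tree(vertices, edges):
    """Return True iff the graph (vertices, edges) is a tree, using the
    characterization: connected and |E| = |V| - 1."""
    if len(edges) != len(vertices) - 1:
        return False
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS from an arbitrary vertex to test connectivity.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)

print(is_tree({1, 2, 3, 4}, [(1, 2), (1, 3), (3, 4)]))  # True
print(is_tree({1, 2, 3, 4}, [(1, 2), (2, 3), (1, 3)]))  # False: a cycle, and 4 is isolated
```

Note that once |E| = |V| − 1 holds, connectivity and acyclicity coincide, so a single BFS suffices.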
Directed Graphs. We will also consider (simple) directed graphs (also known as digraphs), where edges are ordered pairs of vertices.1 In this case, the set of edges is a subset of V × V \ {(v, v) : v ∈ V}, and the edges (u, v) and (v, u) are called anti-parallel.
General (i.e., non-simple) directed graphs are defined analogously. The edge (u, v) is viewed as going from u to v, and thus is called an outgoing edge of u (resp., incoming edge of v). The out-degree (resp., in-degree) of a vertex is the number of its outgoing edges (resp., incoming edges). Directed paths and the related objects are defined analogously; for example, v_0, . . . , v_ℓ is a directed path if for every i ∈ [ℓ] it holds that (v_{i−1}, v_i) is a directed edge (which is directed from v_{i−1} to v_i). It is common to consider also a pair of anti-parallel edges as a simple directed cycle. A directed acyclic graph (DAG) is a digraph that has no directed cycles. Every DAG has at least one vertex having out-degree (resp., in-degree) zero, called a sink (resp., a source). A simple directed acyclic graph G = (V, E) is called an inward (resp., outward) directed tree if |E| = |V| − 1 and there exists a unique vertex, called the root, having out-degree (resp., in-degree) zero. Note that each vertex in an inward (resp., outward) directed tree can reach the root (resp., is reachable from the root) by a unique directed path.2
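The degree bookkeeping above is easy to mechanize. The following sketch (our own illustration, assuming a digraph given as a vertex set and a list of ordered pairs) tallies out-degrees and in-degrees and collects the sinks and sources:

```python
def sinks_and_sources(vertices, directed_edges):
    """Return (sinks, sources): the vertices of out-degree 0 and of
    in-degree 0, respectively."""
    outdeg = {v: 0 for v in vertices}
    indeg = {v: 0 for v in vertices}
    for u, v in directed_edges:
        outdeg[u] += 1
        indeg[v] += 1
    sinks = {v for v in vertices if outdeg[v] == 0}
    sources = {v for v in vertices if indeg[v] == 0}
    return sinks, sources

# The DAG with edges 1 -> 2, 2 -> 3, 1 -> 3 has the unique source 1
# and the unique sink 3.
print(sinks_and_sources({1, 2, 3}, [(1, 2), (2, 3), (1, 3)]))  # ({3}, {1})
```

For a DAG, both returned sets are guaranteed to be non-empty, in accordance with the fact stated above.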
Representation. Graphs are commonly represented by their adjacency matrix and/or their incidence lists. The adjacency matrix of a simple graph G = (V, E) is a |V|-by-|V|
1. Again, the term “simple” means that self-loops and parallel (directed) edges are not allowed. In contrast, anti-parallel edges are allowed.
2. Note that in any DAG, there is a directed path from each vertex v to some sink (resp., from some source to each vertex v). In an inward (resp., outward) directed tree this sink (resp., source) must be unique. The condition |E| = |V| − 1 enforces the uniqueness of these paths, because (combined with the reachability condition) it implies that the underlying graph (obtained by disregarding the orientation of the edges) is a tree.
Boolean matrix in which the (i, j)-th entry equals 1 if and only if i and j are adjacent in G. The incidence list representation of G consists of |V| sequences such that the i-th sequence is an ordered list of the set of edges incident at vertex i. (Needless to say, it is easy to transform one of these representations to the other.)
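The claim that the two representations are easily interconvertible can be made concrete. In the sketch below (our own illustration), the incidence list of vertex i is represented by its sorted list of neighbors j, one entry per edge {i, j}; both conversions run in time quadratic in |V|:

```python
def matrix_to_lists(adj_matrix):
    """Adjacency matrix (list of 0/1 rows) -> neighbor lists."""
    n = len(adj_matrix)
    return [[j for j in range(n) if adj_matrix[i][j]] for i in range(n)]

def lists_to_matrix(adj_lists):
    """Neighbor lists -> adjacency matrix."""
    n = len(adj_lists)
    matrix = [[0] * n for _ in range(n)]
    for i, neighbors in enumerate(adj_lists):
        for j in neighbors:
            matrix[i][j] = 1
    return matrix

# A triangle on vertices 0, 1, 2: converting back and forth is the identity.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(matrix_to_lists(triangle))  # [[1, 2], [0, 2], [0, 1]]
```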
Computational Problems. Simple computational problems regarding graphs include determining whether a given graph is connected (and/or acyclic) and finding shortest paths in a given graph. Another simple problem is determining whether a given graph is bipartite, where a graph G = (V, E) is bipartite (or 2-colorable) if there exists a 2-coloring of its vertices that does not assign neighboring vertices the same color. All of these problems are easily solvable by BFS. Moving to more complicated tasks that are still solvable in polynomial time, we mention the problem of finding a perfect matching (or a maximum matching) in a given graph, where a matching is a subgraph in which all vertices have degree 1, a perfect matching is a matching that contains all of the graph’s vertices, and a maximum matching is a matching of maximum cardinality (among all matchings of the said graph). Turning to seemingly hard problems, we mention that the problem of determining whether a given graph is 3-colorable3 (i.e., G3C) is NP-complete. A few additional NP-complete problems follow.

- A Hamiltonian path (resp., Hamiltonian cycle) in the graph G = (V, E) is a simple path (resp., cycle) that passes through all of the vertices of G. Such a path (resp., cycle) has length |V| − 1 (resp., |V|). The problem is to determine whether a given graph contains a Hamiltonian path (resp., cycle).
- An independent set (resp., clique) of the graph G = (V, E) is a set of vertices V′ ⊆ V such that the subgraph induced by V′ contains no edges (resp., contains all possible edges). The problem is to determine whether a given graph has an independent set (resp., a clique) of a given size. A vertex cover of the graph G = (V, E) is a set of vertices V′ ⊆ V such that each edge in E has at least one end point in V′. Note that V′ is a vertex cover of G if and only if V \ V′ is an independent set of G.

A natural computational problem, which is believed to be neither in P nor NP-complete, is the Graph Isomorphism problem.
The input consists of two graphs, G_1 = (V_1, E_1) and G_2 = (V_2, E_2), and the question is whether there exists a 1-1 and onto mapping φ : V_1 → V_2 such that {u, v} is in E_1 if and only if {φ(u), φ(v)} is in E_2. (Such a mapping is called an isomorphism.)
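Since the input graphs are finite, Graph Isomorphism can always be decided by trying all |V_1|! bijections. The brute-force sketch below is our own illustration (the pair encoding of graphs is an assumption); its exponential running time is precisely what contrasts with the absence of a known polynomial-time algorithm for this problem:

```python
from itertools import permutations

def are_isomorphic(g1, g2):
    """Brute-force isomorphism test: try every bijection between the
    vertex sets. Takes up to |V|! iterations."""
    (v1, e1), (v2, e2) = g1, g2
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    v1, v2 = sorted(v1), sorted(v2)
    e2set = {frozenset(e) for e in e2}
    for perm in permutations(v2):
        phi = dict(zip(v1, perm))  # candidate bijection phi: V1 -> V2
        if {frozenset((phi[u], phi[w])) for u, w in e1} == e2set:
            return True
    return False

path = ({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)])
star = ({1, 2, 3, 4}, [(1, 2), (1, 3), (1, 4)])
print(are_isomorphic(path, ({'a', 'b', 'c', 'd'}, [('b', 'a'), ('b', 'c'), ('c', 'd')])))  # True
print(are_isomorphic(path, star))  # False: different degree sequences
```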
A.2 Boolean Formulae

In §1.4.3.1, Boolean formulae are defined as a special case of Boolean circuits (cf. §1.4.1.1). Here, we take the more traditional approach and define Boolean formulae (also
3. We say that a graph G = (V, E) is 3-colorable if its vertices can be colored using three colors such that neighboring vertices are not assigned the same color.
known as propositional formulae) as structured sequences over an alphabet consisting of variable names and various connectives. It is most convenient to define Boolean formulae recursively as follows:
- A Boolean variable is a Boolean formula.
- If φ_1, . . . , φ_t are Boolean formulae and ψ is a t-ary Boolean operation, then ψ(φ_1, . . . , φ_t) is a Boolean formula.

Typically, we consider three Boolean operations: the unary operation of negation (denoted neg or ¬), and the (bounded or unbounded) conjunction and disjunction (denoted ∧ and ∨, respectively). Furthermore, the convention is to shorthand ¬(φ) by ¬φ, and to write (∧_{i=1}^t φ_i) or (φ_1 ∧ · · · ∧ φ_t) instead of ∧(φ_1, . . . , φ_t), and similarly for ∨. Two important special cases of Boolean formulae are CNF and DNF formulae. A CNF formula is a conjunction of disjunctions of variables and/or their negations; that is, ∧_{i=1}^t φ_i is a CNF if each φ_i has the form (∨_{j=1}^{t_i} φ_{i,j}), where each φ_{i,j} is either a variable or a negation of a variable (and is called a literal). If for every i it holds that t_i ≤ k (e.g., k = 2, 3), then we say that the formula is a kCNF. Similarly, DNF formulae are defined as disjunctions of conjunctions of literals. The value of a Boolean formula under a truth assignment to its variables is defined recursively along its structure. For example, ∧_{i=1}^t φ_i has the value true under an assignment τ if and only if every φ_i has the value true under τ. We say that a formula φ is satisfiable if there exists a truth assignment τ to its variables such that the value of φ under τ is true. The set of satisfiable CNF (resp., 3CNF) formulae is denoted SAT (resp., 3SAT), and the problem of deciding membership in it is NP-complete. The set of tautologies (i.e., formulae that have the value true under any assignment) is coNP-complete, even when restricted to 3DNF formulae.
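The recursive definition of the value of a formula specializes nicely to CNFs: a CNF is true under τ if and only if every clause contains a literal satisfied by τ. The sketch below uses our own (hypothetical) encoding of a literal as a variable/polarity pair, and decides satisfiability by brute force over all 2^n assignments, in the spirit of the “perebor” search mentioned in the Historical Notes:

```python
from itertools import product

def evaluate_cnf(clauses, assignment):
    """A CNF is a list of clauses; each clause is a list of literals, where
    a literal is (variable, polarity): polarity True for the variable
    itself and False for its negation."""
    return all(any(assignment[var] == polarity for var, polarity in clause)
               for clause in clauses)

def is_satisfiable(clauses, variables):
    """Brute-force search over all 2^n truth assignments."""
    return any(evaluate_cnf(clauses, dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))

# (x1 or not x2) and (not x1 or x2): satisfiable, e.g., by x1 = x2 = True.
phi = [[('x1', True), ('x2', False)], [('x1', False), ('x2', True)]]
print(is_satisfiable(phi, ['x1', 'x2']))                          # True
print(is_satisfiable([[('x1', True)], [('x1', False)]], ['x1']))  # False
```

Note that `evaluate_cnf` runs in time linear in the formula's length, matching the fact that SAT is in NP, whereas `is_satisfiable` takes exponential time; whether this exponential search can be avoided is exactly the P-vs-NP Question.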
Bibliography
[1] S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
[2] L. Berman and J. Hartmanis. On Isomorphisms and Density of NP and Other Complete Sets. SIAM Journal on Computing, Vol. 6 (2), pages 305–322, 1977.
[3] G. Boolos, J. P. Burgess, and R. C. Jeffrey. Computability and Logic, 5th edition. Cambridge University Press, 2007.
[4] A. Church. An Unsolvable Problem of Elementary Number Theory. Amer. J. of Math., Vol. 58, pages 345–363, 1936.
[5] A. Cobham. The Intrinsic Computational Difficulty of Functions. In Proc. 1964 International Congress for Logic, Methodology and Philosophy of Science, pages 24–30, 1964.
[6] S. A. Cook. The Complexity of Theorem Proving Procedures. In 3rd ACM Symposium on the Theory of Computing, pages 151–158, 1971.
[7] J. Edmonds. Paths, Trees, and Flowers. Canad. J. Math., Vol. 17, pages 449–467, 1965.
[8] S. Even. Graph Algorithms. Computer Science Press, 1979.
[9] S. Even, A. L. Selman, and Y. Yacobi. The Complexity of Promise Problems with Applications to Public-Key Cryptography. Information and Control, Vol. 61, pages 159–173, 1984.
[10] U. Feige, S. Goldwasser, L. Lovász, S. Safra, and M. Szegedy. Approximating Clique is Almost NP-Complete. Journal of the ACM, Vol. 43, pages 268–292, 1996. Preliminary version in 32nd FOCS, 1991.
[11] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, 1979.
[12] O. Goldreich. On Promise Problems: A Survey. In [14]. An earlier version is available from ECCC, TR05-018, 2005.
[13] O. Goldreich. Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.
[14] O. Goldreich, A. L. Rosenberg, and A. L. Selman (eds.). Essays in Theoretical Computer Science in Memory of Shimon Even. Springer Verlag, LNCS Festschrift, Vol. 3895, March 2006.
[15] D. Hochbaum (ed.). Approximation Algorithms for NP-Hard Problems. PWS, 1996.
[16] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
[17] R. M. Karp. Reducibility among Combinatorial Problems. In Complexity of Computer Computations, R. E. Miller and J. W. Thatcher (eds.), Plenum Press, pages 85–103, 1972.
[18] R. M. Karp and R. J. Lipton. Some Connections Between Nonuniform and Uniform Complexity Classes. In 12th ACM Symposium on the Theory of Computing, pages 302–309, 1980.
[19] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1996.
[20] R. E. Ladner. On the Structure of Polynomial Time Reducibility. Journal of the ACM, Vol. 22, pages 155–171, 1975.
[21] L. A. Levin. Universal Search Problems. Problemy Peredaci Informacii 9, pages 115–116, 1973 (in Russian). English translation in Problems of Information Transmission 9, pages 265–266.
[22] M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, August 1993.
[23] S. Mahaney. Sparse Complete Sets for NP: Solution of a Conjecture of Berman and Hartmanis. Journal of Computer and System Sciences, Vol. 25, pages 130–143, 1982.
[24] Y. Matiyasevich. Hilbert's Tenth Problem. MIT Press, 1993.
[25] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[26] E. Post. A Variant of a Recursively Unsolvable Problem. Bull. AMS, Vol. 52, pages 264–268, 1946.
[27] H. G. Rice. Classes of Recursively Enumerable Sets and Their Decision Problems. Trans. AMS, Vol. 89, pages 25–59, 1953.
[28] A. Selman. On the Structure of NP. Notices Amer. Math. Soc., Vol. 21 (6), page 310, 1974.
[29] C. E. Shannon. A Symbolic Analysis of Relay and Switching Circuits. Trans. American Institute of Electrical Engineers, Vol. 57, pages 713–723, 1938.
[30] M. Sipser. Introduction to the Theory of Computation. PWS Publishing Company, 1997.
[31] B. A. Trakhtenbrot. A Survey of Russian Approaches to Perebor (Brute-Force Search) Algorithms. Annals of the History of Computing, Vol. 6 (4), pages 384–398, 1984.
[32] A. M. Turing. On Computable Numbers, with an Application to the Entscheidungsproblem. Proc. London Mathematical Society, Ser. 2, Vol. 42, pages 230–265, 1936. A Correction, ibid., Vol. 43, pages 544–546.
[33] J. von Neumann. First Draft of a Report on the EDVAC, 1945. Contract No. W-670-ORD-492, Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia. Reprinted (in part) in Origins of Digital Computers: Selected Papers, Springer-Verlag, Berlin and Heidelberg, pages 383–392, 1982.
Index
Author Index

Church, A., 165
Cobham, A., 165
Cook, S. A., 165–167
Edmonds, J., 165
Even, S., 167
Karp, R. M., 165–167
Ladner, R. E., 167
Levin, L. A., 165–167
Selman, A. L., 167
Shannon, C. E., 165
Turing, A. M., 164, 165
Yacobi, Y., 167
Subject Index

Algorithms, see Computability theory
Approximation, 173
Average-case complexity, 173
Boolean circuits, 32–40, 104–113
  bounded fan-in, 35
  constant-depth, 40
  depth, 39
  monotone, 40
  size, 35–36
  unbounded fan-in, 35, 38, 40
  uniform, 36
Boolean formulae, 33, 38–39, 177–178
  clauses, 38
  CNF, 38, 104–113, 178
  DNF, 39, 178
  literals, 38
Church-Turing Thesis, 16, 17, 27
Circuits, see Boolean circuits
CNF, see Boolean formulae
Cobham-Edmonds Thesis, 27, 28, 50, 52, 108
Complexity classes
  coNP, 94, 126, 154–158
  EXP, 66, 123
  generic, 40
  IP, see Interactive proofs
  NP, see NP
  NPC, see NP-completeness
  NPI, 126
  P, see P
  PC, 55–58, 63–66, 69, 77, 80–88, 98–102, 105, 109, 144, 151–154
  PCP, see Probabilistically checkable proofs
  PF, 54–55, 57–58, 63–65
  ZK, see Zero-knowledge
Computability theory, 1–31
Computational problems
  2SAT, 113
  3SAT, 112, 113, 123, 124, 126, 178
  3XC, 115
  Bipartiteness, 177
  Bounded Halting, 102
  Bounded Non-Halting, 102–103
  Clique, 117, 177
  Connectivity, 177
  CSAT, 104–112
  Entscheidungsproblem, 164
  Exact Cover, 115
  Factoring integers, 53, 72, 94, 157
  Graph 2-Colorability, 113, 177
  Graph 3-Colorability, 91, 113, 118, 123, 125, 177
Computational problems (cont.)
  Graph Isomorphism, 91, 177
  Halting Problem, 19–21, 102, 103
  Hamiltonian path, 53, 56, 58, 60, 61, 177
  Independent Set, 117, 177
  PCP, see Post Correspondence Problem
  Perfect matching, 177
  Primality testing, 72
  SAT, 53, 57, 61, 85–86, 104–113, 154, 178
  Set Cover, 114
  Solving systems of equations, 53
  Spanning trees, 53
  TSP, 53, 57
  Vertex Cover, 117, 177
Computational tasks and models, 1–47
Constant-depth circuits, see Boolean circuits
Cook-reductions, see Reductions
Cryptography, 125, 172
Decision problems, 6–8, 58–65
DNF, see Boolean formulae
Exhaustive search, 50, 51, 66, 70
Finite automata, 31
Formulae, see Boolean formulae
Graph theory, 175–177
Halting Problem, see Computational problems
Hilbert's Tenth Problem, 164
Interactive proofs, 124, 172
Karp-reductions, see Reductions
Kolmogorov Complexity, 24–26, 35
Levin-reductions, see Reductions
Monotone circuits, see Boolean circuits
NP, 48–158
  as proof system, 59–62, 123, 125
  as search problem, 55–58
  optimal search, 151–154
  traditional definition, 66–69
NP-completeness, 89–133, 155–158
O-notation, 26
One-way functions, 171
Optimal search for NP, 151–154
Oracle machines, 29–31
P, 48–158
  as search problem, 54–55, 57–58
P versus NP Question, 48–70
Polynomial-time reductions, see Reductions
Post Correspondence Problem, 22, 24, 43
Probabilistic proof systems, 123–126
Probabilistically checkable proofs, 125–126
Promise problems, 8, 52, 142–151, 156
Proof systems
  Interactive, see Interactive proofs
  NP, see NP
  PCP, see Probabilistically checkable proofs
  Probabilistic, see Probabilistic proof systems
  Zero-knowledge, see Zero-knowledge
Pseudorandom generators, 171
Pseudorandomness, 171
Randomness extractors, 174
Reductions
  Cook-reductions, 76–99, 120–129, 155–157
  Downward self-reducibility, 92
  Karp-reductions, 77–81, 98–120, 155
  Levin-reductions, 79–81, 83, 99–113
    parsimonious, 139
  Polynomial-time reductions, 74–129
  Self-reducibility, 83–88
  to sparse sets, 162–163
  Turing-reductions, 21, 29–31
Rice's Theorem, 21
Search problems, 5–6, 52–58, 63–65
  versus decision, 63–65, 77, 80, 83–88
Self-reducibility, see Reductions
Space complexity, 29
Time complexity, 10, 26–29
Turing machines, 11–18
  multi-tape, 16
  non-deterministic, 66–69
  single-tape, 15
  with advice, 36–37
Turing-reductions, see Reductions
Uncomputable functions, 18–22
Undecidability, 19, 22
Universal algorithms, 22–26, 28
Universal machines, 22–26
Worst-case complexity, 173
Zero-knowledge, 124–125, 172