Bài giảng PostgreSQL Up and Running

164 trang huongle 9860

Download

Bạn đang xem 20 trang mẫu của tài liệu "Bài giảng PostgreSQL Up and Running", để tải tài liệu gốc về máy bạn click vào nút DOWNLOAD ở trên

Tài liệu đính kèm:

bai_giang_postgresql_up_and_running.pdf

Nội dung text: Bài giảng PostgreSQL Up and Running

PostgreSQL: Up and Running wnload from Wow! eBook o D Regina Obe and Leo Hsu Beijing • Cambridge • Farnham • Kửln • Sebastopol • Tokyo
PostgreSQL: Up and Running by Regina Obe and Leo Hsu Copyright â 2012 Regina Obe and Leo Hsu. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles ( For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com. Editor: Meghan Blanchette Cover Designer: Karen Montgomery Production Editor: Iris Febres Interior Designer: David Futato Proofreader: Iris Febres Illustrator: Rebecca Demarest Revision History for the First Edition: 2012-07-02 First release See for release details. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. PostgreSQL: Up and Running, the image of the elephant shrew, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information con- tained herein. ISBN: 978-1-449-32633-3 [LSI] 1341247831
Table of Contents Preface ix 1. The Basics 1 Where to Get PostgreSQL 1 Notable PostgreSQL Forks 1 Administration Tools 2 What’s New in Latest Versions of PostgreSQL? 3 Why Upgrade? 4 What to Look for in PostgreSQL 9.2 4 PostgreSQL 9.1 Improvements 5 Database Drivers 5 Server and Database Objects 6 Where to Get Help 8 2. Database Administration 9 Configuration Files 9 The postgresql.conf File 10 The pg_hba.conf File 12 Reload the Configuration Files 14 Setting Up Groups and Login Roles (Users) 14 Creating an Account That Can Log In 15 Creating Group Roles 15 Roles Inheriting Rights 15 Databases and Management 16 Creating and Using a Template Database 16 Organizing Your Database Using Schemas 16 Permissions 17 Extensions and Contribs 18 Installing Extensions 19 Common Extensions 21 Backup 22 iii
Selective Backup Using pg_dump 23 Systemwide Backup Using pg_dumpall 24 Restore 24 Terminating Connections 24 Using psql to Restore Plain Text SQL backups 25 Using pg_restore 26 Managing Disk Space with Tablespaces 27 Creating Tablespaces 27 Moving Objects Between Tablespaces 27 Verboten 27 Delete PostgreSQL Core System Files and Binaries 28 Giving Full Administrative Rights to the Postgres System (Daemon) Ac- count 28 Setting shared_buffers Too High 29 Trying to Start PostgreSQL on a Port Already in Use 29 3. psql 31 Interactive psql 31 Non-Interactive psql 32 Session Configurations 33 Changing Prompts 34 Timing Details 35 AUTOCOMMIT 35 Shortcuts 36 Retrieving Prior Commands 36 psql Gems 36 Executing Shell Commands 37 Lists and Structures 37 Importing and Exporting Data 38 Basic Reporting 39 4. Using pgAdmin 43 Getting Started 43 Overview of Features 43 Connecting to a PostgreSQL server 44 Navigating pgAdmin 44 pgAdmin Features 45 Accessing psql from pgAdmin 45 Editing postgresql.conf and pg_hba.conf from pgAdmin 47 Creating Databases and Setting Permissions 47 Backup and Restore 48 pgScript 51 Graphical Explain 54 iv | Table of Contents
Job Scheduling with pgAgent 55 Installing pgAgent 55 Scheduling Jobs 56 Helpful Queries 57 5. Data Types 59 Numeric Data Types 59 Serial 59 Generate Series Function 60 Arrays 60 Array Constructors 60 Referencing Elements in An Array 61 Array Slicing and Splicing 61 Character Types 62 String Functions 63 Splitting Strings into Arrays, Tables, or Substrings 63 Regular Expressions and Pattern Matching 64 Temporal Data Types 65 Time Zones: What It Is and What It Isn’t 66 Operators and Functions for Date and Time Data Types 68 XML 70 Loading XML Data 70 Querying XML Data 70 Custom and Composite Data Types 71 All Tables Are Custom 71 Building Your Own Custom Type 71 6. Of Tables, Constraints, and Indexes 73 Tables 73 Table Creation 73 Multi-Row Insert 75 An Elaborate Insert 75 Constraints 77 Foreign Key Constraints 77 Unique Constraints 78 Check Constraints 78 Exclusion Constraints 79 Indexes 79 PostgreSQL Stock Indexes 79 Operator Class 81 Functional Indexes 81 Partial Indexes 82 Multicolumn Indexes 82 Table of Contents | v
7. SQL: The PostgreSQL Way 85 SQL Views 85 Window Functions 87 Partition By 88 Order By 89 Common Table Expressions 90 Standard CTE 91 Writeable CTEs 92 Recursive CTE 92 Constructions Unique to PostgreSQL 93 DISTINCT ON 93 LIMIT and OFFSET 94 Shorthand Casting 94 ILIKE for Case Insensitive Search 94 Set Returning Functions in SELECT 95 Selective DELETE, UPDATE, and SELECT from Inherited Tables 95 RETURNING Changed Records 96 Composite Types in Queries 96 8. Writing Functions 99 Anatomy of PostgreSQL Functions 99 Function Basics 99 Trusted and Untrusted Languages 100 Writing Functions with SQL 101 Writing PL/pgSQL Functions 103 Writing PL/Python Functions 103 Basic Python Function 104 Trigger Functions 105 Aggregates 107 9. Query Performance Tuning 111 EXPLAIN and EXPLAIN ANALYZE 111 Writing Better Queries 113 Overusing Subqueries in SELECT 114 Avoid SELECT * 116 Make Good Use of CASE 116 Guiding the Query Planner 118 Strategy Settings 118 How Useful Is Your Index? 118 Table Stats 120 Random Page Cost and Quality of Drives 120 Caching 121 vi | Table of Contents
10. Replication and External Data 123 Replication Overview 123 Replication Lingo 123 PostgreSQL Built-in Replication Advancements 124 Third-Party Replication Options 125 Setting Up Replication 125 Configuring the Master 125 Configuring the Slaves 126 Initiate the Replication Process 127 Foreign Data Wrappers (FDW) 127 Querying Simple Flat File Data Sources 128 Querying More Complex Data Sources 128 Appendix: Install, Hosting, and Command-Line Guides 131 Table of Contents | vii
Download from Wow! eBook
Preface PostgreSQL is an open source relational database management system that began as a University of California, Berkeley project. It was originally under the BSD license, but is now called the PostgreSQL License (TPL). For all intents and purposes, it’s BSD licensed. It has a long history, almost dating back to the beginning of relational data- bases. It has enterprise class features such as SQL windowing functions, the ability to create aggregate functions and also utilize them in window constructs, common table and recursive common table expressions, and streaming replication. These features are rarely found in other open source database platforms, but commonly found in newer versions of the proprietary databases such as Oracle, SQL Server, and IBM DB2. What sets it apart from other databases, including the proprietary ones we just mentioned, is the ease with which you can extend it without changing the underlying base—and in many cases, without any code compilation. Not only does it have advanced features, but it performs them quickly. It can outperform many other databases, including pro- prietary ones for many types of database workloads. In this book, we’ll expose you to the advanced ANSI-SQL features that PostgreSQL offers. and the unique features PostgreSQL has that you won’t find in other databases. If you’re an existing PostgreSQL user or have some familiarity with PostgreSQL, we hope to show you some gems you may have missed along the way; or features found in newer PostgreSQL versions that are not in the version you’re using. If you have used another relational database and are new to PostgreSQL, we’ll show you some parallels with how PostgreSQL handles tasks compared to other common databases, and demonstrate feats you can achieve with PostgreSQL that are difficult or impossible to do in other databases. If you’re completely new to databases, you’ll still learn a lot about what PostgreSQL has to offer and how to use it; however, we won’t try to teach you SQL or relational theory. You should read other books on these topics to take the greatest advantage of what this book has to offer. This book focuses on PostgreSQL versions 9.0 to 9.2, but we will cover some unique and advanced features that are also present in prior versions of PostgreSQL. ix
What Makes PostgreSQL Special and Why Use It? PostgreSQL is special because it’s not just a database: it’s also an application platform —and an impressive one at that. PostgreSQL allows you to write stored procedures and functions in several program- ming languages, and the architecture allows you the flexibility to support more lan- guages. Example languages that you can write stored functions in are SQL (built-in), PL/pgSQL (built-in), PL/Perl, PL/Python, PL/Java, and PL/R, to name a few, most of which are packaged with many distributions. This support for a wide variety of lan- guages allows you to solve problems best addressed with a domain or more procedural language; for example, using R statistics functions and R succinct domain idioms to solve statistics problems; calling a web service via Python; or writing map reduce con- structs and then using these functions within an SQL statement. You can even write aggregate functions in any of these languages that makes the com- bination more powerful than you can achieve in any one, straight language environ- ment. In addition to these languages, you can write functions in C and make them callable, just like any other stored function. You can have functions written in several different languages participating in one query. You can even define aggregate functions with nothing but SQL. Unlike MySQL and SQL Server, no compilation is required to build an aggregate function in PostgreSQL. So, in short, you can use the right tool for the job even if each sub-part of a job requires a different tool; you can use plain SQL in areas where most other databases won’t let you. You can create fairly sophisticated functions without having to compile anything. The custom type support of PostgreSQL is sophisticated and very easy to use, rivaling and often outperforming most other relational databases. The closest competitor in terms of custom type support is Oracle. You can define new data types in PostgreSQL that can then be used as a table column. Every data type has a companion array type so that you can store an array of a type in a data column or use it in an SQL statement. In addition to the ability of defining new types, you can also define operators, functions, and index bindings to work with these. Many third-party extensions for PostgreSQL take advantage of these fairly unique features to achieve performance speeds, provide domain specific constructs to allow shorter and more maintainable code, and accom- plish tasks you can only fantasize about in other databases. If building your own types and functions is not your thing, you have a wide variety of extensions to choose from, many of which are packaged with PostgreSQL distros. PostgreSQL 9.1 introduced a new SQL construct, CREATE EXTENSION, which allows you to install the many available extensions with a single SQL statement for each in a specific database. With CREATE EXTENSION, you can install in your database any of the afore- mentioned PL languages and popular types with their companion functions and oper- ators, like hstore, ltree, postgis, and countless others. For example, to install the popular PostgreSQL key-value store type and its companion functions and operators, you would type: x | Preface
CREATE EXTENSION hstore; In addition, there is an SQL command you can run—sect_extensions—to see the list of available and installed extensions. Many of the extensions we mentioned, and perhaps even the languages we discussed, may seem like arbitrary terms to you. You may recognize them and think, “Meh, I’ve seen Python, and I’ve seen Perl So what?” As we delve further, we hope you experience the same “WOW” moments we have come to appreciate with our many years of using PostgreSQL. Each update treats us to new features, eases usability, brings improve- ments in speed, and pushes the envelope of what is possible with a database. In the end, you will wonder why you ever used any other relational database, when Post- greSQL does everything you could hope for—and does it for free. No more reading the licensing cost fine print of those other databases to figure out how many dollars you need to spend if you have 8 cores on your server and you need X,Y, Z functionality, and how much it will cost you when you get 16 cores. On top of this, PostgreSQL works fairly consistently across all supported platforms. So if you’re developing an app you need to resell to customers who are running Linux, Mac OS X, or Windows, you have no need to worry, because it will work on all of them. There are binaries available for all if you’re not in the mood to compile your own. Why Not PostgreSQL? PostgreSQL was designed from the ground up to be a server-side database. Many people do use it on the desktop similarly to how they use SQL Server Express or Oracle Express, but just like those it cares about security management and doesn’t leave this up to the application connecting to it. As such, it’s not ideal as an embeddable database, like SQLite or Firebird. Sadly, many shared-hosts don’t have it pre-installed, or have a fairly antiquated version of it. So, if you’re using shared-hosting, you’re probably better off with MySQL. This may change in the future. Keep in mind that virtual, dedicated hosting and cloud server hosting is reasonably affordable and getting more competitively priced as more ISPs are beginning to provide them. The cost is not that much more expensive than shared hosting, and you can install any software you want on them. Because of these options, these are more suitable for PostgreSQL. PostgreSQL does a lot and a lot can be daunting. It’s not a dumb data store; it’s a smart elephant. If all you need is a key value store or you expect your database to just sit there and hold stuff, it’s probably overkill for your needs. For More Information on PostgreSQL This book is geared at demonstrating the unique features of PostgreSQL that make it stand apart from other databases, as well as how to use these features to solve real world Preface | xi
problems. You’ll learn how to do things you never knew were possible with a database. Aside from the cool “Eureka!” stuff, we will also demonstrate bread-and-butter tasks, such as how to manage your database, how to set up security, troubleshoot perfor- mance, improve performance, and how to connect to it with various desktop, com- mand-line, and development tools. PostgreSQL has a rich set of online documentation for each version. We won’t endeavor to repeat this information, but encourage you to explore what is available. There are over 2,250 pages in the manuals available in both HTML and PDF formats. In addition, fairly recent versions of these online manuals are available for hard-copy purchase if you prefer paper form. Since the manual is so large and rich in content, it’s usually split into a 3-4 volume book set when packaged in hard-copy form. Below is a list of other PostgreSQL resources: • Planet PostgreSQL is a blog aggregator of PostgreSQL bloggers. You’ll find Post- greSQL core developers and general users show-casing new features all the time and demonstrating how to use existing ones. • PostgreSQL Wiki provides lots of tips and tricks for managing various facets of the database and migrating from other databases. • PostgreSQL Books is a list of books that have been written about PostgreSQL. • PostGIS in Action Book is the website for the book we wrote on PostGIS, the spatial extender for PostgreSQL. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions. Constant width Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords. Constant width bold Shows commands or other text that should be typed literally by the user. Constant width italic Shows text that should be replaced with user-supplied values or by values deter- mined by context. This icon signifies a tip, suggestion, or general note. xii | Preface
This icon indicates a warning or caution. Using Code Examples This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission. We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “PostgreSQL: Up and Running by Regina Obe and Leo Hsu (O’Reilly). Copyright 2012 Regina Obe and Leo Hsu, 978-1-449-32633-3.” If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com. Safariđ Books Online Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business. Technology professionals, software developers, web designers, and business and cre- ative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training. Safari Books Online offers a range of product mixes and pricing programs for organi- zations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable da- tabase from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Tech- nology, and dozens more. For more information about Safari Books Online, please visit us online. Preface | xiii
How to Contact Us Please address comments and questions concerning this book to the publisher: O’Reilly Media, Inc. 1005 Gravenstein Highway North Sebastopol, CA 95472 800-998-9938 (in the United States or Canada) 707-829-0515 (international or local) 707-829-0104 (fax) We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at: To comment or ask technical questions about this book, send email to: bookquestions@oreilly.com For more information about our books, courses, conferences, and news, see our website at Find us on Facebook: Follow us on Twitter: Watch us on YouTube: xiv | Preface
CHAPTER 1 The Basics In this chapter, we’ll cover the basics of getting started with PostgreSQL. This includes where to get binaries and drivers, what’s new and exciting in the latest 9.2 release, common administration tools, PostgreSQL nomenclature, and where to turn for help. Where to Get PostgreSQL Years ago, if you wanted PostgreSQL, you had to compile it from source. Thankfully, those days are gone. Granted, you can still compile should you so choose, but most users nowadays get their PostgreSQL with a prepackaged installer. A few clicks or keystrokes, and you’re on your way in 10 minutes or less. If you’re installing PostgreSQL for the first time and have no existing database to up- grade, you should always install the latest stable release version for your OS. .postgresql.org/download maintains a listing of places where you can download PostgreSQL binaries. In “Installation Guides and Distributions” on page 131, you’ll find installation guides and some other additional custom distributions that people wnload from Wow! eBook o we’ve talked to seem to like. D Notable PostgreSQL Forks The fact that PostgreSQL has MIT/BSD style licensing makes it a great candidate for forking. Various groups have done exactly that over the years., and some have con- tributed their changes. Netezza, a popular database choice for data warehousing work- loads, in its inception was a PostgreSQL fork. GreenPlum, used for data warehousing and analyzing petabytes of information, was a spinoff of Bizgres, which was a commu- nity-driven spinoff of PostgreSQL focused on Big Data. PostgreSQL Advanced Plus by EnterpriseDb is a fork of the PostgreSQL codebase—it adds Oracle syntax and com- patibility features to woo Oracle users. EnterpriseDb does provide funding to the Post- greSQL community, and for this we’re grateful. 1
All the aforementioned are proprietary, closed source forks. tPostgres and Postgres- XC are two budding forks that we find interesting with open source licensing. tPostgres braches off PostgreSQL 9.2 and targets Microsoft SQL Server users. For instance, with tPostgres, you can write functions using T-SQL. Postgres-XC is a cluster server pro- viding write-scalable, synchronous multi-master replication. What makes Postgres-XC special is that it supports distributed processing and replication. It is now at version 1.0. Administration Tools There are three popular tools for managing PostgreSQL and these are supported by PostgreSQL core developers; they tend to stay in synch with PostgreSQL versions. In addition, there are plenty of commercial offerings as well. psql psql is a command-line interface for writing queries and managing PostgreSQL. It comes packaged with some nice extras, such as an import and export commands for delimited files, and a reporting feature that can generate HTML output. psql has been around since the beginning of PostgreSQL and is a favorite of hardcore PostgreSQL users. Newer converts who are more comfortable with GUI tools tend to favor pgAdmin. pgAdmin This is the widely used, free, graphical administration tool for PostgreSQL. You can download it separately from PostgreSQL. pgAdmin runs on the desktop and can connect to multiple PostgreSQL servers regardless of version or OS. Even if you have your database server on a window-less Unix-based server, install pgAd- min and you’ll find yourself armed with a fantastic GUI. pgAdmin is pictured in Figure 1-1. Some installers, such as those offered by EnterpriseDB, package pgAdmin with the database server install. If you’re unfamiliar with PostgreSQL, you should definitely start with pgAdmin. You’ll get a great overview and gain an appreciation of the richness of PostgreSQL just by exploring all the database objects in the main in- terface. If you’re coming from SQL Server and used Management Studio, you’ll feel right at home. PHPPgAdmin PHPPgAdmin, pictured in Figure 1-2, is a free, web-based administration tool pat- terned after the popular PHPMyAdmin for MySQL. PostgreSQL has many more kinds of database objects than MySQL, as such PHPPgAdmin is a step up from PHPMyAdmin with additions to manage schemas, procedural languages, casts, operators, and so on. If you’ve used PHPMyAdmin, you’ll find PHPPgAdmin to be nearly identical. 2 | Chapter 1: The Basics
Figure 1-1. pgAdmin Figure 1-2. PHPPgAdmin Tool What’s New in Latest Versions of PostgreSQL? The upgrade process gets simpler with each new version. There’s no reason not to always keep in step with the latest version. PostgreSQL is the fastest growing database technology today. Major versions come out almost annually. Each new version adds enhancements to ease of use, stability, security, performance, and avant-garde features. The lesson here? Always upgrade, and do so often. What’s New in Latest Versions of PostgreSQL? | 3
Why Upgrade? If you’re using PostgreSQL 8.2 or below: upgrade now! Enough said. If you’re using PostgreSQL 8.3: upgrade soon! 8.3 will be reaching end-of-life in early 2013. Details about PostgreSQL EOL policy can be found here: PostgreSQL Release Support Policy. EOL is not a place you want to be. New security updates and fixes to serious bugs will no longer be available. You’ll need to hire specialized PostgreSQL core consultants to patch problems or to implement workarounds—probably not a cheap proposition, assuming you can even locate someone to begin with. Regardless of which version you are using, you should always try to run the latest micro- versions for your version. An upgrade from say 8.4.8 to 8.4.11 requires just binary file replacement, which can be generally done with a quick restart after installing the up- grade. Only bug fixes are introduced in micro-versions, so there’s little cause for con- cern and can in fact save you grief. What to Look for in PostgreSQL 9.2 At time of writing, PostgreSQL 9.1 is the latest stable release, and 9.2 is waiting in the wings to strut its stuff. All of the anticipated features in 9.2 are already set in stone and available in the 9.2 beta release. The following list discusses the most notable features: • Index-only scans. If you need to retrieve only columns that are already a part of an index, PostgreSQL will skip the need to go to the table. You’ll see significant speed improvement in these queries as well as aggregates such as COUNT(*). • Sorting improvements that improve in-memory sort operations by as much as 20%. • Improvements in prepared statements. A prepared statement is now parsed, ana- lyzed, and rewritten, but not necessarily planned. It can also produce custom saved plans of a given prepared statement which are dependent on argument inputs. This reduces the chance that a prepared statement will perform worse than an equivalent ad-hoc query. • Cascading streaming replication supports streaming from a slave to another slave. • SP-GiST, another advance in GiST index technology using space filling trees. This should have great impact on the various extensions that rely on GiST for speed. • ALTER TABLE IF EXISTS syntax for making changes to tables. • Many new variants of ALTER TABLE ALTER TYPE commands that used to require whole table rewrites and rebuild of indexes. (More details are available at More Alter Table Alter Types.) • Even more pg_dump and pg_restore options. (Read our article at 9.2 pg_dump Enhancements.) • plv8js is a new language handler that allows you to create functions in JavaScript. 4 | Chapter 1: The Basics
• JSON built-in data type and companion functions row_to_json(), array_to_json(). This should be a welcome addition for web developers writing AJAX appications. • New range type class of types where a pair of values in data type forms a range, eliminating the need to cludge range-like functionality. • Allow SQL functions to reference arguments by name instead of by number. PostgreSQL 9.1 Improvements PostgreSQL 9.1 introduced enterprise features, making it an even more viable alterna- tive to the likes of Microsoft SQL Server and Oracle: • More built-in replication features including synchronous replication. • Extensions management using the new CREATE EXTENSION, ALTER EXTENSION. Ex- tensions make installing and removing add-ons a breeze. • ANSI-compliant foreign data wrappers for querying disparate data sources. • Writeable common table expressions (CTE). The syntactical convenience of CTEs now works for UPDATE and INSERT queries. • Unlogged tables speeds up queries against tables where logging is unnecessary. • Triggers on views. In prior versions, to make views updatable you used DO INSTEAD rules, which only supported SQL for programming logic. Triggers can be written in most procedural languages—except SQL—and opens the door for more complex abstraction using views. • KNN GiST adds improvement to popular extensions like full-text search, trigram (for fuzzy search and case insensitive search), and PostGIS. Database Drivers If you are using or plan to use PostgreSQL, chances are that you’re not going to use it in a vacuum. To have it interact with other applications, you’re going to need database drivers. PostgreSQL enjoys a generous number of freely available database drivers that can be used in many programming languages. In addition, there are various commercial organizations that provide drivers with extra bells and whistles at modest prices. Below, we’ve listed a few popular, open source ones: • PHP is a common language used to develop web applications, and most PHP dis- tributions come packaged with at least one PostgreSQL driver. There is the older pgsql and the newer pdo_pgsql. You may need to enable them in your php.ini or do a yum install, but they are usually already there. • Java. If you are doing Java development, there are always updated versions of JDBC that support the latest PostgreSQL, which you can download from gresql.org. Database Drivers | 5
• For .NET. (Microsoft or Mono) you can use the Npgsql driver, which has source and binary versions for .NET Frameworks 3.5 and above, and Mono.NET. • If you need to connect from MS Access or some other Windows Office productivity software, download ODBC drivers from sions/msi. The link includes both 32-bit and 64-bit ODBC drivers. • LibreOffice/OpenOffice. LibreOffice 3.5 (and above) comes packaged with a na- tive PostgreSQL driver. For OpenOffice and older versions of LibreOffice, you can use a PostgreSQL JDBC driver or the SDBC driver. You can find details about connecting to these on our article OO Base and PostgreSQL. • Python is a beautiful language and has support for PostgreSQL via various Python database drivers; at the moment, Psycopg is the most popular. • Ruby. You can connect to PostgreSQL via rubypg. • Perl. You’ll find PostgreSQL connectivity support via DBI and the DBD:Pg driver or pure Perl DBD:PgPP driver from CPAN. Server and Database Objects So you installed PostgreSQL and open up pgAdmin. You expand the server tree. Before you is a bewildering array of database objects, some familiar and some completely foreign. PostgreSQL has more database objects than probably any other database, and that’s without considering add-ons. You’ll probably never touch many of these objects, but if you dream up a new functionality that you wish PostgreSQL would offer, more likely than not, it’s already implemented using one of those esoteric objects that you’ve been ignoring. This book is not even going to attempt to describe all that you’ll find in a PostgreSQL install. With PostgreSQL churning out features at breakneck speed, we can’t imagine any book that could possibly itemize all that PostgreSQL has to offer. We’ll now discuss the most commonly used database objects: server service The PostgreSQL server service is often just called a PostgreSQL server, or daemon. You can have more than one a physical server as long as they listen on different ports or IPs and have different places to store their respective data. database Each PostgreSQL server houses many databases. table Table are the workhorses of any database. What is unique about PostgreSQL tables is the inheritance support and the fact that every table automatically begets an accompanying custom data type. Tables can inherit from other tables and querying can bring up child records from child tables. 6 | Chapter 1: The Basics
schema Schemas are part of the ANSI-SQL standards, so you’ll see them in other databases. Schemas are the logical containers of tables and other objects. Each database can have multiple schemas. tablespace Tablespace is the physical location where data is stored. PostgreSQL allows table- spaces to be independently managed, which means you can easily move databases to different drives with just a few commands. view Most relational databases have views for abstracting queries. In PostgreSQL, you can also have views that can be updated. function Functions in PostgreSQL can return scalar value or sets of records. Aggregates are functions used with SQL constructs such as GROUP BY to summarize data. Most of the time, they return scalars but in PostgreSQL they can return composite objects. operator These are symbolic functions that have backing of a function. In PostgreSQL, you can define your own. cast Casts allow you to convert from one data type to another. They are supported by functions that actually perform the conversion. What is rare about PostgreSQL that you won’t find with many other databases is that you can create your own casts and thus change the default behavior of casting. Casting can be implicit or explicit. Implicit casts are automatic and usually will expand from a more specific to a more generic type. When an implicit cast is not offered, you must cast explic- itly. sequence Sequence is what controls auto-incrementation in table definitions. They are usu- ally automatically created when you define a serial column. Because they are ob- jects in their own right, you could have multiple serial columns use the same se- quence object, effectively achieveing uniqueness not only within the column but across them. trigger Found in many databases, triggers detect data change events and can react before or after the actual data is changed. PostgreSQL 9.0 introduced some special twists to this with the WHEN clause. PostgreSQL 9.1 added the extra feature of making triggers available for views. foreign data wrappers Foreign data wrappers allow you to query a remote data source whether that data source be another relational database server, flat file, a NoSQL database, a web service or even an application platform like SalesForce. They are found in SQL Server and Database Objects | 7
Server as linked tables, but PostgreSQL implementation follows the SQL/Manage- ment of External Data (MED) standard, and is open to connect to any kind of data source. row/record Rows and records generally mean the same thing. In PostgreSQL, rows can be treated independently from their respective tables. This distinction becomes ap- parent and useful when you write functions or use the row constructor in SQL. extension This is a new feature introduced in 9.1 that packages a set of functions, types, casts, indexes, and so forth into a single unit for maintainability. It is similar in concept to Oracle packages and is primarily used to deploy add-ons. Where to Get Help There will come a day when you need additional help. Since that day always arrives earlier than expected, we want to point you to some resources now rather than later. Our favorite is the lively newsgroup network specifically designed for helping new and old users with technical issues. First, visit PostgreSQL Help Newsgroups. If you are new to PostgreSQL, the best newsgroup to start with is PGSQL-General Newsgroup. Finally, if you run into what appears to be a bug in PostgreSQL, report it at PostgreSQL Bug Reporting. 8 | Chapter 1: The Basics
CHAPTER 2 Database Administration This chapter will cover what we feel are the most common activities for basic admin- istration of a PostgreSQL server; namely: role management, database creation, add-on installation, backup, and restore. We’ll assume you’ve already installed PostgreSQL and have one of the administration tools at your disposal. Configuration Files Three main configuration files control basic operations of a PostgreSQL server instance. These files are all located in the default PostgreSQL data folder. You can edit them using your text editor of choice, or using the admin pack that comes with pgAdmin (“Editing postgresql.conf and pg_hba.conf from pgAdmin” on page 47). • postgresql.conf controls general settings, such as how much memory to allocate, default storage location for new databases, which IPs PostgreSQL listens on, where logs are stored, and so forth. • pg_hba.conf controls security. It manages access to the server, dictating which users can login into which databases, which IPs or groups of IPs are permitted to connect and the authentication scheme expected. • pg_ident.conf is the mapping file that maps an authenticated OS login to a Post- greSQL user. This file is used less often, but allows you to map a server account to a PostgreSQL account. For example, people sometimes map the OS root account to the postgre’s super user account. Each authentication line in pg_hba.conf can use a different pg_ident.conf file. If you are ever unsure where these files are located, run the Example 2-1 query as a super user while connected to any of your databases. 9
Example 2-1. Location of configuration files SELECT name, setting FROM pg_settings WHERE category = 'File Locations'; name | setting + config_file |E:/PGData91/postgresql.conf data_directory |E:/PGData91 external_pid_file | hba_file |E:/PGData91/pg_hba.conf ident_file |E:/PGData91/pg_ident.conf The postgresql.conf File postgresql.conf controls the core settings of the PostgreSQL server instance as well as default settings for new databases. Many settings—such as sorting memory—can be overriden at the database, user, session, and even function levels for PostgreSQL ver- sions higher than 8.3. Details on how to tune this can be found at Tuning Your PostgreSQL Server. An easy way to check the current settings you have is to query the pg_settings view, as we demonstrate in Example 2-2. Details of the various columns of information and what they mean are described in pg_settings. Example 2-2. Key Settings SELECT name, context , unit , setting , boot_val , reset_val FROM pg_settings WHERE name in('listen_addresses','max_connections','shared_buffers','effective_cache_size', 'work_mem', 'maintenance_work_mem') ORDER BY context,name; name | context | unit | setting | boot_val | reset_val + + + + + listen_addresses | postmaster | | * | localhost | * max_connections | postmaster | | 100 | 100 | 100 shared_buffers | postmaster | 8kB | 4096 | 1024 | 4096 effective_cache_size | user | 8kB | 16384 | 16384 | 16384 maintenance_work_mem | user | kB | 16384 | 16384 | 16384 work_mem | user | kB | 1024 | 1024 | 1024 If context is set to postmaster, it means changing this parameter requires a restart of the postgresql service. If context is set to user, changes require at least a reload. Furthermore, these settings can be overridden at the database, user, session, or function levels. unit tells you the unit of measurement that the setting is reported in. This is very important for memory settings since, as you can see, some are reported 10 | Chapter 2: Database Administration
in 8 kB and some in kB. In postgresql.conf, usually you explicitly set these to a unit of measurement you want to record in, such as 128 MB. You can also get a more human-readable display of a setting by running the statement: SHOW effective_cache_size;, which gives you 128 MB, or SHOW mainte nance_work_mem;, which gives you 16 MB for this particular case. If you want to see everything in friendly units, use SHOW ALL. setting is the currently running setting in effect; boot_val is the default set- ting; reset_val is the new value if you were to restart or reload. You want to make sure that after any change you make to postgresql.conf the setting and reset_val are the same. If they are not, it means you still need to do a reload. We point out the following parameters as ones you should pay attention to in postgresql.conf. Changing their values requires a service restart: • listen_addresses tells PostgreSQL which IPs to listen on. This usually defaults to localhost, but many people change it to *, meaning all available IPs. • port defaults to 5432. Again, this is often set in a different file in some distributions, which overrides this setting. For instance, if you are on a Red Hat or CentOS, you can override the setting by setting a PGPORT value in /etc/sysconfig/pgsql/your_ser vice_name_here. • max_connections is the maximum number of concurrent connections allowed. • shared_buffers defines the amount of memory you have shared across all connec- tions to store recently accessed pages. This setting has the most effect on query performance. You want this to be fairly high, probably at least 25% of your on- board memory. The following three settings are important, too, and take effect without requiring a restart, but require at least a reload, as described in “Reload the Configuration Files” on page 14. • effective_cache_size is an estimate of how much memory you expect to be avail- able in the OS and PostgreSQL buffer caches. It has no affect on actual allocation, but is used only by the PostgreSQL query planner to figure out whether plans under consideration would fit in RAM or not. If it’s set too low, indexes may be underu- tilized. If you have a dedicated PostgreSQL server, then setting this to half or more of your on-board memory would be a good start. • work_mem controls the maximum amount of memory allocated for each operation such as sorting, hash join, and others. The optimal setting really depends on the kind of work you do, how much memory you have, and if your server is a dedicated database server. If you have many users connecting, but fairly simple queries, you want this to be relatively low. If you do lots of intensive processing, like building a data warehouse, but few users, you want this to be high. How high you set this also depends on how much motherboard memory you have. A good article to read Configuration Files | 11
on the pros and cons of setting work_mem is Understanding postgresql.conf work_mem. • maintenance_work_mem is the total memory allocated for housekeeping activities like vacuuming (getting rid of dead records). This shouldn’t be set higher than about 1 GB. The above settings can also be set at the database, function, or user level. For example, you might want to set work_mem higher for a power user who runs sophisticated queries. Similarly, if you have a sort-intensive function, you could raise the work_mem just for it. I edited my postgresql.conf and now my server is broken. The easiest way to figure out what you did wrong is to look at the log file, which is located in the root of the data folder, or in the subfolder pg_log. Open up the latest file and read what the last line says. The error notice is usually self-explanatory. A common culprit is that you set the shared_buffers too high. Another common cause of failures is that there is an old postmaster.pid hanging around from a failed shutdown. You can safely delete this file which is located in the data cluster folder and try to restart again. The pg_hba.conf File The pg_hba.conf controls which and how users can connect to PostgreSQL databases. Changes to the pg_hba.conf require a reload or a server restart to take effect. A typical pg_hba.conf looks like this: # TYPE DATABASE USER ADDRESS METHOD # IPv4 local connections: host all all 127.0.0.1/32 ident # IPv6 local connections: host all all ::1/128 trust host all all 192.168.54.0/24 md5 hostssl all all 0.0.0.0/0 md5 # Allow replication connections from localhost, by a user with the # replication privilege. #host replication postgres 127.0.0.1/32 trust #host replication postgres ::1/128 trust Authentication method. ident, trust, md5, password are the most common and al- ways available. Others such as gss, radius, ldap, and pam, may not always be in- stalled. IPv4 syntax for defining network range. The first part in this case 192.168.54.0 is the network address. The /24 is the bit mask. In this example, we are allowing anyone in our subnet of 192.168.54.0 to connect as long as they provide a valid md5 en- crypted password. IPv6 syntax for defining localhost. This only applies to servers with IPv6 support and may cause the configuration file to not load if you have it and don’t have IPv6. 12 | Chapter 2: Database Administration
For example, on a Windows XP or Windows 2003 machine, you shouldn’t have this line. Users must connect through SSL. In our example, we allow anyone to connect to our server as long as they connect using SSL and have a valid md5-encrypted pass- word. Defines a range of IPs allowed to replicate with this server. This is new in PostgreSQL 9.0+. In this example, we have the line remarked out. For each connection request, postgres service checks the pg_hba.conf file in order from the top down. Once a rule granting access is encountered, processing stops and the connection is allowed. Should the end of the file be reached without any matching rules, the connection is denied. A common mistake people make is to not put the rules in order. For example, if you put 0.0.0.0/0 reject before you put 127.0.0.1/32 trust, local users won’t be able to connect, even though you have a rule allowing them to do so. I edited my pg_hba.conf and now my database server is broken. This occurs quite frequently, but it’s easily recoverable. This error is generally caused by typos, or by adding an unavailable authentication scheme. When the postgres service can’t parse the pg_hba.conf file, it’ll block all access or won’t even start up. The easiest way to figure out what you did wrong is to read the log file. This is located in the root of the data folder or in the sub folder pg_log. Open up the latest file and read the last line. The error message is usually self-explanatory. If you’re prone to slippery fingers, consider backing up the file prior to editing. Authentication Methods PostgreSQL has many methods for authenticating users, probably more than any other database. Most people stick with the four main ones: trust, ident, md5, and password. There is also a fifth one: reject. which performs an immediate deny. Authentication methods stipulated in pg_hba.conf serve as gatekeepers to the entire server. Users or devices must still satisfy individual role and database access restrictions after connect- ing. We list the most commonly used authentication methods below. For more information on the various authentication methods, refer to PostgreSQL Client Authentication. • trust is the least secure of the authentication schemes and means you allow people to state who they are and don’t care about the passwords, if any, presented. As long as they meet the IP, user, and database criteria, they can connect. You really should use this only for local connections or private network connections. Even then it’s possible to have IPs spoofed, so the more security-minded among us dis- courage its use entirely. Nevertheless, it’s the most common for PostgreSQL in- stalled on a desktop for single user local access where security is not as much of a concern. Configuration Files | 13
• md5 is the most common and means an md5-encrypted password is required. • password means clear text password authentication. • ident uses the pg_ident.conf to see if the OS account of the user trying to connect has a mapping to a PostgreSQL account. Password is not checked. You can have multiple authentication methods, even for the same database; just keep in mind the top to bottom checking of pg_hba.conf. Reload the Configuration Files Many, but not all changes, to configuration files require restarting the postgres service. Many changes take effect by performing a reload of the configuration. Reloading doesn’t affect active connections. Open up a command line and follow these steps to reload: pg_ctl reload -D your_data_directory_here If you have PostgreSQL installed as a service in Redhat EL or CentOS, you can do: service postgresql-9.1 reload where postgresql-9.1 is the name of your service. You can also log in as a super user on any database and run this SQL statement: SELECT pg_reload_conf(); You can also do this from pgAdmin, refer to “Editing postgresql.conf and pg_hba.conf from pgAdmin” on page 47. Setting Up Groups and Login Roles (Users) In PostgreSQL, there is really only one kind of an account and that is a role. Some roles wnload from Wow! eBook o can log in; when they have login rights, they are called users. Roles can be members of D other roles, and when we have this kind of relationship, the containing roles are called groups. It wasn’t always this way, though: Pre-8.0 users and groups were distinct en- tities, but the model got changed to be role-centric to better conform to the ANSI-SQL specs. For backward compatibility, there is still a CREATE USER and CREATE GROUP. For the rest of this discussion, we’ll be using the more generic CREATE ROLE , which is used to create both users and groups. If you look at fairly ANSI-SQL standard databases such as Oracle and later versions of SQL Server, you’ll notice they also have a CREATE ROLE statement, which works similarly as the PostgreSQL one. 14 | Chapter 2: Database Administration
Creating an Account That Can Log In postgres is an account that is created when you first initialize the PostgreSQL data cluster. It has a companion database called postgres. Before you do anything else, you should login as this user via psql or pgAdmin and create other users. pgAdmin has a graphical section for creating user roles, but if you were to do it using standard SQL data control language (DCL), you would execute an SQL command as shown in Example 2-3. Example 2-3. User with login rights that can create database objects CREATE ROLE leo LOGIN PASSWORD 'lion!king' CREATEDB VALID UNTIL 'infinity'; The 'infinity' is optional and assumed if not specified. You could instead put in a valid date at which you want the account to expire. If you wanted to create a user with super rights, meaning they can cause major de- struction to your database cluster and can create what we call untrusted language functions, you would create such a user as shown in Example 2-4. You can only create a super user if you are a super user yourself. Example 2-4. User with login rights that can create database objects CREATE ROLE regina LOGIN PASSWORD 'queen!penultimate' SUPERUSER VALID UNTIL '2020-10-20 23:00'; As you can see, we don’t really want our queen to reign forever, so we put in a timestamp when her account will expire. Creating Group Roles Group roles are generally roles that have no login rights but have other roles as mem- bers. This is merely a convention. There is nothing stopping you from creating a role that can both login and can contain other roles. We can create a group role with this SQL DCL statement: CREATE ROLE jungle INHERIT; And add a user or other group role to the group with this statement: GRANT jungle TO leo; Roles Inheriting Rights One quirky thing about PostgreSQL is the ability to define a role that doesn’t allow its member roles to inherit its rights. The concept comes into play when you define a role to have member roles. You can designate that members of this role don’t inherit rights of the role itself. This is a feature that causes much confusion and frustration when Setting Up Groups and Login Roles (Users) | 15
setting up groups, as people often forget to make sure that the group role is marked to allow its permissions as inheritable. Non-Inheritable rights Some permissions can’t be inherited. For example, while you can create a group role that you mark as super user, this doesn’t make its member roles super users; however, those users can impersonate their parent role, thus gaining super power rights for a brief period. Databases and Management The simplest create database statement to write is: CREATE DATABASE mydb; The owner of the database will be the logged in user and is a copy of template1 database. Creating and Using a Template Database A template database is, as the name suggests, a database that serves as a template for other databases. In actuality, you can use any database as template for another, but PostgreSQL allows you to specifically flag certain databases as templates. The main difference is that a database marked as template can’t be deleted and can be used by any user having CREATEDB rights (not just superuser) as a template for their new database. More details about template databases are described in the PostgreSQL manual Man- aging Template Databases. The template1 database that is used as the default when no template is specified, doesn’t allow you to change encodings. As such, if you want to create a database with an encoding and collation different from your default, or you installed extensions in tem plate1 you don’t want in this database, you may want to use template0 instead. CREATE DATABASE mydb TEMPLATE template0; If we wanted to make our new database a template, we would run this SQL statement as a super user: UPDATE pg_database SET datistemplate=true WHERE datname='mydb'; This would allow other users with CREATEDB rights to use this as a template. It will also prevent the database from being deleted. Organizing Your Database Using Schemas Schemas are a logical way of partitioning your database into mini-containers. You can divide schemas by functionality, by users, or by any other attribute you like. Aside from logical partitioning, they provide an easy way for doling out rights. One common prac- 16 | Chapter 2: Database Administration
tice is to install all contribs and extensions, covered in “Extensions and Con- tribs” on page 18 into a separate schema and give rights to use for all users of a database. To create a schema called contrib in a database, we connect to the database and run this SQL: CREATE SCHEMA contrib; The default search_path defined in postgresql.conf is "$user",public. This means that if there is a schema with the same name as the logged in user, then all non-schema qualified objects will first check the schema with the same name as user and then the public schema. You can override this behavior at the user level or the database level. For example, if we wanted all objects in contrib to be accessible without schema qual- ification, we would change our database as follows: ALTER DATABASE mydb SET search_path="$user",public,contrib; Schemas are also used for simple abstraction. A table name only needs to be unique within the schema, so many applications exploit this by creating same named tables in different schemas and, depending on who is logging in, they will get their own version based on which is their primary schema. Permissions Permissions are one of the trickiest things to get right in PostgreSQL. This is one feature that we find more difficult to work with than other databases. Permission management became a lot easier with the advent of PostgreSQL 9.0+. PostgreSQL 9.0 introduced default permissions, which allowed for setting permissions on all objects of a particular schema or database as well as permissions on specific types of objects. More details on permissions management are detailed in the manual, in sections ALTER DEFAULT PRIVILEGES and GRANT. Getting back to our contrib schema. Let’s suppose we want all users of our database to have EXECUTE and SELECT access to any tables and functions we will create in the contrib schema. We can define permissions as shown in Example 2-5: Example 2-5. Defining default permissions on a schema GRANT USAGE ON SCHEMA contrib TO public; ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT SELECT, REFERENCES, TRIGGER ON TABLES TO public; ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT SELECT, UPDATE ON SEQUENCES TO public; ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT EXECUTE ON FUNCTIONS TO public; Databases and Management | 17
ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT USAGE ON TYPES TO public; If you already have your schema set with all the tables and functions, you can retroac- tively set permissions on each object separately or do this for all existing tables, func- tions, and sequences with a GRANT ALL IN SCHEMA. Example 2-6. Set permissions on existing objects of a type in a schema GRANT USAGE ON SCHEMA contrib TO public; GRANT SELECT, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA contrib TO public; GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA contrib TO public; GRANT SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA contrib TO public; If you find this all overwhelming for setting permissions, just use pgAd- min for permission management. pgAdmin provides a great interface for setting default permissions, as well as retroactively granting bulk permissions of selective objects. We’ll cover this feature in “Creating Databases and Setting Permissions” on page 47. Extensions and Contribs Extensions and contribs are add-ons that you can install in a PostgreSQL database to extend functionality beyond the base offerings. They exemplify the best feature of open source software: people collaborating, building, and freely sharing new features. Prior to PostgreSQL 9.1, the add-ons were called contribs. Since PostgreSQL 9.1+, add-ons are easily installed using the new PostgreSQL extension model. In those cases, the term extension has come to replace the term contrib. For the sake of consistency, we’ll be referring to all of them by the newer name of extension, even if they can’t be installed using the newer extension model. The first thing to know about extensions is that they are installed separately in each database. You can have one database with the fuzzy text support extension and another that doesn’t. If you want all your databases to have a certain set of extensions installed in a specific schema, you can set up a template database as discussed in “Creating and Using a Template Database” on page 16 with all these installed, and then create all your databases using that template. To see which extensions you have already installed, run the query in Example 2-7: Example 2-7. List extensions installed SELECT * FROM pg_available_extensions 18 | Chapter 2: Database Administration
WHERE comment LIKE '%string%' OR installed_version IS NOT NULL ORDER BY name; name | default_version | installed_version | comment + + + citext | 1.0 | | data type for case-insen fuzzystrmatch | 1.0 | 1.0 | determine simil and dist hstore | 1.0 | 1.0 | data type for (key, value) pg_trgm | 1.0 | 1.0 | text similarity measur index sear plpgsql | 1.0 | 1.0 | PL/pgSQL procedural language postgis | 2.0.0 | 2.0.0 | geometry, geography, raster temporal | 0.7.1 | 0.7.1 | temporal data type To get details about a particular installed extension, enter the following command from psql: \dx+ fuzzystrmatch Or run this query: SELECT pg_catalog.pg_describe_object(d.classid, d.objid, 0) AS description FROM pg_catalog.pg_depend AS D INNER JOIN pg_extension AS E ON D.refobjid = E.oid WHERE D.refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass AND deptype = 'e' AND E.extname = 'fuzzystrmatch'; Which outputs what is packaged in the extension: description function dmetaphone_alt(text) function dmetaphone(text) function difference(text,text) function text_soundex(text) function soundex(text) function metaphone(text,integer) function levenshtein_less_equal(text,text,integer,integer,integer,integer) function levenshtein_less_equal(text,text,integer) function levenshtein(text,text,integer,integer,integer) function levenshtein(text,text) Installing Extensions Regardless of how you install an extension in your database, you’ll need to have gath- ered all the dependent libraries in your PostgreSQL bin and lib, or have them accessible via your system path. For small extensions, most of these libraries already come pre- packaged with your PostgreSQL install so you don’t have to worry. For others, you’ll either need to compile your own, get them with a separate install, or copy the files from another equivalent setup. Extensions and Contribs | 19
The Old Way Prior to PostgreSQL 9.1, the only way to install an extension was to manually run the requisite SQL scripts in your database. Many extensions still can only be installed this way. By convention, add-ons scripts are automatically dumped into the contrib folder of your PostgreSQL if you use an installer. Where you’d find this folder will depend on your particular OS and distro. As an example, on a CentOS running 9.0, to install the pgAd- min pack, one would run the following from the command line: psql -p 5432 -d postgres -f /usr/pgsql-9.0/share/contrib/adminpack.sql The New Way With PostgreSQL 9.1 and above, you can use the CREATE EXTENSION command. The two big benefits are that you don’t have to figure out where the extension files are kept (they are kept in a folder share/extension), and you can uninstall just as easily with DROP EXTENSION. Most of the common extensions are packaged with PostgreSQL already, so you really don’t need to do more than run the command. To retrieve extensions not packaged with PostgreSQL, visit the PostgreSQL Extension Network. Once you have downloaded, compiled, and installed (install just copies the scripts and .control to share/extension, and the respective binaries to bin and lib) the new extension, run CREATE EXTENSION extension_name to install in specific database. Here is how we would install the fuzzystrmatch extension in PostgreSQL 9.1+: the new way no longer requires psql since CREATE EXTENSION is part of the PostgreSQL’s SQL language. Just connect to the database you want to install the extension and run the SQL command: CREATE EXTENSION fuzzystrmatch; If you wanted all your extensions installed in a schema called my_extensions, you would first create the schema, and install the extensions: CREATE EXTENSION fuzzystrmatch SCHEMA my_extensions; Upgrading from Old to New If you’ve been using a version of PostgreSQL before 9.1 and restored your old database into a 9.1 during a version upgrade, all add-ons should continue to work untouched. For maintainability, you’ll probably want to upgrade your old extensions in the con- trib folder to use the new extensions approach. Many extensions, especially the ones that come packaged with PostgreSQL, have ability to upgrade pre-extension installs. Let’s suppose you had installed the tablefunc extension (which provides cross tabula- tion functions) to your PostgreSQL 9.0 in a schema called contrib, and you’ve just restored your database to a PostgreSQL 9.1 server. Run the following command to upgrade the extension: CREATE EXTENSION tablefunc SCHEMA contrib FROM unpackaged; 20 | Chapter 2: Database Administration
You’ll notice that the old functions are still in the contrib schema, but moving forward they will no longer be backed up and your backups will just have a CREATE EXTENSION clause. Common Extensions Many extensions come packaged with PostgreSQL, but are not installed by default. Some past extensions have gained enough traction to become part of the PostgreSQL core, so if you’re upgrading from an ancient version, you may not even have to worry about extensions. Old Extensions Absorbed into PostgreSQL Prior to PostgreSQL 8.3, the following extensions weren’t part of core: • PL/PgSQL wasn’t always installed by default in every database. In old versions, you had to run CREATE LANGUAGE plpgsql; in your database. From around 8.3 on, it’s installed by default, but you retain the option of uninstalling it. • tsearch is a suite for supporting full-text searches by adding indexes, operators, custom dictionaries, and functions. It became part of PostgreSQL core in 8.3. You don’t have the option to uninstall it. If you’re still relying on old behavior, you can install the tsearch2 extension, which retained old functions that are no longer available in the newer version. A better approach would be just to update where you’re using the functions because compatiblity with the old tsearch could end at any time. • xml is an extension that adds in support of XML data type and related functions and operators. As of version 8.3, XML became an integral part of PostgreSQL, in part to meet the ANSI-SQL XML standard. The old extension, now dubbed xml2, can still be installed and contains functions that didn’t make it into the core. In particular, you need this extension if you relied on the xlst_process() function for processing XSL templates. There are also a couple of old XPath functions not found in the core. Popular Extensions In this section, we’ll list and quickly describe the most popular, and some may say, must-have extensions, that aren’t part of current core. • postgis elevates PostgreSQL to a state-of-the-art spatial database outrivaling all commercial options. If you deal with standard OGC GIS data, demographic sta- tistics data, or geocoding, you don’t want to be without this one. You can learn more about PostGIS in our book, PostGIS in Action. Part of the book’s proceeds will help fund the PostGIS project itself. PostGIS is a whopper of an extension, weighing in at over 800 functions, types, and spatial indexes. Extensions and Contribs | 21
• fuzzystrmatch is a lightweight extension with functions like soundex, levenshtein, and metaphone for fuzzy string matching. We discuss its use in Where is Soundex and Other Warm and Fuzzy Things. • hstore is an extension that adds key-value pair storage and index support well- suited for storing pseudo-normalized data. If you are looking for a comfortable medium between relational and NoSQL, check out hstore. • pg_trgm (trigram) is an extension that is another fuzzy string search library. It is often used in conjunction with fuzzystrmatch. In PostgreSQL 9.1, it takes on an- other special role in that it makes ILIKE searches indexable by creating a trigram index. Trigram can also index wild-card searches of the form LIKE '%something %'. Refer to Teaching ILIKE and LIKE New Tricks for further discussion. • dblink is a module that allows you to query other PostgreSQL databases. This is currently the only supported mechanism of cross-database interaction for Post- greSQL. In PostgreSQL 9.3, foreign data wrapper for PostgreSQL is expected to hit the scene. • pgcrypto provides various encryption tools including the popular PGP. We have a quick primer on using it available here: Encrypting Data with pgcrypto. As of 9.1, less used procedural languages (PLs), index types, and foreign data wrappers (FDW) are also packaged as extensions. Backup PostgreSQL comes with two utilities for backup—pg_dump and pg_dumpall. You’ll find both in the bin folder. You use pg_dump to backup specific databases, and pg_dumpall to backup all databases and server globals. pg_dumpall needs to run under a postgres super user account so it has access to backup all databases. You will notice that most of the commands for these tools will have both long names as well as equiv- alent short switches. You can use them interchangeably, even in the same command. We’ll be covering just the basics here, but for a more in-depth discussion, refer to the PostgreSQL Backup and Restore section of the official manual. We often specify the port and host in these commands because we often run them via scheduled jobs not on the same machine; or we have several instances of PostgreSQL running on the same box, each running on a different port. Sometimes specifying the -h or host switch, for ex- ample, may cause problems if your service is set to only listen on local. You can safely leave it out if you are running from the server. You may also want to employ the use of ~pgpass since none of these command lines give you the option of specifying a password. 22 | Chapter 2: Database Administration
Selective Backup Using pg_dump For day-to-day backup, pg_dump is generally more expeditious than pg_dumpall be- cause it can selectively backup tables, schemas, databases. pg_dump backs up to plain SQL, but also compressed and TAR formats. Compressed and TAR backups can take advantage of the parallel restore feature introduced in 8.4. Refer to “Database Backup: pg_dump” on page 138 for a listing of pg_dump command options. In this example, we’ll show a few common backup scenarios and corresponding pg_dump switches. These examples should work for any version of PostgreSQL. Example 2-8. pg_dump usage Creates a compressed, single database backup: pg_dump -h localhost -p 5432 -U someuser -F c -b -v -f mydb.backup mydb Creates a plain-text single database backup, including Creates database: pg_dump -h localhost -p 5432 -U someuser -C -F p -b -v -f mydb.backup mydb Creates a compressed backup of tables with a name that starts with payments in any schema: pg_dump -h localhost -p 5432 -U someuser -F c -b -v -t *.payments* -f payment_tables.backup mydb Creates a compressed backup of all objects in hr and payroll schemas: pg_dump -h localhost -p 5432 -U someuser -F c -b -v -n hr -n payroll -f hr_payroll_schemas.backup mydb Creates a compressed backup of all objects in all schemas, excluding public schemas: pg_dump -h localhost -p 5432 -U someuser -F c -b -v -N public -f all_schema_except_public.backup mydb Creates a plain-text SQL backup of select tables, useful for porting to lower versions of PostgreSQL or other database systems: pg_dump -h localhost -p 5432 -U someuser -F p column-inserts -f select_tables.backup mydb If you have spaces in your file paths, you’ll want to wrap the file path in double quotes: "/path with spaces/mydb.backup". As a general rule, you can always use double quotes if you aren’t sure. The Directory format option was introduced in PostgreSQL 9.1. This option backs up each table as a separate file in a folder and gets around the problem where your file system has limitations on the size of each file. It is the only pg_dump backup format option that generates multiple files. An example of this is shown in Example 2-8. The directory backup first creates the directory to put the files in and errors out if the di- rectory already exists. Backup | 23
Example 2-9. Directory format backup The a_directory is created and in the folder, a separate gzipped file for each table and a file that has all the structures listed. pg_dump -h localhost -p 5432 -U someuser -F d -f /somepath/a_directory mydb Systemwide Backup Using pg_dumpall The pg_dumpall utility is what you would use to backup all databases into a single plain-text file, along with server globals such as tablespace definitions and users. Refer to “Server Backup: pg_dumpall” on page 140 for listing of available pg_dumpall com- mand options. It’s a good idea to backup globals such as roles and tablespace definitions on a daily basis. Although you can use pg_dumpall to backup databases as well, we generally don’t bother or do it—at most, once a month—since it would take much longer to restore the plain text backup for large databases. To backup roles and tablespaces: pg_dumpall -h localhost -U postgres port=5432 -f myglobals.sql globals-only If you only care about backing up roles and not tables spaces, you would use the roles only option: pg_dumpall -h localhost -U postgres port=5432 -f myroles.sql roles-only Restore There are two ways of restoring in PostgreSQL: • Using psql to restore plain text backups generated with pg_dumpall or pg_dump • Using pg_restore utility for restoring compressed, tar and directory backups created with pg_dump Terminating Connections Before you can perform a full drop and restore of a database or restore a particular table that’s in use, you’ll need to kill connections. Every once in a while, someone else (never you) will execute a query that he or she didn’t mean to and end up wasting resources. You could also run into a query that’s taking much longer than what you have the patience for. Should these things happen, you’ll either want to cancel the query on the connection or kill the connection entirely. To cancel running queries or to terminate connections, you elicit three administrative functions. • pg_stat_activity (SELECT * FROM pg_stat_activity;) is a view that will list cur- rently active connections and the process id. Additionally, it’ll provide details of the active query running on each connection, the connected user (usename), the 24 | Chapter 2: Database Administration
database (datname) in use, and start times of query currently running. You need this view to obtain the proc ids of connections that you wish to terminate. • pg_cancel_backend(procid) (SELECT pg_cancel_backend(procid);) will cancel all ac- tive queries on a connection, but doesn’t terminate the connection. • pg_terminate_backend(procid) (SELECT pg_terminate_backend(procid);) will kill a specific connection. All running queries will automatically cancel. This will be your weapon of choice prior to a restore to prevent an eager user from immediately restarting a cancelled query. PostgreSQL, unlike some other databases, lets you embed functions that perform ac- tions within a regular SELECT query. This means that though pg_terminate_backend() and pg_cancel_backend() can only act on one connection at a time, you can effectuate multiple connections by wrapping them in a SELECT. For example, let’s suppose a user (Regina) was hogging up resources and had 100 connections going. We can kill all her connections by running this command: Before 9.2: SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE usename = 'regina'; 9.2 and after: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'regina'; pg_stat_activity has changed considerably in PostgreSQL 9.2 with renaming and ad- dition of new columns. For example, procpid is now pid. More details about the changes and enhancements are detailed in PostgreSQL 9.2 Monitoring Enhancements. Using psql to Restore Plain Text SQL backups A plain SQL backup is nothing more than a text file of a huge SQL script. It’s the least convenient of backups to have, but it’s portable across different database systems. With wnload from Wow! eBook o SQL backup, you must execute the entire script, there’s no partial restore, unless you’re D willing to manually edit the file. Since there are no options, the backups are simple to restore by using the -f psql switch as shown in Example 2-10. However, they are useful if you need to load data to another DBMS with some editing. Example 2-10. Restores plain text SQL backups Restores a full backup and ignore errors: psql -U postgres -f myglobals.sql Restores and stops on first error: psql -U postgres set ON_ERROR_STOP=on -f myglobals.sql Restores a partial backup to a specific database: psql -U postgres -d mydb -f select_objects.sql Restore | 25
Using pg_restore If you backed up using pg_dump, you can use the versatile pg_restore utility for the restore. pg_restore provides you with a dizzying array of options for restoration and far surpasses any restoration utility found in other database systems. Here are some of its outstanding features: • As of 8.4, you can do parallel restores using the -j switch to control the number of threads to use. This allows each thread to be restoring a separate table simulta- neously, which significantly speeds up restores. • You can generate a plain text table of contents from your backup file to confirm what has been backed up. You have the ability to edit this table of contents and use the revision to control which database objects will be restored. • Just as pg_dump allows you to do selective backups of objects to save time, pg_re- store allows you to do selective restores even from a backup that contains a full database. • For the most part, pg_restore and pg_dump are backward-compatible. You can backup a database on an older version and restore using a newer version. Refer to “Database Backup: pg_restore” on page 141 for a listing of pg_restore com- mand options. A basic restore command of a compressed or TAR backup would be to first create the database in SQL: CREATE DATABASE mydb; and then restore: pg_restore dbname=mydb jobs=4 verbose mydb.backup If the database is the same as the one you backed up, you can create the database in ones step with the following: pg_restore dbname=postgres create jobs=4 verbose mydb.backup If you use the create switch, the dbname switch needs to be different from the database being created, since you can’t really run anything within the context of a database that has yet to be created. The downside of using create is that the database name is always the name of the one you backed up and you can’t change it during the restore. If you are running 9.2, you can take advantage of the section switch to restore just the table structure without the actual data. This is useful if you want to use an existing database as a template for a new one. To do so, we would first create the target database using psql or pgAdmin: CREATE DATABASE mydb2; 26 | Chapter 2: Database Administration
and then use pg_restore: pg_restore dbname=mydb2 section=pre-data jobs=4 mydb.backup Managing Disk Space with Tablespaces PostgreSQL uses tablespaces to ascribe logical names to physical locations on disk. Initializing a PostgreSQL cluster automatically begets two tablespaces: pg_default, which stores for all user data and pg_global, which stores all system data. These are located in the same folder as your default data cluster. You’re free to create tablespaces at will and house them on any server disks. You can explicitly assign default tablespaces for new objects by database. You can also move existing database objects to new ones. Creating Tablespaces To create a tablespace, you just need to denote a logical name and a physical folder. The postgres service account needs to have full access to this folder. If you are on a Windows server, use the following command (note the use of Unix-style slashes): CREATE TABLESPACE secondary LOCATION 'C:/pgdata91_secondary'; For Unix-based systems, you first have to create the folder or define an fstab location then use this command: CREATE TABLESPACE secondary LOCATION '/usr/data/pgdata91_secondary'; Moving Objects Between Tablespaces You can shuffle database objects among different tablespaces. To move all objects in the database to our secondary tablespace: ALTER DATABASE mydb SET TABLESPACE secondary; To move just a table: ALTER TABLE mytable SET TABLESPACE secondary; Moving a table to another tablespace locks it for the duration of the move. Verboten We have seen so many ways that people manage to break their PostgreSQL server that we thought it best to end this chapter itemizing the most common mistakes that people make. For starters, if you don’t know what you did wrong the log file could provide clues. Look for the pg_log folder in your PostgreSQL data folder or the root of the Verboten | 27
PostgreSQL data folder for the log files. It’s also quite possible that your server shut- down before a log entry could be written in which case the log won’t help you. Should your server fail to restart, try the command line by using: path/to/your/bin/pg_ctl -D your_postgresql_data_folder Delete PostgreSQL Core System Files and Binaries When people run out of disk space, the first thing they do is panic and start deleting files from the PostgreSQL data cluster folder because it’s so big. Part of the reason why this mistake happens so frequently is that some folders such as pg_log, pg_xlog, and pg_clog sound like logging folders that you expect to build up and be safe to delete. There are some files you can safely delete, and some that will destroy your data if you do. The pg_log folder often found in your data folder is a folder that tends to build up, especially if you have logging enabled. Files in this folder can always be safely deleted without issues. In fact, many people just schedule jobs to delete them. Files in the other folders except for pg_xlog should never be deleted, even if they sound like logs. In particular, don’t even think of touching pg_clog, the active commit log, without getting into trouble. pg_xlog stores transaction logs. Some systems we’ve seen are configured to move pro- cessed transaction logs in a subfolder called archive. You’ll often have an archive folder somewhere (not necessarily as a subfolder of pg_xlog) if you are running synchoronous replication, continuous archiving, or just keeping around logs if you need to revert to a different point in time. Deleting files in the root of pg_xlog will destroy data, however, deleting files in the archived folder will just prevent you from performing point-in-time recovery, or if a slave server hasn’t played back the logs, prevent them from fetching them. If you aren’t concerned about any of these scenarios, then it’s safe to delete or move files in the archive folder. Be weary of overzealous anti-virus programs, especially on Windows. We’ve seen cases where AV software removed important binaries in the PostgreSQL bin folder. Should PostgreSQL fail to start on a Windows system, the event viewer is the first place to look for clues as to why. Giving Full Administrative Rights to the Postgres System (Daemon) Account Many people are under the misconception that the postgres account needs to have full administrative rights to the server. In fact, depending on your PostgreSQL version, if you give the postgres account full administrative rights to the server, your database server may not even start. The postgres system account should always be created as a regular system user in the OS with just rights to the data cluster and additional tablespace folders. Most installers will set up the correct permissions for postgres. Don’t try to any favors by giving post- 28 | Chapter 2: Database Administration
gres more rights than it needs. Granting unnecessary rights leaves your system vulner- able should you fall under an SQL injection attack. There are cases where you’ll need to give the postgres account write/delete/read rights to folders or executables outside of the data cluster. With scheduled jobs that execute batch files, this need often arises. We advise you to practice restraint and only grant the minimum rights necessary to get the job done. Setting shared_buffers Too High Loading up your server with RAM doesn’t mean you can set the shared_buffers as high as you’d like. Try it and your server may crash or refuse to start. If you are running PostgreSQL on 32-bit Windows, setting it higher than 512 MB often results in insta- bility. With PostgreSQL 64-bit windows, you can push the envelop a bit higher and even exceed 1 GB without any issues. On some Linux systems, the compiled SHMMAX variable is low and shared_buffers can’t be set higher. Details on how to remedy this issue are detailed in the manual, in the section Kernel Resources. Trying to Start PostgreSQL on a Port Already in Use If you do this, you’ll see errors in your pg_log files of the form. Make sure PostgreSQL is not already running. Here are the common reasons why this happens: • You’ve already started postgres service. • You are trying to run it on a port already in use by another service. • Your postgres service had a sudden shutdown and you have an orphan post- gresql.pid file in the data folder. Just delete the file and try to start again. • You have an orphaned PostgreSQL process. When all else fails, kill all running PostgreSQL processes and then start again. Verboten | 29
CHAPTER 3 psql psql is the de rigueur command-line utility packaged with PostgreSQL. Aside from its most common use of running queries, you can use psql as an automated scripting tool, as a tool for importing or exporting data, restoring, database administration, and even go so far as to use it as a minimalistic reporting tool. psql is easy to use. Like any other command-line tool, you just have to be familiar with the myriad of switches involved. If you only have access to a server’s command line with no GUI, psql is pretty much your only choice for querying and managing PostgreSQL. If you fall into this category, we suggest that you print out the dump of psql help from the “psql: Interactive and Scriptable” on page 142 and frame it right above your workstation. Just as the other command-line tools packaged with PostgreSQL, you can forgo ex- plicitly specifying, host, port, user by setting the environment variables PGHOST, PGPORT, PGUSER as described in Environment Variables and setting PGPASSWORD or using a pass- word file as described in The Password File. Should you omit the parameters without having set the environment variables, psql will use the standard defaults. For examples in this chapter, we’ll assume you are using default values or have these variables set. If you’re using pgAdmin as well, you can jump right to psql using the plugin interface, (see “Accessing psql from pgAdmin” on page 45). A console window will open with psql and already connected directly to the database in pgAdmin. Interactive psql To use psql, the first thing you’ll want to know is what you can do interactively. You can get help with psql \?. For a thorough list of available interactive commands, refer to “psql Interactive Commands” on page 142. While in psql, to get help on any SQL commands, type \h followed by the command as in the following example: \h CREATE TABLE Command: CREATE TABLE Description: define a new table 31
Syntax: CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [ { column_name data_type [ COLLATE collation ] [ column_constraint [ ] ] | table_constraint | LIKE source_table [ like_option ] } [, ] ] ) [ INHERITS ( parent_table [, ] ) ] [ WITH ( storage_parameter [= value] [, ] ) | WITH OIDS | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] [ TABLESPACE tablespace ] : Although you have many more interactive commands than non-interactive ones at your disposal, you can effectively use all interactive commands non-interactively by embed- ding them into scripts. We’ll go into more detail on how to do this in the later sections of this chapter. Non-Interactive psql Non-interactive commands means that you ask psql to execute a script file composed of a mix of SQL statements and psql commands. You can alternatively pass one or more SQL statements. These methods are especially applicable to automated tasks. Once you have batched your commands into a file, you can schedule the job to run at regular intervals using a job scheduling agent like pgAgent (covered in “Job Scheduling with pgAgent” on page 55), Unix crontab or Windows scheduler. For situations where you have many commands that must be run in sequence or repeatedly, you’re better off creating a script first and then running it using psql. There will be far fewer switches to worry about when running psql non-interactively since the details have been em- bedded in the script file. To execute a file simply use the -f switch as follows: psql -f some_script_file If you don’t have your commands saved to a file, you can type them in using a -c switch. An example follows: psql -d postgresql_book -c "DROP TABLE IF EXISTS dross; CREATE SCHEMA staging;" Notice how you can have multiple SQL statements as long as you separate them with a semicolon. For a more detailed listing of switches you can include, refer to “psql Non- Interactive Commands” on page 144. You can embed interactive commands inside script files. See Example 3-1 for an ex- ample of this. Example 3-1. Script with psql interactive commands Contents of build_stage.psql: \a \t 32 | Chapter 3: psql
\g create_script.sql SELECT 'CREATE TABLE staging.factfinder_import(geo_id varchar(255), geo_id2 varchar(255), geo_display varchar(255) , '|| array_to_string(array_agg('s' || lpad(i::text,2, '0') || ' varchar(255), s' || lpad(i::text,2, '0') || '_perc varchar(255) ' ), ',') || ');' FROM generate_series(1,51) As i; \o \i create_script.sql Since we want the output of our query to be saved as an executable statement, we need to remove the headers by using the \t switch and use the \a switch to get rid of the extra breaking elements that psql normally puts in. We then use the \g switch to force our query output to be redirected to a file. The use of lpad is so that each numbered column is left padded with 0s so we will have columns s01, s01_perc, s02, s02_perc, The lpad and similar functions are detailed in “String Functions” on page 63. We call the \o with no file arguments to stop redirection of query results to file. To do the actual execution of the CREATE TABLE statement we built, we use the \i followed by the generated script. The \i is the interactive version of the non-inter- active -f switch. To run Example 3-1, we would type: psql -f build_stage.psql -d postgresql_book Example 3-1 is an adaptation of an approach we described in How to Create an N- column Table. As noted in the article, you can perform this without an intermediary file by using the DO command introduced in PostgreSQL 9.0. The intermediary does have the benefit that you have an easy record of what was done. wnload from Wow! eBook Session Configurations o D If you do use psql as your workhorse, consider customizing your psql environment. psql can read configuration settings from a file called psqlrc. When psql is launched, it searches for this file and runs any commands in the file to initialize the environment. On Unix-based systems, the file is generally named .psqlrc and searched for in the home directory. On Windows, this file is called psqlrc.conf and searched for in the %APP- DATA%\postgresql folder, which usually resolves to C:\Users\your_login\AppData \Roaming\postgresql. Don’t worry if you can’t find the file; it usually doesn’t appear on its own and you need to manually create it. Any settings in the file will override psql defaults. More details about this file can be found in psql. You can find examples of psqlrc at psqlrc File for DBAs and Silencing Commands in .psqlrc. If you wish to start psql without checking psqlrc, use the -X switch. In PostgreSQL 9.2, psql understands two new OS environment variables: Session Configurations | 33
• PSQL_HISTORY allows you to control where psql names and places the history file instead of using the default ~/.psql_history. • PSQLRC allows you to control the location of the startup file. Setting this before launching psql, or as part of your system environment settings, will make psql use this location. The contents of a psqlrc file look as shown in Example 3-2. Pretty much any psql com- mand can be added to it for execution at startup. Example 3-2. Example .psqlrc or psqlrc.conf file \pset null 'NULL' \encoding latin1 \set PROMPT1 '%n@%M:%>%x %/# ' \set PROMPT2 '' \timing on \set qstats91 'SELECT usename, datname, substring(current_query, 1,100) || '' '' As query FROM pg_stat_activity WHERE current_query != '' '';' \set qstats92 'SELECT usename, datname, left(query,100) || '' '' As query FROM pg_stat_activity WHERE state != ''idle'' ;' \pset pager always Each set command should be on a single line. For example, the qstats91 statement and its value should be all on the same line. Some commands only work on Unix-based systems and not on Windows, so our psqlrc is fairly generic. When you launch psql now, you’ll see the execution result of your psqlrc as follows: Null display is "NULL". Timing is on. Pager is always used. psql (9.2beta1) Type "help" for help. postgres@localhost:5442 postgresql_book# We’ll cover some popular settings found in psqlrc files. You can still set them during your session if you don’t want them on or off by default. Changing Prompts If you do most of your work using psql and you connect to multiple databases and servers, chances are you’ll be jumping around between them using \connect. Custom- izing your prompt to show which server and database and connected user you’re on helps greatly. In our psqlrc file, we set our prompt to include who we are logged in as (%n), the host server (%M), the port %>, transaction status %x), and the database (%/). 34 | Chapter 3: psql
The cryptic short-hand symbols we used to define our PROMPT1 and PROMPT2 in Exam- ple 3-2 are documented in the psql Reference Guide. When we connect with psql to our database, our prompt looks like: postgres@localhost:5442 postgresql_book# If we change to another database say \connect postgis_book, our prompt changes to postgres@localhost:5442 postgis_book# Timing Details You may find it instructive to have psql output the time it took for each query to execute. Use the \timing command to toggle it on and off. When that is enabled, each query you run will include at the end, the amount of time taken, for example: \timing on SELECT COUNT(*) FROM pg_tables; will output the following: count 73 (1 row) Time: 18.650 ms AUTOCOMMIT By default, AUTOCOMMIT is on, meaning any SQL command you issue that changes data will immediately commit. Each command is its own transaction. If you are doing a large batch of precarious updates, you may want a safety net. Start by turning AUTOCOM- MIT off: \set AUTOCOMMIT off Once AUTOCOMMIT is off, you’ll have the option to rollback before you commit: UPDATE census.facts SET short_name = 'this is a mistake'; To roll this back: ROLLBACK; To commit: COMMIT; Session Configurations | 35
Shortcuts The \set command is also useful for defining user-defined shortcuts. You may want to store the shortcuts in your psqlrc file to have them available each time. For example, if you use EXPLAIN ANALYZE VERBOSE all the time and you’re tired of typing it all out, you can define a variable as follows: \set eav 'EXPLAIN ANALYZE VERBOSE' Now whenever you want to do an EXPLAIN ANALYZE VERBOSE of a query, you prefix it with :eav (colon resolves the variable): :eav SELECT COUNT(*) FROM pg_tables; You can even save commonly used queries as strings in your psqlrc startup script as we did for qstats91 and qstats92. So, if I am on a PostgreSQL 9.2 database, I can see current activity by just typing the following: :qstats92 Retrieving Prior Commands As with many command line tools, you can use the up arrows to access prior com- mands. The number of previous commands stored in the command history is controlled using the HISTSIZE variable. For example: \set HISTSIZE 10 will allow you to recover the past ten commands and no more from the command history file. You can also have psql pipe the history of commands into separate files for each data- base using a command like the following: \set HISTFILE ~/.psql_history- :HOST - :DBNAME The psql history feature generally doesn’t work on Windows unless running under Cygwin. This feature relies on the readline library, which Windows distributions are generally not compiled with. For the same reason, tab completion also doesn’t work. Finally, to unset a variable in psql, simply issue the \unset command followed by the variable name. For example: \unset qstats91 psql Gems In this section, we cover some really helpful features that are buried inside psql help. 36 | Chapter 3: psql
Executing Shell Commands Although you normally use SQL and psql specific commands in psql, you can call out to the OS shell using the ! command. Let’s say you’re on Windows and need to get a list of all OS environment settings that start with A. Instead of exiting psql, you can just directly type the following: \! set A ALLUSERSPROFILE=C:\ProgramData APPDATA=C:\Users\Administrator\AppData\Roaming Lists and Structures There are various psql commands available to get lists of objects along with details. In Example 3-3, we demonstrate how to list all tables in schema pg_catalog that start with pg_t, along with their size. Example 3-3. List tables with \dt+ \dt+ pg_catalog.pg_t* Schema | Name | Type | Owner | Size | Description + + + + + pg_catalog | pg_tablespace | table | postgres | 40 kB | pg_catalog | pg_trigger | table | postgres | 16 kB | pg_catalog | pg_ts_config | table | postgres | 40 kB | pg_catalog | pg_ts_config_map | table | postgres | 48 kB | pg_catalog | pg_ts_dict | table | postgres | 40 kB | pg_catalog | pg_ts_parser | table | postgres | 40 kB | pg_catalog | pg_ts_template | table | postgres | 40 kB | pg_catalog | pg_type | table | postgres | 112 kB | If we wanted detail about a particular object such as the pg_ts_config table, we would use the \d command, as shown in Example 3-4. Example 3-4. Describe object with \d \d+ pg_ts_dict Table "pg_catalog.pg_ts_dict" Column | Type | Modifiers | Storage | Stats target | Description + + + + + dictname | name | not null | plain | | dictnamespace | oid | not null | plain | | dictowner | oid | not null | plain | | dicttemplate | oid | not null | plain | | dictinitoption | text | | extended | | Indexes: "pg_ts_dict_dictname_index" UNIQUE, btree (dictname, dictnamespace) "pg_ts_dict_oid_index" UNIQUE, btree (oid) Has OIDs: yes psql Gems | 37
Importing and Exporting Data psql has a command called \copy for both importing from and exporting to a delimited text file. The default delimiter is tab with rows separated by new line breaks. For our first example, we downloaded data from US Census Fact Finder covering racial demo- graphics of housing in Massachusetts. You can download the file from PostgreSQL Book Data. Fact Finder is a treasure trove of data about the US; a statistician’s dream land. We encourage you to explore it via the guided wizard. Our usual practice in loading denormalized or unfamiliar data is to create a separate schema to segregate it from production data. We then write a series of explorative queries to get a good sense of what we have on our hands. Finally, we distribute the data into various normalized production tables and delete the staging schema. Before bringing the data into PostgreSQL, you must first create a table to hold the incoming data. The data must match the file both in the number of columns and data types. This could be an annoying extra step for a well-formed file, but does obviate the need for psql to guess at data types. psql processes the entire import as a single trans- action; should it encounter any errors in the data, the entire import will fail. If you’re unsure about the data contained in the file, we recommend setting up the table with the most accommodating data types and then recast later if necessary. For example, if you can’t be sure that a column will just have numeric values, make it character varying to get the data in for inspection and then recast later. Example 3-5. Importing data with psql psql \connect postgresql_book \cd /postgresql_book/ch03 \copy staging.factfinder_import FROM DEC_10_SF1_QTH1_with_ann.csv CSV In Example 3-5, we launch psql interactively, connect to our database, use \cd to change the current directory to the folder with our data and then import our data using the \copy command. Since default for copy is tab-delimited, we need to augment our state- ment with CSV to denote that our data is comma separated instead of tab delimited. If you had data with non-standard delimiters like | delimited columns and you also wanted to replace blank data points with nulls, you would use a command: \copy sometable FROM somefile.txt DELIMITER '|' NULL As ''; There is another COPY command, which is part of the SQL language (not to be confused with the \copy in psql) that requires the file be on the server. Since psql is a client utility, all path references are relative to the client while the SQL version is relative to the server and runs under the context of the postgres OS process account. We detail the differences between the two in Import Fixed-width Data in PostgreSQL with just psql. 38 | Chapter 3: psql