Database configuration and maintenance have historically been complex tasks, often requiring expert knowledge of database design and application behavior. In an embedded environment, it is not feasible to require such expertise and ongoing database maintenance. This paper discusses the database administration challenges posed by embedded systems and describes how the Berkeley DB architecture addresses these challenges.
Database administration consists of two components, initial configuration and ongoing maintenance. Initial configuration consists of database design, manifestation, and tuning. The instantiation of the design includes decomposing the design into tables, relations, or objects and designating proper indices and their implementations (e.g., Btrees, hash tables, etc.). Tuning a design requires selecting a location for the log and data files, selecting appropriate database page sizes, specifying the size of in-memory caches, and specifying the limits of multi-threading and concurrency. As embedded systems define a specific environment and set of tasks, requiring expertise during the initial system configuration process is acceptable, and we focus our efforts on the ongoing maintenance of the system. In this way, our emphasis differs from other projects such as Microsoft's AutoAdmin project , and the "no-knobs" administration that is identified as an area of important future research by the Asilomar authors.
In this paper, we focus on what the authors of the Asilomar report call "gizmo" databases , databases that reside in devices such as smart cards, toasters, or telephones. The key characteristics of such databases are that their functionality is completely transparent to users, no one ever performs explicit database operations or database maintenance, the database may crash at any time and must recover instantly, the device may undergo a hard reset at any time, requiring that the database return to its initial state, and the semantic integrity of the database must be maintained at all times. In Section 2, we provide more detail on the sorts of tasks typically performed by database administrators (DBAs) that must be automated in an embedded system.
The rest of this paper is structured as follows. In Section 2, we outline the requirements for embedded database support. In Section 3, we discuss how Berkeley DB is conducive to the hands-off management required in embedded systems. In Section 4, we discuss novel features that enhance Berkeley DB's suitability for the embedded applications. In Section 5, we discuss issues of footprint size. In Section 6 we discuss related work, and we conclude in Section 7.
In the embedded database arena, it is the ongoing maintenance tasks that must be automated, not necessarily the initial system configuration. There are five tasks that are traditionally performed by DBAs, but must be performed automatically in embedded database systems. These tasks are log archival and reclamation, backup, data compaction/reorganization, automatic and rapid recovery, and reinitialization from scratch.
Log archival and backup are tightly coupled. Database backups are part of any large database installation, and log archival is analogous to incremental backup. It is not clear what the implications of backup and archival are in an embedded system. Consumers do not back up their VCRs or refrigerators, yet they do (or should) back up their personal computers or personal digital assistants. For the remainder of this paper, we assume that backups, in some form, are required for gizmo databases (imagine having to reprogram, manually, the television viewing access pattern learned by some set-top television systems today). Furthermore, we require that those backups are nearly instantaneous or completely transparent, as users should not be aware that their gizmos are being backed up and should not have to explicitly initiate such backups.
Data compaction or reorganization has traditionally required periodic dumping and restoration of database tables and the recreation of indices. In an embedded system, such reorganization must happen automatically.
Recovery issues are similar in embedded and traditional environments with a few exceptions. While a few seconds or even a minute recovery is acceptable for a large server installation, no one is willing to wait for their telephone or television to reboot. As with archival, recovery must be nearly instantaneous in an embedded product. Secondly, it is often the case that a system will be completely reinitialized, rather than simply rebooted. In this case, the embedded database must be restored to its initial state, freeing all its resources. This is not typically a requirement of large server systems.
In addition to the maintenance-free operation required of the embedded systems, there are a number of requirements that fall out of the constrained resources typically found in the "gizmos" using gizmo databases. These requirements are: small footprint, short code-path, programmatic interface for tight application coupling and to avoid the overhead (in both time and size) of interfaces such as SQL and ODBC, application configurability and flexibility, support for complete memory-resident operation (e.g., these systems must run on gizmos without file systems), and support for multi-threading.
A small footprint and short code-path are self-explanatory, however what is not as obvious is that the programmatic interface requirement is the logical result of them. Traditional interfaces such as ODBC and SQL add significant size overhead and frequently add multiple context/thread switches per operation, not to mention several IPC calls. An embedded product is less likely to require the complex query processing that SQL enables. Instead, in the embedded space, the ability for an application to configure the database for the specific tasks in question is more important than a general query interface.
As some systems do not provide storage other than RAM and ROM, it is essential that an embedded database work seemlessly in memory-only environments. Similarly, many of today's embedded operating systems provide a single address space architecture, so a simple, multi-threaded capability is essential for application requiring any concurrency.
In general, embedded applications run on gizmos whose native operating system support varies tremendously. For example, the embedded OS may or may not support user-level processing or multi-threading. Even if it does, a particular embedded application may or may not need it. Not all applications need more than one thread of control. An embedded database must provide mechanisms to developers without deciding policy. For example, the threading model in an application is a matter of policy, and depends not on the database software, but on the hardware, operating system, and the application's feature set. Therefore, the data manager must provide for the use of multi-threading, but not require it.
Versions 2.X and higher are distributed by Sleepycat Software and add functionality for concurrency, logging, transactions, and recovery. Each piece of additional functionality is implemented as an independent module, which means that the subsystems can be used outside the context of Berkeley DB. For example, the locking subsystem can easily be used to implement locking for a non-DB application and the shared memory buffer pool can be used for any application caching data in main memory. This subsystem design allows a designer to pick and choose the functionality necessary for the application, minimizing memory footprint and maximizing performance. This addresses the small footprint and short code-path criteria mentioned in the previous section.
As Berkeley DB grew out of a replacement for dbm, its primary implementation language has always been C and its interface has been programmatic. The C interface is the native interface, unlike many database systems where the programmatic API is simply a layer on top of an already-costly query interface (e.g. embedded SQL). Berkeley DB's heritage is also apparent in its data model; it has none. The database stores unstructured key/data pairs, specified as variable length byte strings. This leaves schema design and representation issues the responsibility of the application, which is ideal for an embedded environment. Applications retain full control over specification of their data types, representation, index values, and index relationships. In other words, Berkeley DB provides a robust, high-performance, keyed storage system, not a particular database management system. We have designed for simplicity and performance, trading off complex, general purpose support that is better encapsulated in applications.
Another element of Berkeley DB's programmatic interface is its customizability; applications can specify Btree comparison and prefix compression functions, hash functions, error routines, and recovery models. This means that embedded applications can tailor the underlying database to best suit their data demands. Similarly, the utilities traditionally bundled with a database manager (e.g., recovery, dump/restore, archive) are implemented as tiny wrapper programs around library routines. This means that it is not necessary to run separate applications for the utilities. Instead, independent threads can act as utility daemons, or regular query threads can perform utility functions. Many of the current products built on Berkeley DB are bundled as a single large server with independent threads that perform functions such as checkpoint, deadlock detection, and performance monitoring.
As mentioned earlier, living in an embedded environment requires flexible management of storage. Berkeley DB does not require any preallocation of disk space for log or data files. While many commercial database systems take complete control of a raw device, Berkeley DB uses a normal file system, and can therefore, safely and easily share a data space with other programs. All databases and log files are native files of the host environment, so whatever utilities are provided by the environment can be used to manage database files as well.
Berkeley DB provides three different memory models for its management of shared information. Applications can use the IEEE Std 1003.1b-1993 (POSIX) mmap interface to share data, they can use system shared memory, as frequently provided by the shmget family of interfaces, or they can use per-process heap memory (e.g., malloc). Applications that require no permanent storage and do not provide shared memory facilities can still use Berkeley DB by requesting strictly private memory and specifying that all databases be memory-resident. This provides pure-memory operation.
Lastly, Berkeley DB is designed for rapid startup -- recovery can happen automatically as part of system initialization. This means that Berkeley DB works correctly in environments where gizmos are suddenly shut down and restarted.
As applications are also permitted to specify comparison and hash functions, the application can chose to organize its data based either on uncompressed and clear-text data or compressed and encrypted data. If the application indicates that data should be compared in its processed form (i.e., compressed and encrypted), then the compression and encryption are performed on individual data items and the in-memory representation retains these characteristics. However, if the application indicates that data should be compared in its original form, then entire pages are transformed upon being read into or written out of the main memory buffer cache. These two alternatives provide the flexibility to trade space and security for performance.
Berkeley DB provides an option to perform coarser grain, deadlock-free locking. Rather than locking on pages, locking is performed at the interface to the database. Multiple readers or a single writer are allowed to be active in the database at any instant in time, with conflicting requests queued automatically. The presence of cursors, through which applications can both read and write data, complicates this design. If a cursor is currently being used for reading, but will later be used to write, the system will be deadlock prone if no special precautions are taken. To handle this situation, we require that, when a cursor is created, the application specify any future intention to write. If there is an intention to write, the cursor is granted an intention-to-write lock which does not conflict with readers, but does conflict with other intention-to-write locks and write locks. The end result is that the application is limited to a single potentially writing cursor accessing the database at any point in time.
Under periods of low contention (but potentially high throughput), the normal page-level locking provides the best overall throughput. However, as contention rises, so does the potential for deadlock. As some cross-over point, switching to the less concurrent, but deadlock-free locking protocol will result in higher throughput as operations must never be retried. Given the operating conditions of an embedded database manager, it is useful to make this change automatically as the system itself detects high contention.
Oracle reports that Oracle Lite 3.0 requires 350 KB to 750 KB
of memory and approximately 2.5 MB of hard disk space .
This includes drivers for interfaces such as ODBC and JDBC.
In contrast, Berkeley DB ranges in size from 75 KB to under 200 KB,
foregoing heavyweight interfaces such as ODBC and JDBC and
providing a variety of deployed sizes that can be used depending
on application needs. At the low end, applications requiring
a simple single-user access method can choose from either extended
linear hashing, B+ trees, or record-number based retrieval and
pay only the 75 KB space requirement.
Applications requiring all three access methods will observe the
110 KB footprint.
At the high end, a fully recoverable, high-performance system
occupies less than a quarter megabyte of memory.
This is a system you can easily incorporate in your toaster oven.
Table 1 shows the per-module break down of the entire Berkeley DB
library. Note that this does not include memory used to cache database
|Object sizes in bytes
|Access method common code
|OS compatibility library
|All modules for Btree access method only
|All modules for Recno access method only
|All modules for Hash access method only
|All Access Methods
The Asilomar report identifies a new class of database applications, which they term "gizmo" databases, small databases embedded in tiny mobile appliances, e.g., smart-cards, telephones, personal digital assistants. Such databases must be self-managing, secure and reliable. Thus, the idea is that gizmo databases require plug and play data management with no database administrator (DBA), no human settable parameters, and the ability to adapt to changing conditions. More specifically, the Asilomar authors claim that the goal is self-tuning, including defining the physical DB design, the logical DB design, and automatic reports and utilities  To date, few researchers have accepted this challenge, and there is a dearth of research literature on the subject.
Our approach to embedded database administration is fundamentally different than that described by the Asilomar authors. We adopt their terminology, but view the challenge in supporting gizmo databases to be that of self-sustenance after initial deployment. Therefore, we find it, not only acceptable, but desirable to assume that application developers control initial database design and configuration. To the best of our knowledge, none of the published work in this area addresses this approach.
As the research community has not provided guidance in this arena, most work in embedded database administration has fallen to the commercial vendors. These vendors fall into two camps, companies selling databases specifically designed for embedding or programmatic access and the major database vendors (e.g., Oracle, Informix, Sybase).
The embedded vendors all acknowledge the need for automatic administration, but fail to identify precisely how their products actually accomplish this. A notable exception is Interbase whose white paper comparison with Sybase and Microsoft's SQL servers explicitly address features of maintenance ease. Interbase claims that as they use no log files, there is no need for log reclamation, checkpoint tuning, or other tasks associated with log management. However, Interbase uses Transaction Information Pages, and it is unclear how these are reused or reclaimed . Additionally, with a log-free system, they must use a FORCE policy (write all pages to disk at commit), as defined by Haerder and Reuter . This has serious performance consequences for disk-based systems. The approach described in this paper does use logs and therefore requires log reclamation, but provides hooks so the application may reclaim logs safely and programmatically. While Berkeley DB does require checkpoints, the goal of tuning the checkpoint interval is to bound recovery time. Since the checkpoint interval in Berkeley DB can be expressed by the amount of log data written, it requires no tuning. The application designer sets a target recovery time, and selects the amount of log data that can be read in that interval and specifies the checkpoint interval appropriately. Even as load changes, the time to recover does not.
The backup approaches taken by Interbase and Berkeley DB are similar in that they both allow online backup, but rather different in their affect on transactions running during backup. As Interbase performs backups as transactions , concurrent queries can suffer potentially long delays. Berkeley DB uses native operating system system utilities and recovery for backups, so there is no interference with concurrent activity, other than potential contention on disk arms.
There are a number of database vendors selling in the embedded market (e.g., Raima, Centura, Pervasive, Faircom), but none highlight the special requirements of embedded database applications. On the other end of the spectrum, the major vendors, Oracle, Sybase, Microsoft, are all becoming convinced of the importance of the embedded market. As mentioned earlier, Oracle has announced its Oracle Lite server for embedded use. Sybase has announced its UltraLite platform for "application-optimized, high-performance, SQL database engine for professional application developers building solutions for mobile and embedded platforms." . We believe that SQL is incompatible with the gizmo database environment or truly embedded systems for which Berkeley DB is most suitable. Microsoft research is taking a different approach, developing technology to assist in automating initial database design and index specification . As mentioned earlier, we believe that such configuration is, not only acceptable in the embedded market, but desirable so that applications can tune their database management for the target environment.
 Bernstein, P., Brodie, M., Ceri, S., DeWitt, D., Franklin, M., Garcia-Molina, H., Gray, J., Held, J., Hellerstein, J., Jagadish, H., Lesk, M., Maier, D., Naughton, J., Pirahesh, H., Stonebraker, M., Ullman, J., "The Asilomar Report on Database Research," SIGMOD Record 27(4): 74-80, 1998.
 Chaudhuri, S., Narasayya, V., "AutoAdmin 'What-If' Index Analysis Utility," Proceedings of the ACM SIGMOD Conference, Seattle, 1998.
 Chaudhuri, S., Narasayya, V., "An Efficient, Cost-Driver Index Selection Tool for Microsoft SQL Server," Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.
 Haerder, T., Reuter, A., "Principles of Transaction-Oriented Database Recovery," Computing Surveys 15,4 (1983), 237-318.
 Hostetler, M., "Cover Is Off A New Type of Database," Embedded DB News, http://www.theadvisors.com/embeddeddbnews.htm, 5/6/98.
 Interbase, "A Comparison of Borland InterBase 4.0 Sybase SQL Server and Microsoft SQL Server," http://web.interbase.com/products/doc_info_f.html.
 Oracle, "Oracle Delivers New Server, Application Suite to Power the Web for Mission-Critical Business," http://www.oracle.com.sg/partners/news/newserver.htm, May 1998.
 Sybase, Sybase UltraLite, http://www.sybase.com/products/ultralite/beta.
 Transaction Processing Council, "TPC-C Benchmark Specification, Version 3.4," San Jose, CA, August 1998.
 Transaction Processing Council, "TPC-D Benchmark Specification, Version 2.1," San Jose, CA, April 1999.