xapian-core
1.5.0
An indexed database of documents. More...
Public Member Functions | |
void | add_database (const Database &other) |
Add shards from another Database. More... | |
size_t | size () const |
Return number of shards in this Database object. | |
Database () | |
Construct a Database containing no shards. More... | |
Database (const std::string &path, int flags=0) | |
Open a Database. More... | |
Database (int fd, int flags=0) | |
Open a single-file Database. More... | |
virtual | ~Database () |
Destructor. | |
Database (const Database &o) | |
Copy constructor. More... | |
Database & | operator= (const Database &o) |
Assignment operator. More... | |
Database (Database &&o) | |
Move constructor. | |
Database & | operator= (Database &&o) |
Move assignment operator. | |
bool | reopen () |
Reopen the database at the latest available revision. More... | |
void | close () |
Close the database. More... | |
virtual std::string | get_description () const |
Return a string describing this object. | |
PostingIterator | postlist_begin (const std::string &term) const |
Start iterating the postings of a term. More... | |
PostingIterator | postlist_end (const std::string &) const noexcept |
End iterator corresponding to postlist_begin(). | |
TermIterator | termlist_begin (Xapian::docid did) const |
Start iterating the terms in a document. More... | |
TermIterator | termlist_end (Xapian::docid) const noexcept |
End iterator corresponding to termlist_begin(). | |
bool | has_positions () const |
Does this database have any positional information? | |
PositionIterator | positionlist_begin (Xapian::docid did, const std::string &term) const |
Start iterating positions for a term in a document. More... | |
PositionIterator | positionlist_end (Xapian::docid, const std::string &) const noexcept |
End iterator corresponding to positionlist_begin(). | |
TermIterator | allterms_begin (const std::string &prefix=std::string()) const |
Start iterating all terms in the database with a given prefix. More... | |
TermIterator | allterms_end (const std::string &=std::string()) const noexcept |
End iterator corresponding to allterms_begin(prefix). | |
Xapian::doccount | get_doccount () const |
Get the number of documents in the database. | |
Xapian::docid | get_lastdocid () const |
Get the highest document id which has been used in the database. | |
double | get_average_length () const |
Get the mean document length in the database. | |
double | get_avlength () const |
Old name for get_average_length() for backward compatibility. | |
Xapian::totallength | get_total_length () const |
Get the total length of all the documents in the database. More... | |
Xapian::doccount | get_termfreq (const std::string &term) const |
Get the number of documents indexed by a specified term. More... | |
bool | term_exists (const std::string &term) const |
Test if a particular term is present in any document. More... | |
Xapian::termcount | get_collection_freq (const std::string &term) const |
Get the total number of occurrences of a specified term. More... | |
Xapian::doccount | get_value_freq (Xapian::valueno slot) const |
Return the frequency of a given value slot. More... | |
std::string | get_value_lower_bound (Xapian::valueno slot) const |
Get a lower bound on the values stored in the given value slot. More... | |
std::string | get_value_upper_bound (Xapian::valueno slot) const |
Get an upper bound on the values stored in the given value slot. More... | |
Xapian::termcount | get_doclength_lower_bound () const |
Get a lower bound on the length of a document in this DB. More... | |
Xapian::termcount | get_doclength_upper_bound () const |
Get an upper bound on the length of a document in this DB. | |
Xapian::termcount | get_wdf_upper_bound (const std::string &term) const |
Get an upper bound on the wdf of term term. | |
Xapian::termcount | get_unique_terms_lower_bound () const |
Get a lower bound on the unique terms size of a document in this DB. | |
Xapian::termcount | get_unique_terms_upper_bound () const |
Get an upper bound on the unique terms size of a document in this DB. | |
ValueIterator | valuestream_begin (Xapian::valueno slot) const |
Return an iterator over the value in slot slot for each document. | |
ValueIterator | valuestream_end (Xapian::valueno) const noexcept |
Return end iterator corresponding to valuestream_begin(). | |
Xapian::termcount | get_doclength (Xapian::docid did) const |
Get the length of a document. More... | |
Xapian::termcount | get_unique_terms (Xapian::docid did) const |
Get the number of unique terms in a document. More... | |
Xapian::termcount | get_wdfdocmax (Xapian::docid did) const |
void | keep_alive () |
Send a keep-alive message. More... | |
Xapian::Document | get_document (Xapian::docid did, unsigned flags=0) const |
Get a document from the database. More... | |
std::string | get_spelling_suggestion (const std::string &word, unsigned max_edit_distance=2) const |
Suggest a spelling correction. More... | |
Xapian::TermIterator | spellings_begin () const |
An iterator which returns all the spelling correction targets. More... | |
Xapian::TermIterator | spellings_end () const noexcept |
End iterator corresponding to spellings_begin(). | |
Xapian::TermIterator | synonyms_begin (const std::string &term) const |
An iterator which returns all the synonyms for a given term. More... | |
Xapian::TermIterator | synonyms_end (const std::string &) const noexcept |
End iterator corresponding to synonyms_begin(term). | |
Xapian::TermIterator | synonym_keys_begin (const std::string &prefix=std::string()) const |
An iterator which returns all terms which have synonyms. More... | |
Xapian::TermIterator | synonym_keys_end (const std::string &=std::string()) const noexcept |
End iterator corresponding to synonym_keys_begin(prefix). | |
std::string | get_metadata (const std::string &key) const |
Get the user-specified metadata associated with a given key. More... | |
Xapian::TermIterator | metadata_keys_begin (const std::string &prefix=std::string()) const |
An iterator which returns all user-specified metadata keys. More... | |
Xapian::TermIterator | metadata_keys_end (const std::string &=std::string()) const noexcept |
End iterator corresponding to metadata_keys_begin(). | |
std::string | get_uuid () const |
Get a UUID for the database. More... | |
bool | locked () const |
Test if this database is currently locked for writing. More... | |
Xapian::WritableDatabase | lock (int flags=0) |
Lock a read-only database for writing. More... | |
Xapian::Database | unlock () |
Release a database write lock. More... | |
Xapian::rev | get_revision () const |
Get the revision of the database. More... | |
void | compact (const std::string &output, unsigned flags=0, int block_size=0) |
Produce a compact version of this database. More... | |
void | compact (int fd, unsigned flags=0, int block_size=0) |
Produce a compact version of this database. More... | |
void | compact (const std::string &output, unsigned flags, int block_size, Xapian::Compactor &compactor) |
Produce a compact version of this database. More... | |
void | compact (int fd, unsigned flags, int block_size, Xapian::Compactor &compactor) |
Produce a compact version of this database. More... | |
std::string | reconstruct_text (Xapian::docid did, size_t length=0, const std::string &prefix=std::string(), Xapian::termpos start_pos=0, Xapian::termpos end_pos=0) const |
Reconstruct document text. More... | |
Static Public Member Functions | |
static size_t | check (const std::string &path, int opts=0, std::ostream *out=NULL) |
Check the integrity of a database or database table. More... | |
static size_t | check (int fd, int opts=0, std::ostream *out=NULL) |
Check the integrity of a single file database. More... | |
An indexed database of documents.
A Database object contains zero or more shards, and operations are performed across these shards.
To perform a search on a Database, you need to use an Enquire object.
Most methods can throw:
Xapian::DatabaseCorruptError | if database corruption is detected |
Xapian::DatabaseError | in various situations (for example, if there's an I/O error). |
Xapian::DatabaseModifiedError | if the revision being read has been discarded |
Xapian::DatabaseClosedError | may be thrown by some methods after close() has been called |
Xapian::NetworkError | when remote databases are in use |
Xapian::Database::Database | ( | ) |
Construct a Database containing no shards.
You can then add shards by calling add_database(). A Database containing no shards can also be useful in situations where you need an empty database.
Xapian::Database::Database | ( | const std::string & | path, |
int | flags = 0 |
||
) | explicit |
Open a Database.
path | Filing system path to open database from |
flags | Bitwise-or of Xapian::DB_* constants |
The path can be a file (for a stub database or a single-file glass database) or a directory (for a standard glass database). If flags includes DB_BACKEND_INMEMORY then path is ignored.
Xapian::DatabaseOpeningError | if the specified database cannot be opened |
Xapian::DatabaseVersionError | if the specified database has a format too old or too new to be supported. |
Xapian::Database::Database | ( | int | fd, |
int | flags = 0 |
||
) | explicit |
Open a single-file Database.
This method opens a single-file Database given a file descriptor open on it. Xapian starts reading at the current file offset, which allows a single-file database to be easily embedded within another file.
fd | File descriptor for the file. Xapian takes ownership of this and will close it when the database is closed. |
flags | Bitwise-or of Xapian::DB_* constants. |
Xapian::DatabaseOpeningError | if the specified database cannot be opened |
Xapian::DatabaseVersionError | if the specified database has a format too old or too new to be supported. |
Xapian::Database::Database | ( | const Database & | o | ) |
Copy constructor.
The internals are reference counted, so copying is cheap.
void Xapian::Database::add_database | ( | const Database & | other | ) | inline |
Add shards from another Database.
Any shards in other are appended to the list of shards in this object. The shards are reference counted and also remain in other.
other | Another Database to add shards from |
Xapian::InvalidArgumentError | if other is the same object as this. |
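The sharing behaviour described above can be sketched with a toy model. This is not Xapian's implementation: Shard, ShardList and add_new_shard() are hypothetical names, std::shared_ptr stands in for the internal reference counting, and std::invalid_argument stands in for Xapian::InvalidArgumentError.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one shard; only a document count is modelled.
struct Shard {
    unsigned doc_count;
};

// Toy model of a Database handle holding reference-counted shards.
class ShardList {
    std::vector<std::shared_ptr<Shard>> shards_;

public:
    // Create a new shard which is initially referenced by this handle only.
    void add_new_shard(unsigned doc_count) {
        shards_.push_back(std::make_shared<Shard>(Shard{doc_count}));
    }

    // Append the other handle's shards; they remain in `other` as well.
    void add_database(const ShardList& other) {
        if (&other == this)
            throw std::invalid_argument("cannot add a database to itself");
        shards_.insert(shards_.end(), other.shards_.begin(), other.shards_.end());
    }

    std::size_t size() const { return shards_.size(); }

    // Operations are performed across all shards, e.g. a total document count.
    unsigned get_doccount() const {
        unsigned total = 0;
        for (const auto& shard : shards_) total += shard->doc_count;
        return total;
    }
};
```

In this model, adding a two-shard handle to a one-shard handle gives a handle with three shards, while the source handle still reports two.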
TermIterator Xapian::Database::allterms_begin | ( | const std::string & | prefix = std::string() | ) | const |
Start iterating all terms in the database with a given prefix.
prefix | The prefix to restrict the returned terms to (default: iterate all terms) |
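static size_t Xapian::Database::check | ( | const std::string & | path, |
int | opts = 0 , |
||
std::ostream * | out = NULL |
||
) | inline static |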
Check the integrity of a database or database table.
path | Path to database or table |
opts | Options to use for check |
out | std::ostream to write output to (NULL for no output) |
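static size_t Xapian::Database::check | ( | int | fd, |
int | opts = 0 , |
||
std::ostream * | out = NULL |
||
) | inline static |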
Check the integrity of a single file database.
fd | file descriptor for the database. The current file offset is used, allowing checking a single file database which is embedded within another file. Xapian takes ownership of the file descriptor and will close it before returning. |
opts | Options to use for check |
out | std::ostream to write output to (NULL for no output) |
void Xapian::Database::close | ( | ) |
Close the database.
This closes the database and closes all its file handles.
For a WritableDatabase, if a transaction is active it will be aborted, while if no transaction is active commit() will be implicitly called. Also the write lock is released.
Calling close() on an object cannot be undone - in particular, a subsequent call to reopen() on the same object will not reopen it, but will instead throw a Xapian::DatabaseClosedError exception.
Calling close() again on an object which has already been closed has no effect (and doesn't raise an exception).
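After close() has been called, calls to other methods of the database, and to methods of other objects associated with the database, will either behave exactly as they would have done if the database had not been closed (which can only happen if all the required data is already cached) or throw a Xapian::DatabaseClosedError exception.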
The reason for this behaviour is that otherwise we'd have to check that the database is still open on every method call on every object associated with a Database, when in many cases they are working on data which has already been loaded and so they are able to just behave correctly.
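void Xapian::Database::compact | ( | const std::string & | output, |
unsigned | flags, |
||
int | block_size, |
||
Xapian::Compactor & | compactor |
||
) | inline |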
Produce a compact version of this database.
The compactor functor allows handling progress output and specifying how user metadata is merged.
output | Path to write the compact version to. This can be the same as an input if that input is a stub database (in which case the database(s) listed in the stub will be compacted to a new database and then the stub will be atomically updated to point to this new database). |
flags | Any of the following combined using bitwise-or (| in C++):
|
block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
compactor | Functor |
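The block_size rule for glass can be expressed as a small check. The following is only a sketch of the rule as documented here; effective_block_size() is a hypothetical helper, not part of the Xapian API.

```cpp
// Return the block size which would be used for a requested value,
// following the documented glass rule: a power of 2 between 2048 and
// 65536 (inclusive) is accepted, anything else falls back to 8192.
int effective_block_size(int requested) {
    const int default_size = 8192;
    // A positive integer is a power of 2 if exactly one bit is set.
    bool power_of_two = requested > 0 && (requested & (requested - 1)) == 0;
    if (power_of_two && requested >= 2048 && requested <= 65536)
        return requested;
    return default_size;
}
```

Under this rule the default argument of 0 selects 8192, as does a value such as 3000 or 131072.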
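void Xapian::Database::compact | ( | const std::string & | output, |
unsigned | flags = 0 , |
||
int | block_size = 0 |
||
) | inline |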
Produce a compact version of this database.
output | Path to write the compact version to. This can be the same as an input if that input is a stub database (in which case the database(s) listed in the stub will be compacted to a new database and then the stub will be atomically updated to point to this new database). |
flags | Any of the following combined using bitwise-or (| in C++):
|
block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
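void Xapian::Database::compact | ( | int | fd, |
unsigned | flags, |
||
int | block_size, |
||
Xapian::Compactor & | compactor |
||
) | inline |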
Produce a compact version of this database.
The compactor functor allows handling progress output and specifying how user metadata is merged.
This variant writes a single-file database to the specified file descriptor. Only the glass backend supports such databases, so this form is only supported for this backend.
fd | File descriptor to write the compact version to. The descriptor needs to be readable and writable (open with O_RDWR) and seekable. The current file offset is used, allowing compacting to a single file database embedded within another file. Xapian takes ownership of the file descriptor and will close it before returning. |
flags | Any of the following combined using bitwise-or (| in C++):
|
block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
compactor | Functor |
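void Xapian::Database::compact | ( | int | fd, |
unsigned | flags = 0 , |
||
int | block_size = 0 |
||
) | inline |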
Produce a compact version of this database.
This variant writes a single-file database to the specified file descriptor. Only the glass backend supports such databases, so this form is only supported for this backend.
fd | File descriptor to write the compact version to. The descriptor needs to be readable and writable (open with O_RDWR) and seekable. The current file offset is used, allowing compacting to a single file database embedded within another file. Xapian takes ownership of the file descriptor and will close it before returning. |
flags | Any of the following combined using bitwise-or (| in C++):
|
block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
Xapian::termcount Xapian::Database::get_collection_freq | ( | const std::string & | term | ) | const |
Get the total number of occurrences of a specified term.
The collection frequency of a term is defined as the total number of times it occurs in the database, which is the sum of its wdf in all the documents it indexes.
term | The term to get the collection frequency of. An empty string acts as a special pseudo-term which indexes all the documents in the database, so returns get_doccount(). If the term isn't present in the database, 0 is returned. |
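To make the difference between term frequency and collection frequency concrete, here is a toy model in which each document is a map from term to wdf. ToyDocument, ToyIndex, term_freq() and collection_freq() are hypothetical names used for illustration only; Xapian stores and computes these statistics differently.

```cpp
#include <map>
#include <string>
#include <vector>

// Each document is modelled as a map from term to its wdf in that document.
using ToyDocument = std::map<std::string, unsigned>;
using ToyIndex = std::vector<ToyDocument>;

// Number of documents indexed by the term.
unsigned term_freq(const ToyIndex& index, const std::string& term) {
    // The empty pseudo-term indexes every document.
    if (term.empty()) return static_cast<unsigned>(index.size());
    unsigned count = 0;
    for (const auto& doc : index) {
        if (doc.count(term) != 0) ++count;
    }
    return count;
}

// Sum of the term's wdf over all the documents it indexes.
unsigned collection_freq(const ToyIndex& index, const std::string& term) {
    // The empty pseudo-term has a wdf of 1 in every document.
    if (term.empty()) return static_cast<unsigned>(index.size());
    unsigned total = 0;
    for (const auto& doc : index) {
        auto it = doc.find(term);
        if (it != doc.end()) total += it->second;
    }
    return total;
}
```

A term which occurs twice in one document and three times in another has a term frequency of 2 and a collection frequency of 5; a term which is absent gives 0 for both.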
Xapian::termcount Xapian::Database::get_doclength | ( | Xapian::docid | did | ) | const |
Get the length of a document.
did | The document id of the document |
Xapian defines a document's length as the sum of the wdf of all the terms which index it.
Xapian::termcount Xapian::Database::get_doclength_lower_bound | ( | ) | const |
Get a lower bound on the length of a document in this DB.
This bound does not include any zero-length documents.
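The definition of document length, and the way zero-length documents are left out of the lower bound, can be sketched as follows. WdfMap, doc_length() and doclength_lower_bound() are hypothetical names. The sketch returns the tightest possible bound (the smallest non-zero length), which is an assumption of this model: a real backend only promises a valid bound.

```cpp
#include <map>
#include <string>
#include <vector>

// A document modelled as a map from term to its wdf in that document.
using WdfMap = std::map<std::string, unsigned>;

// Document length: the sum of the wdf of all terms indexing the document.
unsigned doc_length(const WdfMap& doc) {
    unsigned length = 0;
    for (const auto& entry : doc) length += entry.second;
    return length;
}

// Smallest non-zero document length, or 0 if every document is empty.
unsigned doclength_lower_bound(const std::vector<WdfMap>& docs) {
    unsigned bound = 0;
    for (const auto& doc : docs) {
        unsigned length = doc_length(doc);
        if (length == 0) continue;  // zero-length documents are ignored
        if (bound == 0 || length < bound) bound = length;
    }
    return bound;
}
```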
Xapian::Document Xapian::Database::get_document | ( | Xapian::docid | did, |
unsigned | flags = 0 |
||
) | const |
Get a document from the database.
The returned object acts as a handle which lazily fetches information about the specified document from the database.
did | The document ID of the document to get |
flags | Zero or more flags bitwise-or-ed together (currently only Xapian::DOC_ASSUME_VALID is supported). (default: 0) |
Xapian::InvalidArgumentError | is thrown if did is 0. |
Xapian::DocNotFoundError | is thrown if the specified docid is not present in this database. |
std::string Xapian::Database::get_metadata | ( | const std::string & | key | ) | const |
Get the user-specified metadata associated with a given key.
User-specified metadata allows you to store arbitrary information in the form of (key, value) pairs. See WritableDatabase::set_metadata() for more information.
When invoked on a Xapian::Database object representing multiple databases, currently only the metadata for the first is considered but this behaviour may change in the future.
If there is no piece of metadata associated with the specified key, an empty string is returned (this applies even for backends which don't support metadata).
Empty keys are not valid, and specifying one will cause an exception.
key | The key of the metadata item to access. |
Xapian::InvalidArgumentError | will be thrown if the key supplied is empty. |
Xapian::rev Xapian::Database::get_revision | ( | ) | const |
Get the revision of the database.
The revision is an unsigned integer which increases with each commit.
The database must have exactly one sub-database, which must be of type glass. Otherwise an exception will be thrown.
Experimental - see https://xapian.org/docs/deprecation#experimental-features
std::string Xapian::Database::get_spelling_suggestion | ( | const std::string & | word, |
unsigned | max_edit_distance = 2 |
||
) | const |
Suggest a spelling correction.
word | The potentially misspelled word. |
max_edit_distance | Only consider words which are at most max_edit_distance edits from word. An edit is a character insertion, deletion, or the transposition of two adjacent characters (default is 2). |
Xapian::doccount Xapian::Database::get_termfreq | ( | const std::string & | term | ) | const |
Get the number of documents indexed by a specified term.
term | The term to get the frequency of. An empty string acts as a special pseudo-term which indexes all the documents in the database, so returns get_doccount(). If the term isn't present in the database, 0 is returned. |
Xapian::totallength Xapian::Database::get_total_length | ( | ) | const |
Get the total length of all the documents in the database.
Xapian::termcount Xapian::Database::get_unique_terms | ( | Xapian::docid | did | ) | const |
Get the number of unique terms in a document.
did | The document id of the document |
This is the number of different terms which index the given document.
std::string Xapian::Database::get_uuid | ( | ) | const |
Get a UUID for the database.
The UUID will persist for the lifetime of the database.
Replicas (e.g. made with the replication protocol, or by copying all the database files) will have the same UUID. However, copies (made with copydatabase, or xapian-compact) will have different UUIDs.
If the backend does not support UUIDs or this database has no subdatabases, the UUID will be empty.
If this database has multiple sub-databases, the UUID string will contain the UUIDs of all the sub-databases separated by colons.
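A caller which needs the individual sub-database UUIDs can split the returned string on colons. The helper below is a minimal sketch; split_uuids() is a hypothetical name, and it assumes that an individual UUID never contains a colon itself.

```cpp
#include <string>
#include <vector>

// Split a colon-separated UUID string into one entry per sub-database.
// An empty input (no UUID support or no sub-databases) gives an empty list.
std::vector<std::string> split_uuids(const std::string& combined) {
    std::vector<std::string> uuids;
    if (combined.empty()) return uuids;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type colon = combined.find(':', start);
        if (colon == std::string::npos) {
            // Last (or only) UUID in the string.
            uuids.push_back(combined.substr(start));
            break;
        }
        uuids.push_back(combined.substr(start, colon - start));
        start = colon + 1;
    }
    return uuids;
}
```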
Xapian::doccount Xapian::Database::get_value_freq | ( | Xapian::valueno | slot | ) | const |
Return the frequency of a given value slot.
This is the number of documents which have a (non-empty) value stored in the slot.
slot | The value slot to examine. |
std::string Xapian::Database::get_value_lower_bound | ( | Xapian::valueno | slot | ) | const |
Get a lower bound on the values stored in the given value slot.
If there are no values stored in the given value slot, this will return an empty string.
slot | The value slot to examine. |
std::string Xapian::Database::get_value_upper_bound | ( | Xapian::valueno | slot | ) | const |
Get an upper bound on the values stored in the given value slot.
If there are no values stored in the given value slot, this will return an empty string.
slot | The value slot to examine. |
void Xapian::Database::keep_alive | ( | ) |
Send a keep-alive message.
For remote databases, this method sends a message to the server to reset the timeout timer. As well as preventing timeouts at the Xapian remote protocol level, this message will also avoid timeouts at lower levels.
For local databases, this method does nothing.
Xapian::WritableDatabase Xapian::Database::lock | ( | int | flags = 0 | ) |
Lock a read-only database for writing.
If the database is actually already writable (i.e. a WritableDatabase via a Database reference) then the same database is returned (with its flags updated, so this provides an efficient way to modify flags on an open WritableDatabase).
Unlike unlock(), the object this is called on remains open.
flags | The flags to use for the writable database. Flags which specify how to open the database are ignored (e.g. DB_CREATE_OR_OVERWRITE doesn't result in the database being wiped), and flags which specify the backend are also ignored as they are only relevant when creating a new database. |
bool Xapian::Database::locked | ( | ) | const |
Test if this database is currently locked for writing.
If the underlying object is actually a WritableDatabase, always returns true unless close() has been called.
Otherwise tests if there's a writer holding the lock (or if we can't test for a lock without taking it on the current platform, throw Xapian::UnimplementedError). If there's an error while trying to test the lock, throws Xapian::DatabaseLockError.
For multi-databases, this tests each sub-database and returns true if any of them are locked.
Xapian::TermIterator Xapian::Database::metadata_keys_begin | ( | const std::string & | prefix = std::string() | ) | const |
An iterator which returns all user-specified metadata keys.
When invoked on a Xapian::Database object representing multiple databases, currently only the metadata for the first is considered but this behaviour may change in the future.
If the backend doesn't support metadata, then this method returns an iterator which compares equal to that returned by metadata_keys_end().
prefix | If non-empty, only keys with this prefix are returned. |
Xapian::UnimplementedError | will be thrown if the backend implements user-specified metadata, but doesn't implement iterating its keys (currently this happens for the InMemory backend). |
Database & Xapian::Database::operator= | ( | const Database & | o | ) |
Assignment operator.
The internals are reference counted, so assignment is cheap.
Referenced by Xapian::WritableDatabase::operator=().
PositionIterator Xapian::Database::positionlist_begin | ( | Xapian::docid | did, |
const std::string & | term | ||
) | const |
Start iterating positions for a term in a document.
did | The document id of the document |
term | The term |
PostingIterator Xapian::Database::postlist_begin | ( | const std::string & | term | ) | const |
Start iterating the postings of a term.
term | The term to iterate the postings of. An empty string acts as a special pseudo-term which indexes all the documents in the database with a wdf of 1. |
std::string Xapian::Database::reconstruct_text | ( | Xapian::docid | did, |
size_t | length = 0 , |
||
const std::string & | prefix = std::string() , |
||
Xapian::termpos | start_pos = 0 , |
||
Xapian::termpos | end_pos = 0 |
||
) | const |
Reconstruct document text.
This uses term positional information to reconstruct the document text which was indexed. Reading the required positional information is potentially quite I/O intensive.
The reconstructed text will be missing punctuation and most capitalisation.
did | The document id of the document to reconstruct |
length | Number of bytes of text to aim for - note that slightly more may be returned (default: 0 meaning unlimited) |
prefix | Term prefix to reconstruct (default: none) |
start_pos | First position to reconstruct (default: 0) |
end_pos | Last position to reconstruct (default: 0 meaning all) |
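The idea of rebuilding text from positional information can be illustrated with a toy model which inverts a term-to-positions map. This is not Xapian's algorithm: reconstruct_from_positions() is a hypothetical function, prefix handling is omitted, and terms are simply joined with single spaces.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Rebuild text from a map of term -> positions at which the term occurs.
// length == 0 means unlimited; end_pos == 0 means no upper limit.
std::string reconstruct_from_positions(
    const std::map<std::string, std::vector<unsigned>>& term_positions,
    std::size_t length = 0, unsigned start_pos = 0, unsigned end_pos = 0) {
    // Invert to position -> term; std::map keeps the positions sorted.
    std::map<unsigned, std::string> by_position;
    for (const auto& entry : term_positions) {
        for (unsigned pos : entry.second) {
            if (pos < start_pos) continue;
            if (end_pos != 0 && pos > end_pos) continue;
            by_position[pos] = entry.first;
        }
    }
    std::string text;
    for (const auto& slot : by_position) {
        // Stop once the target is reached; the last term added may
        // take the result slightly over the requested length.
        if (length != 0 && text.size() >= length) break;
        if (!text.empty()) text += ' ';
        text += slot.second;
    }
    return text;
}
```

Because only terms and positions are available, the output of such a reconstruction has no punctuation, which matches the limitation noted above.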
bool Xapian::Database::reopen | ( | ) |
Reopen the database at the latest available revision.
Xapian databases (at least with most backends) support versioning such that a Database object uses a snapshot of the database. However, write operations may cause this snapshot to be discarded, which can cause Xapian::DatabaseModifiedError to be thrown. You can recover from this situation by calling reopen() and restarting the search operation.
All shards are updated to the latest available revision. This should be a cheap operation if they're already at the latest revision, so if you're using the same Database object for many searches it's reasonable to call reopen() before each search.
Xapian::DatabaseError | is thrown if close() has been called on any of the shards. |
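The recover-and-retry pattern described above can be written as a small generic helper. In this sketch, retry_after_reopen(), StaleSnapshot, FakeDb and search_once() are hypothetical names. The stand-in types keep the example self-contained; with Xapian the exception type would be Xapian::DatabaseModifiedError and the handle a Xapian::Database.

```cpp
#include <stdexcept>

// Run op(db); if it throws ModifiedError, call db.reopen() and try again,
// giving up (and rethrowing) after max_attempts attempts.
template <typename ModifiedError, typename Db, typename Op>
auto retry_after_reopen(Db& db, Op op, int max_attempts = 3)
    -> decltype(op(db)) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op(db);
        } catch (const ModifiedError&) {
            if (attempt >= max_attempts) throw;
            db.reopen();
        }
    }
}

// Stand-in for the "revision has been discarded" exception.
struct StaleSnapshot : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in database handle which is one revision behind the writer.
struct FakeDb {
    int revision = 1;      // revision this handle currently reads
    int latest = 2;        // latest revision committed by a writer
    int reopen_calls = 0;  // how many times reopen() was needed
    void reopen() {
        revision = latest;
        ++reopen_calls;
    }
};

// Stand-in search which fails while the handle reads a stale revision.
int search_once(FakeDb& db) {
    if (db.revision != db.latest) throw StaleSnapshot("revision discarded");
    return 42;  // arbitrary result for illustration
}
```

A call such as retry_after_reopen<StaleSnapshot>(db, search_once) fails once in this model, reopens the handle, and then succeeds. Bounding the number of attempts avoids looping forever if a writer keeps committing faster than the search can complete.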
Xapian::TermIterator Xapian::Database::spellings_begin | ( | ) | const |
An iterator which returns all the spelling correction targets.
This returns all the words which are considered as targets for the spelling correction algorithm. The frequency of each word is available as the term frequency of each entry in the returned iterator.
Xapian::TermIterator Xapian::Database::synonym_keys_begin | ( | const std::string & | prefix = std::string() | ) | const |
An iterator which returns all terms which have synonyms.
prefix | If non-empty, only terms with this prefix are returned. |
Xapian::TermIterator Xapian::Database::synonyms_begin | ( | const std::string & | term | ) | const |
An iterator which returns all the synonyms for a given term.
term | The term to return synonyms for. |
bool Xapian::Database::term_exists | ( | const std::string & | term | ) | const |
Test if a particular term is present in any document.
term | The term to test for. An empty string acts as a special pseudo-term which indexes all the documents in the database, so returns true if the database contains any documents. |
db.term_exists(t) gives the same answer as db.get_termfreq(t) != 0, but is typically more efficient.
TermIterator Xapian::Database::termlist_begin | ( | Xapian::docid | did | ) | const |
Start iterating the terms in a document.
did | The document id to iterate terms from |
The terms are returned in ascending string order (by byte value).
Xapian::Database Xapian::Database::unlock | ( | ) |
Release a database write lock.
If called on a read-only database then the same database is returned.
If called on a writable database, the object this method was called on is closed.