A few elements in the interface are specific and need an explanation.
- ipath
An ipath identifies an embedded document inside a standalone one (designated by a URL). The value, if needed, is stored along with the URL, but not indexed. It is accessible or set as a field in the Doc object.

ipaths are opaque values for the lower index layers (Doc object producers or consumers), and their use is up to the specific indexer. For example, the Recoll file system indexer uses the ipath to store the part of the document access path internal to (possibly nested) container documents. In this case, the ipath is a vector of access elements: for example, the first element could be a path inside a zip file to an archive member which happens to be an mbox file, the second element would be the sequential number of the message inside the mbox, and so on. The index itself has no knowledge of this hierarchical structure.

At the moment, only the filesystem indexer uses hierarchical ipaths (neither the Web nor the Joplin indexer does), and there are some assumptions in the upper software layers about their structure. For example, the Recoll GUI relies on the structure of filesystem indexer ipaths for functions such as opening the immediate parent of a given document.

url and ipath are returned in every search result and define the access to the original document. ipath is empty for top-level documents/files (e.g. a PDF document which is a filesystem file).
- udi
A udi (unique document identifier) identifies a document. Because of limitations inside the index engine, it is restricted in length (to 200 bytes). The structure and contents of the udi are defined by the application and opaque to the index engine. For example, the internal file system indexer uses the complete document path (file path + internal path), truncated to a maximum length, the suppressed part being replaced by a hash value to retain practical uniqueness.

To rephrase, and hopefully clarify: the filesystem indexer can't use the URL+ipath as a unique document-identifying term because this may be too long, so it derives a shorter udi from URL+ipath. Another indexer could use a completely different method. For example, the Joplin indexer uses the note ID.
- parent_udi
If this attribute is set on a document when entering it in the index, it designates its physical container document. In a multilevel hierarchy, this may not be the immediate parent. If the indexer uses the purge() method, then the use of parent_udi is mandatory for subdocuments. Otherwise it is optional, but its use by an indexer may simplify index maintenance, as Recoll will automatically delete all children defined by parent_udi == udi when the document designated by udi is destroyed. For example, if a Zip archive contains entries which are themselves containers, like mbox files, all the subdocuments inside the Zip file (mbox, messages, message attachments, etc.) would have the same parent_udi, matching the udi for the Zip file, and all would be destroyed when the Zip file (identified by its udi) is removed from the index.
- Stored and indexed fields
The fields file inside the Recoll configuration defines which document fields are either indexed (searchable), stored (retrievable with search results), or both. Apart from a few standard/internal fields, only the stored fields are retrievable through the Python search interface.
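
As an illustration of the udi entry above, here is a minimal sketch of how an external indexer might derive a length-bounded udi from a path and an ipath. It is not Recoll's actual algorithm: the helper name make_udi, the "|" separator and the choice of MD5 are assumptions made for the example. Only the 200-byte limit and the "truncate, then replace the suppressed part with a hash" idea come from the text.

```python
import hashlib

MAX_UDI_BYTES = 200  # length limit stated in the text


def make_udi(path: str, ipath: str = "", max_bytes: int = MAX_UDI_BYTES) -> str:
    """Hypothetical helper: build a udi of at most max_bytes bytes.

    Short identifiers are used unchanged. Long ones are truncated and the
    tail is replaced by a hash of the full value, so that two long paths
    sharing a prefix still get different udis.
    """
    full = f"{path}|{ipath}"  # the "|" separator is an assumption
    raw = full.encode("utf-8")
    if len(raw) <= max_bytes:
        return full
    digest = hashlib.md5(raw).hexdigest()  # 32 ASCII characters
    keep = max_bytes - len(digest)
    # errors="ignore" drops a multi-byte character cut by the truncation
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return prefix + digest


short_udi = make_udi("/home/me/report.pdf")
long_udi = make_udi("/very/long" * 60, "3")
```

An indexer with naturally short and unique identifiers (such as a note ID) would not need this step at all.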

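The parent_udi behaviour described above can be pictured with a toy in-memory model. This is only a sketch: ToyIndex and its methods are hypothetical stand-ins for the example and are not part of the Recoll Python interface. It shows why every subdocument of a Zip archive carries the archive's udi as parent_udi, whatever its depth: removing the archive then removes all of them in one pass.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for the index (not the Recoll API)."""

    def __init__(self):
        # Maps udi -> parent_udi (None for top-level documents)
        self.docs = {}

    def add(self, udi, parent_udi=None):
        self.docs[udi] = parent_udi

    def delete(self, udi):
        # Remove the document and every document whose parent_udi matches it
        doomed = [u for u, p in self.docs.items() if u == udi or p == udi]
        for u in doomed:
            del self.docs[u]
        return sorted(doomed)


index = ToyIndex()
zip_udi = "/data/mail.zip|"
index.add(zip_udi)
# The mbox and the message inside it both point at the Zip file,
# the physical container, not at their immediate parent.
index.add("/data/mail.zip|inbox.mbox", parent_udi=zip_udi)
index.add("/data/mail.zip|inbox.mbox|3", parent_udi=zip_udi)
index.add("/data/notes.txt|")

removed = index.delete(zip_udi)
```

After the call, only the unrelated top-level file remains in the toy index.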
