Mindoo Blog - Cutting edge technologies - About Java, Lotus Notes and iPhone

  • Overview of Domino Data Retrieval: Exploring NSF search, DQL, Domino Views and the QueryResultsProcessor

    Karsten Lehmann  13 July 2024 23:29:18
    As you read in the previous article "The pain of reading data as a Domino developer - and solutions", looking up data on Domino is not as easy as it seems - especially compared to other platforms like SQL. Let's explore the available options.

    NSF search with formula


    For many years, formula language was the only universal search language on the platform. However, searching a database this way (method lotus.domino.Database.search(String formula, DateTime dt, int max)) is slow, since every document has to be scanned and evaluated against the formula. The result, a DocumentCollection, is said to have "no particular order", but this isn't entirely accurate. The underlying C data structure of the returned DocumentCollection is an IDTable containing note IDs. To store it efficiently, early Domino core developers decided to sort the note IDs in ascending order.
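
    The ascending order falls out of the data structure. As a minimal sketch in plain Java (not the Domino API; the class SortedIdTableSketch is hypothetical, and a TreeSet only approximates the real IDTable storage format), a sorted set of note IDs returns its content in ascending order no matter in which order the IDs were inserted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an IDTable: a sorted, duplicate-free set of note IDs.
class SortedIdTableSketch {
    private final TreeSet<Integer> noteIds = new TreeSet<>();

    void insert(int noteId) {
        noteIds.add(noteId); // duplicates are silently ignored
    }

    boolean contains(int noteId) {
        return noteIds.contains(noteId);
    }

    // Iteration order of a TreeSet is ascending, independent of insertion order.
    List<Integer> toList() {
        return new ArrayList<>(noteIds);
    }

    public static void main(String[] args) {
        SortedIdTableSketch table = new SortedIdTableSketch();
        for (int id : new int[] {2302, 2306, 2294, 2306}) {
            table.insert(id);
        }
        System.out.println(table.toList());
    }
}
```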


    For more on the storage format of IDTables, refer to this interesting knowledge base article:

    https://support.hcltechsw.com/csm?id=kb_article&sysparm_article=KB0026700

    In data processing, the order in which documents are returned by the search call (NSFSearch in the C API) does not matter. You search the entire database and process it (one by one or quickly with one of the stamp methods). To improve performance, the NSFSearch call returns a TIMEDATE value that you can pass to subsequent calls to fetch only the documents created, changed, or deleted since that time, along with the information whether each of them still matches the selection formula or no longer does.
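
    To illustrate the incremental pattern, here is a small plain-Java model. It does not call NSFSearch; the Doc and Result classes, the long timestamps standing in for TIMEDATE values, and the search method are all assumptions made for this sketch. Each call only looks at documents changed after the "since" value and reports an "until" value for the next call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of an incremental search; all names here are hypothetical.
class IncrementalSearchSketch {

    static final class Doc {
        final int noteId;
        final long modified;   // last change time (stand-in for a TIMEDATE)
        final boolean deleted; // true for a deleted document
        final String form;

        Doc(int noteId, long modified, boolean deleted, String form) {
            this.noteId = noteId;
            this.modified = modified;
            this.deleted = deleted;
            this.form = form;
        }
    }

    static final class Result {
        final List<Integer> matching = new ArrayList<>();         // new or changed, matches the formula
        final List<Integer> noLongerMatching = new ArrayList<>(); // changed, does not match
        final List<Integer> deleted = new ArrayList<>();          // deleted since the last run
        long until;                                               // pass as "since" to the next call
    }

    static Result search(List<Doc> db, Predicate<Doc> selection, long since, long now) {
        Result result = new Result();
        for (Doc doc : db) {
            if (doc.modified <= since) {
                continue; // unchanged since the previous run
            }
            if (doc.deleted) {
                result.deleted.add(doc.noteId);
            } else if (selection.test(doc)) {
                result.matching.add(doc.noteId);
            } else {
                result.noLongerMatching.add(doc.noteId);
            }
        }
        result.until = now;
        return result;
    }

    public static void main(String[] args) {
        List<Doc> db = new ArrayList<>();
        db.add(new Doc(2294, 10, false, "Person"));
        db.add(new Doc(2298, 20, false, "Person"));
        Predicate<Doc> isPerson = d -> "Person".equals(d.form);

        Result full = search(db, isPerson, 0, 100);
        System.out.println("full search: " + full.matching);

        db.set(1, new Doc(2298, 150, false, "Group"));  // changed, no longer a match
        db.add(new Doc(2302, 160, false, "Person"));    // newly created match
        Result delta = search(db, isPerson, full.until, 200);
        System.out.println("new or changed matches: " + delta.matching);
        System.out.println("no longer matching: " + delta.noLongerMatching);
    }
}
```

    A sync job would apply "matching" as upserts and remove the IDs in "noLongerMatching" and "deleted" from its external copy.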


    Normal NSFSearch calls return the whole summary buffer for each matching document, which can contain a lot of irrelevant information. However, by using a special undocumented format for the compiled search formula (with merged column formulas), some undocumented flags, an undocumented item name ("$C1$"), and an undocumented NSFSearchExtended3 call, things get more interesting.


    With this method, the search operation only copies the summary buffer data you are interested in, speeding up the search. The item "$C1$" contains the document's readers list, showing who can view a document. If readers are present, it also includes the document's authors. This means there's no need to manually extract this data in your own code; the NSF search operation handles it.


    Our open-source project, Domino JNA, utilizes this powerful search method in the com.mindoo.domino.jna.NotesSearch class, which is an excellent tool for syncing Domino data with external systems or building custom indexes.

    These undocumented flags, item and calls are the magic behind Domino views:


    Domino views


    End users typically want tabular data sorted by one or more columns, not data sorted by note ID. Domino views are designed for this purpose. They consist of a search formula to select relevant data and column information to visualize the data in a specific order.



    Domino views provide multi-level categories for drilling down into data, which can be expanded/collapsed, and where sums and averages are aggregated (e.g., costs per team, costs per department, costs for a whole company). The view index represents a persistent and always up-to-date formula search result of one NSF database. Only one instance of a specific view index exists on the server. Since view entries contain readers information, the Notes Indexing Facility (NIF) skips rows that a user is not allowed to see by comparing their personal user names list with the allowed readers of a document.


    String comparison takes time, so if you have a database with 1 million documents and you can only see 10, traversing a view is not fast because 999,990 rows have to be skipped. Domino 14 includes optimizations for these edge cases, such as maintaining an IDTable per user with document note IDs they are allowed to see. This allows the NIF code to skip string comparison and just check if the IDTable contains the note ID. However, these optimizations are not enabled by default for all users.
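
    The difference between the two checks can be sketched in plain Java. This is not NIF code; the Row class and both methods are hypothetical, and real name matching in Domino is more involved than an exact string lookup. The point is only that a precomputed set of readable note IDs turns the per-row name comparison into a single membership test:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of view rows with reader lists; not the NIF implementation.
class ReaderFilterSketch {

    static final class Row {
        final int noteId;
        final Set<String> readers; // empty set: no reader restriction

        Row(int noteId, String... readers) {
            this.noteId = noteId;
            this.readers = new HashSet<>(Arrays.asList(readers));
        }
    }

    // Per-row check: compare every entry of the user's names list with the row's readers.
    static boolean canRead(Row row, Set<String> userNamesList) {
        if (row.readers.isEmpty()) {
            return true;
        }
        for (String name : userNamesList) {
            if (row.readers.contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Computed once per user: the note IDs this user is allowed to see.
    static Set<Integer> buildReadableIds(List<Row> rows, Set<String> userNamesList) {
        Set<Integer> ids = new HashSet<>();
        for (Row row : rows) {
            if (canRead(row, userNamesList)) {
                ids.add(row.noteId);
            }
        }
        return ids;
    }

    // Later traversals only test note ID membership, no string comparison.
    static List<Integer> traverse(List<Row> rows, Set<Integer> readableIds) {
        List<Integer> visible = new ArrayList<>();
        for (Row row : rows) {
            if (readableIds.contains(row.noteId)) {
                visible.add(row.noteId);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(2294, "CN=Alice/O=Acme"),
            new Row(2298),
            new Row(2302, "[Managers]"));
        Set<String> bob = new HashSet<>(Arrays.asList("CN=Bob/O=Acme", "[Managers]"));
        Set<Integer> readable = buildReadableIds(rows, bob);
        System.out.println(traverse(rows, readable));
    }
}
```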


    Domino view columns can display document item content, values computed via formula language (e.g., Lastname + ", " + Firstname), or special values like child and descendant counts read from the view index itself.
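
    As a rough illustration of what a categorized index with totals holds, the following plain-Java sketch builds a two-level category tree (department, then team) and keeps a cost total and a document count per category. The classes are hypothetical and say nothing about how NIF stores its index; the count here includes documents only, not subcategories:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-level categorized "view" with totals; not the NIF index format.
class CategorizedViewSketch {

    static final class Doc {
        final String department;
        final String team;
        final long cost;

        Doc(String department, String team, long cost) {
            this.department = department;
            this.team = team;
            this.cost = cost;
        }
    }

    static final class Category {
        long total;        // sum of costs of all documents below this category
        int documentCount; // number of documents below this category
        final TreeMap<String, Category> children = new TreeMap<>(); // sorted by category name
    }

    static Category build(List<Doc> docs) {
        Category root = new Category();
        for (Doc doc : docs) {
            Category dept = root.children.computeIfAbsent(doc.department, k -> new Category());
            Category team = dept.children.computeIfAbsent(doc.team, k -> new Category());
            // Every level on the path aggregates the document's values.
            for (Category level : new Category[] {root, dept, team}) {
                level.total += doc.cost;
                level.documentCount++;
            }
        }
        return root;
    }

    static void print(String label, Category category, String indent, StringBuilder out) {
        out.append(indent).append(label)
           .append(" (docs: ").append(category.documentCount)
           .append(", total: ").append(category.total).append(")\n");
        for (Map.Entry<String, Category> child : category.children.entrySet()) {
            print(child.getKey(), child.getValue(), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("Sales", "West", 50));
        docs.add(new Doc("IT", "Ops", 25));
        docs.add(new Doc("Sales", "East", 100));
        StringBuilder out = new StringBuilder();
        print("Company", build(docs), "", out);
        System.out.print(out);
    }
}
```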


    For more details on NIF, check out John Curtis's blog post:

    https://jdcurtis.blog/2019/12/16/tdr-notes-indexing-facility

    Folders


    Folders work like Domino views, but their content (note IDs) is not retrieved via formula search. Instead, it is added manually by the end user or the application developer. The main use case of a folder is to pick and bookmark documents for future analysis or processing.


    Domino Query Language


    Introduced in Domino 10, the Domino Query Language (DQL) provides a concise syntax for finding documents based on a wide variety of terms. It leverages existing design elements without requiring detailed code to access them.


    Like formula language, DQL can filter documents from a single NSF and return an "unsorted" IDTable. Here is an example of DQL:


    Order_origin in ('Detroit', 'Albuquerque', 'San Diego') and Date_origin >= @dt('2014-07-15') and Date_origin < @dt('2015-07-14') and
    partno in all ( 389, 27883, 388388, 587992 ) and not in ('Special Processing', 'Special2', 'Soon to be special') and not sales_person in ('Christen Summer', 'Isaac Hart')


    DQL provides a more efficient way to perform searches compared to formula language. A query planner analyzes a DQL statement and finds the best strategy to quickly reduce the number of relevant documents, such as by doing view lookups or FT searches.
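
    The general idea of reducing the candidate set early can be shown with a toy example in plain Java. This is not how the DQL query planner is implemented; the class and method names are made up, and the sketch assumes that some terms can be answered cheaply from an index (as sets of note IDs) while one remaining term has to be checked per document. Cheap terms are intersected first, smallest set first, so the expensive check only runs on the survivors:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntPredicate;

// Toy illustration of AND-combining terms; hypothetical, not the DQL planner.
class QueryPlanSketch {

    static TreeSet<Integer> executeAnd(List<Set<Integer>> indexedTerms, IntPredicate residualTerm) {
        if (indexedTerms.isEmpty()) {
            throw new IllegalArgumentException("need at least one indexed term");
        }
        // Start with the most selective (smallest) indexed result.
        List<Set<Integer>> bySize = new ArrayList<>(indexedTerms);
        bySize.sort((a, b) -> Integer.compare(a.size(), b.size()));

        TreeSet<Integer> candidates = new TreeSet<>(bySize.get(0));
        for (int i = 1; i < bySize.size(); i++) {
            candidates.retainAll(bySize.get(i));
        }
        // Only now run the expensive per-document check on what is left.
        candidates.removeIf(id -> !residualTerm.test(id));
        return candidates;
    }

    public static void main(String[] args) {
        Set<Integer> byOrigin = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
        Set<Integer> byDate = new HashSet<>(Arrays.asList(2, 4, 6));
        IntPredicate notSpecial = id -> id != 4; // stands for a term that needs a document scan
        System.out.println(executeAnd(Arrays.asList(byOrigin, byDate), notSpecial));
    }
}
```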


    A DQL search always returns the complete search result and does not support incremental searching. You can specify views, folders, or document collections to limit the results:


    in ('TrudisDocs', 'Orders', 'Special orders folder 1', 'Old_orders 2')


    For more DQL examples, visit:

    https://help.hcltechsw.com/dom_designer/14.0.0/basic/dql_simple_examples.html

    As of Domino 14, a DQL search only covers data documents, not design documents.


    For more details on DQL, see this Admincamp presentation from HCL:

    https://admincamp.de/konferenz/ent2019.nsf/bc36cf8d512621e0c1256f870073e627/6c2e835120d74a18c1258327003f8d83/$FILE/T3S4-Demo%20and%20Deep%20Dive%20-%20Domino%20General%20Query%20Facility%20.pdf

    QueryResultsProcessor


    Introduced in Domino 12, the QueryResultsProcessor processes a list of documents, reads selected values from the documents' summary buffer, and creates tabular data with sorted, categorized, or unsorted columns. The processing result can be returned in JSON format for web applications or materialized as a Domino view in any NSF as a QRP view.


    Unlike standard Domino views, data from multiple NSFs can be combined in a single QueryResultsProcessor call. For JSON output, there is no paging support (skip/limit), and the result is recomputed on every call. The QRP view is created once and does not update its content, making it suitable for producing snapshots of Domino data at a point in time.


    Since you can use the normal View APIs (e.g., the ViewNavigator), returning paged data is not difficult. However, the QRP view does not store any readers lists on the row level. The view contains the data that the user creating it was allowed to see. You can only restrict access on the view level, e.g., to hide financial data from normal employees.
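
    Paging over an already sorted snapshot boils down to skip and limit. The following sketch uses a plain Java list instead of a ViewNavigator, so the class and method names are assumptions; it only shows the bounds handling a paged REST endpoint would need:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Skip/limit paging over a sorted in-memory list; hypothetical helper, not a Domino API.
class PagingSketch {

    static <T> List<T> page(List<T> sortedRows, int skip, int limit) {
        // Clamp both ends so out-of-range requests yield an empty or shortened page.
        int from = Math.min(Math.max(skip, 0), sortedRows.size());
        int to = (int) Math.min((long) from + Math.max(limit, 0), sortedRows.size());
        return new ArrayList<>(sortedRows.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("Adams", "Baker", "Clark", "Davis", "Evans");
        System.out.println(page(rows, 2, 2));
        System.out.println(page(rows, 4, 10));
        System.out.println(page(rows, 10, 5));
    }
}
```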


    The primary purpose of QRP views is to produce a snapshot of Domino data at a point in time. They are more of a reporting tool than a good option for real-time queries, because you would need to build one QRP view for each user/user group and discard it as soon as the underlying data has changed, since there is no in-place index update.


    John Curtis has documented the QueryResultsProcessor in a series of blog articles:


    https://jdcurtis.blog/2021/11/29/the-query-results-processor-part-one/

    https://jdcurtis.blog/2021/11/30/the-query-results-processor-part-two/

    https://jdcurtis.blog/2021/12/02/the-queryresultsprocessor-part-three/

    Curtis mentioned plans for the future of the QueryResultsProcessor, including refreshable QRP views, joins/lookups across QRP input collections, and incorporating external data into QRP views. However, he retired at the end of 2022, and the QueryResultsProcessor API has since lost momentum.


    What is missing?


    As described above, using NSFSearch, it is not difficult to mirror Domino data in real-time in other databases with more capable query languages like SQL, GraphQL, or Cypher (for the Graph DB fans). You can let Domino do all the work, including computing formula values and readers lists.


    In the past, we built web applications for customers where we pulled the content of 30 NSFs (one for each department) into the JVM heap (max heap set to 30 GB) on HTTP task startup. We used CQEngine (https://github.com/npgall/cqengine) to build indexes for the data, resulting in extremely fast REST APIs with dynamic filter and sorting options.
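
    To give an impression of the approach without depending on CQEngine, here is a deliberately simple plain-Java sketch. It does not use the CQEngine API; the Person class and the single hash index on the department are assumptions for illustration. Data is indexed once when it is loaded, and a filtered, sorted lookup then only touches the matching bucket:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal in-memory hash index; a stand-in for illustration, not the CQEngine API.
class InMemoryIndexSketch {

    static final class Person {
        final String department;
        final String lastName;

        Person(String department, String lastName) {
            this.department = department;
            this.lastName = lastName;
        }
    }

    private final Map<String, List<Person>> byDepartment = new HashMap<>();

    // Called once per document while loading the data into the heap.
    void add(Person person) {
        byDepartment.computeIfAbsent(person.department, k -> new ArrayList<>()).add(person);
    }

    // Filter by department via the index, then sort the (small) result by last name.
    List<String> lastNamesIn(String department) {
        return byDepartment.getOrDefault(department, Collections.<Person>emptyList())
            .stream()
            .map(p -> p.lastName)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InMemoryIndexSketch index = new InMemoryIndexSketch();
        index.add(new Person("Sales", "Meyer"));
        index.add(new Person("IT", "Schmidt"));
        index.add(new Person("Sales", "Becker"));
        System.out.println(index.lastNamesIn("Sales"));
    }
}
```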

    However, these are custom solutions for each use case. Instead, there should be a standard way that fits many use cases and feels "Domino-like."


    Let's take the QueryResultsProcessor concept to another level! Let's reinvent the wheel one more time! :-)


    In the next article, I will introduce Domino JNA's new Virtual View API.


    Stay tuned!
