ArangoDB FullText Function Setup & Sample
ATTENTION: This page has been migrated to the Tazama GitHub repository. It is now located at https://github.com/frmscoe/docs/tree/dev/Knowledge-Articles/Entity-Resolution and will no longer be maintained in Confluence.
Setup Analyzer
1. In Lens, connect to the Shell of the ArangoDB pod
2. Type arangosh in the shell and press Enter to start the ArangoDB shell (arangosh)
3. To connect to the correct DB, run:
db._useDatabase("transactionHistory");
4. Then import analyzers:
var analyzers = require("@arangodb/analyzers");
5. Add a new analyzer:
analyzers.save("text_en_no_stem", "text", { locale: "en.utf-8", accent: false, case: "lower", stemming: false, stopwords: [] }, ["position", "frequency", "norm"]);
That's it! We've added the new analyzer!
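To make the analyzer options more concrete, here is a rough plain-JavaScript approximation of what the settings above ask for: lowercasing (case: "lower"), accent removal (accent: false), no stemming and no stopword filtering. The function name tokenizeNoStem is hypothetical, and the real analyzer uses its own word-breaking rules, so treat this as an illustration only. Inside ArangoDB, the AQL function TOKENS() can be used to inspect the actual output of an analyzer.

```javascript
// Rough stand-in for the "text_en_no_stem" settings (not ArangoDB's implementation):
// case: "lower", accent: false, stemming: false, stopwords: [].
function tokenizeNoStem(text) {
  return text
    .normalize("NFD")            // split accented letters into base letter + combining mark
    .replace(/\p{M}/gu, "")      // drop the combining marks (accent removal)
    .toLowerCase()               // case: "lower"
    .split(/[^\p{L}\p{N}]+/u)    // break on anything that is not a letter or digit
    .filter((token) => token.length > 0); // no stemming, no stopwords removed
}

console.log(tokenizeNoStem("Johannes Pétrus FOLEY"));
```

Because stemming is off and the stopword list is empty, every word is kept in its full (lowercased, unaccented) form.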
The FullText Index (FTI) can be used to find words, or prefixes of words, inside documents. The FULLTEXT() function used to query an FTI matches each supplied search word as a complete word. See the samples below.
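FULLTEXT() can only be used on an attribute that has a fulltext index, and the setup steps do not show creating one. A minimal sketch for arangosh follows, assuming the collection is named transactionHistory and the indexed attribute is CstmrCdtTrfInitn.PmtInf.Dbtr, as in the sample query further down. The helper name ensureDebtorFulltextIndex is hypothetical; check the index options against the ArangoDB version in use.

```javascript
// Create (or reuse) a fulltext index on the debtor attribute.
// `db` is the global database object that arangosh provides.
function ensureDebtorFulltextIndex(db) {
  // Collection name is an assumption taken from the sample query.
  const collection = db._collection("transactionHistory");
  return collection.ensureIndex({
    type: "fulltext",
    // A fulltext index covers exactly one attribute.
    fields: ["CstmrCdtTrfInitn.PmtInf.Dbtr"],
  });
}

// In arangosh: ensureDebtorFulltextIndex(db);
```

ensureIndex() leaves an existing matching index in place, so the helper should be safe to run more than once.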
With the below dataset:
* Johannes Petrus Foley
* Johannes
* Petrus
* Foley
* Johannes Petrus
* Johannes Foley
* Petrus Foley
The queries below yield the following results:
Search for “Foley”:
FOR i IN FULLTEXT(transactionHistory, "CstmrCdtTrfInitn.PmtInf.Dbtr", "Foley", 10)
  RETURN i.CstmrCdtTrfInitn.PmtInf.Dbtr.Nm
[
"Foley",
"Johannes Petrus Foley",
"Johannes Foley",
"Petrus Foley"
]
Search for “Petrus”:
Search for “Johannes Foley”:
Search for “Johannes” or “Foley”:
Conclusion
In the examples above, every keyword supplied to the search function must appear in a result as a complete word. The OR operator and others can also be applied, as described in the ArangoDB documentation:
If multiple search words (or prefixes) are given, then by default the results will be AND-combined, meaning only the logical intersection of all searches will be returned. It is also possible to combine partial results with a logical OR, and with a logical NOT:
* FULLTEXT(emails, "body", "+this,+text,+document")
Will return all documents that contain all the mentioned words. Note: specifying the + symbols is optional here.
* FULLTEXT(emails, "body", "banana,|apple")
Will return all documents that contain either (or both) words banana or apple.
* FULLTEXT(emails, "body", "banana,-apple")
Will return all documents that contain the word banana, but do not contain the word apple.
* FULLTEXT(emails, "body", "banana,pear,-cranberry")
Will return all documents that contain both the words banana and pear, but do not contain the word cranberry.
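
The operator rules quoted above can be modelled in a few lines of plain JavaScript and tried against the sample dataset. This is a simplified model, not ArangoDB's implementation: the function modelFulltext is hypothetical, terms are assumed to be applied left to right, only complete-word matching is modelled (no prefix search), and result order and the limit argument are ignored.

```javascript
// Names from the sample dataset in this article.
const names = [
  "Johannes Petrus Foley", "Johannes", "Petrus", "Foley",
  "Johannes Petrus", "Johannes Foley", "Petrus Foley",
];

// Split a name into lowercase words (a rough stand-in for index tokenization).
const wordsOf = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Evaluate a FULLTEXT-style query string such as "banana,|apple".
// No prefix or "+" means AND, "|" means OR, "-" means NOT.
function modelFulltext(docs, query) {
  let result = null; // null means "no term applied yet"
  for (const rawTerm of query.split(",")) {
    const term = rawTerm.trim();
    if (term === "") continue;
    const hasOperator = "+|-".includes(term[0]);
    const operator = hasOperator ? term[0] : "+";
    const word = (hasOperator ? term.slice(1) : term).toLowerCase();
    const matches = (doc) => wordsOf(doc).includes(word);
    const current = result === null ? docs : result;
    if (operator === "|") {
      // OR: keep what we have and add every document matching this word.
      const previous = result === null ? [] : result;
      result = docs.filter((doc) => previous.includes(doc) || matches(doc));
    } else if (operator === "-") {
      // NOT: remove documents containing this word.
      result = current.filter((doc) => !matches(doc));
    } else {
      // AND: keep only documents containing this word.
      result = current.filter(matches);
    }
  }
  return result === null ? [] : result;
}

console.log(modelFulltext(names, "Petrus"));
console.log(modelFulltext(names, "Johannes,Foley"));
console.log(modelFulltext(names, "Johannes,|Foley"));
```

In this model, "Johannes,Foley" keeps only the names that contain both words, while "Johannes,|Foley" keeps every name in the dataset except "Petrus".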