Seamless Horizons supports both basic search operators, like those you would use in your favorite internet search tool, and advanced search parameters that find more potential matches for your search. No single search operator will give you all the answers you need, so it is important to develop a foundational understanding of the questions each operator helps you ask.
In the sections below, we will guide you through crafting queries that answer the questions you want to ask of the data. We will also outline the features of the new query builder, a simple and intuitive tool that helps you combine basic search operators. Lastly, we will present a series of challenges, ranging from easy to difficult, that you can use to practice and hone your skills.
By the end of this guide you will be able to:
Craft basic and advanced search queries
Understand how to leverage the query builder to create complex queries
Broaden and narrow your search using operators and filters
Use queries to overcome challenges with searching dirty data
Practice the skills you learned in this guide
Search Operators
Similar to Google, Bing, or other popular internet search tools, Seamless Horizons supports basic search parameters that can be used to find exact phrases, boost the relevancy of certain terms, and ignore certain terms or phrases. In this section we will cover the following operators:
AND
OR
NOT
""
( )
~
""~N
?
*
Operator | Function | Query Example | Sample Result |
AND | Search will include results that contain both words or phrases on each side of the 'AND' operator. | Doe AND Passport AND 1955 | John Doe, born in 1955, has the passport number AE20359 |
OR | Search will include records that contain either of the words or phrases on each side of the 'OR' operator. | John OR Mary | When I found John at the train station, his briefcase had vanished |
NOT | Search will not include records with the word or phrase to the right of the 'NOT' operator. | Doe NOT John NOT Mary | Henry Doe and his sister Linda Doe currently live in Spain |
" " | Performs an exact search for the word or phrase within the quotation marks and ensures that it appears in every search result. Terms outside the quotes are not mandatory, but they boost results relevant to those terms. | "Evil Corporation" Limited Ltd LLC | Several limited liability companies, including Evil Corporation LLC, were caught dumping into the river. |
( ) | Parentheses group operators to create more complex queries and increase the relevancy of search results. | (John OR James OR Mary) AND (Doe OR Baker) AND "Company Record" | James found that Amanda Baker had found a record of the company in the Delaware company record registry. |
~ | Conducts a fuzzy search for the attached word using Damerau-Levenshtein distance. By default, the operator finds all terms within an edit distance of two, where each edit is the insertion, deletion, or substitution of a single character, or the transposition of two adjacent characters. An edit distance of 1 should be sufficient to catch 80% of all human misspellings. | Vasiliy~1 | Due to the differences in Russian name transliterations, Vasiliy Gregorovich often spells his name: Vasili, Vasilij, Vasilii, Vasilie |
""~N | Conducts a proximity search for the phrase within the quotation marks. N represents the maximum edit distance for words in the phrase. This operator is especially useful when searching for phrases that might be out of order or not exact, such as an individual's name. | "John Doe"~2 | This individual always likes to change their middle name in official records, so I have to search for: John Smith Doe, John Williams Doe, Doe Williams John |
? | The '?' operator is a wildcard that replaces a single character in a word. Use it if you are unsure how a name might be transliterated, or if you believe a letter may have been processed as a different character (e.g., the word 'William' was recognized as 'Willian'). Note: Wildcard searches are resource intensive because they search every possible variation of each wildcard character that you add. | Jess?ca W??son | Jessica W@tson, Jessica Watson, Jessica the Witson, Jessica Wilson, Jesshca Wooson |
* | The '*' operator is a wildcard that replaces zero or more characters at the beginning, in the middle, or at the end of a term. Use it to find a series of numbers within a string of text, or to find words or strings that start or end with a certain sequence of characters. Note: Wildcard searches are resource intensive because they search every possible variation of each wildcard character that you add. | 202*5309 | 202-8675309, 2029995309, 202-999-5309 |
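To build intuition for the term-level operators in the table (~, ? and *), the sketch below reproduces their matching rules locally in Python. It is an illustration only, not how Seamless Horizons is implemented: the function names are hypothetical, the distance function uses the common "optimal string alignment" variant of Damerau-Levenshtein distance, the wildcard check relies on the standard library's fnmatch module, and the comparison is assumed to be case-sensitive.

```python
from fnmatch import fnmatchcase


def edit_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant).

    Counts single-character insertions, deletions, and substitutions,
    plus transpositions of two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_match(term: str, candidate: str, max_edits: int = 2) -> bool:
    """Hypothetical stand-in for `term~N`: within max_edits of the term."""
    return edit_distance(term, candidate) <= max_edits


def wildcard_match(pattern: str, candidate: str) -> bool:
    """Hypothetical stand-in for `?` (one character) and `*` (zero or more)."""
    return fnmatchcase(candidate, pattern)


if __name__ == "__main__":
    for variant in ["Vasili", "Vasilij", "Vasilii", "Vasilie"]:
        print(variant, edit_distance("Vasiliy", variant))
    print(wildcard_match("W??son", "W@tson"))
    print(wildcard_match("202*5309", "202-999-5309"))
```

Each spelling variant from the Vasiliy~1 row is a single edit away from the search term, which is why an edit distance of 1 is enough for that example. The wildcard helper shows why each '?' must be filled by exactly one character, while '*' can also match nothing at all.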
The Query Builder
The query builder provides a simple interface that makes it easier to build advanced queries using the search operators above. To build a query using the query builder, click the toggle at the top of the search bar. This will drop down the query builder tool.
From here you can enter your search terms in the Search bar and specify the fuzziness of the search in the dropdown to the right of the bar. The fuzziness specifier corresponds to the ""~N operator above, which allows for a proximity search around the search term itself.
You can also create more complex queries using the buttons below the initial entry bar:
Add Term: Allows AND/OR operator searches between different terms
Add Entity: Restricts the search to specific entity types and properties. This applies only to the modeled datasets in Seamless Horizons.
Add Group: Acts like parentheses for a set of terms, linking searches together using AND/OR operators in a group.
When you are finished building your query, click the Save button in the bottom right. This converts your search parameters into a query string that uses the search operators outlined above. From there, click the search button as you normally would to run the search.
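The conversion from builder inputs to a query string can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the Term and Group classes, the proximity field standing in for the fuzziness dropdown, and the rendering rules are hypothetical and do not describe the tool's actual internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Term:
    """Hypothetical model of one entry in the builder's search bar."""
    text: str
    proximity: Optional[int] = None  # assumed stand-in for the fuzziness dropdown

    def render(self) -> str:
        if self.proximity is not None:
            return f'"{self.text}"~{self.proximity}'
        # Quote multi-word phrases so they are searched as an exact phrase.
        return f'"{self.text}"' if " " in self.text else self.text


@dataclass
class Group:
    """Hypothetical model of Add Group: items joined by AND or OR."""
    operator: str
    items: List[Union["Term", "Group"]] = field(default_factory=list)

    def render(self, top_level: bool = True) -> str:
        parts = [
            item.render(top_level=False) if isinstance(item, Group) else item.render()
            for item in self.items
        ]
        joined = f" {self.operator} ".join(parts)
        # Nested groups are wrapped in parentheses, as in the ( ) operator.
        return joined if top_level else f"({joined})"


query = Group("AND", [
    Group("OR", [Term("John"), Term("James"), Term("Mary")]),
    Group("OR", [Term("Doe"), Term("Baker")]),
    Term("Company Record"),
])
print(query.render())
```

Under these assumptions, the nested groups render as the same kind of string shown in the ( ) row of the operator table, which is the form you could also type into the search bar by hand.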