Ignore accents / diacritics in search

millimoose · October 16, 2019, 10:03am

Small UI improvement I’d find handy:

I used Checkvist to make a rough project outline for a Slovak-speaking audience, meaning I had to actually use diacritics. SK netspeak, however, omits them, and a lot of people don’t have them in muscle memory when trying to quickly navigate a UI.

To improve usability, Chrome for instance allows searching in a page for terms sans diacritics while still finding words with diacritics; i.e. searching for “dlzen” will find “dĺžeň”. (Google’s searching works the same way.)

Would it be possible to make this work for the Checkvist search bar? (Having had to deal with this in my apps: a quick first approach is, instead of searching the actual text, to search a copy of the text that has been put through NFD or NFKD Unicode normalization and had all combining characters removed, with the search terms transformed the same way; i.e. the haystack text is different from the displayed text.)
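A minimal sketch of that approach in JavaScript, using the built-in `String.prototype.normalize` and a Unicode property escape for combining marks. The function names `stripDiacritics`, `foldForSearch` and `matches` are made up for illustration and are not part of Checkvist; lowercasing is an extra assumption added so the match is also case-insensitive.

```javascript
// Decompose each character into base letter + combining marks (NFD),
// then drop every combining mark (Unicode category M).
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Assumption: the search should also ignore case.
function foldForSearch(text) {
  return stripDiacritics(text).toLowerCase();
}

// The haystack and the needle are transformed the same way;
// the displayed text itself is never modified.
function matches(haystack, needle) {
  return foldForSearch(haystack).includes(foldForSearch(needle));
}

console.log(stripDiacritics("dĺžeň"));          // dlzen
console.log(matches("Dĺžeň a mäkčeň", "dlzen")); // true
```

Letters that have no canonical decomposition, such as "ł" or "ø", pass through unchanged, so covering them would need an explicit mapping table on top of this.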

maxkir · October 23, 2019, 7:16am

Hello,

Thanks a lot for mentioning this issue. Shame on me, I did not know there are built-in normalization functions in JavaScript, so I did not use them when I implemented filtering. I will try to squeeze this into the next update.
By the way, the full-text search in Checkvist (by double-enter) should already use this normalization.

Thanks again,

Vicentiu · March 12, 2020, 4:07am

Hi, I’m also interested in this.
I noticed it doesn’t work on the mobile site.

Would it be a lot of effort to add this?

Thanks

Vicentiu · March 13, 2020, 7:31am

I can confirm it works with global search, but only for task names. It doesn’t normalize the notes, so it doesn’t find those occurrences.

maxkir · March 17, 2020, 9:36pm

Hi @Vicentiu,

This may not be much work, but after a closer look, it adds a performance penalty to the filtering operation. For now I’ve filed this request as https://checkvist.uservoice.com/forums/2121/suggestions/39959248, please vote/watch, maybe I’ll find a better solution.

Thanks,

millimoose · March 20, 2020, 4:13pm

Are you doing the denormalization step for the content of every item as the user types in the filter string? I ask because precomputing the haystack strings when saving a note could help with that with fairly little effort.

If that’s the case, a bit more work would be setting up a proper index, which should improve your performance across the board compared to the current state. (Sorry if I’m being presumptuous here; my current project had me write full-text indexing from near-scratch over ~100MB of data, so I can give you an outline of what’s involved if you’re interested.)
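The cheaper of the two suggestions, precomputing the haystack string when an item is saved, might look roughly like this. It is a sketch under assumptions: the `Map`-based store, the item shape `{ id, text, folded }` and the names `saveItem` and `filterItems` are hypothetical and say nothing about how Checkvist actually stores its data.

```javascript
// Fold case and diacritics (NFD + removal of combining marks).
function fold(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();
}

// Pay the normalization cost once, at save time.
// Hypothetical item shape: { id, text, folded }.
function saveItem(store, id, text) {
  store.set(id, { id, text, folded: fold(text) });
}

// While the user types, only the query is folded; every item is
// checked against its precomputed haystack string.
function filterItems(store, query) {
  const needle = fold(query);
  const hits = [];
  for (const item of store.values()) {
    if (item.folded.includes(needle)) hits.push(item.text);
  }
  return hits;
}

const store = new Map();
saveItem(store, 1, "Dĺžeň");
saveItem(store, 2, "Plain ASCII note");
console.log(filterItems(store, "dlz")); // [ 'Dĺžeň' ]
```

The filter still scans every item, so this only removes the repeated normalization from the per-keystroke work; it does not change the scan itself.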

Thank you for all the work you already put in, it’s already a great help.

maxkir · March 21, 2020, 9:53pm

@millimoose Thanks for offering your help and for the suggestions. Currently, we don’t do any denormalization. Adding a separate denormalized index would definitely help, but it is more work than I expected to spend on such a task.

On the server side, we’re using SphinxSearch for full-text searching, where such an index is created, but on the client side, similar logic would have to be written manually.
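For a sense of scale, a hand-written client-side index could be as small as the following sketch. It is not what SphinxSearch does internally and not Checkvist code: `FoldedIndex` and `foldedTokens` are hypothetical names, and the sketch only matches whole words, unlike a substring filter.

```javascript
// Fold case and diacritics, then split into unique word tokens.
function foldedTokens(text) {
  const folded = text.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();
  return [...new Set(folded.split(/[^\p{L}\p{N}]+/u).filter(Boolean))];
}

// Hypothetical in-memory inverted index: folded token -> set of item ids.
class FoldedIndex {
  constructor() {
    this.postings = new Map(); // token -> Set of ids
    this.texts = new Map();    // id -> original (displayed) text
  }

  // Note: re-adding an id with new text does not remove its old tokens.
  add(id, text) {
    this.texts.set(id, text);
    for (const token of foldedTokens(text)) {
      if (!this.postings.has(token)) this.postings.set(token, new Set());
      this.postings.get(token).add(id);
    }
  }

  // Return the texts of items that contain every word of the query.
  search(query) {
    const tokens = foldedTokens(query);
    if (tokens.length === 0) return [];
    let ids = null;
    for (const token of tokens) {
      const set = this.postings.get(token) || new Set();
      ids = ids === null
        ? new Set(set)
        : new Set([...ids].filter((id) => set.has(id)));
    }
    return [...ids].map((id) => this.texts.get(id));
  }
}

const index = new FoldedIndex();
index.add(1, "Dĺžeň");
index.add(2, "mäkčeň a dĺžeň");
console.log(index.search("makcen dlzen")); // [ 'mäkčeň a dĺžeň' ]
```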

I’m just a bit overwhelmed with eliminating some rather important technical debt, so I’m not sure I’ll have time to fit this improvement into the schedule.

Take care and stay healthy,

millimoose · April 4, 2020, 10:30am

Good luck with your technical debt; of course you know your scheduling better than anybody else. If anything changes about this, I’d be glad to help out. I’m about 80% convinced you can use the IndexedDB API in browsers to do most of the heavy lifting, but I’m not sure about this. (I should maybe fill the quarantine lull by making this into a library myself: basically a hackable quick-and-dirty approach that’s easy to understand and modify, as opposed to an industrial-strength search solution.)
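One way the IndexedDB idea might look, offered as an untested-in-production sketch rather than a worked-out design: store an array of folded tokens with each item and put a `multiEntry` index on it, so the browser maintains the token lookup. The database, store and index names ("search-sketch", "items", "tokens") and all function names are made up. An index like this supports whole-word and prefix lookups via key ranges, not arbitrary substring matches.

```javascript
// Fold case and diacritics, then split into unique word tokens.
function foldedTokens(text) {
  const folded = text.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();
  return [...new Set(folded.split(/[^\p{L}\p{N}]+/u).filter(Boolean))];
}

// Open (or create) the database. With multiEntry: true, each element
// of the "tokens" array gets its own index entry. Browser-only.
function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("search-sketch", 1);
    req.onupgradeneeded = () => {
      const store = req.result.createObjectStore("items", { keyPath: "id" });
      store.createIndex("tokens", "tokens", { multiEntry: true });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Store the displayed text together with its folded tokens.
function saveItem(db, id, text) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction("items", "readwrite");
    tx.objectStore("items").put({ id, text, tokens: foldedTokens(text) });
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

// Find items that have a token starting with the first query word.
function searchPrefix(db, query) {
  return new Promise((resolve, reject) => {
    const prefix = foldedTokens(query)[0] || "";
    const range = IDBKeyRange.bound(prefix, prefix + "\uffff");
    const hits = new Map(); // primary key -> text, to drop duplicates
    const req = db.transaction("items").objectStore("items")
      .index("tokens").openCursor(range);
    req.onsuccess = () => {
      const cursor = req.result;
      if (!cursor) return resolve([...hits.values()]);
      hits.set(cursor.primaryKey, cursor.value.text);
      cursor.continue();
    };
    req.onerror = () => reject(req.error);
  });
}
```

In a browser, usage would be along the lines of `const db = await openDb(); await saveItem(db, 1, "Dĺžeň"); await searchPrefix(db, "dlz");`.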