
Tunable full-text search engine in JavaScript that: (1) works natively in web apps such as those built on Express.js; (2) is easy to customize (via BM25) for specific types of documents (e.g. tweets, scientific journals); (3) is deployable on either the client side or the server side.


zjohn77/retrieval


Table of Contents

  1. Basic Idea and Key Benefits
  2. Deploy Full-Text Search in an App
  3. Install
  4. User Guide

1. Basic Idea and Key Benefits:


A full-text search engine, comparable to Elasticsearch, written in JavaScript and built on Natural Language Processing techniques. The BM25 ranking function at the core of this project can be tuned to different types of text (e.g. tweets, scientific journals, legal writing). Key features are:

  • The JavaScript source code can be deployed natively on the server side in Node.js as well as on the client side in browser extensions, single-page apps, serverless functions, React Native, edge computing, and many other applications.
  • The accuracy and versatility of BM25 come from the ability to tune its parameters to specific types of documents.
  • Separates offline indexing from the time-sensitive online search.
  • Each individual NLP component, such as the stemmer or the stopword list, is pluggable and was chosen after careful research to stay current. (For example, the stopword list draws the best words from three authoritative stopword lists: Stanford CoreNLP, the Journal of Machine Learning Research, and NLTK.)
  • A Dockerfile and a Docker image are available, making it convenient to try out the module.
  • Reasonable unit test coverage, continuous integration, and separation of concerns for each functionality.
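
To make the tuning claim above concrete, here is a minimal sketch of BM25 scoring. It is not the package's implementation: the function names `tokenize` and `bm25Scores` are hypothetical, the tokenizer is deliberately naive (no stemming or stopword removal), and the IDF formula is one common variant. The parameters `k` and `b` play the role of the `K` and `B` arguments shown in the User Guide: `k` controls how quickly repeated terms stop adding to the score, and `b` controls how strongly long documents are penalized.

```javascript
// Minimal BM25 sketch for illustration only (not the retrieval package's code).
// Naive tokenizer: lowercase alphanumeric runs, no stemming, no stopwords.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Returns one BM25 score per document, in the same order as `docs`.
// k: term-frequency saturation; b: document-length normalization (0 disables it).
function bm25Scores(docs, query, k = 1.6, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / N;

  // Document frequency: the number of documents that contain each term.
  const df = new Map();
  for (const doc of tokenized) {
    for (const term of new Set(doc)) {
      df.set(term, (df.get(term) || 0) + 1);
    }
  }

  const queryTerms = new Set(tokenize(query));
  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += (idf * tf * (k + 1)) / (tf + k * lengthNorm);
    }
    return score;
  });
}
```

With `b` close to 1, a short title that matches the query outranks a long one with the same matches, which tends to suit collections of uneven length; with `b = 0`, document length is ignored entirely.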

2. Deploy Full-Text Search in an App:

demo2 is a demo Express app (see MEAN stack) enhanced with full-text search capability. The easiest way to try this demo is to run its Docker image as shown below, then point a browser to localhost:3000.

docker run --rm -d -p 3000:8080 jj232/retrieval

Or you can run the command below after installing:

npm run demo2

Then point a browser to localhost:8080.

Suggestions on deploying: To integrate the module into a simple JavaScript app, the demo here shows that only a few lines of code are needed; see the source code at "./demo/demo2/server.js". For a more complex software solution, or one that relies on other languages or runtime environments, the recommended way is to Dockerize this module and expose it as a microservice.
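
As a rough illustration of the microservice suggestion, the sketch below wraps any search function in a small JSON-over-HTTP service using only Node's built-in `http` module. It is not the code in "./demo/demo2/server.js". The names `handleSearch` and `createSearchService`, the `/search` route, and the `q` and `n` query parameters are all assumptions made for this example, and `searchFn` is a hypothetical stand-in for a call such as `rt.search(query, topN)`.

```javascript
const http = require("http");

// Map a request URL to a status code and a JSON-ready body.
// `searchFn(query, topN)` is a hypothetical stand-in for the real search call.
function handleSearch(requestUrl, searchFn) {
  const url = new URL(requestUrl, "http://localhost");
  if (url.pathname !== "/search") {
    return { status: 404, body: { error: "not found" } };
  }
  const query = url.searchParams.get("q");
  if (!query) {
    return { status: 400, body: { error: "missing query parameter q" } };
  }
  // Fall back to 5 results when n is absent or not a positive integer.
  const parsed = Number.parseInt(url.searchParams.get("n"), 10);
  const topN = Number.isInteger(parsed) && parsed > 0 ? parsed : 5;
  return { status: 200, body: { query, results: searchFn(query, topN) } };
}

// Build (but do not start) an HTTP server around the handler.
function createSearchService(searchFn) {
  return http.createServer((req, res) => {
    const { status, body } = handleSearch(req.url, searchFn);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  });
}
```

Assuming an indexed `rt` object like the one in the User Guide, the service could be started with `createSearchService((q, n) => rt.search(q, n)).listen(8080)`. Clients written in any language can then query it over HTTP. Keeping the routing logic in a plain function makes it testable without opening a socket.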

3. Install:

For the latest release:

npm install retrieval

For continuous build:

git clone https://github.com/zjohn77/retrieval.git
cd retrieval
npm install

4. User Guide:

const path = require("path");
const Retrieval = require(path.join(__dirname, "..", "..", "src", "Retrieval.js"));
const texts = require("./data/music-collection"); // Load some sample texts to search.

// 1st step: instantiate Retrieval with the BM25 tuning parameters, passed positionally:
// K = 1.6 (term-frequency saturation) and B = 0.75 (document-length normalization).
let rt = new Retrieval(1.6, 0.75);

// 2nd step: index the array of texts (strings); store the resulting document-term matrix.
rt.index(texts);

// 3rd step: search. In other words, multiply the document-term matrix and the indicator vector representing the query.
rt.search("theme and variations", 5)   // Top 5 search results for the query 'theme and variations'
  .forEach(item => console.log(item)); // Print each result on its own line.
// 04 - Theme & Variations In G Minor.flac
// 17 - Rhapsody On A Theme of Paganini - Variation 18.flac
// 01 - Diabelli Variations - Theme Vivace & Variation 1 Alla Marcia Maestoso.flac
// 07 - Rhapsody On A Theme of Paganini (Introduction and 24 Variations).flac
// 10 - Diabelli Variations - Variation 10 Presto.flac

The example right above is from "./demo/demo1/scenarios.js". To run the full example, do:

npm run demo1
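
The third step in the example describes search as multiplying the document-term matrix by the indicator vector of the query. The toy sketch below shows that idea in isolation. It is not the package's code: `buildIndex` and `search` are hypothetical names, the tokenizer is naive, and raw term counts stand in for the BM25 weights that a real index would store.

```javascript
// Toy illustration of "document-term matrix times query indicator vector".
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Offline step: build the vocabulary and a dense document-term count matrix.
function buildIndex(texts) {
  const vocab = new Map(); // term -> column number
  const rows = texts.map(tokenize);
  rows.forEach((tokens) =>
    tokens.forEach((t) => {
      if (!vocab.has(t)) vocab.set(t, vocab.size);
    })
  );
  const matrix = rows.map((tokens) => {
    const row = new Array(vocab.size).fill(0);
    tokens.forEach((t) => {
      row[vocab.get(t)] += 1;
    });
    return row;
  });
  return { vocab, matrix };
}

// Online step: score every document as the dot product of its row with the query vector.
function search(index, texts, query, topN) {
  const indicator = new Array(index.vocab.size).fill(0);
  tokenize(query).forEach((t) => {
    if (index.vocab.has(t)) indicator[index.vocab.get(t)] = 1;
  });
  return index.matrix
    .map((row, i) => ({
      text: texts[i],
      score: row.reduce((sum, weight, j) => sum + weight * indicator[j], 0),
    }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((result) => result.text);
}
```

Because `buildIndex` does all the per-document work once, each call to `search` only pays for one pass over the matrix. This is the same split between offline indexing and online search listed among the key features.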

To run unit tests, do:

npm test
