Microsoft Bling Fire Tokenizer – 10x Faster Than NLTK
We want to share our FInite State machine and REgular expression manipulation library (FIRE). Bling Fire Tokenizer is designed for fast, high-quality tokenization of natural language text. Compared with other popular NLP libraries, Bling Fire is about 10x faster at the tokenization task.
See the benchmark wiki for more details.
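A speed claim like this is easiest to judge on your own text. The following is a minimal timing sketch, not the benchmark from the wiki: it assumes the `blingfire` Python package (with its `text_to_words` function) and, optionally, `nltk` are installed, and it skips whichever one is missing. The helper names `time_tokenizer` and `load_tokenizers`, the sample text, and the choice of NLTK's `wordpunct_tokenize` (picked only because it needs no downloaded models) are assumptions made for illustration.

```python
import time


def time_tokenizer(tokenize, text, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of tokenize(text)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tokenize(text)
        best = min(best, time.perf_counter() - start)
    return best


def load_tokenizers():
    """Collect the tokenizers that can be imported; str.split is always present as a baseline."""
    tokenizers = {"str.split": str.split}
    try:
        # text_to_words returns one string with tokens separated by spaces.
        from blingfire import text_to_words
        tokenizers["blingfire"] = lambda s: text_to_words(s).split(" ")
    except (ImportError, OSError):
        pass  # package or its native library is not available
    try:
        # Regex-based NLTK tokenizer; an illustrative choice, not the wiki's setup.
        from nltk.tokenize import wordpunct_tokenize
        tokenizers["nltk.wordpunct"] = wordpunct_tokenize
    except ImportError:
        pass
    return tokenizers


if __name__ == "__main__":
    # Hypothetical sample text; replace it with a corpus that matters to you.
    sample = "Bling Fire is a tokenizer. It handles punctuation, too! " * 2000
    for name, fn in load_tokenizers().items():
        print(f"{name}: {time_tokenizer(fn, sample):.4f} s")
```

Taking the best of several runs reduces noise from other processes. Timings depend on the machine, the text, and the tokenizer variant, so treat the output as a rough comparison only.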
To start using the Bling Fire library and its finite state machine manipulation tools, build the project on Windows or Linux with CMake.
Source: github.com