Chinese Document Search

by . .

When I come across an unfamiliar phrase, especially a metaphorical one, it's useful to be able to search text documents, on the web or elsewhere, with regular expressions that match that phrase.
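A regex search over a local collection of documents can be quite small. The sketch below assumes the collection is simply a folder of UTF-8 `.txt` files; the `search_documents` function and the `Hit` record are hypothetical names for illustration, not an existing tool.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    """One regex match: the file, the 1-based line number, the matched text, and the full line."""
    path: Path
    line_no: int
    text: str
    line: str


def search_documents(root: str, pattern: str) -> Iterator[Hit]:
    """Yield every match of `pattern` in the .txt files under `root`.

    Hypothetical helper: assumes the repository is a folder of UTF-8 plain-text files.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                # finditer reports every non-overlapping match on the line, not just the first.
                for m in regex.finditer(line):
                    yield Hit(path, line_no, m.group(0), line.rstrip("\n"))
```

For example, `search_documents("corpus", r"明月.{0,2}")` would list every place "明月" appears in the folder, together with up to two following characters, so the surrounding context of a phrase can be compared across texts.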

I'd like to start building my own repository of documents to run such searches against, and to write a program that runs them against websites such as the Chinese Text Project, Gu Shi Wen ("Ancient Poetry and Literature"), Jukuu, and Zdic. For the user interface, the ability to run searches comes first; next, the ability to save information from a search into a memorization flashcard. Further improvements, which are closely interrelated: searching with (limited) regexps and other special operators (see ludwig.com); saving and re-running searches, for instance by saving (list (parameters) (API version)) to the user's local storage or offering to download it as a file; and a feature that produces premade regexps for different purposes.
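Saving a search as its parameters plus an API version, and generating premade regexps, might look roughly like this. It is a sketch under assumptions: JSON as the storage format, the field names, `API_VERSION`, and the `gap_pattern` helper are all hypothetical choices, not a settled design.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# Hypothetical version number; bump it whenever the meaning of saved parameters changes.
API_VERSION = 1


@dataclass
class SavedSearch:
    """A search the user can store (in local storage or as a download) and re-run later."""
    pattern: str
    sources: list[str] = field(default_factory=list)  # hypothetical source ids, e.g. "local"
    api_version: int = API_VERSION

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese characters readable in the saved file.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "SavedSearch":
        data = json.loads(text)
        # Refuse searches saved under a different API version rather than guess at them.
        if data.get("api_version") != API_VERSION:
            raise ValueError(
                f"saved with API version {data.get('api_version')}, expected {API_VERSION}"
            )
        return cls(**data)


def gap_pattern(first: str, second: str, max_gap: int = 3) -> str:
    """Premade regexp: `first` and `second` with at most `max_gap` characters between them."""
    return f"{re.escape(first)}.{{0,{max_gap}}}{re.escape(second)}"
```

Storing the version alongside the parameters means an old saved search fails loudly instead of silently running with changed semantics. As an example of a premade pattern, `gap_pattern("春風", "綠", 2)` produces `春風.{0,2}綠`, which matches 春風又綠江南岸.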

As for development technique, this looks like a fit for a server application: I send it queries, it does the work, and it sends back results. I could run the application on my local computer for my own purposes (a strategy I am growing fond of), and offering the search tools to other people would be as simple as running the same app on my own VPS.
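The query-in, results-out shape can be sketched with nothing but Python's standard library. This assumes a hypothetical `GET /search?q=<regex>` endpoint answering with JSON, and a small in-memory `DOCUMENTS` dictionary standing in for the real document repository; neither is a fixed design.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory corpus; a real app would load the local document repository.
DOCUMENTS = {
    "李白·靜夜思": "床前明月光，疑是地上霜。舉頭望明月，低頭思故鄉。",
    "王安石·泊船瓜洲": "春風又綠江南岸，明月何時照我還。",
}


def run_query(pattern: str) -> list[dict]:
    """Return every regex match in DOCUMENTS as {"doc": title, "match": text}."""
    regex = re.compile(pattern)
    return [
        {"doc": title, "match": m.group(0)}
        for title, text in DOCUMENTS.items()
        for m in regex.finditer(text)
    ]


class SearchHandler(BaseHTTPRequestHandler):
    """Answers GET /search?q=<regex> with a JSON list of matches."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        # parse_qs decodes percent-encoded UTF-8, so Chinese queries arrive intact.
        query = parse_qs(url.query).get("q", [""])[0]
        if not query:
            self.send_error(400, "missing q parameter")
            return
        try:
            results = run_query(query)
        except re.error:
            self.send_error(400, "bad regular expression")
            return
        body = json.dumps(results, ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Build the server; call serve_forever() on the result to start answering queries."""
    return ThreadingHTTPServer((host, port), SearchHandler)
```

Running `make_server().serve_forever()` serves queries on the local machine only. Moving the same program to a VPS would mostly mean binding to `0.0.0.0` (ideally behind a reverse proxy) instead of `127.0.0.1`, which matches the local-first, share-later plan above.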