- ML work served different objectives depending on the project
- Most interesting areas dealt with semantic text analysis
- This boiled down to three main subtasks:
- Domain-specific feature engineering
- Semantic search
- Open990 maintained profiles of 1.9M charities
- Official taxonomy was nearly useless
- Built a search engine that could tell the difference between an entity name and a service category
- It would then suggest several options in a dropdown
- If the user completed the search, the query was passed on to a separate search engine
- Based on textual similarity with faceting
- Started out with a basic bag-of-words approach and cosine similarity
- By the time we finished, some of our projects were using word2vec for static embeddings
- Semantic text analysis
- In one case, we needed to come up with a measure of “Catholic-ness” for a foundation
- Two kinds of projects there:
- Semantic similarity
- Identifying relevant charities for foundations
- Anomaly detection
- Search
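
The bag-of-words and cosine similarity starting point mentioned in these notes can be illustrated with a minimal sketch. This is not the original Open990 code: the function names (`bag_of_words`, `cosine_similarity`, `rank`) and the sample charity descriptions are hypothetical, and the tokenizer is a deliberately simple assumption. It only shows the general technique of scoring documents against a query by the cosine of their word-count vectors.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, name) pairs sorted from most to least similar."""
    query_vector = bag_of_words(query)
    scored = [
        (cosine_similarity(query_vector, bag_of_words(text)), name)
        for name, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical charity descriptions, invented for illustration only.
charities = {
    "Riverside Food Pantry": "food pantry serving meals to families",
    "City Animal Shelter": "shelter and adoption services for dogs and cats",
    "Youth Literacy Project": "after school reading and literacy tutoring",
}

results = rank("food for families", charities)
for score, name in results:
    print(f"{score:.3f}  {name}")
```

Raw counts reward any shared word, including filler such as "for", which is one reason a baseline like this tends to be replaced by weighted or embedding-based representations such as the word2vec vectors noted above.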