Abstract: |
The task of recommending content to professionals (such as attorneys or brokers) differs greatly from the task of recommending news to casual readers. A casual reader may be satisfied with a couple of good recommendations, whereas an attorney will demand precise and comprehensive recommendations from various content sources when conducting legal research. Legal documents are intrinsically complex and multi-topical, contain carefully crafted, professional, domain-specific language, and cover a broad and unevenly distributed range of issues. Consequently, a high-quality content recommendation system for legal documents must be able to detect the significant topics in a document and recommend high-quality content accordingly. Moreover, a litigation attorney preparing for a case needs to be thoroughly familiar not only with the principal arguments associated with various supporting opinions, but also with the secondary and tertiary arguments. This paper introduces an issue-based content recommendation system with a built-in topic detection/segmentation algorithm for the legal domain. The system leverages existing legal document metadata, such as topical classifications, document citations, and clickstream data from user behavior databases, to produce an accurate topic detection algorithm. It then links each detected topic to a comprehensive, predefined topic (cluster) repository via an association process. A cluster labeling algorithm is designed and applied to provide a precise, meaningful label for each cluster in the repository, and each cluster is also populated with member documents from across different content types. This system has been applied successfully to very large collections of legal documents, O(100M), which include judicial opinions, statutes, regulations, court briefs, and analytical documents.
Extensive evaluations were conducted to determine the efficiency and effectiveness of the algorithms in topic detection, cluster association, and cluster labeling. Subsequent evaluations conducted by legal domain experts have demonstrated that the quality of the resulting recommendations across different content types is close to that of recommendations created by human experts. |