Data-Driven Methods for Large-Scale Knowledge Graph Construction
• Missing isA relations hurt the understanding of the concepts of entities
• Is Lincoln Zephyr a car?
Solution idea: CF-based missing isA inference
• User-based collaborative filtering!
Why Knowledge Graphs?
• Understanding the semantics of text needs background knowledge
• A robot brain needs a knowledge base to understand the world
• Yago, WordNet, FreeBase, Probase, NELL, CYC, DBPedia…
Probase
• A web-scale taxonomy derived from web pages by Hearst linguistic patterns
Hearst pattern:
• “…famous basketball players such as Michael Jordan …”
• domestic animals such as cats and dogs ...
• China is a developing country.
• Life is a box of chocolate.
• 10M concepts, and 16M isA relations
Quality: Wrong data
• Graph structure based correction
Completion
Quality: Missing data
• Collaborative filtering based completion
• Transitivity inference based completion
Jiaqing Liang, Yi Zhang, Yanghua Xiao*, Haixun Wang, Wei Wang and Pinpin Zhu, On the Transitivity of Hypernym-hyponym Relations in Data-Driven Lexical Taxonomies, (AAAI 2017)
• Dynamically tuning k for the top-k selection
• Build a regression model
• Noisy-or model amplifying the weak signals
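The slides name a noisy-or model but do not give its formula. A minimal sketch, assuming the standard noisy-or form 1 - ∏(1 - p_i) and assuming each p_i is the confidence contributed by one piece of evidence (the function name and the sample values are illustrative, not from the talk):

```python
from math import prod

def noisy_or(probabilities):
    """Standard noisy-or: probability that at least one independent signal fires."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Three weak signals (hypothetical values), each well below 0.5 on its own.
weak_signals = [0.2, 0.3, 0.25]
combined = noisy_or(weak_signals)
print(round(combined, 2))  # 0.58
```

Several weak signals combine into a stronger one, which is the amplification effect the bullet refers to.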
• Efficiency
• How to reduce the quadratic complexity of pairwise similarity computation?
• A knowledge graph is a large-scale semantic network consisting of entities and concepts as well as the semantic relationships among them
• Higher coverage over entities and concepts
• Richer semantic relationships
• Usually organized as RDF
• Quality assurance by crowdsourcing
• First find c’s synonyms and siblings
• Then we transport their hypernyms to c
Idea: if most similar terms of c have h as the hypernym, c is likely to have the hypernym h.
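The idea can be sketched as user-based collaborative filtering over a toy taxonomy. This is only an illustration under stated assumptions: Jaccard similarity over hypernym sets, a fixed k, and similarity-weighted voting are choices made here, not the metric, top-k tuning, or scoring model used in Probase+.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def infer_missing_hypernyms(taxonomy, concept, k=2):
    """Rank hypernyms held by the k most similar concepts but missing from `concept`."""
    own = taxonomy[concept]
    scored = [(jaccard(own, hs), other)
              for other, hs in taxonomy.items() if other != concept]
    # Keep only concepts that share at least one hypernym, most similar first.
    neighbours = sorted((p for p in scored if p[0] > 0), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for h in taxonomy[other] - own:
            scores[h] += sim  # similarity-weighted vote (assumed scoring rule)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data (hypothetical): concept -> set of known hypernyms.
taxonomy = {
    "car": {"vehicle", "wheelbase vehicle", "product"},
    "automobile": {"vehicle", "product"},
    "truck": {"vehicle", "wheelbase vehicle"},
    "cat": {"animal", "pet"},
}
print(infer_missing_hypernyms(taxonomy, "automobile"))
```

Here "automobile" inherits "wheelbase vehicle" as a candidate hypernym because its most similar concepts, "car" and "truck", both have it.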
Pipeline of KG construction
Extraction
• End-to-end • Domain specific
Cost: Costly Human Efforts
Correction
Quality: Wrong data
• Graph structure based correction
• Auto-constructed knowledge graph
• Automatically extracted from huge web corpus
• Examples: Probase, WikiTaxonomy, etc.
• Size: Huge (from huge corpus)
• Quality: Good (the accuracy can’t reach 100%)
• NP such as NP, NP, ..., and|or NP
• such NP as NP,* or|and NP
• NP, NP*, or other NP
• NP, NP*, and other NP
• NP, including NP,* or|and NP
• NP, especially NP,* or|and NP
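To make the first pattern concrete, here is a rough sketch of "NP such as NP, ..., and|or NP" extraction with a regular expression. It assumes a noun phrase can be approximated by the one or two words before "such as"; a real extractor would use a noun-phrase chunker, and the function name `extract_isa` is hypothetical.

```python
import re

# Crude approximation: the hypernym is the one or two words before "such as".
SUCH_AS = re.compile(r"\b(\w+(?: \w+)?) such as ([^.;]+)")
# Separators between listed hyponyms: ", ", ", and ", ", or ", " and ", " or ".
SEPARATOR = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for item in SEPARATOR.split(match.group(2)):
            item = item.strip()
            if item:
                pairs.append((item, hypernym))
    return pairs

print(extract_isa("domestic animals such as cats and dogs"))
print(extract_isa("famous basketball players such as Michael Jordan"))
```

A sentence without the pattern, such as "China is a developing country.", yields no pairs here; it would need a separate "is a" pattern.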
Data Driven Approaches for Large-scale Knowledge Graph Construction
Yanghua Xiao, Fudan University
Knowledge Works at Fudan (kw.fudan.edu.cn)
Knowledge Graph
• Because of the huge size, there are many wrong facts
Missing isA relationships in Probase
• “car” and “automobile” are synonyms
• They should share hypernyms
• “automobile” should be a “wheelbase vehicle”
Data Driven vs Hand Crafted
• Manually constructed knowledge graph
• Examples: WordNet, Cyc
• Size: Small (huge human cost)
• Quality: Almost perfect (each relation is checked by experts)
Problems to be solved
• Effectiveness
• Sparsity: How to design an effective similarity metric?
• Weight aware: How to estimate a frequency for the new isA relation?
• Diversity: How to select the final hypernyms?
• Hypernyms --- Items
• Concepts --- Users
• Synonyms or Siblings --- Similar users
• Concepts with similar meanings tend to share hypernyms/hyponyms in an isA taxonomy
• To find missing hypernyms for a concept c
Jiaqing Liang, Yanghua Xiao, et al., Probase+: Inferring Missing Links in Conceptual Taxonomies, to be published in TKDE 2017
Motivation
• We can use transitivity to find many missing isA relations
• Example 1
• But it is not trivial: there are wrong cases
• Example 2 & 3
In human-crafted taxonomies, transitivity in a lexical taxonomy is taken for granted, that is, given hyponym(A, B) and hyponym(B, C), we know hyponym(A, C) (Sang 2007), as shown in Example 1. Transitivity is thus one of the cornerstones in knowledge-based inferencing, and many applications rely on transitivity (e.g., finding all the super concepts of an instance).

Example 1 Is Einstein a scientist?
hyponym(einstein, physicist)
hyponym(physicist, scientist)
⇒ hyponym(einstein, scientist)

Unfortunately, transitivity does not always hold in data-driven lexical taxonomies. Let us consider the following two examples:

Example 2 Is Einstein a profession?
hyponym(einstein, scientist)
hyponym(scientist, profession)
⇏ hyponym(einstein, profession)

Example 3 Is a car seat a piece of furniture?
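The examples can be reproduced with a naive one-step composition of isA edges. This sketch only generates candidates; it assumes edges are (hyponym, hypernym) pairs and deliberately does not implement the model that decides which candidates to accept. The function name is illustrative.

```python
def transitive_candidates(edges):
    """Return pairs (a, c) implied by hyponym(a, b) and hyponym(b, c) but absent from `edges`."""
    edges = set(edges)
    hypernyms = {}
    for hypo, hyper in edges:
        hypernyms.setdefault(hypo, set()).add(hyper)
    candidates = set()
    for a, parents in hypernyms.items():
        for b in parents:
            for c in hypernyms.get(b, ()):
                if c != a and (a, c) not in edges:
                    candidates.add((a, c))
    return candidates

# Example 1: the inferred edge is correct.
print(transitive_candidates({("einstein", "physicist"), ("physicist", "scientist")}))
# Example 2: the same rule yields a wrong edge, so candidates need to be verified.
print(transitive_candidates({("einstein", "scientist"), ("scientist", "profession")}))
```

Both calls apply the same rule, yet only the first result is a valid isA relation, which is why blind transitive closure is unsafe on data-driven taxonomies.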
• Upper-bound pruning
Results
• Recover 5.1M missing edges, with precision 87% and recall 80%
• Probase+ has accuracy 91%
Precision and recall
Case study