Introduction to Natural Language Processing
You’re invited to our dinner party, Friday May 27 at 8:30
Party May 27 add
Dan Jurafsky
Ambiguity makes NLP hard: “Crash blossoms”
• Violinist Linked to JAL Crash Blossoms
• Teacher Strikes Idle Kids
• Red Tape Holds Up New Bridges
• Hospitals Are Sued by 7 Foot Doctors
• Juvenile Court to Try Shooting Defendant
• Local High School Dropouts Cut in Half
Information Extraction
Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris
Create new Calendar entry
Machine Translation
• Fully automatic
Enter Source Text:
• Helping human translators
这 不过 是 一 个 时间 的 问题 .
Translation from Stanford’s Phrasal:
Word sense disambiguation (WSD)
I need new batteries for my mouse.
Colorless green ideas sleep furiously.
Parsing
I can see Alcatraz from the window!
Summarization
The Dow Jones is up
The S&P500 jumped
Housing prices rose
Economy is good
Named entity recognition (NER)
PERSON ORG LOC
Machine translation (MT)
第13届上海国际电影节开幕… The 13th Shanghai International Film Festival…
Carter told Mubarak he shouldn’t run again.
Paraphrase
XYZ acquired ABC yesterday
ABC has been taken over by XYZ
Part-of-speech (POS) tagging
ADJ ADJ NOUN VERB
• For practical, robust real-world applications
Skills you’ll need
• Simple linear algebra (vectors, matrices)
• Basic probability theory
• Java or Python programming
Ambiguity is pervasive
New York Times headline (17 May 2000)
Fed raises interest rates 0.5%
Introduction to NLP
What is Natural Language Processing?
Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!
WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL
neologisms
unfriend Retweet bromance
world knowledge
Mary and Sue are sisters. Mary and Sue are mothers.
tricky entity names
Where is A Bug’s Life playing …
Let It Be was recorded …
… a mutation on the for gene …
• Weekly programming assignments
In-video quizzes!
• Most lectures will include a little quiz
• Just to check basic understanding
• Simple, multiple-choice
• You can retake them if you get them wrong
Einstein met with UN officials in Princeton
Dialog
Where is Citizen Kane playing in SF?
Castro Theatre at 7:30. Do you want a ticket?
Information extraction (IE)
mostly solved
Spam detection
Let’s go to Agra!
Buy V1AGRA …
Best roast chicken in San Francisco!
The waiter ignored us for 20 minutes.
✓ ✗
ADV
Coreference resolution
Why else is natural language understanding difficult?
segmentation issues
the New York-New Haven Railroad the New York-New Haven Railroad
non-standard English
• How we generally do this:
• probabilistic models built from language data
• P(“maison” → “house”) high
• P(“L’avocat général” → “the general avocado”) low
• Luckily, rough text features can often do half the job.
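The idea of a probabilistic model built from language data can be sketched with a relative-frequency estimate over a handful of aligned phrase pairs. The pairs and counts below are invented for illustration, as is the rendering "the attorney general"; the function name `translation_prob` is hypothetical, and real systems estimate these values from large parallel corpora with far more careful alignment and smoothing.

```python
from collections import Counter

# Hypothetical toy data: (French phrase, English phrase) pairs as they might
# be read off aligned text. The counts are invented for illustration only.
aligned_pairs = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("l'avocat général", "the attorney general"),
    ("l'avocat général", "the attorney general"),
]

pair_counts = Counter(aligned_pairs)
source_counts = Counter(src for src, _ in aligned_pairs)


def translation_prob(src, tgt):
    """Relative-frequency estimate of P(tgt | src); 0.0 if src was never seen."""
    if source_counts[src] == 0:
        return 0.0
    return pair_counts[(src, tgt)] / source_counts[src]


high = translation_prob("maison", "house")                          # 3 of 4 pairs
low = translation_prob("l'avocat général", "the general avocado")   # never observed
```

With this toy data the first estimate is 0.75 and the second is 0.0, which mirrors the high/low contrast on the slide. An unsmoothed estimate like this assigns zero to anything unseen, which is one reason practical models add smoothing.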
Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
idioms
• dark horse
• get cold feet
• lose face
• throw in the towel
Information Extraction & Sentiment Analysis
Attributes: zoom, affordability, size and weight, flash, ease of use
Size and weight
✓ nice and compact to carry!
✓ since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!
✗ the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera
This is only a matter of time.
Language Technology
making good progress
Sentiment analysis
still really hard
Question answering (QA)
Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
Bram Stoker
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
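As a rough illustration of how the calendar fields could be pulled out of the scheduling email, here is a minimal sketch using regular expressions. The function name `extract_meeting` and both patterns are assumptions written for this one message shape; this is not the method used by any particular mail client, and a real extractor would need to handle far more variation.

```python
import re
from datetime import date, timedelta


def extract_meeting(body, sent_on):
    """Pull location, date, and time range out of a short scheduling email.

    The patterns are hand-written for this single message and are an
    assumption for illustration, not a general-purpose extractor.
    """
    # A time range written as "from HH:MM-HH:MM".
    time_match = re.search(r"from (\d{1,2}:\d{2})-(\d{1,2}:\d{2})", body)
    # A room written as "in <Capitalized word> <number>".
    place_match = re.search(r"\bin ([A-Z][a-z]+ \d+)", body)
    # Resolve the relative expression "tomorrow" against the send date.
    meeting_date = sent_on + timedelta(days=1) if "tomorrow" in body else sent_on
    return {
        "where": place_match.group(1) if place_match else None,
        "date": meeting_date.isoformat(),
        "start": time_match.group(1) if time_match else None,
        "end": time_match.group(2) if time_match else None,
    }


email = (
    "Hi Dan, we've now scheduled the curriculum meeting. "
    "It will be in Gates 159 tomorrow from 10:00-11:30. -Chris"
)
entry = extract_meeting(email, date(2012, 1, 15))
```

Here "tomorrow" is resolved against the email's send date of January 15, 2012, giving January 16, 2012 for the meeting, consistent with the calendar entry on the slide.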
This class
• Teaches key theory and methods for statistical NLP:
• Viterbi
• Naïve Bayes, Maxent classifiers
• N-gram language modeling
• Statistical Parsing
• Inverted index, tf-idf, vector models of meaning
• Information extraction
• Spelling correction
• Information retrieval
• Sentiment analysis
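To give a feel for one of the listed methods, N-gram language modeling, here is a minimal sketch of an unsmoothed bigram model estimated by counting. The three-sentence corpus and the function names `train_bigram` and `bigram_prob` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and adjacent word pairs, with boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        # Every token except the final </s> serves as a context for the next word.
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def bigram_prob(context_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen context."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


# Hypothetical toy corpus, invented for illustration.
corpus = [
    "Fed raises interest rates",
    "Fed raises rates",
    "interest rates rose",
]
contexts, bigrams = train_bigram(corpus)
p_interest = bigram_prob(contexts, bigrams, "raises", "interest")
p_rates = bigram_prob(contexts, bigrams, "interest", "rates")
```

In this corpus "raises" is followed once by "interest" and once by "rates", so the first estimate is 0.5, while "interest" is always followed by "rates", so the second is 1.0.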
But that’s what makes it fun!
Making progress on this problem…
• The task is difficult! What tools do we need?
• Knowledge about language
• Knowledge about the world
• A way to combine knowledge sources