Write a program to create a dataframe df_nda ndArray that stores letters and words starting from 'g' to 'p'. (The first column stores letter and the second column stores the words starting with that letter.)
Answers
Answer:
Back in elementary school you learnt the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will see, they arise from simple analysis of the distribution of words in text. The goal of this chapter is to answer the following questions:
What are lexical categories and how are they used in natural language processing?
What is a good Python data structure for storing words and their categories?
How can we automatically tag each word of a text with its word class?
Along the way, we'll cover some fundamental techniques in NLP, including sequence labeling, n-gram models, backoff, and evaluation. These techniques are useful in many areas, and tagging gives us a simple context in which to present them. We will also see how tagging is the second step in the typical NLP pipeline, following tokenization.
The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset. Our emphasis in this chapter is on exploiting tags, and tagging text automatically.
5.1 Using a Tagger
A part-of-speech tagger, or POS-tagger, processes a sequence of words, and attaches a part of speech tag to each word (don't forget to import nltk):
>>> text = nltk.word_tokenize("And now for something completely different") >>> nltk.pos_tag(text) [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]
Here we see that and is CC, a coordinating conjunction; now and completely are RB, or adverbs; for is IN, a preposition; something is NN, a noun; and different is JJ, an adjective.
Note
NLTK provides documentation for each tag, which can be queried using the tag, e.g. nltk.help.upenn_tagset('RB'), or a regular expression, e.g. nltk.help.upenn_brown_tagset('NN.*'). Some corpora have README files with tagset documentation, see nltk.corpus.???.readme(), substituting in the name of the corpus.
Let's look at another example, this time including some homonyms:
>>> text = nltk.word_tokenize("They refuse to permit us to obtain the refuse permit") >>> nltk.pos_tag(text) [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]
Notice that refuse and permit both appear as a present tense verb (VBP) and a noun (NN). E.g. refUSE is a verb meaning "deny," while REFuse is a noun meaning "trash" (i.e. they are not homophones). Thus, we need to know which word is being used in order to pronounce the text correctly. (For this reason, text-to-speech systems usually perform POS-tagging.)
Note
Your Turn: Many words, like ski and race, can be used as nouns or verbs with no difference in pronunciation. Can you think of others? Hint: think of a commonplace object and try to put the word to before it to see if it can also be a verb, or think of an action and try to put the before it to see if it can also be a noun. Now make up a sentence with both uses of this word, and run the POS-tagger on this sentence.
Lexical categories like "noun" and part-of-speech tags like NN seem to have their uses, but the details will be obscure to many readers. You might wonder what justification there is for introducing this extra level of information. Many of these categories arise from superficial analysis the distribution of words in text. Consider the following analysis involving woman (a noun), bought (a verb), over (a preposition), and the (a determiner). The text.similar() method takes a word w, finds all contexts w1w w2, then finds all words w' that appear in the same context, i.e. w1w'w2.
>>> text = nltk.Text(word.lower() for word in nltk.corpus.brown.words()) >>> text.similar('woman') Building word-context index... man time day year car moment world family house country child boy state job way war girl place room word >>> text.similar('bought') made said put done seen had found left given heard brought got been was set told took in felt that >>> text.similar('over') in on to of and for with from at by that into as up out down through is all about >>> text.similar('the') a his this their its her an that our any all one these my in your no some other and
Observe that searching for woman finds nouns; searching for bought mostly finds verbs; searching for over generally finds prepositions; searching for the finds several determiners. A tagger can correctly identify the tags on these words in the context of a sentence, e.g. The woman bought over $150,000 worth of clothes.
A tagger can also model our knowledge of unknown words, e.g. we can guess that scrobbling is probably a verb, with the root scrobble, and likely to occur in contexts like he was scrobbling.
⤵️⤵️
Imagine that for three years, your other half had been hiding more than 58,60000.00 Rupees from you. Could you do the same thing? Jyoti, a mother of two from Indore, put her marriage and her family in jeopardy by keeping a shocking secret from her husband for three years. Jyoti and Avinash Mehta got married in 2004 after a few years of dating. Avinash Mehta is an engineer, and Jyoti works at the reception of a local clinic. After the birth of their first child in 2008, they decided that Jyoti would give up her job to stay home and look after him. Productive sectors of the economy were hit hard by the global financial crisis in 2008. Fortunately, Avinash Mehta didn't lose his job, but he had to work overtime and decrease his salary significantly in order to help his company survive. "My husband could take care of our family's basic needs, but I could see how exhausted his work was making him. We tried to save as much as possible, but we still didn't have enough money" told Jyoti. "