Python 2.7 - NLTK - How to use NER
How can I invoke NLTK's NER on the first two hundred characters of each of the .txt files located in the same directory?
When I try this code:
for filename in os.listdir(ebooksfolder):
    fname, fextension = os.path.splitext(filename)
    if (fextension == '.txt'):
        newname = 'ner_' + filename
        file = open(ebooksfolder + '\\' + filename)
        rawfile = file.read()
        parttouse = rawfile[:50]
        segmentedsentences = nltk.sent_tokenize(parttouse)
        tokenizedsentences = [nltk.word_tokenize(sent) for sent in segmentedsentences]
        postaggedsentences = [nltk.pos_tag(sent) for sent in tokenizedsentences]
        nerresult = nltk.ne_chunk(postaggedsentences)
        pathtocopy = 'c:\\users\\felipe\\desktop\\books_txt\\'
        nametosave = os.path.join(pathtocopy, newname + '.txt')
        newfile = open(nametosave, 'w')
        newfile.write(nerresult)
        newfile.close()
I get these errors:
Traceback (most recent call last):
  File "<pyshell#77>", line 11, in <module>
    nerresult = nltk.ne_chunk(postaggedsentences)
  File "C:\Python27\lib\site-packages\nltk\chunk\__init__.py", line 177, in ne_chunk
    return chunker.parse(tagged_tokens)
  File "C:\Python27\lib\site-packages\nltk\chunk\named_entity.py", line 116, in parse
    tagged = self._tagger.tag(tokens)
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 58, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 78, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 554, in choose_tag
    featureset = self.feature_detector(tokens, index, history)
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 605, in feature_detector
    return self._feature_detector(tokens, index, history)
  File "C:\Python27\lib\site-packages\nltk\chunk\named_entity.py", line 49, in _feature_detector
    pos = simplify_pos(tokens[index][1])
  File "C:\Python27\lib\site-packages\nltk\chunk\named_entity.py", line 178, in simplify_pos
    if s.startswith('v'): return "v"
AttributeError: 'tuple' object has no attribute 'startswith'
Having tokenized the text into sentences and POS-tagged the tokens, you need to iterate over the list of tagged sentences, like so:
nerresult = [nltk.ne_chunk(pts) for pts in postaggedsentences]
instead of:
nerresult = nltk.ne_chunk(postaggedsentences)
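For completeness, here is a minimal sketch of the whole loop with that fix applied. It assumes the same ebooksfolder and output path as in the question and takes the first 200 characters as described; also note that nltk.ne_chunk returns a Tree per sentence, so each result is converted to a string before being written out, and the output name uses fname to avoid a doubled .txt extension:

import os
import nltk

ebooksfolder = 'c:\\users\\felipe\\desktop\\books_txt'   # assumed input folder
pathtocopy = 'c:\\users\\felipe\\desktop\\books_txt\\'   # output folder from the question

for filename in os.listdir(ebooksfolder):
    fname, fextension = os.path.splitext(filename)
    if fextension == '.txt':
        with open(os.path.join(ebooksfolder, filename)) as f:
            parttouse = f.read()[:200]
        segmentedsentences = nltk.sent_tokenize(parttouse)
        tokenizedsentences = [nltk.word_tokenize(sent) for sent in segmentedsentences]
        postaggedsentences = [nltk.pos_tag(sent) for sent in tokenizedsentences]
        # ne_chunk works on one POS-tagged sentence at a time
        nerresult = [nltk.ne_chunk(pts) for pts in postaggedsentences]
        # e.g. 'book.txt' -> 'ner_book.txt'
        nametosave = os.path.join(pathtocopy, 'ner_' + fname + '.txt')
        with open(nametosave, 'w') as out:
            # each chunked sentence is a Tree; convert it to a string before writing
            out.write('\n'.join(str(tree) for tree in nerresult))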