Python 2.7 - NLTK - How to use NER


How can I run NLTK's NER on the first 200 characters of each of the .txt files located in the same directory?


When I try this code:

for filename in os.listdir(ebooksfolder):
    fname, fextension = os.path.splitext(filename)
    if (fextension == '.txt'):
        newname = 'ner_' + filename
        file = open(ebooksfolder + '\\' + filename)
        rawfile = file.read()
        parttouse = rawfile[:50]
        segmentedsentences = nltk.sent_tokenize(parttouse)
        tokenizedsentences = [nltk.word_tokenize(sent) for sent in segmentedsentences]
        postaggedsentences = [nltk.pos_tag(sent) for sent in tokenizedsentences]
        nerresult = nltk.ne_chunk(postaggedsentences)
        pathtocopy = 'c:\\users\\felipe\\desktop\\books_txt\\'
        nametosave = os.path.join(pathtocopy, newname + '.txt')
        newfile = open(nametosave, 'w')
        newfile.write(nerresult)
        newfile.close()

I get these errors:

Traceback (most recent call last):
  File "<pyshell#77>", line 11, in <module>
    nerresult = nltk.ne_chunk(postaggedsentences)
  File "c:\python27\lib\site-packages\nltk\chunk\__init__.py", line 177, in ne_chunk
    return chunker.parse(tagged_tokens)
  File "c:\python27\lib\site-packages\nltk\chunk\named_entity.py", line 116, in parse
    tagged = self._tagger.tag(tokens)
  File "c:\python27\lib\site-packages\nltk\tag\sequential.py", line 58, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "c:\python27\lib\site-packages\nltk\tag\sequential.py", line 78, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "c:\python27\lib\site-packages\nltk\tag\sequential.py", line 554, in choose_tag
    featureset = self.feature_detector(tokens, index, history)
  File "c:\python27\lib\site-packages\nltk\tag\sequential.py", line 605, in feature_detector
    return self._feature_detector(tokens, index, history)
  File "c:\python27\lib\site-packages\nltk\chunk\named_entity.py", line 49, in _feature_detector
    pos = simplify_pos(tokens[index][1])
  File "c:\python27\lib\site-packages\nltk\chunk\named_entity.py", line 178, in simplify_pos
    if s.startswith('V'): return "V"
AttributeError: 'tuple' object has no attribute 'startswith'

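After tokenizing the text into sentences and POS-tagging them, you have a list of tagged sentences, but nltk.ne_chunk expects a single tagged sentence, that is, a list of (word, tag) tuples. Given the whole list, it treats each sentence as one token and takes that sentence's second (word, tag) pair for the POS tag string, which is why startswith ends up being called on a tuple. You need to iterate over the list of tagged sentences, like so: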

nerresult = [nltk.ne_chunk(pts) for pts in postaggedsentences]

instead of:

nerresult = nltk.ne_chunk(postaggedsentences) 
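
Putting the fix back into the loop from the question, a minimal sketch could look like the following. It is not a definitive implementation: the function names, the `limit` and `nlp` parameters, and the UTF-8 encoding are assumptions made for illustration. The `nlp` argument only exists so that the pipeline can be tried with a stand-in object; when it is left as None, the real nltk module is imported, and the models it needs must already be installed through nltk.download(). Note that once nerresult is a list of trees, newfile.write(nerresult) will also fail, because write() needs a string, so the sketch converts each tree to text first.

```python
import io
import os


def first_chars_of_txt_files(folder, limit=200):
    """Yield (filename, first `limit` characters) for each .txt file in `folder`."""
    for filename in sorted(os.listdir(folder)):
        if os.path.splitext(filename)[1] == '.txt':
            # os.path.join avoids hard-coding '\\'; utf-8 is an assumption.
            with io.open(os.path.join(folder, filename), encoding='utf-8') as f:
                yield filename, f.read(limit)


def ner_per_sentence(text, nlp=None):
    """Return one chunked result per sentence found in `text`."""
    if nlp is None:
        import nltk as nlp  # lazy import; the NLTK models must be downloaded
    sentences = nlp.sent_tokenize(text)
    tokenized = [nlp.word_tokenize(sent) for sent in sentences]
    tagged = [nlp.pos_tag(tokens) for tokens in tokenized]
    # ne_chunk takes ONE tagged sentence, so call it once per sentence.
    return [nlp.ne_chunk(pts) for pts in tagged]


def write_ner_files(src_folder, dst_folder, limit=200, nlp=None):
    """Write 'ner_<name>.txt' into `dst_folder` for each .txt in `src_folder`."""
    for filename, text in first_chars_of_txt_files(src_folder, limit):
        trees = ner_per_sentence(text, nlp)
        out_path = os.path.join(dst_folder, 'ner_' + filename)
        with io.open(out_path, 'w', encoding='utf-8') as out:
            # write() needs text, not a list of trees, so stringify each one.
            out.write(u'\n'.join(u'%s' % (tree,) for tree in trees))
```

With the variables from the question, the call would be write_ner_files(ebooksfolder, pathtocopy). Each output file then holds one bracketed tree per sentence, one after another.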
