https://github.com/g000001/tagger.git
git clone 'https://github.com/g000001/tagger.git'
(ql:quickload :tagger)
This directory contains release 1.2 of the Xerox Part-of-Speech tagger. For more information, print the file doc/tagger/tagger.ps.
Until this project is added to the Quicklisp repository, it must be installed manually in a few steps (assuming Quicklisp is already installed):
1. cd to the ~/quicklisp/local-projects directory.
2. Clone the repository: git clone 'https://github.com/g000001/tagger.git'
3. [...] .asd files in the directory from step 2.
The application can now be loaded, either in parts or as a whole:
(ql:quickload "tagger")
When the loading is complete, you can run some simple queries:
(tag-analysis:tag-string "I saw the man on the hill with the telescope.")
I saw the man on the hill with the telescope.
ppss/2 vbd/3 at nn in at nn in/2 at nn/2
(The number following the tag is the arity of the ambiguity class assigned by the lexicon. Words without a number are unambiguous.)
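The tag/arity notation above can be taken apart with a few lines of standard Common Lisp. This is only a sketch: split-on-spaces, parse-tag-field and parse-tag-line are hypothetical helpers, not part of the tagger, and treating an unnumbered tag as arity 1 is an assumption based on the note that such words are unambiguous.

```lisp
;; Hypothetical helpers; none of these names come from the tagger package.

(defun split-on-spaces (string)
  "Return the space-separated fields of STRING as a list of strings."
  (loop with start = 0
        for end = (position #\Space string :start start)
        for field = (subseq string start end)
        unless (string= field "") collect field
        while end
        do (setf start (1+ end))))

(defun parse-tag-field (field)
  "Split a field such as \"vbd/3\" into (TAG . ARITY).
ARITY defaults to 1 when no number follows the tag (an assumption)."
  (let ((slash (position #\/ field :from-end t)))
    (if slash
        (cons (subseq field 0 slash)
              (parse-integer field :start (1+ slash)))
        (cons field 1))))

(defun parse-tag-line (line)
  "Parse a whole line of tagger output into a list of (TAG . ARITY) pairs."
  (mapcar #'parse-tag-field (split-on-spaces line)))

(parse-tag-line "ppss/2 vbd/3 at nn in at nn in/2 at nn/2")
;; => (("ppss" . 2) ("vbd" . 3) ("at" . 1) ("nn" . 1) ("in" . 1)
;;     ("at" . 1) ("nn" . 1) ("in" . 2) ("at" . 1) ("nn" . 2))
```

Read this way, "saw" (vbd/3) is the most ambiguous word in the sentence, with three candidate tags in the lexicon.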
To use the tagger in a program, create a tagging-ts and use the values of calls to the generic function next-token. Note that reinitialize-instance redirects tagging to a new text with minimal initialization overhead.
For example, the following function, my-tag-files, calls my-process-token-and-tag on each token/tag pair generated by tagging each file in the argument files:
(use-package :tdb)
(use-package :tag-analysis)
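(defun my-tag-files (files)
  ;; One token stream is created and reused for every file.
  (let ((token-stream (make-instance 'tagging-ts)))
    (dolist (file files)
      (with-open-file (char-stream file)
        ;; Redirect the existing stream to the new text.
        (reinitialize-instance token-stream :char-stream char-stream)
        ;; next-token returns the token and its tag; stop when no token is left.
        (loop (multiple-value-bind (token tag)
                  (next-token token-stream)
                (unless token (return))
                (my-process-token-and-tag token tag)))))))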
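The function my-process-token-and-tag is left for the caller to define. As one possible illustration, the sketch below counts how often each tag occurs. The names *tag-counts* and tag-counts-sorted are hypothetical, and because the representation of tags (string, symbol, or something else) is not stated here, the tag is converted to its printed form before counting; that conversion is an assumption, not documented tagger behaviour.

```lisp
;; Hypothetical example of a token/tag consumer; only the function name
;; my-process-token-and-tag comes from the text above.

(defvar *tag-counts* (make-hash-table :test #'equal)
  "Maps the printed form of a tag to the number of tokens seen with it.")

(defun my-process-token-and-tag (token tag)
  "Count one occurrence of TAG. TOKEN is ignored in this sketch."
  (declare (ignore token))
  ;; princ-to-string is used because the tag's type is not known here.
  (incf (gethash (princ-to-string tag) *tag-counts* 0)))

(defun tag-counts-sorted ()
  "Return a list of (TAG . COUNT) pairs, most frequent tag first."
  (let ((pairs '()))
    (maphash (lambda (tag count) (push (cons tag count) pairs))
             *tag-counts*)
    (sort pairs #'> :key #'cdr)))
```

After calling my-tag-files on a list of files, (tag-counts-sorted) would give a simple tag frequency table; call (clrhash *tag-counts*) first to start from an empty table.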