tagger

https://github.com/g000001/tagger.git

git clone 'https://github.com/g000001/tagger.git'

(ql:quickload :tagger)
4

This directory contains release 1.2 of the Xerox Part-of-Speech tagger. For more information, print the file doc/tagger/tagger.ps.


Until this project is added to the Quicklisp repository installation must be performed manually in several steps (considering that you've got Quicklisp installed already):

  1. Download project sources (either by cloning the repository or downloading it in an archive).
  2. Unpack them to some directory, remember it.
  3. cd to the ~/quicklisp/local-projects directory.
  4. Create symbolic links to the .asd files in the directory from step 2.

Now it is possible to download the application either in parts or entirely:

(ql:quickload "tagger")

When the loading is complete, you can run some simple queries:

(tag-analysis:tag-string "I saw the man on the hill with the telescope.")

I saw the man on the hill with the telescope.
ppss/2 vbd/3 at nn in at nn in/2 at nn/2

(The number following the tag is the arity of the ambiguity class assigned by the lexicon. Words without a number are unambiguous.)

Programmatic Tagging

To use the tagger in a program, create a tagging-ts and use the values of calls to the generic function next-token. Note that reinitialize-instance redirects tagging to a new text with minimal initialization overhead.

For example, the following function, my-tag-files, calls my-process-token-and-tag on each token/tag pair generated by tagging each le in the argument files:

(use-package :tdb)
(use-package :tag-analysis)
(defun my-tag-files (files)
  (let ((token-stream (make-instance 'tagging-ts)))
    (dolist (file files)
      (with-open-file (char-stream file)
    (reinitialize-instance token-stream :char-stream char-stream)
    (loop (multiple-value-bind (token tag)
          (next-token token-stream)
        (unless token (return))
        (my-process-token-and-tag token tag)))))))