inquisitor

https://github.com/t-sin/inquisitor.git

git clone 'https://github.com/t-sin/inquisitor.git'

(ql:quickload :inquisitor)

An encoding and end-of-line detector, and a portable wrapper for external-formats, for Common Lisp.

The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible. – “The Library of Babel” by Jorge Luis Borges

Goal

Installation

Put the source in a directory that ASDF can find, then type in your REPL:

(asdf:load-system :inquisitor)

Usage

Detecting encoding

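To detect the encoding, use (inquisitor:detect-encoding stream scheme), where stream is an input stream of octets and scheme selects the language area to consider. For details on scheme, see Encoding scheme.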

For example:

(with-open-file (in "/path/to/utf8-lf.ja"
                 :direction :input
                 :element-type '(unsigned-byte 8))
  (inquisitor:detect-encoding in :jp))
; => :UTF8
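Detection has to work from raw octets, which is why the stream above is opened with element type (unsigned-byte 8). One common first check is for a byte-order mark (BOM) at the start of the data. The following is a minimal sketch of that check in standard Common Lisp; sniff-bom and its return keywords are hypothetical, and this is not necessarily how inquisitor implements detection.

```lisp
(defun sniff-bom (octets)
  "Return a keyword naming the byte-order mark at the start of OCTETS, or NIL.
Hypothetical helper for illustration; not part of inquisitor."
  (flet ((starts-with-p (&rest bytes)
           ;; True when OCTETS is long enough and begins with BYTES.
           (and (>= (length octets) (length bytes))
                (every #'= bytes (subseq octets 0 (length bytes))))))
    (cond ((starts-with-p #xEF #xBB #xBF) :utf-8-bom)  ; UTF-8 BOM
          ((starts-with-p #xFE #xFF) :utf-16be-bom)    ; big-endian UTF-16
          ((starts-with-p #xFF #xFE) :utf-16le-bom)    ; little-endian UTF-16 (also the start of a UTF-32LE BOM)
          (t nil))))                                   ; no BOM found
```

For instance, (sniff-bom (vector #xEF #xBB #xBF 65)) returns :UTF-8-BOM. A BOM is only a hint: most files carry none, so a real detector still needs further checks on the byte patterns.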

You can list the available encodings with inquisitor.names:available-encodings.

(inquisitor.names:available-encodings)
; => (:UTF8 :UCS-2LE :UCS-2BE :UTF16 :ISO-2022-JP :EUC-JP :CP932 :BIG5 :ISO-2022-TW
;     :GB2312 :GB18030 :ISO-2022-CN :EUC-KR :JOHAB :ISO-2022-KR :ISO-8859-6 :CP1256
;     :ISO-8859-7 :CP1253 :ISO-8859-8 :CP1255 :ISO-8859-9 :CP1254 :ISO-8859-5
;     :KOI8-R :KOI8-U :CP866 :CP1251 :ISO-8859-2 :CP1250 :ISO-8859-13 :CP1257)

Encoding scheme

A scheme selects the language area (such as :jp, used in the examples above) whose encodings are considered during detection. The supported schemes are as follows:

Detecting end-of-line type

(with-open-file (in "/path/to/utf8-lf.ja"
                 :direction :input
                 :element-type '(unsigned-byte 8))
  (inquisitor:detect-end-of-line in))
; => :LF
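In ASCII-compatible encodings, line endings can be recognised from the octets alone. A minimal sketch, assuming the input is an octet vector in such an encoding (UTF-16 data would need a different scan); guess-eol is hypothetical, not inquisitor's implementation, and the keywords :CRLF and :CR are assumed names, since only :LF appears above.

```lisp
(defun guess-eol (octets)
  "Return :LF, :CRLF or :CR for the first line ending in OCTETS, or NIL if none.
Hypothetical sketch; assumes an ASCII-compatible encoding."
  (let ((len (length octets)))
    (dotimes (i len nil)                 ; NIL when no line ending is found
      (case (aref octets i)
        (10 (return :lf))                ; bare LF
        (13 (return (if (and (< (1+ i) len)
                             (= (aref octets (1+ i)) 10))
                        :crlf            ; CR followed by LF
                        :cr)))))))       ; lone CR
```

Only the first line ending is inspected here; a more careful detector might count all endings and report the most frequent one.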

Getting the encoding name on your implementation

(inquisitor.names:name-on-impl :cp932)
; => #<ENCODING "CP932" :UNIX>  ; on CLISP
; => :WINDOWS-CP932  ; on ECL
; => :SHIFT_JIS  ; on SBCL
; => :WINDOWS-31J  ; on CCL
; => :|X-MS932_0213|  ; on ABCL
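The results above show why a portable name is useful: every implementation spells CP932 differently. Such a mapping can be resolved at read time with feature expressions. A minimal sketch covering only the CP932 names listed above; cp932-name-on-impl is hypothetical and not part of inquisitor, and CLISP is left out because it returns an encoding object rather than a keyword.

```lisp
(defun cp932-name-on-impl ()
  "Return this implementation's keyword for CP932 (values taken from the list above), or NIL.
Hypothetical sketch; not inquisitor's code."
  #+sbcl :shift_jis                 ; SBCL
  #+ccl :windows-31j                ; Clozure CL
  #+ecl :windows-cp932              ; ECL
  #+abcl :|X-MS932_0213|            ; ABCL
  #-(or sbcl ccl ecl abcl) nil)     ; anything else, including CLISP
```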

Checking whether end-of-line handling is available on your implementation

Use inquisitor.eol:eol-available-p.

Making an external-format in an implementation-independent way

(inquisitor:make-external-format
  :utf8  ; implementation-independent name of UTF-8
  :lf)   ; implementation-independent name of LF
; => :UTF-8  ; on SBCL
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxxx>  ; on CCL

Detecting and making an external-format automatically from a vector, stream, or pathname

For a vector (on CCL):

(inquisitor:detect-external-format
  (encode-string-to-octets "公的な捜索係、調査官がいる。
わたしは彼らが任務を遂行しているところを見た。")
  :jp)
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxx>

For a stream (on CCL):

(with-open-file (in "/path/to/utf8-lf.ja"
                 :direction :input
                 :element-type '(unsigned-byte 8))
  (inquisitor:detect-external-format in :jp))
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxx>

For a pathname (on CCL):

(inquisitor:detect-external-format #P"/path/to/utf8-lf.ja" :jp)
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxx>
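Whatever the input form, detection ultimately needs the file's raw octets. A minimal sketch of that first step in standard Common Lisp, assuming a fixed sample of at most 4096 octets is enough to decide; read-head-octets and the sample size are hypothetical and not part of inquisitor.

```lisp
(defun read-head-octets (pathname &optional (limit 4096))
  "Read at most LIMIT octets from the start of PATHNAME into a fresh vector.
Hypothetical helper; the default LIMIT of 4096 is an arbitrary assumption."
  (with-open-file (in pathname
                      :direction :input
                      :element-type '(unsigned-byte 8))  ; binary mode, no decoding
    (let* ((buffer (make-array limit :element-type '(unsigned-byte 8)))
           (end (read-sequence buffer in)))              ; number of octets actually read
      (subseq buffer 0 end))))                           ; trim when the file is shorter
```

The result is an octet vector like the one in the vector example above, so it should be usable as input to inquisitor:detect-external-format together with a scheme such as :jp.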

Author

Copyright

Copyright (c) 2000-2007 Shiro Kawai (shiro@acm.org)
Copyright (c) 2007 Masayuki Onjo (onjo@lispuser.net)
Copyright (c) 2011 zqwell (zqwell@gmail.com)
Copyright (c) 2015 gray (shinichi.tanaka45@gmail.com)

License

Licensed under the MIT License.