https://github.com/t-sin/inquisitor.git
git clone 'https://github.com/t-sin/inquisitor.git'
(ql:quickload :inquisitor)
Encoding/end-of-line detector and external-format wrapper for Common Lisp.
The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible. – “The Library of Babel” by Jorge Luis Borges
Put the library somewhere on your ASDF path and evaluate this in your REPL:
(require :inquisitor)
To detect the encoding of a stream, use (inquisitor:detect-encoding stream scheme). For the scheme argument, see Encoding scheme.
For example:
(with-open-file (in "/path/to/utf8-lf.ja"
                    :direction :input
                    :element-type '(unsigned-byte 8))
  (inquisitor:detect-encoding in :jp))
; => :UTF8
You can list the available encodings with inquisitor.names:available-encodings.
(inquisitor.names:available-encodings)
; => (:UTF8 :UCS-2LE :UCS-2BE :UTF16 :ISO-2022-JP :EUC-JP :CP932 :BIG5 :ISO-2022-TW
; :GB2312 :GB18030 :ISO-2022-CN :EUC-KR :JOHAB :ISO-2022-KR :ISO-8859-6 :CP1256
; :ISO-8859-7 :CP1253 :ISO-8859-8 :CP1255 :ISO-8859-9 :CP1254 :ISO-8859-5
; :KOI8-R :KOI8-U :CP866 :CP1251 :ISO-8859-2 :CP1250 :ISO-8859-13 :CP1257)
A scheme is the language-speaking world (for example :jp, as used above) that narrows down the candidate encodings during detection. Supported schemes are as follows:
(with-open-file (in "/path/to/utf8-lf.ja"
                    :direction :input
                    :element-type '(unsigned-byte 8))
  (inquisitor:detect-end-of-line in))
; => :LF
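To make the idea of end-of-line detection concrete, here is a minimal sketch in plain Common Lisp. It is not inquisitor's implementation: guess-eol is a hypothetical helper, the :cr and :crlf keywords are assumed by analogy with the :LF result above, and the byte scan only makes sense for ASCII-compatible encodings (in UTF-16, for instance, the bytes 10 and 13 can occur inside other characters).

```lisp
;; Illustrative sketch only; GUESS-EOL is a hypothetical helper,
;; not part of inquisitor.
(defun guess-eol (octets)
  "Return :CRLF, :CR, or :LF for the first line break found in OCTETS
(a sequence of (unsigned-byte 8)), or NIL when there is none."
  ;; Find the first LF (10) or CR (13) octet.
  (let ((pos (position-if (lambda (b) (or (= b 10) (= b 13))) octets)))
    (cond ((null pos) nil)
          ((= (elt octets pos) 10) :lf)
          ;; A CR immediately followed by LF is a CRLF pair.
          ((and (< (1+ pos) (length octets))
                (= (elt octets (1+ pos)) 10))
           :crlf)
          (t :cr))))
```

Only the first line break is inspected, so a file with mixed line endings is classified by whichever ending appears first.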
(inquisitor.names:name-on-impl :cp932)
; => #<ENCODING "CP932" :UNIX> ; on CLISP
; => :WINDOWS-CP932 ; on ECL
; => :SHIFT_JIS ; on SBCL
; => :WINDOWS-31J ; on CCL
; => :|X-MS932_0213| ; on ABCL
Use inquisitor.eol:eol-available-p.
(inquisitor:make-external-format
:utf8 ; implementation independent name of UTF-8
:lf) ; implementation independent name of LF
; => :UTF-8 ; on SBCL
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxxx> ; on CCL
In case of vector (on CCL):
(inquisitor:detect-external-format
(encode-string-to-octets "公的な捜索係、調査官がいる。
わたしは彼らが任務を遂行しているところを見た。")
:jp)
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxx>
In case of stream (on CCL):
(with-open-file (in "/path/to/utf8-lf.ja"
                    :direction :input
                    :element-type '(unsigned-byte 8))
  (inquisitor:detect-external-format in :jp))
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxx>
In case of pathname (on CCL):
(inquisitor:detect-external-format #P"/path/to/utf8-lf.ja" :jp)
; => #<EXTERNAL-FORMAT :UTF-8/:UNIX #xxxxxxxxxx>
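A detected external format is typically handed straight back to the implementation when reopening the file as a character stream. The sketch below assumes that the value returned by inquisitor:detect-external-format is accepted by the :external-format argument of with-open-file on your implementation; read-lines-with-format is a hypothetical helper, not part of inquisitor.

```lisp
;; READ-LINES-WITH-FORMAT is a hypothetical helper, not part of inquisitor.
(defun read-lines-with-format (path external-format)
  "Open PATH as a character stream using EXTERNAL-FORMAT and
return its lines as a list of strings."
  (with-open-file (in path :direction :input
                           :external-format external-format)
    ;; READ-LINE returns NIL at end of file, which ends the loop.
    (loop for line = (read-line in nil nil)
          while line
          collect line)))

;; Assumed usage, with inquisitor loaded:
;; (let ((path #P"/path/to/utf8-lf.ja"))
;;   (read-lines-with-format
;;    path
;;    (inquisitor:detect-external-format path :jp)))
```

Taking the external format as a parameter keeps the helper usable with any value the implementation understands, including :default.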
Copyright (c) 2000-2007 Shiro Kawai (shiro@acm.org)
Copyright (c) 2007 Masayuki Onjo (onjo@lispuser.net)
Copyright (c) 2011 zqwell (zqwell@gmail.com)
Copyright (c) 2015 gray (shinichi.tanaka45@gmail.com)
Licensed under the MIT License.