Perl and Unicode

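${^WIDE_SYSTEM_CALLS} is supported only up to perl 5.8.0.
With ${^WIDE_SYSTEM_CALLS} = 1:
  • By default, output goes to STDOUT encoded as UTF-8.
  • binmode(STDOUT, ":bytes") has the same effect.
  • With binmode(STDOUT, ":utf8"), the UTF-8 octets are treated as Latin-1 and get UTF-8 encoded a second time!
  • For binmode(STDOUT, ":encoding(euc-kr)") to work properly, every string to be output must first be marked with Encode::_utf8_on; only then is it encoded.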
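
The double-encoding effect can be reproduced without touching STDOUT. The following is a minimal sketch, assuming the strings in question are valid UTF-8 octets. The helper name `bytes_written` and the sample text are made up for illustration, and the sketch uses `Encode::decode` in place of `Encode::_utf8_on`: for valid UTF-8 input it produces the same kind of character string through the documented interface.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "한글" as raw UTF-8 octets (6 bytes, no character semantics yet).
my $octets = "\xED\x95\x9C\xEA\xB8\x80";

# Hypothetical helper: write $string through $layer into an in-memory
# buffer and return the bytes that actually came out.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $out, '>', \$buffer) or die "cannot open in-memory handle: $!";
    binmode($out, $layer);
    print {$out} $string;
    close($out) or die "cannot close in-memory handle: $!";
    return $buffer;
}

# Octets sent through :utf8 are taken as Latin-1 characters and encoded
# again, so every input byte turns into two output bytes.
my $double = bytes_written(':utf8', $octets);

# Decoding first gives a character string, which :utf8 encodes only once.
my $chars  = decode('UTF-8', $octets);
my $single = bytes_written(':utf8', $chars);

# The same character string can also go through :encoding(euc-kr).
my $euckr  = bytes_written(':encoding(euc-kr)', $chars);

printf "double: %d bytes, single: %d bytes, euc-kr: %d bytes\n",
    length($double), length($single), length($euckr);
```

The byte counts make the difference visible: the octet string doubles in size on the way out, while the decoded string round-trips to the original six bytes.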

With File::Find::find, the names passed in $_ are octets
(so binmode(STDOUT, ":encoding(euc-kr)") and the like have no effect on them as they are).
Perl's other native functions still need to be checked.
  • Note, however, that perl 5.8.0 performs no UTF-8 validation.
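
One way to deal with both points is to decode the octets explicitly and let Encode do the validation. This is only a sketch: the helper name `find_decoded` is hypothetical, and it assumes the names handed over by File::Find are UTF-8 octets, which depends on the platform and settings.

```perl
use strict;
use warnings;
use File::Find ();
use Encode qw(decode);

# Hypothetical helper: walk $dir and return two array refs, one with the
# names that decoded cleanly as UTF-8 (now character strings) and one with
# the raw names that failed validation.
sub find_decoded {
    my ($dir) = @_;
    my (@decoded, @invalid);
    File::Find::find(sub {
        return if $_ eq '.';    # File::Find also visits the top directory itself
        my $octets = $_;        # copy: decode with a CHECK value may modify its argument
        my $name = eval { decode('UTF-8', $octets, Encode::FB_CROAK()) };
        if (defined $name) { push @decoded, $name }
        else               { push @invalid, $_ }
    }, $dir);
    return (\@decoded, \@invalid);
}
```

The decoded names are character strings, so printing them to a handle with an `:encoding(...)` layer converts them as expected.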

  • In perl 5.8.0, glob and related functions do not support Unicode!
    readdir and related functions do. In other words, you have to handle file names yourself.

opendir(my $dh, '.') or die "cannot opendir .: $!";
# Skip the "." and ".." entries.
my @files = grep { !/^\.{1,2}\z/ } readdir $dh;
closedir($dh);
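
Since glob cannot be relied on, simple wildcard matching can be layered on top of readdir by hand. The sketch below is one possible approach, not an established API: `readdir_glob` is a made-up name, only `*` and `?` are supported, and it assumes readdir returns UTF-8 octets. Unlike a real glob, it also lets `*` match names that start with a dot.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the entries of $dir whose decoded names
# match a simple wildcard $pattern such as "*.txt".
sub readdir_glob {
    my ($dir, $pattern) = @_;

    # Translate the wildcard pattern into a regex, one character at a time.
    my $regex = join '', map {
        $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_)
    } split //, $pattern;

    opendir(my $dh, $dir) or die "cannot opendir $dir: $!";
    # Decode so that "?" matches one character rather than one byte.
    my @names = map { decode('UTF-8', $_) } readdir $dh;
    closedir($dh);

    my @matched = grep { !/^\.{1,2}\z/ && /\A$regex\z/s } @names;
    return sort @matched;
}
```

A call such as `readdir_glob('.', '*.txt')` returns character strings, so they still need an output layer or an explicit encode before being printed.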

  • Another option is to use Win32API::File directly, as described in the following message.

Newsgroups: perl.unicode
From: j…@ActiveState.com (Jan Dubois)
Date: Thu, 24 Feb 2005 08:15:16 -0800
Subject: RE: Perl and unicode file names
On Thu, 24 Feb 2005, Ed Batutis wrote:

> > So the problem I have is how to proceed. Should I give up with
> > Perl and use Java or C? Any suggestions gratefully received.
>
> I started a really 'fun' flame war on this topic several months ago,
> so I hesitate to say anything more. But, yes, you should give up on
> Perl - or run your script on Linux with a utf-8 locale. On Win32, Perl
> internals are converting the filename characters to the system default
> code page. So, you are SOL for what you are trying to do.

Actually, you can work around the problems on Windows by using the
Win32API::File and the Encode module. Here is a sample program
Gisle came up with:
#!perl -w
use strict;
use Fcntl qw(O_RDONLY);
use Win32API::File qw(CreateFileW OsFHandleOpenFd :FILE_ OPEN_EXISTING);
use Encode qw(encode);

binmode(STDOUT, ":utf8");

# The W variant takes the file name as NUL-terminated UTF-16LE.
my $h = CreateFileW(encode("UTF-16LE", "\x{2030}.txt\0"), FILE_READ_DATA,
                    0, [], OPEN_EXISTING, 0, []);
# Wrap the OS handle in a C file descriptor, then in a Perl filehandle.
my $fd = OsFHandleOpenFd($h, O_RDONLY);
die if $fd < 0;
open(my $fh, "<&=$fd");
binmode($fh, ":encoding(UTF-16LE)");
while (<$fh>) {
    print $_;
}

close($fh) || die;
__END__
It may be possible to do similar readdir() emulation as well.
Win32API::File is part of libwin32 and already included in ActivePerl.
Cheers,
-Jan
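
The file name argument in the sample program is a NUL-terminated UTF-16LE string built with Encode. The sketch below isolates that one step so the resulting bytes can be inspected on any platform. The helper name `wide_name` is made up for illustration and is not part of Win32API::File.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a character string into the NUL-terminated
# UTF-16LE octets that wide ("W") Win32 calls expect for a file name.
sub wide_name {
    my ($name) = @_;
    return encode('UTF-16LE', $name . "\0");
}

# U+2030 (PER MILLE SIGN) followed by ".txt", as in the sample program.
my $wide = wide_name("\x{2030}.txt");

# Show the octets as hex: two bytes per character, low byte first.
print unpack('H*', $wide), "\n";
```

For this name the script prints `30202e007400780074000000`: U+2030 becomes the byte pair 30 20, each ASCII character is followed by a zero byte, and the terminator adds two more zero bytes.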
