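The character set is normally recorded in the second and third bytes of the DMP file header. Note: this byte order is what you see on a Big-Endian operating system. The header usually begins with 03xx (where xx is any byte value).
The value 0x0001 can be mapped to a character set name by querying Oracle.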
SQL> select nls_charset_id(value) nls_charset_id, value
2 from v$nls_valid_values
3 where parameter = 'CHARACTERSET'
4 order by nls_charset_id(value);
NLS_CHARSET_ID VALUE
-------------- ----------------------------------------------------------------
1 US7ASCII
2 WE8DEC
(output omitted for brevity ...)
1865 ZHT16BIG5FIXED
2000 AL16UTF16
247 rows selected
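The query reports the IDs in decimal, while the DMP header holds them as a two-byte hexadecimal value. A minimal Python sketch of the conversion follows; the helper name charset_id_to_hex is hypothetical and exists only for illustration.

```python
def charset_id_to_hex(charset_id: int) -> str:
    """Format a decimal NLS_CHARSET_ID as a two-byte hex value (hypothetical helper)."""
    if not 0 <= charset_id <= 0xFFFF:
        raise ValueError("character set ID must fit in two bytes")
    return f"0x{charset_id:04x}"


# IDs taken from the query output above.
for cid, name in [(1, "US7ASCII"), (2, "WE8DEC"), (2000, "AL16UTF16")]:
    print(name, charset_id_to_hex(cid))
```

For the three rows used here this prints 0x0001, 0x0002 and 0x07d0 respectively.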
US7ASCII corresponds to 0x0001, so the exported file's character set is US7ASCII. Next, we set the NLS_LANG environment variable and run a test.
[oracle@MISDB:~]$export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
[oracle@MISDB:~]$exp \"/ as sysdba\" owner=scott file=scott_test_Set.dmp
Export: Release 11.2.0.3.0 - Production on Wed Jul 1 18:29:43 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in AL32UTF8 character set and UTF8 NCHAR character set
About to export specified users ...
(output omitted for brevity ...)
Export terminated successfully without warnings.
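The character set that exp reports matches the part of NLS_LANG after the dot. NLS_LANG takes the form LANGUAGE_TERRITORY.CHARSET, and a small Python sketch can split it into its parts. The function name parse_nls_lang is hypothetical; the sketch only splits the string and does not check the values against Oracle.

```python
def parse_nls_lang(value: str) -> tuple[str, str, str]:
    """Split an NLS_LANG value into (language, territory, charset).

    Hypothetical helper: any missing component comes back as an empty string.
    """
    lang_terr, _, charset = value.partition(".")
    language, _, territory = lang_terr.partition("_")
    return language, territory, charset


# Value used in the export session above.
print(parse_nls_lang("AMERICAN_AMERICA.AL32UTF8"))
```

For the value used in this test the third element is AL32UTF8, the character set named in the "Export done in ..." line.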
Inspect the file header.
[oracle@MISDB:~]$cat scott_test_Set.dmp | od -x | head
0000000 0303 6945 5850 4f52 543a 5631 312e 3032
0000020 2e30 300a 4453 5953 0a52 5553 4552 530a
0000040 3430 3936 0a30 0a37 320a 300a 0369 0369
0000060 0367 0001 0000 0000 0000 0000 0012 0020
0000100 2020 2020 2020 2020 2020 2020 2020 2020
*
0000140 2020 2020 2020 2020 2057 6564 204a 756c
0000160 2031 2031 383a 3239 3a34 3320 3230 3135
0000200 7363 6f74 745f 7465 7374 5f53 6574 2e64
0000220 6d70 0000 0000 0000 0000 0000 0000 0000
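The same two bytes can also be read programmatically instead of by eye. The Python sketch below assumes the header bytes sit in the file in the order displayed in the dump above (03 03 69 45 ...), so that bytes 2 and 3 form a 16-bit value with the high byte first. The function names are hypothetical.

```python
import struct


def charset_id_from_header(header: bytes) -> int:
    """Return the 16-bit value held in the second and third header bytes."""
    if len(header) < 3:
        raise ValueError("need at least three header bytes")
    # Skip the first byte, then read one unsigned 16-bit integer,
    # high byte first (assumption based on the dump shown above).
    (charset_id,) = struct.unpack_from(">H", header, 1)
    return charset_id


def read_dmp_charset_id(path: str) -> int:
    """Read the first three bytes of a DMP file and extract the ID."""
    with open(path, "rb") as f:
        return charset_id_from_header(f.read(3))


# First four bytes as displayed in the dump above.
print(hex(charset_id_from_header(bytes.fromhex("03036945"))))
```

With those sample bytes the script prints 0x369. Against a real file the call would be read_dmp_charset_id("scott_test_Set.dmp").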
At the corresponding position, 0x0369 maps to AL32UTF8. The hexadecimal IDs of the most commonly used character sets are listed below:
Name ID
----------------------
US7ASCII 0x0001
WE8DEC 0x0002
WE8ISO8859P1 0x001f
EE8ISO8859P2 0x0020
SE8ISO8859P3 0x0021
NEE8ISO8859P4 0x0022
CL8ISO8859P5 0x0023
AR8ISO8859P6 0x0024
EL8ISO8859P7 0x0025
IW8ISO8859P8 0x0026
WE8ISO8859P9 0x0027
WE8ISO8859P15 0x002e
TH8TISASCII 0x0029
US8PC437 0x0004
WE8ROMAN8 0x0005
WE8PC850 0x000a
EE8PC852 0x0096
RU8PC855 0x009B
TR8PC857 0x009C
WE8PC858 0x001c
WE8PC860 0x00A0
IS8PC861 0x00A1
N8PC865 0x00BE
RU8PC866 0x0098
EE8MSWIN1250 0x00aa
CL8MSWIN1251 0x00ab
WE8MSWIN1252 0x00b2
EL8MSWIN1253 0x00ae
TR8MSWIN1254 0x00b1
IW8MSWIN1255 0x00af
AR8MSWIN1256 0x0230
BLT8MSWIN1257 0x00b3
ZHT16MSWIN950 0x0363
ZHS16GBK 0x0354
ZHT16HKSCS 0x0364
JA16EUC 0x033e
JA16SJIS 0x0340
ZHT16BIG5 0x0361
AL24UTFFSS 0x0366
UTF8 0x0367
AL32UTF8 0x0369
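A table like this is easy to turn into a lookup. The Python sketch below copies only a handful of entries from the list above; the dictionary and function names are hypothetical, and the full set of IDs should come from v$nls_valid_values rather than from this excerpt.

```python
# A few IDs copied from the table above; extend as needed.
CHARSET_NAMES = {
    0x0001: "US7ASCII",
    0x001F: "WE8ISO8859P1",
    0x00B2: "WE8MSWIN1252",
    0x0354: "ZHS16GBK",
    0x0367: "UTF8",
    0x0369: "AL32UTF8",
}


def charset_name(charset_id: int) -> str:
    """Map a header value to a character set name, if it is in the excerpt."""
    return CHARSET_NAMES.get(charset_id, f"unknown (0x{charset_id:04x})")


print(charset_name(0x0369))
print(charset_name(0x0354))
print(charset_name(0x1234))
```

The first two calls return AL32UTF8 and ZHS16GBK; the third falls through to the "unknown" branch because 0x1234 is not in the excerpt.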
Note: once a character set is explicitly specified through NLS_LANG, the DMP file is written using that encoding.
Next, let's see what happens on a Little-Endian system.