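GDAL/OGR 1.9.0: fixing garbled Chinese field names and attribute values when reading shp files
Today I ran into another new problem. A shp file on a Chinese path opens fine and the geometry read from it is correct, but as soon as the attribute field names or attribute values contain Chinese characters, everything comes back garbled. At first I assumed the fix was once again CPLSetConfigOption, but the following setting made no difference: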
CPLSetConfigOption("GDAL_FILENAME_IS_UTF8","NO");
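With no other option left, I stepped into the GDAL source code. In ogrshapelayer.cpp, the constructor of the OGRShapeLayer class contains an option related to encoding. The constructor is as follows: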
OGRShapeLayer::OGRShapeLayer( OGRShapeDataSource* poDSIn,
                              const char * pszName,
                              SHPHandle hSHPIn, DBFHandle hDBFIn,
                              OGRSpatialReference *poSRSIn, int bSRSSetIn,
                              int bUpdate,
                              OGRwkbGeometryType eReqType )
{
    poDS = poDSIn;
    poSRS = poSRSIn;
    bSRSSet = bSRSSetIn;
    pszFullName = CPLStrdup(pszName);
    hSHP = hSHPIn;
    hDBF = hDBFIn;
    bUpdateAccess = bUpdate;
    iNextShapeId = 0;
    panMatchingFIDs = NULL;
    bCheckedForQIX = FALSE;
    hQIX = NULL;
    bSbnSbxDeleted = FALSE;
    bHeaderDirty = FALSE;

    if( hSHP != NULL )
    {
        nTotalShapeCount = hSHP->nRecords;
        if( hDBF != NULL && hDBF->nRecords != nTotalShapeCount )
        {
            CPLDebug("Shape", "Inconsistant record number in .shp (%d) and in .dbf (%d)",
                     hSHP->nRecords, hDBF->nRecords);
        }
    }
    else
        nTotalShapeCount = hDBF->nRecords;

    eRequestedGeomType = eReqType;
    bTruncationWarningEmitted = FALSE;

    if( hDBF != NULL && hDBF->pszCodePage != NULL )
    {
        CPLDebug( "Shape", "DBF Codepage = %s for %s",
                  hDBF->pszCodePage, pszName );
        // Not too sure about this, but it seems like better than nothing.
        osEncoding = ConvertCodePage( hDBF->pszCodePage );
    }

    if( CPLGetConfigOption( "SHAPE_ENCODING", NULL ) != NULL )
        osEncoding = CPLGetConfigOption( "SHAPE_ENCODING", "" );

    if( osEncoding != "" )
        CPLDebug( "Shape", "Treating as encoding '%s'.", osEncoding.c_str() );

    poFeatureDefn = SHPReadOGRFeatureDefn( CPLGetBasename(pszName),
                                           hSHP, hDBF, osEncoding );

    /* Init info for the LRU layer mechanism */
    poPrevLayer = NULL;
    poNextLayer = NULL;
    bHSHPWasNonNULL = hSHPIn != NULL;
    bHDBFWasNonNULL = hDBFIn != NULL;
    eFileDescriptorsState = FD_OPENED;
    TouchLayer();
}
The constructor above contains the following two statements, and this is where I stopped in the debugger:
if( CPLGetConfigOption( "SHAPE_ENCODING", NULL ) != NULL )
    osEncoding = CPLGetConfigOption( "SHAPE_ENCODING", "" );
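I found that the statement osEncoding = CPLGetConfigOption( "SHAPE_ENCODING", "" ); returned an encoding called cp936. So what is CP936? It is code page 936 on Windows, Microsoft's Simplified Chinese encoding, commonly known as GBK (a superset of GB2312). I had never set this option myself, so my tentative guess is that the value is derived from the user's operating system. I have no English-language operating system at hand, so I cannot easily test that.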
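The order of the two checks is what matters here, so the following is a minimal sketch of that order only, not GDAL's actual code. EncodingInputs and ResolveEncoding are hypothetical names made up for illustration, and the sketch assumes the DBF code page has already been converted to an encoding name. It models the precedence visible in the constructor: the DBF code page supplies a default, SHAPE_ENCODING replaces it whenever the option is set (even to an empty string), and an empty result means no encoding is in effect.

```cpp
#include <string>

// Hypothetical stand-in for the two inputs the constructor consults.
struct EncodingInputs {
    const char* dbfCodePage;    // encoding name derived from the DBF code page, or nullptr if unknown
    const char* shapeEncoding;  // value of SHAPE_ENCODING, or nullptr if the option is not set
};

// Mirrors the order of the checks in the quoted constructor:
// start from the DBF code page, then let SHAPE_ENCODING override it
// whenever the option is set. An empty result means "no encoding".
std::string ResolveEncoding(const EncodingInputs& in) {
    std::string encoding;
    if (in.dbfCodePage != nullptr)
        encoding = in.dbfCodePage;
    if (in.shapeEncoding != nullptr)
        encoding = in.shapeEncoding;  // an empty string also overrides
    return encoding;
}
```

For example, ResolveEncoding({"CP936", ""}) yields an empty string, while ResolveEncoding({"CP936", nullptr}) keeps CP936.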
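In other words, when the layer is constructed, the code page recorded for the DBF file is read and the DBF text is recoded from it. Note the comment in the code: "// Not too sure about this, but it seems like better than nothing." The developers were evidently not sure about this step themselves and kept it because it seemed better than doing nothing, and this is exactly where my Chinese text turned into garbage. A likely explanation is that, with an encoding in effect, OGR recodes the DBF strings to UTF-8, and my program then interprets those UTF-8 bytes in the local code page. With the cause more or less located, the fix is easy. As before, set SHAPE_ENCODING to an empty value before opening the shp file, like this: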
CPLSetConfigOption("SHAPE_ENCODING","");
That fixed it. I do not know whether this setting has any effect on other languages, but Chinese now reads without problems.
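Since the same attribute string may reach the application either as CP936 bytes or as UTF-8 bytes depending on this setting, a quick structural check can help tell the two apart while debugging. The sketch below is a heuristic of my own, not part of GDAL. LooksLikeUtf8 is a hypothetical helper that only checks lead and continuation byte patterns and does not reject overlong forms. The byte values used are those of the two characters 中文 in each encoding.

```cpp
#include <cstddef>
#include <string>

// Returns true if every lead byte in `s` is followed by the number of
// continuation bytes (10xxxxxx) that UTF-8 requires. Simplified check:
// overlong encodings and surrogate code points are not rejected.
bool LooksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;                       // stray continuation or invalid lead byte

        if (extra > s.size() - i - 1)
            return false;                        // sequence is cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80)
                return false;                    // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// "中文" as raw bytes in each encoding.
const std::string kZhongWenUtf8 = "\xE4\xB8\xAD\xE6\x96\x87";
const std::string kZhongWenGbk  = "\xD6\xD0\xCE\xC4";
```

The UTF-8 bytes pass the check and the GBK bytes do not, so a value that fails the check was most likely left in the original DBF code page. Pure ASCII strings pass in both cases, so the check says nothing about them.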