Converting File Encodings to UTF-8
Files often need to be converted to standard UTF-8, yet in practice many of them are encoded as ASCII or GBK/GB2312, and that easily produces garbled text.
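As a small illustration of how the garbling arises, here is a minimal Python 3 sketch. The sample string is made up for the demo and is not from the original script. It shows that bytes written as GBK are not valid UTF-8, and that decoding them with the correct codec first is what makes a clean conversion possible.

```python
text = "中文"                    # sample string, chosen only for this demo
gbk_bytes = text.encode("gbk")   # what a GBK-encoded file would contain

# Reading those bytes as UTF-8 fails outright...
try:
    gbk_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# ...and a lenient decode yields replacement characters instead of the text.
garbled = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the right codec and re-encoding gives proper UTF-8.
utf8_bytes = gbk_bytes.decode("gbk").encode("utf-8")

print(valid_utf8, garbled == text, utf8_bytes.decode("utf-8") == text)
```

The conversion itself is just "decode with the source encoding, encode as UTF-8". The hard part is knowing the source encoding for each file.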
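Converting files by hand is a real waste of time. I needed this for work, got tired of doing it manually, and could not find a good tool online, so I wrote a Python script that converts files to UTF-8.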
The code is as follows:
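#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Modify the file encoding format
#
###################
import os
import shutil
import codecs
import chardet
import sys

# Python 2 only: make UTF-8 the default for implicit str/unicode conversions
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')
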
# Detect the file's encoding
def check_file(filePath):
    with open(filePath, 'rb') as f:
        data = f.read()
    dict_code = chardet.detect(data)
    code_file = dict_code['encoding']
    print(code_file)
    return code_file

# Read the file and return its contents decoded to Unicode
def read_file(filePath, encoding="UTF-8"):
    f = codecs.open(filePath, "rb", encoding)
    print('read:  ' + filePath)
    data = f.read()
    f.close()
    return data

# Write the text to the file in the specified encoding
def write_file(filePath, uuu, encoding='UTF-8'):
    f = codecs.open(filePath, 'wb', encoding)
    print('write:  ' + filePath)
    f.write(uuu)
    f.close()

# If the file is UTF-8 with a BOM, rewrite it without the BOM
def utf_nobom(dst):
    # Work on raw bytes so the comparison with codecs.BOM_UTF8 is valid
    with open(dst, 'rb') as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    with open(dst, 'wb') as f:
        f.write(data)

# Copy the files together with their directories, and collect the path of every file
def search_file(src, dst):
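
The listing stops at the `search_file` signature, so its body is not available here. As a rough illustration only, the Python 3 sketch below shows one way the overall task could be done: walk a source tree and write a BOM-less UTF-8 copy of each file into a mirrored destination tree. The names `to_utf8_bytes` and `convert_tree`, and the `fallback` and `detect` parameters, are hypothetical and do not come from the original script. The sketch also swaps in a different detection strategy. Rather than calling chardet directly, it tries strict UTF-8 first and otherwise uses an optional caller-supplied detector or a fallback codec (GB18030 by default, a superset of GBK and GB2312). It assumes every file is text and that `dst` lies outside `src`.

```python
import os


def to_utf8_bytes(raw, fallback="gb18030", detect=None):
    """Return raw re-encoded as UTF-8 without a BOM (hypothetical helper)."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: ask the optional detector, else use the fallback
        guess = detect(raw) if detect else None
        text = raw.decode(guess or fallback)
    return text.encode("utf-8")


def convert_tree(src, dst, fallback="gb18030", detect=None):
    """Mirror src into dst, writing every file as BOM-less UTF-8."""
    converted = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                raw = f.read()
            out_path = os.path.join(out_dir, name)
            with open(out_path, "wb") as f:
                f.write(to_utf8_bytes(raw, fallback, detect))
            converted.append(out_path)
    return converted
```

To keep chardet in the loop, as the script above does, a detector can be passed in, for example `convert_tree("in", "out", detect=lambda b: chardet.detect(b)["encoding"])`. Trying UTF-8 first is a design choice: strict UTF-8 decoding rarely succeeds by accident on GBK text, whereas statistical detection can be unreliable on very short files.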
