48 Python, NumPy, and pandas: converting data between them and converting data types (summary) (tcy)

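This article covers the pandas data type hierarchy, data type conversion, converting data between Python, NumPy, and pandas, and fixes for problems that come up during conversion.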

pandas data types 2018/12/11

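1. Data types

2. Listing all subtypes of the generic dtype:

1. Checking data types:

df.info()              # per-column dtype and non-null count
df.dtypes              # dtype of every column
series.dtype           # dtype of a single Series
df.get_dtype_counts()  # number of columns per dtype (removed in pandas 1.0; use df.dtypes.value_counts())

# A column that holds values of several unrelated types (for example numbers and strings) gets dtype object.
# Mixed numeric types such as int32 and float32 are upcast to a common numeric dtype instead of object.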
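
The checks listed above can be tried on a small frame. This is a minimal sketch, assuming a recent pandas; the column names `n`, `x`, and `mixed` and the sample values are made up for illustration, and `df.dtypes.value_counts()` stands in for the removed `get_dtype_counts()`.

```python
import pandas as pd

# Hypothetical sample data: two numeric columns and one column of mixed value types.
df = pd.DataFrame({
    "n": [1, 2, 3],
    "x": [1.0, 2.5, 3.0],
    "mixed": [1, "two", 3.0],
})

# dtype of every column; the mixed column is reported as object.
print(df.dtypes)

# Number of columns per dtype.
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)

# Integers mixed with floats are promoted to a float dtype, not to object.
promoted = pd.Series([1, 2, 3.5])
print(promoted.dtype)
```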

2. Example:

import numpy as np

def subdtypes(dtype):
    """Recursively collect dtype and all of its subclasses as a nested list."""
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]

subdtypes(np.generic)

# Output (depends on the platform and NumPy version; the repeated names are distinct scalar types that print alike on the platform where this was run):
[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8, numpy.int16, numpy.int32, numpy.int32, numpy.int64, numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint32, numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex128]]]]]],
  [numpy.flexible,
   [[numpy.character,
     [numpy.bytes_, numpy.str_]],
    [numpy.void,
     [numpy.record]]]],
  numpy.bool_, numpy.datetime64, numpy.object_]]
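
The hierarchy above is what `np.issubdtype` and `DataFrame.select_dtypes` rely on. The sketch below is only an illustration of that idea; the frame and its column names `i`, `f`, and `s` are hypothetical.

```python
import numpy as np
import pandas as pd

# Membership tests against abstract nodes of the NumPy scalar type hierarchy.
print(np.issubdtype(np.int32, np.integer))      # int32 sits under numpy.integer
print(np.issubdtype(np.float64, np.number))     # float64 sits under numpy.number
print(np.issubdtype(np.datetime64, np.number))  # datetime64 is not under numpy.number

# Hypothetical frame: pick out the columns whose dtype falls under numpy.number.
df = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5], "s": ["a", "b"]})
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
print(numeric_cols)
```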

Data conversion: converting between Python, NumPy, and pandas 2019/1/10

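1.1. Python to pandas

Example 1.1: Python tuple/list/dict/array to Series/DataFrame

import array
import numpy as np
import pandas as pd

# v can be any one of the following:
v = (1, 2)
v = [1, 2]
v = {'a': 1, 'b': 2}
v = array.array('i', [1, 2])

s = pd.Series(v)        # dict keys become the index labels; other inputs get a default integer index
df = pd.DataFrame([v])  # dict keys become the column names; other inputs get default integer columns

pd.DataFrame.from_dict({'A': [1, 2], 'B': [3, 4]})
'''
   A  B
0  1  3
1  2  4
'''

Example 1.2: NumPy array to Series/DataFrame

v = np.arange(4).reshape(2, 2)
s = pd.Series(v.flatten())  # Series data must be 1-dimensional
df = pd.DataFrame(v)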
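
To make the index and column naming described above concrete, here is a small sketch. The labels `r0`, `r1`, `c0`, and `c1` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# A dict supplies the index labels of a Series; a list gets 0, 1, ...
s_from_dict = pd.Series({"a": 1, "b": 2})
s_from_list = pd.Series([1, 2])

# A list holding one dict becomes a one-row frame whose columns are the dict keys.
df_from_dict = pd.DataFrame([{"a": 1, "b": 2}])

# A 2-D NumPy array can be given explicit row and column labels.
arr = np.arange(4).reshape(2, 2)
df_from_array = pd.DataFrame(arr, index=["r0", "r1"], columns=["c0", "c1"])

print(s_from_dict.index.tolist(), s_from_list.index.tolist())
print(df_from_dict.columns.tolist())
print(df_from_array)
```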

1.2. pandas to Python and NumPy

Example 2.1: Series to string/list/dict/array/xarray

s = pd.Series([1, 2], index=list('ab'))
s.to_dict()          # {'a': 1, 'b': 2}
s.to_string()        # 'a    1\nb    2'
s.tolist()           # [1, 2]
array.array('i', s)  # array('i', [1, 2])
s.to_xarray()        # requires the xarray package
'''
<xarray.DataArray (index: 2)>
array([1, 2], dtype=int64)
Coordinates:
  * index    (index) object 'a' 'b'
'''

Example 2.2: Series to NumPy array

s.values  # array([1, 2], dtype=int64)

Example 2.3: Series to DataFrame

s.to_frame()

1.2. pandas to Python and NumPy (DataFrame)

Example 3.1: DataFrame to list/dict/string/xarray

df = pd.DataFrame([[1, 2], [3, 4]], index=list('ab'), columns=list('AB'))
np.array(df).tolist()  # [[1, 2], [3, 4]]
df.stack().tolist()    # [1, 2, 3, 4]
df.to_dict()           # {'A': {'a': 1, 'b': 3}, 'B': {'a': 2, 'b': 4}}
df.to_string()         # '   A  B\na  1  2\nb  3  4'
df.to_xarray()         # requires the xarray package

Example 3.2: DataFrame to numpy.array

np.array(df)  # array([[1, 2], [3, 4]], dtype=int64)
df.values     # same result as above
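
As a complement to `.values` and `np.array(df)`, the sketch below uses `to_numpy()` and `to_dict(orient="records")`. It assumes pandas 0.24 or later, where `to_numpy()` is available; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=list("ab"), columns=list("AB"))

# to_numpy() returns the same 2-D array as df.values.
as_array = df.to_numpy()

# Nested Python lists, one inner list per row.
as_rows = df.to_numpy().tolist()

# One dict per row, keyed by column name.
as_records = df.to_dict(orient="records")

# A single column converts to a 1-D array.
col_a = df["A"].to_numpy()

print(as_rows)
print(as_records)
```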

1.3. Date format conversion

Series.to_period([freq, copy])  # convert a Series from a DatetimeIndex to a PeriodIndex with the desired frequency

dt = pd.DatetimeIndex(['2018-10-14', '2018-10-15', '2018-10-16'])
dt.to_period('D')
# PeriodIndex(['2018-10-14', '2018-10-15', '2018-10-16'], dtype='period[D]', freq='D')

Series.to_timestamp([freq, how, copy])  # convert to a DatetimeIndex of timestamps at the start of each period
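
A round trip through both methods may make the behaviour clearer. This sketch uses a monthly frequency and made-up dates and values purely for illustration.

```python
import pandas as pd

dt = pd.DatetimeIndex(["2018-10-14", "2018-10-15", "2018-11-16"])

# Timestamps -> monthly periods.
months = dt.to_period("M")

# Periods -> timestamps at the start of each period (the default how="start").
month_starts = months.to_timestamp()

# The same conversion applied to the index of a Series.
s = pd.Series([10, 20, 30], index=dt)
s_period = s.to_period("M")

print(months)
print(month_starts)
print(s_period)
```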

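2. Changing data types

# Ways to convert data types:

1) astype() for forced type conversion

# Notes on converting to numbers with astype():
# every value in the column must be directly interpretable as a number; strings must not contain non-numeric characters such as ',' or '¥'; and astype() may fail when the column contains missing values.

2) Custom functions for type conversion

3) The helper functions to_numeric() and to_datetime()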
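
The astype() caveats above can be reproduced with a few tiny Series. This is a sketch only; `try_astype` is a hypothetical helper written for this illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

def try_astype(series, dtype):
    """Return the converted Series, or None if astype() raises ValueError."""
    try:
        return series.astype(dtype)
    except ValueError as exc:
        print(f"astype({dtype!r}) failed: {exc}")
        return None

clean = pd.Series(["1", "2", "3"])         # plain digit strings convert fine
with_symbols = pd.Series(["¥1,000", "2"])  # currency sign and comma block the conversion
with_missing = pd.Series([1.0, np.nan])    # NaN cannot be stored in an int64 column

clean_int = try_astype(clean, "int64")
symbols_float = try_astype(with_symbols, "float64")
missing_int = try_astype(with_missing, "int64")
```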

Example 1: set the type with the dtype parameter when creating the DataFrame

df = pd.DataFrame([1], dtype='float')
df = pd.DataFrame([1], dtype=np.int8)

Example 2: forced conversion with astype

# Column names: 客户编号 = customer ID, 客户姓名 = customer name, 增长率 = growth rate, 所属组 = group, 状态 = status
data = '客户编号 客户姓名 2018 2019 增长率 所属组 day month year 状态\n' \
       '4564651 张飞 ¥125,000.00 ¥162500.00 30% 500 12 10 2018 Y\n' \
       '4564652 刘备 ¥920,000.00 ¥1012000.0 10% 700 26 5 2019 N\n' \
       '4564653 关羽 ¥50,000.00 ¥62500.00 25% 125 24 2 2019 Y\n' \
       '4564654 曹操 ¥15,000.00 ¥490000.00 4% 300 10 8 2019 Y\n'

from io import StringIO

df = pd.read_csv(StringIO(data), sep=r'\s+')
df.info()  # summary of the loaded data, mainly each column's dtype and non-null count

df['客户编号'] = df['客户编号'].astype('object')  # convert the column and overwrite the original
df[['day', 'month']] = df[['day', 'month']].astype(int)

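Example 3: custom functions for type conversion

def convert_currency(value):
    """Strip thousands separators and currency signs, then convert to float."""
    v = value.replace(',', '').replace('¥', '').replace('￥', '')
    return float(v)  # np.float was removed in NumPy 1.24; the built-in float does the same job

# Full conversion code for the 2018 and 2019 columns
df['2018'] = df['2018'].apply(convert_currency)
df['2019'] = df['2019'].apply(convert_currency)
# df['2019'].apply(lambda x: x.replace('￥', '').replace(',', '')).astype('float')

def convert_percent(value):
    """Strip the percent sign and convert to a fraction."""
    return float(value.replace('%', '')) / 100

df['增长率'] = df['增长率'].apply(convert_percent)
# df['增长率'].apply(lambda x: x.replace('%', '')).astype('float') / 100

df['状态'] = np.where(df['状态'] == 'Y', True, False)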
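
The same cleaning can also be written with vectorized string methods instead of `apply`. The sketch below is an alternative under that assumption, not the method used above; the sample values are taken from or modelled on the data in Example 2.

```python
import pandas as pd

amounts = pd.Series(["¥125,000.00", "￥50,000.00"])
growth = pd.Series(["30%", "4%"])
status = pd.Series(["Y", "N"])

# Remove both yen-sign variants and commas in one regex pass, then cast.
amounts_num = amounts.str.replace(r"[¥￥,]", "", regex=True).astype(float)

# Drop the trailing percent sign and scale to a fraction.
growth_num = growth.str.rstrip("%").astype(float) / 100

# A plain comparison already yields a boolean column.
status_bool = status == "Y"

print(amounts_num.tolist())
print(growth_num.tolist())
print(status_bool.tolist())
```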

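Example 4: helper functions for type conversion, such as to_numeric() and to_datetime()

df['所属组'] = pd.to_numeric(df['所属组'], errors='coerce').fillna(0)  # invalid values are coerced to NaN, then filled with 0
df['date'] = pd.to_datetime(df[['day', 'month', 'year']])  # combine the year, month, and day columns into one timestamp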
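
A self-contained sketch of the two helpers follows. The invalid entry "Closed" and the small `parts` frame are hypothetical inputs invented for this illustration.

```python
import pandas as pd

# errors="coerce" turns anything unparseable into NaN instead of raising.
groups = pd.Series(["500", "Closed", "125"])
groups_num = pd.to_numeric(groups, errors="coerce")
groups_filled = groups_num.fillna(0)

# to_datetime assembles timestamps from columns named year, month, and day.
parts = pd.DataFrame({"year": [2018, 2019], "month": [10, 5], "day": [12, 26]})
dates = pd.to_datetime(parts)

print(groups_num.tolist())
print(groups_filled.tolist())
print(dates.tolist())
```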

Example 5: specify the conversions directly while reading the data

df1 = pd.read_csv(StringIO(data), sep=r'\s+',
                  converters={
                      '客户编号': str,
                      '2018': convert_currency,
                      '2019': convert_currency,
                      '增长率': convert_percent,
                      '所属组': lambda x: pd.to_numeric(x, errors='coerce'),
                      '状态': lambda x: x == 'Y'  # converters receive one string at a time, so a plain comparison gives a bool
                  })

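Example 6: converting several columns

a = [['a1', '1.1', '1.2'], ['a2', '0.02', '0.03'], ['a3', '5', 'NG']]
df = pd.DataFrame(a, columns=['A1', 'A2', 'A3'])

df[['A2', 'A3']] = df[['A2', 'A3']].apply(pd.to_numeric)  # raises ValueError because 'NG' cannot be parsed
df.apply(pd.to_numeric, errors='ignore')  # converts the columns that can be converted and leaves the rest unchanged (errors='ignore' is deprecated since pandas 2.2)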
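
Because `errors='ignore'` is deprecated in recent pandas, one possible replacement is to catch the parsing error per column. The helper `to_numeric_if_possible` below is a hypothetical name introduced for this sketch.

```python
import pandas as pd

def to_numeric_if_possible(col):
    """Return col converted by pd.to_numeric, or col unchanged if parsing fails."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col

a = [["a1", "1.1", "1.2"], ["a2", "0.02", "0.03"], ["a3", "5", "NG"]]
df = pd.DataFrame(a, columns=["A1", "A2", "A3"])

# Columns that parse cleanly become numeric; the others are kept as they were.
converted = df.apply(to_numeric_if_possible)

# Alternatively, force a column to numeric and accept NaN for bad entries.
coerced_a3 = pd.to_numeric(df["A3"], errors="coerce")

print(converted.dtypes)
print(coerced_a3.tolist())
```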

Example 7: automatic type inference with infer_objects()

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['3', '2', '1']}, dtype='object')
df.dtypes
'''
a    object
b    object
dtype: object
'''

df = df.infer_objects()  # changes column 'a' to int64
# the values in 'b' are strings, not integers, so 'b' is left as it is
df.dtypes
'''
a     int64
b    object
dtype: object
'''
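
Since infer_objects() leaves the string column alone, converting it still needs an explicit step. A minimal sketch, reusing the frame from this example:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["3", "2", "1"]}, dtype="object")

# infer_objects() only re-types columns whose values already are numbers.
inferred = df.infer_objects()
b_after_infer = inferred["b"]

# Digit strings need an explicit conversion such as pd.to_numeric.
inferred["b"] = pd.to_numeric(inferred["b"])

print(inferred.dtypes)
```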

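3. Caveats in data conversion

3.1. An int column with missing values is converted to float

How pandas promotes dtypes to hold missing values:

Original type    Type used once missing values are present
integer          float
boolean          object
float            no cast
object           no cast

Convert the data type first, and then use NaN to represent the missing values.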
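
The promotion rules in the table can be observed directly. The last Series uses the nullable `Int64` extension dtype (available since pandas 0.24), which is an option not mentioned above and is shown here only as a sketch.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])                      # stays an integer dtype
ints_with_gap = pd.Series([1, 2, None])          # promoted to float, None becomes NaN
bools_with_gap = pd.Series([True, False, None])  # promoted to object

# The nullable integer dtype keeps integers and marks the gap as <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(ints.dtype, ints_with_gap.dtype, bools_with_gap.dtype, nullable.dtype)
```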

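3.2. Numeric columns containing the empty string ''

# How null values appear in MySQL, Python, and pandas:

          Null string    Empty string    Null numeric value
MySQL     Null           ''              Null
Python    None           ''              None
pandas    None           ''              NaN

A null string and an empty string look identical once written to CSV, so they cannot be told apart when the data is read back.
If later processing explicitly needs to distinguish the two cases, a single write/read round trip through a file will distort the data.
It is advisable to designate one unique marker string to stand for None.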
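
One way to apply the marker-string suggestion is sketched below. The marker `<NULL>` and the column names are arbitrary choices for this illustration; `na_rep`, `na_values`, and `keep_default_na` are standard `to_csv`/`read_csv` parameters.

```python
from io import StringIO

import pandas as pd

NULL_MARKER = "<NULL>"  # hypothetical marker; pick any string that cannot occur in the data

df = pd.DataFrame({"id": [1, 2, 3], "name": ["x", None, ""]})

# Default round trip: the missing value and the empty string both come back as NaN.
plain_back = pd.read_csv(StringIO(df.to_csv(index=False)))

# Write missing values as the marker, and on reading treat only the marker as missing.
marked_csv = df.to_csv(index=False, na_rep=NULL_MARKER)
marked_back = pd.read_csv(
    StringIO(marked_csv),
    na_values=[NULL_MARKER],
    keep_default_na=False,
)

print(plain_back["name"].isna().tolist())
print(marked_back["name"].isna().tolist())
```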

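3.3. Numeric strings

If a column contains numeric strings, pd.read_csv will recognize it as a numeric type when building the DataFrame. To keep the column as strings, specify the dtype parameter when reading the CSV file.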
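
A short sketch of the difference, using made-up CSV text with a hypothetical `code` column whose leading zeros matter:

```python
from io import StringIO

import pandas as pd

csv_text = "code,qty\n007,1\n042,2\n"

# Default inference reads the codes as integers, so the leading zeros are lost.
inferred = pd.read_csv(StringIO(csv_text))

# Declaring the column as str keeps the text exactly as written.
as_text = pd.read_csv(StringIO(csv_text), dtype={"code": str})

print(inferred["code"].tolist())
print(as_text["code"].tolist())
```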

4. Functions
