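Running the MLlib Examples from the Spark Source with IDEA and Maven
The Spark source tree ships a comprehensive set of MLlib usage examples. With IDEA and Maven it is straightforward to modify and package these examples, then upload them to a Spark client and run them.
1. Downloading the Spark source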
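As shown in the figure below, choose the version to download, set the package type to Source Code, then click the link to the spark package to download it. Once the download finishes, extract the archive.
In the extracted directory, all of the Spark example code is under examples, and the test data it uses is under data.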
2. Importing into IDEA as a Maven project
In IDEA, click File -> New -> Project from Existing Sources.
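Next, in the dialog that pops up, browse to the location of the spark-2.0.2 source, select the pom file in its root directory, and click OK.
Next, select Maven as the project type.
Next, keep the default configuration.
Next, keep the default configuration again.
Continue to the next step and select the Maven projects to import.
Continue to the next step and wait a moment; all of the modules are then imported. The arrow in the figure points to the Spark examples module.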
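3. Modifying, packaging, and running the example code
Now look under examples -> src -> main -> scala -> mllib, which holds the MLlib usage examples provided officially by Spark.
We take DecisionTreeClassificationExample as our example. It reads the sample_ data under the root of the data directory. To run the program on a Linux client, the data path in the code has to be changed and the data uploaded to that path. Here I use the path /home/hdp_teu_dpd/user/xyx/spark. The modified code is as follows: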
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements.  See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License.  You may obtain a copy of the License at
*
*    /licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// scalastyle:off println
package org.apache.spark.examples.mllib