Discussion:
Mahout and Spark 2.2 compatibility
Marc Cardus Garcia
2017-12-04 17:18:02 UTC
Hello all,

This is the first time I am writing to this mailing list, so if there is anything wrong with my message, please let me know.

I work for a company that uses Mahout and Spark. We have recently started a project using Spark 2.2 and would like to use Mahout, but if I am not mistaken, according to issue MAHOUT-2000<https://issues.apache.org/jira/browse/MAHOUT-2000> there is still no compatibility between Spark 2.x and Mahout. Regarding this issue, I have two questions:

* Is this compatibility planned for release in the near future?
* If I want to add, or help add, Spark 2.2.0 as a supported binary release, how could I do that?

Thank you,
Marc.




Marc Cardús Garcia
Data Engineer | Data Science and Big Data Analytics [Web Eurecat] <http://eurecat.org/>
+34 932 381 400 | ** | ***@eurecat.org<mailto:***@eurecat.org>
Carrer Camí Antic de València 54-56, Edifici A - 08005 - Barcelona www.eurecat.org<http://eurecat.org/>
@Eurecat_news<https://twitter.com/eurecat_news>



________________________________
DISCLAIMER: Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message you should destroy this message, and notify us immediately to the following address: ***@eurecat.org. If the addressee of this message does not consent to the use of Internet e-mail and message recording, please notify us immediately.
________________________________
Trevor Grant
2017-12-04 17:25:22 UTC
Hi Marc,

Actually, it's not THAT hard to get Spark 2.2 compatibility for Mahout.

The code base is more or less compatible; the issues are specifically with
the Spark dependencies.

1. You have to build with Java 1.8
2. You need Hadoop 2.6+ support (I think).

PR #335 outlines this pretty well.

https://github.com/apache/mahout/pull/335
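To make the two build requirements above concrete, here is a minimal Python sketch that checks a set of component versions against the stated minimums (Java 1.8, and Hadoop 2.6+, which the thread itself only gives tentatively). The names `MIN_VERSIONS`, `parse_version` and `missing_requirements` are hypothetical helpers, not part of Mahout or its build; how you collect the version strings is left out, and this does not replace the build changes in PR #335.

```python
import re

# Minimums as stated in the thread: Java 1.8, Hadoop 2.6+ ("I think").
# Hypothetical helper data, not part of Mahout.
MIN_VERSIONS = {"java": (1, 8), "hadoop": (2, 6)}


def parse_version(text):
    """Turn '1.8.0_151' or '2.7.3-amzn-1' into a tuple of leading integers."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def missing_requirements(versions):
    """Return the sorted names of components below their minimum version.

    `versions` maps a component name ("java", "hadoop") to a version string;
    a missing entry counts as unmet. Tuple comparison handles both '1.8'-style
    and post-Java-9 strings such as '11'.
    """
    return sorted(
        name
        for name, minimum in MIN_VERSIONS.items()
        if parse_version(versions.get(name, "0")) < minimum
    )


if __name__ == "__main__":
    print(missing_requirements({"java": "1.8.0_151", "hadoop": "2.7.3"}))  # []
    print(missing_requirements({"java": "1.7", "hadoop": "2.4.1"}))  # ['hadoop', 'java']
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "2.10" sorts before "2.6" lexically.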

This can be done from source; I'd have to double-check the specific mvn
command line you would need.

As for support via a binary release, maybe in 0.13.2 (we're hoping to get
Spark 2.0/2.1 support into the upcoming 0.13.1).

The hang-up with releasing a Spark 2.2 binary is that we would have to
bump Java to 1.8 and Hadoop to 2.6 or 2.7, and I'm not sure we are
ready to do that at the moment, mainly because it might alienate some users,
and secondarily because Java 1.8 is a real stickler about Javadocs for a
release, and there is a bit of work required to go back and clean them all up.

Let me know if building from source is an option at your company. I'll
attempt to figure out how to build for Spark 2.2 and try to update the
website. I'll also post back here.

tg
Dmitriy Lyubimov
2017-12-05 02:17:00 UTC
I can confirm I have not encountered any fundamental issues with Samsara (yet)
while running with Spark 2.2.0/Scala 2.11.11. It is mostly just a matter of
adjusting the build to use the proper versions of artifacts.
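Part of "using the proper versions of artifacts" is that Spark's Scala artifacts are published with a Scala binary-version suffix (major.minor only), so a Scala 2.11.11 build pairs with artifact IDs such as `spark-core_2.11`. The minimal Python sketch below illustrates that naming convention for Scala 2.x; `scala_binary_version` and `spark_coordinate` are hypothetical helpers, and the exact set of modules a Mahout build needs is not stated in this thread.

```python
# Hypothetical helpers illustrating the Scala 2.x binary-version suffix
# convention used in Spark's Maven artifact IDs (e.g. spark-core_2.11).


def scala_binary_version(scala_version):
    """'2.11.11' -> '2.11': artifact suffixes use major.minor only."""
    major, minor = scala_version.split(".")[:2]
    return f"{major}.{minor}"


def spark_coordinate(module, spark_version, scala_version):
    """Build a groupId:artifactId:version string for a Spark module."""
    suffix = scala_binary_version(scala_version)
    return f"org.apache.spark:{module}_{suffix}:{spark_version}"


if __name__ == "__main__":
    # The versions mentioned above: Spark 2.2.0 with Scala 2.11.11.
    print(spark_coordinate("spark-core", "2.2.0", "2.11.11"))
    # org.apache.spark:spark-core_2.11:2.2.0
```

Mixing suffixes (for example a `_2.10` artifact on a Scala 2.11 classpath) typically shows up only at runtime as binary-incompatibility errors, which is why the suffix has to agree across all Scala dependencies in the build.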