Support reading DataFrame of a POJO with an enum field

Description

Reading a Java class with an enum property into a DataFrame results in an exception: no encoder exists for the enum type, and schema inference is not aware of it.

Example (with the InsightEdge implicits in scope):
val df = spark.read.grid[PojoWithEnum]
df.show()

This results in:

java.lang.NullPointerException
at com.google.common.reflect.TypeToken.method(TypeToken.java:495)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:127)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:126)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.insightedge.relation.SchemaInference$.schemaFor(SchemaInference.scala:126)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:128)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:126)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.insightedge.relation.SchemaInference$.schemaFor(SchemaInference.scala:126)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:128)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:126)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.insightedge.relation.SchemaInference$.schemaFor(SchemaInference.scala:126)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:128)
at org.apache.spark.sql.insightedge.relation.SchemaInference$$anonfun$2.apply(SchemaInference.scala:126)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.insightedge.relation.SchemaInference$.schemaFor(SchemaInference.scala:126)
at org.apache.spark.sql.insightedge.relation.SchemaInference$.schemaFor(SchemaInference.scala:55)
at org.apache.spark.sql.insightedge.relation.InsightEdgeClassRelation.inferredSchema$lzycompute(InsightEdgeClassRelation.scala:35)
at org.apache.spark.sql.insightedge.relation.InsightEdgeClassRelation.inferredSchema(InsightEdgeClassRelation.scala:34)
at org.apache.spark.sql.insightedge.relation.InsightEdgeAbstractRelation.schema(InsightEdgeAbstractRelation.scala:63)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:403)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at org.apache.spark.sql.insightedge.DataFrameImplicits$DataFrameReaderWrapper.grid(DataFrameImplicits.scala:48)
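
For context, a minimal sketch of a POJO that would hit this code path (the actual PojoWithEnum class is not included in this ticket; the class, enum and property names below are illustrative, and space annotations such as @SpaceId are omitted):

public class PojoWithEnum {

    // Illustrative enum type; any enum-typed bean property triggers the same failure.
    public enum Mood { HAPPY, SAD }

    private String id;
    private Mood mood;   // enum-typed property that schema inference cannot map

    public PojoWithEnum() {
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Mood getMood() { return mood; }
    public void setMood(Mood mood) { this.mood = mood; }
}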

Workaround

None

Acceptance Test

InsightEdge test case: org.insightedge.spark.jobs.LoadDataFrameSpec

Status

Assignee

Meron Avigdor

Reporter

Ester Atzmon

Labels

None

Priority

Medium

SalesForce Case ID

12424

Fix versions

Commitment Version/s

None

Due date

None

Product

InsightEdge

Edition

Enterprise

Platform

All