
Understand how to optimize writes for IE on k8s

Description

This is for the AA rfdb project Dharma is working on.

The rfdb Spark job generates the new fares and rules from publicly available information published by an airline clearinghouse.

On IBM Cloud with Amazon S3, the rfdb job takes a total of 12-13 minutes.

In the lab environment it takes approximately 5-6 minutes. The lab environment does not run on Kubernetes; it uses plain InsightEdge.

The IBM Cloud environment has fewer resources than the lab: 116 Spark executors (IBM Cloud) vs. 270 Spark executors (lab).
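To compare the two environments on equal footing, the figures above can be normalized to executor-minutes (executors multiplied by wall-clock minutes). This is only a rough sketch: it assumes the executors in both environments are similarly sized and that the job scales linearly with executor count, neither of which is confirmed in this ticket. The function and dictionary names are illustrative.

```python
# Executor counts and wall-clock ranges (minutes) quoted in this ticket.
ENVIRONMENTS = {
    "IBM Cloud": {"executors": 116, "minutes": (12, 13)},
    "lab": {"executors": 270, "minutes": (5, 6)},
}


def executor_minutes(executors, minutes):
    """Return (low, high) executor-minutes: executors x wall-clock minutes.

    ASSUMPTION: executors are comparable in size across environments.
    """
    low, high = minutes
    return executors * low, executors * high


for name, env in ENVIRONMENTS.items():
    low, high = executor_minutes(env["executors"], env["minutes"])
    print(f"{name}: {low}-{high} executor-minutes")
```

Under those assumptions the two ranges overlap (1392-1508 for IBM Cloud vs. 1350-1620 for the lab), which would suggest that the longer wall-clock time in IBM Cloud is largely explained by the smaller executor count rather than by Kubernetes itself.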

Right now, new fares are generated for AA only, with 2 fare categories.

There is a requirement to generate new fares for all AA subsidiary airlines (approximately 14 more categories).

There is a concern regarding the future additional load. Dharma needs to understand what is available to tune for performance.
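As a first-order estimate of the additional load, the current run time can be scaled by the growth in fare categories. This is a deliberately naive sketch: it assumes run time grows linearly with the number of categories and that resources stay fixed, which has not been measured. The function name is illustrative.

```python
CURRENT_CATEGORIES = 2        # fare categories generated today (from this ticket)
ADDITIONAL_CATEGORIES = 14    # "approximately 14 more" (from this ticket)
CURRENT_MINUTES = (12, 13)    # IBM Cloud run time range (from this ticket)


def projected_minutes(current_minutes, current_categories, additional_categories):
    """Scale a (low, high) run time range by the growth in category count.

    ASSUMPTION: run time is proportional to the number of fare categories.
    """
    factor = (current_categories + additional_categories) / current_categories
    return tuple(m * factor for m in current_minutes)


low, high = projected_minutes(CURRENT_MINUTES, CURRENT_CATEGORIES, ADDITIONAL_CATEGORIES)
print(f"Projected run time: {low:.0f}-{high:.0f} minutes")
```

Going from 2 to 16 categories is an 8x increase, so under the linear assumption the 12-13 minute run would grow to roughly 96-104 minutes. The real figure depends on how much data each subsidiary's categories contribute.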

In a single submit, each Spark job takes less than a minute except the writes, which take 3.6 and 2.1 minutes.

The timing changed drastically after integrating with IBM COS (Cloud Object Storage): the run took 20 minutes in total. (This data will be regenerated and captured.)

According to the job event logs in the Spark History Server, most of the time is spent writing data to the grid. Is there potential for optimization here?
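To put the write times in proportion, the sketch below computes the share of the run spent in the two writes and what the total would be if the writes alone were sped up. It assumes the 3.6 and 2.1 minute writes belong to the same 12-13 minute IBM Cloud run, which the ticket does not state explicitly, and the speedup factor is a hypothetical input, not a measured result.

```python
WRITE_MINUTES = [3.6, 2.1]       # the two write durations from this ticket
TOTAL_MINUTES_RANGE = (12, 13)   # ASSUMPTION: same run as the IBM Cloud total


def write_share(write_minutes, total_minutes):
    """Fraction of the total run time spent in grid writes."""
    return sum(write_minutes) / total_minutes


def total_after_speedup(write_minutes, total_minutes, speedup):
    """Total run time if only the writes become `speedup` times faster."""
    writes = sum(write_minutes)
    return (total_minutes - writes) + writes / speedup


for total in TOTAL_MINUTES_RANGE:
    share = write_share(WRITE_MINUTES, total)
    faster = total_after_speedup(WRITE_MINUTES, total, speedup=2.0)  # hypothetical 2x
    print(f"total={total} min: writes={share:.1%}, with 2x faster writes={faster:.2f} min")
```

Under that assumption the writes account for a bit under half of the run, so even a hypothetical 2x improvement in write throughput would bring the total down by only about 3 minutes. The remaining time is spread across the other jobs.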

Another issue is that the AA management team is asking us to prove the environment is tuned before requesting additional resources.

This is why understanding BucketedGridModel and saveMultipleToGrid is important.

The Spark History Server helps with analyzing the Spark job. This leads to another issue: supporting it in the IBM Cloud environment. A separate ticket will be opened for that.

Workaround

None

Acceptance Test

None

Status

Assignee

Unassigned

Reporter

Dixson Huie

Labels

None

Priority

Medium

SalesForce Case ID

None

Fix versions

None

Commitment Version/s

None

Due date

None

Product

None

Edition

Open Source

Platform

All