This is for the AA rfdb project Dharma is working on.
The rfdb Spark job generates the new fares and rules based on publicly available information from an airline clearinghouse.
On IBM Cloud with Amazon S3, the rfdb job takes a total of 12-13 minutes.
In the lab environment it takes approximately 5-6 minutes. The lab environment is not Kubernetes; it runs plain InsightEdge.
The IBM Cloud environment also has fewer resources than the lab: 116 Spark executors (IBM Cloud) vs. 270 Spark executors (lab).
Right now, new fares are generated for AA only, with 2 fare categories.
There is a requirement to generate new fares for all AA subsidiary airlines (approximately 14 more categories).
This raises a concern about the future additional load; Dharma needs to understand what is available to tune for performance.
In one submit, each job takes less than a minute except the writes, which take 3.6 and 2.1 minutes.
These numbers changed drastically after integrating with IBM COS (Cloud Object Storage): the total rose to 20 minutes (this data will be regenerated and captured).
The Spark History Server job event logs show that most of the time is spent writing data to the grid. Is there potential for optimization here?
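Since the write stages dominate, one generic Spark lever worth checking (an assumption on our part, not something confirmed in these notes) is the number of partitions at write time, because each partition becomes one write task. A minimal sketch of a partition-count heuristic; the function name, the 128 MB target, and the clamp bounds are all hypothetical tuning choices, not project settings:

```python
def target_partitions(total_bytes: int,
                      target_partition_bytes: int = 128 * 1024 * 1024,
                      min_parts: int = 1,
                      max_parts: int = 2000) -> int:
    """Suggest a partition count so each write task handles roughly
    `target_partition_bytes` of data (128 MB is a common rule of thumb)."""
    parts = -(-total_bytes // target_partition_bytes)  # ceiling division
    return max(min_parts, min(int(parts), max_parts))

# Example: ~10 GB of fare data -> 80 write tasks of ~128 MB each.
print(target_partitions(10 * 1024 ** 3))  # -> 80
```

In PySpark this number would feed `df.repartition(n)` (or `coalesce(n)`) just before the write, so the grid write runs with a deliberate degree of parallelism instead of whatever partitioning the upstream stages left behind.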
Another issue is that the AA management team is asking us to prove the environment is tuned before we request additional resources.
This is why understanding BucketedGridModel and saveMultipleToGrid is important.
The Spark History Server helps with analyzing the Spark job. Supporting it in the IBM Cloud environment is another issue; a separate ticket will be opened for that.