Defined as the speed at which an app loads and responds to a member, app performance is critical to an app’s success. When an app responds slowly to a member’s interaction, the experience suffers. To maintain a reliable and consistent member experience, we have a dedicated performance engineering team that monitors and troubleshoots performance issues. However, our process for identifying negative trends and the underlying causes of poor performance has not always been as sophisticated and nimble as it is today.
Our Android app runs on a cycle of performance operations: monitor, profile, optimize, and ramp. This cycle has helped us not only maintain solid app performance but also deliver improvements over time. Since 2018, our Android engineers have reduced app startup time by over 700ms at the 90th percentile, or 28% of the original total of 2.5s.
In the first post of this two-part blog series, we’ll introduce how we monitor and profile our Android app to find opportunities for optimization. Part two will dive into the optimizations we applied and what we learned during the ramping and verification phases.
LinkedIn’s in-house performance monitoring system is called Harrier. It is a debugging and analytics tool for overall app health that tracks crashes, site speed from real-time user monitoring, video metrics, and service call trees.
There are two site speed metrics we care about most: app startup time, which measures the time to bring the application to the foreground, and page load time, which measures the time to load the main content of each page. For each site speed metric, Harrier compares the current 90th percentile value against the current baseline. If it detects a regression, Harrier automatically files a ticket and assigns it to the responsible team for further investigation.
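The comparison Harrier performs can be sketched as follows. This is a minimal illustration, assuming a nearest-rank percentile and a relative tolerance around the baseline; the post does not describe Harrier’s actual algorithm or threshold, and the class name `P90RegressionCheck` is hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of a p90-vs-baseline regression check. The nearest-rank
// percentile and the relative tolerance are illustrative assumptions, not
// Harrier's documented algorithm.
final class P90RegressionCheck {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        // Clamp the 1-based rank into [1, n] before converting to an index.
        int clamped = Math.min(Math.max(rank, 1), sorted.length);
        return sorted[clamped - 1];
    }

    // Reports a regression when the current p90 exceeds the baseline p90 by
    // more than the relative tolerance (0.05 means 5%).
    static boolean isRegression(long[] currentSamplesMs, long baselineP90Ms, double tolerance) {
        long currentP90 = percentile(currentSamplesMs, 90.0);
        return currentP90 > baselineP90Ms * (1.0 + tolerance);
    }
}
```

A tolerance band keeps normal day-to-day noise from filing tickets; a production system would more likely use a statistical test over large sample sets, which this sketch does not attempt.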
The Google Play Console reports startup time in its Android Vitals dashboard, measuring how long it takes for the app to become visible to our members. App startup involves the following steps:
Create the app process
Launch the main thread
Create the main Activity
View inflation, layout, and draw
Depending on the app’s state when a launch begins, app startup falls into one of the following three categories:
Cold: The app process, Application, main thread, and Activity must all be created from scratch.
Warm: The app process and Application are kept in memory. Only the Activity needs to be recreated.
Hot: Application and Activity are kept in memory. Only the view must be rendered.
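The three categories above can be expressed as a small decision function. This is an illustrative model only: it assumes the two inputs (whether the process and the Activity are still in memory) are already known, whereas on a device they would be inferred from app state. `StartupClassifier` is a hypothetical name.

```java
// Illustrative model of the cold/warm/hot categories: the startup type is
// decided by what is still in memory when the launch begins. The boolean
// inputs are a simplification for the sake of the example.
final class StartupClassifier {

    enum StartupType { COLD, WARM, HOT }

    static StartupType classify(boolean processInMemory, boolean activityInMemory) {
        if (!processInMemory) {
            // Nothing survives: process, Application, main thread, and
            // Activity are all created from scratch.
            return StartupType.COLD;
        }
        if (!activityInMemory) {
            // Process and Application survive; the Activity must be recreated.
            return StartupType.WARM;
        }
        // Application and Activity survive; only the view needs rendering.
        return StartupType.HOT;
    }
}
```

Tagging each startup sample with its type keeps cold starts, which do the most work, from being averaged together with much cheaper warm and hot starts.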
We instrument app startup time as defined above and monitor it continuously, so we can act on regressions and run A/B tests as appropriate. To let us “slice and dice” app startup time, we also collect the duration of each granular phase, as illustrated below.
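Collecting phase durations amounts to recording a timestamp at each checkpoint and taking differences between consecutive checkpoints. The sketch below assumes that model; the checkpoint names in the test and the class name `StartupPhaseTimer` are hypothetical, since the post does not list the exact phases recorded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of splitting one startup into per-phase durations.
// Checkpoint names are supplied by the caller and are illustrative only.
final class StartupPhaseTimer {
    // Checkpoints in the order they were first reached.
    private final Map<String, Long> checkpoints = new LinkedHashMap<>();

    // Records a checkpoint; repeated names keep their first timestamp.
    // On a device the timestamp could come from
    // android.os.SystemClock.elapsedRealtime(); here it is passed in so the
    // logic can be tested off-device.
    void checkpoint(String name, long timestampMs) {
        checkpoints.putIfAbsent(name, timestampMs);
    }

    // Each phase is keyed by the checkpoint that ends it and lasts from the
    // previous checkpoint to that one.
    Map<String, Long> phaseDurations() {
        Map<String, Long> durations = new LinkedHashMap<>();
        Long previous = null;
        for (Map.Entry<String, Long> entry : checkpoints.entrySet()) {
            if (previous != null) {
                durations.put(entry.getKey(), entry.getValue() - previous);
            }
            previous = entry.getValue();
        }
        return durations;
    }

    // End-to-end time: the sum of all phases, i.e. last minus first checkpoint.
    long totalMs() {
        long total = 0;
        for (long phase : phaseDurations().values()) {
            total += phase;
        }
        return total;
    }
}
```

When the total regresses, the per-phase map shows which segment grew, which is the point of collecting the breakdown.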
Page load time measures the loading time of a specific page. Examples of top-level pages are the feed, My Network, Notifications, Jobs, and Messaging. Timing starts when the member enters the page (when Fragment.onCreate() is invoked) and ends when the page’s main content is presented to the member (when view data is bound to the view). As with app startup time, we also collect the timing of each granular phase to help pinpoint the root cause when investigating a regression.
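The start and end points described above can be wired into a tracker like the one below. This is a sketch under stated assumptions: `PageLoadTracker`, `onPageCreate`, and `onMainContentBound` are hypothetical names, meant to be called from Fragment.onCreate() and from the point where view data is bound, and timestamps are passed in explicitly so the logic runs off-device.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical page load tracker: starts timing when a page is created and
// stops at the first bind of its main content.
final class PageLoadTracker {
    private final Map<String, Long> startTimes = new HashMap<>();
    private final Map<String, Long> loadTimes = new HashMap<>();

    // Intended to be called from the page's Fragment.onCreate().
    void onPageCreate(String page, long nowMs) {
        startTimes.put(page, nowMs);
        loadTimes.remove(page); // a new visit starts a new measurement
    }

    // Intended to be called when view data is bound to the view. Only the
    // first bind after onPageCreate counts; later rebinds are ignored, and a
    // bind without a recorded start is dropped.
    void onMainContentBound(String page, long nowMs) {
        Long start = startTimes.get(page);
        if (start != null && !loadTimes.containsKey(page)) {
            loadTimes.put(page, nowMs - start);
        }
    }

    OptionalLong loadTimeMs(String page) {
        Long loadTime = loadTimes.get(page);
        return loadTime == null ? OptionalLong.empty() : OptionalLong.of(loadTime);
    }
}
```

Ignoring rebinds matters because a page may update its views many times after the first render, and only the first presentation of main content reflects what the member waited for.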
When investigating a regression, we add more instrumentation to the app code to narrow down the root cause. To debug locally, we can simply add log statements that include a timestamp. To collect more granular data in production, we use custom metric markers: an arbitrary, non-predefined string serves as the key that marks the start and end times of a segment during a page load. Once the change that adds the markers reaches production, we inspect the resulting site speed metric in Harrier and iterate as needed.
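Custom markers keyed by arbitrary strings can be sketched as a pair of maps: open markers and completed durations. The class and method names here (`CustomMarkers`, `markStart`, `markEnd`) are hypothetical and are not Harrier’s API; the point is that any new key can be measured without changing a predefined metric schema.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of ad hoc metric markers: any string can serve as the key, so a new
// segment can be timed just by picking a new name. Names are hypothetical.
final class CustomMarkers {
    private final Map<String, Long> open = new HashMap<>();
    private final Map<String, Long> durations = new HashMap<>();

    // Marks the start of a segment identified by an arbitrary key.
    void markStart(String key, long nowMs) {
        open.put(key, nowMs);
    }

    // Marks the end of a segment; an end without a matching start is ignored.
    void markEnd(String key, long nowMs) {
        Long start = open.remove(key);
        if (start != null) {
            durations.put(key, nowMs - start);
        }
    }

    // Completed segments, e.g. to attach to the page load event being reported.
    Map<String, Long> completed() {
        return new HashMap<>(durations);
    }
}
```

Dropping unmatched ends keeps a misplaced marker from reporting a bogus duration, which matters when markers are added iteratively during an investigation.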
To find new optimization opportunities, we examine app performance holistically to understand how the app behaves and where it can be improved. Profiling tools, particularly systrace, Android Studio Profiler, and Nanoscope, help us achieve this goal.
The feature we rely on most for load-time optimization is CPU method tracing. Both Android Studio Profiler and Nanoscope provide a call chart that lays out the methods executed during a given period, along with their durations.
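To make the idea of a call chart concrete, here is a toy, instrumentation-based recorder that turns method entry and exit events into bars with a depth (row), start time (x position), and duration (width). It is not how Android Studio Profiler or Nanoscope are implemented; `CallChartRecorder` and the method names in the test are hypothetical, and timestamps are passed in explicitly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy call-chart recorder: pairs enter/exit events with a stack and emits one
// bar per completed call.
final class CallChartRecorder {

    // One bar in a call chart: method name, nesting depth (row), start time
    // (x position), and duration (bar width), in microseconds.
    static final class Call {
        final String method;
        final int depth;
        final long startUs;
        final long durationUs;

        Call(String method, int depth, long startUs, long durationUs) {
            this.method = method;
            this.depth = depth;
            this.startUs = startUs;
            this.durationUs = durationUs;
        }
    }

    private static final class Frame {
        final String method;
        final long startUs;

        Frame(String method, long startUs) {
            this.method = method;
            this.startUs = startUs;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private final List<Call> calls = new ArrayList<>();

    // Called on method entry.
    void enter(String method, long nowUs) {
        stack.push(new Frame(method, nowUs));
    }

    // Called on method exit; pairs with the most recent enter(). After the
    // pop, the stack size equals the finished call's depth (0 = outermost).
    // Throws NoSuchElementException if exits outnumber enters.
    void exit(long nowUs) {
        Frame frame = stack.pop();
        calls.add(new Call(frame.method, stack.size(), frame.startUs, nowUs - frame.startUs));
    }

    // Completed calls in the order they finished; callees finish before callers.
    List<Call> calls() {
        return new ArrayList<>(calls);
    }
}
```

In a chart built from this data, a caller’s bar spans its callees, so the gap between a parent’s width and the sum of its children’s widths is the time spent in the parent itself.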
In our profiling scenarios, low overhead is the most important factor, because it lets us measure the actual duration of each method call and therefore tell which methods are worth the effort of optimizing. We compared the pros and cons of each profiling option as illustrated above and found that we preferred the first two options.
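A worked example shows why overhead distorts the picture. Assume, hypothetically, that a tracer adds a fixed cost $c$ to every call it records (real overhead varies by tool and is not quantified here). A method called $n$ times with true per-call time $t$ then appears to take:

```latex
T_{\text{measured}} = n\,(t + c) = T_{\text{actual}} + n\,c

% Method A (hypothetical): n = 10{,}000,\; t = 2\,\mu s,\; c = 1\,\mu s
T_{\text{actual}}^{A} = 10{,}000 \times 2\,\mu s = 20\,\text{ms}, \qquad
T_{\text{measured}}^{A} = 10{,}000 \times 3\,\mu s = 30\,\text{ms}

% Method B (hypothetical): n = 1,\; t = 20\,\text{ms},\; c = 1\,\mu s
T_{\text{actual}}^{B} = 20\,\text{ms}, \qquad
T_{\text{measured}}^{B} = 20.001\,\text{ms}
```

Both methods truly cost 20ms, but the frequently called method A looks 50% more expensive under this overhead, which could send optimization effort to the wrong place.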
In part one of this blog series, we described how we measure app performance and detect regressions using our in-house monitoring system, Harrier. We also covered the metric definitions of startup time and page load time, and compared profiling tools for finding bottlenecks and opportunities to optimize existing code.
Stay tuned for part two of this series, where we’ll present case studies of the optimizations we applied to our Android app and recap what we learned while ramping and verifying them.