Profiling Slow Tests

Healthify has reached a breaking point in dealing with our slow test suite. We have a combination of issues we've left unaddressed for awhile, but the general breakdown of concerns is:

  • Tests that break sporadically, or flakey tests
  • Tests we know that don't inspire much confidence or are simply returning false positives
  • Testing against external services, particularly with VCR
  • Slow tests

In total, these issues have amounted to about a 25 minute build time that may or may not fail/pass accurately. Now, we've parallelized our build on CI (thanks James) to make this a manageable 5 minutes but that simply masks the issue of having a test suite that might be a little overzealous.

Our (impossible) hope, is to get the entire test suite down from 25 minutes to under 3.5 minutes.

Benchmarking

We're beginning this refactoring process by benchmarking our Cucumber and RSpec tests. Check out the results below.

RSpec

Profiling RSpec is very straight forward, just run rspec --profile and you're all set.

Top 10 slowest examples (90.02 seconds, 31.9% of total time):  
  CBO Sign In navigating as a cbo user/being forced to confirm resource information
    27.21 seconds ./spec/features/cbo_platform/cbo_sign_in_spec.rb:9
  Search Dashboard Downloading the analytics PDF
    26.26 seconds ./spec/features/dashboard/search_dashboard_spec.rb:63
  Search Dashboard Viewing the top search terms
    10.23 seconds ./spec/features/dashboard/search_dashboard_spec.rb:42
  Search Dashboard Filtering search analytics
    6.39 seconds ./spec/features/dashboard/search_dashboard_spec.rb:14
  FilterCategory validations should ensure inclusion of icon in ["glass", "music", "search", "envelope-o", "heart", "star", "star-o", "user", "film", "th-large", "th", "th-list", "check", "remove", "close", "times", "search-plus", "search-minus", "power-off", "signal", "gear", "cog", "trash-o", "home", "file-o", "clock-o", "road", "download", "arrow-circle-o-down", "arrow-circle-o-up", "inbox", "play-circle-o", "rotate-right", "repeat", "refresh", "list-alt", "lock", "flag", "headphones", "volume-off", "volume-down", "volume-up", "qrcode", "barcode", "tag", "tags", "book", "bookmark", "print", "camera", "font", "bold", "italic", "text-height", "text-width", "align-left", "align-center", "align-right", "align-justify", "list", "dedent", "outdent", "indent", "video-camera", "photo", "image", "picture-o", "pencil", "map-marker", "adjust", "tint", "edit", "pencil-square-o", "share-square-o", "check-square-o", "arrows", "step-backward", "fast-backward", "backward", "play", "pause", "stop", "forward", "fast-forward", "step-forward", "eject", "chevron-left", "chevron-right", "plus-circle", "minus-circle", "times-circle", "check-circle", "question-circle", "info-circle", "crosshairs", "times-circle-o", "check-circle-o", "ban", "arrow-left", "arrow-right", "arrow-up", "arrow-down", "mail-forward", "share", "expand", "compress", "plus", "minus", "asterisk", "exclamation-circle", "gift", "leaf", "fire", "eye", "eye-slash", "warning", "exclamation-triangle", "plane", "calendar", "random", "comment", "magnet", "chevron-up", "chevron-down", "retweet", "shopping-cart", "folder", "folder-open", "arrows-v", "arrows-h", "bar-chart-o", "bar-chart", "twitter-square", "facebook-square", "camera-retro", "key", "gears", "cogs", "comments", "thumbs-o-up", "thumbs-o-down", "star-half", "heart-o", "sign-out", "linkedin-square", "thumb-tack", "external-link", "sign-in", "trophy", "github-square", "upload", "lemon-o", "phone", "square-o", "bookmark-o", "phone-square", "twitter", "facebook-f", "facebook", "github", "unlock", "credit-card", "rss", "hdd-o", "bullhorn", "bell", "certificate", "hand-o-right", "hand-o-left", "hand-o-up", "hand-o-down", "arrow-circle-left", "arrow-circle-right", "arrow-circle-up", "arrow-circle-down", "globe", "wrench", "tasks", "filter", "briefcase", "arrows-alt", "group", "users", "chain", "link", "cloud", "flask", "cut", "scissors", "copy", "files-o", "paperclip", "save", "floppy-o", "square", "navicon", "reorder", "bars", "list-ul", "list-ol", "strikethrough", "underline", "table", "magic", "truck", "pinterest", "pinterest-square", "google-plus-square", "google-plus", "money", "caret-down", "caret-up", "caret-left", "caret-right", "columns", "unsorted", "sort", "sort-down", "sort-desc", "sort-up", "sort-asc", "envelope", "linkedin", "rotate-left", "undo", "legal", "gavel", "dashboard", "tachometer", "comment-o", "comments-o", "flash", "bolt", "sitemap", "umbrella", "paste", "clipboard", "lightbulb-o", "exchange", "cloud-download", "cloud-upload", "user-md", "stethoscope", "suitcase", "bell-o", "coffee", "cutlery", "file-text-o", "building-o", "hospital-o", "ambulance", "medkit", "fighter-jet", "beer", "h-square", "plus-square", "angle-double-left", "angle-double-right", "angle-double-up", "angle-double-down", "angle-left", "angle-right", "angle-up", "angle-down", "desktop", "laptop", "tablet", "mobile-phone", "mobile", "circle-o", "quote-left", "quote-right", "spinner", "circle", "mail-reply", "reply", "github-alt", "folder-o", "folder-open-o", "smile-o", "frown-o", "meh-o", "gamepad", "keyboard-o", "flag-o", "flag-checkered", "terminal", "code", "mail-reply-all", "reply-all", "star-half-empty", "star-half-full", "star-half-o", "location-arrow", "crop", "code-fork", "unlink", "chain-broken", "question", "info", "exclamation", "superscript", "subscript", "eraser", "puzzle-piece", "microphone", "microphone-slash", "shield", "calendar-o", "fire-extinguisher", "rocket", "maxcdn", "chevron-circle-left", "chevron-circle-right", "chevron-circle-up", "chevron-circle-down", "html5", "css3", "anchor", "unlock-alt", "bullseye", "ellipsis-h", "ellipsis-v", "rss-square", "play-circle", "ticket", "minus-square", "minus-square-o", "level-up", "level-down", "check-square", "pencil-square", "external-link-square", "share-square", "compass", "toggle-down", "caret-square-o-down", "toggle-up", "caret-square-o-up", "toggle-right", "caret-square-o-right", "euro", "eur", "gbp", "dollar", "usd", "rupee", "inr", "cny", "rmb", "yen", "jpy", "ruble", "rouble", "rub", "won", "krw", "bitcoin", "btc", "file", "file-text", "sort-alpha-asc", "sort-alpha-desc", "sort-amount-asc", "sort-amount-desc", "sort-numeric-asc", "sort-numeric-desc", "thumbs-up", "thumbs-down", "youtube-square", "youtube", "xing", "xing-square", "youtube-play", "dropbox", "stack-overflow", "instagram", "flickr", "adn", "bitbucket", "bitbucket-square", "tumblr", "tumblr-square", "long-arrow-down", "long-arrow-up", "long-arrow-left", "long-arrow-right", "apple", "windows", "android", "linux", "dribbble", "skype", "foursquare", "trello", "female", "male", "gittip", "gratipay", "sun-o", "moon-o", "archive", "bug", "vk", "weibo", "renren", "pagelines", "stack-exchange", "arrow-circle-o-right", "arrow-circle-o-left", "toggle-left", "caret-square-o-left", "dot-circle-o", "wheelchair", "vimeo-square", "turkish-lira", "try", "plus-square-o", "space-shuttle", "slack", "envelope-square", "wordpress", "openid", "institution", "bank", "university", "mortar-board", "graduation-cap", "yahoo", "google", "reddit", "reddit-square", "stumbleupon-circle", "stumbleupon", "delicious", "digg", "pied-piper", "pied-piper-alt", "drupal", "joomla", "language", "fax", "building", "child", "paw", "spoon", "cube", "cubes", "behance", "behance-square", "steam", "steam-square", "recycle", "automobile", "car", "cab", "taxi", "tree", "spotify", "deviantart", "soundcloud", "database", "file-pdf-o", "file-word-o", "file-excel-o", "file-powerpoint-o", "file-photo-o", "file-picture-o", "file-image-o", "file-zip-o", "file-archive-o", "file-sound-o", "file-audio-o", "file-movie-o", "file-video-o", "file-code-o", "vine", "codepen", "jsfiddle", "life-bouy", "life-buoy", "life-saver", "support", "life-ring", "circle-o-notch", "ra", "rebel", "ge", "empire", "git-square", "git", "hacker-news", "tencent-weibo", "qq", "wechat", "weixin", "send", "paper-plane", "send-o", "paper-plane-o", "history", "genderless", "circle-thin", "header", "paragraph", "sliders", "share-alt", "share-alt-square", "bomb", "soccer-ball-o", "futbol-o", "tty", "binoculars", "plug", "slideshare", "twitch", "yelp", "newspaper-o", "wifi", "calculator", "paypal", "google-wallet", "cc-visa", "cc-mastercard", "cc-discover", "cc-amex", "cc-paypal", "cc-stripe", "bell-slash", "bell-slash-o", "trash", "copyright", "at", "eyedropper", "paint-brush", "birthday-cake", "area-chart", "pie-chart", "line-chart", "lastfm", "lastfm-square", "toggle-off", "toggle-on", "bicycle", "bus", "ioxhost", "angellist", "cc", "shekel", "sheqel", "ils", "meanpath", "buysellads", "connectdevelop", "dashcube", "forumbee", "leanpub", "sellsy", "shirtsinbulk", "simplybuilt", "skyatlas", "cart-plus", "cart-arrow-down", "diamond", "ship", "user-secret", "motorcycle", "street-view", "heartbeat", "venus", "mars", "mercury", "transgender", "transgender-alt", "venus-double", "mars-double", "venus-mars", "mars-stroke", "mars-stroke-v", "mars-stroke-h", "neuter", "facebook-official", "pinterest-p", "whatsapp", "server", "user-plus", "user-times", "hotel", "bed", "viacoin", "train", "subway", "medium"]
    5.54 seconds ./spec/models/filter_category_spec.rb:9
  Favoriting Resource Sites searching for a resource and favoriting it
    5.14 seconds ./spec/features/favoriting_resource_sites_spec.rb:5
  Search Dashboard Viewing the top search terms pie
    4.23 seconds ./spec/features/dashboard/search_dashboard_spec.rb:23
  Resources Dashboard when logged in as any type of admin viewing the dashboard
    2.23 seconds ./spec/features/dashboard/resources_dashboard_spec.rb:6
  Managing filter categories when logged in as a super_admin adding a filter category
    1.48 seconds ./spec/features/super_admin/managing_filter_categories_spec.rb:57
  Survey#matched_programs with survey having matches returns the resource programs of the matches of the most recent survey
    1.31 seconds ./spec/models/survey_spec.rb:177

Top 10 slowest example groups:  
  CBO Sign In
    27.21 seconds average (27.21 seconds / 1 example) ./spec/features/cbo_platform/cbo_sign_in_spec.rb:4
  Search Dashboard
    11.78 seconds average (47.11 seconds / 4 examples) ./spec/features/dashboard/search_dashboard_spec.rb:5
  Favoriting Resource Sites
    5.14 seconds average (5.14 seconds / 1 example) ./spec/features/favoriting_resource_sites_spec.rb:4
  Resources Dashboard
    1.27 seconds average (2.53 seconds / 2 examples) ./spec/features/dashboard/resources_dashboard_spec.rb:4
  Managing filter categories
    0.82909 seconds average (4.97 seconds / 6 examples) ./spec/features/super_admin/managing_filter_categories_spec.rb:3
  I18n
    0.76122 seconds average (1.52 seconds / 2 examples) ./spec/lib/i18n_keys_spec.rb:3
  Managing teams
    0.63143 seconds average (3.79 seconds / 6 examples) ./spec/features/super_admin/managing_teams_spec.rb:3
  DataRetriever::ResourcesData
    0.54797 seconds average (31.23 seconds / 57 examples) ./spec/services/data_retriever/resources_data_spec.rb:5
  ResourceSiteSearch
    0.47168 seconds average (20.28 seconds / 43 examples) ./spec/searches/resource_site_search_spec.rb:4
  Matcher::MatchRetrieval
    0.39468 seconds average (3.55 seconds / 9 examples) ./spec/lib/matcher/match_retrieval_spec.rb:3

Finished in 4 minutes 42.3 seconds (files took 33.31 seconds to load)  

Cucumber

Cucumber isn't quite so easy. Cucumber comes included with a few formatters but none seemed to do what I was looking for. I did a little digging and unfortunately I didn't come across any gems that could handle the task. Instead, with some help from this post, I wrapped all our scenarios with an Around block and timed them myself.

The end result looked something like this:

# features/support/env.rb
scenario_times = {}  
file_times = {}  
total_start = Time.now

Around() do |scenario, block|  
  start = Time.now
  block.call
  time = Time.now - start

  scenario_times["#{scenario.feature.file}::#{scenario.name}"] = time

  file = scenario.feature.file

  file_times[file] = if file_times.has_key? file
                       file_times[file] + time
                     else
                       time
                     end

end

at_exit do  
  max_size = 10
  total_time = Time.now - total_start
  file_count = file_times.size
  scenario_count = scenario_times.size

  p "="*100
  p "Finished profiling"
  p "Total Time: #{total_time.round(2)} seconds"
  p "Number of features: #{file_count}"
  p "Average feature length: #{(total_time/file_count).round(2)}"
  p "Number of scenarios: #{scenario_count}"
  p "Average scenario length: #{(total_time/scenario_count).round(2)}"

  p "="*100
  max_files = file_times.size > max_size ? max_size : file_times.size
  sorted_files = file_times.sort { |a, b| b[1] <=> a[1] }
  top_files = sorted_files[0..max_files - 1]
  top_file_time = top_files.inject(0) { |sum, i| sum = sum + i[1] }

  p "Top #{max_files} slowest features (#{top_file_time.round(2)} seconds, #{(top_file_time/total_time).round(3)*100}% of total time)"
  top_files.each do |key, value|
    p "#{value.round(2)} seconds #{key}"
  end

  p "="*100
  max_scenarios = scenario_times.size > max_size ? max_size : scenario_times.size
  sorted_scenarios = scenario_times.sort { |a, b| b[1] <=> a[1] }
  top_scenarios = sorted_scenarios[0..max_scenarios - 1]
  top_scenario_time = top_scenarios.inject(0) { |sum, i| sum = sum + i[1] }

  p "Top #{max_scenarios} slowest scenarios (#{top_scenario_time.round(2)} seconds, #{(top_scenario_time/total_time).round(3)*100}% of total time)"
  top_scenarios.each do |key, value|
    p "#{value.round(2)}  #{key}"
  end
end

And the output:

340 scenarios (15 failed, 325 passed) # Not sure why outputting this tracking causes these to fail...  
4020 steps (4020 passed)  
15m59.164s  
"===================================================================================================="
"Finished profiling"
"Total Time: 961.57 seconds"
"Number of features: 80"
"Average feature length: 12.02"
"Number of scenarios: 324"
"Average scenario length: 2.97"
"===================================================================================================="
"Top 10 slowest features (421.02 seconds, 43.8% of total time)"
"136.98 seconds features/resource_sites/searching_resource_site_index.feature"
"64.64 seconds features/resource_sites/editing_resource_site_hours.feature"
"58.95 seconds features/resource_sites/editing_resource_sites.feature"
"39.1 seconds features/admin_items/editing_clinics.feature"
"23.95 seconds features/misc/texting_patient.feature"
"22.62 seconds features/patients/viewing_patients_index.feature"
"21.51 seconds features/resource_sites/simplified_resource_site_management.feature"
"18.43 seconds features/survey/taking_the_survey.feature"
"17.68 seconds features/admin_items/resource_programs_criteria_management.feature"
"17.15 seconds features/referrals/referral_emails.feature"
"===================================================================================================="
"Top 10 slowest scenarios (864254.79 seconds, 89879.5% of total time)" # This is probably skewed due to some sort of Time Cop issue.
"864000.92  features/admin_items/creating_users.feature::Attempting to access old confirmation links" # We should look at whether time cop is causing an issue here.
"64.64  features/dashboard/overview_dashboard.feature::Viewing dashboard basics"
"51.28  features/resource_sites/editing_resource_site_hours.feature::Editing a resource site to have closed and 24-hour business days"
"37.79  features/admin_items/editing_clinics.feature::Successfully creating and editing a clinic as an admin"
"28.52  features/dashboard/users_dashboard.feature::Viewing the list all of all unconfirmed users and checking params persistence"
"20.01  features/dashboard/users_dashboard.feature::Filtering user analytics"
"14.23  features/dashboard/overview_dashboard.feature::Downloading the analytics PDF"
"13.48  features/survey/survey_acceptance_test.feature::Completing a demo survey and receiving results"
"12.08  features/resource_sites/searching_resource_site_index.feature::Viewing the eligibility filters animation and cross-eligibility-resource-program filter interactions"
"11.84  features/resource_sites/searching_resource_site_index.feature::Viewing the resource program filters animation"

We'll be reporting back over the next couple weeks about what types of mistakes were resulting in such long test runs and how we addressed them.