[02:35:56] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/hadoop/data/d/yarn/logs]
[03:17:41] (CR) Cwhite: "I looked closer at the data this exporter generates and discovered a problem in the source data that makes it nigh impossible to accomplis" [debs/prometheus-icinga-exporter] - https://gerrit.wikimedia.org/r/471298 (https://phabricator.wikimedia.org/T208066) (owner: Cwhite)
[03:32:47] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 853.06 seconds
[03:40:27] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 32246344
[03:41:36] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2880
[04:04:06] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 296.32 seconds
[06:28:16] PROBLEM - puppet last run on wdqs1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/modprobe.d/nf_conntrack.conf]
[06:58:46] RECOVERY - puppet last run on wdqs1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:48:47] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:07] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[14:49:31] (CR) Framawiki: Add new throttle rule for Wikipedia event in Ireland on 2018-11-13, remove expired rule (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/472691 (https://phabricator.wikimedia.org/T209037) (owner: Zoranzoki21)
[14:51:29] (CR) Framawiki: Add new throttle rule for Wikipedia event in Ireland on 2018-11-13, remove expired rule (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/472691 (https://phabricator.wikimedia.org/T209037) (owner: Zoranzoki21)
[19:05:56] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 56.62, 29.65, 19.02
[19:06:26] PROBLEM - High CPU load on API appserver on mw1222 is CRITICAL: CRITICAL - load average: 51.97, 26.65, 16.41
[19:08:06] PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 52.02, 32.45, 19.84
[19:10:47] PROBLEM - High CPU load on API appserver on mw1222 is CRITICAL: CRITICAL - load average: 43.34, 36.00, 22.96
[19:13:17] PROBLEM - High CPU load on API appserver on mw1231 is CRITICAL: CRITICAL - load average: 38.32, 35.23, 24.03
[19:15:36] PROBLEM - High CPU load on API appserver on mw1231 is CRITICAL: CRITICAL - load average: 39.03, 35.39, 25.56
[19:21:17] PROBLEM - High CPU load on API appserver on mw1231 is CRITICAL: CRITICAL - load average: 47.36, 37.90, 29.10
[19:23:16] PROBLEM - High CPU load on API appserver on mw1225 is CRITICAL: CRITICAL - load average: 44.03, 37.05, 28.72
[19:32:06] RECOVERY - High CPU load on API appserver on mw1225 is OK: OK - load average: 13.39, 19.84, 23.89
[19:33:33] hmm
[19:35:36] RECOVERY - High CPU load on API appserver on mw1231 is OK: OK - load average: 9.48, 16.17, 23.53
[19:42:46] RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 7.89, 11.20, 23.41
[19:43:17] RECOVERY - High CPU load on API appserver on mw1222 is OK: OK - load average: 11.36, 12.99, 23.59
[19:48:27] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 10.74, 12.50, 23.63
[20:46:16] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:01:37] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:07:31] IRC Bots: I need help filling out the 2 "Operator" columns for various bots in here, at https://meta.wikimedia.org/wiki/IRC/Bots - ty!
[21:19:14] quiddity, not all bots really have defined individuals as operators
[21:19:29] like who is officially the operator of shinken-wm or icinga-wm?
[21:19:52] nod. Maybe a team name? Basically, whoever should be poked if it vanishes, or starts going crazy.
[21:20:12] staff team names plus individual volunteers names?
[21:20:33] sounds reasonable.
[21:21:18] VE seems broken on this table :/
[21:21:35] if I double click to try to start typing in a cell, it selects the contents of the whole table
[21:25:51] quiddity, done
[21:26:13] not sure about wikibugs
[21:27:06] maybe the people listed at https://gerrit.wikimedia.org/r/#/admin/groups/900,members
[21:28:23] re: VE - oh, yeah, it's a nasty set of templates. I was tempted to subst: them, but the autolinking stuff was more than I wanted to cleanup.
[21:28:35] Thanks for the additions :)
[21:28:41] added wikibugs links
[21:53:46] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[22:08:26] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[22:25:30] (CR) Thcipriani: [C: 2] Add compatibility with Construct 2.8.22 and 2.9.45 [software/keyholder] - https://gerrit.wikimedia.org/r/458245 (owner: Faidon Liambotis)
[22:26:05] (Merged) jenkins-bot: Add compatibility with Construct 2.8.22 and 2.9.45 [software/keyholder] - https://gerrit.wikimedia.org/r/458245 (owner: Faidon Liambotis)
[23:12:36] (CR) Liuxinyu970226: [C: 1] Initial configuration for liwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/463479 (https://phabricator.wikimedia.org/T205710) (owner: Urbanecm)
[23:14:53] (CR) Liuxinyu970226: [C: 1] Initial configuration for shnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/467323 (https://phabricator.wikimedia.org/T206777) (owner: Urbanecm)
[23:15:21] (CR) Liuxinyu970226: [C: 1] Initial configuration for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/463482 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[23:45:25] (CR) Zoranzoki21: "> Missing relevant task number in commit message" [mediawiki-config] - https://gerrit.wikimedia.org/r/472745 (owner: Zoranzoki21)
[23:51:48] (PS3) Zoranzoki21: Disable FlaggedRevs on srwikinews, add autopatrol, patrol and rollbacker rights and enable RC patrol [mediawiki-config] - https://gerrit.wikimedia.org/r/472745 (https://phabricator.wikimedia.org/T209251)