[00:04:24] (CR) Smalyshev: [C: -1] "Putting -1 so it's not merged prematurely. Will remove when it's time." [mediawiki-config] - https://gerrit.wikimedia.org/r/379426 (https://phabricator.wikimedia.org/T175741) (owner: Smalyshev)
[00:08:13] Operations, ops-eqiad, Cloud-Services, netops: labsdb1001's switch port negociating at 100M - https://phabricator.wikimedia.org/T177130#3648066 (ayounsi)
[00:13:50] Operations, ops-eqiad: adjust flerovium power draw - https://phabricator.wikimedia.org/T177131#3648079 (RobH)
[00:44:29] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.13 seconds
[03:02:39] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1948 bytes in 0.112 second response time
[03:25:19] RECOVERY - MariaDB Slave Lag: s3 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89878.95 seconds
[03:42:39] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1923 bytes in 0.112 second response time
[04:27:42] (PS4) Jayprakash12345: Temporary IP Cap Lift on zh.wiki and commons [mediawiki-config] - https://gerrit.wikimedia.org/r/381442 (https://phabricator.wikimedia.org/T177071)
[06:15:40] RECOVERY - MariaDB Slave Lag: s1 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89910.72 seconds
[06:28:09] PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.002 second response time
[06:29:08] Operations, monitoring, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3648201 (Marostegui) >>! In T165348#3647760, @Dzahn wrote: >>>! In T165348#3641893, @Volans wrote: >> - I don't think puppetmasters should be whitelisted > > You have one run...
[06:29:09] RECOVERY - graphite.wikimedia.org on graphite1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 0.009 second response time
[06:56:30] PROBLEM - Check size of conntrack table on mw1308 is CRITICAL: CRITICAL: nf_conntrack is 92 % full
[06:59:30] RECOVERY - Check size of conntrack table on mw1308 is OK: OK: nf_conntrack is 69 % full
[08:32:09] PROBLEM - Check size of conntrack table on mw1308 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[08:35:09] RECOVERY - Check size of conntrack table on mw1308 is OK: OK: nf_conntrack is 71 % full
[08:52:16] (PS10) ArielGlenn: Template-ise rsync/public.pp hosts allow [puppet] - https://gerrit.wikimedia.org/r/379517 (owner: Reedy)
[08:57:40] (CR) ArielGlenn: [C: 2] Template-ise rsync/public.pp hosts allow [puppet] - https://gerrit.wikimedia.org/r/379517 (owner: Reedy)
[09:16:12] (PS1) ArielGlenn: move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528)
[09:16:42] (CR) jerkins-bot: [V: -1] move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[09:17:53] (PS2) ArielGlenn: move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528)
[09:18:20] (CR) jerkins-bot: [V: -1] move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[09:21:26] (PS3) ArielGlenn: move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528)
[09:25:53] (PS4) ArielGlenn: move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528)
[09:29:01] (PS5) ArielGlenn: move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528)
[09:31:36] TFW when I can't remember even the most basic puppet syntax.
[09:32:45] (CR) ArielGlenn: [C: 2] move hardcoded references to stats hosts from dumps module to profiles [puppet] - https://gerrit.wikimedia.org/r/381524 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[09:50:30] (PS1) ArielGlenn: move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525
[09:50:55] (CR) jerkins-bot: [V: -1] move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525 (owner: ArielGlenn)
[09:53:16] (PS2) ArielGlenn: move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525
[09:53:40] (CR) jerkins-bot: [V: -1] move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525 (owner: ArielGlenn)
[09:54:43] (PS3) ArielGlenn: move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525
[09:55:44] (CR) Zoranzoki21: [C: 1] Temporary IP Cap Lift on zh.wiki and commons [mediawiki-config] - https://gerrit.wikimedia.org/r/381442 (https://phabricator.wikimedia.org/T177071) (owner: Jayprakash12345)
[10:09:00] (PS4) ArielGlenn: move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525
[10:14:38] (CR) ArielGlenn: [C: 2] move hardcoded refs to francium, dumps.wm.o out of dumps manifests to profiles [puppet] - https://gerrit.wikimedia.org/r/381525 (owner: ArielGlenn)
[10:25:59] PROBLEM - puppet last run on ms-fe1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:29:43] (PS1) ArielGlenn: move most dataset1001 and ms1001 references from dumps modules to profile [puppet] - https://gerrit.wikimedia.org/r/381527 (https://phabricator.wikimedia.org/T175528)
[10:33:47] (CR) ArielGlenn: [C: 2] move most dataset1001 and ms1001 references from dumps modules to profile [puppet] - https://gerrit.wikimedia.org/r/381527 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[10:54:09] RECOVERY - puppet last run on ms-fe1007 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[11:10:09] PROBLEM - Check size of conntrack table on mw1308 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[11:12:10] RECOVERY - Check size of conntrack table on mw1308 is OK: OK: nf_conntrack is 64 % full
[11:55:00] PROBLEM - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0]
[11:56:20] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0]
[12:00:09] RECOVERY - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[12:02:29] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[12:20:01] Operations, Goal: Improve database backups' coverage, monitoring and data recovery time (part 1) (tracking) - https://phabricator.wikimedia.org/T169658#3648347 (Marostegui)
[12:45:29] PROBLEM - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [3000.0]
[12:47:50] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [3000.0]
[13:00:59] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[13:01:39] RECOVERY - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[13:39:19] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[13:40:19] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[13:44:20] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[13:45:59] PROBLEM - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [3000.0]
[13:46:27] (PS1) Ladsgroup: labs: Use redis lock manager for dispatching changes of Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109)
[13:48:19] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [3000.0]
[14:12:10] RECOVERY - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[14:12:29] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[14:32:10] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[14:37:19] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[14:38:19] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[14:39:02] (CR) Aude: labs: Use redis lock manager for dispatching changes of Wikibase (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) (owner: Ladsgroup)
[14:41:20] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[14:54:29] PROBLEM - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [3000.0]
[15:02:30] RECOVERY - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0]
[17:16:18] (CR) Hashar: [C: 1] "Indeed salt is gone." [puppet] - https://gerrit.wikimedia.org/r/379502 (owner: Muehlenhoff)
[18:05:09] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:19:49] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:20:20] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:20:39] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.003 second response time
[18:23:10] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[18:24:49] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:25:29] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:27:19] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.006 second response time
[18:28:49] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.003 second response time
[18:33:20] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[20:18:40] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[20:25:05] (Draft3) Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151)
[20:26:05] Operations, MediaWiki-Platform-Team, Performance-Team, HHVM: Convert Wikimedia production HHVM instances to have hhvm.php7.all set true - https://phabricator.wikimedia.org/T173786#3539734 (Legoktm) It is likely that we will need to implement {T172165} for the MediaWiki 1.31 release before Wikimed...
[20:30:23] (CR) Zoranzoki21: [C: 1] "@Framawiki Can you rebase this patch?" [mediawiki-config] - https://gerrit.wikimedia.org/r/379316 (https://phabricator.wikimedia.org/T176199) (owner: Framawiki)
[20:46:23] (PS1) Catrope: Bump $wgResourceLoaderStorageVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/381581 (https://phabricator.wikimedia.org/T176884)
[20:47:39] (CR) Krinkle: [C: 1] Bump $wgResourceLoaderStorageVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/381581 (https://phabricator.wikimedia.org/T176884) (owner: Catrope)
[20:47:56] (CR) Zoranzoki21: [C: 1] Bump $wgResourceLoaderStorageVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/381581 (https://phabricator.wikimedia.org/T176884) (owner: Catrope)
[20:50:25] (CR) Catrope: [C: 2] Bump $wgResourceLoaderStorageVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/381581 (https://phabricator.wikimedia.org/T176884) (owner: Catrope)
[20:52:00] (Merged) jenkins-bot: Bump $wgResourceLoaderStorageVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/381581 (https://phabricator.wikimedia.org/T176884) (owner: Catrope)
[20:53:07] (CR) jenkins-bot: Bump $wgResourceLoaderStorageVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/381581 (https://phabricator.wikimedia.org/T176884) (owner: Catrope)
[20:54:51] !log catrope@tin Synchronized wmf-config/CommonSettings.php: Bump $wgResourceLoaderStorageVersion (T176884) (duration: 00m 47s)
[20:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:57] T176884: Icons missing throughout UI on Edge, IE 11 - https://phabricator.wikimedia.org/T176884
[21:00:39] PROBLEM - puppet last run on wdqs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:03:37] (CR) Framawiki: [C: 1] Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) (owner: Zoranzoki21)
[21:04:53] (PS2) Framawiki: Enable Extension:Newsletter on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/379316 (https://phabricator.wikimedia.org/T176199)
[21:10:24] (PS4) Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151)
[21:13:39] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[21:19:19] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[21:19:49] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[21:27:49] RECOVERY - puppet last run on wdqs1005 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[22:04:59] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:31:56] (PS1) Hoo man: Don't persist description usages (yet) [mediawiki-config] - https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153)
[22:34:10] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[22:37:58] (PS2) Hoo man: Wikidata: Don't persist description usages (yet) [mediawiki-config] - https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153)